C.K. Cheung, Wenzhong Shi
Advanced Research Center for Spatial Information Technology, Department of Land Surveying and Geo-informatics,
The Hong Kong Polytechnic University, Hong Kong, China
Received 3 November 2004; received in revised form 17 June 2005; accepted 17 August 2005
Abstract
Automatic generalization is a process for representing geographical objects with different degrees of detail on a digital
map. The positional error for each geographical object is propagated through the process and a generalization error is also
introduced by the generalization. Previous research has focused mainly on measuring the generalization error. This paper
presents an analytical model for assessing the positional error in the generalized object by considering both error
propagation from the original data and the generalization error. The analytical model provides a shape dissimilarity value
that indicates the shape difference between the original data with a positional error and its simplified version. This model is
able to objectively and automatically determine the applicability of the generalized data for further applications to
geographical information system (GIS) problems. It can also deal with a large amount of data in GIS. Therefore, the
analytical model presented, which provides a more comprehensive shape measure for assessing positional error in data
derived from the generalization, is valuable in the development of automatic generalization.
© 2005 Elsevier Ltd. All rights reserved.
Keywords: GIS; Automatic generalization; Positional error; Generalization error; Shape dissimilarity measure
1. Introduction
To present geographical objects on either a paper map or a digital version, many cartographic decisions are
made manually or automatically. Important characteristics of geographical objects are preserved and
unwanted details, for example, are eliminated. Cartographic decisions are taken to improve the quality of
geographical objects on the paper or digital map at different scales and with different degrees of detail, or to
reduce data storage for the digital map, or both (João, 1998).
Cartographic decisions in manual generalization are performed by humans using a number of
generalization processes: (a) selection of geographical objects and thematic attributes to be presented on
the map; (b) simplification of the retained object; (c) classification for grouping similar retained objects into
one class; and (d) symbolization for displaying the retained objects on the map with visual clarity. As part of
ARTICLE IN PRESS
www.elsevier.com/locate/cageo
0098-3004/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cageo.2005.08.002
Fig. 1. A polyline composed of straight segments with different orientations at its vertices, and a curved arc, along which orientation varies. (Figure: polyline A with vertices $P_0 = (x_0, y_0)$, $P_1 = (x_1, y_1)$, $P_2 = (x_2, y_2)$ and $P_3 = (x_3, y_3)$; curved arc A through $(x_0, y_0)$, $(x_1, y_1)$ and $(x_2, y_2)$.)
C.K. Cheung, W.Z. Shi / Computers & Geosciences 32 (2006) 462–475
A is simplified into line B, which contains straight segments only. It is assumed that each straight segment of line B is then a simplified version of some arcs of line A. This assumption is always valid when the simplified version is generated from an automatic line simplification algorithm. For example, one simplified version of line A in Fig. 1a is a polyline having two straight segments: $P_0P_2$ and $P_2P_3$. The first straight segment $P_0P_2$ of line B represents the first two straight segments $P_0P_1$ and $P_1P_2$ of line A, and the second straight segment $P_2P_3$ of line B represents the third straight segment $P_2P_3$ of line A. The inclination angle of each straight segment of line B should be compared with that of its original representation in line A.
For inclination angle comparison, the inclination angle of line A is transformed as follows. We express line A in the form $y = f(x)$ and consider the transformation

$$r = \frac{\int_{x_0}^{x_1}\sqrt{1+(f'(x))^2}\,dx + \int_{x_1}^{x_2}\sqrt{1+(f'(x))^2}\,dx + \cdots + \int_{x_i}^{x}\sqrt{1+(f'(x))^2}\,dx}{\int_{x_0}^{x_1}\sqrt{1+(f'(x))^2}\,dx + \int_{x_1}^{x_2}\sqrt{1+(f'(x))^2}\,dx + \cdots + \int_{x_{n-1}}^{x_n}\sqrt{1+(f'(x))^2}\,dx}, \qquad (1)$$
where $(x_0, f(x_0))$ is the starting endpoint of the line, $(x_n, f(x_n))$ is the ending endpoint of the line having n explicit points (including endpoints and nodes), $(x_j, f(x_j))$ is the jth explicit point of the line, and $(x, f(x))$ is a point on the line that lies between $(x_i, f(x_i))$ and $(x_{i+1}, f(x_{i+1}))$. The denominator in this transformation is equal to the total length along the line (called arc length in mathematics), and the numerator is the arc length from the starting endpoint to the point $(x, f(x))$. By using the transformation, any point on the line can be expressed as a function of r: $x = g_1(r)$ and $y = g_2(r)$. Substituting these into the inclination angle $\theta(x, y)$ of line A, we obtain the inclination angle in the form $\tilde{\theta}(r)$ (Fig. 2). The domain of this inclination angle function is all real numbers between 0 and 1, and the range is all angles between 0° and 360°.
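For a polyline, the integrals in Eq. (1) reduce to sums of straight-segment lengths, so the value of r at each vertex is a cumulative length ratio and $\tilde{\theta}(r)$ is a step function, as in Fig. 2. The following Python sketch illustrates this under stated assumptions: vertices are (x, y) tuples, angles are in degrees measured counterclockwise from the positive x-axis, and a value of r that falls exactly on a vertex is assigned to the following segment. The function names are hypothetical and are not part of the method as published.

```python
import bisect
import math


def inclination(p, q):
    """Inclination angle of the directed segment p->q in degrees, in [0, 360)."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 360.0


def cumulative_ratios(vertices):
    """Values of r (Eq. (1)) at each vertex: cumulative length / total length."""
    lengths = [math.dist(p, q) for p, q in zip(vertices, vertices[1:])]
    total = sum(lengths)
    ratios = [0.0]
    for length in lengths:
        ratios.append(ratios[-1] + length / total)
    ratios[-1] = 1.0  # remove floating-point drift at the end point
    return ratios


def theta_of_r(vertices, r):
    """Step function theta~(r): inclination of the segment whose r-range holds r."""
    ratios = cumulative_ratios(vertices)
    # Segment j covers [r_j, r_{j+1}); r = 1 is assigned to the last segment.
    j = min(bisect.bisect_right(ratios, r) - 1, len(vertices) - 2)
    return inclination(vertices[j], vertices[j + 1])
```

A polyline with a zero total length would make the ratios undefined, so such input is assumed not to occur.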
To compare individual straight segments of line B with their corresponding original representations (each of which contains some arcs of line A), the inclination angle of line B is defined as follows: for $r \in [0, 1]$, $\tilde{\theta}_B(r)$ is the inclination angle of the jth straight segment of line B if $(Ax, Ay) = (g_1(r), g_2(r))$ is a point on the arc of A that is the original representation of the jth straight segment of line B.
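An average shape dissimilarity measure is thus:

$$\text{Average\_}d_\theta(A,B) = \int_{r=0}^{1} \text{AngDiff}\big(\tilde{\theta}_A(r), \tilde{\theta}_B(r)\big)\,dr \qquad (2)$$

where for $a, b \in [0^\circ, 360^\circ)$, $\text{AngDiff}(a, b)$ is defined as the minimum angle difference between a and b: when the absolute difference $|a - b| \le 180^\circ$, $\text{AngDiff}(a, b) = |a - b|$; when $|a - b| > 180^\circ$, $\text{AngDiff}(a, b) = 360^\circ - |a - b|$ (Fig. 3).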
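The AngDiff function translates directly into code. A minimal Python sketch, assuming both angles are already expressed in degrees within [0°, 360°); the name `ang_diff` is hypothetical:

```python
def ang_diff(a, b):
    """Minimum angle difference between angles a and b (degrees, in [0, 360))."""
    d = abs(a - b)
    # Going the other way round the circle is shorter when |a - b| exceeds 180.
    return d if d <= 180.0 else 360.0 - d
```

For example, `ang_diff(10.0, 350.0)` gives 20.0 rather than 340.0, because the shorter way round the circle is taken.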
If both lines A and B are composed of straight segments, the average shape dissimilarity measure is a discrete function:

$$\text{Average\_}d_\theta(A,B) = \sum_{p} \text{AngDiff}\big(\tilde{\theta}_A(r_p), \tilde{\theta}_B(r_p)\big)\,\Delta r_p \qquad (3)$$
Fig. 2. Inclination angle for the polyline and curved arc of Fig. 1, where $s_i$ represents the length of the ith straight segment of the polyline, and $s_{0j}$ represents the arc length, or length of arc $P_0P_j$, along the curved arc.
where $\Delta r_p$ is the weight factor, defined as the difference between two ratios, $r_p - r_{p-1}$, and $r_p$ is the cumulative ratio by which one line is partitioned and which is also used to partition the other line. Fig. 4 shows four straight segments denoted by $P_1P_2$, $P_2P_3$, $P_3P_4$, and $P_5P_6$. If the vertices of the first three straight segments are mapped onto straight segment $P_5P_6$, there exist points $P'_1$, $P'_2$, $P'_3$, and $P'_4$ on $P_5P_6$ such that the following conditions hold: $P'_1 = P_5$, $P'_4 = P_6$,

$$\frac{\mathrm{Len}(P'_1P'_2)}{\mathrm{Len}(P'_1P'_2)+\mathrm{Len}(P'_2P'_3)+\mathrm{Len}(P'_3P'_4)} = \frac{\mathrm{Len}(P_1P_2)}{\mathrm{Len}(P_1P_2)+\mathrm{Len}(P_2P_3)+\mathrm{Len}(P_3P_4)}$$

and

$$\frac{\mathrm{Len}(P'_1P'_2)+\mathrm{Len}(P'_2P'_3)}{\mathrm{Len}(P'_1P'_2)+\mathrm{Len}(P'_2P'_3)+\mathrm{Len}(P'_3P'_4)} = \frac{\mathrm{Len}(P_1P_2)+\mathrm{Len}(P_2P_3)}{\mathrm{Len}(P_1P_2)+\mathrm{Len}(P_2P_3)+\mathrm{Len}(P_3P_4)},$$

where Len represents the length of a segment.
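The two length-ratio conditions amount to placing each mapped point $P'_k$ at the same cumulative length ratio along $P_5P_6$ that $P_k$ has along the original polyline. A small Python sketch of that mapping, assuming 2-D (x, y) tuples and a polyline of non-zero total length; `map_vertices_onto_segment` is a hypothetical helper name:

```python
import math


def map_vertices_onto_segment(polyline, seg_start, seg_end):
    """Map the vertices of `polyline` onto the segment seg_start->seg_end.

    Each mapped point P'_k keeps the cumulative length ratio of P_k, so the
    first vertex lands on seg_start and the last one on seg_end.
    """
    lengths = [math.dist(p, q) for p, q in zip(polyline, polyline[1:])]
    total = sum(lengths)
    mapped = [tuple(seg_start)]
    travelled = 0.0
    for length in lengths[:-1]:  # interior vertices only
        travelled += length
        t = travelled / total  # cumulative ratio r_p
        mapped.append((seg_start[0] + t * (seg_end[0] - seg_start[0]),
                       seg_start[1] + t * (seg_end[1] - seg_start[1])))
    mapped.append(tuple(seg_end))
    return mapped
```

Because only ratios are preserved, the mapped points partition $P_5P_6$ in the same proportions as the original vertices partition the polyline, which is what the cumulative ratios $r_p$ in Eq. (3) rely on.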
The average shape dissimilarity measure given in Eq. (2) provides an average of the shape dissimilarity values. The shape dissimilarity value at any point on the simplified line B satisfies the following condition:

$$d_\theta(A,B) = \text{AngDiff}\big(\tilde{\theta}_A(r), \tilde{\theta}_B(r)\big) \le \text{Max\_}d_\theta(A,B) = \max_{r} \text{AngDiff}\big(\tilde{\theta}_A(r), \tilde{\theta}_B(r)\big). \qquad (4)$$

The right-hand side of the inequality in Eq. (4), $\text{Max\_}d_\theta(A,B)$, is an upper bound of the shape dissimilarity values between lines A and B.
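Eqs. (3) and (4) can be evaluated together in one pass when both lines are polylines. The sketch below rests on an extra assumption that the text does not state: the simplified line B keeps a subset of the vertices of A (as Douglas–Peucker-type algorithms do), identified by the hypothetical index list `kept`. Each segment of A is then represented by exactly one segment of B, and the weight factor $\Delta r_p$ of each segment of A is its share of the total length of A.

```python
import math


def inclination(p, q):
    """Inclination angle of the directed segment p->q in degrees, in [0, 360)."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 360.0


def ang_diff(a, b):
    """Minimum angle difference between two angles (AngDiff of Eq. (2))."""
    d = abs(a - b)
    return d if d <= 180.0 else 360.0 - d


def shape_dissimilarity(line_a, kept):
    """Return (Average_d, Max_d) of Eqs. (3) and (4) for polylines.

    line_a -- original polyline A as a list of (x, y) vertices
    kept   -- increasing indices of the vertices of A retained by the simplified
              line B; it must start with 0 and end with len(line_a) - 1
    """
    lengths = [math.dist(p, q) for p, q in zip(line_a, line_a[1:])]
    total_length = sum(lengths)
    average, upper = 0.0, 0.0
    j = 0  # segment of B that represents the current segment of A
    for i, length in enumerate(lengths):
        while kept[j + 1] <= i:
            j += 1
        theta_a = inclination(line_a[i], line_a[i + 1])
        theta_b = inclination(line_a[kept[j]], line_a[kept[j + 1]])
        d = ang_diff(theta_a, theta_b)
        average += d * (length / total_length)  # weight factor delta r_p
        upper = max(upper, d)
    return average, upper
```

With the polyline (0, 0), (10, 1), (20, 0) simplified to its two endpoints (`kept=[0, 2]`), both returned values are about 5.711°, the certain-case figure worked out in Section 3.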
2.2. A measure for one fixed line and one randomly perturbed line
According to the notion of a fixed line in Section 2.1, a fixed line is interpreted as a line without positional error or uncertainty (or the true line). A randomly perturbed line is a line with positional error or uncertainty, the equation for which has random variable(s) assumed to follow specific probability distribution(s).
When one line is fixed and the other line contains an error, calculating the average shape dissimilarity value by substituting the fixed line and the mean position of the randomly perturbed line (where the mean position can be the expectation or the least-squares estimate of the randomly perturbed line) for A and B in Eq. (2) or (3) yields an estimate of the average shape dissimilarity value, under the assumption that the mean position of the randomly perturbed line is accurate or equal to the statistical population mean. The average dissimilarity value becomes very misleading when the assumption is not valid. Nevertheless, given a predefined acceptable
Fig. 3. Minimum angle difference between a and b. (Figure: two panels illustrating AngDiff(a, b).)
Fig. 4. Representation of one line mapped onto another line. (Figure: the vertices of polyline $P_1P_2P_3P_4$ mapped onto segment $P_5P_6$, giving $P'_1 = P_5$, $P'_2$, $P'_3$ and $P'_4 = P_6$.)
level, we can conclude that the randomly perturbed line is similar to the fixed line in terms of shape if the maximum value of the average shape dissimilarity measure is not greater than the acceptable level.
An analytical method is presented here to estimate the average shape dissimilarity measure under uncertainty. The analytical method has the following restriction: the original line is composed of straight segments and its vertices are inside their statistical confidence regions or their surveying error ellipses. The confidence regions or error ellipses are created under the assumption that measurement errors at the vertices are bivariate normally distributed. An original curved line should first be split into straight segments; when the number of split straight segments is sufficiently large, these segments approximate the curved line.
In the analytical method, the original line (with measurement errors) is considered as a randomly perturbed line. The error ellipse for each vertex of the original line is a region that contains the least-squares estimate with a specific confidence if it is centered at the true position (which is always unknown), and is a confidence region if it is centered at the least-squares estimate of the position. Both the error ellipse and the confidence region designate precision regions of different probabilities. In the following discussion of the average shape dissimilarity measure under uncertainty, the center of each error ellipse is considered as the least-squares estimate of a vertex of the original line. If the true positions of all vertices of the original line are given, the average shape dissimilarity measure should be determined according to the approach given in Section 2.1. Moreover, the fixed line refers to the simplified line (the line composed of straight segments and derived from the simplification). The simplified line is the final representation of the original line. Therefore, the shape of the simplified line is directly compared with that of the original line with measurement errors. This shape comparison shows the shape difference between the final representation on the map and the exact representation in the world.
We now elaborate on the analytical method. With reference to Fig. 5, a randomly perturbed straight segment C represents the original line with two vertices $(cx_0, cy_0)$ and $(cx_1, cy_1)$, and the least-squares estimate of the original line is the solid line with two error ellipses at its endpoints. The error ellipses for the two vertices of the randomly perturbed straight segment C (having dashed lines as boundaries in the figure) are derived from their covariance matrices:
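$$\begin{cases}
\dfrac{(cx_0-\widehat{cx}_0)^2}{\sigma^2_{cx_0}} - 2\rho_{cx_0,cy_0}\dfrac{(cx_0-\widehat{cx}_0)(cy_0-\widehat{cy}_0)}{\sigma_{cx_0}\sigma_{cy_0}} + \dfrac{(cy_0-\widehat{cy}_0)^2}{\sigma^2_{cy_0}} \le \left(1-\rho^2_{cx_0,cy_0}\right)\chi^2_p(1-\alpha),\\[1ex]
\dfrac{(cx_1-\widehat{cx}_1)^2}{\sigma^2_{cx_1}} - 2\rho_{cx_1,cy_1}\dfrac{(cx_1-\widehat{cx}_1)(cy_1-\widehat{cy}_1)}{\sigma_{cx_1}\sigma_{cy_1}} + \dfrac{(cy_1-\widehat{cy}_1)^2}{\sigma^2_{cy_1}} \le \left(1-\rho^2_{cx_1,cy_1}\right)\chi^2_p(1-\alpha),
\end{cases} \qquad (5)$$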
where the least-squares estimates of $(cx_0, cy_0)$ and $(cx_1, cy_1)$ are $(\widehat{cx}_0, \widehat{cy}_0)$ and $(\widehat{cx}_1, \widehat{cy}_1)$, respectively; the correlation coefficient of $cx_0$ and $cy_0$ is $\rho_{cx_0,cy_0}$; the correlation coefficient of $cx_1$ and $cy_1$ is $\rho_{cx_1,cy_1}$; their covariance matrices are

$$\begin{bmatrix} \sigma^2_{cx_0} & \rho_{cx_0,cy_0}\sigma_{cx_0}\sigma_{cy_0} \\ \rho_{cx_0,cy_0}\sigma_{cx_0}\sigma_{cy_0} & \sigma^2_{cy_0} \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} \sigma^2_{cx_1} & \rho_{cx_1,cy_1}\sigma_{cx_1}\sigma_{cy_1} \\ \rho_{cx_1,cy_1}\sigma_{cx_1}\sigma_{cy_1} & \sigma^2_{cy_1} \end{bmatrix};$$

and $\chi^2_p(1-\alpha)$ is the upper $100(1-\alpha)$th percentile of a $\chi^2$ distribution with p degrees of freedom, where p = 2. When α = 0.394, Eq. (5) gives the standard error ellipse. The other solid straight segment in the figure is a fixed straight segment D with two vertices, $(dx_0, dy_0)$ and $(dx_1, dy_1)$, representing a simplified line.

Fig. 5. Randomly perturbed straight segment C that has two disjoint error ellipses around its vertices. (Figure: segment C obtained from the least-squares estimate, with vertices $(cx_0, cy_0)$ and $(cx_1, cy_1)$; common tangents tangent1 to tangent4; fixed segment D with vertices $(dx_0, dy_0)$ and $(dx_1, dy_1)$.)
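Each row of Eq. (5) is the usual Mahalanobis-distance test written out with correlation coefficients. For p = 2 the χ² distribution function is $1 - e^{-c/2}$, so $\chi^2_2(1-\alpha) = -2\ln(1-\alpha)$ has a closed form; with α = 0.394 this is about 1, which is why that value yields the standard error ellipse. A Python sketch of the test for a single vertex, assuming the standard deviations and correlation coefficient are known; the function names are hypothetical, and for other values of p one could use `scipy.stats.chi2.ppf(alpha, p)` instead of the closed form:

```python
import math


def chi2_2dof_quantile(alpha):
    """chi^2_2(1 - alpha): the value c with P(chi^2_2 <= c) = alpha (p = 2 only)."""
    # For 2 degrees of freedom the distribution function is 1 - exp(-c / 2).
    return -2.0 * math.log(1.0 - alpha)


def inside_error_ellipse(point, estimate, sx, sy, rho, alpha=0.394):
    """Evaluate one row of Eq. (5) for a candidate vertex position.

    point    -- candidate position (cx, cy)
    estimate -- least-squares estimate of the vertex
    sx, sy   -- standard deviations of the x and y coordinates
    rho      -- correlation coefficient between the x and y coordinates
    """
    u = (point[0] - estimate[0]) / sx
    v = (point[1] - estimate[1]) / sy
    lhs = u * u - 2.0 * rho * u * v + v * v
    return lhs <= (1.0 - rho * rho) * chi2_2dof_quantile(alpha)
```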
The maximum shape dissimilarity measure in the case of one fixed straight segment and one randomly perturbed straight segment can be considered as a non-linear programming problem, consisting of an objective function with non-linear constraints:

$$\text{MaxAngDiff}(\tilde{\theta}_C, \tilde{\theta}_D) = \max_{(cx_0,\,cy_0),\,(cx_1,\,cy_1)} d_\theta(C, D) \quad \text{subject to } (cx_0, cy_0) \text{ and } (cx_1, cy_1) \text{ falling inside their corresponding error ellipses}, \qquad (6)$$

where $d_\theta(C, D) = \text{AngDiff}(\tilde{\theta}_C, \tilde{\theta}_D)$, and $\tilde{\theta}_C$ and $\tilde{\theta}_D$ represent the inclination angles of straight segments C and D, respectively.
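As a numerical cross-check of Eq. (6), and not the method used in this paper (which relies on common tangents), the maximum can be approximated by brute force. For disjoint convex error ellipses the extreme directions of C are reached with both endpoints on the ellipse boundaries, so sampling both boundaries gives a lower bound that tightens as the sample count grows. The sketch below assumes 2×2 covariance matrices given as nested lists and α = 0.394 by default; all names are hypothetical.

```python
import math


def ellipse_boundary(center, cov, k, n):
    """n points on {x : (x - center)^T cov^-1 (x - center) = k^2} (2-D only)."""
    a = math.sqrt(cov[0][0])
    b = cov[0][1] / a
    c = math.sqrt(cov[1][1] - b * b)  # cov = L L^T with L = [[a, 0], [b, c]]
    points = []
    for i in range(n):
        t = 2.0 * math.pi * i / n
        u, v = math.cos(t), math.sin(t)
        points.append((center[0] + k * a * u, center[1] + k * (b * u + c * v)))
    return points


def ang_diff(a, b):
    """Minimum angle difference between two angles in degrees (AngDiff)."""
    d = abs(a - b)
    return d if d <= 180.0 else 360.0 - d


def max_ang_diff_sampled(c0, cov0, c1, cov1, theta_d, alpha=0.394, n=360):
    """Brute-force lower bound for Eq. (6): sample both ellipse boundaries."""
    k = math.sqrt(-2.0 * math.log(1.0 - alpha))  # sqrt of chi^2_2(1 - alpha)
    boundary0 = ellipse_boundary(c0, cov0, k, n)
    boundary1 = ellipse_boundary(c1, cov1, k, n)
    best = 0.0
    for p in boundary0:
        for q in boundary1:
            theta_c = math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 360.0
            best = max(best, ang_diff(theta_c, theta_d))
    return best
```

Applied to the first segment of the example polyline in Section 3 (estimates (0, 0) and (10, 1) with the covariance matrices given there, and a horizontal fixed segment), it should return a value close to the upper end of about 7.5° of the interval reported there; small differences can come from sampling and from the confidence level assumed.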
Instead of solving this difficult non-linear programming problem, we estimate the function $\text{MaxAngDiff}(\tilde{\theta}_C, \tilde{\theta}_D)$ based on common tangents to the two error ellipses for $(cx_0, cy_0)$ and $(cx_1, cy_1)$.
2.2.1. Case 1: the two error ellipses neither overlap nor touch
The four common tangents to the two error ellipses are represented by dotted lines in Fig. 5 and are called tangent1, tangent2, tangent3, and tangent4, where the arrow shows the direction of each tangent. No two of the common tangents point in opposite directions, so the difference between the inclination angles of any two tangents should not be greater than 180°. In this example, the inclination angles of the first three tangents are in the first quadrant of a Cartesian coordinate system ($0^\circ < \tilde{\theta}_{\mathrm{Tangent}1}, \tilde{\theta}_{\mathrm{Tangent}2}, \tilde{\theta}_{\mathrm{Tangent}3} < 90^\circ$) and the inclination angle of tangent4 is in the fourth quadrant ($270^\circ < \tilde{\theta}_{\mathrm{Tangent}4} < 360^\circ$), so that

$$\text{MaxAngDiff}(\tilde{\theta}_C, \tilde{\theta}_D) = \max\Big(\big|\tilde{\theta}_D - \max(\tilde{\theta}_{\mathrm{Tangent}1}, \tilde{\theta}_{\mathrm{Tangent}2}, \tilde{\theta}_{\mathrm{Tangent}3})\big|,\; 360^\circ + \tilde{\theta}_D - \tilde{\theta}_{\mathrm{Tangent}4}\Big). \qquad (7)$$
The determination of $\text{MaxAngDiff}(\tilde{\theta}_C, \tilde{\theta}_D)$ is based on the angle interval $I_\theta$ containing $\tilde{\theta}_{\mathrm{Tangent}\,i}$ for all $i \in \{1, 2, 3, 4\}$, and on the comparison between this angle interval and $\tilde{\theta}_D$. The width of the angle interval should not exceed 180°, since, as noted above, the difference between the inclination angles of any two tangents should not be greater than 180°. The lower and upper bounds of the angle interval equal the minimum and maximum of $\tilde{\theta}_{\mathrm{Tangent}\,i}$ for all $i \in \{1, 2, 3, 4\}$, respectively, if the difference between the minimum and maximum is not greater than 180° (refer to Fig. 7). Otherwise, the inclination angle interval is the union of two subintervals: the interval from 0° to the maximum of the tangent inclination angles not greater than 180°, $[0^\circ, \max_i\{\tilde{\theta}_{\mathrm{Tangent}\,i} \mid \tilde{\theta}_{\mathrm{Tangent}\,i} \le 180^\circ\}]$, and the interval from the minimum of the tangent inclination angles greater than 180° to 360°, $[\min_i\{\tilde{\theta}_{\mathrm{Tangent}\,i} \mid \tilde{\theta}_{\mathrm{Tangent}\,i} > 180^\circ\}, 360^\circ)$. Fig. 6 shows an example of this case. The algorithm for computing the function $\text{MaxAngDiff}(\tilde{\theta}_C, \tilde{\theta}_D)$ is as follows:
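Step 1:
Compute $\text{MinTangentAngle} = \min(\tilde{\theta}_{\mathrm{Tangent}1}, \tilde{\theta}_{\mathrm{Tangent}2}, \tilde{\theta}_{\mathrm{Tangent}3}, \tilde{\theta}_{\mathrm{Tangent}4})$ and
$\text{MaxTangentAngle} = \max(\tilde{\theta}_{\mathrm{Tangent}1}, \tilde{\theta}_{\mathrm{Tangent}2}, \tilde{\theta}_{\mathrm{Tangent}3}, \tilde{\theta}_{\mathrm{Tangent}4})$.
Step 2:
If $\text{MaxTangentAngle} - \text{MinTangentAngle} \le 180^\circ$
then $I_\theta = [\text{MinTangentAngle}, \text{MaxTangentAngle}]$
else $\text{MaxTangentAngle} = \max_i\{\tilde{\theta}_{\mathrm{Tangent}\,i} \mid \tilde{\theta}_{\mathrm{Tangent}\,i} \le 180^\circ\}$,
$\text{MinTangentAngle} = \min_i\{\tilde{\theta}_{\mathrm{Tangent}\,i} \mid \tilde{\theta}_{\mathrm{Tangent}\,i} > 180^\circ\}$,
$I_\theta = [0^\circ, \text{MaxTangentAngle}] \cup [\text{MinTangentAngle}, 360^\circ)$.
Step 3:
If $\tilde{\theta}_D \in I_\theta$ or ($\tilde{\theta}_D + 180^\circ \notin I_\theta$ and $\tilde{\theta}_D - 180^\circ \notin I_\theta$)
then Difference0 = $\text{AngDiff}(\tilde{\theta}_D, \text{MinTangentAngle})$
Difference1 = $\text{AngDiff}(\tilde{\theta}_D, \text{MaxTangentAngle})$
if Difference0 > 180° then Difference0 = 360° − Difference0
if Difference1 > 180° then Difference1 = 360° − Difference1
$\text{MaxAngDiff}(\tilde{\theta}_C, \tilde{\theta}_D) = \max(\text{Difference0}, \text{Difference1})$
else $\text{MaxAngDiff}(\tilde{\theta}_C, \tilde{\theta}_D) = 180^\circ$.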
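Steps 1 to 3 can be sketched in Python as follows. This is an illustration under assumptions, not the authors' code: the four tangent inclination angles are assumed to be already computed (in degrees, within [0°, 360°)), the subinterval $[\text{MinTangentAngle}, 360^\circ)$ is treated as closed for simplicity, and the function names are hypothetical. Because AngDiff never exceeds 180°, the conditional adjustments of Difference0 and Difference1 in Step 3 have no effect and are omitted.

```python
def ang_diff(a, b):
    """Minimum angle difference between two angles in degrees (AngDiff)."""
    d = abs(a - b)
    return d if d <= 180.0 else 360.0 - d


def in_intervals(theta, intervals):
    """True if theta lies in any closed interval (lo, hi) of `intervals`."""
    return any(lo <= theta <= hi for lo, hi in intervals)


def max_ang_diff_case1(tangent_angles, theta_d):
    """Case 1 sketch: four tangent inclination angles (degrees) and theta_D."""
    # Step 1: smallest and largest tangent inclination angles.
    min_t, max_t = min(tangent_angles), max(tangent_angles)
    # Step 2: inclination angle interval I_theta, split in two if it wraps past 0/360.
    if max_t - min_t <= 180.0:
        intervals = [(min_t, max_t)]
    else:
        max_t = max(t for t in tangent_angles if t <= 180.0)
        min_t = min(t for t in tangent_angles if t > 180.0)
        intervals = [(0.0, max_t), (min_t, 360.0)]
    # Step 3: of theta_D + 180 and theta_D - 180, only the reversed direction
    # reduced modulo 360 can lie in [0, 360), so one membership test covers both.
    opposite = (theta_d + 180.0) % 360.0
    if in_intervals(theta_d, intervals) or not in_intervals(opposite, intervals):
        return max(ang_diff(theta_d, min_t), ang_diff(theta_d, max_t))
    return 180.0  # segment C can point opposite to D
```

For instance, with the hypothetical tangent angles 10°, 20°, 30° and 40° and $\tilde{\theta}_D = 200^\circ$, the reversed direction 20° lies inside $I_\theta$, so the function returns 180°.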
Fig. 7. Inclination angle interval $I_\theta$ for four tangents in case the difference among inclination angles of tangents is not greater than 180°. (Figure: tangent1 to tangent4 and the fixed segment D drawn against the inclination angles 0°, 90°, 180° and 270°.)

2.2.2. Case 2: the two error ellipses touch at one point
When the error ellipses for $(cx_0, cy_0)$ and $(cx_1, cy_1)$ touch at one point, there are three common tangents to the two error ellipses: tangent1, tangent2, and tangent3. With reference to Fig. 8, tangent1 and tangent2 are monodirectional and tangent3 is bidirectional. The inclination angles of the first two monodirectional tangents are denoted by $\tilde{\theta}_{\mathrm{Tangent}1}$ and $\tilde{\theta}_{\mathrm{Tangent}2}$. Line tangent3 has two inclination angles, denoted by $\tilde{\theta}_{\mathrm{Tangent}3}$ (where $0^\circ \le \tilde{\theta}_{\mathrm{Tangent}3} < 180^\circ$) and $\tilde{\theta}_{\mathrm{Tangent}4} = 180^\circ + \tilde{\theta}_{\mathrm{Tangent}3}$. In a manner similar to the above case, the inclination angle interval $I_\theta$ containing $\tilde{\theta}_{\mathrm{Tangent}\,i}$ for all $i \in \{1, 2, 3, 4\}$ is estimated; however, the width of the inclination angle interval in this case is 180° because tangent3 is bidirectional. The inclination angle interval is either $[\tilde{\theta}_{\mathrm{Tangent}3}, \tilde{\theta}_{\mathrm{Tangent}4}]$ or $[0^\circ, \tilde{\theta}_{\mathrm{Tangent}3}] \cup [\tilde{\theta}_{\mathrm{Tangent}4}, 360^\circ)$, and it must contain $\tilde{\theta}_{\mathrm{Tangent}1}$ and $\tilde{\theta}_{\mathrm{Tangent}2}$:
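Step 1:
If $\tilde{\theta}_{\mathrm{Tangent}1}, \tilde{\theta}_{\mathrm{Tangent}2} \in [\tilde{\theta}_{\mathrm{Tangent}3}, \tilde{\theta}_{\mathrm{Tangent}4}]$
then $I_\theta = [\tilde{\theta}_{\mathrm{Tangent}3}, \tilde{\theta}_{\mathrm{Tangent}4}]$
else $I_\theta = [0^\circ, \tilde{\theta}_{\mathrm{Tangent}3}] \cup [\tilde{\theta}_{\mathrm{Tangent}4}, 360^\circ)$.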
In the final step, the inclination angle interval is compared with $\tilde{\theta}_D$ to find $\text{MaxAngDiff}(\tilde{\theta}_C, \tilde{\theta}_D)$ (as shown in Step 3 of the previous case).
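The Case 2 procedure differs from Case 1 only in how $I_\theta$ is formed. A hedged Python sketch, assuming $0^\circ \le \tilde{\theta}_{\mathrm{Tangent}3} < 180^\circ$ as stated above and reusing the final comparison of Step 3 in Case 1; all names are hypothetical:

```python
def ang_diff(a, b):
    """Minimum angle difference between two angles in degrees (AngDiff)."""
    d = abs(a - b)
    return d if d <= 180.0 else 360.0 - d


def max_ang_diff_case2(theta_t1, theta_t2, theta_t3, theta_d):
    """Case 2 sketch: tangent3 is bidirectional with 0 <= theta_t3 < 180 degrees."""
    theta_t4 = theta_t3 + 180.0  # the other inclination angle of tangent3
    # Step 1: pick the 180-degree interval that contains tangent1 and tangent2.
    if all(theta_t3 <= t <= theta_t4 for t in (theta_t1, theta_t2)):
        intervals = [(theta_t3, theta_t4)]
    else:
        intervals = [(0.0, theta_t3), (theta_t4, 360.0)]

    def inside(theta):
        return any(lo <= theta <= hi for lo, hi in intervals)

    # Final step: the same comparison as Step 3 of Case 1.
    opposite = (theta_d + 180.0) % 360.0
    if inside(theta_d) or not inside(opposite):
        return max(ang_diff(theta_d, theta_t3), ang_diff(theta_d, theta_t4))
    return 180.0
```

Since $I_\theta$ spans exactly 180° here, either $\tilde{\theta}_D$ or its reversed direction always falls inside it; the function returns 180° whenever only the reversed direction does.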
2.2.3. Case 3: the two error ellipses overlap
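For the case in which the two error ellipses for $(cx_0, cy_0)$ and $(cx_1, cy_1)$ overlap, the function $\text{MaxAngDiff}(\tilde{\theta}_C, \tilde{\theta}_D)$ is 180°: both vertices can then lie in the common region of the two ellipses, so segment C can take any direction, including the direction opposite to D.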
Replacing the function $\text{AngDiff}(\tilde{\theta}_A, \tilde{\theta}_B)$ of Eqs. (2)–(4) with the function $\text{MaxAngDiff}(\tilde{\theta}_A, \tilde{\theta}_B)$ gives the shape dissimilarity measures under uncertainty, including the average of the maximum shape dissimilarity measure in the continuous case in Eq. (8), the average of the maximum shape dissimilarity measure in the discrete case in Eq. (9), and the upper bound of the maximum shape dissimilarity measure in Eq. (10):

$$\text{Average\_}d_\theta(A,B) = \int_{r=0}^{1} \text{MaxAngDiff}\big(\tilde{\theta}_A(r), \tilde{\theta}_B(r)\big)\,dr, \qquad (8)$$

$$\text{Average\_}d_\theta(A,B) = \sum_{p} \text{MaxAngDiff}\big(\tilde{\theta}_A(r_p), \tilde{\theta}_B(r_p)\big)\,\Delta r_p, \qquad (9)$$

$$d_\theta(A,B) \le \text{Max\_}d_\theta(A,B) = \max_{r} \text{MaxAngDiff}\big(\tilde{\theta}_A(r), \tilde{\theta}_B(r)\big). \qquad (10)$$
3. A simple example
The above section gives the analytical model used to assess shape dissimilarity between a line and its simplified version, where the shape dissimilarity degree relates to the generalization error associated with line simplification and the propagation of the positional error from the original line. This section illustrates the computation of the average shape dissimilarity values for the least-squares estimate of the original line and for the original line with positional error, respectively.
Fig. 8. Randomly perturbed straight segment C that has two vertices with error ellipses touching each other. (Figure: segment C obtained from the least-squares estimate, with vertices $(cx_0, cy_0)$ and $(cx_1, cy_1)$; common tangents tangent1, tangent2 and tangent3; fixed segment D with vertices $(dx_0, dy_0)$ and $(dx_1, dy_1)$.)
Fig. 9 shows a non-curved polyline E with three vertices, $(ex_0, ey_0)$, $(ex_1, ey_1)$, and $(ex_2, ey_2)$, with least-squares estimates (0, 0), (10, 1), and (20, 0), respectively (in mm on a 1:500 scale map, for example), and covariance matrices

$$\begin{bmatrix} 0.02 & 0.002 \\ 0.002 & 0.03 \end{bmatrix}, \quad \begin{bmatrix} 0.03 & 0.001 \\ 0.001 & 0.02 \end{bmatrix}, \quad \text{and} \quad \begin{bmatrix} 0.01 & 0.000 \\ 0.000 & 0.01 \end{bmatrix},$$

where the data are artificial. Its simplified line $E'$ has vertices $(ex'_0, ey'_0) = (0, 0)$ and $(ex'_1, ey'_1) = (20, 0)$.
To compute the average shape dissimilarity values, the weight factor $\Delta r_p$ in Eqs. (3) and (9) is determined first. In this example the cumulative ratios $r_0$, $r_1$, and $r_2$ for partitioning E are 0, 0.5, and 1 (the two segments of E have equal length), and these ratios can be used directly as $r_p$ in Eqs. (3) and (9), giving $\Delta r_1 = \Delta r_2 = 0.5$.
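The next step is to compute the functions $\text{AngDiff}\big(\tilde{\theta}_E(r_p), \tilde{\theta}_{E'}(r_p)\big)$ and $\text{MaxAngDiff}\big(\tilde{\theta}_E(r_p), \tilde{\theta}_{E'}(r_p)\big)$. The inclination angle θ of a straight segment between any two points $(x_0, y_0)$ and $(x_1, y_1)$ is expressed as

$$\tan\theta = \frac{y_1 - y_0}{x_1 - x_0}, \qquad (11)$$

where $0^\circ \le \theta < 360^\circ$ and the quadrant of θ is fixed by the signs of $x_1 - x_0$ and $y_1 - y_0$. As $E'$ is assumed to be fixed in the analytical model, the function $\tilde{\theta}_{E'}(r_p)$ is 0° for all $r_p$. For the original polyline E, the inclination angle of its least-squares estimate is

$$\tilde{\theta}_E(r_p) = \begin{cases} 5.711^\circ & \text{for } r_p < 0.5, \\ 354.289^\circ & \text{for } 0.5 \le r_p < 1. \end{cases}$$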
Then, the function $\text{AngDiff}(\tilde{\theta}_E, \tilde{\theta}_{E'})$ is

$$\text{AngDiff}(\tilde{\theta}_E, \tilde{\theta}_{E'}) = \begin{cases} |5.711^\circ - 0^\circ| & \text{for } r_p < 0.5, \\ |360^\circ - 354.289^\circ| & \text{for } 0.5 \le r_p < 1, \end{cases} = 5.711^\circ.$$

Therefore, both the average and the upper bound of the shape dissimilarity values at all points on $E'$, under the assumption that the least-squares estimate is free from error, are 5.711°.
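The function $\text{MaxAngDiff}(\tilde{\theta}_E, \tilde{\theta}_{E'})$ is derived based on the inclination angle interval of E and the inclination angle of $E'$. The inclination angle interval is estimated based on the error ellipse around each vertex of the line. According to Section 2.1, the inclination angle $\tilde{\theta}_E(r_p)$ falls inside the inclination angle interval $I_\theta$:

$$I_\theta = \begin{cases} [3.915^\circ, 7.507^\circ] & \text{for } 0 \le r_p < 0.5, \\ [352.909^\circ, 355.668^\circ] & \text{for } 0.5 \le r_p < 1. \end{cases}$$

The maximum angle difference between $I_\theta(r_p)$ and $\tilde{\theta}_{E'}(r_p)$ is

$$\text{MaxAngDiff}\big(\tilde{\theta}_E(r_p), \tilde{\theta}_{E'}(r_p)\big) = \begin{cases} |7.507^\circ - 0^\circ| & \text{for } 0 \le r_p < 0.5, \\ |360^\circ - 352.909^\circ| & \text{for } 0.5 \le r_p < 1, \end{cases} = \begin{cases} 7.507^\circ & \text{for } 0 \le r_p < 0.5, \\ 7.091^\circ & \text{for } 0.5 \le r_p < 1. \end{cases}$$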
According to Eqs. (9) and (10), the average shape dissimilarity value for the original line with positional error is 7.507° × (0.5 − 0) + 7.091° × (1 − 0.5) = 7.299°, and the shape dissimilarity value at any point on $E'$ is not greater than 7.507° (the upper bound of the shape dissimilarity values at all points on $E'$). This average measure, which also considers the error propagation from the original line, is greater than the average shape dissimilarity value for the least-squares estimate of the original line (5.711°).

Fig. 9. An original polyline E of three vertices, (0, 0), (10, 1) and (20, 0), and its simplified version $E'$ (not to scale).
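The arithmetic of this example can be reproduced in a few lines of Python. The sketch takes the inclination angle intervals $I_\theta$ as given in the text rather than recomputing them from the error ellipses, and takes the worst case at an interval endpoint, which is valid here because the reversed direction of $E'$ (180°) lies in neither interval; variable names are illustrative only.

```python
import math


def ang_diff(a, b):
    """Minimum angle difference between two angles in degrees (AngDiff)."""
    d = abs(a - b)
    return d if d <= 180.0 else 360.0 - d


def inclination(p, q):
    """Inclination angle of the directed segment p->q (Eq. (11)), in [0, 360)."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 360.0


polyline_e = [(0.0, 0.0), (10.0, 1.0), (20.0, 0.0)]  # least-squares estimate of E
theta_e_prime = inclination((0.0, 0.0), (20.0, 0.0))  # fixed line E', 0 degrees
delta_r = [0.5, 0.5]  # weight factors for 0 <= r_p < 0.5 and 0.5 <= r_p < 1
# Inclination angle intervals I_theta quoted in the text (not recomputed here).
intervals = [(3.915, 7.507), (352.909, 355.668)]

# Certain case, Eqs. (3) and (4): the least-squares estimate is taken as error-free.
certain = [ang_diff(inclination(p, q), theta_e_prime)
           for p, q in zip(polyline_e, polyline_e[1:])]
avg_certain = sum(d * w for d, w in zip(certain, delta_r))

# Uncertain case, Eqs. (9) and (10): the worst case lies at an interval endpoint
# here, because the reversed direction of E' (180 degrees) is in neither interval.
uncertain = [max(ang_diff(lo, theta_e_prime), ang_diff(hi, theta_e_prime))
             for lo, hi in intervals]
avg_uncertain = sum(d * w for d, w in zip(uncertain, delta_r))
upper_uncertain = max(uncertain)

print(f"certain:   average {avg_certain:.3f}, upper bound {max(certain):.3f}")
print(f"uncertain: average {avg_uncertain:.3f}, upper bound {upper_uncertain:.3f}")
```

It yields averages of about 5.711° (certain case) and 7.299° (uncertain case) and an upper bound of 7.507°, matching the values above.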
4. A real case study on comparing the measures
This section numerically compares the average shape dissimilarity measure in the certain and uncertain cases for different simplified versions of a coastline. The simplified lines are generated from three automatic line-simplification algorithms: the Douglas–Peucker algorithm, the perpendicular distance routine, and the Nth point routine. Fig. 10 shows the simplified lines of 500 points, which are approximately ordered from higher similarity (Fig. 10b) to lower similarity (Fig. 10d) relative to the original coastline (Fig. 10a). The average shape dissimilarity value for each simplified line is tabulated in Table 1, which gives the same ordering as Fig. 10. In addition, for every algorithm the average shape dissimilarity value for the original coastline with uncertainty is greater than that for the least-squares estimate of the original coastline. Therefore, the uncertainty in the original coastline needs to be considered in the simplification problem, especially when a predefined acceptable level is given to ensure the accuracy of the simplified line.
5. Discussion of the shape dissimilarity measures
The average shape dissimilarity value gives an average inclination angle difference between the original line and its simplified version. It generally describes the shape difference between the two lines. This shape dissimilarity value has been computed in two cases in the previous examples. One is the uncertain case, in which the generalization error associated with line simplification and the propagation of positional error from the original line are considered. In this case, the original line and its simplified version have dissimilar shapes if the average value is greater than a predefined acceptable level. In the certain case, in which the positional error for the original line is ignored, the average shape dissimilarity value equals the average inclination angle difference between the simplified line and the least-squares estimate of the original line. If the average in this case is also greater than the predefined acceptable level, the same conclusion as in the uncertain case is obtained.
Fig. 10. (a) Original representation of a coastline and a set of 500-point simplified lines of the coastline generated by (b) the Douglas–Peucker algorithm, (c) the perpendicular distance routine and (d) the Nth point routine.
However, the average shape dissimilarity values in the uncertain and certain cases may not always lead to the same conclusion. If the predefined acceptable level lies between the average shape dissimilarity values in the uncertain and certain cases, the conclusion drawn from the uncertain case is that the original line and its simplified version have dissimilar shapes, while for the certain case the conclusion is that they have a similar shape. In such a situation, the simplified line is possibly dissimilar and possibly similar to the original line.
The average shape dissimilarity measure shows the degree of shape dissimilarity between the original line and its simplified data for three possibilities:
(a) high dissimilarity if the average shape dissimilarity values for both the uncertain and certain cases are greater than the predefined acceptable level;
(b) possible dissimilarity if the average shape dissimilarity value in the uncertain case is greater than the predefined acceptable level, but the value in the certain case is smaller; and
(c) low dissimilarity if the average shape dissimilarity values in both cases are smaller than the predefined acceptable level.
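The three-way classification can be written as a small decision function. This sketch assumes, as the discussion implies, that the uncertain-case value is never below the certain-case value, and treats a value equal to the acceptable level as not greater than it; the function name and any acceptable level used with it are hypothetical.

```python
def classify_dissimilarity(avg_uncertain, avg_certain, acceptable_level):
    """Return 'high', 'possible' or 'low' dissimilarity (possibilities (a)-(c))."""
    if avg_uncertain > acceptable_level and avg_certain > acceptable_level:
        return "high"  # (a) both values exceed the acceptable level
    if avg_uncertain > acceptable_level:
        return "possible"  # (b) only the uncertain-case value exceeds it
    if avg_certain <= acceptable_level:
        return "low"  # (c) neither value exceeds it
    # Not expected: the certain-case value should not exceed the uncertain-case one.
    raise ValueError("certain-case value exceeds the uncertain-case value")
```

With Table 1 and a hypothetical acceptable level of 20°, the Douglas–Peucker result (19.976° and 13.712°) is classified as low dissimilarity and the other two routines as high dissimilarity; lowering the level to 15° turns the Douglas–Peucker case into possible dissimilarity.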
The average shape dissimilarity measure can be used to control a line simplification process. A higher dissimilarity degree implies that the positional error in the simplified line is greater, resulting from propagation of the positional error for the original data and the generalization error induced by line simplification. In the case of high dissimilarity, the original data are probably oversimplified. The GIS user may consider simplifying the original line again with different parameters in the line simplification algorithm, or redefining the acceptable level such that the shape dissimilarity values in the uncertain and certain cases are lower than the acceptable level. For the case of possible dissimilarity, the GIS user should determine whether the simplified data are applicable for the GIS application based on visual comparison, or consider redefining the acceptable level. The ideal case is low dissimilarity, implying that the simplified version can represent the original data well with fewer characteristic points.
6. Conclusion
This paper presents an analytical model for assessing the quality of a map derived by line simplification. The purpose of line simplification is to represent geographical objects with a few characteristic points. A map derived by line simplification is an abstraction of the original map, so its quality is lower than that of the original map. The quality of the simplified map can be determined from the shape dissimilarity measure proposed in this paper.
The proposed shape dissimilarity measure is an objective and automatic quantitative measure of the degree of dissimilarity between the original line and its simplified version generated by line simplification. By imitating human analysis of dissimilarity, it can potentially replace the traditional, subjective assessment of dissimilarity carried out by humans. The shape dissimilarity measure is also an analytical tool for determining whether a simplified line is appropriate for further applications to GIS problems based on a user-defined acceptable level.
Table 1
Average shape dissimilarity value $\text{Average\_}d_\theta$ (degrees) for individual simplified lines generated from different line simplification algorithms

Line simplification algorithm | Least-squares estimate of original coastline | Original coastline with measurement errors
Douglas–Peucker algorithm | 13.712 | 19.976
Perpendicular distance routine | 27.477 | 31.431
Nth point routine | 28.039 | 33.197
Existing error measures for line simplification mainly discuss the generalization error induced by line simplification. There are a few research works on assessing a combination of the generalization error and the propagation of positional error in the original line. These studies proposed more comprehensive dissimilarity measures based on individual size and distance measurements. The analytical model presented here proposes another comprehensive dissimilarity measure, based on shape measurement, to assess the quality of the simplified map by considering both error sources. Furthermore, the example given shows that the shape dissimilarity estimated under the assumption that there is no positional error in the original map is smaller than that estimated by the analytical model. The example also shows that the proposed measure can indicate different degrees of dissimilarity between the original line and its simplified version.
The proposed analytical model classifies the degree of dissimilarity between the original data and the simplified version into high dissimilarity, possible dissimilarity, and low dissimilarity according to a user-defined acceptable level, which can be varied for different spatial problems. These three possibilities can show how well the simplified data represent the original line and let the data user determine whether the original line should be simplified again with another line simplification algorithm or the user-defined acceptable level should be redefined.
The performance of line simplification algorithms has been evaluated for over two decades. However, positional errors in the original map have not been considered in these evaluations. In further studies, the analytical model presented in this paper will be implemented for a range of line simplification algorithms to assess their performance.
In conclusion, the proposed shape dissimilarity measure is not only applicable in assessing positional error for line simplification in cartography and GIS, but is also potentially applicable to object-based shape comparison or recognition (especially for a linear object) in other fields, such as computer vision or remote sensing. In computer vision, for example, a shape is compared with another shape that is found in an image database. If these two shapes are dissimilar, the vision system will report a mismatch and return a dissimilarity measure reflecting how poor that mismatch is. To provide a more comprehensive shape comparison, therefore, it is necessary to further develop this study to measure the shape dissimilarity between two polygonal objects.
Acknowledgements
The authors thank the Surveying and Mapping Office, the Government of Hong Kong Special
Administrative Region for providing the map. The work described in this paper was supported by the
Hong Kong Polytechnic University (Project no. G-YX27).
References
Alesheikh, A.A., 1998. Modeling and managing uncertainty in object-based geospatial information system. Ph.D. Dissertation, The University of Calgary, Alberta, Canada, 188pp.
Amato, N.M., Bayazit, O.B., Dale, L.K., Jones, C., Vallejo, D., 1998. Choosing good distance metrics and local planners for probabilistic roadmap methods. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). Leuven, Belgium, pp. 630–637.
Amrhein, C.G., Griffith, D.A., 1994. Errors in spatial databases: a summary of results from several research projects. In: Proceedings of the International Symposium on the Spatial Accuracy of National Data Bases, pp. 214–226.
Blakemore, M., 1984. Generalization and error in spatial databases. Cartographica 21 (2/3), 131–139.
Buttenfield, B.P., 1985. Treatment of the cartographic line. Cartographica 22 (2), 1–26.
Buttenfield, B.P., 1991. A rule for describing line feature geometry. In: McMaster, R.B., Buttenfield, B.P. (Eds.), Map Generalization: Making Rules for Knowledge Representation. Longman, Harlow, pp. 150–171.
Caspary, W., Scheuring, R., 1992. Error-bands as measures of geometrical accuracy. In: Proceedings of the Third European Conference on GIS. Munich, Germany, pp. 227–233.
Cheung, C.K., Shi, W.Z., 2004. Estimation of the positional uncertainty in line simplification in GIS. The Cartographic Journal 41 (1), 37–45.
Cheung, T.C.K., 2003. Assessing positional and modeling uncertainties in vector-based spatial processes and analyses in geographical information systems. Ph.D. Dissertation, The Hong Kong Polytechnic University, Hong Kong, 257pp.
ARTICLE IN PRESS
C.K. Cheung, W.Z. Shi / Computers & Geosciences 32 (2006) 462475 474
Chrisman, N.R., 1982. A theory of cartographic error and its measurement in digital data base. In: Proceedings of the Auto Carto 5. Crystal City, VA, pp. 159–168.
Dreyfus, H.L., Dreyfus, S.E., 1986. Mind Over Machine: the Power of Human Intuition and Expertise in the Era of the Computer. Free Press, New York, 231pp.
Dubuisson, M., Jain, A.K., 1994. A modified Hausdorff distance for object matching. In: Proceedings of the International Conference on Pattern Recognition. Jerusalem, Israel, pp. 566–568.
Dutton, G., 1992. Handling positional uncertainty in spatial databases. In: Proceedings of the Fifth International Symposium on Spatial Data Handling. South Carolina, USA, pp. 460–469.
Easa, S.M., 1995. Estimating line segment reliability using Monte Carlo simulation. Surveying and Land Information Systems 55 (3), 136–141.
Goodchild, M.F., Hunter, G.J., 1997. A simple positional accuracy measure for linear features. International Journal of Geographical Information Science 11, 299–306.
Guienko, G., Doytsher, Y., 2003. Geographic information system data for supporting feature extraction from high-resolution aerial and satellite images. Journal of Surveying Engineering 129 (4), 158–164.
Huttenlocher, D.P., Klanderman, G., Rucklidge, W., 1993. Comparing images using the Hausdorff distance. IEEE Transactions on Pattern Analysis and Machine Intelligence 15 (9), 850–863.
Jasinski, M.J., 1990. Comparison of complexity measures for cartographic lines. NCGIA Technical Report 90-1.
Jenks, G.F., 1985. Linear simplification: how far can we go? In: Proceedings of the 10th Annual Meeting. Canadian Cartographic Association, Fredericton, NB.
João, E.M., 1998. Causes and Consequences of Map Generalization. Taylor & Francis, London, 266pp.
Little, A.R., 1989. An evaluation of selected computer-assisted line simplification algorithms in the context of map accuracy standards. In: Proceedings of the 1989 ASPRS/ACSM Annual Conventions, American Society for Photogrammetry and Remote Sensing and American Congress on Surveying and Mapping, vol. 5, pp. 122–132.
McMaster, R.B., 1986. A statistical analysis of mathematical measures for linear simplification. The American Cartographer 13 (2), 103–117.
McMaster, R.B., 1987a. Automated line generalization. Cartographica 27, 74–111.
McMaster, R.B., 1987b. The geometric properties of numerical generalization. Geographical Analysis 19, 330–346.
Perkal, J., 1966. On the length of empirical curves: discussion paper 10. Michigan Inter-University Community of Mathematical Geographers, Ann Arbor.
Rote, G., 1991. Computing the minimum Hausdorff distance between two point sets on a line under translation. Information Processing Letters 38, 123–127.
Shahriari, N., Tao, V., 2002. Minimizing positional errors in line simplification using adaptive tolerance values. In: Proceedings of the Symposium on Geospatial Theory, Processing and Applications, Ottawa.
Shi, W.Z., 1994. Modeling Positional and Thematic Uncertainty in Integration of GIS and Remote Sensing. ITC Publication 22.
Singh, R., Papanikolopoulos, N.P., 2000. Planar shape recognition by shape morphing. Pattern Recognition 33 (10), 1683–1699.
Srisuk, S., Tamsri, M., Fooprateepsiri, R., Sookavatana, P., Sunat, K., 2003. A new shape matching measure for nonlinear distorted object recognition. In: Proceedings of the VIIth Digital Image Computing: Techniques and Applications. Sydney, Australia, pp. 339–348.
Stanfel, L.E., Stanfel, C., 1993. A model of the reliability of a line connecting uncertain points. Surveying and Land Information Systems 53 (1), 49–52.
Veregin, H., 2000. Quantifying positional error induced by line simplification. International Journal of Geographical Information Science 14, 113–130.
Veregin, H., Dai, X., 1999. Minimizing positional error induced by line simplification. In: Proceedings of the International Symposium on Spatial Data Quality. The Hong Kong Polytechnic University, Hong Kong.
White, E.R., 1985. Assessment of line-generalization algorithms using characteristic points. The American Cartographer 12, 17–27.