
Computers & Geosciences 32 (2006) 462–475

Positional error modeling for line simplification based on automatic shape similarity analysis in GIS
Chui Kwan Cheung*, Wenzhong Shi
Advanced Research Center for Spatial Information Technology, Department of Land Surveying and Geo-informatics,
The Hong Kong Polytechnic University, Hong Kong, China
Received 3 November 2004; received in revised form 17 June 2005; accepted 17 August 2005
Abstract
Automatic generalization is a process for representing geographical objects with different degrees of detail on a digital
map. The positional error for each geographical object is propagated through the process and a generalization error is also
introduced by the generalization. Previous research has focused mainly on measuring the generalization error. This paper
presents an analytical model for assessing the positional error in the generalized object by considering both error
propagation from the original data and the generalization error. The analytical model provides a shape dissimilarity value
that indicates the shape difference between the original data with a positional error and its simplified version. This model is
able to objectively and automatically determine the applicability of the generalized data for further applications to
geographical information system (GIS) problems. It can also deal with a large amount of data in GIS. Therefore, the
analytical model presented, which provides a more comprehensive shape measure for assessing positional error in data
derived from the generalization, is valuable in the development of automatic generalization.
© 2005 Elsevier Ltd. All rights reserved.
Keywords: GIS; Automatic generalization; Positional error; Generalization error; Shape dissimilarity measure
1. Introduction
To present geographical objects on either a paper map or a digital version, many cartographic decisions are made, manually or automatically. For example, important characteristics of geographical objects are preserved and unwanted details are eliminated. Cartographic decisions are taken to improve the quality of geographical objects on the paper or digital map at different scales and with different degrees of detail, to reduce data storage for the digital map, or both (João, 1998).
Cartographic decisions in manual generalization are performed by humans using a number of generalization processes: (a) selection of geographical objects and thematic attributes to be presented on the map; (b) simplification of the retained objects; (c) classification for grouping similar retained objects into one class; and (d) symbolization for displaying the retained objects on the map with visual clarity. As part of the development of digital techniques in cartography, these generalization processes can be carried out automatically with different algorithms: simplification algorithms for weeding out redundant points or lines based on some geometric criterion; smoothing algorithms for relocating points of the object to show more significant information; displacement algorithms for shifting two objects at a reduced scale to prevent overlap; and enhancement algorithms for regenerating details of an already simplified object. These algorithms mimic human analysis.

ARTICLE IN PRESS
www.elsevier.com/locate/cageo
0098-3004/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cageo.2005.08.002
* Corresponding author. Tel.: +85227664347; Fax: +85223302994.
E-mail address: lstckc@polyu.edu.hk (C.K. Cheung).
Representations of geographical objects after generalization are abstractions of the corresponding geographical objects in the real world, and these representations contain generalization errors. Line simplification is the most commonly applied generalization process in commercial geographical information system (GIS) software packages (Veregin and Dai, 1999). The positional accuracy of the digital map resulting from a line simplification process is affected by the generalization error associated with the process. The quality of the source map available for line simplification also greatly affects the positional accuracy of the resultant digital map.
Assessment of generalization errors associated with cartographic line simplification has been widely discussed in the literature (Buttenfield, 1985, 1991; Jenks, 1985; McMaster, 1986; Jasinski, 1990; Amrhein and Griffith, 1994; Veregin, 2000; Cheung and Shi, 2004). The generalization error can be measured based on linear attribute and displacement measurements (McMaster, 1986). In linear attribute measurements, a similarity (or dissimilarity) function is derived from a geometrical characteristic of each original object and of the corresponding object derived from the line simplification. The geometrical characteristic can be length, area, volume, density, moments, convexity, curvature, or bending energy (Amrhein and Griffith, 1994; Singh and Papanikolopoulos, 2000; Shahriari and Tao, 2002; Guienko and Doytsher, 2003). The first four characteristics are related to size, the last three pertain to shape, and moments can be relevant to size or shape, depending on their order.
Displacement measurements, the other approach to generalization error assessment, originate in the gap between the source object and its simplified version. Such a gap can be quantified in terms of a distortion polygon (White, 1985; McMaster, 1987a, b; Buttenfield, 1991), uniform-distance distortion (Veregin, 2000; Cheung and Shi, 2004), a displacement vector (McMaster, 1986), and critical distance (Little, 1989; Veregin, 2000). All of them are Euclidean metrics, although there are other distance metrics, such as the Minkowski and Manhattan metrics. Amato et al. (1998) have evaluated different distance metrics and have given recommendations for selecting appropriate combinations of them. In addition, the Hausdorff distance is a very generic technique used to define a distance between two non-empty sets. This metric has already been used in addressing the problem of shape matching (Rote, 1991; Huttenlocher et al., 1993). As Srisuk et al. (2003, p. 344) stated, the conventional Hausdorff distance is not robust to the presence of noise, and so there is a diversity of modified Hausdorff distances that cope with the presence of noise (Dubuisson and Jain, 1994).
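The (conventional) Hausdorff distance mentioned above can be sketched for finite point sets as follows. This is a minimal illustration, not one of the modified distances cited: `densify` is a hypothetical helper that samples each segment, and the coordinates are the least-squares vertices of polyline E and its simplified line E′ from the example in Section 3. Using vertices alone overstates the gap between the two lines; sampling the segments brings the value close to the 1-unit offset of the middle vertex from E′.

```python
import math


def directed_hausdorff(a, b):
    # Largest distance from any point of a to its nearest point in b.
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    # Symmetric Hausdorff distance between two finite, non-empty point sets.
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def densify(polyline, n=200):
    # Hypothetical helper: n evenly spaced samples per segment plus the last vertex.
    pts = []
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        pts.extend((x0 + (x1 - x0) * i / n, y0 + (y1 - y0) * i / n) for i in range(n))
    pts.append(polyline[-1])
    return pts


original = [(0.0, 0.0), (10.0, 1.0), (20.0, 0.0)]
simplified = [(0.0, 0.0), (20.0, 0.0)]

print(hausdorff(original, simplified))                    # vertices only: sqrt(101), about 10.05
print(hausdorff(densify(original), densify(simplified)))  # sampled lines: 1.0
```

The brute-force nearest-point search is quadratic in the number of points, which is adequate for a sketch but not for large GIS data sets.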
Positional error in the original map can be gauged by three main approaches: error-band models (Perkal,
1966; Chrisman, 1982; Blakemore, 1984; Goodchild and Hunter, 1997), error distribution models (Caspary
and Scheuring, 1992; Dutton, 1992; Shi, 1994), and quantity assessment models (Stanfel and Stanfel, 1993;
Easa, 1995). Error-band models provide a region that is deemed to contain the true location of a specific geographical object. Among error distribution models, some show the error distribution of the geographical object visually, whereas others estimate its probability distribution. Quantity assessment models supply
numerical measurements of the error for the geographical object. Alesheikh (1998) and Cheung (2003) have
provided more detailed discussions of these approaches.
Many research works on the quality assessment of line simplification in cartography and on error assessment in GIS have studied, respectively, the generalization error associated with simplification and the positional error in the original map to be simplified. A few research studies have integrated the generalization error with the error propagated from the original map. Amrhein and Griffith (1994) have proposed the total mean-square error of the simplified line based on an arc length measurement, which is a linear attribute measurement and pertains to size. Cheung and Shi (2004) have estimated the maximum uncertainty in the generalized line based on a displacement measurement. Both the total mean-square error and the maximum uncertainty of the simplified line are more comprehensive than the other existing error indicators.
In this paper, we present another comprehensive measure, a shape dissimilarity measure, that is based on a linear attribute measurement pertaining to shape. This shape dissimilarity measure quantifies the degree of shape difference between the original representation of the line to be simplified and its simplified version. It expresses shape change in terms of two variables, namely the angle of inclination of the original line and that of its simplified version, where the inclination angle is the anticlockwise angle between the x-axis and a given straight line. The shape dissimilarity measure can determine whether the original representation and the simplified version are dissimilar in shape: two shapes are dissimilar if their inclination angle difference is greater than a user-defined threshold (or maximum permissible error). This measure is further elaborated in Section 2.
2. A shape dissimilarity measure based on angle of inclination
Every map requires some degree of generalization and shows the real world it represents with different degrees of detail. The content of each map is a recognizable representation of the real world, so the original representation of each object and its simplified version should be similar to each other. Human beings have the ability to recognize the similarity between objects and to distinguish different degrees of similarity based on shape, size, and displacement in the generalization problem. However, a process that is direct for human beings is very complicated for a computer (Dreyfus and Dreyfus, 1986). The computer determines the degree of similarity from a function of a specific geometrical characteristic of, or displacement between, objects; this function is also called a dissimilarity measure. The dissimilarity measure varies inversely with the similarity.
In this paper, a measure of shape dissimilarity is considered. In pattern recognition, shape is independent of the position of objects, and certain spatial transformations do not alter the shape similarity (or dissimilarity) between two objects of the same feature type (point, line, or polygon). In general, a shape similarity or dissimilarity measure must be invariant under translation, scaling, and rotation transformations.
For line objects on a map, the characteristics involved are length and angle of inclination. The length of a line is the distance from one endpoint to the other measured along the line. It varies under a scaling transformation, so length (which pertains to size) is not used as a characteristic in the shape dissimilarity measure. The angle of inclination is defined as the angle between the x-axis and a tangent to a curved arc (or a straight segment). It is invariant under translation and uniform scaling; a rotation adds the same constant to the inclination angles of both the original and the simplified line, so the difference between their inclination angles, which is what the measure uses, is unchanged. Therefore, the angle of inclination is used for measuring the shape dissimilarity between line objects in this paper.
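The invariance argument can be checked numerically. The sketch below, with hypothetical helper names `inclination`, `ang_diff`, and `transform`, applies the same rotation, uniform scaling, and translation to a segment and to its simplified counterpart: each inclination angle shifts by the rotation angle, while their minimum angle difference stays the same.

```python
import math


def inclination(p, q):
    # Anticlockwise angle of segment p->q from the x-axis, in [0, 360) degrees.
    ang = math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 360.0
    return 0.0 if ang == 360.0 else ang  # guard against round-up to 360


def ang_diff(a, b):
    # Minimum difference between two directions, in [0, 180] degrees.
    d = abs(a - b)
    return d if d <= 180.0 else 360.0 - d


def transform(pt, angle_deg, scale, dx, dy):
    # Rotate about the origin, scale uniformly, then translate.
    t = math.radians(angle_deg)
    x, y = pt
    return (scale * (x * math.cos(t) - y * math.sin(t)) + dx,
            scale * (x * math.sin(t) + y * math.cos(t)) + dy)


seg_a = [(0.0, 0.0), (10.0, 1.0)]  # an original segment
seg_b = [(0.0, 0.0), (20.0, 0.0)]  # its simplified version

before = ang_diff(inclination(*seg_a), inclination(*seg_b))
moved_a = [transform(p, 30.0, 2.5, 4.0, -7.0) for p in seg_a]
moved_b = [transform(p, 30.0, 2.5, 4.0, -7.0) for p in seg_b]
after = ang_diff(inclination(*moved_a), inclination(*moved_b))

print(inclination(*seg_a), inclination(*moved_a))  # each angle shifts by 30 degrees
print(round(before, 3), round(after, 3))           # 5.711 5.711
```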
The inclination angle $\theta_A(x, y)$ of a line A gives the counterclockwise angle between the line (or the tangent to the line) and the x-axis at point $(x, y)$ (Fig. 1). The inclination angle of a polyline composed of straight segments is a piecewise constant function, which increases or decreases at the vertices and remains constant between two consecutive vertices. For a curved arc, it is a continuous function. Its domain is all real numbers, whereas its range is all non-negative angles less than 360°.
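For a polyline, the piecewise-constant inclination angle can be computed segment by segment. A minimal sketch, assuming vertices are given as (x, y) tuples in drawing order; `atan2` is used so that the quadrant follows from the signs of the coordinate differences, and both the function names and the vertex coordinates are hypothetical.

```python
import math


def inclination(p, q):
    # Anticlockwise angle between segment p->q and the x-axis, in [0, 360) degrees.
    # atan2 picks the quadrant from the signs of the coordinate differences.
    ang = math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 360.0
    return 0.0 if ang == 360.0 else ang  # guard against round-up to 360


def segment_angles(polyline):
    # One constant angle per segment: the piecewise-constant function of the text.
    return [inclination(p, q) for p, q in zip(polyline, polyline[1:])]


# Hypothetical vertices P0..P3 of a polyline similar to the one in Fig. 1.
poly = [(0.0, 0.0), (2.0, 2.0), (4.0, 2.0), (5.0, 0.0)]
print([round(a, 3) for a in segment_angles(poly)])  # [45.0, 0.0, 296.565]
```

The angle jumps only at the interior vertices, which is the "increases or decreases at the vertices" behaviour described above.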
2.1. A measure for one fixed line and its simplified line
The inclination angle of a fixed line A at point $(A_x, A_y)$ is denoted by $\theta_A(A_x, A_y)$. The inclination angle of another fixed line B at point $(B_x, B_y)$ is denoted by $\theta_B(B_x, B_y)$. The term fixed line refers to a line of known equation from which the coordinates of any point on the line can be derived exactly. Consider that line A is simplified into line B, which contains straight segments only. It is assumed that each straight segment of line B is a simplified version of some arcs of line A. This assumption is always valid when the simplified version is generated by an automatic line simplification algorithm. For example, one simplified version of line A in Fig. 1a is a polyline having two straight segments, $P_0P_2$ and $P_2P_3$. The first straight segment $P_0P_2$ of line B represents the first two straight segments $P_0P_1$ and $P_1P_2$ of line A, and the second straight segment $P_2P_3$ of line B represents the third straight segment $P_2P_3$ of line A. The inclination angle of each straight segment of line B should be compared with that of its original representation in line A.

Fig. 1. A polyline composed of straight segments with different orientations at its vertices, and a curved arc, along which orientation varies.
For inclination angle comparison, the inclination angle of line A is transformed as follows. We express line A in the form $y = f(x)$ and consider the transformation

$$
r = \frac{\int_{x_0}^{x_1}\sqrt{1+f'(x)^2}\,dx + \int_{x_1}^{x_2}\sqrt{1+f'(x)^2}\,dx + \cdots + \int_{x_i}^{x}\sqrt{1+f'(x)^2}\,dx}{\int_{x_0}^{x_1}\sqrt{1+f'(x)^2}\,dx + \int_{x_1}^{x_2}\sqrt{1+f'(x)^2}\,dx + \cdots + \int_{x_{n-1}}^{x_n}\sqrt{1+f'(x)^2}\,dx}, \tag{1}
$$
where $(x_0, f(x_0))$ is the starting endpoint of the line, $(x_n, f(x_n))$ is the ending endpoint of the line, whose explicit points (including endpoints and nodes) are $(x_0, f(x_0)), \ldots, (x_n, f(x_n))$, $(x_j, f(x_j))$ is the jth explicit point of the line, and $(x, f(x))$ is a point on the line that lies between $(x_i, f(x_i))$ and $(x_{i+1}, f(x_{i+1}))$. The denominator in this transformation is equal to the total length along the line (called arc length in mathematics), and the numerator is the arc length from the starting endpoint to the point $(x, f(x))$. By using the transformation, any point on the line can be expressed as a function of r: $x = g_1(r)$ and $y = g_2(r)$. Putting them into the inclination angle $\theta(x, y)$ of line A, we obtain the inclination angle in the form $\tilde{\theta}(r)$ (Fig. 2). The domain of this inclination angle function is all real numbers between 0 and 1, and the range is all angles between 0° and 360°.
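For a polyline, each integral in Eq. (1) reduces to a segment length, so r at a vertex is the cumulative length up to that vertex divided by the total length. The sketch below (hypothetical function names; vertices as (x, y) tuples) computes these cumulative ratios and evaluates $\tilde{\theta}(r)$ by locating the segment that contains r. It uses the least-squares vertices of polyline E from the example in Section 3.

```python
import bisect
import math


def cumulative_ratios(polyline):
    # Discrete Eq. (1): arc length from the start to each vertex over the total length.
    lengths = [math.dist(p, q) for p, q in zip(polyline, polyline[1:])]
    total = sum(lengths)
    ratios, run = [0.0], 0.0
    for seg in lengths:
        run += seg
        ratios.append(run / total)
    ratios[-1] = 1.0  # remove round-off at the end point
    return ratios


def theta_tilde(polyline, r):
    # Inclination angle as a function of r in [0, 1]: the angle of the segment containing r.
    ratios = cumulative_ratios(polyline)
    j = min(bisect.bisect_right(ratios, r) - 1, len(polyline) - 2)
    (x0, y0), (x1, y1) = polyline[j], polyline[j + 1]
    ang = math.degrees(math.atan2(y1 - y0, x1 - x0)) % 360.0
    return 0.0 if ang == 360.0 else ang


E = [(0.0, 0.0), (10.0, 1.0), (20.0, 0.0)]  # least-squares vertices of polyline E
print(cumulative_ratios(E))                                            # [0.0, 0.5, 1.0]
print(round(theta_tilde(E, 0.25), 3), round(theta_tilde(E, 0.75), 3))  # 5.711 354.289
```

A value of r that falls exactly on a vertex is assigned to the following segment, matching the "$0.5 \le r_p$" convention used in Section 3.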
To compare individual straight segments of line B with their corresponding original representations (each of which contains some arcs of line A), the inclination angle of line B is defined as follows: for $r \in [0, 1]$, $\tilde{\theta}_B(r)$ is the inclination angle of the jth straight segment of line B if $(A_x, A_y) = (g_1(r), g_2(r))$ is a point on the arc of A that is the original representation of the jth straight segment of line B.
An average shape dissimilarity measure is thus

$$
\mathrm{Average\_d}_{\theta}(A, B) = \int_{r=0}^{1} \mathrm{AngDiff}\big(\tilde{\theta}_A(r), \tilde{\theta}_B(r)\big)\,dr \tag{2}
$$
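where, for $\alpha, \beta \in [0^\circ, 360^\circ)$, $\mathrm{AngDiff}(\alpha, \beta)$ is defined as the minimum angle difference between α and β: when $|\alpha - \beta| \le 180^\circ$, $\mathrm{AngDiff}(\alpha, \beta) = |\alpha - \beta|$; when $|\alpha - \beta| > 180^\circ$, $\mathrm{AngDiff}(\alpha, \beta) = 360^\circ - |\alpha - \beta|$ (Fig. 3).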
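AngDiff translates directly into code. A minimal sketch, assuming both angles are already expressed in degrees in [0°, 360°); the function name is illustrative.

```python
def ang_diff(a, b):
    # AngDiff(a, b): minimum difference between two angles given in degrees in [0, 360).
    d = abs(a - b)
    return d if d <= 180.0 else 360.0 - d


print(ang_diff(10.0, 350.0))  # 20.0
print(ang_diff(90.0, 45.0))   # 45.0
print(ang_diff(0.0, 180.0))   # 180.0
```

The result is symmetric in its two arguments and never exceeds 180°, which is why 180° is the largest possible shape dissimilarity value.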
If both lines A and B are composed of straight segments, the average shape dissimilarity measure is a discrete function:

$$
\mathrm{Average\_d}_{\theta}(A, B) = \sum_{p} \mathrm{AngDiff}\big(\tilde{\theta}_A(r_p), \tilde{\theta}_B(r_p)\big)\,\Delta r_p \tag{3}
$$
where $\Delta r_p$ is the weight factor, defined as the difference between two ratios, $r_p - r_{p-1}$, and $r_p$ is the cumulative ratio by which one line is partitioned and which is also used to partition the other line. Fig. 4 shows four straight segments denoted by $P_1P_2$, $P_2P_3$, $P_3P_4$, and $P_5P_6$. If the vertices of the first three straight segments are mapped onto straight segment $P_5P_6$, there exist points $P'_1$, $P'_2$, $P'_3$, and $P'_4$ on $P_5P_6$ such that the following conditions hold: $P'_1 = P_5$, $P'_4 = P_6$,

$$
\frac{\mathrm{Len}(P'_1P'_2)}{\mathrm{Len}(P'_1P'_2) + \mathrm{Len}(P'_2P'_3) + \mathrm{Len}(P'_3P'_4)} = \frac{\mathrm{Len}(P_1P_2)}{\mathrm{Len}(P_1P_2) + \mathrm{Len}(P_2P_3) + \mathrm{Len}(P_3P_4)}
$$

and

$$
\frac{\mathrm{Len}(P'_1P'_2) + \mathrm{Len}(P'_2P'_3)}{\mathrm{Len}(P'_1P'_2) + \mathrm{Len}(P'_2P'_3) + \mathrm{Len}(P'_3P'_4)} = \frac{\mathrm{Len}(P_1P_2) + \mathrm{Len}(P_2P_3)}{\mathrm{Len}(P_1P_2) + \mathrm{Len}(P_2P_3) + \mathrm{Len}(P_3P_4)},
$$

where Len represents the length of a segment.

Fig. 2. Inclination angle for the polyline and curved arc of Fig. 1, where $s_i$ represents the length of the ith straight segment of the polyline, and $s_{0j}$ represents the arc length (the length of arc $P_0P_j$) along the curved arc.
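The mapping of Fig. 4 places each $P'_k$ so that it divides $P_5P_6$ in the same cumulative length ratio as the corresponding vertex divides the chain $P_1P_2P_3P_4$. A sketch under that reading, with a hypothetical function name and made-up coordinates (a 3-4-5 chain so that the ratios are easy to check by hand):

```python
import math


def map_onto_segment(vertices, start, end):
    # Place P'_k on start->end so that its cumulative length ratio equals that of
    # vertex P_k along the original chain (P'_1 = start, last P' = end).
    lengths = [math.dist(p, q) for p, q in zip(vertices, vertices[1:])]
    total = sum(lengths)
    mapped, run = [start], 0.0
    for seg in lengths[:-1]:
        run += seg
        t = run / total
        mapped.append((start[0] + t * (end[0] - start[0]),
                       start[1] + t * (end[1] - start[1])))
    mapped.append(end)
    return mapped


chain = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0), (8.0, 0.0)]  # hypothetical P1..P4
p5, p6 = (0.0, -2.0), (12.0, -2.0)                          # hypothetical P5, P6
print([tuple(round(c, 6) for c in p) for p in map_onto_segment(chain, p5, p6)])
# [(0.0, -2.0), (5.0, -2.0), (10.0, -2.0), (12.0, -2.0)]
```

The chain lengths are 5, 5, and 2 (total 12), so the interior points land at 5/12 and 10/12 of the way along $P_5P_6$.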
The average shape dissimilarity measure given in Eq. (2) provides an average of the shape dissimilarity values. The shape dissimilarity value at any point on the simplified line B satisfies the following condition:

$$
d_{\theta}(A, B) = \mathrm{AngDiff}\big(\tilde{\theta}_A(r), \tilde{\theta}_B(r)\big) \le \mathrm{Max\_d}_{\theta}(A, B) = \max_{r} \mathrm{AngDiff}\big(\tilde{\theta}_A(r), \tilde{\theta}_B(r)\big). \tag{4}
$$

The right-hand side of the inequality in Eq. (4), $\mathrm{Max\_d}_{\theta}(A, B)$, is an upper bound of the shape dissimilarity values between lines A and B.
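When the two lines are partitioned at different ratios, the breakpoints of both can be merged so that each subinterval carries one constant angle per line; Eq. (3) then sums AngDiff weighted by the subinterval width, and Eq. (4) takes the largest AngDiff. The sketch assumes each line is described by its cumulative ratios (starting at 0, ending at 1) and one angle per piece; it evaluates each merged piece at its midpoint to avoid ambiguity at the breakpoints. Function names and numbers are hypothetical.

```python
import bisect


def ang_diff(a, b):
    # Minimum angle difference in degrees, as defined for Eq. (2).
    d = abs(a - b)
    return d if d <= 180.0 else 360.0 - d


def piece(ratios, r):
    # Index of the piece [ratios[k], ratios[k+1]) that contains r.
    return min(bisect.bisect_right(ratios, r) - 1, len(ratios) - 2)


def average_and_max(ratios_a, angles_a, ratios_b, angles_b):
    # Eq. (3) (weighted sum) and Eq. (4) (upper bound) for piecewise-constant angles.
    cuts = sorted(set(ratios_a) | set(ratios_b))  # merged partition r_p
    average, upper = 0.0, 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        mid = 0.5 * (lo + hi)
        d = ang_diff(angles_a[piece(ratios_a, mid)], angles_b[piece(ratios_b, mid)])
        average += d * (hi - lo)  # AngDiff weighted by delta r_p
        upper = max(upper, d)
    return average, upper


# Hypothetical lines: A has pieces at 10 and 350 degrees, B at 0 and 20 degrees.
avg, mx = average_and_max([0.0, 0.3, 1.0], [10.0, 350.0], [0.0, 0.6, 1.0], [0.0, 20.0])
print(round(avg, 6), mx)  # 18.0 30.0
```

Here the merged partition is 0, 0.3, 0.6, 1, giving 10° × 0.3 + 10° × 0.3 + 30° × 0.4 = 18°, with an upper bound of 30°.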
2.2. A measure for one fixed line and one randomly perturbed line
According to the notion of a fixed line in Section 2.1, a fixed line is interpreted as a line without positional error or uncertainty (or the true line). A randomly perturbed line is a line with positional error or uncertainty, the equation of which has random variable(s) assumed to follow specific probability distribution(s).
When one line is fixed and the other line contains an error, calculating the average shape dissimilarity value by substituting the fixed line and the mean position of the randomly perturbed line (where the mean position can be the expectation or the least-squares estimate of the randomly perturbed line) for A and B in Eq. (2) or (3) yields an estimate of the average shape dissimilarity value, under the assumption that the mean position of the randomly perturbed line is accurate or equal to the statistical population mean. The average dissimilarity
value becomes very misleading when the assumption is not valid. Nevertheless, given a predefined acceptable level, we can conclude that the randomly perturbed line is similar to the fixed line in terms of shape if the maximum value of the average shape dissimilarity measure does not exceed the acceptable level.

Fig. 3. Minimum angle difference between α and β.

Fig. 4. Representation of one line mapped onto another line.
An analytical method is presented here to estimate the average shape dissimilarity measure under uncertainty. The analytical method has the following restriction: the original line is composed of straight segments, and its vertices are inside their statistical confidence regions or their surveying error ellipses. The confidence regions or error ellipses are created under the assumption that measurement errors at the vertices are bivariate normally distributed. An original curved line should first be split into straight segments; when the number of split straight segments is sufficiently large, these segments approximate the curved line.
In the analytical method, the original line (with measurement errors) is considered as a randomly perturbed line. The error ellipse for each vertex of the original line is a region that contains the least-squares estimate with a specific confidence if it is centered at the true position (which is always unknown), and is a confidence region if it is centered at the least-squares estimate of the position. Both the error ellipse and the confidence region designate precision regions of different probabilities. In the problem of the average shape dissimilarity measure under uncertainty in the following discussion, the center of each error ellipse is considered as the least-squares estimate of a vertex of the original line. If the true positions of all vertices of the original line are given, the average shape dissimilarity measure should be determined according to the approach given in Section 2.1. Moreover, the fixed line refers to the simplified line (the line composed of straight segments and derived from the simplification). The simplified line is a final representation of the original line. Therefore, the shape of the simplified line is directly compared with that of the original line with measurement errors. This shape comparison shows the shape difference between the final representation on the map and the exact representation in the world.
We now elaborate on the analytical method. With reference to Fig. 5, a randomly perturbed straight segment C represents the original line of two vertices $(cx_0, cy_0)$ and $(cx_1, cy_1)$, and the least-squares estimate of the original line is the solid line with two error ellipses at its endpoints. The error ellipses for the two vertices of the randomly perturbed straight segment C (having dashed lines as boundaries in the figure) are derived from their covariance matrices:
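$$
\begin{cases}
\dfrac{(cx_0 - \widehat{cx}_0)^2}{\sigma_{cx_0}^2} - \dfrac{2\rho_{cx_0,cy_0}(cx_0 - \widehat{cx}_0)(cy_0 - \widehat{cy}_0)}{\sigma_{cx_0}\sigma_{cy_0}} + \dfrac{(cy_0 - \widehat{cy}_0)^2}{\sigma_{cy_0}^2} \le \left(1 - \rho_{cx_0,cy_0}^2\right)\chi_p^2(1 - \alpha), \\[1ex]
\dfrac{(cx_1 - \widehat{cx}_1)^2}{\sigma_{cx_1}^2} - \dfrac{2\rho_{cx_1,cy_1}(cx_1 - \widehat{cx}_1)(cy_1 - \widehat{cy}_1)}{\sigma_{cx_1}\sigma_{cy_1}} + \dfrac{(cy_1 - \widehat{cy}_1)^2}{\sigma_{cy_1}^2} \le \left(1 - \rho_{cx_1,cy_1}^2\right)\chi_p^2(1 - \alpha),
\end{cases} \tag{5}
$$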
where the least-squares estimates of $(cx_0, cy_0)$ and $(cx_1, cy_1)$ are $(\widehat{cx}_0, \widehat{cy}_0)$ and $(\widehat{cx}_1, \widehat{cy}_1)$, respectively; the correlation coefficient of $cx_0$ and $cy_0$ is $\rho_{cx_0,cy_0}$; the correlation coefficient of $cx_1$ and $cy_1$ is $\rho_{cx_1,cy_1}$; their covariance matrices are

$$
\begin{pmatrix} \sigma_{cx_0}^2 & \rho_{cx_0,cy_0}\sigma_{cx_0}\sigma_{cy_0} \\ \rho_{cx_0,cy_0}\sigma_{cx_0}\sigma_{cy_0} & \sigma_{cy_0}^2 \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} \sigma_{cx_1}^2 & \rho_{cx_1,cy_1}\sigma_{cx_1}\sigma_{cy_1} \\ \rho_{cx_1,cy_1}\sigma_{cx_1}\sigma_{cy_1} & \sigma_{cy_1}^2 \end{pmatrix};
$$

and $\chi_p^2(1 - \alpha)$ is the upper $100(1 - \alpha)$th percentile of a $\chi^2$ distribution with p degrees of freedom, where $p = 2$. When $\alpha = 0.394$, Eq. (5) gives the standard error ellipse. The other solid straight segment in the figure is a fixed straight segment D of two vertices, $(dx_0, dy_0)$ and $(dx_1, dy_1)$, representing a simplified line.

Fig. 5. Randomly perturbed straight segment C that has two disjoint error ellipses around its vertices.
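For p = 2 the χ² upper-tail probability has the closed form $P(X > c) = e^{-c/2}$, so $\chi_2^2(1 - \alpha) = -2\ln(1 - \alpha)$; with α = 0.394 this is about 1, the standard error ellipse. The sketch below uses this to test whether a point satisfies one line of Eq. (5). The function names are hypothetical, and the covariance matrix is that of the first vertex of polyline E in Section 3.

```python
import math


def chi2_2dof_upper(tail):
    # Value c with P(X > c) = tail for a chi-square variable with 2 degrees of freedom,
    # using the closed form P(X > c) = exp(-c / 2).
    return -2.0 * math.log(tail)


def inside_error_ellipse(pt, center, cov, alpha):
    # One line of Eq. (5): quadratic form versus (1 - rho^2) * chi2_2(1 - alpha).
    (sxx, sxy), (_, syy) = cov
    sx, sy = math.sqrt(sxx), math.sqrt(syy)
    rho = sxy / (sx * sy)
    dx, dy = pt[0] - center[0], pt[1] - center[1]
    lhs = dx * dx / sxx - 2.0 * rho * dx * dy / (sx * sy) + dy * dy / syy
    return lhs <= (1.0 - rho * rho) * chi2_2dof_upper(1.0 - alpha)


cov0 = ((0.02, 0.002), (0.002, 0.03))  # first vertex of polyline E
print(round(chi2_2dof_upper(1.0 - 0.95), 3))                      # 5.991
print(inside_error_ellipse((0.1, 0.0), (0.0, 0.0), cov0, 0.394))  # True
print(inside_error_ellipse((0.2, 0.0), (0.0, 0.0), cov0, 0.394))  # False
```

For other values of p the closed form does not apply, and a general χ² quantile routine would be needed instead.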
The maximum shape dissimilarity measure in the case of one fixed straight segment and one randomly perturbed straight segment can be considered as a non-linear programming problem, that is, an objective function with non-linear constraints:

$$
\begin{aligned}
\mathrm{MaxAngDiff}\big(\tilde{\theta}_C, \tilde{\theta}_D\big) &= \max_{(cx_0, cy_0),\ (cx_1, cy_1)} d_{\theta}(C, D) \\
&\text{subject to } (cx_0, cy_0) \text{ and } (cx_1, cy_1) \text{ falling inside their corresponding error ellipses,}
\end{aligned} \tag{6}
$$

where $d_{\theta}(C, D) = \mathrm{AngDiff}\big(\tilde{\theta}_C, \tilde{\theta}_D\big)$, and $\tilde{\theta}_C$ and $\tilde{\theta}_D$ represent the inclination angles of straight segments C and D, respectively.
Instead of solving this difficult non-linear programming problem, we estimate the function $\mathrm{MaxAngDiff}\big(\tilde{\theta}_C, \tilde{\theta}_D\big)$ based on common tangents to the two error ellipses for $(cx_0, cy_0)$ and $(cx_1, cy_1)$.
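As a numerical cross-check of the tangent construction (a brute-force substitute, not the method of this paper), the direction range of segment C can be approximated by sampling the boundaries of both error ellipses and recording the extreme directions between them. The sketch assumes the ellipses are disjoint (so the directions span less than 180°), builds each boundary from a 2 × 2 Cholesky factor of the covariance matrix, and uses the first segment of polyline E from Section 3 with χ² = 1 (the standard error ellipse). The function names and the sample count are hypothetical.

```python
import math


def ellipse_boundary(center, cov, chi2, n=360):
    # Points x with (x - c)^T cov^{-1} (x - c) = chi2, from a 2x2 Cholesky factor of cov.
    (sxx, sxy), (_, syy) = cov
    a = math.sqrt(sxx)
    b = sxy / a
    d = math.sqrt(syy - b * b)
    k = math.sqrt(chi2)
    pts = []
    for i in range(n):
        t = 2.0 * math.pi * i / n
        u, v = k * math.cos(t), k * math.sin(t)
        pts.append((center[0] + a * u, center[1] + b * u + d * v))
    return pts


def direction_range(c0, cov0, c1, cov1, chi2, n=360):
    # Brute-force stand-in for the common-tangent construction (disjoint ellipses only):
    # extreme directions from boundary points of ellipse 0 to boundary points of ellipse 1.
    ref = math.degrees(math.atan2(c1[1] - c0[1], c1[0] - c0[0]))
    ring0 = ellipse_boundary(c0, cov0, chi2, n)
    ring1 = ellipse_boundary(c1, cov1, chi2, n)
    lo_off, hi_off = 180.0, -180.0
    for px, py in ring0:
        for qx, qy in ring1:
            ang = math.degrees(math.atan2(qy - py, qx - px))
            off = (ang - ref + 180.0) % 360.0 - 180.0  # offset from the centre direction
            lo_off, hi_off = min(lo_off, off), max(hi_off, off)
    return (ref + lo_off) % 360.0, (ref + hi_off) % 360.0


lo, hi = direction_range((0.0, 0.0), ((0.02, 0.002), (0.002, 0.03)),
                         (10.0, 1.0), ((0.03, 0.001), (0.001, 0.02)), chi2=1.0)
print(round(lo, 1), round(hi, 1))  # 3.9 7.5
```

The result should be close to the interval [3.915°, 7.507°] reported in Section 3; small differences can come from the sampling density and from the exact α used there.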
2.2.1. Case 1: the two error ellipses neither overlap nor touch
The four common tangents to the two error ellipses are represented by dotted lines in Fig. 5 and are called tangent1, tangent2, tangent3, and tangent4, where the arrow shows the direction of each tangent. No two of the common tangents may point in opposite directions, so the difference between the inclination angles of any two tangents should not be greater than 180°. In this example, the inclination angles of the first three tangents are in the first quadrant of a Cartesian coordinate system ($0^\circ < \tilde{\theta}_{\mathrm{Tangent}1}, \tilde{\theta}_{\mathrm{Tangent}2}, \tilde{\theta}_{\mathrm{Tangent}3} < 90^\circ$) and the inclination angle of tangent4 is in the fourth quadrant ($270^\circ < \tilde{\theta}_{\mathrm{Tangent}4} < 360^\circ$), while the inclination angle of the straight segment C obtained from the least-squares estimate is in the first quadrant and is smaller than the maximum inclination angle of tangent1, tangent2, and tangent3. Note that any line passing through the two error ellipses must have an inclination angle $\tilde{\theta}_C$ smaller than the maximum of the inclination angles of tangent1, tangent2, and tangent3, or greater than the inclination angle of tangent4: $\tilde{\theta}_C < \mathrm{Max}(\tilde{\theta}_{\mathrm{Tangent}1}, \tilde{\theta}_{\mathrm{Tangent}2}, \tilde{\theta}_{\mathrm{Tangent}3})$ or $\tilde{\theta}_C > \tilde{\theta}_{\mathrm{Tangent}4}$. This inclination angle range of the randomly perturbed straight segment C is then compared with the inclination angle $\tilde{\theta}_D$ of the fixed straight segment D, which lies in the first quadrant and is smaller than $\mathrm{Max}(\tilde{\theta}_{\mathrm{Tangent}1}, \tilde{\theta}_{\mathrm{Tangent}2}, \tilde{\theta}_{\mathrm{Tangent}3})$ (refer to Fig. 6). The maximum difference in inclination angle between the randomly perturbed straight segment C and the fixed straight segment D is the larger of the absolute differences between $\tilde{\theta}_D$ and $\mathrm{Max}(\tilde{\theta}_{\mathrm{Tangent}1}, \tilde{\theta}_{\mathrm{Tangent}2}, \tilde{\theta}_{\mathrm{Tangent}3})$ and between $\tilde{\theta}_D$ and $\tilde{\theta}_{\mathrm{Tangent}4}$:

$$
\mathrm{MaxAngDiff}\big(\tilde{\theta}_C, \tilde{\theta}_D\big) = \max\Big(\big|\tilde{\theta}_D - \mathrm{Max}(\tilde{\theta}_{\mathrm{Tangent}1}, \tilde{\theta}_{\mathrm{Tangent}2}, \tilde{\theta}_{\mathrm{Tangent}3})\big|,\ \big|360^\circ + \tilde{\theta}_D - \tilde{\theta}_{\mathrm{Tangent}4}\big|\Big). \tag{7}
$$
The determination of $\mathrm{MaxAngDiff}\big(\tilde{\theta}_C, \tilde{\theta}_D\big)$ is based on the angle interval $I_{\theta}$ containing $\tilde{\theta}_{\mathrm{Tangent}\,i}$ for all $i \in \{1, 2, 3, 4\}$, and on the comparison between this angle interval and $\tilde{\theta}_D$. The width of the angle interval should not exceed 180°, since, as noted above, the difference between the inclination angles of any two tangents is not greater than 180°. The lower and upper bounds of the angle interval equal the minimum and maximum of $\tilde{\theta}_{\mathrm{Tangent}\,i}$ for all $i \in \{1, 2, 3, 4\}$, respectively, if the difference between the minimum and maximum is not greater than 180° (refer to Fig. 7). Otherwise, the inclination angle interval is the union of two subintervals: the angles between 0° and the maximum of the tangent inclination angles not greater than 180°, $[0^\circ, \max_i\{\tilde{\theta}_{\mathrm{Tangent}\,i} \mid \tilde{\theta}_{\mathrm{Tangent}\,i} \le 180^\circ\}]$, and the angles between the minimum of the tangent inclination angles greater than 180° and 360°, $[\min_i\{\tilde{\theta}_{\mathrm{Tangent}\,i} \mid \tilde{\theta}_{\mathrm{Tangent}\,i} > 180^\circ\}, 360^\circ)$. Fig. 6 shows an example of this case. The algorithm for computing the function $\mathrm{MaxAngDiff}\big(\tilde{\theta}_C, \tilde{\theta}_D\big)$ is as follows:
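Step 1:
Compute $\mathrm{MinTangentAngle} = \min(\tilde{\theta}_{\mathrm{Tangent}1}, \tilde{\theta}_{\mathrm{Tangent}2}, \tilde{\theta}_{\mathrm{Tangent}3}, \tilde{\theta}_{\mathrm{Tangent}4})$ and
$\mathrm{MaxTangentAngle} = \max(\tilde{\theta}_{\mathrm{Tangent}1}, \tilde{\theta}_{\mathrm{Tangent}2}, \tilde{\theta}_{\mathrm{Tangent}3}, \tilde{\theta}_{\mathrm{Tangent}4})$.
Step 2:
If $\mathrm{MaxTangentAngle} - \mathrm{MinTangentAngle} \le 180^\circ$
then $I_{\theta} = [\mathrm{MinTangentAngle}, \mathrm{MaxTangentAngle}]$
else $\mathrm{MaxTangentAngle} = \max_i\{\tilde{\theta}_{\mathrm{Tangent}\,i} \mid \tilde{\theta}_{\mathrm{Tangent}\,i} \le 180^\circ\}$
  $\mathrm{MinTangentAngle} = \min_i\{\tilde{\theta}_{\mathrm{Tangent}\,i} \mid \tilde{\theta}_{\mathrm{Tangent}\,i} > 180^\circ\}$
  $I_{\theta} = [0^\circ, \mathrm{MaxTangentAngle}] \cup [\mathrm{MinTangentAngle}, 360^\circ)$.
Step 3:
If $\tilde{\theta}_D \in I_{\theta}$ or ($\tilde{\theta}_D + 180^\circ \notin I_{\theta}$ and $\tilde{\theta}_D - 180^\circ \notin I_{\theta}$)
then $\mathrm{Difference0} = \mathrm{AngDiff}(\tilde{\theta}_D, \mathrm{MinTangentAngle})$
  $\mathrm{Difference1} = \mathrm{AngDiff}(\tilde{\theta}_D, \mathrm{MaxTangentAngle})$
  if $\mathrm{Difference0} > 180^\circ$ then $\mathrm{Difference0} = 360^\circ - \mathrm{Difference0}$
  if $\mathrm{Difference1} > 180^\circ$ then $\mathrm{Difference1} = 360^\circ - \mathrm{Difference1}$
  $\mathrm{MaxAngDiff}\big(\tilde{\theta}_C, \tilde{\theta}_D\big) = \max(\mathrm{Difference0}, \mathrm{Difference1})$
else $\mathrm{MaxAngDiff}\big(\tilde{\theta}_C, \tilde{\theta}_D\big) = 180^\circ$.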
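Steps 1–3 can be transcribed into Python as follows. This is a sketch: intervals are represented as lists of closed (low, high) pairs, the two "Difference > 180°" corrections are omitted because AngDiff never exceeds 180°, and testing $(\tilde{\theta}_D + 180^\circ) \bmod 360^\circ$ covers both checks on $\tilde{\theta}_D \pm 180^\circ$ in Step 3. The function names and the tangent angles in the usage lines are hypothetical; the first call follows the layout of Fig. 6, for which the result agrees with Eq. (7).

```python
def ang_diff(a, b):
    # Minimum angle difference in degrees (never exceeds 180).
    d = abs(a - b)
    return d if d <= 180.0 else 360.0 - d


def in_interval(theta, pieces):
    # pieces: list of closed (low, high) angle ranges in degrees.
    return any(lo <= theta <= hi for lo, hi in pieces)


def max_ang_diff_case1(tangent_angles, theta_d):
    # Step 1: extreme tangent inclination angles.
    min_t, max_t = min(tangent_angles), max(tangent_angles)
    # Step 2: build I_theta, splitting at 0/360 when the spread exceeds 180 degrees.
    if max_t - min_t <= 180.0:
        pieces = [(min_t, max_t)]
    else:
        max_t = max(t for t in tangent_angles if t <= 180.0)
        min_t = min(t for t in tangent_angles if t > 180.0)
        pieces = [(0.0, max_t), (min_t, 360.0)]
    # Step 3: if the direction opposite to D is outside I_theta, the maximum is at an end point.
    opposite = (theta_d + 180.0) % 360.0
    if in_interval(theta_d, pieces) or not in_interval(opposite, pieces):
        return max(ang_diff(theta_d, min_t), ang_diff(theta_d, max_t))
    return 180.0


print(max_ang_diff_case1([20.0, 30.0, 40.0, 350.0], 10.0))  # 30.0 (layout of Fig. 6)
print(max_ang_diff_case1([40.0, 50.0, 60.0, 70.0], 235.0))  # 180.0
```

In the first call Eq. (7) gives max(|10° − 40°|, |360° + 10° − 350°|) = max(30°, 20°) = 30°, the same value.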
2.2.2. Case 2: the two error ellipses touch at one point
When the error ellipses for $(cx_0, cy_0)$ and $(cx_1, cy_1)$ touch at one point, there are three common tangents to the two error ellipses: tangent1, tangent2, and tangent3. With reference to Fig. 8, tangent1 and tangent2 are monodirectional and tangent3 is bidirectional. The inclination angles of the first two monodirectional tangents are denoted by $\tilde{\theta}_{\mathrm{Tangent}1}$ and $\tilde{\theta}_{\mathrm{Tangent}2}$. Line tangent3 has two inclination angles, denoted by $\tilde{\theta}_{\mathrm{Tangent}3}$ (where $0^\circ \le \tilde{\theta}_{\mathrm{Tangent}3} < 180^\circ$) and $\tilde{\theta}_{\mathrm{Tangent}4} = 180^\circ + \tilde{\theta}_{\mathrm{Tangent}3}$. In a manner similar to the above case, the inclination angle interval $I_{\theta}$ containing $\tilde{\theta}_{\mathrm{Tangent}\,i}$ for all $i \in \{1, 2, 3, 4\}$ is estimated; however, the width of the inclination angle interval in this case is 180° because tangent3 is bidirectional. The inclination angle interval is either $[\tilde{\theta}_{\mathrm{Tangent}3}, \tilde{\theta}_{\mathrm{Tangent}4}]$ or $[0^\circ, \tilde{\theta}_{\mathrm{Tangent}3}] \cup [\tilde{\theta}_{\mathrm{Tangent}4}, 360^\circ)$, whichever contains $\tilde{\theta}_{\mathrm{Tangent}1}$ and $\tilde{\theta}_{\mathrm{Tangent}2}$:
Step 1:
If $\tilde{\theta}_{\mathrm{Tangent}1}, \tilde{\theta}_{\mathrm{Tangent}2} \in [\tilde{\theta}_{\mathrm{Tangent}3}, \tilde{\theta}_{\mathrm{Tangent}4}]$
then $I_{\theta} = [\tilde{\theta}_{\mathrm{Tangent}3}, \tilde{\theta}_{\mathrm{Tangent}4}]$
else $I_{\theta} = [0^\circ, \tilde{\theta}_{\mathrm{Tangent}3}] \cup [\tilde{\theta}_{\mathrm{Tangent}4}, 360^\circ)$.
In the final step, the inclination angle interval is compared with $\tilde{\theta}_D$ to find $\mathrm{MaxAngDiff}\big(\tilde{\theta}_C, \tilde{\theta}_D\big)$ (as shown in Step 3 of the previous case).

Fig. 6. Inclination angles of four tangents for C and D.

Fig. 7. Inclination angle interval $I_{\theta}$ for four tangents in case the difference among inclination angles of tangents is not greater than 180°.
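Case 2 can be sketched in the same way (hypothetical names and angles). Because the interval is a half-circle bounded by $\tilde{\theta}_{\mathrm{Tangent}3}$ and $\tilde{\theta}_{\mathrm{Tangent}4}$, these two angles serve as the end points in the final comparison.

```python
def ang_diff(a, b):
    # Minimum angle difference in degrees (never exceeds 180).
    d = abs(a - b)
    return d if d <= 180.0 else 360.0 - d


def in_interval(theta, pieces):
    # pieces: list of closed (low, high) angle ranges in degrees.
    return any(lo <= theta <= hi for lo, hi in pieces)


def max_ang_diff_case2(t1, t2, t3, theta_d):
    # tangent3 is bidirectional: angles t3 (0 <= t3 < 180) and t4 = t3 + 180.
    t4 = t3 + 180.0
    # Step 1: choose the half-circle that contains both monodirectional tangents.
    if t3 <= t1 <= t4 and t3 <= t2 <= t4:
        pieces = [(t3, t4)]
    else:
        pieces = [(0.0, t3), (t4, 360.0)]
    # Final step (as Step 3 of Case 1), with t3 and t4 as the end points.
    opposite = (theta_d + 180.0) % 360.0
    if in_interval(theta_d, pieces) or not in_interval(opposite, pieces):
        return max(ang_diff(theta_d, t3), ang_diff(theta_d, t4))
    return 180.0


print(max_ang_diff_case2(50.0, 100.0, 30.0, 60.0))  # 150.0
print(max_ang_diff_case2(20.0, 350.0, 30.0, 15.0))  # 165.0
print(max_ang_diff_case2(50.0, 100.0, 30.0, 15.0))  # 180.0
```

Since every direction or its opposite lies in a closed half-circle, the result is 180° whenever $\tilde{\theta}_D$ itself falls outside $I_{\theta}$.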
2.2.3. Case 3: the two error ellipses overlap
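For the case in which the two error ellipses for $(cx_0, cy_0)$ and $(cx_1, cy_1)$ overlap, the function $\mathrm{MaxAngDiff}\big(\tilde{\theta}_C, \tilde{\theta}_D\big)$ is 180°: both endpoints of C can then lie in the common region of the two ellipses, so C can take any direction, including the direction opposite to D.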
Replacing the function $\mathrm{AngDiff}\big(\tilde{\theta}_A, \tilde{\theta}_B\big)$ of Eqs. (2)–(4) with the function $\mathrm{MaxAngDiff}\big(\tilde{\theta}_A, \tilde{\theta}_B\big)$ gives the shape dissimilarity measures under uncertainty, including the average of the maximum shape dissimilarity measure in the continuous case in Eq. (8), the average of the maximum shape dissimilarity measure in the discrete case in Eq. (9), and the upper bound of the maximum shape dissimilarity measure in Eq. (10):

$$
\mathrm{Average\_d}_{\theta}(A, B) = \int_{r=0}^{1} \mathrm{MaxAngDiff}\big(\tilde{\theta}_A(r), \tilde{\theta}_B(r)\big)\,dr, \tag{8}
$$

$$
\mathrm{Average\_d}_{\theta}(A, B) = \sum_{p} \mathrm{MaxAngDiff}\big(\tilde{\theta}_A(r_p), \tilde{\theta}_B(r_p)\big)\,\Delta r_p, \tag{9}
$$

$$
d_{\theta}(A, B) \le \mathrm{Max\_d}_{\theta}(A, B) = \max_{r} \mathrm{MaxAngDiff}\big(\tilde{\theta}_A(r), \tilde{\theta}_B(r)\big). \tag{10}
$$
3. A simple example
The above section gives the analytical model used to assess shape dissimilarity between a line and its simplified version, where the degree of shape dissimilarity relates to the generalization error associated with line simplification and to the propagation of the positional error from the original line. This section illustrates the computation of the average shape dissimilarity values for the least-squares estimate of the original line and for the original line with positional error.
Fig. 8. Randomly perturbed straight segment C that has two vertices with error ellipses touching each other.
Fig. 9 shows a non-curved polyline E with three vertices, $(ex_0, ey_0)$, $(ex_1, ey_1)$, and $(ex_2, ey_2)$, whose least-squares estimates are (0, 0), (10, 1), and (20, 0) (in mm on a 1:500 scale map, for example), with covariance matrices

$$
\begin{pmatrix} 0.02 & 0.002 \\ 0.002 & 0.03 \end{pmatrix}, \quad
\begin{pmatrix} 0.03 & 0.001 \\ 0.001 & 0.02 \end{pmatrix}, \quad \text{and} \quad
\begin{pmatrix} 0.01 & 0.000 \\ 0.000 & 0.01 \end{pmatrix},
$$

where the data are artificial. Its simplified line $E'$ has vertices $(ex'_0, ey'_0) = (0, 0)$ and $(ex'_1, ey'_1) = (20, 0)$.
To compute the average shape dissimilarity values, the weight factor $\Delta r_p$ in Eqs. (3) and (9) is determined first. In this example, the cumulative ratios $r_0$, $r_1$, and $r_2$ partitioning E are 0, 0.5, and 1 (the two segments of E have equal length, $\sqrt{101}$), and the same ratios are used to partition $E'$, so $\Delta r_1 = \Delta r_2 = 0.5$ in Eqs. (3) and (9).
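The next step is to compute the functions $\mathrm{AngDiff}\big(\tilde{\theta}_E(r_p), \tilde{\theta}_{E'}(r_p)\big)$ and $\mathrm{MaxAngDiff}\big(\tilde{\theta}_E(r_p), \tilde{\theta}_{E'}(r_p)\big)$. The inclination angle θ of a straight segment from point $(x_0, y_0)$ to point $(x_1, y_1)$ is given by

$$
\tan\theta = \frac{y_1 - y_0}{x_1 - x_0}, \tag{11}
$$

where $0^\circ \le \theta < 360^\circ$ and the quadrant of θ is fixed by the signs of $x_1 - x_0$ and $y_1 - y_0$. As $E'$ is supposed to be fixed in the analytical model, the function $\tilde{\theta}_{E'}(r_p)$ is 0° for all $r_p$. For the original polyline E, the inclination angle of its least-squares estimate is

$$
\tilde{\theta}_E(r_p) = \begin{cases} 5.711^\circ & \text{for } 0 \le r_p < 0.5, \\ 354.289^\circ & \text{for } 0.5 \le r_p < 1. \end{cases}
$$

Then the function $\mathrm{AngDiff}\big(\tilde{\theta}_E, \tilde{\theta}_{E'}\big)$ is

$$
\mathrm{AngDiff}\big(\tilde{\theta}_E, \tilde{\theta}_{E'}\big) = \begin{cases} |5.711^\circ - 0^\circ| & \text{for } 0 \le r_p < 0.5, \\ 360^\circ - |354.289^\circ - 0^\circ| & \text{for } 0.5 \le r_p < 1 \end{cases} = 5.711^\circ.
$$

Therefore, both the average and the upper bound of the shape dissimilarity values at all points on $E'$, under the assumption that the least-squares estimate is free from error, are 5.711°.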
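The certain-case computation can be reproduced from the vertex coordinates. A minimal sketch with hypothetical helper names; it resolves Eq. (11) with `atan2` and uses $\Delta r_1 = \Delta r_2 = 0.5$, which follows from the cumulative ratios 0, 0.5, and 1 stated above.

```python
import math


def inclination(p, q):
    # Eq. (11) with the quadrant resolved by atan2, in [0, 360) degrees.
    ang = math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 360.0
    return 0.0 if ang == 360.0 else ang


def ang_diff(a, b):
    # Minimum angle difference in degrees.
    d = abs(a - b)
    return d if d <= 180.0 else 360.0 - d


E = [(0.0, 0.0), (10.0, 1.0), (20.0, 0.0)]  # least-squares estimate of E
E_prime = [(0.0, 0.0), (20.0, 0.0)]          # simplified line E'
weights = [0.5, 0.5]                          # delta r_1, delta r_2

theta_e_prime = inclination(*E_prime)         # 0 degrees for all r_p
diffs = [ang_diff(inclination(p, q), theta_e_prime) for p, q in zip(E, E[1:])]
average = sum(d * w for d, w in zip(diffs, weights))  # Eq. (3)
print([round(d, 3) for d in diffs], round(average, 3), round(max(diffs), 3))
# [5.711, 5.711] 5.711 5.711
```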
The function $\mathrm{MaxAngDiff}(\tilde{\theta}_E, \tilde{\theta}_{E'})$ is derived from the inclination angle interval of $E$ and the inclination angle of $E'$. The inclination angle interval is estimated from the error ellipse around each vertex of the line. According to Section 2.1, the inclination angle $\tilde{\theta}_E(r_p)$ falls inside the inclination angle interval $I_\theta$:

$$
I_\theta =
\begin{cases}
[3.915^\circ, 7.507^\circ] & \text{for } 0 \le r_p < 0.5,\\
[352.909^\circ, 355.668^\circ] & \text{for } 0.5 \le r_p < 1.
\end{cases}
$$
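Section 2.1, which is not reproduced here, derives $I_\theta$ from the error ellipses of the vertices. As an illustrative building block only, the following sketch computes the semi-axes and orientation of a one-standard-deviation error ellipse from a $2 \times 2$ covariance matrix through its eigenvalues. The one-sigma scale and the helper name `error_ellipse` are assumptions. The sketch does not reproduce the interval construction itself.

```python
import math


def error_ellipse(cov):
    """Semi-axes and orientation of the one-sigma error ellipse of a 2x2 covariance.

    Returns (major, minor, orientation_deg). The semi-axes are the square roots
    of the eigenvalues; the orientation is the angle of the major axis,
    measured counterclockwise from the x-axis.
    """
    (sxx, sxy), (_, syy) = cov
    mean = (sxx + syy) / 2.0
    radius = math.hypot((sxx - syy) / 2.0, sxy)
    major = math.sqrt(mean + radius)
    minor = math.sqrt(max(mean - radius, 0.0))  # clamp tiny negative rounding
    orientation = 0.5 * math.degrees(math.atan2(2.0 * sxy, sxx - syy))
    return major, minor, orientation


# Covariance matrices of the three vertices of E
covariances = [
    [[0.02, 0.002], [0.002, 0.03]],
    [[0.03, 0.001], [0.001, 0.02]],
    [[0.01, 0.000], [0.000, 0.01]],
]
for cov in covariances:
    major, minor, orientation = error_ellipse(cov)
    print(f"major={major:.3f}, minor={minor:.3f}, orientation={orientation:.1f} deg")
```

The third matrix is isotropic, so its ellipse is a circle of radius 0.1.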
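The maximum angle difference between $I_\theta(r_p)$ and $\tilde{\theta}_{E'}(r_p)$ is

$$
\mathrm{MaxAngDiff}\big(\tilde{\theta}_E(r_p), \tilde{\theta}_{E'}(r_p)\big) =
\begin{cases}
|7.507^\circ - 0^\circ| & \text{for } 0 \le r_p < 0.5,\\
|360^\circ - 352.909^\circ| & \text{for } 0.5 \le r_p < 1
\end{cases}
=
\begin{cases}
7.507^\circ & \text{for } 0 \le r_p < 0.5,\\
7.091^\circ & \text{for } 0.5 \le r_p < 1.
\end{cases}
$$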
According to Eqs. (9) and (10), the average shape dissimilarity value for the original line with positional error is $7.507^\circ \times (0.5 - 0) + 7.091^\circ \times (1 - 0.5) = 7.299^\circ$, and the shape dissimilarity value at any point on $E'$ is not greater than $7.507^\circ$ (the upper bound of the shape dissimilarity values at all points on $E'$). This average measure, which also accounts for error propagation from the original line, is greater than the average shape dissimilarity value for the least-squares estimate of the original line ($5.711^\circ$).

[Figure: polyline $E$ through $(0, 0)$, $(10, 1)$, and $(20, 0)$, and its simplified version $E'$ from $(0, 0)$ to $(20, 0)$.]
Fig. 9. An original polyline $E$ of three vertices and its simplified version $E'$ (not to scale).
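The uncertain-case figures can also be checked numerically. This sketch takes the intervals $I_\theta$ as given and does not recompute them from the ellipses. For each piece, it finds the largest angular difference between the interval and the fixed angle of $E'$, then weights the pieces by $r_{p+1} - r_p$. The helpers are hypothetical. They assume intervals that do not wrap past $0^\circ$/$360^\circ$, which holds for both pieces here.

```python
def ang_diff(a, b):
    """Smallest difference between two directions, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def max_ang_diff(interval, fixed):
    """Largest angular difference between any angle in [lo, hi] and `fixed`.

    Assumes 0 <= lo <= hi < 360, i.e. the interval does not wrap past 0/360.
    """
    lo, hi = interval
    antipode = (fixed + 180.0) % 360.0
    if lo <= antipode <= hi:
        return 180.0  # the opposite direction lies inside the interval
    # Away from the antipode the difference has no interior maximum,
    # so the largest value is reached at one of the endpoints.
    return max(ang_diff(lo, fixed), ang_diff(hi, fixed))


intervals = [(3.915, 7.507), (352.909, 355.668)]  # I_theta for the two pieces of E
r = [0.0, 0.5, 1.0]                               # cumulative ratios r_p
theta_e_prime = 0.0                               # fixed inclination of E'

max_diffs = [max_ang_diff(iv, theta_e_prime) for iv in intervals]
average = sum(d * (r1 - r0) for d, r0, r1 in zip(max_diffs, r, r[1:]))
print(round(average, 3), round(max(max_diffs), 3))  # 7.299 7.507
```

If an interval contained the direction opposite to $E'$, the maximum difference would be $180^\circ$, which the `antipode` check handles.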
4. A real case study on comparing the measures
This section numerically compares the average shape dissimilarity measure in the certain and uncertain cases for different simplified versions of a coastline. The simplified lines are generated by three automatic line-simplification algorithms: the Douglas–Peucker algorithm, the perpendicular distance routine, and the Nth point routine. Fig. 10 shows the 500-point simplified lines, which are approximately ordered from higher similarity (Fig. 10b) to lower similarity (Fig. 10d) relative to the original coastline (Fig. 10a). The average shape dissimilarity value for each simplified line is given in Table 1, which ranks the three algorithms in the same order as Fig. 10. In addition, for every algorithm, the average shape dissimilarity value for the original coastline with uncertainty is greater than that for the least-squares estimate of the original coastline. Therefore, the uncertainty in the original coastline needs to be considered in the simplification problem, especially when a predefined acceptable level is given to ensure the accuracy of the simplified line.
5. Discussion of the shape dissimilarity measures
The average shape dissimilarity value gives the average inclination angle difference between the original line and its simplified version, and so describes the overall shape difference between the two lines. This value has been computed for two cases in the previous examples. The first is the uncertain case, in which both the generalization error associated with line simplification and the propagation of positional error from the original line are considered. In this case, the original line and its simplified version have dissimilar shapes if the average value is greater than a predefined acceptable level. In the certain case, in which the positional error of the original line is ignored, the average shape dissimilarity value equals the inclination angle difference between the simplified line and the least-squares estimate of the original line. If the average in this case is also greater than the predefined acceptable level, the same conclusion is reached as in the uncertain case.
Fig. 10. (a) Original representation of a coastline, and a set of 500-point simplified lines of the coastline generated by (b) the Douglas–Peucker algorithm, (c) the perpendicular distance routine, and (d) the Nth point routine.
However, the average shape dissimilarity values in the uncertain and certain cases do not always lead to the same conclusion. If the predefined acceptable level lies between the average values in the certain and uncertain cases, the uncertain case indicates that the original line and its simplified version have dissimilar shapes, whereas the certain case indicates that they have a similar shape. In such a situation, the simplified line may be either dissimilar or similar to the original line.
The average shape dissimilarity measure therefore distinguishes three degrees of shape dissimilarity between the original line and its simplified version:
(a) high dissimilarity if the average shape dissimilarity values in both the uncertain and certain cases are greater than the predefined acceptable level;
(b) possible dissimilarity if the average shape dissimilarity value in the uncertain case is greater than the predefined acceptable level, but the value in the certain case is smaller; and
(c) low dissimilarity if the average shape dissimilarity values in both cases are smaller than the predefined acceptable level.
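The three-way classification can be written as a small decision function. In this sketch the acceptable level of $15^\circ$ is purely hypothetical. The input values come from Table 1. A value equal to the level is treated as not exceeding it, because the text leaves equality open. The function name `classify` is our own. In the model, the uncertain-case value is never smaller than the certain-case value, so the opposite ordering is rejected as invalid input.

```python
def classify(certain, uncertain, acceptable_level):
    """Three-way classification of average shape dissimilarity (degrees).

    certain   -- average for the least-squares estimate of the original line
    uncertain -- average for the original line with positional error
    A value equal to the acceptable level is treated as not exceeding it.
    """
    if uncertain < certain:
        raise ValueError("uncertain-case value should not be below the certain-case value")
    if certain > acceptable_level:
        return "high dissimilarity"
    if uncertain > acceptable_level:
        return "possible dissimilarity"
    return "low dissimilarity"


# (certain, uncertain) averages from Table 1
table_1 = {
    "Douglas-Peucker algorithm": (13.712, 19.976),
    "Perpendicular distance routine": (27.477, 31.431),
    "Nth point routine": (28.039, 33.197),
}
ACCEPTABLE_LEVEL = 15.0  # hypothetical user-defined level, in degrees
for name, (certain, uncertain) in table_1.items():
    print(f"{name}: {classify(certain, uncertain, ACCEPTABLE_LEVEL)}")
```

With this level, the Douglas–Peucker result falls into the possible-dissimilarity case and the other two into high dissimilarity. Raising the level to $20^\circ$ would move the Douglas–Peucker result to low dissimilarity.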
The average shape dissimilarity measure can be used to control a line simplification process. A higher degree of dissimilarity implies a greater positional error in the simplified line, resulting from the propagation of the positional error in the original data and from the generalization error induced by line simplification. In the case of high dissimilarity, the original data have probably been oversimplified. The GIS user may then simplify the original line again with different parameters in the line simplification algorithm, or redefine the acceptable level, so that the shape dissimilarity values in the uncertain and certain cases fall below the acceptable level. In the case of possible dissimilarity, the GIS user should decide, by visual comparison, whether the simplified data are suitable for the GIS application, or consider redefining the acceptable level. The ideal case is low dissimilarity, which implies that the simplified version represents the original data well with fewer characteristic points.
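One way to automate the re-simplification suggested above is a loop that tightens the tolerance of the Douglas–Peucker algorithm until the certain-case average falls to the acceptable level. This is a sketch under several assumptions of ours, not the authors' implementation:

* Only the certain case is evaluated, with no error propagation.
* The two lines are compared at equal normalized arc length.
* Both lines have at least two distinct vertices.
* The tolerance is halved each round.

All helper names are hypothetical.

```python
import bisect
import math


def inclination_deg(p, q):
    """Inclination of segment p -> q in degrees, in [0, 360)."""
    theta = math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 360.0
    return 0.0 if theta == 360.0 else theta


def ang_diff(a, b):
    """Smallest difference between two directions, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def pieces(line):
    """Segment end positions (normalized arc length) and segment inclinations."""
    lengths = [math.dist(p, q) for p, q in zip(line, line[1:])]
    total = sum(lengths)
    ends, running = [], 0.0
    for length in lengths:
        running += length
        ends.append(running / total)
    ends[-1] = 1.0  # guard against rounding drift at the last vertex
    return ends, [inclination_deg(p, q) for p, q in zip(line, line[1:])]


def average_dissimilarity(original, simplified):
    """Certain-case average angle difference, matching the lines by arc length."""
    ends_a, ang_a = pieces(original)
    ends_b, ang_b = pieces(simplified)
    breaks = sorted({0.0, *ends_a, *ends_b})
    total = 0.0
    for lo, hi in zip(breaks, breaks[1:]):
        mid = (lo + hi) / 2.0
        i = min(bisect.bisect_right(ends_a, mid), len(ang_a) - 1)
        j = min(bisect.bisect_right(ends_b, mid), len(ang_b) - 1)
        total += ang_diff(ang_a[i], ang_b[j]) * (hi - lo)
    return total


def perpendicular_distance(p, a, b):
    """Distance from point p to the straight line through a and b."""
    if a == b:
        return math.dist(p, a)
    (x, y), (x1, y1), (x2, y2) = p, a, b
    return abs((x2 - x1) * (y1 - y) - (x1 - x) * (y2 - y1)) / math.dist(a, b)


def douglas_peucker(line, tolerance):
    """Classic recursive Douglas-Peucker simplification."""
    if len(line) < 3:
        return list(line)
    dists = [perpendicular_distance(p, line[0], line[-1]) for p in line[1:-1]]
    k = max(range(len(dists)), key=dists.__getitem__)
    if dists[k] <= tolerance:
        return [line[0], line[-1]]
    split = k + 1
    left = douglas_peucker(line[: split + 1], tolerance)
    return left[:-1] + douglas_peucker(line[split:], tolerance)


def simplify_until_acceptable(line, acceptable_level, tolerance, shrink=0.5, max_rounds=50):
    """Shrink the tolerance until the average dissimilarity is acceptable."""
    for _ in range(max_rounds):
        simplified = douglas_peucker(line, tolerance)
        d = average_dissimilarity(line, simplified)
        if d <= acceptable_level or len(simplified) == len(line):
            return simplified, tolerance, d
        tolerance *= shrink
    return simplified, tolerance / shrink, d


E = [(0.0, 0.0), (10.0, 1.0), (20.0, 0.0)]
print(simplify_until_acceptable(E, acceptable_level=5.0, tolerance=2.0))
# ([(0.0, 0.0), (10.0, 1.0), (20.0, 0.0)], 0.5, 0.0)
```

This example uses the polyline of Fig. 9 and a hypothetical level of $5^\circ$. At tolerances of 2 and 1, the loop rejects the two-point line ($5.711^\circ$). It stops at 0.5, where the middle vertex is kept. The `max_rounds` cap guards against inputs, such as back-tracking collinear vertices, for which shrinking the tolerance never helps.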
6. Conclusion
This paper presents an analytical model for assessing the quality of a map derived by line simplification. The purpose of line simplification is to represent geographical objects with a few characteristic points. A map derived by line simplification is an abstraction of the original map, so its quality is lower than that of the original map. The quality of the simplified map can be determined from the shape dissimilarity measure proposed in this paper.
The proposed shape dissimilarity measure is an objective, automatic, and quantitative measure of the degree of dissimilarity between the original line and the simplified version generated by line simplification. By imitating human analysis of dissimilarity, it can potentially replace the traditional, subjective assessment of dissimilarity carried out by humans. The shape dissimilarity measure is also an analytical tool for determining, on the basis of a user-defined acceptable level, whether a simplified line is appropriate for further applications to GIS problems.
Table 1
Average shape dissimilarity value for individual simplified lines generated by different line simplification algorithms

                                  Average d_θ (degrees)
Line simplification algorithm     Least-squares estimate     Original coastline with
                                  of original coastline      measurement errors
Douglas–Peucker algorithm         13.712                     19.976
Perpendicular distance routine    27.477                     31.431
Nth point routine                 28.039                     33.197
Existing error measures for line simplification mainly address the generalization error induced by line simplification. A few studies have assessed the combination of the generalization error and the propagation of positional error in the original line; they proposed more comprehensive dissimilarity measures based on individual size and distance measurements. The analytical model presented here proposes another comprehensive dissimilarity measure, based on shape measurement, to assess the quality of the simplified map by considering both error sources. Furthermore, the example shows that the shape dissimilarity estimated under the assumption that the original map contains no positional error is smaller than that given by the analytical model. The example also shows that the proposed measure can indicate different degrees of dissimilarity between the original line and its simplified version.
The proposed analytical model classifies the degree of dissimilarity between the original data and the simplified version into high dissimilarity, possible dissimilarity, and low dissimilarity according to a user-defined acceptable level, which can be varied for different spatial problems. These three categories show how well the simplified data represent the original line and let the data user decide whether the original line should be simplified again with another line simplification algorithm or the user-defined acceptable level should be redefined.
The performance of line simplification algorithms has been evaluated for over two decades. However, positional errors in the original map have not been considered in these evaluations. In further studies, the analytical model presented in this paper will be implemented for a range of line simplification algorithms to assess their performance.
In conclusion, the proposed shape dissimilarity measure is applicable not only to assessing positional error for line simplification in cartography and GIS, but potentially also to object-based shape comparison or recognition (especially for linear objects) in other fields, such as computer vision or remote sensing. In computer vision, for example, a shape is compared with another shape found in an image database. If the two shapes are dissimilar, the vision system reports a mismatch and returns a dissimilarity measure reflecting how poor the match is. To provide a more comprehensive shape comparison, therefore, this study needs to be extended to measure the shape dissimilarity between two polygonal objects.
Acknowledgements
The authors thank the Surveying and Mapping Office, the Government of Hong Kong Special
Administrative Region for providing the map. The work described in this paper was supported by the
Hong Kong Polytechnic University (Project no. G-YX27).
References
Alesheikh, A.A., 1998. Modeling and managing uncertainty in object-based geospatial information system. Ph.D. Dissertation, The University of Calgary, Alberta, Canada, 188pp.
Amato, N.M., Bayazit, O.B., Dale, L.K., Jones, C., Vallejo, D., 1998. Choosing good distance metrics and local planners for probabilistic roadmap methods. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). Leuven, Belgium, pp. 630–637.
Amrhein, C.G., Griffith, D.A., 1994. Errors in spatial databases: a summary of results from several research projects. In: Proceedings of the International Symposium on the Spatial Accuracy of National Data Bases, pp. 214–226.
Blakemore, M., 1984. Generalization and error in spatial databases. Cartographica 21 (2/3), 131–139.
Buttenfield, B.P., 1985. Treatment of the cartographic line. Cartographica 22 (2), 1–26.
Buttenfield, B.P., 1991. A rule for describing line feature geometry. In: McMaster, R.B., Buttenfield, B.P. (Eds.), Map Generalization: Making Rules for Knowledge Representation. Longman, Harlow, pp. 150–171.
Caspary, W., Scheuring, R., 1992. Error-bands as measures of geometrical accuracy. In: Proceedings of the Third European Conference on GIS. Munich, Germany, pp. 227–233.
Cheung, C.K., Shi, W.Z., 2004. Estimation of the positional uncertainty in line simplification in GIS. The Cartographic Journal 41 (1), 37–45.
Cheung, T.C.K., 2003. Assessing positional and modeling uncertainties in vector-based spatial processes and analyses in geographical information systems. Ph.D. Dissertation, The Hong Kong Polytechnic University, Hong Kong, 257pp.
Chrisman, N.R., 1982. A theory of cartographic error and its measurement in digital data base. In: Proceedings of the Auto Carto 5. Crystal City, VA, pp. 159–168.
Dreyfus, H.L., Dreyfus, S.E., 1986. Mind Over Machine: the Power of Human Intuition and Expertise in the Era of the Computer. Free Press, New York, 231pp.
Dubuisson, M., Jain, A.K., 1994. A modified Hausdorff distance for object matching. In: Proceedings of the International Conference on Pattern Recognition. Jerusalem, Israel, pp. 566–568.
Dutton, G., 1992. Handling positional uncertainty in spatial databases. In: Proceedings of the Fifth International Symposium on Spatial Data Handling. South Carolina, USA, pp. 460–469.
Easa, S.M., 1995. Estimating line segment reliability using Monte Carlo simulation. Surveying and Land Information Systems 55 (3), 136–141.
Goodchild, M.F., Hunter, G.J., 1997. A simple positional accuracy measure for linear features. International Journal of Geographical Information Science 11, 299–306.
Guienko, G., Doytsher, Y., 2003. Geographic information system data for supporting feature extraction from high-resolution aerial and satellite images. Journal of Surveying Engineering 129 (4), 158–164.
Huttenlocher, D.P., Klanderman, G., Rucklidge, W., 1993. Comparing images using the Hausdorff distance. IEEE Transactions on Pattern Analysis and Machine Intelligence 15 (9), 850–863.
Jasinski, M.J., 1990. Comparison of complexity measures for cartographic lines. NCGIA Technical Report 90-1.
Jenks, G.F., 1985. Linear simplification: how far can we go? In: Proceedings of the 10th Annual Meeting. Canadian Cartographic Association, Fredericton, NB.
João, E.M., 1998. Causes and Consequences of Map Generalization. Taylor & Francis, London, 266pp.
Little, A.R., 1989. An evaluation of selected computer-assisted line simplification algorithms in the context of map accuracy standards. In: Proceedings of the 1989 ASPRS/ACSM Annual Conventions, American Society for Photogrammetry and Remote Sensing and American Congress on Surveying and Mapping, vol. 5, pp. 122–132.
McMaster, R.B., 1986. A statistical analysis of mathematical measures for linear simplification. The American Cartographer 13 (2), 103–117.
McMaster, R.B., 1987a. Automated line generalization. Cartographica 27, 74–111.
McMaster, R.B., 1987b. The geometric properties of numerical generalization. Geographical Analysis 19, 330–346.
Perkal, J., 1966. On the length of empirical curves: discussion paper 10. Michigan Inter-University Community of Mathematical Geographers, Ann Arbor.
Rote, G., 1991. Computing the minimum Hausdorff distance between two point sets on a line under translation. Information Processing Letters 38, 123–127.
Shahriari, N., Tao, V., 2002. Minimizing positional errors in line simplification using adaptive tolerance values. In: Proceedings of the Symposium on Geospatial Theory, Processing and Applications. Ottawa.
Shi, W.Z., 1994. Modeling Positional and Thematic Uncertainty in Integration of GIS and Remote Sensing. ITC Publication 22.
Singh, R., Papanikolopoulos, N.P., 2000. Planar shape recognition by shape morphing. Pattern Recognition 33 (10), 1683–1699.
Srisuk, S., Tamsri, M., Fooprateepsiri, R., Sookavatana, P., Sunat, K., 2003. A new shape matching measure for nonlinear distorted object recognition. In: Proceedings of the VIIth Digital Image Computing: Techniques and Applications. Sydney, Australia, pp. 339–348.
Stanfel, L.E., Stanfel, C., 1993. A model of the reliability of a line connecting uncertain points. Surveying and Land Information Systems 53 (1), 49–52.
Veregin, H., 2000. Quantifying positional error induced by line simplification. International Journal of Geographical Information Science 14, 113–130.
Veregin, H., Dai, X., 1999. Minimizing positional error induced by line simplification. In: Proceedings of the International Symposium on Spatial Data Quality. The Hong Kong Polytechnic University, Hong Kong.
White, E.R., 1985. Assessment of line-generalization algorithms using characteristic points. The American Cartographer 12, 17–27.