Abstract
Several bottom-up, context-free algorithms for the detection of regions of interest in pictures were analyzed,
evaluated and compared. Our aim is to develop new criteria related to human performance for these algorithms and
perhaps to be able to design more biologically plausible perceptive machines. We introduce the statistical and
computational platform we have been using to compare sequences of regions of interest, both biological (eye movements)
and artificial (algorithms). © 1998 Elsevier Science B.V. All rights reserved.
0167-8655/98/$ - see front matter © 1998 Elsevier Science B.V. All rights reserved.
PII: S0167-8655(98)00077-4
C.M. Privitera, L.W. Stark / Pattern Recognition Letters 19 (1998) 1037-1043
1981; Stark and Choi, 1996). The ROIs that our algorithms identify bear an analogy to these multiple glances or fixations in human vision.
Our overall purpose is to develop efficient image analysis algorithms, to compare their properties with one another and with human fixations. We have analyzed image processing algorithms broadly selected from the above taxonomy (Stark and Privitera, 1997). In this paper we compare results from six of them and we introduce the statistical and the algorithmic background we have been exploiting.

2. ROI detecting algorithms

The useful information content of a generic picture can be abstracted by different image parameters which in turn can be identified by relevant image processing algorithms. In this sense, applying algorithms to a picture means mapping that image into different domains, where for each domain a specific set of parameters is extracted. After the image has been processed, only the loci of the local maxima from each domain are retained; these maxima are then clustered in order to yield a limited number of ROIs.
Six different algorithms were studied:
S, symmetry transform,
O, difference in the gray-level orientation,
E, edges per unit area,
F, center-surround, as in receptive fields of animal vision,
N, entropy,
C, Michelson contrast.
S, symmetry, a structural approach, appears to be a very prominent spatial relation. For each pixel x,y of the image, we define a local symmetry magnitude S(x,y) as follows:

$$S(x,y) = \sum_{(i_1,j_1),\,(i_2,j_2)\,\in\,\Gamma(x,y)} s\big((i_1,j_1),(i_2,j_2)\big) \tag{1}$$

where $\Gamma(x,y)$ is the neighborhood of radius 7 of point x,y defined along the horizontal and vertical axis ($\Gamma(x,y) = (x-r,y), \ldots, (x,y), \ldots, (x+r,y), (x,y-r), \ldots, (x,y+r)$) and $s((i_1,j_1),(i_2,j_2))$ is defined by the following equation:

$$s\big((i_1,j_1),(i_2,j_2)\big) = G_\sigma\big(d((i_1,j_1),(i_2,j_2))\big)\,\lvert\cos(\theta_1-\theta_2)\rvert \tag{2}$$

The first factor $G_\sigma$ is a gaussian of fixed variance, $\sigma = 3$ pixels, and $d(\cdot)$ represents the distance function. The second factor represents a simplified notion of symmetry: $\theta_1$ and $\theta_2$ correspond to the angles of the gray-level intensity gradient of the two pixels $(i_1,j_1)$ and $(i_2,j_2)$. The factor achieves the maximum value when the gradients of the two points are oriented in the same direction. The gaussian represents a distance weight function which introduces localization in the symmetry evaluation. Our definition of symmetry was consequently based on the orientation correspondences of gradients around the centered point (Reisfeld et al., 1995). Alternatively, a normalization of the axial quadratic moment could be used instead to compute the symmetry transform (Di Gesu and Valenti, 1995).
O, difference in the gray-level orientation, a statistical-type kernel, is analyzed in early visual cortices. Center-surround difference is determined by first convolving the image with four Gabor masks of angles 0°, 45°, 90° and 135°, respectively (see also Niebur and Koch, 1996). For each pixel x,y, the scalar results of the four convolutions are then associated with four unit vectors corresponding to the four different orientations. The orientation vector $\mathbf{o}(x,y)$ is represented by the vectorial sum of these four weighted unit vectors. We define the center-surround difference transform as follows:

$$O(x,y) = \left(1 - \frac{\mathbf{o}(x,y)\cdot\mathbf{m}(x,y)}{\lVert\mathbf{o}(x,y)\rVert\,\lVert\mathbf{m}(x,y)\rVert}\right)\lVert\mathbf{m}(x,y)\rVert \tag{3}$$

where $\mathbf{m}(x,y)$ is the average orientation vector evaluated within the neighborhood of 7 × 7 pixels. The first factor of the equation achieves high values for big differences in orientation between the center pixel and the surroundings. The second factor acts as a low-pass filter for the orientation feature.
E, edges per unit area, is determined by detecting edges in an image, using the Canny extension of the Sobel operator (Canny, 1986), and then congregating the edges detected with a gaussian of $\sigma = 3$ pixels.
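The symmetry transform S of Eqs. (1) and (2) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the distance d is taken to be Euclidean, the gaussian is left unnormalized, the sum runs over unordered pairs of distinct points of the cross-shaped neighborhood, gradients come from simple finite differences, and pixels closer than r to the border are left at zero. The function names are hypothetical.

```python
import math
from itertools import combinations


def gradient_angles(img):
    """Angle of the gray-level gradient at every pixel (finite differences)."""
    h, w = len(img), len(img[0])
    theta = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
            gx = (img[y][x1] - img[y][x0]) / (x1 - x0)
            gy = (img[y1][x] - img[y0][x]) / (y1 - y0)
            theta[y][x] = math.atan2(gy, gx)
    return theta


def cross_offsets(r):
    """Offsets (dy, dx) of the cross-shaped neighborhood of radius r."""
    arm = [k for k in range(-r, r + 1) if k != 0]
    return [(0, 0)] + [(k, 0) for k in arm] + [(0, k) for k in arm]


def pair_weights(r, sigma):
    """Gaussian distance weight for every unordered pair in the neighborhood."""
    pairs = []
    for (dy1, dx1), (dy2, dx2) in combinations(cross_offsets(r), 2):
        d2 = (dy1 - dy2) ** 2 + (dx1 - dx2) ** 2  # squared Euclidean distance (assumed)
        pairs.append(((dy1, dx1), (dy2, dx2), math.exp(-d2 / (2.0 * sigma ** 2))))
    return pairs


def symmetry_transform(img, r=7, sigma=3.0):
    """Local symmetry magnitude S(x, y) for interior pixels; borders stay 0."""
    h, w = len(img), len(img[0])
    theta = gradient_angles(img)
    pairs = pair_weights(r, sigma)
    S = [[0.0] * w for _ in range(h)]
    for y in range(r, h - r):
        for x in range(r, w - r):
            S[y][x] = sum(
                g * abs(math.cos(theta[y + a][x + b] - theta[y + c][x + d]))
                for (a, b), (c, d), g in pairs
            )
    return S


# A horizontal ramp: every gradient points the same way.
ramp = [[float(x) for x in range(20)] for _ in range(20)]
S_ramp = symmetry_transform(ramp)
```

On the ramp image all gradients share one direction, so every pair contributes its full gaussian weight and S reaches its upper bound at each interior pixel, in line with the remark that the second factor is maximal when the two gradients are oriented in the same direction.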
Fig. 1. Picture transformation. The picture transformations are a result of and provide descriptions of the action of each algorithm
(upper and lower right, E and S, respectively). Note, the untransformed picture in the larger left panel. Local maxima (circa 100) from
each transformation were then retained.
Each of our six IP algorithms, of course, contributed the intensity of its selected parameter to finding the local maxima and the values of resulting clustered ROI domains.
If we had used only image processing algorithms and not the clustering procedure, we could have selected nine local maxima directly and defined them to be the ROI domains. Those selected ROIs, however, might be much more closely spaced. Thus the clustering algorithm was actually an eccentricity-weighting algorithm, where even lower local maxima that were eccentrically located could be selected to form a domain.

4. Similarity metrics

4.1. Similarity of ROI loci, Sp

Comparison of final clusters of ROIs began by taking two sets of ROIs, generated by two different algorithms, and combining these two sets. Then this double set of ROIs was clustered using a distance measure taken from a k-means pre-evaluation. This evaluation determined a region for calling coincident any ROIs that were closer than this distance and non-coincident any ROIs that were further apart than this distance; the distance was about two degrees, and similar in size to human foveal spans. This final selection of joined-ROIs (Fig. 2) then enabled a distance-similarity metric, Sp, to determine how close the ROIs identified by two algorithms were. The individual sources of the elements, that is, the original ROIs, used in these final interactive steps were preserved to illustrate the procedure (as circles and squares in Fig. 2, right panel).

4.2. Similarity of sequences, Ss

As mentioned above, ROIs were ordered by the value assigned by the image processing algorithm, conceptually similar in some ways to the temporal ordering of human eye fixations in a scanpath. Two different ROI sequences, squares and circles (Fig. 2, left upper and lower panels), corresponded to two sets of ROIs from two of the IP algorithms. Then, joined-ROIs were finally ordered into strings of ordered points. In Fig. 2, we have for example: stringS = abcfeffgdc and stringF = afbffdcdf. The string editing similarity index Ss was defined by an optimization algorithm (Stark and Choi, 1996) with unit cost assigned to the three different operations: deletion, insertion and shifting.
Different ROI sequences corresponding to two sets of ROIs from two of the IP algorithms are
Fig. 2. Clustering algorithm. After the picture transformation, a clustering algorithm is used to reduce the high valued maxima (circa
100) to a few (circa 10). These 10 were then ordered by value and the constructed ordered sequences (upper and lower left panel) were
then connected by vectors in analogy to human sequences of fixations. The two sets of ROIs are finally combined (right panel) into a
selected number of joined-ROIs which are used to define distance measures between the two algorithms.
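The string editing comparison of Section 4.2 can be illustrated with a standard dynamic-programming edit distance. The sketch below assumes that the three unit-cost operations correspond to the usual Levenshtein deletion, insertion and substitution, and that the distance is normalized by the length of the longer string; neither the exact cost scheme nor the normalization is spelled out in the text, and the function names are hypothetical.

```python
def edit_distance(s, t):
    """Unit-cost edit distance between two ROI strings (Levenshtein)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a from s
                curr[j - 1] + 1,         # insert b into s
                prev[j - 1] + (a != b),  # substitute a by b (free on a match)
            ))
        prev = curr
    return prev[-1]


def sequence_similarity(s, t):
    """Similarity in [0, 1]: 1 minus the distance over the longer length (assumed)."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(s, t) / longest


string_S = "abcfeffgdc"  # example strings quoted from Fig. 2
string_F = "afbffdcdf"
print(edit_distance(string_S, string_F))        # 6
print(sequence_similarity(string_S, string_F))  # 0.4
```

Under these assumptions the two example strings are six edits apart, giving a similarity of 0.4. This differs from the Ss = 0.34 reported for the example, so the sketch should not be read as the authors' exact procedure.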
often similar in location, Sp, indicating that two IP algorithms have chosen similar ROIs. However, they are usually dissimilar in sequence, Ss, indicating that the two IP algorithms have assigned orders of values or sequences of values that were dissimilar.

4.3. Markov matrices, St

We can also create, based on the above-mentioned joined-ROIs, transition Markov matrices, [M1]s (Fig. 3, S left matrix and F right matrix). The transition similarity index, St, is based upon cross-correlation between the coefficients of the [M1]s.
Thus, all of our comparisons between pairs of algorithms yielded three different indices of similarity. For the example illustrated above we have Sp = 1, Ss = 0.34 and St = 0.09.

4.4. Coherence of algorithms

The similarity measures, Sp, Ss and St defined above, tell us how closely the ROIs identified by two algorithms resemble each other in locus, Sp, in sequence, Ss, and in transition from one ROI to the next, St. These similarities have been arranged into three ordered matrices (Fig. 4).
For Sp (Fig. 4, upper matrix), enclosed within the dotted box are six similarity values documenting the high Sp similarity for four selected algorithms E, O, S and F. The average value of similarity indices was L = 0.62 for the selected algorithms (right column, upper table). This can be compared to the average value, L = 0.4, of similarity indices for the entire group of six algorithms, including two disparate algorithms, C and N. Our collection of algorithms could thus be sorted for similarities or for differences in generating ROIs. This is likely to be of value in selecting algorithms for different tasks.
The global Sp, G = 0.23, represented the averaged Sp for all algorithms and for the two different pictures used; some of these might be related, but most clearly have no reason to be so related. This G value could be considered a bottom anchor for Sp; a further bottom anchor was Ra, the random Sp, calculated for coincidence among all loci of the appropriate size.
The middle and lower left ordered matrices of similarity coefficients show little coherence even among the four selected algorithms. This provides quantitative support for our conclusion that we can select independent algorithms that yet select similar ROI loci. These communally identified loci have greater validity than those chosen by any one algorithm. However, the sequences were difficult to recapture from one algorithm to another and therefore our selected group of algorithms yielded no more precise estimate of sequential coherence.

5. Discussion

Two advantages of our methodology are task definition and metrics for similarities. We have set a precise task for the image processing algorithms studied: to identify loci of ROIs and, if possible, the sequences or strings of loci of ROIs. This is a more defined task than just processing images for human subjective judgement; undeniably human vision can compensate for many aspects of loss of information. The quantitative measure of similarity of loci of ROIs, Sp, enabled the selection of
Fig. 3. Transition matrices. Transition matrices derived during the comparisons of the two algorithms shown in Fig. 2.
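A transition matrix of the kind shown in Fig. 3 can be built directly from an ROI string, and two such matrices can then be compared coefficient by coefficient. The sketch below is illustrative only: it assumes first-order transitions, rows normalized to probabilities, and the Pearson correlation of the flattened matrices as a stand-in for the cross-correlation mentioned in Section 4.3. None of these details are confirmed by the text, and the function names are hypothetical.

```python
import math


def transition_matrix(seq, alphabet):
    """First-order transition matrix of a ROI string, rows normalized to sum to 1."""
    index = {symbol: k for k, symbol in enumerate(alphabet)}
    n = len(alphabet)
    matrix = [[0.0] * n for _ in range(n)]
    for a, b in zip(seq, seq[1:]):
        matrix[index[a]][index[b]] += 1.0
    for row in matrix:
        total = sum(row)
        if total > 0:  # rows with no outgoing transition stay all-zero
            row[:] = [count / total for count in row]
    return matrix


def transition_similarity(m1, m2):
    """Pearson correlation between the coefficients of two matrices (assumed measure)."""
    x = [v for row in m1 for v in row]
    y = [v for row in m2 for v in row]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # a constant matrix carries no transition structure
    return cov / (norm_x * norm_y)


alphabet = "abcdefg"  # joined-ROI labels appearing in the example strings
M_S = transition_matrix("abcfeffgdc", alphabet)
M_F = transition_matrix("afbffdcdf", alphabet)
St_example = transition_similarity(M_S, M_F)
```

In stringS the label f is followed once each by e, f and g, so the f row of `M_S` holds 1/3 in those three columns. Because the normalization and correlation used here are assumptions, `St_example` is not expected to reproduce the reported St value.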
Fig. 4. Similarity measures among algorithms. Cross-comparison values (left column) of the six algorithms, E, O, S, F, G, N, for the three
indices, Sp, Ss and St. Coefficients range from 0 to 1. Averaged coefficients are printed in the adjoining table (right column). For Sp, in the
upper matrix (enclosed within the dotted box), are six similarity values documenting the high Sp similarity for four selected algorithms.
Note that for the middle and lower matrices and tables, the selected group of algorithms and, indeed, all the algorithms together did
not show commonality of sequences, strings and transitions (see text).
algorithms that function for our task in a similar fashion, even though they may have processed pictures in quite different manners. The fact that we found only low similarities of sequences and transitions, Ss and St, indicates that selecting for commonality of algorithms in forming sequences is not yet possible.
A wide selection of algorithms provided an opportunity to study quantitatively their differences and similarities in terms of the precise task. This is of great interest as it indicates the general nature of an image and how it is processed either by algorithms or by humans. We are concerned that we might need to provide weighting coefficients for the different algorithms in order to optimize the predicting capabilities of the ensemble (Stark and Privitera, 1997).
We are in the process of comparing our algorithms with eye movement fixations (Privitera and Stark, 1998, in preparation; for preliminary results see Stark and Privitera, 1997). From this comparison we may learn something more about top-down, context-dependent, visual processing and understand the structure and the nature of internal representations upon which human perception is based.
In other studies, beyond the scope of the present paper, we have applied our methodology to a much richer collection of pictures, scenes and works of art. These range widely from natural and constructed landscapes and cityscapes, to groups of persons and animals and objects, to single portraits then paintings (among them the two paintings for the data in this Pattern Recognition Letters) and still lifes (Privitera and Stark, 1998, in preparation). In Fig. 4, the table coefficients fall into typical ranges that we have obtained with a much larger collection of algorithms and a much wider set of images from our extended corpus of pictures. However, we wish to direct the readers'