
Pattern Recognition Letters 19 (1998) 1037–1043

Evaluating image processing algorithms that predict regions of interest

Claudio M. Privitera *, Lawrence W. Stark

Neurology and Telerobotics Units, University of California, 486 Minor Hall, UC Berkeley, CA 94704-2020, USA

* Corresponding author. Tel.: 510 642 5309; fax: 510 642 7196; e-mail: claudio@milo.berkeley.edu
Received 8 August 1997; received in revised form 20 April 1998

Abstract

Several bottom-up, context-free, algorithms for the detection of regions of interest in pictures were analyzed, evaluated and compared. Our aim is to develop new criteria related to human performance for these algorithms and perhaps to be able to design more biologically plausible perceptive machines. We introduce the statistical and computational platform we have been using to compare sequences of regions of interest, both biological (eye movements) and artificial (algorithms). © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Regions of visual interest; Algorithm comparisons; Eye movements

1. Introduction

Machine vision often requires a selection mechanism whereby only a small subset of the input visual stimulus is analyzed in detail. This subset may be arranged into a number of loci, classically called regions-of-interest (ROIs). These ROIs can be determined using image processing algorithms that analyze, for example, spatial frequency, texture conformation or other informative values of loci of the visual stimulus. Not only are the locations of the ROIs of value, but also their sequences, in terms of the ordering of selection coefficients, can be usefully evaluated and compared. Image processing algorithms may be classified into three principal approaches (for a survey, see, for example, Reed and Hans Du Buf, 1993; Haralick, 1979). Firstly, structural approaches, based on an assumption that images have detectable and recognizable primitives distributed according to some placement rules; examples are matched filters. Secondly, statistical approaches, based on statistical characteristics of the texture of the picture; examples are co-occurrence matrices and entropy functions. Thirdly, model approaches, which hypothesize underlying processes for the generation of local regions; these are analyzed on the basis of specific parameters governing these generators: examples are fractal descriptors.

Human perception has been explained in part by the scanpath theory, which utilizes a top-down, internal cognitive model of what we see to control not only our vision, but also to drive the sequences of rapid eye movements and fixations, or glances, that so efficiently travel over a scene or picture of interest (Noton and Stark, 1969; Stark and Ellis, 1981; Stark and Choi, 1996). The ROIs our algorithms are identifying bear analogy to these multiple glances or fixations in human vision.

Our overall purpose is to develop efficient image analysis algorithms, to compare their properties with one another and with human fixations. We have analyzed image processing algorithms broadly selected from the above taxonomy (Stark and Privitera, 1997). In this paper we compare results from six of them and we introduce the statistical and the algorithmic background we have been exploiting.

2. ROI detecting algorithms

The useful information content of a generic picture can be abstracted by different image parameters, which in turn can be identified by relevant image processing algorithms. In this sense, applying algorithms to a picture means to map that image into different domains, where for each domain a specific set of parameters is extracted. After the image has been processed, only the loci of the local maxima from each domain are retained; these maxima are then clustered in order to yield a limited number of ROIs.

Six different algorithms were studied:
S, symmetry transform,
O, difference in the gray-level orientation,
E, edges per unit area,
F, center-surround, as in receptive fields of animal vision,
N, entropy,
C, Michaelson contrast.

S, symmetry, a structural approach, appears to be a very prominent spatial relation. For each pixel x, y of the image, we define a local symmetry magnitude S(x, y) as follows:

S(x, y) = \sum_{(i_1, j_1), (i_2, j_2) \in C(x, y)} s((i_1, j_1), (i_2, j_2)),   (1)

where C(x, y) is the neighborhood of radius r = 7 of point x, y defined along the horizontal and vertical axes (C(x, y) = (x - r, y), ..., (x, y), ..., (x + r, y); (x, y - r), ..., (x, y + r)) and s((i_1, j_1), (i_2, j_2)) is defined by the following equation:

s((i_1, j_1), (i_2, j_2)) = G_\sigma(d((i_1, j_1), (i_2, j_2))) \, |\cos(\theta_1 - \theta_2)|.   (2)

The first factor G_σ is a gaussian of fixed variance, σ = 3 pixels, and d(·) represents the distance function. The second factor represents a simplified notion of symmetry: θ1 and θ2 correspond to the angles of the gray-level intensity gradient at the two pixels (i1, j1) and (i2, j2). The factor achieves its maximum value when the gradients of the two points are oriented in the same direction. The gaussian represents a distance weight function which introduces localization in the symmetry evaluation. Our definition of symmetry was consequently based on the orientation correspondences of gradients around the centered point (Reisfeld et al., 1995). Alternatively, a normalization of the axial quadratic moment could be used instead to compute the symmetry transform (Di Gesù and Valenti, 1995).
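To make the symmetry kernel concrete, here is a minimal Python sketch of Eqs. (1) and (2); it is not the authors' code. The radius of 7 and σ = 3 pixels come from the text, while the central-difference gradient used for the angles θ and the pairing of mirror points about the center follow our reading of Reisfeld et al. (1995) and are assumptions.

```python
import numpy as np

def symmetry_map(img, r=7, sigma=3.0):
    """Local symmetry magnitude S(x, y) of Eqs. (1) and (2), sketched with plain loops."""
    gy, gx = np.gradient(img.astype(float))
    theta = np.arctan2(gy, gx)                    # gray-level gradient angle at each pixel
    h, w = img.shape
    S = np.zeros((h, w))
    # Offsets along the horizontal and vertical axes; each is paired with its mirror point.
    offsets = [(d, 0) for d in range(1, r + 1)] + [(0, d) for d in range(1, r + 1)]
    for y in range(r, h - r):
        for x in range(r, w - r):
            total = 0.0
            for dx, dy in offsets:
                i1, j1 = y + dy, x + dx           # one point of the pair
                i2, j2 = y - dy, x - dx           # its mirror about (x, y)
                dist = 2.0 * np.hypot(dx, dy)     # d((i1, j1), (i2, j2))
                weight = np.exp(-dist ** 2 / (2.0 * sigma ** 2))   # gaussian factor G_sigma
                total += weight * abs(np.cos(theta[i1, j1] - theta[i2, j2]))
            S[y, x] = total
    return S
```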
O, difference in the gray-level orientation, a statistical-type kernel, is analyzed in early visual cortices. The center-surround difference is determined by first convoluting the image with four Gabor masks of angles 0°, 45°, 90° and 135°, respectively (see also Niebur and Koch, 1996). For each pixel x, y, the scalar results of the four convolutions are then associated with four unit vectors corresponding to the four different orientations. The orientation vector o(x, y) is represented by the vectorial sum of these four weighted unit vectors. We define the center-surround difference transform as follows:

O(x, y) = \left( 1 - \frac{o(x, y) \cdot \bar{m}(x, y)}{\|o(x, y)\| \, \|\bar{m}(x, y)\|} \right) \|\bar{m}(x, y)\|,   (3)

where \bar{m}(x, y) is the average orientation vector evaluated within the neighborhood of 7 × 7 pixels. The first factor of the equation achieves high values for big differences in orientation between the center pixel and the surroundings. The second factor acts as a low-pass filter for the orientation feature.
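A sketch of the orientation-difference transform as we read Eq. (3) follows; producing the four Gabor responses is assumed to happen elsewhere, and the box-mean used for the average vector and the small epsilon are our own choices.

```python
import numpy as np

def box_filter(a, block):
    # Mean filter via padding and an explicit loop; good enough for a sketch.
    r = block // 2
    p = np.pad(a, r, mode='edge')
    out = np.zeros_like(a, dtype=float)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += p[r + dy: r + dy + a.shape[0], r + dx: r + dx + a.shape[1]]
    return out / float(block * block)

def orientation_difference_map(responses, block=7):
    """Center-surround orientation difference O(x, y), following our reading of Eq. (3).

    `responses` holds four same-sized arrays with the outputs of Gabor masks at
    0, 45, 90 and 135 degrees; computing them is assumed to happen elsewhere.
    """
    angles = np.deg2rad([0.0, 45.0, 90.0, 135.0])
    # Orientation vector o(x, y): vectorial sum of the four weighted unit vectors.
    ox = sum(resp * np.cos(ang) for resp, ang in zip(responses, angles))
    oy = sum(resp * np.sin(ang) for resp, ang in zip(responses, angles))
    # Average orientation vector m(x, y) over a block x block neighborhood.
    mx, my = box_filter(ox, block), box_filter(oy, block)
    eps = 1e-9
    dot = ox * mx + oy * my
    norm_o, norm_m = np.hypot(ox, oy), np.hypot(mx, my)
    # (1 - cosine between o and its local average) scaled by ||m||: large orientation
    # differences score high, and the averaged vector acts as the low-pass factor.
    return (1.0 - dot / (norm_o * norm_m + eps)) * norm_m
```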
E, edges per unit area, is determined by detecting edges in an image, using the Canny extension of the Sobel operator (Canny, 1986), and then congregating the detected edges with a gaussian of σ = 3 pixels.
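A rough sketch of the edge-density map E is given below; a plain gradient-magnitude threshold stands in here for the Canny/Sobel detector named in the text, and the threshold value is an assumption.

```python
import numpy as np

def edge_density_map(img, sigma=3.0, thresh=None):
    """Edges per unit area: detect edges, then spread ('congregate') them with a gaussian."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    if thresh is None:
        thresh = mag.mean() + mag.std()          # ad-hoc threshold, not from the paper
    edges = (mag > thresh).astype(float)          # binary edge map (the paper uses Canny)

    # Gaussian accumulation of the edge map (sigma = 3 pixels, as in the text).
    r = int(3 * sigma)
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    g = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    g /= g.sum()
    p = np.pad(edges, r, mode='constant')
    out = np.zeros_like(edges)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += g[dy + r, dx + r] * p[r + dy: r + dy + edges.shape[0],
                                         r + dx: r + dx + edges.shape[1]]
    return out
```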

F, a center-surround on/off quasi-receptive field mask, positive in the center and negative in the periphery, is convoluted with the image.

N, entropy, is locally calculated as \sum_{i \in G} f_i \log f_i, where f_i is the frequency of the i-th gray level within the 7 × 7 surrounding of the center pixel and G is the local set of gray levels. Local maxima defined by this factor emphasize texture variance. The block size of 7 × 7 (a 0.3 × 0.3 degree block) depends on the image scaling.

C, Michaelson contrast, is most useful in identifying high-contrast elements, generally considered to be an important choice feature for human vision. Michaelson contrast is calculated as

\frac{L_m - L_M}{L_m + L_M},

where L_m is the mean luminance within a 7 × 7 surrounding of the center pixel and L_M is the overall mean luminance of the image.
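Hedged sketches of the two local statistics just described follow; the 7 × 7 window is from the text, while the negative sign on the entropy and the small epsilon in the contrast denominator are our assumptions.

```python
import numpy as np

def local_entropy_map(img, block=7):
    """N: gray-level entropy over a block x block window (the paper's sum of f_i log f_i)."""
    h, w = img.shape
    r = block // 2
    out = np.zeros((h, w))
    q = img.astype(int)
    for y in range(r, h - r):
        for x in range(r, w - r):
            patch = q[y - r:y + r + 1, x - r:x + r + 1].ravel()
            counts = np.bincount(patch)
            f = counts[counts > 0] / float(patch.size)     # gray-level frequencies f_i
            out[y, x] = -np.sum(f * np.log(f))             # negative sign is our convention
    return out

def michaelson_contrast_map(img, block=7):
    """C: (L_m - L_M) / (L_m + L_M), L_m = local block mean, L_M = overall image mean."""
    h, w = img.shape
    r = block // 2
    img = img.astype(float)
    L_M = img.mean()
    out = np.zeros((h, w))
    for y in range(r, h - r):
        for x in range(r, w - r):
            L_m = img[y - r:y + r + 1, x - r:x + r + 1].mean()
            out[y, x] = (L_m - L_M) / (L_m + L_M + 1e-9)   # epsilon guards a zero denominator
    return out
```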
3. Clustering and sequencing

The algorithm procedures above (see, for example, Fig. 1) resulted in defining local maxima widely over the picture. Then a clustering algorithm was utilized to reduce this initial set of local maxima (approximately 100) down to an ordered set of about nine ROIs.

The initial set of local maxima was clustered by connecting local maxima while gradually increasing the acceptance radius for joining them. During each step of the clustering process, all local maxima less than a specific radius apart were clustered together. Each cluster inherited the maximum value of its component points (local maxima): the locus of this highest-valued maximum for each cluster then also determined the locus of that cluster. Only that maximum point was retained; all the other composing local maxima were deleted. The procedure was repeated while increasing the acceptance radius at each step. The decision to terminate the clustering process was made when only nine clusters remained. Thus, the image processing ROIs were similar in number to human eye fixation glances looking at similar pictures.

The nine ROI domains were assigned values depending upon the value of the highest local maximum incorporated into that domain; alternatively, the number of local maxima included, or other criteria, could be used for this sequencing. The values, ordered from highest to lowest, permitted us to relate the sequences of image ROIs (Fig. 2, left panel) to sequences of human fixations. In this way no masking was necessitated.

Fig. 1. Picture transformation. The picture transformations are a result of and provide descriptions of the action of each algorithm (upper and lower right, E and S, respectively). Note the untransformed picture in the larger left panel. Local maxima (circa 100) from each transformation were then retained.

Each of our six IP algorithms, of course, contributed the intensity of its selected parameter to finding the local maxima and the values of the resulting clustered ROI domains.

If we had used only the image processing algorithms and not the clustering procedure, we could have selected nine local maxima directly and defined them to be the ROI domains. Those selected ROIs, however, might be much more closely spaced. Thus the clustering algorithm was actually an eccentricity-weighting algorithm, where even lower local maxima that were eccentrically located could be selected to form a domain.
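A sketch of the radius-growing clustering described above follows; the paper does not give the radius schedule, so the step size and the greedy pairwise merging are assumptions, and the loop may end with slightly fewer than nine clusters if several merges happen at one radius.

```python
import numpy as np

def cluster_maxima(points, values, n_rois=9, radius_step=2.0):
    """Merge local maxima by gradually increasing the acceptance radius until about
    `n_rois` clusters remain; each cluster keeps the locus and value of its
    highest-valued member, as described in the text."""
    pts = [np.asarray(p, dtype=float) for p in points]
    vals = list(values)
    radius = radius_step
    while len(pts) > n_rois:
        merged = True
        while merged:                              # merge all pairs closer than the current radius
            merged = False
            for i in range(len(pts)):
                for j in range(i + 1, len(pts)):
                    if np.linalg.norm(pts[i] - pts[j]) < radius:
                        # Keep only the higher-valued maximum of the pair.
                        keep = i if vals[i] >= vals[j] else j
                        drop = j if keep == i else i
                        del pts[drop], vals[drop]
                        merged = True
                        break
                if merged:
                    break
        radius += radius_step                      # grow the acceptance radius and repeat
    order = np.argsort(vals)[::-1]                 # order ROIs from highest to lowest value
    return [(tuple(pts[k]), vals[k]) for k in order]
```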
4. Similarity metrics

4.1. Similarity of ROI loci, Sp

Comparison of the final clusters of ROIs began with taking two sets of ROIs, generated by two different algorithms, and combining these two sets. Then this double set of ROIs was clustered using a distance measure taken from a k-means pre-evaluation. This evaluation determined a region for calling coincident any ROIs that were closer than this distance, and non-coincident any ROIs that were further apart than this distance; the distance was about two degrees, and similar in size to human foveal spans. This final selection of joined-ROIs (Fig. 2) then enabled a distance-similarity metric, Sp, to determine how close were the ROIs identified by two algorithms. The individual sources of the elements, that is, the original ROIs used in these final interactive steps, were preserved to illustrate the procedure (as circles and squares in Fig. 2, right panel).
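The k-means pre-evaluation that fixes the coincidence distance is not spelled out, so the sketch below simply takes the roughly two-degree threshold as a parameter; the greedy pairing and the normalization by the larger set are our assumptions about how an Sp-style locus score could be formed.

```python
import numpy as np

def locus_similarity(rois_a, rois_b, coincidence_dist):
    """S_p-style locus similarity: the fraction of ROIs from the two algorithms that can be
    paired within `coincidence_dist` (about two degrees in the paper)."""
    a = [np.asarray(p, dtype=float) for p in rois_a]
    b = [np.asarray(p, dtype=float) for p in rois_b]
    unused_b = set(range(len(b)))
    matches = 0
    for pa in a:
        # Pair each ROI of A with the closest still-unpaired ROI of B, if close enough.
        best, best_d = None, None
        for j in unused_b:
            d = np.linalg.norm(pa - b[j])
            if best is None or d < best_d:
                best, best_d = j, d
        if best is not None and best_d < coincidence_dist:
            matches += 1
            unused_b.remove(best)
    return matches / float(max(len(a), len(b)))
```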
4.2. Similarity of sequences, Ss

As mentioned above, ROIs were ordered by the value assigned by the image processing algorithm, conceptually similar in some ways to the temporal ordering of human eye fixations in a scanpath. Two different ROI sequences, squares and circles (Fig. 2, left upper and lower panels), corresponded to two sets of ROIs from two of the IP algorithms. Then, joined-ROIs were finally ordered into strings of ordered points. In Fig. 2, we have for example: stringS = abcfeffgdc and stringF = afbffdcdf. The string editing similarity index Ss was defined by an optimization algorithm (Stark and Choi, 1996) with unit cost assigned to the three different operations: deletion, insertion and shifting.
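As an illustration of the string comparison, a standard unit-cost edit distance is sketched below in place of the optimization algorithm of Stark and Choi (1996); treating substitution as the 'shifting' operation and normalizing by the longer string are our assumptions.

```python
def string_edit_similarity(s1, s2):
    """S_s-style sequence similarity from a unit-cost string edit distance.

    A plain dynamic-programming edit distance (insertion, deletion, substitution) stands in
    for the cited optimization algorithm; similarity = 1 - distance / max(length).
    """
    n, m = len(s1), len(s2)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution ('shifting')
    return 1.0 - d[n][m] / float(max(n, m, 1))

# Example with the strings read off Fig. 2:
# string_edit_similarity("abcfeffgdc", "afbffdcdf")
```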

Fig. 2. Clustering algorithm. After the picture transformation, a clustering algorithm is used to reduce the high-valued maxima (circa 100) to a few (circa 10). These 10 were then ordered by value and the constructed ordered sequences (upper and lower left panels) were then connected by vectors in analogy to human sequences of fixations. The two sets of ROIs are finally combined (right panel) into a selected number of joined-ROIs which are used to define distance measures between the two algorithms.

Different ROI sequences corresponding to two sets of ROIs from two of the IP algorithms are often similar in location, Sp, indicating that the two IP algorithms have chosen similar ROIs. However, they are usually dissimilar in sequence, Ss, indicating that the two IP algorithms have assigned orders of values, or sequences of values, that were dissimilar.

4.3. Markov matrices, St

We can also create, based on the above-mentioned joined-ROIs, transition Markov matrices, [M1]s (Fig. 3, S left matrix and F right matrix). The transition similarity index, St, is based upon cross-correlation between the coefficients of the [M1]s.

Thus, all of our comparisons between pairs of algorithms yielded three different indices of similarity. For the example illustrated above we have Sp = 1, Ss = 0.34 and St = 0.09.
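A sketch of first-order transition matrices over the joined-ROI labels and a correlation-based St follows; the paper does not specify the cross-correlation in detail, so a Pearson correlation of the matrix coefficients, clipped to [0, 1], is assumed here.

```python
import numpy as np

def transition_matrix(string, labels):
    """First-order transition matrix [M1] over the joined-ROI labels for one ordered string."""
    idx = {c: k for k, c in enumerate(labels)}
    M = np.zeros((len(labels), len(labels)))
    for a, b in zip(string[:-1], string[1:]):
        M[idx[a], idx[b]] += 1.0
    if M.sum() > 0:
        M /= M.sum()                       # normalize to transition frequencies
    return M

def transition_similarity(string1, string2, labels):
    """S_t-style index: correlation between the coefficients of the two [M1] matrices."""
    m1 = transition_matrix(string1, labels).ravel()
    m2 = transition_matrix(string2, labels).ravel()
    if m1.std() == 0 or m2.std() == 0:
        return 0.0
    r = np.corrcoef(m1, m2)[0, 1]
    return max(0.0, r)

# e.g. transition_similarity("abcfeffgdc", "afbffdcdf", labels="abcdefg")
```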
4.4. Coherence of algorithms

The similarity measures Sp, Ss and St defined above tell us how closely the ROIs identified by two algorithms resemble each other in locus, Sp, in sequence, Ss, and in transition from one ROI to the next, St. These similarities have been arranged into three ordered matrices (Fig. 4).

For Sp (Fig. 4, upper matrix), enclosed within the dotted box are six similarity values documenting the high Sp similarity for four selected algorithms, E, O, S and F. The average value of the similarity indices was L = 0.62 for the selected algorithms (right column, upper table). This can be compared to the average value, L = 0.4, of the similarity indices for the entire group of six algorithms, including two disparate algorithms, C and N. Our collection of algorithms could thus be sorted for similarities or for differences in generating ROIs. This is likely to be of value in selecting algorithms for different tasks.

The global Sp, G = 0.23, represented the averaged Sp for all algorithms and for the two different pictures used; some of these might be related, but most clearly have no reason to be so related. This G value could be considered a bottom anchor for Sp; a further bottom anchor was Ra, the random Sp, calculated for coincidence among all loci of the appropriate size.

The middle and lower left ordered matrices of similarity coefficients show little coherence even among the four selected algorithms. This provides quantitative support for our conclusion that we can select independent algorithms that yet select similar ROI loci. These communally identified loci have greater validity than those chosen by any one algorithm. However, the sequences were difficult to recapture from one algorithm to another and therefore our selected group of algorithms yielded no more precise estimate of sequential coherence.

5. Discussion

Two advantages of our methodology are task definition and metrics for similarities. We have set a precise task for the image processing algorithms studied: to identify loci of ROIs and, if possible, the sequences or strings of loci of ROIs. This is a more defined task than just processing images for human subjective judgement; undeniably, human vision can compensate for many aspects of loss of information. The quantitative measure of similarity of loci of ROIs, Sp, enabled the selection of algorithms that function for our task in a similar fashion, even though they may have processed pictures in quite different manners.

Fig. 3. Transition matrices. Transition matrices derived during the comparisons of the two algorithms shown in Fig. 2.

Fig. 4. Similarity measures among algorithms. Cross-comparison values (left column) of the six algorithms, E, O, S, F, G, N, for the three indices, Sp, Ss and St. Coefficients range from 0 to 1. Averaged coefficients are printed in the adjoining table (right column). For Sp, in the upper matrix (enclosed within the dotted box), are six similarity values documenting the high Sp similarity for four selected algorithms. Note that for the middle and lower matrices and tables, the selected group of algorithms and, indeed, all the algorithms together did not show commonality of sequences, strings and transitions (see text).

The fact that we found only low similarities of sequences and transitions, Ss and St, indicates that selecting for commonality of algorithms in forming sequences is not yet possible.

A wide selection of algorithms provided an opportunity to study quantitatively their differences and similarities in terms of the precise task. This is of great interest as it indicates the general nature of an image and how it is processed either by algorithms or by humans. We are concerned that we might need to provide weighting coefficients for the different algorithms in order to optimize the predicting capabilities of the ensemble (Stark and Privitera, 1997).

We are in the process of comparing our algorithms with eye movement fixations (Privitera and Stark, 1998, in preparation; for preliminary results see Stark and Privitera, 1997). From this comparison we may learn something more about top-down, context-dependent, visual processing and understand the structure and the nature of the internal representations upon which human perception is based.

In other studies, beyond the scope of the present paper, we have applied our methodology to a much richer collection of pictures, scenes and works of art. These range widely from natural and constructed landscapes and cityscapes, to groups of persons and animals and objects, to single portraits, then paintings (among them the two paintings for the data in this Pattern Recognition Letters paper) and still lifes (Privitera and Stark, 1998, in preparation). In Fig. 4, the table coefficients fall into typical ranges that we have obtained with a much larger collection of algorithms and a much wider set of images from our extended corpus of pictures. However, we wish to direct the readers' attention to the main thrust of this paper: the algorithms used and the methods employed to compare them.

Many open problems remain: how can we collect, classify and relate an even wider variety of image processing algorithms? What modifications of a picture leave the picture information relatively constant, and what modifications strongly deform the picture information?

Acknowledgements

We thank our sponsors for partial support: the NASA-Ames Research Center (Drs. Stephen Ellis and Robert Welch, technical monitors of different Cooperative Agreements) and the Fujita Research and Neuroptics Corporations, whose presidents, Ken Kawamura and Kamran Siminou, respectively, allowed us to interpret our guidelines quite loosely. Our colleagues in the laboratory, Michela Azzariti, Ted Blackmon, Yeuk Fai Ho, Veit Hagenmeyer and Yong Yu, have always been helpful, not the least in keeping a motley array of computers functioning smoothly. Other scientific colleagues, past and present, have been generous in their advice: Stephen Ellis and Charles Neveu at NASA, Irwin Sobel at Hewlett-Packard, and, at UCB, Jerry Feldman, CS, and David Brillinger, Statistics; also the anonymous referees of Pattern Recognition Letters.

References

Canny, J., 1986. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 8 (6), 679–698.
Di Gesù, V., Valenti, C., 1995. The discrete symmetry transform in computer vision. Technical Report 011-95, Laboratory for Computer Science (DMA), University of Palermo.
Haralick, R.M., 1979. Statistical and structural approaches to texture. Proceedings of IEEE 67, 786–804.
Niebur, E., Koch, C., 1996. Control of selective visual attention: Modeling the "where" pathway. In: Touretzky, D.S., Mozer, M.C., Hasselmo, M.E. (Eds.), Advances in Neural Information Processing Systems, Vol. 8. MIT Press, Cambridge, MA, pp. 802–808.
Noton, D., Stark, L., 1969. Eye movements and visual perception. IEEE Transactions on Man–Machine Systems 1, 1–9.
Reed, R.T., Hans Du Buf, J.M., 1993. A review of recent texture segmentation and feature extraction techniques. CVGIP: Image Processing 57 (3), 359–372.
Reisfeld, D., Wolfson, H., Yeshurun, Y., 1995. Context-free attentional operators: The generalized symmetry transform. International Journal of Computer Vision 14, 119–130.
Stark, L., Choi, Y., 1996. Experimental metaphysics: The scanpath as an epistemological mechanism. In: Zangemeister, W.H., Stiehl, H.S., Freksa, C. (Eds.), Visual Attention and Cognition. Elsevier, Amsterdam, pp. 3–69.
Stark, L., Ellis, S., 1981. Scanpaths revisited: Cognitive models direct active looking. In: Fisher, M., Senders (Eds.), Eye Movements, Cognitive and Visual Perception. Erlbaum, NJ, pp. 193–226.
Stark, L., Privitera, C., 1997. Top-down and bottom-up image processing. In: Proceedings of the IEEE International Conference on Neural Networks, Vol. 4. Houston, TX, 9–12 June, pp. 2294–2299.
