A New Line Symmetry Distance Based Automatic Clustering Technique: Application to Image Segmentation
Sriparna Saha, ^{1} Ujjwal Maulik ^{2}
^{1} Image Processing and Modeling, Interdisciplinary Center for Scientiﬁc Computing (IWR), University of Heidelberg, Heidelberg, Germany
^{2} Department of Theoretical Bioinformatics, DKFZ (Deutsches Krebsforschungszentrum, German Cancer Research Center), Heidelberg, Germany
Received 3 March 2009; accepted 16 May 2010
ABSTRACT: In this article, at ﬁrst an automatic clustering technique using the concept of line symmetry property is developed. The pro posed realcoded variable string length genetic clustering technique (VGALS clustering) is able to evolve the number of clusters present in the data set automatically. Here assignment of points to different clusters is done based on the line symmetry based distance rather than the Euclidean distance. The cluster centers are encoded in the chromosomes, whose value may vary. A newly developed line sym
metry based cluster validity index, LineSymindex, is used as a mea sure of ‘‘goodness’’ of the corresponding partitioning. This validity index is able to correctly indicate the presence of clusters of different sizes as long as they are line symmetrical. A Kdtree based data structure is used to reduce the complexity of computing the line sym metry distance. The proposed technique is then applied to automati cally segment different images. At ﬁrst, the superiority of the pro posed method to automatically segment the image data sets over Fuzzy Cmeans clustering technique, wellknown meanshift based method and GAPS clustering with Symindex based method, are demonstrated for three remote sensing satellite images. Thereafter it is applied on several simulated T1weighted, T2weighted, and pro ton density normal and MS lesion magnetic resonance brain images. The proposed method is able to detect most of the regions well. Su periority of the proposed method over Fuzzy Cmeans and Expecta tion Maximization clustering algorithms are demonstrated quantita tively. The automatic segmentation obtained by VGALS clustering technique is also compared with the available ground truth informa
tion. VV C
2011 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 21, 86–100,
2011; Published online in Wiley Online Library (wileyonlinelibrary.com). DOI
10.1002/ima.20243
Key words: unsupervised classiﬁcation; cluster validity index; symmetry; line symmetry based distance; Principal component analysis; Kd tree; magnetic resonance image
Correspondence to: Sriparna Saha; email: sriparna.saha@gmail.com
' 2011 Wiley Periodicals, Inc.
I. INTRODUCTION Remote sensing satellite images have signiﬁcant applications in dif ferent areas like climate studies, assessment of forest resources, examining marine environments, etc. For remote sensing applica tions, classiﬁcation is an important task where the pixels in the images are classiﬁed into homogeneous regions, each of which cor responds to some particular landcover type. The problem of pixel classiﬁcation is often posed as clustering (Saha and Bandyopad hyay, 2010) in the intensity space (Maulik and Bandyopadhyay, 2003). In the unsupervised pixel classiﬁcation framework, various clustering algorithms like Fuzzy Cmeans (Cannon et al., 1986), and statistical methods have been used for the purpose of satellite image segmentation. Recently application of genetic algorithms (GAs) in the ﬁeld of pixel classiﬁcation has obtained signiﬁcant attention of the researchers (Maulik and Bandyopadhyay, 2003; Saha and Bandyopadhyay, 2008). A contextsensitive clustering technique for unsupervised image segmentation based on graphcut initialization and expectationmaximization algorithm is developed by Tyagi et al., (2008). An unsupervised hyperspectral image classi ﬁcation technique based on fuzzyclustering algorithms that spa tially exploit membership relations is proposed by Bilgin et al., (2008). A multiresolution remote sensing image clustering tech nique is proposed by Wemmert et al., (2009) which uses informa tion contained in both spatial resolutions. An agglomerative hier archical clustering method for large multispectral images, which uses both spectral and spatial information for the aggregation deci sion, is proposed by Marcal and Castro, (2005). A new spectralspa tial classiﬁcation scheme for hyperspectral images which combines the results of a pixel wise support vector machine classiﬁcation and the segmentation map obtained by partitional clustering using ma jority voting is proposed by Tarabalka et al., (2009). Recently, a multiobjective fuzzy genetic clustering technique for pixel classiﬁ cation has been proposed by Bandyopadhyay et al., (2007). By Cha mundeeswari et al., (2007), an unsupervised classiﬁcation algorithm
using Maximum a posteriori (MAP) segmentation for SAR images is presented. Some more clustering techniques for satellite image segmentation can be found elsewhere (Hilbert, 1977; Kauth et al., 1977; Ghassemian and Landgrebe, 1988; Chen and Landgrebe, 1989; Sayood, 1992; Friedl et al., 2002; Pugh and Waxman, 2006; Chamundeeswari et al., 2007; Guo et al., 2009; Gandhi et al., accepted; Jaffar et al., accepted). Fully automatic brain tissue classiﬁcation from magnetic reso nance images (MRI) is also an important research issue. The accu rate segmentation of MR images into different tissue classes, like gray matter (GM), white matter (WM), and cerebrospinal ﬂuid (CSF), is an important task. Additionally, regional volume calcula tions sometimes also bring even more useful diagnostic information in diseases like Alzheimer disease, in movement disorders such as Parkinson or Parkinson related syndrome, in white matter metabolic or inﬂammatory disease, in congenital brain malformations or peri natal brain damage, or in post traumatic syndrome. The automatic segmentation of brain MR images, however, remains a difﬁcult problem. Clustering approaches have been widely used for segmen tation of MR brain images. The use of neural networks, evolution ary computation and/or fuzzy clustering techniques for MR image segmentation has been investigated in (Suckling et al., 1999; Bhan darkar and Zhang, 1999; Saha and Bandyopadhyay, 2007, 2009, accepted). To cluster a data set, some similarity or dissimilarity criteria has to be deﬁned. A new type of nonmetric distance, based on point symmetry, (d _{p}_{s} ), is proposed in (Bandyopadhyay and Saha, 2007). For reducing the complexity of point symmetry distance computa tion, Kdtree based data structure is used. From the geometrical symmetry viewpoint, point symmetry and line symmetry are two widely discussed issues. Inspired by this, a line symmetry based dis tance was proposed in (Saha and Bandyopadhyay, 2007). But the proposed distance had several drawbacks. The major shortcoming of the old line symmetry based distance was that its application is limited to twodimensional data sets only. A new line symmetry based technique using principal component analysis is developed in (Saha and Bandyopadhyay, accepted) that removes the limitations of (Saha and Bandyopadhyay, 2007). This new line symmetry based distance is more general and is applicable for any dimensional data sets. In (Saha and Bandyopadhyay, accepted), a genetic clustering technique is also developed based on this line symmetry distance. But this technique a priori assumes the number of clusters present in a data set. In this article, we have developed a line symmetry based automatic clustering technique. The motivation of this article is to develop a line symmetrybased automatic genetic clustering technique for image segmentation. In this article, a variable string length genetic line symmetry (VGALSclustering) based clustering technique is proposed which is then used to automatically segment the remote sensing satellite images. Centers of the clusters are encoded in the chromosome whose values vary over a certain range. Here assignment of points to different clusters is done based on a newly proposed line symme try based distance rather than the Euclidean distance. A new cluster validity index based on the newly developed linesymmetry based distance, LineSymindex, is proposed here and thereafter it is uti lized for computing the ﬁtness of the chromosomes. The effective ness of the proposed line symmetrybased automatic clustering technique (VGALSclustering) is shown for automatically partition ing Indian remote sensing (IRS) satellite images of the parts of the city of Kolkata and SPOT image of the part of the city of Kolkata.
Comparisons are made with those obtained by Fuzzy Cmeans clus tering technique, popular Meanshift based segmentation technique (Comaniciu and Meer, 2002) and GAPS clustering with Symindex based method (Bandyopadhyay and Saha, 2007). The segmentation results are compared qualitatively and quantitatively. The effectiveness of the proposed algorithm is thereafter shown in segmenting the MR images of the normal brain and MR brain images with multiple sclerosis lesions. The segmentation results are then compared with the available ground truth informa tion. For the purpose of comparison, the wellknown Fuzzy C means algorithm (Bezdek, 1973) and the Expectation Maximiza tion (EM) (Jain et al., 1999) clustering algorithm are also exe cuted, ﬁrstly with the number of clusters automatically deter mined by the VGALS clustering and then with the actual number of clusters present in the images. The segmentation results are compared with that provided by VGALS clustering algorithm quantitatively. In a part of the experiment, fuzzy variable string length genetic algorithm (FuzzyVGA) (Maulik and Bandyopadhyay, 2003) which uses the Euclidean distance for computing the membership values of points to different clusters is also executed on the MR brain images to automatically segment it. The results are also compared with those obtained by VGALS clustering technique. compared qualitatively and quantitatively.
II. SYMMETRY BASED DISTANCES
In this section, at ﬁrst the existing point symmetry based distance is described in brief. Then a new deﬁnition of the newly developed line symmetry based distance is proposed.
A. Existing Point Symmetry Based Distance. In this section,
a new PS distance (Bandyopadhyay and Saha, 2007), d _{p}_{s} (x,c), asso
ciated with point x with respect to a center c is described. Let a
point be x. The symmetrical (reﬂected) point of x with respect to a
particular centre c is 2 3 c 2 x. Let us denote this by x*. Let knear
unique nearest neighbors of x* be at Euclidean distances of d _{i} s, i 5
1,2,
knear.
Then
d _{p}_{s} ðx ; c Þ ¼ d _{s}_{y}_{m} ðx ; c Þ3d _{e} ðx ; c
Þ;
ð1Þ
¼ ^{P} knear
i¼1
d i
knear
3d _{e} ðx ; c
Þ;
ð2Þ
where d _{e} (x,c) is the Euclidean distance between the point x and c.
The complexity of computing d _{p}_{s} (x,c) is of order n, where n is the total number of data points. For all the n points and K clusters, the complexity becomes of order n ^{2} K. Thus, to reduce this, the Kd tree based nearest neighbor search ANNlib (Mount and Arya, 2005) (Approximate Nearest Neighbor), which is a library written in C11 (obtained from http://www.cs.umd.edu/ mount/ANN) is used in (Bandyopadhyay and Saha, 2007). ANNlib is used to ﬁnd d _{i} , i 5 1 to knear, in Eq. (2) efﬁciently.
B. Newly Developed Line Symmetry Based Distance (Saha and Bandyopadhyay, accepted). Given a particular data set, we ﬁrst ﬁnd the ﬁrst principal axis of this data set using Principal Component Analysis (Jolliffe, 1986). Let the eigen vec tor of the covariance matrix of the data set with highest eigen
eg _{d} ], where d is the dimension of the
value be [eg _{1} eg _{2} eg _{3} eg _{4}
Vol. 21, 86–100 (2011)
87
original data. Then the ﬁrst principal axis of the data set is given by:
ðx _{1} c _{1} Þ _{¼} ðx _{2} c _{2} Þ
eg _{1}
eg _{2}
¼
^{ð}^{x} ^{d} ^{} ^{c} ^{d} ^{Þ}
_{¼}
eg _{d}
where the center of the data set is c 5 {c _{1} , c _{2} ,
The obtained principal axis is treated as the symmetrical line of the relevant cluster, i.e., if the data set is indeed symmetrical then it should also be symmetric with respect to the ﬁrst principal axis of the dataset identiﬁed by the principal component analysis (PCA). This symmetrical line is used to measure the amount of line symme try of a particular point in that cluster. To measure the amount of
, c _{d} }.
line symmetry of a point (x) with respect to a particular line i, d _{l}_{s} (x, i), the following steps are followed.
1. For a particular data point x, calculate the projected point p _{i} on the relevant symmetrical line i.
2.
Find d _{s}_{y}_{m} (x,p _{i} ) as:
:d _{s}_{y}_{m} ðx ; p _{i} Þ ¼ P knear knear
i¼1
d i
ð3Þ
where knear unique nearest neighbors of x* 5 2 3 p _{i} 2 x are at
Euclidean distances of d _{i} s, i 5 1,2,
(Mount and Arya, 2005) utilizing Kdtree based nearest neigh bor search is used to reduce the complexity of computing these d _{i} s (as described in Section IIC). Then the amount of line sym
ANN library
knear.
metry of a particular point x with respect to the symmetrical line of cluster i, is calculated as:
:d _{l}_{s} ðx ; iÞ ¼ d _{s}_{y}_{m} ðx ; p _{i} Þ3d _{e} ðx ; c
Þ
ð4Þ
where c is the centroid of the particular cluster i and d _{e} (x,c) is
the Euclidean distance between the point x and c.
It can be seen from Eq. (3) that knear cannot be chosen equal to 1,
since if x* exists in the data set then d _{s}_{y}_{m} (x, p _{i} ) 5 0 and hence there
will be no impact of the Euclidean distance in the deﬁnition of d _{l}_{s} (x, i). On the contrary, large values of knear may not be suitable because it may underestimate the amount of symmetry of a point with respect to the ﬁrst principal axis. Here knear is chosen equal to 2. It may be noted that the proper value of knear largely depends on the distribu tion of the data set. A ﬁxed value of knear may have many draw backs. For instance, for very large clusters (with too many points), 2 neighbors may not be enough as it is very likely that a few neighbors would have a distance close to zero. On the other hand, clusters with too few points are more likely to be scattered, and the distance of the two neighbors may be too large. Thus, a proper choice of knear is an important issue that needs to be addressed in the future. Note that every point symmetric cluster is also line symmetric with respect to its central axis. Thus, the proposed distance is able to detect line symmetric as well as point symmetric clusters. The previ ous point symmetry based distance was only able to measure the amount of point symmetry of a point with respect to a cluster center. But the proposed line symmetry based distance measures total line symmetry with respect to the symmetrical line of a cluster. The above line symmetry based distance is realized more clearly
from Figure 1. Here x is a particular data set. c is the center of a par ticular cluster. The ﬁrst principal axis of the given cluster is denoted
88 Vol. 21, 86–100 (2011)
Figure 1. Example of line symmetry based distance [color ﬁgure
can
wileyonlinelibrary.com.].
the online issue, which is available at
be
viewed
in
by Line l. This line is treated as the symmetrical line of the particu
lar cluster. The projected point of x on this symmetrical line is
denoted by p. Let the reﬂected point of x with respect to p be
denoted by x ^{0} . The ﬁrst two nearest neighbors (here knear is chosen
equal to 2) of x ^{0} are at Euclidean distances of d _{1} and d _{2} , respec
tively. Then the total amount of symmetry of x with respect to the
projected
d _{s}_{y}_{m} ðx ; p Þ ¼ ^{d} ^{1} ^{þ}^{d} ^{2} . Therefore, the total line symmetry based distance
follows:
point
p
on
line
Line
l
is
calculated
as
of the point x with respect to the symmetrical line of cluster l is cal
culated as d _{l}_{s} (x,l) 5 d _{s}_{y}_{m} (x, p) 3 d _{e} (x, c) where d _{e} (x, c) is the Eu
clidean distance between the point x and the cluster center c. It is evident that the symmetrical distance computation is very time consuming because it involves the computation of the nearest
neighbors. Computation of d _{l}_{s} (x,i) is of complexity O(N). Hence for N points and K clusters, the complexity of computing the line sym
metry based distance between all points to different clusters is
O(N ^{2} K). To reduce the computational complexity, an approximate nearest neighbor search using the Kdtree approach is adopted in this article.
C. Kdtree Based Nearest Neighbor Computation. A K
dimensional tree, or Kdtree is a spacepartitioning data structure for organizing points in a Kdimensional space. A Kdtree uses only those splitting planes those are perpendicular to one of the coordinate axes. Approximate Nearest Neighbor (ANN) is a library written in C11 (Mount and Arya, 2005), which supports data structures and algorithms for both exact and approximate nearest neighbor searching in arbitrarily high dimensions. In this article
ANN library utilizing Kdtree for nearest neighbor search is used to
knear, in Eq. (3) efﬁciently. Thus, it
ﬁnd d _{i} s, where i 5 1,
, requires the construction of a Kdtree consisting of N points in the data set, where N is the size of the data set. The construction of Kd tree requires O(NlogN) time and O(N) space (Anderberg, 2000). (Friedman et al., 1977) reported O(logN) expected time for ﬁnding the nearest neighbor using Kdtree.
III. VGALSCLUSTERING: VARIABLE STRING LENGTH
GENETIC LINE SYMMETRY DISTANCE BASED CLUSTERING TECHNIQUE In this section, the use of variable string length genetic algorithm using the newly developed line symmetry based distance (VGALS clustering) is proposed for automatically evolving the number of clusters present in a data set. Here we have considered the best
partition to be the one that corresponds to the maximum value of the proposed LineSymindex which is deﬁned later. Both the num ber of clusters as well as the appropriate partitioning of the data are evolved simultaneously using the search capability of genetic algo rithms. Since the number of clusters is considered to be variable, the string lengths of different chromosomes in the same population are allowed to vary. As a consequence, the crossover and mutation operators are suitably modiﬁed to tackle the concept of variable length chromosomes. The technique is described below in detail.
A. String Representation and Population Initialization. In
VGALSclustering, the chromosomes are made up of real numbers which represent the coordinates of the centers of the partitions. If chromosome i encodes the centers of M _{i} clusters in N dimensional space then its length l _{i} is taken to be N * M _{i} . Each center is consid ered to be indivisible. Each string i in the population initially enco des the centers of a number, M _{i} , of clusters, such that M _{i} 5 (rand()mod M*) 1 2. Here, rand() is a function returning an integer, and M* is a soft estimate of the upper bound of the number of clus ters. The number of clusters will therefore range from two to M* 1 1. The M _{i} centers encoded in a chromosome are randomly selected distinct points from the data set. Thereafter ﬁve iterations of the K means algorithm is executed with the set of centers encoded in each chromosome. The resultant centers are used to replace the centers in the corresponding chromosomes. This makes the centers sepa rated initially. Example. Let M* 5 10. Let the random number M _{i} be equal to 4 for chromosome i. Then this chromosome will encode the centers of 4 clusters. Let the four cluster centers (four randomly chosen points from the data set) be (10.0, 5.0) (20.4, 13.2) (15.8, 2.9) (22.7, 17.7). Thus the chromosome may look like (20.4, 13.2) (15.8, 2.9) (10.0, 5.0) (22.7, 17.7).
B. Fitness Computation. This is composed of two steps. First,
assignment of n points to different clusters are done by using the
newly developed line symmetry based distance, d _{l}_{s} . Next, a new cluster validity index based on the newly developed line symmetry distance, LineSymindex, is computed and used as a measure of the ﬁtness of the chromosome.
(1) Computing the membership values. Let a particular chromo some encode centers of K number of clusters. At ﬁrst, the ﬁrst princi pal axis of each cluster is determined using principal component
analysis (Jolliffe, 1986). Then the assignment of each point x _{j} , j 5
1,2,
the cluster center nearest to x _{j} in the line symmetrical sense. That is,
N, to K different clusters are done in the following way. Find
we ﬁnd the cluster center k that is nearest to the input pattern x _{j} using
the minimumvalue criterion: k 5 argmin _{i}_{5}_{1}_{,}
d _{l}_{s} (x _{j}_{,}_{i} ) where the
line symmetry based distance d _{l}_{s} (x _{j}_{,}_{i} ) is computed by Eq. (4). If the
K
corresponding d _{s}_{y}_{m} (x _{j} , k) [as deﬁned in Eq. (3)] is smaller than a pre
speciﬁed parameter y, then assign that particular point x _{j} to kth clus ter. Otherwise assignment is done based on the minimum Euclidean distance criterion as normally used in (Bandyopadhyay and Maulik,
2002) or the Kmeans algorithm, i.e., assign x _{j} to kth cluster where k
5 argmin _{i}_{5}_{1} ,
cluster and d _{e} (x _{j} , c _{i} ) denotes the Euclidean distance between the point
K d _{e} (x _{j} , c _{i} ). Here, c _{i} denotes the center of the ith
x _{j} and the cluster center c _{i} . The reason for doing such an assignment is as follows: in the intermediate stages of the algorithm, when the centers are not yet properly evolved, then the minimum d _{l}_{s} value for a point is expected to be quite large, since the point might not be symmetric with respect to any cluster. In such cases, using
Euclidean distance for cluster assignment appears to be intuitively more appropriate. The value of y is kept equal to the maximum nearest neighbor distance among all the points in the data set as described in (Ban dyopadhyay and Saha, 2007, 2008). It is to be noted that if a point is indeed symmetric with respect to the principal axis of some clus ter center then the symmetrical distance computed in the above way will be small, and can be bounded as follows. Let d _{N}_{N} ^{m}^{a}^{x} be the maximum nearest neighbor distance in the data set. That is
d
max
NN
¼ max i¼1;
N
d _{N}_{N} ðx _{i} Þ;
ð5Þ
where d _{N}_{N} (x _{i} ) is the nearest neighbor distance of x _{i} . Assuming that
x* lies within the data space, it may be noted that
d 1 ^{d}
max
NN
2
and
d _{2} ^{3}^{d}
max
NN
2
;
ð6Þ
resulted in, ^{d} ^{1} ^{þ}^{d} ^{2} d
: Ideally, a point x is exactly symmetrical
with respect to the principal axis of some cluster if d _{1} 5 0. However
considering the uncertainty of the location of a point as the sphere
2
max
NN
of radius d _{N}_{N} ^{m}^{a}^{x} around x, we have kept the threshold y equals to d _{N}_{N} ^{m}^{a}^{x} . Thus the computation of y is automatic and does not require user intervention. After the assignments are done, the cluster centers encoded in the chromosome are replaced by the mean points of the respective clusters. This is referred to as the Kmeans like update center operation.
(2) Fitness calculation: The ﬁtness of a chromosome is com puted using a newly deﬁned cluster validity index, LineSymindex Note that this index is inspired by the point symmetry based cluster validity index Symindex (Saha and Bandyopadhyay, 2008; Ban
dyopadhyay and Saha, 2008). Let K cluster centres be denoted by c _{i} where 1 i K and n _{i} denotes the number of points present in the i th cluster. Then LineSymindex is deﬁned as follows:
ð7Þ
K ^{3} E 1 K
1
3D _{K} ;
LineSymðKÞ ¼
where K is the number of clusters present in that chromosome. Here,
_{i} c j :
E K ¼ ^{P}
i¼1
D _{K} is the maximum Euclidean distance between any two cluster
K
E _{i} such that E _{i} ¼ ^{P}
i
n
j¼1 ^{d} ls ^{ð}^{x} ^{} ^{i}
_{j} ; iÞ and D _{K} ¼ max ^{K}
i;j¼1 ^{c}
i
centers among all centers. Here x _{j} denotes the jth point of the ith
cluster. d _{l}_{s} (x _{j} , i) is computed by Eq. (4) where i denotes the symmet
rical line (ﬁrst principal axis) of the ith cluster. The objective is to maximize the LineSymindex in order to obtain the actual number of clusters and to achieve proper cluster ing. As formulated in Eq. (7), LineSym is a composition of three factors, these are 1/K, 1=E _{K} and D _{K} . The ﬁrst factor increases as K decreases; as LineSym needs to be maximized for optimal cluster ing, it will prefer to decrease the value of K. The second factor is the within cluster total line symmetrical distance. For clusters which have good symmetrical structure, E _{i} value is less. This, in turn, indi cates that formation of more number of clusters, which are symmet rical in shape, would be encouraged. Finally the third factor, D _{K} , measuring the maximum separation between a pair of clusters, increases with the value of K. As these three factors are comple mentary in nature, so they are expected to compete and balance each other critically for determining the proper partitioning.
i
Vol. 21, 86–100 (2011)
89
The ﬁtness function for chromosome j is deﬁned as LineSym _{j} , i.e., the LineSym index computed for partitioning encoded in that chromosome. The objective of the GA is to maximize this ﬁtness function.
C. Selection. Conventional proportional selection is applied on
the population of chromosomes. Here, a chromosome receives a number of copies that is proportional to its ﬁtness in the population. We have used roulette wheel strategy for implementing the propor tional selection scheme.
D. Crossover. For the purpose of crossover, the cluster centers
are considered to be indivisible, i.e., the crossover points can only lie in between two cluster centers. The crossover operation, applied stochastically with probability of crossover (l _{c} ), must ensure that information exchange takes place in such a way that both the off spring encode the centers of at least two clusters. For this, the oper ator is deﬁned as follows (Maulik and Bandyopadhyay, 2003): Let parent chromosomes P _{1} and P _{2} encode M _{1} and M _{2} cluster centers, respectively. s _{1} , the crossover point in P _{1} , is generated as s _{1} 5rand() mod M _{1} . Let s _{2} be the crossover point in P _{2} , and it may vary in between [LB(s _{2} ),UB(s _{2} )], where LB(s _{2} ) and UB(s _{2} ) indicate the lower and upper bounds of the range of s _{2} , respectively. LB(s _{2} )
and UB(s _{2} ) are given by LB (s _{2} ) 5 min[2,max[0,2 2 (M _{1} 2 s _{1} )]] and UB (s _{2} ) 5 [M _{2} 2 max[0,2 2 s _{1} ]]. Therefore s _{2} is given by
s _{2} ¼ LBðs _{2} Þ þ randðÞmodðUBðs _{2} Þ LBðs _{2} ÞÞ
ifðUBðs _{2} Þ LBðs _{2} ÞÞ;
¼ 0
otherwise:
It can be veriﬁed by some simple calculations that if the crossover points s _{1} and s _{2} are chosen according to the above rules, then none of the offsprings generated would have less than two clusters.
E. Mutation. Mutation is applied on each chromosome with prob
ability l _{m} . Mutation is of three types.
1. Each cluster center encoded in a chromosome is mutated with probability l _{m} in the following way. The cluster center is replaced with a random variable drawn from a Laplacian distribution, pðeÞ / e ^{} ^{j}^{e} ^{} ^{l}^{j} , where the scaling factor d sets the magnitude of perturbation. Here l is the value at the position which is to be perturbed. The scaling factor d is chosen equal to 1.0. The old value at the position is replaced with the newly generated value.
2. One randomly generated cluster center is removed from the chromosome, i.e., the total number of clusters encoded in the chromosome is decreased by 1.
3. The total number of clusters encoded in the chromosome is increased by 1. One randomly chosen point from the data set is encoded as the new cluster center.
d
Any one of the above mentioned types of mutation is applied randomly on a particular chromosome if it is selected for mutation.
F. Termination. In this article, we have executed the algorithm for a ﬁxed number of generations. Moreover, the elitist model of GAs has been used, where the best string seen so far is stored in a location within the population. The best string of the last generation provides the solution to the clustering problem.
90 Vol. 21, 86–100 (2011)
G. Complexity Analysis of VGALS Clustering Technique. Below we have analyzed the time complexity of the proposed VGALS clustering technique. Here N: total number of points in the data set, M* : Maximum possible number of clusters and d: dimension of the data.
As discussed above Kdtree data structure has been used in order to ﬁnd the nearest neighbor of a particular point. The construction of Kdtree requires O(NlogN) time and O(N) space (Anderberg, 2000). Initialization of GA needs O(Popsize 3 stringlength) time where Popsize and stringlength indicate the population size and the length of each chromosome in the GA, respectively. Note that stringlength is O(M* 3 d) where d is the dimension
of the data set and M* is the soft estimate of the upper bound of the number of clusters. Fitness Computation is composed of three steps.
1. In order to ﬁnd membership values of each point to all clus ter centers minimum line symmetrical distance of that point with respect to all clusters have to be calculated. At ﬁrst Principal Component Analysis (PCA) is applied to detect the ﬁrst Principal axis. This will take O(N 3 d ^{2} ) time. There after this ﬁrst Principal axis is used as the symmetrical line of the respective cluster. In order to determine line symmetry
based distance, Kdtree based nearest neighbor search is used. If the points are roughly uniformly distributed, then the expected case complexity is O(c ^{d} 1 logN), where c is a
constant depending on dimension and the point distribution. This is O(logN) if the dimension d is a constant (Bentley et al., 1980). (Friedman et al., 1977) also reported O(logN) expected time for ﬁnding the nearest neighbor. So in order to ﬁnd minimal symmetrical distance of a particular point, O(M*logN) time is needed. Thus total complexity of com puting membership values of N points to M* clusters is O(M*NlogN).
2. For updating the centers total complexity is O(M*).
3. Total complexity for computing the ﬁtness values is O(N 3 M*).
So the ﬁtness evaluation has total complexity5 O(Popsize 3 M*NlogN).
Selection step of the GA requires O(Popsize 3 stringlength) time. Mutation and Crossover require O(Popsize 3 stringlength) time each.
Thus summing up the above complexities, and considering string length N, total time complexity becomes O(M*NlogN 3 Popsize) per generation. For maximum Maxgen number of generations total complexity becomes O(M*NlogN 3 Popsize 3 Maxgen).
H. Space Complexity Analysis. The major space requirement of VGALS clustering is due to its population. Thus, the total space complexity of VGALS clustering is O(Popsize 3 Stringlength), i.e., O(Popsize 3 d 3 M*). Also for each population we have to keep a membership matrix of size N 3 M*. Thus total space compexity will be O(Popsize 3 N 3 M*) as N d.
IV EXPERIMENTAL RESULTS A. Results on Satellite Images. In this section at ﬁrst, the ex perimental results obtained after application of the above mentioned
Figure 2.
feature space.
VGALSclustering technique for segmenting two remote sensing satellite images of the parts of the cities of Kolkata are provided. The two satellite images are of sizes 512 3 512, i.e., the size of the data set to be clustered in all the images is 262,144. For these multi spectral satellite images, the feature vector is composed of the in tensity values at different bands of the image. The parameters of the proposed algorithm are as follows: population size520, number of generations520, probability of crossover50.8 and probability of mutation50.2. For the purpose of comparison, popular Fuzzy C means (FCM) (Bezdek, 1981) clustering algorithm, a recent method of satellite image segmentation proposed in (Bandyopadhyay and Saha, 2007) (GAPS clustering with Symindex based method), meanshift based segmentation technique (Comaniciu and Meer, 2002) are also executed on reallife images. The results are com pared both qualitatively and quantitatively.
Data distribution of IRS image of Kolkata in the ﬁrst three
B. IRS Image of Kolkata. The data used here was acquired from Indian Remote Sensing Satellite (IRS1A) using the LISSII sensor
Figure 3.
gram equalization.
IRS image of Kolkata in the near infra red band with histo
Figure 4. 
Clustered IRS image of Kolkata using VGALSclustering 
technique. 
that has a resolution of 36.25 3 36.25 h. The image is contained in four spectral bands namely, blue band of wavelength 0.45–0.52 lm, green band of wavelength 0.52–0.59 lm, red band of wavelength 0.62 – 0.68 lm, and near infra red band of wavelength 0.77 – 0.86 lm. Thus, the feature vector of each image pixel is composed of four intensity values at different bands. The distribution of the pix els in the ﬁrst three feature space of this image is shown in Figure 2. It can be easily seen from the Figure 2 that the entire data can be partitioned into several different shaped clusters where symmetry does exist. Figure 3 shows the Kolkata image in the near infra red band. Some characteristic regions in the image are the river Hooghly cut ting across the middle of the image, several ﬁsheries observed
Figure 5.
Clustered IRS image of Kolkata using FCM clustering.
Vol. 21, 86–100 (2011)
91
Figure 6.
clustering technique.
Clustered IRS image of Kolkata using meanshift based
toward the lowerright portion, a township, SaltLake, to the upper left hand side of the ﬁsheries. This township is bounded on the top by a canal. Two parallel lines observed towards the upper right hand side of the image correspond to the airstrips in the Dumdum airport. Other than these, there are several water bodies, roads, etc. in the image. From our ground knowledge, we know that the image has four clusters (Maulik and Bandyopadhyay, 2003) and these four clusters correspond to the classes turbid water, pond water, concrete and open space. The VGALSclustering technique automatically provides four clusters for this image data (Fig. 4). It may be noted that the water class has been differentiated into turbid water (the Hooghly) and pond water (ﬁsheries, etc.) because of a difference in their spectral properties. The canal bounding SaltLake from the upper portion has also been correctly classiﬁed as pond water. Figure 5 shows the Kolkata image partitioned in four clusters using FCM algorithm (Bezdek, 1973). But the segmentation result is unsatisfactory from the human visualization judgement. As can be seen, the river Hooghly as well as the city region has been incorrectly classiﬁed as belonging to the same class. Therefore we can conclude that although some regions, viz., ﬁsheries, canal bounding SaltLake, parts of the airstrip, etc., have been correctly identiﬁed, a signiﬁcant amount of confusion is evident in the FCM clustering result. GAPS clustering technique with Sym index based method automatically provides K 5 4 number of clusters from this data set. The corre sponding partitioning is shown in Figure 7. The automatically seg mented image after application of the meanshift based segmenta tion technique (Comaniciu and Meer, 2002) is shown in Figure 6.
Figure 7. Clustered IRS image of Kolkata using GAPS clustering technique with Symindex based method.
To validate the segmentation result obtained by VGALScluster ing quantitatively, here two wellknown Euclidean distance based cluster validity indices, namely I index (Maulik and Bandyopad hyay, 2002) and XBindex (Xie and Beni, 1991) values are also computed. These are provided in Table I. Smaller values of XB index and larger values of I index correspond to good clustering. The values again show that the segmentation provided by VGALS clustering is much better than the existing other methods.
C. SPOT Image of Kolkata. The French satellites SPOT (Sys tems Probataire d’Observation de la Terre) (Richards, 1993), launched in 1986 and 1990, carry two imaging devices that consist of a linear array of charge coupled device (CCD) detectors. Two imaging modes are possible, the multispectral and panchromatic modes. The 512 3 512 SPOT image of a part of the city of Kolkata is available in three bands in the multispectral mode. These bands are:
Band 1 — green band of wavelength 0.50 – 0.59 lm Band 2 — red band of wavelength 0.61 – 0.68 lm Band 3 — near infra red band of wavelength 0.79 – 0.89 lm.
Thus, here feature vector of each image pixel composed of three intensity values at different bands. The distribution of the pixels in the feature space of this image is shown in Figure 9. It can be easily seen from the Figure 9 that the entire data can be partitioned into several hyperspherical clusters.
Table 1. I index and XBindex values of the segmented Mumbai and Kolkata satellite images provided by VGALSclustering, FCM clustering, and method proposed in (Bandyopadhyay and Saha, 2007)
Kolkata IRS 
Mumbai IRS 
SPOT Kolkata 

Index 
VGALS 
FCM 
GAPS with Sym 
VGALS 
FCM 
GAPS with Sym 
VGALS 
FCM 
GAPS with Sym 
_{I} _{i}_{n}_{d}_{e}_{x} 
24.24 
5.71 
18.27 
214.78 
23.06 
180.45 
26.58 
5.71 
20.67 
XB index 
1.75 
23.67 
2.23 
2.12 
4.67 
2.91 
2.28 
12.23 
3.05 
92 Vol. 21, 86–100 (2011)
Figure 8.
histogram equalization.
SPOT image of Kolkata in the near infra red band with
Some important landcovers of Kolkata are present in the image. Most of these can be identiﬁed, from a knowledge about the area, more easily in the near infrared band of the input image (Fig. 8). These are the following: The prominent black stretch across the ﬁg ure is the river Hooghly. Portions of a bridge (referred to as the sec ond bridge), which was under construction when the picture was taken, protrude into the Hooghly near its bend around the center of the image. There are two distinct black, elongated patches below the river, on the left side of the image. These are water bodies, the one to the left being Garden Reach lake and the one to the right being Khidirpore dockyard. Just to the right of these water bodies, there is a very thin line, starting from the right bank of the river, and going to the bottom edge of the picture. This is a canal called the Talis nala. Above the Talis nala, on the right side of the picture, there is a triangular patch, the race course. On the top, right hand side of the image, there is a thin line, stretching from the top edge,
Figure 10.
ing technique.
Clustered SPOT image of Kolkata using VGALS cluster
and ending on the middle, left edge. This is the Beleghata canal with a road by its side. There are several roads on the right side of the image, near the middle and top portions. These are not very obvious from the images. A bridge cuts the river near the top of the image. This is referred to as the ﬁrst bridge. The proposed VGALS clustering method automatically provides K 5 7 as the optimal number of clusters for this image data set (cor responding partitioning is shown in Figure 10). As identiﬁed in (Pal et al., 2001) the above satellite image has seven classes namely, tur bid water, concrete, pure water, vegetation, habitation, open space and roads (including bridges). The partitioning provided by the
Figure 9. 
Data distribution of SPOT image of Kolkata in the feature 
Figure 11. 
Clustered SPOT image of Kolkata using FCM clustering 
space. 
technique. 
Vol. 21, 86–100 (2011)
93
Figure 12. Clustered SPOT image of Kolkata using meanshift based clustering technique.
VGALS clustering technique separates almost all the regions well. The Talis nala has been identiﬁed properly by the proposed method (shown in Figure 10). The bridge is also correctly identiﬁed by the proposed algorithm. This again shows that the proposed VGALS is able to detect clusters of widely varying sizes. The segmentation result obtained by Fuzzy Cmeans algorithm on this image for K 5 7 (actual number of clusters present in this image data set) is shown in Figure 11. It can be seen from Figure 11 that FCM algorithm is not able to detect the bridge. The automatically segmented image obtained after application of meanshift based segmentation tech nique (Comaniciu and Meer, 2002) is shown in Figure 12. Here
Figure 13. Clustered SPOT image of Kolkata using GAPS cluster ing technique with Symindex based method.
94 Vol. 21, 86–100 (2011)
Table II. Confusion matrix of the partitioning obtained by VGALS cluster ing technique for numeric SPOT Kolkata image. Here following notations are used: TW: turbid water, C: concrete, PW: pure water, V: vegetation, H: habitation, OS: open space and R: roads
Ground Truth (Percent)
Class 
TW 
C 
PW 
V 
H 
OS 
R 
TW 
87 
0 
13 
0 
0 
0 
0 
C 
0 
83 
0 
0 
0 
0 
1 
PW 
13 
0 
82 
0 
0 
0 
5 
V 
0 
0 
0 
83 
15 
3 
0 
H 
0 
12 
0 
10 
80 
10 
1 
OS 
0 
0 
0 
7 
4 
81 
4 
R 
0 
5 
5 
0 
1 
6 
89 
also the bridge has not been detected. The method proposed in (Saha and Bandyopadhyay, 2008) (GAPS along with Symindex) provides K 5 6 clusters from this data set. The corresponding parti tioning is shown in Figure 13. This technique is also able to detect the bridge correctly. In order to validate the segmentation result obtained by VGALSclustering quantitatively, here two wellknown Euclidean distance based cluster validity indices, namely I index (Maulik and Bandyopadhyay, 2002) and XBindex (Xie and Beni, 1991) values are also computed. These are provided in Table I. Smaller values of XBindex and larger values of I index correspond to good cluster ing. The values again show that the segmentation provided by VGALSclustering is much better than the existing clustering techniques. To validate the results, 932 pixel positions were manually selected from seven different land cover types which were labelled accordingly. The confusion matrix of the partitioning obtained by VGALS clustering technique for this data set is shown in Table II. The class by class classiﬁcation accuracies are 87, 83, 82, 83, 80, 81, and 89, respectively. The overall accuracy is 87.
D. Results on MR Brain Images. The MR images of the brain chosen for the experiments are available in three bands: T _{1}  weighted, proton density (p _{d} )weighted and T _{2} weighted. The nor mal brain images are obtained from Brainweb database (BrainWeb). The images correspond to the 1 mm slice thickness, 3% noise (calculated relative to the brightest tissue) and with 20% intensity nonuniformity. The image of size 217 3 181 is available in 181 different z planes. The proposed clustering algorithm is exe cuted on seven of these z planes. The parameters of the VGALS
Table III. Minkowski Scores (MS) Obtained by FCM, EM and VGALS clustering algorithms on simulated MR volumes for normal brain projected on different z planes. Here # AC, # OC denotes, respectively, the actual number of clusters and the automatically obtained number of clusters (after application of VGALS).
MS for AC
MS for OC
z Plane No. 
#AC 
FCM 
EM 
#OC 
FCM 
EM 
VGALS 
1 
6 
1.077 
1.052 
11 
0.80 
1.17 
0.77 
2 
6 
0.76 
0.78 
9 
0.65 
0.83 
0.62 
3 
6 
0.57 
0.76 
8 
0.62 
0.64 
0.59 
36 
9 
0.90 
0.98 
9 
0.90 
0.98 
0.84 
72 
10 
0.75 
0.74 
10 
0.75 
0.74 
0.70 
108 
9 
0.79 
0.589 
10 
0.81 
0.68 
0.701 
144 
9 
0.82 
0.72 
11 
0.49 
0.60 
0.32 
Figure 14.
(a) Original T1weighted MR image of the normal brain in z1 plane. (b) Segmentation obtained by VGALS clustering technique.
algorithm are as follows: population size520, total number of gen erations515, probability of crossover (l _{c} )50.9, probability of mutation (l _{m} )50.02. Number of clusters, K, is varied from 2 to 20. For the normal MR brain image, the ground truth information is available to us. There are a total of 10 classes present in the images. But the number of classes varies along the different z planes. Ten classes are Background, CSF, Grey Matter, White Matter, Fat, Mus cle/Skin, Skin, Skull, Glial Matter, and Connective. Table III shows the actual number of clusters and the number of clusters automati cally determined by the proposed VGALS clustering technique (af ter application on the above mentioned brain images projected on different z planes). To measure the segmentation solution quantita tively, we have also calculated Minkowski Score (MS) (BenHur and Guyon, 2003). This is a measure of the quality of a solution given the true clustering. Let T be the ‘‘true’’ solution and S the so lution we wish to measure. Denote by n _{1}_{1} the number of pairs of elements that are in the same cluster in both S and T. Denote by n _{0}_{1} the number of pairs that are in the same cluster only in S and not in T, and by n _{1}_{0} the number of pairs that are in the same cluster in T and not in S. Minkowski Score (MS) is then deﬁned as:
MSðT; SÞ ¼
. For MS, the optimum score is 0, with lower
scores being ‘‘better.’’ The MS scores obtained by VGALS cluster ing corresponding to the 7 brain images are also reported in Table III. For the purpose of comparison, we have executed Fuzzy C means (FCM) (Bezdek, 1973) and Expectation Maximization (EM) (Jain et al., 1999) algorithms on the above mentioned brain datasets with two different values of K. In the ﬁrst case, K is kept equal to the actual number of clusters that present in that particular plane. Next, it is set equal to that automatically determined by VGALS algorithm. The MS scores obtained by both the comparing algorithms are also reported in Table III for all the seven images. Results show that the MS scores corresponding to the partitionings provided by the VGALS clustering, in general, is the minimum among all the parti tions. This implies the superior performance of VGALS to automati cally detect the proper partitioning from MR normal brain images. Figures 14a, 15a, 16a, 17a, and 18a show the original MR normal brain images in T1 band projected on z1, z36, z72, z108, z144 planes, respectively. Figures 14b, 15b, 16b, 17b, and 18b show, respectively, the corresponding automatically segmented images obtained after application of VGALS clustering algorithm.
ﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ
q
n 01 þn 10 n 11 þn 10
Figure 15.
(a) Original T1weighted MR image of the normal brain in z36 plane. (b) Segmentation obtained by VGALS clustering technique.
Vol. 21, 86–100 (2011)
95
Figure 16. 
(a) Original T1weighted MR image of the normal brain in z72 plane. (b) Segmentation obtained by VGALS clustering technique. 


Figure 17. 
(a) Original T1weighted MR image of the normal brain in z108 plane. (b) Segmentation obtained by VGALS clustering technique. 


Figure 18. 
(a) Original T1weighted MR image of the normal brain in z144 plane. (b) Segmentation obtained by VGALS clustering technique. 
96 Vol. 21, 86–100 (2011)
Table IV. Minkowski Scores (MS) obtained by FCM, EM and VGALS clustering algorithms on simulated MR volumes for brain with multiple sclerosis lesions projected on different z planes. Here # AC, # OC denotes, respectively, the actual number of clusters and the automatically obtained number of clusters (after application of VGALS)
MS for AC
MS for OC
z Plane No. 
#AC 
FCM 
EM 
#OC 
FCM 
EM 
VGALS 
1 
6 
0.59 
0.59 
9 
0.80 
1.12 
0.77 
2 
6 
0.75 
0.76 
8 
0.66 
0.83 
0.66 
5 
6 
0.74 
0.76 
9 
0.69 
0.77 
0.75 
36 
9 
0.99 
1.01 
9 
0.99 
1.01 
0.88 
72 
11 
0.72 
0.63 
9 
0.77 
0.67 
0.75 
108 
10 
0.78 
0.58 
10 
0.80 
0.71 
0.71 
144 
9 
0.31 
0.76 
11 
0.89 
0.88 
0.88 
Next, the proposed algorithm is executed on some simulated MR volumes for brain with multiple sclerosis lesions obtained from (BrainWeb). These images are again available in 3 modalities T _{1}  weighted, proton density (p _{d} )weighted and T _{2} weighted. These images also correspond to 1 mm slice thickness, 3% noise (calcu lated relative to the brightest tissue) and with 20% intensity non uniformity. Now, the images contain a total of 11 classes. These are Background, CSF, Grey Matter, White Matter, Fat, Muscle/Skin, Skin, Skull, Glial Matter, Connective and MS Lesion. However, the number of classes varies along the z planes. The image is available in 181 different z planes. FuzzyVGAPS clustering algorithm is ex ecuted on the images projected on 7 different zplanes. The parame ters of the algorithm are same as the above. The ground truth infor mation is available to us. MS score (BenHur and Guyon, 2003) is calculated after application of VGALS clustering technique in order to measure the ‘‘goodness’’ of the solutions. Table IV shows the actual number of clusters present in the image, the obtained number of clusters and the ‘‘goodness’’ of the corresponding partitioning in terms of the MS Score after application of VGALS clustering algo rithm on these 7 different MS Lesion Brain images. For the purpose of comparison, Fuzzy Cmeans algorithm (Bezdek, 1973) and EM algorithm (Jain et al., 1999) are again executed on these images, ﬁrstly with the number of clusters automatically determined by the VGALS and then with the actual number of clusters present in the images. The MS scores of the corresponding partitionings are also
provided in Table IV. The results again show that the MS scores corresponding to the partitionings provided by the VGALS cluster ing is, in general, the minimum. This again reveals the effectiveness of the VGALS clustering in automatically segmenting the MR brain images with multiple sclerosis lesions. Figures 19a, 20a, 21a, 22a, and 23a show the original MS Lesion Brain image in T1 band pro jected on z2, z36, z72, z108 and z144 planes, respectively. Figures 19b, 20b, 21b, 22b, and 23b show, respectively, the corresponding automatically segmented images obtained after application of VGALS clustering algorithm. The calculated MS scores show that the proposed VGALS clustering performs well in segmenting both normal brain image and brain image with multiple sclerosis lesions. Further, experiments are carried out in order to establish the effectiveness of the line symmetry based distance in VGALS clus tering technique. VGALS clustering technique is now executed on the ﬁrst 10 z planes of the MR brain images with multiple sclerosis lesions. The corresponding MS scores are now provided in Table V. The results are then compared with those obtained by FuzzyVGA (Maulik and Bandyopadhyay, 2003) (fuzzy variable string length genetic algorithm based clustering technique), optimizing popular Euclidean distance based XieBeni (XB) index (Xie and Beni, 1991), where Euclidean distance is used to compute the member ship values of points to different clusters. The number of clusters and the MS scores obtained by FuzzyVGA after applying it on the ﬁrst 10 z planes of the MR brain images with multiple sclerosis
Figure 19.
clustering technique.
(a) Original T1weighted MRI image of the brain with multiple sclerosis lesions in z2 plane. (b) Segmentation obtained by VGALS
Vol. 21, 86–100 (2011)
97
Figure 20.
clustering technique.
(a) Original T1weighted MRI image of the brain with multiple sclerosis lesions in z36 plane. (b) Segmentation obtained by VGALS
Figure 21.
clustering technique.
(a) Original T1weighted MRI image of the brain with multiple sclerosis lesions in z72 plane. (b) Segmentation obtained by VGALS
Figure 22.
clustering technique.
(a) Original T1weighted MRI image of the brain with multiple sclerosis lesions in z108 plane. (b) Segmentation obtained by VGALS
98 Vol. 21, 86–100 (2011)
Figure 23.
clustering technique.
(a) Original T1weighted MRI image of the brain with multiple sclerosis lesions in z144 plane. (b) Segmentation obtained by VGALS
lesions are also provided in Table V. Results show that the proposed VGALS clustering technique is more effective than the Fuzzy VGA. This establish the fact that the line symmetry based distance is more effective in segmenting the MR brain images than the exist ing Euclidean distance.
V. DISCUSSION AND CONCLUSION
In this article, a new variable string length genetic algorithm based clustering technique is developed which utilizes a recently developed line symmetry based distance for assignment of points to different clusters. A new cluster validity index based on the line symmetry based distance is also developed here and thereafter it is utilized for computing the ﬁtness of the proposed genetic clustering technique. The proposed clustering technique can automatically determine the appropriate partitioning and the appropriate number of partitions from a given data set having line symmetrical clusters. The effective ness of the proposed technique is shown in detecting the proper parti tioning from two remote sensing satellite images of the parts of the cities of Kolkata. Results are compared with those obtained by the Fuzzy Cmeans clustering technique, Meanshift based method and GAPSclustering with Symindex based method (Bandyopadhyay and Saha, 2007). Thereafter, the effectiveness of the proposed algo
Table V. The automatically obtained cluster (OC) number and the corresponding Minkowski Scores (MS) after application of VGA And VGALS clustering algorithms on simulated MR volumes for brain with multiple sclerosis lesions projected on ﬁrst 10 z planes
VGA
VGALS
z Plane No. 
AC 
OC 
MS 
OC 
MS 
1 
6 
2 
1.21 
11 
0.77 
2 
6 
2 
1.20 
8 
0.77 
3 
6 
2 
1.19 
11 
0.78 
4 
6 
5 
0.69 
9 
0.77 
5 
6 
2 
1.184 
9 
0.75 
6 
6 
2 
1.18 
8 
0.71 
7 
6 
2 
1.17 
9 
0.78 
8 
6 
2 
1.16 
10 
0.73 
9 
6 
2 
1.16 
11 
0.84 
10 
9 
2 
1.17 
11 
0.83 
rithm is shown in segmenting several MR brain images. The segmen tation results are then compared with the available ground truth infor mation. For the purpose of comparison, the wellknown Fuzzy C means and EM algorithms are also executed on these images. Experi mental results show that VGALS clustering is not only able to auto matically segment the MR brain images into different tissue classes but the corresponding segmentation results are also the best. Note that present work does not use any spatial information while segmenting the images. But incorporation of spatial informa tion surely improves the quality of the results. Thus some new methods of incorporating the spatial information have to be invented in the future. Future work also includes the development of some fuzzy genetic clustering technique based on the line sym metry based distance. Developing some new form of symmetry, like plane symmetry etc. is also another important future research work. Authors are currently working in this direction.
REFERENCES
Mark de Berg, Marc van Kreveld, Mark Overmars, and Otfried Schwarz kopf. Computational Geometry: Algorithms and Applications. Springer Verlag, Berlin, Germany, 2nd edition, 2000.
S. Bandyopadhyay and U. Maulik, Genetic clustering for automatic evolution
of clusters and application to image classiﬁcation, Pattern Recogn (2002),
1197–1208.
S. Bandyopadhyay and S. Saha, GAPS: A clustering method using a new
point symmetry based distance measure, Pattern Recogn 40 (2007), 3451.
S. Bandyopadhyay and S. Saha, A point symmetry based clustering technique
for automatic evolution of clusters, IEEE Trans Knowl Data Eng 20 (2008),
1–17.
S. Bandyopadhyay, U. Maulik, and A. Mukhopadhyay, Multiobjective genetic clustering for pixel classiﬁcation in remote sensing imagery, IEEE Trans Geosci Remote Sens 45 (2007), 1506–1511.
A. BenHur and I. Guyon, Detecting Stable Clusters using Principal Compo
nent Analysis in Methods in Molecular Biology, M. Brownstein and A. Kohodursky, (Editors), Humana Press, 2003.
J.L. Bentley, B.W. Weide, and A.C. Yao, Optimal expectedtime algorithms for closest point problems, ACM Trans Math Software 6 (1980), 563–580.
J.C. Bezdek, Fuzzy mathematics in pattern classiﬁcation, Ph.D. dissertation, Cornell University, Ithaca, NY, 1973.
Vol. 21, 86–100 (2011)
99
J.C. Bezdek, Pattern recognition with fuzzy objective function algorithms, Plenum, New York, 1981.
S.M. Bhandarkar and H. Zhang, Image segmentation using evolutionary computation, IEEE Trans Evol Comp 3 (1999), 1–21.
G. Bilgin, S. Erturk, and T. Yildirim, Unsupervised classiﬁcation of hyper
spectralimage data using fuzzy approaches that spatially exploit member ship relations, IEEE Geosci Remote Sens Lett 5 (2008), 673–677.
BrainWeb: Simulated brain database. Available at: http://www.bic.mni. mcgill.ca/brainweb; http://www.bic.mni.mcgill.ca/brainweb/.
C.A. Cocosco, V. Kollokian, R.K.S. Kwan, A.C. Evans: ‘‘BrainWeb:
Online Interface to a 3D MRI Simulated Brain Database’’ NeuroImage, vol. 5, no. 4, part 2/4, S425, 1997 – Proceedings of 3rd International Confer ence on Functional Mapping of the Human Brain, Copenhagen, May, 1997.
R.L. Cannon, R. Dave, J.C. Bezdek, and M. Trivedi, Segmentation of a the matic mapper image using fuzzy cmeans clustering algorithm, IEEE Trans Geosci Remote Sens 24 (1986), 400–408.
V.V. Chamundeeswari, D. Singh, and K. Singh, Unsupervised land cover classiﬁcation of SAR images by contour tracing, Proc. of IEEE International Geoscience and Remote Sensing Symposium (IGARSS, 2007), Barcelona, Spain, July, 23–28, 2007, pp. 547–550.
C.C.T. Chen and D.A. Landgrebe, A spectral design system for the HIRIS/ MODIS era, IEEE Trans Geosci Remote Sens 27 (1989), 681–686.
D. Comaniciu and P. Meer, Mean shift: A robust approach toward feature
space analysis, IEEE Trans Pattern Anal Machine Intell 24 (2002), 603–619.
M.A. Friedl, D.K. McIver, J.C.F. Hodges, X.Y. Zhang, D. Muchoney, A.H. Strahler, C.E. Woodcock, S. Gopal, A. Schneider, A. Cooper, A. Baccini, F. Gao, C. Schaaf, Global land cover mapping from MODIS: Algorithms and early results, Remote Sens Environ 83 (2002), 287–302.
J.H. Friedman, J.L. Bently, and R.A. Finkel, An algorithm for ﬁnding best matches in logarithmic expected time, ACM Trans Math Software 3 (1977),
209–226.
V. Gandhi, J.M. Kang, S. Shekhar, J. Ju, E.D. Kolaczyk, and S. Gopal, Con
text inclusive function evaluation: a case study with EMbased multiscale multigranular image classiﬁcation. Knowl. Inf. Syst. 21, 2 (Oct. 2009), 231–247. DOI5http://dx.doi.org/10.1007/s1011500902080
H. Ghassemian and P.A. Landgrebe, Object oriented feature extraction
method for image data compaction, IEEE Control Syst Mag 8 (1988), 42–48.
D. Guo, H. Xiong, V. Atluri, and N.R. Adam, Object discovery in high
resolution remote sensing images: a semantic perspective, Knowl Inf Syst 19 (2009), 211–233. DOI=http://dx.doi.org/10.1007/s1011500801604.
E.E. Hilbert, Cluster compression algorithma joint clustering data compres sion concept, JPL Publication, NASA, Technical Report, 1977
M.A. Jaffar, A. Hussain, and A.M. Mirza, Fuzzy entropy based optimization
of clusters for the segmentation of lungs in CT scanned images. Knowl. Inf. Syst. 24, 1 (Jul. 2010), 91111. DOI=http://dx.doi.org/10.1007/s10115009
0225z.
A.K. Jain, M.N. Murthy, and P.J. Flynn, Data clustering: A review, ACM Comput Rev 31, 3 (Sep. 1999), 264–323. DOI=http://doi.acm.org/10.1145/
331499.331504.
I. Jolliffe, Principal component analysis, Springer Series in Statistics, Eng land, 1986.
R. Kauth, A. Pentland, and G. Thomas, BLOB: An unsupervised clustering
approach to spatial preprocessing of MSS imagery, Proceedings of 11th Int.
Symp. Remote Sensing of the Environment, Ann Arbor, Mich, 1977, 1309–
1317.
A. Marcal and L. Castro, Hierarchical clustering of multispectral images
using combined spectral and spatial criteria, IEEE Geosci Remote Sens Lett 2 (2005), 59–63.
100 Vol. 21, 86–100 (2011)
U. Maulik and S. Bandyopadhyay, Performance evaluation of some cluster
ing algorithms and validity indices, IEEE Trans Pattern Anal Machine Intell 24 (2002), 1650–1654.
U. Maulik and S. Bandyopadhyay, Fuzzy partitioning using a realcoded
variablelength genetic algorithm for pixel classiﬁcation, IEEE Trans Geosci
Remote Sens 41 (2003), 1075–1081.
D.M. Mount and S. Arya, ANN: A library for approximate nearest neighbor searching, 2005, Available at:http://www.cs.umd.edu/ mount/ANN.
S.K. Pal, S. Bandyopadhyay, and C.A. Murthy, Genetic classiﬁers for remotely sensed images: Comparison with standard methods, Int J Remote Sens 22 (2001), 2545–2569.
M.L. Pugh and A.M. Waxman, Classiﬁcation of spectrallysimilar land cover using multispectral neural image fusion and the fuzzy ARTMAP neu ral classiﬁer, Proc. of IEEE International Geoscience and Remote Sensing Symposium (IGARSS, 2006), Denver, Colorado, July 31, 2006–Aug. 4, 2006, pp. 1808–1811.
J.A. Richards, Remote sensing digital image analysis: An introduction, SpringerVerlag, New York, 1993.
S. Saha and S. Bandyopadhyay, MRI brain image segmentation by fuzzy
symmetry based genetic clustering technique, Proceedings of the 2007 IEEE Congress on Evolutionary Computation (CEC’07), Singapore, 2007a, pp.
4417–4424.
S. Saha and S. Bandyopadhyay, A genetic clustering technique using a new
line symmetry based distance measure, Proceedings of Fifth International Conference on Advanced Computing and Communications (ADCOM’07),
IEEE Computer Society, Guwahati, India, 2007b, pp. 365–370.
S. Saha and S. Bandyopadhyay, Application of a new symmetry based clus
ter validity index for satellite image segmentation, IEEE Geosci Remote Sens Lett 5 (2008), 166–170.
S. Saha and S. Bandyopadhyay, MR brain image segmentation using a
multiseed based automatic clustering technique, Fundam Inform 97 (2009),
199–214.
S. Saha and S. Bandyopadhyay, A new multiobjective clustering technique
based on the concepts of stability and symmetry, Knowl Inf Syst 23 (2010),
1–27.
S. Saha and S. Bandyopadhyay, Automatic MR brain image segmentation
using a multiseed based multiobjective clustering approach, Applied Intelli
gence. DOI: 10.1007/s1048901002316.
S. Saha and S. Bandyopadhyay, On principle axis based line symmetry clus
tering techniques, Memetic Computing. DOI: 10.1007/s1229301000490.
K. Sayood, Data compression in remote sensing applications, IEEE Geosci
Remote Sens Newslett 84 (1992), 7–15.
J. Suckling, T. Sigmundsson, K. Greenwood, and E. Bullmore, A modiﬁed fuzzy clustering algorithm for operator independent brain tissue classiﬁca tion of dual echo MR images, Magn Reson Imaging 17 (1999), 1065–1076.
Y. Tarabalka, J.A. Benediktsson, and J. Chanussot, Spectralspatial classiﬁ
cation of hyperspectral imagery based on partitional clustering techniques, IEEE Trans Geosci Remote Sensi 47 (2009), 2973–2987.
M. Tyagi, F. Bovolo, A. Mehra, S. Chaudhuri, and L. Bruzzone, A context
sensitive clustering technique based on graphcut initialization and expecta tionmaximization algorithm, IEEE Geosci Remote Sens Lett 5 (2008), 21–
25. 

C. 
Wemmert, A. Puissant, G. Forestier, and P. Gancarski, Multiresolution 
remote sensing image clustering, IEEE Geosci Remote Sens Lett 6 (2009),
533–537.
X.L. Xie and G. Beni, A validity measure for fuzzy clustering, IEEE Trans Pattern Anal Machine Intell 13 (1991), 841–847.
Copyright of International Journal of Imaging Systems & Technology is the property of John Wiley & Sons, Inc. and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.
Mult mai mult decât documente.
Descoperiți tot ce are Scribd de oferit, inclusiv cărți și cărți audio de la editori majori.
Anulați oricând.