Sunteți pe pagina 1din 16

A New Line Symmetry Distance Based Automatic Clustering Technique: Application to Image Segmentation

Sriparna Saha, 1 Ujjwal Maulik 2

1 Image Processing and Modeling, Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg, Heidelberg, Germany

2 Department of Theoretical Bioinformatics, DKFZ (Deutsches Krebsforschungszentrum, German Cancer Research Center), Heidelberg, Germany

Received 3 March 2009; accepted 16 May 2010

ABSTRACT: In this article, at first an automatic clustering technique using the concept of line symmetry property is developed. The pro- posed real-coded variable string length genetic clustering technique (VGALS clustering) is able to evolve the number of clusters present in the data set automatically. Here assignment of points to different clusters is done based on the line symmetry based distance rather than the Euclidean distance. The cluster centers are encoded in the chromosomes, whose value may vary. A newly developed line sym-

metry based cluster validity index, LineSym-index, is used as a mea- sure of ‘‘goodness’’ of the corresponding partitioning. This validity index is able to correctly indicate the presence of clusters of different sizes as long as they are line symmetrical. A Kd-tree based data structure is used to reduce the complexity of computing the line sym- metry distance. The proposed technique is then applied to automati- cally segment different images. At first, the superiority of the pro- posed method to automatically segment the image data sets over Fuzzy C-means clustering technique, well-known mean-shift based method and GAPS clustering with Sym-index based method, are demonstrated for three remote sensing satellite images. Thereafter it is applied on several simulated T1-weighted, T2-weighted, and pro- ton density normal and MS lesion magnetic resonance brain images. The proposed method is able to detect most of the regions well. Su- periority of the proposed method over Fuzzy C-means and Expecta- tion Maximization clustering algorithms are demonstrated quantita- tively. The automatic segmentation obtained by VGALS clustering technique is also compared with the available ground truth informa-

tion. VV C

2011 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 21, 86–100,

2011; Published online in Wiley Online Library (wileyonlinelibrary.com). DOI

10.1002/ima.20243

Key words: unsupervised classification; cluster validity index; symmetry; line symmetry based distance; Principal component analysis; Kd tree; magnetic resonance image

Correspondence to: Sriparna Saha; e-mail: sriparna.saha@gmail.com

' 2011 Wiley Periodicals, Inc.

I. INTRODUCTION Remote sensing satellite images have significant applications in dif- ferent areas like climate studies, assessment of forest resources, examining marine environments, etc. For remote sensing applica- tions, classification is an important task where the pixels in the images are classified into homogeneous regions, each of which cor- responds to some particular landcover type. The problem of pixel classification is often posed as clustering (Saha and Bandyopad- hyay, 2010) in the intensity space (Maulik and Bandyopadhyay, 2003). In the unsupervised pixel classification framework, various clustering algorithms like Fuzzy C-means (Cannon et al., 1986), and statistical methods have been used for the purpose of satellite image segmentation. Recently application of genetic algorithms (GAs) in the field of pixel classification has obtained significant attention of the researchers (Maulik and Bandyopadhyay, 2003; Saha and Bandyopadhyay, 2008). A context-sensitive clustering technique for unsupervised image segmentation based on graph-cut initialization and expectation-maximization algorithm is developed by Tyagi et al., (2008). An unsupervised hyperspectral image classi- fication technique based on fuzzy-clustering algorithms that spa- tially exploit membership relations is proposed by Bilgin et al., (2008). A multiresolution remote sensing image clustering tech- nique is proposed by Wemmert et al., (2009) which uses informa- tion contained in both spatial resolutions. An agglomerative hier- archical clustering method for large multispectral images, which uses both spectral and spatial information for the aggregation deci- sion, is proposed by Marcal and Castro, (2005). A new spectral-spa- tial classification scheme for hyperspectral images which combines the results of a pixel wise support vector machine classification and the segmentation map obtained by partitional clustering using ma- jority voting is proposed by Tarabalka et al., (2009). Recently, a multiobjective fuzzy genetic clustering technique for pixel classifi- cation has been proposed by Bandyopadhyay et al., (2007). By Cha- mundeeswari et al., (2007), an unsupervised classification algorithm

using Maximum a posteriori (MAP) segmentation for SAR images is presented. Some more clustering techniques for satellite image segmentation can be found elsewhere (Hilbert, 1977; Kauth et al., 1977; Ghassemian and Landgrebe, 1988; Chen and Landgrebe, 1989; Sayood, 1992; Friedl et al., 2002; Pugh and Waxman, 2006; Chamundeeswari et al., 2007; Guo et al., 2009; Gandhi et al., accepted; Jaffar et al., accepted). Fully automatic brain tissue classification from magnetic reso- nance images (MRI) is also an important research issue. The accu- rate segmentation of MR images into different tissue classes, like gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF), is an important task. Additionally, regional volume calcula- tions sometimes also bring even more useful diagnostic information in diseases like Alzheimer disease, in movement disorders such as Parkinson or Parkinson related syndrome, in white matter metabolic or inflammatory disease, in congenital brain malformations or peri- natal brain damage, or in post traumatic syndrome. The automatic segmentation of brain MR images, however, remains a difficult problem. Clustering approaches have been widely used for segmen- tation of MR brain images. The use of neural networks, evolution- ary computation and/or fuzzy clustering techniques for MR image segmentation has been investigated in (Suckling et al., 1999; Bhan- darkar and Zhang, 1999; Saha and Bandyopadhyay, 2007, 2009, accepted). To cluster a data set, some similarity or dissimilarity criteria has to be defined. A new type of nonmetric distance, based on point symmetry, (d ps ), is proposed in (Bandyopadhyay and Saha, 2007). For reducing the complexity of point symmetry distance computa- tion, Kd-tree based data structure is used. From the geometrical symmetry viewpoint, point symmetry and line symmetry are two widely discussed issues. Inspired by this, a line symmetry based dis- tance was proposed in (Saha and Bandyopadhyay, 2007). But the proposed distance had several drawbacks. The major shortcoming of the old line symmetry based distance was that its application is limited to two-dimensional data sets only. A new line symmetry based technique using principal component analysis is developed in (Saha and Bandyopadhyay, accepted) that removes the limitations of (Saha and Bandyopadhyay, 2007). This new line symmetry based distance is more general and is applicable for any dimensional data sets. In (Saha and Bandyopadhyay, accepted), a genetic clustering technique is also developed based on this line symmetry distance. But this technique a priori assumes the number of clusters present in a data set. In this article, we have developed a line symmetry based automatic clustering technique. The motivation of this article is to develop a line symmetry-based automatic genetic clustering technique for image segmentation. In this article, a variable string length genetic line symmetry (VGALS-clustering) based clustering technique is proposed which is then used to automatically segment the remote sensing satellite images. Centers of the clusters are encoded in the chromosome whose values vary over a certain range. Here assignment of points to different clusters is done based on a newly proposed line symme- try based distance rather than the Euclidean distance. A new cluster validity index based on the newly developed line-symmetry based distance, LineSym-index, is proposed here and thereafter it is uti- lized for computing the fitness of the chromosomes. The effective- ness of the proposed line symmetry-based automatic clustering technique (VGALS-clustering) is shown for automatically partition- ing Indian remote sensing (IRS) satellite images of the parts of the city of Kolkata and SPOT image of the part of the city of Kolkata.

Comparisons are made with those obtained by Fuzzy C-means clus- tering technique, popular Mean-shift based segmentation technique (Comaniciu and Meer, 2002) and GAPS clustering with Sym-index based method (Bandyopadhyay and Saha, 2007). The segmentation results are compared qualitatively and quantitatively. The effectiveness of the proposed algorithm is thereafter shown in segmenting the MR images of the normal brain and MR brain images with multiple sclerosis lesions. The segmentation results are then compared with the available ground truth informa- tion. For the purpose of comparison, the well-known Fuzzy C- means algorithm (Bezdek, 1973) and the Expectation Maximiza- tion (EM) (Jain et al., 1999) clustering algorithm are also exe- cuted, firstly with the number of clusters automatically deter- mined by the VGALS clustering and then with the actual number of clusters present in the images. The segmentation results are compared with that provided by VGALS clustering algorithm quantitatively. In a part of the experiment, fuzzy variable string length genetic algorithm (Fuzzy-VGA) (Maulik and Bandyopadhyay, 2003) which uses the Euclidean distance for computing the membership values of points to different clusters is also executed on the MR brain images to automatically segment it. The results are also compared with those obtained by VGALS clustering technique. compared qualitatively and quantitatively.

II. SYMMETRY BASED DISTANCES

In this section, at first the existing point symmetry based distance is described in brief. Then a new definition of the newly developed line symmetry based distance is proposed.

A. Existing Point Symmetry Based Distance. In this section,

a new PS distance (Bandyopadhyay and Saha, 2007), d ps (x,c), asso-

ciated with point x with respect to a center c is described. Let a

point be x. The symmetrical (reflected) point of x with respect to a

particular centre c is 2 3 c 2 x. Let us denote this by x*. Let knear

unique nearest neighbors of x* be at Euclidean distances of d i s, i 5

1,2,

knear.

Then

d ps ðx ; c Þ ¼ d sym ðx ; c Þ3d e ðx ; c

Þ;

ð1Þ

¼ P knear

i¼1

d i

knear

3d e ðx ; c

Þ;

ð2Þ

where d e (x,c) is the Euclidean distance between the point x and c.

The complexity of computing d ps (x,c) is of order n, where n is the total number of data points. For all the n points and K clusters, the complexity becomes of order n 2 K. Thus, to reduce this, the Kd- tree based nearest neighbor search ANNlib (Mount and Arya, 2005) (Approximate Nearest Neighbor), which is a library written in C11 (obtained from http://www.cs.umd.edu/ mount/ANN) is used in (Bandyopadhyay and Saha, 2007). ANNlib is used to find d i , i 5 1 to knear, in Eq. (2) efficiently.

B. Newly Developed Line Symmetry Based Distance (Saha and Bandyopadhyay, accepted). Given a particular data set, we first find the first principal axis of this data set using Principal Component Analysis (Jolliffe, 1986). Let the eigen vec- tor of the co-variance matrix of the data set with highest eigen

eg d ], where d is the dimension of the

value be [eg 1 eg 2 eg 3 eg 4

Vol. 21, 86–100 (2011)

87

original data. Then the first principal axis of the data set is given by:

ðx 1 c 1 Þ ¼ ðx 2 c 2 Þ

eg 1

eg 2

¼

ðx d c d Þ

¼

eg d

where the center of the data set is c 5 {c 1 , c 2 ,

The obtained principal axis is treated as the symmetrical line of the relevant cluster, i.e., if the data set is indeed symmetrical then it should also be symmetric with respect to the first principal axis of the dataset identified by the principal component analysis (PCA). This symmetrical line is used to measure the amount of line symme- try of a particular point in that cluster. To measure the amount of

, c d }.

line symmetry of a point (x) with respect to a particular line i, d ls (x, i), the following steps are followed.

1. For a particular data point x, calculate the projected point p i on the relevant symmetrical line i.

2.

Find d sym (x,p i ) as:

:d sym ðx ; p i Þ ¼ P knear knear

i¼1

d i

ð3Þ

where knear unique nearest neighbors of x* 5 2 3 p i 2 x are at

Euclidean distances of d i s, i 5 1,2,

(Mount and Arya, 2005) utilizing Kd-tree based nearest neigh- bor search is used to reduce the complexity of computing these d i s (as described in Section II-C). Then the amount of line sym-

ANN library

knear.

metry of a particular point x with respect to the symmetrical line of cluster i, is calculated as:

:d ls ðx ; iÞ ¼ d sym ðx ; p i Þ3d e ðx ; c

Þ

ð4Þ

where c is the centroid of the particular cluster i and d e (x,c) is

the Euclidean distance between the point x and c.

It can be seen from Eq. (3) that knear cannot be chosen equal to 1,

since if x* exists in the data set then d sym (x, p i ) 5 0 and hence there

will be no impact of the Euclidean distance in the definition of d ls (x, i). On the contrary, large values of knear may not be suitable because it may underestimate the amount of symmetry of a point with respect to the first principal axis. Here knear is chosen equal to 2. It may be noted that the proper value of knear largely depends on the distribu- tion of the data set. A fixed value of knear may have many draw- backs. For instance, for very large clusters (with too many points), 2 neighbors may not be enough as it is very likely that a few neighbors would have a distance close to zero. On the other hand, clusters with too few points are more likely to be scattered, and the distance of the two neighbors may be too large. Thus, a proper choice of knear is an important issue that needs to be addressed in the future. Note that every point symmetric cluster is also line symmetric with respect to its central axis. Thus, the proposed distance is able to detect line symmetric as well as point symmetric clusters. The previ- ous point symmetry based distance was only able to measure the amount of point symmetry of a point with respect to a cluster center. But the proposed line symmetry based distance measures total line symmetry with respect to the symmetrical line of a cluster. The above line symmetry based distance is realized more clearly

from Figure 1. Here x is a particular data set. c is the center of a par- ticular cluster. The first principal axis of the given cluster is denoted

88 Vol. 21, 86–100 (2011)

of the given cluster is denoted 88 Vol. 21, 86–100 (2011) Figure 1. Example of line

Figure 1. Example of line symmetry based distance [color figure

can

wileyonlinelibrary.com.].

the online issue, which is available at

be

viewed

in

by Line l. This line is treated as the symmetrical line of the particu-

lar cluster. The projected point of x on this symmetrical line is

denoted by p. Let the reflected point of x with respect to p be

denoted by x 0 . The first two nearest neighbors (here knear is chosen

equal to 2) of x 0 are at Euclidean distances of d 1 and d 2 , respec-

tively. Then the total amount of symmetry of x with respect to the

projected

d sym ðx ; p Þ ¼ d 1 þd 2 . Therefore, the total line symmetry based distance

follows:

point

p

2
2

on

line

Line

l

is

calculated

as

of the point x with respect to the symmetrical line of cluster l is cal-

culated as d ls (x,l) 5 d sym (x, p) 3 d e (x, c) where d e (x, c) is the Eu-

clidean distance between the point x and the cluster center c. It is evident that the symmetrical distance computation is very time consuming because it involves the computation of the nearest

neighbors. Computation of d ls (x,i) is of complexity O(N). Hence for N points and K clusters, the complexity of computing the line sym-

metry based distance between all points to different clusters is

O(N 2 K). To reduce the computational complexity, an approximate nearest neighbor search using the Kd-tree approach is adopted in this article.

C. Kd-tree Based Nearest Neighbor Computation. A K-

dimensional tree, or Kd-tree is a space-partitioning data structure for organizing points in a K-dimensional space. A Kd-tree uses only those splitting planes those are perpendicular to one of the coordinate axes. Approximate Nearest Neighbor (ANN) is a library written in C11 (Mount and Arya, 2005), which supports data structures and algorithms for both exact and approximate nearest neighbor searching in arbitrarily high dimensions. In this article

ANN library utilizing Kd-tree for nearest neighbor search is used to

knear, in Eq. (3) efficiently. Thus, it

find d i s, where i 5 1,

, requires the construction of a Kd-tree consisting of N points in the data set, where N is the size of the data set. The construction of Kd- tree requires O(NlogN) time and O(N) space (Anderberg, 2000). (Friedman et al., 1977) reported O(logN) expected time for finding the nearest neighbor using Kd-tree.

III. VGALS-CLUSTERING: VARIABLE STRING LENGTH

GENETIC LINE SYMMETRY DISTANCE BASED CLUSTERING TECHNIQUE In this section, the use of variable string length genetic algorithm using the newly developed line symmetry based distance (VGALS- clustering) is proposed for automatically evolving the number of clusters present in a data set. Here we have considered the best

partition to be the one that corresponds to the maximum value of the proposed LineSym-index which is defined later. Both the num- ber of clusters as well as the appropriate partitioning of the data are evolved simultaneously using the search capability of genetic algo- rithms. Since the number of clusters is considered to be variable, the string lengths of different chromosomes in the same population are allowed to vary. As a consequence, the crossover and mutation operators are suitably modified to tackle the concept of variable length chromosomes. The technique is described below in detail.

A. String Representation and Population Initialization. In

VGALS-clustering, the chromosomes are made up of real numbers which represent the coordinates of the centers of the partitions. If chromosome i encodes the centers of M i clusters in N dimensional space then its length l i is taken to be N * M i . Each center is consid- ered to be indivisible. Each string i in the population initially enco- des the centers of a number, M i , of clusters, such that M i 5 (rand()mod M*) 1 2. Here, rand() is a function returning an integer, and M* is a soft estimate of the upper bound of the number of clus- ters. The number of clusters will therefore range from two to M* 1 1. The M i centers encoded in a chromosome are randomly selected distinct points from the data set. Thereafter five iterations of the K- means algorithm is executed with the set of centers encoded in each chromosome. The resultant centers are used to replace the centers in the corresponding chromosomes. This makes the centers sepa- rated initially. Example. Let M* 5 10. Let the random number M i be equal to 4 for chromosome i. Then this chromosome will encode the centers of 4 clusters. Let the four cluster centers (four randomly chosen points from the data set) be (10.0, 5.0) (20.4, 13.2) (15.8, 2.9) (22.7, 17.7). Thus the chromosome may look like (20.4, 13.2) (15.8, 2.9) (10.0, 5.0) (22.7, 17.7).

B. Fitness Computation. This is composed of two steps. First,

assignment of n points to different clusters are done by using the

newly developed line symmetry based distance, d ls . Next, a new cluster validity index based on the newly developed line symmetry distance, LineSym-index, is computed and used as a measure of the fitness of the chromosome.

(1) Computing the membership values. Let a particular chromo- some encode centers of K number of clusters. At first, the first princi- pal axis of each cluster is determined using principal component

analysis (Jolliffe, 1986). Then the assignment of each point x j , j 5

1,2,

the cluster center nearest to x j in the line symmetrical sense. That is,

N, to K different clusters are done in the following way. Find

we find the cluster center k that is nearest to the input pattern x j using

the minimum-value criterion: k 5 argmin i51,

d ls (x j,i ) where the

line symmetry based distance d ls (x j,i ) is computed by Eq. (4). If the

K

corresponding d sym (x j , k) [as defined in Eq. (3)] is smaller than a pre-

specified parameter y, then assign that particular point x j to kth clus- ter. Otherwise assignment is done based on the minimum Euclidean distance criterion as normally used in (Bandyopadhyay and Maulik,

2002) or the K-means algorithm, i.e., assign x j to kth cluster where k

5 argmin i51 ,

cluster and d e (x j , c i ) denotes the Euclidean distance between the point

K d e (x j , c i ). Here, c i denotes the center of the ith

x j and the cluster center c i . The reason for doing such an assignment is as follows: in the intermediate stages of the algorithm, when the centers are not yet properly evolved, then the minimum d ls value for a point is expected to be quite large, since the point might not be symmetric with respect to any cluster. In such cases, using

Euclidean distance for cluster assignment appears to be intuitively more appropriate. The value of y is kept equal to the maximum nearest neighbor distance among all the points in the data set as described in (Ban- dyopadhyay and Saha, 2007, 2008). It is to be noted that if a point is indeed symmetric with respect to the principal axis of some clus- ter center then the symmetrical distance computed in the above way will be small, and can be bounded as follows. Let d NN max be the maximum nearest neighbor distance in the data set. That is

d

max

NN

¼ max i¼1;

N

d NN ðx i Þ;

ð5Þ

where d NN (x i ) is the nearest neighbor distance of x i . Assuming that

x* lies within the data space, it may be noted that

d 1 d

max

NN

2

and

d 2 3d

max

NN

2

;

ð6Þ

resulted in, d 1 þd 2 d

: Ideally, a point x is exactly symmetrical

with respect to the principal axis of some cluster if d 1 5 0. However

considering the uncertainty of the location of a point as the sphere

2

max

NN

of radius d NN max around x, we have kept the threshold y equals to d NN max . Thus the computation of y is automatic and does not require user intervention. After the assignments are done, the cluster centers encoded in the chromosome are replaced by the mean points of the respective clusters. This is referred to as the K-means like update center operation.

(2) Fitness calculation: The fitness of a chromosome is com- puted using a newly defined cluster validity index, LineSym-index Note that this index is inspired by the point symmetry based cluster validity index Sym-index (Saha and Bandyopadhyay, 2008; Ban-

dyopadhyay and Saha, 2008). Let K cluster centres be denoted by c i where 1 i K and n i denotes the number of points present in the i th cluster. Then LineSym-index is defined as follows:

ð7Þ

K 3 E 1 K

1

3D K ;

LineSymðKÞ ¼

where K is the number of clusters present in that chromosome. Here,

i c j :

E K ¼ P

i¼1

D K is the maximum Euclidean distance between any two cluster

K

E i such that E i ¼ P

i

n

j¼1 d ls ðx i

j ; iÞ and D K ¼ max K

i;j¼1 c

i

centers among all centers. Here x j denotes the jth point of the ith

cluster. d ls (x j , i) is computed by Eq. (4) where i denotes the symmet-

rical line (first principal axis) of the ith cluster. The objective is to maximize the LineSym-index in order to obtain the actual number of clusters and to achieve proper cluster- ing. As formulated in Eq. (7), LineSym is a composition of three factors, these are 1/K, 1=E K and D K . The first factor increases as K decreases; as LineSym needs to be maximized for optimal cluster- ing, it will prefer to decrease the value of K. The second factor is the within cluster total line symmetrical distance. For clusters which have good symmetrical structure, E i value is less. This, in turn, indi- cates that formation of more number of clusters, which are symmet- rical in shape, would be encouraged. Finally the third factor, D K , measuring the maximum separation between a pair of clusters, increases with the value of K. As these three factors are comple- mentary in nature, so they are expected to compete and balance each other critically for determining the proper partitioning.

i

Vol. 21, 86–100 (2011)

89

The fitness function for chromosome j is defined as LineSym j , i.e., the LineSym -index computed for partitioning encoded in that chromosome. The objective of the GA is to maximize this fitness function.

C. Selection. Conventional proportional selection is applied on

the population of chromosomes. Here, a chromosome receives a number of copies that is proportional to its fitness in the population. We have used roulette wheel strategy for implementing the propor- tional selection scheme.

D. Crossover. For the purpose of crossover, the cluster centers

are considered to be indivisible, i.e., the crossover points can only lie in between two cluster centers. The crossover operation, applied stochastically with probability of crossover (l c ), must ensure that information exchange takes place in such a way that both the off- spring encode the centers of at least two clusters. For this, the oper- ator is defined as follows (Maulik and Bandyopadhyay, 2003): Let parent chromosomes P 1 and P 2 encode M 1 and M 2 cluster centers, respectively. s 1 , the crossover point in P 1 , is generated as s 1 5rand() mod M 1 . Let s 2 be the crossover point in P 2 , and it may vary in between [LB(s 2 ),UB(s 2 )], where LB(s 2 ) and UB(s 2 ) indicate the lower and upper bounds of the range of s 2 , respectively. LB(s 2 )

and UB(s 2 ) are given by LB (s 2 ) 5 min[2,max[0,2 2 (M 1 2 s 1 )]] and UB (s 2 ) 5 [M 2 2 max[0,2 2 s 1 ]]. Therefore s 2 is given by

s 2 ¼ LBðs 2 Þ þ randðÞmodðUBðs 2 Þ LBðs 2 ÞÞ

ifðUBðs 2 Þ LBðs 2 ÞÞ;

¼ 0

otherwise:

It can be verified by some simple calculations that if the crossover points s 1 and s 2 are chosen according to the above rules, then none of the offsprings generated would have less than two clusters.

E. Mutation. Mutation is applied on each chromosome with prob-

ability l m . Mutation is of three types.

1. Each cluster center encoded in a chromosome is mutated with probability l m in the following way. The cluster center is replaced with a random variable drawn from a Laplacian distribution, pðeÞ / e je lj , where the scaling factor d sets the magnitude of perturbation. Here l is the value at the position which is to be perturbed. The scaling factor d is chosen equal to 1.0. The old value at the position is replaced with the newly generated value.

2. One randomly generated cluster center is removed from the chromosome, i.e., the total number of clusters encoded in the chromosome is decreased by 1.

3. The total number of clusters encoded in the chromosome is increased by 1. One randomly chosen point from the data set is encoded as the new cluster center.

d

Any one of the above mentioned types of mutation is applied randomly on a particular chromosome if it is selected for mutation.

F. Termination. In this article, we have executed the algorithm for a fixed number of generations. Moreover, the elitist model of GAs has been used, where the best string seen so far is stored in a location within the population. The best string of the last generation provides the solution to the clustering problem.

90 Vol. 21, 86–100 (2011)

G. Complexity Analysis of VGALS Clustering Technique. Below we have analyzed the time complexity of the proposed VGALS clustering technique. Here N: total number of points in the data set, M* : Maximum possible number of clusters and d: dimension of the data.

As discussed above Kd-tree data structure has been used in order to find the nearest neighbor of a particular point. The construction of Kd-tree requires O(NlogN) time and O(N) space (Anderberg, 2000). Initialization of GA needs O(Popsize 3 stringlength) time where Popsize and stringlength indicate the population size and the length of each chromosome in the GA, respectively. Note that stringlength is O(M* 3 d) where d is the dimension

of the data set and M* is the soft estimate of the upper bound of the number of clusters. Fitness Computation is composed of three steps.

1. In order to find membership values of each point to all clus- ter centers minimum line symmetrical distance of that point with respect to all clusters have to be calculated. At first Principal Component Analysis (PCA) is applied to detect the first Principal axis. This will take O(N 3 d 2 ) time. There- after this first Principal axis is used as the symmetrical line of the respective cluster. In order to determine line symmetry

based distance, Kd-tree based nearest neighbor search is used. If the points are roughly uniformly distributed, then the expected case complexity is O(c d 1 logN), where c is a

constant depending on dimension and the point distribution. This is O(logN) if the dimension d is a constant (Bentley et al., 1980). (Friedman et al., 1977) also reported O(logN) expected time for finding the nearest neighbor. So in order to find minimal symmetrical distance of a particular point, O(M*logN) time is needed. Thus total complexity of com- puting membership values of N points to M* clusters is O(M*NlogN).

2. For updating the centers total complexity is O(M*).

3. Total complexity for computing the fitness values is O(N 3 M*).

So the fitness evaluation has total complexity5 O(Popsize 3 M*NlogN).

Selection step of the GA requires O(Popsize 3 stringlength) time. Mutation and Crossover require O(Popsize 3 stringlength) time each.

Thus summing up the above complexities, and considering string- length N, total time complexity becomes O(M*NlogN 3 Popsize) per generation. For maximum Maxgen number of generations total complexity becomes O(M*NlogN 3 Popsize 3 Maxgen).

H. Space Complexity Analysis. The major space requirement of VGALS clustering is due to its population. Thus, the total space complexity of VGALS clustering is O(Popsize 3 Stringlength), i.e., O(Popsize 3 d 3 M*). Also for each population we have to keep a membership matrix of size N 3 M*. Thus total space compexity will be O(Popsize 3 N 3 M*) as N d.

IV EXPERIMENTAL RESULTS A. Results on Satellite Images. In this section at first, the ex- perimental results obtained after application of the above mentioned

Figure 2. feature space. VGALS-clustering technique for segmenting two remote sensing satellite images of the

Figure 2.

feature space.

VGALS-clustering technique for segmenting two remote sensing satellite images of the parts of the cities of Kolkata are provided. The two satellite images are of sizes 512 3 512, i.e., the size of the data set to be clustered in all the images is 262,144. For these multi- spectral satellite images, the feature vector is composed of the in- tensity values at different bands of the image. The parameters of the proposed algorithm are as follows: population size520, number of generations520, probability of crossover50.8 and probability of mutation50.2. For the purpose of comparison, popular Fuzzy C- means (FCM) (Bezdek, 1981) clustering algorithm, a recent method of satellite image segmentation proposed in (Bandyopadhyay and Saha, 2007) (GAPS clustering with Sym-index based method), mean-shift based segmentation technique (Comaniciu and Meer, 2002) are also executed on real-life images. The results are com- pared both qualitatively and quantitatively.

Data distribution of IRS image of Kolkata in the first three

B. IRS Image of Kolkata. The data used here was acquired from Indian Remote Sensing Satellite (IRS-1A) using the LISS-II sensor

Remote Sensing Satellite (IRS-1A) using the LISS-II sensor Figure 3. gram equalization. IRS image of Kolkata

Figure 3.

gram equalization.

IRS image of Kolkata in the near infra red band with histo-

IRS image of Kolkata in the near infra red band with histo- Figure 4. Clustered IRS

Figure 4.

Clustered IRS image of Kolkata using VGALS-clustering

technique.

that has a resolution of 36.25 3 36.25 h. The image is contained in four spectral bands namely, blue band of wavelength 0.45–0.52 lm, green band of wavelength 0.52–0.59 lm, red band of wavelength 0.62 – 0.68 lm, and near infra red band of wavelength 0.77 – 0.86 lm. Thus, the feature vector of each image pixel is composed of four intensity values at different bands. The distribution of the pix- els in the first three feature space of this image is shown in Figure 2. It can be easily seen from the Figure 2 that the entire data can be partitioned into several different shaped clusters where symmetry does exist. Figure 3 shows the Kolkata image in the near infra red band. Some characteristic regions in the image are the river Hooghly cut- ting across the middle of the image, several fisheries observed

across the middle of the image, several fisheries observed Figure 5. Clustered IRS image of Kolkata

Figure 5.

Clustered IRS image of Kolkata using FCM clustering.

Vol. 21, 86–100 (2011)

91

Figure 6. clustering technique. Clustered IRS image of Kolkata using mean-shift based toward the lower-right

Figure 6.

clustering technique.

Clustered IRS image of Kolkata using mean-shift based

toward the lower-right portion, a township, SaltLake, to the upper- left hand side of the fisheries. This township is bounded on the top by a canal. Two parallel lines observed towards the upper right- hand side of the image correspond to the airstrips in the Dumdum airport. Other than these, there are several water bodies, roads, etc. in the image. From our ground knowledge, we know that the image has four clusters (Maulik and Bandyopadhyay, 2003) and these four clusters correspond to the classes turbid water, pond water, concrete and open space. The VGALS-clustering technique automatically provides four clusters for this image data (Fig. 4). It may be noted that the water class has been differentiated into turbid water (the Hooghly) and pond water (fisheries, etc.) because of a difference in their spectral properties. The canal bounding SaltLake from the upper portion has also been correctly classified as pond water. Figure 5 shows the Kolkata image partitioned in four clusters using FCM algorithm (Bezdek, 1973). But the segmentation result is unsatisfactory from the human visualization judgement. As can be seen, the river Hooghly as well as the city region has been incorrectly classified as belonging to the same class. Therefore we can conclude that although some regions, viz., fisheries, canal bounding SaltLake, parts of the airstrip, etc., have been correctly identified, a significant amount of confusion is evident in the FCM clustering result. GAPS clustering technique with Sym index based method automatically provides K 5 4 number of clusters from this data set. The corre- sponding partitioning is shown in Figure 7. The automatically seg- mented image after application of the mean-shift based segmenta- tion technique (Comaniciu and Meer, 2002) is shown in Figure 6.

technique (Comaniciu and Meer, 2002) is shown in Figure 6. Figure 7. Clustered IRS image of

Figure 7. Clustered IRS image of Kolkata using GAPS clustering technique with Sym-index based method.

To validate the segmentation result obtained by VGALS-cluster- ing quantitatively, here two well-known Euclidean distance based cluster validity indices, namely I index (Maulik and Bandyopad- hyay, 2002) and XB-index (Xie and Beni, 1991) values are also computed. These are provided in Table I. Smaller values of XB- index and larger values of I index correspond to good clustering. The values again show that the segmentation provided by VGALS- clustering is much better than the existing other methods.

C. SPOT Image of Kolkata. The French satellites SPOT (Sys- tems Probataire d’Observation de la Terre) (Richards, 1993), launched in 1986 and 1990, carry two imaging devices that consist of a linear array of charge coupled device (CCD) detectors. Two imaging modes are possible, the multispectral and panchromatic modes. The 512 3 512 SPOT image of a part of the city of Kolkata is available in three bands in the multispectral mode. These bands are:

Band 1 — green band of wavelength 0.50 – 0.59 lm Band 2 — red band of wavelength 0.61 – 0.68 lm Band 3 — near infra red band of wavelength 0.79 – 0.89 lm.

Thus, here feature vector of each image pixel composed of three intensity values at different bands. The distribution of the pixels in the feature space of this image is shown in Figure 9. It can be easily seen from the Figure 9 that the entire data can be partitioned into several hyperspherical clusters.

Table 1. I index and XB-index values of the segmented Mumbai and Kolkata satellite images provided by VGALS-clustering, FCM- clustering, and method proposed in (Bandyopadhyay and Saha, 2007)

 

Kolkata IRS

 

Mumbai IRS

 

SPOT Kolkata

Index

VGALS

FCM

GAPS with Sym

VGALS

FCM

GAPS with Sym

VGALS

FCM

GAPS with Sym

I index

24.24

5.71

18.27

214.78

23.06

180.45

26.58

5.71

20.67

XB index

1.75

23.67

2.23

2.12

4.67

2.91

2.28

12.23

3.05

92 Vol. 21, 86–100 (2011)

Figure 8. histogram equalization. SPOT image of Kolkata in the near infra red band with

Figure 8.

histogram equalization.

SPOT image of Kolkata in the near infra red band with

Some important landcovers of Kolkata are present in the image. Most of these can be identified, from a knowledge about the area, more easily in the near infra-red band of the input image (Fig. 8). These are the following: The prominent black stretch across the fig- ure is the river Hooghly. Portions of a bridge (referred to as the sec- ond bridge), which was under construction when the picture was taken, protrude into the Hooghly near its bend around the center of the image. There are two distinct black, elongated patches below the river, on the left side of the image. These are water bodies, the one to the left being Garden Reach lake and the one to the right being Khidirpore dockyard. Just to the right of these water bodies, there is a very thin line, starting from the right bank of the river, and going to the bottom edge of the picture. This is a canal called the Talis nala. Above the Talis nala, on the right side of the picture, there is a triangular patch, the race course. On the top, right hand side of the image, there is a thin line, stretching from the top edge,

image, there is a thin line, stretching from the top edge, Figure 10. ing technique. Clustered
image, there is a thin line, stretching from the top edge, Figure 10. ing technique. Clustered

Figure 10.

ing technique.

Clustered SPOT image of Kolkata using VGALS cluster-

and ending on the middle, left edge. This is the Beleghata canal with a road by its side. There are several roads on the right side of the image, near the middle and top portions. These are not very obvious from the images. A bridge cuts the river near the top of the image. This is referred to as the first bridge. The proposed VGALS clustering method automatically provides K 5 7 as the optimal number of clusters for this image data set (cor- responding partitioning is shown in Figure 10). As identified in (Pal et al., 2001) the above satellite image has seven classes namely, tur- bid water, concrete, pure water, vegetation, habitation, open space and roads (including bridges). The partitioning provided by the

roads (including bridges). The partitioning provided by the Figure 9. Data distribution of SPOT image of

Figure 9.

Data distribution of SPOT image of Kolkata in the feature

Figure 11.

Clustered SPOT image of Kolkata using FCM clustering

space.

technique.

Vol. 21, 86–100 (2011)

93

Figure 12. Clustered SPOT image of Kolkata using mean-shift based clustering technique. VGALS clustering technique

Figure 12. Clustered SPOT image of Kolkata using mean-shift based clustering technique.

VGALS clustering technique separates almost all the regions well. The Talis nala has been identified properly by the proposed method (shown in Figure 10). The bridge is also correctly identified by the proposed algorithm. This again shows that the proposed VGALS is able to detect clusters of widely varying sizes. The segmentation result obtained by Fuzzy C-means algorithm on this image for K 5 7 (actual number of clusters present in this image data set) is shown in Figure 11. It can be seen from Figure 11 that FCM algorithm is not able to detect the bridge. The automatically segmented image obtained after application of mean-shift based segmentation tech- nique (Comaniciu and Meer, 2002) is shown in Figure 12. Here

nique (Comaniciu and Meer, 2002) is shown in Figure 12. Here Figure 13. Clustered SPOT image

Figure 13. Clustered SPOT image of Kolkata using GAPS cluster- ing technique with Sym-index based method.

94 Vol. 21, 86–100 (2011)

Table II. Confusion matrix of the partitioning obtained by VGALS cluster- ing technique for numeric SPOT Kolkata image. Here following notations are used: TW: turbid water, C: concrete, PW: pure water, V: vegetation, H: habitation, OS: open space and R: roads

Ground Truth (Percent)

Class

TW

C

PW

V

H

OS

R

TW

87

0

13

0

0

0

0

C

0

83

0

0

0

0

1

PW

13

0

82

0

0

0

5

V

0

0

0

83

15

3

0

H

0

12

0

10

80

10

1

OS

0

0

0

7

4

81

4

R

0

5

5

0

1

6

89

also the bridge has not been detected. The method proposed in (Saha and Bandyopadhyay, 2008) (GAPS along with Sym-index) provides K 5 6 clusters from this data set. The corresponding parti- tioning is shown in Figure 13. This technique is also able to detect the bridge correctly. In order to validate the segmentation result obtained by VGALS-clustering quantitatively, here two well-known Euclidean distance based cluster validity indices, namely I index (Maulik and Bandyopadhyay, 2002) and XB-index (Xie and Beni, 1991) values are also computed. These are provided in Table I. Smaller values of XB-index and larger values of I index correspond to good cluster- ing. The values again show that the segmentation provided by VGALS-clustering is much better than the existing clustering techniques. To validate the results, 932 pixel positions were manually selected from seven different land cover types which were labelled accordingly. The confusion matrix of the partitioning obtained by VGALS clustering technique for this data set is shown in Table II. The class by class classification accuracies are 87, 83, 82, 83, 80, 81, and 89, respectively. The overall accuracy is 87.

D. Results on MR Brain Images. The MR images of the brain chosen for the experiments are available in three bands: T 1 - weighted, proton density (p d )-weighted and T 2 -weighted. The nor- mal brain images are obtained from Brainweb database (BrainWeb). The images correspond to the 1 mm slice thickness, 3% noise (calculated relative to the brightest tissue) and with 20% intensity nonuniformity. The image of size 217 3 181 is available in 181 different z planes. The proposed clustering algorithm is exe- cuted on seven of these z planes. The parameters of the VGALS

Table III. Minkowski Scores (MS) Obtained by FCM, EM and VGALS clustering algorithms on simulated MR volumes for normal brain projected on different z planes. Here # AC, # OC denotes, respectively, the actual number of clusters and the automatically obtained number of clusters (after application of VGALS).

MS for AC

MS for OC

z Plane No.

#AC

FCM

EM

#OC

FCM

EM

VGALS

1

6

1.077

1.052

11

0.80

1.17

0.77

2

6

0.76

0.78

9

0.65

0.83

0.62

3

6

0.57

0.76

8

0.62

0.64

0.59

36

9

0.90

0.98

9

0.90

0.98

0.84

72

10

0.75

0.74

10

0.75

0.74

0.70

108

9

0.79

0.589

10

0.81

0.68

0.701

144

9

0.82

0.72

11

0.49

0.60

0.32

Figure 14. (a) Original T1-weighted MR image of the normal brain in z1 plane. (b)

Figure 14.

(a) Original T1-weighted MR image of the normal brain in z1 plane. (b) Segmentation obtained by VGALS clustering technique.

algorithm are as follows: population size520, total number of gen- erations515, probability of crossover (l c )50.9, probability of mutation (l m )50.02. Number of clusters, K, is varied from 2 to 20. For the normal MR brain image, the ground truth information is available to us. There are a total of 10 classes present in the images. But the number of classes varies along the different z planes. Ten classes are Background, CSF, Grey Matter, White Matter, Fat, Mus- cle/Skin, Skin, Skull, Glial Matter, and Connective. Table III shows the actual number of clusters and the number of clusters automati- cally determined by the proposed VGALS clustering technique (af- ter application on the above mentioned brain images projected on different z planes). To measure the segmentation solution quantita- tively, we have also calculated Minkowski Score (MS) (Ben-Hur and Guyon, 2003). This is a measure of the quality of a solution given the true clustering. Let T be the ‘‘true’’ solution and S the so- lution we wish to measure. Denote by n 11 the number of pairs of elements that are in the same cluster in both S and T. Denote by n 01 the number of pairs that are in the same cluster only in S and not in T, and by n 10 the number of pairs that are in the same cluster in T and not in S. Minkowski Score (MS) is then defined as:

MSðT; SÞ ¼

. For MS, the optimum score is 0, with lower

scores being ‘‘better.’’ The MS scores obtained by VGALS cluster- ing corresponding to the 7 brain images are also reported in Table III. For the purpose of comparison, we have executed Fuzzy C- means (FCM) (Bezdek, 1973) and Expectation Maximization (EM) (Jain et al., 1999) algorithms on the above mentioned brain datasets with two different values of K. In the first case, K is kept equal to the actual number of clusters that present in that particular plane. Next, it is set equal to that automatically determined by VGALS algorithm. The MS scores obtained by both the comparing algorithms are also reported in Table III for all the seven images. Results show that the MS scores corresponding to the partitionings provided by the VGALS clustering, in general, is the minimum among all the parti- tions. This implies the superior performance of VGALS to automati- cally detect the proper partitioning from MR normal brain images. Figures 14a, 15a, 16a, 17a, and 18a show the original MR normal brain images in T1 band projected on z1, z36, z72, z108, z144 planes, respectively. Figures 14b, 15b, 16b, 17b, and 18b show, respectively, the corresponding automatically segmented images obtained after application of VGALS clustering algorithm.

ffiffiffiffiffiffiffiffiffiffiffiffi

q

n 01 þn 10 n 11 þn 10

q n 01 þ n 10 n 11 þ n 10 Figure 15. (a) Original T1-weighted

Figure 15.

(a) Original T1-weighted MR image of the normal brain in z36 plane. (b) Segmentation obtained by VGALS clustering technique.

Vol. 21, 86–100 (2011)

95

Figure 16. (a) Original T1-weighted MR image of the normal brain in z72 plane. (b)

Figure 16.

(a) Original T1-weighted MR image of the normal brain in z72 plane. (b) Segmentation obtained by VGALS clustering technique.

(b) Segmentation obtained by VGALS clustering technique. Figure 17. (a) Original T1-weighted MR image of the

Figure 17.

(a) Original T1-weighted MR image of the normal brain in z108 plane. (b) Segmentation obtained by VGALS clustering technique.

(b) Segmentation obtained by VGALS clustering technique. Figure 18. (a) Original T1-weighted MR image of the

Figure 18.

(a) Original T1-weighted MR image of the normal brain in z144 plane. (b) Segmentation obtained by VGALS clustering technique.

96 Vol. 21, 86–100 (2011)

Table IV. Minkowski Scores (MS) obtained by FCM, EM and VGALS clustering algorithms on simulated MR volumes for brain with multiple sclerosis lesions projected on different z planes. Here # AC, # OC denotes, respectively, the actual number of clusters and the automatically obtained number of clusters (after application of VGALS)

MS for AC

MS for OC

z Plane No.

#AC

FCM

EM

#OC

FCM

EM

VGALS

1

6

0.59

0.59

9

0.80

1.12

0.77

2

6

0.75

0.76

8

0.66

0.83

0.66

5

6

0.74

0.76

9

0.69

0.77

0.75

36

9

0.99

1.01

9

0.99

1.01

0.88

72

11

0.72

0.63

9

0.77

0.67

0.75

108

10

0.78

0.58

10

0.80

0.71

0.71

144

9

0.31

0.76

11

0.89

0.88

0.88

Next, the proposed algorithm is executed on some simulated MR volumes for brain with multiple sclerosis lesions obtained from (BrainWeb). These images are again available in 3 modalities T 1 - weighted, proton density (p d )-weighted and T 2 -weighted. These images also correspond to 1 mm slice thickness, 3% noise (calcu- lated relative to the brightest tissue) and with 20% intensity non- uniformity. Now, the images contain a total of 11 classes. These are Background, CSF, Grey Matter, White Matter, Fat, Muscle/Skin, Skin, Skull, Glial Matter, Connective and MS Lesion. However, the number of classes varies along the z planes. The image is available in 181 different z planes. Fuzzy-VGAPS clustering algorithm is ex- ecuted on the images projected on 7 different z-planes. The parame- ters of the algorithm are same as the above. The ground truth infor- mation is available to us. MS score (Ben-Hur and Guyon, 2003) is calculated after application of VGALS clustering technique in order to measure the ‘‘goodness’’ of the solutions. Table IV shows the actual number of clusters present in the image, the obtained number of clusters and the ‘‘goodness’’ of the corresponding partitioning in terms of the MS Score after application of VGALS clustering algo- rithm on these 7 different MS Lesion Brain images. For the purpose of comparison, Fuzzy C-means algorithm (Bezdek, 1973) and EM algorithm (Jain et al., 1999) are again executed on these images, firstly with the number of clusters automatically determined by the VGALS and then with the actual number of clusters present in the images. The MS scores of the corresponding partitionings are also

provided in Table IV. The results again show that the MS scores corresponding to the partitionings provided by the VGALS cluster- ing is, in general, the minimum. This again reveals the effectiveness of the VGALS clustering in automatically segmenting the MR brain images with multiple sclerosis lesions. Figures 19a, 20a, 21a, 22a, and 23a show the original MS Lesion Brain image in T1 band pro- jected on z2, z36, z72, z108 and z144 planes, respectively. Figures 19b, 20b, 21b, 22b, and 23b show, respectively, the corresponding automatically segmented images obtained after application of VGALS clustering algorithm. The calculated MS scores show that the proposed VGALS clustering performs well in segmenting both normal brain image and brain image with multiple sclerosis lesions. Further, experiments are carried out in order to establish the effectiveness of the line symmetry based distance in VGALS clus- tering technique. VGALS clustering technique is now executed on the first 10 z planes of the MR brain images with multiple sclerosis lesions. The corresponding MS scores are now provided in Table V. The results are then compared with those obtained by Fuzzy-VGA (Maulik and Bandyopadhyay, 2003) (fuzzy variable string length genetic algorithm based clustering technique), optimizing popular Euclidean distance based Xie-Beni (XB) index (Xie and Beni, 1991), where Euclidean distance is used to compute the member- ship values of points to different clusters. The number of clusters and the MS scores obtained by Fuzzy-VGA after applying it on the first 10 z planes of the MR brain images with multiple sclerosis

10 z planes of the MR brain images with multiple sclerosis Figure 19. clustering technique. (a)

Figure 19.

clustering technique.

(a) Original T1-weighted MRI image of the brain with multiple sclerosis lesions in z2 plane. (b) Segmentation obtained by VGALS

Vol. 21, 86–100 (2011)

97

Figure 20. clustering technique. (a) Original T1-weighted MRI image of the brain with multiple sclerosis

Figure 20.

clustering technique.

(a) Original T1-weighted MRI image of the brain with multiple sclerosis lesions in z36 plane. (b) Segmentation obtained by VGALS

lesions in z36 plane. (b) Segmentation obtained by VGALS Figure 21. clustering technique. (a) Original T1-weighted

Figure 21.

clustering technique.

(a) Original T1-weighted MRI image of the brain with multiple sclerosis lesions in z72 plane. (b) Segmentation obtained by VGALS

lesions in z72 plane. (b) Segmentation obtained by VGALS Figure 22. clustering technique. (a) Original T1-weighted

Figure 22.

clustering technique.

(a) Original T1-weighted MRI image of the brain with multiple sclerosis lesions in z108 plane. (b) Segmentation obtained by VGALS

98 Vol. 21, 86–100 (2011)

Figure 23. clustering technique. (a) Original T1-weighted MRI image of the brain with multiple sclerosis

Figure 23.

clustering technique.

(a) Original T1-weighted MRI image of the brain with multiple sclerosis lesions in z144 plane. (b) Segmentation obtained by VGALS

lesions are also provided in Table V. Results show that the proposed VGALS clustering technique is more effective than the Fuzzy- VGA. This establish the fact that the line symmetry based distance is more effective in segmenting the MR brain images than the exist- ing Euclidean distance.

V. DISCUSSION AND CONCLUSION

In this article, a new variable string length genetic algorithm based clustering technique is developed which utilizes a recently developed line symmetry based distance for assignment of points to different clusters. A new cluster validity index based on the line symmetry based distance is also developed here and thereafter it is utilized for computing the fitness of the proposed genetic clustering technique. The proposed clustering technique can automatically determine the appropriate partitioning and the appropriate number of partitions from a given data set having line symmetrical clusters. The effective- ness of the proposed technique is shown in detecting the proper parti- tioning from two remote sensing satellite images of the parts of the cities of Kolkata. Results are compared with those obtained by the Fuzzy C-means clustering technique, Mean-shift based method and GAPS-clustering with Sym-index based method (Bandyopadhyay and Saha, 2007). Thereafter, the effectiveness of the proposed algo-

Table V. The automatically obtained cluster (OC) number and the corresponding Minkowski Scores (MS) after application of VGA And VGALS clustering algorithms on simulated MR volumes for brain with multiple sclerosis lesions projected on first 10 z planes

VGA

VGALS

z Plane No.

AC

OC

MS

OC

MS

1

6

2

1.21

11

0.77

2

6

2

1.20

8

0.77

3

6

2

1.19

11

0.78

4

6

5

0.69

9

0.77

5

6

2

1.184

9

0.75

6

6

2

1.18

8

0.71

7

6

2

1.17

9

0.78

8

6

2

1.16

10

0.73

9

6

2

1.16

11

0.84

10

9

2

1.17

11

0.83

rithm is shown in segmenting several MR brain images. The segmen- tation results are then compared with the available ground truth infor- mation. For the purpose of comparison, the well-known Fuzzy C- means and EM algorithms are also executed on these images. Experi- mental results show that VGALS clustering is not only able to auto- matically segment the MR brain images into different tissue classes but the corresponding segmentation results are also the best. Note that present work does not use any spatial information while segmenting the images. But incorporation of spatial informa- tion surely improves the quality of the results. Thus some new methods of incorporating the spatial information have to be invented in the future. Future work also includes the development of some fuzzy genetic clustering technique based on the line sym- metry based distance. Developing some new form of symmetry, like plane symmetry etc. is also another important future research work. Authors are currently working in this direction.

REFERENCES

Mark de Berg, Marc van Kreveld, Mark Overmars, and Otfried Schwarz- kopf. Computational Geometry: Algorithms and Applications. Springer- Verlag, Berlin, Germany, 2nd edition, 2000.

S. Bandyopadhyay and U. Maulik, Genetic clustering for automatic evolution

of clusters and application to image classification, Pattern Recogn (2002),

1197–1208.

S. Bandyopadhyay and S. Saha, GAPS: A clustering method using a new

point symmetry based distance measure, Pattern Recogn 40 (2007), 3451.

S. Bandyopadhyay and S. Saha, A point symmetry based clustering technique

for automatic evolution of clusters, IEEE Trans Knowl Data Eng 20 (2008),

1–17.

S. Bandyopadhyay, U. Maulik, and A. Mukhopadhyay, Multiobjective genetic clustering for pixel classification in remote sensing imagery, IEEE Trans Geosci Remote Sens 45 (2007), 1506–1511.

A. Ben-Hur and I. Guyon, Detecting Stable Clusters using Principal Compo-

nent Analysis in Methods in Molecular Biology, M. Brownstein and A. Kohodursky, (Editors), Humana Press, 2003.

J.L. Bentley, B.W. Weide, and A.C. Yao, Optimal expected-time algorithms for closest point problems, ACM Trans Math Software 6 (1980), 563–580.

J.C. Bezdek, Fuzzy mathematics in pattern classification, Ph.D. dissertation, Cornell University, Ithaca, NY, 1973.

Vol. 21, 86–100 (2011)

99

J.C. Bezdek, Pattern recognition with fuzzy objective function algorithms, Plenum, New York, 1981.

S.M. Bhandarkar and H. Zhang, Image segmentation using evolutionary computation, IEEE Trans Evol Comp 3 (1999), 1–21.

G. Bilgin, S. Erturk, and T. Yildirim, Unsupervised classification of hyper-

spectral-image data using fuzzy approaches that spatially exploit member- ship relations, IEEE Geosci Remote Sens Lett 5 (2008), 673–677.

BrainWeb: Simulated brain database. Available at: http://www.bic.mni. mcgill.ca/brainweb; http://www.bic.mni.mcgill.ca/brainweb/.

C.A. Cocosco, V. Kollokian, R.K.-S. Kwan, A.C. Evans: ‘‘BrainWeb:

Online Interface to a 3D MRI Simulated Brain Database’’ NeuroImage, vol. 5, no. 4, part 2/4, S425, 1997 – Proceedings of 3-rd International Confer- ence on Functional Mapping of the Human Brain, Copenhagen, May, 1997.

R.L. Cannon, R. Dave, J.C. Bezdek, and M. Trivedi, Segmentation of a the- matic mapper image using fuzzy c-means clustering algorithm, IEEE Trans Geosci Remote Sens 24 (1986), 400–408.

V.V. Chamundeeswari, D. Singh, and K. Singh, Unsupervised land cover classification of SAR images by contour tracing, Proc. of IEEE International Geoscience and Remote Sensing Symposium (IGARSS, 2007), Barcelona, Spain, July, 23–28, 2007, pp. 547–550.

C.-C.T. Chen and D.A. Landgrebe, A spectral design system for the HIRIS/ MODIS era, IEEE Trans Geosci Remote Sens 27 (1989), 681–686.

D. Comaniciu and P. Meer, Mean shift: A robust approach toward feature

space analysis, IEEE Trans Pattern Anal Machine Intell 24 (2002), 603–619.

M.A. Friedl, D.K. McIver, J.C.F. Hodges, X.Y. Zhang, D. Muchoney, A.H. Strahler, C.E. Woodcock, S. Gopal, A. Schneider, A. Cooper, A. Baccini, F. Gao, C. Schaaf, Global land cover mapping from MODIS: Algorithms and early results, Remote Sens Environ 83 (2002), 287–302.

J.H. Friedman, J.L. Bently, and R.A. Finkel, An algorithm for finding best matches in logarithmic expected time, ACM Trans Math Software 3 (1977),

209–226.

V. Gandhi, J.M. Kang, S. Shekhar, J. Ju, E.D. Kolaczyk, and S. Gopal, Con-

text inclusive function evaluation: a case study with EM-based multi-scale multi-granular image classification. Knowl. Inf. Syst. 21, 2 (Oct. 2009), 231–247. DOI5http://dx.doi.org/10.1007/s10115-009-0208-0

H. Ghassemian and P.A. Landgrebe, Object oriented feature extraction

method for image data compaction, IEEE Control Syst Mag 8 (1988), 42–48.

D. Guo, H. Xiong, V. Atluri, and N.R. Adam, Object discovery in high-

resolution remote sensing images: a semantic perspective, Knowl Inf Syst 19 (2009), 211–233. DOI=http://dx.doi.org/10.1007/s10115-008-0160-4.

E.E. Hilbert, Cluster compression algorithm-a joint clustering data compres- sion concept, JPL Publication, NASA, Technical Report, 1977

M.A. Jaffar, A. Hussain, and A.M. Mirza, Fuzzy entropy based optimization

of clusters for the segmentation of lungs in CT scanned images. Knowl. Inf. Syst. 24, 1 (Jul. 2010), 91-111. DOI=http://dx.doi.org/10.1007/s10115-009-

0225-z.

A.K. Jain, M.N. Murthy, and P.J. Flynn, Data clustering: A review, ACM Comput Rev 31, 3 (Sep. 1999), 264–323. DOI=http://doi.acm.org/10.1145/

331499.331504.

I. Jolliffe, Principal component analysis, Springer Series in Statistics, Eng- land, 1986.

R. Kauth, A. Pentland, and G. Thomas, BLOB: An unsupervised clustering

approach to spatial preprocessing of MSS imagery, Proceedings of 11th Int.

Symp. Remote Sensing of the Environment, Ann Arbor, Mich, 1977, 1309–

1317.

A. Marcal and L. Castro, Hierarchical clustering of multispectral images

using combined spectral and spatial criteria, IEEE Geosci Remote Sens Lett 2 (2005), 59–63.

100 Vol. 21, 86–100 (2011)

U. Maulik and S. Bandyopadhyay, Performance evaluation of some cluster-

ing algorithms and validity indices, IEEE Trans Pattern Anal Machine Intell 24 (2002), 1650–1654.

U. Maulik and S. Bandyopadhyay, Fuzzy partitioning using a real-coded

variable-length genetic algorithm for pixel classification, IEEE Trans Geosci

Remote Sens 41 (2003), 1075–1081.

D.M. Mount and S. Arya, ANN: A library for approximate nearest neighbor searching, 2005, Available at:http://www.cs.umd.edu/ mount/ANN.

S.K. Pal, S. Bandyopadhyay, and C.A. Murthy, Genetic classifiers for remotely sensed images: Comparison with standard methods, Int J Remote Sens 22 (2001), 2545–2569.

M.L. Pugh and A.M. Waxman, Classification of spectrally-similar land cover using multi-spectral neural image fusion and the fuzzy ARTMAP neu- ral classifier, Proc. of IEEE International Geoscience and Remote Sensing Symposium (IGARSS, 2006), Denver, Colorado, July 31, 2006–Aug. 4, 2006, pp. 1808–1811.

J.A. Richards, Remote sensing digital image analysis: An introduction, Springer-Verlag, New York, 1993.

S. Saha and S. Bandyopadhyay, MRI brain image segmentation by fuzzy

symmetry based genetic clustering technique, Proceedings of the 2007 IEEE Congress on Evolutionary Computation (CEC’07), Singapore, 2007a, pp.

4417–4424.

S. Saha and S. Bandyopadhyay, A genetic clustering technique using a new

line symmetry based distance measure, Proceedings of Fifth International Conference on Advanced Computing and Communications (ADCOM’07),

IEEE Computer Society, Guwahati, India, 2007b, pp. 365–370.

S. Saha and S. Bandyopadhyay, Application of a new symmetry based clus-

ter validity index for satellite image segmentation, IEEE Geosci Remote Sens Lett 5 (2008), 166–170.

S. Saha and S. Bandyopadhyay, MR brain image segmentation using a

multi-seed based automatic clustering technique, Fundam Inform 97 (2009),

199–214.

S. Saha and S. Bandyopadhyay, A new multiobjective clustering technique

based on the concepts of stability and symmetry, Knowl Inf Syst 23 (2010),

1–27.

S. Saha and S. Bandyopadhyay, Automatic MR brain image segmentation

using a multiseed based multiobjective clustering approach, Applied Intelli-

gence. DOI: 10.1007/s10489-010-0231-6.

S. Saha and S. Bandyopadhyay, On principle axis based line symmetry clus-

tering techniques, Memetic Computing. DOI: 10.1007/s12293-010-0049-0.

K. Sayood, Data compression in remote sensing applications, IEEE Geosci

Remote Sens Newslett 84 (1992), 7–15.

J. Suckling, T. Sigmundsson, K. Greenwood, and E. Bullmore, A modified fuzzy clustering algorithm for operator independent brain tissue classifica- tion of dual echo MR images, Magn Reson Imaging 17 (1999), 1065–1076.

Y. Tarabalka, J.A. Benediktsson, and J. Chanussot, Spectralspatial classifi-

cation of hyperspectral imagery based on partitional clustering techniques, IEEE Trans Geosci Remote Sensi 47 (2009), 2973–2987.

M. Tyagi, F. Bovolo, A. Mehra, S. Chaudhuri, and L. Bruzzone, A context-

sensitive clustering technique based on graph-cut initialization and expecta- tion-maximization algorithm, IEEE Geosci Remote Sens Lett 5 (2008), 21–

25.

C.

Wemmert, A. Puissant, G. Forestier, and P. Gancarski, Multiresolution

remote sensing image clustering, IEEE Geosci Remote Sens Lett 6 (2009),

533–537.

X.L. Xie and G. Beni, A validity measure for fuzzy clustering, IEEE Trans Pattern Anal Machine Intell 13 (1991), 841–847.

Copyright of International Journal of Imaging Systems & Technology is the property of John Wiley & Sons, Inc. and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.