A New Line Symmetry Distance Based Automatic Clustering Technique: Application to Image Segmentation
Sriparna Saha,1 Ujjwal Maulik2
Image Processing and Modeling, Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg, Heidelberg, Germany
Department of Theoretical Bioinformatics, DKFZ (Deutsches Krebsforschungszentrum, German Cancer Research Center), Heidelberg, Germany
Received 3 March 2009; accepted 16 May 2010
ABSTRACT: In this article, an automatic clustering technique using the concept of line symmetry is first developed. The proposed real-coded variable string length genetic clustering technique (VGALS clustering) is able to evolve the number of clusters present in the data set automatically. Here the assignment of points to different clusters is based on the line symmetry based distance rather than the Euclidean distance. The cluster centers are encoded in the chromosomes, whose values may vary. A newly developed line symmetry based cluster validity index, LineSym-index, is used as a measure of goodness of the corresponding partitioning. This validity index is able to correctly indicate the presence of clusters of different sizes as long as they are line symmetrical. A Kd-tree based data structure is used to reduce the complexity of computing the line symmetry distance. The proposed technique is then applied to automatically segment different images. At first, the superiority of the proposed method in automatically segmenting image data sets over the Fuzzy C-means clustering technique, the well-known mean-shift based method, and GAPS clustering with the Sym-index based method is demonstrated for three remote sensing satellite images. Thereafter it is applied to several simulated T1-weighted, T2-weighted, and proton density normal and MS lesion magnetic resonance brain images. The proposed method is able to detect most of the regions well. The superiority of the proposed method over the Fuzzy C-means and Expectation Maximization clustering algorithms is demonstrated quantitatively. The automatic segmentation obtained by the VGALS clustering technique is also compared with the available ground truth information. © 2011 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 21, 86–100, 2011; Published online in Wiley Online Library (wileyonlinelibrary.com). DOI 10.1002/ima.20243
Key words: unsupervised classification; cluster validity index; symmetry; line symmetry based distance; Principal component analysis; Kd tree; magnetic resonance image
Correspondence to: Sriparna Saha; e-mail: sriparna.saha@gmail.com
I. INTRODUCTION
Remote sensing satellite images have significant applications in areas such as climate studies, assessment of forest resources, and examination of marine environments. For remote sensing applications, classification is an important task in which the pixels of an image are classified into homogeneous regions, each of which corresponds to a particular landcover type. The problem of pixel classification is often posed as clustering (Saha and Bandyopadhyay, 2010) in the intensity space (Maulik and Bandyopadhyay, 2003). In the unsupervised pixel classification framework, various clustering algorithms such as Fuzzy C-means (Cannon et al., 1986) and statistical methods have been used for satellite image segmentation. Recently, the application of genetic algorithms (GAs) to pixel classification has attracted significant attention from researchers (Maulik and Bandyopadhyay, 2003; Saha and Bandyopadhyay, 2008). A context-sensitive clustering technique for unsupervised image segmentation based on graph-cut initialization and the expectation-maximization algorithm is developed by Tyagi et al. (2008). An unsupervised hyperspectral image classification technique based on fuzzy-clustering algorithms that spatially exploit membership relations is proposed by Bilgin et al. (2008). A multiresolution remote sensing image clustering technique that uses information contained in both spatial resolutions is proposed by Wemmert et al. (2009). An agglomerative hierarchical clustering method for large multispectral images, which uses both spectral and spatial information for the aggregation decision, is proposed by Marcal and Castro (2005). A new spectral-spatial classification scheme for hyperspectral images, which uses majority voting to combine the results of a pixel-wise support vector machine classification with the segmentation map obtained by partitional clustering, is proposed by Tarabalka et al. (2009).
Recently, a multiobjective fuzzy genetic clustering technique for pixel classification has been proposed by Bandyopadhyay et al. (2007). Chamundeeswari et al. (2007) present an unsupervised classification algorithm using maximum a posteriori (MAP) segmentation for SAR images. Some more clustering techniques for satellite image segmentation can be found elsewhere (Hilbert, 1977; Kauth et al., 1977; Ghassemian and Landgrebe, 1988; Chen and Landgrebe, 1989; Sayood, 1992; Friedl et al., 2002; Pugh and Waxman, 2006; Chamundeeswari et al., 2007; Guo et al., 2009; Gandhi et al., accepted; Jaffar et al., accepted).
Fully automatic brain tissue classification from magnetic resonance images (MRI) is also an important research issue. The accurate segmentation of MR images into different tissue classes, such as gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF), is an important task. Additionally, regional volume calculations can provide further useful diagnostic information in diseases such as Alzheimer disease, in movement disorders such as Parkinson or Parkinson related syndrome, in white matter metabolic or inflammatory disease, in congenital brain malformations or perinatal brain damage, or in post traumatic syndrome. The automatic segmentation of brain MR images, however, remains a difficult problem. Clustering approaches have been widely used for segmentation of MR brain images. The use of neural networks, evolutionary computation, and/or fuzzy clustering techniques for MR image segmentation has been investigated in (Suckling et al., 1999; Bhandarkar and Zhang, 1999; Saha and Bandyopadhyay, 2007, 2009, accepted).
To cluster a data set, some similarity or dissimilarity criterion has to be defined. A new type of nonmetric distance based on point symmetry, $d_{ps}$, is proposed in (Bandyopadhyay and Saha, 2007). To reduce the complexity of the point symmetry distance computation, a Kd-tree based data structure is used. From the geometrical symmetry viewpoint, point symmetry and line symmetry are two widely discussed issues. Inspired by this, a line symmetry based distance was proposed in (Saha and Bandyopadhyay, 2007). But the proposed distance had several drawbacks.
The major shortcoming of the old line symmetry based distance was that its application is limited to two-dimensional data sets. A new line symmetry based technique using principal component analysis is developed in (Saha and Bandyopadhyay, accepted) that removes the limitations of (Saha and Bandyopadhyay, 2007). This new line symmetry based distance is more general and is applicable to data sets of any dimension. In (Saha and Bandyopadhyay, accepted), a genetic clustering technique is also developed based on this line symmetry distance. But this technique assumes a priori the number of clusters present in a data set. In this article, we have developed a line symmetry based automatic genetic clustering technique; the motivation is to use it for image segmentation. A variable string length genetic line symmetry based clustering technique (VGALS-clustering) is proposed, which is then used to automatically segment remote sensing satellite images. Centers of the clusters are encoded in the chromosome, whose values vary over a certain range. Here the assignment of points to different clusters is based on a newly proposed line symmetry based distance rather than the Euclidean distance. A new cluster validity index based on the newly developed line symmetry based distance, LineSym-index, is proposed here and is thereafter used for computing the fitness of the chromosomes. The effectiveness of the proposed line symmetry based automatic clustering technique (VGALS-clustering) is shown by automatically partitioning Indian remote sensing (IRS) satellite images of parts of the city of Kolkata and a SPOT image of a part of the city of Kolkata.
Comparisons are made with the results obtained by the Fuzzy C-means clustering technique, the popular mean-shift based segmentation technique (Comaniciu and Meer, 2002), and GAPS clustering with the Sym-index based method (Bandyopadhyay and Saha, 2007). The segmentation results are compared qualitatively and quantitatively. The effectiveness of the proposed algorithm is thereafter shown in segmenting MR images of the normal brain and MR brain images with multiple sclerosis lesions. The segmentation results are then compared with the available ground truth information. For the purpose of comparison, the well-known Fuzzy C-means algorithm (Bezdek, 1973) and the Expectation Maximization (EM) (Jain et al., 1999) clustering algorithm are also executed, firstly with the number of clusters automatically determined by VGALS clustering and then with the actual number of clusters present in the images. The segmentation results are compared quantitatively with those provided by the VGALS clustering algorithm. In a part of the experiment, the fuzzy variable string length genetic algorithm (Fuzzy-VGA) (Maulik and Bandyopadhyay, 2003), which uses the Euclidean distance for computing the membership values of points to different clusters, is also executed on the MR brain images to segment them automatically. The results are also compared with those obtained by the VGALS clustering technique.
II. SYMMETRY BASED DISTANCES
In this section, the existing point symmetry based distance is first described in brief. Then the definition of the newly developed line symmetry based distance is given.
A. Existing Point Symmetry Based Distance. In this section, the point symmetry (PS) distance of Bandyopadhyay and Saha (2007), $d_{ps}(x, c)$, associated with a point x with respect to a center c, is described. Let a point be x. The symmetrical (reflected) point of x with respect to a particular center c is $2 \times c - x$. Let us denote this by x*.
Let the knear unique nearest neighbors of x* be at Euclidean distances $d_i$, i = 1, 2, ..., knear. Then
$$d_{ps}(x, c) = d_{sym}(x, c) \times d_e(x, c) \tag{1}$$
$$\phantom{d_{ps}(x, c)} = \frac{\sum_{i=1}^{knear} d_i}{knear} \times d_e(x, c), \tag{2}$$
where $d_e(x, c)$ is the Euclidean distance between the point x and c. The complexity of computing $d_{ps}(x, c)$ is of order n, where n is the total number of data points. For all the n points and K clusters, the complexity becomes of order $n^2 K$. To reduce this, the Kd-tree based nearest neighbor search of ANNlib (Approximate Nearest Neighbor) (Mount and Arya, 2005), a library written in C++ (obtained from http://www.cs.umd.edu/~mount/ANN), is used in (Bandyopadhyay and Saha, 2007). ANNlib is used to find $d_i$, i = 1 to knear, in Eq. (2) efficiently.
B. Newly Developed Line Symmetry Based Distance (Saha and Bandyopadhyay, accepted). Given a particular data set, we first find the first principal axis of this data set using principal component analysis (Jolliffe, 1986). Let the eigenvector of the covariance matrix of the data set with the highest eigenvalue be $[eg_1\ eg_2\ eg_3\ eg_4\ \ldots\ eg_d]$, where d is the dimension of the
original data. Then the first principal axis of the data set is given by
$$\frac{x_1 - c_1}{eg_1} = \frac{x_2 - c_2}{eg_2} = \cdots = \frac{x_d - c_d}{eg_d},$$
where the center of the data set is $c = \{c_1, c_2, \ldots, c_d\}$. The obtained principal axis is treated as the symmetrical line of the relevant cluster, i.e., if the data set is indeed symmetrical then it should also be symmetric with respect to the first principal axis identified by principal component analysis (PCA). This symmetrical line is used to measure the amount of line symmetry of a particular point in that cluster. To measure the amount of line symmetry of a point x with respect to a particular line i, $d_{ls}(x, i)$, the following steps are followed.
1. For a particular data point x, calculate the projected point $p_i$ on the relevant symmetrical line i.
2. Find $d_{sym}(x, p_i)$ as:
$$d_{sym}(x, p_i) = \frac{\sum_{i=1}^{knear} d_i}{knear}, \tag{3}$$
Figure 1. Example of line symmetry based distance. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]
where the knear unique nearest neighbors of $x^* = 2 \times p_i - x$ are at Euclidean distances $d_i$, i = 1, 2, ..., knear. The ANN library (Mount and Arya, 2005), which uses Kd-tree based nearest neighbor search, is used to reduce the complexity of computing these $d_i$ (as described in Section II-C). Then the amount of line symmetry of a particular point x with respect to the symmetrical line of cluster i is calculated as:
$$d_{ls}(x, i) = d_{sym}(x, p_i) \times d_e(x, c), \tag{4}$$
where c is the centroid of the particular cluster i and $d_e(x, c)$ is the Euclidean distance between the point x and c. It can be seen from Eq. (3) that knear cannot be chosen equal to 1, since if x* exists in the data set then $d_{sym}(x, p_i) = 0$ and hence the Euclidean distance will have no impact on the definition of $d_{ls}(x, i)$. On the contrary, large values of knear may not be suitable because they may underestimate the amount of symmetry of a point with respect to the first principal axis. Here knear is chosen equal to 2. It may be noted that the proper value of knear largely depends on the distribution of the data set. A fixed value of knear may have many drawbacks. For instance, for very large clusters (with too many points), 2 neighbors may not be enough, as it is very likely that a few neighbors would have a distance close to zero. On the other hand, clusters with too few points are more likely to be scattered, and the distance of the two neighbors may be too large. Thus, a proper choice of knear is an important issue that needs to be addressed in the future.
Note that every point symmetric cluster is also line symmetric with respect to its central axis. Thus, the proposed distance is able to detect line symmetric as well as point symmetric clusters. The previous point symmetry based distance was only able to measure the amount of point symmetry of a point with respect to a cluster center, whereas the proposed line symmetry based distance measures the total line symmetry with respect to the symmetrical line of a cluster.
The above line symmetry based distance is illustrated in Figure 1. Here x is a particular data point and c is the center of a particular cluster. The first principal axis of the given cluster is denoted by Line l. This line is treated as the symmetrical line of the particular cluster. The projected point of x on this symmetrical line is denoted by p. Let the reflected point of x with respect to p be denoted by $x'$. The first two nearest neighbors of $x'$ (here knear is chosen equal to 2) are at Euclidean distances of $d_1$ and $d_2$, respectively. Then the total amount of symmetry of x with respect to the projected point p on Line l is calculated as $d_{sym}(x, p) = \frac{d_1 + d_2}{2}$. Therefore, the total line symmetry based distance of the point x with respect to the symmetrical line of cluster l is calculated as $d_{ls}(x, l) = d_{sym}(x, p) \times d_e(x, c)$, where $d_e(x, c)$ is the Euclidean distance between the point x and the cluster center c.
The symmetrical distance computation is very time consuming because it involves the computation of the nearest neighbors. Computation of $d_{ls}(x, i)$ is of complexity O(N). Hence for N points and K clusters, the complexity of computing the line symmetry based distance between all points and the different clusters is $O(N^2 K)$. To reduce the computational complexity, an approximate nearest neighbor search using the Kd-tree approach is adopted in this article.
C. Kd-tree Based Nearest Neighbor Computation. A K-dimensional tree, or Kd-tree, is a space-partitioning data structure for organizing points in a K-dimensional space. A Kd-tree uses only splitting planes that are perpendicular to one of the coordinate axes. Approximate Nearest Neighbor (ANN) is a library written in C++ (Mount and Arya, 2005), which supports data structures and algorithms for both exact and approximate nearest neighbor searching in arbitrarily high dimensions. In this article, the ANN library using a Kd-tree for nearest neighbor search is used to find the $d_i$, where i = 1, ..., knear, in Eq. (3) efficiently. Thus, it requires the construction of a Kd-tree consisting of the N points in the data set, where N is the size of the data set.
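
As an illustration, the steps of Section II-B can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the first principal axis is approximated by power iteration rather than a full PCA routine, and a brute-force neighbor search stands in for the Kd-tree based ANN library, so each query costs O(N) rather than the expected O(log N).

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def first_principal_axis(points, iters=200):
    """Approximate the leading eigenvector of the covariance matrix (power iteration)."""
    c = centroid(points)
    d = len(c)
    cov = [[sum((p[a] - c[a]) * (p[b] - c[b]) for p in points) / len(points)
            for b in range(d)] for a in range(d)]
    # Arbitrary start vector (assumption); it must not be orthogonal to the true axis.
    v = [1.0 / (j + 1) for j in range(d)]
    s = math.sqrt(sum(x * x for x in v))
    v = [x / s for x in v]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate cluster: all points coincide
            break
        v = [x / norm for x in w]
    return v

def d_sym(x, p, data, knear=2):
    """Eq. (3): mean distance from the reflection x* = 2p - x to its knear nearest data points."""
    x_star = [2 * pj - xj for pj, xj in zip(p, x)]
    dists = sorted(math.dist(x_star, q) for q in data)  # brute force instead of a Kd-tree
    return sum(dists[:knear]) / knear

def d_ls(x, cluster, data, knear=2):
    """Eq. (4): d_sym(x, p) * d_e(x, c), with p the projection of x on the principal axis."""
    c = centroid(cluster)
    axis = first_principal_axis(cluster)
    t = sum((xj - cj) * aj for xj, cj, aj in zip(x, c, axis))
    p = [cj + t * aj for cj, aj in zip(c, axis)]
    return d_sym(x, p, data, knear) * math.dist(x, c)
```

For a cluster that is mirror-symmetric about its principal axis, the reflection of a member point coincides with another data point, so one of the two neighbor distances is zero and the resulting distance is small; removing the mirror point from the data increases it.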
The construction of a Kd-tree requires O(N log N) time and O(N) space (Anderberg, 2000). Friedman et al. (1977) reported O(log N) expected time for finding the nearest neighbor using a Kd-tree.
III. VGALS-CLUSTERING: VARIABLE STRING LENGTH GENETIC LINE SYMMETRY DISTANCE BASED CLUSTERING TECHNIQUE
In this section, the use of a variable string length genetic algorithm using the newly developed line symmetry based distance (VGALS-clustering) is proposed for automatically evolving the number of clusters present in a data set. Here we have considered the best
partition to be the one that corresponds to the maximum value of the proposed LineSym-index, which is defined later. Both the number of clusters and the appropriate partitioning of the data are evolved simultaneously using the search capability of genetic algorithms. Since the number of clusters is considered to be variable, the string lengths of different chromosomes in the same population are allowed to vary. As a consequence, the crossover and mutation operators are suitably modified to handle variable length chromosomes. The technique is described below in detail.
A. String Representation and Population Initialization. In VGALS-clustering, the chromosomes are made up of real numbers which represent the coordinates of the centers of the partitions. If chromosome i encodes the centers of $M_i$ clusters in N-dimensional space, then its length $l_i$ is taken to be $N \times M_i$. Each center is considered to be indivisible. Each string i in the population initially encodes the centers of a number, $M_i$, of clusters, such that $M_i = (rand() \bmod M^*) + 2$. Here, rand() is a function returning an integer, and M* is a soft estimate of the upper bound of the number of clusters. The number of clusters will therefore range from two to M* + 1. The $M_i$ centers encoded in a chromosome are randomly selected distinct points from the data set. Thereafter, five iterations of the K-means algorithm are executed with the set of centers encoded in each chromosome. The resultant centers are used to replace the centers in the corresponding chromosomes. This makes the centers separated initially.
Example. Let M* = 10. Let the random number $M_i$ be equal to 4 for chromosome i. Then this chromosome will encode the centers of 4 clusters. Let the four cluster centers (four randomly chosen points from the data set) be (10.0, 5.0) (20.4, 13.2) (15.8, 2.9) (22.7, 17.7). Thus the chromosome may look like (20.4, 13.2) (15.8, 2.9) (10.0, 5.0) (22.7, 17.7).
B. Fitness Computation.
This is composed of two steps. First, the n points are assigned to different clusters using the newly developed line symmetry based distance, $d_{ls}$. Next, a new cluster validity index based on the newly developed line symmetry distance, LineSym-index, is computed and used as a measure of the fitness of the chromosome.
(1) Computing the membership values. Let a particular chromosome encode the centers of K clusters. At first, the first principal axis of each cluster is determined using principal component analysis (Jolliffe, 1986). Then each point $x_j$, j = 1, 2, ..., N, is assigned to one of the K clusters in the following way. Find the cluster center nearest to $x_j$ in the line symmetrical sense. That is, we find the cluster center k that is nearest to the input pattern $x_j$ using the minimum-value criterion $k = \arg\min_{i=1,\ldots,K} d_{ls}(x_j, i)$, where the line symmetry based distance $d_{ls}(x_j, i)$ is computed by Eq. (4). If the corresponding $d_{sym}(x_j, k)$ [as defined in Eq. (3)] is smaller than a prespecified parameter θ, then assign the point $x_j$ to the kth cluster. Otherwise, assignment is based on the minimum Euclidean distance criterion, as normally used in (Bandyopadhyay and Maulik, 2002) or the K-means algorithm, i.e., assign $x_j$ to the kth cluster where $k = \arg\min_{i=1,\ldots,K} d_e(x_j, c_i)$. Here, $c_i$ denotes the center of the ith cluster and $d_e(x_j, c_i)$ denotes the Euclidean distance between the point $x_j$ and the cluster center $c_i$. The reason for such an assignment is as follows: in the intermediate stages of the algorithm, when the centers are not yet properly evolved, the minimum $d_{ls}$ value for a point is expected to be quite large, since the point might not be symmetric with respect to any cluster. In such cases, using
Euclidean distance for cluster assignment appears to be intuitively more appropriate. The value of θ is kept equal to the maximum nearest neighbor distance among all the points in the data set, as described in (Bandyopadhyay and Saha, 2007, 2008). It is to be noted that if a point is indeed symmetric with respect to the principal axis of some cluster, then the symmetrical distance computed in the above way will be small and can be bounded as follows. Let $d_{NN}^{max}$ be the maximum nearest neighbor distance in the data set. That is,
$$d_{NN}^{max} = \max_{i=1,\ldots,N} d_{NN}(x_i),$$
where $d_{NN}(x_i)$ is the nearest neighbor distance of $x_i$. Assuming that x* lies within the data space, it may be noted that
$$d_1 \le \frac{d_{NN}^{max}}{2} \quad \text{and} \quad d_2 \le \frac{3\, d_{NN}^{max}}{2},$$
resulting in $\frac{d_1 + d_2}{2} \le d_{NN}^{max}$. Ideally, a point x is exactly symmetrical with respect to the principal axis of some cluster if $d_1 = 0$. However, considering the uncertainty of the location of a point as the sphere of radius $d_{NN}^{max}$ around x, we have kept the threshold θ equal to $d_{NN}^{max}$. Thus the computation of θ is automatic and does not require user intervention. After the assignments are done, the cluster centers encoded in the chromosome are replaced by the mean points of the respective clusters. This is referred to as the K-means like update center operation.
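
The assignment rule and the center update described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the per-cluster distances of a point are assumed to be precomputed and passed in as lists, the threshold is found by brute force rather than with a Kd-tree, and keeping the old center for an empty cluster is an assumption that the text does not address.

```python
import math

def max_nn_distance(data):
    """Threshold theta: the largest nearest-neighbor distance over all points (brute force)."""
    return max(
        min(math.dist(p, q) for j, q in enumerate(data) if j != i)
        for i, p in enumerate(data)
    )

def assign_point(d_ls_vals, d_sym_vals, d_e_vals, theta):
    """Return the cluster index for one point.

    d_ls_vals[i], d_sym_vals[i], d_e_vals[i] are the point's line symmetry distance,
    symmetry term, and Euclidean distance with respect to cluster i.
    """
    k = min(range(len(d_ls_vals)), key=d_ls_vals.__getitem__)
    if d_sym_vals[k] < theta:
        return k  # symmetric enough: keep the line symmetry based choice
    # Otherwise fall back to the nearest center in the Euclidean sense.
    return min(range(len(d_e_vals)), key=d_e_vals.__getitem__)

def update_centers(data, labels, centers):
    """K-means like update: replace each center by the mean of its assigned points."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, lab in zip(data, labels) if lab == k]
        if members:
            new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new_centers.append(old)  # assumption: an empty cluster keeps its old center
    return new_centers
```

With θ = 2.0, a point whose best line symmetry match has a symmetry term of 0.1 stays with that cluster, whereas a point whose best match has a symmetry term of 3.0 is reassigned to the cluster with the nearest center.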
(2) Fitness calculation: The fitness of a chromosome is computed using a newly defined cluster validity index, LineSym-index. Note that this index is inspired by the point symmetry based cluster validity index, Sym-index (Saha and Bandyopadhyay, 2008; Bandyopadhyay and Saha, 2008). Let the K cluster centers be denoted by $c_i$, where $1 \le i \le K$, and let $n_i$ denote the number of points present in the ith cluster. Then LineSym-index is defined as follows:
$$\mathit{LineSym}(K) = \left( \frac{1}{K} \times \frac{1}{E_K} \times D_K \right), \tag{7}$$
where K is the number of clusters present in that chromosome. Here,
$$E_K = \sum_{i=1}^{K} E_i, \quad \text{such that} \quad E_i = \sum_{j=1}^{n_i} d_{ls}(x_{ij}, i), \quad \text{and} \quad D_K = \max_{i,j=1}^{K} \lVert c_i - c_j \rVert.$$
$D_K$ is the maximum Euclidean distance between any two cluster centers among all centers. Here $x_{ij}$ denotes the jth point of the ith cluster, and $d_{ls}(x_{ij}, i)$ is computed by Eq. (4), where i denotes the symmetrical line (first principal axis) of the ith cluster. The objective is to maximize the LineSym-index in order to obtain the actual number of clusters and to achieve proper clustering. As formulated in Eq. (7), LineSym is a composition of three factors: $1/K$, $1/E_K$, and $D_K$. The first factor increases as K decreases; as LineSym needs to be maximized for optimal clustering, it will prefer to decrease the value of K. The second factor is the reciprocal of $E_K$, the total within-cluster line symmetrical distance. For clusters which have good symmetrical structure, the $E_i$ value is small. This, in turn, indicates that the formation of a larger number of clusters that are symmetrical in shape would be encouraged. Finally, the third factor, $D_K$, measuring the maximum separation between a pair of clusters, increases with the value of K. As these three factors are complementary in nature, they are expected to compete with and balance each other critically in determining the proper partitioning.
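
A minimal sketch of Eq. (7) is given below, assuming the line symmetry distances of the points in each cluster have already been computed; the function name and argument layout are hypothetical, and the case $E_K = 0$ is not handled.

```python
import math
from itertools import combinations

def linesym_index(centers, d_ls_per_cluster):
    """LineSym(K) = (1/K) * (1/E_K) * D_K.

    centers: list of K cluster centers (K >= 2).
    d_ls_per_cluster: list of K lists; entry i holds d_ls(x_ij, i) for the points of cluster i.
    """
    K = len(centers)
    E_K = sum(sum(vals) for vals in d_ls_per_cluster)               # sum of all E_i
    D_K = max(math.dist(a, b) for a, b in combinations(centers, 2))  # largest center separation
    return D_K / (K * E_K)
```

For two centers at (0, 0) and (3, 4) with within-cluster sums $E_1 = 1.0$ and $E_2 = 1.0$, this gives $D_K = 5$, $E_K = 2$, and an index of $5 / (2 \times 2) = 1.25$.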
The fitness function for chromosome j is defined as $\mathit{LineSym}_j$, i.e., the LineSym-index computed for the partitioning encoded in that chromosome. The objective of the GA is to maximize this fitness function.
C. Selection. Conventional proportional selection is applied to the population of chromosomes. Here, a chromosome receives a number of copies that is proportional to its fitness in the population. We have used the roulette wheel strategy for implementing the proportional selection scheme.
D. Crossover. For the purpose of crossover, the cluster centers are considered to be indivisible, i.e., the crossover points can only lie between two cluster centers. The crossover operation, applied stochastically with probability of crossover $\mu_c$, must ensure that information exchange takes place in such a way that both offspring encode the centers of at least two clusters. For this, the operator is defined as follows (Maulik and Bandyopadhyay, 2003): Let parent chromosomes $P_1$ and $P_2$ encode $M_1$ and $M_2$ cluster centers, respectively. The crossover point in $P_1$, $s_1$, is generated as $s_1 = rand() \bmod M_1$. Let $s_2$ be the crossover point in $P_2$; it may vary in $[LB(s_2), UB(s_2)]$, where $LB(s_2)$ and $UB(s_2)$ indicate the lower and upper bounds of the range of $s_2$, respectively. $LB(s_2)$ and $UB(s_2)$ are given by $LB(s_2) = \min[2, \max[0, 2 - (M_1 - s_1)]]$ and $UB(s_2) = [M_2 - \max[0, 2 - s_1]]$. Therefore $s_2$ is given by
$$s_2 = \begin{cases} LB(s_2) + rand() \bmod (UB(s_2) - LB(s_2)) & \text{if } UB(s_2) \ge LB(s_2), \\ 0 & \text{otherwise.} \end{cases}$$
It can be verified by some simple calculations that if the crossover points $s_1$ and $s_2$ are chosen according to the above rules, then none of the offspring generated would have fewer than two clusters.
E. Mutation. Mutation is applied to each chromosome with probability $\mu_m$. Mutation is of three types.
1. Each cluster center encoded in a chromosome is mutated with probability $\mu_m$ in the following way. The cluster center is replaced with a random variable drawn from a Laplacian distribution, $p(\epsilon) \propto e^{-\frac{|\epsilon - \mu|}{\delta}}$, where the scaling factor δ sets the magnitude of perturbation. Here μ is the value at the position which is to be perturbed. The scaling factor δ is chosen equal to 1.0. The old value at the position is replaced with the newly generated value.
2. One randomly chosen cluster center is removed from the chromosome, i.e., the total number of clusters encoded in the chromosome is decreased by 1.
3. The total number of clusters encoded in the chromosome is increased by 1. One randomly chosen point from the data set is encoded as the new cluster center.
Any one of the above mentioned types of mutation is applied randomly to a particular chromosome if it is selected for mutation.
F. Termination. In this article, we have executed the algorithm for a fixed number of generations. Moreover, the elitist model of GAs has been used, where the best string seen so far is stored in a location within the population. The best string of the last generation provides the solution to the clustering problem.
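
The variable-length crossover and mutation operators of Sections III-D and III-E can be sketched as below. Several details are assumptions made for this illustration and are labeled in the comments: the text does not state how the offspring are assembled once $s_1$ and $s_2$ are fixed (a tail swap is assumed), the case $UB(s_2) = LB(s_2)$ is resolved by taking $s_2 = LB(s_2)$ because the modulus would otherwise be zero, and the deletion mutation is skipped when only two centers remain. All function names are hypothetical.

```python
import random

def crossover_bounds(m1, m2, s1):
    """LB(s2) and UB(s2) for parents with m1 and m2 centers and cut point s1 in P1."""
    lb = min(2, max(0, 2 - (m1 - s1)))
    ub = m2 - max(0, 2 - s1)
    return lb, ub

def crossover(p1, p2, rng):
    """Single-point crossover on lists of centers; cut points fall between centers."""
    s1 = rng.randrange(len(p1))                # s1 = rand() mod M1
    lb, ub = crossover_bounds(len(p1), len(p2), s1)
    if ub > lb:
        s2 = lb + rng.randrange(ub - lb)       # LB + rand() mod (UB - LB)
    elif ub == lb:
        s2 = lb                                # assumption: avoids a zero modulus
    else:
        s2 = 0
    # Assumption: offspring are formed by swapping the tails after the cut points.
    return p1[:s1] + p2[s2:], p2[:s2] + p1[s1:]

def laplace(mu, delta, rng):
    """Sample p(eps) proportional to exp(-|eps - mu| / delta).

    The difference of two independent exponentials with mean delta is Laplace(0, delta).
    """
    return mu + rng.expovariate(1.0 / delta) - rng.expovariate(1.0 / delta)

def mutate(chrom, data, rng, mu_m=0.2, delta=1.0):
    """Apply one of the three mutation types, chosen uniformly at random."""
    kind = rng.randrange(3)
    if kind == 0:  # type 1: perturb each center with probability mu_m
        return [tuple(laplace(v, delta, rng) for v in c) if rng.random() < mu_m else c
                for c in chrom]
    if kind == 1 and len(chrom) > 2:  # type 2; assumption: never drop below two clusters
        i = rng.randrange(len(chrom))
        return chrom[:i] + chrom[i + 1:]
    if kind == 2:  # type 3: add a randomly chosen data point as a new center
        return chrom + [rng.choice(data)]
    return list(chrom)
```

Under the tail-swap assumption, the bounds on $s_2$ guarantee that each offspring receives at least two centers whenever both parents encode at least two, which is the property claimed in the text.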
G. Complexity Analysis of VGALS Clustering Technique. Below we analyze the time complexity of the proposed VGALS clustering technique. Here N is the total number of points in the data set, M* is the maximum possible number of clusters, and d is the dimension of the data. As discussed above, a Kd-tree data structure has been used to find the nearest neighbors of a particular point. The construction of a Kd-tree requires O(N log N) time and O(N) space (Anderberg, 2000). Initialization of the GA needs O(Popsize × stringlength) time, where Popsize and stringlength indicate the population size and the length of each chromosome in the GA, respectively. Note that stringlength is O(M* × d), where d is the dimension of the data set and M* is the soft estimate of the upper bound of the number of clusters. Fitness computation is composed of three steps.
1. To find the membership values of each point to all cluster centers, the minimum line symmetrical distance of that point with respect to all clusters has to be calculated. At first, principal component analysis (PCA) is applied to detect the first principal axis. This takes $O(N \times d^2)$ time. Thereafter this first principal axis is used as the symmetrical line of the respective cluster. To determine the line symmetry based distance, Kd-tree based nearest neighbor search is used. If the points are roughly uniformly distributed, then the expected case complexity is $O(c^d + \log N)$, where c is a constant depending on the dimension and the point distribution. This is O(log N) if the dimension d is a constant (Bentley et al., 1980). Friedman et al. (1977) also reported O(log N) expected time for finding the nearest neighbor. So, to find the minimal symmetrical distance of a particular point, O(M* log N) time is needed. Thus the total complexity of computing the membership values of N points to M* clusters is O(M* N log N).
2. For updating the centers, the total complexity is O(M*).
3. The total complexity for computing the fitness values is O(N × M*).
So the fitness evaluation has total complexity O(Popsize × M* N log N). The selection step of the GA requires O(Popsize × stringlength) time. Mutation and crossover require O(Popsize × stringlength) time each. Thus, summing up the above complexities and considering stringlength ≪ N, the total time complexity becomes O(M* N log N × Popsize) per generation. For a maximum of Maxgen generations, the total complexity becomes O(M* N log N × Popsize × Maxgen).
H. Space Complexity Analysis. The major space requirement of VGALS clustering is due to its population. Thus, the total space complexity of VGALS clustering is O(Popsize × stringlength), i.e., O(Popsize × d × M*). Also, for each chromosome in the population we have to keep a membership matrix of size N × M*. Thus the total space complexity will be O(Popsize × N × M*), as N ≫ d.
IV. EXPERIMENTAL RESULTS
A. Results on Satellite Images. In this section at first, the experimental results obtained after application of the above mentioned
Figure 2. Data distribution of IRS image of Kolkata in the first three feature space.
VGALS-clustering technique for segmenting two remote sensing satellite images of parts of the city of Kolkata are provided. The two satellite images are of size 512 × 512, i.e., the size of the data set to be clustered in each image is 262,144. For these multispectral satellite images, the feature vector is composed of the intensity values at the different bands of the image. The parameters of the proposed algorithm are as follows: population size = 20, number of generations = 20, probability of crossover = 0.8, and probability of mutation = 0.2. For the purpose of comparison, the popular Fuzzy C-means (FCM) clustering algorithm (Bezdek, 1981), a recent method of satellite image segmentation proposed in (Bandyopadhyay and Saha, 2007) (GAPS clustering with the Sym-index based method), and the mean-shift based segmentation technique (Comaniciu and Meer, 2002) are also executed on the real-life images. The results are compared both qualitatively and quantitatively.
B. IRS Image of Kolkata. The data used here was acquired from the Indian Remote Sensing Satellite (IRS-1A) using the LISS-II sensor
Figure 4. Clustered IRS image of Kolkata using VGALS-clustering technique.
that has a resolution of 36.25 m × 36.25 m. The image is contained in four spectral bands, namely, a blue band of wavelength 0.45–0.52 μm, a green band of wavelength 0.52–0.59 μm, a red band of wavelength 0.62–0.68 μm, and a near infrared band of wavelength 0.77–0.86 μm. Thus, the feature vector of each image pixel is composed of four intensity values at the different bands. The distribution of the pixels in the first three feature space of this image is shown in Figure 2. It can be easily seen from Figure 2 that the entire data can be partitioned into several differently shaped clusters in which symmetry does exist. Figure 3 shows the Kolkata image in the near infrared band. Some characteristic regions in the image are the river Hooghly cutting across the middle of the image, several fisheries observed
Figure 3. IRS image of Kolkata in the near infra red band with histogram equalization.
Figure 5. Clustered IRS image of Kolkata using FCM clustering.
Vol. 21, 86–100 (2011)
Figure 6. Clustered IRS image of Kolkata using mean-shift based clustering technique.
Figure 7. Clustered IRS image of Kolkata using GAPS clustering technique with Sym-index based method.
toward the lower-right portion, and a township, SaltLake, to the upper-left hand side of the fisheries. This township is bounded on the top by a canal. Two parallel lines observed towards the upper right-hand side of the image correspond to the airstrips in the Dumdum airport. Other than these, there are several water bodies, roads, etc. in the image. From our ground knowledge, we know that the image has four clusters (Maulik and Bandyopadhyay, 2003), and these four clusters correspond to the classes turbid water, pond water, concrete, and open space. The VGALS-clustering technique automatically provides four clusters for this image data (Fig. 4). It may be noted that the water class has been differentiated into turbid water (the Hooghly) and pond water (fisheries, etc.) because of a difference in their spectral properties. The canal bounding SaltLake from the upper portion has also been correctly classified as pond water. Figure 5 shows the Kolkata image partitioned into four clusters using the FCM algorithm (Bezdek, 1973). This segmentation result is visually unsatisfactory. As can be seen, the river Hooghly as well as the city region have been incorrectly classified as belonging to the same class. Therefore, we can conclude that although some regions, viz., fisheries, the canal bounding SaltLake, parts of the airstrip, etc., have been correctly identified, a significant amount of confusion is evident in the FCM clustering result. The GAPS clustering technique with Sym-index based method automatically provides K = 4 clusters from this data set. The corresponding partitioning is shown in Figure 7. The automatically segmented image after application of the mean-shift based segmentation technique (Comaniciu and Meer, 2002) is shown in Figure 6.
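For readers unfamiliar with the FCM baseline referred to above, the following is a minimal NumPy sketch of the standard Fuzzy C-means iteration. It is an illustration only, not the implementation used in the experiments: the function name `fuzzy_c_means`, the fuzzifier `m = 2`, the initialization from randomly chosen data points, and the stopping tolerance are all assumptions made for this sketch.

```python
import numpy as np

def fuzzy_c_means(data, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Illustrative Fuzzy C-means; returns (centers, membership matrix)."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: initialize the centers with distinct, randomly chosen points.
    centers = data[rng.choice(len(data), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Euclidean distance of every point to every center, shape (n, K).
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero
        # Membership update: u_ik proportional to dist_ik^(-2 / (m - 1)).
        inv = dist ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
        # Center update: mean of the data weighted by u^m.
        w = u ** m
        new_centers = (w.T @ data) / w.sum(axis=0)[:, None]
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    return centers, u

# Toy data: two well-separated groups of three points each.
toy = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
toy_centers, toy_u = fuzzy_c_means(toy, n_clusters=2)
toy_labels = toy_u.argmax(axis=1)  # hard labels by maximum membership
print(toy_labels)
```

A segmentation is obtained by assigning each pixel to the cluster with the highest membership. Note that FCM needs the number of clusters as input, which is why K is supplied by hand for this baseline in the comparisons.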
To validate the segmentation result obtained by VGALS-clustering quantitatively, the values of two well-known Euclidean distance based cluster validity indices, namely the I index (Maulik and Bandyopadhyay, 2002) and the XB-index (Xie and Beni, 1991), are also computed. These are provided in Table I. Smaller values of XB-index and larger values of I index correspond to good clustering. The values again show that the segmentation provided by VGALS-clustering is better than those of the other existing methods.
C. SPOT Image of Kolkata. The French satellites SPOT (Système Probatoire d'Observation de la Terre) (Richards, 1993), launched in 1986 and 1990, carry two imaging devices that consist of a linear array of charge coupled device (CCD) detectors. Two imaging modes are possible, the multispectral and panchromatic modes. The 512 × 512 SPOT image of a part of the city of Kolkata is available in three bands in the multispectral mode. These bands are:
Band 1: green band of wavelength 0.50–0.59 μm
Band 2: red band of wavelength 0.61–0.68 μm
Band 3: near infra red band of wavelength 0.79–0.89 μm
Thus, here the feature vector of each image pixel is composed of three intensity values at different bands. The distribution of the pixels in the feature space of this image is shown in Figure 9. It can be seen from Figure 9 that the entire data can be partitioned into several hyperspherical clusters.
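The mapping from a multispectral image to the data set that is clustered can be sketched as below. This is a minimal illustration under stated assumptions: the image is held as a NumPy array of shape (height, width, bands), the helper names `pixels_to_feature_vectors` and `labels_to_image` are hypothetical, and the demo array is random data standing in for a real 512 × 512 three-band image.

```python
import numpy as np

def pixels_to_feature_vectors(image):
    """Flatten an (H, W, B) multispectral image into an (H*W, B) data matrix.

    Each row is the feature vector of one pixel: its intensity in every band.
    """
    image = np.asarray(image)
    if image.ndim != 3:
        raise ValueError("expected an array of shape (height, width, bands)")
    height, width, bands = image.shape
    return image.reshape(height * width, bands)

def labels_to_image(labels, height, width):
    """Fold per-pixel cluster labels back into an (H, W) segmentation map."""
    return np.asarray(labels).reshape(height, width)

# Synthetic stand-in for a 512 x 512 image with three bands (not real data).
rng = np.random.default_rng(0)
demo_image = rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8)
features = pixels_to_feature_vectors(demo_image)
print(features.shape)  # (262144, 3)
```

The row-major reshape keeps pixel (r, c) at row r * width + c of the data matrix, so the labels produced by any clustering method can be folded back into an image with `labels_to_image`.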
Table I. I index and XB-index values of the segmented Mumbai and Kolkata satellite images provided by VGALS-clustering, FCM-clustering, and the method proposed in (Bandyopadhyay and Saha, 2007).
Image          Method           I index    XB index
Kolkata IRS    VGALS              24.24        1.75
Kolkata IRS    FCM                 5.71       23.67
Kolkata IRS    GAPS with Sym      18.27        2.23
Mumbai IRS     VGALS             214.78        2.12
Mumbai IRS     FCM                23.06        4.67
Mumbai IRS     GAPS with Sym     180.45        2.91
SPOT Kolkata   VGALS              26.58        2.28
SPOT Kolkata   FCM                 5.71       12.23
SPOT Kolkata   GAPS with Sym      20.67        3.05
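As an illustration of how the two validity indices can be evaluated, the sketch below computes crisp-partition versions of both. It assumes hard cluster labels (memberships of 0 or 1), the usual definitions XB = (sum of squared point-to-center distances) / (n × minimum squared center separation) and I = ((1/K) × (E1/EK) × DK)^p with p = 2, and hypothetical function names `xb_index` and `i_index`; the fuzzy variants would weight the distances by membership values instead.

```python
import numpy as np

def xb_index(data, centers, labels):
    """Crisp Xie-Beni index: compactness / (n * minimum center separation)."""
    data = np.asarray(data, dtype=float)
    centers = np.asarray(centers, dtype=float)
    labels = np.asarray(labels)
    compactness = np.sum((data - centers[labels]) ** 2)
    sep = np.sum((centers[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    np.fill_diagonal(sep, np.inf)  # ignore the distance of a center to itself
    return compactness / (len(data) * sep.min())

def i_index(data, centers, labels, p=2):
    """Crisp I index: ((1/K) * (E_1 / E_K) * D_K) ** p."""
    data = np.asarray(data, dtype=float)
    centers = np.asarray(centers, dtype=float)
    labels = np.asarray(labels)
    k = len(centers)
    # E_1: total distance to the single center obtained for K = 1.
    e_1 = np.linalg.norm(data - data.mean(axis=0), axis=1).sum()
    # E_K: total distance of the points to their assigned centers.
    e_k = np.linalg.norm(data - centers[labels], axis=1).sum()
    # D_K: largest distance between any two centers.
    d_k = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2).max()
    return ((1.0 / k) * (e_1 / e_k) * d_k) ** p

# Toy example: two compact, well-separated clusters.
toy_data = np.array([[0, 0], [0, 2], [10, 0], [10, 2]])
toy_centers = np.array([[0, 1], [10, 1]])
toy_labels = np.array([0, 0, 1, 1])
print(xb_index(toy_data, toy_centers, toy_labels))
print(i_index(toy_data, toy_centers, toy_labels))
```

Compact, well-separated clusters drive the XB-index toward zero and the I index upward, which is the direction of "good clustering" used when reading Table I.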
Figure 8. SPOT image of Kolkata in the near infra red band with histogram equalization.
Some important land covers of Kolkata are present in the image. Most of these can be identified, from a knowledge about the area, more easily in the near infra-red band of the input image (Fig. 8). These are the following: The prominent black stretch across the figure is the river Hooghly. Portions of a bridge (referred to as the second bridge), which was under construction when the picture was taken, protrude into the Hooghly near its bend around the center of the image. There are two distinct black, elongated patches below the river, on the left side of the image. These are water bodies, the one to the left being Garden Reach lake and the one to the right being Khidirpore dockyard. Just to the right of these water bodies, there is a very thin line, starting from the right bank of the river, and going to the bottom edge of the picture. This is a canal called the Talis nala. Above the Talis nala, on the right side of the picture, there is a triangular patch, the race course. On the top, right hand side of the image, there is a thin line, stretching from the top edge,
Figure 10. Clustered SPOT image of Kolkata using VGALS clustering technique.
and ending on the middle, left edge. This is the Beleghata canal with a road by its side. There are several roads on the right side of the image, near the middle and top portions. These are not very obvious from the images. A bridge cuts the river near the top of the image. This is referred to as the first bridge. The proposed VGALS clustering method automatically provides K = 7 as the optimal number of clusters for this image data set (the corresponding partitioning is shown in Figure 10). As identified in (Pal et al., 2001), the above satellite image has seven classes, namely, turbid water, concrete, pure water, vegetation, habitation, open space, and roads (including bridges). The partitioning provided by the
Figure 9. Data distribution of SPOT image of Kolkata in the feature space.
Figure 11. Clustered SPOT image of Kolkata using FCM clustering technique.
Table II. Confusion matrix of the partitioning obtained by VGALS clustering technique for numeric SPOT Kolkata image. The following notations are used: TW: turbid water, C: concrete, PW: pure water, V: vegetation, H: habitation, OS: open space, and R: roads.
Ground Truth (Percent)
Class    TW     C    PW     V     H    OS     R
TW       87     0    13     0     0     0     0
C         0    83     0     0    12     0     5
PW       13     0    82     0     0     0     5
V         0     0     0    83    10     7     0
H         0     0     0    15    80     4     1
OS        0     0     0     3    10    81     6
R         0     1     5     0     1     4    89
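The class-wise accuracies can be read directly off the diagonal of the confusion matrix in Table II, as the short sketch below shows. The matrix values are copied from the table; the variable names are illustrative, and the only assumption is that every row is expressed in percent and therefore sums to 100. An overall accuracy would additionally need the number of labelled pixels per class, which the table does not give, so it is not computed here.

```python
import numpy as np

classes = ["TW", "C", "PW", "V", "H", "OS", "R"]

# Values copied from Table II; every row is in percent.
confusion_percent = np.array([
    [87,  0, 13,  0,  0,  0,  0],
    [ 0, 83,  0,  0, 12,  0,  5],
    [13,  0, 82,  0,  0,  0,  5],
    [ 0,  0,  0, 83, 10,  7,  0],
    [ 0,  0,  0, 15, 80,  4,  1],
    [ 0,  0,  0,  3, 10, 81,  6],
    [ 0,  1,  5,  0,  1,  4, 89],
])

# Sanity check: each row should account for 100 percent of its class.
row_sums = confusion_percent.sum(axis=1)

# Class-wise accuracy is the diagonal entry of each row.
per_class_accuracy = dict(zip(classes, np.diag(confusion_percent).tolist()))

# For each class, find the other class it is most often confused with.
off_diagonal = confusion_percent.copy()
np.fill_diagonal(off_diagonal, -1)
most_confused_with = {
    classes[i]: classes[int(off_diagonal[i].argmax())]
    for i in range(len(classes))
}

print(per_class_accuracy)
print(most_confused_with)
```

The off-diagonal entries show where the remaining confusion sits, for example between turbid water and pure water, and between vegetation and habitation.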
Figure 12. Clustered SPOT image of Kolkata using mean-shift based clustering technique.
VGALS clustering technique separates almost all the regions well. The Talis nala has been identified properly by the proposed method (shown in Figure 10). The bridge is also correctly identified by the proposed algorithm. This again shows that the proposed VGALS is able to detect clusters of widely varying sizes. The segmentation result obtained by the Fuzzy C-means algorithm on this image for K = 7 (the actual number of clusters present in this image data set) is shown in Figure 11. It can be seen from Figure 11 that the FCM algorithm is not able to detect the bridge. The automatically segmented image obtained after application of the mean-shift based segmentation technique (Comaniciu and Meer, 2002) is shown in Figure 12. Here
also the bridge has not been detected. The method proposed in (Saha and Bandyopadhyay, 2008) (GAPS along with Sym-index) provides K = 6 clusters from this data set. The corresponding partitioning is shown in Figure 13. This technique is also able to detect the bridge correctly. In order to validate the segmentation result obtained by VGALS-clustering quantitatively, the values of two well-known Euclidean distance based cluster validity indices, namely the I index (Maulik and Bandyopadhyay, 2002) and the XB-index (Xie and Beni, 1991), are also computed. These are provided in Table I. Smaller values of XB-index and larger values of I index correspond to good clustering. The values again show that the segmentation provided by VGALS-clustering is better than those of the existing clustering techniques. To validate the results, 932 pixel positions were manually selected from seven different land cover types and labelled accordingly. The confusion matrix of the partitioning obtained by VGALS clustering technique for this data set is shown in Table II. The class by class classification accuracies are 87%, 83%, 82%, 83%, 80%, 81%, and 89%, respectively. The overall accuracy is 87%.
D. Results on MR Brain Images. The MR images of the brain chosen for the experiments are available in three bands: T1-weighted, proton density (pd)-weighted, and T2-weighted. The normal brain images are obtained from the Brainweb database (BrainWeb). The images correspond to 1 mm slice thickness, 3% noise (calculated relative to the brightest tissue), and 20% intensity nonuniformity. The image of size 217 × 181 is available in 181 different z planes. The proposed clustering algorithm is executed on seven of these z planes. The parameters of the VGALS
Table III. Minkowski Scores (MS) obtained by FCM, EM, and VGALS clustering algorithms on simulated MR volumes for normal brain projected on different z planes. Here #AC and #OC denote, respectively, the actual number of clusters and the automatically obtained number of clusters (after application of VGALS).
                       MS for AC             MS for OC
z Plane No.   #AC    FCM      EM     #OC    FCM     EM     VGALS
1               6    1.077    1.052   11    0.80    1.17   0.77
2               6    0.76     0.78     9    0.65    0.83   0.62
3               6    0.57     0.76     8    0.62    0.64   0.59
36              9    0.90     0.98     9    0.90    0.98   0.84
72             10    0.75     0.74    10    0.75    0.74   0.70
108             9    0.79     0.589   10    0.81    0.68   0.701
144             9    0.82     0.72    11    0.49    0.60   0.32
Figure 13. Clustered SPOT image of Kolkata using GAPS clustering technique with Sym-index based method.
Figure 14. (a) Original T1-weighted MR image of the normal brain in z1 plane. (b) Segmentation obtained by VGALS clustering technique.
algorithm are as follows: population size = 20, total number of generations = 15, probability of crossover (μc) = 0.9, and probability of mutation (μm) = 0.02. The number of clusters, K, is varied from 2 to 20. For the normal MR brain image, the ground truth information is available to us. There are a total of 10 classes present in the images, but the number of classes varies along the different z planes. The ten classes are Background, CSF, Grey Matter, White Matter, Fat, Muscle/Skin, Skin, Skull, Glial Matter, and Connective. Table III shows the actual number of clusters and the number of clusters automatically determined by the proposed VGALS clustering technique (after application on the above mentioned brain images projected on different z planes). To measure the segmentation solution quantitatively, we have also calculated the Minkowski Score (MS) (Ben-Hur and Guyon, 2003). This is a measure of the quality of a solution given the true clustering. Let T be the true solution and S the solution we wish to measure. Denote by n11 the number of pairs of elements that are in the same cluster in both S and T. Denote by n01 the number of pairs that are in the same cluster only in S and not in T, and by n10 the number of pairs that are in the same cluster in T and not in S. The Minkowski Score (MS) is then defined as:
$$MS(T, S) = \sqrt{\frac{n_{01} + n_{10}}{n_{11} + n_{10}}}.$$
For MS, the optimum score is 0, with lower scores being better. The MS scores obtained by VGALS clustering corresponding to the 7 brain images are also reported in Table III. For the purpose of comparison, we have executed the Fuzzy C-means (FCM) (Bezdek, 1973) and Expectation Maximization (EM) (Jain et al., 1999) algorithms on the above mentioned brain data sets with two different values of K. In the first case, K is kept equal to the actual number of clusters present in that particular plane. Next, it is set equal to the number automatically determined by the VGALS algorithm. The MS scores obtained by both comparison algorithms are also reported in Table III for all seven images. Results show that the MS scores corresponding to the partitionings provided by VGALS clustering are, in general, the minimum among all the partitionings. This indicates the superior performance of VGALS in automatically detecting the proper partitioning from MR normal brain images. Figures 14a, 15a, 16a, 17a, and 18a show the original MR normal brain images in the T1 band projected on the z1, z36, z72, z108, and z144 planes, respectively. Figures 14b, 15b, 16b, 17b, and 18b show, respectively, the corresponding automatically segmented images obtained after application of the VGALS clustering algorithm.
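The Minkowski Score can be computed without enumerating all pairs of pixels by counting pairs through a contingency table of the two labelings. The sketch below is one way to do this in pure Python; the function name `minkowski_score` and the toy labelings are illustrative assumptions, not part of the original experiments.

```python
import math
from collections import Counter

def minkowski_score(true_labels, pred_labels):
    """MS(T, S) = sqrt((n01 + n10) / (n11 + n10)), lower is better."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("both labelings must cover the same elements")

    def pairs(count):
        # Number of unordered pairs among `count` elements.
        return count * (count - 1) // 2

    # Pairs that share a cluster in both T and S.
    n11 = sum(pairs(c) for c in Counter(zip(true_labels, pred_labels)).values())
    # Pairs that share a cluster in T, and pairs that share a cluster in S.
    same_in_t = sum(pairs(c) for c in Counter(true_labels).values())
    same_in_s = sum(pairs(c) for c in Counter(pred_labels).values())
    n10 = same_in_t - n11  # together in T but not in S
    n01 = same_in_s - n11  # together in S but not in T
    if same_in_t == 0:
        raise ValueError("the true labeling has no co-clustered pairs")
    return math.sqrt((n01 + n10) / (n11 + n10))

truth = [0, 0, 1, 1]
print(minkowski_score(truth, [1, 1, 0, 0]))  # same partition, relabelled
print(minkowski_score(truth, [0, 0, 0, 0]))  # everything merged
print(minkowski_score(truth, [0, 1, 2, 3]))  # everything split
```

The score depends only on which elements are grouped together, not on the label names, so the relabelled partition scores 0. Merging all elements into one cluster scores above 1 in this toy case, which is consistent with some of the values greater than 1 reported in the tables.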
Table IV. Minkowski Scores (MS) obtained by FCM, EM, and VGALS clustering algorithms on simulated MR volumes for brain with multiple sclerosis lesions projected on different z planes. Here #AC and #OC denote, respectively, the actual number of clusters and the automatically obtained number of clusters (after application of VGALS).
                       MS for AC           MS for OC
z Plane No.   #AC    FCM     EM     #OC    FCM     EM     VGALS
1               6    0.59    0.59     9    0.80    1.12   0.77
2               6    0.75    0.76     8    0.66    0.83   0.66
5               6    0.74    0.76     9    0.69    0.77   0.75
36              9    0.99    1.01     9    0.99    1.01   0.88
72             11    0.72    0.63     9    0.77    0.67   0.75
108            10    0.78    0.58    10    0.80    0.71   0.71
144             9    0.31    0.76    11    0.89    0.88   0.88
Next, the proposed algorithm is executed on some simulated MR volumes for brain with multiple sclerosis lesions obtained from (BrainWeb). These images are again available in 3 modalities: T1-weighted, proton density (pd)-weighted, and T2-weighted. These images also correspond to 1 mm slice thickness, 3% noise (calculated relative to the brightest tissue), and 20% intensity nonuniformity. Now, the images contain a total of 11 classes. These are Background, CSF, Grey Matter, White Matter, Fat, Muscle/Skin, Skin, Skull, Glial Matter, Connective, and MS Lesion. However, the number of classes varies along the z planes. The image is available in 181 different z planes. The VGALS clustering algorithm is executed on the images projected on 7 different z planes. The parameters of the algorithm are the same as above. The ground truth information is available to us. The MS score (Ben-Hur and Guyon, 2003) is calculated after application of the VGALS clustering technique in order to measure the goodness of the solutions. Table IV shows the actual number of clusters present in the image, the obtained number of clusters, and the goodness of the corresponding partitioning in terms of the MS score after application of the VGALS clustering algorithm on these 7 different MS Lesion Brain images. For the purpose of comparison, the Fuzzy C-means algorithm (Bezdek, 1973) and the EM algorithm (Jain et al., 1999) are again executed on these images, firstly with the number of clusters automatically determined by VGALS and then with the actual number of clusters present in the images. The MS scores of the corresponding partitionings are also
provided in Table IV. The results again show that the MS scores corresponding to the partitionings provided by VGALS clustering are, in general, the minimum. This again indicates the effectiveness of VGALS clustering in automatically segmenting the MR brain images with multiple sclerosis lesions. Figures 19a, 20a, 21a, 22a, and 23a show the original MS Lesion Brain image in the T1 band projected on the z2, z36, z72, z108, and z144 planes, respectively. Figures 19b, 20b, 21b, 22b, and 23b show, respectively, the corresponding automatically segmented images obtained after application of the VGALS clustering algorithm. The calculated MS scores show that the proposed VGALS clustering performs well in segmenting both the normal brain image and the brain image with multiple sclerosis lesions. Further experiments are carried out in order to establish the effectiveness of the line symmetry based distance in the VGALS clustering technique. The VGALS clustering technique is now executed on the first 10 z planes of the MR brain images with multiple sclerosis lesions. The corresponding MS scores are provided in Table V. The results are then compared with those obtained by Fuzzy-VGA (Maulik and Bandyopadhyay, 2003) (a fuzzy variable string length genetic algorithm based clustering technique), which optimizes the popular Euclidean distance based Xie-Beni (XB) index (Xie and Beni, 1991) and uses the Euclidean distance to compute the membership values of points to different clusters. The number of clusters and the MS scores obtained by Fuzzy-VGA after applying it on the first 10 z planes of the MR brain images with multiple sclerosis
Figure 19. (a) Original T1-weighted MRI image of the brain with multiple sclerosis lesions in z2 plane. (b) Segmentation obtained by VGALS clustering technique.
lesions are also provided in Table V. Results show that the proposed VGALS clustering technique is more effective than Fuzzy-VGA. This establishes the fact that the line symmetry based distance is more effective in segmenting the MR brain images than the existing Euclidean distance.
Table V. The automatically obtained cluster (OC) number and the corresponding Minkowski Scores (MS) after application of VGA and VGALS clustering algorithms on simulated MR volumes for brain with multiple sclerosis lesions projected on the first 10 z planes.
                        VGA             VGALS
z Plane No.   AC     OC    MS        OC    MS
1              6      2    1.21      11    0.77
2              6      2    1.20       8    0.77
3              6      2    1.19      11    0.78
4              6      5    0.69       9    0.77
5              6      2    1.184      9    0.75
6              6      2    1.18       8    0.71
7              6      2    1.17       9    0.78
8              6      2    1.16      10    0.73
9              6      2    1.16      11    0.84
10             9      2    1.17      11    0.83
V. DISCUSSION AND CONCLUSION
In this article, a new variable string length genetic algorithm based clustering technique is developed which utilizes a recently developed line symmetry based distance for assignment of points to different clusters. A new cluster validity index based on the line symmetry based distance is also developed here, and thereafter it is utilized for computing the fitness of the proposed genetic clustering technique. The proposed clustering technique can automatically determine the appropriate partitioning and the appropriate number of partitions from a given data set having line symmetrical clusters. The effectiveness of the proposed technique is shown in detecting the proper partitioning from two remote sensing satellite images of parts of the city of Kolkata. Results are compared with those obtained by the Fuzzy C-means clustering technique, the mean-shift based method, and GAPS-clustering with Sym-index based method (Bandyopadhyay and Saha, 2007). Thereafter, the effectiveness of the proposed algorithm is shown in segmenting several MR brain images. The segmentation results are then compared with the available ground truth information. For the purpose of comparison, the well-known Fuzzy C-means and EM algorithms are also executed on these images. Experimental results show that VGALS clustering is not only able to automatically segment the MR brain images into different tissue classes but also, in general, yields the best segmentation results among the compared methods. Note that the present work does not use any spatial information while segmenting the images, although incorporating spatial information is expected to improve the quality of the results. Thus, new methods of incorporating the spatial information have to be developed in the future. Future work also includes the development of a fuzzy genetic clustering technique based on the line symmetry based distance. Developing new forms of symmetry, such as plane symmetry, is another important direction for future research. The authors are currently working in this direction.
REFERENCES
M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf, Computational geometry: Algorithms and applications, 2nd edition, Springer-Verlag, Berlin, Germany, 2000.
S. Bandyopadhyay and U. Maulik, Genetic clustering for automatic evolution of clusters and application to image classification, Pattern Recogn (2002), 1197–1208.
S. Bandyopadhyay and S. Saha, GAPS: A clustering method using a new point symmetry based distance measure, Pattern Recogn 40 (2007), 3451.
S. Bandyopadhyay and S. Saha, A point symmetry based clustering technique for automatic evolution of clusters, IEEE Trans Knowl Data Eng 20 (2008), 1–17.
S. Bandyopadhyay, U. Maulik, and A. Mukhopadhyay, Multiobjective genetic clustering for pixel classification in remote sensing imagery, IEEE Trans Geosci Remote Sens 45 (2007), 1506–1511.
A. Ben-Hur and I. Guyon, Detecting stable clusters using principal component analysis, in Methods in Molecular Biology, M. Brownstein and A. Kohodursky (Editors), Humana Press, 2003.
J.L. Bentley, B.W. Weide, and A.C. Yao, Optimal expected-time algorithms for closest point problems, ACM Trans Math Software 6 (1980), 563–580.
J.C. Bezdek, Fuzzy mathematics in pattern classification, Ph.D. dissertation, Cornell University, Ithaca, NY, 1973.
J.C. Bezdek, Pattern recognition with fuzzy objective function algorithms, Plenum, New York, 1981.
S.M. Bhandarkar and H. Zhang, Image segmentation using evolutionary computation, IEEE Trans Evol Comp 3 (1999), 1–21.
G. Bilgin, S. Erturk, and T. Yildirim, Unsupervised classification of hyperspectral-image data using fuzzy approaches that spatially exploit membership relations, IEEE Geosci Remote Sens Lett 5 (2008), 673–677.
BrainWeb: Simulated brain database. Available at: http://www.bic.mni.mcgill.ca/brainweb/.
C.A. Cocosco, V. Kollokian, R.K.-S. Kwan, and A.C. Evans, BrainWeb: Online interface to a 3D MRI simulated brain database, NeuroImage, vol. 5, no. 4, part 2/4, S425, 1997, Proceedings of 3rd International Conference on Functional Mapping of the Human Brain, Copenhagen, May 1997.
R.L. Cannon, R. Dave, J.C. Bezdek, and M. Trivedi, Segmentation of a thematic mapper image using fuzzy c-means clustering algorithm, IEEE Trans Geosci Remote Sens 24 (1986), 400–408.
V.V. Chamundeeswari, D. Singh, and K. Singh, Unsupervised land cover classification of SAR images by contour tracing, Proc. of IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2007), Barcelona, Spain, July 23–28, 2007, pp. 547–550.
C.-C.T. Chen and D.A. Landgrebe, A spectral design system for the HIRIS/MODIS era, IEEE Trans Geosci Remote Sens 27 (1989), 681–686.
D. Comaniciu and P. Meer, Mean shift: A robust approach toward feature space analysis, IEEE Trans Pattern Anal Machine Intell 24 (2002), 603–619.
M.A. Friedl, D.K. McIver, J.C.F. Hodges, X.Y. Zhang, D. Muchoney, A.H. Strahler, C.E. Woodcock, S. Gopal, A. Schneider, A. Cooper, A. Baccini, F. Gao, and C. Schaaf, Global land cover mapping from MODIS: Algorithms and early results, Remote Sens Environ 83 (2002), 287–302.
J.H. Friedman, J.L. Bentley, and R.A. Finkel, An algorithm for finding best matches in logarithmic expected time, ACM Trans Math Software 3 (1977), 209–226.
V. Gandhi, J.M. Kang, S. Shekhar, J. Ju, E.D. Kolaczyk, and S. Gopal, Context inclusive function evaluation: A case study with EM-based multi-scale multi-granular image classification, Knowl Inf Syst 21 (2009), 231–247. DOI=http://dx.doi.org/10.1007/s10115-009-0208-0.
H. Ghassemian and P.A. Landgrebe, Object oriented feature extraction method for image data compaction, IEEE Control Syst Mag 8 (1988), 42–48.
D. Guo, H. Xiong, V. Atluri, and N.R. Adam, Object discovery in high-resolution remote sensing images: A semantic perspective, Knowl Inf Syst 19 (2009), 211–233. DOI=http://dx.doi.org/10.1007/s10115-008-0160-4.
E.E. Hilbert, Cluster compression algorithm: A joint clustering data compression concept, JPL Publication, NASA, Technical Report, 1977.
M.A. Jaffar, A. Hussain, and A.M. Mirza, Fuzzy entropy based optimization of clusters for the segmentation of lungs in CT scanned images, Knowl Inf Syst 24 (2010), 91–111. DOI=http://dx.doi.org/10.1007/s10115-009-0225-z.
A.K. Jain, M.N. Murty, and P.J. Flynn, Data clustering: A review, ACM Comput Surv 31 (1999), 264–323. DOI=http://doi.acm.org/10.1145/331499.331504.
I. Jolliffe, Principal component analysis, Springer Series in Statistics, England, 1986.
R. Kauth, A. Pentland, and G. Thomas, BLOB: An unsupervised clustering approach to spatial preprocessing of MSS imagery, Proceedings of 11th Int. Symp. Remote Sensing of the Environment, Ann Arbor, Mich, 1977, pp. 1309–1317.
A. Marcal and L. Castro, Hierarchical clustering of multispectral images using combined spectral and spatial criteria, IEEE Geosci Remote Sens Lett 2 (2005), 59–63.
U. Maulik and S. Bandyopadhyay, Performance evaluation of some clustering algorithms and validity indices, IEEE Trans Pattern Anal Machine Intell 24 (2002), 1650–1654.
U. Maulik and S. Bandyopadhyay, Fuzzy partitioning using a real-coded variable-length genetic algorithm for pixel classification, IEEE Trans Geosci Remote Sens 41 (2003), 1075–1081.
D.M. Mount and S. Arya, ANN: A library for approximate nearest neighbor searching, 2005. Available at: http://www.cs.umd.edu/~mount/ANN.
S.K. Pal, S. Bandyopadhyay, and C.A. Murthy, Genetic classifiers for remotely sensed images: Comparison with standard methods, Int J Remote Sens 22 (2001), 2545–2569.
M.L. Pugh and A.M. Waxman, Classification of spectrally-similar land cover using multi-spectral neural image fusion and the fuzzy ARTMAP neural classifier, Proc. of IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2006), Denver, Colorado, July 31–Aug. 4, 2006, pp. 1808–1811.
J.A. Richards, Remote sensing digital image analysis: An introduction, Springer-Verlag, New York, 1993.
S. Saha and S. Bandyopadhyay, MRI brain image segmentation by fuzzy symmetry based genetic clustering technique, Proceedings of the 2007 IEEE Congress on Evolutionary Computation (CEC07), Singapore, 2007a, pp. 4417–4424.
S. Saha and S. Bandyopadhyay, A genetic clustering technique using a new line symmetry based distance measure, Proceedings of Fifth International Conference on Advanced Computing and Communications (ADCOM07), IEEE Computer Society, Guwahati, India, 2007b, pp. 365–370.
S. Saha and S. Bandyopadhyay, Application of a new symmetry based cluster validity index for satellite image segmentation, IEEE Geosci Remote Sens Lett 5 (2008), 166–170.
S. Saha and S. Bandyopadhyay, MR brain image segmentation using a multi-seed based automatic clustering technique, Fundam Inform 97 (2009), 199–214.
S. Saha and S. Bandyopadhyay, A new multiobjective clustering technique based on the concepts of stability and symmetry, Knowl Inf Syst 23 (2010), 1–27.
S. Saha and S. Bandyopadhyay, Automatic MR brain image segmentation using a multiseed based multiobjective clustering approach, Applied Intelligence. DOI: 10.1007/s10489-010-0231-6.
S. Saha and S. Bandyopadhyay, On principle axis based line symmetry clustering techniques, Memetic Computing. DOI: 10.1007/s12293-010-0049-0.
K. Sayood, Data compression in remote sensing applications, IEEE Geosci Remote Sens Newslett 84 (1992), 7–15.
J. Suckling, T. Sigmundsson, K. Greenwood, and E. Bullmore, A modified fuzzy clustering algorithm for operator independent brain tissue classification of dual echo MR images, Magn Reson Imaging 17 (1999), 1065–1076.
Y. Tarabalka, J.A. Benediktsson, and J. Chanussot, Spectral–spatial classification of hyperspectral imagery based on partitional clustering techniques, IEEE Trans Geosci Remote Sens 47 (2009), 2973–2987.
M. Tyagi, F. Bovolo, A. Mehra, S. Chaudhuri, and L. Bruzzone, A context-sensitive clustering technique based on graph-cut initialization and expectation-maximization algorithm, IEEE Geosci Remote Sens Lett 5 (2008), 21–25.
C. Wemmert, A. Puissant, G. Forestier, and P. Gancarski, Multiresolution remote sensing image clustering, IEEE Geosci Remote Sens Lett 6 (2009), 533–537.
X.L. Xie and G. Beni, A validity measure for fuzzy clustering, IEEE Trans Pattern Anal Machine Intell 13 (1991), 841–847.
Copyright of International Journal of Imaging Systems & Technology is the property of John Wiley & Sons, Inc. and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.

Correspondence to: Sriparna Saha; e-mail: sriparna.saha@gmail.com
