Abstract
Mammography is the most effective procedure for diagnosing breast cancer at an early stage. This paper proposes mammogram
image segmentation using the Rough K-Means (RKM) clustering algorithm. A median filter is used to pre-process the image and
reduce noise. The 14 Haralick features are extracted from the mammogram image using the Gray Level Co-occurrence Matrix (GLCM)
at different angles. The features are clustered by the K-Means, Fuzzy C-Means (FCM) and Rough K-Means algorithms to segment the
regions of interest for classification. The results of the segmentation algorithms are compared and analyzed using Mean Square
Error (MSE) and Root Mean Square Error (RMSE). It is observed that the proposed method produces better results than the
existing methods.
Keywords: Mammogram, Data Mining, Image Processing, Feature Extraction, Rough K-Means, Image Segmentation
1. INTRODUCTION
Breast cancer is the most common type of cancer found among
women; one in 22 women in India is likely to suffer from breast
cancer [1]. It is the second leading cause of cancer death in
women. Breast cancer in India is on the rise and is rapidly
becoming the leading cancer in females; the death toll is
increasing at a fast rate, and there is not yet an effective way
to treat the disease. Early detection is therefore a critical
factor in curing the disease and improving the survival rate.
X-ray mammography is generally regarded as a valuable and the
most reliable method for early detection.
Image segmentation refers to the process of partitioning a digital
image into multiple segments, or sets of pixels. The goal of
segmentation is to simplify the representation of an image into
segments that are more meaningful and easier to analyze. Image
segmentation is typically used to locate objects and boundaries
in images. Equivalently, it assigns a label to every pixel in an
image such that pixels with the same label share certain visual
characteristics. The result is a set of disjoint, homogeneous
regions that collectively cover the entire image [2]. These
regions usually contain similar objects of interest, and their
homogeneity can be measured using pixel intensity. Image
segmentation techniques can be broadly classified into five main
classes: threshold based, cluster based, edge based, region
based, and watershed based segmentation [3]. This paper focuses
on cluster based segmentation.
__________________________________________________________________________________________
Volume: 02 Issue: 10 | Oct-2013, Available @ http://www.ijret.org
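$$\underline{O}_T = \bigcup_i \{\, G_i \mid P_j > T,\ \forall j = 1, 2, 3, \ldots, m \times n \,\}$$
$$\overline{O}_T = \bigcup_i \{\, G_i \mid \exists j,\ j = 1, 2, 3, \ldots, m \times n,\ \text{s.t.}\ P_j > T \,\}$$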
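$$\underline{B}_T = \bigcup_i \{\, G_i \mid P_j \le T,\ \forall j = 1, 2, 3, \ldots, m \times n \,\}$$
$$\overline{B}_T = \bigcup_i \{\, G_i \mid \exists j,\ j = 1, 2, 3, \ldots, m \times n,\ \text{s.t.}\ P_j \le T \,\}$$
Where
$$R_{O_T} = 1 - \frac{|\underline{O}_T|}{|\overline{O}_T|}$$
$$R_{B_T} = 1 - \frac{|\underline{B}_T|}{|\overline{B}_T|}$$
and $|\underline{O}_T|$, $|\overline{O}_T|$, $|\underline{B}_T|$ and $|\overline{B}_T|$ are the cardinalities of the corresponding sets.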
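A minimal sketch of how the object and background approximations and their roughness could be computed is given here. It assumes non-overlapping granules of m x n pixels, treats a pixel as object when its value exceeds the threshold T, and counts granules as the cardinality; the function name `roughness` and the convention of returning 0.0 for an empty upper approximation are illustrative assumptions, not part of the paper.

```python
def roughness(image, T, m, n):
    """Return (R_object, R_background) for threshold T using m x n granules.

    Assumption: a pixel is 'object' when its value is > T, 'background' otherwise.
    """
    rows, cols = len(image), len(image[0])
    lower_o = upper_o = lower_b = upper_b = 0
    for r in range(0, rows - m + 1, m):
        for c in range(0, cols - n + 1, n):
            granule = [image[r + i][c + j] for i in range(m) for j in range(n)]
            above = [p > T for p in granule]
            if all(above):        # every pixel is object: lower approximation of O
                lower_o += 1
            if any(above):        # at least one object pixel: upper approximation of O
                upper_o += 1
            if not any(above):    # every pixel is background: lower approximation of B
                lower_b += 1
            if not all(above):    # at least one background pixel: upper approximation of B
                upper_b += 1
    # Assumed convention: roughness is 0.0 when the upper approximation is empty.
    r_obj = 1 - lower_o / upper_o if upper_o else 0.0
    r_bg = 1 - lower_b / upper_b if upper_b else 0.0
    return r_obj, r_bg
```

A granule that mixes object and background pixels enters both upper approximations but neither lower approximation, which is what makes the roughness non-zero.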
5. PRE-PROCESSING
Pre-processing is an important step in low-level image
processing. Noise present in an image can be removed by
filtering. A high-pass filter passes rapid changes in the gray
level, whereas a low-pass filter attenuates them; that is, a
low-pass filter smooths the image and often removes sharp edges.
A special type of low-pass filter is the median filter. The Median
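As a minimal sketch of median filtering for noise reduction, the following function replaces each pixel by the median of its neighbourhood. The window size of 3 x 3 and the border handling (using only the part of the window that lies inside the image) are assumptions made for illustration; the paper does not state either.

```python
from statistics import median

def median_filter(image, size=3):
    """Apply a size x size median filter to a 2-D list of gray levels.

    Border pixels use only the part of the window inside the image (assumed).
    """
    rows, cols = len(image), len(image[0])
    half = size // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [image[i][j]
                      for i in range(max(0, r - half), min(rows, r + half + 1))
                      for j in range(max(0, c - half), min(cols, c + half + 1))]
            out[r][c] = median(window)
    return out
```

A single bright outlier surrounded by uniform pixels is removed completely, because the outlier never becomes the median of any window.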
6. FEATURE EXTRACTION
The idea is to calculate the co-occurrence matrix for small
regions of the image and then use this matrix to compute
statistical values, for instance Contrast, Correlation,
Uniformity, Homogeneity, Probability, Inverse and Entropy. The
distance and angle are converted to a vertical and a horizontal
offset in pixels according to the following list of angle offsets.
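The conversion from angle and distance to pixel offsets, and the computation of a few texture statistics, can be sketched as follows. The offset table uses a commonly adopted (row, column) convention, which is an assumption because the paper's own list is not reproduced here; the function names `glcm` and `texture_features` are illustrative, and only four of the 14 Haralick features are shown.

```python
import math

# Assumed (row, column) offsets per unit distance for each angle in degrees.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def glcm(image, angle, levels, d=1):
    """Normalised, non-symmetric co-occurrence matrix of an integer-valued image."""
    dr, dc = OFFSETS[angle]
    dr, dc = dr * d, dc * d
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("image too small for this offset")
    return [[v / total for v in row] for row in counts]

def texture_features(p):
    """Contrast, uniformity (energy), homogeneity and entropy of a normalised GLCM."""
    contrast = uniformity = homogeneity = entropy = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            contrast += (i - j) ** 2 * v
            uniformity += v * v
            homogeneity += v / (1 + (i - j) ** 2)
            if v > 0:
                entropy -= v * math.log2(v)
    return {"contrast": contrast, "uniformity": uniformity,
            "homogeneity": homogeneity, "entropy": entropy}
```

In practice the gray levels of a mammogram would be quantised to a small number of levels before building the matrix, so that the GLCM of a small window is not too sparse.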
The Gray-Level Co-occurrence Matrix (GLCM) is one of the most
widely used texture descriptors in the literature. Bovis and
Singh [19] studied how to detect masses in mammograms on the
basis of textural features, using five co-occurrence matrix
statistics extracted from four spatial orientations (horizontal,
left diagonal, vertical and right diagonal, corresponding to 0°,
45°, 90° and 135°) and four pixel distances (d = 1, 3, 6 and 9).
Classification is then performed using each feature vector and
linear discriminant analysis. According to Marti et al. [20],
GLCMs are frequently used in computer vision and give
satisfactory results as texture classifiers in different
applications. Their approach calculates the amount of mutual
information between images using histogram distributions
obtained from grey-level co-occurrence matrices. Blot and
Zwiggelaar [21, 22] proposed two approaches based on the
detection and enhancement of structures in images using GLCMs.
The purpose is to compare the difference between these two
matrices, obtaining a probability estimate of the abnormal image
structures in the ROI. Finally, in another study based on
background texture extraction for classification, Blot and
Zwiggelaar [23] presented work in which there is a statistical
difference between GLCMs for image regions that
7. CLUSTERING ALGORITHM
The main objective of cluster analysis is to group similar
objects into one cluster and to separate dissimilar objects by
assigning them to different clusters. One of the most popular
clustering methods is the K-Means clustering algorithm. It
assigns objects to a pre-defined number of clusters, K, which is
given by the user. The idea is to choose random cluster centres,
one for each cluster, preferably as far as possible from each
other. The Euclidean distance is most commonly used to measure
the distance between data points and centroids [7]. The Euclidean
distance between two multidimensional data points
X = (x1, x2, x3, ..., xm) and Y = (y1, y2, y3, ..., ym) is
defined as follows:
$$D(X, Y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}$$
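The distance and the basic K-Means loop can be sketched as follows. This is a minimal illustration, not the authors' implementation: the initial centres are passed in explicitly rather than drawn at random so that the result is reproducible, and the names `euclidean` and `k_means` as well as the iteration cap are assumptions.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def k_means(points, centres, iterations=100):
    """Alternate nearest-centre assignment and centroid update until stable."""
    centres = [list(c) for c in centres]
    labels = []
    for _ in range(iterations):
        # Assign every point to its nearest centre.
        labels = [min(range(len(centres)), key=lambda j: euclidean(p, centres[j]))
                  for p in points]
        # Recompute each centre as the mean of its members.
        new_centres = []
        for j, old in enumerate(centres):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centres.append(old)  # keep the centre of an empty cluster
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres
```

For segmentation, each point would be the feature vector of one image window, and the returned label would become the segment index of that window.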
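Procedure:
Compute the distance $d_{ij}$ between each data point $x_i$, $i = 1, 2, \ldots, n$, and each cluster centre $C_j$, $j = 1, 2, \ldots, K$.
Compute the membership $U_{ij}$ for all $1 \le i \le n$ and $1 \le j \le K$:
$$U_{ij} = \frac{(d_{ij})^{1/(1-m)}}{\sum_{l=1}^{K} (d_{il})^{1/(1-m)}} \quad \text{if } I_i = \emptyset,$$
$$U_{ij} = 0 \ \ \forall j \notin I_i, \qquad \sum_{j \in I_i} U_{ij} = 1 \quad \text{if } I_i \neq \emptyset,$$
where $I_i = \{\, j \mid 1 \le j \le K;\ d_{ij} = 0 \,\}$.
Compute the new cluster centres for all $1 \le j \le K$:
$$C_j = \frac{\sum_{i=1}^{n} (U_{ij})^m \, x_i}{\sum_{i=1}^{n} (U_{ij})^m}$$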
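The membership and centre updates of Fuzzy C-Means can be sketched as follows. The sketch assumes that d_ij is the squared Euclidean distance, which makes the exponent 1/(1-m) agree with the standard FCM update, and that a point coinciding with several centres shares its membership equally among them; the fuzzifier default m = 2 and the function names are illustrative assumptions.

```python
def sq_dist(x, c):
    """Squared Euclidean distance (assumed meaning of d_ij)."""
    return sum((a - b) ** 2 for a, b in zip(x, c))

def fcm_memberships(points, centres, m=2.0):
    """U[i][j]: degree to which point i belongs to cluster j (fuzzifier m > 1)."""
    U = []
    for x in points:
        d = [sq_dist(x, c) for c in centres]
        zero = [j for j, dj in enumerate(d) if dj == 0]
        if zero:
            # The point sits on one or more centres: split membership among them.
            row = [1 / len(zero) if j in zero else 0.0 for j in range(len(centres))]
        else:
            w = [dj ** (1 / (1 - m)) for dj in d]
            s = sum(w)
            row = [wj / s for wj in w]
        U.append(row)
    return U

def fcm_centres(points, U, m=2.0):
    """Centres as membership-weighted means, with weights U[i][j] ** m."""
    k, dim = len(U[0]), len(points[0])
    centres = []
    for j in range(k):
        weights = [U[i][j] ** m for i in range(len(points))]
        total = sum(weights)
        centres.append([sum(w * x[t] for w, x in zip(weights, points)) / total
                        for t in range(dim)])
    return centres
```

Iterating the two functions until the memberships stop changing gives the full algorithm; a hard segmentation is obtained by assigning each point to the cluster with the largest membership.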
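The Rough K-Means algorithm describes each cluster by a lower approximation $\underline{U}(K)$ and an upper approximation $\overline{U}(K)$ of K clusters.
Procedure:
Step 1: Randomly assign each data object to one lower
approximation $\underline{U}(K)$. By property 2, the data object
also belongs to the upper approximation $\overline{U}(K)$ of the
same cluster.
Step 2: Compute the cluster centroids $C_j$:
If $\underline{U}(K) \neq \emptyset$ and $\overline{U}(K) - \underline{U}(K) = \emptyset$
$$C_j = \frac{\sum_{x \in \underline{U}(K)} x}{|\underline{U}(K)|}$$
Else If $\underline{U}(K) = \emptyset$ and $\overline{U}(K) - \underline{U}(K) \neq \emptyset$
$$C_j = \frac{\sum_{x \in (\overline{U}(K) - \underline{U}(K))} x}{|\overline{U}(K) - \underline{U}(K)|}$$
Else
$$C_j = W_l \times \frac{\sum_{x \in \underline{U}(K)} x}{|\underline{U}(K)|} + W_u \times \frac{\sum_{x \in (\overline{U}(K) - \underline{U}(K))} x}{|\overline{U}(K) - \underline{U}(K)|}$$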
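The centroid computation of Step 2 can be sketched as follows, assuming each cluster is given as a list of objects in its lower approximation and a list of objects in its boundary region (upper approximation minus lower approximation). The weights 0.7 and 0.3 are placeholder defaults chosen for illustration, and the function names are hypothetical.

```python
def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def rough_centroid(lower, boundary, w_lower=0.7, w_upper=0.3):
    """Centroid of one rough cluster.

    lower    : objects in the lower approximation
    boundary : objects in the upper approximation but not in the lower one
    w_lower, w_upper : assumed example weights for the two regions
    """
    if not lower and not boundary:
        raise ValueError("empty cluster")
    if lower and not boundary:
        return mean(lower)       # only certain members
    if boundary and not lower:
        return mean(boundary)    # only uncertain members
    lo, bd = mean(lower), mean(boundary)
    return [w_lower * a + w_upper * b for a, b in zip(lo, bd)]
```

Giving the lower approximation the larger weight means that objects known to belong to the cluster pull the centroid more strongly than objects that are shared with other clusters.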
8. EXPERIMENTAL RESULTS
In this paper, the image samples are taken from the benchmark
MIAS database for analyzing the proposed method. The 14 Haralick
features were extracted using the Gray Level Co-occurrence
Matrix (GLCM). Sub-matrices of size 5 x 5 are used for
constructing the GLCM at different angles with distance d = 1,
and the features are then extracted. The features are clustered
into five groups by the RKM algorithm, and each group is
assigned to one segment; the segmented images are shown in
Figure 3. The same features are clustered using the K-Means and
FCM algorithms with five groups, each group again forming one
segment; the segmented images are shown in Figure 4 and Figure 5.
The quality of the segmentation results is measured using MSE
and RMSE; a lower error value means a better result. Figure 2
shows the proposed system.
The MSE and RMSE values for the RKM segmentation, FCM
segmentation and K-Means segmentation are tabulated in Table I,
Table II, Table III, Table IV, Table V and Table VI respectively.
According to the segmentation errors, mean square error (MSE)
and root mean square error (RMSE), the GLCM at distance 1 and
angle 45° gives the best result for all tested images. These
are demonstrated in Figure 5.
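The two error measures can be sketched as follows. The sketch assumes the error is taken pixel by pixel between two images of equal size, for example the original and the segmented image; the paper does not spell out which pair of images is compared, so that choice is an assumption.

```python
import math

def mse(reference, segmented):
    """Mean square error between two equally sized 2-D lists of gray levels."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(reference, segmented)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)

def rmse(reference, segmented):
    """Root mean square error: the square root of the MSE."""
    return math.sqrt(mse(reference, segmented))
```

Because RMSE is a monotonic function of MSE, both measures always rank the algorithms and angles in the same order; RMSE is simply expressed in gray-level units.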
Figure 2. Proposed system: Image Database, Pre-Processing, Feature Extraction, Clustering, Image Segmentation.
[Figure: original and segmented images of MDB017, MDB018, MDB072, MDB114, MDB213 and MDB290 for GLCM angles 0°, 45°, 90° and 135°]
Performance analysis on error rate (MSE)
Angle   mdb017      mdb072      mdb018      mdb114      mdb213      mdb290
0°      9.75e+003   7.65e+003   6.27e+003   8.23e+003   5.63e+003   7.38e+003
45°     8.05e+003   9.17e+003   6.34e+003   8.26e+003   5.77e+003   7.31e+003
90°     9.82e+003   8.09e+003   6.02e+003   8.06e+003   5.79e+003   8.06e+003
135°    9.11e+003   7.15e+003   5.74e+003   1.10e+004   6.18e+003   6.91e+003

Performance analysis on error rate (RMSE)
Angle   mdb213   mdb290
0°      75.04    85.91
45°     75.97    85.54
90°     76.13    89.79
135°    78.65    83.13
Performance analysis on error rate (MSE)
Angle   mdb017      mdb072      mdb018      mdb114      mdb213      mdb290
0°      1.08e+004   1.18e+004   1.41e+004   8.77e+003   8.84e+003   1.10e+004
45°     8.11e+003   1.06e+004   1.01e+004   8.41e+003   7.94e+003   9.43e+003
90°     1.11e+004   1.30e+004   1.19e+004   9.97e+003   9.86e+003   1.07e+004
135°    1.16e+004   1.29e+004   1.10e+004   1.17e+004   1.01e+004   1.09e+004

Performance analysis on error rate (RMSE)
Angle   mdb017   mdb072   mdb018   mdb114   mdb213   mdb290
0°      104.17   109.00   118.99   93.68    94.05    105.99
45°     90.08    102.99   100.92   91.75    89.10    97.11
90°     105.78   114.08   109.20   99.89    99.31    103.71
135°    107.83   113.58   105.20   108.32   100.61   104.66
Performance analysis on error rate (MSE)
Angle   mdb017      mdb072      mdb018      mdb114      mdb213      mdb290
0°      1.24e+004   1.61e+004   1.56e+004   1.31e+004   1.15e+004   1.18e+004
45°     1.18e+004   1.24e+004   1.20e+004   1.05e+004   1.03e+004   1.08e+004
90°     1.19e+004   1.82e+004   1.27e+004   1.07e+004   1.06e+004   1.18e+004
135°    1.25e+004   1.86e+004   1.24e+004   1.28e+004   1.15e+004   1.24e+004

Performance analysis on error rate (RMSE)
Angle   mdb017   mdb072   mdb018   mdb114   mdb213   mdb290
0°      111.61   127.26   119.82   114.64   107.27   108.91
45°     108.97   111.41   109.81   102.23   101.85   104.26
90°     109.34   135.16   112.85   103.77   103.29   108.93
135°    111.99   136.69   111.66   113.55   107.38   111.58
Figure 6. Performance analysis on error rates: (a) MSE and (b) RMSE for the RKM algorithm; (c) MSE and (d) RMSE for the FCM algorithm; (e) MSE and (f) RMSE for the K-Means algorithm.
CONCLUSIONS
ACKNOWLEDGMENTS
The second author gratefully acknowledges the UGC, New
Delhi, for partial financial assistance under UGC-SAP (DRS)
Grant No. F.3-50/2011.
BIOGRAPHIES
Subash Chandra Boss Rajaraman was born
in 1985 in Villuppuram District, Tamilnadu,
India. He received the Master of Science in
Computer Science in 2009 from Pondicherry
University, Pondicherry, India. He obtained his
M.Phil. (Computer Science) degree from
Periyar University, Salem, Tamilnadu, India, in 2010. He is
currently pursuing a full-time Ph.D. at Periyar University, Salem,
Tamilnadu, India. His areas of interest include Medical Image
Processing, Data Mining, Neural Networks, Fuzzy Logic, and
Rough Sets.
Thangavel Kuttiyannan was born in 1964 in
Namakkal, Tamilnadu, India. He received his
Master of Science from the Department of
Mathematics, Bharathidasan University, in 1986,
and his Master of Computer Applications degree
from Madurai Kamaraj University, India, in
2001. He obtained his Ph.D. degree from the Department of
Mathematics, Gandhigram Rural Institute-Deemed University,
Gandhigram, India, in 1999. He is currently Professor and
Head, Department of Computer Science, Periyar University,
Salem. He is a recipient of the Tamilnadu Scientist award for
the year 2009 and the Sir C.V. Raman award for the year 2013.
His areas of interest include Medical Image Processing,
Data Mining, Artificial Intelligence, Neural Networks, Fuzzy
Logic, and Rough Sets.