[Equation (2), garbled in extraction: a piecewise definition with one case that depends on $x_j$ and $S_i$ and a second case "otherwise".] (2)
where $v_i = \frac{1}{q} \sum_{r=1}^{q} \|x_r - c_i\|^2$, $q$ is the number of neighbors of $c_i$, and $S_i$ is the set of the $q$ nearest neighbors of $c_i$.
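Step4. For each data point $x_j$, obtain the membership $E_{ij}$ with the following equation:
$$E_{ij} = \frac{\left( W_{ij} \|x_j - c_i\|^2 \right)^{-\frac{1}{\beta - 1}}}{\sum_{h=1}^{k} \left( W_{hj} \|x_j - c_h\|^2 \right)^{-\frac{1}{\beta - 1}}} \tag{3}$$
where $\beta > 2$.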
Step5. Update each cluster center $c_i$ as follows:
$$c_i = \frac{\sum_{j=1}^{N} (E_{ij})^{\beta} W_{ij} x_j}{\sum_{j=1}^{N} (E_{ij})^{\beta} W_{ij}} \tag{4}$$
Step6. If $\max_{i=1,\ldots,k} \|c_i - o_i\| < \varepsilon$, where $o_i$ is the center from the previous iteration, then go to Step8; otherwise go to Step7.
Step7. If $H \le \theta$, then let $H = H + 1$, $o_i = c_i$ ($i = 1, 2, \ldots, k$) and go to Step3; otherwise go to Step8. Here $\theta$ is the maximum number of LFCM iterations.
Step8. End LFCM.
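The loop formed by Step3 to Step8 can be sketched in plain Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' Matlab implementation: the names `lfcm`, `memberships`, `update_centers`, `weight_fn` and `tiny` are hypothetical, the weights of Eq. (2) are supplied by the caller through `weight_fn`, the defaults for `beta`, `eps` and `max_iter` are placeholder values, and the iteration counter is simplified to a `for` loop.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def memberships(points, centers, weights, beta, tiny=1e-12):
    """Eq. (3): E[i][j] is the membership of point j in cluster i."""
    expo = -1.0 / (beta - 1.0)
    k, n = len(centers), len(points)
    E = [[0.0] * n for _ in range(k)]
    for j, x in enumerate(points):
        # `tiny` avoids raising zero to a negative power when a point sits
        # exactly on a center (a guard assumed here, not part of the paper).
        terms = [(weights[i][j] * sq_dist(x, centers[i]) + tiny) ** expo
                 for i in range(k)]
        total = sum(terms)
        for i in range(k):
            E[i][j] = terms[i] / total
    return E

def update_centers(points, E, weights, beta):
    """Eq. (4): each center is a weighted mean of all points."""
    dim = len(points[0])
    centers = []
    for Ei, Wi in zip(E, weights):
        coef = [(e ** beta) * w for e, w in zip(Ei, Wi)]
        denom = sum(coef)
        centers.append([sum(c * x[d] for c, x in zip(coef, points)) / denom
                        for d in range(dim)])
    return centers

def lfcm(points, centers, weight_fn, beta=3.0, eps=1e-5, max_iter=300):
    """Step3 to Step8: stop when no center moves by eps or more."""
    E = None
    for _ in range(max_iter):
        old = centers                                        # o_i = c_i
        weights = weight_fn(points, old)                     # Step3, caller-supplied
        E = memberships(points, old, weights, beta)          # Step4
        centers = update_centers(points, E, weights, beta)   # Step5
        shift = max(math.sqrt(sq_dist(c, o)) for c, o in zip(centers, old))
        if shift < eps:                                      # Step6
            break
    return centers, E
```

Passing a `weight_fn` that returns all ones reduces the sketch to an ordinary fuzzy c-means style update, which is a convenient way to check the membership and center formulas in isolation.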
III. THE PROPOSED SEEDLFCM
First, we describe how the seeds are generated. Given the number of clusters $k$ and a nonempty set $X = \{x_1, x_2, \ldots, x_N\}$ of unlabeled data in the $d$-dimensional space $R^d$, a clustering algorithm partitions $X$ into $k$ clusters. Let $L$, called the seed set, be a subset of $X$ in which each $x_m \in L$ is given a label by means of supervision[8].
We assume that $L$ can be divided into $k$ groups on the basis of the data labels and that each group contains at least two labeled data points, as required for the implementation of SKM. Therefore, we can acquire a $k$-partitioning $\{L_1, L_2, \ldots, L_k\}$ of the seed set $L$.
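As a small illustration of this partitioning, the sketch below groups labeled seeds by label and enforces the requirement of at least two seeds per group. The function name `partition_seeds` and the convention that labels are the integers 0 to k-1 are assumptions made for the example only.

```python
from collections import defaultdict

def partition_seeds(seeds, labels, k):
    """Split the seed set L into groups L_1..L_k by label.

    Assumes labels are integers in 0..k-1 (a convention chosen for this sketch).
    """
    groups = defaultdict(list)
    for x, y in zip(seeds, labels):
        groups[y].append(x)
    parts = [groups.get(i, []) for i in range(k)]
    for i, part in enumerate(parts):
        # The text requires every group to hold at least two labeled points.
        if len(part) < 2:
            raise ValueError(f"seed group {i} has fewer than two labeled points")
    return parts
```

For example, `partition_seeds([[0], [1], [5], [6]], [0, 0, 1, 1], 2)` yields two groups of two seeds each, while a label with a single seed raises `ValueError`.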
Secondly, the procedure for the proposed clustering
method SeedLFCM is as follows:
Step1. Let the seed set be $L = \{L_1, L_2, \ldots, L_k\}$.
Step2. For each subset $L_i$ of $L$, compute the initial center $c_i$ ($i = 1, 2, \ldots, k$) using the following equation:
$$c_i = \frac{1}{G} \sum_{g=1}^{G} x_g \tag{5}$$
where $x_g \in L_i$ and $G$ is the number of data points belonging to $L_i$.
Step3. Use the $k$ initial centers $c_1, c_2, \ldots, c_k$ for the initialization of SeedLFCM.
Step4. Carry out Step2 through Step8 of LFCM according to the algorithm flow of LFCM.
Step5. When LFCM ends, assign each data point its cluster label.
Step6. Assume that a data point $x$ is assigned to the $i$th cluster. Compute the distance $d_i$ between $x$ and the cluster center $c_i$.
Step7. Calculate the distance $d_m$ between $x$ and the seed $x_m$ nearest to $x$.
Step8. Apply the following novel decision rule to reassign $x$ to a new cluster label $Q$:
$$Q = \begin{cases} m, & \text{if } d_m < d_i \\ i, & \text{otherwise} \end{cases} \tag{6}$$
Step9. SeedLFCM ends.
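The two steps specific to SeedLFCM, the seed-based initialization of Eq. (5) and the decision rule of Eq. (6), can be sketched as follows. The function names `seed_centers` and `reassign` are hypothetical, Euclidean distance is assumed for $d_i$ and $d_m$, and "Q = m" is read here as "take the label of the nearest seed $x_m$", which is an interpretation rather than something the text states explicitly.

```python
import math

def seed_centers(parts):
    """Eq. (5): the initial center c_i is the mean of the seeds in L_i."""
    centers = []
    for part in parts:
        G = len(part)          # number of seeds in this group
        dim = len(part[0])
        centers.append([sum(x[d] for x in part) / G for d in range(dim)])
    return centers

def reassign(x, i, centers, seeds, seed_labels):
    """Eq. (6): return the new label Q for a point x currently in cluster i."""
    d_i = math.dist(x, centers[i])                      # Step6
    m = min(range(len(seeds)), key=lambda s: math.dist(x, seeds[s]))
    d_m = math.dist(x, seeds[m])                        # Step7
    # Assumption: "Q = m" means "use the label of the nearest seed x_m".
    return seed_labels[m] if d_m < d_i else i           # Step8
```

With seeds `[[0, 0], [0, 2], [10, 10], [12, 10]]` labeled `[0, 0, 1, 1]`, a point at `[9, 9]` that LFCM placed in cluster 0 is closer to the seed `[10, 10]` than to the center of cluster 0, so the rule moves it to label 1.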
The three main differences between the previous LFCM and the proposed SeedLFCM are as follows:
a. SeedLFCM is a semi-supervised clustering algorithm, whereas LFCM is an unsupervised clustering method.
b. SeedLFCM uses labeled seeds for initialization, whereas LFCM uses random initialization.
c. SeedLFCM applies a novel decision rule, Eq.(6), to label data on the basis of the results of the LFCM algorithm.
IV. EXPERIMENTAL RESULTS
To demonstrate the effectiveness of the proposed SeedLFCM semi-supervised clustering algorithm, we compared it with several traditional unsupervised and semi-supervised clustering methods, namely KM, LFCM and SeedKM, on one artificial dataset[12] and three real UCI datasets[13], referred to as DUNN, Iris, BUPA and Sonar respectively. As shown in Figure.1, the artificial dataset, DUNN, is a 2-dimensional dataset with 90 instances in two classes. The Iris dataset contains 150 cases with 4-dimensional features from three classes. The BUPA dataset contains 345 6-dimensional cases belonging to two classes. The Sonar dataset contains 208 60-dimensional samples divided into two classes. All experiments were run in Matlab on the Windows XP operating system.
Figure.1 The DUNN dataset
For LFCM and SeedLFCM, on each dataset, we set the maximum number of iterations to $\theta = 300$,
5
10 c