
Computers and Electronics in Agriculture 165 (2019) 104962


Segmentation of tomato leaf images based on adaptive clustering number of K-means algorithm

Kai Tian, Jiuhao Li*, Jiefeng Zeng, Asenso Evans, Lina Zhang
College of Water Conservancy and Civil Engineering, South China Agricultural University, Guangdong 510642, China

ARTICLE INFO

Keywords: K-means, Leaf image, Image segmentation, Validity index

ABSTRACT

In image-based intelligent identification of crop diseases, leaf image segmentation is a key step. K-means is one of the most commonly used segmentation algorithms, but it requires the clustering number to be set in advance, so the segmentation quality is influenced by a manual choice. This paper studies an improved K-means algorithm based on an adaptive clustering number for the segmentation of tomato leaf images. All experimental images were acquired from tomato plants we grew: the white paper background images were used for designing the algorithm, and the natural background images were used to validate it. Through a series of pre-processing experiments, the clustering number was determined automatically by calculating the Davies-Bouldin index, and initial clustering centers were specified to prevent the clustering calculation from falling into a local optimum. Finally, we verified the accuracy of the segmentation with two objective assessment measures, the clustering F1 measure and Entropy. Compared with the traditional K-means algorithm, the DBSCAN algorithm, the Mean Shift algorithm and the ExG-ExR color index method, the proposed algorithm segments tomato leaf images more precisely and efficiently.

1. Introduction

The segmentation of diseased crop leaf images can be divided into two steps: leaf segmentation and lesion segmentation (Wang et al., 2018). Fast and accurate segmentation of leaf images has always been a challenging part of image processing for crop disease identification. There are numerous traditional image-processing techniques for image segmentation, such as threshold-based algorithms, edge-based algorithms, region growing methods, and so on (Hammouda and Karray, 2000). With the development of machine learning, new image segmentation algorithms have appeared, such as neural-network-based and clustering-based algorithms. The goal of image segmentation is to separate the different target areas in an image, and the purpose of clustering is to divide data with the same properties into one category; these two aims are essentially the same. A clustering algorithm can therefore be applied to divide an image into a number of discrete regions such that the pixels have high similarity within each region and high contrast between regions.

There are different types of clustering: K-means clustering, fuzzy C-means clustering, the mountain clustering method, and the subtractive clustering method (Chuang et al., 2006). One of the most used clustering algorithms is K-means. It is an unsupervised prototype clustering method that is widely used in images of crop diseases, insect pests and weeds because of its concision and efficiency. But it produces different clustering results for different numbers of clusters, so the proper number of clusters K must be initialized (Bhusare and Bansode, 2014). In addition, K centroids must be initialized, and different initial centroids lead to different clusters. Hence, the selection of proper initial centroids is also an important task.

Many researchers have tried to produce new methods to overcome the drawbacks of the K-means algorithm. Purohit and Joshi (2013) introduced a new, efficient approach to the K-means clustering algorithm. The proposed method generated the cluster centers so as to reduce the mean square error of the final clusters without a large increase in execution time. Many comparisons were made, and the accuracy was higher for dense datasets than for sparse ones. Jose et al. (2014) proposed brain tumor segmentation using K-means clustering and the fuzzy C-means algorithm, together with its area calculation. They divide the segmentation process into three steps: pre-processing of the image, advanced K-means and fuzzy C-means, and lastly feature extraction. The resulting segmented image is used for feature extraction in the region of interest. They used MRI images for the analysis and calculated the size of the extracted tumor region in the image.


* Corresponding author.
E-mail address: jhli@scau.edu (J. Li).

https://doi.org/10.1016/j.compag.2019.104962
Received 26 March 2019; Received in revised form 8 August 2019; Accepted 13 August 2019
0168-1699/ © 2019 Elsevier B.V. All rights reserved.

Yedla et al. (2010) proposed enhancing the K-means clustering algorithm with an improved initial center. A new method for finding the initial centroid was introduced, and it provided an effective way of assigning the data points to suitable clusters with reduced time complexity. They showed that their proposed algorithm has higher accuracy and lower computational time than the original K-means clustering algorithm. This algorithm does not require any additional input such as a threshold value, but it still initializes the number of clusters K, and the determination of the value of K was suggested as future work. Nazeer and Sebastian (2009) proposed an enhanced algorithm to improve the accuracy and efficiency of the K-means clustering algorithm. They present an enhanced K-means algorithm which combines a systematic method for finding the initial centroids with an efficient way of assigning the data points to the clusters. They took different initial centroids and tested execution time and accuracy. From the results, it could be concluded that the proposed algorithm reduced the time complexity without sacrificing the accuracy of the clusters, but the number of desired clusters is still required as an input. Isa et al. (2009) presented four new clustering algorithms, namely the fuzzy K-means (FKM), fuzzy moving K-means (FMKM), adaptive moving K-means (AMKM) and adaptive fuzzy moving K-means (AFMKM), for segmentation purposes. The comparative results showed that the proposed algorithms are less sensitive to noise. Ray and Turi (1999) created an effective index by combining an external index and an internal index of clustering performance measurement to automatically determine the optimal number of clusters. The validity measure was tested on synthetic images for which the number of clusters is known, and was also applied to natural images; but its upper bound on the number of clusters is set on the basis of empirical rules. Sethy et al. (2017) proposed and evaluated a framework for the detection of a defected diseased leaf using K-means clustering based segmentation. The leaves of the rice crop were considered as a case study to validate the proposed K-means clustering segmentation for segmenting the defect area with three clusters, and the outputs were found to be accurate.

In this paper, we introduce an adaptive segmentation method for tomato disease leaf images based on K-means clustering. First, the white paper background images were used as preprocessing data to find the initial cluster centers and to determine the selection of feature parameters. Second, iterative color clustering was conducted using the Manhattan distance as the similarity distance. Finally, the excess green feature (Woebbecke et al., 1995) was used as the distinguishing criterion to discriminate between the leaf and the background automatically. We performed the evaluation both on our in-house data and on public data from Bugwood. Comparison of the proposed algorithm with the fixed-threshold (ExG-ExR) method, the DBSCAN algorithm, the Mean Shift algorithm and the traditional K-means clustering method showed that our method is accurate and robust for tomato disease image segmentation.

The main contributions of this study are as follows. The S, I components of the HSI color space and the a*, b* components of the L*a*b* color space were selected as suitable feature parameters for tomato disease leaf images. We selected a validity index to address the impact of the number of clusters for the tomato disease images; by using this criterion, we can obtain better segmentation results with an appropriate number of clusters. The initial cluster centers given in this paper allow the problem of random initial cluster center selection to be overcome. Through objective assessment measures, the clustering F1 measure and Entropy, we validated the stability of the performance of the proposed criterion, so it is useful for researchers selecting a proper number of clusters.

2. Materials and methods

2.1. Tomato disease images

The tomato disease leaf images used in this study were acquired from the greenhouse of the College of Water Conservancy and Civil Engineering at South China Agricultural University. The tomato seedlings were inoculated with Tomato yellow leaf curl virus and Tomato mosaic virus at the Plant Protection Research Institute, Guangdong Academy of Agricultural Sciences, and then transplanted to the conservatory (Polston et al., 1999). The tomato was raised after a seven-day acclimatization. During the period of growth, Hoagland nutrient solution (Hoagland et al., 1950) (150 ml) was manually sprayed every 2 days using a sprinkling can to ensure the plants survived with enough nutrients.

Tomato image samples were collected as the dataset during the whole growth cycle of the tomato by a digital camera (SONY DSC-HX400), which was set to automatically adjust the focal length and aperture, with auto white balance and the flash turned off. The image resolution was 5184∗3888 pixels. To improve the diversity of the data and the generalization ability of the algorithm, the tomato images were captured regardless of weather conditions.

There were 200 white paper background images and 800 natural background images collected for the segmentation study. The white paper background images were used as experimental data for designing the algorithm, and the natural background images were used to test the effectiveness of the proposed algorithm. Some sample disease images are shown in Fig. 1.

Fig. 1. Sample images.

2.2. Clustering method

As early as 1967, MacQueen summarized the research results of Cox, Fisher, and Sebestyen, and proposed and gave the detailed steps of the K-means algorithm (MacQueen et al., 1967; Yin et al., 2009; Fisher, 1958; Sebestyen, 1962). The core idea of the K-means algorithm is that, for a dataset that needs to be divided into K classes, the similarity between classes should be the minimum and the similarity within each class should be the maximum. The algorithm flow is as follows (a minimal code sketch is given at the end of this section):

Step 1: Randomly select K initial cluster centers from the data set;
Step 2: Calculate the distance of each remaining point to each cluster center according to some distance function, and classify each point into the category of the nearest cluster center;
Step 3: Recalculate the arithmetic mean of each cluster as the new cluster center;
Step 4: Judge convergence: compare the current and previous cluster centers; if there is no change, the clustering is over, otherwise repeat Steps 2 and 3.

In spite of the great advantage of being easy to implement, the K-means algorithm clearly has certain limitations that follow from this description. The final clustering depends on the arbitrary selection of the initial centroids and on the value of K. Randomly chosen initial centroids lead to different results and may even cause the clustering to fall into a local optimum. What is more, the number of clusters must be set in advance, which limits the effectiveness of the K-means algorithm in image segmentation.
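The following is a minimal sketch (not the authors' released code) of the K-means loop in Steps 1-4. It uses the Manhattan (city-block) distance, which the paper states was used as the similarity measure, and accepts optional fixed initial centers such as those given later in Section 3.2; the function name and the NumPy-based implementation are assumptions for illustration only.

```python
import numpy as np

def kmeans_manhattan(X, k, init_centers=None, max_iter=100, seed=0):
    """Plain K-means with Manhattan (L1) distance.

    X: (n_samples, n_features) array of per-pixel feature vectors.
    init_centers: optional (k, n_features) array of fixed initial centers.
    """
    rng = np.random.default_rng(seed)
    if init_centers is None:
        # Step 1: randomly pick k samples as the initial cluster centers.
        centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    else:
        centers = np.asarray(init_centers, dtype=float)

    for _ in range(max_iter):
        # Step 2: assign every sample to the nearest center (L1 distance).
        dists = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        labels = dists.argmin(axis=1)

        # Step 3: recompute each center as the mean of its members.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])

        # Step 4: stop when the centers no longer change.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    return labels, centers
```

Note that Step 3 follows the paper's wording and recomputes each center as the arithmetic mean of its members; strictly, the L1-optimal center would be the component-wise median, but the mean matches the description above.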


2.3. Value of the clustering number

The number of clusters K is required as input and is usually determined by prior knowledge. Sometimes it is difficult to find an appropriate number of clusters, even though it is the key to obtaining optimal image segmentation results.

Admittedly, the pixels of crop disease images can visibly be divided into two classes: leaf and background. But there are many noise factors in the background, such as dust, shadow, and mist. Therefore, it is inappropriate to simply set the number of clusters to two.

A validity index is a common way to determine the optimal number of clusters (Pakhira et al., 2004; Kwon, 1998; Maulik and Bandyopadhyay, 2002). Its process can be summarized as follows: first, assign the range of K, (K_min, K_max); second, run the clustering algorithm with each clustering number in ascending order and obtain a series of clustering results; finally, calculate the validity index value for each result, and the cluster number corresponding to the maximum or minimum index value is the optimal clustering number. Compactness and separation are the main principles for evaluating the effectiveness of clustering, namely, members of the same category must be as close to each other as possible and the distance between different categories must be as large as possible.

Many criteria have been developed for determining the effectiveness of a clustering in order to find the optimal cluster number (Dimitriadou et al., 2002; Dunn, 1973; Bezdek and Pal, 1998; Pham et al., 2005). In this paper, we consider the Davies-Bouldin index (Davies and Bouldin, 1979), the Calinski-Harabasz index (Caliński and Harabasz, 1974) and the Silhouette coefficient (de Amorim and Hennig, 2015).

The Davies-Bouldin (DB) index is a function of the ratio of the sum of within-cluster scatter to between-cluster separation. The objective is to minimize this measure, as we want to minimize the within-cluster scatter and maximize the between-cluster separation. It is described by

DB(K) = \frac{1}{K} \sum_{i=1}^{K} \max_{j \neq i} \left( \frac{W_i + W_j}{C_{ij}} \right)

where K stands for the number of clusters, W_i represents the average distance of all samples in class C_i to their cluster center, W_j represents the average distance of all samples in class C_j to the center of class C_j, and C_{ij} represents the distance between the centers of classes C_i and C_j.

The Calinski-Harabasz (CH) index is used as an internal cluster validity measure that grades the clustering produced for each candidate number of clusters. It is described by

CH(k) = \frac{\mathrm{tr}\,B(k)/(k - 1)}{\mathrm{tr}\,W(k)/(n - k)}

where n stands for the number of data points, k stands for the number of clusters, and \mathrm{tr}\,B(k) and \mathrm{tr}\,W(k) denote the between-cluster and within-cluster sums of squares, respectively.

The Silhouette coefficient is a measure of how similar an object is to its own cluster (cohesion) compared to the other clusters (separation). The Silhouette ranges from -1 to +1, where a high value indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters. If most objects have a high value, then the clustering configuration is appropriate. If many points have a low or negative value, then the clustering configuration may have too many or too few clusters. It is described by

s(k) = \frac{1}{num} \sum_{i=1}^{num} \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}, \quad k \in n

where n stands for the number of clusters, k indexes the cluster configuration, num is the total number of pixels, a(i) is the average distance between sample i and the other samples in the same cluster, and b(i) stands for the minimum of the average distances between sample i and the samples in each of the other clusters.

We selected the criterion best able to pick the value of K that gives the best segmentation result through a comparison experiment, detailed in the next section.

3. Adaptive clustering number K-means algorithm

We propose an adaptive clustering number K-means algorithm for the segmentation of tomato disease images. The algorithm mainly involves the following aspects: the input data for clustering, the number of classes K, and the initial centroids.

3.1. The input data for clustering

For image segmentation, the subject of clustering is each pixel in the image, and the corresponding value of the pixel is the feature value of that point. An image can be described by different color spaces. Usually, digital images are displayed in the RGB color space, but the HSI color space is more consistent with the characteristics of the human visual system (Weeks and Hague, 1997), and the Lab color space is more conducive to the extraction of feature parameters in image processing. Therefore, these color spaces are widely used in the field of computer vision. A pixel has different characteristic values in different color spaces.

In this paper, the RGB image was transformed into the HSI color space and the Lab color space, and the S, I, a, b components were extracted as feature parameters; that is, the [S, I, a, b] four-dimensional data was used in the segmentation experiment. The clustering results for sample images are shown in Fig. 2b. It is obvious that an image in which only the leaf needs to be segmented has been over-segmented, although such a result is suitable for research that needs to extract veins. To overcome the over-segmentation, we selected the [S, I] two-dimensional parameters for testing. The results are shown in Fig. 2c, and the leaf segmentation effect was almost flawless. It can be distinctly seen that the clustering result is directly affected by the selection of feature parameters. Therefore, input data matched to the target object should be used. Through experiments, the S, I, a, b components were selected as the clustering input data for the natural background images; for the white paper background images, the input data were the S, I components (a sketch of this feature extraction follows below).

According to a preliminary experiment, the computation time was excessive when the original image was processed directly. Thus, during the process of determining the optimal cluster number by calculating the validity index, the original image size of 3024∗4032 was reduced to 303∗404, which improved the efficiency of the algorithm without affecting the clustering results.

Fig. 2. Segmentation with different parameters: (a) original image, (b) four-dimensional parameters, (c) two-dimensional parameters.
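As a rough illustration of the feature extraction described in this subsection (again a hedged sketch rather than the authors' implementation), the function below builds the per-pixel [S, I] or [S, I, a, b] vectors and downscales the image before the validity-index search. It relies on scikit-image for the Lab conversion; the HSI components are computed from one common definition of that space, and the +128 offset applied to a* and b* is an assumption made so that the values lie on the same 0-255-style scale as the fixed centers quoted in Section 3.2.

```python
import numpy as np
from skimage import color, transform

def pixel_features(rgb, use_lab=True, scale=0.1):
    """Per-pixel clustering features: [S, I] or [S, I, a, b].

    rgb:   H x W x 3 image (uint8 or float).
    scale: downscaling factor; the paper shrinks each image side by roughly
           a factor of ten while searching for the cluster number.
    Returns the (n_pixels, n_features) feature matrix and the resized
    float RGB image (useful later for the excess-green test).
    """
    out_shape = (max(1, int(rgb.shape[0] * scale)),
                 max(1, int(rgb.shape[1] * scale)))
    small = transform.resize(rgb, out_shape, anti_aliasing=True)  # floats in [0, 1]
    r, g, b = small[..., 0], small[..., 1], small[..., 2]

    # HSI components under one common definition:
    # I = (R + G + B) / 3,  S = 1 - min(R, G, B) / I.
    i = (r + g + b) / 3.0
    s = 1.0 - np.min(small, axis=2) / np.maximum(i, 1e-8)

    feats = [s.ravel(), i.ravel()]
    if use_lab:
        lab = color.rgb2lab(small)
        # Offset a*, b* by +128 so the values roughly match the 0-255-style
        # fixed cluster centers quoted in Section 3.2 (an assumption about
        # the scaling convention used in the paper).
        feats += [(lab[..., 1] + 128.0).ravel(), (lab[..., 2] + 128.0).ravel()]
    return np.stack(feats, axis=1), small
```

For the white paper background images only the two-dimensional [S, I] vector would be used (use_lab=False), matching the choice reported in the text.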


3.2. The number of classes and the initial centroids

Considering that the leaf is the only target object in the white paper background images, the initial clustering number was set to 2. In the actual experiment, the segmentation of some images, as shown in Fig. 3, was not ideal. Observation showed that the main cause of this phenomenon was that the images contained shadows of the leaves themselves or the shadows of other objects on the leaves. There were significant differences between the shadows, the white paper background and the leaves. Thus, it is necessary to consider adjusting the number of clusters to improve the segmentation effect.

Fig. 3. Result of unideal segmentation.

For obtaining the best segmentation of different images, different clustering numbers should be used. Take Fig. 4a as an example: when the clustering number was set to 2, the segmentation result showed that only the white paper background was segmented in the whole image; but when the clustering number was 3, the shadow at the leaf edge was clustered into a category of its own, which was clearly visible in the segmentation. So the optimal number of clusters for this image should be 3. However, for Fig. 4b, the white paper was obviously divided into two regions when the number of clusters was set to 3. This is not meaningful, so the optimal number of clusters for this image should be 2. Hence, the experiment demonstrated that different clustering numbers lead to different segmentation results.

Fig. 4. Comparison of clustering results of different samples with different clustering numbers. These two images show the different shadows caused by photographing.

The validity index was calculated to determine the optimal number of clusters for each image. The optimal clustering number of the 200 white paper background images was manually estimated in advance. [S, I] was set as the input data, and the number of clusters was limited to the search range [2, 3]. The DB index, CH index and Silhouette coefficient of the 200 images were computed, and the clustering number selected by each index was recorded. The selected clustering number was matched against the manually estimated optimal clustering number to compute the accuracy. Comparing the average computation time and accuracy of the three indexes, the statistical results are shown in Table 1. From the table, the DB index has the highest accuracy and runs fast. Finally, the DB index was selected to determine the number of clusters.

Table 1
Computational efficiency comparison of the different validity indexes.

Validity index      Computation time    Accuracy
DB index            0.04 s              90%
CH index            0.03 s              57%
Silhouette index    0.84 s              89%

To improve the robustness of the clustering results and the efficiency of the proposed algorithm, we initialized two groups of cluster centers with fixed values. As mentioned above, there are three possible clusters in both the white background images and the natural background images: leaf, white paper and shadow. So we segmented 10 white background images and took the average of their cluster centers as the fixed values. The initial cluster centers for the two-cluster case were [0.11, 0.55, 128, 133; 0.45, 0.39, 119, 165], and for the three-cluster case [0.41, 0.30, 106, 155; 0.03, 0.55, 127, 131; 0.21, 0.51, 127, 113].

3.3. The proposed algorithm

According to the above steps, we propose an adaptive clustering number K-means image segmentation algorithm that specifies the optimal number of clusters by calculating the DB validity index. After clustering, the excess green feature (2G-R-B) (Woebbecke et al., 1995) is used to automatically distinguish the leaf from the background in the clustering results. The algorithm flow is shown in Fig. 5; a code sketch of the complete pipeline follows at the end of this section.

Fig. 5. Algorithm flowchart.

A shadow-containing white paper background image was used to verify the above algorithm. The S and I components of the HSI color space were selected as the input parameters for segmentation. The segmentation effect is shown in Fig. 6. For the tomato diseased leaf on a white paper background, the proposed algorithm segmented the leaf accurately and completely, even where there were shadows on the white paper or some shade on the leaf.

Fig. 6. Leaf segmentation on a white paper background: (a) leaf segmentation of Tomato mosaic virus, (b) leaf segmentation of Tomato yellow leaf curl virus.
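Putting Sections 3.1-3.3 together, the end-to-end procedure can be sketched roughly as follows. This is not the released implementation: it searches K over {2, 3}, scores each candidate clustering with the Davies-Bouldin index (here via scikit-learn's davies_bouldin_score rather than a hand-written version), reuses the fixed initial centers reported in Section 3.2, and finally selects the leaf cluster with the excess-green criterion 2G − R − B. The helpers pixel_features and kmeans_manhattan are the sketches shown earlier; segment_leaf is a hypothetical wrapper name.

```python
import numpy as np
from sklearn.metrics import davies_bouldin_score

# Fixed initial centers reported in Section 3.2, one row per center in
# [S, I, a, b] order.  The a/b values (around 106-165) suggest a 0-255-style
# scaling of the Lab chroma channels, which the feature extraction sketch
# above mimics by offsetting a* and b* by +128.
INIT_CENTERS = {
    2: np.array([[0.11, 0.55, 128.0, 133.0],
                 [0.45, 0.39, 119.0, 165.0]]),
    3: np.array([[0.41, 0.30, 106.0, 155.0],
                 [0.03, 0.55, 127.0, 131.0],
                 [0.21, 0.51, 127.0, 113.0]]),
}

def segment_leaf(rgb, k_range=(2, 3)):
    """Adaptive-K segmentation sketch: the DB index picks K, ExG picks the leaf."""
    # Per-pixel [S, I, a, b] features from the earlier sketch; `small` is the
    # downscaled float RGB image reused for the excess-green test below.
    X, small = pixel_features(rgb, use_lab=True)
    h, w = small.shape[:2]

    # Run K-means for each candidate K and keep the clustering with the
    # lowest Davies-Bouldin index (lower = tighter, better-separated clusters).
    best = None
    for k in k_range:
        labels, _ = kmeans_manhattan(X, k, init_centers=INIT_CENTERS[k])
        score = davies_bouldin_score(X, labels)
        if best is None or score < best[0]:
            best = (score, labels, k)
    _, labels, k = best

    # Excess green 2G - R - B per pixel; the cluster with the highest mean
    # value is taken to be the leaf, everything else is background.
    exg = (2 * small[..., 1] - small[..., 0] - small[..., 2]).ravel()
    leaf = max(
        (c for c in range(k) if np.any(labels == c)),
        key=lambda c: exg[labels == c].mean(),
    )
    return (labels == leaf).reshape(h, w)
```

The morphological post-processing mentioned in Section 4.2 is omitted here; in practice a small opening/closing step on the returned mask would remove isolated misclassified pixels.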


Fig. 7. Segmentation results of on-site background leaf images by four algorithms: (a) mature tomato leaf segmentation, (b) yellow mature tomato leaf segmentation, (c) seedling tomato leaf segmentation. From left to right: original image, DBSCAN segmentation, Mean Shift segmentation, ExG-ExR segmentation, and the proposed algorithm segmentation.

Table 2
Segmentation effect evaluation of the five segmentation methods.

Test image    Traditional K-means    ExG-ExR            Mean Shift         DBSCAN             Proposed Algorithm
              F1       Entropy       F1      Entropy    F1      Entropy    F1      Entropy    F1      Entropy

1 0.720 0.543 0.977 0.142 0.794 0.570 0.945 0.242 0.987 0.096
2 0.745 0.262 0.981 0.109 0.852 0.231 0.934 0.292 0.993 0.051
3 0.997 0.027 0.992 0.051 0.765 0.238 0.680 0.235 0.997 0.026
4 0.974 0.163 0.956 0.257 0.866 0.443 0.832 0.627 0.983 0.116
5 0.972 0.150 0.961 0.229 0.839 0.486 0.897 0.395 0.981 0.112
6 0.969 0.183 0.963 0.217 0.819 0.452 0.913 0.370 0.976 0.157
7 0.969 0.140 0.972 0.141 0.792 0.532 0.740 0.784 0.985 0.081
8 0.986 0.092 0.977 0.141 0.864 0.361 0.943 0.229 0.988 0.087
9 0.776 0.716 0.952 0.271 0.594 0.650 0.929 0.335 0.973 0.169
10 0.947 0.289 0.944 0.304 0.756 0.635 0.683 0.825 0.970 0.176
11 0.805 0.253 0.976 0.128 0.905 0.275 0.946 0.258 0.989 0.083
12 0.959 0.201 0.832 0.567 0.693 0.647 0.819 0.577 0.982 0.126
13 0.980 0.123 0.945 0.272 0.856 0.512 0.621 0.404 0.981 0.131
14 0.952 0.276 0.926 0.359 0.861 0.533 0.907 0.422 0.969 0.173
15 0.763 0.748 0.799 0.633 0.717 0.832 0.919 0.379 0.977 0.149
16 0.977 0.143 0.933 0.339 0.833 0.346 0.943 0.234 0.988 0.086
17 0.876 0.447 0.976 0.135 0.774 0.590 0.955 0.240 0.987 0.086
18 0.682 0.962 0.947 0.273 0.792 0.618 0.870 0.475 0.979 0.133
19 0.982 0.102 0.979 0.122 0.904 0.253 0.948 0.240 0.988 0.086
20 0.853 0.518 0.957 0.237 0.825 0.497 0.924 0.353 0.960 0.232
Average 0.894 0.317 0.947 0.246 0.805 0.485 0.867 0.396 0.982 0.118

The bold values indicate the best performance among the compared methods.

3.4. Evaluation of segmentation

We used the clustering F1-measure and Entropy as quality measures of the clusters to evaluate the performance of the proposed algorithm. Leaf images segmented manually in Photoshop were used as the ground truth.

The clustering F1-measure is similar to, but not the same as, the F-measure used for classification. We calculated it to trade off between correctly clustering all data points of the same class into the same cluster and making sure that each cluster contains points of only one class:

F1 = \sum_{j=1}^{S} w_j \cdot F(G_j, C_i), \quad w_j = \frac{|G_j|}{n}

where G_j is the j-th cluster of the manual ground truth, C_i is the i-th group of the clustering algorithm result, S is the number of clusters and n is the total number of data points. F(G_j, C_i) is defined as

F(G_j, C_i) = \frac{2 \times P(G_j, C_i) \times R(G_j, C_i)}{P(G_j, C_i) + R(G_j, C_i)}

where P(G_j, C_i) is the precision, the proportion of data points of class G_j in the cluster C_i, given by

P(G_j, C_i) = \frac{|G_j \cap C_i|}{|C_i|}

and R(G_j, C_i) is the recall, the proportion of data points of class G_j that fall in cluster C_i, given by

R(G_j, C_i) = \frac{|G_j \cap C_i|}{|G_j|}

The total Entropy for a set of clusters is calculated as the sum of the entropies of each cluster weighted by the size of each cluster:

E = \sum_{j=1}^{S} \frac{|C_j|}{n} \times E_j

where E_j is the entropy of cluster j, equal to

E_j = -\sum_{i} P(G_i, C_j) \log(P(G_i, C_j))

A higher F1 and a lower E mean better segmentation performance. The best (zero) entropy is obtained when each cluster contains data points of only one class. A code sketch of both measures is given below.
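For completeness, here is a small sketch (under the same caveats as the earlier ones) of the two scores defined above, operating on flattened ground-truth and predicted label arrays; for a binary leaf/background mask these arrays simply contain the values 0 and 1.

```python
import numpy as np

def clustering_f1_and_entropy(gt, pred):
    """Clustering F1 (higher is better) and weighted Entropy (lower is better).

    gt, pred: integer label arrays of equal size, e.g. the flattened
    ground-truth mask and the flattened clustering result.
    """
    gt, pred = np.asarray(gt).ravel(), np.asarray(pred).ravel()
    n = gt.size
    g_labels, c_labels = np.unique(gt), np.unique(pred)

    # Weighted entropy: for each cluster, the entropy of its class mixture,
    # weighted by the cluster size |C_j| / n.
    entropy = 0.0
    for c in c_labels:
        in_c = pred == c
        p = np.array([np.sum(in_c & (gt == g)) for g in g_labels]) / in_c.sum()
        p = p[p > 0]
        entropy += (in_c.sum() / n) * -(p * np.log(p)).sum()

    # Clustering F1: each ground-truth class is matched with the cluster that
    # gives it the best F value (the usual convention), weighted by |G_j| / n.
    f1 = 0.0
    for g in g_labels:
        in_g = gt == g
        best_f = 0.0
        for c in c_labels:
            inter = np.sum(in_g & (pred == c))
            if inter == 0:
                continue
            prec = inter / np.sum(pred == c)
            rec = inter / in_g.sum()
            best_f = max(best_f, 2 * prec * rec / (prec + rec))
        f1 += (in_g.sum() / n) * best_f
    return f1, entropy
```

With these definitions, a perfect segmentation yields F1 = 1 and E = 0; the logarithm base only rescales the entropy.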


4. Results and discussion

We compared our method with several popular methods, including two other unsupervised learning methods, DBSCAN (Shen et al., 2016) and Mean Shift (Tao et al., 2007), a threshold segmentation method and the traditional K-means algorithm. The color index (ExG-ExR) with fixed threshold method is detailed in the references (Tian et al., 2016; Meyer and Neto, 2008). The experiments were performed on an Intel(R) Core(TM) i7-3770 CPU with 8 GB of memory. MATLAB R2018b was used as the development platform, running on the Windows 10 operating system.

4.1. Qualitative results and discussion

We compared the effects of four algorithms (the proposed algorithm, DBSCAN, Mean Shift and the ExG-ExR method) on the segmentation of tomato seedling images, ripe tomato plant images and images containing objects of similar color. The results are shown in Fig. 7. Overall, the algorithm proposed in this paper gave the best segmentation results. Both DBSCAN and Mean Shift were more prone to over-segment or under-segment the leaf image because of their sensitive input parameters. For example, DBSCAN excluded the green stems in Fig. 7a but included the background in Fig. 7b. Mean Shift only segmented part of the background, such as the white bucket in Fig. 7a and c, and it did not even eliminate the red ribbon in Fig. 7a.

In Fig. 7b the ExG-ExR segmentation is obviously poor, but in Fig. 7a and c it performed much better. The difference between these two images is the leaf color. Although ExG-ExR is an effective and simple index for green plant segmentation, it is strictly restricted to green: it is not sensitive to non-green objects, not even the stem of a green plant (Fig. 7a). The experimental results indicate that the proposed algorithm obtains a near-perfect segmentation effect by automatically determining the optimal clustering number through the validity index of each image.

4.2. Quantitative results and discussion

20 images were randomly selected from the 800 natural background images to evaluate these methods, and the results are shown in Table 2. Since the proposed algorithm includes image pre-processing and validity index calculations, its running time was slightly slower than that of the ExG-ExR method, but much faster than the other two clustering algorithms.

The results show that the average F1 (0.982) and Entropy (0.118) of the segmentation results of the proposed algorithm were the best among the compared methods. Comparing the traditional K-means algorithm with the proposed algorithm in Table 2, there were 12 test images (3, 4, 5, 6, 7, 8, 10, 12, 13, 14, 16, 19) with similar evaluation indicators; this was because the optimal number of clusters for these images was 2, that is, the calculation result of the validity index was consistent with the manually determined clustering number. Besides, the proposed algorithm incorporated morphological processing, so it segmented the images with an optimal clustering number of 2 more accurately than the traditional algorithm. For the other test images (1, 2, 9, 11, 15, 17, 18, 20), the F1 of the proposed algorithm was far higher than that of the traditional K-means algorithm. The main reason for this phenomenon was that the optimal clustering number of these test images was 3. The traditional K-means algorithm artificially fixed the number of clusters of all images to 2, so that in images with complex background noise, such as shadows, floors and white buckets, the clustering only distinguished the two parts whose characteristics differ most drastically, with a large probability that the leaves and background in the image were misclassified into one cluster. When an interfering object with a large color difference exists in the image, the traditional K-means algorithm severely over-segments or under-segments the natural background image. For most green leaf images, the F1 of the ExG-ExR method was very close to that of the proposed algorithm; but, as mentioned above, for some yellow leaf images its performance was not as effective as the proposed algorithm. However, it was still better than the DBSCAN and Mean Shift algorithms. To further verify the proposed method, we also evaluated it on public tomato disease leaf images from the Bugwood image database. We collected 30 on-site background tomato disease images; there is only one main leaf in each of these images, and we segmented the leaf manually as the ground truth. The average F1 and Entropy were 0.765 and 0.643 respectively. Observing each segmentation result, we found that the main obstacle is the diseased spots. The diseased spots turn brown as the diseased leaves mature, and since the on-site background also contains green plants, the brown spots are easily clustered together with the background; that is to say, part of the leaf and the background end up in the same cluster.

5. Conclusions

This paper presented an adaptive clustering number K-means algorithm for tomato leaf image segmentation. The algorithm was designed to segment tomato leaf images and is easily generalized thanks to the calculation of the validity index. Its experiment-based initial cluster centers make the clustering results relatively robust.

Tests have shown that: 1) for natural background images, the S, I components of the HSI color space and the a, b components of the Lab color space should be selected as clustering input data; 2) compared with the CH index and the Silhouette index, the DB index had the highest accuracy with a computation time nearly as low as that of the CH index; 3) the proposed algorithm greatly improved the segmentation accuracy of the K-means clustering algorithm.

There is still room for improvement. The main drawback of this method is that it requires extra computation to calculate the validity index. We conducted several experiments on natural tomato leaf image data to show the practicality of the algorithm; we should emphasize that our tests were carried out only on a small-scale database to demonstrate the feasibility of the method. Further work is required to fully deploy the proposed algorithm in large-scale image-processing applications, and mainly concerns the segmentation of images containing similarly colored objects.

Acknowledgments

This work is supported by the Foundation of the Guangdong Provincial Science and Technology Department (Project No. 2015A020209153) and the South China Agricultural University Doctoral Students Abroad (Overseas) Joint Training Program (Project No. 2018LHPY026). The authors would also like to thank Zifu He, researcher at the Institute of Plant Protection of the Guangdong Academy of Agricultural Sciences, for providing the strains of Tomato yellow leaf curl virus and Tomato mosaic virus.

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.compag.2019.104962.

References

Bezdek, J.C., Pal, N.R., 1998. Some new indices of cluster validity.
Bhusare, B.B., Bansode, S., 2014. Centroids initialization for k-means clustering using improved pillar algorithm. Int. J. Adv. Res. Comput. Eng. Technol. 3 (4), 1317–1322.
Caliński, T., Harabasz, J., 1974. A dendrite method for cluster analysis. Commun. Stat.-Theory Methods 3 (1), 1–27.
Chuang, K.-S., Tzeng, H.-L., Chen, S., Wu, J., Chen, T.-J., 2006. Fuzzy c-means clustering with spatial information for image segmentation. Comput. Med. Imaging Graph. 30 (1), 9–15.


Davies, D.L., Bouldin, D.W., 1979. A cluster separation measure. IEEE Trans. Pattern Anal. Machine Intell. (2), 224–227.
de Amorim, R.C., Hennig, C., 2015. Recovering the number of clusters in data sets with noise features using feature rescaling factors. Inf. Sci. 324, 126–145.
Dimitriadou, E., Dolničar, S., Weingessel, A., 2002. An examination of indexes for determining the number of clusters in binary data sets. Psychometrika 67 (1), 137–159.
Dunn, J.C., 1973. A fuzzy relative of the isodata process and its use in detecting compact well-separated clusters.
Fisher, W.D., 1958. On grouping for maximum homogeneity. J. Am. Statist. Assoc. 53 (284), 789–798.
Hammouda, K., Karray, F., 2000. A Comparative Study of Data Clustering Techniques. University of Waterloo, Ontario, Canada.
Hoagland, D.R., Arnon, D.I., et al., 1950. The water-culture method for growing plants without soil. Circular. California Agricultural Experiment Station 347 (2nd edit).
Isa, N.A.M., Salamah, S.A., Ngah, U.K., 2009. Adaptive fuzzy moving k-means clustering algorithm for image segmentation. IEEE Trans. Consumer Electron. 55 (4).
Jose, A., Ravi, S., Sambath, M., 2014. Brain tumor segmentation using k-means clustering and fuzzy c-means algorithms and its area calculation. Int. J. Innovative Res. Comput. Commun. Eng. 2 (3), 3496–3501.
Kwon, S.H., 1998. Cluster validity index for fuzzy clustering. Electron. Lett. 34 (22), 2176–2177.
MacQueen, J., et al., 1967. Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Oakland, CA, USA, vol. 1, pp. 281–297.
Maulik, U., Bandyopadhyay, S., 2002. Performance evaluation of some clustering algorithms and validity indices. IEEE Trans. Pattern Anal. Mach. Intell. 24 (12), 1650–1654.
Meyer, G.E., Neto, J.C., 2008. Verification of color vegetation indices for automated crop imaging applications. Comput. Electron. Agric. 63 (2), 282–293.
Nazeer, K.A., Sebastian, M., 2009. Improving the accuracy and efficiency of the k-means clustering algorithm. In: Proceedings of the World Congress on Engineering, vol. 1, pp. 1–3.
Pakhira, M.K., Bandyopadhyay, S., Maulik, U., 2004. Validity index for crisp and fuzzy clusters. Pattern Recognit. 37 (3), 487–501.
Pham, D.T., Dimov, S.S., Nguyen, C.D., 2005. Selection of k in k-means clustering. Proc. Inst. Mech. Eng., Part C: J. Mech. Eng. Sci. 219 (1), 103–119.
Polston, J., McGovern, R., Brown, L., 1999. Introduction of tomato yellow leaf curl virus in Florida and implications for the spread of this and other geminiviruses of tomato. Plant Dis. 83 (11), 984–988.
Purohit, P., Joshi, R., 2013. An efficient approach towards k-means clustering algorithm. Int. J. Comput. Sci. Commun. Networks 4 (3), 125–129.
Ray, S., Turi, R.H., 1999. Determination of number of clusters in k-means clustering and application in colour image segmentation. In: Proceedings of the 4th International Conference on Advances in Pattern Recognition and Digital Techniques, Calcutta, India, pp. 137–143.
Sebestyen, G.S., 1962. Decision-making processes in pattern recognition.
Sethy, P., Negi, B., Bhoi, N., 2017. Detection of healthy & defected diseased leaf of rice crop using k-means clustering technique. Int. J. Comput. Appl. 157 (1), 0975–8887.
Shen, J., Hao, X., Liang, Z., Liu, Y., Wang, W., Shao, L., 2016. Real-time superpixel segmentation by dbscan clustering algorithm. IEEE Trans. Image Process. 25 (12), 5933–5942.
Tao, W., Jin, H., Zhang, Y., 2007. Color image segmentation based on mean shift and normalized cuts. IEEE Trans. Syst., Man, Cybernet., Part B (Cybernetics) 37 (5), 1382–1389.
Tian, K., Zhang, L., Xiong, M., Huang, Z., Li, J., 2016. Recognition of phomopsis vexans in solanum melongena based on leaf disease spot features. Trans. Chinese Soc. Agric. Eng. 32 (1), 184–189.
Wang, Z., Wang, K., Pan, S., Han, Y., 2018. Segmentation of crop disease images with an improved k-means clustering algorithm.
Weeks, A.R., Hague, G.E., 1997. Color segmentation in the hsi color space using the k-means algorithm. In: Nonlinear Image Processing VIII, vol. 3026. International Society for Optics and Photonics, pp. 143–155.
Woebbecke, D.M., Meyer, G.E., Von Bargen, K., Mortensen, D., 1995. Color indices for weed identification under various soil, residue, and lighting conditions. Trans. ASAE 38 (1), 259–269.
Yedla, M., Pathakota, S.R., Srinivasa, T., 2010. Enhancing k-means clustering algorithm with improved initial center. Int. J. Comput. Sci. Informat. Technol. 1 (2), 121–125.
Yin, Y., Xu, C., Hu, L., 2009. Some insight into yasuda et al's. a grouping genetic algorithm for the multi-objective cell formation problem. Int. J. Prod. Res. 47 (7), 2009–2010.
