Information Sciences
journal homepage: www.elsevier.com/locate/ins
Key Laboratory of Electronic Business, Nanjing University of Finance and Economics, Nanjing 210046, China
School of Engineering, Griffith University, QLD 4111, Australia
Atlas of Living Australia, CSIRO Ecosystem Sciences, Canberra, ACT 2601, Australia
Article info
Article history:
Received 4 February 2013
Received in revised form 27 June 2014
Accepted 24 July 2014
Available online 7 August 2014
Keywords:
Plant identification
Shape description
Shape matching
Leaf image retrieval
Mobile leaf identification
Abstract
In this paper, we propose a novel shape description method for mobile retrieval of leaf
images. In this method, termed multiscale arch height (MARCH), hierarchical arch height
features at different chord spans are extracted from each contour point to provide a compact, multiscale shape descriptor. Both the global and detailed features of the leaf shape
can be effectively captured by the proposed algorithm. MARCH descriptors are compared
using a simple L1-norm based dissimilarity measurement providing very fast shape matching. The algorithm has been tested on four publicly available leaf image datasets including
the Swedish leaf dataset, the Flavia leaf dataset, the ICL leaf dataset and the scanned subset
of the ImageCLEF leaf dataset. The experiments indicate that the proposed method can
achieve a higher classification rate and retrieval accuracy than the state-of-the-art benchmarks while retrieving more than 500 times faster. A mobile retrieval system embedding the proposed algorithms has been developed for the real application of leaf image
retrieval.
© 2014 Elsevier Inc. All rights reserved.
1. Introduction
Rapid and accurate plant identification is essential for the effective study and management of biodiversity, and it also contributes to biosecurity measures. The number of plant species is estimated to be around 400,000; however, many species remain unclassified or unknown [43]. Automated plant identification is therefore an important, although challenging, task. Leaf shapes vary between species, thereby providing valuable cues for species identification. Unlike plant flowers, which have complex 3D structure and can only be obtained during the blooming season, plant leaves are generally thin and flat, and can be found throughout the year. For these reasons, taking pictures of leaves and applying pattern recognition and image processing technologies for automatic plant species identification has already attracted considerable attention from researchers [12,27,34,42]. This computer vision based technology greatly accelerates the manual process of plant species identification, collection and monitoring.
Modern smart phones embody incredible convenience and performance in an affordable compact low-powered device.
These devices possess onboard cameras, GPS receivers and data communication systems. The ubiquity and functionality
of smart phones make them well suited to field use in a mobile leaf identification system. With such a system, an ordinary user can take a picture of a leaf using their phone and obtain detailed information about the plant, including its characteristics, related species, geographic abundance, etc. via the internet. The mobile system makes powerful leaf identification tools accessible to anyone, for use at any time and anywhere. This benefits not only ecologists and amateur botanists, but also educators. In this work, we focus on developing an algorithm for automatic leaf retrieval that is highly suited for deployment on mobile devices, to aid the identification of plant species.
Corresponding author at: Key Laboratory of Electronic Business, Nanjing University of Finance and Economics, Nanjing 210046, China. Tel.: +86 25 84028865.
E-mail address: wangbin@njue.edu.cn (B. Wang).
http://dx.doi.org/10.1016/j.ins.2014.07.028
0020-0255/© 2014 Elsevier Inc. All rights reserved.
Effectively extracting features from leaf shapes and measuring their dissimilarity is a key problem in leaf identification and a crucial step in a mobile leaf retrieval system. Although many shape description and dissimilarity measurement methods [30,49] have been proposed, some of which [2,14,20,27,43] have been applied to leaf identification in the past two decades, the following problems still make it a challenging task for mobile devices. (1) Leaf shapes usually have high inter-class similarity and large intra-class difference, making it very difficult for a retrieval system to achieve desirable accuracy. (2) Mobile devices have less available RAM, storage and network bandwidth; therefore the extracted shape features should be as compact as possible. (3) Mobile devices have less processing power than most other computers and, given the large size of many image databases, the algorithms for extracting and comparing shape features must have a low time complexity. For these reasons, many existing shape description and matching algorithms, though they work very well on powerful computers, are intractable on mobile platforms.
The above challenges motivate us to develop a novel shape description and matching method for fast and effective mobile retrieval of leaf images. A preliminary version of this work was presented in [41]. In this paper, we improve the original algorithm and extend its evaluation to a number of leaf image datasets. The main contributions of this work are as follows. (1) A multiscale-arch-height descriptor, termed MARCH, is proposed. In this method, hierarchical arch height features are extracted, providing a coarse-to-fine shape description. The dissimilarity measure for shape matching is the L1 metric. Compared to the state-of-the-art shape description methods, including the well-known Inner-Distance Shape Context descriptor (IDSC) [29], the proposed MARCH achieves the highest classification rates on four publicly available leaf image datasets: the Swedish leaf dataset, the Flavia leaf dataset, 5720 samples of 220 species from the ICL leaf image dataset and, among shape-based-feature methods, the scanned subset of the ImageCLEF leaf dataset. The retrieval experiments on the ICL dataset indicate that the proposed method achieves the highest retrieval accuracy at a speed over 500 times faster than the state-of-the-art benchmarks. (2) A prototype system for online plant leaf identification was developed for use on a consumer mobile platform. The system embeds the proposed MARCH algorithm and provides high retrieval accuracy and fast retrieval speed (only requiring 0.277 s to retrieve matches from the ICL dataset on an entry-level mobile phone using Java).
The remainder of the paper is organized as follows. A brief review of related work is presented in Section 2. In Section 3,
we describe the details of the proposed MARCH multiscale-arch-height descriptor. Section 4 provides the architecture of the
proposed mobile retrieval system. In Section 5, a number of experiments are presented and the performance compared and
analyzed. The mobile platform implementation of the proposed MARCH algorithms is then presented for real leaf image
retrieval applications. Finally, we draw conclusions in Section 6.
2. Related work
The existing methods for shape representation and identification can be classified into contour-based and region-based methods [8,49]. In the former, the shape features are extracted only from the contour, while in the latter, they are extracted from the whole shape region. Many contour-based and region-based descriptors have been proposed for leaf shape recognition. Recent surveys of existing approaches for plant species identification can be found in [9,21]. Among them, Wang et al. [42] use the center distance curve, which is generated by calculating the distance between the center of the contour and each contour point, combined with the shape eccentricity and angle code histogram to represent the leaf shape. Du et al. [14] extract invariant moment features and geometric features including aspect ratio, rectangularity, area ratio of convexity, eccentricity, etc. to describe leaf shape. Wang et al. [43] describe the leaf shape using seven Hu geometric moments and sixteen Zernike moments derived from the binary leaf image. Söderkvist [38] combined moments, area and curvature for leaf classification and reported an 82% recognition rate on the Swedish leaf dataset [38]. Im et al. [20] proposed a hierarchical polygon approximation representation of the leaf image to recognize species in the Acer family. Lee and Chen [28] extracted shape region features including aspect ratio, compactness, centroid and horizontal/vertical projections. These methods provide quite compact shape descriptors; however, their limited discrimination ability leaves their recognition accuracies far from ideal. For example, Lee and Chen [28] reported that their method obtained a recall rate of 48.2% for 10 returned images on a small leaf database of 60 species with 10 samples in each. In their reported results, the method of Wang et al. [42] only obtained a 37.6% recall rate. The Sabanci Okan methods [46] used a collection of 14 different morphological and texture features to describe the leaf contour and surface, used a nonlinear SVM for classification, and attained the highest average classification rate of the algorithms included in ImageCLEF 2012 [21].
Apart from extracting leaf shape features in the spatial domain, the leaf shape has also been analyzed in the frequency domain. Fourier descriptors [47] are a classical shape description method that has proven to be better than other boundary-based techniques [23,48]. McLellan and Endler [32] compared Fourier descriptors with several other methods, demonstrating that Fourier descriptors can discriminate successfully between various leaf groups. Hearn [18] applied Fourier descriptors to the automated identification of plant leaves and suggested that 10 Fourier harmonics are necessary to accurately represent leaf shape to distinguish between a range of species. Wang [39] combined the Fourier descriptors derived from the proposed perimeter area signature and the existing centroid-distance signature for leaf image retrieval, reporting 65.25% retrieval accuracy on a leaf dataset of 1080 samples from 90 species. The frequency domain based methods are compact and computationally efficient, both in extraction and during comparison (using the L1 or L2 metric). This makes them attractive for online leaf image retrieval.
In recent years, many methods have attempted to find an optimal one-to-one correspondence between two dense-point shape boundaries, and from that calculate the dissimilarity measure. Among these methods, shape context [13] is a classical method which is quite popular with researchers. Shape context builds a histogram for each contour point to describe the distribution of the remaining points relative to that point, and these histograms are used to find the correspondence in a point-by-point manner. Ling and Jacobs [29] proposed the well-known shape description method IDSC, which uses inner-distance instead of Euclidean distance to build the shape context, where the inner-distance is the length of the shortest path between two contour points within the shape boundary. They reported a recognition rate of 94.13% on the Swedish leaf dataset, which is higher than the 88.12% obtained by shape context [13]. Because of the high classification accuracy of IDSC, Belhumeur et al. [6] have successfully applied it to develop a working computer vision system that aids in the identification of plant species. Height functions [40] are another shape description and matching method based on optimal one-to-one correspondence. In this method, for each contour point of the object, a height function is built based on the distances of the other contour points to its tangent line. After smoothing the height functions, a compact and robust shape descriptor is available for each contour point. At the shape matching stage, a dynamic programming algorithm is utilized to find the optimal correspondence between two shapes. The height functions method has excellent discriminative power.
The above mentioned one-to-one correspondence based methods, together with other similar methods [10,15,45], achieve impressive retrieval accuracies on the MPEG-7 shape database [26], and some of them have also been applied to leaf image identification (recognition rates on the Swedish leaf dataset of 95.47% and 96.28% are reported in [10] and [15], respectively). Although the optimal one-to-one dense point correspondence matching methods can provide more accurate shape dissimilarity measures, their matching scheme is very computationally expensive. In these methods, measuring the dissimilarity between two shapes usually requires $O(N^3)$ time, where $N$ is the number of points on the shape boundary. Besides the high time complexity, the methods also require a large amount of memory to store the descriptors. This is because a set of features for all boundary points has to be stored in the feature database in order to find the optimal one-to-one dense point correspondence at the shape matching stage. For these reasons, these methods are unsuitable for deployment on mobile devices.
Characterizing the shape at multiple scales is another approach to developing effective shape descriptors. Adamek and O'Connor [3] proposed a multiscale shape representation method, termed multiscale convexity concavity (MCC) representation, which uses the relative displacement of a contour point, with respect to its position in the preceding scale level, to measure the convexity and concavity properties at different scales. This idea is inspired by the observation that smoothing a closed contour will make the convex and concave points move inside and outside the contour, respectively. Alajlan et al. [4] proposed another multiscale shape descriptor, termed triangle-area representation (TAR), which utilizes the areas of the triangles formed by the boundary points to measure the convexity and concavity at each point at different scales, where the scale is associated with the triangle side length. Recently, Mouine et al. [35] presented a further study of the TAR methods. In this study, besides the triangle-area representation (TAR) and the triangle side lengths representation (TSL), two new representations, denoted triangle oriented angles (TOA) and triangle side lengths and angle representation (TSLA), were studied and applied to leaf classification. Kumar et al. [24] proposed another recent method in which histograms of curvature values at different scales are calculated and concatenated to form a feature vector for leaf shape description. Multiscale shape description methods usually have high discriminative power. Using the method in [35], a recognition rate of 96.53% on the Swedish leaf dataset can be achieved.
Our work in this paper aims to develop a novel shape description method for effective and efficient leaf shape identification. Like the above mentioned multiscale shape description methods, our method focuses on extracting multiscale shape features that represent the shape of the leaf discriminatively. However, our method is very different from them. Unlike TAR [4] and TSLA [35], in which a large number of scale levels ($N/2$ scale levels) are used, our method only uses $\log_2 N - 1$ scale levels, where $N$ is the number of boundary points of the leaf. Our method is therefore very compact; in our experiments, only 101 features are used, while in TAR [4] and TSLA [35], many thousands of features are used for leaf shape identification. Different from [24], our method uses arch height features instead of curvatures for leaf shape description. It is well known that the calculation of curvature is tricky to implement in a stable manner for images on discrete pixel grids. In [24], integral measures of curvature were used for reliable calculation of the shape descriptor, but the calculation cost is high: the reported time of the complete feature extraction process is an average of 0.11 s per image.
Our method is also different from height functions [40]. The main differences are: (1) in [40], the height values associated with each boundary point are defined as the distances of the other contour points to its tangent line, which is less stable than the arch height defined by our method; (2) our method is a multiscale shape description method, while the method in [40] is not; (3) the method in [40] adopts an optimal one-to-one correspondence matching based on dynamic programming, which results in a very expensive matching cost, while our method utilizes a simple L1 norm based dissimilarity measurement to allow very fast shape comparison.
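\[
h_u^S = \frac{\left(x(u) - x\!\left(u - \tfrac{S}{2}\right)\right)\left(y\!\left(u + \tfrac{S}{2}\right) - y\!\left(u - \tfrac{S}{2}\right)\right) - \left(y(u) - y\!\left(u - \tfrac{S}{2}\right)\right)\left(x\!\left(u + \tfrac{S}{2}\right) - x\!\left(u - \tfrac{S}{2}\right)\right)}{\sqrt{\left(x\!\left(u + \tfrac{S}{2}\right) - x\!\left(u - \tfrac{S}{2}\right)\right)^2 + \left(y\!\left(u + \tfrac{S}{2}\right) - y\!\left(u - \tfrac{S}{2}\right)\right)^2}}. \tag{1}
\]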
The sign of $h_u^S$ is determined by the following rules: consider the chord of the arch $A_u^S$ as a vector which starts from point $u - S/2$ and ends at point $u + S/2$. If the point $u$ falls to the right side of the vector, $h_u^S$ takes a positive value. If the point $u$ falls to the left side of the vector, $h_u^S$ takes a negative value. If the point lies on the vector, $h_u^S$ is zero. Fig. 1 graphically illustrates the definition of an arch and its arch height. The absolute value of the arch height reflects the degree of bend of the arch, while the sign of the arch height indicates the convexity or concavity of the arch.
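As a minimal sketch of the signed arch height described above, the following Python function works on a contour stored as a list of (x, y) samples and measures the span in samples rather than in normalized arc length. The name `arch_height`, the `half_span` parameter and the zero returned for a degenerate chord are illustrative assumptions, not part of the original method. The sketch also assumes a y-up coordinate system; in image coordinates with y pointing down, left and right are mirrored.

```python
import math


def arch_height(contour, i, half_span):
    """Signed distance from contour[i] to the chord joining the samples
    half_span positions before and after it (indices wrap around)."""
    n = len(contour)
    x0, y0 = contour[(i - half_span) % n]   # chord start, C(u - S/2)
    x1, y1 = contour[(i + half_span) % n]   # chord end,   C(u + S/2)
    xu, yu = contour[i]
    dx, dy = x1 - x0, y1 - y0
    chord = math.hypot(dx, dy)
    if chord == 0.0:
        return 0.0  # assumption: a degenerate chord is treated as flat
    # Positive when the point lies to the right of the chord vector (y-up axes).
    return ((xu - x0) * dy - (yu - y0) * dx) / chord
```

For the chord from (0, 0) to (2, 0), the point (1, -1) lies to the right of the chord vector and gives an arch height of +1, while a point on the chord gives 0.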
3.1.2. K-scale arch height
For a contour point $u$, let $K$ be a positive integer. Its arch height $h_u^S$ with $S = 1/2^K$ is termed its K-scale arch height. The larger the value of $K$, the shorter the contour arc. Fig. 2 gives an example that graphically illustrates the 1-scale, 2-scale, 3-scale and 4-scale arch heights at a leaf contour point.
3.2. Multiscale-arch-height descriptor (MARCH)
Given a shape contour $C(u) = (x(u), y(u))$, $u \in [0, 1]$, traversed from the starting point $C(0) = (x(0), y(0))$ to the end point $C(1) = (x(1), y(1))$, the K-scale arch height $h_u^{1/2^K}$ varies with $u$ and can therefore be regarded as a function of the variable $u$, which we call the K-scale arch height function. This is a 1D function and we denote it as $f_K(u)$. The function $f_K(u)$ depends on the selection of the starting point of the shape contour, which is chosen arbitrarily. This problem is addressed later when the transformed functions are made invariant to rotation. With $K = 1, 2, \ldots, T$, we obtain $T$ functions $f_1(u), f_2(u), \ldots, f_T(u)$, where $T$ is the largest integer that $K$ can take and depends on the number of points sampled from the contour. Assuming the contour has been sampled into $N$ points, we have $T = \log_2 N - 1$, where $N$ is usually set to be a power of two. Using the arch height function $f_K(u)$, we can derive two functions, $a_K(u)$ and $b_K(u)$, which are defined by
\[
a_K(u) = \left| f_K(u) \right| \tag{2}
\]
and
\[
b_K(u) = \begin{cases} 1 & \text{if } f_K(u) > 0 \\ 0 & \text{otherwise} \end{cases} \tag{3}
\]
respectively. From the above definition, we can see that $a_K(u)$ is the absolute value of the arch height $f_K(u)$, which reflects the degree of bend of the arch $A_u^{1/2^K}$, and $b_K(u)$ is a binary value which reflects the convex or concave property of the arch $A_u^{1/2^K}$. If $b_K(u)$ takes a value of 1, the associated arch is considered convex; otherwise, it is considered concave or flat. Here, $a_K(u)$ and $b_K(u)$ represent the absolute function and the sign function of $f_K(u)$, respectively.
Fig. 1. Two arches of the contour points u1 and u2 (shown in red and green, respectively) and their arch heights (shown as bold line segments), where the labels + and - denote the signs of the arch heights. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 2. An example graphically illustrating the 1-scale, 2-scale, 3-scale and 4-scale arch heights of a contour point (shown in red). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
From the definition of the arch height, we can easily prove that a K-scale arch height function $f_K(u)$ is intrinsically invariant to translation of the shape contour. So, according to Eqs. (2) and (3), each function $a_K(u)$ and each function $b_K(u)$ is also translation invariant. Checking how these functions behave under scaling of the shape contour, we find that $b_K(u)$ is intrinsically invariant to scaling, while $a_K(u)$ is not. To make $a_K(u)$ invariant to scaling, it can be locally normalized by dividing by its maximum value. Rotating the shape contour changes the position of the starting point of the contour and shifts each function $f_K(u)$ by $l$, i.e. $f_K(u) \to f_K(u + l)$, where $l$ is the displacement of the starting point. Thus, we have $a_K(u) \to a_K(u + l)$ and $b_K(u) \to b_K(u + l)$. Fig. 3 plots the aligned curves of the normalized absolute functions of the K-scale arch-height function $f_K(u)$ for K = 1, 3, 5, 7 for three leaf shapes.
Here, we apply the Fourier transform to each function $a_K(u)$ and each function $b_K(u)$, and discard the phase information to obtain invariance to rotation of the shape contour. Assuming that the shape contour has been uniformly sampled at $N$ points, $u_0, u_1, \ldots, u_{N-1}$, the discrete forms of $a_K(u)$ and $b_K(u)$ are the sequences $a_K(u_0), a_K(u_1), \ldots, a_K(u_{N-1})$ and $b_K(u_0), b_K(u_1), \ldots, b_K(u_{N-1})$, respectively. The magnitudes of their discrete Fourier transform coefficients are calculated by
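\[
g_K(v) = \frac{1}{N} \left| \sum_{i=0}^{N-1} a_K(u_i) \exp\!\left( \frac{-j 2 \pi v i}{N} \right) \right|, \quad v = 0, 1, 2, \ldots, N-1, \tag{4}
\]
and
\[
w_K(v) = \frac{1}{N} \left| \sum_{i=0}^{N-1} b_K(u_i) \exp\!\left( \frac{-j 2 \pi v i}{N} \right) \right|, \quad v = 0, 1, 2, \ldots, N-1, \tag{5}
\]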
where $j^2 = -1$. It is not difficult to prove that $g_K(v)$ and $w_K(v)$ are invariant to the shift $f_K(u) \to f_K(u + l)$ caused by rotation. Therefore, we use the magnitudes of the Fourier transform coefficients to describe the shape. To make the generated shape descriptor robust and compact, only the lowest $M$ order coefficients are used, where $M \ll N$. Finally, the combination of the features derived from the K-scale arch height functions, $K = 1, 2, \ldots, \log_2 N - 1$, is used to describe the shape. The resulting MARCH descriptor is defined as
\[
\left\{ g_K(v),\; w_K(v) \;\middle|\; K = 1, 2, \ldots, \log_2 N - 1;\; v = 0, 1, \ldots, M-1 \right\}. \tag{6}
\]
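The construction above (arch height functions at each scale, their magnitude and sign functions, and the low-order Fourier magnitudes) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, spans are counted in samples, a naive DFT replaces an FFT to avoid dependencies, the default `m=7` is an assumed choice, a degenerate chord is treated as flat, and the 0-scale global features are left out.

```python
import cmath
import math


def arch_height(contour, i, half_span):
    """Signed distance from contour[i] to the chord joining the samples
    half_span positions before and after it (indices wrap around)."""
    n = len(contour)
    x0, y0 = contour[(i - half_span) % n]
    x1, y1 = contour[(i + half_span) % n]
    xu, yu = contour[i]
    dx, dy = x1 - x0, y1 - y0
    chord = math.hypot(dx, dy)
    if chord == 0.0:
        return 0.0  # assumption: a degenerate chord is treated as flat
    return ((xu - x0) * dy - (yu - y0) * dx) / chord


def dft_magnitudes(seq, m):
    """Magnitudes of the m lowest-order DFT coefficients, scaled by 1/N.
    A naive DFT stands in for an FFT to keep the sketch dependency-free."""
    n = len(seq)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * v * i / n)
                for i, s in enumerate(seq))) / n
        for v in range(m)
    ]


def march_descriptor(contour, m=7):
    """Return [(g_K, w_K) for K = 1 .. log2(N) - 1] for a closed contour
    of N points, where N is assumed to be a power of two."""
    n = len(contour)
    scales = n.bit_length() - 2           # log2(N) - 1 for a power of two
    descriptor = []
    for k in range(1, scales + 1):
        half_span = n >> (k + 1)          # S/2 in samples, with S = N / 2**K
        f = [arch_height(contour, i, half_span) for i in range(n)]
        a = [abs(h) for h in f]
        peak = max(a)
        if peak > 0.0:
            a = [h / peak for h in a]     # divide by the maximum for scale invariance
        b = [1.0 if h > 0.0 else 0.0 for h in f]
        descriptor.append((dft_magnitudes(a, m), dft_magnitudes(b, m)))
    return descriptor
```

Because only the magnitudes of the coefficients are kept, moving the starting point along the contour leaves the result unchanged up to rounding, and the division by the maximum removes the dependence on overall size.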
Fig. 3. The aligned curves of the normalized absolute functions of the K-scale arch height function $f_K(u)$ for K = 1, 3, 5, 7 for three leaf shapes. Each panel (K = 1, K = 3, K = 5, K = 7) is plotted against arc length.
\[
D(A, B) = \sum_{K=1}^{\log_2 N - 1} \sum_{v=0}^{M-1} \left( \left| g_A^K(v) - g_B^K(v) \right| + k \left| w_A^K(v) - w_B^K(v) \right| \right), \tag{7}
\]
where $k$ is the weight parameter. It can be seen from Eq. (7) that the MARCH features, $g_K(v)$ and $w_K(v)$, of two shapes at each scale level $K = 1, 2, \ldots, \log_2 N - 1$ are compared, respectively, and the sum of the differences over all scales is used to measure the total shape dissimilarity.
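A minimal sketch of this weighted L1 comparison is given below. It assumes, purely for illustration, that a descriptor is stored as a list with one `(g, w)` pair of coefficient lists per scale; the function name and the `weight` argument (standing for the weight parameter of Eq. (7)) are hypothetical.

```python
def march_dissimilarity(desc_a, desc_b, weight=1.0):
    """Weighted L1 distance between two descriptors, each given as a list
    of (g, w) pairs with one pair per scale level."""
    total = 0.0
    for (g_a, w_a), (g_b, w_b) in zip(desc_a, desc_b):
        # Difference of the magnitude-function coefficients at this scale.
        total += sum(abs(p - q) for p, q in zip(g_a, g_b))
        # Difference of the sign-function coefficients, scaled by the weight.
        total += weight * sum(abs(p - q) for p, q in zip(w_a, w_b))
    return total
```

The cost is one pass over the stored coefficients, so comparing a query against every database entry only needs additions and absolute values.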
The proposed MARCH extracts the shape features from $\log_2 N - 1$ scales ($K = 1, 2, \ldots, \log_2 N - 1$). If the parameter $K$ were equal to 0, the size $S$ of the 0-scale arch would be equal to 1. In this case, the arch is the whole contour and its two end points coincide, meaning that the equation for calculating the arch height (Eq. (1)) cannot be used. We therefore add 0-scale shape features (features of the whole contour) to the proposed MARCH descriptor to further enhance its discriminative power. We take three existing global contour features for this purpose: eccentricity (E), solidity (S) and rectangularity (R) [1,4,14]. The dissimilarity between shape A and shape B incorporating the comparison of the 0-scale features is then measured by
Low computational complexity: As analyzed in Section 3.4, the proposed method has a time complexity of $O(N \log_2^2 N)$ for shape description and $O(M \log_2 N)$ for shape matching, which indicates its low computational complexity. For the IDSC [29] method, calculating the IDSC descriptor requires $O(N^3)$ time and matching a shape requires $O(KN^2)$ time, where $N$ is the number of sample points on the shape contour and $K$ denotes the number of possible starting points for alignment used in shape contour matching. It is worth noting that in image retrieval applications, low computational complexity of the online matching is far more important than that of the offline feature extraction. This is because only the query shape needs its features extracted online, while the features of all the target shapes can be extracted offline and stored in the database beforehand. Shape matching is then conducted online between the query shape and every target shape in the database. The low time complexity of $O(M \log_2 N)$ of the proposed method in shape matching indicates that it is very suitable for online image retrieval.
Hierarchical coarse-to-fine representation structure: The proposed MARCH descriptor has a hierarchical structure which can embody a coarse-to-fine description. At the coarse hierarchical level (corresponding to smaller values of the parameter $K$), the arch height of a longer section of contour is measured for each contour point. For example, at K = 1, the contour length of the measured arc is $N/2 + 1 = 129$ points (assuming that the contour has been sampled into 256 points). At this hierarchical level, the extracted features represent global properties of the shape. At the fine hierarchical level (corresponding to larger values of the parameter $K$), the arch height of a shorter arc is measured. For example, at K = 7, the contour length used to calculate the arch height is $256/2^7 + 1 = 3$ points. At this level, the finer details of the shape contour are captured. Fig. 3 illustrates this property of the proposed MARCH. The diagram shows four levels (K = 1, 3, 5, 7) of arch height functions plotted for each of three leaf shapes. From human perception, the three shapes have similar global features, and shape A has rich details which are different from those of shape B or shape C. In Fig. 3, the comparisons of the arch height functions for K = 1 and K = 3 indicate that the leaf contours have global similarity, while the comparisons of the arch height functions for K = 5 and K = 7 highlight the differences between the fine details of the leaf contours.
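The arc lengths quoted above follow from a one-line calculation, sketched here in Python. The helper name `arc_points` is hypothetical; the contour size of 256 samples is the example value used in the text.

```python
def arc_points(n_samples, k):
    """Number of contour samples spanned by the K-scale arch: N / 2**K + 1."""
    return n_samples // 2 ** k + 1


# Arc sizes for a contour of 256 samples at scales K = 1 .. 7.
sizes = [arc_points(256, k) for k in range(1, 8)]
print(sizes)
```

For 256 samples this gives 129, 65, 33, 17, 9, 5 and 3 points for K = 1 to 7, so each scale halves the span of the previous one.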
5. Experiments
In this section, we first evaluate the performance of the proposed MARCH method and compare it with the state-of-the-art approaches on four publicly available leaf image datasets using Matlab on a PC platform. The proposed algorithm is
then ported to a mobile platform for the real application.
[Figure: system flow diagram. Diagram labels: Capture Image, Input Query, Feature Extraction, Matching, Ranking, Display Matches, Leaf Features and Photos, Always run on mobile device.]
Fig. 5. Feature extraction flow diagram of the proposed MARCH method. Diagram labels: Raw Image, Thresholding, Binary Image, Contour Extraction, Extract Global Shape Properties, Final Descriptor (3 shape properties, 49 FFT coefficients of arch height sign, 49 FFT coefficients of arch height magnitude).
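The descriptor length shown in Fig. 5 can be checked with a short calculation. It assumes N = 256 contour samples (the example value used in the text) and M = 7 retained Fourier coefficients per function; the value of M is inferred here from the 49 = 7 × 7 entries in the figure rather than stated explicitly.

```latex
\[
\underbrace{(\log_2 N - 1)}_{7\ \text{scales}} \times \underbrace{M}_{7} \times \underbrace{2}_{g_K,\ w_K}
\;+\; \underbrace{3}_{E,\ S,\ R}
\;=\; 49 + 49 + 3 \;=\; 101 .
\]
```

Dropping the three global features leaves 98 values, which matches the feature counts of 101 and 98 listed for the two MARCH variants in the result tables.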
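Algorithm                           Number of features   Classification rate (%)
MCC [3]                             1280                 94.75
TAR [4]                             8067                 95.97
IDSC [29]                           12,288               94.13*
TSLA [35]                                                96.53*
Symbolic representation [10]                             95.47*
Shape tree [15]                                          96.28*
Fourier descriptor                  20                   87.54
Proposed MARCH (l = 0.5)            101                  97.33
Proposed MARCH (l = 0)              98                   96.21
The feature counts 2400 and 9600 belong to two of the three rows TSLA [35], Symbolic representation [10] and Shape tree [15].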
performance evaluation method as the MPEG-7 shape dataset [7], which sets an equal number of samples in each class, we take the first 26 leaf images from each species and discard the remainder. There are therefore a total of 220 × 26 = 5720 images included in the ICL subset used in our experiments. Fig. 8 shows the 26 samples of the Japanese maple in the ICL dataset. The leaf classification and retrieval performances are evaluated on this dataset. For leaf classification, a performance evaluation method similar to that used for the Swedish leaf dataset is adopted, where for each species, half of the samples are taken as training samples and the remaining images are used as testing cases.
Table 2 summarizes the results of the experiments for the proposed method and the state-of-the-art approaches. It can be seen that the proposed method achieves the best classification rate (2.58% higher than the well-known IDSC [29]) and that the highest classification rate is obtained with fewer features (only 0.48%, 4.6% and 0.73% of the number of features used by IDSC [29], MCC [3] and TAR [4]). The Fourier descriptor has the highest compactness (only 20 features), but also the worst classification rate. Its classification rate of 60.08% is lower than that of the proposed method by 23.89%. If we discard the 0-scale features (l = 0), the classification rate of the proposed MARCH, using only the arch-height descriptors, only decreases slightly (86.03% - 85.31% = 0.72%).
For evaluation of the retrieval performance, the common measurement of precision and recall [33,37,45] is used. In this evaluation scheme, each leaf shape in the dataset is taken as the query and compared with all the samples (a total of 5720 comparisons for each shape). The precision and recall values on the top 26 matches are calculated and plotted as a precision-recall curve, where precision is defined as the percentage of similar shapes retrieved with respect to the total number of retrieved shapes, and recall is defined as the percentage of similar shapes retrieved with respect to the total number of similar shapes in the dataset. In the precision-recall plot, the horizontal axis corresponds to recall, the vertical axis corresponds to precision and each algorithm is represented by a curve of 26 points. Each point in the curve is the average of the precision/recall values over 5720 queries. So, the top left point of the curve corresponds to the precision/recall values for the best match, while the bottom right point corresponds to the precision/recall values for the top 26 matches. Apart from the retrieval accuracy, the retrieval speed is also evaluated. The computational time for each shape retrieval is the time of matching the query with all 5720 shapes plus the time required to extract the features of the query shape.
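The precision and recall definitions above can be sketched for a single query as follows. The function names and the representation of a retrieval result as a ranked list of class labels are assumptions made for illustration; in the experiment described, each class contains 26 samples and the curve has 26 points averaged over all queries.

```python
def precision_recall_at_k(ranked_labels, query_label, n_relevant, k):
    """Precision and recall over the top-k entries of a ranked result list."""
    hits = sum(1 for label in ranked_labels[:k] if label == query_label)
    return hits / k, hits / n_relevant


def precision_recall_curve(ranked_labels, query_label, n_relevant, max_k):
    """One (precision, recall) pair for each cut-off k = 1 .. max_k."""
    return [
        precision_recall_at_k(ranked_labels, query_label, n_relevant, k)
        for k in range(1, max_k + 1)
    ]
```

Averaging the pairs at each cut-off over all queries gives one curve per algorithm, with the first point corresponding to the best match only.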
The precision versus recall curves of the proposed MARCH and the benchmark methods are shown in Fig. 9. The retrieval speed comparisons of the proposed approach and the benchmark methods are summarized in Table 3. It can be seen that the proposed method and IDSC [29] have the highest precision and recall values on the top 26 matches. For the top 18 matches, the proposed method is better than IDSC [29], and for the top 19-26 matches, the proposed method is comparable with IDSC [29]. Retrieval speed is another important performance index for an algorithm. From Table 3, we can see that the proposed method is very fast: retrieving a leaf shape from a database of 5720 samples only requires 0.102 s, which is more than 500 times faster than IDSC [29], MCC [3] and TAR [4] (only 0.31%, 0.22% and 0.20% of the retrieval time used by IDSC [29], MCC [3] and TAR [4]). The retrieval speed of the proposed method is within an order of magnitude of that of the Fourier descriptor. However, Fourier descriptors have the worst retrieval accuracy among all the competing methods. These experiments demonstrate that the proposed method is very suitable for leaf image retrieval on mobile devices.
Table 2
Classification rates for the ICL dataset.
Algorithm                           Number of features   Classification rate (%)
MCC [3]                             1280                 73.17
TAR [4]                             8067                 78.25
IDSC [29]                           12,288               81.39
Fourier descriptor                  20                   60.08
Proposed MARCH (l = 0.5)            101                  86.03
Proposed MARCH (l = 0)              98                   85.31
Fig. 9. Precision–recall diagram of the proposed MARCH and the state-of-the-art approaches for the ICL dataset.

Table 3
Retrieval speed on the ICL dataset.

Algorithm             Retrieval time (ms)
MCC [3]               7.78 × 10^4
TAR [4]               8.47 × 10^4
IDSC [29]             5.53 × 10^4
Fourier descriptor    6.32 × 10^1
Proposed MARCH        1.02 × 10^2

5.1.3. Flavia leaf dataset
The third leaf dataset used in our experiments is the Flavia leaf dataset [44], which can be freely downloaded from the
web [16]. This dataset contains 1907 leaf images belonging to 32 different species (see Fig. 10), with 50 to 77 samples per
species. Several methods have been tested on the Flavia dataset in [35,25]. To allow a direct comparison, we use the same
evaluation metrics as [35,25], namely the Mean Average Precision (MAP) and the precision–recall curve.

Fig. 10. Thirty-two samples from the Flavia dataset, which consists of 1907 leaf images from 32 different species, one sample per species.

It is worth noting that the precision–recall curve used in [35,25] is different from the one used in the previous section.
In this version, the precision is regarded as a function of recall; that is, once the recall value is given, the
corresponding precision is uniquely determined. The recall value is varied from 0% to 100%, and the corresponding precision
is calculated and plotted as a curve, shown in Fig. 11. For the MAP measurement, for each query shape Q, the average
precision score AP(Q) is defined as
\[
AP(Q) = \frac{\sum_{k=1}^{M} P(k) \times f(k)}{N}
\]
where P(k) is the precision at cut-off k in the list of retrieved shapes, f(k) is equal to 1 if the shape at rank k is relevant
to shape Q and 0 otherwise, M is the number of retrieved shapes and N is the number of retrieved relevant shapes for Q. The
MAP score is obtained by averaging the AP score over all queries. The higher the MAP score, the better the performance.
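The AP and MAP definitions above can be illustrated with a minimal Python sketch. It is not the evaluation code used in the experiments; the function names and the boolean relevance-list input are assumptions made for illustration, and AP is taken to be 0 when no relevant shape is retrieved.

```python
from typing import Sequence


def average_precision(relevant: Sequence[bool]) -> float:
    """AP(Q) for one ranked list; relevant[k-1] plays the role of f(k)."""
    hits = 0       # relevant shapes seen so far
    total = 0.0    # running sum of P(k) * f(k)
    for k, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            total += hits / k  # P(k): precision at cut-off k
    # After the loop, hits equals N, the number of retrieved relevant shapes.
    # Assumption: AP is defined as 0 when N = 0.
    return total / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """MAP: the mean of AP(Q) over all queries."""
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

For example, a ranked list whose first and third retrieved shapes are relevant gives AP = (1/1 + 2/3) / 2.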
Fig. 11. Precision–recall curves on the Flavia dataset. The precision–recall curves of recent methods, TSLA [35], Riemannian metric [25] and MSDM [19], are
from the published results.

Table 4
MAP scores on the Flavia dataset. Values marked with * are from the published results.

Algorithm                   MAP (%)
MCC [3]                     53.55
TAR [4]                     52.84
IDSC [29]                   59.86
Riemannian metric [25]      57.21*
TSLA [35]                   69.93*
MSDM [19]                   47.91*
Fourier descriptor          43.12
Proposed MARCH (l = 0.5)    73.00
Proposed MARCH (l = 0)      72.15
On the Flavia dataset, the MAP scores of MCC [3], TAR [4], IDSC [29], Riemannian metric [25], TSLA [35], multiscale
distance matrix (MSDM) [19], the Fourier descriptor and the proposed MARCH are listed in Table 4. The precision–recall curves
for these approaches are plotted in Fig. 11. It can be seen from Table 4 that the proposed MARCH achieves the best MAP
score, more than 3 percentage points higher than that of the second best method, TSLA [35]. Removing the 0-scale features
from the proposed MARCH still yields a high MAP score of 72.15%, which is better than those of the other compared methods.
Referring to Fig. 11, we can see that our method has the best precision–recall curve among all of the compared methods.
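The precision-as-a-function-of-recall curve used for the Flavia comparison can be sketched as follows. This is only one plausible reading of the measurement, not the procedure of [35,25]: it assumes the ranked list covers the whole database and takes the precision at the smallest cut-off at which the requested recall is reached. The function names and the number of recall levels are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def precision_at_recall(relevant: Sequence[bool], recall: float) -> float:
    """Precision at the smallest cut-off whose recall reaches `recall` (0..1)."""
    total_relevant = sum(1 for r in relevant if r)
    if total_relevant == 0:
        return 0.0
    # Number of relevant shapes that must be retrieved to reach this recall.
    # Assumption: a recall of 0 is treated as "first relevant shape found".
    needed = max(1, math.ceil(recall * total_relevant - 1e-9))
    hits = 0
    for k, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            if hits >= needed:
                return hits / k
    return 0.0  # the requested recall is never reached


def precision_recall_curve(relevant: Sequence[bool],
                           levels: int = 10) -> List[Tuple[float, float]]:
    """(recall, precision) pairs at evenly spaced recall levels."""
    return [(i / levels, precision_at_recall(relevant, i / levels))
            for i in range(1, levels + 1)]
```

Averaging the precision values at each recall level over all queries would then give one curve per method.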
5.1.4. ImageCLEF leaf dataset
The fourth leaf dataset used for our experiments is the ImageCLEF leaf recognition task dataset, available from [21]. This
dataset includes three classes of images, namely scanned, pseudo-scan and photograph. The scanned images have a
white background and minimal shadowing; the pseudo-scan images are photographs with a white backdrop, but may
exhibit heavy shadowing; and the images labeled photograph have uncontrolled, busy backgrounds. Some examples
of these types of images are shown in Fig. 13. We use only the scanned images, which represent 57% of the database, as
they can be more reliably segmented using basic methods; segmentation is not the focus of this paper. There are 6630
images in this subset, from 115 species, with between 2 and 249 images for each species.
A single image from each of the 115 species in the subset is shown in Fig. 12. The performance metrics used are the
same as for the Flavia dataset, described in Section 5.1.3. The ImageCLEF database subset is more challenging than the
datasets in the previous sections, as can be seen from the precision–recall graph in Fig. 14 and the lower MAP scores in
Table 5. This is likely due to a number of different species having similar leaf shapes, and a number of differences between
images from a single species, causing an increase in intra-class distances and a decrease in inter-class distances during
matching. Fig. 13 shows some examples of these difficult match targets. In many cases, extra leaves are present on stems, or
differing numbers of leaves are present on compound leaves; this changes the contour drastically and reduces the accuracy of
the contour-based identification methods.
As seen in the precision–recall graph, MARCH substantially outperforms all of the other methods, except IDSC at recall rates
above 50%, where it performs slightly worse than IDSC. Note, however, that MARCH is over 500 times faster
and uses descriptors 120 times smaller than those of IDSC. The MAP scores again show that MARCH clearly outperforms the
Fourier descriptor, MCC and TAR methods, and performs marginally better than IDSC.
Another metric is used in [21] to gauge algorithm performance. It is an average classification score which takes into
account the individual tree and individual photographer of each leaf image. We will refer to this metric as the S-Score,
and use it to compare the results of the proposed method with those reported in [21]. The S-Score is defined as follows:
\[
S = \frac{1}{U} \sum_{u=1}^{U} \frac{1}{P_u} \sum_{p=1}^{P_u} \frac{1}{N_{u,p}} \sum_{n=1}^{N_{u,p}} s_{u,p,n} \qquad (10)
\]
where $U$ is the number of users/photographers having one or more images in the test data set, $P_u$ is the number of
individual plants observed by the $u$th user, $N_{u,p}$ is the number of pictures taken of the $p$th plant observed by the
$u$th user and $s_{u,p,n}$ is the inverse of the rank of the first correct match for the $n$th image of the $p$th plant taken
by the $u$th user.

Fig. 12. One hundred and fifteen samples from the scanned subset of the ImageCLEF dataset, which consists of 6630 leaf images from 115 different
species, one sample per species.

Fig. 13. Samples from the ImageCLEF scanned image dataset showing intraclass differences and interclass similarities. The first row illustrates leaves from
the Common Hawthorne. Note the variety of shapes and additional leaves present on the stem. The second row depicts leaves from the Narrow-leafed ash
tree. The leaves shown in the third and fourth rows are all from different plant species.

Table 5
Mean Average Precision (MAP) and S-Score algorithm performance on the scanned subset
of the ImageCLEF dataset. Values marked with * are from the published results.

Algorithm                          MAP (%)   S-Score (%)
MCC [3]                            36.08     46.9
TAR [4]                            35.97     48.3
IDSC [29]                          45.23     47.9
Fourier descriptor                 24.98     33.7
TSLA [35]                          -         53*
SABANCI OKAN run 1 [46]            -         58*
SABANCI OKAN run 2 [46]            -         58*
INRIA Imedia PlantNet run 1 [5]    -         49*
Proposed MARCH                     46.25     54.8
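The nested averaging in the S-Score definition (over the images of a plant, then over the plants of a user, then over users) can be sketched in Python as below. This is an illustrative sketch rather than the official ImageCLEF scoring script; the record layout (user id, plant id, rank of the first correct match) and the function name are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One record per test image: (user id, plant id, rank of first correct match).
# This layout is a hypothetical choice made for the example.
Record = Tuple[str, str, int]


def s_score(records: Iterable[Record]) -> float:
    """S-Score: mean over users of the mean over plants of the mean 1/rank."""
    scores: Dict[str, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for user, plant, first_correct_rank in records:
        scores[user][plant].append(1.0 / first_correct_rank)  # s_{u,p,n}
    if not scores:
        raise ValueError("at least one record is required")
    user_means = []
    for plants in scores.values():
        # Inner average over the N_{u,p} images of each plant.
        plant_means = [sum(s) / len(s) for s in plants.values()]
        # Middle average over the P_u plants of this user.
        user_means.append(sum(plant_means) / len(plant_means))
    # Outer average over the U users.
    return sum(user_means) / len(user_means)
```

Because each user and each plant contributes equally, a photographer with many images of one tree does not dominate the score.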
The S-Scores of the proposed method, the previously compared methods and the best methods reported in [21] are listed
in Table 5. It can be seen that the proposed MARCH method clearly outperforms the other contour-based methods. The best
algorithm in [21] achieves an S-Score of 58%, which is better than that of the proposed MARCH. This is because, on top of
using shape features, it utilizes many additional morphological and texture features, takes advantage of training and uses a
nonlinear SVM as the classifier. MARCH uses a concise set of features and a very simple and fast similarity metric, without
the need for training.

Fig. 15. Android mobile application screenshots matching an Aizoon Stonecrop herb leaf and the top 26 matches.
Table 6
Mobile device image processing times for a dataset of 5720
images from the ICL database.

Method                              Processing time
Local database                      277–340 ms
Internet database
  Feature extraction (on phone)     190–236 ms
  Search time                       5.3 ms
  Search time (2M images)           1.85 s
Neglecting time to upload and download data, using the online database provides a faster match speed. However, because
many remote locations can have very poor mobile internet connections, the time to download the results can be substantial.
The upload time is minimized by the compactness of the MARCH descriptor.
The mobile implementation of the system uses a linear search of the dataset features, while keeping track of the nearest K
features. This scales linearly with the size of the database, in terms of memory and time complexity. Using this approach, the
current implementation can handle up to 22,000 features before encroaching on the 16 MB application RAM limit,
while still completing the feature extraction and search in less than 1 s. The time to load a database of this size into
memory approaches 7 s (it is less than 1.7 s for the 5720 image dataset). The loading time will therefore be the limiting
factor in scaling the mobile application database size. The loading time can be masked by loading the database in the
background, as it usually takes the user a number of seconds to select the retrieval method and take the picture, so they are
unlikely to notice any delay. For larger databases the implementation is more important, and naive methods like the linear
search will not suffice. Hashing methods may be employed to reduce the number of features that need to be accessed from the
database.
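The idea of masking the loading time can be sketched as follows. The actual application runs on Android, so this Python sketch is purely illustrative; the class name, the loader callback and the single-worker executor are assumptions, not details of the implementation described here.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Sequence

Features = List[Sequence[float]]


class BackgroundDatabase:
    """Starts loading the feature database on a worker thread at construction."""

    def __init__(self, loader: Callable[[], Features]) -> None:
        self._executor = ThreadPoolExecutor(max_workers=1)
        # Loading begins immediately, while the user is still taking the picture.
        self._future: Future = self._executor.submit(loader)

    def features(self) -> Features:
        """Return the loaded features, blocking only if loading is unfinished."""
        return self._future.result()

    def close(self) -> None:
        """Release the worker thread once the database is no longer needed."""
        self._executor.shutdown(wait=True)
```

The search code calls `features()` only when a query is ready, so any remaining wait is limited to whatever part of the load has not yet completed.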
The linear search collecting a list of K nearest features scales linearly and was tested on our ASP.NET server for up to two
million features, taking 1.85 s on a single thread.
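A linear search that keeps only the K nearest features can be sketched as below. It assumes each database entry is a fixed-length feature vector compared with a plain L1 distance; the full MARCH dissimilarity may include details not modelled here, and the function names are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def k_nearest(query: Sequence[float],
              database: Sequence[Sequence[float]],
              k: int) -> List[Tuple[float, int]]:
    """Return (distance, index) pairs of the k nearest entries, nearest first."""
    if k <= 0:
        return []
    # Max-heap of the best k candidates so far, stored as (-distance, index),
    # so heap[0] is always the worst of the candidates currently kept.
    heap: List[Tuple[float, int]] = []
    for index, features in enumerate(database):
        d = l1_distance(query, features)
        if len(heap) < k:
            heapq.heappush(heap, (-d, index))
        elif -d > heap[0][0]:
            heapq.heapreplace(heap, (-d, index))
    return sorted((-neg_d, index) for neg_d, index in heap)
```

Each database entry is visited once and only K candidates are held at any time, which is consistent with the linear scaling in database size described above.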
6. Conclusion
In this work, we have presented a novel shape description method, called the multiscale arch height (MARCH) descriptor, for
the mobile retrieval of leaf images. The proposed MARCH descriptor has the following desirable properties: invariance,
compactness, low computational complexity and a coarse-to-fine representation structure. The performance of the proposed
method has been evaluated on four leaf datasets: the Swedish leaf dataset, the Flavia leaf dataset, the ICL leaf dataset and
the ImageCLEF leaf dataset. The results of the classification and retrieval experiments show that the proposed method
obtains the highest classification rate with the smallest number of features used, and achieves the best retrieval performance
while being over 500 times faster than the state-of-the-art methods. These results indicate that the proposed algorithm is
very suitable for the retrieval of leaf images on resource- and power-limited mobile devices. The proposed algorithm has
been tested on a mobile platform and the results indicate that the proposed method, due to its small memory footprint
and very low computational complexity, is highly suitable for the practical application of mobile device leaf recognition.
Acknowledgements
This work was partially supported by National Natural Science Foundation of China (Grant No. 61372158), Major project
of Natural Science Foundation of Jiangsu Higher Education Institutions of China (Grant No. 11KJA520004), Project Funded by
the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), Natural Science Foundation of
Jiangsu province of China (Grant No. BK20141487), and National Center for International Joint Research on E-Business Information Processing under Grant 2013B01035.
References
[1] S. Abbasi, F. Mokhtarian, J. Kittler, Curvature scale space image in shape similarity retrieval, Multimedia Syst. 7 (1999) 467–476.
[2] S. Abbasi, F. Mokhtarian, J. Kittler, Reliable classification of chrysanthemum leaves through curvature scale space, Lect. Notes Comput. Sci. 1252 (1997) 284–295.
[3] T. Adamek, N.E. O'Connor, A multiscale representation method for nonrigid shapes with a single closed contour, IEEE Trans. Circ. Syst. Video Technol. 14 (2004) 742–753.
[4] N. Alajlan, I. El Rube, M.S. Kamel, G. Freeman, Shape retrieval using triangle-area representation and dynamic space warping, Pattern Recogn. 40 (2007) 1912–1920.
[5] V. Bakic, I. Yahiaoui, S. Mouine, et al., Inria IMEDIA2's participation at ImageCLEF 2012 plant identification task, in: CLEF (Online Working Notes/Labs/Workshop), 2012.
[6] P.N. Belhumeur, D. Chen, S. Feiner, D.W. Jacobs, W.J. Kress, H. Ling, I.C. Lopez, R. Ramamoorthi, S. Sheorey, S. White, L. Zhang, Searching the World's Herbaria: a system for visual identification of plant species, in: ECCV 2008, Part IV, LNCS 5305, 2008, pp. 116–129.
[7] M. Bober, J.D. Kim, H.K. Kim, Y.S. Kim, W.-Y. Kim, K. Muller, Summary of the Results in Shape Descriptor Core Experiment, MPEG-7, ISO/IEC/JTC1/SC29/WG11/MPEG99/M4869, Vancouver, July 1999.
[8] M.A.Z. Chahooki, N.M. Charkari, Shape classification by manifold learning in multiple observation spaces, Inform. Sci. 262 (2014) 46–61.
[9] J.S. Cope, D. Corney, J.Y. Clark, P. Remagnino, P. Wilkin, Plant species identification using digital morphometrics: a review, Expert Syst. Appl. 39 (2012) 7562–7573.
[10] M.R. Daliri, V. Torre, Robust symbolic representation for shape recognition and retrieval, Pattern Recogn. 41 (2008) 1782–1798.
[11] Displaying Bitmaps Efficiently. <http://developer.android.com/training/displaying-bitmaps/> (accessed 20.01.13).
[12] J.X. Du, D.S. Huang, X.F. Wang, X. Gu, Computer-aided plant species identification (CAPSI) based on leaf shape matching technique, Trans. Inst. Meas. Contr. 28 (2006) 275–284.
[13] S. Belongie, J. Malik, J. Puzicha, Shape matching and object recognition using shape contexts, IEEE Trans. Pattern Anal. Mach. Intell. 24 (2002) 509–522.
[14] J.X. Du, X.F. Wang, G.J. Zhang, Leaf shape based plant species recognition, Appl. Math. Comput. 185 (2007) 883–893.
[15] P.F. Felzenszwalb, J.D. Schwartz, Hierarchical matching of deformable shapes, in: IEEE International Conference on Computer Vision and Pattern Recognition, vol. 1, 2007, pp. 1–8.
[16] Flavia, Flavia, a Leaf Recognition Algorithm for Plant Classification using PNN, 2007. <http://flavia.sourceforge.net> (accessed 01.11.13).
[17] Gartner, Mobile Operating System, January 15, 2013. <http://en.wikipedia.org/wiki/Mobile_operating_system> (accessed 20.01.13).
[18] D.J. Hearn, Shape analysis for the automated identification of plants from images of leaves, Taxon 58 (2009) 934–954.
[19] R. Hu, W. Jia, H. Ling, D. Huang, Multiscale distance matrix for fast plant leaf recognition, IEEE Trans. Image Process. 21 (2012) 4667–4672.
[20] C. Im, H. Nishida, T.L. Kunii, Recognizing plant species by leaf shapes - a case study of the Acer family, Proc. Pattern Recogn. 2 (1998) 1171–1173.
[21] ImageCLEF, Plant Identification, 2012. <http://imageclef.org/2012/plant> (accessed 01.11.13).
[22] Intelligent Computing Laboratory, Chinese Academy of Sciences Homepage. <http://www.intelengine.cn/English/dataset>.
[23] H. Kauppinen, T. Seppanen, M. Pietikainen, An experimental comparison of autoregressive and Fourier-based descriptors in 2-D shape classification, IEEE Trans. PAMI 17 (1995) 201–207.
[24] N. Kumar, P.N. Belhumeur, A. Biswas, D.W. Jacobs, W.J. Kress, I. Lopez, J.V.B. Soares, Leafsnap: a computer vision system for automatic plant species identification, in: ECCV 2012 Part II, 2012, pp. 502–516.
[25] H. Laga, S. Kurtek, A. Srivastava, M. Golzarian, S. Miklavcic, A Riemannian elastic metric for shape-based plant leaf classification, in: Digital Image Computing: Techniques and Applications, 2012, pp. 1–7.
[26] L.J. Latecki, R. Lakämper, U. Eckhardt, Shape descriptor for non-rigid shapes with a single closed contour, in: IEEE International Conference on Computer Vision and Pattern Recognition, vol. 1, 2000, pp. 424–429.
[27] C.L. Lee, S.Y. Chen, Classification for leaf images, in: Proc. 16th IPPR Conf. Comput. Vision Graphics Image Process., 2003, pp. 355–362.
[28] C.L. Lee, S.Y. Chen, Classification of leaf images, Int. J. Imag. Syst. Technol. 16 (2006) 15–23.
[29] H. Ling, D.W. Jacobs, Shape classification using the inner-distance, IEEE Trans. Pattern Anal. Mach. Intell. 29 (2007) 286–299.
[30] S. Loncaric, A survey of shape analysis techniques, Pattern Recogn. 31 (1998) 983–1001.
[31] J.M. Martinez, MPEG-7 Overview (Version 9), Technical Report ISO/IEC JTC1/SC29/WG11N5525, ISO/IEC JTC1/SC29/WG11, International Organisation for Standardisation, Coding of Moving Pictures and Audio, March 2003.
[32] T. Mclellan, J.A. Endler, The relative success of some methods for measuring and describing the shape of the complex objects, Syst. Biol. 47 (1998) 264–281.
[33] E. Milios, E.G.M. Petrakis, Shape retrieval based on dynamic programming, IEEE Trans. Image Process. 9 (2000) 141–147.
[34] F. Mokhtarian, S. Abbasi, Matching shapes with self-intersection: application to leaf classification, IEEE Trans. Image Process. 13 (2004) 653–661.
[35] S. Mouine, I. Yahiaoui, A. Verroust-Blondet, A shape-based approach for leaf classification using multiscale triangular representation, in: Proceedings of the 3rd ACM International Conference on International Conference on Multimedia Retrieval, 2013, pp. 127–134.
[36] Nielsen, Nielsen Tops of 2012: Digital, December 20, 2012. <http://blog.nielsen.com/nielsenwire/online_mobile/nielsen-tops-of-2012-digital/> (accessed 03.01.13).
[37] T.B. Sebastian, P.N. Klein, B.B. Kimia, On aligning curves, IEEE Trans. Pattern Anal. Mach. Intell. 25 (2003) 116–124.
[38] O. Söderkvist, Computer Vision Classification of Leaves from Swedish Trees, Master's Thesis, Linköping University, 2001.
[39] B. Wang, Shape retrieval using combined Fourier features, Opt. Commun. 284 (2011) 3504–3508.
[40] J.W. Wang, X. Bai, X.G. You, W.Y. Liu, L.J. Latecki, Shape matching and classification using height functions, Pattern Recogn. Lett. 33 (2012) 134–143.
[41] B. Wang, D. Brown, Y. Gao, J.L. Salle, Mobile plant leaf identification using smart-phones, in: IEEE International Conference on Image Processing (ICIP), 2013, pp. 4417–4421.
[42] Z. Wang, Z. Chi, D. Feng, Shape based leaf image retrieval, IEE Proc. Vis. Image Signal Process. 150 (2003) 34–43.
[43] X.F. Wang, D.S. Huang, J.X. Du, H. Xu, L. Heutte, Classification of plant leaf images with complicated background, Appl. Math. Comput. 205 (2008) 916–926.
[44] S. Wu, F. Bao, E. Xu, Y.-X. Wang, Y.-F. Chang, Q.-L. Xiang, A leaf recognition algorithm for plant classification using probabilistic neural network, in: IEEE International Symposium on Signal Processing and Information Technology, 2007, pp. 11–16.
[45] C. Xu, J. Liu, X. Tang, 2D shape matching by contour flexibility, IEEE Trans. Pattern Anal. Mach. Intell. 31 (2009) 180–186.
[46] B.A. Yanikoglu, E. Aptoula, C. Tirkaz, Sabanci-Okan system at ImageClef 2012: combining features and classifiers for plant identification, in: CLEF (Online Working Notes/Labs/Workshop), 2012.
[47] T. Zahn, R.Z. Roskies, Fourier descriptor for plane closed curves, IEEE Trans. Comput. 21 (1972) 269–281.
[48] D.S. Zhang, G. Lu, A comparative study of curvature scale space and Fourier descriptor, J. Vis. Commun. Image Represent. 14 (2003) 41–80.
[49] D.S. Zhang, G.J. Lu, Review of shape representation and description techniques, Pattern Recogn. 37 (2004) 1–19.