
2011 IEEE International Symposium on Multimedia

Blood Cell Image Classification Based on Hierarchical SVM

Wei-Liang Tai (1,*), Rouh-Mei Hu (2,5), Han C.W. Hsiao (1), Rong-Ming Chen (3), and Jeffrey J. P. Tsai (1,4)
(1) Department of Biomedical Informatics, Asia University, Taiwan
(2) Department of Biotechnology, Asia University, Taiwan
(3) Department of Computer Science and Information Engineering, National University of Tainan, Taiwan
(4) Department of Computer Science, University of Illinois at Chicago, USA
(5) School of Chinese Medicine, China Medical University, Taiwan
(*) Email: taiwl@cs.ccu.edu.tw

Abstract—The problem of identifying and counting blood cells within a blood smear is of both theoretical and practical interest. The differential counting of blood cells provides invaluable information to pathologists for the diagnosis and treatment of many diseases. In this paper we propose an efficient hierarchical blood cell image identification and classification method based on the multi-class support vector machine. In this automated process, segmentation and classification of blood cells are the most important stages. We segment the stained blood cells in digital microscopic images and extract the geometric features of each segment to identify and classify the different types of blood cells. The experimental results are compared with the manual results obtained by a pathologist and demonstrate the effectiveness of the proposed method.

Keywords-blood cells classification; multi-class support vector machine; feature extraction

978-0-7695-4589-9/11 $26.00 © 2011 IEEE. DOI 10.1109/ISM.2011.29

I. INTRODUCTION
The counting and assessment of blood cells are very informative in clinical practice. They are particularly important for patients suffering from blood disorders, both in the observation and development stages and in the preparation of the treatment. To diagnose the disease properly, we have to recognize the blood cells and calculate their relative quantity in the blood samples.

In the peripheral blood, blood cells can be differentiated into erythrocytes (red blood cells, RBCs), leukocytes (white blood cells, WBCs), and thrombocytes (platelets). Red blood cells are essential for oxygen transportation. Platelets play important roles in blood clotting. Leukocytes are the first line of defense of the immune system. Many blood disorders, such as leukemia, acute infection and inflammation, lead to an abnormal proliferation of certain types of white blood cell. Differential counting of the leukocytes can provide valuable information for accurate disease diagnosis in the case of blood disorders. In terms of the size and shape of the nucleus, the color of the cytoplasmic staining, and the ratio of nucleus to cytoplasm, leukocytes can be classified into five major types: neutrophils, eosinophils, basophils, monocytes, and lymphocytes (Fig. 1 (b)-(f)).

However, if differential counting of blood cells is only performed by human experts, it is very tedious and time-consuming. Moreover, the results can differ according to the subjective view of each expert. Although currently available automated cell counters are based on laser-light scattering and flow-cytochemical principles, 21% of all processed blood samples require microscopic review by experts [1]. Therefore, various efforts have already been made to develop automated cell analysis systems by using image processing.

Figure 1. The micrographs of the blood cells. (a) Erythrocyte. (b) Neutrophil. (c) Eosinophil. (d) Basophil. (e) Monocyte. (f) Lymphocyte. (g) Thrombocyte.

Theera-Umpon and Dhompongsa [2] applied two conventional classifiers, the Bayes classifier and neural networks, to classify leukocytes by using four granulometric nucleus features without cytoplasm. Ramoser [3] presented an automated WBC classification method that uses a pairwise SVM classifier to catalogue cytoplasm and nucleus features. Gonzalez et al. [4] proposed a two-phase methodology to analyze the morphology of abnormal
leukocyte images for the classification of acute leukemia subtypes using image processing and data mining techniques. Ko et al. [5] presented a novel WBC classification method that combines a few characteristic features of WBCs and the random forests classifier. Most existing methods focus on the classification of leukocytes; however, the identification of erythrocytes and thrombocytes is also important for disease diagnosis. In this study, we propose an automated blood cell classification method, covering erythrocytes, leukocytes, and thrombocytes, that combines a few characteristic features of blood cells and the multi-class SVM classifier [6]. Furthermore, we use a hierarchical strategy to improve the recognition rate. Experimental results demonstrate that our classification method is more robust than the conventional multi-class support vector machine.

The remainder of this paper is organized as follows. Section II gives a short description of the blood cells. The proposed automated classification method for blood cells is described in Section III. Section IV shows the identification results and evaluates the recognition rate. Finally, conclusions and future work are presented in Section V.

II. THE BLOOD CELLS
The three types of peripheral blood cells (erythrocytes, leukocytes and platelets) are descendants of hematopoietic stem cells in the bone marrow and lymphocytes from the lymph system. Different categories of blood cell exhibit different morphologic characteristics that can serve as markers for automatic classification. In the following, we describe the different blood cells.

Erythrocytes (Red blood cells, RBCs)
The erythrocytes (Fig. 1(a)) are the most abundant blood cells, with a disk diameter of 6–8 μm and a thickness of 2 μm. A typical human erythrocyte is devoid of a nucleus and exhibits the shape of a biconcave lens. Their cytoplasm is rich in hemoglobin, which gives the cells their typical red color.

Neutrophils
Neutrophils (Fig. 1(b)) are the most common white blood cells seen in acute inflammation, playing important roles in the defense against bacterial and fungal infection. They make up 54–62% of the total leukocyte count. A typical neutrophil has a diameter of 12–15 μm. The nucleus is frequently multi-lobed. After H&E staining, the cytoplasm of neutrophils has very tiny, faintly pink stained granules with low visibility.

Eosinophils
Eosinophils (Fig. 1(c)) primarily deal with parasitic infections and are able to attack parasites and phagocytose antigen-antibody complexes. They make up 1–6% of the total leukocyte count. A typical eosinophil has a diameter of 10–12 μm. The nucleus is frequently bi-lobed. After H&E staining, the cytoplasm of eosinophils is full of orange-red stained granules.

Basophils
Basophils (Fig. 1(d)) are chiefly responsible for allergic and antigen response by releasing the chemical histamine. They are relatively rare in the peripheral blood and represent about 0.5% to 1% of circulating leukocytes. A typical basophil has a diameter of 8–10 μm. The nucleus is frequently bi-lobed to multi-lobed. After staining, the cytoplasm of basophils is full of large, deep-blue to purple stained granules.

Monocytes
Monocytes (Fig. 1(e)) are large, circulating, phagocytic white blood cells. Monocytes constitute from 2 to 10% of the leukocytes in humans. A typical monocyte has a diameter of 12–15 μm and a kidney-shaped nucleus. The cytoplasm is abundant and light blue.

Lymphocytes
There are two major types of lymphocytes, B cells and T cells, circulating in the circulatory system. Lymphocytes (Fig. 1(f)) constitute from 28 to 33% of the leukocytes in humans. Microscopically, a normal lymphocyte has a diameter of 8–12 μm (approximately the size of a red blood cell) and has a large, dark-staining nucleus with a small cytoplasm to nucleus ratio.

Thrombocytes (platelets)
Thrombocytes (Fig. 1(g)) are small cell fragments without a nucleus. In a blood smear stained by Giemsa, platelets have an intense purple color. They are 2–3 μm in diameter, much smaller than erythrocytes. No obvious granules can be observed in the cytoplasm.

Microscopic blood cell analysis is very useful for identifying or diagnosing many types of diseases. One can recognize the seven different blood cell types via their cytoplasm, granules, staining properties of the granules, sizes and shapes of cells, the ratio of nucleus to cytoplasm, and the type of nucleus lobes. Therefore, developing an automatic blood cell recognition system is feasible via image processing and pattern recognition techniques.

III. THE PROPOSED METHOD
Peripheral blood cells consist of five types of leukocytes along with erythrocytes and thrombocytes. The differential counting of blood cells provides invaluable information to pathologists for the diagnosis and treatment of many diseases. In terms of the features of the nucleus and cytoplasm, blood cells can be classified into seven types: erythrocytes, neutrophils, eosinophils, basophils, monocytes, lymphocytes, and thrombocytes (Fig. 1 (a)-(g)). Therefore, identification and recognition of blood cells are essential for accurate disease diagnosis. Our automated recognition of blood cells in microscopic images consists of four major steps: preprocessing, image segmentation, feature extraction and classification. The flowchart of automatic recognition is illustrated in Fig. 2.

A. Image Pre-processing
The pre-processing stage includes noise reduction and contrast enhancement of the acquired image and is essentially performed in order to prepare the image for the following segmentation stage.

De-noising
Images are usually degraded by various noises in the signal transmission. De-noising an image is of great importance in image processing since the results of image processing such as image segmentation, feature extraction
and image recognition depend to a great extent on the noise removal results. The median filter is a non-linear digital filtering technique, often used to remove noise from images or other signals. The main idea of the median filter is to run through the signal entry by entry, replacing each entry with the median of neighboring entries. The algorithm not only removes noise but also preserves the edges and sharp details of the image well. Fig. 3(a) shows a zoomed-in version of the blood cell image. The corresponding de-noising result is illustrated in Fig. 3(b).

Figure 2. The flowchart of automatic recognition of blood cells.

Figure 3. Median filter. (a) The zoom-in version of the blood cell image. (b) The de-noising result of (a).

Color Split Channel
The blood smear may be stained by different color dyes. To avoid being influenced by dye color, all blood smear images were first transformed into gray level. A typical peripheral blood smear image consists of four components, which are the background, erythrocytes, leukocytes, and thrombocytes. Leukocytes appear rather darker than the background, and erythrocytes appear at an intermediate intensity level. To segment the desired object from the background, it is found that the green component of the RGB input image gives the best contrast between the background and the blood cell components, as shown in Fig. 4. As a result, the green channel is used to segment the blood cells in our proposed method.

B. Image Segmentation
Image segmentation consists basically of partitioning an image into a set of disjoint and homogeneous regions which are supposed to correspond to image objects that are meaningful to a certain application. Thus, the segmentation process uses thresholding, morphology, and watershed to enclose every element in the blood slide in a distinct area.

Figure 4. Color channel. (a) Original blood cell image. (b) Red channel. (c) Green channel. (d) Blue channel.

Binary
In order to segment the desired object from the background, we need to generate a binary image that separates foreground and background image pixels. To produce a representative binary image, Otsu’s adaptive threshold algorithm [7] is applied on the green channel to classify all the pixels into two classes. Otsu’s method exhaustively searches for the threshold $T_c$ that minimizes the within-class variance, defined as a weighted sum of the variances of the two classes:
$$T_c = \arg\min_{0 \le t < L} \left( p_1(t)\,\sigma_1^2(t) + p_2(t)\,\sigma_2^2(t) \right),$$
where L is the number of gray levels, the weight $p_i$ is the probability of a pixel falling in the i-th class separated by a threshold t, and $\sigma_i^2$ is the variance of the pixels’ gray level intensities in the i-th class. Fig. 5 shows the output binary image corresponding to that shown in Fig. 4(c).

Figure 5. The binary blood cell image generated by Otsu’s threshold algorithm.

Mathematical Morphology
Mathematical morphology operations [8] are nonlinear, translation invariant transformations. The basic morphological operations involving an image S and a structuring element E are erosion and dilation.
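As a rough illustration of the pre-processing and binarization steps described above (green-channel selection, median filtering, and Otsu's threshold), the following sketch runs them on a tiny synthetic image using only the Python standard library. The function names, the 3x3 window, the clamped border handling, and the toy image are all assumptions made for this example; the paper does not specify them, and this is not the authors' implementation.

```python
from statistics import median


def green_channel(rgb_image):
    """Keep only the G component of an image given as rows of (R, G, B) tuples."""
    return [[pixel[1] for pixel in row] for row in rgb_image]


def median_filter_3x3(gray):
    """Replace every pixel by the median of its 3x3 neighbourhood.

    Border handling (clamping coordinates to the image) is an assumption.
    """
    height, width = len(gray), len(gray[0])
    filtered = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                gray[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            filtered[y][x] = median(window)
    return filtered


def otsu_threshold(gray, levels=256):
    """Exhaustively search t in [0, levels) minimising p1*var1 + p2*var2.

    Class 1 holds gray levels <= t and class 2 holds gray levels > t.
    """
    hist = [0] * levels
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)

    best_t, best_score = 0, float("inf")
    for t in range(levels):
        score = 0.0
        for lo, hi in ((0, t + 1), (t + 1, levels)):
            count = sum(hist[lo:hi])
            if count == 0:
                continue  # an empty class adds nothing to the weighted sum
            mean = sum(g * hist[g] for g in range(lo, hi)) / count
            variance = sum(hist[g] * (g - mean) ** 2 for g in range(lo, hi)) / count
            score += (count / total) * variance
        if score < best_score:
            best_t, best_score = t, score
    return best_t


def binarize(gray, threshold):
    """Mark pixels at or below the threshold as foreground (cells are darker)."""
    return [[value <= threshold for value in row] for row in gray]


# Toy image: bright background, one dark 3x3 "cell", one isolated dark speck.
rgb = [[(0, 200, 0)] * 7 for _ in range(7)]
for y in range(1, 4):
    for x in range(1, 4):
        rgb[y][x] = (0, 50, 0)
rgb[5][5] = (0, 0, 0)

gray = median_filter_3x3(green_channel(rgb))
t = otsu_threshold(gray)
mask = binarize(gray, t)
```

In this toy case the median filter removes the isolated speck before thresholding, so the threshold search only has to separate the cell from the background. On real images the window size and the foreground polarity would need to be chosen for the stain and magnification at hand.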
erosion: $S \ominus E = \bigcap \{ S - e : e \in E \}$,
dilation: $S \oplus E = \bigcup \{ E + s : s \in S \}$,
where $\bigcap$ and $\bigcup$ denote set intersection and union, respectively, and $E + s$ denotes the translation of a set E by a point s. The opening and closing derived from the erosion and dilation are defined by
opening: $S \circ E = (S \ominus E) \oplus E$,
closing: $S \bullet E = (S \oplus E) \ominus E$.
Mathematical morphology operations are used to fill the holes in blood cells and to remove the unwanted points in the red blood cells and background.

Watershed
The objective of watershed segmentation [9] is to find all of the highest gray levels, which are called watershed lines. The simplest way to explain watershed segmentation is the “immersion approach.” Imagine that a hole is drilled in each minimum of the surface, and we flood water into different catchment basins from the holes. If the water of different catchment basins is likely to merge due to further immersion, a dam is built to prevent the merging. This flooding process will eventually reach a stage when only the top of the dam (the watershed lines) is visible above the water line. Thus, in order to separate overlapping cells, the watershed transform is applied on the distance transform of the binary mask of cells having a larger area. Fig. 6 shows the watershed segmentation result for the blood cell image.

Figure 6. Image segmentation. (a) The binary image of Fig. 4(c). (b) The watershed segmentation result of (a).

C. Feature Extraction
To obtain the individual objects of interest from the background in the segmentation process, we remove the border-touching cells from the binary image and then label the segmented binary image. Our approach is to define a set of machine-measurable features on the blood cell image and then, using a training set of cells, to partition the resulting feature space. To improve the recognition rate, a hierarchical strategy is used in our proposed method. For fast and efficient classification, we extract five features: area, histogram, circularity, cytoplasm ratio, and color of cytoplasm.

Area
For the global shape feature, we use the ratio of each blood cell size to the average blood cell size:
$$\mathrm{Area} = \frac{k \times A_i}{\sum_{j=1}^{k} A_j},$$
where $A_i$ denotes the size of the labeled blood cell i and k denotes the total number of blood cells. From Fig. 4(a), it can be found that the erythrocytes are the most numerous blood cells, and leukocytes are bigger than erythrocytes. In addition, thrombocytes are much smaller than other blood cells. As a result, we can use area to distinguish thrombocytes from other blood cells.

Histogram
Note that a typical human erythrocyte is devoid of a nucleus and exhibits the shape of a biconcave lens. Hence, we use the red channel to segment the nucleus. Fig. 7 shows the histogram of Fig. 4(b). Erythrocytes can be accurately recognized since the nuclei are located at low intensity levels. As shown in Fig. 7, the histogram helps us distinguish between erythrocytes and leukocytes.

Figure 7. The histogram of Fig. 4(b).

Circularity
Leukocytes can be divided into two categories: granulocytes and agranulocytes. Agranulocytes include monocytes and lymphocytes, which belong to the mononuclear cell group. Thus, the circularity of the nucleus helps us distinguish between granulocytes and agranulocytes, as shown in Fig. 1:
$$\mathrm{Circularity} = 4 \times \pi \times \frac{A_i}{P_i^2},$$
where $A_i$ denotes the size of the labeled blood cell i and $P_i$ denotes the perimeter of the nucleus in labeled blood cell i. A circularity value of 1.0 indicates a perfect circle. As the value approaches 0.0, it indicates an increasingly elongated polygon.

Cytoplasm Ratio
Agranulocytes are leukocytes characterized by the apparent absence of granules in their cytoplasm. They include monocytes and lymphocytes, which belong to the mononuclear cell group. As a result, the ratio of the cytoplasm to the cell can be used to distinguish between monocytes and lymphocytes:
$$\mathrm{Cytoplasm\ Ratio} = \frac{C_i}{A_i},$$
where $A_i$ denotes the size of the labeled blood cell i and $C_i$ denotes the cytoplasm area of the labeled blood cell i.

Color of Cytoplasm
There are three types of granulocytes: neutrophils, eosinophils, and basophils, which are named according to their staining properties. Thus, for each granulocyte, we extract the average and standard deviation of the Hue component in HSI color space instead of the color histogram in order to reduce feature dimensions. The color component H denotes the property of colors by which they can be perceived as ranging from red through yellow, green, and blue, as determined by the dominant wavelength of the light:
$$H = \cos^{-1}\left[ \frac{(R - G) + (R - B)}{2\sqrt{(R - G)^2 + (R - B)(G - B)}} \right]$$
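The feature formulas of this subsection can be written down directly. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, areas and perimeters are taken as already-measured numbers, gray pixels (where the hue formula has a zero denominator) are mapped to 0, and the population standard deviation is used because the paper does not say which estimator was applied. It follows the hue formula exactly as printed, without the usual B > G correction found in some HSI definitions.

```python
import math
from statistics import mean, pstdev


def area_feature(areas):
    """Area_i = k * A_i / sum(A): each cell's size relative to the average size."""
    k, total = len(areas), sum(areas)
    return [k * a / total for a in areas]


def circularity(area, perimeter):
    """4 * pi * A / P^2; equals 1.0 for a perfect circle."""
    return 4.0 * math.pi * area / perimeter ** 2


def cytoplasm_ratio(cytoplasm_area, cell_area):
    """C_i / A_i: fraction of the cell occupied by cytoplasm."""
    return cytoplasm_area / cell_area


def hue(r, g, b):
    """Hue angle in radians, following the formula as printed in the paper."""
    numerator = (r - g) + (r - b)
    denominator = 2.0 * math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if denominator == 0.0:
        return 0.0  # assumption: hue of a gray pixel is set to 0
    ratio = max(-1.0, min(1.0, numerator / denominator))  # guard against rounding
    return math.acos(ratio)


def cytoplasm_color_feature(pixels):
    """Mean and (population) standard deviation of hue over cytoplasm pixels."""
    hues = [hue(r, g, b) for r, g, b in pixels]
    return mean(hues), pstdev(hues)
```

For example, `circularity(math.pi * 25, 2 * math.pi * 5)` evaluates the formula for a circle of radius 5, and `area_feature([1, 1, 2])` gives each of three cells' sizes relative to their mean.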
In the hue formula, (R, G, B) denotes the color components of the pixel value.

D. SVM Classification
Support vector machine (SVM) [10] is a concept for a set of related supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis. The main advantage of the SVM network used as a classifier is its very good generalization ability and extremely powerful learning procedure, leading to the global minimum of the defined error function.

Given instances $x_i$, $i = 1, \ldots, l$, with labels $y_i \in \{-1, 1\}$, the main task in training SVMs is to solve the following quadratic optimization problem [11]:
$$\min_{\alpha} f(\alpha) = \frac{1}{2} \alpha^T Q \alpha - e^T \alpha$$
$$\text{subject to } 0 \le \alpha_i \le C,\ i = 1, \ldots, l, \qquad y^T \alpha = 0,$$
where e is the vector of all ones, C is the upper bound of all variables, Q is an l by l symmetric matrix with $Q_{ij} = y_i y_j K(x_i, x_j)$, and $K(x_i, x_j)$ is the kernel function.

The best-known kernel functions are the radial Gaussian basis, polynomial, spline, and sigmoidal functions. The final learning problem of the SVM is transformed to the solution of the so-called dual problem defined with respect to the Lagrange multipliers [12]:
$$\max Q(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$
with the constraints (i = 1, …, l)
$$\sum_{i=1}^{l} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C.$$
The output signal s(x) of the SVM after learning is described in the form [11]
$$s(x) = \sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b,$$
where b is the bias; the vector x is assigned to the class when s(x) is positive and to the alternative class when s(x) is negative. The hyperparameter of the kernel function and the regularization constant C have been adjusted by repeating the learning experiments for a set of predefined values and choosing the best value on the validation data sets. Their optimal values are those for which the classification error on the validation data set was the smallest.

The one-against-one method [13] is applied to deal with the problem of multiple classes. The maximum voting of the multiple classes is used to find the final classification results. During the training phase, the models of the multiple-class SVMs are learned from training data. In the testing phase, the learned models are employed to generate multiple sets of predictions for each test sample. The one having the largest prediction is the final decision.

From the earlier literature, we found that it is hard to accurately distinguish blood cells into seven classes by using single-stage SVM classification. Thus, we propose hierarchical SVM classification to improve the recognition ratio. Fig. 8 illustrates our proposed hierarchical strategy. For fast and efficient classification, five features (area, histogram, circularity, cytoplasm ratio, and color of cytoplasm) are extracted for the following SVM training. For the first level, blood cells are separated into two groups, thrombocytes versus erythrocytes and leukocytes, by the feature “area.” Next, we can use the feature “histogram” to identify erythrocytes and leukocytes. For leukocytes, we can use the feature “circularity” to identify granulocytes and agranulocytes, because agranulocytes belong to the mononuclear cell group. Then, we use the feature “color of cytoplasm” to distinguish granulocytes into neutrophils, eosinophils, and basophils. Finally, monocytes and lymphocytes can be recognized by the feature “cytoplasm ratio.”

Figure 8. The hierarchical strategy of the proposed method.
Blood Cells
  Area: Erythrocytes & Leukocytes | Thrombocytes
    Histogram: Erythrocytes | Leukocytes
      Circularity: Granulocytes | Agranulocytes
        Color of Cytoplasm (Granulocytes): Neutrophils | Eosinophils | Basophils
        Cytoplasm Ratio (Agranulocytes): Monocytes | Lymphocytes

IV. EXPERIMENTAL RESULTS
To obtain a better understanding of how different images affect the performance of the proposed scheme, we present some results in statistical form. In this section, we describe the image acquisition, classification performance evaluation, and the experimental results and analysis.

A. Image Acquisition
Twenty human peripheral blood smear slides were screened under a Leica microsystem by at least two biologists and 210 images were taken for testing. Each image contains at least one leukocyte. Since basophils are relatively rare in the blood of a healthy individual, some testing basophil images were retrieved from [14, 15]. There are 50, 91, 9, 10, 20, 77, and 50 objects for the seven cell classes: erythrocytes, neutrophils, eosinophils, basophils, monocytes, lymphocytes, and thrombocytes, respectively. The images of the peripheral blood smears were taken using a microscope, a charge-coupled device (CCD) camera, and a 24-bit digitizer.

B. Performance Evaluation
When referring to the performance of a classification model, we are interested in the ability of the model to correctly predict the classes. Generally, we evaluate a classifier’s performance using the terms true positive, true negative, false positive, and false negative, which compare
the predicted class of an item with the actual class. As illustrated in Table I, a predictive model tested on independent data yields a confusion matrix. There are four possible predicted outcomes from a binary classifier. If the outcome from a prediction is positive and the actual value is also positive, then it is called a true positive (Tp); however, if the actual value is negative then it is said to be a false positive (Fp). Conversely, a true negative (Tn) has occurred when both the predicted outcome and the actual value are negative, and a false negative (Fn) is when the predicted outcome is negative while the actual value is positive.

TABLE I. THE CONFUSION MATRIX
Predicted class | Actual class: Positive | Actual class: Negative
Positive | True positive (Tp) | False positive (Fp)
Negative | False negative (Fn) | True negative (Tn)

For evaluating the effectiveness of our proposed hierarchical strategy, the benchmark metrics are the precision rate (PR) and the recall rate (RR). Precision is a measure of the accuracy provided that a specific class has been predicted. It is defined by
$$PR = \frac{Tp}{Tp + Fp},$$
where Tp and Fp are the numbers of true positive and false positive predictions for the considered class. Recall is a measure of the ability of a prediction model to select instances of a certain class from a data set. It is also called sensitivity, and corresponds to the true positive rate:
$$RR = \frac{Tp}{Tp + Fn},$$
where Tp and Fn are the numbers of true positive and false negative predictions for the considered class. Tp + Fn is the total number of test examples of the considered class.

As a result, precision measures the exactness of a classifier. A higher precision means fewer false positives, while a lower precision means more false positives. Recall measures the completeness, or sensitivity, of a classifier. Higher recall means fewer false negatives, while lower recall means more false negatives. Precision and recall are competing measures. An attempt to maximize precision usually leads to lower recall values, and vice versa. The goal of this research, therefore, is to create a classifier with both high precision and recall.

C. Results and Analysis
We use 2 objects for each class for training. For feature selection and performance comparison, we include 307 objects to form a dataset (50 from erythrocytes, 91 from neutrophils, 9 from eosinophils, 10 from basophils, 20 from monocytes, 77 from lymphocytes, and 50 from thrombocytes) and examine the robustness of our method. In the experiments, we set the type of SVM to C-SVC and use the radial basis function kernel in LIBSVM [11, 13]. In addition, we use cross validation to find the optimal arguments for prediction. The testing data and training data are scaled to the same range [-1, 1]. The classification results are evaluated in terms of the traditional classification rate and the class-wise classification rate.

First, to retrieve the features for training and prediction, we need to segment the cell objects from the blood cell images. Because leukocytes consist of a nucleus and cytoplasm, for each color image, the green channel and the red channel are applied to segment the entire cell and the nucleus, respectively. Fig. 9 shows the nucleus and cytoplasm segmentation result of a neutrophil. For the training classifier, the feature vector of entire cells had 2 feature dimensions. The feature vector of the nucleus had 1 feature dimension and the feature vector of the cytoplasm had 3 feature dimensions.

Figure 9. The nucleus and cytoplasm segmentation result. (a) A neutrophil. (b) Cell segmentation. (c) Nucleus segmentation. (d) Cytoplasm segmentation.

TABLE II. THE FEATURE VALUES OF DIFFERENT BLOOD CELLS TYPES
Cell type | Area | Histogram | Circularity | Cytoplasm ratio | Color of cytoplasm
Erythrocytes | 1.09 | 0 | - | - | -
Neutrophils | 3.34 | 1 | 0.269 | - | 1: 1.221; 2: 7.324
Eosinophils | 3.26 | 1 | 0.363 | - | 1: 1.126; 2: 12.585
Basophils | 3.05 | 1 | 0.342 | - | 1: 0.505; 2: 21.765
Monocytes | 1.96 | 1 | 0.845 | 0.512 | -
Lymphocytes | 1.04 | 1 | 0.876 | 0.148 | -
Platelets | 0.18 | - | - | - | -
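To make the hierarchy of Fig. 8 and the precision and recall rates of Section IV-B concrete, here is a minimal sketch of the routing logic and the per-class metrics. It is not the authors' classifier: in the paper every level is a trained SVM, whereas the stage functions below are stand-in threshold rules, and every cut-off is a hypothetical value chosen only so that the Table II example rows are separated. All names (`classify_cell`, `STAGES`, `TABLE_II`, `precision_recall`) are invented for the illustration.

```python
def classify_cell(features, stages):
    """Route one cell through the Fig. 8 hierarchy, one decision per level."""
    if stages["area"](features["area"]) == "thrombocyte":
        return "thrombocyte"
    if stages["histogram"](features["histogram"]) == "erythrocyte":
        return "erythrocyte"
    if stages["circularity"](features["circularity"]) == "granulocyte":
        return stages["color"](features["color"])
    return stages["cytoplasm_ratio"](features["cytoplasm_ratio"])


def color_stage(color):
    """Hypothetical rule on the value labelled "2" in the Table II color column."""
    if color[1] < 10:
        return "neutrophil"
    return "eosinophil" if color[1] < 17 else "basophil"


# Stand-in stage classifiers with assumed cut-offs (each would be an SVM in the paper).
STAGES = {
    "area": lambda a: "thrombocyte" if a < 0.5 else "other",
    "histogram": lambda h: "erythrocyte" if h == 0 else "leukocyte",
    "circularity": lambda c: "granulocyte" if c < 0.6 else "agranulocyte",
    "color": color_stage,
    "cytoplasm_ratio": lambda r: "monocyte" if r > 0.3 else "lymphocyte",
}

# Example rows copied from Table II; "color" holds the values labelled 1 and 2.
TABLE_II = {
    "erythrocyte": {"area": 1.09, "histogram": 0},
    "neutrophil": {"area": 3.34, "histogram": 1, "circularity": 0.269, "color": (1.221, 7.324)},
    "eosinophil": {"area": 3.26, "histogram": 1, "circularity": 0.363, "color": (1.126, 12.585)},
    "basophil": {"area": 3.05, "histogram": 1, "circularity": 0.342, "color": (0.505, 21.765)},
    "monocyte": {"area": 1.96, "histogram": 1, "circularity": 0.845, "cytoplasm_ratio": 0.512},
    "lymphocyte": {"area": 1.04, "histogram": 1, "circularity": 0.876, "cytoplasm_ratio": 0.148},
    "thrombocyte": {"area": 0.18},
}


def precision_recall(actual, predicted, cls):
    """Return (PR, RR) for one class: Tp/(Tp+Fp) and Tp/(Tp+Fn)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == cls and p == cls for a, p in pairs)
    fp = sum(a != cls and p == cls for a, p in pairs)
    fn = sum(a == cls and p != cls for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


actual = list(TABLE_II)
predicted = [classify_cell(TABLE_II[name], STAGES) for name in actual]
```

Because each level only looks at the feature it needs, later features can be missing for cells that leave the hierarchy early (for example, thrombocytes need only the area). Replacing each entry of `STAGES` with a trained binary or multi-class classifier keeps the routing unchanged.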
Table II presents the feature values computed for different types of segmented cells. For the first level, thrombocytes are much smaller than other blood cells. Thus, we can use “area” to distinguish thrombocytes from other blood cells. Next, we can use the feature “histogram” to identify erythrocytes and leukocytes, since erythrocytes are devoid of a nucleus. For leukocytes, we can use the feature “circularity” to identify granulocytes and agranulocytes. Then, we use the feature “color of cytoplasm” to distinguish granulocytes into neutrophils, eosinophils, and basophils. Finally, monocytes and lymphocytes can be recognized by the feature “cytoplasm ratio.”

The classification performance of the proposed method using hierarchical feature sets of blood cells is given in Fig. 10. Overall, nearly every cell that belongs to a class is correctly identified as such, with a 95.3% average recall rate. This means very few false negatives in each class. However, a cell classified as an eosinophil is only 64% likely to be correct; that is, 36% of the eosinophil predictions are false positives. The main reason for the lower precision rate in the eosinophil class is that the cytoplasm of some neutrophils differs only very weakly from that of eosinophils. Moreover, in certain cases, the cytoplasm of some neutrophils had a complex texture or several different granule regions. As a result of this complex texture, some neutrophils were mistaken for eosinophils.

Figure 10. The precision and recall for our hierarchical strategy (bar chart; x-axis: class; y-axis: percentage (%); series: precision rate and recall rate). Numbers labeling graphs represent blood cells types: 1. erythrocytes, 2. neutrophils, 3. eosinophils, 4. basophils, 5. monocytes, 6. lymphocytes, 7. thrombocytes.

Figure 11. Performance comparison between our hierarchical strategy and single-stage SVM (bar chart; x-axis: class; y-axis: recall rate (%); series: hierarchical strategy and single-stage SVM). Numbers labeling graphs represent blood cells types: 1. erythrocytes, 2. neutrophils, 3. eosinophils, 4. basophils, 5. monocytes, 6. lymphocytes, 7. thrombocytes.

In the following, we compare our hierarchical strategy with single-stage SVM. For the single-stage SVM, we first combine the features to form a higher dimensional feature space. As shown in Fig. 11, the hierarchical strategy produces a better classification performance; in particular, the recall rate improves by up to 56% in comparison to the single-stage SVM. In addition, the average recall rate of the hierarchical strategy is 95.27%, compared with 66.84% for the single-stage SVM method. These results show that the hierarchical strategy is a reasonable classifier for blood cell classification.

V. CONCLUSIONS
This study demonstrated an efficient hierarchical blood cell classification method using the geometric features from the nucleus and the cytoplasm and a multi-class SVM classification scheme. Classification using the proposed hierarchical strategy outperformed classification using only the single-stage SVM because the cytoplasm of some leukocytes presents a very weak difference against the background and touches neighboring cells. In addition, experimental results showed that using the hierarchical multi-class SVM classification with hierarchical features could indeed improve the classification performance compared to the single-stage SVM method.

ACKNOWLEDGMENT
This work is supported in part under grant numbers NSC 99-2221-E-468-007, NSC 99-2221-E-024-010, NSC 99-2221-E-468-021, NSC 99-2632-E-468-001-MY3, and NSC 100-2221-E-024-020 from the National Science Council, Taiwan, and Asia University. The views, opinions and/or findings contained in this report are those of the authors and should not be construed as an official National Science Council position, policy or decision unless so designated by other documentation.

REFERENCES
[1] H. Ceelie, R. B. Dinkelaar, and W. van Gelder, “Examination of peripheral blood films using automated microscopy; evaluation of Diffmaster Octavia and Cellavision DM96,” Journal of Clinical Pathology, vol. 60, no. 1, pp. 72-79, May 2007.
[2] N. Theera-Umpon and S. Dhompongsa, “Morphological granulometric features of nucleus in automatic bone marrow white blood cell classification,” IEEE Transactions on Information Technology in Biomedicine, vol. 11, no. 3, pp. 353-359, May 2007.
[3] H. Ramoser, “Leukocyte segmentation and SVM classification in blood smear images,” International Journal of Machine Graphics & Vision, vol. 17, no. 1, pp. 187-200, Jan. 2008.
[4] J. A. Gonzalez, I. Olmos, L. Altamirano, B. A. Morales, C. Reta, M. C. Galindo, J. E. Alonso, and R. Lobato, “Leukemia identification from bone marrow cells images using a machine vision and data mining strategy,” Intelligent Data Analysis, vol. 15, no. 3, pp. 443-462, May 2011.
[5] B. C. Ko, J. W. Gim, and J. Y. Nam, “Cell image classification based on ensemble features and random forest,” Electronics Letters, vol. 47, no. 11, pp. 638-639, May 2011.
[6] K. Crammer and Y. Singer, “On the algorithmic implementation of multiclass kernel-based vector machines,” Journal of Machine Learning Research, vol. 2, pp. 265-292, 2001.
[7] N. Otsu, “A threshold selection method from gray level histogram,” IEEE Transactions on Systems, Man and Cybernetics, vol. 9, no. 1, pp. 62-66, Jan. 1979.
[8] J. Serra, “Image analysis and mathematical morphology,” Academic Press, 1984.
[9] L. Vincent and P. Soille, “Watersheds in digital spaces: an efficient algorithm based on immersion simulations,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 6, pp. 583-598, Jun. 1991.
[10] B. E. Boser, I. Guyon, and V. Vapnik, “A training algorithm for optimal margin classifiers,” Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, PA, pp. 144-152, 1992.
[11] C. C. Chang and C. J. Lin, “LIBSVM: a library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, pp. 27:1-27:27, 2011.
[12] B. Schölkopf and A. Smola, “Learning with kernels: support vector machines, regularization, optimization, and beyond,” The MIT Press, 2001.
[13] C. W. Hsu and C. J. Lin, “A comparison of methods for multiclass support vector machines,” IEEE Transactions on Neural Networks, vol. 13, no. 2, pp. 415-425, Mar. 2002.
[14] “CellaVision,” http://cellavision.se/?sid=459
[15] “Microanatomy Web Atlas,” http://www.cytochemistry.net/microanatomy/main.htm
