
Medical Image Analysis Using Texture Analysis

Neil MacEwen

Supervisor: Dr. W. Nailon 19th April 2004


Medical Image Analysis Using Texture Analysis Neil MacEwen

DECLARATION

I declare that this report is entirely the result of my own work under the

supervision of Dr. W. Nailon.

Neil MacEwen

19th April 2004


CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF FIGURES
LIST OF TABLES
1. INTRODUCTION
2. PATTERN RECOGNITION
2.1 Feature generation
3. REVIEW OF PREVIOUS PROJECT
3.1 Texture Analysis
3.1.1 First-order algorithm
3.1.2 Second-order algorithms
3.2 Algorithm implementation
4. CLASSIFICATION
4.1 A simple one-dimensional classification example
4.2 Classifier design
4.3 Classifiers
4.3.1 Statistical Approach
4.3.2 Decision Functions
4.3.3 Distance functions and clustering
4.3.4 Fuzzy logic classifier
4.3.5 Artificial Neural Net (ANN) classifier
5. FEATURE REDUCTION
6. SEGMENTATION
7. CLASSIFIER DESIGN AND CODING
7.1 Data Set
7.2 Feature vector generation
7.3 Classifiers
7.3.1 Euclidean distance classifier
7.3.2 Mahalanobis classifier
7.3.3 Neural Network classifier
7.4 Classifier testing
7.4.1 Initial performance
7.4.2 Normalisation
7.4.3 Classifier performance for a 2-dimensional data set
7.4.4 Neural Network ‘spread’ variation
7.4.5 Classifier performance at different sizes
7.4.6 Neural Network ‘spread’ variation at different image sizes
8. SEGMENTATION OF A TEXTURE IMAGE
9. FEATURE SUBSET SELECTION
10. CONCLUSION & FURTHER WORK
REFERENCES
APPENDICES
Appendix A – Texture features
Appendix B – MATLAB code


ABSTRACT

The main aim of this project was to validate the use of a set of five texture analysis
algorithms for the identification of textures in images.

The project built on a previous final year project that used the five algorithms to generate
a set of texture features describing an image. This report outlines the work done in
validating these algorithms. A set of classifiers was designed and tested, and the results
showed that the algorithms can be used to differentiate successfully between different
texture types. The segmentation of a mixed-texture image was undertaken to validate the
classifier performance results.

Feature reduction was investigated in order to reduce the number of features needed to
classify a texture image successfully.


ACKNOWLEDGEMENTS

I would like to thank Bill for introducing me to this interesting topic, and for his support

and guidance throughout the year.


TABLE OF FIGURES

Figure 2.1: A basic scheme for pattern recognition
Figure 3.1: Two textural images with their histograms and some first-order statistics
Figure 3.2: An example of a greyscale MRI image and the graphical representation of its textural features. See appendices A and B for feature details and code respectively.
Figure 4.1: Two-dimensional feature space containing two classes. An incoming pattern is assigned to the class corresponding to the region it falls in.
Figure 4.2: The characters to be identified, 1 & 0, placed on a grid.
Figure 4.3: A histogram of feature values. Anything falling to the right-hand side of the boundary is classified as a 0, anything to the left as a 1.
Figure 4.4: (a) Linear decision function (b) Nonlinear decision function (both shown in red)
Figure 4.5: Three linearly separable classes in R2, the decision boundary for a class Ci is given by di(x)
Figure 4.6: Three pairwise separable classes in R2, two decision boundaries are needed to select each class.
Figure 7.1: The four textures used for classifier development.
Figure 7.2: Initial classifier performance results (% test vectors correctly identified)
Figure 7.3: Graphical illustration of the normalisation process
Figure 7.4: Classifier performance after normalisation (% test vectors correctly identified)
Figure 7.5: Fisher’s iris data
Figure 7.6: Graphical representation of Fisher data classification
Figure 7.7: Neural Net classifier performance for different spreads for 32 x 32 pixel images
Figure 7.8: Average classifier performance over all four classes at different sizes
Figure 7.9: Neural Net classifier performance at varying values of spread
Figure 8.1: Left: combination of the four Brodatz source textures. Right: outline of combination image, showing classes
Figure 8.2: Segmented Brodatz texture combination image
Figure 9.1: Classifier performances using subsets selected using the Bhattacharyya distance
Figure 9.2: Classifier performances using subsets selected using forward and backward selection procedures
Figure 9.3: Classifier performances for each texture algorithm.

LIST OF TABLES

Table 7.1: Number of smaller images extracted from source images at each resolution
Table 7.2: Classification of Fisher’s Iris Data before and after normalisation
Table 9.1: Feature subset sizes created by using only individual texture algorithms


1. INTRODUCTION

Visual assessment of digital image information is often aided by computer-based

image analysis to remove any subjective bias. In a medical context for example, it can be

very difficult to distinguish between different clinical features present in a medical image,

such as between grey and white matter in an MRI scan. One type of computer-based

analysis that can be used is texture analysis, which can be used to identify different

clinical features using their texture properties.

This project is an extension to a previous final year project [1] that aimed to create

a robust image-viewing platform and investigate the use of advanced image analysis

strategies for assisting clinical diagnosis. Five texture analysis algorithms were used to

generate a set of 38 textural features describing an image, and this project will

consequently extend the previous work to analyse a gold-standard data set to validate the

use of these algorithms for the classification of different texture types.


2. PATTERN RECOGNITION

This project fits into the overall subject area known as pattern recognition.

Pattern recognition is the process of discriminating between (classifying) certain

observations. For example, given a group of a thousand people we may want to discriminate

between four types of humans [2]: (a) tall and thin, (b) tall and fat, (c) short and thin, (d)

short and fat. A classification process is therefore carried out on certain features

belonging to these persons, to put them into the correct class. A good choice of features

in this case could be for example (height, weight).

In any pattern recognition problem, features must be generated from observations.

This process is called feature generation. A subset of the selected features may then be

chosen for various reasons (see section 5); the selection of this subset is called feature

reduction. These features are then used as the input for a classifier, which will assign the

original object into a corresponding class. In the previous example, the classes are the

four types of human, the observations are all the observed qualities of each human (which

are almost limitless, such as age, employment etc), and the extracted features are height

and weight, drawn from the observations.

A basic scheme for pattern recognition is given in figure 2.1.

x → Feature Generation → y → Feature Reduction → y' → Classifier → wi

Figure 2.1: A basic scheme for pattern recognition


where  x – observation vector
       y – feature vector
       y' – reduced feature vector
       wi – selected class

In the specific case of this project, the observation vector is a digital image (i.e.

the pixel values), the feature vector is a vector of textural features, and the various classes

are texture types.

2.1 Feature generation

Feature generation is the process of selecting useful features from some

observation vector that describes the original object. In the specific case of this project

features must be extracted from medical images. There are many ways to extract features

from images [3], in this project the image is described using textural features extracted by

texture analysis.


3. REVIEW OF PREVIOUS PROJECT

The previous project [1] undertaken involved the textural analysis of medical

images in order to extract textural features to aid clinical diagnosis. Various algorithms

were explored which produced a total of 38 features (see appendix A). A Graphical User

Interface (GUI) was created in MATLAB [10] to allow simple graphical and textual

viewing of the feature values, although no analysis was undertaken.

3.1 Texture Analysis

Texture is an important characteristic that can be used to identify or describe an

image [4]. In a digital image, texture describes the relationship between the intensities of

neighbouring pixels (not necessarily adjacent). Texture can be examined in two ways,

structurally and statistically. The statistical approach was used [1]. One first-order

algorithm was used, producing nine features, and four second-order algorithms produced

the remaining twenty-nine.

3.1.1 First-order algorithm

The first-order algorithm simply studies the first-order probability distribution of

the pixel intensity values. The nine features calculated are detailed in appendix A. An

example of some first-order statistics is shown in figure 3.1.


Figure 3.1: Two textural images with their histograms and some first-order statistics [1]
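A few of the histogram-derived features can be sketched as follows. This is a Python/NumPy stand-in for the project's MATLAB/C implementation, and the feature subset shown (mean, variance, entropy, energy) is illustrative rather than the full set of nine listed in appendix A:

```python
import numpy as np

def first_order_features(image, levels=256):
    """Compute a few first-order texture features from the
    grey-level histogram of an integer-valued image."""
    hist = np.bincount(image.ravel(), minlength=levels).astype(float)
    p = hist / hist.sum()                # first-order probability distribution
    g = np.arange(levels)
    mean = (g * p).sum()                 # mean grey level
    var = ((g - mean) ** 2 * p).sum()    # variance about the mean
    nz = p[p > 0]
    entropy = -(nz * np.log2(nz)).sum()  # histogram entropy
    energy = (p ** 2).sum()              # uniformity / energy
    return {"mean": mean, "variance": var,
            "entropy": entropy, "energy": energy}
```

As a sanity check, a constant image has zero variance and zero entropy, while a maximally mixed histogram maximises entropy.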

3.1.2 Second-order algorithms

Four algorithms were based on second-order statistics. In each case an

intermediate matrix describing the digital image was created, from which the features

were then calculated. The four techniques were:

1. The Neighbourhood Grey Tone Difference Matrix (NGTDM)

- This approach is based on the characteristics of the Human Visual

System (HVS). The HVS tends to measure some basic properties of

visual data such as size, colour, shape and orientation, and then

classify the textures in terms of properties such as coarseness, contrast,

roughness, directionality etc [1]. The NGTDM technique returns 5

features as detailed in appendix A.

2. The Grey Level Run Length Matrix (GLRLM)


- This approach calculates the number and length of runs of different

grey level values in the image. The calculation is performed over

various directions. The GLRLM also returns five features as detailed

in appendix A.

The next two algorithms are based on the co-occurrence matrix. The co-occurrence

matrix represents the joint probability distribution of pairs of grey level intensities. [1]

3. The Spatial Grey Level Dependence Matrix (SGLDM)

- This approach considers the probability of finding various different

pairs of pixel values over certain distances and directions. The

SGLDM returns 14 features as detailed in appendix A.

4. The Grey Level Difference Method (GLDM)

- This approach examines the differences between pixel values at fixed

separations. The GLDM returns 5 features as detailed in appendix A.

An example of some textural statistics is shown in figure 3.2.

Figure 3.2: An example of a greyscale MRI image and the graphical representation of its textural
features. See appendices A and B for feature details and code respectively.
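The co-occurrence matrix underlying the SGLDM can be sketched as follows. This is a Python/NumPy stand-in for the project's MATLAB/C implementation; a single fixed offset is assumed here, whereas the full SGLDM considers several distances and directions:

```python
import numpy as np

def cooccurrence(image, levels, offset=(0, 1)):
    """Joint probability of grey-level pairs (i, j) occurring at a
    fixed pixel offset (row step dr, column step dc)."""
    dr, dc = offset
    rows, cols = image.shape
    C = np.zeros((levels, levels))
    # count every pixel pair separated by the given offset
    for r in range(max(0, -dr), min(rows, rows - dr)):
        for c in range(max(0, -dc), min(cols, cols - dc)):
            C[image[r, c], image[r + dr, c + dc]] += 1
    return C / C.sum()   # normalise to a joint probability distribution
```

Features such as contrast are then simple sums over this matrix, e.g. contrast = Σ p(i, j)·(i − j)².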


3.2 Algorithm implementation

The algorithms were implemented using MATLAB. For both the intermediate

matrix and texture feature calculations, computation was found to be very slow. To

optimise computational time calculations were performed in the C language, which is

more suitable for these types of algorithms [1]. A GUI viewer was also created, which

allows the user to view the input image and its numerical feature values. Regions of

interest (ROI) can also be selected, allowing the user to compare textural features for up

to four different regions.


4. CLASSIFICATION

To validate the texture algorithms, textures were identified using the previously

described texture features; a classification process was therefore undertaken.

Classification is the process of categorising an object using certain features describing the

object. The features create a feature space in which all objects will lie and the aim of a

classifier is to identify the regions of the feature space taken up by each class. Thus when

a new feature vector is applied to the classifier it will be assigned to the corresponding

class according to the region in which it falls, as shown in figure 4.1.


Figure 4.1: Two-dimensional feature space containing two classes. An incoming pattern is assigned
to the class corresponding to the region it falls in.

4.1 A simple one-dimensional classification example

A simple example of classification could be character recognition [3]. In a two-

class case the classifier must differentiate between a 1 and a 0. The objects (the

numbers) are shown in figure 4.2. They have been placed on a grid so that an observation

vector can be found for each character.



Figure 4.2: The characters to be identified, 1 & 0, placed on a grid.


The observation vectors are given roughly below. The fraction of each grid cell

covered by the character gives each element of the observation vector.

x1 = [0, 0, …, 0.2, …, 0, 0]^T        x0 = [0, 0, …, 0.2, …, 0.2, 0]^T

As the 0’s generally cover more area than the 1’s, a feature that will differentiate between

the two characters could intuitively be chosen to be the total area covered by each

character. The components of each observation vector are therefore summed to give the

feature vector y.

y1 = 3.8        y0 = 9.1

A classifier can be designed by plotting a histogram of all the features obtained from a

“training” set, as shown in figure 4.3. The histogram for each class (1 and 0) is plotted on

the same axis, and a boundary can then be applied visually, a character being assigned to

a class depending on which side of the boundary its feature value falls.



Figure 4.3: A histogram of features values. Anything falling to the right hand side of the boundary
is classified as a 0, anything to the left as a 1.
This classifier is obviously not ideal, and the error regions can be seen visually as

the portions of the 1 histogram to the right of the boundary, and the portions of the 0

histogram to the left of the boundary.
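The whole example, summing the observation vector into a single area feature and thresholding it, can be sketched in a few lines. This is a Python sketch; the boundary value of 6 is illustrative, standing in for a value read off a training histogram like figure 4.3:

```python
def area_feature(observation):
    """Sum the grid-cell area fractions into one scalar feature."""
    return sum(observation)

def classify_character(observation, boundary=6.0):
    """Return 1 if the total area falls left of the boundary, 0 otherwise."""
    return 1 if area_feature(observation) < boundary else 0
```

With the feature values computed above (3.8 for the 1 and 9.1 for the 0), a boundary at 6 separates the two characters.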

4.2 Classifier design

There are various considerations that must be taken into account when

contemplating the use of a classifier [5]. A given pattern x has to be assigned to one of C

classes w1, w2,…., wc based on its feature values (x1, x2,…., xN). There is therefore an N-

dimensional feature space. The features have a density function conditioned on the

pattern class, thus a pattern vector x belonging to class wi is viewed as an observation

drawn from the class-conditional density p(x | wi).

The amount of information known about the class-conditional densities must be

considered. A parametric classifier assumes knowledge of the class-conditional densities.

However, even if the densities are not known a common approach is to estimate them

using a training set of patterns, as was highlighted in the previous simple example. That

is, a selection of feature vectors belonging to a single class are taken and an estimate is

made of the conditional density belonging to that class. There are also non-parametric


techniques that assume no prior knowledge of the classes. This is when it is impossible

to construct a classifier using training samples taken from a known class due to the

unavailability of labelled training samples. It may not even be known how many classes

there should be. In these cases cluster analysis is used to organize the training sets into

groups, or clusters, each corresponding to a class.

4.3 Classifiers

4.3.1 Statistical Approach

To use the statistical approach the class-conditional densities must be known or

estimated. The statistical approach is useful if there is an overlapping of class regions in

the feature space. A statistical classifier examines the risk involved with every

classification and attempts to measure the probability of misclassification. A well-known

statistical classifier is the Bayes classifier, which is based on Bayes’ formula from probability

theory and minimises the total expected risk; it is thus an optimal classifier.

The Bayes classifier calculates the posterior probability of a pattern being in each class,

and assigns it to the class that gives the largest probability. A simple Bayesian decision

rule in a two-class case could be the following [3]:

l(x) ≡ p(x | w1) / p(x | w2)

If l(x) > Pr(w2) / Pr(w1), choose w1; if l(x) < Pr(w2) / Pr(w1), choose w2.

where l(x) is called the likelihood ratio and p(x | wi) is the conditional density
function for class i evaluated at x.
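This rule can be sketched numerically for two one-dimensional Gaussian classes. This is a Python sketch; the class means, variances and priors below are illustrative, not taken from the project data:

```python
import math

def gaussian_pdf(x, mean, std):
    """Univariate Gaussian density evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def bayes_classify(x, params1, params2, prior1=0.5, prior2=0.5):
    """Two-class Bayes rule: compare the likelihood ratio l(x)
    against the ratio of prior probabilities."""
    l = gaussian_pdf(x, *params1) / gaussian_pdf(x, *params2)  # likelihood ratio
    return 1 if l > prior2 / prior1 else 2
```

With equal priors the decision boundary sits midway between the class means; an unequal prior shifts it toward the less probable class.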


4.3.2 Decision Functions

When the number of classes is known and the training patterns produce

geometrically separated classes, decision functions can be used to classify an incoming

pattern. For a two-class example, where two classes C1 and C2 exist in Rn and a

hyperplane d(x) = 0 separates their patterns, the decision function d(x) can be used as

a linear classifier.

d(x) > 0 ⇒ x ∈ C1

d(x) < 0 ⇒ x ∈ C2

The hyperplane d(x) = 0 is called the decision boundary. In some cases where the

classes cannot be separated by linear decision functions a nonlinear classifier can be

created using generalised decision functions, or the feature space can be transformed into

a much higher dimension where linear decision boundaries can be used. Examples of

linear and nonlinear decision functions are shown in figure 4.4.


Figure 4.4: (a) Linear decision function (b) Nonlinear decision function (both shown in red)

In general, however, there are m pattern classes {C1, C2, …, Cm} in Rn. If some
surface d(x) = 0, x ∈ Rn, separates some class Ci from the remaining Cj, j ≠ i, i.e.


d(x) > 0, x ∈ Ci

d(x) < 0, x ∈ Cj, j ≠ i

then d(x) is a decision function of Ci. This concept is illustrated in figure 4.5, which

gives an example of absolutely separable classes. Classes can also be pairwise separable,

which means that there is a possible linear decision boundary between each pair of

classes, as illustrated in figure 4.6.


Figure 4.5: Three linearly separable classes in R2, the decision boundary for a class Ci is given by di(x)


Figure 4.6: Three pairwise separable classes in R2, two decision boundaries are needed to select each
class.


4.3.3 Distance functions and clustering

If the training patterns form clusters, a distance function and clustering approach

can be used. This entails classifying an incoming pattern according to its proximity to

patterns of existing classes. Two ways of deciding to what class the incoming pattern

belongs are minimum-distance and nearest-neighbour classification [2]. Minimum

distance classification represents each class by prototype vectors, for example the class

mean. The simplest case is when the patterns of each class are very close to each other,

and each class can therefore be represented by a single prototype. If there are m pattern

classes in Rn, {C1,…,Cm}, represented by the prototype vectors y1,…,ym, then the

distances between an incoming pattern x and the prototype vectors can be defined as [2]:

Di = ||x − yi|| = ((x − yi)^T (x − yi))^(1/2),   1 ≤ i ≤ m

x will be classified as Cj for which Dj is minimum, i.e.

Dj = min_i ||x − yi||,   1 ≤ i ≤ m

This calculates the minimum Euclidean distance to the class prototypes. The

Euclidean distance however does not take into account any correlation between features,

and thus an improvement is to use the Mahalanobis distance that takes into account the

class covariance matrices. This classifier is also occasionally known as the Gaussian

classifier. Given a class mean and covariance matrix ui and Wi respectively, the distance

is defined as [6]:

Di = (x − ui)^T Wi^(−1) (x − ui) + ln|Wi|

The decision rule is therefore:

x ∈ CL when DL = min_i {Di}
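Both distance rules can be sketched together. This is a Python/NumPy sketch of the two classifiers just described (the project's own implementation was in MATLAB):

```python
import numpy as np

def euclidean_classify(x, prototypes):
    """Assign x to the class whose prototype (e.g. the class mean) is
    nearest in Euclidean distance."""
    d = [np.linalg.norm(x - y) for y in prototypes]
    return int(np.argmin(d))

def mahalanobis_classify(x, means, covs):
    """Assign x using the covariance-weighted (Mahalanobis) distance,
    including the ln|W_i| term of the Gaussian classifier."""
    d = []
    for u, W in zip(means, covs):
        diff = x - u
        d.append(diff @ np.linalg.inv(W) @ diff + np.log(np.linalg.det(W)))
    return int(np.argmin(d))
```

With identity covariance matrices the Mahalanobis rule reduces to the squared Euclidean distance, so the two classifiers agree.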


When using the minimum-distance classifier a major problem is defining the class

prototypes. This is especially a problem if classes are split into several clusters. A

measure of similarity between patterns must be used, so that “similar” patterns can be

grouped together to form clusters. Clustering algorithms usually aim to optimise some

performance index, such as the sum of the distances between each pattern and its

corresponding cluster centre. Several clustering algorithms have been developed [2],

such as the c-Means Iterative algorithm (CMI), which iteratively updates each cluster

centre by replacing it with the mean of its samples, or the ISODATA algorithm, which is

another, more complex, iterative method.
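The c-Means iteration described above can be sketched as follows. This is a Python/NumPy sketch; a fixed iteration count stands in for a proper convergence test:

```python
import numpy as np

def c_means(samples, centres, iterations=10):
    """Iteratively assign samples to their nearest cluster centre and
    replace each centre with the mean of its assigned samples."""
    centres = np.array(centres, dtype=float)
    labels = None
    for _ in range(iterations):
        # distance from every sample to every centre
        d = np.linalg.norm(samples[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)            # assign to nearest centre
        for k in range(len(centres)):        # update each centre
            if np.any(labels == k):
                centres[k] = samples[labels == k].mean(axis=0)
    return centres, labels
```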

Another classifier is the Nearest Neighbour classifier, which assigns x to the

class corresponding to its nearest neighbour in the set of sample patterns. The Nearest

Neighbour classifier can also be extended to take into account the k nearest neighbours

[2].

4.3.4 Fuzzy logic classifier

Classification can be carried out using a fuzzy logic approach. This would result

in each incoming pattern being classified to every class with varying degrees of certainty.

This approach would not be useful for this project, as definite pattern identification must

be achieved, i.e. an incoming pattern must be classified into one single class.

4.3.5 Artificial Neural Net (ANN) classifier

In order to use the ANN classifier it must be assumed that a set of training

patterns and their correct classifications is available a priori. ANNs are based on the


functionality of the human brain. The human brain is made up of many neurons

connected together by synapses. ANNs are based on the same idea; they consist of

neurons connected by “weights”. Each neuron performs an operation on its input signal

to produce its output, and the weights simply multiply the signal by a fixed value. A

form of ANN that can be used for classification is the Probabilistic Neural Network

(PNN), which can be easily implemented using the MATLAB Neural Net toolbox [10].

The PNN consists of two layers, the first calculates the distance from an input vector to

the training vectors before the second determines the probabilities of the input being in

each class. Finally the input is classified according to the maximum of these

probabilities.
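A minimal PNN-style classifier can be sketched as follows. This is a Python/NumPy stand-in for the MATLAB Neural Net toolbox implementation; Gaussian radial-basis units sharing a single spread parameter are assumed:

```python
import numpy as np

def pnn_classify(x, train_vectors, train_labels, spread=1.0):
    """First layer: radial-basis activation from x to every training
    vector. Second layer: sum activations per class, pick the maximum."""
    d2 = ((train_vectors - x) ** 2).sum(axis=1)   # squared distances
    act = np.exp(-d2 / (2.0 * spread ** 2))       # Gaussian units
    classes = np.unique(train_labels)
    scores = [act[train_labels == c].sum() for c in classes]
    return classes[int(np.argmax(scores))]
```

A small spread makes the network behave like a nearest-neighbour rule, while a large spread smooths the decision surface; this is the ‘spread’ parameter varied in sections 7.4.4 and 7.4.6.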


5. FEATURE REDUCTION

An optional step in the simple pattern recognition system is feature reduction.

Feature reduction is the process of reducing the dimensionality of the feature space. The

aim is to create a different set of features to either improve classifier performance or

reduce feature computation expense, which may be unnecessarily high due to irrelevant

features.

Feature reduction can take the form of feature extraction or feature selection.

Feature extraction is where a smaller number of new features are created from linear

combinations of the original features. The obvious drawback is that this means that the

same number of measurements must be taken in the first place. Feature selection is the

choice of a smaller subset of the original features. This can be an extremely

computationally expensive task, since in high dimensions there can be a huge number of

possible subsets. There are, however, some heuristic

methods of feature selection which, although perhaps not finding the “best” subset, will

find a reasonable subset [3]. All the methods require some form of quantifying the “best”

features, usually a measure closely related to the error rate of the resulting classifier if the

actual classifier performance cannot be evaluated.

Stepwise forward: this method first finds the single feature that maximises the

measure of “best”. Then another feature is selected which, coupled with the original,

again maximises the measure. A third feature is then chosen and this process continues

until a certain number of features have been found.


Stepwise backward: this method starts with all the features and, at each step,

removes the feature whose removal maximises (or least reduces) the measure.

Full stepwise: this method combines both the previous methods to form a method

with the properties of both.
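Stepwise forward selection can be sketched generically. This is a Python sketch in which `score` stands for any measure of subset quality (e.g. classifier accuracy or a separability criterion); the toy scoring function in the usage below is an assumption, not the project's measure:

```python
def forward_select(features, score, k):
    """Greedily grow a subset: at each step add the feature that
    maximises score(subset), until k features are chosen."""
    selected = []
    remaining = list(features)
    while len(selected) < k and remaining:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Stepwise backward selection is the mirror image, starting from the full set and greedily removing one feature per step.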

Another method [11] uses the Bhattacharyya distance to rank order the features in terms

of ‘relevance’ in separating the classes. The ranked features can then be used to create

subsets of chosen sizes. The Bhattacharyya distance for a single feature between two

classes a and b is shown below.

BD(a, b) = (1/4) ln{ (1/4)(σa²/σb² + σb²/σa² + 2) } + (1/4){ (µa − µb)² / (σa² + σb²) }

where µj and σj² are the mean and variance of the feature for class j.
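The formula reads directly into code. This is a Python sketch; the class statistics in the usage below are toy values for illustration:

```python
import math

def bhattacharyya(mu_a, var_a, mu_b, var_b):
    """Bhattacharyya distance between two classes for a single feature,
    computed from the per-class mean and variance."""
    term1 = 0.25 * math.log(0.25 * (var_a / var_b + var_b / var_a + 2.0))
    term2 = 0.25 * (mu_a - mu_b) ** 2 / (var_a + var_b)
    return term1 + term2
```

Identical class statistics give a distance of zero; features can then be rank-ordered by this value and the top-ranked ones kept as the subset.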


6. SEGMENTATION

Segmentation is the problem of separating an image into regions. The goal of

many medical image analysis applications is to separate an image into regions defined by

different clinical features. For example this may be to define clear regions in an MRI

brain scan containing white matter or grey matter. This could provide an extremely

useful tool for assisting clinical diagnosis.

In this project segmentation provides a method of validating classifier

performances. A simple intuitive method of segmentation is as follows. The image is

split into smaller blocks, and each block analysed individually, and either given a

classification as one of the classes, or designated “unknown”. The unknown blocks are

then analysed in more detail, as they are likely to contain boundaries between classes. A

progressive “zooming-in” process is then undertaken in order to find the boundary lines

between classes. The image has therefore been split into small regions each containing a

single class, and a new image could be created showing clearly the distinctions between

the classes. The performance of this segmentation process would validate the

performance of a classifier in a real-life application.
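The first stage of the block-based scheme described above can be sketched as follows (an illustrative Python/NumPy version; `classify` stands in for whichever classifier is in use, and UNKNOWN is an assumed sentinel for blocks left for finer analysis):

```python
import numpy as np

UNKNOWN = -1  # sentinel for blocks to be re-examined at a finer scale

def segment_blocks(image, block, classify):
    """Split the image into non-overlapping block x block regions and
    classify each one independently; returns a label map at block
    resolution."""
    rows, cols = image.shape[0] // block, image.shape[1] // block
    labels = np.empty((rows, cols), dtype=int)
    for r in range(rows):
        for c in range(cols):
            patch = image[r * block:(r + 1) * block, c * block:(c + 1) * block]
            labels[r, c] = classify(patch)
    return labels
```

Blocks labelled UNKNOWN would then be subdivided and the process repeated, progressively zooming in on the boundaries between classes.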


7. CLASSIFIER DESIGN AND CODING

7.1 Data Set

The Brodatz texture database [8] was chosen for the development of a suitable

classifier as it provides a benchmark set of texture images. Four separate textures were

selected to provide four distinct classes, as illustrated in figure 7.1. Clockwise from the

top-left texture, they are as follows:

D19 – Woollen cloth

D55 – Straw matting

D93 – Hide of unborn calf

D92 – Pigskin

The ‘D’ identifiers refer to page numbers in Brodatz’s original publication.

Figure 7.1: The four textures used for classifier development.


The four images, each of size 640 x 640 pixels, were used as sources for each class;
i.e. they were broken up into smaller regions which were analysed individually.
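The extraction of sub-images can be sketched as follows (an illustrative Python/NumPy version; the project's implementation was in MATLAB, and the function name is an assumption). A 640 x 640 source gives 400 non-overlapping 32 x 32 blocks; smaller block sizes yield more blocks, from which the counts in table 7.1 were evidently drawn as a subset.

```python
import numpy as np

def extract_blocks(source, size):
    """Cut a square source texture into non-overlapping size x size
    sub-images, scanning row by row."""
    blocks = [source[r:r + size, c:c + size]
              for r in range(0, source.shape[0] - size + 1, size)
              for c in range(0, source.shape[1] - size + 1, size)]
    return np.stack(blocks)
```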


7.2 Feature vector generation

Initially, various smaller images were extracted from the source images. Once the

numerous smaller images were extracted, they were divided into training and test sets, as

shown in table 7.1, so that the classifiers could be designed and tested. The number of

images extracted at each resolution is a function of (and limited by for the larger sizes)

the size of the source image.

Extracted image size (pixels) | Images extracted | Training samples | Test samples
------------------------------|------------------|------------------|-------------
8 x 8                         | 400              | 320              | 80
16 x 16                       | 400              | 320              | 80
32 x 32                       | 400              | 340              | 60
64 x 64                       | 100              | 70               | 30
128 x 128                     | 25               | 15               | 10

Table 7.1: Number of smaller images extracted from the source images at each resolution

The extracted texture images were then used to create feature vectors. Initially for

classifier development the study was carried out only on the intermediate-sized 32 x 32

pixel images. A feature vector containing values for each of the 38 texture features was

generated for each image, as was explained in section 3. The feature vectors were stored

in matrix form for ease of access; classifier development was thus carried out on two
matrices for each class: a 340 x 38 training matrix and a 60 x 38 test matrix.


7.3 Classifiers

Three classifiers were chosen for examination; the simple minimum Euclidean

distance classifier, the Gaussian Mahalanobis distance classifier, and the probabilistic

neural network classifier contained within the MATLAB Neural Network Toolbox.

Neural Network classifiers and the Gaussian classifier are commonly seen in

texture literature, and they have been shown to give good results. The Euclidean distance

classifier was chosen to provide a simple, fast alternative to the other more complex

classifiers.

7.3.1 Euclidean distance classifier

As explained in section 4.3.3 in order to implement a minimum distance classifier

there must first be a choice made of class prototypes to represent the class. An incoming

feature vector is then classified to the class represented by the nearest prototype, ‘nearest’

established using the Euclidean distance. The simplest prototype that can be used is

simply the class mean. Class prototypes can also be generated using some clustering

algorithm. As a comparison to the class means, prototypes were also created using the c-

means algorithm, and a function was also written to attempt to match the c-means cluster

centres to their best-fit class. The c-means algorithm has various modifiable attributes,

and thus produced a wide variety of different results, from this point on only the best

result will be referred to. The c-means process was undertaken using a MATLAB

toolbox function kmeans.m, which allowed for various different versions of the algorithm

to be implemented. A Euclidean distance classifier was written in MATLAB.
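A minimal sketch of such a classifier follows (Python/NumPy here rather than the MATLAB original; function names are illustrative). The prototypes are simply the class means computed from the training matrix:

```python
import numpy as np

def class_means(X, y, n_classes):
    """Class-mean prototypes: one row per class, computed from the
    training matrix X with integer labels y."""
    return np.vstack([X[y == c].mean(axis=0) for c in range(n_classes)])

def euclidean_classify(x, prototypes):
    """Assign x to the class whose prototype is nearest in the
    Euclidean sense."""
    return int(np.argmin(np.linalg.norm(prototypes - x, axis=1)))
```

The same `euclidean_classify` function works unchanged if the prototypes come from a clustering algorithm such as c-means instead of the class means.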

7.3.2 Mahalanobis classifier



The Mahalanobis classifier is another minimum distance classifier, classifying an

incoming feature vector to the nearest class, ‘nearest’ established using the Mahalanobis

distance (see section 4.3.3). The Mahalanobis distance measures the distance between a

point in space and a data set; the classifier thus needs a priori class mean and covariance
values, which are available from the training matrix. A Mahalanobis classifier

was written in MATLAB.
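A sketch of the classification rule (illustrative Python/NumPy, not the project's MATLAB code):

```python
import numpy as np

def mahalanobis_classify(x, means, covs):
    """Assign x to the class minimising the squared Mahalanobis
    distance (x - mu)^T inv(Sigma) (x - mu), where mu and Sigma are
    the a priori class mean and covariance from the training data."""
    d2 = []
    for mu, cov in zip(means, covs):
        diff = x - mu
        d2.append(float(diff @ np.linalg.inv(cov) @ diff))
    return int(np.argmin(d2))
```

Because the distance is scaled by each class's own covariance, a class with a wide spread can legitimately claim a point that lies nearer (in the Euclidean sense) to another class's mean.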

7.3.3 Neural Network classifier

The MATLAB Neural Network toolbox was used to create a probabilistic neural

network (see section 4.3.5) using the training data, which was then used for classification

purposes. The classification was then carried out by inputting a test feature vector into

the neural net, which outputted the corresponding class. The Neural Net classifier

essentially operates as a k-nearest-neighbour classifier, examining a number of local

vectors and working out the probabilities of the test vector belonging to each class. This

is achieved by measuring the Euclidean distance from the test point to each of its

neighbours. The test vector is then allocated to the class corresponding to the highest

probability.
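The behaviour described above reduces to a Parzen-window classifier with Gaussian kernels; a minimal sketch (illustrative Python/NumPy; the `spread` parameter corresponds to the toolbox's spread, discussed in section 7.4.4):

```python
import numpy as np

def pnn_classify(x, X_train, y_train, spread=1.0):
    """Sum a Gaussian kernel of width `spread` over the training
    vectors of each class and return the class with the largest
    summed response. Small spread -> nearest-neighbour behaviour;
    larger spread -> several neighbours contribute, k-NN-like."""
    d2 = np.sum((X_train - x) ** 2, axis=1)
    k = np.exp(-d2 / (2.0 * spread ** 2))
    classes = np.unique(y_train)
    scores = [k[y_train == c].sum() for c in classes]
    return int(classes[int(np.argmax(scores))])
```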

7.4 Classifier testing

7.4.1 Initial performance

As explained earlier, initial classifier testing was carried out on the 32 x 32 pixel

set of texture images with 340 training vectors for each class used to create a priori class

means and covariance matrices, and for constructing the neural net. The test vectors were


input one by one into each classifier, and each classifier's performance was measured
by the number of correct classifications it achieved. The initial results are

shown in figure 7.2, which gives the percentage of test vectors from each class that were
correctly classified, together with the average performance for each classifier.

Figure 7.2: Initial classifier performance results (% test vectors correctly identified)

In the

case of the c-means minimum distance classifier, the results shown are the best achieved

for all algorithm set-ups (see MATLAB documentation for further information) in order

to compare with the class means performance.

As can be seen in figure 7.2, the classifiers produced widely varying results. The

best performance was from the Mahalanobis classifier, which obtained an overall

classification performance of 96.25% of test vectors correctly identified. It was also seen

that the c-means algorithm provided no great advantage over simply using the class

means as prototypes. This is because the c-means algorithm is essentially trying to find

cluster centres that minimise the total distance between each cluster centre and its

members, and thus the best result that can be found is in fact the class mean. The Neural

Net classifier produced very poor results.


7.4.2 Normalisation

The feature vectors that are created come from five different texture analysis

techniques, and thus there are significant range variations between some values in a

feature vector. This is illustrated in figure 7.3, in which the range of texture feature

values for a simple texture image can be seen to span several orders of magnitude.

Figure 7.3: Graphical illustration of the normalisation process

A normalisation process was therefore carried out on all vectors used in order to

bring them all into the same range, as also illustrated in figure 7.3. The normalisation

undertaken was zscore normalisation (or standardisation), which allows for the

comparison and combination of measures made on different scales. The zscore of a

column vector x (feature in a matrix of observations) is as follows, and is measured in

units of standard deviation:

Z = (x − µ) / σ
where µ and σ are the feature mean and standard deviation.
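Applied column-wise to a feature matrix, the standardisation can be sketched as follows (an illustrative Python/NumPy version; the project's implementation was in MATLAB):

```python
import numpy as np

def zscore(X):
    """Standardise each column (feature) of X to zero mean and unit
    standard deviation, so that features measured on very different
    scales become comparable."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```

Note that the test vectors must be normalised with the same means and standard deviations as the training vectors for the comparison to remain valid.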

Figure 7.4 shows the classifier performance characteristics after normalisation.



Figure 7.4: Classifier performance after normalisation (% test vectors correctly identified)

The Euclidean distance classifiers underwent a significant improvement in

performance after normalisation, both increasing by around 10%. The performance of

the neural net classifier was completely transformed, returning a 98.75% success rate.

The Mahalanobis classifier however underwent a large decrease in performance, the

number of successful classifications almost halving. As the classifiers are working in a

38-dimensional space, it is very difficult to understand the different performances;

therefore an exploration of a 2-dimensional data set was carried out with the aim of better

understanding the classifiers.

7.4.3 Classifier performance for a 2-dimensional data set

The performance of each classifier was examined for a 2-dimensional data set, as

this allowed visual analysis of the data. Fisher’s Iris data set, contained within the

MATLAB environment, was used; it consists of measurements of various features of

150 iris specimens, 50 of each of 3 types. The 2 features selected for analysis were sepal


length and petal width. Figure 7.5 shows a graphical representation of the data, before
and after normalisation; the black dots show the class means.

Figure 7.5: Fisher’s iris data


It can be seen that the setosa type of iris is well separated from the other two

types, and thus one would expect it to classify well. The other two iris types, versicolor

and virginica, are slightly overlapping, and thus one would expect some possible

misclassifications. In this case the normalisation process has a smaller effect as each

feature is measured on the same scale to begin with; however, the means can be seen to be

slightly further apart.

The classifiers were again tested on the new data: every specimen was input to

each classifier in an attempt to find some correlation between the results given by the

classifiers and the spatial representation of the data. For the Euclidean distance classifier

the class means were used as class prototypes. The results of the classification are shown

in table 7.2. Figure 7.6 shows the results of the classification graphically. The circled

samples are those that were misclassified, the black corresponding to the Euclidean

classifier, magenta to the Mahalanobis classifier and cyan to the Neural Net classifier.


Class      | Normalisation        | Euclidean Distance | Mahalanobis Distance | Neural Net
-----------|----------------------|--------------------|----------------------|-----------
Setosa     | Before normalisation | 100                | 100                  | 100
Setosa     | After normalisation  | 100                | 100                  | 100
Versicolor | Before normalisation | 76                 | 96                   | 84
Versicolor | After normalisation  | 80                 | 96                   | 88
Virginica  | Before normalisation | 78                 | 94                   | 78
Virginica  | After normalisation  | 86                 | 94                   | 80
Average    | Before normalisation | 84.67              | 96.67                | 87.3
Average    | After normalisation  | 86                 | 96.67                | 89.33

Table 7.2: Classification of Fisher's Iris Data before and after normalisation (% correct)

Figure 7.6: Graphical representation of Fisher Data classification


As expected the setosa type classified very well, and the other classes contain

some misclassifications. Normalisation has no great spatial effect on the data; however, it

still affects the classifier performances. Again the Euclidean distance and the Neural Net

classifiers have improved their performance. This time however the Mahalanobis

classifier underwent no change whatsoever.



The Neural Network and Euclidean classifiers both improve performance as they

are using the Euclidean distance, which can be adversely affected by being measured

over distorted scales. The Mahalanobis distance, however, is measured in units

of standard deviation from the class mean, and therefore the zscore normalisation has no

effect on Gaussian classifier performance.

7.4.4 Neural Network ‘spread’ variation

When designing the Neural Network classifier, a variable “spread” can be

defined. For initial classification testing, spread was set to one; however, the value of

spread affects the performance of the classifier. When spread is set to around 0, the

classifier behaves like a nearest-neighbour classifier. As spread increases, the classifier

takes into account several neighbouring vectors, and thus becomes a k-nearest-neighbour

classifier. Figure 7.7 shows the average Neural Net classifier performance for values of

spread varying from 0.1 to 2. It can be seen that the Neural Net classifier performed best

for 32 x 32 pixel images when the spread was between 0.8 and 0.9.

Figure 7.7: Neural Net classifier performance for different spreads for 32 x 32 pixel images
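A sweep over spread values like the one in figure 7.7 can be sketched as follows (self-contained illustrative Python/NumPy; `pnn_predict` is an assumed Parzen-window stand-in for the toolbox network):

```python
import numpy as np

def pnn_predict(x, X_train, y_train, spread):
    """Parzen-window class decision with Gaussian kernel width `spread`."""
    k = np.exp(-np.sum((X_train - x) ** 2, axis=1) / (2.0 * spread ** 2))
    classes = np.unique(y_train)
    return classes[int(np.argmax([k[y_train == c].sum() for c in classes]))]

def sweep_spread(X_train, y_train, X_test, y_test, spreads):
    """Test-set accuracy at each candidate spread value."""
    accs = []
    for s in spreads:
        preds = np.array([pnn_predict(x, X_train, y_train, s) for x in X_test])
        accs.append(float(np.mean(preds == y_test)))
    return accs
```

Plotting the returned accuracies against the spread values reproduces the kind of characteristic shown in the figure, with a peak at some intermediate spread.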


7.4.5 Classifier performance at different sizes

The classifier performances were also tested at image sizes other than 32 x 32

pixels. The results of the classifications are shown in figure 7.8. Neural Net and

Euclidean classifier performances were evaluated under normalised conditions, while the

Mahalanobis classifier was evaluated under non-normalised conditions. It must be noted

that evaluations at 64 x 64 and 128 x 128 pixel images were carried out on reduced data

sets.

Figure 7.8: Average classifier performance over all four classes at different image sizes

All three classifier performances dropped significantly at the smallest image size,

8 x 8 pixels. This is because it is hard to extract significant texture information from such

a small region. In general, all three classifiers performed best at 32 x 32 pixels, with the

performance characteristics generally dropping off to either side. The Neural Net

classifier returned overall the best performance. The general dip in performance at the

sizes greater than 32 x 32 can be attributed to the smaller data sets, and thus reduced

training sets available for classifier construction.


7.4.6 Neural Network “spread” variation at different image sizes

The variation of the Neural Net classifier’s performance according to spread was also

examined for the various image sizes. Figure 7.9 shows the performance of the classifier

at spread values between 0.1 and 2.

Figure 7.9: Neural Net Classifier performance at varying values of spread


It can be seen that the classifier’s performance varied quite considerably with spread, and

at all sizes the performance had a peak somewhere between 0.9 and 1.1. The figure also

illustrates very well the different performances at different image sizes, with the best

overall performance again being seen at the 32 x 32 image size.


8. SEGMENTATION OF A TEXTURE IMAGE

With the classifiers having now been designed and tested, an image made up from

the four previous Brodatz textures was segmented to verify the classifier performance

results seen previously and as a precursor to attempting to segment medical images.

Figure 8.1 shows the image created by combining sections of the source image,

and the classes that each section belongs to.

Figure 8.1: Left: combination of the four Brodatz source textures.


Right: outline of combination image, showing classes.
A segmentation process was coded in MATLAB, simply selecting a region of

interest (ROI), classifying the region, and creating a new image using a different colour

for each identified class. The combination image was created using 32 x 32 pixel blocks

taken from the source images, and as such performance using an ROI greater than 32 x

32 pixels would be expected to be very poor.

Figure 8.2 shows the segmented images created using a ROI of 8, 16 and 32

pixels.


Figure 8.2: Segmented Brodatz texture combination image


The segmentation results confirmed the classifier performance results seen

previously. Using an 8 x 8 ROI the results were again generally quite poor, with a high

occurrence of misclassifications across all classifiers. Increasing the ROI size improved

the segmentation for all classifiers, and at 32 x 32 the Mahalanobis and Neural Net

classifiers segmented the image faultlessly.


9. FEATURE SUBSET SELECTION

As explained in section 5, feature reduction can be undertaken to reduce the

number of features used by the classifier. Calculating the 38 texture features was a very

time-consuming process, and therefore it was decided to investigate feature selection as a

means to reduce the amount of computation needed to create the input vectors for the

classifiers. As explained previously finding the optimum subset is an extremely onerous

task; thus techniques to find sub-optimal subsets were investigated. Again, for

investigation purposes the 32 x 32 data set was used.

Firstly the rank ordering of the features using the Bhattacharyya distance was

carried out (see section 5). The resulting classifier performances for normalised and non-

normalised data are shown in figure 9.1.

Figure 9.1: Classifier performances using subsets selected using the Bhattacharyya distance

The results showed that a vastly reduced subset of features can be used with no
great loss in performance, and in some cases even an increase. The

original 38-feature performances can be seen at the extreme right of each graph. It is


interesting to note that the Neural Net classifier achieved good performance using only 2

features of non-normalised data, and thereafter fell off to its usual poor performance. All

3 classifiers generally showed consistent performances across a wide range of subset

sizes for non-normalised data; for normalised data, however, subset size considerably
affected the performances. The Mahalanobis classifier also achieved good

performances with normalised data, unlike previously, which suggests that there were

only a few features that were significantly affecting its performance.

Performance at reduced subsets was also examined using the stepwise procedures

to select the subsets (see section 5). Both forward and backward algorithms were

investigated using both normalised and non-normalised data and using all 3 classifiers as

performance indicators. Figure 9.2 shows the results obtained using these procedures.

Figure 9.2: Classifier performances using subsets selected using forward and backward selection
procedures.

The results again showed that excellent performance could be achieved using

reduced subsets. The graphs show forward and backward selection results using each

classifier as the measure of performance. Moving from left to right across the graph the

forward algorithm adds a feature and the backward algorithm removes one. Thus at the

very left side of the graph, the forward subset contains one feature, and the backward

subset contains 37 features (one has been removed). Likewise, at the extreme right of


each plot the forward subset contains 38 features (showing original performances) and

the backward subset contains one feature (not necessarily the same feature as the forward

algorithm).

Again it was seen that normalisation has very little effect on the Mahalanobis

classifier, but for the others it resulted in a marked improvement in performance. It is

interesting to note the reverse characteristics of the forward and backward algorithms, for

example for the Euclidean classifier using normalised data the backward characteristic is

almost a mirror image of the forward characteristic.

The performance of each texture algorithm was also examined, thus producing 5

reduced subsets of varying sizes, as shown in table 9.1. These performances are shown in

figure 9.3.

Algorithm Size of feature subset

First order 9

NGTDM 5

GLDM 5

GLRLM 5

SGLDM 14

Table 9.1: Feature subset sizes created by using only individual texture algorithms

Figure 9.3: Classifier performances for each texture algorithm.



It can be seen that the GLRLM algorithm is on average the best performing for

both normalised and non-normalised data; it is also the least affected by normalisation.

These results again show that a much-reduced subset can be used for good classifier

performances.


10. CONCLUSION & FURTHER WORK

A set of classifiers was developed and used to validate the five texture
algorithms for texture identification. Results showed that the algorithms generate

features that can be used to classify images by texture, and a texture combination image

was successfully segmented.

The generation of the features was found to be time-consuming, therefore feature

reduction was examined, and good classification was achieved using reduced subsets of

the original 38 features.

As the algorithms have been successfully used to differentiate between different

textures, the next step of the project is to use the algorithms to identify clinical features in

medical images. This will be investigated as a double-diploma project extension by

examining the segmentation of MRI images.


REFERENCES

1. Anas Zirari, “Design of a medical image analysis platform”, Final Year

Project 2002/2003 , University of Strathclyde

2. Menahem Friedman, Abraham Kandel “Introduction to pattern recognition:

statistical, structural, neural and fuzzy logic approaches.” Imperial College

Press 1999.

3. Charles W. Therrien, “Decision estimation and classification: an introduction

to pattern recognition and related topics.” John Wiley & Sons, Inc, 1989.

4. Robert M. Haralick, K. Shanmugam, Its’hak Dinstein, “Textural Features for

Image Classification.” IEEE Transactions on Systems, Man and Cybernetics,

Vol.3, No. 6, Nov 1973.

5. Anil K. Jain, “Advances in statistical pattern recognition”, NATO ASI Series,

Vol F30. “Pattern Recognition Theory and Applications”, edited by P.A.

Devijver and J. Kittler, Springer-Verlag Berlin Heidelberg, 1987.

6. Xiaoou Tang, “Texture Information in Run-Length Matrices”, IEEE

Transactions On Image Processing, Vol. 7, No. 11, Nov 1998.

7. http://www.nd.com/welcome/whatisnn.htm

8. P. Brodatz. “Textures: A Photographic Album for Artists & Designers.” New

York: Dover, 1966.

9. http://www.nlm.nih.gov/research/visible/visible_human.html

10. http://www.mathworks.com

11. Abhir Bhalerao and Constantino Carlos Reyes-Aldasoro, “Volumetric Texture

Description and Discriminant Feature Selection for MRI”, ??????


APPENDICES

Appendix A – Texture features

First-order - mean f1

- variance f2

- skew f3

- kurtosis f4

- energy f5

- coarseness f6

- entropy f7

- median f8

- mode f9

NGTDM - coarseness f10

- contrast f11

- busyness f12

- complexity f13

- texture strength f14

GLDM - contrast f15

- energy f16

- entropy f17

- mean f18

- homogeneity/inverse difference moment f19


GLRLM - short run emphasis f20

- long run emphasis f21

- grey level distribution f22

- run length distribution f23

- run percentage f24

SGLDM - contrast f25

- energy f26

- homogeneity f27

- correlation f28

- entropy f29

- sum of squares variance f30

- sum average f31

- sum variance f32

- sum entropy f33

- difference variance f34

- difference entropy f35

- information measure of correlation 1 f36

- information measure of correlation 2 f37

- maximal correlation coefficient f38


Appendix B – MATLAB code

