
Exploring the Average Information Parameters over Lung Cancer for Analysis and Diagnosis

Vaishnaw G. Kale and Vandana B. Malode

Abstract Lung cancer is a very common cause of death among people all over the world, and accurate detection of lung cancer increases patients' chances of survival. A major problem in treatment is the time taken by several physical diagnostic procedures, which increases the risk of death; the approach presented here is intended to help physicians make more accurate decisions. This paper proposes a method for lung cancer analysis based on average information statistical parameters computed with image processing. Its basic aim is to help physicians decide on the possibility of lung cancer. Image averaging is a digital image processing technique mostly used to improve the quality of images degraded by random noise. The average information parameters are among the statistical parameters used for lung cancer analysis; those considered in this paper are Entropy, Standard Deviation, Mean, Variance, and Mean Square Error (MSE). The parameters are selected on the basis of repeated iterations of the algorithm over the lung images. The null hypothesis is also rejected using analysis of variance (ANOVA). The images are microscopic lung images, and the algorithm is implemented in MATLAB.

Keywords Average information ⋅ Statistical parameters ⋅ Lung cancer ⋅ ANN ⋅ ANOVA

V. G. Kale (✉)
Department of Electronics & Telecommunication, Dr. Vithalrao Vikhe Patil College
of Engineering, Ahmednagar 414111, Maharashtra, India
e-mail: vaishnaw25@rediffmail.com
V. B. Malode
Department of Electronics & Telecommunication, Jawaharlal Nehru Engineering College,
Aurangabad 431003, Maharashtra, India
e-mail: vandana_malode@yahoo.co.in

© Springer Nature Singapore Pte Ltd. 2019
H. S. Behera et al. (eds.), Computational Intelligence in Data Mining,
Advances in Intelligent Systems and Computing 711,
https://doi.org/10.1007/978-981-10-8055-5_54

1 Introduction

The lungs are responsible for respiration in the human body. During normal growth, lung cells divide and reproduce at a controlled rate to repair damaged tissue. Lung cancer [1, 2] develops when cells inside the lungs multiply at an uncontrolled rate, and this abnormal lung tissue leads to cancer. Many imaging techniques [1, 3] are available to radiologists and physicians for diagnosing lung cancer, such as X-ray, Computed Tomography (CT), high-resolution CT, Magnetic Resonance Imaging (MRI), and Positron Emission Tomography (PET). However, each technique has its own advantages and shortcomings, none gives complete assurance about lung cancer, and the patient's case history also becomes important at the time of decision. Hence, a method is needed that helps radiologists reach an accurate result. Besides these imaging techniques, lung biopsy [4] is also used for lung cancer diagnosis. Medical imaging techniques show whether cancer has spread within the lungs, but they are not accurate enough to diagnose lung cancer. In a biopsy, a small amount of lung tissue is removed for examination under an electron microscope. Alongside biopsies and surgical operations, imaging techniques are very important in lung cancer analysis. However, no test is ideal, and no scan alone can diagnose lung cancer, whereas a biopsy can. Biopsy in turn has drawbacks, including difficulty in breathing, excessive bleeding, and oozing, and there is always a chance that removing a small piece of tissue spreads cancer cells within the lungs and to other parts of the body. It is therefore considered the last option for cancer diagnosis and is often suggested only when no other scan works.
The statistical analysis here uses microscopic lung images obtained by biopsy and viewed under an electron microscope [5], a powerful microscope that allows researchers to view lung specimens at the nanoscale. A small piece of lung tissue is embedded in paraffin, cut into thin sections, placed on a glass slide, and treated with a reagent for microscopic examination. The resulting preparations are examined under the microscope for lung cancer analysis, and the images obtained through this process are called microscopic lung images (Fig. 1). These images can be magnified 400 times or more, which is very useful for medical analysis. Visually interpreting microscopic images and taking decisions from them is very difficult and may go wrong in a number of cases, so a robust method is required. Image processing with MATLAB is very useful for handling microscopic lung images.

Fig. 1 Microscopic lung image

2 Methodology

The methodology extends the algorithm of [6], which used Entropy, Standard Deviation, and a texture factor to differentiate lung cancer from other lung diseases and to analyze lung cancer. The present method adds more parameters to the analysis to improve performance and concentrates only on lung cancer analysis and diagnosis. Understanding the methodology requires understanding the flow of the image processing algorithm and the parameters included in the average information method. The parameters of this method are selected because each is computed on an averaging principle; the selection may vary from method to method depending on the application. The statistical parameters used here are Entropy, Mean, Variance, Standard Deviation, and Mean Square Error. The input is a microscopic lung image, which is first normalized by resizing and then converted to grayscale. The quality of these images has been tested and verified, and the images have been correctly labeled as cancerous or noncancerous microscopic lung images. Each image is resized to 255 × 255 pixels, a size maintained throughout the implementation. A median filter is used for denoising, as it is one of the best filters for such medical images because of the nonlinear nature of the noise. Because these images vary widely in pixel intensity and are not ideal for processing, histogram equalization is applied for enhancement. The image is then ready for further processing: computing the average information parameters, finding similarities through correlation, and finally classification by the image classifier. The average information method is the statistical analysis carried out for lung cancer analysis, and together with the image classifier it is used for lung cancer diagnosis.
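The preprocessing chain described above (resize to 255 × 255, grayscale conversion, median filtering, histogram equalization) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB code: images are assumed to be 2-D lists of (R, G, B) tuples, resizing uses nearest-neighbour sampling, the median filter uses a hypothetical 3 × 3 window with replicated borders, and equalization uses the classic cumulative-histogram mapping (MATLAB's `histeq` by default matches a flat 64-bin target histogram, so its output differs slightly). All function names are hypothetical.

```python
def resize_nearest(img, out_h=255, out_w=255):
    """Nearest-neighbour resize of a 2-D list (pixels may be tuples or ints)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def to_gray(rgb):
    """Weighted luminance 0.2989 R + 0.5870 G + 0.1140 B, rounded to 0..255."""
    return [[min(255, round(0.2989 * r + 0.5870 * g + 0.1140 * b)) for (r, g, b) in row]
            for row in rgb]


def median_filter(img, k=3):
    """k x k median filter (k odd); border pixels are replicated outward."""
    h, w, half = len(img), len(img[0]), k // 2
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            window = sorted(
                img[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
                for dr in range(-half, half + 1)
                for dc in range(-half, half + 1)
            )
            row.append(window[len(window) // 2])  # middle of k*k sorted values
        out.append(row)
    return out


def histogram_equalize(img, levels=256):
    """Map each gray level through the normalized cumulative histogram."""
    n = len(img) * len(img[0])
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if cdf_min == n:  # constant image: nothing to stretch
        return [row[:] for row in img]
    lut = [round(max(c - cdf_min, 0) / (n - cdf_min) * (levels - 1)) for c in cdf]
    return [[lut[v] for v in row] for row in img]


def preprocess(rgb):
    """Resize -> grayscale -> median filter -> histogram equalization."""
    return histogram_equalize(median_filter(to_gray(resize_nearest(rgb))))
```

The median filter is chosen over a mean filter here because an isolated impulse (a single very bright pixel) is discarded entirely rather than smeared into its neighbours.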

2.1 Statistical Analysis

Structural and statistical analysis are the concerns of this method. Because the image to be processed is a microscopic lung image, statistical analysis is used, as it can reveal the important information in the image. Statistical analysis [6, 7] is the analysis of random data. It does not try to understand the structure of the image; instead, it describes the non-deterministic properties that govern the distribution of, and relationships between, the gray levels of the image. In this paper, the random data are the random patterns of lung cancer, and analyzing such data requires analyzing their statistical properties. Because no specific tools are available to process this random data, statistical analysis is one of the best solutions for lung cancer analysis and diagnosis. Statistical methods have been shown to discriminate structure or disorder effectively in biomedical images. This paper considers several important statistical and mathematical parameters, which are analyzed over cancerous microscopic lung images to obtain an appropriate range for lung cancer analysis. The identified range is obtained through a number of iterations over the specific image database only.

2.1.1 Average Information Method

The method is based on averaging the intensity values at each pixel position in the image. Each scanned image has two components: a constant signal component and a random noise component. In the averaging process, the signal component remains unchanged, whereas the noise component varies from frame to frame. Because the noise is random, it tends to cancel out in the summation, so the signal component dominates the averaged image. All the statistical parameters of this method are selected on the same principle, and studying them helps analyze both cancerous and noncancerous lung images. The statistical parameters considered under average information are as follows:
(i) Entropy
Entropy is the average information content of the image. Its lowest value, zero, means no uncertainty: each term P log P vanishes when an event is certain (P = 1) or impossible (P = 0, taking 0 log 0 = 0), so E = 0 when the outcome is certain. Entropy is expected to be high throughout the image [6–8] and is calculated from Eq. (1).

$$E = -\sum_{x}^{m}\sum_{y}^{n} P[x, y] \log P[x, y] \qquad (1)$$

(ii) Mean
The mean is the average gray level of the image [6–8] and is the most important and basic of all statistical measures. It is calculated from Eq. (2), where P[x, y] is the gray level of the pixel at (x, y) in an M × N image.

$$\mu = \frac{1}{M N}\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} P[x, y] \qquad (2)$$

(iii) Variance
Variance [6–8] describes how gray levels are distributed over the image; its value is expected to be high if the gray levels are widely spread. It is calculated from Eq. (3), where W is the m × n neighborhood centered on pixel (x, y) and g(r, c) are the gray levels within it.

$$\sigma^2(x, y) = \frac{1}{mn-1}\sum_{(r,c)\in W}\left[g(r,c) - \frac{1}{mn}\sum_{(r,c)\in W} g(r,c)\right]^2 \qquad (3)$$

(iv) Standard Deviation
Standard deviation measures how much the gray levels vary from their average value and therefore measures the variability in the image. It is the square root of the variance [6–8]; it is calculated over the window W from Eq. (4), and its value is assigned to the center pixel of the window.
$$\sigma(x, y) = \sqrt{\frac{1}{mn-1}\sum_{(r,c)\in W}\left[g(r,c) - \frac{1}{mn}\sum_{(r,c)\in W} g(r,c)\right]^2} \qquad (4)$$

(v) Mean Square Error (MSE)
MSE is the average of the squared errors between two images [9], where the error is the amount by which the reference image differs from the test image at each pixel. It is an image quality measure. The mathematical expression for MSE, for an m × n test image f and reference image g, is given in Eq. (5).

$$\mathrm{MSE} = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left\| f(i,j) - g(i,j) \right\|^2 \qquad (5)$$
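The five parameters of Eqs. (1)–(5) can be computed together as sketched below. This is an illustrative sketch, not the paper's implementation. It assumes integer gray levels in 0..255, evaluates Eq. (1) in its usual histogram form (probabilities of gray levels, base-2 logarithm, as MATLAB's `entropy` does over a 256-bin histogram), and takes the window W of Eqs. (3)–(4) to be the whole image. The function name and dictionary keys are hypothetical.

```python
import math


def average_information_parameters(img, ref, levels=256):
    """Entropy, mean, variance, standard deviation and MSE of a grayscale image.

    img, ref: equal-sized 2-D lists of integer gray levels in [0, levels - 1];
    ref plays the role of the reference (healthy) image in Eq. (5).
    """
    pixels = [v for row in img for v in row]
    ref_pixels = [v for row in ref for v in row]
    n = len(pixels)
    if n < 2 or len(ref_pixels) != n:
        raise ValueError("need two equal-sized images with at least 2 pixels")

    # Eq. (1), histogram form: p_k = fraction of pixels at gray level k, log base 2.
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    entropy = -sum((c / n) * math.log2(c / n) for c in hist if c > 0)

    # Eq. (2): mean gray level over all pixels.
    mean = sum(pixels) / n

    # Eqs. (3)-(4) with W = whole image: sample variance (n - 1 denominator).
    variance = sum((v - mean) ** 2 for v in pixels) / (n - 1)
    std = math.sqrt(variance)

    # Eq. (5): average squared difference between test and reference pixels.
    mse = sum((a - b) ** 2 for a, b in zip(pixels, ref_pixels)) / n

    return {"entropy": entropy, "mean": mean, "variance": variance,
            "std": std, "mse": mse}
```

For example, a 2 × 2 image with two pixels at 0 and two at 255 has an entropy of exactly 1 bit, since it contains two equally likely gray levels.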

2.2 Correlation

Correlation is also a statistical technique; it shows how strongly variables are related to each other and extracts the necessary information from an image. In template matching, it is used to find the location in an image that is most similar to a reference image: the reference (template) is slid over the image, and the location where it aligns with the most similar values gives the highest correlation. As a texture measure, correlation quantifies the linear dependence of gray levels between pixels at specified positions relative to each other [10].

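$$\text{correlation} = \frac{\sum_{i=0}^{G-1}\sum_{j=0}^{G-1} (i \times j)\, P(i, j) - \mu_x \mu_y}{\sigma_x \sigma_y} \qquad (6)$$
where P(i, j) is the normalized gray-level co-occurrence matrix with G gray levels, and μx, μy and σx, σy are the means and standard deviations of its row and column marginal distributions.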

Using Eq. (6), a correlation is calculated between the parameter values obtained by the average information method and the reference parameters of noncancerous lung images, and the result is passed to the image classifier for lung cancer diagnosis. Correlation analysis helps in understanding the image data because it measures the similarity between two images.
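Eq. (6) can be evaluated from a normalized gray-level co-occurrence matrix (GLCM), as in the sketch below. It assumes a single hypothetical pixel offset (by default one pixel to the right) and a non-symmetric GLCM stored sparsely as a dictionary of gray-level pairs, which gives the same sums as a full G × G matrix. MATLAB's `graycomatrix` and `graycoprops` compute the equivalent quantity with their own defaults (for example, quantization to 8 gray levels), so values will not match exactly. The function name is hypothetical.

```python
import math
from collections import Counter


def glcm_correlation(img, dr=0, dc=1):
    """Correlation of Eq. (6) from the normalized GLCM for offset (dr, dc)."""
    rows, cols = len(img), len(img[0])
    pairs = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                pairs[(img[r][c], img[r2][c2])] += 1  # co-occurrence count
    total = sum(pairs.values())
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    p = {pair: count / total for pair, count in pairs.items()}  # P(i, j)

    # Means and variances of the row (i) and column (j) marginals.
    mu_x = sum(i * v for (i, _), v in p.items())
    mu_y = sum(j * v for (_, j), v in p.items())
    var_x = sum((i - mu_x) ** 2 * v for (i, _), v in p.items())
    var_y = sum((j - mu_y) ** 2 * v for (_, j), v in p.items())
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant marginal")

    sum_ij = sum(i * j * v for (i, j), v in p.items())
    return (sum_ij - mu_x * mu_y) / math.sqrt(var_x * var_y)
```

An image whose rows are each a single gray level gives +1 for the horizontal offset, because every pixel pair has equal values, while alternating vertical stripes give −1.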

2.3 Image Classifier

A neural network [11, 12] is used as the image classifier for lung cancer diagnosis in this paper. The values of the statistical parameters under the average information method overlap for cancerous and noncancerous microscopic lung images, which makes it difficult to decide whether an image is infected; the neural network is therefore used as the decision maker for lung cancer diagnosis. The training data provided to the artificial neural network (ANN) are input–output pairs: the inputs are the parameter values obtained through the algorithm, and the outputs are the desired diagnoses. From these pairs, the ANN builds a network that generalizes to new, unseen cases not present in the training data. Some parameter values of cancerous and noncancerous lung images fall outside the calculated range, and the ANN is also used to resolve these cases.
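The paper does not give its network configuration, so the sketch below uses a much-simplified stand-in for the ANN: a single sigmoid neuron (logistic regression, the smallest feed-forward network) trained by batch gradient descent on five-element parameter vectors. The z-score scaling, learning rate, and epoch count are assumptions, and all function names are hypothetical; a practical classifier would use a multilayer network trained on many labeled images.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardize(rows):
    """Z-score each feature column; returns scaled rows plus the means and stds."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [scale(r, means, stds) for r in rows], means, stds


def scale(row, means, stds):
    """Apply training-set scaling to a new parameter vector."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def train_neuron(x, y, epochs=2000, lr=0.1):
    """Batch gradient descent on cross-entropy loss; y holds 1 (cancer) or 0."""
    w, b, n = [0.0] * len(x[0]), 0.0, len(x)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(x, y):
            err = sigmoid(sum(wk * xk for wk, xk in zip(w, xi)) + b) - yi
            grad_w = [g + err * xk for g, xk in zip(grad_w, xi)]
            grad_b += err
        w = [wk - lr * g / n for wk, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, xi):
    """Return 1 (cancerous) if the neuron's output is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wk * xk for wk, xk in zip(w, xi)) + b) >= 0.5 else 0
```

New images must be scaled with the training means and standard deviations (`scale`) before calling `predict`, otherwise the learned weights do not apply.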

2.4 Standard Statistical Method

Various standard statistical methods are used in image processing [13]. This paper uses analysis of variance (ANOVA) [14, 15], a collection of statistical models and associated procedures (such as "variation" among and between groups) used to analyze the differences among group means, developed by the statistician and evolutionary biologist Ronald Fisher. In the ANOVA setting, the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether the means of several groups are equal, and it therefore generalizes the t-test to more than two groups. ANOVA is useful for testing three or more means (groups or variables) for statistical significance.
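A one-way ANOVA reduces to a few sums, sketched below for illustration (the function name is hypothetical). The F statistic is the ratio of the between-group mean square to the within-group mean square, that is, each sum of squares divided by its degrees of freedom. In practice, `scipy.stats.f_oneway` returns the same F statistic together with a p-value, and `scipy.stats.f.ppf` gives critical values of the F distribution.

```python
def one_way_anova(groups):
    """Return (SSB, SSW, F, df_between, df_within) for a list of groups."""
    k = len(groups)
    values = [v for g in groups for v in g]
    n_total = len(values)
    if k < 2 or n_total <= k:
        raise ValueError("need at least 2 groups and more observations than groups")
    grand_mean = sum(values) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of the group means around the grand mean.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of each value around its own group mean.
    ssw = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ssb / df_between) / (ssw / df_within)  # MSB / MSW
    return ssb, ssw, f_stat, df_between, df_within
```

For the groups [1, 2, 3], [4, 5, 6], and [7, 8, 9], SSB = 54 and SSW = 6 (summing to the total sum of squares, 60), and F(2, 6) = (54/2)/(6/6) = 27.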

3 Proposed Method

The proposed method is implemented as an image processing algorithm in MATLAB. The flow of the algorithm is as follows:
1. The input (test) image is a microscopic lung image.
2. The reference image is a healthy lung image.
3. The images in the database are pre-verified by radiological experts as cancerous or noncancerous lung images.
4. The image is preprocessed by resizing it to 255 × 255 and converting it to grayscale, then denoised with a median filter and enhanced by histogram equalization.
5. The enhanced image is passed through the identified statistical parameters of the average information method.
6. The statistical parameters of the test image and the reference image are correlated using the correlation method.
7. Similarities between the two images are identified, but some values still fall outside the calculated statistical parameter ranges.
8. A neural network is used as the decision maker, classifying the test image as a cancerous or noncancerous lung image.
9. The ANN is trained and tested on the images for lung cancer diagnosis.
10. Increasing the image database and the number of input parameters has led to improved results.
11. A hypothesis test is also carried out using analysis of variance (ANOVA), one of the standard statistical methods.

4 Results and Discussion

This section presents the actual results of the average information method for lung cancer analysis, verified through a standard statistical method.

4.1 Results of Average Information Method

Statistics involves a discrete set of data, characterized here by Entropy, Mean, Variance, Standard Deviation, and MSE. The average information method is applied to predetermined cancer-infected microscopic lung images, and from these calculations a specific range of each average information parameter has been identified for the analysis and diagnosis of lung cancer. The identified range of each parameter for cancerous lungs is shown in Table 1.
In Table 1, the average range is the midpoint of the minimum-to-maximum interval, while the last two columns give the parameter values calculated for the current image, read from the graphs for cancerous and noncancerous lungs. The image database was increased to 323 microscopic lung images, both cancerous and noncancerous, all pre-verified by radiological experts, and this paper uses more average information parameters than [6] in order to increase the accuracy of the method. The statistical parameter ranges in Table 1 were obtained by testing the algorithm on this enlarged database.
In this paper, Mean, MSE, and Variance are added as input parameters because a few parameters alone are not enough to reach a decision. The specific range is calculated for all the parameters over the cancerous lung images. When a new image is tested and the values of its parameters lie within the ranges in Table 1, the decision regarding cancerous or noncancerous is taken directly. If some of the values overlap, however, the final decision is taken by the ANN, based on how many parameters lie within the range for cancerous lung images.
The input parameter range widens as the number of iterations increases with the size of the image database, and this wider range helps improve the performance of the algorithm. The identified range is then used for automatic run-time analysis and diagnosis of lung cancer, that is, without any manual intervention.
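The range check described above can be sketched as follows, using the minimum-to-maximum values for cancerous lungs from Table 1. The three-way verdict is a hypothetical rule written for illustration (all parameters in range gives "cancerous", none gives "noncancerous", anything in between is referred to the ANN); the paper does not specify the exact thresholds, and the function and key names are also hypothetical.

```python
# Minimum-to-maximum ranges for cancerous lungs, copied from Table 1.
CANCER_RANGES = {
    "mean": (102.0, 123.0),
    "std": (50.0, 59.0),
    "mse": (55.0, 140.0),
    "variance": (2460.0, 3640.0),
    "entropy": (7.02, 7.62),
}


def range_vote(params, ranges=CANCER_RANGES):
    """List parameters inside the cancerous range and give a provisional verdict.

    Hypothetical rule: all in range -> "cancerous", none -> "noncancerous",
    otherwise the overlapping case is referred to the ANN.
    """
    inside = [name for name, (lo, hi) in ranges.items() if lo <= params[name] <= hi]
    if len(inside) == len(ranges):
        verdict = "cancerous"
    elif not inside:
        verdict = "noncancerous"
    else:
        verdict = "refer to ANN"
    return inside, verdict
```

Applied to cancer image 3 of Table 3, only Standard Deviation (52.813) and Entropy (7.56) fall inside their ranges, which illustrates the overlap that the ANN has to resolve; noncancer image 1 of Table 2 has no parameter inside any range.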
Tables 2 and 3 show the parameter values calculated for some noncancerous and cancerous microscopic lung images, respectively. When these calculations are carried out by applying the algorithm over the large image database, a range is obtained for every parameter of the average information method. This range helps to classify an image as cancerous or noncancerous. This is not as easy as it looks, however, because some parameter values overlap and appear similar for both cancerous and noncancerous lung images; this confusion is resolved by the artificial neural network, which trains and tests on the images over a number of iterations.

Table 1 Identified range of average information parameters for cancerous lung images
Average information parameter | Minimum to maximum value for cancerous lung | Average range | Range for cancerous lung from graph | Range for noncancerous lung from graph
Mean | 102–123 | 112.50 | 130.96 | 226.37
Standard deviation | 50–59 | 54.50 | 58.60 | 49.64
Mean square error | 55–140 | 97.50 | 155.085 | 232.947
Variance | 2.46 × 10³–3.64 × 10³ | 3.05 × 10³ | 3.306 × 10³ | 2.383 × 10³
Entropy | 7.02–7.62 | 7.32 | 7.76 | 5.68

Table 2 Parameter values for noncancerous lung images
Average information parameter | Noncancer image 1 | Noncancer image 2 | Noncancer image 3 | Noncancer image 4 | Noncancer image 5
Mean | 187.36 | 222.52 | 217.89 | 158.44 | 186.78
Standard deviation | 40.08 | 30.05 | 36.75 | 40.03 | 45.91
Mean square error | 235.92 | 244.05 | 176.21 | 228.7 | 199.01
Variance | 1.567 × 10³ | 866.20 | 1.121 × 10³ | 1.540 × 10³ | 2.072 × 10³
Entropy | 6.12 | 6.39 | 7.00 | 6.24 | 7.28

Table 3 Parameter values for cancerous lung images
Average information parameter | Cancer image 1 | Cancer image 2 | Cancer image 3 | Cancer image 4 | Cancer image 5
Mean | 186.783 | 132.90 | 159.83 | 169.28 | 140.52
Standard deviation | 32.68 | 48.554 | 52.813 | 51.21 | 43.69
Mean square error | 220.78 | 118.89 | 162.88 | 179.72 | 136.92
Variance | 1.024 × 10³ | 2.308 × 10³ | 2.426 × 10³ | 2.465 × 10³ | 1.841 × 10³
Entropy | 5.96 | 7.51 | 7.56 | 7.48 | 7.47

Fig. 2 Comparative analysis graph for (a) noncancerous and (b) cancerous lung

To calculate the accuracy of the algorithm, 323 images are tested. Accuracy is based on how many images are correctly diagnosed as cancerous or noncancerous: because the image database has already been verified, the algorithm's output is compared with the pre-diagnosis results. The average information method correctly diagnoses 221 of the 323 microscopic lung images, cancerous and noncancerous together, giving an accuracy of 68.42%.
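The reported accuracy follows directly from these counts:

$$\text{Accuracy} = \frac{\text{correctly diagnosed images}}{\text{total images}} \times 100\% = \frac{221}{323} \times 100\% \approx 68.42\%$$

so 323 − 221 = 102 of the tested images are misdiagnosed.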
Figure 2 plots the statistical parameters against index level. These graphs, generated at run time, show the impact of the average information parameters on lung cancer diagnosis: Fig. 2a, b plot index level versus statistical parameters and show how the parameter index levels vary between noncancerous and cancerous microscopic lung images. Observing the current graph helps differentiate a microscopic image as cancerous or noncancerous, and in subjective analysis the graphs have their own impact on lung cancer diagnosis.

4.2 Results of ANOVA

In the proposed system, five groups are considered, one for each parameter used. Applying ANOVA to the proposed system gives the following results:
Total sum of squares (TSS) = 66,050,543
Sum of squares between the groups (SSB) = 5.68 × 10⁷
Sum of squares within the groups (SSW) = 9.25 × 10⁶
F ratio = MSB/MSW = [SSB/(k − 1)]/[SSW/(N − k)], where k = 5 is the number of groups and N is the total number of observations
F(4, 45) = 69.1, p < 0.05
Critical value = 2.61 (approximately, from the F-distribution table at the 0.05 significance level)
Here, 4 and 45 are the between-group and within-group degrees of freedom of the F distribution.
Since the F statistic exceeds the critical value (69.1 > 2.61), as can be observed in Fig. 3, the null hypothesis that all group means are equal is rejected.
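As a consistency check on the reported figures, the F statistic is the ratio of the between-group and within-group mean squares. With k = 5 groups and N − k = 45 within-group degrees of freedom (so N = 50 observations):

$$MS_B = \frac{SSB}{k-1} = \frac{5.68 \times 10^{7}}{4} = 1.42 \times 10^{7}, \qquad MS_W = \frac{SSW}{N-k} = \frac{9.25 \times 10^{6}}{45} \approx 2.056 \times 10^{5}$$

$$F = \frac{MS_B}{MS_W} \approx \frac{1.42 \times 10^{7}}{2.056 \times 10^{5}} \approx 69.1$$

The plain ratio SSB/SSW would be only about 6.1, so dividing by the degrees of freedom is what yields the reported value. The sums of squares are also consistent: SSB + SSW = 6.605 × 10⁷, which matches TSS = 66,050,543.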

Fig. 3 Critical value calculation on F-distribution



5 Conclusions

From among numerous average information parameters, selected parameters are identified for lung cancer analysis and diagnosis. They are selected by an iterative method over predetermined lung cancer microscopic images. These statistical parameters of the average information method can work effectively for lung cancer diagnosis. They were tested and verified on cancerous and noncancerous microscopic lung images using image processing techniques in MATLAB. Of the five statistical parameters, Variance showed a good statistical response for cancerous lung images. The artificial neural network used as the image classifier plays an important role in decision making, deciding whether the current image is cancerous or noncancerous, as can also be observed in the ANN performance graph. The results show that accuracy improves with the number of trained images, indicating that the ANN works well as an image classifier for the proposed method. The method was also tested with ANOVA, a standard statistical method, which rejects the null hypothesis. The accuracy of the method, calculated from the number of correctly diagnosed images, is 68.42%. The proposed method therefore works satisfactorily, but it still requires additional methods, such as a hybrid combination of mathematical, statistical, and structural methods, or new statistical or mathematical parameters that could fill the gaps in this method and improve the performance and accuracy of the algorithm. Nevertheless, this method is a significant step forward for medical research on lung cancer.

References

1. Joes Vilar, “Breathe Easy: How Radiology Helps to Find and Fight Lung Diseases”, European Society of Radiology, Chapter 1.
2. J. B. Walter and D. M. Pryce, “The histology of lung cancer”, PMC, US National Library of Medicine, National Institutes of Health, pp. 107–116.
3. Kale Vaishnaw G., “Imaging Techniques for Lungs Analysis”, International Journal of Scientific & Engineering Research (IJSER), Vol. 5, Issue 4, April 2014, pp. 1–4.
4. Muhammad Qurhanul Rizqie, Nurul Shafiqa Mohd Yusof, Rino Ferdian Surakusumah, Dyah Ekashanti Octorina Dewi, Eko Supriyanto and Khin Wee Lai, “Review on Image Guided Lung Biopsy”, IJN-UTM Cardiovascular Engineering Center, Springer Science and Business Media Singapore 2015, pp. 41–50.
5. Vaishnaw Gorakhnath Kale, “An Overview of Microscopic Imaging Technique for Lung Cancer & Classification”, International Journal of Innovation in Engineering, Research and Technology [IJIERT], ICITDCEME’15 Conference Proceedings, ISSN 2394-3696, pp. 1–4.
6. Kale Vaishnaw G., Vandana B. Malode, “New Approach of Statistical Analysis for Lung Disease Diagnosis using Microscopy images”, IEEE-2016, pp. 378–383.

7. K. Punithavathy, M.M. Ramya, Sumathi Poobal, “Analysis of Statistical Texture Features for
Automatic Lung Cancer Detection in PET/CT Images”, International Conference on Robotics,
Automation, Control and Embedded Systems–RACE2015.
8. Narain Ponraj, Lilly Saviour, Merlin Mercy, “Segmentation of thyroid nodules using
watershed segmentation”, Electronics and Communication Systems (ICECS), 2nd Interna-
tional Conference on, IEEE-2015.
9. Kale Vaishnaw G, “Lung Cancer Analysis by Quality Measures” International Journal of
Modern Trends in Engineering and Research, Vol. 3, Issue 4, April 2016, Special Issue of
ICRTET’2016, pp. 738–741.
10. David Jacobs, “Correlation and Convolution” Tutorial for CMSC 426, pp. 1–10.
11. Monica Bianchini and Franco Scarselli, “On the Complexity of Neural Network Classifiers: A
Comparison between Shallow and Deep Architectures”, IEEE Transactions on Neural
Networks and Learning Systems, Vol. 25, No. 8, August 2014.
12. K. Balachandran, R. Anitha, “An Efficient Optimization Based Lung Cancer Pre-Diagnosis
System with Aid of Feed Forward Back Propagation Neural Network (FFBNN)”. Journal of
Theoretical and Applied Information Technology 20 Oct 2013 Vol. 56 No. 2.
13. Jay L. Devore, Kenneth N. Berk, Modern Mathematical Statistics with Applications, ©
Springer Science+Business Media, LLC 2012.
14. K. Elkourd, “Detect the Tumor with Numerical Analysis and with ‘ANOVA’ Technique for MRI Image”, International Journal of Engineering, Issue 1, July 2013. ISSN: 2277-3754, ISO 9001:2008.
15. El. kourd Kaouther, Seif eddine Khelil, Saleh Hammoum, “Study With RK4 & ANOVA The
Location Of The Tumor At The Smallest Time for Multi-Images” IEEE-2015.
