The Diagnosis of
Coronary Artery Disease (CAD)
Steve Iduye
Xiaoqing Zhuang
HINF 6210 Data Mining
Contents
Coronary Heart Disease in a Nutshell
Description of the Datasets
Case 1
Case 2
Case 3
Discussion
Conclusion
Case 1
The case study investigates the risk factors which contribute to
Dataset
Prior Setting
Rules with confidence levels above 90%, with accuracy levels above
Apriori Rules
Summary: Case 1
Four of the five rules for the healthy class include female gender, suggesting
that in this particular dataset women are more likely to be free from
coronary heart disease.
The results also show that exercise induced angina (chest pain) being
false is a good indicator of a person being healthy, irrespective of
gender (exercise induced angina = false appears in the LHS of all the
high-confidence rules).
The number of coloured vessels being zero and thal (heart status)
being normal were also shown to be good indicators of health.
Case 1 Summary
Rules mined for the sick class, on the other hand, showed that
chest pain type being asymptomatic and thal being a reversible defect
were probable indicators of a person being sick (both high-confidence
rules have these two factors in the LHS).
Table: high-confidence rules for the healthy class (no CAD)
Columns: SEX, EXERCISE_INDUCED_ANGINA, NO_VESSEL_COLORED, THAL (HEART STATUS), FASTING BLOOD SUGAR, CLASS
SEX: Female in four rules, M or F in one
EXERCISE_INDUCED_ANGINA: False in all five rules
CLASS: Healthy (no_CAD) in all five rules
Other conditions appearing in the rules: THAL = Normal, NO_VESSEL_COLORED = 0, FASTING BLOOD SUGAR = False
Table: high-confidence rules for the sick class (CAD)
CHEST PAIN TYPE   SLOPE   EXERCISE INDUCED ANGINA   THAL (HEART STATUS)   CLASS
asymptomatic      flat    -                         reversible defect     Unhealthy (CAD)
asymptomatic      -       true                      reversible defect     Unhealthy (CAD)
C. Rules
If {Sex = female AND fasting_blood_sugar = fal AND
exercise_induced_angina = fal AND thal = norm} => Then, no CAD
If {Resting_blood_pres in (115.2, 136.4] AND
exercise_induced_angina = fal AND number_of_vessels_colored = 0 AND
thal = norm} => Then, no CAD
If {Sex = female AND exercise_induced_angina = fal AND
number_of_vessels_colored = 0} => Then, no CAD
C. Rules
If {Chest_pain_type = asympt AND slope = flat AND thal = rev} => Then,
CAD is present
If {Chest_pain_type = asympt AND exercise_induced_angina = TRUE AND
thal = rev} => Then, CAD is present
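The rules listed above can be read as simple predicates over a patient record. The following is a minimal sketch, assuming a record is a dictionary keyed by the attribute names used in the rules. The `RULES` encoding, the helper names, and the sample record are illustrative assumptions, not part of the case study.

```python
# Each rule is (conditions, conclusion). A condition maps an attribute name
# to a predicate. Attribute names and value labels follow the rules as
# listed; the encoding itself is an assumption made for illustration.

def eq(value):
    return lambda x: x == value

def in_half_open(low, high):
    # Membership in the interval (low, high]
    return lambda x: low < x <= high

RULES = [
    ({"sex": eq("female"), "fasting_blood_sugar": eq("fal"),
      "exercise_induced_angina": eq("fal"), "thal": eq("norm")}, "no CAD"),
    ({"resting_blood_pres": in_half_open(115.2, 136.4),
      "exercise_induced_angina": eq("fal"),
      "number_of_vessels_colored": eq(0), "thal": eq("norm")}, "no CAD"),
    ({"sex": eq("female"), "exercise_induced_angina": eq("fal"),
      "number_of_vessels_colored": eq(0)}, "no CAD"),
    ({"chest_pain_type": eq("asympt"), "slope": eq("flat"),
      "thal": eq("rev")}, "CAD"),
    ({"chest_pain_type": eq("asympt"),
      "exercise_induced_angina": eq("TRUE"), "thal": eq("rev")}, "CAD"),
]

def matching_conclusions(record):
    """Return the conclusion of every rule whose conditions all hold."""
    fired = []
    for conditions, conclusion in RULES:
        if all(attr in record and test(record[attr])
               for attr, test in conditions.items()):
            fired.append(conclusion)
    return fired

# Hypothetical record, not taken from the dataset.
patient = {
    "sex": "female", "fasting_blood_sugar": "fal",
    "exercise_induced_angina": "fal", "thal": "norm",
    "number_of_vessels_colored": 0, "resting_blood_pres": 120.0,
    "chest_pain_type": "asympt", "slope": "up",
}
print(matching_conclusions(patient))
```

A record can match several rules at once, so the function returns every conclusion that fires rather than a single label.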
Dataset
Objective
Features
Methods
Case 2 (METHODS)
Bagging Algorithm
Classifies each sample based on the output of a set of diverse base
classifiers.
Base classifiers can be selected from C4.5, Naive Bayes, ID3, and
other data mining algorithms.
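The idea of bagging can be sketched in a few lines: train each base classifier on a bootstrap sample of the training data, then classify by majority vote. This is a minimal illustration, not the configuration used in the case study. The decision stump below is a stand-in for C4.5, ID3, or Naive Bayes, and the toy data, function names, and default parameters are assumptions.

```python
import random
from collections import Counter

def train_stump(rows, labels):
    """Stand-in base classifier: the single feature/threshold split
    with the fewest training errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = (Counter(right).most_common(1)[0][0]
                           if right else left_label)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda r: left_label if r[f] <= t else right_label

def bagging_fit(rows, labels, n_estimators=25, seed=0):
    """Train one base classifier per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx]))
    return models

def bagging_predict(models, row):
    """Majority vote over the base classifiers."""
    votes = Counter(m(row) for m in models)
    return votes.most_common(1)[0][0]

# Toy, made-up data: one numeric feature, two classes.
rows = [[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]]
labels = ["healthy", "healthy", "healthy", "CAD", "CAD", "CAD"]
models = bagging_fit(rows, labels)
print(bagging_predict(models, [1.0]), bagging_predict(models, [9.0]))
```

Because each base classifier sees a different resample, their individual errors tend to differ, and the vote smooths them out.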
Case 3 (METHODS)
Sequential Minimal Optimization (SMO): an algorithm for efficiently
solving the optimization problem that arises during the training
of Support Vector Machines (SVMs)
Naive Bayes classifier: a simple probabilistic classifier based on
applying Bayes' theorem with strong independence assumptions
Bagging algorithm
Neural Network algorithm: an Artificial Neural Network (ANN) is an
interconnected group of artificial neurons that uses a mathematical or
computational model for information processing based on a
connectionist approach. Model complex relationships between
Feature Selection
Uses the coefficients of the normal vector of a linear SVM as feature
weights.
The attribute values still have to be numerical.
34 features had a weight > 0.6; these were selected and the algorithms were
applied to them.
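The selection step can be sketched as a threshold on normalized weights. The sketch assumes the coefficients come from an already trained linear SVM and that weights are normalized by dividing each absolute coefficient by the largest one; that normalization, the function name, and the example coefficients are assumptions, since the slides only state the 0.6 cut-off.

```python
def select_by_svm_weight(coefficients, threshold=0.6):
    """Keep features whose normalized |coefficient| exceeds the threshold.

    coefficients: dict mapping feature name -> coefficient of the SVM
    normal vector. Normalizing by the largest absolute value is an
    assumption made for this sketch.
    """
    largest = max(abs(w) for w in coefficients.values())
    if largest == 0:
        return []
    normalized = {name: abs(w) / largest for name, w in coefficients.items()}
    return sorted(name for name, w in normalized.items() if w > threshold)

# Hypothetical coefficients, not taken from the study.
weights = {"age": 0.6, "typical_chest_pain": -1.5,
           "bmi": 0.3, "hypertension": 1.2}
print(select_by_svm_weight(weights))
```

The sign of a coefficient only says which class the feature pushes towards, so the sketch ranks by absolute value.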
Feature creation
3 new features: the LAD (Left Anterior Descending) recognizer, LCX (Left
Circumflex) recognizer, and RCA (Right Coronary Artery) recognizer are
used to recognize whether the LAD, LCX, or RCA is blocked. The higher the
value, the higher the risk.
Available features of the dataset are first discretized into binary
variables.
A value of 1 for a feature indicates a higher probability of the record being in
the CAD class, while a value of 0 indicates otherwise.
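The discretization step might look like the sketch below, which maps each numeric feature to 1 when it exceeds a cut-off and to 0 otherwise. The feature names and cut-offs are hypothetical, since the slides do not list the thresholds actually used, and the sketch does not attempt to reproduce how the recognizer features are computed from the binary variables.

```python
# Hypothetical cut-offs; the thresholds used in the study are not given.
CUTOFFS = {"age": 55, "systolic_bp": 140}

def binarize(record, cutoffs=CUTOFFS):
    """Map each listed feature to 1 (value above cut-off) or 0 (otherwise)."""
    return {name: int(record[name] > cut) for name, cut in cutoffs.items()}

# Made-up record for illustration.
print(binarize({"age": 61, "systolic_bp": 120}))
```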
Association rule mining (Mentioned in Case 1)
Support
Confidence
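Support and confidence can be computed directly from their standard definitions: the support of an itemset is the fraction of records containing it, and the confidence of a rule LHS => RHS is support(LHS and RHS) divided by support(LHS). The sketch below assumes each record is a set of "attribute=value" strings; the records are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, lhs, rhs):
    """support(lhs and rhs) / support(lhs); 0.0 if lhs never occurs."""
    lhs_support = support(transactions, lhs)
    if lhs_support == 0:
        return 0.0
    return support(transactions, set(lhs) | set(rhs)) / lhs_support

# Made-up records, not taken from the dataset.
transactions = [
    {"sex=female", "angina=fal", "class=no_CAD"},
    {"sex=female", "angina=fal", "class=no_CAD"},
    {"sex=male", "angina=fal", "class=CAD"},
    {"sex=male", "angina=TRUE", "class=CAD"},
]
print(support(transactions, {"sex=female", "angina=fal"}))
print(confidence(transactions, {"angina=fal"}, {"class=no_CAD"}))
```

In this toy data, three of the four records have angina=fal and two of those three are no_CAD, so the rule angina=fal => no_CAD has a confidence of 2/3.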
Understand CAD
Confusion Matrix
Sensitivity
Specificity
Accuracy
Rules Extracted
Performance Measurement
Results
Confidence
Feature Selection
Feature Creation
Information Gain
Gini Index
C4.5
Bagging Algorithm
SMO Algorithm
Naive Bayes algorithm
Neural Network algorithm
Association Rule Mining
RapidMiner
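The performance measures named above (confusion matrix, sensitivity, specificity, accuracy) follow from four counts. The sketch below uses the standard definitions and treats CAD as the positive class; the function name, label strings, and example predictions are assumptions for illustration.

```python
def diagnostic_metrics(actual, predicted, positive="CAD"):
    """Confusion-matrix counts plus sensitivity, specificity, and accuracy."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1          # sick, predicted sick
        elif a == positive:
            fn += 1          # sick, predicted healthy
        elif p == positive:
            fp += 1          # healthy, predicted sick
        else:
            tn += 1          # healthy, predicted healthy

    def ratio(num, den):
        return num / den if den else None

    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }

# Made-up labels; in practice `actual` would come from angiography.
actual = ["CAD", "CAD", "CAD", "no_CAD", "no_CAD"]
predicted = ["CAD", "CAD", "no_CAD", "no_CAD", "CAD"]
print(diagnostic_metrics(actual, predicted))
```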
Conclusion
Using feature selection methods can increase the accuracy of CAD diagnosis
(though it may sometimes decrease the accuracy of LAD and RCA stenosis diagnosis).
To enrich the dataset, we may need to create new features that have a vital
influence on the accuracy of CAD diagnosis.
Rules extracted by association rule mining may not be 100% correct; more
testing data is needed to validate them.
The results of the standard angiographic method are still needed as the
baseline for comparison when assessing the predictive capability of the classification algorithms.