
RACOG and wRACOG: Two Probabilistic Oversampling Techniques

Presented By:
Manish Kumar Sharma
PhD Scholar,
Department of Computer Science & Engineering,
The LNM Institute of Information Technology,
Jaipur, Rajasthan

Infrequent Weighted Itemset Mining Using Frequent Pattern Growth


CONTENT
 Concept of Fatigue
 Accidental Statistics Analysis
 Motivation for the Problem
 Cognitive Fatigue
 Generalized Architectural diagram of Fatigue detection system
 Data used
 Feature Extraction and analysis
 Statistical Features
 Wavelet Features
 Performance Measures Used
 Confusion Matrix Used
 Basic K-means Working as a Classifier
 Modified-1 K-means Working as a Classifier
 Modified-2 K-means Working as a Classifier
 Comparative Analysis of K-means based Classifiers
 Comparative Analysis of Various Versions of K-Means Algorithms on the Basis of Confusion Matrix Parameters
 Optimal Performing Parameters and Various Versions of K-Means
 Conclusion & Future Scope
 References



Introduction
 When machine learning approaches are used to handle scientific problems in the current deluge of data, a variety of challenges arise from the nature of the underlying data.
 One such challenge is the imbalanced class distribution problem, where one of the target classes (the minority class) is underrepresented in comparison with the other classes (the majority class or classes).
 Approaches to class imbalance include:
 Resampling, which balances the class priors of the training data by either increasing the number of minority class samples (oversampling) or decreasing the number of majority class samples (undersampling);
 Cost-sensitive learning (CSL), which assigns a higher misclassification cost to minority class samples than to majority class samples;
 Kernel-based learning methods, which make the minority class samples more separable from the majority class by mapping the data to a high-dimensional feature space.
Motivation for the Problem
 In weighted association rule (WAR) mining, weight is a significant factor, but it is introduced only during the rule generation step, after the traditional frequent itemset mining process has been performed.

 Weights have also been used to exploit the anti-monotonicity of the weighted support constraint in order to drive the Apriori-based itemset mining phase.

 Probabilistic frequent itemset mining works on uncertain data, in which the item occurrences in each transaction are uncertain.

 Probabilistic models have been constructed and integrated into Apriori-based or projection-based algorithms. Although probabilities of item occurrence may be remapped to weights, the semantics behind probabilistic and weighted itemset mining are radically different.



Motivation for the Problem
 A related issue is the discovery of minimal infrequent itemsets from transactional data sets, i.e., itemsets that satisfy a maximum support threshold and do not contain any infrequent subset.

 Traditional infrequent item set mining algorithms still suffer from their inability to take local item interestingness into account during the mining phase.



Infrequent Weighted Itemset Mining
 Existing weighted item set mining approaches do not address the infrequent weighted item set (IWI) mining task effectively, while, on the other hand, state-of-the-art infrequent item set miners are, to the best of our knowledge, unable to cope with weighted data.
 The focus is on the discovery of item sets that are both infrequent and weighted, i.e., the infrequent weighted item sets (IWIs), from transactional weighted data sets.
 The IWI-support measure is defined as a weighted frequency of occurrence of an item set in the analyzed data. Occurrence weights are derived from the weights associated with items in each transaction by applying a given cost function.



Infrequent Weighted Itemset Mining
 The major focus is on two different IWI-support measures:
(i) The IWI-support-min measure, which relies on a minimum cost function: the occurrence of an item set in a given transaction is weighted by the weight of its least interesting item;
(ii) The IWI-support-max measure, which relies on a maximum cost function: the occurrence of an item set in a given transaction is weighted by the weight of its most interesting item.

 The following problems have been addressed:
(i) IWI and minimal IWI mining driven by a maximum IWI-support-min threshold, and
(ii) IWI and minimal IWI mining driven by a maximum IWI-support-max threshold.
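
The two measures above can be illustrated with a short Python sketch. It is not the authors' implementation: the function names and the sample weights are assumptions made for illustration, except for the first row, which reuses the tid 1 weights quoted in these slides (a = 0, b = 100, c = 57, d = 71). The sketch also assumes that an item set occurs in a transaction when all of its items are listed there.

```python
from typing import Callable, Dict, Iterable, List, Set

# A weighted transaction maps each item to its weight in that transaction.
WeightedTransaction = Dict[str, float]


def iwi_support(
    itemset: Set[str],
    transactions: List[WeightedTransaction],
    cost: Callable[[Iterable[float]], float],
) -> float:
    """Sum, over the transactions containing the item set, of the cost
    function applied to the weights of the item set's items."""
    total = 0.0
    for t in transactions:
        if all(item in t for item in itemset):
            total += cost(t[item] for item in itemset)
    return total


def iwi_support_min(itemset, transactions):
    # Minimum cost function: weight of the least interesting item.
    return iwi_support(itemset, transactions, min)


def iwi_support_max(itemset, transactions):
    # Maximum cost function: weight of the most interesting item.
    return iwi_support(itemset, transactions, max)


# Hypothetical data: only the first row uses weights quoted in the slides.
example = [
    {"a": 0, "b": 100, "c": 57, "d": 71},
    {"a": 100, "b": 0, "c": 43, "d": 29},   # made-up weights
    {"a": 57, "b": 71, "c": 0, "d": 100},   # made-up weights
]

ab_min = iwi_support_min({"a", "b"}, example)
ab_max = iwi_support_max({"a", "b"}, example)
```

An item set would then be reported as an IWI when the chosen measure is less than or equal to the maximum threshold supplied by the analyst.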



Infrequent Weighted Itemset Mining
 The example data set contains six transactions, identified by their tids (transaction ids). Each transaction is composed of four distinct items weighted by the corresponding degree of interest (e.g., item a has weight 0 in tid 1 and 100 in tid 4).
 In data center resource management and application profiling, transactions may represent CPU usage readings collected at a fixed sampling rate. For example, tid 1 means that, at a fixed point in time (1), CPU b works at a high usage rate (weight 100), CPUs c and d have an intermediate usage rate (weights 57 and 71, respectively), while CPU a is temporarily idle (weight 0).



Infrequent Weighted Itemset Mining
 Task (A) involves discovering IWIs and minimal IWIs (MIWIs) which include the item(s) with the least local interest within each transaction. Table 2 reports the IWIs mined from Table 1 by enforcing a maximum IWI-support-min threshold equal to 180, together with their IWI-support-min values. In system usage profiling, the IWIs in Table 2 represent sets of CPUs which contain at least one underutilized or idle CPU at each sampled instant.
 In real life, system malfunctioning or underutilization may arise when the workload is not allocated properly over the available CPUs. For example, considering CPUs a and b, recognizing a suboptimal usage rate of at least one of them may trigger targeted actions, such as system resizing or resource sharing policy optimization.



Literature Survey & Problem Statements
• The literature review shows that many researchers have used different sensory parameters to work out feasible and acceptable solutions, using various existing and modified classification/detection methodologies, for the detection and monitoring of cognitive fatigue in vehicular drivers.

• The physiological sensory parameters are the following:
 EEG (Electroencephalogram)
 ECG (Electrocardiogram)
 EOG (Electrooculogram)
 EMG (Electromyogram)
 Oximetry Pulse
 Skin Conductance
 Respiration



Literature Survey & Problem Statements

• Findings of the Review:
 Solutions using physiological signals such as EEG could yield highly precise and accurate systems, but they are not acceptable to users.
 Solutions using simple physiological parameters, such as skin conductance, could provide a simple and feasible system.
 The physiological signals and the features extracted from them were used as input to various classification techniques, such as Neural Networks, SVM, Fuzzy Systems, statistical methods and so on.
 K-means has mainly been used as a clustering technique for various kinds of data.
Skin Conductance: Skin conductance measures the electrical conductance of the skin, which depends on the sweat-induced moisture of the skin. It is also referred to as Galvanic Skin Response (GSR), Electrodermal Response (EDR), Psychogalvanic Reflex (PGR), Skin Conductance Response (SCR) and Skin Conductance Level (SCL).



Problem Statements and Objectives

• Problem Statements:
 Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of data items. A transactional data set $T = \{t_1, t_2, \ldots, t_n\}$ is a set of transactions, where each transaction $t_q$ ($q \in [1, n]$) is a set of items in $I$ and is characterized by a transaction ID (tid). An item set $X \subseteq I$ is a set of data items. More specifically, a k-item set is a set of $k$ items in $I$. An item set $X$ is infrequent if its support (the number of transactions in $T$ containing $X$) is less than or equal to a predefined maximum support threshold $\xi$.
 An infrequent item set is said to be minimal if none of its subsets is infrequent. Given a transactional data set $T$ and a maximum support threshold $\xi$, the infrequent (minimal) item set mining problem entails discovering all infrequent (minimal) item sets from $T$.
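
The definitions above can be checked on a toy example. The following Python sketch enumerates candidates by brute force, which is only practical for tiny data sets and is not the FP-growth-based approach named in the title; the function names and the sample transactions are assumptions made for illustration.

```python
from itertools import combinations


def support(itemset, transactions):
    # Number of transactions that contain every item of the item set.
    return sum(1 for t in transactions if itemset <= t)


def infrequent_itemsets(items, transactions, max_support):
    # All non-empty item sets whose support is <= the maximum threshold.
    found = []
    for k in range(1, len(items) + 1):
        for combo in combinations(sorted(items), k):
            candidate = frozenset(combo)
            if support(candidate, transactions) <= max_support:
                found.append(candidate)
    return found


def minimal_infrequent(infrequent):
    # Keep only the item sets with no infrequent proper subset.
    pool = set(infrequent)
    return [s for s in infrequent if not any(other < s for other in pool)]


# Hypothetical transactions over the items a, b and c.
toy = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
all_infrequent = infrequent_itemsets({"a", "b", "c"}, toy, max_support=1)
minimal = minimal_infrequent(all_infrequent)
```

With a threshold of 1, only {b, c} and {a, b, c} are infrequent in this toy data, and only {b, c} is minimal, because {a, b, c} contains the infrequent subset {b, c}.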



Problem Statements and Objectives

• Problem Statements:
 The traditional support measure for driving the item set mining process entails treating items and transactions equally, even if they do not have the same relevance in the analyzed data set. To treat items differently within each transaction, the concept of weighted item is introduced as a pair $\langle i_k, w_k^q \rangle$, where $i_k \in I$ is an item contained in $t_q \in T$, whereas $w_k^q$ is the weight associated with $i_k$ that characterizes its local interest in $t_q$.
 Definition 1 (Weighted transactional data set): Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of items. A weighted transactional data set $T^w$ is a set of weighted transactions, where each weighted transaction $t_q^w$ is a set of weighted items $\langle i_k, w_k^q \rangle$ such that $i_k \in I$ and $w_k^q$ is the weight associated with $i_k$ in $t_q^w$.



Problem Statements and Objectives

 To design and implement the basic K-means algorithm as a classifier and test it on these data sets.
 To make certain modifications to the existing algorithm, and to implement and test them on the extracted features.
 To carry out a comparative analysis of the K-means algorithms used as fatigue classifiers, based on the physiological parameters used.
 To check the correlation of each individual feature with cognitive fatigue and with the classifier performance.



Cognitive Fatigue Detection System

General Architectural Diagram of Cognitive Fatigue Detection System:


Data Collection & Processing
Data used: To detect cognitive fatigue in vehicular drivers, real-time physiological sensory data for the fatigued state and the normal state, collected by Dr. Mahesh Bundele, was used for further processing. The data consisted of 3-5 minute recordings of drivers in the normal state (pre-driving) and in the fatigued state (post-driving), stored in text file format. In total, the data covered 150 drivers, recorded in real time. A few selected signals of the Skin Conductance (SC), Oximetry Pulse and Respiration physiological parameters are considered for this project. The two-state signals of SC, Oximetry Pulse and Respiration were processed in MATLAB to extract predefined features.
• Statistical Features:
The standard statistical features used were:
 Mean of Signal (MOS)
 Standard Deviation of Signal (STDEVS)
 Frame Energy (FE)
 Maximum Frequency (MAXF)
 Standard Deviation of Frequency Spectrum (STDFS)
 Mean of Frequency Spectrum (MOFS)
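
As a rough illustration of what these six features could look like for one frame of samples, here is a pure-Python sketch. The slides state that the features were extracted in MATLAB and do not give the exact formulas, so every definition below is an assumption: frame energy is taken as the sum of squared samples, the spectrum is the one-sided DFT magnitude, MAXF is read as the frequency at which that magnitude peaks, and population (not sample) standard deviations are used. The function names and the `sampling_rate` parameter are hypothetical.

```python
import cmath
import statistics


def magnitude_spectrum(frame):
    # One-sided magnitude spectrum via a naive DFT (fine for short frames).
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def statistical_features(frame, sampling_rate):
    spectrum = magnitude_spectrum(frame)
    # Index of the spectral bin with the largest magnitude.
    peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
    return {
        "MOS": statistics.fmean(frame),                  # mean of signal
        "STDEVS": statistics.pstdev(frame),              # std. dev. of signal
        "FE": sum(x * x for x in frame),                 # frame energy
        "MAXF": peak_bin * sampling_rate / len(frame),   # peak frequency (Hz)
        "STDFS": statistics.pstdev(spectrum),            # std. dev. of spectrum
        "MOFS": statistics.fmean(spectrum),              # mean of spectrum
    }
```

For a signal with a large mean, the zero-frequency bin would dominate the peak search, so a practical version might remove the mean or skip that bin first.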



Feature Extraction & Analysis

Wavelet features: Vectors of decomposed components, such as the Approximation Coefficients (CA) or the Detail Coefficients (CD1, CD2, CD3, CD4, CD5, CD6), and vectors of reconstructed components, such as the Approximation (A) or the Details (D1, D2, D3, D4, D5, D6), are further processed into the following six statistical features:
 Maximum
 Minimum
 Mean
 Mode
 Variance
 Standard Deviation
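
The six statistics listed above can be computed for any coefficient vector with the Python standard library, as in the sketch below. It assumes the wavelet coefficients (for example CA6 or D1) have already been obtained elsewhere, and it uses population variance and standard deviation, which is a choice the slides do not specify. The function name `six_statistics` is hypothetical.

```python
import statistics


def six_statistics(coefficients):
    # Summarise one coefficient vector by the six statistics named above.
    return {
        "MAX": max(coefficients),
        "MIN": min(coefficients),
        "MEAN": statistics.fmean(coefficients),
        "MODE": statistics.mode(coefficients),
        "VAR": statistics.pvariance(coefficients),
        "STD": statistics.pstdev(coefficients),
    }


summary = six_statistics([2.0, 4.0, 4.0, 6.0])
```

Because real coefficients are continuous values that rarely repeat, the mode is usually meaningful only after rounding or binning the vector.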



Feature Extraction & Analysis

Daubechies Wavelet:
 SC_DB3_L6: The Daubechies mother wavelet of order 3 was used to decompose each frame of the signal up to level 6. A total of 77 feature vectors were extracted from this signal.
Biorthogonal Wavelet:
 OP_BIOR1.1_L4: The biorthogonal wavelet (bior1.1) was used with decomposition level 4, and a total of 60 feature vectors were extracted.
Discrete Meyer Wavelet:
 RSP_DMey_L4: The discrete Meyer (dmey) wavelet was used with decomposition level 4, and a total of 56 feature vectors were extracted from this signal.



Design and Implementation of K-mean Algorithms
• Details of the data used and its specifications
• Various feature vectors of the SC signal for the two-class data (Pre-driving and Post-driving) were used as input to the algorithms.

| Physiological Signal | Feature File Name | No. of Feature Vectors | Size of Training Dataset (Pre+Post) | Size of Testing Dataset (Pre+Post) |
| Skin Conductance | SC_STAT_1 | 6 | 400x6 | 400x6 |
| Skin Conductance | SC_STAT_2 | 18 | 400x6 | 400x6 |
| Skin Conductance | SC_DB3_L6 | 77 | 400x6 | 400x6 |
| Oximetry Pulse | OP_BIOR1.1_L4 | 60 | 400x6 | 400x6 |
| Oximetry Pulse | OP_DMey_L8 | 110 | 400x6 | 400x6 |
| Respiration | RSP_DB6_L3 | 37 | 400x6 | 400x6 |
| Respiration | RSP_DMey_L4 | 56 | 400x6 | 400x6 |

Design and Implementation of K-mean Algorithms
Performance Measures Used
 Percentage Classification Accuracy (PCLA)
The PCLA is the average of the percentage classification accuracies of the two classes under consideration. It has been used as the single classification performance indicator.
 Confusion Matrix
A confusion matrix is used to display the classification results of a classifier. It is defined by labeling the desired classifications on the rows and the predicted classifications on the columns.

Table of Confusion Matrix



Design and Implementation of K-mean Algorithms

The confusion matrix parameters are:

True Positive (Tp): The number of positive class samples detected by the classifier as positive. Here, post-driving signals are treated as the positive class.

True Negative (Tn): The number of negative class samples detected by the classifier as negative. Here, pre-driving signals are treated as the negative class.

False Positive (Fp): The number of negative class samples detected by the classifier as positive.

False Negative (Fn): The number of positive class samples detected by the classifier as negative.
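
The four counts and the PCLA (described earlier as the average of the percentage accuracies of the two classes) can be computed as in the following sketch. The label strings "post" and "pre" and the function names are assumptions made for illustration, and both classes are assumed to be present so that no division by zero occurs.

```python
def confusion_counts(actual, predicted, positive="post"):
    # Count Tp, Tn, Fp and Fn, treating `positive` as the positive class.
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a != positive and p != positive:
            tn += 1
        elif a != positive and p == positive:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def pcla(tp, tn, fp, fn):
    # Average of the per-class percentage accuracies.
    positive_accuracy = 100.0 * tp / (tp + fn)
    negative_accuracy = 100.0 * tn / (tn + fp)
    return (positive_accuracy + negative_accuracy) / 2.0


actual = ["post", "post", "post", "post", "pre", "pre", "pre", "pre"]
predicted = ["post", "post", "post", "pre", "pre", "pre", "post", "post"]
counts = confusion_counts(actual, predicted)
score = pcla(*counts)
```

In this made-up example, three of the four post-driving samples and two of the four pre-driving samples are classified correctly, so the per-class accuracies are 75% and 50% and the PCLA is 62.5%.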



Design and Implementation of K-mean Algorithms
A. Algorithm for the Basic Version of the K-means Classifier:
(Rows 0 to 399 and 400 to 799 are replicates; there are 1200 rows in total; the first 400 are used as the training dataset and the last 400 as the testing dataset.)
Step 1: Read the Excel data sheet containing the two classes of data (Pre-driving and Post-driving).
Step 2: Calculate the number of rows and columns in each Excel sheet.
Step 3: Store the dataset from row 0 to rows/3, for all columns, in a 2-D array along with the row tag.
Step 4: Loop over row = 0 to rows/3 and over column = 0 to (number of columns - 1).
Step 5: Assign the first two values of each column as the initial cluster centers of the two classes.
Step 6: In each column, calculate the distance between the next data point and the corresponding cluster centers.
Step 7: For each column, allocate the data point to the cluster at minimum distance.
Step 8: Update the cluster center values of the corresponding columns.
Step 9: In each column, recalculate the distance between the new cluster centers and the assigned data points.
Step 10: Stop the loop if no data point of the column was reassigned; otherwise go to Step 6 for the corresponding column.
Step 11: Store the dataset from row rows/3 + 1 to 2*rows/3, for all columns, in a 2-D array along with the row tag.
Step 12: Assign the data points of each column to the resulting cluster on the basis of minimum distance to the cluster center.
Step 13: Write the arranged confusion matrix of the training data for rows rows/3 + 1 to 2*rows/3, for all columns, to the output file.
Step 14: Store the dataset from row 2*rows/3 + 1 to 3*rows/3, for all columns, in a 2-D array along with the row tag.
Step 15: Assign the data points of each column to the resulting cluster on the basis of minimum distance to the cluster center.
Step 16: Write the arranged confusion matrix of the training data for rows 2*rows/3 + 1 to 3*rows/3, for all columns, to the output file.
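
The core of the steps above, applied to a single feature column, can be sketched in Python as follows. The slides mention a Java implementation that is not shown, so this is only an illustrative reading of the steps: the two centers start at the first two values of the column, and each cluster is then named after the majority row tag of its training members. That naming rule, the function names and the sample values are assumptions.

```python
from collections import Counter
from statistics import fmean


def fit_two_means(values, max_iter=100):
    # Steps 5 to 10: two centers, seeded with the first two column values.
    centers = [values[0], values[1]]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        if new_assignment == assignment:  # no point was reassigned
            break
        assignment = new_assignment
        for c in (0, 1):
            members = [v for v, a in zip(values, assignment) if a == c]
            if members:
                centers[c] = fmean(members)
    return centers, assignment


def label_clusters(assignment, tags):
    # Name each cluster after the most common row tag among its members.
    labels = []
    for c in (0, 1):
        member_tags = [t for t, a in zip(tags, assignment) if a == c]
        labels.append(Counter(member_tags).most_common(1)[0][0]
                      if member_tags else None)
    return labels


def classify(value, centers, labels):
    # Steps 12 and 15: assign a new value to the nearest cluster center.
    nearest = 0 if abs(value - centers[0]) <= abs(value - centers[1]) else 1
    return labels[nearest]


# Hypothetical training column with its row tags.
train_values = [1.0, 9.0, 1.2, 8.8, 0.8, 9.2]
train_tags = ["pre", "post", "pre", "post", "pre", "post"]
centers, assignment = fit_two_means(train_values)
labels = label_clusters(assignment, train_tags)
```

If the first two values of a column happen to be equal, both centers start at the same point and every sample falls into the first cluster, which is one weakness of this seeding rule.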



Design and Implementation of K-mean Algorithms
B. Algorithm for the Modified-1 K-means Working as a Classifier:
(Rows 0 to 399 and 400 to 799 are replicates; there are 1200 rows in total; the first 400 are used as the training dataset and the last 400 as the testing dataset.)
Step 1: Read the Excel data sheet containing the two classes of data (Pre-driving and Post-driving).
Step 2: Calculate the number of rows and columns in each Excel sheet.
Step 3: Store the dataset from row 0 to rows/3, for all columns, in a 2-D array along with the row tag.
Step 4: Loop over row = 0 to rows/3 and over column = 0 to (number of columns - 1).
Step 5: For each column, calculate the mean, variance, standard deviation and percentile values for the two cluster centers.
Step 6: Assign the calculated attribute values of each column as the cluster centers of the two classes.
Step 7: For each column, allocate the data point to the cluster at minimum distance.
Step 8: Update the cluster center values of the corresponding columns.
Step 9: In each column, recalculate the distance between the new cluster centers and the assigned data points.
Step 10: Stop the loop if no data point of the column was reassigned; otherwise go to Step 6 for the corresponding column.
Step 11: Store the dataset from row rows/3 + 1 to 2*rows/3, for all columns, in a 2-D array along with the row tag.
Step 12: Assign the data points of each column to the resulting cluster on the basis of minimum distance to the cluster center.
Step 13: Write the arranged confusion matrix of the training data for rows rows/3 + 1 to 2*rows/3, for all columns, to the output file.
Step 14: Store the dataset from row 2*rows/3 + 1 to 3*rows/3, for all columns, in a 2-D array along with the row tag.
Step 15: Assign the data points of each column to the resulting cluster on the basis of minimum distance to the cluster center.
Step 16: Write the arranged confusion matrix of the training data for rows 2*rows/3 + 1 to 3*rows/3, for all columns, to the output file.
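
Steps 5 and 6 do not say which of the computed statistics becomes the starting center, so the sketch below shows only one plausible reading: each center is seeded with the mean of the training values that carry the corresponding class tag, and the usual assign-and-update loop follows. This choice, the function name and the sample data are assumptions and may differ from the Modified-1 implementation. Both classes are assumed to be present in the training column.

```python
from statistics import fmean


def class_seeded_two_means(values, tags, classes=("pre", "post"), max_iter=100):
    # Assumed seeding: one center per class, set to that class's mean value.
    centers = [
        fmean([v for v, t in zip(values, tags) if t == c]) for c in classes
    ]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        if new_assignment == assignment:  # converged: nothing reassigned
            break
        assignment = new_assignment
        for c in (0, 1):
            members = [v for v, a in zip(values, assignment) if a == c]
            if members:
                centers[c] = fmean(members)
    return centers


# Hypothetical column in which the first two rows belong to the same class.
seed_values = [9.0, 8.8, 1.0, 1.2, 9.2, 0.8]
seed_tags = ["post", "post", "pre", "pre", "post", "pre"]
seeded_centers = class_seeded_two_means(seed_values, seed_tags)
```

Seeding from class statistics makes the starting point independent of row order, whereas seeding from the first two rows would start both centers inside the same class in this example.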



Design and Implementation of K-mean Algorithms
C. Algorithm for the Modified-2 K-means Working as a Classifier:
(Rows 0 to 399 and 400 to 799 are replicates; there are 1200 rows in total; the first 400 are used as the training dataset and the last 400 as the testing dataset.)
Step 1: Read the Excel data sheet containing the two classes of data (Pre-driving and Post-driving).
Step 2: Calculate the number of rows.
Step 3: Create a Classification class that contains a numeric attribute and a string tag as member variables.
Step 4: Store the dataset from row 0 to rows/3 in a 1-D object array of the Classification class.
Step 5: Loop over row = 0 to rows/3.
Step 6: Assign the first two values of the 1-D object array to the 1-D cluster-center array of the Classification class, which stores the two cluster center values.
Step 7: Read the next data element in the 1-D object array and calculate its distance to the corresponding cluster center values in the 1-D cluster-center array.
Step 8: Assign the data element to the cluster at minimum distance.
Step 9: Update the data values of each cluster center.
Step 10: Recalculate the distance between the new 1-D array of cluster centers and the previously assigned data points of the 1-D object array.
Step 11: If no data point of the 1-D object array was reassigned, stop; otherwise go to Step 6.
Step 12: Store the dataset from row rows/3 + 1 to 2*rows/3 in a 1-D object array along with the row tag.
Step 13: Assign each data element of the 1-D object array to the corresponding cluster on the basis of minimum distance to the current 1-D cluster-center object array.
Step 14: Prepare the confusion matrix of the test data for rows rows/3 + 1 to 2*rows/3 and write it to the output file.
Step 15: Store the dataset from row 2*rows/3 + 1 to 3*rows/3 in a 1-D array along with the row tag.
Step 16: Assign each data element of the 1-D object array to the corresponding cluster on the basis of minimum distance to the elements of the current 1-D cluster-center object array.
Step 17: Prepare the confusion matrix of the test data for rows 2*rows/3 + 1 to 3*rows/3 and write it to the output file.



Comparative Analysis of K-means based Classifiers
For Feature Set-I of Skin Conductance:
K2: MOFS (high); K1, K2, K3: FE, STDFS, MOS (same)
[Figure: "Average Classification Accuracy using three versions of K-Mean Algorithm"; x-axis: FE, MOFS, STDFS, MOS; y-axis: 0 to 80; series: K-mean Version 1, K-mean Version 2, K-mean Version 3]
SC Feature Set I versus PCLA for K-means 1 to 3

Comparative Analysis of K-means based Classifiers
For Feature Set-II of Skin Conductance:
K1, K2, K3: MOFS, SDOS (same); K1, K3: FE, STDFS (high); K3: MOS (high)
[Figure: "Average Classification Accuracy using three versions of K-Mean Algorithm"; x-axis: FE, MOFS, STDFS, MOS, SDOS; y-axis: 0 to 100; series: K-mean Version 1, K-mean Version 2, K-mean Version 3]
SC Feature Set II versus PCLA for K-means 1 to 3

Comparative Analysis of K-means based Classifiers
For Feature Set-III of Skin Conductance: SC_DB3_L6
K1, K3: performed very well for all parameters
[Figure: "PCLA using three versions of K-Mean Algorithm"; x-axis: MAX CA6, MIN CA6, MEAN CA6, MODE CA6, MAX A6, MIN A6, MEAN A6, MODE A6, ENTROPY; y-axis: 0 to 100; series: K-mean Version 1, K-mean Version 2, K-mean Version 3]
SC Feature Set III versus PCLA for K-means 1 to 3

Comparative Analysis of K-means based Classifiers
For Feature Set-III of Skin Conductance: SC_DB3_L6
K1: MIN D1, MODE D1, MIN D4, MODE D4 (high)
[Figure: "Average Classification Accuracy using three versions of K-Mean Algorithm"; x-axis: STD CA6, VAR CA6, STD CD4, MAX CD5, STD CD5, MAX CD6, STD CD6, MIN D1, MODE D1, MIN D4, MODE D4; y-axis: 0 to 90; series: K-mean Version 1, K-mean Version 2, K-mean Version 3]
SC Feature Set III versus PCLA for K-means 1 to 3

Comparative Analysis of K-means based Classifiers
For Feature Set-I of Oximetry Pulse: OP_BIOR1.1_L4
[Figure: "PCLA for three versions of K-Mean Algorithm"; x-axis: MAX CA4, STD CA4, VAR CA4, MAX CD1, MIN CD1, MIN CD3, STD CD3, VAR CD3, MODE CD3, MAX A4, STD A4, MAX D1, VAR D1, MAX D2, ENERGY D2; y-axis: 0 to 100; series: K-mean Version 1, K-mean Version 2, K-mean Version 3]
OP Feature Set I versus PCLA for K-means 1 to 3

Comparative Analysis of K-means based Classifiers
For Feature Set-I of Oximetry Pulse: OP_BIOR1.1_L4
[Figure: "PCLA for three versions of K-Means Algorithm"; x-axis: STD CD1, MAX CD2, MIN CD2, STD CD2, MAX CD3, MIN CD4, MODE CD4, MIN D1, STD D1, MIN D2, STD D2, MAX D3, MIN D3, STD D3, MODE D3; y-axis: 0 to 80; series: K-mean Version 1, K-mean Version 2, K-mean Version 3]
OP Feature Set I versus PCLA for K-means 1 to 3


Comparative Analysis of K-means based Classifiers
For Feature Set-II of Oximetry Pulse: OP_DMey_L8
[Figure: "PCLA for three versions of K-Mean Algorithm"; y-axis: 10 to 100; series: K-mean Version 1, K-mean Version 2, K-mean Version 3; x-axis labels not recoverable]
OP Feature Set II versus PCLA for K-means 1 to 3


Comparative Analysis of K-means based Classifiers
For Feature Set-II of Oximetry Pulse: OP_DMey_L8
[Figure: "PCLA for three versions of K-Mean Algorithm"; x-axis: MIN CA8, MODE CA8, MAX CD1, MIN CD1, MAX CD2, MIN CD2, VAR CD3, MIN CD5, MAX D2, MEAN D6, ENTROPY; y-axis: 0 to 90; series: K-mean Version 1, K-mean Version 2, K-mean Version 3]
OP Feature Set II versus PCLA for K-means 1 to 3


Comparative Analysis of K-means based Classifiers
Feature set I: RSP_DMey_L4
[Figure: "PCLA for three versions of K-Mean Algorithm"; x-axis: MAX CA4, MIN CA4, MEAN CA4, MODE CA4, MAX A4, MIN A4, MEAN A4, MODE A4, ENTROPY; y-axis: 0 to 100; series: K-mean Version 1, K-mean Version 2, K-mean Version 3]
RSP Feature Set I versus PCLA for K-means 1 to 3

Comparative Analysis of K-means based Classifiers
Feature set II: RSP_DB6_L3
[Figure: "PCLA for three versions of K-Mean Algorithm"; x-axis: MAX CA3, MIN CA3, MEAN CA3, MODE CA3, MAX A3, MIN A3, MEAN A3, MODE A3, MIN D3, STD D3; y-axis: 0 to 100; series: K-mean Version 1, K-mean Version 2, K-mean Version 3]
RSP Feature Set II versus PCLA for K-means 1 to 3



Optimal Performing Parameters and Various Versions of K-Means on the Basis of PCLA

| Physiological Parameter | Feature File | Total Features | K-Means Version 1 | K-Means Version 2 | K-Means Version 3 |
| Skin Conductance | Feature Set I | 6 | NIL | NIL | NIL |
| Skin Conductance | Feature Set II | 18 | NIL | MOS | FE, MOFS, STDFS, MOS |
| Skin Conductance | Feature Set III (SC_DB3_L6) | 77 | MAX, MIN, MEAN and MODE of CA6 and A6; Entropy (9 features) | NIL | MAX, MIN, MEAN and MODE of CA6 and A6; Entropy (9 features) |
| Oximetry Pulse | Feature Set I (OP_BIOR1.1_L4) | 60 | MAX A4 | NIL | MAX CA4 and MAX A4 |
| Oximetry Pulse | Feature Set II (OP_DMey_L8) | 110 | MAX, MEAN of CA8; VAR, MODE of CD1; MAX, MEAN, MODE of A8; VAR CD2; MODE D8; Entropy (10 features) | MAX CA8 | MAX, MEAN of CA8; MEAN CD7; MEAN CD8; MAX, MEAN, MIN, MODE of A8; MIN and MODE of D8 (10 features) |
| Respiration | Feature Set I (RSP_DMey_L4) | 56 | MAX, MIN, MEAN, MODE of CA4 and A4; Entropy (9 features) | NIL | MAX, MIN, MEAN, MODE of CA4 and A4; Entropy (9 features) |
| Respiration | Feature Set II (RSP_DB6_L3) | 33 | MAX, MIN, MEAN and MODE of CA3 and A3 (8 features) | NIL | MAX, MIN, MEAN, MODE of CA3 and A3 (8 features) |



CONCLUSION & FUTURE SCOPE
After exhaustive experimentation, the following conclusions were reached:
 K-means version 1 (Basic K-means) and version 3 performed very well for some of the features of all three physiological parameters, namely SC, OP and RSP.
 Although K1 and K3 performed well for some of the features, the tests were carried out on individual features rather than on combinations; hence these features alone can be used for the detection of cognitive fatigue, as the classification accuracies for both classes were up to 100%.
 For Skin Conductance (SC) Feature Set II, the MOS feature performed well with K-means version 2, while FE, MOS, MOFS and STDFS worked well with K-means version 3.
 In SC Feature Set III, which was based on Daubechies order 3 and level 6 decomposition, the approximation coefficients of the decomposed as well as the reconstructed signal, i.e., CA6 and A6, and the entropy features performed best with K-means versions 1 and 3.
 For the Oximetry Pulse signal with biorthogonal wavelet features, only the approximation coefficients of the decomposed and reconstructed signals, i.e., CA4 and A4, performed well with K-means versions 1 and 3.
 For the OP discrete Meyer feature set, the approximation coefficients of the decomposed and reconstructed wave, along with some detail coefficients, performed very well with versions 1 and 3, whereas MAX CA8 also performed well with K-means version 2.
 For the RSP discrete Meyer as well as Daubechies features, the approximation coefficients of the decomposed and reconstructed components performed best with K-means versions 1 and 3.
CONCLUSION & FUTURE SCOPE

Major Contributions & Achievements


 Carried out an in-depth literature review of research papers related to cognitive fatigue detection systems and K-means based applications.
 Selected simple physiological parameters that worked very well in Dr. Mahesh Bundele's work.
 Used real-time data collected by Dr. Mahesh Bundele and processed it further in MATLAB to obtain appropriate statistical and wavelet features.
 Attempted to find an alternative approach for the classification of fatigued and non-fatigued physiological signals by implementing three versions of the K-means algorithm in Java.
 Carried out testing of all three algorithms on three physiological parameters, namely SC, OP and RSP.
 The test results showed that some of the features of all the parameters could individually be classified with up to 100% accuracy by K-means versions 1 and 3.
