
Mining Association Rules using Optimal
Genetic Algorithm & Quantum Swarm
Intelligent PSO

Presented by
K. Indira

Under the guidance of
Dr. S. Kanmani,
Professor,
Department of Information Technology,
Pondicherry Engineering College.
Contents
Objective.
Introduction.
Data Mining.
Association Analysis.
Limitations of the existing system.
GA and PSO: An Introduction.
Existing Work.
  Based on GA.
  Based on PSO.
Work Done So Far.
Proposed Work.
Execution Plan.
Papers Published.
References.
Objective
To propose an efficient methodology for
mining association rules (ARs) using an Optimal Genetic
Algorithm and Quantum Swarm Intelligent PSO.
Data Mining
Data mining is the extraction of interesting information or
patterns from data in large databases.
Association Analysis
Association analysis is the discovery of what are
commonly called association rules.
It studies the frequency of items occurring together in
transactional databases.
Association rule mining provides valuable
information for assessing significant correlations.
Association Rules
Find all the rules X → Y with minimum support and confidence.
Support, s: probability that a transaction contains X ∪ Y.
Confidence, c: conditional probability that a transaction having X also contains Y.

Tid   Items bought
10    Milk, Nuts, Sugar
20    Milk, Coffee, Sugar
30    Milk, Sugar, Eggs
40    Nuts, Eggs, Bread
50    Nuts, Coffee, Sugar, Eggs, Bread

[Figure: Venn diagram of customers who buy milk, customers who buy sugar, and customers who buy both]

Let minsup = 50%, minconf = 50%.
Frequent patterns: Milk:3, Nuts:3, Sugar:4, Eggs:3, {Milk, Sugar}:3
Association rules:
Milk → Sugar (60%, 100%)
Sugar → Milk (60%, 75%)
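
The support and confidence figures for the milk and sugar rules can be checked with a short script. This is a minimal sketch: the function names are illustrative and the data is the five-transaction example from the slide.

```python
# Transactions from the example table: Tid -> set of items bought.
TRANSACTIONS = {
    10: {"Milk", "Nuts", "Sugar"},
    20: {"Milk", "Coffee", "Sugar"},
    30: {"Milk", "Sugar", "Eggs"},
    40: {"Nuts", "Eggs", "Bread"},
    50: {"Nuts", "Coffee", "Sugar", "Eggs", "Bread"},
}


def support_count(itemset, transactions=TRANSACTIONS):
    """Number of transactions that contain every item in itemset."""
    wanted = set(itemset)
    return sum(1 for items in transactions.values() if wanted <= items)


def support(itemset, transactions=TRANSACTIONS):
    """Fraction of transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions=TRANSACTIONS):
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support_count(both, transactions) / support_count(antecedent, transactions)


if __name__ == "__main__":
    print(support({"Milk", "Sugar"}))       # 0.6
    print(confidence({"Milk"}, {"Sugar"}))  # 1.0
    print(confidence({"Sugar"}, {"Milk"}))  # 0.75
```

Confidence is computed from raw counts rather than from the two support fractions, which avoids floating-point rounding in the division.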
Limitations of Existing System
Apriori, FP-Growth and Eclat are some of the
popular algorithms for mining ARs.
These algorithms traverse the database many times.
I/O overhead and computational complexity are high.
They cannot meet the requirements of large-scale
database mining.
GA and PSO: An Introduction
Evolutionary algorithms provide a robust and
efficient approach to exploring large search spaces.

A Genetic Algorithm (GA) is a procedure used to
find approximate solutions to search problems
through the application of the principles of
evolutionary biology.

The mechanism of Particle Swarm Optimization (PSO) is
inspired by the social and cooperative behavior displayed
by various species, such as birds and fish, as well as human beings.

Existing Work
Mining ARs Based on Genetic Algorithm
Efficient Distributed Genetic Algorithm: spatially partitions the
population into several semi-isolated nodes, each evolving in
parallel and possibly exploring different regions of the search space.

Genetic algorithm that does not take minimum support and
confidence into account: extracts the best rules, those with the best
correlation between support and confidence.

Improved niched Pareto genetic algorithm (INPGA): selects
accurate candidates and saves selection time by combining
BNPGA and SDNPGA.

Genetic relation algorithm (GRA) with a new operator called guided
mutation: considers the correlation coefficient between nodes in each
individual of GRA.

Existing Work (contd.)
Mining ARs Based on Particle Swarm Optimization
A novel algorithm for association rule mining that improves
computational efficiency and automatically determines
suitable threshold values.

An algorithm that operates at three evolution levels and uses an adaptive
inertia weight. A safety distance is introduced to move the particle
through its current position, together with a proximity index.

A self-adaptive method that adjusts the inertia weight of the velocity update
rule based on empirical values and a negative feedback technique,
which relieves the burden of specifying parameter values.

A hybrid that combines Particle Swarm Optimization (PSO) and Genetic Algorithms
(GAs), using fuzzy logic to integrate the results of both methods and to tune
parameters. The method combines the advantages of PSO and GA into an
improved FPSO + FGA hybrid approach.
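
Several of the works above tune the inertia weight of the PSO velocity update rule. For reference, the standard update with a fixed inertia weight can be sketched as follows. The function name `pso_step` and the default values of `w`, `c1` and `c2` are illustrative assumptions, not values taken from the surveyed papers.

```python
import random


def pso_step(position, velocity, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=random):
    """One standard PSO update for a single particle.

    w  : inertia weight (how much of the old velocity is kept)
    c1 : cognitive coefficient (pull toward the particle's own best, pbest)
    c2 : social coefficient (pull toward the swarm's best, gbest)
    """
    new_position, new_velocity = [], []
    for x, v, pb, gb in zip(position, velocity, pbest, gbest):
        r1, r2 = rng.random(), rng.random()
        v_new = w * v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x)
        new_velocity.append(v_new)
        new_position.append(x + v_new)
    return new_position, new_velocity


if __name__ == "__main__":
    pos, vel = pso_step([0.2, 0.8], [0.0, 0.0], pbest=[0.3, 0.7], gbest=[0.5, 0.5])
    print(pos, vel)
```

An adaptive scheme would recompute `w` before each call instead of keeping it constant.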
Work Done So Far
Association rule mining was carried out using a
Genetic Algorithm in Matlab 2008a.

Association rule mining was carried out using a Self
Adaptive Genetic Algorithm in Java.

The GA parameters were varied and the results were
recorded for each case.

Mining ARs using GA in Matlab 2008a.
Methodology

Selection : Tournament

Crossover Probability : Fixed ( Tested with 3 values)

Mutation Probability : No Mutation

Fitness Function :

Dataset : Lenses, Iris, Haberman from the
UCI Machine Learning Repository.

Population : Fixed ( Tested with 3 values)
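
Tournament selection, the selection scheme listed above, can be sketched as follows. The tournament size `k` is an assumption, since the slides do not state it, and `tournament_select` is an illustrative name.

```python
import random


def tournament_select(population, fitness, k=2, rng=random):
    """Pick k distinct individuals at random and return the fittest of them.

    population : list of individuals (for example, encoded rules)
    fitness    : list of fitness values, aligned with population
    k          : tournament size (assumed; not given in the slides)
    """
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitness[i])
    return population[best]


if __name__ == "__main__":
    rules = ["rule_a", "rule_b", "rule_c", "rule_d"]
    scores = [0.2, 0.9, 0.5, 0.1]
    print(tournament_select(rules, scores, k=2))
```

A larger `k` increases selection pressure: with `k` equal to the population size the fittest individual is always chosen.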


[Figure: Flow chart of the GA]

Results Analysis

Comparison based on variation in population size

Population:  No. of instances       No. of instances * 1.25   No. of instances * 1.5
Dataset      Accuracy %  No. Gen.   Accuracy %  No. Gen.      Accuracy %  No. Gen.
Lenses       75          7          82          12            95          17
Haberman     71          114        68          88            64          70
Iris         77          88         87          53            82          45

Comparison based on variation in minimum support and confidence

Thresholds:  Sup = 0.4, Con = 0.4   Sup = 0.9, Con = 0.9   Sup = 0.9, Con = 0.2   Sup = 0.2, Con = 0.9
Dataset      Accuracy %  No. Gen.   Accuracy %  No. Gen.   Accuracy %  No. Gen.   Accuracy %  No. Gen.
Lenses       22          20         49          11         70          21         95          18
Haberman     45          68         58          83         71          90         62          75
Iris         40          28         59          37         78          48         87          55
Comparison based on variation in crossover probability

Crossover:   Pc = 0.25              Pc = 0.5               Pc = 0.75
Dataset      Accuracy %  No. Gen.   Accuracy %  No. Gen.   Accuracy %  No. Gen.
Lenses       95          8          95          16         95          13
Haberman     69          77         71          83         70          80
Iris         84          45         86          51         87          55

Comparison of the optimum parameter values for the maximum accuracy achieved

Dataset    No. of     No. of      Population  Minimum  Minimum     Crossover  Accuracy
           instances  attributes  size        support  confidence  rate       in %
Lenses     24         4           36          0.2      0.9         0.25       95
Haberman   306        3           306         0.9      0.2         0.5        71
Iris       150        5           225         0.2      0.9         0.75       87

Inferences
The values of minimum support, minimum confidence and
population size determine the accuracy of the system
more than the other GA parameters do.
The crossover rate affects the convergence rate rather than the
accuracy of the system.
The optimum values of the GA parameters vary from dataset
to dataset, and the fitness function plays a major role in
optimizing the results.
The size of the dataset and the relationships between
its attributes influence the choice of parameter values.
Mining ARs using Self Adaptive GA in
Java.
Methodology

Selection : Roulette Wheel

Crossover Probability : Fixed ( Tested with 3 values)

Mutation Probability : Self Adaptive





Fitness Function :

Dataset : Lenses, Iris, Car from the
UCI Machine Learning Repository.

Population : Fixed ( Tested with 3 values)


Self Adaptive GA

Procedure SAGA

Begin
  Initialize population p(k);
  Define the crossover and mutation rate;
  Do
  {
    Do
    {
      Calculate support of all k rules;
      Calculate confidence of all k rules;
      Obtain fitness;
      Select individuals for crossover / mutation;
      Calculate the average fitness of the nth and (n-1)th generations;
      Calculate the maximum fitness of the nth and (n-1)th generations;
      Based on the fitness of the selected item, calculate the new crossover
      and mutation rate;
      Choose the operation to be performed;
    } k times;
  }
End
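
The slides do not give the formula used to recalculate the crossover and mutation rates. As a hedged illustration only, the sketch below uses a common fitness-based adaptation rule, which is not necessarily the one used in this work: individuals at or above the average fitness get a rate that shrinks as their fitness approaches the population maximum, while below-average individuals keep the full rate. The name `adaptive_rate` and the parameter `max_rate` are assumptions.

```python
def adaptive_rate(f, f_avg, f_max, max_rate):
    """Fitness-based adaptive rate (illustrative rule, not the authors' formula).

    f        : fitness of the individual being considered
    f_avg    : average fitness of the current population
    f_max    : maximum fitness of the current population
    max_rate : rate applied to below-average individuals
    """
    # Below-average individuals, or a fully converged population, get the full rate.
    if f < f_avg or f_max == f_avg:
        return max_rate
    # Good individuals are disturbed less; the best one is left untouched.
    return max_rate * (f_max - f) / (f_max - f_avg)


if __name__ == "__main__":
    fitness = [0.2, 0.5, 0.75, 1.0]
    f_avg = sum(fitness) / len(fitness)
    f_max = max(fitness)
    for f in fitness:
        print(f, adaptive_rate(f, f_avg, f_max, max_rate=0.5))
```

The same function can serve for both the crossover and the mutation rate by passing a different `max_rate` for each.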
Results Analysis

                Traditional GA         Self Adaptive GA
Dataset         Accuracy    No. Gen.   Accuracy    No. Gen.
Lenses          75          38         87.5        35
Haberman        52          36         68          28
Car Evaluation  85          29         96          21

                Traditional GA         Self Adaptive GA
Dataset         Accuracy    No. Gen.   Accuracy    No. Gen.
Lenses          50          35         87.5        35
Haberman        36          38         68          28
Car Evaluation  74          36         96          21

Accuracy comparison between GA and SAGA when parameters are
according to termination of SAGA
Accuracy comparison between GA and SAGA when parameters are ideal
for traditional GA
Inferences

Better accuracy.
Better convergence.
Self Adaptive GA gives better accuracy than
Traditional GA.

Proposed Work
1. To implement a distributed niched Pareto memetic
algorithm for rule mining.

2. To propose an association rule mining algorithm based
on chaotic PSO and swarm intelligence.

3. To propose a particle swarm optimization rule mining
methodology combined with quantum computing and
quantum differential evolution.
Niched Pareto Selection Algorithm
1. Obtain the comparison set S from clustering-based samples.
2. For any two candidates and the comparison set S, if one candidate is
dominated and the other is not, select the non-dominated candidate and exit.
3. Otherwise, for the two candidates (cd_1 and cd_2), compute the number of
samples in their two niches, count1 and count2.
4. If count1 = 0, select cd_1; if count2 = 0, select cd_2; exit.
5. If count1 - count2 > delta, select cd_2; if count2 - count1 > delta,
select cd_1; exit.
6. If abs(count1 - count2) < delta, compute the standard deviations of the
two niches, sd1 and sd2.
7. If sd1 > sd2, select cd_1; otherwise, select cd_2. Exit.
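
The selection steps can be sketched in Python as follows. This is an illustration under stated assumptions: candidates are tuples of objective values to be maximized (for example, support and confidence), the helpers `niche_count` and `niche_std` are hypothetical callables supplied by the caller, and the case abs(count1 - count2) = delta, which the steps leave open, is treated like the standard-deviation branch.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def is_dominated(candidate, comparison_set):
    """True if any member of the comparison set dominates the candidate."""
    return any(dominates(s, candidate) for s in comparison_set)


def niched_pareto_select(cd_1, cd_2, comparison_set, niche_count, niche_std, delta):
    """Choose between two candidates following the niched Pareto steps.

    niche_count(c) : number of samples in the niche of candidate c (hypothetical helper)
    niche_std(c)   : standard deviation of the niche of candidate c (hypothetical helper)
    """
    dom1 = is_dominated(cd_1, comparison_set)
    dom2 = is_dominated(cd_2, comparison_set)
    if dom1 != dom2:
        # Exactly one candidate is dominated: keep the non-dominated one.
        return cd_2 if dom1 else cd_1

    count1, count2 = niche_count(cd_1), niche_count(cd_2)
    if count1 == 0:
        return cd_1
    if count2 == 0:
        return cd_2
    if count1 - count2 > delta:
        return cd_2  # cd_1 sits in the more crowded niche
    if count2 - count1 > delta:
        return cd_1  # cd_2 sits in the more crowded niche

    # Niche sizes are close: decide on the niche standard deviation.
    return cd_1 if niche_std(cd_1) > niche_std(cd_2) else cd_2


if __name__ == "__main__":
    S = [(0.5, 0.5)]
    counts = {(0.6, 0.4): 5, (0.4, 0.6): 1}
    stds = {(0.6, 0.4): 0.1, (0.4, 0.6): 0.3}
    print(niched_pareto_select((0.6, 0.4), (0.4, 0.6), S, counts.get, stds.get, delta=2))
```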
Distributed Model
[Figure: Distributed model. Boxes: Full Dataset; GA1, GA2, GA3 and GA4 subpopulations; Rules Generated (one per subpopulation); Concept Description]
Association Rule Mining Algorithm Based on Chaotic
PSO and Swarm Intelligence
[Figure labels: Based on chaotic maps; Swarm Intelligence; Concept]
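
The slide says only that the chaotic PSO is based on chaotic maps and does not name one. As an illustration, the logistic map is one widely used chaotic map. In the sketch below its output stands in for the uniform random numbers a PSO would otherwise draw. The function name and the choice of map are assumptions.

```python
def logistic_map(x0, n, r=4.0):
    """Generate n values of the logistic map x <- r * x * (1 - x).

    With r = 4 and a seed in (0, 1) that avoids the fixed and periodic
    points (such as 0.25, 0.5 and 0.75), the sequence behaves chaotically
    and stays within [0, 1].
    """
    values = []
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


if __name__ == "__main__":
    # Chaotic numbers that could replace the random factors in a velocity update.
    sequence = logistic_map(0.3, 5)
    print(sequence)
```

Because the sequence is deterministic for a given seed, runs are reproducible without a separate random number generator.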
Execution Plan
July: Niched Pareto sampling based selection.
Implementing GA for local intensity search.

August: Distributed methodology implementation.
Preparing the above work as a paper.

September & October: Particle Swarm Optimization based
rule mining to be implemented.

November: Chaotic PSO & swarm intelligence based PSO
for mining ARs to be implemented.
Documenting the same as a paper.

December & January: Study of quantum computing and
differential evolution concepts.
Papers Published
The paper titled "Framework for Comparison of Association Rule
Mining Using Genetic Algorithm" was presented at the
International Conference on Computers, Communication &
Intelligence at VCET, 2010.
The paper titled "Mining Association Rules Using Genetic
Algorithm: The role of Estimation Parameters" has been
selected for presentation at the International Conference on
Advances in Computing and Communications, 2011. To be
published in the Springer LNCS (CCIS) series.
The paper titled "Rule Acquisition in Data Mining Using a Self
Adaptive Genetic Algorithm" has been selected for
presentation at the First International Conference on Computer
Science and Information Technology (CCSEIT-2011). To be
published in the Springer LNCS (CCIS) series.