
Association Rule Mining - MaxMiner

Mining Association Rules in Large Databases

Association rule mining

Algorithms Apriori and FP-Growth

Max and closed patterns

Mining various kinds of association/correlation rules

Max-patterns & Closed-patterns

If there are frequent patterns with many items, enumerating all of them is costly.
We may be interested in finding only the boundary frequent patterns.
Two types: max-patterns and closed patterns.

Max-patterns

A frequent pattern $\{a_1, \ldots, a_{100}\}$ has $\binom{100}{1} + \binom{100}{2} + \cdots + \binom{100}{100} = 2^{100} - 1 \approx 1.27 \times 10^{30}$ frequent sub-patterns!
Max-pattern: a frequent pattern that has no proper frequent super-pattern.
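The sub-pattern count quoted above can be checked directly. The following is a minimal sketch; the function name `count_nonempty_subpatterns` is made up for this illustration.

```python
from math import comb


def count_nonempty_subpatterns(n: int) -> int:
    """Sum C(n, 1) + C(n, 2) + ... + C(n, n) for a pattern with n items."""
    return sum(comb(n, k) for k in range(1, n + 1))


total = count_nonempty_subpatterns(100)
print(total == 2**100 - 1)  # True
print(f"{total:.2e}")       # 1.27e+30
```

Every non-empty subset of a frequent pattern is itself frequent, which is why the count grows as $2^n - 1$.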

Example (Min_sup = 2):

Tid  Items
10   A,B,C,D,E
20   B,C,D,E
30   A,C,D,F

BCDE and ACD are max-patterns.
BCD is not a max-pattern.
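The max-patterns of this small database can be confirmed by brute force. This is only a sketch for illustration: it enumerates every itemset, which is exactly the cost MaxMiner is designed to avoid, and the helper names are assumptions rather than part of the slides.

```python
from itertools import combinations

# Transactions from the example table.
transactions = {
    10: {"A", "B", "C", "D", "E"},
    20: {"B", "C", "D", "E"},
    30: {"A", "C", "D", "F"},
}
MIN_SUP = 2


def support(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for items in db.values() if itemset <= items)


def frequent_itemsets(db, min_sup):
    """Map every frequent itemset to its support (exhaustive enumeration)."""
    items = sorted(set().union(*db.values()))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support(frozenset(combo), db)
            if s >= min_sup:
                result[frozenset(combo)] = s
    return result


def max_patterns(frequent):
    """Frequent itemsets that have no proper frequent superset."""
    return [p for p in frequent if not any(p < q for q in frequent)]


frequent = frequent_itemsets(transactions, MIN_SUP)
maximal = sorted("".join(sorted(p)) for p in max_patterns(frequent))
print(maximal)  # ['ACD', 'BCDE']
```

BCD is frequent here but is a proper subset of BCDE, so it is filtered out.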

Maximal Frequent Itemset

An itemset is maximal frequent if none of its immediate supersets is frequent.

[Figure: itemset lattice with a border separating the maximal frequent itemsets from the infrequent itemsets]

Closed Itemset

An itemset is closed if none of its immediate supersets has the same support as the itemset.

Itemset     Support
{A,B,C}     2
{A,B,D}     3
{A,C,D}     2
{B,C,D}     3
{A,B,C,D}   2
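The definition can be applied mechanically to the rows of the support table. The sketch below only sees the itemsets listed in the table, so its answer for {A,B,C,D} assumes that no larger itemset exists; `support_table` and `is_closed` are names chosen for this illustration.

```python
# Supports copied from the table above.
support_table = {
    frozenset("ABC"): 2,
    frozenset("ABD"): 3,
    frozenset("ACD"): 2,
    frozenset("BCD"): 3,
    frozenset("ABCD"): 2,
}


def is_closed(itemset, table):
    """True if no listed immediate superset has the same support."""
    return not any(
        len(other) == len(itemset) + 1
        and itemset < other
        and table[other] == table[itemset]
        for other in table
    )


for itemset in sorted(support_table, key=lambda s: (len(s), sorted(s))):
    name = "".join(sorted(itemset))
    print(name, support_table[itemset], is_closed(itemset, support_table))
```

{A,B,C} and {A,C,D} share support 2 with their superset {A,B,C,D}, so they are not closed, while {A,B,D} and {B,C,D} have strictly higher support than it and are closed.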

Maximal vs Closed Itemsets

TID  Items
1    ABC
2    ABCD
3    BCE
4    ACDE
5    DE

[Figure: itemset lattice from null to ABCDE; each node is labeled with the IDs of the transactions that contain it (e.g. A: 124, B: 123, C: 1234, D: 245, E: 345, AB: 12), and itemsets such as ABCDE are not supported by any transactions]
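The transaction-ID labels on the lattice nodes can be derived by intersecting the ID sets of the individual items. This is a small sketch using the five transactions above; `item_tids`, `tidlist` and `label` are illustrative names.

```python
from functools import reduce

transactions = {1: "ABC", 2: "ABCD", 3: "BCE", 4: "ACDE", 5: "DE"}

# Vertical layout: item -> set of IDs of the transactions containing it.
item_tids = {}
for tid, items in transactions.items():
    for item in items:
        item_tids.setdefault(item, set()).add(tid)


def tidlist(itemset):
    """IDs of the transactions that contain every item of `itemset`."""
    return reduce(set.intersection, (item_tids[i] for i in itemset))


def label(itemset):
    """Lattice label: the sorted transaction IDs written as one string."""
    return "".join(str(t) for t in sorted(tidlist(itemset)))


print(label("A"))                         # 124
print(label("AB"))                        # 12
print(label("ACD"))                       # 24
print(label("ABCDE") or "not supported")  # not supported
```

The support of an itemset is the length of its ID set, so an empty intersection marks an itemset that no transaction supports.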

Maximal vs Closed Frequent Itemsets

Minimum support = 2

[Figure: the itemset lattice with transaction-ID labels, with the frequent itemsets marked as either "closed but not maximal" or "closed and maximal"]

# Closed = 9
# Maximal = 4
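The two counts can be reproduced from the five transactions (ABC, ABCD, BCE, ACDE, DE) with an exhaustive check. This is a sketch for verification only, with helper names invented here.

```python
from itertools import combinations

transactions = ["ABC", "ABCD", "BCE", "ACDE", "DE"]
MIN_SUP = 2
items = sorted(set("".join(transactions)))


def support(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= set(t))


# All frequent itemsets with their supports.
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        s = support(combo)
        if s >= MIN_SUP:
            frequent[frozenset(combo)] = s


def immediate_supersets(itemset):
    """Itemsets obtained by adding exactly one more item."""
    return [itemset | {i} for i in items if i not in itemset]


# Closed: no immediate superset has the same support.
closed = [p for p, s in frequent.items()
          if all(support(q) != s for q in immediate_supersets(p))]
# Maximal: no immediate superset is frequent.
maximal = [p for p in frequent
           if all(support(q) < MIN_SUP for q in immediate_supersets(p))]


def fmt(patterns):
    return sorted("".join(sorted(p)) for p in patterns)


print(len(closed), fmt(closed))
# 9 ['ABC', 'AC', 'ACD', 'BC', 'C', 'CE', 'D', 'DE', 'E']
print(len(maximal), fmt(maximal))
# 4 ['ABC', 'ACD', 'CE', 'DE']
```

Every maximal frequent itemset is also closed, which is why the four maximal itemsets appear in both lists.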


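MaxMiner: Mining Max-patterns

Idea: generate the complete set-enumeration tree one level at a time, pruning where applicable.

Each node is written as its itemset followed, in parentheses, by the items that can still be appended to it.

Level 0: (ABCD)
Level 1: A (BCD)   B (CD)   C (D)   D ()
Level 2: AB (CD)   AC (D)   AD ()   BC (D)   BD ()   CD ()
Level 3: ABC (D)   ABD ()   ACD ()   BCD ()
Level 4: ABCD ()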
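The level-by-level construction of the set-enumeration tree can be sketched as a short generator. No pruning is applied here, and items are assumed to be single characters so that an itemset can be written as a string.

```python
def set_enumeration_levels(items):
    """Yield each level of the set-enumeration tree as (head, tail) pairs."""
    level = [("", items)]  # root: empty head, every item in the tail
    while level:
        yield level
        # The child for the k-th tail item appends that item to the head
        # and keeps only the items that come after it in the tail.
        level = [(head + tail[k], tail[k + 1:])
                 for head, tail in level
                 for k in range(len(tail))]


for depth, level in enumerate(set_enumeration_levels("ABCD")):
    nodes = "   ".join(f"{head} ({tail})".strip() for head, tail in level)
    print(f"Level {depth}: {nodes}")
```

For four items this visits 1 + 4 + 6 + 4 + 1 = 16 nodes, one per subset, which is why pruning matters as the number of items grows.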

Local Pruning Techniques (e.g. at node A)

Check the frequency of ABCD and of AB, AC, AD.

If ABCD is frequent, prune the whole sub-tree.

If AC is NOT frequent, remove C from the parenthesis before expanding.
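The two local checks at a single node can be sketched as follows. The data is assumed to be the three-transaction database from the max-pattern example (Min_sup = 2), and `local_prune` is a name invented for this illustration.

```python
# Assumed data: the three transactions from the max-pattern example.
transactions = [set("ABCDE"), set("BCDE"), set("ACDF")]
MIN_SUP = 2


def support(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def local_prune(head, tail):
    """Return None if the whole sub-tree can be pruned, else the reduced tail."""
    # Check 1: head plus the full tail is frequent, so nothing below is needed.
    if support(head + tail) >= MIN_SUP:
        return None
    # Check 2: drop every tail item whose union with the head is infrequent.
    return "".join(i for i in tail if support(head + i) >= MIN_SUP)


tail = local_prune("A", "BCDE")
print(tail)  # CD
children = [("A" + tail[k], tail[k + 1:]) for k in range(len(tail))]
print(children)  # [('AC', 'D'), ('AD', '')]
print(local_prune("B", "CDE"))  # None
```

At node A the items B and E are removed because AB and AE each occur in only one transaction. At node B the whole sub-tree is skipped because BCDE is already frequent.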

Algorithm MaxMiner

Initially, generate one node N = (ABCD), where $h(N) = \emptyset$ and $t(N) = \{A, B, C, D\}$.

Consider expanding N:

If $h(N) \cup t(N)$ is frequent, do not expand N.

If for some $i \in t(N)$, $h(N) \cup \{i\}$ is NOT frequent, remove i from t(N) before expanding N.

Apply global pruning techniques.

Global Pruning Technique (across sub-trees)

When a max pattern is identified (e.g. ABCD), prune all nodes N (e.g. B, C and D) where $h(N) \cup t(N)$ is a subset of it (e.g. ABCD).
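Putting the local and global pruning rules together gives the following breadth-first sketch. It is a simplified illustration under stated assumptions, not the published MaxMiner implementation: items are single characters, support is counted by scanning all transactions, and the names `max_miner` and `record` are invented here.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= set(t))


def max_miner(transactions, items, min_sup):
    """Breadth-first sketch; heads and tails are strings of single-char items."""
    max_patterns = []  # frozensets, kept free of mutual subsets

    def record(candidate):
        cand = frozenset(candidate)
        if any(cand <= m for m in max_patterns):
            return  # already covered by a known pattern
        # Drop earlier candidates that the new pattern subsumes.
        max_patterns[:] = [m for m in max_patterns if not m < cand]
        max_patterns.append(cand)

    level = [("", items)]  # root: h(N) is empty, t(N) holds all items
    while level:
        next_level = []
        for head, tail in level:
            # Global pruning: h(N) union t(N) lies inside a known max pattern.
            if any(set(head + tail) <= m for m in max_patterns):
                continue
            # Local pruning 1: h(N) union t(N) is frequent, so do not expand.
            if support(head + tail, transactions) >= min_sup:
                record(head + tail)
                continue
            # Local pruning 2: remove tail items i with h(N) union {i} infrequent.
            tail = "".join(i for i in tail
                           if support(head + i, transactions) >= min_sup)
            if head and not tail:
                record(head)  # frequent head that cannot be extended here
            next_level += [(head + tail[k], tail[k + 1:])
                           for k in range(len(tail))]
        level = next_level
    return sorted("".join(sorted(m)) for m in max_patterns)


example_db = ["ABCDE", "BCDE", "ACDF"]
print(max_miner(example_db, "ABCDEF", 2))  # ['ACD', 'BCDE']
```

On the three-transaction database the sketch expands only the root, A, B and AC; nodes C, D, E and AD are skipped by the global rule once BCDE and ACD are known.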

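Example

Tid  Items
10   A,B,C,D,E
20   B,C,D,E
30   A,C,D,F

Min_sup = 2

(ABCDEF)
A (BCDE)   B (CDE)   C (DE)   D (E)   E ()

Items    Frequency
ABCDEF   0
A        2
B        2
C        3
D        3
E        2
F        1

ABCDEF is not frequent, so the root is expanded. F is infrequent, so it is removed from the tail first.

Max patterns: none yet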

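Example: Node A

(ABCDEF)
A (BCDE)   B (CDE)   C (DE)   D (E)   E ()
AC (D)   AD ()

Min_sup = 2

Items    Frequency
ABCDE    1
AB       1
AC       2
AD       2
AE       1

ABCDE is not frequent, so node A is expanded. AB and AE are infrequent, so B and E are removed from the tail, leaving the children AC (D) and AD ().

Max patterns: none yet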

Example: Node B

(ABCDEF)
A (BCDE)   B (CDE)   C (DE)   D (E)   E ()
AC (D)   AD ()

Min_sup = 2

Items    Frequency
BCDE     2
BC       2
BD       2
BE       2

BCDE is frequent, so node B is not expanded.

Max patterns: BCDE

Example: Node AC

(ABCDEF)
A (BCDE)   B (CDE)   C (DE)   D (E)   E ()
AC (D)   AD ()

Min_sup = 2

Items    Frequency
ACD      2

ACD is frequent, so node AC is not expanded.

Max patterns: BCDE, ACD
