
Association Rule Mining

Rule mining
Goal: find the frequent itemsets.
Some itemset, e.g. {a,b,c}, appears frequently, i.e. with support higher than a given threshold.
Rules can be derived from the itemset: A → B
If {a,b,c} is frequent, then a,b → c, a → b,c, etc.

Metrics
Support = # of occurrences of the itemset / total # of transactions
i.e., Pr(A, B)

Confidence = # of occurrences of the itemset / # of occurrences of the left-hand side of the rule
i.e., the conditional probability Pr(B | A)
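
The two metrics above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `support` and `confidence` and the toy database `TRANSACTIONS` are assumptions chosen for the example.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy database: each transaction is a set of items.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ac"),
    frozenset("bc"),
    frozenset("abc"),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    target = frozenset(itemset)
    hits = sum(1 for t in transactions if target <= t)
    return hits / len(transactions)


def confidence(lhs: Iterable[str], rhs: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Estimate Pr(rhs | lhs): occurrences of lhs and rhs together / occurrences of lhs."""
    left = frozenset(lhs)
    both = left | frozenset(rhs)
    left_count = sum(1 for t in transactions if left <= t)
    if left_count == 0:
        return 0.0  # the rule never fires, so treat its confidence as 0
    both_count = sum(1 for t in transactions if both <= t)
    return both_count / left_count
```

On this toy database, {a,b} occurs in 3 of 5 transactions, and a occurs in 4, so the rule a → b has support 3/5 and confidence 3/4.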

Association Rule Hiding: what? why?? and how???

Problem: hide sensitive association rules in data without losing non-sensitive ones.
Motivation: large repositories of data contain confidential rules whose disclosure can have serious adverse effects.
Solutions:
Traditional: data modification (distortion, blocking); fine-tuning, controls the hiding effects indirectly
New and promising: data reconstruction; knowledge sanitization, controls the hiding effects directly

Data modification

Data modification: these methods hide sensitive association rules by directly modifying the original data. According to the means of modification, they can be further classified into two subcategories: Data-Distortion techniques and Data-Blocking techniques.

Data-Distortion is implemented by deleting or adding items to reduce the support or confidence of the sensitive rules. Data-Blocking is implemented by replacing certain items with a question mark "?" so that the support and confidence of the sensitive rules become uncertain. As traditional methods, data modification techniques are simple to operate. However, they offer only fine-tuning of how the released database is generated, so they cannot control the hiding effects directly and clearly.
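
To make the two subcategories concrete, here is a minimal sketch under stated assumptions. It is not the algorithm of any particular paper: the victim item is chosen by the caller, transactions are visited in their given order, and only support (not confidence) is reduced. The names `distort`, `block`, and `support_interval` are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# A blocked transaction: (items known to be present, items replaced by "?").
BlockedTransaction = Tuple[Set[str], Set[str]]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    target = frozenset(itemset)
    return sum(1 for t in transactions if target <= t) / len(transactions)


def distort(transactions: List[FrozenSet[str]], sensitive_itemset: Iterable[str],
            victim: str, min_support: float) -> List[FrozenSet[str]]:
    """Distortion: delete `victim` from supporting transactions until the
    sensitive itemset falls below `min_support`."""
    target = frozenset(sensitive_itemset)
    if victim not in target:
        raise ValueError("victim must belong to the sensitive itemset")
    released = [set(t) for t in transactions]
    count = sum(1 for t in released if target <= t)
    for t in released:
        if count / len(released) < min_support:
            break  # already hidden, stop modifying data
        if target <= t:
            t.discard(victim)
            count -= 1
    return [frozenset(t) for t in released]


def block(transactions: List[FrozenSet[str]], sensitive_itemset: Iterable[str],
          victim: str, min_support: float) -> List[BlockedTransaction]:
    """Blocking: replace `victim` by "?" instead of deleting it."""
    target = frozenset(sensitive_itemset)
    if victim not in target:
        raise ValueError("victim must belong to the sensitive itemset")
    released: List[BlockedTransaction] = [(set(t), set()) for t in transactions]
    count = sum(1 for known, _ in released if target <= known)
    for known, unknown in released:
        if count / len(released) < min_support:
            break
        if target <= known:
            known.discard(victim)
            unknown.add(victim)  # presence of the item is now uncertain
            count -= 1
    return released


def support_interval(itemset: Iterable[str],
                     blocked: List[BlockedTransaction]) -> Tuple[float, float]:
    """Lowest and highest support consistent with the "?" marks."""
    target = frozenset(itemset)
    low = sum(1 for known, _ in blocked if target <= known)
    high = sum(1 for known, unknown in blocked if target <= (known | unknown))
    return low / len(blocked), high / len(blocked)
```

With blocking, a miner can only bound the support of the sensitive itemset by an interval, which is what makes the rule uncertain rather than plainly absent.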

Data reconstruction

Unlike data modification, data reconstruction is a new and promising method for the association rule hiding problem. This approach is based on knowledge sanitization. Specifically, it hides the rules by sanitizing the itemset lattice, called a knowledge base, rather than sanitizing the original dataset. A reconstruction procedure then builds a new released dataset from the sanitized itemset lattice. In this way, data reconstruction gives the user a knowledge-level window to control the availability of rules conveniently and to control the hiding effects directly.
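
The sanitization step can be illustrated as follows. This is only a sketch under an assumption not stated in the slides: a rule A → B is hidden by removing its generating itemset A ∪ B from the collection of frequent itemsets, together with every superset, so that the sanitized collection stays downward closed. The reconstruction step is not shown, and the name `sanitize_frequent_sets` is hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (left-hand side, right-hand side)


def sanitize_frequent_sets(frequent_sets: Dict[FrozenSet[str], int],
                           sensitive_rules: Iterable[Rule]) -> Dict[FrozenSet[str], int]:
    """Drop every frequent itemset that contains the generating itemset of a
    sensitive rule; all other itemsets keep their support counts."""
    generating = [lhs | rhs for lhs, rhs in sensitive_rules]
    return {
        itemset: count
        for itemset, count in frequent_sets.items()
        if not any(g <= itemset for g in generating)
    }
```

For example, hiding a → b from the frequent sets {a}, {b}, {c}, {a,b}, {a,c}, {b,c}, {a,b,c} removes {a,b} and {a,b,c} and leaves the other five untouched.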

Motivation
Data mining + Data sharing + Privacy preserving → Privacy Preserving Data Mining (PPDM)

Two problems addressed in PPDM:
the protection of private data
the protection of sensitive rules (knowledge) contained in the data

Problem statement
Given
a database D to be released
minimum support and confidence thresholds MST, MCT
a set of association rules R mined from D
a set of sensitive rules Rh ⊂ R to be hidden

Find a new database D' such that
the rules in Rh cannot be mined from D'
as many of the rules in R − Rh as possible can still be mined from D'
This is the KHD (Knowledge Hiding in Database) problem
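
The two requirements on the released database can be checked mechanically. The sketch below is an illustration only: `is_minable` and `evaluate_release` are hypothetical names, and a rule is taken to be minable when its support reaches MST and its confidence reaches MCT.

```python
from typing import FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (left-hand side, right-hand side)


def is_minable(rule: Rule, transactions: List[FrozenSet[str]],
               mst: float, mct: float) -> bool:
    """True if the rule meets both the support and the confidence threshold."""
    lhs, rhs = rule
    lhs_count = sum(1 for t in transactions if lhs <= t)
    both_count = sum(1 for t in transactions if (lhs | rhs) <= t)
    if lhs_count == 0:
        return False
    return both_count / len(transactions) >= mst and both_count / lhs_count >= mct


def evaluate_release(released: List[FrozenSet[str]], rules: List[Rule],
                     sensitive: List[Rule], mst: float,
                     mct: float) -> Tuple[List[Rule], List[Rule]]:
    """Return (sensitive rules still minable, non-sensitive rules lost).

    A perfect release has an empty first list; the second list should be
    as short as possible.
    """
    exposed = [r for r in sensitive if is_minable(r, released, mst, mct)]
    lost = [r for r in rules
            if r not in sensitive and not is_minable(r, released, mst, mct)]
    return exposed, lost
```

The first list measures hiding failures and the second measures the side effect on R − Rh, which is the quantity the problem asks to minimize.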

Related work
Data modification approaches
Basic idea: data sanitization, D → D'
Current status: distortion, blocking; flourishing
Drawbacks: cannot control hiding effects intuitively; lots of I/O

Data reconstruction approaches
Basic idea: knowledge sanitization, D → K → D'
Current status: limited
Advantages: can easily control the availability of rules and control the hiding effects directly and intuitively

Framework of our approach


1. Frequent set mining: D → FS
2. Perform sanitization algorithm: FS → FS' (guided by R and Rh)
3. FP-tree-based inverse frequent set mining: FS' → FP-tree → D'
