Rule mining
Goal: find the frequent itemsets. Some itemset, e.g. {a,b,c}, appears frequently, i.e. more often than a certain support threshold.
Rules can be derived from the itemset: A -> B
If {a,b,c} is frequent, then rules such as a,b -> c and a -> b,c can be derived.
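The derivation of rules from a frequent itemset can be illustrated with a minimal sketch. It assumes the usual definitions (support of an itemset is the fraction of transactions containing it; confidence of A -> B is support(A and B together) divided by support(A)). The function names and the toy transactions are hypothetical and are not taken from the slides.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rules_from_itemset(itemset, transactions):
    """Yield (antecedent, consequent, confidence) for every non-trivial split."""
    itemset = frozenset(itemset)
    whole = support(itemset, transactions)
    for size in range(1, len(itemset)):
        for combo in combinations(sorted(itemset), size):
            antecedent = frozenset(combo)
            antecedent_support = support(antecedent, transactions)
            if antecedent_support == 0:
                continue  # the rule is undefined if the antecedent never occurs
            yield antecedent, itemset - antecedent, whole / antecedent_support


# Toy database (assumption): five transactions over the items a, b, c, d.
transactions = [frozenset(t) for t in ("abc", "abc", "ab", "ac", "bd")]

for antecedent, consequent, confidence in rules_from_itemset("abc", transactions):
    print(sorted(antecedent), "->", sorted(consequent), round(confidence, 2))
```

A three-item itemset yields six candidate rules; only those whose confidence reaches the minimum confidence threshold would be reported as association rules.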
Problem: hide sensitive association rules in the data without losing non-sensitive rules.
Motivation: large repositories of data contain confidential rules whose disclosure can have serious adverse effects.
Traditional: Solutions fine-tuning, control the
Data modification
distortion, blocking
Data reconstruction
Data modification
Data distortion is implemented by deleting or adding items to reduce the support or confidence of the sensitive rules. Data blocking is implemented by replacing certain items with a question mark (?) so that the support and confidence of the sensitive rules become uncertain. As traditional methods, data modification approaches are simple to operate. However, they lack a way of fine-tuning the generation of the released database, so they cannot control the hiding effects directly and visibly.
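The two modification styles can be contrasted with a small sketch. It is only an illustration under stated assumptions: a transaction is modeled as a dict mapping each present item to True, a blocked item is stored as "?", and the helper names (distort, block, support_interval) are hypothetical. How the items and transactions to modify are chosen is not covered here.

```python
UNKNOWN = "?"


def distort(transactions, item, victims):
    """Data distortion: delete `item` from the transactions whose index is in `victims`."""
    return [
        {k: v for k, v in t.items() if k != item} if i in victims else t
        for i, t in enumerate(transactions)
    ]


def block(transactions, item, victims):
    """Data blocking: replace `item` with '?' in the transactions whose index is in `victims`."""
    return [
        {**t, item: UNKNOWN} if i in victims and item in t else t
        for i, t in enumerate(transactions)
    ]


def support_interval(itemset, transactions):
    """(minimum, maximum) support: '?' counts as absent for the minimum, present for the maximum."""
    n = len(transactions)
    low = sum(all(t.get(i) is True for i in itemset) for t in transactions)
    high = sum(all(i in t for i in itemset) for t in transactions)
    return low / n, high / n


# Toy database (assumption): five transactions over the items a, b, c, d.
original = [dict.fromkeys(t, True) for t in ("abc", "abc", "ab", "ac", "bd")]
distorted = distort(original, "c", {0})
blocked = block(original, "c", {0})

print(support_interval("abc", original))
print(support_interval("abc", distorted))
print(support_interval("abc", blocked))
```

After distortion the support of {a,b,c} drops to a single definite value, while after blocking it is only known to lie within an interval, which is the uncertainty the blocking approach relies on.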
Data reconstruction
Unlike data modification, data reconstruction is a promising new method for the association rule hiding problem. This approach is based on knowledge sanitization. Specifically, it hides the rules by sanitizing the itemset lattice, called a knowledge base, rather than the original dataset. A reconstruction procedure then builds a new released dataset from the sanitized itemset lattice. In this way, data reconstruction gives the user a knowledge-level window to control the availability of rules conveniently and to control the hiding effects directly.
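To make the idea of sanitizing the itemset lattice concrete, here is a minimal sketch of one simple strategy. It is an assumption, not the sanitization algorithm of the slides: for every sensitive rule X -> Y, the itemset formed by X and Y together is removed from the frequent-set collection along with all of its supersets, so that the remaining collection stays downward closed. The function name and the toy supports are hypothetical, and the reconstruction of a dataset from the sanitized collection is not shown.

```python
def sanitize_frequent_sets(frequent_sets, sensitive_rules):
    """Drop every frequent itemset that contains the itemset of a sensitive rule.

    frequent_sets: dict mapping frozenset -> support count
    sensitive_rules: iterable of (antecedent, consequent) pairs
    """
    forbidden = [frozenset(x) | frozenset(y) for x, y in sensitive_rules]
    return {
        itemset: count
        for itemset, count in frequent_sets.items()
        if not any(f <= itemset for f in forbidden)
    }


# Toy knowledge base (assumption): frequent itemsets with their support counts.
frequent_sets = {
    frozenset("a"): 4, frozenset("b"): 4, frozenset("c"): 3,
    frozenset("ab"): 3, frozenset("ac"): 3, frozenset("bc"): 2,
    frozenset("abc"): 2,
}

# Hide the rule a -> c: {a,c} and its superset {a,b,c} are removed.
sanitized = sanitize_frequent_sets(frequent_sets, [("a", "c")])
print(sorted("".join(sorted(s)) for s in sanitized))
```

Because the edit happens at the level of itemsets, the user can see exactly which rules remain derivable before any dataset is generated.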
Motivation
Data mining, data sharing
Privacy preserving
Problem statement
Given
a database D to be released
minimum thresholds MST (support) and MCT (confidence)
a set of association rules R mined from D
a set of sensitive rules Rh ⊆ R to be hidden
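The inputs above suggest a simple check on a released database. The sketch below assumes the common convention that a rule counts as hidden when its support falls below MST or its confidence falls below MCT; the slides do not spell out this criterion, and the function names and toy data are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent), or 0.0 if undefined."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(frozenset(antecedent) | frozenset(consequent), transactions) / base


def is_hidden(rule, transactions, mst, mct):
    """A rule is treated as hidden if it can no longer be mined at thresholds MST and MCT."""
    antecedent, consequent = rule
    both = frozenset(antecedent) | frozenset(consequent)
    return (support(both, transactions) < mst
            or confidence(antecedent, consequent, transactions) < mct)


# Toy released database (assumption) and thresholds.
released = [frozenset(t) for t in ("ab", "abc", "ab", "ac", "bd")]
MST, MCT = 0.3, 0.6

sensitive = [("ab", "c")]      # Rh: rules that must be hidden
non_sensitive = [("a", "b")]   # rules from R outside Rh that should survive

print(all(is_hidden(r, released, MST, MCT) for r in sensitive))
print(sum(is_hidden(r, released, MST, MCT) for r in non_sensitive))
```

The first value reports whether every sensitive rule is hidden, and the second counts non-sensitive rules lost as a side effect.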
Related work
Data modification approaches
Basic idea: data sanitization D -> D'
Current status: distortion, blocking; well developed
Drawbacks:
Cannot control hiding effects intuitively; requires a lot of I/O
[Figure: diagram with nodes D, FS, FP-tree and rule sets R, Rh. Labeled steps: 2. Perform sanitization algorithm; 3. FP-tree-based inverse frequent set mining.]