Lecture 1: Overview of KDD
Lecture 2: Preparing data
Lecture 3: Decision tree induction
Partitioning Clustering
A partition of a set of n objects $X = \{x_1, x_2, \ldots, x_n\}$ is a collection of K disjoint non-empty subsets $P_1, P_2, \ldots, P_K$ of X ($K \le n$), often called clusters, satisfying the following conditions:
(1) they are disjoint: $P_i \cap P_j = \emptyset$ for all $P_i$ and $P_j$, $i \ne j$;
(2) their union is X: $P_1 \cup P_2 \cup \ldots \cup P_K = X$.
Denote the partition $P = \{P_1, P_2, \ldots, P_K\}$; the $P_i$ are called components of P.
Each cluster must contain at least one object.
Each object must belong to exactly one group.
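The two conditions above (disjoint components whose union is X, with no empty cluster) can be checked mechanically. The following is a minimal sketch using only the Python standard library; the function name `is_partition` and the example sets are illustrative assumptions, not part of the lecture material.

```python
from itertools import combinations


def is_partition(clusters, X):
    """Return True if `clusters` is a partition of the set X (hypothetical helper)."""
    clusters = [set(c) for c in clusters]
    # Each cluster must contain at least one object.
    if any(len(c) == 0 for c in clusters):
        return False
    # Condition (1): the clusters are pairwise disjoint.
    if any(a & b for a, b in combinations(clusters, 2)):
        return False
    # Condition (2): the union of all clusters is X.
    return set().union(*clusters) == set(X)


if __name__ == "__main__":
    X = {1, 2, 3, 4, 5}
    print(is_partition([{1, 2}, {3}, {4, 5}], X))     # a valid partition
    print(is_partition([{1, 2}, {2, 3}, {4, 5}], X))  # object 2 is in two clusters
```

Together, the disjointness and union checks enforce that each object belongs to exactly one group.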
Partitioning Clustering
What is a good partitioning clustering?
Key idea: objects within each group are similar to one another, and objects in different groups are dissimilar.
Hierarchical Clustering
Partition Q is nested into partition P if every component of Q is a subset of a component of P.
A hierarchical clustering is a sequence of partitions in which each partition is nested into the next partition in the sequence.
(This definition is for bottom-up hierarchical clustering. In the case of top-down hierarchical clustering, "next" becomes "previous".)
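The nesting definition can be illustrated with a short check. This is a sketch under stated assumptions: partitions are represented as lists of Python sets, and the names `is_nested` and `is_bottom_up_hierarchy` are hypothetical helpers introduced here for illustration.

```python
def is_nested(Q, P):
    """True if every component of Q is a subset of some component of P."""
    return all(any(set(q) <= set(p) for p in P) for q in Q)


def is_bottom_up_hierarchy(partitions):
    """True if each partition is nested into the next partition in the sequence."""
    return all(is_nested(a, b) for a, b in zip(partitions, partitions[1:]))


# Illustrative bottom-up sequence: singletons are merged step by step.
levels = [
    [{1}, {2}, {3}, {4}],
    [{1, 2}, {3}, {4}],
    [{1, 2}, {3, 4}],
    [{1, 2, 3, 4}],
]

if __name__ == "__main__":
    print(is_bottom_up_hierarchy(levels))
```

Reversing `levels` gives the top-down order, in which each partition is nested into the previous one instead.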
Concept Hierarchy
[Figure: concept hierarchy; labels: Discovered Concepts, Attributes]
1. Neural network representation
2. Feed-forward neural networks
3. Using back-propagation algorithm
4. Case-studies
Lecture 7
classification techniques.
Out-of-sample testing
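[Figure: Historical data (warehouse), sampling method, sample data, sampling method. 2/3 of the sample data becomes training data, which is passed to the induction method to produce a model; 1/3 becomes testing data, which is used for error estimation to obtain the error.]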
The quality of the test-sample estimate depends on the number of test cases and on the validity of the independence assumption.
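The out-of-sample procedure (2/3 of the sample for training, 1/3 for testing) can be sketched as follows. This is only an illustration under assumptions: `holdout_error` and the `majority_class` inducer are hypothetical names, the data is a list of (features, label) pairs, and a trivial majority-class rule stands in for whatever induction method is actually being evaluated.

```python
import random


def majority_class(train):
    """Toy induction method (assumption): always predict the most common training label."""
    labels = [y for _, y in train]
    top = max(set(labels), key=labels.count)
    return lambda x: top


def holdout_error(data, induce, train_fraction=2 / 3, seed=0):
    """Estimate the error rate of `induce` on a held-out test set."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)  # random sampling, so test cases are drawn independently of training
    cut = round(len(shuffled) * train_fraction)
    train, test = shuffled[:cut], shuffled[cut:]
    if not train or not test:
        raise ValueError("need at least one training and one test case")
    model = induce(train)  # the induction method sees only the training data
    wrong = sum(1 for x, y in test if model(x) != y)
    return wrong / len(test)


if __name__ == "__main__":
    data = [(i, "yes" if i % 3 else "no") for i in range(30)]
    print(holdout_error(data, majority_class))
```

With only a third of the sample used for testing, the estimate rests on few cases, so it can vary noticeably from one random split to another.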
Cross Validation
[Figure: Historical data (warehouse), sampling method, sample data, sampling method, Sample 1, Sample 2, ..., Sample n; the induction method is iterated over the samples to produce a model.]
[Figure: A data set is divided into three subsets (1, 2, 3); a method to be evaluated is run on each 2 subsets as training data to find knowledge.]
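The cross-validation scheme pictured above, with three subsets and the method run on each pair of them, can be sketched like this. The function names are hypothetical, and the sketch assumes the usual reading in which the subset left out of training is used to test the result. The default `k=3` mirrors the figure; other values work the same way.

```python
import random


def majority_class(train):
    """Toy induction method (assumption): always predict the most common training label."""
    labels = [y for _, y in train]
    top = max(set(labels), key=labels.count)
    return lambda x: top


def cross_validation_error(data, induce, k=3, seed=0):
    """Average error of `induce` over k train/test rounds (requires len(data) >= k)."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    # Split the data set into k disjoint subsets of nearly equal size.
    folds = [shuffled[i::k] for i in range(k)]
    fold_errors = []
    for i, test in enumerate(folds):
        # Train on the other k - 1 subsets, test on the one left out.
        train = [pair for j, fold in enumerate(folds) if j != i for pair in fold]
        model = induce(train)
        wrong = sum(1 for x, y in test if model(x) != y)
        fold_errors.append(wrong / len(test))
    return sum(fold_errors) / k


if __name__ == "__main__":
    data = [(i, "yes" if i % 3 else "no") for i in range(30)]
    print(cross_validation_error(data, majority_class))
```

Unlike a single 2/3 to 1/3 split, every object is used for testing exactly once, at the cost of running the induction method k times.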
Prerequisite and Content, Introduction to Lectures, and Conclusion
This presentation summarizes the content and organization of the lectures in the module Knowledge Discovery and Data Mining.