Purpose
Obtain a reduced representation of the dataset that is much
smaller in volume, yet closely maintains the integrity of the
original data
Strategies
Data cube aggregation
Aggregation operations are applied to construct a data cube
Data compression
Data encoding or transformations are applied so as to obtain a reduced or
compressed representation of the original data
Numerosity reduction
Data are replaced or estimated by alternative, smaller data representations
(e.g. models)
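As an illustration of the data cube aggregation strategy listed above, the following minimal sketch rolls quarterly figures up into annual totals, so that eight records shrink to two cells. The sales records, the branch name, and the `roll_up` helper are hypothetical and only serve the example.

```python
from collections import defaultdict

# Hypothetical quarterly sales records: (year, quarter, branch, amount).
sales = [
    (2022, "Q1", "A", 224), (2022, "Q2", "A", 408),
    (2022, "Q3", "A", 350), (2022, "Q4", "A", 586),
    (2023, "Q1", "A", 300), (2023, "Q2", "A", 416),
    (2023, "Q3", "A", 380), (2023, "Q4", "A", 610),
]

def roll_up(records):
    """Aggregate quarterly amounts into one total per (year, branch) cell."""
    cube = defaultdict(int)
    for year, _quarter, branch, amount in records:
        cube[(year, branch)] += amount
    return dict(cube)

annual = roll_up(sales)
print(annual)
```

The quarter dimension is dropped, which is the point of the reduction: an analysis that only needs annual totals can work on the smaller table.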
Stepwise Selection
Decision Tree Induction
Decision Tree
A model in the form of a tree structure
Decision nodes
Each denotes a test on an attribute, chosen as the best attribute for
partitioning the data at that point in terms of class distribution
Each branch corresponds to an outcome of the test
Leaf nodes
Each denotes a class prediction
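The slides do not say how the "best" attribute at a decision node is measured. The sketch below assumes information gain (entropy reduction), one common choice, and picks the attribute whose split gives the purest class distributions. The toy rows, labels, and function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attribute index `attr`."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / len(labels) * entropy(p)
                    for p in partitions.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels):
    """Index of the attribute with the highest information gain."""
    return max(range(len(rows[0])),
               key=lambda a: information_gain(rows, labels, a))

# Hypothetical toy data: columns are (outlook, windy); labels are the class.
rows = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes")]
labels = ["play", "play", "stay", "stay"]
best = best_attribute(rows, labels)
```

Here the first attribute separates the classes perfectly while the second tells nothing about the class, so a tree built this way would test only the first attribute. Attributes that never appear in the tree are candidates for removal.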
Data Compression
Purpose
Lossless Compression
Lossy Compression
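The difference between the two kinds of compression can be sketched as follows: with lossless compression the original data are reconstructed exactly, whereas lossy compression recovers only an approximation. The readings are hypothetical, and `zlib` and rounding stand in for whichever lossless and lossy techniques are actually intended.

```python
import zlib

# Hypothetical sensor readings serialized as text.
readings = [20.113, 20.127, 20.131, 20.118, 20.142] * 50
raw = ",".join(f"{r:.3f}" for r in readings).encode("ascii")

# Lossless: decompressing returns the original bytes exactly.
packed = zlib.compress(raw)
restored = zlib.decompress(packed)

# Lossy: rounding to one decimal keeps only an approximation.
approx = [round(r, 1) for r in readings]
max_error = max(abs(a - r) for a, r in zip(approx, readings))
```

Comparing `restored` with `raw` shows the lossless round trip, while `max_error` measures how much detail the lossy version gives up.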
Numerosity Reduction
Purpose
Parametric Methods
Non-Parametric Methods
Clustering
Sampling
Represent the original data by a much smaller sample (subset) of the data
Sampling
Simple Random Sampling With Replacement (SRSWR)
Each time a record is drawn from D, it is recorded and then placed back into D, so
it may be drawn more than once
Simple Random Sampling Without Replacement (SRSWOR)
Draw s of the N records from dataset D (s < N); no record can be drawn
more than once
Cluster Sampling
Records in D are first divided into groups or clusters, and a random sample of
these clusters is then selected (all records in the selected clusters are included in
the sample)
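A minimal sketch of drawing with replacement, drawing without replacement, and cluster sampling as described above. The dataset `D`, the cluster rule (record id divided by 10), and the function names are assumptions made for illustration.

```python
import random

def srswr(data, s, rng):
    """Simple random sample WITH replacement: records may repeat."""
    return [rng.choice(data) for _ in range(s)]

def srswor(data, s, rng):
    """Simple random sample WITHOUT replacement: requires s <= len(data)."""
    return rng.sample(data, s)

def cluster_sample(data, cluster_of, k, rng):
    """Group records by cluster, pick k clusters, keep all their records."""
    clusters = {}
    for record in data:
        clusters.setdefault(cluster_of(record), []).append(record)
    chosen = rng.sample(sorted(clusters), k)
    return [r for c in chosen for r in clusters[c]]

rng = random.Random(0)          # fixed seed so the sketch is repeatable
D = list(range(100))            # hypothetical dataset of 100 record ids
with_repl = srswr(D, 10, rng)
without_repl = srswor(D, 10, rng)
by_cluster = cluster_sample(D, lambda r: r // 10, 2, rng)  # 10 clusters of 10
```

Note that the cluster sample's size depends on the sizes of the chosen clusters, not on a requested number of records.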
Stratified Sampling
Records in D are divided into subgroups (or strata), and random sampling
techniques are then used to select sample members from each stratum
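Stratified sampling can be sketched in the same way. The version below assumes proportional allocation (the same fraction is drawn from every stratum, with at least one record each), which is only one possible way to choose members per stratum. The customer data and the `stratified_sample` helper are hypothetical.

```python
import random

def stratified_sample(data, stratum_of, fraction, rng):
    """Draw the same fraction, without replacement, from every stratum."""
    strata = {}
    for record in data:
        strata.setdefault(stratum_of(record), []).append(record)
    sample = []
    for members in strata.values():
        n = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, n))
    return sample

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical customers tagged with an age group; the groups are skewed.
customers = [("young", i) for i in range(80)] + [("senior", i) for i in range(20)]
sample = stratified_sample(customers, lambda c: c[0], 0.1, rng)
```

Because every stratum contributes, the small "senior" group is guaranteed to be represented, which a simple random sample of the same size does not guarantee.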
[Figure panels: Raw Data, Stratified Sampling, Cluster Sampling]
Principal Component Analysis
Overview
Eigenvalues and Eigenvectors - HMC Calculus Tutorial.htm
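To connect eigenvalues and eigenvectors to principal component analysis, the sketch below computes the principal components of two-dimensional data from the closed-form eigendecomposition of its 2x2 covariance matrix. The points and function names are hypothetical, and the closed form applies only to the two-attribute case; for more attributes a numerical eigen-solver would be needed.

```python
import math

def covariance_matrix(points):
    """Return the 2x2 sample covariance entries (sxx, sxy, syy)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return sxx, sxy, syy

def principal_components(points):
    """Eigenvalues (largest first) and leading eigenvector of the covariance."""
    sxx, sxy, syy = covariance_matrix(points)
    mean = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    lam1, lam2 = mean + spread, mean - spread
    # Angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (lam1, lam2), (math.cos(theta), math.sin(theta))

# Hypothetical points lying close to the line x2 = x1.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
(lam1, lam2), pc1 = principal_components(points)
explained = lam1 / (lam1 + lam2)   # share of variance kept by one component
```

When `explained` is close to 1, projecting each point onto `pc1` replaces two attributes with one while losing little of the variance, which is how PCA serves as a data reduction technique.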