Topics Covered
Methodologies for Stream Data Processing
Association
Tilted Time Frame
Critical Layers
Lossy Counting Algorithm
Hoeffding Tree Algorithm
VFDT (Very Fast Decision Tree learner)
Categories of Time-Series Movements
Estimation of Trend Curve
Similarity Search in Time-Series Analysis
7/11/15
Random Sampling
Sliding Windows
Histograms
Multiresolution Methods
Sketches
Randomized Algorithms
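Of the methodologies listed above, random sampling is the simplest to illustrate. The slides do not name a specific technique, so the sketch below assumes reservoir sampling, one common way to keep a uniform sample of fixed size from a stream of unknown length. The function name and the `rng` parameter are illustrative choices.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of up to k items from an iterable stream."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir

sample = reservoir_sample(range(1000), 10, random.Random(0))
```

Only the k reservoir slots are held in memory, so the stream is read once and never stored.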
[Figure: tilted time frame. Time-axis granularities shown: 4 qtrs, 31 days, 24 hours; logarithmic scale shown: 8t, 4t, 2t.]
Critical Layers
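[Figure: Lossy Counting over successive windows: first window, next window, window 2, window 3; the frequency summary starts empty.]
Current stream length = N
Window width = 1/ε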
Output:
Approximation guarantees
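The Lossy Counting algorithm listed among the topics can be sketched as follows. This is a minimal version under stated assumptions: the names `lossy_count`, `frequent_items`, `epsilon`, and `support` are illustrative, the window width is taken as ceil(1/epsilon), and the output rule reports items whose stored count is at least (support - epsilon) * N.

```python
import math

def lossy_count(stream, epsilon):
    """Approximate item counts over a stream with error at most epsilon * N."""
    width = math.ceil(1 / epsilon)   # window (bucket) width
    counts = {}                      # item -> [frequency, max possible undercount]
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)
        if item in counts:
            counts[item][0] += 1
        else:
            counts[item] = [1, bucket - 1]
        if n % width == 0:
            # At each window boundary, drop items that cannot be frequent.
            stale = [k for k, (f, d) in counts.items() if f + d <= bucket]
            for k in stale:
                del counts[k]
    return counts, n

def frequent_items(counts, n, support, epsilon):
    """Report items whose stored frequency is at least (support - epsilon) * n."""
    return {k: f for k, (f, d) in counts.items() if f >= (support - epsilon) * n}

# Hypothetical stream: 'a' half the time, 'b' a quarter, the rest unique.
stream = ["a" if i % 2 == 0 else "b" if i % 4 == 1 else f"x{i}" for i in range(1000)]
counts, n = lossy_count(stream, epsilon=0.01)
result = frequent_items(counts, n, support=0.2, epsilon=0.01)
```

Stored frequencies never exceed the true counts and fall short by at most epsilon * N, so no item with true frequency at least support * N is missed.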
Hoeffding Bound
Independent of the probability distribution
generating the observations
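A real-valued random variable r whose range is R
n independent observations of r with observed mean $\bar{r}$
The Hoeffding bound states that $P(r_{true} \ge \bar{r} - \epsilon) \ge 1 - \delta$, where $r_{true}$ is the true mean, $\delta$ is a small probability, and
$\epsilon = \sqrt{\frac{R^2 \ln(1/\delta)}{2n}}$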
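As a quick numeric illustration of the Hoeffding bound, the sketch below computes the margin epsilon from the range R, the confidence parameter delta, and the number of observations n. The function and parameter names are illustrative assumptions.

```python
import math

def hoeffding_epsilon(value_range, delta, n):
    """Margin epsilon such that the true mean is at least the observed mean
    minus epsilon with probability at least 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1 / delta) / (2 * n))

# The margin shrinks as more observations arrive.
eps_100 = hoeffding_epsilon(1.0, 0.05, 100)
eps_1000 = hoeffding_epsilon(1.0, 0.05, 1000)
```

Because epsilon depends only on R, delta, and n, the same calculation applies whatever distribution generated the observations.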
[Figure: a Hoeffding tree growing as the data stream arrives. Node tests shown: Packets > 10, Bytes > 60K; leaf labels shown: Protocol = http, Protocol = ftp; branches labeled yes/no.]
Incremental
Make class predictions in parallel
New examples are added as they come
Weakness
Categories of Time-Series Movements
Trend or long-term movements
General direction in which a time series is moving over a long
interval of time
Least-square method
Find the curve minimizing the sum of
the squares of the deviation of points on
the curve from the corresponding data
points
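The least-squares idea can be shown with the simplest trend curve, a straight line. The sketch below assumes equally spaced observations at times 0, 1, 2, ... and fits y = slope * t + intercept; the function name is illustrative, and higher-order curves are not covered.

```python
def fit_trend_line(values):
    """Least-squares line through (0, values[0]), (1, values[1]), ...

    Returns (slope, intercept) minimizing the sum of squared deviations.
    Requires at least two observations.
    """
    n = len(values)
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    s_tt = sum((t - mean_t) ** 2 for t in range(n))
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    slope = s_ty / s_tt
    intercept = mean_y - slope * mean_t
    return slope, intercept

slope, intercept = fit_trend_line([3, 5, 7, 9])  # points lie exactly on y = 2t + 3
```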
Moving Average
Moving average of order n
Smooths out cyclic, seasonal, and irregular movements
Loses the data at the beginning or end
of a series
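A moving average of order n can be sketched as below. The function name and the sample series are illustrative; note that the output is shorter than the input, which is the loss of data at the ends of the series mentioned above.

```python
def moving_average(values, n):
    """Return the moving averages of order n over a list of numbers."""
    if n < 1 or n > len(values):
        raise ValueError("order n must be between 1 and len(values)")
    window_sum = sum(values[:n])
    averages = [window_sum / n]
    for i in range(n, len(values)):
        # Slide the window: add the new value, drop the oldest one.
        window_sum += values[i] - values[i - n]
        averages.append(window_sum / n)
    return averages

smoothed = moving_average([3, 7, 2, 0, 4, 5, 9, 7, 2], 3)
# smoothed == [4.0, 3.0, 2.0, 3.0, 6.0, 7.0, 6.0]  (9 inputs, 7 outputs)
```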
Typical Applications
Financial market
Market basket data analysis
Scientific databases
Medical diagnosis
Data Transformation
Many techniques for signal analysis
require the data to be in the
frequency domain
Reduction techniques
discrete Fourier transform (DFT)
discrete wavelet transform (DWT)
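To show how a DFT-based reduction might work, the sketch below transforms a series into the frequency domain and keeps only the first few coefficients as features. It assumes a naive O(n^2) DFT with 1/sqrt(n) scaling, chosen so that the distance between truncated feature vectors never exceeds the Euclidean distance between the original series. All function names are illustrative.

```python
import cmath
import math

def dft(values):
    """Naive discrete Fourier transform with unitary (1/sqrt(n)) scaling."""
    n = len(values)
    scale = 1 / math.sqrt(n)
    return [
        scale * sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, v in enumerate(values))
        for k in range(n)
    ]

def dft_features(values, num_coefficients):
    """Reduce a series to its first num_coefficients DFT coefficients."""
    return dft(values)[:num_coefficients]

def distance(a, b):
    """Euclidean distance; works for real or complex vectors."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))

a = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
b = [1.5, 2.5, 2.0, 4.0, 3.5, 1.0, 1.0, 0.5]
reduced = distance(dft_features(a, 3), dft_features(b, 3))
full = distance(a, b)  # reduced is never larger than full
```

Because the reduced distance is a lower bound, a search in the feature space may return false candidates, which are then checked against the full series, but it does not miss true matches.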
Subsequence Matching
Break each sequence into a set
of pieces of window with length
w
Extract the features of the
subsequence inside the
window
Map each sequence to a trail
in the feature space
Divide the trail of each
sequence into subtrails and
represent each of them with
minimum bounding rectangle
Use a multi-piece assembly
algorithm to search for longer
sequence matches
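The first four subsequence-matching steps can be sketched as follows. Several simplifications are assumed: the features per window are just the mean and the range (max minus min) rather than transform coefficients, and the trail is cut into subtrails of a fixed length. The multi-piece assembly step is not shown, and all names are illustrative.

```python
def sliding_windows(sequence, w):
    """Break a sequence into all contiguous windows of length w."""
    return [sequence[i:i + w] for i in range(len(sequence) - w + 1)]

def window_features(window):
    """Hypothetical two-dimensional feature vector: (mean, range)."""
    return (sum(window) / len(window), max(window) - min(window))

def trail(sequence, w):
    """Map a sequence to its trail of feature points, one per window."""
    return [window_features(win) for win in sliding_windows(sequence, w)]

def bounding_rectangles(points, subtrail_len):
    """Cut the trail into subtrails and return (lows, highs) for each one."""
    rects = []
    for start in range(0, len(points), subtrail_len):
        chunk = points[start:start + subtrail_len]
        dims = range(len(chunk[0]))
        lows = tuple(min(p[d] for p in chunk) for d in dims)
        highs = tuple(max(p[d] for p in chunk) for d in dims)
        rects.append((lows, highs))
    return rects

points = trail([1, 2, 3, 4, 5, 6], w=3)
rects = bounding_rectangles(points, subtrail_len=2)
```

Storing a handful of rectangles instead of every feature point keeps the index small.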
Reference
Chapter 6, Data Mining Concepts and
Techniques, Third Edition. By Jiawei Han,
Micheline Kamber and Jian Pei.
Thank You