PROJECT REPORT
NAME - SAKSHAM KAPOOR
GUIDE/MENTOR Ms. BHAWANA SAINI
INTRODUCTION
The amount of available electronic information, such as online newspapers,
journals, conference proceedings, Web sites and e-mails, is growing rapidly.
Controlling, indexing or searching all of this information is not feasible
for humans, or even for search engines. The data *****
MOA (Massive On-line Analysis) is a framework for data stream mining. It
includes tools for evaluation and a collection of machine learning algorithms.
Related to the WEKA project, it is also written in Java, while scaling to more
demanding problems. The goal of MOA is to serve as a benchmark framework for
running experiments in the data stream mining context by providing
storable settings for data streams (real and synthetic), so that experiments
are repeatable.
Literature Survey
Data Set*definition*
Data Streams
A data stream is an ordered sequence of points x1, ..., xn that must be
accessed in order and that can be read only once or a small number of times.
Each reading of the sequence is called a linear scan or a pass. The stream
model is motivated by emerging applications involving massive data sets; for
example, customer click streams, telephone records, large sets of web
pages, multimedia data, financial transactions, and observational science
data are better modeled as data streams. These data sets are far too large to
fit in main memory and are typically stored in secondary storage devices.
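The single-pass constraint described above can be illustrated with a small sketch. It computes the mean and variance of a stream in one linear scan with constant memory, using Welford's update rule. The function names and the toy generator are hypothetical and only stand in for a real stream source; this is an illustration, not part of MOA.

```python
from typing import Iterable, Iterator, Tuple


def running_mean_variance(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Return (count, mean, population variance) after one linear scan."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:  # each point is read exactly once, in order
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / count if count else 0.0
    return count, mean, variance


def number_stream(n: int) -> Iterator[float]:
    """Hypothetical stand-in for a real stream: yields 1.0 .. n lazily."""
    for i in range(1, n + 1):
        yield float(i)


if __name__ == "__main__":
    print(running_mean_variance(number_stream(5)))
```

Only three numbers are kept in memory regardless of the stream length, which is the property that makes this kind of computation suitable for data that does not fit in main memory.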
Data stream clustering is used in applications that involve large amounts
of streaming data. For clustering, k-means is a widely used heuristic, but
alternative algorithms such as k-medoids and CURE have also been developed.
For data streams, one of the first results appeared in 1980, but the model was
formalized in 1998.
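As a rough illustration of how a k-means style heuristic can be adapted to a single pass, the sketch below uses the sequential (online) update: the first k points seed the centers, and each later point moves its nearest center toward itself. This is one simple variant chosen for illustration, assuming Euclidean distance; it is not claimed to be the algorithm used in MOA or in the works referred to above, and all names are hypothetical.

```python
from typing import Iterable, List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def sequential_kmeans(stream: Iterable[Sequence[float]], k: int) -> List[List[float]]:
    """One-pass k-means: seed with the first k points, then update online."""
    centers: List[List[float]] = []
    counts: List[int] = []  # number of points assigned to each center so far
    for point in stream:
        if len(centers) < k:
            centers.append([float(v) for v in point])
            counts.append(1)
            continue
        # Find the closest center to the incoming point.
        nearest = min(range(k), key=lambda j: squared_distance(centers[j], point))
        counts[nearest] += 1
        rate = 1.0 / counts[nearest]
        # Move that center toward the point; it stays the mean of its points.
        centers[nearest] = [c + rate * (p - c) for c, p in zip(centers[nearest], point)]
    return centers


if __name__ == "__main__":
    points = [(0.0, 0.0), (10.0, 10.0), (0.0, 2.0), (10.0, 12.0)]
    print(sequential_kmeans(points, k=2))
```

The result depends on the order of arrival and on the seed points, which is a known weakness of this simple variant.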
Data clustering techniques are categorized by various factors as follows:
1. Partitioning
2. Hierarchical
3. Density based
4. Grid based
5. Model based
6. Frequent pattern based
7. Constraint based
8. Link based
Clustering
WORKFLOW OF MOA
The workflow in MOA follows the simple schema depicted below. First, a data
stream (feed or generator) is chosen and configured. Second, an algorithm
(e.g. a classifier) is chosen and its parameters are set. Third, the
evaluation method or measure is chosen. Finally, the results are obtained
after running the task.
MOA FRAMEWORK
Data feed/Generator -> Learning Algorithm -> Evaluation Method -> Results
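The four workflow steps can be mimicked in a few lines of plain Python. The sketch below does not use MOA or its API. The stream generator, the majority-class learner and the test-then-train (prequential) evaluation loop are all hypothetical stand-ins, chosen here as assumptions to show how the stages connect.

```python
import random
from typing import Dict, Iterable, Iterator, Tuple

Instance = Tuple[float, int]  # (feature value, class label)


def threshold_stream(n: int, seed: int = 1) -> Iterator[Instance]:
    """Step 1: a synthetic stream; the label is 1 when the feature exceeds 0.5."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.random()
        yield x, int(x > 0.5)


class MajorityClassLearner:
    """Step 2: a trivial learner that predicts the most frequent label so far."""

    def __init__(self) -> None:
        self.counts: Dict[int, int] = {}

    def predict(self, x: float) -> int:
        if not self.counts:
            return 0  # arbitrary default before any label has been seen
        return max(self.counts, key=lambda label: self.counts[label])

    def learn(self, x: float, y: int) -> None:
        self.counts[y] = self.counts.get(y, 0) + 1


def prequential_accuracy(stream: Iterable[Instance], learner: MajorityClassLearner) -> float:
    """Step 3: each instance is first used to test, then to train."""
    correct = 0
    total = 0
    for x, y in stream:
        correct += int(learner.predict(x) == y)
        learner.learn(x, y)
        total += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Step 4: run the task and report the result.
    accuracy = prequential_accuracy(threshold_stream(1000), MajorityClassLearner())
    print(f"prequential accuracy: {accuracy:.3f}")
```

Keeping the stream, the learner and the evaluator as separate pieces means any one of them can be swapped without touching the others, and fixing the seed makes a run repeatable.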