Data Mining
Knowledge-Discovery in Databases (KDD)
Searching large volumes of data for patterns.
The nontrivial extraction of implicit, previously
unknown, and potentially useful information from
data.
The science of extracting useful information
from large data sets or databases.
Uses computational techniques from statistics,
machine learning, and pattern recognition.
Descriptive Statistics
Collect data
Classify data
Summarize data
Present data
Inferential Statistics
Make inferences to draw conclusions
--Point and interval estimation
--Hypothesis testing
--Prediction
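A minimal sketch of the two roles of statistics listed above: summarizing a sample, then producing a point and interval estimate for its mean. The sample values and the function names `summarize` and `mean_confidence_interval` are made up for illustration, and the interval assumes a normal approximation (z = 1.96), which is only rough for a sample this small.

```python
import math
import statistics

# Hypothetical sample: eight measurements, made up for illustration.
sample = [4.1, 5.0, 4.7, 5.3, 4.9, 5.1, 4.4, 5.5]

def summarize(values):
    """Descriptive step: condense the sample into a few numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
    }

def mean_confidence_interval(values, z=1.96):
    """Inferential step: approximate 95% interval for the population mean.

    Assumes a normal approximation; a t-quantile would give a somewhat
    wider interval for small samples.
    """
    m = statistics.mean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return m - half_width, m + half_width

summary = summarize(sample)
low, high = mean_confidence_interval(sample)
print(summary)
print(f"point estimate: {summary['mean']:.3f}, interval: ({low:.3f}, {high:.3f})")
```

The point estimate is the sample mean; the interval expresses how far the population mean could plausibly lie from it under the stated assumption.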
Machine Learning
Concerned with the development of
techniques which allow computers to
"learn".
Concerned with the algorithmic
complexity of computational
implementations.
Many inference problems turn out to be
NP-hard or harder.
Pattern Recognition
The act of taking in raw data and taking an
action based on the category of the data.
Aims to classify data patterns based on prior
knowledge or on statistical information.
Based on availability of training set:
supervised and unsupervised learning
Two approaches: statistical (decision theory)
and syntactic (structural).
Supervised Techniques
Classification:
--k-Nearest Neighbors
--Naive Bayes
--Classification Trees
--Discriminant Analysis
--Logistic Regression
--Neural Nets
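As an illustration of supervised classification, here is a minimal k-nearest-neighbors sketch: a query point receives the majority label of the k closest training points. The training data, the labels, and the function name `knn_classify` are hypothetical; Euclidean distance and k = 3 are assumptions, not the only reasonable choices.

```python
import math
from collections import Counter

def knn_classify(training, query, k=3):
    """Return the majority label among the k training points nearest to query.

    training: list of (feature_tuple, label) pairs.
    """
    # Sort the labeled points by Euclidean distance to the query.
    by_distance = sorted(training, key=lambda pair: math.dist(pair[0], query))
    # Count the labels of the k nearest points and pick the most frequent.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled data: (height_cm, weight_kg) -> class
training = [
    ((150.0, 50.0), "small"),
    ((155.0, 54.0), "small"),
    ((160.0, 58.0), "small"),
    ((178.0, 80.0), "large"),
    ((183.0, 85.0), "large"),
    ((190.0, 92.0), "large"),
]

print(knn_classify(training, (158.0, 55.0)))
print(knn_classify(training, (185.0, 88.0)))
```

In practice the features would usually be rescaled first, since the distance is dominated by whichever feature has the largest numeric range.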
Supervised Techniques
Prediction (Estimation):
--Regression
--Regression Trees
--k-Nearest Neighbors
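For prediction of a numeric target, the simplest case is a least-squares line through one predictor. The sketch below uses the closed-form slope and intercept; the data points and the names `fit_line` and `predict` are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Estimate y for a new x using the fitted line."""
    return slope * x + intercept

# Hypothetical training data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"prediction at x=6: {predict(slope, intercept, 6.0):.2f}")
```

Unlike classification, the output here is a continuous value rather than a category label.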
Unsupervised Techniques
Cluster Analysis
Principal Components
Association Rules
Collaborative Filtering
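Unsupervised techniques work without labeled outcomes. As one example, association rules are usually scored by support and confidence, which can be computed directly from transaction counts. The market-basket data and the function names below are hypothetical.

```python
# Hypothetical market-basket transactions, made up for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

print(f"support(diapers, beer) = {support({'diapers', 'beer'}, transactions):.2f}")
print(f"confidence(diapers -> beer) = {confidence({'diapers'}, {'beer'}, transactions):.2f}")
```

A rule-mining algorithm would search over many candidate itemsets; this sketch only scores a single rule chosen by hand.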
References
Java Data Mining Specification
http://www.jcp.org/en/jsr/detail?id=73
Mine Your Own Data with the JDM API, Frank Sommers, July 7, 2005
http://www.artima.com/lejava/articles/data_mining.html
http://www.stanford.edu/class/cs345a/#handouts