US Secretary of Defence
The methodology is to first convert "unknown unknowns" into "known unknowns", and then finally into "known knowns".
What Is Data Mining? (Slightly Informal)
Data mining is exploratory data analysis with little or no human interaction, using computationally feasible techniques; i.e., the attempt to find interesting structures or patterns that are not known a priori.
Data is collected much faster than it can be processed or managed. NASA's Earth Observing System (EOS) alone will collect 15 petabytes by 2007 (15,000,000,000,000,000 bytes).
Much of it won't be used, ever!
Much of it won't be seen, ever!
Why not?
There is so much volume that the usefulness of some of it will never be discovered.
SOLUTION: Reduce the volume and/or raise the information content by structuring, querying,
filtering, summarizing, aggregating, mining...
(cf. Claude Shannon's information theory)
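Shannon's theory measures the information content of a source as its entropy, in bits per symbol. The slides do not say how information content should be measured, so the following is only a minimal illustration of the concept, assuming a sequence of discrete symbols with probabilities estimated from their frequencies. The function name `shannon_entropy` is hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Average information content in bits per symbol: H = sum(p * log2(1/p)).

    Probabilities are estimated from symbol frequencies (an assumption).
    """
    counts = Counter(symbols)
    n = len(symbols)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

print(shannon_entropy("aaaa"))  # 0.0: fully predictable, carries no information
print(shannon_entropy("abab"))  # 1.0
print(shannon_entropy("abcd"))  # 2.0
```

A long, repetitive record stream can have low entropy per symbol, which is one way to see why structuring and summarizing can raise the information content per byte stored.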
The May 2000 issue of TIME Magazine listed the ten hottest jobs of the year 2025; data miners and knowledge engineers were ranked 5th and 6th, respectively.
In a list of emerging technologies that will change the world, data mining was ranked 3rd.
In view of these facts, data miners can expect long careers in both national and international markets, as major organizations, both private and governmental, are quickly adopting the technology, and many have already done so.
How Is Data Mining Different?
Knowledge Discovery:
-- Overall process of discovering useful knowledge.
Data Mining (Knowledge-driven exploration):
-- Query formulation problem.
-- Visualizing and understanding a large data set.
-- Data growth rate too high to be handled manually.
Data Warehouses (Data-driven exploration):
-- Querying summaries of transactions, etc.; decision support.
Traditional Database (Transactions):
-- Querying data in well-defined processes; reliable storage.
Data Mining vs. Statistics
Formal statistical inference is assumption-driven, i.e., a hypothesis is formed and then validated against the data.
Data mining is discovery-driven, i.e., patterns and hypotheses are automatically extracted from the data.
Put another way, data mining is knowledge-driven, while statistics is human-driven.
The two overlap in exploratory data analysis, but statistics focuses on data sets far smaller than those used by data mining researchers.
Statistics is useful for verifying relationships among a few parameters when the relationships are linear.
Data mining builds much more complex, predictive, nonlinear models, which are used to predict behavior influenced by many factors.
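As an illustration of the kind of check statistics handles well, the sketch below computes Pearson's correlation coefficient, which measures how strongly two parameters are linearly related. The data and names (`pearson_r`, `ad_spend`, `sales`) are hypothetical, and the function assumes neither input is constant.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: +1 or -1 for a perfect linear relationship, near 0 for none.

    Assumes equal-length inputs and that neither list is constant.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: two parameters with an exactly linear relationship
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 6, 8, 10]
print(pearson_r(ad_spend, sales))  # close to 1.0
```

A single coefficient like this works for one pair of parameters and a linear hypothesis; it says nothing about nonlinear effects or interactions among many factors, which is where the text places data mining.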
What Can Data Mining Do?
There are a number of data mining techniques, and the selection of a particular technique is highly application-dependent, although other factors also affect the selection.
PREDICTION
Same as classification or estimation, except that records are classified according to some predicted future behavior or estimated value.
◦ A model is built by applying classification or estimation to training examples whose values of the predicted variable are known, together with historical data.
◦ The model first explains the known values and is then used to predict the future.
Example:
Predicting how much customers will spend during the next 6 months.
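The build-then-predict steps above can be sketched with the simplest estimation model, a straight line fitted by ordinary least squares. This is an assumption for illustration only: a single input feature (spend in the previous 6 months), made-up historical figures, and hypothetical names `fit_line` and `predict`. Real prediction models typically use many more inputs.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Hypothetical historical data: spend in previous 6 months -> spend in following 6 months
past = [100.0, 200.0, 300.0, 400.0]
future = [120.0, 210.0, 330.0, 420.0]

# Step 1: build the model from records whose outcome is already known.
a, b = fit_line(past, future)

# Step 2: apply the model to a new customer's history to predict future spend.
def predict(x):
    return a + b * x

print(round(predict(500.0), 2))  # 525.0
```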
MARKET BASKET ANALYSIS
Determining which things go together, e.g., items in a shopping cart at a supermarket.
◦ Used to identify cross-selling opportunities.
◦ Used to design attractive packages or groupings of products and services, or to decide on raising the prices of some items, etc.
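A minimal sketch of "which things go together", assuming each shopping cart is a set of item names. It uses two common association-rule measures, support (the fraction of carts containing a pair) and confidence (how often the right-hand item appears when the left-hand item does); the slides do not name these measures, and the carts and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_support(baskets):
    """Fraction of baskets that contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items()}

def confidence(baskets, lhs, rhs):
    """Estimated P(rhs in basket | lhs in basket) for the rule lhs -> rhs."""
    with_lhs = [b for b in baskets if lhs in b]
    if not with_lhs:
        return 0.0
    return sum(1 for b in with_lhs if rhs in b) / len(with_lhs)

# Hypothetical shopping carts
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
]
print(pair_support(baskets)[("bread", "milk")])  # 0.5
print(confidence(baskets, "butter", "bread"))    # 1.0
```

Here every cart with butter also contains bread, which is the kind of pattern that suggests a cross-selling opportunity or a product grouping.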
CLUSTERING
Task of segmenting a heterogeneous population into a number of more homogeneous subgroups or clusters.
Unlike classification, it does NOT depend on predefined classes.
It is up to you to determine what meaning, if any, to attach to the resulting clusters.
It could be the first step in a market segmentation effort.
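To show segmentation without predefined classes, here is a sketch of k-means, one common clustering algorithm (the slides do not prescribe a specific one). Assumptions: customers are described by a single numeric attribute, the number of clusters `k` is chosen in advance, and centroids are initialized deterministically from evenly spaced positions in the sorted data, a simplification of the usual random initialization. The data and the name `kmeans_1d` are hypothetical.

```python
def kmeans_1d(values, k, iterations=20):
    """Plain k-means on one numeric attribute; returns (centroids, labels)."""
    ordered = sorted(values)
    n = len(ordered)
    # Simplified deterministic start: evenly spaced picks from the sorted data.
    centroids = [float(ordered[(2 * i + 1) * n // (2 * k)]) for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        if new_labels == labels:
            break  # assignments stopped changing, so the clusters are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return centroids, labels

# Hypothetical annual spend per customer
spend = [120, 130, 125, 900, 950, 880, 410, 400, 430]
centroids, labels = kmeans_1d(spend, k=3)
print(sorted(round(c, 1) for c in centroids))  # [125.0, 413.3, 910.0]
```

The algorithm only returns groups and their centers; calling them "low", "medium", and "high spenders" is the interpretation left to the analyst.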
Where Does Data Mining Fit In?
Main Types of Data Mining (will be a presentation topic)