Documente Academic
Documente Profesional
Documente Cultură
Introduction
Data mining skills are in high demand as organizations
automatically.
which noise data and irrelevant data are removed from the
collection.
Data integration: at this stage, multiple data sources, often
heterogeneous, may be combined in a common source.
Data selection: at this step, the data relevant to the analysis
is decided on and retrieved from the data collection.
Data transformation: also known as data consolidation, it is
a phase in which the selected data is transformed into forms
appropriate for the mining procedure.
mining tasks.
Estimation
Prediction
Classification
Clustering
Association
Classification
There is a target categorical variable
Example:
income bracket: partitioned into 3 classes or categories:
high income, middle income, and low income
Classification
Classification - Steps
First : examine the data set containing both the predictor
Classification
For example, in the medical field, suppose that we are
interested in classifying
the type of drug a patient should be prescribed, based on
certain patient characteristics,
such as the age of the patient and the patients
sodium/potassium ratio. Figure 1.4 is
a scatter plot of patients sodium/potassium ratio against
patients ages for a sample
of 200 patients. The particular drug prescribed is
symbolized by the shade of the
points. Light gray points indicate drug Y; medium gray
points indicate drug A or X; dark gray points indicate drug
B or C.
sodium/potassium ratio?
Which drug should be prescribed for older patients with low
sodium/potassium ratios?
transaction is fraudulent
Assessing whether a mortgage application is a
good or bad credit risk
Diagnosing whether a particular disease is present
Determining whether a will was written by the
actual deceased, or fraudulently by someone else
Identifying whether or not certain financial or
personal behavior indicates a possible terrorist
threat
Estimation
Estimation is similar to classification except that
Examples of Estimation
Estimating the amount of money a randomly
Prediction
Prediction is similar to classification and estimation, except
Examples of Prediction
Predicting the price of a stock three months into
the future
Predicting the percentage increase in traffic deaths
next year if the speed limit is increased
Predicting the winner of this falls baseball World
Series, based on a comparison of team statistics
Predicting whether a particular molecule in drug
discovery will lead to a profitable new drug for a
pharmaceutical company
Examples of Prediction
Clustering
Clustering refers to the grouping of records, observations,
defined by zip code. One of the clustering mechanisms they use is the
PRIZM segmentation system, which describes every U.S. zip code area
in terms of distinct lifestyle types
Clustering Example
clusters for zip code 90210, Beverly Hills, California.
Segments are:
Number
Name
16
Bohemian Mix
07
03
01
Upper Crust
04
Young Digerati
Examples of Clustering
Group of residents according to zip codes.
For accounting auditing purposes, to segment
Association
Association
Another example, a particular supermarket may find
Examples of Association
Investigating the proportion of subscribers to a companys
Bohemian Mix
Midscale, Middle Age Mix
A collection of mobile urbanites, Bohemian Mix
represents the nation's most liberal lifestyles. Its
residents are an ethnically diverse, progressive mix of
young singles, couples, and families ranging from
students to professionals. In their funky row houses and
apartments, Bohemian Mixers are the early adopters
who are quick to check out the latest movie, nightclub,
laptop, and microbrew.
Young Digerati
Upscale, Younger Mix
Young Digerati are tech-savvy and live in fashionable
neighborhoods on the urban fringe. Affluent, highly
educated, and ethnically mixed, Young Digerati
communities are typically filled with trendy apartments
and condos, fitness clubs and clothing boutiques, casual
restaurants and all types of bars--from juice to coffee to
microbrew.