
The task is to learn to assign instances to

predefined classes.
Requires supervised learning: the training
data has to specify what we are trying to
learn (the classes)
A classifier is a mathematical function,
implemented by a classification
algorithm, that maps input data to a
category.



Two common approaches:
Probabilistic
Geometric
Useful for many search-related tasks
Spam detection
Sentiment classification
Online advertising

Decision trees
Naive Bayes classifier
Neural networks
Quadratic classifiers
Support vector machines
A tree structured prediction model where
each internal node denotes a test on an
attribute, each outgoing branch represents
an outcome of the test and each leaf node
is labeled with a class or class distribution.
Attribute to be predicted: dependent
variable
Attributes that help in predicting the dependent
variable: independent variables
Figure below shows a decision tree with
tests on attributes X and Y:

Consider that the captain of a cricket
team has to decide whether to bat or
field first in the event that they win the
toss.
He decides to collect statistics from the
last ten matches in which the toss-winning
captain chose to bat first, and to
compare them in order to decide what to do.
INDEPENDENT VARIABLES                                DEPENDENT VARIABLE
Outlook    Humidity   No. of batsmen in team > 6     Final outcome
Sunny      High       Yes                            Won
Overcast   High       No                             Lost
Sunny      Low        No                             Lost
Sunny      High       No                             Won
Overcast   Low        Yes                            Lost
Sunny      Low        Yes                            Won
Sunny      Low        No                             Lost
Sunny      High       No                             Won
Sunny      Low        Yes                            Won
Sunny      Low        Yes                            Won
Dependent variable: game won or lost
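As a minimal sketch, one decision tree that happens to fit the ten matches in the cricket table can be written as a nested structure: each internal node tests one attribute, each branch is an outcome of that test, and each leaf is a class. The tree below was built by hand for illustration and is only one of several trees consistent with the table; the names COLUMNS, ROWS, TREE and classify are hypothetical.

```python
# Column names are hypothetical labels for the table's four columns.
COLUMNS = ("outlook", "humidity", "many_batsmen", "outcome")

# The ten matches from the cricket table.
ROWS = [dict(zip(COLUMNS, r)) for r in [
    ("Sunny", "High", "Yes", "Won"),
    ("Overcast", "High", "No", "Lost"),
    ("Sunny", "Low", "No", "Lost"),
    ("Sunny", "High", "No", "Won"),
    ("Overcast", "Low", "Yes", "Lost"),
    ("Sunny", "Low", "Yes", "Won"),
    ("Sunny", "Low", "No", "Lost"),
    ("Sunny", "High", "No", "Won"),
    ("Sunny", "Low", "Yes", "Won"),
    ("Sunny", "Low", "Yes", "Won"),
]]

# Hand-built tree (an assumption, not learned by an algorithm):
# internal nodes are dicts, leaves are class labels.
TREE = {
    "attribute": "outlook",
    "branches": {
        "Overcast": "Lost",
        "Sunny": {
            "attribute": "humidity",
            "branches": {
                "High": "Won",
                "Low": {
                    "attribute": "many_batsmen",
                    "branches": {"Yes": "Won", "No": "Lost"},
                },
            },
        },
    },
}


def classify(node, instance):
    """Follow the branch matching each test until a leaf label is reached."""
    while isinstance(node, dict):
        node = node["branches"][instance[node["attribute"]]]
    return node


# Count how many of the ten matches the tree labels correctly.
correct = sum(classify(TREE, row) == row["outcome"] for row in ROWS)
print(correct, "of", len(ROWS), "matches classified correctly")
```

Fitting ten training rows says nothing about how well such a tree would predict future matches.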
Works on a simple and comparatively
intuitive concept.
It makes use of the variables contained in
the data sample, by observing them
individually, independent of each other.
Based on Bayes' rule of conditional
probability. It makes use of all the attributes
contained in the data, and analyses them
individually as though they are equally
important and independent of each other.
Consider that the training data consists of
various animals (say elephants, monkeys
and giraffes), and our classifier has to
classify any new instance that it encounters.
We know that elephants have a trunk, huge
tusks and a short tail, and are extremely
big. Monkeys are small, jump around a lot,
and can climb trees; whereas giraffes are
tall, with a long neck and short ears.
The Naive Bayes classifier will consider each of
these attributes separately when classifying a
new instance.
When checking to see if the new instance is an
elephant, the Naive Bayes classifier will not
check whether it has a trunk and has huge
tusks and is large. Rather, it will separately
check whether the new instance has a trunk,
whether it has tusks, whether it is large, etc. It
works under the assumption that each attribute
is independent of the other attributes
in the sample.
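The independence assumption can be sketched in a few lines. Because the animal example gives no counts, this sketch reuses the ten matches from the cricket table as training data. Each attribute contributes its own factor to a class score, and the factors are simply multiplied. The add-one (Laplace) smoothing is an assumption added here so that an unseen attribute value does not force a score to zero; the names train, scores and predict are hypothetical.

```python
from collections import Counter, defaultdict

# The ten matches from the cricket table:
# (outlook, humidity, more than 6 batsmen, outcome)
ROWS = [
    ("Sunny", "High", "Yes", "Won"),
    ("Overcast", "High", "No", "Lost"),
    ("Sunny", "Low", "No", "Lost"),
    ("Sunny", "High", "No", "Won"),
    ("Overcast", "Low", "Yes", "Lost"),
    ("Sunny", "Low", "Yes", "Won"),
    ("Sunny", "Low", "No", "Lost"),
    ("Sunny", "High", "No", "Won"),
    ("Sunny", "Low", "Yes", "Won"),
    ("Sunny", "Low", "Yes", "Won"),
]


def train(rows):
    """Count classes and, separately for each attribute, values per class."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = Counter()            # (attribute index, value, class) -> count
    values_per_attr = defaultdict(set)  # attribute index -> distinct values seen
    for *attrs, label in rows:
        for i, value in enumerate(attrs):
            value_counts[(i, value, label)] += 1
            values_per_attr[i].add(value)
    return class_counts, value_counts, values_per_attr


def scores(model, attrs):
    """Unnormalised P(class) * product of P(attribute value | class)."""
    class_counts, value_counts, values_per_attr = model
    total = sum(class_counts.values())
    result = {}
    for label, n in class_counts.items():
        score = n / total
        for i, value in enumerate(attrs):
            # Add-one smoothing (an assumption of this sketch).
            score *= (value_counts[(i, value, label)] + 1) / (n + len(values_per_attr[i]))
        result[label] = score
    return result


def predict(model, attrs):
    """Return the class with the highest score."""
    s = scores(model, attrs)
    return max(s, key=s.get)


MODEL = train(ROWS)
print(predict(MODEL, ("Sunny", "High", "Yes")))
print(predict(MODEL, ("Overcast", "Low", "No")))
```

Each attribute is looked up on its own inside the loop; no combination of attributes is ever counted jointly, which is exactly the "naive" part.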
The task is to learn a classification from
the data. No predefined classification is
required.
Unsupervised learning: the training
data doesn't specify what we are trying
to learn (the clusters)
Clustering algorithms divide a data set
into natural groups (clusters).
Often use a distance measure for
dissimilarity
General outline of clustering algorithms
1. Decide how items will be represented (e.g.,
feature vectors)
2. Define similarity measure between pairs or
groups of items (e.g., cosine similarity)
3. Determine what makes a good clustering
4. Iteratively construct clusters that are
increasingly good
5. Stop once a locally or globally optimal clustering is
found
Steps 3 and 4 differ the most across
algorithms
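Steps 1 and 2 of the outline can be illustrated with a short sketch: items are represented as feature vectors (plain lists of numbers here) and compared with cosine similarity. The function name cosine_similarity and the sample vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Vectors pointing the same way score close to 1; orthogonal vectors score 0.
print(cosine_similarity([1, 2], [2, 4]))
print(cosine_similarity([1, 0], [0, 1]))
```

Cosine similarity ignores vector length, so it suits cases where only the direction of the feature vector matters.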

Segment customer database based on
similar buying patterns
Group houses in a town into
neighborhoods based on similar features
Identify similar Web usage patterns
Hierarchical Clustering
Has two versions:
Agglomerative (bottom up)
Divisive (top down)
Overlapping Clustering
Uses fuzzy sets to cluster data, so that each point
may belong to two or more clusters with different
degrees of membership.
Exclusive clustering
Data are grouped in an exclusive way, so that a certain
datum belongs to only one definite cluster.
E.g., K-means clustering
Probabilistic Clustering
Uses a completely probabilistic approach.
E.g., mixture of Gaussians
Hierarchy can be visualized as a
dendrogram: a tree data structure
that illustrates hierarchical clustering
techniques.
Each level shows clusters for that level
Leaf: individual clusters
Root: one cluster
A cluster at level i is the union of its
children clusters at level i+1
[Figure: dendrogram with leaves A, D, E, B, C, F, G and further node labels H, I, J, K, L, M]
Divisive
Initially all items in one cluster
Large clusters are successively divided
Top Down
Agglomerative
Initially each item in its own cluster
Iteratively clusters are merged together
Bottom Up
How do we know how to divide or combine
clusters?
Define a division or combination cost
Perform the division or combination with the lowest
cost
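A minimal sketch of the agglomerative (bottom-up) version, assuming one-dimensional points and using the smallest gap between two clusters as the combination cost. That cost is one possible choice rather than the only one, and the names single_link_cost and agglomerate and the sample points are hypothetical.

```python
def single_link_cost(a, b):
    """Combination cost: smallest distance between a point in a and one in b."""
    return min(abs(x - y) for x in a for y in b)


def agglomerate(points, k, cost=single_link_cost):
    """Start with one cluster per item; repeatedly perform the cheapest merge."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Find the pair of clusters with the lowest combination cost.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: cost(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


points = [1, 2, 10, 13, 30]
print(sorted(sorted(c) for c in agglomerate(points, 3)))
print(sorted(sorted(c) for c in agglomerate(points, 2)))
```

Stopping at different values of k corresponds to cutting the hierarchy at different levels.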

[Figures: eight panels showing the points A, B, C, D, E, F, G]
Single Linkage
Smallest distance between points in the two clusters
Complete Linkage
Largest distance between points in the two clusters
Average Linkage
Average distance between points in the two clusters
Average Group Linkage
Distance between the cluster centroids
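The four linkage measures can be sketched as functions of two clusters, each given as a list of coordinate tuples. Euclidean distance is assumed here, math.dist needs Python 3.8 or later, and the function names and sample clusters are hypothetical.

```python
import math


def single_linkage(a, b):
    """Smallest distance between a point in a and a point in b."""
    return min(math.dist(p, q) for p in a for q in b)


def complete_linkage(a, b):
    """Largest distance between a point in a and a point in b."""
    return max(math.dist(p, q) for p in a for q in b)


def average_linkage(a, b):
    """Average distance over all cross-cluster pairs of points."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))


def centroid(points):
    """Coordinate-wise mean of a cluster."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def average_group_linkage(a, b):
    """Distance between the two cluster centroids."""
    return math.dist(centroid(a), centroid(b))


left = [(0, 0), (0, 2)]
right = [(3, 0), (3, 2)]
for measure in (single_linkage, complete_linkage, average_linkage, average_group_linkage):
    print(measure.__name__, measure(left, right))
```

For any pair of clusters the single-linkage value is at most the average-linkage value, which is at most the complete-linkage value.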
[Figure: panels titled Single Linkage, Complete Linkage, Average Linkage and Average Group Linkage, over the points A, B, C, D, E, F, G]




One of the simplest unsupervised
learning algorithms that solves the
clustering problem.
K-means always maintains exactly K
clusters
Clusters are represented by their centroids (centers of
mass)
The main idea is to define K centroids,
one for each cluster.

Basic algorithm:
Step 1: Choose K cluster centroids
Step 2: Assign points to closest centroid
Step 3: Recompute cluster centroids
Step 4: Go to Step 2; stop when assignments no longer change
Tends to converge quickly
Can be sensitive to choice of initial
centroids
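The four steps can be sketched directly. This is a minimal illustration, assuming Euclidean distance and initial centroids supplied by the caller (since the result can depend on them); the name kmeans, the max_iter safeguard and the sample points are hypothetical. math.dist needs Python 3.8 or later.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Basic K-means: assign each point to its closest centroid, recompute, repeat."""
    centroids = list(initial_centroids)  # Step 1: K centroids chosen by the caller
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to the index of its closest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
        # Step 4: loop back to Step 2.
    return centroids, labels


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, labels = kmeans(points, [(1, 1), (8, 8)])
print(labels)
print(centroids)
```

Running it again with different initial centroids is a simple way to see the sensitivity to initialization.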