
Lecture 11: Clustering Introduction and Projects
Machine Learning

Andrew Rosenberg

March 12, 2010


Last Time

Junction Tree Algorithm


Efficient Marginals in Graphical Models


Today

Clustering Project Details


Clustering

Clustering is an unsupervised Machine Learning application. The task is to group similar entities together.


We do this all the time

[Figure slides: everyday examples of visually grouping similar items; we can do this in many dimensions and to many degrees.]

How do we set this up computationally?

In Machine Learning, we optimize objective functions to find the best solution.

Maximum Likelihood (for Frequentists)
Maximum A Posteriori (for Bayesians)
Empirical Risk Minimization
Loss Function Minimization

What makes a good cluster? How do we define loss or likelihood in a clustering solution?


Cluster Evaluation

Intrinsic Evaluation
Evaluate the compactness of the clusters

Extrinsic Evaluation
Compare the results to some gold standard labeled data. (Not covered today)


Intrinsic Evaluation

Intercluster Variability (IV)

How different are the data points within the same cluster?

Extracluster Variability (EV)

How different are the data points that are in distinct clusters?

Minimize IV while maximizing EV. In practice, minimize the ratio \frac{IV}{EV}.

IV = \sum_{C} \sum_{x \in C} d(x, c), \qquad d(x, c) = \|x - c\|

where c is the centroid of cluster C.
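To make the definition concrete, here is a small Python sketch that computes IV for a hard clustering (the function and variable names are illustrative, not from the lecture):

```python
import numpy as np

def cluster_iv(X, labels):
    """IV as defined above: sum over clusters C of sum_{x in C} ||x - c||,
    where c is the centroid (mean) of cluster C."""
    iv = 0.0
    for c_id in np.unique(labels):
        members = X[labels == c_id]        # points assigned to this cluster
        centroid = members.mean(axis=0)    # cluster center c
        iv += np.linalg.norm(members - centroid, axis=1).sum()
    return iv
```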

Degenerate Clustering Solutions

One Cluster: assign all points to a single cluster. There are no distinct clusters, so EV is trivially zero.

N Clusters: assign each point to its own cluster. Every point is its own centroid, so IV is trivially zero.

Clustering Approaches

Hierarchical Clustering
Partitional Clustering

Hierarchical Clustering

Recursive Partitioning

[Figure slides: a top-down run of recursive partitioning, repeatedly splitting the data into smaller clusters.]

Hierarchical Clustering

Agglomerative Clustering

[Figure slides: a bottom-up run of agglomerative clustering, repeatedly merging the closest points or clusters until one cluster remains.]
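Since the slides illustrate agglomerative clustering only pictorially, here is a minimal sketch using SciPy's standard hierarchical-clustering routines (the data, the linkage method, and the cluster count are illustrative assumptions, not from the lecture):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Illustrative 2-D data set.
X = np.random.default_rng(0).normal(size=(20, 2))

# Build the bottom-up merge tree; 'single' linkage merges the two
# clusters whose closest members are nearest to each other.
Z = linkage(X, method="single")

# Cut the tree to recover a flat clustering with 3 clusters.
labels = fcluster(Z, t=3, criterion="maxclust")
```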

K-Means Clustering

K-Means clustering is a Partitional Clustering algorithm.
Identify different partitions of the space for a fixed number of clusters.
Input: a value for K, the number of clusters.
Output: the K cluster centers (centroids).


K-Means Clustering

Algorithm: given an integer K specifying the number of clusters.

Initialize K cluster centroids, either by selecting K points from the data set at random or by selecting K points from the space at random.

For each point x in the data set, assign it to the cluster whose centroid it is closest to: \arg\min_{C_i} d(x, c_i)

Update each centroid based on the points assigned to its cluster:

c_i = \frac{1}{|C_i|} \sum_{x \in C_i} x

If any data point has changed clusters, repeat the assignment and update steps. A sketch of the full loop follows below.
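Here is a minimal NumPy sketch of the loop above, assuming Euclidean distance and data-point initialization (the function name, seed parameter, and iteration cap are illustrative, not from the lecture):

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Hard K-means on an (n, d) array X; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize: select K points from the data set at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # Assignment step: argmin_i d(x, c_i) for every point x.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # no point changed clusters: converged
        labels = new_labels
        # Update step: c_i = (1/|C_i|) * sum of the points assigned to C_i.
        for i in range(k):
            members = X[labels == i]
            if len(members):               # guard against an empty cluster
                centroids[i] = members.mean(axis=0)
    return centroids, labels
```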

Why does K-Means Work?

When an assignment is changed, the sum of squared distances from the data point to its assigned cluster centroid is reduced.
IV is reduced.

When a cluster centroid is moved, the sum of squared distances of the data points within that cluster is reduced.
IV is reduced.

At convergence, we have found a local minimum of IV.

K-Means Clustering

[Figure slides: a worked K-means run, alternating the assignment and centroid-update steps until no assignment changes.]

Soft K-Means

In K-means, we forced every data point to be the member of exactly one cluster. We can relax this constraint, basing soft assignments on minimizing the entropy of the cluster assignment:

p(x, C_i) = \frac{d(x, c_i)}{\sum_j d(x, c_j)} \qquad\text{or}\qquad p(x, C_i) = \frac{\exp\{-d(x, c_i)\}}{\sum_j \exp\{-d(x, c_j)\}}

We still define a cluster by a centroid, but we calculate the centroid as a weighted center of all the data points:

c_i = \frac{\sum_x x \, p(x, C_i)}{\sum_x p(x, C_i)}

Convergence is based on a stopping threshold rather than changing assignments. A sketch of one update follows below.
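One soft update, as a hedged sketch: it uses the exponential (softmax) form above with an assumed stiffness parameter beta, which the slides do not specify (as beta grows, the soft assignment approaches the hard K-means assignment):

```python
import numpy as np

def soft_kmeans_step(X, centroids, beta=1.0):
    """One soft K-means update; returns (responsibilities, new_centroids).

    beta is an assumed stiffness parameter, not from the lecture.
    """
    # Soft assignment: p(x, C_i) proportional to exp{-beta * d(x, c_i)}.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    logits = -beta * d
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    # Weighted centroids: c_i = sum_x x p(x, C_i) / sum_x p(x, C_i).
    new_centroids = (p.T @ X) / p.sum(axis=0)[:, None]
    return p, new_centroids
```

The full algorithm would repeat this step until the centroids move less than a stopping threshold.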

Potential Problems with K-Means

Optimal? K-means approaches a local minimum, but this is not guaranteed to be globally optimal. Could you design an approach which is globally optimal? Sure, in NP.

Consistent? Different starting clusters can lead to different cluster solutions. A common mitigation is sketched below.
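A standard mitigation (not from the slides) is random restarts: run K-means several times from different initializations and keep the solution with the lowest IV. A sketch reusing the kmeans and cluster_iv functions defined earlier:

```python
def best_of_n_kmeans(X, k, n_restarts=10):
    """Run K-means n_restarts times; keep the clustering with the lowest IV."""
    best = None
    for seed in range(n_restarts):
        centroids, labels = kmeans(X, k, seed=seed)
        iv = cluster_iv(X, labels)
        if best is None or iv < best[0]:
            best = (iv, centroids, labels)
    return best[1], best[2]  # centroids and labels of the best run
```

This reduces, but does not eliminate, the risk of landing in a poor local minimum.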

Suboptimality in K-Means

[Figure slide: an initialization that converges to a suboptimal local minimum.]

Inconsistency in K-Means

[Figure slides: different random initializations leading to different final clusterings of the same data.]

More Clustering

K-Nearest Neighbors
Gaussian Mixture Models
Spectral Clustering

We will return to these.

The Project

Research Paper
Project

Research Paper

8-10 pages, reporting on work in 4-5 papers.

Scope:
One application area, or
One technique

Research Paper: Application Areas


Identify an application that has made use of machine learning and discuss how.

Graphics: Object Recognition, Optical Character Recognition, Superresolution, Segmentation
Natural Language Processing: Parsing, Sentiment Analysis, Information Extraction
Speech: Recognition, Synthesis, Discourse Analysis, Intonation
Game Playing: Scrabble, Craps, Prisoner's Dilemma
Financials: Stock Prediction
Review Systems: Amazon, Netflix, Facebook

Research Paper: Techniques


Identify a machine learning technique. Describe its use and variants.

L1-regularization
Non-linear Kernels
Loopy Belief Propagation
Non-parametric Belief Propagation
Soft Decision Trees
Analysis of Neural Network Hidden Layers
Structured Learning
Generalized Expectation
Evaluation Measures: Cluster Evaluation, Semi-supervised Evaluation
Graph Embedding
Dimensionality Reduction
Feature Selection
Graphical Model Construction
Non-parametric Bayesian Methods
Latent Dirichlet Allocation

Project

Run a Machine Learning Experiment

Identify a problem/task and data set.
Implement one or more ML algorithms.
Evaluate the approach.

Write a Report of the Experiment

4 pages, including references.
Abstract: one paragraph summarizing the experiment
Introduction: describe the problem
Data: describe the data set, features extracted, etc.
Method: describe the algorithm/approach
Results: present and discuss results
Conclusion: summarize the experiment and results

Project Ideas: Tasks


Projects can take any combination of Tasks and Approaches.

Graphics: Object Classification, Facial Recognition, Fingerprint Identification, Optical Character Recognition, Handwriting Recognition (for languages/character systems other than English...)

Language: Topic Classification, Sentiment Analysis, Speech Recognition, Speaker Identification, Punctuation Restoration, Semantic Segmentation, Recognition of Emotion, Sarcasm, etc., SMS Text Normalization, Chat Participant Identification, Twitter Classification/Threading

Games: Chess, Checkers, Poker (Poker Academy Pro), Blackjack

Recommenders (Collaborative Filtering): Netflix, Courses, Jokes, Books, Facebook?

Video Classification: Motion Classification, Segmentation

Bye

Next
Hidden Markov Models
Viterbi Decoding
