
SE-402: Lecture 2

Machine Learning: Supervised


Learning
09-09-2013
Announcements

Done with AI Introduction
Machine Learning
Today's lecture will be FUN

Fun

AI Marathon
Motivating Example

SE-402
[Course overview diagram: AI and Agents, Machine Learning, Applications]
Where We Are

[Course overview diagram: AI and Agents, Machine Learning, Applications]
What is Machine Learning?
Volunteer

ML: The study and design of intelligent agents that make decisions based on data.

"The goal of machine learning is to build computer systems that can adapt and learn from their experience." (Tom Dietterich)

AI: The study and design of intelligent agents.
A Generic System

[Diagram: inputs x1 ... xN enter a System with internal states h1, h2, ..., hK, producing outputs y1 ... yM]

Input Variables: x = (x1, x2, ..., xN)
Hidden Variables: h = (h1, h2, ..., hK)
Output Variables: y = (y1, y2, ..., yM)
Machine Learning 15
Another definition

Machine Learning algorithms discover the relationships between the variables of a system (input, output, and hidden) from direct samples of the system.

These algorithms originate from many fields:
Statistics, mathematics, theoretical computer science, physics, neuroscience, etc.
When are ML algorithms NOT needed?

When the relationships between all system variables (input, output, and hidden) are completely understood!

This is NOT the case for almost any real system!

Past Observations

Example #   Left Hand   Right Hand   $
0           Up          Up           0
1           Up          Down         2
2           Down        Up           4
3           Up          Down         2
4           Down        Down         0
5           Down        Up           4
6           Up          Up           0
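The relationship hiding in this table can be recovered by simply memorizing the observed input/output pairs. A toy sketch (the rows below are copied from the table; treating the two hand positions as inputs and the payout as the output is an assumption):

```python
# Observed examples: (left hand, right hand, $ payout).
observations = [
    ("Up",   "Up",   0),
    ("Up",   "Down", 2),
    ("Down", "Up",   4),
    ("Up",   "Down", 2),
    ("Down", "Down", 0),
    ("Down", "Up",   4),
    ("Up",   "Up",   0),
]

# "Learn" the relationship by memorizing each (left, right) combination.
payout = {(left, right): dollars for left, right, dollars in observations}

print(payout[("Down", "Up")])  # -> 4
```

Memorization works here because every input combination was observed and the outputs are consistent; real ML algorithms must also generalize to unseen inputs.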
Most interesting thing ever? No

Exists Alongside Others
Most interesting thing ever? No

Not New
Full Focus
Most interesting thing ever? No
Machine Learning

[Taxonomy diagram, built up over several slides: Machine Learning branches into Supervised Learning, Unsupervised Learning, and Feature Learning, with Deep Learning under Feature Learning]
Learn by Doing

Theory will Help
Carpentry

Background on supervised task
Dive into three particular algorithms:
Nearest neighbor
Regression
SVM
Talk about carpentry
Machine Learning

[Taxonomy diagram: Supervised Learning (Today!), Unsupervised Learning, Feature Learning]
Supervised Learning

Given labeled data, predict the output. (Learning with a teacher)
Already have one algorithm
Add three more

Carpentry of Supervised Learning
What does the Data Look Like?

Data: M observations. For each observation i we have x(i) and y(i).
Supervised Learning

Training Set → Learning Algorithm → h

x → h → predicted y
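The pipeline above can be sketched as code. This is a hypothetical minimal example: the "learning algorithm" here just memorizes the most common label (a majority-class baseline), so the hypothesis h ignores its input entirely:

```python
from collections import Counter

def learn(training_set):
    """A toy learning algorithm: find the most common label in the
    training set and return a hypothesis h that always predicts it."""
    majority = Counter(y for _, y in training_set).most_common(1)[0][0]
    def h(x):
        return majority  # ignores x: the simplest possible hypothesis
    return h

# Training set of (x(i), y(i)) pairs: 9 "Yes" labels and 6 "No" labels.
train = [(i, "Yes") for i in range(9)] + [(i, "No") for i in range(6)]
h = learn(train)
print(h(42))  # -> "Yes"
```

Any real learning algorithm replaces `learn` with something that actually uses x; the algorithms in this lecture do exactly that.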
An Example Application

An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc.) of newly admitted patients.
A decision is needed: whether to put a new patient in an intensive-care unit.
Due to the high cost of ICU, those patients who may survive less than a month are given higher priority.
Problem: predict high-risk patients and discriminate them from low-risk patients.

CS583, Bing Liu, UIC 40


Applications: Example 2

A credit card company receives thousands of applications for new cards. Each application contains information about an applicant:
age
marital status
annual salary
outstanding debts
credit rating
etc.
Problem: decide whether an application should be approved, i.e., classify applications into two categories: approved and not approved.



Learning the Target Function

Like human learning from past experiences.
A computer does not have experiences.
A computer system learns from data, which represent some past experiences of an application domain.
Our focus: learn a target function that can be used to predict the values of a discrete class attribute, e.g., approved or not-approved, and high-risk or low-risk.
The task is commonly called: supervised learning, classification, or inductive learning.
The Data and Goal

Data: a set of data records (also called examples, instances, or cases) described by:
k attributes: A1, A2, ..., Ak
a class: each example is labelled with a pre-defined class.
Goal: learn a classification model from the data that can be used to predict the classes of new (future, or test) cases/instances.



Unintentional

Hard work is the key to success!
Loan application example: Approved or not



The Learning Task

Learn a classification model from the data.
Use the model to classify future loan applications into:
Yes (approved) and
No (not approved)
What is the class for the following case/instance?



Supervised vs. Unsupervised

Supervised learning: classification is seen as supervised learning from examples.
Supervision: the data (observations, measurements, etc.) are labeled with pre-defined classes, as if a teacher gives the classes (supervision).
Test data are classified into these classes too.
Unsupervised learning (clustering):
Class labels of the data are unknown.
Given a set of data, the task is to establish the existence of classes or clusters in the data.



Break

https://www.youtube.com/watch?v=mq0D5kq9ePE
Steps in Supervised Learning

Learning (training): learn a model using the training data
Testing: test the model using unseen test data to assess the model accuracy

Accuracy = Number of correct classifications / Total number of test cases
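The accuracy formula translates directly to code; the example predictions and labels below are invented for illustration:

```python
def accuracy(predictions, labels):
    """Number of correct classifications / total number of test cases."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)

print(accuracy(["Yes", "No", "Yes", "Yes"],
               ["Yes", "No", "No",  "Yes"]))  # -> 0.75 (3 of 4 correct)
```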



What is Learning?

Given
a data set D,
a task T, and
a performance measure M,
a computer system is said to learn from D to perform the task T if, after learning, the system's performance on T improves as measured by M.
In other words, the learned model helps the system perform T better compared to no learning.
Example: Learning

Data: loan application data
Task: predict whether a loan should be approved or not
Performance measure: accuracy

No learning: classify all future applications (test data) to the majority class (i.e., Yes):
Accuracy = 9/15 = 60%
We can do better than 60% with learning.



Assumption in Learning

Assumption: the distribution of training examples is identical to the distribution of test examples (including future unseen examples).

In practice, this assumption is often violated to some degree.
Strong violations will clearly result in poor classification accuracy.
To achieve good accuracy on the test data, training examples must be sufficiently representative of the test data.
Discriminative

Training Set → Learning Algorithm → h

x → h → predicted y
Discriminative

1. Nearest Neighbors
2. Linear Regression
3. SVMs
Background on supervised task
Dive into three particular algorithms:
Nearest neighbor
Regression
SVM
Talk about carpentry
K Nearest Neighbors
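A minimal k-nearest-neighbors classifier, assuming Euclidean distance and a majority vote among the k closest training points (the training data here is invented for illustration):

```python
from collections import Counter
import math

def knn_predict(train, x, k=3):
    """Classify x by majority vote among its k nearest training points.
    train: list of (point, label) pairs; points are tuples of numbers."""
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Two well-separated clusters labeled "A" and "B".
train = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
         ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B")]
print(knn_predict(train, (1, 1)))  # -> "A"
print(knn_predict(train, (5, 4)))  # -> "B"
```

Note that every distance computation touches every training point, which is why high-dimensional feature spaces (the curse of dimensionality, next slide) hurt this method.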
Curse of Dimensionality
Linear Regression
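A sketch of the simplest case, ordinary least squares with a single input variable (the data points are invented and lie exactly on y = 2x + 1):

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = a*x + b (one input variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x.
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return a, b

a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # -> 2.0 1.0
```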

Support Vector Machine

Why "Machine"?
Vladimir Vapnik

Support Vector Machine

Theoretical guarantees on performance
Correlation between features doesn't throw it off
Less prone to the curse of dimensionality
Live Example
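As a stand-in for the live example, here is a toy linear soft-margin SVM trained by sub-gradient descent on the hinge loss (Pegasos-style). This is an illustrative sketch, not LibSVM; the data, learning rate, and regularization constant are made-up assumptions:

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear soft-margin SVM via sub-gradient descent on the hinge loss.
    Labels in y must be +1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Hinge loss is active: push the boundary away from xi.
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Only the regularizer pulls on w.
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b

def svm_predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Two well-separated clusters, labeled -1 and +1.
X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print([svm_predict(w, b, xi) for xi in X])  # should recover the labels
```

Real SVM libraries solve the dual problem and support kernels for non-linear boundaries; this sketch shows only the linear primal objective.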
Background on supervised task
Dive into three particular algorithms:
Nearest neighbor
Regression
SVM
Talk about carpentry
Carpentry
LibSVM
Cross Validation

[Diagram: data split into a Train Set and a held-out Test Set]

N-Fold Cross Validation

[Diagram: data split into N folds; each fold in turn serves as the Test Set while the remaining folds form the Train Set]
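The N-fold splitting scheme pictured above can be sketched as follows (assuming the data size divides evenly by N; real libraries also handle remainders and shuffling):

```python
def n_fold_splits(data, n):
    """Yield (train, test) pairs; each of the n folds serves once as the test set."""
    fold = len(data) // n
    for i in range(n):
        test = data[i * fold:(i + 1) * fold]
        train = data[:i * fold] + data[(i + 1) * fold:]
        yield train, test

for train, test in n_fold_splits(list(range(10)), 5):
    print("test fold:", test, "| train size:", len(train))
```

Averaging a model's score over all N held-out folds gives a less noisy accuracy estimate than a single train/test split.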
Metrics

Accuracy
F1
Precision
Recall
AUC (Area Under the Curve)
ROC (Receiver Operating Characteristic)
Efficiency (time, memory)
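Precision, recall, and F1 follow directly from the confusion counts (true positives, false positives, false negatives); the example predictions below are invented:

```python
def precision_recall_f1(predictions, labels, positive="Yes"):
    """Precision, recall, and F1 for the chosen positive class."""
    tp = sum(1 for p, y in zip(predictions, labels) if p == positive and y == positive)
    fp = sum(1 for p, y in zip(predictions, labels) if p == positive and y != positive)
    fn = sum(1 for p, y in zip(predictions, labels) if p != positive and y == positive)
    precision = tp / (tp + fp)   # of the predicted positives, how many were right
    recall = tp / (tp + fn)      # of the actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

preds  = ["Yes", "Yes", "No", "Yes", "No"]
labels = ["Yes", "No",  "No", "Yes", "Yes"]
p, r, f = precision_recall_f1(preds, labels)
print(p, r, f)
```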
AUC

Overfitting?

[Plot: AUC (how good) vs. Model Complexity, shown for the Training Set and the Testing Set]
Generative vs. Discriminative?

Generative tells a better story.
In general, discriminative is more accurate.
Generative outperforms with the use of an expert.

Bayes nets are going to come back for unsupervised learning.
Generative vs. Discriminative: An Analogy

The task is to determine the language that someone is speaking.

Generative approach: learn each language and determine which language the speech belongs to.

Discriminative approach: determine the linguistic differences without learning any language. A much easier task!
ML Taxonomy

Generative Methods
Model class-conditional pdfs (probability density functions) and prior probabilities
"Generative" since sampling can generate synthetic data points
Popular models:
Gaussians, Naive Bayes, mixtures of multinomials
Mixtures of Gaussians, mixtures of experts, Hidden Markov Models (HMM)
Sigmoidal belief networks, Bayesian networks, Markov random fields

Discriminative Methods
Directly estimate posterior probabilities
No attempt to model underlying probability distributions
Focus computational resources on the given task for better performance
Popular models:
Logistic regression, SVMs
Traditional neural networks, nearest neighbor
Conditional Random Fields (CRF)
Generative Models

Examples

Nearest neighbor (Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005)
Neural networks, 10^6 examples (LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998)
Support Vector Machines and Kernels (Guyon, Vapnik; Heisele, Serre, Poggio 2001)
Conditional Random Fields (McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003)
Further Difference

A generative algorithm models how the data was generated in order to categorize a signal. It asks itself: based on my generation assumptions, which category is most likely to have generated this signal?

A discriminative algorithm does not care how the data was generated; it simply categorizes a given signal.
Naive Bayes vs SVM
[suspense]
87.8 vs 85.9 AUC
Ensemble Method?
Motivating Example
Good Features
Detect Multiple Faces?
Sliding Window

No Face ... No Face ... No Face ... Maybe Face ... Face!
The End?
