
Bayesian Learning

CSL465/603 - Fall 2016


Narayanan C Krishnan
ckn@iitrpr.ac.in

Outline
Bayes Theorem
MAP Learners
Bayes optimal classifier
Naïve Bayes classifier
Example: text classification
Bayesian networks
EM algorithm


Features of Bayesian Learning


Practical learning algorithms
Naïve Bayes learning
Bayesian network learning
Combine prior knowledge with observations
Require prior probabilities

Useful conceptual framework
Gold standard for evaluating other classifiers
Tools for analysis


Bayes Theorem
If X and Y are two random variables,
P(Y|X) = P(X|Y) P(Y) / P(X)

In the context of a classifier: hypothesis h and training data D
P(h|D) = P(D|h) P(h) / P(D)
P(h) - prior probability of hypothesis h
P(D) - prior probability of training data D
P(h|D) - probability of h given D (posterior)
P(D|h) - probability of D given h (likelihood)


Choosing the Hypotheses
Given the training data D, we are interested in the most probable hypothesis.
Maximum a posteriori (MAP) hypothesis:
h_MAP = argmax_{h ∈ H} P(h|D)
      = argmax_{h ∈ H} P(D|h) P(h) / P(D)
      = argmax_{h ∈ H} P(D|h) P(h)
If every hypothesis is a priori equally probable, P(h_i) = P(h_j) for all h_i, h_j ∈ H, then we can simplify this to the
Maximum likelihood (ML) hypothesis:
h_ML = argmax_{h ∈ H} P(D|h)

Example
Does the patient have cancer or not?
A patient takes a lab test and the result comes back positive. The test returns a correct positive result in only 98% of the cases in which the disease is actually present, and a correct negative result in only 97% of the cases in which the disease is not present. Furthermore, 0.008 of the entire population have this cancer.


P(cancer) = 0.008, P(¬cancer) = 0.992
P(+|cancer) = 0.98, P(−|cancer) = 0.02
P(+|¬cancer) = 0.03, P(−|¬cancer) = 0.97

P(cancer|+) = P(+|cancer) P(cancer) / P(+)
            = (0.98 × 0.008) / (0.98 × 0.008 + 0.03 × 0.992)
            = 0.00784 / 0.03760 ≈ 0.21
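A quick numeric check of this calculation, as a minimal Python sketch (the probabilities are exactly the ones given in the problem statement):

```python
# Priors and test characteristics from the problem statement.
p_cancer = 0.008
p_not_cancer = 1 - p_cancer          # 0.992
p_pos_given_cancer = 0.98            # sensitivity
p_pos_given_not_cancer = 0.03        # 1 - specificity (specificity = 0.97)

# Evidence: total probability of a positive test result.
p_pos = (p_pos_given_cancer * p_cancer
         + p_pos_given_not_cancer * p_not_cancer)

# Bayes theorem: posterior probability of cancer given a positive test.
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
print(p_cancer_given_pos)            # ~0.21
```

Despite the positive test, the posterior is only about 0.21, because the prior probability of cancer is so low.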


Brute-Force MAP Hypothesis Learner (1)
Given D = {<x_1, y_1>, ..., <x_N, y_N>}, the examples and their class labels:
For each hypothesis h ∈ H, calculate the posterior probability
P(h|D) = P(D|h) P(h) / P(D)
Output the hypothesis h_MAP that has the highest posterior probability
h_MAP = argmax_{h ∈ H} P(h|D)


Brute-Force MAP Hypothesis Learner (2)
Given D = {<x_1, y_1>, ..., <x_N, y_N>}, the examples and their class labels, choose P(D|h):
P(D|h) = 1 if h is consistent with D
P(D|h) = 0 otherwise

Choose P(h) to be the uniform distribution:
P(h) = 1 / |H| for all h ∈ H

Then P(D) = Σ_{h ∈ H} P(D|h) P(h) = |VS_{H,D}| / |H|,
where VS_{H,D} is the version space: the subset of H consistent with D.


Brute-Force MAP Hypothesis Learner (3)
Given D = {<x_1, y_1>, ..., <x_N, y_N>}, the examples and their class labels, choose P(D|h):
P(D|h) = 1 if h is consistent with D
P(D|h) = 0 otherwise

Choose P(h) to be the uniform distribution:
P(h) = 1 / |H| for all h ∈ H

Then
P(h|D) = 1 / |VS_{H,D}|, if h is consistent with D
P(h|D) = 0, otherwise
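A minimal sketch of the brute-force MAP learner over a toy finite hypothesis space; the threshold classifiers and the two training examples below are illustrative, not from the lecture:

```python
# Brute-force MAP: P(D|h) = 1 iff h is consistent with every (x, y) in D,
# uniform prior P(h) = 1/|H|, and P(D) = |VS_{H,D}| / |H|.
def brute_force_map(hypotheses, data):
    """hypotheses: dict name -> function x -> label. Returns (h_MAP, posterior)."""
    prior = 1.0 / len(hypotheses)
    unnormalized = {
        name: prior * all(h(x) == y for x, y in data)
        for name, h in hypotheses.items()
    }
    p_data = sum(unnormalized.values())          # = |VS_{H,D}| / |H|
    posterior = {name: p / p_data for name, p in unnormalized.items()}
    return max(posterior, key=posterior.get), posterior

# Toy hypothesis space: threshold classifiers on one real-valued feature.
hypotheses = {f"x>{t}": (lambda x, t=t: x > t) for t in (0, 1, 2, 3)}
data = [(0.5, False), (2.5, True)]
print(brute_force_map(hypotheses, data))
# Only "x>1" and "x>2" are consistent, so each gets posterior 0.5.
```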

Evolution of Posterior Probabilities

[Figure: the posterior distribution over hypotheses evolves from the uniform prior P(h), to P(h|D1), to P(h|D1, D2); as training examples arrive, the probability mass concentrates on the hypotheses that remain consistent with the data.]

Classifying new instances
Given a new instance x, what is the most probable classification?
One solution: h_MAP(x)

But can we do better?
Consider the following example containing three hypotheses:
P(h1|D) = 0.4, P(h2|D) = 0.3, P(h3|D) = 0.3
Given a new instance x,
h1(x) = +, h2(x) = −, h3(x) = −
What is the most probable classification for x?

Bayes Optimal Classifier (1)
Combine the predictions of all hypotheses, weighted by their posterior probabilities.
Bayes optimal classification:
argmax_{y ∈ Y} Σ_{h_i ∈ H} P(y|h_i) P(h_i|D)

Example
P(h1|D) = 0.4, P(−|h1) = 0, P(+|h1) = 1
P(h2|D) = 0.3, P(−|h2) = 1, P(+|h2) = 0
P(h3|D) = 0.3, P(−|h3) = 1, P(+|h3) = 0
Σ_{h_i ∈ H} P(+|h_i) P(h_i|D) = 0.4
Σ_{h_i ∈ H} P(−|h_i) P(h_i|D) = 0.6
So the Bayes optimal classification of x is −, even though h_MAP = h1 predicts +.
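A minimal sketch that reproduces this three-hypothesis example; the helper name bayes_optimal and the string labels are illustrative choices:

```python
# Bayes optimal classification: each hypothesis votes for its predicted label
# with weight equal to its posterior probability P(h_i|D).
def bayes_optimal(posteriors, predictions, labels=("+", "-")):
    """posteriors[i] = P(h_i|D); predictions[i] = label h_i assigns to x."""
    score = {y: sum(p for p, pred in zip(posteriors, predictions) if pred == y)
             for y in labels}
    return max(score, key=score.get), score

posteriors = [0.4, 0.3, 0.3]         # P(h1|D), P(h2|D), P(h3|D)
predictions = ["+", "-", "-"]        # h1(x), h2(x), h3(x)
print(bayes_optimal(posteriors, predictions))
# ('-', {'+': 0.4, '-': 0.6}) -- disagrees with h_MAP = h1, which predicts '+'
```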


Bayes Optimal Classifier (2)
Optimal in the sense that no other classification method using the same hypothesis space and the same prior knowledge can outperform it on average.
The method maximizes the probability that the new instance is classified correctly, given the available data, the hypothesis space, and the prior probabilities over the hypotheses.
But it is inefficient:
it computes the posterior probability of every hypothesis and combines the predictions of all of them.

Gibbs Classifier
Gibbs Algorithm
Choose a hypothesis h ∈ H at random, according to the posterior probability distribution over H.
Use h to classify the new instance x.

Observation: assume the target concepts are drawn at random from H according to the priors on H. Then
E[error_Gibbs] ≤ 2 E[error_BayesOptimal]
Haussler et al., ML 1994
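The Gibbs classifier is a one-step sampler; a minimal sketch on the same toy posterior as the previous slide:

```python
import random

# Gibbs classifier: sample one hypothesis from the posterior P(h|D)
# and let it classify x alone.
posteriors = [0.4, 0.3, 0.3]         # P(h1|D), P(h2|D), P(h3|D)
predictions = ["+", "-", "-"]        # h1(x), h2(x), h3(x)

h = random.choices(range(3), weights=posteriors, k=1)[0]
print(predictions[h])   # '+' with probability 0.4, '-' with probability 0.6
```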


Naïve Bayes Classifier (1)
Bayes rule, in a slightly different application:
Let Y = {y_1, y_2, ..., y_K} be the different class labels.
The label for an instance x:
P(y_k|x) = P(x|y_k) P(y_k) / P(x)
P(y_k|x) - posterior probability that instance x belongs to class y_k
P(x|y_k) - probability that an instance drawn from class y_k would be x (likelihood)
P(y_k) - probability of class y_k (prior)
P(x) - probability of instance x (evidence)

Naïve Bayes Classifier (2)
Classify instance x as the class with the maximum posterior probability:
ŷ = argmax_{y_k ∈ Y} P(y_k|x)

Ignore the denominator (since we are only interested in the maximum):
ŷ = argmax_{y_k ∈ Y} P(x|y_k) P(y_k)

If the prior is uniform:
ŷ = argmax_{y_k ∈ Y} P(x|y_k)


Naïve Bayes Classifier (3)
Look at the classifier:
ŷ = argmax_{y_k ∈ Y} P(x|y_k)

What is each instance x?
A d-dimensional tuple (x_1, ..., x_d)

Estimate the joint probability distribution P(x_1, ..., x_d | y_k)
Practical issue: we would need to know the probability of every possible instance given every possible class.
With d Boolean features and K classes, that is on the order of K · 2^d probability values!!!


Naïve Bayes Classifier (4)
Make the naïve Bayes assumption:
features/attributes are conditionally independent given the target attribute (class label)
P(x_1, ..., x_d | y_k) = Π_{j=1}^{d} P(x_j | y_k)

This results in the naïve Bayes classifier (NBC)!
ŷ = argmax_{y_k ∈ Y} P(y_k) Π_{j=1}^{d} P(x_j | y_k)
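A minimal sketch of the resulting classifier for discrete features, using raw frequency estimates (the practical issues addressed on the next slides, such as zero counts, are ignored here); the tiny dataset is illustrative:

```python
from collections import Counter, defaultdict

def train_nb(X, y):
    """X: list of feature tuples, y: list of class labels."""
    class_counts = Counter(y)
    prior = {c: class_counts[c] / len(y) for c in class_counts}
    # cond[c][j][v] = count of feature j taking value v within class c
    cond = defaultdict(lambda: defaultdict(Counter))
    for xs, c in zip(X, y):
        for j, v in enumerate(xs):
            cond[c][j][v] += 1
    return prior, cond, class_counts

def predict_nb(x, prior, cond, class_counts):
    # argmax over classes of P(y_k) * prod_j P(x_j | y_k)
    def score(c):
        s = prior[c]
        for j, v in enumerate(x):
            s *= cond[c][j][v] / class_counts[c]
        return s
    return max(prior, key=score)

# Tiny illustrative dataset: (outlook, windy) -> play
X = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes")]
y = ["yes", "no", "yes", "no"]
prior, cond, counts = train_nb(X, y)
print(predict_nb(("sunny", "no"), prior, cond, counts))   # 'yes'
```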


NBC Practical Issues (1)
Estimating the probabilities from D:
Prior probabilities
P(y_k) = |{x_i : y_i = y_k}| / |D|
If the features are discrete:
P(x_j = v | y_k) = |{x_i : x_ij = v, y_i = y_k}| / |{x_i : y_i = y_k}|


NBC Practical Issues (2)
If the features are continuous?
Assume some parameterized distribution for P(x_j | y_k), e.g., Normal.
Learn the parameters of the distribution from data, e.g., the mean and variance of the feature values.
Determine the parameters that maximize the likelihood:
x_j | y_k ~ N(μ, σ²), where μ and σ² are unknown.
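A minimal sketch of the Gaussian case for one feature and one class: estimate the mean and (maximum-likelihood) variance from that class's feature values, then evaluate the class-conditional density. The sample values are made up:

```python
import math

def gaussian_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

values_in_class = [5.1, 4.9, 5.4, 5.0]      # feature values observed in class y_k
mu = sum(values_in_class) / len(values_in_class)
var = sum((v - mu) ** 2 for v in values_in_class) / len(values_in_class)  # ML estimate

print(gaussian_pdf(5.2, mu, var))           # class-conditional likelihood of a new value
```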


Maximum Likelihood Estimate
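This slide left room for the board derivation; the standard result, assuming N i.i.d. samples x_1, ..., x_N ~ N(μ, σ²), is:

ln L(μ, σ²) = −(N/2) ln(2πσ²) − (1/(2σ²)) Σ_{i=1}^{N} (x_i − μ)²

Setting ∂/∂μ = 0 and ∂/∂σ² = 0 gives the maximum-likelihood estimates

μ̂ = (1/N) Σ_{i=1}^{N} x_i
σ̂² = (1/N) Σ_{i=1}^{N} (x_i − μ̂)²

(Note the 1/N rather than 1/(N−1): the ML variance estimate is biased.)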


NBC Practical Issues (3)
If the features are continuous?
Assume some parameterized distribution for P(x_j | y_k), e.g., Normal.
Learn the parameters of the distribution from data, e.g., the mean and variance of the feature values.
Determine the parameters that maximize the likelihood.

Or discretize the feature
E.g., bin a continuous feature such as price into ranges like {low, medium, high}


NBC Practical Issues (4)
If there are no examples in class y_k for which x_j = v:
P(x_j = v | y_k) = 0
⟹ P(y_k) Π_{j=1}^{d} P(x_j | y_k) = 0

Use the m-estimate, defined as follows:
P(x_j = v | y_k) = (|{x_i : x_ij = v, y_i = y_k}| + m·p) / (|{x_i : y_i = y_k}| + m)
p - prior estimate of the probability
m - equivalent sample size (how heavily to weight p relative to the observed data)
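A minimal sketch of the m-estimate as a function; the counts and hyperparameters in the usage line are illustrative:

```python
def m_estimate(n_c, n, p, m):
    """m-estimate of P(x_j = v | y_k).
    n_c: examples in class y_k with x_j = v; n: examples in class y_k;
    p: prior estimate of the probability; m: equivalent sample size."""
    return (n_c + m * p) / (n + m)

# Even with no observed examples of this value (n_c = 0), the estimate
# stays nonzero, so one unseen feature value no longer zeroes the product.
print(m_estimate(n_c=0, n=10, p=0.5, m=2))   # 1/12 ≈ 0.083
```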


Example: Learn to Classify Text
Problem Definition
Given a set of news articles that are of interest, we would like to learn to classify the articles by topic.

Naïve Bayes is among the most effective algorithms for this task.
What attributes will represent the documents?
Vector of words: one attribute per word position in the document

What is the target concept?
Is the document interesting?
Topic of the document

Algorithm: Learn Naïve Bayes
Collect all words and tokens that occur in the examples (D)
Vocabulary ← all distinct words and tokens in D

Compute the probabilities P(y_k) and P(w | y_k)
D_k ← examples for which the target label is y_k
P(y_k) = |D_k| / |D|
n ← total number of words in D_k (counting duplicates multiple times)
For each word w in Vocabulary
n_w ← number of times word w occurs in D_k
P(w | y_k) = (n_w + 1) / (n + |Vocabulary|)

Algorithm: Classify Naïve Bayes
Given a test instance (document):
Compute the frequency of occurrence in the test instance of each term in the vocabulary.
Apply the naïve Bayes classification rule!
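A minimal sketch of the learn/classify pair for text, following the algorithm above with the (n_w + 1) smoothing; summing log-probabilities instead of multiplying avoids numerical underflow on long documents. The two-topic corpus is made up:

```python
import math
from collections import Counter

def learn_nb_text(docs):
    """docs: list of (list_of_words, label) pairs."""
    vocab = {w for words, _ in docs for w in words}
    prior, word_prob = {}, {}
    for c in {lab for _, lab in docs}:
        class_docs = [words for words, lab in docs if lab == c]
        prior[c] = len(class_docs) / len(docs)
        counts = Counter(w for words in class_docs for w in words)
        n = sum(counts.values())                 # total words in class c
        word_prob[c] = {w: (counts[w] + 1) / (n + len(vocab)) for w in vocab}
    return prior, word_prob

def classify_nb_text(words, prior, word_prob):
    # argmax over classes of log P(y_k) + sum of log P(w | y_k)
    def log_score(c):
        return math.log(prior[c]) + sum(
            math.log(word_prob[c][w]) for w in words if w in word_prob[c])
    return max(prior, key=log_score)

docs = [(["game", "score", "win"], "sports"),
        (["election", "vote"], "politics"),
        (["team", "win"], "sports")]
prior, word_prob = learn_nb_text(docs)
print(classify_nb_text(["win", "score"], prior, word_prob))   # 'sports'
```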


Example: 20 Newsgroups
Given 1000 training documents from each group,
learn to classify new documents according to the newsgroup they came from.
NBC: 89% accuracy


Bayesian Network (1)
The naïve Bayes assumption of conditional independence is too restrictive,
but the problem is intractable without some conditional independence assumptions.
Bayesian networks describe conditional independence among subsets of variables.
This allows combining prior knowledge about (in)dependencies among variables with training data.
Recollect: Conditional Independence


Bayesian Network - Example

[Figure: directed acyclic graph with nodes Storm, BusTourGroup, Lightning, Campfire, Thunder, ForestFire. Storm and BusTourGroup are parents of Campfire; Storm is a parent of Lightning; Lightning is a parent of Thunder; Storm, Lightning, and Campfire are parents of ForestFire.]

Conditional probability table for Campfire, given its parents Storm (S) and BusTourGroup (B):

            S,B    S,¬B   ¬S,B   ¬S,¬B
 Campfire   0.4    0.1    0.8    0.2
¬Campfire   0.6    0.9    0.2    0.8

Bayes Network (2)
The network represents the joint probability distribution over all the variables:
P(Storm, BusTourGroup, ..., ForestFire)
In general,
P(x_1, x_2, ..., x_n) = Π_{i=1}^{n} P(x_i | Parents(x_i))
where Parents(x_i) denotes the immediate predecessors of x_i in the graph.

What is the Bayes network corresponding to the naïve Bayes classifier?
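A minimal sketch of reading one factor of the joint distribution off the network, using the Campfire CPT from the example slide; the factorization in the comment assumes the edge structure shown in the figure:

```python
# P(Campfire = True | Storm, BusTourGroup), from the example slide's table.
p_campfire = {
    (True, True): 0.4, (True, False): 0.1,
    (False, True): 0.8, (False, False): 0.2,
}

def campfire_factor(campfire, storm, bus_tour_group):
    """One factor of the joint: P(Campfire | Storm, BusTourGroup)."""
    p_true = p_campfire[(storm, bus_tour_group)]
    return p_true if campfire else 1 - p_true

# The full joint is a product of one such factor per node:
# P(S, B, L, C, T, F) = P(S) P(B) P(L|S) P(C|S,B) P(T|L) P(F|S,L,C)
print(campfire_factor(campfire=True, storm=False, bus_tour_group=True))  # 0.8
```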

Bayes Network (3)
Inference
A Bayes network encodes all the information required for inference.
Exact inference methods
Work well for some structures
Monte Carlo methods
Simulate the network randomly to calculate approximate solutions

Learning
If the structure is known and there are no missing values, it is easy to learn a Bayes network.
If the network structure is known and there are some missing values: the expectation maximization (EM) algorithm.
If the structure is unknown, the problem is very difficult.

Summary
Bayes rule
Bayes Optimal Classifier
Practical Naïve Bayes Classifier
Example: text classification task

Maximum-likelihood estimates
Bayesian networks
