
CSE5230 Tutorial: The Naïve Bayes Classifier


MONASH UNIVERSITY
Faculty of Information Technology

CSE5230 Data Mining

Semester 2, 2004

The aim of this exercise is to learn how to construct and use a Naïve Bayes Classifier for data with
categorical attributes.

1 The weather data


In this tutorial, we will use the same weather dataset that was used in the handouts on the ID3 algorithm
and the Naïve Bayes Classifier. The data is shown in Table 1.

outlook    temperature  humidity  windy  play
---------  -----------  --------  -----  ----
sunny      hot          high      false  no
sunny      hot          high      true   no
overcast   hot          high      false  yes
rainy      mild         high      false  yes
rainy      cool         normal    false  yes
rainy      cool         normal    true   no
overcast   cool         normal    true   yes
sunny      mild         high      false  no
sunny      cool         normal    false  yes
rainy      mild         normal    false  yes
sunny      mild         normal    true   yes
overcast   mild         high      true   yes
overcast   hot          normal    false  yes
rainy      mild         high      true   no

Table 1: The weather data (Witten and Frank; 1999, p. 9).

In this dataset, there are five categorical attributes: outlook, temperature, humidity, windy, and play. We
are interested in building a system which will enable us to decide whether or not to play the game on the
basis of the weather conditions, i.e. we wish to classify the data into two classes, one where the attribute
play has the value “yes”, and the other where it has the value “no”. This classification will be based on
the values of the attributes outlook, temperature, humidity, and windy.
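
For readers who would like to follow the calculations in code (Section 3 explicitly allows a small program), the data in Table 1 can be written down directly. One possible Python representation is sketched below; the variable name weather_data and the dictionary-per-row layout are illustrative choices.

    # The weather data of Table 1, one dictionary per data instance.
    # Attribute and value names follow Table 1; the variable name is illustrative.
    weather_data = [
        {"outlook": "sunny",    "temperature": "hot",  "humidity": "high",   "windy": "false", "play": "no"},
        {"outlook": "sunny",    "temperature": "hot",  "humidity": "high",   "windy": "true",  "play": "no"},
        {"outlook": "overcast", "temperature": "hot",  "humidity": "high",   "windy": "false", "play": "yes"},
        {"outlook": "rainy",    "temperature": "mild", "humidity": "high",   "windy": "false", "play": "yes"},
        {"outlook": "rainy",    "temperature": "cool", "humidity": "normal", "windy": "false", "play": "yes"},
        {"outlook": "rainy",    "temperature": "cool", "humidity": "normal", "windy": "true",  "play": "no"},
        {"outlook": "overcast", "temperature": "cool", "humidity": "normal", "windy": "true",  "play": "yes"},
        {"outlook": "sunny",    "temperature": "mild", "humidity": "high",   "windy": "false", "play": "no"},
        {"outlook": "sunny",    "temperature": "cool", "humidity": "normal", "windy": "false", "play": "yes"},
        {"outlook": "rainy",    "temperature": "mild", "humidity": "normal", "windy": "false", "play": "yes"},
        {"outlook": "sunny",    "temperature": "mild", "humidity": "normal", "windy": "true",  "play": "yes"},
        {"outlook": "overcast", "temperature": "mild", "humidity": "high",   "windy": "true",  "play": "yes"},
        {"outlook": "overcast", "temperature": "hot",  "humidity": "normal", "windy": "false", "play": "yes"},
        {"outlook": "rainy",    "temperature": "mild", "humidity": "high",   "windy": "true",  "play": "no"},
    ]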


2 The Naïve Bayes classifier


Recall the explanation of the Naïve Bayes classifier given in lecture 5. We consider each data instance
to be an n-dimensional vector of attribute values:

    X = (x_1, x_2, x_3, ..., x_n).    (1)

For the weather data, n = 4. The first data instance in Table 1 would be written X = (sunny, hot, high, false).
In a Bayesian classifier which assigns each data instance to one of m classes C_1, C_2, ..., C_m, a data
instance X is assigned to the class for which it has the highest posterior probability conditioned on X,
i.e. the class which is most probable given the prior probabilities of the classes and the data X (Duda
et al.; 2000). That is to say, X is assigned to class C_i if and only if

    P(C_i|X) > P(C_j|X)  for all j such that 1 ≤ j ≤ m, j ≠ i.    (2)

For the weather data, m = 2, since there are two classes.


According to Bayes' Theorem,

    P(C_i|X) = P(X|C_i) P(C_i) / P(X).    (3)

Since P(X) is a normalising factor which is equal for all classes, we need only maximise the numerator
P(X|C_i)P(C_i) in order to do the classification.
We can estimate both of the values we need, P(X|C_i) and P(C_i), from the data used to build the classifier.
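
In a program, this amounts to choosing the class with the largest value of P(X|C_i)P(C_i). A minimal Python sketch, assuming hypothetical helper functions prior(c) and likelihood(x, c) that return estimates of P(C_i) and P(X|C_i) (how to obtain such estimates is the subject of the next two subsections), could look like this:

    # Assign x to the class Ci that maximises the numerator P(X|Ci)P(Ci).
    # 'prior' and 'likelihood' are assumed helper functions (see the sketches
    # in Sections 2.1 and 2.2); 'classes' is a sequence of class labels.
    def choose_class(x, classes, prior, likelihood):
        return max(classes, key=lambda c: likelihood(x, c) * prior(c))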

2.1 Estimating the class prior probabilities


We can estimate the prior probabilities of the classes from their frequencies in the training data. Consider
the weather data in Table 1. Let us label the class where play has the value “yes” as C_1, and the class
where it has the value “no” as C_2. From the frequencies in the data, we estimate:

    P(yes) = P(C_1) = 9/14
    P(no)  = P(C_2) = 5/14

These are the prior probabilities of the classes, i.e. the probabilities before we know anything about a
given data instance.
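
Because these priors are just relative frequencies, they are easy to compute in a few lines of code. A possible Python sketch (the function name estimate_priors is illustrative) follows:

    from collections import Counter

    def estimate_priors(data, class_attr="play"):
        """Estimate P(Ci) as the relative frequency of each class value in the data."""
        counts = Counter(row[class_attr] for row in data)
        total = sum(counts.values())
        return {c: n / total for c, n in counts.items()}

    # With the weather_data list sketched in Section 1, this returns
    # {"no": 5/14, "yes": 9/14}, matching the estimates above.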

2.2 Estimating the probability of the data given the class


In general, it can be very computationally expensive to compute the P(X|C_i). If each component x_k of
X can have one of r values, there are r^n combinations to consider for each of the m classes.
In order to simplify the calculation, the assumption of class conditional independence is made, i.e. that
for each class, the attributes are assumed to be independent. The classifier resulting from this assumption
is known as the Naïve Bayes classifier. The assumption allows us to write

    P(X|C_i) = ∏_{k=1}^{n} P(x_k|C_i),    (4)

i.e. the product of the probabilities of each of the values of the attributes of X for the given class C_i.
To see how this works, let us consider an example. What is the probability of outlook = sunny given
that play = no? Of the five cases where play = no, there are three where outlook = sunny, thus
P(outlook = sunny | play = no) = 3/5. In the notation of Equation 4, we may write

    P(x_1 = sunny | C_2) = 3/5.
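
The same counting can be written as a short Python function. The sketch below (the name estimate_conditional is illustrative) estimates P(x_k = v | C_i) by restricting attention to the training instances belonging to class C_i:

    def estimate_conditional(data, attr, value, class_attr, class_value):
        """Estimate P(attr = value | class_attr = class_value) by counting."""
        in_class = [row for row in data if row[class_attr] == class_value]
        matching = [row for row in in_class if row[attr] == value]
        return len(matching) / len(in_class)

    # For the weather_data list sketched in Section 1,
    # estimate_conditional(weather_data, "outlook", "sunny", "play", "no")
    # returns 3/5, as in the example above.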
Now we will consider how to put these attribute value probabilities together to calculate a P(X|C_i)
according to Equation 4. Let us consider the probability of the first data instance in Table 1, given the
class C_2 (i.e. given that play = no). We have

    P(X = (sunny, hot, high, false) | C_2)
        = P(x_1 = sunny | C_2) × P(x_2 = hot | C_2) × P(x_3 = high | C_2) × P(x_4 = false | C_2)
        = 3/5 × 2/5 × 4/5 × 2/5
        = 48/625
We can put this together with our known prior probability for class C_2 to obtain, using Equation 3,

    P(C_2 | X = (sunny, hot, high, false))
        = P(X = (sunny, hot, high, false) | C_2) P(C_2) / P(X)
        = (48/625 × 5/14) / P(X)
        = (240/8750) / P(X).

Remember that we don’t need to calculate P(X) since it is constant for all classes.
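
If you prefer to check such calculations with a program, the whole procedure can be assembled into a few short functions. The Python sketch below combines the counting used above; the names are illustrative, and no smoothing is applied (an attribute value that never occurs with a class gives a zero product).

    def naive_bayes_score(data, instance, class_attr, class_value):
        """Return the unnormalised posterior P(X|Ci)P(Ci) for one class."""
        in_class = [row for row in data if row[class_attr] == class_value]
        prior = len(in_class) / len(data)
        likelihood = 1.0
        for attr, value in instance.items():
            matching = [row for row in in_class if row[attr] == value]
            likelihood *= len(matching) / len(in_class)
        return likelihood * prior

    def naive_bayes_classify(data, instance, class_attr="play"):
        """Assign the instance to the class with the largest unnormalised posterior."""
        classes = {row[class_attr] for row in data}
        return max(classes, key=lambda c: naive_bayes_score(data, instance, class_attr, c))

    # Reproducing the calculation above with the weather_data list from Section 1:
    # naive_bayes_score(weather_data,
    #                   {"outlook": "sunny", "temperature": "hot",
    #                    "humidity": "high", "windy": "false"},
    #                   "play", "no")
    # returns (3/5 × 2/5 × 4/5 × 2/5) × 5/14 = 240/8750 ≈ 0.0274.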

3 Questions

You may answer the following questions using calculations done by hand, as above. If you wish, you
may set up an Excel spreadsheet to help, or even write a small program in the language of your choice.

Question 1  Calculate P(C_1 | X = (sunny, hot, high, false)). How would the Naïve Bayes classifier
classify the data instance X = (sunny, hot, high, false)?

Question 2  Does this agree with the classification given in Table 1 for the data instance
X = (sunny, hot, high, false)?

Question 3  Consider a new data instance X′ = (overcast, cool, high, true). How would the Naïve Bayes
classifier classify X′?

Question 4  Some algorithms (e.g. ID3) are able to produce a classifier that classifies the data in Table 1
without errors. Does the Naïve Bayes classifier achieve the same performance? (n.b. This will take some
time to compute by hand.)


References
Duda, R. O., Hart, P. E. and Stork, D. G. (2000). Pattern Classification, 2nd edn, Wiley, New York, NY,
USA.

Witten, I. H. and Frank, E. (1999). Data Mining: Practical Machine Learning Tools and Techniques
with Java Implementations, Morgan Kaufmann, San Francisco, CA, USA.
URL: http://www.cs.waikato.ac.nz/~ml/weka/book.html

David McG. Squire, August 19, 2004
