
Data Mining

Classification: Alternative Techniques

Bayesian Classifiers

Introduction to Data Mining, 2nd Edition
by Tan, Steinbach, Karpatne, Kumar

Bayes Classifier

● A probabilistic framework for solving classification problems

● Conditional probability:

  $P(Y \mid X) = \frac{P(X, Y)}{P(X)}$

  $P(X \mid Y) = \frac{P(X, Y)}{P(Y)}$

● Bayes theorem:

  $P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)}$
Example of Bayes Theorem

● Given:
  – A doctor knows that meningitis causes stiff neck 50% of the time
  – Prior probability of any patient having meningitis is 1/50,000
  – Prior probability of any patient having stiff neck is 1/20

● If a patient has stiff neck, what's the probability he/she has meningitis?

  $P(M \mid S) = \frac{P(S \mid M)\,P(M)}{P(S)} = \frac{0.5 \times 1/50000}{1/20} = 0.0002$
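As a sanity check, this computation can be reproduced in a few lines of Python (a minimal sketch using only the numbers given above):

# Bayes theorem applied to the meningitis example (values from the slide)
p_s_given_m = 0.5        # P(S|M): meningitis causes stiff neck 50% of the time
p_m = 1 / 50000          # P(M): prior probability of meningitis
p_s = 1 / 20             # P(S): prior probability of stiff neck

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)       # 0.0002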


Using Bayes Theorem for Classification

● Consider each attribute and class label as random variables

● Given a record with attributes (X1, X2, …, Xd)
  – Goal is to predict class Y
  – Specifically, we want to find the value of Y that maximizes P(Y | X1, X2, …, Xd)

● Can we estimate P(Y | X1, X2, …, Xd) directly from data?



Example Data

Given a Test Record:

  X = (Refund = No, Divorced, Income = 120K)

  Tid   Refund   Marital Status   Taxable Income   Evade
  1     Yes      Single           125K             No
  2     No       Married          100K             No
  3     No       Single           70K              No
  4     Yes      Married          120K             No
  5     No       Divorced         95K              Yes
  6     No       Married          60K              No
  7     Yes      Divorced         220K             No
  8     No       Single           85K              Yes
  9     No       Married          75K              No
  10    No       Single           90K              Yes

  (Refund and Marital Status are categorical, Taxable Income is continuous, Evade is the class.)

● Can we estimate P(Evade = Yes | X) and P(Evade = No | X)?

  In the following we will replace Evade = Yes by Yes, and Evade = No by No.


Using Bayes Theorem for Classification

● Approach:
  – Compute the posterior probability P(Y | X1, X2, …, Xd) using Bayes theorem

    $P(Y \mid X_1 X_2 \cdots X_d) = \frac{P(X_1 X_2 \cdots X_d \mid Y)\,P(Y)}{P(X_1 X_2 \cdots X_d)}$

  – Maximum a-posteriori: choose the Y that maximizes P(Y | X1, X2, …, Xd)

  – Equivalent to choosing the value of Y that maximizes P(X1, X2, …, Xd | Y) P(Y)

● How to estimate P(X1, X2, …, Xd | Y)?


Example Data

Given a Test Record:

  X = (Refund = No, Divorced, Income = 120K)

  (Same training data table as shown earlier.)


Naïve Bayes Classifier

● Assume independence among attributes Xi when the class is given:
  – P(X1, X2, …, Xd | Yj) = P(X1 | Yj) P(X2 | Yj) … P(Xd | Yj)

  – Now we can estimate P(Xi | Yj) for all Xi and Yj combinations from the training data

  – A new point is classified as Yj if P(Yj) Π P(Xi | Yj) is maximal, as in the sketch below.
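A minimal sketch of this decision rule in Python, assuming the priors P(Yj) and the conditional probabilities P(Xi | Yj) have already been estimated from training data (the dictionary layout here is illustrative, not a fixed API):

import math

def naive_bayes_predict(x, priors, cond_prob):
    """Return the class y maximizing P(y) * prod_i P(x_i | y).

    x:         tuple of attribute values (x_1, ..., x_d)
    priors:    dict mapping class label y -> P(y)
    cond_prob: dict mapping (i, x_i, y) -> P(x_i | y)
    """
    best_class, best_score = None, -math.inf
    for y, prior in priors.items():
        # Sum of logs instead of a product of probabilities, to avoid
        # underflow; a zero conditional probability still sinks the
        # class, since log 0 is treated as -infinity here.
        score = math.log(prior)
        for i, xi in enumerate(x):
            p = cond_prob.get((i, xi, y), 0.0)
            score += math.log(p) if p > 0 else -math.inf
        if score > best_score:
            best_class, best_score = y, score
    return best_class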



Conditional Independence

● X and Y are conditionally independent given Z if P(X | Y, Z) = P(X | Z)

● Example: arm length and reading skills
  – A young child has shorter arm length and limited reading skills, compared to adults
  – If age is fixed, there is no apparent relationship between arm length and reading skills
  – Arm length and reading skills are conditionally independent given age


Naïve Bayes on Example Data

Given a Test Record:

  X = (Refund = No, Divorced, Income = 120K)

  (Training data table as shown earlier.)

● P(X | Yes) = P(Refund = No | Yes) ×
               P(Divorced | Yes) ×
               P(Income = 120K | Yes)

● P(X | No) = P(Refund = No | No) ×
              P(Divorced | No) ×
              P(Income = 120K | No)



Estimate Probabilities from Data

  (Training data table as shown earlier.)

● Class: P(Y) = Nc / N
  – e.g., P(No) = 7/10, P(Yes) = 3/10

● For categorical attributes:

    P(Xi | Yk) = |Xik| / Nc

  – where |Xik| is the number of instances having attribute value Xi and belonging to class Yk
  – Examples, computed in the sketch below:
      P(Status = Married | No) = 4/7
      P(Refund = Yes | Yes) = 0
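These counts are easy to reproduce directly from the table; a short Python sketch (the attribute tuples are transcribed from the training data above):

from collections import Counter

# (Refund, Marital Status, Evade) for the ten training records above
records = [
    ("Yes", "Single",   "No"),  ("No",  "Married",  "No"),
    ("No",  "Single",   "No"),  ("Yes", "Married",  "No"),
    ("No",  "Divorced", "Yes"), ("No",  "Married",  "No"),
    ("Yes", "Divorced", "No"),  ("No",  "Single",   "Yes"),
    ("No",  "Married",  "No"),  ("No",  "Single",   "Yes"),
]

n_c = Counter(r[-1] for r in records)    # class counts N_c
print(n_c["No"] / len(records))          # P(No) = 7/10

# P(Status = Married | No) = |X_ik| / N_c = 4/7
n_married_no = sum(1 for r in records if r[1] == "Married" and r[2] == "No")
print(n_married_no / n_c["No"])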

Estimate Probabilities from Data

● For continuous attributes:
  – Discretization: partition the range into bins
    ◆ Replace the continuous value with its bin value
    ◆ The attribute is changed from continuous to ordinal

  – Probability density estimation:
    ◆ Assume the attribute follows a normal distribution
    ◆ Use the data to estimate the parameters of the distribution (e.g., mean and standard deviation)
    ◆ Once the probability distribution is known, use it to estimate the conditional probability P(Xi | Y)



Estimate Probabilities from Data

  (Training data table as shown earlier.)

● Normal distribution:

  $P(X_i \mid Y_j) = \frac{1}{\sqrt{2\pi\sigma_{ij}^2}}\, e^{-\frac{(X_i - \mu_{ij})^2}{2\sigma_{ij}^2}}$

  – One for each (Xi, Yj) pair

● For (Income, Class = No):
  – sample mean = 110
  – sample variance = 2975

  $P(\text{Income} = 120 \mid \text{No}) = \frac{1}{\sqrt{2\pi}\,(54.54)}\, e^{-\frac{(120-110)^2}{2(2975)}} = 0.0072$
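The sketch below reproduces this estimate from the Taxable Income values of the Class = No records (the sample variance uses the n − 1 denominator, matching the 2975 above):

import math

incomes_no = [125, 100, 70, 120, 60, 220, 75]   # Taxable Income where Evade = No
mean = sum(incomes_no) / len(incomes_no)                                 # 110
var = sum((x - mean) ** 2 for x in incomes_no) / (len(incomes_no) - 1)   # 2975

def normal_pdf(x, mu, sigma2):
    # Normal density with mean mu and variance sigma2
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

print(normal_pdf(120, mean, var))   # ≈ 0.0072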

Example of Naïve Bayes Classifier

Given a Test Record:

  X = (Refund = No, Divorced, Income = 120K)

Naïve Bayes Classifier:

  P(Refund = Yes | No) = 3/7
  P(Refund = No | No) = 4/7
  P(Refund = Yes | Yes) = 0
  P(Refund = No | Yes) = 1
  P(Marital Status = Single | No) = 2/7
  P(Marital Status = Divorced | No) = 1/7
  P(Marital Status = Married | No) = 4/7
  P(Marital Status = Single | Yes) = 2/3
  P(Marital Status = Divorced | Yes) = 1/3
  P(Marital Status = Married | Yes) = 0

  For Taxable Income:
    If class = No:  sample mean = 110, sample variance = 2975
    If class = Yes: sample mean = 90, sample variance = 25

● P(X | No) = P(Refund = No | No) × P(Divorced | No) × P(Income = 120K | No)
            = 4/7 × 1/7 × 0.0072 = 0.0006

● P(X | Yes) = P(Refund = No | Yes) × P(Divorced | Yes) × P(Income = 120K | Yes)
             = 1 × 1/3 × 1.2 × 10⁻⁹ = 4 × 10⁻¹⁰

● Since P(X | No) P(No) > P(X | Yes) P(Yes), we have P(No | X) > P(Yes | X)
  ⇒ Class = No
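Putting the pieces together in Python (all numbers taken from the estimates above):

p_x_given_no  = 4/7 * 1/7 * 0.0072   # P(Refund=No|No) * P(Divorced|No) * P(Income=120K|No)
p_x_given_yes = 1 * 1/3 * 1.2e-9     # P(Refund=No|Yes) * P(Divorced|Yes) * P(Income=120K|Yes)

score_no  = p_x_given_no * 7/10      # P(X|No)  * P(No)
score_yes = p_x_given_yes * 3/10     # P(X|Yes) * P(Yes)
print(score_no > score_yes)          # True => Class = No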



Example of Naïve Bayes Classifier

Given a Test Record:

  X = (Refund = No, Divorced, Income = 120K)

  (Using the same Naïve Bayes estimates as on the previous slide.)

● P(Yes) = 3/10
  P(No) = 7/10

● P(Yes | Divorced) = 1/3 × 3/10 / P(Divorced)
  P(No | Divorced) = 1/7 × 7/10 / P(Divorced)

● P(Yes | Refund = No, Divorced) = 1 × 1/3 × 3/10 / P(Divorced, Refund = No)
  P(No | Refund = No, Divorced) = 4/7 × 1/7 × 7/10 / P(Divorced, Refund = No)


Issues with Naïve Bayes Classifier

  (Using the same Naïve Bayes estimates as on the previous slides.)

● P(Yes) = 3/10
  P(No) = 7/10

● P(Yes | Married) = 0 × 3/10 / P(Married)
  P(No | Married) = 4/7 × 7/10 / P(Married)

● Because P(Marital Status = Married | Yes) = 0, a married record can never be assigned to class Yes.



Issues with Naïve Bayes Classifier

Consider the training data table shown earlier, with the record Tid = 7 deleted.

Naïve Bayes Classifier:

  P(Refund = Yes | No) = 2/6
  P(Refund = No | No) = 4/6
  P(Refund = Yes | Yes) = 0
  P(Refund = No | Yes) = 1
  P(Marital Status = Single | No) = 2/6
  P(Marital Status = Divorced | No) = 0
  P(Marital Status = Married | No) = 4/6
  P(Marital Status = Single | Yes) = 2/3
  P(Marital Status = Divorced | Yes) = 1/3
  P(Marital Status = Married | Yes) = 0/3

  For Taxable Income:
    If class = No:  sample mean = 91, sample variance = 685
    If class = Yes: sample mean = 90, sample variance = 25

Given X = (Refund = Yes, Divorced, 120K):

  P(X | No) = 2/6 × 0 × 0.0083 = 0
  P(X | Yes) = 0 × 1/3 × 1.2 × 10⁻⁹ = 0

Naïve Bayes will not be able to classify X as Yes or No!


Issues with Naïve Bayes Classifier

● If one of the conditional probabilities is zero, then the entire expression becomes zero

● Need to use estimates of conditional probabilities other than simple fractions

● Probability estimation:

  Original:    $P(A_i \mid C) = \frac{N_{ic}}{N_c}$

  Laplace:     $P(A_i \mid C) = \frac{N_{ic} + 1}{N_c + c}$

  m-estimate:  $P(A_i \mid C) = \frac{N_{ic} + mp}{N_c + m}$

  where c is the number of classes, p is the prior probability of the class, m is a parameter, Nc is the number of instances in the class, and Nic is the number of instances having attribute value Ai in class c.
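A short sketch of these estimators; the example revisits the zero probability P(Marital Status = Married | Yes) from the previous slides (the m and p values passed to the m-estimate are illustrative, not prescribed):

def laplace(n_ic, n_c, c):
    # Laplace estimate: (N_ic + 1) / (N_c + c)
    return (n_ic + 1) / (n_c + c)

def m_estimate(n_ic, n_c, m, p):
    # m-estimate: (N_ic + m*p) / (N_c + m)
    return (n_ic + m * p) / (n_c + m)

# P(Marital Status = Married | Yes): N_ic = 0, N_c = 3, c = 2 classes
print(laplace(0, 3, 2))              # 1/5 instead of 0
print(m_estimate(0, 3, m=3, p=1/3))  # 1/6, with illustrative m = 3, p = 1/3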
Example of Naïve Bayes Classifier

  Name            Give Birth   Can Fly   Live in Water   Have Legs   Class
  human           yes          no        no              yes         mammals
  python          no           no        no              no          non-mammals
  salmon          no           no        yes             no          non-mammals
  whale           yes          no        yes             no          mammals
  frog            no           no        sometimes       yes         non-mammals
  komodo          no           no        no              yes         non-mammals
  bat             yes          yes       no              yes         mammals
  pigeon          no           yes       no              yes         non-mammals
  cat             yes          no        no              yes         mammals
  leopard shark   yes          no        yes             no          non-mammals
  turtle          no           no        sometimes       yes         non-mammals
  penguin         no           no        sometimes       yes         non-mammals
  porcupine       yes          no        no              yes         mammals
  eel             no           no        yes             no          non-mammals
  salamander      no           no        sometimes       yes         non-mammals
  gila monster    no           no        no              yes         non-mammals
  platypus        no           no        no              yes         mammals
  owl             no           yes       no              yes         non-mammals
  dolphin         yes          no        yes             no          mammals
  eagle           no           yes       no              yes         non-mammals

A: attributes, M: mammals, N: non-mammals

Test record: Give Birth = yes, Can Fly = no, Live in Water = yes, Have Legs = no, Class = ?

  $P(A \mid M) = \frac{6}{7} \times \frac{6}{7} \times \frac{2}{7} \times \frac{2}{7} = 0.06$

  $P(A \mid N) = \frac{1}{13} \times \frac{10}{13} \times \frac{3}{13} \times \frac{4}{13} = 0.0042$

  $P(A \mid M)\,P(M) = 0.06 \times \frac{7}{20} = 0.021$

  $P(A \mid N)\,P(N) = 0.004 \times \frac{13}{20} = 0.0027$

P(A|M) P(M) > P(A|N) P(N) ⇒ Mammals
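The same computation in Python, using the fractions read off the table above:

p_a_m = (6/7) * (6/7) * (2/7) * (2/7)        # P(A|M) ≈ 0.06
p_a_n = (1/13) * (10/13) * (3/13) * (4/13)   # P(A|N) ≈ 0.0042

print(p_a_m * 7/20)    # P(A|M) P(M) ≈ 0.021
print(p_a_n * 13/20)   # P(A|N) P(N) ≈ 0.0027  => classify as mammal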


Naïve Bayes (Summary)

● Robust to isolated noise points

● Handles missing values by ignoring the instance during probability estimate calculations

● Robust to irrelevant attributes

● Independence assumption may not hold for some attributes
  – Use other techniques such as Bayesian Belief Networks (BBN)
Naïve Bayes

● How does Naïve Bayes perform on the following dataset?

  (Figure omitted.)

  Conditional independence of attributes is violated


Naïve Bayes

● How does Naïve Bayes perform on the following dataset?

  (Figure omitted.)

  Naïve Bayes can construct oblique decision boundaries



Naïve Bayes

● How does Naïve Bayes perform on the following dataset?

           X = 1   X = 2   X = 3   X = 4
    Y = 1    1       1       1       0
    Y = 2    0       1       0       0
    Y = 3    0       0       1       1
    Y = 4    0       0       1       1

  Conditional independence of attributes is violated


Bayesian Belief Networks

● Provides a graphical representation of probabilistic relationships among a set of random variables

● Consists of:
  – A directed acyclic graph (DAG)
    ◆ Each node corresponds to a variable
    ◆ Each arc corresponds to a dependence relationship between a pair of variables
    (Figure: example graph with arcs A → C and B → C)

  – A probability table associating each node with its immediate parents



Conditional Independence

  (Figure: D → C, with A and B as children of C)

  – D is parent of C
  – A is child of C
  – B is descendant of D
  – D is ancestor of A

● A node in a Bayesian network is conditionally independent of all of its nondescendants, if its parents are known

Conditional Independence

● Naïve Bayes assumption:

  (Figure: class node Y with an arc to each attribute X1, X2, X3, X4, …, Xd)



Probability Tables

● If X does not have any parents, the table contains the prior probability P(X)

● If X has only one parent (Y), the table contains the conditional probability P(X | Y)

● If X has multiple parents (Y1, Y2, …, Yk), the table contains the conditional probability P(X | Y1, Y2, …, Yk)


Example of Bayesian Belief Network

  (Network structure: Exercise → Heart Disease ← Diet; Heart Disease → Chest Pain, Heart Disease → Blood Pressure)

  P(Exercise = Yes) = 0.7      P(Diet = Healthy) = 0.25
  P(Exercise = No)  = 0.3      P(Diet = Unhealthy) = 0.75

  Heart Disease (HD):
                E=Yes,       E=Yes,         E=No,        E=No,
                D=Healthy    D=Unhealthy    D=Healthy    D=Unhealthy
    HD = Yes    0.25         0.45           0.55         0.75
    HD = No     0.75         0.55           0.45         0.25

  Chest Pain (CP):              Blood Pressure (BP):
               HD=Yes   HD=No                 HD=Yes   HD=No
    CP = Yes   0.8      0.01      BP = High   0.85     0.2
    CP = No    0.2      0.99      BP = Low    0.15     0.8


Example of Inferencing using BBN

● Given: X = (E = No, D = Healthy, CP = Yes, BP = High)
  – Compute P(HD | E, D, CP, BP)

● P(HD = Yes | E = No, D = Healthy) = 0.55
  P(CP = Yes | HD = Yes) = 0.8
  P(BP = High | HD = Yes) = 0.85
  – P(HD = Yes | E = No, D = Healthy, CP = Yes, BP = High) ∝ 0.55 × 0.8 × 0.85 = 0.374

● P(HD = No | E = No, D = Healthy) = 0.45
  P(CP = Yes | HD = No) = 0.01
  P(BP = High | HD = No) = 0.2
  – P(HD = No | E = No, D = Healthy, CP = Yes, BP = High) ∝ 0.45 × 0.01 × 0.2 = 0.0009

  ⇒ Classify X as HD = Yes
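The same inference in a few lines of Python (CPT entries taken from the network on the previous slide); normalizing the two products gives the actual posterior:

# Evidence: E = No, D = Healthy, CP = Yes, BP = High
p_yes = 0.55 * 0.8 * 0.85   # P(HD=Yes|E,D) * P(CP=Yes|HD=Yes) * P(BP=High|HD=Yes)
p_no  = 0.45 * 0.01 * 0.2   # P(HD=No|E,D)  * P(CP=Yes|HD=No)  * P(BP=High|HD=No)

print(p_yes, p_no)              # 0.374 vs 0.0009 => classify as HD = Yes
print(p_yes / (p_yes + p_no))   # posterior P(HD=Yes | X) ≈ 0.998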

