Academic documents
Professional documents
Cultural documents
The best class is the maximum a posteriori (MAP) class:

$$\hat{c}_{\text{map}} = \arg\max_{c \in C} \Big[ \log \hat{P}(c) + \sum_{1 \le k \le n_d} \log \hat{P}(t_k \mid c) \Big]$$

where $\log \hat{P}(c)$ is the relative frequency of $c$, and $\log \hat{P}(t_k \mid c)$ measures how good an indicator $t_k$ is for $c$.
TDT4215
Naive Bayes Text Classification
NAIVE BAYES TEXT CLASSIFICATION
$$\hat{c}_{\text{map}} = \arg\max_{c \in C} \Big[ \log \hat{P}(c) + \sum_{1 \le k \le n_d} \log \hat{P}(t_k \mid c) \Big]$$

How do we estimate the parameters? Maximum Likelihood Estimation (MLE).
What is Maximum Likelihood Estimation (MLE)?
It is the relative frequency, and corresponds to the most likely value of each parameter given the training data.
How?

For the priors:

$$\hat{P}(c) = \frac{N_c}{N}$$

where $N_c$ is the number of documents in class $c$ and $N$ is the total number of documents.
For the conditional probability:

$$\hat{P}(t \mid c) = \frac{T_{ct}}{\sum_{t' \in V} T_{ct'}}$$

where $T_{ct}$ is the number of occurrences of $t$ in training documents from class $c$.
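The two MLE estimates above can be computed directly from counts. The toy corpus, labels, and variable names below are illustrative assumptions, not data from the slides:

```python
from collections import Counter

# Hypothetical toy training set: (token list, class label).
docs = [
    (["good", "good", "great"], "pos"),
    (["good", "bad"], "pos"),
    (["bad", "awful"], "neg"),
]

# Priors: P(c) = N_c / N.
N = len(docs)
N_c = Counter(label for _, label in docs)
prior = {c: N_c[c] / N for c in N_c}

# Conditionals (unsmoothed MLE): P(t|c) = T_ct / sum over t' of T_ct'.
cond = {}
for c in N_c:
    T_c = Counter(t for tokens, label in docs if label == c for t in tokens)
    total = sum(T_c.values())
    cond[c] = {t: T_c[t] / total for t in T_c}

print(prior["pos"])         # 2 of 3 documents are "pos"
print(cond["pos"]["good"])  # 3 of the 5 "pos" tokens are "good" -> 0.6
```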
A problem with MLE
What if a term did not occur in the training data of a class? Then

$$\hat{P}(t \mid c) = \frac{T_{ct}}{\sum_{t' \in V} T_{ct'}} = 0$$

but for $\log \hat{P}(t \mid c)$ we need $\hat{P}(t \mid c) > 0$.
Solution: add-one or Laplace smoothing

$$\hat{P}(t \mid c) = \frac{T_{ct} + 1}{\sum_{t' \in V} (T_{ct'} + 1)} = \frac{T_{ct} + 1}{\sum_{t' \in V} T_{ct'} + B}$$

where $B = |V|$ is the number of terms in the vocabulary.
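A minimal sketch of add-one smoothing. The counts are illustrative assumptions, chosen so the result matches the (5+1)/(8+6)-style computations used in the worked example in these slides:

```python
from collections import Counter

# Illustrative term counts for one class c: 8 tokens total,
# vocabulary of B = 6 terms (assumption for this sketch).
T_c = Counter({"chinese": 5, "beijing": 1, "shanghai": 1, "macao": 1})
vocab = ["chinese", "beijing", "shanghai", "macao", "tokyo", "japan"]

B = len(vocab)                      # B = |V|
total = sum(T_c[t] for t in vocab)  # sum over t' of T_ct'

def p_laplace(t):
    # P(t|c) = (T_ct + 1) / (sum over t' of T_ct' + B)
    return (T_c[t] + 1) / (total + B)

print(p_laplace("chinese"))  # (5+1)/(8+6) = 3/7
print(p_laplace("tokyo"))    # (0+1)/(8+6) = 1/14, never zero
```

Note that the smoothed estimates still sum to one over the vocabulary.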
Naive Bayes algorithm: Training
Naive Bayes algorithm: Testing
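The training and testing steps can be sketched as follows: a minimal multinomial NB with add-one smoothing. The function names and the tiny corpus are illustrative assumptions, not from the slides:

```python
import math
from collections import Counter

def train_multinomial_nb(docs):
    """docs: list of (token list, class label). Returns priors, conditionals, vocab."""
    classes = {label for _, label in docs}
    vocab = {t for tokens, _ in docs for t in tokens}
    prior, condprob = {}, {}
    for c in classes:
        class_tokens = [t for tokens, label in docs if label == c for t in tokens]
        prior[c] = sum(1 for _, label in docs if label == c) / len(docs)
        counts = Counter(class_tokens)
        total = len(class_tokens)
        # Laplace smoothing: (T_ct + 1) / (sum of T_ct' + |V|)
        condprob[c] = {t: (counts[t] + 1) / (total + len(vocab)) for t in vocab}
    return prior, condprob, vocab

def apply_multinomial_nb(prior, condprob, vocab, tokens):
    """argmax_c [log P(c) + sum_k log P(t_k|c)]; unseen terms are ignored."""
    scores = {
        c: math.log(prior[c])
           + sum(math.log(condprob[c][t]) for t in tokens if t in vocab)
        for c in prior
    }
    return max(scores, key=scores.get)

# Usage on a tiny illustrative corpus (an assumption for this sketch):
docs = [
    (["chinese", "beijing", "chinese"], "china"),
    (["chinese", "chinese", "shanghai"], "china"),
    (["chinese", "macao"], "china"),
    (["tokyo", "japan", "chinese"], "not-china"),
]
prior, condprob, vocab = train_multinomial_nb(docs)
label = apply_multinomial_nb(prior, condprob, vocab,
                             ["chinese", "chinese", "chinese", "tokyo", "japan"])
print(label)  # china
```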
Question:
Decide whether document d5 belongs to class c = China.
Solution:

Training:
$$\hat{P}(c) = 3/4, \quad \hat{P}(\bar{c}) = 1/4$$
$$\hat{P}(\text{Chinese} \mid c) = (5+1)/(8+6) = 6/14 = 3/7$$
$$\hat{P}(\text{Tokyo} \mid c) = \hat{P}(\text{Japan} \mid c) = (0+1)/(8+6) = 1/14$$
$$\hat{P}(\text{Chinese} \mid \bar{c}) = (1+1)/(3+6) = 2/9$$
$$\hat{P}(\text{Tokyo} \mid \bar{c}) = \hat{P}(\text{Japan} \mid \bar{c}) = (1+1)/(3+6) = 2/9$$

Testing:
$$\hat{P}(c \mid d_5) \propto 3/4 \cdot (3/7)^3 \cdot 1/14 \cdot 1/14 \approx 0.0003$$
$$\hat{P}(\bar{c} \mid d_5) \propto 1/4 \cdot (2/9)^3 \cdot 2/9 \cdot 2/9 \approx 0.0001$$

d5 is assigned to class c = China.
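The testing arithmetic can be checked exactly with fractions. This sketch assumes d5 = (Chinese, Chinese, Chinese, Tokyo, Japan), i.e. Chinese occurring three times plus Tokyo and Japan; the estimates are the training values above:

```python
from fractions import Fraction as F

# Smoothed estimates from the training step.
p_chinese_c, p_tokyo_c, p_japan_c = F(3, 7), F(1, 14), F(1, 14)
p_chinese_nc = p_tokyo_nc = p_japan_nc = F(2, 9)

# Score d5 under each class (priors times term conditionals).
score_c = F(3, 4) * p_chinese_c**3 * p_tokyo_c * p_japan_c
score_nc = F(1, 4) * p_chinese_nc**3 * p_tokyo_nc * p_japan_nc

print(float(score_c))      # about 0.0003
print(float(score_nc))     # about 0.0001
print(score_c > score_nc)  # True: d5 goes to c = China
```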
There are two different ways to set up an NB classifier:
multinomial Naive Bayes (multinomial NB model)
multivariate Bernoulli model (Bernoulli model)
Bernoulli model
It differs from the multinomial NB model in:
different estimation strategies
different classification rules
Training: the prior probabilities are estimated the same way in both models. The conditional probabilities differ: the multinomial model uses the fraction of tokens in documents of class c that are t, while the Bernoulli model uses the fraction of documents of class c that contain t.
Classification: the multinomial model considers only terms that appear in the document; in the Bernoulli model, non-occurring terms also affect the computation.
Question with Bernoulli model:
Decide whether document d5 belongs to class c = China.
Solution with Bernoulli model:

Training:
$$\hat{P}(c) = 3/4, \quad \hat{P}(\bar{c}) = 1/4$$
$$\hat{P}(\text{Chinese} \mid c) = (3+1)/(3+2) = 4/5$$
$$\hat{P}(\text{Tokyo} \mid c) = \hat{P}(\text{Japan} \mid c) = (0+1)/(3+2) = 1/5$$
$$\hat{P}(\text{Beijing} \mid c) = \hat{P}(\text{Macao} \mid c) = \hat{P}(\text{Shanghai} \mid c) = (1+1)/(3+2) = 2/5$$
$$\hat{P}(\text{Chinese} \mid \bar{c}) = (1+1)/(1+2) = 2/3$$
$$\hat{P}(\text{Tokyo} \mid \bar{c}) = \hat{P}(\text{Japan} \mid \bar{c}) = (1+1)/(1+2) = 2/3$$
$$\hat{P}(\text{Beijing} \mid \bar{c}) = \hat{P}(\text{Macao} \mid \bar{c}) = \hat{P}(\text{Shanghai} \mid \bar{c}) = (0+1)/(1+2) = 1/3$$

Testing:
$$\hat{P}(c \mid d_5) \approx 0.005$$
$$\hat{P}(\bar{c} \mid d_5) \approx 0.022$$

d5 is assigned to class not-China.
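A sketch that reproduces the Bernoulli scores exactly. It assumes d5 contains the terms Chinese, Tokyo, and Japan in the binary occurrence view; the absent terms Beijing, Macao, and Shanghai enter the product via (1 − P):

```python
from fractions import Fraction as F

# Smoothed Bernoulli estimates from the training step.
p_c = {"chinese": F(4, 5), "tokyo": F(1, 5), "japan": F(1, 5),
       "beijing": F(2, 5), "macao": F(2, 5), "shanghai": F(2, 5)}
p_nc = {"chinese": F(2, 3), "tokyo": F(2, 3), "japan": F(2, 3),
        "beijing": F(1, 3), "macao": F(1, 3), "shanghai": F(1, 3)}

d5 = {"chinese", "tokyo", "japan"}  # binary occurrence view of d5

def bernoulli_score(prior, cond):
    score = prior
    for t, pt in cond.items():
        score *= pt if t in d5 else (1 - pt)  # absent terms also contribute
    return score

score_c = bernoulli_score(F(3, 4), p_c)
score_nc = bernoulli_score(F(1, 4), p_nc)
print(float(score_c))   # about 0.005
print(float(score_nc))  # about 0.022 -> d5 is classified as not-China
```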
OUTLINES
Introduction: motivation and methods
The Text Classification Problem
Naive Bayes Text Classification
Properties of Naive Bayes
Feature Selection
Evaluation of Text Classification
PROPERTIES OF NAIVE BAYES

Recall Bayes' rule:

$$P(AB) = P(A)P(B \mid A) = P(B)P(A \mid B)$$

$$P(B \mid A) = \frac{P(B)P(A \mid B)}{P(A)}$$
With Bayes' rule, for a document d and a class c:

$$P(c \mid d) = \frac{P(d \mid c)P(c)}{P(d)}$$

$$c_{\text{map}} = \arg\max_{c \in C} P(c \mid d) = \arg\max_{c \in C} \frac{P(d \mid c)P(c)}{P(d)} = \arg\max_{c \in C} P(d \mid c)P(c)$$

since $P(d)$ does not affect the result.
$$c_{\text{map}} = \arg\max_{c \in C} P(d \mid c)P(c)$$

It has high time complexity to compute both conditional probabilities. How to compute $P(d \mid c)$:

Multinomial:
$$P(d \mid c) = P(\langle t_1, \ldots, t_k, \ldots, t_{n_d} \rangle \mid c)$$
where $\langle t_1, \ldots, t_{n_d} \rangle$ is the sequence of terms as it occurs in $d$.

Bernoulli:
$$P(d \mid c) = P(\langle e_1, \ldots, e_k, \ldots, e_M \rangle \mid c)$$
where $\langle e_1, \ldots, e_M \rangle$ is a binary vector of dimensionality $M$ that indicates for each term whether it occurs in $d$ or not.
Conditional Independence Assumption

Multinomial:
$$P(d \mid c) = P(\langle t_1, \ldots, t_{n_d} \rangle \mid c) = \prod_{1 \le k \le n_d} P(X_k = t_k \mid c)$$
where $P(X_k = t_k \mid c)$ is the probability that in a document of class $c$ the term $t_k$ will occur in position $k$.

Bernoulli:
$$P(d \mid c) = P(\langle e_1, \ldots, e_M \rangle \mid c) = \prod_{1 \le i \le M} P(U_i = e_i \mid c)$$
where $P(U_i = e_i \mid c)$ is the probability that in a document of class $c$ the term $t_i$ will occur (if $e_i = 1$) or will not occur (if $e_i = 0$).
Multinomial:
$$P(d \mid c) = P(\langle t_1, \ldots, t_{n_d} \rangle \mid c) = \prod_{1 \le k \le n_d} P(X_k = t_k \mid c)$$

There is still high time complexity if we have to consider the position in which each term $t$ occurs.

Positional Independence Assumption:
$$P(X_{k_1} = t \mid c) = P(X_{k_2} = t \mid c)$$

This is equivalent to the bag-of-words model.
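Under positional independence, a document reduces to its term counts; a one-line illustration with made-up token lists:

```python
from collections import Counter

# Two token sequences with the same words in different positions
# have identical bag-of-words representations.
d1 = ["to", "be", "or", "not", "to", "be"]
d2 = ["be", "to", "be", "not", "or", "to"]

bag1, bag2 = Counter(d1), Counter(d2)
print(bag1 == bag2)  # True: positions are discarded
```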
FEATURE SELECTION

Feature selection is a process of selecting a subset of the terms occurring in the training set and using only this subset as features in text classification.

Feature selection: Why?
Text collections have a large number of features
o 10,000 to 1,000,000 unique words and more
May make using a particular classifier feasible
o Some classifiers can't deal with hundreds of thousands of features
Reduces training time
o Training time for some methods is quadratic or worse in the number of features
Can improve generalization
o Eliminates noise features and avoids overfitting
Feature selection: How?
$A(t, c)$ utility measures:
frequency
mutual information
the $\chi^2$ test
Frequency-based feature selection
selecting the terms that are most common in the class
simple and easy to implement
may select some frequent terms that carry no class-specific information (such as Monday, Tuesday, ...)
however, if many thousands of features are selected, it usually does well
Mutual Information feature selection: $A(t, c) = I(U; C)$

$U$ is a random variable:
o $e_t = 1$: the document contains $t$
o $e_t = 0$: the document does not contain $t$

$C$ is a random variable:
o $e_c = 1$: the document is in class $c$
o $e_c = 0$: the document is not in class $c$
With Maximum Likelihood Estimation:
$N_{11}$: number of documents that contain $t$ and are in $c$
$N_{10}$: number of documents that contain $t$ but are NOT in $c$
$N_{01}$: number of documents that do NOT contain $t$ but are in $c$
$N_{00}$: number of documents that do NOT contain $t$ and are NOT in $c$

$N_{1.} = N_{11} + N_{10}$: number of documents that contain $t$
$N_{.1} = N_{11} + N_{01}$: number of documents in $c$
$N_{0.} = N_{01} + N_{00}$: number of documents that do NOT contain $t$
$N_{.0} = N_{10} + N_{00}$: number of documents NOT in $c$
$N = N_{00} + N_{01} + N_{10} + N_{11}$: total number of documents
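These counts plug into the MLE estimate of $I(U;C)$. A hedged sketch with hypothetical counts; the formula used is the standard plug-in estimate $\sum_{ij} (N_{ij}/N)\log_2(N \cdot N_{ij} / (N_{i.} N_{.j}))$, which is not spelled out on this slide:

```python
import math

def mutual_information(n11, n10, n01, n00):
    """MLE estimate of I(U;C) from a 2x2 document-count table."""
    n = n11 + n10 + n01 + n00
    total = 0.0
    # (N_ij, marginal over the t-dimension, marginal over the c-dimension).
    cells = [
        (n11, n11 + n10, n11 + n01),  # contains t, in c
        (n10, n11 + n10, n10 + n00),  # contains t, not in c
        (n01, n01 + n00, n11 + n01),  # no t, in c
        (n00, n01 + n00, n10 + n00),  # no t, not in c
    ]
    for nij, ni, nj in cells:
        if nij > 0:  # 0 * log(0) is treated as 0
            total += (nij / n) * math.log2(n * nij / (ni * nj))
    return total

# Hypothetical counts: term and class co-occur more than chance predicts.
print(mutual_information(60, 40, 40, 860))  # > 0: dependent
print(mutual_information(10, 90, 90, 810))  # 0: exactly independent table
```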
An example:
In Reuters-RCV1, c = poultry, t = export.
The figure shows terms with high mutual information scores for the six classes in Reuters-RCV1.
The $\chi^2$ feature selection
In statistics, the $\chi^2$ test is applied to test the independence of two events. Events $A$ and $B$ are defined to be independent if $P(AB) = P(A)P(B)$, or equivalently $P(A \mid B) = P(A)$ and $P(B \mid A) = P(B)$. In feature selection, the two events are the occurrence of the term and the occurrence of the class.
The $\chi^2$ feature selection
$N_{e_t e_c}$ has the same meaning as in Mutual Information feature selection. $E_{e_t e_c}$ is the expected frequency of $t$ and $c$ occurring together in a document, assuming that term and class are independent.
The $\chi^2$ feature selection
$N_{00}, N_{01}, N_{10}, N_{11}$, and $N$ can be counted from the training data set as in Mutual Information feature selection. $E_{00}, E_{01}, E_{10}, E_{11}$ can also be computed from the training data set.
The example again:
Compute $E_{11}$; compute the other $E_{e_t e_c}$ in the same way.
The higher the $\chi^2$ value, the more dependence between term $t$ and class $c$.
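The expected frequencies and the statistic can be sketched as follows; the counts are hypothetical, and the formula assumed is the standard $\chi^2 = \sum_{ij} (N_{ij} - E_{ij})^2 / E_{ij}$ with $E_{ij}$ computed from the marginals under independence:

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for term/class independence from a 2x2 table."""
    n = n11 + n10 + n01 + n00
    chi2 = 0.0
    cells = [
        (n11, n11 + n10, n11 + n01),  # contains t, in c
        (n10, n11 + n10, n10 + n00),  # contains t, not in c
        (n01, n01 + n00, n11 + n01),  # no t, in c
        (n00, n01 + n00, n10 + n00),  # no t, not in c
    ]
    for nij, ni, nj in cells:
        e = ni * nj / n  # expected frequency E_ij under independence
        chi2 += (nij - e) ** 2 / e
    return chi2

# E_11 for hypothetical counts: 100 docs contain t, 100 docs are in c,
# N = 1000 -> E_11 = 100 * 100 / 1000 = 10, while N_11 = 60.
print(chi_square(60, 40, 40, 860))  # large: t and c are dependent
print(chi_square(10, 90, 90, 810))  # 0.0: exactly independent table
```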
EVALUATION OF TEXT CLASSIFICATION

Evaluation must be done on test data that are independent of the training data (usually a disjoint set of instances).

Classification accuracy: c/n
n is the total number of test instances
c is the number of test instances correctly classified

Accuracy is an appropriate measure only if the percentage of documents in the class is high: for a class with relative frequency 1%, the "always no" classifier achieves 99% accuracy.
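The accuracy pitfall can be demonstrated directly with hypothetical labels:

```python
# A class with relative frequency 1%: the trivial "always no" classifier
# is 99% accurate while never finding a single positive document.
gold = [1] * 10 + [0] * 990  # 1000 test instances, 1% in the class
pred = [0] * len(gold)       # "always no" classifier

correct = sum(g == p for g, p in zip(gold, pred))
accuracy = correct / len(gold)
print(accuracy)  # 0.99
```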
SUMMARY
Introduction: motivation and methods
The Text Classification Problem
Naive Bayes Text Classification
Properties of Naive Bayes
Feature Selection
Evaluation of Text Classification