P. S. Sastry
sastry@ee.iisc.ernet.in
Pattern Recognition
A basic attribute of people: categorisation of sensory input.
Pattern → PR System → Class label
Examples of Pattern Recognition tasks
Recognising Speech
Reading a Document
Wine tasting
Character Recognition
Pattern: image.
Class: identity of the character.
Features: binary image, projections (e.g., row and column sums), moments, etc.
Examples contd.
Speech Recognition
Pattern: 1-D signal (or its sampled version).
Class: identity of speech units.
Features: LPC model of chunks of speech, spectral info, cepstrum, etc.
A pattern can become a sequence of feature vectors.
A simple PR problem
Features:
x_1: marks based on academic record
x_2: marks in the interview
Design of classifier:
We have to choose a specific form for the classifier.
What values should we use for parameters such as a, b, c?
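For concreteness, here is a minimal sketch of one such classifier, assuming a linear decision rule a*x1 + b*x2 + c > 0 for selecting a candidate; the function name and the parameter values are illustrative, not from the slides.

def classify(x1, x2, a=1.0, b=1.0, c=-100.0):
    # Select (class 1) if a*x1 + b*x2 + c > 0, else reject (class 0).
    # a, b, c are hand-picked illustrative values, not learned.
    return 1 if a * x1 + b * x2 + c > 0 else 0

print(classify(60, 55))  # 1: total marks 115 exceed the cutoff of 100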
Designing Classifiers
Training Set
Function Learning
\hat{Z}(k) = \sum_{i=1}^{M} a_i Z(k - i)
(a linear predictor: the next value of a signal Z as a weighted sum of the past M values)
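As an illustration of how such a predictor could be learned from data, here is a sketch that fits the coefficients a_i by linear least squares; the helper name fit_ar and the sinusoid test signal are assumptions, not from the slides.

import numpy as np

def fit_ar(z, M):
    # Fit z[k] ~ sum_{i=1}^{M} a_i * z[k-i] over k = M, ..., len(z)-1.
    X = np.array([z[k - M:k][::-1] for k in range(M, len(z))])  # rows: (z[k-1], ..., z[k-M])
    y = np.array(z[M:])
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a  # a[i-1] is the coefficient of Z(k - i)

z = np.sin(0.3 * np.arange(200))  # a sinusoid obeys an exact AR(2) recurrence
print(fit_ar(z, 2))               # approximately [2*cos(0.3), -1]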
Machine Learning
Design of Classifiers
[Figure: training samples from Class 1 and Class 2 plotted in the feature space]
Let 0 and 1 denote the two classes of interest.
For a feature vector X, let y(X) denote the class label of X. In general, y(X) would be random.
Statistical PR contd.
q_i(X) = f_i(X) p_i / Z,
where Z = f_0(X) p_0 + f_1(X) p_1 is the normalising constant.
(Here f_i is the class-conditional density of X, p_i the prior probability of class i, and q_i the posterior.)
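A small numeric sketch of this computation, assuming (purely for illustration) one-dimensional Gaussian class-conditional densities with hand-picked means, variances, and priors:

import numpy as np

def gaussian_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def posteriors(x, mu0=0.0, var0=1.0, mu1=2.0, var1=1.0, p0=0.5, p1=0.5):
    # q_i(x) = f_i(x) * p_i / Z,  with Z = f_0(x) p_0 + f_1(x) p_1
    f0 = gaussian_pdf(x, mu0, var0)
    f1 = gaussian_pdf(x, mu1, var1)
    Z = f0 * p0 + f1 * p1
    return f0 * p0 / Z, f1 * p1 / Z

q0, q1 = posteriors(1.5)
print(q0, q1, q0 + q1)  # the two posteriors sum to 1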
Bayes Classifier
h(X) = 0 if q_0(X) / q_1(X) > 1
     = 1 otherwise
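Continuing the sketch above, the Bayes classifier simply thresholds the posterior ratio (reusing the illustrative posteriors function):

def bayes_classify(x):
    q0, q1 = posteriors(x)
    return 0 if q0 / q1 > 1 else 1

print(bayes_classify(0.2), bayes_classify(1.8))  # 0 1 for these illustrative densities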
Optimality contd.
Type-I or Type-II errors;
Statistical PR contd.
Loss functions
L(a, b) = 0 if a = b
        = 1 otherwise.
This is the 0-1 loss. Now, for a general loss function L, the Bayes classifier becomes:
h_B(X) = 0 if q_0(X) / q_1(X) > L(0, 1) / L(1, 0)
       = 1 otherwise
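The same sketch with a general loss: only the threshold changes, from 1 to the loss ratio L(0,1)/L(1,0). The loss values below are illustrative.

def bayes_classify_loss(x, L01=5.0, L10=1.0):
    # L01 = L(0, 1): loss for deciding 0 when the true class is 1; similarly L10.
    q0, q1 = posteriors(x)
    return 0 if q0 / q1 > L01 / L10 else 1

# A large L(0, 1) shifts the decision boundary toward class 0:
print(bayes_classify(0.8), bayes_classify_loss(0.8))  # 0 vs 1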
Distance function:
Can use Euclidean distance:
d(X, X') = \sqrt{\sum_i (x_i - x'_i)^2}.
Can also scale each feature:
d(X, X') = \sqrt{\sum_i (x_i - x'_i)^2 / \sigma_i^2}.
Here \sigma_i^2 is the (estimated) variance of the ith feature.
More generally,
d(X, X') = \sqrt{(X - X')^T \Sigma^{-1} (X - X')},
where \Sigma is the (estimated) covariance matrix. Called the Mahalanobis distance.
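A compact numpy sketch of the three distances; in practice the variances and the covariance matrix would be estimated from training data, and the values below are illustrative:

import numpy as np

def euclidean(X, Xp):
    return np.sqrt(np.sum((X - Xp) ** 2))

def scaled_euclidean(X, Xp, var):   # var[i]: estimated variance of feature i
    return np.sqrt(np.sum((X - Xp) ** 2 / var))

def mahalanobis(X, Xp, Sigma):      # Sigma: estimated covariance matrix
    d = X - Xp
    return np.sqrt(d @ np.linalg.inv(Sigma) @ d)

X, Xp = np.array([1.0, 2.0]), np.array([3.0, 0.0])
print(euclidean(X, Xp))
print(scaled_euclidean(X, Xp, np.array([1.0, 4.0])))
print(mahalanobis(X, Xp, np.array([[1.0, 0.3], [0.3, 4.0]])))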
g(W, X) = \sum_{i=1}^{n} w_i x_i + w_0
h(W, X) = 1 if g(W, X) > 0
        = 0 otherwise.
Recall W = (w_0, w_1, \ldots, w_n)^T.
We want a W such that
X_i^T W > 0 if y_i = 1
        < 0 if y_i = 0
(with each X_i augmented by a constant component x_0 = 1).
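One classical procedure for finding such a W is the perceptron algorithm; a minimal sketch, assuming the data are linearly separable and each row of X is an augmented feature vector with a leading 1:

import numpy as np

def perceptron(X, y, eta=1.0, max_epochs=100):
    W = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for Xi, yi in zip(X, y):
            pred = 1 if Xi @ W > 0 else 0
            if pred != yi:                   # update only on misclassified samples
                W += eta * (yi - pred) * Xi  # moves X_i^T W toward the correct sign
                mistakes += 1
        if mistakes == 0:                    # every inequality is satisfied
            break
    return W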
The empirical error of a W on the training set is
F(W) = (1/n) \sum_{i=1}^{n} L(h(W, X_i), y_i)
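With the 0-1 loss, F(W) is just the fraction of misclassified training samples; continuing the numpy sketch:

def empirical_risk(W, X, y):
    # F(W) = (1/n) * sum_i L(h(W, X_i), y_i) with the 0-1 loss
    preds = (X @ W > 0).astype(int)
    return np.mean(preds != y)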
We need to minimize
F(W) = (1/n) \sum_{i=1}^{n} L(h(W, X_i), y_i)
There are many other loss functions that one can use, e.g., the hinge loss.
With the squared-error loss,
F(W) = (1/n) \sum_{i=1}^{n} (h(W, X_i) - y_i)^2.
Now we can use standard optimization techniques to minimize F.
If h(W, X) = W^T X, then this is standard linear least squares estimation.
If we use the sigmoid function, h(W, X) = 1 / (1 + e^{-W^T X}), then it is called logistic regression.
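A short sketch of both cases. Note that logistic regression is more commonly fit by maximizing likelihood (cross-entropy); the gradient step below follows the squared-error criterion F used on these slides, and the step size and iteration count are illustrative:

import numpy as np

def least_squares(X, y):
    # h(W, X) = W^T X: F is minimized in closed form by linear least squares.
    W, *_ = np.linalg.lstsq(X, y, rcond=None)
    return W

def logistic_regression(X, y, eta=0.5, iters=2000):
    # h(W, X) = sigmoid(W^T X): minimize F by gradient descent.
    W = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ W)))                 # h(W, X_i) for all i
        grad = 2 * X.T @ ((p - y) * p * (1 - p)) / len(y)  # gradient of squared-error F
        W -= eta * grad
    return W

X = np.array([[1.0, x] for x in [-2, -1, 0, 1, 2, 3]])  # bias column + one feature
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
print(least_squares(X, y))
print(logistic_regression(X, y))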
[Figure: three candidate separating hyperplanes H1, H2, and H3 for training samples from classes C0 and C1]
SVM idea
Logistic regression.