Support Vector Machines
Henry Kautz
[Figure: example classification over Temperature and Humidity; classes: play tennis vs. do not play tennis]
Linear Support Vector Machines
Data: $\langle x_i, y_i \rangle$, $i = 1, \dots, l$, with $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$.
[Figure: linearly separable data in the $(x_1, x_2)$ plane; one class labeled $+1$, the other $-1$]
Linear SVM 2
[Figure: separating hyperplane H, with margin hyperplanes H1 where $f(x) = +1$ and H2 where $f(x) = -1$]
Recall that the distance from a point $(x_0, y_0)$ to the line $Ax + By + c = 0$ is $|Ax_0 + By_0 + c| / \sqrt{A^2 + B^2}$.
The distance between H and H1 is therefore:
$$|w \cdot x + b| / \|w\| = 1 / \|w\|$$
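The distance formula above is easy to check numerically. A minimal sketch in pure Python (the function name and example values are illustrative):

```python
import math

def distance_to_hyperplane(w, b, x):
    """Distance from point x to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm_w

# A point on the margin hyperplane H1 (where w.x + b = +1) lies at
# distance 1/||w|| from H, so the full margin width is 2/||w||.
w, b = [3.0, 4.0], -1.0       # ||w|| = 5
x_on_margin = [0.4, 0.2]      # w.x + b = 1.2 + 0.8 - 1.0 = +1
print(distance_to_hyperplane(w, b, x_on_margin))  # 1/||w|| = 0.2
```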
$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[ y_i (x_i \cdot w + b) - 1 \right]$$
subject to $\sum_i \alpha_i y_i = 0$ and $\alpha_i \ge 0$.
Quadratic Programming
• Why is this reformulation a good thing?
• The problem:
$$\text{Maximize } \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$$
subject to $\sum_i \alpha_i y_i = 0$ and $\alpha_i \ge 0$.
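Once the multipliers $\alpha_i$ are found, the classifier is $f(x) = \mathrm{sign}\big(\sum_i \alpha_i y_i \, x_i \cdot x + b\big)$, and only the support vectors (those with $\alpha_i > 0$) contribute. A minimal sketch using a toy problem small enough to solve by hand (names and values are illustrative):

```python
def dual_decision(x, support, b):
    """f(x) = sign(sum_i alpha_i * y_i * (x_i . x) + b)."""
    s = sum(a * y * sum(xi * xj for xi, xj in zip(sv, x))
            for a, y, sv in support) + b
    return 1 if s >= 0 else -1

# Toy problem solvable by hand: x1 = (1, 0) with y = +1 and
# x2 = (-1, 0) with y = -1. The dual objective 2a - 2a^2 is
# maximized at alpha1 = alpha2 = 1/2, giving w = (1, 0), b = 0.
support = [(0.5, +1, (1.0, 0.0)), (0.5, -1, (-1.0, 0.0))]
print(dual_decision((2.0, 3.0), support, b=0.0))   # +1
print(dual_decision((-0.5, 1.0), support, b=0.0))  # -1
```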
We want to maximize
$$\sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, F(x_i) \cdot F(x_j)$$
Define $K(x_i, x_j) = F(x_i) \cdot F(x_j)$.
Cool thing: K is often easy to compute directly! Here,
$$K(x_i, x_j) = (x_i \cdot x_j)^2$$
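For $d = 2$, one explicit feature map behind this kernel is $F(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$; computing $(x \cdot z)^2$ directly gives the same dot product without ever building $F(x)$. A quick numerical check in pure Python (function names are illustrative):

```python
import math

def kernel(x, z):
    """K(x, z) = (x . z)^2 -- computed directly, no feature map."""
    return sum(xi * zi for xi, zi in zip(x, z)) ** 2

def feature_map(x):
    """Explicit F(x) for d = 2, chosen so F(x) . F(z) = (x . z)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x, z = (1.0, 2.0), (3.0, 4.0)
direct = kernel(x, z)                                   # (1*3 + 2*4)^2 = 121
explicit = sum(a * b for a, b in zip(feature_map(x), feature_map(z)))
print(direct, explicit)  # both equal 121 (up to rounding)
```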
Other Kernels
Overtraining/overfitting 2
A measure of the risk of overtraining with SVM (there are also other measures).
It can be shown that the portion, n, of unseen data that will be misclassified is bounded by:
n ≤ (number of support vectors) / (number of training examples)
Ockham's razor principle: simpler systems are better than more complex ones. In the SVM case, fewer support vectors mean a simpler representation of the hyperplane.
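The bound is a simple ratio: for instance, a model that keeps 5 support vectors out of 100 training examples is bounded at 5% error on unseen data. A trivial sketch (function name illustrative):

```python
def error_bound(n_support_vectors, n_training_examples):
    """Upper bound on the portion of unseen data misclassified:
    (number of support vectors) / (number of training examples)."""
    return n_support_vectors / n_training_examples

print(error_bound(5, 100))  # 0.05
```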
[Figure: SVM model classifying documents as "Nuclear" vs. "All others"]
Cross-validation
Cross-validation: split the data into n sets, train on n−1 of the sets, and test on the set left out of training.
[Figure: 3-fold cross-validation on the "Nuclear" vs. "All others" data; in each round one fold (1, 2, or 3) serves as the test set and the remaining two form the training set]
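The n-fold procedure above can be sketched in a few lines of pure Python (a plain round-robin split; the function name is illustrative):

```python
def k_fold_splits(data, k):
    """Split data into k folds; yield (train, test) pairs so that
    each fold serves exactly once as the held-out test set."""
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

data = list(range(9))
for train, test in k_fold_splits(data, 3):
    print(len(train), len(test))  # 6 3, printed three times
```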
Performance measurements
[Figure: test data fed to the model; each prediction falls into one of four outcomes]
Predicted +1, actually +1: true positive (TP)
Predicted +1, actually −1: false positive (FP)
Predicted −1, actually −1: true negative (TN)
Predicted −1, actually +1: false negative (FN)
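Counting the four outcomes from a list of true labels and predictions is straightforward; a minimal sketch (names and example data are illustrative):

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for labels in {-1, +1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == +1 and p == +1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == +1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == +1 and p == -1)
    return tp, fp, tn, fn

y_true = [+1, +1, -1, -1, +1]
y_pred = [+1, -1, -1, +1, +1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(tp, fp, tn, fn)  # 2 1 1 1
```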
Reason: all the enemy tank photos were taken in the morning, while all the photos of our own tanks were taken at dusk. The classifier had merely learned to tell dusk from dawn!
References
http://www.kernel-machines.org/
http://www.support-vector.net/
Papers by Vapnik
C. J. C. Burges: A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery 2:121–167, 1998.