Ujjwal Das
Outline
SVM for more than two classes (to be discussed entirely through examples in R)
The classification problem
So, using some set of rules, we will be able to classify the entire population into two groups.
A separable data set
[Figure: scatter plot of a separable data set; vertical axis: height]
Hyperplane
A hyperplane in $p$ dimensions is the set of points $x = (x_1, x_2, \ldots, x_p)$ satisfying
\[
\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p = 0. \tag{1}
\]
So, if we find a point $x^* = (x^*_1, x^*_2, \ldots, x^*_p)$ that satisfies (1), then we say that the point $x^*$ lies on the hyperplane.
Separating Hyperplane
Let $\tilde{x} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_p)$ be a point in $p$-dimensional space that does not satisfy (1); rather, $\beta_0 + \beta_1 \tilde{x}_1 + \beta_2 \tilde{x}_2 + \cdots + \beta_p \tilde{x}_p > 0$. This informs us that the point $\tilde{x}$ lies to one side of the hyperplane.
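As a small illustration (not from the slides, with made-up coefficients), the side of the hyperplane on which a point falls can be read off from the sign of $\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$:

```r
## Hypothetical hyperplane in p = 2 dimensions: 1 + 2*x1 - 3*x2 = 0
beta0 <- 1
beta  <- c(2, -3)

## f(x) = beta0 + beta1*x1 + beta2*x2
f <- function(x) beta0 + sum(beta * x)

f(c(1, 1))   #  0 -> the point lies on the hyperplane
f(c(2, 1))   #  2 -> the point lies on the positive side
f(c(0, 1))   # -2 -> the point lies on the negative side
```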
Separating Hyperplanes
[Figure: separating hyperplanes for the separable data; vertical axis: height]
The maximal margin classifier
For a given hyperplane, one can compute its distance from each of the training observations. The smallest of these distances is called the margin.
The maximal margin hyperplane is the hyperplane with the largest margin; equivalently, it is the hyperplane whose smallest distance to the training data points is as large as possible.
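A minimal sketch of this computation in R, using made-up data and coefficients: with the normalization $\sum_j \beta_j^2 = 1$ (used in (2) below), the quantity $\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$ is the signed distance of $x$ from the hyperplane, so the margin is the smallest of these distances in absolute value.

```r
## Made-up hyperplane whose coefficients are normalized so that sum(beta^2) = 1
beta0 <- -0.5
beta  <- c(3, 4) / 5

## A few made-up training points (one per row) in p = 2 dimensions
X <- rbind(c(1.0, 0.5),
           c(2.0, 1.5),
           c(0.5, 2.0))

## Signed distance of each training point from the hyperplane
dists <- beta0 + X %*% beta

## The margin is the smallest of these distances in absolute value
min(abs(dists))
```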
A separable data set and the maximal margin classifier
[Figure: separable data with the maximal margin hyperplane; axes X1 and X2]
Mathematically speaking
\[
\max_{\beta_0, \beta_1, \ldots, \beta_p} M \tag{2}
\]
subject to
\[
\sum_{j=1}^{p} \beta_j^2 = 1,
\]
\[
y_i(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}) \ge M \quad \text{for every } i = 1, 2, \ldots, n.
\]
The constraint $y_i(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}) \ge M$ for all $i = 1, 2, \ldots, n$, together with $M > 0$, ensures that each training data point lies on the correct side of the hyperplane.
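As a sketch (not from the slides), an approximately maximal margin classifier can be fitted in R with the svm() function of the e1071 package by using a linear kernel and a very large cost, so that essentially no margin violations are tolerated; the simulated data below are only for illustration.

```r
library(e1071)

## Simulate a small, separable two-class data set
set.seed(1)
x <- matrix(rnorm(40), ncol = 2)
y <- rep(c(-1, 1), each = 10)
x[y == 1, ] <- x[y == 1, ] + 3          # shift one class so the two classes separate
dat <- data.frame(x = x, y = as.factor(y))

## A very large cost leaves almost no room for margin violations,
## so the fitted hyperplane approximates the maximal margin hyperplane
mmc.fit <- svm(y ~ ., data = dat, kernel = "linear", cost = 1e5, scale = FALSE)
summary(mmc.fit)
plot(mmc.fit, dat)
```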
The pros and cons of MMH
There are three training points that are equidistant from the hyperplane and lie along the margin. The points on the margin are called support vectors, since they support the maximal margin hyperplane. A small change in the support vectors may dramatically change the maximal margin hyperplane.
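Continuing the hypothetical example above, the indices of the support vectors are stored in the fitted object, and refitting after nudging one of them illustrates this sensitivity (mmc.fit and dat are from the previous sketch).

```r
## Which training observations are support vectors?
mmc.fit$index

## Nudge one support vector and refit
dat2 <- dat
dat2[mmc.fit$index[1], 1:2] <- dat2[mmc.fit$index[1], 1:2] + 0.5
mmc.fit2 <- svm(y ~ ., data = dat2, kernel = "linear", cost = 1e5, scale = FALSE)

## Compare the coefficient vectors of the two linear decision rules
drop(t(mmc.fit$coefs)  %*% mmc.fit$SV)
drop(t(mmc.fit2$coefs) %*% mmc.fit2$SV)
```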
Non-separable data
A non-separable data set
[Figure: non-separable data; axes X1 and X2]
\[
\max_{\beta_0, \beta_1, \ldots, \beta_p,\, \epsilon_1, \epsilon_2, \ldots, \epsilon_n} M \tag{3}
\]
subject to
\[
\sum_{j=1}^{p} \beta_j^2 = 1,
\]
\[
y_i(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}) \ge M(1 - \epsilon_i),
\]
\[
\sum_{i=1}^{n} \epsilon_i \le C.
\]
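In R this soft-margin problem is again handled by svm() from the e1071 package, now with a finite cost so that some observations may violate the margin; the overlapping classes below are simulated only for illustration.

```r
library(e1071)

## Simulate two overlapping (non-separable) classes
set.seed(10)
x <- matrix(rnorm(40), ncol = 2)
y <- rep(c(-1, 1), each = 10)
x[y == 1, ] <- x[y == 1, ] + 1          # modest shift, so the classes overlap
dat <- data.frame(x = x, y = as.factor(y))

## A finite cost allows some points to lie on the wrong side of the margin
svc.fit <- svm(y ~ ., data = dat, kernel = "linear", cost = 10, scale = FALSE)
summary(svc.fit)
plot(svc.fit, dat)
```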
Support vector classifier
C can be thought of as a budget for the amount by which the margin can be violated by the n observations.
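This tuning parameter is typically chosen by cross-validation; a sketch using e1071::tune() on the hypothetical data from the previous snippet is given below. Note that the cost argument of svm() works inversely to the budget C described here: a small cost corresponds to a large budget for violations (a wide margin), and a large cost to a small budget.

```r
## 10-fold cross-validation over a grid of cost values
set.seed(1)
tune.out <- tune(svm, y ~ ., data = dat, kernel = "linear",
                 ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100)))
summary(tune.out)

## The model refitted at the best value of cost found by cross-validation
best.fit <- tune.out$best.model
summary(best.fit)
```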
A support vector classifier
Figure: Support vector classifier with margin and hyperplane for two different choices of C
Support vector classifier on the same data for different choices of the tuning parameter
Figure: Support vector classifier on the same data for different values of C
Support vector classifier
Non-linear decision boundary: Support vector machine
A support vector classifier on data with a non-linear decision boundary
Figure: Non-linear decision boundary with two classes, and a support vector classifier fitted to the data
Support vector machine
To address this non-linearity of the data, we need to incorporate some non-linear functions of the features in the optimization problem:
\[
\max_{\beta_0,\, \beta_{11}, \beta_{12}, \ldots, \beta_{p1}, \beta_{p2},\, \epsilon_1, \epsilon_2, \ldots, \epsilon_n} M \tag{4}
\]
subject to
\[
\sum_{j=1}^{p} \sum_{k=1}^{2} \beta_{jk}^2 = 1,
\]
\[
y_i\Bigl(\beta_0 + \sum_{j=1}^{p} \beta_{j1} x_{ij} + \sum_{j=1}^{p} \beta_{j2} x_{ij}^2\Bigr) \ge M(1 - \epsilon_i),
\]
\[
\sum_{i=1}^{n} \epsilon_i \le C.
\]
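In practice the feature space is rarely enlarged by hand as in (4); a similar effect is obtained in R by giving svm() a polynomial kernel (degree 2 below, mirroring the quadratic terms above). The data set is again made up purely for illustration.

```r
library(e1071)

## Simulate data whose true decision boundary is non-linear (a circle)
set.seed(1)
x <- matrix(rnorm(400), ncol = 2)
y <- ifelse(x[, 1]^2 + x[, 2]^2 > 1.5, 1, -1)
dat <- data.frame(x = x, y = as.factor(y))

## Polynomial kernel of degree 2 plays the role of the quadratic terms in (4)
poly.fit <- svm(y ~ ., data = dat, kernel = "polynomial", degree = 2, cost = 1)
plot(poly.fit, dat)
```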
More generally, non-linear decision boundaries are obtained through kernel functions; a popular choice is the radial kernel
\[
K(x_i, x_{i'}) = \exp\Bigl(-\gamma \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2\Bigr). \tag{5}
\]
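A sketch of fitting the radial kernel (5) in R: the gamma argument of svm() plays the role of $\gamma$, and dat is the made-up non-linear example from the previous snippet.

```r
## Radial (RBF) kernel: K(x, x') = exp(-gamma * sum((x - x')^2))
rad.fit <- svm(y ~ ., data = dat, kernel = "radial", gamma = 1, cost = 1)
summary(rad.fit)
plot(rad.fit, dat)

## gamma and cost can be chosen together by cross-validation
tune.out <- tune(svm, y ~ ., data = dat, kernel = "radial",
                 ranges = list(cost  = c(0.1, 1, 10, 100),
                               gamma = c(0.5, 1, 2, 4)))
tune.out$best.parameters
```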
Support vector machine with a polynomial kernel of degree d = 3 and with a radial kernel
Figure: Support vector machine with a polynomial kernel (left panel) and a radial kernel (right panel)