
6.867: Nonlinear models, Kernels
Fall 2016
Dual problem

$$\underset{\alpha}{\text{maximize}} \;\; -\frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j \langle x_i, x_j\rangle + \sum_i \alpha_i$$

$$\text{subject to} \;\; \sum_i \alpha_i y_i = 0 \;\text{ and }\; \alpha_i \in [0, C]$$

Optimality (KKT) conditions

$$w = \sum_i y_i \alpha_i x_i, \qquad \alpha_i \big[ y_i [\langle w, x_i\rangle + b] + \xi_i - 1 \big] = 0, \qquad \eta_i \xi_i = 0$$

$$\alpha_i = 0 \implies y_i [\langle w, x_i\rangle + b] \ge 1$$
$$0 < \alpha_i < C \implies y_i [\langle w, x_i\rangle + b] = 1$$
$$\alpha_i = C \implies y_i [\langle w, x_i\rangle + b] \le 1$$
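The dual above is a standard quadratic program, so any QP solver can recover the α. The slides name no implementation; the sketch below is one possible transcription using cvxopt's qp solver, with the KKT cases annotated where they are used.

```python
# A minimal sketch (not the slides' implementation): solve the soft margin
# dual with cvxopt, then read off w and b via the KKT conditions above.
import numpy as np
from cvxopt import matrix, solvers

def svm_dual(X, y, C=1.0):
    """cvxopt minimizes (1/2) a^T P a + q^T a, so negate the maximization:
    P_ij = y_i y_j <x_i, x_j>, q = -1."""
    n = len(y)
    K = X @ X.T                                     # Gram matrix <x_i, x_j>
    P = matrix((np.outer(y, y) * K).astype(float))
    q = matrix(-np.ones(n))
    G = matrix(np.vstack([-np.eye(n), np.eye(n)]))  # encodes 0 <= a_i <= C
    h = matrix(np.hstack([np.zeros(n), C * np.ones(n)]))
    A = matrix(y.reshape(1, -1).astype(float))      # encodes sum_i a_i y_i = 0
    b_eq = matrix(0.0)
    alpha = np.ravel(solvers.qp(P, q, G, h, A, b_eq)['x'])

    # KKT: points with 0 < alpha_i < C lie exactly on the margin, so b can be
    # read off from y_i [<w, x_i> + b] = 1 (assumes at least one such point).
    on_margin = (alpha > 1e-6) & (alpha < C - 1e-6)
    w = (alpha * y) @ X                             # w = sum_i y_i alpha_i x_i
    b = np.mean(y[on_margin] - X[on_margin] @ w)
    return alpha, w, b
```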

courtesy: A. Smola (CMU, 2013)


The Kernel Trick

Linear soft margin problem

$$\underset{w,b}{\text{minimize}} \;\; \frac{1}{2}\|w\|^2 + C\sum_i \xi_i$$

$$\text{subject to} \;\; y_i [\langle w, x_i\rangle + b] \ge 1 - \xi_i \;\text{ and }\; \xi_i \ge 0$$

Dual problem

$$\underset{\alpha}{\text{maximize}} \;\; -\frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j \langle x_i, x_j\rangle + \sum_i \alpha_i$$

$$\text{subject to} \;\; \sum_i \alpha_i y_i = 0 \;\text{ and }\; \alpha_i \in [0, C]$$

Support vector expansion

$$f(x) = \sum_i \alpha_i y_i \langle x_i, x\rangle + b$$
courtesy: A. Smola (CMU, 2013)
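As a direct transcription (names and data layout are ours; the α and b would come from a dual solver such as the qp sketch above), a minimal NumPy version of the linear expansion:

```python
# A minimal sketch of the linear support vector expansion; X_sv, y_sv,
# alpha_sv, b are hypothetical names for the dual solver's output.
import numpy as np

def f_linear(x, X_sv, y_sv, alpha_sv, b):
    """f(x) = sum_i alpha_i y_i <x_i, x> + b, summed over the support
    vectors only, since terms with alpha_i = 0 vanish."""
    return np.sum(alpha_sv * y_sv * (X_sv @ x)) + b
```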
The Kernel Trick

Linear soft margin problem (now in feature space)

$$\underset{w,b}{\text{minimize}} \;\; \frac{1}{2}\|w\|^2 + C\sum_i \xi_i$$

$$\text{subject to} \;\; y_i [\langle w, \phi(x_i)\rangle + b] \ge 1 - \xi_i \;\text{ and }\; \xi_i \ge 0$$

Dual problem

$$\underset{\alpha}{\text{maximize}} \;\; -\frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j k(x_i, x_j) + \sum_i \alpha_i$$

$$\text{subject to} \;\; \sum_i \alpha_i y_i = 0 \;\text{ and }\; \alpha_i \in [0, C]$$

Support vector expansion

$$f(x) = \sum_i \alpha_i y_i k(x_i, x) + b$$
courtesy: A. Smola (CMU, 2013)
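Kernelizing changes only the similarity measure. Below is the same sketch with a Gaussian kernel substituted for the inner product; the width parameterization σ is our choice, not fixed by the slides.

```python
# The same expansion with the inner product replaced by a Gaussian kernel;
# the expansion itself is untouched.
import numpy as np

def k_rbf(u, v, sigma=1.0):
    """Gaussian kernel k(u, v) = exp(-||u - v||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((u - v) ** 2) / (2.0 * sigma ** 2))

def f_kernel(x, X_sv, y_sv, alpha_sv, b, sigma=1.0):
    """f(x) = sum_i alpha_i y_i k(x_i, x) + b."""
    k = np.array([k_rbf(xi, x, sigma) for xi in X_sv])
    return np.sum(alpha_sv * y_sv * k) + b
```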
[Figure slides: decision boundaries of a Gaussian-kernel soft margin SVM on a 2D toy data set, swept over C = 1, 2, 5, 10, 20, 50, 100 (three sweeps), then repeated "now with a narrower kernel" and "now with a very wide kernel".]

courtesy: A. Smola (CMU, 2013)
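A rough reconstruction of the sweep these slides show, assuming scikit-learn's SVC on synthetic data (the slides' data set and exact kernel width are unknown, so the seed, gamma, and grid below are assumptions):

```python
# Sweep C for a fixed RBF kernel and plot boundary plus margins each time.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((80, 2))
y = np.sign(X[:, 0] * X[:, 1])        # simple nonlinearly separable labels

xx, yy = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
for C in [1, 2, 5, 10, 20, 50, 100]:
    clf = SVC(C=C, kernel="rbf", gamma=1.0).fit(X, y)
    Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.figure()
    # solid line: decision boundary f(x) = 0; dotted: margins f(x) = +/-1
    plt.contour(xx, yy, Z, levels=[-1, 0, 1], linestyles=[":", "-", ":"])
    plt.scatter(X[:, 0], X[:, 1], c=y)
    plt.title(f"C = {C}, {clf.support_.size} support vectors")
plt.show()
```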
Nonlinear separation
- Increasing C allows for more nonlinearities and decreases the number of errors
- The SV boundary need not be contiguous
- Kernel width adjusts the function class

courtesy: A. Smola (CMU, 2013)

Figure 7.10 (Pattern Recognition): 2D toy example of a binary classification problem solved using a soft margin SVC. In all cases, a Gaussian kernel (7.27) is used. From left to right, we decrease the kernel width. Note that for a large width, the decision boundary is almost linear, and the data set cannot be separated without error (see text). Solid lines represent decision boundaries; dotted lines depict the edge of the margin (where (7.34) becomes an equality with ξ_i = 0).

The same soft margin SVC was used in all panels, but the kernel width c was varied. For large values of c, the classifier is almost linear and cannot separate the data set without errors; for a small width, the decision boundary becomes highly nonlinear and the training set is separated without error.
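To see the width effect of Figure 7.10 numerically, one can fix C and sweep the width. Assuming the Gaussian kernel has the form k(x, x') = exp(−‖x − x'‖² / (2c²)), scikit-learn's gamma corresponds to 1/(2c²); this mapping is ours, not the book's.

```python
# Fix C and sweep the kernel width c (decreasing, as in Figure 7.10);
# data and parameter values are assumptions, not the book's experiment.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((80, 2))
y = np.sign(X[:, 0] * X[:, 1])

for c in [5.0, 1.0, 0.2, 0.05]:
    clf = SVC(C=10, kernel="rbf", gamma=1.0 / (2 * c ** 2)).fit(X, y)
    print(f"width c={c}: training error = {1 - clf.score(X, y):.2f}, "
          f"support vectors = {clf.support_.size}")
# Wide kernels give an almost linear boundary (with training errors);
# narrow kernels separate the training set with many support vectors.
```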
