
Lecture 1: Introduction to RKHS

Gatsby Unit, CSML, UCL
Columbia, 2014
April 29, 2014

Outline:
- Feature space
- Basics of reproducing kernel Hilbert spaces
- Kernel Ridge Regression

Kernels and feature space (1): XOR example

[Figure: XOR data in the $(x_1, x_2)$ plane.]

No linear classifier separates red from blue.

Map points to a higher dimensional feature space:
$$\phi(x) = \begin{bmatrix} x_1 & x_2 & x_1 x_2 \end{bmatrix}^\top \in \mathbb{R}^3.$$
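
A quick illustration in code (a minimal sketch, not from the slides; data generation is illustrative): once the product feature $x_1 x_2$ is added, the sign of that single coordinate separates the four XOR clusters.

```python
import numpy as np

# XOR-style data: labels depend on the sign of x1*x2,
# so no linear separator exists in R^2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) + rng.choice([-3, 3], size=(200, 2))
y = np.sign(X[:, 0] * X[:, 1])

def phi(X):
    """Feature map from the slides: (x1, x2) -> (x1, x2, x1*x2)."""
    return np.column_stack([X[:, 0], X[:, 1], X[:, 0] * X[:, 1]])

# In feature space, the linear function f(x) = <w, phi(x)> with
# w = (0, 0, 1) classifies the clusters perfectly.
w = np.array([0.0, 0.0, 1.0])
print(np.mean(np.sign(phi(X) @ w) == y))  # 1.0
```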

Kernels and feature space (2): smoothing

[Figure: three regression fits to the same data, ranging from underfitted to overfitted.]

Kernel methods can control smoothness and avoid overfitting/underfitting.

Outline: reproducing kernel Hilbert space

We will describe in order:
1. Hilbert space
2. Kernel (lots of examples: e.g. you can build kernels from simpler kernels)
3. Reproducing property

Hilbert space

Definition (Inner product)
Let $\mathcal{H}$ be a vector space over $\mathbb{R}$. A function $\langle \cdot, \cdot \rangle_{\mathcal{H}} : \mathcal{H} \times \mathcal{H} \to \mathbb{R}$ is an inner product on $\mathcal{H}$ if
1. Linear: $\langle \alpha_1 f_1 + \alpha_2 f_2, g \rangle_{\mathcal{H}} = \alpha_1 \langle f_1, g \rangle_{\mathcal{H}} + \alpha_2 \langle f_2, g \rangle_{\mathcal{H}}$
2. Symmetric: $\langle f, g \rangle_{\mathcal{H}} = \langle g, f \rangle_{\mathcal{H}}$
3. $\langle f, f \rangle_{\mathcal{H}} \ge 0$ and $\langle f, f \rangle_{\mathcal{H}} = 0$ if and only if $f = 0$.

Norm induced by the inner product: $\|f\|_{\mathcal{H}} := \sqrt{\langle f, f \rangle_{\mathcal{H}}}$

Definition (Hilbert space)
Inner product space containing Cauchy sequence limits.

Hilbert space

Definition (Cauchy sequence)
A sequence $\{f_n\}_{n=1}^{\infty}$ of elements of a normed vector space $(\mathcal{F}, \|\cdot\|_{\mathcal{F}})$ is said to be a Cauchy (fundamental) sequence if for every $\epsilon > 0$ there exists $N = N(\epsilon) \in \mathbb{N}$ such that for all $n, m \ge N$, $\|f_n - f_m\|_{\mathcal{F}} < \epsilon$.

Definition (Complete space)
A metric space $\mathcal{F}$ is said to be complete if every Cauchy sequence $\{f_n\}_{n=1}^{\infty}$ in $\mathcal{F}$ converges: it has a limit, and this limit is in $\mathcal{F}$.

Complete + norm = Banach space
Complete + inner product = Hilbert space

Kernel

Definition
Let $\mathcal{X}$ be a non-empty set. A function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a kernel if there exists an $\mathbb{R}$-Hilbert space $\mathcal{H}$ and a map $\phi : \mathcal{X} \to \mathcal{H}$ such that $\forall x, x' \in \mathcal{X}$,
$$k(x, x') := \langle \phi(x), \phi(x') \rangle_{\mathcal{H}}.$$

Almost no conditions on $\mathcal{X}$ (e.g., $\mathcal{X}$ itself doesn't need an inner product; e.g., documents).

A single kernel can correspond to several possible feature maps. A trivial example for $\mathcal{X} := \mathbb{R}$:
$$\phi_1(x) = x \qquad \text{and} \qquad \phi_2(x) = \begin{bmatrix} x/\sqrt{2} \\ x/\sqrt{2} \end{bmatrix},$$
both of which give $k(x, x') = x x'$.
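
A quick numerical check of this (a sketch, not from the slides): both feature maps realize the same kernel value.

```python
import numpy as np

x, xp = 1.7, -0.3
phi1 = lambda t: np.array([t])
phi2 = lambda t: np.array([t / np.sqrt(2), t / np.sqrt(2)])

# Both inner products equal the kernel k(x, x') = x * x'.
print(phi1(x) @ phi1(xp), phi2(x) @ phi2(xp), x * xp)  # all -0.51
```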

New kernels from old: sums, transformations

Theorem (Sums of kernels are kernels)
Given $\alpha > 0$ and $k$, $k_1$, $k_2$ all kernels on $\mathcal{X}$, then $\alpha k$ and $k_1 + k_2$ are kernels on $\mathcal{X}$.

To prove this, just check the inner product definition. A difference of kernels may not be a kernel. (Why? If $k_1 - k_2$ were a kernel, we would need $k_1(x, x) - k_2(x, x) = \|\phi(x)\|_{\mathcal{H}}^2 \ge 0$, which can fail.)

Theorem (Mappings between spaces)
Let $\mathcal{X}$ and $\widetilde{\mathcal{X}}$ be sets, and define a map $A : \mathcal{X} \to \widetilde{\mathcal{X}}$. Define the kernel $k$ on $\widetilde{\mathcal{X}}$. Then $k(A(x), A(x'))$ is a kernel on $\mathcal{X}$.
Example: $k(x, x') = x^2 (x')^2$.

New kernels from old: products

Theorem (Products of kernels are kernels)
Given $k_1$ on $\mathcal{X}_1$ and $k_2$ on $\mathcal{X}_2$, then $k_1 \times k_2$ is a kernel on $\mathcal{X}_1 \times \mathcal{X}_2$. If $\mathcal{X}_1 = \mathcal{X}_2 = \mathcal{X}$, then $k := k_1 \times k_2$ is a kernel on $\mathcal{X}$.

Proof.
Define:
- $\mathcal{H}_1$ corresponding to $k_1(x, x') = \langle \phi_1(x), \phi_1(x') \rangle_{\mathcal{H}_1}$ (e.g. a kernel between two images)
- $\mathcal{H}_2$ corresponding to $k_2(y, y') = \langle \phi_2(y), \phi_2(y') \rangle_{\mathcal{H}_2}$ (e.g. a kernel between two captions)

Is the following a kernel? (e.g. between one image-caption pair and another)
$$K\big((x, y), (x', y')\big) = k_1(x, x')\, k_2(y, y')$$

New kernels from old: products

Proof (cont.)
Given $a \in \mathcal{H}_1$ and $b \in \mathcal{H}_2$, we define the tensor product $a \otimes b$ as a rank-one operator from $\mathcal{H}_2$ to $\mathcal{H}_1$,
$$(a \otimes b) f := \langle b, f \rangle_{\mathcal{H}_2}\, a.$$
Then $a \otimes b \in \mathrm{HS}(\mathcal{H}_2, \mathcal{H}_1)$, the space of Hilbert-Schmidt operators, with inner product
$$\langle L, M \rangle_{\mathrm{HS}} = \sum_{j \in J} \langle L f_j, M f_j \rangle_{\mathcal{H}_1},$$
where $\{f_j\}_{j \in J}$ is an ONB of $\mathcal{H}_2$. Applying the above definition, for any $L \in \mathrm{HS}(\mathcal{H}_2, \mathcal{H}_1)$,
$$\langle L, a \otimes b \rangle_{\mathrm{HS}} = \langle a, L b \rangle_{\mathcal{H}_1}.$$

New kernels from old: products

Proof (cont.)
To see this, first expand $b = \sum_{j \in J} \langle b, f_j \rangle_{\mathcal{H}_2} f_j$. Then
$$\langle a, L b \rangle_{\mathcal{H}_1} = \Big\langle a,\; L \sum_{j \in J} \langle b, f_j \rangle_{\mathcal{H}_2} f_j \Big\rangle_{\mathcal{H}_1} = \sum_{j \in J} \langle b, f_j \rangle_{\mathcal{H}_2} \langle a, L f_j \rangle_{\mathcal{H}_1},$$
and
$$\langle a \otimes b, L \rangle_{\mathrm{HS}} = \sum_{j \in J} \langle L f_j, (a \otimes b) f_j \rangle_{\mathcal{H}_1} = \sum_{j \in J} \langle b, f_j \rangle_{\mathcal{H}_2} \langle a, L f_j \rangle_{\mathcal{H}_1}.$$

New kernels from old: products

Proof (cont.)
Special case: for $u, a \in \mathcal{H}_1$ and $v, b \in \mathcal{H}_2$,
$$\langle u \otimes v, a \otimes b \rangle_{\mathrm{HS}} = \langle u, a \rangle_{\mathcal{H}_1} \langle v, b \rangle_{\mathcal{H}_2}.$$
Apply this to
$$k_1(x, x')\, k_2(y, y') = \langle \phi_1(x), \phi_1(x') \rangle_{\mathcal{H}_1} \langle \phi_2(y), \phi_2(y') \rangle_{\mathcal{H}_2} = \langle \phi_1(x) \otimes \phi_2(y),\; \phi_1(x') \otimes \phi_2(y') \rangle_{\mathrm{HS}},$$
so $k_1 \times k_2$ is an inner product between features in $\mathrm{HS}(\mathcal{H}_2, \mathcal{H}_1)$, hence a kernel.
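
A consequence worth checking numerically (a sketch, not from the slides): on a common set of points, the entrywise product of two Gram matrices is again positive semi-definite (the Schur product theorem).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))

# Two Gram matrices on the same points: linear and squared-linear kernels.
K1 = X @ X.T
K2 = (X @ X.T) ** 2

# Their entrywise (Hadamard) product is the Gram matrix of k1 * k2.
K = K1 * K2
print(np.linalg.eigvalsh(K).min() >= -1e-10)  # True: positive semi-definite
```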

Sums and products ⇒ polynomials

Theorem (Polynomial kernels)
Let $x, x' \in \mathbb{R}^d$ for $d \ge 1$, let $m \ge 1$ be an integer, and let $c \ge 0$ be a non-negative real. Then
$$k(x, x') := \big( \langle x, x' \rangle + c \big)^m$$
is a valid kernel.

To prove: expand into a sum (with non-negative scalars) of kernels $\langle x, x' \rangle$ raised to integer powers. These individual terms are valid kernels by the product rule.
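
As a concrete sketch (for $d = 2$, $m = 2$, $c = 1$; the explicit feature map below is one standard choice, not the only one), the polynomial kernel matches a finite-dimensional dot product:

```python
import numpy as np

def poly_features(x, c=1.0):
    """Explicit features for k(x, x') = (<x, x'> + c)^2 in d = 2:
    (x1^2, x2^2, sqrt(2) x1 x2, sqrt(2c) x1, sqrt(2c) x2, c)."""
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2,
                     np.sqrt(2 * c) * x1, np.sqrt(2 * c) * x2, c])

x, xp = np.array([1.0, 2.0]), np.array([-0.5, 3.0])
print((x @ xp + 1.0) ** 2)                   # 42.25
print(poly_features(x) @ poly_features(xp))  # 42.25, same kernel value
```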

Infinite sequences

The kernels we've seen so far are dot products between finitely many features. E.g.
$$k(x, y) = \begin{bmatrix} \sin(x) & x^3 & \log x \end{bmatrix} \begin{bmatrix} \sin(y) \\ y^3 \\ \log y \end{bmatrix},$$
where $\phi(x) = \begin{bmatrix} \sin(x) & x^3 & \log x \end{bmatrix}^\top$.

Can a kernel be a dot product between infinitely many features?

Infinite sequences

Definition
The space $\ell_p$ of $p$-summable sequences is defined as all sequences $(a_i)_{i \ge 1}$ for which
$$\sum_{i=1}^{\infty} |a_i|^p < \infty.$$

Kernels can be defined in terms of sequences in $\ell_2$.

Theorem
Given a sequence of functions $(\phi_i)_{i \ge 1}$, $\phi_i : \mathcal{X} \to \mathbb{R}$, such that $(\phi_i(x))_{i \ge 1} \in \ell_2$ for every $x \in \mathcal{X}$ (here $\phi_i(x)$ is the $i$th coordinate of $\phi(x)$), then
$$k(x, x') := \sum_{i=1}^{\infty} \phi_i(x)\, \phi_i(x') \qquad (1)$$
is a well-defined kernel.

Infinite sequences (proof)

Proof: We just need to check that the inner product remains finite. The norm $\|a\|_{\ell_2}$ associated with the inner product in (1) is
$$\|a\|_{\ell_2} := \sqrt{\sum_{i=1}^{\infty} a_i^2},$$
where $a$ represents the sequence with terms $a_i$. Via Cauchy-Schwarz,
$$\Big| \sum_{i=1}^{\infty} \phi_i(x)\, \phi_i(x') \Big| \le \|\phi(x)\|_{\ell_2}\, \|\phi(x')\|_{\ell_2},$$
so the series defining the inner product converges for all $x, x' \in \mathcal{X}$.

Taylor series kernels

Definition (Taylor series kernel)
For $r \in (0, \infty]$, with $a_n \ge 0$ for all $n \ge 0$, define
$$f(z) = \sum_{n=0}^{\infty} a_n z^n, \qquad |z| < r,\ z \in \mathbb{R}.$$
Define $\mathcal{X}$ to be the $\sqrt{r}$-ball in $\mathbb{R}^d$, so $\|x\| < \sqrt{r}$. Then
$$k(x, x') := f\big( \langle x, x' \rangle \big) = \sum_{n=0}^{\infty} a_n \langle x, x' \rangle^n$$
is a kernel.

Example (Exponential kernel)
$$k(x, x') := \exp\big( \langle x, x' \rangle \big).$$

Taylor series kernel (proof)

Proof: By Cauchy-Schwarz,
$$|\langle x, x' \rangle| \le \|x\| \|x'\| < r,$$
so the Taylor series converges. Define the multinomial coefficient $c_{j_1 \ldots j_d} := \frac{n!}{\prod_{i=1}^{d} j_i!}$. Then
$$k(x, x') = \sum_{n=0}^{\infty} a_n \Big( \sum_{j=1}^{d} x_j x'_j \Big)^n = \sum_{n=0}^{\infty} a_n \sum_{\substack{j_1, \ldots, j_d \ge 0 \\ j_1 + \cdots + j_d = n}} c_{j_1 \ldots j_d} \prod_{i=1}^{d} (x_i x'_i)^{j_i} = \sum_{j_1, \ldots, j_d \ge 0} a_{j_1 + \cdots + j_d}\, c_{j_1 \ldots j_d} \prod_{i=1}^{d} x_i^{j_i} \prod_{i=1}^{d} (x'_i)^{j_i},$$
an inner product between (weighted monomial) features of $x$ and of $x'$, with non-negative weights.

Gaussian kernel

Example (Gaussian kernel)
The Gaussian kernel on $\mathbb{R}^d$ is defined as
$$k(x, x') := \exp\Big( -\frac{\|x - x'\|^2}{2\sigma^2} \Big).$$
Proof: an exercise! Use the product rule, the mapping rule, and the exponential kernel.
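
A sketch of the exercise's idea in code: $\exp(-\|x-x'\|^2/(2\sigma^2)) = \exp(-\|x\|^2/(2\sigma^2))\,\exp(\langle x, x'\rangle/\sigma^2)\,\exp(-\|x'\|^2/(2\sigma^2))$, i.e. an exponential kernel (with rescaled inputs, via the mapping rule) multiplied by a function of each argument alone.

```python
import numpy as np

sigma = 1.3
x, xp = np.array([0.5, -1.0]), np.array([2.0, 0.3])

gauss = np.exp(-np.sum((x - xp) ** 2) / (2 * sigma**2))

# Factorization behind the exercise: exponential kernel times
# per-argument scaling functions.
factored = (np.exp(-x @ x / (2 * sigma**2))
            * np.exp(x @ xp / sigma**2)
            * np.exp(-xp @ xp / (2 * sigma**2)))

print(np.isclose(gauss, factored))  # True
```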

Positive definite functions

If we are given a function of two arguments, $k(x, x')$, how can we determine if it is a valid kernel?
1. Find a feature map?
   - Sometimes this is not obvious (e.g. if the feature vector is infinite dimensional, as for the Gaussian kernel on the last slide).
   - The feature map is not unique.
2. A direct property of the function: positive definiteness.

Positive definite functions

Definition (Positive definite functions)
A symmetric function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is positive definite if $\forall n \ge 1$, $\forall (a_1, \ldots, a_n) \in \mathbb{R}^n$, $\forall (x_1, \ldots, x_n) \in \mathcal{X}^n$,
$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j\, k(x_i, x_j) \ge 0.$$
The function $k(\cdot, \cdot)$ is strictly positive definite if, for mutually distinct $x_i$, equality holds only when all the $a_i$ are zero.

Kernels are positive definite

Theorem
Let $\mathcal{H}$ be a Hilbert space, $\mathcal{X}$ a non-empty set and $\phi : \mathcal{X} \to \mathcal{H}$. Then $\langle \phi(x), \phi(y) \rangle_{\mathcal{H}} =: k(x, y)$ is positive definite.

Proof.
$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j\, k(x_i, x_j) = \sum_{i=1}^{n} \sum_{j=1}^{n} \langle a_i \phi(x_i), a_j \phi(x_j) \rangle_{\mathcal{H}} = \Big\| \sum_{i=1}^{n} a_i \phi(x_i) \Big\|_{\mathcal{H}}^2 \ge 0.$$

The reverse also holds: a positive definite $k(x, x')$ is an inner product in a unique $\mathcal{H}$ (Moore-Aronszajn: coming later!).
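
In practice one can sanity-check positive definiteness on samples (a sketch, not from the slides; this is a necessary condition, not a proof): every Gram matrix of a valid kernel must have non-negative eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 3))

gauss = lambda a, b: np.exp(-np.sum((a - b) ** 2) / 2)
K = np.array([[gauss(a, b) for b in X] for a in X])
print(np.linalg.eigvalsh(K).min())  # >= 0 up to round-off: consistent with a kernel

# A difference of kernels can fail the test, as warned earlier:
D = (X @ X.T) - 2 * K
print(np.linalg.eigvalsh(D).min())  # negative: k1 - 2*k2 is not a kernel
```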

The reproducing kernel Hilbert space

First example: finite space, polynomial features

Reminder: XOR example.

[Figure: XOR data in the $(x_1, x_2)$ plane.]

First example: finite space, polynomial features

Reminder: feature space from the XOR motivating example, $\phi : \mathbb{R}^2 \to \mathbb{R}^3$:
$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \mapsto \phi(x) = \begin{bmatrix} x_1 \\ x_2 \\ x_1 x_2 \end{bmatrix},$$
with kernel
$$k(x, y) = \begin{bmatrix} x_1 \\ x_2 \\ x_1 x_2 \end{bmatrix}^\top \begin{bmatrix} y_1 \\ y_2 \\ y_1 y_2 \end{bmatrix}$$
(the standard inner product in $\mathbb{R}^3$ between features). Denote this feature space by $\mathcal{H}$.

First example: finite space, polynomial features

Define a linear function of the inputs $x_1$, $x_2$, and their product $x_1 x_2$:
$$f(x) = f_1 x_1 + f_2 x_2 + f_3 x_1 x_2.$$
$f$ lives in a space of functions mapping from $\mathcal{X} = \mathbb{R}^2$ to $\mathbb{R}$. Equivalent representation for $f$:
$$f(\cdot) = \begin{bmatrix} f_1 & f_2 & f_3 \end{bmatrix}^\top.$$
- $f(\cdot)$ refers to the function as an object (here a vector in $\mathbb{R}^3$).
- $f(x) \in \mathbb{R}$ is the function evaluated at a point (a real number).
- $f(x) = f(\cdot)^\top \phi(x) = \langle f(\cdot), \phi(x) \rangle_{\mathcal{H}}$: evaluation of $f$ at $x$ is an inner product in feature space (here the standard inner product in $\mathbb{R}^3$).
- $\mathcal{H}$ is a space of functions mapping $\mathbb{R}^2$ to $\mathbb{R}$.

First example: finite space, polynomial features

$\phi(y)$ is a mapping from $\mathbb{R}^2$ to $\mathbb{R}^3$ ...
... which also parametrizes a function mapping $\mathbb{R}^2$ to $\mathbb{R}$:
$$k(\cdot, y) := \begin{bmatrix} y_1 & y_2 & y_1 y_2 \end{bmatrix}^\top = \phi(y).$$
Given $y$, there is a vector $k(\cdot, y)$ in $\mathcal{H}$ such that
$$\langle k(\cdot, y), \phi(x) \rangle_{\mathcal{H}} = a x_1 + b x_2 + c\, x_1 x_2 = k(x, y),$$
where $a = y_1$, $b = y_2$, and $c = y_1 y_2$. By symmetry,
$$\langle k(\cdot, x), \phi(y) \rangle_{\mathcal{H}} = x_1 y_1 + x_2 y_2 + x_1 x_2\, y_1 y_2 = k(x, y).$$

We can write $\phi(x) = k(\cdot, x)$ and $\phi(y) = k(\cdot, y)$ without ambiguity: the canonical feature map.

The reproducing property

This example illustrates the two defining features of an RKHS:
- The reproducing property: $\forall x \in \mathcal{X}$, $\forall f(\cdot) \in \mathcal{H}$,
  $$\langle f(\cdot), k(\cdot, x) \rangle_{\mathcal{H}} = f(x)$$
  ... or use the shorter notation $\langle f, \phi(x) \rangle_{\mathcal{H}}$. In particular, for any $x, y \in \mathcal{X}$,
  $$k(x, y) = \langle k(\cdot, x), k(\cdot, y) \rangle_{\mathcal{H}}.$$
- The feature map of every point is in the feature space: $\forall x \in \mathcal{X}$, $k(\cdot, x) = \phi(x) \in \mathcal{H}$.
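
A numeric check of the reproducing property for the $\mathbb{R}^3$ example (a sketch, not from the slides):

```python
import numpy as np

phi = lambda x: np.array([x[0], x[1], x[0] * x[1]])

f = np.array([2.0, -1.0, 0.5])  # f(.) as a vector in H = R^3
f_eval = lambda x: f[0] * x[0] + f[1] * x[1] + f[2] * x[0] * x[1]

x = np.array([1.5, -2.0])
print(f_eval(x), f @ phi(x))  # equal: <f(.), k(., x)>_H = f(x)
```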

First example: finite space, polynomial features

Another, more subtle point: $\mathcal{H}$ can be larger than the set of all feature maps $\{\phi(x) : x \in \mathcal{X}\}$. Why?
E.g. $f = \begin{bmatrix} 1 & 1 & -1 \end{bmatrix}^\top \in \mathcal{H}$ cannot be obtained as $\phi(x) = \begin{bmatrix} x_1 & x_2 & x_1 x_2 \end{bmatrix}^\top$ for any $x$ (that would require $x_1 = x_2 = 1$ while $x_1 x_2 = -1$).

Second (infinite) example: Fourier series

Consider functions on the torus $\mathcal{T} := [-\pi, \pi]$ with periodic boundary. Fourier series:
$$f(x) = \sum_{\ell=-\infty}^{\infty} \hat{f}_\ell \exp(i \ell x) = \sum_{\ell=-\infty}^{\infty} \hat{f}_\ell \big( \cos(\ell x) + i \sin(\ell x) \big).$$

Example: the top hat function,
$$f(x) = \begin{cases} 1 & |x| < T, \\ 0 & T \le |x| < \pi. \end{cases}$$
Fourier series:
$$\hat{f}_\ell := \frac{\sin(\ell T)}{\ell \pi}, \qquad f(x) = \sum_{\ell=0}^{\infty} 2 \hat{f}_\ell \cos(\ell x).$$
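
A sketch reproducing the figures that follow: compute $\hat f_\ell$ and sum the first few cosine terms (with the $\ell = 0$ term counted once, its coefficient being $T/\pi$).

```python
import numpy as np

T = 0.5
ell = np.arange(1, 11)
f_hat = np.sin(ell * T) / (ell * np.pi)  # coefficients for l >= 1
f_hat0 = T / np.pi                        # l = 0 term of the series

x = np.linspace(-np.pi, np.pi, 9)
partial = f_hat0 + 2 * np.sum(f_hat[:, None] * np.cos(np.outer(ell, x)), axis=0)
print(np.round(partial, 2))  # approaches the top hat as terms are added
```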

Fourier series for top hat function

[Figure, repeated over several slides with an increasing number of terms: the top hat function and its truncated Fourier reconstruction $f(x)$; the basis functions $\cos(\ell x)$; and the Fourier series coefficients. As terms are added, the reconstruction approaches the top hat.]

Fourier series for kernel function

Assume the kernel takes a single argument:
$$k(x, y) = \bar{k}(x - y).$$
Define the Fourier series representation of $\bar{k}$:
$$\bar{k}(x) = \sum_{\ell=-\infty}^{\infty} \hat{k}_\ell \exp(i \ell x),$$
where $\bar{k}$ and its Fourier transform are real and symmetric. Example:
$$\bar{k}(x) = \frac{1}{2\pi}\, \vartheta\Big( \frac{x}{2\pi}, \frac{i \sigma^2}{2\pi} \Big), \qquad \hat{k}_\ell = \frac{1}{2\pi} \exp\Big( -\frac{\sigma^2 \ell^2}{2} \Big),$$
where $\vartheta$ is the Jacobi theta function, close to a Gaussian when $\sigma^2$ is sufficiently narrower than $[-\pi, \pi]$.

Fourier series for Gaussian-spectrum kernel

[Figure, repeated over several slides: the Gaussian-spectrum kernel $\bar{k}(x)$; the basis functions $\cos(\ell x)$; and the Fourier series coefficients, which decay rapidly with $\ell$.]

Feature space via Fourier series

Define $\mathcal{H}$ to be the space of functions with the (infinite) feature space representation
$$f(\cdot) = \begin{bmatrix} \ldots & \hat{f}_\ell / \sqrt{\hat{k}_\ell} & \ldots \end{bmatrix}^\top.$$
The space $\mathcal{H}$ has an inner product:
$$\langle f, g \rangle_{\mathcal{H}} = \sum_{\ell=-\infty}^{\infty} \frac{\hat{f}_\ell\, \overline{\hat{g}_\ell}}{\hat{k}_\ell}.$$
Define the feature map
$$k(\cdot, x) = \phi(x) = \begin{bmatrix} \ldots & \sqrt{\hat{k}_\ell} \exp(-i \ell x) & \ldots \end{bmatrix}^\top.$$

Feature space via Fourier series

The reproducing property holds:
$$\langle f(\cdot), k(\cdot, x) \rangle_{\mathcal{H}} = \sum_{\ell=-\infty}^{\infty} \frac{\hat{f}_\ell\, \overline{\hat{k}_\ell \exp(-i \ell x)}}{\hat{k}_\ell} = \sum_{\ell=-\infty}^{\infty} \hat{f}_\ell \exp(i \ell x) = f(x),$$
... including for the kernel itself:
$$\langle k(\cdot, x), k(\cdot, y) \rangle_{\mathcal{H}} = \sum_{\ell=-\infty}^{\infty} \frac{\big( \hat{k}_\ell \exp(-i \ell x) \big) \overline{\big( \hat{k}_\ell \exp(-i \ell y) \big)}}{\hat{k}_\ell} = \sum_{\ell=-\infty}^{\infty} \hat{k}_\ell \exp\big( i \ell (y - x) \big) = \bar{k}(x - y).$$

Fourier series: what does it achieve?

The squared norm of a function $f$ in $\mathcal{H}$ is:
$$\|f\|_{\mathcal{H}}^2 = \langle f, f \rangle_{\mathcal{H}} = \sum_{\ell=-\infty}^{\infty} \frac{\hat{f}_\ell\, \overline{\hat{f}_\ell}}{\hat{k}_\ell}.$$
If $\hat{k}_\ell$ decays fast, then so must $\hat{f}_\ell$ if we want $\|f\|_{\mathcal{H}}^2 < \infty$.

Recall
$$f(x) = \sum_{\ell=-\infty}^{\infty} \hat{f}_\ell \big( \cos(\ell x) + i \sin(\ell x) \big).$$
This enforces smoothness.

Question: is the top hat function in the Gaussian RKHS?
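
A numeric sketch of the question (not from the slides): the top hat has $\hat f_\ell \sim 1/\ell$ while the Gaussian-spectrum kernel has $\hat k_\ell \sim e^{-\sigma^2 \ell^2 / 2}$, so the partial sums of $|\hat f_\ell|^2 / \hat k_\ell$ blow up, suggesting the top hat is not in this RKHS.

```python
import numpy as np

T, sigma = 0.5, 0.3
for L in [5, 10, 15, 20]:
    ell = np.arange(1, L + 1)
    f_hat = np.sin(ell * T) / (ell * np.pi)
    k_hat = np.exp(-sigma**2 * ell**2 / 2) / (2 * np.pi)
    print(L, np.sum(f_hat**2 / k_hat))  # partial RKHS norms diverge with L
```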

Third example: infinite feature space

Reproducing property for a function with the Gaussian kernel:
$$f(x) := \sum_{i=1}^{m} \alpha_i k(x_i, x) = \Big\langle \sum_{i=1}^{m} \alpha_i \phi(x_i),\ \phi(x) \Big\rangle_{\mathcal{H}}.$$

[Figure: a smooth function $f(x)$ formed as a weighted sum of Gaussian bumps.]

- What do the features $\phi(x)$ look like? (There are infinitely many of them, and they are not unique!)
- What do these features have to do with smoothness?

Third example: infinite feature space

Define an RKHS kernel $k$ such that $\|k\|_{L_2(\mu)} < \infty$ and the associated RKHS $\mathcal{H}$ is separable. The operator
$$T_k : L_2(\mu) \to L_2(\mu), \qquad f \mapsto \int_{\mathcal{X}} f(x')\, k(x, x')\, d\mu(x')$$
is compact, positive, self-adjoint (Steinwart and Christmann, Theorem 4.27).

By the spectral theorem there is an at most countable ONS $(e_j)$ such that
$$T_k f = \sum_{j} \lambda_j \langle f, e_j \rangle\, e_j, \qquad \int_{\mathcal{X}} e_i(x)\, e_j(x)\, d\mu(x) = \begin{cases} 1 & i = j, \\ 0 & i \ne j. \end{cases}$$
Can we use the $\{\lambda_i, e_i\}$ to construct a feature space for $\mathcal{H}$?

Third example: infinite feature space

Theorem (Mercer)
Let $\mathcal{X}$ be a compact metric space, $k$ a continuous kernel, and $\mu$ a finite Borel measure with $\mathrm{supp}\{\mu\} = \mathcal{X}$. Then the convergence of
$$k(x, y) = \sum_{j} \lambda_j\, e_j(x)\, e_j(y)$$
is absolute and uniform ($e_j$ is the continuous element of the $L_2(\mu)$ equivalence class $[e_j]$).

Third example: infinite feature space

Theorem (Mercer RKHS) (Steinwart and Christmann, Theorem 4.51)
Under the assumptions of Mercer's theorem,
$$\mathcal{H} := \Big\{ \sum_{i} a_i \sqrt{\lambda_i}\, e_i \;:\; (a_i) \in \ell_2 \Big\} \qquad (2)$$
is an RKHS with kernel $k$. The feature map is
$$\phi(x) = \begin{bmatrix} \ldots & \sqrt{\lambda_i}\, e_i(x) & \ldots \end{bmatrix}^\top.$$
Given two functions in the RKHS,
$$f := \sum_{i} a_i \sqrt{\lambda_i}\, e_i, \qquad g := \sum_{i} b_i \sqrt{\lambda_i}\, e_i,$$
the inner product is $\langle f, g \rangle_{\mathcal{H}} = \sum_{i} a_i b_i$.

Third example: infinite feature space

Proof: Most of the requirements for this being a Hilbert space are straightforward. There are two aspects requiring care:
1. Is $k(x, \cdot) \in \mathcal{H}$ for all $x \in \mathcal{X}$? (Requires Mercer's theorem.)
2. Does the reproducing property hold, $\langle f, k(\cdot, x) \rangle_{\mathcal{H}} = f(x)$?

First part: by the definition of $\mathcal{H}$ in (2), the function in $\mathcal{H}$ indexed by $x$ is
$$k(x, \cdot) = \sum_{i} \big( \sqrt{\lambda_i}\, e_i(x) \big) \big( \sqrt{\lambda_i}\, e_i(\cdot) \big).$$
Is this function in the RKHS? Yes, if the $\ell_2$ norm of $\big( \sqrt{\lambda_i}\, e_i(x) \big)_i$ is bounded. This is due to Mercer: $\forall x \in \mathcal{X}$,
$$\Big\| \big( \sqrt{\lambda_i}\, e_i(x) \big)_i \Big\|_{\ell_2}^2 = \sum_{i} \lambda_i\, e_i^2(x) = k(x, x) < \infty.$$

Third example: infinite feature space

Proof (cont.)
Second part: the reproducing property holds. Using the inner product definition, for $f = \sum_i a_i \sqrt{\lambda_i}\, e_i$,
$$\langle f, k(x, \cdot) \rangle_{\mathcal{H}} = \sum_{i} a_i \sqrt{\lambda_i}\, e_i(x) = f(x),$$
which is always well defined, since both the coefficients of $f$ and those of $k(x, \cdot)$ are in $\ell_2$.

Third example: infinite feature space

Gaussian kernel, $k(x, y) = \exp\big( -\frac{\|x - y\|^2}{2\sigma^2} \big)$:
$$\lambda_k \propto b^k \ (b < 1), \qquad e_k(x) \propto \exp\big( -(c - a) x^2 \big)\, H_k\big( x \sqrt{2c} \big),$$
where $a$, $b$, $c$ are functions of $\sigma$, and $H_k$ is the $k$th order Hermite polynomial. Then
$$k(x, x') = \sum_{i=1}^{\infty} \lambda_i\, e_i(x)\, e_i(x').$$
(Figure from Rasmussen and Williams.)

WARNING: $\mathbb{R}$ is a non-compact domain, so we cannot use the Mercer argument in the form given earlier.

Third example: infinite feature space

Example RKHS function, Gaussian kernel:
$$f(x) := \sum_{i=1}^{m} \alpha_i k(x_i, x) = \sum_{i=1}^{m} \alpha_i \sum_{j=1}^{\infty} \lambda_j\, e_j(x_i)\, e_j(x) = \sum_{j=1}^{\infty} \hat{f}_j \Big[ \sqrt{\lambda_j}\, e_j(x) \Big],$$
where $\hat{f}_j = \sum_{i=1}^{m} \alpha_i \sqrt{\lambda_j}\, e_j(x_i)$.

[Figure: a smooth function $f(x)$ formed as a weighted sum of Gaussian bumps.]

NOTE that this enforces smoothing: the $\lambda_j$ decay as the $e_j$ become rougher, and the $\hat{f}_j$ decay since $\sum_j \hat{f}_j^2 < \infty$.

Some reproducing kernel Hilbert space theory

Reproducing kernel Hilbert space (1)

Definition
Let $\mathcal{H}$ be a Hilbert space of $\mathbb{R}$-valued functions on a non-empty set $\mathcal{X}$. A function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a reproducing kernel of $\mathcal{H}$, and $\mathcal{H}$ is a reproducing kernel Hilbert space, if
- $\forall x \in \mathcal{X}$, $k(\cdot, x) \in \mathcal{H}$,
- $\forall x \in \mathcal{X}$, $\forall f \in \mathcal{H}$, $\langle f(\cdot), k(\cdot, x) \rangle_{\mathcal{H}} = f(x)$ (the reproducing property).

In particular, for any $x, y \in \mathcal{X}$,
$$k(x, y) = \langle k(\cdot, x), k(\cdot, y) \rangle_{\mathcal{H}}. \qquad (3)$$

Original definition: a kernel is an inner product between feature maps. Then $\phi(x) = k(\cdot, x)$ is a valid feature map.

Reproducing kernel Hilbert space (2)

Another RKHS definition:
Define $\delta_x$ to be the operator of evaluation at $x$, i.e.
$$\delta_x f = f(x), \qquad \forall f \in \mathcal{H},\ x \in \mathcal{X}.$$

Definition (Reproducing kernel Hilbert space)
$\mathcal{H}$ is an RKHS if the evaluation operator $\delta_x$ is bounded: $\forall x \in \mathcal{X}$ there exists $\lambda_x \ge 0$ such that for all $f \in \mathcal{H}$,
$$|f(x)| = |\delta_x f| \le \lambda_x \|f\|_{\mathcal{H}}.$$
$\Rightarrow$ two functions identical in RKHS norm agree at every point:
$$|f(x) - g(x)| = |\delta_x (f - g)| \le \lambda_x \|f - g\|_{\mathcal{H}}, \qquad \forall f, g \in \mathcal{H}.$$

RKHS definitions equivalent

Theorem (Reproducing kernel equivalent to bounded $\delta_x$)
$\mathcal{H}$ is a reproducing kernel Hilbert space (i.e., its evaluation operators $\delta_x$ are bounded linear operators) if and only if $\mathcal{H}$ has a reproducing kernel.

Proof: If $\mathcal{H}$ has a reproducing kernel $\Rightarrow$ $\delta_x$ bounded:
$$|\delta_x[f]| = |f(x)| = |\langle f, k(\cdot, x) \rangle_{\mathcal{H}}| \le \|k(\cdot, x)\|_{\mathcal{H}}\, \|f\|_{\mathcal{H}} = \langle k(\cdot, x), k(\cdot, x) \rangle_{\mathcal{H}}^{1/2}\, \|f\|_{\mathcal{H}} = k(x, x)^{1/2}\, \|f\|_{\mathcal{H}},$$
using Cauchy-Schwarz for the inequality. Consequently, $\delta_x : \mathcal{H} \to \mathbb{R}$ is bounded with $\lambda_x = k(x, x)^{1/2}$.

RKHS definitions equivalent

Proof: $\delta_x$ bounded $\Rightarrow$ $\mathcal{H}$ has a reproducing kernel. We use ...

Theorem (Riesz representation)
In a Hilbert space $\mathcal{H}$, all bounded linear functionals are of the form $\langle \cdot, g \rangle_{\mathcal{H}}$, for some $g \in \mathcal{H}$.

If $\delta_x : \mathcal{H} \to \mathbb{R}$ is a bounded linear functional, by Riesz there exists $f_{\delta_x} \in \mathcal{H}$ such that
$$\delta_x f = \langle f, f_{\delta_x} \rangle_{\mathcal{H}}, \qquad \forall f \in \mathcal{H}.$$
Define $k(x', x) = f_{\delta_x}(x')$, $\forall x, x' \in \mathcal{X}$. By its definition, both $k(\cdot, x) = f_{\delta_x} \in \mathcal{H}$ and $\langle f, k(\cdot, x) \rangle_{\mathcal{H}} = \delta_x f = f(x)$. Thus, $k$ is the reproducing kernel.

Moore-Aronszajn Theorem

Theorem (Moore-Aronszajn)
Let $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ be positive definite. There is a unique RKHS $\mathcal{H} \subset \mathbb{R}^{\mathcal{X}}$ with reproducing kernel $k$.

Recall the feature map is not unique (as we saw earlier): only the kernel is.

Main message #1

Reproducing kernels $\Leftrightarrow$ positive definite functions $\Leftrightarrow$ Hilbert function spaces with bounded point evaluation.

Main message #2

Small RKHS norm results in smooth functions.
E.g. kernel ridge regression with the Gaussian kernel:
$$f^* = \arg\min_{f \in \mathcal{H}} \Big( \sum_{i=1}^{n} \big( y_i - \langle f, \phi(x_i) \rangle_{\mathcal{H}} \big)^2 + \lambda \|f\|_{\mathcal{H}}^2 \Big)$$

[Figure: regression fits for $\lambda = 0.1$, $\lambda = 10$, and $\lambda = 10^{-7}$ (all with $\sigma = 0.6$), showing smoother fits for larger $\lambda$.]

Moore-Aronszajn Theorem: pre-RKHS

How do we prove this? (Sketch only; a very good full proof is in Berlinet and Thomas-Agnan, 2004, Chapter 1.)

Starting with a positive definite $k$, construct a pre-RKHS (an inner product space) $\mathcal{H}_0 \subset \mathbb{R}^{\mathcal{X}}$ with the properties:
1. The evaluation functionals $\delta_x$ are continuous on $\mathcal{H}_0$,
2. Any $\mathcal{H}_0$-Cauchy sequence $f_n$ which converges pointwise to 0 also converges in $\mathcal{H}_0$-norm to 0.

Moore-Aronszajn Theorem: pre-RKHS

The pre-RKHS $\mathcal{H}_0 = \mathrm{span}\{k(\cdot, x) \mid x \in \mathcal{X}\}$ will be taken to be the set of functions
$$f(x) = \sum_{i=1}^{n} \alpha_i\, k(x, x_i).$$

[Figure: such a function as a weighted sum of Gaussian bumps.]

Moore-Aronszajn Theorem: Steps

Theorem (Moore-Aronszajn, Step A)
The space $\mathcal{H}_0 = \mathrm{span}\{k(\cdot, x) \mid x \in \mathcal{X}\}$, endowed with the inner product
$$\langle f, g \rangle_{\mathcal{H}_0} = \sum_{i=1}^{n} \sum_{j=1}^{m} \alpha_i \beta_j\, k(x_i, y_j),$$
where $f = \sum_{i=1}^{n} \alpha_i k(\cdot, x_i)$ and $g = \sum_{j=1}^{m} \beta_j k(\cdot, y_j)$, is a valid pre-RKHS.

Theorem (Moore-Aronszajn, Step B)
Let $\mathcal{H}_0$ be a pre-RKHS space. Define $\mathcal{H}$ to be the set of functions $f \in \mathbb{R}^{\mathcal{X}}$ for which there exists an $\mathcal{H}_0$-Cauchy sequence $\{f_n\}$ converging pointwise to $f$. Then $\mathcal{H}$ is an RKHS.

Moore-Aronszajn Theorem: Step A

- Is $\langle f, g \rangle_{\mathcal{H}_0}$ a valid inner product?
- Are the evaluation functionals $\delta_x$ continuous on $\mathcal{H}_0$?
- Does every $\mathcal{H}_0$-Cauchy sequence $f_n$ which converges pointwise to 0 also converge in $\mathcal{H}_0$-norm to 0?

Moore-Aronszajn Theorem: Step B

Define $\mathcal{H}$ to be the set of functions $f \in \mathbb{R}^{\mathcal{X}}$ for which there exists an $\mathcal{H}_0$-Cauchy sequence $\{f_n\}$ converging pointwise to $f$. Clearly, $\mathcal{H}_0 \subset \mathcal{H}$.
1. We define the inner product between $f, g \in \mathcal{H}$ as the limit of an inner product of the $\mathcal{H}_0$-Cauchy sequences $\{f_n\}$, $\{g_n\}$ converging to $f$ and $g$ respectively. Is this inner product well defined, i.e., independent of the sequences used?
2. An inner product space must satisfy $\langle f, f \rangle_{\mathcal{H}} = 0$ iff $f = 0$. Is this true when we define the inner product on $\mathcal{H}$ as above?
3. Are the evaluation functionals still continuous on $\mathcal{H}$?
4. Is $\mathcal{H}$ complete (i.e., does every $\mathcal{H}$-Cauchy sequence converge)?

(1) + (2) + (3) + (4) $\Rightarrow$ $\mathcal{H}$ is an RKHS!

Kernel Ridge Regression

Kernel ridge regression

[Figure: three kernel ridge regression fits to the same data, from under- to over-smoothed.]

Very simple to implement; works well when there are no outliers.

Ridge regression: case of $\mathbb{R}^D$

We are given $n$ training points in $\mathbb{R}^D$:
$$X = \begin{bmatrix} x_1 & \ldots & x_n \end{bmatrix} \in \mathbb{R}^{D \times n}, \qquad y := \begin{bmatrix} y_1 & \ldots & y_n \end{bmatrix}^\top.$$
Define some $\lambda > 0$. Our goal is:
$$f^* = \arg\min_{f \in \mathbb{R}^D} \sum_{i=1}^{n} (y_i - x_i^\top f)^2 + \lambda \|f\|^2.$$
The second term $\lambda \|f\|^2$ is chosen to avoid problems in high dimensional spaces (see below).

Ridge regression: case of $\mathbb{R}^D$

We are given $n$ training points in $\mathbb{R}^D$:
$$X = \begin{bmatrix} x_1 & \ldots & x_n \end{bmatrix} \in \mathbb{R}^{D \times n}, \qquad y := \begin{bmatrix} y_1 & \ldots & y_n \end{bmatrix}^\top.$$
Define some $\lambda > 0$. Our goal is:
$$f^* = \arg\min_{f \in \mathbb{R}^D} \sum_{i=1}^{n} (y_i - x_i^\top f)^2 + \lambda \|f\|^2.$$
The solution is:
$$f^* = \big( X X^\top + \lambda I \big)^{-1} X y,$$
which is the classic regularized least squares solution.
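
A minimal numpy check of the closed form (a sketch; the data and true weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
D, n, lam = 3, 50, 0.1
X = rng.normal(size=(D, n))  # columns are the points x_i
y = X.T @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

# Closed-form ridge solution f* = (X X^T + lam I)^{-1} X y.
f_star = np.linalg.solve(X @ X.T + lam * np.eye(D), X @ y)
print(f_star)  # close to the true weights (1, -2, 0.5)
```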

Kernel ridge regression

Use features $\phi(x_i)$ in the place of $x_i$:
$$f^* = \arg\min_{f \in \mathcal{H}} \Big( \sum_{i=1}^{n} \big( y_i - \langle f, \phi(x_i) \rangle_{\mathcal{H}} \big)^2 + \lambda \|f\|_{\mathcal{H}}^2 \Big).$$
E.g. for finite dimensional feature spaces,
$$\phi_p(x) = \begin{bmatrix} x \\ x^2 \\ \vdots \\ x^\ell \end{bmatrix}, \qquad \phi_s(x) = \begin{bmatrix} \sin x \\ \cos x \\ \sin 2x \\ \vdots \\ \cos \ell x \end{bmatrix}.$$
Here $f$ is a vector of length $\ell$ giving weight to each of these features, so as to find the mapping between $x$ and $y$. Feature vectors can also have infinite length (more soon).

Kernel ridge regression

The solution is easy if we already know $f$ is a linear combination of feature space mappings of points: the representer theorem.
$$f^* = \sum_{i=1}^{n} \alpha_i \phi(x_i) = \sum_{i=1}^{n} \alpha_i k(x_i, \cdot).$$

[Figure: the resulting fit as a weighted sum of Gaussian bumps.]

Representer theorem

Given a set of paired observations $(x_1, y_1), \ldots, (x_n, y_n)$ (regression or classification), find the function $f^*$ in the RKHS $\mathcal{H}$ which satisfies
$$J(f^*) = \min_{f \in \mathcal{H}} J(f),$$
where
$$J(f) = L_y(f(x_1), \ldots, f(x_n)) + \Omega\big( \|f\|_{\mathcal{H}}^2 \big), \qquad (4)$$
$\Omega$ is non-decreasing, and $y$ is the vector of the $y_i$.
- Classification: $L_y(f(x_1), \ldots, f(x_n)) = \sum_{i=1}^{n} \mathbb{I}_{y_i f(x_i) \le 0}$
- Regression: $L_y(f(x_1), \ldots, f(x_n)) = \sum_{i=1}^{n} (y_i - f(x_i))^2$

Representer theorem

The representer theorem (simple version): a solution to
$$\min_{f \in \mathcal{H}} \Big[ L_y(f(x_1), \ldots, f(x_n)) + \Omega\big( \|f\|_{\mathcal{H}}^2 \big) \Big]$$
takes the form
$$f^* = \sum_{i=1}^{n} \alpha_i\, k(x_i, \cdot).$$
If $\Omega$ is strictly increasing, all solutions have this form.

Representer theorem: proof

Proof: Denote by $f_s$ the projection of $f$ onto the subspace
$$\mathrm{span}\{k(x_i, \cdot) : 1 \le i \le n\}, \qquad (5)$$
such that
$$f = f_s + f_\perp,$$
where $f_s = \sum_{i=1}^{n} \alpha_i k(x_i, \cdot)$.
Regularizer:
$$\|f\|_{\mathcal{H}}^2 = \|f_s\|_{\mathcal{H}}^2 + \|f_\perp\|_{\mathcal{H}}^2 \ge \|f_s\|_{\mathcal{H}}^2,$$
then
$$\Omega\big( \|f\|_{\mathcal{H}}^2 \big) \ge \Omega\big( \|f_s\|_{\mathcal{H}}^2 \big),$$
so this term is minimized for $f = f_s$.

Representer theorem: proof

Proof (cont.): Individual terms $f(x_i)$ in the loss:
$$f(x_i) = \langle f, k(x_i, \cdot) \rangle_{\mathcal{H}} = \langle f_s + f_\perp, k(x_i, \cdot) \rangle_{\mathcal{H}} = \langle f_s, k(x_i, \cdot) \rangle_{\mathcal{H}},$$
so
$$L_y(f(x_1), \ldots, f(x_n)) = L_y(f_s(x_1), \ldots, f_s(x_n)).$$
Hence:
- The loss $L_y(\ldots)$ only depends on the component of $f$ in the data subspace,
- The regularizer $\Omega(\ldots)$ is minimized when $f = f_s$,
- If $\Omega$ is strictly increasing, then $\|f_\perp\|_{\mathcal{H}} = 0$ is required at the minimum.

Kernel ridge regression: proof

We begin knowing $f$ is a linear combination of feature space mappings of points (representer theorem):
$$f = \sum_{i=1}^{n} \alpha_i \phi(x_i).$$
Then
$$\sum_{i=1}^{n} \big( y_i - \langle f, \phi(x_i) \rangle_{\mathcal{H}} \big)^2 + \lambda \|f\|_{\mathcal{H}}^2 = \|y - K\alpha\|^2 + \lambda \alpha^\top K \alpha.$$
Differentiating with respect to $\alpha$ and setting the gradient to zero, we get
$$\alpha^* = (K + \lambda I_n)^{-1} y.$$
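
Putting the pieces together, a minimal kernel ridge regression sketch (Gaussian kernel; the data, bandwidth, and regularizer are illustrative, not from the slides):

```python
import numpy as np

def gram(X1, X2, sigma):
    """Gaussian kernel Gram matrix: K[i, j] = exp(-(x_i - x_j)^2 / (2 sigma^2))."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma**2))

rng = np.random.default_rng(4)
x_train = rng.uniform(-1, 1.5, size=40)
y_train = np.sin(4 * x_train) + 0.2 * rng.normal(size=40)

sigma, lam = 0.6, 0.1
K = gram(x_train, x_train, sigma)
alpha = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)  # (K + lam I)^{-1} y

# Predict via f(x) = sum_i alpha_i k(x_i, x).
x_test = np.linspace(-1, 1.5, 5)
f_test = gram(x_test, x_train, sigma) @ alpha
print(np.round(f_test, 2))
```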

Reminder: smoothness

What does $\|f\|_{\mathcal{H}}$ have to do with smoothing?
Example 1: the Fourier series representation on the torus $\mathcal{T}$:
$$f(x) = \sum_{\ell=-\infty}^{\infty} \hat{f}_\ell \exp(i \ell x), \qquad \langle f, g \rangle_{\mathcal{H}} = \sum_{\ell=-\infty}^{\infty} \frac{\hat{f}_\ell\, \overline{\hat{g}_\ell}}{\hat{k}_\ell}.$$
Thus,
$$\|f\|_{\mathcal{H}}^2 = \langle f, f \rangle_{\mathcal{H}} = \sum_{\ell=-\infty}^{\infty} \frac{|\hat{f}_\ell|^2}{\hat{k}_\ell}.$$

Reminder: smoothness

What does $\|f\|_{\mathcal{H}}$ have to do with smoothing?
Example 2: the Gaussian kernel on $\mathbb{R}$. Recall
$$f(x) = \sum_{i=1}^{\infty} a_i \sqrt{\lambda_i}\, e_i(x), \qquad \|f\|_{\mathcal{H}}^2 = \sum_{i=1}^{\infty} a_i^2.$$

Parameter selection for KRR

Given the objective
$$f^* = \arg\min_{f \in \mathcal{H}} \Big( \sum_{i=1}^{n} \big( y_i - \langle f, \phi(x_i) \rangle_{\mathcal{H}} \big)^2 + \lambda \|f\|_{\mathcal{H}}^2 \Big),$$
how do we choose
- the regularization parameter $\lambda$?
- the kernel parameter: for the Gaussian kernel, $\sigma$ in
$$k(x, y) = \exp\Big( -\frac{\|x - y\|^2}{2\sigma^2} \Big)?$$
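
A common answer is held-out validation over a grid (a sketch, not from the slides; the split, grid values, and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(-1, 1.5, size=40)
y = np.sin(4 * x) + 0.2 * rng.normal(size=40)

def gram(X1, X2, sigma):
    return np.exp(-(X1[:, None] - X2[None, :]) ** 2 / (2 * sigma**2))

# Hold out 10 points for validation; grid-search (lam, sigma) by held-out MSE.
x_tr, y_tr, x_val, y_val = x[:30], y[:30], x[30:], y[30:]
best = min(
    (np.mean((gram(x_val, x_tr, s) @ np.linalg.solve(
        gram(x_tr, x_tr, s) + l * np.eye(30), y_tr) - y_val) ** 2), l, s)
    for l in [1e-7, 1e-3, 0.1, 10] for s in [0.1, 0.6, 2.0])
print(best)  # (validation MSE, lam, sigma)
```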

Choice of

=0.1, =0.6
1

0.5

0.5

1
0.5

0.5

1.5

Lecture 1: Introduction to RKHS

Feature space
Basics of reproducing kernel Hilbert spaces
Kernel Ridge Regression

Choice of

=0.1, =0.6

=10, =0.6

0.5

0.5

=1e07, =0.6
1.5
1
0.5

0.5

0.5

1
0.5

0.5

1.5

1
0.5

0.5

0.5

1.5

1
0.5

0.5

Lecture 1: Introduction to RKHS

1.5

Feature space
Basics of reproducing kernel Hilbert spaces
Kernel Ridge Regression

Choice of

=0.1, =0.6
1

0.5

0.5

1
0.5

0.5

1.5

Lecture 1: Introduction to RKHS

Feature space
Basics of reproducing kernel Hilbert spaces
Kernel Ridge Regression

Choice of

=0.1, =0.6

=0.1, =2

=0.1, =0.1

0.5

0.5

0.5

0.5

0.5

0.5

1
0.5

0.5

1.5

1
0.5

0.5

1.5

1
0.5

0.5

Lecture 1: Introduction to RKHS

1.5
