Review of Lecture 2

Is learning feasible? Yes, in a probabilistic sense:

$P[\,|E_{\text{in}}(h) - E_{\text{out}}(h)| > \epsilon\,] \le 2e^{-2\epsilon^2 N}$

[Figure: the in-sample and out-of-sample errors $E_{\text{in}}(h)$ and $E_{\text{out}}(h)$ of a hypothesis $h$]

Since $g$ has to be one of $h_1, h_2, \ldots, h_M$, we conclude that if

$|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon$

then

$|E_{\text{in}}(h_1) - E_{\text{out}}(h_1)| > \epsilon$ or $|E_{\text{in}}(h_2) - E_{\text{out}}(h_2)| > \epsilon$ or $\cdots$ or $|E_{\text{in}}(h_M) - E_{\text{out}}(h_M)| > \epsilon$.

By the union bound, this gives us an added $M$ factor:

$P[\,|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon\,] \le 2M e^{-2\epsilon^2 N}$
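To get a feel for the numbers, here is a minimal sketch (not from the lecture) that evaluates this union bound for illustrative values of $M$, $N$, and $\epsilon$:

```python
import math

# Union-bound form of the Hoeffding inequality from the review:
# P[|Ein(g) - Eout(g)| > eps] <= 2 * M * exp(-2 * eps^2 * N)
def hoeffding_union_bound(M, N, eps):
    return 2 * M * math.exp(-2 * eps ** 2 * N)

# Illustrative values (not from the lecture): with M = 100 hypotheses,
# N = 1000 examples, and tolerance eps = 0.05, the bound is about 1.35,
# which exceeds 1 and is therefore vacuous; N = 5000 brings it down to ~3e-9.
print(hoeffding_union_bound(M=100, N=1000, eps=0.05))
print(hoeffding_union_bound(M=100, N=5000, eps=0.05))
```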

Learning From Data
Yaser S. Abu-Mostafa
California Institute of Technology

Lecture 3: Linear Models I

Sponsored by Caltech's Provost Office, E&AS Division, and IST
Tuesday, April 10, 2012

Outline

Input representation
Linear Classification
Linear Regression
Nonlinear Transformation

A real data set

[Figure: examples of handwritten digits from the data set]

Input representation

`raw' input $x = (x_0, x_1, x_2, \ldots, x_{256})$
linear model: $(w_0, w_1, w_2, \ldots, w_{256})$

Features: extract useful information, e.g., intensity and symmetry
$x = (x_0, x_1, x_2)$
linear model: $(w_0, w_1, w_2)$

Illustration of features

$x = (x_0, x_1, x_2)$
$x_1$: intensity
$x_2$: symmetry

[Figure: sample digit images and a plot of the data in the intensity-symmetry plane]
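As a concrete illustration, the two features could be computed along these lines; this is a sketch under assumed preprocessing (a 16x16 grayscale image with values in [0, 1]), not the course's exact code:

```python
import numpy as np

def intensity(img):
    # x1: average pixel intensity
    return img.mean()

def symmetry(img):
    # x2: (negative) asymmetry between the image and its left-right mirror;
    # a perfectly symmetric digit scores 0, asymmetric digits score lower
    return -np.abs(img - np.fliplr(img)).mean()

img = np.random.rand(16, 16)                        # stand-in for a digit image
x = np.array([1.0, intensity(img), symmetry(img)])  # x = (x0, x1, x2) with x0 = 1
```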

Evolution of $E_{\text{in}}$ and $E_{\text{out}}$

[Figure, left: $E_{\text{in}}$ and $E_{\text{out}}$ (from 1% to 50%) versus iteration number (0 to 1000)]
[Figure, right: what PLA does: the final perceptron boundary]

The `pocket' algorithm

PLA: [Figure: $E_{\text{in}}$ and $E_{\text{out}}$ (1% to 50%) versus iteration (0 to 1000)]

Pocket: [Figure: $E_{\text{in}}$ and $E_{\text{out}}$ (1% to 50%) versus iteration (0 to 1000)]
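A minimal sketch of the pocket idea (my notation, not the lecture's code): run PLA updates as usual, but keep in your `pocket' the weight vector with the best in-sample error seen so far:

```python
import numpy as np

def ein(w, X, y):
    # fraction of misclassified points; X rows are inputs with x0 = 1, y in {-1, +1}
    return np.mean(np.sign(X @ w) != y)

def pocket(X, y, iterations=1000, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    best_w, best_e = w.copy(), ein(w, X, y)
    for _ in range(iterations):
        miscl = np.flatnonzero(np.sign(X @ w) != y)
        if miscl.size == 0:
            break                       # data separated: plain PLA has converged
        n = rng.choice(miscl)           # pick a misclassified point at random
        w = w + y[n] * X[n]             # standard PLA update
        e = ein(w, X, y)
        if e < best_e:                  # pocket step: hold on to the best w so far
            best_w, best_e = w.copy(), e
    return best_w
```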

Classification boundary - PLA versus Pocket

PLA: [Figure: final PLA boundary in the feature plane]

Pocket: [Figure: final pocket boundary in the feature plane]

Outline

Input representation
Linear Classification
Linear Regression   (regression: real-valued output)
Nonlinear Transformation

Credit again

Classification: Credit approval (yes/no)
Regression: Credit line (dollar amount)

Input:
$x$ = (age: 23 years, annual salary: $30,000, years in residence: 1 year, years in job: 1 year, current debt: $15,000)

Linear regression output:

$h(x) = \sum_{i=0}^{d} w_i x_i = w^T x$
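For instance, the output is a single dot product; a tiny sketch with the slide's feature values and made-up placeholder weights:

```python
import numpy as np

# The credit input from the slide as a vector (x0 = 1 absorbs the bias term);
# the weights are placeholders, not values from the lecture.
x = np.array([1.0, 23.0, 30000.0, 1.0, 1.0, 15000.0])
w = np.array([0.1, 2.0, 0.05, 10.0, 10.0, -0.04])
credit_line = w @ x     # h(x) = w^T x, a dollar amount
```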

The data set

Credit officers decide on credit lines:

$(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$

$y_n \in \mathbb{R}$ is the credit line for customer $x_n$.

Linear regression tries to replicate that.

How to measure the error

How well does $h(x) = w^T x$ approximate $f(x)$?

In linear regression, we use squared error: $(h(x) - f(x))^2$

in-sample error: $E_{\text{in}}(h) = \frac{1}{N} \sum_{n=1}^{N} (h(x_n) - y_n)^2$
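In code, this is a one-liner; a sketch using the matrix $X$ and vector $y$ defined on the next slides:

```python
import numpy as np

# In-sample squared error: Ein(w) = (1/N) * sum_n (w^T x_n - y_n)^2
def squared_ein(w, X, y):
    return np.mean((X @ w - y) ** 2)
```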

Illustration of linear regression

[Figure, left: data points $(x, y)$ and the regression line for a one-dimensional input $x$]
[Figure, right: data points and the regression plane for a two-dimensional input $(x_1, x_2)$]

The expression for $E_{\text{in}}$

$E_{\text{in}}(w) = \frac{1}{N} \sum_{n=1}^{N} (w^T x_n - y_n)^2 = \frac{1}{N} \|Xw - y\|^2$

where

$X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_N^T \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$

Minimizing $E_{\text{in}}$

$E_{\text{in}}(w) = \frac{1}{N} \|Xw - y\|^2$

$\nabla E_{\text{in}}(w) = \frac{2}{N} X^T (Xw - y) = 0$

$X^T X w = X^T y$

$w = X^\dagger y$, where $X^\dagger = (X^T X)^{-1} X^T$

$X^\dagger$ is the `pseudo-inverse' of $X$

The pseudo-inverse

$X^\dagger = (X^T X)^{-1} X^T$

Dimensions: $X$ is $N \times (d+1)$, so $X^T X$ and $(X^T X)^{-1}$ are $(d+1) \times (d+1)$, and $X^\dagger$ is $(d+1) \times N$.

The linear regression algorithm

1: Construct the matrix $X$ and the vector $y$ from the data set $(x_1, y_1), \ldots, (x_N, y_N)$ as follows:

$X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_N^T \end{bmatrix}$ (input data matrix), $\qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$ (target vector)

2: Compute the pseudo-inverse $X^\dagger = (X^T X)^{-1} X^T$.

3: Return $w = X^\dagger y$.
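Here are the three steps as a NumPy sketch; the data is a stand-in, just to show the shapes. In practice np.linalg.pinv is numerically more robust than the explicit inverse used below:

```python
import numpy as np

def linear_regression(X, y):
    # Step 2: pseudo-inverse X_dagger = (X^T X)^(-1) X^T
    X_dagger = np.linalg.inv(X.T @ X) @ X.T
    # Step 3: return w = X_dagger y
    return X_dagger @ y

# Step 1: build X (rows x_n^T, with x0 = 1) and y from the data set.
inputs = np.random.rand(100, 2)
X = np.column_stack([np.ones(len(inputs)), inputs])   # input data matrix
y = np.random.rand(100)                               # target vector
w = linear_regression(X, y)
```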

Linear regression for classification

Linear regression learns a real-valued function $y = f(x) \in \mathbb{R}$

Binary-valued functions are also real-valued! $\pm 1 \in \mathbb{R}$

Use linear regression to get $w$ where $w^T x_n \approx y_n = \pm 1$

In this case, $\text{sign}(w^T x_n)$ is likely to agree with $y_n = \pm 1$

Good initial weights for classification
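A sketch of this use: solve the regression on the $\pm 1$ labels, classify with the sign, and optionally hand $w$ to PLA or pocket as a starting point:

```python
import numpy as np

def regression_for_classification(X, y):
    # y holds -1/+1 labels, treated as real-valued regression targets
    w = np.linalg.pinv(X) @ y
    predictions = np.sign(X @ w)   # likely to agree with most labels
    return w, predictions          # w also serves as initial weights for PLA/pocket
```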

Linear regression boundary

[Figure: the classification boundary obtained by linear regression, plotted over Average Intensity (horizontal axis) and Symmetry (vertical axis)]

Outline

Input representation
Linear Classification
Linear Regression
Nonlinear Transformation

Linear is limited

Data: [Figure: a data set in the plane that is not linearly separable]

Hypothesis: [Figure: a linear boundary fit to the data]

Another example

Credit line is affected by `years in residence'
but not in a linear way!

Nonlinear $[[x_i < 1]]$ and $[[x_i > 5]]$ are better.

Can we do that with linear models?
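A sketch of such indicator features for the `years in residence' column; the thresholds 1 and 5 are the ones on the slide:

```python
import numpy as np

# Indicator features [[xi < 1]] and [[xi > 5]]: each evaluates to 1.0
# when the condition holds, else 0.0.
def residence_features(years):
    return np.column_stack([(years < 1).astype(float),
                            (years > 5).astype(float)])

# e.g. residence_features(np.array([0.5, 3.0, 7.0]))
# -> [[1., 0.], [0., 0.], [0., 1.]]
```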

Linear in what?

Linear regression implements

$\sum_{i=0}^{d} w_i x_i$

Linear classification implements

$\text{sign}\left(\sum_{i=0}^{d} w_i x_i\right)$

Algorithms work because of linearity in the weights

Transform the data nonlinearly

$(x_1, x_2) \longrightarrow (x_1^2, x_2^2)$

[Figure: the data in the original $(x_1, x_2)$ space (left) and in the transformed $(x_1^2, x_2^2)$ space (right), where a linear boundary separates it]
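A self-contained sketch of this transform followed by a linear fit in the transformed space; the circular stand-in data is my own, chosen so that it is not linearly separable in $x$:

```python
import numpy as np

# Nonlinear transform from the slide: (x1, x2) -> (x1^2, x2^2).
# X columns are assumed to be [1, x1, x2]; Z columns become [1, x1^2, x2^2].
def transform(X):
    return np.column_stack([X[:, 0], X[:, 1] ** 2, X[:, 2] ** 2])

# Stand-in data: points inside a circle of radius 0.75 labeled +1, outside -1.
rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, size=(200, 2))
X = np.column_stack([np.ones(200), pts])
y = np.where(pts[:, 0] ** 2 + pts[:, 1] ** 2 < 0.75 ** 2, 1.0, -1.0)

Z = transform(X)                 # work in the transformed space
w_z = np.linalg.pinv(Z) @ y      # the model stays linear in the weights
predictions = np.sign(Z @ w_z)   # but the boundary is nonlinear in x
```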
