
CSE446: Linear Regression
Bias/Variance Tradeoff

Winter 2012
Luke Zettlemoyer

Slides adapted from Carlos Guestrin
Prediction of continuous variables

Billionaire says: Wait, that's not what I meant!
You say: Chill out, dude.
He says: I want to predict a continuous variable for continuous inputs: I want to predict salaries from GPA.
You say: I can regress that!
Linear Regression: Ordinary Least Squares (OLS)

[Figure: scatter plot of observations with the fitted prediction line; the vertical gap between each observation and its prediction is the error, or residual.]
The regression problem

Instances: <x_j, t_j>
Learn: a mapping from x to t(x)
Hypothesis space:
Given basis functions {h_1, ..., h_k}
Find coefficients w = [w_1, ..., w_k], so that
\[ t(x) \approx f(x) = \sum_i w_i h_i(x) \]
Why is this usually called linear regression? The model is linear in the parameters.
Can we estimate functions that are not lines??? Yes: the basis functions h_i(x) may be nonlinear in x.
Precisely, minimize the residual squared error:
\[ w^* = \arg\min_w \sum_j \Big( t_j - \sum_i w_i h_i(x_j) \Big)^2 \]
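
As a concrete illustration, here is a minimal NumPy sketch of this setup; the data, the polynomial basis, and the function names are invented for illustration. It builds the matrix of basis-function values and evaluates the residual squared error for a given weight vector.

    import numpy as np

    # Toy instances <x_j, t_j>.
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    t = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

    # Basis functions {h_1, ..., h_k}: here a polynomial basis, h_i(x) = x^(i-1).
    def basis(x, k):
        return np.stack([x**i for i in range(k)], axis=1)  # N x k matrix of h_i(x_j)

    # Residual squared error: sum_j (t_j - sum_i w_i h_i(x_j))^2
    def residual_squared_error(w, x, t):
        return np.sum((t - basis(x, len(w)) @ w) ** 2)

    print(residual_squared_error(np.zeros(3), x, t))  # error of the all-zero weights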
Regression: matrix notation

\[ w^* = \arg\min_w (Hw - t)^\top (Hw - t) \]

where H is the N x K matrix of basis-function values (N data points as rows, K basis functions as columns), t is the vector of the N observed outputs (the measurements), and w is the vector of the K weights.
Regression solution: simple matrix math

\[ w^* = (H^\top H)^{-1} H^\top t \]

where \(H^\top H\) is a K x K matrix (for K basis functions) and \(H^\top t\) is a K x 1 vector.
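
A sketch of this closed-form solve, continuing the illustrative setup above; following standard numerical practice, it solves the K x K linear system \(H^\top H w = H^\top t\) rather than forming the inverse explicitly.

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    t = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
    H = np.stack([x**i for i in range(3)], axis=1)  # N x K design matrix

    # w* = (H^T H)^{-1} H^T t, computed without an explicit inverse.
    w_star = np.linalg.solve(H.T @ H, H.T @ t)
    print(w_star)      # fitted weights
    print(H @ w_star)  # predictions at the training inputs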
But, why?

Billionaire (again) says: Why sum squared error???
You say: Gaussians, Dr. Gateson, Gaussians.
Model: the prediction is a linear function plus Gaussian noise:
\[ t(x) = \sum_i w_i h_i(x) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2) \]
Learn w using MLE:
\[ P(t \mid x, w, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{\left[ t - \sum_i w_i h_i(x) \right]^2}{2\sigma^2} \right) \]

Maximizing the log-likelihood; maximize with respect to w:

(Recall MLE for a Gaussian: \(\mu_{MLE}, \sigma_{MLE} = \arg\max_{\mu,\sigma} P(D \mid \mu, \sigma)\), obtained by setting \(\sum_{i=1}^{N} \frac{x_i - \mu}{\sigma^2} = 0\) and \(-\frac{N}{\sigma} + \sum_{i=1}^{N} \frac{(x_i - \mu)^2}{\sigma^3} = 0\).)

\[
w_{MLE} = \arg\max_w \ln P(D \mid w, \sigma)
= \arg\max_w \ln \left[ \frac{1}{\sigma\sqrt{2\pi}} \right]^N + \sum_{j=1}^{N} \frac{-\left[ t_j - \sum_i w_i h_i(x_j) \right]^2}{2\sigma^2}
\]
\[
= \arg\max_w \sum_{j=1}^{N} \frac{-\left[ t_j - \sum_i w_i h_i(x_j) \right]^2}{2\sigma^2}
= \arg\min_w \sum_{j=1}^{N} \left[ t_j - \sum_i w_i h_i(x_j) \right]^2
\]

Least-squares Linear Regression is MLE for Gaussians!!!
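
To make the equivalence concrete, this sketch (assuming NumPy and SciPy; the data is synthetic) numerically maximizes the Gaussian log-likelihood by minimizing the negative log-likelihood, and checks that the optimum matches the closed-form least-squares solution.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 50)
    H = np.stack([x**i for i in range(3)], axis=1)
    sigma = 0.1
    t = H @ np.array([1.0, -2.0, 3.0]) + rng.normal(0.0, sigma, size=len(x))

    # Negative log-likelihood of the Gaussian-noise model, dropping the
    # w-independent constant N * ln(sigma * sqrt(2 * pi)).
    def neg_log_likelihood(w):
        return np.sum((t - H @ w) ** 2) / (2.0 * sigma**2)

    w_mle = minimize(neg_log_likelihood, np.zeros(3)).x  # numerical MLE
    w_ls = np.linalg.solve(H.T @ H, H.T @ t)             # least-squares solution
    print(np.allclose(w_mle, w_ls, atol=1e-4))           # True: the same optimum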
Bias-variance tradeoff - Intuition

Model too simple: does not fit the data well
A biased solution
Model too complex: small changes to the data, and the solution changes a lot
A high-variance solution

[Figure: two polynomial fits of the same data (axes x and t): M = 0 on the left, too simple; M = 9 on the right, too wiggly.]
(Squared) Bias of learner

Given: dataset D with N samples
Learn: for different datasets D, you will get different functions h(x)
Expected prediction (averaged over hypotheses): E_D[h(x)]
Bias: difference between the expected prediction and the truth
Measures how well you expect to represent the true solution
Decreases with more complex model
Variance of learner

Given: dataset D with N samples
Learn: for different datasets D, you will get different functions h(x)
Expected prediction (averaged over hypotheses): E_D[h(x)]
Variance: difference between what you expect to learn and what you learn from a particular dataset
Measures how sensitive the learner is to the specific dataset
Decreases with simpler model
Bias-variance decomposition of error

Consider a simple regression problem f: X → T
\[ f(x) = g(x) + \varepsilon \]
where g(x) is deterministic and \(\varepsilon\) is noise, \(\varepsilon \sim N(0, \sigma)\).
Collect some data, and learn a function h(x).
What are the sources of prediction error?
Sources of error 1 - noise

What if we have a perfect learner and infinite data?
If our learning solution h(x) satisfies h(x) = g(x),
we still have a remaining, unavoidable error of \(\sigma^2\), due to the noise \(\varepsilon\) in
\[ f(x) = g(x) + \varepsilon \]
Sources of error 2 - finite data

What if we have an imperfect learner, or only N training examples?
What is our expected squared error per example?
The expectation is taken over random training sets D of size N, drawn from the distribution P(X, T), with
\[ f(x) = g(x) + \varepsilon \]
Bias-Variance Decomposition of Error

Assume the target function: t(x) = g(x) + ε
Then the expected squared error over fixed-size training sets D drawn from P(X, T) can be expressed as the sum of three components:
\[ \mathbb{E}_D\big[ (t(x) - h_D(x))^2 \big] = \sigma^2 + \text{bias}^2 + \text{variance} \]
where:
\[ \bar h(x) = \mathbb{E}_D[h_D(x)], \qquad \text{bias}^2 = \big( g(x) - \bar h(x) \big)^2, \qquad \text{variance} = \mathbb{E}_D\big[ (h_D(x) - \bar h(x))^2 \big] \]
(Bishop, Chapter 3)
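
A simulation sketch of the decomposition (synthetic g(x); the constants are illustrative): it fits a fixed-degree polynomial to many independently drawn training sets, estimates the squared bias and the variance, and checks that noise + bias² + variance approximately matches the expected squared error.

    import numpy as np

    rng = np.random.default_rng(1)
    g = lambda x: np.sin(2.0 * np.pi * x)  # deterministic part of the target
    sigma, N, trials, degree = 0.3, 25, 500, 3

    x_eval = np.linspace(0.0, 1.0, 100)    # points where we measure the error
    preds = np.empty((trials, len(x_eval)))
    for s in range(trials):                # one fresh training set D per trial
        x = rng.uniform(0.0, 1.0, N)
        t = g(x) + rng.normal(0.0, sigma, N)
        preds[s] = np.polyval(np.polyfit(x, t, degree), x_eval)

    h_bar = preds.mean(axis=0)                               # E_D[h(x)]
    bias2 = np.mean((g(x_eval) - h_bar) ** 2)                # squared bias
    variance = np.mean(preds.var(axis=0))                    # variance
    noise = rng.normal(0.0, sigma, preds.shape)              # fresh noisy targets
    expected_err = np.mean((g(x_eval) + noise - preds) ** 2) # E[(t - h)^2]
    print(expected_err, sigma**2 + bias2 + variance)         # approximately equal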
Bias-Variance Tradeoff

The choice of hypothesis class introduces learning bias
More complex class → less bias
More complex class → more variance
Training set error

Given a dataset (the training data), choose a loss function, e.g., squared error (L2) for regression.
Training error: for a particular set of parameters, the loss function evaluated on the training data:
\[ \text{error}_{train}(w) = \frac{1}{N_{train}} \sum_{j=1}^{N_{train}} \Big[ t_j - \sum_i w_i h_i(x_j) \Big]^2 \]
[Figure: training error as a function of model complexity; it decreases as the model becomes more complex.]
Prediction error

Training set error can be a poor measure of the "quality" of the solution.
Prediction error (true error): we really care about the error over all possibilities:
\[ \text{error}_{true}(w) = \mathbb{E}_{(x,t) \sim p}\Big[ \big( t - \sum_i w_i h_i(x) \big)^2 \Big] \]
[Figure: prediction error as a function of model complexity; it first falls, then rises again as the model starts to overfit.]
Computing prediction error

To correctly compute the prediction error, we face a hard integral! We may not know t(x) for every x, and we may not know p(x).
Monte Carlo integration (sampling approximation):
Sample a set of i.i.d. points {x_1, ..., x_M} from p(x)
Approximate the integral with the sample average:
\[ \text{error}_{true}(w) \approx \frac{1}{M} \sum_{j=1}^{M} \Big[ t(x_j) - \sum_i w_i h_i(x_j) \Big]^2 \]
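
A sketch of this sampling approximation (synthetic data; the model degree and sample sizes are illustrative): it estimates the true error by Monte Carlo and compares it with the training error of a flexible polynomial fit.

    import numpy as np

    rng = np.random.default_rng(2)
    g = lambda x: np.sin(2.0 * np.pi * x)
    sigma = 0.3

    x_tr = rng.uniform(0.0, 1.0, 10)              # a small training set
    t_tr = g(x_tr) + rng.normal(0.0, sigma, 10)
    w = np.polyfit(x_tr, t_tr, 5)                 # a flexible, overfit-prone model

    # Monte Carlo: sample i.i.d. points from p(x), average the squared error.
    M = 100_000
    x_mc = rng.uniform(0.0, 1.0, M)
    t_mc = g(x_mc) + rng.normal(0.0, sigma, M)
    error_true_hat = np.mean((t_mc - np.polyval(w, x_mc)) ** 2)

    error_train = np.mean((t_tr - np.polyval(w, x_tr)) ** 2)
    print(error_train, error_true_hat)  # the training error is optimistically low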
Why doesn't training set error approximate prediction error?

Sampling approximation of the prediction error:
\[ \text{error}_{true}(w) \approx \frac{1}{M} \sum_{j=1}^{M} \Big[ t(x_j) - \sum_i w_i h_i(x_j) \Big]^2 \]
Training error:
\[ \text{error}_{train}(w) = \frac{1}{N_{train}} \sum_{j=1}^{N_{train}} \Big[ t_j - \sum_i w_i h_i(x_j) \Big]^2 \]
Very similar equations!!! Why is the training set a bad measure of prediction error???

Because you cheated!!!
Training error is a good estimate for a single w, but you optimized w with respect to the training error, and found a w that is good for this particular set of samples.
Training error is an (optimistically) biased estimate of prediction error.
Test set error

Given a dataset, randomly split it into two parts:
Training data - {x_1, ..., x_{Ntrain}}
Test data - {x_1, ..., x_{Ntest}}
Use the training data to optimize the parameters w.
Test set error: for the final output ŵ, evaluate the error using:
\[ \text{error}_{test}(\hat w) = \frac{1}{N_{test}} \sum_{j=1}^{N_{test}} \Big[ t_j - \sum_i \hat w_i h_i(x_j) \Big]^2 \]
[Figure: test set error as a function of model complexity; like the true error, it falls and then rises as the model overfits.]
Overfitting: this slide is so important we are looking at it again!

Assume:
Data generated from a distribution D(X, Y)
A hypothesis space H
Define errors for a hypothesis h ∈ H:
Training error: error_train(h)
Data (true) error: error_true(h)
We say h overfits the training data if there exists an h' ∈ H such that:
error_train(h) < error_train(h')
and
error_true(h) > error_true(h')
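
A small demonstration of this definition (synthetic data; the polynomial degrees are illustrative): as the degree grows, the training error keeps falling while the test error eventually rises, i.e., the high-degree hypothesis overfits.

    import numpy as np

    rng = np.random.default_rng(3)
    g = lambda x: np.sin(2.0 * np.pi * x)
    x = rng.uniform(0.0, 1.0, 30)
    t = g(x) + rng.normal(0.0, 0.3, 30)

    idx = rng.permutation(30)             # randomly split the dataset
    tr, te = idx[:20], idx[20:]

    for degree in (1, 3, 9):
        w = np.polyfit(x[tr], t[tr], degree)
        err_tr = np.mean((t[tr] - np.polyval(w, x[tr])) ** 2)
        err_te = np.mean((t[te] - np.polyval(w, x[te])) ** 2)
        print(degree, err_tr, err_te)     # train error falls; test error eventually rises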
Summary: error estimators

Gold standard: the true (prediction) error, error_true(w)
Training: optimistically biased
Test: our final measure, unbiased?

Error as a function of the number of training examples, for a fixed model complexity:
[Figure: training and test error vs. number of training examples; with little data the two differ widely, and they converge as the data grows toward infinite.]
Summary: error estimators (revisited)

Be careful!!!
The test set is only unbiased if you never never ever ever do any any any any learning on the test data.
For example, if you use the test set to select the degree of the polynomial... it is no longer unbiased!!!
(We will address this problem later in the semester.)
What you need to know

Regression:
Basis functions = features
Optimizing the sum squared error
Relationship between regression and Gaussians
Bias-variance trade-off
Play with the applet: http://mste.illinois.edu/users/exner/java.f/leastsquares/
True error, training error, test error
Never learn on the test data
Overfitting
