Assessing Performance

Emily Fox & Carlos Guestrin
Machine Learning Specialization
University of Washington

Make predictions, get $, right??

[Diagram: Model + algorithm → fitted function f̂ → Predictions → decisions → outcomes]


Or, how much am I losing?


Example: Lost $ due to inaccurate listing price
- Too low → low offers
- Too high → few lookers + no/low offers

How much am I losing compared to perfection?


Perfect predictions: Loss = 0
My predictions: Loss = ???

Measuring loss
Loss function:
L(y, f̂(x)) = cost of using f̂(x) at x when y is the true (actual) value

f̂(x) = predicted value

Examples:
(assuming loss for underpredicting = overpredicting)

Absolute error: L(y, f̂(x)) = |y - f̂(x)|

Squared error: L(y, f̂(x)) = (y - f̂(x))²
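To make the two example losses concrete, here is a minimal NumPy sketch (not from the slides); `y` holds observed prices and `f_x` the predicted values f̂(x):

```python
import numpy as np

def absolute_error(y, f_x):
    # L(y, f̂(x)) = |y - f̂(x)|, elementwise
    return np.abs(y - f_x)

def squared_error(y, f_x):
    # L(y, f̂(x)) = (y - f̂(x))², elementwise
    return (y - f_x) ** 2

# Toy usage with made-up prices (in $):
y_true = np.array([310_000., 450_000., 285_000.])
y_pred = np.array([300_000., 470_000., 300_000.])
print(absolute_error(y_true, y_pred))
print(squared_error(y_true, y_pred))
```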

"Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful." — George Box, 1987


Assessing the loss


Assessing the loss


Part 1: Training error


Define training data

[Figure: training data points — price ($) vs. square feet (sq.ft.)]


Example: Fit a quadratic function to minimize RSS

[Figure: quadratic fit f̂; ŵ minimizes RSS of the training data — price ($) vs. square feet (sq.ft.)]
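As a hedged illustration (not the course's own code), fitting a quadratic by least squares — which is exactly "minimize RSS" — can be sketched with NumPy; `sqft` and `price` are hypothetical training arrays:

```python
import numpy as np

# Hypothetical training data: square feet and sale prices.
sqft = np.array([1000., 1500., 1800., 2100., 2500., 3000.])
price = np.array([250_000., 320_000., 360_000., 400_000., 470_000., 540_000.])

# np.polyfit returns the coefficients ŵ that minimize the residual
# sum of squares (RSS) for a degree-2 polynomial.
w_hat = np.polyfit(sqft, price, deg=2)
f_hat = np.poly1d(w_hat)   # the fitted function f̂

print(f_hat(2640))         # predicted price for a 2640 sq.ft. house
```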

Compute training error


1. Define a loss function L(y, f̂(x))
   - E.g., squared error, absolute error, ...

2. Training error
   = avg. loss on houses in training set
   = (1/N) Σ_{i=1}^{N} L(y_i, f̂(x_i)),
   where f̂ was fit using the training data



Example: Use squared error loss (y - f̂(x))²

[Figure: quadratic fit evaluated at the training houses — price ($) vs. square feet (sq.ft.)]

Training error (ŵ) = 1/N * [ ($_train 1 - f̂(sq.ft._train 1))²
                           + ($_train 2 - f̂(sq.ft._train 2))²
                           + ($_train 3 - f̂(sq.ft._train 3))²
                           + ... including all training houses ]

Example: Use squared error loss (y - f̂(x))²

Training error (ŵ) = (1/N) Σ_{i=1}^{N} (y_i - f̂(x_i))²

RMSE = sqrt( (1/N) Σ_{i=1}^{N} (y_i - f̂(x_i))² )
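Continuing the earlier hypothetical polyfit sketch, the training error under squared loss and the corresponding RMSE could be computed as:

```python
import numpy as np

def training_error(y, y_hat):
    # (1/N) Σ (y_i - f̂(x_i))² — average squared error on the training set
    return np.mean((y - y_hat) ** 2)

def rmse(y, y_hat):
    # Root mean squared error, in the same units as y ($)
    return np.sqrt(training_error(y, y_hat))

print(training_error(price, f_hat(sqft)), rmse(price, f_hat(sqft)))
```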

Training error vs. model complexity

[Figure: fits of increasing complexity (price ($) vs. square feet (sq.ft.)), and the corresponding training error curve, which decreases as model complexity increases]

Is training error a good measure of predictive performance?

- How do we expect to perform on a new house?
- Is there something particularly bad about having x_t square feet???
- Issue: Training error is overly optimistic because ŵ was fit to the training data

Small training error ⇏ good predictions,
unless the training data includes everything you might ever see.

Assessing the loss


Part 2: Generalization (true) error


Generalization error
Really want an estimate of loss over all possible (house, $) pairs
(lots of houses in the neighborhood, but not in the dataset).


Distribution over houses


In our neighborhood, houses of what # sq.ft. are we likely to see?

[Figure: distribution over square feet (sq.ft.)]

Distribution over sales prices


For houses with a given # sq.ft., what house prices $ are we likely to see?

[Figure: distribution over price ($) for a fixed # sq.ft.]

Generalization error definition


Really want an estimate of loss over all possible (house, $) pairs.

Formally:
generalization error = E_{x,y}[ L(y, f̂(x)) ]
(an average over all possible (x,y) pairs, weighted by how likely each is;
f̂ was fit using the training data)
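Generalization error can't be computed on real data, but on a synthetic problem where we control the data-generating distribution we can approximate the expectation E_{x,y}[L(y, f̂(x))] by Monte Carlo sampling. A minimal sketch, assuming a made-up true function and noise level:

```python
import numpy as np

rng = np.random.default_rng(0)

def f_true(x):                      # assumed "true" relationship (made up)
    return 100_000 + 150 * x

def sample_houses(n):               # draw (sq.ft., price) pairs from the assumed distribution
    x = rng.uniform(500, 3500, size=n)
    y = f_true(x) + rng.normal(0, 30_000, size=n)   # noise ε with σ = $30k
    return x, y

# Fit f̂ on one training set of N houses.
x_train, y_train = sample_houses(30)
f_hat = np.poly1d(np.polyfit(x_train, y_train, deg=2))

# Approximate generalization error = E_{x,y}[(y - f̂(x))²] with a huge fresh sample.
x_big, y_big = sample_houses(1_000_000)
print(np.mean((y_big - f_hat(x_big)) ** 2))
```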

Generalization error vs. model complexity

[Figure: fits f̂ of increasing complexity (price ($) vs. square feet (sq.ft.)), and the corresponding generalization error curve, which decreases and then increases with model complexity]

Can't compute! (it requires the true distribution over all (x,y) pairs)

Assessing the loss


Part 3: Test error


Approximating generalization error

Wanted: an estimate of loss over all possible (house, $) pairs.
Approximate it by looking at houses not in the training set.


Forming a test set

Hold out some (house, $) pairs that are not used for fitting the model.

Training set | Test set
(the test set is a proxy for everything you might see)

Compute test error

Test error
= avg. loss on houses in test set
= (1/N_test) Σ_{i in test set} L(y_i, f̂(x_i))

where N_test = # test points, and f̂ was fit using the training data
(f̂ has never seen the test data!)

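A minimal sketch of the test-error computation, assuming hypothetical held-out arrays `x_train`, `y_train`, `x_test`, `y_test` already exist (a way to form such a split is sketched a few slides below):

```python
import numpy as np

def test_error(f_hat, x_test, y_test):
    # avg. squared loss on the held-out test houses; f̂ never saw these points
    return np.mean((y_test - f_hat(x_test)) ** 2)

# f̂ is fit on the training set only ...
f_hat = np.poly1d(np.polyfit(x_train, y_train, deg=2))
# ... and evaluated on the test set.
print(test_error(f_hat, x_test, y_test))
```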

Example: As before, fit a quadratic to the training data

[Figure: quadratic fit; ŵ minimizes RSS of the training data — price ($) vs. square feet (sq.ft.)]

Example: As before, use squared error loss (y - f̂(x))²

Test error (ŵ) = 1/N_test * [ ($_test 1 - f̂(sq.ft._test 1))²
                            + ($_test 2 - f̂(sq.ft._test 2))²
                            + ($_test 3 - f̂(sq.ft._test 3))²
                            + ... including all test houses ]

Training, true, & test error vs. model complexity

[Figure: error vs. model complexity — training error keeps decreasing with complexity, while true (generalization) error and its test-error approximation decrease and then increase]

Overfitting if: training error is small but the true (generalization) error is large.
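To see the overfitting pattern numerically, one could compare training and test RMSE across polynomial degrees, again assuming hypothetical arrays `x_train`, `y_train`, `x_test`, `y_test`:

```python
import numpy as np

for degree in [1, 2, 4, 8, 12]:
    f_hat = np.poly1d(np.polyfit(x_train, y_train, deg=degree))
    train_rmse = np.sqrt(np.mean((y_train - f_hat(x_train)) ** 2))
    test_rmse = np.sqrt(np.mean((y_test - f_hat(x_test)) ** 2))
    # Training RMSE keeps falling with degree; test RMSE eventually rises → overfitting.
    print(f"degree {degree:2d}: train RMSE {train_rmse:10.0f}, test RMSE {test_rmse:10.0f}")
```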

Training/test split


Training/test splits
Training set

Test set

how many? vs. how many?


Training/test splits

Small training set | Large test set
Too few training points → ŵ is poorly estimated


Training/test splits

Large training set | Small test set
Too few test points → test error is a bad approximation of generalization error


Training/test splits

Training set | Test set

Typically, keep just enough test points to form a reasonable estimate of
generalization error. If this leaves too few points for training, use other
methods like cross validation (will see later).
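A common way to form such a split is a random permutation of the row indices; a sketch with a hypothetical `test_fraction` parameter, applied to the toy `sqft`/`price` arrays from the earlier sketch:

```python
import numpy as np

def train_test_split(x, y, test_fraction=0.2, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))                  # shuffle indices
    n_test = int(round(test_fraction * len(x)))    # size of the held-out set
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return x[train_idx], y[train_idx], x[test_idx], y[test_idx]

x_train, y_train, x_test, y_test = train_test_split(sqft, price, test_fraction=0.2)
```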

3 sources of error +
the bias-variance tradeoff


3 sources of error
In forming predictions, there
are 3 sources of error:
1. Noise
2. Bias
3. Variance


Data inherently noisy

y_i = f_{w(true)}(x_i) + ε_i

[Figure: true function f_{w(true)} with noisy observations — price ($) vs. square feet (sq.ft.)]

The noise ε_i has variance σ² — the "irreducible error".

Bias contribution

Assume we fit a constant function.

[Figure: two constant fits, f̂(train1) from N house sales and f̂(train2) from N other house sales — price ($) vs. square feet (sq.ft.)]

Bias contribution

Over all possible size-N training sets, what do I expect my fit to be?

[Figure: true function f_{w(true)}, specific fits f̂(train1), f̂(train2), f̂(train3), and the average fit f̄_w — price ($) vs. square feet (sq.ft.)]

Bias contribution

Bias(x) = f_{w(true)}(x) - f̄_w(x)

Is our approach flexible enough to capture f_{w(true)}? If not, there is error in the predictions.

Low complexity → high bias

[Figure: true function f_{w(true)} vs. the average constant fit f̄_w — price ($) vs. square feet (sq.ft.)]

Variance contribution

How much do specific fits vary from the expected fit?
Can specific fits vary widely? If so, predictions are erratic.

Low complexity → low variance

[Figure: specific constant fits f̂(train1), f̂(train2), f̂(train3) clustered around the average fit f̄_w — price ($) vs. square feet (sq.ft.)]

Variance of high-complexity models

Assume we fit a high-order polynomial.

[Figure: wildly different fits f̂(train1), f̂(train2), f̂(train3) from different training sets, and their average f̄_w — price ($) vs. square feet (sq.ft.)]

High complexity → high variance

Bias of high-complexity models

High complexity → low bias

[Figure: the average high-order fit f̄_w closely tracks the true function f_{w(true)} — price ($) vs. square feet (sq.ft.)]

Bias-variance tradeoff

[Figure: error vs. model complexity — bias² decreases and variance increases as complexity grows; their sum is minimized at an intermediate complexity]

Error vs. amount of data

[Figure: error vs. # data points in the training set]

More in depth on the 3 sources of error

(OPTIONAL)

Accounting for training set randomness

The training set was just a random sample of N houses sold.
What if N other houses had been sold and recorded?

[Figure: two different fits f̂(1) and f̂(2) from two different samples, each with its own generalization error — price ($) vs. square feet (sq.ft.)]

Ideally, we want performance averaged over all possible training sets of size N.

Expected prediction error

= E_{training set}[ generalization error of ŵ(training set) ]

(an average over all training sets, weighted by how likely each is;
ŵ(training set) denotes parameters fit on a specific training set, with fitted function f̂_{ŵ(training set)})

Prediction error at target input

Start by considering:
1. Loss at a target x_t (e.g., 2640 sq.ft.)
2. Squared error loss: L(y, f̂(x)) = (y - f̂(x))²

Sum of 3 sources of error

Average prediction error at x_t
= σ² + [bias(f̂(x_t))]² + var(f̂(x_t))

Error variance of the model

Average prediction error at x_t
= σ² + [bias(f̂(x_t))]² + var(f̂(x_t))

The first term: y = f_{w(true)}(x) + ε, and σ² = variance of the noise ε — the irreducible error.

Bias of function estimator

Average prediction error at x_t
= σ² + [bias(f̂(x_t))]² + var(f̂(x_t))

[Figure: fits f̂(train1) and f̂(train2) from two different training sets — price ($) vs. square feet (sq.ft.)]

Bias of function estimator

Average estimated function: f̄_w(x) = E_train[ f̂_{ŵ(train)}(x) ]   (over all training sets of size N)
True function: f_{w(true)}(x)

[Figure: true function, specific fits f̂(train1), f̂(train2), and the average fit f̄_w near x_t — price ($) vs. square feet (sq.ft.)]

Bias of function estimator

Average estimated function: f̄_w(x)
True function: f_{w(true)}(x)

bias(f̂(x_t)) = f_{w(true)}(x_t) - f̄_w(x_t)

[Figure: the gap between f_{w(true)} and f̄_w at x_t — price ($) vs. square feet (sq.ft.)]


Variance of function estimator

Average prediction error at x_t
= σ² + [bias(f̂(x_t))]² + var(f̂(x_t))

[Figure: specific fits f̂(train1), f̂(train2), f̂(train3) spread around the average fit f̄_w — price ($) vs. square feet (sq.ft.)]


Variance of function estimator

var(f̂(x_t)) = E_train[ ( f̂_{ŵ(train)}(x_t) - f̄_w(x_t) )² ]

(f̂_{ŵ(train)} is the fit on a specific training dataset; f̄_w is what I expect to learn over
all training sets; the expectation over all training sets of size N measures the deviation
of a specific fit from the expected fit at x_t)

Why 3 sources of error?
A formal derivation

(OPTIONAL)

Deriving expected prediction error

Expected prediction error
= E_train[ generalization error of ŵ(train) ]
= E_train[ E_{x,y}[ L(y, f̂_{ŵ(train)}(x)) ] ]

1. Look at a specific x_t
2. Consider L(y, f̂(x)) = (y - f̂(x))²

Expected prediction error at x_t
= E_{train, y_t}[ (y_t - f̂_{ŵ(train)}(x_t))² ]

Deriving expected prediction error

Expected prediction error at x_t
= E_{train, y_t}[ (y_t - f̂_{ŵ(train)}(x_t))² ]
= E_{train, y_t}[ ( (y_t - f_{w(true)}(x_t)) + (f_{w(true)}(x_t) - f̂_{ŵ(train)}(x_t)) )² ]
= σ² + MSE[ f̂_{ŵ(train)}(x_t) ]

(expanding the square, the cross term has expectation 0 because the noise
y_t - f_{w(true)}(x_t) has mean zero and is independent of the training set)


Equating MSE with bias and variance

MSE[ f̂_{ŵ(train)}(x_t) ]
= E_train[ ( f_{w(true)}(x_t) - f̂_{ŵ(train)}(x_t) )² ]
= E_train[ ( (f_{w(true)}(x_t) - f̄_w(x_t)) + (f̄_w(x_t) - f̂_{ŵ(train)}(x_t)) )² ]
= [bias(f̂(x_t))]² + var(f̂(x_t))

(again the cross term vanishes, since E_train[ f̂_{ŵ(train)}(x_t) ] = f̄_w(x_t))


Putting it all together

Expected prediction error at x_t
= σ² + MSE[ f̂(x_t) ]
= σ² + [bias(f̂(x_t))]² + var(f̂(x_t))

→ the 3 sources of error

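The decomposition can be checked numerically on a synthetic problem where f_{w(true)} and σ are known. A hedged sketch (all data-generating choices are assumptions) that estimates σ², bias², and variance at a target x_t by refitting a polynomial on many simulated training sets:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 30_000.0                                  # assumed noise std dev
f_true = lambda x: 100_000 + 150 * x              # assumed true function
x_t, N, degree, n_sims = 2640.0, 30, 2, 2000

preds = np.empty(n_sims)
for s in range(n_sims):
    x = rng.uniform(500, 3500, size=N)            # a fresh size-N training set
    y = f_true(x) + rng.normal(0, sigma, size=N)
    f_hat = np.poly1d(np.polyfit(x, y, deg=degree))
    preds[s] = f_hat(x_t)                         # f̂_{ŵ(train)}(x_t) for this training set

f_bar = preds.mean()                              # estimate of f̄_w(x_t)
bias_sq = (f_true(x_t) - f_bar) ** 2              # [bias(f̂(x_t))]²
variance = preds.var()                            # var(f̂(x_t))

# Expected prediction error at x_t ≈ σ² + bias² + variance
print(sigma**2, bias_sq, variance, sigma**2 + bias_sq + variance)
```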

Summary of tasks


The regression/ML workflow


1. Model selection
   Often, we need to choose tuning parameters λ controlling model complexity
   (e.g., degree of polynomial)
2. Model assessment
   Having selected a model, assess its generalization error

Hypothetical implementation

Training set | Test set

1. Model selection
   For each considered model complexity λ:
   i. Estimate parameters ŵ_λ on the training data
   ii. Assess performance of ŵ_λ on the test data
   iii. Choose λ* to be the λ with lowest test error
2. Model assessment
   Compute the test error of ŵ_{λ*} (the fitted model for the selected
   complexity λ*) to approximate the generalization error

Problem with the hypothetical implementation: it is overly optimistic!

Hypothetical implementation

Issue: just like fitting ŵ and assessing its performance both on the training data,
λ* was selected to minimize test error (i.e., λ* was effectively fit on the test data).
If the test data is not representative of the whole world, then ŵ_{λ*} will typically
perform worse than the test error indicates.

Practical implementation

Training set | Validation set | Test set

Solution: create two "test" sets!
1. Select λ* such that ŵ_{λ*} minimizes error on the validation set
2. Approximate the generalization error of ŵ_{λ*} using the test set

Practical implementation

Training set   → fit ŵ_λ
Validation set → test performance of ŵ_λ to select λ*
Test set       → assess generalization error of ŵ_{λ*}

Typical splits

Training set | Validation set | Test set
    80%      |      10%       |   10%
    50%      |      25%       |   25%
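Putting the workflow together, a hedged sketch of model selection over polynomial degree using a validation set, followed by a final test-error estimate (split sizes follow the 80%/10%/10% example; `sqft_all` and `price_all` are hypothetical arrays for the full dataset):

```python
import numpy as np

rng = np.random.default_rng(2)
n = len(sqft_all)                                  # hypothetical full dataset
idx = rng.permutation(n)
n_train, n_val = int(0.8 * n), int(0.1 * n)
tr, va, te = idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def rmse(f_hat, x, y):
    return np.sqrt(np.mean((y - f_hat(x)) ** 2))

# 1. Model selection: pick the degree (complexity λ) with lowest validation error.
best = None
for degree in range(1, 9):
    f_hat = np.poly1d(np.polyfit(sqft_all[tr], price_all[tr], deg=degree))
    val_err = rmse(f_hat, sqft_all[va], price_all[va])
    if best is None or val_err < best[0]:
        best = (val_err, degree, f_hat)

# 2. Model assessment: report the test error of the selected model once, at the end.
val_err, best_degree, f_best = best
print(best_degree, rmse(f_best, sqft_all[te], price_all[te]))
```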

Summary of
assessing performance


What you can do now

- Describe what a loss function is and give examples
- Contrast training, generalization, and test error
- Compute training and test error given a loss function
- Discuss the issue of assessing performance on the training set
- Describe tradeoffs in forming training/test splits
- List and interpret the 3 sources of avg. prediction error
  - irreducible error, bias, and variance
- Discuss the issue of selecting model complexity on test data
  and then using test error to assess generalization error
- Motivate the use of a validation set for selecting tuning parameters
  (e.g., model complexity)
- Describe the overall regression workflow
