Assignment 1
Karan Goel, 2011EE50555
Types of Loss Functions
Let the true value be denoted by $y$ and the predicted value by $\hat{y}$. Let $L$ denote the loss function and $a = y - \hat{y}$.
Squared Loss
\[
L = \tfrac{1}{2}a^2
\]
Absolute Loss
\[
L = |a|
\]
Huber Loss
\[
L = \begin{cases}
\tfrac{1}{2}a^2, & |a| \le \Delta \\
\Delta|a| - \tfrac{1}{2}\Delta^2, & \text{otherwise.}
\end{cases}
\]
Hampel Loss
\[
L = \begin{cases}
\tfrac{1}{2}a^2, & |a| \le \Delta_1 \\
\Delta_1|a| - \tfrac{1}{2}\Delta_1^2, & \Delta_1 < |a| \le \Delta_2 \\
\Delta_1\Delta_2 - \tfrac{1}{2}\Delta_1^2 + (\Delta_3 - \Delta_2)\tfrac{\Delta_1}{2}\left(1 - \left(\tfrac{\Delta_3 - |a|}{\Delta_3 - \Delta_2}\right)^2\right), & \Delta_2 < |a| \le \Delta_3 \\
\Delta_1\Delta_2 - \tfrac{1}{2}\Delta_1^2 + (\Delta_3 - \Delta_2)\tfrac{\Delta_1}{2}, & \text{otherwise.}
\end{cases}
\]
Bisquare Loss
\[
L = \begin{cases}
\tfrac{\Delta^2}{6}\left(1 - \left(1 - \tfrac{a^2}{\Delta^2}\right)^3\right), & |a| \le \Delta \\
\tfrac{\Delta^2}{6}, & \text{otherwise.}
\end{cases}
\]
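Translated into code, these definitions look as follows. This is a minimal NumPy sketch; the threshold parameters (delta for Huber and Bisquare, d1 < d2 < d3 for Hampel) are illustrative defaults, not values from this report.

```python
import numpy as np

def squared_loss(a):
    """Squared loss: L = a^2 / 2."""
    return 0.5 * a**2

def absolute_loss(a):
    """Absolute loss: L = |a|."""
    return np.abs(a)

def huber_loss(a, delta=1.0):
    """Huber loss: quadratic near zero, linear in the tails."""
    abs_a = np.abs(a)
    return np.where(abs_a <= delta,
                    0.5 * a**2,
                    delta * abs_a - 0.5 * delta**2)

def hampel_loss(a, d1=1.0, d2=2.0, d3=3.0):
    """Hampel loss: three-part redescending loss with d1 < d2 < d3."""
    abs_a = np.abs(a)
    c2 = d1 * d2 - 0.5 * d1**2                 # value reached at |a| = d2
    plateau = c2 + (d3 - d2) * d1 / 2          # constant value beyond d3
    quad_tail = c2 + (d3 - d2) * (d1 / 2) * (1 - ((d3 - abs_a) / (d3 - d2))**2)
    return np.select(
        [abs_a <= d1, abs_a <= d2, abs_a <= d3],
        [0.5 * a**2, d1 * abs_a - 0.5 * d1**2, quad_tail],
        default=plateau)

def bisquare_loss(a, delta=1.0):
    """Tukey's bisquare loss: saturates at delta^2 / 6 for |a| > delta."""
    abs_a = np.abs(a)
    return np.where(abs_a <= delta,
                    (delta**2 / 6) * (1 - (1 - (a / delta)**2)**3),
                    delta**2 / 6)
```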
Now restricting the analysis to Squared Loss, we observe the following cross-validation results.
Property                       Order 3   Order 4   Order 5
Residual Standard Error        31.8      7.91      7.91
Residual SSE                   16195     937       877
Estimate of Noise Variance     1012      62.5      62.6
AIC                            201       146       146
Cross-validation clearly indicates that the low-order polynomials (1st, 2nd, 3rd) tend to underfit the data, while the 6th-order polynomial overfits and does not generalize well to unseen data.
The results indicate that the 4th-order polynomial fit works best. Although the 5th-order polynomial is similar on most statistics, it tends to slightly overfit (shown clearly by cross-validation), and the null hypothesis that its $x^5$ coefficient is zero cannot be rejected (high p-value).
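As a sketch of this selection procedure, the snippet below computes the k-fold cross-validation error and a Gaussian-noise AIC for polynomial fits. Here x and y stand in for the report's dataset, which is not reproduced in this extract, and the AIC form n log(SSE/n) + 2k is one common convention.

```python
import numpy as np

def cv_mse(x, y, order, k=10, seed=0):
    """Mean k-fold cross-validated squared error for a polynomial fit."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        coeffs = np.polyfit(x[train], y[train], order)
        pred = np.polyval(coeffs, x[fold])
        errors.append(np.mean((y[fold] - pred)**2))
    return np.mean(errors)

def aic(x, y, order):
    """AIC under Gaussian noise: n*log(SSE/n) + 2*(order + 1)."""
    resid = y - np.polyval(np.polyfit(x, y, order), x)
    n = len(x)
    return n * np.log(np.sum(resid**2) / n) + 2 * (order + 1)

# for order in range(1, 8):
#     print(order, cv_mse(x, y, order), aic(x, y, order))
```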
The 4th-order polynomial is given by
\[
y = 17963.10 - 55667.21x + 16741.70x^2 - 1943.39x^3 + 123.52x^4 \tag{1}
\]
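For reference, equation (1) can be evaluated with np.polyval, which takes coefficients from the highest power down; the input value is purely illustrative.

```python
import numpy as np

# Coefficients of equation (1), highest power first.
p4 = [123.52, -1943.39, 16741.70, -55667.21, 17963.10]
print(np.polyval(p4, 5.0))  # prediction at an example input x = 5.0
```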
Figure 2: Mean Squared Error v/s Log Lambda for Best Lasso Regularization
Figure 3: Mean Squared Error v/s Log Lambda for Best Ridge Regularization
Cross-validation indicates that the best values of the regularization parameter ($\lambda$) are $\lambda_{\text{lasso}} = 3.26$ and $\lambda_{\text{ridge}} = 2.10$. Figures 2 and 3 show how the cross-validated mean squared error varies with $\lambda$; we select the value of $\lambda$ that minimizes this error. The noise variance estimates come out to 465 for ridge and 161 for lasso, and the corresponding mean residual sums of squares are 279 for ridge and 104 for lasso.
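A workflow of this shape can be reproduced along the following lines. This is a scikit-learn sketch (scikit-learn calls the regularization strength alpha rather than lambda); x and y are stand-ins for the dataset, and the degree and alpha grid are assumptions.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.preprocessing import PolynomialFeatures

def cv_regularized_fits(x, y, degree=7):
    """Choose the regularization strength by cross-validated MSE."""
    X = PolynomialFeatures(degree=degree, include_bias=False).fit_transform(
        x.reshape(-1, 1))
    lasso = LassoCV(cv=10).fit(X, y)                    # L1 path, CV-chosen alpha
    ridge = RidgeCV(alphas=np.logspace(-3, 4, 80), cv=10).fit(X, y)  # L2
    return lasso, ridge

# lasso, ridge = cv_regularized_fits(x, y)
# print(lasso.alpha_, ridge.alpha_)
```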
The overall regression equation is
\[
y = 6545.751 + 2631.198x - 165.514x^2 - 67.667x^3 + 23.701x^4 - 5.397x^5 + 1.103x^6 - 0.218x^7 \tag{2}
\]
Polynomial Order               6 (Squared)  7 (Squared)  7 (Absolute)  6 (Huber)  7 (Hampel)
Residual Standard Error        6.82         6.78         6.92          6.82       6.91
Residual SSE                   4325         4230         4409          4330       4390
Estimate of Noise Variance     46.5         46           47.9          46.6       47.7
AIC                            676          676          667           677        680
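One way to obtain the robust-loss fits compared above is to minimize the chosen loss directly over the polynomial coefficients. The sketch below, which can reuse huber_loss or any of the losses defined earlier, is an illustrative approach, not necessarily the method behind these numbers.

```python
import numpy as np
from scipy.optimize import minimize

def fit_poly_robust(x, y, order, loss, **loss_kwargs):
    """Fit polynomial coefficients by minimizing a (possibly non-smooth)
    robust loss, warm-started from the least-squares solution."""
    def objective(coeffs):
        return np.sum(loss(y - np.polyval(coeffs, x), **loss_kwargs))
    init = np.polyfit(x, y, order)
    # Nelder-Mead avoids needing gradients of non-smooth losses.
    return minimize(objective, init, method="Nelder-Mead").x
```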
Figure 4: Plot of Best Fit 6th degree polynomial from Squared Loss
The equation of the 6th-order squared-loss polynomial turns out to be
\[
y = 6372 - 12304x - 87712x^2 - 12301x^3 + 51000x^4 - 5003x^5 + 9846x^6 \tag{4}
\]
L1 regularization gives the best overall performance, with a noise variance of 46.6. The regression equation given by the Lasso regularization is
\[
y = 7.372 - 1.167x - 1.711x^2 - 4.041x^3 - 0.469x^4 - 2.997x^5 + 3.467x^6 \tag{6}
\]
Given this model's performance and its small coefficient values, this is the best guess for the actual underlying polynomial of the dataset.
All Features:           4.75, 0.734, 108, 3028
AIC (up to 4th power):  3.56, 0.85, 87.7, 2757
Lasso:                  3.89, 0.816, -
Ridge:                  3.89, 0.822, -
Figure 5: Residuals v/s Fitted for AIC Feature Selected Model (4th power)
The residuals v/s fitted values plot shows that the noise in the data has been well explained by the final model (ideally, the red line in the graph should be a horizontal line passing through 0). The basic model (with all features) has biased residuals with visible regularity, which indicates that it is not as expressive.
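A residuals v/s fitted plot of this kind can be generated as follows. This is a matplotlib sketch; the red line in the report's figures is presumably a smoothed trend of the residuals, reduced here to the zero reference line.

```python
import matplotlib.pyplot as plt

def residuals_vs_fitted(y, y_hat):
    """Diagnostic plot: a flat, structureless cloud around zero suggests
    the model has explained everything but noise."""
    plt.scatter(y_hat, y - y_hat, s=10)
    plt.axhline(0, color="red")  # ideal: the residual trend hugs this line
    plt.xlabel("Fitted values")
    plt.ylabel("Residuals")
    plt.show()
```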