
Appendix:

INITIAL DATA ANALYSIS


Correlation: mpg, year, weight, engine, horse, accel, origin, cylinder

                mpg     year   weight   engine    horse    accel   origin
year          0.575
              0.000

weight       -0.822   -0.282
              0.000    0.000

engine       -0.785   -0.362    0.930
              0.000    0.000    0.000

horse        -0.743   -0.369    0.861    0.884
              0.000    0.000    0.000    0.000

accel         0.349    0.206   -0.417   -0.535   -0.701
              0.000    0.004    0.000    0.000    0.000

origin        0.511    0.132   -0.558   -0.589   -0.415    0.166
              0.000    0.064    0.000    0.000    0.000    0.019

cylinder     -0.760   -0.348    0.903    0.950    0.842   -0.488   -0.524
              0.000    0.000    0.000    0.000    0.000    0.000    0.000

Cell Contents: Pearson correlation
               P-Value
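
The p-value printed under each correlation tests the hypothesis that the true correlation is zero. As a rough check, the sketch below recomputes the t statistic for three of the weaker correlations in the matrix. Two things here are assumptions rather than part of the output: the number of paired observations (`N_PAIRS = 198`, which is not printed next to the matrix) and the use of a normal approximation in place of the exact t distribution, so the p-values should only land close to the printed ones.

```python
import math
from statistics import NormalDist

N_PAIRS = 198  # assumption: sample size is not printed with the matrix


def t_from_r(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, given a Pearson r computed from n pairs."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def approx_two_sided_p(t: float) -> float:
    """Two-sided p-value from a normal approximation to the t distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


# Correlations copied from the matrix above.
for label, r in [("origin vs year", 0.132),
                 ("origin vs accel", 0.166),
                 ("accel vs year", 0.206)]:
    t = t_from_r(r, N_PAIRS)
    print(f"{label}: r = {r:+.3f}, t = {t:.2f}, approx p = {approx_two_sided_p(t):.3f}")
```

With a sample this large the normal approximation is usually adequate for a sanity check; an exact p-value would need the t distribution with n - 2 degrees of freedom.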

Regression Analysis: mpg versus year, weight, engine, horse, accel, cylinder, origin

Method
Categorical predictor coding  (1, 0)

Analysis of Variance
Source        DF   Seq SS  Contribution   Adj SS   Adj MS  F-Value  P-Value
Regression     8   9257.5        81.40%  9257.51  1157.19   100.67    0.000
  year         1   3790.2        33.33%  1344.39  1344.39   116.96    0.000
  weight       1   5349.1        47.03%   621.89   621.89    54.10    0.000
  engine       1     18.0         0.16%    40.84    40.84     3.55    0.061
  horse        1      0.3         0.00%     4.25     4.25     0.37    0.544
  accel        1      0.3         0.00%     0.06     0.06     0.01    0.943
  cylinder     1      0.0         0.00%     2.15     2.15     0.19    0.666
  origin       2     99.6         0.88%    99.59    49.80     4.33    0.015
Error        184   2115.1        18.60%  2115.06    11.49
Total        192  11372.6       100.00%
Model Summary
      S    R-sq  R-sq(adj)    PRESS  R-sq(pred)
3.39041  81.40%     80.59%  2353.79      79.30%
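
The Model Summary figures follow from the sums of squares in the Analysis of Variance table. The sketch below recomputes them using the standard textbook definitions; all inputs are copied from the output above, and the function names are only illustrative.

```python
import math

# Values copied from the Analysis of Variance and Model Summary output.
SS_TOTAL, DF_TOTAL = 11372.6, 192
SS_ERROR, DF_ERROR = 2115.06, 184
PRESS = 2353.79


def r_squared(ss_error: float, ss_total: float) -> float:
    """Share of total variation explained by the regression."""
    return 1.0 - ss_error / ss_total


def r_squared_adj(ss_error: float, df_error: int, ss_total: float, df_total: int) -> float:
    """R-squared penalised for the number of fitted terms."""
    return 1.0 - (ss_error / df_error) / (ss_total / df_total)


def r_squared_pred(press: float, ss_total: float) -> float:
    """Predicted R-squared, based on the leave-one-out PRESS statistic."""
    return 1.0 - press / ss_total


s = math.sqrt(SS_ERROR / DF_ERROR)  # residual standard deviation S
print(f"S          = {s:.5f}")
print(f"R-sq       = {100 * r_squared(SS_ERROR, SS_TOTAL):.2f}%")
print(f"R-sq(adj)  = {100 * r_squared_adj(SS_ERROR, DF_ERROR, SS_TOTAL, DF_TOTAL):.2f}%")
print(f"R-sq(pred) = {100 * r_squared_pred(PRESS, SS_TOTAL):.2f}%")
```

The small gap between R-sq and R-sq(pred) suggests the eight-term model is not badly overfitted, even though several of its terms are individually non-significant.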
Coefficients
Term            Coef   SE Coef          95% CI            T-Value  P-Value    VIF
Constant       -1601       153  (    -1902,     -1300)     -10.49    0.000
year          0.8225    0.0761  (   0.6725,    0.9726)      10.81    0.000   1.34
weight     -0.007346  0.000999  (-0.009316, -0.005375)      -7.36    0.000  11.77
engine        0.0211    0.0112  (  -0.0010,    0.0432)       1.88    0.061  21.83
horse        -0.0123    0.0201  (  -0.0520,    0.0275)      -0.61    0.544   9.78
accel         -0.011     0.156  (   -0.318,     0.296)      -0.07    0.943   2.86
cylinder      -0.209     0.484  (   -1.165,     0.746)      -0.43    0.666  11.08
origin
  2            1.745     0.834  (    0.100,     3.389)       2.09    0.038   1.65
  3            2.323     0.826  (    0.693,     3.952)       2.81    0.005   1.74
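
The VIF column measures how strongly each predictor is explained by the other predictors: VIF = 1 / (1 - R²), where R² comes from regressing that predictor on the rest. As a small illustration, the sketch below inverts that relationship for the VIF values printed above; the variable and function names are illustrative.

```python
# VIF values copied from the Coefficients table (continuous predictors only).
VIF = {
    "year": 1.34,
    "weight": 11.77,
    "engine": 21.83,
    "horse": 9.78,
    "accel": 2.86,
    "cylinder": 11.08,
}


def r2_from_vif(vif: float) -> float:
    """R-squared of a predictor regressed on all other predictors."""
    return 1.0 - 1.0 / vif


# List predictors from most to least collinear.
for name, vif in sorted(VIF.items(), key=lambda item: -item[1]):
    print(f"{name:9s} VIF = {vif:5.2f}  implied R-sq = {100 * r2_from_vif(vif):5.1f}%")
```

An implied R² above roughly 90% (weight, engine, cylinder) points to severe multicollinearity, which is consistent with the very high pairwise correlations among these three variables in the correlation matrix.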

Regression Equation

origin
1       mpg = -1601 + 0.8225 year - 0.007346 weight + 0.0211 engine - 0.0123 horse
              - 0.011 accel - 0.209 cylinder

2       mpg = -1600 + 0.8225 year - 0.007346 weight + 0.0211 engine - 0.0123 horse
              - 0.011 accel - 0.209 cylinder

3       mpg = -1599 + 0.8225 year - 0.007346 weight + 0.0211 engine - 0.0123 horse
              - 0.011 accel - 0.209 cylinder
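
The three equations differ only in their constant because origin enters the model through (1, 0) indicator variables. A minimal sketch of that coding follows, assuming level 1 is the reference level (the Coefficients table lists terms only for levels 2 and 3). The function names are hypothetical.

```python
from typing import Tuple

# Coefficients for origin levels 2 and 3, copied from the Coefficients table.
COEF_ORIGIN_2 = 1.745
COEF_ORIGIN_3 = 2.323


def origin_indicators(origin: int) -> Tuple[int, int]:
    """(1, 0) coding with level 1 as the assumed reference level."""
    if origin not in (1, 2, 3):
        raise ValueError(f"unknown origin level: {origin}")
    return int(origin == 2), int(origin == 3)


def origin_shift(origin: int) -> float:
    """Amount added to the constant for a given origin level."""
    d2, d3 = origin_indicators(origin)
    return COEF_ORIGIN_2 * d2 + COEF_ORIGIN_3 * d3


for level in (1, 2, 3):
    print(level, origin_indicators(level), origin_shift(level))
```

The printed constants are rounded to whole numbers, so they differ from one another by approximately, not exactly, these shifts.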

Fits and Diagnostics for Unusual Observations

Obs     mpg     Fit  SE Fit       95% CI         Resid  Std Resid  Del Resid        HI
  8  21.000  29.106   0.653  (27.817, 30.394)   -8.106      -2.44      -2.47  0.037108
 64  47.000  33.047   0.627  (31.810, 34.285)   13.953       4.19       4.39  0.034239
 69  41.000  33.015   0.674  (31.685, 34.345)    7.985       2.40       2.44  0.039533
 72  15.000  21.846   0.820  (20.228, 23.464)   -6.846      -2.08      -2.10  0.058518
 77  18.000  26.797   0.836  (25.147, 28.447)   -8.797      -2.68      -2.72  0.060873
 78  38.000  28.782   0.907  (26.992, 30.571)    9.218       2.82       2.88  0.071581
107  17.000  23.773   0.875  (22.046, 25.500)   -6.773      -2.07      -2.09  0.066672
142  13.000  21.834   0.846  (20.165, 23.503)   -8.834      -2.69      -2.74  0.062245
146  14.000  10.569   1.278  ( 8.047, 13.091)    3.431       1.09       1.09  0.142150
153  43.000  31.990   0.951  (30.115, 33.866)   11.010       3.38       3.48  0.078629

Obs  Cook's D     DFITS
  8      0.03  -0.48487  R
 64      0.07   0.82673  R
 69      0.03   0.49404  R
 72      0.03  -0.52361  R
 77      0.05  -0.69344  R
 78      0.07   0.79888  R
107      0.03  -0.55770  R
142      0.05  -0.70536  R
146      0.02   0.44502     X
153      0.11   1.01775  R

R  Large residual
X  Unusual X
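
The diagnostic columns are linked by standard formulas. The sketch below recomputes the standardized residual, Cook's distance, and DFITS for observation 64 from its residual and leverage (HI). The parameter count of 9 is an assumption, taken as the constant plus the 8 regression degrees of freedom. The deleted residual is used as printed, so the DFITS value is only accurate to about three decimals.

```python
import math

S = 3.39041    # residual standard deviation from the Model Summary
N_PARAMS = 9   # assumption: constant + 8 regression degrees of freedom


def standardized_residual(resid: float, leverage: float, s: float = S) -> float:
    """Residual divided by its estimated standard deviation."""
    return resid / (s * math.sqrt(1.0 - leverage))


def cooks_distance(std_resid: float, leverage: float, n_params: int = N_PARAMS) -> float:
    """Influence of one observation on the full set of fitted values."""
    return (std_resid ** 2 / n_params) * leverage / (1.0 - leverage)


def dfits(deleted_resid: float, leverage: float) -> float:
    """Scaled change in an observation's own fit when it is deleted."""
    return deleted_resid * math.sqrt(leverage / (1.0 - leverage))


# Observation 64: Resid, HI and Del Resid copied from the table above.
resid_64, hi_64, del_resid_64 = 13.953, 0.034239, 4.39

std_64 = standardized_residual(resid_64, hi_64)
print(f"Std Resid = {std_64:.2f}")
print(f"Cook's D  = {cooks_distance(std_64, hi_64):.2f}")
print(f"DFITS     = {dfits(del_resid_64, hi_64):.3f}")
```

Observation 64 has the largest standardized residual but low leverage, so its Cook's distance stays small. Observation 146 is the opposite case: high leverage (flag X) with an ordinary residual.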

Residual Plots for mpg
[Four-panel figure: Normal Probability Plot (Percent vs. Standardized Residual); Versus Fits (Standardized Residual vs. Fitted Value); Histogram (Frequency vs. Standardized Residual); Versus Order (Standardized Residual vs. Observation Order)]

Probability Plot of SRES1
Normal
[Figure: normal probability plot, Percent vs. SRES1]
Mean 0.001807   StDev 1.005   N 193   AD 0.655   P-Value 0.086

Transformations

Regression Analysis: ln(mpg) versus 1/year, 1/weight, 1/engine, 1/horse, 1/accel,

Residual Plots for ln(mpg)
[Four-panel figure: Normal Probability Plot (Percent vs. Standardized Residual); Versus Fits (Standardized Residual vs. Fitted Value); Histogram (Frequency vs. Standardized Residual); Versus Order (Standardized Residual vs. Observation Order)]

Probability Plot of SRES2
Normal
[Figure: normal probability plot, Percent vs. SRES2]
Mean -0.001696   StDev 1.008   N 193   AD 0.720   P-Value 0.059

Model selection
Best Subsets Regression: ln(mpg) versus 1/year, 1/weight, ...
Response is ln(mpg)
193 cases used, 5 cases contain missing values

Vars  R-Sq  R-Sq(adj)  PRESS  R-Sq(pred)  Mallows Cp        S
   1  71.4       71.3    6.3        70.8       245.3  0.18022
   1  62.7       62.5    8.3        61.6       377.6  0.20585
   2  86.8       86.7    3.0        86.4        13.4  0.12274
   2  78.6       78.3    4.9        77.6       138.5  0.15643
   3  87.1       86.8    3.0        86.4        11.6  0.12190
   3  87.0       86.8    3.0        86.3        12.7  0.12223
   4  87.3       87.0    2.9        86.5        10.1  0.12111
   4  87.2       87.0    3.0        86.3        10.9  0.12136
   5  87.6       87.3    2.9        86.6         7.5  0.11999
   5  87.4       87.1    3.0        86.4         9.7  0.12068
   6  87.8       87.4    2.9        86.5         7.0  0.11951

Candidate predictors (marked X in the output where included in a subset): 1/year, 1/weight, 1/engine, 1/horse, 1/accel, 1/cylinder.
The row-by-row alignment of the X marks is not recoverable here; the Vars = 6 subset necessarily contains all six predictors.
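
The Mallows Cp column can be reproduced from the S column alone. The sketch below does this under the usual definition Cp = SSE_p / MSE_full - (n - 2p), where p counts the constant. Two assumptions: the six-predictor row is treated as the full model, and n = 193 is the number of cases used. The function name is hypothetical.

```python
N_CASES = 193      # "193 cases used" in the best subsets output
S_FULL = 0.11951   # S of the six-predictor subset, assumed to be the full model


def mallows_cp(s_subset: float, n_predictors: int,
               n: int = N_CASES, s_full: float = S_FULL) -> float:
    """Mallows Cp for a subset, recovered from its S value."""
    p = n_predictors + 1                    # parameters, including the constant
    sse_subset = s_subset ** 2 * (n - p)    # S^2 = SSE / (n - p)
    mse_full = s_full ** 2
    return sse_subset / mse_full - (n - 2 * p)


# (number of predictors, S) pairs copied from the best subsets output.
for k, s in [(1, 0.18022), (2, 0.12274), (5, 0.11999), (6, 0.11951)]:
    print(f"Vars = {k}: Cp = {mallows_cp(s, k):.1f}")
```

A subset with little bias has Cp close to its own parameter count. The best two-predictor subset has Cp of about 13.4 against p = 3, so Cp alone would favour a larger model; the choice of a two-predictor model rests on the near-identical R-Sq(pred) values and on parsimony.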

Regression Analysis: ln(mpg) versus 1/year, 1/weight, 1/horse, 1/accel, origin

Analysis of Variance
Source        DF   Seq SS  Contribution   Adj SS   Adj MS  F-Value  P-Value
Regression     6  18.9555        87.37%  18.9555  3.15924   214.54    0.000
  1/year       1   7.1753        33.07%   3.1037  3.10370   210.77    0.000
  1/weight     1  11.6567        53.73%   1.2005  1.20054    81.53    0.000
  1/horse      1   0.0539         0.25%   0.1000  0.10004     6.79    0.010
  1/accel      1   0.0509         0.23%   0.0533  0.05332     3.62    0.059
  origin       2   0.0187         0.09%   0.0187  0.00934     0.63    0.532
Error        186   2.7390        12.63%   2.7390  0.01473
Total        192  21.6944       100.00%

Regression Analysis: ln(mpg) versus 1/year, 1/weight, 1/horse

Analysis of Variance
Source          DF   Seq SS  Contribution   Adj SS   Adj MS  F-Value  P-Value
Regression       3  18.8859        87.05%  18.8859  6.29529   423.63    0.000
  1/year         1   7.1753        33.07%   3.1085  3.10855   209.19    0.000
  1/weight       1  11.6567        53.73%   2.4076  2.40764   162.02    0.000
  1/horse        1   0.0539         0.25%   0.0539  0.05385     3.62    0.058
Error          189   2.8086        12.95%   2.8086  0.01486
  Lack-of-Fit  188   2.7993        12.90%   2.7993  0.01489     1.60    0.570
  Pure Error     1   0.0093         0.04%   0.0093  0.00933
Total          192  21.6944       100.00%

THE CHOSEN MODEL


Regression Analysis: ln(mpg) versus 1/year, 1/weight
Analysis of Variance
Source          DF   Seq SS  Contribution   Adj SS   Adj MS  F-Value  P-Value
Regression       2  19.4261        87.08%  19.4261   9.7130   657.16    0.000
  1/year         1   7.3431        32.92%   3.4940   3.4940   236.39    0.000
  1/weight       1  12.0830        54.16%  12.0830  12.0830   817.51    0.000
Error          195   2.8822        12.92%   2.8822   0.0148
  Lack-of-Fit  192   2.8659        12.85%   2.8659   0.0149     2.75    0.220
  Pure Error     3   0.0163         0.07%   0.0163   0.0054
Total          197  22.3083       100.00%
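
The lack-of-fit test splits the error sum of squares into a pure-error part (from replicated predictor settings) and a lack-of-fit part, then compares their mean squares. The sketch below recomputes the F ratio from the sums of squares and degrees of freedom printed above. The p-value is not recomputed, since that would need the F distribution; the function name is illustrative.

```python
def lack_of_fit_f(ss_lof: float, df_lof: int, ss_pe: float, df_pe: int) -> float:
    """F ratio for the lack-of-fit test: MS(lack of fit) / MS(pure error)."""
    return (ss_lof / df_lof) / (ss_pe / df_pe)


# Values copied from the Analysis of Variance table of the chosen model.
SS_LOF, DF_LOF = 2.8659, 192
SS_PE, DF_PE = 0.0163, 3
SS_ERROR = 2.8822

# The two components should add up to the error sum of squares.
print(f"SS check: {SS_LOF + SS_PE:.4f} vs {SS_ERROR:.4f}")
print(f"F = {lack_of_fit_f(SS_LOF, DF_LOF, SS_PE, DF_PE):.2f}")
```

With only 3 pure-error degrees of freedom the test has little power, so the non-significant result (P = 0.220) is weak evidence of adequacy and is best read alongside the residual plots.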
Model Summary
       S    R-sq  R-sq(adj)    PRESS  R-sq(pred)
0.121574  87.08%     86.95%  2.96980      86.69%

Coefficients
Term          Coef  SE Coef         95% CI         T-Value  P-Value   VIF
Constant     74.78     4.73  (  65.46,   84.11)      15.81    0.000
1/year     -145173     9442  (-163795, -126551)     -15.38    0.000  1.05
1/weight    2567.0     89.8  ( 2389.9,  2744.0)      28.59    0.000  1.05

Regression Equation
ln(mpg) = 74.78 - 145173 1/year + 2567.0 1/weight

Fits and Diagnostics for Unusual Observations
Obs  Cook's D      DFITS
 34      0.03   0.305638  R
 62      0.01   0.189486  R
 64      0.04   0.335230  R
 77      0.05  -0.413827  R
 78      0.08   0.495817  R
 82      0.02  -0.230354  R
111      0.04  -0.327245  R
142      0.02  -0.252045  R
158      0.03   0.279476  R
180      0.02  -0.238590  R
187      0.03   0.278628  R

R  Large residual
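
To show how the chosen model is used, the sketch below evaluates the regression equation and back-transforms the result to the mpg scale. The input car (year 1999, weight 3000) is hypothetical, the function names are illustrative, and the coefficients are the rounded values printed above, so results are approximate. Exponentiating the fitted ln(mpg) is a simple back-transform that, under normally distributed errors, estimates the median rather than the mean mpg.

```python
import math

# Rounded coefficients copied from the Regression Equation.
B_CONST = 74.78
B_INV_YEAR = -145173.0
B_INV_WEIGHT = 2567.0


def predict_ln_mpg(year: float, weight: float) -> float:
    """Fitted ln(mpg) from the reciprocal-predictor model."""
    return B_CONST + B_INV_YEAR / year + B_INV_WEIGHT / weight


def predict_mpg(year: float, weight: float) -> float:
    """Back-transform the fitted value to the mpg scale."""
    return math.exp(predict_ln_mpg(year, weight))


# Hypothetical car, used purely for illustration.
print(f"ln(mpg) = {predict_ln_mpg(1999, 3000):.3f}")
print(f"mpg     = {predict_mpg(1999, 3000):.1f}")
```

Because the predictors enter as reciprocals, a negative coefficient on 1/year means fitted mpg rises with model year, and a positive coefficient on 1/weight means fitted mpg falls as weight increases.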

Residual Plots for ln(mpg)
[Four-panel figure: Normal Probability Plot (Percent vs. Standardized Residual); Versus Fits (Standardized Residual vs. Fitted Value); Histogram (Frequency vs. Standardized Residual); Versus Order (Standardized Residual vs. Observation Order)]

Probability Plot of SRES5
Normal
[Figure: normal probability plot, Percent vs. SRES5]
Mean -0.0001372   StDev 1.002   N 198   AD 0.637   P-Value 0.095

Descriptive Statistics for the validation set: mpg, engine, horse, weight, accel, year, origin, and cylinder

Variable  Total Count    Mean  SE Mean  Minimum  Maximum
mpg               199  23.739    0.556   10.000   45.000
engine            199  195.59     7.58    71.00   455.00
horse             199  104.24     2.75    46.00   225.00
weight            199  2966.6     60.3   1613.0   5140.0
accel             199  15.643    0.202    8.000   25.000
year              199  1999.0    0.257   1993.0   2005.0
origin            199  1.6080   0.0582   1.0000   3.0000
cylinder          199   5.518    0.122    3.000    8.000

Regression Analysis: lnmpg versus year_1, weight_1

Analysis of Variance
Source          DF   Seq SS  Contribution   Adj SS   Adj MS  F-Value  P-Value
Regression       2  19.6419        87.25%  19.6419   9.8209   670.91    0.000
  year_1         1   7.5263        33.43%   2.7057   2.7057   184.84    0.000
  weight_1       1  12.1156        53.82%  12.1156  12.1156   827.67    0.000
Error          196   2.8691        12.75%   2.8691   0.0146
  Lack-of-Fit  195   2.8543        12.68%   2.8543   0.0146     0.99    0.684
  Pure Error     1   0.0148         0.07%   0.0148   0.0148
Total          198  22.5110       100.00%

Model Summary
       S    R-sq  R-sq(adj)    PRESS  R-sq(pred)
0.120988  87.25%     87.12%  2.96530      86.83%
