The following estimated equations concern major league baseball salaries. The dependent
variable is lsalary, the log of salary. The two explanatory variables in the major league
(0.098) + (0.0132)
b. Why is the SSE smaller in the second regression than the first?
c. The sample correlation coefficient between years and rbisyr is about 0.487. Does this
make sense?
d. What is the variance inflation factor for the slope coefficient in the multiple regression?
Would you say there is little, moderate or strong collinearity between years and rbisyr?
e. How come the standard error for the coefficient of years in the multiple regression is
𝐷𝑓 = 𝑛 − 𝑘 − 1
𝑛 = 353 , 𝑘=1
= 353 − 1 − 1 = 351
𝐷𝑓 = 𝑛 − 𝑘 − 1
𝑛 = 353 , 𝑘=2
= 353 − 2 − 1 = 350
b. The first regression is a simple regression, whereas the second is a multiple
regression with two regressors. For any such pair of nested models, if the estimated
coefficient on the added regressor is exactly zero, the SSE is identical in the simple
and the multiple regression model. Otherwise, adding an explanatory variable never
increases the SSE and in practice lowers it. Here one more explanatory variable is
included in the second regression, so its SSE is smaller. The sum of squared residuals
falls from 326.196 to 198.475 when the second explanatory variable is
added, and the degrees of freedom also falls by one, which also affects the standard error
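
The effect on the standard error can be illustrated with the figures quoted in this answer. The sketch below is a minimal check, assuming n = 353 and the usual estimator of the residual variance, SSE / (n − k − 1); the function and variable names are illustrative and not part of the original solution.

```python
import math

def residual_variance(sse, n, k):
    """Estimated error variance: SSE divided by the degrees of freedom n - k - 1."""
    return sse / (n - k - 1)

n = 353  # sample size stated in the degrees-of-freedom calculation

# Simple regression: one regressor (years)
sigma2_simple = residual_variance(326.196, n, k=1)

# Multiple regression: two regressors (years, rbisyr)
sigma2_multiple = residual_variance(198.475, n, k=2)

# Residual standard errors are the square roots of the variances
se_simple = math.sqrt(sigma2_simple)
se_multiple = math.sqrt(sigma2_multiple)

print(f"simple:   variance = {sigma2_simple:.4f}, std. error = {se_simple:.4f}")
print(f"multiple: variance = {sigma2_multiple:.4f}, std. error = {se_multiple:.4f}")
```

Losing one degree of freedom pushes the residual variance up slightly, but the large fall in the sum of squared residuals dominates, so the residual standard error is smaller in the multiple regression.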
c. The correlation coefficient ranges from -1.0 to 1.0. A correlation
coefficient of 0.487 implies a positive correlation between years in the major league (years) and
runs batted in per year (rbisyr). This means that the more years spent in the major league,
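d. With only two regressors, the R-squared from regressing years on rbisyr equals the
squared sample correlation between them, 0.487² ≈ 0.237:
\[
VIF_{years} = \frac{1}{1 - R^2_{years}} = \frac{1}{1 - 0.487^2} = \frac{1}{1 - 0.237} \approx 1.311
\]
The variance of the slope coefficient on years is inflated by a factor of about
1.311. By the rule of thumb, a variance inflation factor between 1 and 5 indicates
moderate correlation. From this value, we can say there is moderate collinearity, toward
the low end of that range, between the number of years in the major league and the runs
batted in per year.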
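
The variance inflation factor can be checked numerically from the sample correlation of about 0.487 given in part c. This is a minimal sketch, assuming the model has exactly two regressors, in which case the auxiliary R-squared equals the squared correlation between them; the function name is hypothetical.

```python
def vif_two_regressors(r):
    """VIF for either regressor in a model with exactly two regressors.

    r is the sample correlation between the two regressors. With two
    regressors, the auxiliary R-squared is simply r squared.
    """
    aux_r2 = r * r
    return 1.0 / (1.0 - aux_r2)

# Sample correlation between years and rbisyr, as quoted in part c
vif_years = vif_two_regressors(0.487)
print(f"VIF for years: {vif_years:.3f}")
```

With more than two regressors the shortcut no longer applies, and the auxiliary R-squared has to come from regressing the variable of interest on all the other regressors.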
e. In the multiple regression, the coefficient of years is lower because one more explanatory