Correlation Workshop
Code: 2122274
1. See Table E11-1 for data on the ratings of quarterbacks for the 2008 National Football League season (The
Sports Network). It is suspected that the rating (y) is related to the average number of yards gained per pass
attempt (x).
format long
x=[8.39,7.67,7.66,7.98,7.21,7.53,8.01,7.66,7.21,7.16,7.93,7.10,6.33,6.76,6.86,7.35,7.22,7.94,6.
y=[105.5,97.4,96.9,96.2,95,93.8,92.7,91.4,90.2,89.4,87.7,87.5,87,86.4,86.4,86,85.4,84.7,84.3,81
[b1,b0,s]=regresion_lineal(x,y,1);
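The helper regresion_lineal is not listed in this workshop. The block below is a minimal sketch of a least-squares line fit, assuming the first two outputs of the helper are the slope and the intercept. The data vectors xs and ys and all variable names are illustrative only; this is not the author's implementation.

```matlab
% Illustrative data (hypothetical, not taken from Table E11-1)
xs = [1 2 3 4 5];
ys = [2.1 3.9 6.2 7.8 10.1];

% Corrected sum of squares of x and corrected cross-product of x and y
Sxx = sum((xs - mean(xs)).^2);
Sxy = sum((xs - mean(xs)).*(ys - mean(ys)));

% Least-squares slope and intercept
b1s = Sxy/Sxx;
b0s = mean(ys) - b1s*mean(xs);

% Cross-check against the built-in first-degree polynomial fit
p = polyfit(xs, ys, 1);   % p(1) is the slope, p(2) is the intercept
```

Agreement between (b1s, b0s) and polyfit is a quick way to check a hand-written fitting routine.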
(a) Calculate R^2 for this model and provide a practical interpretation of this quantity.
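yy=0;    % accumulates Sxy = sum((y-mean(y)).*(x-mean(x)))
yy1=0;   % accumulates Syy = sum((y-mean(y)).^2), the total sum of squares
n=length(y);
for i=1:1:n
yy=((y(i)-mean(y))*(x(i)-mean(x)))+yy;
yy1=(y(i)-mean(y))*(y(i)-mean(y))+yy1;
end
R21=(b1*yy)/yy1   % R^2 = SSR/SST, with SSR = b1*Sxy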
R21 =
0.671801770078746
The model fits reasonably well: the coefficient of determination is about 0.672, so roughly 67% of the variability in
quarterback rating is explained by the linear relationship with yards gained per pass attempt.
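The same quantity can be obtained without an explicit loop. The sketch below uses illustrative data (not from the exercise) and hypothetical variable names. It computes R^2 as 1 - SSE/SST and compares it with the squared sample correlation from corrcoef; for simple linear regression with an intercept the two coincide.

```matlab
% Illustrative data (hypothetical)
xs = [1 2 3 4 5];
ys = [2.1 3.9 6.2 7.8 10.1];

% Fit a straight line and evaluate the fitted values
p = polyfit(xs, ys, 1);
yhat = polyval(p, xs);

% R^2 from the error and total sums of squares
SSE = sum((ys - yhat).^2);
SST = sum((ys - mean(ys)).^2);
R2_a = 1 - SSE/SST;

% R^2 as the squared sample correlation coefficient
C = corrcoef(xs, ys);
R2_b = C(1,2)^2;
```

A value outside the interval [0, 1] from either formula signals a coding error rather than a property of the data.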
(b) Prepare a normal probability plot of the residuals from the least squares model. Does the normality
assumption seem to be satisfied?
e=[];
for i=1:1:n
e(i)=(y(i)-(b0+(b1*x(i)))) ;
end
figure()
normplot(e)
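A normal probability plot pairs the ordered residuals with standard normal quantiles; points close to a straight line support the normality assumption. The sketch below builds those coordinates by hand, assuming the plotting positions (i - 0.5)/n, which is one common convention among several. The residual vector er is hypothetical, and norminv requires the Statistics and Machine Learning Toolbox, as normplot does.

```matlab
% Hypothetical residuals, for illustration only
er = [-1.2 0.4 2.1 -0.3 0.9 -1.9];
ns = length(er);

es = sort(er);               % ordered residuals
pp = ((1:ns) - 0.5)/ns;      % plotting positions (assumed convention)
zq = norminv(pp);            % matching standard normal quantiles

figure()
plot(es, zq, 'o')
xlabel("Ordered residual")
ylabel("Standard normal quantile")
```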
(c) Plot the residuals versus the fitted values and against x. Interpret these graphs (The linear regression model
appears to be appropriate).
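figure()
plot(b0+(b1*x),e,'o')   % residuals on the vertical axis, fitted values on the horizontal axis
title("Residuals vs Fitted Values")
xlabel("Fitted value")
ylabel("e")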
figure()
plot(x,e,'o')   % residuals on the vertical axis, regressor on the horizontal axis
title("Residuals vs X")
xlabel("X")
ylabel("e")
2. An article in Technometrics by S. C. Narula and J. F. Wellington [“Prediction, Linear Regression, and a
Minimum Sum of Relative Errors” (1977, Vol. 19)] presents data on the selling price and annual taxes for 24
houses. The data are in Table E11-2. Refer to the data in the table on house selling price y and taxes paid x.
x2=[5.0500,8.2464,6.6969,7.7841,9.0384,5.9894,7.5422,8.7951,6.0831,8.3607,8.1400,9.1416];
y2=[30.0,36.9,41.9,40.5,43.9,37.5,37.9,44.5,37.9,38.9,36.9,45.8]
y2 = 1×12
30.000000000000000 36.899999999999999 41.899999999999999 40.500000000000000
[b12,b02,s2]=regresion_lineal(x2,y2,1);
e2=[];
n2=length(y2);
for j=1:1:n2
e2(j)=(y2(j)-(b02+(b12*x2(j)))) ;   % residual uses y2, the house-price data
end
disp("Residuals")
Residuals
disp(e2)
Columns 1 through 8
Columns 9 through 12
(b) Prepare a normal probability plot of the residuals and interpret this display.
figure()
normplot(e2)
(c) Plot the residuals versus ŷ_i and versus x_i. Does the normality assumption seem to be satisfied?
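figure()
plot(b02+(b12*x2),e2,'o')   % residuals on the vertical axis, fitted values on the horizontal axis
title("Residuals vs Fitted Values")
xlabel("Fitted value")
ylabel("e")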
figure()
plot(x2,e2,'o')   % residuals on the vertical axis, regressor on the horizontal axis
title("Residuals vs X")
xlabel("X")
ylabel("e")
(d) What proportion of total variability is explained by the regression model?
yy2=0;
yy22=0;
n=length(y2);
for i=1:1:n
yy2=((y2(i)-mean(y2))*(x2(i)-mean(x2)))+yy2;
yy22=(y2(i)-mean(y2))*(y2(i)-mean(y2))+yy22;
end
R22=(b12*yy2)/yy22
R22 =
0.544645445033293
3. The number of pounds of steam used per month by a chemical plant is thought to be related to the average
ambient temperature (in °F) for that month. The past year’s usage and temperatures are in the following table:
x3=[21,24,32,47,50,59,68,74,62,50,41,30];
y3=[185.79,214.47,288.03,424.84,454.58,539.03,621.55,675.06,562.03,452.93,369.95,273.98];
[b13,b03,s3]=regresion_lineal(x3,y3,1);
(a) What proportion of total variability is accounted for by the simple linear regression model?
yy3=0;
yy32=0;
n=length(y3);
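for i3=1:1:n
yy3=((y3(i3)-mean(y3))*(x3(i3)-mean(x3)))+yy3;
yy32=(y3(i3)-mean(y3))*(y3(i3)-mean(y3))+yy32;   % both factors indexed by i3
end
R23=(b13*yy3)/yy32
R23 is approximately 0.9999, so the model accounts for about 99.99% of the variability in steam usage.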
(b) Prepare a normal probability plot of the residuals and interpret this graph.
e3=[];
n3=length(y3);
for j3=1:1:n3
e3(j3)=(y3(j3)-(b03+(b13*x3(j3)))) ;   % residual uses y3, the steam-usage data
end
figure()
normplot(e3)
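figure()
plot(b03+(b13*x3),e3,'o')   % residuals on the vertical axis, fitted values on the horizontal axis
title("Residuals vs Fitted Values")
xlabel("Fitted value")
ylabel("e")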
figure()
plot(x3,e3,'o')   % residuals on the vertical axis, regressor on the horizontal axis
title("Residuals vs X")
xlabel("X")
ylabel("e")
4. Suppose that data are obtained from 20 pairs of (x, y) and the sample correlation coefficient is 0.8.
(a) Test the hypothesis that H0 : ρ = 0 against H1 : ρ ≠ 0 with α = 0.05. Calculate the P-value.
T0=(0.8*sqrt(18))/(sqrt(1-(0.8*0.8)))
T0 =
5.656854249492381
Pvalue=2*(1-tcdf(abs(T0),18))
Pvalue =
2.292887199439875e-05
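The decision can also be reached by comparing the statistic with the critical value of the t distribution with n - 2 degrees of freedom. A brief sketch, using n = 20 and r = 0.8 as stated in the exercise; the names r, npairs, alpha, t0, tcrit and reject are introduced here for illustration only.

```matlab
r = 0.8;        % sample correlation coefficient
npairs = 20;    % number of (x, y) pairs
alpha = 0.05;   % significance level

% Test statistic for H0: rho = 0
t0 = r*sqrt(npairs - 2)/sqrt(1 - r^2);

% Two-sided critical value (Statistics and Machine Learning Toolbox)
tcrit = tinv(1 - alpha/2, npairs - 2);

% true means H0: rho = 0 is rejected at level alpha
reject = abs(t0) > tcrit;
```

Since the statistic far exceeds the critical value, this agrees with the very small P-value obtained above.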
(b) Test the hypothesis that H0 : ρ = 0.5 against H1 : ρ ≠ 0.5 with α = 0.05. Calculate the P-value.
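% For a nonzero null value the t statistic does not apply; use Fisher's z transformation (n-3 = 17)
Z02=(atanh(0.8)-atanh(0.5))*sqrt(17)
Pvalue2=2*(1-normcdf(abs(Z02)))
Z02 is approximately 2.265 and Pvalue2 is approximately 0.0235, so H0 is rejected at α = 0.05.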
(c) Construct a 95% two-sided confidence interval for the correlation coefficient.
z=norminv(0.975)
z =
1.959963984540054
ci_lower=tanh(atanh(0.8)-(z/sqrt(17)))
ci_lower =
0.553387644453858
ci_upper=tanh(atanh(0.8)+(z/sqrt(17)))
ci_upper =
0.917655484096945
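A confidence interval built from Fisher's z transformation can double as a two-sided test: a null value ρ0 is rejected at level α exactly when it lies outside the 100(1 - α)% interval based on the same transformation. A minimal sketch for ρ0 = 0.5, with names (r, npairs, rho0, zc, lo_lim, hi_lim, inside) chosen here for illustration only:

```matlab
r = 0.8;        % sample correlation coefficient
npairs = 20;    % number of (x, y) pairs
rho0 = 0.5;     % null value to check

% Upper 2.5% point of the standard normal distribution
zc = norminv(0.975);

% Interval limits on the correlation scale
lo_lim = tanh(atanh(r) - zc/sqrt(npairs - 3));
hi_lim = tanh(atanh(r) + zc/sqrt(npairs - 3));

% false means rho0 is outside the interval, so H0: rho = rho0 is rejected
inside = (rho0 >= lo_lim) && (rho0 <= hi_lim);
```

Here the lower limit is above 0.5, so the interval excludes ρ0 = 0.5.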