Confidence intervals for regression parameters

A statistic calculated from a sample provides a point estimate of the unknown parameter. A point estimate can be thought of as the single best guess for the population value. While the estimated value from the sample is typically different from the value of the unknown population parameter, the hope is that it is not too far away. Based on the sample estimates, it is possible to calculate a range of values that, with a designated likelihood, includes the population value. Such a range is called a confidence interval.

NOTE: A 90% C.I. can be interpreted as follows: if we take 100 samples of the same size under the same conditions and compute a C.I. for the parameter from each sample, then about 90 of those C.I.s will contain the parameter (i.e., not all of the constructed C.I.s).
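To make this coverage interpretation concrete, the short Python sketch below (a minimal illustration, not part of the original example; the population mean, spread, and sample size are arbitrary choices) simulates repeated sampling and counts how many 90% confidence intervals for the mean actually contain it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n, reps, conf = 50.0, 10.0, 25, 1000, 0.90
t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)  # two-sided critical value

covered = 0
for _ in range(reps):
    sample = rng.normal(mu, sigma, n)
    half_width = t_crit * sample.std(ddof=1) / np.sqrt(n)  # t * SE of the mean
    if sample.mean() - half_width <= mu <= sample.mean() + half_width:
        covered += 1

print(covered, "of", reps, "intervals contain mu (expected ~90%)")
```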
A confidence interval estimate of a parameter is more informative than a point estimate because it reflects the precision of the estimate. The width of the C.I. (i.e., U.L. − L.L.) measures the precision of the estimate: the narrower the interval, the more precise the estimate. The precision can be increased either by decreasing the confidence level or by increasing the sample size.
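Both effects can be checked numerically; the sketch below is an illustrative aside (the per-observation spread is an assumed value) showing that the t critical value shrinks as the confidence level drops, and the standard error shrinks as n grows.

```python
import math
from scipy import stats

# The critical value (hence the interval width) shrinks as the confidence level drops...
for conf in (0.99, 0.95, 0.90):
    print(conf, round(stats.t.ppf(1 - (1 - conf) / 2, df=6), 3))

# ...and the standard error shrinks as the sample size grows (SE ~ 1/sqrt(n)).
sigma = 1.0  # assumed per-observation spread, for illustration only
for n in (10, 40, 160):
    print(n, round(sigma / math.sqrt(n), 3))
```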

(iv):- Test the hypothesis that β0 = 3.5


Test of hypothesis for β0
1) Construction of hypotheses
H0: β0 = 3.5
H1: β0 ≠ 3.5
2) Level of significance
α = 5%
3) Test statistic
$t = \frac{b_0 - \beta_0}{SE(b_0)} = \frac{3.47 - 3.5}{0.06} = -0.5$
4) Decision Rule:- Reject H0 if $|t_{cal}| \ge t_{\alpha/2}(n-2) = t_{0.025}(6) = 2.447$
5) Result:- Since $|-0.5| = 0.5 < 2.447$, do not reject H0.
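Steps 3–5 can be reproduced in a few lines of Python. This is only a sketch built around the numbers quoted above (b0 = 3.47, SE = 0.06, n = 8); the same helper applies unchanged to the β1 test further below.

```python
from scipy import stats

def t_test_coef(estimate, hypothesized, se, n, alpha=0.05):
    """Two-sided t-test for a regression coefficient with n - 2 d.f."""
    t_cal = (estimate - hypothesized) / se
    t_tab = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return t_cal, t_tab, abs(t_cal) >= t_tab  # True means reject H0

# Test H0: beta0 = 3.5 using the values from the text.
t_cal, t_tab, reject = t_test_coef(3.47, 3.5, 0.06, n=8)
print(t_cal, t_tab, reject)  # -0.5, 2.447, False -> do not reject H0
```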


(vi):- Construct 95% C.I. for regression parameters.
95% C.I. for β0:

$b_0 \pm t_{\alpha/2}(n-2)\, SE(b_0)$

$3.47 \pm t_{0.025}(6)(0.06)$

$3.47 \pm (2.447)(0.06)$

$(3.32,\ 3.62)$
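The same recipe in code, again using only the quoted values (b0 = 3.47, SE = 0.06, n = 8); swapping in b1 and SE(b1) gives the β1 interval constructed later.

```python
from scipy import stats

b0, se_b0, n = 3.47, 0.06, 8
t_tab = stats.t.ppf(0.975, df=n - 2)     # t_{0.025}(6) = 2.447
lower, upper = b0 - t_tab * se_b0, b0 + t_tab * se_b0
print(round(lower, 2), round(upper, 2))  # 3.32, 3.62
```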

(v):- Test the hypothesis that there is no linear relation between Y and X, i.e., β1 = 0

Test of hypothesis for β1
1) Construction of hypotheses
H0: β1 = 0
H1: β1 ≠ 0
2) Level of significance
α = 5%
3) Test statistic
$t = \frac{b_1 - \beta_1}{SE(b_1)} = \frac{-0.0878 - 0}{0.005} = -17.56$
4) Decision Rule:- Reject H0 if $|t_{cal}| \ge t_{\alpha/2}(n-2) = t_{0.025}(6) = 2.447$
5) Result:- Since $|-17.56| = 17.56 > 2.447$, reject H0 and conclude that there is a significant relationship between temperature and oxygen consumption.
(vi):- Construct 95% C.I. for regression parameters.
95% C.I. for β1:

$b_1 \pm t_{\alpha/2}(n-2)\, SE(b_1)$

$-0.0878 \pm t_{0.025}(6)(0.005)$

$-0.0878 \pm (2.447)(0.005)$

$(-0.1,\ -0.076)$
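In practice, b1 and SE(b1) come straight from the fitted model. The sketch below runs scipy.stats.linregress on a small hypothetical temperature/oxygen-consumption data set (invented for illustration; it is not the data behind the numbers above) and computes the slope test and interval the same way.

```python
import numpy as np
from scipy import stats

# Hypothetical data, invented for illustration only.
temp = np.array([10, 15, 20, 25, 30, 35, 40, 45], dtype=float)
oxygen = np.array([3.0, 2.6, 2.1, 1.7, 1.2, 0.8, 0.4, 0.1])

fit = stats.linregress(temp, oxygen)
t_tab = stats.t.ppf(0.975, df=len(temp) - 2)

print("slope b1 =", fit.slope, "SE(b1) =", fit.stderr)
print("t for H0: beta1 = 0:", fit.slope / fit.stderr)
print("95% C.I.:", (fit.slope - t_tab * fit.stderr,
                    fit.slope + t_tab * fit.stderr))
```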

(vii):- Perform Analysis of Variance. Calculate and interpret the coefficient of determination.
ANALYSIS OF VARIANCE IN SIMPLE LINEAR REGRESSION

The Analysis of Variance table is also known as the ANOVA table (for ANalysis Of VAriance). It tells the story of how the regression equation accounts for variability in the response variable.

The column labeled Source has three rows: Regression, Residual, and Total. The column labeled Sum of Squares describes the variability in the response variable, Y. The variation in the dependent variable is partitioned into explained and unexplained variation:

Total variation = Explained variation (variation due to X, also called variation due to regression) + Unexplained variation (variation due to unknown factors)

Total variation:- First, the overall variability of the dependent variable is calculated by computing the sum of squares of deviations of the Y-values from $\bar{Y}$, a quantity termed the total sum of squares:

$S(YY) = \sum (Y - \bar{Y})^2 = 8.915$
Explained variation (variation in Y due to X, also called variation due to regression):

$b_1 S(XY) = -0.0878(-99.65) = 8.7452$ (using the unrounded slope)

Unexplained variation: Total variation − explained variation = 8.915 − 8.7452 = 0.1698

Associated with any sum of squares is its degrees of freedom (the number of independent observations). The TSS has n − 1 d.f. because it loses 1 d.f. in computing the sample mean $\bar{Y}$; the regression SS has k − 1 d.f. because there is only one independent variable; and the residual SS has n − k d.f., where k is the number of parameters in the model.

The hypothesis β1 = 0 may be tested by the analysis of variance procedure.

ANOVA TABLE

Source of Variation   Degrees of       Sum of     Mean Sum of      Fcal      Ftab
(S.O.V)               Freedom (DF)     Squares    Squares
                                       (SS)       (MSS = SS/df)
Regression            k-1 = 1          8.7452     8.7452           308.93*   F.05(1,6) = 5.99
Error                 n-k = 8-2 = 6    0.1698     0.0283
TOTAL                 n-1 = 8-1 = 7    8.915

S = 0.1682    R-Sq = 98.1%    R-Sq(adj) = 97.8%

Relation between F and t for testing β1 = 0:

$F = t^2$: $308.93 = (-17.56)^2$ (up to rounding)
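The whole ANOVA table, the F = t² check, and the R² value discussed next can all be rebuilt from the two quantities the text starts from, S(YY) = 8.915 and the explained SS 8.7452; a minimal sketch:

```python
from scipy import stats

n, k = 8, 2              # observations, parameters in the model
tss = 8.915              # total sum of squares, S(YY)
reg_ss = 8.7452          # explained SS, b1 * S(XY)
res_ss = tss - reg_ss    # unexplained SS = 0.1698

ms_reg = reg_ss / (k - 1)
mse = res_ss / (n - k)                    # 0.0283; S = sqrt(MSE) = 0.1682
f_cal = ms_reg / mse                      # ~308.9, matches t^2 = (-17.56)^2
f_tab = stats.f.ppf(0.95, k - 1, n - k)   # F.05(1,6) = 5.99

r_squared = reg_ss / tss                  # 0.981 -> 98.1%
print(f_cal, f_tab, r_squared)
```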
Goodness of Fit:
An important part of any statistical procedure that builds models from data is establishing how well the model actually fits. This topic encompasses detecting possible violations of the required assumptions in the data being analyzed and checking how close the observed data points lie to the fitted line.

A commonly used measure of the goodness of fit of a linear model is R², called the coefficient of determination. R² is the squared multiple correlation coefficient. It is the Regression sum of squares divided by the Total sum of squares, RegSS/TotSS, and is the fraction of the variability in the response that is accounted for by the model. Some call R² the proportion of the variance explained by the model. If a model has perfect predictability, the Residual Sum of Squares will be 0 and R² = 1. If a model has no predictive capability, R² = 0.

$R^2 = \frac{\text{Reg.SS}}{\text{Total SS}} \times 100 = \frac{8.7452}{8.915} \times 100 = 98.1\%$

The value of R² indicates that about 98% of the variation in the dependent variable has been explained by the linear relationship with X; the remaining variation is due to other unknown factors.

(x): Test the goodness of fit of the regression model by residual plot

Residual Plot:- The estimated residuals $e_i$ are defined as the differences between the observed and fitted values of the $y_i$, i.e., $e_i = Y_i - \hat{Y}_i$.

The plot of the $e_i$ against the corresponding fitted values $\hat{Y}_i$ provides useful information about the appropriateness of the model. If the plot of residuals against the $\hat{Y}_i$ is a random scatter and does not show any systematic pattern, then we conclude that the model is appropriate.

NOTE: If there are some residuals with very large values, it may be an indication of the presence of outliers (values that are not consistent with the rest of the data).
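As a sketch of how such a plot is produced (with hypothetical fitted values and residuals, since the original data are not reproduced here):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
fitted = np.linspace(0.5, 3.0, 8)    # hypothetical Y-hat values
residuals = rng.normal(0, 0.17, 8)   # hypothetical e_i scattered around 0

plt.scatter(fitted, residuals)
plt.axhline(0, linestyle="--")       # reference line at e = 0
plt.xlabel("Fitted values (Y-hat)")
plt.ylabel("Residuals (e)")
plt.title("Residuals vs fitted: look for a patternless scatter")
plt.show()
```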
