[Figures: scatter plots of No. of cars (vertical axis) against No. of household members (horizontal axis), with fitted Excel trendlines]
Linear trendline: f(x) = 0.4x + 0.8, R² = 0.8
Exponential trendline: f(x) = 0.9767186839 exp( 0.2197224577 x ), R² = 0.7683868434
Polynomial (order 2) trendline: f(x) = - 8.90158753083262E-17x^2 + 0.4x + 0.8, R² = 0.8
Logarithmic: y = c + b*ln(x) + u
This is a linear-log relationship.
Exponential: y = c*exp(bx)*u
This is a log-linear relationship since taking logs gives ln(y) = ln(c) + b*x + ln(u)
Power: y = a*(x^b)*u
This is a log-log relationship since taking logs gives ln(y) = ln(a) + b*ln(x) + ln(u)
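The three transformed models can each be fitted with ordinary least squares once the variables are logged as described. The following is a minimal Python sketch, not part of the original workbook; the helper name `ols` and the variable names are illustrative assumptions, and the data are the CARS and HH SIZE columns used throughout.

```python
import math

cars = [1, 2, 2, 2, 3]      # y, from the CARS column
hh_size = [1, 2, 3, 4, 5]   # x, from the HH SIZE column


def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


ln_x = [math.log(v) for v in hh_size]
ln_y = [math.log(v) for v in cars]

# Logarithmic (linear-log): regress y on ln(x)
log_c, log_b = ols(ln_x, cars)

# Exponential (log-linear): regress ln(y) on x, then c = exp(intercept)
a0, exp_b = ols(hh_size, ln_y)
exp_c = math.exp(a0)

# Power (log-log): regress ln(y) on ln(x), then a = exp(intercept)
a1, pow_b = ols(ln_x, ln_y)
pow_a = math.exp(a1)

print(f"logarithmic: y = {log_c:.4f} + {log_b:.4f} ln(x)")
print(f"exponential: y = {exp_c:.4f} exp({exp_b:.4f} x)")
print(f"power:       y = {pow_a:.4f} x^{pow_b:.4f}")
```

For the exponential case the sketch should agree with the trendline coefficients 0.9767186839 and 0.2197224577 reported for the chart.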
CORRELATION COEFFICIENT

CARS   HH SIZE
1      1
2      2
2      3
2      4
3      5

The correlation coefficient between two series, say x and y, equals
  Covariance(x,y) / [Sqrt(Variance(x)) * Sqrt(Variance(y))]
where
  Covariance(x,y) is the sample covariance between x and y: (1/(n-1)) × Σi (xi - xbar)(yi - ybar)
  Variance(x) is the sample variance of x: (1/(n-1)) × Σi (xi - xbar)^2
  Variance(y) is the sample variance of y: (1/(n-1)) × Σi (yi - ybar)^2

CALCULATION USING THE DATA ANALYSIS ADD-IN
Select Correlation and fill out the dialog box. The output is

          CARS       HH SIZE
CARS      1
HH SIZE   0.894427   1

Alternatively, on the Formula Tab select the Function Library group and More Functions and Statistical. This also gives 0.894427.
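As a cross-check outside Excel, the correlation formula can be evaluated directly. This is a minimal Python sketch under the sample (n-1) definitions given in the text; the function name `sample_cov` is an illustrative assumption.

```python
import math

cars = [1, 2, 2, 2, 3]
hh_size = [1, 2, 3, 4, 5]


def sample_cov(x, y):
    """Sample covariance with the 1/(n-1) divisor."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)


cov_xy = sample_cov(cars, hh_size)
var_x = sample_cov(cars, cars)        # a variance is a covariance with itself
var_y = sample_cov(hh_size, hh_size)

r = cov_xy / (math.sqrt(var_x) * math.sqrt(var_y))
print(round(r, 6))
```

The printed value should match the 0.894427 shown in the correlation table. Because the divisor cancels in the ratio, using 1/n instead of 1/(n-1) throughout would give the same correlation.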
COVARIANCE
This is obtained in a similar way to correlation.
We can use Data Analysis Add-in and Covariance

          CARS   HH SIZE
CARS      0.4
HH SIZE   0.8    2
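The numbers in the covariance table (0.4, 0.8, 2) are consistent with dividing by n rather than by n-1. That is an inference from the values themselves, not something the original text states. The short Python sketch below computes both versions so the difference is visible; the `ddof` parameter name is an illustrative assumption.

```python
cars = [1, 2, 2, 2, 3]
hh_size = [1, 2, 3, 4, 5]


def covariance(x, y, ddof):
    """Covariance with divisor n - ddof (ddof=0: population, ddof=1: sample)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - ddof)


for label, ddof in (("divide by n", 0), ("divide by n-1", 1)):
    var_cars = covariance(cars, cars, ddof)
    cov = covariance(cars, hh_size, ddof)
    var_hh = covariance(hh_size, hh_size, ddof)
    print(label, round(var_cars, 6), round(cov, 6), round(var_hh, 6))
```

With the n divisor the sketch reproduces the table entries; with the n-1 divisor it gives the sample quantities used in the correlation formula.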
TWO-VARIABLE LINEAR REGRESSION
The population regression model is: y = β1 + β2 x + u
We wish to estimate the regression line: y = b1 + b2 x

CARS   HH SIZE
1      1
2      2
2      3
2      4
3      5

We obtain
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.894427
R Square            0.8
Adjusted R Square   0.733333
Standard Error      0.365148
Observations        5

ANOVA
             df   SS    MS         F    Significance F
Regression   1    1.6   1.6        12   0.040519
Residual     3    0.4   0.133333
Total        4    2

            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   0.8            0.382971         2.088932   0.127907   -0.418784   2.018784    -0.418784     2.018784
HH SIZE     0.4            0.11547          3.464102   0.040519   0.032523    0.767477    0.032523      0.767477
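The key output is given in the Coefficients column in the last set of output: the intercept is b1 = 0.8 and the slope is b2 = 0.4, so the fitted line is y = 0.8 + 0.4 x.
The regression statistics output gives measures of how well the model fits the data. In particular:
R Square = 0.8, which is the fraction of the variation in y around its mean that is explained by the regression.
Standard error = 0.365, which measures the standard deviation of yi around its fitted value.
The remaining output (ANOVA table and t Stat, P-value, ...) is used for statistical inference.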
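For reference, the two coefficients can be worked by hand from the five observations. The derivation below uses the standard least-squares formulas for a single regressor, which the original text does not spell out; the symbols S_xy and S_xx are introduced here only for illustration, with x = HH SIZE and y = CARS.

```latex
\begin{align*}
\bar{x} &= 3, \qquad \bar{y} = 2 \\
S_{xy} &= \sum_i (x_i - \bar{x})(y_i - \bar{y})
        = (-2)(-1) + (-1)(0) + (0)(0) + (1)(0) + (2)(1) = 4 \\
S_{xx} &= \sum_i (x_i - \bar{x})^2 = 4 + 1 + 0 + 1 + 4 = 10 \\
b_2 &= S_{xy} / S_{xx} = 4 / 10 = 0.4 \\
b_1 &= \bar{y} - b_2 \bar{x} = 2 - 0.4 \times 3 = 0.8
\end{align*}
```

These agree with the Intercept and HH SIZE entries in the Coefficients column.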
Statistical Inference for Two-variable Regression
The Standard Error in the Regression Statistics table (0.365148) is the estimated standard deviation of the error term u.
It is sometimes called the standard error of the regression. It equals sqrt(SSE/(n-k)),
where SSE = Residual (or error) sum of squares, n = 5 observations and k = 2 estimated coefficients: sqrt(0.4/3) = 0.365148.
The ANOVA (analysis of variance) table splits the sum of squares into its components.
For example:
R2 = 1 - Residual SS / Total SS (general formula for R2)
   = 1 - 0.4/2.0 (from data in the ANOVA table)
   = 0.8 (which equals R2 given in the Regression Statistics table).
The remainder of the ANOVA table is described in more detail in Excel: Multiple Regression.
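The sum-of-squares decomposition can be checked numerically. The Python sketch below takes the fitted coefficients 0.8 and 0.4 from the Excel output as given and recomputes the ANOVA quantities; the variable names are illustrative assumptions.

```python
import math

cars = [1, 2, 2, 2, 3]
hh_size = [1, 2, 3, 4, 5]
b1, b2 = 0.8, 0.4            # intercept and slope from the Coefficients column

n, k = len(cars), 2          # k counts the intercept and the slope
ybar = sum(cars) / n
fitted = [b1 + b2 * x for x in hh_size]

total_ss = sum((y - ybar) ** 2 for y in cars)
residual_ss = sum((y - f) ** 2 for y, f in zip(cars, fitted))
regression_ss = total_ss - residual_ss

r_squared = 1 - residual_ss / total_ss
adj_r_squared = 1 - (residual_ss / (n - k)) / (total_ss / (n - 1))
std_error = math.sqrt(residual_ss / (n - k))
f_stat = (regression_ss / (k - 1)) / (residual_ss / (n - k))

print(f"SS: total={total_ss:.4f} regression={regression_ss:.4f} residual={residual_ss:.4f}")
print(f"R2={r_squared:.4f} adjusted R2={adj_r_squared:.6f}")
print(f"standard error={std_error:.6f} F={f_stat:.4f}")
```

The results should line up with the Regression Statistics and ANOVA tables (Total SS 2, Residual SS 0.4, R Square 0.8, Adjusted R Square 0.733333, Standard Error 0.365148, F 12).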
            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   0.8            0.382971         2.088932   0.127907   -0.418784   2.018784    -0.418784     2.018784
HH SIZE     0.4            0.11547          3.464102   0.040519   0.032523    0.767477    0.032523      0.767477
CONFIDENCE INTERVALS FOR SLOPE COEFFICIENT
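The text that belonged under this heading is not present, so the following is only a sketch of how the Lower 95% and Upper 95% entries for HH SIZE can be reproduced from the Coefficients table. The critical value t_.025 with n - k = 3 degrees of freedom is hard-coded as 3.182446 (an assumption taken from standard t tables; it is what TINV(0.05, 3) would be expected to return).

```python
b2 = 0.4            # slope estimate for HH SIZE
se_b2 = 0.11547     # its standard error, from the Coefficients table
t_crit = 3.182446   # t_.025 with 3 degrees of freedom (hard-coded assumption)

# Confidence interval: b2 +/- t_crit * se(b2)
margin = t_crit * se_b2
lower, upper = b2 - margin, b2 + margin

# t statistic for H0: beta2 = 0
t_stat = (b2 - 0) / se_b2

print(f"95% CI for the slope: ({lower:.6f}, {upper:.6f})")
print(f"t statistic: {t_stat:.4f}")
```

The interval should agree with (0.032523, 0.767477) in the table. Since it excludes zero, the slope is statistically significant at level 0.05, in line with the P-value of 0.040519.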
RESIDUAL OUTPUT

Observation   Predicted CARS   Residuals   Standard Residuals   Percentile   CARS
1             1.2              -0.2        -0.632456            10           1
2             1.6              0.4         1.264911             30           2
3             2                0           0                    50           2
4             2.4              -0.4        -1.264911            70           2
5             2.8              0.2         0.632456             90           3

[Figure: HH SIZE Residual Plot (Residuals against HH SIZE)]
[Figure: HH SIZE Line Fit Plot (CARS and Predicted CARS against HH SIZE)]
[Figure: Normal Probability Plot (CARS against Sample Percentile)]
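The residual output can be rebuilt from the fitted line y = 0.8 + 0.4 x. In the Python sketch below, the scaling used for the standard residuals, sqrt(SSE/(n-1)), and the percentile rule 100 × (i - 0.5)/n are inferences from the numbers in the table, not formulas stated in the original.

```python
import math

cars = [1, 2, 2, 2, 3]
hh_size = [1, 2, 3, 4, 5]
b1, b2 = 0.8, 0.4

n = len(cars)
# round() only removes floating-point noise such as 2.0000000000000004
predicted = [round(b1 + b2 * x, 10) for x in hh_size]
residuals = [y - p for y, p in zip(cars, predicted)]

# Assumed scale for the standard residuals: sqrt(SSE / (n - 1))
scale = math.sqrt(sum(e ** 2 for e in residuals) / (n - 1))
standard_residuals = [e / scale for e in residuals]

# Assumed percentile rule, paired with CARS sorted in ascending order
percentiles = [100 * (i - 0.5) / n for i in range(1, n + 1)]
sorted_cars = sorted(cars)

for i in range(n):
    print(i + 1, predicted[i], round(residuals[i], 6),
          round(standard_residuals[i], 6), percentiles[i], sorted_cars[i])
```

The residuals sum to zero, as they must when the regression includes an intercept.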
REGRESSION USING EXCEL FUNCTIONS INTERCEPT, SLOPE, RSQ, STEYX and FORECAST

CARS   HH SIZE
1      1
2      2
2      3
2      4
3      5

The individual functions INTERCEPT, SLOPE, RSQ, STEYX and FORECAST can be used to get key results for two-variable regression.
To get just the coefficients give the LINEST command with the last entry 0 rather than 1, i.e.
LINEST(A2:A6,B2:B6,1,0),
and then highlight cells A8:B8, say, hit F2 key, and hit CTRL-SHIFT-ENTER.
This returns the slope followed by the intercept:
0.4   0.8
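For readers working outside Excel, the five functions have simple closed-form counterparts. The Python sketch below uses hypothetical lowercase names modelled on the Excel functions and keeps Excel's argument order (y values first, then x values). It is an illustration of the formulas, not a reimplementation of Excel's code, and the forecast point x = 6 is a made-up example value.

```python
import math

cars = [1, 2, 2, 2, 3]        # known y values
hh_size = [1, 2, 3, 4, 5]     # known x values


def _sums(y, x):
    """Return n, the two means, and the centred sums of squares and products."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    return n, xbar, ybar, sxx, syy, sxy


def slope(y, x):
    _, _, _, sxx, _, sxy = _sums(y, x)
    return sxy / sxx


def intercept(y, x):
    _, xbar, ybar, _, _, _ = _sums(y, x)
    return ybar - slope(y, x) * xbar


def rsq(y, x):
    _, _, _, sxx, syy, sxy = _sums(y, x)
    return sxy ** 2 / (sxx * syy)


def steyx(y, x):
    """Standard error of the regression: sqrt(residual SS / (n - 2))."""
    n, _, _, sxx, syy, sxy = _sums(y, x)
    return math.sqrt((syy - sxy ** 2 / sxx) / (n - 2))


def forecast(x_new, y, x):
    return intercept(y, x) + slope(y, x) * x_new


print(intercept(cars, hh_size), slope(cars, hh_size))
print(rsq(cars, hh_size), steyx(cars, hh_size))
print(forecast(6, cars, hh_size))   # x = 6 is a hypothetical household size
```

The first four results should match the regression output (0.8, 0.4, 0.8 and 0.365148).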
Exponential: y = c*exp(bx)*u
This is a log-linear relationship since taking logs gives ln(y) = ln(c) + b*x + ln(u)
Estimated exponential relationship for CARS and HH SIZE:

exp(b), c                              1.245731   0.976719
Standard errors of b and ln(c)         0.069647   0.230995
R², standard error of ln(y)            0.768387   0.220245
F statistic, residual df               9.952632   3
Regression SS, residual SS of ln(y)    0.48278    0.145523

[Figure: CARS regressed on HHSIZE; No. of cars against No. of household members, with exponential trendline f(x) = 0.9767186839 exp( 0.2197224577 x ), R² = 0.7683868434]
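The statistics reported for the exponential fit can be reproduced by running an ordinary regression of ln(CARS) on HH SIZE. The Python sketch below does this with the standard two-variable formulas; the variable names are illustrative assumptions.

```python
import math

cars = [1, 2, 2, 2, 3]
hh_size = [1, 2, 3, 4, 5]

n = len(cars)
ln_y = [math.log(y) for y in cars]
xbar = sum(hh_size) / n
lbar = sum(ln_y) / n

sxx = sum((x - xbar) ** 2 for x in hh_size)
sxy = sum((x - xbar) * (l - lbar) for x, l in zip(hh_size, ln_y))
syy = sum((l - lbar) ** 2 for l in ln_y)

b = sxy / sxx                 # slope of ln(y) on x
ln_c = lbar - b * xbar        # intercept, equal to ln(c)

ss_reg = b * sxy              # regression sum of squares of ln(y)
ss_resid = syy - ss_reg       # residual sum of squares of ln(y)
df = n - 2
s = math.sqrt(ss_resid / df)  # standard error of ln(y)

se_b = s / math.sqrt(sxx)
se_ln_c = s * math.sqrt(1 / n + xbar ** 2 / sxx)
r2 = ss_reg / syy
f_stat = ss_reg / (ss_resid / df)

print(round(math.exp(b), 6), round(math.exp(ln_c), 6))
print(round(se_b, 6), round(se_ln_c, 6))
print(round(r2, 6), round(s, 6))
print(round(f_stat, 6), df)
print(round(ss_reg, 6), round(ss_resid, 6))
```

Note that R² and the standard error here describe the fit to ln(y), not to y itself, which is why R² (0.768387) is not directly comparable with the 0.8 of the linear model.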
EXCEL 2007: Multiple Regression

CARS   HH SIZE   CUBED HH SIZE
1      1         1
2      2         8
2      3         27
2      4         64
3      5         125
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.895828
R Square            0.802508
Adjusted R Square   0.605016
Standard Error      0.444401
Observations        5

ANOVA
             df   SS         MS         F          Significance F
Regression   2    1.605016   0.802508   4.063492   0.197492
Residual     2    0.394984   0.197492
Total        4    2

                Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept       0.896552       0.764398         1.172886   0.361624   -2.392388   4.185491    -2.392388     4.185491
HH SIZE         0.336468       0.422704         0.79599    0.509507   -1.482279   2.155216    -1.482279     2.155216
CUBED HH SIZE   0.00209        0.013114         0.159364   0.888021   -0.054334   0.058514    -0.054334     0.058514
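The coefficient estimates can be reproduced without Excel by solving the least-squares normal equations (X'X) b = X'y. The Python sketch below does this with a small Gauss-Jordan solver in pure Python; the helper name `solve` is an illustrative assumption, and this is one way to obtain the estimates, not necessarily how Excel computes them.

```python
cars = [1, 2, 2, 2, 3]
hh_size = [1, 2, 3, 4, 5]
cubed = [x ** 3 for x in hh_size]

# Design matrix with a column of ones for the intercept
X = [[1.0, x, z] for x, z in zip(hh_size, cubed)]
k = 3

# Normal equations: (X'X) b = X'y
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * y for row, y in zip(X, cars)) for i in range(k)]


def solve(a, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    m = len(rhs)
    aug = [list(a[i]) + [rhs[i]] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


b1, b2, b3 = solve(xtx, xty)
print(round(b1, 6), round(b2, 6), round(b3, 6))

# Fitted value for a household of size 4 (so CUBED HH SIZE = 64)
yhat_at_4 = b1 + b2 * 4 + b3 * 4 ** 3
print(round(yhat_at_4, 6))
```

The three estimates should match the Coefficients column (0.896552, 0.336468, 0.00209).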
The Standard Error in the Regression Statistics table (0.444401) is the estimated standard deviation of the error term u.
It is sometimes called the standard error of the regression. It equals sqrt(SSE/(n-k)),
where SSE = Residual (or error) sum of squares: sqrt(0.394984/(5-3)) = 0.444401.
For example:
R2 = 1 - Residual SS / Total SS (general formula for R2)
   = 1 - 0.3950/2.0 (from data in the ANOVA table)
   = 0.8025 (which equals R2 given in the Regression Statistics table).
The column labeled F gives the overall F-test of H0: β2 = 0 and β3 = 0 versus Ha: at least one of β2 and β3 does not equal zero.
Note: Significance F in general = FDIST(F, k-1, n-k) where k is the number of regressors including the intercept. Here k = 3.
95% confidence interval for slope coefficient β2 is from Excel output (-1.4823, 2.1552).
It is computed as
b2 ± t_.025(2) × se(b2)
   = 0.33647 ± TINV(0.05, 2) × 0.42270
   = 0.33647 ± 4.303 × 0.42270
   = 0.33647 ± 1.8187
   = (-1.4823, 2.1552).
The degrees of freedom are n - k = 5 - 3 = 2.
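TEST HYPOTHESIS OF ZERO SLOPE COEFFICIENT ("TEST OF STATISTICAL SIGNIFICANCE")
For H0: β2 = 0 the Excel output gives t Stat = 0.79599 with P-value 0.5095, so the coefficient of HH SIZE is not statistically significant at significance level 0.05.
More generally, for any hypothesized value of β2,
t = (b2 - H0 value of β2) / (standard error of b2)
For example, with H0 value β2 = 1:
t = (0.33647 - 1) / 0.4227
  = -1.5697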
From the ANOVA table the F-test statistic is 4.0635 with p-value of 0.1975.
Since the p-value is not less than 0.05 we do not reject the null hypothesis that the
regression parameters are zero at significance level 0.05.
Conclude that the parameters are jointly statistically insignificant at significance level 0.05.
Note: Significance F in general = FDIST(F, k-1, n-k) where k is the number of regressors
including the intercept. Here FDIST(4.0635,2,2) = 0.1975.
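The Significance F value can be checked by hand in this particular case. When the numerator degrees of freedom equal 2, the upper-tail probability of the F distribution has the closed form P(F > f) = (1 + 2f/d2)^(-d2/2), where d2 is the denominator degrees of freedom. The Python sketch below relies on that special case only; for other degrees of freedom a statistical library routine would be needed.

```python
f_stat = 4.063492      # F from the ANOVA table
df1, df2 = 2, 2        # k - 1 and n - k with k = 3, n = 5

# Closed form valid only when the numerator degrees of freedom equal 2
assert df1 == 2
p_value = (1 + 2 * f_stat / df2) ** (-df2 / 2)

reject_at_5_percent = p_value < 0.05
print(round(p_value, 6), reject_at_5_percent)
```

The p-value agrees with the Significance F of 0.197492 in the ANOVA table, and the null hypothesis is not rejected at level 0.05.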
Consider case where x = 4 in which case CUBED HH SIZE = x^3 = 4^3 = 64.
yhat = b1 + b2 x2 + b3 x3 = 0.89655 + 0.33647×4 + 0.00209×64 = 2.3762
EXCEL LIMITATIONS
Excel standard errors and t-statistics and p-values are based on the assumption that the
error is independent with constant variance (homoskedastic).