
Chapter 2

Regression Approaches
• Initial investigations.
• Simple linear regression model.
• Parameter estimation.
• Forecasting.
• Multivariate linear regression model.
• Parameter estimation.
• Forecasting.
• Model building and residual analysis.

Initial Investigations

• It is good practice to investigate the data before performing more advanced analyses (e.g. modelling).
• Some reasons for performing initial analyses:
• to identify patterns in the data,
• to identify potential outliers or non-normal behaviour in some observations, and
• to understand the data better.
• Some possible methods that can be used:
• Plots (e.g. scatter plots, histograms, distribution plots).
• Simple summary statistics (e.g. mean, variance, skewness).
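As a minimal sketch of the "simple statistics" step, the snippet below uses the standard library `statistics` module for the mean and the sample variance (divisor n − 1). The helper names `summarise` and `sample_skewness` are hypothetical, and the skewness uses the moment-based coefficient g1 = m3 / m2^(3/2), which is only one of several common definitions. The data values are made up for illustration.

```python
import statistics


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (one of several common definitions)."""
    n = len(values)
    mean = statistics.mean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("all values are equal; skewness is undefined")
    return m3 / m2 ** 1.5


def summarise(values):
    """Return the mean, sample variance (divisor n - 1) and skewness."""
    return {
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),
        "skewness": sample_skewness(values),
    }


# Hypothetical observations, for illustration only.
y = [2, 4, 5, 4, 5]
print(summarise(y))
```

A negative skewness here indicates a longer left tail, which a histogram of the same values would also show.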

Simple Linear Regression Model

• Objective: to model a relationship between two variables.


• This model assumes that the relationship between the dependent
variable, y, and the independent variable, x, can be described by
a straight line:

$$y = \beta_0 + \beta_1 x + \varepsilon \qquad (1)$$

where
β0 - intercept; the mean value of y when x = 0
β1 - slope; the change in the mean value of y associated with a unit increase in x
ε - error term
• All the unknown parameters can be estimated using the least squares method, so that the estimated model is ŷ = b0 + b1x, where b0 and b1 are unbiased estimators of β0 and β1 respectively.

Least Squares Method

• Objective: this method seeks the estimators b0 and b1 that minimise the total squared error, where the error (residual) of observation t is e_t = y_t − ŷ_t.
• The total squared error is computed by:
$$\sum_t e_t^2 = \sum_t (y_t - \hat{y}_t)^2 \qquad (2)$$

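• Minimising (2) with respect to b0 and b1 (setting both partial derivatives to zero) gives the estimators:
• b0 = ȳ − b1 x̄
• b1 = SSxy / SSxx
where
$$SS_{xy} = \sum_{t=1}^{n} (x_t - \bar{x})(y_t - \bar{y}) = \sum_{t=1}^{n} x_t y_t - \frac{\left(\sum_{t=1}^{n} x_t\right)\left(\sum_{t=1}^{n} y_t\right)}{n}$$
$$SS_{xx} = \sum_{t=1}^{n} (x_t - \bar{x})^2 = \sum_{t=1}^{n} x_t^2 - \frac{\left(\sum_{t=1}^{n} x_t\right)^2}{n}$$
$$\bar{y} = \frac{1}{n}\sum_{t=1}^{n} y_t, \qquad \bar{x} = \frac{1}{n}\sum_{t=1}^{n} x_t$$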
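The estimators above can be computed directly from the sums. This is a minimal sketch in plain Python; `least_squares` and `shortcut_sums` are hypothetical helper names and the five data points are invented for illustration. The second helper shows that the computational ("shortcut") forms of SSxy and SSxx give the same values as the deviation forms.

```python
def least_squares(x, y):
    """Return (b0, b1) for y_hat = b0 + b1 * x, using b1 = SSxy / SSxx."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with n >= 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Deviation forms of SSxy and SSxx.
    ss_xy = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))
    ss_xx = sum((xt - x_bar) ** 2 for xt in x)
    if ss_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def shortcut_sums(x, y):
    """SSxy and SSxx via the computational ("shortcut") formulas."""
    n = len(x)
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    ss_xx = sum(a * a for a in x) - sum(x) ** 2 / n
    return ss_xy, ss_xx


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
print(b0, b1)               # b0 ≈ 2.2, b1 ≈ 0.6
print(shortcut_sums(x, y))  # (6.0, 10.0), same SSxy and SSxx as the deviation form
```

The deviation form is usually preferred in code because the shortcut form can lose precision when the sums are large.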
Model Fit

(i) Determination of relationship between x and y.


• The degree of relationship between x and y reflects how much of the variability in y can be explained by x.
• In regression analysis, the total variation consists of explained variation and unexplained variation:

Total variation: the total of the squared errors obtained when the explanatory variable x is not considered, $\sum_t (y_t - \bar{y})^2$.
Unexplained variation: measures the amount of variation in the values of y that is NOT explained by x. Also called SSE, $\sum_t (y_t - \hat{y}_t)^2$.
Explained variation: measures the amount of variation in the values of y that is explained by x, $\sum_t (\hat{y}_t - \bar{y})^2$.
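A small numerical check of the three quantities, as a sketch only: `variation_components` is a hypothetical helper and the data are invented. For a least squares line that includes an intercept, total variation equals explained plus unexplained variation, which the last line confirms up to rounding.

```python
def variation_components(x, y):
    """Return (total, explained, unexplained) variation for a least squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_xx = sum((a - x_bar) ** 2 for a in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * a for a in x]
    total = sum((b - y_bar) ** 2 for b in y)                   # sum (y_t - y_bar)^2
    unexplained = sum((b - f) ** 2 for b, f in zip(y, y_hat))  # SSE
    explained = sum((f - y_bar) ** 2 for f in y_hat)           # sum (y_hat_t - y_bar)^2
    return total, explained, unexplained


# Hypothetical data, for illustration only.
total, explained, unexplained = variation_components([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(total, explained, unexplained)                  # roughly 6.0, 3.6, 2.4
print(abs(total - (explained + unexplained)) < 1e-9)  # True
```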

Model Fit
• So, the degree of relationship between x and y can be measured using a simple coefficient called R² (the coefficient of determination):
$$R^2 = \frac{\text{Explained variation}}{\text{Total variation}}, \qquad 0 \le R^2 \le 1.$$
• This coefficient gives the proportion of the total variation in y that is explained by the simple linear regression model, based on a sample of size n. The closer R² is to 1, the more of the variation in y the model explains.

• R = ±√R² (−1 ≤ R ≤ 1), taking the sign of b1, gives the direction of the relationship; R > 0 shows a positive relationship and R < 0 a negative relationship.
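A minimal sketch of R² and its signed square root R, assuming the same invented data; `r_squared_and_r` is a hypothetical helper. `math.copysign` attaches the sign of the slope b1 to √R², so reversing the order of the y values flips R while leaving R² unchanged.

```python
import math


def r_squared_and_r(x, y):
    """Return (R^2, R) for the least squares line; R takes the sign of b1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_xx = sum((a - x_bar) ** 2 for a in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    total = sum((b - y_bar) ** 2 for b in y)
    if total == 0:
        raise ValueError("y is constant; R^2 is undefined")
    explained = sum((b0 + b1 * a - y_bar) ** 2 for a in x)
    r2 = explained / total
    r = math.copysign(math.sqrt(r2), b1)  # sign of R follows the sign of b1
    return r2, r


# Hypothetical data, for illustration only.
print(r_squared_and_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # R^2 ≈ 0.6, R ≈ +0.775
print(r_squared_and_r([1, 2, 3, 4, 5], [5, 4, 5, 4, 2]))  # R^2 ≈ 0.6, R ≈ -0.775
```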

Model Fit
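• Hypothesis test for determining whether x and y have a significant relationship:
H0 : There is no relationship between x and y, ρ = 0.
H1 : There is a relationship between x and y, ρ ≠ 0.
• The test statistic, where r is the sample correlation coefficient,
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
has a t-distribution with n − 2 degrees of freedom under H0 if the regression assumptions hold.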
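The statistic can be computed as in the sketch below, assuming invented data; `correlation_t_test` is a hypothetical helper. It does not compute a p-value (the Python standard library has no t-distribution), so the resulting t is meant to be compared with a tabulated t_{α/2} value with n − 2 degrees of freedom.

```python
import math


def correlation_t_test(x, y):
    """Return (r, t) with t = r * sqrt(n - 2) / sqrt(1 - r^2), for H0: rho = 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_xx = sum((a - x_bar) ** 2 for a in x)
    ss_yy = sum((b - y_bar) ** 2 for b in y)
    r = ss_xy / math.sqrt(ss_xx * ss_yy)  # sample correlation coefficient
    if abs(r) >= 1:
        raise ValueError("perfect linear fit: the t statistic is unbounded")
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return r, t


# Hypothetical data; reject H0 if |t| exceeds the tabulated t_{alpha/2} value (n - 2 df).
r, t = correlation_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4), round(t, 4))
```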

Model Fit

(ii) An F-test for the model.

• This statistic tests the significance of the constructed model.
• Hypothesis test
H0 : β1 = 0 (x does not help to explain y).
H1 : β1 ≠ 0.
• The relevant test statistic
$$F_M = \frac{\text{Explained variation}}{\text{Unexplained variation}/(n-2)}.$$
If the regression assumptions hold, then under H0 the statistic FM has an F-distribution with 1 and n − 2 degrees of freedom.
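A minimal sketch of FM on invented data; `f_statistic` is a hypothetical helper. As with the t statistics, the result is meant to be compared with a tabulated F critical value with 1 and n − 2 degrees of freedom.

```python
def f_statistic(x, y):
    """Return FM = explained variation / (unexplained variation / (n - 2))."""
    n = len(x)
    if n < 3:
        raise ValueError("need n >= 3 so that n - 2 > 0")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_xx = sum((a - x_bar) ** 2 for a in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * a for a in x]
    explained = sum((f - y_bar) ** 2 for f in y_hat)
    sse = sum((b - f) ** 2 for b, f in zip(y, y_hat))  # unexplained variation
    return explained / (sse / (n - 2))


# Hypothetical data, for illustration only.
print(f_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # about 4.5
```

In simple linear regression FM equals the square of the t statistic for b1, so the two tests lead to the same conclusion.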

Model Fit

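(iii) Testing the significance of b1.

• Objective: to check whether the relationship between x and y is significant.
• Null hypothesis (for example)

H0 : β1 = 0 vs H1 : β1 ≠ 0
• If the regression assumptions hold, then
$$b_1 \sim N\!\left(\beta_1,\ \sigma_{b_1} = \frac{\sigma}{\sqrt{SS_{xx}}}\right)$$
where the estimator of $\sigma_{b_1}$ is $s_{b_1} = s/\sqrt{SS_{xx}}$.
• Then
$$t = \frac{b_1 - \beta_1}{s_{b_1}}$$
has a t-distribution with n − 2 degrees of freedom. Under H0 (β1 = 0) this reduces to $t = b_1/s_{b_1}$.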
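A minimal sketch of the slope test on invented data; `slope_t_statistic` and its `beta1_null` parameter are hypothetical names. It computes s from the residuals, then s_b1 = s/√SSxx and t = (b1 − β1)/s_b1 with β1 set to the value under H0.

```python
import math


def slope_t_statistic(x, y, beta1_null=0.0):
    """Return (b1, s_b1, t) where t = (b1 - beta1_null) / s_b1 (n - 2 df)."""
    n = len(x)
    if n < 3:
        raise ValueError("need n >= 3 so that n - 2 > 0")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_xx = sum((a - x_bar) ** 2 for a in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # estimate of sigma
    s_b1 = s / math.sqrt(ss_xx)   # estimated standard error of b1
    return b1, s_b1, (b1 - beta1_null) / s_b1


# Hypothetical data; compare |t| with the tabulated t_{alpha/2} value for n - 2 df.
b1, s_b1, t = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b1, s_b1, t)
```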

Model Fit

(iv) Testing the significance of b0.

• Objective: to check whether the y-intercept is significant.
• Null hypothesis (for example)

H0 : β0 = 0 vs H1 : β0 ≠ 0
• If the regression assumptions hold, then $b_0 \sim N(\beta_0, \sigma_{b_0})$, where the estimator of $\sigma_{b_0}$ is
$$s_{b_0} = s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{SS_{xx}}}$$
• Then
$$t = \frac{b_0 - \beta_0}{s_{b_0}}$$
has a t-distribution with n − 2 degrees of freedom.
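The intercept test follows the same pattern; only the standard error changes. A minimal sketch on invented data, with `intercept_t_statistic` and `beta0_null` as hypothetical names:

```python
import math


def intercept_t_statistic(x, y, beta0_null=0.0):
    """Return (b0, s_b0, t) with s_b0 = s * sqrt(1/n + x_bar^2 / SSxx) (n - 2 df)."""
    n = len(x)
    if n < 3:
        raise ValueError("need n >= 3 so that n - 2 > 0")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_xx = sum((a - x_bar) ** 2 for a in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    s = math.sqrt(sse / (n - 2))                      # estimate of sigma
    s_b0 = s * math.sqrt(1 / n + x_bar ** 2 / ss_xx)  # estimated standard error of b0
    return b0, s_b0, (b0 - beta0_null) / s_b0


# Hypothetical data, for illustration only.
b0, s_b0, t = intercept_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b0, s_b0, t)
```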

Model Adequacy Check

• Statistical models depend on assumptions. These must be checked before the results can be accepted.
• In the least squares linear model, the following assumptions must be fulfilled:
• There is a linear relationship between x and y.
• The error term, ε, is normally distributed with mean 0 and constant variance σ².
• The errors are statistically independent of one another.
• The error variance, σ², is estimated by the mean square error
$$s^2 = \frac{SSE}{n-2}, \qquad SSE = \sum_t y_t^2 - b_0\sum_t y_t - b_1\sum_t x_t y_t$$
• The standard error, σ, is estimated by
$$s = \sqrt{\frac{SSE}{n-2}}$$
• These assumptions can be checked through plots of the residuals.
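A minimal sketch of these estimates on invented data; `residual_summary` is a hypothetical helper. It applies the shortcut formula for SSE and also returns the residuals e_t = y_t − ŷ_t, which are what one would plot (against x or ŷ, with any plotting tool) to check the assumptions.

```python
import math


def residual_summary(x, y):
    """Return (residuals, SSE, s^2, s) for the least squares line."""
    n = len(x)
    if n < 3:
        raise ValueError("need n >= 3 so that n - 2 > 0")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_xx = sum((a - x_bar) ** 2 for a in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    # Shortcut formula: SSE = sum y^2 - b0 * sum y - b1 * sum x*y
    sse = sum(b * b for b in y) - b0 * sum(y) - b1 * sum(a * b for a, b in zip(x, y))
    sse = max(sse, 0.0)  # guard against a tiny negative value from rounding
    s2 = sse / (n - 2)   # mean square error, estimates sigma^2
    residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]  # e_t, for residual plots
    return residuals, sse, s2, math.sqrt(s2)


# Hypothetical data, for illustration only.
residuals, sse, s2, s = residual_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(residuals, sse, s2, s)
```

The sum of the squared residuals agrees with the shortcut SSE, which is a quick sanity check on the arithmetic.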

Some Informative Plots

Forecasting Using the
Simple Linear Regression Model

• Once the constructed model has been checked and found satisfactory, it can be used for forecasting.
• Forecasting can be done through
• a point estimate
e.g. if the constructed model is ŷ = 0.5 + 2x, substituting a value of x gives the forecast ŷ (for x = 3, ŷ = 0.5 + 2(3) = 6.5).
• an interval estimate
e.g. for a given value x0, we want a range of plausible values for the mean of y. This is given by
$$\hat{y} \pm t_{\alpha/2}^{(n-2)} \times s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{xx}}}$$
where ŷ = b0 + b1x0. For a prediction interval for an individual value of y, add 1 under the square root.
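A minimal sketch of both forecasts on invented data; `forecast` and its `individual` flag are hypothetical names. Because the Python standard library has no t-distribution quantiles, the caller supplies the critical value `t_crit`; 3.182 below is the tabulated t_{0.025} value with n − 2 = 3 degrees of freedom.

```python
import math


def forecast(x, y, x0, t_crit, individual=False):
    """Point forecast at x0 and the interval y_hat +/- t_crit * s * sqrt(...).

    individual=False: interval for the mean of y at x0.
    individual=True: adds 1 under the square root (prediction interval for one y).
    """
    n = len(x)
    if n < 3:
        raise ValueError("need n >= 3 so that n - 2 > 0")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_xx = sum((a - x_bar) ** 2 for a in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x0  # point forecast
    extra = 1.0 if individual else 0.0
    half_width = t_crit * s * math.sqrt(extra + 1 / n + (x0 - x_bar) ** 2 / ss_xx)
    return y_hat, (y_hat - half_width, y_hat + half_width)


# Hypothetical data; 3.182 is the tabulated t_{0.025} value with 3 df.
point, (lower, upper) = forecast([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x0=6, t_crit=3.182)
print(point, lower, upper)
```

The interval widens as x0 moves away from x̄, so forecasts far outside the observed range of x are much less precise.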

Example:
Quality Home Improvement Center (QHIC)

QHIC operates five stores in a large metropolitan area. The marketing department at QHIC wishes to study the relationship between home value (in thousands of dollars), x, and yearly expenditure on home upkeep (in dollars), y. A random sample of 40 homeowners is taken, and they are asked to estimate their expenditures during the previous year on the types of home upkeep products and services offered by QHIC.

How to Investigate?

Checklist
1. What is the relationship between x and y? Is it linear?
2. What are the estimated values of the parameters β0 and β1?
3. Is the constructed model good enough?
4. Does the constructed model fulfil all the assumptions? (Check the residual plots.)
5. Can predictions or forecasts be made?
6. What can we conclude about the constructed model?

Some Issues in Regression

• What should we do if the relationship between x and y is non-linear?
• How can we increase the value of R²?
• What if the data contain outliers or extreme values?
• How do we know that the constructed model is good enough?

