Points to highlight
— The Simple Linear Regression Model
— Least Square Method
— Assessing the Fit of the Simple Linear Regression Model
— The Multiple Regression Model
— Inference and Regression
— Categorical Independent Variables
— Modeling Nonlinear Relationships
— Model Fitting
11/6/2017
Introduction
— Managerial decisions are often based on the
relationship between two or more variables
— Example: After considering the relationship between
advertising expenditures and sales, a marketing manager
might attempt to predict sales for a given level of
advertising expenditures
— Sometimes a manager will rely on intuition to judge
how two variables are related
— If data can be obtained, a statistical procedure called
regression analysis can be used to develop an
equation showing how the variables are related
Introduction
— Regression analysis
- Describe a relationship between one variable and
the other variables in mathematical terms.
- Predict the value of a dependent variable based on
the value of at least one independent variable
- Explain the impact of changes in an independent
variable on the dependent variable
Introduction
— Dependent variable or response: Variable being
predicted (Variable we wish to explain)
— Independent variables or predictor variables: Variables
being used to predict the value of the dependent variable
(Variable used to explain the dependent variable)
— Linear regression: A regression analysis involving at least
one independent variable and one dependent variable
— In statistical notation:
y = dependent variable
x = independent variable
Examples
Example 1: The product manager of a particular
brand of children’s breakfast cereal would like to
predict the demand for cereal during the next
year. To use regression analysis, she and her staff
list the following variables as likely to affect sales:
•Price of the product
•Number of children 5 to 12 years of age (the target market)
•Price of competitors’ products
•Effectiveness of advertising
•Annual sales this year
•Annual sales in previous year
Examples
Example 2: A real estate agent wants to predict
the selling price of houses more accurately. She
believes that the following variables affect the
price of a house:
• Size of the house (number of square feet)
• Number of bedrooms
• Frontage of the lot
• Condition
• Location
Introduction
— Simple linear regression: A regression analysis for
only one independent variable, x, and one dependent
variable, y
— Multiple linear regression: Regression analysis
involving two or more independent variables
§ Regression Model
§ Estimated Regression Equation
Least squares criterion:  min Σ(yi − ŷi)² = min Σ(yi − b0 − b1xi)²
The least squares estimates:

b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²  (sums over i = 1, …, n)

b0 = ȳ − b1·x̄

where
xi: value of the independent variable for the ith observation
yi: value of the dependent variable for the ith observation
x̄: mean value of the independent variable
ȳ: mean value of the dependent variable
n: total number of observations
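As a sketch, these estimates can be computed directly in Python (the function and variable names are illustrative, not from the slides):

```python
def least_squares(x, y):
    """Least squares estimates for the simple linear model y-hat = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    # b0 = y_bar - b1 * x_bar
    b0 = y_bar - b1 * x_bar
    return b0, b1
```

For example, points that lie exactly on a line are recovered exactly: least_squares([1, 2, 3], [2, 4, 6]) gives b0 = 0 and b1 = 2.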
(Worked example result: SSE = 8.0288)
§ Regression Model
§ Estimated Multiple Regression Equation
§ Least Squares Method and Multiple Regression
§ Butler Trucking Company and Multiple Regression
§ Using Excel’s Regression Tool to Develop the Estimated
Multiple Regression Equation
Example
A distributor of frozen dessert pies wants to evaluate factors thought to influence demand
Week   Pie Sales   Price ($)   Advertising ($100s)
  1       350        5.50             3.3
  2       460        7.50             3.3
  3       350        8.00             3.0
  4       430        8.00             4.5
  5       350        6.80             3.0
  6       380        7.50             4.0
  7       430        4.50             3.0
  8       470        6.40             3.7
  9       450        7.00             3.5
 10       490        5.00             4.0
 11       340        7.20             3.5
 12       300        7.90             3.2
 13       440        5.90             4.0
 14       450        5.00             3.5
 15       300        7.00             2.7
Example
Dependent variable (y): Pie sales
Independent variables: x1 = price ($), x2 = advertising ($100s)
ŷ = b0 + b1x1 + b2x2
Example calculation
n = 15
Σy = 5990     Σx1 = 99.2     Σx2 = 52.2
Σx1² = 675.26     Σx2² = 185     Σx1x2 = 345.46
Σx1y = 39152     Σx2y = 21087     Σy² = 2448500
Example calculation
Normal equations:

5990  = 15b0 + 99.2b1 + 52.2b2
39152 = 99.2b0 + 675.26b1 + 345.46b2
21087 = 52.2b0 + 345.46b1 + 185b2

Solving gives b0 = 306.525, b1 = −24.975, b2 = 74.131
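The three normal equations form a small linear system; the sketch below solves it with a plain Gaussian-elimination helper (a hypothetical function, not part of the slides) and should recover the coefficients to within rounding:

```python
def solve3(A, b):
    """Solve a 3x3 linear system A x = b by Gaussian elimination
    with partial pivoting (small hypothetical helper)."""
    n = 3
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Coefficient matrix and right-hand side of the normal equations (pie-sales data, n = 15)
A = [[15.0, 99.2, 52.2],
     [99.2, 675.26, 345.46],
     [52.2, 345.46, 185.0]]
rhs = [5990.0, 39152.0, 21087.0]
b0, b1, b2 = solve3(A, rhs)
```

Note the negative sign on b1: higher price is associated with lower sales, while more advertising is associated with higher sales.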
Example calculation
Estimated (predicted) regression equation:

ŷ = 306.525 − 24.975x1 + 74.131x2

For a week with price = $5.50 and advertising = $350 (x1 = 5.50, x2 = 3.5):
ŷ = 306.525 − 24.975(5.50) + 74.131(3.5) = 428.62
Predicted sales is 428.62 pies
Example calculation

SSR = Σ(ŷi − ȳ)² = 29459.96
SST = Σ(yi − ȳ)² = 56493.33
R² = SSR / SST = 29459.96 / 56493.33 = 0.521

Interpretation: about 52.1% of the variation in pie sales is explained by the variation in price and advertising
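The R² calculation can be sketched in Python (function names are illustrative); for a least squares fit, SST = SSR + SSE, so SSR/SST measures the share of variation explained:

```python
def r_squared(y, y_hat):
    """Coefficient of determination R^2 = SSR / SST for fitted values y_hat."""
    y_bar = sum(y) / len(y)
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)  # explained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)      # total variation
    return ssr / sst
```

A perfect fit (y_hat equal to y) gives R² = 1, and fitting only the mean gives R² = 0; with the slide totals, 29459.96 / 56493.33 ≈ 0.521.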
[Figure: t distribution for a two-tailed test — rejection regions of area α/2 in each tail, beyond −tα/2 and tα/2; α is the level of significance]
[Figure: F distribution — do not reject H0 for values of F below the critical value Fα; q, n−q−1; reject H0 for values at or above it]
Dummy (indicator) variable example:
x = 1 if an assignment included travel on the congested segment of highway during afternoon rush hour, 0 otherwise
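Such an indicator is constructed from the categorical variable before fitting; a minimal Python sketch (field names and data values are hypothetical, for illustration only):

```python
# Illustrative trip records; the "rush_hour" field is the categorical variable
assignments = [
    {"miles": 100, "deliveries": 4, "rush_hour": "yes"},
    {"miles": 50, "deliveries": 3, "rush_hour": "no"},
]
for a in assignments:
    # Dummy variable: 1 if the assignment included rush-hour travel, 0 otherwise
    a["rush_dummy"] = 1 if a["rush_hour"] == "yes" else 0
```

The 0/1 column can then enter the regression like any numeric independent variable; its coefficient shifts the intercept for rush-hour assignments.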
Model Fitting
Variable Selection Procedures
— Special procedures are sometimes employed to select the independent variables to include in the regression model
— Stepwise regression, forward selection, and sequential replacement are iterative procedures: at each step of the procedure a single independent variable is added or removed and the new model is evaluated
— The best-subsets procedure evaluates regression models involving different subsets of the independent variables
Model Fitting
— Variable Selection Procedures
— Backward elimination
— Forward selection
— Stepwise selection
— Best subsets
— Forward selection procedure:
— The analyst establishes a criterion for allowing independent
variables to enter the model
— Example: The independent variable xj with the smallest p-value associated with the test of the hypothesis βj = 0 enters, subject to some predetermined maximum p-value for which a potential independent variable will be allowed to enter the model
Model Fitting
— Forward selection procedure (contd.):
— First step: The independent variable that best satisfies
the criterion is added to the model
— Each subsequent step: The remaining independent
variables not in the current model are evaluated, and
the one that best satisfies the criterion is added to the
model
— Procedure stops: When there are no independent
variables not currently in the model that meet the
criterion for being added to the regression model
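The steps above can be sketched as a greedy loop. The model-quality criterion is abstracted into a `score` function (the slides describe a p-value criterion; here higher score means better, and the function name is illustrative):

```python
def forward_selection(candidates, score):
    """Greedy forward selection: repeatedly add the candidate variable that
    most improves score(selected); stop when no addition improves it."""
    selected, remaining = [], list(candidates)
    best = score(selected)
    while remaining:
        # Evaluate every variable not yet in the model
        trial_best, choice = max((score(selected + [v]), v) for v in remaining)
        if trial_best <= best:  # no remaining variable meets the criterion
            break
        selected.append(choice)
        remaining.remove(choice)
        best = trial_best
    return selected
```

For instance, with a toy score that rewards only two "real" predictors, the loop adds both and then stops before picking up the irrelevant one.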
Model Fitting
— Backward elimination procedure:
— The analyst establishes a criterion for allowing
independent variables to remain in the model.
— Example: The largest p-value associated with the test of
the hypothesis βj = 0, subject to some predetermined
minimum p-value for which a potential independent
variable will be allowed to remain in the model.
Model Fitting
— Backward elimination procedure (contd.):
— First step: The independent variable that violates this
criterion to the greatest degree is removed from the model
— Each subsequent step: The independent variables in the
current model are evaluated, and the one that violates this
criterion to the greatest degree is removed from the model
— Procedure stops: When there are no independent variables
currently in the model that violate the criterion for remaining
in the regression model
Model Fitting
— Stepwise procedure:
— The analyst establishes both a criterion for allowing
independent variables to enter the model and a
criterion for allowing independent variables to remain
in the model
— In the first step of the procedure, the independent variable that best satisfies the criterion for entering the model is added
— In each subsequent step: first, the remaining independent variables not in the current model are evaluated, and the one that best satisfies the criterion for entering is added to the model
Model Fitting
— Stepwise procedure (contd.):
— Then the independent variables in the current model
are evaluated, and the one that violates the criterion
for remaining in the model to the greatest degree is
removed
— The procedure stops when no independent variables
not currently in the model meet the criterion for
being added to the regression model, and no
independent variables currently in the model violate
the criterion for remaining in the regression model
Model Fitting
— Best-subsets procedure:
— Simple linear regressions for each of the independent
variables under consideration are generated, and then
the multiple regressions with all combinations of two
independent variables under consideration are
generated, and so on
— Once a regression has been generated for every
possible subset of the independent variables under
consideration, an output that provides some criteria
for selecting regression models is produced for all
models generated
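The enumeration of every possible subset can be sketched with `itertools.combinations`; as in the forward-selection sketch, the selection criterion is abstracted into a `score` function (higher is better; names are illustrative):

```python
from itertools import combinations

def best_subsets(variables, score):
    """Score a regression model on every non-empty subset of the candidate
    variables; return (score, subset) pairs sorted best-first."""
    scored = [(score(combo), combo)
              for k in range(1, len(variables) + 1)
              for combo in combinations(variables, k)]
    return sorted(scored, reverse=True)
```

Note the cost: p candidate variables produce 2^p − 1 subsets, which is why best subsets is practical only for modest p.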
Model Fitting
Overfitting
— Results from creating an overly complex model to explain
idiosyncrasies in the sample data
— Results from the use of complex functional forms or
independent variables that do not have meaningful
relationships with the dependent variable
— If a model is overfit to the sample data, it will perform
better on the sample data used to fit the model than it will
on other data from the population
— Thus, an overfit model can be misleading about its
predictive capability and its interpretation
Model Fitting
— How does one avoid overfitting a model?
— Use only independent variables that you expect to have
real and meaningful relationships with the dependent
variable
— Use complex models, such as quadratic models and
piecewise linear regression models, only when you have
a reasonable expectation that such complexity provides
a more accurate depiction of what you are modeling
— Do not let software dictate your model; use iterative
modeling procedures, such as the stepwise and best-
subsets procedures, only for guidance and not to
generate your final model
Model Fitting
— How does one avoid overfitting a model? (contd.)
— If you have access to a sufficient quantity of data, assess
your model on data other than the sample data that
were used to generate the model (this is referred to as
cross-validation)
— It is recommended to divide the original sample data
into training and validation sets
— Training set: The data set used to build the candidate
models that appear to make practical sense
— Validation set: The set of data used to compare model
performances and ultimately pick a model for
predicting values of the dependent variable
Model Fitting
— Holdout method: The sample data are randomly divided
into mutually exclusive and collectively exhaustive training
and validation sets
— k-fold cross-validation: The sample data are randomly divided into k equal-sized, mutually exclusive, and collectively exhaustive subsets called folds, and k iterations are executed
— Leave-one-out cross-validation: For a sample of n
observations, an iteration consists of estimating the model
on n – 1 observations and evaluating the model on the
single observation that was omitted from the training data
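The fold construction common to these schemes can be sketched in a few lines of Python (the helper name is illustrative); each of the k iterations then trains on k − 1 folds and validates on the held-out fold:

```python
import random

def kfold_indices(n, k, seed=0):
    """Randomly split n observation indices into k mutually exclusive,
    collectively exhaustive folds; leave-one-out is the case k = n."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # random assignment of observations
    return [idx[i::k] for i in range(k)]
```

The holdout method corresponds to k = 2 with possibly unequal parts; leave-one-out cross-validation is simply kfold_indices(n, n).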