REGRESSION
ONE-FACTOR EXPERIMENTAL DESIGN
Y, Response variable
Continuous (output has a mean and variance)
Discrete (output is a proportion, e.g., 15 out of 50 or 30%)
Continuous: data that can be subdivided into smaller and smaller increments, e.g., Time: weeks, hours, minutes, seconds, etc.
Discrete: data that falls into distinct categories, e.g., Gender (Male or Female), Day of Week (Mon, Tues, etc.), University (OSU, MSU, etc.)
* Regression is a tool that uses a one-sample t hypothesis test, constant = 0, to test for significance.
Overview
Correlation Coefficient
Simple Linear Regression (SLR) Example
Components of SLR
Goodness of Fit
Regression Requirements
Confidence & Prediction Intervals
Example
Describing Relationships
TWO QUANTITATIVE VARIABLES
Scatter Plot
[Figure: scatter plot of Coffees (y) versus Temperature (x); the value 29913 is marked on the y axis]
Correlation Equation
\[ r = \frac{s_{xy}}{s_x s_y} = \frac{1}{n-1} \sum_i \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right) \]
"r" ranges from -1 to +1.
r is a sample statistic that estimates the population parameter, ρ.
Strength: how closely the points follow a straight line.
Direction: is positive when individuals with higher X values tend to have higher values of Y.
Application
Important inputs?
Regression Analysis
Regression Analysis is a technique used to build an
equation that can be used to estimate or predict the value
of one variable by using its relationship with one or more
other variables, i.e., describe the “tendency” or the “FIT,”
with an equation:
Straight line: y = a + bx
Deterministic Relationship
[Figure: plot of Profit versus Units Sold]
The First Order Deterministic Equation
The value of Y is perfectly determined by the value of X.
Slope = b = Δy / Δx
Y-intercept = a
Stochastic Relationship
[Figure: plot of Profit versus Units Sold]
The First Order Stochastic Equation
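\[ y_i = a + b x_i + e_i \]
Fitted line:
\[ \hat{y} = a + b x \]
[Figure: Y versus X with the fitted line; x1 and x2 are marked on the X axis]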
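To make the contrast between the deterministic and stochastic equations concrete, here is a minimal sketch. The intercept, slope, error spread, and seed are hypothetical values chosen only for illustration; they do not come from the slides.

```python
import random

# Hypothetical intercept and slope, chosen only for illustration.
a, b = 5.0, 4.0
units_sold = list(range(1, 11))

# Deterministic relationship: y is fixed exactly by x.
profit_deterministic = [a + b * x for x in units_sold]

# Stochastic relationship: the same line plus a random error e_i.
rng = random.Random(0)  # fixed seed so the run is repeatable
errors = [rng.gauss(0.0, 3.0) for _ in units_sold]
profit_stochastic = [a + b * x + e for x, e in zip(units_sold, errors)]

for x, y_det, y_sto in zip(units_sold, profit_deterministic, profit_stochastic):
    print(f"x={x:2d}  deterministic={y_det:5.1f}  stochastic={y_sto:5.1f}")
```

The deterministic values lie exactly on the line, while the stochastic values scatter around it by the amount of each error term.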
Exercise
Application
The company hopes to better understand
market share by considering how it relates to
the costs and benefits of the additional press:
The Data
Company 1 2 3 4 5 6
Conclusions:
Scatter Plot For the Printing Industry
[Figure: scatter plot of Y = Market Share (%) versus X = Benefit/Cost Ratio, with the point (x5 = 2.0, y5 = 13) labeled]
Key Components
\[ \hat{Y}_i = b_0 + b_1 X_i \]
Ŷi = the estimated mean value and the predicted value of the dependent variable. Do not confuse this value with Yi.
E{bi} = βi
Ho: βi = 0
Ha: βi ≠ 0
The Residuals
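\[ e_i = y_i - \hat{y}_i \]
[Figure: scatterplot of y vs x with the fitted line, marking an observed value yi, its fitted value ŷi, the residual ei, and the level 16.67 on the y axis]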
Since Ŷi = b0 + b1 Xi:
\[ \min \sum_{i=1}^{n} \left(Y_i - (b_0 + b_1 X_i)\right)^2 = \min \sum_{i=1}^{n} \left(Y_i - b_0 - b_1 X_i\right)^2 \]
Formulas for the sample slope and intercept
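\[ b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{Cov(x, y)}{s_x^2} = r \, \frac{s_y}{s_x} \quad \text{and} \quad b_0 = \bar{y} - b_1 \bar{x} \]
where
\[ Cov(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n-1}, \quad s_x^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}, \quad s_y^2 = \frac{\sum (y_i - \bar{y})^2}{n-1} \]
Equivalent computational form:
\[ b_1 = \frac{\sum x_i y_i - (\sum x_i)(\sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n} = \frac{\sum x_i y_i - n \bar{x} \bar{y}}{\sum x_i^2 - n \bar{x}^2} \]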
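The slope and intercept formulas can be checked numerically. The sketch below uses the six (x, y) pairs from the printing-industry example in this deck and computes the slope in both the deviation form and the raw-sums form. The variable names are my own.

```python
# Six (x, y) pairs from the printing-industry example:
# x = benefit/cost ratio, y = market share (%).
x = [1.8, 2.3, 2.9, 1.6, 2.0, 2.5]
y = [11.0, 16.0, 31.0, 5.0, 13.0, 24.0]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Deviation form: b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
sum_dev_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sum_dev_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum_dev_xy / sum_dev_xx
b0 = y_bar - b1 * x_bar

# Computational form: the same slope from raw sums.
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_xx = sum(xi ** 2 for xi in x)
b1_raw = (sum_xy - sum(x) * sum(y) / n) / (sum_xx - sum(x) ** 2 / n)

print(f"b1 = {b1:.3f}, b0 = {b0:.3f}, b1 (raw sums) = {b1_raw:.3f}")
```

Both forms should agree up to floating-point rounding, which is a quick way to catch an arithmetic slip when working the formulas by hand.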
Computations for Our Case
  i     xi      yi    (xi − x̄)   (yi − ȳ)   (xi − x̄)(yi − ȳ)   (xi − x̄)²   (yi − ȳ)²
  1     1.8    11.0    -0.3833    -5.6667        2.1720           0.1469      32.1111
  2     2.3    16.0     0.1167    -0.6667       -0.0778           0.0136       0.4444
  3     2.9    31.0     0.7167    14.3333       10.2725           0.5136     205.4444
  4     1.6     5.0    -0.5833   -11.6667        6.8052           0.3403     136.1111
  5     2.0    13.0    -0.1833    -3.6667        0.6721           0.0336      13.4444
  6     2.5    24.0     0.3167     7.3333        2.3225           0.1003      53.7778
 Sum   13.1   100.0     0.0000     0.0000       22.1665           1.14833    441.3332

x̄ = 13.1 / 6 = 2.1833
ȳ = 100 / 6 = 16.6667
s_xy = 22.1665 / 5 = 4.4333
s_x² = 1.14833 / 5 = 0.22967
s_y² = 441.3332 / 5 = 88.2666
\[ r = \frac{4.4333}{\sqrt{0.22967 \times 88.2666}} = 0.985 \]
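The hand computation above can be reproduced in a few lines. This is a sketch using only the six data pairs from the table; the variable names are illustrative.

```python
import math

# Data from the computation table: x = benefit/cost ratio, y = market share (%).
x = [1.8, 2.3, 2.9, 1.6, 2.0, 2.5]
y = [11.0, 16.0, 31.0, 5.0, 13.0, 24.0]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance and variances, each divided by n - 1.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s_x2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
s_y2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)

# Correlation coefficient r = s_xy / (s_x * s_y).
r = s_xy / math.sqrt(s_x2 * s_y2)

print(f"s_xy = {s_xy:.4f}, s_x^2 = {s_x2:.5f}, s_y^2 = {s_y2:.4f}, r = {r:.3f}")
```

Small differences in the last digit relative to the table are expected, because the table rounds each deviation before multiplying.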
Exercise
Correlation Table (Data Set #1)
            Ben/Cost   MktShr
Ben/Cost     1.000     0.985
MktShr       0.985     1.000

Covariance Table (Data Set #1)
            Ben/Cost   MktShr
Ben/Cost     0.2297     4.4333
MktShr       4.4333    88.2667

One Variable Summary (Data Set #1)
            Ben/Cost   MktShr
Mean         2.1833    16.667
Variance     0.2297    88.267
Std. Dev.    0.4792     9.395
Minimum      1.6000     5.000
Maximum      2.9000    31.000
Count        6          6
b1= b0 =
Help From Technology
Exercise
Evaluating the Goodness of Fit
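\[ s_Y^2 = \frac{\sum (y_i - \bar{y})^2}{n-1} = \frac{\sum (y_i - 16.6667)^2}{6-1} = \frac{441.3334}{5} = \frac{SSTotal}{df_T} \]
[Figure: fitted line with the observed point (2.50, 24%); fitted value 22.78, residual 24 – 22.78 = 1.22]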
SSTotal (SST) = Σ(yi − ȳ)² = 441.33
SSRegression (SSR) = Σ(ŷi − ȳ)² = 427.89
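A short sketch of the sums-of-squares decomposition for the printing-industry data follows. It refits the line from the six data pairs and checks that the total variation splits into a regression part and an error part, with R-sq as the explained share. The variable names are my own.

```python
# Data from the printing-industry example.
x = [1.8, 2.3, 2.9, 1.6, 2.0, 2.5]
y = [11.0, 16.0, 31.0, 5.0, 13.0, 24.0]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares fit.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

# Sums of squares: total, regression (explained), and error (unexplained).
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((fi - y_bar) ** 2 for fi in fitted)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
r_squared = ssr / sst

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}, R-sq = {r_squared:.4f}")
```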
Help From Technology
The Standard Error of the Model, s
\[ s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}} = \sqrt{\frac{SSE}{n-2}} = \sqrt{MSE} \]
Interpretation of s
Compare it to the values of the response variable:
Range: 5 to 31
ȳ = 16.6667
s_Y² = 88.2667
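As a rough illustration of this comparison, the sketch below computes s for the printing-industry data and prints it next to the spread of the response. How small s must be to count as a good fit is a judgment call that the code does not settle.

```python
import math

# Data from the printing-industry example.
x = [1.8, 2.3, 2.9, 1.6, 2.0, 2.5]
y = [11.0, 16.0, 31.0, 5.0, 13.0, 24.0]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares fit and residuals.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Standard error of the model: s = sqrt(SSE / (n - 2)) = sqrt(MSE).
sse = sum(e ** 2 for e in residuals)
mse = sse / (n - 2)
s = math.sqrt(mse)

# Spread of y around its own mean, ignoring x.
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

print(f"s = {s:.3f} versus s_Y = {s_y:.3f}; y ranges from {min(y)} to {max(y)}")
```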
Help From Technology
a. 0
b. 24.667
c. 1.5
d. 2.75
Exercise
a. Only I is true.
b. Only II is true.
c. Only III is true.
d. II and III are both true.
e. I, II, and III are all true.
Required Data Conditions for Inference: εi iid N(0, σ)
\[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \]
Y is also Normally distributed.
2. Homogeneity of Variance
\[ E(y) = E(\beta_0 + \beta_1 x + \varepsilon) = \underbrace{E(\beta_0 + \beta_1 x)}_{\mu_{Y/x}} + \underbrace{E(\varepsilon)}_{0} \]
That is, the expected value of yi is the population
regression equation.
In Addition, Fit the Best Model
[Figure: two fitted models compared. First fit: R-sq = 0.835, S = 0.9589. Second fit: R-sq = 0.981, S = 0.3308]
Verifying the Required Conditions
Residual Analysis
Table columns: xi, yi, ŷi, ei = yi − ŷi
Check the assumption of Normality:
look at a Normal probability plot of the residuals
conduct a chi-square goodness-of-fit test (Normality test) on the residuals
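The coordinates behind a Normal probability plot can be sketched without any plotting library. The example below uses the residuals from the printing-industry fit. The plotting positions (i − 0.375)/(n + 0.25) are one common convention and are an assumption here; statistical packages may use a slightly different formula.

```python
from statistics import NormalDist

# Data from the printing-industry example.
x = [1.8, 2.3, 2.9, 1.6, 2.0, 2.5]
y = [11.0, 16.0, 31.0, 5.0, 13.0, 24.0]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares fit and residuals e_i = y_i - yhat_i.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Pair each ordered residual with the Normal quantile it would sit at
# if the residuals were Normal (assumed plotting-position formula).
ordered = sorted(residuals)
probs = [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]
z = [NormalDist().inv_cdf(p) for p in probs]

for zi, ei in zip(z, ordered):
    print(f"normal quantile {zi:6.3f}   residual {ei:7.3f}")
```

If the residuals are roughly Normal, the printed pairs fall close to a straight line when plotted. With only six points, the plot gives weak evidence either way.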
Healthy Residual Plot
Normal spread
No outliers
[Figure: four-panel plot of standardized residuals: Normal probability plot (Percent), Standardized Residual versus Fitted Value, histogram (Frequency), and Standardized Residual versus Observation Order]
Exercise
[Figure: residual plots for this exercise, including a Normal probability plot (Percent versus Residual) and Residual versus observation order]
Expected value of F:
\[ E(F) = E\left(\frac{MSR}{MSE}\right) \approx \frac{\sigma^2 + \beta_1^2 \sum (x_i - \bar{x})^2}{\sigma^2} \ge 1 \]
When H0: β1 = 0 is true, E(F) ≈ σ²/σ² = 1.0
F-ratio has an F sampling distribution with:
numerator df = dfR = p
denominator df = dfE = n – p – 1
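For the printing-industry data, the F-ratio and its degrees of freedom can be computed as sketched below, with p = 1 predictor. The sketch stops at the test statistic. Turning it into a p-value needs an F table or a statistics package, which is not shown here.

```python
# Data from the printing-industry example.
x = [1.8, 2.3, 2.9, 1.6, 2.0, 2.5]
y = [11.0, 16.0, 31.0, 5.0, 13.0, 24.0]
n = len(x)
p = 1  # number of predictors in simple linear regression
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares fit.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

# Mean squares and the F-ratio.
df_r = p
df_e = n - p - 1
msr = sum((fi - y_bar) ** 2 for fi in fitted) / df_r
mse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / df_e
f_ratio = msr / mse

print(f"F = {f_ratio:.1f} with df = ({df_r}, {df_e})")
```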
Testing Significance of a Regression Model:
The F-Test
H0: β1 = 0.0
Ha: β1 ≠ 0.0
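We estimate σ_b1 with
\[ SE_{b_1} = s_{b_1} = \frac{s}{\sqrt{\sum_i (x_i - \bar{x})^2}} = \frac{\sqrt{MSE}}{\sqrt{(n-1) s_X^2}} \]
Then,
\[ \frac{b_1 - \beta_1}{s_{b_1}} \sim t(df_E) \]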
This means that our usual Normal-based “templates” for creating a
confidence interval and for conducting a hypothesis test are valid.
Inference about β1
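Two-tailed test: Ho: β1 = [hypothesized value] versus Ha: β1 ≠ [hypothesized value]. Reject Ho in either tail (area α/2 in each).
Right-tailed test: Ho: β1 = [hypothesized value] versus Ha: β1 > [hypothesized value]. Reject Ho in the upper tail (area α).
Left-tailed test: Ho: β1 = [hypothesized value] versus Ha: β1 < [hypothesized value]. Reject Ho in the lower tail (area α).
t Test Statistic, with β1 set to the value hypothesized in Ho:
\[ t_{obs} = \frac{b_1 - \beta_1}{s_{b_1}} \]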
Mini-Case: Test of Significance of Ben/Cost
Test Statistic:
\[ t_{obs} = \frac{b_1 - [0]}{s_{b_1}} = \frac{19.303}{\sqrt{3.36 / 1.1483}} = 11.28 \]
Reject H0.
The benefit/cost ratio appears to be a significant predictor of market share.
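The test statistic for the mini-case can be reproduced as sketched below. The critical value 2.776 is taken from a standard t table (two-tailed, α = 0.05, 4 degrees of freedom). The choice of α = 0.05 is an assumption, since the slide does not state the level.

```python
import math

# Data from the printing-industry example.
x = [1.8, 2.3, 2.9, 1.6, 2.0, 2.5]
y = [11.0, 16.0, 31.0, 5.0, 13.0, 24.0]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares fit.
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

# Standard error of the slope: s_b1 = sqrt(MSE / sum((xi - x_bar)^2)).
mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
s_b1 = math.sqrt(mse / sxx)

# Test H0: beta1 = 0 against Ha: beta1 != 0.
beta1_null = 0.0
t_obs = (b1 - beta1_null) / s_b1
t_crit = 2.776  # t table value for alpha/2 = 0.025, df = 4 (assumed alpha = 0.05)
reject_h0 = abs(t_obs) > t_crit

print(f"s_b1 = {s_b1:.4f}, t_obs = {t_obs:.2f}, reject H0: {reject_h0}")
```

In simple linear regression the square of this t statistic equals the F-ratio of the model, so the two tests lead to the same conclusion.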
Help From Technology
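Confidence & Prediction Intervals
Point estimate ± t × standard error, for the mean response μy at x*:
\[ \hat{y} \pm t_{\alpha/2,\, n-p-1} \, s_{\hat{\mu}} = \hat{y} \pm t_{\alpha/2,\, n-p-1} \sqrt{MSE \left( \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum_i (x_i - \bar{x})^2} \right)} \]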
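95% CI Estimate of μY/X at x* = 1.92
Standard Error:
\[ S_{\hat{\mu}} = \sqrt{MSE \left( \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right)} \]
\[ S_{\hat{\mu}} = \sqrt{3.36 \left( \frac{1}{6} + \frac{(1.92 - 2.1833)^2}{(5)(0.2297)} \right)} = 0.8733 \]
Standard Error for predicting an individual value:
\[ S_{\hat{y}} = \sqrt{MSE \left( 1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right)} \]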
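The two standard errors can be compared side by side at x* = 1.92. The sketch below builds a 95% confidence interval for the mean response and a 95% prediction interval for an individual response. The critical value 2.776 comes from a standard t table with 4 degrees of freedom, and the variable names are my own.

```python
import math

# Data from the printing-industry example.
x = [1.8, 2.3, 2.9, 1.6, 2.0, 2.5]
y = [11.0, 16.0, 31.0, 5.0, 13.0, 24.0]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares fit and MSE.
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)

# Point estimate and the two standard errors at x* = 1.92.
x_star = 1.92
y_hat = b0 + b1 * x_star
h = 1 / n + (x_star - x_bar) ** 2 / sxx
se_mean = math.sqrt(mse * h)        # for the mean response
se_pred = math.sqrt(mse * (1 + h))  # for an individual response

t_crit = 2.776  # t table value for alpha/2 = 0.025, df = n - 2 = 4
ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)

print(f"y_hat = {y_hat:.3f}")
print(f"95% CI for the mean:   ({ci[0]:.3f}, {ci[1]:.3f}), SE = {se_mean:.4f}")
print(f"95% PI for individual: ({pi[0]:.3f}, {pi[1]:.3f}), SE = {se_pred:.4f}")
```

The prediction interval is always wider than the confidence interval at the same x*, because it adds the variance of a single new observation to the uncertainty in the estimated mean.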
Example
Grade:   E    D    D+   C-   C    C+   B-   B    B+   A-   A
Points:  0.0  1.0  1.3  1.7  2.0  2.3  2.7  3.0  3.3  3.7  4.0
[Figure: residual plots: Normal probability plot (Percent versus Residual), Residual versus Fitted Value, histogram of Residual, and Residual versus Observation Order]
Does Amount of Time Studying Affect Your Grade?
Enter the standardized residuals, SRES1, as the variable in the dialog box
Click on “OK”
[Figure: Probability Plot of SRES1 (Normal); Percent versus SRES1]
Mean 0.004307, StDev 1.035, N 20, AD 0.996, P-Value 0.010
The null and alternative hypotheses are:
Ho: Data follow a normal distribution
Ha: Data do not follow a normal distribution
Since the p-value is < 0.05, we reject the null and determine the residuals are not normal.