... linear.
... line model can be developed.
• Linear
• No correlation
[Figure: two scatter plots of Weight versus Length]
Chapter 5 # 1 Chapter 5 # 2
... called non-linear
... correlated at all
[Figure: scatter plots]
Linear Correlation
• Positively correlated
• Variables vary directly.
[Figure: scatter plot of Weight versus Length]

Linear Correlation
• As the value of one variable increases the other decreases
[Figure: scatter plot of Student GPA versus Hours Worked]
... correlation strength.
[Figure: scatter plot of Student GPA versus Hours Worked]
[Figure: scatter plot with Midterm Stats Grade on the horizontal axis]
The Correlation Coefficient

Interpreting r
Interpreting r
• The size (magnitude) of the correlation coefficient tells us the strength of a linear relationship
– | r | > 0.90 implies a strong linear association
– 0.65 < | r | < 0.90 implies a moderate linear association
– | r | < 0.65 implies a weak linear association

Cautions
• The correlation coefficient only gives us an indication about the strength of a linear relationship.
• Two variables may have a strong curvilinear relationship, but they could have a “weak” value for r
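The | r | cutoffs on the Interpreting r slide can be sketched as a small Python function. This is only an illustration: the name `classify_strength` is made up here, and because the slide does not say how to treat values exactly equal to 0.65 or 0.90, the sketch assumes they count as moderate.

```python
def classify_strength(r: float) -> str:
    """Classify the strength of a linear association from r.

    Cutoffs follow the slide: |r| > 0.90 strong, 0.65 < |r| < 0.90 moderate,
    |r| < 0.65 weak. Assumption (not in the slides): the boundary values
    0.65 and 0.90 themselves are treated as moderate.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # only the magnitude matters for strength, not the sign
    if size > 0.90:
        return "strong"
    if size >= 0.65:
        return "moderate"
    return "weak"


print(classify_strength(-0.95))  # strong
print(classify_strength(0.70))   # moderate
print(classify_strength(0.30))   # weak
```

The sign of r is ignored on purpose: it describes the direction of the relationship (positive or negative), not its strength.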
Fundamental Rule of Correlation
• Correlation DOES NOT imply causation
– Just because two variables are highly correlated does not mean that the explanatory variable “causes” the response
• Recall the discussion about the correlation between sexual assaults and ice cream cone sales

Setting
• A chemical engineer would like to determine if a relationship exists between the extrusion temperature and the strength of a certain formulation of plastic. She oversees the production of 15 batches of plastic at various temperatures and records the strength results.
The Scatter Plot
• The scatter diagram for the temperature versus strength data allows us to deduce the nature of the relationship between these two variables
[Figure: Scatter diagram of Strength vs Temperature; horizontal axis Temperature (F), vertical axis Strength (psi)]

Conclusions by Inspection
• Does there appear to be a relationship between the study variables?
• Classify the relationship as: Linear, curvilinear, no relationship
• Classify the correlation as positive, negative, or no correlation
• Classify the strength of the correlation as strong, moderate, weak, or none
Computing r
[Formula for r; annotated terms: df, z-scores for x data, z-scores for y data]

Computing r
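The formula on the Computing r slides did not survive extraction, but its annotations (df, z-scores for x data, z-scores for y data) match the standard z-score form of the correlation coefficient: the sum of the products of paired z-scores divided by df = n - 1. The Python sketch below assumes that form. All function names and data are hypothetical. The second data set illustrates the caution about curvilinear relationships: y = x² is a perfect curve, yet r comes out at approximately 0.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(x, y):
    """Assumed form: r = sum(z_x * z_y) / (n - 1), with df = n - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mx, my = mean(x), mean(y)
    sx, sy = sample_sd(x), sample_sd(y)
    zx = [(v - mx) / sx for v in x]  # z-scores for x data
    zy = [(v - my) / sy for v in y]  # z-scores for y data
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)


# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y_line = [3, 5, 7, 9, 11]            # exactly linear: y = 2x + 1
x_sym = [-2, -1, 0, 1, 2]
y_curve = [v ** 2 for v in x_sym]    # exactly curvilinear: y = x^2

r_line = correlation(x, y_line)      # approximately 1
r_curve = correlation(x_sym, y_curve)  # approximately 0
print(round(r_line, 6), round(abs(r_curve), 6))
```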
Computing r - Example

Classifying the strength of linear correlation
The Line of Best Fit Plot
• This is the Line of Best Fit (LOBF)
[Figure: plot of Strength with the line of best fit]

Using the Line of Best Fit to Make Predictions
... extruded at 142 degrees?
[Figure: plot of Strength with the line of best fit]
Using the Line of Best Fit to Make Predictions
• Based on this model we would predict a strength of approximately 39 psi for plastic extruded at 142 F
[Figure: plot of Strength with the line of best fit]

Using the Line of Best Fit to Make Predictions
... order to achieve a strength of 45 psi?
[Figure: plot of Strength with the line of best fit]
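The two kinds of question on these slides, predicting a strength from a temperature and finding the temperature needed for a target strength, amount to evaluating a fitted line and solving it for x. The Python sketch below shows both directions. The slides do not list the fitted coefficients for the plastic data, so `B0` and `B1` are hypothetical placeholders and the outputs will not reproduce the 39 psi figure.

```python
# Hypothetical coefficients for illustration only; NOT the fitted values
# for the plastic data, which the slides do not give.
B0 = -30.0  # assumed y-intercept (psi)
B1 = 0.5    # assumed slope (psi per degree F)


def predict_strength(temperature_f: float) -> float:
    """Forward prediction: evaluate y-hat = b0 + b1 * x."""
    return B0 + B1 * temperature_f


def temperature_for_strength(target_psi: float) -> float:
    """Inverse prediction: solve y = b0 + b1 * x for x."""
    if B1 == 0:
        raise ValueError("a zero slope cannot be inverted")
    return (target_psi - B0) / B1


print(predict_strength(142.0))         # 41.0 with these placeholder values
print(temperature_for_strength(45.0))  # 150.0 with these placeholder values
```

Predictions of either kind are only trustworthy for temperatures inside the range covered by the data.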
Using the Line of Best Fit to Make Predictions

Computing the LSR model
Bivariate data and the sample linear regression model

The straight line model
$\hat{y} = b_0 + b_1 x$
The Parameter Estimators

Calculating the Parameter Estimators
• To get the slope estimator we use:
$b_1 = \dfrac{n \sum xy - \sum x \cdot \sum y}{n \sum x^2 - \left(\sum x\right)^2}$
or
$b_1 = r \, \dfrac{s_y}{s_x}$

• The intercept estimator is computed from the variable means and the slope:
$b_0 = \bar{y} - b_1 \bar{x}$
• Realize that both the slope and intercept estimated in these last two slides are really point estimates for the true slope and y-intercept
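As a worked illustration of the two estimator formulas, the Python sketch below computes the slope from the sums and then the intercept from the means and the slope. The function name `least_squares_line` and the data are hypothetical; they are not the manatee or plastic data.

```python
def least_squares_line(x, y):
    """Return (b0, b1) using the summation formulas.

    b1 = (n*Sum(xy) - Sum(x)*Sum(y)) / (n*Sum(x^2) - (Sum(x))^2)
    b0 = y_bar - b1 * x_bar
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; slope is undefined")
    b1 = (n * sum_xy - sum_x * sum_y) / denominator
    b0 = sum_y / n - b1 * (sum_x / n)
    return b0, b1


# Hypothetical data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares_line(x, y)
# Here Sum(x)=15, Sum(y)=20, Sum(xy)=66, Sum(x^2)=55, so
# b1 = (330 - 300) / (275 - 225) = 0.6 and b0 = 4 - 0.6 * 3 = 2.2
print(round(b0, 4), round(b1, 4))
```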
Revisit the manatee example

Computing the estimators
The slope estimate
• b1 is the estimated slope of the line
• The interpretation of the slope is, “The amount of change in the response for every one unit change in the independent variable.”

The slope estimate
• In our example the estimated slope is 2.27
• This is interpreted as, “For each additional 10,000 boats registered, 2.27 more manatees are killed.”
• Recall the sample regression model:
$\hat{y} = b_0 + b_1 x$
“b0” is the estimated y-intercept

• Sometimes this value is meaningful. For example, resting metabolic rate versus ambient temperature in Centigrade (°C)
Regression Estimates
The coefficient of determination
• r² is called the coefficient of determination.
• r² is a proportion, so it is a number between 0 and 1 inclusive.
• r² quantifies the amount of variation in the response that is due to the variability in the predictor variable.
• r² values close to 0 mean that our estimated model is a poor one while values close to 1 imply that our model does a great job explaining the variation

The r² Value
• If r² is, say, 0.857 we can conclude that 85.7% of the variability in the response is explained by the variability in the independent variable.
• This leaves 100 - 85.7 = 14.3% left unexplained. It’s only the unexplained variation that is incorporated into the “uncertainty”
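To make the explained versus unexplained split concrete, the Python sketch below fits a least squares line to a small hypothetical data set and computes r² as 1 - SSE/SST. That identity is the standard definition for a least squares line rather than something stated on the slides, and the function name and data are made up for illustration.

```python
def r_squared(x, y):
    """Coefficient of determination for the least squares line.

    Uses the standard identity r^2 = 1 - SSE/SST (an assumption here;
    the slides describe r^2 in words only).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sxy / sxx                 # slope estimate
    b0 = y_bar - b1 * x_bar        # intercept estimate
    fitted = [b0 + b1 * a for a in x]
    sse = sum((b - f) ** 2 for b, f in zip(y, fitted))  # unexplained variation
    sst = sum((b - y_bar) ** 2 for b in y)              # total variation
    return 1.0 - sse / sst


# Hypothetical data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r2 = r_squared(x, y)               # SST = 6, SSE = 2.4, so r^2 = 0.6
explained = 100.0 * r2
unexplained = 100.0 - explained
print(f"{explained:.1f}% explained, {unexplained:.1f}% unexplained")
```

For these made-up numbers about 60% of the variability in the response is explained by the line and about 40% is left unexplained, which mirrors the 85.7% and 14.3% reading in the slide example.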
Scatter of Points and r²
[Figure: two scatter plots, one with r² = 0.848 and one with r² = 0.992]

r² and the correlation coefficient