The McGraw-Hill Companies, Inc., 2000
Chapter 11
Correlation and
Regression
Outline
11-1 Introduction
11-2 Scatter Plots
11-3 Correlation
11-4 Regression
11-5 Coefficient of
Determination and
Standard Error of Estimate
Objectives
Draw a scatter plot for a set of
ordered pairs.
Find the correlation coefficient.
Test the hypothesis $H_0: \rho = 0$.
Find the equation of the
regression line.
Find the coefficient of
determination.
Find the standard error of
estimate.
11-2 Scatter Plots
A scatter plot is a graph of the
ordered pairs (x, y) of numbers
consisting of the independent
variable, x, and the dependent
variable, y.
11-2 Scatter Plots - Example
Construct a scatter plot for the data
obtained in a study of age and systolic
blood pressure of six randomly selected
subjects.
The data are given in the table below.
Subject   Age, x   Pressure, y
A         43       128
B         48       120
C         56       135
D         61       143
E         67       141
F         70       152
[Figure: scatter plot of pressure (vertical axis, 120 to 150) against age (horizontal axis, 40 to 70) for the six subjects. Positive Relationship]
11-2 Scatter Plots - Other Examples
[Figure: scatter plot of final grade (vertical axis, 40 to 90) against number of absences (horizontal axis, 5 to 15). Negative Relationship]
[Figure: scatter plot of y (vertical axis, 0 to 10) against x (horizontal axis, 0 to 70). No Relationship]
11-3 Correlation Coefficient
The correlation coefficient
computed from the sample data
measures the strength and direction
of a relationship between two
variables.
Sample correlation coefficient, r.
Population correlation coefficient, $\rho$.
11-3 Range of Values for the
Correlation Coefficient
r near -1: strong negative relationship
r near 0: no linear relationship
r near +1: strong positive relationship
11-3 Formula for the Correlation
Coefficient r
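$$r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}$$
where n is the number of data pairs.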
11-3 Correlation Coefficient -
Example (Verify)
Compute the correlation coefficient
for the age and blood pressure data.
$$\sum x = 345, \quad \sum y = 819, \quad \sum xy = 47{,}634, \quad \sum x^2 = 20{,}399, \quad \sum y^2 = 112{,}443$$
Substituting in the formula for r gives r = 0.897.
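The "verify" step can be checked with a short script. The following is a minimal Python sketch, not part of the original slides. The names `ages`, `pressures`, and `correlation_coefficient` are illustrative assumptions. It computes the five sums and substitutes them into the formula for r.

```python
import math

# Age (x) and systolic blood pressure (y) for subjects A-F, from the data table.
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]


def correlation_coefficient(xs, ys):
    """Return the sample correlation coefficient r (computational formula)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = correlation_coefficient(ages, pressures)
print(round(r, 3))  # 0.897
```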
11-3 The Significance of the
Correlation Coefficient
The population correlation
coefficient, $\rho$, is the correlation
between all possible pairs of
data values (x, y) taken from a
population.
$H_0: \rho = 0$
$H_1: \rho \neq 0$
This tests for a significant
correlation between the variables
in the population.
11-3 Formula for the t-tests for the
Correlation Coefficient
$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$
with d.f. = n - 2.
11-3 Example
Test the significance of the correlation
coefficient for the age and blood
pressure data. Use $\alpha$ = 0.05 and
r = 0.897.
Step 1: State the hypotheses.
$H_0: \rho = 0$
$H_1: \rho \neq 0$
Step 2: Find the critical values. Since
$\alpha$ = 0.05 and there are 6 - 2 = 4 degrees
of freedom, the critical values are
t = +2.776 and t = -2.776.
Step 3: Compute the test value.
t = 4.059 (verify).
Step 4: Make the decision. Reject the
null hypothesis, since the test value
falls in the critical region (4.059 > 2.776).
Step 5: Summarize the results. There is
a significant relationship between the
variables of age and blood pressure.
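Steps 3 and 4 can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function name `t_test_value` is illustrative, and the critical value 2.776 is copied from the slide (a t-table lookup for a two-tailed test with 4 degrees of freedom) rather than computed.

```python
import math


def t_test_value(r, n):
    """Test value t = r * sqrt((n - 2) / (1 - r^2)), with d.f. = n - 2."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r = 0.897               # sample correlation coefficient from the example
n = 6                   # number of data pairs
critical_value = 2.776  # taken from the slide, not computed here

t = t_test_value(r, n)
reject_null = abs(t) > critical_value  # two-tailed decision rule

print(round(t, 3), reject_null)
```

The printed test value should agree with the slide's t = 4.059 to three decimal places, and `reject_null` should be true.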
11-4 Regression
The scatter plot for the age and blood
pressure data displays a linear pattern.
We can model this relationship with a
straight line.
This line is called the line of
best fit or the regression line.
The equation of the line is y' = a + bx.
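11-4 Formulas for the Regression
Line y' = a + bx
$$a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}$$
$$b = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}$$
where a is the y intercept and b is
the slope of the line.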
11-4 Example
Find the equation of the regression line
for the age and the blood pressure data.
Substituting into the formulas gives
a = 81.048 and b = 0.964 (verify).
Hence, y' = 81.048 + 0.964x.
Note, a represents the intercept and b
the slope of the line.
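The values of a and b can be verified directly from the data. The following Python sketch is illustrative only. The name `regression_line` is an assumption, and the code applies the computational formulas for the intercept and slope to the age and blood pressure data.

```python
# Age (x) and systolic blood pressure (y) for subjects A-F.
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]


def regression_line(xs, ys):
    """Return (a, b): intercept and slope of the least-squares line."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2  # shared by both formulas
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b


a, b = regression_line(ages, pressures)
print(round(a, 3), round(b, 3))  # 81.048 0.964
```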
[Figure: scatter plot of pressure (vertical axis, 120 to 150) against age (horizontal axis, 40 to 70) with the regression line y' = 81.048 + 0.964x]
11-4 Using the Regression Line to
Predict
The regression line can be used to
predict a value for the dependent
variable (y) for a given value of the
independent variable (x).
Caution: Use x values within the
experimental region when
predicting y values.
11-4 Example
Use the equation of the regression line
to predict the blood pressure for a
person who is 50 years old.
Since y' = 81.048 + 0.964x, then
y' = 81.048 + 0.964(50) = 129.248 ≈ 129.2
Note that the value of 50 is within the
range of x values.
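The prediction and the caution about staying within the range of x values can be combined in one small function. This is a hedged sketch: `predict_pressure` is a hypothetical name, and the bounds 43 and 70 are simply the smallest and largest ages in the sample. Raising an error outside that range is one possible design choice, not something the slides prescribe.

```python
def predict_pressure(age, a=81.048, b=0.964, x_min=43, x_max=70):
    """Predict y' = a + b * age, refusing to extrapolate beyond the sample ages."""
    if not x_min <= age <= x_max:
        raise ValueError(
            f"age {age} is outside the observed range [{x_min}, {x_max}]"
        )
    return a + b * age


print(round(predict_pressure(50), 3))  # 129.248
```

Calling `predict_pressure(90)` raises `ValueError`, because 90 lies outside the ages observed in the study.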
11-5 Coefficient of Determination
and Standard Error of Estimate
The coefficient of determination,
denoted by $r^2$, is a measure of
the variation of the dependent
variable that is explained by the
regression line and the
independent variable.

$r^2$ is the square of the correlation
coefficient.
The coefficient of
nondetermination is $(1 - r^2)$.
Example: If r = 0.90, then
$r^2$ = 0.81.
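To see why $r^2$ measures explained variation, it can be computed as 1 minus the ratio of unexplained to total variation and compared with the square of r. The sketch below does this for the age and blood pressure data. The variable names are illustrative assumptions, and the variation ratio is a standard identity for least-squares lines rather than something stated on the slides.

```python
import math

# Age (x) and systolic blood pressure (y) for subjects A-F.
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]

n = len(ages)
sum_x, sum_y = sum(ages), sum(pressures)
sum_xy = sum(x * y for x, y in zip(ages, pressures))
sum_x2 = sum(x * x for x in ages)
sum_y2 = sum(y * y for y in pressures)

# Regression coefficients and r from the computational formulas.
a = (sum_y * sum_x2 - sum_x * sum_xy) / (n * sum_x2 - sum_x ** 2)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

mean_y = sum_y / n
total_variation = sum((y - mean_y) ** 2 for y in pressures)
unexplained_variation = sum(
    (y - (a + b * x)) ** 2 for x, y in zip(ages, pressures)
)

r_squared = 1 - unexplained_variation / total_variation
print(round(r_squared, 3), round(1 - r_squared, 3))  # 0.804 0.196
```

For these data, about 80% of the variation in blood pressure is explained by the regression line on age, and about 20% is not.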
The standard error of estimate,
denoted by $s_{est}$, is the standard
deviation of the observed y
values about the predicted y'
values.
The formula is given below.
11-5 Formula for the Standard
Error of Estimate
$$s_{est} = \sqrt{\frac{\sum (y - y')^2}{n - 2}}$$
or
$$s_{est} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}}$$
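Both forms of the formula should give the same result when a and b are not rounded. The Python sketch below checks this on the age and blood pressure data from Section 11-2, which is an assumption made for illustration because the slides do not compute the standard error for that data set. All variable names are illustrative.

```python
import math

# Age (x) and systolic blood pressure (y) for subjects A-F.
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]

n = len(ages)
sum_x, sum_y = sum(ages), sum(pressures)
sum_xy = sum(x * y for x, y in zip(ages, pressures))
sum_x2 = sum(x * x for x in ages)
sum_y2 = sum(y * y for y in pressures)

# Unrounded intercept and slope from the regression formulas.
a = (sum_y * sum_x2 - sum_x * sum_xy) / (n * sum_x2 - sum_x ** 2)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Form 1: sum of squared differences between observed and predicted y.
squared_residuals = sum(
    (y - (a + b * x)) ** 2 for x, y in zip(ages, pressures)
)
s_est_residuals = math.sqrt(squared_residuals / (n - 2))

# Form 2: shortcut formula using only the sums.
s_est_shortcut = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

print(s_est_residuals, s_est_shortcut)
```

The shortcut form is sensitive to rounding: substituting a and b rounded to three decimals can shift the result noticeably, so the sketch keeps full precision.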
11-5 Standard Error of Estimate -
Example
From the regression equation,
y' = 55.57 + 8.13x and n = 6, find $s_{est}$.
Here, a = 55.57, b = 8.13, and n = 6.
Substituting into the formula gives
$s_{est}$ = 6.48 (verify).
