Documente Academic
Documente Profesional
Documente Cultură
Chapter 10 Overview
Introduction
10-1 Correlation 10-2 Regression 10-3 Coefficient of Determination and Standard Error of the Estimate
Bluman, Chapter 10
Chapter 10 Objectives
1. Draw a scatter plot for a set of ordered pairs. 2. Compute the correlation coefficient. 3. Test the hypothesis H0: = 0. 4. Compute the equation of the regression line. 5. Compute the coefficient of determination. 6. Compute the standard error of the estimate. 7. Find a prediction interval.
Bluman, Chapter 10
Introduction
In addition to hypothesis testing and confidence intervals, inferential statistics involves determining whether a relationship between two or more numerical or quantitative variables exists.
Bluman, Chapter 10
Introduction
Correlation is a statistical method used to determine whether a linear relationship between variables exists. Regression is a statistical method used to describe the nature of the relationship between variablesthat is, positive or negative, linear or nonlinear.
Bluman, Chapter 10 5
Introduction
The purpose of this chapter is to answer these questions statistically: 1. Are two or more variables related? 2. If so, what is the strength of the relationship? 3. What type of relationship exists? 4. What kind of predictions can be made from the relationship?
Bluman, Chapter 10 6
Introduction
1. Are two or more variables related? 2. If so, what is the strength of the relationship?
To answer these two questions, statisticians use coefficient the correlation coefficient, a numerical measure to determine whether two or more variables are related and to determine the strength of the relationship between or among the variables.
Bluman, Chapter 10 7
Introduction
3. What type of relationship exists?
There are two types of relationships: simple and multiple. In a simple relationship, there are two variables: an independent variable (predictor variable) and a dependent variable (response variable). In a multiple relationship, there are two or more independent variables that are used to predict one dependent variable.
Bluman, Chapter 10 8
Introduction
4. What kind of predictions can be made from the relationship?
Predictions are made in all areas and daily. Examples include weather forecasting, stock market analyses, sales predictions, crop predictions, gasoline price predictions, and sports predictions. Some predictions are more accurate than others, due to the strength of the relationship. That is, the stronger the relationship is between variables, the more accurate the prediction is.
Bluman, Chapter 10
A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable x and the dependent variable y.
Bluman, Chapter 10
10
Section 10-1
Example 10-1 Page #536
Bluman, Chapter 10 11
Step 1: Draw and label the x and y axes. Step 2: Plot each point on the graph.
Bluman, Chapter 10 12
Bluman, Chapter 10
13
Correlation
The correlation coefficient computed from the sample data measures the strength and direction of a linear relationship between two variables. There are several types of correlation coefficients. The one explained in this section is called the Pearson product moment correlation coefficient (PPMC) (PPMC). The symbol for the sample correlation coefficient is r. The symbol for the population correlation coefficient is V.
Bluman, Chapter 10 14
Correlation
The range of the correlation coefficient is from 1 to 1. If there is a strong positive linear relationship between the variables, the value of r will be close to 1. If there is a strong negative linear relationship between the variables, the value of r will be close to 1.
Bluman, Chapter 10
15
Correlation
Bluman, Chapter 10
16
Correlation Coefficient
The formula for the correlation coefficient is
r!
n xy x y n x 2 x 2 n y 2 y 2
Section 10-1
Example 10-2 Page #540
Bluman, Chapter 10 18
Bluman, Chapter 10
19
r!
n xy x y n x 2 x 2 n y 2 y 2
r!
Hypothesis Testing
In hypothesis testing, one of the following is true: H0: V ! 0 This null hypothesis means that there is no correlation between the x and y variables in the population. H1 : V { 0 This alternative hypothesis means that there is a significant correlation between the variables in the population.
Bluman, Chapter 10
21
Bluman, Chapter 10
22
Section 10-1
Example 10-3 Page #544
Bluman, Chapter 10 23
Bluman, Chapter 10
24
Step 5: Summarize the results. There is a significant relationship between the number of cars a rental agency owns and its annual income.
Bluman, Chapter 10 25
Section 10-1
Example 10-4 Page #545
Bluman, Chapter 10 26
Bluman, Chapter 10
27
Bluman, Chapter 10
29
Bluman, Chapter 10
30
Bluman, Chapter 10
32
Bluman, Chapter 10
33
10.2 Regression
If the value of the correlation coefficient is significant, the next step is to determine the equation of the regression line which is the datas line of best fit.
Bluman, Chapter 10
34
Regression
Best fit means that the sum of the squares of the vertical distance from each point to the line is at a minimum.
Bluman, Chapter 10
35
! Regression Line yd a bx
a! y
x 2
x
xy
n x
2 2 2
x
n xy
x
y
b! n x
x
2
Section 10-2
Example 10-5 Page #553
Bluman, Chapter 10 37
y
x
x
xy
a! n x
x
2 2 2
x
! 0.106
yd a bx !
yd 0.396 0.106 x !
Bluman, Chapter 10 38
Bluman, Chapter 10
39
40, 4.636
15, 1.986
Bluman, Chapter 10
40
Section 10-2
Example 10-6 Page #555
Bluman, Chapter 10 41
Regression
The magnitude of the change in one variable when the other variable changes exactly 1 unit change is called a marginal change. The value of slope b of the regression line equation represents the marginal change. For valid predictions, the value of the correlation coefficient must be significant. When r is not significantly different from 0, the best predictor of y is the mean of the data values of y.
Bluman, Chapter 10 43
Bluman, Chapter 10
44
Extrapolation, Extrapolation or making predictions beyond the bounds of the data, must be interpreted cautiously. Remember that when predictions are made, they are based on present conditions or on the premise that present trends will continue. This assumption may or may not prove true in the future.
Bluman, Chapter 10
45
Procedure Table
Step 1: Make a table with subject, x, y, xy, x2, and y2 columns. Step 2: Find the values of xy, x2, and y2. Place them in the appropriate columns and sum each column. Step 3: Substitute in the formula to find the value of r. Step 4: When r is significant, substitute in the formulas to find the values of a and b for the regression line equation y d = a + bx.
Bluman, Chapter 10 46
10.3 Coefficient of Determination and Standard Error of the Estimate 2 The total variation y y
is the
sum of the squares of the vertical distances each point is from the mean.
The total variation can be divided into two parts: that which is attributed to the relationship of x and y, and that which is due to chance.
Bluman, Chapter 10
47
Variation
The variation obtained from the relationship (i.e., from the predicted y' 2 values) is yd y
and is called the explained variation variation. Variation due to chance, found by 2 yd y
, is called the unexplained variation. variation This variation cannot be attributed to the relationships.
Bluman, Chapter 10
48
Variation
Bluman, Chapter 10
49
Coefficient of Determiation
The coefficient of determination is the ratio of the explained variation to the total variation. The symbol for the coefficient of determination is r 2.
Another way to arrive at the value for r 2 is to square the correlation coefficient.
Bluman, Chapter 10 50
Coefficient of Nondetermiation
The coefficient of nondetermination is a measure of the unexplained variation. The formula for the coefficient of determination is 1.00 r 2.
Bluman, Chapter 10
51
The standard error of estimate estimate, denoted by sest is the standard deviation of the observed y values about the predicted y' values. The formula for the standard error of estimate is:
sest !
y yd
2
n2
Bluman, Chapter 10
52
Section 10-3
Example 10-7 Page #569
Bluman, Chapter 10 53
Bluman, Chapter 10
54
yd 55.57 8.13 x ! yd 55.57 8.13 1
! 63.70 ! yd 55.57 8.13 2
! 71.83 ! yd 55.57 8.13 3
! 79.96 ! yd 55.57 8.13 4
! 88.09 ! yd 55.57 8.13 6
! 104.35 !
Bluman, Chapter 10
sest ! sest !
y yd
2
55
Section 10-3
Example 10-8 Page #570
Bluman, Chapter 10 56
sest !
y 2 a y b xy n2
Bluman, Chapter 10
57
496
1778
42,186
sest ! sest !
y
2
n x X 1 1 n n x 2 x 2
with d.f. = n - 2
Bluman, Chapter 10 59
Section 10-3
Example 10-9 Page #571
Bluman, Chapter 10 60
x ! 20 x
! 82
20 X ! ! 3.3 6
y
2
n x X
1 1 n n x 2 x
2 6 3 3.3
2
1 6 6 82 20 2
y
2
1 6 6 82
20
2
62
6 3 3.3
Bluman, Chapter 10
y
2
1 1 6 6 82 20 2
6 3 3.3
Hence, you can be 95% confident that the interval 60.53 < y < 99.39 contains the actual value of y.
Bluman, Chapter 10
63