Chapter 5
Correlation and Regression
FHMM1034
Mathematics III
Content
5.1 Introduction
5.2 Linear Correlation
5.3 Simple Linear Regression
5.4 Coefficient of Determination
5.5 Regression Analysis: A Complete Example
5.1
Introduction
The main objective of this chapter is to analyze a
collection of paired sample data (or bivariate data)
and determine whether there appears to be a
relationship between the two variables.
Example:
What is the relationship between cholesterol levels
and the incidence of heart disease?
The two most common procedures for examining
relationships between measured variables are:
1. Correlation Analysis
2. Regression Analysis
Bivariate Data
When two variables are measured on a single
experimental unit, the resulting data are called bivariate
data.
You can describe each variable individually, and you
can also explore the relationship between the two
variables.
Bivariate data can be described with:
1. Graphs
2. Numerical measures
Examining Relationship
Dependent variable (also known as the Y variable):
the variable that measures the outcome of a study. It is
the variable that is being predicted or estimated.
Independent variable (also known as the X variable):
the variable that attempts to explain the variation in Y.
It is the predictor variable.
Scatter Diagram
When both of the variables are quantitative, call one variable x
and the other y. A single measurement is a pair of numbers (x, y)
that can be plotted using a two-dimensional graph called a
scatter plot.
[Figure: the point (2, 5) plotted on x-y axes at x = 2, y = 5]
Scatter diagram (scatter plot) is a plot of paired observations that
portrays the relationship between the X and Y variables.
Example 1
Incomes and food expenditure of seven households are
listed below. Using the information, draw a scatter
diagram.
Income (hundreds of RM)    Food expenditure (hundreds of dollars)
35                         9
49                         15
21                         7
39                         11
15                         5
28                         8
25                         9
Example 1 (cont.)
The scatter diagram:
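One way to draw the diagram without plotting software is a rough text rendering, sketched below in Python. The function name `ascii_scatter` and the axis ranges (10 to 50 for income, 4 to 16 for food expenditure) are illustrative choices, not part of the original notes. Each character cell covers one unit on each axis.

```python
income = [35, 49, 21, 39, 15, 28, 25]   # x values from Example 1
food = [9, 15, 7, 11, 5, 8, 9]          # y values from Example 1


def ascii_scatter(xs, ys, x_min, x_max, y_min, y_max):
    """Return a text scatter plot with one character cell per integer unit."""
    points = set(zip(xs, ys))
    rows = []
    for y in range(y_max, y_min - 1, -1):          # top row holds the largest y
        cells = "".join(
            "*" if (x, y) in points else " "
            for x in range(x_min, x_max + 1)
        )
        rows.append(f"{y:>3} |{cells}")
    rows.append("    +" + "-" * (x_max - x_min + 1))  # x axis
    return "\n".join(rows)


plot = ascii_scatter(income, food, 10, 50, 4, 16)
print(plot)
```

The marks rise from the lower left to the upper right, which suggests a positive relationship between income and food expenditure.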
5.2
Linear Correlation
Correlation Analysis
A group of techniques to measure the strength of the
association between two variables.
Examples:
1. Time spent studying and exam grade.
2. Salary and years of working experience.
3. Age and blood pressure.
4. Smoking and lung cancer.
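The linear correlation coefficient, r, is
$$r = \frac{S_{XY}}{\sqrt{S_{XX}\,S_{YY}}}$$
where,
$$S_{XX} = \sum x^2 - \frac{(\sum x)^2}{n}$$
$$S_{YY} = \sum y^2 - \frac{(\sum y)^2}{n}$$
$$S_{XY} = \sum xy - \frac{(\sum x)(\sum y)}{n}$$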
Perfect positive linear correlation :
When r = 1:
In this case, all points in the scatter diagram lie on a
straight line that slopes upward from left to right.
[Figure: scatter diagram with all points on an upward-sloping straight line, r = 1]
Perfect negative linear correlation :
When r = -1:
In this case, all points in the scatter diagram lie on a
straight line that slopes downward from left to right.
[Figure: scatter diagram with all points on a downward-sloping straight line, r = -1]
Properties of the linear correlation coefficient, r:
(i) The value of r is always between -1 and 1 inclusive.
That is,
$$-1 \le r \le 1$$
(ii) r measures the strength of a linear relationship. It is
not designed to measure the strength of a relationship
that is not linear.
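Both properties can be checked numerically. The sketch below uses made-up illustrative points (not data from these notes) and a helper named `correlation`, which is an assumed name that applies r = SXY / sqrt(SXX * SYY). It evaluates r for points on an upward line, points on a downward line, and points on the curve y = x².

```python
import math


def correlation(xs, ys):
    """Linear correlation coefficient r = SXY / sqrt(SXX * SYY)."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [-2, -1, 0, 1, 2]                              # illustrative x values
r_up = correlation(xs, [2 * x + 1 for x in xs])     # points on an upward line
r_down = correlation(xs, [-2 * x + 1 for x in xs])  # points on a downward line
r_curve = correlation(xs, [x * x for x in xs])      # exact but non-linear relation
print(r_up, r_down, r_curve)
```

The first two values sit at the ends of the allowed range. The third is zero even though y is completely determined by x, which illustrates property (ii): r only measures linear association.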
Example 2
Calculate the correlation coefficient for the incomes
and food expenditures of the seven households in
Example 1.
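A minimal Python sketch of this calculation, using the Example 1 data and the shortcut formulas for SXX, SYY and SXY. The variable names are illustrative.

```python
import math

income = [35, 49, 21, 39, 15, 28, 25]   # x, from Example 1
food = [9, 15, 7, 11, 5, 8, 9]          # y, from Example 1
n = len(income)

# Shortcut formulas: sum of squares (or products) minus the correction term.
s_xx = sum(x * x for x in income) - sum(income) ** 2 / n
s_yy = sum(y * y for y in food) - sum(food) ** 2 / n
s_xy = sum(x * y for x, y in zip(income, food)) - sum(income) * sum(food) / n

r = s_xy / math.sqrt(s_xx * s_yy)
print(f"SXX = {s_xx:.4f}, SYY = {s_yy:.4f}, SXY = {s_xy:.4f}, r = {r:.4f}")
```

The result is roughly r = 0.96, a strong positive linear correlation between income and food expenditure.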
5.3
Simple Linear Regression
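The regression model relates the two variables by a straight line, y = a + bx, where
y = dependent variable
x = independent variable
a = y-intercept
b = slope
The least squares regression line is
$$\hat{y} = a + bx$$
where
$$b = \frac{S_{XY}}{S_{XX}}$$
and
$$a = \bar{y} - b\bar{x}$$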
Interpretation of a and b
Note:
When b is positive, an increase in x will lead to an
increase in y and a decrease in x will lead to a
decrease in y. Such a relationship between x and y
is called a positive linear relationship.
If the value of b is negative, an increase in x will
cause a decrease in y and a decrease in x will cause an
increase in y. Such a relationship between x and y
is called a negative linear relationship.
Example 3
The table below shows the incomes and food expenditures
(in hundreds of dollars) of seven households.
Income    Food Expenditure
35        9
49        15
21        7
39        11
15        5
28        8
25        9
(a) Find the least squares regression line for the data on
incomes and food expenditures of the seven
households.
(b) What is the predicted food expenditure for a
household with income of RM3000?
(c) Give a brief interpretation of the values of a and b
calculated in part (a).
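Parts (a) and (b) can be sketched in Python as follows. The sketch assumes income is recorded in hundreds, so an income of RM3000 corresponds to x = 30. The variable names are illustrative.

```python
income = [35, 49, 21, 39, 15, 28, 25]   # x, in hundreds
food = [9, 15, 7, 11, 5, 8, 9]          # y, in hundreds
n = len(income)

s_xx = sum(x * x for x in income) - sum(income) ** 2 / n
s_xy = sum(x * y for x, y in zip(income, food)) - sum(income) * sum(food) / n

b = s_xy / s_xx                              # slope
a = sum(food) / n - b * sum(income) / n      # intercept: y-bar minus b times x-bar

predicted = a + b * 30                       # assumption: RM3000 means x = 30
print(f"y-hat = {a:.4f} + {b:.4f} x")
print(f"predicted food expenditure at x = 30: {predicted:.4f} (hundreds)")
```

The fitted line is roughly y-hat = 1.14 + 0.26x. For part (c), b says that food expenditure rises by about 0.26 (hundreds) for each additional hundred of income, and a is the value of y-hat at x = 0.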
5.4
Coefficient of Determination
The error sum of squares, SSE, is
$$SSE = \sum (y - \hat{y})^2$$
Standard Deviation of Random Errors, $s_e$
The standard deviation of errors tells how widely the
errors and hence the values of y are spread for a given x.
$$s_e = \sqrt{\frac{SSE}{n-2}}$$
where $SSE = \sum (y - \hat{y})^2$. Equivalently,
$$s_e = \sqrt{\frac{S_{YY} - b\,S_{XY}}{n-2}}$$
Example 4
Compute the standard deviation of errors, Se, for
the data on monthly incomes and food expenditures
of the seven households given in Example 3.
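A short Python sketch of this computation, using the shortcut form with SYY, SXY and the slope b. The data are those of Example 3 and the variable names are illustrative.

```python
import math

income = [35, 49, 21, 39, 15, 28, 25]   # x
food = [9, 15, 7, 11, 5, 8, 9]          # y
n = len(income)

s_xx = sum(x * x for x in income) - sum(income) ** 2 / n
s_yy = sum(y * y for y in food) - sum(food) ** 2 / n
s_xy = sum(x * y for x, y in zip(income, food)) - sum(income) * sum(food) / n

b = s_xy / s_xx                                   # least squares slope
s_e = math.sqrt((s_yy - b * s_xy) / (n - 2))      # n - 2 degrees of freedom
print(f"se = {s_e:.4f}")
```

The value comes out close to 0.99 (in hundreds).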
The total sum of squares, SST, is
$$SST = \sum (y - \bar{y})^2 = S_{YY} = \sum y^2 - \frac{(\sum y)^2}{n}$$
Example 5
For the regression line in Example 3, find the value
of its SSE and SST.
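The sketch below computes both sums directly from their definitions, as sums of squared deviations, rather than from the shortcut formulas. It then checks that the results agree with SYY - b*SXY and SYY. The variable names are illustrative.

```python
income = [35, 49, 21, 39, 15, 28, 25]   # x
food = [9, 15, 7, 11, 5, 8, 9]          # y
n = len(income)

x_bar = sum(income) / n
y_bar = sum(food) / n
s_xx = sum(x * x for x in income) - sum(income) ** 2 / n
s_yy = sum(y * y for y in food) - sum(food) ** 2 / n
s_xy = sum(x * y for x, y in zip(income, food)) - sum(income) * sum(food) / n

b = s_xy / s_xx
a = y_bar - b * x_bar

# Definitions: squared distances from the fitted line and from the mean.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(income, food))
sst = sum((y - y_bar) ** 2 for y in food)
print(f"SSE = {sse:.4f}, SST = {sst:.4f}")
print(f"shortcut SSE = {s_yy - b * s_xy:.4f}, shortcut SST = {s_yy:.4f}")
```

Both routes give SSE of about 4.93 and SST of about 60.86.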
Coefficient of Determination, r²
Measures how well the independent variable explains
the dependent variable in the regression model.
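One standard way to make this precise, stated here as a sketch in terms of the SST and SSE of this section (with SST = SYY and SSE = SYY - b SXY), is the proportion of the total variation in y that the regression line accounts for:

```latex
r^2 = \frac{SST - SSE}{SST}
    = \frac{S_{YY} - (S_{YY} - b\,S_{XY})}{S_{YY}}
    = \frac{b\,S_{XY}}{S_{YY}}, \qquad 0 \le r^2 \le 1 .
```

A value near 1 means the line explains most of the variation in y. A value near 0 means it explains very little.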
Example 6
For the data in Example 3, calculate the coefficient
of determination. Interpret your answer.
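A minimal Python sketch, assuming the form r² = b*SXY / SYY and cross-checking it against the square of the correlation coefficient. The variable names are illustrative.

```python
import math

income = [35, 49, 21, 39, 15, 28, 25]   # x
food = [9, 15, 7, 11, 5, 8, 9]          # y
n = len(income)

s_xx = sum(x * x for x in income) - sum(income) ** 2 / n
s_yy = sum(y * y for y in food) - sum(food) ** 2 / n
s_xy = sum(x * y for x, y in zip(income, food)) - sum(income) * sum(food) / n

b = s_xy / s_xx
r_squared = b * s_xy / s_yy                 # explained share of total variation
r = s_xy / math.sqrt(s_xx * s_yy)           # correlation coefficient, for comparison
print(f"r^2 = {r_squared:.4f}, r * r = {r * r:.4f}")
```

The value is about 0.92, so roughly 92% of the total variation in food expenditure is explained by income through the fitted line.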
5.5
Regression Analysis: A Complete Example
A random sample of eight drivers insured with a
company and having similar auto insurance policies
was selected. The following table lists their driving
experiences (in years) and monthly auto insurance
premiums (in dollars).
Driving experience (in years)    Monthly premium (in dollars)
(missing)                        64
(missing)                        87
12                               50
(missing)                        71
15                               44
(missing)                        56
25                               42
16                               60
(a) Does the insurance premium depend on the driving
experience or does the driving experience depend on
the insurance premium? Do you expect a positive or
negative relationship between these two variables?
(b) Compute SXX, SYY and SXY.
(c) Find the least squares regression line by choosing
appropriate dependent and independent variables
based on your answer in part (a).
(d) Interpret the meaning of the values of a and b
calculated in part (c).
(e) Plot the scatter diagram and the regression line.
(f) Calculate r and r² and explain what they mean.
(g) Calculate the standard deviation of errors.
(h) Predict the monthly auto insurance premium for a
driver with 10 years of driving experience.
The End
of
Chapter 5