Chapter 5
Discovering
Relationships
Objectives:
Creating a scatter plot.
Calculating the correlation coefficient.
Sections 5.2-5.5 Scatter Plots and Correlation
Section 5.1 Bivariate Data
Bivariate Data:
In previous chapters, the statistical summary measurements, like the
mean, variance, and proportions, were all concerned with describing
univariate data (measurements from one variable).
To understand the relationship between two variables, data on both
variables need to be collected. This type of data is called bivariate
data.
With bivariate data, two observations are recorded from some entity.
Important questions to ask yourself when you encounter bivariate
data:
How was the data obtained?
What exactly does the data measure?
Is the data measured accurately?
Section 5.2 Looking for Patterns in the Data
Scatterplot:
Detecting a relationship between two variables often begins with a
graph.
In the case of bivariate data, a scatterplot is the traditional
explanatory graphical method to display the relationship between two
variables.
In a scatterplot, measurements are plotted in pairs with one
variable plotted on each axis.
When examining the scatterplot we are trying to draw conclusions
concerning the overall pattern of the data.
Questions to ask yourself when analyzing a scatterplot:
Does the pattern roughly follow a line?
Is the pattern upward sloping or downward sloping?
Are the data values tightly clustered or widely dispersed?
Are there significant deviations from the pattern?
Strong Relationships:
In these two scatterplots the data are strongly related and fall in a straight
line.
In the scatterplot to the left the slope is positive, meaning as the X variable
increases the Y variable increases as well.
In the plot to the right the relationship is negative; as the X variable
increases, the Y variable decreases.
This is also called an inverse relationship.
Section 5.3 Building a Model
Building a Model:
Consider the problem of deciding how long to study for an
upcoming test.
If we knew the exact relationship between time spent studying
and the grade received, it could be useful in allocating study time.
One method of defining a precise relationship between two or
more variables is with the use of a mathematical model.
Suppose, for example, the relationship between test and study
time was given by the linear equation below:
Test Score = 45 + 3.8 (hours of study time).
If this mathematical model is accurate, then anyone would be able
to control his/her destiny. If a person only studied 10 hours,
according to the model his/her test score would be:
Test Score = 45 + 3.8 (10) = 83.
If this score is not high enough, then study 12 hours:
Test Score = 45 + 3.8 (12) = 90.6.
If you had to make a 95 on the test, how many hours do you have
to study?
95 = 45 + 3.8 (hours of study time)
hours of study time = (95 - 45) / 3.8 ≈ 13.16.
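The arithmetic above can be checked with a short Python sketch. The function names `predicted_score` and `hours_needed` are illustrative choices, not part of the original material; the intercept 45 and slope 3.8 come from the model stated above.

```python
def predicted_score(hours, intercept=45.0, slope=3.8):
    """Score predicted by the model: Test Score = 45 + 3.8 * hours."""
    return intercept + slope * hours


def hours_needed(target, intercept=45.0, slope=3.8):
    """Invert the model: solve target = intercept + slope * hours for hours."""
    return (target - intercept) / slope


# Reproduce the three calculations from the text (rounded for display).
print(round(predicted_score(10), 1))   # studying 10 hours
print(round(predicted_score(12), 1))   # studying 12 hours
print(round(hours_needed(95), 2))      # hours required for a score of 95
```

Solving for hours is just the model rearranged, so any target score can be converted to study time the same way, provided the model is taken at face value.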
Error in a Model:
Sorry folks, but there is no model that can precisely predict a
test score just on the basis of time studied; there are many
variables that affect your test score.
But suppose there was a model which, though imperfect, fairly
reliably predicted test scores based on the hours studied.
Test Score = 45 + 3.8 (hours of study time) + error
The new model admits the possibility of error. Now if
someone studies 10 hours, the model would predict
Test Score = 45 + 3.8 (10) + error = 83 + error
Linear Relationship:
A linear relationship is graphically described as a line.
Mathematically, a line is a set of points that satisfy the functional
relationship
y = mx + b
where m is the slope of the line and b is the point where the
function crosses the Y-axis, which is called the Y-intercept.
If two variables appear to be related in a straight-line manner, we can
use a linear equation to model their relationship.
Very few observed relationships are exactly linear, although most
follow an inexact linear pattern.
Linear Equation:
[Figure: graph of a linear equation, with y on the vertical axis.]
Linear Relationships:
[Figure: two plots, one in which Y increases as X increases and one in which Y decreases as X increases.]
Section 5.4 Measuring the Degree of Linear Relationship
Correlation Coefficient:
A scatter diagram is a useful exploratory tool for detecting
relationships between two variables.
Eventually a researcher will want to know the strength of the
relationship between the two variables.
Karl Pearson developed the correlation coefficient, r, to measure
the degree of linear relationship.
The correlation coefficient is an index number used to summarize
the strength of the linear relationship.
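$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right), \qquad -1 \le r \le 1$$
Do $\frac{x_i - \bar{x}}{s_x}$ and $\frac{y_i - \bar{y}}{s_y}$ look familiar?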
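As a minimal sketch of the formula for r, the Python function below sums the products of the paired z-scores and divides by n - 1. The function name `correlation` is an illustrative choice. The data are the five (X, Y) pairs from this chapter's least squares example; using them here is an assumption made for illustration only.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Pearson's r: sum of products of paired z-scores, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


xs = [2, 4, 5, 8, 9]
ys = [3, 2, 6, 5, 8]
r = correlation(xs, ys)
print(round(r, 3))
```

For these five points r comes out at roughly 0.785, a fairly strong positive linear relationship, and the result always lies between -1 and 1.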
Deviation Measures:
$\frac{x_i - \bar{x}}{s_x}$ is a z-score that shows how far x deviates
from its mean.
$\frac{y_i - \bar{y}}{s_y}$ is a z-score that shows how far y deviates
from its mean.
Positive Relationships:
When r is positive, there is a tendency for Y to increase as X
increases.
If both of the deviations are positive, then each of the
observations is above the mean.
If both are negative, then each is below the mean.
When one of the variables is above its mean, the other
variable tends to be above its mean.
If one variable is below its mean, the other tends to be below
its mean.
Positive Relationship:
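[Figure: scatterplot of a positive relationship, divided into regions by lines at the mean of X and the mean of Y.]
For points above the means of X and Y, the expression
$\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$ is positive.
For points below the means of X and Y, the expression
$\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$ is positive.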
Negative Relationship:
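[Figure: scatterplot of a negative relationship, divided into regions by lines at the mean of X and the mean of Y.]
For points below the mean of X and above the mean of Y, the expression
$\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$ is negative.
For points above the mean of X and below the mean of Y, the expression
$\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$ is negative.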
Correlation Pitfalls:
A high correlation does not imply causation.
Suppose that a high correlation has been observed between the
weekly sales of ice cream and the number of snake bites each week.
It seems unlikely that ice cream sales would cause snakes to bite
people or that more snake bites would cause higher ice cream sales.
The apparent relationship is an illusion caused by a phenomenon
called common response. This means that both variables are
related to a third variable.
Correlating summary measures (such as means) will tend to
provide an inflated correlation measurement.
Ignoring the variation of the individual values magnifies the
correlation measure and gives a somewhat distorted view of the
underlying relationship.
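The inflation from correlating means can be illustrated with a small Python sketch. The nine data points and their grouping into three groups are hypothetical, invented only for this illustration, and `correlation` is an illustrative helper that follows the z-score definition of r.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Pearson's r: sum of products of paired z-scores, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


# Hypothetical data: three groups of three (x, y) observations each.
groups = [
    [(1, 3), (2, 2), (3, 1)],
    [(4, 6), (5, 5), (6, 4)],
    [(7, 9), (8, 8), (9, 7)],
]

# Correlation computed from all nine individual observations.
all_x = [x for group in groups for x, _ in group]
all_y = [y for group in groups for _, y in group]
r_individual = correlation(all_x, all_y)

# Correlation computed from the three group means only.
mean_x = [mean(x for x, _ in group) for group in groups]
mean_y = [mean(y for _, y in group) for group in groups]
r_means = correlation(mean_x, mean_y)

print(round(r_individual, 3), round(r_means, 3))
```

In this made-up data set the group means fall exactly on a line, so their correlation is larger than the correlation of the individual values, whose within-group scatter is hidden once each group is replaced by its mean.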
Suppose there is good reason to believe that a causal
relationship exists between two variables, but the computed
correlation is near zero, suggesting no association.
A low correlation indicates only that no linear relationship exists;
the variables may still be related in a nonlinear way.
Nonlinear Relationship:
In the figure above, the relationship between X and Y is not a straight line.
The correlation measure for these points is going to be very close to zero.
Yet there does appear to be a strong relationship between X and Y. The kind
of relationship exhibited by this data is called a quadratic relationship.
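A quick Python sketch shows how an exact quadratic relationship can still give a correlation of essentially zero. The data (X from -3 to 3 with Y = X squared) are hypothetical and chosen to be symmetric about zero; `correlation` is an illustrative helper that follows the z-score definition of r.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Pearson's r: sum of products of paired z-scores, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


# Hypothetical data: Y is completely determined by X, but not linearly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

r_quadratic = correlation(xs, ys)
print(abs(r_quadratic) < 1e-9)  # True: r is zero up to rounding error
```

The positive and negative z-score products cancel, so r says nothing about how strongly Y depends on X here. This is why a scatterplot should accompany any correlation measure.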
Confounding:
Another problem that can produce low correlations is
confounding. Confounding occurs when more than one
variable affects the dependent variable.
[Diagram: the variables X and Z each affect Y.]
For example:
The variable Y is dependent on X. As X changes, Y changes.
Such a relationship should produce a significant correlation
measure.
But also suppose there is another variable Z, which also affects Y.
As Z changes so does Y. Changes in Z could mask the changes
caused by X.
Objectives:
Finding the Least Squares Line
Determining the slope of the line.
Calculating the y-intercept of the line.
Evaluating the fit of the model.
HAWKES LEARNING SYSTEMS
math courseware specialists
Regression Analysis:
In the previous section the correlation coefficient was used to
measure the degree of linear relationship between two variables.
However, the correlation coefficient does not describe the exact
linear association between X and Y.
Regression analysis determines the specific relationship
between X and Y.
Using regression analysis we may be able to use X to predict Y.
Regression Analysis:
Recall, the equation of a line is
y = mx + b,
where m is the slope and b is the y-intercept.
However, traditional statistics uses different symbols for the slope and
intercept in the equation of a line. Instead of b, let b0 be the symbol
used to describe the y-intercept, and instead of m, let b1 be the symbol
used to represent the slope of the line.
Using this new set of symbols, the equation of the line becomes
y = b0 + b1 x.
Regression Analysis:
The linear equation relating X to Y is referred to as a
mathematical model.
Y is called the dependent variable.
X is called the independent variable.
Now we are ready to look at examples of linear relationships.
Example:
[Figure: plot of the data with the model line. Plugging x = 4 into the model gives the predicted value, which is compared with the observed value at x = 4.]
Error:
To determine how well the line fits the data, first we need to
look at the error.
Error = observed Y - predicted Y = 2 - 3.8 = -1.8.
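Using symbols, the error for the i-th observation is $y_i - \hat{y}_i$, and the sum of squared errors is
$$\mathrm{SSE} = \sum \mathrm{error}_i^2 = \sum \left(y_i - \hat{y}_i\right)^2 = \sum \left(y_i - (b_0 + b_1 x_i)\right)^2$$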
The best line is called the Least Squares Line, and has the
smallest SSE.
Example:
Use this chart to determine the distance from the observed points to the line
Y = 1 + 0.7X.

X    Y    Predicted Y (1 + 0.7X)    Error (Y - predicted Y)    Error^2
2    3    2.4 = 1 + 0.7(2)          3 - 2.4 = +0.6             0.36
4    2    3.8 = 1 + 0.7(4)          2 - 3.8 = -1.8             3.24
5    6    4.5 = 1 + 0.7(5)          6 - 4.5 = +1.5             2.25
8    5    6.6 = 1 + 0.7(8)          5 - 6.6 = -1.6             2.56
9    8    7.3 = 1 + 0.7(9)          8 - 7.3 = +0.7             0.49

Sum of errors = -0.6
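The chart can be reproduced with a few lines of Python. This is a sketch under the chapter's stated line Y = 1 + 0.7X; the helper name `predict` and the variable names are illustrative.

```python
xs = [2, 4, 5, 8, 9]
ys = [3, 2, 6, 5, 8]


def predict(x, b0=1.0, b1=0.7):
    """Predicted Y on the line Y = b0 + b1 * X (defaults: Y = 1 + 0.7X)."""
    return b0 + b1 * x


# Error for each point is observed Y minus predicted Y.
errors = [y - predict(x) for x, y in zip(xs, ys)]
sum_errors = sum(errors)
sse = sum(e ** 2 for e in errors)  # sum of squared errors

for x, y, e in zip(xs, ys, errors):
    print(f"x={x} y={y} predicted={predict(x):.1f} "
          f"error={e:+.1f} error^2={e ** 2:.2f}")
print(f"sum of errors = {sum_errors:.1f}")
print(f"SSE = {sse:.2f}")
```

Positive and negative errors partly cancel in the plain sum, which is why the fit is judged by the sum of squared errors instead.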
$$b_1 = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}$$
$$b_0 = \frac{1}{n}\left(\sum y - b_1 \sum x\right)$$
The x and y referred to in the expressions are the observed data
values of X and Y respectively.
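As a sketch of how these expressions are used, the Python code below computes b1 and b0 for the five data points from the example chart and compares the resulting SSE with that of the line Y = 1 + 0.7X. The function names `least_squares` and `sse` are illustrative, not from the original material.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the least squares line using the summation formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    b1 = (n * sum_xy - sum_x * sum_y) / denom
    b0 = (sum_y - b1 * sum_x) / n
    return b0, b1


def sse(xs, ys, b0, b1):
    """Sum of squared errors of the line Y = b0 + b1 * X."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


xs = [2, 4, 5, 8, 9]
ys = [3, 2, 6, 5, 8]

b0, b1 = least_squares(xs, ys)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"SSE of least squares line = {sse(xs, ys, b0, b1):.3f}")
print(f"SSE of Y = 1 + 0.7X       = {sse(xs, ys, 1.0, 0.7):.3f}")
```

For this data the fitted slope and intercept land close to, but not exactly on, 0.7 and 1, and the least squares line has the smaller SSE of the two, as its definition requires.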