
Association Between Variables

Measured at the Interval-Ratio Level:


Bivariate Correlation and Regression
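Calculating the Correlation Coefficient:
Formula for Pearson's r

Definitional formula for Pearson's r:

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\left[\sum (X - \bar{X})^2\right]\left[\sum (Y - \bar{Y})^2\right]}}$$

*Use the computational formula to calculate*:

$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$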
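The two formulas are algebraically equivalent, which can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and the five (X, Y) pairs are made-up values chosen for easy arithmetic, not data from this presentation.

```python
from math import sqrt


def pearson_r_definitional(xs, ys):
    """Pearson's r from deviations about the means (definitional formula)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return numerator / sqrt(ss_x * ss_y)


def pearson_r_computational(xs, ys):
    """Pearson's r from raw sums (computational formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Made-up illustrative scores (an assumption, not from the source).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

r_def = pearson_r_definitional(xs, ys)
r_comp = pearson_r_computational(xs, ys)
print(round(r_def, 4), round(r_comp, 4))
```

The computational form avoids computing the means first, which is why it is the one recommended for hand calculation.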
If r =                 Interpretation
+.70 or higher         Very strong positive relationship
+.40 to +.69           Strong positive relationship
+.30 to +.39           Moderate positive relationship
+.20 to +.29           Weak positive relationship
+.01 to +.19           No or negligible relationship
-.01 to -.19           No or negligible relationship
-.20 to -.29           Weak negative relationship
-.30 to -.39           Moderate negative relationship
-.40 to -.69           Strong negative relationship
-.70 or lower          Very strong negative relationship
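The guideline above can be written as a small lookup. This is a sketch under two assumptions that the table does not settle: values that fall between the listed bands (such as .195) are assigned by simple lower thresholds, and r = 0 is treated as negligible. The function name is hypothetical.

```python
def describe_strength(r):
    """Return the verbal label for a Pearson's r, following the table's bands."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and +1.00")

    size = abs(r)
    if size >= 0.70:
        strength = "very strong"
    elif size >= 0.40:
        strength = "strong"
    elif size >= 0.30:
        strength = "moderate"
    elif size >= 0.20:
        strength = "weak"
    else:
        # Below .20 the table gives no direction, only "no or negligible".
        return "no or negligible relationship"

    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} relationship"


print(describe_strength(0.32))
print(describe_strength(-0.75))
```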
Pearson's r

Like Gamma, r varies from -1.00 to +1.00.

Pearson's r is a measure of association for interval-ratio variables.
For the hypothetical relationship between % college educated and turnout, assume r = .32.
This relationship would be positive and moderate.
As level of education increases, % turnout increases.
The Coefficient of Determination: r²

Total variation in Y, $\sum (Y - \bar{Y})^2$, is the sum of the explained variation, $\sum (Y' - \bar{Y})^2$, and the unexplained variation, $\sum (Y - Y')^2$, where Y' is the value of Y predicted from X.

The proportion of the total variation that is explained by X is given by the formula:

$$r^2 = \frac{\sum (Y' - \bar{Y})^2}{\sum (Y - \bar{Y})^2}$$

Or, alternatively: r² = (r)²


Practical Example

The computation and interpretation of Pearson's r and r² will be illustrated using an example (% Turnout by Education (Years of Schooling)), but with only 5 cases.
The variables are:
Voter turnout (Y) is the dependent variable.
Average years of school (X) is the independent variable.
The sample is 5 cities.
This is only to simplify the calculation. A sample of 5 is actually very small.
Data from Problem 15.1:

The scores on each variable are displayed in table format:
X = Years of Education
Y = % Turnout

City    X       Y
A       11.9    55
B       12.1    60
C       12.7    65
D       12.8    68
E       13.0    70
Make a Computational Table:

X       Y     X²        Y²      XY
11.9    55    141.61    3025    654.5
12.1    60    146.41    3600    726
12.7    65    161.29    4225    825.5
12.8    68    163.84    4624    870.4
13.0    70    169       4900    910
ΣX = 62.5    ΣY = 318    ΣX² = 782.15    ΣY² = 20374    ΣXY = 3986.4

Sums (Σ) are needed to compute Pearson's r.

As well, the means of X and Y are needed:

$$\bar{X} = \frac{\sum X}{n} = \frac{62.5}{5} = 12.5 \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{318}{5} = 63.6$$
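The computational table can also be rebuilt programmatically as a check on the hand arithmetic. The following is a minimal sketch using the five city scores given earlier; the variable names are illustrative choices, not part of the original example.

```python
# Scores for the five cities (X = years of education, Y = % turnout).
xs = [11.9, 12.1, 12.7, 12.8, 13.0]
ys = [55, 60, 65, 68, 70]
n = len(xs)

# One row per city: X, Y, X^2, Y^2, XY.
rows = [(x, y, x * x, y * y, x * y) for x, y in zip(xs, ys)]

# Column totals, in the same order as the row entries.
sum_x, sum_y, sum_x2, sum_y2, sum_xy = (sum(col) for col in zip(*rows))

# Means of X and Y.
mean_x = sum_x / n
mean_y = sum_y / n

print(f"{'X':>6} {'Y':>4} {'X^2':>8} {'Y^2':>6} {'XY':>7}")
for x, y, x2, y2, xy in rows:
    print(f"{x:>6.1f} {y:>4d} {x2:>8.2f} {y2:>6d} {xy:>7.1f}")
print(f"sums:  {sum_x:.1f} {sum_y} {sum_x2:.2f} {sum_y2} {sum_xy:.1f}")
print(f"means: {mean_x:.1f} {mean_y:.1f}")
```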
Pearson's r

Calculate the correlation coefficient r:

$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$
Interpret Pearson's r

$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} = \frac{5(3986.4) - (62.5)(318)}{\sqrt{\left[5(782.15) - (62.5)^2\right]\left[5(20374) - (318)^2\right]}} = .984$$

An r of 0.98 indicates an extremely strong relationship between years of education and voter turnout for these five cities.
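The substitution can be traced step by step by plugging the column sums from the computational table into the computational formula. This is a sketch of the arithmetic only; the intermediate variable names are illustrative.

```python
from math import sqrt

# Column sums taken from the computational table for the five cities.
n = 5
sum_x, sum_y = 62.5, 318
sum_x2, sum_y2, sum_xy = 782.15, 20374, 3986.4

# Numerator: n * sum(XY) - sum(X) * sum(Y)
numerator = n * sum_xy - sum_x * sum_y

# The two bracketed terms under the square root.
x_term = n * sum_x2 - sum_x ** 2
y_term = n * sum_y2 - sum_y ** 2

r = numerator / sqrt(x_term * y_term)
print(f"numerator = {numerator:.2f}")
print(f"x term = {x_term:.2f}, y term = {y_term:.2f}")
print(f"r = {r:.3f}")
```

Printing the intermediate terms makes it easier to spot which part of a hand calculation went wrong if the final r does not match.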
Find the Coefficient of Determination (r²) and Interpret:

$$r^2 = (r)^2 = (.984)^2 = .968$$

The coefficient of determination is r² = .968.
Education, by itself, explains 96.8% of the variation in voter turnout.
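The same value can be reached from the explained-over-total definition of r². The sketch below fits a least-squares line Y' = a + bX to the five cities and splits the variation in Y. The slope and intercept formulas are the standard least-squares ones and are an assumption here, since this section does not show them.

```python
# Scores for the five cities (X = years of education, Y = % turnout).
xs = [11.9, 12.1, 12.7, 12.8, 13.0]
ys = [55, 60, 65, 68, 70]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope (b) and intercept (a); assumed, not given in the source.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

# Predicted turnout Y' for each city.
predicted = [intercept + slope * x for x in xs]

# Partition the variation in Y.
total = sum((y - mean_y) ** 2 for y in ys)
explained = sum((p - mean_y) ** 2 for p in predicted)
unexplained = sum((y - p) ** 2 for y, p in zip(ys, predicted))

r_squared = explained / total
print(f"total = {total:.2f}")
print(f"explained = {explained:.2f}, unexplained = {unexplained:.2f}")
print(f"r squared = {r_squared:.3f}")
```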
Testing r for significance:

We can test the relationship between % turnout and years of education (represented by Pearson's r) for significance using the 5 step model and the following formula:

$$t(\text{obtained}) = r\sqrt{\frac{n - 2}{1 - r^2}}$$

Degrees of Freedom = n - 2


Step 1: Assumptions
There are 3 main assumptions:
1. The dependent and independent variables are normally distributed. We can check this by looking at the histograms for the two variables.
2. The relationship between X and Y is linear. We can check this by looking at the scattergram.
3. The relationship is homoscedastic. We can check homoscedasticity by looking at the scattergram and observing that the data points form a roughly symmetrical, cigar-shaped pattern about the regression line.

If the above 3 assumptions have been met, then we can use linear regression and correlation and test r for significance.
Step 2: Null and Alternate Hypotheses:
H0: ρ = 0.0
H1: ρ ≠ 0.0
(Note that ρ (rho) is the population parameter, while r is the sample statistic.)

Step 3: Sampling Distribution and Critical Region:

Sampling distribution = t distribution
Alpha = .05 (two-tailed)
DF = n - 2 = 5 - 2 = 3
t(critical) = ±3.182
Step 4. Computing the Test Statistic:
Use Formula

$$t(\text{obtained}) = r\sqrt{\frac{n - 2}{1 - r^2}} = .984\sqrt{\frac{5 - 2}{1 - (.984)^2}} = 9.53$$
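The test statistic is sensitive to rounding because 1 - r² is very small here. The sketch below computes it two ways: with the rounded values r = .984 and r² = .968, which appears to be how 9.53 was obtained, and with r carried at full precision from the column sums, which gives about 9.50. The function name is hypothetical; either value leads to the same decision.

```python
from math import sqrt


def t_obtained(r, n):
    """t statistic for testing Pearson's r, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r ** 2))


# Using the rounded values r = .984 and r^2 = .968.
t_rounded = 0.984 * sqrt((5 - 2) / (1 - 0.968))

# Using r at full precision, recomputed from the column sums.
n = 5
sum_x, sum_y = 62.5, 318
sum_x2, sum_y2, sum_xy = 782.15, 20374, 3986.4
r_unrounded = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
t_unrounded = t_obtained(r_unrounded, n)

print(f"t with rounded r and r squared: {t_rounded:.2f}")
print(f"t with unrounded r:             {t_unrounded:.2f}")
```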

Step 5. Decision and Interpretation:

t(obtained) = 9.53 > t(critical) = 3.182
Reject H0. The relationship between % turnout and years of schooling is significant.
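The decision rule in Step 5 can be stated as a short function. This is a sketch assuming a two-tailed test, so the absolute value of t(obtained) is compared with the positive critical value; the function name is hypothetical and the critical value is taken from the example rather than computed.

```python
def decide(t_obtained, t_critical):
    """Two-tailed decision: reject H0 when |t(obtained)| exceeds t(critical)."""
    if abs(t_obtained) > t_critical:
        return "reject H0"
    return "fail to reject H0"


# Values from the five-city example: t(obtained) = 9.53, t(critical) = 3.182.
decision = decide(9.53, 3.182)
print(decision)
```

Taking the absolute value means a strong negative r would be handled the same way as a strong positive one.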
Always include a brief summary of your results:

There is a very strong, positive relationship between % voter turnout and years of schooling for the five cities. As years of schooling increase, the % of voter turnout goes up. The relationship is significant (t = 9.53, df = 3, α = .05). Years of schooling explain 96.8% of the variation in % voter turnout.
TESTING PEARSON'S r FOR SIGNIFICANCE
To illustrate this test, the r of 0.50 from the dual wage-earner family sample will be used. As was the case when testing gamma and Spearman's rho, the null hypothesis states that there is no linear association between the two variables in the population from which the sample was drawn. The population parameter is symbolized as ρ (rho), and the appropriate sampling distribution is the t distribution.
Step 5. Making a Decision and Interpreting the Results of the Test.

Since the test statistic does not fall into the critical region as marked by t(critical), we fail to reject the null hypothesis. Even though the variables are substantially related in the sample, we cannot conclude that they are also related in the population.
The test indicates that the sample value of r = 0.50 could have occurred by chance alone if the null hypothesis is true and the variables are unrelated in the population.
