
Correlation

Farrokh Alemi, Ph.D. Kashif Haqqi M.D.

Go to Table of Content

Additional Reading
For additional reading, see Chapter 6 in Michael R. Middleton's Data Analysis Using Excel, Duxbury Thompson Publishers, 2000. See also Chapter 4, Section 7 of Keller and Warrack's Statistics for Management and Economics, Fifth Edition, Duxbury Thompson Learning Publisher, 2000. Any introductory statistics book will also cover correlation.

Which Approach Is Appropriate When?


Choosing the right method for the data is the key statistical expertise that you need to have. You might want to review a decision tool that we have organized for you to help you in choosing the right statistical method.


Do I Need to Know the Formulas?


You do not need to know the exact formulas. You do need to know where they are in your reference book. You do need to understand the concept behind them and the general statistical concepts embedded in the use of the formulas. You do not need to be able to do correlation and regression by hand. You must be able to do it on a computer using Excel or other software.

Table of Content
Objectives
Independent and dependent variables
Example
Scatter plot
Correlation coefficient
Range of correlation coefficient
Formula for correlation coefficient
Example for correlation coefficient
Possible relationships between variables

Objectives
To learn the assumptions behind and the interpretation of correlation. To use Excel to calculate correlations.


Purpose of Correlation
Correlation determines whether values of one variable are related to values of another variable.

Independent and Dependent Variables


Independent variable: a variable that can be controlled or manipulated.
Dependent variable: a variable that cannot be controlled or manipulated. Its values are predicted from the independent variable.


Example
The independent variable in this example is the number of hours studied. The grade the student receives is the dependent variable: the grade depends on the number of hours he or she studies. Are these two variables related?
Student   Hours studied   Grade (%)
A         [missing]       82
B         2               63
C         1               57
D         5               88
E         3               68
F         2               75


Scatter Plot
The independent and dependent variables can be plotted on a graph called a scatter plot. By convention, the independent variable is plotted on the horizontal x-axis and the dependent variable on the vertical y-axis.


Example of Scatter Plot


A scatter plot is a graph of the ordered pairs (x, y), where x is a value of the independent variable and y is the corresponding value of the dependent variable. Use Excel to create a scatter plot.
[Figure: Scatter Plot. Horizontal axis: Hours Studied; vertical axis: Grade (%).]


Interpret a Scatter Plot

The graph suggests a positive relationship between hours studied and grades.

[Figure: Scatter Plot. Horizontal axis: Hours Studied; vertical axis: Grade (%).]


Correlation Coefficient
The correlation coefficient computed from the sample data measures the strength and direction of a relationship between two variables. It is denoted by r and ranges from -1 to +1.


Positive and Negative Correlations


A positive relationship exists when both variables increase or decrease at the same time (for example, weight and height).
A negative relationship exists when one variable increases as the other decreases, or vice versa (for example, strength and age).


Range of correlation coefficient


In case of an exact positive linear relationship, the value of r is +1. In case of a strong positive linear relationship, the value of r will be close to +1.

[Figure: Correlation = +1. Horizontal axis: Independent variable; vertical axis: Dependent variable.]


Range of correlation coefficient


In case of an exact negative linear relationship, the value of r is -1. In case of a strong negative linear relationship, the value of r will be close to -1.

[Figure: Correlation = -1. Horizontal axis: Independent variable; vertical axis: Dependent variable.]


Range of correlation coefficient

In case of a weak relationship, the value of r will be close to 0.

[Figure: Correlation = 0. Horizontal axis: Independent variable; vertical axis: Dependent variable.]


Range of correlation coefficient

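In case of a nonlinear relationship, the value of r can be close to 0 even when the two variables are clearly related, because r measures only the linear part of a relationship.

[Figure: Correlation = 0. Horizontal axis: Independent variable; vertical axis: Dependent variable.]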
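A minimal sketch of how a nonlinear relationship can give r near 0. The data are made up for illustration (y = x squared over a range symmetric about zero) and are not the points from the slide's figure; the helper name `pearson_r` is likewise an assumption.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient computed from sums of x, y, xy, x^2 and y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data: y is completely determined by x, but not linearly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

r_curve = pearson_r(xs, ys)
print(r_curve)  # 0.0: no linear association despite an exact relationship
```

Not every nonlinear relationship gives r = 0; this symmetric case is simply the clearest illustration of why a scatter plot should be inspected alongside r.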


Formula for correlation coefficient


The formula to compute a correlation coefficient is:
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$$

Where n is the number of data pairs, x is the independent variable and y the dependent variable.
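The formula translates directly into a few lines of code. The following is a minimal sketch in Python rather than Excel; the function name `pearson_r` and the small data sets are assumptions made for illustration.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient from the sums formula; n is the number of data pairs."""
    if len(xs) != len(ys):
        raise ValueError("x and y must contain the same number of values")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Hypothetical data: an exact increasing line and an exact decreasing line.
xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2 * x + 1 for x in xs])
r_down = pearson_r(xs, [10 - x for x in xs])
print(r_up, r_down)  # 1.0 -1.0
```

The two made-up lines hit the ends of the range: an exact positive linear relationship gives +1 and an exact negative one gives -1.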

Example for correlation coefficient


Let's do an example. Using the data on age and blood pressure, let's calculate $\sum x$, $\sum y$, $\sum xy$, $\sum x^2$ and $\sum y^2$.
Student   Age (x)   Blood pressure (y)   Age*BP (xy)   Age^2 (x^2)   BP^2 (y^2)
A         43        128                  5504          1849          16384
B         48        120                  5760          2304          14400
C         56        135                  7560          3136          18225
D         61        143                  8723          3721          20449
E         67        141                  9447          4489          19881
F         70        152                  10640         4900          23104
Sum       345       819                  47634         20399         112443


Example for correlation coefficient


Substitute in the formula and solve for r:
$$r = \frac{(6 \times 47634) - (345 \times 819)}{\sqrt{[(6 \times 20399) - 345^2][(6 \times 112443) - 819^2]}} = 0.897$$

The correlation coefficient suggests a strong positive relationship between age and blood pressure.
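The substitution can be checked step by step. This is a sketch in Python (an assumption; the course itself uses Excel) that rebuilds the column sums from the six age and blood pressure pairs and then applies the formula.

```python
import math

# Age (x) and blood pressure (y) for students A to F, as listed in the example.
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]

n = len(ages)                                          # 6 data pairs
sum_x = sum(ages)                                      # 345
sum_y = sum(pressures)                                 # 819
sum_xy = sum(x * y for x, y in zip(ages, pressures))   # 47634
sum_x2 = sum(x * x for x in ages)                      # 20399
sum_y2 = sum(y * y for y in pressures)                 # 112443

numerator = n * sum_xy - sum_x * sum_y                 # 3249
x_term = n * sum_x2 - sum_x ** 2                       # 3369
y_term = n * sum_y2 - sum_y ** 2                       # 3897
r = numerator / math.sqrt(x_term * y_term)

print(round(r, 3))  # 0.897
```

Printing the intermediate terms (3249, 3369, 3897) is a convenient way to locate an arithmetic slip when a hand calculation disagrees with the software.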

Possible Relationships Between Variables


Direct cause and effect: x causes y. For example, water causes a plant to grow.
Cause and effect in both directions: x causes y and y causes x. For example, coffee consumption causes nervousness, and nervous people also drink more coffee.
Relationship caused by a third variable: for example, deaths due to drowning and soft drink consumption during summer. Both variables are related to heat and humidity (the third variable).

Possible Relationships Between Variables


Complexity of interrelationships among many variables: for example, the relationship between students' high school grades and college grades. Other variables are involved too, such as IQ, hours of study, influence of parents, motivation, age, and instructors.
Coincidental relationship: for example, an increase in the number of people exercising and an increase in the number of people committing crimes.

Interpretation
The correlation between age and blood pressure is 0.90, which indicates a strong positive relationship.


Test of Correlation
Null hypothesis: the correlation is zero.
Test statistic:
$$t = r\sqrt{\frac{n-2}{1-r^2}}$$
The statistic follows a Student t distribution with n-2 degrees of freedom.
Excel does not calculate this statistic, so you need to calculate it manually.
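As a sketch of the manual calculation, the test statistic can be computed for the age and blood pressure example (r = 0.897, n = 6). The function name and the choice of a 5% two-tailed significance level are assumptions for illustration; the critical value 2.776 for 4 degrees of freedom comes from a standard t table.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing the null hypothesis that the correlation is zero."""
    if n <= 2:
        raise ValueError("at least three data pairs are needed")
    if abs(r) >= 1:
        raise ValueError("the statistic is undefined when r is exactly -1 or +1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r = 0.897          # correlation from the age and blood pressure example
n = 6              # number of data pairs
t = correlation_t_statistic(r, n)
degrees_of_freedom = n - 2

# Assumed significance level: 5%, two-tailed; value taken from a standard t table.
critical_t = 2.776
print(round(t, 2), degrees_of_freedom, abs(t) > critical_t)
```

With these inputs t comes out at roughly 4.06, which exceeds the assumed critical value, so under that assumption the null hypothesis of zero correlation would be rejected.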


Take Home Lesson


Correlation measures association, not causation.
Correlation assumes a linear relationship.
Values range between -1 and +1 and measure the strength and direction of the relationship.
