
Correlation Analysis

LEARNING OBJECTIVES

Understand and interpret the terms dependent
and independent variable.
Calculate and interpret the coefficient of
correlation, the coefficient of determination, and
the standard error of estimate.
Pearson's Product Moment Coefficient of
Correlation, rp
Spearman's Rank Coefficient of Correlation, rs

Correlation Analysis and Scatter Diagram


Correlation Analysis is the study of the relationship
between variables. It is also defined as a group of
techniques to measure the association between
two variables.
A Scatter Diagram is a chart that portrays the
relationship between the two variables. It is the
usual first step in correlation analysis.

Dependent vs. Independent Variable


DEPENDENT VARIABLE
The variable that is being
predicted or estimated. It is
scaled on the Y-axis.

INDEPENDENT VARIABLE
The variable that provides
the basis for estimation. It is
the predictor variable. It is
scaled on the X-axis.

Scatter Diagram

The Coefficient of Correlation, r


The Coefficient of Correlation (r) is a measure of
the strength of the relationship between two
variables. It requires interval or ratio-scaled data.
It can range from -1.00 to 1.00.
Values of -1.00 or 1.00 indicate perfect and strong
correlation.
Values close to 0.0 indicate weak correlation.
Negative values indicate an inverse relationship
and positive values indicate a direct relationship.
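
As a quick illustration of the sign and range of r, the sketch below computes r for two made-up data sets, one rising and one falling. The data and the helper name `corr` are hypothetical, and the deviation-from-mean form of r used here is the standard definition rather than a formula taken from these slides.

```python
from math import sqrt

def corr(x, y):
    """Coefficient of correlation r, computed from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data (not from the slides)
x = [1, 2, 3, 4, 5]
y_direct = [3, 5, 7, 9, 11]    # y = 2x + 1: a perfect direct relationship
y_inverse = [10, 8, 6, 4, 2]   # y = 12 - 2x: a perfect inverse relationship

r_direct = corr(x, y_direct)
r_inverse = corr(x, y_inverse)
print(round(r_direct, 2), round(r_inverse, 2))  # 1.0 -1.0
```

Both data sets lie exactly on a straight line, so the strength is the same (|r| = 1); only the sign differs, reflecting the direction of the relationship.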

Perfect Correlation

Scatter Plots and Correlation


A scatter plot (or scatter diagram) is used
to show the relationship between two
quantitative variables.
The linear relationship can be:
Positive: as x increases, y increases.
As advertising dollars increase, sales
increase.

Negative: as x increases, y decreases.
As expenses increase, net income decreases.

Scatter Plot Examples

[Figure: scatter plots of y against x showing linear relationships, curvilinear relationships, strong relationships, weak relationships, and no relationship]

Correlation Coefficient - Interpretation

Examples of Approximate r Values

[Figure: scatter plots of y against x with r = -1, r = -0.6, r = 0, r = +0.3, and r = +1]

Pearson's Product Moment Coefficient of Correlation, rp

Formula,

$$ r_p = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} $$

where
n = number of paired observations
rp = sample correlation coefficient
X = value of the independent variable
Y = value of the dependent variable
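
The product moment formula translates directly into code. The following is a minimal sketch, assuming the data arrive as two equal-length lists; the function name `pearson_r` and the small check data are hypothetical and not part of the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r_p via the computational (sums) formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of observations")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical check data (not from the slides):
# sum X = 10, sum Y = 10, sum XY = 28, sum X^2 = 30, sum Y^2 = 30
xs = [1, 2, 3, 4]
ys = [2, 1, 4, 3]
r = pearson_r(xs, ys)
print(round(r, 4))  # 0.6
```

The sums formula needs only one pass over the data, which is why it suits hand calculation with a table of XY, X², and Y² columns.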

The duration of the last 9 business trips made by an employee and
the corresponding expenses claimed are shown in the following
table.

No. of days     3     5     2    1    3     4     1     3     1
Expenses ($)    100   300   90   30   240   200   150   170   60

Calculate the product moment coefficient of correlation between
the number of days and expenses.
Solution
Let
X be the number of days
Y be the expenses

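         X     Y       XY      X²    Y²
         3     100     300     9     10,000
         5     300     1500    25    90,000
         2     90      180     4     8,100
         1     30      30      1     900
         3     240     720     9     57,600
         4     200     800     16    40,000
         1     150     150     1     22,500
         3     170     510     9     28,900
         1     60      60      1     3,600
Total    23    1340    4250    75    261,600

$$ r_p = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} = \frac{9(4250) - (23)(1340)}{\sqrt{\left[9(75) - (23)^2\right]\left[9(261{,}600) - (1340)^2\right]}} = 0.8226 $$

The coefficient of correlation, rp = +0.8226, indicates
that there is a ___________ _______________
correlation between the number of days and
expenses.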
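
The column totals and the final value can be checked with a few lines of Python. This is only a verification sketch: the X values below are the ones implied by the XY and Y columns of the solution table (X = XY / Y), and the variable names are illustrative.

```python
from math import sqrt

days = [3, 5, 2, 1, 3, 4, 1, 3, 1]                      # X, number of days
expenses = [100, 300, 90, 30, 240, 200, 150, 170, 60]   # Y, expenses ($)

n = len(days)
sum_x = sum(days)
sum_y = sum(expenses)
sum_xy = sum(x * y for x, y in zip(days, expenses))
sum_x2 = sum(x * x for x in days)
sum_y2 = sum(y * y for y in expenses)

# Same sums formula as on the slide
r_p = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 23 1340 4250 75 261600
print(round(r_p, 4))                         # 0.8226
```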

Question:
The following sample observations
were randomly selected.
X: 4 5 3 6 10
Y: 4 6 5 7 7
Compute rp. Interpret your answer.

Spearman's Rank Coefficient of Correlation, rs
Introduced in 1904 by Charles
Spearman (British psychologist).
Based on rank-order scores.
Works even if the original scores
are nonnumeric, provided they can be ranked.
Much less affected by outliers than Pearson's rp.

Spearman's Rank Coefficient of Correlation, rs
Formula,

$$ r_s = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)} $$

where
d = r1 - r2
r1 = ranks for x
r2 = ranks for y
n = number of paired observations
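
The rank formula is short enough to express as a single function. The sketch below assumes the two lists already hold ranks; the name `spearman_rs` and the two check cases are hypothetical.

```python
def spearman_rs(r1, r2):
    """Spearman's r_s from two lists of ranks, using 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    if len(r1) != len(r2):
        raise ValueError("both rank lists must have the same length")
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical checks: identical rankings and exactly reversed rankings
same_order = spearman_rs([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
reversed_order = spearman_rs([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
print(same_order, reversed_order)  # 1.0 -1.0
```

Identical rankings give every d = 0 and so rs = 1; fully reversed rankings give the largest possible sum of d² and so rs = -1.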

Example 1: Rank Coefficient of Correlation
(Data has already been ranked)
A German language teacher takes a group of 5 students. She ranks
them in order of how confident they are when speaking
(1 - extremely confident, 5 - not at all confident) and wants to
correlate this with performance in the oral examination. A different
teacher has given ratings of how well the students spoke in the oral
exam (1 - hopeless, 5 - excellent). The following table was obtained
as a result. Compute rs and interpret.

Person   Confidence   Oral exam performance
A        5            2
B        4            4
C        1            5
D        3            3
E        2            5

Example 1: Rank Coefficient of Correlation
(Data has already been ranked)
Solution
Let
r1 = rankings of confidence
r2 = rankings of oral exam performance

Person   r1   r2   d    d²
A
B
C
D
E
Total                   34

Example 1: Rank Coefficient of Correlation
(Data has already been ranked)

$$ r_s = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)} = 1 - \frac{6(34)}{5\left(5^2 - 1\right)} = 1 - 1.7 = -0.7 $$

The coefficient of rank correlation, rs = -0.7,
indicates that there is a ___________ ____________
between the confidence and oral exam performance
in their rankings.
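
The d and d² columns for Example 1 can be filled in programmatically as a check on the total of 34. This is a sketch using the confidence and oral exam values from the problem statement; the variable names are illustrative.

```python
persons = ["A", "B", "C", "D", "E"]
r1 = [5, 4, 1, 3, 2]   # rankings of confidence
r2 = [2, 4, 5, 3, 5]   # oral exam performance, as given in the table

d = [a - b for a, b in zip(r1, r2)]   # differences r1 - r2
d2 = [v ** 2 for v in d]              # squared differences
sum_d2 = sum(d2)

n = len(r1)
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

for row in zip(persons, r1, r2, d, d2):
    print(*row)
print("Total", sum_d2)   # Total 34
print(round(rs, 2))      # -0.7
```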

Example 2: Rank Coefficient of Correlation
(Data has not yet been ranked)
The following data relates to the marks obtained
by 5 students in the Economics and Statistics
examinations. Compute rs and interpret.

          Marks
Student   Economics   Statistics
1         36          52
2         98          91
3         75          68
4         65          53
5         82          62


Example 2: Rank Coefficient of Correlation
(Data has not yet been ranked)
Solution
Let
X = marks in Economics
Y = marks in Statistics
r1 = ranks for X
r2 = ranks for Y

Student   X    Y    r1   r2   d    d²
1         36   52   5    5    0    0
2         98   91   1    1    0    0
3         75   68   3    2    1    1
4         65   53   4    4    0    0
5         82   62   2    3    -1   1
Total                              2
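
When the data has not yet been ranked, the ranking step comes first. The sketch below assumes there are no tied marks and follows the table's convention that rank 1 goes to the highest mark; the helper name `rank_descending` is hypothetical.

```python
def rank_descending(scores):
    """Rank distinct scores so that the highest score receives rank 1."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]

economics = [36, 98, 75, 65, 82]         # X
statistics_marks = [52, 91, 68, 53, 62]  # Y

r1 = rank_descending(economics)
r2 = rank_descending(statistics_marks)
sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))

n = len(r1)
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(r1)            # [5, 1, 3, 4, 2]
print(r2)            # [5, 1, 2, 4, 3]
print(sum_d2)        # 2
print(round(rs, 2))  # 0.9
```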

Example 2: Rank Coefficient of Correlation
(Data has not yet been ranked)

$$ r_s = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)} = 1 - \frac{6(2)}{5\left(5^2 - 1\right)} = 1 - 0.1 = 0.9 $$

The coefficient of rank correlation, rs = +0.9,
indicates that there is a ___________
_____________ between the rankings in both
subjects.

Example 3: Rank Coefficient of Correlation
(Tied Ranks)
The following data relates to the marks obtained by 5
students in Accounting and Costing examinations.
Compute rs and interpret.

          Marks
Student   Accounting   Costing
1         86           91
2         86           82
3         77           68
4         63           77
5         89           77

Example 3: Rank Coefficient of Correlation
(Tied Ranks)
Solution
Let
X = marks in Accounting
Y = marks in Costing
r1 = ranks for X
r2 = ranks for Y

Student   X    Y    r1    r2    d      d²
1         86   91   2.5   1     1.5    2.25
2         86   82   2.5   2     0.5    0.25
3         77   68   4     5     -1     1
4         63   77   5     3.5   1.5    2.25
5         89   77   1     3.5   -2.5   6.25
Total                                  12
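
Tied marks share the average of the rank positions they occupy, which is how the 2.5 and 3.5 values in the table arise. The sketch below reproduces that ranking under the same rank-1-is-highest convention; the helper name `average_ranks` is hypothetical.

```python
def average_ranks(scores):
    """Rank scores with 1 = highest; tied scores share the mean of their positions."""
    ordered = sorted(scores, reverse=True)
    ranks = []
    for s in scores:
        first = ordered.index(s) + 1   # first position occupied by this score
        count = ordered.count(s)       # number of observations sharing the score
        ranks.append(first + (count - 1) / 2)  # mean of positions first..first+count-1
    return ranks

accounting = [86, 86, 77, 63, 89]   # X
costing = [91, 82, 68, 77, 77]      # Y

r1 = average_ranks(accounting)
r2 = average_ranks(costing)
sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))

n = len(r1)
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(r1)            # [2.5, 2.5, 4.0, 5.0, 1.0]
print(r2)            # [1.0, 2.0, 5.0, 3.5, 3.5]
print(sum_d2)        # 12.0
print(round(rs, 2))  # 0.4
```

When ties are present, this shortcut formula is generally treated as an approximation; applying the product moment formula to the ranks themselves is the usual exact alternative.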

Example 3: Rank Coefficient of Correlation
(Tied Ranks)

$$ r_s = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)} = 1 - \frac{6(12)}{5\left(5^2 - 1\right)} = 1 - 0.6 = 0.4 $$

The coefficient of rank correlation, rs = +0.4,
indicates that there is a ___________
____________ between the rankings in both
subjects.

Question:
The following sample observations
were randomly selected.
X: 4 5 3 6 10
Y: 4 6 5 7 7
Compute rs. Interpret your answer.

Coefficient of Determination
The coefficient of determination (r²) is the
proportion of the total variation in the dependent
variable (Y) that is explained or accounted for by
the variation in the independent variable (X).
It is the square of the coefficient of correlation.
It ranges from 0 to 1.
It does not give any information on the direction
of the relationship between the variables.
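
A short sketch shows why r² carries no information about direction: squaring removes the sign. The pair of r values is illustrative, and the last line uses the r = 0.8226 obtained in the business trip example.

```python
# Same strength, opposite direction
r_values = [-0.7, 0.7]
r_squared = [round(r ** 2, 2) for r in r_values]
print(r_squared)  # [0.49, 0.49]

# Business trip example: r = 0.8226
r_trips = 0.8226
r2_trips = round(r_trips ** 2, 3)
print(r2_trips)   # 0.677
```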

Coefficient of Determination (r²) - Example

Recall the business trip example (number of days and expenses).

The coefficient of determination, r², is 0.677,
found by (0.8226)².
This is a proportion or a percent; we can say that
67.7 percent of the variation in the expenses is
explained, or accounted for, by the variation in the
number of days.

Testing the Significance of
the Correlation Coefficient
H0: ρ = 0 (the correlation in the population is 0)
H1: ρ ≠ 0 (the correlation in the population is not 0)
Reject H0 if:
$t > t_{\alpha/2,\,n-2}$ or $t < -t_{\alpha/2,\,n-2}$
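
The test statistic itself does not appear in the text of these slides. A minimal sketch, assuming the standard t statistic for a correlation coefficient, t = r√(n-2) / √(1-r²) with n-2 degrees of freedom, is applied here to the business trip example (r = 0.8226, n = 9). The function name `t_statistic` is hypothetical, and the critical value 2.365 for 7 degrees of freedom at the 0.05 level (two-tailed) is taken from a standard t table rather than from the slides.

```python
from math import sqrt

def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

r, n = 0.8226, 9          # business trip example
t = t_statistic(r, n)

t_critical = 2.365        # assumed t-table value for alpha/2 = 0.025, df = 7
reject_h0 = t > t_critical or t < -t_critical

print(round(t, 2))  # 3.83
print(reject_h0)    # True
```

Under these assumptions the computed t falls in the rejection region, so the correlation between days and expenses would be judged significantly different from zero.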

Testing the Significance of
the Correlation Coefficient - Example
H0: ρ = 0 (the correlation in the population is 0)
H1: ρ ≠ 0 (the correlation in the population is not 0)
Reject H0 if:
$t > t_{\alpha/2,\,n-2}$ or $t < -t_{\alpha/2,\,n-2}$
$t > t_{0.025,\,8}$ or $t < -t_{0.025,\,8}$
t > 2.306 or t < -2.306

Testing the Significance of
the Correlation Coefficient - Example

The computed t (3.297) is within the rejection region; therefore, we
reject H0. This means the correlation in the population is not zero.
From a practical standpoint, it indicates to the sales manager that
there is a correlation between the number of sales calls made
and the number of copiers sold.
