
CORRELATION

Introduction
“Correlation is a statistical technique which shows the relationship between two or more
variables.”
Types of Correlation
a) Positive and Negative Correlation
b) Linear and Non-Linear Correlation
c) Simple and Multiple Correlation
When only two variables are involved, it is known as simple correlation, e.g. the relation between
price and demand, or supply and price.
Correlation is said to be multiple when the relation among more than two variables is studied, e.g.
the relation between income, saving and investment.
Methods to calculate Correlation
Correlation can be measured by the following methods:
A. Graphical Methods
Scatter diagram or scattergram.
B. Mathematical Methods
i. Karl Pearson’s Coefficient of Correlation method or Karl Pearson’s method;
ii. Spearman’s Coefficient of Correlation method or Spearman’s method; and
iii. Concurrent Deviations Method.

GRAPHICAL METHOD
It is one of the simplest procedures to judge correlation between two variables. One variable is taken
along X-axis and other along Y-axis. The graph is known as scatter diagram.
After plotting the two variables, we judge the pattern formed by the dots. There are three types
of relation.
1. Positive Correlation:
When the dots form a band rising diagonally from bottom left to top right, the figure indicates
positive correlation.
2. Negative Correlation:
When the dots form a band falling diagonally from top left to bottom right, the figure indicates
negative correlation.
3. No Correlation or Zero Correlation:
When the plotted dots show no specific trend, we cannot infer any relation between the variables,
and we conclude that there is no correlation between them.

Uday N
Assistant Prof. of Commerce
MATHEMATICAL METHODS
Karl Pearson’s Method
Karl Pearson formulated what is perhaps the most widely used formula for measuring the degree
and direction of correlation between two variables.
Assumptions
Karl Pearson based his formula on the following basic assumptions:
a) The two variables are affected by many independent causes and form a normal distribution.
b) A cause-and-effect relationship exists between the two variables.
c) The relationship between the two variables is linear.
The coefficient is usually denoted by r.

A. Direct Method
Type I: This method is used when the values of the given variables are small in magnitude. The
following formula is used.

$$r = \frac{N \sum XY - \sum X \sum Y}{\sqrt{N \sum X^2 - (\sum X)^2}\;\sqrt{N \sum Y^2 - (\sum Y)^2}}$$

where
$N$ is the number of pairs,
$\sum X$ is the sum of the terms of the first (X) series,
$\sum Y$ is the sum of the terms of the second (Y) series,
$\sum X^2$ is the sum of the squares of the terms of the first series,
$\sum Y^2$ is the sum of the squares of the terms of the second series,
$\sum XY$ is the sum of the products of corresponding terms.

1. Calculate Karl Pearson’s Coefficient of Correlation between the age and weight of the children.
Age (years) 1 2 3 4 5
Weight (kg) 3 4 6 7 12

Solution:
Age (X)   Weight (Y)   $X^2$   $Y^2$   $XY$
1         3            1       9       3
2         4            4       16      8
3         6            9       36      18
4         7            16      49      28
5         12           25      144     60
$\sum X = 15$, $\sum Y = 32$, $\sum X^2 = 55$, $\sum Y^2 = 254$, $\sum XY = 117$

$$r = \frac{N \sum XY - \sum X \sum Y}{\sqrt{N \sum X^2 - (\sum X)^2}\;\sqrt{N \sum Y^2 - (\sum Y)^2}}$$

$$r = \frac{5 \times 117 - 15 \times 32}{\sqrt{5 \times 55 - (15)^2}\;\sqrt{5 \times 254 - (32)^2}} = \frac{585 - 480}{\sqrt{275 - 225}\;\sqrt{1270 - 1024}}$$

$$= \frac{105}{\sqrt{50 \times 246}} = \frac{105}{\sqrt{12300}} = \frac{105}{110.91} \approx 0.9468$$
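The direct method can be checked numerically. The following is a minimal sketch in Python, assuming the data are held in plain lists; the function name `pearson_direct` is illustrative and not part of the original notes.

```python
from math import sqrt

def pearson_direct(xs, ys):
    """Pearson's r from raw sums (direct method, Type I)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Age and weight data from worked example 1.
age = [1, 2, 3, 4, 5]
weight = [3, 4, 6, 7, 12]
r = pearson_direct(age, weight)
print(round(r, 4))
```

Keeping the five sums as separate variables mirrors the columns of the hand-worked table, so each intermediate value can be compared with the table totals.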

Type II: This is also a direct formula for r. It can be used effectively when the means $\bar{X}$
and $\bar{Y}$ are not fractions. The formula is

$$r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}}$$

where
$dx$ is the deviation of the X variable from its mean $\bar{X}$,
$dy$ is the deviation of the Y variable from its mean $\bar{Y}$,
$dx\,dy$ is the product of the two above.

$dx^2$ is the square of $dx$, and $dy^2$ is the square of $dy$.

2. Calculate coefficient of correlation between death and birth rate for the following data.
Birth rate 24 26 32 33 35 30
Death rate 15 20 22 24 27 24

Solution
Birth rate (X)   Death rate (Y)   $dx = X - \bar{X}$   $dy = Y - \bar{Y}$   $dx^2$   $dy^2$   $dx\,dy$
24               15               -6                   -7                   36       49       42
26               20               -4                   -2                   16       4        8
32               22               2                    0                    4        0        0
33               24               3                    2                    9        4        6
35               27               5                    5                    25       25       25
30               24               0                    2                    0        4        0
$\sum X = 180$, $\sum Y = 132$, $\sum dx = 0$, $\sum dy = 0$, $\sum dx^2 = 90$, $\sum dy^2 = 86$, $\sum dx\,dy = 81$

$$\bar{X} = \frac{180}{6} = 30, \qquad \bar{Y} = \frac{132}{6} = 22$$

$$r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} = \frac{81}{\sqrt{90 \times 86}} = \frac{81}{\sqrt{7740}} = \frac{81}{87.98} = 0.92$$
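The deviation form (Type II) can be sketched the same way. This is an illustrative Python version, assuming deviations are taken from the actual arithmetic means; `pearson_from_means` is a hypothetical name chosen for this example.

```python
from math import sqrt

def pearson_from_means(xs, ys):
    """Pearson's r using deviations from the actual means (Type II)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]          # deviations of X from its mean
    dy = [y - mean_y for y in ys]          # deviations of Y from its mean
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)

# Birth and death rates from worked example 2.
birth = [24, 26, 32, 33, 35, 30]
death = [15, 20, 22, 24, 27, 24]
r = pearson_from_means(birth, death)
print(round(r, 2))
```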

B. Short-cut Method

$$r = \frac{N \sum dx\,dy - \sum dx \sum dy}{\sqrt{N \sum dx^2 - (\sum dx)^2}\;\sqrt{N \sum dy^2 - (\sum dy)^2}}$$

where
$\sum dx$ = sum of the deviations of the X series from its assumed mean
$\sum dy$ = sum of the deviations of the Y series from its assumed mean
$\sum dx^2$ = sum of the squared deviations of the X series from its assumed mean
$\sum dy^2$ = sum of the squared deviations of the Y series from its assumed mean
$\sum dx\,dy$ = sum of the products of the deviations of the X and Y series from their respective assumed means
$N$ = number of pairs

3. Calculate coefficient of correlation between profits of two firms X and Y in a particular year
using Karl Pearson’s Method.
Profit of Firm X 14 12 14 16 16 17 16 15
Profit of Firm Y 13 11 10 15 15 9 14 17

Solution:
Let the assumed means be $A_x = 15$ and $A_y = 14$.

X     Y     $dx = X - A_x$   $dx^2$   $dy = Y - A_y$   $dy^2$   $dx\,dy$
14    13    -1               1        -1               1        1
12    11    -3               9        -3               9        9
14    10    -1               1        -4               16       4
16    15    1                1        1                1        1
16    15    1                1        1                1        1
17    9     2                4        -5               25       -10
16    14    1                1        0                0        0
15    17    0                0        3                9        0
$\sum dx = 0$, $\sum dx^2 = 18$, $\sum dy = -8$, $\sum dy^2 = 62$, $\sum dx\,dy = 6$

$$r = \frac{N \sum dx\,dy - \sum dx \sum dy}{\sqrt{N \sum dx^2 - (\sum dx)^2}\;\sqrt{N \sum dy^2 - (\sum dy)^2}}$$

$$= \frac{8 \times 6 - (0)(-8)}{\sqrt{8 \times 18 - (0)^2}\;\sqrt{8 \times 62 - (-8)^2}} = \frac{48 - 0}{\sqrt{144 - 0}\;\sqrt{496 - 64}} = \frac{48}{\sqrt{144 \times 432}}$$

$$= \frac{48}{\sqrt{62208}} = \frac{48}{249.41} = 0.192 \approx 0.19$$
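A useful check on the short-cut method is that the result does not depend on which assumed means are chosen. The sketch below illustrates this in Python; `pearson_shortcut` is an illustrative name, and the second pair of assumed means (10 and 10) is an arbitrary choice made only for the comparison.

```python
from math import sqrt

def pearson_shortcut(xs, ys, assumed_x, assumed_y):
    """Pearson's r using deviations from assumed means (short-cut method)."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dx2 = sum(d * d for d in dx)
    sum_dy2 = sum(d * d for d in dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = sqrt(n * sum_dx2 - sum_dx ** 2) * sqrt(n * sum_dy2 - sum_dy ** 2)
    return numerator / denominator

# Profits of firms X and Y from worked example 3.
firm_x = [14, 12, 14, 16, 16, 17, 16, 15]
firm_y = [13, 11, 10, 15, 15, 9, 14, 17]

r = pearson_shortcut(firm_x, firm_y, 15, 14)        # assumed means used in the solution
r_other = pearson_shortcut(firm_x, firm_y, 10, 10)  # arbitrary alternative choice
print(round(r, 3), round(r_other, 3))
```

Both calls return the same coefficient, because the correction terms $\sum dx \sum dy$, $(\sum dx)^2$ and $(\sum dy)^2$ cancel the effect of the chosen origin.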

Problems to be solved
1. Calculate Karl Pearson’s Coefficient of Correlation.
X 1 2 3 4 5
Y 5 8 11 14 17

2. Calculate Karl Pearson’s coefficient of correlation between X and Y series.


X 14 12 14 16 16 17 16 15
Y 13 11 10 15 15 9 14 17

3. Calculate the coefficient of correlation by Karl Pearson’s method from the following data.


Marks in Economics 65 66 67 67 68 69 70 72
Marks in Statistics 67 68 65 68 72 72 69 71

4. Calculate Karl Pearson’s coefficient of correlation from the data given below
Ages of Husbands (Years) 22 25 26 28 30 32 34 37 41 45
Ages of Wives (Years) 18 20 21 25 26 29 32 35 40 44

5. Calculate Karl Pearson’s coefficient of correlation from the following data


$N = 12$
$\sum dx = -14$
$\sum dy = 18$
$\sum dx^2 = 4304$
$\sum dy^2 = 6308$
$\sum dx\,dy = 1510$

Properties of Karl Pearson’s Coefficient of Correlation
1. Karl Pearson’s coefficient of correlation lies between -1 and +1, i.e. $-1 \le r \le 1$; in other
words, r never exceeds one in absolute value.
2. Karl Pearson’s coefficient is independent of change of scale. For example, if the terms of a
series are 25, 50, 100, 150, 200, 225, we can simplify these terms to 1, 2, 4, 6, 8, 9 (dividing
each by 25) and calculate r more easily.
3. Karl Pearson’s coefficient of correlation is independent of change of origin. For example, we may
take 23, 27, 29, 25, 29, 31 as 1, 5, 7, 3, 7, 9 (decreasing each term by 22).
4. If $b_{xy}$ and $b_{yx}$ are the two regression coefficients, Karl Pearson’s coefficient of
correlation is $r = \pm\sqrt{b_{xy} \times b_{yx}}$, taking the common sign of the two regression coefficients.
5. r is independent of the units of measurement.
6. Karl Pearson’s coefficient of correlation is symmetric, i.e. $r_{xy} = r_{yx}$: whichever series is
taken as dependent and whichever as independent, its value remains the same.
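Properties 1, 2, 3 and 6 can be illustrated numerically. The sketch below is only a demonstration: it pairs the two example series quoted in properties 2 and 3 as X and Y, which is an assumption made here for illustration (the notes do not pair them), and `pearson` is an illustrative helper.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's r from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_dx2 = sum((x - mean_x) ** 2 for x in xs)
    sum_dy2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)

x = [25, 50, 100, 150, 200, 225]   # series quoted in property 2
y = [23, 27, 29, 25, 29, 31]       # series quoted in property 3

x_scaled = [v / 25 for v in x]     # change of scale: 1, 2, 4, 6, 8, 9
y_shifted = [v - 22 for v in y]    # change of origin: 1, 5, 7, 3, 7, 9

r_original = pearson(x, y)
r_transformed = pearson(x_scaled, y_shifted)
r_swapped = pearson(y, x)          # roles of the two series exchanged

print(round(r_original, 4), round(r_transformed, 4), round(r_swapped, 4))
```

All three printed values agree, and the coefficient stays within the interval from -1 to +1.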

1. The regression coefficient of X on Y is 0.87 and the regression coefficient of Y on X is 0.49. Find r.
Solution:

As $r = \sqrt{b_{xy} \times b_{yx}}$ (both coefficients are positive, so r is positive),

$$r = \sqrt{0.87 \times 0.49} = \sqrt{0.4263} = 0.653$$

2. The covariance between the X and Y variables is 10.6 and the variances of X and Y are 16 and 9 respectively. Find r.

Solution:

$$r = \frac{\text{Covariance}}{\sigma_x \sigma_y} = \frac{10.6}{\sqrt{16} \cdot \sqrt{9}} = \frac{10.6}{12} = 0.883$$

3. The coefficient of correlation between two variates X and Y is 0.48. Their covariance is 36. The
variance of X is 16. Find the standard deviation of the Y series.
Solution:

$\sigma_x^2 = 16$, so $\sigma_x = 4$.

$$r = \frac{\text{Covariance}}{\sigma_x \cdot \sigma_y} \quad\Rightarrow\quad 0.48 = \frac{36}{4 \cdot \sigma_y}$$

$$\sigma_y = \frac{36}{4 \times 0.48} = \frac{36}{4} \times \frac{100}{48} = \frac{75}{4} = 18.75$$
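The three relations used in these examples can be collected in a few small functions. This is a hedged sketch: the function names are illustrative, and the sign rule in `r_from_regression` relies on the standard fact that both regression coefficients carry the same sign as r.

```python
from math import sqrt, copysign

def r_from_regression(b_xy, b_yx):
    """r as the signed geometric mean of the two regression coefficients."""
    if b_xy * b_yx < 0:
        raise ValueError("regression coefficients must have the same sign")
    return copysign(sqrt(b_xy * b_yx), b_xy)

def r_from_covariance(cov_xy, var_x, var_y):
    """r = Cov(X, Y) / (sd_x * sd_y), with variances supplied."""
    return cov_xy / (sqrt(var_x) * sqrt(var_y))

def sd_y_from_r(r, cov_xy, var_x):
    """Solve r = Cov / (sd_x * sd_y) for sd_y."""
    return cov_xy / (r * sqrt(var_x))

r1 = r_from_regression(0.87, 0.49)       # example 1
r2 = r_from_covariance(10.6, 16, 9)      # example 2
sd_y = sd_y_from_r(0.48, 36, 16)         # example 3
print(round(r1, 3), round(r2, 3), round(sd_y, 2))
```

A result such as $\sigma_y = 18.75$ is worth sanity-checking against the given variance of X, since an implausibly large or small value usually signals a transposed figure in the question.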

To be solved.
1. The regression coefficient of X on Y is 0.34 and that of Y on X is 0.93. Calculate r.
2. The regression coefficient of X on Y is 0.07 and that of Y on X is 0.63. Calculate r.
3. If the Covariance between X and Y variables is 10 and the variance of X and Y are
respectively 16 and 9, find the coefficient of correlation.
4. Coefficient of correlation between two variables X and Y is 0.48. Their covariance is 36. The
variance of X is 6. Find the standard deviation of Y series.

WRONG TERMS INCLUDED


1. A computer, while calculating the correlation coefficient between two variables X and Y from 30
pairs of observations, obtained the following results:
$\sum X = 120$, $\sum X^2 = 600$, $\sum XY = 356$
$\sum Y = 90$, $\sum Y^2 = 250$

It was, however, later discovered at the time of checking that it had copied down two pairs as
X     Y
8     10
12    7

while the correct values are

X     Y
8     12
10    8
Obtain the correct value of the correlation coefficient.
Solution:
Correct $\sum X = 120 + 8 + 10 - 8 - 12 = 118$
Correct $\sum Y = 90 + 12 + 8 - 10 - 7 = 93$
Correct $\sum X^2 = 600 + 8^2 + 10^2 - 8^2 - 12^2 = 556$
Correct $\sum Y^2 = 250 + 12^2 + 8^2 - 10^2 - 7^2 = 309$
Correct $\sum XY = 356 + (8 \times 12) + (10 \times 8) - (8 \times 10) - (12 \times 7) = 368$

$$\text{Correct } r = \frac{30 \times 368 - 118 \times 93}{\sqrt{30 \times 556 - 118^2}\;\sqrt{30 \times 309 - 93^2}}$$

$$= \frac{11040 - 10974}{\sqrt{16680 - 13924}\;\sqrt{9270 - 8649}} = \frac{66}{\sqrt{2756 \times 621}} = \frac{66}{1308.23} = 0.05$$
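The correction procedure (subtract the contribution of each wrongly copied pair, add the contribution of each correct pair, then apply the direct formula) can be sketched as follows. The dictionary keys and function names are illustrative choices, not notation from the notes.

```python
from math import sqrt

def corrected_sums(sums, wrong_pairs, right_pairs):
    """Remove the wrong pairs from the running sums and add the right ones."""
    s = dict(sums)
    for sign, pairs in ((-1, wrong_pairs), (+1, right_pairs)):
        for x, y in pairs:
            s["x"] += sign * x
            s["y"] += sign * y
            s["x2"] += sign * x * x
            s["y2"] += sign * y * y
            s["xy"] += sign * x * y
    return s

def r_from_sums(n, s):
    """Direct-method r from the five sums."""
    numerator = n * s["xy"] - s["x"] * s["y"]
    denominator = sqrt(n * s["x2"] - s["x"] ** 2) * sqrt(n * s["y2"] - s["y"] ** 2)
    return numerator / denominator

recorded = {"x": 120, "y": 90, "x2": 600, "y2": 250, "xy": 356}
fixed = corrected_sums(
    recorded,
    wrong_pairs=[(8, 10), (12, 7)],
    right_pairs=[(8, 12), (10, 8)],
)
r = r_from_sums(30, fixed)
print(fixed, round(r, 2))
```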

2. Given: number of pairs of observations of the X and Y series = 10

X series arithmetic mean = 70
X series assumed mean = 65
X series standard deviation = 12
Y series arithmetic mean = 120
Y series assumed mean = 110
Y series standard deviation = 10
Sum of the products of corresponding deviations of the X and Y series (from their assumed means) = 1450. Find r.
Solution
Given $N = 10$, $\bar{X} = 70$, $\bar{Y} = 120$, $\sigma_x = 12$, $\sigma_y = 10$,
assumed means $A_x = 65$, $A_y = 110$, and $\sum dx\,dy = 1450$.
As

$$r = \frac{\sum dx\,dy - N(\bar{X} - A_x)(\bar{Y} - A_y)}{N\,\sigma_x\,\sigma_y}$$

$$r = \frac{1450 - 10(70 - 65)(120 - 110)}{10 \times 12 \times 10} = \frac{1450 - 10 \times 5 \times 10}{1200} = \frac{1450 - 500}{1200} = \frac{950}{1200} = 0.792$$
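The formula used in this solution follows from a short derivation, sketched here under the assumptions that $dx = X - A_x$ and $dy = Y - A_y$ are deviations from the assumed means and that the standard deviations $\sigma_x$, $\sigma_y$ are computed with divisor $N$:

$$\sum dx\,dy = \sum \big[(X - \bar{X}) + (\bar{X} - A_x)\big]\big[(Y - \bar{Y}) + (\bar{Y} - A_y)\big] = \sum (X - \bar{X})(Y - \bar{Y}) + N(\bar{X} - A_x)(\bar{Y} - A_y)$$

The cross terms vanish because $\sum (X - \bar{X}) = \sum (Y - \bar{Y}) = 0$. Hence

$$\sum (X - \bar{X})(Y - \bar{Y}) = \sum dx\,dy - N(\bar{X} - A_x)(\bar{Y} - A_y)$$

and, since $N \sigma_x \sigma_y = \sqrt{\sum (X - \bar{X})^2 \cdot \sum (Y - \bar{Y})^2}$, dividing by $N \sigma_x \sigma_y$ gives r in the deviation form of Karl Pearson’s formula.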

To be Solved
1. In order to find the coefficient of correlation between two variables X and Y from 12 pairs of
observations, the following calculations were made.
$\sum X = 30$, $\sum Y = 5$, $\sum X^2 = 670$, $\sum Y^2 = 285$, $\sum XY = 334$
On subsequent verification, it was found that the pair (X=11, Y=4) was copied wrongly, the
correct value being (X=10, Y=14). Find the correct value of the correlation coefficient.
2. A computer, while calculating the correlation between X and Y, obtained the following values.
$\sum X = 120$, $\sum X^2 = 600$, $\sum XY = 356$
$\sum Y = 90$, $\sum Y^2 = 250$, $N = 30$
It was, however, later discovered at the time of checking that it had copied down two pairs as

X Y
6 8
10 5

While the correct values are


X Y
6 10
8 6
Find correct r.

SPEARMAN’S COEFFICIENT OF RANK CORRELATION
Karl Pearson’s method is applicable when the variables are measured in quantitative form. But where
the items can only be judged qualitatively, as with poverty, beauty or honesty, that formula
cannot be applied. To overcome this difficulty, Professor Charles Spearman gave a formula based on
the ranks of the items. It is known as the rank correlation formula.

$$r = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$

where
r is the coefficient of rank correlation,
N is the number of pairs,
D is the difference between the respective ranks, so $\sum D^2$ is the sum of the squares of these differences.

When Ranks are given


1. Following are given the ranks of 8 pairs. Find r.
Rank X 4 2 7 5 3 1 8 6
Rank Y 8 3 6 5 1 2 7 4

Solution
Rank X   Rank Y   Difference of ranks (D)   Square of difference ($D^2$)
4        8        -4                        16
2        3        -1                        1
7        6        +1                        1
5        5        0                         0
3        1        +2                        4
1        2        -1                        1
8        7        +1                        1
6        4        +2                        4
$\sum D^2 = 28$

$$r = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$

where $\sum D^2 = 28$ and $N = 8$.

$$r = 1 - \frac{6(28)}{8(8^2 - 1)} = 1 - \frac{168}{8(64 - 1)} = 1 - \frac{168}{8(63)} = 1 - \frac{168}{504} = 1 - 0.33 = 0.67$$
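When the ranks are already given, Spearman’s formula reduces to a few lines. A minimal Python sketch, assuming there are no tied ranks; `spearman_from_ranks` is an illustrative name.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's rank correlation from two lists of ranks (no ties)."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Ranks of the 8 pairs from the worked example.
rank_x = [4, 2, 7, 5, 3, 1, 8, 6]
rank_y = [8, 3, 6, 5, 1, 2, 7, 4]
rho = spearman_from_ranks(rank_x, rank_y)
print(round(rho, 2))
```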

2. The coefficient of rank correlation of the marks obtained by 10 students in statistics and
accountancy was found to be 0.5. It was later discovered that the difference in ranks in the
two subjects obtained by one of the students was wrongly taken as 3 instead of 7. Find the
correct coefficient of rank correlation.
Solution
The coefficient of rank correlation is defined as

$$r = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$

Substituting $r = 0.5$ and $N = 10$:

$$0.5 = 1 - \frac{6 \sum D^2}{10(100 - 1)} = 1 - \frac{6 \sum D^2}{990} = \frac{990 - 6 \sum D^2}{990}$$

$$0.5(990) = 990 - 6 \sum D^2$$
$$495 = 990 - 6 \sum D^2$$
$$6 \sum D^2 = 990 - 495$$
$$\text{Incorrect } \sum D^2 = \frac{495}{6} = 82.5$$

Correct $\sum D^2$ = Incorrect $\sum D^2$ - (incorrect difference)$^2$ + (correct difference)$^2$

$$= 82.5 - 9 + 49 = 82.5 + 40 = 122.5$$

$$r = 1 - \frac{6(\text{Correct } \sum D^2)}{N(N^2 - 1)} = 1 - \frac{6(122.5)}{990} = \frac{990 - 735}{990} = \frac{255}{990} = 0.258$$
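The same back-calculation (recover the recorded $\sum D^2$ from the reported coefficient, swap the wrong squared difference for the right one, and recompute) can be sketched in Python. The function name and parameter names are illustrative.

```python
def corrected_rank_correlation(r_wrong, n, wrong_d, right_d):
    """Recompute Spearman's r after one rank difference was mis-recorded."""
    denom = n * (n ** 2 - 1)
    sum_d2_wrong = (1 - r_wrong) * denom / 6           # invert the formula
    sum_d2_right = sum_d2_wrong - wrong_d ** 2 + right_d ** 2
    return 1 - 6 * sum_d2_right / denom

r = corrected_rank_correlation(0.5, 10, wrong_d=3, right_d=7)
print(round(r, 3))
```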
3. Find out Rank Correlation from the following.
X 56 66 49 55 64 68 46 50
Y 40 70 50 60 80 75 49 62

Solution
X     $R_1$   Y     $R_2$   $D = R_1 - R_2$   $D^2$
56    4       40    8       -4                16
66    2       70    3       -1                1
49    7       50    6       1                 1
55    5       60    5       0                 0
64    3       80    1       2                 4
68    1       75    2       -1                1
46    8       49    7       1                 1
50    6       62    4       2                 4
$\sum D^2 = 28$

Here rank 1 is given to the largest value in each series.

$$r = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 28}{8(8^2 - 1)} = 1 - \frac{168}{504} = 1 - 0.333 = 0.667$$
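When only the raw values are given, the ranks have to be assigned first. The sketch below assumes there are no tied values and gives rank 1 to the largest value, as in the solution table; the helper names are illustrative.

```python
def ranks_descending(values):
    """Rank 1 for the largest value; assumes all values are distinct."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman_from_ranks(rank_x, rank_y):
    """Spearman's rank correlation from two lists of ranks (no ties)."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

x = [56, 66, 49, 55, 64, 68, 46, 50]
y = [40, 70, 50, 60, 80, 75, 49, 62]

r1 = ranks_descending(x)
r2 = ranks_descending(y)
rho = spearman_from_ranks(r1, r2)
print(r1, r2, round(rho, 3))
```

Ranking both series in ascending order instead would give the same coefficient, provided the same direction is used for both.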

To be Solved:
1. Calculate the rank difference correlation. The ranks are given.
𝑅𝑅1 1 2 3 4 5
𝑅𝑅2 5 3 1 2 4

2. The coefficient of rank correlation of marks obtained by 10 students in Physics and
Chemistry was found to be 0.8. It was later discovered that the difference in ranks in the two
subjects obtained by one of the students was wrongly taken as 7 instead of 9. Find the correct
coefficient of rank correlation.

Repeated Ranks
When some terms in a series are equal, we use a modified formula:

$$r = 1 - \frac{6\left[\sum D^2 + \dfrac{m_1^3 - m_1}{12} + \dfrac{m_2^3 - m_2}{12} + \cdots\right]}{N(N^2 - 1)}$$

where each m is the number of terms whose ranks are equal (one correction term is added for each group of tied values).

Important note: the only difference in this case is that equal ranks are given to equal values. The
common rank is obtained by dividing the sum of the ranks the tied values would occupy by the number
of tied values. If two values are tied for ranks 3 and 4, each is given $\frac{3+4}{2} = 3.5$ as its
rank; if three values are tied for ranks 6, 7 and 8, each is given $\frac{6+7+8}{3} = 7$ as its rank.

1. Eight students have obtained the following marks in Accountancy and Economics. Calculate the
rank coefficient of correlation.
Accountancy (X) 25 30 38 22 50 70 30 90
Economics (Y) 50 40 60 40 30 20 40 70

Solution:
Accountancy (X)   $R_1$   Economics (Y)   $R_2$   $D = R_1 - R_2$   $D^2$
25                2       50              6       -4                16.00
30                3.5     40              4       -0.5              0.25
38                5       60              7       -2                4.00
22                1       40              4       -3                9.00
50                6       30              2       +4                16.00
70                7       20              1       +6                36.00
30                3.5     40              4       -0.5              0.25
90                8       70              8       0                 0.00
$\sum D^2 = 81.5$

$$r = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2)\right]}{N^3 - N}$$

Here $\sum D^2 = 81.5$ and $N = 8$.

In the X series the item 30 is repeated 2 times, so $m_1 = 2$.

In the Y series the item 40 is repeated 3 times, so $m_2 = 3$.

$$r = 1 - \frac{6\left[81.5 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(3^3 - 3)\right]}{8^3 - 8}$$

$$= 1 - \frac{6(81.5 + 0.5 + 2)}{504} = 1 - \frac{6(84)}{504} = 1 - \frac{504}{504} = 1 - 1 = 0$$
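The repeated-ranks procedure (average ranks for tied values, plus a correction term of $(m^3 - m)/12$ for each tied group) can be sketched as below. This is an illustrative implementation of the formula as stated in these notes, with rank 1 given to the smallest value as in the solution table; the function names are hypothetical.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = smallest value; tied values share the mean of their ranks."""
    ordered = sorted(values)
    ranks = {}
    for v in set(values):
        first = ordered.index(v) + 1        # first rank position occupied by v
        count = ordered.count(v)            # number of times v occurs
        ranks[v] = first + (count - 1) / 2  # mean of first .. first + count - 1
    return [ranks[v] for v in values]

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def spearman_with_ties(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    adjusted = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * adjusted / (n ** 3 - n)

accountancy = [25, 30, 38, 22, 50, 70, 30, 90]
economics = [50, 40, 60, 40, 30, 20, 40, 70]
rho = spearman_with_ties(accountancy, economics)
print(average_ranks(accountancy), average_ranks(economics), rho)
```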

To be solved
1. Calculate Spearman’s correlation coefficient from the following two series.
X 24 29 19 14 30 19 27 30 20 28 11
Y 37 35 16 26 23 27 19 20 16 11 21

2. Find out rank correlation coefficient.


X 68 64 75 50 64 80 75 40 55 64
Y 62 58 68 45 81 60 68 48 50 70

3. For the following data compute rank correlation coefficient.


X 48 33 40 9 16 16 65 24 16 57
Y 13 13 24 6 15 4 20 9 6 19

When series are more than Two
1. The following ranks were given by three judges in a beauty contest. Find the degree of
correlation between the 1st and 2nd, the 2nd and 3rd, and the 3rd and 1st judges. Also mention which
pair of judges agree the most and which pair disagree the most.
Judge 1 1 3 7 9 2 4 10 8 6 5
Judge 2 7 5 4 6 1 2 3 8 10 9
Judge 3 4 10 3 9 2 8 1 5 7 6

Solution
$R_1$   $R_2$   $R_3$   $D_1 = R_1 - R_2$   $D_1^2$   $D_2 = R_2 - R_3$   $D_2^2$   $D_3 = R_1 - R_3$   $D_3^2$
1       7       4       -6                  36        3                   9         -3                  9
3       5       10      -2                  4         -5                  25        -7                  49
7       4       3       3                   9         1                   1         4                   16
9       6       9       3                   9         -3                  9         0                   0
2       1       2       1                   1         -1                  1         0                   0
4       2       8       2                   4         -6                  36        -4                  16
10      3       1       7                   49        2                   4         9                   81
8       8       5       0                   0         3                   9         3                   9
6       10      7       -4                  16        3                   9         -1                  1
5       9       6       -4                  16        3                   9         -1                  1
$\sum D_1^2 = 144$, $\sum D_2^2 = 112$, $\sum D_3^2 = 182$, $N = 10$

$$r_{12} = 1 - \frac{6 \sum D_1^2}{N(N^2 - 1)} = 1 - \frac{6 \times 144}{10(10^2 - 1)} = 1 - \frac{864}{990} = 1 - 0.873 = 0.127 \quad \text{(low degree of positive correlation)}$$

$$r_{23} = 1 - \frac{6 \sum D_2^2}{N(N^2 - 1)} = 1 - \frac{6 \times 112}{10(10^2 - 1)} = 1 - \frac{672}{990} = 1 - 0.679 = 0.321 \quad \text{(moderate degree of positive correlation)}$$

$$r_{31} = 1 - \frac{6 \sum D_3^2}{N(N^2 - 1)} = 1 - \frac{6 \times 182}{10(10^2 - 1)} = 1 - \frac{1092}{990} = 1 - 1.103 = -0.103 \quad \text{(low degree of negative correlation)}$$

$r_{23}$ is the highest, hence the 2nd and 3rd judges agree the most, whereas the 3rd and 1st judges
disagree the most, $r_{31}$ being the lowest.
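With more than two series, the same formula is simply applied to every pair. A short Python sketch, assuming untied ranks; the dictionary layout and names are illustrative.

```python
from itertools import combinations

def spearman_from_ranks(rank_a, rank_b):
    """Spearman's rank correlation from two lists of ranks (no ties)."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

judges = {
    "Judge 1": [1, 3, 7, 9, 2, 4, 10, 8, 6, 5],
    "Judge 2": [7, 5, 4, 6, 1, 2, 3, 8, 10, 9],
    "Judge 3": [4, 10, 3, 9, 2, 8, 1, 5, 7, 6],
}

# Coefficient for every unordered pair of judges.
pairwise = {
    (a, b): spearman_from_ranks(judges[a], judges[b])
    for a, b in combinations(judges, 2)
}

closest = max(pairwise, key=pairwise.get)    # pair that agrees the most
furthest = min(pairwise, key=pairwise.get)   # pair that disagrees the most

for pair, value in pairwise.items():
    print(pair, round(value, 3))
print("agree most:", closest, "disagree most:", furthest)
```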

Merits and demerits of Rank difference coefficient of correlation
Merits
1. It is easy to calculate
2. It is simple to understand
3. It can be applied to any type of data. Qualitative or Quantitative. Hence correlation with
qualitative data such as honesty, beauty can be found.
4. This is most suitable in case there are two attributes.
Demerits
1. It is only an approximately calculated measure as actual values are not used for
calculations.
2. For large samples it is not a convenient method.
3. A combined r of different series cannot be obtained, as can be done for the mean and S.D.
4. It cannot be treated further algebraically.

To be Solved.
1. Ten competitors in a beauty contest are ranked by three judges as follows:
Judge A 6 5 3 10 2 4 9 7 8 1
Judge B 5 8 4 7 10 2 1 6 9 3
Judge C 4 9 8 1 2 3 10 5 7 6
Discuss which pair of judges has the nearest approach to common tastes in beauty.
2. Three judges gave the following ranks to nine contestants.
Judge A 1 2 3 4 5 6 7 8 9
Judge B 7 4 3 9 1 2 5 6 8
Judge C 3 9 5 8 7 4 1 2 6
Calculate coefficient of correlation. Also mention which of the judges
a. Coincide most
b. Differ most

3. Ten entries are submitted for a competition. Three judges study each entry and then list them
in the rank order as under.
Entry No. 1 2 3 4 5 6 7 8 9 10
Judge A 9 3 7 5 1 6 2 4 10 8
Judge B 9 1 10 4 3 8 5 2 7 6
Judge C 6 3 8 7 2 4 1 5 9 10
Use the method of Rank correlation to determine which pair of judges disagree the most.
