PRESENTATION OF DATA
Data are collections of any number of related observations. A collection of data is called a data
set and a single observation a data point.
Raw data - Data which have not been arranged and analysed are called raw data.
Structured data - Data which are arranged in a systematic manner, from which some inference
can be drawn.
Types of data : Data can come from actual observations or from records that are kept for
normal purposes. There are two types of data :
Primary data - Primary data are collected through first-hand investigation. When the data
required for a particular study can be found neither in the internal records of the enterprise nor in
published sources, it becomes necessary to collect original data by conducting a first-hand
investigation. Data so collected are called primary data.
There are two methods of collecting primary data :
(i) Questioning
(ii) Observation
Secondary data - Data which have already been collected by others.
Why arrange data ?
For data to be useful, the observations must be organised properly so that the pattern can
be understood and a logical conclusion can be drawn.
Data can assist decision makers :
A marketing survey may reveal that a product is preferred by suburban communities with
average incomes and average education, so the product's advertising can be aimed at this
target audience. If hospital records show that more patients used the x-ray facilities in
June than in January, the hospital personnel division should determine whether this was
accidental to this year or an indication of a trend, and perhaps adjust its hiring
accordingly.
Classification of data
Types of classification :
Data can be classified on the following four bases :
1. Geographical - For example, area-wise (states, cities, districts etc.)
2. Chronological - For example, on the basis of time :
Year    Sales of the company
2012    Rs.30 crores
2011    Rs.39 crores
2010    Rs.29 crores
3. Qualitative - Data are classified on the basis of some attribute or quality, such as literacy,
religion, sex etc. In this type of classification, the attribute under study cannot be measured. For
example, if the attribute under study is blindness, we may find out how many persons are blind
in a given population. It is not possible to measure the degree of blindness in each case.
Population
    Male   : Literate, Illiterate
    Female : Literate, Illiterate
4. Quantitative - Data are classified on the basis of a characteristic that can be measured,
such as age. For example :
No. of families     10      400     800
Age                 20-25   25-30   30-35
No. of employees    10      15      40
Formation of frequency distribution - To form a frequency distribution table, we count the
number of times each value is repeated; this count is called the frequency of that value. To
facilitate counting, a tally column is prepared. In another column, all possible values of the
variable are placed from the lowest to the highest. A bar (vertical line) is then put opposite
the value to which each observation relates. The bars are grouped in blocks of five, with some
space left between blocks. Finally, we count the blocks and bars corresponding to each value of
the variable and enter the count in the frequency column. The process is illustrated by the
following example of the number of refrigerators sold by a company on 22 working days :
23, 30, 20, 26, 30, 30, 20, 23, 40, 40, 26,
20, 23, 40, 28, 26, 23, 30, 40, 28, 28, 30
No. of refrigerators    Tallies    Frequency
20                      III        3
23                      IIII       4
26                      III        3
28                      III        3
30                      IIIII      5
40                      IIII       4
TOTAL                              22
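The tally procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `frequency_table` is hypothetical, and the data are the 22 daily sales figures listed above.

```python
from collections import Counter

# Refrigerators sold per working day (the 22 values listed above).
sales = [23, 30, 20, 26, 30, 30, 20, 23, 40, 40, 26,
         20, 23, 40, 28, 26, 23, 30, 40, 28, 28, 30]


def frequency_table(values):
    """Return (value, frequency) pairs ordered from lowest to highest value."""
    counts = Counter(values)          # plays the role of the tally column
    return sorted(counts.items())


table = frequency_table(sales)
for value, freq in table:
    print(f"{value:>5} {freq:>5}")
print("TOTAL", sum(freq for _, freq in table))
```

The total of the frequency column should equal the number of observations, which is a quick check that no value was missed while tallying.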
A table is a systematic arrangement of data in columns and rows. The purpose of a table is
to simplify the presentation and to facilitate comparisons.
Parts of a Table :
1. Table number - Each table should be numbered. The number may be given at the top, either in
the centre above the title or at the side of the table, or at the bottom of the table on the
left-hand side.
2. Title of the table - Every table must have a suitable title which describes the content of
the table. A complete title has to answer :
(i) What precisely are the data in the table?
(ii) Where did the data occur?
(iii) When did the data occur?
3. Caption - The caption refers to the column headings. It explains what each column represents.
4. Stub - Stubs are the designations of the rows, i.e. the row headings.
5. Body of the table - The body of the table contains the numerical information. Data presented
in the body are arranged according to the descriptions or classifications of the captions and stubs.
Headnote - A brief explanatory statement applying to all or a major part of the material in the
table. It is placed below the title, centred and enclosed in brackets.
Footnote - Anything in a table which the reader may find difficult to understand from the title,
captions and stubs should be explained in footnotes.
Types of table : (i) Simple and complex tables - In a simple table, only one characteristic is
shown; hence this type of table is known as a one-way table. In a complex table, on the other
hand, two or more characteristics are shown.
An example of a simple or one-way table is shown below :
Age (in years)    No. of Employees
Below 25          50
25 - 35           67
35 - 45           43
45 - 55           15
55 and above      5
Total             180
Charting data
One of the most convincing and appealing ways in which data may be presented is through
charts. Evidence of this can be found in the financial pages of newspapers, journals,
advertisements etc. The pictorial presentation helps in quick understanding of the data. Through
pictorial presentation data can be presented in an interesting form.
Types of Diagrams
1.One-dimensional diagram, e.g., Bar diagrams
2.Two-dimensional diagrams, e.g., Rectangles, Squares and Circles
3.Pictograms and Cartograms
Types of Bar Diagrams :
Simple Bar Diagram
Subdivided Bar Diagram
Multiple Bar Diagram
[Figure: simple bar diagram for the years 2008-2009, 2009-2010 and 2010-2011]
Year
2008-2009    80     100    178
2010-2011    98     126    205
[Figure: multiple bar diagram comparing sales of TVs and sales of ACs for 2008-2009 and 2010-2011]
Pie Diagram - This type of diagram enables us to show the partitioning of a total into its
component parts. In constructing a pie chart, the first step is to prepare the data so that
the various component values can be transposed into corresponding degrees on the circle.
For example, suppose the market shares of different companies are : Samsung 25%,
Technip 60%, Hitachi 10%, LG 5%.
[Figure: pie chart of the market shares]
Line graphs
When we observe the values of a variable at different points of time, the series so formed is
known as time series. The technique of graphic presentation is extremely helpful in analysing
changes at different points of time.
Illustration. The following data relate to imports of steel pipes by IOCL :
Year                           : 2000   2001   2002   2003   2004   2005
Imports (in million tonnes)    : 2      3      2.8    4.2    6.7    8.5
[Figure: line graph of imports of steel pipes (in million tonnes) against year]
Histograms
A Histogram is a graphical method for presenting data, where the observations are located on
a horizontal axis (Usually grouped into intervals) and the frequency of those observations is
depicted along the vertical axis.
The histogram is the most widely used graph for presenting a frequency distribution.
A histogram should be clearly distinguished from a bar diagram. The distinction lies in
the fact that a bar diagram is one-dimensional, i.e. only the length of the bar matters
and not its width, whereas a histogram is two-dimensional, i.e. both the length and the
width of each rectangle are important.
Frequency Polygon
A frequency polygon is a graph of a frequency distribution. It is particularly effective in
comparing two or more frequency distributions. There are two ways in which a frequency
polygon may be constructed :
1. We may draw a histogram of the given data and then join, by straight lines, the mid-points
of the upper horizontal sides of adjacent rectangles. The figure so formed is called the
frequency polygon.
2. Alternatively, we may take the mid-points of the various class intervals, plot the
frequency corresponding to each mid-point, and join all these points by straight lines. In
this method, we do not have to construct a histogram.
By constructing a frequency polygon, the value of the mode can be easily ascertained. If
a perpendicular is drawn from the apex of the polygon to the X-axis, we get the value of
the mode.
62  31  78  50  35  68  32  80  56  37
70  78  81  72  40  42  45  62  58  55
SOLUTION :
FREQUENCY DISTRIBUTION
Marks    Tallies             Frequency
15-25    I                   1
25-35    IIIII               5
35-45    IIIII III           8
45-55    IIIII I             6
55-65    IIIII IIII          9
65-75    IIIII II            7
75-85    IIIII IIIII IIII    14
TOTAL :                      50
#2. Classify the following data by taking class intervals such that their mid-values are 17, 22,
27, 32, and so on.
30  42  30  54  40  48  15  17  51  42  25  41  30
27  42  36  28  26  37  54  44  31  36  40  36  22
30  31  19  48  16  42  32  21  22  46  33  41  21
SOLUTION : Since we are to classify the data in such a way that the mid-values are 17,
22, 27, 32, and so on, the first class should be 15-19 (Mid-value = (15+19)/2 = 17), the
second class 20-24, etc.
Frequency Distribution
Variables    Tallies       Frequency
15-19        IIII          4
20-24        IIII          4
25-29        IIII          4
30-34        IIIII III     8
35-39        IIII          4
40-44        IIIII IIII    9
45-49        III           3
50-54        III           3
Total :                    39
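As a cross-check of the classification above, the grouping can be sketched in Python. The function name `class_frequencies` and its parameters are illustrative assumptions; the classes are inclusive (15-19, 20-24, ...) with a width of 5, as in the solution.

```python
marks = [30, 42, 30, 54, 40, 48, 15, 17, 51, 42, 25, 41, 30,
         27, 42, 36, 28, 26, 37, 54, 44, 31, 36, 40, 36, 22,
         30, 31, 19, 48, 16, 42, 32, 21, 22, 46, 33, 41, 21]


def class_frequencies(values, first_lower=15, width=5):
    """Count values in inclusive classes first_lower..first_lower+width-1, etc."""
    freq = {}
    for v in values:
        # Lower limit of the class that contains v.
        lower = first_lower + ((v - first_lower) // width) * width
        label = f"{lower}-{lower + width - 1}"
        freq[label] = freq.get(label, 0) + 1
    # Order the classes by their lower limit.
    return dict(sorted(freq.items(), key=lambda kv: int(kv[0].split("-")[0])))


distribution = class_frequencies(marks)
for label, f in distribution.items():
    mid_value = sum(int(part) for part in label.split("-")) / 2
    print(label, mid_value, f)
```

Printing the mid-value next to each class confirms that the classes have mid-values 17, 22, 27, 32 and so on, as the question requires.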
#3. The data given below relate to the height and weight of 20 persons. You are required to
form a two-way frequency table with class intervals 62 to 64 inches, 64 to 66 inches and so
on for height, and 115 to 125 lb, 125 to 135 lb, etc. for weight.
Sl.No.   Weight   Height      Sl.No.   Weight   Height
1        170      70          11       163      70
2        135      65          12       139      67
3        136      65          13       122      63
4        137      64          14       134      68
5        148      69          15       140      67
6        121      63          16       132      69
7        117      65          17       120      65
8        128      70          18       148      68
9        143      71          19       129      67
10       129      62          20       152      69
SOLUTION:
As per the requirement of the question, the population is to be divided into five classes according
to the height of the persons included in each group and six classes according to the weight. Thus,
there will be 5 x 6 = 30 cells.
For tabulating the information in appropriate cells, first the row to which the height
measurement (say X) belongs is determined. Then, on consideration of the weight
(say Y), the column in which it should be included is determined. A value equal to a class
limit is counted in the class of which it is the lower limit (e.g. a height of 64 inches falls
in 64 - 66). The tabulation is recorded by tally bars. Thus the two-way table is prepared as
follows :
TWO-WAY FREQUENCY TABLE SHOWING WEIGHT AND HEIGHT OF 20 PERSONS
Weight in lbs. (Y)      115-125   125-135   135-145   145-155   155-165   165-175   Total
Height in inches (X)
62 - 64                 II (2)    I (1)     -         -         -         -         3
64 - 66                 II (2)    -         III (3)   -         -         -         5
66 - 68                 -         I (1)     II (2)    -         -         -         3
68 - 70                 -         II (2)    -         III (3)   -         -         5
70 - 72                 -         I (1)     I (1)     -         I (1)     I (1)     4
Total                   4         5         6         3         1         1         20
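The cell-by-cell tabulation can be checked with a short Python sketch. The function name `two_way_table` is hypothetical, and the sketch assumes that a value equal to a class limit is placed in the class of which it is the lower limit (so a height of 68 inches is counted in 68-70).

```python
# Weight (lb) and height (inches) of the 20 persons, in serial-number order.
weights = [170, 135, 136, 137, 148, 121, 117, 128, 143, 129,
           163, 139, 122, 134, 140, 132, 120, 148, 129, 152]
heights = [70, 65, 65, 64, 69, 63, 65, 70, 71, 62,
           70, 67, 63, 68, 67, 69, 65, 68, 67, 69]


def two_way_table(rows, cols, row_start, row_width, n_rows,
                  col_start, col_width, n_cols):
    """Count (row value, column value) pairs into an n_rows x n_cols grid."""
    table = [[0] * n_cols for _ in range(n_rows)]
    for r_val, c_val in zip(rows, cols):
        r = (r_val - row_start) // row_width   # height class index
        c = (c_val - col_start) // col_width   # weight class index
        table[r][c] += 1
    return table


# Rows: height classes 62-64 ... 70-72; columns: weight classes 115-125 ... 165-175.
table = two_way_table(heights, weights, 62, 2, 5, 115, 10, 6)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
for row, total in zip(table, row_totals):
    print(row, total)
print(col_totals, sum(row_totals))
```

Under a different boundary convention (for example, counting 68 inches in 66-68), the row totals would change, so the convention should be stated alongside the table.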
#4. The following table gives the birth rate per thousand of different countries over a
certain period :
Country        Birth rate
India          33
Germany        16
U.K.           20
China          40
New Zealand    30
Sweden         15
Solution :
[Figure: bar diagram of birth rate (vertical axis from 0 to 45) by country]
#5. The production of steel by the Govt. sector and the Private sector is given below. Represent
the data by a sub-divided bar diagram.
Year         Govt. Sector    Private Sector
1996-97      400             150
1997-98      370             75
1998-99      550             270
1999-2000    620             330
2000-01      710             440
2001-02      780             500
2002-03      600             410
[Figure: sub-divided bar diagram of Govt. Sector and Private Sector production by year (vertical axis from 0 to 1400)]
Net profit (Rs. in lacs) : 20, 30, 35, 40, 45
[Figure: bar diagram of Sales (Rs. in lacs) and Gross Profit (Rs. in lacs), vertical axis from 0 to 160]
#7. Draw a Pie diagram for the following data of Sixth Five-Year Plan Public Sector
outlays :
Agriculture and Rural Development    : 12.9 %
Irrigation                           : 12.5 %
Energy                               : 27.2 %
Industry and Minerals                : 15.4 %
Transport, communication             : 15.9 %
Social Services and others           : 16.1 %
Solution :
The angle at the centre is given by
(Percentage outlay / 100) x 360 = Percentage outlay x 3.6
COMPUTATION FOR PIE-DIAGRAM
Sector                               Percentage outlay    Angle
Agriculture and rural development    12.9                 12.9 x 3.6 = 46 deg.
Irrigation                           12.5                 12.5 x 3.6 = 45 deg.
Energy                               27.2                 27.2 x 3.6 = 98 deg.
Industry and Minerals                15.4                 15.4 x 3.6 = 56 deg. (55.44, rounded up so that the total is 360 deg.)
Transport, communication             15.9                 15.9 x 3.6 = 57 deg.
Social services and others           16.1                 16.1 x 3.6 = 58 deg.
Total :                              100                  360 deg.
[Figure: pie diagram of percentage outlay: Agriculture and rural development 13%; Irrigation 13%; Energy 27%; Industry and minerals 15%; Transport, communication 16%; Social services and others 16%]
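The conversion from percentage outlay to angle at the centre can be sketched in Python as follows. The function name `pie_angles` is an illustrative assumption; the figures are those given in the question.

```python
outlays = {
    "Agriculture and rural development": 12.9,
    "Irrigation": 12.5,
    "Energy": 27.2,
    "Industry and minerals": 15.4,
    "Transport, communication": 15.9,
    "Social services and others": 16.1,
}


def pie_angles(percentages):
    """Angle at the centre (in degrees) = percentage x 360 / 100."""
    return {name: pct * 360 / 100 for name, pct in percentages.items()}


angles = pie_angles(outlays)
for name, angle in angles.items():
    print(f"{name:<36} {angle:7.2f} deg.")
print("Total", round(sum(angles.values()), 2))
```

The unrounded angles always total 360 degrees; when each angle is rounded to a whole degree separately, the rounded values may total 359 or 361, which is why one angle is sometimes adjusted by hand.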
#8. Draw the histogram and frequency polygon from the following data :
Marks     Number of students
0-10      4
10-20     6
20-40     14
40-50     16
50-60     14
60-70     8
70-90     16
90-100    5
Harmonic mean
Arithmetic Mean : The most popularly used measure for representing the entire data by one
value is what laymen call the average and what statisticians call the arithmetic mean.
CALCULATION OF ARITHMETIC MEAN :
A. CALCULATION OF SIMPLE ARITHMETIC MEAN - INDIVIDUAL OBSERVATIONS
Direct method
x̄ = (Σx) / N
Where
x = Values of observations
N = Number of observations
Short-cut method
x̄ = A + (Σd) / N
Where
d = x - A
A = Assumed mean
#1. The following table gives the monthly income of 10 employees in an office :
Income (Rs.) : 1780 1760 1690 1750 1840 1920 1100 1810 1050 1950
Calculate the arithmetic mean of incomes.
Solution : (Direct Method)
Calculation of Arithmetic Mean
Employee    Monthly Income (Rs.)
1           1780
2           1760
3           1690
4           1750
5           1840
6           1920
7           1100
8           1810
9           1050
10          1950
N = 10      Σx = 16650
x̄ = (Σx) / N = 16650 / 10 = 1665
Short-cut method
x̄ = A + (Σd) / N
Where
d = x - A
A = Assumed mean
Employee    Income    d = (x - 1800)
1           1780      - 20
2           1760      - 40
3           1690      - 110
4           1750      - 50
5           1840      + 40
6           1920      + 120
7           1100      - 700
8           1810      + 10
9           1050      - 750
10          1950      + 150
N = 10                Σd = -1350
Let A = 1800, Σd = -1350,
x̄ = A + (Σd) / N = 1800 + (-1350) / 10
= 1800 - 135 = 1665
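Both methods can be checked with a small Python sketch. The function names are illustrative assumptions; the incomes and the assumed mean of 1800 are those used in the solution above.

```python
incomes = [1780, 1760, 1690, 1750, 1840, 1920, 1100, 1810, 1050, 1950]


def mean_direct(values):
    """Direct method: sum of the observations divided by their number."""
    return sum(values) / len(values)


def mean_shortcut(values, assumed_mean):
    """Short-cut method: assumed mean plus the average deviation from it."""
    deviations = [x - assumed_mean for x in values]
    return assumed_mean + sum(deviations) / len(values)


print(mean_direct(incomes))            # direct method
print(mean_shortcut(incomes, 1800))    # short-cut method with A = 1800
```

Any assumed mean gives the same result; a value close to the true mean simply keeps the deviations small when working by hand.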
Short-cut method
x̄ = A + (Σf.d) / N
Where
d = x - A
A = Assumed mean
N = Total number of observations, i.e. Σf
#2. From the following data of the marks obtained by 60 students of a class, calculate the
arithmetic mean :
Marks    No. of students      Marks    No. of students
20       8                    50       10
30       12                   60       6
40       20                   70       4
Solution :
Marks (x)    No. of students (f)    f.x     d = x - 40    f.d
20           8                      160     -20           -160
30           12                     360     -10           -120
40           20                     800     0             0
50           10                     500     +10           +100
60           6                      360     +20           +120
70           4                      280     +30           +120
             N = 60                 Σf.x = 2460           Σf.d = 60
By Direct method :
x̄ = (Σf.x) / N = 2460/60 = 41
By Short-cut method :
Let Assumed Mean, A = 40
x̄ = A + (Σf.d) / N = 40 + 60/60 = 40 + 1 = 41
C. FOR GROUPED DATA [Continuous series]
1 Direct Method (For grouped data)
x̄ = (Σf.m) / N
Where
x̄ = Sample arithmetic mean
f = The frequency of each class
m = Mid-point of the class
N = The total frequency
2 Short-cut Method (For grouped data)
x̄ = A + (Σf.d) / N
Where
A = Assumed mean
m = Mid-point of the class
d = (m - A)
N = Total number of observations
#3. From the following data compute the arithmetic mean by the Direct method and the
Short-cut method :
Marks              0-10    10-20    20-30    30-40    40-50    50-60
No. of students    5       10       25       30       20       10
Solution :
By direct method
Marks    Mid-point (m)    No. of students (f)    f.m
0-10     5                5                      25
10-20    15               10                     150
20-30    25               25                     625
30-40    35               30                     1050
40-50    45               20                     900
50-60    55               10                     550
                          N = 100                Σf.m = 3300
x̄ = (Σf.m) / N
= 3300 / 100 = 33
By short-cut method (A = 35)
Marks    Mid-point (m)    No. of students (f)    d = (m - 35)    f.d
0-10     5                5                      -30             -150
10-20    15               10                     -20             -200
20-30    25               25                     -10             -250
30-40    35               30                     0               0
40-50    45               20                     +10             +200
50-60    55               10                     +20             +200
                          N = 100                                Σf.d = -200
x̄ = A + (Σf.d) / N = 35 + (-200/100) = 35 - 2 = 33
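For grouped data the same two methods can be sketched in Python using class mid-points. The function names are hypothetical; the classes and frequencies are those of the example above.

```python
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
frequencies = [5, 10, 25, 30, 20, 10]


def grouped_mean(classes, freqs):
    """Direct method: sum of f x mid-point divided by total frequency."""
    mids = [(lower + upper) / 2 for lower, upper in classes]
    return sum(f * m for f, m in zip(freqs, mids)) / sum(freqs)


def grouped_mean_shortcut(classes, freqs, assumed_mean):
    """Short-cut method: A + (sum of f x d) / N, with d = mid-point - A."""
    mids = [(lower + upper) / 2 for lower, upper in classes]
    total_fd = sum(f * (m - assumed_mean) for f, m in zip(freqs, mids))
    return assumed_mean + total_fd / sum(freqs)


print(grouped_mean(classes, frequencies))
print(grouped_mean_shortcut(classes, frequencies, 35))
```

Both functions treat every observation in a class as if it were located at the class mid-point, which is the assumption behind the grouped-data formula.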
Marks       No. of students      Marks       No. of students
Below 10    4                    30-40       15
10-20       6                    40-50       8
20-30       10                   Above 50    7
In the above case, since the class interval is uniform, the appropriate assumption would be
that the lower limit of the first class is zero and the upper limit of the last class is 60. The first
class thus would be 0-10 and the last class 50-60.
COMBINED MEAN OF TWO GROUPS
x̄12 = (N1 x̄1 + N2 x̄2) / (N1 + N2)
Where
x̄1 = Mean of the 1st group
N1 = No. of observations of the 1st group
x̄2 = Mean of the 2nd group
N2 = No. of observations of the 2nd group
#4. The mean height of 25 male workers in a factory is 61 inches and the mean height of
35 female workers in the same factory is 58 inches. Find the combined mean height of
60 workers in the factory.
Solution :
x̄12 = (N1 x̄1 + N2 x̄2) / (N1 + N2)
N1 = 25, x̄1 = 61, N2 = 35, x̄2 = 58
x̄12 = [(25 x 61) + (35 x 58)] / (25 + 35)
= [1525 + 2030] / 60
= 3555 / 60
= 59.25
Thus the combined mean height of 60 workers is 59.25 inches.
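The combined mean is a mean of the group means weighted by group size. A minimal Python sketch, with the hypothetical function name `combined_mean` and written so that it accepts any number of groups:

```python
def combined_mean(groups):
    """groups: list of (number of observations, group mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n


# 25 male workers with mean height 61 in., 35 female workers with mean 58 in.
height = combined_mean([(25, 61), (35, 58)])
print(height)
```

Note that the result is not the simple average of 61 and 58 (59.5), because the two groups are of unequal size.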
Weighted mean - The weighted mean takes into account the importance of each value to the
overall total.
x̄w = (Σw.x) / Σw
Where
x = Value of each element
w = Weight assigned to each observation
CONCEPT OF MEDIAN
The median is a measure of central tendency. The median by definition refers to the middle
value in a distribution. Half of the items lie above this point, and the other half lie below it.
As distinct from the arithmetic mean, which is calculated from the value of every item in
the series, the median is called a positional average. The term position refers to the place of a
value in a series. The place of the median in a series is such that an equal number of items
lie on either side of it.
CALCULATING THE MEDIAN FROM UNGROUPED DATA - INDIVIDUAL
OBSERVATIONS
To find the median of a data set :
1 Arrange the data in ascending or descending order of magnitude.
2 If the data set contains an odd number of items, the middle item of the array is the
median.
3 If there is an even number of items, the median is the average of the two middle items.
Median = ((n + 1) / 2) th item in a data array
COMPUTATION OF MEDIAN - DISCRETE SERIES
STEPS :
1. Arrange the data in ascending or descending order of magnitude.
2. Find out the cumulative frequencies.
3. Apply the formula :
Median = Size of ((n + 1) / 2) th item
4. Now look at the cumulative frequency column and find the total which is either equal to
(n + 1) / 2 or next higher to that, and determine the value of the variable corresponding to it.
Value    f     c.f.
1000     24    40
1500     26    66
1800     30    96
2000     20    116
2500     6     122
Median = Size of ((n + 1) / 2) th item
Marks    No. of students (f)    c.f.
0-5      29                     29
5-10     195                    224
10-15    241                    465
15-20    117                    582
20-25    52                     634
25-30    10                     644
30-35    6                      650
35-40    3                      653
40-45    2                      655
Median = Size of (N/2) th item = 655/2 = 327.5th item, which lies in the class 10-15.
Median = L + [(N/2 - p.c.f.) / f] x i
L = 10, N/2 = 327.5, p.c.f. = 224, f = 241, i = 5
Median = 10 + [(327.5 - 224)/241] x 5
= 10 + [(103.5)/241] x 5
= 10 + 0.4295 x 5
= 10 + 2.147 = 12.147
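The interpolation formula Median = L + [(N/2 - p.c.f.)/f] x i can be sketched in Python as below. The function name `grouped_median` is an assumption, and the sketch assumes classes of equal width. Carrying full precision gives a median of about 12.15 for the marks distribution above; small differences in the third decimal place arise from rounding the intermediate quotient.

```python
def grouped_median(lower_limits, width, freqs):
    """Median of a continuous series with equal class width."""
    n = sum(freqs)
    half = n / 2
    cumulative = 0                     # c.f. of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= half:     # this is the median class
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must not be empty")


lower_limits = [0, 5, 10, 15, 20, 25, 30, 35, 40]
freqs = [29, 195, 241, 117, 52, 10, 6, 3, 2]
median = grouped_median(lower_limits, 5, freqs)
print(round(median, 3))
```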
#3. An incomplete distribution is given below :
Variable  : 0-10   10-20   20-30   30-40   40-50   50-60   60-70
Frequency : 10     20      ?       40      ?       25      15
(i) You are given that the median value is 35. Find the missing frequencies, given that
the total frequency is 170.
(ii) Calculate the arithmetic mean of the completed table.
Solution :
Let the missing frequency of the class 20-30 be f1 and that of 40-50 be f2.
The total frequency of the classes = 170
Therefore, 170 = 10 + 20 + f1 + 40 + f2 + 25 + 15
Or, 170 = 110 + f1 + f2
Hence, f1 + f2 = 60 ......(1)
Variable    Frequency    c.f.
0-10        10           10
10-20       20           30
20-30       f1           30 + f1
30-40       40           70 + f1
40-50       f2           70 + f1 + f2
50-60       25           95 + f1 + f2
60-70       15           110 + f1 + f2
Median = L + [(N/2 - p.c.f.) / f] x i
Since the median is 35, it lies in the class 30-40, so L = 30, N/2 = 85, p.c.f. = 30 + f1,
f = 40 and i = 10.
Therefore, 35 = 30 + [(85 - 30 - f1)/40] x 10
Or, 5 = (55 - f1)/4
Or, f1 = 35
From Equation (1), f2 = 60 - 35 = 25
Calculation of the arithmetic mean (A = 35) :
Variable    f          Mid-point (m)    d = (m - 35)      f.d
0-10        10         5                5 - 35 = -30      -300
10-20       20         15               15 - 35 = -20     -400
20-30       35         25               25 - 35 = -10     -350
30-40       40         35               35 - 35 = 0       0
40-50       25         45               45 - 35 = 10      +250
50-60       25         55               55 - 35 = 20      +500
60-70       15         65               65 - 35 = 30      +450
            N = 170                                       Σf.d = 150
x̄ = A + (Σf.d) / N = 35 + 150/170
= 35 + 0.882 = 35.882
QUARTILES, DECILES, PERCENTILES
Besides median, there are other measures which divide a series into equal number of parts.
Important amongst these are Quartiles, Deciles and Percentiles.
Quartiles are those values of the variate which divide the total frequency into four equal
parts.
Deciles divide the total frequency into 10 equal parts.
Percentiles divide the total frequency into 100 equal parts.
Just as one point divides a series into two parts, three points would divide it into four parts, 9
points into 10 parts and 99 points into 100 parts, consequently there are only 3 Quartiles, 9
Deciles and 99 Percentiles for a series. The quartiles are denoted by symbol Q, deciles by D
and percentiles by P. The subscript 1,2,3 etc., beneath Q, D, P would refer to the particular
value that we want to compute. Thus Q1 would refer to first quartile, D1 first decile, P1 first
percentile.
COMPUTATION OF QUARTILES, DECILES, PERCENTILES :
The procedure for computing quartiles, deciles and percentiles is the same as for the median.
For grouped data, the following formulae are used for quartiles, deciles and percentiles :
Qj = L + [(jN/4 - p.c.f.) / f] x i       for j = 1, 2, 3
Dk = L + [(kN/10 - p.c.f.) / f] x i      for k = 1, 2, 3, ..., 9
Pm = L + [(mN/100 - p.c.f.) / f] x i     for m = 1, 2, 3, ..., 99
c.f. : 4, 12, 30, 60, 75, 85, 93, 100
Q1 = Size of N/4 th observation = 100/4 = 25th observation
Hence, Q1 lies in the class 40 - 50.
Q1 = L + [(N/4 - p.c.f.) / f] x i = 40 + [(25 - 12)/18] x 10
= 40 + 7.22 = 47.22
That means 25% of the companies earn an annual profit of Rs.47.22 lacs or less.
Q2 = Size of 2N/4 th observation = 2 x 100/4 = 50th observation
Hence, Q2 lies in the class 50 - 60.
Q2 = L + [(2N/4 - p.c.f.) / f] x i = 50 + [(2 x 100/4 - 30)/30] x 10
= 50 + [(50 - 30)/30] x 10
= 50 + 6.67 = 56.67
That means 50% of the companies earn an annual profit of Rs.56.67 lacs or less.
D4 = Size of 4N/10 th observation = 4 x 100/10 = 40th observation
Hence, D4 lies in the class 50 - 60.
D4 = L + [(4N/10 - p.c.f.) / f] x i = 50 + [(40 - 30)/30] x 10 = 50 + 3.33 = 53.33
Thus, 40% of the companies earn an annual profit of Rs.53.33 lacs or less.
P80 = Size of 80N/100 th observation = 80 x 100/100 = 80th observation
Hence, P80 lies in the class 70 - 80.
P80 = 70 + [(80 x 100/100 - 75)/10] x 10 = 70 + [(80 - 75)/10] x 10 = 70 + 5 = 75
Thus, 80% of the companies earn an annual profit of Rs.75 lacs or less and 20% of the
companies earn an annual profit of more than Rs.75 lacs.
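Since quartiles, deciles and percentiles all use the same interpolation, one function can cover them. The sketch below is illustrative: the name `partition_value` is hypothetical, and the class limits (20-30 up to 90-100) and frequencies are inferred from the cumulative frequencies and the worked solution above rather than taken from the original table.

```python
def partition_value(lower_limits, width, freqs, j, parts):
    """j-th partition value when the series is split into `parts` equal parts.

    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles.
    """
    n = sum(freqs)
    target = j * n / parts             # position of the required observation
    cumulative = 0                     # c.f. of the preceding classes (p.c.f.)
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("j must not exceed parts")


# Inferred classes 20-30, 30-40, ..., 90-100 with c.f. 4, 12, 30, 60, 75, 85, 93, 100.
lower_limits = [20, 30, 40, 50, 60, 70, 80, 90]
freqs = [4, 8, 18, 30, 15, 10, 8, 7]

q1 = partition_value(lower_limits, 10, freqs, 1, 4)
q2 = partition_value(lower_limits, 10, freqs, 2, 4)
d4 = partition_value(lower_limits, 10, freqs, 4, 10)
p80 = partition_value(lower_limits, 10, freqs, 80, 100)
print(round(q1, 2), round(q2, 2), round(d4, 2), round(p80, 2))
```

The second quartile, the fifth decile and the fiftieth percentile all coincide with the median, which is a useful consistency check.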
Concept of Mode
The mode is another measure of central tendency that is different from the mean but somewhat
like the median. The mode or the modal value is that value in a series which occurs with
the highest frequency.
We rarely use the mode of ungrouped data as a measure of central tendency. Table-1, for
example, shows the number of delivery trips per day made by a supplier. The mode or the
modal value is 15 because it occurs more often than any other value (three times).
Table-1:Delivery trips per day in 20 day period
0
0
1
2
2
4
5
5
6
7
7
8
12
15
1515
1515
15
19
A mode of 15 implies that 15 is the most frequent number of trips, but it fails to let us know that
most of the values are under 10.
If we group these data into a frequency distribution as shown in Table -2, we select the class of
4-7 with the most observations as the modal class. This class is more representative of the
delivery trips than the mode of 15 trips per day. For this reason, whenever we use the mode as a
measure of the central tendency of a data set, we should calculate the mode from grouped
data.
0-3
4 -7
8 - 11
12 & above
MODAL CLASS
Calculating the mode from grouped data
1. When data are grouped in a frequency distribution, we must assume that the mode is
located in the class with the highest frequency. To determine a single value for the mode from
this modal class, we use the equation below :
M0 = L + [d1 / (d1 + d2)] x i
Where
L = Lower limit of the modal class
d1 = The difference between the frequency of the modal class and the frequency of the preceding
class
d2 = The difference between the frequency of the modal class and the frequency of the
succeeding class
i = Size of the modal class
2. Another form of this formula is :
M0 = L + [(f1 - f0) / (2f1 - f0 - f2)] x i
Where
L = Lower limit of the modal class
f1 = Frequency of the modal class
f0 = Frequency of the class preceding the modal class
f2 = Frequency of the class succeeding the modal class
i = Width of the modal class interval
No. of companies
10
3
2
#2. The median and mode of the following wage distribution are Rs.33.5 and Rs.34
respectively. However, three frequencies are missing. Determine their values.
Wages (In hundred Rs.) : 0-10   10-20   20-30   30-40   40-50   50-60   60-70   Total
Frequencies            : 4      16      ?       ?       ?       6       4       230
Solution :
Let the missing frequencies be f0, f1 and f2 corresponding to classes 20-30, 30-40 and 40-50
respectively. Since the median and mode are 33.5 and 34, they lie in the class 30-40. The frequency
of this class is f1.
DETERMINING MISSING VALUES
Wages (In hundred Rs.)    Frequency    Cumulative frequency
0-10                      4            4
10-20                     16           20
20-30                     f0           20 + f0
30-40                     f1           20 + f0 + f1
40-50                     f2           20 + f0 + f1 + f2
50-60                     6            226
60-70                     4            230
                          N = 230
From the given frequencies in the table above, we can write,
f0 + f1 + f2 = 230 - (4+16+6+4) = 200
Or, f2 = 200 - f0 - f1 ......(1)
Mode = L + [(f1 - f0) / (2f1 - f0 - f2)] x i
Therefore, 34 = 30 + [(f1 - f0)/(2f1 - f0 - f2)] x 10
Or, 34 - 30 = [(f1 - f0)/{2f1 - f0 - (200 - f0 - f1)}] x 10 [Putting the value of
f2 = 200 - f0 - f1 from Equation (1) above]
Or, 4/10 = (f1 - f0)/(2f1 - f0 - 200 + f0 + f1)
Or, 4/10 = (f1 - f0)/(3f1 - 200)
Or, 4(3f1 - 200) = 10(f1 - f0)
Or, 12f1 - 800 = 10f1 - 10f0
Or, 2f1 + 10f0 = 800
Or, f1 + 5f0 = 400
Or, f1 = 400 - 5f0 ......(2)
Median = L + [(N/2 - p.c.f.) / f] x i
Therefore, 33.5 = 30 + [(115 - (20 + f0))/f1] x 10
Or, 3.5 f1 = 10(95 - f0)
Or, 7f1 = 1900 - 20f0
Or, 7(400 - 5f0) = 1900 - 20f0 [Substituting f1 = 400 - 5f0 from Equation (2)]
Or, 2800 - 35f0 = 1900 - 20f0
Or, 2800 - 1900 = -20f0 + 35f0
Or, 900 = 15f0
Or, f0 = 900/15
Or, f0 = 60
Now substituting the value of f0 = 60 in Equation (2), we get
f1 = 400 - 5 x 60 = 400 - 300 = 100
Now, substituting the values of f0 = 60 and f1 = 100 in Equation (1), we get
f2 = 200 - 60 - 100 = 40
Therefore, f0 = 60, f1 = 100, f2 = 40
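The result can be verified by substituting the three frequencies back into the mode and median formulas. The following Python sketch does this; the function names are illustrative assumptions.

```python
def grouped_mode(lower, width, f0, f1, f2):
    """Mode = L + [(f1 - f0) / (2 f1 - f0 - f2)] x i for the modal class."""
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * width


def grouped_median(lower_limits, width, freqs):
    """Median = L + [(N/2 - p.c.f.) / f] x i for the median class."""
    half = sum(freqs) / 2
    cumulative = 0
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= half:
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must not be empty")


# Completed wage distribution with f0 = 60, f1 = 100, f2 = 40.
lower_limits = [0, 10, 20, 30, 40, 50, 60]
freqs = [4, 16, 60, 100, 40, 6, 4]

mode = grouped_mode(30, 10, f0=60, f1=100, f2=40)
median = grouped_median(lower_limits, 10, freqs)
print(mode, median, sum(freqs))
```

Recovering the stated mode of 34, median of 33.5 and total of 230 confirms that the missing frequencies are consistent with all three conditions.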
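[Figure: moderately skewed frequency curve. The mode (M0) lies under the peak of the curve, the median (Me) divides the area in halves, and the mean (X) is at the centre of gravity.]
M0 : Mode
Me : Median
X : Mean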
In moderately skewed or asymmetrical distributions a very important relationship exists among
mean, median and mode. In such distributions the distance between the mean and median is
about one-third the distance between mean and the mode.
Karl Pearson has expressed this relationship as follows :
Mode = Mean - 3[Mean - Median]
#4. In a moderately asymmetrical distribution, the mode and the mean are 32.1 and 35.4
respectively. Find the value of the median.
Solution :
Mode = 3 Median - 2 Mean
Given Mean = 35.4, Mode = 32.1
Therefore, 32.1 = 3 x Median - 2 x 35.4
Or, 3 Median = 32.1 + 70.8 = 102.9
Or, Median = 34.3
1991 1992
0.075 0.08
SOLUTION :
G.M. = [(1.11)(1.09)(1.075)(1.08)(1.095)(1.108)(1.12)]^(1/7)
= [1.908769992]^(1/7)
= 1.09675
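The seventh root above is easiest to evaluate through logarithms. A minimal Python sketch follows, assuming the seven growth factors shown in the solution; the function name `geometric_mean` is illustrative.

```python
import math

growth_factors = [1.11, 1.09, 1.075, 1.08, 1.095, 1.108, 1.12]


def geometric_mean(values):
    """n-th root of the product, computed as exp(mean of the logarithms)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


gm = geometric_mean(growth_factors)
print(round(gm, 5))                       # average growth factor
print(round((gm - 1) * 100, 3), "%")      # average rate of growth per period
```

Working with logarithms avoids forming a very large or very small product when the series is long.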
Harmonic Mean
Harmonic mean is used for computing the average rate of increase of profits or average speed at
which journey has been performed. The rate usually indicates the relation between two different
types of measuring units that can be expressed reciprocally. For example speed = km/hr. Here,
km and hr are two different units.
Harmonic mean is defined as the reciprocal of the arithmetic mean of the reciprocals of the
individual observations. Thus, by definition,
for individual observations,
H.M. = N / (1/X1 + 1/X2 + 1/X3 + ... + 1/XN)
H.M. = N / Σ(1/X)
For discrete series, H.M. = N / Σ(f/X)
For continuous series, H.M. = N / Σ(f/m)
#. An aeroplane covers the four sides of a square at speeds of 1000, 2000, 3000 and 4000 km
per hour respectively. What is the average speed of the plane in its flight around the
square?
Solution :
If we compute the arithmetic mean, we get the following answer :
x̄ = [1000 + 2000 + 3000 + 4000]/4 = 2500 km/hr
However, this is not the correct answer. Since equal distances are covered at different
speeds, the harmonic mean is the appropriate average.
H.M. = N / Σ(1/X)
= 4 / (1/1000 + 1/2000 + 1/3000 + 1/4000)
= 4 / (25/12000)
= 1920 km/hour
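The answer can be confirmed in two ways: by the harmonic mean formula, and from first principles as total distance divided by total time. In the Python sketch below the side length of 1000 km is an arbitrary assumption; any side length gives the same average speed.

```python
speeds = [1000, 2000, 3000, 4000]        # km per hour on the four sides


def harmonic_mean(values):
    """N divided by the sum of the reciprocals of the observations."""
    return len(values) / sum(1 / v for v in values)


hm = harmonic_mean(speeds)

# Check from first principles with an assumed side length.
side_km = 1000
total_distance = side_km * len(speeds)
total_time = sum(side_km / v for v in speeds)
average_speed = total_distance / total_time

print(round(hm, 2), round(average_speed, 2))
```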
Marks           : 10   20   25   40   50
No. of students : 20   30   50   15   5
Solution :
CALCULATION OF HARMONIC MEAN
Marks (X)    f          (f/X)
10           20         2
20           30         1.5
25           50         2
40           15         0.375
50           5          0.1
             N = 120    Σ(f/X) = 5.975
H.M. = N / Σ(f/X) = 120 / 5.975 = 20.08
50-60    3
Solution :
Class interval    Mid-point (m)    f         f/m
10-20             15               4         0.267
20-30             25               6         0.240
30-40             35               10        0.286
40-50             45               7         0.156
50-60             55               3         0.055
                                   N = 30    Σ(f/m) = 1.004
H.M. = N / Σ(f/m)
= 30/1.004 = 29.88
(a + b)/2 ≥ √(a x b)
Or, a + b ≥ 2√(ab)
Or, a + b - 2√(ab) ≥ 0
Or, (√a - √b)^2 ≥ 0
But the square of any real quantity is non-negative. Hence, (√a - √b)^2 will be
non-negative.
Hence, (a + b)/2 ≥ √(ab), i.e. A.M. ≥ G.M.
Let us now prove that G.M. ≥ H.M.
√(ab) ≥ 2ab/(a + b)
Or, a + b ≥ 2ab/√(ab)
Or, a + b ≥ 2√(ab)
This has already been proved above. Hence, G.M. ≥ H.M.
Therefore, A.M. ≥ G.M. ≥ H.M.
If a and b are equal, then A.M. = G.M. = H.M.
Thus, A.M. ≥ G.M. ≥ H.M.
In any distribution, when the original items differ in size, the values of A.M., G.M. and
H.M. would also differ and will be in the following order :
A.M. ≥ G.M. ≥ H.M.
The equality signs hold only if all the numbers X1, X2, ..., Xn are identical.
WHICH AVERAGE TO USE ?
The methods of computing various types of average have been discussed in detail above. The
question now is which type of average should be used under what conditions. The following
considerations influence the selection of an appropriate average :
The type of data available. If the data are badly skewed, avoid the Mean.
If the data are gappy around the middle, avoid the Median.
If the data are unequal in class-interval, avoid the Mode.
Arithmetic Mean - In the following cases the arithmetic mean should not be used :
In highly skewed distributions.
In distributions with open-end intervals.
To average ratios and rates of change.
When there are very large and very small items, as the extreme items will have an undue
influence.
Median - The median is generally the best average in open-end grouped distributions, especially
where the frequency curve, if plotted, is J-shaped or reverse J-shaped. For example, in the case
of an income distribution or a price distribution, very high or very low values would cause the
mean to be higher or lower than the most common values. In such cases, the median or middle
value of the series may be a more representative figure to use in describing the mass of data.
Mode - The mode is best suited where there is an outstandingly large frequency. The mode can
be used in problems involving the expression of preferences where quantitative
measurements are not possible. If we want to compare consumer preferences for different
kinds of products or different kinds of advertisements, we can compare the modal preferences
expressed by different groups of people, but we cannot calculate the median or mean. The mode is
a particularly useful average for discrete series, e.g. the number of people wearing a given size of shoe.
Geometric Mean - Geometric mean is useful for averaging ratios and percentages and in computing
average rates of increase or decrease. It is particularly important in Economics and Business
Statistics in Index Number construction.
Harmonic Mean - Harmonic mean is useful in problems in which values of a variable are
compared with a constant quantity of another variable, e.g. distance covered within a certain
time and quantities purchased or sold per unit.
MEASURES OF VARIATION
The various measures of central value discussed above give us one single figure that represents
the entire data. But the average alone cannot adequately describe a set of observations, unless all
the observations are the same. It is necessary to describe the variability or dispersion of the
observations. In two or more distributions, the central value may be the same but still there can
be wide disparities in the formation of the distributions. Measures of dispersion help us to study this
important characteristic of a distribution.
Some important definitions of dispersion are given below :
1 Dispersion is the measure of variation of the items. [A.L.Bowley]
2 The degree to which numerical data tend to spread about an average value is called the
variation or dispersion of the data. [Spiegel]
3 Dispersion or spread is the degree of the scatter or variation of the variable about a
central value.
4 The measurement of the scatteredness of the mass of figures in a series about an average
is called measure of variation or dispersion. [Simpson & Kafka]
interquartile range. In other words, the interquartile range represents the difference between the
third quartile and the first quartile.
Symbolically, Interquartile Range = Q3 - Q1
Quartile Deviation
Quartile deviation gives the average amount by which the two quartiles differ from the
median. In a symmetrical distribution, the two quartiles are equidistant from the median.
Quartile Deviation = (Q3 - Q1) / 2
Coefficient of Quartile Deviation
The relative measure corresponding to quartile deviation is called the Coefficient of Quartile
Deviation.
Coefficient of quartile deviation = (Q3 - Q1) / (Q3 + Q1)
#. You are given the frequency distribution of 292 workers of a factory according to
their average weekly income. Calculate the quartile deviation and its coefficient from the
following data :
Weekly Income (Rs.)    No. of workers      Weekly Income (Rs.)    No. of workers
Below 1350             8                   1450-1470              22
1350-1370              16                  1470-1490              15
1370-1390              39                  1490-1510              15
1390-1410              58                  1510-1530              9
1410-1430              60                  1530 & above           10
1430-1450              40
Solution :
Weekly income    No. of workers (f)    c.f.
Below 1350       8                     8
1350-1370        16                    24
1370-1390        39                    63
1390-1410        58                    121
1410-1430        60                    181
1430-1450        40                    221
1450-1470        22                    243
1470-1490        15                    258
1490-1510        15                    273
1510-1530        9                     282
1530 & above     10                    292
                 N = 292
Median = L + [(N/2 - p.c.f.) / f] x i = 1410 + [(146 - 121)/60] x 20 = 1410 + 8.33 = 1418.33
Q1 = L + [(N/4 - p.c.f.) / f] x i = 1390 + [(73 - 63)/58] x 20 = 1390 + 3.45 = 1393.45
Q3 = L + [(3N/4 - p.c.f.) / f] x i = 1430 + [(219 - 181)/40] x 20 = 1430 + 19 = 1449
Quartile Deviation = (Q3 - Q1)/2 = (1449 - 1393.45)/2 = 27.78
Coefficient of Q.D. = (Q3 - Q1)/(Q3 + Q1) = 55.55/2842.45
= 0.020
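The quartile deviation and its coefficient can be computed with the same interpolation used for the median. In the Python sketch below the function name `quartile` is hypothetical, and the open-end classes are closed with the common width of 20 ("Below 1350" is treated as 1330-1350 and "1530 & above" as 1530-1550). That assumption does not affect the result here, because neither quartile falls in an open-end class.

```python
def quartile(lower_limits, width, freqs, j):
    """j-th quartile: L + [(jN/4 - p.c.f.) / f] x i."""
    target = j * sum(freqs) / 4
    cumulative = 0
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("j must be 1, 2 or 3")


# Lower limits 1330, 1350, ..., 1530 (open-end classes closed by assumption).
lower_limits = list(range(1330, 1531, 20))
freqs = [8, 16, 39, 58, 60, 40, 22, 15, 15, 9, 10]

q1 = quartile(lower_limits, 20, freqs, 1)
q3 = quartile(lower_limits, 20, freqs, 3)
quartile_deviation = (q3 - q1) / 2
coefficient = (q3 - q1) / (q3 + q1)
print(round(q1, 2), round(q3, 2),
      round(quartile_deviation, 2), round(coefficient, 3))
```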
an average. The two other measures, namely, the average deviation and standard deviation,
help us in achieving this goal.
The mean deviation is also known as average deviation. It is the average difference between
the items in a distribution and the median or mean of that series. Theoretically there is an
advantage in taking the deviations from the median because the sum of deviations of items
from the median is minimum when signs are ignored. However, in practice, the arithmetic
mean is more frequently used in calculating the value of average deviation and this is the
reason why it is more commonly called mean deviation. The mean deviation is obtained by
calculating the absolute deviations of each observation from the median (or mean), and then
averaging these deviations by taking their arithmetic mean.
1 Computation of Mean Deviation : Individual Observations
If X1, X2, X3, ..., XN are N given observations, then the mean deviation about an
average A is given by
M.D. = (1/N) Σ|X - A| = (1/N) Σ|D|
Where |D| = |X - A|
Coefficient of mean deviation : The relative measure corresponding to the mean
deviation is called the coefficient of mean deviation. This is obtained by dividing mean
deviation by the particular average used in computing mean deviation. Thus, if mean
deviation is computed from median, the coefficient of mean deviation shall be obtained
by dividing mean deviation by median.
Coefficient of M.D. = M.D. / Median
#. Calculate the mean deviation and its coefficient of the two income groups of five and
seven members given below :
I (Rs.)   II (Rs.)
4000      3000
4200      4000
4400      4200
4600      4400
4800      4600
          4800
          5800
SOLUTION :
Group I    Deviation from median 4400 |D|
4000       400
4200       200
4400       0
4600       200
4800       400
N=5        Σ|D| = 1200

Group II   Deviation from median 4400 |D|
3000       1400
4000       400
4200       200
4400       0
4600       200
4800       400
5800       1400
N=7        Σ|D| = 4000
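The remaining arithmetic for the two groups, M.D. = Σ|D| / N and Coefficient of M.D. = M.D. / Median, can be sketched in Python as follows. The helper name mean_deviation is hypothetical; the data are the two income groups from the problem.

```python
from statistics import median


def mean_deviation(values, center):
    """Average absolute deviation of the values from the chosen average."""
    return sum(abs(x - center) for x in values) / len(values)


group_1 = [4000, 4200, 4400, 4600, 4800]
group_2 = [3000, 4000, 4200, 4400, 4600, 4800, 5800]

for name, group in (("Group I", group_1), ("Group II", group_2)):
    med = median(group)                 # 4400 for both groups
    md = mean_deviation(group, med)     # sum of |D| divided by N
    print(name, "M.D. =", round(md, 2),
          "Coefficient of M.D. =", round(md / med, 4))
```

Group I gives 1200/5 = 240 and Group II gives 4000/7, about 571.43, so the second group is the more variable one even though both have the same median.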
SOLUTION :
CALCULATION OF MEAN DEVIATION
X     f      c.f.   |D|   f|D|
10    3      3      2     6
11    12     15     1     12
12    18     33     0     0
13    12     45     1     12
14    3      48     2     6
      N=48                Σf|D| = 36
M.D. = (1/N) Σf|D| = 36/48 = 0.75
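For a discrete series the same calculation weights each |D| by its frequency. The sketch below assumes the median is taken as the size of the (N+1)/2-th item, which is the usual textbook convention and gives 12 for this table; both function names are hypothetical.

```python
def discrete_median(values, freqs):
    """Value of the (N+1)/2-th item, located through cumulative frequency."""
    n = sum(freqs)
    cum = 0
    for x, f in zip(values, freqs):
        cum += f
        if cum >= (n + 1) / 2:
            return x
    raise ValueError("empty distribution")


def mean_deviation_discrete(values, freqs, center):
    """M.D. = sum of f * |X - center| divided by N."""
    n = sum(freqs)
    return sum(f * abs(x - center) for x, f in zip(values, freqs)) / n


X = [10, 11, 12, 13, 14]
F = [3, 12, 18, 12, 3]

med = discrete_median(X, F)
md = mean_deviation_discrete(X, F, med)
print(med, md)
```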
X̄ = ΣfX / N = 160/20 = 8
M.D. = Σf|D| / N = 56/20 = 2.8
Frequency
16
14
8
Solution :
Size (i=10)   f|D|
0-10          211.4
10-20         242.4
20-30         183.6
30-40         5.0
40-50         156.8
50-60         277.2
60-70         238.4
              Σf|D| = 1314.8
M.D. = (1/N) Σf|D| = 1314.8/100 = 13.148
The reason for taking absolute deviation is to avoid the signs since we want to find out the
amount of differences of observations from median rather than the direction of the
differences.
DISPERSION
The most comprehensive descriptions of dispersion are those that deal with the average
deviation from some measure of central tendency. Two of these measures are :
1 Variance
2 Standard deviation.
Both of these tell us an average distance of any observation in the data set from the mean of
the distribution.
STANDARD DEVIATION
The standard deviation concept was introduced by Karl Pearson in 1893. It is the most widely
used measure of studying dispersion. The standard deviation is also known as Root Mean
Square Deviation for the reason that it is the square root of the mean of the squared
deviation from the arithmetic mean. The standard deviation measures the absolute dispersion;
the greater the standard deviation, the greater will be the magnitude of the deviations of the
values from their mean. A small standard deviation means a high degree of uniformity of the
observations as well as homogeneity of the series; a large standard deviation means just the
opposite.
VARIANCE : The variance is the average of the squared distances of the
observations from the mean. Every population has a variance, which is
symbolised by σ².
σ² = Σ(X - μ)² / N = ΣX² / N - μ²
Where
σ² = Variance
X = Item or observation
μ = Population mean
N = Total number of items in the population
CALCULATION OF STANDARD DEVIATION
X
240.12
240.13
240.15
240.12
240.17
240.15
240.17
240.16
240.22
240.21
N=10
σ = √[ (Σd²/N) - (Σd/N)² ]
= √[ (0.2666/10) - (1.6/10)² ] = 0.033
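The sums in this calculation can be reproduced with a few lines of Python. The assumed mean of 240 is not stated in the surviving text; it is an assumption that happens to reproduce Σd = 1.6 and Σd² = 0.2666. The result is cross-checked against the population standard deviation from Python's statistics module.

```python
from math import sqrt
from statistics import pstdev

observations = [240.12, 240.13, 240.15, 240.12, 240.17,
                240.15, 240.17, 240.16, 240.22, 240.21]
assumed_mean = 240  # ASSUMPTION: chosen because it reproduces the sums in the text

d = [x - assumed_mean for x in observations]   # deviations from the assumed mean
n = len(d)
sum_d = sum(d)
sum_d2 = sum(v * v for v in d)

# Assumed mean method: sigma = sqrt(sum(d^2)/N - (sum(d)/N)^2)
sigma = sqrt(sum_d2 / n - (sum_d / n) ** 2)
print(round(sum_d, 2), round(sum_d2, 4), round(sigma, 3))

# The shortcut must agree with the direct definition of sigma.
print(round(pstdev(observations), 3))
```

The choice of assumed mean does not change the answer; it only keeps the deviations small enough to handle by hand.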
2.FOR DISCRETE SERIES
For calculating standard deviation in discrete series, any of the following methods may be
applied :
(a)Actual Mean Method
(b)Assumed Mean Method
(c)Step Deviation Method
(c) Step Deviation Method : When this method is used, we take deviations of mid-points
from an Assumed Mean and divide these deviations by the width of Class Interval, i.e. i.
In such case, σ = √[ (Σfd²)/N - (Σfd/N)² ] x i, where d = (X - A)/i and i = Class Interval
# The annual salaries of a group of employees are given in the following table :
Salaries (In Rs.000)   45   50   55   60   65   70   75   80
No. of persons         3    5    8    7    9    7    4    7
Calculate the standard deviation of the salaries.
f.d² : 27, 20, 8, 0, 9, 28, 36, 112
Σf.d² = 240
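The step deviation formula can be checked on the salary data with a small Python sketch. The assumed mean A = 60 and interval i = 5 are assumptions; they are not stated in the surviving text but they reproduce the f.d² values listed above. The function name step_deviation_sd is hypothetical.

```python
from math import sqrt


def step_deviation_sd(values, freqs, assumed_mean, interval):
    """Return (sigma, sum of fd, sum of fd^2) with d = (X - A) / i."""
    n = sum(freqs)
    d = [(x - assumed_mean) / interval for x in values]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))
    sigma = sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * interval
    return sigma, sum_fd, sum_fd2


salaries = [45, 50, 55, 60, 65, 70, 75, 80]   # in Rs. thousand
persons = [3, 5, 8, 7, 9, 7, 4, 7]

# A = 60 and i = 5 are assumed values, not taken from the text.
sigma, sum_fd, sum_fd2 = step_deviation_sd(salaries, persons, 60, 5)
print(sum_fd, sum_fd2, round(sigma, 2))
```

With these assumptions Σfd = 36, Σfd² = 240 and N = 50, so the standard deviation comes to roughly 10.35 (Rs. thousand).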
Mid-point (m)   f     d=(m-35)/10   fd     fd²
5               5     -3            -15    45
15              12    -2            -24    48
25              30    -1            -30    30
35              45    0             0      0
45              50    +1            +50    50
55              37    +2            +74    148
65              21    +3            +63    189
                N=200               Σfd=118   Σfd²=510
No. of employed   Standard deviation (In Rs.)
50                60
60                70
90                80
Solution :
X̄123 = [ N1X̄1 + N2X̄2 + N3X̄3 ] / [ N1 + N2 + N3 ]
= [(50 x 1113) + (60 x 1120) + (90 x 1115)] / [50 + 60 + 90]
= 223200/200 = Rs.1116
d1 = | X̄1 - X̄123 | = | 1113 - 1116 | = 3
d2 = | X̄2 - X̄123 | = | 1120 - 1116 | = 4
d3 = | X̄3 - X̄123 | = | 1115 - 1116 | = 1
σ123 = √[ (N1σ1² + N2σ2² + N3σ3² + N1d1² + N2d2² + N3d3²) / (N1 + N2 + N3) ]
= √[ (50 x 60² + 60 x 70² + 90 x 80² + 50 x 3² + 60 x 4² + 90 x 1²) / (50+60+90) ]
= √[ (180000 + 294000 + 576000 + 450 + 960 + 90) / 200 ]
= √[ 1051500 / 200 ]
= √5257.5
= 72.51
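The combined mean and combined standard deviation can be verified with a short Python sketch. The function name combined_mean_sd is hypothetical; the sizes, means and standard deviations are the ones used in the solution.

```python
from math import sqrt


def combined_mean_sd(sizes, means, sds):
    """Combine group means and standard deviations into overall values.

    Each group contributes N * (sigma^2 + d^2), where d is the distance
    of the group mean from the combined mean.
    """
    n = sum(sizes)
    mean = sum(k * m for k, m in zip(sizes, means)) / n
    total = sum(k * (s ** 2 + (m - mean) ** 2)
                for k, m, s in zip(sizes, means, sds))
    return mean, sqrt(total / n)


mean, sd = combined_mean_sd([50, 60, 90], [1113, 1120, 1115], [60, 70, 80])
print(mean, round(sd, 2))
```

Because the three group means lie close together, almost all of the combined variance comes from the within-group terms Nσ².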
X     Y
35    108
54    107
52    105
53    105
56    106
58    107
52    104
50    103
51    104
49    101
Solution :
In order to find out which share is more stable, we compare the coefficients of
variation. Here x = X - X̄ and y = Y - Ȳ.
X     x²     Y      y²
35    256    108    9
54    9      107    4
52    1      105    0
53    4      105    0
56    25     106    1
58    49     107    4
52    1      104    1
50    1      103    4
51    0      104    1
49    4      101    16
ΣX = 510   Σx² = 350   ΣY = 1050   Σy² = 40
X̄ = 510/10 = 51
σ = √[(Σx²)/N] = √[350/10] = 5.916
C.V. = [σ/X̄] x 100 = [5.916/51] x 100 = 11.6
Ȳ = 1050/10 = 105
σ = √[(Σy²)/N] = √[40/10] = 2
C.V. = [σ/Ȳ] x 100 = [2/105] x 100 = 1.905
Since the coefficient of variation is much smaller for the shares of Company Y, they
are more stable than those of Company X.
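The comparison of the two coefficients of variation can be reproduced with Python's statistics module. This is a minimal sketch using the population standard deviation (division by N), as in the solution; the function name coefficient_of_variation is hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """C.V. = (population standard deviation / mean) x 100."""
    return pstdev(values) / mean(values) * 100


share_x = [35, 54, 52, 53, 56, 58, 52, 50, 51, 49]
share_y = [108, 107, 105, 105, 106, 107, 104, 103, 104, 101]

cv_x = coefficient_of_variation(share_x)
cv_y = coefficient_of_variation(share_y)
print(round(cv_x, 1), round(cv_y, 3))

# The series with the smaller C.V. is the more stable one.
print("More stable:", "Y" if cv_y < cv_x else "X")
```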