
Meanings of Statistics

Nowadays the word "statistics" is used in the following three senses:
• Firstly, it is used in the plural sense to refer to numerical facts in any field of study. The
word "data" is used in the same sense and is always plural.
• Secondly, the word statistics is used in the singular sense. In this sense, it refers to the
science comprising methods for the collection, presentation, analysis and interpretation of numerical
data.
• Thirdly, the word statistics is used in a technical sense as the plural of "statistic". By a
statistic we mean a quantity calculated from sample observations.

Characteristics of Statistics
Statistics (as data) have the following characteristics:
• Statistics are aggregates of facts.
• Statistics are affected to a great extent by a multiplicity of causes.
• Statistics are numerically expressed.
• Statistics are collected in a systematic manner.

Importance of Statistics
Statistics is important (has applications or scope) in many different fields:
• Statistics plays an important role in business.
• It is the eyes of the administration of the state.
• Statistical data and methods are needed by insurance companies.
• It has a pivotal position in almost all the natural and social sciences.

Variable and Constant


Variable
Any quantitative characteristic which varies (changes) from one individual or object to
another is called a variable. For example: height, weight, family size etc.
Constant
Any quantity which does not change but remains fixed is called a constant. For example:
$\pi = 3.14159\ldots$ and $e = 2.71828\ldots$

Discrete and Continuous variable


Discrete Variable
A variable which can take only a countable number of values (usually integers) is called a discrete variable. For
example: family size, number of students etc.
Continuous variable
A variable which can take any measurable value in an interval (including fractional values) is called a
continuous variable. For example: weight, height and age of students etc.

Qualitative variable and Quantitative variable


Qualitative variable
A variable which changes only in quality (and is incapable of numerical measurement) from one
individual to another is called a qualitative variable. For example: marital status
(single, married, divorced etc.).
Quantitative variable
A variable which changes in quantity from one individual to another is called
a quantitative variable. For example: per capita income of a country, age, weight etc.

Prepared by Noman Rasheed (M.Phil Statistics, M.Ed) Page 1


Primary and Secondary data
Primary data
Data published or used by the organization which originally collected them are called
primary data. For example, the population census reports are primary data.

Secondary data
Data published or used by an organization other than the one which originally collected
them are called secondary data. For example, the data in the Economic Survey of Pakistan are
secondary data.

Sources of primary and secondary data


The sources of primary and secondary data are as follows.
Sources of primary data
• Direct personal observation
• Registration
• Estimation through local correspondence
• Investigation through enumerators
• Information through mailed questionnaire
Sources of secondary data
• Official sources
• Semi-official sources
• Private sources
• Publications of research organizations

Population and Sample


Population
A set of individuals or objects having some common characteristic of interest is called a
population.
Sample
A subset or part of the population selected for study is called a sample.

Descriptive and Inferential Statistics


Descriptive Statistics
The branch of statistics in which we summarize, analyze and interpret the collected data themselves,
without drawing conclusions beyond those data, is called descriptive statistics.
Inferential Statistics
The branch of statistics in which we analyze and interpret the collected and arranged
data and also draw conclusions about the population from the results obtained from those
data is called inferential statistics.
Presentation
Presentation is the manner or style in which something is expressed.
Presentation of data
Arranging and reducing raw data into a form which is easy to understand, analyze
and interpret is called presentation of data.
Methods of presenting data
The data can be presented by the following methods:
• Classification
• Tabulation
• Diagrams
• Graphs
Classification
The process of arranging the data into classes or categories according to some
common characteristics present in the data is called classification.
Characteristics of good classification
• Classification should be unambiguous
• Classification should be stable
• Classification should be rigid
Bases of classification
There are four bases of classification:
• Qualitative classification
• Quantitative classification
• Geographical classification
• Chronological or temporal classification
Qualitative classification
When data are classified by attributes such as sex, religion, marital status etc., the classification is called
qualitative classification.
Quantitative classification
When data are classified by quantitative characteristics such as height, weight, income
etc., the classification is called quantitative classification.
Geographical classification
When data are classified by geographical regions or locations (for example, the population
of a country classified by provinces, divisions, districts or towns), the classification is called
geographical classification.
Chronological or Temporal classification
When data are arranged by time of occurrence such as years, months, weeks, days,
hours etc., the classification is called chronological or temporal classification. A temporal arrangement of values
at equal intervals is also called a time series.
Types of classification
The data may be classified or presented by one, two or more characteristics at a time.
• One-way classification
• Two-way classification
• Three-way classification
• Many-way classification
One-way classification
When the data are classified according to one characteristic, it is called one-way
classification.
Two-way classification
When the data are classified according to two characteristics at a time, classification is
said to be two-way classification.
Three-way classification
When the data are classified according to three characteristics at a time, classification
is said to be three-way classification.
Many-way classification
When data are classified according to many characteristics at a time, classification is
said to be many-way classification.



Table
A table is a systematic arrangement of data into vertical columns and horizontal rows.
Tabulation
The process of arranging the data into rows and columns is called tabulation.
Parts of a table
A statistical table has the following parts:
• Title
• Column captions and box-head
• Row captions and stub
• Body of the table and arrangement of data
• Source note
• Prefatory notes and footnotes
Ungrouped data (raw data)
Data which are not presented in the form of a frequency distribution are called ungrouped
data.
Array
Data arranged in ascending or descending order of magnitude is called an array.
Frequency distribution (grouped data)
An arrangement of data or objects into different classes together with their frequencies is called a
frequency distribution.
Continuous frequency distribution (continuous series)
If the data of a continuous variable are arranged into different classes with their frequencies, the result is called a
continuous frequency distribution.
Discrete frequency distribution (discrete series)
If the data of a discrete variable are arranged into different classes with their frequencies, the result is called a
discrete frequency distribution.
Frequency
The number of values which fall in a class is called the frequency of that class, denoted by $f$.
Steps in the construction of a frequency distribution
The following steps are taken when making a frequency distribution of a continuous
series.
• Calculate the range of the data, where
Range = Maximum value in the data - Minimum value in the data.
• Decide on the number of classes using the formula
$K = 1 + 3.3 \log_{10}(n)$, where $n$ is the total number of values.
• Decide on the width of the class. It is usually denoted by $h$ and is obtained from the
relation
$h = \dfrac{\text{Range}}{\text{Number of classes}}$
• Decide what the lower class limit (or the lower class boundary) of the
lowest class should be.
• Find the upper class boundary by adding the class interval size to the lower class
boundary, and then determine the upper class limits.
• Distribute the values of the data (raw data) into the classes to obtain the frequency of each class.
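The steps above can be sketched in Python. This is only an illustration under stated assumptions: the function name `frequency_distribution`, the sample list `marks`, rounding the number of classes up, and starting the first class at the minimum value are all choices made for this example, and the data are assumed not to be all equal.

```python
import math

def frequency_distribution(values):
    """Group raw values into K equal-width classes, K = 1 + 3.3*log10(n)."""
    n = len(values)
    lowest, highest = min(values), max(values)
    data_range = highest - lowest                  # Range = max - min
    k = math.ceil(1 + 3.3 * math.log10(n))         # number of classes (rounded up: an assumption)
    h = data_range / k                             # class width h = Range / number of classes
    counts = [0] * k
    for v in values:
        # the maximum value is placed in the last class
        idx = min(int((v - lowest) / h), k - 1)
        counts[idx] += 1
    classes = [(lowest + i * h, lowest + (i + 1) * h) for i in range(k)]
    return classes, counts

# Hypothetical raw data, used only for illustration
marks = [12, 15, 17, 21, 22, 25, 28, 30, 31, 35, 38, 40, 41, 45, 47, 52]
classes, counts = frequency_distribution(marks)
for (lower, upper), f in zip(classes, counts):
    print(f"{lower:6.2f} to {upper:6.2f}: f = {f}")
```

In practice the class width is usually rounded to a convenient number before the classes are written down; the sketch keeps the exact quotient for simplicity.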
Class limits
The smallest and largest values in any given class are called class limits.



Class Boundaries
The class boundaries are obtained by increasing the upper class limits and decreasing the
lower class limits by the same amount so that there are no gaps between consecutive classes.
Size of class interval
The size of the class interval (also called the class width or class length) is the difference
between the upper class boundary and the lower class boundary. If all the class intervals of a
frequency distribution are of equal size, the common width is denoted by $h$.
The class mark or midpoint
The class mark or midpoint is the value which divides a class into two equal parts. For
example, if we have class limits 110-119 then the midpoint is $\dfrac{110 + 119}{2} = 114.5$.
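Write the class boundaries and class width of
• 2.5 - 3.4
• -3 - (+3)
Class boundaries are
• 2.45 - 3.45 (assuming values are recorded to one decimal place, so each limit is extended by 0.05)
• -3.5 - 3.5 (assuming values are recorded as whole numbers, so each limit is extended by 0.5)
Class width or class interval
• 3.45 - 2.45 = 1.0
• 3.5 - (-3.5) = 7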
Relative frequency
The relative frequency of a particular class is equal to the class frequency divided by
the total frequency.
$$\text{Relative frequency} = \frac{f}{\sum f}$$
Given the frequencies of 5 classes 2, 6, 7, 3, 4, find the relative frequencies.
We know that the relative frequency of a class is $\dfrac{f}{\sum f}$, and here $\sum f = 22$.
f    Relative frequency
2    2/22 = 0.09091
6    6/22 = 0.27273
7    7/22 = 0.31818
3    3/22 = 0.13636
4    4/22 = 0.18182
$\sum f = 22$
Cumulative frequency
The cumulative frequency of a class is the sum of the frequencies of that class and all the preceding classes of a
frequency distribution.
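As a small illustration, relative and cumulative frequencies can be computed for the five class frequencies 2, 6, 7, 3, 4 used in the example above. The variable names are chosen for this sketch only.

```python
from itertools import accumulate

frequencies = [2, 6, 7, 3, 4]
total = sum(frequencies)                          # sum of f

# relative frequency = f / sum of f
relative = [f / total for f in frequencies]

# cumulative frequency = running total of the class frequencies
cumulative = list(accumulate(frequencies))

for f, rf, cf in zip(frequencies, relative, cumulative):
    print(f"f = {f}  relative = {rf:.5f}  cumulative = {cf}")
```

The relative frequencies always add up to 1, and the last cumulative frequency equals the total frequency.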
Graphs
A graph consists of curves or straight lines. They bring to light the salient features of the data
at a glance and render comparison of two or more statistical series easy.
Graphs of frequency distributions
The graphs of frequency distributions are:
• Histogram
• Frequency polygon
• Frequency curves
• Cumulative frequency polygon or ogive



Histogram and historigram
A graphical representation of a frequency distribution in the form of adjacent bars, whose heights
are proportional to the frequencies taken along the y-axis against class boundaries plotted along the
x-axis, is called a histogram.
The graph of a time series is called a historigram. In it the variable time is taken along the x-axis and
the observed values along the y-axis.
Ogive (cumulative frequency polygon)
A graph showing the cumulative frequencies plotted against the upper class boundaries is
called a cumulative frequency polygon or an ogive.
Steps in construction of an ogive
The following steps are required to construct the ogive or cumulative frequency polygon:
• Compute the cumulative frequencies.
• Mark off the class intervals along the x-axis.
• Plot the cumulative frequencies against the upper class boundaries of the class intervals.
• Join the points with a smooth curve.
Charts or Diagrams
Charts or diagrams give visual representations of magnitudes, groupings, trends and patterns
in the data. Diagrams also show comparisons between two or more series of data.
Types of Charts or Diagrams
The main types of charts are:
• Simple Bar Chart
• Multiple Bar Chart
• Component Bar Chart
• Percentage Component Bar Chart
• Pie Chart
Multiple Bar Diagram
This diagram is an extension of the simple bar diagram: when simple bar diagrams are
placed side by side to represent two or more sets of interrelated data in one diagram, it is
called a multiple bar diagram.

Average
A single value which represents all the values of the data set is called an average.
Why is an average called a measure of central tendency?
The average value tends to lie at the center of the data; that is why it is called a measure of central tendency.
Qualities of a good average
A good average should possess the following qualities.
• It should be clearly defined by a mathematical formula.
• It should be based on all the values of the data.
• It should be simple to understand and easy to calculate.
• It should not be affected by extreme values.
• It should have sampling stability.
Types of average
The types of average are:
• The arithmetic mean or mean ($\bar{x}$)
• The geometric mean (G.M)
• The harmonic mean (H.M)
• The median ($\tilde{x}$)
• The mode ($\hat{x}$)



The Arithmetic mean
The arithmetic mean, or simply the mean, is defined as the value obtained by dividing the sum of
the values by their number, and is denoted by $\bar{X}$. Let $x_1, x_2, x_3, \ldots, x_n$ be $n$ values; then $\bar{X}$ is defined
as
$$\bar{X} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum x}{n}$$
Similarly, in grouped data where each $x$ has a corresponding class frequency $f$, the
arithmetic mean is defined as
$$\bar{X} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum fx}{\sum f}$$
Methods to calculate the Arithmetic Mean
Three methods are used to calculate the arithmetic mean:
• Direct method
• Short cut method
• Coding method

Direct Method
Ungrouped data: $\bar{X} = \dfrac{\sum X}{n}$
Grouped data: $\bar{X} = \dfrac{\sum fx}{\sum f}$

Short cut Method
Ungrouped data: $\bar{X} = A + \dfrac{\sum D}{n}$
Grouped data: $\bar{X} = A + \dfrac{\sum fD}{\sum f}$
where $D = X - A$ and $A$ is an arbitrary constant.

Coding Method
Ungrouped data: $\bar{X} = A + \dfrac{\sum U}{n} \times h$
Grouped data: $\bar{X} = A + \dfrac{\sum fU}{\sum f} \times h$
where $U = \dfrac{X - A}{h}$, $A$ is an arbitrary constant, and $h$ is the common class interval in grouped data or the
common difference between values in ungrouped data.
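The three methods give the same answer, which can be checked numerically. The sketch below is only an illustration for grouped data; the function names, the class midpoints, the frequencies and the choices $A = 25$, $h = 10$ are hypothetical.

```python
def mean_direct(x, f):
    """Direct method: sum(f*x) / sum(f)."""
    return sum(fi * xi for fi, xi in zip(f, x)) / sum(f)

def mean_shortcut(x, f, A):
    """Short cut method: A + sum(f*D) / sum(f), with D = x - A."""
    return A + sum(fi * (xi - A) for fi, xi in zip(f, x)) / sum(f)

def mean_coding(x, f, A, h):
    """Coding method: A + (sum(f*U) / sum(f)) * h, with U = (x - A) / h."""
    return A + sum(fi * ((xi - A) / h) for fi, xi in zip(f, x)) / sum(f) * h

# Hypothetical grouped data: class midpoints and class frequencies
midpoints = [5, 15, 25, 35, 45]
freqs = [2, 6, 7, 3, 4]

direct = mean_direct(midpoints, freqs)
shortcut = mean_shortcut(midpoints, freqs, A=25)
coding = mean_coding(midpoints, freqs, A=25, h=10)
print(direct, shortcut, coding)
```

The short cut and coding methods only reduce the size of the numbers handled by hand; with a computer the direct method is normally sufficient.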
If $\sum fx = 308$ and $\sum f = 30$, can we find the mean, and what is its value?
Yes, we can find the mean, and its value is obtained as
$$\bar{X} = \frac{\sum fx}{\sum f} = \frac{308}{30} \approx 10.27$$
The sum of the deviations of 15 values from 20 is 45. Find the arithmetic mean.
Here
$n = 15$, $A = 20$ and $\sum D = 45$.
Then, by the short cut method,
$$\bar{X} = A + \frac{\sum D}{n} = 20 + \frac{45}{15} = 20 + 3 = 23$$
Properties of Arithmetic mean
The arithmetic mean has the following properties.
• The sum of the deviations of the values from their mean is always equal to zero.
Symbolically,
$$\sum (X - \bar{X}) = 0$$
• The sum of the squared deviations of the values from a constant $A$ is minimum if and only if $A = \bar{X}$.
Symbolically, for any $A \neq \bar{X}$,
$$\sum (X - \bar{X})^2 < \sum (X - A)^2$$
• If $n_1$ values have mean $\bar{X}_1$, $n_2$ values have mean $\bar{X}_2$, and so on up to $n_k$ values with mean $\bar{X}_k$,
then the mean of all the values is
$$\bar{X}_C = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2 + \cdots + n_k \bar{X}_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum n\bar{X}}{\sum n}$$
• The arithmetic mean is affected by change of origin and change of scale.

If for any distribution $\sum (x - 10) = -14$, $\sum (x - 20) = 14$ and $\sum (x - 15) = 0$, then
what is the mean?
By the property of the arithmetic mean, we know that "the sum of the deviations of the values from their mean is
always equal to zero".
Therefore, since
$\sum (x - 15) = 0$, we have $\bar{X} = 15$.
We have $\sum (x - 15)^2 = 868$, $\sum (x - 16)^2 = 720$ and $\sum (x - 20)^2 = 982$. What is the
value of the mean, and why?
By the property of the arithmetic mean, we know that "the sum of the squared deviations of the values from a
constant $A$ is minimum if and only if $A = \bar{X}$".
Of the three sums given,
$\sum (x - 16)^2 = 720$ is the minimum; therefore $\bar{X} = 16$.
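The two properties used in these examples can be checked numerically. The following sketch uses a made-up data set and a helper `sum_sq` introduced only for this illustration.

```python
# Hypothetical data set, used only to illustrate the two properties
data = [4, 8, 15, 16, 23, 42]
mean = sum(data) / len(data)

# Property 1: the deviations from the mean add up to zero
deviation_sum = sum(x - mean for x in data)

def sum_sq(a):
    """Sum of squared deviations of the data from the constant a."""
    return sum((x - a) ** 2 for x in data)

# Property 2: the sum of squared deviations is smallest when taken about the mean
print(deviation_sum)
print(sum_sq(mean), sum_sq(mean - 1), sum_sq(mean + 1))
```

Moving the constant one unit away from the mean in either direction increases the sum of squares by $n$, the number of values.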
Weighted Arithmetic mean
In the simple arithmetic mean, equal importance or weight is given to all values of the data. When
the values in a data set are not of equal importance, we assign them weights ($w_1, w_2, \ldots, w_n$)
according to their relative importance, and the mean so computed is called the weighted mean, denoted by $\bar{X}_w$.
If $x_1, x_2, \ldots, x_n$ are $n$ values with corresponding weights $w_1, w_2, \ldots, w_n$, then $\bar{X}_w$ is defined as
$$\bar{X}_w = \frac{\sum wx}{\sum w}$$
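A minimal sketch of the weighted mean follows. The function name `weighted_mean` and the marks and weights are hypothetical, chosen only to show the formula at work.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w*x) / sum(w)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Hypothetical marks in three papers with weights 1, 2 and 3
marks = [70, 80, 90]
weights = [1, 2, 3]

weighted = weighted_mean(marks, weights)       # (70 + 160 + 270) / 6
simple = weighted_mean(marks, [1, 1, 1])       # equal weights give the simple mean
print(weighted, simple)
```

With equal weights the weighted mean reduces to the simple arithmetic mean.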
Geometric mean
The geometric mean is defined as the $n$th root of the product of $n$ positive values and is
denoted by G.M. Let $x_1, x_2, x_3, \ldots, x_n$ be positive values ($x_i > 0$); then the
geometric mean is defined as
$$G.M = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$$
Similarly, in grouped data where each $x$ has a corresponding class frequency $f$,
$$G.M = \sqrt[\sum f]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_k^{f_k}}$$
Taking the log of each value, the geometric mean is also obtained as
$$G.M = \text{antilog}\left(\frac{\sum \log X}{n}\right)$$
and for grouped data
$$G.M = \text{antilog}\left(\frac{\sum f \log X}{\sum f}\right)$$
The G.M of three positive numbers is 4. By including a fourth number in the series, the G.M
becomes 2. What is the fourth number?
Let $a$, $b$, $c$, $d$ be four positive numbers. The G.M of the first three numbers is
$$G.M = \sqrt[3]{abc} = (abc)^{1/3} = 4$$
so
$abc = 4^3 = 64$ ……(A) {product of the three positive numbers}
Now the G.M of the four numbers is
$$G.M = \sqrt[4]{abcd} = (abcd)^{1/4} = 2$$
so
$abcd = 2^4 = 16$ ……(B)
Putting (A) in (B),
$$64d = 16 \implies d = \frac{16}{64} = 0.25$$
Therefore the 4th number is 0.25.
Harmonic Mean
The harmonic mean is defined as the reciprocal of the mean of the
reciprocals of the values and is denoted by H.M. Let $x_1, x_2, x_3, \ldots, x_n$ be $n$ non-zero values; then the harmonic mean
is defined as
$$H.M = \frac{n}{\sum \frac{1}{x}}$$
and for grouped data, where each $x$ has a corresponding class frequency $f$,
the harmonic mean is defined as
$$H.M = \frac{\sum f}{\sum \frac{f}{x}}$$
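The geometric and harmonic means for ungrouped data can be sketched as follows. The function names and the numbers 2, 4, 8 are assumptions made for this illustration; they are simply one set of three positive values whose G.M is 4, so that adding 0.25 brings the G.M down to 2.

```python
import math

def geometric_mean(values):
    """G.M via logs: antilog of the mean of the logs (all values must be positive)."""
    return math.exp(sum(math.log(x) for x in values) / len(values))

def harmonic_mean(values):
    """H.M: n divided by the sum of the reciprocals (no value may be zero)."""
    return len(values) / sum(1 / x for x in values)

three = [2, 4, 8]                 # product 64, so G.M = 4
four = three + [0.25]             # product 16, so G.M = 2

print(geometric_mean(three), geometric_mean(four))
print(harmonic_mean(three))       # 3 / (1/2 + 1/4 + 1/8)
```

Natural logarithms are used here; any base gives the same result as long as the antilog is taken in the same base.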
The median
A single value which divides an arrayed set of data or a distribution into two equal parts is
called the median, denoted by $\tilde{x}$. For ungrouped data and grouped data (discrete series) it is
obtained as
$$\tilde{x} = \frac{(n + 1)}{2}\text{th value}$$
Similarly, for grouped data (continuous series) it is obtained as
$$\tilde{x} = l + \frac{h}{f}\left(\frac{n}{2} - c\right)$$
where
$l$ = lower class boundary of the median class (the median class is the class corresponding to
the cumulative frequency in which $\frac{n}{2}$ lies),
$h$ = class interval size of the median class,
$f$ = frequency of the median class,
$n$ = number of values or total frequency,
$c$ = cumulative frequency of the class preceding the median class.
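The grouped-data formula can be sketched in Python as below. The function name `grouped_median`, the class boundaries and the frequencies are hypothetical; the median class is taken to be the first class whose cumulative frequency reaches $n/2$.

```python
from itertools import accumulate

def grouped_median(lower_boundaries, h, freqs):
    """Median of a continuous frequency distribution: l + (h/f) * (n/2 - c)."""
    n = sum(freqs)
    cumulative = list(accumulate(freqs))
    # median class: first class whose cumulative frequency reaches n/2
    i = next(idx for idx, cf in enumerate(cumulative) if cf >= n / 2)
    l = lower_boundaries[i]                     # lower boundary of the median class
    f = freqs[i]                                # frequency of the median class
    c = cumulative[i - 1] if i > 0 else 0       # cumulative frequency of the preceding class
    return l + (h / f) * (n / 2 - c)

# Hypothetical classes 0-10, 10-20, 20-30, 30-40, 40-50
lower = [0, 10, 20, 30, 40]
freqs = [2, 6, 7, 3, 4]
median = grouped_median(lower, 10, freqs)
print(median)
```

Here $n = 22$, so $n/2 = 11$ falls in the class 20-30, and the median is $20 + \frac{10}{7}(11 - 8)$.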
The Mode
The mode is defined as the value which occurs most frequently in a data set and is denoted by $\hat{x}$ (x-cap).
For grouped data (discrete series) it is the value corresponding to the maximum
frequency. For unimodal grouped data (continuous series) having equal class intervals it is
obtained as
$$\hat{X} = l + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$$
where
$l$ = lower class boundary of the modal class (the class with the highest frequency),
$f_m$ = frequency of the modal class,
$f_1$ = frequency of the class preceding the modal class,
$f_2$ = frequency of the class following the modal class,
$h$ = class interval size of the modal class.
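A short sketch of the grouped-data mode formula follows. The function name, the class boundaries and the frequencies are hypothetical, and treating a missing neighbouring class as having frequency zero is an assumption of this sketch rather than something stated in the notes.

```python
def grouped_mode(lower_boundaries, h, freqs):
    """Mode of a unimodal distribution with equal class intervals."""
    i = max(range(len(freqs)), key=freqs.__getitem__)     # index of the modal class
    fm = freqs[i]
    f1 = freqs[i - 1] if i > 0 else 0                     # class preceding the modal class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0        # class following the modal class
    l = lower_boundaries[i]
    return l + (fm - f1) / ((fm - f1) + (fm - f2)) * h

# Hypothetical classes 0-10, 10-20, 20-30, 30-40, 40-50
lower = [0, 10, 20, 30, 40]
freqs = [2, 6, 7, 3, 4]
mode = grouped_mode(lower, 10, freqs)
print(mode)
```

The modal class is 20-30 with $f_m = 7$, $f_1 = 6$ and $f_2 = 3$, so the mode is $20 + \frac{1}{1 + 4} \times 10$.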
Define quartiles
The three values which divide an arrayed set of data or a distribution into four equal parts are called quartiles.
They are the first, second and third quartiles, denoted by $Q_1$, $Q_2$ and $Q_3$ respectively. For
ungrouped data and grouped data (discrete series) these values are obtained as
$$Q_1 = \frac{(n + 1)}{4}\text{th value} \quad \{\text{lower quartile}\}$$
$$Q_2 = \frac{2(n + 1)}{4}\text{th value} = \frac{(n + 1)}{2}\text{th value} \quad \{\text{the median}\}$$
$$Q_3 = \frac{3(n + 1)}{4}\text{th value} \quad \{\text{upper quartile}\}$$
For grouped data (continuous series) these values are obtained as
$$Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - c\right)$$
$$Q_2 = l + \frac{h}{f}\left(\frac{2n}{4} - c\right)$$
$$Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - c\right)$$
where $l$, $h$, $f$ and $c$ refer to the class containing the quartile in question.
Define deciles
The nine values which divide an arrayed set of data or a distribution into 10 equal parts are called deciles. They are the
first, second, …, ninth deciles, denoted by $D_1, D_2, \ldots, D_9$ respectively. For ungrouped data and
grouped data (discrete series) these values are obtained as
$$D_1 = \frac{(n + 1)}{10}\text{th value}, \quad D_2 = \frac{2(n + 1)}{10}\text{th value}, \quad \ldots, \quad D_9 = \frac{9(n + 1)}{10}\text{th value}$$
For grouped data (continuous series) these values are obtained as
$$D_1 = l + \frac{h}{f}\left(\frac{n}{10} - c\right), \quad D_2 = l + \frac{h}{f}\left(\frac{2n}{10} - c\right), \quad \ldots, \quad D_9 = l + \frac{h}{f}\left(\frac{9n}{10} - c\right)$$
Define percentiles
The 99 values which divide an arrayed set of data or a distribution into a hundred equal parts are called percentiles.
They are the first, second, …, ninety-ninth percentiles, denoted by $P_1, P_2, \ldots, P_{99}$ respectively. For
ungrouped and grouped data (discrete series) these values are obtained as
$$P_1 = \frac{(n + 1)}{100}\text{th value}, \quad P_2 = \frac{2(n + 1)}{100}\text{th value}, \quad \ldots, \quad P_{99} = \frac{99(n + 1)}{100}\text{th value}$$
Define Quantiles.
Collectively, the quartiles, deciles, percentiles and other values obtained by equal sub-
division of the data are called quantiles.
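Because quartiles, deciles and percentiles for ungrouped data all use a position of the form $\frac{j(n+1)}{\text{parts}}$, one helper can cover them. The sketch below is an illustration only: the function name `quantile`, the data set, and the use of linear interpolation when the position is not a whole number are assumptions, since the notes do not say how fractional positions are handled.

```python
def quantile(values, j, parts):
    """j-th of the values dividing the ordered data into `parts` equal parts."""
    data = sorted(values)                  # the array
    n = len(data)
    pos = j * (n + 1) / parts              # 1-based position j(n+1)/parts
    whole = int(pos)
    frac = pos - whole
    if whole < 1:
        return data[0]
    if whole >= n:
        return data[-1]
    # interpolate between the whole-th and (whole+1)-th ordered values
    return data[whole - 1] + frac * (data[whole] - data[whole - 1])

# Hypothetical data with n = 11 values
data = [3, 7, 8, 5, 12, 14, 21, 13, 18, 10, 1]
q1 = quantile(data, 1, 4)
q2 = quantile(data, 2, 4)
q3 = quantile(data, 3, 4)
d5 = quantile(data, 5, 10)
p50 = quantile(data, 50, 100)
print(q1, q2, q3, d5, p50)
```

The second quartile, the fifth decile and the fiftieth percentile all coincide with the median.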
Empirical relation between mean, median and mode
In a symmetrical distribution the mean, median and mode coincide. In a moderately skewed
(asymmetrical) distribution the median lies between the mean and the mode, and it is about twice as far from the mode as
from the mean. The following approximate relation, called the empirical relation, holds between these three averages:
Mode = 3 Median - 2 Mean

In a moderately skewed distribution the median is 42.5 and the mode is 40. Find the mean.

We know the empirical relation between mean, median and mode is
$$\text{Mode} = 3\,\text{Median} - 2\,\text{Mean}$$
Therefore
$$40 = 3(42.5) - 2\,\text{Mean}$$
$$2\,\text{Mean} = 3(42.5) - 40 = 127.5 - 40 = 87.5$$
$$\text{Mean} = \frac{87.5}{2} = 43.75$$
What would be the shape and name of the frequency distribution if
Mean > Median > Mode?
The distribution is positively skewed (it has a longer tail to the right) when Mean > Median > Mode.
Advantages of Arithmetic Mean
• It is rigidly defined by a mathematical formula.
• It is easy to calculate and easy to understand.
• It depends upon all the values of the data.
• It is capable of further algebraic manipulation.
Disadvantages of Arithmetic Mean
• It is affected by extreme values.
• It is not an appropriate average for highly skewed distributions.
• It is not a good average for frequency distributions with open-end classes.
Advantages of Geometric Mean
• It is based on all values of the data.
• It is less affected by extremely large values.
• It gives equal weight to all observations.
• It is a good average for averaging rates and ratios.
Disadvantages of Geometric Mean
• It is not easy to calculate and understand.
• It cannot be calculated when any value of the data is zero.
• It cannot be calculated in the case of negative values.
Advantages of Harmonic Mean
• It is rigidly defined by a mathematical formula.
• It is based on all the observations.
• It is an appropriate average for averaging time rates (speed per hour) and ratios (units
purchased per rupee).
Disadvantages of Harmonic Mean
• It is neither simple to understand nor easy to calculate.
• It cannot be calculated if any value in a data set is zero.
• It is much affected by extremely small values.
Advantages of Median
• It is easy to calculate and understand.
• It is not affected by extreme values and can be calculated for distributions with open-end
classes.
• It is an appropriate average in the case of highly skewed distributions.
Disadvantages of Median
• It is not rigidly defined.
• It is not based on all values.
• It is not capable of further algebraic manipulation.
Advantages of Mode
• It can be computed for both quantitative and qualitative data.
• It is not affected by extreme values.
Disadvantages of Mode
• Its value is not always unique.
• It is not based on all the observations.
• It is not capable of further statistical treatment.



Measures of Dispersion, Skewness and Kurtosis

Define dispersion and measures of dispersion.


By dispersion we mean the extent to which the values are spread out from the
average value. The measures used for the study of dispersion or variation
are called measures of variation or measures of dispersion.

Types of measures of dispersion

There are two types of dispersion.
Absolute dispersion
The actual dispersion or variation, measured in the same units as the
original data, is called the absolute measure of dispersion. It cannot be used to compare two or more
sets of data measured in different units.
Relative dispersion
The dispersion expressed as a ratio, coefficient or percentage, and so free from
units of measurement, is called relative dispersion. It is defined as the ratio of the absolute dispersion to the average calculated
from the same data.
$$\text{Relative dispersion} = \frac{\text{Absolute dispersion}}{\text{Average}}$$
Absolute measures of dispersion
The main absolute measures of dispersion are
• The range
• The quartile deviation
• The mean deviation
• The variance and standard deviation
Relative measures of dispersion
Corresponding to the absolute measures of dispersion, we have the following
relative measures of dispersion.
• Coefficient of range (coefficient of dispersion)
• Coefficient of quartile deviation
• Coefficient of mean deviation
• Coefficient of standard deviation or coefficient of variation
Define Range
The range is defined as the difference between the maximum and minimum values of the
data set and is denoted by $R$:
$R = X_m - X_0$
where $X_m$ is the maximum value and $X_0$ is the minimum value.
In grouped data the range is calculated as either
• the difference between the maximum midpoint and the minimum midpoint, or
• the difference between the upper class boundary of the highest class and the
lower class boundary of the lowest class.

Define Quartile deviation


The quartile deviation, or semi-interquartile range, is defined as
half of the difference between the upper quartile and the lower quartile and is denoted by Q.D.
$$Q.D = \frac{Q_3 - Q_1}{2}$$



Define the Mean Deviation
The mean deviation or average deviation is defined as the mean of the
absolute deviations of the values from their average (mean or median) and is denoted by M.D.
Let $x_1, x_2, x_3, \ldots, x_n$ be $n$ values and $\bar{x}$ their mean; then the mean deviation from the mean is
$$M.D_{\bar{x}} = \frac{\sum |x - \bar{x}|}{n}$$
Similarly, if $\tilde{x}$ is the median of the data, then the mean deviation from the median is
$$M.D_{\tilde{x}} = \frac{\sum |x - \tilde{x}|}{n}$$
For grouped data the mean deviation from the mean is obtained as
$$M.D_{\bar{x}} = \frac{\sum f|x - \bar{x}|}{\sum f}$$
and from the median as
$$M.D_{\tilde{x}} = \frac{\sum f|x - \tilde{x}|}{\sum f}$$
Define the Variance
The variance is defined as the mean of the squared deviations of the values from
their mean and is denoted by $S^2$. Let $x_1, x_2, x_3, \ldots, x_n$ be $n$ values; then the variance is computed
as
$$S^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$$
For grouped data, where each $x$ has a corresponding class frequency $f$, the variance is computed
as
$$S^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$$
The Standard deviation
The positive square root of the variance is called the standard deviation and is
defined as
$$S = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$$
and for grouped data
$$S = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2}$$
If $X = 1, 3, 9$, find the variance.
X    X^2
1    1
3    9
9    81
$\sum x = 13$, $\sum x^2 = 91$
$$S^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2 = \frac{91}{3} - \left(\frac{13}{3}\right)^2 \approx 11.56$$
For a series of 12 observations, the sum of the squared deviations from the mean
is 48. Find its S.D.
We know that the sum of the squared deviations is $\sum (X - \bar{X})^2 = 48$ and $n = 12$; therefore
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{48}{12}} = \sqrt{4} = 2$$
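Both worked examples can be reproduced with a few lines of Python. The function names are chosen for this sketch; the divisor is $n$, as in the formulas of these notes.

```python
import math

def variance(values):
    """S^2 = sum(x^2)/n - (sum(x)/n)^2, with divisor n."""
    n = len(values)
    return sum(x * x for x in values) / n - (sum(values) / n) ** 2

def sd_from_squared_deviations(sum_sq_dev, n):
    """S = sqrt(sum of squared deviations from the mean / n)."""
    return math.sqrt(sum_sq_dev / n)

var_139 = variance([1, 3, 9])                     # 91/3 - (13/3)^2
sd_12 = sd_from_squared_deviations(48, 12)        # sqrt(48/12)
print(round(var_139, 2), sd_12)
```

Note that many software packages divide by $n - 1$ by default when computing a sample variance, which gives a different value from the formula used here.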
Properties of the Variance and Standard deviation
The variance and standard deviation have the following properties.
• The variance and standard deviation of a constant are always equal to zero.
$S(a) = 0$ and $Var(a) = 0$
• The variance and standard deviation are not affected by change of
origin (they do not change when any constant is added to or subtracted from the
variable).
$S(X \pm a) = S(X)$ and, for the variance, $Var(X \pm a) = Var(X)$
• The variance and standard deviation are affected by change of scale.
$S\left(\dfrac{X}{a}\right) = \dfrac{1}{|a|} S(X)$ and similarly $Var\left(\dfrac{X}{a}\right) = \dfrac{1}{a^2} Var(X)$
• The variance of the sum or difference of two independent random
variables is the sum of their respective variances.
$Var(X \pm Y) = Var(X) + Var(Y)$

What will be the variance of 1) $2x$ and 2) $3x - 5$ if $Var(x) = 25$?

We know that
$Var(ax) = a^2 Var(x)$ {change of scale}
Therefore
$Var(2x) = 4(25) = 100$
Similarly
$Var(ax - b) = a^2 Var(x)$ {change of origin and change of scale}
$Var(3x - 5) = 9(25) = 225$
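The change of origin and change of scale properties can be checked numerically. The data set below is hypothetical (its variance is 4, not the 25 of the worked example) and `variance` is a helper written for this sketch.

```python
def variance(values):
    """Mean of the squared deviations from the mean (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

# Hypothetical data with mean 5 and variance 4
x = [2, 4, 4, 4, 5, 5, 7, 9]

var_x = variance(x)
var_shifted = variance([xi + 10 for xi in x])       # change of origin
var_2x = variance([2 * xi for xi in x])             # change of scale
var_3x_minus_5 = variance([3 * xi - 5 for xi in x]) # both together
print(var_x, var_shifted, var_2x, var_3x_minus_5)
```

Adding a constant leaves the variance unchanged, while multiplying by $a$ multiplies it by $a^2$.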
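Coefficient of Range
$$\text{Coefficient of range} = \frac{X_m - X_0}{X_m + X_0}$$
Coefficient of Quartile deviation
$$\text{Coefficient of Q.D} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
Coefficient of mean deviation
About the mean it is obtained as $\dfrac{M.D_{\bar{x}}}{\bar{x}}$ and about the median it is obtained as $\dfrac{M.D_{\tilde{x}}}{\tilde{x}}$.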

Coefficient of variation or coefficient of standard deviation

The coefficient of variation, or C.V, is a relative measure of dispersion and is defined
as the percentage ratio of the standard deviation to the arithmetic mean:
$$C.V = \frac{S}{\bar{X}} \times 100$$
Uses of C.V
The C.V has the following uses:
• To compare the dispersion or variation of different sets of data which differ in
their means or units.
• It is used as a criterion for consistent performance. The smaller the C.V, the
more consistent is the performance.
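As an illustration of the second use, the C.V of two made-up series with the same mean can be compared. The function name and both score lists are hypothetical.

```python
import math

def coefficient_of_variation(values):
    """C.V = (S / mean) * 100, with S computed using divisor n."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sd / mean * 100

# Hypothetical scores of two players, both with mean 50
player_a = [48, 50, 52, 49, 51]
player_b = [10, 90, 30, 70, 50]

cv_a = coefficient_of_variation(player_a)
cv_b = coefficient_of_variation(player_b)
print(f"C.V of A = {cv_a:.2f}%, C.V of B = {cv_b:.2f}%")
```

Player A has the smaller C.V and is therefore the more consistent performer.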

Moments
Moments are defined as the means of the different powers of the deviations of the observations
from their mean. For sample data, the first four moments about the mean $\bar{X}$ are defined
as
$$m_1 = \frac{\sum (x - \bar{x})}{n} = 0 \ \text{(always)}, \qquad m_2 = \frac{\sum (x - \bar{x})^2}{n} \ \text{(variance)}$$
$$m_3 = \frac{\sum (x - \bar{x})^3}{n}, \qquad m_4 = \frac{\sum (x - \bar{x})^4}{n}$$
For a frequency distribution the first four moments about the mean $\bar{X}$ are defined as
$$m_1 = \frac{\sum f(x - \bar{x})}{\sum f} = 0 \ \text{(always)}, \qquad m_2 = \frac{\sum f(x - \bar{x})^2}{\sum f} \ \text{(variance)}$$
$$m_3 = \frac{\sum f(x - \bar{x})^3}{\sum f}, \qquad m_4 = \frac{\sum f(x - \bar{x})^4}{\sum f}$$
Note: The above moments are also called the central moments or the mean moments.
Moment Ratios
Ratios in which both the numerators and the denominators are moments are called moment ratios.
The important moment ratios are $b_1$ and $b_2$:
$$b_1 = \frac{m_3^2}{m_2^3} \ \text{(measure of skewness)} \quad \text{and} \quad b_2 = \frac{m_4}{m_2^2} \ \text{(measure of kurtosis)}$$
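The central moments and the two moment ratios can be computed directly from their definitions. The helper `central_moment` and the data set are assumptions made for this sketch.

```python
def central_moment(values, r):
    """r-th moment about the mean: sum((x - mean)^r) / n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n

# Hypothetical data with mean 4
data = [1, 2, 3, 4, 10]

m1 = central_moment(data, 1)      # always zero
m2 = central_moment(data, 2)      # the variance
m3 = central_moment(data, 3)
m4 = central_moment(data, 4)

b1 = m3 ** 2 / m2 ** 3            # moment ratio measuring skewness
b2 = m4 / m2 ** 2                 # moment ratio measuring kurtosis
print(m1, m2, m3, m4, b1, b2)
```

For these values $m_2 = 10$, $m_3 = 36$ and $m_4 = 278.8$, giving $b_1 = 1.296$ and $b_2 = 2.788$.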
Raw Moments
The moments about the mean $\bar{X}$ are calculated by taking deviations of the observations from
the mean $\bar{X}$. Sometimes the mean $\bar{X}$ is not a whole number, or it contains many
decimal places, and the deviations calculated from such a mean require a lot of time and labour. We
can overcome this problem by calculating moments called raw moments. Raw moments can
be obtained as
• Moments about an arbitrary value or arbitrary origin
• Moments about zero
Moments about an arbitrary value or arbitrary origin
The $r$th sample moment about any arbitrary origin $A$ (or provisional mean) is denoted
by $m'_r$. If $D = x - A$, then for ungrouped data
$$m'_r = \frac{\sum D^r}{n}$$
So,
$$m'_1 = \frac{\sum D}{n}, \quad m'_2 = \frac{\sum D^2}{n}, \quad m'_3 = \frac{\sum D^3}{n}, \quad m'_4 = \frac{\sum D^4}{n}$$
For grouped data
$$m'_1 = \frac{\sum fD}{\sum f}, \quad m'_2 = \frac{\sum fD^2}{\sum f}, \quad m'_3 = \frac{\sum fD^3}{\sum f}, \quad m'_4 = \frac{\sum fD^4}{\sum f}$$
Moments about an arbitrary value with equal class interval
If $U = \dfrac{X - A}{h}$, where $A$ is an arbitrary origin and $h$ is the class interval, the raw
moments based on the variable $u$ are called the raw moments in class interval units. These
are defined as
$$m'_1 = \frac{\sum fu}{\sum f} \times h, \quad m'_2 = \frac{\sum fu^2}{\sum f} \times h^2, \quad m'_3 = \frac{\sum fu^3}{\sum f} \times h^3, \quad m'_4 = \frac{\sum fu^4}{\sum f} \times h^4$$
Moments about Zero
When the arbitrary origin $A$ is equal to zero, we have $X - A = X$. Thus the raw
moments about $X = 0$ are given by
$$m'_1 = \frac{\sum fx}{\sum f}, \quad m'_2 = \frac{\sum fx^2}{\sum f}, \quad m'_3 = \frac{\sum fx^3}{\sum f}, \quad m'_4 = \frac{\sum fx^4}{\sum f}$$
Relations between Moments
The following relations exist between the moments about the mean and the moments
about an arbitrary origin $A$ or moments about zero:
$$m_1 = m'_1 - m'_1 = 0$$
$$m_2 = m'_2 - (m'_1)^2$$
$$m_3 = m'_3 - 3m'_2 m'_1 + 2(m'_1)^3$$
$$m_4 = m'_4 - 4m'_3 m'_1 + 6m'_2 (m'_1)^2 - 3(m'_1)^4$$
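These relations can be verified by computing the raw moments about an arbitrary origin, converting them, and comparing with the central moments computed directly. All function names, the data set and the origin $A = 3$ are hypothetical choices for this sketch.

```python
def raw_moments(values, origin):
    """First four moments about an arbitrary origin A: sum((x - A)^r) / n."""
    n = len(values)
    return [sum((x - origin) ** r for x in values) / n for r in (1, 2, 3, 4)]

def central_from_raw(r1, r2, r3, r4):
    """Convert raw moments m'_1..m'_4 into central moments m_2, m_3, m_4."""
    m2 = r2 - r1 ** 2
    m3 = r3 - 3 * r2 * r1 + 2 * r1 ** 3
    m4 = r4 - 4 * r3 * r1 + 6 * r2 * r1 ** 2 - 3 * r1 ** 4
    return m2, m3, m4

def central_moment(values, r):
    """r-th moment about the mean, computed directly."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n

data = [1, 2, 3, 4, 10]                       # hypothetical data, mean 4
r1, r2, r3, r4 = raw_moments(data, origin=3)  # moments about A = 3
m2, m3, m4 = central_from_raw(r1, r2, r3, r4)
print(m2, m3, m4)
print(central_moment(data, 2), central_moment(data, 3), central_moment(data, 4))
```

Both routes give the same central moments, whichever origin is chosen.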

Role of Moments
Moments play a vital role in describing a distribution. We can obtain the following information
from the moments:
• The central value of the distribution
• The measure of dispersion (variance)
• The measure of symmetry of the distribution
• The measure of peakedness or flatness of the distribution
The first two moments about $X = 4$ are 1 and 16. Find the C.V.
Here
$A = 4$, $m'_1 = 1$ and $m'_2 = 16$.
Then $\bar{X} = A + m'_1 = 4 + 1 = 5$
and $m_2 = S^2 = m'_2 - (m'_1)^2 = 16 - (1)^2 = 15$, so $S = \sqrt{15} \approx 3.873$.
Now
$$C.V = \frac{S}{\bar{X}} \times 100 = \frac{3.873}{5} \times 100 = 77.46\%$$
Symmetry
The property of a unimodal distribution (a distribution having one mode) that values
equidistant from the point of maximum height have equal heights/frequencies is called symmetry.
Skewness
The lack of symmetry in a distribution around some central value (mean, median or
mode) is called skewness.
The following methods are used to measure skewness.
1. Karl Pearson's coefficient of skewness
$$S_k = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$$

Sometimes the mode is difficult to find. Therefore, using the empirical relation between the mean,
median and mode, the coefficient of skewness can be rewritten as
$$S_k = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$$

2. Bowley's coefficient of skewness

Bowley introduced a method of measuring skewness using quartiles:
$$S_k = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$$
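Both coefficients are simple to evaluate once the summary values are known. In the sketch below the function names are illustrative, the mean 43.75 and median 42.5 come from the empirical-relation example in these notes, and the standard deviation of 5 and the three quartile values are hypothetical.

```python
def pearson_skewness(mean, median, sd):
    """Karl Pearson's coefficient in its median form: 3(Mean - Median) / S.D."""
    return 3 * (mean - median) / sd

def bowley_skewness(q1, q2, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * q2) / (q3 - q1)

# Mean and median from the earlier example; the S.D of 5 is assumed
sk_pearson = pearson_skewness(mean=43.75, median=42.5, sd=5)

# Hypothetical quartiles
sk_bowley = bowley_skewness(q1=5, q2=10, q3=14)

print(sk_pearson, sk_bowley)
```

A positive coefficient indicates positive skewness and a negative coefficient indicates negative skewness; for a symmetrical distribution both coefficients are zero.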
Kurtosis
The degree of peakedness of a distribution, usually taken relative to a
normal distribution, is called kurtosis.
