
Measures of Dispersion, Skewness, and Kurtosis

Measures of Dispersion
Descriptive summary measures that help characterize data
Describe the variation of the observations
Determine the degree of dispersion of the observations about the center of the distribution

Absolute dispersion
Same unit as the observations

Relative dispersion
No unit (unitless)

Measures of dispersion cannot be negative
Smallest possible value is zero (attained when all observations are identical)

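Absolute Dispersion

Range
Simplest and easiest to use
Difference between the highest and the lowest observation
Ungrouped data: $R = H - L$, where $H$ is the highest and $L$ the lowest observation
Grouped data: $R = \text{UCB of the highest class} - \text{LCB of the lowest class}$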
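The two forms of the range can be sketched in Python as below. This is a minimal illustration under stated assumptions, not part of the original slides: the function names `raw_range` and `grouped_range` are hypothetical, and the grouped form assumes the range runs from the lower class boundary (LCB) of the lowest class to the upper class boundary (UCB) of the highest class. The inputs are the true-false exam data and the class boundaries of the entrance-exam table given later in this section.

```python
def raw_range(observations):
    """Range of ungrouped data: R = H - L (highest minus lowest observation)."""
    if not observations:
        raise ValueError("the range of an empty data set is undefined")
    return max(observations) - min(observations)


def grouped_range(lower_boundaries, upper_boundaries):
    """Assumed grouped form: UCB of the highest class minus LCB of the lowest class."""
    return max(upper_boundaries) - min(lower_boundaries)


# Incorrect answers of 20 students on the true-false exam in this section.
errors = [2, 1, 3, 2, 3, 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, 2]
# Class boundaries of the 200 entrance-exam scores in this section.
lcb = [58.5, 62.5, 66.5, 70.5, 74.5, 78.5, 82.5, 86.5]
ucb = [62.5, 66.5, 70.5, 74.5, 78.5, 82.5, 86.5, 90.5]

print(raw_range(errors))        # 6
print(grouped_range(lcb, ucb))  # 32.0
```

Only the two extreme values enter the calculation, so changing any other observation leaves R unchanged, which is why the range does not describe the data comprehensively.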

Disadvantages
Description of the data is not comprehensive (only the two extreme observations are used)
Affected by outliers
Tends to be smaller for small samples and larger for large samples
Cannot be computed when there is an open-ended class interval

Advantages
Simple
Easy to compute
Easy to understand

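Variance
Describes the variation of the measurements
Average squared difference of each observation from the mean
May also be used to judge how good the mean is as a measure of central tendency (the smaller the variance, the more representative the mean)
The unit of the variance is the square of the unit of the observations
The standard deviation, the positive square root of the variance, is usually preferred for interpretation because it has the same unit as the observations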

Population Variance
Denoted by σ²
Computed from all N elements of the population
A parameter
Cannot be computed from sample data

Sample Variance
Denoted by s²
Computed from the n elements of the sample
A statistic
Estimates the population variance
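The difference between the population and sample variance is only the divisor (N versus n - 1). The sketch below shows it with Python's standard `statistics` module, which provides `pvariance` (divide by N) and `variance` (divide by n - 1). The data are the barangay survey counts from the example in this section; treating the same list both ways is purely for illustration.

```python
import math
import statistics

# Underweight children per barangay, from the survey example in this section.
children = [3, 5, 6, 4, 7, 8, 6, 9, 10, 4, 6, 7, 5, 8, 9, 8, 3, 4, 5, 5]

# Population variance: treat the list as all N elements, divide by N.
sigma_sq = statistics.pvariance(children)
# Sample variance: treat the list as n sampled elements, divide by n - 1.
s_sq = statistics.variance(children)

# The standard deviation (square root of the variance) is back in the
# original unit (children), while the variance is in children squared.
s = math.sqrt(s_sq)

print(sigma_sq, round(s_sq, 4), round(s, 4))
```

Dividing by n - 1 makes the sample variance slightly larger than the divide-by-N value; this is the usual adjustment that lets s² serve as an estimate of σ².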

Standard Deviation
Utilizes every observation
Affected by outliers; extreme values inflate the standard deviation
Cannot be computed when there are open-ended intervals
Adding a constant c to, or subtracting it from, each observation leaves the standard deviation unchanged
Multiplying or dividing each observation by a nonzero constant c multiplies or divides the standard deviation by |c|
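The two transformation properties can be checked numerically. In the sketch below the constants `c = 10` and `k = -3` are arbitrary values chosen only for illustration, and the data are the barangay survey counts from this section. A negative multiplier is used on purpose: the standard deviation scales by |k|, never by a negative number.

```python
import math
import statistics

children = [3, 5, 6, 4, 7, 8, 6, 9, 10, 4, 6, 7, 5, 8, 9, 8, 3, 4, 5, 5]
s = statistics.stdev(children)

c = 10   # arbitrary constant added to every observation
k = -3   # arbitrary constant multiplying every observation

s_shifted = statistics.stdev([x + c for x in children])
s_scaled = statistics.stdev([k * x for x in children])

# Shifting leaves s unchanged; scaling multiplies s by |k|.
print(math.isclose(s_shifted, s), math.isclose(s_scaled, abs(k) * s))  # True True
```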

Relative Dispersion

Coefficient of Variation
Used to compare the variability of two or more data sets, even if they have different means or different units of measurement
Ratio of the standard deviation to the mean, expressed as a percentage (denoted by CV)
A small CV means less variability; a large CV means greater variability
Not to be used when the mean is 0 or negative

A sample survey in a certain province showed the number of underweight children under five years of age in each barangay:
3 5 6 4 7 8 6 9 10 4 6 7 5 8 9 8 3 4 5 5
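A worked version of this example, computing the range, mean, sample standard deviation, and sample CV, is sketched below. The helper `coefficient_of_variation` is a hypothetical name; it refuses a zero or negative mean, following the rule stated for the CV. For comparison the same measures are applied to the true-false exam data from this section. The two data sets count different things, and relating variability across such differences is exactly what the CV is for.

```python
import statistics

children = [3, 5, 6, 4, 7, 8, 6, 9, 10, 4, 6, 7, 5, 8, 9, 8, 3, 4, 5, 5]
errors = [2, 1, 3, 2, 3, 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, 2]


def coefficient_of_variation(data):
    """Sample CV = s / mean * 100%; not meaningful for a zero or negative mean."""
    mean = statistics.mean(data)
    if mean <= 0:
        raise ValueError("CV is not meaningful when the mean is 0 or negative")
    return statistics.stdev(data) / mean * 100


r = max(children) - min(children)
mean = statistics.mean(children)
s = statistics.stdev(children)
cv_children = coefficient_of_variation(children)
cv_errors = coefficient_of_variation(errors)

print(f"R = {r}, mean = {mean:.2f}, s = {s:.4f}, CV = {cv_children:.2f}%")
print(f"CV of exam errors = {cv_errors:.2f}%")
```

Relative to their means, the exam-error counts are more variable than the underweight-children counts, since their CV is larger.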
Given the frequency distribution table of scores

The number of incorrect answers on a true-false exam for a random sample of 20 students was recorded as follows: 2, 1, 3, 2, 3, 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, and 2.
Given the frequency distribution of scores of 200 students in a college entrance exam:

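Scores    Freq.    <CF     LCB     UCB
59-62        2       2    58.5    62.5
63-66       12      14    62.5    66.5
67-70       24      38    66.5    70.5
71-74       46      84    70.5    74.5
75-78       62     146    74.5    78.5
79-82       36     182    78.5    82.5
83-86       16     198    82.5    86.5
87-90        2     200    86.5    90.5
Total      200

<CF = less-than cumulative frequency; LCB = lower class boundary; UCB = upper class boundary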
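For grouped data, the mean and variance are computed from class marks weighted by frequency. The sketch below assumes, as is usual, that every observation in a class sits at the class mark (the midpoint of the class limits). The frequencies of the first and last classes are not legible in the extracted table, so the sketch uses 2 and 2, the values implied by the <CF column (14 - 12 and 200 - 198). The running totals are checked against that column.

```python
import math
from itertools import accumulate

# Class limits and frequencies of the 200 entrance-exam scores.
# The first and last frequencies (2 and 2) are inferred from the <CF column.
classes = [(59, 62), (63, 66), (67, 70), (71, 74), (75, 78), (79, 82), (83, 86), (87, 90)]
freqs = [2, 12, 24, 46, 62, 36, 16, 2]

# Sanity check: running totals should reproduce the <CF column.
cum_freqs = list(accumulate(freqs))

# Class mark (midpoint); each observation is assumed to sit at its class mark.
marks = [(lo + hi) / 2 for lo, hi in classes]
n = sum(freqs)

mean = sum(f * x for f, x in zip(freqs, marks)) / n
# Sample variance via the definitional formula, weighting each class mark by its frequency.
var = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks)) / (n - 1)
sd = math.sqrt(var)

# Class boundaries sit 0.5 beyond the integer class limits.
grouped_range = (classes[-1][1] + 0.5) - (classes[0][0] - 0.5)

print(cum_freqs)
print(n, round(mean, 2), round(var, 4), round(sd, 4), grouped_range)
```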

Skewness
Relying solely on the mean and standard deviation may be misleading
Two data sets can have the same mean and standard deviation, yet different shapes
If the histogram can be divided at the center into two identical halves, each a mirror image of the other, then the distribution is symmetric. Otherwise, it is skewed.

Positively Skewed
Skewed to the right
Values concentrated at the left
Upper tail stretches out more than the lower tail

Negatively Skewed
Skewed to the left
Values concentrated at the right
Lower tail stretches out more than the upper tail

Coefficient of Skewness
A single value that indicates the degree and direction of asymmetry
Denoted by Sk
Sk = 0: symmetric
Sk > 0: positively skewed
Sk < 0: negatively skewed

To determine the degree of skewness, use |Sk| (the magnitude of Sk)
If |Sk| is far from 0, the distribution is seriously skewed
Most commonly used measures:
Pearson's first and second coefficients of skewness
Coefficient of skewness based on the third moment
Coefficient of skewness based on the quartiles
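The slides list a coefficient based on the third moment but do not give its formula. The sketch below assumes one common moment form, Sk = m3 / m2^(3/2), where m_k is the k-th central moment computed with divisor n; other textbooks divide by s³ with n - 1, which changes the value slightly but not the sign. The function names are hypothetical. The sign is read with the Sk = 0 / Sk > 0 / Sk < 0 rules, and `tol` only absorbs floating-point rounding.

```python
def central_moment(data, k):
    """k-th central moment about the mean, dividing by n."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n


def moment_skewness(data):
    """Assumed third-moment coefficient: m3 / m2 ** 1.5."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def direction(sk, tol=1e-12):
    """Classify by the sign of Sk (tol absorbs rounding error)."""
    if abs(sk) <= tol:
        return "symmetric"
    return "positively skewed" if sk > 0 else "negatively skewed"


# Incorrect answers on the true-false exam in this section.
errors = [2, 1, 3, 2, 3, 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, 2]
sk = moment_skewness(errors)
print(round(sk, 3), direction(sk))
print(direction(moment_skewness([1, 2, 3, 4, 5])))  # symmetric
```

The exam data have a long upper tail (a few students with 4 to 6 errors), which this measure reports as positive skewness.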

Pearson's Coefficients of Skewness
Based on the relationships among the mean, median, and mode
The signs of the measures depend only on the sign of the numerator, because S is never negative
Problems with Pearson's first coefficient of skewness are associated with the problems of using the mode (for example, the mode may not exist or may not be unique)

Coefficient of Skewness Based on the Quartiles
Based on the definition of quartiles:
Around 25 percent of the observations fall between Q1 and the median
Around 25 percent fall between the median and Q3

Symmetric distribution: distance between Q1 and Md = distance between Md and Q3
Skewed distribution:
Positively skewed: Md is closer to Q1
Negatively skewed: Md is closer to Q3

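Kurtosis
Term coined by Karl Pearson
From the Greek word kurtos, which means convex
Describes the shape of the hump of a relative frequency distribution compared with the normal distribution
Three classifications:
Mesokurtic (about as peaked as the normal distribution)
Leptokurtic (more peaked than the normal distribution)
Platykurtic (flatter than the normal distribution)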
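The three classes are named here without a formula. A common convention, assumed in the sketch below, is the moment coefficient of kurtosis a4 = m4 / m2², which equals 3 for a normal distribution: a4 = 3 is mesokurtic, a4 > 3 leptokurtic, a4 < 3 platykurtic. Some software reports the excess kurtosis a4 - 3 instead. The function names and both small data sets are hypothetical, chosen only to land clearly on either side of 3.

```python
def kurtosis_coefficient(data):
    """Assumed moment coefficient of kurtosis: m4 / m2 ** 2 (equals 3 for a normal distribution)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2


def kurtosis_class(a4, tol=1e-9):
    """Compare a4 with the normal-distribution value of 3."""
    if abs(a4 - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if a4 > 3 else "platykurtic"


flat = [1, 2, 3, 4, 5]          # hypothetical: evenly spread values, no peak
peaked = [0] * 8 + [-10, 10]    # hypothetical: mass piled at the center, two far tails
print(kurtosis_class(kurtosis_coefficient(flat)))    # platykurtic (a4 = 1.7)
print(kurtosis_class(kurtosis_coefficient(peaked)))  # leptokurtic (a4 = 5.0)
```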

Boxplot (Box-and-Whisker Plot)
A graph that displays the following:
Location
Spread
Symmetry
Extremes
Outliers

1. Construct a rectangle (the box) with one end at the first quartile and the other end at the third quartile.
2. Draw a vertical line at the median across the interior of the rectangle.
3. Compute the interquartile range (IQR = Q3 - Q1), the lower fence FL, and the upper fence FU.
4. Locate the smallest and largest values within the intervals [FL, Q1] and [Q3, FU], respectively. Draw a line (whisker) from each of these values to the nearer quartile.
5. Values falling outside the fences are considered outliers and are marked with x.

Construct the boxplot for the following data set:
15 21 22 24 10 18 22 23 25 14 20 22 24 28
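The quantities needed for steps 1 to 5 can be computed as sketched below. Two assumptions are not stated in the slides and are labeled in the code: the fences are taken as FL = Q1 - 1.5 IQR and FU = Q3 + 1.5 IQR (the common Tukey convention), and the quartiles come from Python's `statistics.quantiles`, whose default is the "exclusive" method. The function name `boxplot_summary` is hypothetical.

```python
import statistics

data = [15, 21, 22, 24, 10, 18, 22, 23, 25, 14, 20, 22, 24, 28]


def boxplot_summary(values, k=1.5, method="exclusive"):
    """Quantities needed to draw a boxplot; k = 1.5 is an assumed fence multiplier."""
    xs = sorted(values)
    # Assumed quartile rule: statistics.quantiles with the given method.
    q1, md, q3 = statistics.quantiles(xs, n=4, method=method)
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    # Step 4: whisker ends are the extreme values inside [FL, Q1] and [Q3, FU].
    low_whisker = min((x for x in xs if lower_fence <= x <= q1), default=q1)
    high_whisker = max((x for x in xs if q3 <= x <= upper_fence), default=q3)
    # Step 5: anything beyond the fences is an outlier.
    outliers = [x for x in xs if x < lower_fence or x > upper_fence]
    return {
        "Q1": q1, "Md": md, "Q3": q3, "IQR": iqr,
        "FL": lower_fence, "FU": upper_fence,
        "whiskers": (low_whisker, high_whisker), "outliers": outliers,
    }


for name, value in boxplot_summary(data).items():
    print(name, value)
```

The quartile convention matters for these data. With the exclusive method, Q1 = 17.25 and Q3 = 24, and every value lies inside the fences. With `method="inclusive"`, Q1 = 18.5 and Q3 = 23.75 give FL = 10.625, so the value 10 would be flagged as an outlier. Check which quartile rule the course uses before drawing the plot.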

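Definition
Population Variance
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

Computational Formula
Population Variance
$$\sigma^2 = \frac{N \sum_{i=1}^{N} X_i^2 - \left( \sum_{i=1}^{N} X_i \right)^2}{N^2}$$

Sample Variance
$$s^2 = \frac{n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2}{n(n - 1)}$$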
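The definitional and computational formulas are algebraically equivalent. The sketch below checks this exactly with Python's `fractions.Fraction`, so no rounding hides a discrepancy. The data are the true-false exam errors from this section, and the function names are hypothetical.

```python
from fractions import Fraction

# Incorrect answers on the true-false exam in this section.
errors = [2, 1, 3, 2, 3, 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, 2]


def variance_definition(xs, sample=True):
    """Sum of squared deviations from the mean, over n - 1 (sample) or N (population)."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    ss = sum((x - mean) ** 2 for x in xs)
    return ss / (n - 1) if sample else ss / n


def variance_computational(xs, sample=True):
    """(n * sum(x^2) - (sum x)^2) over n(n - 1) (sample) or N^2 (population)."""
    n = len(xs)
    numerator = n * sum(x * x for x in xs) - sum(xs) ** 2
    return Fraction(numerator, n * (n - 1) if sample else n * n)


for sample in (True, False):
    d = variance_definition(errors, sample)
    c = variance_computational(errors, sample)
    print("sample" if sample else "population", d, c, float(d))
```

The computational form needs only the running totals of X and X², which is why it was preferred for hand calculation; the definitional form is closer to the meaning of the variance.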


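Population CV
$$CV = \frac{\sigma}{\mu} \times 100\%$$
where $\sigma$ is the population standard deviation and $\mu$ is the population mean

Sample CV
$$CV = \frac{s}{\bar{X}} \times 100\%$$
where $s$ is the sample standard deviation and $\bar{X}$ is the sample mean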

First Coefficient of Skewness
$$Sk_1 = \frac{\bar{X} - Mo}{S}$$

Second Coefficient of Skewness
$$Sk_2 = \frac{3(\bar{X} - Md)}{S}$$

where $\bar{X}$ = sample mean, $Md$ = sample median, $Mo$ = sample mode, and $S$ = sample standard deviation
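Both Pearson coefficients can be computed with Python's `statistics` module, as sketched below on the true-false exam data from this section. `statistics.multimode` is used so that the first coefficient is computed only when a single mode exists, which is one of the problems with using the mode.

```python
import statistics

# Incorrect answers on the true-false exam in this section.
errors = [2, 1, 3, 2, 3, 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, 2]

mean = statistics.mean(errors)
median = statistics.median(errors)
modes = statistics.multimode(errors)
# Sk1 needs a single mode; with no unique mode the first coefficient is undefined.
if len(modes) != 1:
    raise ValueError(f"no unique mode: {modes}")
mode = modes[0]
s = statistics.stdev(errors)

sk1 = (mean - mode) / s          # Pearson's first coefficient
sk2 = 3 * (mean - median) / s    # Pearson's second coefficient
print(mean, median, mode, round(sk1, 3), round(sk2, 3))
```

On these data the two coefficients disagree in sign: the mean (2.35) lies below the mode (3) but above the median (2), so Sk1 is negative while Sk2 is positive. This illustrates why a conclusion drawn from the mode-based coefficient should be treated with care.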

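Coefficient of Skewness Based on the Quartiles
$$Sk_4 = \frac{(Q_3 - Md) - (Md - Q_1)}{Q_3 - Q_1} = \frac{Q_1 + Q_3 - 2Md}{Q_3 - Q_1}$$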
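The quartile-based coefficient compares the two halves of the box, scaled by the interquartile range. The sketch below applies it to the boxplot data set from this section. It assumes quartiles from `statistics.quantiles` with its default "exclusive" method, since the slides do not specify a quartile rule. Other rules shift Q1 and Q3 and hence the value. The variable names are hypothetical.

```python
import statistics

data = [15, 21, 22, 24, 10, 18, 22, 23, 25, 14, 20, 22, 24, 28]

# Assumed quartile rule: statistics.quantiles default ("exclusive") method.
q1, md, q3 = statistics.quantiles(data, n=4)

# Difference of the two half-box lengths, scaled by the IQR.
sk_q = ((q3 - md) - (md - q1)) / (q3 - q1)
# Equivalent simplified form.
sk_q_alt = (q1 + q3 - 2 * md) / (q3 - q1)

print(q1, md, q3, round(sk_q, 4))
```

Here the median (22) sits closer to Q3 (24) than to Q1 (17.25), so the coefficient is negative, matching the rule that Md closer to Q3 indicates negative skewness.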