MEASURES OF VARIATION, SKEWNESS, AND KURTOSIS
5.1 INTRODUCTION
Of great concern to the statistician is the variation in the events of nature.
The variation of one measurement from another is a persisting character
Deviations from the sample mean:
Sample A     0     0     0     0     0
Sample B    -6    -3     0    +3    +6
Sample C   -15   -11    +4    +9   +13
Inspection of these numbers suggests that as variation increases, the departure of
the observations from their sample mean increases. We may use
this characteristic to define a measure of variation. One such measure is
the mean deviation. The mean deviation is the arithmetic mean of the
absolute deviations from the arithmetic mean. An absolute deviation is a
deviation without regard to algebraic sign. To obtain the mean deviation
we simply calculate the deviations from the arithmetic mean, sum these,
disregarding algebraic sign, and divide by N. For sample A above, the
mean deviation is 0. For sample B the mean deviation is (6 + 3 + 0 + 3 + 6)/5 = 18/5 = 3.6.
For sample C the mean deviation is (15 + 11 + 4 + 9 + 13)/5 = 52/5 = 10.4.
The mean deviation is given in algebraic language by the formula

[5.1]   $$\text{M.D.} = \frac{\sum |X - \bar{X}|}{N}$$

Here $X - \bar{X}$ is a deviation from the mean and $|X - \bar{X}|$ is a deviation
without regard to algebraic sign. The vertical bars mean that signs are ignored.
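As a check on the arithmetic above, here is a minimal Python sketch of the mean deviation formula. The function name `mean_deviation` is our own, and the signed deviations for samples B and C are an assumption: the text gives only their absolute values, so the signs were chosen so that each set sums to zero. Because a set of deviations already has mean 0, feeding the deviations in directly gives the same result as feeding in the raw observations.

```python
def mean_deviation(xs):
    """Arithmetic mean of the absolute deviations from the arithmetic mean."""
    n = len(xs)
    mean = sum(xs) / n
    return sum(abs(x - mean) for x in xs) / n

# Every deviation is 0 for sample A; signs for B and C are assumed.
sample_a = [0, 0, 0, 0, 0]
sample_b = [-6, -3, 0, 3, 6]
sample_c = [-15, -11, 4, 9, 13]

for name, data in [("A", sample_a), ("B", sample_b), ("C", sample_c)]:
    print(name, round(mean_deviation(data), 2))
```

Adding a constant to every value of a sample shifts the mean by the same constant, so the mean deviation is unchanged.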
Hitherto, symbols above and below the summation sign $\sum$ have been
used to indicate the limits of the summation. In the above formula for the
mean deviation these symbols have been omitted, the summation being
clearly understood to extend over the N members in the sample. In this
and subsequent chapters symbols indicating the limits of summation will,
for convenience, be omitted where these are understood clearly from the
context to extend over N sample members. Where any possibility of doubt
could exist, the symbols above and below the summation sign will be inserted.
The mean score for the experimental group is 50.0, and that for the control,
51.5. The investigator might be led to conclude from inspecting these
means that the drug had little or no effect on the performance of the subjects.
The standard deviations for the two groups are, respectively, 35.63
and 14.86, the experimental group being much more variable in performance than
the control group. Quite clearly the treatment appears to be exerting
a substantial influence on the variation in performance, although its
influence on level of performance is negligible. In the analysis of experimental
data the investigator must attend to, and if possible interpret, differences in
the standard deviation, or variance, as well as differences in the
arithmetic mean.
5.6 CALCULATING THE SAMPLE VARIANCE AND THE STANDARD DEVIATION FROM UNGROUPED DATA

For purposes of calculation, it is convenient to write the variance and the
standard deviation in a different form. The variance may be written as
$$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1} = \frac{\sum (X^2 + \bar{X}^2 - 2X\bar{X})}{N - 1} = \frac{\sum X^2 + N\bar{X}^2 - 2N\bar{X}^2}{N - 1} = \frac{\sum X^2 - N\bar{X}^2}{N - 1}$$

In this derivation note that the summation of $\bar{X}^2$ over N is simply $N\bar{X}^2$;
also the summation of $2X\bar{X}$ is $2\bar{X}\sum X = 2N\bar{X}^2$, since $\sum X = N\bar{X}$. The
standard deviation is given by

$$s = \sqrt{\frac{\sum X^2 - N\bar{X}^2}{N - 1}}$$
Thus to calculate the standard deviation using this formula, we sum the
squares of the original observations, subtract from this N times the square
of the arithmetic mean, divide by N − 1, and then take the square root.
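For example, the five observations 1, 4, 7, 10, and 13 have a mean of 7.
The squares of these observations are 1, 16, 49, 100, and 169. The sum of
these squared observations is 335. The variance is then

$$s^2 = \frac{335 - 5 \times 7^2}{5 - 1} = \frac{90}{4} = 22.5$$

and the standard deviation is $s = \sqrt{22.5} \approx 4.74$.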
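The following Python sketch computes the variance of the five observations 1, 4, 7, 10, and 13 both from the definition (squared deviations divided by N − 1) and from the computational form, ΣX² minus N times the squared mean, divided by N − 1. Both should agree at 22.5. The function names are our own, for illustration only.

```python
import math

def variance_definitional(xs):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def variance_computational(xs):
    """(Sum of squared observations - N * mean squared) / (N - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return (sum(x * x for x in xs) - n * mean ** 2) / (n - 1)

obs = [1, 4, 7, 10, 13]
print(sum(x * x for x in obs))                           # 335
print(variance_definitional(obs))                        # 22.5
print(variance_computational(obs))                       # 22.5
print(round(math.sqrt(variance_computational(obs)), 2))  # 4.74
```

The computational form suits hand calculation, but in floating-point arithmetic it can lose precision when the observations are large relative to their spread, because it subtracts two nearly equal quantities. The definitional form avoids that subtraction.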
...standard deviation as the unit of measurement. In the above example individual
I is 1.11 standard deviations, or standard deviation units, below the
mean, while individual F is 1.58 standard deviation units above the mean.
Standard scores are frequently used to obtain comparability of observations
from one set of measurements to another.
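To illustrate, here is a minimal Python sketch that converts observations to standard scores using the usual definition z = (X − X̄)/s, with s computed using the divisor N − 1 as above. It reuses the observations 1, 4, 7, 10, 13. It also shows why standard scores give comparability: applying a hypothetical positive linear change of units (multiply by 2.54, add 10) to the raw data leaves every z unchanged.

```python
import math

def standard_scores(xs):
    """Convert observations to z = (X - mean) / s, where s uses N - 1."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return [(x - mean) / s for x in xs]

obs = [1, 4, 7, 10, 13]
z = standard_scores(obs)
print([round(v, 2) for v in z])  # [-1.26, -0.63, 0.0, 0.63, 1.26]

# A positive linear change of units (hypothetical) does not alter the z values.
z_rescaled = standard_scores([2.54 * x + 10 for x in obs])
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(z, z_rescaled)))  # True
```

The resulting scores have mean 0 and standard deviation 1, whatever the original unit of measurement.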
In general, the rth moment about the mean is given by

[5.9]   $$m_r = \frac{\sum (X - \bar{X})^r}{N}$$
The term "moment" originates in mechanics. Consider a lever supported by a
fulcrum. If a force $f_1$ is applied to the lever at a distance $x_1$
from the origin, then $f_1x_1$ is called the moment of the force. Further, if a
second force $f_2$ is applied at a distance $x_2$, the total moment is $f_1x_1 + f_2x_2$.
If we square the distances $x$, we obtain the second moment; if we cube
them, we obtain the third moment; and so on. When we come to consider
frequency distributions, the origin is the analog of the fulcrum and the
frequencies in the various class intervals are analogous to forces operating
at various distances from the origin. Observe that the first moment about
the mean is 0 and the second moment is $(N - 1)/N$ times the unbiased sample
variance. The third moment is used to obtain a measure of skewness,
and the fourth moment, a measure of kurtosis.
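The two observations about the first and second moments can be checked numerically. This Python sketch implements the rth moment about the mean, m_r = Σ(X − X̄)^r / N. The function name `moment_about_mean` is our own. Using the observations 1, 4, 7, 10, 13, it confirms that m1 = 0 and that m2 equals (N − 1)/N times the unbiased variance 22.5.

```python
def moment_about_mean(xs, r):
    """rth moment about the mean: sum((X - mean) ** r) / N."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** r for x in xs) / n

obs = [1, 4, 7, 10, 13]
n = len(obs)
mean = sum(obs) / n
s2 = sum((x - mean) ** 2 for x in obs) / (n - 1)  # unbiased sample variance

print(moment_about_mean(obs, 1))   # 0.0
print(moment_about_mean(obs, 2))   # 18.0
print((n - 1) * s2 / n)            # 18.0
```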
5.11 MEASURES OF SKEWNESS AND KURTOSIS

The commonly used measure of skewness makes use of the third moment
and is defined as

[5.10]   $$g_1 = \frac{m_3}{m_2\sqrt{m_2}}$$
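Here is a minimal Python sketch of the skewness statistic g1 = m3 / (m2√m2), with both moments taken about the mean using the divisor N. The function names are our own. The symmetrical observations 1, 4, 7, 10, 13 give g1 = 0. The set 2, 3, 3, 4, 13 is a hypothetical example with a long right tail, not one of the book's data sets.

```python
import math

def moment_about_mean(xs, r):
    """rth moment about the mean with divisor N."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** r for x in xs) / n

def skewness_g1(xs):
    """g1 = m3 / (m2 * sqrt(m2))."""
    m2 = moment_about_mean(xs, 2)
    m3 = moment_about_mean(xs, 3)
    return m3 / (m2 * math.sqrt(m2))

print(skewness_g1([1, 4, 7, 10, 13]))        # 0.0 (symmetrical)
print(skewness_g1([2, 3, 3, 4, 13]) > 0)     # True (positively skewed)
print(skewness_g1([-2, -3, -3, -4, -13]) < 0)  # True (mirror image: negatively skewed)
```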
The rationale for this statistic is based on the observation that when a distribution
(or any set of numbers) is symmetrical, the sum of deviations above
the mean, when raised to the third power, will balance the sum of deviations
below the mean, when raised to the third power. Thus for a symmetrical
distribution $m_3 = 0$ and $g_1 = 0$. If the distribution is asymmetrical, the
sums of deviations above and below the mean, when raised to the third
power, will not balance. Thus for an asymmetrical distribution $m_3 \neq 0$ and
$g_1 \neq 0$. If the distribution, or set of numbers, is positively skewed, $g_1$ is
positive; when negatively skewed, $g_1$ is negative. The quantity $m_2\sqrt{m_2}$ is
introduced in order to ensure that $g_1$ is comparable for distributions that differ
in variability. Thus $g_1$ is independent of the scale of measurement. The
skewness of a set of measurements in grams, meters, pounds, or units of
score on a psychological test can be directly compared using $g_1$. The reader
will recall that a standard score is defined as $z = (X - \bar{X})/s$. One reason for
using standard scores is to achieve comparability of scores from one set of
measurements to another. The use of $m_2\sqrt{m_2}$ in the definition of $g_1$ is
directly analogous to the use of $s$ in the definition of a standard score.
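The scale-independence argument can be checked directly. If every value is multiplied by a constant c, m3 is multiplied by c³ and so is m2√m2, so the ratio is unchanged. The Python sketch below uses a hypothetical set of weights in grams and the same weights in kilograms. It also makes the analogy with standard scores concrete: g1 equals the mean of the cubed standard scores, provided the standard deviation in those scores uses the divisor N rather than N − 1. All function names and data are illustrative assumptions.

```python
import math

def skewness_g1(xs):
    """g1 = m3 / (m2 * sqrt(m2)); moments about the mean with divisor N."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / (m2 * math.sqrt(m2))

def g1_via_standard_scores(xs):
    """Mean of cubed standard scores, using the divisor-N standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    sd_n = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)  # divisor N, not N - 1
    return sum(((x - mean) / sd_n) ** 3 for x in xs) / n

grams = [120.0, 135.0, 150.0, 160.0, 410.0]  # hypothetical weights
kilograms = [g / 1000 for g in grams]

print(math.isclose(skewness_g1(grams), skewness_g1(kilograms)))      # True
print(math.isclose(skewness_g1(grams), g1_via_standard_scores(grams)))  # True
```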
As an illustration of $g_1$, consider two sets of numbers, A and B.
A6 I0 l:t4
R l0 l5