Induction Descriptive Stat Students Notes

INSTITUTE OF ACCOUNTANCY ARUSHA
AND
COVENTRY UNIVERSITY
MASTER OF SCIENCE – FINANCE & INVESTMENT
INDUCTION COURSE
ON
STATISTICS
DESCRIPTIVE STATISTICS
UNIT ONE
STUDENTS’ NOTES
SABDAT N. SABDAT
IAA - ARUSHA - TANZANIA

Dec.02 - 2014
DESCRIPTIVE STATISTICS
1.1 Introduction
Statistical methods provide a powerful set of tools for analyzing data and drawing conclusions
from them. Whether we are analyzing asset returns, earnings growth rates, commodity prices, or any
other financial data, statistical tools help us quantify and communicate the data’s important
features.
This unit presents descriptive statistics, which are mainly methods of describing various types
of data. In this unit, therefore, we are going to explore the following:
• Fundamental concepts, which span the nature of statistics, populations and samples, and
  measurement scales.
• Presentation methods such as graphical forms and frequency distributions.
• Summation and product operations.
• Measures of central tendency.
• Measures of dispersion, and
• Measures of shape.
1.2 Fundamental Concepts
Before starting our study of statistics in this unit, it will be quite in order to familiarize ourselves
with the overall picture of the field. Hence, we give a brief description of the scope of statistics
and its branches. Secondly, we explain the concepts of population and sample. Lastly, we discuss
the various types of data, understanding of which is critical for their measurement and choice of
appropriate statistical methods for their analysis.
1.2.1 The Nature of Statistics
The term statistics can have two broad meanings, one referring to data and the other to method.
A company’s average earnings per share (EPS) for the last 20 quarters, or its average returns for
the last ten years are statistics. We may also analyze historical EPS to forecast future EPS, or use
the company’s past returns to infer its risk. The totality of methods we employ to collect and
analyze data is also called statistics.
Otherwise, we may formally define Statistics as the science that processes and analyses data in
order to provide managers with useful information to aid in decision making. Another definition
for statistics is viewing it as a set of methods for making decisions under uncertainty.
Statistical methods include descriptive statistics and statistical inference. Descriptive statistics is
viewed as those methods involving the collection, presentation and characterization of a set of
data in order to describe the various features of that set of data properly. By consolidating a mass
of data (numerical details), descriptive statistics turns data into information. Statistical inference
involves analysis of data and making of decisions or estimates based on information obtained
from data. The foundation of statistical inference is probability theory. Our units two and three
will discuss probability theory and statistical inference respectively.
1.2.2 Populations and Samples
Throughout the study of statistics, a distinction is made between the terms population and sample. In
this subsection, we therefore explain these terms together with the related terms parameter and
sample statistic.
A population is defined as all members of a specified group. Any descriptive measure – a
numerical quantity – that summarizes some aspect of a population is called a parameter.
Although a population can have many parameters, finance and investment analysts are usually
concerned with only a few, such as the mean value, standard deviation and variance.
Even if it is possible to observe all the members of a population, it is often too expensive in
terms of time or money to attempt to do so. For example, if the population is all computer
customers across the entire region of East Africa and an analyst is interested in their purchasing
plans, she will find it too costly to observe the entire population. The analyst can address this
situation by taking a sample of the population.
A sample is a subset of a population. In taking a sample, the analyst hopes it is characteristic of the
population. The field of statistics known as sampling deals with taking samples in appropriate
ways to achieve the objective of representing the population properly. Sampling will be dealt with in
Unit Three.
Just as a parameter is a descriptive measure of a population characteristic, a statistic is a
numerical quantity that summarizes a sample.
Since the purpose of a statistical study is to learn about key parameters of populations, the
analysis usually focuses on the statistics that correspond to these parameters. Consequently, these
concepts are critical not only in this unit, in connection with areas such as measures of centrality
and dispersion, but also in the next units, when we deal with statistical inference.
1.2.3 Measurement Scales
To choose the appropriate statistical methods for summarizing and analyzing data, we need to
distinguish among different measurement scales or levels of measurement. All data
measurements are taken on one of four major scales: nominal, ordinal, interval, or ratio.
Nominal scales represent the weakest level of measurement. They categorize data (by labels or
assigned values) but do not rank them. For example, if we assigned integers to mutual funds that
follow different investment strategies, the number 1 might refer to a small-cap value fund, the
number 2 to a large-cap value fund, and so on for each possible style. This nominal scale
categorizes the funds according to their style but does not rank them. Other examples of nominal
data include gender (1 = male, 2 = female), manufacturer of automobile (1 = Toyota, 2 = Nissan,
3 = Mitsubishi) and ownership status of dwelling occupants (1 = own, 2 = rent).
Ordinal scales reflect a stronger level of measurement. Ordinal scales sort data into categories
that are ordered with respect to some characteristic. For example, Morningstar and Standard &
Poor’s star ratings for mutual funds represent an ordinal scale in which one star represents a
group of funds judged to have had relatively the worst performance, with two, three, four, and
five stars representing groups with increasingly better performance, as evaluated by those
services. Numbers may be used instead of stars: for example, funds may be ranked on five-year
cumulative return, with 1 assigned to the top 10% of funds and so on, so that 10 represents the
bottom 10% of funds. An ordinal scale is stronger than a nominal scale because it reveals that a
fund ranked 1 performed better than a fund ranked 2. However, the scale does not tell us whether
the difference in performance between funds ranked 1 and 2 is the same as the difference between
funds ranked 3 and 4 or 9 and 10.
Interval scales provide not only ranking but also assurance that the differences between scale
values are equal, so that scale values can be added and subtracted meaningfully. Examples are the
Celsius and Fahrenheit scales. Nevertheless, the zero point of an interval scale does not reflect
complete absence of what is measured. Zero degrees Celsius corresponds to the freezing point of
water; it is not a true (natural) zero and does not mean absence of temperature.
As a consequence of the absence of a true zero point, we cannot meaningfully form ratios on
interval scales: 20 degrees Celsius is not twice as hot as 10 degrees Celsius. Many of the
techniques used to analyze data in statistics require data that are at least of this strength.
Ratio scales represent the strongest level of measurement. These scales have all the characteristics of
interval measurement scales as well as a true zero point as the origin. Here we can meaningfully
compute ratios as well as add and subtract amounts within the scale. As a result, we can apply the
widest range of statistical tools to data measured on a ratio scale. Rates of return are
measured on a ratio scale, as is money. If we have twice as much money, then we have twice the
purchasing power. Note that the scale has a natural zero: zero means no money. Also, data
consisting of areas, counts, volumes, and weights are typically ratio data.
These four measurement scales are summarized in Table 1.1 below.
Table 1.1: Summary of Scales of Measurement

                                                 Scale of Measurement
Property                                         Nominal  Ordinal  Interval  Ratio
Order (rank) of data is meaningful               No       Yes      Yes       Yes
Differences between data values are meaningful   No       No       Yes       Yes
Zero point represents total absence              No       No       No        Yes
1.3 Graphical Forms and Frequency Distributions
Having collected data, the next step in its analysis is to display it visually. A visual display will
normally summarize the data to make it more manageable, while highlighting its features. Such
methods include pie and bar charts, frequency tables, time series and cumulative frequency
graphs, scatter diagrams, histograms and stem and leaf plots. However, in this section we intend
to cover only a subset of the above methods.
1.3.1 Stem and Leaf Plot (Display)
The first step in handling data is to organize it. One way is to use a stem and leaf display. In this
method, the first part of a number is the stem, and the rest of the number is the leaf. All possible
stems are displayed in a column with each of the leaves in the same row. For example, Table
1.2 below records the prices, in thousands of shillings, of twenty randomly chosen used
autos.
Table 1.2: Data set for Autos

8520 9274 8142 11298 10624
7987 11172 12899 10737 9198
13625 9462 11847 10178 12240
11690 10069 11240 12745 12995
The stem and leaf display appears in Table 1.3 as follows:
Table 1.3: Stem and leaf Display
Stem Leaf Count
7 987 1
8 520,142 2
9 274,198,462 3
10 624,737,178,069 4
11 298,172,847,690,240 5
12 899,240,745,995 4
13 625 1
20
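The construction of Table 1.3 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `stem_and_leaf` and the choice of treating the last three digits as the leaf are assumptions made here to match the layout of Table 1.3.

```python
from collections import defaultdict

# Prices of the twenty used autos from Table 1.2.
prices = [8520, 9274, 8142, 11298, 10624,
          7987, 11172, 12899, 10737, 9198,
          13625, 9462, 11847, 10178, 12240,
          11690, 10069, 11240, 12745, 12995]

def stem_and_leaf(values, leaf_digits=3):
    """Group values by stem; the last `leaf_digits` digits form the leaf."""
    divisor = 10 ** leaf_digits
    display = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, divisor)
        # Zero-pad the leaf so that 10069 gives the leaf "069", not "69".
        display[stem].append(f"{leaf:0{leaf_digits}d}")
    return dict(sorted(display.items()))

display = stem_and_leaf(prices)
for stem, leaves in display.items():
    print(f"{stem:>2} | {','.join(leaves)} | count = {len(leaves)}")
```

Each printed row corresponds to one row of the display: the stem, its leaves in the order they appear in the data, and the count.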
1.3.2 Relative Frequency Distribution
Another method of organizing data is a relative frequency distribution. A frequency distribution is
a tabular arrangement of data by classes (categories) together with the corresponding class
frequencies. Numbers are grouped in classes whose sizes, or widths, are equal. Once the class
boundaries have been decided upon, the number of observations in each class is tabulated. The
percentage or proportion of numbers in each class is also displayed. This proportion is called the relative
frequency because it measures the part of the entire data set in each class. For example, suppose
a class width of Tshs. 1500 is used for the prices of used cars; this information can then be
tabulated in a relative frequency distribution as shown in Table 1.4:
Table 1.4: Relative frequency distribution of cars and their prices
Price Frequency Relative Frequency

7000 - 8499 2 0.10
8500 - 9999 4 0.20
10000 - 11499 7 0.35
11500 - 12999 6 0.30
13000 - 14499 1 0.05
1.00
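As a rough check on Table 1.4, the frequencies and relative frequencies can be recomputed from the prices in Table 1.2. The sketch below assumes the class limits used in the table (a first class starting at 7000 and a width of 1500); the variable names are illustrative.

```python
# Prices of the twenty used autos from Table 1.2.
prices = [8520, 9274, 8142, 11298, 10624,
          7987, 11172, 12899, 10737, 9198,
          13625, 9462, 11847, 10178, 12240,
          11690, 10069, 11240, 12745, 12995]

class_width = 1500    # class width used in Table 1.4
lowest_limit = 7000   # lower limit of the first class
n_classes = 5

table = []
for k in range(n_classes):
    lower = lowest_limit + k * class_width
    upper = lower + class_width - 1          # e.g. 7000 - 8499
    frequency = sum(lower <= p <= upper for p in prices)
    table.append((lower, upper, frequency, frequency / len(prices)))

for lower, upper, frequency, relative in table:
    print(f"{lower} - {upper}  {frequency}  {relative:.2f}")
```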
1.3.3 Histograms
Data organized in frequency distribution tables are often displayed in a histogram or bar graph.
The classes, each with the same width, are indicated on the horizontal axis. The vertical axis can
display either the frequency or relative frequency. Both frequencies and relative frequencies are
indicated in the following histogram (Figure 1.1), which represents the prices of used autos; the
class frequencies were taken from Table 1.4.
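[Histogram not reproduced. Horizontal axis: price, marked at 5000, 7000, 8500, 10000, 11500, 13000, 14500 and 16000; vertical axis: relative frequency (.05 to .40) with the corresponding frequency (1 to 8).]
FIGURE 1.1: Relative Frequency Histogram (prices of used autos)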
Notice that unless the first class starts at 0, we compress the horizontal axis and begin
numbering it one class width below our first class. On the vertical axis is the percentage or
proportion of the entire data set in each class, that is, the relative frequency. The actual frequencies
can also appear on the vertical axis. The horizontal and vertical axes will usually not use the
same unit of measure.
Class boundaries for Histograms

The number of data points determines the number of classes into which data set should be
divided for displaying the numbers in a relative frequency distribution and a histogram. The
number of classes usually varies from five to twenty; the higher numbers might be used for data
sets with several hundred elements.
Table 1.5 shows the weights in pounds of twenty-four men.
Table 1.5: Weights in pounds of twenty-four men

173 157 204 198 162 153
140 172 189 191 166 147
132 212 198 183 165 171
167 158 166 163 179 155
In order to divide these weights into classes, we first determine their range. The range of a data
set is the difference between its largest and smallest numbers, which here is 212 – 132
= 80. To be sure that the highest number falls within the last class we add 1 to this number to
produce 81. Choosing from a range of five to seven classes, we decide upon five as the
number of classes to be used; thus 81/5 = 16.2, which we round up to 17 as the class width.
To be certain that none of the numbers falls on a boundary and that the high boundary of class
one is the same as the low boundary of class two, we start the lowest class boundary 0.5 below the
lowest weight (132), at 131.5. By adding 17 to 131.5 repeatedly, all the lower and upper class
boundaries for the five classes are determined. Next we count the number of weights in
each class and find the proportion of the total number of weights in each class. The frequency
and relative frequency table (Table 1.6) and the histogram (Figure 1.2) for the data in Table 1.5
are shown below.
Table 1.6: Frequency and relative frequency table for data from Table 1.5
Weight (Class) Frequency Relative Frequency
131.5 – 148.5 3 0.12
148.5 – 165.5 7 0.29
165.5 – 182.5 7 0.29
182.5 – 199.5 5 0.21
199.5 – 216.5 2 0.08
Total 24 0.99*
* Not equal to 1.00 due to rounding.
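The class-boundary procedure described for the weights data (range, add 1, divide by the number of classes, round up, start 0.5 below the minimum) can be sketched as follows. The variable names are illustrative, and the choice of five classes is the one made in the text.

```python
import math

# Weights in pounds of the twenty-four men from Table 1.5.
weights = [173, 157, 204, 198, 162, 153,
           140, 172, 189, 191, 166, 147,
           132, 212, 198, 183, 165, 171,
           167, 158, 166, 163, 179, 155]

n_classes = 5
data_range = max(weights) - min(weights)                # 212 - 132 = 80
class_width = math.ceil((data_range + 1) / n_classes)   # 81 / 5 = 16.2, rounded up
lowest_boundary = min(weights) - 0.5                    # 131.5

# Boundaries 131.5, 148.5, ..., obtained by adding the class width repeatedly.
boundaries = [lowest_boundary + k * class_width for k in range(n_classes + 1)]

# No weight can fall on a boundary, so strict inequalities are safe here.
frequencies = [sum(lo < w < hi for w in weights)
               for lo, hi in zip(boundaries, boundaries[1:])]
relative = [f / len(weights) for f in frequencies]

for (lo, hi), f, r in zip(zip(boundaries, boundaries[1:]), frequencies, relative):
    print(f"{lo} - {hi}  {f}  {r:.4f}")
```

Printing the relative frequencies to four decimals makes it visible that the 0.99 total in Table 1.6 is only a rounding effect.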

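[Histogram not reproduced. Horizontal axis: weights, with class boundaries 114.5, 131.5, 148.5, 165.5, 182.5, 199.5, 216.5 and 233.5; vertical axis: relative frequency (.05 to .35) with the corresponding frequency (1 to 7).]
FIGURE 1.2: Relative Frequency Histogram: Men’s weights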
1.3.4 Relative Frequency Polygons
A relative frequency polygon is the set of line segments formed by plotting the class mark
(midpoint of the class interval) against the class frequency or relative frequency. The relative
frequency polygon for the data in Figure 1.2 is obtained by joining in order the midpoints of the
tops of all rectangles (bars) on the histogram.
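[Polygon not reproduced. Horizontal axis: weights (pounds), with class boundaries 114.5, 131.5, 148.5, 165.5, 182.5, 199.5, 216.5 and 233.5 and end points at the class marks 123 and 225; vertical axis: relative frequency (.05 to .35) with the corresponding frequency (1 to 7).]
FIGURE 1.3: Relative Frequency Polygon: Men’s Weights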
The first connection is a segment from the midpoint of the class before the first class [the middle of
the interval (114.5, 131.5)] to the middle of the top of the first bar. The class mark can be found
by averaging the end points: (114.5 + 131.5)/2 = 123. The last connection is a segment from the
middle of the top of the last bar to the midpoint of the class after it [the middle of the interval (216.5,
233.5)], which is 225.
We can think of the first and last points of the relative frequency polygon as being midpoints of
bars with frequency 0, since none of the 24 weights is in either class. The frequency polygon for
the histogram in Figure 1.2 is shown in Figure 1.3.
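The points joined by the polygon can be listed explicitly. The sketch below takes the class boundaries and frequencies from Table 1.6 and adds the two empty classes at either end; the variable names are illustrative.

```python
# Class boundaries, including the empty class before the first class
# and the empty class after the last one.
boundaries = [114.5, 131.5, 148.5, 165.5, 182.5, 199.5, 216.5, 233.5]
frequencies = [0, 3, 7, 7, 5, 2, 0]   # the end classes contain no weights
n = sum(frequencies)                   # 24 men

# Class mark = average of the two end points of each class.
class_marks = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

# Points (class mark, relative frequency) joined in order by line segments.
polygon_points = [(mark, f / n) for mark, f in zip(class_marks, frequencies)]
for mark, rel in polygon_points:
    print(mark, round(rel, 4))
```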
Cumulative frequencies, Ogives and smoothing curves which are closely associated with relative
frequency distributions are not covered here.
1.4 Summation and Product Operations
In analysis of data we often work with sums of numbers and hence we need a simple notation for
indicating a sum. The Greek capital letter ∑ (sigma) is used to indicate summation.
Thus,
$$\sum_{i=1}^{n} X_i = X_1 + X_2 + X_3 + \cdots + X_n$$
Some of the important properties of the summation operator ∑ are:
1. $\sum_{i=1}^{n} k = nk$, where k is a constant. Thus, $\sum_{i=1}^{4} 3 = 4 \cdot 3 = 12$.
2. $\sum_{i=1}^{n} k x_i = k \sum_{i=1}^{n} x_i$, where k is a constant.
3. $\sum_{i=1}^{n} (a + b x_i) = na + b \sum_{i=1}^{n} x_i$, where a and b are constants and where use is made of
properties 1 and 2 above.
4. $\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$
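The four properties of the summation operator can be checked numerically. The values of x, y, k, a and b below are hypothetical and chosen only for illustration; integers are used so that the comparisons are exact.

```python
# Hypothetical data and constants, chosen only to illustrate the properties.
x = [2, 5, 7, 1]
y = [3, 0, 4, 6]
k, a, b = 3, 4, 2
n = len(x)

# Property 1: the sum of a constant k over n terms equals n * k.
prop1 = sum(k for _ in range(n)) == n * k
# Property 2: a constant factor can be taken outside the sum.
prop2 = sum(k * xi for xi in x) == k * sum(x)
# Property 3: sum of (a + b * x_i) equals n * a + b * sum of x_i.
prop3 = sum(a + b * xi for xi in x) == n * a + b * sum(x)
# Property 4: the sum of a sum equals the sum of the separate sums.
prop4 = sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)

print(prop1, prop2, prop3, prop4)
```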
The summation operation can also be extended to multiple sums. Thus,
∑∑, the double summation operator, is defined as
$$\sum_{i=1}^{n} \sum_{j=1}^{m} X_{ij} = \sum_{i=1}^{n} (X_{i1} + X_{i2} + \cdots + X_{im})$$
= (X11 + X21 + … + Xn1) + (X12 + X22 + … + Xn2) + … + (X1m + X2m + … + Xnm)
Some of the properties of ∑∑ are:
1. $\sum_{i=1}^{n} \sum_{j=1}^{m} X_{ij} = \sum_{j=1}^{m} \sum_{i=1}^{n} X_{ij}$; that is, the order in which the double summation is
performed is interchangeable.
2. $\sum_{i=1}^{n} \sum_{j=1}^{m} x_i y_j = \sum_{i=1}^{n} x_i \sum_{j=1}^{m} y_j$
3. $\sum_{i=1}^{n} \sum_{j=1}^{m} (X_{ij} + Y_{ij}) = \sum_{i=1}^{n} \sum_{j=1}^{m} X_{ij} + \sum_{i=1}^{n} \sum_{j=1}^{m} Y_{ij}$
4. $\left( \sum_{i=1}^{n} x_i \right)^2 = \sum_{i=1}^{n} x_i^2 + 2 \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} x_i x_j = \sum_{i=1}^{n} x_i^2 + 2 \sum_{i<j} x_i x_j$
The product operator ∏ is defined as
$$\prod_{i=1}^{n} x_i = x_1 \cdot x_2 \cdots x_n$$
Thus, $\prod_{i=1}^{3} x_i = x_1 \cdot x_2 \cdot x_3$.
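In Python (version 3.8 or later) the product operator corresponds to `math.prod`. The three values below are hypothetical and serve only to illustrate the definition.

```python
import math

x = [2, 3, 4]   # hypothetical values x1, x2, x3

# Product written out term by term, as in the definition.
written_out = x[0] * x[1] * x[2]

# The same product using the standard library.
with_prod = math.prod(x)

print(written_out, with_prod)
```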
1.5 Measures of Central Tendency
In Section 1.3 we started transforming raw or ungrouped data into a meaningful form: we
organized it into a frequency distribution and portrayed it graphically in a histogram or a frequency
polygon. We also described it by stem and leaf displays. Although all these methods (of
Section 1.3) provide a convenient way to summarize a series of observations, they are just a first
step toward describing the data.
In this section we continue to develop methods to describe or characterize data by finding a
single value to describe a set of data. This single value is referred to as a measure of central tendency,
or more commonly, an average. Loosely defined, the central tendency of a set of numbers is the
tendency of the data to cluster around certain numerical values. Thus, a measure of central
tendency is a single value that summarizes a set of data. It locates the center of the values.
Measures of central tendency and location are more widely used than any other statistical measures
because they can be computed and applied easily.
This section will cover six measures of central tendency namely: the arithmetic mean, the
weighted mean, the median, the mode, the geometric mean and harmonic mean. Formulas for
both ungrouped and grouped data (where possible) are discussed.
Measures of location (quantiles), which include quartiles, quintiles, deciles and percentiles, are
not explored in this section for the sake of brevity. Readers are advised to consult other sources to
familiarize themselves with them.
1.5.1 Measures of Central Tendency for Ungrouped Data
The Arithmetic Mean
The arithmetic mean may be computed for either a population or a sample; more specifically,
these are called the population mean and the sample mean respectively. For raw data, that is, data that has
not been grouped in a frequency distribution or a stem and leaf display, the population mean
is the sum of all the values in the population divided by the number of values in the population.
To find the population mean, we use the following formula.
Population mean = (Sum of all the values in the population) / (Number of values in the population)
$$\mu = \frac{\sum X}{N}$$ Eq. (1.1)
Where:
µ (mu) – represents the population mean
N – is the number of items in the population
X – represents any particular value
∑ X – is the sum of the X values
Any measurable characteristic (e.g. µ) of a population is called a parameter.
Example 1.1
Bongoland has 12 auto companies. Listed below is the number of patents granted by the
Bongoland government to each company last year.
Table 1.7: Number of patents granted by the Bongoland Government to each company

Company      Number of patents granted    Company       Number of patents granted
GM           511                          Mazda         210
Nissan       385                          Chrysler      97
D-Benz       275                          Porsche       50
Toyota       257                          Mitsubishi    36
Honda        249                          Volvo         23
Ford         234                          BMW           13
Is this information a sample or a population? What is the arithmetic mean number of patents
granted?
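One way to work Example 1.1 is sketched below. Since the table covers all 12 auto companies in Bongoland, the sketch treats the data as a population and applies Equation (1.1); the dictionary name `patents` is illustrative.

```python
# Patents granted to each of the 12 companies (Table 1.7).
patents = {
    "GM": 511, "Nissan": 385, "D-Benz": 275, "Toyota": 257,
    "Honda": 249, "Ford": 234, "Mazda": 210, "Chrysler": 97,
    "Porsche": 50, "Mitsubishi": 36, "Volvo": 23, "BMW": 13,
}

N = len(patents)                  # number of values in the population
total = sum(patents.values())     # sum of all the values
mu = total / N                    # Equation (1.1)

print(N, total, mu)
```

The sum of the values is 2340, so the population mean is 2340 / 12 = 195 patents.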
Frequently, we select a sample from a population in order to find out something about a
specific characteristic of the population. For raw data, that is, ungrouped data, the sample
mean is the sum of all the values in the sample divided by the total number of values in the
sample. To find the mean for a sample we use the following formula.
Sample mean = (Sum of all the values in the sample) / (Number of values in the sample)
$$\bar{X} = \frac{\sum X}{n}$$ Eq. (1.2)
Where $\bar{X}$ stands for the sample mean and n represents the number of values in the sample.
The mean of a sample, or any other measure based on the sample data, is called a statistic;
a statistic is therefore a characteristic of a sample.
Example 1.2
The Merrill Lynch Global fund specializes in long – term obligations of foreign countries. We
are interested in the interest rate on these obligations. A random sample of six bonds revealed
the following.
Table 1.8: Bonds and related interest rates

Issue         Interest Rate (%)
AG – Bonds    9.50
BG – Bonds    7.25
CG – Bonds    6.50
FG – Bonds    4.75
IG – Bonds    12.00
SG – Bonds    8.30
What is the arithmetic mean interest rate on this sample of long-term obligations?
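A possible working of Example 1.2, applying Equation (1.2) to the six sampled interest rates, is sketched below. The variable names are illustrative.

```python
# Interest rates (%) of the six sampled bonds from Table 1.8.
rates = [9.50, 7.25, 6.50, 4.75, 12.00, 8.30]

n = len(rates)              # number of values in the sample
x_bar = sum(rates) / n      # Equation (1.2)

print(f"Sample mean interest rate: {x_bar:.2f}%")
```

The rates sum to 48.30, so the sample mean is 48.30 / 6 = 8.05 percent.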
Properties of arithmetic mean include:
1. Every set of interval-level and ratio-level data has a mean. (Interval-level and ratio-level
data include such data as ages, incomes, and weights, with the distance between numbers
being constant.)
2. All the values are included in computing the mean.
3. A set of data has only one (unique) mean.
4. The mean is a useful measure for comparing two or more populations.
5. The arithmetic mean is the only measure of location where the sum of the deviations of
each value from the mean will always be zero (a balance point)
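Property 5 can be illustrated with the interest rates of Example 1.2. In the sketch below the deviations from the mean sum to zero, up to floating-point rounding; the variable names are illustrative.

```python
# Interest rates (%) of the six sampled bonds from Table 1.8.
rates = [9.50, 7.25, 6.50, 4.75, 12.00, 8.30]
x_bar = sum(rates) / len(rates)

# Deviation of each value from the mean.
deviations = [x - x_bar for x in rates]
sum_of_deviations = sum(deviations)

# The sum is zero apart from a tiny floating-point rounding error.
print(round(sum_of_deviations, 10))
```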
The mean does have several disadvantages. It might not be an appropriate
average to represent the data if some of the sample (or population) values are extremely small or large.
The mean is also inappropriate if there is an open-ended class for data grouped into a frequency
distribution.
Weighted mean
The weighted mean is a special case of the arithmetic mean. It occurs when there are several
observations of the same value, which might occur if the data have been grouped into a frequency
distribution.
In general, the weighted mean of a set of numbers designated X1, X2, X3, …, Xn with the
corresponding weights W1, W2, W3, …, Wn is computed by:
$$\bar{X}_w = \frac{W_1 X_1 + W_2 X_2 + W_3 X_3 + \cdots + W_n X_n}{W_1 + W_2 + W_3 + \cdots + W_n}$$ Eq. (1.3a)
or
$$\bar{X}_w = \frac{\sum WX}{\sum W}$$ Eq. (1.3b)
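Equation (1.3b) can be sketched in Python as follows. The returns and weights are hypothetical (a portfolio with 50, 30 and 20 units invested at returns of 5, 8 and 12 percent) and are not taken from these notes.

```python
# Hypothetical data: returns (%) and the amounts invested at each return.
values = [5.0, 8.0, 12.0]      # X1, X2, X3
weights = [50.0, 30.0, 20.0]   # W1, W2, W3

def weighted_mean(xs, ws):
    """Equation (1.3b): sum of W * X divided by the sum of W."""
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)

x_w = weighted_mean(values, weights)
simple_mean = sum(values) / len(values)

print(x_w, simple_mean)
```

With equal weights the weighted mean reduces to the ordinary arithmetic mean, which is why the arithmetic mean can be seen as the equal-weight case.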
The Median
Another “middle” of a data set is the median. This number, which is sometimes denoted x̃, is
the middle value, or the average of the two middle values, in a set of numbers that has been arranged
from lowest to highest. For a data set containing one or two very small or very large values (a
skewed data set), the median is a good measure of central tendency.
The properties of the Median are:
1. The median is unique; that is, like the mean, there is only one median for a set of data.
2. It is not affected by extremely small or large values and is therefore a valuable measure
of central tendency when such values do occur.
3. It can be computed for a frequency distribution with an open ended class provided the
median does not lie in an open-ended class.
4. It can be computed for ratio – level, interval – level, and ordinal – level data.
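Property 2 can be illustrated with Python's standard `statistics` module. The sketch uses the six interest rates of Example 1.2 and then replaces the largest rate with a hypothetical extreme value of 120 percent, an assumption made purely to show the effect.

```python
from statistics import mean, median

# Interest rates (%) from Table 1.8; six values, so the median is the
# average of the two middle values after sorting.
rates = [9.50, 7.25, 6.50, 4.75, 12.00, 8.30]

# Hypothetical variant: the largest rate replaced by an extreme value.
rates_with_outlier = [9.50, 7.25, 6.50, 4.75, 120.00, 8.30]

median_original = median(rates)                # (7.25 + 8.30) / 2
median_outlier = median(rates_with_outlier)    # unchanged by the outlier
mean_original = mean(rates)
mean_outlier = mean(rates_with_outlier)        # pulled up by the outlier

print(median_original, median_outlier)
print(round(mean_original, 2), round(mean_outlier, 2))
```

The median stays at the same value while the mean is pulled strongly toward the extreme observation.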
The Mode
The mode is another measure of central tendency. It is the value of the observation that
appears most frequently. The mode is especially useful in describing nominal and ordinal
levels of measurement. An example at the nominal level is the number of responses preferring
certain products in a market survey: the product registering the highest number of responses is
the mode.
The mode may be determined for all levels of data: nominal, ordinal, interval and ratio. The
mode has the advantage of not being affected by extremely high or low values. Like the
median, it can be used as a measure of central tendency for distributions with open-ended
classes. The mode does have a number of disadvantages, however, that cause it to be used
less frequently than the mean or median. For many sets of data, there is no mode because no
value appears more than once; since every value is different, however, it could be argued that
every value is the mode. Conversely, for some data sets there is more than one mode.
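The three situations described above (a single mode, several modes, and every value different) can be illustrated with `statistics.multimode`, available in Python 3.8 and later. All three data sets below are hypothetical.

```python
from statistics import multimode

# Hypothetical nominal data: preferred product in a market survey.
survey = ["A", "B", "A", "C", "B", "A"]

# Hypothetical data set with two modes.
bimodal = [1, 1, 2, 2, 3]

# Hypothetical data set in which every value appears exactly once.
all_different = [4, 7, 9]

single_mode = multimode(survey)          # one most frequent value
two_modes = multimode(bimodal)           # more than one mode
every_value = multimode(all_different)   # every value is returned

print(single_mode, two_modes, every_value)
```

Note that `multimode` adopts the convention that every value is a mode when no value repeats, which is the second of the two views mentioned in the text.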
The Geometric Mean (GM)
The Geometric Mean (GM) of a set of n positive numbers is defined as the nth root of the
product of the n values. The GM is useful in finding the average of percentages, ratios, indexes
or growth rates. It has a wide application in business and economics because we are often
interested in finding the percentage change in sales, salaries or economic figures, such as the
Gross National Product. The formula for the GM is written as:
$$GM = \sqrt[n]{(X_1)(X_2) \cdots (X_n)}$$ Eq. (1.4)
with Xi ≥ 0 for i = 1, 2, …, n.
Alternatively, writing G for the GM,
$$\ln G = \frac{1}{n} \ln (X_1 X_2 X_3 \cdots X_n)$$
or
$$\ln G = \frac{\sum_{i=1}^{n} \ln X_i}{n}$$
[When we have computed ln G, then $G = e^{\ln G}$. The logarithmic form requires every Xi to be strictly positive.]
Note that the GM will always be less than or equal to (but never more than) the AM.
Example 1.3
The profits earned by ATK Construction Company on four recent projects were 3 percent, 2
percent, 4 percent and 6 percent. What is the GM Profit?
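One reading of Example 1.3 applies Equation (1.4) directly to the four percentage figures. The sketch below does this in two ways, by the nth root of the product and by the logarithmic form, and compares the result with the arithmetic mean; the variable names are illustrative.

```python
import math

# Profit percentages on the four projects (Example 1.3).
profits = [3.0, 2.0, 4.0, 6.0]
n = len(profits)

# Equation (1.4): nth root of the product of the n values.
gm_root = math.prod(profits) ** (1 / n)

# Logarithmic form: ln G is the average of the logs, then G = e^(ln G).
ln_g = sum(math.log(x) for x in profits) / n
gm_log = math.exp(ln_g)

am = sum(profits) / n

print(round(gm_root, 4), round(gm_log, 4), am)
```

The product of the four values is 144, and its fourth root is about 3.46 percent, which is below the arithmetic mean of 3.75 percent, as the note on GM ≤ AM requires.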
Since risky assets can have negative returns of up to -100 percent (if their price falls to zero), we
must take some care in defining the relevant variables to average in computing a GM. We cannot
just use the product of the returns for the sample and then take the nth root because the returns
for any period could be negative. We must redefine the returns to make them non-negative. We do
this by adding 1.0 to the returns expressed as decimals. The term (1 + Rt) represents the year-end
value relative to an initial unit of investment at the beginning of the year. As long as we
use (1 + Rt), the observations will never be negative because the biggest negative return is -100
percent. The result is the GM of 1 + Rt; by then subtracting 1.0 from this result we obtain the GM
of the individual returns Rt.
An equation that summarizes the calculation of the GM return, RG, is a slightly modified version
of Equation (1.4) in which the X represents “one plus the return in decimal form”. Because GM
returns use time series, we use a subscript t indexing time as well.
Therefore:
$$1 + R_G = \sqrt[T]{(1 + R_1)(1 + R_2)(1 + R_3) \cdots (1 + R_T)}$$
$$1 + R_G = \left[ \prod_{t=1}^{T} (1 + R_t) \right]^{1/T}$$
which leads to the following formula for the geometric mean return.
Given a time series of holding period returns Rt, t = 1, 2, …, T, the GM return over the time
period spanned by the returns R1 through RT is
$$R_G = \left[ \prod_{t=1}^{T} (1 + R_t) \right]^{1/T} - 1$$ Eq. (1.5a)
Equation (1.5a) can be used to solve for the GM return for any return data series. GM returns are
also referred to as compound returns. While Equation (1.5a) can handle any problem relating to
the determination of an average (GM) increase over a period of time, the following shortcut
computational formula (Equation 1.5b) may be useful when only the beginning and ending values
are known. For example, suppose you earned Tshs. 30,000,000 in the year 2005 and Tshs.
50,000,000 in the year 2015 and you want to know the annual rate of increase over the
period.
Equation (1.5b) is given below:
$$R_G = \left( \frac{\text{Value at the end of period}}{\text{Value at the beginning of period}} \right)^{1/T} - 1$$ Eq. (1.5b)
where T is the number of periods between the two values.
Example 1.4(a)
The population of a small Boma of Wahadzabe in 1998 was two (2) persons, by 2008 it was 22.
What is the average annual rate of percentage increase during the period?
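Equation (1.5b) can be sketched as a small Python function and applied both to the earnings illustration (Tshs. 30,000,000 in 2005 and Tshs. 50,000,000 in 2015) and to Example 1.4(a). The function name `annual_growth_rate` is illustrative, and T is taken as the number of years between the two observations (10 in both cases), which is an assumption about how the periods are counted.

```python
def annual_growth_rate(begin_value, end_value, periods):
    """Equation (1.5b): (end / begin) ** (1 / T) - 1."""
    return (end_value / begin_value) ** (1 / periods) - 1

# Earnings illustration: 2005 to 2015 is taken as T = 10 years.
earnings_rate = annual_growth_rate(30_000_000, 50_000_000, 2015 - 2005)

# Example 1.4(a): population grows from 2 to 22 between 1998 and 2008.
boma_rate = annual_growth_rate(2, 22, 2008 - 1998)

print(f"Earnings: {earnings_rate:.2%} per year")
print(f"Boma population: {boma_rate:.2%} per year")
```

Under these assumptions the earnings grow at roughly 5.2 percent per year and the population at roughly 27.1 percent per year; compounding the beginning value at the computed rate for ten years recovers the ending value.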
The following example 1.4(b) illustrates the computation of the GM while contrasting the
geometric mean (GM) and the arithmetic mean (AM).
Example 1.4(b)
A hypothetical investment in a single stock initially costs € 100. One year later, the stock is
trading at € 200. At the end of the second year, the stock price falls back to the original purchase
price of € 100. No dividends are paid during the two – year period. Calculate the arithmetic and
the geometric mean annual returns. [ Hint: You may need to start with equation 1.5c below].
Holding Period Return (or Total Return)
The holding period return for time period t, Rt, is given by Equation (1.5c):
$$R_t = \frac{P_t - P_{t-1} + D_t}{P_{t-1}}$$ Eq. (1.5c)
Where
Pt = price per share at the end of time period t
Pt-1 = price per share at the end of time period t – 1, the time period immediately preceding time
period t
Dt = cash dividends received during time period t
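A possible working of Example 1.4(b) combines Equation (1.5c) with Equation (1.5a). The sketch below computes the two holding period returns from the prices €100, €200 and €100 with no dividends, and then the arithmetic and geometric mean returns; the function and variable names are illustrative.

```python
import math

def holding_period_return(price_end, price_begin, dividend=0.0):
    """Equation (1.5c): (P_t - P_{t-1} + D_t) / P_{t-1}."""
    return (price_end - price_begin + dividend) / price_begin

# Example 1.4(b): purchase price, price after year 1, price after year 2.
prices = [100.0, 200.0, 100.0]

returns = [holding_period_return(prices[t], prices[t - 1])
           for t in range(1, len(prices))]

T = len(returns)
arithmetic_mean = sum(returns) / T
# Equation (1.5a): T-th root of the product of (1 + R_t), minus 1.
geometric_mean = math.prod(1 + r for r in returns) ** (1 / T) - 1

print(returns, arithmetic_mean, geometric_mean)
```

The yearly returns are +100 percent and -50 percent, giving an arithmetic mean of 25 percent and a geometric mean of 0 percent.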
Application of Geometric and Arithmetic Means
The geometric mean (GM) is appropriate for making investment statements about past performance,
while the arithmetic mean (AM) is appropriate for making investment statements in a
forward-looking context.
For reporting historical returns, the GM has more appeal because it is the rate of growth or return we
would have to earn each year to match the actual, cumulative investment performance.
In Example 1.4(b) above, for instance, we purchased a stock for €100 and two years later it was
worth €100, with an intervening year at €200. The GM of 0% is clearly the compound rate of
growth during the two years. Specifically, the ending amount is the beginning amount times
(1 + RG)^2. The GM is an excellent measure of past performance.
The same Example 1.4(b) shows how the AM can distort the assessment of historical
performance. Total performance for the two years was clearly zero. With a 100% return for year 1
and -50% for year 2, however, the AM was 25% (AM > GM). If we want to estimate the
average return over a one-period horizon, we should use the AM because the AM is the average
of one-period returns. If we want to estimate the average return over more than one period,
however, we should use the GM of returns because the GM captures how the total returns are
linked over time.
As a corollary to using the GM for performance reporting, the use of semi-logarithmic rather
than arithmetic scales is more appropriate when graphing past performance. In the context of
reporting performance, a semi-logarithmic graph has an arithmetic scale on the horizontal axis
for time and a logarithmic scale on the vertical axis for the value of the investment. The vertical
axis values are spaced according to the differences between their logarithms. Suppose we want to
represent €1, €10, €100, and €1,000 as values of an investment on the vertical axis. Note that
each successive value represents a 10-fold increase over the previous value, and each will be
equally spaced on the vertical axis because the difference in their logarithms is roughly 2.3; that
is, ln 10 – ln 1 = ln 100 – ln 10 = ln 1,000 – ln 100 = 2.30. On a semi-logarithmic scale, equal
movements on the vertical axis reflect equal percentage changes, and growth at a constant
compound rate plots as a straight line. A plot curving upwards reflects increasing growth rates
over time. The slopes of a plot at different points may be compared in order to judge relative
growth rates.
In addition to reporting historical performance, financial analysts need to calculate expected
equity risk premiums in a forward- looking context. For this purpose, the AM is appropriate.
We can illustrate the use of the AM in a forward-looking context with an example based on an
investment’s future cash flows. In contrasting the geometric and arithmetic means for
discounting future cash flows, the essential issue concerns uncertainty. Suppose an investor with
€100,000 faces, in each period, an equal chance of a 100 percent return or a -50 percent return,
represented on the tree diagram as a 50/50 chance of each outcome per period.
With a 100 percent return in one period and a -50 percent return in the other, the GM return is
[(2)(0.5)]^(1/2) – 1 = 0.
[Tree diagram: starting wealth €100,000; after one period €200,000 or €50,000; after two periods €400,000, €100,000, €100,000 or €25,000.]
The GM return of 0 percent gives the mode or median of ending wealth after two periods and
thus accurately predicts the modal or median ending wealth of €100,000 in this example.
Nevertheless, the AM return better predicts the AM ending wealth. With equal chances of 100
percent or -50 percent returns, consider the four equally likely outcomes of €400,000, €100,000,
€100,000, and €25,000 as if they actually occurred. The AM ending wealth would be
[€400,000 + €100,000 + €100,000 + €25,000] / 4 = €156,250. The actual returns would be 300
percent, 0 percent, 0 percent, and -75 percent for a two-period AM return of (300 + 0 + 0 –
75) / 4 = 56.25 percent. This AM return predicts the AM ending wealth of €100,000 × 1.5625 =
€156,250. Noting that 56.25 percent for two periods is 25 percent per period (since 1.25 × 1.25 =
1.5625), we then must discount the expected terminal wealth of €156,250 at the 25 percent AM rate
to reflect the uncertainty in the cash flows.
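The figures in this illustration can be reproduced by enumerating the four equally likely two-period paths. The sketch below is only a check on the arithmetic in the text; the variable names are illustrative.

```python
from itertools import product
from statistics import mean, median

initial_wealth = 100_000
period_returns = [1.00, -0.50]   # +100 percent or -50 percent, equally likely

# Ending wealth on each of the four equally likely two-period paths.
ending_wealth = [initial_wealth * (1 + r1) * (1 + r2)
                 for r1, r2 in product(period_returns, repeat=2)]

am_ending_wealth = mean(ending_wealth)        # arithmetic mean ending wealth
median_ending_wealth = median(ending_wealth)  # predicted by the GM return of 0

# Two-period returns on each path and their arithmetic mean.
two_period_returns = [w / initial_wealth - 1 for w in ending_wealth]
am_two_period = mean(two_period_returns)

# Per-period rate that compounds to the two-period AM return.
am_per_period = (1 + am_two_period) ** 0.5 - 1

print(ending_wealth)
print(am_ending_wealth, median_ending_wealth, am_two_period, am_per_period)
```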
Uncertainty in cash flows or returns causes the AM to be larger than the GM. The more uncertain
the returns, the greater the divergence between the arithmetic and geometric means. As an
approximation, the GM return equals the AM return minus half the variance of the
return. Zero variance, or zero uncertainty in returns, would leave the GM and AM
equal, but real-world uncertainty presents an AM return larger than the GM return.
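The two-period example can be checked numerically. The following is only a minimal sketch in Python; the variable names are illustrative and are not part of these notes. It enumerates the four equally likely paths and compares the wealth predicted by the AM and GM returns with the mean and median ending wealth.

```python
from itertools import product
from statistics import mean, median

initial_wealth = 100_000
period_returns = [1.00, -0.50]  # +100% or -50%, equally likely in each period

# The four equally likely two-period paths of the tree diagram.
ending_wealth = sorted(
    initial_wealth * (1 + r1) * (1 + r2)
    for r1, r2 in product(period_returns, repeat=2)
)

am_return = mean(period_returns)                    # arithmetic mean per period
gm_return = ((1 + 1.00) * (1 - 0.50)) ** 0.5 - 1    # geometric mean per period

mean_wealth = mean(ending_wealth)
median_wealth = median(ending_wealth)

# The AM return compounds to the mean ending wealth,
# the GM return compounds to the median ending wealth.
print(mean_wealth, initial_wealth * (1 + am_return) ** 2)
print(median_wealth, initial_wealth * (1 + gm_return) ** 2)
```

In this sketch the AM return of 25 percent per period reproduces the mean ending wealth of €156,250, while the GM return of 0 percent reproduces the median ending wealth of €100,000.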
The Harmonic Mean (HM)
The AM, the Weighted Mean (WM) and the GM are the more frequently used concepts in Finance
and Investments. The fourth concept, the Harmonic Mean (HM), denoted \( \bar{X}_H \), is
appropriate in a limited number of applications.
The HM formula follows:
The HM of a set of observations X1, X2, X3, ..., Xn is
\[ \bar{X}_H = \frac{n}{\sum_{i=1}^{n} (1 / X_i)} \]   Eq. (1.5 d)
with Xi > 0 for i = 1, 2, 3, ..., n.
The HM is the value obtained by summing the reciprocals of the observations (1/Xi), then
averaging that sum by dividing it by the number of observations, n, and finally taking the
reciprocal of the average.
The HM is a special type of WM in which an observation's weight is inversely proportional to its
magnitude. The HM is a relatively specialized concept of the mean that is appropriate when the
ratios being averaged ("amount per unit") are repeatedly applied to a fixed quantity to yield a
variable number of units.
A well known application is in the investment strategy known as cost averaging, which involves
the periodic investment of a fixed amount of money. In this application, the ratios we are
averaging are prices per share at purchase dates, and we are applying those prices to a constant
amount of money to yield a variable number of shares.
Suppose an investor purchases €1,000 of a security each month for n = 2 months. The share
prices are €10 and €15 at the two purchase dates. What is the average price paid for the security?
In this example, in the first month we purchase €1,000 / €10 = 100 shares and in the second
month we purchase €1,000 / €15 = 66.67 shares, or 166.67 shares in total. Dividing the total euro
amount invested, €2,000, by the total number of shares purchased, 166.67, gives an average price
paid of €2,000 / 166.67 = €12. The average price paid is in fact the HM of the asset’s prices at
the purchase dates. Using Equation (1.5 d), the HM price is 2 / [(1/10) + (1/15)] = €12. The value
€12 is less than the AM purchase price, (€10 + €15) / 2 = €12.50. The €12 could equally well
be obtained through the WM formula [Equations 1.3(a) and (b)], where the weights on the
purchase prices equal the shares purchased at a given price as a proportion of the total shares
purchased. In our example, the calculation would be:
(100 / 166.67) €10.00 + (66.67 / 166.67) €15.00 = €12. If we had invested varying amounts of
money at each date, we could not use the HM formula. We could, however, still use the WM
formula in a manner similar to that just described.
The HM is smaller than the GM which is, in turn, smaller than the AM (HM < GM < AM) unless all
observations in a data set have the same value, in which case the three means are equal.
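The cost-averaging example can be reproduced in a few lines. This is a minimal sketch in Python using the standard library's `statistics.harmonic_mean`; the variable names are illustrative only.

```python
from statistics import harmonic_mean, mean

prices = [10.0, 15.0]         # share price at each purchase date (EUR)
amount_per_period = 1_000.0   # fixed amount invested each month (EUR)

# A fixed amount buys a variable number of shares at each date.
shares_bought = [amount_per_period / p for p in prices]
total_shares = sum(shares_bought)

# Average price paid = total money invested / total shares purchased.
average_price_paid = amount_per_period * len(prices) / total_shares

hm_price = harmonic_mean(prices)   # Eq. (1.5 d)
am_price = mean(prices)            # simple arithmetic mean, for comparison

# Weighted mean with weights equal to each purchase's share of total shares.
wm_price = sum((s / total_shares) * p for s, p in zip(shares_bought, prices))

print(average_price_paid, hm_price, wm_price, am_price)
```

The first three figures agree at €12, and the AM of €12.50 is larger, as the text states.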
1.5.2 Measures of Central Tendency for Grouped Data
Often data are grouped and presented in the form of a frequency distribution, and it is
usually impossible to secure the original raw data. Thus, if we are interested in a typical value to
represent the data, we are forced to estimate it based on the frequency distribution.
The Arithmetic Mean (AM)
To approximate the arithmetic mean of data organized into a frequency distribution, we begin by
assuming the observations in each class are represented by the midpoint of the class. The mean of a
sample of data organized in a frequency distribution is computed by:
AM of Grouped Data: \[ \bar{X} = \frac{\sum f x}{n} \]   Eq. (1.6)
Where:
\( \bar{X} \) – Is the designation for the AM.
x – Is the mid-value, or midpoint, of each class.
f – Is the frequency in each class.
fx – Is the frequency in each class times the midpoint of the class.
∑fx – Is the sum of these products.
n – Is the total number of frequencies.
Example 1.5
The following grouped data (frequency distribution) relates to selling prices of cars and their
frequencies.
Table 1.9: Frequency distribution of selling prices of cars
Selling price (Tshs. in millions)    Frequency
12 up to 15                           8
15 up to 18                          23
18 up to 21                          17
21 up to 24                          18
24 up to 27                           8
27 up to 30                           4
30 up to 33                           1
33 up to 36                           1
TOTAL                                80
Determine the arithmetic mean car selling price.
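One way to check a hand calculation for Example 1.5 is a short script. The following is a minimal sketch in Python of Eq. (1.6); the function name `grouped_mean` and the tuple layout are illustrative choices, not part of the notes.

```python
# (lower limit, upper limit, frequency) for each class in Table 1.9
classes = [
    (12, 15, 8), (15, 18, 23), (18, 21, 17), (21, 24, 18),
    (24, 27, 8), (27, 30, 4), (30, 33, 1), (33, 36, 1),
]

def grouped_mean(classes):
    """Approximate the mean from a frequency distribution, Eq. (1.6)."""
    n = sum(f for _, _, f in classes)
    # Each class is represented by its midpoint x = (lower + upper) / 2.
    sum_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return sum_fx / n

mean_price = grouped_mean(classes)
print(mean_price)  # 1608 / 80 = 20.1 (Tshs. millions)
```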
The Median
When data are organized into a frequency distribution, the exact median cannot be determined.
However, it can be estimated by (1) locating the class in which the median lies and then (2)
interpolating within that class to arrive at the median. The rationale for this approach is that the
members of the median class are assumed to be evenly spaced throughout the class. The formula
is
Median of Grouped Data:
\[ \text{Median} = L + \frac{n/2 - CF}{f} \, (i) \]   Eq. (1.7)
Where:
L – Is the lower limit of the class containing the median.
n – Is the total number of frequencies.
f – Is the frequency in the median class.
CF – Is the cumulative number of frequencies in all the classes preceding the class containing
the median.
i – Is the width of the class in which the median lies.
Example 1.6
Assume the same data as given in the frequency distribution of Example 1.5. What is the median
selling price for a new vehicle sold?
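The interpolation in Eq. (1.7) can be sketched in Python as follows. The function name `grouped_median` is illustrative, and the data are the classes of Table 1.9 repeated so that the block is self-contained.

```python
# (lower limit, upper limit, frequency) for each class in Table 1.9
classes = [
    (12, 15, 8), (15, 18, 23), (18, 21, 17), (21, 24, 18),
    (24, 27, 8), (27, 30, 4), (30, 33, 1), (33, 36, 1),
]

def grouped_median(classes):
    """Estimate the median by interpolating inside the median class, Eq. (1.7)."""
    n = sum(f for _, _, f in classes)
    cumulative = 0  # CF: frequencies in the classes before the current one
    for lower, upper, f in classes:
        if cumulative + f >= n / 2:
            # This is the median class: L = lower, i = upper - lower.
            return lower + (n / 2 - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("empty frequency distribution")

median_price = grouped_median(classes)
print(round(median_price, 2))  # 18 + (40 - 31) / 17 * 3, about 19.59
```

Here the median class is "18 up to 21", because the cumulative frequency first reaches n/2 = 40 inside it.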
The Mode
For data grouped into a frequency distribution, the mode can be approximated by the midpoint
of the class with the highest frequency (the modal class).
Empirical relation between mean, median and mode
For a symmetrical distribution with a single peak, the mode, median and mean are located at the
center and are always equal.
FIGURE 1.4: Symmetrical Distribution showing equality of mode, mean and median
[Figure: symmetrical bell-shaped curve over X, with Mode = Mean = Median at the center.]
As the distribution becomes nonsymmetrical, or skewed, the relationship among the three
averages changes. In a positively skewed distribution the mean is the largest of the three averages.
Conversely, in a distribution that is negatively skewed, the mean is the lowest of the three averages.
See both relationships in Figure 1.5 below.
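FIGURE 1.5: Positively and Negatively Skewed Distributions
[Figure: two curves. Positively skewed: Mode, Median, Mean from left to right. Negatively skewed:
Mean, Median, Mode from left to right.]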
1.6 Measures of Dispersion
In this section we continue to develop measures to describe a set of data, concentrating on
measures that describe the dispersion or variability of the data. There are two main reasons for
studying measures of dispersion: firstly, dispersion indicates how reliable the mean is as a
representative value, and secondly, it allows the spread of two or more distributions to be compared.
Several measures will be considered here, such as the range, mean
deviation, variance, standard deviation, semivariance and semideviation, and coefficient of
variation. Measures of dispersion for both ungrouped and grouped data will be discussed.
1.6.1 Measures of Dispersion for Ungrouped Data
Range
The simplest measure of dispersion is the range. It is the difference between the highest and the
lowest values in a data set. In the form of an equation:
Range = Highest Value – Lowest Value   Eq. (1.8)
Mean Deviation (MD)
A serious defect of the range is that it is based on only two values, the highest and the lowest; it
does not take into consideration all of the values. The mean deviation (MD) overcomes this
defect. MD measures the mean amount by which the values in a population, or sample,
vary from their mean. MD may be defined as the arithmetic mean of the absolute values of the
deviations from the arithmetic mean. The formula for MD for a sample is:
\[ MD = \frac{\sum |X - \bar{X}|}{n} \]   Eq. (1.9)
Where:
X – Is the value of each observation.
\( \bar{X} \) – Is the arithmetic mean of the values.
n – Is the number of observations in the sample.
| | – Indicates the absolute value. That is, the signs of the deviations from the mean are
disregarded.
Mean deviation is also called the mean absolute deviation (MAD).
Example 1.7
The number of patients in the emergency room at Njiro Hospital for a sample of five days last
year was: 103, 97, 101, 106 and 103. Determine the mean deviation and interpret it.
MD has two advantages. Firstly, it uses all the values in the sample in the computation and
secondly, it is easy to understand – it is the average amount by which values deviate from the
mean. However, its major drawback is the use of absolute values. Generally, absolute values are
difficult to work with so the MD is not used as frequently as other measures of dispersion, such
as the standard deviation.
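For readers who want to verify Example 1.7, Eq. (1.9) can be sketched in Python as below. The function name `mean_deviation` is an illustrative choice.

```python
def mean_deviation(values):
    """Mean absolute deviation about the arithmetic mean, Eq. (1.9)."""
    n = len(values)
    x_bar = sum(values) / n
    # Signs of the deviations are disregarded by taking absolute values.
    return sum(abs(x - x_bar) for x in values) / n

patients = [103, 97, 101, 106, 103]   # data of Example 1.7
md = mean_deviation(patients)
print(md)  # mean is 102; absolute deviations 1, 5, 1, 4, 1 sum to 12; 12 / 5 = 2.4
```

The result can be read as: on a typical day the number of patients differs from the mean of 102 by about 2.4 patients.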
Variance and Standard Deviation
Population variance: The formulas for the population variance and the sample variance are slightly
different. We consider the population variance first. The population variance for ungrouped data, that is,
data not tabulated into a frequency distribution, is found by:
\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]   Eq. (1.10)
Where:
σ² – Is the symbol for the population variance.
X – Is the value of an observation in the population.
µ – Is the arithmetic mean of the population.
N – Is the total number of observations in the population.
Example 1.8
The ages of all the patients in the isolation ward of Kaloleni Hospital are 38, 26, 13, 41 and 22
years. What is the population variance?
Population Standard Deviation: The variance is difficult to interpret for a single set of
observations. The variance of 106.8 obtained in Example 1.8 above for the ages of the
patients in isolation is not in terms of years but rather "years squared". By taking the square
root of the population variance, we can transform it to the same unit of measurement used for the
original data. The square root of 106.8 years squared is 10.3 years. The square root of the
population variance is called the population standard deviation. In terms of a formula for
ungrouped data we have:
\[ \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \]   Eq. (1.11)
Sample Variance: The formula for the sample variance is:
\[ S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \]   Eq. (1.12)
Where:
S² – Is the symbol for the sample variance.
X – Is the value of each observation in the sample.
\( \bar{X} \) – Is the mean of the sample.
n – Is the total number of observations in the sample.
Why is this seemingly insignificant change made in the denominator? Although the use of n is
logical, it tends to underestimate the population variance, σ². The use of (n – 1) in the
denominator provides the appropriate correction for this tendency. Because the primary use of
sample statistics like S² is to estimate population parameters like σ², (n – 1) is preferred to n
when defining the sample variance. The same convention will be used when computing the
sample standard deviation.
In other words, when we have n data points (observations) and we know their mean, the mean
acts as a restriction on the data, leaving us with only (n – 1) degrees of freedom, meaning only (n
– 1) observations are free to move. Therefore, if we want an average squared deviation from the
mean as our measure of variation within the sample, this average should be based on only (n – 1)
free points. This is why we divide by (n – 1) when we compute S², the unbiased estimator of the
population variance σ².
The number of degrees of freedom (d.f.) is equal to the total number of observations (these are
not always raw data points), less the total number of restrictions on the observations. A
restriction is a quantity computed from the observations.
The sample variance may be computed more easily with another version of Eq. (1.12):
\[ S^2 = \frac{\sum X^2 - (\sum X)^2 / n}{n - 1} \]   Eq. (1.13)
Example 1.9
The hourly wages for a sample of part time employees at Fruit Packers Company are: $2, $10, $6, $8
and $9. What is the sample variance?
Sample standard deviation: The sample standard deviation is used as an estimator of the
population standard deviation. As it was for the population standard deviation, likewise, the
sample standard deviation is the square root of the sample variance. The sample standard
deviation for ungrouped data is most easily determined by:
S = √ { [ ∑X² -( ∑X) ²/n ] / [ n – 1 ] } Eq. (1.14)
Example 1.10
The sample variance in Example 1.9 involving hourly wages was computed to be 10. What is
the sample standard deviation?
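The shortcut formulas (1.13) and (1.14) can be sketched in Python as follows. As a labeled assumption, the wage sample is taken to be $2, $10, $6, $8 and $9, the five values that are consistent with the variance of 10 quoted in Example 1.10; the variable names are illustrative.

```python
from statistics import stdev, variance

# Assumed sample: five hourly wages that reproduce the quoted variance of 10.
wages = [2, 10, 6, 8, 9]

n = len(wages)
sum_x = sum(wages)                      # sum of X
sum_x_sq = sum(x * x for x in wages)    # sum of X squared

s_sq = (sum_x_sq - sum_x ** 2 / n) / (n - 1)   # Eq. (1.13)
s = s_sq ** 0.5                                # Eq. (1.14)

print(s_sq, round(s, 2))                # 10.0 and about 3.16
print(variance(wages), stdev(wages))    # library results use the same n - 1 divisor
```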
Semivariance and Semideviation
An asset’s variance or standard deviation of returns is often interpreted as a measure of the
asset’s risk. Variance and standard deviation of returns take account of returns above and below
the mean, but investors are concerned only with downside risk, for example returns below the
mean. As a result, analysts have developed semivariance, semideviation, and related dispersion
measures that focus on downside risk. Semivariance is defined as the average squared deviation
below the mean. Semideviation (sometimes called semistandard deviation) is the positive square
root of semivariance. To compute the sample semivariance, for example, we take the following
steps:
(i) Calculate the sample mean.
(ii) Identify the observations that are smaller than or equal to the mean (discarding
observations greater than the mean).
(iii) Compute the sum of the squared negative deviations from the mean (using the
observations that are smaller than or equal to the mean).
(iv) Divide the sum of the squared negative deviations from Step (iii) by n – 1.
A formula for semivariance approximating the unbiased estimator is
\[ \sum_{\text{for all } X_i \le \bar{X}} \frac{(X_i - \bar{X})^2}{n - 1} \]
Suppose we take the case of a popular company’s shares with returns (in percent) of 16.2, 20.3,
9.3, -11.1, and -17.0. The calculated mean return and standard deviation are 3.54 percent and 16.7
percent respectively. Two returns, -11.1 and -17.0, are smaller than 3.54. We compute the sum of
the squared negative deviations from the mean as (-11.1 – 3.54)² + (-17.0 – 3.54)² = 214.3296
+ 421.8916 = 636.2212. With n – 1 = 4, we conclude that semivariance is 636.2212 / 4 = 159.0553
and that semideviation is √159.0553 = 12.6 percent, approximately. The semideviation of 12.6
percent is less than the standard deviation of 16.7 percent. From this downside risk perspective,
therefore, standard deviation overstates risk.
In practice, we may be concerned with values of return (or another variable) below some level
other than the mean. For example, if our objective is 10 percent annually, we may be concerned
particularly with returns below 10 percent a year. We can call the 10 percent the target. The
name target semivariance has been given to average squared deviation below a stated target, and
target semideviation is its positive square root. To calculate a sample target semivariance, we
specify the target as a first step. After identifying observations below the target, we find the sum
of the squared negative deviations from the target and divide that sum by the total number of
observations minus 1.
A formula for target semivariance is
\[ \sum_{\text{for all } X_i \le B} \frac{(X_i - B)^2}{n - 1} \]
Where B is the target and n is the number of observations. With a target return of 10 percent, we
find in the case of our popular company shares that three returns (9.3, -11.1, and -17.0) were
below the target. The target semivariance is [(9.3 – 10.0)² + (-11.1 – 10.0)² + (-17.0 – 10.0)²] /
(5 – 1) = 293.675, and the target semideviation is √293.675 = 17.14 percent, approximately.
When return distributions are symmetric, semivariance and variance are effectively equivalent.
For asymmetric distributions, variance and semivariance rank prospects’ risk differently.
Semivariance (or semideviation) and target semivariance (or target semideviation) have intuitive
appeal, but they are harder to work with mathematically than variance. Variance or standard
deviation enters into the definition of many of the most commonly used finance risk concepts,
such as the Sharpe ratio and beta. Perhaps for these reasons, variance (or standard
deviation) is much more frequently used in finance and investment practice.
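The semivariance and target semivariance calculations above can be sketched in Python as follows. The function names are illustrative; both divide by n – 1, where n is the total number of observations, as in the worked figures.

```python
def target_semivariance(returns, target):
    """Sum of squared deviations at or below the target, divided by n - 1."""
    n = len(returns)
    downside = [(r - target) ** 2 for r in returns if r <= target]
    return sum(downside) / (n - 1)

def semivariance(returns):
    """Semivariance: the target is the sample mean."""
    return target_semivariance(returns, sum(returns) / len(returns))

returns = [16.2, 20.3, 9.3, -11.1, -17.0]   # percent, from the example above

semivar = semivariance(returns)
semidev = semivar ** 0.5
target_semivar = target_semivariance(returns, 10.0)
target_semidev = target_semivar ** 0.5

print(round(semivar, 4), round(semidev, 1))                 # about 159.0553 and 12.6
print(round(target_semivar, 3), round(target_semidev, 2))   # about 293.675 and 17.14
```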
1.6.2 Measures of Dispersion for Grouped Data
Range
To estimate the range from data already grouped into a frequency distribution, we subtract the
lower limit of the lowest class from the upper limit of the highest class.
Standard Deviation
If the data of interest are in grouped form (in a frequency distribution), the sample standard
deviation can be approximated by substituting
∑fx² for ∑X² and ∑fx for ∑X
The formula for the sample standard deviation then converts to:
\[ S = \sqrt{\frac{\sum f x^2 - (\sum f x)^2 / n}{n - 1}} \]   Eq. (1.15)
Where:
S – Represents the symbol for the sample standard deviation
x – Stands for the midpoint of a class
f – Stands for the class frequency
n – Represents the total number of sample observations
Example 1.11
A sample of the amounts invested in the DP Company’s profit sharing plan by employees was
organized into a frequency distribution, as shown below, for further study. What is the standard
deviation of the data? What is the sample variance?
Table 1.16: Frequency distribution of amounts invested by employees
Amount invested    Number of employees
$30 – $35            3
 35 –  40            7
 40 –  45           11
 45 –  50           22
 50 –  55           40
 55 –  60           24
 60 –  65            9
 65 –  70            4
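Eq. (1.15) applied to Table 1.16 can be sketched in Python as follows. The function name `grouped_sample_stats` and the tuple layout are illustrative choices.

```python
import math

# (lower limit, upper limit, number of employees) for each class in Table 1.16
classes = [
    (30, 35, 3), (35, 40, 7), (40, 45, 11), (45, 50, 22),
    (50, 55, 40), (55, 60, 24), (60, 65, 9), (65, 70, 4),
]

def grouped_sample_stats(classes):
    """Return (mean, sample variance, sample standard deviation) for grouped data."""
    n = sum(f for _, _, f in classes)
    midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
    freqs = [f for _, _, f in classes]
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    sum_fx2 = sum(f * x ** 2 for f, x in zip(freqs, midpoints))
    variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)   # square of Eq. (1.15)
    return sum_fx / n, variance, math.sqrt(variance)

mean_amount, var_amount, sd_amount = grouped_sample_stats(classes)
print(round(mean_amount, 2), round(var_amount, 2), round(sd_amount, 2))
```

The mean and standard deviation produced this way round to $51.54 and $7.51.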
1.6.3 Interpretation and Uses of the Standard Deviation
This part describes three main uses of the standard deviation, namely Chebyshev’s Theorem, the
Empirical Rule and relative dispersion (the coefficient of variation).
Chebyshev’s Theorem
A small standard deviation for a set of values indicates that these values are located close to the
mean. Conversely, a large standard deviation reveals that the observations are widely scattered
about the mean. The Russian mathematician P.L. Chebyshev (1821 – 1894) developed a theorem
that allows us to determine the minimum proportion of the values that lie within a specified
number of standard deviations of the mean. For example, based on Chebyshev’s Theorem, at least
three of four values, or 75 percent, must lie between the mean plus two standard deviations and
the mean minus two standard deviations. Further, at least eight of nine values, or 88.9 percent,
will lie between plus three standard deviations and minus three standard deviations of the mean.
At least 24 of 25 values, or 96 percent, will lie between plus and minus five standard deviations
of the mean.
In general terms, Chebyshev’s theorem states:
For any set of observations (sample or population), the proportion of the values that lie within k
standard deviations of the mean is at least 1 – 1/k², where k is any constant greater than 1.
Example 1.12
In the previous Example 1.11 and its solution, the arithmetic mean amount contributed by
employees to the company’s profit sharing plan was $51.54, and the standard deviation was
computed to be $7.51. At least what percent of the contributions lie within plus 3.5 standard
deviations and minus 3.5 standard deviations of the mean?
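Chebyshev's bound is a one-line calculation. The following minimal Python sketch (the function name is illustrative) evaluates 1 – 1/k² for the values of k mentioned above and for k = 3.5 in Example 1.12.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k greater than 1")
    return 1 - 1 / k ** 2

for k in (2, 3, 5, 3.5):
    print(k, round(100 * chebyshev_min_proportion(k), 1))

# Interval for Example 1.12: mean 51.54, standard deviation 7.51, k = 3.5
mean_amount, sd_amount, k = 51.54, 7.51, 3.5
interval = (mean_amount - k * sd_amount, mean_amount + k * sd_amount)
print(interval)
```

For k = 3.5 the bound is about 91.8 percent, so at least that share of contributions lies between roughly $25.26 and $77.83.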
The Empirical (Normal) Rule
Chebyshev’s theorem is concerned with any set of values; that is, the distribution of values can
have any shape. However, for a symmetrical, bell-shaped distribution such as the one shown
below, we can be more precise in explaining the dispersion about the mean. These relationships
involving the standard deviation and the mean are included in the Empirical Rule, sometimes
called the Normal Rule.
The Normal Rule
The empirical rule states that for a symmetrical, bell-shaped frequency distribution,
approximately 68 percent of the observations will lie within plus and minus one standard
deviation of the mean; about 95 percent of the observations will lie within plus and minus two
standard deviations of the mean; and practically all (99.7 percent) will lie within plus and minus
three standard deviations of the mean.
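FIGURE 1.6: A symmetrical, bell-shaped curve showing the relationship between the standard
deviation and the mean.
[Figure: bell-shaped curve with the horizontal axis marked at -3S, -2S, -1S, X̄, +1S, +2S, +3S,
corresponding to 70, 80, 90, 100, 110, 120, 130; 68% of observations lie within ±1S, 95% within
±2S and 99.7% within ±3S.]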
Thus if X̄ = 100 and S = 10, practically all the observations lie between 100 + 3(10) and 100 –
3(10), or 70 and 130. The range is therefore 60, found by 130 – 70. Conversely, if we know that
the range is 60, we can approximate the standard deviation by dividing the range by 6.
Example 1.13
A sample of the monthly amounts spent for food by a retired citizen (senior citizen) living alone
approximates a symmetrical, bell – shaped frequency distribution. The sample mean is $150; the
standard deviation is $20. Using the empirical rule:
1. About 68% of the monthly food expenditures are between what two amounts?
2. About 95% are between what two amounts?
3. Almost all of the monthly expenditures are between what two amounts?
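The three intervals asked for in Example 1.13 follow directly from the empirical rule. A minimal Python sketch, with an illustrative function name:

```python
def empirical_rule_intervals(mean, sd):
    """Intervals of mean ± k standard deviations with their approximate coverage."""
    coverage = {1: 0.68, 2: 0.95, 3: 0.997}
    return {k: (mean - k * sd, mean + k * sd, share) for k, share in coverage.items()}

intervals = empirical_rule_intervals(mean=150, sd=20)   # data of Example 1.13
for k, (low, high, share) in intervals.items():
    print(f"about {share:.1%} of expenditures lie between ${low} and ${high}")
```

This gives $130 to $170, $110 to $190 and $90 to $210. The coverage percentages are only approximations and apply to symmetrical, bell-shaped distributions.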
Relative Dispersion
Karl Pearson (1857 – 1936), who contributed significantly to the science of statistics, developed
a relative measure called the coefficient of variation (CV) which is used when:
1. The data are in different units (such as shillings and days absent)
2. The data are in the same units, but the means are far apart (such as the incomes of the top
executives and those of the unskilled employees)
CV is defined as the ratio of the standard deviation to the arithmetic mean expressed as a
percentage. In terms of a formula for a sample it is:
\[ CV = \frac{S}{\bar{X}} \, (100) \]   Eq. (1.16)
Example 1.14
The variation in the annual incomes of executives is compared with the variation in incomes of
unskilled employees. For a sample of executives, X̄ = $500,000 and S = $50,000. For a
sample of unskilled employees, X̄ = $22,000 and S = $2,200.
Is there any difference in the relative dispersion of the two groups?
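Eq. (1.16) can be sketched in Python as follows. As a labeled assumption, the executives' mean is treated as $500,000, in the same currency as the other three figures; the function name is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the arithmetic mean, Eq. (1.16)."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean * 100

# Assumption: all four figures of Example 1.14 are in the same currency.
cv_executives = coefficient_of_variation(sd=50_000, mean=500_000)
cv_unskilled = coefficient_of_variation(sd=2_200, mean=22_000)
print(cv_executives, cv_unskilled)
```

Both groups come out at about 10 percent, so although the standard deviations differ greatly in absolute terms, the relative dispersion is the same.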
1.7 Measures of Shape: Skewness and Kurtosis
After discussing measures of centrality and dispersion in sections 1.5 and 1.6 above, we now
complete this unit by exploring measures of the shape of our data, namely skewness and kurtosis.
In this section we will focus our attention on these measures, which are critical for
understanding the complete characteristics of data, especially return distributions, relating to
finance and investment.
1.7.1 Skewness in Return Distributions
Mean and variance may not adequately describe an investment’s distribution of returns. In
calculations of variance, for example, the deviations around the mean are squared, so we do not
know whether large deviations are likely to be positive or negative. We need to go beyond
measures of central tendency and dispersion to reveal other important characteristics of the
distribution. One important characteristic of interest to analysts is the degree of symmetry in
return distribution.
If a return distribution is symmetrical about its mean, then each side of the distribution is a
mirror image of the other. Thus equal loss and gain intervals exhibit the same frequencies. Losses
from -10 percent to -5 percent, for example, occur with about the same frequency as gains from
5 percent to 10 percent.
One of the most important distributions is the normal distribution, depicted in the previous
figures 1.4 and 1.6. This symmetrical, bell – shaped distribution plays a central role in the mean
– variance model of portfolio selection; it is also used extensively in financial risk management.
The normal distribution has the following characteristics:
 Its mean , median and mode are equal
 It is completely described by two parameters – its mean and variance
 Roughly 68 percent of its observations lie between plus and minus one standard
deviation from the mean; 95 percent lie between plus and minus two standard
deviations; and 99.7 percent lie between plus and minus three standard deviations.
Normal distribution will be explored more in unit two (under probability theory).
A distribution that is not symmetrical is called skewed. A return distribution with positive
skewness has frequent small losses and a few extreme gains. A return distribution with negative
skewness has frequent small gains and a few extreme losses. In the previous subsection 1.5.2 we
showed in Figure 1.5 both positively and negatively skewed distributions. The positively skewed
distribution shown has a long tail on its right side, whereas a negatively skewed distribution has a
long tail on its left side. For the positively skewed unimodal distribution the mode is less than the
median, which is less than the mean (Mode < Median < Mean). For the negatively skewed
unimodal distribution the mean is less than the median, which is less than the mode (Mean <
Median < Mode). Investors should be attracted by a positive skew because the mean return falls
above the median. Relative to the mean return, positive skew amounts to a limited, though
frequent, downside compared with a somewhat unlimited, but less frequent, upside.
Skewness is the name given to a statistical measure of skew, although the two terms are often used
interchangeably. Like variance, skewness is computed using each observation’s deviation from
its mean. Skewness, sometimes referred to as relative skewness, is computed as the average
cubed deviation from the mean, standardized by dividing by the standard deviation cubed to make
the measure free of scale. A symmetric distribution has skewness of zero, a positively skewed
distribution has positive skewness, and a negatively skewed distribution has negative skewness,
as given by this measure.
We can illustrate the principle behind the measure by focusing on the numerator. Cubing, unlike
squaring, preserves the sign of the deviations from the mean. If a distribution is positively
skewed with a mean greater than its median, then more than one half of the deviations from the
mean are negative and less than one half are positive. In order for the sum to be positive, the
losses must be small and likely, and the gains less likely but more extreme. Therefore, if skewness
is positive, the average magnitude of positive deviations is larger than the average magnitude of
negative deviations.
A simple example illustrates that a symmetrical distribution has a skewness measure equal to
zero. Suppose we have the following data: 1, 2, 3, 4, 5, 6, 7, 8, and 9. The mean outcome is 5,
and the deviations are -4, -3, -2, -1, 0, 1, 2, 3, and 4. Cubing the deviations yields -64, -27, -8,
-1, 0, 1, 8, 27, and 64, with a sum of zero, supporting our claim. Below we give a formula for
computing skewness from a sample.
Sample skewness (also called sample relative skewness), S_K, is
\[ S_K = \left[ \frac{n}{(n-1)(n-2)} \right] \frac{\sum_{i=1}^{n} (X_i - \bar{X})^3}{s^3} \]   Eq. (1.17)
Where n is the number of observations in the sample and s is the sample standard deviation.
The first factor, n / [(n -1) (n -2)], corrects for a downward bias in small samples.
The algebraic sign of Equation 1.17 indicates the direction of skew, with a negative S_K
indicating a negatively skewed distribution and a positive S_K indicating a positively skewed
distribution. Note that as n becomes large, the expression reduces to the mean cubed deviation:
\[ S_K \approx \left( \frac{1}{n} \right) \frac{\sum_{i=1}^{n} (X_i - \bar{X})^3}{s^3} \]
As a frame of reference, for a sample size of 100 or larger taken
from a normal distribution, a skewness coefficient of ±0.5 would be considered unusually large.
Some researchers believe that investors should prefer positive skewness, all else being equal;
that is, they should prefer portfolios with distributions offering a relatively large frequency of
unusually large payoffs. Different investment strategies may tend to introduce different types and
amounts of skewness into returns.
Example 1.15 below demonstrates the calculation of skewness for a managed portfolio.
Example 1.15
Table 1.18 below presents 10 years (2005 – 2014) of annual returns on TAJIRI Equity Income
Fund.
Table 1.18: Annual returns of TAJIRI Equity Income Fund for 10 years (2005 – 2014)
YEAR         2005   2006   2007   2008   2009   2010   2011   2012   2013   2014
RETURN (%)   14.8    4.5   33.3   20.3   28.8    9.2    3.8   13.1    1.6  -13.0
Using the information in the table above, address the following:
(i) Calculate the skewness of TAJIRI Equity Income Fund, showing two decimal places.
(ii) Characterize the shape of the distribution of TAJIRI Equity Income Fund returns based on
your answer to part (i).
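A hand calculation for Example 1.15 can be cross-checked with a short script. This is a minimal Python sketch of Eq. (1.17); the function name `sample_skewness` is illustrative, and `statistics.stdev` supplies the sample standard deviation s (divisor n – 1).

```python
from statistics import mean, stdev

def sample_skewness(values):
    """Sample skewness as in Eq. (1.17)."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three observations are needed")
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation, divisor n - 1
    sum_cubed = sum((x - x_bar) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * sum_cubed / s ** 3

returns = [14.8, 4.5, 33.3, 20.3, 28.8, 9.2, 3.8, 13.1, 1.6, -13.0]  # Table 1.18
sk = sample_skewness(returns)
print(round(sk, 2))

# The symmetric data set 1, 2, ..., 9 used earlier has zero skewness.
print(sample_skewness(list(range(1, 10))))
```

A value close to zero indicates an approximately symmetric distribution; the sign shows the direction of any skew.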
1.7.2 Kurtosis in Return Distributions
In the previous subsection, we discussed how to determine whether a return distribution deviates
from a normal distribution because of skewness. Another way in which a return distribution
might differ from a normal distribution is by having more returns clustered closely around the
mean (being more peaked) and more returns with large deviations from the mean (having fatter
tails). Relative to a normal distribution, such a distribution has a greater percentage of small
deviations from the mean return (more small surprises) and a greater percentage of extremely
large deviations from the mean return (more big surprises). Most investors would perceive a
greater chance of extremely large deviations from the mean as increasing risk.
Kurtosis is the statistical measure that tells us when a distribution is more or less peaked than a
normal distribution. A distribution that is more peaked than normal is called leptokurtic (lepto
from the Greek word for slender); a distribution that is less peaked than normal is called
platykurtic ( platy from the Greek word for broad); and a distribution identical to the normal
distribution in this respect is called mesokurtic (meso from the Greek word for middle). The
situation of more frequent extremely large surprises that we described is one of leptokurtosis.
Figure 1.7 illustrates the three types of distributions – Leptokurtic, Mesokurtic and Platykurtic
FIGURE 1.7: Leptokurtic, Mesokurtic and Platykurtic Distributions
[Figure: three density curves plotted over the range -4 to 4: Leptokurtic (k > 3), Mesokurtic
(k = 3) and Platykurtic (k < 3).]
The calculation for kurtosis involves finding the average of deviations from the mean raised to
the fourth power and then standardizing that average by dividing by the standard deviation raised
to the fourth power. This measure is free of scale and it is always positive because the deviations
are raised to the fourth (even) power. For all normal distributions the kurtosis is equal to 3. Many
statistical packages report estimates of excess kurtosis, which is kurtosis minus 3. Excess
kurtosis thus characterizes kurtosis relative to the normal distribution. A normal or other
mesokurtic distribution has excess kurtosis equal to 0. A leptokurtic distribution has excess
kurtosis greater than 0, and a platykurtic distribution has excess kurtosis less than 0. A return
distribution with positive excess kurtosis – a leptokurtic return distribution - has more frequent
extremely large deviations from the mean than a normal distribution. Below is the expression for
computing kurtosis from a sample.
The sample excess kurtosis is:
\[ K_E = \left[ \frac{n(n+1)}{(n-1)(n-2)(n-3)} \right] \frac{\sum_{i=1}^{n} (X_i - \bar{X})^4}{s^4} - \frac{3(n-1)^2}{(n-2)(n-3)} \]   Eq. (1.18)
Where n is the sample size and s is the sample standard deviation.
In Equation 1.18, sample kurtosis is the first term. Note that as n becomes large, Equation 1.18
approximately equals
\[ \frac{n^2}{n^3} \frac{\sum_{i=1}^{n} (X_i - \bar{X})^4}{s^4} - \frac{3n^2}{n^2} = \frac{1}{n} \frac{\sum_{i=1}^{n} (X_i - \bar{X})^4}{s^4} - 3 \]
For a sample of 100 or larger taken from a normal distribution, a sample excess kurtosis of 1.0 or
larger would be considered unusually large.
Most equity return series have been found to be leptokurtic. If a return distribution has positive
excess kurtosis (leptokurtosis) and we use statistical models that do not account for the fatter
tails, we will underestimate the likelihood of very bad or very good outcomes.
The following example 1.16 illustrates the calculations for sample excess kurtosis.
Example 1.16
Using the same information on the TAJIRI Equity Income Fund as given in the previous
Example 1.15, address the following:
(i) Calculate the sample excess kurtosis of TAJIRI Equity Income Fund, showing two
decimal places.
(ii) Characterize the shape of the distribution of TAJIRI Equity Income Fund returns
based on your answer to part (i) as leptokurtic, mesokurtic, or platykurtic.
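Eq. (1.18) is tedious by hand, so a script is a useful cross-check for Example 1.16. The following is a minimal Python sketch; the function name `sample_excess_kurtosis` is illustrative, and the returns are those of Table 1.18.

```python
from statistics import mean, stdev

def sample_excess_kurtosis(values):
    """Sample excess kurtosis as in Eq. (1.18)."""
    n = len(values)
    if n < 4:
        raise ValueError("at least four observations are needed")
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation, divisor n - 1
    sum_fourth = sum((x - x_bar) ** 4 for x in values)
    kurtosis_term = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum_fourth / s ** 4
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return kurtosis_term - correction

returns = [14.8, 4.5, 33.3, 20.3, 28.8, 9.2, 3.8, 13.1, 1.6, -13.0]  # Table 1.18
ke = sample_excess_kurtosis(returns)
print(round(ke, 2))
```

A result above 0 points to a leptokurtic shape, below 0 to a platykurtic shape, and close to 0 to a mesokurtic shape. With only 10 observations the estimate is subject to considerable sampling error.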
1.8 REVIEW QUESTIONS
Review Question 1.1
(a) Distinguish between a frequency distribution and a relative frequency distribution.
(b) How does a histogram differ from a frequency polygon?
(c) Table 1.21 below shows a frequency distribution of the lifetimes of 400 radio tubes tested
at the L & M Tube Company. With reference to this table, determine the following:
Table 1.21: Frequency distribution of lifetimes of 400 radio tubes at L & M Co.
LIFETIME (HOURS)    NUMBER OF TUBES
 300 –  399           14
 400 –  499           46
 500 –  599           58
 600 –  699           76
 700 –  799           68
 800 –  899           62
 900 –  999           48
1000 – 1099           22
1100 – 1199            6
Total                400
(i) Upper limit of the fifth class

(ii) Lower limit of the eighth class
(iii) Class mark of the seventh class
(iv) Class boundaries of the last class
(v) Class interval size
(vi) Frequency of the fourth class
(vii) Relative frequency of the sixth class
(viii) Percentage of tubes whose lifetimes do not exceed 600 hours
(ix) Percentage of tubes with lifetimes greater than or equal to 900 hours
(x) Percentage of tubes whose lifetimes are at least 500 but less than 1000 hours.
Review Question 1.2
(a) Simplify and evaluate the following summations.
(i) $\sum_{i=1}^{4} 3X$
(ii) $\sum_{i=1}^{4} X_i$
(iii) $\sum_{X=1}^{4} (X^2 + 2)$
(iv) $\sum_{Y=2}^{5} (Y^2 - 3Y + 2)$
(v) $\sum_{i=1}^{4} (2Y_i - 5)$
(vi) $\sum_{i=1}^{n} (Y_i - 3)$
(b) Verify the following identities. Each identity is a shortcut formula. The symbols $\bar{X}$ and $\bar{Y}$
appearing in these identities have the following definitions:
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ ; $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$
(i) $\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - \frac{\left(\sum_{i=1}^{n} Y_i\right)^2}{n}$
(ii) $\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}$
(iii) $\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1} = \frac{\sum_{i=1}^{n} Y_i^2 - \left(\sum_{i=1}^{n} Y_i\right)^2 / n}{n-1}$
(iv) $\frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{n\sum_{i=1}^{n} X_i Y_i - \left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}$
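Before attempting the algebra, it can help to confirm numerically that both sides of each identity in part (b) agree. The Python sketch below does this for one small data set. The function name and the data are hypothetical, and a numerical check of this kind is not a proof; the identities must still be verified algebraically.

```python
def check_shortcut_identities(xs, ys, tol=1e-9):
    """Compare both sides of identities (i) to (iv) for paired data xs, ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Raw sums used by the shortcut (right-hand) sides.
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Deviation-based (left-hand) sides.
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    return {
        "i": abs(ss_y - (sum_yy - sum_y ** 2 / n)) < tol,
        "ii": abs(sp_xy - (sum_xy - sum_x * sum_y / n)) < tol,
        "iii": abs(ss_y / (n - 1) - (sum_yy - sum_y ** 2 / n) / (n - 1)) < tol,
        "iv": abs(sp_xy / ss_x
                  - (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)) < tol,
    }


# Hypothetical paired observations, used only for this check.
print(check_shortcut_identities([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```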
Review Question 1.3
You are given the I.Q.s of thirty students in a Quantitative Methods class in the following
table:
Table 1.22: I.Q.s of thirty students in QM Class

 97  100  109  122  118  124
127  105  112  128  107  114
115  121  135   98  111  117
120  130  123  141  107  113
116  119  121  131  129  139
You are required to:
(a) Find the mean and standard deviation for the data set.
(b) Find the one, two, and three standard deviation intervals.
(c) What actual percentages of the above I.Q.s are within one, two, and three standard
deviations of the mean?
(d) As prescribed by Chebyshev’s Rule, do at least (1 – 1/k²) of the measures fall within
(X̄ – ks, X̄ + ks) for k = 2?
(e) Is the distribution of I.Q.s in the data set above approximately normal?
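The mechanics behind parts (b) to (d) can be sketched in Python as below. This is a minimal illustration, assuming the sample standard deviation (n – 1 divisor) is intended. The function names and the small data set are hypothetical and are deliberately not the I.Q. data of Table 1.22.

```python
import statistics


def coverage_within_k_sd(data, k):
    """Proportion of observations lying within k sample standard deviations of the mean."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    lower, upper = mean - k * s, mean + k * s
    inside = sum(1 for x in data if lower <= x <= upper)
    return inside / len(data)


def chebyshev_lower_bound(k):
    """Minimum proportion within k standard deviations under Chebyshev's Rule (k > 1)."""
    return 1 - 1 / k ** 2


# Hypothetical observations for illustration only.
sample = [1, 2, 3, 4, 5]
for k in (1, 2, 3):
    print(k, coverage_within_k_sd(sample, k))
print(coverage_within_k_sd(sample, 2) >= chebyshev_lower_bound(2))
```

Comparing the actual proportions with 68%, 95%, and 99.7% gives a rough indication of how closely a data set follows a normal distribution.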
Review Question 1.4
Table 1.23 below gives the ordered array of the one-year total percentage returns achieved by
the domestic general stock funds whose marketing fees are paid from fund
assets.
Table 1.23: Percentage returns relating to domestic general stock funds
10.0 20.6 28.6 28.6 29.4 29.5 29.9 30.1 30.5 30.5
32.1 32.2 32.4 33.0 35.0 37.1 38.0
You are required to:
(a) Find the Range and the Interquartile Range (IR)

(b) Compute the sample variance (S²) and the sample standard deviation (S) given that the
arithmetic mean (X̄) = 29.86.
(c) Obtain the Coefficient of Variation (CV).
Review Question 1.5
(a) Suppose that the Operations Manager of a Package Delivery Service is contemplating the
purchase of a new fleet of trucks. When packages are efficiently stored in the trucks in
preparation for delivery, there are two major constraints that have to be considered: the weight
(in kg) and the volume (in cubic meters) of each item.
Now suppose that in a sample of 200 packages the average weight is 26.0 kg with a standard
deviation of 3.9 kg. In addition, suppose that the average volume for each of these packages is
8.8 cubic meters with a standard deviation of 2.2 cubic meters. How can we compare the variation
of the weight and the volume?
(b) Suppose that a potential investor is considering purchasing shares of stock in one of two
companies, A or B, that are listed on the Dar es Salaam Stock Exchange (DSE). If neither
company offers dividends to its stockholders and if both companies are rated equally high (by
various investment services) in terms of potential growth, the potential investor might want to
consider the volatility (variability) of the two stocks to aid in the investment decision. Now
suppose that each share of stock in company A has averaged Tshs. 500 over the past few months
with a standard deviation of Tshs. 100. In addition, suppose that in the same period, the price per
share for company B stock averaged Tshs. 120 with a standard deviation of Tshs. 40. How can
the investor determine which stock is more variable?
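When two data sets are measured in different units or have very different means, relative dispersion is usually compared with the coefficient of variation, CV = (S / X̄) × 100%. A minimal Python sketch follows. The function name and the figures are hypothetical and are not the figures given in parts (a) and (b).

```python
def coefficient_of_variation(std_dev, mean):
    """Coefficient of variation in percent: (standard deviation / mean) * 100."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean * 100


# Hypothetical summary statistics for two series measured in different units.
cv_first = coefficient_of_variation(std_dev=5.0, mean=50.0)
cv_second = coefficient_of_variation(std_dev=2.0, mean=10.0)

# The series with the larger CV is more variable relative to its mean,
# even though its standard deviation is smaller in absolute terms.
print(cv_first, cv_second, cv_second > cv_first)
```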
Review Question 1.6
Table 1.24 below gives the annual total returns on the MSCI KUSADIKIKA Index from 2005 to
2014. The returns are in the local currency. Use the information in this table to answer parts (a)
through (e) of this review question 1.6.
Table 1.24: MSCI KUSADIKIKA Index Total Returns, 2005 - 2014
Year     Return (%)
2005      46.21
2006      -6.18
2007       8.04
2008      22.87
2009      45.90
2010      20.32
2011      41.20
2012      -9.53
2013     -17.75
2014     -43.06
(a) To describe the distribution of observations, perform the following:
(i) Create a frequency distribution with five equally spaced classes (round up at the
second decimal place in computing the width of class intervals).
(ii) Calculate the cumulative frequency of the data.
(iii) Calculate the relative frequency and cumulative relative frequency of the data.
(iv) State whether the frequency distribution is symmetric or asymmetric. If the
distribution is asymmetric, characterize the nature of the asymmetry.
(b) To describe the central tendency of the distribution, perform the following:

(i) Calculate the sample mean return.
(ii) Calculate the median return.
(iii) Identify the modal interval (or intervals) of the grouped returns.
(c) To describe the compound rate of growth of the MSCI KUSADIKIKA Index, calculate
the geometric mean return.
(d) To describe the dispersion of the distribution, perform the following:
(i) Calculate the range.
(ii) Calculate the mean absolute deviation (MAD).
(iii) Calculate the variance.
(iv) Calculate the standard deviation.
(v) Calculate the semivariance.
(vi) Calculate the semideviation.
(e) To describe the degree to which the distribution may depart from normality, perform the
following:
(i) Calculate the skewness.
(ii) Explain the finding for skewness in terms of the location of the median and mean
returns.
(iii) Calculate the excess kurtosis.
(iv) Contrast the distribution of annual returns on the MSCI KUSADIKIKA Index to a
normal distribution model for returns.
Review Question 1.7
(a) Explain the relationship among arithmetic mean return, geometric mean return, and
variability of returns.
(b) Contrast the use of the arithmetic mean return to the geometric mean return of an
investment from the perspective of an investor concerned with the investment’s terminal
value.
(c) Contrast the use of the arithmetic mean return to the geometric mean return of an
investment from the perspective of an investor concerned with the investment’s average
one-year return.
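The contrast raised in this question can be explored with a short Python sketch. It uses the geometric mean return relation 1 + RG = [(1 + R1)(1 + R2) … (1 + RT)]^(1/T) from Section 1.5. The function names and the two-period return series are hypothetical and chosen only to make the effect of variability visible.

```python
import math


def arithmetic_mean_return(returns):
    """Simple average of periodic returns (expressed as decimals)."""
    return sum(returns) / len(returns)


def geometric_mean_return(returns):
    """Compound rate per period: [product of (1 + R_t)]^(1/T) - 1."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1


def terminal_value(initial, returns):
    """Value of an initial investment after compounding through every period."""
    return initial * math.prod(1 + r for r in returns)


# Hypothetical returns: +100% in year 1, then -50% in year 2.
rets = [1.00, -0.50]
print(arithmetic_mean_return(rets))  # 0.25
print(geometric_mean_return(rets))   # 0.0
print(terminal_value(100.0, rets))   # 100.0
```

In this hypothetical case the arithmetic mean is 25% although the investment ends exactly where it started. The geometric mean is what links to terminal value, and the gap between the two means widens as returns become more variable.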
……. End of Unit One ……