Lecture 3. Descriptive Statistics

Business Statistics
Black. Chakrapani. Castillo
© 2008 The McGraw-Hill Companies

Chapter 3
Descriptive Statistics
3.1 Measures of Central tendency
Measures of central tendency

• used to describe a set of data.
• yield information about the center or middle part of a group of numbers.
• common measures: the average, the middle value, and the most frequently occurring value.
• Ungrouped data: mean, median, mode, percentiles, and quartiles.

3.1 Central tendency: Ungrouped
Arithmetic Mean / Mean

• average of a group of numbers
• Population mean: µ = ∑x / N
• Sample mean: x̄ = ∑x / n
Example: Suppose a company has five departments with 24, 13, 19, 26, and 11 workers each. Determine the population mean.
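The department example can be checked with a few lines of Python. This is only a sketch: the function name `population_mean` is an illustrative choice, not part of the lecture.

```python
# Population mean: mu = (sum of x) / N, using the five department sizes.
workers = [24, 13, 19, 26, 11]


def population_mean(values):
    """Return the arithmetic mean of a complete population."""
    return sum(values) / len(values)


mu = population_mean(workers)
print(mu)  # 18.6
```

The total is 93 workers over 5 departments, so µ = 93 / 5 = 18.6.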
Median
• middle value in an ordered array of numbers.
• Steps
1. Arrange the observations in an ordered data array.
2. For an odd number of terms, the median is the middle term of the ordered array.
3. For an even number of terms, the median is the average of the two middle terms.
Example: Suppose a business researcher wants to determine the median of the following data.
15 11 14 3 21 17 22 16 19 16 5 7 19 8 9 20 4
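The three steps above can be sketched in Python as follows. The function name `median` is an illustrative choice; the standard library's `statistics.median` would give the same result.

```python
def median(values):
    """Median by the three steps: sort, then take the middle term(s)."""
    ordered = sorted(values)          # step 1: ordered array
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # step 2: odd count, single middle term
        return ordered[mid]
    # step 3: even count, average of the two middle terms
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [15, 11, 14, 3, 21, 17, 22, 16, 19, 16, 5, 7, 19, 8, 9, 20, 4]
print(median(data))  # 15
```

With 17 observations, the median is the 9th value of the ordered array, which is 15.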
Mode
• most frequently occurring value in a set of data.
• organizing the data into an ordered array makes the mode easier to locate
• Bimodal: two modes are listed in case of a tie
• Multimodal: data sets with more than two modes
Example: Determine the mode of the given data.

15 11 14 3 21 17 22 16 19 16 5 7 19 8 9 20 4
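A minimal sketch of finding the mode by counting occurrences. The helper name `modes` is hypothetical; it returns every value tied for the highest count, so a bimodal data set yields two values.

```python
from collections import Counter


def modes(values):
    """Return all values that share the highest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


data = [15, 11, 14, 3, 21, 17, 22, 16, 19, 16, 5, 7, 19, 8, 9, 20, 4]
print(modes(data))  # [16, 19]
```

Both 16 and 19 occur twice, so this data set is bimodal. Note that if no value repeats, this helper returns every value, which is usually read as "no mode".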

Percentiles
• measures of central tendency that divide a group of data into 100 parts.
• “stair-step” values: 99 percentiles act as the dividers
• used mainly in reporting test results.
Steps in determining the location of a percentile:
1. Organize the numbers into an ascending-order array.
2. Calculate the percentile location (i) by:
   i = (P / 100) · n
   where: i = percentile location
          P = percentile of interest
          n = number in the data set
3. Determine the location by either (a) or (b):
   a. If i is a whole number, the Pth percentile is the average of the value at the ith location and the value at the (i+1)th location.
   b. If i is not a whole number, the Pth percentile value is located at the whole-number part of i + 1.
Example:
1. Suppose you want to determine the 80th percentile of
1,240 numbers.
2. Determine the 30th percentile of the eight numbers: 14,
12, 19, 23, 5, 13, 28, 17.
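The location procedure can be sketched in Python as below. The names `percentile_location` and `percentile` are illustrative, and the sketch assumes P is an integer from 1 to 99 so that integer arithmetic can decide exactly whether i is a whole number. Other software often uses different interpolation rules, so results may differ from library functions.

```python
def percentile_location(p, n):
    """Return (whole-number part of i, whether i is whole) for i = (P/100) * n."""
    whole, remainder = divmod(p * n, 100)
    return whole, remainder == 0


def percentile(values, p):
    """Pth percentile by the slide's location rule (assumes integer 1 <= p <= 99)."""
    ordered = sorted(values)                      # step 1: ascending order
    whole, is_whole = percentile_location(p, len(ordered))
    if is_whole:
        # rule (a): average of the ith and (i+1)th values (1-based positions)
        return (ordered[whole - 1] + ordered[whole]) / 2
    # rule (b): value at position (whole-number part of i) + 1
    return ordered[whole]


print(percentile_location(80, 1240))                    # (992, True)
print(percentile([14, 12, 19, 23, 5, 13, 28, 17], 30))  # 13
```

For the first example, i = 992 is a whole number, so the 80th percentile is the average of the 992nd and 993rd ordered values. For the second, i = 2.4, so the 30th percentile is the 3rd ordered value, 13.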

Quartiles
• measures of central tendency that divide a group of data into four subgroups or parts.
• the three quartiles are percentiles: Q1 is the 25th, Q2 the 50th (the median), and Q3 the 75th
• use the percentile procedure to compute the quartiles accurately
Example:
Determine the quartiles of the given data: 106 109 114 116 121 122 125 129.
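Since the quartiles are the 25th, 50th, and 75th percentiles, the example can be worked with the percentile location rule i = (P/100)·n. The helper `percentile` below is a hypothetical sketch of that rule for integer P between 1 and 99.

```python
def percentile(values, p):
    """Pth percentile via the location rule i = (P/100) * n."""
    ordered = sorted(values)
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        # i is whole: average the ith and (i+1)th ordered values
        return (ordered[whole - 1] + ordered[whole]) / 2
    # i is not whole: take the value at the next whole position
    return ordered[whole]


data = [106, 109, 114, 116, 121, 122, 125, 129]
q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))
print(q1, q2, q3)  # 111.5 118.5 123.5
```

With n = 8, the locations are i = 2, 4, and 6, all whole numbers, so each quartile is the average of two adjacent ordered values.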

3.2 Variability: Ungrouped
Measures of Variability
• describe the spread or dispersion of a set of data.
• necessary to complement the mean value in describing the data.
[Figure: three distributions with the same mean but different dispersions]

Seven (7) measures of variability:

1. Range
2. Interquartile range
3. Mean absolute deviation
4. Variance
5. Standard deviation
6. z scores
7. Coefficient of variation

Range
• difference between the largest value of a data set and the smallest value of the set.
Range = Highest - Lowest
Interquartile Range
• the range of values between the first and third quartiles.
• the range of the middle 50% of the data, determined by computing Q3 - Q1

Sample
Determine the interquartile range of the exports, in billions of dollars.

Country           Exports
United States     377.8
United Kingdom     10.7
Japan               9.7
China               7.9
Mexico              4.6
Germany             4.2
France              3.2
Netherlands         3.2
South Korea         3.2
Belgium             2.3
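One way to work this sample is to compute Q1 and Q3 with the percentile location rule i = (P/100)·n and subtract. The sketch below assumes that rule; the helper name `percentile` is illustrative, and results are rounded only for display.

```python
def percentile(values, p):
    """Pth percentile via the location rule i = (P/100) * n (integer 1 <= p <= 99)."""
    ordered = sorted(values)
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        return (ordered[whole - 1] + ordered[whole]) / 2
    return ordered[whole]


exports = {
    "United States": 377.8, "United Kingdom": 10.7, "Japan": 9.7,
    "China": 7.9, "Mexico": 4.6, "Germany": 4.2, "France": 3.2,
    "Netherlands": 3.2, "South Korea": 3.2, "Belgium": 2.3,
}
values = list(exports.values())

data_range = max(values) - min(values)   # Highest - Lowest
q1 = percentile(values, 25)              # i = 2.5, so the 3rd ordered value
q3 = percentile(values, 75)              # i = 7.5, so the 8th ordered value
iqr = q3 - q1

print(round(data_range, 1), q1, q3, round(iqr, 1))  # 375.5 3.2 9.7 6.5
```

The range (375.5) is dominated by the single very large value, while the interquartile range (6.5) describes only the middle 50% of the countries.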

Deviation from the mean (x - µ)

• found by subtracting the mean from each data value
• some deviations are positive and some are negative
• The sum of the deviations from the arithmetic mean is always zero: ∑(x - µ) = 0
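A quick numerical check of the zero-sum property. The data reuse the five department sizes from the mean example purely as illustrative values; that reuse is an assumption of this sketch, not part of the slide.

```python
import math

workers = [24, 13, 19, 26, 11]   # illustrative data reused from the mean example
mu = sum(workers) / len(workers)

# Deviation of each value from the mean: some positive, some negative.
deviations = [x - mu for x in workers]
total = sum(deviations)

# Floating-point rounding can leave a tiny residue, so compare with a tolerance.
print(math.isclose(total, 0.0, abs_tol=1e-9))  # True
```

Because the positive and negative deviations cancel, the plain sum cannot measure spread, which is why the measures that follow use absolute or squared deviations.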

Mean Absolute Deviation (MAD)

• average of the absolute values of the deviations around the mean for a set of numbers.
MAD = ∑|x - µ| / N
Variance (σ²)
• average of the squared deviations about the arithmetic mean for a set of numbers.
σ² = ∑(x - µ)² / N
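Both definitions translate directly into code. The sketch below treats the five department sizes from the mean example as a population; that choice of data and the function names are assumptions made for illustration.

```python
def mean_absolute_deviation(values):
    """MAD: average of the absolute deviations from the population mean."""
    mu = sum(values) / len(values)
    return sum(abs(x - mu) for x in values) / len(values)


def population_variance(values):
    """Variance: average of the squared deviations from the population mean."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


workers = [24, 13, 19, 26, 11]   # illustrative population
print(round(mean_absolute_deviation(workers), 2))  # 5.28
print(round(population_variance(workers), 2))      # 34.64
```

Here the absolute deviations sum to 26.4 and the squared deviations sum to 173.2, each divided by N = 5.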

Standard Deviation (σ)

• square root of the variance.
σ = √[∑(x - µ)² / N]
Two (2) ways of applying standard deviation:

1. Empirical Rule
2. Chebyshev’s theorem

1. Empirical Rule
• guideline used to state the approximate percentage of values that lie within a given number of standard deviations from the mean of a set of data if the data are normally distributed.
• applies to normally distributed (bell-shaped) data

Distance from the Mean    Values within Distance
µ ± 1σ                    68%
µ ± 2σ                    95%
µ ± 3σ                    99.7%

Sample:
A company produces a lightweight valve that is
specified to have a mass of 1,365g. Unfortunately,
because of imperfections in the manufacturing
process not all of the valves produced have a mass
of exactly 1,365g. In fact, the masses of the valves
produced are normally distributed with a mean
mass of 1,365g and a standard deviation of 294g.
a. Within what range of masses would
approximately 95% of the valve masses fall?
b. Approximately 16% of the masses would be more
than what value?
c. Approximately 0.15% of the masses would be
less than what value?
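A sketch of how the empirical rule answers the three parts, assuming the percentages 68%, 95%, and 99.7% and the symmetry of the normal curve. The variable names are illustrative.

```python
mu, sigma = 1365, 294   # mean and standard deviation of valve mass, in grams

# (a) About 95% of values lie within mu +/- 2 sigma.
low_95, high_95 = mu - 2 * sigma, mu + 2 * sigma

# (b) About 68% lie within mu +/- 1 sigma, leaving 32% outside;
#     by symmetry about 16% lie above mu + 1 sigma.
above_16 = mu + sigma

# (c) About 99.7% lie within mu +/- 3 sigma, leaving 0.3% outside;
#     by symmetry about 0.15% lie below mu - 3 sigma.
below_015 = mu - 3 * sigma

print(low_95, high_95)  # 777 1953
print(above_16)         # 1659
print(below_015)        # 483
```

So roughly 95% of the valves have a mass between 777 g and 1,953 g, about 16% exceed 1,659 g, and about 0.15% fall below 483 g.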
2. Chebyshev’s Theorem
• applies when the data are not normally distributed or when the shape of the distribution is unknown.
• states that “at least 1 - 1/k² of the values will fall within ±k standard deviations of the mean, regardless of the shape of the distribution.”
• Within k standard deviations of the mean, µ ± kσ, lie at least 1 - 1/k² proportion of the values, for k > 1.

Sample: In the computing industry, the average age of professional employees tends to be younger than in many other business professions. Suppose the average age of a professional employed by a particular computer firm is 28 with a standard deviation of 5 years. A histogram reveals that the data are not normally distributed but rather are amassed in the 20s with few workers at 40. Apply Chebyshev's theorem to determine within what range of ages at least 85% of the workers’ ages would fall.
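One way to work this sample is to solve 1 - 1/k² = 0.85 for k and then form the interval mean ± k standard deviations. The function name `chebyshev_k` is a hypothetical helper, and results are rounded only for display.

```python
import math


def chebyshev_k(proportion):
    """Solve 1 - 1/k**2 = proportion for k (requires 0 < proportion < 1)."""
    return math.sqrt(1 / (1 - proportion))


mean_age, sd_age = 28, 5
k = chebyshev_k(0.85)            # k**2 = 1 / 0.15, about 6.667
low = mean_age - k * sd_age
high = mean_age + k * sd_age

print(round(k, 2), round(low, 1), round(high, 1))  # 2.58 15.1 40.9
```

At least 85% of the ages therefore fall between about 15.1 and 40.9 years, whatever the shape of the distribution.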
Population vs. Sample

• Sample Variance: s² = ∑(x - x̄)² / (n - 1)
• Sample Standard Deviation: s = √[∑(x - x̄)² / (n - 1)]
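The sample versions differ from the population versions only in dividing by n - 1 instead of N. The sketch below treats the five department sizes from the mean example as a hypothetical sample; the data choice and function name are assumptions for illustration.

```python
import math


def sample_variance(values):
    """Sample variance: squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


sample = [24, 13, 19, 26, 11]    # hypothetical sample
s2 = sample_variance(sample)
s = math.sqrt(s2)                # sample standard deviation

print(round(s2, 2), round(s, 2))  # 43.3 6.58
```

The same squared deviations (sum 173.2) give 34.64 when divided by N = 5 but 43.3 when divided by n - 1 = 4, so the sample variance is always the larger of the two.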

Example: Determine the sample variance and standard deviation.

Computational Formulas: Population

• Variance: σ² = [∑x² - (∑x)² / N] / N
• Standard Deviation: σ = √σ²
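The computational form avoids computing each deviation by working from ∑x and ∑x² alone. The sketch below checks it against the definitional form on the five department sizes from the mean example, used here as illustrative data; the function names are hypothetical.

```python
import math


def population_variance_definitional(values):
    """sigma^2 = sum((x - mu)^2) / N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def population_variance_computational(values):
    """sigma^2 = (sum(x^2) - (sum(x))^2 / N) / N."""
    n = len(values)
    sum_x = sum(values)                    # 93 for the data below
    sum_x2 = sum(x * x for x in values)    # 1903 for the data below
    return (sum_x2 - sum_x ** 2 / n) / n


workers = [24, 13, 19, 26, 11]   # illustrative population
var_c = population_variance_computational(workers)
print(round(var_c, 2), round(math.sqrt(var_c), 2))  # 34.64 5.89
```

Both forms are algebraically identical; the computational one was mainly a convenience for hand calculation, and with floating-point arithmetic the definitional form is usually the more numerically stable choice.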

Example: Determine the population variance and standard deviation using computational formulas.

Computational Formulas: Sample

• Variance: s² = [∑x² - (∑x)² / n] / (n - 1)
• Standard Deviation: s = √s²

z Scores
• A z score represents the number of standard deviations a value (x) is above or below the mean of a set of numbers when the data are normally distributed. Using z scores allows translation of a value’s raw distance from the mean into units of standard deviations.
z = (x - µ) / σ

Example
• If a z score is negative, the raw value (x) is below the mean. If the z score is positive, the raw value (x) is above the mean.
• For example, for a data set that is normally distributed with a mean of 50 and a standard deviation of 10, suppose a statistician wants to determine the z score for a value of 70. This value (x = 70) is 20 units above the mean, so the z value is
z = (70 - 50) / 10 = +2.00
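The calculation z = (x - µ) / σ is a one-line function. The name `z_score` is an illustrative choice.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


print(z_score(70, 50, 10))  # 2.0
print(z_score(35, 50, 10))  # -1.5
```

The value 70 is two standard deviations above the mean of 50, while a value of 35 would be one and a half standard deviations below it.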

Example
• This z score signifies that the raw score of 70 is two standard deviations above the mean. How is this z score interpreted? The empirical rule states that 95% of all values are within two standard deviations of the mean if the data are approximately normally distributed.

Coefficient of Variation
• The coefficient of variation is a statistic that is the ratio of the standard deviation to the mean, expressed as a percentage, and is denoted CV.
CV = (σ / µ)(100)
• Suppose five weeks of average prices for stock A are 57, 68, 64, 71, and 62. To compute a coefficient of variation for these prices, first determine the mean and standard deviation: µ = 64.40 and σ = 4.84. The coefficient of variation is:
CV = (4.84 / 64.40)(100) = 7.5%
• The standard deviation is 7.5% of the mean.
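The stock A figures can be reproduced with Python's standard `statistics` module. This sketch treats the five prices as a population, so it uses `pstdev` (which divides by N); that matches the stated standard deviation of 4.84.

```python
import statistics

prices = [57, 68, 64, 71, 62]        # five weeks of average prices for stock A

mu = statistics.mean(prices)
sigma = statistics.pstdev(prices)    # population standard deviation (divides by N)
cv = sigma / mu * 100                # coefficient of variation, in percent

print(round(mu, 2), round(sigma, 2), round(cv, 1))  # 64.4 4.84 7.5
```

Because the CV is a unitless percentage, it allows the variability of data sets with different means or different units to be compared.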

