
Descriptive Statistics

Learning Objectives
Distinguish between measures of central tendency, measures of variability, and measures of shape
Understand the meanings of mean, median, mode, quartile, percentile, and range
Compute mean, median, mode, percentile, quartile, range, variance, standard deviation, and mean absolute deviation


Learning Objectives -- Continued
Differentiate between sample and population variance and standard deviation
Understand the meaning of standard deviation as it is applied by using the empirical rule
Understand box and whisker plots, skewness, and kurtosis

Measures of Central Tendency
Measures of central tendency yield information about particular places or locations in a group of numbers.
Common Measures of Location
Mode
Median
Mean
Percentiles
Quartiles

Mode
The most frequently occurring value in a data set
Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio)

Bimodal -- Data sets that have two modes
Multimodal -- Data sets that contain more than two modes

Mode - Example
The mode is 44. There are more 44s than any other value.

Ordered data (24 values):
35, 37, 37, 39, 40, 40, 41, 41, 43, 43, 43, 43, 44, 44, 44, 44, 44, 45, 45, 46, 46, 46, 46, 48

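The mode can be checked with Python's standard `statistics` module. This minimal sketch reuses the 24 values above; `multimode` (Python 3.8+) returns every value tied for the highest count, so it also reveals bimodal or multimodal data sets.

```python
from statistics import multimode

# The 24 values from the mode example
data = [35, 37, 37, 39, 40, 40, 41, 41, 43, 43, 43, 43,
        44, 44, 44, 44, 44, 45, 45, 46, 46, 46, 46, 48]

modes = multimode(data)   # all values that share the highest frequency
print(modes)              # [44]

# A bimodal data set returns both modes
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]
```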
Median
Middle value in an ordered array of numbers
Applicable for ordinal, interval, and ratio data
Not applicable for nominal data
Unaffected by the size of extremely large and extremely small values

Median: Computational Procedure
First Procedure
Arrange observations in an ordered array.
If the number of terms is odd, the median is the middle term of the ordered array.
If the number of terms is even, the median is the average of the middle two terms.

Second Procedure
The median's position in an ordered array is given by (n + 1)/2.
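The two procedures can be combined in a short function. This is a sketch rather than a library routine, and the name `median_of` is hypothetical. It sorts the data, then uses the (n + 1)/2 position: for odd n the position is a whole number, and for even n it falls halfway between the two middle terms, which are averaged.

```python
def median_of(values):
    """Median via the ordered-array procedure (hypothetical helper)."""
    ordered = sorted(values)        # arrange observations in an ordered array
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        # odd n: (n + 1)/2 is a whole number; convert 1-based to 0-based index
        return ordered[(n + 1) // 2 - 1]
    # even n: (n + 1)/2 falls between terms n/2 and n/2 + 1; average them
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median_of([7, 3, 5]))      # 5
print(median_of([4, 1, 3, 2]))   # 2.5
```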

Median: Example with an Odd Number of Terms
Ordered array:
3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22

There are 17 terms in the ordered array.
Position of median = (n + 1)/2 = (17 + 1)/2 = 9
The median is the 9th term, 15.
If the 22 is replaced by 100, the median remains at 15.
If the 3 is replaced by -103, the median remains at 15.
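The robustness claim is easy to verify. This sketch applies `statistics.median` and `statistics.mean` from the Python standard library to the 17-term array and to the two modified versions.

```python
from statistics import mean, median

ordered = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
high_outlier = ordered[:-1] + [100]    # 22 replaced by 100
low_outlier = [-103] + ordered[1:]     # 3 replaced by -103

for label, values in [("original", ordered),
                      ("22 -> 100", high_outlier),
                      ("3 -> -103", low_outlier)]:
    # the median is unchanged; the mean shifts toward the extreme value
    print(label, median(values), round(mean(values), 2))
```

The median prints as 15 in all three cases, while the mean moves from 13.29 to 17.88 and to 7.06.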

Mean
The arithmetic average of a group of numbers
Applicable for interval and ratio data; not applicable for nominal or ordinal data
Affected by every value in the data set, including extreme values
Computed by summing all values in the data set and dividing the sum by the number of values in the data set

Population Mean
$$\mu = \frac{\sum X}{N} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{24 + 13 + 19 + 26 + 11}{5} = \frac{93}{5} = 18.6$$
Sample Mean
$$\bar{X} = \frac{\sum X}{n} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{57 + 86 + 42 + 38 + 90 + 66}{6} = \frac{379}{6} = 63.167$$
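Both formulas perform the same arithmetic; only the notation differs (μ and N for a population, X̄ and n for a sample). A minimal Python check of the two worked examples:

```python
population = [24, 13, 19, 26, 11]      # N = 5
sample = [57, 86, 42, 38, 90, 66]      # n = 6

mu = sum(population) / len(population)   # population mean: sum of X over N
x_bar = sum(sample) / len(sample)        # sample mean: sum of X over n

print(mu)               # 18.6
print(round(x_bar, 3))  # 63.167
```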
Weighted Arithmetic Mean

Subject     Marks % (X)   Weight (w)   wX
English     60            1            60
Hindi       75            2            150
Maths       63            1            63
Physics     59            3            177
Chemistry   55            3            165
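The table stops at the wX column. Assuming the usual weighted-mean formula, X̄w = ΣwX / Σw, the final step can be sketched as follows:

```python
# (subject, marks % X, weight w) from the table above
scores = [
    ("English", 60, 1),
    ("Hindi", 75, 2),
    ("Maths", 63, 1),
    ("Physics", 59, 3),
    ("Chemistry", 55, 3),
]

sum_wx = sum(w * x for _, x, w in scores)   # total of the wX column
sum_w = sum(w for _, _, w in scores)        # total weight
weighted_mean = sum_wx / sum_w

print(sum_wx, sum_w, weighted_mean)  # 615 10 61.5
```

With these weights the result is 615 / 10 = 61.5, slightly below the unweighted mean of 312 / 5 = 62.4, because the heavily weighted Physics and Chemistry marks are below average.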
Quartiles
Measures of central tendency that divide a group
of data into four subgroups

Q1: 25% of the data set is below the first quartile
Q2: 50% of the data set is below the second quartile
Q3: 75% of the data set is below the third quartile


Quartiles, continued
Q1 is equal to the 25th percentile
Q2 is located at the 50th percentile and equals the median
Q3 is equal to the 75th percentile

Quartile values are not necessarily members of the data set

Quartiles

[Figure: Q1, Q2, and Q3 divide the ordered data into four parts, each containing 25% of the data.]
Quartiles: Example
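Ordered array: 106, 109, 114, 116, 121, 122, 125, 129 (n = 8)
For each quartile, the location in the ordered array is i = (P/100)(n). Because each i below is a whole number, the quartile is the average of the values in positions i and i + 1.

Q1:
$$i = \frac{25}{100}(8) = 2, \qquad Q_1 = \frac{109 + 114}{2} = 111.5$$
Q2:
$$i = \frac{50}{100}(8) = 4, \qquad Q_2 = \frac{116 + 121}{2} = 118.5$$
Q3:
$$i = \frac{75}{100}(8) = 6, \qquad Q_3 = \frac{122 + 125}{2} = 123.5$$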
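The location rule can be written as a small percentile function. This is a sketch: `percentile` is a hypothetical helper, not a library call, and the fractional-location branch (round i up to the next whole number and take that value) is an assumed textbook convention, since this example only shows whole-number locations. Library routines such as `numpy.percentile` or `statistics.quantiles` use different interpolation rules by default and can return different quartiles for the same data.

```python
import math

def percentile(ordered, p):
    """P-th percentile of a sorted list using the location i = (P/100) * n.

    Hypothetical helper; assumes `ordered` is sorted ascending and p is an
    integer with 0 < p < 100.
    """
    n = len(ordered)
    if (p * n) % 100 == 0:
        # Whole-number location: average the i-th and (i+1)-th values
        i = p * n // 100
        return (ordered[i - 1] + ordered[i]) / 2
    # Fractional location (assumed convention): round i up, take that value
    i = math.ceil(p * n / 100)
    return ordered[i - 1]

data = [106, 109, 114, 116, 121, 122, 125, 129]
q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))
print(q1, q2, q3)  # 111.5 118.5 123.5
```

Testing `(p * n) % 100` with integers avoids floating-point surprises when deciding whether the location is a whole number.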
Grouped Data



Measures of Variability

Measures of variability describe the spread or the dispersion of a set of data.
Common Measures of Variability
Range
Interquartile Range
Mean Absolute Deviation
Variance
Standard Deviation
Coefficient of Variation

Variability

[Figure: two cash-flow series with the same mean, one with no variability and one with variability.]
Range
The difference between the largest and the smallest values in a set of data
Simple to compute
Ignores all data points except the two extremes
Example (the 24 values from the mode example):
35, 37, 37, 39, 40, 40, 41, 41, 43, 43, 43, 43, 44, 44, 44, 44, 44, 45, 45, 46, 46, 46, 46, 48
Range = Largest - Smallest = 48 - 35 = 13
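A minimal sketch of the range calculation, which also shows that only the two extremes matter: replacing the largest value alone changes the result.

```python
data = [35, 37, 37, 39, 40, 40, 41, 41, 43, 43, 43, 43,
        44, 44, 44, 44, 44, 45, 45, 46, 46, 46, 46, 48]

value_range = max(data) - min(data)   # largest minus smallest
print(value_range)                    # 13

# Replace the largest value (48) with 60; every other point is unchanged
modified = data[:-1] + [60]
print(max(modified) - min(modified))  # 25
```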
Interquartile Range
Range of values between the first and third quartiles
Range of the middle half of the data
Less influenced by extremes than the range

Interquartile Range = Q3 - Q1
Deviation from the Mean


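Data: 5, 9, 16, 17, 18
$$\mu = \frac{\sum X}{N} = \frac{65}{5} = 13$$
[Figure: the five values on a number line from 0 to 20, with deviations from the mean of -8, -4, +3, +4, and +5.]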

Mean Absolute Deviation


X       X - μ    |X - μ|
5       -8       8
9       -4       4
16      +3       3
17      +4       4
18      +5       5
Total   0        24

$$M.A.D. = \frac{\sum |X - \mu|}{N} = \frac{24}{5} = 4.8$$
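A minimal sketch of the M.A.D. computation. The raw deviations sum to zero, as the table shows, which is why their absolute values are averaged instead.

```python
values = [5, 9, 16, 17, 18]
N = len(values)

mu = sum(values) / N                        # 13.0
deviations = [x - mu for x in values]       # -8, -4, +3, +4, +5
mad = sum(abs(d) for d in deviations) / N   # 24 / 5

print(sum(deviations))  # 0.0
print(mad)              # 4.8
```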
Population Variance

X       X - μ    (X - μ)²
5       -8       64
9       -4       16
16      +3       9
17      +4       16
18      +5       25
Total   0        130

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$$

Population Standard Deviation



Using the same five values (5, 9, 16, 17, 18), for which Σ(X - μ)² = 130:
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{130}{5}} = \sqrt{26.0} = 5.1$$
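Population variance and standard deviation for the same five values, cross-checked against `statistics.pvariance` and `statistics.pstdev` (the standard-library functions that divide by N):

```python
import math
from statistics import pstdev, pvariance

values = [5, 9, 16, 17, 18]
N = len(values)
mu = sum(values) / N

squared = [(x - mu) ** 2 for x in values]   # 64, 16, 9, 16, 25
variance = sum(squared) / N                 # 130 / 5
sigma = math.sqrt(variance)                 # about 5.099

print(variance, round(sigma, 1))  # 26.0 5.1
assert math.isclose(variance, pvariance(values))
assert math.isclose(sigma, pstdev(values))
```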

Relation between the Mean and Standard Deviation
Chebyshev's Theorem
Applies to any distribution, regardless of shape
Places lower limits on the percentages of observations within a given number of standard deviations from the mean
Empirical Rule
Applies only to roughly mound-shaped and symmetric distributions
Specifies approximate percentages of observations within a given number of standard deviations from the mean
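Chebyshev's Theorem
At least the fraction
$$1 - \frac{1}{k^2}$$
of the elements of any distribution lie within k standard deviations of the mean, for any k > 1.

$$k = 2: \quad 1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4} = 75\%$$
$$k = 3: \quad 1 - \frac{1}{3^2} = 1 - \frac{1}{9} = \frac{8}{9} = 89\%$$
$$k = 4: \quad 1 - \frac{1}{4^2} = 1 - \frac{1}{16} = \frac{15}{16} = 94\%$$

At least   Lie within
75%        2 standard deviations of the mean
89%        3 standard deviations of the mean
94%        4 standard deviations of the mean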
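The bounds in the table follow directly from 1 - 1/k². The sketch below prints them as exact fractions and also checks, on the 24 values from the mode example, that the observed share within k standard deviations (using the population standard deviation) never falls below the bound. `share_within` is a hypothetical helper.

```python
import math
from fractions import Fraction

def share_within(values, k):
    """Fraction of values within k population standard deviations of the mean."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    return sum(abs(x - mu) <= k * sigma for x in values) / n

data = [35, 37, 37, 39, 40, 40, 41, 41, 43, 43, 43, 43,
        44, 44, 44, 44, 44, 45, 45, 46, 46, 46, 46, 48]

for k in (2, 3, 4):
    bound = 1 - Fraction(1, k * k)            # Chebyshev lower bound
    print(k, bound, f"{float(bound):.0%}", round(share_within(data, k), 3))
    assert share_within(data, k) >= bound     # guaranteed for any data set
```

The bounds print as 3/4, 8/9, and 15/16 (75%, 89%, 94%); the observed shares for this data set are higher, as Chebyshev's theorem only promises a floor.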
Empirical Rule
(68-95-99.7 rule or three-sigma rule)
For roughly mound-shaped and symmetric distributions, approximately:
68.27% lie within 1 standard deviation of the mean
95.45% lie within 2 standard deviations of the mean
99.73% lie within 3 standard deviations of the mean
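The empirical-rule percentages are the probabilities that a normal random variable lies within k standard deviations of its mean, which equal erf(k/√2). This sketch treats "mound-shaped and symmetric" as approximately normal (an assumption) and reproduces the figures with `math.erf`:

```python
import math

def normal_share_within(k):
    """P(|X - mu| <= k * sigma) for a normal distribution."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, f"{normal_share_within(k):.2%}")
# 1 68.27%
# 2 95.45%
# 3 99.73%
```

Compared with Chebyshev's guaranteed 75% at k = 2, the normal-shape assumption buys a much tighter statement (about 95%).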

Sample Variance
Sum of the squared deviations from the sample mean, divided by n - 1

         X        X - X̄    (X - X̄)²
         2,398    625      390,625
         1,844    71       5,041
         1,539    -234     54,756
         1,311    -462     213,444
Total    7,092    0        663,866

Here X̄ = 7,092 / 4 = 1,773.

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$$
Sample Standard Deviation
Square root of the sample variance

Using the same sample (2,398, 1,844, 1,539, 1,311), for which Σ(X - X̄)² = 663,866 and n = 4:
$$S = \sqrt{S^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{663{,}866}{3}} = \sqrt{221{,}288.67} = 470.41$$
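A sketch of the sample calculations, contrasting the n - 1 divisor with the population divisor N. The results are cross-checked against `statistics.variance` and `statistics.stdev`, which divide by n - 1.

```python
import math
from statistics import stdev, variance

sample = [2398, 1844, 1539, 1311]
n = len(sample)
x_bar = sum(sample) / n                      # 7,092 / 4 = 1,773
ss = sum((x - x_bar) ** 2 for x in sample)   # sum of squared deviations

s2 = ss / (n - 1)     # sample variance divides by n - 1
s = math.sqrt(s2)     # sample standard deviation
print(round(s2, 2), round(s, 2))   # 221288.67 470.41

# Treating the same four values as a population (divide by n) gives less
print(ss / n)                      # 165966.5

assert math.isclose(s2, variance(sample))
assert math.isclose(s, stdev(sample))
```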
Coefficient of Variation
Ratio of the standard deviation to the mean,
expressed as a percentage.
Measurement of relative dispersion.
The CV can be used to compare two or more
sets of data measured in different units.

$$C.V. = \frac{\sigma}{\mu}(100)$$
Coefficient of Variation
Stock A:
Average price last year = $50
Standard deviation = $5
$$CV_A = \left(\frac{S}{\bar{X}}\right) 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$$

Stock B:
Average price last year = $100
Standard deviation = $5
$$CV_B = \left(\frac{S}{\bar{X}}\right) 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$$

Both stocks have the same standard deviation, but stock B is less variable relative to its price.
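A minimal sketch of the CV comparison; `coefficient_of_variation` is a hypothetical helper name. Multiplying by 100 before dividing keeps these particular results exact in floating point.

```python
def coefficient_of_variation(std_dev, mean):
    """CV in percent: the standard deviation relative to the mean."""
    return 100 * std_dev / mean

cv_a = coefficient_of_variation(5, 50)    # Stock A
cv_b = coefficient_of_variation(5, 100)   # Stock B
print(cv_a, cv_b)  # 10.0 5.0
```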

Locating Extreme Outliers:
Z-Score
To compute the Z-score of a data value, subtract
the mean and divide by the standard deviation.

The Z-score is the number of standard deviations a
data value is from the mean.

A data value is considered an extreme outlier if its
Z-score is less than -3.0 or greater than +3.0.

The larger the absolute value of the Z-score, the
farther the data value is from the mean.
Locating Extreme Outliers: Z-Score
$$Z = \frac{X - \bar{X}}{S}$$

Locating Extreme Outliers: Z-Score
Suppose the mean math SAT score is 490, with a standard deviation of 100.
Compute the z-score for a test score of 620.
$$Z = \frac{X - \bar{X}}{S} = \frac{620 - 490}{100} = \frac{130}{100} = 1.3$$
A score of 620 is 1.3 standard deviations above the mean and would not be considered an outlier.
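The z-score and the |Z| > 3.0 extreme-outlier rule can be sketched as two small functions (hypothetical names):

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / std_dev

def is_extreme_outlier(z, cutoff=3.0):
    """Rule from the text: |Z| > 3.0 marks an extreme outlier."""
    return abs(z) > cutoff

z = z_score(620, 490, 100)
print(z, is_extreme_outlier(z))  # 1.3 False
```

A score of 850, by contrast, gives Z = 3.6 and would be flagged.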
Measures of Shape
Skewness
Absence of symmetry
Extreme values in one side of a distribution
Kurtosis
Peakedness of a distribution
Box and Whisker Plots
Graphic display of a distribution
Reveals skewness

Skewness

[Figure: negatively skewed, symmetric (not skewed), and positively skewed distributions.]
Skewness

[Figure: positions of the mode, median, and mean in negatively skewed, symmetric (not skewed), and positively skewed distributions; in the symmetric case the three coincide.]
Coefficient of Skewness
Summary measure for skewness

If S < 0, the distribution is negatively skewed (skewed
to the left).
If S = 0, the distribution is symmetric (not skewed).
If S > 0, the distribution is positively skewed (skewed
to the right).

$$S = \frac{3(\mu - M_d)}{\sigma}$$
Coefficient of Skewness

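$$\mu_1 = 23,\ M_{d1} = 26,\ \sigma_1 = 12.3: \qquad S_1 = \frac{3(\mu_1 - M_{d1})}{\sigma_1} = \frac{3(23 - 26)}{12.3} = -0.73$$
$$\mu_2 = 26,\ M_{d2} = 26,\ \sigma_2 = 12.3: \qquad S_2 = \frac{3(\mu_2 - M_{d2})}{\sigma_2} = \frac{3(26 - 26)}{12.3} = 0$$
$$\mu_3 = 29,\ M_{d3} = 26,\ \sigma_3 = 12.3: \qquad S_3 = \frac{3(\mu_3 - M_{d3})}{\sigma_3} = \frac{3(29 - 26)}{12.3} = +0.73$$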
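The three examples can be reproduced with a one-line function for S = 3(μ - Md)/σ; `pearson_skewness` is a hypothetical name.

```python
def pearson_skewness(mean, median, std_dev):
    """Coefficient of skewness S = 3(mean - median) / std_dev."""
    return 3 * (mean - median) / std_dev

# The three examples: median 26 and standard deviation 12.3 in each case
for mu in (23, 26, 29):
    print(mu, round(pearson_skewness(mu, 26, 12.3), 2))
# 23 -0.73
# 26 0.0
# 29 0.73
```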
Kurtosis
Peakedness of a distribution
Leptokurtic: high and thin
Mesokurtic: normal in shape
Platykurtic: flat and spread out

[Figure: leptokurtic, mesokurtic, and platykurtic distribution curves.]
Exploratory Data Analysis
The Box-and-Whisker Plot
[Figure: box-and-whisker plot marking Min, Q1, Median, Q3, and Max.]
The box and central line are centered between the endpoints if the data are symmetric around the median.





A Box-and-Whisker plot can be shown in
either vertical or horizontal format.
Statistics for Managers Using Microsoft Excel, 5e 2008 Pearson Prentice-Hall, Inc.


The Box-and-Whisker Plot
The Box-and-Whisker Plot is a graphical display of the five-number summary.
[Figure: minimum, 1st quartile, median, 3rd quartile, and maximum on a box plot; each of the four segments holds 25% of the data.]
Box and Whisker Plot
Five specific values are used:
Median, Q2
First quartile, Q1
Third quartile, Q3
Minimum value in the data set
Maximum value in the data set

Box and Whisker Plot,
continued
Inner Fences
IQR = Q3 - Q1
Lower inner fence = Q1 - 1.5 IQR
Upper inner fence = Q3 + 1.5 IQR

Outer Fences
Lower outer fence = Q1 - 3.0 IQR
Upper outer fence = Q3 + 3.0 IQR
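Applied to the quartile example (Q1 = 111.5, Q3 = 123.5), the fence formulas give IQR = 12. The sketch below computes the fences; `fences` and `classify` are hypothetical helpers, and labeling points between the inner and outer fences "mild" and points beyond the outer fences "extreme" is an assumed, commonly used convention that this section does not itself define.

```python
def fences(q1, q3):
    """Return the IQR, inner fences, and outer fences (hypothetical helper)."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)
    return iqr, inner, outer

def classify(x, inner, outer):
    # Assumed convention: beyond an outer fence = extreme,
    # between the inner and outer fences = mild
    if x < outer[0] or x > outer[1]:
        return "extreme outlier"
    if x < inner[0] or x > inner[1]:
        return "mild outlier"
    return "inside inner fences"

q1, q3 = 111.5, 123.5   # quartiles from the example 106, 109, ..., 129
iqr, inner, outer = fences(q1, q3)
print(iqr, inner, outer)            # 12.0 (93.5, 141.5) (75.5, 159.5)
print(classify(150, inner, outer))  # mild outlier
```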


Box and Whisker Plot

[Figure: box-and-whisker plot showing the minimum, Q1, Q2, Q3, and maximum.]
Skewness: Box and Whisker Plots, and Coefficient of Skewness
[Figure: box-and-whisker plots of a negatively skewed (S < 0), a symmetric (S = 0), and a positively skewed (S > 0) distribution.]
