$s^2 = \frac{1}{30-1}\sum_{i=1}^{30}(x_i - \bar{x})^2$
Position Value Position Value
1 2.3 16 10.2
2 3.6 17 10.4
3 4.8 18 11.2
4 5.5 19 11.4
5 6.1 20 11.7
6 6.2 21 11.8
7 6.8 22 12.1
8 7.2 23 12.3
9 8 24 13.5
10 8.1 25 14.5
11 8.3 26 15.3
12 8.5 27 15.9
13 8.9 28 16.6
14 9.1 29 18.7
15 9.6 30 19.5
Calculation of Quartiles from raw data
First quartile position:
$(n+1)\frac{p}{100} = 31 \times \frac{25}{100} = 7.75$
Median position:
$(n+1)\frac{p}{100} = 31 \times \frac{50}{100} = 15.5$
Halfway between the 15th and 16th positions: the average of 9.6 and 10.2,
or
9.6 + 0.5*(10.2 - 9.6) = 9.9
Third quartile position?
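The location formula and the interpolation step above can be sketched in Python. This is a minimal sketch, not part of the original slides: the names `percentile_location` and `percentile_value` are hypothetical helpers, and the data are the 30 ordered values from the table. It applies the same $(n+1)p/100$ rule to the first quartile, the median, and the third quartile.

```python
import statistics

# The 30 ordered observations from the raw-data table.
values = [2.3, 3.6, 4.8, 5.5, 6.1, 6.2, 6.8, 7.2, 8.0, 8.1,
          8.3, 8.5, 8.9, 9.1, 9.6, 10.2, 10.4, 11.2, 11.4, 11.7,
          11.8, 12.1, 12.3, 13.5, 14.5, 15.3, 15.9, 16.6, 18.7, 19.5]


def percentile_location(n, p):
    """1-based position of the p-th percentile: (n + 1) * p / 100."""
    return (n + 1) * p / 100


def percentile_value(sorted_values, p):
    """Interpolate linearly between the two ordered values around the position."""
    n = len(sorted_values)
    position = percentile_location(n, p)
    whole = int(position)          # position of the lower neighbour
    fraction = position - whole    # how far to move towards the upper neighbour
    if whole < 1:                  # position falls before the first value
        return sorted_values[0]
    if whole >= n:                 # position falls at or after the last value
        return sorted_values[-1]
    below = sorted_values[whole - 1]
    above = sorted_values[whole]
    return below + fraction * (above - below)


q1 = percentile_value(values, 25)
median = percentile_value(values, 50)
q3 = percentile_value(values, 75)

# Cross-check against the standard library, whose "exclusive" method
# is also based on n + 1 positions.
library_quartiles = statistics.quantiles(values, n=4, method="exclusive")

print(percentile_location(len(values), 75), q1, median, q3)
```

The third quartile position works out in the same way as the other two: the position is found first, then the value is interpolated between its two neighbours in the ordered data.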
Measures of Location and Dispersion from grouped data:
Duration of interstate calls example
Quartiles from the ogive (the location formula is not applicable to
grouped data):
Median = 10.2, Q1 = 7.2, and Q3 = 13.3 minutes
IQR = Q3 - Q1 = 13.3 - 7.2 = 6.1 minutes
For the following, use midpoint estimation
(see following slides):
Mean
Sample variance
Sample standard deviation
Modal class
Numerical Measures for Grouped continuous data
Notation for grouped frequency distribution
k = number of classes
$m_j$ = midpoint of the $j$th class
$f_j$ = frequency of the $j$th class
Population size: $N = \sum_{j=1}^{k} f_j$
Sample size: $n = \sum_{j=1}^{k} f_j$
Sample mean: $\bar{x} \approx \frac{1}{n}\sum_{j=1}^{k} f_j m_j$
Population mean: $\mu \approx \frac{1}{N}\sum_{j=1}^{k} f_j m_j$
Numerical Measures for grouped data
Population variance: $\sigma^2 \approx \frac{1}{N}\sum_{j=1}^{k} f_j (m_j - \mu)^2$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample variance: $s^2 \approx \frac{1}{n-1}\sum_{j=1}^{k} f_j (m_j - \bar{x})^2$
Sample standard deviation: $s = \sqrt{s^2}$
Calculating approximate mean and median from
grouped Frequency Distribution for interstate calls
Class     Freq.   Lower   Upper   Class      Freq x     Freq x (midpoint - mean)^2
          $f_j$   bound   bound   midpoint   midpoint   $f_j (m_j - \bar{x})^2$
                                  $m_j$      $f_j m_j$
2 - 5     3       2       5       3.5        10.5       142.83
5 - 8     6       5       8       6.5        39         91.26
8 - 11    8       8       11      9.5        76         6.48
11 - 14   7       11      14      12.5       87.5       30.87
14 - 17   4       14      17      15.5       62         104.04
17 - 20   2       17      20      18.5       37         131.22
Total     30                                 312        506.7
Class midpoint = (lower limit + upper limit)/2
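The first columns of the table can be rebuilt from the raw data with a short Python sketch. It rests on one assumption that the slides do not state: a value lying exactly on a class boundary (such as 8) is counted in the class it closes, that is lower < x <= upper. This convention reproduces the frequencies shown in the table. The variable names are illustrative only.

```python
# The 30 ordered observations from the raw-data table.
values = [2.3, 3.6, 4.8, 5.5, 6.1, 6.2, 6.8, 7.2, 8.0, 8.1,
          8.3, 8.5, 8.9, 9.1, 9.6, 10.2, 10.4, 11.2, 11.4, 11.7,
          11.8, 12.1, 12.3, 13.5, 14.5, 15.3, 15.9, 16.6, 18.7, 19.5]

# Class limits taken from the grouped frequency table.
class_bounds = [(2, 5), (5, 8), (8, 11), (11, 14), (14, 17), (17, 20)]

rows = []
for lower, upper in class_bounds:
    # Assumed convention: lower < x <= upper (boundary values go to the lower class).
    frequency = sum(1 for x in values if lower < x <= upper)
    midpoint = (lower + upper) / 2      # class midpoint m_j
    rows.append((lower, upper, frequency, midpoint, frequency * midpoint))

frequencies = [row[2] for row in rows]
midpoints = [row[3] for row in rows]
total_fm = sum(row[4] for row in rows)  # sum of f_j * m_j

for lower, upper, frequency, midpoint, fm in rows:
    print(f"{lower:>2} - {upper:<2}  f={frequency}  m={midpoint}  f*m={fm}")
print("Total f:", sum(frequencies), " Total f*m:", total_fm)
```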
Summary measures based on grouped frequency
distribution
Approximate measures based on the grouped frequency table for
STD calls:
Mean ≈ 312/30 = 10.4 minutes
Variance ≈ 506.7/(30-1) = 17.47 minutes-squared
Std. dev. ≈ √17.47 = 4.18 minutes
The exact measures based on the raw data of STD calls:
Mean = 10.27 minutes, Variance = 18.36 minutes-squared,
and standard deviation = 4.28 minutes.
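The comparison between the grouped approximations and the exact raw-data measures can be checked with a brief Python sketch. The midpoints and frequencies are copied from the grouped frequency table, the raw values from the raw-data table, and the variable names are illustrative. The exact measures use the standard library's `statistics` module, which divides by n - 1 for the sample variance.

```python
import math
import statistics

# Grouped data: class midpoints m_j and frequencies f_j from the table.
midpoints = [3.5, 6.5, 9.5, 12.5, 15.5, 18.5]
frequencies = [3, 6, 8, 7, 4, 2]

n = sum(frequencies)

# Midpoint estimation: every observation is treated as if it sat at its class midpoint.
approx_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
approx_var = sum(f * (m - approx_mean) ** 2
                 for f, m in zip(frequencies, midpoints)) / (n - 1)
approx_sd = math.sqrt(approx_var)

# Raw data: the 30 individual observations.
values = [2.3, 3.6, 4.8, 5.5, 6.1, 6.2, 6.8, 7.2, 8.0, 8.1,
          8.3, 8.5, 8.9, 9.1, 9.6, 10.2, 10.4, 11.2, 11.4, 11.7,
          11.8, 12.1, 12.3, 13.5, 14.5, 15.3, 15.9, 16.6, 18.7, 19.5]

exact_mean = statistics.mean(values)
exact_var = statistics.variance(values)   # sample variance, divisor n - 1
exact_sd = statistics.stdev(values)

print(f"grouped: mean={approx_mean:.2f} var={approx_var:.2f} sd={approx_sd:.2f}")
print(f"raw:     mean={exact_mean:.2f} var={exact_var:.2f} sd={exact_sd:.2f}")
```

The small gap between the two sets of figures comes from replacing each observation with its class midpoint, so the grouped results are approximations rather than exact values.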
One advantage of grouped data is that we can identify the
modal class
Modal class is 8 - 11 minutes
Shapes of distributions
Distributions are often
symmetric bell-shaped, with a
single mode (unimodal).
Data distributions may have
more than one mode, and may
not be symmetric.
More than one mode: bimodal
or multimodal.
Asymmetry is referred to as
skewness.
Skewness is often associated
with Outliers.
Detecting Outliers
[Figure: histogram of the number of television sets per household (example from the previous lecture). Horizontal axis: number of televisions (0 to 10); vertical axis: frequency (0 to 20). One observation is marked as an outlier.]
Shapes of distributions
The best way to get information on your
data's distribution is to plot a
histogram!
1. Modality: is the distribution
unimodal, bimodal or
multimodal?
2. Symmetry: is the distribution
symmetric or skewed?
Measures of location can tell
us quite a bit about the shape
of the distribution, esp. its
skewness.
[Figures: symmetric unimodal distribution; symmetric bimodal distribution]
Shapes of distributions
[Figures: positively skewed (skewed right) distribution; negatively skewed (skewed left) distribution]
Measures of Location & Skewness
Symmetric distribution:
mean = median (= mode if
unimodal)
Skewed-Right:
mean > median (> mode if
unimodal)
[Figures: right-skewed curve with the mode, median, and mean marked; symmetric curve with mean = median = mode]
Measures of Location & Skewness
Skewed-Left:
mean < median
(< mode if
unimodal)
So the difference between the mean and the median (or the
mean and the mode) can tell us whether the distribution is
skewed; and if so, in which direction.
[Figure: left-skewed curve with the mode, median, and mean marked]
Please note that this rule holds in general, but there can be exceptions to it.
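As a rough illustration of the rule, the following Python sketch compares the mean and the median of the 30 call durations used earlier in these slides. The function `skew_direction` and its `tolerance` parameter are hypothetical, introduced here only for illustration; the result is a hint about the direction of skewness, not a formal test, and a histogram should still be checked.

```python
import statistics

# The 30 call durations from the raw-data table.
values = [2.3, 3.6, 4.8, 5.5, 6.1, 6.2, 6.8, 7.2, 8.0, 8.1,
          8.3, 8.5, 8.9, 9.1, 9.6, 10.2, 10.4, 11.2, 11.4, 11.7,
          11.8, 12.1, 12.3, 13.5, 14.5, 15.3, 15.9, 16.6, 18.7, 19.5]


def skew_direction(mean, median, tolerance=0.0):
    """Suggest a skew direction from the sign of (mean - median).

    `tolerance` is an assumed threshold below which the two measures
    are treated as equal; the slides do not specify one.
    """
    if mean - median > tolerance:
        return "skewed right (positive)"
    if median - mean > tolerance:
        return "skewed left (negative)"
    return "roughly symmetric"


mean = statistics.mean(values)
median = statistics.median(values)
direction = skew_direction(mean, median)

print(f"mean={mean:.2f} median={median:.2f} -> {direction}")
```

For these data the mean (10.27) exceeds the median (9.9), which points towards a mild right skew under the general rule.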
Case Studies in Descriptive Statistics
Presenting case studies: tell a story
Identify the variable of interest
Choose appropriate tabular and graphical methods
Calculate numerical summary statistics
Summarize your findings in non-technical language: what do
your tables, graphs, and statistics tell you about the data and the
variable of interest?
Point out any interesting or surprising results.
Explain differences between statistics that are supposedly
measuring the same thing, e.g. measures of central location.
Compare distributions: location, dispersion, shape, etc.
Comment on outliers, if any.
Tutorial: week 3
Tutorial:
Exercises Q13 - Q15
Computing Lab:
Excel exercises 18-19.
When completed, work on Assignment 1.
Next week
Basic Probability
Reading:
Unit Guide, Section 2.1.
Selvanathan, Chapter 4, Sections 4.1 - 4.5