Activity 4 + 5 + 6
Descriptive Statistics and Graphical Analysis
Learning Objectives
Contents
• 1. Measures of Central Tendency
• 1.1. Mean (Arithmetic Mean)
• 1.2. Mode
• 1.3. Median
• 1.4. Shape of a Distribution
• 2. Other Location Measures: Percentiles and Quartiles
• 2.1. Quartiles
• 2.2. Box-and-Whisker Plot
• 2.3. Distribution Shape and Box-and-Whisker Plot
Contents (continued)
• 3. Measures of Variation
• 3.1. Range
• 3.2. Interquartile Range
• 3.3. Variance
• 3.4. Standard Deviation
• 3.5. Coefficient of Variation
• 3.6. The Empirical Rule and Tchebysheff’s Theorem
• 3.7. Tchebysheff’s Theorem
• Using Microsoft Excel
• Exercises
Summary Measures
Describing Data Numerically
• Central tendency: Mean, Median, Mode
• Other location measures: Percentiles, Quartiles
• Variation: Range, Interquartile Range, Variance
1. Measures of Central Tendency
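Center and Location
• Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
• Sample weighted mean: $\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i}$
• Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
• Population weighted mean: $\mu_W = \frac{\sum w_i x_i}{\sum w_i}$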
1.1. Mean (Arithmetic Mean)
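Mean (arithmetic mean) of data values
• Sample mean (n = sample size):
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
• Population mean (N = population size):
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$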
1.1 Mean (continued)
(Weighted Mean)
• Used when values are grouped by frequency
or relative importance
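Example: Sample of 26 Repair Projects

Days to Complete   Frequency
5                  4
6                  12
7                  8
8                  2

Weighted mean days to complete:
$$\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i} = \frac{(4 \times 5) + (12 \times 6) + (8 \times 7) + (2 \times 8)}{4 + 12 + 8 + 2} = \frac{164}{26} = 6.31 \text{ days}$$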
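The weighted-mean calculation for the repair-project example can be checked with a few lines of Python. This is a minimal sketch: the function name `weighted_mean` and the dictionary layout are illustrative choices, not part of the slides.

```python
# Repair-project data from the slide: days to complete -> frequency (weight).
days_to_frequency = {5: 4, 6: 12, 7: 8, 8: 2}


def weighted_mean(value_to_weight):
    """Return sum(w_i * x_i) / sum(w_i) for a {value: weight} mapping."""
    total_weight = sum(value_to_weight.values())
    weighted_sum = sum(x * w for x, w in value_to_weight.items())
    return weighted_sum / total_weight


mean_days = weighted_mean(days_to_frequency)
print(round(mean_days, 2))  # 164 / 26, rounded to two decimals
```

Treating each frequency as a weight gives the same result as listing all 26 projects individually and taking their ordinary arithmetic mean.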
1.2. Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical data
• There may be no mode or several modes
[Figure: two dot plots. On an axis from 0 to 14, Mode = 9; on an axis from 0 to 6, No Mode.]
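The three situations described above (one mode, several modes, no mode) can be illustrated with a short Python sketch. The sample lists are hypothetical and chosen only for illustration, and the helper `modes` is an assumed name. It adopts the convention that a data set in which no value repeats has no mode.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] when no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return [value for value, count in counts.items() if count == highest]


# Hypothetical samples, chosen only to illustrate each case.
one_mode = [1, 3, 5, 9, 9, 9, 12]
two_modes = [1, 1, 2, 3, 3]
no_mode = [0, 1, 2, 3, 4, 5, 6]
categorical = ["red", "blue", "red", "green"]  # the mode also works for categories

for sample in (one_mode, two_modes, no_mode, categorical):
    print(sample, "->", modes(sample))
```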
1.3. Median
• Robust measure of central tendency
• Not affected by extreme values
[Figure: two dot plots, one on an axis from 0 to 10 and one on an axis extending to 14; Median = 5 in both.]
• In an ordered array, the median is the “middle”
number
If n or N is odd, the median is the middle number
If n or N is even, the median is the average of the two
middle numbers
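The odd/even rule above can be sketched in Python as follows. The sample values are hypothetical and `median_of` is an illustrative name. The last sample replaces one value with an extreme one to show that the median stays put while the mean does not.

```python
def median_of(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical samples for illustration.
odd_sample = [3, 1, 4, 9, 5]          # ordered: 1 3 4 5 9
even_sample = [3, 1, 4, 9, 5, 7]      # ordered: 1 3 4 5 7 9
with_outlier = [3, 1, 4, 900, 5]      # 9 replaced by an extreme value

print(median_of(odd_sample))
print(median_of(even_sample))
print(median_of(with_outlier), sum(with_outlier) / len(with_outlier))
```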
1.4. Shape of a Distribution
• Describes how data is distributed
• Symmetric or skewed
• Left-Skewed: Mean < Median < Mode (longer tail extends to left)
• Symmetric: Mean = Median = Mode
• Right-Skewed: Mode < Median < Mean (longer tail extends to right)
2. Other Location Measures
Other measures of location: Percentiles and Quartiles
2.1. Quartiles
• Split Ordered Data into 4 Quarters
[Figure: ordered data divided by Q1, Q2, and Q3 into four parts of 25% each]
• Position of the ith quartile: position of $Q_i = \frac{i(n+1)}{4}$
2.1. Quartiles (continued)
Position of $Q_1 = \frac{1(9+1)}{4} = 2.5$, so $Q_1 = \frac{12 + 13}{2} = 12.5$
$Q_2$ = Median = 16, $Q_3$ = 17.5
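The position rule i(n+1)/4 can be sketched in Python. Two assumptions are made here. First, the slide's data array did not survive, so the nine values below are a hypothetical ordered sample chosen to be consistent with the quartiles quoted above. Second, the helper `quartile` interpolates linearly between neighbouring values when the position is not a whole number, which for a position ending in .5 is the same as averaging the two neighbours.

```python
from statistics import quantiles

# ASSUMPTION: hypothetical ordered sample (n = 9), not taken from the slides.
data = [11, 12, 13, 16, 16, 17, 17, 18, 21]


def quartile(ordered, i):
    """Value at 1-based position i*(n+1)/4, interpolating between neighbours."""
    n = len(ordered)
    position = i * (n + 1) / 4
    lower = int(position)            # 1-based rank at or below the position
    fraction = position - lower
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


q1, q2, q3 = (quartile(data, i) for i in (1, 2, 3))
print(q1, q2, q3)

# The standard library's default ("exclusive") method uses the same positions.
print(quantiles(data, n=4))
```

Other tools may use a different positioning rule, so their quartiles can differ slightly from these.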
2.2. Box-and-Whisker Plot
Graphical display of data using the 5-number summary
[Figure: box-and-whisker plot on an axis from 4 to 12]
2.3. Distribution Shape and Box-and-Whisker Plot
Refer to summary on Exhibit 3.4 (p.114), Fig. 3.13 (p.115)
[Figure: box-and-whisker plots with Q1, Q2, and Q3 marked for Left-Skewed, Symmetric, and Right-Skewed distributions]
3. Measures of Variation
[Figure: two dot plots, each on an axis from 7 to 12]
3.2. Interquartile Range
• Interquartile range = 3rd quartile – 1st quartile
• Example (values 12, 30, 45, 57, 70 marked on the box plot): Interquartile range = 57 – 30 = 27
3.3. Variance
• Important measure of variation
• Shows variation about the mean
• Sample variance:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
• Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
3.4. Standard Deviation
• Most important measure of variation
• Shows variation about the mean
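• Sample standard deviation:
$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
• Population standard deviation:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$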
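The variance and standard deviation formulas differ only in the divisor (n - 1 for a sample, N for a population). A minimal Python sketch, using a hypothetical data set that is not from the slides, computes both from the same sum of squared deviations and compares them with the standard library's `statistics` functions.

```python
import math
from statistics import mean, pvariance, variance

# Hypothetical data, chosen so the arithmetic is easy to follow (mean = 5).
values = [2, 4, 4, 4, 5, 5, 7, 9]


def sum_of_squared_deviations(data):
    """Sum of (x_i - mean)^2, the numerator shared by both variance formulas."""
    centre = mean(data)
    return sum((x - centre) ** 2 for x in data)


ss = sum_of_squared_deviations(values)

sample_variance = ss / (len(values) - 1)    # divide by n - 1
population_variance = ss / len(values)      # divide by N
sample_sd = math.sqrt(sample_variance)
population_sd = math.sqrt(population_variance)

print(sample_variance, sample_sd)
print(population_variance, population_sd)

# Cross-check against the standard library.
print(variance(values), pvariance(values))
```

Whether the data are treated as a sample or as a whole population decides which divisor applies; for the same numbers the sample figures are always the larger of the two.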
Comparing Standard Deviations
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57
[Figure: dot plots of Data A, B, and C, each on an axis from 11 to 21]
3.5. Coefficient of Variation
• Measure of Relative Dispersion
• Always in %
• Shows Variation Relative to Mean
• Used to Compare 2 or More Groups
• Formula:
Population: $CV = \left(\frac{\sigma}{\mu}\right) \cdot 100\%$
Sample: $CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$
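A short Python sketch shows why the coefficient of variation is useful for comparing groups. The two groups and their figures are hypothetical, and `coefficient_of_variation` is an illustrative name: both groups have the same standard deviation, yet their variation relative to the mean differs.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Return (standard deviation / mean) * 100, the CV in percent."""
    if mean_value == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean_value * 100


# Hypothetical groups: same standard deviation, different means.
cv_group_a = coefficient_of_variation(std_dev=5, mean_value=50)
cv_group_b = coefficient_of_variation(std_dev=5, mean_value=100)

print(f"Group A: {cv_group_a:.1f}%")
print(f"Group B: {cv_group_b:.1f}%")
```

Group A is relatively more variable than Group B, which the standard deviations alone would not reveal.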
3.6. The Empirical Rule and Tchebysheff’s Theorem
• 3.6.1. The Empirical Rule
◘ For bell-shaped data, μ ± 1σ contains about 68% of the values in the population or the sample
3.6.1. The Empirical Rule (continued)
◘ μ ± 2σ contains about 95% of the values
◘ μ ± 3σ contains about 99.7% of the values
3.7. Tchebysheff’s Theorem
• Regardless of how the data are distributed, at least (1 - 1/k²) of the values will fall within k standard deviations of the mean
• Examples:
k = 1 (μ ± 1σ): at least (1 - 1/1²) = 0%
k = 2 (μ ± 2σ): at least (1 - 1/2²) = 75%
k = 3 (μ ± 3σ): at least (1 - 1/3²) = about 89%
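The bound can be checked numerically with a small Python sketch. The function names are illustrative. The data are the five house prices used in the Excel example of this deck, a strongly right-skewed set, and the population standard deviation is used so that the theorem applies exactly to the data at hand.

```python
from statistics import mean, pstdev


def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations of the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    return max(0.0, 1 - 1 / k**2)


def fraction_within(values, k):
    """Observed fraction of values inside mean +/- k population std devs."""
    centre = mean(values)
    spread = pstdev(values)
    inside = [x for x in values if abs(x - centre) <= k * spread]
    return len(inside) / len(values)


# House prices from the Excel example in this deck.
house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

for k in (1, 2, 3):
    bound = chebyshev_lower_bound(k)
    observed = fraction_within(house_prices, k)
    print(f"k={k}: at least {bound:.0%}, observed {observed:.0%}")
```

The observed fractions sit well above the guaranteed minimum, which is typical: the theorem gives a floor that holds for any distribution, not an estimate.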
Using Microsoft Excel
• Use menu choice:
tools / data analysis / descriptive statistics
• Enter details in dialog box
• Click OK
Using Microsoft Excel (continued)
Microsoft Excel descriptive statistics output, using the house price data:
House Prices: $2,000,000; $500,000; $300,000; $100,000; $100,000
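The Excel output table itself is not reproduced here. As a cross-check, several of the quantities that a descriptive statistics summary typically reports can be computed for the same house price data in Python. This is a sketch using the standard library rather than Excel, and the dictionary keys are illustrative labels.

```python
from statistics import mean, median, mode, stdev, variance

# House price data from the slide.
house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

summary = {
    "mean": mean(house_prices),
    "median": median(house_prices),
    "mode": mode(house_prices),
    "sample_std_dev": stdev(house_prices),
    "sample_variance": variance(house_prices),
    "range": max(house_prices) - min(house_prices),
    "minimum": min(house_prices),
    "maximum": max(house_prices),
    "sum": sum(house_prices),
    "count": len(house_prices),
}

for name, value in summary.items():
    print(f"{name:>16}: {value:,.0f}")
```

The single $2,000,000 house pulls the mean well above the median, the right-skew pattern (Mode < Median < Mean) described in section 1.4.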
Problem for Activity 6
• The following data represent the waiting time in minutes (defined as the time from when the customer enters the line until he or she reaches the teller window) of 15 customers at each of two bank branches.
Problem for Activity 6 (continued)
Waiting time (minutes)
Bank 1   Bank 2
4.21     9.66
5.55     5.90
3.02     8.02
5.13     5.79
4.77     8.73
2.34     3.82
3.54     8.01
3.2      8.35
4.5      10.49
6.1      6.68
0.38     5.64
5.12     4.08
6.46     6.17
6.19     9.91
3.79     5.47
Problem for Activity 6 (continued)
• For the waiting times at each of the two bank branches:
1. Using Excel: List the five-number summary (Xmin – First quartile – Median – Third quartile – Xmax)
2. Using SPSS for Windows: Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
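The exercise asks for Excel and SPSS. As an optional cross-check, the same measures can be sketched in Python with the standard library. This is not the tool the exercise specifies, and the function name `describe` is illustrative. Quartiles here follow the i(n+1)/4 position rule from section 2.1, and the data are treated as samples. Excel and SPSS may apply different quartile conventions, so their quartile values can differ slightly.

```python
from statistics import mean, quantiles, stdev, variance

# Waiting times in minutes, from the table above.
bank_1 = [4.21, 5.55, 3.02, 5.13, 4.77, 2.34, 3.54, 3.2,
          4.5, 6.1, 0.38, 5.12, 6.46, 6.19, 3.79]
bank_2 = [9.66, 5.90, 8.02, 5.79, 8.73, 3.82, 8.01, 8.35,
          10.49, 6.68, 5.64, 4.08, 6.17, 9.91, 5.47]


def describe(times):
    """Five-number summary plus the variation measures asked for in the exercise."""
    # Default "exclusive" method places quartile i at position i*(n+1)/4.
    q1, q2, q3 = quantiles(times, n=4)
    s = stdev(times)  # sample standard deviation (divisor n - 1)
    return {
        "five_number": (min(times), q1, q2, q3, max(times)),
        "range": max(times) - min(times),
        "iqr": q3 - q1,
        "variance": variance(times),
        "std_dev": s,
        "cv_percent": s / mean(times) * 100,
    }


for name, times in (("Bank 1", bank_1), ("Bank 2", bank_2)):
    print(name)
    for measure, value in describe(times).items():
        print(f"  {measure}: {value}")
```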