
Basic Business Statistics:

Concepts & Applications

Activity 4 + 5 + 6
Descriptive Statistics
and Graphical Analysis   

Learning Objectives

• Measures of Center and Location: Mean (Arithmetic Average, Weighted Mean, Geometric Mean), Mode, Median
• Other Measures of Location: Percentiles, Quartiles
• Measures of Variation: Range, Interquartile Range, Variance and Standard Deviation, Coefficient of Variation

Contents
• 1. Measures of Central Tendency
• 1.1. Mean (Arithmetic Mean)
• 1.2. Mode
• 1.3. Median
• 1.4. Shape of a Distribution
• 2. Other Location Measures: Percentiles and Quartiles
• 2.1. Quartiles
• 2.2. Box-and-Whisker Plot
• 2.3. Distribution Shape and Box-and-Whisker Plot

Contents (continued)
• 3. Measures of Variation
• 3.1. Range
• 3.2. Interquartile Range
• 3.3. Variance
• 3.4. Standard Deviation
• 3.5. Coefficient of Variation
• 3.6. The Empirical Rule and Tchebysheff’s Theorem
• 3.7. Tchebysheff’s Theorem
• Using Microsoft Excel
• Exercises
Summary Measures
Describing Data Numerically
• Center and Location: Mean, Median, Mode, Weighted Mean
• Other Measures of Location: Percentiles, Quartiles
• Variation: Range, Interquartile Range, Variance, Standard Deviation
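1. Measures of Central Tendency
Center and Location
• Mean: sample $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, population $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
• Median
• Mode
• Weighted Mean: sample $\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i}$, population $\mu_W = \frac{\sum w_i x_i}{\sum w_i}$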
1.1. Mean (Arithmetic Mean)
Mean (arithmetic mean) of data values
Sample mean (n = sample size):
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
Population mean (N = population size):
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$
1.1 Mean (continued)
(Weighted Mean)
• Used when values are grouped by frequency or relative importance

Example: Sample of 26 Repair Projects

Days to Complete    Frequency
5                   4
6                   12
7                   8
8                   2

Weighted Mean Days to Complete:
$$\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i} = \frac{(4 \times 5) + (12 \times 6) + (8 \times 7) + (2 \times 8)}{4 + 12 + 8 + 2} = \frac{164}{26} = 6.31 \text{ days}$$
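The weighted-mean calculation for the repair projects can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `weighted_mean` and the variable names are assumptions, not part of the course material.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Repair-project example: days to complete and how often each value occurs.
days = [5, 6, 7, 8]
frequency = [4, 12, 8, 2]

print(round(weighted_mean(days, frequency), 2))  # 164 / 26 -> 6.31
```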
1.2. Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical data
• There may be no mode or several modes
[Figure: two dot plots. On the axis from 0 to 14, Mode = 9; on the axis from 0 to 6, No Mode.]
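A small Python sketch of the "no mode or several modes" idea is given below. The helper name `modes` is hypothetical, and the convention that a data set in which every value occurs exactly once has no mode is an assumption taken from the "No Mode" case on the slide.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] when no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: treated as "no mode"
    return sorted(v for v, c in counts.items() if c == top)


print(modes([0, 1, 2, 3, 4, 5, 6]))                   # [] (no mode)
print(modes([11, 12, 13, 16, 16, 17, 17, 18, 21]))    # [16, 17] (two modes)
print(modes(["red", "blue", "red"]))                  # ['red'] (categorical data)
```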
1.3. Median
• Robust measure of central tendency
• Not affected by extreme values

[Figure: two dot plots, one on an axis from 0 to 10 and one on an axis extended to 14; Median = 5 in both.]
• In an ordered array, the median is the “middle” number
◘ If n or N is odd, the median is the middle number
◘ If n or N is even, the median is the average of the two middle numbers
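The odd/even rule for the median can be sketched in Python as follows. The function name is an assumption, and the second and third data sets are made-up values used only to show the even case and the robustness to an extreme value.

```python
def median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: the middle number
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average of the two middle numbers


print(median([11, 12, 13, 16, 16, 17, 17, 18, 21]))  # n = 9 (odd)  -> 16
print(median([1, 2, 3, 4]))                           # n = 4 (even) -> 2.5
print(median([1, 2, 3, 4, 1000]))                     # extreme value, median is still 3
```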
1.4. Shape of a Distribution
• Describes how data are distributed
• Symmetric or skewed
◘ Left-Skewed: Mean < Median < Mode (longer tail extends to left)
◘ Symmetric: Mean = Median = Mode
◘ Right-Skewed: Mode < Median < Mean (longer tail extends to right)
2. Other Location Measures
Other Measures of Location

Percentiles
The pth percentile in a data array (where 0 ≤ p ≤ 100):
• p% are less than or equal to this value
• (100 - p)% are greater than or equal to this value

Quartiles
• 1st quartile = 25th percentile
• 2nd quartile = 50th percentile = median
• 3rd quartile = 75th percentile
2.1. Quartiles
• Split Ordered Data into 4 Quarters
25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
• Position of ith Quartile: $Q_i = \frac{i(n+1)}{4}$
2.1. Quartiles (continued)

Data in Ordered Array: 11 12 13 16 16 17 17 18 21

Position of $Q_1 = \frac{1(9+1)}{4} = 2.5$, so $Q_1 = \frac{12 + 13}{2} = 12.5$

$Q_2$ = Median = 16, $Q_3$ = 17.5
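The position rule i(n + 1)/4 can be turned into a short Python sketch. The function name `quartile` is hypothetical, and using linear interpolation for every fractional position is an assumption: for a position ending in .5, as in this example, it reduces to averaging the two neighbouring values.

```python
def quartile(ordered, i):
    """Return the i-th quartile (i = 1, 2, 3) using the position i * (n + 1) / 4."""
    n = len(ordered)
    if i not in (1, 2, 3):
        raise ValueError("i must be 1, 2 or 3")
    if n < 3:
        raise ValueError("need at least 3 ordered values")
    position = i * (n + 1) / 4           # 1-based position in the ordered array
    lower = int(position)                # whole part of the position
    fraction = position - lower
    if fraction == 0:
        return float(ordered[lower - 1])
    # Fractional position: interpolate between the two neighbouring values (assumption).
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


data = [11, 12, 13, 16, 16, 17, 17, 18, 21]   # already in ascending order

print(quartile(data, 1))  # position 2.5 -> (12 + 13) / 2 = 12.5
print(quartile(data, 2))  # position 5   -> 16.0
print(quartile(data, 3))  # position 7.5 -> (17 + 18) / 2 = 17.5
```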
2.2. Box-and-Whisker Plot
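Graphical Display of Data Using 5-Number Summary:

Xsmallest, Q1, Median, Q3, Xlargest

[Figure: box-and-whisker plot with Xsmallest, Q1, Median, Q3 and Xlargest marked above the values 4, 6, 8, 10 and 12.]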
2.3. Distribution Shape and
Box-and-Whisker Plot
Refer to summary on Exhibit 3.4 (p.114), Fig. 3.13 (p.115)
[Figure: box-and-whisker plots showing the positions of Q1, Q2 and Q3 for Left-Skewed, Symmetric and Right-Skewed distributions.]
3. Measures of Variation
Variation
• Range
• Interquartile Range
• Variance: Population Variance, Sample Variance
• Standard Deviation: Population Standard Deviation, Sample Standard Deviation
• Coefficient of Variation
3.1. Range
• Measure of variation
• Difference between the largest and the smallest observations:
$\text{Range} = X_{\text{Largest}} - X_{\text{Smallest}}$
• Ignores the way in which data are distributed
[Figure: two dot plots on an axis from 7 to 12 with differently distributed values; Range = 12 - 7 = 5 in both.]
3.2. Interquartile Range
• Interquartile range = 3rd quartile – 1st quartile

Data in Ordered Array: 11 12 13 16 16 17 17 18 21

$\text{Interquartile Range} = Q_3 - Q_1 = 17.5 - 12.5 = 5$

Example:
X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70
(25% of the values fall in each of the four segments)

Interquartile range = 57 - 30 = 27
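As a rough check of the range and interquartile range, the ordered array can be passed to Python's standard `statistics` module. This is a sketch, not the course's prescribed tool; `method="exclusive"` is chosen because it places the quartiles at the positions i(n + 1)/4 used in section 2.1.

```python
import statistics

data = [11, 12, 13, 16, 16, 17, 17, 18, 21]

data_range = max(data) - min(data)
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

print(data_range)  # 21 - 11 -> 10
print(q1, q3)      # 12.5 17.5
print(iqr)         # 5.0
```

The range uses only the two extreme values, while the interquartile range uses only the middle 50% of the data, which is why it is less sensitive to extreme values.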
3.3. Variance
• Important measure of variation
• Shows variation about the mean
Sample variance:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
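The two variance definitions differ only in the divisor (n - 1 for a sample, N for a population). The sketch below writes both out directly from the formulas; the function names are assumptions, and the data are the ordered array from the quartile example.

```python
def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    if n == 0:
        raise ValueError("population variance needs at least one value")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


data = [11, 12, 13, 16, 16, 17, 17, 18, 21]   # squared deviations sum to 80

print(round(sample_variance(data), 3))      # 80 / 8 -> 10.0
print(round(population_variance(data), 3))  # 80 / 9 -> 8.889
```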
3.4. Standard Deviation
• Most important measure of variation
• Shows variation about the mean

Sample standard deviation:
$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
Population standard deviation:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
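Because the standard deviation is the square root of the variance, it is expressed in the same units as the data. A minimal sketch using Python's standard `statistics` module (an assumed tool, applied to the ordered array from the quartile example):

```python
import math
import statistics

data = [11, 12, 13, 16, 16, 17, 17, 18, 21]

s = statistics.stdev(data)        # sample standard deviation (divides by n - 1)
sigma = statistics.pstdev(data)   # population standard deviation (divides by N)

print(round(s, 4))      # sqrt(10)     -> 3.1623
print(round(sigma, 4))  # sqrt(80 / 9) -> 2.9814

# The standard deviation is the square root of the corresponding variance.
print(math.isclose(s, math.sqrt(statistics.variance(data))))  # True
```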
Comparing Standard Deviations
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57
[Figure: three dot plots on a common axis from 11 to 21.]
3.5. Coefficient of Variation
• Measure of Relative Dispersion
• Always in %
• Shows Variation Relative to Mean
• Used to Compare 2 or More Groups
• Formula:
Population: $CV = \left(\frac{\sigma}{\mu}\right) \times 100\%$
Sample: $CV = \left(\frac{s}{\bar{x}}\right) \times 100\%$
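The sketch below applies the sample formula to two data sets that appear in these slides and are measured on very different scales: the ordered array from the quartile example and the house prices from the Excel example. The function name is an assumption.

```python
import statistics


def coefficient_of_variation(values):
    """Sample CV in percent: (s / mean) * 100."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100


ordered_array = [11, 12, 13, 16, 16, 17, 17, 18, 21]
house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

print(round(coefficient_of_variation(ordered_array), 1))  # about 20.2 (%)
print(round(coefficient_of_variation(house_prices), 1))   # about 133.3 (%)
```

Because the CV is unit-free, the two results can be compared directly even though one data set is in the tens and the other in the hundreds of thousands.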
3.6. The Empirical Rule and
Tchebysheff’s Theorem
• 3.6.1. The Empirical Rule
◘ For a bell-shaped distribution, μ ± 1σ contains about 68% of the values in the population or the sample
[Figure: bell-shaped curve centered at μ, with 68% of the area within μ ± 1σ.]
3.6.1. The Empirical Rule
(continued)

◘ μ ± 2σ contains about 95% of the values
◘ μ ± 3σ contains about 99.7% of the values
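The 68%, 95% and 99.7% figures are rounded areas under the normal curve. A short sketch using `statistics.NormalDist` from Python's standard library (Python 3.8 or later; the helper name `share_within` is an assumption):

```python
from statistics import NormalDist


def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, f"{share_within(k):.1%}")  # 68.3%, 95.4%, 99.7%
```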
3.7. Tchebysheff’s Theorem
• Regardless of how the data are distributed, at least $(1 - 1/k^2)$ of the values will fall within k standard deviations of the mean
• Examples:
k = 1 (μ ± 1σ): $(1 - 1/1^2)$ = 0%
k = 2 (μ ± 2σ): $(1 - 1/2^2)$ = 75%
k = 3 (μ ± 3σ): $(1 - 1/3^2)$ = 89%
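The bound can be computed directly and compared with an actual data set. The sketch below uses the strongly skewed house-price data from the Excel example; the function names are assumptions, and the population standard deviation (`pstdev`) is used so that the data set is treated as the whole population.

```python
import statistics


def tchebysheff_bound(k):
    """Minimum share of values within k standard deviations of the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    return max(0.0, 1 - 1 / k**2)


def share_within(values, k):
    """Observed share of values within k population standard deviations of the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return sum(abs(x - mu) <= k * sigma for x in values) / len(values)


for k in (1, 2, 3):
    print(k, round(tchebysheff_bound(k), 3))  # 0.0, 0.75, 0.889

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]
print(share_within(house_prices, 2) >= tchebysheff_bound(2))  # True
```

For k = 1 the bound is 0%, so the theorem only gives useful information for k greater than 1.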
Using Microsoft Excel
• Use menu choice:
tools / data analysis / descriptive statistics
• Enter details in dialog box

Using Microsoft Excel (continued)
• Enter dialog box details
• Check box for summary statistics
• Click OK
Using Microsoft Excel (continued)
Microsoft Excel descriptive statistics output, using the house price data:
House Prices:

$2,000,000
500,000
300,000
100,000
100,000
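For readers without Excel at hand, the main summary measures for the house-price data can be approximated with Python's standard `statistics` module. This is a stand-in sketch, not the Excel output itself, and the dictionary keys are chosen here for illustration.

```python
import statistics

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

summary = {
    "mean": statistics.mean(house_prices),                 # 600000
    "median": statistics.median(house_prices),             # 300000
    "mode": statistics.mode(house_prices),                 # 100000
    "std_dev": statistics.stdev(house_prices),             # about 800000 (sample)
    "range": max(house_prices) - min(house_prices),        # 1900000
    "minimum": min(house_prices),
    "maximum": max(house_prices),
    "count": len(house_prices),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Here Mode < Median < Mean, the pattern of a right-skewed distribution: the single $2,000,000 house pulls the mean well above the median.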
Problem for Activity 6
• The following data represent the waiting time in minutes (defined as the time from when the customer enters the line until he or she reaches the teller window) of 15 customers at each of two bank branches.
Problem for Activity 6 (continued)
Bank 1 Bank 2
4.21 9.66
5.55 5.90
3.02 8.02
5.13 5.79
4.77 8.73
2.34 3.82
3.54 8.01
3.2 8.35
4.5 10.49
6.1 6.68
0.38 5.64
5.12 4.08
6.46 6.17
6.19 9.91
3.79 5.47
Problem for Activity 6 (continued)
• For the waiting times at each of the two bank branches:
1. Using Excel: List the five-number summary (Xmin, first quartile, median, third quartile, Xmax)
2. Using SPSS for Windows: Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
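The exercise asks for Excel and SPSS. As an optional cross-check only, the same measures can be sketched in Python with the standard `statistics` module. The function name `summarize` and the dictionary keys are assumptions; the quartiles use the position rule i(n + 1)/4 from section 2.1 (`method="exclusive"`), and other tools may use a different quartile convention, so their values can differ slightly.

```python
import statistics

bank1 = [4.21, 5.55, 3.02, 5.13, 4.77, 2.34, 3.54, 3.20,
         4.50, 6.10, 0.38, 5.12, 6.46, 6.19, 3.79]
bank2 = [9.66, 5.90, 8.02, 5.79, 8.73, 3.82, 8.01, 8.35,
         10.49, 6.68, 5.64, 4.08, 6.17, 9.91, 5.47]


def summarize(times):
    """Five-number summary and variation measures for one branch (sample formulas)."""
    q1, med, q3 = statistics.quantiles(times, n=4, method="exclusive")
    mean = statistics.mean(times)
    s = statistics.stdev(times)
    return {
        "five_number": (min(times), q1, med, q3, max(times)),
        "range": max(times) - min(times),
        "iqr": q3 - q1,
        "variance": statistics.variance(times),
        "std_dev": s,
        "cv_percent": s / mean * 100,
    }


for name, times in (("Bank 1", bank1), ("Bank 2", bank2)):
    print(name)
    for key, value in summarize(times).items():
        print(f"  {key}: {value}")
```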
