FUNDAMENTAL STATISTICS
4.1 Introduction
This topic introduces the concept of statistics. It covers the definition and
application of statistics, including types of data (grouped and ungrouped
data), measures of central tendency and measures of dispersion.
Objectives
At the end of the topic, you will be able to:
Differentiate between grouped data and ungrouped data
Construct grouped data from the given ungrouped data.
Calculate the measures of central tendencies (mean, mode and median)
for grouped and ungrouped data.
Calculate the measures of dispersion (range, variance, standard
deviation and coefficient of variation) for grouped and ungrouped data.
4.2 Types of Data: Grouped and Ungrouped
4.2.1 Ungrouped Data
Ungrouped data are raw data that have not been organized into a table. For
example, the number of visitors to the museum per day:
78, 45, 65, 67, 132, 78, 67, 79, 85, 98, 112, 123, 142, 122
4.2.2 Grouped Data
Grouped data are data that have been organized and summarized in the form of a
table. A tabular arrangement of data by class intervals, together with the
corresponding class frequencies, is called grouped data or a frequency table.
Table 4.1 is a frequency table of the weights (in kg) of 64 students at MN
University.
Table 4.1 Weights of 64 students at MN University
Weight (kg)     Number of students
45.0 - 49.9      4
50.0 - 54.9     10
55.0 - 59.9     15
60.0 - 64.9     20
65.0 - 69.9     15
General Rules for Forming Grouped Data:
1. Determine the number of class intervals by using Sturges' method,
$k = 1 + 3.3 \log_{10} n$
where k is the number of class intervals and n is the number of data or
observations.
2. Determine the range, where
Range = the difference between the largest and smallest numbers
3. Estimate the size of each class interval by using
$\text{Class size} = \dfrac{\text{Range}}{k}$
Class Limits, Class Boundaries and Class Size
Class limits are the end values of a class interval. The value on the left is
called the lower limit and the value on the right is the upper limit.
A class boundary is the value midway between the upper limit of one class and
the lower limit of the next one.
Class size is the difference between the upper class boundary and the lower
class boundary of the same class.
From Table 4.1, for the 3rd class interval, 55.0 - 59.9:
Lower class limit = 55.0
Upper class limit = 59.9
Lower class boundary = $\dfrac{54.9 + 55.0}{2} = 54.95$
Upper class boundary = $\dfrac{59.9 + 60.0}{2} = 59.95$
Class size = 59.95 - 54.95 = 5
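The boundary calculation above can be sketched in Python. This is only an illustration: the function name `class_boundaries` is hypothetical, the class limits are taken from Table 4.1, and the sketch assumes the gap between consecutive classes is the same throughout the table. Decimal arithmetic is used so that values such as 54.95 come out exactly.

```python
from decimal import Decimal

def class_boundaries(limits):
    """Return (lower boundary, upper boundary) for each class.

    `limits` is a list of (lower limit, upper limit) pairs given as strings.
    Assumes the gap between consecutive classes is constant.
    """
    pairs = [(Decimal(lo), Decimal(hi)) for lo, hi in limits]
    # Half of the gap between one class's upper limit and the next lower limit.
    half_gap = (pairs[1][0] - pairs[0][1]) / 2
    return [(lo - half_gap, hi + half_gap) for lo, hi in pairs]

# Class limits from Table 4.1 (weights in kg).
table_4_1 = [("45.0", "49.9"), ("50.0", "54.9"), ("55.0", "59.9"),
             ("60.0", "64.9"), ("65.0", "69.9")]

boundaries = class_boundaries(table_4_1)
lower, upper = boundaries[2]          # third class interval, 55.0 - 59.9
class_size = upper - lower
print(lower, upper, class_size)
```

The third entry reproduces the boundaries 54.95 and 59.95 and the class size 5 worked out by hand.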
Example 4.1
The following data represent the amount of soft drink (in liters) in a sample of
50 2-liter bottles.
2.11 2.09 2.08 2.07 2.07 2.06 2.05 2.04 2.04 2.04
2.03 2.03 2.03 2.03 2.02 2.02 2.02 2.01 2.01 2.01
2.01 2.01 2.01 2.01 2.01 2.00 2.00 2.00 2.00 1.99
1.99 1.99 1.98 1.98 1.98 1.97 1.97 1.97 1.97 1.96
1.96 1.96 1.95 1.95 1.94 1.94 1.93 1.93 1.92 1.91
Construct the frequency distribution table.
Solution:
The number of class intervals,
$k = 1 + 3.3 \log_{10} n = 1 + 3.3 \log_{10} 50 = 6.6 \approx 7$
$\text{Class size} = \dfrac{\text{Range}}{k} = \dfrac{2.11 - 1.91}{7} \approx 0.03$
Then, the frequency distribution table is given below.
Class Boundaries    Class Intervals    Frequency
1.905 - 1.935       1.91 - 1.93         4
1.935 - 1.965       1.94 - 1.96         7
1.965 - 1.995       1.97 - 1.99        10
1.995 - 2.025       2.00 - 2.02        15
2.025 - 2.055       2.03 - 2.05         8
2.055 - 2.085       2.06 - 2.08         4
2.085 - 2.115       2.09 - 2.11         2
Total                                  50
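The construction of this frequency table can be checked with a short Python sketch. It follows the three general rules (Sturges' method, range, class size); the variable names are illustrative, rounding k up to the next whole number is an assumption that matches the 6.6 to 7 step in the solution, and the amounts are converted to hundredths of a liter so that the class arithmetic is exact.

```python
import math

# Amounts (liters) from Example 4.1.
data = [
    2.11, 2.09, 2.08, 2.07, 2.07, 2.06, 2.05, 2.04, 2.04, 2.04,
    2.03, 2.03, 2.03, 2.03, 2.02, 2.02, 2.02, 2.01, 2.01, 2.01,
    2.01, 2.01, 2.01, 2.01, 2.01, 2.00, 2.00, 2.00, 2.00, 1.99,
    1.99, 1.99, 1.98, 1.98, 1.98, 1.97, 1.97, 1.97, 1.97, 1.96,
    1.96, 1.96, 1.95, 1.95, 1.94, 1.94, 1.93, 1.93, 1.92, 1.91,
]

n = len(data)
# Rule 1: Sturges' method, rounded up to a whole number of classes.
k = math.ceil(1 + 3.3 * math.log10(n))

# Work in hundredths of a liter to avoid floating-point edge effects.
hundredths = [round(x * 100) for x in data]

# Rules 2 and 3: range and class size (here 20 / 7, rounded to 3, i.e. 0.03).
data_range = max(hundredths) - min(hundredths)
size = round(data_range / k)

# Count how many observations fall into each class.
start = min(hundredths)
freq = [0] * k
for value in hundredths:
    freq[(value - start) // size] += 1

for i, f in enumerate(freq):
    lo = (start + i * size) / 100
    hi = (start + (i + 1) * size - 1) / 100
    print(f"{lo:.2f} - {hi:.2f}: {f}")
```

The counts agree with the frequency column of the table and add up to 50.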
Practice 4.1
By referring to the grouped data in Example 4.1, determine:
(a) The lower limit of the sixth class.
(b) The upper limit of the fourth class.
(c) The lower class boundary of the third class.
(d) The size of the fourth class interval.
Solution
(a) The lower limit of the sixth class = 2.06.
(b) The upper limit of the fourth class = ____.
(c) The lower class boundary of the third class = ____.
(d) The size of the fourth class interval = ____ - ____ = 0.03.
4.3 Measures of Central Tendencies
You can characterize any set of data by measuring its central tendency. Most
sets of data show a distinct tendency to group around a central value.
Since such typical values tend to lie centrally within a set of data arranged
according to magnitude, they are called measures of central tendency. Several
measures of central tendency can be defined, the most common being the
arithmetic mean, the median and the mode.
4.3.1 Mean ($\bar{x}$)
The mean is also referred to as the arithmetic mean. It is the most commonly
used measure of central tendency. The mean serves as a balance point in a set of
data where all values play an equal role. However, it can be affected by
extreme values.
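The formulas for the mean are:
(a) Ungrouped Data
$\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n} = \dfrac{x_1 + x_2 + \dots + x_n}{n} = \dfrac{\text{total of observations}}{\text{sample size}}$
(b) Grouped Data
$\bar{x} = \dfrac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \dfrac{f_1 x_1 + f_2 x_2 + \dots + f_k x_k}{f_1 + f_2 + \dots + f_k}$
where $x_i$ is the value (or midpoint) of the i-th class, $f_i$ is its frequency
and k is the number of classes.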
Example 4.2
Calculate the mean for:
5, 9, 10, 12, 15, 18, 20
Solution:
Mean, $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n} = \dfrac{5 + 9 + 10 + 12 + 15 + 18 + 20}{7} = \dfrac{89}{7} = 12.714$
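As a quick check of this arithmetic, the ungrouped mean can be computed in Python. This is a minimal sketch; the variable names are illustrative, and `statistics.mean` from the standard library is used only as a cross-check.

```python
from statistics import mean

data = [5, 9, 10, 12, 15, 18, 20]

total = sum(data)            # total of observations
n = len(data)                # sample size
x_bar = total / n            # mean = total of observations / sample size

print(total, n, round(x_bar, 4))
print(round(mean(data), 4))  # same value from the standard library
```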
Example 4.3
A firm collected the information below from all 40 of its employees.
Class Intervals for Travelling Costs (RM)    Number of Employees
 2                                            5
 5                                           12
 8                                           15
10                                            6
12                                            2
What is the mean amount paid per employee for travelling expenses?
Solution:
Class Intervals (x)     f     fx
 2                      5     10
 5                     12     60
 8                     15    120
10                      6     60
12                      2     24
Total                  40    274
The mean amount paid per employee for travelling expenses,
$\bar{x} = \dfrac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \dfrac{f_1 x_1 + f_2 x_2 + \dots + f_k x_k}{f_1 + f_2 + \dots + f_k} = \dfrac{274}{40} = \text{RM } 6.85$
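The same grouped-data calculation can be sketched in Python, using the travelling costs and frequencies from the table above. The variable names are illustrative only.

```python
# Travelling costs (RM) and number of employees from Example 4.3.
values = [2, 5, 8, 10, 12]
freqs = [5, 12, 15, 6, 2]

# Column fx of the table: frequency times class value.
fx = [f * x for x, f in zip(values, freqs)]

sum_f = sum(freqs)           # total frequency
sum_fx = sum(fx)             # total of the fx column
grouped_mean = sum_fx / sum_f

print(fx, sum_f, sum_fx, grouped_mean)
```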
Practice 4.2
Find x if the mean for the following data is 10.
4, 8, 9, 9, x, 13, 17
Solution
Given mean, $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n} = 10$
$\dfrac{4 + 8 + 9 + 9 + x + 13 + 17}{7} = 10$
__ + x = 10(7) = 70
x = 70 - __ = ___
Practice 4.3
Find the mean marks of statistics for technical students below.
Class Intervals for Marks    Number of Students
50-59                         4
60-69                         7
70-79                        13
80-89                         6
90-100                        5
Solution
Class Intervals    Midpoint (x)     f     fx
50-59              54.5             4    218
60-69              ___              7    ___
70-79              ___             13    ___
80-89              ___              6    ___
90-100             ___              5    ___
Total              -              ___    ___
The mean marks of statistics,
$\bar{x} = \dfrac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$ = ___ / ___ = ____
4.3.2 Median ($\tilde{x}$)
The median is the value located at the centre of a set of data that has been
ordered from the lowest to the highest value. Hence, fifty percent of the
observations lie below the median and the other fifty percent lie above it.
The median is not affected by extreme values and can also be used for
qualitative data that can be ranked.
The formulas for the median are:
(a) Ungrouped Data / Grouped Data with Point Class Intervals
$\tilde{x}$ = value in the $\left(\dfrac{n+1}{2}\right)$th position (in a data set of ascending order)
(b) Grouped Data
$\tilde{x} = L_m + \left[\dfrac{\frac{n}{2} - f_{m-1}}{f_m}\right] c_m$
where n = sample size
$L_m$ = lower class boundary of the median class
$f_{m-1}$ = cumulative frequency before the median class
$f_m$ = frequency of the median class
$c_m$ = median class size
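To illustrate the grouped-data formula, here is a hedged Python sketch applied to the weights in Table 4.1. The function name `grouped_median` is hypothetical, and $L_m$ is taken as the lower class boundary of the median class, an assumption that is consistent with the answer 74.5 given in Practice 4.5. The boundaries are obtained from the class limits of Table 4.1 in the way shown earlier for the third class.

```python
def grouped_median(boundaries, freqs):
    """Median of grouped data by interpolation inside the median class.

    boundaries: list of (lower boundary, upper boundary), one pair per class.
    freqs: class frequencies in the same order.
    """
    n = sum(freqs)
    cumulative = 0  # cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if cumulative + f >= n / 2:
            # L_m + ((n/2 - f_{m-1}) / f_m) * c_m
            return lower + (n / 2 - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("frequency table is empty")

# Table 4.1: class boundaries (kg) and number of students.
boundaries = [(44.95, 49.95), (49.95, 54.95), (54.95, 59.95),
              (59.95, 64.95), (64.95, 69.95)]
freqs = [4, 10, 15, 20, 15]

median_weight = grouped_median(boundaries, freqs)
print(round(median_weight, 2))
```

Here n/2 = 32 falls in the fourth class (cumulative frequency 29 before it), so the sketch evaluates 59.95 + (32 - 29)/20 × 5.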
Example 4.4
Calculate the median for:
(a) 5, 9, 10, 12, 15, 18, 20
(b) 4, 3, 1, 6, 7, 5
Solution:
(a) Ascending order: 5, 9, 10, 12, 15, 18, 20
$\dfrac{n+1}{2} = \dfrac{7+1}{2} = 4$
$\tilde{x}$ = value in the 4th position = 12
(b) Ascending order: 1, 3, 4, 5, 6, 7
$\dfrac{n+1}{2} = \dfrac{6+1}{2} = 3.5$
$\tilde{x}$ = value in the 3.5th position = $\dfrac{4 + 5}{2} = 4.5$
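The position rule used in both parts can be sketched in Python. The function name `median_by_position` is hypothetical; a fractional position such as 3.5 is handled by averaging the two neighbouring values, as in part (b).

```python
import math

def median_by_position(values):
    """Median as the value in the (n + 1) / 2 th position of the sorted data."""
    if not values:
        raise ValueError("no data")
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2          # 1-based position
    low = ordered[math.floor(position) - 1]    # value at or just below it
    high = ordered[math.ceil(position) - 1]    # value at or just above it
    return (low + high) / 2

median_a = median_by_position([5, 9, 10, 12, 15, 18, 20])
median_b = median_by_position([4, 3, 1, 6, 7, 5])
print(median_a, median_b)
```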
Example 4.5
A firm collected the information below from all 40 of its employees.
Class Intervals for Travelling Costs (RM)    Number of Employees
 2                                            5
 5                                           12
 8                                           15
10                                            6
12                                            2
What is the median amount paid per employee for travelling expenses?
Solution:
Class Intervals (x)     f     Cumulative f     fx
 2                      5      5               10
 5                     12     17               60
 8                     15     32              120    (median class)
10                      6     38               60
12                      2     40               24
Total                  40                     274
$\dfrac{n+1}{2} = \dfrac{41}{2} = 20.5$
The case above involves grouped data with point class intervals. From the table,
locate the median at the 20.5th position. This position falls in the class
x = 8, whose cumulative frequency (32) is the first to reach 20.5.
Hence, $\tilde{x}$ = value in the 20.5th position = 8.
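Locating a position through the cumulative frequency column can also be sketched in Python. The helper names `value_at` and `point_class_median` are hypothetical; the data are the travelling costs from this example.

```python
import math
from itertools import accumulate

def value_at(position, values, cumulative):
    """Return the class value that occupies a 1-based position."""
    for x, cum in zip(values, cumulative):
        if position <= cum:
            return x
    raise ValueError("position lies beyond the data")

def point_class_median(values, freqs):
    """Median for grouped data with point class intervals."""
    cumulative = list(accumulate(freqs))   # cumulative frequency column
    n = cumulative[-1]
    position = (n + 1) / 2
    low = value_at(math.floor(position), values, cumulative)
    high = value_at(math.ceil(position), values, cumulative)
    return (low + high) / 2

values = [2, 5, 8, 10, 12]
freqs = [5, 12, 15, 6, 2]
cumulative = list(accumulate(freqs))
median_cost = point_class_median(values, freqs)
print(cumulative, median_cost)
```

The 20th and 21st observations both lie in the class x = 8, so the 20.5th position gives 8.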
Practice 4.4
Find x if the median of the following data, arranged in increasing order, is 10.
4, 8, 9, x, 13, 17
Solution
$\dfrac{n+1}{2}$ = ____
$\tilde{x}$ = value in the ___th position = $\dfrac{9 + x}{2} = 10$
9 + x = ___
x = ___
Practice 4.5
Find the median marks of statistics for technical students below.
Class Intervals for Marks    Number of Students
50-59                         4
60-69                         7
70-79                        13
80-89                         6
90-100                        5
Solution
Class Intervals     f     Cumulative f
50-59               4      4
60-69               7     11
70-79              13     __    (median class)
80-89               6     __
90-100              5     __
Total              35
$\dfrac{n}{2} = \dfrac{35}{2} = 17.5$
Locate the 17.5th position in the table. The median class is the class interval
70-79.
The median marks of statistics for technical students,
$\tilde{x} = L_m + \left[\dfrac{\frac{n}{2} - f_{m-1}}{f_m}\right] c_m$
= __ + [ (17.5 - __) / __ ] × __
= 74.5
4.3.3 Mode ($\hat{x}$)
The mode is the value that is repeated most often in the data set, with two or
more repetitions. Like the median and unlike the mean, the mode is not affected
by extreme values. It can also be used for qualitative data.
The formulas for the mode are:
(a) Ungrouped Data / Grouped Data with Point Class Intervals
$\hat{x}$ = the value that is repeated most often in the data set
(b) Grouped Data
$\hat{x} = L_{m_o} + \left[\dfrac{\Delta_1}{\Delta_1 + \Delta_2}\right] c_m$
where $L_{m_o}$ = lower class boundary of the mode class
$\Delta_1$ = frequency of the mode class - frequency of the class before the mode class
$\Delta_2$ = frequency of the mode class - frequency of the class after the mode class
$c_m$ = mode class size
Example 4.6
Calculate the mode for:
(a) 5, 9, 10, 12, 15, 18, 20
(b) 4, 3, 1, 3, 4, 5
(c) A1, A2, A2, A3, A4, A5
Solution:
(a) These data have no mode.
(b) The mode, $\hat{x}$ = 3 and 4.
(c) The mode, $\hat{x}$ = A2.
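Both mode rules can be sketched in Python. The function names `modes` and `grouped_mode` are hypothetical. The first follows the "two or more repetitions" rule from the text; the second applies the grouped-data formula to the weights in Table 4.1, taking the lower class boundary as the starting point and treating a missing neighbouring class as having frequency 0 (both are assumptions of this sketch).

```python
from collections import Counter

def modes(values):
    """Values repeated most often; an empty list if nothing repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top < 2:              # the text requires two or more repetitions
        return []
    return sorted(v for v, c in counts.items() if c == top)

def grouped_mode(boundaries, freqs):
    """Mode of grouped data from the mode class and its two neighbours."""
    m = freqs.index(max(freqs))                      # mode class
    before = freqs[m - 1] if m > 0 else 0            # assumed 0 if missing
    after = freqs[m + 1] if m + 1 < len(freqs) else 0
    d1 = freqs[m] - before
    d2 = freqs[m] - after
    lower, upper = boundaries[m]
    return lower + d1 / (d1 + d2) * (upper - lower)

mode_a = modes([5, 9, 10, 12, 15, 18, 20])
mode_b = modes([4, 3, 1, 3, 4, 5])
mode_c = modes(["A1", "A2", "A2", "A3", "A4", "A5"])

# Table 4.1: class boundaries (kg) and number of students.
boundaries = [(44.95, 49.95), (49.95, 54.95), (54.95, 59.95),
              (59.95, 64.95), (64.95, 69.95)]
freqs = [4, 10, 15, 20, 15]
mode_weight = grouped_mode(boundaries, freqs)

print(mode_a, mode_b, mode_c, round(mode_weight, 2))
```

For Table 4.1 the mode class is 60.0 - 64.9 with both differences equal to 5, so the sketch evaluates 59.95 + 5/(5 + 5) × 5.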
Example 4.7
A firm collected the information below from all 40 of its employees.
Class Intervals for Travelling Costs (RM)    Number of Employees
 2                                            5
 5                                           12
 8                                           15
10                                            6
12                                            2
What is the mode amount paid per employee for travelling expenses?
Solution:
Class Intervals (x)     f
 2                      5
 5                     12
 8                     15    (mode class)
10                      6
12                      2
Total                  40
The case above involves grouped data with point class intervals. From the table,
the highest frequency (15) occurs at x = 8. Hence, the mode, $\hat{x}$ = 8.
Practice 4.6
Find x if the mode for the following data is 10.
10, 4, 8, 9, x, 13, 17, 9, 10
Solution
The values 9 and 10 each already occur twice. For 10 to be the mode, it must be
the most frequent value, occurring 3 times. Hence, x = __.
Practice 4.7
Find the mode marks of statistics for technical students below.
Class Intervals for Marks    Number of Students
50-59                         4
60-69                         7
70-79                        13
80-89                         6
90-100                        5
Solution
Class Intervals     f
50-59               4
60-69               7
70-79              13    (mode class)
80-89               6
90-100              5
Total              35
The highest frequency is 13, so the mode class is the class interval 70-79.
The mode marks of statistics for technical students,
$\hat{x} = L_{m_o} + \left[\dfrac{\Delta_1}{\Delta_1 + \Delta_2}\right] c_m$
= __ + [ (13 - 7) / ( (13 - 7) + (__ - __) ) ] × __
= __
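Relationship Between Mean, Median and Mode
(a) Type of curve: Normal Distribution
[Figure: symmetrical curve with $\bar{x} = \tilde{x} = \hat{x}$ at the centre]
(b) Type of curve: Positively Skewed Distribution
[Figure: curve with a longer right tail; typically $\hat{x} < \tilde{x} < \bar{x}$]
(c) Type of curve: Negatively Skewed Distribution
[Figure: curve with a longer left tail; typically $\bar{x} < \tilde{x} < \hat{x}$]
Note: The highest peak of each curve indicates the value of the mode ($\hat{x}$).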
4.4 Measures of Dispersion
The degree to which numerical data tend to spread about an average value is
called the dispersion, or variation, of the data. As the dispersion becomes
higher, the data become less consistent. Several measures of dispersion are the
range, variance, standard deviation and coefficient of variation.
4.4.1 Range
The range is the difference between the largest and smallest numbers in the
set.
The formulas for the range are:
(a) Ungrouped Data
Range = $x_{\text{largest}} - x_{\text{smallest}}$
(b) Grouped Data
Range = upper boundary of the last class - lower boundary of the first class
Example 4.8
Calculate the range for:
5, 9, 10, 12, 15, 18, 20
Solution:
Range = 20 - 5 = 15.
Example 4.9
Calculate the range for:
Class Intervals for Travelling Costs (RM)    Number of Employees
 2                                            5
 5                                           12
 8                                           15
10                                            6
12                                            2
Solution:
Range = 12 - 2 = 10.
Practice 4.8
Find x if the range for the following ascending data is 15.
4, 8, 9, 9, 13, x
Solution
Given range = x - 4 = 15
x = 15 + __ = ___
Practice 4.9
Find the range of statistics marks for technical students below.
Class Intervals for Marks    Number of Students
50-59                         4
60-69                         7
70-79                        13
80-89                         6
90-100                        5
Solution
Range = upper boundary of the last class - lower boundary of the first class
= 100.5 - ___ = ___.
4.4.2 Variance
Two commonly used measures of variation that take into account how all
the data are distributed are the variance and the standard deviation. They
measure the average scatter around the mean, that is, how larger values
fluctuate above it and how smaller values are distributed below it.
In calculating the variance, the difference between each data value and the
mean is squared. Hence, the variance and standard deviation can never be
negative.
The formulas for variance are:
(a) Ungrouped Data
General Formula
Population variance, $\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Sample variance, $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
where N = population size
n = sample size
$\mu$ = population mean
$\bar{x}$ = sample mean
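Calculator/Shortcut Formula
Population variance, $\sigma^2 = \dfrac{\sum_{i=1}^{N} x_i^2 - \dfrac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N}$
Sample variance, $s^2 = \dfrac{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$
Note: The calculator/shortcut formula is usually easier to use.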
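(b) Grouped Data
General Formula
Population variance, $\sigma^2 = \dfrac{\sum_{i=1}^{k} f_i (x_i - \mu)^2}{\sum_{i=1}^{k} f_i}$
Sample variance, $s^2 = \dfrac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{k} f_i - 1}$
Calculator/Shortcut Formula
Population variance, $\sigma^2 = \dfrac{\sum_{i=1}^{k} f_i x_i^2 - \dfrac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{\sum_{i=1}^{k} f_i}}{\sum_{i=1}^{k} f_i}$
Sample variance, $s^2 = \dfrac{\sum_{i=1}^{k} f_i x_i^2 - \dfrac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{\sum_{i=1}^{k} f_i}}{\sum_{i=1}^{k} f_i - 1}$
Note: Population standard deviation, $\sigma = \sqrt{\text{Variance}} = \sqrt{\sigma^2}$
Sample standard deviation, $s = \sqrt{\text{Variance}} = \sqrt{s^2}$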
Calculator
By using a scientific calculator (fx-570MS or others):
1. Press <Mode> <Mode> <1> (for SD: Statistical Data).
2. Type (1st data) <M+> (2nd data) <M+> ... (last data) <M+>.
3. Press <Shift> <1>, then choose 1 ($\bar{x}$), 2 ($x\sigma_n$) or 3 ($x\sigma_{n-1}$),
where 1 is for the mean
2 is for the population standard deviation
3 is for the sample standard deviation
Example 4.10
Calculate the population variance and sample variance for:
5, 9, 10, 12, 15, 18, 20
Solution:
For Population Variance
(i) General Formula
x        (x - 12.7143)    (x - 12.7143)²
 5       -7.7143           59.5104
 9       -3.7143           13.7960
10       -2.7143            7.3674
12       -0.7143            0.5102
15        2.2857            5.2244
18        5.2857           27.9386
20        7.2857           53.0814
Total 89  -               167.4286
Mean = $\dfrac{\sum x}{n} = \dfrac{89}{7} = 12.7143$
Population variance, $\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \dfrac{167.4286}{7} = 23.9184$
OR
(ii) Calculator/Shortcut Formula
x        x²
 5        25
 9        81
10       100
12       144
15       225
18       324
20       400
Total 89 1299
Population variance, $\sigma^2 = \dfrac{\sum_{i=1}^{N} x_i^2 - \dfrac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N} = \dfrac{1299 - \dfrac{89^2}{7}}{7} = 23.9184$
For Sample Variance
Sample variance, $s^2 = \dfrac{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} = \dfrac{1299 - \dfrac{89^2}{7}}{6} = 27.9048$
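The two routes to the variance (general and shortcut) can be compared in a short Python sketch. The variable names are illustrative; `statistics.pvariance` and `statistics.variance` from the standard library are used only as a cross-check of the hand calculation.

```python
from statistics import pvariance, variance

data = [5, 9, 10, 12, 15, 18, 20]
n = len(data)
mean = sum(data) / n

# General formula: sum of squared deviations from the mean.
ss_general = sum((x - mean) ** 2 for x in data)

# Shortcut formula: sum of x squared minus (sum of x) squared over n.
sum_x = sum(data)
sum_x2 = sum(x * x for x in data)
ss_shortcut = sum_x2 - sum_x ** 2 / n

population_variance = ss_shortcut / n        # divide by N
sample_variance = ss_shortcut / (n - 1)      # divide by n - 1

print(sum_x, sum_x2)
print(round(population_variance, 4), round(sample_variance, 4))
print(round(pvariance(data), 4), round(variance(data), 4))
```

Both routes give the same sum of squares, so the choice between them is a matter of convenience.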
Example 4.11
Calculate the variance and standard deviation for travelling costs based on the
following data.
Class Intervals for Travelling Costs (RM)    Number of Employees
 2                                            5
 5                                           12
 8                                           15
10                                            6
12                                            2
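Solution:
Set up the columns of the table based on the sample variance formula,
$s^2 = \dfrac{\sum_{i=1}^{k} f_i x_i^2 - \dfrac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{\sum_{i=1}^{k} f_i}}{\sum_{i=1}^{k} f_i - 1}$
Class Intervals (x)     f     fx     fx²
 2                      5     10      20
 5                     12     60     300
 8                     15    120     960
10                      6     60     600
12                      2     24     288
Total                  40    274    2168
Variance for travelling costs = sample variance,
$s^2 = \dfrac{2168 - \dfrac{274^2}{40}}{39} = 7.4641$
Standard deviation, $s = \sqrt{s^2} = \sqrt{7.4641} = 2.7321$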
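The grouped sample variance for the travelling costs can be reproduced with a short Python sketch using the shortcut formula. The variable names are illustrative, and the standard deviation is taken as the square root of the variance, as stated in the note on standard deviation.

```python
import math

# Travelling costs (RM) and number of employees.
values = [2, 5, 8, 10, 12]
freqs = [5, 12, 15, 6, 2]

n = sum(freqs)                                          # total frequency
sum_fx = sum(f * x for x, f in zip(values, freqs))      # sum of f x
sum_fx2 = sum(f * x * x for x, f in zip(values, freqs)) # sum of f x squared

# Shortcut formula for the sample variance of grouped data.
sample_variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
sample_sd = math.sqrt(sample_variance)

print(n, sum_fx, sum_fx2)
print(round(sample_variance, 4), round(sample_sd, 4))
```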
[Problem statement missing in the source; only this part of the solution survives.]
x        x²
 4        16
 8        64
 9        81
13       169
Total 34 ____
Variance = [ ____ - (34)²/7 ] / 7 = ______
Practice 4.11
Find the variance of statistics marks for technical students below.
Class Intervals for Marks    Number of Students
50-59                         4
60-69                         7
70-79                        13
80-89                         6
90-100                        5
Solution
Variance of statistics marks = sample variance,
$s^2 = \dfrac{\sum_{i=1}^{k} f_i x_i^2 - \dfrac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{\sum_{i=1}^{k} f_i}}{\sum_{i=1}^{k} f_i - 1}$
= [ ____ - (___)²/35 ] / 34
= ______