
UNIT 4

FUNDAMENTAL STATISTICS
4.1 Introduction
This topic introduces the concept of statistics. It covers the definition and
application of statistics, including the types of data (grouped and ungrouped
data), measures of central tendency and measures of dispersion.
Objectives
At the end of the topic, you will be able to:
Differentiate between grouped data and ungrouped data
Construct grouped data from the given ungrouped data.
Calculate the measures of central tendencies (mean, mode and median)
for grouped and ungrouped data.
Calculate the measures of dispersion (range, variance, standard
deviation and coefficient of variation) for grouped and ungrouped data.
4.2 Types of Data: Grouped and Ungrouped
4.2.1 Ungrouped Data
Ungrouped data are raw data that have not been organized into a table. For
example, the number of visitors to a museum per day:
78, 45, 65, 67, 132, 78, 67, 79, 85, 98, 112, 123, 142, 122
4.2.2 Grouped Data
Grouped data are data that have been organized and summarized in a table. A
tabular arrangement of data by class intervals, together with the
corresponding class frequencies, is called grouped data or a frequency table.
Table 4.1 is a frequency table of the weights (in kg) of 64 students at MN
University.
Table 4.1 Weights of 64 students at MN University
Weight (kg)      Number of students
45.0 - 49.9       4
50.0 - 54.9      10
55.0 - 59.9      15
60.0 - 64.9      20
65.0 - 69.9      15
General Rules for Forming Grouped Data:
1. Determine the number of class intervals by using Sturges' method,
\[ k = 1 + 3.3 \log_{10} n \]
where k is the number of class intervals and n is the number of data or
observations.
2. Determine the range, where
Range = the difference between the largest and smallest numbers
3. Estimate the size of each class interval by using
\[ \text{Class size} = \frac{\text{Range}}{k} \]
Class Limits, Class Boundaries and Class Size
Class limits are the end values of a class interval. The value on the left is
called the lower limit and the value on the right is the upper limit.
A class boundary is the value midway between the upper limit of one class and
the lower limit of the next one.
Class size is the difference between the upper class boundary and the lower
class boundary of the same class.
From Table 4.1, for the 3rd class interval, 55.0 - 59.9:
Lower class limit = 55.0
Upper class limit = 59.9
\[ \text{Lower class boundary} = \frac{54.9 + 55.0}{2} = 54.95 \]
\[ \text{Upper class boundary} = \frac{59.9 + 60.0}{2} = 59.95 \]
Class size = 59.95 - 54.95 = 5
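The boundary calculation for the 3rd class can be checked with a short Python sketch. The function name `class_boundaries` and its argument order are illustrative assumptions, not part of the unit; `Decimal` is used only to keep the one-decimal limits exact.

```python
from decimal import Decimal

def class_boundaries(prev_upper, lower, upper, next_lower):
    """Return (lower boundary, upper boundary, class size) for one class.

    Each boundary is the midpoint between the neighbouring class limits.
    """
    lower_boundary = (Decimal(prev_upper) + Decimal(lower)) / 2
    upper_boundary = (Decimal(upper) + Decimal(next_lower)) / 2
    return lower_boundary, upper_boundary, upper_boundary - lower_boundary

# 3rd class of Table 4.1: 55.0 - 59.9, with neighbours ending at 54.9 and starting at 60.0
lb, ub, size = class_boundaries("54.9", "55.0", "59.9", "60.0")
print(lb, ub, size)
```

The limits are passed as strings so that values such as 54.9 are not subject to binary floating-point rounding.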
Example 4.1
The following data represent the amount of soft drink in a sample of fifty
2-liter bottles.
2.11 2.09 2.08 2.07 2.07 2.06 2.05 2.04 2.04 2.04
2.03 2.03 2.03 2.03 2.02 2.02 2.02 2.01 2.01 2.01
2.01 2.01 2.01 2.01 2.01 2.00 2.00 2.00 2.00 1.99
1.99 1.99 1.98 1.98 1.98 1.97 1.97 1.97 1.97 1.96
1.96 1.96 1.95 1.95 1.94 1.94 1.93 1.93 1.92 1.91
Construct the frequency distribution table.
Solution:
The number of class intervals,
\[ k = 1 + 3.3 \log_{10} n = 1 + 3.3 \log_{10} 50 = 6.6 \approx 7 \]
\[ \text{Class size} = \frac{\text{Range}}{k} = \frac{2.11 - 1.91}{7} \approx 0.03 \]
Then, the frequency distribution table is given as below,
Class Boundaries   Class Intervals   Frequency
1.905-1.935        1.91-1.93          4
1.935-1.965        1.94-1.96          7
1.965-1.995        1.97-1.99         10
1.995-2.025        2.00-2.02         15
2.025-2.055        2.03-2.05          8
2.055-2.085        2.06-2.08          4
2.085-2.115        2.09-2.11          2
Total                                50
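As a cross-check of Example 4.1, the following Python sketch rebuilds the frequency table from the raw data. Working in hundredths of a liter and rounding both k and the class size upwards are assumptions made here so that the classes always cover the whole range; the unit itself only shows the rounded values 7 and 0.03.

```python
import math

RAW = """
2.11 2.09 2.08 2.07 2.07 2.06 2.05 2.04 2.04 2.04
2.03 2.03 2.03 2.03 2.02 2.02 2.02 2.01 2.01 2.01
2.01 2.01 2.01 2.01 2.01 2.00 2.00 2.00 2.00 1.99
1.99 1.99 1.98 1.98 1.98 1.97 1.97 1.97 1.97 1.96
1.96 1.96 1.95 1.95 1.94 1.94 1.93 1.93 1.92 1.91
"""

# Convert to integer hundredths of a liter to avoid floating-point edge effects.
data = [round(float(token) * 100) for token in RAW.split()]

n = len(data)
k = math.ceil(1 + 3.3 * math.log10(n))                # Sturges' method, rounded up
class_size = math.ceil((max(data) - min(data)) / k)   # in hundredths of a liter

# Count how many observations fall into each class interval.
frequencies = [0] * k
for value in data:
    frequencies[(value - min(data)) // class_size] += 1

for i, f in enumerate(frequencies):
    lower = min(data) + i * class_size
    upper = lower + class_size - 1
    print(f"{lower / 100:.2f}-{upper / 100:.2f}  {f}")
```

The printed class intervals and frequencies should match the table in the solution above.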
Practice 4.1
By referring to the grouped data in Example 4.1, determine:
(a) The lower limit of the sixth class.
(b) The upper limit of the fourth class.
(c) The lower class boundary of the third class.
(d) The size of the fourth class interval.
Solution
(a) The lower limit of the sixth class = 2.06.
(b) The upper limit of the fourth class = ____.
(c) The lower class boundary of the third class = ____.
(d) The size of the fourth class interval = ____ - ____ = 0.03.
4.3 Measures of Central Tendencies
You can characterize any set of data by measuring its central tendency. Most
sets of data show a distinct tendency to group around a central value. Since
such typical values tend to lie centrally within a set of data arranged
according to magnitude, they are called measures of central tendency. Several
measures of central tendency can be defined, the most common being the
arithmetic mean, the median and the mode.
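4.3.1 Mean ($\bar{x}$)
The mean is also referred to as the arithmetic mean. It is the most commonly
used measure of central tendency. The mean serves as a balance point in a set
of data where all values play an equal role. However, it can be affected by
extreme values.
The formulas for the mean are:
(a) Ungrouped Data
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\text{total of observations}}{\text{sample size}} \]
(b) Grouped Data
\[ \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_k x_k}{f_1 + f_2 + \dots + f_k} \]
where $x_i$ is the value (or class midpoint) and $f_i$ is the frequency of the
i-th class.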
Example 4.2
Calculate the mean for;
5, 9, 10, 12, 15, 18, 20
Solution:
Mean,
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{5 + 9 + 10 + 12 + 15 + 18 + 20}{7} = \frac{89}{7} = 12.714 \]
Example 4.3
A firm collected the information below from all 40 of its employees.
Class Intervals for        Number of
Travelling Costs (RM)      Employees
 2                          5
 5                         12
 8                         15
10                          6
12                          2
What is the mean amount paid per employee for travelling expenses?
Solution:
Class Intervals (x)    f     fx
 2                     5     10
 5                    12     60
 8                    15    120
10                     6     60
12                     2     24
Total                 40    274
The mean amount paid per employee for travelling expenses,
\[ \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{274}{40} = \text{RM } 6.85 \]
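Both mean formulas can be verified with a minimal Python sketch. The helper names `mean_ungrouped` and `mean_grouped` are illustrative assumptions; the data are those of Examples 4.2 and 4.3.

```python
def mean_ungrouped(values):
    """Arithmetic mean: total of the observations divided by the sample size."""
    return sum(values) / len(values)

def mean_grouped(xs, fs):
    """Grouped mean: sum of f*x divided by the sum of the frequencies."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)

m_example_4_2 = mean_ungrouped([5, 9, 10, 12, 15, 18, 20])
m_example_4_3 = mean_grouped([2, 5, 8, 10, 12], [5, 12, 15, 6, 2])
print(round(m_example_4_2, 3), m_example_4_3)  # 12.714 6.85
```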
Practice 4.2
Find x if the mean for the following data is 10.
4, 8, 9, 9, x, 13, 17
Solution
Given mean,
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = 10 \]
\[ \frac{4 + 8 + 9 + 9 + x + 13 + 17}{7} = 10 \]
__ + x = 10(7) = 70
x = 70 - __ = ___
Practice 4.3
Find the mean marks of statistics for technical students below.
Class Intervals      Number of
for Marks            Students
50-59                 4
60-69                 7
70-79                13
80-89                 6
90-100                5
Solution
Class Intervals     x       f     fx
50-59              54.5     4    218
60-69              ___      7    ___
70-79              ___     13    ___
80-89              ___      6    ___
90-100             ___      5    ___
Total               -      ___   ___
The mean marks of statistics,
$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$ = ___ / ___ = ____
4.3.2 Median ($\tilde{x}$)
The median is the value located at the centre of a set of data that has been
ordered from the lowest to the highest value. Hence, fifty percent of the
observations lie below the median and the other fifty percent lie above it.
The median is not affected by extreme values and can be used for qualitative
data that can be ranked.
The formulas for the median are:
(a) Ungrouped Data / Grouped Data with Point Class Intervals
\[ \tilde{x} = \text{the value in the } \left( \frac{n+1}{2} \right)\text{th position} \]
(in a data set arranged in ascending order)
(b) Grouped Data
\[ \tilde{x} = L_m + \left[ \frac{\frac{n}{2} - f_{m-1}}{f_m} \right] c_m \]
where n = sample size
$L_m$ = lower class boundary of the median class
$f_{m-1}$ = cumulative frequency before the median class
$f_m$ = frequency of the median class
$c_m$ = median class size
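The grouped-data median formula above can be sketched in Python as follows. The function name `grouped_median` and the representation of each class as a (lower boundary, upper boundary, frequency) tuple are assumptions made for illustration. The data are the statistics marks table used in the practices of this unit, whose median is given as 74.5.

```python
def grouped_median(classes):
    """Median of grouped data.

    `classes` is a list of (lower_boundary, upper_boundary, frequency)
    tuples in ascending order.
    """
    n = sum(f for _, _, f in classes)
    half = n / 2
    cumulative = 0  # cumulative frequency before the current class
    for lower, upper, f in classes:
        if cumulative + f >= half:
            # The current class is the median class.
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("frequency table is empty")

marks = [
    (49.5, 59.5, 4),
    (59.5, 69.5, 7),
    (69.5, 79.5, 13),
    (79.5, 89.5, 6),
    (89.5, 100.5, 5),
]
median_marks = grouped_median(marks)
print(median_marks)  # 74.5
```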
Example 4.4
Calculate the median for;
(a) 5, 9, 10, 12, 15, 18, 20
(b) 4, 3, 1, 6, 7, 5
Solution:
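(a) Ascending order: 5, 9, 10, 12, 15, 18, 20
\[ \frac{n+1}{2} = \frac{7+1}{2} = 4 \]
\[ \tilde{x} = \text{the value in the 4th position} = 12 \]
(b) Ascending order: 1, 3, 4, 5, 6, 7
\[ \frac{n+1}{2} = \frac{6+1}{2} = 3.5 \]
\[ \tilde{x} = \text{the value in the 3.5th position} = \frac{4 + 5}{2} = 4.5 \]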
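The position rule used in Example 4.4 can be written as a short Python sketch. The function name `median_by_position` is an assumption for illustration; the result is compared with `statistics.median` from the Python standard library.

```python
import statistics

def median_by_position(values):
    """Median as the value in the ((n + 1) / 2)-th position of the sorted data."""
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2      # 1-based position, may end in .5
    lower = ordered[int(position) - 1]
    if position.is_integer():
        return lower
    # A position such as 3.5 lies halfway between the 3rd and 4th values.
    upper = ordered[int(position)]
    return (lower + upper) / 2

median_a = median_by_position([5, 9, 10, 12, 15, 18, 20])
median_b = median_by_position([4, 3, 1, 6, 7, 5])
print(median_a, median_b)  # 12 4.5
```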
Example 4.5
A firm collected the information below from all 40 of its employees.
Class Intervals for        Number of
Travelling Costs (RM)      Employees
 2                          5
 5                         12
 8                         15
10                          6
12                          2
What is the median amount paid per employee for travelling expenses?
Solution:
Class Intervals (x)    f     Σf     fx
 2                     5      5     10
 5                    12     17     60
 8                    15     32    120    (median class)
10                     6     38     60
12                     2     40     24
Total                 40      -    274
\[ \frac{n+1}{2} = \frac{41}{2} = 20.5 \]
The case above involves grouped data with point class intervals. From the
cumulative frequency column of the table, the 20.5th position falls in the
class x = 8. Hence,
\[ \tilde{x} = \text{the value in the 20.5th position} = 8 \]
Practice 4.4
Find x if the median of the following data, listed in ascending order, is 10.
4, 8, 9, x, 13, 17
Solution
$\frac{n+1}{2}$ = ____
$\tilde{x}$ = the value in the ___th position = $\frac{9 + x}{2}$ = 10
9 + x = ___
x = ___
Practice 4.5
Find the median marks of statistics for technical students below.
Class Intervals      Number of
for Marks            Students
50-59                 4
60-69                 7
70-79                13
80-89                 6
90-100                5
Solution
Class Intervals (x)    f     Σf
50-59                  4      4
60-69                  7     11
70-79                 13     __    (median class)
80-89                  6     __
90-100                 5     __
Total                 35
\[ \frac{n}{2} = \frac{35}{2} = 17.5 \]
Locate the 17.5th position in the table. The median class is the class
interval 70-79.
The median marks of statistics for technical students,
\[ \tilde{x} = L_m + \left[ \frac{\frac{n}{2} - f_{m-1}}{f_m} \right] c_m \]
= __ + [ (17.5 - __) / __ ] __ = 74.5
4.3.3 Mode ($\hat{x}$)
The mode is the value that is repeated most often in the data set, with two or
more repetitions. Like the median and unlike the mean, the mode is not
affected by extreme values. It can also be used for qualitative data.
The formulas for the mode are:
(a) Ungrouped Data / Grouped Data with Point Class Intervals
\[ \hat{x} = \text{the value that is repeated most often in the data set} \]
(b) Grouped Data
\[ \hat{x} = L_{mo} + \left[ \frac{\Delta_1}{\Delta_1 + \Delta_2} \right] c_m \]
where $L_{mo}$ = lower class boundary of the mode class
$\Delta_1$ = frequency of the mode class - frequency of the class before the mode class
$\Delta_2$ = frequency of the mode class - frequency of the class after the mode class
$c_m$ = mode class size
Example 4.6
Calculate the mode for;
(a) 5, 9, 10, 12, 15, 18, 20
(b) 4, 3, 1, 3, 4, 5
(c) A1, A2, A2, A3, A4, A5
Solution:
(a) These data have no mode.
(b) The mode, $\hat{x}$ = 3 and 4.
(c) The mode, $\hat{x}$ = A2.
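The following Python sketch covers both mode rules. The function names `modes` and `grouped_mode` are illustrative assumptions. The ungrouped data come from Example 4.6; the grouped call uses the class 110 - 120 from Exercise 8 (frequency 15, with neighbouring frequencies 10 and 9).

```python
from collections import Counter

def modes(values):
    """Return all values that occur most often, or [] if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest < 2:
        return []  # every value occurs once, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

def grouped_mode(lower_boundary, f_before, f_mode, f_after, class_size):
    """Mode of grouped data from the mode class and its two neighbours."""
    delta_1 = f_mode - f_before
    delta_2 = f_mode - f_after
    return lower_boundary + delta_1 / (delta_1 + delta_2) * class_size

print(modes([5, 9, 10, 12, 15, 18, 20]))              # []
print(modes([4, 3, 1, 3, 4, 5]))                      # [3, 4]
print(modes(["A1", "A2", "A2", "A3", "A4", "A5"]))    # ['A2']

mode_exercise_8 = grouped_mode(110, 10, 15, 9, 10)
print(round(mode_exercise_8, 3))
```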
Example 4.7
A firm collected the information below from all 40 of its employees.
Class Intervals for        Number of
Travelling Costs (RM)      Employees
 2                          5
 5                         12
 8                         15
10                          6
12                          2
What is the mode amount paid per employee for travelling expenses?
Solution:
Class Intervals (x)    f
 2                     5
 5                    12
 8                    15    (mode class)
10                     6
12                     2
Total                 40
The case above involves grouped data with point class intervals. From the
table, the highest frequency (15) belongs to the class x = 8. Hence, the mode,
$\hat{x}$ = 8.
Practice 4.6
Find x if the mode for the following data is 10.
10, 4, 8, 9, x, 13, 17, 9, 10
Solution
Without x, both 9 and 10 occur twice. For 10 to be the mode, it must occur most
often, that is, 3 times. Hence, x = __.
Practice 4.7
Find the mode marks of statistics for technical students below.
Class Intervals      Number of
for Marks            Students
50-59                 4
60-69                 7
70-79                13
80-89                 6
90-100                5
Solution
Class Intervals (x)    f
50-59                  4
60-69                  7
70-79                 13    (mode class)
80-89                  6
90-100                 5
Total                 35
The highest frequency is 13, so the mode class is the class interval 70-79.
The mode marks of statistics for technical students,
\[ \hat{x} = L_{mo} + \left[ \frac{\Delta_1}{\Delta_1 + \Delta_2} \right] c_m \]
= __ + [ (13 - 7) / ((13 - 7) + (__ - __)) ] __ = __
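Relationship Between Mean, Median and Mode
(a) Type of curve: Normal Distribution
[Figure: symmetric curve with the mean, median and mode at the same point]
\[ \bar{x} = \tilde{x} = \hat{x} \]
(b) Type of curve: Positively Skewed Distribution
[Figure: curve with a long right tail]
\[ \hat{x} < \tilde{x} < \bar{x} \]
(c) Type of curve: Negatively Skewed Distribution
[Figure: curve with a long left tail]
\[ \bar{x} < \tilde{x} < \hat{x} \]
Note: The highest peak of each curve indicates the value of the mode
($\hat{x}$).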
4.4 Measures of Dispersion
The degree to which numerical data tend to spread about an average value is
called the dispersion, or variation, of the data. The higher the dispersion,
the less consistent the data. Common measures of dispersion are the range,
variance, standard deviation and coefficient of variation.
4.4.1 Range
The range is the difference between the largest and smallest numbers in the
set.
The formulas for range are:
(a) Ungrouped Data
\[ \text{Range} = x_{\text{largest}} - x_{\text{smallest}} \]
(b) Grouped Data
Range = Upper boundary of the last class - lower boundary of the first class
Example 4.8
Calculate the range for;
5, 9, 10, 12, 15, 18, 20
Solution:
Range = 20 - 5 = 15.
Example 4.9
Calculate the range for;
Class Intervals for        Number of
Travelling Costs (RM)      Employees
 2                          5
 5                         12
 8                         15
10                          6
12                          2
Solution:
Range = 12 - 2 = 10.
Practice 4.8
Find x if the range for the following ascending data is 15.
4, 8, 9, 9, 13, x
Solution
Given range = x - 4 = 15
x = 15 + __ = ___
Practice 4.9
Find the range of statistics marks for technical students below.
Class Intervals      Number of
for Marks            Students
50-59                 4
60-69                 7
70-79                13
80-89                 6
90-100                5
Solution
Range = Upper boundary of the last class - lower boundary of the first class
= 100.5 - ___ = ___.
4.4.2 Variance
Two commonly used measures of variation that take into account how all the
data are distributed are the variance and the standard deviation. These
parameters measure the average scatter around the mean. They indicate how
larger values fluctuate above the mean and how smaller values are distributed
below it. In calculating the variance, the difference between each data value
and the mean is squared. Hence, the variance and standard deviation can never
be negative.
The formulas for variance are:
(a) Ungrouped Data
General Formula, where N = population size:
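Population Variance,
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
Sample Variance,
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
where n = sample size, $\mu$ = population mean and $\bar{x}$ = sample mean
Calculator/Shortcut Formula
Population Variance,
\[ \sigma^2 = \frac{\sum_{i=1}^{N} x_i^2 - \frac{\left( \sum_{i=1}^{N} x_i \right)^2}{N}}{N} \]
Sample Variance,
\[ s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n}}{n - 1} \]
Note: The calculator/shortcut formula is usually easier to use.
(b) Grouped Data
General Formula
Population Variance,
\[ \sigma^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \mu)^2}{\sum_{i=1}^{k} f_i} \]
Sample Variance,
\[ s^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{k} f_i - 1} \]
Calculator/Shortcut Formula
Population Variance,
\[ \sigma^2 = \frac{\sum_{i=1}^{k} f_i x_i^2 - \frac{\left( \sum_{i=1}^{k} f_i x_i \right)^2}{\sum_{i=1}^{k} f_i}}{\sum_{i=1}^{k} f_i} \]
Sample Variance,
\[ s^2 = \frac{\sum_{i=1}^{k} f_i x_i^2 - \frac{\left( \sum_{i=1}^{k} f_i x_i \right)^2}{\sum_{i=1}^{k} f_i}}{\sum_{i=1}^{k} f_i - 1} \]
Note: Population Standard Deviation, $\sigma = \sqrt{\text{Variance}} = \sqrt{\sigma^2}$
Sample Standard Deviation, $s = \sqrt{\text{Variance}} = \sqrt{s^2}$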
Calculator
By using a scientific calculator (fx 570 MS or others):
1. Press <Mode> <Mode> <1> (for SD: Statistical Data).
2. Type (1st data) <M+> (2nd data) <M+> ... (last data) <M+>.
3. Press <Shift> <1>, then select:
1 for the mean ($\bar{x}$)
2 for the population standard deviation ($x\sigma_n$)
3 for the sample standard deviation ($x\sigma_{n-1}$)
Example 4.10
Calculate the population variance and sample variance for;
5, 9, 10, 12, 15, 18, 20
Solution:
For Population Variance
(i) General Formula
x        (x - 12.7143)    (x - 12.7143)^2
 5         -7.7143           59.5104
 9         -3.7143           13.7960
10         -2.7143            7.3674
12         -0.7143            0.5102
15          2.2857            5.2244
18          5.2857           27.9386
20          7.2857           53.0814
Total 89       -            167.4286
Mean,
\[ \mu = \frac{\sum x}{N} = \frac{89}{7} = 12.7143 \]
Population Variance,
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{167.4286}{7} = 23.9184 \]
OR
(ii) Calculator/Shortcut Formula
x        x^2
 5        25
 9        81
10       100
12       144
15       225
18       324
20       400
Total 89  1299
Population Variance,
\[ \sigma^2 = \frac{\sum_{i=1}^{N} x_i^2 - \frac{\left( \sum_{i=1}^{N} x_i \right)^2}{N}}{N} = \frac{1299 - \frac{89^2}{7}}{7} = 23.9184 \]
For Sample Variance
Sample Variance,
\[ s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n}}{n - 1} = \frac{1299 - \frac{89^2}{7}}{6} = 27.9048 \]
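The results of Example 4.10 can be reproduced with a short Python sketch of the calculator/shortcut formula. The function name `shortcut_variance` is an assumption for illustration; `statistics.pvariance` and `statistics.variance` from the Python standard library serve as an independent check.

```python
import statistics

def shortcut_variance(values, sample=True):
    """Variance from the sum of x and the sum of x squared (shortcut formula)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    numerator = sum_x2 - sum_x ** 2 / n
    # Divide by n - 1 for a sample and by N for a population.
    return numerator / (n - 1 if sample else n)

data = [5, 9, 10, 12, 15, 18, 20]
population_variance = shortcut_variance(data, sample=False)
sample_variance = shortcut_variance(data, sample=True)
print(round(population_variance, 4), round(sample_variance, 4))  # 23.9184 27.9048
```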
Example 4.11
Calculate the variance and standard deviation for travelling costs based on the
following data.
Class Intervals for        Number of
Travelling Costs (RM)      Employees
 2                          5
 5                         12
 8                         15
10                          6
12                          2
Solution:
Set up the columns of the table based on the sample variance formula,
\[ s^2 = \frac{\sum_{i=1}^{k} f_i x_i^2 - \frac{\left( \sum_{i=1}^{k} f_i x_i \right)^2}{\sum_{i=1}^{k} f_i}}{\sum_{i=1}^{k} f_i - 1} \]
x        f      fx     x^2    f x^2
 2       5      10       4      20
 5      12      60      25     300
 8      15     120      64     960
10       6      60     100     600
12       2      24     144     288
Total   40     274       -    2168
Variance for travelling costs = sample variance,
\[ s^2 = \frac{2168 - \frac{274^2}{40}}{39} = 7.4641 \]
Standard deviation for travelling costs,
\[ s = \sqrt{7.4641} = 2.7321 \]
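The grouped shortcut formula used in Example 4.11 can be sketched in Python as below. The function name `grouped_sample_variance` is an illustrative assumption; the values and frequencies are those of the travelling costs table.

```python
import math

def grouped_sample_variance(xs, fs):
    """Sample variance of grouped data via the calculator/shortcut formula."""
    total_f = sum(fs)
    sum_fx = sum(f * x for x, f in zip(xs, fs))
    sum_fx2 = sum(f * x * x for x, f in zip(xs, fs))
    return (sum_fx2 - sum_fx ** 2 / total_f) / (total_f - 1)

costs = [2, 5, 8, 10, 12]
employees = [5, 12, 15, 6, 2]

s2 = grouped_sample_variance(costs, employees)
s = math.sqrt(s2)
print(round(s2, 4), round(s, 4))
```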
142
Practice 4.10
Find the population variance for the following data.
4, 8, 9, 13
Solution
x        x^2
 4        16
 8        64
 9        81
13       169
Total 34  ____
Population variance,
\[ \sigma^2 = \frac{\sum_{i=1}^{N} x_i^2 - \frac{\left( \sum_{i=1}^{N} x_i \right)^2}{N}}{N} \]
= [ ___ - (34)^2 / 4 ] / 4 = ______
Practice 4.11
Find the variance of statistics marks for technical students below.
Class Intervals      Number of
for Marks            Students
50-59                 4
60-69                 7
70-79                13
80-89                 6
90-100                5
Solution
Class Intervals     f       x       fx      x^2        f x^2
50-59               4      54.5    218     2970.25     11881
60-69               7      64.5    ___     _____       _____
70-79              13      ___     ___     _____       _____
80-89               6      ___     ___     _____       _____
90-100              5      ___     ___     _____       _____
Total              35       -      ___       -         ___
Variance of statistics marks = sample variance,
\[ s^2 = \frac{\sum_{i=1}^{k} f_i x_i^2 - \frac{\left( \sum_{i=1}^{k} f_i x_i \right)^2}{\sum_{i=1}^{k} f_i}}{\sum_{i=1}^{k} f_i - 1} \]
= [ ____ - (___)^2 / 35 ] / 34 = ______
4.4.3 Coefficient of Variation
The coefficient of variation is a relative measure of variation that is always
expressed as a percentage rather than in terms of the units of the particular
data. The coefficient of variation, denoted by CV, measures the scatter in the
data relative to the mean.
\[ CV = \frac{s}{\bar{x}} \times 100 \]
where s = sample standard deviation and $\bar{x}$ = sample mean
Example 4.12
The operations manager of a package delivery service samples 200 packages, and
finds that the mean weight is 26.0 pounds with a standard deviation of 3.9 pounds.
The mean volume is 8.8 cubic feet with a standard deviation of 2.2 cubic feet. How
can the operations manager compare the variation of the weight and the volume?
Solution:
For weight, the CV is,
\[ CV_w = \frac{s}{\bar{x}} \times 100 = \frac{3.9}{26} \times 100 = 15\% \]
For volume, the CV is,
\[ CV_v = \frac{s}{\bar{x}} \times 100 = \frac{2.2}{8.8} \times 100 = 25\% \]
Thus, relative to the mean, the variation of the package volume is much higher than
the variation of the package weight.
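The comparison in Example 4.12 can be checked with a minimal Python sketch. The function name `coefficient_of_variation` is an assumption for illustration.

```python
def coefficient_of_variation(std_dev, mean):
    """Coefficient of variation as a percentage: (s / mean) * 100."""
    return std_dev / mean * 100

cv_weight = coefficient_of_variation(3.9, 26.0)
cv_volume = coefficient_of_variation(2.2, 8.8)
print(f"CV weight = {cv_weight:.0f}%, CV volume = {cv_volume:.0f}%")
```

Because the CV has no units, the weight (pounds) and the volume (cubic feet) can be compared directly even though they are measured on different scales.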
Practice 4.12
Below are data for the number of hours each week that households watch
television. The information was gathered through interviews conducted in an
industrial area and a town area. Compare the consistency of the number of hours
spent watching television between the two locations and discuss the implications.
Industrial Area Town Area
4 41 15 49 32 21 20 30
25 27 30 10 11 7 34 8
24 14 15 12
Solution
From the calculator;
Industrial area: $\bar{x}$ = 25.12, s = 15.25
Town area: $\bar{x}$ = ____, s = ____
$CV_{industrial} = \frac{s}{\bar{x}} \times 100$ = ___ x 100 = 60.7%
$CV_{town} = \frac{s}{\bar{x}} \times 100$ = ___ x 100 = ___
The smaller percentage comes from the ________ area. Hence, the amount of time
spent watching television in the ________ area is more consistent than in
the _______ area.
EXERCISE
1. Below are the total fares (to the nearest RM) collected on a Monday by a random
sample of 20 taxis belonging to a particular taxi company in a city.
95, 92, 93, 115, 147, 185, 126, 127, 143, 157,
101, 93, 123, 133, 83, 51, 132, 129, 125, 135
Construct the frequency table with 50 as the lower class limit for the first class
and a class size of 20.
2. Based on Exercise 1, find;
(a) Lower class limit for the second class.
(b) Upper class boundary for the fourth class.
(c) Mode class.
(d) Median class.
3. Find the mean, mode and median of the data below;
5, 6, 8, 11, 14, 14, 17, 20
4. If the number 40 is added to the data in Question No. 3, which measure best
describes the central tendency of those data? State your reason.
5. The curve below represents the type of data distribution.
[Figure: distribution curve with the values 50, a and b marked on the horizontal axis]
(a) State the type of data distribution.
(b) If the value of the median is higher than the mode by 10 and the value of
the mode is lower than the mean by 14, what are the values of a and b?
6. A random sample of 500 students were interviewed and asked to rate their
lecturers' performance.
Perception     Number of Students
Excellent       70
Good           200
Fair           170
Poor            50
Very poor       10
What is the value of mean, median and mode?
7. A random sample of 500 households was interviewed and data on the number of
cars owned were recorded below. Find the mean, mode and median.
Number of cars     Number of households
0                   50
1                  250
2                  170
3                   60
4                   15
8. Calculate the mean, median and mode for the following amount received on Hari
Raya festival.
Amount received (RM)     Number of children
 80 - 90                  2
 90 - 100                 5
100 - 110                10
110 - 120                15
120 - 130                 9
130 - 140                 7
140 - 150                 2
9. Find the range and variance for the data set below.
110, 112, 98, 100, 115, 95, 100, 60
10. The data below state the age group and the number of summons for various
violations of traffic rules in 2005 and 2006 at Johor Bahru.
Age (years)     Number of summons
                2005     2006
20 - 30         120      150
30 - 40         300      330
40 - 50         250      230
50 - 60          70       90
60 - 70          25       20
Answer the following questions.
(a) State the range of the age group.
(b) Find the mean of the age group.
(c) Calculate the standard deviation of the age group.
(d) Which year shows a more consistent data distribution?
Answers to Exercise:
1.
Class Boundary      Class Interval (RM)     Number of taxis
 49.5 - 69.5         50 - 69                 1
 69.5 - 89.5         70 - 89                 1
 89.5 - 109.5        90 - 109                5
109.5 - 129.5       110 - 129                6
129.5 - 149.5       130 - 149                5
149.5 - 169.5       150 - 169                1
169.5 - 189.5       170 - 189                1
2. (a) 70 (b) 129.5 (c) 110 - 129
(d) $\frac{n}{2} = \frac{20}{2} = 10$. Since the 10th position is located in the
interval 110 - 129, the median class is 110 - 129.
3. Mean = 11.875 Median = 12.5 Mode = 14
4. Median, since it is not affected by extreme values.
5. (a) Positively skewed distribution. (b) a = 60 and b = 64
6. Mean cannot be calculated.
Median = Mode = Good (the category with frequency 200).
7. Mean = 1.5 Median = Mode = 2.
8. Mean = 115.6 Median = 115.33 Mode = 114.545
9. Range = 55
Variance = 297.93 (standard deviation = 17.261)
10. (a) Range (2005) = Range (2006) = 50
(b) Mean (2005) = 39.5098, Mean (2006) = 38.90244
(c) Standard deviation (2005) = 9.703806, Standard deviation (2006) = 9.853146
(d) CV (2005) = 24.56%, CV (2006) = 25.33%
Hence, the 2005 data show a more consistent distribution.
Activity
The students in your class are asked to state two things: the number of
siblings (including themselves) and the number of pens or pencils used in
class. Record the data as they are given, then construct frequency tables as
shown below.
Number of siblings    Frequency    Number of pens or pencils    Frequency
1-2                                1-2
3-4                                3-4
5-6                                5-6
7-8                                7-8
More than 8                        More than 8
Total                              Total
In your conclusion, state:
(a) The highest and the lowest percentages for the number of siblings and the
number of pens or pencils used in class.
(b) The mean, mode, median and standard deviation for the number of siblings and
the number of pens or pencils used in class.
(c) Which item shows a more consistent data distribution.