
SQQS1013 Elementary Statistics

DESCRIPTIVE
STATISTICS

2.1 INTRODUCTION

Raw data - Data recorded in the sequence in which they are collected, before they are processed or ranked.

Array data - Raw data that are arranged in ascending or descending order.

Example 1
Here is a list of questions asked in a large statistics class and the raw data given by one of the students:
1. What is your sex (m = male, f = female)?
   Answer: m
2. How many hours did you sleep last night?
   Answer: 5 hours
3. Randomly pick a letter, S or Q.
   Answer: S
4. What is your height in inches?
   Answer: 67 inches
5. What's the fastest you've ever driven a car (mph)?
   Answer: 110 mph

Example 2

Quantitative raw data

Qualitative raw data

These data are also called ungrouped data.

Chapter 2: Descriptive Statistics


2.2 ORGANIZING AND GRAPHING QUALITATIVE DATA


2.2.1 Frequency Distributions Table

A frequency distribution for qualitative data lists all categories and the
number of elements that belong to each of the categories.

It exhibits how the frequencies are distributed over the various categories.

It is also called a frequency distribution table or simply a frequency table.
e.g.: The number of students who belong to a certain category is called the frequency of that category.

2.2.2 Relative Frequency and Percentage Distribution

A relative frequency distribution is a listing of all categories along with their relative frequencies (given as proportions or percentages).

It is commonplace to give the frequency and relative frequency distribution together.

Calculating relative frequency and percentage of a category

FORMULA

\[ \text{Relative frequency of a category} = \frac{\text{Frequency of that category}}{\text{Sum of all frequencies}} \]

Percentage (%) = (Relative frequency) × 100

Example 3
A sample of UUM staff-owned vehicles produced by Proton was identified and the
make of each noted. The resulting sample follows (W = Wira, Is = Iswara, Wj =
Waja, St = Satria, P = Perdana, Sv = Savvy):
Construct a frequency distribution table for these data with their relative frequency
and percentage.

W    Is   Wj   Wj   St
W    W    Is   Sv   W
P    W    Wj   W    W
Is   Wj   Sv   Is   W
Is   Is   W    P    W
P    W    W    Sv   St
Is   W    W    Wj   St
W    Is   Wj   Wj   P
St   W    St   W    Wj
Wj   Wj   W    W    Sv

Solution:
Category    Frequency    Relative Frequency    Percentage (%)
Wira           19
Iswara          8
Perdana         4
Waja           10
Satria          5
Savvy           4
Total
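
The tallying and the relative frequency and percentage calculations can be checked with a short script. This is only a sketch: the variable names (`sample`, `names`, `table`) are illustrative and not part of the notes, and the data are typed in as they appear in Example 3.

```python
from collections import Counter

# Sample from Example 3 (50 Proton vehicles), typed in as listed.
sample = """
W Is Wj Wj St   W W Is Sv W    P W Wj W W     Is Wj Sv Is W   Is Is W P W
P W W Sv St     Is W W Wj St   W Is Wj Wj P   St W St W Wj    Wj Wj W W Sv
""".split()

# Abbreviations as defined in the example statement.
names = {"W": "Wira", "Is": "Iswara", "Wj": "Waja",
         "St": "Satria", "P": "Perdana", "Sv": "Savvy"}

counts = Counter(sample)          # frequency of each category
n = sum(counts.values())          # sum of all frequencies

# category -> (frequency, relative frequency, percentage)
table = {names[k]: (f, f / n, 100 * f / n) for k, f in counts.items()}

for category, (f, rel, pct) in table.items():
    print(f"{category:<8}{f:>4}{rel:>8.2f}{pct:>8.1f}")
```

The frequencies produced this way agree with the Frequency column of the solution table, and the relative frequencies sum to 1.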

2.2.3 Graphical Presentation of Qualitative Data


a) Bar Graphs

A graph made of bars whose heights represent the frequencies of the respective categories.

Such a graph is most helpful when you have many categories to represent.

Notice that a gap is inserted between each of the bars.

There are four types of bar chart:
o simple/vertical bar chart
o horizontal bar chart
o component bar chart
o multiple bar chart

Simple/ Vertical Bar Chart


To construct a vertical bar chart, mark the various categories on the horizontal
axis and mark the frequencies on the vertical axis.

Horizontal Bar Chart


To construct a horizontal bar chart, mark the various categories on the vertical
axis and mark the frequencies on the horizontal axis.

[Figure: horizontal bar chart "UUM Staff-owned Vehicles Produced by Proton"; vertical axis: Types of Vehicle; horizontal axis: Frequency]

Component Bar Chart

To construct a component bar chart, all categories are placed in one bar and each bar is divided into components.
The height of each component should tally with the frequency it represents.

Example 4
Suppose we want to illustrate the information below, representing the number of people participating in the activities offered by an outdoor pursuits centre during June of three consecutive years.

            2004   2005   2006
Climbing      21     34     36
Caving        10     12     21
Walking       75     85    100
Sailing       36     36     40
Total        142    167    191

Solution:

[Figure: component bar chart "Activities Breakdown (June)"; x-axis: Year (2004, 2005, 2006); y-axis: Number of participants; components: Climbing, Caving, Walking, Sailing]

Multiple Bar Chart

To construct a multiple bar chart, the bars that represent the categories are gathered in groups.

The height of each bar represents the frequency of its category.

Useful for making comparisons (two or more values).

The bar graphs for relative frequency and percentage distributions can be drawn simply by marking the relative frequencies or percentages, instead of the class frequencies.

[Figure: multiple bar chart "Activities Breakdown (June)"; x-axis: Year (2004, 2005, 2006); y-axis: Number of participants; bars: Climbing, Caving, Walking, Sailing]

b) Pie Chart

A circle divided into portions that represent the relative frequencies or percentages of a population or a sample belonging to different categories.

An alternative to the bar chart, useful for summarizing a single categorical variable if there are not too many categories.

The chart makes it easy to compare the relative sizes of each class/category.

The whole pie represents the total sample or population. The pie is divided into different portions that represent the different categories.

To construct a pie chart, we multiply 360° by the relative frequency of each category to obtain the degree measure, or size of the angle, for the corresponding category.
Example 5
Movie Genres       Frequency    Relative Frequency    Angle Size
Comedy                54              0.27            360 × 0.27 = 97.2°
Action                36              0.18            360 × 0.18 = 64.8°
Romance               28              0.14            360 × 0.14 = 50.4°
Drama                 28              0.14            360 × 0.14 = 50.4°
Horror                22              0.11            360 × 0.11 = 39.6°
Foreign               16              0.08            360 × 0.08 = 28.8°
Science Fiction       16              0.08            360 × 0.08 = 28.8°
Total                200              1.00            360°
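
The angle sizes in Example 5 can be reproduced with a few lines of Python. This is a minimal sketch; the dictionary name `genres` and the helper `angles` are illustrative choices, not part of the notes.

```python
# Frequencies from Example 5.
genres = {"Comedy": 54, "Action": 36, "Romance": 28, "Drama": 28,
          "Horror": 22, "Foreign": 16, "Science Fiction": 16}

total = sum(genres.values())  # sum of all frequencies

# Angle for each category = 360 degrees x relative frequency.
angles = {genre: 360 * f / total for genre, f in genres.items()}

for genre, angle in angles.items():
    print(f"{genre:<16}{genres[genre] / total:>6.2f}{angle:>8.1f} degrees")
```

Because the relative frequencies sum to 1, the angles sum to 360°, which is a quick check on the table.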


c) Line Graph/Time Series Graph

A graph that represents data occurring over a specific period of time.

Line graphs are widely used because their visual characteristics reveal data trends clearly and they are easy to create.

When analyzing the graph, look for a trend or pattern that occurs over the time period.

For example, is the line ascending (indicating an increase over time) or descending (indicating a decrease over time)?

Another thing to look for is the slope, or steepness, of the line. A line
that is steep over a specific time period indicates a rapid increase or
decrease over that period.

Two data sets can be compared on the same graph (called a compound time series graph) if two lines are used.

Data collected on the same element for the same variable at different
points in time or for different periods of time are called time series data.

A line graph is a visual comparison of how two variables (shown on the x- and y-axes) are related or vary with each other. It shows related information by drawing a continuous line between all the points on a grid.

Line graphs compare two variables: one is plotted along the x-axis (horizontal) and the other along the y-axis (vertical).

The y-axis in a line graph usually indicates quantity (e.g., RM, number of sales, litres) or percentage, while the horizontal x-axis often measures units of time. As a result, the line graph is often viewed as a time series graph.

Example 6
A transit manager wishes to use the following data for a presentation showing
how Port Authority Transit ridership has changed over the years. Draw a time
series graph for the data and summarize the findings.

Year    Ridership (in millions)
1990    88.0
1991    85.0
1992    75.7
1993    76.6
1994    75.4

Solution:

[Figure: time series graph of ridership; x-axis: Year (1990 to 1994); y-axis: Ridership (in millions)]

The graph shows a decline in ridership through 1992 and then leveling off for the years
1993 and 1994.

EXERCISE 1
1. The following data show the method of payment by 16 customers in a supermarket checkout line (C = cash, CK = check, CC = credit card, D = debit and O = other).

C    CK   CK   C    CC   D    O    C
CK   CC   D    CC   C    CK   CK   CC

a. Construct a frequency distribution table.
b. Calculate the relative frequencies and percentages for all categories.
c. Draw a pie chart for the percentage distribution.

2. The frequency distribution table below shows the sales of certain products at ZeeZee Company. The frequency of sales in a certain period is given for each product. Find the relative frequency and the percentage of each product. Then, construct a pie chart using the information obtained.

Type of Product    Frequency    Relative Frequency    Percentage    Angle Size
A                     13
B                     12
C                      5
D                      9
E                     11

3. Draw a time series graph to represent the data for the number of worldwide airline
fatalities for the given years.
Year                 1990   1991   1992   1993   1994   1995   1996
No. of fatalities     440    510    990    801    732    557   1132

4. A questionnaire about how people get news resulted in the following information
from 25 respondents (N = newspaper, T = television, R = radio, M = magazine).
N    R    M    T    T
N    N    M    R    R
R    T    N    M    R
T    M    R    N    N
T    R    N    M    N

a. Construct a frequency distribution for the data.
b. Construct a bar graph for the data.
5. The given information shows the export and import trade, in million RM, for four months of sales in a certain year. Using the information provided, present these data in a component bar graph.

Month        Export    Import
September      28        20
October        30        28
November       32        17
December       24        14

6. The following information represents the maximum rainfall in millimetres (mm) in each state in Malaysia. You are supposed to help a meteorologist in your place to make an analysis. Based on your knowledge, present this information using the most appropriate chart and give your comment.

State                                 Quantity (mm)
Perlis                                     435
Kedah                                      512
Pulau Pinang                               163
Perak                                      721
Selangor                                   664
Wilayah Persekutuan Kuala Lumpur          1003
Negeri Sembilan                            390
Melaka                                     223
Johor                                      876
Pahang                                    1050
Terengganu                                1255
Kelantan                                   986
Sarawak                                    878
Sabah                                      456

2.3 ORGANIZING AND GRAPHING QUANTITATIVE DATA


2.3.1 Stem-and-Leaf Display

In a stem-and-leaf display of quantitative data, each value is divided into two portions: a stem and a leaf. The leaves for each stem are then shown separately in a display.

It gives information about the pattern of the data.

It can reveal which values are frequently repeated.

Example 7
25   12    9   10    5   12   23    7
36   13   11   12   31   28   37    6
14   41   38   44   13   22   18   19

Solution:
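
One way to build the display for Example 7 is sketched below. It assumes the tens digit is used as the stem and the units digit as the leaf, which is the usual choice for two-digit data; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

# Data from Example 7.
data = [25, 12, 9, 10, 5, 12, 23, 7,
        36, 13, 11, 12, 31, 28, 37, 6,
        14, 41, 38, 44, 13, 22, 18, 19]

def stem_and_leaf(values):
    """Return {stem: sorted leaves}, using tens as stem and units as leaf."""
    groups = defaultdict(list)
    for v in sorted(values):           # sorting first keeps the leaves ordered
        groups[v // 10].append(v % 10)
    # Include every stem between the smallest and largest, even if empty.
    return {s: groups.get(s, []) for s in range(min(groups), max(groups) + 1)}

display = stem_and_leaf(data)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Each printed row is one stem followed by its leaves in increasing order, so repeated values (for example the three 12s) show up as repeated leaves on the same row.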

2.3.2 Frequency Distributions


A frequency distribution for quantitative data lists all the classes and the
number of values that belong to each class.

Data presented in the form of a frequency distribution are called grouped data.

The class boundary is given by the midpoint of the upper limit of one class and the lower limit of the next class. It is also called the real class limit.

To find the midpoint of the upper limit of the first class and the lower limit of the second class, we divide the sum of these two limits by 2.
e.g.:

\[ \text{Class boundary} = \frac{400 + 401}{2} = 400.5 \]

Class Width (class size)

FORMULA

Class width = Upper boundary - Lower boundary

e.g.:
Width of the first class = 600.5 - 400.5 = 200

Class Midpoint or Mark

FORMULA

\[ \text{Class midpoint or mark} = \frac{\text{Lower limit} + \text{Upper limit}}{2} \]

e.g.:

\[ \text{Midpoint of the 1st class} = \frac{401 + 600}{2} = 500.5 \]

Constructing Frequency Distribution Tables


1. To decide the number of classes, we use Sturges' formula:
FORMULA

\[ c = 1 + 3.3 \log n \]

where
c is the number of classes
n is the number of observations in the data set.

2. Class width,

FORMULA

\[ i = \frac{\text{Largest value} - \text{Smallest value}}{\text{Number of classes}} = \frac{\text{Range}}{c} \]

This class width is rounded up to a convenient number.

3. Lower Limit of the First Class or the Starting Point
Use the smallest value in the data set.

Example 8

The following data give the total home runs hit by all players of each of the 30 Major League Baseball teams during the 2004 season.

i) Number of classes,

\[ c = 1 + 3.3 \log 30 = 1 + 3.3(1.48) = 5.88 \approx 6 \text{ classes} \]

ii) Class width,

\[ i = \frac{242 - 135}{6} = 17.8 \approx 18 \]

iii) Starting point = 135

Table 2.10: Frequency Distribution for Data of Table 2.9

Total Home Runs    Tally            f
135 - 152          ||||| |||||     10
153 - 170          ||               2
171 - 188          |||||            5
189 - 206          ||||| |          6
207 - 224          |||              3
225 - 242          ||||             4
                               Σf = 30
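
The three construction steps can be sketched in code. The sketch below uses only the summary values quoted in Example 8 (n = 30, smallest value 135, largest value 242). Rounding the number of classes and the width up with `math.ceil` is an assumption made for this illustration, and the class limits assume whole-number data; the function names are hypothetical.

```python
import math

def sturges_classes(n):
    """Number of classes c = 1 + 3.3 log10(n), rounded up (assumed rounding rule)."""
    return math.ceil(1 + 3.3 * math.log10(n))

def class_limits(smallest, largest, n):
    """Lower and upper limits of each class for whole-number data."""
    c = sturges_classes(n)
    width = math.ceil((largest - smallest) / c)   # class width i, rounded up
    return [(smallest + k * width, smallest + (k + 1) * width - 1)
            for k in range(c)]

limits = class_limits(135, 242, 30)
for lower, upper in limits:
    print(f"{lower} - {upper}")
```

With these inputs the function returns six classes of width 18 starting at 135, matching the classes of Table 2.10.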

2.3.3 Relative Frequency and Percentage Distributions


FORMULA

\[ \text{Relative frequency of a class} = \frac{\text{Frequency of that class}}{\text{Sum of all frequencies}} = \frac{f}{\sum f} \]

Percentage = (Relative frequency) × 100

Example 9
(Refer to Example 8)
Table 2.11: Relative Frequency and Percentage Distributions

Total Home Runs    Class Boundaries              Relative Frequency    %
135 - 152          134.5 to less than 152.5      0.3333                33.33
153 - 170          152.5 to less than 170.5      0.0667                 6.67
171 - 188          170.5 to less than 188.5      0.1667                16.67
189 - 206          188.5 to less than 206.5      0.2000                20.00
207 - 224          206.5 to less than 224.5      0.1000                10.00
225 - 242          224.5 to less than 242.5      0.1333                13.33
Total                                            1.0                   100%

2.3.4 Graphing Grouped Data


a) Histograms
A histogram is a graph in which the class boundaries are marked on the
horizontal axis and either the frequencies, relative frequencies, or percentages
are marked on the vertical axis. The frequencies, relative frequencies or
percentages are represented by the heights of the bars.
In a histogram, the bars are drawn adjacent to each other and there is a space
between the y-axis and the first bar.

Example 10
(Refer to Example 8)

[Figure: frequency histogram for Table 2.9; x-axis: Total home runs (class boundaries 134.5 to 242.5); y-axis: Frequency]

b) Polygon

A graph formed by joining the midpoints of the tops of successive bars in a histogram with straight lines is called a polygon.

Example 11

[Figure: frequency polygon for Table 2.11; x-axis: class boundaries 134.5 to 242.5]

For a very large data set, as the number of classes is increased (and the width of
classes is decreased), the frequency polygon eventually becomes a smooth
curve called a frequency distribution curve or simply a frequency curve.

[Figure: frequency distribution curve]

c) Shape of Histogram

The shape of a histogram is described in the same way as that of a polygon.

The most common shapes are:
(i) Symmetric
(ii) Right skewed
(iii) Left skewed

[Figures: symmetric histograms; right-skewed and left-skewed histograms]

Describing data using graphs helps us gain insight into the main characteristics of the
data.
When interpreting a graph, we should be very cautious. We should observe
carefully whether the frequency axis has been truncated or whether any axis has
been unnecessarily shortened or stretched.

2.3.5 Cumulative Frequency Distributions


A cumulative frequency distribution gives the total number of values that
fall below the upper boundary of each class.
Example 12
Using the frequency distribution of Table 2.11,

Total Home Runs    Class Boundaries              f     Cumulative Frequency
135 - 152          134.5 to less than 152.5     10
153 - 170          152.5 to less than 170.5      2
171 - 188          170.5 to less than 188.5      5
189 - 206          188.5 to less than 206.5      6
207 - 224          206.5 to less than 224.5      3
225 - 242          224.5 to less than 242.5      4
Ogive
An ogive is a curve drawn for the cumulative frequency distribution by joining with straight lines the dots marked above the upper boundaries of classes at heights equal to the cumulative frequencies of the respective classes.
Two types of ogive:
(i) ogive less than
(ii) ogive more than

First, build a table of cumulative frequency.

Example 13 (Ogive Less Than)

Earnings (RM)    Number of students (f)
30 - 39            5
40 - 49            6
50 - 59            6
60 - 69            3
70 - 79            3
80 - 89            7
Total             30

Earnings (RM)      Cumulative Frequency (F)
Less than 29.5       0
Less than 39.5       5
Less than 49.5      11
Less than 59.5      17
Less than 69.5      20
Less than 79.5      23
Less than 89.5      30

[Figure: "Graph Ogive Less Than"; x-axis: Earnings (29.5 to 89.5); y-axis: Cumulative Frequency (0 to 35)]

Example 14 (Ogive More Than)

Earnings (RM)    Number of students (f)
30 - 39            5
40 - 49            6
50 - 59            6
60 - 69            3
70 - 79            3
80 - 89            7
Total             30

Earnings (RM)      Cumulative Frequency (F)
More than 29.5      30
More than 39.5      25
More than 49.5      19
More than 59.5      13
More than 69.5      10
More than 79.5       7
More than 89.5       0

[Figure: "Graph Ogive More Than"; x-axis: Earnings (29.5 to 89.5); y-axis: Cumulative Frequency (0 to 35)]
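
The two cumulative frequency tables in Examples 13 and 14 can be generated from the class frequencies. The following is a small sketch; the list names are illustrative, and the boundaries are obtained by assuming whole-number class limits (boundary = limit ± 0.5).

```python
from itertools import accumulate

# Classes 30-39, 40-49, ..., 80-89 and their frequencies (Examples 13 and 14).
lower_limits = [30, 40, 50, 60, 70, 80]
freqs = [5, 6, 6, 3, 3, 7]

# Class boundaries: 29.5, 39.5, ..., 89.5.
boundaries = [lower_limits[0] - 0.5] + [lo + 9.5 for lo in lower_limits]

# "Less than" cumulative frequencies start at 0 below the first boundary.
less_than = [0] + list(accumulate(freqs))

# "More than" cumulative frequencies are the complements of the "less than" ones.
total = less_than[-1]
more_than = [total - F for F in less_than]

for b, lt, mt in zip(boundaries, less_than, more_than):
    print(f"{b:>5}  less than: {lt:>2}  more than: {mt:>2}")
```

Plotting `less_than` or `more_than` against `boundaries` and joining the points with straight lines gives the corresponding ogive.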

2.3.6 Box-Plot

Describes the data graphically using five measurements: smallest value, first quartile (K1), second quartile (median or K2), third quartile (K3) and largest value.

[Figure: box plot for symmetric data, marking the smallest value, K1, median, K3 and largest value]

[Figure: box plot for left-skewed data, marking the smallest value, K1, median, K3 and largest value]

[Figure: box plot for right-skewed data, marking the smallest value, K1, median, K3 and largest value]

2.4 MEASURES OF CENTRAL TENDENCY


2.4.1 Ungrouped Data Measurement

Mean
FORMULA

Mean for population data:

\[ \mu = \frac{\sum x}{N} \]

Mean for sample data:

\[ \bar{x} = \frac{\sum x}{n} \]

where:
\(\sum x\) = the sum of all values
N = the population size
n = the sample size
\(\mu\) = the population mean
\(\bar{x}\) = the sample mean

Example 15
The following data give the prices (rounded to the nearest thousand RM) of five homes sold recently in Sekayang.

158    189    265    127    191

Find the mean sale price for these homes.

Solution:

Thus, these five homes were sold for an average price of RM186 thousand, i.e. RM186 000.

The mean has the advantage that its calculation includes each value of
the data set.
Weighted Mean

Used when the values do not all carry the same weight (importance).

Weighted mean:
FORMULA

\[ \bar{x}_w = \frac{\sum wx}{\sum w} \]

where w is a weight.

Example 16
Consider the data on electrical components purchased from a factory in the table below:

Type     Number of components (w)    Cost/unit (x)
1              1200                    RM3.00
2               500                    RM3.40
3              2500                    RM2.80
4              1000                    RM2.90
5               800                    RM3.25
Total          6000

Solution:

\[ \bar{x}_w = \frac{\sum wx}{\sum w} = \frac{1200(3) + 500(3.4) + 2500(2.8) + 1000(2.9) + 800(3.25)}{1200 + 500 + 2500 + 1000 + 800} = \frac{17800}{6000} = 2.967 \]

The mean cost of a unit of the component is RM2.97.
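
The weighted mean of Example 16 can be verified with a short function. This is a sketch only; `weighted_mean` is an illustrative helper, not a library function.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Example 16: cost per unit (x) and number of components (w).
costs = [3.00, 3.40, 2.80, 2.90, 3.25]
quantities = [1200, 500, 2500, 1000, 800]

mean_cost = weighted_mean(costs, quantities)
print(f"Weighted mean cost per unit: RM{mean_cost:.2f}")
```

Note that the simple (unweighted) mean of the five unit costs would be RM3.07, which ignores the fact that most components were bought at the cheaper prices.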

Median

Median is the value of the middle term in a data set that has been
ranked in increasing order.

Procedure for finding the Median


Step 1: Rank the data set in increasing order.
Step 2: Determine the depth (position or location) of the median.
FORMULA

\[ \text{Depth of median} = \frac{n + 1}{2} \]

Step 3: Determine the value of the median.

Example 17

Find the median for the following data:

10    5    19

Solution:
(1) Rank the data in increasing order

(2) Determine the depth of the median

\[ \text{Depth of median} = \frac{n + 1}{2} = \frac{5 + 1}{2} = 3 \]

(3) Determine the value of the median
Therefore the median is located in the third position of the data set.

Hence, the median for the above data =

Example 18
Find the median for the following data:

10    5    19    8    3    15

Solution:
(1) Rank the data in increasing order

(2) Determine the depth of the median

\[ \text{Depth of median} = \frac{n + 1}{2} = \frac{6 + 1}{2} = 3.5 \]

(3) Determine the value of the median
Therefore the median is located midway between the 3rd and 4th positions of the data set.

\[ \text{Median} = \frac{8 + 10}{2} = 9 \]

Hence, the median for the above data =

The median gives the center of a histogram, with half of the data values
to the left of (or, less than) the median and half to the right of (or, more
than) the median.

The advantage of using the median is that it is not influenced by outliers.

Mode


Mode is the value that occurs with the highest frequency in a data set.

Example 19
1. What is the mode for the given data?
77   69   74   81   71   68   74   73

2. What is the mode for the given data?
77   69   68   74   81   71   68   74   73

Solution:
1. Mode =
2. Mode =

A major shortcoming of the mode is that a data set may have none or
may have more than one mode.

One advantage of the mode is that it can be calculated for both kinds of
data, quantitative and qualitative.
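
The three ungrouped measures can be checked with Python's standard `statistics` module. The sketch below reuses the data of Examples 15, 18 and 19; the variable names are illustrative. `statistics.multimode` is used for the mode because it returns every value tied for the highest frequency, which matters for data sets with more than one mode.

```python
import statistics

prices = [158, 189, 265, 127, 191]                 # Example 15
values = [10, 5, 19, 8, 3, 15]                     # Example 18
set_1 = [77, 69, 74, 81, 71, 68, 74, 73]           # Example 19, question 1
set_2 = [77, 69, 68, 74, 81, 71, 68, 74, 73]       # Example 19, question 2

mean_price = statistics.mean(prices)       # sum of values / number of values
median_value = statistics.median(values)   # average of the two middle values when n is even
modes_1 = statistics.multimode(set_1)      # list of all most frequent values
modes_2 = statistics.multimode(set_2)

print(mean_price, median_value, modes_1, modes_2)
```

The second data set has two values tied for the highest frequency, illustrating the shortcoming of the mode mentioned above.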

2.4.2 Grouped Data Measurement

Mean
FORMULA

Mean for population data:

\[ \mu = \frac{\sum fx}{N} \]

Mean for sample data:

\[ \bar{x} = \frac{\sum fx}{n} \]

where x is the midpoint and f is the frequency of a class.

Example 20
The following table gives the frequency distribution of the number of orders received
each day during the past 50 days at the office of a mail-order company. Calculate the
mean.
Number of orders    f
10 - 12              4
13 - 15             12
16 - 18             20
19 - 21             14
                n = 50

Solution:
Because the data set includes only 50 days, it represents a sample. The value of Σfx is calculated in the following table:

Number of orders    f        fx
10 - 12              4
13 - 15             12
16 - 18             20
19 - 21             14
                n = 50

The value of the sample mean is:

Thus, this mail-order company received an average of 16.64 orders per day during
these 50 days.
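
The grouped mean of Example 20 can be reproduced as follows. This is a minimal sketch with illustrative variable names; each class is represented by its midpoint, as in the formula above.

```python
# Example 20: class limits and frequencies.
classes = [(10, 12), (13, 15), (16, 18), (19, 21)]
freqs = [4, 12, 20, 14]

# Class midpoint x = (lower limit + upper limit) / 2.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

n = sum(freqs)                                            # sample size
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))     # sum of f * x
mean_orders = sum_fx / n

print(f"sum(fx) = {sum_fx}, n = {n}, mean = {mean_orders:.2f}")
```

The result agrees with the 16.64 orders per day stated in the example.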

Median
Step 1: Construct the cumulative frequency distribution.
Step 2: Decide the class that contains the median.
The median class is the first class whose cumulative frequency is at least n/2.

Step 3: Find the median by using the following formula:

FORMULA

\[ \text{Median} = L_m + \left( \frac{\frac{n}{2} - F}{f_m} \right) i \]

where:
n = the total frequency
F = the cumulative frequency before the median class
i = the class width
\(L_m\) = the lower boundary of the median class
\(f_m\) = the frequency of the median class

Example 21

Based on the grouped data below, find the median:

Time to travel to work    Frequency
1 - 10                       8
11 - 20                     14
21 - 30                     12
31 - 40                      9
41 - 50                      7

Solution:
1st Step: Construct the cumulative frequency distribution

Time to travel to work    Frequency    Cumulative Frequency
1 - 10                       8
11 - 20                     14
21 - 30                     12
31 - 40                      9
41 - 50                      7

Thus, 25 persons take less than 23 minutes to travel to work and another 25
persons take more than 23 minutes to travel to work.
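
The three steps for the grouped median can be sketched in code using the data of Example 21. The function name `grouped_median` is illustrative, and the lower boundary is taken as the lower limit minus 0.5, which assumes whole-number class limits.

```python
from itertools import accumulate

# Example 21: class limits and frequencies.
classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
freqs = [8, 14, 12, 9, 7]

def grouped_median(classes, freqs):
    n = sum(freqs)
    cumulative = list(accumulate(freqs))                 # Step 1
    # Step 2: first class whose cumulative frequency is at least n/2.
    m = next(k for k, cum in enumerate(cumulative) if cum >= n / 2)
    lower, upper = classes[m]
    L = lower - 0.5                                      # lower boundary (assumes integer limits)
    i = upper - lower + 1                                # class width
    F = cumulative[m - 1] if m > 0 else 0                # cumulative frequency before the class
    return L + (n / 2 - F) / freqs[m] * i                # Step 3

median_time = grouped_median(classes, freqs)
print(median_time)
```

Here the median class is 21 - 30, and the computed value matches the 23 minutes quoted in the conclusion of the example.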

Mode

Mode is the value that has the highest frequency in a data set.

For grouped data, class mode (or, modal class) is the class with the
highest frequency.

Formula of mode for grouped data:

FORMULA

\[ \text{Mode} = L_{mo} + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) i \]

where:
\(L_{mo}\) is the lower boundary of the class mode
\(\Delta_1\) is the difference between the frequency of the class mode and the frequency of the class before the class mode
\(\Delta_2\) is the difference between the frequency of the class mode and the frequency of the class after the class mode
i is the class width

Example 22

Based on the grouped data below, find the mode.

Time to travel to work    Frequency
1 - 10                       8
11 - 20                     14
21 - 30                     12
31 - 40                      9
41 - 50                      7

Solution:
Based on the table,

We can also obtain the mode by using the histogram.
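
The grouped-mode formula can be applied to the data of Example 22 with a few lines of code. This is a sketch under two assumptions that are not stated in the notes: class boundaries are the limits ± 0.5 (whole-number limits), and a missing neighbouring class is treated as having frequency 0. The function name `grouped_mode` is illustrative.

```python
# Example 22: class limits and frequencies.
classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
freqs = [8, 14, 12, 9, 7]

def grouped_mode(classes, freqs):
    m = freqs.index(max(freqs))                        # position of the class mode
    lower, upper = classes[m]
    L = lower - 0.5                                    # lower boundary (assumes integer limits)
    i = upper - lower + 1                              # class width
    before = freqs[m - 1] if m > 0 else 0              # assumption: 0 if no class before
    after = freqs[m + 1] if m < len(freqs) - 1 else 0  # assumption: 0 if no class after
    d1 = freqs[m] - before
    d2 = freqs[m] - after
    return L + d1 / (d1 + d2) * i

mode_time = grouped_mode(classes, freqs)
print(mode_time)
```

For these data the class mode is 11 - 20, with differences of 6 and 2 from the neighbouring classes, so the formula evaluates to 10.5 + (6/8)(10).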


2.4.3 Relationship among Mean, Median & Mode


As discussed in the previous topic, a histogram or a frequency distribution curve
can assume either a skewed shape or a symmetrical shape.
Knowing the values of the mean, median and mode can give us some idea
about the shape of the frequency curve.
(1)

For a symmetrical histogram and frequency curve with one peak, the
value of the mean, median and mode are identical and they lie at the
center of the distribution.

Mean, median, and mode for a symmetric histogram and frequency distribution curve

(2)

For a histogram and a frequency curve skewed to the right, the value of
the mean is the largest, that of the mode is the smallest, and the value
of the median lies between these two.

Mean, median, and mode for a histogram and frequency distribution curve
skewed to the right

(3)

For a histogram and a frequency curve skewed to the left, the value of
the mean is the smallest and that of the mode is the largest and the
value of the median lies between these two.

Mean, median, and mode for a histogram and frequency distribution curve
skewed to the left

2.5 DISPERSION MEASUREMENT

The measures of central tendency such as mean, median and mode do not
reveal the whole picture of the distribution of a data set.

Two data sets with the same mean may have completely different spreads.

The variation among the values of observations for one data set may be
much larger or smaller than for the other data set.

2.5.1 Ungrouped Data Measurement


Range
FORMULA

RANGE = Largest value - Smallest value

Example 23

Find the range of production for this data set,

Solution:
Range = Largest value - Smallest value
      = 267 277 - 49 651
      = 217 626
Disadvantages:
o It is influenced by outliers.
o It is based on two values only. All other values in a data set are ignored.

Variance and Standard Deviation

Standard deviation is the most commonly used measure of dispersion.

The value of the standard deviation tells how closely the values of a data set are clustered around the mean.

A lower value of the standard deviation indicates that the values of the data set are spread over a relatively smaller range around the mean.

A larger value of the standard deviation indicates that the values of the data set are spread over a relatively larger range around the mean (far from the mean).

The standard deviation is obtained by taking the positive square root of the variance:

FORMULA
Variance for population:

\[ \sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N} \]

Variance for sample:

\[ s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} \]

FORMULA
Standard deviation for population:

\[ \sigma = \sqrt{\sigma^2} \]

Standard deviation for sample:

\[ s = \sqrt{s^2} \]

Example 24
Let x denote the total production (in units) of each company.

Company    Production
A             62
B             93
C            126
D             75
E             34

Find the variance and standard deviation.

Solution:
Company    Production (x)    x²
A             62
B             93
C            126
D             75
E             34
         Σx = 390
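
The variance and standard deviation for Example 24 can be computed as sketched below. The five companies are treated as a sample here (divisor n - 1), which is an assumption, since the example does not say whether the data are a sample or a population. The shortcut form with Σx² is used, and the result is cross-checked against `statistics.variance` from the standard library.

```python
import math
import statistics

production = [62, 93, 126, 75, 34]       # Example 24

n = len(production)
sum_x = sum(production)                  # sum of x
sum_x2 = sum(x * x for x in production)  # sum of x squared

# Sample variance by the shortcut formula, then its positive square root.
sample_var = (sum_x2 - sum_x ** 2 / n) / (n - 1)
sample_sd = math.sqrt(sample_var)

print(f"sum(x) = {sum_x}, sum(x^2) = {sum_x2}")
print(f"s^2 = {sample_var:.2f}, s = {sample_sd:.2f}")
```

If the five companies were instead the whole population, the divisor would be N = 5 rather than n - 1 = 4, giving a smaller variance.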


The properties of variance and standard deviation:

o The standard deviation is a measure of variation of all values from the mean.

o The values of the variance and the standard deviation are never negative. Also, larger values of variance or standard deviation indicate greater amounts of variation.

o The value of s can increase dramatically with the inclusion of one or more outliers.

o The measurement units of variance are always the square of the measurement units of the original data, while the units of standard deviation are the same as the units of the original data values.

2.5.2 Grouped Data Measurement

Range

FORMULA

Range = Upper bound of last class - Lower bound of first class

Class        Frequency
41 - 50          1
51 - 60          3
61 - 70          7
71 - 80         13
81 - 90         10
91 - 100         6
Total           40

Upper bound of last class = 100.5
Lower bound of first class = 40.5
Range = 100.5 - 40.5 = 60

Variance and Standard Deviation


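FORMULA

Variance for population:

\[ \sigma^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{N}}{N} \]

Variance for sample:

\[ s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1} \]

FORMULA

Standard Deviation:

Population: \(\sigma = \sqrt{\sigma^2}\)
Sample: \(s = \sqrt{s^2}\)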

Example 25
Find the variance and standard deviation for the following data:

No. of orders    f
10 - 12           4
13 - 15          12
16 - 18          20
19 - 21          14
Total        n = 50

Solution:

No. of orders    f        fx       fx²
10 - 12           4
13 - 15          12
16 - 18          20
19 - 21          14
Total        n = 50

Variance,

Standard Deviation,

Thus, the standard deviation of the number of orders received at the office of this mail-order company during the past 50 days is 2.75.
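
The grouped variance and standard deviation of Example 25 can be checked with the sketch below. It uses the sample formulas (the 50 days are treated as a sample, as in Example 20) and class midpoints for x; the variable names are illustrative.

```python
import math

# Example 25: class limits and frequencies.
classes = [(10, 12), (13, 15), (16, 18), (19, 21)]
freqs = [4, 12, 20, 14]

midpoints = [(lower + upper) / 2 for lower, upper in classes]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))        # sum of f * x
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))   # sum of f * x^2

# Sample variance for grouped data, then its positive square root.
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
std_dev = math.sqrt(variance)

print(f"sum(fx) = {sum_fx}, sum(fx^2) = {sum_fx2}")
print(f"s^2 = {variance:.4f}, s = {std_dev:.2f}")
```

The standard deviation rounds to the 2.75 stated in the example.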

2.5.3 Relative Dispersion Measurement

Used to compare, based on their dispersion, two or more distributions that have different units, OR

to compare two or more distributions that have the same unit but a big difference in their mean values.

Also called modified coefficient or coefficient of variation, CV.

FORMULA

\[ CV = \frac{s}{\bar{x}} \times 100\% \quad \text{(sample)} \]

\[ CV = \frac{\sigma}{\mu} \times 100\% \quad \text{(population)} \]

Example 26
Given the mean and standard deviation of the monthly salary for two groups of workers at ABC company (Group 1: 700 and 20; Group 2: 1070 and 20), find the CV for each group and determine which group is more dispersed.

Solution:

\[ CV_1 = \frac{20}{700} \times 100\% = 2.86\% \]

\[ CV_2 = \frac{20}{1070} \times 100\% = 1.87\% \]

The monthly salary of the group 1 workers is more dispersed than that of group 2.
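
The comparison in Example 26 can be reproduced with a small helper. The function name `coefficient_of_variation` is illustrative; it simply applies the CV formula to a given standard deviation and mean.

```python
def coefficient_of_variation(std_dev, mean):
    """CV as a percentage: (standard deviation / mean) x 100."""
    return std_dev / mean * 100

cv_1 = coefficient_of_variation(20, 700)     # Group 1
cv_2 = coefficient_of_variation(20, 1070)    # Group 2

print(f"CV1 = {cv_1:.2f}%, CV2 = {cv_2:.2f}%")
print("More dispersed:", "Group 1" if cv_1 > cv_2 else "Group 2")
```

Both groups have the same standard deviation, so the difference in CV comes entirely from the difference in their means.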

2.6 MEASURE OF POSITION

Determines the position of a single value in relation to other values in a sample or a population data set.

Quartiles
Quartiles are three summary measures that divide a ranked data set into four equal parts.

o The 1st quartile is denoted as Q1
FORMULA

\[ \text{Depth of } Q_1 = \frac{n + 1}{4} \]

o The 2nd quartile is the median of the data set, or Q2

o The 3rd quartile is denoted as Q3
FORMULA

\[ \text{Depth of } Q_3 = \frac{3(n + 1)}{4} \]

Example 27
The table below lists the total revenue for the top 11 tourism companies in Malaysia.

109.7   79.9   121.2   76.4   80.2   82.1   79.4   89.3   98.0   103.5   86.8

Solution:
Step 1: Arrange the data in increasing order

76.4   79.4   79.9   80.2   82.1   86.8   89.3   98.0   103.5   109.7   121.2

Step 2: Determine the depth for Q1 and Q3

\[ \text{Depth of } Q_1 = \frac{n + 1}{4} = \frac{11 + 1}{4} = 3 \]

\[ \text{Depth of } Q_3 = \frac{3(n + 1)}{4} = \frac{3(11 + 1)}{4} = 9 \]

Step 3: Determine Q1 and Q3

76.4   79.4   79.9   80.2   82.1   86.8   89.3   98.0   103.5   109.7   121.2

Q1 = 79.9 ; Q3 = 103.5
Example 28
The table below lists the total revenue for the top 12 tourism companies in Malaysia.

109.7   79.9   74.1   98.0   103.5   86.8   121.2   76.4   80.2   82.1   79.4   89.3

Solution:
Step 1: Arrange the data in increasing order

74.1   76.4   79.4   79.9   80.2   82.1   86.8   89.3   98.0   103.5   109.7   121.2

Step 2: Determine the depth for Q1 and Q3

\[ \text{Depth of } Q_1 = \frac{n + 1}{4} = \frac{12 + 1}{4} = 3.25 \]

\[ \text{Depth of } Q_3 = \frac{3(n + 1)}{4} = \frac{3(12 + 1)}{4} = 9.75 \]

Step 3: Determine Q1 and Q3

74.1   76.4   79.4   79.9   80.2   82.1   86.8   89.3   98.0   103.5   109.7   121.2

Q1 = 79.4 + 0.25 (79.9 - 79.4) = 79.525
Q3 = 98.0 + 0.75 (103.5 - 98.0) = 102.125
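
The depth method used in Examples 27 and 28 can be written as a short function. This is a sketch of the (n + 1)/4 rule from these notes, with linear interpolation when the depth is not a whole number; it is not the same rule as every statistics package uses by default, so results from other software may differ slightly. The helper names are illustrative.

```python
def value_at_depth(ranked, depth):
    """Value at a 1-based depth in ranked data, interpolating between neighbours."""
    k = int(depth)               # whole part of the depth (position)
    frac = depth - k             # fractional part of the depth
    lower = ranked[k - 1]
    if frac == 0:
        return lower
    return lower + frac * (ranked[k] - lower)

def quartiles(values):
    """Q1 and Q3 using depths (n + 1)/4 and 3(n + 1)/4."""
    ranked = sorted(values)
    n = len(ranked)
    return (value_at_depth(ranked, (n + 1) / 4),
            value_at_depth(ranked, 3 * (n + 1) / 4))

revenue_12 = [109.7, 79.9, 74.1, 98.0, 103.5, 86.8,
              121.2, 76.4, 80.2, 82.1, 79.4, 89.3]      # Example 28
q1, q3 = quartiles(revenue_12)
print(f"Q1 = {q1:.3f}, Q3 = {q3:.3f}, IQR = {q3 - q1:.1f}")
```

With 12 values the depths are 3.25 and 9.75, so each quartile lies between two ranked values, as in Step 3 of Example 28.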

Interquartile Range
The difference between the third quartile and the first quartile for a data set.
FORMULA

IQR = Q3 - Q1

Example 29
By referring to Example 28, calculate the IQR.

Solution:
IQR = Q3 - Q1 = 102.125 - 79.525 = 22.6

2.6.2 Grouped Data Measurement


Quartiles
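From the median formula, we can get the equations for Q1 and Q3 as follows:
FORMULA

\[ Q_1 = L_{Q_1} + \left( \frac{\frac{n}{4} - F}{f_{Q_1}} \right) i \]

\[ Q_3 = L_{Q_3} + \left( \frac{\frac{3n}{4} - F}{f_{Q_3}} \right) i \]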

Example 30
Refer to Example 22; find Q1 and Q3.

Solution:
1st Step: Construct the cumulative frequency distribution

Time to travel to work    Frequency    Cumulative Frequency
1 - 10                       8              8
11 - 20                     14             22
21 - 30                     12             34
31 - 40                      9             43
41 - 50                      7             50

2nd Step: Determine Q1 and Q3

\[ \text{Class } Q_1 = \frac{n}{4} = \frac{50}{4} = 12.5 \]

Class Q1 is the 2nd class.

Therefore,

\[ Q_1 = L_{Q_1} + \left( \frac{\frac{n}{4} - F}{f_{Q_1}} \right) i = 10.5 + \left( \frac{12.5 - 8}{14} \right) 10 = 13.7143 \]

\[ \text{Class } Q_3 = \frac{3n}{4} = \frac{3(50)}{4} = 37.5 \]

Class Q3 is the 4th class.

Therefore,

\[ Q_3 = L_{Q_3} + \left( \frac{\frac{3n}{4} - F}{f_{Q_3}} \right) i = 30.5 + \left( \frac{37.5 - 34}{9} \right) 10 = 34.3889 \]
Interquartile Range

FORMULA

IQR = Q3 - Q1

Example 31
Refer to Example 30, calculate the IQR.

Solution:
IQR = Q3 - Q1 = 34.3889 - 13.7143 = 20.6746
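
The grouped quartile formulas and the IQR of Examples 30 and 31 can be checked with one general helper. The sketch below locates the class whose cumulative frequency first reaches a given position (n/4 for Q1, 3n/4 for Q3) and then interpolates inside it. The name `grouped_position_value` is illustrative, and boundaries are taken as limits ± 0.5, which assumes whole-number class limits.

```python
from itertools import accumulate

# Data of Example 22 / Example 30.
classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
freqs = [8, 14, 12, 9, 7]

def grouped_position_value(classes, freqs, position):
    """Value at a cumulative position, e.g. n/4 for Q1 or 3n/4 for Q3."""
    cumulative = list(accumulate(freqs))
    # First class whose cumulative frequency is at least the position.
    k = next(j for j, cum in enumerate(cumulative) if cum >= position)
    lower, upper = classes[k]
    L = lower - 0.5                            # lower boundary (assumes integer limits)
    i = upper - lower + 1                      # class width
    F = cumulative[k - 1] if k > 0 else 0      # cumulative frequency before the class
    return L + (position - F) / freqs[k] * i

n = sum(freqs)
q1 = grouped_position_value(classes, freqs, n / 4)
q3 = grouped_position_value(classes, freqs, 3 * n / 4)
iqr = q3 - q1

print(f"Q1 = {q1:.4f}, Q3 = {q3:.4f}, IQR = {iqr:.4f}")
```

Passing `n / 2` as the position gives the grouped median by the same interpolation.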

2.7 MEASURE OF SKEWNESS

Used to determine the skewness of the data (symmetric, left skewed, right skewed).

Also called the Skewness Coefficient or Pearson Coefficient of Skewness:

\[ S_k = \frac{\text{mean} - \text{mode}}{s} \quad \text{or} \quad S_k = \frac{3(\text{mean} - \text{median})}{s} \]

If Sk is positive: right skewed

If Sk is negative: left skewed

If Sk = 0: symmetric

If Sk takes a value in (-0.9999, -0.0001) or (0.0001, 0.9999): approximately symmetric.

Example 32
The lengths of stay of cancer patients warded in Hospital Seberang Jaya are
recorded in a frequency distribution. From the record, the mean is 28 days, the
median is 25 days and the mode is 23 days. The standard deviation is 4.2 days.

a. What is the type of distribution?


b. Find the skewness coefficient

Solution:
a. This distribution is right skewed because the mean is the largest value
   (mean > median > mode).

b. Sk = (Mean - Mode) / s = (28 - 23) / 4.2 = 1.1905

   OR

   Sk = 3(Mean - Median) / s = 3(28 - 25) / 4.2 = 2.1429

So, from the Sk value, this distribution is right skewed.
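
As a check on the arithmetic, both forms of the coefficient can be computed in Python. This is a minimal sketch; the function names are hypothetical, and describe_skewness is one possible reading of the rules given in these notes (values strictly between -1 and 1, other than 0, are treated as approximately symmetric).

```python
def skewness_from_mode(mean, mode, std_dev):
    """Sk = (mean - mode) / s"""
    return (mean - mode) / std_dev


def skewness_from_median(mean, median, std_dev):
    """Sk = 3(mean - median) / s"""
    return 3 * (mean - median) / std_dev


def describe_skewness(sk):
    """Interpret Sk; the cut-off at 1 is an assumed reading of the notes."""
    if sk == 0:
        return "symmetric"
    if -1 < sk < 1:
        return "approximately symmetric"
    return "right skewed" if sk > 0 else "left skewed"


# Values from the hospital example: mean 28, median 25, mode 23, s = 4.2
sk_mode = skewness_from_mode(28, 23, 4.2)
sk_median = skewness_from_median(28, 25, 4.2)

print(round(sk_mode, 4), describe_skewness(sk_mode))
print(round(sk_median, 4), describe_skewness(sk_median))
```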

ADDITIONAL INFORMATION
Use of Standard Deviation
1. Chebyshev's Theorem
According to Chebyshev's Theorem, for any number k greater than 1, at least
(1 - 1/k^2) of the data values lie within k standard deviations of the mean.

Thus, for example, if k = 2, then

1 - 1/k^2 = 1 - 1/2^2 = 0.75, or 75%

Therefore, according to Chebyshev's Theorem, at least 75% of the values of a
data set lie within two standard deviations of the mean.
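
The bound 1 - 1/k^2 is easy to tabulate for other values of k. The sketch below is illustrative only; the function name chebyshev_lower_bound is hypothetical.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem requires k > 1")
    return 1 - 1 / k ** 2


for k in (1.5, 2, 3):
    bound = chebyshev_lower_bound(k)
    print(f"k = {k}: at least {bound:.2%} of the values")
```

The bound holds for any data set, whatever its shape, which is why it is weaker than the percentages quoted for bell-shaped distributions.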


2. Empirical Rule
For a bell-shaped distribution, approximately

1. 68% of the observations lie within one standard deviation of the mean.
2. 95% of the observations lie within two standard deviations of the mean.
3. 99.7% of the observations lie within three standard deviations of the mean.
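
These percentages can be reproduced with Python's standard library, under the assumption that "bell-shaped" means an exactly normal distribution. The helper name share_within is hypothetical.

```python
from statistics import NormalDist

# Assumption: the bell-shaped distribution is modelled as a standard normal
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

The computed shares are slightly more precise than the rounded figures of the rule, which is why the rule is stated as "approximately".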

Measure of Position
1. Ungrouped Data - Quartile Deviation

QD is half of the interquartile range.

It is used to compare the dispersion of two data sets.

If the QD value is high, it means that the data are more dispersed.

Quartile Deviation = Interquartile Range / 2
                   = (Q3 - Q1) / 2

2. Ungrouped Data - Percentile

Pk = value of the (kn/100)th term in a ranked data set

Where: k = the number of the percentile
       n = the sample size

Percentile rank of xi = (Number of values less than xi / Total number of values in the data set) × 100
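
Both percentile formulas can be illustrated with a short Python sketch. The function names are hypothetical. The notes do not say what to do when kn/100 is not a whole number, so rounding the position up to the next whole term is an assumption made here. The data are the 12 ranked values from the ungrouped quartile example.

```python
import math


def percentile_value(values, k):
    """Pk: value of the (kn/100)th term in the ranked data.

    Assumption: a fractional position is rounded up to the next whole term.
    """
    ranked = sorted(values)
    position = max(1, math.ceil(k * len(ranked) / 100))
    return ranked[position - 1]


def percentile_rank(values, x):
    """Percentage of values in the data set that are less than x."""
    count_below = sum(1 for value in values if value < x)
    return count_below / len(values) * 100


data = [74.1, 76.4, 79.4, 79.9, 80.2, 82.1,
        86.8, 89.3, 98.0, 103.5, 109.7, 121.2]

p50 = percentile_value(data, 50)       # position 50 * 12 / 100 = 6
p90 = percentile_value(data, 90)       # position 10.8, rounded up to 11
rank_of_98 = percentile_rank(data, 98.0)

print(p50, p90, round(rank_of_98, 2))
```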


EXERCISE 2
1. A survey research company asks 100 people how many times they have been to
the dentist in the last five years. Their grouped responses appear below.
Number of Visits    Number of Responses
0 - 4               16
5 - 9               25
10 - 14             48
15 - 19             11

What are the mean and variance of the data?

2. A researcher asked 25 consumers: "How much would you pay for a television
adapter that provides Internet access?" Their grouped responses are as follows:

Amount ($)    Number of Responses
0 - 99        2
100 - 199     2
200 - 249     3
250 - 299     3
300 - 349     6
350 - 399     3
400 - 499     4
500 - 999     2

Calculate the mean, variance, and standard deviation.

3. The following data give the pairs of shoes sold per day by a particular
shoe store in the last 20 days.

85  90  89  70  79  80  83  83  75  76
89  86  71  76  77  89  70  65  90  86

Calculate the
a. mean and interpret the value.
b. median and interpret the value.
c. mode and interpret the value.
d. standard deviation.

4. The following data show the serving times (in minutes) for 40
customers in a post office:

2.0  4.5  2.5  2.9  4.2  2.9  3.5  3.2  2.9  4.0
3.0  3.8  2.5  2.3  2.1  3.1  3.6  4.3  4.7  2.6
4.1  4.6  2.8  5.1  2.7  2.6  4.4  3.5  2.7  3.9
2.9  2.9  2.5  3.7  3.3  2.8  3.5  3.1  3.0  2.4

a. Construct a frequency distribution table with a class width of 0.5.
b. Construct a histogram.
c. Calculate the mode and median of the data.
d. Find the mean serving time.
e. Determine the skewness of the data.
f. Find the first and third quartile values of the data.
g. Determine the value of the interquartile range.

5. In a survey of a class of final-semester students, the following data were
obtained on the number of textbooks owned.

Number of students    Number of textbooks owned
12                    5
9                     5
11                    3
15                    2
10                    1
8                     0

Find the average number of textbooks owned in the class. Use the weighted mean.

6. The following data represent the ages of 15 people buying lift tickets at a ski area.

15  25  26  17  38  16  60  21
30  53  28  40  20  35  31

Calculate the quartiles and the interquartile range.


7. A student scores 60 on a mathematics test that has a mean of 54 and a standard
deviation of 3, and she scores 80 on a history test with a mean of 75 and a
standard deviation of 2. On which test did she perform better?

8. The following table gives the distribution of share prices for ABC Company, which
was listed on the BSKL in 2005.

Price (RM)    Frequency
12 - 14       5
15 - 17       14
18 - 20       25
21 - 23       7
24 - 26       6
27 - 29       3

Find the mean, median and mode for these data.
