CIVIL ENGINEERING

STATISTICS
BFC 34303
Chapter 1 :
Review on Descriptive Statistics
INTRODUCTION
These are the Mathematics marks of 30 students who took Test 1:

12, 23, 24, 45, 34, 48, 56, 63, 23, 44,
69, 78, 84, 95, 98, 67, 73, 69, 58, 70,
40, 88, 59, 47, 37, 15, 17, 36, 63, 38

How do we interpret these marks?


WHAT IS STATISTICS ?
~ Statistics is the science that deals with collecting, classifying, presenting, describing, analyzing and interpreting data to enable us to draw conclusions and make reasonable decisions
~ Can be divided into two categories:
(a) Descriptive statistics
(b) Inferential statistics
Descriptive statistics
~ The activities of collecting, classifying, presenting and describing quantitative data
~ Methods for organizing (frequency tables), representing (graphs) and summarizing data (central tendency and variability).
Inferential statistics
~ The part dealing with the techniques and methods of interpreting the results obtained from descriptive statistics, in order to draw conclusions about a population from a sample
WHAT IS POPULATION ?
~ Population is the entire (complete)
collection of data whose properties are
analyzed. It contains all the subjects of
interest.
~ It can be of any size; its items need not be uniform, but they must share at least one measurable feature.
WHAT IS SAMPLE?

~ A portion of the population selected for study
~ A sample is any set of entities, cases, subjects, items or experimental units chosen from the population.
WHAT IS RANDOM SAMPLE?
~ A random sample is a sample
selected in such a way that each
element of the population has the
same chance of being selected
WHAT IS PARAMETER ?
~ Parameter is a numerical measurement
describing some characteristics of a
population
~ e.g. the population mean μ and the population variance σ²

WHAT IS STATISTIC?
~ Statistic is a numerical measurement
describing some characteristics of a
sample
~ e.g. the sample mean x̄ and the sample variance S²
WHAT IS VARIABLE ?
~ Any measured characteristic or attribute that differs for different elements
~ For example, if the weights of 30 subjects were measured, then weight would be a variable.
~ Can be classified as quantitative or qualitative
WHAT IS QUANTITATIVE VARIABLE ?
~ The variable being studied is numeric
~ Measured on an ordinal, interval or ratio scale
~ e.g. if the time it took subjects to respond were measured, then the variable would be quantitative.
WHAT IS QUALITATIVE VARIABLE ?
~ The variable being studied is non-numeric
~ Also called "categorical variables"
~ Measured on a nominal scale
~ e.g. gender, educational level, eye colour
If five-year-old students were asked to name their favourite colour, then the variable would be qualitative.
WHAT IS DATA ?
~ A set of data is a collection of observations, measurements or information obtained
~ Can be classified as quantitative or qualitative
~ Can be presented in various ways

WHAT IS QUANTITATIVE DATA ?
~ Quantitative data refers to observations which can be measured numerically or counted
~ Can be divided into discrete data and continuous data
~ e.g. length, time, temperature and mass
WHAT IS QUALITATIVE DATA ?

~ Qualitative data are not in numerical form but are instead assigned as attributes
~ e.g. race, marital status, gender
Discrete data
~ is a set of data that can only take exact, countable values
~ For example:
a) The number of students in a class.
b) The number of cars sold on any day at a car dealership.
c) The number of persons in a family.
Continuous data
~ is data that can take any value over a certain interval and can be measured to a certain degree of accuracy (correct to a certain number of decimal places)
~ For example:
a) The weight of students in a class.
b) The time taken to complete an
examination.
c) The amount of soda in a 150ml can.
d) The income of a family.
WHAT IS UNGROUPED DATA ?
~ (a) Raw data
(b) Not grouped into class intervals
(c) A frequency distribution of single values that has been arranged in order
~ Example:
(i) 3, 5, 6, 2, 5, 2, 4, 6, 5
(ii)
Number of books   0   1   2   3
Frequency         3   7   4   2
WHAT IS GROUPED DATA ?
~ The data can be grouped into class intervals before the frequency distribution is constructed
~ The table constructed is called a frequency distribution table
~ Example:
Height (cm)   150-155   155-160   160-165   165-170
Frequency        2         8         6         5
WHAT IS FREQUENCY DISTRIBUTION?

• One method for simplifying and organizing data is to construct a frequency distribution.
• A frequency distribution is an organized tabulation showing exactly how many individuals are located in each category on the scale of measurement.
Examples:
Determine whether the data obtained are discrete or continuous.
(a) The number of books sold by a stationery shop.
(b) The time taken to travel from Kuala Terengganu to Batu Pahat.
(c) The weight of FKAAS students.
(d) The diameter of twenty spheres.
REMARKS…

• All data are to be considered as a sample unless otherwise stated in the question.
Example :
The number of male children in 20 families chosen at random is as follows.
1 4 2 0 2 3 3 2 1 4 5 2 1 2 0 1 2 3 1 2
The above data is called raw data, and it can be summarized as a frequency distribution as shown:
Number of male children   0   1   2   3   4   5
Frequency                 2   5   7   3   2   1
The data shown in this frequency distribution table is known as ungrouped data.
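Tallying raw data into an ungrouped frequency distribution is mechanical, so it can be sketched in a few lines of Python. This is a minimal illustration, assuming the data are held in a plain list; `frequency_table` is a hypothetical helper name, and `collections.Counter` does the counting.

```python
from collections import Counter

def frequency_table(raw):
    """Return (value, frequency) pairs sorted by value: an ungrouped frequency distribution."""
    counts = Counter(raw)          # counts how many times each value occurs
    return sorted(counts.items())  # order the rows by value, as in the table above

# Number of male children in 20 families (data from the example above)
male_children = [1, 4, 2, 0, 2, 3, 3, 2, 1, 4, 5, 2, 1, 2, 0, 1, 2, 3, 1, 2]
for value, freq in frequency_table(male_children):
    print(value, freq)
```

The frequencies printed should add up to the 20 families in the sample.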
CENTRAL TENDENCY
• In general terms, central tendency (mean, median and mode) is a statistical measure that determines a single value that accurately describes the center of the distribution and represents the entire distribution of scores.
• The goal of central tendency is to identify the single value that is the best representative for the entire set of data.
MEASURES OF LOCATION
( CENTRAL TENDENCY)
MEAN
Given a set of data x1, x2, x3, ..., xn, the mean x̄ is defined as
$$\bar{x} = \frac{\text{sum of all observations}}{\text{number of observations}} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
For a set of data which can be represented in a frequency distribution table, the mean is given by
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
Example :
Find the mean of the following data
1 4 2 0 2 3 3 2 1 4 5 2 1 2 0 1 2 3 1 2
Solution:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1 + 4 + 2 + \cdots + 3 + 1 + 2}{20} = \frac{41}{20} = 2.05$$
OR
x   0   1   2   3   4   5
f   2   5   7   3   2   1
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{2(0) + 5(1) + 7(2) + 3(3) + 2(4) + 1(5)}{20} = \frac{41}{20} = 2.05$$
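Both routes to the mean in the example above (from the raw list, and from the frequency table) can be checked with a short Python sketch. The function names `mean_raw` and `mean_freq` are hypothetical, chosen only for this illustration.

```python
def mean_raw(xs):
    """Mean of raw (ungrouped) data: sum of observations / number of observations."""
    return sum(xs) / len(xs)

def mean_freq(values, freqs):
    """Mean from a frequency table: sum(f_i * x_i) / sum(f_i)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

data = [1, 4, 2, 0, 2, 3, 3, 2, 1, 4, 5, 2, 1, 2, 0, 1, 2, 3, 1, 2]
print(mean_raw(data))                                      # 2.05
print(mean_freq([0, 1, 2, 3, 4, 5], [2, 5, 7, 3, 2, 1]))   # 2.05
```

Both numerators equal 41, so the two forms agree exactly, as they should for the same data.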
Example :
To obtain grade A, Saleha must achieve an average
of at least 75 marks in four tests. If her average
mark for the first three tests is 70, calculate the
lowest mark she must get in her fourth test in order
to obtain grade A.
Solution:
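Let the marks of the four tests be w, x, y and z.
Mean of w, x, y = 70, so w + x + y = 3(70) = 210.
The mean of w, x, y, z must be at least 75:
$$\frac{3(70) + z}{4} \ge 75 \;\Rightarrow\; 210 + z \ge 300 \;\Rightarrow\; z \ge 90$$
So, the lowest mark she must get in her fourth test in order to obtain grade A is 90.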
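The reasoning above (the required total minus the total already earned) generalizes to any number of tests. A minimal sketch, assuming all tests carry equal weight; `lowest_final_mark` is a hypothetical name.

```python
def lowest_final_mark(target_mean, n_tests, mean_so_far):
    """Smallest last mark z such that (sum of the first n-1 marks + z) / n >= target_mean."""
    total_needed = target_mean * n_tests          # e.g. 75 * 4 = 300
    total_so_far = mean_so_far * (n_tests - 1)    # e.g. 70 * 3 = 210
    return total_needed - total_so_far

print(lowest_final_mark(75, 4, 70))  # 90
```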
MEDIAN
The median is the middle value of a set of data that is arranged in order of magnitude.
Let $x_{(k)}$ be the k-th observation in a set of data which has been arranged in ascending or descending order.
For example, consider the following set of numbers
9 2 7 10 5 16
After arrangement, it becomes
2 5 7 9 10 16
Thus, the median lies between $x_{(3)} = 7$ and $x_{(4)} = 9$, so the median is (7 + 9)/2 = 8.
The median of a set of data x1, x2, ..., xn is denoted by $x_{(m)}$ and may be calculated as:
$$x_{(m)} = \begin{cases} x_{\left(\frac{n+1}{2}\right)}, & \text{if } n \text{ is odd} \\[6pt] \dfrac{1}{2}\left[x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}\right], & \text{if } n \text{ is even} \end{cases}$$
Example :
Find the median for the following sets of data
a) 21, 24, 17, 28, 36, 20, 32
b) 3.56, 2.7, 5.48, 8.61, 4.35, 6.22

Solution:
a) The data arranged in ascending order:
17, 20, 21, 24, 28, 32, 36
Since n = 7, which is odd, the median is
$$x_{(m)} = x_{\left(\frac{7+1}{2}\right)} = x_{(4)} = 24$$
b) The data arranged in ascending order:
2.7, 3.56, 4.35, 5.48, 6.22, 8.61
Since n = 6, which is even, the median is
$$x_{(m)} = \frac{1}{2}\left[x_{\left(\frac{6}{2}\right)} + x_{\left(\frac{6}{2}+1\right)}\right] = \frac{1}{2}\left[x_{(3)} + x_{(4)}\right] = \frac{1}{2}(4.35 + 5.48) = 4.915$$
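The odd/even rule for the median translates directly into code. The sketch below is a hypothetical helper (`median`), written out to mirror the formula; the subscripts in the notes are 1-based, so they are shifted by one for Python's 0-based lists. Python's standard `statistics.median` applies the same odd/even rule.

```python
def median(xs):
    """Median by the odd/even rule: the middle value, or the mean of the two middle values."""
    s = sorted(xs)
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]             # x_((n+1)/2), shifted to a 0-based index
    return (s[n // 2 - 1] + s[n // 2]) / 2    # average of x_(n/2) and x_(n/2 + 1)

print(median([21, 24, 17, 28, 36, 20, 32]))   # 24
print(median([9, 2, 7, 10, 5, 16]))           # 8.0
print(median([3.56, 2.7, 5.48, 8.61, 4.35, 6.22]))
```

The last call should give 4.915, up to floating-point rounding in the final digit.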
MODE

• The mode of a set of data is the value that occurs most frequently.
• The mode may not be unique, or there may be no mode at all.
Example :
Find the mode for each of the following sets of data
a) 2, 3, 3, 4, 5, 28, 5, 5
b) 2, 3, 5, 8, 10
c) 0.2, 0.4, 0.4, 0.4, 0.5, 0.7, 0.7, 0.7, 0.5
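The three cases in this example (a single mode, no mode, more than one mode) can be handled by returning every value that reaches the highest frequency. A minimal sketch; `modes` is a hypothetical name, and treating "no value repeats" as "no mode" is an assumption that matches case (b).

```python
from collections import Counter

def modes(xs):
    """All values with the highest frequency; [] when every value occurs only once (no mode)."""
    counts = Counter(xs)
    top = max(counts.values())
    if top == 1:
        return []   # assumption: no mode when no value repeats
    return sorted(v for v, c in counts.items() if c == top)

print(modes([2, 3, 3, 4, 5, 28, 5, 5]))                       # [5]
print(modes([2, 3, 5, 8, 10]))                                # []
print(modes([0.2, 0.4, 0.4, 0.4, 0.5, 0.7, 0.7, 0.7, 0.5]))   # [0.4, 0.7]
```

Data set (c) is bimodal, which illustrates why the mode "may not be unique".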


QUARTILES
Quartiles divide a set of data which are arranged in ascending order into 4 equal parts.
To find quartile ($Q_k$):
Let
$$r = \frac{k}{4}\, n$$
where: n = number of observations
k = index of the quartile $Q_k$ (k = 1, 2, 3)
(i) If r is an integer:
$$Q_k = \frac{1}{2}\left[r^{\text{th}} \text{ observation} + (r+1)^{\text{th}} \text{ observation}\right]$$
(ii) If r is not an integer, round it up to the next integer; $Q_k$ is that observation.
Q2 is also called the median.
Interquartile range = Q3 - Q1
PERCENTILES
Percentiles divide a set of data which are arranged in ascending order into 100 equal parts.
To find percentile ($P_k$):
Let
$$r = \frac{k}{100}\, n$$
where: n = number of observations
k = index of the percentile $P_k$ (k = 1, 2, ..., 99)
(i) If r is an integer:
$$P_k = \frac{1}{2}\left[r^{\text{th}} \text{ observation} + (r+1)^{\text{th}} \text{ observation}\right]$$
(ii) If r is not an integer, round it up to the next integer; $P_k$ is that observation.

Notes: Q1 = P25, Median = Q2 = P50, Q3 = P75


Example :
Find the median, first quartile (Q1), third quartile (Q3) and 40th percentile (P40) for the following sets of data
a) 21, 24, 17, 28, 36, 20, 32
b) 3.5, 2.7, 5.4, 8.6, 4.3, 6.2, 9.9, 7.6
Solution:
a) The data arranged in ascending order:
17, 20, 21, 24, 28, 32, 36
Median = Q2:
$$r = \frac{2}{4}(7) = 3.5 \text{ (not an integer)} \;\Rightarrow\; \text{Median} = Q_2 = 4^{\text{th}} \text{ observation} = 24$$
First quartile, Q1:
$$r = \frac{1}{4}(7) = 1.75 \text{ (not an integer)} \;\Rightarrow\; Q_1 = 2^{\text{nd}} \text{ observation} = 20$$
Third quartile, Q3:
$$r = \frac{3}{4}(7) = 5.25 \text{ (not an integer)} \;\Rightarrow\; Q_3 = 6^{\text{th}} \text{ observation} = 32$$
40th percentile, P40:
$$r = \frac{40}{100}(7) = 2.8 \text{ (not an integer)} \;\Rightarrow\; P_{40} = 3^{\text{rd}} \text{ observation} = 21$$
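The position rule above (compute r, then either average two neighbouring observations or round r up) can be sketched as one function that covers quartiles (parts = 4), deciles (parts = 10) and percentiles (parts = 100). This is a minimal illustration of the notes' convention, not a library routine; `position_quantile` is a hypothetical name, and `fractions.Fraction` is used so that a whole-number r is recognised exactly. Other conventions, such as the linear interpolation used by default in NumPy's `percentile`, can give slightly different values.

```python
import math
from fractions import Fraction

def position_quantile(xs, k, parts):
    """k-th quantile out of `parts` using the rule in these notes (observations are 1-indexed)."""
    s = sorted(xs)
    n = len(s)
    if not 0 < k < parts:
        raise ValueError("k must satisfy 0 < k < parts")
    r = Fraction(k * n, parts)        # exact r = (k / parts) * n
    if r.denominator == 1:            # r is an integer: average the r-th and (r+1)-th observations
        i = int(r)
        return (s[i - 1] + s[i]) / 2
    return s[math.ceil(r) - 1]        # otherwise round r up and take that observation

a = [21, 24, 17, 28, 36, 20, 32]
print(position_quantile(a, 2, 4))     # 24  (median, Q2)
print(position_quantile(a, 1, 4))     # 20  (Q1)
print(position_quantile(a, 3, 4))     # 32  (Q3)
print(position_quantile(a, 40, 100))  # 21  (P40)
```

Data set (b) has n = 8, so r is a whole number for Q1, Q2 and Q3 and the averaging branch is used there.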
Example :

The following table shows the marks obtained by 30 students in a Mathematics quiz, where the maximum mark is 10.
Marks              2   3   4   5   6   7   8   9   10
No. of students    2   4   3   6   4   5   4   1   1
Find the mean, mode, median, first and third quartiles, interquartile range and the 60th percentile.
Example :

Data 1: 6, 7, 8, 6, 9, 6      mean = 7
Data 2: 5, 7, 2, 6, 13, 9     mean = 7
• Most of the numbers in Data 1 are close to the mean value.
• Data 2 is more spread out from the mean.
• The difference in spread can be quantified by a measure of dispersion.
MEASURES OF DISPERSION

Variability
• The goal for variability is to obtain a measure
of how spread out the scores are in a
distribution.
• A measure of variability usually accompanies a
measure of central tendency as basic
descriptive statistics for a set of scores.
Three common measures of dispersion are:
• Range
• Variance
• Standard deviation
Range = Largest value – Smallest value

REMARK
• Range is not a good measure of dispersion because it is influenced by extreme values and its calculation does not use all the observations.
• Variance and standard deviation are the most useful and widely used measures of dispersion. Although they are influenced by extreme values, their calculations use all the observations.
REMARK

• Standard deviation measures how spread out the values in a data set are.
• If the data points are all close to the mean, then the standard deviation is close to zero.
• If many data points are far from the mean, then the standard deviation is large.
• If all the data values are equal, then the standard deviation is zero.
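VARIANCE
The mean of the data is
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad \text{or, for a frequency table,} \qquad \bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$
and the sample variance is
$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}, \qquad i = 1, 2, \ldots, n$$
Commonly used formulae:
$$S^2 = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1} \qquad\qquad S^2 = \frac{\sum f_i x_i^2 - n\bar{x}^2}{n - 1}$$
$$S^2 = \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1} \qquad\qquad S^2 = \frac{\sum f_i x_i^2 - \frac{\left(\sum f_i x_i\right)^2}{n}}{n - 1}$$
where $n = \sum f_i$ in the frequency-table forms.
STANDARD DEVIATION
$$S = \sqrt{\text{variance}} = \sqrt{S^2}$$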
Example :
Calculate the variance and standard deviation for the following sets of sample data. Hence, determine which set is more dispersed about the mean.

Set 1 : 16,10,9,2,5,2,7
Set 2 : 10,32,8,12,14,36,20,8,40,4,32,1
For Set 1:
Set 1: 16, 10, 9, 2, 5, 2, 7

x     x²
2     4
2     4
5     25
7     49
9     81
10    100
16    256

$$\sum_{i=1}^{n} x_i = 51, \qquad \sum_{i=1}^{n} x_i^2 = 519, \qquad n = 7$$
$$S^2 = \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1} = \frac{519 - \frac{51^2}{7}}{6} = 24.571$$
$$S = \sqrt{24.571} = 4.957$$
For Set 2:
Set 2: 10, 32, 8, 12, 14, 36, 20, 8, 40, 4, 32, 1
$$\sum_{i=1}^{n} x_i = 217, \qquad \sum_{i=1}^{n} x_i^2 = 5929, \qquad n = 12$$
$$S^2 = \frac{5929 - \frac{217^2}{12}}{11} = 182.265$$
$$S = \sqrt{182.265} = 13.5$$
Hence, Set 2 is more dispersed about the mean than Set 1.
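The shortcut formula used in this example can be sketched in Python and cross-checked against the standard library's `statistics.variance`, which also divides by n - 1 (sample variance). `sample_variance` is a hypothetical helper name.

```python
import statistics

def sample_variance(xs):
    """Sample variance via the shortcut S^2 = (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

set1 = [16, 10, 9, 2, 5, 2, 7]
set2 = [10, 32, 8, 12, 14, 36, 20, 8, 40, 4, 32, 1]

for name, data in [("Set 1", set1), ("Set 2", set2)]:
    s2 = sample_variance(data)
    # cross-check: the standard library's sample variance should agree
    assert abs(s2 - statistics.variance(data)) < 1e-9
    print(name, round(s2, 3), round(s2 ** 0.5, 3))
```

The rounded values should reproduce the hand calculation (24.571 and 182.265 for the variances).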
STEM-AND-LEAF DIAGRAMS
A stem-and-leaf diagram displays every individual data value in a data set.
The digit(s) in the greatest place value(s) of the data values are the stems.
The digits in the next greatest place value are the leaves.
To construct a stem-and-leaf diagram:
1. Place the stems in order vertically from smallest to
largest.
2. Place the leaves in order in each row from smallest
to largest.
3. Create a key for the stem-and-leaf diagram so that
people know how to interpret the diagram.
Shape of distribution
A perfectly symmetric curve is one in which both sides of
the distribution would exactly match the other if the figure
were folded over its central point.
An example is shown below:
[Figure: a symmetric distribution]
A symmetric, bell-shaped distribution, which is a relatively common occurrence, is called a normal distribution.
A distribution is said to be skewed to the right, or positively skewed, when most of the data are concentrated on the left of the distribution. The right tail clearly extends farther from the distribution's centre than the left tail, as shown below:
[Figure: a positively skewed distribution]
A distribution is said to be skewed to the left, or negatively skewed, if most of the data are concentrated on the right of the distribution. The left tail clearly extends farther from the distribution's centre than the right tail, as shown below:
[Figure: a negatively skewed distribution]
Example:
If the stem-and-leaf plot is turned on its side, it will look like the following:
[Figure: a stem-and-leaf plot turned on its side]
The distribution shows that most data are clustered at the right. The left tail extends farther from the data centre than the right tail. Therefore, the distribution is skewed to the left, or negatively skewed.
Example :
Marks of a recent Mathematics test are as given below:
73, 42, 67, 78, 99, 84, 91, 82, 86, 94
Based on the marks given:
(a) Construct a stem-and-leaf diagram.
(b) What are the highest and lowest marks?
(c) Interpret the distribution.
Solution:
(a) Mathematics Test Marks
Stem | Leaf
   4 | 2
   5 |
   6 | 7
   7 | 3 8
   8 | 2 4 6
   9 | 1 4 9
Key: 9 | 9 means 99 marks
(b) Highest mark = 99, lowest mark = 42
(c) Negatively skewed
Example :

The heights of 20 people are as follows:
154, 143, 148, 139, 143, 147, 153, 162, 136, 147,
144, 143, 139, 142, 143, 156, 151, 164, 157, 149.
Construct a stem-and-leaf diagram and state the shortest and tallest heights. Interpret the distribution.
Solution:
Stem | Leaf
  13 | 6 9 9
  14 | 2 3 3 3 3 4 7 7 8 9
  15 | 1 3 4 6 7
  16 | 2 4
Key: 13 | 6 means 136 cm
Shortest height = 136 cm
Tallest height = 164 cm
Positively skewed
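The three construction steps (order the stems, order the leaves, keep a key) can be sketched in Python for whole-number data. This is a minimal illustration, assuming non-negative integers with a leaf unit of 1 (stem = x // 10, leaf = x % 10); `stem_and_leaf` and `print_stem_and_leaf` are hypothetical names. Empty stems are kept, as in the test-marks example.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return (stem, leaves) rows for non-negative integer data.

    stem = x // 10 and leaf = x % 10, so 136 -> stem 13, leaf 6.
    Stems with no leaves between the smallest and largest stem are kept as empty rows.
    """
    rows = defaultdict(list)
    for x in sorted(data):                 # sorting first puts each row's leaves in order
        rows[x // 10].append(x % 10)
    lo, hi = min(rows), max(rows)
    return [(stem, rows.get(stem, [])) for stem in range(lo, hi + 1)]

def print_stem_and_leaf(data):
    for stem, leaves in stem_and_leaf(data):
        print(f"{stem:>3} | {' '.join(str(leaf) for leaf in leaves)}")

heights = [154, 143, 148, 139, 143, 147, 153, 162, 136, 147,
           144, 143, 139, 142, 143, 156, 151, 164, 157, 149]
marks = [73, 42, 67, 78, 99, 84, 91, 82, 86, 94]

print_stem_and_leaf(heights)
print_stem_and_leaf(marks)   # stem 5 appears as an empty row
```

For one-decimal data such as the exercise below, one option (an assumption, not part of the notes) is to multiply each value by 10 and round first, with a key such as "8 | 5 means 8.5 mm".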
Exercise:

The lengths of a straight line, as estimated by 22 students (in mm), are given below:
10.5, 8.5, 8.6, 8.1, 7.3, 4.4, 6.6, 6.6, 7.9, 8.7, 8.3,
6.0, 8.7, 7.5, 7.9, 6.0, 9.1, 7.2, 8.4, 8.1, 8.6, 9.3
Construct a stem-and-leaf diagram based on the given data. Interpret the distribution.
BOX-AND-WHISKER PLOTS
[Figure: a horizontal and a vertical box-and-whisker plot, each marking min, Q1, Q2, Q3 and max]
To construct a box-and-whisker plot:

STEP 1: Determine the five-number summary (minimum, Q1, Q2, Q3, maximum).
STEP 2: Draw a horizontal axis on which the numbers obtained in Step 1 can be located. Above this axis, mark all five values of the summary with vertical lines.
STEP 3: Connect the quartiles to each other to make a box, and then connect the box to the maximum and minimum lines.
STEP 4: Calculate the values of the upper and lower inner fences to determine whether the data contain any outliers:
Upper inner fence = Q3 + 1.5 (Q3 – Q1)
Lower inner fence = Q1 - 1.5 (Q3 – Q1)
[Figure: box plot on an axis from 10 to 100 with min, Q1, Q2, Q3 and max all lying between the lower and upper inner fences]
The data lies within the upper and lower inner fences, so the data has no outlier.

[Figure: box plot on an axis from 10 to 100 with one observation lying outside the inner fences, labelled "Outlier"]
An observation that lies outside the inner fences is known as an outlier.


SHAPE OF DATA DISTRIBUTION
(SYMMETRY AND SKEWNESS)

Symmetrical distribution: the whiskers are the same length and the median Q2 is in the centre of the box.
[Figure: symmetrical box plot marking min, Q1, Q2, Q3 and max]

Positively skewed distribution: the left whisker is shorter than the right whisker and the median is nearer to Q1.
[Figure: positively skewed box plot marking min, Q1, Q2, Q3 and max]

Negatively skewed distribution: the left whisker is longer than the right whisker and the median is nearer to Q3.
[Figure: negatively skewed box plot marking min, Q1, Q2, Q3 and max]
Example :
Data :
40, 32, 61, 52, 65, 68, 41, 61, 70, 66, 57, 55, 45,
51, 62, 69, 31, 50, 72, 66, 41, 54, 65, 79, 66
(a) Display the data in a stem and leaf diagram.
(b) Find the first, second and third quartiles, and the upper and lower inner fences.
(c) Construct a box-and-whisker plot for the above data.
Solution :
(a)
Stem | Leaf
   3 | 1 2
   4 | 0 1 1 5
   5 | 0 1 2 4 5 7
   6 | 1 1 2 5 5 6 6 6 8 9
   7 | 0 2 9
Key: 5 | 4 means 54
(b) Number of observations, n = 25, min = 31, max = 79
$$r = \frac{1}{4}(25) = 6.25 \;\Rightarrow\; Q_1 = 7^{\text{th}} \text{ observation} = 50$$
$$r = \frac{2}{4}(25) = 12.5 \;\Rightarrow\; Q_2 = 13^{\text{th}} \text{ observation} = 61$$
$$r = \frac{3}{4}(25) = 18.75 \;\Rightarrow\; Q_3 = 19^{\text{th}} \text{ observation} = 66$$
Upper inner fence = Q3 + 1.5 (Q3 – Q1) = 66 + 1.5(66 - 50) = 90
Lower inner fence = Q1 - 1.5 (Q3 – Q1) = 50 - 1.5(66 - 50) = 26
(c)
[Figure: box-and-whisker plot on an axis from 10 to 100 with lower inner fence 26, min 31, Q1 50, Q2 61, Q3 66, max 79 and upper inner fence 90]

No outlier. The data is negatively skewed (skewed to the left).


Example :

Stem | Leaf
   5 | 1 9
   6 | 2 3 3 4 4 4 4 4 5
   6 | 8 8 8 9 9 9
   7 | 0 2 2 3 6 7
Key: 5 | 9 means 59 °F

From the given stem-and-leaf diagram, construct a box-and-whisker plot. Determine the outliers of the data.
Number of observations, n = 23, min = 51 °F, max = 77 °F
$$r = \frac{1}{4}(23) = 5.75 \;\Rightarrow\; Q_1 = 6^{\text{th}} \text{ observation} = 64\,^{\circ}\text{F}$$
$$r = \frac{2}{4}(23) = 11.5 \;\Rightarrow\; Q_2 = 12^{\text{th}} \text{ observation} = 68\,^{\circ}\text{F}$$
$$r = \frac{3}{4}(23) = 17.25 \;\Rightarrow\; Q_3 = 18^{\text{th}} \text{ observation} = 70\,^{\circ}\text{F}$$
Upper inner fence = Q3 + 1.5 (Q3 – Q1) = 70 + 1.5(70 - 64) = 79 °F
Lower inner fence = Q1 - 1.5 (Q3 – Q1) = 64 - 1.5(70 - 64) = 55 °F
[Figure: box-and-whisker plot on an axis from 50 to 80 with lower inner fence 55, min 51 (outlier), Q1 64, Q2 68, Q3 70, max 77 and upper inner fence 79]
From the box plot, we can see that the minimum value 51 °F lies outside the fences, so this value is an outlier. Therefore, the whiskers are drawn from 59 °F to 77 °F.
[Figure: revised box-and-whisker plot with the outlier 51 °F marked separately, whiskers from 59 °F to 77 °F, Q1 64, Q2 68, Q3 70, and fences at 55 and 79]
The data is negatively skewed (skewed to the left).
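The procedure used in the two box-plot examples (quartiles by the r-rule, inner fences at Q1 - 1.5(Q3 - Q1) and Q3 + 1.5(Q3 - Q1), whiskers running to the most extreme non-outlying values) can be sketched in Python. This is a minimal illustration under those assumptions; `quartile` and `box_plot_summary` are hypothetical helper names, and the temperature list is read off the stem-and-leaf diagram above.

```python
import math
from fractions import Fraction

def quartile(sorted_xs, k):
    """Q_k by the notes' rule: r = k*n/4; average the r-th and (r+1)-th observations
    if r is an integer, otherwise take the ceil(r)-th observation (1-indexed)."""
    n = len(sorted_xs)
    r = Fraction(k * n, 4)
    if r.denominator == 1:
        i = int(r)
        return (sorted_xs[i - 1] + sorted_xs[i]) / 2
    return sorted_xs[math.ceil(r) - 1]

def box_plot_summary(data):
    """Five-number summary, inner fences, outliers and whisker ends for a box plot."""
    s = sorted(data)
    q1, q2, q3 = (quartile(s, k) for k in (1, 2, 3))
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in s if lower <= x <= upper]
    return {
        "min": s[0], "Q1": q1, "Q2": q2, "Q3": q3, "max": s[-1],
        "lower_fence": lower, "upper_fence": upper,
        "outliers": [x for x in s if x < lower or x > upper],
        "whiskers": (inside[0], inside[-1]),   # most extreme values that are not outliers
    }

scores = [40, 32, 61, 52, 65, 68, 41, 61, 70, 66, 57, 55, 45,
          51, 62, 69, 31, 50, 72, 66, 41, 54, 65, 79, 66]
temps = [51, 59, 62, 63, 63, 64, 64, 64, 64, 64, 65, 68,
         68, 68, 69, 69, 69, 70, 72, 72, 73, 76, 77]

print(box_plot_summary(scores))
print(box_plot_summary(temps)["outliers"])   # [51]
```

For the temperature data, the outlier 51 °F is excluded from the whiskers, which is why they run from 59 °F to 77 °F.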
GROUPED DATA
MEAN of a frequency distribution
The mean of a set of grouped data given in the form of a frequency distribution is defined as
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
where $\sum_{i=1}^{k} f_i$ = total frequency and $x_i$ = class mark of the i-th class.
Example :
Find the mean for the following data
Class          Frequency, fi
0 ≤ x < 10          2
10 ≤ x < 20        17
20 ≤ x < 30        26
30 ≤ x < 40        10
40 ≤ x < 50         5
SOLUTION:
The class mark is the midpoint of each class, e.g. for the first class x = (0 + 10)/2 = 5.
Class          Class mark, xi   Frequency, fi   fixi
0 ≤ x < 10          5                2            10
10 ≤ x < 20        15               17           255
20 ≤ x < 30        25               26           650
30 ≤ x < 40        35               10           350
40 ≤ x < 50        45                5           225
                                Σfi = 60     Σfixi = 1490
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{1490}{60} = 24.83$$
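The grouped mean only needs the class marks and frequencies. A minimal Python sketch, assuming each class is given by its (lower, upper) limits so that the class mark is their midpoint; `grouped_mean` is a hypothetical name.

```python
def grouped_mean(classes, freqs):
    """Mean of grouped data: sum(f_i * x_i) / sum(f_i), where x_i is the class mark (midpoint)."""
    marks = [(lo + hi) / 2 for lo, hi in classes]
    return sum(f * x for f, x in zip(freqs, marks)) / sum(freqs)

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [2, 17, 26, 10, 5]
print(round(grouped_mean(classes, freqs), 2))  # 24.83
```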
MODE of a frequency distribution
$$\text{mode} = L_m + \left(\frac{d_1}{d_1 + d_2}\right) c$$
Lm = lower boundary of the class containing the mode (the modal class)
d1 = the difference between the frequency of the modal class and the frequency of the class immediately before it
d2 = the difference between the frequency of the modal class and the frequency of the class immediately after it
c = width of the modal class
Example :
Find the mode of the frequency distribution given below:
Class      Frequency
15 - 19        1
20 - 24        4
25 - 29       22
30 - 34       35
35 - 39       20
40 - 44        8
SOLUTION:
The modal class is 30 - 34 and the corresponding frequency is 35.
Lm = 29.5, d1 = 35 - 22 = 13, d2 = 35 - 20 = 15, c = 5
$$\text{mode} = 29.5 + \left(\frac{13}{13 + 15}\right)(5) = 31.8$$
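The modal-class formula can be sketched in Python from the class boundaries and frequencies. This is a minimal illustration; `grouped_mode` is a hypothetical name, and treating a missing neighbouring class (modal class first or last) as having frequency 0 is an assumption the notes do not cover.

```python
def grouped_mode(boundaries, freqs):
    """Mode of grouped data: L_m + d1 / (d1 + d2) * c.

    `boundaries` holds (lower, upper) class boundaries. Assumption: a missing
    neighbouring class is treated as having frequency 0.
    """
    m = max(range(len(freqs)), key=lambda i: freqs[i])   # modal class (first one if tied)
    lower, upper = boundaries[m]
    before = freqs[m - 1] if m > 0 else 0
    after = freqs[m + 1] if m + 1 < len(freqs) else 0
    d1, d2 = freqs[m] - before, freqs[m] - after
    return lower + d1 / (d1 + d2) * (upper - lower)

bounds = [(14.5, 19.5), (19.5, 24.5), (24.5, 29.5), (29.5, 34.5), (34.5, 39.5), (39.5, 44.5)]
freqs = [1, 4, 22, 35, 20, 8]
print(round(grouped_mode(bounds, freqs), 1))  # 31.8
```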
Mode from histogram
• Draw a line from the left upper corner of the highest vertical bar to the left upper corner of the next vertical bar.
• Draw a line from the right upper corner of the highest vertical bar to the right upper corner of the vertical bar before it.
• The mode is estimated from the intersection point of both lines, read off on the class-boundaries axis.
The histogram should be drawn on graph paper in order to obtain an accurate answer.
[Figure: histogram with the two crossing lines drawn on the highest bar; the mode is read on the class-boundaries axis]

Example :
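For the data in the previous example, find the mode using the histogram.
SOLUTION:
[Figure: histogram of frequency (0 to 35) against class boundaries 14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5, with the crossing lines drawn on the highest bar]
Mode = 31.8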
MEDIAN of a frequency distribution

NOTE :
The median of a frequency distribution cannot be found in the same way as for ungrouped data, because the data have been grouped into classes. So, we obtain an estimated value of the median.
MEDIAN
The median class is the class that contains the (n/2)-th observation, i.e. the first class whose cumulative frequency reaches n/2.
$$m = L_m + \left(\frac{\frac{n}{2} - F_L}{f_m}\right) c$$
Lm = lower boundary of the median class
n = total frequency
FL = cumulative frequency of the class before the median class
fm = frequency of the median class
c = width of the median class
Example :
Calculate the median for the following data
Class           Frequency, f
0 ≤ x < 5            7
5 ≤ x < 10          27
10 ≤ x < 15         35
15 ≤ x < 20         54
20 ≤ x < 25         63
25 ≤ x < 30         43
30 ≤ x < 35         25
35 ≤ x < 40         17
40 ≤ x < 45          9
45 ≤ x < 50          4
SOLUTION:
Class           Frequency, f   Cumulative frequency
0 ≤ x < 5            7                7
5 ≤ x < 10          27               34
10 ≤ x < 15         35               69
15 ≤ x < 20         54              123
20 ≤ x < 25         63              186
25 ≤ x < 30         43              229
30 ≤ x < 35         25              254
35 ≤ x < 40         17              271
40 ≤ x < 45          9              280
45 ≤ x < 50          4              284
                Σf = 284
Since n/2 = 284/2 = 142, the median class is 20 ≤ x < 25, with corresponding frequency 63.
Hence, with Lm = 20, FL = 123, fm = 63 and c = 5, the median is
$$m = L_m + \left(\frac{\frac{n}{2} - F_L}{f_m}\right) c = 20 + \left(\frac{\frac{1}{2}(284) - 123}{63}\right)(5) = 21.51$$
Quartile
Quartiles divide a set of data which are arranged in ascending order into 4 equal parts.
Percentile
Percentiles divide a set of data which are arranged in ascending order into 100 equal parts.
Decile
Deciles divide a set of data which are arranged in ascending order into 10 equal parts.
For grouped data:
$$Q_k = L_k + \left(\frac{\frac{k}{4}\,n - F_L}{f_k}\right) c_k, \qquad k = 1, 2, 3$$
$$P_k = L_k + \left(\frac{\frac{k}{100}\,n - F_L}{f_k}\right) c_k, \qquad k = 1, 2, 3, \ldots, 99$$
$$D_k = L_k + \left(\frac{\frac{k}{10}\,n - F_L}{f_k}\right) c_k, \qquad k = 1, 2, 3, \ldots, 9$$
Lk = lower boundary of the class where Qk, Pk or Dk lies
n = total number of observations
FL = cumulative frequency before the class where Qk, Pk or Dk lies
fk = frequency of the class where Qk, Pk or Dk lies
ck = width of the class where Qk, Pk or Dk lies
Example :
Height (cm)   3-5   6-8   9-11   12-14   15-17   18-20
Frequency      1     2     11      10       5       1

From the above data, calculate:
(a) the first and third quartiles and the interquartile range
(b) the 10th and 90th percentiles
(c) the 5th decile, D5
Solution:
Class limits   Class boundaries   Frequency   Cumulative frequency
3-5            2.5-5.5                1               1
6-8            5.5-8.5                2               3
9-11           8.5-11.5              11              14
12-14          11.5-14.5             10              24
15-17          14.5-17.5              5              29
18-20          17.5-20.5              1              30
(a) First and third quartiles
Q1 = P25: (1/4)(30) = 7.5, so Q1 is in the third class, with boundaries 8.5 - 11.5.
Thus, Lk = 8.5, fk = 11, FL = 3, c = 3
$$Q_1 = 8.5 + \left(\frac{7.5 - 3}{11}\right)(3) = 9.73$$
Q3 = P75: (3/4)(30) = 22.5, so Q3 is in the fourth class, with boundaries 11.5 - 14.5.
Thus, Lk = 11.5, fk = 10, FL = 14, c = 3
$$Q_3 = 11.5 + \left(\frac{22.5 - 14}{10}\right)(3) = 14.05$$
Interquartile range = Q3 - Q1 = 14.05 - 9.73 = 4.32
(b)
$$P_{10} = 5.5 + \left(\frac{3 - 1}{2}\right)(3) = 8.5$$
$$P_{90} = 14.5 + \left(\frac{27 - 24}{5}\right)(3) = 16.3$$
(c) D5 = P50 = median
$$D_5 = 11.5 + \left(\frac{15 - 14}{10}\right)(3) = 11.8$$
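Since the quartile, percentile and decile formulas differ only in the fraction k/4, k/100 or k/10, one function can cover all three. A minimal Python sketch, assuming classes are given by their boundaries and that a position falling exactly on a cumulative frequency belongs to the class where that cumulative frequency is reached (as in the P10 calculation above); `grouped_quantile` is a hypothetical name.

```python
from itertools import accumulate

def grouped_quantile(boundaries, freqs, k, parts):
    """Q_k (parts=4), P_k (parts=100) or D_k (parts=10) for grouped data:
    L_k + ((k/parts * n - F_L) / f_k) * c_k, using the first class whose
    cumulative frequency reaches the position k/parts * n."""
    n = sum(freqs)
    cum = list(accumulate(freqs))
    pos = k * n / parts
    j = next(i for i, cf in enumerate(cum) if cf >= pos)
    lower, upper = boundaries[j]
    f_before = cum[j - 1] if j > 0 else 0     # F_L
    return lower + (pos - f_before) / freqs[j] * (upper - lower)

bounds = [(2.5, 5.5), (5.5, 8.5), (8.5, 11.5), (11.5, 14.5), (14.5, 17.5), (17.5, 20.5)]
freqs = [1, 2, 11, 10, 5, 1]
q1 = grouped_quantile(bounds, freqs, 1, 4)
q3 = grouped_quantile(bounds, freqs, 3, 4)
print(round(q1, 2), round(q3, 2), round(q3 - q1, 2))       # Q1, Q3, IQR
print(round(grouped_quantile(bounds, freqs, 10, 100), 2))   # P10
print(round(grouped_quantile(bounds, freqs, 90, 100), 2))   # P90
print(round(grouped_quantile(bounds, freqs, 5, 10), 2))     # D5
```

These calls are intended to reproduce the hand-worked answers (9.73, 14.05, 4.32, 8.5, 16.3 and 11.8).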
RANGE

Range = upper boundary of the last class - lower boundary of the first class

INTERQUARTILE RANGE
• Defined as the difference between the third quartile and the first quartile

Interquartile range = Q3 - Q1
$$\text{Variance},\; S^2 = \frac{\sum f x^2 - \frac{\left(\sum f x\right)^2}{\sum f}}{\sum f - 1}$$
$$\text{Standard deviation},\; S = \sqrt{\text{Variance}} = \sqrt{S^2}$$
Example :
Find the range, variance and standard deviation
Class interval   Frequency, f   Class mark, x   fx     fx²
1-3                   5               2          10      20
4-6                   3               5          15      75
7-9                   2               8          16     128
10-12                 1              11          11     121
13-15                 6              14          84    1176
16-18                 4              17          68    1156
                 Σf = 21                    Σfx = 204   Σfx² = 2676
Solution:
Range = upper boundary of the last class - lower boundary of the first class
= 18.5 - 0.5 = 18
$$S^2 = \frac{\sum f x^2 - \frac{\left(\sum f x\right)^2}{\sum f}}{\sum f - 1} = \frac{2676 - \frac{204^2}{21}}{20} = 34.71$$
$$S = \sqrt{34.71} = 5.892$$
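The grouped variance formula above can be checked with a short Python sketch that takes class marks and frequencies. `grouped_variance` is a hypothetical name; as in the notes, it divides by Σf - 1 (sample variance).

```python
import math

def grouped_variance(class_marks, freqs):
    """Sample variance of grouped data: (sum f x^2 - (sum f x)^2 / sum f) / (sum f - 1)."""
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, class_marks))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, class_marks))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

marks = [2, 5, 8, 11, 14, 17]    # class marks of 1-3, 4-6, ..., 16-18
freqs = [5, 3, 2, 1, 6, 4]
s2 = grouped_variance(marks, freqs)
print(round(s2, 2), round(math.sqrt(s2), 3))  # 34.71 5.892
```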
Example :
Find the mean, variance and standard deviation.
Marks            Number of students
0 ≤ x < 20               9
20 ≤ x < 40             29
40 ≤ x < 60             42
60 ≤ x < 80             26
80 ≤ x < 100            14
