Chapter 1: Descriptive Statistics: 1.1 Some Terms

UECM2623 Numerical Methods and Statistics/UECM1693 Mathematics for Physics II
Chapter 1: Descriptive Statistics
1.1 Some terms

Raw data
Raw data are data recorded in the sequence in which they are collected, before they are processed or
ranked.
Table 1: The weights of 20 students in kg (Quantitative raw data)
61 68 65 67 68 71 69 63 74 64
66 65 62 67 60 73 69 70 70 71
Table 2: The grades of UCCM2623 of 20 students (Qualitative raw data)
A B C A C B B A B C
B A B B B A C D D B
Arrays
An arrangement of numerical raw data in ascending order or descending order of magnitude
60 61 62 63 64 65 65 66 67 67
68 68 69 69 70 70 71 71 73 74
Ungrouped data
Contains information on each member of a sample or population individually
Examples: Data presented in Table 1 and Table 2
Grouped data
Data presented in classes or intervals.
Example:
UCCM2623 Scores       10-12   13-15   16-18   19-21
Number of students      4      12      20      14
1.2 Organizing and Graphing Qualitative Data
1.2.1 Frequency distributions for qualitative data

A tabular arrangement that lists all categories and the number of elements that belong to each of the
categories.
Example 1.1. A sample was taken of 25 students who were planning to go to college. The courses they
intended to choose were:
Engineering Infotech Engineering Business Business
Business Business Other Biotech Biotech
Biotech Biotech Infotech Biotech Biotech
Other Business Engineering Business Other
Engineering Biotech Biotech Other Infotech
Construct a frequency distribution table for these data.
Solution.
Course        Tally        Frequency
Biotech       ||||| |||        8
Business      ||||| |          6
Engineering   ||||             4
Infotech      |||              3
Others        ||||             4
Total:                        25
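The counting in this solution can be checked with a short script. This is a minimal sketch using Python's `collections.Counter`; the variable names are illustrative, and the data are the 25 responses of Example 1.1 in the order listed.

```python
from collections import Counter

# Intended courses of the 25 students in Example 1.1, in the order listed.
courses = (
    "Engineering Infotech Engineering Business Business "
    "Business Business Other Biotech Biotech "
    "Biotech Biotech Infotech Biotech Biotech "
    "Other Business Engineering Business Other "
    "Engineering Biotech Biotech Other Infotech"
).split()

# Counter maps each category to the number of times it occurs.
frequency = Counter(courses)

for course in sorted(frequency):
    print(f"{course:<12}{frequency[course]:>3}")
print(f"{'Total':<12}{sum(frequency.values()):>3}")
```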
1.2.2 Relative frequency and percentage distributions

Tabular arrangement that lists the relative frequencies and percentages for all categories.
\[ \text{relative frequency of a category} = \frac{\text{frequency of that category}}{\text{sum of all frequencies}} = \frac{f}{\sum f} \]
\[ \text{percentage} = \text{relative frequency} \times 100\% \]
Example 1.2. Determine the relative frequency and percentage distributions for the data in Example 1.1.
Solution.
Course        Relative Frequency   Percentage
Biotech       0.32                 32%
Business      0.24                 24%
Engineering   0.16                 16%
Infotech      0.12                 12%
Others        0.16                 16%
Total:        1                    100%
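As a hedged illustration, the relative frequencies and percentages can be computed directly from the class frequencies. The frequencies below are those obtained by counting the responses in Example 1.1; the dictionary names are illustrative.

```python
# Frequencies from Example 1.1 (course -> number of students).
frequency = {"Biotech": 8, "Business": 6, "Engineering": 4,
             "Infotech": 3, "Others": 4}

total = sum(frequency.values())

# relative frequency = f / sum of all frequencies
relative = {course: f / total for course, f in frequency.items()}
# percentage = relative frequency x 100%
percentage = {course: 100 * r for course, r in relative.items()}

for course in frequency:
    print(f"{course:<12}{relative[course]:.2f}{percentage[course]:>6.0f}%")
```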
1.2.3 Graphical presentation of qualitative data

Bar Graphs (bar chart)
A graph made of bars whose heights represent the frequencies of respective categories.
Example 1.3. Construct a bar chart for the data in Example 1.1.
Solution.
[Figure: bar chart of the data in Example 1.1. Vertical axis: Frequency (0 to 8); horizontal axis: Course (Biotech, Business, Engineering, Infotech, Others).]
1.3 Organizing and graphing quantitative data
1.3.1 Frequency Distribution for quantitative data

Lists all the classes and the number of values that belong to each class.
Data presented in the form of a frequency distribution are called grouped data.
Note:
Generally, the grouping process destroys some of the original information
The classes are non-overlapping i.e. each value belongs to one and only one class
Class
An interval that includes all the values that fall between two numbers, the lower and upper class limits
Class limits
Endpoints of each interval
Class Boundary
Class boundary is the dividing line between two classes. It is given by the midpoint of the upper limit of
one class and the lower limit of the next higher class
Class width / class size

Class width is the difference between the upper and lower class boundaries:
\[ \text{class width} = \text{upper boundary} - \text{lower boundary} \]
Class mark / class midpoint

Class mark is the midpoint of the class interval:
\[ \text{class mark} = \frac{\text{lower class limit} + \text{upper class limit}}{2} \]
Constructing frequency distribution tables
1. Determine the number of classes, which usually varies from 5 to 20, depending mainly on the number of
observations in the data set.
Find $2^k$, where $k$ is the smallest integer such that $2^k$ is greater than the number of observations
($n$); $k$ is then the suggested number of classes.
2. Determine the class interval or width ( i )

Must cover at least the distance from the smallest value (L) in the raw data up to the largest value
(H):
\[ \text{approximate class width} = \frac{\text{largest value } (H) - \text{smallest value } (L)}{\text{number of classes}} \]
The class width is usually rounded to some convenient number.

The rounding of this number may slightly change the number of classes initially intended.
3. Determine the lower limit of the first class or the starting point.
Any convenient number that is equal to or less than the smallest value in the data set can be used
as the lower limit of the first class.
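Steps 1 and 2 can be sketched in a few lines. This is an illustrative sketch, not a fixed procedure: the function names are hypothetical, and the inputs (n = 50, L = 84, H = 146) are taken from the birth-weight data of Example 1.4.

```python
def suggested_number_of_classes(n):
    """Smallest integer k such that 2**k is greater than n (step 1)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def approximate_class_width(smallest, largest, number_of_classes):
    """(H - L) / number of classes, before rounding to a convenient number (step 2)."""
    return (largest - smallest) / number_of_classes


n, L, H = 50, 84, 146   # size and extremes of the Example 1.4 data
k = suggested_number_of_classes(n)
width = approximate_class_width(L, H, k)
print(k, round(width, 2))
```

The approximate width is then rounded to a convenient number such as 10, which, as noted in step 2, may change the number of classes initially intended.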
Example 1.4. Sample of birth-weights (oz) from 50 consecutive deliveries is given below. Construct a
frequency distribution table.
86 111 118 121 92 124 108 104 132 125

120 91 89 122 115 138 118 99 95 115
123 128 134 115 84 138 140 105 124 144
104 133 132 106 98 125 146 108 132 98
121 104 98 115 107 127 122 135 126 89
Solution.
Birthweights (oz)   Tally               f
80-89               ||||                4
90-99               ||||| ||            7
100-109             ||||| |||           8
110-119             ||||| ||            7
120-129             ||||| ||||| |||    13
130-139             ||||| |||           8
140-149             |||                 3
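The tallying for this example can be reproduced programmatically. The sketch below assumes classes of width 10 starting at a lower limit of 80, as in the table; the variable names are illustrative.

```python
# Birth-weights (oz) from Example 1.4.
weights = [
    86, 111, 118, 121, 92, 124, 108, 104, 132, 125,
    120, 91, 89, 122, 115, 138, 118, 99, 95, 115,
    123, 128, 134, 115, 84, 138, 140, 105, 124, 144,
    104, 133, 132, 106, 98, 125, 146, 108, 132, 98,
    121, 104, 98, 115, 107, 127, 122, 135, 126, 89,
]

lower_limit, width, num_classes = 80, 10, 7   # classes 80-89, ..., 140-149

frequency = [0] * num_classes
for w in weights:
    # Integer division gives the index of the class containing w.
    frequency[(w - lower_limit) // width] += 1

for i, f in enumerate(frequency):
    lo = lower_limit + i * width
    print(f"{lo}-{lo + width - 1}: {f}")
```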
1.3.2 Relative frequency and percentage distributions
\[ \text{relative frequency of a class} = \frac{\text{frequency of that class}}{\text{sum of all frequencies}} = \frac{f}{\sum f} \]
\[ \text{percentage} = \text{relative frequency} \times 100\% \]
Example 1.5. Calculate the relative frequencies and percentages distributions for the data in Example
1.4.
Solution.
Birthweights (oz)   Class Boundaries   Relative Frequency   Percentage
80-89               79.5 - 89.5        0.08                 8%
90-99               89.5 - 99.5        0.14                 14%
100-109             99.5 - 109.5       0.16                 16%
110-119             109.5 - 119.5      0.14                 14%
120-129             119.5 - 129.5      0.26                 26%
130-139             129.5 - 139.5      0.16                 16%
140-149             139.5 - 149.5      0.06                 6%
Grouped (quantitative) data can be displayed in a histogram or a polygon.
1.3.3 Histogram
Three types of histogram
1. Frequency histogram
2. Relative frequency histogram
3. Percentage histogram
A frequency histogram consists of a set of rectangles having

a) bases on a horizontal axis, with centres at the class marks and lengths equal to the class interval
sizes;
b) areas proportional to the class frequencies.
If the class intervals all have equal size, the heights of the rectangles are proportional to the class
frequencies; otherwise, the heights of the rectangles must be adjusted.
Procedures to draw a histogram:

1. Mark the class boundary of each interval on the horizontal axis.
2. For each class, mark the frequencies (or relative frequencies or percentages) on the vertical
axis.
3. Draw a bar for each class so that its height represents the frequency of that class. (No gap
between each bar)
4. Label the histogram.
1.3.4 Polygon
A polygon is a line graph formed by joining the midpoints of the tops of successive bars in a histogram.
To close the polygon, we add two more classes (with zero frequencies), one at each end, and mark their midpoints.
Three types of polygon:

1. Frequency polygon
2. Relative frequency polygon
3. Percentage polygon
Example 1.6. Reconsider the data in Example 1.4 and draw

i) the frequency histogram and frequency polygon
ii) the relative frequency histogram and relative frequency polygon
iii) the percentage histogram and percentage polygon
[Figure: the frequency histogram and frequency polygon. Vertical axis: Frequency; horizontal axis: Birth-weight (oz), class boundaries 79.5 to 149.5.]
[Figure: the relative frequency histogram and relative frequency polygon. Vertical axis: Relative Frequency (0.05 to 0.30); horizontal axis: Birth-weight (oz), class boundaries 79.5 to 149.5.]
[Figure: the percentage histogram and percentage polygon. Vertical axis: Percentage Relative Frequency (5 to 30); horizontal axis: Birth-weight (oz), class boundaries 79.5 to 149.5.]
Example 1.7. The frequency distribution gives the weights of 35 objects, measured to the nearest kg.
Draw a histogram to illustrate the data.
Weight (kg)   6-8   9-11   12-17   18-20   21-29
Frequency      4     6      10      3       12
Solution.
\[ \text{adjusted frequency} = \text{frequency} \times \frac{\text{standard class width}}{\text{class width}} \]
Weight (kg)   Class width   Frequency   Height of rectangle (adjusted frequency)
6-8           3             4           4
9-11          3             6           6
12-17         6             10          5
18-20         3             3           3
21-29         9             12          4
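The adjusted heights can be recomputed as a check. This sketch assumes a standard class width of 3 (the width of the narrowest classes) and takes each class width as the distance between its class boundaries; the variable names are illustrative.

```python
# (lower limit, upper limit, frequency) for each class in Example 1.7.
classes = [(6, 8, 4), (9, 11, 6), (12, 17, 10), (18, 20, 3), (21, 29, 12)]
standard_width = 3   # assumed standard class width

adjusted = []
for lower, upper, f in classes:
    # Class boundaries lie 0.5 beyond the limits (data are to the nearest kg).
    width = (upper + 0.5) - (lower - 0.5)
    # adjusted frequency = frequency x standard class width / class width
    adjusted.append(f * standard_width / width)

print(adjusted)
```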
[Figure: histogram with unequal class widths. Vertical axis: Adjusted Frequency (1 to 6); horizontal axis: Weight (kg), 5.5 to 29.5.]
1.3.5 Cumulative frequency distribution

A table that presents the total number of values that fall below the upper boundary of each class.
It is constructed for quantitative data only.
\[ \text{cumulative relative frequency} = \frac{\text{cumulative frequency of a class}}{\text{sum of all frequencies in the data set}} \]
\[ \text{cumulative percentage} = \text{cumulative relative frequency} \times 100\% \]
Example 1.8. Refer to data in Example 1.4, construct its cumulative frequency distribution, cumulative
relative frequency and cumulative percentage.
Birthweights (oz)   Cumulative frequency   Cumulative relative frequency   Cumulative percentage, %
<79.5               0                      0                               0%
<89.5               4                      0.08                            8%
<99.5               11                     0.22                            22%
<109.5              19                     0.38                            38%
<119.5              26                     0.52                            52%
<129.5              39                     0.78                            78%
<139.5              47                     0.94                            94%
<149.5              50                     1                               100%
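A running total reproduces the cumulative columns. The sketch below uses `itertools.accumulate` on the class frequencies obtained by tallying the 50 birth-weights of Example 1.4; the variable names are illustrative.

```python
from itertools import accumulate

upper_boundaries = [89.5, 99.5, 109.5, 119.5, 129.5, 139.5, 149.5]
frequency = [4, 7, 8, 7, 13, 8, 3]   # class frequencies from Example 1.4

# Running total of the frequencies, one entry per upper class boundary.
cumulative = list(accumulate(frequency))
n = cumulative[-1]                   # total number of observations

for boundary, cf in zip(upper_boundaries, cumulative):
    print(f"<{boundary}: {cf:>3}  {cf / n:.2f}  {100 * cf / n:.0f}%")
```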
1.3.6 Ogive / Cumulative frequency curve

A curve drawn for the cumulative frequency distribution by joining the dots marked above the upper
boundaries of classes at heights equal to the cumulative frequencies of respective classes.
Note:
1. The ogive starts at the lower boundary of the first class and ends at the upper boundary of the last
class.
2. If relative cumulative frequency is used in place of cumulative frequency, the graph is called
relative cumulative frequency curve or percentage ogive.
Example 1.9. Draw an ogive for the data in Example 1.4. Estimate from the ogive
a) the total number of deliveries with birth-weights less than 95 oz;
b) the value of X, if 20% of the deliveries had birth-weights of X oz or more.
Solution.
[Figure: ogive. Vertical axis: Cumulative frequency (0 to 55); horizontal axis: Birth-Weight (oz), class boundaries 79.5 to 149.5.]
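Readings from the ogive can be approximated numerically. The following is a sketch under the assumption that the ogive is drawn as straight segments between the plotted points, so reading the graph amounts to linear interpolation; values read from a hand-drawn curve will differ slightly. The function names are hypothetical.

```python
boundaries = [79.5, 89.5, 99.5, 109.5, 119.5, 129.5, 139.5, 149.5]
cumulative = [0, 4, 11, 19, 26, 39, 47, 50]   # 50 deliveries in total


def cumulative_below(x):
    """Estimated number of values below x (read upward from the x-axis)."""
    if x <= boundaries[0]:
        return 0
    for i in range(1, len(boundaries)):
        if x <= boundaries[i]:
            x0, x1 = boundaries[i - 1], boundaries[i]
            y0, y1 = cumulative[i - 1], cumulative[i]
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return cumulative[-1]


def value_at(count):
    """Estimated value below which `count` observations lie (read across)."""
    for i in range(1, len(cumulative)):
        if count <= cumulative[i]:
            x0, x1 = boundaries[i - 1], boundaries[i]
            y0, y1 = cumulative[i - 1], cumulative[i]
            return x0 + (x1 - x0) * (count - y0) / (y1 - y0)
    return boundaries[-1]


# a) number of deliveries below 95 oz
print(cumulative_below(95))
# b) 20% at X oz or more means 80% of the 50 deliveries lie below X
print(value_at(0.80 * 50))
```

Under this assumption, part (a) gives about 8 deliveries and part (b) gives X of about 131 oz.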
1.4 Measures of central tendency

Represent a data set by some numerical measures (typical values).
A single value that summarizes a set of data.
It locates the centre of the values.
Give the centre of a histogram or a frequency distribution curve.
3 measures will be considered here:

1. Median
2. Mode
3. Mean
1.4.1 Median
Median is the value of the middle term in a data set that has been ranked in increasing or decreasing order.
\[ \text{Median} = \text{value of the } \left(\frac{n+1}{2}\right)\text{th term in a ranked data set}, \]
where $n$ is the total number of elements in the set.
Note:
1. If n is odd, then median is the value of the middle term in the ranked data.
2. If n is even, then median is the average value of the two middle terms.
Example 1.10. Find the median of set A = { 10, 5, 19, 8, 3 } and set B = { 2, 7, 3, 6, 4, 5 }
Solution.
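The two cases in the note (n odd and n even) can be illustrated with a small function. This is a minimal sketch; the function name `median` is illustrative.

```python
def median(values):
    """Median of a list: middle term if n is odd, mean of the two middle terms if n is even."""
    ranked = sorted(values)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        return ranked[middle]
    return (ranked[middle - 1] + ranked[middle]) / 2


A = [10, 5, 19, 8, 3]      # ranked: 3, 5, 8, 10, 19
B = [2, 7, 3, 6, 4, 5]     # ranked: 2, 3, 4, 5, 6, 7

print(median(A))   # third term of the ranked set A
print(median(B))   # average of the third and fourth terms of B
```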
Note:
Median is not influenced by the extreme value. (Extreme values are values that are very small or very
large relative to the majority of the values in a data set.)
For grouped data in the form of frequency distribution of single-valued classes

Median can be found either from ungrouped frequency distribution or from the cumulative frequency
distribution.
Example 1.11. Find the median of the following frequency distribution.
No. of children 0 1 2 3 4 5
Frequency 3 5 12 9 4 2
Solution.
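One way to locate the median in a frequency distribution of single-valued classes is to walk through the cumulative frequencies until the middle position is reached. The sketch below illustrates this; the helper name `ranked_term` is hypothetical.

```python
import math

values = [0, 1, 2, 3, 4, 5]          # number of children
frequency = [3, 5, 12, 9, 4, 2]


def ranked_term(position):
    """Value of the term at a 1-based position in the ranked data."""
    running_total = 0
    for value, f in zip(values, frequency):
        running_total += f            # cumulative frequency so far
        if position <= running_total:
            return value
    raise ValueError("position exceeds the number of observations")


n = sum(frequency)
middle = (n + 1) / 2                  # position of the median
# Averaging the floor and ceiling positions also covers the case of even n.
median = (ranked_term(math.floor(middle)) + ranked_term(math.ceil(middle))) / 2
print(n, middle, median)
```

Here n = 35, so the median is the 18th term, which falls in the class with value 2 (cumulative frequencies 3, 8, 20).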
1.4.2 Mode
Mode is the value that occurs with the highest frequency in a data set.
Example 1.12. Find the mode of each of the following data sets.
i) 74, 9, 5, 8, 3, 8, 8
ii) 2, 2, 6, 6, 8, 8, 9, 9
iii) 2, 6, 6, 6, 3, 8, 8, 8, 3
iv) B, C, D, A, A, C, C, C, B, A
Solution.
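The modes can be found by counting occurrences. The sketch below adopts one common convention as an assumption: when every distinct value occurs equally often, the data set is treated as having no mode. The function name `modes` is illustrative.

```python
from collections import Counter


def modes(data):
    """List of modes; empty if all distinct values are equally frequent (assumed convention)."""
    counts = Counter(data)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []   # no value occurs more often than the others
    return [value for value, c in counts.items() if c == highest]


print(modes([74, 9, 5, 8, 3, 8, 8]))            # i)
print(modes([2, 2, 6, 6, 8, 8, 9, 9]))          # ii)
print(modes([2, 6, 6, 6, 3, 8, 8, 8, 3]))       # iii)
print(modes(list("BCDAACCCBA")))                # iv) qualitative data
```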
Note:
1. Mode is not influenced by the extreme value.
2. A data set may have no mode, one mode (unimodal), two modes (bimodal) or more than two
modes (multimodal).
3. Mode can be used for both quantitative and qualitative data
Example 1.13. Find the mode of the following frequency distribution.
No. of children 0 1 2 3 4 5
Frequency 3 5 12 9 4 2
Solution.
1.4.3 Mean
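The mean for population data $x_1, x_2, \ldots, x_N$ is denoted by $\mu$ and is defined as
\[ \mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{1}{N}\sum_{i=1}^{N} x_i \]
The mean for sample data $x_1, x_2, \ldots, x_n$ is denoted by $\bar{X}$ and is defined as
\[ \bar{X} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i \]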
Example 1.14. Find the arithmetic mean for the data set { 158, 189, 265, 127, 191 }
Solution.
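As a quick check of the definition, the arithmetic mean is the sum of the values divided by their number. A minimal sketch with illustrative variable names:

```python
data = [158, 189, 265, 127, 191]

total = sum(data)          # x_1 + x_2 + ... + x_n
n = len(data)
mean = total / n
print(total, n, mean)
```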
Note:
1. The mean does not necessarily take one of the values in the original data.
2. The mean is influenced by extreme values.
For grouped data in the form of frequency distribution of single-valued classes
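\[ \bar{X} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{n} = \frac{1}{n}\sum_{i=1}^{k} f_i x_i = \frac{\sum f_i x_i}{\sum f_i} \]
where $k$ is the number of distinct values and $n = \sum f_i$ is the total number of observations.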
Example 1.15. Find the mean of the following frequency distribution.
xi 2 5 6 8
fi 1 3 4 2
Solution.
x_i       2    5    6    8
f_i       1    3    4    2
f_i x_i   2   15   24   16
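The remaining arithmetic follows the formula above: sum the products and divide by the sum of the frequencies. A minimal sketch with illustrative variable names:

```python
x = [2, 5, 6, 8]
f = [1, 3, 4, 2]

products = [fi * xi for fi, xi in zip(f, x)]   # f_i * x_i for each value
n = sum(f)                                     # total number of observations
mean = sum(products) / n
print(products, n, mean)
```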
For grouped data in the form of frequency distribution
Suppose data are grouped into $k$ class intervals, and
$f_i$ = the frequency of class $i$
$m_i$ = the midpoint of class $i$
$N = \sum f_i$ = population size
$n = \sum f_i$ = sample size
\[ \text{mean for population data: } \mu = \frac{\sum f_i m_i}{N} \]
\[ \text{mean for sample data: } \bar{X} = \frac{\sum f_i m_i}{n} \]
Example 1.16. Find the mean of the following frequency distribution.
Weight (kg)   6-8   9-11   12-17   18-20   21-29
Frequency      4     6      10      3       12
Solution.
Class interval           6-8   9-11   12-17   18-20   21-29
Class midpoint (m_i)     7     10     14.5    19      25
Frequency (f_i)          4     6      10      3       12
f_i m_i                  28    60     145     57      300
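The table and the resulting mean can be reproduced as follows. This sketch treats the data as a sample and takes each midpoint as the average of the class limits; the variable names are illustrative.

```python
# (lower limit, upper limit, frequency) for each class in Example 1.16.
classes = [(6, 8, 4), (9, 11, 6), (12, 17, 10), (18, 20, 3), (21, 29, 12)]

midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
frequency = [f for _, _, f in classes]
products = [f * m for f, m in zip(frequency, midpoints)]   # f_i * m_i

n = sum(frequency)
mean = sum(products) / n
print(midpoints, products, n, round(mean, 2))
```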
1.5 Measures of dispersion

Measures of central tendency alone are sometimes not enough to reveal the whole picture of the
distribution of a data set, because a measure of central tendency does not describe how the data
are spread out.
Data set   Data                                     Mean   Median   Mode
A          1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11     6      6        6
B          4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8       6      6        6

[Figure: plots of Set A (horizontal axis 1 to 11) and Set B (horizontal axis 4 to 8).]
Note: The mean, median and mode are the same for data sets A and B, but the distributions of the data are
different.
1.5.1 Measures of dispersion for ungrouped data

Range
The range for a data set $\{x_1, x_2, \ldots, x_n\}$ is defined to be the difference between the largest value and
the smallest value.
\[ \text{Range} = \text{largest value} - \text{smallest value} \]
Example 1.17. Find the range for data set A and data set B above.
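A minimal sketch of the range calculation for the two data sets; the function name `value_range` is illustrative.

```python
A = [1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]
B = [4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8]


def value_range(data):
    """Range = largest value - smallest value."""
    return max(data) - min(data)


print(value_range(A), value_range(B))
```

Although both sets have mean 6, set A has the larger range, which reflects its wider spread.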
Variance
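The variance is the average of the squared deviations of the data from the mean.
Consider a population of $N$ measurements $x_1, x_2, \ldots, x_N$.
\[ \text{Population mean} = \mu = \frac{1}{N}\sum_{i=1}^{N} x_i \]
\[ \text{Population variance} = \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2 \]
Consider a sample of $n$ measurements $x_1, x_2, \ldots, x_n$.
\[ \text{Sample mean} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i \]
\[ \text{Sample variance} = s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{X})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right] \]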
Standard Deviation
The standard deviation is the positive square root of the variance.
Sample standard deviation = $s = \sqrt{s^2}$
Population standard deviation = $\sigma = \sqrt{\sigma^2}$
Note: 1. A small standard deviation means that the data are distributed closely to their mean.
2. A large standard deviation means that the data are widely scattered about their mean.
3. It is influenced by extreme values.
Example 1.18. Data shows the salary per day for all 6 employees of a small company.
29.50, 16.50, 35.40, 21.30, 49.70, 24.60
Calculate the variance and standard deviation for these data.
Solution.
Mean, $\mu = 177.00 / 6 = 29.50$
x_i       x_i - \mu   (x_i - \mu)^2   x_i^2
29.50       0.00          0.00         870.25
16.50     -13.00        169.00         272.25
35.40       5.90         34.81        1253.16
21.30      -8.20         67.24         453.69
49.70      20.20        408.04        2470.09
24.60      -4.90         24.01         605.16
Total
177.00      0.00        703.10        5924.60
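Method 1:
\[ \text{Population variance} = \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 = \frac{703.10}{6} \approx 117.18 \]
Population standard deviation = $\sigma = \sqrt{117.18} \approx 10.83$
Method 2:
$\sum x_i^2 = 5924.60$
\[ \text{Population variance} = \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2 = \frac{5924.60}{6} - 29.50^2 \approx 117.18 \]
Population standard deviation = $\sigma = \sqrt{117.18} \approx 10.83$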
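Both methods can be verified numerically. The sketch below computes the population variance from the definition (Method 1) and from the shortcut formula (Method 2) for the six salaries of Example 1.18; the variable names are illustrative.

```python
import math

salaries = [29.50, 16.50, 35.40, 21.30, 49.70, 24.60]
N = len(salaries)
mu = sum(salaries) / N

# Method 1: average of the squared deviations from the mean.
variance_1 = sum((x - mu) ** 2 for x in salaries) / N

# Method 2: mean of the squares minus the square of the mean.
variance_2 = sum(x * x for x in salaries) / N - mu ** 2

sigma = math.sqrt(variance_1)
print(round(mu, 2), round(variance_1, 2), round(variance_2, 2), round(sigma, 2))
```

The two results agree up to floating-point rounding, as expected from the algebraic identity between the two formulas.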
Example 1.19. A sample consists of 5 data values: 72, 49, 79, 55 and 57. Calculate the variance and
standard deviation.
Solution.
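$n = 5$, $\sum x_i = 312$
$\sum x_i^2 = 20100$
\[ \text{Sample variance} = s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right] = \frac{1}{4}\left[20100 - \frac{312^2}{5}\right] = 157.8 \]
Sample standard deviation = $s = \sqrt{157.8} \approx 12.56$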
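The sample variance can be checked with the shortcut formula and cross-checked against Python's `statistics.variance`, which also divides by n - 1. A minimal sketch with illustrative variable names:

```python
import math
import statistics

sample = [72, 49, 79, 55, 57]
n = len(sample)

sum_x = sum(sample)                    # sum of the values
sum_x2 = sum(x * x for x in sample)    # sum of the squared values

# Shortcut formula with divisor n - 1.
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
s = math.sqrt(s2)

print(sum_x, sum_x2, round(s2, 2), round(s, 2))
print(statistics.variance(sample))     # library cross-check
```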
1.5.2 Measures of dispersion for grouped data

Variance
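\[ \text{Population variance} = \sigma^2 = \frac{1}{N}\sum f_i (m_i - \mu)^2 = \frac{\sum f_i m_i^2}{N} - \left(\frac{\sum f_i m_i}{N}\right)^2 \]
\[ \text{Sample variance} = s^2 = \frac{1}{n-1}\sum f_i (m_i - \bar{X})^2 = \frac{1}{n-1}\left[\sum f_i m_i^2 - \frac{1}{n}\left(\sum f_i m_i\right)^2\right] \]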

Example 1.20. Find the variance from the following frequency distribution if it represents
a) a population
b) a sample
Height (m)   20-22   23-25   26-28   29-31   32-34
Frequency      3       6      12       9       2
Solution.
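Height   Midpoint, m   Frequency, f   fm     fm^2
20-22    21            3              63     1323
23-25    24            6              144    3456
26-28    27            12             324    8748
29-31    30            9              270    8100
32-34    33            2              66     2178
Total:                 32             867    23805
\[ \sigma^2 = \frac{\sum f_i m_i^2}{N} - \left(\frac{\sum f_i m_i}{N}\right)^2 = \frac{23805}{32} - \left(\frac{867}{32}\right)^2 \approx 9.835 \]
\[ s^2 = \frac{1}{n-1}\left[\sum f_i m_i^2 - \frac{1}{n}\left(\sum f_i m_i\right)^2\right] = \frac{1}{31}\left[23805 - \frac{867^2}{32}\right] \approx 10.152 \]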
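The grouped-data calculation can be reproduced as follows. This is a sketch using the class limits and frequencies from the solution table, with midpoints taken as the average of the class limits; the variable names are illustrative.

```python
# (lower limit, upper limit, frequency) for each class in Example 1.20.
classes = [(20, 22, 3), (23, 25, 6), (26, 28, 12), (29, 31, 9), (32, 34, 2)]

midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
frequency = [f for _, _, f in classes]

sum_f = sum(frequency)                                         # N or n
sum_fm = sum(f * m for f, m in zip(frequency, midpoints))
sum_fm2 = sum(f * m * m for f, m in zip(frequency, midpoints))

# a) treated as a population: divide by N
population_variance = sum_fm2 / sum_f - (sum_fm / sum_f) ** 2
# b) treated as a sample: divide by n - 1
sample_variance = (sum_fm2 - sum_fm ** 2 / sum_f) / (sum_f - 1)

print(sum_f, sum_fm, sum_fm2)
print(round(population_variance, 3), round(sample_variance, 3))
```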
1.6 Measures of position

Measures of position determine the position of a single value in relation to other values in a sample or a
population data set.
1.6.1 Quartiles
Quartiles are 3 summary measures that divide a ranked data set into 4 equal parts.
- second quartile (Q2) is the median of a data set.
- first quartile (Q1) is the value of the middle term among the observations that are less than
the median.
- third quartile (Q3) is the value of the middle term among the observations that are greater
than the median.
To Find The Quartiles of Ungrouped Data
Consider $n$ items arranged in ascending order. Then,
The first quartile = Lower quartile = $Q_1$ = $\frac{1}{4}(n+1)$th value
The second quartile = Median = $Q_2$ = $\frac{1}{2}(n+1)$th value
The third quartile = Upper quartile = $Q_3$ = $\frac{3}{4}(n+1)$th value
When $n$ is odd, the rule locates the exact position of the quartiles.
When $n$ is even,
a) When $n$ is even and $\frac{n}{2}$ is even, round all decimal values of $\frac{1}{4}(n+1)$ or $\frac{3}{4}(n+1)$
to the .5 value, for example:
2.25 → 2.5
6.75 → 6.5
b) When $n$ is even and $\frac{n}{2}$ is odd, round up a decimal value of $\frac{1}{4}(n+1)$ or $\frac{3}{4}(n+1)$
that is greater than the .5 value and round down a value that is smaller than the .5 value, for
example:
3.75 → 4
2.25 → 2
To Find The Quartiles of Grouped Data (from Ogive)

The first quartile = Lower quartile = $Q_1$ = $\frac{n}{4}$th value
The second quartile = Median = $Q_2$ = $\frac{n}{2}$th value
The third quartile = Upper quartile = $Q_3$ = $\frac{3n}{4}$th value
1.6.2 Interquartile Range(IQR)

Interquartile Range, $\text{IQR} = Q_3 - Q_1$
The semi-interquartile range = The quartile deviation = $\frac{Q_3 - Q_1}{2}$
1.6.3 Percentiles
The (approximate) value of the $k$th percentile, denoted by $P_k$, is
\[ P_k = \text{value of the } \left(\frac{kn}{100}\right)\text{th term in a ranked data set} \]
where $k$ denotes the number of the percentile and $n$ represents the sample size. Note that $\frac{kn}{100}$ is rounded to
the nearest integer or .5 value, for example:
2.2 → 2.0
2.3 → 2.5
2.7 → 2.5
2.8 → 3.0
Example 1.21. The following are the scores of 12 students in a mathematics class.
75 80 68 53 99 58 76 73 85 88 91 79
a) Find the values of the three quartiles. Where does the score of 88 lie in relation to these quartiles?
b) Find the interquartile range.
c) Find the quartile deviation.
d) Find the value of the 62nd percentile.
Solution.
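The position rules of Sections 1.6.1 and 1.6.3 can be sketched in code. This is one possible reading of those rules, not a standard library method (other texts and software use different quartile conventions); the function names are hypothetical, and a position ending in .5 is taken to mean the average of the two neighbouring terms.

```python
import math


def value_at(ranked, position):
    """Value at a 1-based position that is an integer or ends in .5."""
    lower = ranked[math.floor(position) - 1]
    upper = ranked[math.ceil(position) - 1]
    return (lower + upper) / 2


def quartile_position(n, q):
    """Position q/4 * (n + 1), rounded by rules (a) and (b) when n is even."""
    position = q * (n + 1) / 4
    if (2 * position).is_integer():
        return position                      # already an integer or a .5 value
    if (n // 2) % 2 == 0:
        return math.floor(position) + 0.5    # rule (a): move to the .5 value
    return float(round(position))            # rule (b): .25 down, .75 up


def percentile_position(n, k):
    """Position k*n/100 rounded to the nearest integer or .5 value."""
    # Exact ties (positions ending in .25 or .75) are not covered by the text.
    return round(2 * k * n / 100) / 2


scores = sorted([75, 80, 68, 53, 99, 58, 76, 73, 85, 88, 91, 79])
n = len(scores)

q1, q2, q3 = (value_at(scores, quartile_position(n, q)) for q in (1, 2, 3))
iqr = q3 - q1
quartile_deviation = iqr / 2
p62 = value_at(scores, percentile_position(n, 62))

print(q1, q2, q3)
print(iqr, quartile_deviation, p62)
print("88 lies above Q3" if 88 > q3 else "88 does not lie above Q3")
```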