UECM2623 Numerical Methods and Statistics/UECM1693 Mathematics for Physics II

Chapter 1: Descriptive Statistics

1.1 Some terms


Raw data
Raw data are data recorded in the sequence in which they are collected, before they are processed or
ranked.

Table 1: The weights of 20 students in kg (Quantitative raw data)

61 68 65 67 68 71 69 63 74 64
66 65 62 67 60 73 69 70 70 71

Table 2: The grades of UCCM2623 of 20 students (Qualitative raw data)

A B C A C B B A B C
B A B B B A C D D B

Arrays
An arrangement of numerical raw data in ascending order or descending order of magnitude

60 61 62 63 64 65 65 66 67 67
68 68 69 69 70 70 71 71 73 74

Ungrouped data
Contains information on each member of a sample or population individually
Examples: Data presented in Table 1 and Table 2

Grouped data
Data presented in classes or intervals.

Example:

UCCM2623 Scores       10-12   13-15   16-18   19-21
Number of students      4      12      20      14

1.2 Organizing and Graphing Qualitative Data

1.2.1 Frequency distributions for qualitative data


A tabular arrangement that lists all categories and the number of elements that belong to each of the
categories.

Example 1.1. A sample was taken of 25 students who were planning to go to college. The course each
student intended to choose is listed below:
Engineering Infotech Engineering Business Business
Business Business Other Biotech Biotech
Biotech Biotech Infotech Biotech Biotech
Other Business Engineering Business Other
Engineering Biotech Biotech Other Infotech
Construct a frequency distribution table for these data.


Solution.
Course        Tally          Frequency
Biotech       ||||| |||          8
Business      ||||| |            6
Engineering   ||||               4
Infotech      |||                3
Others        ||||               4
Total:                          25

1.2.2 Relative frequency and percentage distributions


Tabular arrangement that lists the relative frequencies and percentages for all categories.

\[ \text{relative frequency of a category} = \frac{\text{frequency of that category}}{\text{sum of all frequencies}} = \frac{f}{\sum f} \]

\[ \text{percentage} = \text{relative frequency} \times 100\% \]

Example 1.2. Determine the relative frequency and percentage distributions for the data in Example 1.1.

Solution.
Course        Relative Frequency   Percentage
Biotech             0.32              32%
Business            0.24              24%
Engineering         0.16              16%
Infotech            0.12              12%
Others              0.16              16%
Total:              1                100%
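
The counting in Examples 1.1 and 1.2 can be sketched in Python as follows. This is only an illustrative sketch: the variable names (`courses`, `frequency`, `relative`, `percentage`) are not part of the notes, and the category is spelled "Other" as in the raw data.

```python
from collections import Counter

# Raw data of Example 1.1, row by row.
courses = """
Engineering Infotech Engineering Business Business
Business Business Other Biotech Biotech
Biotech Biotech Infotech Biotech Biotech
Other Business Engineering Business Other
Engineering Biotech Biotech Other Infotech
""".split()

frequency = Counter(courses)            # frequency of each category
total = sum(frequency.values())         # sum of all frequencies

# relative frequency = f / sum of f; percentage = relative frequency * 100
relative = {course: f / total for course, f in frequency.items()}
percentage = {course: 100 * r for course, r in relative.items()}

for course in sorted(frequency):
    print(f"{course:<12}{frequency[course]:>4}{relative[course]:>8.2f}{percentage[course]:>7.0f}%")
```

The relative frequencies always sum to 1 and the percentages to 100%, which is a quick check on the table.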

1.2.3 Graphical presentation of qualitative data


Bar Graphs (bar chart)
A graph made of bars whose heights represent the frequencies of respective categories.

Example 1.3. Construct a bar chart for the data in Example 1.1.

Solution.
[Figure: bar chart with Frequency on the vertical axis (scale 0 to 8) and Course on the horizontal axis
(Biotech, Business, Engineering, Infotech, Others).]


1.3 Organizing and graphing quantitative data

1.3.1 Frequency Distribution for quantitative data


Lists all the classes and the number of values that belong to each class.
Data presented in the form of a frequency distribution are called grouped data.

Note:
Generally, the grouping process destroys some of the original information.
The classes are non-overlapping, i.e. each value belongs to one and only one class.

Class
An interval that includes all the values that fall between two numbers, the lower and upper limits

Class limits
Endpoints of each interval

Class Boundary
Class boundary is the dividing line between two classes. It is given by the midpoint of the upper limit of
one class and the lower limit of the next higher class

Class width / class size


Class width is the difference between the upper and lower class boundaries.
\[ \text{class width} = \text{upper boundary} - \text{lower boundary} \]

Class mark / class midpoint


Class mark is the midpoint of the class interval.
\[ \text{class mark} = \frac{\text{lower class limit} + \text{upper class limit}}{2} \]

Constructing frequency distribution tables

1. Determine the number of classes. This usually varies from 5 to 20, depending mainly on the number of
observations in the data set.
Find $2^k$, where $k$ is the smallest integer such that $2^k$ is greater than the number of observations
($n$); $k$ is then taken as the number of classes.

2. Determine the class interval or width ( i )
The classes must cover at least the distance from the smallest value (L) in the raw data up to the largest
value (H).
\[ \text{approximate class width} = \frac{\text{largest value } (H) - \text{smallest value } (L)}{\text{number of classes}} \]
The class width is usually rounded to some convenient number.
The rounding of this number may slightly change the number of classes initially intended.

3. Determine the lower limit of the first class or the starting point.
Any convenient number that is equal to or less than the smallest value in the data set can be used
as the lower limit of the first class.
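
Steps 1 and 2 can be sketched in Python as below, applied to the 20 weights of Table 1. The function names `number_of_classes` and `approximate_class_width` are hypothetical helpers introduced only for this illustration.

```python
def number_of_classes(n):
    """Return the smallest integer k such that 2**k is greater than n (step 1)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def approximate_class_width(values, k):
    """Return (H - L) / k, the approximate class width before rounding (step 2)."""
    return (max(values) - min(values)) / k


# Weights (kg) of the 20 students in Table 1.
weights = [61, 68, 65, 67, 68, 71, 69, 63, 74, 64,
           66, 65, 62, 67, 60, 73, 69, 70, 70, 71]

k = number_of_classes(len(weights))          # 2**4 = 16 <= 20 < 32 = 2**5
width = approximate_class_width(weights, k)  # (74 - 60) / 5

print(k, width)
```

Here the approximate width of 2.8 kg would normally be rounded to a convenient value such as 3 kg, and a starting point of 60 kg or lower would then be chosen as in step 3.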


Example 1.4. A sample of birth-weights (oz) from 50 consecutive deliveries is given below. Construct a
frequency distribution table.

86 111 118 121 92 124 108 104 132 125


120 91 89 122 115 138 118 99 95 115
123 128 134 115 84 138 140 105 124 144
104 133 132 106 98 125 146 108 132 98
121 104 98 115 107 127 122 135 126 89

Solution.

Birthweights (oz)   Tally                 f
80-89               ||||                  4
90-99               ||||| ||              7
100-109             ||||| |||             8
110-119             ||||| ||              7
120-129             ||||| ||||| |||      13
130-139             ||||| |||             8
140-149             |||                   3
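
As a rough check of the table, the class frequencies can be counted in Python. This is a sketch that assumes the class limits 80-89, 90-99, ..., 140-149 used in the solution; the names `birth_weights` and `frequency` are illustrative only.

```python
# Birth-weights (oz) of Example 1.4, row by row.
birth_weights = [
    86, 111, 118, 121, 92, 124, 108, 104, 132, 125,
    120, 91, 89, 122, 115, 138, 118, 99, 95, 115,
    123, 128, 134, 115, 84, 138, 140, 105, 124, 144,
    104, 133, 132, 106, 98, 125, 146, 108, 132, 98,
    121, 104, 98, 115, 107, 127, 122, 135, 126, 89,
]

# Count how many values fall between the lower and upper limit of each class.
frequency = {}
for lower in range(80, 150, 10):
    upper = lower + 9
    frequency[(lower, upper)] = sum(lower <= w <= upper for w in birth_weights)

for (lower, upper), f in frequency.items():
    print(f"{lower}-{upper}: {f}")
```

Because the classes do not overlap and together cover the whole range of the data, the frequencies add up to the number of observations (50).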

1.3.2 Relative frequency and percentage distributions

\[ \text{relative frequency of a class} = \frac{\text{frequency of that class}}{\text{sum of all frequencies}} = \frac{f}{\sum f} \]

\[ \text{percentage} = \text{relative frequency} \times 100\% \]

Example 1.5. Calculate the relative frequency and percentage distributions for the data in Example
1.4.

Solution.

Birthweights (oz)   Class Boundaries   Relative Frequency   Percentage
80-89               79.5 - 89.5              0.08               8%
90-99               89.5 - 99.5              0.14              14%
100-109             99.5 - 109.5             0.16              16%
110-119             109.5 - 119.5            0.14              14%
120-129             119.5 - 129.5            0.26              26%
130-139             129.5 - 139.5            0.16              16%
140-149             139.5 - 149.5            0.06               6%


Grouped (quantitative) data can be displayed in a histogram or a polygon.

1.3.3 Histogram
Three types of histogram
1. Frequency histogram
2. Relative frequency histogram
3. Percentage histogram

A frequency histogram consists of a set of rectangles having
a) bases on a horizontal axis, with centres at the class marks and lengths equal to the class interval
sizes;
b) areas proportional to the class frequencies.

If the class intervals all have equal size, the heights of the rectangles are proportional to the class
frequencies; otherwise, the heights of the rectangles must be adjusted.

Procedures to draw a histogram:


1. Mark the class boundary of each interval on the horizontal axis.
2. For each class, mark the frequencies (or relative frequencies or percentages) on the vertical
axis.
3. Draw a bar for each class so that its height represents the frequency of that class. (No gap
between each bar)
4. Label the histogram.

1.3.4 Polygon
A polygon is a line graph formed by joining the midpoints of the tops of successive bars in a histogram.
To close the polygon, we add two more classes (with zero frequencies), one at each end, and join their
midpoints to the graph.

Three types of polygon:


1. Frequency polygon
2. Relative frequency polygon
3. Percentage polygon

Example 1.6. Reconsider the data in Example 1.4 and draw


i) the frequency histogram and frequency polygon
ii) the relative frequency histogram and relative frequency polygon
iii) the percentage histogram and percentage polygon


[Figure: the frequency histogram and frequency polygon. Vertical axis: Frequency; horizontal axis:
Birth-weight (oz), class boundaries 79.5 to 149.5 in steps of 10.]

[Figure: the relative frequency histogram and relative frequency polygon. Vertical axis: Relative Frequency
(0.05 to 0.30); horizontal axis: Birth-weight (oz), class boundaries 79.5 to 149.5 in steps of 10.]

[Figure: the percentage histogram and percentage polygon. Vertical axis: Percentage Relative Frequency
(5 to 30); horizontal axis: Birth-weight (oz), class boundaries 79.5 to 149.5 in steps of 10.]

Example 1.7. The frequency distribution gives the weight of 35 objects, measured to the nearest kg.
Draw a histogram to illustrate the data.

Weight (kg)   6-8   9-11   12-17   18-20   21-29
Frequency      4     6      10       3      12

Solution.
\[ \text{adjusted frequency} = \text{frequency} \times \frac{\text{standard class width}}{\text{class width}} \]


Weight (kg)   Class width   Frequency   Height of rectangle (adjusted frequency)
6-8                3            4            4
9-11               3            6            6
12-17              6           10            5
18-20              3            3            3
21-29              9           12            4
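
The adjusted frequencies of Example 1.7 can be reproduced with a short Python sketch. It assumes a standard class width of 3 kg, which is what the table above appears to use; `class_width` and the other names are hypothetical and chosen only for this illustration.

```python
# Class limits and frequencies of Example 1.7 (weights to the nearest kg).
classes = [(6, 8), (9, 11), (12, 17), (18, 20), (21, 29)]
frequencies = [4, 6, 10, 3, 12]

# Assumption: the narrowest class width (3 kg) is taken as the standard width.
standard_width = 3


def class_width(limits):
    """Width between class boundaries, which lie 0.5 beyond the class limits."""
    lower, upper = limits
    return (upper + 0.5) - (lower - 0.5)


# adjusted frequency = frequency * standard class width / class width
adjusted = [f * standard_width / class_width(c)
            for f, c in zip(frequencies, classes)]

for (lower, upper), height in zip(classes, adjusted):
    print(f"{lower}-{upper}: height {height}")
```

With these heights, the area of each rectangle (height times class width) is proportional to the class frequency, as a histogram requires.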

[Figure: histogram with Adjusted Frequency on the vertical axis (scale 1 to 6) and Weight (kg) on the
horizontal axis (marked from 5.5 to 29.5 in steps of 3).]

1.3.5 Cumulative frequency distribution


A table that presents the total number of values that fall below the upper boundary of each class.
It is constructed for quantitative data only.
\[ \text{cumulative relative frequency} = \frac{\text{cumulative frequency of a class}}{\text{sum of all frequencies in the data set}} \]
\[ \text{cumulative percentage} = \text{cumulative relative frequency} \times 100\% \]

Example 1.8. Refer to the data in Example 1.4. Construct the cumulative frequency, cumulative
relative frequency and cumulative percentage distributions.

Birthweights (oz)   Cumulative   Cumulative relative   Cumulative
                    frequency    frequency             percentage, %
<79.5                   0            0                     0%
<89.5                   4            0.08                  8%
<99.5                  11            0.22                 22%
<109.5                 19            0.38                 38%
<119.5                 26            0.52                 52%
<129.5                 39            0.78                 78%
<139.5                 47            0.94                 94%
<149.5                 50            1                   100%
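
A running total gives the cumulative frequencies directly. The sketch below uses the class frequencies obtained for Example 1.4 (4, 7, 8, 7, 13, 8, 3); the variable names are illustrative only.

```python
from itertools import accumulate

# Upper class boundaries and class frequencies for the data of Example 1.4.
upper_boundaries = [89.5, 99.5, 109.5, 119.5, 129.5, 139.5, 149.5]
frequencies = [4, 7, 8, 7, 13, 8, 3]

cumulative = list(accumulate(frequencies))   # running totals
total = cumulative[-1]                       # sum of all frequencies

cumulative_relative = [cf / total for cf in cumulative]
cumulative_percentage = [100 * r for r in cumulative_relative]

for boundary, cf, r, p in zip(upper_boundaries, cumulative,
                              cumulative_relative, cumulative_percentage):
    print(f"<{boundary:<7}{cf:>4}{r:>7.2f}{p:>6.0f}%")
```

The last cumulative frequency must equal the number of observations, so the final row is 50, with cumulative relative frequency 1.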

1.3.6 Ogive / Cumulative frequency curve


A curve drawn for the cumulative frequency distribution by joining the dots marked above the upper
boundaries of classes at heights equal to the cumulative frequencies of respective classes.


Note:
1. The ogive starts at the lower boundary of the first class and ends at the upper boundary of the last
class.
2. If relative cumulative frequency is used in place of cumulative frequency, the graph is called
relative cumulative frequency curve or percentage ogive.

Example 1.9. Draw an ogive for the data in Example 1.4. Estimate from the ogive,
a) the total number of deliveries with birth-weights less than 95 oz.
b) the value of X, if 20% of the deliveries had birth-weights of X oz or more.

Solution.

[Figure: ogive with Cumulative frequency on the vertical axis (scale 0 to 55) and Birth-Weight (oz) on the
horizontal axis (class boundaries 79.5 to 149.5 in steps of 10).]

1.4 Measures of central tendency


Represent a data set by some numerical measures (typical values).
A single value that summarizes a set of data.
It locates the centre of the values.
Give the centre of a histogram or a frequency distribution curve.

3 measures will be considered here:


1. Median
2. Mode
3. Mean

1.4.1 Median
Median is the value of the middle term in a data set that has been ranked in increasing or decreasing order.

\[ \text{Median} = \text{value of the } \left(\frac{n+1}{2}\right)\text{th term in a ranked data set}, \qquad n = \text{total number of elements in the set}. \]

Note:
1. If n is odd, then median is the value of the middle term in the ranked data.
2. If n is even, then median is the average value of the two middle terms.

Example 1.10. Find the median of set A = { 10, 5, 19, 8, 3 } and set B = { 2, 7, 3, 6, 4, 5 }

Solution.

Note:
The median is not influenced by extreme values. (Extreme values are values that are very small or very
large relative to the majority of the values in a data set.)
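
The rule for odd and even n can be sketched in Python and tried on the two sets of Example 1.10. The function name `median` and the modified list `set_a_extreme` are illustrative assumptions, not part of the notes.

```python
def median(values):
    """Middle term of the ranked data; average of the two middle terms if n is even."""
    ranked = sorted(values)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        return ranked[middle]                        # the ((n + 1) / 2)th term
    return (ranked[middle - 1] + ranked[middle]) / 2


set_a = [10, 5, 19, 8, 3]
set_b = [2, 7, 3, 6, 4, 5]

print(median(set_a))   # ranked: 3, 5, 8, 10, 19
print(median(set_b))   # ranked: 2, 3, 4, 5, 6, 7

# Replacing the largest value of A by an extreme value leaves the median unchanged.
set_a_extreme = [10, 5, 1900, 8, 3]
print(median(set_a_extreme))
```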

For grouped data in the form of a frequency distribution of single-valued classes

The median can be found either from the ungrouped frequency distribution or from the cumulative frequency
distribution.

Example 1.11. Find the median of the following frequency distribution.

No. of children 0 1 2 3 4 5
Frequency 3 5 12 9 4 2

Solution.

1.4.2 Mode
Mode is the value that occurs with the highest frequency in a data set.

Example 1.12. Find the mode of each of the following data sets.
i) 74, 9, 5, 8, 3, 8, 8
ii) 2, 2, 6, 6, 8, 8, 9, 9
iii) 2, 6, 6, 6, 3, 8, 8, 8, 3
iv) B, C, D, A, A, C, C, C, B, A

Solution.

Note:
1. The mode is not influenced by extreme values.
2. A data set may have no mode, one mode (unimodal), two modes (bimodal) or more than two
modes (multimodal).
3. The mode can be used for both quantitative and qualitative data.
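
A minimal Python sketch for finding modes is given below, applied to the data sets of Example 1.12. It assumes the convention that a data set in which every value occurs equally often has no mode; the function name `modes` is hypothetical.

```python
from collections import Counter


def modes(values):
    """Return the list of values that occur with the highest frequency.

    Assumption: if there are several distinct values and all of them occur
    equally often, the data set is treated as having no mode (empty list).
    """
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return [value for value, c in counts.items() if c == highest]


data_i = [74, 9, 5, 8, 3, 8, 8]
data_ii = [2, 2, 6, 6, 8, 8, 9, 9]
data_iii = [2, 6, 6, 6, 3, 8, 8, 8, 3]
data_iv = list("BCDAACCCBA")

for data in (data_i, data_ii, data_iii, data_iv):
    print(modes(data))
```

The same function works for the qualitative data in iv), since it only counts occurrences and never does arithmetic on the values.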

Example 1.13. Find the mode of the following frequency distribution.

No. of children 0 1 2 3 4 5
Frequency 3 5 12 9 4 2

Solution.

1.4.3 Mean
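The mean for population data $x_1, x_2, \ldots, x_N$ is denoted by $\mu$ and is defined as
\[ \mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{1}{N}\sum_{i=1}^{N} x_i \]
The mean for sample data $x_1, x_2, \ldots, x_n$ is denoted by $\bar{X}$ and is defined as
\[ \bar{X} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i \]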

Example 1.14. Find the arithmetic mean for the data set { 158, 189, 265, 127, 191 }

Solution.

Note:
1. The mean is not necessarily one of the values in the original data.
2. The mean is influenced by extreme values.

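For grouped data in the form of a frequency distribution of single-valued classes

If the distinct values $x_1, x_2, \ldots, x_k$ occur with frequencies $f_1, f_2, \ldots, f_k$ and $n = \sum f_i$, then
\[ \bar{X} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{n} = \frac{1}{n}\sum_{i=1}^{k} f_i x_i = \frac{\sum f_i x_i}{\sum f_i} \]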

Example 1.15. Find the mean of the following frequency distribution.

xi 2 5 6 8
fi 1 3 4 2

Solution.
xi        2    5    6    8
fi        1    3    4    2
fi xi     2   15   24   16
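
The table of Example 1.15 can be completed and the mean computed with a few lines of Python. This is an illustrative sketch; the names `x`, `f`, `fx` and `mean` are chosen here and do not come from the notes.

```python
# Values and frequencies of Example 1.15.
x = [2, 5, 6, 8]
f = [1, 3, 4, 2]

# Row f_i * x_i of the solution table.
fx = [fi * xi for fi, xi in zip(f, x)]

# Mean = sum of f_i * x_i divided by sum of f_i.
mean = sum(fx) / sum(f)

print(fx, sum(fx), sum(f), mean)
```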


For grouped data in the form of a frequency distribution

Suppose the data are grouped into k class intervals, and

$f_i$ = the frequency of class i
$m_i$ = the midpoint of class i
$N = \sum f_i$ = population size
$n = \sum f_i$ = sample size

mean for population data:
\[ \mu = \frac{\sum f_i m_i}{N} \]

mean for sample data:
\[ \bar{X} = \frac{\sum f_i m_i}{n} \]

Example 1.16. Find the mean of the following frequency distribution.

Weight (kg)   6-8   9-11   12-17   18-20   21-29
Frequency      4     6      10       3      12

Solution.
Class interval           6-8   9-11   12-17   18-20   21-29
Class midpoint ( mi )     7     10    14.5     19      25
Frequency ( f i )         4      6     10       3      12
f i mi                   28     60    145      57     300
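
The grouped mean of Example 1.16 can be checked with the following Python sketch, which treats the data as a sample. The names `classes`, `midpoints` and `fm` are illustrative assumptions.

```python
# Class limits and frequencies of Example 1.16.
classes = [(6, 8), (9, 11), (12, 17), (18, 20), (21, 29)]
frequencies = [4, 6, 10, 3, 12]

# Class midpoint m_i = (lower class limit + upper class limit) / 2.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Row f_i * m_i of the solution table.
fm = [f * m for f, m in zip(frequencies, midpoints)]

n = sum(frequencies)
mean = sum(fm) / n

print(midpoints, fm, n, round(mean, 2))
```

Because each observation is replaced by its class midpoint, the result is only an approximation to the mean of the original ungrouped data.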

1.5 Measures of dispersion


Sometimes the measures of central tendency alone are not enough to reveal the whole picture of the
distribution of a data set, because a measure of central tendency does not describe how the data
are spread out.

Data set Data Mean Median Mode


A 1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11 6 6 6
B 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8 6 6 6

[Figure: dot plots of Set A (horizontal axis 1 to 11) and Set B (horizontal axis 4 to 8).]

Note: The mean, median and mode are the same for data sets A and B, but the distributions of the data are
different.

1.5.1 Measures of dispersion for ungrouped data


Range
The range for a data set $\{x_1, x_2, \ldots, x_n\}$ is defined to be the difference between the largest value and
the smallest value.

\[ \text{Range} = \text{largest value} - \text{smallest value} \]


Example 1.17. Find the range for data set A and data set B above.

Variance
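The variance is the average of the squared deviations of the data from the mean.

Consider a population of N measurements $x_1, x_2, \ldots, x_N$

Population Mean:
\[ \mu = \frac{1}{N}\sum_{i=1}^{N} x_i \]

Population Variance:
\[ \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2 \]

Consider a sample of n measurements $x_1, x_2, \ldots, x_n$

Sample Mean:
\[ \bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i \]

Sample Variance:
\[ s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{X})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right] \]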

Standard Deviation
The standard deviation is the positive square root of the variance.
Sample standard deviation: $s = \sqrt{s^2}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$

Note: 1. A small standard deviation means that the data are distributed closely to their mean.
2. A large standard deviation means that the data are widely scattered about their mean.
3. It is influenced by extreme values.

Example 1.18. The data below show the daily salaries of all 6 employees of a small company.
29.50, 16.50, 35.40, 21.30, 49.70, 24.60
Calculate the variance and standard deviation for these data.

Solution.
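Mean, $\mu = 177.00 / 6 = 29.50$

x_i       x_i - mu    (x_i - mu)^2    x_i^2
29.50       0.00          0.00         870.25
16.50     -13.00        169.00         272.25
35.40       5.90         34.81        1253.16
21.30      -8.20         67.24         453.69
49.70      20.20        408.04        2470.09
24.60      -4.90         24.01         605.16
Total     177.00   0.00 703.10        5924.60
(totals in order: sum of x_i = 177.00, sum of deviations = 0.00, sum of squared deviations = 703.10,
sum of x_i^2 = 5924.60)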


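Method 1:
\[ \text{Population variance} = \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 = \frac{703.10}{6} \approx 117.18 \]
\[ \text{Population standard deviation} = \sigma = \sqrt{117.18} \approx 10.83 \]

Method 2:
\[ \text{Population variance} = \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2 = \frac{5924.60}{6} - 29.50^2 \approx 117.18 \]
\[ \text{Population standard deviation} = \sigma = \sqrt{117.18} \approx 10.83 \]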
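
Both methods for the population variance can be compared numerically with a short Python sketch on the salaries of Example 1.18. The variable names are illustrative; the data are treated as a population because they cover all 6 employees.

```python
import math

# Daily salaries of all 6 employees (Example 1.18), treated as a population.
salaries = [29.50, 16.50, 35.40, 21.30, 49.70, 24.60]
N = len(salaries)
mu = sum(salaries) / N

# Method 1: average of the squared deviations from the mean.
variance_method_1 = sum((x - mu) ** 2 for x in salaries) / N

# Method 2: mean of the squares minus the square of the mean.
variance_method_2 = sum(x ** 2 for x in salaries) / N - mu ** 2

sigma = math.sqrt(variance_method_1)

print(round(mu, 2), round(variance_method_1, 2),
      round(variance_method_2, 2), round(sigma, 2))
```

The two methods agree up to floating-point rounding; Method 2 avoids computing each deviation, which is convenient for hand calculation.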

Example 1.19. A sample consists of 5 data values: 72, 49, 79, 55 and 57. Calculate the variance and
standard deviation.

Solution.
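$n = 5$, $\sum x_i = 312$, $\sum x_i^2 = 20100$

\[ \text{Sample variance} = s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right] = \frac{1}{4}\left[20100 - \frac{312^2}{5}\right] = 157.8 \]

\[ \text{Sample standard deviation} = s = \sqrt{157.8} \approx 12.56 \]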
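
The shortcut formula for the sample variance can be checked in Python on the data of Example 1.19. The sketch also compares the result with `statistics.variance` from the Python standard library, which uses the same n - 1 divisor; the variable names are illustrative.

```python
import math
import statistics

# Sample of Example 1.19.
sample = [72, 49, 79, 55, 57]
n = len(sample)

sum_x = sum(sample)                     # sum of x_i
sum_x2 = sum(x * x for x in sample)     # sum of x_i squared

# s^2 = [sum of x_i^2 - (sum of x_i)^2 / n] / (n - 1)
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
s = math.sqrt(s2)

print(sum_x, sum_x2, round(s2, 2), round(s, 2))
print(statistics.variance(sample))      # library cross-check
```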

1.5.2 Measures of dispersion for grouped data


Variance
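With $k$ classes, class midpoints $m_i$ and class frequencies $f_i$:

Population Variance:
\[ \sigma^2 = \frac{1}{N}\sum_{i=1}^{k} f_i (m_i - \mu)^2 = \frac{\sum f_i m_i^2}{N} - \left(\frac{\sum f_i m_i}{N}\right)^2 \]

Sample Variance:
\[ s^2 = \frac{1}{n-1}\sum_{i=1}^{k} f_i (m_i - \bar{X})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i m_i^2 - \frac{1}{n}\left(\sum_{i=1}^{k} f_i m_i\right)^2\right] \]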

Example 1.20. Find the variance from the following frequency distribution if it represents
a) a population
b) a sample

Height (m)   20-22   23-25   26-28   29-31   32-34
Frequency      3       6      12       9       2


Solution.

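Height    Midpoint, m   Frequency, f    fm     f m^2
20-22        21              3           63     1323
23-25        24              6          144     3456
26-28        27             12          324     8748
29-31        30              9          270     8100
32-34        33              2           66     2178
Total:                      32          867    23805

a) Population:
\[ \sigma^2 = \frac{\sum f_i m_i^2}{N} - \left(\frac{\sum f_i m_i}{N}\right)^2 = \frac{23805}{32} - \left(\frac{867}{32}\right)^2 \approx 9.83 \]

b) Sample:
\[ s^2 = \frac{1}{n-1}\left[\sum f_i m_i^2 - \frac{1}{n}\left(\sum f_i m_i\right)^2\right] = \frac{1}{31}\left[23805 - \frac{867^2}{32}\right] \approx 10.15 \]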
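
The grouped-data formulas can be evaluated for Example 1.20 with the Python sketch below. The names `sum_fm`, `sum_fm2` and so on are introduced only for this illustration.

```python
# Class limits and frequencies of Example 1.20.
classes = [(20, 22), (23, 25), (26, 28), (29, 31), (32, 34)]
frequencies = [3, 6, 12, 9, 2]

midpoints = [(lower + upper) / 2 for lower, upper in classes]

total = sum(frequencies)                                          # N or n
sum_fm = sum(f * m for f, m in zip(frequencies, midpoints))       # sum of f_i m_i
sum_fm2 = sum(f * m ** 2 for f, m in zip(frequencies, midpoints)) # sum of f_i m_i^2

# a) data treated as a population
population_variance = sum_fm2 / total - (sum_fm / total) ** 2

# b) data treated as a sample
sample_variance = (sum_fm2 - sum_fm ** 2 / total) / (total - 1)

print(total, sum_fm, sum_fm2)
print(round(population_variance, 2), round(sample_variance, 2))
```

The sample variance is slightly larger than the population variance because the same sum of squared deviations is divided by n - 1 instead of N.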

1.6 Measures of position


Measures of position determine the position of a single value in relation to other values in a sample or a
population data set.

1.6.1 Quartiles
Quartiles are 3 summary measures that divide a ranked data set into 4 equal parts.
- second quartile (Q2) is the median of a data set.
- first quartile (Q1) is the value of the middle term among the observations that are less than
the median.
- third quartile (Q3) is the value of the middle term among the observations that are greater
than the median.

To Find The Quartiles of Ungrouped Data

Consider n items arranged in ascending order. Then,

The first quartile = Lower quartile = $Q_1$ = $\frac{1}{4}(n+1)$th value
The second quartile = Median = $Q_2$ = $\frac{1}{2}(n+1)$th value
The third quartile = Upper quartile = $Q_3$ = $\frac{3}{4}(n+1)$th value

When n is odd, the rule locates the exact position of the quartiles.

When n is even,
a) When n is even and $\frac{n}{2}$ is even, round all decimal values of $\frac{1}{4}(n+1)$ or $\frac{3}{4}(n+1)$
to the .5 value, for example: 2.25 → 2.5
6.75 → 6.5

b) When n is even and $\frac{n}{2}$ is odd, round up the decimal values of $\frac{1}{4}(n+1)$ or $\frac{3}{4}(n+1)$
that are greater than the .5 value and round down the values that are smaller than the .5 value, for
example:
3.75 → 4
2.25 → 2

To Find The Quartiles of Grouped Data (from Ogive)


The first quartile = Lower quartile = $Q_1$ = $\frac{n}{4}$th value
The second quartile = Median = $Q_2$ = $\frac{n}{2}$th value
The third quartile = Upper quartile = $Q_3$ = $\frac{3n}{4}$th value

1.6.2 Interquartile Range(IQR)


\[ \text{Interquartile Range, IQR} = Q_3 - Q_1 \]
\[ \text{The semi-interquartile range} = \text{The quartile deviation} = \frac{Q_3 - Q_1}{2} \]

1.6.3 Percentiles
The (approximate) value of the kth percentile, denoted by $P_k$, is

\[ P_k = \text{value of the } \left(\frac{kn}{100}\right)\text{th term in a ranked data set} \]

where k denotes the number of the percentile and n represents the sample size. Note that $\frac{kn}{100}$ is
rounded to the nearest integer or .5 value, for example: 2.2 → 2.0
2.3 → 2.5
2.7 → 2.5
2.8 → 3.0

Example 1.21. The following are the scores of 12 students in a mathematics class.
75 80 68 53 99 58 76 73 85 88 91 79
a) Find the values of the three quartiles. Where does the score of 88 lie in relation to these quartiles?
b) Find the interquartile range.
c) Find the quartile deviation.
d) Find the value of the 62nd percentile.

Solution.
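
One way to work through Example 1.21 is sketched in Python below, following the position and rounding rules of Sections 1.6.1 and 1.6.3. The sketch assumes that a position ending in .5 means the average of the two neighbouring ranked values (as for the median when n is even). All function names are hypothetical helpers, not part of the notes.

```python
import math


def value_at(ranked, position):
    """Value at a 1-based position (>= 1); a .5 position averages its neighbours."""
    lower = math.floor(position)
    if position == lower:
        return ranked[lower - 1]
    return (ranked[lower - 1] + ranked[lower]) / 2


def quartile_position(raw, n):
    """Apply the rounding rules of Section 1.6.1 to a raw quartile position."""
    fraction = raw - math.floor(raw)
    if fraction in (0.0, 0.5):
        return raw                       # exact position or already a .5 value
    if (n // 2) % 2 == 0:
        return math.floor(raw) + 0.5     # rule a): n/2 even, round to the .5 value
    return float(round(raw))             # rule b): n/2 odd, .25 down and .75 up


def quartiles(values):
    ranked = sorted(values)
    n = len(ranked)
    raw_positions = [(n + 1) / 4, (n + 1) / 2, 3 * (n + 1) / 4]
    return [value_at(ranked, quartile_position(p, n)) for p in raw_positions]


def percentile(values, k):
    ranked = sorted(values)
    # kn/100 rounded to the nearest integer or .5 value; exact ties such as
    # 2.25 are not covered by the notes and follow Python's round() here.
    position = round(2 * k * len(ranked) / 100) / 2
    return value_at(ranked, position)


scores = [75, 80, 68, 53, 99, 58, 76, 73, 85, 88, 91, 79]

q1, q2, q3 = quartiles(scores)
iqr = q3 - q1
quartile_deviation = iqr / 2
p62 = percentile(scores, 62)

print(q1, q2, q3, iqr, quartile_deviation, p62)
```

With n = 12 the raw positions 3.25, 6.5 and 9.75 become 3.5, 6.5 and 9.5 under rule a), so each quartile is the average of two neighbouring ranked scores. Under these rules the score of 88 lies above the third quartile, that is, in the top 25% of the scores.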
