
School of Engineering Engineering Mathematics 4

(MTH60403/ENG 2123)

• Distinguish between discrete and continuous data
• Construct frequency and relative frequency tables for grouped and ungrouped discrete data
• Determine class boundaries, class intervals and central values for discrete and continuous data
• Construct a histogram and a frequency polygon
• Determine the mean, median and mode of grouped and ungrouped data
• Determine the range, variance and standard deviation of discrete data
• Measure dispersion of data using the normal and standard normal curves

Topics to be covered
Introduction
Arrangement of data
Histograms
Measure of central tendency
Dispersion
Frequency polygons
Frequency curves
Normal distribution curve
Standardized normal curve
Introduction
• Statistics as a discipline is the development and application of methods to collect, analyze and interpret data.
• Statistical techniques are used in a wide range of scientific and social research. Areas that use modern statistical methods include medicine, economics, finance, marketing research and manufacturing.
• Some fields of inquiry use applied statistics so extensively that they have specialized terminology. Examples include data mining, energy statistics, engineering statistics, reliability engineering and social statistics.

Statistics is concerned with the collection, ordering and analysis of data. Data consist of sets of recorded observations or values. Any quantity that can have a number of values is a variable. A variable may be one of two kinds:

(a) Discrete – a variable that can be counted, or for which there is a fixed set of values. Examples: number of people in a room, shoe size of children, number of components in a machine.

(b) Continuous – a variable that can be measured on a continuous scale, the result depending on the precision of the measuring instrument, or the accuracy of the observer. Examples: weight of people, output voltage of an analogue system, loads on a beam, temperature of a coolant, the capacity of a container.

Definition of "continuous data": data which can take any value between two end points. For example, weights of people can be 60.28 kg, 70.3 kg, ...

A statistical exercise normally consists of four stages:

1. Collection of data (measure and record)
2. Arrangement/ordering and presentation of the data
3. Analysis of the collected data
4. Interpretation of the results and formulation of conclusions

Arrangement of data

A set of data:
28 31 29 27 30 29 29 26 30 28

28 29 27 26 32 28 32 31 25 30

27 30 29 30 28 29 31 27 28 28

Can be arranged in ascending order:

25 26 26 27 27 27 27 28 28 28
28 28 28 28 29 29 29 29 29 29
30 30 30 30 30 31 31 31 32 32

Once the data is in ascending order, it can be entered into a table. The number of occasions on which any particular value occurs is called the frequency, denoted by f.

Value   Number of times (f)
25      1
26      2
27      4
28      7
29      6
30      5
31      3
32      2

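As a minimal sketch of how such a frequency table could be built by machine, the 30 readings from the slide can be counted with Python's `collections.Counter`. The variable names `readings` and `frequency` are illustrative, not part of the lecture.

```python
from collections import Counter

# The 30 readings listed on the slide, in their original order.
readings = [
    28, 31, 29, 27, 30, 29, 29, 26, 30, 28,
    28, 29, 27, 26, 32, 28, 32, 31, 25, 30,
    27, 30, 29, 30, 28, 29, 31, 27, 28, 28,
]

# Count how many times each value occurs (the frequency f).
frequency = Counter(readings)

# Print the table in ascending order of value.
print("Value  f")
for value in sorted(frequency):
    print(f"{value:>5}  {frequency[value]}")
```

The frequencies sum to 30, the number of readings, which is a quick check that no reading was missed.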
When dealing with large numbers of readings, instead of writing all the values in ascending order, it is more convenient to compile a tally diagram, recording the range of values of the variable and adding a stroke for each occurrence of that reading.

Grouped Data
If the range of values of the variable is large, it is often
helpful to consider these values arranged in regular
groups or classes.

Grouping with Continuous Data
The lengths (in mm) of 40 spindles were measured as follows:
20.90 20.57 20.86 20.74 20.82 20.63 20.53 20.89 20.75 20.65
20.71 21.03 20.72 20.41 20.94 20.75 20.79 20.65 21.08 20.89
20.50 20.88 20.97 20.78 20.61 20.92 21.07 21.16 20.80 20.77
20.82 20.72 20.60 20.90 20.86 20.68 20.75 20.88 20.56 20.94

Lowest value = 20.41
Highest value = 21.16
We therefore form classes from 20.40 to 21.20 at 0.10 intervals.

With continuous data the group boundaries are given to the same number of significant figures or decimal places as the data.

[Table: the lengths (in mm) of 40 spindles, measured and arranged into classes]

Relative Frequency
If the frequency of any one group is divided by the
sum of the frequencies the ratio is called the relative
frequency of that group. Relative frequencies can be
expressed as percentages:

For example, for classes with frequencies 1 and 9 out of a total of 40:

\[ \frac{1}{40} \times 100 = 2.5\% \]

\[ \frac{9}{40} \times 100 = 22.5\% \]

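The grouping and relative frequency steps can be sketched in Python using the 40 spindle lengths from the slide. This is an illustration only: the helper `class_index` and the constants `LOWER` and `WIDTH` are assumed names, and the classes are taken as 20.40 to 20.49, 20.50 to 20.59, and so on up to 21.10 to 21.19.

```python
# Lengths (in mm) of the 40 spindles, as listed on the slide.
lengths_mm = [
    20.90, 20.57, 20.86, 20.74, 20.82, 20.63, 20.53, 20.89, 20.75, 20.65,
    20.71, 21.03, 20.72, 20.41, 20.94, 20.75, 20.79, 20.65, 21.08, 20.89,
    20.50, 20.88, 20.97, 20.78, 20.61, 20.92, 21.07, 21.16, 20.80, 20.77,
    20.82, 20.72, 20.60, 20.90, 20.86, 20.68, 20.75, 20.88, 20.56, 20.94,
]

LOWER = 20.40  # lower limit of the first class
WIDTH = 0.10   # spacing between successive lower class limits


def class_index(x):
    # Work in whole hundredths of a millimetre so that values sitting
    # exactly on a class limit (such as 20.50) are not misplaced by
    # floating-point error.
    return (round(x * 100) - round(LOWER * 100)) // round(WIDTH * 100)


counts = [0] * 8
for x in lengths_mm:
    counts[class_index(x)] += 1

total = sum(counts)
for i, f in enumerate(counts):
    lo = LOWER + i * WIDTH
    hi = lo + WIDTH - 0.01
    print(f"{lo:.2f} - {hi:.2f}   f = {f:2d}   relative = {100 * f / total:.1f}%")
```

The first class holds 1 spindle (2.5%) and the 20.80 to 20.89 class holds 9 (22.5%), in agreement with the two relative frequencies quoted on the slide.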
Rounding off Data
• If the value 21.7 is expressed to two significant figures, the result is rounded up to 22. Similarly, 21.4 is rounded down to 21.
• To maintain consistency of group boundaries, middle values will always be rounded up, so that 21.5 is rounded up to 22 and 42.5 is rounded up to 43.
• Therefore, when a result is quoted to two significant figures as 37 on a continuous scale, this includes all possible values between 36.50000… and 37.49999…

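The "middle values always round up" convention can be sketched with Python's `decimal` module. The helper name `round_half_up` is an assumption for illustration, and the sketch only considers positive values, as in the slide's examples. Python's built-in `round()` follows a different convention (round half to even), so it is shown for contrast.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value: str) -> int:
    """Round a decimal string to the nearest integer, with .5 rounded up."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


for text in ["21.7", "21.4", "21.5", "42.5"]:
    print(text, "->", round_half_up(text))

# The built-in round() uses "round half to even", so 42.5 becomes 42
# rather than the 43 required by the convention on the slide.
print("built-in round(42.5):", round(42.5))
```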
Class Boundaries
A class or group boundary lies midway between the data
values. For example, for data in the class or group labelled:
7.1 – 7.3
(a)The class values 7.1 and 7.3 are the lower and upper limits
of the class and their difference gives the class width.
(b) The class boundaries are 0.05 below the lower class limit
and 0.05 above the upper class limit.
(c) The class interval is the difference between the upper and
lower class boundaries.
(d) The central value (or mid-value) of the class interval lies midway between the lower and upper class boundaries, i.e. it is one half of their sum: (7.05 + 7.35)/2 = 7.20.
[Figure: diagram summarizing these terms for the class 7.1 – 7.3 (inclusive), marking (a) the class limits and class width, (b) the class boundaries, (c) the class interval and (d) the central value]

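For the class 7.1 – 7.3, the boundaries, class interval and central value can be computed as in the sketch below. The function name `class_summary` and its `precision` argument (the step between recordable values, here 0.1) are assumptions made for this illustration. `Decimal` is used so that results such as 7.05 are exact.

```python
from decimal import Decimal


def class_summary(lower_limit: str, upper_limit: str, precision: str) -> dict:
    """Return the boundaries, class interval and central value of one class."""
    lo, hi = Decimal(lower_limit), Decimal(upper_limit)
    half_step = Decimal(precision) / 2  # 0.05 when data are recorded to 0.1

    lower_boundary = lo - half_step
    upper_boundary = hi + half_step
    return {
        "lower_boundary": lower_boundary,
        "upper_boundary": upper_boundary,
        "class_interval": upper_boundary - lower_boundary,
        "central_value": (lower_boundary + upper_boundary) / 2,
    }


summary = class_summary("7.1", "7.3", "0.1")
for name, value in summary.items():
    print(f"{name}: {value}")
```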
Histograms
Frequency histogram
A histogram is a graphical representation of a frequency distribution in which vertical rectangular blocks are drawn so that:

(a) the centre of the base indicates the central value of the class and
(b) the area of the rectangle represents the class frequency.

For example, the measurement of the lengths of 50 brass rods gave a frequency distribution from which a histogram can be drawn.

[Table and figure: frequency distribution of the lengths of 50 brass rods and the resulting histogram]

A relative frequency histogram is identical in shape to the frequency histogram but differs in that the vertical axis measures relative frequency (percentage).

Measure of central tendency
Most of the whole range of values is clustered within the
middle classes and knowledge of the center region of the
histogram is important. We can put a numerical value on
this by determining a measure of central tendency.

There are three common measures of central tendency of a set of observations:

1) Mean,
2) Mode,
3) Median.

Each of these is a single value that attempts to quantify the "average" value around which the values in a data set tend to cluster.

Mean
The arithmetic mean \(\bar{x}\) of a set of \(n\) observations is their average:

\[ \bar{x} = \frac{\text{sum of observations}}{\text{number of observations}} = \frac{\sum x}{n} \]

When calculating from a frequency distribution, this becomes:

\[ \bar{x} = \frac{\sum xf}{n} = \frac{\sum xf}{\sum f} \]


Find the average of the data shown below.

x     f     xf
25    1     25
26    2     52
27    4     108
28    7     196
29    6     174
30    5     150
31    3     93
32    2     64
      30    862

\[ \bar{x} = \frac{\sum xf}{n} = \frac{\sum xf}{\sum f} = \frac{862}{30} = 28.73 \]

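The same calculation can be sketched in a few lines of Python, using the values and frequencies from the frequency table built earlier in the lecture. The names `values`, `freqs`, `sum_f` and `sum_xf` are illustrative.

```python
# Values x and their frequencies f from the frequency table.
values = [25, 26, 27, 28, 29, 30, 31, 32]
freqs = [1, 2, 4, 7, 6, 5, 3, 2]

sum_f = sum(freqs)                                   # total frequency n
sum_xf = sum(x * f for x, f in zip(values, freqs))   # sum of the xf column
mean = sum_xf / sum_f

print(f"sum f = {sum_f}, sum xf = {sum_xf}, mean = {mean:.2f}")
```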
Coding for calculating the mean
A great deal of tedious work can be avoided by coding with a false mean. It involves converting the x-values into simpler values for the calculation and then converting back again for the final result.

(a) Choose a convenient value of x near the middle of the range (the false mean),
(b) Subtract it from every value of x,
(c) Divide by a suitable data interval to give the coded value \(x_c\),
(d) Proceed to find the mean of the coded values, \(\bar{x}_c\).

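Find the average of the data shown below using the coding procedure.

[Table: frequency distribution annotated with (a) the false mean, (b) the subtraction of the false mean, (c) the data interval and (d) the coded mean]

(d) \[ \bar{x}_c = \frac{\sum x_c f}{\sum f} = \frac{-2.0}{60} = -0.0333 \text{ to 4 dp} \]
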
Decoding for calculating the mean
Decoding requires the coding process to be reversed. This means multiplying by the appropriate data interval and then adding the false mean:

\[ \bar{x}_c = \frac{\sum x_c f}{\sum f} = \frac{-2.0}{60} = -0.0333 \text{ to 4 dp, where } x_c = \frac{x - 30.8}{0.2} \]

Therefore:

\[ \bar{x} = (-0.0333) \times 0.2 + 30.8 = 30.79 \text{ to 2 dp} \]

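Coding and decoding can be sketched as two small functions. Because the data table for this example did not survive on the slide, the sketch first applies the procedure to the earlier frequency table (values 25 to 32), with an assumed false mean of 28 and data interval of 1, and then decodes the coded mean quoted on the slide. The function names `coded_mean` and `decode` are hypothetical.

```python
def coded_mean(values, freqs, false_mean, interval):
    """Mean of the coded values x_c = (x - false_mean) / interval."""
    coded = [(x - false_mean) / interval for x in values]
    return sum(xc * f for xc, f in zip(coded, freqs)) / sum(freqs)


def decode(mean_coded, false_mean, interval):
    """Reverse the coding: multiply by the interval, then add the false mean."""
    return mean_coded * interval + false_mean


# Earlier frequency table, coded about an assumed false mean of 28.
values = [25, 26, 27, 28, 29, 30, 31, 32]
freqs = [1, 2, 4, 7, 6, 5, 3, 2]
mean_c = coded_mean(values, freqs, false_mean=28, interval=1)
print(f"coded mean = {mean_c:.4f}, decoded mean = {decode(mean_c, 28, 1):.2f}")

# Decoding step from the slide: coded mean -2.0/60, interval 0.2, false mean 30.8.
slide_mean = decode(-2.0 / 60, false_mean=30.8, interval=0.2)
print(f"slide example: mean = {slide_mean:.2f}")
```

Decoding the first result gives the same mean as the direct calculation, 862/30, which is the point of the method: the coded arithmetic is simpler but the answer is unchanged.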
Coding with a grouped frequency distribution
This procedure is similar, where the false mean is the centre value of a convenient class.

[Table: grouped frequency distribution with coded mid-values]

\[ \bar{x}_c = \frac{\sum x_c f}{\sum f} = \frac{11}{50} = 0.22 \]

Decoding with a grouped frequency distribution
Decoding again requires the coding process to be reversed. This means multiplying by the appropriate data interval and then adding the false mean:

\[ \bar{x}_c = \frac{\sum x_c f}{\sum f} = \frac{11}{50} = 0.22 \text{, where } x_c = \frac{x_m - 2.30}{0.03} \]

Therefore:

\[ \bar{x}_m = 0.22 \times 0.03 + 2.30 = 2.3066 \text{ to 4 dp} \]

giving:

\[ \bar{x} = 2.307 \text{ to 3 dp} \]

Mode of a set of data
The mode of a set of data is that value of the variable that occurs most often.

The mode of:

2, 2, 6, 7, 7, 7, 10, 13

is clearly 7. The mode may not be unique; for instance, the modes of:

23, 25, 25, 25, 27, 27, 28, 28, 28

are 25 and 28.

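Both examples can be checked with `statistics.multimode` from the Python standard library (Python 3.8 or later), which returns every value that ties for the highest frequency. The variable names are illustrative.

```python
from statistics import multimode

single_mode = multimode([2, 2, 6, 7, 7, 7, 10, 13])
two_modes = multimode([23, 25, 25, 25, 27, 27, 28, 28, 28])

print(single_mode)  # one mode: 7
print(two_modes)    # two modes: 25 and 28
```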
Mode of a grouped frequency distribution
The modal class of grouped data is the class with the greatest population, i.e. the highest frequency.

For example, in the frequency distribution below the modal class is the third class.

[Table: grouped frequency distribution]

The mode can also be found by plotting the histogram of the data (please refer to the textbook, Stroud, pages 1155-1156).

Median of a set of data
The median is the value of the middle datum when the data is arranged in ascending or descending order. If there is an even number of values the median is the average of the two middle data.

The data 4, 7, 8, 9, 12, 15, 26 has a median of 9.

The data 5, 6, 10, 12, 14, 17, 23, 30 has a median of 13. Why? Because

\[ \frac{12 + 14}{2} = 13 \]

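As a quick check of both cases, `statistics.median` sorts the data and averages the two middle values when the count is even. The variable names are illustrative.

```python
from statistics import median

odd_count = [4, 7, 8, 9, 12, 15, 26]         # 7 values: the middle one is the median
even_count = [5, 6, 10, 12, 14, 17, 23, 30]  # 8 values: average the two middle ones

median_odd = median(odd_count)
median_even = median(even_count)
print(median_odd, median_even)
```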
Median with grouped frequency distribution
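In the case of grouped data the median divides the population of the largest block of the histogram into two parts A and B. In this frequency distribution the class frequencies are 6, 12, 15, 20, 13, 9 and 5, so the largest block has A + B = 20. The median splits the total frequency into two equal halves:

6 + 12 + 15 + A = B + 13 + 9 + 5

Note: A + B = 20, so B = 20 - A. Substituting gives 33 + A = 47 - A, so that A = 7.

\[ \text{The width of } A = \frac{7}{20} \times \text{class interval} = 0.35 \times 0.3 = 0.105 \]

Therefore, Median = 30.85 + 0.105 = 30.96 to 2 dp

[Figure: histogram with class frequencies 6, 12, 15, 20, 13, 9 and 5; the largest block, between the class boundaries 30.85 and 31.15, is divided into parts A and B]
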
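The grouped-median working can be sketched as a short function. This is an illustration under the slide's assumptions (the median lies in the class with boundaries 30.85 to 31.15 and frequency 20); the name `grouped_median` and its parameters are hypothetical.

```python
def grouped_median(lower_boundary, class_interval, freqs, median_class):
    """Interpolate the median inside the class at index `median_class`."""
    total = sum(freqs)
    below = sum(freqs[:median_class])   # frequency in the classes below
    part_a = total / 2 - below          # part A of the median class
    return lower_boundary + (part_a / freqs[median_class]) * class_interval


freqs = [6, 12, 15, 20, 13, 9, 5]
median = grouped_median(lower_boundary=30.85, class_interval=0.3,
                        freqs=freqs, median_class=3)
print(f"median = {median:.3f}")
```

Here `part_a` works out to 40 - 33 = 7, matching A = 7 in the working above.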
Mean, Mode, Median
How do we know which is the correct measure of location to use in a given situation?

Mode: This is used when data is qualitative, or quantitative with either a single mode or two modes (bimodal). It is not very informative if each value occurs only once.
Median: This is used for quantitative data. It is usually used when there are extreme values.
Mean: This is used for quantitative data and uses all the pieces of data. It therefore gives a true measure of the data. However, it is affected by extreme values.


A child at a junior school records the maximum temperature, in °C, for seven days at his school. The results are given below:
15.7 16.1 16.2 47.6 17.4 18.6 16.7
a) Find the mean and median of these data.
b) Why did we not ask about the mode?
The child's teacher realizes that the figure 47.6 should be 17.6.
c) Write down what effect this will have on the median and mean.

Answers:
a) Mean 21.2; median 16.7
b) Each value occurs only once, so the mode is not informative.
c) The median is unchanged at 16.7; the mean falls to 16.9.
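The effect of the single extreme value can be reproduced with the standard `statistics` module. This is a small sketch; the list names `recorded` and `corrected` are illustrative.

```python
from statistics import mean, median

recorded = [15.7, 16.1, 16.2, 47.6, 17.4, 18.6, 16.7]
# Replace the suspect reading 47.6 with the teacher's corrected value 17.6.
corrected = [17.6 if t == 47.6 else t for t in recorded]

for label, temps in [("recorded", recorded), ("corrected", corrected)]:
    print(f"{label}: mean = {mean(temps):.1f}, median = {median(temps):.1f}")
```

The mean moves by more than 4 °C when one reading is corrected, while the median does not move at all, which is why the median is preferred when extreme values are present.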

A company consists of seven workers paid at $10 per hour and their supervisor who is paid at $50 per hour.
a) Find the mode, median and mean of the hourly pay of all eight workers.
Write down, with a reason, which of the mean, mode and median you should use in the following situations:
b) When asked the typical hourly rate of pay for the company.
c) When trying to persuade a prospective employee to work for the company.

Answers:
a) Mode 10; median 10; mean 15
b) Mode or median
c) The mean, as it is a higher value and more likely to persuade the prospective employee

Dispersion
The mean, mode and median give important information
about the central tendency of data but they do not tell
anything about the spread or dispersion about the centre.

For example, the set 26, 27, 28, 29, 30 has a mean of 28, and the set 5, 19, 20, 36, 60 also has a mean of 28, but one is clearly more tightly arranged about the mean than the other.

We therefore need a measure to indicate the spread of the values about the mean.

Range
The simplest measure of dispersion is the range – the
difference between the highest and the lowest values.

In the previous two cases, the range of set 1 is 30 - 26 = 4, while that of set 2 is 60 - 5 = 55.

The disadvantage of the range, however, is that it deals only with the extreme values; it does not take into account the behaviour of the intermediate values.

Standard Deviation
The standard deviation is the most widely used measure of dispersion. The variance of a set of data is the average of the square of the difference in value of a datum from the mean:

\[ \text{variance} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n} \]

This has the disadvantage of being measured in the square of the units of the data. The standard deviation is the square root of the variance:

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}} \]

Standard Deviation (Alternative formula)
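Since:

\[ \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} = \frac{\sum_{i=1}^{n} (x_i^2 - 2x_i\bar{x} + \bar{x}^2)}{n} = \frac{\sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + \sum_{i=1}^{n} \bar{x}^2}{n} \]

\[ = \frac{\sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2}{n} = \frac{\sum_{i=1}^{n} x_i^2}{n} - \bar{x}^2 \]

That is:

\[ \sigma = \sqrt{\overline{x^2} - \bar{x}^2} \]
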
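A short numerical check can make the equivalence of the two formulas concrete. The sketch below applies both to the two data sets from the Dispersion slides (each with mean 28), dividing by n as in the definition above. The function names `std_definition` and `std_alternative` are illustrative.

```python
from math import sqrt


def std_definition(data):
    """Square root of the mean squared deviation from the mean."""
    n = len(data)
    m = sum(data) / n
    return sqrt(sum((x - m) ** 2 for x in data) / n)


def std_alternative(data):
    """Square root of (mean of the squares) minus (square of the mean)."""
    n = len(data)
    m = sum(data) / n
    return sqrt(sum(x * x for x in data) / n - m * m)


set_1 = [26, 27, 28, 29, 30]
set_2 = [5, 19, 20, 36, 60]

for name, data in [("set 1", set_1), ("set 2", set_2)]:
    print(f"{name}: range = {max(data) - min(data)}, "
          f"sd (definition) = {std_definition(data):.3f}, "
          f"sd (alternative) = {std_alternative(data):.3f}")
```

The alternative form needs only the running totals of x and x squared, which is convenient for hand or calculator work. In floating-point arithmetic it can lose accuracy when the mean is large compared with the spread, so the definition is the safer choice in code.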
A question to try

Find the mean and standard deviation for the following:

a. 1, 2, 3, 4, 5 and 6
b. 1001, 1002, 1003, 1004, 1005 and 1006
c. 0.1, 0.2, 0.3, 0.4, 0.5 and 0.6

Answers: a. 3.5, 1.71; b. 1003.5, 1.71; c. 0.35, 0.17.

Frequency Polygons and Frequency Curves

If the centre points of the tops of the rectangular blocks of a frequency histogram are joined by straight lines, the resulting figure is called a frequency polygon.

If the frequency polygon is smoothed out, or if we plot the frequency against the central value of each class and draw a smooth curve, the result is a frequency curve.

[Figure: frequency curve; the area A under the curve represents the total frequency of the variable]

Normal Distribution Curve
When very large numbers of observations are made and the range is divided into a very large number of 'narrow' classes, the resulting frequency curve, in many cases, approximates closely to a standard curve known as the normal distribution curve, which has a characteristic bell-shaped formation.

The normal distribution curve is symmetrical about its centre line, which coincides with the mean of the observations, so the areas to the left and right of the centre line are equal: \(A_L = A_R\).

Values within 1 standard deviation of the mean
There are two points on the normal distribution curve where the concavity switches, one from concave to convex and the other from convex to concave. The horizontal distance of each of these two points from the mean line is one standard deviation.

Of the area beneath the normal distribution curve, 68% lies within one standard deviation of the mean.


On a manufacturing run to produce 1000 bolts of nominal length 32.5 mm, sampling gave a mean of 32.58 mm and a standard deviation of 0.06 mm. From this observation, \(\bar{x} = 32.58\) mm and \(\sigma = 0.06\) mm.

\[ \bar{x} - \sigma = 32.58 - 0.06 = 32.52 \]
\[ \bar{x} + \sigma = 32.58 + 0.06 = 32.64 \]

We conclude that 68% of the bolts, i.e. 680, are likely to have lengths between 32.52 mm and 32.64 mm.
Values within 2 standard deviations of the mean
Of the area beneath the normal distribution curve, 95% lies within two standard deviations of the mean.

From the example, \(\bar{x} = 32.58\) mm and \(\sigma = 0.06\) mm.

\[ \bar{x} - 2\sigma = 32.58 - 0.12 = 32.46 \]
\[ \bar{x} + 2\sigma = 32.58 + 0.12 = 32.70 \]

We conclude that 95% of the bolts, i.e. 950, are likely to have lengths between 32.46 mm and 32.70 mm.

Values within 3 standard deviations of the mean
Of the area beneath the normal distribution curve, 99.7% lies within three standard deviations of the mean.

From the example, \(\bar{x} = 32.58\) mm and \(\sigma = 0.06\) mm.

\[ \bar{x} - 3\sigma = 32.58 - 0.18 = 32.40 \]
\[ \bar{x} + 3\sigma = 32.58 + 0.18 = 32.76 \]

We conclude that 99.7% of the bolts, i.e. 997, are likely to have lengths between 32.40 mm and 32.76 mm.


We can enter the same information in a slightly different manner, dividing the figure into columns of 1σ width on each side of the mean.

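The 68%, 95% and 99.7% figures are rounded values. As a hedged sketch, the fraction of a normal population lying within k standard deviations of the mean can be computed from the error function in Python's `math` module as erf(k/√2), and applied to the bolt example. The function name `fraction_within` is illustrative.

```python
from math import erf, sqrt


def fraction_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return erf(k / sqrt(2))


mean_mm, sigma_mm, bolts = 32.58, 0.06, 1000

for k in (1, 2, 3):
    low = mean_mm - k * sigma_mm
    high = mean_mm + k * sigma_mm
    share = fraction_within(k)
    print(f"within {k} sd: {low:.2f} mm to {high:.2f} mm, "
          f"{100 * share:.2f}% of bolts (about {round(bolts * share)})")
```

The unrounded percentages are roughly 68.27%, 95.45% and 99.73%, so the bolt counts computed this way differ by a few units from the 680 and 950 obtained with the rounded figures.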
Standardized Normal Curve

The standardized normal curve is the same shape as the normal curve but the axis of symmetry is the vertical axis; the horizontal axis carries a scale of z-values where:

\[ z = \frac{x - \bar{x}}{\sigma} \]

and the area beneath the curve is 1. Its equation is:

\[ \phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \]

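The two formulas can be sketched directly in Python. The example converts a bolt length from the earlier slides to a z-value and checks numerically that the area under the curve is close to 1. The function names `z_score` and `phi`, and the integration range of -8 to 8 with a step of 0.01, are choices made for this illustration.

```python
from math import exp, pi, sqrt


def z_score(x, mean, sigma):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mean) / sigma


def phi(z):
    """Ordinate of the standardized normal curve at z."""
    return exp(-z * z / 2) / sqrt(2 * pi)


# A bolt of length 32.64 mm, with mean 32.58 mm and sigma 0.06 mm.
z = z_score(32.64, 32.58, 0.06)
print(f"z = {z:.2f}, phi(z) = {phi(z):.4f}, phi(0) = {phi(0):.4f}")

# Approximate the total area by summing thin strips from z = -8 to z = 8.
step = 0.01
area = sum(phi(-8 + i * step) for i in range(1601)) * step
print(f"area under the curve = {area:.6f}")
```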
A question to try
A computer operator transfers an hourly wage list from a paper copy to her computer. The data transferred is given below:
$5.50 $6.10 $7.80 $6.10 $9.20 $91.00 $11.30
a) Find the mode, median and range of these data.
b) Find the mean and the standard deviation of these data.
The office manager looks at the figures and decides that something must be wrong.
c) Write down, with a reason, the mistake that has probably been made.
d) Recalculate the mean, range and standard deviation with the corrected data and compare both results.

Answers:
a), b) Mode 6.1; median 7.8; range 85.5; mean 19.57; std dev 29.22
c) $91.00 is far higher than every other wage; it was probably $9.10 entered with the decimal point misplaced.
d) Mean 7.87; range 5.8; std dev 1.96


Thank you for your attention


This lecture note is taken from Dr. Abdul Kareem
