
INTRODUCTION

By the 18th century, the term "statistics" designated the systematic collection of demographic and
economic data by states. For at least two millennia, these data were
mainly tabulations of human and material resources that might be taxed or put to military use. In the
early 19th century, collection intensified, and the meaning of "statistics" broadened to include the
discipline concerned with the collection, summary, and analysis of data. Today, data are collected and
statistics are computed and widely distributed in government, business, most of the sciences and
sports, and even for many pastimes. Electronic computers have expedited more elaborate statistical
computation even as they have facilitated the collection and aggregation of data. A single data
analyst may have available a set of data-files with millions of records, each with dozens or hundreds
of separate measurements. These were collected over time from computer activity (for example, a
stock exchange) or from computerized sensors, point-of-sale registers, and so on. Computers then
produce simple, accurate summaries and allow more tedious analyses, such as those that require
inverting a large matrix or performing hundreds of steps of iteration, which would never be attempted by
hand. Faster computing has allowed statisticians to develop "computer-intensive" methods which
may look at all permutations, or use randomization to look at 10,000 permutations of a problem, to
estimate answers that are not easy to quantify by theory alone.
The term "mathematical statistics" designates the mathematical theories of probability and statistical
inference, which are used in statistical practice. The relation between statistics and probability theory
developed rather late, however. In the 19th century, statistics increasingly used probability theory,
whose initial results were found in the 17th and 18th centuries, particularly in the analysis of games
of chance (gambling). By 1800, astronomy used probability models and statistical theories,
particularly the method of least squares. Early probability theory and statistics were systematized in
the 19th century and statistical reasoning and probability models were used by social scientists to
advance the new sciences of experimental psychology and sociology, and by physical scientists
in thermodynamics and statistical mechanics. The development of statistical reasoning was closely
associated with the development of inductive logic and the scientific method, which are concerns
that move statisticians away from the narrower area of mathematical statistics. Much of the
theoretical work was readily available by the time computers were available to exploit it. By the
1970s, Johnson and Kotz had produced a four-volume compendium on statistical distributions (1st ed.,
1969-1972), which is still an invaluable resource.
Applied statistics can be regarded not as a field of mathematics but as an autonomous mathematical
science, like computer science and operations research. Unlike mathematics, statistics had its
origins in public administration. Applications arose early in demography and economics; large areas
of micro- and macro-economics today are "statistics" with an emphasis on time-series analyses.
With its emphasis on learning from data and making best predictions, statistics also has been
shaped by areas of academic research including psychological testing, medicine and epidemiology.
The ideas of statistical testing have considerable overlap with decision science. With its concerns
with searching and effectively presenting data, statistics has overlap with information
science and computer science.
DEFINITION OF STATISTICS
Statistics is the study of the collection, organization, analysis, interpretation and
presentation of data. It deals with all aspects of data, including the planning of data
collection in terms of the design of surveys and experiments.
The term statistics is ultimately derived from the New Latin statisticum collegium ("council of state")
and the Italian word statista ("statesman" or "politician"). The German Statistik, first introduced
by Gottfried Achenwall (1749), originally designated the analysis of data about the state, signifying
the "science of state" (then called political arithmetic in English). It acquired the meaning of the
collection and classification of data generally in the early 19th century. It was introduced into English
in 1791 by Sir John Sinclair when he published the first of 21 volumes titled Statistical Account of
Scotland.[1]

BRIEF HISTORY OF STATISTICS


Basic forms of statistics have been used since the beginning of civilization. Early empires often
collated censuses of the population or recorded the trade in various commodities. The Han
Dynasty and the Roman Empire were some of the first states to extensively gather data on the size
of the empire's population, geographical area and wealth.
The use of statistical methods dates back to at least the 5th century BCE. The
historian Thucydides in his History of the Peloponnesian War[2] describes how the Athenians
calculated the height of the wall of Plataea.
Forms of probability and statistics were developed by Al-Khalil (717–786 CE), an Arab
mathematician studying cryptology. He wrote the Book of Cryptographic Messages which contains
the first use of permutations and combinations to list all possible Arabic words with and without
vowels.[3]
The earliest writing on statistics was found in a 9th-century Arabic book entitled Manuscript on
Deciphering Cryptographic Messages, written by Al-Kindi (801–873). In his book, Al-Kindi gave a
detailed description of how to use statistics and frequency analysis to decipher encrypted messages.
This text arguably gave rise to the birth of both statistics and cryptanalysis.[4][5] Al-Kindi also made the
earliest known use of statistical inference, while he and other Arab cryptologists developed the early
statistical methods for decoding encrypted messages. An important contribution of Ibn Adlan (1187–
1268) was on sample size for the use of frequency analysis.[3]
The Nuova Cronica, a 14th-century history of Florence by the Florentine banker and official Giovanni
Villani, includes much statistical information on population, ordinances, commerce and trade,
education, and religious facilities and has been described as the first introduction of statistics as a
positive element in history,[6] though neither the term nor the concept of statistics as a specific field
yet existed. This claim was, however, shown to be incorrect after the rediscovery of Al-Kindi's
earlier book on frequency analysis.[4][5]
The idea of the median originated in Edward Wright's book on navigation (Certaine Errors in
Navigation) in 1599 in a section concerning the determination of location with a compass.
Sir William Petty, a 17th-century economist who used early statistical methods to analyse demographic data.

The birth of statistics is often dated to 1662, when John Graunt, along with William Petty, developed
early human statistical and census methods that provided a framework for modern demography. He
produced the first life table, giving probabilities of survival to each age. His book Natural and Political
Observations Made upon the Bills of Mortality used analysis of the mortality rolls to make the first
statistically based estimation of the population of London.

DEVELOPMENT OF MODERN STATISTICS – STATISTICS TODAY


Although the origins of statistical theory lie in the 18th-century advances in probability, the modern
field of statistics only emerged in the late-19th and early-20th century in three stages. The first wave,
at the turn of the century, was led by the work of Francis Galton and Karl Pearson, who transformed
statistics into a rigorous mathematical discipline used for analysis, not just in science, but in industry
and politics as well. The second wave of the 1910s and 20s was initiated by William Sealy Gosset,
and reached its culmination in the insights of Ronald Fisher. This involved the development of
better models for the design of experiments, hypothesis testing and techniques for use with small data
samples. The final wave, which mainly saw the refinement and expansion of earlier developments,
emerged from the collaborative work between Egon Pearson and Jerzy Neyman in the
1930s.[23] Today, statistical methods are applied in all fields that involve decision making, both for
drawing accurate inferences from a collated body of data and for making decisions in the face of
uncertainty.

OBJECTIVES
Upon completion of the Additional Mathematics project work, I am able to gain valuable
experiences based on the following objectives:
- Apply and adapt a variety of problem-solving strategies to solve routine and non-routine problems.
- Experience classroom environments which are challenging, interesting and meaningful, and hence improve thinking skills.
- Experience classroom environments where knowledge and skills are applied in meaningful ways in solving real-life problems.
- Experience classroom environments where expressing one's mathematical thinking, reasoning and communication are highly encouraged and expected.
- Experience classroom environments that stimulate and enhance effective learning.
- Acquire effective mathematical communication through oral and written work, and use the language of mathematics to express mathematical ideas correctly and precisely.
- Enhance mathematical knowledge and skills through problem-solving in ways that increase interest and confidence.
- Prepare ourselves for the demands of our future undertakings and our workplace.
- Realize the importance and the beauty of mathematics.
We are expected to submit the project work within two weeks from the first day the task is
administered to us. Failure to do so will result in us not receiving the certificate.

TASK SPECIFICATIONS
All Form 5 students who take the Additional Mathematics subject are required to do this
project. The main objective of the Additional Mathematics Project Work 2019 was to analyze
the weights and heights of my classmates in order to determine the average BMI in my class.
In Part 1, I had to list the importance of data analysis in daily life. I also had to specify the
3 types of measure of central tendency and at least 2 types of measure of dispersion, as well
as examples of their uses in our daily life.
For Part 2, I had to calculate the mean, median, mode, variance and standard deviation for
the ungrouped BMI data. Then, I had to construct a frequency distribution table that
contains at least 5 class intervals of equal size. Using mathematical methods, I was able to
draw a histogram and an ogive. From the table, graphs and formulas, I could find the
mean, mode, median, variance, standard deviation and interquartile range of the
grouped data.
In a nutshell, by using mathematics, I was able to draw an ogive and determine the BMI distribution of my class.

APPLICATIONS OF STATISTICS IN DAILY LIFE


1. Statistics is the collection of data and its representation or interpretation. Statistics uses
three main measures for comparing data: the mean, median and mode.
2. What is the mean? The mean is used as one of the comparing properties of statistics. It is
defined as the average of all the observations. Its applications in daily life are as follows:
- It helps teachers to see the average marks of the students.
- It is used in factories, for the authorities to recognize whether the benefits of the workers should be continued or not.
- It is also used to compare the salaries of the workers.
- It is used to calculate the average speed of anything.
- It is also used by the government to find the income or expenses of any person. Using this, a family can balance their expenses against their average income.
3. What is the median and what are its daily applications? The median is defined as the middle
value of any set of observations. Its applications in daily life are as follows:
- It is used to measure the distribution of earnings.
- It is used to find the typical height of players, e.g. football players.
- It is used to find the middle age of the students in a class.
- It is used to determine the poverty line.
4. What is the mode and what is its importance in our daily life? The mode is the value with the
highest frequency in a data set. Its applications are as follows:
- It is used to describe the influx of public transport.
- It is used for the number of games won by a team of players.
- It is used for the most frequent needs of infants.
- The mode is also used in the calculation of wages, in counting patients going to hospitals, in identifying the most common mode of travel, etc.
Part One
Question 1
The importance of data analysis in daily life:
Data analysis is extremely important in business. No business can survive without
analyzing the available data. Visualize the following situations:
A pharmaceutical company is performing trials on a number of patients to test its new drug to fight
cancer. The number of patients under the trial is well over 500.
A company wants to launch a new variant of its existing line of fruit juice. It wants to carry
out a survey analysis and arrive at some meaningful conclusion. The sales director of a
company knows that there is something wrong with one of its successful products, but he
has not yet carried out any market research data analysis. How and what can he conclude?
These situations are indicative enough to conclude that data analysis is the lifeline of any
business. Whether one wants to arrive at some marketing decisions or fine-tune a new
product launch strategy, data analysis is the key to all these problems.
Rather than asking what is important about data analysis, one should ask what is not
important about it. Merely analyzing data is not sufficient for making a decision; how one
interprets the analyzed data is more important. Thus, data analysis is not a decision-making
system but a decision-supporting system.
Data analysis can offer the following benefits:
- Structuring the findings from survey research or other means of data collection.
- Breaking a macro picture down into a micro one.
- Acquiring meaningful insights from the dataset.
- Basing critical decisions on the findings.
- Ruling out human bias through proper statistical treatment.
Question 2
MEASURE OF CENTRAL TENDENCY
Mode, median and mean are the three measures of central tendency which indicate the
central values around which the data seem to cluster. However, the values of these
measures may differ greatly. Thus, it is vital to choose one that reflects the central value of
the data. When an extreme value exists in the data, the mean does not reflect the central value
of the data. The median and the mode are not affected by the existence of the extreme
value. Nevertheless, the mode is confusing when the data has more than one mode.
Mean
The mean of a set of ungrouped data can be obtained by adding all the values of the data
and dividing the sum by the total number of values of data: x̄ = Σx / N, where N is the total
number of values of data.
If the set of ungrouped data is given in a frequency distribution table, then the mean is
calculated as x̄ = Σfx / Σf, where f is the frequency.
When the values of a set of data are grouped into classes in a frequency table, the value that
is used to represent all the values of the data in a class is the midpoint of the class: x̄ = Σfx / Σf,
where x is the class midpoint and f is the class frequency.
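As a quick illustration of these formulas, the following Python sketch (the helper names and sample numbers are only for illustration, not part of the project working) computes the mean of ungrouped data and of data given in a frequency table:

```python
# Minimal sketch: mean of ungrouped data and of a frequency table.

def mean_ungrouped(values):
    # x-bar = sum(x) / N
    return sum(values) / len(values)

def mean_frequency_table(midpoints, frequencies):
    # x-bar = sum(f * x) / sum(f), where x is a value or a class midpoint
    total_fx = sum(f * x for x, f in zip(midpoints, frequencies))
    return total_fx / sum(frequencies)

print(mean_ungrouped([15, 11, 12, 14, 11]))          # 12.6
print(mean_frequency_table([39.5, 49.5], [6, 28]))   # weighted mean of two example classes
```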
Mode
The mode of a set of ungrouped data can be determined by identifying the value which
occurs most frequently. When the data is given in a frequency distribution table, then the
mode is the value with the highest frequency.
Example: 15, 11, 12, 14, 11, 16, 17, 11, 19, 11
The mode is 11 since 11 occurs most frequently.
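This can be checked with Python's standard library statistics module (an illustrative sketch, not part of the project working):

```python
from statistics import mode, multimode

data = [15, 11, 12, 14, 11, 16, 17, 11, 19, 11]
print(mode(data))       # 11, the most frequently occurring value
print(multimode(data))  # [11]; multimode lists every mode if there is more than one
```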
For a set of grouped data, the mode can be found from a histogram. A histogram is a graphical
representation of a frequency distribution table. It is constructed by marking the
boundaries of each class along the horizontal axis and the frequencies along the vertical
axis. The class with the tallest bar is the modal class and, based on the diagram below, P is
the mode.
(pic)
Median
The median of a set of ungrouped data is the value in the middle position of the set when
the values of the data are arranged in ascending order.
Example:
11, 13, 14, 16, 17, 18, 19, 21, 23
The median is 17 since it is in the middle.
If the total number of values of the data is even, there will be two values in the
middle. Thus, the median is the mean of the two middle values.
Example :
21, 22, 24, 26, 27, 29, 31, 32
Median = (26 + 27) / 2
       = 26.5
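The same results can be verified with Python's statistics module (illustrative only):

```python
from statistics import median

print(median([11, 13, 14, 16, 17, 18, 19, 21, 23]))  # 17 (odd number of values)
print(median([21, 22, 24, 26, 27, 29, 31, 32]))      # 26.5 (mean of the two middle values)
```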
In a set of grouped data the median can be estimated from an ogive or calculated from a
cumulative frequency table by using the following formula:
m = L + ((N/2 − F) / fm) × C

where
L = lower boundary of the median class
N = total frequency
F = cumulative frequency up to the lower boundary of the median class
fm = frequency of the median class
C = size of the class interval
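The formula can be expressed as a small Python function; the argument names follow the symbols above, and the sample values are those of the median class 45-54 used later in this project:

```python
def grouped_median(L, N, F, f_m, C):
    # m = L + ((N/2 - F) / f_m) * C
    return L + ((N / 2 - F) / f_m) * C

# Median class 45-54: L = 44.5, N = 50, F = 6, f_m = 28, C = 10
print(round(grouped_median(44.5, 50, 6, 28, 10), 2))  # 51.29 kg
```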
(pic)
The ogive is constructed by plotting the cumulative frequency against the corresponding
upper boundary of each class of a set of data.
MEASURE OF DISPERSION
Range
The simplest measure of dispersion of a set of data is the range. The range of a set of
ungrouped data is the difference between the largest value and the smallest value in the
data: range = largest value − smallest value.
The range of a set of grouped data is the difference between the midpoint of the highest
class and the midpoint of the lowest class: range = midpoint of the highest class − midpoint
of the lowest class.
Variance
σ² = Σx²/N − x̄² for a set of ungrouped data, and σ² = Σfx²/Σf − x̄² for a set of grouped data,
where x is the class midpoint.
Standard deviation

The square root of the variance is called the standard deviation, σ = √variance, which
has the same unit as each value of the data.
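Both formulas can be sketched in a few lines of Python (hypothetical helper names; the short data list is only an example):

```python
from math import sqrt

def variance_ungrouped(values):
    # sigma^2 = sum(x^2)/N - x_bar^2
    n = len(values)
    x_bar = sum(values) / n
    return sum(x * x for x in values) / n - x_bar ** 2

def variance_grouped(midpoints, frequencies):
    # sigma^2 = sum(f*x^2)/sum(f) - x_bar^2, where x is the class midpoint
    total_f = sum(frequencies)
    x_bar = sum(f * x for x, f in zip(midpoints, frequencies)) / total_f
    return sum(f * x * x for x, f in zip(midpoints, frequencies)) / total_f - x_bar ** 2

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance_ungrouped(data))                        # 4.0
print(sqrt(variance_ungrouped(data)))                  # 2.0, the standard deviation
print(variance_grouped([39.5, 49.5, 59.5], [6, 28, 6]))  # grouped-data version on three example classes
```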
Question 3
Examples of the uses of each type of measure of central tendency in daily life:
Mode
The mode appears the most often out of a given set of numbers. A data set can have more
than one mode or no mode at all if all of the numbers appear with equal frequency. The
concept of a mode can be easily connected to many tangible, real-life situations. A bakery
that sells twice as many red velvet cupcakes as chocolate brownies will need to produce
more cupcakes to satisfy its customers. In this bakery, the number of red velvet cupcakes sold is
the mode.
Mean and Average
The terms mean and average are used interchangeably in mathematics. When the entirety
of a data set is added and divided by the total number of data points, the resulting number
is referred to as the mean or the average. Averages are used quite frequently in everyday
statistics. For example, students can find the average age of the classmates in the room, with or without
the age of the instructor included, or the average age in their own family.
Median
Median is another simple measurement used commonly in basic statistical analysis. When a
data set is organized by the size of numbers, the median is the middle value. If there is an
even number of data points, the median is the average of the two middle values. Medians
can be used when the data set features extremely large or small values that could
significantly affect the average. The median is likely to be the best measure of the age
when a given group features one person aged 80 and everyone else aged between 18 and 20.
Question 3
Based on the answer above, the most appropriate measure of central tendency that reflects
the performance of my class is the mean. The mean is able to show the average mark
obtained by the class reflecting the average performance of all the students in the class.
Question 3
Merits of standard deviation
It is based on all the items of the distribution.
It is amenable to algebraic treatment since the actual plus or minus signs of the deviations are taken into
consideration.
It is least affected by fluctuations of sampling.
It facilitates the calculation of combined standard deviation and coefficient of variation,
which is used to compare the variability of two or more distributions.
It facilitates the other statistical calculations like skewness and correlation.
It provides a unit of measurement for the normal distribution.
Limitation of quartile deviation
It is not suited to algebraic treatment.
It is very much affected by sampling fluctuations.
This measure of dispersion is not based on all the items of the series.
It ignores 50% of the distribution.
Question 4
a) Grouped data is more accurate. Ungrouped data are data that are not organized or, if
arranged, are only ordered from highest to lowest. Grouped data are data that are
organized and arranged into different classes or categories.
b) Data in statistics can be classified into grouped data and ungrouped data. A row of data
such as 1, 2, 6, 4, 6, 3, 7 is called ungrouped data. Ungrouped data is any list of
numbers that you have gathered. Such data can also be summarized neatly in a
frequency distribution table as shown below:
c)
Number      1  2  3  4  6  7
Frequency   1  1  1  1  2  1

Ungrouped data is usually used when there are fewer numbers to count or a small number of
possible outcomes. Example: the ages of 200 people going to a park on a
Saturday afternoon. The ages are: 27, 8, 10, 49, ..
On the contrary, grouped data is data that has been organized into groups known as
classes. Each of these classes is of a certain width and this is referred to as the class interval
or class size. Example:

Age (years)   Frequency
0-9           5
10-19         6
20-29         7
30-39         3
40-49         4
50-59         5

Grouped data, the opposite of ungrouped data, is used when you have a large amount of
numbers or a large number of possible outcomes. Example: the ages of 200
people going to a park on a Saturday afternoon. The ages have been grouped into the
classes 0-9, 10-19, 20-29, ...
HEIGHT AND WEIGHT OF 5 STEM 1’s STUDENTS
a)
Students Height (m) Weight (kg)
1 1.73 43
2 1.77 54
3 1.62 50
4 1.58 45
5 1.65 48
6 1.78 50
7 1.60 50
8 1.63 53
9 1.68 44
10 1.75 85
11 1.69 75
12 1.58 35
13 1.76 57
14 1.74 73
15 1.68 50
16 1.55 44
17 1.62 38
18 1.62 46
19 1.6 49
20 1.69 53
21 1.6 58
22 1.67 43
23 1.69 56
24 1.71 68
25 1.71 56
26 1.75 50
27 1.68 52
28 1.65 54
29 1.55 50
30 1.7 52
31 1.68 70
32 1.63 45
33 1.65 50
34 1.88 75
35 1.72 54
36 1.59 41
37 1.6 47
38 1.52 45
39 1.58 52
40 1.74 62
41 1.75 50
42 1.71 49
43 1.58 45
44 1.67 50
45 1.7 75
46 1.72 54
47 1.88 75
48 1.72 70
49 1.76 60
50 1.86 104

CLASS INTERVAL AND FREQUENCY


Based on the sample data collected in (a):
b)

Weight (kg)   Frequency
35-44 6
45-54 28
55-64 6
65-74 4
75-84 4
85-94 1
95-104 1
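The class frequencies can be cross-checked with a short Python sketch; the weight list here is only a truncated sample of the 50 values from part (a):

```python
# Sketch: tally weights into the classes 35-44, 45-54, ..., 95-104.
weights = [43, 54, 50, 45, 48]  # ... the remaining 45 values from part (a)

classes = [(35, 44), (45, 54), (55, 64), (65, 74), (75, 84), (85, 94), (95, 104)]
frequency = {c: 0 for c in classes}

for w in weights:
    for low, high in classes:
        if low <= w <= high:
            frequency[(low, high)] += 1
            break

for (low, high), f in frequency.items():
    print(f"{low}-{high}: {f}")
```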
3 statistical graphs:
(pic) Histogram of the weight data
(pic) Frequency polygon of the weight data
(pic) Bar chart of the weight data

Standard deviation
Method 1: Using a calculator
n = 50, Σfx = 2765, Σfx² = 161722.5, mean x̄ = 55.3 kg, standard deviation σ = 13.28 kg

Method 2 : Using formula 1


Weight (kg)   Frequency, f   Midpoint, x   fx      fx²
35-44         6              39.5          237     9361.5
45-54         28             49.5          1386    68607
55-64         6              59.5          357     21241.5
65-74         4              69.5          278     19321
75-84         4              79.5          318     25281
85-94         1              89.5          89.5    8010.25
95-104        1              99.5          99.5    9900.25
              Σf = 50                      Σfx = 2765   Σfx² = 161722.5

Mean, x̄ = 2765 / 50 = 55.3 kg

σ² = Σfx²/Σf − (Σfx/Σf)²
   = 161722.5/50 − (2765/50)²
   = 176.36 kg²

σ = 176.36^(1/2)
  = 13.28 kg

Method 3 : Using formula 2

Weight (kg)   Frequency, f   Midpoint, x   (x − x̄)²   f(x − x̄)²
35-44         6              39.5          249.64     1497.84
45-54         28             49.5          33.64      941.92
55-64         6              59.5          17.64      105.84
65-74         4              69.5          201.64     806.56
75-84         4              79.5          585.64     2342.56
85-94         1              89.5          1169.64    1169.64
95-104        1              99.5          1953.64    1953.64
              Σf = 50                      Σ(x − x̄)² = 4211.48   Σf(x − x̄)² = 8818

x̄ = 55.3 kg

σ² = Σf(x − x̄)² / Σf
   = 8818 / 50
   = 176.36 kg²

σ = 176.36^(1/2)
  = 13.28 kg
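The results of Methods 2 and 3 can be reproduced with a few lines of Python using the midpoints and frequencies from the table above (a sketch for checking the arithmetic only):

```python
from math import sqrt

midpoints   = [39.5, 49.5, 59.5, 69.5, 79.5, 89.5, 99.5]
frequencies = [6, 28, 6, 4, 4, 1, 1]

n     = sum(frequencies)                                        # 50
mean  = sum(f * x for x, f in zip(midpoints, frequencies)) / n  # 55.3 kg
var   = sum(f * (x - mean) ** 2
            for x, f in zip(midpoints, frequencies)) / n        # about 176.36 kg^2
sigma = sqrt(var)                                               # about 13.28 kg

print(mean, round(var, 2), round(sigma, 2))
```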

d) Conclusion:
The value of the standard deviation, σ = 13.28 kg, indicates that there is a wide
dispersion of the data about the mean.
FURTHER EXPLORATION
Student Height (m) Weight (kg) BMI
1 1.73 43 14.37
2 1.77 54 17.24
3 1.62 50 19.05
4 1.58 45 18.03
5 1.65 48 17.63
6 1.78 50 15.78
7 1.60 50 19.53
8 1.63 53 19.95
9 1.68 44 15.59
10 1.75 85 27.76
11 1.69 75 26.26
12 1.58 35 14.02
13 1.76 57 18.4
14 1.74 73 24.11
15 1.68 50 17.72
16 1.55 44 18.31
17 1.62 38 14.48
18 1.62 46 19.53
19 1.60 49 19.14
20 1.69 53 18.56
21 1.60 48 18.75
22 1.67 43 15.75
23 1.69 56 19.6
24 1.71 68 23.26
25 1.71 56 19.15
26 1.75 50 16.31
27 1.68 52 18.42
28 1.65 54 19.83
29 1.55 50 20.81
30 1.70 52 18
31 1.68 70 24.8
32 1.63 45 16.93
33 1.64 50 18.37
34 1.88 75 21.22
35 1.72 54 17.84
36 1.59 41 16.22
37 1.60 47 18.36
38 1.52 45 19.48
39 1.58 52 20.83
40 1.74 62 20.48
41 1.75 50 16.33
42 1.71 49 16.76
43 1.58 45 18.03
44 1.67 50 17.93
45 1.70 75 25.95
46 1.72 54 18.25
47 1.88 75 21.22
48 1.72 70 23.66
49 1.76 60 19.37
50 1.88 104 30.06
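Each BMI value in the table is computed as BMI = weight (kg) ÷ height² (m²). A minimal Python sketch of this calculation (only the first few rows are shown as sample input) is:

```python
# Sketch: compute BMI = weight / height^2 for each student.
students = [
    (1.73, 43),   # student 1
    (1.77, 54),   # student 2
    (1.62, 50),   # student 3
]

for i, (height, weight) in enumerate(students, start=1):
    bmi = weight / height ** 2
    print(f"Student {i}: BMI = {bmi:.2f}")
```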

Weight (kg)   Frequency, f   Midpoint, x   Midpoint × Frequency, fx   Cumulative Frequency   Upper Boundary
35-44         6              39.5          237                        6                      44.5
45-54         28             49.5          1386                       34                     54.5
55-64         6              59.5          357                        40                     64.5
65-74         4              69.5          278                        44                     74.5
75-84         4              79.5          318                        48                     84.5
85-94         1              89.5          89.5                       49                     94.5
95-104        1              99.5          99.5                       50                     104.5
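The cumulative frequency column and the grouped mean calculated below can be checked with a short sketch (assumed helper code, not part of the original working):

```python
from itertools import accumulate

frequencies      = [6, 28, 6, 4, 4, 1, 1]
midpoints        = [39.5, 49.5, 59.5, 69.5, 79.5, 89.5, 99.5]
upper_boundaries = [44.5, 54.5, 64.5, 74.5, 84.5, 94.5, 104.5]

cumulative = list(accumulate(frequencies))  # [6, 34, 40, 44, 48, 49, 50]
mean = sum(f * x for f, x in zip(frequencies, midpoints)) / sum(frequencies)  # 55.3 kg

print(cumulative)
print(mean)
```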

Mean, x̄ = sum of (frequency × class mark) / total frequency
        = Σfx / Σf
        = (237 + 1386 + 357 + 278 + 318 + 89.5 + 99.5) / (6 + 28 + 6 + 4 + 4 + 1 + 1)
        = 2765 / 50
        = 55.3 kg

Median class = 45-54

Therefore, median, m = L + ((N/2 − F) / fm) × C
                     = 44.5 + ((25 − 6) / 28) × 10
                     = 44.5 + 6.79
                     = 51.29 kg

(pic) Histogram of the grouped weight data

From the histogram, mode = 49.5 kg


Comments on values obtained:
Mean - the mean is the best measure of central tendency because all the values in a set of
data are taken into consideration when calculating the mean. The mean is also suitable for
representing data which are evenly distributed.
Median - the median is used when there are extreme values, because the median eliminates
the effect of extreme values in the set of data.
Mode - the mode is used to represent a set of data containing a large number of values which
take only some specific values, with many repeated values.

Therefore, I will choose the mean because it is the best measure of central tendency.

BMI     Frequency   Cumulative frequency   Upper boundary
6-10    0           0                      10.5
11-15   3           3                      15.5
16-20   34          37                     20.5
21-25   9           46                     25.5
26-30   3           49                     30.5
31-35   1           50                     35.5
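The percentages quoted below the ogive can be cross-checked by classifying each BMI value directly; the sketch below assumes the usual WHO cut-offs (underweight below 18.5, normal 18.5-24.9, overweight 25-29.9, obese 30 and above) and shows only a few sample values:

```python
# Sketch: count BMI categories, assuming standard WHO cut-offs.
bmis = [14.37, 17.24, 19.05, 27.76, 30.06]  # ... the full list of 50 BMI values

counts = {"underweight": 0, "normal": 0, "overweight": 0, "obese": 0}
for b in bmis:
    if b < 18.5:
        counts["underweight"] += 1
    elif b < 25:
        counts["normal"] += 1
    elif b < 30:
        counts["overweight"] += 1
    else:
        counts["obese"] += 1

for category, count in counts.items():
    print(category, f"{100 * count / len(bmis):.0f}%")
```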
(pic) Ogive of cumulative frequency against BMI

From the ogive above, it can be seen that:
- About 52% of the students are underweight.
- About 40% of the students are of normal weight.
- About 2% of the students are overweight.
- Only 2% of the students are obese.
- The mean of the BMI = Σfx / Σf = 965.43 / 50 = 19.3
- Most of the students are not obese, but rather underweight or of normal weight.
