Mathematics Research Paper - Binomial Distribution With The Galton Board

Mathematical Exploration
The Galton Board
IB SL Mathematics
Nidhi Rameshan
Contents
Introduction
Rationale
Aim and Methodology
Aim
Part I: Modelling the curve for SAT scores based on Galton Board
Part II: Analysis by comparison of actual and modelled SAT scores
Conclusion
Works cited

Introduction
The bean machine, also known as the Galton Board or quincunx, is a 7.5" by
4.5" desktop machine, invented by Sir Francis Galton to demonstrate probability. It
consists of an upright board with evenly spaced nails or pegs driven into its upper
half, arranged in staggered order (a quincunx), and a lower half divided into a number
of evenly spaced rectangular slots. At the upper edge is a funnel holding balls, each
of diameter less than the distance between the pegs. The funnel is located precisely
above the central peg of the second row, so that each ball falls vertically onto the
uppermost point of the nail's surface. When the board is rotated on its axis, the balls
are released from the funnel at the top of the board. Each ball has a 50% chance of
going either way every time it hits a pin, and eventually falls into one of the slots.
Each slot has an associated probability, and looking at all the slots together gives a
distribution. The probability that a ball ends up in an outer slot is very small, so the
balls tend to collect towards the centre, and as they accumulate in the slots they
approximate what is called a normal distribution, or bell-shaped curve.
The Galton Board thus shows the emergence of an orderly bell curve from the chaos
of numerous falling balls bouncing left or right off the pegs.
Rationale
The Galton Board, or bean machine as I like to call it, played the role of a toy
during my childhood. It was situated on the table in my father's office, and I used to
play with it whenever I could. I was fascinated by the way the tiny metal balls would
hit the pegs and change their course, to finally fall into one of the slots, and form a
curve. I used to wonder why the pegs were placed in the shape of a triangle, and
whether that affected the way the balls moved and fell into the slots. My father tried
to explain the rules of probability to me, but I was too young to understand, and one
day, the bean machine broke. When we started learning probability in the lower classes,
I vaguely remembered the term, but could not connect it with anything specific. Also,
as a high school student, I have to take the SAT, and after understanding the principle
of the Galton Board, I realised that the distribution of SAT scores offered a good way
to investigate the board and how it can be used to analyse real-life situations.
Aim and Methodology
Aim
The aim of the exploration is to use the principle of the Galton Board to analyse
and predict real-life situations that involve random outcomes and follow a normal
distribution. In this case, the SAT scores for October 2017 will be analysed, because
the large number of test takers makes the outcomes behave randomly, in order to
estimate the predicted scores and compare them with the actual scores obtained by
the test takers.
The exploration will be divided into two parts. Part I is the modelling of the curve,
which uses the concept of the Galton Board in tandem with regression to make a
statistical model of the SAT, with the beads in the Galton Board representing the test
takers and the pegs representing the questions in the test. Part II is the analysis,
where the cumulative distribution function and the related error function will be used
to analyse the differences between the predicted model and its real-life equivalent,
using calculus to show the difference between the actual scores and the predicted
scores, and therefore whether the test was harder, easier or on par with expectations.
Part I: Modelling the curve for SAT scores based on Galton Board
The Galton Board works on the principle of the central limit theorem, using a
binomial distribution to approximate a normal distribution. According to the central
limit theorem (CLT), the sum of a large number of independent random variables is
approximately normally distributed, and the mean of the sample means will be
approximately equal to the mean of the entire data set. The Galton Board is similar to
Pascal's Triangle, and can be used to model this situation. Just as the proportion of
heads in a series of coin flips tends to get closer to 50% as the number of flips
increases, in the Galton Board, the higher the number of balls and pegs on the board,
the more closely the slots form a bell curve. Every time a ball hits a peg, it bounces to
the right with probability p and to the left with probability q = 1 - p. The nails are
symmetrically placed in the form of a quincunx, so the balls bounce either way with
equal probability, i.e. p = q = 1/2. When a ball reaches the bottom row, it hits the nth
peg from the left (counting from n = 0) exactly when it has taken n right turns. For a
board with N rows of pegs, this occurs with probability
$$P(n) = \binom{N}{n} p^{n} q^{N-n}$$
and this gives rise to a binomial distribution.
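As a minimal sketch of this binomial distribution, the Python snippet below computes the probability of each outcome for a board with a given number of peg rows, assuming p = q = 1/2 by default. The function name `slot_probabilities` and the choice of 11 rows are illustrative assumptions and are not taken from the exploration itself.

```python
from math import comb

def slot_probabilities(rows, p=0.5):
    """Return P(n right turns) for n = 0..rows on a board with `rows` peg rows."""
    q = 1 - p
    return [comb(rows, n) * p**n * q**(rows - n) for n in range(rows + 1)]

# Illustrative choice (an assumption): 11 rows of pegs give 12 possible outcomes.
probs = slot_probabilities(11)
for n, pr in enumerate(probs):
    print(f"n = {n:2d}: {100 * pr:6.3f} %")
```

The binomial coefficients `comb(rows, n)` are the entries of the corresponding row of Pascal's Triangle, which is why the outer outcomes are so much less likely than the central ones.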
If the number of balls is sufficiently large and p = q = 1/2, then according to the
weak law of large numbers, the distribution will approximate a normal distribution,
where
$$\lim_{x \to \infty} P(x) = 0.5$$
A normal distribution has the property that the mean, median and mode are all the
same. The standard normal distribution equation is
$$f(x) = \frac{1}{\sqrt{2\pi}} \times e^{-\frac{1}{2}x^{2}}$$
and for a random variable X, the probability that X will take on a value which is less
than or equal to x is called the cumulative distribution function,
$$F(x) = \frac{1}{\sqrt{2\pi}} \times \int_{-\infty}^{x} e^{-\frac{t^{2}}{2}}\,dt$$
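The integral in the cumulative distribution function has no elementary closed form, but it can be written in terms of the error function. The sketch below shows one way to evaluate both functions in Python with the standard library's `math.erf`; the function names are illustrative assumptions.

```python
from math import erf, exp, pi, sqrt

def standard_normal_pdf(x):
    """Density of the standard normal distribution."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def standard_normal_cdf(x):
    """P(X <= x) for a standard normal X, using Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1 + erf(x / sqrt(2)))

print(standard_normal_cdf(0.0))  # prints 0.5: half the area lies below the mean
print(standard_normal_cdf(1.0) - standard_normal_cdf(-1.0))  # area within one standard deviation
```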
While modelling the SAT scores, the test takers are treated as the balls in the
board, and each peg represents a group of questions in the test. In the simulation of
the Galton Board, the number of columns is set at 12 and the number of beads at
100000. Twelve columns are used because the SAT scores range between 400 and
1600, so each column spans 100 points. The simulation was done with 100000 beads
because the mean number of test takers per year is 1.7 million, so each bead
represents 17 test takers. A board with 12 columns has 11 rows of pegs, and as the
total number of questions in the SAT is 154, this allows me to assume that each peg a
bead meets represents 154/11 = 14 questions. Also, while modelling the graph, it is
assumed that each student had a 50% chance of getting each question right, taking
the difficulty of each question into account. In order to find the points on the curve to
be used in a regression, I chose the midpoint of each column to be the average score
for that column.
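The simulation described above can be approximated in a few lines of Python. This is only a sketch under stated assumptions: it takes 12 columns to mean 11 rows of pegs, uses an arbitrary fixed random seed, and is not the simulation tool used for Figure 1, so its percentages will differ slightly from the ones reported.

```python
import random
from collections import Counter

ROWS = 11                  # assumption: 11 rows of pegs give 12 columns
BEADS = 100_000            # number of beads, as in the exploration
rng = random.Random(2017)  # arbitrary fixed seed so the run is repeatable

def drop_bead(rows, rng):
    """Return the column index (0..rows) as the number of right turns taken."""
    return sum(rng.random() < 0.5 for _ in range(rows))

counts = Counter(drop_bead(ROWS, rng) for _ in range(BEADS))

# Column k covers scores 400 + 100k to 500 + 100k, so its midpoint is 450 + 100k.
midpoints = [450 + 100 * k for k in range(ROWS + 1)]
percentages = [100 * counts[k] / BEADS for k in range(ROWS + 1)]

for score, pct in zip(midpoints, percentages):
    print(f"{score:5d} {pct:7.3f}")
```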

Figure 1 Simulation for Normal Distribution Curve
Thus, the twelve points that were used to model the curve are given below

x (score)    y (% of test takers)
 450          0.043
 550          0.530
 650          2.614
 750          8.008
 850         16.366
 950         22.527
1050         22.412
1150         16.142
1250          8.033
1350          2.727
1450          0.540
1550          0.049
In order to create a function to model the graph, polynomial regression was used by
inputting the values in an algorithm. In the equation below, y refers to the percentage
of test takers, x refers to the scores gained, β denotes the coefficients of the equation
and ε represents the error.
$$y = \beta_{0} + \beta_{1}x + \beta_{2}x^{2} + \varepsilon$$
From the inputted values, the polynomial regression calculated the following function.
$$y = (-6.870986513 \times 10^{-5} \times x^{2}) + (1.374296953 \times 10^{-1} \times x) + (-52.19932124)$$
The model formed from the function is treated as accurate only between 750 and
1450; outside this range the error of the fit is too large.
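The fitted quadratic above can be checked with an ordinary least-squares fit of the twelve points. The sketch below is one way to do this in plain Python, solving the normal equations exactly with `fractions.Fraction`; it is not necessarily the algorithm used in the exploration, and the helper names `fit_quadratic` and `model` are illustrative assumptions.

```python
from fractions import Fraction

scores = [450 + 100 * k for k in range(12)]
percent = ["0.043", "0.530", "2.614", "8.008", "16.366", "22.527",
           "22.412", "16.142", "8.033", "2.727", "0.540", "0.049"]
xs = [Fraction(s) for s in scores]
ys = [Fraction(p) for p in percent]

def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x^2 via the normal equations."""
    # Power sums S[k] = sum(x^k) and moments T[k] = sum(y * x^k).
    S = [sum(x**k for x in xs) for k in range(5)]
    T = [sum(y * x**k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x3 system: row i reads sum_j S[i+j] * b_j = T[i].
    A = [[S[i + j] for j in range(3)] + [T[i]] for i in range(3)]
    # Gauss-Jordan elimination; exact because every entry is a Fraction.
    for i in range(3):
        pivot = A[i][i]
        A[i] = [v / pivot for v in A[i]]
        for r in range(3):
            if r != i:
                factor = A[r][i]
                A[r] = [vr - factor * vi for vr, vi in zip(A[r], A[i])]
    return [A[i][3] for i in range(3)]

b0, b1, b2 = (float(b) for b in fit_quadratic(xs, ys))

def model(x):
    """Evaluate the fitted quadratic at score x."""
    return b0 + b1 * x + b2 * x * x

print(f"b2 = {b2:.9e}, b1 = {b1:.9e}, b0 = {b0:.8f}")
for x, y in zip(scores, ys):
    print(f"{x:5d}  data {float(y):7.3f}  model {model(x):8.3f}")
```

Printing the model values beside the data makes it easy to see where the quadratic departs from the simulated percentages, for example at the outermost columns, where the fitted value becomes negative.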

Figure 2 Model of SAT scores
Part II: Analysis by comparison of actual and modelled SAT scores
By the cumulative distribution function, at the model's mean of 1000,
$$F(x = 1000) = \frac{1}{2}$$
Considering the area under the graph as the number of people attempting the test,
$$\int_{750}^{1450} f(x)\,dx = 9520 \text{ people}$$
$$\int_{750}^{a} f(x)\,dx = \frac{9520}{2} = 4760$$
$$F(a) - F(750) = 4760$$
$$a \approx 1050$$
Mean score ≈ 1050, which is close to the mean score in the model, i.e. 1000. This
suggests that the modelled function is mostly accurate when compared to the actual
values, allowing for error.
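The half-area calculation can be sketched in Python by integrating the fitted quadratic analytically and locating the point a by bisection. The coefficients are the ones printed in Part I; the function names are illustrative assumptions. With the coefficients as printed, this sketch gives a total area close to 9120 rather than 9520, so the quoted figure may come from a different evaluation; the half-area point still falls close to 1050.

```python
# Coefficients of the fitted quadratic y = B2*x^2 + B1*x + B0 from Part I.
B2 = -6.870986513e-5
B1 = 1.374296953e-1
B0 = -52.19932124

def antiderivative(x):
    """An antiderivative of the quadratic model."""
    return B2 * x**3 / 3 + B1 * x**2 / 2 + B0 * x

def area(lo, hi):
    """Area under the model between scores lo and hi."""
    return antiderivative(hi) - antiderivative(lo)

def half_area_point(lo, hi, tol=1e-6):
    """Find a with area(lo, a) equal to half of area(lo, hi), by bisection.

    Valid here because the model is positive on [lo, hi], so the
    accumulated area increases with the upper limit.
    """
    target = area(lo, hi) / 2
    left, right = lo, hi
    while right - left > tol:
        mid = (left + right) / 2
        if area(lo, mid) < target:
            left = mid
        else:
            right = mid
    return (left + right) / 2

total = area(750, 1450)
a = half_area_point(750, 1450)
print(f"area from 750 to 1450: {total:.1f}")
print(f"half-area point a: {a:.1f}")
```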
Figure 3 Actual SAT scores 2017
Figure 4 Actual SAT Scores 2018

In order to compare the values of the actual scores and the modelled scores, the ratio
of the cumulative value at the 50th percentile of the model (x = 1000) to the value for
the entire area under the curve (x = 1600) must be compared for both situations. Let
$\frac{a}{b}$ be the ratio for the actual scores, and $\frac{c}{d}$ be the ratio for the
modelled scores, with
$$F(x) = \int f(x)\,dx$$
Actual scores (2017):
$$\frac{a}{b} = \frac{F(x = 1000)}{F(x = 1600)} = 0.567456$$
Actual scores (2018):
$$\frac{a}{b} = \frac{F(x = 1000)}{F(x = 1600)} = 0.5553$$
Estimated scores (2017 and 2018):
$$\frac{c}{d} = \frac{F(x = 1000)}{F(x = 1600)} = 0.413585$$
Since 0.567 > 0.555 > 0.414, $\frac{a}{b} > \frac{c}{d}$ for both 2017 and 2018.
Conclusion
Through the exploration, it was seen that the SAT scores form an approximately
normal distribution curve, and the curve modelled from the values in the simulation
was similar to the curve formed by the actual scores gained in October 2017, with
slight error, which was expected. While comparing the values of both the model and
the actual curve, it was seen through calculus that $\frac{a}{b} > \frac{c}{d}$, meaning
that the actual scores were higher than the scores estimated from the simulation and
the model created. Since the simulation represented the average scores each year and
the actual scores were higher than the estimated scores, it can be said that, based on
previous SATs, the October 2017 SAT was easier than expected. Had the estimated
scores been higher than the actual scores, the test would have been harder than
expected.
Several difficulties were faced while carrying out this exploration, such as the
excessive error in the model between 400 and 750 points, and between 1450 and 1600
points, which could lead to considerable inaccuracy when comparing the model with
the actual values. Also, the values involved in the polynomial regression were
extremely small, and thus could not be calculated by hand and had to be calculated
with the assistance of an algorithm. The values were established based on several
assumptions, such as assuming each student had a 50% chance of getting each
question correct. However, the data set of the actual October 2017 SAT scores was
not hard to obtain, giving me an accurate data set for comparison, thus strengthening
my comparison and ensuring accuracy for the most part.
The findings of my exploration show that any real-life situation with random
outcomes and a large enough sample size can be modelled and predicted or
estimated based on the Galton Board and its principle, and will form a normal
distribution curve. This can be applied to the prediction of scores for any other exams
or tests, and also the prediction of height, strength and other abilities in humans and
animals.
Works cited
1. “Central Limit Theorem.” Exponential Distribution | Definition | Memoryless Random Variable, www.probabilitycourse.com/chapter7/7_1_2_central_limit_theorem.php.
2. “Galton Board.” From Wolfram MathWorld, mathworld.wolfram.com/GaltonBoard.html.
3. “Galton Board.” Pascaline – Interactive Simulations – EduMedia, www.edumedia-sciences.com/en/media/905-galton-board.
4. “Galton Board.” PhysLab, www.physlab.org/class-demo/galton-board/.
5. Kozlov, V. V. and Mitrofanova, M. Yu. "Galton Board." Regular Chaotic Dynamics 8, 431-439, 2002.
6. Learner.org. (2018). Mathematics Illuminated | Unit 7 | 7.5 The Galton Board Revisited. [online] Available at: https://www.learner.org/courses/mathilluminated/units/7/textbook/05.php [Accessed 27 May 2018].
7. Weisstein, Eric W. "Distribution Function." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/DistributionFunction.html
