Jun 05, 2014

Alex Nguyen

Mr. Reppenhagen

May 28th, 2014

IB Mathematics

What is the Central Limit Theorem and what are its applications in statistics?

Table of Contents

1. Introduction
2. Definition of Terms
3. Explanation
4. History of Central Limit Theorem
5. Demonstration of Proof of Central Limit Theorem
6. Applications
   a. Sampling
   b. Polls
7. Weakness
8. Conclusion
9. Works Cited

Introduction

The Central Limit Theorem has many versions, all of which describe the convergence of the means of random samples towards one single distribution: the normal distribution. The Classical Central Limit Theorem states that the arithmetic mean of a large number of independent and identically distributed random variables, each with a well-defined mean and a finite variance, is approximately normally distributed. In simpler terms, if we repeatedly draw large random samples whose individual observations are independent of one another and calculate the mean of each sample, then the central limit theorem states that the distribution of these means will be approximately bell-shaped, that is, close to a normal curve. The Classical Central Limit Theorem allows one to reason about the probability distribution of the outcome (the mean) of a process (the random variables making up an event) without knowing much about the nature of the events themselves, other than the fact that they are identically distributed and independent. Modern versions of the theorem are stated more abstractly, with weaker hypotheses that allow for some cases in which the variables depend on one another, which widens the scope and applicability of the theorem. The Central Limit Theorem is a surprising result that is used constantly in statistics and probability, and it is the main reason the normal distribution is so important in both fields.

Definition of Key Terms

First, let us define our terms. By a random variable we mean a quantity whose value is determined by chance. The possible values of a random variable form the domain of its probability function. Each value of a discrete random variable is assigned a single probability, while a continuous random variable is described by a probability density, and probabilities are assigned to intervals of values. For example, the outcomes of a die roll are values of a random variable. A mathematical function that assigns an associated probability to every possible value of the random variable is called a probability distribution. For every probability distribution the probabilities add up to one (for a continuous distribution, the area under the curve is one unit), because it is certain that something in the sample space happens. The probability distribution for a single roll of a fair die is a horizontal straight line, since each of the six faces has the same probability of 1/6.

The mean or expected value of a random variable is the value we would obtain if we repeated the random experiment an infinite number of times and took the average of all the values. In a way, the mean or expected value is a weighted average of all possible values, with each value weighted by its probability. The standard deviation is the root mean square, or quadratic mean, of the distances between the values and the expected value.
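As a small worked example of these two definitions, the sketch below computes the expected value and standard deviation of a single die roll. It assumes a fair six-sided die (each face with probability 1/6); the function names are chosen here for illustration only.

```python
from math import sqrt

def expected_value(values, probs):
    """Probability-weighted average of all possible values."""
    return sum(x * p for x, p in zip(values, probs))

def standard_deviation(values, probs):
    """Root mean square of the distances between the values and the mean."""
    mean = expected_value(values, probs)
    variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
    return sqrt(variance)

# Assumption: a fair die, so every face has probability 1/6.
faces = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

print(f"mean = {expected_value(faces, probs):.4f}")
print(f"standard deviation = {standard_deviation(faces, probs):.4f}")
```

The mean comes out at about 3.5 and the standard deviation at about 1.71, even though 3.5 is not itself a value the die can show.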

The normal distribution is a very commonly occurring continuous probability distribution in which the mean, median, and mode are exactly the same. A normal distribution with mean $\mu$ and standard deviation $\sigma$ is given by the equation:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

The normal distribution is also called a Gaussian distribution. The density of a normal distribution is practically zero several standard deviations away from the mean. For example, 99.7% of values lie within 3 standard deviations of the mean. Therefore, extreme events are predicted to have a very small chance of occurring, due to the exponential decay of the density on both sides of the mean.

The picture above shows a normal distribution, marked with the probability of a value lying within each standard-deviation interval around the mean.
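The probabilities attached to each standard-deviation band can be recomputed directly. The sketch below is one way to do it, using the standard relation between the normal distribution and the error function; the helper name `prob_within` is an illustrative choice, not something from the essay's sources.

```python
from math import erf, sqrt

def prob_within(k):
    """Probability that a normal variable lies within k standard deviations
    of its mean. The result does not depend on the mean or the standard
    deviation themselves."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```

This gives roughly 68%, 95%, and 99.7% for one, two, and three standard deviations, which matches the 99.7% figure quoted above.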

Next, we need to define what being independent and identically distributed means. The phrase is used to describe a collection of random variables. Random variables are independent of each other if the outcome of one does not alter the probabilities of the others. Random variables are identically distributed when they all have the same probability distribution. For example, repeated rolls of a die are independent and identically distributed because they have the same probability distribution (a horizontal straight line) and are independent of one another (one die roll does not affect another).

By the mean in this essay we mean the arithmetic mean, and by variance we mean the square of the standard deviation.

Explanation

The Central Limit Theorem essentially describes the characteristics of a population of means, created from the means of an infinite number of random samples of size N, all drawn from a parent population. The Central Limit Theorem specifically predicts that, regardless of the shape of the distribution of the parent population, as long as it has a finite mean and standard deviation and the samples are random and independent of one another:

1) The mean of the population of means is equal to the mean of the parent population from which the samples are taken.

2) The standard deviation of the population of means is equal to the standard deviation of the parent population divided by the square root of the sample size N.

3) The distribution of the population of means will increasingly approach a normal distribution as the sample size N increases.
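The three predictions above can be checked with a small simulation. The sketch below is only an illustration under assumptions made here: the parent population is an exponential distribution with mean 1 and standard deviation 1 (chosen because it is strongly skewed), the sample size is N = 25, and 20,000 samples are drawn with a fixed seed.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

def sample_means(n, trials, rng):
    """Return the means of `trials` independent samples of size n from Exp(1)."""
    return [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(trials)]

rng = random.Random(2014)        # fixed seed so the run is repeatable
n = 25                           # sample size N (an arbitrary choice)
means = sample_means(n, 20_000, rng)

centre = fmean(means)            # prediction 1: close to the parent mean, 1
spread = pstdev(means)           # prediction 2: close to 1 / sqrt(25) = 0.2
# Prediction 3: under a normal curve about 68% of values lie within one
# standard deviation of the centre.
share = sum(abs(m - centre) < spread for m in means) / len(means)

print(f"mean of means = {centre:.3f}")
print(f"sd of means   = {spread:.3f} (predicted {1 / sqrt(n):.3f})")
print(f"within one sd = {share:.3f}")
```

Even though individual exponential values are far from bell-shaped, the means of the samples cluster around 1 with a spread near 0.2, and the share within one standard deviation is close to the normal value.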

Different parts of the distribution converge to the normal distribution at different rates. The parts close to the mean converge quickly, but the tails converge more slowly. For this reason the central limit theorem is said to give an asymptotic distribution: the normal curve is a limit that is approached as N grows, and a large number of observations is required before the approximation is accurate in the tails.
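The slower convergence in the tails can be seen in a case where the exact answer is known. The sketch below assumes, purely for illustration, that we add up n = 30 independent exponential variables with mean 1; their sum has a known exact distribution (the Erlang distribution), so the normal approximation can be compared with the true probability one and four standard deviations above the mean.

```python
from math import erfc, exp, factorial, sqrt

def exact_tail(n, x):
    """P(S > x) where S is the sum of n independent Exp(1) variables."""
    return exp(-x) * sum(x ** k / factorial(k) for k in range(n))

def normal_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * erfc(z / sqrt(2))

def relative_error(n, z):
    """Relative error of the normal approximation z standard deviations
    above the mean of the sum."""
    x = n + z * sqrt(n)          # the sum has mean n and standard deviation sqrt(n)
    exact = exact_tail(n, x)
    return abs(normal_tail(z) - exact) / exact

n = 30                           # number of variables added (an arbitrary choice)
for z in (1, 4):
    print(f"{z} sd above the mean: relative error {relative_error(n, z):.1%}")
```

Near the centre the normal curve is off by only a small fraction, while four standard deviations out it misjudges the true probability by a much larger factor.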

History of the Central Limit Theorem

Many natural and social scientists in the 19th century noticed a pattern in the means of independent random variables. When the outcome (the mean) is affected by a lot of random variables (a large sample size) and each variable has only a slight effect on the outcome as a whole, the mean is distributed in a certain way, regardless of the actual probability distribution of the random variables. Mathematically, however, this is an important and seemingly daunting problem, since it requires a mathematician to draw conclusions about the outcome (the mean) from a set of random variables when little is known about their individual distributions. The theorem has been described as "one of the most remarkable results in all of mathematics" and "a dominating personality in the world of probability and statistics" (Adams, 1974, p. 2). It is also one of the earliest results of probability theory.

The central limit theorem was named by the mathematician George Pólya. Drawing on the foundations laid in an 1810 paper by the French mathematician Laplace, Pólya recognized a number of theorems that all lead to the appearance of the normal distribution, and he named them central limit theorems, a name that is widely used today.

Proof of Central Limit Theorem

While the proof of the central limit theorem is too advanced for the scope of this exploration, we can

explore the heuristic behind the central limit theorem.

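The normal distribution satisfies a specific identity about itself. If a random variable $X$ has a normal distribution with mean $\mu_1$ and variance $\sigma_1^2$, and another, independent random variable $Y$ is distributed normally with mean $\mu_2$ and variance $\sigma_2^2$, then their sum $X + Y$ is distributed normally with mean $\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$.

In essence, normal distributions when added together yield normal distributions up to a degree of scaling. For a distribution $D$ with mean zero and finite variance, and an independent copy $D'$ of it, the equation

$$D + D' \overset{d}{=} \sqrt{2}\, D$$

defines a normal distribution.

These properties help us understand why we expect normalized means to converge to a normal distribution. Let $X_1, X_2, \dots$ be independent and identically distributed random variables, shifted so that their mean is zero. Suppose that the population of normalized means converges to a hypothetical distribution $D$. We have

$$Z_{2n} = \frac{Z_n + Z_n'}{\sqrt{2}}$$

where

$$Z_n = \frac{X_1 + \dots + X_n}{\sqrt{n}}$$

is simply the normalized sum and $Z_n'$ is the same quantity built from $X_{n+1}, \dots, X_{2n}$. To find the mean we simply divide the sum $X_1 + \dots + X_n$ by $n$.

We would expect, as $n$ grows, that

$$D \overset{d}{=} \frac{D + D'}{\sqrt{2}}$$

so $D$ must be normal.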
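The fact that adding normal variables gives another normal variable, which this heuristic relies on, can be checked by simulation. The sketch below is an illustration only: the two means and standard deviations (1 and 2, then -3 and 1.5), the number of draws, and the seed are arbitrary choices made here. If the sum of the two variables is normal, its mean should be the sum of the means, its variance the sum of the variances, and about 68.3% of draws should fall within one standard deviation of the mean.

```python
import random
from math import sqrt
from statistics import fmean, pvariance

def simulate_sums(mu1, sigma1, mu2, sigma2, trials, rng):
    """Draw X ~ N(mu1, sigma1^2) and Y ~ N(mu2, sigma2^2) independently
    and return the list of sums X + Y."""
    return [rng.gauss(mu1, sigma1) + rng.gauss(mu2, sigma2) for _ in range(trials)]

rng = random.Random(0)                  # fixed seed so the run is repeatable
sums = simulate_sums(1.0, 2.0, -3.0, 1.5, 100_000, rng)

mean = fmean(sums)                      # expected: 1 + (-3) = -2
variance = pvariance(sums)              # expected: 2^2 + 1.5^2 = 6.25
share = sum(abs(s - mean) < sqrt(variance) for s in sums) / len(sums)

print(f"mean = {mean:.3f}, variance = {variance:.3f}, within one sd = {share:.3f}")
```

A simulation like this does not prove the identity, but it shows the simulated sums behaving the way a normal distribution with the predicted mean and variance would.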

Applications of Central Limit Theorem

Hypothesis testing draws strongly on the central limit theorem. The central limit theorem helps scientists who want to make claims about a population that they are studying. For example, in areas of knowledge where behavior is variable, such as psychology, experts need to formulate hypotheses about a population and need to know the margin of error of their claims. They use statistical experiments to obtain sample data from the population. Information from the data, such as the standard deviation, the sample size, or the mean, can be used to test a specific hypothesis about the population. Hypothesis testing that simply assumes the data come from a normal distribution is unrealistic, because real-world data show outliers, skew towards one side, multiple peaks, and asymmetry. For example, if we sample the world's wealth, we have outliers such as Bill Gates or Warren Buffett, who are not accounted for by the normal distribution because they are so far from the mean as to be virtually impossible under it. We also have a distribution skewed towards poverty (there are more moderately poor people than there are moderately rich people). Therefore, it is unrealistic to treat all data as if they were normal. However, if we take the means of samples of such data, assuming that all the samples are drawn in the same way from the same population and that the sample size is large enough, the distribution of those means is approximately normal.

Sampling

A sampling distribution is the probability distribution of a given statistic over repeated samples. For example, if we have one sample we might want to know the mean of that one sample. When we repeat the experiment, we take the mean of each sample as representative of that sample. This mean is called the sample mean. The sampling distribution of the mean is the probability distribution of these means, which might differ from one sample to another. For example, in practice we might take each sample in one geographical area. Perhaps one mean would be unusually high because the area is a rich neighborhood with a high average income per household. However, this is not representative of the actual population, so we need to take many samples. The sampling distribution of the mean is the distribution of these various means.

This is important for statisticians because when they make a claim such as "90% of individuals earn between 30,000 and 70,000 dollars, 19 times out of 20," they want to know whether a repeated statistical survey would report the same findings. Sampling distributions of the mean are used to generate confidence intervals for survey reports and for significance testing (testing a statistic to see whether it actually describes the population). Therefore it is important to know how variable our estimates are. The central limit theorem, by approximating these sampling distributions of the mean with normal distributions, helps us figure out specifically what the variability of these statistics is.
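A confidence interval for a mean can be sketched in a few lines once the sampling distribution is treated as normal. The example below is a minimal illustration, not a description of how any particular survey is analyzed: the function name is chosen here, and the income figures are simulated from a made-up skewed distribution rather than taken from real data.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean, treating the
    sample mean as normally distributed (reasonable for large samples)."""
    n = len(sample)
    centre = fmean(sample)
    standard_error = stdev(sample) / sqrt(n)       # sd of the sample mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return centre - z * standard_error, centre + z * standard_error

# Hypothetical data: 400 simulated household incomes from a skewed
# (lognormal) distribution. These numbers are invented for illustration.
rng = random.Random(1)
incomes = [rng.lognormvariate(10.5, 0.6) for _ in range(400)]

low, high = mean_confidence_interval(incomes)
print(f"95% confidence interval for the mean income: {low:,.0f} to {high:,.0f}")
```

A 95% interval corresponds to the "19 times out of 20" wording: if the survey were repeated many times, about 19 out of 20 such intervals would be expected to contain the population mean. For small samples the normal approximation is less reliable.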

Polls

The central limit theorem also affects how we read polls. For example, during election time we usually see polls that estimate the percentage of a population which supports a certain candidate for the presidency. Since it is not possible to survey the entire population, the pollsters have to survey only a certain proportion of it. Suppose the pollsters survey a sample of n people about their preferences. The preferences of the people in the sample can be represented as a sequence of random variables which are independent and presumably identically distributed. The number reported in the poll is the mean of this sample. By the central limit theorem, this sample mean is approximately normally distributed around the population mean, and as the number of people surveyed increases, it tends to fall closer to the population mean.
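The margin of error quoted with a poll follows from this normal approximation. The sketch below assumes a simple random sample in which each answer is recorded as 1 (supports the candidate) or 0 (does not), so the sample mean is the observed proportion; the function name and the example figures (50% support, 1,000 and 4,000 respondents) are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def poll_margin_of_error(p_hat, n, confidence=0.95):
    """Approximate margin of error for a polled proportion p_hat based on
    n independent responses, using the normal approximation."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    standard_error = sqrt(p_hat * (1 - p_hat) / n)   # sd of the sample proportion
    return z * standard_error

for n in (1_000, 4_000):
    margin = poll_margin_of_error(0.5, n)
    print(f"n = {n}: margin of error about {margin:.1%}")
```

With 1,000 respondents the margin is about 3 percentage points, and quadrupling the sample only halves it, because the standard deviation of the mean shrinks with the square root of the sample size.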

Weakness

A weakness of the central limit theorem is its premise of independent and identically distributed variables. However, versions of the theorem still hold if some of its assumptions (independent and identically distributed variables with finite mean and standard deviation) are relaxed, in particular independence. If the variables are weakly dependent on each other, the sampling distribution of the mean converges less quickly to the normal distribution, which means that our estimates are less accurate than they would have been had the random variables been independent.

The main thing to understand is that elementary mathematics is an elegant and simple starting point. Assumptions such as independent and identically distributed random variables are rarely satisfied exactly in the real world. In the same vein, idealizations in physics, such as engines that completely transform heat into work, do not exist in real life. In the real world, nonstationary processes, whose probability distributions, means, and variances shift as a function of time, are commonly found. Therefore, the central limit theorem does not strictly apply to these situations.

Conclusion

The central limit theorem is an important theorem in statistics and probability theory. Some mathematicians have called it one of the fundamental theorems of statistics. The theorem tells us how a population of means behaves as the sample size approaches infinity. This greatly helps with surveys, because it lets us compute how accurately a survey reflects the population. However, due to the assumptions that the central limit theorem makes (independent and identically distributed variables), we can only use the theorem as a starting point. Therefore, it is important for students to know how to use the central limit theorem, but also to know its limitations.

Works Cited

Adams, W.J. (1974). The Life and Times of the Central Limit Theorem. New York: Kaedmon.

Blacher, René. "Central Limit Theorem by Moments." Statistics & Probability Letters 77.17 (2007): 1647-1651. Dartmouth University. Web. 28 May 2014.

"The Central Limit Theorem." Intuitor. Web. 28 May 2014.

Clauset, Aaron. "Adapted Probability Distributions." Web. 28 May 2014.

"Distribution, Normal." Encyclopedia.com. HighBeam Research, 01 Jan. 2008. Web. 28 May 2014.

Krieger, H. "Proof of Central Limit Theorem." Harvey Mudd College, 2005. Web. 28 May 2014.

Lane, David M. "Sampling Distribution (1 of 3)." Sampling Distribution (1 of 3). Web. 28 May 2014.
