10/15/2016

Normal Approximation to Binomial - Duke University | Coursera

Normal Approximation to Binomial

Next

Have a question? Discuss this lecture in the week forums.

Interactive Transcript
Search Transcript

English

0:00
In this video, we will discuss shapes of binomial distributions, and take a look at how they change as we
tweak some of their parameters, such as the number of trials or the probability of success. We will also talk
about the fact that when the number of trials increases, the shape of the binomial actually starts looking
closer and closer to a full normal distribution. And for such situations we're going to use methods we've
learned to calculate normal probabilities to approximate binomial probabilities. Say we have a binomial
random variable with probability of success 0.25. This is what the distribution looks like when n is equal
to 10. Let's pause for a moment and carefully examine what we're seeing here. Each bar represents a
potential outcome. With ten trials, the number of successes could range anywhere from 0 to 10, and therefore
we have 11 bars here. Heights of the bars represent the likelihood of these outcomes. For example, the probability of
zero successes can be calculated as 0.75, the probability of failure, raised to the 10th power, since zero
successes basically means ten failures. This value comes out to be approximately 0.056, which is the height
of this bar. With n equals 10 and p equals 0.25, the expected number of successes is 2.5, and hence the
distribution is centered around this value. So, the binomial distribution with p equals 0.25 and n equals 10
is right skewed. Let's increase the sample size a bit, keeping p constant at 0.25. With n equals 20 we see a
change in the center of the distribution, which is expected since n times p is now different. But we also see
a change in the shape. The distribution, while still right-skewed, is looking much less skewed. Increasing
the sample size further to 50, the distribution looks even more symmetric and much smoother, and increasing
the sample size even further to 100, the distribution looks no different than the normal distribution. So
let's take a look at why this might be of interest, within the context of data from a study on Facebook usage.
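The numbers quoted above for n = 10 can be reproduced with a short sketch. This is Python using only the standard library, not code from the lecture; the helper names `binom_pmf` and `binom_skewness` are made up for illustration, and the skewness formula (1 - 2p) / sqrt(np(1 - p)) is a standard result that the lecture does not state but that matches the "less and less skewed" pattern it describes.

```python
from math import comb, sqrt

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), from the binomial formula."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_skewness(n, p):
    """Skewness of Binomial(n, p); positive values mean right skew."""
    return (1 - 2 * p) / sqrt(n * p * (1 - p))

p = 0.25
p_zero = binom_pmf(0, 10, p)   # zero successes = ten failures = 0.75 ** 10
mean_10 = 10 * p               # expected number of successes for n = 10

# Skewness for the sample sizes shown in the lecture's plots.
skews = {n: binom_skewness(n, p) for n in (10, 20, 50, 100)}

print(f"P(0 successes | n=10) = {p_zero:.3f}")
print(f"expected successes for n=10 = {mean_10}")
for n, s in skews.items():
    print(f"n={n:3d}  skewness={s:.3f}")
```

The skewness stays positive (right skew) but shrinks toward zero as n grows, which is the numerical counterpart of the plots looking more symmetric.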
2:20
A recent study found that Facebook users get more than they give. For example, 40 percent of Facebook users
in our sample made a friend request, but 63 percent received at least one request. Users in the sample
pressed the like button next to friends' content an average of 14 times, but had their content liked an
average of 20 times. Users sent nine personal messages on average but received 12. 12% of users tagged a
friend in a photo, but 35% were themselves tagged in a photo.
2:55
So what explains this phenomenon? The answer is power users: those who contribute much more content than the
typical user. I'm sure you all have a few friends like that, who are so much more active than everyone else
on your friend list. Some of the other findings from the study are that 25% of Facebook users are considered
power users, so these are the ones that give more than they get, and that the average Facebook user has 245
friends. We're looking for the probability that an average Facebook user with 245 friends has 70 or more
friends who are power users.
3:36
So what do we have here? 25% are considered power users, which means that the probability of success is 0.25.
And the average Facebook user has 245 friends, meaning that n is equal to 245. The probability we're
interested in is 70 or more power user friends, which translates to number of successes equal to or greater
than 70.
4:03
We have n equals 245 trials, a fixed number. Each trial outcome can be classified as a success or a failure,
power user or not power user. The probability of success is the same for each trial, 25%. And we're going to
assume that the trials are independent. They might not be in reality, since if you're the type of person to
have some friends who are power users, the others might be more likely to be power users as well. But again,
we're going to assume independence for the sake of this example. This is what the binomial distribution with
n equal to 245 and p equal to 0.25 looks like. And we're interested in the probability of 70 or more
successes, meaning 70 or more power-user friends among 245. What does that mean? That's 70, or 71, or 72,
all the way up to 245.
5:00
So what we're interested in is the sum of probabilities of each one of these outcomes, 70 through 245. We can
calculate each one of these probabilities using the binomial formula and add them up, but that really does
not sound like fun. This is where the resemblance between the binomial distribution and the normal
distribution comes in very handy. The blue-shaded area of interest can just as well be calculated as the area
under the smooth normal curve that closely resembles the more jagged binomial distribution. Because
calculating a shaded area under the normal curve is a much simpler task than calculating individual binomial
probabilities for all of these outcomes and adding them up, we might want to use that method. To calculate a normal
probability, we need a little more information on the parameters of the normal distribution. These can be
estimated by the mean and the standard deviation of the original binomial distribution. The mean is n times
p, so that's 245 times 0.25, 61.25, and the standard deviation is the square root of 245 times 0.25 times
0.75, which comes out to be 6.78. So among 245 friends, we expect 61.25 power users, give or take 6.78.
Given an observation, the mean, and the standard deviation, we can calculate the area under the curve via a
z score. So the z score is going to be the observation, 70, minus the mean, 61.25, divided by the standard
deviation, 6.78, which comes out to be 1.29. We can then find the probability of a z score being greater
than 1.29, since we shaded the area underneath the curve beyond the observation of interest. So we want to
look up 1.29 as a z score on our table, and in the intersection of the row and the column of interest, we
can see 0.9015. The probability of obtaining a z score greater than 1.29 is going to be one minus that
probability from the table. Why are we doing this one minus bit? Well, because the table always gives us the
percentile, or the area under the curve below the observed value, and we want to find the complement of
that, which comes out to be 0.0985. So there is a 9.85% chance that an average Facebook user, with 245
friends, has at least 70 friends who are considered power users.
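The table lookup above can be mirrored in a few lines of code. The following is a minimal Python sketch (standard library only, not the lecture's own code); the helper name `normal_upper_tail` is hypothetical, and it uses the identity P(Z > z) = 0.5 * erfc(z / sqrt(2)) in place of a printed z table, so the result differs from the table value only through rounding of z.

```python
from math import sqrt, erfc

def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * erfc(z / sqrt(2))

n, p, x = 245, 0.25, 70          # trials, success probability, observation

mean = n * p                     # 61.25 expected power-user friends
sd = sqrt(n * p * (1 - p))       # about 6.78
z = (x - mean) / sd              # about 1.29
approx = normal_upper_tail(z)    # area under the normal curve above 70

print(f"mean = {mean}, sd = {sd:.2f}, z = {z:.2f}")
print(f"normal approximation of P(X >= 70) = {approx:.4f}")
```

Because z is not rounded to two decimals here, the printed probability can differ from the table-based 0.0985 in the fourth decimal place.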
7:47
We can also directly calculate this probability using R and the dbinom function we've seen before. The first
argument in the function is the number of successes, and we're interested in everything between 70 and 245.
The second argument is the total sample size, 245, and
8:06
the third is the probability of success for each trial. So what this function here is doing is actually two
things. First, calculating the probabilities for each outcome, 70, 71, 72, all the way up to 245, and then
we wrap that with the sum function, so we're adding all of that up. And the probability comes out to be
0.113, or 11.3%, versus the 0.0985 we found before. Why are these values ever so slightly different? On one
hand, it makes sense. We called the approach the normal approximation to the binomial after all, so it's just
an approximation and not an exact result. On the other hand, if we need the exact probability, the
difference may be frustrating. Let's take a closer look at the binomial distribution and the normal
approximation to it.
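The lecture performs this exact calculation in R by summing dbinom over 70 through 245. The same sum can be sketched in Python with the standard library; this is an illustrative equivalent, not the lecture's code, and the helper names `binom_pmf` and `binom_upper_tail` are assumptions made for this example.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), from the binomial formula."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_upper_tail(k_min, n, p):
    """P(X >= k_min): add up the probabilities of k_min, k_min + 1, ..., n."""
    return sum(binom_pmf(k, n, p) for k in range(k_min, n + 1))

# 70 or more power-user friends among 245, each with probability 0.25.
exact = binom_upper_tail(70, 245, 0.25)

print(f"exact P(X >= 70) = {exact:.3f}")
```

Summing 176 individual terms is tedious by hand but trivial for a computer, which is why the exact value is a convenient check on the normal approximation.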
9:01
We can see that the red normal curve is slightly different than the bars representing the exact binomial
probabilities. It falls a little bit short. Also, under the continuous normal distribution, the probability
of exactly 70 successes is zero. So the shaded area above 70 doesn't exactly include the probability of 70
successes. A common fix to this problem is a 0.5 adjustment to the observation of interest. So we calculate
the z score using 69.5 as opposed to 70, which yields an adjusted z score of 1.22. Everything else about the
method stays the same. And the result we get, and you can confirm this using a table or a computation, is
now much closer to the exact result from the binomial distribution, 0.1112 versus 0.113. One other method
for calculating binomial probabilities is using an applet. So let's go to this website where the applet can
be found and let's take a look to see how we can calculate this probability.
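The effect of the 0.5 adjustment can be checked side by side with the unadjusted calculation. This is a minimal Python sketch under the same setup (n = 245, p = 0.25, cutoff 70), using only the standard library; the helper `normal_upper_tail` is a hypothetical name, and since z is not rounded to two decimals the adjusted result may differ from the table-based 0.1112 in the later decimal places.

```python
from math import sqrt, erfc

def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2))

n, p, x = 245, 0.25, 70
mean = n * p
sd = sqrt(n * p * (1 - p))

# Without the adjustment: area above 70, which cuts the bar for 70 in half.
z_plain = (x - mean) / sd
plain = normal_upper_tail(z_plain)

# With the 0.5 adjustment: start at 69.5 so the whole bar for 70 is included.
z_adj = (x - 0.5 - mean) / sd
adjusted = normal_upper_tail(z_adj)

print(f"z = {z_plain:.2f}, P = {plain:.4f}  (no adjustment)")
print(f"z = {z_adj:.2f}, P = {adjusted:.4f}  (0.5 adjustment)")
```

The adjusted probability is larger than the unadjusted one and lands much nearer the exact binomial value of about 0.113 quoted in the lecture.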
10:13
We're working with a binomial distribution, so that's the distribution that we're going to pick. Our number
of trials, or number of friends here, is 245. So we're going to slide n across to 245, and our probability of
success is 0.25, so we're going to slide the p to 0.25. We're looking for the area above 70, so let's take
our cutoff value to 70. And remember that we're looking for the upper tail, and we're looking for greater
than or equal to. So we want to pick our bound to be that as well, and once again we can see that same
probability, 11.3% chance of having 70 or more power user friends among a sample of 245 friends.
11:04
In the example we just presented, we plotted the binomial distribution using computation, and visually
confirmed that it looked unimodal and symmetric, roughly similar to a normal distribution. But what if we
couldn't plot the binomial distribution? What are some guidelines that we can use to determine whether the
sample size, or the number of trials, is large enough such that we can be confident in estimating the
binomial distribution using the normal? In other words, how can we tell if the shape of the binomial
distribution is going to be unimodal and symmetric, and closely follow the normal distribution?
11:42
The rule of thumb is the success-failure condition, which says that a binomial distribution with at least 10
expected successes and 10 expected failures closely follows a normal distribution. So that's n times p needs
to be greater than or equal to 10, and n times 1 minus p needs to be greater than or equal to 10. And in
cases where it does, we can approximate the binomial distribution with the normal, where the parameters of
the normal distribution are calculated as the mean and standard deviation of the binomial. We also talked
about the 0.5 adjustment to make the probabilities calculated using the normal approximation much closer to
the exact probabilities from the binomial distribution. But I encourage you to not focus on those details a
whole lot, but instead try to focus on the bigger picture. Remember that the binomial distribution with
sufficient sample size starts to look nearly normal. This is important, and we're emphasizing this here,
because when we later on get to doing inference for categorical variables with two outcomes, so those are
kind of like Bernoulli outcomes that follow a binomial distribution, we're going to make use of the fact
that the distributions start to look nearly normal, and we're going to apply methods that are based on the
normal distribution to do inference for these variables. Let's do a quick practice problem. What is the
minimum n, or the sample size, required for a binomial distribution with probability of success equaling
0.25 to closely follow a normal distribution? We know that n times p needs to be greater than or equal to
10, and n times one minus p needs to be greater than or equal to 10 as well. So for both of these
inequalities we want to solve for n, and then we're going to take the maximum of those, since that's going
to be the minimum required sample size. Well, for n times 0.25 to be greater than or equal to 10, n needs to
be greater than or equal to 40. For n times 0.75 to be greater than or equal to 10, n needs to be greater
than or equal to 13.33. So the answer is, we need at least 40 observations for a binomial distribution with
p equals 0.25 to closely follow a normal distribution.
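The success-failure condition and the practice problem can be written out as a small sketch. This is Python with the standard library only; the function names `success_failure_ok` and `min_sample_size` are hypothetical, the threshold of 10 is the one stated in the lecture, and `Fraction` is used in the second function only so that the division and rounding up are exact rather than subject to floating-point error.

```python
from fractions import Fraction
from math import ceil

def success_failure_ok(n, p, threshold=10):
    """True if expected successes n*p and expected failures n*(1-p) both reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

def min_sample_size(p, threshold=10):
    """Smallest whole n satisfying both n*p >= threshold and n*(1-p) >= threshold."""
    p = Fraction(p)                          # accepts a string such as "0.25"
    n_successes = ceil(threshold / p)        # solve n * p >= threshold for n
    n_failures = ceil(threshold / (1 - p))   # solve n * (1 - p) >= threshold for n
    return max(n_successes, n_failures)      # both conditions must hold

print(success_failure_ok(245, 0.25))   # Facebook example: 61.25 and 183.75 expected
print(success_failure_ok(10, 0.25))    # only 2.5 expected successes
print(min_sample_size("0.25"))         # the practice problem
```

Taking the maximum of the two solved bounds reflects the reasoning in the practice problem: the larger requirement is the binding one, so it sets the minimum sample size.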
