
Statistics 2

Binomial Distribution

A spinner is divided into four equal sized sections marked 1, 2, 3, 4. If the spinner is spun 6 times, how
likely is it to land on 1 on four occasions?

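One possible sequence would be 1, 1, 1, 1, 1′, 1′, where 1′ denotes a spin that does not land on 1.

The number of possible sequences is ⁶C₄ = 15.

Each sequence has probability 0.25⁴ × 0.75².

So the required probability is ⁶C₄ × 0.25⁴ × 0.75² = 0.0330 (to 4 d.p.).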
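The spinner calculation can be checked numerically. The sketch below assumes the general binomial formula P(X = r) = ⁿCᵣ × pʳ × (1 − p)ⁿ⁻ʳ; the function name `binomial_pmf` is illustrative and not part of the original notes.

```python
from math import comb


def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ B(n, p): number of sequences times the probability of each."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


# Spinner: 6 spins, probability 0.25 of landing on 1, exactly four 1s.
number_of_sequences = comb(6, 4)
spinner_probability = binomial_pmf(4, 6, 0.25)
print(number_of_sequences, round(spinner_probability, 4))
```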

A binomial distribution arises when the following conditions are met:

• an experiment is repeated a fixed number (n) of times (i.e., there is a fixed number of trials);
• the outcomes from the trials are independent of one another;
• each trial has two possible outcomes (referred to as success and failure);
• the probability of a success (p) is constant.

If the above conditions are satisfied and X is the random variable for the number of successes, then X
has a binomial distribution. We write:

X ~ B(n, p)

where n = number of trials and p = probability of success.

Interpretation of certain phrases is critical, especially when dealing with discrete distributions
(binomial and Poisson).

Phrase            Means     To use tables

Greater than 5    X > 5     1 – P(X ≤ 5)
At least 7        X ≥ 7     1 – P(X ≤ 6)
Fewer than 10     X < 10    P(X ≤ 9)
No more than 3    X ≤ 3     P(X ≤ 3)
At most 8         X ≤ 8     P(X ≤ 8)
Exactly 4         X = 4     P(X ≤ 4) – P(X ≤ 3)
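The same translations can be written as code. This is a minimal sketch in which a cumulative function plays the role of the tables; the distribution B(12, 0.3) is a hypothetical choice made only for illustration.

```python
from math import comb


def binomial_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ B(n, p); an empty sum gives 0 when x is negative."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(0, x + 1))


n, p = 12, 0.3  # hypothetical distribution, chosen only for illustration

greater_than_5 = 1 - binomial_cdf(5, n, p)                  # X > 5
at_least_7 = 1 - binomial_cdf(6, n, p)                      # X >= 7
fewer_than_10 = binomial_cdf(9, n, p)                       # X < 10
no_more_than_3 = binomial_cdf(3, n, p)                      # X <= 3
at_most_8 = binomial_cdf(8, n, p)                           # X <= 8
exactly_4 = binomial_cdf(4, n, p) - binomial_cdf(3, n, p)   # X = 4
```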

Example :

a) P(X = 3)

Using tables

b) P(X > 1)

Example:

The probability that a baby is born a boy is 0.51. A midwife delivers 10 babies. Find:

a) The probability that exactly 4 are male;

b) The probability that at least 8 are male.
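A possible way to evaluate both parts is sketched below, assuming X ~ B(10, 0.51) where X is the number of boys. The helper name `binomial_pmf` is illustrative. Part b) adds the probabilities for 8, 9 and 10 boys directly rather than using tables, since p = 0.51 is not a standard table value.

```python
from math import comb


def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


n, p = 10, 0.51

exactly_4_boys = binomial_pmf(4, n, p)                                   # part a)
at_least_8_boys = sum(binomial_pmf(r, n, p) for r in range(8, n + 1))    # part b)

print(f"{exactly_4_boys:.4f} {at_least_8_boys:.4f}")  # 0.1966 0.0621
```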

It can be shown that if X ~ B(n, p), then E[X] = np and Var[X] = np(1 − p).

Poisson Distribution

A random variable X which counts the number of times an event occurs in a given unit of space or time
will have a Poisson distribution if:

• The events occur independently of each other and at random;
• The events occur at a constant rate;
• The events occur singly (one at a time).

The notation used to indicate that a random variable X has a Poisson distribution is X ~ Po(λ).

The distribution is fully specified by a single parameter λ.

If X ~ Po(λ) then

$P(X = r) = \dfrac{e^{-\lambda}\lambda^{r}}{r!}$ for $r = 0, 1, 2, \dots$

Example:

Suppose X ~ Po( ). Find .

Example:

On average a call centre receives 1.75 phone calls per minute.

a) Assuming a Poisson distribution, find the probability that the number of phone calls received
in a randomly chosen minute is :
(i) Exactly 4;
(ii) No more than 2.

Let X = number of phone calls received in 1 minute.

Then X ~ Po(1.75).

b) Find the probability that 6 phone calls are received in a 4 minute period.

Let Y = number of phone calls received in 4 minutes.

The number of calls in 4 minutes will be on average 4 × 1.75 = 7.

So Y ~ Po(7).
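The three probabilities in this example can be evaluated with a short sketch, assuming the Poisson formula P(X = r) = e^(−λ) λ^r / r!. The function names are illustrative.

```python
from math import exp, factorial


def poisson_pmf(r: int, lam: float) -> float:
    """P(X = r) for X ~ Po(lam)."""
    return exp(-lam) * lam**r / factorial(r)


def poisson_cdf(x: int, lam: float) -> float:
    """P(X <= x) for X ~ Po(lam)."""
    return sum(poisson_pmf(r, lam) for r in range(0, x + 1))


exactly_4_calls = poisson_pmf(4, 1.75)          # a)(i)  X ~ Po(1.75)
no_more_than_2_calls = poisson_cdf(2, 1.75)     # a)(ii) X ~ Po(1.75)
six_calls_in_4_min = poisson_pmf(6, 4 * 1.75)   # b)     Y ~ Po(7)

print(f"{exactly_4_calls:.4f} {no_more_than_2_calls:.4f} {six_calls_in_4_min:.4f}")
```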

Approximating a Binomial by a Poisson

If X ~ B(n, p), then X can reasonably be approximated by a Poisson distribution with mean np if:

• n is large
• p is small

Two frequently used rules of thumb are:

• n > 50 and np < 5
• n > 50 and p < 0.1

Example:

A drug manufacturer has found 2% of the patients taking a particular drug will experience a particular
side effect.

A hospital consultant prescribes the drug to 150 of her patients.

Using a suitable approximation calculate the probability that:

a) None of her patients suffer from the side effects.
b) No more than 5 suffer from the side effects.

Let X represent the number of patients experiencing side effects.

The exact distribution of X is X ~ B(150, 0.02).

Since n is large and p is small, X ≈ Po(150 x 0.02)

So, X ≈ Po(3).

a) P(X = 0) = 0.0498 (tables)

b) P(X ≤ 5) = 0.9161 (tables)
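To see how close the approximation is, the sketch below compares the exact B(150, 0.02) probabilities with the Po(3) values. It is an illustration only; the variable names are not from the original notes.

```python
from math import comb, exp, factorial

n, p = 150, 0.02
lam = n * p  # mean of the approximating Poisson distribution, np = 3


def binomial_cdf(x: int) -> float:
    """Exact P(X <= x) for X ~ B(150, 0.02)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(x + 1))


def poisson_cdf(x: int) -> float:
    """Approximate P(X <= x) using Po(3)."""
    return sum(exp(-lam) * lam**r / factorial(r) for r in range(x + 1))


exact_none, approx_none = binomial_cdf(0), poisson_cdf(0)              # part a)
exact_at_most_5, approx_at_most_5 = binomial_cdf(5), poisson_cdf(5)    # part b)

print(f"a) exact {exact_none:.4f}, Poisson {approx_none:.4f}")
print(f"b) exact {exact_at_most_5:.4f}, Poisson {approx_at_most_5:.4f}")
```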

Continuous random variables

A probability density function (p.d.f.) is a curve that models the shape of the distribution corresponding
to a continuous random variable.

If f(x) is the p.d.f. corresponding to a continuous random variable X and if f(x) is defined for a ≤ x ≤ b,
then the following properties must hold:

1. The total area under a p.d.f. is 1.

$\int_a^b f(x)\,dx = 1$

2. The graph of the p.d.f. never dips below the x-axis.

$f(x) \ge 0$ for $a \le x \le b$

3. Probabilities correspond to the area under the curve.

$P(c \le X \le d) = \int_c^d f(x)\,dx$

Mode

Suppose that a random variable X is defined by the probability density function f(x) for a ≤ x ≤ b.

The mode of X is the value of x that produces the largest value for f(x) in the interval a ≤ x ≤ b.

A sketch of the probability density function can be very helpful when determining the mode.

Example:

A random variable X has p.d.f. f(x), where

(Margin note: differentiation could be used to find the mode here.)

Find the mode.

The mode can be found using differentiation:

To find the turning point we solve f′(x) = 0.

or

If f″(x) < 0, the point is a maximum.

So the mode is

Cumulative distribution functions

The c.d.f. is found by integrating the p.d.f.

Example:

A random variable X has a p.d.f. f(x), where $f(x) = \tfrac{1}{6}(x^3 + 1)$ for $0 \le x \le 2$ (and f(x) = 0 otherwise).

Find the c.d.f. and find P(X < 1).

$$F(x) = \begin{cases} 0 & x < 0 \\ \tfrac{1}{24}x^4 + \tfrac{1}{6}x & 0 \le x \le 2 \\ 1 & x > 2 \end{cases}$$
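For a continuous random variable P(X < 1) = F(1), so the required probability follows directly from the c.d.f. The sketch below uses exact fractions; the function name `cdf` is illustrative.

```python
from fractions import Fraction


def cdf(x) -> Fraction:
    """F(x) from the example: 0 below 0, x^4/24 + x/6 on [0, 2], 1 above 2."""
    x = Fraction(x)
    if x < 0:
        return Fraction(0)
    if x > 2:
        return Fraction(1)
    return x**4 / 24 + x / 6


prob_below_1 = cdf(1)  # P(X < 1) = F(1)
print(prob_below_1, float(prob_below_1))
```

As a sanity check, F(0) = 0 and F(2) = 1, as required of a c.d.f. on this interval.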

Median and Quartiles

The median of a random variable X is defined to be the value m such that

F(m) = 0.5

where F is the cumulative distribution function of X.

Likewise the lower quartile is the solution to the equation

F(x) = 0.25

and the upper quartile is the solution to

F(x) = 0.75

Example :

A random variable X is defined by the cumulative distribution function:

$$F(x) = \begin{cases} 0 & x < 2 \\ \tfrac{1}{24}\left(x^2 + x - 6\right) & 2 \le x \le 5 \\ 1 & x > 5 \end{cases}$$

a) Calculate and sketch the probability density function.
b) Find the median value.
c) Work out

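The p.d.f. is found by differentiating the c.d.f.:

$f(x) = \tfrac{1}{24}(2x + 1)$ for $2 \le x \le 5$ (and f(x) = 0 otherwise).

Sketch of f(x): [sketch not reproduced; the graph is a straight line rising from f(2) = 5/24 to f(5) = 11/24]

Median

F(m) = 0.5, so $\tfrac{1}{24}(m^2 + m - 6) = 0.5$.

Therefore $m^2 + m - 18 = 0$, giving

m = 3.772 or m = −4.772

m must be 3.772 since it lies in the interval [2,5].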
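The median can be found by solving F(m) = 0.5 with the quadratic formula and keeping the root inside [2, 5]. This is a sketch only; the names are illustrative.

```python
from math import sqrt


def cdf(x: float) -> float:
    """F(x) from the example: (x^2 + x - 6)/24 on [2, 5]."""
    if x < 2:
        return 0.0
    if x > 5:
        return 1.0
    return (x * x + x - 6) / 24


# F(m) = 0.5  =>  m^2 + m - 6 = 12  =>  m^2 + m - 18 = 0
a, b, c = 1.0, 1.0, -18.0
root_of_discriminant = sqrt(b * b - 4 * a * c)
roots = [(-b - root_of_discriminant) / (2 * a), (-b + root_of_discriminant) / (2 * a)]

# Only the root inside the interval [2, 5] is a valid median.
median = next(r for r in roots if 2 <= r <= 5)
print(round(median, 3))
```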

Expectation

If X is a continuous random variable defined by the probability density function f(x) over the domain
a ≤ x ≤ b, then the mean or expectation of X is given by

$E[X] = \int_a^b x\,f(x)\,dx$

Note: If the p.d.f. is symmetrical, then the expected value of X will be the value corresponding to the line
of symmetry.

Example :

A random variable X is defined by the probability density function

Calculate E[X] and E[1/X].

Variance

If X is a continuous random variable defined by the probability density function f(x) over the domain
a ≤ x ≤ b, then the variance of X is given by

$\mathrm{Var}[X] = E[X^2] - (E[X])^2$

where

$E[X^2] = \int_a^b x^2 f(x)\,dx$

Example :

A continuous random variable Y has a probability density function where

Calculate the value of Var[Y].

Sketch of

The p.d.f. is symmetrical. Therefore .

Examination-style question :

The mass, X kg, of luggage taken on board an aircraft by a passenger can be modelled by the probability
density function

a) Sketch the probability density function and find the value of k.
b) Verify that the median weight of luggage is about 20.586 kg.
c) Find the mean and variance of X.

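To find k we use the fact that the total area under the p.d.f. is 1.

To verify that the median is about 20.586, we need to check that P(X ≤ 20.586) ≈ 0.5.

With E[X] = 20 and E[X²] = 428.5714:

Therefore Var[X] = 428.5714 − 20² = 28.6 (to 3 s.f.)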
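The density for this question did not survive in these notes. The sketch below assumes f(x) = k x³(30 − x) for 0 ≤ x ≤ 30, a hypothetical choice that happens to be consistent with the figures quoted above (mean 20, E[X²] = 428.5714, median about 20.586). It should not be read as the original question. It shows the three steps: normalise to find k, integrate up to 20.586 to check the median, then compute the mean and variance.

```python
from fractions import Fraction

UPPER = 30  # assumed upper limit of the luggage mass in kg (not given in the notes)


def integral_of_moment(power: int, upper) -> Fraction:
    """Exact integral of x^power * x^3 * (30 - x) dx from 0 to `upper` (assumed density shape)."""
    upper = Fraction(upper)
    n = power + 3
    return UPPER * upper ** (n + 1) / (n + 1) - upper ** (n + 2) / (n + 2)


k = 1 / integral_of_moment(0, UPPER)               # total area must equal 1
mean = k * integral_of_moment(1, UPPER)            # E[X]
mean_square = k * integral_of_moment(2, UPPER)     # E[X^2]
variance = mean_square - mean**2                   # Var[X] = E[X^2] - (E[X])^2

# Median check: the area to the left of 20.586 should be close to 0.5.
area_below_median = float(k * integral_of_moment(0, Fraction("20.586")))

print(k, mean, float(mean_square), float(variance), round(area_below_median, 4))
```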

Continuous Uniform Distribution

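A random variable X is said to have a continuous uniform distribution (or rectangular distribution) over
the interval [a,b] if its probability density function has the form:

$f(x) = \dfrac{1}{b-a}$ for $a \le x \le b$ (and f(x) = 0 otherwise).

The graph of the p.d.f. is as follows: [graph not reproduced; a rectangle of height 1/(b − a) between x = a and x = b]

If X has a continuous uniform distribution over the interval [a,b], then

$E[X] = \dfrac{a+b}{2}$ and $\mathrm{Var}[X] = \dfrac{(b-a)^2}{12}$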

Example :

A random variable Y has a continuous uniform distribution in the interval [2,8]. Find .

Examination-style question:

A random variable X is given by the probability density function f(x), where f(x) = 1/10 for 5 < x < 15 (and f(x) = 0 otherwise).

Find:

a) E[X] and Var[X]
b)

X has a uniform distribution over the interval (5,15).

The p.d.f. for X is shown on the diagram below.

The probability we require is shaded.

So,

If X has a uniform distribution over the interval (a,b) then the cumulative distribution function of X is:

$$F(x) = P(X \le x) = \begin{cases} 0 & x < a \\ \dfrac{x-a}{b-a} & a \le x \le b \\ 1 & x > b \end{cases}$$
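The uniform results can be collected into three small functions. The sketch below applies them to the interval (5, 15) from the examination-style question. The interval 7 to 12 used for the probability is a hypothetical choice, since the probability asked for in part b) is not legible in these notes.

```python
def uniform_mean(a: float, b: float) -> float:
    """E[X] = (a + b) / 2 for a continuous uniform distribution on [a, b]."""
    return (a + b) / 2


def uniform_variance(a: float, b: float) -> float:
    """Var[X] = (b - a)^2 / 12."""
    return (b - a) ** 2 / 12


def uniform_cdf(x: float, a: float, b: float) -> float:
    """F(x) = 0 below a, (x - a)/(b - a) on [a, b], 1 above b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


a, b = 5, 15
mean = uniform_mean(a, b)
variance = uniform_variance(a, b)
# Hypothetical probability: P(7 <= X <= 12) as a difference of c.d.f. values.
prob_7_to_12 = uniform_cdf(12, a, b) - uniform_cdf(7, a, b)

print(mean, round(variance, 3), round(prob_7_to_12, 3))
```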

Approximating a binomial using a normal

Calculating probabilities using the binomial distribution can be cumbersome if the number of trials (n) is
large.

Consider this example:

10% of people in the United Kingdom are left handed.

A school has 1200 students. Find the probability that more than 140 of them are left handed.

Let the number of left-handed people in the school be X.

Then X ~ B[1200, 0.1].

The required probability is

P(X > 140) = P(X = 141) + P(X = 142) + … + P(X = 1200)

As no tables exist for this distribution, calculating this probability by hand would be a mammoth task.

A further problem arises if you attempt to work out one of these probabilities, for example P(X = 141):

P(X = 141) = ¹²⁰⁰C₁₄₁ × 0.1¹⁴¹ × 0.9¹⁰⁵⁹

Many calculators cannot calculate the value of this coefficient because it is too large.

One way forward is to approximate the binomial distribution using a normal distribution.

If X ~ B(n,p) where n is large and p is not too close to 0 or 1, then X can be reasonably approximated
using a normal distribution:

X ≈ N[np, npq], where q = 1 − p.

There is a widely used rule of thumb that can be applied to tell you when the approximation will be
reasonable:

A binomial distribution can be approximated reasonably well by a normal distribution
provided that np > 5 and nq > 5.

Continuity Correction

A continuity correction must be applied when approximating a discrete distribution (such as the binomial) by
a continuous distribution (such as the normal).

Exact distribution: B(n,p) Approximate distribution: N[np, npq]

Introductory example:

10% of people in the United Kingdom are left handed. A school has 1200 students. Find the probability
that more than 140 of them are left handed.

Let the number of left-handed people in the school be X.

Then X ~ B[1200, 0.1].

Since np = 120 > 5 and nq = 1080 > 5 we can approximate the distribution using a normal distribution:

X ≈ N[120, 108]

So P(X > 140) → P(X ≥ 140.5) (Using Continuity Correction)

Standardize: z = (140.5 − 120)/√108 = 1.973

Therefore P(X ≥ 140.5) = P(Z ≥ 1.973)

= 1 − P(Z ≤ 1.973) = 1 − 0.9758

= 0.0242
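With software the exact binomial sum is feasible, so the approximation can be compared against it. The sketch below is one possible approach, not the method of these notes: it evaluates each binomial term in log space with `math.lgamma` so that the very large coefficient never has to be formed, and uses `statistics.NormalDist` for the normal probability.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist


def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ B(n, p), computed in log space to avoid overflow."""
    log_coefficient = lgamma(n + 1) - lgamma(r + 1) - lgamma(n - r + 1)
    return exp(log_coefficient + r * log(p) + (n - r) * log(1 - p))


n, p = 1200, 0.1

# Exact: P(X > 140) = P(X = 141) + ... + P(X = 1200)
exact = sum(binomial_pmf(r, n, p) for r in range(141, n + 1))

# Normal approximation N[np, npq] with continuity correction: P(X >= 140.5)
mean, variance = n * p, n * p * (1 - p)
z = (140.5 - mean) / sqrt(variance)
approx = 1 - NormalDist().cdf(z)

print(f"z = {z:.3f}, normal approximation = {approx:.4f}, exact = {exact:.4f}")
```

The approximate value can differ slightly from the table-based answer because tables round z and the cumulative probability.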

Examination-style question:

A sweet manufacturer makes sweets in 5 colours. 25% of the sweets it produces are red.

The company sells its sweets in tubes and in bags. There are 10 sweets in a tube and 28 sweets in a bag.
It can be assumed that the sweets are of random colours.

a) Find the probability that there are more than 4 red sweets in a tube.
b) Using a suitable approximation, find the probability that a bag of sweets contains between 5 and
12 red sweets (inclusive).

Let the number of red sweets in a tube be X.

Then the exact distribution for X is X ~ B[10, 0.25].

P (X > 4) = 1 – P(X ≤ 4)

= 1 – 0.9219

= 0.0781

Let the number of red sweets in a bag be Y.

Then the exact distribution for Y is Y ~ B[28, 0.25].

The distribution can be approximated by a normal since np = 7 and nq = 21 (both greater than 5) :

Y ≈ N[7, 5.25]

P (5 ≤ Y ≤ 12) → P(4.5 ≤ Y ≤ 12.5) (Using Continuity Correction)

Standardize: z₁ = (4.5 − 7)/√5.25 = −1.091 and z₂ = (12.5 − 7)/√5.25 = 2.400

P(−1.091 ≤ Z ≤ 2.400)

= P(Z ≤ 2.400) − P(Z ≤ −1.091)

= P(Z ≤ 2.400) − (1 − P(Z ≤ 1.091))

= 0.9918 − (1 − 0.8623) = 0.8541

Approximating the Poisson using a normal

If X ~ Po(λ) and λ is large, then X is approximately normally distributed:

X ≈ N[λ, λ]

Recall that the mean and variance of a Poisson distribution are equal.

There is a widely used rule of thumb that can be applied to tell you when the approximation will be
reasonable:

A Poisson distribution can be approximated reasonably well by a normal distribution
provided λ is sufficiently large (λ > 10 is a commonly quoted threshold).

Note: A continuity correction is required because we are approximating a discrete distribution using a
continuous one.

Examination-style question:

An electrical retailer has estimated that he sells a mean number of 5 digital radios each week.

a) Assuming that the number of digital radios sold on any week can be modelled by a Poisson
distribution find the probability that the retailer sells fewer than 2 digital radios on a randomly
chosen week.
b) Use a suitable approximation to decide how many digital radios he should have in order for him
to be at least 90% certain of being able to meet the demand for radios over the next 5 weeks.

Let X represent the number of digital radios sold in a week.

So X ~ Po(5).

P( X < 2) = P( X ≤ 1)

= 0.0404

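Let Y represent the number of digital radios sold in a period of 5 weeks.

Then Y ~ Po(25), which can be approximated by Y ≈ N[25, 25].

We need the smallest stock level y such that P(Y ≤ y) ≥ 0.9.

P(Y ≤ y) → P(Y ≤ y + 0.5) (Using Continuity Correction)

The upper 10% point of the standard normal distribution is 1.282.

So, (y + 0.5 − 25)/5 ≥ 1.282, which gives y ≥ 30.91.

So the retailer would need to keep 31 digital radios in stock.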
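The stock calculation can be reproduced as follows. The sketch assumes Y ~ Po(25) is approximated by N[25, 25] with a continuity correction, and takes the standard normal percentage point from `statistics.NormalDist` instead of tables.

```python
from math import ceil, sqrt
from statistics import NormalDist

weekly_mean, weeks, certainty = 5, 5, 0.90

lam = weekly_mean * weeks              # Y ~ Po(25), approximated by N[25, 25]
z = NormalDist().inv_cdf(certainty)    # upper 10% point, about 1.282

# Continuity correction: P(Y <= y) is approximated by P(N <= y + 0.5),
# so we need (y + 0.5 - lam) / sqrt(lam) >= z.
threshold = lam + z * sqrt(lam) - 0.5
stock_needed = ceil(threshold)         # smallest whole number of radios meeting the target

print(round(z, 3), round(threshold, 2), stock_needed)
```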

Populations and samples

Population – is the set of all individuals or objects that we wish to study.

Census – is an investigation in which information is obtained from every member of the population.

Sample – is a selection of individual members or items from a population.

Sampling frame – is a list of all members of the population.

Sampling unit – is an individual member of a population.

Statistic – is a quantity calculated solely from the observations in a sample.

Examples:

A head teacher is interested in finding out how long her sixth form students spend in part-time
employment per week.

Population – is the set of all sixth form students in her school.

Sampling frame - would be the registers of sixth form tutor groups.

Carrying out a census of the entire population is usually not feasible or sensible.

Advantages of taking a census are:

• Every single member of the population is used
• Unbiased
• Gives an accurate answer

Disadvantages of taking a census are:

• Money
• Time
• Resources

Instead of surveying the whole population, information can instead be obtained from a sample.

The sampling process should be undertaken carefully to ensure that the sample is representative of the
entire population.

Bias can occur if one section of the population is over/under represented.

Random sample – is a sample in which every member of the population has the same probability of being chosen.

A simple random sample of size n consists of the observations X₁, X₂, …, Xₙ from a population, where the Xᵢ

• are independent random variables;
• have the same distribution as the population.

Example :

A large bag of coins contains 1p, 2p and 5p coins in the ratio 2:1:3.

a) Find the mean, μ, and the variance, σ², for the population of coins.
b) A random sample of 3 coins is taken from this population. List all the possible outcomes.

Let X be the value of the coin chosen.

Distribution of the population:

x            1      2      5
P(X = x)     1/3    1/6    1/2

The possible outcomes and the mean:

(1,1,1) → 1

(1,1,2) (1,2,1) (2,1,1) → 4/3

(2,2,1) (2,1,2) (1,2,2) → 5/3

(2,2,2) → 2

(1,1,5) (1,5,1) (5,1,1) → 7/3

(5,5,1) (5,1,5) (1,5,5) → 11/3

(5,5,5) → 5

(2,2,5) (2,5,2) (5,2,2) → 3

(5,5,2) (5,2,5) (2,5,5) → 4

(1,2,5) (1,5,2) (2,1,5) (2,5,1) (5,1,2) (5,2,1) → 8/3

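Working out the probabilities, e.g.

P(sample mean = 4/3) = 3 × (1/3 × 1/3 × 1/6) = 1/18

(Times by 3, since the outcome (1,1,2) can occur in 3 different orders.)

The sampling distribution of the sample mean is:

Sample mean    1      4/3    5/3    2       7/3    8/3    3      11/3   4      5
Probability    1/27   1/18   1/36   1/216   1/6    1/6    1/24   1/4    1/8    1/8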
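The population mean and variance, and the whole sampling distribution, can be generated by enumerating all 27 ordered samples. The sketch below assumes the bag is large enough that the three draws are independent with probabilities 2/6, 1/6 and 3/6. The variable names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Coins in the ratio 2 : 1 : 3 give these probabilities for 1p, 2p and 5p.
population = {1: Fraction(2, 6), 2: Fraction(1, 6), 5: Fraction(3, 6)}

mu = sum(x * pr for x, pr in population.items())                      # population mean
sigma_sq = sum(x * x * pr for x, pr in population.items()) - mu**2    # population variance

# Add up the probability of every ordered sample that gives each sample mean.
sampling_distribution = defaultdict(Fraction)
for sample in product(population, repeat=3):
    probability = Fraction(1)
    for coin in sample:
        probability *= population[coin]
    sampling_distribution[Fraction(sum(sample), 3)] += probability

print("mean:", mu, "variance:", sigma_sq)
for sample_mean in sorted(sampling_distribution):
    print(sample_mean, sampling_distribution[sample_mean])
```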

Hypothesis Testing

Null Hypothesis (H0) – is the hypothesis we assume to be correct unless proved otherwise.

Alternative Hypothesis (H1) – is the hypothesis that describes how the parameter has changed if the assumption in H0 is wrong.

Steps required to answer Hypothesis Test questions in an examination are:

Step 1: Write out H0 and H1 in mathematical terms.

Step 2: State the significance level – if none is mentioned in the question, it is usual
to choose 5%.

Step 3: State the distribution, assuming the null hypothesis to be true.

Step 4: Calculate the probability (under H0) of obtaining results as extreme as those
collected.

Step 5: Compare the probability with the significance level and make conclusions –
can H0 be rejected or not? Interpret your results in context.

Hypothesis Testing for the Binomial Distribution

Lower One Tail Test

Example:

Is a normal six sided die fair when 1 six is thrown in 24 throws?

Test at the 5% level of significance.

Let X be the random variable "the number of 6's thrown in 24 throws".

Therefore X ~ B[24, 1/6]

H0: p = 1/6

H1: p < 1/6

Reject H0 if: P(X ≤ 1) ≤ 0.05

P(X ≤ 1) = 0.0730

0.0730 > 0.05

Accept H0: there is insufficient evidence at the 5% level to suggest that the die is biased against sixes.

Upper One Tail Test

Example:

In Luigi's restaurant, on average 1 in 10 people order a bottle of Chardonnay. Out of a sample of 50, 11
chose Chardonnay. Has the drink become more popular?

Test at the 1% level of significance.

Let X be the random variable the number of people ordering a bottle of Chardonnay in a sample of 50.

X ~ B[50, 0.1]

H0: p = 0.1

H1: p > 0.1

Reject H0 if: P(X ≥ 11) ≤ 0.01

P(X ≥ 11) = 1 − P(X ≤ 10)

= 1 − 0.9906 (Using tables)

= 0.0094 < 0.01

Reject H0: there is evidence at the 1% level of significance to suggest that the proportion of people
ordering Chardonnay has increased.
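Both one-tailed tests follow the same pattern: compute the tail probability under H0 and compare it with the significance level. A minimal sketch, with an illustrative helper `binomial_cdf` standing in for the tables:

```python
from math import comb


def binomial_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(0, x + 1))


# Lower tail: 1 six in 24 throws, H0: p = 1/6, H1: p < 1/6, 5% level.
p_lower = binomial_cdf(1, 24, 1 / 6)          # P(X <= 1)
reject_die = p_lower <= 0.05

# Upper tail: 11 orders out of 50, H0: p = 0.1, H1: p > 0.1, 1% level.
p_upper = 1 - binomial_cdf(10, 50, 0.1)       # P(X >= 11)
reject_chardonnay = p_upper <= 0.01

print(f"die: {p_lower:.4f}, reject H0: {reject_die}")
print(f"Chardonnay: {p_upper:.4f}, reject H0: {reject_chardonnay}")
```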

Critical Values Method

Example 1:

A manufacturer claims that 2 out of 5 people prefer Soapy Suds washing powder over any other brand.
For a sample of 25 people, only 4 people are found to prefer Soapy Suds. Is the manufacturer’s claim
justified?

Test at the 5% level of significance.

Let X be the random variable "the number of people who prefer Soapy Suds".

X ~ B[25, 0.4]

H0: p = 0.4

H1: p < 0.4

Reject H0 if: P(X ≤ xc) ≤ 0.05; xc = critical value

From tables: P(X ≤ 5) = 0.0294 and P(X ≤ 6) = 0.0736, so xc = 5

Since x = 4 < critical value, the observed value lies in the critical region.

Reject H0: there is evidence at the 5% level of significance to suggest that the manufacturer's claim is
false and that the proportion is less than 2 in 5.

Example 2:

A particular drug has a 1 in 4 chance of curing a certain disease. A new drug is developed to cure the
disease. How many people would need to be cured in a sample of 20 if the new drug was to be deemed
more successful at curing the disease than the old drug to obtain a significant result at the 5% level?

Let X be the random variable "the number of people who are cured by the new drug".

X ~ B[20, 0.25]

H0: p = 0.25

H1: p > 0.25

Reject H0 if: P(X ≥ xc) ≤ 0.05; xc = critical value

1 − P(X ≤ xc − 1) ≤ 0.05

P(X ≤ xc − 1) ≥ 0.95

From tables: P(X ≤ 7) = 0.8982 and P(X ≤ 8) = 0.9591, so

xc − 1 ≥ 8

xc ≥ 9

So 9 or more people are required to be cured to obtain significant evidence that the new drug is better
at curing the disease.
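Critical values can also be found by searching the cumulative probabilities directly instead of reading tables. The sketch below uses illustrative function names and applies the search to both examples. For Example 2 the computed tail probabilities are P(X ≥ 9) ≈ 0.041 and P(X ≥ 10) ≈ 0.014.

```python
from math import comb


def binomial_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ B(n, p); an empty sum gives 0 when x is negative."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(0, x + 1))


def lower_critical_value(n: int, p: float, alpha: float):
    """Largest c with P(X <= c) <= alpha, or None if even P(X <= 0) exceeds alpha."""
    best = None
    for c in range(0, n + 1):
        if binomial_cdf(c, n, p) <= alpha:
            best = c
        else:
            break
    return best


def upper_critical_value(n: int, p: float, alpha: float):
    """Smallest c with P(X >= c) = 1 - P(X <= c - 1) <= alpha, or None if there is none."""
    for c in range(0, n + 1):
        if 1 - binomial_cdf(c - 1, n, p) <= alpha:
            return c
    return None


soapy_suds_critical = lower_critical_value(25, 0.4, 0.05)   # Example 1
new_drug_critical = upper_critical_value(20, 0.25, 0.05)    # Example 2

print(soapy_suds_critical, new_drug_critical)
```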

Two Tail Test

Example:

A person suggests that the proportion, p of red cars on a road is 0.3. In a random sample of 15 cars it is
desired to test the null hypothesis p = 0.3 against p ≠ 0.3 at a nominal significance level of 10%.

Determine the appropriate acceptance region and the corresponding actual significance level.

Let X be the random variable the number of red cars in a sample of 15.

X ~ B[15, 0.3]

H0: p = 0.3

H1: p ≠ 0.3

5% level of significance for each tail.

Reject H0 if: P(X ≤ xl) ≤ 0.05; xl = lower critical value

From tables: xl = 1

Reject H0 if: P(X ≥ xu) ≤ 0.05; xu = upper critical value

1 − P(X ≤ xu − 1) ≤ 0.05

P(X ≤ xu − 1) ≥ 0.95

From tables: xu − 1 = 7

Therefore xu = 8

H0 rejection region: X ≤ 1 or X ≥ 8, so the acceptance region is 2 ≤ X ≤ 7.

Actual significance level: P(X ≤ 1) + P(X ≥ 8)

= 0.0353 + 0.0500 = 0.0853 = 8.53%

Hypothesis Testing for the Poisson Distribution

Lower One Tail Test

Example:

The number of car accidents along a certain stretch of road occurred at an average rate of 5 per week.
After the introduction of speed cameras the number of accidents in one week is 2. Assuming that the
number of accidents can be modelled as a Poisson distribution, test at the 5% nominal significance level
whether there has been a reduction in the number of accidents.

Let X be the random variable "the number of accidents in a week".

X ~ Po[5]

H0: λ = 5

H1: λ < 5

Reject H0 if: P(X ≤ xl) ≤ 0.05; xl = lower critical value

From tables: xl = 1

Since x = 2 > lower critical value, the observed value is not in the critical region.

Accept H0: there is insufficient evidence at the nominal 5% significance level to support the claim that
the number of accidents has reduced.

Upper One Tail Test

Example:

A shop sells a particular make of radio at a rate of 4 per week on average. The shop places an advert in
the local paper in the hope of raising sales. In the week that the advert was placed the number of sales
was 10. Is there significant evidence that the sales have increased? Test at the 5% nominal level of
significance.

Let X be the random variable "the number of radios sold per week".

X ~ Po[4]

H0: λ = 4

H1: λ > 4

Reject H0 if: P(X ≥ xu) ≤ 0.05; xu = upper critical value

1 − P(X ≤ xu − 1) ≤ 0.05

P(X ≤ xu − 1) ≥ 0.95

From tables:

xu − 1 = 8

xu = 9

x = 10 > upper critical value, so the observed value lies in the critical region.

Reject H0: there is evidence at the 5% level of significance to suggest that the number of radios sold
has increased.

Two Tail Test

Example:

A machine produces glass sheets. The number of bubbles seen per square metre in the glass sheet
follows a Poisson distribution with mean 3. Find the lower and upper critical values for a nominal 10%
significance level test for the mean not equal to 3 and the actual significance level of the test.

Let X be the random variable "the number of bubbles per m²".

X ~ Po[3]

H0: λ = 3

H1: λ ≠ 3

5% level of significance for each tail.

Reject H0 if: P(X ≤ xl) ≤ 0.05; xl = lower critical value

From tables: xl = 0

Reject H0 if: P(X ≥ xu) ≤ 0.05; xu = upper critical value

1 − P(X ≤ xu − 1) ≤ 0.05

P(X ≤ xu − 1) ≥ 0.95

From tables: xu − 1 = 6

xu = 7

Actual significance level: P(X ≤ 0) + P(X ≥ 7)

= 0.0498 + (1 − 0.9665) = 0.0833 = 8.33%
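The two critical values and the actual significance level for the glass-sheet example can be found with a direct search over Poisson cumulative probabilities. This is a sketch with illustrative names, assuming 5% is allowed in each tail.

```python
from math import exp, factorial


def poisson_cdf(x: int, lam: float) -> float:
    """P(X <= x) for X ~ Po(lam); an empty sum gives 0 when x is negative."""
    return sum(exp(-lam) * lam**r / factorial(r) for r in range(0, x + 1))


lam, tail = 3, 0.05

# Lower critical value: largest x with P(X <= x) <= 0.05.
# (It would stay None if even P(X <= 0) exceeded 0.05; that does not happen for lam = 3.)
lower = None
x = 0
while poisson_cdf(x, lam) <= tail:
    lower = x
    x += 1

# Upper critical value: smallest x with P(X >= x) = 1 - P(X <= x - 1) <= 0.05.
upper = 0
while 1 - poisson_cdf(upper - 1, lam) > tail:
    upper += 1

# Actual significance level: total probability of the two-tailed critical region.
actual_significance = poisson_cdf(lower, lam) + (1 - poisson_cdf(upper - 1, lam))

print(lower, upper, round(actual_significance, 4))
```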
