12 Confint

PHP 2510
Central limit theorem, confidence intervals
PHP 2510 October 20, 2008
Distribution of the sample mean

Case 1: Population distribution is normal
For an individual in the population, $X_i \sim N(\mu, \sigma^2)$ for
$i = 1, 2, \ldots, n$. Then, for a sample of size n, the sample mean
also has a normal distribution:
$\bar{X} \sim N(\mu, \sigma^2/n)$
Case 2: Population distribution is not normal, e.g. Poisson or
Binomial. Then, for large samples, the sample mean is approximately
normally distributed, with mean equal to E(X) and variance
equal to var(X)/n
This is known as the central limit theorem
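The mean and variance quoted above can be checked with a short derivation. This sketch assumes the observations are independent draws from the same distribution; independence is not stated on the slide and is an assumption here.

```latex
\begin{align*}
E(\bar{X}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = E(X), \\
\operatorname{var}(\bar{X}) &= \frac{1}{n^{2}}\sum_{i=1}^{n} \operatorname{var}(X_i)
            = \frac{n\,\operatorname{var}(X)}{n^{2}}
            = \frac{\operatorname{var}(X)}{n}.
\end{align*}
```

The derivation gives only the mean and variance; the normal shape of the distribution is the part supplied by the central limit theorem.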

Central Limit Theorem

Characterizes the distribution of $\bar{X}$ in large samples
Suppose a sample $X_1, \ldots, X_n$ comes from a distribution with
mean E(X) and variance var(X). This can be almost any
distribution (binomial, Poisson, etc.)
When n is large, the sample mean $\bar{X}$ is approximately normally distributed. Its
mean is equal to the population mean, and its variance is
var(X)/n. We can write
$\bar{X} \sim N\left(E(X), \frac{\operatorname{var}(X)}{n}\right)$
Example 1: Throw a fair coin.

Use the sample mean to estimate the probability of getting a head.
Let X be the outcome of throwing a fair coin once (X = 1 for a head, X = 0 for a tail).
I Throw a fair coin n times. Let $X_1, X_2, \ldots, X_n$ be the outcomes.
II Compute the sample mean $\bar{X} = \sum_{i=1}^{n} X_i / n$.
III The CLT says $\bar{X}$ is approximately normally distributed for large n. To illustrate
this, let's repeat Steps I and II 1000 times. Plot $\bar{X}$ versus its
relative frequency.
[Figure: relative frequency of the sample mean in four panels (n = 5, n = 15, n = 40, n = 100). Horizontal axis: sample mean; vertical axis: relative frequency.]
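Steps I to III of the coin example can be sketched in Python. This is a minimal illustration, not the code used to produce the slides: the function names, the fixed seed, and the text-based bar display are all assumptions made for the sketch.

```python
import random
from collections import Counter


def sample_mean_of_flips(n, rng):
    """Steps I and II: throw a fair coin n times (1 = head, 0 = tail), return the mean."""
    return sum(rng.randint(0, 1) for _ in range(n)) / n


def relative_frequencies(n, reps=1000, seed=2008):
    """Step III: repeat Steps I and II `reps` times.

    Returns a dict mapping each observed sample mean to its relative frequency.
    The seed is fixed only to make the sketch reproducible.
    """
    rng = random.Random(seed)
    counts = Counter(sample_mean_of_flips(n, rng) for _ in range(reps))
    return {mean: count / reps for mean, count in sorted(counts.items())}


if __name__ == "__main__":
    for n in (5, 15, 40, 100):
        print(f"n={n}")
        for mean, freq in relative_frequencies(n).items():
            # One '#' per percentage point of relative frequency.
            print(f"  {mean:5.3f} {'#' * round(freq * 100)}")
```

As n grows, the printed bars should concentrate around 0.5 and take on a roughly bell-shaped outline.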
Example 2: Throw a fair die.

Use the sample mean to estimate the probability of getting a six. Let
X be the outcome of throwing a fair die once (X = 1 for a six, X = 0 otherwise).
I Throw a fair die n times. Let $X_1, X_2, \ldots, X_n$ be the outcomes.
II Compute the sample mean $\bar{X} = \sum_{i=1}^{n} X_i / n$.
III The CLT says $\bar{X}$ is approximately normally distributed for large n. To illustrate
this, let's repeat Steps I and II 1000 times. Plot $\bar{X}$ versus its
relative frequency.
[Figure: relative frequency of the sample mean in four panels (n = 5, n = 15, n = 40, n = 100). Horizontal axis: sample mean; vertical axis: relative frequency.]
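For the die example, one way to check the CLT numerically is to compare the simulated mean and standard deviation of the sample mean with the theoretical values p and sqrt(p(1 - p)/n), where p = 1/6. The sketch below assumes each throw is coded 1 for a six and 0 otherwise; the function names and seed are hypothetical.

```python
import math
import random
import statistics

P_SIX = 1 / 6  # probability of a six on a fair die


def simulate_sample_means(n, reps=1000, seed=2510):
    """Return `reps` sample means, each from n throws coded 1 = six, 0 = otherwise."""
    rng = random.Random(seed)  # fixed seed: an assumption, for reproducibility
    return [
        sum(rng.randint(1, 6) == 6 for _ in range(n)) / n
        for _ in range(reps)
    ]


def clt_sd(n):
    """Standard deviation of the sample mean implied by the CLT: sqrt(p(1-p)/n)."""
    return math.sqrt(P_SIX * (1 - P_SIX) / n)


if __name__ == "__main__":
    print(" n   mean of means   SD of means   CLT SD")
    for n in (5, 15, 40, 100):
        means = simulate_sample_means(n)
        print(f"{n:3d}   {statistics.mean(means):13.4f}   "
              f"{statistics.stdev(means):11.4f}   {clt_sd(n):6.4f}")
```

The simulated and theoretical standard deviations should agree closely for every n; what changes with n is how close the shape of the distribution is to a normal curve.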
Confidence intervals
Confidence intervals can be used to convey uncertainty about the
estimate of any parameter.
A confidence interval consists of two random variables (a lower
and an upper bound) and covers the true mean with some pre-specified
probability.
Because the bounds are computed from the sample, they are themselves random
variables.
What question does a CI answer?

Example 1: Incidence of pre-eclampsia.
A random sample of 1249 women is selected and followed through
pregnancy. 250 get pre-eclampsia.
Estimate the incidence by the sample mean: 250/1249 = 20%
We would like to find an interval that contains, with 95%
probability, the true incidence of pre-eclampsia.
Example 2: Hospitalization rate of HIV-infected women during a
6-month period.
A sample of 787 women is followed for 6 months, resulting in the
following data.
We would like to construct an interval that contains the true
rate of hospitalization with 90% probability.
    numhosp |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        508       64.55       64.55
          1 |        176       22.36       86.91
          2 |         61        7.75       94.66
          3 |         20        2.54       97.20
          4 |         13        1.65       98.86
          5 |          5        0.64       99.49
          6 |          1        0.13       99.62
          7 |          3        0.38      100.00
------------+-----------------------------------
      Total |        787      100.00

Variable |     Obs        Mean   Std. Dev.   Min   Max
---------+--------------------------------------------
 numhosp |     787    .5870394    1.036723     0     7
Constructing a confidence interval

Central limit theorem: says the sample mean is approximately normally
distributed in large samples:
$\bar{X} \sim N(E(X), \operatorname{var}(X)/n)$
Writing it this way is a little tedious, so we use $\mu$ and $\sigma^2$ to
generically denote E(X) and var(X); i.e.
$\bar{X} \sim N(\mu, \sigma^2/n)$
This implies that the sample mean can be rescaled to a standard
normal:
$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
Applying the CLT to form confidence intervals

To form a 90% confidence interval, we want an interval that
contains the true mean with probability 0.90.
Logic: For a large sample size, the sample mean is a normally
distributed random variable.
Find an interval that contains a standard normal random
variable with some pre-specified probability.
Center it using the sample mean, and scale it using the
standard error.
Step 1. Determine which two values contain 90% of the area
under the standard normal curve
Ans: $-1.65$ and $1.65$
Step 2. Then with 90% probability, the standardized mean will
fall between $-1.65$ and $1.65$:
$-1.65 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.65$
In other words,
$\Pr\left(-1.65 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.65\right) = 0.90$
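The cutoff in Step 1 can be looked up numerically rather than from a table. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `critical_value` is hypothetical. The exact 90% cutoff is close to 1.645, which the slides round to 1.65.

```python
from statistics import NormalDist


def critical_value(coverage):
    """Return z such that a standard normal lies in (-z, z) with probability `coverage`."""
    if not 0 < coverage < 1:
        raise ValueError("coverage must be strictly between 0 and 1")
    tail = (1 - coverage) / 2  # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)


# Area under the standard normal curve between the rounded cutoffs -1.65 and 1.65.
area = NormalDist().cdf(1.65) - NormalDist().cdf(-1.65)

if __name__ == "__main__":
    print(f"z for 90% coverage: {critical_value(0.90):.3f}")
    print(f"z for 95% coverage: {critical_value(0.95):.3f}")
    print(f"area between -1.65 and 1.65: {area:.3f}")
```

Because 1.65 is slightly larger than the exact cutoff, the area between -1.65 and 1.65 is marginally above 0.90.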
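Step 3. Rearrange the inequality to isolate $\mu$:
$\Pr\left(-1.65 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.65\right) = 0.90$
is equivalent to
$\Pr\left(\bar{X} - 1.65(\sigma/\sqrt{n}) < \mu < \bar{X} + 1.65(\sigma/\sqrt{n})\right) = 0.90.$
In words: start with $\bar{X}$, then add and subtract 1.65 standard
errors:
$\bar{X} \pm 1.65\,(\sigma/\sqrt{n})$
In large samples, we can replace $\sigma$ with the sample SD $S$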
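The recipe above (sample mean plus and minus a number of standard errors, with the sample SD standing in for the population SD) can be sketched as a small function. The name `ci_from_sample` and the simulated exponential data are assumptions made for illustration only; they do not come from the slides.

```python
import math
import random
import statistics
from statistics import NormalDist


def ci_from_sample(sample, coverage=0.90):
    """Large-sample confidence interval for the mean: xbar +/- z * S / sqrt(n)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to estimate the SD")
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample SD S, used in place of sigma
    z = NormalDist().inv_cdf(1 - (1 - coverage) / 2)  # about 1.645 for 90%
    half_width = z * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical data: 500 draws from an exponential distribution with mean 2.
rng = random.Random(1)
sample = [rng.expovariate(0.5) for _ in range(500)]
lower, upper = ci_from_sample(sample, coverage=0.90)

if __name__ == "__main__":
    print(f"90% CI for the mean: ({lower:.3f}, {upper:.3f})")
```

The function uses the exact normal quantile rather than the rounded 1.65, so its limits can differ from a hand calculation in the third decimal place.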
Properties of the confidence interval

Covers the true mean with pre-specified probability
Increase this probability by increasing the number of standard errors
to add and subtract
For 95% coverage, add and subtract 1.96 std. errors:
$\bar{X} \pm 1.96\,(\sigma/\sqrt{n})$
Width of an interval is determined by
Population variance $\sigma^2$
Sample size n
Nominal coverage probability
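These properties can be explored by simulation: draw many samples, build an interval from each, and record how often the interval covers the true mean and how wide it is on average. The sketch below is an illustration under stated assumptions, not the slides' own method: the data are hypothetical exponential draws with true mean 2, and the function name, seed, and default sizes are arbitrary choices.

```python
import math
import random
import statistics
from statistics import NormalDist


def coverage_and_width(coverage, n=100, reps=2000, seed=19):
    """Simulate `reps` intervals of the form xbar +/- z * S / sqrt(n).

    Returns (fraction of intervals containing the true mean, average width).
    """
    true_mean = 2.0  # assumed population mean of the simulated data
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - coverage) / 2)
    hits = 0
    total_width = 0.0
    for _ in range(reps):
        sample = [rng.expovariate(1 / true_mean) for _ in range(n)]
        xbar = statistics.mean(sample)
        half = z * statistics.stdev(sample) / math.sqrt(n)
        hits += (xbar - half < true_mean < xbar + half)
        total_width += 2 * half
    return hits / reps, total_width / reps


if __name__ == "__main__":
    for level in (0.90, 0.95):
        rate, width = coverage_and_width(level)
        print(f"nominal {level:.2f}: coverage {rate:.3f}, mean width {width:.3f}")
```

Raising the nominal coverage widens the intervals, while increasing n narrows them in proportion to 1/sqrt(n). With skewed data and moderate n, the simulated coverage can fall slightly short of the nominal level.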
[Figure: two panels labeled 95% CI and 90% CI; horizontal axis ticks from 1.4 to 2.6.]
Example 1. Incidence of pre-eclampsia

Sample 1249 women, 250 get pre-eclampsia. Find an interval that
contains the true incidence with 95% probability.
Step 1. Let X be the pre-eclampsia status.
$X \sim \text{Bernoulli}(p)$
where $E(X) = p$ and $\sigma^2 = \operatorname{var}(X) = p(1 - p)$.
Sample mean: $\bar{X} = 250/1249 = 0.20$.
We estimate p by
$\hat{p} = \bar{X}$,
and $\sigma^2$ by
$\hat{\sigma}^2 = \hat{p}(1 - \hat{p}) = (0.2)(0.8) = 0.16$.
Step 2. Find the number of std. errors needed for 95% coverage:
1.96
Step 3. Add and subtract 1.96 std. errors from the sample mean.
Std. error = $\sqrt{0.16/1249} = 0.011$
Lower limit = 0.20 - (1.96)(0.011) = 0.18
Upper limit = 0.20 + (1.96)(0.011) = 0.22
Confidence interval: (0.18, 0.22)

How to make this a 90% interval?
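One way to answer this is to redo the calculation with the cutoff for 90% coverage (about 1.645, rounded to 1.65 on the slides) in place of 1.96. The sketch below recomputes both intervals from the counts 250 and 1249; the function name `proportion_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, coverage):
    """Large-sample CI for a proportion: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    z = NormalDist().inv_cdf(1 - (1 - coverage) / 2)
    return p_hat - z * se, p_hat + z * se


if __name__ == "__main__":
    for coverage in (0.95, 0.90):
        lower, upper = proportion_ci(250, 1249, coverage)
        print(f"{coverage:.0%} CI: ({lower:.3f}, {upper:.3f})")
```

To three decimals this gives roughly (0.178, 0.222) for 95% and (0.182, 0.219) for 90%. The 90% interval is narrower, although both round to (0.18, 0.22) at two decimals.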
Example 2: Hospitalization data

Find a 90% confidence interval for mean number of hospitalizations
Variable |     Obs        Mean   Std. Dev.   Min   Max
---------+--------------------------------------------
 numhosp |     787    .5870394    1.036723     0     7
Step 1: Use summary statistics to obtain key values
Sample mean = 0.59
Sample SD = 1.04
Std error of sample mean = $1.04/\sqrt{787} = 0.037$
Step 2: Coverage probability is 90%. Add and subtract 1.65 SEs
Step 3: Compute interval
$0.59 \pm (1.65)(0.037) \Rightarrow (0.53, 0.65)$
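The summary statistics and the interval can be reproduced directly from the frequency table of hospitalizations given earlier. The sketch below is an illustration only; the names `FREQ` and `summarize` are hypothetical, and it uses the exact 90% normal cutoff (about 1.645) rather than the rounded 1.65.

```python
import math
from statistics import NormalDist

# Frequency table from the slides: number of hospitalizations -> number of women.
FREQ = {0: 508, 1: 176, 2: 61, 3: 20, 4: 13, 5: 5, 6: 1, 7: 3}


def summarize(freq):
    """Return (n, sample mean, sample SD) for data given as value -> count."""
    n = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / n
    sum_sq_dev = sum(f * (x - mean) ** 2 for x, f in freq.items())
    sd = math.sqrt(sum_sq_dev / (n - 1))  # n - 1 divisor, as in the Std. Dev. column
    return n, mean, sd


n, mean, sd = summarize(FREQ)
se = sd / math.sqrt(n)             # standard error of the sample mean
z = NormalDist().inv_cdf(0.95)     # cutoff for 90% coverage
lower, upper = mean - z * se, mean + z * se

if __name__ == "__main__":
    print(f"n = {n}, mean = {mean:.7f}, SD = {sd:.6f}, SE = {se:.4f}")
    print(f"90% CI: ({lower:.2f}, {upper:.2f})")
```

With the standard error left unrounded (about 0.037), the interval comes out near (0.53, 0.65). Rounding the standard error early shifts the limits by about 0.01.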