
Central limit theorem

If a random variable Y is the sum of n independent random variables that satisfy certain general conditions, then for sufficiently large n, Y is approximately normally distributed.
If $X_1, X_2, \dots, X_n$ is a sequence of n independent variables with $E(X_i) = \mu_i$ and $V(X_i) = \sigma_i^2$, and $Y = X_1 + X_2 + \dots + X_n$, then under general conditions

$Z_n = \dfrac{Y - \sum_{i=1}^{n} \mu_i}{\sqrt{\sum_{i=1}^{n} \sigma_i^2}}$

has approximately a standard normal N(0, 1) distribution as n becomes large.

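Central limit theorem

If $X_1, X_2, \dots, X_n$ is a sequence of n independent variables with $E(X_i) = \mu$ and $V(X_i) = \sigma^2$, and $Y = X_1 + X_2 + \dots + X_n$, then

$Z_n = \dfrac{Y - n\mu}{\sqrt{n\sigma^2}}$

has approximately a standard normal N(0, 1) distribution as n becomes large.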
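The standardization $Z_n = (Y - n\mu)/\sqrt{n\sigma^2}$ can be checked by simulation. The sketch below is illustrative only: the choice of Uniform(0, 1) summands, n = 30, the number of trials and the seed are all assumptions, not part of the slides. It checks that about 95% of the simulated $Z_n$ values fall within ±1.96, as they would for a standard normal variable.

```python
import math
import random

def standardized_sum(n, rng):
    """Draw n independent Uniform(0, 1) values and standardize their sum."""
    mu, var = 0.5, 1.0 / 12.0            # mean and variance of Uniform(0, 1)
    y = sum(rng.random() for _ in range(n))
    return (y - n * mu) / math.sqrt(n * var)

rng = random.Random(12345)               # fixed seed (assumed) for repeatability
n, trials = 30, 20000                    # illustrative sizes
z_values = [standardized_sum(n, rng) for _ in range(trials)]

# For a standard normal variable, P(|Z| <= 1.96) is about 0.95.
coverage = sum(abs(z) <= 1.96 for z in z_values) / trials
print(f"share of |Z_n| <= 1.96: {coverage:.3f}")
```

The summands here are far from normal, yet the standardized sum already behaves much like N(0, 1) at n = 30.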
RESEARCH METHODOLOGY
Sampling Distribution, Chi-Squared, t and F Distributions
Sampling Distribution
Statistic: any function of the observations in a random sample that does not depend on unknown parameters. (Used for drawing conclusions.)
The sampling distribution of a statistic is the probability function that describes the probabilistic behaviour of the statistic in repeated sampling from the same universe or under the same process variable assignment model.
Sampling Distribution
If $X_1, X_2, \dots, X_n$ are n independent variables with common mean $\mu$ and variance $\sigma^2$, the sample mean $\bar{X}$ is a linear combination of them, so $E(\bar{X}) = \mu$ and $V(\bar{X}) = \sigma^2/n$, and

$Z_n = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

Standard error of a statistic is the standard deviation of its sampling distribution.
If the standard error involves unknown parameters whose value can be estimated, substitution of these estimates into the standard error results in the estimated standard error.

Sampling Distribution
The standard error of $\bar{X}$ is $\sigma/\sqrt{n}$.
If $\sigma$ is unknown, the sample standard deviation s is substituted, giving the estimated standard error $s/\sqrt{n}$.
Examples
Data on the tension bond strength of a modified Portland cement mortar are
16.85, 16.40, 17.21, 16.35, 16.52, 17.04, 16.96, 17.15, 16.59, 16.57.
Calculate the standard error of the sample mean (a) if the SD is known to be 0.25 kgf/cm², and (b) if the SD is not known.
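A minimal sketch of this calculation, using only the ten observations and the SD of 0.25 kgf/cm² given above. The variable names are illustrative. When the SD is not known, the sample standard deviation s (with the n - 1 divisor) is substituted.

```python
import math
import statistics

strength = [16.85, 16.40, 17.21, 16.35, 16.52,
            17.04, 16.96, 17.15, 16.59, 16.57]
n = len(strength)

# (a) Population SD known: standard error = sigma / sqrt(n).
sigma = 0.25
se_known = sigma / math.sqrt(n)

# (b) SD unknown: substitute the sample SD s (n - 1 divisor),
#     which gives the estimated standard error s / sqrt(n).
s = statistics.stdev(strength)
se_estimated = s / math.sqrt(n)

print(f"(a) standard error, SD known:   {se_known:.3f}")
print(f"(b) sample SD s = {s:.3f}, estimated standard error: {se_estimated:.3f}")
```

With these data the standard error is about 0.079 when the SD is known and about 0.100 when it is estimated from the sample.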




t - DISTRIBUTION
t-Distribution Probability Density Function

A random variable T is said to have the t-distribution with parameter $\nu$, called degrees of freedom, if its probability density function is given by:

$h(t) = \dfrac{\Gamma[(\nu + 1)/2]}{\Gamma(\nu/2)\,\sqrt{\pi\nu}} \left(1 + \dfrac{t^2}{\nu}\right)^{-(\nu + 1)/2}, \quad -\infty < t < \infty$

where $\nu$ is a positive integer.
t-Distribution Table of Probabilities
Remark: The distribution of T is usually called the Student-t or the t-distribution. It is customary to let $t_p$ represent the t value above which we find an area equal to p.

The table gives values $t_{p,\nu}$ of T for which $P(T > t_{p,\nu}) = p$.

[Figure: t density curve centered at 0 with the point $t_p$ marked on the t axis]
[Figure: t-distribution probability density functions for various values of $\nu$ ($\nu = 2$, $\nu = 5$ and $\nu = \infty$), plotted for t from -3 to 3]
Table of t-Distribution

t-Distribution - Example

If $T \sim t_{10}$, find:
(a) P(0.542 < T < 2.359)
(b) P(T < -1.812)
(c) t for which P(T>t) = 0.05 .

Example Solution
(a) P(0.542 < T < 2.359) = P(T > 0.542) - P(T > 2.359)
= 0.30 - 0.02 = 0.28

(b) P(T < -1.812) = F(-1.812)
= P(T > 1.812) = 0.05

(c) t for which P(T > t) = 1 - F(t) = 0.05:
t = 1.812
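The table look-ups in this example can be cross-checked by integrating the density $h(t) = \Gamma[(\nu+1)/2] / (\Gamma(\nu/2)\sqrt{\pi\nu}) \cdot (1 + t^2/\nu)^{-(\nu+1)/2}$ numerically. The sketch below is one possible way to do so. Composite Simpson's rule, the step count and the helper names are assumptions made for illustration, not the method used to build the table.

```python
import math

def t_pdf(t, nu):
    """Density h(t) of the t-distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.gamma(nu / 2) * math.sqrt(math.pi * nu))
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)

def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def t_upper_tail(t, nu):
    """P(T > t) for t >= 0, using the symmetry P(T > 0) = 0.5."""
    return 0.5 - simpson(lambda u: t_pdf(u, nu), 0.0, t)

nu = 10
part_a = t_upper_tail(0.542, nu) - t_upper_tail(2.359, nu)   # P(0.542 < T < 2.359)
part_b = t_upper_tail(1.812, nu)                             # equals P(T < -1.812)
print(f"(a) {part_a:.3f}   (b) {part_b:.3f}")
```

The computed values agree with the table-based answers 0.28 and 0.05 to within rounding of the tabulated t values. The same upper-tail probability of 0.05 at t = 1.812 confirms part (c).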
CHI-SQUARED DISTRIBUTION

Chi-Squared Distribution Probability Density Function

A random variable X is said to have the Chi-Squared distribution with parameter $\nu$, called degrees of freedom, if the probability density function of X is

$f(x) = \dfrac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2 - 1} e^{-x/2}$ for x > 0

$f(x) = 0$, elsewhere

where $\nu$ is a positive integer.

Chi-Squared Distribution - Remarks

The Chi-Squared distribution plays a vital role in
statistical inference. It has considerable application
in both methodology and theory. It is an important
component of statistical hypothesis testing and
estimation.
The Chi-Squared distribution is a special case of the Gamma distribution, i.e., when $\alpha = \nu/2$ and $\beta = 2$.

Chi-Squared Distribution Mean and Standard Deviation

Mean or Expected Value

$\mu = \nu$

Standard Deviation

$\sigma = \sqrt{2\nu}$
Chi-Squared Distribution Table of Probabilities

It is customary to let $\chi^2_p$ represent the value above which we find an area of p. This is illustrated by the shaded region below.

[Figure: chi-squared density f(x) with the point $\chi^2_{p,\nu}$ marked; the area to its right is p and the area to its left is $F(\chi^2_{p,\nu}) = 1 - p$]

For tabulated values of the Chi-Squared distribution see the Chi-Squared table, which gives values of $\chi^2_p$ for various values of p and $\nu$. The areas, p, are the column headings; the degrees of freedom, $\nu$, are given in the left column, and the table entries are the $\chi^2$ values.

Chi-Squared Table
Chi-Squared Table Continued
Chi-Squared Distribution Example

If $X \sim \chi^2_{15}$, find:
(a) P(7.261 < X < 24.996)
(b) P(X < 6.262)
(c) the value $\chi^2$ for which $P(X < \chi^2) = 0.02$

Example Solution
(a) P(7.261 < X < 24.996)
= 0.95 - 0.05
= 0.9

(b) P(X < 6.262) = 0.025

(c) The value for which $P(X < \chi^2) = 0.02$ is
$\chi^2 = 5.985$
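As a cross-check on the table values used in this example, the chi-squared density $f(x) = x^{\nu/2-1} e^{-x/2} / (2^{\nu/2}\Gamma(\nu/2))$ can be integrated numerically. This is a sketch under stated assumptions: Simpson's rule, the step count and the function names are illustrative choices, and the shortcut f(0) = 0 is valid only for $\nu > 2$ (here $\nu = 15$).

```python
import math

def chi2_pdf(x, nu):
    """Chi-squared density with nu degrees of freedom (0 for x <= 0; assumes nu > 2)."""
    if x <= 0:
        return 0.0
    log_f = ((nu / 2 - 1) * math.log(x) - x / 2
             - (nu / 2) * math.log(2) - math.lgamma(nu / 2))
    return math.exp(log_f)

def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def chi2_cdf(x, nu):
    """P(X <= x) obtained by integrating the density from 0 to x."""
    return simpson(lambda u: chi2_pdf(u, nu), 0.0, x)

nu = 15
part_a = chi2_cdf(24.996, nu) - chi2_cdf(7.261, nu)   # P(7.261 < X < 24.996)
part_b = chi2_cdf(6.262, nu)                          # P(X < 6.262)
part_c = chi2_cdf(5.985, nu)                          # should be close to 0.02
print(f"(a) {part_a:.3f}   (b) {part_b:.3f}   (c) {part_c:.3f}")
```

The three results match the solution's 0.9, 0.025 and 0.02 to within rounding of the tabulated values.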

F-DISTRIBUTION

F-Distribution Probability Density Function

A random variable X is said to have the F-distribution with parameters $\nu_1$ and $\nu_2$, called degrees of freedom, if the probability density function is given by:

$h(x) = \dfrac{\Gamma[(\nu_1 + \nu_2)/2]\,(\nu_1/\nu_2)^{\nu_1/2}}{\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)} \cdot \dfrac{x^{\nu_1/2 - 1}}{(1 + \nu_1 x/\nu_2)^{(\nu_1 + \nu_2)/2}}$, $0 < x < \infty$

$h(x) = 0$, elsewhere

Note: The probability density function of the F-distribution depends not only on the two parameters $\nu_1$ and $\nu_2$ but also on the order in which we state them.

F-Distribution - Application

Remark: The F-distribution is used in two-sample
situations to draw inferences about the population
variances. It is applied to many other types of
problems in which the sample variances are
involved.
In fact, the F-distribution is called the variance ratio
distribution.

F-Distribution Probability Density Function Shapes

[Figure: F probability density functions f(x) for various values of $\nu_1$ and $\nu_2$; curves shown for 6 and 24 d.f. and for 6 and 10 d.f.]
F-Distribution (p=0.01) Table

F-Distribution (p=0.05) Table

F-Distribution Table of Probabilities

The $f_p$ is the f value above which we find an area equal to p, illustrated by the shaded area below.

[Figure: F density f(x) with the area p to the right of $x_p$ shaded]

For tabulated values of the F-distribution see the F table, which gives values of $x_p$ for various values of $\nu_1$ and $\nu_2$. The degrees of freedom, $\nu_1$ and $\nu_2$, are the column and row headings; and the table entries are the x values.

F-Distribution - Properties

Let $x_\alpha(\nu_1, \nu_2)$ denote $x_\alpha$ with $\nu_1$ and $\nu_2$ degrees of freedom, then

$x_{1-\alpha}(\nu_1, \nu_2) = \dfrac{1}{x_\alpha(\nu_2, \nu_1)}$


F-Distribution Example

If $Y \sim F_{6,11}$ (that is, $\nu_1 = 6$, $\nu_2 = 11$), find:

(a) P(Y < 3.09)

(b) y for which P(Y > y) = 0.01

Example Solution

(a) P(Y < 3.09) = F(3.09)
= 1 - P(Y > 3.09) = 1 - 0.05
= 0.95

(b) P(Y > y) = 0.01
y = 5.07
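The F example, and the reciprocal property $x_{1-\alpha}(\nu_1, \nu_2) = 1/x_\alpha(\nu_2, \nu_1)$, can both be checked numerically from the density. The following is a rough sketch rather than a reference implementation: Simpson's rule, bisection, the search bracket [0, 100] and all function names are assumptions for illustration, and the shortcut h(0) = 0 requires $\nu_1 > 2$.

```python
import math

def f_pdf(x, v1, v2):
    """F density with (v1, v2) degrees of freedom (0 for x <= 0; assumes v1 > 2)."""
    if x <= 0:
        return 0.0
    log_c = (math.lgamma((v1 + v2) / 2) - math.lgamma(v1 / 2) - math.lgamma(v2 / 2)
             + (v1 / 2) * math.log(v1 / v2))
    return math.exp(log_c + (v1 / 2 - 1) * math.log(x)
                    - ((v1 + v2) / 2) * math.log(1 + v1 * x / v2))

def f_cdf(x, v1, v2, steps=1000):
    """P(X <= x) by composite Simpson's rule on [0, x]; steps must be even."""
    h = x / steps
    total = f_pdf(0.0, v1, v2) + f_pdf(x, v1, v2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, v1, v2)
    return total * h / 3

def f_upper_point(p, v1, v2):
    """Value x_p with P(X > x_p) = p, located by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 1 - f_cdf(mid, v1, v2) > p:   # tail still too large: move right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

part_a = f_cdf(3.09, 6, 11)              # P(Y < 3.09)
tail_b = 1 - f_cdf(5.07, 6, 11)          # P(Y > 5.07), should be near 0.01

x_05 = f_upper_point(0.05, 6, 11)        # x_0.05(6, 11)
x_95 = f_upper_point(0.95, 11, 6)        # x_0.95(11, 6)
print(f"(a) {part_a:.3f}   (b) tail at 5.07 = {tail_b:.3f}")
print(f"x_0.95(11, 6) * x_0.05(6, 11) = {x_95 * x_05:.4f}")
```

The product in the last line should be very close to 1, which is the reciprocal property with the degrees of freedom swapped.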
Learning Objectives
1. Estimate a population parameter (means) based
on a large sample selected from the population
2. Use the sampling distribution of a statistic to
form a confidence interval for the population
parameter
3. Show how to select the proper sample size for
estimating a population parameter
Statistical Interval for a Single Sample
Outlines:
Confidence interval on the mean of a normal
distribution, variance known.
Confidence interval on the mean of a normal
distribution, variance unknown.
Confidence interval on the variance and standard
deviation of a normal distribution.



Statistical Methods
  Descriptive Statistics
  Inferential Statistics
    Estimation
    Hypothesis Testing
Point Estimator
A point estimator of a population parameter is a rule
or formula that tells us how to use the sample data to
calculate a single number that can be used as an
estimate of the target parameter.
Point Estimation
1. Provides a single value
   - Based on observations from one sample
2. Gives no information about how close the value is to the unknown population parameter
3. Example: Sample mean $\bar{x} = 3$ is the point estimate of the unknown population mean
Interval Estimator
An interval estimator (or confidence interval) is a
formula that tells us how to use the sample data to
calculate an interval that estimates the target
parameter.
Interval Estimation
1. Provides a range of values
   - Based on observations from one sample
2. Gives information about closeness to unknown population parameter
   - Stated in terms of probability
   - Knowing exact closeness requires knowing unknown population parameter
3. Example: Unknown population mean lies between 50 and 70 with 95% confidence
2011 Pearson Education,
Inc
Key Elements of Interval Estimation
[Figure: a confidence interval drawn around the sample statistic (point estimate), running from the lower confidence limit to the upper confidence limit]
A confidence interval provides a range of plausible
values for the population parameter.
Confidence interval
Confidence interval: bounds that represent an interval of plausible values for a parameter.
Suppose that we estimate the mean viscosity of a chemical product to be $\hat{\mu} = \bar{x} = 1000$. We do not know exactly where the true mean lies: is it likely to be between 900 and 1100, or between 990 and 1010?
Because we use a sample from the population to compute the interval, we can have high confidence, though not certainty, that it contains the unknown population parameter.
Confidence interval
Practical example
A machine fills cups with margarine, and is supposed to be adjusted so that
the mean content of the cups is close to 250 grams of margarine. Of course
it is not possible to fill every cup with exactly 250 grams of margarine.
Hence the weight of the filling can be considered to be a random variable X.
The distribution of X is assumed here to be a normal distribution with unknown expectation $\mu$ and known standard deviation $\sigma = 2.5$ grams.
To check if the machine is adequately calibrated, a sample of n = 25 cups of margarine is chosen at random.
The sample shows actual weights $x_1, x_2, x_3, \dots, x_{25}$, with mean:

$\bar{x} = \dfrac{1}{n}\sum_{i=1}^{25} x_i = 250.2$

If the population mean is actually around 250 g, sample means such as $\bar{x} = 250.4$ or $251.1$ would not be surprising.
If $\bar{x} = 280.6$, the population mean is unlikely to be close to 250 g.
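Confidence interval (Case I)
Confidence interval on the mean of a normal distribution, variance known.
Suppose that $X_1, X_2, \dots, X_n$ is a random sample from a normal distribution with unknown $\mu$ and known $\sigma^2$.
We know that $\bar{X}$ is normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, so that

$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

has a standard normal distribution.

A confidence interval estimate for $\mu$ is $L \le \mu \le U$, where

$P\{L \le \mu \le U\} = 1 - \alpha$

Here $1 - \alpha$ is the probability of selecting a sample whose interval contains the true value of $\mu$.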
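Confidence interval (Case I)
In order to find lower and upper confidence limits:

$P\left\{-z_{\alpha/2} \le \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right\} = 1 - \alpha$

$P\left\{\bar{X} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right\} = 1 - \alpha$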
Confidence interval (Case I)
Ex. Ten measurements of impact energy on specimens of steel are: 64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, and 64.3. Assume that impact energy is normally distributed with $\sigma = 1$ J. We want to find a 95% CI for $\mu$.
With $\bar{x} = 64.46$, $n = 10$ and $z_{0.025} = 1.96$, the interval is $64.46 \pm 1.96\,(1/\sqrt{10})$.
That is, based on the sample data, a range of highly plausible values for mean impact energy for steel is 63.84 J to 65.08 J.
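The interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ for this example can be reproduced in a few lines. This is a minimal sketch that uses the ten measurements and $\sigma = 1$ J from the example. The variable names are illustrative, and `NormalDist().inv_cdf` from the Python standard library stands in for the z-table look-up.

```python
import math
from statistics import NormalDist, mean

energy = [64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3]  # J
sigma = 1.0                # known population SD (J)
conf = 0.95                # confidence level, 1 - alpha

n = len(energy)
x_bar = mean(energy)
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}, about 1.96
half_width = z * sigma / math.sqrt(n)

lower, upper = x_bar - half_width, x_bar + half_width
print(f"x_bar = {x_bar:.2f} J, 95% CI = ({lower:.2f}, {upper:.2f}) J")
```

Rounded to two decimals, this gives the same limits as the example, 63.84 J and 65.08 J.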
Confidence interval (Case I)
Choice of Sample Size
Confidence interval (Case I)
Ex. Consider the previous example. We want to determine how many specimens must be tested to ensure that the 95% CI on $\mu$ for the steel has a length of at most 1.0 J.
CI length <= 1.0 J, so E = 0.5 J

$n = \left(\dfrac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left[\dfrac{(1.96)(1)}{0.5}\right]^2 = 15.37$

Since n must be an integer, round up: n = 16
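The sample-size rule $n = (z_{\alpha/2}\,\sigma/E)^2$, rounded up to the next integer, is easy to wrap in a helper. The function name `sample_size` and its parameters are hypothetical, chosen only for this sketch.

```python
import math
from statistics import NormalDist

def sample_size(sigma, error, conf=0.95):
    """Smallest n for which z_{alpha/2} * sigma / sqrt(n) <= error."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    return math.ceil((z * sigma / error) ** 2)     # always round up

# Steel example: sigma = 1 J, total CI length 1.0 J, so E = 0.5 J.
n_required = sample_size(sigma=1.0, error=0.5)
print(n_required)
```

Rounding up rather than to the nearest integer matters here: n = 15 would give an interval slightly longer than 1.0 J.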
Confidence interval (Case I)
One-Sided Confidence Bounds

Ex. From the previous example, find a lower one-sided 95% CI for the mean impact energy.

$\bar{x} - z_{0.05}\,\dfrac{\sigma}{\sqrt{n}} \le \mu$

$64.46 - 1.64\,\dfrac{1}{\sqrt{10}} \le \mu$

$63.94 \le \mu$
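For a one-sided bound the whole of $\alpha$ goes into one tail, so $z_\alpha$ replaces $z_{\alpha/2}$. A short sketch of the lower bound $\bar{x} - z_\alpha\,\sigma/\sqrt{n} \le \mu$, using $\bar{x} = 64.46$, $\sigma = 1$ J and n = 10 from the impact-energy example (variable names are illustrative):

```python
import math
from statistics import NormalDist

x_bar, sigma, n = 64.46, 1.0, 10     # values from the impact-energy example
conf = 0.95

# One-sided bound: all of alpha = 0.05 is placed in a single tail.
z_alpha = NormalDist().inv_cdf(conf)             # about 1.645 (slide uses 1.64)
lower_bound = x_bar - z_alpha * sigma / math.sqrt(n)
print(f"mu >= {lower_bound:.2f} J with 95% confidence")
```

Using the unrounded $z_{0.05}$ still gives 63.94 J to two decimals. The one-sided lower bound lies above the two-sided lower limit of 63.84 J because no probability is reserved for the upper tail.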
Confidence interval (Case I)
Large Sample Confidence Interval for $\mu$
$X_i$ has any distribution, $n \ge 30$, variance unknown.
We can approximate the CI for $\mu$ by replacing $\sigma$ by S:

$\bar{x} - z_{\alpha/2}\dfrac{s}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\dfrac{s}{\sqrt{n}}$

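Confidence interval (Case II)
Confidence interval on the mean of a normal distribution, variance unknown.
Suppose that $X_1, X_2, \dots, X_n$ is a random sample from a normal distribution with unknown $\mu$ and unknown $\sigma^2$, and n < 30.
In this case $\sigma$ is replaced by the sample standard deviation S, and the statistic $T = (\bar{X} - \mu)/(S/\sqrt{n})$ follows the t-distribution with n - 1 degrees of freedom, which gives

$\bar{x} - t_{\alpha/2,\,n-1}\dfrac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\dfrac{s}{\sqrt{n}}$
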
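Confidence interval (Case III)
Confidence interval on the variance and standard deviation of a normal distribution.
For a random sample of size n from a normal distribution with sample variance $s^2$, the statistic $(n-1)S^2/\sigma^2$ has a chi-squared distribution with n - 1 degrees of freedom.

Two-Sided CI

$\dfrac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}$

One-Sided CI

Lower bound: $\dfrac{(n-1)s^2}{\chi^2_{\alpha,\,n-1}} \le \sigma^2$

Upper bound: $\sigma^2 \le \dfrac{(n-1)s^2}{\chi^2_{1-\alpha,\,n-1}}$

Taking square roots of the limits gives the corresponding interval for $\sigma$.
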
Confidence Intervals for the Difference between Two Population Means $\mu_1 - \mu_2$: Independent Samples
Two random samples are drawn from the two populations of interest.
Because we compare two population means, we use the statistic $\bar{x}_1 - \bar{x}_2$.
               Population 1                        Population 2
Parameters:    $\mu_1$ and $\sigma_1^2$            $\mu_2$ and $\sigma_2^2$
               (values are unknown)                (values are unknown)
Sample size:   $n_1$                               $n_2$
Statistics:    $\bar{x}_1$ and $s_1^2$             $\bar{x}_2$ and $s_2^2$

Estimate $\mu_1 - \mu_2$ with $\bar{x}_1 - \bar{x}_2$
Confidence Interval for $\mu_1 - \mu_2$

Confidence interval:

$(\bar{x}_1 - \bar{x}_2) \pm z^{*}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

where $z^{*}$ is the value from the z-table that corresponds to the confidence level.

Note: when the values of $\sigma_1^2$ and $\sigma_2^2$ are unknown, the sample variances $s_1^2$ and $s_2^2$ computed from the data can be used.

Example: confidence interval for $\mu_1 - \mu_2$
Do people who eat high-fiber cereal for breakfast consume, on average, fewer calories for lunch than people who do not eat high-fiber cereal for breakfast?
A sample of 150 people was randomly drawn. Each person was identified as a consumer or a non-consumer of high-fiber cereal.
For each person the number of calories consumed at lunch was recorded.

Consumers   Non-consumers
568         705
498         819
589         706
681         509
540         613
646         582
636         601
739         608
539         787
596         573
607         428
529         754
637         741
617         628
633         537
555         748
...         ...
Solution:
The parameter to be tested is the difference between two means, $\mu_1 - \mu_2$.
The claim to be tested is: the mean caloric intake of consumers ($\mu_1$) is less than that of non-consumers ($\mu_2$).

Use $s_1^2 = 4{,}103$ for $\sigma_1^2$ and $s_2^2 = 10{,}670$ for $\sigma_2^2$.

Example: confidence interval for $\mu_1 - \mu_2$
$n_1 = 43,\ \bar{x}_1 = 604.02; \quad n_2 = 107,\ \bar{x}_2 = 633.239$

The confidence interval estimator for the difference between two means is

$(\bar{x}_1 - \bar{x}_2) \pm z^{*}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

$= (604.02 - 633.239) \pm 1.96\sqrt{\dfrac{4103}{43} + \dfrac{10670}{107}}$

$= -29.21 \pm 27.38 = [-56.59,\ -1.83]$
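The arithmetic for $(\bar{x}_1 - \bar{x}_2) \pm z^{*}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$ can be reproduced with a small helper. The function `two_mean_ci` and its argument names are hypothetical, introduced only for this sketch. The inputs are the summary statistics quoted for the example ($n_1 = 43$, $\bar{x}_1 = 604.02$, $s_1^2 = 4103$; $n_2 = 107$, $\bar{x}_2 = 633.239$, $s_2^2 = 10670$), with the sample variances standing in for the unknown population variances.

```python
import math
from statistics import NormalDist

def two_mean_ci(x1, var1, n1, x2, var2, n2, conf=0.95):
    """Large-sample z interval for mu1 - mu2 from independent samples."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96 for 95%
    diff = x1 - x2
    margin = z_star * math.sqrt(var1 / n1 + var2 / n2)
    return diff - margin, diff + margin

# Consumers (sample 1) versus non-consumers (sample 2) of high-fiber cereal.
lower, upper = two_mean_ci(604.02, 4103, 43, 633.239, 10670, 107)
print(f"95% CI for mu1 - mu2: ({lower:.2f}, {upper:.2f})")
```

Carried at full precision, the limits can differ from the hand calculation in the last decimal place, because the hand calculation uses -29.21 for the difference 604.02 - 633.239 = -29.219. Either way the interval lies entirely below zero.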
Example: confidence interval for $\mu_1 - \mu_2$
Interpretation
The 95% CI is (-56.59, -1.83).
We are 95% confident that the interval (-56.59, -1.83) contains the true but unknown difference $\mu_1 - \mu_2$.
Since the interval is entirely negative (that is, does not contain 0), there is evidence from the data that $\mu_1$ is less than $\mu_2$. We estimate that non-consumers of high-fiber breakfast cereal consume on average between 1.83 and 56.59 more calories for lunch.
Homework
1. A confidence interval estimate is desired for the gain in a circuit on a semiconductor device. Assume that gain is normally distributed with standard deviation $\sigma = 20$.
a) Find a 95% CI for $\mu$ when n = 10 and $\bar{x} = 1000$
b) Find a 95% CI for $\mu$ when n = 25 and $\bar{x} = 1000$
c) Find a 99% CI for $\mu$ when n = 10 and $\bar{x} = 1000$
d) Find a 99% CI for $\mu$ when n = 25 and $\bar{x} = 1000$
e) How does the length of the CIs computed above change with changes in sample size and confidence level?
2. The sugar content of the syrup in canned peaches is normally distributed. A random sample of n = 10 cans yields a sample standard deviation of s = 4.8 mg. Calculate a 95% two-sided confidence interval for $\sigma$.
3. The 2004 presidential election exit polls from the critical state of Ohio provided the following results. There were 2020 respondents in the exit polls and 768 were college graduates. Of the college graduates, 412 voted for George Bush.
a) Calculate a 95% confidence interval for the proportion of college graduates in Ohio who voted for George Bush.
b) Calculate a 95% lower confidence bound for the proportion of college graduates in Ohio who voted for George Bush.