
MATH& 146

Lesson 27
Section 4.1
The t Distribution

Inference for Numerical Data

Earlier in this course, we introduced inference for


proportions using the normal model, and recently, we
encountered the chi-square distribution, which is
useful for working with categorical data with many
levels.
Our focus will now move to numerical data, where we
will encounter two more distributions: the t distribution
(which looks a lot like the normal distribution) and the
F distribution (which looks a lot like the chi-square distribution).

Inference for Numerical Data

Our general approach will be:


1) Determine which point estimate or test statistic is
useful.
2) Identify an appropriate distribution for the point
estimate or test statistic.
3) Apply the hypothesis and confidence interval
techniques as before, using the distribution from
step 2.

Inference for Numerical Data

The sampling distribution associated with a sample


mean is nearly normal if certain conditions are
satisfied. However, this becomes more complex
when the sample size is small, where small here
typically means a sample size smaller than 30
observations.
For this reason, we'll use a new distribution called
the t distribution that will often work for both small
and large samples of numerical data.

Standard Error
For the case of a single mean, the standard error
of the sample mean can be calculated as

SE = σ / √n

where σ is the population standard deviation and n
is the sample size.

Standard Error
The problem with this formula is that σ is usually
unknown to us. We solve this problem by
estimating the standard error using the sample
standard deviation, s:

SE = s / √n

The SE formula
Looking at this formula, there are two characteristics
we should be thinking about.
1) A larger s corresponds to a larger SE. If the data
are more volatile, then we'll be less certain of the
location of the true mean, so the standard error
should be bigger. On the other hand, if the
observations all fall very close together, then s will
be small, and the sample mean should be a more
precise estimate of the true mean.

The SE formula
2) Also, the larger the sample size n, the smaller the
standard error. This makes intuitive sense in that,
generally speaking, larger samples tend to be more
accurate than smaller samples. We should expect
estimates to be more precise when we have more
data, so the standard error SE should get smaller
when n gets bigger.
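Both effects of the formula SE = s / √n can be seen in a quick sketch (the sample standard deviation s = 15.0 and the sample sizes below are illustrative values, not from the slides):

```python
import math

def standard_error(s, n):
    """Estimated standard error of the sample mean: SE = s / sqrt(n)."""
    return s / math.sqrt(n)

# With the same sample standard deviation, a larger sample gives a smaller SE.
s = 15.0
for n in (25, 100, 400):
    print(n, standard_error(s, n))  # SE shrinks as n grows: 3.0, 1.5, 0.75
```

Quadrupling the sample size halves the standard error, since n sits under a square root.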

Example 1
We've taken a random sample of 100 runners from the
2012 Cherry Blossom Run in Washington, DC, which
was a race with 16,924 participants.
The sample data for the 100 runners are summarized
below.
ID    time    age  gender  state
1     88.31   59   M       MD
2     100.67  32   M       VA
3     109.52  33   F       VA
…
100   89.49   26   M       DC
Example 1 continued
A histogram and summary statistics of the run time of
participants are available below. Assuming the
conditions for the normal distribution have been met,
create a 95% confidence interval for the average time
it takes runners in the Cherry Blossom Run to
complete the race.
time
sample mean 95.61
sample st. dev. 15.78
sample size 100
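Using the summary statistics above with the normal model (critical value z* = 1.96 for 95% confidence), the interval can be computed as a short sketch:

```python
import math

# Summary statistics for run time from the sample of 100 runners (given above).
mean, s, n = 95.61, 15.78, 100
z_star = 1.96  # critical value for 95% confidence under the normal model

se = s / math.sqrt(n)            # 15.78 / 10 = 1.578
margin = z_star * se             # ≈ 3.09
lower, upper = mean - margin, mean + margin
print(round(lower, 2), round(upper, 2))  # ≈ 92.52 98.7
```

We are 95% confident the average run time for all 16,924 participants lies between about 92.5 and 98.7 minutes.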

Example 2
Use the data to calculate a 90% confidence interval for
the average age of participants in the 2012 Cherry
Blossom run. Assume the conditions for applying the
normal distribution have been met.

Age
sample mean 35.05
sample st. dev. 8.97
sample size 100
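The same recipe applies with the 90% critical value z* = 1.645; a sketch using the age summaries above:

```python
import math

# Summary statistics for age from the sample of 100 runners (given above).
mean, s, n = 35.05, 8.97, 100
z_star = 1.645  # critical value for 90% confidence under the normal model

se = s / math.sqrt(n)            # 8.97 / 10 = 0.897
margin = z_star * se             # ≈ 1.48
lower, upper = mean - margin, mean + margin
print(round(lower, 2), round(upper, 2))  # ≈ 33.57 36.53
```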

t Distributions
The normal distribution can be used for larger
samples. Otherwise, we need the t distribution. The
t distribution shares many characteristics with the
standard normal, N(μ = 0, σ = 1), distribution. Both are
symmetric, unimodal, have a mean of 0, and might be
described as "bell-shaped."

[Figure: t distribution (solid line) and standard normal distribution (dotted line)]
t Distributions
However, the t distribution has thicker tails. This
means that in a t distribution, it is more likely that
we will see extreme values (values far from 0) than
in a standard normal distribution.

t Distributions
The t distribution, always centered at zero, has a
single parameter: degrees of freedom.
The degrees of freedom (df) describe the precise
form of the bell-shaped t distribution.

t Distributions
The larger the degrees of freedom, the more
closely the distribution approximates the normal
model.

t Distributions
In fact, when the degrees of freedom is about 30 or
more, the t distribution is nearly indistinguishable from
the standard normal distribution. Below are t
distributions with 1, 10, and 40 degrees of freedom. In
each case, N(0,1) is drawn for comparison.

Degrees of Freedom (df)
Informally, we can define degrees of freedom as a
way of keeping score.
A data set contains a number of observations, say,
n. They constitute n individual pieces of
information. These pieces of information can be
used either to estimate parameters or variability.

Degrees of Freedom (df)
In general, each item being estimated costs one
degree of freedom. The remaining degrees of
freedom are used to estimate variability. All we
have to do is count properly.
When studying means, there are n observations.
There's one parameter (the mean) that needs to
be estimated. That leaves n − 1 degrees of
freedom for estimating variability.

For inference on single means, use df = n − 1.
Example 3
Find the degrees of freedom if the sample size is
a) 8
b) 17
c) 5,023
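The counting rule df = n − 1 is one line of code; a minimal sketch (the helper name is ours, not from the slides):

```python
def degrees_of_freedom(n):
    """Degrees of freedom for inference on a single mean: df = n - 1."""
    return n - 1

for n in (8, 17, 5023):
    print(n, degrees_of_freedom(n))  # 7, 16, 5022
```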

Using tcdf on the TI-83/84
tcdf computes the Student-t distribution probability
between lowerbound and upperbound for the specified
df (degrees of freedom).

That is, if T ~ t(df), then tcdf(a,b,df) = P(a < T < b).

[Figure: shaded area under the t curve between a and b]
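Outside the calculator, the same probability can be sketched in Python with SciPy (SciPy is our assumption here; the slides use a TI-83/84, and the wrapper name simply mirrors the calculator command):

```python
from scipy.stats import t

def tcdf(a, b, df):
    """P(a < T < b) for T ~ t(df), mirroring tcdf(lowerbound, upperbound, df)."""
    return t.cdf(b, df) - t.cdf(a, df)

# For instance, Example 4's expression:
print(round(tcdf(1, 2, 3), 3))  # ≈ 0.126
```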
Example 4
Graph the area represented by the following
statement, then calculate it.

tcdf(1, 2, 3)

Example 5
A t distribution with 20 degrees of freedom is
shown below. Estimate the proportion falling
above 1.65.

Example 6
A t distribution with 2 degrees of freedom is shown
below. Estimate the proportion of the distribution
falling more than 3 units from the mean (above or
below).
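Estimates like those in Examples 5 and 6 can be checked numerically; a sketch with SciPy (assumed available, not part of the slides):

```python
from scipy.stats import t

# Example 5: proportion of a t(20) distribution falling above 1.65
upper_tail = 1 - t.cdf(1.65, 20)   # one upper-tail area

# Example 6: proportion of a t(2) distribution more than 3 units from the mean
two_tails = 2 * t.cdf(-3, 2)       # symmetric, so double the lower tail
print(round(upper_tail, 3), round(two_tails, 3))
```

Note how the thick tails of t(2) leave far more area beyond ±3 than a standard normal would.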

Conditions
Before applying the t distribution for inference
about a single mean, we check two conditions.
1) Independence of observations.
2) Observations come from a nearly normal
distribution.

Independence Condition
We can verify independence just as we did before:
we collect a simple random sample from less
than 10% of the population, or, if the data are
from an experiment or random process, we
carefully check to the best of our abilities that the
observations were independent.

Nearly Normal Condition
This second condition is difficult to verify with small
data sets. We often
1) Take a look at a plot of the data for obvious
departures from the normal model, usually in the
form of prominent outliers.
2) Compute z-scores for the minimum and maximum
values, and check to see if either is beyond −3 or
+3.
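The check in step 2 can be sketched as follows (the sample values here are made up for illustration):

```python
def extreme_z_scores(mean, s, minimum, maximum):
    """z-scores of the smallest and largest observations."""
    return (minimum - mean) / s, (maximum - mean) / s

# Illustrative numbers only: mean 10.0, st. dev. 2.0, min 3.5, max 17.2.
z_min, z_max = extreme_z_scores(10.0, 2.0, 3.5, 17.2)
outlier_flag = abs(z_min) > 3 or abs(z_max) > 3
print(z_min, z_max, outlier_flag)  # both beyond ±3, so the flag is True
```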

Nearly Normal Condition
As the sample size increases, we can increasingly
relax the nearly normal condition. For instance,
1) Moderate skew is acceptable when the sample
size is about 30 or more.
2) Strong skew is acceptable when the sample size
is about 60 or more.

Example 7
New York is known as "the city that never sleeps." A
random sample of 25 New Yorkers was asked how
much sleep they get per night. Statistical summaries
of these data are shown below. Check the conditions
to determine if a t distribution is appropriate to use to
answer whether New Yorkers sleep more or less than
8 hours a night on average.

n    x̄      s     min   max
25   7.76   0.77  6.17  9.78
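The nearly normal check from the previous slides can be applied directly to these summaries; a sketch using the table's values:

```python
# Summary statistics from the sample of 25 New Yorkers (given above).
mean, s = 7.76, 0.77

z_min = (6.17 - mean) / s   # ≈ -2.06
z_max = (9.78 - mean) / s   # ≈  2.62
print(round(z_min, 2), round(z_max, 2))
# Neither z-score is beyond ±3, so there is no sign of a prominent outlier,
# and with a reasonable independence argument the t distribution is appropriate.
```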
