
Lecture notes 7a: hypothesis testing for a population mean

Throughout these notes, it will help to reference the Hypothesis testing quick reference guide handout. If you don't have this handout, you can download it from the course webpage.
Lecture notes 6 highlights:

Hypothesis testing outline


Example description
Setting up the null and alternative hypotheses
Two tailed vs. one tailed tests
The level of significance and critical value
The t-distribution
The test statistic and p-value
The statistical decision
Hypothesis testing outline
The main purpose of this class is to familiarize you
with the ways in which researchers use statistical
techniques to answer scientific questions.

A very common statistical technique for answering scientific questions is called hypothesis testing.

Hypothesis testing is an inferential procedure in which we test to see if we have sufficient evidence to reject a null hypothesis (H0) in favor of an alternative hypothesis (Ha).

These two hypotheses are meant to reflect the research hypothesis being tested.
We choose between H0 and Ha by computing a test
statistic from a set of data, which quantifies the
strength of our evidence against H0.

This statistic will follow a known sampling distribution, which in our examples will be the t-distribution. The t-distribution is very similar to the standard normal z-distribution, just more spread out for statistics based on small sample sizes.

Since we know the sampling distribution that our test statistic follows, we can calculate the probability that it would be a certain size if H0 were true. If we get a statistic that would be unlikely to occur if H0 were true, then we will reject H0 in favor of Ha.
The probability of obtaining a test statistic at least as large as the one we obtained, if the null hypothesis were true, is called the p-value. Small p-values are considered to be evidence against H0.

If our p-value is small enough, we reject H0; otherwise we fail to reject (FTR) H0.

We are able to compute p-values because we know the sampling distributions of our test statistics.

There is a field of statistics called non-parametric statistics which is not based on known distributions. We will not be studying these techniques in our class.
The hypothesis testing procedure can be performed in
4 steps. Note that how these steps are defined is
subjective; other instructors and textbooks will define
them differently, but the outcome will be the same.

1. Set up the null & alternative hypotheses
2. State the significance level and the corresponding critical value
3. Compute the test statistic & p-value
4. Make the statistical decision and interpret your results.
We will demonstrate this procedure using a worked
example.

In general, hypothesis tests can be used to draw inference on a wide variety of parameters. For now, we will just be drawing inference upon the population mean, which is denoted μ.

Later, we will test to see if two groups have means that differ from one another, and so we will be drawing inference on a difference in means, μ1 - μ2.

Each step of the procedure will be considered at length in the context of this worked example.
Hypothesis Test Example

The example in these notes is the same as the example in the previous set of notes. The difference is that in the previous notes we constructed a confidence interval, whereas in these notes we will perform a hypothesis test. We will then note how these two inferential techniques are related to one another.
Example setup
In many animal species, individuals communicate
through UV signals visible to one another but invisible
to humans. A scientist is interested in the role of UV
colors in the Lissotriton vulgaris newt, and conducted
a study that measured the difference in length of time
a female of the species spent near males with and
without the UV presence. A positive measurement
indicates that the female spent longer time with the
UV present, and a negative means less time under the
same conditions.

The average measurement from 23 trials is 50.7, with a standard deviation of 87.3.

Step 1: set up the null and alternative hypotheses

In hypothesis testing, we always make the null hypothesis (H0) the proposal we would like to reject.

This proposal can be stated as a comparison between some unknown parameter (such as a true population mean or proportion) and some hypothesized value.

If we reject H0, we do so in favor of the alternative hypothesis, Ha. Thus if you are trying to find evidence in favor of a proposal, that proposal goes in Ha and its opposite goes in H0.
Two tailed or one tailed test?
Your quick reference sheet contains three general
scenarios for a null and alternative hypothesis pair:
that of a two tailed test, left tailed test, and
right tailed test.

When setting up the null and alternative hypotheses, we must determine which of these three scenarios makes the most sense for the question at hand.

Note that in every case, the notion of equality goes in the null hypothesis and the notion of non-equality goes in the alternative hypothesis.
Two tailed tests
If the null hypothesis that we are trying to reject is
that a population parameter is equal to some value,
and the alternative hypothesis is that this population
parameter is not equal to this value, then the test is a
two tailed test.

In a two tailed test, we are willing to reject the null hypothesis if we find evidence that the unknown parameter is either less than or greater than the hypothesized value, and we do not specify ahead of time which it should be.

If the question is whether or not the true unknown parameter is different from or not equal to some specified value, then the test should be two tailed.
One tailed tests
Sometimes we are only interested in testing to see if
the true unknown parameter is greater than some
specified value, or less than some specified value, but
not in testing for both possibilities at once.

In this case, the test will be either left tailed (if our alternative hypothesis is that the parameter is less than the hypothesized value) or right tailed (if it is greater than the hypothesized value).

Some people say that we shouldn't make assumptions about whether the true value is less than or greater than the hypothesized value, and that all tests should be two tailed. I am sympathetic to this argument, but will consider both one and two tailed tests.
Example (step 1)
We would like to see if the true mean difference in
newt relationship time differs from what we would
expect to see if the UV presence made no difference.

When the parameter is a mean, we denote it μ. Also, since we are testing to see if the mean differs from some value, this will be a two tailed test.

We can now write the null and alternative hypotheses:

H0 : μ = 0
Ha : μ ≠ 0
Note that if we were only interested in testing to see if the population mean were greater than 0 (as opposed to different from 0), then this would have been a right tailed test, and we would have set it up like this:

H0 : μ ≤ 0
Ha : μ > 0
Step 2: state the significance level and the corresponding critical value

The significance level (α) is the probability we are willing to accept of rejecting a true null hypothesis. If this is not given to you, use α = 0.05, which is the most common value used in hypothesis testing.

The critical value is the value of the test statistic's sampling distribution under the null hypothesis (in this case, the t-distribution) such that α is the area remaining in the tail of this sampling distribution.

This concept is easier to understand visually:


The sampling distribution of the test statistic under H0
shows the distribution of values of the test statistic
that we would expect to obtain if H0 were true. This
distribution tells us which values of the test statistic
are likely to occur when H0 is true, and of course
which values are unlikely to occur when H0 is true.

The center of the distribution contains the values that are likely to occur. The tails of the distribution are where the values which are unlikely to occur lie.

For two tailed tests, we will reject H0 if we obtain a test statistic in either the left or right tail of the t-distribution. For one tailed tests, we will reject H0 only if we obtain a test statistic in the relevant tail.
The t-distribution
Your t-table gives you critical values based on the
t distribution for all of the most common levels of
significance.

When performing hypothesis tests for means, our test statistic follows a t distribution, and so our critical value will be in terms of t.

The t-distribution is more spread out when the sample size is smaller. The intuition behind this is that, while the z-distribution is based on a population standard deviation (σ), the t-distribution is based on an estimated sample standard deviation (s).
So, our test statistic will be based not only on the
mean of a random sample, but also on the standard
deviation of a random sample.

This sample standard deviation introduces an extra amount of natural variability in the possible values that our test statistic can take on.

When the sampling distribution of a statistic has more variability, it is more spread out. The distribution of the sample standard deviation will be more spread out for smaller sample sizes, and so the distribution of the test statistic (which has the sample standard deviation in its denominator) will be more spread out as well.
As a side note, the t-distribution is sometimes called Student's t-distribution.

The statistician who first discovered this distribution (William Gosset) worked for Guinness brewing at the time, and Guinness had a policy of not allowing its employees to publish work they had done for the company.

Gosset knew that his discovery would have broad scientific applicability, as it would allow researchers to draw statistical conclusions using relatively small sample sizes. He felt his work should be published, and so he published under the pseudonym "Student".
Example (Step 2)
The level of significance is given as α = 0.01.

We just noted that the smaller the size of the sample used to compute a t-distributed test statistic, the more spread out the t-distribution will be.

Specifically, the shape of a t-distribution depends on its degrees of freedom (df). Degrees of freedom roughly refers to how many observations used in the calculation of a statistic can be treated as random. This is a fairly complex notion which we will not explore in any depth.

When conducting a hypothesis test for a mean, we use df = n - 1.
So, we have a two-tailed t-test at α = 0.01 and, since n = 23, df = 23 - 1 = 22. According to the t-table, the critical value is 2.819.
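If no t-table is at hand, the critical value can be recovered numerically. Below is a rough stdlib-only Python sketch (not part of the course materials; on a TI-84 the invT() function does the same job): it approximates the upper-tail area of the t-distribution by Simpson's rule, then bisects for the t value whose tail area is α/2 = 0.005 at df = 22.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_tail(t, df, upper=60.0, n=4000):
    """P(T > t) via Simpson's rule on [t, upper]; the area beyond 60 is negligible."""
    h = (upper - t) / n
    s = t_pdf(t, df) + t_pdf(upper, df)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(t + i * h, df)
    return s * h / 3

def t_critical(tail_area, df):
    """Bisect for the t value whose upper-tail area equals tail_area."""
    lo, hi = 0.0, 20.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_tail(mid, df) > tail_area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# A two-tailed test at alpha = 0.01 puts 0.005 of area in each tail
print(round(t_critical(0.005, 22), 3))   # ≈ 2.819, matching the t-table
```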

We can sketch the distribution of the test statistic under H0 and label the critical values:
Let's also sketch what the critical value would have been if this had been a right tailed test:
Step 3: compute the test statistic and p-value

The test statistic tells us how much evidence we have against H0. The bigger the test statistic, the stronger the evidence. The test statistic we will use for now is:

t = (x̄ - μ0) / (s / √n)

where x̄ is the sample mean and μ0 is the hypothesized value of the population mean.

The p-value is the area in the tail(s) of the distribution beyond the test statistic. You need your calculator to find this.
Note that this statistic follows the general form:

t = (point estimate - hypothesized value) / (standard error of the point estimate)

Here, x̄ is the point estimate for the unknown population mean, μ0 is the hypothesized value of the unknown population mean, and s/√n is the standard error of the point estimate.

Conceptually, this statistic quantifies how far away our point estimate is from the hypothesized value, in terms of the standard amount by which we expect our point estimate to differ from the population parameter it is estimating (i.e. the natural variability of the estimate).
Example (step 3)
So, we can plug in the relevant values and compute our t-test statistic: t = (50.7 - 0) / (87.3 / √23) ≈ 2.79. Note that, for a two-tailed test, whatever we do on one side of the distribution we will also do on the other. So, we will consider both the positive and negative values of the test statistic when computing the p-value.
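As a check on the by-hand arithmetic, the same plug-in can be written out in Python (a sketch using the numbers from the example setup):

```python
import math

xbar, mu0 = 50.7, 0      # sample mean and hypothesized population mean
s, n = 87.3, 23          # sample standard deviation and sample size

se = s / math.sqrt(n)          # standard error of the mean, ≈ 18.20
t_stat = (xbar - mu0) / se     # t = (x̄ - μ0) / (s / √n)
print(round(t_stat, 2))        # 2.79
```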
We can then label this on the distribution of the test
statistic under H0 and shade the area corresponding to
the p-value:
Finally, we can compute the p-value using the tcdf() function on the calculator. It works exactly like the normalcdf() function, but you must also include df:

Area = tcdf(left endpoint, right endpoint, df)

Also, for a two-tailed test we double the area in one tail, since whatever is done on one side of the distribution must also be done on the other:
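For those without the calculator at hand, the same tail area can be approximated in Python with nothing but the standard library. This is a sketch, not the course's required method: t_tail(t, df) below plays the role of tcdf(t, 1E99, df), and the test statistic t ≈ 2.785 and df = 22 are taken from the example above.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_tail(t, df, upper=60.0, n=4000):
    """P(T > t) via Simpson's rule on [t, upper]; the area beyond 60 is negligible."""
    h = (upper - t) / n
    s = t_pdf(t, df) + t_pdf(upper, df)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(t + i * h, df)
    return s * h / 3

t_stat, df = 2.785, 22
p_one = t_tail(t_stat, df)   # area in one tail
p_two = 2 * p_one            # doubled for a two-tailed test
print(round(p_two, 3))       # ≈ 0.011
```

Since 0.011 is slightly larger than α = 0.01, this p-value leads to failing to reject H0 in Step 4.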
Let's now show what this calculation and picture would have looked like if this had been a right tailed test:

Step 4: Make the statistical decision and interpret your results

In this last step, we compare our p-value to our level of significance in order to decide whether or not to reject H0.

If p-value < α, reject H0
If p-value > α, fail to reject (FTR) H0

Once we have made the statistical decision, we
should state what our results mean in plain
English. In other words, relate the statistical
results back to the original question of interest.

If we reject H0, it is common to say that the parameter "differs significantly" or "is significantly different" from the hypothesized value.

If we FTR H0, it is common to say that the parameter "does not differ significantly" from the hypothesized value.

In general, results are significant if they are unlikely to come about as the result of chance alone.
Example (step 4)
In this example, we have:

α = 0.01 and p-value ≈ 0.011

And so the statistical decision is: since the p-value is greater than α, we fail to reject H0.

An English interpretation of this decision is: at the 0.01 level of significance, the average difference in time that females spent near males with and without UV present does not differ significantly from 0.


Conclusion
The example we went through demonstrates a test for a
single population mean. In practice, there are hundreds
(maybe thousands) of different kinds of hypothesis tests.
We will only look at a handful of them in this class.

What all hypothesis tests have in common is that they identify null and alternative hypotheses, and they utilize observed data to answer the question: how likely would it be to obtain results like this if the null hypothesis were true? All hypothesis tests result in a p-value.

In published research, you won't see the details of a hypothesis test outlined. But you will always see a p-value. An understanding of the p-value is the most important thing you can take away from this section.
The Relationship Between a CI and a Hypothesis Test
A confidence interval can be thought of as an
inverted two-tailed hypothesis test.

What is meant by this is that a CI contains all the possible null values that would result in failing to reject a null hypothesis.

This should make intuitive sense: if we consider a value for an unknown parameter plausible, then we would not want to reject it. However, if a value is outside the range of what we consider plausible, then we would want to reject it.
Note that this relationship holds for two-tailed hypothesis tests. You can see this on your t-table: the 95% confidence level column corresponds to a two-tailed test at α = 0.05; the 99% confidence level column corresponds to a two-tailed test at α = 0.01, and so on.

For a one-tailed hypothesis test, it is possible to reject a null value that falls inside a corresponding confidence interval.

Recall that we failed to reject the null hypothesis that the true mean UV newt relationship differential was equal to 0.

Do these results agree with the CI we constructed in the previous set of notes?
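The check can be sketched in Python (a sketch using the example's numbers, and assuming the 99% confidence level that matches a two-tailed test at α = 0.01, with t* = 2.819 taken from the t-table for df = 22):

```python
import math

xbar, s, n = 50.7, 87.3, 23        # sample mean, sd, and size from the example
t_star = 2.819                     # t-table value for 99% confidence, df = 22
se = s / math.sqrt(n)              # standard error of the mean

# 99% confidence interval: x̄ ± t* · s/√n
lo, hi = xbar - t_star * se, xbar + t_star * se
print(round(lo, 1), round(hi, 1))  # ≈ -0.6 and 102.0

# 0 lies inside the 99% CI, agreeing with the failure to reject
# H0: mu = 0 at alpha = 0.01
print(lo < 0 < hi)                 # True
```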
Further Inference
So far we have looked at inferential
techniques (hypothesis tests and
confidence intervals) for a single mean.
In the next set of notes, we will look at
inferential techniques for determining
if there is a difference between two
means.
