
Lecture notes 7a: hypothesis testing for a population mean

Throughout these notes, it will help to reference the Hypothesis testing quick reference guide handout. If you don't have this handout, you can download it from the course webpage.
Lecture notes 6 highlights:

Hypothesis testing outline


Example description
Setting up the null and alternative hypotheses
Two tailed vs. one tailed tests
The level of significance and critical value
The t-distribution
The test statistic and p-value
The statistical decision
Hypothesis testing outline
The main purpose of this class is to familiarize you
with the ways in which researchers use statistical
techniques to answer scientific questions.

A very common statistical technique for answering scientific questions is called hypothesis testing.

Hypothesis testing is an inferential procedure in which we test to see if we have sufficient evidence to reject a null hypothesis (H0) in favor of an alternative hypothesis (Ha).

These two hypotheses are meant to reflect the research hypothesis being tested.
We choose between H0 and Ha by computing a test
statistic from a set of data, which quantifies the
strength of our evidence against H0.

This statistic will follow a known sampling distribution, which in our examples will be the t-distribution. The t-distribution is very similar to the standard normal z-distribution, just more spread out for statistics based on small sample sizes.

Since we know the sampling distribution that our test statistic follows, we can calculate the probability that it would be a certain size if H0 were true. If we get a statistic that would be unlikely to occur if H0 were true, then we will reject H0 in favor of Ha.
The probability of obtaining a test statistic at least as large as the one we obtained, if the null hypothesis were true, is called the p-value. Small p-values are considered to be evidence against H0.

If our p-value is small enough, we reject H0; otherwise we fail to reject (FTR) H0.

We are able to compute p-values because we know the sampling distributions of our test statistics.

There is a field of statistics called non-parametric statistics which is not based on known distributions. We will not be studying these techniques in our class.
The hypothesis testing procedure can be performed in
4 steps. Note that how these steps are defined is
subjective; other instructors and textbooks will define
them differently, but the outcome will be the same.

1. Set up the null & alternative hypotheses
2. State the significance level and the corresponding critical value
3. Compute the test statistic & p-value
4. Make the statistical decision and interpret your results.
We will demonstrate this procedure using a worked
example.

In general, hypothesis tests can be used to draw inference on a wide variety of parameters. For now, we will just be drawing inference upon the population mean, which is denoted μ.

Later, we will test to see if two groups have means that differ from one another, and so we will be drawing inference on a difference in means, μ1 - μ2.

Each step of the procedure will be considered at length in the context of this worked example.
Hypothesis Test Example

The example in these notes is the same as the example in the previous set of notes. The difference is that in the previous notes we constructed a confidence interval, whereas in these notes we will perform a hypothesis test. We will then note how these two inferential techniques are related to one another.
Example setup
In many animal species, individuals communicate
through UV signals visible to one another but invisible
to humans. A scientist is interested in the role of UV
colors in the Lissotriton vulgaris newt, and conducted
a study that measured the difference in length of time
a female of the species spent near males with and
without the UV presence. A positive measurement
indicates that the female spent longer time with the
UV present, and a negative means less time under the
same conditions.

The average measurement from 23 trials is 50.7, with a standard deviation of 87.3.

Step 1: set up the null and alternative hypotheses

In hypothesis testing, we always make the null hypothesis (H0) the proposal we would like to reject.

This proposal can be stated as a comparison between some unknown parameter (such as a true population mean or proportion) and some hypothesized value.

If we reject H0, we do so in favor of the alternative hypothesis, Ha. Thus if you are trying to find evidence in favor of a proposal, that proposal goes in Ha and its opposite goes in H0.
Two tailed or one tailed test?
Your quick reference sheet contains three general
scenarios for a null and alternative hypothesis pair:
that of a two tailed test, left tailed test, and
right tailed test.

When setting up the null and alternative hypotheses, we must determine which of these three scenarios makes the most sense for the question at hand.

Note that in every case, the notion of equality goes in the null hypothesis and the notion of non-equality goes in the alternative hypothesis.
Two tailed tests
If the null hypothesis that we are trying to reject is
that a population parameter is equal to some value,
and the alternative hypothesis is that this population
parameter is not equal to this value, then the test is a
two tailed test.

In a two tailed test, we are willing to reject the null hypothesis if we find evidence that the unknown parameter is either less than or greater than the hypothesized value, and we do not specify ahead of time which it should be.

If the question is whether or not the true unknown parameter is different from or not equal to some specified value, then the test should be two tailed.
One tailed tests
Sometimes we are only interested in testing to see if
the true unknown parameter is greater than some
specified value, or less than some specified value, but
not in testing for both possibilities at once.

In this case, the test will be either left tailed (if our alternative hypothesis is that the parameter is less than the hypothesized value) or right tailed (if it is greater than the hypothesized value).

Some people say that we shouldn't make assumptions about whether the true value is less than or greater than the hypothesized value, and that all tests should be two tailed. I am sympathetic to this argument, but will consider both one and two tailed tests.
Example (step 1)
We would like to see if the true mean difference in
newt relationship time differs from what we would
expect to see if the UV presence made no difference.

When the parameter is a mean, we denote it μ. Also, since we are testing to see if the mean differs from some value, this will be a two tailed test.

We can now write the null and alternative hypotheses:

H0 : μ = 0
Ha : μ ≠ 0
Note that if we were only interested in testing to see if the population mean were greater than 0 (as opposed to different from 0), then this would have been a right tailed test, and we would have set it up like this:

H0 : μ ≤ 0
Ha : μ > 0
Step 2: state the significance level and the corresponding critical value

The significance level (α) is the probability we are willing to accept of rejecting a true null hypothesis. If this is not given to you, use α = 0.05, which is the most common value used in hypothesis testing.

The critical value is the value of the test statistic's sampling distribution under the null hypothesis (in this case, the t-distribution) such that α is the area remaining in the tail of this sampling distribution.

This concept is easier to understand visually:


The sampling distribution of the test statistic under H0
shows the distribution of values of the test statistic
that we would expect to obtain if H0 were true. This
distribution tells us which values of the test statistic
are likely to occur when H0 is true, and of course
which values are unlikely to occur when H0 is true.

The center of the distribution contains the values that are likely to occur. The tails of the distribution are where the values which are unlikely to occur lie.

For two tailed tests, we will reject H0 if we obtain a test statistic in either the left or right tail of the t-distribution. For one tailed tests, we will reject H0 only if we obtain a test statistic in the relevant tail.
The t-distribution
Your t-table gives you critical values based on the
t distribution for all of the most common levels of
significance.

When performing hypothesis tests for means, our test statistic follows a t distribution, and so our critical value will be in terms of t.

The t-distribution is more spread out when the sample size is smaller. The intuition behind this is that, while the z-distribution is based on a population standard deviation (σ), the t-distribution is based on an estimated sample standard deviation (s).
So, our test statistic will be based not only on the
mean of a random sample, but also on the standard
deviation of a random sample.

This sample standard deviation introduces an extra amount of natural variability in the possible values that our test statistic can take on.

When the sampling distribution of a statistic has more variability, it is more spread out. The distribution of the sample standard deviation will be more spread out for smaller sample sizes, and so the distribution of the test statistic (which has the sample standard deviation in its denominator) will be more spread out as well.
As a side note, the t-distribution is sometimes called Student's t-distribution.

The statistician who first discovered this distribution (William Gosset) worked for Guinness brewing at the time, and Guinness had a policy of not allowing its employees to publish work they had done for the company.

Gosset knew that his discovery would have broad scientific applicability, as it would allow researchers to draw statistical conclusions using relatively small sample sizes. He felt his work should be published, and so he published under the pseudonym "Student".
Example (Step 2)
The level of significance is given as α = 0.01.

We just noted that the smaller the size of the sample used to compute a t-distributed test statistic, the more spread out the t-distribution will be.

Specifically, the shape of a t-distribution depends on its degrees of freedom (df). Degrees of freedom roughly refers to how many observations used in the calculation of a statistic can be treated as random. This is a fairly complex notion which we will not explore in any depth.

When conducting a hypothesis test for a mean, we use df = n - 1.
So, we have a two-tailed t-test at α = 0.01 and, since n = 23, df = 23 - 1 = 22. According to the t-table, the critical value is 2.819.
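If no t-table is at hand, the critical value can be recovered numerically. Below is a rough stdlib-only Python sketch (not part of the course materials; on a TI-84 the invT() function does the same job): it approximates the upper-tail area of the t-distribution by Simpson's rule, then bisects for the t value whose tail area is α/2 = 0.005 at df = 22.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_tail(t, df, upper=60.0, n=4000):
    """P(T > t) via Simpson's rule on [t, upper]; the area beyond 60 is negligible."""
    h = (upper - t) / n
    s = t_pdf(t, df) + t_pdf(upper, df)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(t + i * h, df)
    return s * h / 3

def t_critical(tail_area, df):
    """Bisect for the t value whose upper-tail area equals tail_area."""
    lo, hi = 0.0, 20.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_tail(mid, df) > tail_area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# A two-tailed test at alpha = 0.01 puts 0.005 of area in each tail
print(round(t_critical(0.005, 22), 3))   # ≈ 2.819, matching the t-table
```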

We can sketch the distribution of the test statistic under H0 and label the critical values:
Let's also sketch what the critical value would have been if this had been a right tailed test:
Step 3: compute the test statistic and p-value

The test statistic tells us how much evidence we have against H0. The bigger the test statistic, the stronger the evidence. The test statistic we will use for now is:

t = (x̄ - μ0) / (s / √n)

where x̄ is the sample mean and μ0 is the hypothesized value of the population mean.

The p-value is the area in the tail(s) of the distribution beyond the test statistic. You need your calculator to find this.
Note that this statistic follows the general form:

t = (point estimate - hypothesized value) / (standard error of the point estimate)

Here, x̄ is the point estimate for the unknown population mean, μ0 is the hypothesized value of the unknown population mean, and s/√n is the standard error of the point estimate.

Conceptually, this statistic quantifies how far away our point estimate is from the hypothesized value, in terms of the standard amount by which we expect our point estimate to differ from the population parameter it is estimating (i.e. the natural variability of the estimate).
Example (step 3)
So, we can plug in the relevant values and compute our t-test statistic: t = (50.7 - 0) / (87.3 / √23) ≈ 2.79. Note that, for a two-tailed test, whatever we do on one side of the distribution we will also do on the other. So, we will consider both the positive and negative values of the test statistic when computing the p-value.
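As a check on the by-hand arithmetic, the same plug-in can be written out in Python (a sketch using the numbers from the example setup):

```python
import math

xbar, mu0 = 50.7, 0      # sample mean and hypothesized population mean
s, n = 87.3, 23          # sample standard deviation and sample size

se = s / math.sqrt(n)          # standard error of the mean, ≈ 18.20
t_stat = (xbar - mu0) / se     # t = (x̄ - μ0) / (s / √n)
print(round(t_stat, 2))        # 2.79
```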
We can then label this on the distribution of the test
statistic under H0 and shade the area corresponding to
the p-value:
Finally, we can compute the p-value using the tcdf() function on the calculator. It works exactly like the normalcdf() function, but you must also include df:

Area = tcdf(left endpoint, right endpoint, df)

Also, for a two-tailed test we double the area in one tail, since whatever is done on one side of the distribution must also be done on the other:
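For those without the calculator at hand, the same tail area can be approximated in Python with nothing but the standard library. This is a sketch, not the course's required method: t_tail(t, df) below plays the role of tcdf(t, 1E99, df), and the test statistic t ≈ 2.785 and df = 22 are taken from the example above.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_tail(t, df, upper=60.0, n=4000):
    """P(T > t) via Simpson's rule on [t, upper]; the area beyond 60 is negligible."""
    h = (upper - t) / n
    s = t_pdf(t, df) + t_pdf(upper, df)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(t + i * h, df)
    return s * h / 3

t_stat, df = 2.785, 22
p_one = t_tail(t_stat, df)   # area in one tail
p_two = 2 * p_one            # doubled for a two-tailed test
print(round(p_two, 3))       # ≈ 0.011
```

Since 0.011 is slightly larger than α = 0.01, this p-value leads to failing to reject H0 in Step 4.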
Let's now show what this calculation and picture would have looked like if this had been a right tailed test:

Step 4: Make the statistical decision and interpret your results

In this last step, we compare our p-value to our level of significance in order to decide whether or not to reject H0.

If p-value < α, reject H0
If p-value > α, fail to reject (FTR) H0

Once we have made the statistical decision, we
should state what our results mean in plain
English. In other words, relate the statistical
results back to the original question of interest.

If we reject H0, it is common to say that the parameter "differs significantly" or "is significantly different" from the hypothesized value.

If we FTR H0, it is common to say that the parameter "does not differ significantly" from the hypothesized value.

In general, results are significant if they are unlikely to come about as the result of chance alone.
Example (step 4)
In this example, we have:

α = 0.01 and p-value ≈ 0.011

And so the statistical decision is: since the p-value is greater than α, we fail to reject H0.

An English interpretation of this decision is: at the 0.01 level of significance, the average difference in time that females spent near males with and without UV present does not differ significantly from 0.


Conclusion
The example we went through demonstrates a test for a
single population mean. In practice, there are hundreds
(maybe thousands) of different kinds of hypothesis tests.
We will only look at a handful of them in this class.

What all hypothesis tests have in common is that they identify null and alternative hypotheses, and they utilize observed data to answer the question: how likely would it be to obtain results like this if the null hypothesis were true? All hypothesis tests result in a p-value.

In published research, you won't see the details of a hypothesis test outlined. But you will always see a p-value. An understanding of the p-value is the most important thing you can take away from this section.
The Relationship Between a CI and a Hypothesis Test
A confidence interval can be thought of as an
inverted two-tailed hypothesis test.

What is meant by this is that a CI contains all the possible null values that would result in failing to reject a null hypothesis.

This should make intuitive sense: if we consider a value for an unknown parameter plausible, then we would not want to reject it. However, if a value is outside the range of what we consider plausible, then we would want to reject it.
Note that this relationship holds for two-tailed hypothesis tests. You can see this on your t-table: the 95% confidence level column corresponds to a two-tailed test at α = 0.05; the 99% confidence level column corresponds to a two-tailed test at α = 0.01, and so on.

For a one-tailed hypothesis test, it is possible to reject a null value that falls inside a corresponding confidence interval.

Recall that we failed to reject the null hypothesis that the true mean UV newt relationship differential was equal to 0.

Do these results agree with the CI we constructed in the previous set of notes?
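The check can be sketched in Python (a sketch using the example's numbers, and assuming the 99% confidence level that matches a two-tailed test at α = 0.01, with t* = 2.819 taken from the t-table for df = 22):

```python
import math

xbar, s, n = 50.7, 87.3, 23        # sample mean, sd, and size from the example
t_star = 2.819                     # t-table value for 99% confidence, df = 22
se = s / math.sqrt(n)              # standard error of the mean

# 99% confidence interval: x̄ ± t* · s/√n
lo, hi = xbar - t_star * se, xbar + t_star * se
print(round(lo, 1), round(hi, 1))  # ≈ -0.6 and 102.0

# 0 lies inside the 99% CI, agreeing with the failure to reject
# H0: mu = 0 at alpha = 0.01
print(lo < 0 < hi)                 # True
```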
Further Inference
So far we have looked at inferential
techniques (hypothesis tests and
confidence intervals) for a single mean.
In the next set of notes, we will look at
inferential techniques for determining
if there is a difference between two
means.
