
Statistical Significance

Overview
When we generate a new statistic from a
sample, all we have is an estimate of the
population.
Our audience wants to know: how
confident can we be that this statistic
accurately reflects the population?
Fortunately, statistics can also be used to
measure the likelihood that our sample
statistics resemble the population.

The Logic of Inferential Statistics
We don't know for certain that our sample
statistic is a true reflection of the population;
by chance, we could have chosen an unusual
sample.
The central limit theorem allows us to estimate
the likelihood that our findings are due to
chance, given:
How different the value is from the value given
in our null hypothesis
Our sample size
The variation of scores in the distribution

Overview of Terminology
Descriptive Statistic: a statistic calculated to
describe the sample or population, such as a
mean, proportion, or correlation coefficient.
Inferential Statistic: a statistic calculated to
assess how accurately a sample statistic
matches the true population parameter.
Can we infer that the population parameter is
the same as the sample statistic?

Parameter: a statistic, but exclusively used
when talking about the population.

Three Distributions for Inferential Statistics
Three Types of Distributions:
1. Population Distribution: the distribution of all
values in the entire population.
In most instances, this is inaccessible; we seek to make
inferences about the population based on our sample.
2. Sample Distribution: the distribution of all values
in the sample we've drawn.
This is the only distribution we (typically) can actually
measure.
3. Sampling Distribution: the distribution of many
means drawn from many samples.
Hypothetical. The basis for the Central Limit Theorem.

Sampling Distribution
Hypothetical distribution of all sample
means in the population.
How is it created?
Take a sample from the population. Calculate
the mean.
Repeat many, many times.
Plot all of those means in a frequency
distribution graph, where:
The x-axis indicates, from left to right, the lowest to
the highest mean values
The y-axis indicates, from bottom to top, the lowest
to the highest frequencies of each mean.
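The three steps above can be sketched in a short simulation. (Python here is purely illustrative; the course tool is SPSS, and the population shape, sample size, and number of samples below are made-up values.)

```python
# Simulate a sampling distribution: draw many samples from a clearly
# non-normal (exponential) population, record each sample mean, and
# inspect the distribution of those means.
import random
import statistics

random.seed(42)
POP_MEAN = 10.0          # exponential with mean 10; its SD is also 10
SAMPLE_SIZE = 50
NUM_SAMPLES = 5000

sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [random.expovariate(1 / POP_MEAN) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.mean(sample))  # one point in the sampling distribution

# The sampling distribution centers on the population mean, and its
# spread (the standard error) is roughly sigma / sqrt(n) = 10 / sqrt(50),
# far smaller than the population standard deviation of 10.
print(round(statistics.mean(sample_means), 1))
print(round(statistics.stdev(sample_means), 2))
```

Plotting `sample_means` as a histogram would show the roughly normal, bell-shaped curve described on the next slide.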

The Sampling Distribution

When we take many sample means and plot them on a graph, the sample
means that are most different from the population mean are least likely;
the sample means that are most similar to the population mean are most
likely; and the distribution about the population mean is normal.

Sampling Distribution
The sampling distribution is normally distributed,
even if the population distribution is not.
The center of the sampling distribution is the
population mean.
Because it is normally distributed, we can apply
the empirical rule to determine the proportion of
values above or below any one mean.
We can therefore estimate the probability that
the population mean is different from the sample
mean based on the proportions under the curve.

Empirical Rule

Standard Error
The standard error is the standard deviation
of the sampling distribution.
Because the sampling distribution is based
on sample means, not individual scores, it
has less variability than the distribution of
individual scores.
Because inferential statistics are based on
the sampling distribution, the standard error
is used instead of the standard deviation.
Formula: σx̄ = σ / √n

Central Limit Theorem
Central Limit Theorem: The sampling
distribution of any variable will be
approximately normal, given a
sufficiently large sample size.
This does not mean that the population
is normally distributed.
It does mean that we can use the
normal curve to estimate the probability
of obtaining a particular sample mean.

Symbols for Three Different Distributions

Measure | Sample | Population | Sampling Distribution
Mean | x̄ | μ | μx̄
Standard Deviation | s | σ | σx̄ (the standard error)

The sample distribution is the distribution of values for the
sample you're measuring; the population distribution, for the
population you're making inferences about; and the sampling
distribution, for the hypothetical distribution of means drawn
from the population in repeated sampling.
We use the Greek letter mu (μ) to refer to the population
mean, and the Greek letter sigma (σ) to refer to the
population standard deviation.
Going forward, descriptive statistics will often have a sample
symbol and a corresponding Greek letter symbolizing the
population value.

Hypothesis Test
A hypothesis is a statement about some
characteristic of a variable or a collection of
variables.
A significance test evaluates the hypothesis
by comparing the values predicted by the
hypothesis to the values we find in our data:
Data that fall far from what the hypothesis
predicts suggest that the hypothesis is false;
Data that are close to what the hypothesis
predicts suggest that the hypothesis is true.

Hypothesis Test
Hypothesis Testing is a procedure for
measuring a hypothesis against data.
The Central Limit Theorem tells us that in any
random sample, our sample statistic is most
likely to be close to the population parameter;
and it is less likely that it will be very different
from the population parameter.
Using the normal distribution, we can estimate
the probability of obtaining our sample statistic
if the population parameter suggested by a
hypothesis were true.

General Overview of the Hypothesis Test
1. Null Hypothesis. (H0)
The hypothesis we are trying to disprove.
The opposite of the alternate hypothesis.

2. Alternate Hypothesis. (HA)
Typically, the hypothesis that some difference, or
association, measured in our sample, is also true
in the population.
It may be easier to state this first; then simply
state its opposite as the null hypothesis.
In hypothesis testing, we do not actually prove
anything; we disprove something, and in doing so,
give support to our prediction.

General Overview of the Hypothesis Test
3. Descriptive Statistics.
Our sample statistic: the mean, Pearson
correlation, V, Gamma, Regression coefficients, etc.

4. Test Statistic.
A statistic that estimates, given the standard error
and the assumption of normality, how different our
sample finding is from what we'd expect to find if
the null hypothesis were true.
For means and Pearson correlations: t. For ANOVA
and regression: F. For nonparametric association:
chi-squared.

General Overview of the Hypothesis Test
5. Conclusion: We either reject H0 or fail to
reject H0.
a. We decide on the alpha-level.
Alpha-level: the maximum probability of
rejecting a true H0 that we are willing
to allow.
In this class, assume that the alpha level
will be .05 unless otherwise directed.

General Overview of the Hypothesis Test
5. Conclusion: We either reject H0 or fail to reject H0.
SPSS provides the p-value, usually
called Sig. or something similar,
which is the probability of obtaining our
result (or one more extreme) if H0 is true.
We compare the p-value to the pre-selected
alpha-level.
If p is less than alpha, we reject H0.
If p is greater than alpha, we fail to reject
H0.
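The decision rule above is a simple comparison. A minimal sketch (the function name and labels are ours, not SPSS output):

```python
# Compare the p-value SPSS reports against the pre-selected alpha level.
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the hypothesis-test conclusion for a given p-value."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

print(decide(0.013))  # p < .05, so we reject H0
print(decide(0.27))   # p > .05, so we fail to reject H0
```

Note that the default alpha of .05 matches the convention assumed in this class; passing a different `alpha` models the stricter or looser thresholds discussed later.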

Ex: Hypothesis Test for a Single Mean
Here is a small survey asking people how
much they spend on groceries in a month.
Your boss suggests they spend $400.00 per
month.
Perform a hypothesis test that the true
population mean is not $400.00 per month.

Ask Yourself, before beginning:
1. What statistic is being asked for
here?
2. Is this a test of a single mean, or a
comparison of means?
3. Is this a one-tailed or two-tailed
test?

Ask Yourself, before beginning:
1. What statistic is being asked for here?
We're testing an assertion about a mean of
$400. So we are looking for a mean.
We call the mean suggested in the question
the null hypothesis mean, symbolized as μ0.
2. Is this a test of a single mean, or a
comparison of means?
Only one mean is given, and the test is
whether the true mean is *not* that
number. So it is a test of a single mean.

Ask Yourself, before beginning:
3. Is this a one-tailed or two-tailed test?
A two-tailed test asks only whether the true
population mean is different from μ0.
Look for wording like "not," "not equal to," or
"different from" to identify a two-tailed test.
A one-tailed test specifically asks whether the
true population mean is greater than or less
than μ0.
Look for wording like "greater than," "more
than," or "increased" to identify a one-tailed
test.

Ex: Hypothesis Test for a Single Mean Using SPSS
Here is the data in SPSS Data View.

Ex: Hypothesis Test for a Single Mean Using SPSS
1. State Null Hypothesis.
H0: μ = $400.00
2. State Alternate Hypothesis.
HA: μ ≠ $400.00
The hypotheses you're testing are the same as before.

Ex: Hypothesis Test for a Single Mean Using SPSS
3. Find Descriptive Statistics.
4. Find Test Statistic.
With SPSS, we can do #3 and #4 at once: go to
Analyze → Compare Means → One-Sample T Test.

Ex: Hypothesis Test for a Single Mean Using SPSS
The variable ExpPerMonth starts off in the
left-hand box; I've selected it by moving it to
the right. Under test value, I enter 400; this
is μ0. Then click Options.
The Confidence Interval Percentage is just
100% − alpha. The alpha level for a two-tailed
test should be .05 (or 5%), so we should set
this at 95%. Click Continue and OK.

Ex: Hypothesis Test for a Single Mean Using SPSS
One-Sample Statistics gives us the descriptive
statistics, and they are roughly the same as
before, albeit more precise.
One-Sample Test gives us t, df, and the p-value
(here called Sig. (2-tailed)): the probability of
a result this extreme if H0 is true.
If the p-value is less than .05, we reject H0. It is, so
our conclusion is to reject H0.

Ex: Hypothesis Test for a Single Mean
#5 Conclusion:
We reject H0.
We do not know for certain that HA
is true; but we are 95% confident
that H0 is false.
We are 95% confident that the true
mean is not $400.00.
We can say that the difference
between x̄ and μ0 is statistically
significant.
What is statistical
significance?
1. An assessment of the population based
on sample data.
A population parameter is always
statistically significant.

2. It is relative.
A single number on its own is never
statistically significant.
We find either that the difference between
two numbers is statistically significant, or
that the association between two variables
is statistically significant.

What is statistical
significance?
3. It is determined by:
Dispersion: the greater the spread
of scores, the less likely we are to
find statistical significance.
Sample size: the larger the sample,
the more likely we are to find
statistical significance.
Our arbitrary determination of
appropriate alpha level.

What is statistical
significance?
We could hypothetically choose a
higher or lower alpha level, but there
are consequences:
If we increase alpha (say, to .10), we
are more likely to find statistical
significance, but we increase the risk
of a Type I error.
If we decrease alpha (say, to .01), we
are less likely to find statistical
significance, but we increase the risk
of a Type II error.

What is statistical
significance?

Type I Error: Rejecting the null hypothesis when the
null hypothesis is true.
Our finding of a statistically significant difference, or
association, is not true in the population.
From our example: Our hypothesis test tells us that the
true mean is not $400; but it turns out that it is $400.
Type II Error: Accepting the null hypothesis when it is
false.
There is a statistically significant difference, or
association, in the population, but our hypothesis test said
there was not.
From our example: Our hypothesis test told us that the
true mean was $400; but it turns out that it was not.
Since we usually can't measure the population, we rarely
know if we've made a Type I or Type II error.

Type I and Type II Errors

Our Conclusion | H0 is True | H0 is False
Reject H0 | Type I Error | Correct Decision
Accept H0 | Correct Decision | Type II Error

The probability of making a Type I Error if we reject H0 is the
p-value. We do not calculate the p-value by hand (but SPSS does).
The maximum probability of making a Type I Error that we are
willing to allow is the alpha level.
The probability of making a Type II Error is called the beta level.
Power, the probability of correctly rejecting a false H0, is 1 − beta.
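The alpha level really is a Type I error rate. A quick hypothetical simulation (a z test with a known population SD, so no t table is needed): when H0 is actually true, testing at alpha = .05 produces a "significant" result in about 5% of samples.

```python
# Simulate repeated sampling under a TRUE null hypothesis and count how
# often we wrongly reject it (Type I errors). Values are made up.
import math
import random

random.seed(7)
MU0, SIGMA, N, REPS = 400.0, 80.0, 25, 10_000
Z_CRIT = 1.96                     # two-tailed critical z for alpha = .05

false_rejections = 0
for _ in range(REPS):
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]  # H0 is true here
    z = (sum(sample) / N - MU0) / (SIGMA / math.sqrt(N))
    if abs(z) > Z_CRIT:
        false_rejections += 1     # a Type I error

rate = false_rejections / REPS
print(rate)                       # close to alpha = .05
```

Raising alpha to .10 would roughly double this false-rejection rate, which is exactly the trade-off described on the previous slide.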

Ex: Hypothesis Test for a Comparison of Means (Independent Samples)

Rec | Gender | Expenditure Per Month, February | Expenditure Per Month, March
1 | M | 513 | 575
2 | M | 467 | 450
3 | F | 298 | 373
4 | F | 494 | 525
5 | F | 367 | 389
6 | M | 621 | 664
7 | F | 404 | 390
8 | M | 533 | 610
9 | F | 379 | 449
10 | M | 513 | 517
11 | M | 440 | 481
12 | M | 475 | 455

Your boss suggests that spending in March is equal
among men and women. Perform a hypothesis test
that the true population mean for monthly
expenditure on groceries is greater among men.
Note: here you will be comparing values of
Expenditure Per Month, March for people with
gender M with values of Expenditure Per Month,
March for people with gender F. So, you'll take
the mean for records 1, 2, 6, 8, 10, 11,
and 12; and compare with the mean for
records 3, 4, 5, 7, and 9.

Ask Yourself, before beginning:
1. Is this a test of a single mean, or a comparison of
means?
Comparison of means.
2. Is this a one-tailed or two-tailed test?
One-tailed test: the question is specifically concerned
with whether male expenditure is greater than female.
3. Is this a paired samples comparison of means, or
an independent samples comparison of means?
Independent samples: we are comparing means drawn
from one variable (Expenditure in March) according to
different values of another (Gender).

Ex: Hypothesis Test for a Comparison of Means (Independent Samples) Using SPSS
1. State Null Hypothesis.
H0: μM = μF
2. State Alternate Hypothesis.
HA: μM > μF
The hypotheses you're testing are the same as before.

Ex: Hypothesis Test for a Comparison of Means (Independent Samples) Using SPSS
Again, we can do #3 (find descriptive statistics)
and #4 (find test statistic) at once: go to
Analyze → Compare Means → Independent-Samples T Test.

Ex: Hypothesis Test for a Comparison of Means (Independent Samples) Using SPSS
We calculate means for the test variable, and
compare them on different values of the grouping
variable. So, the test variable is MarExp, and
the grouping variable is gender.
Click Define Groups, and another box appears.
Group 1 should be the value expected to be larger.
Group 2 should be the value expected to be smaller.
Here, I indicate that Men are Group 1 and Women
are Group 2. Click Continue, then OK.

Ex: Hypothesis Test for a Comparison of Means (Independent Samples) Using SPSS
Group Statistics summarizes basic descriptive
statistics for each group.
Under the Independent Samples Test, look under
Equal variances not assumed. The hypothesis test
results are provided. Note that the mean difference
is positive; if it were negative (with Men assigned
to Group 1, as directed), we would fail to reject H0
no matter what, because the difference would run in
the wrong direction (when HA specified that Group 1's
mean is greater).
Ex: Hypothesis Test for a Comparison of Means (Independent Samples)
#5 Conclusion:
We reject H0.
We do not know for certain that HA
is true; but we are 95% confident
that H0 is false.
We can say that the difference
between x̄M and x̄F is statistically
significant.
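The "Equal variances not assumed" row in SPSS is a Welch t test. A sketch by hand in Python (illustrative only), using the March values from the table, with records 1, 2, 6, 8, 10, 11, 12 as the men and 3, 4, 5, 7, 9 as the women per the note:

```python
# Welch (unequal-variance) t statistic for two independent groups:
# difference in means divided by the unpooled standard error.
import math
import statistics

men_march = [575, 450, 664, 610, 517, 481, 455]   # records 1,2,6,8,10,11,12
women_march = [373, 525, 389, 390, 449]           # records 3,4,5,7,9

def welch_t(a, b):
    """Welch t: (mean(a) - mean(b)) / sqrt(var(a)/n_a + var(b)/n_b)."""
    var_a = statistics.variance(a) / len(a)
    var_b = statistics.variance(b) / len(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(var_a + var_b)

t = welch_t(men_march, women_march)
print(round(t, 2))   # positive: the men's mean is higher, as HA predicted
```

A positive t in the direction HA predicted, large relative to its degrees of freedom, is what leads SPSS to a small Sig. value and to rejecting H0.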

Ex: Hypothesis Test for a Comparison of Means (Paired Samples)

Rec | Gender | Expenditure Per Month, February | Expenditure Per Month, March
1 | M | 513 | 575
2 | M | 467 | 450
3 | F | 298 | 373
4 | F | 494 | 525
5 | F | 367 | 389
6 | M | 621 | 664
7 | F | 404 | 390
8 | M | 533 | 610
9 | F | 379 | 449
10 | M | 513 | 517
11 | M | 440 | 481
12 | M | 475 | 455

Here are results of a survey on monthly spending on
groceries you ran first in February, and again in
March.
Your boss suggests that spending has not increased
from February to March. Perform a hypothesis test
that the true population mean for monthly
expenditure on groceries has increased.

Ask Yourself, before beginning:
1. Is this a test of a single mean, or a
comparison of means?
2. Is this a one-tailed or two-tailed
test?
3. Is this a paired samples comparison
of means, or an independent
samples comparison of means?

Ask Yourself, before beginning:
1. Is this a test of a single mean, or a
comparison of means?
Two means are given, and the test is whether
the true mean has increased. So it is a
comparison of means.
2. Is this a one-tailed or two-tailed test?
The question asks about an *increase*.
I.e., we are testing not just whether the means
for these two months are different, but whether
one mean is specifically greater than the other.
This is a one-tailed test.

Ask Yourself, before beginning:
3. Is this a paired samples comparison of
means, or an independent samples
comparison of means?
Independent samples: comparison of means
for two different values of another variable.
Paired samples: comparison of means for two
variables, using all cases that have valid data.
Here: we're comparing one variable (Exp Per
Month in Feb) to another (Exp Per Month in
March), so this is a paired samples analysis.

Ex: Hypothesis Test for a Comparison of Means (Paired Samples) Using SPSS
1. State Null Hypothesis.
H0: μMar = μFeb
2. State Alternate Hypothesis.
HA: μMar > μFeb
The hypotheses you're testing are the same as before.

Ex: Hypothesis Test for a Comparison of Means (Paired Samples) Using SPSS
Once again, we can do #3 (find descriptive
statistics) and #4 (find test statistic) at once:
go to Analyze → Compare Means → Paired-Samples T Test.

Ex: Hypothesis Test for a Comparison of Means (Paired Samples) Using SPSS
Select the variables FebExp and MarExp.
Click OK.

Ex: Hypothesis Test for a Comparison of Means (Paired Samples) Using SPSS
Paired Samples Statistics gives us the descriptive
statistics, and they are roughly the same as
before, albeit more precise.
Paired Samples Test gives us t, df, and the p-value
(here called Sig. (2-tailed)): the probability of
a result this extreme if H0 is true.
If the p-value is less than .05, we reject H0. It is, so
our conclusion is to reject H0.

Ex: Hypothesis Test for a Comparison of Means (Paired Samples) Using SPSS
#5 Conclusion:
SPSS only provides a p-value for a two-tailed
test (the p-value, Sig. (2-tailed), is .013).
Divide this p-value by 2 to get the appropriate
p-value for a one-tailed test (.0065).
It is less than .05, so we reject H0.
We do not know for certain that HA is true; but
we are 95% confident that H0 is false.
We can say that the difference between x̄Feb and
x̄Mar is statistically significant.
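The paired t test works on the within-record differences. A sketch by hand in Python (illustrative only), with the February and March columns from the table:

```python
# Paired t test: compute March - February for each record, then run a
# one-sample t test on those differences against zero.
import math
import statistics

feb = [513, 467, 298, 494, 367, 621, 404, 533, 379, 513, 440, 475]
mar = [575, 450, 373, 525, 389, 664, 390, 610, 449, 517, 481, 455]

diffs = [m - f for m, f in zip(mar, feb)]   # March minus February
n = len(diffs)
t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# With df = 11, this t corresponds to the slide's two-tailed Sig. of .013,
# which we halve for the one-tailed question of an increase (.0065 < .05).
print(round(t, 2))
```

A positive mean difference with a large t is exactly why the conclusion above is to reject H0 in favor of an increase.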

ANOVA
ANOVA allows you to compare the means of
several groups.
ANOVA provides a test against the H0 that three
or more group means are all equal.
Commonly used for experiments, where we
might compare:
A new treatment vs. a placebo treatment vs. no
treatment, or
A new treatment vs. an old treatment vs. no
treatment.
"Treatment" here refers to a subset of a sample, or
a group sharing a common characteristic.

H0 and HA
We wish to determine whether there are
any statistically significant
differences between treatment
means (μ1, μ2, μ3, etc.)
H0: μ1 = μ2 = μ3, etc.
HA: At least two of μ1, μ2, etc. differ.
We seek to show that at least one
pair of means differs significantly.

Variability
Between-Treatment Variability:
variability among the sample means.
Within-Treatment Variability:
variability among cases in each
sample.
ANOVA tests whether between-treatment
variability is significantly greater
than within-treatment variability.
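The between/within comparison above is exactly what the F statistic computes. A minimal sketch with hypothetical groups (not the slides' GPA data):

```python
# One-way ANOVA F statistic by hand for three made-up "treatment" groups.
import statistics

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
all_values = [x for g in groups for x in g]
grand_mean = statistics.mean(all_values)

# Between-treatment variability: how far each group mean sits from the grand mean.
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
# Within-treatment variability: how far cases fall from their own group mean.
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
F = (ss_between / df_between) / (ss_within / df_within)
print(round(F, 1))   # a large F: between-group spread dwarfs within-group spread
```

When F is large relative to its degrees of freedom, the p-value is small and at least one pair of means differs significantly.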

Ex: ANOVA with SPSS
ANOVA can be performed much more
quickly using SPSS. Given the same
problem:
Hypotheses are the same:
H0: μ1 = μ2 = μ3
Where μ1 refers to the mean for the new
program; μ2 refers to the mean for the old
program; and μ3 refers to the mean for no
program.
HA: At least two of μ1, μ2, and μ3 differ.

ANOVA with SPSS
To get descriptive and test statistics,
go to Analyze → Compare Means →
One-Way ANOVA.

ANOVA with SPSS
Dependent List will be the variable
you find means for (gpa_cum); Factor
is the grouping variable. To get
Tukey HSD, go to Post Hoc and
select Tukey and Games-Howell.
Also go to Options and select
Homogeneity of Variance Test and
Welch.
ANOVA with SPSS
The Levene Statistic tests for the assumption
of homogeneity of variances. If we find that
Sig. is greater than .05, we can use ANOVA.
Otherwise, we must run another test called
the Welch Statistic. Here, Sig. is greater than
.05, so we can assume homogeneity of
variances and use ANOVA and Tukey HSD.
In the ANOVA box, Between Groups Sum of
Squares is SST; Within Groups Sum of
Squares is SSE. If Sig. (the p-value) is less
than .05, at least one pair of groups is
statistically significantly different.
The Welch Statistic would be important if Sig.
in the Test of Homogeneity of Variances box
were less than .05. If it were, interpret Sig.
here as you would Sig. in the ANOVA box.
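The branching logic above can be summarized in a tiny sketch (the function and labels are ours, not SPSS output names):

```python
# Decide which omnibus test and post-hoc test to read, based on the
# Sig. value of Levene's homogeneity-of-variances test.
def anova_reading_plan(levene_sig, alpha=0.05):
    """Return (omnibus test, post-hoc test) for a given Levene Sig."""
    if levene_sig > alpha:
        return ("ANOVA", "Tukey HSD")       # equal variances can be assumed
    return ("Welch", "Games-Howell")        # equal variances cannot be assumed

print(anova_reading_plan(0.40))   # -> ('ANOVA', 'Tukey HSD')
print(anova_reading_plan(0.01))   # -> ('Welch', 'Games-Howell')
```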

ANOVA with SPSS
The Multiple Comparisons box tells you exactly which
differences in means are important. If homogeneity of
variances can be assumed, use Tukey HSD. A Sig. of
less than .05 is a statistically significant difference. If
homogeneity of variances cannot be assumed, use
Games-Howell, and interpret the same way.

ANOVA with SPSS: Conclusions
What conclusions can we draw about the effects of
the programs on change in student GPA?
1. The Levene Statistic tells us that equal variances can
be assumed.
2. H0 is rejected; the ANOVA p-value is less than .05, so at
least one set of differences is statistically significant.
3. Tukey's HSD tells us that the only significant difference
is between the new program (1) and no program (3):
a. The new tutoring program (1) and the old program (2) are
not significantly different.
b. The old tutoring program (2) and no program (3) are not
significantly different.

ANOVA
What does it mean to reject H0 for
ANOVA?
At least one mean is statistically
significantly different from another.
But the F-test does not tell you which.
Post-Hoc Test: Tukey HSD. Identifies
meaningful differences among the
means.

What do Post-Hoc Tests Tell Us?
Our conclusion was to reject H0 that the
three means were equal in the population.
Our three post-hoc test results:
Q12 = 1.30 (new program and old program)
Q13 = 4.12 (new program and no program)
Q23 = 2.82 (old program and no program)
The greatest difference was between the
new program and no intervention. We
would assert that the new program likely
works best.

Review: Hypothesis Testing with Means

Situation | Appropriate Hypothesis Test | Example
Single mean compared with a hypothesized mean | Single means test | Comparing mean expenditure for February with some prior guess
Means of one variable according to different values of another | Comparison of means, independent samples | Comparing mean expenditure for men and women
Two means that appear in your data as two distinct variables | Comparison of means, paired samples | Comparing mean expenditure for February with mean for March
Means for more than two different values of another variable | Analysis of Variance (ANOVA) | Comparing mean expenditure for freshmen, sophomores, juniors, and seniors
