
Statistical Significance

Overview
When we generate a new statistic from a
sample, all we have is an estimate of the
population.
Our audience wants to know: how
confident can we be that this statistic
accurately reflects the population?
Fortunately, statistics can also be used to
measure the likelihood that our sample
statistics resemble the population.

The Logic of Inferential Statistics
We don't know for certain that our sample
statistic is a true reflection of the population;
by chance, we could have chosen an unusual
sample.
The central limit theorem allows us to estimate
the likelihood that our findings are due to
chance, given:
How different the value is from the value given
in our null hypothesis
Our sample size
The variation of scores in the distribution

Overview of Terminology
Descriptive Statistic: a statistic calculated to
describe the sample or population, such as a
mean, proportion, or correlation coefficient.
Inferential Statistic: a statistic calculated to
assess how accurately a sample statistic
matches the true population parameter.
Can we infer that the population parameter is
the same as the sample statistic?

Parameter: a statistic, but exclusively used
when talking about the population.

Three Distributions for Inferential Statistics
Three Types of Distributions:
1. Population Distribution: the distribution of all
values in the entire population.
In most instances, this is inaccessible; we seek to make
inferences about the population based on our sample.
2. Sample Distribution: the distribution of all values
in the sample we've drawn.
This is the only distribution we (typically) can actually
measure.
3. Sampling Distribution: the distribution of many
means drawn from many samples.
Hypothetical. The basis for the Central Limit Theorem.

Sampling Distribution
Hypothetical distribution of all sample
means in the population.
How is it created?
Take a sample from the population. Calculate
the mean.
Repeat many, many times.
Plot all of those means in a frequency
distribution graph, where:
The x-axis indicates, from left to right, the lowest to
the highest mean values
The y-axis indicates, from bottom to top, the lowest
to the highest frequencies of each mean.
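The three steps above can be sketched in a short simulation. (Python here is purely illustrative; the course tool is SPSS, and the population shape, sample size, and number of samples below are made-up values.)

```python
# Simulate a sampling distribution: draw many samples from a clearly
# non-normal (exponential) population, record each sample mean, and
# inspect the distribution of those means.
import random
import statistics

random.seed(42)
POP_MEAN = 10.0          # exponential with mean 10; its SD is also 10
SAMPLE_SIZE = 50
NUM_SAMPLES = 5000

sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [random.expovariate(1 / POP_MEAN) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.mean(sample))  # one point in the sampling distribution

# The sampling distribution centers on the population mean, and its
# spread (the standard error) is roughly sigma / sqrt(n) = 10 / sqrt(50),
# far smaller than the population standard deviation of 10.
print(round(statistics.mean(sample_means), 1))
print(round(statistics.stdev(sample_means), 2))
```

Plotting `sample_means` as a histogram would show the roughly normal, bell-shaped curve described on the next slide.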

The Sampling Distribution

When we take many sample means and plot them on a graph, the sample
means that are most different from the population mean are least likely;
the sample means that are most similar to the population mean are most
likely; and the distribution about the population mean is normal.

Sampling Distribution
The sampling distribution is normally distributed,
even if the population distribution is not.
The center of the sampling distribution is the
population mean.
Because it is normally distributed, we can apply
the empirical rule to determine the proportion of
values above or below any one mean.
We can therefore estimate the probability that
the population mean is different from the sample
mean based on the proportions under the curve.

Empirical Rule

Standard Error
The standard error is the standard deviation
of the sampling distribution.
Because the sampling distribution is based
on sample means, not individual scores, it
has less variability than the distribution of
individual scores.
Because inferential statistics are based on
the sampling distribution, the standard error
is used instead of the standard deviation.
Formula: σx̄ = σ / √n

Central Limit Theorem
Central Limit Theorem: The sampling
distribution of any variable will be
approximately normal, given a
sufficiently large sample size.
This does not mean that the population
is normally distributed.
It does mean that we can use the
normal curve to estimate the probability
of obtaining a particular sample mean.

Symbols for Three Different Distributions

Measure | Sample | Population | Sampling Distribution
Mean | x̄ | μ | μx̄
Standard Deviation | s | σ | σx̄ (the standard error)

The sample distribution is the distribution of values for the
sample you're measuring; the population distribution, for the
population you're making inferences about; and the sampling
distribution, for the hypothetical distribution of means drawn
from the population in repeated sampling.
We use the Greek letter mu (μ) to refer to the population
mean, and the Greek letter sigma (σ) to refer to the
population standard deviation.
Going forward, descriptive statistics will often have a sample
symbol and a corresponding Greek letter symbolizing the
population value.

Hypothesis Test
A hypothesis is a statement about some
characteristic of a variable or a collection of
variables.
A significance test evaluates the hypothesis
by comparing the values predicted by the
hypothesis to the values we find in our data:
Data that fall far from what the hypothesis
predicts suggest that the hypothesis is false;
Data that are close to what the hypothesis
predicts suggest that the hypothesis is true.

Hypothesis Test
Hypothesis Testing is a procedure for
measuring a hypothesis against data.
The Central Limit Theorem tells us that in any
random sample, our sample statistic is most
likely to be close to the population parameter;
and it is less likely that it will be very different
from the population parameter.
Using the normal distribution, we can estimate
the probability of obtaining our sample statistic
if the population parameter suggested by a
hypothesis were true.

General Overview of the Hypothesis Test
1. Null Hypothesis. (H0)
The hypothesis we are trying to disprove.
The opposite of the alternate hypothesis.

2. Alternate Hypothesis. (HA)
Typically, the hypothesis that some difference, or
association, measured in our sample, is also true
in the population.
It may be easier to state this first; then simply
state its opposite as the null hypothesis.
In hypothesis testing, we do not actually prove
anything; we disprove something, and in doing so,
give support to our prediction.

General Overview of the Hypothesis Test
3. Descriptive Statistics.
Our sample statistic: the mean, Pearson
correlation, V, Gamma, Regression coefficients, etc.

4. Test Statistic.
A statistic that estimates, given the standard error
and the assumption of normality, how different our
sample finding is from what we'd expect to find if
the null hypothesis were true.
For means and Pearson correlations: t. For ANOVA
and regression: F. For nonparametric association:
chi-squared.

General Overview of the Hypothesis Test
5. Conclusion: We either reject H0 or fail to
reject H0.
a. We decide on the alpha-level.
Alpha-level: the maximum probability of
rejecting a true H0 that we are willing
to allow.
In this class, assume that the alpha level
will be .05 unless otherwise directed.

General Overview of the Hypothesis Test
5. Conclusion: We either reject H0 or fail to reject H0.
SPSS provides the p-value, usually
called Sig. or something similar,
which is the probability of obtaining our
result (or one more extreme) if H0 is true.
We compare the p-value to the pre-selected
alpha-level.
If p is less than alpha, we reject H0.
If p is greater than alpha, we fail to reject
H0.
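The decision rule above is a simple comparison. A minimal sketch (the function name and labels are ours, not SPSS output):

```python
# Compare the p-value SPSS reports against the pre-selected alpha level.
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the hypothesis-test conclusion for a given p-value."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

print(decide(0.013))  # p < .05, so we reject H0
print(decide(0.27))   # p > .05, so we fail to reject H0
```

Note that the default alpha of .05 matches the convention assumed in this class; passing a different `alpha` models the stricter or looser thresholds discussed later.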

Ex: Hypothesis Test for a Single Mean
Here is a small survey asking people how
much they spend on groceries in a month.
Your boss suggests they spend $400.00 per
month.
Perform a hypothesis test that the true
population mean is not $400.00 per month.

Ask Yourself, before beginning:
1. What statistic is being asked for
here?
2. Is this a test of a single mean, or a
comparison of means?
3. Is this a one-tailed or two-tailed
test?

Ask Yourself, before beginning:
1. What statistic is being asked for here?
We're testing an assertion about a mean of
$400. So we are looking for a mean.
We call the mean suggested in the question
the null hypothesis mean, symbolized as μ0.
2. Is this a test of a single mean, or a
comparison of means?
Only one mean is given, and the test is
whether the true mean is *not* that
number. So it is a test of a single mean.

Ask Yourself, before beginning:
3. Is this a one-tailed or two-tailed test?
A two-tailed test asks only whether the true
population mean is different from μ0.
Look for wording like "not," "not equal to," or
"different from" to identify a two-tailed test.
A one-tailed test specifically asks whether the
true population mean is greater than or less
than μ0.
Look for wording like "greater than," "more
than," or "increased" to identify a one-tailed
test.

Ex: Hypothesis Test for a Single Mean Using SPSS
Here is the data in SPSS Data View.

Ex: Hypothesis Test for a Single Mean Using SPSS
1. State Null Hypothesis.
H0: μ = $400.00
2. State Alternate Hypothesis.
HA: μ ≠ $400.00
The hypotheses you're testing are the same as before.

Ex: Hypothesis Test for a Single Mean Using SPSS
3. Find Descriptive Statistics.
4. Find Test Statistic.
With SPSS, we can do #3 and #4 at once: go to
Analyze → Compare Means → One-Sample T Test.

Ex: Hypothesis Test for a Single Mean Using SPSS
The variable ExpPerMonth starts off in the
left-hand box; I've selected it by moving it to
the right. Under test value, I enter 400; this
is μ0. Then click Options.
The Confidence Interval Percentage is just
100% − alpha. The alpha level for a two-tailed
test should be .05 (or 5%), so we should set
this at 95%. Click Continue and OK.

Ex: Hypothesis Test for a Single Mean Using SPSS
One-Sample Statistics gives us the descriptive
statistics, and they are roughly the same as
before, albeit more precise.
One-Sample Test gives us t, df, and the p-value
(here called Sig. (2-tailed)): the probability of
a result this extreme if H0 is true.
If the p-value is less than .05, we reject H0. It is, so
our conclusion is to reject H0.

Ex: Hypothesis Test for a Single Mean
#5 Conclusion:
We reject H0.
We do not know for certain that HA
is true; but we are 95% confident
that H0 is false.
We are 95% confident that the true
mean is not $400.00.
We can say that the difference
between x̄ and μ0 is statistically
significant.
What is statistical
significance?
1. An assessment of the population based
on sample data.
A population parameter is always
statistically significant.

2. It is relative.
A single number on its own is never
statistically significant.
We find either that the difference between
two numbers is statistically significant, or
that the association between two variables
is statistically significant.

What is statistical
significance?
3. It is determined by:
Dispersion: the greater the spread
of scores, the less likely we are to
find statistical significance.
Sample size: the larger the sample,
the more likely we are to find
statistical significance.
Our arbitrary determination of
appropriate alpha level.

What is statistical
significance?
We could hypothetically choose a
higher or lower alpha level, but there
are consequences:
If we increase alpha (say, to .10), we
are more likely to find statistical
significance, but we increase the risk
of a Type I error.
If we decrease alpha (say, to .01), we
are less likely to find statistical
significance, but we increase the risk
of a Type II error.

What is statistical
significance?

Type I Error: Rejecting the null hypothesis when the
null hypothesis is true.
Our finding of a statistically significant difference, or
association, is not true in the population.
From our example: Our hypothesis test tells us that the
true mean is not $400; but it turns out that it is $400.
Type II Error: Accepting the null hypothesis when it is
false.
There is a statistically significant difference, or
association, in the population, but our hypothesis test said
there was not.
From our example: Our hypothesis test told us that the
true mean was $400; but it turns out that it was not.
Since we usually can't measure the population, we rarely
know if we've made a Type I or Type II error.

Type I and Type II Errors

Our Conclusion | H0 is True | H0 is False
Reject H0 | Type I Error | Correct Decision
Accept H0 | Correct Decision | Type II Error

The probability of making a Type I Error if we reject H0 is the
p-value. We do not calculate the p-value by hand (but SPSS does).
The maximum probability of making a Type I Error that we are
willing to allow is the alpha level.
The probability of making a Type II Error is called the beta level.
Power, the probability of correctly rejecting a false H0, is 1 − beta.
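The alpha level really is a Type I error rate. A quick hypothetical simulation (a z test with a known population SD, so no t table is needed): when H0 is actually true, testing at alpha = .05 produces a "significant" result in about 5% of samples.

```python
# Simulate repeated sampling under a TRUE null hypothesis and count how
# often we wrongly reject it (Type I errors). Values are made up.
import math
import random

random.seed(7)
MU0, SIGMA, N, REPS = 400.0, 80.0, 25, 10_000
Z_CRIT = 1.96                     # two-tailed critical z for alpha = .05

false_rejections = 0
for _ in range(REPS):
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]  # H0 is true here
    z = (sum(sample) / N - MU0) / (SIGMA / math.sqrt(N))
    if abs(z) > Z_CRIT:
        false_rejections += 1     # a Type I error

rate = false_rejections / REPS
print(rate)                       # close to alpha = .05
```

Raising alpha to .10 would roughly double this false-rejection rate, which is exactly the trade-off described on the previous slide.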

Ex: Hypothesis Test for a Comparison of Means (Independent Samples)

Rec | Gender | Expenditure Per Month, February | Expenditure Per Month, March
1 | M | 513 | 575
2 | M | 467 | 450
3 | F | 298 | 373
4 | F | 494 | 525
5 | F | 367 | 389
6 | M | 621 | 664
7 | F | 404 | 390
8 | M | 533 | 610
9 | F | 379 | 449
10 | M | 513 | 517
11 | M | 440 | 481
12 | M | 475 | 455

Your boss suggests that spending in March is equal
among men and women. Perform a hypothesis test
that the true population mean for monthly
expenditure on groceries is greater among men.
Note: here you will be comparing values of
Expenditure Per Month, March for people with
gender M with values of Expenditure Per Month,
March for people with gender F. So, you'll take
the mean for records 1, 2, 6, 8, 10, 11,
and 12; and compare with the mean for
records 3, 4, 5, 7, and 9.

Ask Yourself, before beginning:
1. Is this a test of a single mean, or a comparison of
means?
Comparison of means.
2. Is this a one-tailed or two-tailed test?
One-tailed test: the question is specifically concerned
with whether male expenditure is greater than female.
3. Is this a paired samples comparison of means, or
an independent samples comparison of means?
Independent samples: we are comparing means drawn
from one variable (Expenditure in March) according to
different values of another (Gender).

Ex: Hypothesis Test for a Comparison of Means (Independent Samples) Using SPSS
1. State Null Hypothesis.
H0: μM = μF
2. State Alternate Hypothesis.
HA: μM > μF
The hypotheses you're testing are the same as before.

Ex: Hypothesis Test for a Comparison of Means (Independent Samples) Using SPSS
Again, we can do #3 (find descriptive statistics)
and #4 (find test statistic) at once: go to
Analyze → Compare Means → Independent-Samples T Test.

Ex: Hypothesis Test for a Comparison of Means (Independent Samples) Using SPSS
We calculate means for the test variable, and
compare them on different values of the grouping
variable. So, the test variable is MarExp, and
the grouping variable is gender.
Click Define Groups, and another box appears.
Group 1 should be the value expected to be larger.
Group 2 should be the value expected to be smaller.
Here, I indicate that Men are Group 1 and Women
are Group 2. Click Continue, then OK.

Ex: Hypothesis Test for a Comparison of Means (Independent Samples) Using SPSS
Group Statistics summarizes basic descriptive
statistics for each group.
Under the Independent Samples Test, look under
Equal variances not assumed. The hypothesis test
results are provided. Note that the mean difference
is positive; if it were negative (with Men assigned
to Group 1, as directed), we would fail to reject H0
no matter what, because the difference would run in
the wrong direction (when HA specified that Group 1's
mean is greater).
Ex: Hypothesis Test for a Comparison of Means (Independent Samples)
#5 Conclusion:
We reject H0.
We do not know for certain that HA
is true; but we are 95% confident
that H0 is false.
We can say that the difference
between x̄M and x̄F is statistically
significant.
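The "Equal variances not assumed" row in SPSS is a Welch t test. A sketch by hand in Python (illustrative only), using the March values from the table, with records 1, 2, 6, 8, 10, 11, 12 as the men and 3, 4, 5, 7, 9 as the women per the note:

```python
# Welch (unequal-variance) t statistic for two independent groups:
# difference in means divided by the unpooled standard error.
import math
import statistics

men_march = [575, 450, 664, 610, 517, 481, 455]   # records 1,2,6,8,10,11,12
women_march = [373, 525, 389, 390, 449]           # records 3,4,5,7,9

def welch_t(a, b):
    """Welch t: (mean(a) - mean(b)) / sqrt(var(a)/n_a + var(b)/n_b)."""
    var_a = statistics.variance(a) / len(a)
    var_b = statistics.variance(b) / len(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(var_a + var_b)

t = welch_t(men_march, women_march)
print(round(t, 2))   # positive: the men's mean is higher, as HA predicted
```

A positive t in the direction HA predicted, large relative to its degrees of freedom, is what leads SPSS to a small Sig. value and to rejecting H0.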

Ex: Hypothesis Test for a Comparison of Means (Paired Samples)

Rec | Gender | Expenditure Per Month, February | Expenditure Per Month, March
1 | M | 513 | 575
2 | M | 467 | 450
3 | F | 298 | 373
4 | F | 494 | 525
5 | F | 367 | 389
6 | M | 621 | 664
7 | F | 404 | 390
8 | M | 533 | 610
9 | F | 379 | 449
10 | M | 513 | 517
11 | M | 440 | 481
12 | M | 475 | 455

Here are results of a survey on monthly spending on
groceries you ran first in February, and again in
March.
Your boss suggests that spending has not increased
from February to March. Perform a hypothesis test
that the true population mean for monthly
expenditure on groceries has increased.

Ask Yourself, before beginning:
1. Is this a test of a single mean, or a
comparison of means?
2. Is this a one-tailed or two-tailed
test?
3. Is this a paired samples comparison
of means, or an independent
samples comparison of means?

Ask Yourself, before beginning:
1. Is this a test of a single mean, or a
comparison of means?
Two means are given, and the test is whether
the true mean has increased. So it is a
comparison of means.
2. Is this a one-tailed or two-tailed test?
The question asks about an *increase*.
I.e., we are testing not just whether the means
for these two months are different, but whether
one mean is specifically greater than the other.
This is a one-tailed test.

Ask Yourself, before beginning:
3. Is this a paired samples comparison of
means, or an independent samples
comparison of means?
Independent samples: comparison of means
for two different values of another variable.
Paired samples: comparison of means for two
variables, using all cases that have valid data.
Here: we're comparing one variable (Exp Per
Month in Feb) to another (Exp Per Month in
March), so this is a paired samples analysis.

Ex: Hypothesis Test for a Comparison of Means (Paired Samples) Using SPSS
1. State Null Hypothesis.
H0: μMar = μFeb
2. State Alternate Hypothesis.
HA: μMar > μFeb
The hypotheses you're testing are the same as before.

Ex: Hypothesis Test for a Comparison of Means (Paired Samples) Using SPSS
Once again, we can do #3 (find descriptive
statistics) and #4 (find test statistic) at once:
go to Analyze → Compare Means → Paired-Samples T Test.

Ex: Hypothesis Test for a Comparison of Means (Paired Samples) Using SPSS
Select the variables FebExp and MarExp.
Click OK.

Ex: Hypothesis Test for a Comparison of Means (Paired Samples) Using SPSS
Paired Samples Statistics gives us the descriptive
statistics, and they are roughly the same as
before, albeit more precise.
Paired Samples Test gives us t, df, and the p-value
(here called Sig. (2-tailed)): the probability of
a result this extreme if H0 is true.
If the p-value is less than .05, we reject H0. It is, so
our conclusion is to reject H0.

Ex: Hypothesis Test for a Comparison of Means (Paired Samples) Using SPSS
#5 Conclusion:
SPSS only provides a p-value for a two-tailed
test (the p-value, Sig. (2-tailed), is .013).
Divide this p-value by 2 to get the appropriate
p-value for a one-tailed test (.0065).
It is less than .05, so we reject H0.
We do not know for certain that HA is true; but
we are 95% confident that H0 is false.
We can say that the difference between x̄Feb and
x̄Mar is statistically significant.
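The paired t test works on the within-record differences. A sketch by hand in Python (illustrative only), with the February and March columns from the table:

```python
# Paired t test: compute March - February for each record, then run a
# one-sample t test on those differences against zero.
import math
import statistics

feb = [513, 467, 298, 494, 367, 621, 404, 533, 379, 513, 440, 475]
mar = [575, 450, 373, 525, 389, 664, 390, 610, 449, 517, 481, 455]

diffs = [m - f for m, f in zip(mar, feb)]   # March minus February
n = len(diffs)
t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# With df = 11, this t corresponds to the slide's two-tailed Sig. of .013,
# which we halve for the one-tailed question of an increase (.0065 < .05).
print(round(t, 2))
```

A positive mean difference with a large t is exactly why the conclusion above is to reject H0 in favor of an increase.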

ANOVA
ANOVA allows you to compare the means of
several groups.
ANOVA provides a test against the H0 that three
or more group means are all equal.
Commonly used for experiments, where we
might compare:
A new treatment vs. a placebo treatment vs. no
treatment, or
A new treatment vs. an old treatment vs. no
treatment.
"Treatment" here refers to a subset of a sample, or
a group sharing a common characteristic.

H0 and HA
We wish to determine whether there are
any statistically significant
differences between treatment
means (μ1, μ2, μ3, etc.)
H0: μ1 = μ2 = μ3, etc.
HA: At least two of μ1, μ2, etc. differ.
We seek to show that at least one
pair of means differs significantly.

Variability
Between-Treatment Variability:
variability among the sample means.
Within-Treatment Variability:
variability among cases in each
sample.
ANOVA tests whether between-treatment
variability is significantly greater
than within-treatment variability.
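The between/within comparison above is exactly what the F statistic computes. A minimal sketch with hypothetical groups (not the slides' GPA data):

```python
# One-way ANOVA F statistic by hand for three made-up "treatment" groups.
import statistics

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
all_values = [x for g in groups for x in g]
grand_mean = statistics.mean(all_values)

# Between-treatment variability: how far each group mean sits from the grand mean.
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
# Within-treatment variability: how far cases fall from their own group mean.
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
F = (ss_between / df_between) / (ss_within / df_within)
print(round(F, 1))   # a large F: between-group spread dwarfs within-group spread
```

When F is large relative to its degrees of freedom, the p-value is small and at least one pair of means differs significantly.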

Ex: ANOVA with SPSS
ANOVA can be performed much more
quickly using SPSS. Given the same
problem:
Hypotheses are the same:
H0: μ1 = μ2 = μ3
Where μ1 refers to the mean for the new
program; μ2 refers to the mean for the old
program; and μ3 refers to the mean for no
program.
HA: At least two of μ1, μ2, and μ3 differ.

ANOVA with SPSS
To get descriptive and test statistics,
go to Analyze → Compare Means →
One-Way ANOVA.

ANOVA with SPSS
Dependent List will be the variable
you find means for (gpa_cum); Factor
is the grouping variable. To get
Tukey HSD, go to Post Hoc and
select Tukey and Games-Howell.
Also go to Options and select
Homogeneity of Variance Test and
Welch.
ANOVA with SPSS
The Levene Statistic tests for the assumption
of homogeneity of variances. If we find that
Sig. is greater than .05, we can use ANOVA.
Otherwise, we must run another test called
the Welch Statistic. Here, Sig. is greater than
.05, so we can assume homogeneity of
variances and use ANOVA and Tukey HSD.
In the ANOVA box, Between Groups Sum of
Squares is SST; Within Groups Sum of
Squares is SSE. If Sig. (the p-value) is less
than .05, at least one pair of groups is
statistically significantly different.
The Welch Statistic would be important if Sig.
in the Test of Homogeneity of Variances box
were less than .05. If it were, interpret Sig.
here as you would Sig. in the ANOVA box.
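The branching logic above can be summarized in a tiny sketch (the function and labels are ours, not SPSS output names):

```python
# Decide which omnibus test and post-hoc test to read, based on the
# Sig. value of Levene's homogeneity-of-variances test.
def anova_reading_plan(levene_sig, alpha=0.05):
    """Return (omnibus test, post-hoc test) for a given Levene Sig."""
    if levene_sig > alpha:
        return ("ANOVA", "Tukey HSD")       # equal variances can be assumed
    return ("Welch", "Games-Howell")        # equal variances cannot be assumed

print(anova_reading_plan(0.40))   # -> ('ANOVA', 'Tukey HSD')
print(anova_reading_plan(0.01))   # -> ('Welch', 'Games-Howell')
```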

ANOVA with SPSS
The Multiple Comparisons box tells you exactly which
differences in means are important. If homogeneity of
variances can be assumed, use Tukey HSD. A Sig. of
less than .05 is a statistically significant difference. If
homogeneity of variances cannot be assumed, use
Games-Howell, and interpret the same way.

ANOVA with SPSS: Conclusions
What conclusions can we draw about the effects of
the programs on change in student GPA?
1. The Levene Statistic tells us that equal variances can
be assumed.
2. H0 is rejected; the ANOVA p-value is less than .05, so at
least one set of differences is statistically significant.
3. Tukey's HSD tells us that the only significant difference
is between the new program (1) and no program (3):
a. The new tutoring program (1) and the old program (2) are
not significantly different.
b. The old tutoring program (2) and no program (3) are not
significantly different.

ANOVA
What does it mean to reject H0 for
ANOVA?
At least one mean is statistically
significantly different from another.
But the F-test does not tell you which.
Post-Hoc Test: Tukey HSD. Identifies
meaningful differences among the
means.

What do Post-Hoc Tests Tell Us?
Our conclusion was to reject H0 that the
three means were equal in the population.
Our three post-hoc test results:
Q12 = 1.30 (new program and old program)
Q13 = 4.12 (new program and no program)
Q23 = 2.82 (old program and no program)
The greatest difference was between the
new program and no intervention. We
would assert that the new program likely
works best.

Review: Hypothesis Testing with Means

Situation | Appropriate Hypothesis Test | Example
Single mean compared with a hypothesized mean | Single means test | Comparing mean expenditure for February with some prior guess
Means of one variable according to different values of another | Comparison of means, independent samples | Comparing mean expenditure for men and women
Two means that appear in your data as two distinct variables | Comparison of means, paired samples | Comparing mean expenditure for February with mean for March
Means for more than two different values of another variable | Analysis of Variance (ANOVA) | Comparing mean expenditure for freshmen, sophomores, juniors, and seniors
