Probability sample
Non-probability sample
Statistical inference
Sampling error
Probability sample
Goal: A representative sample = miniature of
the population
You can use simple random sampling,
systematic sampling, stratified sampling,
cluster sampling or a combination of these
methods to get a probability sample
Probability sample → you can draw
conclusions about the whole population
inference.ppt - Aki Taanila
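Simple Random
[Figure: a sample drawn at random from the population]
Systematic
Stratified
[Figure: the population is divided into age strata (18-29, 30-49, 50-64, 65+) and a sample is drawn from each stratum]
Proportional allocation
Even allocation (to compare groups)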
Cluster
Divide the population into clusters
(schools, districts, ...)
Randomly choose some of the
clusters
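The four sampling methods above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the population of 1000 unit IDs, the stratum sizes and the `school_` cluster names are all hypothetical, and the strata are sized so that proportional allocation gives whole numbers.

```python
import random

def simple_random_sample(population, n, rng):
    """Every unit has the same chance of being selected."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Pick a random start, then take every k-th unit."""
    k = len(population) // n      # sampling interval
    start = rng.randrange(k)      # random start inside the first interval
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, n, rng):
    """Proportional allocation: each stratum contributes according to its size."""
    total = sum(len(units) for units in strata.values())
    sample = []
    for units in strata.values():
        share = round(n * len(units) / total)  # rounding may need adjusting in general
        sample.extend(rng.sample(units, share))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(1)
population = list(range(1000))  # hypothetical unit IDs
strata = {                      # hypothetical age strata
    "18-29": population[:200],
    "30-49": population[200:600],
    "50-64": population[600:800],
    "65+": population[800:],
}
clusters = {f"school_{i}": population[i * 100:(i + 1) * 100] for i in range(10)}

srs = simple_random_sample(population, 50, rng)
sys_sample = systematic_sample(population, 50, rng)
strat = stratified_sample(strata, 50, rng)
clus = cluster_sample(clusters, 2, rng)
print(len(srs), len(sys_sample), len(strat), len(clus))
```

Even allocation would instead draw the same number of units from every stratum, which is convenient when the aim is to compare groups.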
Statistical inference
Statistical inference: Drawing conclusions
about the whole population on the basis of a
sample
Precondition for statistical inference: A
sample is randomly selected from the
population (=probability sample)
Sampling Error
[Figure: three samples drawn from the same population give different means]
Population: mean 40.8
Sample 1: mean 40.5
Sample 2: mean 40.3
Sample 3: mean 41.4
Sampling distributions
Mean
Normal distribution
T-distribution
Proportion
Normal distribution
Distribution of a statistic
Statistics follow distributions too
But the distribution of a statistic is a theoretical construct.
Statisticians pose a thought experiment: how much would the
value of the statistic fluctuate if one could repeat a particular
study over and over again with different samples of the same
size?
By answering this question, statisticians are able to pinpoint
exactly how much uncertainty is associated with a given
statistic.
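The thought experiment can be imitated on a computer. The sketch below is only an illustration under stated assumptions: the population (10,000 roughly normal values around 40), the sample size of 100 and the 2,000 repetitions are all invented for the example.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 10,000 values, roughly normal around 40.
population = [rng.gauss(40, 10) for _ in range(10_000)]

def repeat_study(population, sample_size, repetitions, rng):
    """Return the sample mean obtained in each repetition of the study."""
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repetitions)
    ]

means = repeat_study(population, sample_size=100, repetitions=2_000, rng=rng)

print("population mean:        ", round(statistics.mean(population), 2))
print("average of sample means:", round(statistics.mean(means), 2))
print("spread of sample means: ", round(statistics.stdev(means), 2))
```

The spread of the 2,000 sample means is an empirical picture of the sampling distribution of the mean: it shows how much the statistic fluctuates from one repetition to the next.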
Sampling distribution
Most of the statistical inference methods are
based on sampling distributions
You can apply statistical inference without
knowing sampling distributions
Still, it is useful to know at least the basic idea
of sampling distributions
Statistics (from the sample)
$\bar{x}$ = sample mean
$s^2$ = sample variance
$s$ = sample standard deviation
Parameters (from the population)
$\mu$ = population mean
$\sigma^2$ = population variance
$\sigma$ = population standard deviation
2. $s$ is a nonnegative number. If all the numbers in a sample are equal, the value of the
standard deviation will be zero. This is the smallest possible value for the standard
deviation.
3. When comparing two samples of data, the sample that is more variable will have a
larger standard deviation.
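Both properties are easy to check numerically. The three small samples below are made up for the illustration; `statistics.stdev` computes the sample standard deviation.

```python
import statistics

constant = [7, 7, 7, 7, 7]        # all values equal
tight = [48, 49, 50, 51, 52]      # little variation
spread = [30, 40, 50, 60, 70]     # same mean as `tight`, more variation

sd_constant = statistics.stdev(constant)
sd_tight = statistics.stdev(tight)
sd_spread = statistics.stdev(spread)

print(sd_constant, round(sd_tight, 3), round(sd_spread, 3))
```

The constant sample has a standard deviation of zero, and the more variable of the other two samples has the larger standard deviation.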
[Figure: distribution of a single fair die, $X$: $P(x) = 1/6$ for each of $x = 1, 2, \dots, 6$]
No.  Sample  Mean    No.  Sample  Mean    No.  Sample  Mean
1    1,1     1.0     13   3,1     2.0     25   5,1     3.0
2    1,2     1.5     14   3,2     2.5     26   5,2     3.5
3    1,3     2.0     15   3,3     3.0     27   5,3     4.0
4    1,4     2.5     16   3,4     3.5     28   5,4     4.5
5    1,5     3.0     17   3,5     4.0     29   5,5     5.0
6    1,6     3.5     18   3,6     4.5     30   5,6     5.5
7    2,1     1.5     19   4,1     2.5     31   6,1     3.5
8    2,2     2.0     20   4,2     3.0     32   6,2     4.0
9    2,3     2.5     21   4,3     3.5     33   6,3     4.5
10   2,4     3.0     22   4,4     4.0     34   6,4     5.0
11   2,5     3.5     23   4,5     4.5     35   6,5     5.5
12   2,6     4.0     24   4,6     5.0     36   6,6     6.0
The sampling distribution of $\bar{x}$ is shown below:
$\bar{x}$   $P(\bar{x})$
1.0         1/36
1.5         2/36
2.0         3/36
2.5         4/36
3.0         5/36
3.5         6/36
4.0         5/36
4.5         4/36
5.0         3/36
5.5         2/36
6.0         1/36
[Figure: histogram of $P(\bar{x})$ against $\bar{x}$, from 1.0 to 6.0]
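The distribution of the mean of two dice can be reproduced by listing all 36 equally likely samples. The sketch below uses exact fractions so that no rounding is involved; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)
samples = list(product(faces, repeat=2))            # the 36 equally likely samples
means = [Fraction(a + b, 2) for a, b in samples]    # mean of each sample

counts = Counter(means)
distribution = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for m in distribution:
    print(f"{float(m):.1f}  {counts[m]}/36")

# Mean and variance of the sampling distribution, computed exactly.
mean_of_means = sum(m * p for m, p in distribution.items())
variance_of_means = sum((m - mean_of_means) ** 2 * p for m, p in distribution.items())
print(mean_of_means, variance_of_means)
```

The mean of the sampling distribution comes out as 7/2, the same as the mean of a single die, while the variance is 35/24, half the variance 35/12 of a single die.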
Compare
Compare the distribution of $X$ with the sampling distribution of $\bar{x}$
[Figure: the two distributions side by side; horizontal axis from 1.0 to 6.0]
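Generalize
We can generalize the mean and variance of
the sampling distribution of the mean of two dice:
$\mu_{\bar{x}} = \mu$ and $\sigma^2_{\bar{x}} = \frac{\sigma^2}{2}$
to n dice:
$\mu_{\bar{x}} = \mu$ and $\sigma^2_{\bar{x}} = \frac{\sigma^2}{n}$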
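n = 5: $\mu_{\bar{x}} = 3.5$, $\sigma^2_{\bar{x}} = \frac{\sigma^2}{5} = 0.5833$
n = 10: $\mu_{\bar{x}} = 3.5$, $\sigma^2_{\bar{x}} = \frac{\sigma^2}{10} = 0.2917$
n = 25: $\mu_{\bar{x}} = 3.5$, $\sigma^2_{\bar{x}} = \frac{\sigma^2}{25} = 0.1167$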
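The values for 5, 10 and 25 dice can be checked exactly. The sketch below builds the distribution of the sum of n fair dice by repeated convolution and then derives the mean and variance of the sample mean; the function names are illustrative, not taken from the slides.

```python
from fractions import Fraction

def sum_distribution(n):
    """Exact distribution of the sum of n fair dice, built by repeated convolution."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for total, p in dist.items():
            for face in range(1, 7):
                nxt[total + face] = nxt.get(total + face, Fraction(0)) + p * Fraction(1, 6)
        dist = nxt
    return dist

def mean_and_variance_of_sample_mean(n):
    """Mean and variance of the average of n fair dice, as exact fractions."""
    dist = sum_distribution(n)
    mean = sum(Fraction(total, n) * p for total, p in dist.items())
    var = sum((Fraction(total, n) - mean) ** 2 * p for total, p in dist.items())
    return mean, var

results = {n: mean_and_variance_of_sample_mean(n) for n in (2, 5, 10, 25)}
for n, (mean, var) in results.items():
    print(n, float(mean), round(float(var), 4))
```

For every n the mean stays at 3.5 and the variance equals the single-die variance 35/12 divided by n, which gives 0.5833, 0.2917 and 0.1167 for n = 5, 10 and 25.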
Parameter Estimation
Parameter and its estimate
Error margin
Parameter estimation
The objective is to estimate the unknown
population parameter using the value
calculated from the sample
The parameter may be, for example, a mean
or a proportion
Statistic (from sample) estimates Parameter (from entire population)
Mean: $\bar{x}$ estimates $\mu$
Standard deviation: $s$ estimates $\sigma$
Proportion: $p$ estimates the population proportion
[Figure: a sample is drawn from a population whose mean, $\mu$, is unknown. Conclusion from the sample: "I am 95% confident that $\mu$ is between 40 and 60."]
Parameter
Error margin
A value calculated from the sample is the best
guess when estimating the corresponding
population value
The estimate is still uncertain due to sampling
error
The error margin is a measure of this uncertainty
Using the error margin you can state a confidence
interval: estimate ± error margin
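Error margin for a mean
Population standard deviation $\sigma$ known: error margin = $z_{\text{critical}} \cdot \frac{\sigma}{\sqrt{n}}$
Population standard deviation unknown: error margin = $t_{\text{critical}} \cdot \frac{s}{\sqrt{n}}$
Confidence interval: $\mu = \bar{x} \pm t_{\text{critical}} \cdot \frac{s}{\sqrt{n}}$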
Confidence level
The confidence level can be selected to be different from 95%
If the population standard deviation $\sigma$ is known, the critical value
can be calculated from the normal distribution
Ex. In Excel =-NORMSINV(0,005) gives the critical value for a
99% confidence level (0.005 is half of 0.01)
If the population standard deviation $\sigma$ is unknown, the critical
value can be calculated from the t-distribution
Ex. In Excel =TINV(0,01;79) gives the critical value when the sample
size is 80 and the confidence level is 99%
(The Excel formulas are written with a decimal comma and a semicolon separator, as used in Excel locales with decimal commas.)
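The normal-distribution case can be reproduced with the Python standard library. This is a sketch under stated assumptions: the sample mean of 50, the known population standard deviation of 12 and the helper names are invented for the example. The t-distribution case needs a t quantile function, which the standard library does not provide (`scipy.stats.t.ppf` is one option).

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value from the standard normal distribution."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Estimate ± error margin, with sigma the known population standard deviation."""
    margin = z_critical(confidence) * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

z95 = z_critical(0.95)
z99 = z_critical(0.99)   # same quantity as Excel's -NORMSINV(0.005)
print(round(z95, 3), round(z99, 3))

# Hypothetical data: sample mean 50, known sigma 12, sample size 80.
low, high = confidence_interval(sample_mean=50.0, sigma=12.0, n=80, confidence=0.99)
print(round(low, 2), round(high, 2))
```

Raising the confidence level from 95% to 99% raises the critical value and therefore widens the interval.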
Inference
Two ways to make inference
Estimation of parameters
* Point Estimation ($\bar{x}$ or $p$)
* Interval Estimation
Hypothesis Testing
Hypothesis testing
Null hypothesis
Alternative hypothesis
2-tailed or 1-tailed
P-value
Hypothesis 1
A hypothesis is a belief
concerning a parameter
The parameter may be a
population mean,
proportion, correlation
coefficient, ...
Hypothesis 2
The null hypothesis is the prevalent opinion, previous
knowledge, basic assumption, prevailing
theory, ...
The alternative hypothesis is the rival opinion
The null hypothesis is assumed to be true until
we find evidence against it
If a sample gives strong enough evidence
against the null hypothesis, the alternative
hypothesis is adopted instead.
Hypothesis examples
H0: Mean height of males equals 174.
H1: Mean height of males is greater than 174.
H0: Half of the population is in favour of a nuclear power plant.
H1: More than half of the population is in favour of a nuclear power plant.
H0: The amount of overtime work is equal for males and females.
H1: The amount of overtime work is not equal for males and females.
2-tailed Test
Use a 2-tailed test if there is no
reason for a 1-tailed test.
In a 2-tailed test, deviations
(from the null hypothesis)
in both directions are
of interest.
The alternative hypothesis
takes the form "different
from".
1-tailed Test
In a 1-tailed test we know
beforehand that only
deviations in one
direction are possible
or of interest.
The alternative hypothesis
takes the form "less
than" or "greater than".
[Figure: a random sample is drawn from the population; the mean age in the sample is 45. Conclusion: "Reject null hypothesis! Sample mean is only 45!"]
Significance Level
When we reject the null hypothesis there is a
risk of drawing a wrong conclusion
The risk of drawing a wrong conclusion (called the p-value or observed significance level) can be
calculated
The researcher decides the maximum risk (called the
significance level) they are ready to take
The usual significance level is 5%
P-value
We start from the basic assumption: the null
hypothesis is true
The p-value is the probability of getting a value
equal to or more extreme than the sample
result, given that the null hypothesis is true
Decision rule: if the p-value is less than 5%,
reject the null hypothesis; if the p-value is 5% or
more, the null hypothesis remains valid
In any case, you must give the p-value as a
justification for your decision.
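The decision rule can be sketched for the simplest case, a test of a mean with a known population standard deviation, where the p-value comes from the normal distribution. The numbers (null value 174, sample mean 176, standard deviation 7, sample size 50) and the function names are assumptions made for the illustration.

```python
from math import sqrt
from statistics import NormalDist

def p_value(sample_mean, mu0, sigma, n, tail="two"):
    """P-value of a z-test for a mean; mu0 is the mean under the null hypothesis."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf(z)
    if tail == "less":            # H1: mean is less than mu0
        return cdf
    if tail == "greater":         # H1: mean is greater than mu0
        return 1 - cdf
    return 2 * min(cdf, 1 - cdf)  # H1: mean is different from mu0

def decide(p, significance=0.05):
    """Apply the decision rule at the chosen significance level."""
    return "reject H0" if p < significance else "do not reject H0"

# Hypothetical sample: 50 males, mean height 176, known sigma 7.
p_one = p_value(176, 174, sigma=7, n=50, tail="greater")
p_two = p_value(176, 174, sigma=7, n=50, tail="two")
print(round(p_one, 4), decide(p_one))
print(round(p_two, 4), decide(p_two))
```

The 2-tailed p-value is twice the 1-tailed one here, because deviations in both directions count as "more extreme".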
Testing mean
Null hypothesis: Mean equals x0
Alternative hypothesis (2-tailed): Mean is
different from x0
Alternative hypothesis (1-tailed): Mean is less
than x0
Alternative hypothesis (1-tailed): Mean is
bigger than x0
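Test statistic when the population standard deviation $\sigma$ is known:
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
Test statistic when $\sigma$ is unknown and estimated by the sample standard deviation $s$:
$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
Here $\mu$ is the mean stated in the null hypothesis (x0 above)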
$\mu_{\bar{x}} = \mu$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Confidence Interval
confidence interval = observed mean ± $Z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$
Confidence Interval
confidence interval = observed mean ± $T_{n-1,\alpha/2} \cdot \frac{s}{\sqrt{n}}$
Distribution of a correlation
coefficient
Normally distributed!
Mean = 0.15 (true correlation)
Standard error = 0.10
Standard error $\approx \sqrt{\frac{1 - r^2}{n}}$
Proportions/difference in proportions
Regression coefficients
T-distribution for small samples
Recall the 68-95-99.7 rule for normal distributions! There is a 95% chance that the
sample mean will fall within two standard errors of the true mean: 62 +/- 2*3.3 =
55.4 nmol/L to 68.6 nmol/L
[Figure: sampling distribution of the mean, marked at Mean - 2 Std error = 55.4 and at the Mean]
To be precise, 95% of observations fall between Z = -1.96 and Z = +1.96
(so the 2 is a rounded number)
Only 1 confidence
interval missed the true
mean.
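How often an interval misses the true mean can be explored by simulation. The sketch below takes the true mean of 62 and the standard error of 3.3 from the example above; the population standard deviation of 33 and the sample size of 100 are assumptions chosen only because they give that standard error, and the normal population is also an assumption.

```python
import random
from math import sqrt
from statistics import mean

rng = random.Random(7)

TRUE_MEAN = 62.0
SIGMA, N = 33.0, 100            # assumed values; together they give std error 3.3
STD_ERROR = SIGMA / sqrt(N)

def interval_covers_true_mean(rng):
    """Draw one sample and check whether its 95% interval contains the true mean."""
    sample = [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    centre = mean(sample)
    return centre - 1.96 * STD_ERROR <= TRUE_MEAN <= centre + 1.96 * STD_ERROR

repetitions = 2_000
hits = sum(interval_covers_true_mean(rng) for _ in range(repetitions))
coverage = hits / repetitions
print(f"share of intervals covering the true mean: {coverage:.3f}")
```

In the long run about 95% of such intervals contain the true mean, so missing it roughly once in twenty intervals is what the method promises.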
Confidence Intervals
The value of the statistic in my sample (e.g.,
mean, odds ratio, etc.)
Confidence level   Z value
80%                1.28
90%                1.645
95%                1.96
98%                2.33
99%                2.58
99.8%              3.08
99.9%              3.27