Bayesian inference for the normal distribution
Suppose that we observe a single observation $x$ from a normal distribution with unknown mean $\theta$ and known variance $\sigma^2$. Suppose that our prior distribution for $\theta$ is $N(\mu_0, \sigma_0^2)$. The posterior distribution of $\theta$ is
$$h(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{\int f(x \mid \theta)\,\pi(\theta)\,d\theta} \propto f(x \mid \theta)\,\pi(\theta).$$
Now
$$\begin{aligned}
f(x \mid \theta)\,\pi(\theta) &\propto \exp\left(-\frac{1}{2\sigma^2}(x-\theta)^2\right)\exp\left(-\frac{1}{2\sigma_0^2}(\theta-\mu_0)^2\right) \\
&= \exp\left(-\frac{1}{2}\left[\frac{1}{\sigma^2}(x-\theta)^2 + \frac{1}{\sigma_0^2}(\theta-\mu_0)^2\right]\right) \\
&= \exp\left(-\frac{1}{2}\left[\left(\frac{1}{\sigma^2}+\frac{1}{\sigma_0^2}\right)\theta^2 - 2\left(\frac{x}{\sigma^2}+\frac{\mu_0}{\sigma_0^2}\right)\theta + \frac{x^2}{\sigma^2}+\frac{\mu_0^2}{\sigma_0^2}\right]\right).
\end{aligned}$$
Let $a$, $b$, and $c$ be the coefficients in the quadratic polynomial in $\theta$ inside the exponent, so that
$$f(x \mid \theta)\,\pi(\theta) \propto \exp\left(-\frac{1}{2}\left(a\theta^2 + b\theta + c\right)\right) = \exp\left(-\frac{a}{2}\left(\theta + \frac{b}{2a}\right)^2\right)\exp\left(-\frac{1}{2}\left(c - \frac{b^2}{4a}\right)\right).$$
The second term does not depend on $\theta$, so the posterior distribution of $\theta$ is normal with mean
$$\mu_1 = -\frac{b}{2a} = \frac{\dfrac{\mu_0}{\sigma_0^2}+\dfrac{x}{\sigma^2}}{\dfrac{1}{\sigma_0^2}+\dfrac{1}{\sigma^2}}$$
and variance
$$\sigma_1^2 = \frac{1}{a} = \frac{1}{\dfrac{1}{\sigma_0^2}+\dfrac{1}{\sigma^2}}.$$
Comments about the role of the prior in the posterior distribution:
The posterior mean is a weighted average of the prior mean and the data, with weights proportional to the respective precisions of the prior and the data, where the precision is equal to 1/variance. If we assume that the experiment (the observation of $X$) is much more informative than the prior distribution in the sense that $\sigma^2 \ll \sigma_0^2$, then
$$\mu_1 \approx x \quad \text{and} \quad \sigma_1^2 \approx \sigma^2.$$
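To make the precision-weighting concrete, here is a minimal Python sketch of the single-observation posterior. The numbers $\mu_0 = 0$, $\sigma_0^2 = 4$, $\sigma^2 = 1$, $x = 2.5$ are illustrative assumptions, not values from the notes:

```python
# Posterior for a single observation x ~ N(theta, sigma2) with prior theta ~ N(mu0, sigma0_2).
# All numbers are illustrative assumptions.
mu0, sigma0_2 = 0.0, 4.0   # prior mean and variance
sigma2 = 1.0               # known data variance
x = 2.5                    # observed value

prior_prec = 1.0 / sigma0_2  # precision = 1/variance
data_prec = 1.0 / sigma2

# Posterior mean: precision-weighted average of the prior mean and the data.
mu1 = (prior_prec * mu0 + data_prec * x) / (prior_prec + data_prec)
sigma1_2 = 1.0 / (prior_prec + data_prec)

print(mu1, sigma1_2)  # 2.0 0.8
```

Since $\sigma^2 < \sigma_0^2$ here, the data precision dominates and the posterior mean (2.0) sits much closer to $x = 2.5$ than to the prior mean 0.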
Thus, when the data are much more informative than the prior, the posterior distribution is essentially determined by the likelihood. Now suppose that we observe an iid sample $X_1, \ldots, X_n$ from a normal distribution with unknown mean $\theta$, where $\sigma^2$ is known, and as before, we use the prior distribution $N(\mu_0, \sigma_0^2)$. As before, the posterior distribution is proportional to the likelihood times the prior:
$$h(\theta \mid x_1, \ldots, x_n) \propto f(x_1, \ldots, x_n \mid \theta)\,\pi(\theta).$$
From the independence of the $X_i$'s,
$$f(x_1, \ldots, x_n \mid \theta) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \theta)^2\right).$$
Using the identity
$$\sum_{i=1}^{n}(x_i - \theta)^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{x} - \theta)^2,$$
we obtain
$$f(x_1, \ldots, x_n \mid \theta) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \bar{x})^2\right) \exp\left(-\frac{1}{2\sigma^2/n}(\bar{x} - \theta)^2\right).$$
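The sum-of-squares identity used above can be verified numerically; a quick Python check, where the data values and $\theta$ are made up for illustration:

```python
# Numerical check of: sum_i (x_i - theta)^2 = sum_i (x_i - xbar)^2 + n*(xbar - theta)^2.
# The data and theta are made-up illustrative values.
xs = [1.2, 0.7, 2.1, 1.5]
theta = 0.9
n = len(xs)
xbar = sum(xs) / n

lhs = sum((x - theta) ** 2 for x in xs)
rhs = sum((x - xbar) ** 2 for x in xs) + n * (xbar - theta) ** 2
print(abs(lhs - rhs) < 1e-9)  # True
```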
Only the last term depends on $\theta$, so
$$h(\theta \mid x_1, \ldots, x_n) \propto \exp\left(-\frac{1}{2\sigma^2/n}(\bar{x} - \theta)^2\right)\pi(\theta).$$
This posterior distribution can be evaluated in the same way as the single observation case with $\bar{x}$ replacing $x$ and $\sigma^2/n$ replacing $\sigma^2$. Thus, the posterior distribution is normal with mean
$$\mu_1 = \frac{\dfrac{\mu_0}{\sigma_0^2}+\dfrac{n\bar{x}}{\sigma^2}}{\dfrac{1}{\sigma_0^2}+\dfrac{n}{\sigma^2}}$$
and variance
$$\sigma_1^2 = \frac{1}{\dfrac{1}{\sigma_0^2}+\dfrac{n}{\sigma^2}}.$$
For large values of $n$, $\mu_1 \approx \bar{x}$ and $\sigma_1^2 \approx \sigma^2/n$. Therefore, the
information in the sample largely determines the posterior
distribution for large samples.
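The large-sample behavior can be seen numerically. A Python sketch, where the prior parameters, $\sigma^2$, and the true mean are illustrative assumptions:

```python
import random

# Posterior for X_1,...,X_n iid N(theta, sigma2) with prior N(mu0, sigma0_2):
#   mu1 = (mu0/sigma0_2 + n*xbar/sigma2) / (1/sigma0_2 + n/sigma2)
#   sigma1_2 = 1 / (1/sigma0_2 + n/sigma2)
# All parameter values below are illustrative assumptions.
mu0, sigma0_2 = 0.0, 1.0
sigma2 = 4.0
true_theta = 3.0

random.seed(0)
for n in (5, 50, 5000):
    xs = [random.gauss(true_theta, sigma2 ** 0.5) for _ in range(n)]
    xbar = sum(xs) / n
    prec = 1.0 / sigma0_2 + n / sigma2   # posterior precision
    mu1 = (mu0 / sigma0_2 + n * xbar / sigma2) / prec
    sigma1_2 = 1.0 / prec
    print(n, round(mu1, 3), round(sigma1_2, 4))
# As n grows, mu1 approaches xbar and sigma1_2 approaches sigma2/n,
# so the sample dominates the prior.
```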
Bayesian inference:
Point estimation: When faced with a decision, the Bayesian wants to minimize the expected loss (i.e., maximize the expected utility) of a decision rule under the prior distribution $\pi(\theta)$ for $\theta$. In other words, the Bayesian chooses the decision rule $d$ that minimizes the Bayes risk:
$$B(d) = E_{\pi(\theta)}[R(\theta, d)],$$
i.e., the Bayesian chooses to use the Bayes rule for the Bayesian's prior distribution $\pi(\theta)$.
As we showed in Notes 26, for point estimation with
squared error loss, the Bayes rule is to use the posterior
mean as the estimate.
Thus, for the above normal distribution setup with a single observation, the Bayesian's estimate of $\theta$ is
$$\frac{\dfrac{\mu_0}{\sigma_0^2}+\dfrac{x}{\sigma^2}}{\dfrac{1}{\sigma_0^2}+\dfrac{1}{\sigma^2}}.$$
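The optimality of the posterior mean under squared error loss can be checked numerically by discretizing a posterior on a grid and minimizing the posterior expected loss. This is a sketch with an assumed posterior $N(2.0,\,0.8)$, not a value from the notes:

```python
import math

# Sketch: on a grid, the posterior expected squared-error loss E[(theta - d)^2 | x]
# is minimized at the posterior mean. The posterior N(2.0, 0.8) is an assumption.
mu1, sigma1_2 = 2.0, 0.8
grid = [mu1 - 4 + 8 * i / 400 for i in range(401)]          # grid over mu1 +/- 4
w = [math.exp(-(t - mu1) ** 2 / (2 * sigma1_2)) for t in grid]
total = sum(w)
post = [wi / total for wi in w]                              # normalized posterior weights

def risk(d):
    """Posterior expected squared-error loss of the estimate d."""
    return sum(p * (t - d) ** 2 for p, t in zip(post, grid))

best = min(grid, key=risk)
print(best)  # 2.0, the posterior mean
```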
Interval estimation: A Bayesian version of a confidence interval is called a credibility interval. A $100(1-\alpha)\%$ credibility interval is an interval of the form $(\theta_0, \theta_1)$, where
$$\int_{\theta_0}^{\theta_1} h(\theta \mid x)\,d\theta = 1 - \alpha.$$
For example, for $X_1, \ldots, X_n$ iid $N(\theta, \sigma^2)$ where $\sigma^2$ is known and the prior distribution is $N(\mu_0, \sigma_0^2)$, the posterior distribution for $\theta$ is
$$N\left(\frac{\dfrac{\mu_0}{\sigma_0^2}+\dfrac{n\bar{x}}{\sigma^2}}{\dfrac{1}{\sigma_0^2}+\dfrac{n}{\sigma^2}},\; \frac{1}{\dfrac{1}{\sigma_0^2}+\dfrac{n}{\sigma^2}}\right),$$
and a 95% credibility interval for $\theta$ is
$$\frac{\dfrac{\mu_0}{\sigma_0^2}+\dfrac{n\bar{x}}{\sigma^2}}{\dfrac{1}{\sigma_0^2}+\dfrac{n}{\sigma^2}} \pm 1.96\sqrt{\frac{1}{\dfrac{1}{\sigma_0^2}+\dfrac{n}{\sigma^2}}}.$$
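A minimal Python sketch of this 95% credibility interval; the prior parameters, $n$, and $\bar{x}$ below are illustrative assumptions:

```python
import math

# 95% credibility interval mu1 +/- 1.96*sqrt(sigma1_2) for the normal-normal model.
# All numbers are illustrative assumptions.
mu0, sigma0_2 = 0.0, 2.0   # prior mean and variance
sigma2 = 1.0               # known data variance
n, xbar = 10, 1.3          # sample size and sample mean

prec = 1.0 / sigma0_2 + n / sigma2                 # posterior precision
mu1 = (mu0 / sigma0_2 + n * xbar / sigma2) / prec  # posterior mean
sigma1 = math.sqrt(1.0 / prec)                     # posterior standard deviation

lo, hi = mu1 - 1.96 * sigma1, mu1 + 1.96 * sigma1
print(round(lo, 3), round(hi, 3))  # 0.633 1.843
```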
The frequentist confidence interval is not a probability statement about $\theta$. The Bayesian credibility interval is a probability statement about $\theta$. For the Bayesian, once the data $x$ have been observed, the interval is fixed and $\theta$ is random.
Hypothesis testing: Consider testing $H_0: \theta = \theta_0$ vs. $H_a: \theta \neq \theta_0$. For the prior distribution, we need to put prior probabilities on $H_0$ and $H_a$ and then put a prior on $\theta$ under $H_a$. If we use the following 0-1 loss function
$$L(\theta, \text{Hypothesis } i \text{ chosen}) = \begin{cases} 0 & \text{if } \theta \in H_i \\ 1 & \text{if } \theta \notin H_i, \end{cases}$$
the posterior risk is minimized by choosing the hypothesis that is more probable under the posterior distribution; thus, the Bayes rule is to choose that hypothesis. Bayesian hypothesis testing is a complex topic. The difficulty is that, unlike in estimation problems, the prior is influential even in large samples and so must be chosen carefully.
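The Bayes rule under 0-1 loss can be sketched in a few lines of Python. The prior probabilities and the likelihood values $f(x \mid H_i)$ here are made-up numbers for illustration:

```python
# Bayes rule under 0-1 loss: choose the hypothesis with the larger posterior probability.
# Prior probabilities and likelihood values are made-up illustrative numbers.
prior = {"H0": 0.5, "Ha": 0.5}          # prior probabilities on the hypotheses
likelihood = {"H0": 0.05, "Ha": 0.20}   # f(x | H_i) for the observed data x

marginal = sum(prior[h] * likelihood[h] for h in prior)
posterior = {h: prior[h] * likelihood[h] / marginal for h in prior}

chosen = max(posterior, key=posterior.get)
print(posterior, chosen)  # Ha has posterior probability 0.8 and is chosen
```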
Chapter 15 of Rice, Mathematical Statistics and Data Analysis, provides another example of Bayesian inference: Bayesian analysis for the binomial distribution.
Review of course
I. Three basic types of statistical inferences (Chapter 5):
1. Point estimation -- best estimate of parameter
2. Confidence intervals -- how much uncertainty is there in our estimate of the parameter.
3. Hypothesis testing -- choose between two hypotheses about the parameter.
II. Monte Carlo method for studying properties of
inference procedures and bootstrap method for constructing
confidence intervals based on Monte Carlo simulations
(Chapters 5.8-5.9)
III. Maximum likelihood method of making statistical
inferences and its properties (Chapter 6)
IV. Optimal point estimators: Cramer-Rao Lower Bound
(Chapter 6), sufficiency (Chapter 7.1-7.3), Rao-Blackwell
Theorem (Chapter 7.3).
V. Optimal hypothesis testing (Chapters 8.1-8.3).
VI. Decision theory (Chapter 7.1, my notes)
VII. Bayesian statistics (Chapter 11.1-11.2, my notes).