Bayesian inference for the normal distribution
Suppose that we observe a single observation $x$ from a normal distribution with unknown mean $\theta$ and known variance $\sigma^2$. Suppose that our prior distribution for $\theta$ is $N(\mu_0, \sigma_0^2)$. The posterior distribution of $\theta$ is
$$h(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{\int f(x \mid \theta)\,\pi(\theta)\,d\theta} \propto f(x \mid \theta)\,\pi(\theta).$$
Now
$$\begin{aligned}
f(x \mid \theta)\,\pi(\theta) &\propto \exp\left(-\frac{1}{2\sigma^2}(x-\theta)^2\right)\exp\left(-\frac{1}{2\sigma_0^2}(\theta-\mu_0)^2\right) \\
&= \exp\left(-\frac{1}{2}\left[\frac{1}{\sigma^2}(x-\theta)^2 + \frac{1}{\sigma_0^2}(\theta-\mu_0)^2\right]\right) \\
&= \exp\left(-\frac{1}{2}\left[\left(\frac{1}{\sigma^2}+\frac{1}{\sigma_0^2}\right)\theta^2 - 2\left(\frac{x}{\sigma^2}+\frac{\mu_0}{\sigma_0^2}\right)\theta + \frac{x^2}{\sigma^2}+\frac{\mu_0^2}{\sigma_0^2}\right]\right).
\end{aligned}$$
Let $a$, $b$, and $c$ be the coefficients in the quadratic polynomial in $\theta$ inside the exponent, so that
$$f(x \mid \theta)\,\pi(\theta) \propto \exp\left(-\frac{1}{2}\left(a\theta^2 + b\theta + c\right)\right) = \exp\left(-\frac{a}{2}\left(\theta + \frac{b}{2a}\right)^2\right)\exp\left(-\frac{1}{2}\left(c - \frac{b^2}{4a}\right)\right).$$
The second term does not depend on $\theta$, so the posterior distribution of $\theta$ is normal with mean
$$\mu_1 = -\frac{b}{2a} = \frac{\dfrac{\mu_0}{\sigma_0^2}+\dfrac{x}{\sigma^2}}{\dfrac{1}{\sigma_0^2}+\dfrac{1}{\sigma^2}}$$
and variance
$$\sigma_1^2 = \frac{1}{a} = \frac{1}{\dfrac{1}{\sigma_0^2}+\dfrac{1}{\sigma^2}}.$$
Comments about the role of the prior in the posterior distribution:
The posterior mean is a weighted average of the prior mean and the data, with weights proportional to the respective precisions of the prior and the data, where the precision is equal to 1/variance. If we assume that the experiment (the observation of $X$) is much more informative than the prior distribution in the sense that $\sigma^2 \ll \sigma_0^2$, then
$$\mu_1 \approx x \quad \text{and} \quad \sigma_1^2 \approx \sigma^2.$$
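To make the precision-weighting concrete, here is a minimal Python sketch of the single-observation posterior. The numbers $\mu_0 = 0$, $\sigma_0^2 = 4$, $\sigma^2 = 1$, $x = 2.5$ are illustrative assumptions, not values from the notes:

```python
# Posterior for a single observation x ~ N(theta, sigma2) with prior theta ~ N(mu0, sigma0_2).
# All numbers are illustrative assumptions.
mu0, sigma0_2 = 0.0, 4.0   # prior mean and variance
sigma2 = 1.0               # known data variance
x = 2.5                    # observed value

prior_prec = 1.0 / sigma0_2  # precision = 1/variance
data_prec = 1.0 / sigma2

# Posterior mean: precision-weighted average of the prior mean and the data.
mu1 = (prior_prec * mu0 + data_prec * x) / (prior_prec + data_prec)
sigma1_2 = 1.0 / (prior_prec + data_prec)

print(mu1, sigma1_2)  # 2.0 0.8
```

Since $\sigma^2 < \sigma_0^2$ here, the data precision dominates and the posterior mean (2.0) sits much closer to $x = 2.5$ than to the prior mean 0.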
Thus, when the data are much more informative than the prior, the posterior distribution is essentially determined by the likelihood. Now suppose that we observe an iid sample $X_1, \ldots, X_n$ from a normal distribution with unknown mean $\theta$, where $\sigma^2$ is known, and as before, we use the prior distribution $N(\mu_0, \sigma_0^2)$. As before, the posterior distribution is proportional to the likelihood times the prior:
$$h(\theta \mid x_1, \ldots, x_n) \propto f(x_1, \ldots, x_n \mid \theta)\,\pi(\theta).$$
From the independence of the $X_i$'s,
$$f(x_1, \ldots, x_n \mid \theta) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \theta)^2\right).$$
Using the identity
$$\sum_{i=1}^{n}(x_i - \theta)^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{x} - \theta)^2,$$
we obtain
$$f(x_1, \ldots, x_n \mid \theta) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \bar{x})^2\right) \exp\left(-\frac{1}{2\sigma^2/n}(\bar{x} - \theta)^2\right).$$
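The sum-of-squares identity used above can be verified numerically; a quick Python check, where the data values and $\theta$ are made up for illustration:

```python
# Numerical check of: sum_i (x_i - theta)^2 = sum_i (x_i - xbar)^2 + n*(xbar - theta)^2.
# The data and theta are made-up illustrative values.
xs = [1.2, 0.7, 2.1, 1.5]
theta = 0.9
n = len(xs)
xbar = sum(xs) / n

lhs = sum((x - theta) ** 2 for x in xs)
rhs = sum((x - xbar) ** 2 for x in xs) + n * (xbar - theta) ** 2
print(abs(lhs - rhs) < 1e-9)  # True
```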
Only the last term depends on $\theta$, so
$$h(\theta \mid x_1, \ldots, x_n) \propto \exp\left(-\frac{1}{2\sigma^2/n}(\bar{x} - \theta)^2\right)\pi(\theta).$$
This posterior distribution can be evaluated in the same way as the single observation case with $\bar{x}$ replacing $x$ and $\sigma^2/n$ replacing $\sigma^2$. Thus, the posterior distribution is normal with mean
$$\mu_1 = \frac{\dfrac{\mu_0}{\sigma_0^2}+\dfrac{n\bar{x}}{\sigma^2}}{\dfrac{1}{\sigma_0^2}+\dfrac{n}{\sigma^2}}$$
and variance
$$\sigma_1^2 = \frac{1}{\dfrac{1}{\sigma_0^2}+\dfrac{n}{\sigma^2}}.$$
For large values of $n$, $\mu_1 \approx \bar{x}$ and $\sigma_1^2 \approx \sigma^2/n$. Therefore, the
information in the sample largely determines the posterior
distribution for large samples.
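The large-sample behavior can be seen numerically. A Python sketch, where the prior parameters, $\sigma^2$, and the true mean are illustrative assumptions:

```python
import random

# Posterior for X_1,...,X_n iid N(theta, sigma2) with prior N(mu0, sigma0_2):
#   mu1 = (mu0/sigma0_2 + n*xbar/sigma2) / (1/sigma0_2 + n/sigma2)
#   sigma1_2 = 1 / (1/sigma0_2 + n/sigma2)
# All parameter values below are illustrative assumptions.
mu0, sigma0_2 = 0.0, 1.0
sigma2 = 4.0
true_theta = 3.0

random.seed(0)
for n in (5, 50, 5000):
    xs = [random.gauss(true_theta, sigma2 ** 0.5) for _ in range(n)]
    xbar = sum(xs) / n
    prec = 1.0 / sigma0_2 + n / sigma2   # posterior precision
    mu1 = (mu0 / sigma0_2 + n * xbar / sigma2) / prec
    sigma1_2 = 1.0 / prec
    print(n, round(mu1, 3), round(sigma1_2, 4))
# As n grows, mu1 approaches xbar and sigma1_2 approaches sigma2/n,
# so the sample dominates the prior.
```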
Bayesian inference:
Point estimation: When faced with a decision, the Bayesian wants to minimize the expected loss (i.e., maximize the expected utility) of a decision rule under the prior distribution $\pi(\theta)$ for $\theta$. In other words, the Bayesian chooses the decision rule $d$ that minimizes the Bayes risk:
$$B(d) = E_{\pi(\theta)}[R(\theta, d)],$$
i.e., the Bayesian chooses to use the Bayes rule for the Bayesian's prior distribution $\pi(\theta)$.
As we showed in Notes 26, for point estimation with
squared error loss, the Bayes rule is to use the posterior
mean as the estimate.
Thus, for the above normal distribution setup with a single observation, the Bayesian's estimate of $\theta$ is
$$\frac{\dfrac{\mu_0}{\sigma_0^2}+\dfrac{x}{\sigma^2}}{\dfrac{1}{\sigma_0^2}+\dfrac{1}{\sigma^2}}.$$
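The optimality of the posterior mean under squared error loss can be checked numerically by discretizing a posterior on a grid and minimizing the posterior expected loss. This is a sketch with an assumed posterior $N(2.0,\,0.8)$, not a value from the notes:

```python
import math

# Sketch: on a grid, the posterior expected squared-error loss E[(theta - d)^2 | x]
# is minimized at the posterior mean. The posterior N(2.0, 0.8) is an assumption.
mu1, sigma1_2 = 2.0, 0.8
grid = [mu1 - 4 + 8 * i / 400 for i in range(401)]          # grid over mu1 +/- 4
w = [math.exp(-(t - mu1) ** 2 / (2 * sigma1_2)) for t in grid]
total = sum(w)
post = [wi / total for wi in w]                              # normalized posterior weights

def risk(d):
    """Posterior expected squared-error loss of the estimate d."""
    return sum(p * (t - d) ** 2 for p, t in zip(post, grid))

best = min(grid, key=risk)
print(best)  # 2.0, the posterior mean
```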
Interval estimation: A Bayesian version of a confidence interval is called a credibility interval. A $100(1-\alpha)\%$ credibility interval is an interval of the form $(\theta_0, \theta_1)$, where
$$\int_{\theta_0}^{\theta_1} h(\theta \mid x)\,d\theta = 1 - \alpha.$$
For example, for $X_1, \ldots, X_n$ iid $N(\theta, \sigma^2)$ where $\sigma^2$ is known and the prior distribution is $N(\mu_0, \sigma_0^2)$, the posterior distribution for $\theta$ is
$$N\left(\frac{\dfrac{\mu_0}{\sigma_0^2}+\dfrac{n\bar{x}}{\sigma^2}}{\dfrac{1}{\sigma_0^2}+\dfrac{n}{\sigma^2}},\; \frac{1}{\dfrac{1}{\sigma_0^2}+\dfrac{n}{\sigma^2}}\right),$$
and a 95% credibility interval for $\theta$ is
$$\frac{\dfrac{\mu_0}{\sigma_0^2}+\dfrac{n\bar{x}}{\sigma^2}}{\dfrac{1}{\sigma_0^2}+\dfrac{n}{\sigma^2}} \pm 1.96\sqrt{\frac{1}{\dfrac{1}{\sigma_0^2}+\dfrac{n}{\sigma^2}}}.$$
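A minimal Python sketch of this 95% credibility interval; the prior parameters, $n$, and $\bar{x}$ below are illustrative assumptions:

```python
import math

# 95% credibility interval mu1 +/- 1.96*sqrt(sigma1_2) for the normal-normal model.
# All numbers are illustrative assumptions.
mu0, sigma0_2 = 0.0, 2.0   # prior mean and variance
sigma2 = 1.0               # known data variance
n, xbar = 10, 1.3          # sample size and sample mean

prec = 1.0 / sigma0_2 + n / sigma2                 # posterior precision
mu1 = (mu0 / sigma0_2 + n * xbar / sigma2) / prec  # posterior mean
sigma1 = math.sqrt(1.0 / prec)                     # posterior standard deviation

lo, hi = mu1 - 1.96 * sigma1, mu1 + 1.96 * sigma1
print(round(lo, 3), round(hi, 3))  # 0.633 1.843
```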
The frequentist confidence interval is not a probability statement about $\theta$. The Bayesian credibility interval is a probability statement about $\theta$. For the Bayesian, once the data $x$ have been observed, the interval is fixed and $\theta$ is random.
Hypothesis testing: Consider testing $H_0: \theta = \theta_0$ vs. $H_a: \theta \neq \theta_0$. For the prior distribution, we need to put prior probabilities on $H_0$ and $H_a$ and then put a prior on $\theta$ under $H_a$. If we use the following 0-1 loss function
$$L(\theta, \text{Hypothesis } i \text{ chosen}) = \begin{cases} 0 & \text{if } \theta \in H_i \\ 1 & \text{if } \theta \notin H_i, \end{cases}$$
the posterior risk is minimized by choosing the hypothesis that is more probable under the posterior distribution; thus, the Bayes rule is to choose that hypothesis. Bayesian hypothesis testing is a complex topic. The difficulty is that, unlike in estimation problems, the prior is influential even in large samples and so must be chosen carefully.
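The Bayes rule under 0-1 loss can be sketched in a few lines of Python. The prior probabilities and the likelihood values $f(x \mid H_i)$ here are made-up numbers for illustration:

```python
# Bayes rule under 0-1 loss: choose the hypothesis with the larger posterior probability.
# Prior probabilities and likelihood values are made-up illustrative numbers.
prior = {"H0": 0.5, "Ha": 0.5}          # prior probabilities on the hypotheses
likelihood = {"H0": 0.05, "Ha": 0.20}   # f(x | H_i) for the observed data x

marginal = sum(prior[h] * likelihood[h] for h in prior)
posterior = {h: prior[h] * likelihood[h] / marginal for h in prior}

chosen = max(posterior, key=posterior.get)
print(posterior, chosen)  # Ha has posterior probability 0.8 and is chosen
```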
Chapter 15 of Rice, Mathematical Statistics and Data Analysis, provides another example of Bayesian inference: Bayesian analysis for the binomial distribution.
Review of course
I. Three basic types of statistical inferences (Chapter 5):
1. Point estimation -- best estimate of parameter
2. Confidence intervals -- how much uncertainty is there in our estimate of the parameter.
3. Hypothesis testing -- choose between two hypotheses about the parameter.
II. Monte Carlo method for studying properties of
inference procedures and bootstrap method for constructing
confidence intervals based on Monte Carlo simulations
(Chapters 5.8-5.9)
III. Maximum likelihood method of making statistical
inferences and its properties (Chapter 6)
IV. Optimal point estimators: Cramer-Rao Lower Bound
(Chapter 6), sufficiency (Chapter 7.1-7.3), Rao-Blackwell
Theorem (Chapter 7.3).
V. Optimal hypothesis testing (Chapters 8.1-8.3).
VI. Decision theory (Chapter 7.1, my notes)
VII. Bayesian statistics (Chapter 11.1-11.2, my notes).