Notes 27

Statistics 512 Notes 27: Bayesian Statistics
Views of Probability
The probability that this coin will land heads up is $\frac{1}{2}$.
Frequentist (sometimes called objectivist) viewpoint: This
statement means that if the experiment were repeated
many, many times, the long-run average proportion of
heads would tend to $\frac{1}{2}$.
Bayesian (sometimes called subjectivist or personal)
viewpoint: This statement means that the person making
the statement has a prior opinion about the coin toss such
that he or she would as soon guess heads or tails if the
rewards are equal.
In the frequentist viewpoint, the probability of an event $A$,
$P(A)$, represents the long-run frequency of event $A$ in
repeated experiments.
In the Bayesian viewpoint, the probability of an event $A$,
$P(A)$, has the following meaning: for a game in which the
Bayesian will be paid one dollar if $A$ occurs, $P(A)$ is the
amount of money the Bayesian would be willing to pay to buy into
the game. Thus, if the Bayesian is willing to pay 50 cents
to buy in, $P(A) = 0.5$. Note that this concept of probability is
personal: $P(A)$ may vary from person to person depending
on their opinions.
In the Bayesian viewpoint, we can make probability
statements about lots of things, not just data which are
subject to random variation. For example, I might say that
the probability that Franklin D. Roosevelt had a cup of
coffee on February 21, 1935 is .68. This does not refer to
any limiting frequency. It reflects my strength of belief
that the proposition is true.
Rules for Manipulating Subjective Probabilities
All the usual rules for manipulating probabilities apply to
subjective probabilities. For example,
Theorem 11.1: If $C_1$ and $C_2$ are mutually exclusive, then
$P(C_1 \cup C_2) = P(C_1) + P(C_2)$.
Proof: Suppose a person thinks a fair price for $C_1$ is
$p_1 = P(C_1)$ and that for $C_2$ is $p_2 = P(C_2)$. However, that
person believes that the fair price for $C_1 \cup C_2$ is $p_3$, which
differs from $p_1 + p_2$. Say $p_3 < p_1 + p_2$ and let the
difference be $d = (p_1 + p_2) - p_3$. A gambler offers this
person the price $p_3 + \frac{d}{4}$ for $C_1 \cup C_2$. The person takes the
offer because it is better than $p_3$. The gambler sells $C_1$ at a
discount price of $p_1 - \frac{d}{4}$ and sells $C_2$ at a discount price of
$p_2 - \frac{d}{4}$ to the person. To a rational person with those
given prices of $p_1$, $p_2$, and $p_3$, all three of these deals seem
very satisfactory. At this point, the person has received
$p_3 + \frac{d}{4}$ and paid $p_1 + p_2 - \frac{d}{2}$. Thus before any bets are
paid off, the person has
$$p_3 + \frac{d}{4} - \left(p_1 + p_2 - \frac{d}{2}\right) = p_3 - p_1 - p_2 + \frac{3d}{4} = -\frac{d}{4}.$$
That is, the person is down $\frac{d}{4}$ before any bets are settled.
We now show that no matter what event happens, the
person will pay and receive the same amount in settling the
bets:
Suppose $C_1$ happens: the gambler has $C_1 \cup C_2$ and the
person has $C_1$, so they exchange one dollar each and the person is
still down $\frac{d}{4}$. The same thing occurs if $C_2$ happens.
Suppose neither $C_1$ nor $C_2$ happens: then the gambler
and the person receive zero, and the person is still
down $\frac{d}{4}$.
$C_1$ and $C_2$ cannot occur together since they are
mutually exclusive.
Thus, we see that it is bad for the person to assign
$$p_3 = P(C_1 \cup C_2) < p_1 + p_2 = P(C_1) + P(C_2)$$
because the gambler can put the person in a position to lose
$(p_1 + p_2 - p_3)/4$ no matter what happens. This is
sometimes referred to as a Dutch book.
The argument when $p_3 > p_1 + p_2$ is similar and
can also lead to a Dutch book. Thus $p_3$ must equal $p_1 + p_2$
to avoid a Dutch book; that is,
$P(C_1 \cup C_2) = P(C_1) + P(C_2)$.
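The bookkeeping in the proof can be checked numerically. The following is a minimal sketch, not part of the original notes: the function name `dutch_book_outcomes` and the prices 0.3, 0.4 and 0.5 are hypothetical, and it covers only the case where the price of the union is below the sum of the two prices.

```python
def dutch_book_outcomes(p1, p2, p3):
    """Net position of a person whose prices satisfy p3 < p1 + p2.

    The person sells a $1 bet on (C1 or C2) for p3 + d/4 and buys $1 bets
    on C1 and on C2 for p1 - d/4 and p2 - d/4, where d = p1 + p2 - p3.
    Returns the person's net gain under each possible outcome.
    """
    d = (p1 + p2) - p3
    if d <= 0:
        raise ValueError("this sketch covers only the case p3 < p1 + p2")

    received = p3 + d / 4                 # from selling the bet on C1 or C2
    paid = (p1 - d / 4) + (p2 - d / 4)    # for buying the bets on C1 and C2
    upfront = received - paid             # equals -d/4

    outcomes = {}
    # C1 and C2 are mutually exclusive, so at most one of them occurs.
    for name, c1, c2 in [("C1", 1, 0), ("C2", 0, 1), ("neither", 0, 0)]:
        collects = c1 + c2                # payoff on the bets the person bought
        owes = 1 if (c1 or c2) else 0     # payoff owed on the bet the person sold
        outcomes[name] = upfront + collects - owes
    return outcomes


# Hypothetical prices with p3 < p1 + p2, so d = 0.2 and the sure loss is d/4.
result = dutch_book_outcomes(0.3, 0.4, 0.5)
for outcome, net in result.items():
    print(f"{outcome}: net = {net:+.4f}")
```

With these prices the person ends up with the same negative net position under every outcome, which is the sure loss that defines a Dutch book.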
The Bayesian can consider subjective conditional
probabilities, such as $P(C_1 \mid C_2)$, which is the fair price of
$C_1$ only if $C_2$ is true. If $C_2$ is not true, the bet is off. Of
course, $P(C_1 \mid C_2)$ could differ from $P(C_1)$. To illustrate,
say $C_2$ is the event that it will rain today and $C_1$ is the
event that a certain person who will be outside on that day
will catch a cold. Most of us would probably assign the
fair prices so that $P(C_1) < P(C_1 \mid C_2)$.
Consequently, a person has a better chance of getting a cold
on a rainy day.
Frequentist vs. Bayesian statistics
The frequentist point of view towards statistics is based on
the following postulates:
F1: Probability refers to limiting relative frequencies.
Probabilities are objective properties of the real world.
F2: Parameters are fixed, unknown constants.
Because they are not fluctuating, no useful probability
statements can be made about parameters.
F3: Statistical procedures should be designed to have
well-defined long run frequency properties. For
example, a 95 percent confidence interval should trap
the true value of the parameter with limiting frequency
at least 95 percent.
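The long-run frequency property in F3 can be illustrated by simulation. This is a rough sketch under assumptions that are not in the notes: normal data with known standard deviation, the usual z-interval for the mean, and hypothetical values for the true mean, sample size and number of repetitions.

```python
import random


def coverage(mu=0.0, sigma=1.0, n=25, reps=4000, seed=0):
    """Fraction of repeated experiments whose 95% z-interval contains mu."""
    rng = random.Random(seed)
    z = 1.959964                      # 97.5th percentile of the standard normal
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        # One repetition of the experiment: n draws, then the sample mean.
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps


print(f"empirical coverage: {coverage():.3f}")
```

The parameter stays fixed throughout; only the interval changes from repetition to repetition, which is the sense in which the 95 percent statement is about the procedure.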
The Bayesian approach to statistics is based on the
following postulates:
B1: Probability describes a person's degree of belief,
not limiting frequency.
B2: We can make probability statements about
parameters that reflect our degree of belief about
the parameters, even though the parameters are fixed
constants.
B3: We can make inferences about a parameter $\theta$ by
producing a probability distribution for $\theta$. Inferences,
such as point estimates and interval estimates, may
then be extracted from this distribution.
Bayesian inference
Bayesian inference about a parameter $\theta$ is usually carried
out in the following way:
1. We choose a probability density $\pi(\theta)$, called the prior
distribution, that expresses our beliefs about the parameter $\theta$
before we see any data.
2. We choose a probability model $f(x \mid \theta)$ that reflects our
beliefs about $x$ given $\theta$.
3. After observing data $X_1, \ldots, X_n$, we update our beliefs
and calculate the posterior distribution $h(\theta \mid X_1, \ldots, X_n)$.
As in our discussion of the Bayesian approach in decision
theory, the posterior distribution is calculated using Bayes'
rule:
$$h(\theta \mid x) = \frac{f_{X,\theta}(x, \theta)}{f_X(x)} = \frac{f(x \mid \theta)\,\pi(\theta)}{\int f(x \mid \theta)\,\pi(\theta)\,d\theta}$$
Note that $h(\theta \mid x) \propto f(x \mid \theta)\,\pi(\theta)$ as $\theta$ varies, so that the
posterior distribution is proportional to the likelihood times
the prior.
Based on the posterior distribution, we can get a point
estimate, an interval estimate and carry out hypothesis tests
as we shall discuss below.
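The "likelihood times prior, then normalize" recipe can be sketched on a grid of parameter values. This example is an illustration only: the coin-tossing data (7 heads in 10 tosses), the uniform prior, and the helper name `grid_posterior` are assumptions that do not appear in the notes.

```python
def grid_posterior(likelihood, prior, grid):
    """Posterior weights on a grid: likelihood * prior, rescaled to sum to 1."""
    unnormalized = [likelihood(t) * prior(t) for t in grid]
    total = sum(unnormalized)          # plays the role of the integral in Bayes' rule
    return [u / total for u in unnormalized]


# Hypothetical data: 7 heads in 10 tosses, parameter = probability of heads.
heads, n = 7, 10
grid = [(i + 0.5) / 1000 for i in range(1000)]    # midpoints covering (0, 1)

posterior = grid_posterior(
    likelihood=lambda t: t ** heads * (1 - t) ** (n - heads),
    prior=lambda t: 1.0,                          # uniform prior on (0, 1)
    grid=grid,
)

# A point estimate extracted from the posterior: the posterior mean.
posterior_mean = sum(t * w for t, w in zip(grid, posterior))
print(f"posterior mean: {posterior_mean:.4f}")
```

The returned values are probability weights on the grid points rather than density values. For this particular prior and likelihood the exact posterior is a Beta(8, 4) distribution with mean 8/12, which the grid approximation should reproduce closely.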
Bayesian inference for the normal distribution

Suppose that we observe a single observation $x$ from a
normal distribution with unknown mean $\mu$ and known
variance $\sigma^2$. Suppose that our prior distribution for $\mu$
is $N(\mu_0, \sigma_0^2)$.
The posterior distribution of $\mu$ is
$$h(\mu \mid x) = \frac{f(x \mid \mu)\,\pi(\mu)}{\int f(x \mid \mu)\,\pi(\mu)\,d\mu} \propto f(x \mid \mu)\,\pi(\mu)$$

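Now
$$\begin{aligned}
f(x \mid \mu)\,\pi(\mu) &= \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2\sigma^2}(x-\mu)^2\right] \cdot \frac{1}{\sigma_0\sqrt{2\pi}} \exp\left[-\frac{1}{2\sigma_0^2}(\mu-\mu_0)^2\right] \\
&\propto \exp\left[-\frac{1}{2\sigma^2}(x-\mu)^2 - \frac{1}{2\sigma_0^2}(\mu-\mu_0)^2\right] \\
&= \exp\left[-\frac{1}{2}\left\{\mu^2\left(\frac{1}{\sigma^2}+\frac{1}{\sigma_0^2}\right) - 2\mu\left(\frac{x}{\sigma^2}+\frac{\mu_0}{\sigma_0^2}\right) + \left(\frac{x^2}{\sigma^2}+\frac{\mu_0^2}{\sigma_0^2}\right)\right\}\right]
\end{aligned}$$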
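Let $a$, $b$, and $c$ be the coefficients in the quadratic
polynomial in $\mu$ that is the last expression, that is,
$a = \frac{1}{\sigma^2} + \frac{1}{\sigma_0^2}$, $b = \frac{x}{\sigma^2} + \frac{\mu_0}{\sigma_0^2}$, and $c = \frac{x^2}{\sigma^2} + \frac{\mu_0^2}{\sigma_0^2}$,
so that the polynomial is $a\mu^2 - 2b\mu + c$. The last
expression may then be written as
$$\exp\left[-\frac{a}{2}\left(\mu^2 - \frac{2b}{a}\mu + \frac{c}{a}\right)\right]$$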
To simplify this further, we use the technique of
completing the square and rewrite the expression as
$$\exp\left[-\frac{a}{2}\left(\mu - \frac{b}{a}\right)^2\right] \exp\left[-\frac{a}{2}\left(\frac{c}{a} - \left(\frac{b}{a}\right)^2\right)\right]$$

The second term does not depend on $\mu$ and we thus have
that
$$h(\mu \mid x) \propto \exp\left[-\frac{a}{2}\left(\mu - \frac{b}{a}\right)^2\right]$$
This is proportional to the density of a normal random variable
with mean $\frac{b}{a}$ and variance $\frac{1}{a}$.
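Thus, the posterior distribution of $\mu$ is normal with mean
$$\mu_1 = \frac{\frac{\mu_0}{\sigma_0^2} + \frac{x}{\sigma^2}}{\frac{1}{\sigma_0^2} + \frac{1}{\sigma^2}}$$
and variance
$$\sigma_1^2 = \frac{1}{\frac{1}{\sigma_0^2} + \frac{1}{\sigma^2}}.$$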
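The update for a single normal observation is short enough to write out directly. The sketch below is an illustration only: the function name `normal_posterior` and the numerical inputs are hypothetical, and the code simply evaluates the precision-weighted formulas for the posterior mean and variance.

```python
def normal_posterior(x, sigma2, mu0, sigma0_2):
    """Posterior mean and variance of a normal mean after one observation.

    x        : the single observation
    sigma2   : known variance of the observation
    mu0      : prior mean
    sigma0_2 : prior variance
    """
    a = 1 / sigma2 + 1 / sigma0_2        # posterior precision
    b = x / sigma2 + mu0 / sigma0_2      # precision-weighted sum of data and prior mean
    return b / a, 1 / a                  # (posterior mean, posterior variance)


# Hypothetical inputs: observation 2.0 with variance 1, prior N(0, 4).
mu1, var1 = normal_posterior(x=2.0, sigma2=1.0, mu0=0.0, sigma0_2=4.0)
print(f"posterior mean = {mu1:.3f}, posterior variance = {var1:.3f}")
```

With these inputs the posterior precision is 1.25, so the posterior mean is 2/1.25 = 1.6 and the posterior variance is 0.8. The mean lies between the prior mean and the observation, closer to the observation because the data are more precise than the prior.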
Comments about role of prior in the posterior distribution:
The posterior mean is a weighted average of the prior mean
and the data, with weights proportional to the respective
precisions of the prior and the data, where the precision is
equal to 1/variance. If we assume that the experiment (the
observation of X ) is much more informative than the prior
distribution in the sense that $\sigma^2 \ll \sigma_0^2$, then
$$\sigma_1^2 \approx \sigma^2, \qquad \mu_1 \approx x.$$
Thus, the posterior distribution of $\mu$ is nearly normal with
mean $x$ and variance $\sigma^2$. This result illustrates that if the
prior distribution is quite flat relative to the likelihood, then
1. the prior distribution has little influence on the
posterior
2. the posterior distribution is approximately
proportional to the likelihood function.
On a heuristic level, the first point says that if one does not
have strong prior opinions, one's posterior opinion is
mainly determined by the data one observes. Such a prior
distribution is often called a vague or noninformative prior.
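The effect of a flat prior can be seen by letting the prior variance grow while the data stay fixed. This is a small sketch with hypothetical numbers; the helper `normal_posterior` is defined here for illustration and implements the normal posterior mean and variance formulas.

```python
def normal_posterior(x, sigma2, mu0, sigma0_2):
    """Posterior mean and variance for one normal observation with known variance."""
    precision = 1 / sigma2 + 1 / sigma0_2
    mean = (x / sigma2 + mu0 / sigma0_2) / precision
    return mean, 1 / precision


# Hypothetical setup: observation 2.0 with variance 1, prior mean 0.
x, sigma2, mu0 = 2.0, 1.0, 0.0

rows = []
for sigma0_2 in (1.0, 10.0, 100.0, 10000.0):     # increasingly flat priors
    mean, var = normal_posterior(x, sigma2, mu0, sigma0_2)
    rows.append((sigma0_2, mean, var))
    print(f"prior variance {sigma0_2:>8.0f}: posterior mean {mean:.4f}, variance {var:.4f}")
```

As the prior variance increases, the posterior mean moves toward the observation and the posterior variance moves toward the data variance, so the prior mean of 0 matters less and less.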
Bayesian decisionmaking:
When faced with a decision, the Bayesian wants to
minimize the expected loss (i.e., maximize the expected
utility) of a decision rule under the prior distribution $\pi(\theta)$
for $\theta$. In other words, the Bayesian chooses the decision
rule $d$ that minimizes the Bayes risk:
$$B(d) = E_{\pi(\theta)}[R(\theta, d)],$$
i.e., the Bayesian chooses to use the Bayes rule for the
Bayesian's prior distribution $\pi(\theta)$.
As we showed in the last class, for point estimation with
squared error loss, the Bayes rule is to use the posterior
mean as the estimate.
Thus, for the above normal distribution setup, the
Bayesian's estimate of $\mu$ is
$$\frac{\frac{\mu_0}{\sigma_0^2} + \frac{x}{\sigma^2}}{\frac{1}{\sigma_0^2} + \frac{1}{\sigma^2}}$$
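The claim that the posterior mean is the best estimate under squared error loss can be checked numerically for one case. The sketch below is an illustration under assumptions: a normal posterior with mean 1.6 and variance 0.8 (hypothetical values), a simple grid approximation to the posterior expectation, and a hypothetical helper name `expected_sq_loss`.

```python
import math


def expected_sq_loss(t, mean, var, points=2001, width=8.0):
    """Approximate posterior expected squared error of the estimate t.

    The posterior is taken to be normal(mean, var); the expectation is
    approximated on an evenly spaced grid covering mean +/- width sd.
    """
    sd = math.sqrt(var)
    lo = mean - width * sd
    step = 2 * width * sd / (points - 1)
    grid = [lo + i * step for i in range(points)]
    weights = [math.exp(-((g - mean) ** 2) / (2 * var)) for g in grid]
    total = sum(weights)
    return sum(w * (g - t) ** 2 for g, w in zip(grid, weights)) / total


# Hypothetical posterior: normal with mean 1.6 and variance 0.8.
post_mean, post_var = 1.6, 0.8

candidates = [1.0 + 0.01 * k for k in range(121)]     # estimates from 1.00 to 2.20
best = min(candidates, key=lambda t: expected_sq_loss(t, post_mean, post_var))
print(f"estimate with the smallest expected loss: {best:.2f}")
```

The expected loss equals the posterior variance plus the squared distance between the estimate and the posterior mean, so the search should settle on the candidate closest to 1.6.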
