Bayesian inference
Use R
Tyrannies and Frontiers
Three tyrannies:
(1) Popperism, (2) Fisher, (3) Gauss
Three frontiers:
(1) Bayes and MCMC,
(2) Generalized linear multilevel models,
(3) Formal model comparison
Bayesian inference and MCMC
Computationally difficult
Used to be controversial
Laplace (1749–1827)
Multilevel models
Free
Estimate posterior
Simulate predictions
Ten tosses of the globe:
W L W W L W L L W W
D = {W, L, W, W, L, W, L, L, W, W}
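A quick R sketch of this sample (added here, not from the source; the variable names are illustrative):

tosses <- c( "W","L","W","W","L","W","L","L","W","W" )  # the ten observed tosses
n <- length( tosses )        # total tosses: 10
n.W <- sum( tosses=="W" )    # observed waters: 6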
Hypotheses and Models
Each value of p_W is a different model, M_{p_W}:
p_W = 0.23
p_W = 0.97
A measure of confidence
Probability p_W is best model in set, conditional on the evidence, D.
Not probability p_W is true. All models are false.
Here's the quickest way to derive a formula for Pr(M_{p_W} | D). You really should pay attention to this derivation, because working through it yourself will help you remember the formula for Pr(M_{p_W} | D). And if you ever forget the exact formula, you can just find it again, because the derivation really is that simple. Knowing how simple it is to produce this formula will also serve to impress upon you just how basic it is to probability theory. It's not a fancy result at all, despite its importance.

The probability Pr(M_{p_W} | D) is a conditional probability. It says that the probability of M_{p_W} is conditioned on D. The definition of conditional probability is embedded in a common probability rule:

Pr(M_{p_W}, D) = Pr(M_{p_W} | D) Pr(D).    (2.1)

Read Pr(M_{p_W}, D) as "the probability of both M_{p_W} and D." The above is true, no matter what M_{p_W} and D reference. So it's just as true that the inverse probability, Pr(D | M_{p_W}), is defined by:

Pr(M_{p_W}, D) = Pr(D | M_{p_W}) Pr(M_{p_W}).    (2.2)

Now since both (2.1) and (2.2) contain Pr(M_{p_W}, D), it must be true that:

Pr(M_{p_W} | D) Pr(D) = Pr(M_{p_W}, D) = Pr(D | M_{p_W}) Pr(M_{p_W}).    (2.3)

Taking out the middle man:

Pr(M_{p_W} | D) Pr(D) = Pr(D | M_{p_W}) Pr(M_{p_W}).    (2.4)

All that remains is to use a little secondary school algebra and solve Equation 2.4 for Pr(M_{p_W} | D). Doing this, you arrive at the formula for computing the degree of confidence in model M_{p_W}, conditioned on the data, D:

Pr(M_{p_W} | D) = Pr(D | M_{p_W}) Pr(M_{p_W}) / Pr(D).    (2.5)

Using the same formula for each model corresponding to each value of p_W, you arrive at the posterior distribution.

2.2.5. Bayes' theorem: A conditioning engine. The resulting Equation 2.5 is usually referred to as Bayes' theorem. It tells us that in order to compute our measure of confidence in a model M_{p_W}, conditioned on the evidence D, we need to compute:

(1) The likelihood, Pr(D | M_{p_W}), which is the probability of observing D, conditioned on M_{p_W} generating the data.

(2) The probability of the evidence, Pr(D), which is the probability of observing the data D, averaged over all models we'd like to consider.
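As a concrete illustration of Equation 2.5 (a minimal sketch, not code from the source; the grid of candidate models and the flat prior are illustrative assumptions), the posterior for the globe-tossing sample can be computed directly in R:

p.grid <- seq( from=0 , to=1 , length.out=101 )   # grid of candidate models M_{p_W}
prior <- rep( 1/101 , 101 )                       # assumed flat prior over the models
lik <- dbinom( 6 , size=10 , prob=p.grid )        # (1) the likelihood Pr(D|M_{p_W})
pr.D <- sum( lik * prior )                        # (2) the evidence Pr(D), an averaged likelihood
posterior <- lik * prior / pr.D                   # Equation 2.5
sum( posterior )                                  # equals 1: a proper distribution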
[Figure: five panels of example probability distributions; horizontal axis p_W (0.00 to 1.00), vertical axis confidence.]
Figure 2.1. Example probability distributions Pr(M_{p_W} | D). These are commonly known as posterior probability densities. These distributions may be wide or narrow, reflecting differences in uncertainty across models. They may also be skewed or contain more than one peak.

Computing confidence in models. Okay, so how do you compute these probabilities, Pr(M_{p_W} | D)? Well, since we are talking about a probability, you can just use the axioms of probability theory to define it. In this definition, there will be pieces that tell us what information we require in order to complete the analysis.
Figure 2.1. Example posterior probability distributions.
Bayes' theorem
Focus on p_W = 0.7:
Likelihood
(1) Each toss of the globe is independent of all others, in the sense that the outcomes do not influence one another.
(2) The only possible results of a toss are "water" or "land."
(3) The probability each toss results in a "water" rather than a "land" is constant across tosses and equal to p_W.

These assumptions imply a familiar probability density, the binomial. Let n_W be the observed count of W's ("water") in the sample and n be the total number of tosses of the globe (the size of the sample). These two values summarize the data, at least as far as the assumptions above require. If we were to assume that tosses were not independent, but instead influenced one another in order, then we'd need to know the exact sequence of W's and L's. But sticking with the simple binomial model, the probability of obtaining n_W out of n tosses is given by:

Pr(n_W | n, p_W) = L(p_W | n, n_W) = n! / (n_W! (n − n_W)!) × p_W^{n_W} (1 − p_W)^{n − n_W}.

This expression is known as a likelihood function. Likelihood functions are often represented by notation like L(p_W | n, n_W), and most people speak of "the likelihood of the model" or "the likelihood of the parameter." But keep in mind that likelihoods are not probabilities of models or parameters, Pr(M | D). Instead, they are probabilities of data, conditioned on models, Pr(D | M). It is conventional to write L(M | D) or, using a fancy "L," 𝓛(M | D), but this doesn't change the fact about what the probability really references: observations.

To use the likelihood formula, you have to assume values of the parameters, like p_W, and then plug in the data. For example, using the sample from earlier in the chapter, if we toss the globe 10 times and observe 6 waters and 4 lands, the formula becomes:

Pr(6 | 10, p_W) = 10! / (6! 4!) × p_W^6 (1 − p_W)^4.

Now suppose that p_W = 0.7, then we arrive at:

Pr(6 | 10, 0.7) = 10! / (6! 4!) × 0.7^6 (1 − 0.7)^4 ≈ 0.2.

Read the left side as "the probability of observing 6 W and 4 L, given that the probability of water is 0.7." The right side is just the binomial density, with the values plugged in. If you want to evaluate this formula, it is built in to R, so just execute:

R code 2.1
dbinom( 6 , size=10 , prob=0.7 )
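As a quick check (an added aside, not in the source), the formula can be evaluated term by term and compared against the built-in density:

choose( 10 , 6 ) * 0.7^6 * (1-0.7)^4   # 10!/(6!4!) * 0.7^6 * 0.3^4
dbinom( 6 , size=10 , prob=0.7 )       # same value: 0.2001209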
[Figure: the likelihood curve; horizontal axis "proportion of water" (0.0 to 1.0), vertical axis "probability of data" (0.00 to 0.20).]

Figure. The probability of observing 6 waters and 4 lands in 10 samples from our globe (vertical axis), for every possible true proportion of water (horizontal axis).

[1] 0.2001209

Read that line of code as, "compute the probability of observing 6 W in 10 tosses, where each toss has a probability of 0.7 of being W."

One of the virtues of a statistical environment like R is that it makes it easy for you to calculate every likelihood over the range of p_W from 0 to 1, holding the data n_W, n constant. This code will plot them all:

R code 2.2
curve( dbinom( 6 , size=10 , prob=x ) , from=0 , to=1 )

I reproduce this plot in the figure above. The reader really should execute this code in R. Do this now. You don't have to understand the code yet. But you'll get a better feel for what we are calculating, if you type the above line of code into R.

What does the figure tell us? Every possible value of p_W implies a probability of the data, 6 W's and 4 L's. The height of the curve at each point is that probability, or likelihood. First, realize that each value of p_W corresponds to a different statistical model. Each model corresponds to a different hypothesis about the true proportion of water covering the planet. Each model can be used as above to help us decide how likely our data are, assuming the model is true. Thus each model produces a likelihood, and these are what are plotted in the figure. If the observed data are very unlikely, for some value of p_W, then the posterior probability will also be lower for that value of p_W. In this case, as in many cases, there is a unique value of p_W that maximizes the likelihood. This value is p_W = 0.6, which is the same as the sample proportion of water, 6/10 = 0.6.
Probability of data

Pr(D) = ∫ Pr(D | M_{p_W}) Pr(M_{p_W}) dp_W.

All this really says is to multiply each likelihood by its corresponding prior and then add up all of these products. This results in a weighted average likelihood. This sort of probability is often called a marginal likelihood, and we'll meet it again in Chapter 5 and explain why at that point.

More important for now is to appreciate the job it does. The role of this probability of evidence inside the conditioning engine, Bayes' theorem, is to normalize the posterior, so that it always sums to one. This ensures that the posterior is a coherent probability density, just like the prior. The relative magnitudes of the different models under consideration will not change, whatever value you assign to Pr(D), because Pr(D) is the same for every model.
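To see that normalization leaves relative magnitudes untouched, here is a small sketch (illustrative values, not from the source) comparing three candidate models:

lik <- dbinom( 6 , size=10 , prob=c(0.5, 0.6, 0.7) )   # likelihoods of three models
post <- lik / sum( lik )                                # normalize with Pr(D) = sum
lik / lik[2]     # relative magnitudes before normalizing
post / post[2]   # identical ratios after normalizing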
Estimating the posterior
1. Analytical approach (often impossible)
2. Grid approximation (very intensive)
3. Markov Chain Monte Carlo (less intensive)
4. Maximum likelihood and quadratic
approximation (approximate)
Grid approximation
Complications:
Easier to use
More accurate
Heuristic search
p_W = 0.6. The bottom shows these functions when the maximum likelihood estimate is instead p_W = 0.1. The left plot in each row shows the raw likelihood scale, while the right plot shows the same functions on the log-likelihood scale, making it much easier to appreciate the different curvatures as a function of sample size.
Figure 2.8. Shape of likelihood and confidence.
2. Standard errors
Standard error:
If posterior were normal in shape, then standard
deviation of that normal density.
Start with the normal likelihood:

L = 1/√(2πσ²) × exp( −(x − μ)² / (2σ²) ),

where x is an observed value, μ is the mean, and σ² is the variance. We're treating this as the likelihood of the observed mean x, when the true mean is μ and the variance of the likelihood function is σ². (If you don't recognize this formula, or don't understand where it comes from, that's okay. You'll get a crash course in this important density in a later chapter.) The log-likelihood is then:

log L = −(x − μ)²/(2σ²) − (1/2) log(2π) − log(σ).

Now to compute the curvature. The derivative with respect to x is:

∂ log L/∂x = −(x − μ)/σ².

And finally the second derivative, yielding the curvature, is:

∂² log L/∂x² = −1/σ².

Solving for σ², the variance of the normal probability density, we get:

σ² = 1 / ( −∂² log L/∂x² ).

In other words, the variance of the posterior is approximately (if the posterior is approximately normal) the reciprocal of the second derivative of the negative log-likelihood. So if we can estimate the second derivative of the negative log-likelihood, all we have to do is divide one by it to get an estimate of the variance of the posterior distribution.

Back to our proportion of water estimate, p_W = 0.6. We computed the curvature of the log-likelihood at the maximum likelihood estimate to be about −41.667. So it follows then that our estimate of the variance of the posterior distribution for p_W is:

1/41.667 ≈ 0.024.

Most functions in R use the standard deviation of a normal, σ, instead of the variance, σ². The standard deviation is just the square root of the variance, σ = √σ². The standard deviation here is therefore about 0.1549, which is the standard error from the coefficients table produced by summary(pw.mle2). So if we want to find the values of p_W that enclose 95% of the probabilities of models, then we just ask R:
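The R code that followed is not recoverable from this extract; a minimal sketch of what it plausibly computes, using the normal quantiles implied by the estimates above (mean 0.6, standard deviation 0.1549):

qnorm( c(0.025, 0.975) , mean=0.6 , sd=0.1549 )   # 95% interval, roughly 0.296 to 0.904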
Quadratic approximation
Buyer beware:
Visualize uncertainty
Simulate observations
Recipe:
1. Estimate posterior, defining probabilities of
different models (parameter values)
2. Sample with replacement from posterior
3. Compute stuff from samples
Sampling from the posterior
(2) Sample repeatedly from the posterior distribution. You end up with a list of parameter values. If you sample enough, the proportion of each parameter value in this list will converge to the posterior probability.

(3) Finally, use the samples from the posterior to make calculations.

The main tasks we'll use these samples for are to calculate confidence intervals and to simulate sampling of new data.

In simple one-parameter cases, like the proportion of water problem, all this sampling may seem unnecessary. It's relatively easy to get exact probability statements for such simple formal distributions. But in later chapters, when there are many parameters in the models, it won't be so easy. In those cases, the posterior densities of different parameters will be correlated with one another. Then accurately reflecting the joint impact of these correlations on uncertainty is usually well beyond the typical mathematical training of a natural or social scientist. However, once you learn the sampling approach, it extends into more complex models quite easily. And again, once you learn this stuff, it prepares you automatically to analyze MCMC output.

2.5.2.2. Sampling proportions of water. We need an example to make more sense of this. In the proportion of water example, we can compute the naive posterior by using the likelihood function. You'll compute it at intervals of 0.001, using code you're familiar with by now:

R code 2.24
models <- seq( from=0 , to=1 , by=0.001 )
post <- dbinom( 6 , size=10 , prob=models )

The symbol post now contains likelihoods for all of the models in models. These likelihoods are proportional to the naive posterior probabilities. Go ahead and plot the curve these likelihoods imply:

R code 2.25
plot( models , post , type="l" )

The function isn't really a proper posterior distribution, because it hasn't been normalized so that it sums to one. But that isn't an obstacle for what we're going to do, which is to use the relative proportions of the likelihoods to sample models. To draw 10-thousand random samples of models from this un-normalized naive posterior distribution, we can make use of the handy sample command, which is specialized for just this kind of task:
[Two slide plots: plot(models), values rising from 0.0 to 1.0 against Index; plot(post), the likelihood curve from 0.00 to 0.20 against Index.]
[Figure: left panel, sampled p_W (0.0 to 0.8) against sample number (0 to 8000); right panel, Density (0.0 to 3.0) against p_W (0.00 to 1.00).]

Figure 2.9. Samples from the naive posterior of p_W. On the left, the individual samples are ordered along the horizontal axis. Density of the points is greatest near p_W = 0.6, the maximum likelihood estimate. On the right, plotting the density (similar to the histogram) of these samples in blue shows an excellent approximation to the actual naive posterior, shown in black.

R code 2.26
samples.pw <- sample( models , size=10000 , prob=post , replace=TRUE )

You just told R to draw 10-thousand values of p_W randomly, in proportion to their posterior probabilities. The command sample will automatically ensure that post is normalized, so that's why you didn't have to do it yourself. Values of p_W may appear more than once in the resulting list samples.pw, and the replace=TRUE parameter in the code ensures this.

The figure above demonstrates the relationship between these samples and the posterior probabilities themselves. In the lefthand plot, I've plotted each sample in samples.pw, with sample number on the horizontal axis and the sampled value itself on the vertical axis. The points scatter all over, because there is considerable uncertainty as to the value of p_W. But they do cluster around p_W = 0.6, the maximum likelihood estimate. Each value of p_W appears in these samples with proportion approximately equal to its posterior probability. In the righthand plot, I show the density resulting from the point samples, in blue. It is jagged, but it closely tracks the actual naive posterior, shown in black.
Sampling from the posterior
Figure 2.9. Samples from the naive posterior.
Plot the density of the samples:
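The slide's exact commands aren't recoverable; a sketch in base R (density() and quantile() are standard base functions), with the quantile line anticipating the confidence intervals below:

plot( density( samples.pw ) )                    # smoothed density of the posterior samples
quantile( samples.pw , probs=c(0.025, 0.975) )   # interval enclosing 95% of the samples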
Confidence intervals