Contents

6 Conjugate families
6.1 Binomial - beta prior
6.1.1 Uninformative priors
6.2 Gaussian (unknown mean, known variance)
6.2.1 Uninformative prior
6.3 Gaussian (known mean, unknown variance)
6.3.1 Uninformative prior
6.4 Gaussian (unknown mean, unknown variance)
6.4.1 Completing the square
6.4.2 Marginal posterior distributions
6.4.3 Uninformative priors
6.5 Multivariate Gaussian (unknown mean, known variance)
6.5.1 Completing the square
6.5.2 Uninformative priors
6.6 Multivariate Gaussian (unknown mean, unknown variance)
6.6.1 Completing the square
6.6.2 Inverted-Wishart kernel
6.6.3 Marginal posterior distributions
6.6.4 Uninformative priors
6.7 Bayesian linear regression
6.7.1 Known variance
6.7.2 Unknown variance
6.7.3 Uninformative priors
6.8 Bayesian linear regression with general error structure
6.9 Appendix: summary of conjugacy
6 Conjugate families
Conjugate families arise when the likelihood times the prior produces a recognizable posterior kernel
$$p(\theta \mid y) \propto \ell(\theta \mid y)\, p(\theta)$$
where the kernel is the characteristic part of the distribution function that depends on the random variable(s) (the part excluding any normalizing constants). For example, the density function for a univariate Gaussian or normal is
$$\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2\sigma^2}(x-\mu)^2\right]$$
and its kernel is $\exp\left[-\frac{1}{2\sigma^2}(x-\mu)^2\right]$, as $\frac{1}{\sqrt{2\pi\sigma^2}}$ is a normalizing constant. Now, we discuss a few common conjugate family results and uninformative prior results to connect with classical results.
Footnote 2: Some would utilize Jeffreys prior, $p(\theta) \sim \mathrm{beta}\left(\theta; \frac{1}{2}, \frac{1}{2}\right)$, which is invariant to transformation, as the uninformative prior.

6.2 Gaussian (unknown mean, known variance)

The Gaussian likelihood for a single draw $y$ with known variance $\sigma^2$,
$$\ell\left(\mu \mid y, \sigma^2\right) \propto \exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right],$$
combines with a Gaussian or normal prior for $\mu$ given $\sigma^2$ with prior mean $\mu_0$ and prior variance $\sigma_0^2$
$$p\left(\mu \mid \sigma^2; \mu_0, \sigma_0^2\right) \propto \exp\left[-\frac{(\mu-\mu_0)^2}{2\sigma_0^2}\right]$$
or writing $\sigma_0^2 = \sigma^2/\kappa_0$, we have
$$p\left(\mu \mid \sigma^2; \mu_0, \sigma^2/\kappa_0\right) \propto \exp\left[-\frac{\kappa_0(\mu-\mu_0)^2}{2\sigma^2}\right]$$
to yield
$$p\left(\mu \mid y, \sigma^2, \mu_0, \sigma^2/\kappa_0\right) \propto \exp\left[-\frac{1}{2}\left(\frac{(y-\mu)^2}{\sigma^2} + \frac{\kappa_0(\mu-\mu_0)^2}{\sigma^2}\right)\right]$$
Finally, we have
$$p\left(\mu \mid y, \sigma^2, \mu_0, \sigma^2/\kappa_0\right) \propto \exp\left[-\frac{(\mu-\mu_1)^2}{2\sigma_1^2}\right]$$
where
$$\mu_1 = \frac{\kappa_0\mu_0 + y}{\kappa_0+1} = \frac{\frac{1}{\sigma_0^2}\mu_0 + \frac{1}{\sigma^2}y}{\frac{1}{\sigma_0^2} + \frac{1}{\sigma^2}} \qquad \text{and} \qquad \sigma_1^2 = \frac{\sigma^2}{\kappa_0+1} = \frac{1}{\frac{1}{\sigma_0^2} + \frac{1}{\sigma^2}}$$
or the posterior distribution of the mean given the data and priors is Gaussian or normal. Notice, the posterior mean, $\mu_1$, weights the data and prior beliefs by their relative precisions.
For a sample of $n$ exchangeable draws, the likelihood is
$$\ell\left(\mu \mid y, \sigma^2\right) \propto \prod_{i=1}^n \exp\left[-\frac{(y_i-\mu)^2}{2\sigma^2}\right]$$
and the posterior is
$$p\left(\mu \mid y, \sigma^2, \mu_0, \sigma^2/\kappa_0\right) \propto \exp\left[-\frac{(\mu-\mu_n)^2}{2\sigma_n^2}\right]$$
where $\mu_n = \frac{\kappa_0\mu_0 + n\bar{y}}{\kappa_0+n} = \frac{\frac{1}{\sigma_0^2}\mu_0 + \frac{n}{\sigma^2}\bar{y}}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}}$, $\bar{y}$ is the sample mean, and $\sigma_n^2 = \frac{\sigma^2}{\kappa_0+n} = \frac{1}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}}$, or the posterior distribution of the mean, $\mu$, given the data and priors is again Gaussian or normal and the posterior mean, $\mu_n$, weights the data and priors by their relative precisions.
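To make the precision weighting concrete, here is a minimal numerical sketch of the normal-normal update for $n$ exchangeable draws (assuming NumPy; all values and variable names are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Known data variance and prior beliefs (illustrative values)
sigma2 = 4.0             # known variance of each draw
mu0, kappa0 = 1.0, 5.0   # prior mean; prior "sample size", so sigma0^2 = sigma2/kappa0

y = rng.normal(3.0, np.sqrt(sigma2), size=50)
n, ybar = y.size, y.mean()

# Posterior parameters: precision-weighted average of prior mean and sample mean
mu_n = (kappa0 * mu0 + n * ybar) / (kappa0 + n)
sigma2_n = sigma2 / (kappa0 + n)

print(f"posterior mean {mu_n:.3f}, posterior variance {sigma2_n:.4f}")
```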
Footnote 3: $\frac{\nu_0\sigma_0^2}{X}$ is a scaled, inverted-chi square$\left(\nu_0, \sigma_0^2\right)$ with scale $\sigma_0^2$ where $X$ is a chi square$(\nu_0)$ random variable.
6.4 Gaussian (unknown mean, unknown variance)

With both mean and variance unknown, the Gaussian likelihood is
$$\ell\left(\mu, \sigma^2 \mid y\right) \propto \left(\sigma^2\right)^{-\frac{n}{2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i-\mu)^2\right]$$
or
$$\ell\left(\mu, \sigma^2 \mid y\right) \propto \left(\sigma^2\right)^{-\frac{n}{2}} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{i=1}^n (y_i-\bar{y})^2 + n(\bar{y}-\mu)^2\right)\right]$$
This combines with the conjugate normal-inverted-chi square prior (see footnote 4)
$$p\left(\mu \mid \sigma^2; \mu_0, \sigma^2/\kappa_0\right) p\left(\sigma^2; \nu_0, \sigma_0^2\right) \propto \left(\sigma^2\right)^{-\frac{1}{2}} \exp\left[-\frac{\kappa_0(\mu-\mu_0)^2}{2\sigma^2}\right] \left(\sigma^2\right)^{-(\nu_0/2+1)} \exp\left[-\frac{\nu_0\sigma_0^2}{2\sigma^2}\right]$$
$$\propto \left(\sigma^2\right)^{-\frac{\nu_0+3}{2}} \exp\left[-\frac{\nu_0\sigma_0^2 + \kappa_0(\mu-\mu_0)^2}{2\sigma^2}\right]$$
to yield a normal$\left(\mu \mid \sigma^2; \mu_n, \sigma^2/\kappa_n\right)$ * inverted-chi square$\left(\sigma^2; \nu_n, \sigma_n^2\right)$ joint posterior distribution (see footnote 5) where
$$\mu_n = \frac{\kappa_0\mu_0 + n\bar{y}}{\kappa_0+n}$$
$$\kappa_n = \kappa_0 + n$$
$$\nu_n = \nu_0 + n$$
$$\nu_n\sigma_n^2 = \nu_0\sigma_0^2 + (n-1)s^2 + \frac{\kappa_0 n}{\kappa_0+n}\left(\mu_0-\bar{y}\right)^2$$
That is, the joint posterior is
$$p\left(\mu, \sigma^2 \mid y; \mu_0, \sigma^2/\kappa_0, \nu_0, \sigma_0^2\right) \propto \left(\sigma^2\right)^{-\frac{n+\nu_0+3}{2}} \exp\left[-\frac{1}{2\sigma^2}\left(\nu_0\sigma_0^2 + (n-1)s^2 + \kappa_0(\mu-\mu_0)^2 + n(\mu-\bar{y})^2\right)\right]$$
Footnote 4: The prior for the mean, $\mu$, is conditional on the scale of the data, $\sigma^2$.
Footnote 5: The product of normal or Gaussian kernels produces a Gaussian kernel.
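The posterior hyperparameters above translate directly into code; a minimal sketch under illustrative prior settings (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)

# Data and prior hyperparameters (illustrative values)
y = rng.normal(5.0, 2.0, size=30)
n, ybar, s2 = y.size, y.mean(), y.var(ddof=1)
mu0, kappa0 = 4.0, 2.0        # prior mean and prior "sample size"
nu0, sigma2_0 = 5.0, 3.0      # prior degrees of freedom and scale

# Posterior hyperparameters
kappa_n = kappa0 + n
nu_n = nu0 + n
mu_n = (kappa0 * mu0 + n * ybar) / kappa_n
nu_s2_n = nu0 * sigma2_0 + (n - 1) * s2 + kappa0 * n / kappa_n * (mu0 - ybar) ** 2
sigma2_n = nu_s2_n / nu_n

print(f"mu_n={mu_n:.3f}, kappa_n={kappa_n}, nu_n={nu_n}, sigma2_n={sigma2_n:.3f}")
```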
Completing the square

Completing the square in $\mu$, where $\bar{y} = \frac{1}{n}\sum_{i=1}^n y_i$, gives
$$(\kappa_0+n)(\mu-\mu_n)^2 = (\kappa_0+n)\mu^2 - 2(\kappa_0+n)\mu_n\mu + (\kappa_0+n)\mu_n^2 = (\kappa_0+n)\mu^2 - 2(\kappa_0\mu_0+n\bar{y})\mu + (\kappa_0+n)\mu_n^2$$
while expanding the exponent includes the square plus additional terms as follows
$$\kappa_0(\mu-\mu_0)^2 + n(\mu-\bar{y})^2 = \kappa_0\mu^2 - 2\kappa_0\mu_0\mu + \kappa_0\mu_0^2 + n\mu^2 - 2n\bar{y}\mu + n\bar{y}^2 = (\kappa_0+n)\mu^2 - 2(\kappa_0\mu_0+n\bar{y})\mu + \kappa_0\mu_0^2 + n\bar{y}^2$$
The two expansions differ only by terms not involving $\mu$; indeed, $\kappa_0\mu_0^2 + n\bar{y}^2 - (\kappa_0+n)\mu_n^2 = \frac{\kappa_0 n}{\kappa_0+n}(\mu_0-\bar{y})^2$, the extra term in $\nu_n\sigma_n^2$. Hence, the conditional posterior for the mean is the Gaussian kernel
$$p\left(\mu \mid \sigma^2, y\right) \propto \left(\sigma^2\right)^{-\frac{1}{2}} \exp\left[-\frac{1}{2\sigma^2}(\kappa_0+n)(\mu-\mu_n)^2\right]$$
Marginal posterior distributions

The marginal posterior for the mean, $\mu$, on integrating out $\sigma^2$, is a noncentral, scaled-Student $t\left(\mu; \mu_n, \frac{\sigma_n^2}{\kappa_n}, \nu_n\right)$ (see footnote 6) for the mean
$$p\left(\mu; \mu_n, \sigma_n^2, \kappa_n, \nu_n\right) \propto \left[\nu_n\sigma_n^2 + \kappa_n(\mu-\mu_n)^2\right]^{-\frac{\nu_n+1}{2}}$$
or
$$p\left(\mu; \mu_n, \frac{\sigma_n^2}{\kappa_n}, \nu_n\right) \propto \left[1 + \frac{\kappa_n(\mu-\mu_n)^2}{\nu_n\sigma_n^2}\right]^{-\frac{\nu_n+1}{2}}$$
and the marginal posterior for the variance, $\sigma^2$, is an inverted-chi square$\left(\sigma^2; \nu_n, \sigma_n^2\right)$ on integrating out $\mu$
$$p\left(\sigma^2; \nu_n, \sigma_n^2\right) \propto \left(\sigma^2\right)^{-(\nu_n/2+1)} \exp\left[-\frac{\nu_n\sigma_n^2}{2\sigma^2}\right]$$
Derivation of the marginal posterior for the mean, $\mu$, is as follows. Let $z = \frac{A}{2\sigma^2}$ where
$$A = \nu_0\sigma_0^2 + (n-1)s^2 + \frac{\kappa_0 n}{\kappa_0+n}(\mu_0-\bar{y})^2 + (\kappa_0+n)(\mu-\mu_n)^2 = \nu_n\sigma_n^2 + (\kappa_0+n)(\mu-\mu_n)^2$$
The marginal posterior for the mean, $\mu$, integrates out $\sigma^2$ from the joint posterior
$$p(\mu \mid y) = \int_0^\infty p\left(\mu, \sigma^2 \mid y\right) d\sigma^2 = \int_0^\infty \left(\sigma^2\right)^{-\frac{n+\nu_0+3}{2}} \exp\left[-\frac{A}{2\sigma^2}\right] d\sigma^2$$
Utilizing $\sigma^2 = \frac{A}{2z}$ and $dz = -\frac{A}{2\left(\sigma^2\right)^2}\, d\sigma^2$ or $d\sigma^2 = -\frac{A}{2z^2}\, dz$,
$$p(\mu \mid y) \propto \int_0^\infty \left(\frac{A}{2z}\right)^{-\frac{n+\nu_0+3}{2}} \exp\left[-z\right] \frac{A}{2z^2}\, dz$$
$$\propto \int_0^\infty \left(\frac{A}{2z}\right)^{-\frac{n+\nu_0+1}{2}} z^{-1} \exp\left[-z\right] dz$$
$$\propto A^{-\frac{n+\nu_0+1}{2}} \int_0^\infty z^{\frac{n+\nu_0+1}{2}-1} \exp\left[-z\right] dz$$
Footnote 6: The noncentral, scaled-Student $t\left(\mu; \mu_n, \sigma_n^2/\kappa_n, \nu_n\right)$ implies $\frac{\mu-\mu_n}{\sigma_n/\sqrt{\kappa_n}}$ has a standard Student t$(\nu_n)$ distribution, $p(\mu \mid y) \propto \left[1 + \frac{1}{\nu_n}\left(\frac{\mu-\mu_n}{\sigma_n/\sqrt{\kappa_n}}\right)^2\right]^{-\frac{\nu_n+1}{2}}$.
The integral $\int_0^\infty z^{\frac{n+\nu_0+1}{2}-1}\exp\left[-z\right]dz$ is a constant since it is the kernel of a gamma density and therefore can be ignored when deriving the kernel of the marginal posterior for the mean
$$p(\mu \mid y) \propto A^{-\frac{n+\nu_0+1}{2}} \propto \left[\nu_n\sigma_n^2 + (\kappa_0+n)(\mu-\mu_n)^2\right]^{-\frac{n+\nu_0+1}{2}} \propto \left[1 + \frac{(\kappa_0+n)(\mu-\mu_n)^2}{\nu_n\sigma_n^2}\right]^{-\frac{n+\nu_0+1}{2}}$$
which is the kernel for a noncentral, scaled Student $t\left(\mu; \mu_n, \frac{\sigma_n^2}{\kappa_0+n}, n+\nu_0\right)$.
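A quick Monte Carlo check of this scaled-Student t marginal is possible by composition sampling (draw $\sigma^2$ from its inverted-chi square marginal, then $\mu$ given $\sigma^2$); this sketch assumes SciPy is available, and all parameter values are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Illustrative posterior parameters (mu_n, kappa_n, nu_n, sigma2_n)
mu_n, kappa_n, nu_n, sigma2_n = 2.0, 12.0, 15.0, 1.5

# sigma^2 ~ scaled inverted-chi square(nu_n, sigma2_n), then mu | sigma^2 Gaussian
sig2 = nu_n * sigma2_n / rng.chisquare(nu_n, size=200_000)
mu = rng.normal(mu_n, np.sqrt(sig2 / kappa_n))

# Standardized draws should match a Student t with nu_n degrees of freedom
t_draws = (mu - mu_n) / np.sqrt(sigma2_n / kappa_n)
print(stats.kstest(t_draws, stats.t(df=nu_n).cdf))  # p-value should not be small
```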
Derivation of the marginal posterior for $\sigma^2$ is somewhat simpler. Write the joint posterior in terms of the conditional posterior for the mean multiplied by the marginal posterior for $\sigma^2$,
$$p\left(\mu, \sigma^2 \mid y\right) = p\left(\mu \mid \sigma^2, y\right) p\left(\sigma^2 \mid y\right)$$
Marginalization is achieved by integrating out $\mu$,
$$p\left(\sigma^2 \mid y\right) = \int p\left(\sigma^2 \mid y\right) p\left(\mu \mid \sigma^2, y\right) d\mu$$
Since only the conditional posterior involves $\mu$, the marginal posterior for $\sigma^2$ is immediate:
$$p\left(\mu, \sigma^2 \mid y\right) \propto \left(\sigma^2\right)^{-\frac{n+\nu_0+3}{2}} \exp\left[-\frac{A}{2\sigma^2}\right] \propto \left(\sigma^2\right)^{-\frac{n+\nu_0+2}{2}} \exp\left[-\frac{\nu_n\sigma_n^2}{2\sigma^2}\right] \left(\sigma^2\right)^{-\frac{1}{2}} \exp\left[-\frac{(\kappa_0+n)(\mu-\mu_n)^2}{2\sigma^2}\right]$$
Integrating the second (Gaussian) factor over $\mu$ contributes only a constant times $\left(\sigma^2\right)^{\frac{1}{2}}$, leaving the inverted-chi square$\left(\sigma^2; \nu_n, \sigma_n^2\right)$ kernel $\left(\sigma^2\right)^{-(\nu_n/2+1)}\exp\left[-\frac{\nu_n\sigma_n^2}{2\sigma^2}\right]$.
Uninformative priors

With the uninformative prior $p\left(\mu, \sigma^2\right) \propto \left(\sigma^2\right)^{-1}$, an analogous derivation applies with $A = (n-1)s^2 + n(\mu-\bar{y})^2$. As before, the integral involves the kernel of a gamma density and therefore is a constant which can be ignored. Hence,
$$p(\mu \mid y) \propto A^{-\frac{n}{2}} \propto \left[(n-1)s^2 + n(\mu-\bar{y})^2\right]^{-\frac{n}{2}} \propto \left[1 + \frac{n(\mu-\bar{y})^2}{(n-1)s^2}\right]^{-\frac{n-1+1}{2}}$$
which we recognize as the kernel of a noncentral, scaled Student $t\left(\mu; \bar{y}, \frac{s^2}{n}, n-1\right)$.
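Connecting with classical results, the uninformative-prior posterior interval for $\mu$ coincides numerically with the classical t confidence interval; a brief sketch (SciPy assumed, data simulated purely for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
y = rng.normal(10.0, 3.0, size=25)
n, ybar, s = y.size, y.mean(), y.std(ddof=1)

# 95% central posterior interval for mu under p(mu, sigma^2) ∝ 1/sigma^2:
# mu | y ~ scaled Student t(ybar, s^2/n, n-1)
t = stats.t(df=n - 1, loc=ybar, scale=s / np.sqrt(n))
print("posterior interval:", t.interval(0.95))

# Classical 95% confidence interval for the mean: numerically identical
q = stats.t(df=n - 1).ppf(0.975)
print("classical interval:", (ybar - q * s / np.sqrt(n), ybar + q * s / np.sqrt(n)))
```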
6.5 Multivariate Gaussian (unknown mean, known variance)
The product of the likelihood and prior yields the kernel of a multivariate Gaussian posterior distribution for the mean
$$p\left(\mu \mid \Sigma, y; \mu_0, \Sigma_0\right) \propto \exp\left[-\frac{1}{2}(\mu-\mu_0)^T\Sigma_0^{-1}(\mu-\mu_0)\right] \prod_{i=1}^n \exp\left[-\frac{1}{2}(y_i-\mu)^T\Sigma^{-1}(y_i-\mu)\right]$$
The combined exponent (apart from the factor $-\frac{1}{2}$) contains the terms in $\mu$
$$\mu^T\left(\Sigma_0^{-1}+n\Sigma^{-1}\right)\mu - 2\left(\mu_0^T\Sigma_0^{-1} + n\bar{y}^T\Sigma^{-1}\right)\mu$$
Thus, adding and subtracting $\mu_n^T\left(\Sigma_0^{-1}+n\Sigma^{-1}\right)\mu_n$, where $\mu_n = \left(\Sigma_0^{-1}+n\Sigma^{-1}\right)^{-1}\left(\Sigma_0^{-1}\mu_0 + n\Sigma^{-1}\bar{y}\right)$, in the exponent completes the square (with three extra terms):
$$(\mu-\mu_0)^T\Sigma_0^{-1}(\mu-\mu_0) + \sum_{i=1}^n (y_i-\mu)^T\Sigma^{-1}(y_i-\mu)$$
$$= \mu^T\left(\Sigma_0^{-1}+n\Sigma^{-1}\right)\mu - 2\mu_n^T\left(\Sigma_0^{-1}+n\Sigma^{-1}\right)\mu + \mu_n^T\left(\Sigma_0^{-1}+n\Sigma^{-1}\right)\mu_n$$
$$\quad - \mu_n^T\left(\Sigma_0^{-1}+n\Sigma^{-1}\right)\mu_n + \mu_0^T\Sigma_0^{-1}\mu_0 + \sum_{i=1}^n y_i^T\Sigma^{-1}y_i$$
$$= (\mu-\mu_n)^T\left(\Sigma_0^{-1}+n\Sigma^{-1}\right)(\mu-\mu_n) - \mu_n^T\left(\Sigma_0^{-1}+n\Sigma^{-1}\right)\mu_n + \mu_0^T\Sigma_0^{-1}\mu_0 + \sum_{i=1}^n y_i^T\Sigma^{-1}y_i$$
The latter three terms do not involve $\mu$. Hence, the posterior for the mean has expected value and variance
$$E\left[\mu \mid y, \Sigma, \mu_0, \Sigma_0\right] = \mu_n = \left(\Sigma_0^{-1}+n\Sigma^{-1}\right)^{-1}\left(\Sigma_0^{-1}\mu_0 + n\Sigma^{-1}\bar{y}\right)$$
$$Var\left[\mu \mid y, \Sigma, \mu_0, \Sigma_0\right] = \left(\Sigma_0^{-1}+n\Sigma^{-1}\right)^{-1}$$
As in the univariate case, the data and prior beliefs are weighted by their relative precisions.
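A minimal sketch of the multivariate precision-weighted update (NumPy assumed; dimensions and hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative 2-dimensional example with known covariance Sigma
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
mu0 = np.zeros(2)
Sigma0 = np.eye(2) * 10.0          # diffuse prior covariance

y = rng.multivariate_normal([1.0, -1.0], Sigma, size=100)
n, ybar = y.shape[0], y.mean(axis=0)

# Posterior precision is the sum of prior and data precisions
prec0, prec = np.linalg.inv(Sigma0), np.linalg.inv(Sigma)
V_post = np.linalg.inv(prec0 + n * prec)
mu_post = V_post @ (prec0 @ mu0 + n * prec @ ybar)

print("posterior mean:", mu_post)
print("posterior covariance:\n", V_post)
```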
Uninformative priors

With an uninformative (constant) prior for $\mu$, the Gaussian likelihood is
$$\ell(\mu \mid \Sigma, y) \propto \exp\left[-\frac{1}{2}\sum_{i=1}^n (y_i-\mu)^T\Sigma^{-1}(y_i-\mu)\right]$$
Expanding the exponent,
$$\sum_{i=1}^n (y_i-\mu)^T\Sigma^{-1}(y_i-\mu) = \sum_{i=1}^n y_i^T\Sigma^{-1}y_i - 2n\mu^T\Sigma^{-1}\bar{y} + n\mu^T\Sigma^{-1}\mu + n\bar{y}^T\Sigma^{-1}\bar{y} - n\bar{y}^T\Sigma^{-1}\bar{y}$$
$$= n(\bar{y}-\mu)^T\Sigma^{-1}(\bar{y}-\mu) + \sum_{i=1}^n y_i^T\Sigma^{-1}y_i - n\bar{y}^T\Sigma^{-1}\bar{y}$$
The latter two terms are constants, hence, the posterior kernel is
$$p(\mu \mid \Sigma, y) \propto \exp\left[-\frac{n}{2}(\bar{y}-\mu)^T\Sigma^{-1}(\bar{y}-\mu)\right]$$
which is Gaussian or $N\left(\mu; \bar{y}, \frac{1}{n}\Sigma\right)$, the classical result.
6.6 Multivariate Gaussian (unknown mean, unknown variance)

The Gaussian likelihood
$$\ell(\mu, \Sigma \mid y) \propto |\Sigma|^{-\frac{n}{2}} \exp\left[-\frac{1}{2}\left((n-1)s^2 + n(\bar{y}-\mu)^T\Sigma^{-1}(\bar{y}-\mu)\right)\right]$$
where $s^2 = \frac{1}{n-1}\sum_{i=1}^n (y_i-\bar{y})^T\Sigma^{-1}(y_i-\bar{y})$, combines with a Gaussian-inverted Wishart prior
$$p\left(\mu \mid \Sigma; \mu_0, \frac{\Sigma}{\kappa_0}\right) p\left(\Sigma; \Lambda, \nu\right) \propto |\Sigma|^{-\frac{1}{2}} \exp\left[-\frac{\kappa_0}{2}(\mu-\mu_0)^T\Sigma^{-1}(\mu-\mu_0)\right] |\Lambda|^{\frac{\nu}{2}} |\Sigma|^{-\frac{\nu+k+1}{2}} \exp\left[-\frac{\mathrm{tr}\left(\Lambda\Sigma^{-1}\right)}{2}\right]$$
Inverted-Wishart kernel

Two properties of the trace operator are useful:
$$\mathrm{tr}(A) + \mathrm{tr}(B) = \mathrm{tr}(A+B)$$
and
$$\mathrm{tr}(CD) = \mathrm{tr}(DC)$$
We immediately have the results
$$\mathrm{tr}\left(x^Tx\right) = \mathrm{tr}\left(xx^T\right)$$
and
$$\mathrm{tr}\left(x^T\Sigma^{-1}x\right) = \mathrm{tr}\left(\Sigma^{-1}xx^T\right) = \mathrm{tr}\left(xx^T\Sigma^{-1}\right)$$
Therefore, the above joint posterior can be rewritten as a $N\left(\mu; \mu_n, (\kappa_0+n)^{-1}\Sigma\right)$-inverted-Wishart$\left(\Sigma; \nu+n, \Lambda_n\right)$
$$p(\mu, \Sigma \mid y) \propto |\Lambda_n|^{\frac{\nu+n}{2}} |\Sigma|^{-\frac{\nu+n+k+1}{2}} \exp\left[-\frac{1}{2}\mathrm{tr}\left(\Lambda_n\Sigma^{-1}\right)\right] |\Sigma|^{-\frac{1}{2}} \exp\left[-\frac{\kappa_0+n}{2}(\mu-\mu_n)^T\Sigma^{-1}(\mu-\mu_n)\right]$$
where
$$\mu_n = \frac{1}{\kappa_0+n}\left(\kappa_0\mu_0 + n\bar{y}\right)$$
and
$$\Lambda_n = \Lambda + \sum_{i=1}^n (y_i-\bar{y})(y_i-\bar{y})^T + \frac{\kappa_0 n}{\kappa_0+n}(\bar{y}-\mu_0)(\bar{y}-\mu_0)^T$$
Now, it's apparent the conditional posterior for $\mu$ given $\Sigma$ is $N\left(\mu_n, (\kappa_0+n)^{-1}\Sigma\right)$
$$p(\mu \mid \Sigma, y) \propto \exp\left[-\frac{\kappa_0+n}{2}(\mu-\mu_n)^T\Sigma^{-1}(\mu-\mu_n)\right]$$
and the marginal posterior for the mean, on integrating out $\Sigma$, is (multivariate) Student $t_k\left(\mu; \mu_n, \Lambda^\ast, \nu+n-k+1\right)$
where
$$\Lambda^\ast = (\kappa_0+n)^{-1}(\nu+n-k+1)^{-1}\Lambda_n$$
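The hyperparameter updates $\left(\mu_n, \kappa_n, \nu_n, \Lambda_n\right)$ are mechanical; a minimal sketch with illustrative values (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative data and normal-inverted-Wishart prior hyperparameters
k = 2
y = rng.multivariate_normal([0.5, 2.0], np.eye(k), size=60)
n, ybar = y.shape[0], y.mean(axis=0)
mu0, kappa0 = np.zeros(k), 1.0
Lambda0, nu0 = np.eye(k), k + 2   # prior scale matrix and degrees of freedom

# Posterior hyperparameters
mu_n = (kappa0 * mu0 + n * ybar) / (kappa0 + n)
S = (y - ybar).T @ (y - ybar)     # centered sum of squares
d = (ybar - mu0).reshape(-1, 1)
Lambda_n = Lambda0 + S + (kappa0 * n / (kappa0 + n)) * (d @ d.T)
kappa_n, nu_n = kappa0 + n, nu0 + n

print("mu_n:", mu_n)
print("Lambda_n:\n", Lambda_n)
print("kappa_n, nu_n:", kappa_n, nu_n)
```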
Marginal posterior distributions
Marginalization of the mean derives from the following identities (see Box and Tiao [1973], pp. 427, 441). Let $Z$ be an $m \times m$ positive definite symmetric matrix consisting of $\frac{1}{2}m(m+1)$ distinct random variables $z_{ij}$ $(i, j = 1, \ldots, m;\ i \leq j)$, and let $q > 0$ and $B$ be an $m \times m$ positive definite symmetric matrix. Then, the distribution of the $z_{ij}$,
$$p(Z) \propto |Z|^{\frac{q-1}{2}} \exp\left[-\frac{1}{2}\mathrm{tr}(ZB)\right], \qquad Z > 0,$$
is (Wishart) integrable, so
$$\int_{Z>0} |Z|^{\frac{q-1}{2}} \exp\left[-\frac{1}{2}\mathrm{tr}(ZB)\right] dZ \propto |B|^{-\frac{q+m}{2}} \qquad \text{(I.1)}$$
and, for an $m \times 1$ vector $x$,
$$\left|B + xx^T\right| = |B|\left(1 + x^TB^{-1}x\right) \qquad \text{(I.2)}$$
The normalizing constant in (I.1) involves the multivariate gamma function
$$\Gamma_p(b) = \pi^{\frac{p(p-1)}{4}} \prod_{\alpha=1}^p \Gamma\left(b + \frac{1-\alpha}{2}\right), \qquad b > \frac{p-1}{2},$$
and
$$\Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\, dt$$
or for integer $n$, $\Gamma(n) = (n-1)!$. Also, transformation from $\Sigma$ to $\Sigma^{-1}$ involves the Jacobian
$$\frac{\partial\left(\sigma^{11}, \sigma^{12}, \ldots, \sigma^{kk}\right)}{\partial\left(\sigma_{11}, \sigma_{12}, \ldots, \sigma_{kk}\right)} = |\Sigma|^{-(k+1)}$$
Writing the joint posterior in terms of $\Sigma^{-1}$ and integrating it out yields
$$\int_{\Sigma^{-1}>0} p\left(\mu, \Sigma^{-1} \mid y\right) d\Sigma^{-1} \propto |S(\mu)|^{-\frac{\nu+n+1}{2}}$$
$$\propto \left|\Lambda_n + (\kappa_0+n)(\mu-\mu_n)(\mu-\mu_n)^T\right|^{-\frac{\nu+n+1}{2}}$$
$$\propto \left|I + (\kappa_0+n)\Lambda_n^{-1}(\mu-\mu_n)(\mu-\mu_n)^T\right|^{-\frac{\nu+n+1}{2}}$$
where $S(\mu) = \Lambda_n + (\kappa_0+n)(\mu-\mu_n)(\mu-\mu_n)^T$; identity (I.2) identifies this as the (multivariate) Student t kernel above.
Uninformative priors

With the uninformative prior $p(\mu, \Sigma) \propto |\Sigma|^{-\frac{k+1}{2}}$, the marginal posterior for the mean is
$$p(\mu \mid y) \propto \int_{\Sigma^{-1}>0} p\left(\mu, \Sigma^{-1} \mid y\right) d\Sigma^{-1} \propto \int_{\Sigma^{-1}>0} \left|\Sigma^{-1}\right|^{\frac{n-k-1}{2}} \exp\left[-\frac{1}{2}\mathrm{tr}\left(S(\mu)\Sigma^{-1}\right)\right] d\Sigma^{-1}$$
where now $S(\mu) = \sum_{i=1}^n (y_i-\bar{y})(y_i-\bar{y})^T + n(\bar{y}-\mu)(\bar{y}-\mu)^T$. The first identity (I.1) produces
$$p(\mu \mid y) \propto |S(\mu)|^{-\frac{n}{2}} \propto \left|\sum_{i=1}^n (y_i-\bar{y})(y_i-\bar{y})^T + n(\bar{y}-\mu)(\bar{y}-\mu)^T\right|^{-\frac{n}{2}}$$
$$\propto \left|I + n\left[\sum_{i=1}^n (y_i-\bar{y})(y_i-\bar{y})^T\right]^{-1}(\bar{y}-\mu)(\bar{y}-\mu)^T\right|^{-\frac{n}{2}}$$
The second identity (I.2) identifies the marginal posterior for $\mu$ as (multivariate) Student $t_k\left(\mu; \bar{y}, \frac{1}{n}s^2, n-k\right)$
$$p(\mu \mid y) \propto \left[1 + \frac{n}{n-k}(\bar{y}-\mu)^T\left(s^2\right)^{-1}(\bar{y}-\mu)\right]^{-\frac{n}{2}}$$
where $(n-k)s^2 = \sum_{i=1}^n (y_i-\bar{y})(y_i-\bar{y})^T$. The marginal posterior for the variance is inverted-Wishart$\left(\Sigma; n, \Lambda_n\right)$ where now $\Lambda_n = \sum_{i=1}^n (y_i-\bar{y})(y_i-\bar{y})^T$.
6.7 Bayesian linear regression

The linear regression model is
$$y = X\beta + \varepsilon$$

Known variance

With known variance and a Gaussian prior (equivalently, a prior sample $\{X_0, y_0\}$ with $\beta_0 = \left(X_0^TX_0\right)^{-1}X_0^Ty_0$), the posterior for $\beta$ is Gaussian with mean $\bar{\beta}$ and variance $V_{\bar{\beta}}$
where
$$\bar{\beta} = \left(X_0^TX_0 + X^TX\right)^{-1}\left(X_0^TX_0\beta_0 + X^TXb\right)$$
$$b = \left(X^TX\right)^{-1}X^Ty$$
and
$$V_{\bar{\beta}} = \sigma^2\left(X_0^TX_0 + X^TX\right)^{-1}$$
The variance expression follows from rewriting the estimator
$$\bar{\beta} = \left(X_0^TX_0 + X^TX\right)^{-1}\left(X_0^TX_0\beta_0 + X^TXb\right)$$
$$= \left(X_0^TX_0 + X^TX\right)^{-1}\left(X_0^TX_0\left(X_0^TX_0\right)^{-1}X_0^Ty_0 + X^TX\left(X^TX\right)^{-1}X^Ty\right)$$
$$= \left(X_0^TX_0 + X^TX\right)^{-1}\left(X_0^Ty_0 + X^Ty\right)$$
then, substituting $y_0 = X_0\beta + \varepsilon_0$ and $y = X\beta + \varepsilon$,
$$\bar{\beta} = \left(X_0^TX_0 + X^TX\right)^{-1}\left(X_0^TX_0\beta + X_0^T\varepsilon_0 + X^TX\beta + X^T\varepsilon\right)$$
Hence,
$$E\left[\bar{\beta} \mid X, X_0\right] = \beta$$
and
$$\bar{\beta} - \beta = \left(X_0^TX_0 + X^TX\right)^{-1}\left(X_0^T\varepsilon_0 + X^T\varepsilon\right)$$
so that
$$V_{\bar{\beta}} \equiv Var\left[\bar{\beta} \mid X, X_0\right] = E\left[\left(\bar{\beta}-\beta\right)\left(\bar{\beta}-\beta\right)^T \mid X, X_0\right]$$
$$= E\left[\left(X_0^TX_0 + X^TX\right)^{-1}\left(X_0^T\varepsilon_0 + X^T\varepsilon\right)\left(X_0^T\varepsilon_0 + X^T\varepsilon\right)^T\left(X_0^TX_0 + X^TX\right)^{-1} \mid X, X_0\right]$$
$$= E\left[\left(X_0^TX_0 + X^TX\right)^{-1}\left(X_0^T\varepsilon_0\varepsilon_0^TX_0 + X^T\varepsilon\varepsilon_0^TX_0 + X_0^T\varepsilon_0\varepsilon^TX + X^T\varepsilon\varepsilon^TX\right)\left(X_0^TX_0 + X^TX\right)^{-1} \mid X, X_0\right]$$
$$= \left(X_0^TX_0 + X^TX\right)^{-1}\left(X_0^T\sigma^2IX_0 + X^T\sigma^2IX\right)\left(X_0^TX_0 + X^TX\right)^{-1}$$
$$= \sigma^2\left(X_0^TX_0 + X^TX\right)^{-1}\left(X_0^TX_0 + X^TX\right)\left(X_0^TX_0 + X^TX\right)^{-1}$$
$$= \sigma^2\left(X_0^TX_0 + X^TX\right)^{-1}$$
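A minimal sketch of the known-variance posterior moments, treating the prior as an information matrix $X_0^TX_0$ (NumPy assumed; all inputs illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)

# Illustrative design, known sigma^2, and prior information
n, p, sigma = 40, 3, 1.0
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(0, sigma, n)

X0 = np.eye(p)               # prior information matrix X0^T X0 = I
beta0 = np.zeros(p)

b = np.linalg.solve(X.T @ X, X.T @ y)            # OLS estimate
beta_bar = np.linalg.solve(X0.T @ X0 + X.T @ X,
                           X0.T @ X0 @ beta0 + X.T @ X @ b)
V = sigma**2 * np.linalg.inv(X0.T @ X0 + X.T @ X)
print("posterior mean:", beta_bar, "\nposterior covariance:\n", V)
```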
Now, let's backtrack and derive the conditional posterior as the product of conditional priors and the likelihood function. The likelihood function for known variance is
$$\ell\left(\beta \mid \sigma^2, y, X\right) \propto \exp\left[-\frac{1}{2\sigma^2}(y-X\beta)^T(y-X\beta)\right]$$
Conditional Gaussian priors are
$$p\left(\beta \mid \sigma^2\right) \propto \exp\left[-\frac{1}{2\sigma^2}(\beta-\beta_0)^TV_0^{-1}(\beta-\beta_0)\right]$$
with $V_0^{-1} = X_0^TX_0$. The conditional posterior is the product of the prior and likelihood
$$p\left(\beta \mid \sigma^2, y, X\right) \propto \exp\left[-\frac{1}{2\sigma^2}\left((y-X\beta)^T(y-X\beta) + (\beta-\beta_0)^TV_0^{-1}(\beta-\beta_0)\right)\right]$$
$$= \exp\left[-\frac{1}{2\sigma^2}\left(y^Ty - 2y^TX\beta + \beta^TX^TX\beta + \beta^TX_0^TX_0\beta - 2\beta_0^TX_0^TX_0\beta + \beta_0^TX_0^TX_0\beta_0\right)\right]$$
The first and last terms in the exponent do not involve $\beta$ (are constants) and can be ignored as they are absorbed through normalization. This leaves
$$p\left(\beta \mid \sigma^2, y, X\right) \propto \exp\left[-\frac{1}{2\sigma^2}\left(-2y^TX\beta + \beta^TX^TX\beta + \beta^TX_0^TX_0\beta - 2\beta_0^TX_0^TX_0\beta\right)\right]$$
$$= \exp\left[-\frac{1}{2\sigma^2}\left(\beta^T\left(X_0^TX_0 + X^TX\right)\beta - 2\left(y^TX + \beta_0^TX_0^TX_0\right)\beta\right)\right]$$
The last term in the exponent of the completed square is all constants (does not involve $\beta$) so it's absorbed through normalization and disregarded for comparison of kernels. Hence,
$$p\left(\beta \mid \sigma^2, y, X\right) \propto \exp\left[-\frac{1}{2}\left(\beta-\bar{\beta}\right)^TV_{\bar{\beta}}^{-1}\left(\beta-\bar{\beta}\right)\right]$$
$$\propto \exp\left[-\frac{1}{2\sigma^2}\left(\beta^T\left(X_0^TX_0 + X^TX\right)\beta - 2\left(y^TX + \beta_0^TX_0^TX_0\right)\beta\right)\right]$$
as claimed.
Uninformative priors

If the prior for $\beta$ is uniformly distributed conditional on known variance, $\sigma^2$, $p\left(\beta \mid \sigma^2\right) \propto 1$, then it's as if $X_0^TX_0 \rightarrow 0$ (the information matrix for the prior is null) and the posterior for $\beta$ is
$$p\left(\beta \mid \sigma^2, y, X\right) \sim N\left(b, \sigma^2\left(X^TX\right)^{-1}\right)$$
The first term in the likelihood exponent, $y^Ty$, doesn't depend on $\beta$ and can be dropped as it's absorbed via normalization. This leaves
$$p\left(\beta \mid \sigma^2, y, X\right) \propto \exp\left[-\frac{1}{2\sigma^2}\left(-2y^TX\beta + \beta^TX^TX\beta\right)\right]$$
Now, write $p\left(\beta \mid \sigma^2, y, X\right) \sim N\left(b, \sigma^2\left(X^TX\right)^{-1}\right)$
$$p\left(\beta \mid \sigma^2, y, X\right) \propto \exp\left[-\frac{1}{2\sigma^2}(\beta-b)^TX^TX(\beta-b)\right]$$
and expand
$$p\left(\beta \mid \sigma^2, y, X\right) \propto \exp\left[-\frac{1}{2\sigma^2}\left(\beta^TX^TX\beta - 2b^TX^TX\beta + b^TX^TXb\right)\right]$$
The last term in the exponent doesn't depend on $\beta$ and is absorbed via normalization. This leaves
$$p\left(\beta \mid \sigma^2, y, X\right) \propto \exp\left[-\frac{1}{2\sigma^2}\left(\beta^TX^TX\beta - 2b^TX^TX\beta\right)\right]$$
$$\propto \exp\left[-\frac{1}{2\sigma^2}\left(\beta^TX^TX\beta - 2\beta^TX^TX\left(X^TX\right)^{-1}X^Ty\right)\right]$$
$$\propto \exp\left[-\frac{1}{2\sigma^2}\left(\beta^TX^TX\beta - 2\beta^TX^Ty\right)\right]$$
As this latter expression matches the simplified likelihood expression, the demonstration is complete, $p\left(\beta \mid \sigma^2, y, X\right) \sim N\left(b, \sigma^2\left(X^TX\right)^{-1}\right)$.
Unknown variance

With unknown variance, the likelihood is
$$\ell\left(\beta, \sigma^2 \mid y, X\right) \propto \sigma^{-n} \exp\left[-\frac{1}{2\sigma^2}(y-X\beta)^T(y-X\beta)\right]$$
or
$$\ell\left(\beta, \sigma^2 \mid y, X\right) \propto \sigma^{-n} \exp\left[-\frac{1}{2\sigma^2}\left((n-p)s^2 + (\beta-b)^TX^TX(\beta-b)\right)\right]$$
where $s^2 = \frac{1}{n-p}e^Te$ and $e = y - Xb$ (see footnote 7).
The conjugate prior for linear regression is the Gaussian$\left(\beta \mid \sigma^2; \beta_0, \sigma^2\Lambda_0^{-1}\right)$-inverse chi square$\left(\sigma^2; \nu_0, \sigma_0^2\right)$
$$p\left(\beta \mid \sigma^2; \beta_0, \sigma^2\Lambda_0^{-1}\right) p\left(\sigma^2; \nu_0, \sigma_0^2\right) \propto \sigma^{-p} \exp\left[-\frac{(\beta-\beta_0)^T\Lambda_0(\beta-\beta_0)}{2\sigma^2}\right] \left(\sigma^2\right)^{-(\nu_0/2+1)} \exp\left[-\frac{\nu_0\sigma_0^2}{2\sigma^2}\right]$$
Combining the prior with the likelihood gives a joint Gaussian$\left(\bar{\beta}, \sigma^2\Lambda_n^{-1}\right)$-inverse chi square$\left(\nu_0+n, \sigma_n^2\right)$ posterior
$$p\left(\beta, \sigma^2 \mid y, X; \beta_0, \sigma^2\Lambda_0^{-1}, \nu_0, \sigma_0^2\right) \propto \sigma^{-n} \exp\left[-\frac{(n-p)s^2}{2\sigma^2}\right] \exp\left[-\frac{(\beta-b)^TX^TX(\beta-b)}{2\sigma^2}\right]$$
$$\times\ \sigma^{-p} \exp\left[-\frac{(\beta-\beta_0)^T\Lambda_0(\beta-\beta_0)}{2\sigma^2}\right] \left(\sigma^2\right)^{-(\nu_0/2+1)} \exp\left[-\frac{\nu_0\sigma_0^2}{2\sigma^2}\right]$$
Footnote 7: With $X$ an $n \times 1$ vector of ones, the likelihood becomes
$$\ell\left(\mu, \sigma^2 \mid y, X\right) = \sigma^{-n} \exp\left[-\frac{1}{2\sigma^2}\left((n-1)s^2 + n(\mu-\bar{y})^2\right)\right]$$
where $\beta = \mu$, $b = \left(X^TX\right)^{-1}X^Ty = \bar{y}$, $p = 1$, and $X^TX = n$.
where
$$\bar{\beta} = \left(\Lambda_0 + X^TX\right)^{-1}\left(\Lambda_0\beta_0 + X^TXb\right)$$
$$\Lambda_n = \Lambda_0 + X^TX$$
and
$$\nu_n\sigma_n^2 = \nu_0\sigma_0^2 + (n-p)s^2 + \left(\bar{\beta}-\beta_0\right)^T\Lambda_0\left(\bar{\beta}-\beta_0\right) + \left(\bar{\beta}-b\right)^TX^TX\left(\bar{\beta}-b\right)$$
where $\nu_n = \nu_0 + n$. The conditional posterior of $\beta$ given $\sigma^2$ is Gaussian$\left(\bar{\beta}, \sigma^2\Lambda_n^{-1}\right)$.
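Before completing the square below, note these updates can be computed directly; a minimal sketch with illustrative prior settings (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative data and conjugate prior hyperparameters
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([2.0, 1.5]) + rng.normal(0, 2.0, n)

beta0 = np.zeros(p)
Lambda0 = np.eye(p) * 0.1     # prior precision (information) matrix
nu0, sigma2_0 = 4.0, 1.0

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s2 = e @ e / (n - p)

Lambda_n = Lambda0 + X.T @ X
beta_bar = np.linalg.solve(Lambda_n, Lambda0 @ beta0 + X.T @ X @ b)
nu_n = nu0 + n
nu_s2 = (nu0 * sigma2_0 + (n - p) * s2
         + (beta_bar - beta0) @ Lambda0 @ (beta_bar - beta0)
         + (beta_bar - b) @ X.T @ X @ (beta_bar - b))
print("beta_bar:", beta_bar, " nu_n:", nu_n, " sigma2_n:", nu_s2 / nu_n)
```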
n .
The latter two terms are constants not involving (and can be ignored
when writing the kernel for the conditional posterior) which well add to
when we complete the square. Now, write out the square centered around
T
0 + X T X = T 0 + X T X
T T
2 0 + X T X + 0 + X T X
Substitute for in the second term on the right hand side and the first two
terms are identical to the two terms in equation (6.1). Hence, the exponents
from the prior for the mean and likelihood in (6.1) are equal to
T
0 + X T X
T
T X T X
0 + X T X + T0 0 0 +
6.7 Bayesian linear regression 25
where 1 = X T X.
Stacked regression

Bayesian linear regression with conjugate priors works as if we have a prior sample $\{X_0, y_0\}$, $\Lambda_0 = X_0^TX_0$, and initial estimates
$$\beta_0 = \left(X_0^TX_0\right)^{-1}X_0^Ty_0$$
Then, we combine this initial "evidence" with new evidence to update our beliefs in the form of the posterior. Not surprisingly, the posterior mean is a weighted average of the two "samples" where the weights are based on the relative precision of the two "samples", as the sketch below illustrates.
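A numerical sketch of this stacked-sample view: the conjugate-posterior mean equals OLS on the stacked "samples" (NumPy assumed; the prior sample $\{X_0, y_0\}$ is hypothetical and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)

# New data and a hypothetical prior sample {X0, y0}
n, p = 30, 2
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -1.0]) + rng.normal(size=n)
X0 = rng.normal(size=(5, p))
y0 = X0 @ np.array([0.5, -0.5])

# Posterior mean from the conjugate-prior formula ...
beta0 = np.linalg.solve(X0.T @ X0, X0.T @ y0)
b = np.linalg.solve(X.T @ X, X.T @ y)
beta_bar = np.linalg.solve(X0.T @ X0 + X.T @ X,
                           X0.T @ X0 @ beta0 + X.T @ X @ b)

# ... equals OLS on the stacked sample
Xs, ys = np.vstack([X0, X]), np.concatenate([y0, y])
beta_stacked = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)
print(np.allclose(beta_bar, beta_stacked))  # True
```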
Derivation of the marginal posterior for $\beta$ parallels the univariate case. Let $z = \frac{A}{2\sigma^2}$ where $A = \nu_n\sigma_n^2 + \left(\beta-\bar{\beta}\right)^T\Lambda_n\left(\beta-\bar{\beta}\right)$. Utilizing $\sigma^2 = \frac{A}{2z}$ and $dz = -\frac{A}{2\left(\sigma^2\right)^2}\, d\sigma^2$ or $d\sigma^2 = -\frac{A}{2z^2}\, dz$ (constant factors can be ignored when deriving the kernel),
$$p(\beta \mid y) \propto \int_0^\infty \left(\frac{A}{2z}\right)^{-\frac{n+\nu_0+p+2}{2}} \exp\left[-z\right] \frac{A}{2z^2}\, dz$$
$$\propto A^{-\frac{n+\nu_0+p}{2}} \int_0^\infty z^{\frac{n+\nu_0+p}{2}-1} \exp\left[-z\right] dz$$
The integral $\int_0^\infty z^{\frac{n+\nu_0+p}{2}-1} \exp\left[-z\right] dz$ is a constant since it is the kernel of a gamma density and therefore can be ignored when deriving the kernel of the marginal posterior for $\beta$
$$p(\beta \mid y) \propto A^{-\frac{n+\nu_0+p}{2}} \propto \left[\nu_n\sigma_n^2 + \left(\beta-\bar{\beta}\right)^T\Lambda_n\left(\beta-\bar{\beta}\right)\right]^{-\frac{n+\nu_0+p}{2}} \propto \left[1 + \frac{\left(\beta-\bar{\beta}\right)^T\Lambda_n\left(\beta-\bar{\beta}\right)}{\nu_n\sigma_n^2}\right]^{-\frac{n+\nu_0+p}{2}}$$
the kernel for a noncentral, scaled (multivariate) Student $t_p\left(\beta; \bar{\beta}, \sigma_n^2\Lambda_n^{-1}, n+\nu_0\right)$.
Uninformative priors

With uninformative priors, $\bar{\beta} = b$, $\Lambda_n = X^TX$, and
$$\nu_n\sigma_n^2 = (n-p)s^2$$
Hence, the conditional posterior for $\beta$ given $\sigma^2$ is Gaussian$\left(b, \sigma^2\left(X^TX\right)^{-1}\right)$. The marginal posterior for $\beta$ is multivariate Student $t_p\left(\beta; b, s^2\left(X^TX\right)^{-1}, n-p\right)$, the classical estimator. Derivation of the marginal posterior for $\beta$ is analogous to that above. Let $z = \frac{A}{2\sigma^2}$ where $A = (n-p)s^2 + (\beta-b)^TX^TX(\beta-b)$. Integrating $\sigma^2$ out of the joint posterior produces the marginal posterior for $\beta$,
$$p(\beta \mid y) \propto \int p\left(\beta, \sigma^2 \mid y\right) d\sigma^2 \propto \int \left(\sigma^2\right)^{-\frac{n+2}{2}} \exp\left[-\frac{A}{2\sigma^2}\right] d\sigma^2$$
Substitution yields
$$p(\beta \mid y) \propto \int \left(\frac{A}{2z}\right)^{-\frac{n+2}{2}} \exp\left[-z\right] \frac{A}{2z^2}\, dz \propto A^{-\frac{n}{2}} \int z^{\frac{n}{2}-1} \exp\left[-z\right] dz$$
and, as the gamma kernel integrates to a constant, $p(\beta \mid y) \propto \left[(n-p)s^2 + (\beta-b)^TX^TX(\beta-b)\right]^{-\frac{n}{2}}$, the multivariate Student $t_p\left(\beta; b, s^2\left(X^TX\right)^{-1}, n-p\right)$ kernel.
6.8 Bayesian linear regression with general error structure

The model is
$$y = X\beta + \varepsilon, \qquad (\varepsilon \mid X) \sim N(0, \Sigma)$$
First, we consider the known variance case, then take up the unknown variance case. For known $\Sigma$, a Gaussian prior with mean $\beta_0$ and variance $V_0$ combines with the likelihood to yield a Gaussian posterior for $\beta$ with mean $\bar{\beta} = \left(V_0^{-1} + X^T\Sigma^{-1}X\right)^{-1}\left(V_0^{-1}\beta_0 + X^T\Sigma^{-1}y\right)$ and variance $V_{\bar{\beta}} = \left(V_0^{-1} + X^T\Sigma^{-1}X\right)^{-1}$, as the following derivation shows.
The first and last terms in the exponent of the prior-times-likelihood product do not involve $\beta$ (are constants) and can be ignored as they are absorbed through normalization. This leaves
$$p\left(\beta \mid \Sigma, y, X\right) \propto \exp\left[-\frac{1}{2}\left(-2y^T\Sigma^{-1}X\beta + \beta^TX^T\Sigma^{-1}X\beta + \beta^TV_0^{-1}\beta - 2\beta_0^TV_0^{-1}\beta\right)\right]$$
$$= \exp\left[-\frac{1}{2}\left(\beta^T\left(V_0^{-1} + X^T\Sigma^{-1}X\right)\beta - 2\left(y^T\Sigma^{-1}X + \beta_0^TV_0^{-1}\right)\beta\right)\right]$$
The last term in the exponent of the completed square is all constants (does not involve $\beta$) so it's absorbed through normalization and disregarded for comparison of kernels. Hence,
$$p\left(\beta \mid \Sigma, y, X\right) \propto \exp\left[-\frac{1}{2}\left(\beta-\bar{\beta}\right)^TV_{\bar{\beta}}^{-1}\left(\beta-\bar{\beta}\right)\right]$$
$$\propto \exp\left[-\frac{1}{2}\left(\beta^T\left(V_0^{-1} + X^T\Sigma^{-1}X\right)\beta - 2\left(y^T\Sigma^{-1}X + \beta_0^TV_0^{-1}\right)\beta\right)\right]$$
as claimed.
Now consider both the mean and variance unknown, where each draw is an element of the $y$ vector and $X$ is an $n \times p$ matrix of regressors. A Gaussian likelihood is
$$\ell(\beta, \Sigma \mid y, X) \propto |\Sigma|^{-\frac{n}{2}} \exp\left[-\frac{1}{2}(y-X\beta)^T\Sigma^{-1}(y-X\beta)\right]$$
$$\propto |\Sigma|^{-\frac{n}{2}} \exp\left[-\frac{1}{2}\left((y-Xb)^T\Sigma^{-1}(y-Xb) + (b-\beta)^TX^T\Sigma^{-1}X(b-\beta)\right)\right]$$
$$\propto |\Sigma|^{-\frac{n}{2}} \exp\left[-\frac{1}{2}\left((n-p)s^2 + (b-\beta)^TX^T\Sigma^{-1}X(b-\beta)\right)\right]$$
where $b = \left(X^T\Sigma^{-1}X\right)^{-1}X^T\Sigma^{-1}y$ and $s^2 = \frac{1}{n-p}(y-Xb)^T\Sigma^{-1}(y-Xb)$.
Combine the likelihood with a Gaussian-inverted Wishart prior
$$p\left(\beta \mid \Sigma; \beta_0, \Upsilon\right) p\left(\Sigma^{-1}; \Lambda, \nu\right) \propto \exp\left[-\frac{1}{2}(\beta-\beta_0)^T\Upsilon^{-1}(\beta-\beta_0)\right] |\Lambda|^{\frac{\nu}{2}} |\Sigma|^{-\frac{\nu+p+1}{2}} \exp\left[-\frac{\mathrm{tr}\left(\Lambda\Sigma^{-1}\right)}{2}\right]$$
where $\mathrm{tr}(\cdot)$ is the trace of the matrix, it is as if $\Upsilon = \left(X_0^T\Sigma_0^{-1}X_0\right)^{-1}$, and $\nu$ is degrees of freedom, to produce the joint posterior
$$p(\beta, \Sigma \mid y, X) \propto |\Lambda|^{\frac{\nu}{2}} |\Sigma|^{-\frac{\nu+n+p+1}{2}} \exp\left[-\frac{\mathrm{tr}\left(\Lambda\Sigma^{-1}\right)}{2}\right]$$
$$\times \exp\left[-\frac{1}{2}\left((n-p)s^2 + (b-\beta)^TX^T\Sigma^{-1}X(b-\beta) + (\beta-\beta_0)^T\Upsilon^{-1}(\beta-\beta_0)\right)\right]$$
Expanding the terms in the exponent involving $\beta$ (along with $(n-p)s^2$) gives
$$(n-p)s^2 + b^TX^T\Sigma^{-1}Xb - 2\beta^TX^T\Sigma^{-1}Xb + \beta^TX^T\Sigma^{-1}X\beta + \beta^T\Upsilon^{-1}\beta - 2\beta^T\Upsilon^{-1}\beta_0 + \beta_0^T\Upsilon^{-1}\beta_0$$
$$= (n-p)s^2 + \beta^T\left(\Upsilon^{-1} + X^T\Sigma^{-1}X\right)\beta - 2\beta^TV_{\bar{\beta}}^{-1}\bar{\beta} + b^TX^T\Sigma^{-1}Xb + \beta_0^T\Upsilon^{-1}\beta_0$$
$$= (n-p)s^2 + \beta^TV_{\bar{\beta}}^{-1}\beta - 2\beta^TV_{\bar{\beta}}^{-1}\bar{\beta} + b^TX^T\Sigma^{-1}Xb + \beta_0^T\Upsilon^{-1}\beta_0$$
where
$$\bar{\beta} = \left(\Upsilon^{-1} + X^T\Sigma^{-1}X\right)^{-1}\left(\Upsilon^{-1}\beta_0 + X^T\Sigma^{-1}Xb\right) = V_{\bar{\beta}}\left(\Upsilon^{-1}\beta_0 + X^T\Sigma^{-1}Xb\right)$$
and $V_{\bar{\beta}} = \left(\Upsilon^{-1} + X^T\Sigma^{-1}X\right)^{-1}$.
Variation in $\beta$ around $\bar{\beta}$ is
$$\left(\beta-\bar{\beta}\right)^TV_{\bar{\beta}}^{-1}\left(\beta-\bar{\beta}\right) = \beta^TV_{\bar{\beta}}^{-1}\beta - 2\beta^TV_{\bar{\beta}}^{-1}\bar{\beta} + \bar{\beta}^TV_{\bar{\beta}}^{-1}\bar{\beta}$$
The first two terms are identical to the two terms in the posterior involving $\beta$, but by themselves they form no recognizable kernel; adding and subtracting $\bar{\beta}^TV_{\bar{\beta}}^{-1}\bar{\beta}$ completes the square. The joint posterior is
$$p(\beta, \Sigma \mid y, X) \propto |\Lambda|^{\frac{\nu}{2}} |\Sigma|^{-\frac{\nu+n+p+1}{2}} \exp\left[-\frac{\mathrm{tr}\left(\Lambda\Sigma^{-1}\right)}{2}\right]$$
$$\times \exp\left[-\frac{1}{2}\left(\left(\beta-\bar{\beta}\right)^TV_{\bar{\beta}}^{-1}\left(\beta-\bar{\beta}\right) + (n-p)s^2 - \bar{\beta}^TV_{\bar{\beta}}^{-1}\bar{\beta} + b^TX^T\Sigma^{-1}Xb + \beta_0^T\Upsilon^{-1}\beta_0\right)\right]$$
$$\propto |\Lambda|^{\frac{\nu}{2}} |\Sigma|^{-\frac{\nu+n+p+1}{2}} \exp\left[-\frac{1}{2}\left(\mathrm{tr}\left(\Lambda\Sigma^{-1}\right) + (n-p)s^2 - \bar{\beta}^TV_{\bar{\beta}}^{-1}\bar{\beta} + b^TX^T\Sigma^{-1}Xb + \beta_0^T\Upsilon^{-1}\beta_0\right)\right]$$
$$\times \exp\left[-\frac{1}{2}\left(\beta-\bar{\beta}\right)^TV_{\bar{\beta}}^{-1}\left(\beta-\bar{\beta}\right)\right]$$
Therefore, we write the conditional posteriors for the parameters of interest. First, we focus on $\beta$, then we take up $\Sigma$. The conditional posterior for $\beta$ conditional on $\Sigma$ involves collecting all terms involving $\beta$. Hence, the conditional posterior for $\beta$ is $(\beta \mid \Sigma) \sim N\left(\bar{\beta}, V_{\bar{\beta}}\right)$ or
$$p(\beta \mid \Sigma, y, X) \propto \exp\left[-\frac{1}{2}\left(\beta-\bar{\beta}\right)^TV_{\bar{\beta}}^{-1}\left(\beta-\bar{\beta}\right)\right]$$
Inverted-Wishart kernel

Now, we gather all terms involving $\Sigma$ and write the conditional posterior for $\Sigma$:
$$p(\Sigma \mid \beta, y, X) \propto |\Lambda|^{\frac{\nu}{2}} |\Sigma|^{-\frac{\nu+n+p+1}{2}} \exp\left[-\frac{1}{2}\left(\mathrm{tr}\left(\Lambda\Sigma^{-1}\right) + (n-p)s^2 + (b-\beta)^TX^T\Sigma^{-1}X(b-\beta)\right)\right]$$
$$\propto |\Lambda|^{\frac{\nu}{2}} |\Sigma|^{-\frac{\nu+n+p+1}{2}} \exp\left[-\frac{1}{2}\left(\mathrm{tr}\left(\Lambda\Sigma^{-1}\right) + (y-Xb)^T\Sigma^{-1}(y-Xb) + (b-\beta)^TX^T\Sigma^{-1}X(b-\beta)\right)\right]$$
$$\propto |\Lambda|^{\frac{\nu}{2}} |\Sigma|^{-\frac{\nu+n+p+1}{2}} \exp\left[-\frac{1}{2}\mathrm{tr}\left(\left(\Lambda + (y-Xb)(y-Xb)^T + X(b-\beta)(b-\beta)^TX^T\right)\Sigma^{-1}\right)\right]$$
where
$$\Lambda_n = \Lambda + (y-Xb)(y-Xb)^T + X(b-\beta)(b-\beta)^TX^T$$
so the conditional posterior for $\Sigma$ is inverted-Wishart with scale matrix $\Lambda_n$.

Uninformative priors

With uninformative priors, the joint posterior is
$$p(\beta, \Sigma \mid y) \propto |\Lambda_n|^{\frac{n}{2}} |\Sigma|^{-\frac{n+p+1}{2}} \exp\left[-\frac{1}{2}\mathrm{tr}\left(\Lambda_n\Sigma^{-1}\right)\right]$$
where now
$$\Lambda_n = (y-Xb)(y-Xb)^T + X(b-\beta)(b-\beta)^TX^T$$
As with informed priors, a Gibbs sampler (sequential draws from the conditional posteriors) can be employed to draw inferences for the uninformative prior case.
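A minimal Gibbs sampler sketch for this model, alternating draws from the two conditional posteriors. It assumes SciPy's invwishart and uses the standard one-observation inverse-Wishart update under SciPy's parameterization (the degrees-of-freedom bookkeeping here is a modeling assumption, not taken from the text), with purely illustrative hyperparameters:

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(9)

# Small illustrative problem: n observations, p regressors
n, p = 8, 2
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -0.5])
y = X @ beta_true + rng.normal(0, 1.0, n)

# Illustrative priors: beta ~ N(beta0, Upsilon), Sigma ~ inverted Wishart(nu, Lambda)
beta0, Upsilon_inv = np.zeros(p), np.eye(p)
Lambda, nu = np.eye(n), n + 2

Sigma = np.eye(n)                     # initialize
draws = []
for it in range(2000):
    # beta | Sigma ~ N(beta_bar, V)
    Sigma_inv = np.linalg.inv(Sigma)
    V = np.linalg.inv(Upsilon_inv + X.T @ Sigma_inv @ X)
    beta_bar = V @ (Upsilon_inv @ beta0 + X.T @ Sigma_inv @ y)
    beta = rng.multivariate_normal(beta_bar, V)

    # Sigma | beta: inverse-Wishart, scale updated by the residual outer product
    e = (y - X @ beta).reshape(-1, 1)
    Sigma = invwishart.rvs(df=nu + 1, scale=Lambda + e @ e.T, random_state=rng)
    draws.append(beta)

print("posterior mean of beta:", np.mean(draws[500:], axis=0))
```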
Next, we discuss posterior simulation, a convenient and flexible strategy
for drawing inference from the evidence and (conjugate) priors.
6.9 Appendix: summary of conjugacy

discrete data:

negative binomial likelihood ($r$ known; $s = \sum_{i=1}^n y_i$): $p^{nr}(1-p)^s$
beta prior: $p^{a-1}(1-p)^{b-1}$
marginal likelihood: beta-negative binomial
beta posterior: $p^{a+nr-1}(1-p)^{b+s-1}$

hypergeometric likelihood ($k$ = population success, unknown; $N$ = population size, known; $n$ = sample size, sampling without replacement; $x$ = sample success, known): $\binom{k}{x}\binom{N-k}{n-x}\Big/\binom{N}{n}$, $x = 0, 1, 2, \ldots, n$
beta-binomial prior for $k$: $\binom{N}{k}\frac{\Gamma(a+k)\Gamma(b+N-k)\Gamma(a+b)}{\Gamma(a)\Gamma(b)\Gamma(a+b+N)}$
marginal likelihood: beta-binomial, $\binom{n}{x}\frac{\Gamma(a+x)\Gamma(b+n-x)\Gamma(a+b)}{\Gamma(a)\Gamma(b)\Gamma(a+b+n)}$
beta-binomial posterior for $k$: $\binom{N-n}{k-x}\frac{\Gamma(a+k)\Gamma(b+N-k)\Gamma(a+b+n)}{\Gamma(a+x)\Gamma(b+n-x)\Gamma(a+b+N)}$, $k = x, x+1, \ldots, x+N-n$

Notation: $\binom{n}{x} = \frac{n!}{x!(n-x)!}$, $\Gamma(z) = \int_0^\infty e^{-t}t^{z-1}\, dt$, $\Gamma(n) = (n-1)!$ for $n$ a positive integer, and $B(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$.
continuous data:

Pareto-uniform: uniform likelihood ($w$ = upper bound, unknown; 0 = lower bound, known): $\frac{1}{w^n}$, $w > \max(x_i)$
Pareto prior: $\frac{ab^a}{w^{a+1}}$, $w > b$
Pareto posterior: $\frac{(a+n)\max\left[b, x_i\right]^{a+n}}{w^{a+n+1}}$

Pareto-Pareto: Pareto likelihood ($\theta$ = precision, unknown; shape known): $\theta^n$, $0 < \theta < \min(x_i)$
Pareto prior: $\frac{ab^a}{\theta^{a+1}}$, $\theta > b$
Pareto posterior: $\frac{(a-n)b^{a-n}}{\theta^{a-n+1}}$, $a > n$, $\theta > b$

gamma-Pareto: Pareto likelihood ($\alpha$ = shape, unknown; $\theta$ = precision, known, $0 < \theta < \min(x_i)$): $\frac{\alpha^n\theta^{n\alpha}}{m^{\alpha+1}}$, $m = \prod_{i=1}^n x_i$
gamma prior: $\frac{\alpha^{a-1}e^{-\alpha/b}}{b^a\Gamma(a)}$, $\alpha > 0$
gamma posterior: $\frac{\alpha^{a+n-1}e^{-\alpha/b'}}{(b')^{a+n}\Gamma(a+n)}$, $b' = \frac{1}{\frac{1}{b} + \log m - n\log\theta}$

inverse gamma-gamma: gamma likelihood ($\theta$ = rate, unknown; shape $\alpha$ known): $\frac{e^{-s/\theta}}{\theta^{n\alpha}}$, $s = \sum_{i=1}^n x_i$
inverse gamma prior: $\frac{\theta^{-(1+a)}e^{-1/(b\theta)}}{\Gamma(a)b^a}$
inverse gamma posterior: $\frac{\theta^{-(1+a+n\alpha)}e^{-1/(b'\theta)}}{\Gamma(a+n\alpha)(b')^{a+n\alpha}}$, $b' = \frac{b}{1+bs}$

conjugate prior-gamma: gamma likelihood ($\alpha$ = shape, unknown; $\theta$ = rate, known): $\frac{m^{\alpha-1}\theta^{n\alpha}}{\Gamma(\alpha)^n}$, $m = \prod_{i=1}^n x_i$, $x_i > 0$
nonstandard prior: $\frac{a^{\alpha-1}\theta^{c\alpha}}{\Gamma(\alpha)^b}$, $a, b, c > 0$, $\alpha > 0$
nonstandard posterior: $\frac{(am)^{\alpha-1}\theta^{(c+n)\alpha}}{\Gamma(\alpha)^{b+n}}$
normal-normal ($\mu$ unknown, $\sigma^2$ known):
normal likelihood: $\prod_{i=1}^n \exp\left[-\frac{(y_i-\mu)^2}{2\sigma^2}\right]$
normal prior: $\exp\left[-\frac{(\mu-\mu_0)^2}{2\sigma_0^2}\right]$, $\sigma_0^2 = \frac{\sigma^2}{\kappa_0}$
normal posterior: $\exp\left[-\frac{(\mu-\mu_n)^2}{2\sigma_n^2}\right]$, $\mu_n = \frac{\kappa_0\mu_0 + n\bar{y}}{\kappa_0+n}$, $\sigma_n^2 = \frac{\sigma^2}{\kappa_0+n}$

normal inverse gamma-normal ($\mu$ and $\sigma^2$ unknown):
normal likelihood: $\left(2\pi\sigma^2\right)^{-n/2} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i-\mu)^2\right]$
normal prior: $p\left(\mu \mid \sigma^2\right) \propto \frac{1}{\sigma_0}\exp\left[-\frac{(\mu-\mu_0)^2}{2\sigma_0^2}\right]$, $\sigma_0^2 = \frac{\sigma^2}{\kappa_0}$
inverse gamma prior: $p\left(\sigma^2\right) \propto \left(\sigma^2\right)^{-(a+1)}\exp\left[-\frac{b}{\sigma^2}\right]$
joint posterior: normal$\left(\mu_0', \frac{\sigma^2}{\kappa_n}\right)$ for $\mu \mid \sigma^2$ times inverse gamma$\left(a', b'\right)$ for $\sigma^2$
Student t marginal posterior for $\mu$: $\left[1 + \frac{\kappa_n(\mu-\mu_0')^2}{2b'}\right]^{-\frac{2a'+1}{2}}$
inverse gamma marginal posterior for $\sigma^2$: $\left(\sigma^2\right)^{-(a'+1)}\exp\left[-\frac{b'}{\sigma^2}\right]$
where $a' = a + \frac{n}{2}$, $\kappa_n = \kappa_0 + n$, $b' = b + \frac{1}{2}ss + \frac{\kappa_0 n(\bar{y}-\mu_0)^2}{2(\kappa_0+n)}$, $\mu_0' = \frac{\kappa_0\mu_0 + n\bar{y}}{\kappa_0+n}$, and $ss = \sum_{i=1}^n (y_i-\bar{y})^2$
bilateral bivariate Pareto-uniform: uniform likelihood ($l$, $u$ = lower and upper bounds, unknown): $\frac{1}{(u-l)^n}$, $l < \min(x_i)$, $u > \max(x_i)$
bilateral bivariate Pareto prior: $\frac{a(a+1)(r_2-r_1)^a}{(u-l)^{a+2}}$, $l < r_1$, $u > r_2$
bilateral bivariate Pareto posterior: $\frac{(a+n)(a+n+1)(r_2'-r_1')^{a+n}}{(u-l)^{a+n+2}}$, $r_1' = \min(r_1, x_i)$, $r_2' = \max(r_2, x_i)$

lognormal-normal ($\mu$ unknown, $\sigma^2$ known):
lognormal likelihood: $\propto \exp\left[-\frac{lss}{2\sigma^2}\right]$, $lss = \sum_{i=1}^n (\log y_i - \mu)^2$
normal prior: $\exp\left[-\frac{(\mu-\mu_0)^2}{2\sigma_0^2}\right]$, $\sigma_0^2 = \frac{\sigma^2}{\kappa_0}$
normal posterior: $\mu_n = \frac{\kappa_0\mu_0 + n\,\overline{\log y}}{\kappa_0+n}$, $\sigma_n^2 = \frac{\sigma^2}{\kappa_0+n}$
multivariate normal inverted Wishart-multivariate normal ($\mu$, $\Sigma$):

prior $p(\mu, \Sigma)$:
multivariate normal: $p(\mu \mid \Sigma) \propto |\Sigma|^{-\frac{1}{2}} \exp\left[-\frac{\kappa_0}{2}(\mu-\mu_0)^T\Sigma^{-1}(\mu-\mu_0)\right]$
inverted Wishart: $p(\Sigma) \propto |\Lambda|^{\frac{\nu}{2}} |\Sigma|^{-\frac{\nu+k+1}{2}} \exp\left[-\frac{\mathrm{tr}\left(\Lambda\Sigma^{-1}\right)}{2}\right]$

likelihood $\ell(\mu, \Sigma \mid y)$:
multivariate normal: $\propto |\Sigma|^{-\frac{n}{2}} \exp\left[-\frac{1}{2}\left((n-1)s^2 + n(\bar{y}-\mu)^T\Sigma^{-1}(\bar{y}-\mu)\right)\right]$

joint posterior $p(\mu, \Sigma \mid y)$:
multivariate normal: $p(\mu \mid \Sigma, y) \propto \exp\left[-\frac{\kappa_0+n}{2}(\mu-\mu_n)^T\Sigma^{-1}(\mu-\mu_n)\right]$
inverted Wishart: $p(\Sigma \mid y) \propto |\Lambda_n|^{\frac{\nu+n}{2}} |\Sigma|^{-\frac{\nu+n+k+1}{2}} \exp\left[-\frac{\mathrm{tr}\left(\Lambda_n\Sigma^{-1}\right)}{2}\right]$

marginal posterior:
multivariate Student t: $p(\mu \mid y) \propto \left|I + (\kappa_0+n)\Lambda_n^{-1}(\mu-\mu_n)(\mu-\mu_n)^T\right|^{-\frac{\nu+n+1}{2}}$
inverted Wishart: $p(\Sigma \mid y) \propto |\Lambda_n|^{\frac{\nu+n}{2}} |\Sigma|^{-\frac{\nu+n+k+1}{2}} \exp\left[-\frac{\mathrm{tr}\left(\Lambda_n\Sigma^{-1}\right)}{2}\right]$

where $\mu_n = \frac{\kappa_0\mu_0 + n\bar{y}}{\kappa_0+n}$, $s^2 = \frac{1}{n-1}\sum_{i=1}^n (y_i-\bar{y})^T\Sigma^{-1}(y_i-\bar{y})$, and $\Lambda_n = \Lambda + \sum_{i=1}^n (y_i-\bar{y})(y_i-\bar{y})^T + \frac{\kappa_0 n}{\kappa_0+n}(\mu_0-\bar{y})(\mu_0-\bar{y})^T$
linear regression: normal inverse chi square-normal ($\beta$, $\sigma^2$):

prior $p\left(\beta, \sigma^2\right)$:
normal: $p\left(\beta \mid \sigma^2\right) \propto \sigma^{-p} \exp\left[-\frac{1}{2\sigma^2}(\beta-\beta_0)^T\Lambda_0(\beta-\beta_0)\right]$
inverse chi square: $p\left(\sigma^2\right) \propto \left(\sigma^2\right)^{-(\nu_0/2+1)} \exp\left[-\frac{\nu_0\sigma_0^2}{2\sigma^2}\right]$

normal likelihood $\ell\left(\beta, \sigma^2 \mid y, X\right)$:
normal: $\propto \sigma^{-n} \exp\left[-\frac{1}{2\sigma^2}\left(e^Te + (\beta-b)^TX^TX(\beta-b)\right)\right]$

joint posterior $p\left(\beta, \sigma^2 \mid y, X\right)$:
normal: $p\left(\beta \mid \sigma^2, y, X\right) \propto \sigma^{-p} \exp\left[-\frac{1}{2\sigma^2}\left(\beta-\bar{\beta}\right)^T\Lambda_n\left(\beta-\bar{\beta}\right)\right]$
inverse chi square: $p\left(\sigma^2 \mid y, X\right) \propto \left(\sigma^2\right)^{-\left[(\nu_0+n)/2+1\right]} \exp\left[-\frac{\nu_n\sigma_n^2}{2\sigma^2}\right]$

marginal posterior:
Student t: $p\left(\beta \mid y, X\right) \propto \left[1 + \frac{\left(\beta-\bar{\beta}\right)^T\Lambda_n\left(\beta-\bar{\beta}\right)}{\nu_n\sigma_n^2}\right]^{-\frac{\nu_0+n+p}{2}}$
inverse chi square: $p\left(\sigma^2 \mid y, X\right) \propto \left(\sigma^2\right)^{-\left[(\nu_0+n)/2+1\right]} \exp\left[-\frac{\nu_n\sigma_n^2}{2\sigma^2}\right]$

where $e = y - Xb$, $b = \left(X^TX\right)^{-1}X^Ty$, $\bar{\beta} = \left(\Lambda_0 + X^TX\right)^{-1}\left(\Lambda_0\beta_0 + X^TXb\right)$, $\Lambda_n = \Lambda_0 + X^TX$, $\nu_n\sigma_n^2 = \nu_0\sigma_0^2 + e^Te + \left(\bar{\beta}-\beta_0\right)^T\Lambda_0\left(\bar{\beta}-\beta_0\right) + \left(\bar{\beta}-b\right)^TX^TX\left(\bar{\beta}-b\right)$, and $\nu_n = \nu_0 + n$
linear regression with general variance: normal inverted Wishart-normal ($\beta$, $\Sigma$):

prior $p(\beta, \Sigma)$:
normal: $p(\beta \mid \Sigma) \propto \exp\left[-\frac{1}{2}(\beta-\beta_0)^T\Upsilon^{-1}(\beta-\beta_0)\right]$
inverted Wishart: $p(\Sigma) \propto |\Lambda|^{\frac{\nu}{2}} |\Sigma|^{-\frac{\nu+p+1}{2}} \exp\left[-\frac{\mathrm{tr}\left(\Lambda\Sigma^{-1}\right)}{2}\right]$

normal likelihood $\ell(\beta, \Sigma \mid y, X)$:
normal: $\propto |\Sigma|^{-\frac{n}{2}} \exp\left[-\frac{1}{2}\left((n-p)s^2 + (\beta-b)^TX^T\Sigma^{-1}X(\beta-b)\right)\right]$

conditional posterior:
normal: $p(\beta \mid \Sigma, y, X) \propto \exp\left[-\frac{1}{2}\left(\beta-\bar{\beta}\right)^TV_{\bar{\beta}}^{-1}\left(\beta-\bar{\beta}\right)\right]$
inverted Wishart: $p(\Sigma \mid \beta, y, X) \propto |\Lambda_n|^{\frac{\nu+n}{2}} |\Sigma|^{-\frac{\nu+n+p+1}{2}} \exp\left[-\frac{\mathrm{tr}\left(\Lambda_n\Sigma^{-1}\right)}{2}\right]$

where $s^2 = \frac{1}{n-p}(y-Xb)^T\Sigma^{-1}(y-Xb)$, $b = \left(X^T\Sigma^{-1}X\right)^{-1}X^T\Sigma^{-1}y$, $V_{\bar{\beta}} = \left(\Upsilon^{-1} + X^T\Sigma^{-1}X\right)^{-1}$, $\bar{\beta} = \left(\Upsilon^{-1} + X^T\Sigma^{-1}X\right)^{-1}\left(\Upsilon^{-1}\beta_0 + X^T\Sigma^{-1}Xb\right)$, $\Lambda_n = \Lambda + (y-Xb)(y-Xb)^T + X(b-\beta)(b-\beta)^TX^T$, and $\Upsilon = \left(X_0^T\Sigma_0^{-1}X_0\right)^{-1}$