Piyush Rai
Oct 3, 2017
Probabilistic Machine Learning - CS772A (Piyush Rai, IITK) Approximate Inference: Sampling Methods (2) 1
Sampling Methods: Recap
Any probability distribution p(z) can be (approximately) represented using a set of samples
Samples can come from p(z) itself, or from some proposal distribution if p(z) is a difficult distribution
Given a set of samples \{z^{(\ell)}\}_{\ell=1}^{L}, the sample-based approximation of p(z) can be written as

p(z) \approx \frac{1}{L} \sum_{\ell=1}^{L} \mathbb{I}(z = z^{(\ell)}) \quad \text{or} \quad p(z) \approx \frac{1}{L} \sum_{\ell=1}^{L} \delta_{z^{(\ell)}}(z)
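The sample-based approximation above can be checked numerically: any expectation under p(z) becomes a sample average. A minimal Python sketch, with a standard normal chosen here purely as an example p(z):

```python
import random

# Sample-based approximation of p(z): with samples z^(1..L), any expectation
# E_p[f(z)] is approximated by the sample average (1/L) * sum_l f(z^(l)).
# Example target (an assumption for illustration): p(z) = standard normal.
L = 100_000
samples = [random.gauss(0.0, 1.0) for _ in range(L)]

mean_est = sum(samples) / L                      # approximates E[z]   (true value 0)
second_moment = sum(z * z for z in samples) / L  # approximates E[z^2] (true value 1)
prob_pos = sum(z > 0 for z in samples) / L       # P(z > 0) via the indicator average
```

Probabilities are just expectations of indicators, which is exactly the first form of the approximation above.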
Sampling Methods: Recap
Looked at some basic methods for generating samples from a probability distribution
Transformation based methods
Rejection sampling
Importance sampling (two variants)
[Note: I.S. (1) assumes p(z) can be evaluated at any z; I.S. (2) assumes p(z) = \tilde{p}(z)/Z_p can only be evaluated up to a proportionality constant]
Limitations of Basic Sampling Methods
Difficult to find a good proposal distribution, especially when z is high-dimensional (e.g., models with many parameters)
In high dimensions, most of the mass of p(z) is concentrated in a tiny region of the z space
It is difficult to know a priori where those regions are, and thus difficult to come up with a good proposal distribution
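The concentration-of-mass claim is easy to see empirically. A small illustration (not from the slides): draws from a standard normal N(0, I_d) concentrate in a thin shell of radius about sqrt(d), so in high dimensions almost no sample lands near the mode at the origin.

```python
import math
import random

def sample_norms(d, n=2000):
    """Euclidean norms of n draws from a d-dimensional standard normal."""
    return [math.sqrt(sum(random.gauss(0, 1) ** 2 for _ in range(d)))
            for _ in range(n)]

for d in (2, 100):
    r = sample_norms(d)
    mean_r = sum(r) / len(r)                                   # grows like sqrt(d)
    spread = (sum((x - mean_r) ** 2 for x in r) / len(r)) ** 0.5  # stays ~constant
    # Relative to its distance from the origin, the mass concentrates as d grows,
    # which is why a naive proposal rarely lands where p(z) has its mass.
```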
Markov Chain Monte Carlo (MCMC) Methods
Markov Chain Monte Carlo (MCMC)
Goal: Generate samples from some target distribution p(z) = \tilde{p}(z)/Z, where z is high-dimensional
Will again assume that we can evaluate p(z) at least up to a proportionality constant
Basic idea in MCMC: Use a Markov chain to generate samples from p(z)
How: Given the current sample z^{(\ell)} from the chain, generate the next sample z^{(\ell+1)} as follows:
Use a proposal distribution q(z|z^{(\ell)}) to generate a candidate sample z^*
Accept/reject z^* as the next sample based on an acceptance criterion (will see later)
If accepted, z^{(\ell+1)} = z^*. If rejected, z^{(\ell+1)} = z^{(\ell)}
If q(z|z^{(\ell)}) has certain properties, the Markov chain's stationary distribution will be p(z)
Informally, the stationary distribution is the distribution the chain eventually settles into
Markov Chain
A Markov chain generates a sequence of states z^{(1)}, z^{(2)}, \ldots, where each next state depends only on the current one through transition probabilities T_\ell(z^{(\ell)}, z^{(\ell+1)})
Homogeneous Markov Chain: Transition probabilities T_\ell = T (the same everywhere along the chain)
Some Properties
Consider the discrete case where z has K possible states (so T is a K \times K matrix)
Assume the graph representing the possible state transitions is irreducible and aperiodic
Ergodic Property: Under the above assumptions, for any choice of the initial probability vector v,

v T^m \to p^* \quad \text{as } m \to \infty

where the probability vector p^* represents the invariant or stationary distribution of the chain
Why do we need the graph to be irreducible and aperiodic?
Irreducible: No disjoint sets of nodes; any state can be reached from any other state
Aperiodic: No cycles in the graph (otherwise the chain would oscillate forever). Consider this example:

v = [1/5, 4/5], \quad T = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
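The ergodic property and the periodic counterexample can both be demonstrated by power iteration; a quick NumPy sketch (the 2x2 aperiodic chain is an illustrative choice, the periodic T is the one from the slide):

```python
import numpy as np

# Aperiodic, irreducible chain: v T^m converges to the stationary distribution p*
T = np.array([[0.9, 0.1],
              [0.5, 0.5]])
v = np.array([0.2, 0.8])   # arbitrary initial probability vector
p = v.copy()
for _ in range(100):
    p = p @ T              # one step of the chain: left-multiply by T
print(p)                   # -> approx [0.8333, 0.1667], independent of v

# Periodic chain from the slide: the distribution oscillates and never settles
T_per = np.array([[0., 1.],
                  [1., 0.]])
v = np.array([1/5, 4/5])
print(v @ T_per)           # [0.8, 0.2]
print(v @ T_per @ T_per)   # back to [0.2, 0.8] -- oscillates forever
```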
Some Properties
Note that, for the ergodic case, p^* T = p^*. Therefore p^* is the left eigenvector of T with eigenvalue 1
For discrete-valued z with K = 5 possible states, p^* = [p^*_1, \ldots, p^*_5], and we can write

\sum_{i=1}^{5} p^*_i T_{ij} = p^*_j

For the continuous z case, we can equivalently write (for any two state values z and z')

\int p^*(z') T(z', z) \, dz' = p^*(z)

Detailed balance: p^*(z) T(z, z') = p^*(z') T(z', z). Integrating both sides w.r.t. z' gives \int p^*(z') T(z', z) \, dz' = p^*(z) (i.e., the ergodic property)
Thus a Markov chain with detailed balance will always converge to a stationary distribution p^*(z)
Homogeneous Markov chains satisfy the detailed balance/ergodic property under mild conditions
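Both conditions, stationarity (p* as a left eigenvector with eigenvalue 1) and detailed balance, can be verified numerically. A sketch using a 3-state birth-death chain (an illustrative choice; such chains satisfy detailed balance, and here p* is proportional to [1, 2, 1]):

```python
import numpy as np

T = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
p_star = np.array([0.25, 0.5, 0.25])

# Stationarity: p* is a left eigenvector of T with eigenvalue 1, i.e. p* T = p*
print(np.allclose(p_star @ T, p_star))   # True

# Detailed balance: p*_i T_ij == p*_j T_ji for every pair of states (i, j)
flows = p_star[:, None] * T              # flows[i, j] = p*_i T_ij
print(np.allclose(flows, flows.T))       # True (probability flow is symmetric)
```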
MCMC: The Basic Scheme
Running the MCMC chain infinitely long gives us ONE sample from the target distribution
But we usually require several samples to approximate the distribution. How do we get those?
Start at an initial z^{(0)}. Using a proposal distribution p(z^{(\ell+1)}|z^{(\ell)}), run the chain long enough, say T_1 steps
Discard the first (T_1 - 1) samples (called burn-in samples) and take the last sample z^{(T_1)}
Continue from z^{(T_1)} up to T_2 steps, discard the intermediate samples, and take the last sample z^{(T_2)}
This helps ensure that z^{(T_1)} and z^{(T_2)} are uncorrelated
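The burn-in and thinning bookkeeping above can be sketched generically. Here `transition` is a hypothetical stand-in for an MCMC transition kernel (a plain random walk, used only to make the skeleton runnable, not a sampler for any particular p(z)):

```python
import random

def transition(z):
    """One step of a toy Markov chain (hypothetical stand-in for a real
    MCMC kernel): a Gaussian random walk."""
    return z + random.gauss(0.0, 1.0)

def run_chain(z0, num_samples, burn_in=1000, thin=10):
    """Collect approximately uncorrelated samples: discard `burn_in`
    initial steps, then keep only every `thin`-th state."""
    z = z0
    for _ in range(burn_in):      # burn-in: let the chain forget z0
        z = transition(z)
    samples = []
    while len(samples) < num_samples:
        for _ in range(thin):     # thinning: skip correlated intermediate states
            z = transition(z)
        samples.append(z)
    return samples

samples = run_chain(z0=0.0, num_samples=100)
```

`burn_in` plays the role of T_1 and `thin` the gap between T_1 and T_2 on the slide.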
Some MCMC Sampling Algorithms
Metropolis-Hastings (MH) Sampling
Assume a proposal distribution q(z|z^{(\tau)}), e.g., \mathcal{N}(z|z^{(\tau)}, \sigma^2 I_D)
In each step, draw z^* \sim q(z|z^{(\tau)}) and accept the sample z^* with probability

A(z^*, z^{(\tau)}) = \min\left(1, \frac{p(z^*)\, q(z^{(\tau)}|z^*)}{p(z^{(\tau)})\, q(z^*|z^{(\tau)})}\right)

The acceptance probability makes intuitive sense. Note the kinds of z^* it favors/disfavors:
It favors accepting z^* if p(z^*) has a higher value than p(z^{(\tau)})
It disfavors z^* if the proposal distribution q unduly favors it (i.e., if q(z^*|z^{(\tau)}) is large)
It favors z^* if we can reverse back to z^{(\tau)} from z^* (i.e., if q(z^{(\tau)}|z^*) is large); needed for good mixing
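A minimal MH sampler in Python, under illustrative assumptions: the unnormalized target is a 1-D standard normal and the proposal is a Gaussian random walk (the q terms then cancel, but the code keeps them to match the general formula):

```python
import math
import random

def p_tilde(z):
    # Unnormalized target (example choice): standard normal up to a constant
    return math.exp(-0.5 * z * z)

def q_pdf(z_to, z_from, sigma=0.5):
    # Gaussian random-walk proposal density q(z_to | z_from)
    return (math.exp(-0.5 * ((z_to - z_from) / sigma) ** 2)
            / (sigma * math.sqrt(2 * math.pi)))

def mh_sample(num_steps=50000, sigma=0.5):
    z = 0.0
    samples = []
    for _ in range(num_steps):
        z_star = random.gauss(z, sigma)     # propose z* ~ q(z | z^(tau))
        # A(z*, z) = min(1, p(z*) q(z|z*) / (p(z) q(z*|z)))
        a = min(1.0, (p_tilde(z_star) * q_pdf(z, z_star, sigma))
                     / (p_tilde(z) * q_pdf(z_star, z, sigma)))
        if random.random() < a:
            z = z_star                      # accept the candidate
        samples.append(z)                   # on rejection, repeat the current z
    return samples

samples = mh_sample()
mean = sum(samples) / len(samples)   # should be close to 0 for this target
```

Note that only the unnormalized p_tilde is ever evaluated; the normalizing constant cancels in the ratio, exactly as the slides assume.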
MH Sampling in Action: A Toy Example

Target: p(z) = \mathcal{N}\left(\begin{pmatrix} 4 \\ 4 \end{pmatrix}, \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\right), \quad Proposal: q(z^{(t)}|z^{(t-1)}) = \mathcal{N}\left(z^{(t-1)}, \begin{pmatrix} 0.01 & 0 \\ 0 & 0.01 \end{pmatrix}\right)
MH Sampling: Some Comments
Special Case: If the proposal distribution is symmetric, q(z^*|z^{(\tau)}) = q(z^{(\tau)}|z^*), we get the Metropolis Sampling algorithm with

A(z^*, z^{(\tau)}) = \min\left(1, \frac{p(z^*)}{p(z^{(\tau)})}\right)

Limitation: MH can have very slow convergence
Gibbs Sampling (Geman & Geman, 1984)
Gibbs sampling is a special case of MH in which each variable is proposed from its conditional distribution given all the others; the MH acceptance ratio then always equals 1, so every proposal is accepted:

A(z^*, z) = \frac{p(z^*)\, q(z|z^*)}{p(z)\, q(z^*|z)} = 1
Gibbs Sampling: Sketch of the Algorithm
M: total number of variables; T: number of Gibbs sampling steps
Note: When sampling each variable from its conditional posterior, we use the most recent values of all the other variables (akin to a coordinate-ascent procedure)
Note: The order of updating the variables usually doesn't matter (but see "Scan Order in Gibbs Sampling: Models in Which it Matters and Bounds on How Much" from NIPS 2016)
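The sketch above can be written as a short generic loop. Here each `cond_samplers[i]` is a hypothetical placeholder for a function drawing z_i from its full conditional p(z_i | z_{-i}); a concrete model would supply these:

```python
def gibbs(cond_samplers, z_init, T):
    """Generic Gibbs sweep (a sketch): T full sweeps over M variables."""
    z = list(z_init)                  # current state of all M variables
    samples = []
    for _ in range(T):
        for i, sample_i in enumerate(cond_samplers):
            z[i] = sample_i(z)        # uses the MOST RECENT values of the other z_j
        samples.append(list(z))
    return samples

# Toy deterministic "conditionals" just to expose the update order:
# z_2's update sees the already-updated z_1 within the same sweep.
out = gibbs([lambda z: 1.0, lambda z: z[0] + 1.0], [0.0, 0.0], T=3)
# out[-1] == [1.0, 2.0]
```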
Gibbs Sampling: A Simple Example
Can sample from a 2-D Gaussian using 1-D Gaussians (recall that if the joint distribution is a 2-D Gaussian, the conditionals are simply 1-D Gaussians)
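A runnable version of this example, for the standard bivariate Gaussian N(0, [[1, rho],[rho, 1]]) whose conditionals are the well-known 1-D Gaussians z_1 | z_2 ~ N(rho z_2, 1 - rho^2); rho = 0.8 is an illustrative choice:

```python
import random

rho = 0.8  # correlation of the 2-D Gaussian target (example value)

def gibbs_2d_gaussian(num_steps=20000):
    z1, z2 = 0.0, 0.0
    samples = []
    sd = (1 - rho ** 2) ** 0.5          # conditional standard deviation
    for _ in range(num_steps):
        z1 = random.gauss(rho * z2, sd)  # z1 | z2 ~ N(rho*z2, 1-rho^2)
        z2 = random.gauss(rho * z1, sd)  # z2 | z1, using the fresh value of z1
        samples.append((z1, z2))
    return samples

samples = gibbs_2d_gaussian()
# The empirical covariance of the draws should be close to rho
n = len(samples)
m1 = sum(z1 for z1, _ in samples) / n
m2 = sum(z2 for _, z2 in samples) / n
cov = sum((z1 - m1) * (z2 - m2) for z1, z2 in samples) / n
```

Each step only ever draws from 1-D Gaussians, yet the collected pairs follow the 2-D joint.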
Next Class..