
CSE291D Lecture 9

Hidden Markov Models


(revisited)

Latent variable models

[Plate diagram: latent variables Z, parameters Φ, observed data X, replicated over data points]

• Dimensionality(X) >> dimensionality(Z)

• Z is a bottleneck, which finds a compressed, low-dimensional representation of X
Mixture models

[Plate diagram: discrete latent variables Z (cluster assignments), parameters Φ, observed data X, replicated over data points]
Hidden Markov models

[Graphical model: a chain of discrete latent variables Z_1, Z_2, Z_3, … (cluster assignments), each emitting an observation X_t; parameters Φ shared across all timesteps and data points]
Hidden Markov models
vs mixture models

Example:
Occasionally dishonest casino

[Diagram: two hidden states, a fair die F and a loaded die L, with transitions between them; each state emits die rolls 1–6]
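As a concrete illustration (the specific numbers here are assumed, following the version of this example popularized by Durbin et al.): the fair die emits each face with probability 1/6, the loaded die emits a 6 with probability 1/2 and each other face with probability 1/10, and the casino switches F → L with probability 0.05 and L → F with probability 0.1 between rolls. Given only the sequence of rolls, the HMM's latent state at each roll indicates which die was in use.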
Constructing graphical model
plate diagrams from pseudocode

• If a variable is drawn conditioned on another variable, draw an arrow from that variable to it

• If there is a for loop, draw a plate around the repeated variables (unless there are arrows between the duplicates)

[Example plate diagrams with plates over k = 1:K; using two separate plates is also fine]
Constructing graphical model
plate diagrams from pseudocode

[Plate diagram for the HMM: parameter plates over k = 1:K; state chain Z_1 → Z_2 → Z_3 → … → Z_T, each Z_t emitting an observation X_t]
Joint distributions
from plate diagrams
• If there is a plate, include a product over the
plate’s factors

[Plate diagram for the HMM, as on the previous slide: parameter plates over k = 1:K, state chain Z_1, …, Z_T, observations X_1, …, X_T]
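Reading the joint distribution off this diagram (with the transition and emission parameters collected in Φ; a fully Bayesian treatment also multiplies in the prior terms from the k = 1:K parameter plates):

p(z_{1:T}, x_{1:T} \mid \Phi) = p(z_1 \mid \Phi) \prod_{t=2}^{T} p(z_t \mid z_{t-1}, \Phi) \prod_{t=1}^{T} p(x_t \mid z_t, \Phi)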
Learning outcomes
By the end of the lesson, you should be able to:

• Perform Bayesian inference for HMMs via MCMC


– Direct Gibbs, forward-backward blocked sampler
– Collapsed, uncollapsed

• Apply generalizations of HMMs for modeling


– Semi-Markov, input/output, factorial HMMs

Example application:
Part of speech tagging
The quick brown fox jumps over the sly lazy dogs
  → "dogs": plural noun

The sailor dogs the hatch
  → "dogs": verb, meaning fastens (a watertight door) securely

• HMM POS tagger:

  – The part-of-speech tag is a latent state for each word
Bayesian inference for HMMs
via MCMC
• Two tricks for speeding up MCMC:
– Blocked Gibbs (sampling all states at once)
– Collapsed Gibbs (marginalizing out parameters)
              Explicit                                 Collapsed

Pointwise     Direct Gibbs sampler                     Collapsed Gibbs sampler

Blocked       Blocked Gibbs sampler                    Collapsed blocked Gibbs sampler
              (forward-filtering, backward sampling)
Direct Gibbs sampler
(Pointwise, explicit)
• Until convergence:
– Sample transition probabilities

– Sample likelihood parameters

– Sample latent states, one at a time

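A minimal sketch of this sampler for a discrete-emission HMM with symmetric Dirichlet priors (the model choices, names, and hyperparameters here are illustrative assumptions, and the initial-state distribution is taken as uniform for brevity):

    import numpy as np

    def direct_gibbs_hmm(x, K, V, n_iters=100, alpha=1.0, beta=1.0, seed=0):
        """Pointwise, explicit Gibbs sampler for a discrete-emission HMM.

        x : observed symbols in {0, ..., V-1} (length T)
        K : number of hidden states; V : vocabulary size
        alpha, beta : symmetric Dirichlet hyperparameters (transitions / emissions)
        """
        rng = np.random.default_rng(seed)
        T = len(x)
        z = rng.integers(K, size=T)                      # initialize latent states
        for _ in range(n_iters):
            # Sample transition probabilities A[i, j] = p(z_t = j | z_{t-1} = i)
            trans_counts = np.zeros((K, K))
            for t in range(1, T):
                trans_counts[z[t - 1], z[t]] += 1
            A = np.array([rng.dirichlet(alpha + trans_counts[i]) for i in range(K)])

            # Sample emission (likelihood) parameters B[k, v] = p(x_t = v | z_t = k)
            emit_counts = np.zeros((K, V))
            for t in range(T):
                emit_counts[z[t], x[t]] += 1
            B = np.array([rng.dirichlet(beta + emit_counts[k]) for k in range(K)])

            # Sample latent states one at a time from their full conditionals
            for t in range(T):
                p = B[:, x[t]].copy()                    # emission likelihood
                if t > 0:
                    p *= A[z[t - 1], :]                  # incoming transition
                if t < T - 1:
                    p *= A[:, z[t + 1]]                  # outgoing transition
                z[t] = rng.choice(K, p=p / p.sum())
        return z, A, B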
Collapsed Gibbs sampling
(mixture model example)

Before collapsing After collapsing

17
Collapsed Gibbs sampler
(Pointwise, collapsed)
• Marginalize out the transition probabilities
  (a Dirichlet prior is needed for this)

  – The resulting conditionals are Polya urn probabilities
  – Bookkeeping: account for the change in the z's when computing the transition counts
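Concretely, with a symmetric Dirichlet(\alpha) prior on each row of the transition matrix, the Polya urn probability of a transition i \to j, given the transitions implied by all the other z's, is

p(z_t = j \mid z_{t-1} = i, z_{\setminus t}) \propto \frac{n_{ij} + \alpha}{n_{i\cdot} + K\alpha}

where n_{ij} counts i \to j transitions excluding those involving the state currently being resampled (this is the bookkeeping step). The full conditional for z_t also multiplies in the corresponding term for the outgoing transition to z_{t+1} and the emission likelihood, and the emission parameters can be marginalized out in the same way; the exact counts require a little care when z_{t-1}, z_t, and z_{t+1} coincide.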
Blocked Gibbs sampler
(Blocked, explicit)
• Until convergence:
– Sample transition probabilities

– Sample likelihood parameters

– Sample all z’s in each HMM chain at once

Blocked Gibbs sampler
(Blocked, explicit)
• Sample all z’s in each HMM chain at once

– Forwards filtering, backwards sampling algorithm

• Perform a filtering pass, forwards in time (left to right).

  Compute the filtering distributions p(z_t | x_{1:t}) for t = 1, …, T

• Sample the last state z_T from its filtering distribution

• Recursively sample backwards from the last state,

  to obtain a sample from the joint posterior of the z's
Forwards filtering
• For each timestep t
  – Use Bayes' rule to recursively compute the probability of the current state given the observations so far
  – The "prior" is the prediction from the previous step

NB: The parameters are implicitly conditioned on throughout
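Written out, this is the standard forward-filtering recursion (the notation \alpha_t for the filtering distribution is assumed here):

\alpha_1(k) \equiv p(z_1 = k \mid x_1) \propto p(z_1 = k)\, p(x_1 \mid z_1 = k)

\alpha_t(k) \equiv p(z_t = k \mid x_{1:t}) \propto p(x_t \mid z_t = k) \sum_j p(z_t = k \mid z_{t-1} = j)\, \alpha_{t-1}(j)

The sum is the one-step prediction (the "prior"), and the emission term is the likelihood in Bayes' rule.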


Backwards sampling
• In the forwards pass, cache the filtering probabilities p(z_t | x_{1:t})

• We can then sample the last state correctly from its posterior marginal distribution, z_T ~ p(z_T | x_{1:T})
Backwards sampling
• We can re-write the joint posterior from right to left:

• Sample from right to left
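The decomposition in question (by the Markov properties of the chain, z_t is conditionally independent of the future given z_{t+1} and x_{1:t}):

p(z_{1:T} \mid x_{1:T}) = p(z_T \mid x_{1:T}) \prod_{t=T-1}^{1} p(z_t \mid z_{t+1}, x_{1:t})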
Backwards sampling
• Compute the sampling distribution recursively

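Each sampling distribution combines the cached filtering probability with the transition into the already-sampled next state:

p(z_t = j \mid z_{t+1} = k, x_{1:t}) \propto p(z_{t+1} = k \mid z_t = j)\, \alpha_t(j)

A minimal sketch of the whole forward-filtering, backward-sampling pass for a discrete-emission HMM with known parameters (all names here are illustrative assumptions):

    import numpy as np

    def ffbs_sample(x, pi0, A, B, rng=None):
        """Forward-filtering, backward-sampling for one HMM chain.

        x   : observations (length T, symbols in {0, ..., V-1})
        pi0 : initial state distribution (K,)
        A   : transition matrix, A[i, j] = p(z_t = j | z_{t-1} = i)
        B   : emission matrix,  B[k, v] = p(x_t = v | z_t = k)
        Returns one sample z_{1:T} from p(z | x, parameters).
        """
        rng = rng or np.random.default_rng()
        T, K = len(x), len(pi0)

        # Forward pass: filtering distributions alpha[t, k] = p(z_t = k | x_{1:t})
        alpha = np.zeros((T, K))
        alpha[0] = pi0 * B[:, x[0]]
        alpha[0] /= alpha[0].sum()
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, x[t]]
            alpha[t] /= alpha[t].sum()

        # Backward pass: sample z_T from alpha[T-1], then z_t | z_{t+1} recursively
        z = np.zeros(T, dtype=int)
        z[T - 1] = rng.choice(K, p=alpha[T - 1])
        for t in range(T - 2, -1, -1):
            p = alpha[t] * A[:, z[t + 1]]
            z[t] = rng.choice(K, p=p / p.sum())
        return z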
Collapsed blocked Gibbs sampler
(blocked, collapsed)
• Collapsing and blocking can both be beneficial
– However, the forward filtering, backward sampling algorithm for
blocked Gibbs assumes parameters are available.
– How to resolve this dilemma, and get best of both worlds?

• Solution:
– Use a Metropolis-Hastings-within-Gibbs scheme
• Temporarily re-instantiate parameters to good values
• Use these values to generate a proposal state sequence, via the FFBS algorithm
• Choose to accept or reject the proposal via a Metropolis-Hastings decision
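
A sketch of the resulting accept/reject step, with z the current state sequence, z' the FFBS proposal drawn under the temporarily re-instantiated parameters \hat{\theta}, P(\cdot \mid x) the collapsed posterior, and q(\cdot \mid \hat{\theta}, x) the FFBS proposal distribution (this notation is assumed here, not taken from the slide):

accept z' with probability \min\!\left(1,\ \frac{P(z' \mid x)\, q(z \mid \hat{\theta}, x)}{P(z \mid x)\, q(z' \mid \hat{\theta}, x)}\right)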

Johnson, M., Griffiths, T. L., & Goldwater, S. (2007). Bayesian inference for PCFGs via Markov chain Monte Carlo. In HLT-NAACL (pp. 139–146).
How to re-instantiate parameters?
• The standard Rao-Blackwellized estimator is to plug in the posterior predictive probability from the Polya urn model
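Under a symmetric Dirichlet(\alpha) prior these are the same Polya urn probabilities as before, used here as point estimates for the FFBS proposal (notation assumed):

\hat{A}_{ij} = \frac{n_{ij} + \alpha}{n_{i\cdot} + K\alpha}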
Hidden semi-Markov models
• According to the generative process of an HMM, the probability that we stay in state i for a duration of d steps is geometrically distributed:

• Hidden semi-Markov models instead model the probability of this duration explicitly,

  and enforce it with duration counter variables D_t
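The geometric duration referred to above, in terms of the self-transition probability A_{ii} (notation assumed):

p(\text{duration in state } i = d) = A_{ii}^{\,d-1}\,(1 - A_{ii}), \qquad d = 1, 2, \dots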
Input-Output HMMs
• Condition on inputs at each timestep, which can affect state transitions and/or outputs

• This does not greatly complicate inference
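One common way to write the resulting model, with inputs u_t at each timestep (notation assumed):

p(z_{1:T}, x_{1:T} \mid u_{1:T}) = p(z_1 \mid u_1) \prod_{t=2}^{T} p(z_t \mid z_{t-1}, u_t) \prod_{t=1}^{T} p(x_t \mid z_t, u_t)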
Automatically illustrating a guacamole recipe from https://www.youtube.com/watch?v=H7Ne3s202lU
Input-output HMM for
recipe/speech alignment

[Graphical model: latent recipe-step states and a background switch variable at each timestep; observed speech transcription (words), with the textual recipe steps as inputs]
Factorial HMMs
• Multiple hidden chains each encode aspects of the latent state

• Each chain evolves independently, but the observations are generated based on all chains
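With M chains z_t^{(1)}, \dots, z_t^{(M)} (notation assumed), the joint distribution factorizes as:

p(z, x) = \prod_{m=1}^{M}\Big[p(z_1^{(m)}) \prod_{t=2}^{T} p(z_t^{(m)} \mid z_{t-1}^{(m)})\Big] \prod_{t=1}^{T} p(x_t \mid z_t^{(1)}, \dots, z_t^{(M)})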
Example: Dynamic relational infinite
feature model (DRIFT)
• Model social networks over time
• Each actor has a vector of latent features
(e.g. interests), each with Markov dynamics
[Diagram: each actor's latent features (Feature 1, Feature 2, Feature 3, …) evolving over time as independent Markov chains]
J. R. Foulds, A. Asuncion, C. DuBois, C. T. Butts, P. Smyth. A dynamic relational infinite feature model for longitudinal social networks. Proceedings of the 14th International Conference on AI and Statistics (AISTATS), April 2011.
Miller, Griffiths, Jordan (2009)
Latent Feature Relational Model

[Illustration: actors Alice, Bob, and Claire, each with a set of interests drawn from cycling, fishing, running, tango, salsa, and waltz, and the corresponding binary feature matrix Z with one row per actor (Alice, Bob, Claire) and one column per interest]
K. T. Miller, T. L. Griffiths, and M. I. Jordan. Nonparametric latent feature models for link prediction. In Advances in Neural Information Processing Systems (NIPS), 2009.
Dynamic relational infinite feature
model (DRIFT)

J. R. Foulds, A. Asuncion, C. DuBois, C. T. Butts, P. Smyth. A dynamic relational infinite feature model for longitudinal social networks. Proceedings of the 14th International Conference on AI and Statistics (AISTATS), April 2011.
Think-pair-share:
Tennis match video action recognition
• You are a data analyst hired to analyze the playing style of professional tennis players, in order to help them improve their performance.

• Design a system that uses an HMM-based model to automatically annotate videos of tennis matches with the types of tennis strokes used by each player (backhand volley, forehand stroke, smash, service, …).
– What features and likelihood model will you use?
– Can you encode a special-purpose transition matrix for the
game of tennis?
– How will you train the model?
– Will extensions of HMMs be useful?

