
CSE291D Lecture 9

Hidden Markov Models


(revisited)

Latent variable models

[Plate diagram: latent variables Z, parameters Φ, observed data X, replicated over data points]

• Dimensionality(X) >> dimensionality(Z)

• Z is a bottleneck, which finds a compressed, low-dimensional representation of X
Mixture models

[Plate diagram: discrete latent variables Z (cluster assignments), parameters Φ, observed data X, replicated over data points]
Hidden Markov models

[Graphical model: a chain of discrete latent variables Z_1, Z_2, Z_3, … (cluster assignments), each emitting an observation X_t; parameters Φ shared across all timesteps and data points]
Hidden Markov models
vs mixture models

Example:
Occasionally dishonest casino

[Diagram: two hidden states, a fair die F and a loaded die L, with transitions between them; each state emits die rolls 1–6]
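As a concrete illustration (the specific numbers here are assumed, following the version of this example popularized by Durbin et al.): the fair die emits each face with probability 1/6, the loaded die emits a 6 with probability 1/2 and each other face with probability 1/10, and the casino switches F → L with probability 0.05 and L → F with probability 0.1 between rolls. Given only the sequence of rolls, the HMM's latent state at each roll indicates which die was in use.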
Constructing graphical model
plate diagrams from pseudocode

• If a variable is drawn conditioned on another variable, draw an arrow from that variable to it

• If there is a for loop, draw a plate around the repeated variables (unless there are arrows between the duplicates)

[Example plate diagrams with plates over k = 1:K; using two separate plates is also fine]
Constructing graphical model
plate diagrams from pseudocode

[Plate diagram for the HMM: parameter plates over k = 1:K; state chain Z_1 → Z_2 → Z_3 → … → Z_T, each Z_t emitting an observation X_t]
Joint distributions
from plate diagrams
• If there is a plate, include a product over the
plate’s factors

[Plate diagram for the HMM, as on the previous slide: parameter plates over k = 1:K, state chain Z_1, …, Z_T, observations X_1, …, X_T]
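Reading the joint distribution off this diagram (with the transition and emission parameters collected in Φ; a fully Bayesian treatment also multiplies in the prior terms from the k = 1:K parameter plates):

p(z_{1:T}, x_{1:T} \mid \Phi) = p(z_1 \mid \Phi) \prod_{t=2}^{T} p(z_t \mid z_{t-1}, \Phi) \prod_{t=1}^{T} p(x_t \mid z_t, \Phi)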
Learning outcomes
By the end of the lesson, you should be able to:

• Perform Bayesian inference for HMMs via MCMC


– Direct Gibbs, forward-backward blocked sampler
– Collapsed, uncollapsed

• Apply generalizations of HMMs for modeling


– Semi-Markov, input/output, factorial HMMs

Example application:
Part of speech tagging
The quick brown fox jumps over the sly lazy dogs
  → "dogs": plural noun

The sailor dogs the hatch
  → "dogs": verb, meaning fastens (a watertight door) securely

• HMM POS tagger:

  – The part-of-speech tag is a latent state for each word
Bayesian inference for HMMs
via MCMC
• Two tricks for speeding up MCMC:
– Blocked Gibbs (sampling all states at once)
– Collapsed Gibbs (marginalizing out parameters)
              Explicit                                 Collapsed

Pointwise     Direct Gibbs sampler                     Collapsed Gibbs sampler

Blocked       Blocked Gibbs sampler                    Collapsed blocked Gibbs sampler
              (forward-filtering, backward sampling)
Direct Gibbs sampler
(Pointwise, explicit)
• Until convergence:
– Sample transition probabilities

– Sample likelihood parameters

– Sample latent states, one at a time

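A minimal sketch of this sampler for a discrete-emission HMM with symmetric Dirichlet priors (the model choices, names, and hyperparameters here are illustrative assumptions, and the initial-state distribution is taken as uniform for brevity):

    import numpy as np

    def direct_gibbs_hmm(x, K, V, n_iters=100, alpha=1.0, beta=1.0, seed=0):
        """Pointwise, explicit Gibbs sampler for a discrete-emission HMM.

        x : observed symbols in {0, ..., V-1} (length T)
        K : number of hidden states; V : vocabulary size
        alpha, beta : symmetric Dirichlet hyperparameters (transitions / emissions)
        """
        rng = np.random.default_rng(seed)
        T = len(x)
        z = rng.integers(K, size=T)                      # initialize latent states
        for _ in range(n_iters):
            # Sample transition probabilities A[i, j] = p(z_t = j | z_{t-1} = i)
            trans_counts = np.zeros((K, K))
            for t in range(1, T):
                trans_counts[z[t - 1], z[t]] += 1
            A = np.array([rng.dirichlet(alpha + trans_counts[i]) for i in range(K)])

            # Sample emission (likelihood) parameters B[k, v] = p(x_t = v | z_t = k)
            emit_counts = np.zeros((K, V))
            for t in range(T):
                emit_counts[z[t], x[t]] += 1
            B = np.array([rng.dirichlet(beta + emit_counts[k]) for k in range(K)])

            # Sample latent states one at a time from their full conditionals
            for t in range(T):
                p = B[:, x[t]].copy()                    # emission likelihood
                if t > 0:
                    p *= A[z[t - 1], :]                  # incoming transition
                if t < T - 1:
                    p *= A[:, z[t + 1]]                  # outgoing transition
                z[t] = rng.choice(K, p=p / p.sum())
        return z, A, B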
Collapsed Gibbs sampling
(mixture model example)

Before collapsing After collapsing

17
Collapsed Gibbs sampler
(Pointwise, collapsed)
• Marginalize out the transition probabilities
  (a Dirichlet prior is needed for this)

  – The resulting conditionals are Polya urn probabilities
  – Bookkeeping: account for the change in the z's when computing the transition counts
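Concretely, with a symmetric Dirichlet(\alpha) prior on each row of the transition matrix, the Polya urn probability of a transition i \to j, given the transitions implied by all the other z's, is

p(z_t = j \mid z_{t-1} = i, z_{\setminus t}) \propto \frac{n_{ij} + \alpha}{n_{i\cdot} + K\alpha}

where n_{ij} counts i \to j transitions excluding those involving the state currently being resampled (this is the bookkeeping step). The full conditional for z_t also multiplies in the corresponding term for the outgoing transition to z_{t+1} and the emission likelihood, and the emission parameters can be marginalized out in the same way; the exact counts require a little care when z_{t-1}, z_t, and z_{t+1} coincide.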
Blocked Gibbs sampler
(Blocked, explicit)
• Until convergence:
– Sample transition probabilities

– Sample likelihood parameters

– Sample all z’s in each HMM chain at once

Blocked Gibbs sampler
(Blocked, explicit)
• Sample all z’s in each HMM chain at once

– Forwards filtering, backwards sampling algorithm

• Perform a filtering pass, forwards in time (left to right).

  Compute the filtering distributions p(z_t | x_{1:t}) for t = 1, …, T

• Sample the last state z_T from its filtering distribution

• Recursively sample backwards from the last state,

  to obtain a sample from the joint posterior of the z's
Forwards filtering
• For each timestep t
  – Use Bayes' rule to recursively compute the probability of the current state given the observations so far
  – The "prior" is the prediction from the previous step

NB: The parameters are implicitly conditioned on throughout
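Written out, this is the standard forward-filtering recursion (the notation \alpha_t for the filtering distribution is assumed here):

\alpha_1(k) \equiv p(z_1 = k \mid x_1) \propto p(z_1 = k)\, p(x_1 \mid z_1 = k)

\alpha_t(k) \equiv p(z_t = k \mid x_{1:t}) \propto p(x_t \mid z_t = k) \sum_j p(z_t = k \mid z_{t-1} = j)\, \alpha_{t-1}(j)

The sum is the one-step prediction (the "prior"), and the emission term is the likelihood in Bayes' rule.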


Backwards sampling
• In the forwards pass, cache the filtering probabilities p(z_t | x_{1:t})

• We can then sample the last state correctly from its posterior marginal distribution, z_T ~ p(z_T | x_{1:T})
Backwards sampling
• We can re-write the joint posterior from right to left:

• Sample from right to left
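The decomposition in question (by the Markov properties of the chain, z_t is conditionally independent of the future given z_{t+1} and x_{1:t}):

p(z_{1:T} \mid x_{1:T}) = p(z_T \mid x_{1:T}) \prod_{t=T-1}^{1} p(z_t \mid z_{t+1}, x_{1:t})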
Backwards sampling
• Compute the sampling distribution recursively

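Each sampling distribution combines the cached filtering probability with the transition into the already-sampled next state:

p(z_t = j \mid z_{t+1} = k, x_{1:t}) \propto p(z_{t+1} = k \mid z_t = j)\, \alpha_t(j)

A minimal sketch of the whole forward-filtering, backward-sampling pass for a discrete-emission HMM with known parameters (all names here are illustrative assumptions):

    import numpy as np

    def ffbs_sample(x, pi0, A, B, rng=None):
        """Forward-filtering, backward-sampling for one HMM chain.

        x   : observations (length T, symbols in {0, ..., V-1})
        pi0 : initial state distribution (K,)
        A   : transition matrix, A[i, j] = p(z_t = j | z_{t-1} = i)
        B   : emission matrix,  B[k, v] = p(x_t = v | z_t = k)
        Returns one sample z_{1:T} from p(z | x, parameters).
        """
        rng = rng or np.random.default_rng()
        T, K = len(x), len(pi0)

        # Forward pass: filtering distributions alpha[t, k] = p(z_t = k | x_{1:t})
        alpha = np.zeros((T, K))
        alpha[0] = pi0 * B[:, x[0]]
        alpha[0] /= alpha[0].sum()
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, x[t]]
            alpha[t] /= alpha[t].sum()

        # Backward pass: sample z_T from alpha[T-1], then z_t | z_{t+1} recursively
        z = np.zeros(T, dtype=int)
        z[T - 1] = rng.choice(K, p=alpha[T - 1])
        for t in range(T - 2, -1, -1):
            p = alpha[t] * A[:, z[t + 1]]
            z[t] = rng.choice(K, p=p / p.sum())
        return z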
Collapsed blocked Gibbs sampler
(blocked, collapsed)
• Collapsing and blocking can both be beneficial
– However, the forward filtering, backward sampling algorithm for
blocked Gibbs assumes parameters are available.
– How to resolve this dilemma, and get best of both worlds?

• Solution:
– Use a Metropolis-Hastings-within-Gibbs scheme
• Temporarily re-instantiate parameters to good values
• Use these values to generate a proposal state sequence, via the FFBS algorithm
• Choose to accept or reject the proposal via a Metropolis-Hastings decision
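
A sketch of the resulting accept/reject step, with z the current state sequence, z' the FFBS proposal drawn under the temporarily re-instantiated parameters \hat{\theta}, P(\cdot \mid x) the collapsed posterior, and q(\cdot \mid \hat{\theta}, x) the FFBS proposal distribution (this notation is assumed here, not taken from the slide):

accept z' with probability \min\!\left(1,\ \frac{P(z' \mid x)\, q(z \mid \hat{\theta}, x)}{P(z \mid x)\, q(z' \mid \hat{\theta}, x)}\right)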

Johnson, M., Griffiths, T. L., & Goldwater, S. (2007). Bayesian inference for PCFGs via Markov chain Monte Carlo. In HLT-NAACL (pp. 139–146).
How to re-instantiate parameters?
• The standard Rao-Blackwellized estimator is to plug in the posterior predictive probability from the Polya urn model
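Under a symmetric Dirichlet(\alpha) prior these are the same Polya urn probabilities as before, used here as point estimates for the FFBS proposal (notation assumed):

\hat{A}_{ij} = \frac{n_{ij} + \alpha}{n_{i\cdot} + K\alpha}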
Hidden semi-Markov models
• According to the generative process of an HMM, the probability that we stay in state i for a duration of d steps is geometrically distributed:

• Hidden semi-Markov models instead model the probability of this duration explicitly,

  and enforce it with duration counter variables D_t
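The geometric duration referred to above, in terms of the self-transition probability A_{ii} (notation assumed):

p(\text{duration in state } i = d) = A_{ii}^{\,d-1}\,(1 - A_{ii}), \qquad d = 1, 2, \dots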
Input-Output HMMs
• Condition on inputs at each timestep, which can affect state transitions and/or outputs

• This does not greatly complicate inference
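One common way to write the resulting model, with inputs u_t at each timestep (notation assumed):

p(z_{1:T}, x_{1:T} \mid u_{1:T}) = p(z_1 \mid u_1) \prod_{t=2}^{T} p(z_t \mid z_{t-1}, u_t) \prod_{t=1}^{T} p(x_t \mid z_t, u_t)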
Automatically illustrating a guacamole recipe from https://www.youtube.com/watch?v=H7Ne3s202lU
Input-output HMM for
recipe/speech alignment

[Graphical model: latent recipe-step states and a background switch variable at each timestep; observed speech transcription (words), with the textual recipe steps as inputs]
Factorial HMMs
• Multiple hidden chains each encode aspects of the latent state

• Each chain evolves independently, but the observations are generated based on all chains
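With M chains z_t^{(1)}, \dots, z_t^{(M)} (notation assumed), the joint distribution factorizes as:

p(z, x) = \prod_{m=1}^{M}\Big[p(z_1^{(m)}) \prod_{t=2}^{T} p(z_t^{(m)} \mid z_{t-1}^{(m)})\Big] \prod_{t=1}^{T} p(x_t \mid z_t^{(1)}, \dots, z_t^{(M)})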
Example: Dynamic relational infinite
feature model (DRIFT)
• Model social networks over time
• Each actor has a vector of latent features
(e.g. interests), each with Markov dynamics
[Diagram: each actor's latent features (Feature 1, Feature 2, Feature 3, …) evolving over time as independent Markov chains]
J. R. Foulds, A. Asuncion, C. DuBois, C. T. Butts, P. Smyth. A dynamic relational infinite feature model for longitudinal social networks. Proceedings of the 14th International Conference on AI and Statistics (AISTATS), April 2011.
Miller, Griffiths, Jordan (2009)
Latent Feature Relational Model

[Illustration: actors Alice, Bob, and Claire, each with a set of interests drawn from cycling, fishing, running, tango, salsa, and waltz, and the corresponding binary feature matrix Z with one row per actor (Alice, Bob, Claire) and one column per interest]
K. T. Miller, T. L. Griffiths, and M. I. Jordan. Nonparametric latent feature models for link prediction. In Advances in Neural Information Processing Systems (NIPS), 2009.
Dynamic relational infinite feature
model (DRIFT)

J. R. Foulds, A. Asuncion, C. DuBois, C. T. Butts, P. Smyth. A dynamic relational infinite feature model for longitudinal social networks. Proceedings of the 14th International Conference on AI and Statistics (AISTATS), April 2011.
Think-pair-share:
Tennis match video action recognition
• You are a data analyst hired to analyze the playing style of professional tennis players, in order to help them improve their performance.

• Design a system that uses an HMM-based model to automatically annotate videos of tennis matches with the types of tennis strokes used by each player (backhand volley, forehand stroke, smash, service, …).
– What features and likelihood model will you use?
– Can you encode a special-purpose transition matrix for the
game of tennis?
– How will you train the model?
– Will extensions of HMMs be useful?

