Overview
- Markov networks
- Inference in Markov networks
  - Computing probabilities
  - Markov chain Monte Carlo
  - Belief propagation
  - MAP inference
- Weight learning
  - Generative
  - Discriminative (a.k.a. conditional random fields)
- Structure learning
Markov Networks
- Undirected graphical models

[Figure: undirected graph over Smoking, Cancer, Asthma, Cough]

- Potential functions defined over cliques:

  P(x) = \frac{1}{Z} \prod_c \phi_c(x_c)

  Z = \sum_x \prod_c \phi_c(x_c)
  Smoking   Cancer   φ(S,C)
  False     False    4.5
  False     True     4.5
  True      False    2.7
  True      True     4.5
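To make the normalization concrete, here is a minimal Python sketch, assuming a toy model whose only clique is {Smoking, Cancer} with the potential from the table (the full network in the figure has more cliques; this is an illustration, not the lecture's code):

```python
from itertools import product

# Clique potential phi(S, C), copied from the table above.
phi_sc = {
    (False, False): 4.5,
    (False, True):  4.5,
    (True,  False): 2.7,
    (True,  True):  4.5,
}

# Partition function: Z = sum over all assignments of the product of
# clique potentials (a single clique in this toy model).
Z = sum(phi_sc[(s, c)] for s, c in product([False, True], repeat=2))

def P(s, c):
    # P(x) = (1/Z) * prod_c phi_c(x_c)
    return phi_sc[(s, c)] / Z

print(Z)               # 16.2
print(P(True, False))  # 2.7 / 16.2 ≈ 0.167
```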
Markov Networks
- Log-linear model:

  P(x) = \frac{1}{Z} \exp\left( \sum_i w_i f_i(x) \right)

  where w_i is the weight of feature i and f_i(x) is feature i.

- Example:

  f_1(\text{Smoking}, \text{Cancer}) = 1 if ¬Smoking ∨ Cancer, 0 otherwise

  w_1 = 1.5
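The same toy model in log-linear form, as a minimal sketch (a single feature, so the exponent is just w_1 · f_1; not the lecture's code):

```python
import math
from itertools import product

w1 = 1.5

def f1(smoking, cancer):
    # Feature from the slide: 1 if ¬Smoking ∨ Cancer, 0 otherwise.
    return 1.0 if (not smoking) or cancer else 0.0

def unnorm(s, c):
    # exp(sum_i w_i f_i(x)); a single feature here, so exp(w1 * f1).
    return math.exp(w1 * f1(s, c))

Z = sum(unnorm(s, c) for s, c in product([False, True], repeat=2))

for s, c in product([False, True], repeat=2):
    print(s, c, round(unnorm(s, c) / Z, 3))
# Only (Smoking=True, Cancer=False) violates f1, so it gets the
# lowest probability; the other three states are equally likely.
```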
Hammersley-Clifford Theorem
- If the distribution is strictly positive (P(x) > 0)
- And the graph encodes its conditional independences
- Then the distribution is a product of potentials over cliques of the graph
- The converse also holds.
- (Markov network = Gibbs distribution)
  Property         Markov Nets         Bayes Nets
  Form             Prod. potentials    Prod. potentials
  Potentials       Arbitrary           Cond. probabilities
  Cycles           Allowed             Forbidden
  Partition func.  Z = ?               Z = 1
  Indep. check     Graph separation    D-separation
  Indep. props.    Some                Some
  Inference        MCMC, BP, etc.      Convert to Markov
Computing Probabilities
- Goal: compute marginals and conditional probabilities under

  P(X) = \frac{1}{Z} \exp\left( \sum_i w_i f_i(X) \right), \qquad Z = \sum_X \exp\left( \sum_i w_i f_i(X) \right)

- Exact inference is #P-complete
- Approximate inference: Markov chain Monte Carlo

Markov Chain Monte Carlo
- General algorithm: Metropolis-Hastings
- Simplest (and most popular) algorithm: Gibbs sampling
- Sample each variable given its Markov blanket:

  P(x \mid MB(x)) = \frac{\exp\left( \sum_i w_i f_i(x) \right)}{\exp\left( \sum_i w_i f_i(x{=}0) \right) + \exp\left( \sum_i w_i f_i(x{=}1) \right)}
Gibbs Sampling

  state ← random truth assignment
  for i ← 1 to num-samples do
      for each variable x
          sample x according to P(x | neighbors(x))
          state ← state with new value of x
  P(F) ← fraction of states in which F is true
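As a concrete illustration, a minimal runnable sketch of this loop in Python, assuming the toy single-feature model from above (the variable names, the weight 1.5, and the query are illustrative, not from the lecture):

```python
import math, random

random.seed(0)

# Toy single-feature log-linear model from earlier slides.
VARS = ["Smoking", "Cancer"]
FEATURES = [(1.5, lambda s: (not s["Smoking"]) or s["Cancer"])]

def score(state):
    # sum_i w_i f_i(x)
    return sum(w * f(state) for w, f in FEATURES)

def gibbs(num_samples, query):
    state = {v: random.random() < 0.5 for v in VARS}  # random truth assignment
    hits = 0
    for _ in range(num_samples):
        for v in VARS:
            # P(v | rest) follows from the two scores with v set each way.
            s1 = math.exp(score({**state, v: True}))
            s0 = math.exp(score({**state, v: False}))
            state[v] = random.random() < s1 / (s0 + s1)
        hits += query(state)
    return hits / num_samples  # fraction of samples where the query holds

# Estimate P(Cancer); the exact value for this toy model is about 0.62.
print(gibbs(20000, lambda s: s["Cancer"]))
```

Resampling a variable given the rest of the state reduces to the two-way softmax over the scores with that variable flipped, which is exactly the Markov-blanket conditional on the previous slide.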
Belief Propagation
- Operates on a bipartite graph of nodes (variables x) and features (f)
- Form of messages:

  Node to feature:
  \mu_{x \to f}(x) = \prod_{h \in n(x) \setminus \{f\}} \mu_{h \to x}(x)

  Feature to node:
  \mu_{f \to x}(x) = \sum_{\sim \{x\}} \left( e^{w_f f(\mathbf{x})} \prod_{y \in n(f) \setminus \{x\}} \mu_{y \to f}(y) \right)
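A minimal sum-product sketch instantiating these two message updates, assuming a small tree-structured toy model (the second feature and both weights are made up for the example; on a tree the resulting beliefs are exact):

```python
import math
from itertools import product

# Two-feature tree: S - f1 - C - f2 - K; weights/features illustrative.
FACTORS = {
    "f1": (("S", "C"), 1.5, lambda s, c: (not s) or c),  # ¬S ∨ C
    "f2": (("C", "K"), 1.0, lambda c, k: (not c) or k),  # ¬C ∨ K
}

def factor_value(name, assign):
    args, w, f = FACTORS[name]
    return math.exp(w * f(*(assign[v] for v in args)))

def var_neighbors(v):
    return [n for n, (args, _, _) in FACTORS.items() if v in args]

# Messages over {0, 1}, initialized to uniform.
msg_vf = {(v, n): {0: 1.0, 1: 1.0} for n, (args, _, _) in FACTORS.items() for v in args}
msg_fv = {(n, v): {0: 1.0, 1: 1.0} for n, (args, _, _) in FACTORS.items() for v in args}

for _ in range(5):  # a few sweeps; messages settle quickly on a tree
    # Node -> feature: product of messages from the *other* features.
    for (v, n) in msg_vf:
        for x in (0, 1):
            msg_vf[(v, n)][x] = math.prod(
                msg_fv[(h, v)][x] for h in var_neighbors(v) if h != n)
    # Feature -> node: sum out the feature's other variables.
    for (n, v) in msg_fv:
        others = [u for u in FACTORS[n][0] if u != v]
        for x in (0, 1):
            msg_fv[(n, v)][x] = sum(
                factor_value(n, {**dict(zip(others, vals)), v: x})
                * math.prod(msg_vf[(u, n)][val] for u, val in zip(others, vals))
                for vals in product((0, 1), repeat=len(others)))

def belief(v):
    # Belief = normalized product of all incoming feature messages.
    b = {x: math.prod(msg_fv[(n, v)][x] for n in var_neighbors(v)) for x in (0, 1)}
    z = b[0] + b[1]
    return {x: b[x] / z for x in (0, 1)}

print(belief("C"))  # matches brute-force enumeration on this tree
```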
MAP/MPE Inference
- Goal: find the most likely state of the world given the evidence:

  \arg\max_y P(y \mid x)

  where y is the query and x is the evidence
- Algorithms (a sketch of the first one follows the list):
  - Iterated conditional modes
  - Simulated annealing
  - Belief propagation (max-product)
  - Graph cuts
  - Linear programming relaxations
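As an example of the simplest option, a sketch of iterated conditional modes on the earlier toy model (illustrative only; like ICM in general, it can stop at a local maximum):

```python
import random

random.seed(1)

# Iterated conditional modes: repeatedly set each variable to its
# locally best value given the rest of the state.
VARS = ["Smoking", "Cancer"]
FEATURES = [(1.5, lambda s: (not s["Smoking"]) or s["Cancer"])]

def score(state):
    # argmax of the score is argmax of the probability (Z is constant).
    return sum(w * f(state) for w, f in FEATURES)

def icm(max_sweeps=20):
    state = {v: random.random() < 0.5 for v in VARS}
    for _ in range(max_sweeps):
        changed = False
        for v in VARS:
            best = max((False, True), key=lambda val: score({**state, v: val}))
            if state[v] != best:
                state[v], changed = best, True
        if not changed:      # no variable wants to flip: local MAP state
            return state
    return state

print(icm())
```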
Learning Markov Networks
- Learning parameters (weights)
  - Generatively
  - Discriminatively
- Learning structure (features)
- In this lecture: assume complete data
  (if not: EM versions of the algorithms)
Generative Weight Learning
- Maximize likelihood or posterior probability
- Gradient of the log-likelihood:

  \frac{\partial}{\partial w_i} \log P_w(x) = n_i(x) - E_w[n_i(x)]

  (no. of times feature i is true in the data, minus its expected count under the model)

- Requires inference at each step (slow!)
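A minimal sketch of this gradient ascent, with the expectation computed by exact enumeration (the "inference at each step", feasible only for tiny models); the data set and feature are made up for the example:

```python
import math
from itertools import product

VARS = ["Smoking", "Cancer"]
FEATURES = [lambda s: float((not s["Smoking"]) or s["Cancer"])]
DATA = [{"Smoking": False, "Cancer": False},
        {"Smoking": False, "Cancer": True},
        {"Smoking": True,  "Cancer": True},
        {"Smoking": True,  "Cancer": True},
        {"Smoking": True,  "Cancer": False}]  # one example violates f1

STATES = [dict(zip(VARS, vals)) for vals in product([False, True], repeat=len(VARS))]

def expected_counts(w):
    # E_w[n_i] under the current model (the slow inference step).
    scores = [math.exp(sum(wi * f(s) for wi, f in zip(w, FEATURES))) for s in STATES]
    Z = sum(scores)
    return [sum(sc / Z * f(s) for sc, s in zip(scores, STATES)) for f in FEATURES]

n_data = [sum(f(x) for x in DATA) for f in FEATURES]  # n_i(x) over the data

w = [0.0] * len(FEATURES)
for _ in range(500):
    E = expected_counts(w)
    # dL/dw_i = n_i(data) - N * E_w[n_i], summed over the N examples.
    w = [wi + 0.1 * (ni - len(DATA) * ei) for wi, ni, ei in zip(w, n_data, E)]

print(w)  # converges to ln(4/3) ≈ 0.29: empirical and expected counts match
```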
Pseudo-Likelihood

  PL(x) = \prod_i P(x_i \mid \text{neighbors}(x_i))
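A minimal sketch of this objective on the toy model; each term is a per-variable conditional, so the partition function Z is never computed, which is the point of pseudo-likelihood:

```python
import math

VARS = ["Smoking", "Cancer"]
FEATURES = [(1.5, lambda s: (not s["Smoking"]) or s["Cancer"])]

def score(state):
    return sum(w * f(state) for w, f in FEATURES)

def log_pl(state):
    lpl = 0.0
    for v in VARS:
        # P(x_v | neighbors) from the two scores with v set each way.
        s1 = math.exp(score({**state, v: True}))
        s0 = math.exp(score({**state, v: False}))
        lpl += math.log((s1 if state[v] else s0) / (s0 + s1))
    return lpl

print(log_pl({"Smoking": True, "Cancer": True}))
```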
Discriminative Weight Learning
- Maximize conditional likelihood of query (y) given evidence (x):

  \frac{\partial}{\partial w_i} \log P_w(y \mid x) = n_i(x, y) - E_w[n_i(x, y)]

  where n_i(x, y) is the no. of true groundings of clause i in the data, and E_w[n_i(x, y)] is its expectation under the model
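A minimal sketch of the conditional gradient: the evidence is clamped, so the expectation runs only over query assignments (the evidence/query split and all names are illustrative, not from the lecture):

```python
import math
from itertools import product

EVIDENCE = {"Smoking": True}
QUERY = ["Cancer"]
FEATURES = [lambda s: float((not s["Smoking"]) or s["Cancer"])]

def grad(w, y_observed):
    # Enumerate query assignments with the evidence fixed (tractable here).
    states = [dict(EVIDENCE, **dict(zip(QUERY, vals)))
              for vals in product([False, True], repeat=len(QUERY))]
    scores = [math.exp(sum(wi * f(s) for wi, f in zip(w, FEATURES))) for s in states]
    Z = sum(scores)
    observed = dict(EVIDENCE, **y_observed)
    # n_i(x, y) - E_w[n_i(x, y)]
    return [f(observed) - sum(sc / Z * f(s) for sc, s in zip(scores, states))
            for f in FEATURES]

print(grad([0.0], {"Cancer": True}))  # [0.5]: the weight should grow
```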
Other Weight Learning Approaches
- Voted perceptron
- Iterative scaling
- Discriminative: max margin
Structure Learning
- Start with atomic features