Overview
- Markov networks
- Inference in Markov networks
  - Computing probabilities
  - Markov chain Monte Carlo
  - Belief propagation
  - MAP inference
- Weight learning
  - Generative
  - Discriminative (a.k.a. conditional random fields)
- Structure learning
Markov Networks
- Undirected graphical models

[Figure: undirected graph over Smoking, Cancer, Asthma, Cough]

- Potential functions defined over cliques:

  P(x) = \frac{1}{Z} \prod_c \phi_c(x_c)

  Z = \sum_x \prod_c \phi_c(x_c)
  Smoking   Cancer   φ(S,C)
  False     False    4.5
  False     True     4.5
  True      False    2.7
  True      True     4.5
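To make the normalization concrete, here is a minimal Python sketch, assuming a toy model whose only clique is {Smoking, Cancer} with the potential from the table (the full network in the figure has more cliques; this is an illustration, not the lecture's code):

```python
from itertools import product

# Clique potential phi(S, C), copied from the table above.
phi_sc = {
    (False, False): 4.5,
    (False, True):  4.5,
    (True,  False): 2.7,
    (True,  True):  4.5,
}

# Partition function: Z = sum over all assignments of the product of
# clique potentials (a single clique in this toy model).
Z = sum(phi_sc[(s, c)] for s, c in product([False, True], repeat=2))

def P(s, c):
    # P(x) = (1/Z) * prod_c phi_c(x_c)
    return phi_sc[(s, c)] / Z

print(Z)               # 16.2
print(P(True, False))  # 2.7 / 16.2 ≈ 0.167
```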
Markov Networks
- Log-linear model:

  P(x) = \frac{1}{Z} \exp\left( \sum_i w_i f_i(x) \right)

  where w_i is the weight of feature i and f_i(x) is feature i.

- Example:

  f_1(\text{Smoking}, \text{Cancer}) = 1 if ¬Smoking ∨ Cancer, 0 otherwise

  w_1 = 1.5
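The same toy model in log-linear form, as a minimal sketch (a single feature, so the exponent is just w_1 · f_1; not the lecture's code):

```python
import math
from itertools import product

w1 = 1.5

def f1(smoking, cancer):
    # Feature from the slide: 1 if ¬Smoking ∨ Cancer, 0 otherwise.
    return 1.0 if (not smoking) or cancer else 0.0

def unnorm(s, c):
    # exp(sum_i w_i f_i(x)); a single feature here, so exp(w1 * f1).
    return math.exp(w1 * f1(s, c))

Z = sum(unnorm(s, c) for s, c in product([False, True], repeat=2))

for s, c in product([False, True], repeat=2):
    print(s, c, round(unnorm(s, c) / Z, 3))
# Only (Smoking=True, Cancer=False) violates f1, so it gets the
# lowest probability; the other three states are equally likely.
```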
Hammersley-Clifford Theorem
- If the distribution is strictly positive (P(x) > 0)
- And the graph encodes its conditional independences
- Then the distribution is a product of potentials over cliques of the graph
- The converse also holds.
- (Markov network = Gibbs distribution)
  Property         Markov Nets         Bayes Nets
  Form             Prod. potentials    Prod. potentials
  Potentials       Arbitrary           Cond. probabilities
  Cycles           Allowed             Forbidden
  Partition func.  Z = ?               Z = 1
  Indep. check     Graph separation    D-separation
  Indep. props.    Some                Some
  Inference        MCMC, BP, etc.      Convert to Markov
Computing Probabilities
- Goal: compute marginals and conditional probabilities under

  P(X) = \frac{1}{Z} \exp\left( \sum_i w_i f_i(X) \right), \qquad Z = \sum_X \exp\left( \sum_i w_i f_i(X) \right)

- Exact inference is #P-complete
- Approximate inference: Markov chain Monte Carlo

Markov Chain Monte Carlo
- General algorithm: Metropolis-Hastings
- Simplest (and most popular) algorithm: Gibbs sampling
- Sample each variable given its Markov blanket:

  P(x \mid MB(x)) = \frac{\exp\left( \sum_i w_i f_i(x) \right)}{\exp\left( \sum_i w_i f_i(x{=}0) \right) + \exp\left( \sum_i w_i f_i(x{=}1) \right)}
Gibbs Sampling

  state ← random truth assignment
  for i ← 1 to num-samples do
      for each variable x
          sample x according to P(x | neighbors(x))
          state ← state with new value of x
  P(F) ← fraction of states in which F is true
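As a concrete illustration, a minimal runnable sketch of this loop in Python, assuming the toy single-feature model from above (the variable names, the weight 1.5, and the query are illustrative, not from the lecture):

```python
import math, random

random.seed(0)

# Toy single-feature log-linear model from earlier slides.
VARS = ["Smoking", "Cancer"]
FEATURES = [(1.5, lambda s: (not s["Smoking"]) or s["Cancer"])]

def score(state):
    # sum_i w_i f_i(x)
    return sum(w * f(state) for w, f in FEATURES)

def gibbs(num_samples, query):
    state = {v: random.random() < 0.5 for v in VARS}  # random truth assignment
    hits = 0
    for _ in range(num_samples):
        for v in VARS:
            # P(v | rest) follows from the two scores with v set each way.
            s1 = math.exp(score({**state, v: True}))
            s0 = math.exp(score({**state, v: False}))
            state[v] = random.random() < s1 / (s0 + s1)
        hits += query(state)
    return hits / num_samples  # fraction of samples where the query holds

# Estimate P(Cancer); the exact value for this toy model is about 0.62.
print(gibbs(20000, lambda s: s["Cancer"]))
```

Resampling a variable given the rest of the state reduces to the two-way softmax over the scores with that variable flipped, which is exactly the Markov-blanket conditional on the previous slide.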
Belief Propagation
- Operates on a bipartite graph of nodes (variables x) and features (f)
- Form of messages:

  Node to feature:
  \mu_{x \to f}(x) = \prod_{h \in n(x) \setminus \{f\}} \mu_{h \to x}(x)

  Feature to node:
  \mu_{f \to x}(x) = \sum_{\sim \{x\}} \left( e^{w_f f(\mathbf{x})} \prod_{y \in n(f) \setminus \{x\}} \mu_{y \to f}(y) \right)
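A minimal sum-product sketch instantiating these two message updates, assuming a small tree-structured toy model (the second feature and both weights are made up for the example; on a tree the resulting beliefs are exact):

```python
import math
from itertools import product

# Two-feature tree: S - f1 - C - f2 - K; weights/features illustrative.
FACTORS = {
    "f1": (("S", "C"), 1.5, lambda s, c: (not s) or c),  # ¬S ∨ C
    "f2": (("C", "K"), 1.0, lambda c, k: (not c) or k),  # ¬C ∨ K
}

def factor_value(name, assign):
    args, w, f = FACTORS[name]
    return math.exp(w * f(*(assign[v] for v in args)))

def var_neighbors(v):
    return [n for n, (args, _, _) in FACTORS.items() if v in args]

# Messages over {0, 1}, initialized to uniform.
msg_vf = {(v, n): {0: 1.0, 1: 1.0} for n, (args, _, _) in FACTORS.items() for v in args}
msg_fv = {(n, v): {0: 1.0, 1: 1.0} for n, (args, _, _) in FACTORS.items() for v in args}

for _ in range(5):  # a few sweeps; messages settle quickly on a tree
    # Node -> feature: product of messages from the *other* features.
    for (v, n) in msg_vf:
        for x in (0, 1):
            msg_vf[(v, n)][x] = math.prod(
                msg_fv[(h, v)][x] for h in var_neighbors(v) if h != n)
    # Feature -> node: sum out the feature's other variables.
    for (n, v) in msg_fv:
        others = [u for u in FACTORS[n][0] if u != v]
        for x in (0, 1):
            msg_fv[(n, v)][x] = sum(
                factor_value(n, {**dict(zip(others, vals)), v: x})
                * math.prod(msg_vf[(u, n)][val] for u, val in zip(others, vals))
                for vals in product((0, 1), repeat=len(others)))

def belief(v):
    # Belief = normalized product of all incoming feature messages.
    b = {x: math.prod(msg_fv[(n, v)][x] for n in var_neighbors(v)) for x in (0, 1)}
    z = b[0] + b[1]
    return {x: b[x] / z for x in (0, 1)}

print(belief("C"))  # matches brute-force enumeration on this tree
```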
MAP/MPE Inference
- Goal: find the most likely state of the world given the evidence:

  \arg\max_y P(y \mid x)

  where y is the query and x is the evidence
- Algorithms (a sketch of the first one follows the list):
  - Iterated conditional modes
  - Simulated annealing
  - Belief propagation (max-product)
  - Graph cuts
  - Linear programming relaxations
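As an example of the simplest option, a sketch of iterated conditional modes on the earlier toy model (illustrative only; like ICM in general, it can stop at a local maximum):

```python
import random

random.seed(1)

# Iterated conditional modes: repeatedly set each variable to its
# locally best value given the rest of the state.
VARS = ["Smoking", "Cancer"]
FEATURES = [(1.5, lambda s: (not s["Smoking"]) or s["Cancer"])]

def score(state):
    # argmax of the score is argmax of the probability (Z is constant).
    return sum(w * f(state) for w, f in FEATURES)

def icm(max_sweeps=20):
    state = {v: random.random() < 0.5 for v in VARS}
    for _ in range(max_sweeps):
        changed = False
        for v in VARS:
            best = max((False, True), key=lambda val: score({**state, v: val}))
            if state[v] != best:
                state[v], changed = best, True
        if not changed:      # no variable wants to flip: local MAP state
            return state
    return state

print(icm())
```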
Learning Markov Networks
- Learning parameters (weights)
  - Generatively
  - Discriminatively
- Learning structure (features)
- In this lecture: assume complete data
  (if not: EM versions of the algorithms)
Generative Weight Learning
- Maximize likelihood or posterior probability
- Gradient of the log-likelihood:

  \frac{\partial}{\partial w_i} \log P_w(x) = n_i(x) - E_w[n_i(x)]

  (no. of times feature i is true in the data, minus its expected count under the model)

- Requires inference at each step (slow!)
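A minimal sketch of this gradient ascent, with the expectation computed by exact enumeration (the "inference at each step", feasible only for tiny models); the data set and feature are made up for the example:

```python
import math
from itertools import product

VARS = ["Smoking", "Cancer"]
FEATURES = [lambda s: float((not s["Smoking"]) or s["Cancer"])]
DATA = [{"Smoking": False, "Cancer": False},
        {"Smoking": False, "Cancer": True},
        {"Smoking": True,  "Cancer": True},
        {"Smoking": True,  "Cancer": True},
        {"Smoking": True,  "Cancer": False}]  # one example violates f1

STATES = [dict(zip(VARS, vals)) for vals in product([False, True], repeat=len(VARS))]

def expected_counts(w):
    # E_w[n_i] under the current model (the slow inference step).
    scores = [math.exp(sum(wi * f(s) for wi, f in zip(w, FEATURES))) for s in STATES]
    Z = sum(scores)
    return [sum(sc / Z * f(s) for sc, s in zip(scores, STATES)) for f in FEATURES]

n_data = [sum(f(x) for x in DATA) for f in FEATURES]  # n_i(x) over the data

w = [0.0] * len(FEATURES)
for _ in range(500):
    E = expected_counts(w)
    # dL/dw_i = n_i(data) - N * E_w[n_i], summed over the N examples.
    w = [wi + 0.1 * (ni - len(DATA) * ei) for wi, ni, ei in zip(w, n_data, E)]

print(w)  # converges to ln(4/3) ≈ 0.29: empirical and expected counts match
```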
Pseudo-Likelihood

  PL(x) = \prod_i P(x_i \mid \text{neighbors}(x_i))
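A minimal sketch of this objective on the toy model; each term is a per-variable conditional, so the partition function Z is never computed, which is the point of pseudo-likelihood:

```python
import math

VARS = ["Smoking", "Cancer"]
FEATURES = [(1.5, lambda s: (not s["Smoking"]) or s["Cancer"])]

def score(state):
    return sum(w * f(state) for w, f in FEATURES)

def log_pl(state):
    lpl = 0.0
    for v in VARS:
        # P(x_v | neighbors) from the two scores with v set each way.
        s1 = math.exp(score({**state, v: True}))
        s0 = math.exp(score({**state, v: False}))
        lpl += math.log((s1 if state[v] else s0) / (s0 + s1))
    return lpl

print(log_pl({"Smoking": True, "Cancer": True}))
```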
Discriminative Weight Learning
- Maximize conditional likelihood of query (y) given evidence (x):

  \frac{\partial}{\partial w_i} \log P_w(y \mid x) = n_i(x, y) - E_w[n_i(x, y)]

  where n_i(x, y) is the no. of true groundings of clause i in the data, and E_w[n_i(x, y)] is its expectation under the model
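A minimal sketch of the conditional gradient: the evidence is clamped, so the expectation runs only over query assignments (the evidence/query split and all names are illustrative, not from the lecture):

```python
import math
from itertools import product

EVIDENCE = {"Smoking": True}
QUERY = ["Cancer"]
FEATURES = [lambda s: float((not s["Smoking"]) or s["Cancer"])]

def grad(w, y_observed):
    # Enumerate query assignments with the evidence fixed (tractable here).
    states = [dict(EVIDENCE, **dict(zip(QUERY, vals)))
              for vals in product([False, True], repeat=len(QUERY))]
    scores = [math.exp(sum(wi * f(s) for wi, f in zip(w, FEATURES))) for s in states]
    Z = sum(scores)
    observed = dict(EVIDENCE, **y_observed)
    # n_i(x, y) - E_w[n_i(x, y)]
    return [f(observed) - sum(sc / Z * f(s) for sc, s in zip(scores, states))
            for f in FEATURES]

print(grad([0.0], {"Cancer": True}))  # [0.5]: the weight should grow
```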
Other Weight Learning Approaches
- Voted perceptron
- Iterative scaling
- Discriminative: max margin
Structure Learning
- Start with atomic features