Conjugate Priors
Generative Models for Discrete Data
Participation grades: Poll Everywhere
• I will be recording participation in polls from
now on, for your 5% participation grade
Bayesian statistics: Recap
• Write down your prior beliefs, write down your
likelihood, and apply Bayes' rule.
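The recipe above can be written compactly. For a parameter θ and data D:

```latex
p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})} \;\propto\; p(\mathcal{D} \mid \theta)\, p(\theta)
```

Posterior is proportional to likelihood times prior; the evidence p(D) is only a normalizing constant.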
Conjugate priors
• How should we select our prior distribution?
– Conjugate priors are a mathematically convenient
choice
• Conjugate prior:
– Posterior is in the same family of distributions as
the prior
Why conjugate priors are important
• Tractability for simple models
– Closed-form solutions for the posterior from Bayes' rule
Bernoulli model
• Flip a biased coin: what is the probability of
heads?
Bernoulli model
• Note how the likelihood worked out nicely for
iid draws, since we could add up exponents
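The calculation referred to here is presumably the standard one. For a single flip with bias θ,

```latex
p(x \mid \theta) = \theta^{x} (1-\theta)^{1-x}, \qquad x \in \{0, 1\},
```

and for N iid flips with N1 heads and N0 = N − N1 tails, the exponents add:

```latex
p(\mathcal{D} \mid \theta) = \prod_{i=1}^{N} \theta^{x_i} (1-\theta)^{1-x_i} = \theta^{N_1} (1-\theta)^{N_0}
```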
Beta distribution
• The distribution over [0,1] which is of the right
form so that we can add the exponents again
• Normalization constant
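The density and its normalization constant, which the slide presumably displayed, are:

```latex
\mathrm{Beta}(\theta \mid \alpha, \beta) = \frac{1}{B(\alpha, \beta)}\, \theta^{\alpha-1} (1-\theta)^{\beta-1}, \qquad B(\alpha, \beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}
```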
Gamma function generalizes factorial
• Γ(n) = (n − 1)! for positive integers n
• More generally, Γ(x + 1) = x Γ(x)
Putting it all together
Beta-bernoulli conjugacy
• Prior: θ ~ Beta(α, β)
• Likelihood: xi | θ ~ Bernoulli(θ), i = 1, …, N
• Posterior: θ | D ~ Beta(α + N1, β + N0), where N1 is the number of heads and N0 the number of tails
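A minimal sketch of the conjugate update in Python (function name is illustrative, not from the slides): the Beta(α, β) prior plus a sequence of 0/1 flips yields a Beta(α + N1, β + N0) posterior.

```python
# Beta-Bernoulli conjugate update: Beta(a, b) prior + Bernoulli likelihood
# gives a Beta(a + N1, b + N0) posterior, where N1/N0 count heads/tails.

def beta_bernoulli_posterior(a, b, flips):
    """Return posterior (a, b) after observing 0/1 flips under a Beta(a, b) prior."""
    n1 = sum(flips)           # number of heads observed
    n0 = len(flips) - n1      # number of tails observed
    return a + n1, b + n0

# Example: Beta(2, 2) prior, observe 7 heads and 3 tails.
flips = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0]
a_post, b_post = beta_bernoulli_posterior(2, 2, flips)
print(a_post, b_post)  # → 9 5
```

No integration is needed: the update only adds counts to the hyper-parameters, which is exactly what conjugacy buys us.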
Shortcut – ignore normalizers
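The shortcut: drop every factor that does not depend on θ and match what remains to a known density. For the Beta-Bernoulli case:

```latex
p(\theta \mid \mathcal{D}) \;\propto\; \theta^{N_1}(1-\theta)^{N_0} \cdot \theta^{\alpha-1}(1-\theta)^{\beta-1} \;=\; \theta^{N_1+\alpha-1}\,(1-\theta)^{N_0+\beta-1}
```

This is the kernel of a Beta(α + N1, β + N0) density, so the normalizer is known without computing any integral.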
Interpretation of prior hyper-parameters
[Figure: example Beta densities — Beta(1,1), Beta(1,10), Beta(2,2), Beta(2,3), Beta(50,5)]
Symmetric beta distributions with hyper-parameters less than one are U-shaped
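The U-shape claim can be checked numerically. A small sketch (helper name is illustrative) that evaluates the Beta density from its Gamma-function normalizer and confirms Beta(0.5, 0.5) puts more density near the endpoints than in the middle:

```python
import math

def beta_pdf(x, a, b):
    """Beta density, using the normalizer B(a, b) = Γ(a)Γ(b) / Γ(a + b)."""
    norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / norm

# Symmetric Beta(0.5, 0.5): higher density near the endpoints than at 0.5.
print(beta_pdf(0.1, 0.5, 0.5) > beta_pdf(0.5, 0.5, 0.5))  # → True
```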
Beta-bernoulli posterior predictive
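For the Beta-Bernoulli model the posterior predictive has a simple closed form: P(next flip = heads | D) = (α + N1) / (α + β + N). A one-function sketch (names are illustrative):

```python
def predictive_heads(a, b, n1, n0):
    """Posterior predictive P(next flip = heads) under a Beta(a, b) prior,
    after observing n1 heads and n0 tails."""
    return (a + n1) / (a + b + n1 + n0)

# Beta(1, 1) (uniform) prior, 7 heads and 3 tails: Laplace's rule of succession.
print(predictive_heads(1, 1, 7, 3))  # → 0.666... (= 8/12)
```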
Dirichlet-multinomial model
• Multinomial distribution:
Roll a die N times, how many of each face?
• The data are the counts, not the sequence of draws,
so we need to multiply by a combinatorial term
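For counts N1, …, NK over K faces with probabilities θ1, …, θK, the multinomial pmf with its combinatorial term is:

```latex
p(N_1, \dots, N_K \mid \theta, N) = \frac{N!}{N_1! \cdots N_K!} \prod_{k=1}^{K} \theta_k^{N_k}, \qquad \sum_{k} N_k = N
```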
Dirichlet distribution
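The Dirichlet density on the probability simplex, which these slides presumably displayed, is the multivariate generalization of the Beta:

```latex
\mathrm{Dir}(\theta \mid \alpha) = \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}, \qquad \theta_k \ge 0, \quad \sum_{k} \theta_k = 1
```

For K = 2 it reduces to Beta(α1, α2).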
Multinomial distribution as an urn process
• Place colored balls in an urn, where the number of colored
balls for each color k is proportional to θk
– For each of N observations
• Draw a ball from the urn, observe its color k
• Add one to the count of that color, Nk
• Place the ball back in the urn
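A direct simulation of the urn process above (function name and seed are illustrative):

```python
import random

def multinomial_urn(theta, n, seed=0):
    """Draw n balls with replacement from an urn with color proportions theta;
    return the count N_k for each color k."""
    rng = random.Random(seed)
    counts = [0] * len(theta)
    for _ in range(n):
        k = rng.choices(range(len(theta)), weights=theta)[0]  # draw a ball, observe color k
        counts[k] += 1                                        # add one to N_k
        # the ball goes back, so the proportions never change
    return counts

counts = multinomial_urn([0.5, 0.3, 0.2], 1000)
print(sum(counts))  # → 1000
```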
Posterior predictive of Dirichlet-Multinomial model: Polya Urn
• Start with αk balls of each color k, where α is
the Dirichlet prior vector.
– For each of N observations
• Draw a ball from the urn, observe its color k
• Add one to the count of that color, Nk
• Place the ball back, along with a new ball of the same color
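The Polya urn scheme above can be simulated directly; the only change from the plain urn is that each observed ball is returned together with one extra ball of the same color (names are illustrative):

```python
import random

def polya_urn(alpha, n, seed=0):
    """Sample n colors from the Polya urn with initial (pseudo-)counts alpha."""
    rng = random.Random(seed)
    counts = list(alpha)          # urn starts with alpha_k balls of color k
    draws = []
    for _ in range(n):
        k = rng.choices(range(len(counts)), weights=counts)[0]
        counts[k] += 1            # put the ball back plus one more of the same color
        draws.append(k)
    return draws, counts

draws, counts = polya_urn([1, 1, 1], 100)
print(sum(counts))  # → 103  (3 initial balls + 100 added)
```

Note the rich-get-richer effect: colors drawn often become more likely to be drawn again, exactly as the posterior predictive of the Dirichlet-multinomial model prescribes.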
Dirichlet-multinomial text model
The quick brown fox jumps over the sly lazy dog
[5 6 37 1 4 30 5 22 570 12]
• Goals:
– A simple generative model for text documents
– Prediction of new words in a document
• Text compression
• Predictive text
• Classification, clustering
• Authorship identification
Dirichlet-multinomial text model
The quick brown fox jumps over the sly lazy dog
[5 6 37 1 4 30 5 22 570 12]
• Bag of words: represent document by its
count vector [N1, N2, …, NV]
• Dirichlet prior
Dirichlet-multinomial text model
• Model: θ ~ Dir(α); each word wi | θ ~ Cat(θ)
• Dirichlet posterior: θ | D ~ Dir(α + N), where N = [N1, …, NV] is the count vector
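A sketch of the posterior update for this text model (function name, example vocabulary, and documents are illustrative): collect bag-of-words counts and add them to the Dirichlet hyper-parameters.

```python
from collections import Counter

def dirichlet_posterior(alpha, docs, vocab):
    """Posterior Dirichlet parameters: alpha_k plus the total count of word k."""
    counts = Counter(w for doc in docs for w in doc)
    return [alpha[i] + counts[w] for i, w in enumerate(vocab)]

vocab = ["the", "quick", "brown", "fox", "dog"]
docs = [["the", "quick", "brown", "fox"], ["the", "dog"]]
print(dirichlet_posterior([1] * 5, docs, vocab))  # → [3, 2, 2, 2, 2]
```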
Multinomial Naïve Bayes text classifier
• Goal: classification of documents
– E.g. what is the subject/topic of this document?
– Who wrote this document?
– News article vs scientific article vs tweet?
Multinomial Naïve Bayes text classifier
• Model: class label c ~ Cat(π); word counts x | c ~ Mult(θc), with Dirichlet priors on π and on each per-class θc
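A compact sketch of a multinomial Naive Bayes classifier with Dirichlet (add-α) smoothing. All names, documents, and labels below are made-up illustrations, not from the slides:

```python
import math
from collections import Counter

def train_nb(docs, labels, vocab, alpha=1.0):
    """Per-class log-priors and smoothed per-word log-likelihoods."""
    classes = sorted(set(labels))
    logprior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    loglik = {}
    for c in classes:
        counts = Counter(w for d, l in zip(docs, labels) if l == c for w in d)
        total = sum(counts[w] for w in vocab) + alpha * len(vocab)
        loglik[c] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return logprior, loglik

def classify(doc, logprior, loglik):
    """argmax_c [ log p(c) + sum_w log p(w | c) ]"""
    return max(logprior, key=lambda c: logprior[c] + sum(loglik[c][w] for w in doc))

docs = [["ball", "goal"], ["goal", "match"], ["vote", "law"], ["law", "court"]]
labels = ["sport", "sport", "politics", "politics"]
lp, ll = train_nb(docs, labels, ["ball", "goal", "match", "vote", "law", "court"])
print(classify(["goal", "ball"], lp, ll))  # → sport
```

The Dirichlet smoothing term α keeps unseen words from zeroing out a class's score, which is the conjugate-prior machinery of the previous slides at work.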
Unsupervised multinomial Naïve Bayes
• Model: as above, but the class label c is a latent variable (a mixture of multinomials) to be inferred rather than observed
Model-building in practice
• Construct a generative model of your data that
captures reasonable assumptions
• “All models are wrong, some models are useful” – G.E. Box
Build a model with standard distributions as building blocks
• Binary variables as coin flips (Bernoulli)?
• Discrete variables as die rolls (categorical)?
• Count vectors as multinomials?
• Continuous variable as Gaussian?
• Dependent latent variables as HMM chain?
Formulating a generative model
• Having defined our data, latent variables,
observed variables and target variables, define
their dependencies.
The art of latent variable modeling: Box's loop
• Start from complicated, noisy, high-dimensional data
• Design a latent variable model: an (algorithm, model) pair carefully co-designed for tractability
• Explore the data and predict, producing low-dimensional, semantically meaningful representations
• Evaluate, understand, and iterate