


BAYESIAN STATISTICS: WHY AND HOW

Bayesian statistics is what all the cool kids are talking about these days. Upon closer inspection, this does not come as a surprise. In contrast to classical statistics, Bayesian inference is principled, coherent, unbiased, and addresses an important question in science: which of my hypotheses should I believe in, and how strongly, given the collected data?

After briefly stating the fundamental difference between classical and Bayesian statistics, I will introduce three software packages – JAGS, BayesFactor, and JASP – to conduct Bayesian inference. I will analyse simulated data about the effect of wearing a hat on creativity, just as the previous blog post did. In the end I will sketch some benefits a “Bayesian turn” would have on scientific practice in psychology.

Note that this post is very long, introducing plenty of ideas that you might not be familiar with (regrettably, they are not found in the standard university curriculum!). This post should serve as a basic but comprehensive introduction to Bayesian inference for psychologists. Wherever possible, I have linked to additional literature and resources for more in-depth treatments.

After you have read this post, you will begin to understand what the fuss is all about, and you will
be familiar with tools to apply Bayesian inference. Let’s dive in!

Classical Statistics


What is probability?

Classical statistics conceptualizes probability as long-run relative frequency. For example, the probability of a coin coming up heads is the proportion of heads in an infinite set of coin tosses. This is why classical statistics is sometimes called frequentist. At first glance, this definition seems reasonable. However, to talk about probability, we now have to think about an infinite repetition of an event (e.g., tossing a coin). Frequentists, therefore, can only assign probability to events that are repeatable. They cannot, for example, talk about the probability of the temperature rising 4 °C in the next 15 years; or Hillary Clinton winning the next U.S. election; or you acing your next exam. Importantly, frequentists cannot assign probabilities to theories and hypotheses – which arguably is what scientists want (see Gigerenzer, 1993).

p values

Because of the above definition of probability, inference in classical statistics is counterintuitive, for students and senior researchers alike (e.g., Haller & Krauss, 2002; Hoekstra, Morey, Rouder, & Wagenmakers, 2014; Oakes, 1986). One notoriously difficult concept is that of the p value: the probability of obtaining a result as extreme as the one observed, or more extreme, given that the null hypothesis is true. To compute p values, we collect data and then assume an infinite repetition of the experiment, yielding more (hypothetical) data. For each repetition we compute a test statistic, such as the mean. The distribution of these means is the sampling distribution of the mean, and if our observed mean is far off in the tails of this distribution, based on an arbitrary standard (α = .05), we conclude that our result is statistically significant. For a graphical depiction, see the figure below – inspired by Figure 1 in Wagenmakers (2007).

It is very easy to gloss over these specific assumptions of classical statistics because properties of
the sampling distribution are often known analytically. For example, in the case of the t-test, the


variance of the sampling distribution is the variance of the actual collected data, divided by the number of data points: σ̂² = σ²/N.

Inference based on p values is a remarkable procedure: we assume that we did the experiment an infinite number of times (we didn't), and we compute probabilities of data we haven't observed, assuming that the null hypothesis is true. This way of drawing inferences has serious deficits. For an enlightening yet easy-to-read paper about these issues, see Wagenmakers (2007). If you feel you need a more detailed refresher on p values – they are tricky! – see here.
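
To make this concrete, here is a minimal simulation sketch (not from the original post; the sample size, true mean, and number of repetitions are arbitrary choices) that builds the sampling distribution of the mean under the null hypothesis and reads off a two-sided p value:

set.seed(1)
obs <- rnorm(20, mean = 0.5)                       # the data we actually collected
obs_mean <- mean(obs)

# pretend we repeat the experiment many times under H0: population mean = 0
null_means <- replicate(10000, mean(rnorm(20, mean = 0)))

# two-sided p value: proportion of hypothetical means at least as extreme as ours
mean(abs(null_means) >= abs(obs_mean))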

Confidence Intervals

Recently, the (old) new statistics has been proposed, abandoning p values and instead focusing on parameter estimation and confidence intervals (Cumming, 2014). As we will see later, parameter estimation and hypothesis testing answer different questions, so abandoning one in favour of the other is misguided. Because confidence intervals are still based on classical theory, they are an equally flawed method of drawing inferences; see Lee (n.d.) and Morey et al. (submitted).

What are parameters?

Tied to the classical notion of probability, classical statistics considers the population parameter as fixed, while the data are allowed to vary (repeated sampling). For example, under this view the true difference in height between men and women is some fixed value – say, exactly 15 centimeters. From this we can already intuit that confidence intervals, as well as statistical power, are not concerned with the actual data we have collected. Why?

Assume we collect height data from men and women and compute a 95% confidence interval around our difference estimate – this does not mean that we are 95% confident that the true parameter lies within these bounds! For the actual experiment, the true parameter either is or is not within these bounds. A 95% confidence interval states that, if we were to repeat our experiment an infinite number of times, in 95% of all cases the parameter would be within those bounds (Hoekstra, Morey, Rouder, & Wagenmakers, 2014). This must be so because we can only talk about probability as relative frequency: we have to assume repeating our experiment, even though we only conducted it once!
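
A short simulation makes this "repeated sampling" reading of a confidence interval explicit (a sketch with made-up numbers; the true mean of 15, the sample size, and the number of repetitions are arbitrary):

set.seed(2)
covered <- replicate(10000, {
  y <- rnorm(30, mean = 15, sd = 5)   # one hypothetical experiment
  ci <- t.test(y)$conf.int            # its 95% confidence interval
  ci[1] <= 15 && 15 <= ci[2]          # does it contain the true value?
})
mean(covered)  # close to .95 across repetitions; it says nothing about any single interval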

It is important to note that all probability statements in classical statistics are of that nature: they average over an infinite repetition of experiments. Probability statements do not pertain to the specific data you gathered; they are about the testing procedure itself. Extend this to statistical power, which is the probability of finding an effect when there really is one. High-powered experiments yield informative data, on average. Low-powered experiments yield uninformative data,


on average. However, for the specific experiment actually carried out – once the data are in – we
can go beyond power as a measure of how informative our experiment was (Wagenmakers et al.,
2014). In the last section I explain what this entails when using Bayesian inference for hypothesis
testing.

Bayesian Statistics

What is probability?

In Bayesian inference, probability is a means to quantify uncertainty. Continuing with the height example above, Bayesians quantify uncertainty about the difference in height with a probability distribution. It might be reasonable, for example, to specify a normal distribution with mean μ = 15 and variance σ² = 16:

difference ∼ Normal(15, 16)

We most strongly believe that the height difference is 15 centimeters, although it could also be 10, or even 5 centimeters (but with lower probability). If we are less sure about our estimate, we can incorporate that uncertainty by increasing the variance of the distribution (decreasing its "peakedness"). The plot below compares possible prior beliefs.
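
The original figure is not reproduced here, but a comparable plot can be drawn with a few lines of R (the narrower prior's variance is chosen purely for illustration):

# compare a more peaked and a more spread-out prior on the height difference
curve(dnorm(x, mean = 15, sd = 2), from = 0, to = 30,
      xlab = 'height difference (cm)', ylab = 'density')
curve(dnorm(x, mean = 15, sd = 4), add = TRUE, lty = 2)  # variance 16: more uncertainty
legend('topright', c('less uncertain', 'more uncertain'), lty = 1:2)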

In the Bayesian world, probability retains the intuitive, common-sense interpretation: it is simply
a measure of uncertainty.

What are parameters?


While parameters still have one single true value in some ontological sense (say, men really are on average exactly 15 centimeters taller than women), we quantify our uncertainty about this value with a probability distribution. The beautiful part is that, once we collect data, we simply update our prior beliefs with the information in the data to yield posterior beliefs.

Bayesian Parameter Estimation

There’s no theorem like Bayes’ theorem


Like no theorem we know
Everything about it is appealing
Everything about it is a wow
Let out all that a priori feeling
You’ve been concealing right up to now!
– George Box

Because parameters themselves are assigned a distribution, statistical inference reduces to applying the rules of probability. We specify a joint distribution over data and parameters, p(y, θ). By the definition of conditional probability we can write p(y, θ) = p(y|θ)p(θ). The first term, p(y|θ), is the likelihood, and it embodies our statistical model. It also exists in the frequentist world, and it contains assumptions about how our data points are distributed; for example, whether they are normally distributed or Bernoulli distributed. The other term, p(θ), is called the prior distribution over the parameters, and it quantifies our belief (before looking at the data) in, say, height differences between the sexes.

Combining the data we have collected with our prior beliefs is done via Bayes' theorem:

p(θ|y) = p(θ) × p(y|θ) / p(y)

where p(y) is the probability of the data, the marginal likelihood (which has no frequentist equivalent); in words:

posterior = (prior × likelihood) / marginal likelihood

Because p(y) is just a normalizing constant that ensures p(θ|y) sums to 1, i.e. is a proper probability distribution, we can drop it and write:

p(θ|y) ∝ p(θ) × p(y|θ)


With Bayes' rule we are solving the "inverse probability problem": we are going from the effect (the data) back to the cause (the parameter) (Bayes, 1763; Laplace, 1774).

Classical statistics has a whole variety of estimators, such as maximum likelihood and generalized least squares, which are evaluated along specific dimensions (e.g. bias, efficiency, consistency). In contrast, Bayesians always use Bayes' rule, which follows directly from the rules of probability. This is why we say that Bayesian statistics is principled, rational, and coherent. Let's take a look at a simple example to see how the two estimation approaches differ.

A simple example

Suppose you flip a coin twice and observe heads both times. What is your estimate of the probability that the next coin flip comes up heads? We know that flipping a coin several times, with the flips being independent, can be described by the binomial likelihood:

L(θ|y) = θ^k × (1 − θ)^(N−k)

where N is the number of flips and k is the number of heads. The likelihood L(θ|y) is just the probability expressed in terms of the parameter θ, instead of the data y (and need not sum to 1). For example, assuming θ = .5, the likelihood is:

L(θ = .5|y) = .5² × (1 − .5)^(2−2)

L(θ = .5|y) = .25

Plotting the likelihood over the whole range of possible θ values given our data (k = 2 and N = 2) yields:

curve(dbinom(2, 2, prob = x), from = 0, to = 1, ylab = 'likelihood')


We see that θ = 1 maximizes the likelihood; in other words, θ = 1 is the maximum likelihood estimator (MLE) for these data. Maximum likelihood estimation is the workhorse of classical inference. In classical statistics, then, the prediction for the next coin coming up heads would be 1. This is somewhat counterintuitive; would that really be your prediction, after just two coin flips? What does the Bayesian in you think?

First, we need to quantify our prior belief about the parameter θ. Invoking the principle of insufficient reason, we use a prior that assigns equal probability to all possible values of θ: the uniform distribution. We can write this as a Beta distribution with parameters a = 1, b = 1:

Beta(θ|a, b) = K × θ^(a−1) (1 − θ)^(b−1)

where K is a normalizing constant that is not of importance for now. a and b can be interpreted as prior data: in our example, a would be the number of heads and b the number of tails. Remember that in Bayesian inference we just multiply the likelihood with the prior. The Beta distribution is a conjugate prior for the binomial likelihood. This means that when using this prior, the posterior will again be a Beta distribution, making for trivial computation. Plugging into Bayes' rule yields:
Bayes’ rule yields:

p(θ|N, k) ∝ θ^k (1 − θ)^(N−k) × θ^(a−1) (1 − θ)^(b−1)

p(θ|2, 2) ∝ θ^2 (1 − θ)^(2−2) × θ^(a−1) (1 − θ)^(b−1)

p(θ|2, 2) ∝ θ^(a−1+2) (1 − θ)^(b−1+2−2)

Rearranging yields:

p(θ|2, 2) ∝ θ^((a+2)−1) (1 − θ)^((b+2−2)−1)

We recognize that our posterior is again a Beta distribution: with a = b = 1, it is Beta(3, 1). In general, our updating rule is Beta(a + k, b + N − k).


The point with the highest probability density, the mode, is (a − 1)/(a + b − 2), which for the posterior Beta(3, 1) yields 1. However, as we see when looking at the distribution, there is substantial uncertainty. To account for this, we take the mean of the posterior distribution, a/(a + b), as our best guess for the future. Thus, the prediction for the next coin flip being heads is .75. Note that classical estimation, based on maximum likelihood (θ_MLE = k/N = 2/2 = 1), would predict heads with 100% certainty. For more on likelihoods, check out this nice blog post. For more details on Bayesian updating, see this.
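
The whole updating step can be reproduced in a few lines of R (a sketch mirroring the numbers above):

a <- 1; b <- 1   # uniform Beta(1, 1) prior
k <- 2; N <- 2   # two heads in two flips

# posterior is Beta(a + k, b + N - k) = Beta(3, 1)
curve(dbeta(x, a, b), from = 0, to = 1, lty = 2, ylab = 'density')  # prior
curve(dbeta(x, a + k, b + N - k), add = TRUE)                       # posterior

(a + k - 1) / (a + k + b + N - k - 2)  # posterior mode: 1
(a + k) / (a + b + N)                  # posterior mean: .75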

What can be seen quite clearly in conjugate examples is that the Bayesian posterior is a weighted combination of the prior and the likelihood. We have seen that the posterior mean is:

p̂ = (k + a) / (a + b + n)

which, with some clever rearranging, yields:

p̂ = [n / (a + b + n)] × (k/n) + [(a + b) / (a + b + n)] × [a / (a + b)]

Both the maximum likelihood estimator θ_MLE = k/n and the mean of the prior, a/(a + b), are weighted by terms that depend on the sample size n and the prior parameters a and b. Two important things should be noted here. First, this implies that the posterior mean is shrunk toward the prior mean. In hierarchical modeling, this is an extremely powerful idea: shrinkage yields a better estimator (Efron & Morris, 1977). For a nice tutorial on hierarchical Bayesian inference, see Sorensen & Vasishth (submitted).
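
A quick numeric check of this decomposition for the coin example (k = 2, n = 2, a = b = 1):

a <- 1; b <- 1; k <- 2; n <- 2

direct   <- (k + a) / (a + b + n)                  # posterior mean: .75
weighted <- n / (a + b + n) * (k / n) +            # weight on the MLE
            (a + b) / (a + b + n) * (a / (a + b))  # weight on the prior mean
c(direct, weighted)  # both .75: the MLE (1) is shrunk toward the prior mean (.5)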


The second thing to note is that when n becomes very large, the Bayesian point estimate and the classical maximum likelihood estimate converge to the same value (Bernstein-von Mises theorem). You can see this in the formula above: the first weight becomes 1 and the second weight becomes 0, leaving only k/n, which is the maximum likelihood estimate. Therefore, when different people initially disagree (different priors), once they have seen enough evidence, they will agree (same posterior).

Note that identical estimation results need not lead to the same model comparison results. I find the argument that, because the estimates converge, we can just go on doing frequentist statistics rather disturbing. First, you never have an infinite number of participants. Second, the prior allows you to incorporate theoretically meaningful information into your model, which can be extremely valuable (e.g. this great problem). Third, and most important, estimation and testing answer different questions. While parameter estimates might converge, Bayesian hypothesis testing offers a much more elegant and richer inferential foundation than that provided by classical testing (but more on this below).

Markov chain Monte Carlo

Using a simple example, we have seen how Bayesians update their prior beliefs upon seeing new data, yielding posterior beliefs. However, conjugacy is not always given, and the prior times the likelihood might be an awkwardly shaped, complex distribution that we cannot handle analytically. For most of the 20th century, Bayesian analysis was restricted to conjugate priors for computational simplicity.

But things have changed. Due to the advent of cheap computing power and Markov chain Monte Carlo (MCMC) techniques, we can now estimate basically every posterior distribution we like.
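
To give a flavour of how MCMC works (this is an illustration, not the algorithm used by the packages below), here is a bare-bones Metropolis sampler for the coin-flip posterior from the previous section; because the conjugate answer is Beta(3, 1), we can check the result directly:

set.seed(3)
k <- 2; N <- 2                                 # two heads in two flips
log_post <- function(theta) {                  # unnormalized log posterior
  if (theta <= 0 || theta >= 1) return(-Inf)   # uniform Beta(1, 1) prior
  k * log(theta) + (N - k) * log(1 - theta)
}

n_iter <- 20000
draws  <- numeric(n_iter)
theta  <- 0.5                                  # starting value
for (i in 1:n_iter) {
  proposal <- theta + rnorm(1, sd = 0.2)       # symmetric random-walk proposal
  if (log(runif(1)) < log_post(proposal) - log_post(theta)) theta <- proposal
  draws[i] <- theta
}

hist(draws, freq = FALSE, breaks = 50)
curve(dbeta(x, 3, 1), add = TRUE)              # analytic posterior for comparison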

Bayesian hypothesis testing

Parameter estimation is not the same as testing hypotheses (e.g. Morey, Rouder, Verhagen, & Wagenmakers, 2014; Wagenmakers, Lee, Rouder, & Morey, submitted). The argument is simple: while each observation tells you something about the parameter, not every observation is informative about which hypothesis you should believe in. To build an intuition for this, let's toss a coin (again). Is it a fair coin (θ = .5) or a biased coin (θ ≠ .5)? Assume the first observation is heads. You learn something about the parameter θ: for example, it can't be equal to 0 (always tails), and a bias toward heads is now more likely than a bias toward tails. The parameter estimate thus seems to carry some evidence against the null hypothesis. However, in fact you have learned nothing that would strengthen your belief in either of your hypotheses: observing heads is equally consistent with the coin being unbiased and the coin being biased (the Bayes factor is equal to 1, but more on this below). Inference by parameter estimation alone is therefore inadequate, unprincipled, and biased.

In order to test hypotheses, we have to compute the marginal likelihood, which we can skip when doing parameter estimation (because it is just a constant). As a reminder, here is the formula again:

posterior = (prior × likelihood) / marginal likelihood

Let’s say we compare two models, M0 and M1, that describe a difference in the creativity of people who wear hats and others who don’t. The parameter of interest is δ. M0 restricts δ to 0, while M1 lets δ vary. This corresponds to testing a null hypothesis H0: δ = 0 against the alternative H1: δ ≠ 0. In order to test which hypothesis is more likely to be true, we pit the predictions of the models against each other. The prediction of each model is embodied by its marginal likelihood, p(y|M). For the discrete case, this yields:

P(y|M) = Σ_i P(y|θ_i, M) P(θ_i|M)

That is, we look at the likelihood of the data y for every possible value of θ and weight these values with our prior belief. For continuous distributions, the sum becomes an integral:

p(y|M) = ∫ p(y|θ, M) p(θ|M) dθ

The ratio of the marginal likelihoods is the Bayes factor (Kass & Raftery, 1995):

BF01 = p(y|M0) / p(y|M1)

which is the factor by which our prior beliefs about the hypotheses (not the parameters!) get updated to yield the posterior beliefs about which hypothesis is more likely:

p(M0|y) / p(M1|y) = [p(y|M0) / p(y|M1)] × [p(M0) / p(M1)]

The Bayes factor is a continuous measure of evidence: BF01 = 3 indicates that the data are three times more likely under the null model M0 than under the alternative model M1. Note that BF10 = 1/BF01, and that when the prior odds are 1, the posterior odds equal the Bayes factor.
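
To make the marginal likelihood concrete, here is a small sketch for the coin example (two heads in two flips), pitting a point-null model θ = .5 against an alternative with a uniform prior on θ; the resulting number anticipates the Savage-Dickey result further below:

k <- 2; N <- 2

# marginal likelihood under M0: theta fixed at .5
m0 <- dbinom(k, N, prob = 0.5)   # .25

# marginal likelihood under M1: average the likelihood over a uniform prior on theta
m1 <- integrate(function(theta) dbinom(k, N, theta) * dunif(theta), 0, 1)$value  # 1/3

m0 / m1   # BF01 = .75, i.e. BF10 = 4/3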


Because complex models can capture many different observations, their prior on the parameters, p(θ), is spread out wider than those of simpler models. Thus there is little density at any specific point – because complex models can capture so many data points, each data point taken individually is comparatively less likely. For the marginal likelihood, this means that the likelihood gets multiplied with these low density values of the prior, which decreases the overall marginal likelihood. Thus model comparison via Bayes factors incorporates an automatic Ockham’s razor, guarding us against overfitting.

While classical approaches like the AIC naively add a penalty term (two times the number of parameters) to incorporate model complexity, Bayes factors offer a more natural and principled approach to this problem. For details, see Vandekerckhove, Matzke, & Wagenmakers (2014).

Two priors

Note that there are actually two priors – one over models, as specified by the prior odds p(M0)/p(M1), and one over the parameters, p(θ), which is implicit in the marginal likelihoods. The prior odds are just a measure of how likely one model or hypothesis is relative to the other, prior to data collection. They are irrelevant for the experiment at hand, but might be important for drawing conclusions. For example, Daryl Bem found a high Bayes factor in support of precognition. Surely this does not by itself mean that we should believe in precognition! The prior odds for precognition are extremely low – at least for me – so that even when multiplied by a Bayes factor of 1000, the posterior odds remain vanishingly small:

1/100000 = (1000/1) × (1/100000000)

We can agree on how much the data support precognition (as quantified by the Bayes factor). However, this does not mean we have to buy it. Extraordinary claims require extraordinary evidence.

Savage-Dickey trick

For the one-parameter case, computing the marginal likelihood is pretty straightforward; however, the integrals get harder with many variables, say in multiple regression. Note that while parameter estimation is basically solved with MCMC methods, computing the marginal likelihood is still a hard problem. However, when testing nested models, as in null hypothesis testing, we can use a neat mathematical trick that sidesteps the issue of computing the marginal likelihood – the Savage-Dickey density ratio:


BF01 = p(δ = 0 | y, M1) / p(δ = 0 | M1)

That is, we take the density of the posterior at the point of interest, for example δ = 0, and divide it by the density of the prior at that same point. Note that, unlike the p value, the Bayes factor is not limited to testing nested models; it can also compare complex, functionally different models (as is common in cognitive science). We will see a nice graphical depiction of Savage-Dickey later.

Let us revisit our coin toss example. We flipped the coin twice and observed heads both times. Suppose we want to test the hypothesis that the coin is fair – that is, θ = .5. Using the Savage-Dickey density ratio, we divide the height of the posterior distribution at θ = .5 by the height of the prior distribution at θ = .5. Recall that:

θ ∼ Beta(1, 1)
θ|y ∼ Beta(3, 1)

Thus yielding:

BF01 = p(θ = .5|y) / p(θ = .5)
BF01 = .75 / 1
BF10 = 1/BF01 = 4/3 ≈ 1.33

The data are 1⅓ times more likely under the model that assumes that the coin is biased toward either heads or tails. For a nice first lesson in Bayesian inference that prominently features coin tosses with interactive examples, see this. For an excellent introduction to the Savage-Dickey density ratio, see Wagenmakers et al. (2010).
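
The same number falls out of two lines of R, using the Beta densities derived earlier (a minimal sketch):

# Savage-Dickey: posterior density at theta = .5 divided by prior density at theta = .5
BF01 <- dbeta(0.5, 3, 1) / dbeta(0.5, 1, 1)  # .75
1 / BF01                                     # BF10 = 4/3, about 1.33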

Lindley’s paradox and the problem with uninformative priors

Uninformative priors like Unif(−∞, +∞) can be used in parameter estimation because the data quickly overwhelm the prior. However, when testing hypotheses, we need to use priors that are at least somewhat informative. One can see the problem with uninformative priors in the Savage-Dickey equation: these priors spread out their probability mass such that at each point there is virtually zero density. Dividing something by a very, very small number yields a very, very large number, and the resulting Bayes factor favours the null hypothesis without bounds (DeGroot, 1982; Lindley, 1957). Consequently, we need to use informative priors for hypothesis testing. Rouder,


Morey, Wagenmakers, and colleagues have extended the default priors initially proposed by Harold Jeffreys (1961). These default priors have certain desirable features (see Rouder & Morey, 2012), but they should be given some thought, and possibly adjusted, before they are used.

Creativity example

We want to know if wearing hats does indeed have an effect on creativity. Instead of collecting real data, we simply simulate data, assuming a true effect of Cohen’s d = 7/15:

set.seed(666)

x <- rep(0:1, each = 50)
y1 <- rnorm(50, mean = 50, sd = 15)
y2 <- rnorm(50, mean = 57, sd = 15)
dat <- data.frame(y = c(y1, y2), x = x)

boxplot(y ~ x, dat, col = c('blue', 'green'),
        xlab = 'condition', ylab = 'creativity')

Classical Inference

In classical statistics we would compute p values. Because the t-test is simply a general linear model with one categorical predictor, we can run:

summary(lm(y ~ x, dat))

## Call:
## lm(formula = y ~ x, data = dat)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -43.120 -10.578   0.219  11.295  33.223
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   49.028      2.195  22.339

If you are not familiar with R, this output might look daunting. Note that the t-test as a general linear model states that

y_i = β0 + β1 x_i + ε_i

In the R output, (Intercept) is β0, while x is β1. β0 is the mean of the group in which our categorical predictor is 0, i.e. the group which did not wear hats. The group that did wear hats, on the other hand, gets a creativity boost of 6.944. The p value for this difference is p = .028. We would conclude that the probability of observing these data, or more extreme data, given that there really is no effect, is only 2.8%. Using the conventional cutoff, α = .05, we would say our result is statistically significant. Although we have not computed p(H1|y), we would conclude that the result supports the alternative hypothesis. Is this really so?

Bayesian Inference

Below I introduce three software packages for Bayesian inference. The first one, JAGS, is very low-level and does not readily provide Bayes factors. Its primary use is to estimate the posterior distribution. In our present example, we are interested in the difference between the two groups, specifically in the effect size. As mentioned earlier, we can use uninformative priors for parameter estimation. To get a feeling for slightly informative priors, however, I will specify default priors when using JAGS. Subsequently we will compute Bayes factors using the BayesFactor R package and the graphical software JASP.

Using JAGS

JAGS (“Just Another Gibbs Sampler”) implements MCMC methods (as discussed above) to sample from the posterior distribution. This is especially needed when the posterior distribution is not available in closed form – say when we don’t have conjugate priors.

To have a scale-free measure of our prior belief, we will specify our prior beliefs over the effect size δ = μ/σ. As suggested in the literature, we might want to use a default prior. For a detailed rationale of so-called default priors, see pages 6 and 7 of this, Rouder, Speckman, Sun, Morey, & Iverson (2009), and Rouder & Morey (2012). For a more thorough treatment of the default prior


approach pioneered by Harold Jeffreys, see Ly et al. (in press).

Constructing a default prior

What is our prior on the effect size? We might want to use a normal distribution centered on 0:

δ ∼ Normal(0, g)

The more difficult question is how to specify the variance, g. If we specify large values, then we
say that we also expect very high values for the effect size, like 2 or 3. This never happens in
psychology! Perhaps we should choose a lower value, like 1? Or better yet, why not quantify our
uncertainty about the variance with … a probability distribution!

This is what we will do, using the inverse Gamma distribution, which has two parameters:

g ∼ Inverse Gamma(1/2, r²/2)

Now, when we integrate out the parameter g – that is, when we incorporate our uncertainty about the variance into our belief about the effect size – this elaborate prior specification simplifies to:

δ ∼ Cauchy(r)

The Cauchy distribution has only one parameter, r. It is similar to a normal distribution, but with fatter tails (indicating greater uncertainty). The scale parameter r influences the width of the distribution; the higher the parameter value, the wider the distribution – meaning greater probability density in the tails.
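
A quick visual comparison of the Cauchy with a standard normal, and of two different scales r, can be drawn as follows (a sketch; the particular scale values are chosen for illustration):

curve(dnorm(x), from = -4, to = 4, ylab = 'density')       # standard normal
curve(dcauchy(x, scale = sqrt(2)/2), add = TRUE, lty = 2)  # Cauchy with r = sqrt(2)/2
curve(dcauchy(x, scale = 1), add = TRUE, lty = 3)          # wider Cauchy with r = 1
legend('topright', c('Normal(0, 1)', 'Cauchy(0.71)', 'Cauchy(1)'), lty = 1:3)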

Are we done? Not quite! We still need to specify a prior over the variance. Because we do not make inferences about the variance, we can give it a noninformative Jeffreys prior. We approximate the Jeffreys prior using an inverse Gamma distribution with very small shape and rate parameters (e.g. Morey, Rouder, Pratte, & Speckman, 2011):

σ² ∼ Inverse Gamma(.00001, .00001)

We also have to specify a prior distribution over β0 , the mean of the group that did not wear hats.
Again, because our inference does not depend on this parameter, we assign it an uninformative
prior:


β0 ∼ Normal(0, 10000)

Note that the specification in JAGS does not use the variance σ², but the precision, which is 1/σ². Therefore, to specify a high variance, we would use a low precision.

Although it is possible to compute Bayes factors directly in JAGS, for example using the product-space method (Lodewyckx et al., 2011) or conditional marginal density estimation (Morey et al., 2011), it is primarily used to estimate posterior distributions of parameters of interest. In the example that follows, we will compute Bayes factors with easier-to-use software and use JAGS to estimate posterior distributions. By the way, if you are lost with all those different distributions, check out Richard Morey’s nice interactive visualisations.

Note that the general linear model assumes that:

y_i = β0 + β1 x_i + ε_i

ε_i ∼ Normal(0, σ²)

that is, the errors are normally distributed and centered at 0. From this it follows that:

y_i − (β0 + β1 x_i) ∼ Normal(0, σ²)
y_i ∼ Normal(β0 + β1 x_i, σ²)

which is the specification used in JAGS (with precision instead of variance).

library('rjags')   # for communicating with JAGS
library('ggmcmc')  # for nice visualizations
options(scipen = 100, digits = 5)

model_string <- '
model {
  ### priors
  sigma ~ dgamma(.00001, .00001)  # [Jeffreys prior]
  d ~ dnorm(0, invg)              # [prior on effect size d]
  invg ~ dgamma(1/2, rsqr/2)      # [prior on variance of d]
  b0 ~ dnorm(0, .0001)            # [mean of the non-hat group]
  tau <- pow(sigma, -2)           # [precision = 1/variance]
  b1 <- d * sigma                 # [difference between non-hat & hat]

  ### likelihood
  for (i in 1:N) {
    y[i] ~ dnorm(mu[i], tau)
    mu[i] <- b0 + b1 * x[i]       # [t-test as a general linear model]
  }
}'

params <- c('b0', 'b1', 'd')
data <- list('x' = dat$x, 'y' = dat$y, 'rsqr' = sqrt(2)/2, 'N' = nrow(dat))
model <- jags.model(textConnection(model_string), data = data, n.chains = 3)
samples <- coda.samples(model, n.iter = 100000, variable.names = params)

The plot below shows the posterior distributions of b0 (the mean in the no-hats condition), b1
(the difference between the two groups), and d (effect size):

How wonderful! Instead of a point estimate, we get a whole distribution. It is important to check if
the MCMC algorithm actually converged on the posterior distribution. Run the commands below
to monitor convergence:

samp <- ggs(samples)

ggs_autocorrelation(samp)
ggs_traceplot(samp)
ggs_geweke(samp)
ggs_Rhat(samp)
ggs_running(samp)

Upon inspection, everything seems fine!

Let’s compute credibility intervals – the Bayesian analogue to confidence intervals:

1 HPDinterval(samples)[[1]]
2
3 ## lower upper
4 ## b0 44.846982 53.24459
5 ## b1 0.362848 12.15736
6 ## d 0.034575 0.80231
7 ## attr(,"Probability")
8 ## [1] 0.95


We can be 95% confident that the effect size is between .034 and .802. There is substantial variability, but that’s just how things are (also this). Note that, in this case, the frequentist confidence interval coincides numerically with the Bayesian credibility interval.

John Kruschke and other advocates of “hypothesis testing via estimation” would have us draw inferences from these posterior distributions alone. Simplifying a bit, if the 95% credibility interval of d excludes 0 (or some specified region of interest, say −.1 to .1), we would conclude that there really is an effect. Note that this is somewhat unprincipled – how strongly should we believe in an effect? And it is biased against H0 – we will see below that there is not much evidence for an effect.
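
Because we have the full posterior, we are not restricted to interval endpoints; the MCMC samples can be summarized in any way we like. A small sketch, assuming the samples object from the JAGS run above:

post <- as.matrix(samples)             # stack the three chains into one matrix

mean(post[, 'd'])                      # posterior mean of the effect size
mean(post[, 'd'] > 0)                  # posterior probability that the effect is positive
quantile(post[, 'd'], c(.025, .975))   # central 95% credibility interval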

Note also that we might have outliers in our data, rendering the normal distribution inadequate. Thanks to JAGS’s flexibility, we can easily substitute the normal distribution with a t-distribution, which has fatter tails. This would make our inference more robust; see e.g. chapter 16 in Kruschke (2015).

Concluding the section on JAGS, I want to say that JAGS offers a lot of flexibility, but at the price of complexity. Below I introduce more user-friendly software, which also enables us to easily compute Bayes factors for common designs. However, if you are doing multivariate statistics like structural equation modeling, there is no way around software like JAGS. If you are interested in cognitive modeling of the Bayesian kind, you also need to use JAGS. For a great book on Bayesian cognitive modeling, see Lee & Wagenmakers (2013).

Using BayesFactor

The BayesFactor R package written by Richard Morey computes Bayes factors for a variety of common statistical models. Similarly to the frequentist analysis above, we specify the t-test as a general linear model and use the function lmBF from the BayesFactor package:

library('BayesFactor')

lmBF(y ~ x, dat)

## Bayes factor analysis
## --------------
## [1] x : 1.8934 ±0%
## Against denominator:
##   Intercept only
## ---
## Bayes factor type: BFlinearModel, JZS

The resulting Bayes factor is BF10 ≈ 1.89; that is, the model that assumes an effect of wearing a hat is about 1.89 times more likely than the model that assumes no effect – given a so-called JZS prior specification. JZS stands for Jeffreys-Zellner-Siow. This prior is basically the same as the one we specified before. For a more detailed rationale, see Rouder et al. (2009).

The BayesFactor package also does ANOVA, multiple regression, proportions, and more – do have a look! For a more detailed explanation of the Bayesian t-test, see this. For a tutorial on regression that uses the BayesFactor package, click here.
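
If you prefer the dedicated t-test interface, the same comparison can be run with ttestBF, which exposes the Cauchy scale r directly (a sketch; note that ttestBF expects the grouping variable to be a factor, so we work on a copy of the data):

library('BayesFactor')

dat_f <- transform(dat, x = factor(x, labels = c('no hat', 'hat')))

# Bayesian independent-samples t-test; rscale = 'medium' corresponds to r = sqrt(2)/2
ttestBF(formula = y ~ x, data = dat_f, rscale = 'medium')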

Using JASP

“Gosh!” You might say. “This looks all very peculiar. I am not familiar with R, let alone with JAGS.
Bayes seems complicated enough. Oh, I better stick with p values and SPSS.”

NOOOOOOOOOOOOOOOOOOOOOOOOOO. There is a better way!

JASP is a slick alternative to SPSS that also provides Bayesian inference (and it does APA tables!).
First, let us write the data to a .csv file:

write.csv(dat, file = 'creativity.csv', row.names = FALSE)

Open JASP, read in the .csv file, and run a Bayesian independent-samples t-test. Checking Prior and Posterior and Additional Info gives us a beautiful plot summarizing the result:

Above you can see the Savage-Dickey density ratio, i.e. the height of the posterior divided by the
height of the prior, at the point of interest (here δ = 0).

Is this result consistent for different prior specifications? Checking Bayes factor robustness
check yields the following result:


We see that across different widths (0-1.5) of the Cauchy prior, the conclusion stays roughly the same. If we were to specify a width of 200, what would happen? Lindley would turn in his grave! As mentioned above in the section on Lindley’s paradox, uninformative priors should not be used for hypothesis testing because this leads the Bayes factor to strongly support H0. You can see this in the graph below.

Setting aside unreasonable prior specifications, the Bayes factor is not very different from 1. This means that our data are not sufficiently informative to choose between the hypotheses. While the Bayes factor is a continuous measure, it can be reasonable to attach verbal labels to certain ranges of its magnitude. Several labelling schemes have been proposed; for an overview, see this. To get some intuition for how a Bayes factor “feels”, see here.

Because the data are not compelling enough, we run an additional 20 participants per group:

set.seed(1000)
x <- rep(0:1, each = 20)
y1.2 <- rnorm(20, mean = 50, sd = 15)
y2.2 <- rnorm(20, mean = 57, sd = 15)
dat.2 <- data.frame(y = c(y1.2, y2.2), x = x)
dat.both <- rbind(dat, dat.2)

A quick look tells us that the Bayes factor in favour of the alternative hypothesis is now roughly 25:

lmBF(y ~ x, dat.both)

## Bayes factor analysis
## --------------
## [1] x : 24.94 ±0%
## Against denominator:
##   Intercept only
## ---
## Bayes factor type: BFlinearModel, JZS

We can see how the Bayes factor develops over the course of data collection using JASP. First, let’s write the data to disk again:

write.csv(dat.both, file = 'creativity.csv', row.names = FALSE)

Running the same analysis in JASP in order to get those stunning graphs yields:

Checking Sequential Analysis yields:


This plot shows how the Bayes factor develops over the course of participant collection.

JASP includes several interesting datasets. I encourage you to play with them! Peter Edelsbrunner
and I recently gave a workshop where we had some exercises and more datasets. You can find the
materials here.

Advantages of Bayesian inference

The right kind of probability

I don’t think that researchers are really interested in the probability of the data (or more extreme data), given that the null hypothesis is true and the data were collected according to a specific (unknown) sampling plan. Rather, I believe that scientists care about which hypothesis is more likely to be true after the experiment has been conducted. In the same vein, we don’t care about how often, in an infinite repetition of the experiment, the parameter estimate lies in a specific interval. We prefer a statement of confidence: how confident can I be that the parameter estimate lies in this specific interval?

“Subjectivity”

“Today one wonders how it is possible that orthodox logic continues to be taught in some
places year after year and praised as ‘objective’, while Bayesians are charged with
‘subjectivity’. Orthodoxians, preoccupied with fantasies about nonexistent data sets and,
in principle, unobservable limiting frequencies – while ignoring relevant prior information
– are in no position to charge anybody with ‘subjectivity’.”
– Jaynes (2003, p. 550), as cited in Lee & Wagenmakers (2014, p. 61)

Bayesian inference conditions only on the observed data – that is, it does not violate the likelihood principle – and so is independent of the researcher’s intentions. This might sound surprising, but p values are inherently subjective – in a really nasty way. For theoretical arguments about this, see Berger & Berry (1988) and Wagenmakers (2007). Scientific objectivity is illusory, and both frequentist and Bayesian methods have their subjective elements. Crucially, though, Bayesian subjectivity is open to inspection by everyone (just look at the prior!), whereas frequentist subjectivity is not.

Optional Stopping

“The rules governing when data collection stops are irrelevant to data interpretation. It is
entirely appropriate to collect data until a point has been proven or disproven, or until the
data collector runs out of time, money, or patience.”
– Edwards, Lindman, & Savage (1963, p. 239)

Bayesians do not have an optional stopping problem (Edwards et al., 1963; Rouder, 2014). Recall that in classical statistics, you cannot collect additional data once you have run your test, because this inflates the nominal α level; that is, it will rain false positives (Simmons, Nelson, & Simonsohn, 2011). For example, let’s say you test 20 participants and get p = .06. You cannot test another batch of, say, 5 participants and run your test again. In a sense, you are in limbo: you can neither conclude that H1 is supported, nor that H0 is supported, nor that the data are uninformative. On other occasions you might want to stop early because the data show a clear picture, and running more subjects might be expensive or unethical. However, within the framework of classical statistics, you are not permitted to do so. Using Bayesian inference, we can monitor the evidence as it comes in – that is, test after every participant – and stop data collection once the data are informative enough, say once the Bayes factor in favour of H0 or H1 is greater than 10. For a recent paper on sequential hypothesis testing, see Schönbrodt (submitted).
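
As a sketch of what such monitoring could look like with the simulated creativity data (using ttestBF from the BayesFactor package; the batch size and the stopping threshold of 10 are arbitrary choices for illustration):

library('BayesFactor')

dat_seq <- transform(dat.both, x = factor(x))                        # grouping variable as a factor
dat_seq$idx <- ave(rep(1, nrow(dat_seq)), dat_seq$x, FUN = cumsum)   # running count within each group

# monitor the Bayes factor as the sample grows, in batches of 10 per group
for (n in seq(10, 70, by = 10)) {
  sub <- dat_seq[dat_seq$idx <= n, c('y', 'x')]
  bf  <- extractBF(ttestBF(formula = y ~ x, data = sub))$bf
  cat(sprintf('n per group = %d, BF10 = %.2f\n', n, bf))
  if (bf > 10 || bf < 1/10) break                                    # stop once the evidence is compelling
}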

Continuous Measure of Evidence

The Bayes factor is a continuous measure of evidence and is directly interpretable. While inference in current statistical practice relies on an arbitrary cutoff of α = .05 and results in a counterfactual (the probability of at least as extreme data, given H0), Bayes factors straightforwardly tell you which hypothesis is more likely given the data at hand.

Supporting the null

Bayes factors explicitly look at how likely the data are under H0 and H1. Recall that the p value looks at the probability of the data (or more extreme data), given that H0 is true. The logic behind inference, called Fisher’s disjunction, is as follows: either a rare event has occurred (with probability p), or H0 is false. In the form of a syllogism, we have:

(Premise) If H0, then y is very unlikely.

(Premise) y.

(Conclusion) H0 is very unlikely.

This reasoning is flawed, as demonstrated by the following:

(Premise) If an individual is a man, he is unlikely to be the Pope.

(Premise) Francis is the Pope.

(Conclusion) Francis is probably not a man.

Nonsense! The problem is that in classical inference, we do not look at the probability of the data under H1. The data at hand, Francis being the Pope, are infinitely more likely under the hypothesis that Francis is a man (H0) than they are under the hypothesis that Francis is not a man (H1).

P values, because they only look at the probability of the data under H0 , are violently biased
against H0 (we have already seen this in our creativity example). For a more detailed treatment,
see Wagenmakers et al. (in press). A study looking at 855 t-tests to quantify the bias of p values
empirically can be found in Wetzels et al. (2011). A tragic, real-life case of how a p value caused
grave harm is the case of Sally Clark; the story is also told in Rouder et al. (submitted).

Bayesian inference conditions on both H0 and H1, and thus it also allows us to quantify support for the null hypothesis. In science, invariances can be of great interest (Rouder et al., 2009). Being able to support the null hypothesis is also important in replication research (e.g. Verhagen & Wagenmakers, 2014).

Conclusion

I hope to have convinced you that Bayesian statistics is a sound, elegant, practical, and useful method of drawing inferences from data. Bayes factors continuously quantify statistical evidence – either for H0 or H1 – and provide you with a measure of how informative your data are. If the data are not informative (BF ≈ 1), simply collect more data. Credibility intervals retain the intuitive, common-sense notion of probability and tell you exactly what you want to know: how certain am I that the parameter estimate lies within a specific interval?

JAGS, BayesFactor, and especially JASP provide easy-to-use software so that you can actually get
stuff done. In light of what I have told you so far, I want to end with a rather provocative quote by
Dennis Lindley, a famous Bayesian:

“[…] the only good statistics is Bayesian statistics. Bayesian statistics is not just another
technique to be added to our repertoire alongside, for example, multivariate analysis; it is
the only method that can produce sound inferences and decisions in multivariate, or any
other branch of, statistics. It is not just another chapter to add to that elementary text you
are writing; it is that text. It follows that the unique direction for mathematical statistics
must be along the Bayesian roads.”
– Lindley (1975, p. 106)

Suggested Readings

For an easy introduction, I suggest playing with this and reading this, this, this, and this. Peter Edelsbrunner and I recently did a workshop on Bayesian inference; you can find all the materials (slides, code, exercises, reading list) on GitHub. For a thorough treatment, I suggest Jackman (2009) and Gill (2015). For an introduction more geared toward psychologists, but without a detailed account of Bayes factors, see Kruschke (2015). For an excellent practical introduction to Bayesian cognitive modeling, see Lee & Wagenmakers (2013).

Important Note

Bayesian and frequentist statistics have a long history of bitter rivalry (see for example McGrayne, 2011). Because the core issues – e.g. what is probability? – are philosophical rather than empirical, most of the debates were heated and emotional. There was a time when Bayes was more theology than tool. Although Bayesian statistics – in contrast to our current statistical approach – is a coherent, principled, and intuitive way of drawing inferences from data, there are still open issues, and Bayesians themselves do not agree on everything: a Bayesian is not a Bayesian (Berger, 2006; Gelman & Shalizi, 2013; Morey, Romeijn, & Rouder, 2013; Kruschke & Liddell, submitted).

It would be foolish to think that Bayesian statistics could single-handedly turn psychology around. The current “crisis in psychology” (Pashler & Wagenmakers, 2012) won’t be solved by reporting BF10 > 3 instead of p < .05. Bayes cannot be the antidote to questionable research practices, “publish or perish” incentives, or mindless statistics (e.g. Gigerenzer, 2004). However, because Bayes factors are not biased against H0, allow us to state evidence for the absence of an effect, and condition only on the observed data, Bayesian statistics increases both the flexibility in data collection and the robustness of our inferences. With the above tools in the trunk, there is no reason not to use Bayesian statistics.

References

Bayes, T., & Price, R. (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53, 370-418.

Berger, J. (2006). The case for objective Bayesian analysis. Bayesian Analysis, 1(3), 385-402.

Berger, J. O., & Berry, D. A. (1988). Statistical analysis and the illusion of objectivity. American Scientist, 159-165.

Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25(1), 7-29.

DeGroot, M. H. (1982). Lindley's paradox: Comment. Journal of the American Statistical Association, 336-339.

Edwards, W., Lindman, H., & Savage, L. J. (1963). Bayesian statistical inference for psychological research. Psychological Review, 70(3), 193-242.

Efron, B., & Morris, C. (1977). Stein's paradox in statistics. Scientific American, 236, 119-127.

Gelman, A., & Shalizi, C. R. (2013). Philosophy and the practice of Bayesian statistics. British Journal of Mathematical and Statistical Psychology, 66(1), 8-38.

Gigerenzer, G. (1993). The superego, the ego, and the id in statistical reasoning. A handbook for data analysis in the behavioral sciences: Methodological issues, 311-339.

Gigerenzer, G. (2004). Mindless statistics. The Journal of Socio-Economics, 33(5), 587-606.

Haller, H., & Krauss, S. (2002). Misinterpretations of significance: A problem students share with their teachers. Methods of Psychological Research, 7(1), 1-20.

Hoekstra, R., Morey, R. D., Rouder, J. N., & Wagenmakers, E.-J. (2014). Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review, 21(5), 1157-1164.


John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524-532.

Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773-795.

Kruschke, J. K. (2015). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. Academic Press.

Kruschke, J. K., & Liddell, T. M. (submitted). The Bayesian new statistics: Two historical trends converge. Manuscript available from here.

Lee, M. D., & Wagenmakers, E.-J. (2014). Bayesian cognitive modeling: A practical course. Cambridge University Press.

Lindley, D. (1975). The future of statistics: A Bayesian 21st century. Advances in Applied Probability, 106-115.

Lindley, D. V. (1957). A statistical paradox. Biometrika, 187-192.

Lodewyckx, T., Kim, W., Lee, M. D., Tuerlinckx, F., Kuppens, P., & Wagenmakers, E.-J. (2011). A tutorial on Bayes factor estimation with the product space method. Journal of Mathematical Psychology, 55(5), 331-347.

Ly, A., Verhagen, A. J., & Wagenmakers, E.-J. (in press). Harold Jeffreys's default Bayes factor hypothesis tests: Explanation, extension, and application in psychology. Journal of Mathematical Psychology. Available from here.

McGrayne, S. B. (2011). The theory that would not die: How Bayes' rule cracked the Enigma code, hunted down Russian submarines, and emerged triumphant from two centuries of controversy. Yale University Press.

Morey, R. D., Romeijn, J. W., & Rouder, J. N. (2013). The humble Bayesian: Model checking from a fully Bayesian perspective. British Journal of Mathematical and Statistical Psychology, 66(1), 68-75.

Morey, R. D., Rouder, J. N., Verhagen, J., & Wagenmakers, E.-J. (2014). Why hypothesis tests are essential for psychological science: A comment on Cumming (2014). Psychological Science, 25(6), 1289-1290.


Morey, R. D., Rouder, J. N., Pratte, M. S., & Speckman, P. L. (2011). Using MCMC chain outputs to efficiently estimate Bayes factors. Journal of Mathematical Psychology, 55(5), 368-378.

Oakes, M. (1986). Statistical inference: A commentary for the social and behavioral sciences. Chichester: Wiley.

Pashler, H., & Wagenmakers, E.-J. (2012). Editors' introduction to the special section on replicability in psychological science: A crisis of confidence? Perspectives on Psychological Science, 7(6), 528-530.

Rouder, J. N. (2014). Optional stopping: No problem for Bayesians. Psychonomic Bulletin & Review, 21(2), 301-308.

Rouder, J. N., & Morey, R. D. (2012). Default Bayes factors for model selection in regression. Multivariate Behavioral Research, 47(6), 877-903.

Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16(2), 225-237.

Rouder, J. N., Morey, R. D., Verhagen, J., Province, J. M., & Wagenmakers, E.-J. (submitted). Is there a free lunch in inference? Available from here.

Sorensen, T., & Vasishth, S. (submitted). Bayesian linear mixed models using Stan: A tutorial for psychologists, linguists, and cognitive scientists. Available from here.

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359-1366.

Vandekerckhove, J., Matzke, D., & Wagenmakers, E.-J. (2014). Model comparison and the principle of parsimony. In Oxford handbook of computational and mathematical psychology. Oxford: Oxford University Press.

Verhagen, J., & Wagenmakers, E.-J. (2014). Bayesian tests to quantify the result of a replication attempt. Journal of Experimental Psychology: General, 143(4), 1457-1475.

Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14(5), 779-804.


Wagenmakers, E.-J., Lee, M., Rouder, J. N., & Morey, R. D. (submitted). Another statistical paradox. Available from here.

Wagenmakers, E.-J., Verhagen, J., Ly, A., Bakker, M., Lee, M. D., Matzke, D., … Morey, R. D. (in press). A power fallacy. Behavior Research Methods. Available from here.

Wagenmakers, E.-J., Lee, M., Lodewyckx, T., & Iverson, G. J. (2008). Bayesian versus frequentist inference. In Bayesian evaluation of informative hypotheses (pp. 181-207). Springer.

Wagenmakers, E.-J., Lodewyckx, T., Kuriyal, H., & Grasman, R. (2010). Bayesian hypothesis testing for psychologists: A tutorial on the Savage-Dickey method. Cognitive Psychology, 60(3), 158-189.

About the author


Fabian Dablander is currently doing his master's in cognitive science at the University of Tübingen. He is interested in Bayesian statistics, open science, and travelling. You can find him on Twitter @fdabl.


Posted in How-to, Research Methodology, Uncategorized and tagged Bayes, Bayesian statistics,
introduction, practical, statistics on August 3, 2015 by Fabian Dablander.
