
In preparation for the release of the sequel to Dr. Richard Carrier's Proving History, I thought it would be good to address the foundations of the sequel as presented in Proving History: namely, Dr. Carrier's arguments about Bayes' Theorem, or BT (as presented in his book, anyway).
There are several reasons for this. First, Dr. Carrier argues that BT should be THE method for all
historical research. Second, the entire point of Proving History was to argue that the approach we
will find in the sequel is the best. The third and most important reason, however, is that Dr.
Carrier doesn't understand BT, how it differs from the method he actually advocates (Bayesian
inference/statistics), or how fundamentally inaccurate basically every claim he makes about his
methodology is.
We can begin our journey through Dr. Carrier's failure to understand his own method by looking
at his flawed proof. At one point, Dr. Carrier states we can conclude here and now that "Bayes'
Theorem models and describes all valid historical methods. No other method is
needed" (p. 106). His proof, though, stands or falls with the first proposition in it: "BT is a
logically proven theorem" (ibid.). This is true. However, Dr. Carrier doesn't seem to have read
the sources he cites. For example, on p. 50 Dr. Carrier refers the reader via an endnote (no. 9) to
several highly commendable texts on BT. The one he states gives a "complete proof of formal
validity" of BT is Papoulis, A. (1986). Probability, Random Variables, and Stochastic Processes
(2nd ed.). I don't have the 2nd edition, but I do have the 3rd, and as this proof is trivial I really
could use any intro probability textbook. Papoulis begins his "complete proof of formal validity"
(as opposed to a proof of informal validity? or an incomplete proof?) by defining a set and a probability
function for which the axioms of probability hold. A key axiom is that the probabilities over any set (of cards, of
events, of possible outcomes when rolling dice, etc.) must sum or integrate to 1 (simplistically,
for those who haven't taken any calculus, integration is a kind of summation). In Dr. Carrier's
appendix (p. 284) he notes that probability functions must "sum to 1." What he apparently
doesn't understand is what this entails. It means that in order to use BT to evaluate how probable
some outcome, result, historical event, etc., is, one must consider every single one.
Let me make this concrete with a simple example. I can calculate the probability that, given a full
deck of cards, a random draw will yield an ace, because I know in advance every possible
outcome. If, however, someone mixed together 10 cards drawn at random from each of 300 different
decks and asked me to pick a card, I can't calculate the probability anymore. Even if I were told
that the new "deck" contained 3,000 cards, I would have no idea how many are aces. Incidentally, this
is the perfect situation for Bayesian statistical inference, which works (simplistically) by
assuming, e.g., a certain distribution of aces and then changing my model of how likely it is
that the next draw will yield an ace as I learn more about the distribution of cards in the entire
3,000-card deck.
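The contrast can be sketched in a few lines of Python (the numbers are illustrative, not from the book): with a full deck the probability is a direct ratio, because the whole sample space is known; with the mixed deck all we can do is posit a prior over the unknown proportion of aces and update it as cards are observed.

```python
from fractions import Fraction

# Known sample space: a standard 52-card deck with 4 aces.
# The probability is a simple ratio because every outcome is known.
p_ace_full_deck = Fraction(4, 52)
print(p_ace_full_deck)  # 1/13

# Unknown composition: the 3,000-card mixed deck. We cannot compute
# P(ace) directly, but we can place a prior on the unknown proportion
# of aces and update it as draws are observed (Beta-Binomial updating).
alpha, beta = 1, 12  # illustrative prior: roughly 1 ace per 13 cards
draws = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]  # 1 = ace observed, 0 = not
for d in draws:
    alpha += d
    beta += 1 - d

# Posterior mean: the model's current estimate of P(ace), which will
# keep shifting as more cards are seen.
posterior_mean = alpha / (alpha + beta)
print(round(posterior_mean, 3))
```

The first computation is ordinary probability; the second never yields "the" probability of an ace, only an estimate that reflects both the assumed prior and the data seen so far.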
Dr. Carrier wishes to use what he thinks BT is to evaluate the probability that particular events
occurred ~2,000 years ago. For example, on pp. 40-42 he considers the possibility that Jesus was
a "legendary rabbi" in terms of the class of legendary rabbis and the information we have on such
a class. We are in a far worse position than in the mixed-deck example above, because we
don't even know the number of legendary rabbis, still less who they might be (if we did, we'd
have the answer: Jesus either would be one or wouldn't).
There is another basic property of BT Dr. Carrier seems to have missed. As Papoulis clearly
states, BT is only valid for events/outcomes that are mutually exclusive. Often, both of these
requirements (the sum to 1 and mutual exclusivity) are given together: the set of outcomes
must be collectively exhaustive and mutually exclusive. That is, BT is only valid if
1) all possible outcomes are known, and
2) one and only one outcome can occur.
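Both conditions can be checked numerically. A minimal sketch (the probabilities are made up for illustration): over an exhaustive, mutually exclusive set of hypotheses, the law of total probability gives the denominator P(E), and the posteriors Bayes' Theorem produces sum to 1. Drop one hypothesis from the set and the same arithmetic no longer yields probabilities.

```python
# Illustrative numbers only: three mutually exclusive, collectively
# exhaustive hypotheses H1..H3, and some event (evidence) E.
priors = [0.5, 0.3, 0.2]       # P(Hi); sums to 1 because the set is exhaustive
likelihoods = [0.9, 0.4, 0.1]  # P(E | Hi)

# Law of total probability requires the full partition:
# P(E) = sum over i of P(E | Hi) * P(Hi)
p_e = sum(p * l for p, l in zip(priors, likelihoods))

# Bayes' Theorem: P(Hi | E) = P(E | Hi) * P(Hi) / P(E)
posteriors = [p * l / p_e for p, l in zip(priors, likelihoods)]
assert abs(sum(posteriors) - 1.0) < 1e-12  # posteriors sum to 1

# If an outcome is missing (the set is not exhaustive), the "posteriors"
# no longer sum to 1: probability mass is unaccounted for.
bad = [p * l / p_e for p, l in zip(priors[:2], likelihoods[:2])]
print(sum(bad))  # less than 1
```

This is exactly why "consider every single outcome" is not optional: the denominator of BT is only P(E) when the hypotheses exhaust the possibilities.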
This makes BT useless for most purposes, including historiography. However, Dr. Carrier isn't
really using BT. As his references (as well as his formulation of the theorem) show, he is
actually using something called Bayesian inference/Bayesian analysis. However, this negates his
entire proof, because it doesn't matter whether BT is a "logically proven theorem": there is no
"complete proof of formal validity" for some Bayesian inference/analysis theorem Dr. Carrier
could use in place of his first proposition.
Ok, so we can't use BT, but that doesn't mean we can't use Bayesian methods. However, in
order to use Bayesian methods Dr. Carrier would have to understand Bayesian statistics (and
statistics in general). He doesn't. We can see this clearly when Dr. Carrier, whose expertise is
ancient history, addresses the frequentist vs. Bayesian debate. To keep things simple, let's just
say that this is an ongoing debate, arguably going back to Bayes, but definitely over a century
old. Dr. Carrier is apparently so confident in his mathematical acuity that he resolves the dispute,
with almost no reference to the math or the literature, in a few pages: "The whole debate between
frequentists and Bayesians, therefore, has merely been about what a probability is a frequency of;
the rules are the same for either" (p. 266). Hm. Amazing that generations of the best statistical
minds missed this. Oh wait. They didn't.
Let's look at how Carrier describes the dispute: "The debate between the so-called frequentists
and Bayesians can be summarized thus: frequentists describe probabilities as a measure of the
frequency of occurrence of particular kinds of event within a given set of events, while
Bayesians often describe probabilities as measuring degrees of belief or uncertainty" (p. 265).
This is laughably wrong:

"Frequentist statistical procedures are mainly distinguished by two related features; (i) they
regard the information provided by the data x as the sole quantifiable form of relevant
probabilistic information and (ii) they use, as a basis for both the construction and the assessment
of statistical procedures, long-run frequency behaviour under hypothetical repetition of similar
circumstances."

Bernardo, J. M., & Smith, A. F. (1994). Bayesian Theory. Wiley.
"Undoubtedly, the most critical and most criticized point of Bayesian analysis deals with the
choice of the prior distribution, since, once this prior distribution is known, inference can be led
in an almost mechanic way by minimizing posterior losses, computing higher posterior density
regions, or integrating out parameters to find the predictive distribution. The prior distribution is
the key to Bayesian inference and its determination is therefore the most important step in
drawing this inference. To some extent, it is also the most difficult. Indeed, in practice, it seldom
occurs that the available prior information is precise enough to lead to an exact determination of
the prior distribution, in the sense that many probability distributions are compatible with this
information… Most often, it is then necessary to make a (partly) arbitrary choice of the prior
distribution, which can drastically alter the subsequent inference."

Robert, C. P. (2001). The Bayesian Choice: From Decision-Theoretic Foundations to
Computational Implementation (Springer Texts in Statistics) (2nd ed.). Springer.
The "frequency" part of "frequentist" does have to do with kinds of events, but frequencies are
the measure of probability, not the reverse. To illustrate, consider the "bell curve" (the graph of
the normal distribution). It's a probability distribution. Now imagine a standardized test like the
SAT, which is designed such that scores will be normally distributed and have this bell-curve
graph. What does the graph tell us? It tells us that most people who take the test get very
close scores, but very infrequently some test-takers will get high scores and others will get low ones. In
other words, the bell curve is the graph of a probability function (technically, a probability
density function or pdf), and it is formed by the frequency of particular scores. We know that it is
very improbable for a person's score to fall in either of the ends/tails of the bell curve because
these are very infrequent outcomes.
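A quick simulation makes the point concrete (the score scale is invented for illustration): when scores are normally distributed, the observed frequencies in any band of scores approximate the probabilities, and the tails are rare precisely because scores land there infrequently.

```python
import random

random.seed(0)
# Simulate test scores with mean 500 and standard deviation 100
# (an illustrative scale, not real test data).
scores = [random.gauss(500, 100) for _ in range(100_000)]

# Frequencies measure the probabilities: count how often scores land
# within one standard deviation of the mean vs. far out in a tail.
within_1sd = sum(400 <= s <= 600 for s in scores) / len(scores)
above_2sd = sum(s > 700 for s in scores) / len(scores)

print(round(within_1sd, 2))  # close to 0.68, the familiar 68% band
print(round(above_2sd, 3))   # roughly 0.02: high scores are infrequent
```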
What does this mean for frequentist methods? Well, Kaplan, The Princeton Review, and other
test-prep companies try to show their methods work by using this normal distribution. They
claim that people who take their classes aren't distributed the way the population is, because too
frequently students taking their classes obtain scores above average (i.e., those who take the classes
have test scores that aren't distributed the way the population is). They use the frequency of
higher-than-average scores to argue that their classes must improve scores.
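In frequentist terms this is a hypothesis test. A sketch with invented numbers: under the null hypothesis that the class has no effect, the average score of class-takers should behave like a draw from the population distribution, and a z-test asks how infrequent the observed average would be if that were true.

```python
import math

# Invented illustration: population mean 500, sd 100; a sample of
# 64 class-takers averages 530.
pop_mean, pop_sd = 500, 100
n, sample_mean = 64, 530

# z-statistic: how many standard errors the sample mean lies above
# the population mean, assuming the class has no effect (the null).
z = (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))

# One-sided p-value via the standard normal survival function: the
# long-run frequency of seeing a mean this high by chance alone.
p_value = 0.5 * math.erfc(z / math.sqrt(2))

print(round(z, 2))        # 2.4
print(round(p_value, 4))  # about 0.008: rare under the null
```

A small p-value is then read as evidence against "no effect". Note the logic: the data are gathered first, and the distribution only enters at the end, to judge how infrequent the result would be.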
What's key is that the data are obtained and analyzed, but the distribution is only used to
determine whether the values the analysis yielded are statistically significant. Bayesian
inference reverses this, creating fundamental differences. The process starts with a probability
distribution. The prior distributions obtained represent uncertainty and make predictions about
the data that will be obtained. Once the new data are obtained, the model is adjusted to better fit
them. This is usually done many, many times, as more and more information is tested against an
increasingly accurate model. The key differences are
1) the iterative process,
2) the use of models which make predictions, and
3) the use of distributions to represent unknowns and (in part) the way the model will "learn" or
adapt given new input.
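That iterative loop can be sketched with a simple grid approximation over an unknown "success" probability (all values illustrative): start from a prior distribution over the parameter, use it to predict the next observation, then reweight the distribution by the likelihood of what was actually observed, and repeat.

```python
# Grid of candidate values for an unknown probability of "success".
grid = [i / 100 for i in range(1, 100)]
prior = [1 / len(grid)] * len(grid)  # uniform prior: maximal uncertainty

data = [1, 0, 1, 1, 0, 1, 1, 1]  # observed outcomes, arriving one at a time

posterior = prior
for x in data:
    # Prediction: probability the current model assigns to the next outcome.
    predicted_p = sum(p * theta for p, theta in zip(posterior, grid))
    # Update: reweight each candidate value by the likelihood of what
    # was actually observed, then renormalize so the distribution sums to 1.
    weights = [p * (theta if x == 1 else 1 - theta)
               for p, theta in zip(posterior, grid)]
    total = sum(weights)
    posterior = [w / total for w in weights]

# Posterior mean: the model's estimate after all eight observations.
mean_estimate = sum(p * theta for p, theta in zip(posterior, grid))
print(round(mean_estimate, 2))
```

All three differences are visible here: the loop is iterative, the model makes a prediction at every step, and a full distribution (not a single "best guess") represents the unknown throughout.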
So why don't we find any of this in Dr. Carrier's description of Bayesian methods? Why do we
always find ad hoc descriptions of priors? Because Dr. Carrier wants to use Bayesian analysis
but apparently doesn't understand what priors actually are or how complicated they can be in
even simple models:

"In many situations, however, the selection of the prior distribution is quite delicate in the
absence of reliable prior information, and generic solutions must be chosen instead. Since the
choice of the prior distribution has a considerable influence on the resulting inference, this
choice must be conducted with the utmost care."

Marin, J. M., & Robert, C. (2007). Bayesian Core: A Practical Approach to Computational
Bayesian Statistics (Springer Texts in Statistics). Springer.
"While the axiomatic development of Bayesian inference may appear to provide a solid
foundation on which to build a theory of inference, it is not without its problems. Suppose, for
example, a stubborn and ill-informed Bayesian puts a prior on a population proportion p that is
clearly terrible (to all but the Bayesian himself). The Bayesian will be acting perfectly logically
(under squared error loss) by proposing his posterior mean, based on a modest size sample, as the
appropriate estimate of p. This is no doubt the greatest worry that the frequentist (as well as the
world at large) would have about Bayesian inference: that the use of a bad prior will lead to
poor posterior inference. This concern is perfectly justifiable and is a fact of life with which
Bayesians must contend… We have discussed other issues, such as the occasional inadmissibility
of the traditional or favored frequentist method and the fact that frequentist methods don't have
any real, compelling logical foundation. We have noted that the specification of a prior
distribution, be it through introspection or elicitation, is a difficult and imprecise process,
especially in multiparameter problems, and in any statistical problem, suffers from the potential
of yielding poor inferences as a result of poor prior modeling."

Samaniego, F. J. (2010). A Comparison of the Bayesian and Frequentist Approaches to
Estimation (Springer Texts in Statistics). Springer.
The "stubborn and ill-informed Bayesian" is in a much better position than Dr. Carrier. Dr.
Carrier has confused BT with Bayesian analysis, and the Bayesian approach with the frequentist,
all because he apparently hasn't understood any of these. Instead of prior distributions, his
"priors" are best guesses. Instead of real belief functions, we find "here's what I believe." No
consideration is given to the nature of the data (categorical, nominal, and in general
non-numerical data require specific models and tests, Bayesian or not).
So instead of the universally valid historical method Dr. Carrier argues BT provides, all he has
actually done is butcher mathematics in order to plug values into a formula in a way that is about as
mathematical as numerology, but apparently seems impressive if you have no clue what you are
talking about. Perhaps that's why Dr. Carrier's CV indicates he's been lecturing on Bayes'
Theorem since 2003, while his 2008 dissertation contains no reference to Bayes' Theorem.
