[Bayes' Theorem]

Lola Argiro
Dayva Frank
Son Huynh
Wilhelmina (Nanrui) Tan

[MA 293 Discrete Mathematics - Boston University]

[Introduction]
Bayes' Theorem is named after Reverend Thomas Bayes (1701-1761), a nonconformist
English minister who wanted to know how to infer causes from effects.1 The central idea behind his
problem was figuring out the probability of a future event occurring given only how frequently it
had, or had not, occurred in the past. He would start with a guess and continue to refine it as he
gathered more information. His system of thinking was therefore: prior + probability of new
observation given hypothesis = posterior. Criticism arose because many mathematicians were
shocked by the element of guessing and held that the problem of priors was insoluble. Bayes never
had the opportunity to publish his work before his death; however, his friend Richard Price
(1723-1791) was asked to take it over. Price extensively and rigorously edited Bayes' essay "An
Essay Towards Solving a Problem in the Doctrine of Chances" (1763), which contains Bayes'
Theorem and is considered one of the fundamental results of probability theory. Price's
contribution to the paper presents some of the philosophical basis of Bayesian statistics. The paper
remained largely unread and unpopular until Laplace took it up.
The French mathematician Pierre-Simon Laplace (1749-1827) later extended Bayes'
work in his publication "Mémoire sur la probabilité des causes par les événements" (1774),
which gave a clearer description of the inference problem for an unknown binomial parameter θ.
Laplace also stressed his argument for the choice of a uniform prior distribution and argued that
the posterior distribution of the parameter θ should be proportional to the likelihood of the data:
f(θ | x1, x2, . . . , xn) ∝ f(x1, x2, . . . , xn | θ). For several years, Laplace used, extended, and proved
his principle. He applied it to birth records data, where he wanted to determine whether it was true
that slightly more boys than girls were being born. This work led him to the Central Limit Theorem,
which allowed him to analyze any type of data. Soon, Laplace figured out what would have been
Bayes' general theorem2: for mutually exclusive causes A1, . . . , An whose union is the whole
sample space, P(Ak | E) = P(Ak)P(E | Ak) / ( P(A1)P(E | A1) + ... + P(An)P(E | An) ).

1 "A History of Bayes' Theorem." Less Wrong. August 29, 2009.

There are endless applications of Bayes' Theorem. The most widely used application in
medicine is drug testing; for example, finding the probability that a person uses a drug given
that he or she tests positive. Extensive studies have also been devoted to investigating the
probability of a patient dying from cancer given that he or she smokes. Another interesting
application of Bayes' Theorem is its use in wartime: during World War II, Alan Turing used Bayes'
rule to guess the letters and crack the codes in the Enigma messages sent by the Germans.
Bayes' Theorem can also be used in the courtroom, where the jurors might be interested in the
probability of the evidence given that the defendant is innocent. And in many cases, one may
wish to find the probability that a defective item one buys comes from a certain company that
made it, in order to make better choices for future purchases.
Bayes' Theorem is useful because it can help us make decisions in situations where few
observations or pieces of evidence are readily available to us. In many scenarios we are
given some information, and through Bayes' Theorem we can find other probabilities
about our desired event and draw the appropriate conclusions.
Furthermore, Bayes' Theorem can assist us in correcting our intuition, which can sometimes be
very misleading. For example, in the Monty Hall problem, many people assume that after
the host opens a door with a goat behind it, the player has a 50/50 chance of winning the
car whether he or she sticks with the original door or switches. However, it can be shown with
Bayes' Theorem that the player actually has a 1/3 chance of winning the car by sticking but a 2/3
chance of winning by switching.

2 Fienberg, Stephen E. "When Did Bayesian Inference Become Bayesian?" Bayesian Analysis, 2006, 1-40.
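The Monty Hall claim above can also be checked by simulation; the following is a minimal sketch (the function name, seed, and trial count are mine, not from the paper):

```python
import random

def monty_hall_win_rate(switch, trials=100_000, seed=1):
    """Simulate the Monty Hall game and return the fraction of games won."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        doors = [0, 1, 2]
        car = rng.choice(doors)    # door hiding the car
        pick = rng.choice(doors)   # player's initial choice
        # Host opens a door that is neither the player's pick nor the car.
        opened = next(d for d in doors if d != pick and d != car)
        if switch:
            # Switch to the one remaining unopened door.
            pick = next(d for d in doors if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials
```

Running it, sticking wins about 1/3 of the time and switching about 2/3, matching the analysis by Bayes' Theorem.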
There are many interpretations of Bayes' Theorem; two major ones are the
Bayesian and the frequentist interpretations. The Bayesian interpretation measures a degree of
belief: if we hold a certain degree of belief about something, then after we obtain
some evidence, Bayes' Theorem tells us how to choose the new degree of belief that accounts for
that evidence. The frequentist interpretation measures the proportion of outcomes. For a given
event, only one of two possibilities can happen in each trial, because the event either occurs or
does not occur; the relative frequency of occurrence of the event, obtained by repetitions of the
experiment, measures the probability of the event. Throughout this paper, P(A) is the probability
that event A occurs, P(B) is the probability that event B occurs, and P(B|A) is the probability that
event B occurs given that event A has occurred.
In order to fully understand Bayes' Theorem, one needs a general background in
probability. According to Seymour Lipschutz and Marc Lipson in Schaum's Outline of
Probability, Second Edition, probability theory is "the mathematical modeling of the
phenomenon of chance or randomness."3 Jack V. Michaels states that the probability of an event is
always a number between 0 and 1 and that the assignment of probabilities of occurrence to
events is axiomatic.4 This means that assignments of probabilities are assumed to be true based
on certain reasons. Probabilities are determined in two ways: a priori and a posteriori. A
priori probabilities are obtained before conducting any trials or experiments, by calculating
the number of ways an event can occur out of the total number of possible outcomes.5 A
posteriori probabilities are found through experimentation, by calculating how many times, out of
a large number of repetitions, a certain event occurs. According to Michaels, "Bayes'
Theorem employs both a priori probability and a posteriori probability to evaluate causal
hypotheses."6 Both texts discuss how the assignment of probabilities to events is determined
based on overall trends in the long run, which refers to a posteriori probabilities. Lipschutz and
Lipson state that "[the] stable long-run behavior of random phenomena form the basis of
probability theory."7

3 Lipschutz, Seymour, and Marc Lipson. "Introduction to Probability, Conditional Probability and Independence, and Random Variables." In Schaum's Outline of Probability, 59.
4 Michaels, Jack V. "Bayes' Theorem in Decision Making: Reasoning from Effect to Cause." 1987, 44.
It can be very useful to think of probability in terms of set theory. A sample
space is the set of all possible outcomes that could occur in a certain experiment, so it is natural to
think of a sample space S as a set. The subsets of S are the possible events.8 A
particular element of S is called a sample point. Elementary events are subsets of S that are
singletons, i.e., they contain only one element. Unions and intersections also make up events:
(A ∪ B) is the event that at least one of A or B occurs, and (A ∩ B) is the event that both A and B
occur. The empty set represents an impossible event, so if A ∩ B equals the empty set, the two
events are mutually exclusive: they cannot happen at the same time.
Further, one must understand the concept of conditional probability in order to
understand Bayes' Theorem. Conditional probability is the probability that one event will
happen, given that a second event has happened.9 The probability of event A occurring given that
event B has occurred is written P(A|B) and equals the probability of both A and B
occurring divided by the probability of B occurring. In other words, P(A|B) = P(A ∩ B) / P(B).
An extremely useful property used in Bayes' Theorem is the Multiplication Theorem for
Conditional Probability, which is just a rearrangement of the definition of
conditional probability: if you multiply both sides of the equation for P(A|B) by P(B), or the
equation for P(B|A) by P(A), you get P(A ∩ B) = P(A)P(B|A) = P(B)P(A|B) = P(B ∩ A).10 A
corollary to this theorem extends it to three events, or even more. The corollary
states that the probability of all three events A, B, and C occurring is equal to the product of the
probability that A occurs, the probability that B occurs given that A occurs, and the probability
that C occurs given that A and B have occurred.11 This is written out as
P(A ∩ B ∩ C) = P(A)P(B | A)P(C | A ∩ B).12

5 Michaels, Jack V. "Bayes' Theorem in Decision Making: Reasoning from Effect to Cause." 1987, 45.
6 Michaels, Jack V. "Bayes' Theorem in Decision Making: Reasoning from Effect to Cause." 1987, 44.
7 Lipschutz, Seymour, and Marc Lipson. "Introduction to Probability, Conditional Probability and Independence, and Random Variables." In Schaum's Outline of Probability, 59.
8 Lipschutz, Seymour, and Marc Lipson. "Introduction to Probability, Conditional Probability and Independence, and Random Variables." In Schaum's Outline of Probability, 60.

9 Lipschutz, Seymour, and Marc Lipson. "Introduction to Probability, Conditional Probability and Independence, and Random Variables." In Schaum's Outline of Probability, 86.
10 Lipschutz, Seymour, and Marc Lipson. "Introduction to Probability, Conditional Probability and Independence, and Random Variables." In Schaum's Outline of Probability, 87.
11 Lipschutz, Seymour, and Marc Lipson. "Introduction to Probability, Conditional Probability and Independence, and Random Variables." In Schaum's Outline of Probability, 87.
12 Lipschutz, Seymour, and Marc Lipson. "Introduction to Probability, Conditional Probability and Independence, and Random Variables." In Schaum's Outline of Probability, 89.

In class we learned about partitions of a set. We learned that if X is a set, then a partition
of X is a family of sets {Xi}, i ∈ I, such that 1) for all i ∈ I, Xi is a non-empty subset of X, 2) X is
the union of all the Xi, and 3) the Xi are pairwise disjoint. Therefore, in set theory, partitioning a set
divides it up into a collection of disjoint subsets. A diagram in Lipschutz and
Lipson's book shows a set S that is partitioned into A1, A2, and A3, along with another subset E of
S.13 In this example, E = E ∩ S because E is a subset of S, and S = (A1 ∪ A2 ∪ A3) by the
definition of a partition. Therefore, E is also equal to E ∩ (A1 ∪ A2 ∪ A3), which
itself equals (E ∩ A1) ∪ (E ∩ A2) ∪ (E ∩ A3).14 Also, each individual intersection E ∩ Ak
is disjoint from the other intersections. This works for any number of cells in the partition of a set
and is also the underlying basis for the law of total probability.
The Law of Total Probability encourages you to think of a sample space S as a set: if
you partition it into A1, A2, and A3 like we did before, those subsets are events. There also exists
another subset E of S, which is some other possible event. Just as before, each individual
intersection E ∩ Ak is disjoint from the other intersections. Therefore, P(E) = P(E ∩ A1) +
P(E ∩ A2) + P(E ∩ A3), meaning that the probability of event E occurring equals the sum of the
probabilities of event E and each individual event Ak occurring together. Moreover, by the
multiplication theorem for conditional probability, the probability of a single P(E ∩ Ak) equals
P(Ak ∩ E), which equals P(Ak)P(E | Ak). As a result, the Law of Total Probability states that if
you let E be an event in a sample space S and let A1, A2, . . . , Ak be mutually disjoint events
whose union is S, then P(E) = P(A1)P(E | A1) + P(A2)P(E | A2) + ... + P(Ak)P(E | Ak).15 This
formula is later used as a part of Bayes' formula.

13 Lipschutz, Seymour, and Marc Lipson. "Introduction to Probability, Conditional Probability and Independence, and Random Variables." In Schaum's Outline of Probability, 89.
14 Lipschutz, Seymour, and Marc Lipson. "Introduction to Probability, Conditional Probability and Independence, and Random Variables." In Schaum's Outline of Probability, 89.
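The law of total probability stated above translates directly into a few lines of Python; the following is a minimal sketch (the function name is mine):

```python
def total_probability(priors, likelihoods):
    """P(E) = sum over k of P(A_k) * P(E | A_k), for a partition A_1..A_k of S."""
    return sum(p_a * p_e_given_a for p_a, p_e_given_a in zip(priors, likelihoods))

# Example: two equally likely boxes, with P(green | box 1) = 3/8
# and P(green | box 2) = 2/3 (the setup of Question 1 below gives these numbers).
p_green = total_probability([1/2, 1/2], [3/8, 2/3])  # 3/16 + 1/3 = 25/48
```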
Now, if {A1, A2, A3, . . . , An} is a partition of some sample space S, and E is any other
event, then by the multiplication theorem for conditional probability, P(Ak ∩ E) = P(Ak)P(E | Ak).16
By the definition of conditional probability, P(Ak | E) = P(Ak ∩ E) / P(E). Combining the two,
P(Ak | E) = ( P(Ak)P(E | Ak) ) / P(E), which is the simplest version
of Bayes' Theorem. We can also use the law of total probability to get Bayes' formula, which is
slightly more complicated. This is done by replacing P(E) in the denominator of Bayes'
Theorem with the expansion of P(E) given by the law of total probability. Thus, for the same
situation as before, P(Ak | E) = ( P(Ak)P(E | Ak) ) / ( P(A1)P(E | A1) + P(A2)P(E | A2) + ... +
P(An)P(E | An) ).17 As Lipschutz and Lipson state, Bayes' formula allows you to determine the
probability that a particular one of the A's occurred, given that event E has occurred.18 More
usefully, if A1, A2, . . . , An are all possible causes of event E, then P(Ak | E) is the probability that
hypothesis Ak caused the event E, as opposed to all of the other possible causes, given that event E
has occurred.19
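Bayes' formula over a partition can be sketched as a small helper function; a minimal sketch (the function and variable names are mine):

```python
def bayes_posterior(priors, likelihoods):
    """Given P(A_k) and P(E | A_k) for a partition A_1..A_n of S,
    return the list of posteriors P(A_k | E).

    The denominator P(E) comes from the law of total probability.
    """
    joints = [p_a * p_e_given_a for p_a, p_e_given_a in zip(priors, likelihoods)]
    p_e = sum(joints)
    return [j / p_e for j in joints]

# Example: which of two equally likely boxes produced a green ball,
# with P(green | box 1) = 3/8 and P(green | box 2) = 2/3?
posterior = bayes_posterior([1/2, 1/2], [3/8, 2/3])  # [9/25, 16/25]
```

Note that the posteriors always sum to 1, since every cell of the partition is accounted for in the denominator.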

[Exercises and Solutions]


[Question 1]20
15 Lipschutz, Seymour, and Marc Lipson. "Introduction to Probability, Conditional Probability and Independence, and Random Variables." In Schaum's Outline of Probability, 89.
16 Lipschutz, Seymour, and Marc Lipson. "Introduction to Probability, Conditional Probability and Independence, and Random Variables." In Schaum's Outline of Probability, 90.
17 Lipschutz, Seymour, and Marc Lipson. "Introduction to Probability, Conditional Probability and Independence, and Random Variables." In Schaum's Outline of Probability, 90.
18 Lipschutz, Seymour, and Marc Lipson. "Introduction to Probability, Conditional Probability and Independence, and Random Variables." In Schaum's Outline of Probability, 90.
19 Michaels, Jack V. "Bayes' Theorem in Decision Making: Reasoning from Effect to Cause." 1987, 46.

Suppose that we have two identical boxes: box 1 and box 2. Box 1 contains 5 red balls and
3 green balls. Box 2 contains 2 red balls and 4 green balls. A box is selected at random and
exactly one ball is drawn from the box. Given that the selected ball is green, what is the
probability that it came from box 2?

[Solution 1]
1. Assign the following notation:
Ω = all balls
B1 = event you select box 1
B2 = event you select box 2
R = event you select a red ball
G = event you select a green ball
2. Identify given information:
P(B1) = 1/2
P(B2) = 1/2
P(G | B1) = 3/8
P(G | B2) = 4/6 = 2/3
3. Apply the information to Bayes' Theorem:
P(B2 | G) = P(B2)P(G | B2) / ( P(B1)P(G | B1) + P(B2)P(G | B2) )
= ( (1/2)(2/3) ) / ( (1/2)(3/8) + (1/2)(2/3) )
= (1/3) / (25/48)
= 16/25

20 Wayne, Hacker. "Mathematics for Business Decisions." January 1, 2007. Accessed April 5, 2015. http://dtc.pima.edu/~hacker/busmath/homework-sets/homework-set7-ltp-bayes-theoremsols.pdf.
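The arithmetic in Solution 1 can be verified numerically; a quick check (the variable names are mine):

```python
# Given information from Question 1
p_box1, p_box2 = 1/2, 1/2
p_green_given_box1 = 3/8
p_green_given_box2 = 2/3

# Bayes' Theorem: P(box 2 | green)
p_box2_given_green = (p_box2 * p_green_given_box2) / (
    p_box1 * p_green_given_box1 + p_box2 * p_green_given_box2
)  # 16/25
```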

[Question 2]21
You are selling a product in an area where 30% of the people live in the city; the rest live in the
suburbs. 20% of the city dwellers use your product and 10% of the suburbanites use your
product. What percentage of the people currently using your product are city dwellers?

[Solution 2]
1. Assign the following notation:
Ω = all of the potential customers
CD = the person is a city dweller
S = the person is a suburbanite
C = the person is a customer
N = the person is NOT a customer
2. Identify given information:
P(CD) = 0.30
P(S) = 1 - P(CD) = 1 - 0.30 = 0.70
P(C | CD) = 0.20
P(C | S) = 0.10
3. Apply the information to Bayes' Theorem:
P(CD | C) = P(CD)P(C | CD) / ( P(CD)P(C | CD) + P(S)P(C | S) )
= (0.30)(0.20) / ( (0.30)(0.20) + (0.7)(0.1) )
= 6/13

21 Wayne, Hacker. "Mathematics for Business Decisions." January 1, 2007. Accessed April 5, 2015. http://dtc.pima.edu/~hacker/busmath/homework-sets/homework-set7-ltp-bayes-theoremsols.pdf.
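As a quick numerical check of Solution 2 (the variable names are mine):

```python
# Given information from Question 2
p_cd, p_s = 0.30, 0.70                   # city dweller / suburbanite
p_c_given_cd, p_c_given_s = 0.20, 0.10   # customer rates in each group

# Bayes' Theorem: P(city dweller | customer)
p_cd_given_c = (p_cd * p_c_given_cd) / (p_cd * p_c_given_cd + p_s * p_c_given_s)
```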

[Question 3]
An inexpensive blood test can be used to test whether or not a person has a certain type of
cancer. The test is not perfect: there is a 12% chance that a person who has the cancer will falsely
test negative, and a 15% chance that a person who does not have the cancer will falsely test
positive. More accurate (and more expensive) testing has shown that the cancer is present in 8%
of the tested population. What is the probability that a person who tests negative has this type of
cancer? What is the probability that a person who tests positive has this type of cancer?

[Solution 3]
1. Assign the following notation:
Ω = all people
C = person with cancer
N = person without cancer
T(+) = the person tests positive
T(-) = the person tests negative
2. Identify given information:
P(C) = 0.08
P(N) = 1 - 0.08 = 0.92
P(T(-) | C) = 0.12
P(T(+) | C) = 1 - 0.12 = 0.88
P(T(+) | N) = 0.15
P(T(-) | N) = 1 - 0.15 = 0.85
3. Apply the information to Bayes' Theorem:
P(C | T(-)) = P(C)P(T(-) | C) / ( P(C)P(T(-) | C) + P(N)P(T(-) | N) )
= (0.08)(0.12) / ( (0.08)(0.12) + (0.92)(0.85) )
≈ 0.01213
P(C | T(+)) = P(C)P(T(+) | C) / ( P(C)P(T(+) | C) + P(N)P(T(+) | N) )
= (0.08)(0.88) / ( (0.08)(0.88) + (0.92)(0.15) )
≈ 0.3378
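Both answers in Solution 3 can be checked numerically; a quick sketch (the variable names are mine):

```python
# Given information from Question 3
p_cancer = 0.08
p_no_cancer = 1 - p_cancer
p_neg_given_cancer = 0.12                    # false negative rate
p_pos_given_cancer = 1 - p_neg_given_cancer
p_pos_given_no_cancer = 0.15                 # false positive rate
p_neg_given_no_cancer = 1 - p_pos_given_no_cancer

# Bayes' Theorem: P(cancer | negative test) and P(cancer | positive test)
p_cancer_given_neg = (p_cancer * p_neg_given_cancer) / (
    p_cancer * p_neg_given_cancer + p_no_cancer * p_neg_given_no_cancer
)
p_cancer_given_pos = (p_cancer * p_pos_given_cancer) / (
    p_cancer * p_pos_given_cancer + p_no_cancer * p_pos_given_no_cancer
)
```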

[Question 4]22
A card is drawn from a standard deck of 52 cards and discarded (i.e. not replaced). A second card
is drawn from the remaining deck of 51 cards. Given that the second card was a spade, what is
the probability that the first card was also a spade?

[Solution 4]
1. Assign the following notation:
Ω = all 52 cards
S = the first card is a spade
S(n) = the first card is not a spade
S2 = the second card is a spade
S2(n) = the second card is not a spade
2. Identify given information:
P(S) = 13/52
P(S(n)) = 1 - 13/52 = 39/52
P(S2 | S) = 12/51
P(S2 | S(n)) = 13/51
3. Apply the information to Bayes' Theorem:
P(S | S2) = P(S)P(S2 | S) / ( P(S)P(S2 | S) + P(S(n))P(S2 | S(n)) )
= ( (13/52)(12/51) ) / ( (13/52)(12/51) + (39/52)(13/51) )
= (1/17) / (1/4)
= 4/17

22 Wayne, Hacker. "Mathematics for Business Decisions." January 1, 2007. Accessed April 5, 2015. http://dtc.pima.edu/~hacker/busmath/homework-sets/homework-set7-ltp-bayes-theoremsols.pdf.
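As a quick numerical check of Solution 4 (the variable names are mine):

```python
# Given information from Question 4
p_s1 = 13/52                 # first card is a spade
p_not_s1 = 39/52
p_s2_given_s1 = 12/51        # 12 spades left among 51 cards
p_s2_given_not_s1 = 13/51    # all 13 spades still among 51 cards

# Bayes' Theorem: P(first card spade | second card spade)
p_s1_given_s2 = (p_s1 * p_s2_given_s1) / (
    p_s1 * p_s2_given_s1 + p_not_s1 * p_s2_given_not_s1
)  # 4/17
```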

[Question 5]
In professional basketball (the NBA), the number of assists is one of the
stats that reflect how well a team is playing. In the 2014-2015 regular season, among the 81 games
the Spurs had played so far, they had won 55, which is 67.9%. In 50 of those 55 winning games
their total assists were over 20, while in 66 of the 81 games overall the Spurs had total assists over
20. To find out how good the total number of assists is as an indicator of the team's performance,
let's calculate the probability of the Spurs winning given that their total assists are over 20.

[Solution 5]
1. Assign the following notation:
Ω = all games
Win = the Spurs won
Assists>20 = the Spurs had over 20 assists
2. Identify given information:
P(Win) = 55/81 = 67.9%
P(Assists>20) = 66/81
P(Assists>20 | Win) = 50/55
3. Apply the information to Bayes' Theorem:
P(Win | Assists>20) = P(Assists>20 | Win)P(Win) / P(Assists>20)
= ( (50/55)(67.9%) ) / ( 66/81 )
≈ 75.8%
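As a quick numerical check of Solution 5 (the variable names are mine); note that the 81s cancel, so the answer is exactly 50/66:

```python
# Spurs 2014-2015 data from Question 5
p_win = 55/81
p_assists_over_20 = 66/81
p_assists_over_20_given_win = 50/55

# Bayes' Theorem: P(win | assists > 20)
p_win_given_assists = p_assists_over_20_given_win * p_win / p_assists_over_20
```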

[Question 5.1]
However, of the 81 games they played, the Spurs had more than 25 assists in only 39, and in
32 of their 55 wins the Spurs had over 25 assists. The probability of
them winning when the total assists are over 25 becomes:

[Solution 5.1]
1. Assign the following notation:
Ω = all games
Win = the Spurs won
Assists>25 = the Spurs had over 25 assists
2. Identify given information:
P(Win) = 67.9%
P(Assists>25) = 39/81
P(Assists>25 | Win) = 32/55
3. Apply the information to Bayes' Theorem:
P(Win | Assists>25) = P(Assists>25 | Win)P(Win) / P(Assists>25)
= ( (32/55)(67.9%) ) / ( 39/81 )
≈ 82.0%

[Conclusion 5.1]
When the assists threshold goes up from 20 to 25, the chance of the Spurs winning goes up by 6.2 percentage points.

[Question 6]
"Make the free throws" and "don't turn it over" can be called the fundamental elements if you
ever want to win the game. However, is it really true, that when you do the basics well, i.e., keep
the turnovers (TO) below 15 and the free throw percentage (FT%) above 80%, that you are more
likely to win? Take the Spurs again as an example, in the 55 games they won, 24 of them they
achieved the goal of keeping the TO below 15 and FT% above 80%. While on average, on 41
nights out of 81 did their FT% go above 80, and on 52 nights out of 81 did they manage to keep
the TO below 15.

[Solution 6]
1. Assign the following notation:
Ω = all games
Win = the Spurs won
TO<15 = turnovers below 15
FT>80 = free-throw percentage above 80
2. Identify given information:
P(Win) = 67.9%
P(TO<15 ∩ FT>80) = (41/81)(52/81) (treating the two stats as independent)
P(TO<15 ∩ FT>80 | Win) = 24/55
3. Apply the information to Bayes' Theorem:
P(Win | TO<15 ∩ FT>80) = P(TO<15 ∩ FT>80 | Win)P(Win) / P(TO<15 ∩ FT>80)
= ( (24/55)(67.9%) ) / ( (41/81)(52/81) )
≈ 91.2%
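As a quick numerical check of Solution 6, which treats the two stats as independent when forming the denominator (variable names are mine):

```python
# Spurs data from Question 6
p_win = 55/81
p_both_given_win = 24/55
# Denominator: product of the two marginal frequencies, mirroring the
# independence assumption made in the solution above.
p_both = (41/81) * (52/81)   # FT% > 80 on 41/81 nights, TO < 15 on 52/81

# Bayes' Theorem: P(win | TO < 15 and FT% > 80)
p_win_given_both = p_both_given_win * p_win / p_both
```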

[Question 7]
The Spurs are known for their great shooters: Danny Green, Patty Mills, Matt Bonner, and
Manu (who has his ups and downs) Ginobili. During the 2014-2015 NBA season, when the 3-point
shooters of the San Antonio Spurs are feeling good, the Spurs' 3-point field goal percentage
averages about 43%, while it is around 22% when their 3-point shooters are not feeling good.
Given that their 3-point shooters are feeling good in 73% of their games and not feeling good in
27%, what is the probability that, when the Spurs make their first 3-point shot, their shooters are
feeling good that night? (To qualify as "feeling good", the Spurs' 3-point field goal percentage
needs to be above 30%.)

[Solution 7]
1. Assign the following notation:
Ω = all games
Made the shot = the Spurs made their first 3-point shot
Feeling good = the shooters are satisfied with their performance
Not feeling good = the shooters are not satisfied with their performance
2. Identify given information:
P(Feeling good) = 73%
P(Made the shot | Feeling good) = 43%
P(Made the shot | Not feeling good) = 22%
P(Made the shot) = P(Made the shot | Not feeling good)P(Not feeling good) + P(Made the shot | Feeling good)P(Feeling good)
= (22%)(27%) + (43%)(73%)
3. Apply the information to Bayes' Theorem:
P(Feeling good | Made the shot) = P(Made the shot | Feeling good)P(Feeling good) / P(Made the shot)
= ( (43%)(73%) ) / ( (22%)(27%) + (43%)(73%) )
≈ 84.1%
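As a quick numerical check of Solution 7 (the variable names are mine):

```python
# Spurs data from Question 7
p_good = 0.73
p_made_given_good = 0.43
p_made_given_bad = 0.22

# Law of total probability for the denominator P(made the shot)
p_made = p_made_given_good * p_good + p_made_given_bad * (1 - p_good)

# Bayes' Theorem: P(shooters feeling good | made the first 3-point shot)
p_good_given_made = p_made_given_good * p_good / p_made
```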

[Question 7.1]
In the 81 games the Spurs have played so far, they have won 67.9% of them, and in 44 of the
55 winning games their 3-point shooters were feeling good. Usually, I would be
called "nuts" if I saw the Spurs make their first 3-point attempt and yelled out "yep, they are going
to win," but is their chance of winning really completely unrelated to whether they made their first
3-point shot, so that the probability of winning is the same whether they make the shot or not? If,
in the last game of the regular season vs. the New Orleans Pelicans, the Spurs made their
first 3-point attempt, what is the probability of them winning the game?

[Solution 7.1]
1. Assign the following notation:
Ω = all games
Win = the Spurs won the game
Made the shot = the Spurs made their first 3-point shot
Feeling good = the shooters are satisfied with their performance
Not feeling good = the shooters are not satisfied with their performance
2. Identify given information:
P(Feeling good | Win) = 44/55 = 80%
P(Not feeling good | Win) = 1 - P(Feeling good | Win) = 1 - 80% = 20%
P(Made the shot | Win) = P(Made the shot | Feeling good)P(Feeling good | Win) + P(Made the shot | Not feeling good)P(Not feeling good | Win)
= (43%)(80%) + (22%)(20%)
3. Apply the information to Bayes' Theorem:
P(Win | Made the shot) = P(Made the shot | Win)P(Win) / P(Made the shot)
= ( ((43%)(80%) + (22%)(20%)) (67.9%) ) / ( (22%)(27%) + (43%)(73%) )
≈ 70.6%

[Conclusion 7.1]

So there is a 70.6% chance that when the Spurs make their first 3-point attempt, they are
going to win the game, which is 2.7 percentage points higher than their average winning
percentage. (P.S. They actually lost their last regular-season game, and I don't remember whether
they made their first 3-point attempt or not.)
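The chain of calculations in Solution 7.1 can be verified numerically; a quick check (the variable names are mine):

```python
# Spurs data from Questions 7 and 7.1
p_win = 55/81
p_good_given_win = 44/55
p_made_given_good = 0.43
p_made_given_bad = 0.22

# P(made the shot | win) via total probability over the shooters' mood
p_made_given_win = (p_made_given_good * p_good_given_win
                    + p_made_given_bad * (1 - p_good_given_win))
# Unconditional P(made the shot), from Solution 7
p_made = p_made_given_good * 0.73 + p_made_given_bad * 0.27

# Bayes' Theorem: P(win | made the first 3-point shot)
p_win_given_made = p_made_given_win * p_win / p_made
```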

[Bibliography]
"A History of Bayes' Theorem." - Less Wrong. August 29, 2009. Accessed April 8, 2015.
http://lesswrong.com/lw/774/a_history_of_bayes_theorem/.

Fienberg, Stephen E. "When Did Bayesian Inference Become Bayesian?" Bayesian Analysis,
2006, 1-40. Accessed April 3, 2015. http://www.stat.cmu.edu/~fienberg/fienberg-BA-06Bayesian.pdf.

Wayne, Hacker. "Mathematics for Business Decisions." January 1, 2007. Accessed April 5, 2015.
http://dtc.pima.edu/~hacker/busmath/homework-sets/homework-set7-ltp-bayes-theoremsols.pdf.


Lipschutz, Seymour, and Marc Lipson. "Introduction to Probability, Conditional Probability and
Independence, and Random Variables." In Schaum's Outline of Probability, 44-90. 2nd ed.
New York, N.Y.: McGraw-Hill, 2011.

Michaels, Jack V. "Bayes' Theorem in Decision Making: Reasoning from Effect to Cause."
1987, 44-50. http://www.value-eng.org/knowledge_bank/attachments/Bayes
Theorem in Decision Making.pdf.
