Taught by Lynda White (l.white@ic.ac.uk).
Autumn 2014
Contents

0 Introduction
  0.1 Game Theory
  0.2 Utility Theory
  0.3 Decision Theory

1 Game Theory
  1.1 Introduction
  1.2 Two-Person Zero-Sum Games
  1.3 Non-Zero-Sum Games

2 Utility
  2.1 Introduction
  2.2 The Lottery Axioms
  2.3 The Existence of a Unique Utility Function
  2.4 The Utility of Money

3 Bayesian Methods

4 Decision Theory
  4.1 Introduction
  4.2 Decision Rules
  4.3 Decision Trees
0 Introduction
0.1 Game Theory
Started in the 1920s: Borel 1921, von Neumann 1928. Developed in WWII for logistics, submarine search, and air defence. 1944 gave us von Neumann and Morgenstern's The Theory of Games and Economic Behaviour. Later on, John Nash [film: A Beautiful Mind].
0.1.1 Aim
The aim of Game Theory is to develop optimal strategies in competitive situations involving two or more
intelligent antagonists. Key points:
- Conflicting interests
- Playing rationally

Areas of application:
1. Games:
   - Chess: 2 players, no random element, perfect information
   - Monopoly:
0.2 Utility Theory
We will need to assess the value of the outcomes of any decisions we make. Not all outcomes are monetary.
We need a common scale [of value] to compare the values of, say, a kidney machine and an incubator in a
hospital. We will develop a mathematical theory based on axioms representing how a reasonable person makes
decisions. This leads to a utility function, a measure of value.
Risk = E[loss] = −E[utility]
We will look at risk aversion. How risk averse are you? Consider choosing one of the following:

1. 1 guaranteed
2. 2 if a fair coin shows H [0 otherwise]
1 Game Theory

1.1 Introduction
We assume 2 players, A and B, both are rational and greedy! Each player has available a number of strategies,
or recipes, for playing the game. A strategy for a player [sometimes called a pure strategy] involves a
complete description of all the moves that the player will make, including responses to the opponent's moves
and any random moves [e.g. using dice or coins]. I.e. a strategy is a program that can be followed mechanically
by the player. It usually comprises a sequence of moves, some of which may be random. Examples:
1. Two companies, A and B. Each produces a product. To capture more of the market each will decide to
do one of three things:
   - Spend 10% extra on advertising
   - Reduce the product price
   - Give away a free product
Let g_A(a, b) = gain to A and g_B(a, b) = gain to B when A plays a and B plays b; the game is zero-sum when g_A(a, b) + g_B(a, b) = 0 ∀ a, b. [A constant-sum game, g_A + g_B = c, is equivalent to a zero-sum game: subtract c/2 from each pay-off, e.g. g̃_B(a, b) = g_B(a, b) − c/2.]
If the gain to A + the gain to B ≠ constant, i.e. in a non-constant-sum game, there are two possibilities:
1. The players can't collaborate; there's no pre-play communication. E.g. Prisoner's Dilemma.
2. The players can collaborate for their mutual advantage. This is sometimes called bargaining.
Listing the strategies for each player and forming a table of pay-offs [or gains] is called the normal form of the
game.
Another form of the game, the extensive form, describes the game as a tree. E.g. Nim, Piles of stones. Players,
in turn, take stones from one pile only. The aim is to take the last stone. The two approaches are equivalent
with strategies being described using paths through the tree in the extensive form, see section 4.4 for an
example. We will look only at the normal form.
[End lecture 2, 14/10/14]
1.2 Two-Person Zero-Sum Games

1.2.1
Assume gain to A + gain to B = 0 for all a ∈ A_S, b ∈ B_S. A_S, B_S can be finite or infinite. A and B choose their strategies independently and simultaneously with no collaboration. If A chooses a ∈ A_S and B chooses b ∈ B_S, let:

g(a, b) = gain to A

so −g(a, b) = gain to B, or g(a, b) = loss to B. N.B. g(a, b) could be negative! g(a, b) can be monetary or simply a score, e.g. A wins ⟹ g = +1 and A loses ⟹ g = −1. If A_S = {a1, …, am} and B_S = {b1, …, bn} are finite, the game is called a finite or matrix game. In this case, we write:

gij = g(ai, bj)
Examples:

1. [A matrix-game example; its pay-off matrix is garbled in the source.]
2. Matching pennies. A and B each has a coin and simultaneously each shows a side, H or T , of her coin.
If both show the same then B pays A one unit [i.e. A wins, B loses]. Otherwise, B wins one unit from
A. a1 = H, a2 = T , b1 = H, b2 = T :
G = [  1  −1 ]
    [ −1   1 ]
3. Matching pennies with imperfect spying. Like matching pennies but after A has made a choice of H or
T [without showing it to B] a spy-coin is tossed by a third party. The spy is unreliable and:
P(spy-coin shows A's choice) = p
P(spy-coin shows opposite of A's choice) = 1 − p
Both A and B know the value of p. The result of the spy coin toss is then shown to B who then makes
a choice. A has 2 pure strategies: a1 = H; a2 = T . B has 4 pure strategies: a strategy is a map
{spy-coin tosses} → {H, T}. We can label these strategies:
(H, H)
(H, T )
(T, H)
(T, T )
Here, the first entry in each ordered pair tells B how to play if the spy-coin shows heads and the second
entry tells B how to play if the spy-coin shows tails. E.g. (H, H) tells B to play H in either case. (H, T )
tells B to copy the spy-coin.
Let's take a look at the pay-off matrix:

           (H,H)   (H,T)   (T,H)   (T,T)
a1 = H  [    1     2p−1    1−2p     −1  ]
a2 = T  [   −1     2p−1    1−2p      1  ]

E.g. g(a1, (H,T)) = p·1 + (1 − p)·(−1) = 2p − 1.
[Example with a random variable T; the setup is lost in the source.]

g(a, b) = E_T(gain to A)
        = { 1·P(T < a) + (−1)·P(a < T < b),  a < b
          { 1·P(b < T < a) + (−1)·P(T < b),  a > b
        = { 2a − b − 4,   a < b
          { a − 2b + 4,   a > b       [N.B. a ≠ b]
1.2.2
Consider player A and pure strategy a ∈ A_S. We know that A cannot get less than:

inf_{b∈B_S} g(a, b)

[Recall that inf = greatest lower bound. In a finite game, this inf is simply the smallest entry in row a.] A cannot guarantee getting more than this. Hence:

L = sup_{a∈A_S} inf_{b∈B_S} g(a, b)

is the upper limit to what A can guarantee getting. [sup = least upper bound. In a finite game, L = max_a min_b g(a, b).]

Similarly, consider B and a pure strategy b ∈ B_S: A can get at most sup_{a∈A_S} g(a, b). In a finite game, this is the largest entry in column b. B can't guarantee losing less if they play b. B wants to choose b to minimise this guarantee level, so:

U = inf_{b∈B_S} sup_{a∈A_S} g(a, b)
Definition: a⁰ ∈ A_S is a pure maximin strategy for A if inf_{b∈B_S} g(a⁰, b) = L; similarly, b⁰ ∈ B_S is a pure minimax strategy for B if sup_{a∈A_S} g(a, b⁰) = U.

Lemma 1. L ≤ U.

Definition: If L = U then their common value is called the pure value of the game.
Example:

              b1  b2  b3   row min.
  a1    [      1   3   2 ]     1
  a2    [      5   4   6 ]     4
  a3    [      3   2   4 ]     2
col. max.      5   4   6

Largest row min. = smallest col. max. = 4. Pure value = 4. A should play a2 and B should play b2.
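The row-min/column-max check above is easy to do mechanically. A minimal sketch (the helper name is ours, not from the notes):

```python
# Locate a pure-strategy saddle-point: an entry that is the smallest in its
# row and the largest in its column. Returns (row, col, value) or None.
def find_pssp(G):
    for i, row in enumerate(G):
        for j, g in enumerate(row):
            col = [G[k][j] for k in range(len(G))]
            if g == min(row) and g == max(col):
                return i, j, g
    return None

G = [[1, 3, 2],
     [5, 4, 6],
     [3, 2, 4]]
print(find_pssp(G))  # (1, 1, 4): A plays a2, B plays b2, pure value 4
```

A game without a PSSP simply returns None.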
If such a⁰, b⁰ exist [as in this example] we say that the game has a pure strategy saddle-point at (a⁰, b⁰), with value g(a⁰, b⁰). We sometimes say that a⁰ and b⁰ are good strategies for A and B respectively. We use the abbreviation PSSP. If a game has a PSSP [i.e. an element of the pay-off matrix that is the smallest in its row and largest in its column] then we have solved the game.
Definition: A pure strategy a ∈ A_S is said to be dominated by another pure strategy a′ ∈ A_S if:

g(a, b) ≤ g(a′, b) ∀ b ∈ B_S

[I.e. a′ is always at least as good as a, whatever B plays.] Similarly, b ∈ B_S is dominated by another pure strategy b′ ∈ B_S if g(a, b) ≥ g(a, b′) ∀ a ∈ A_S.
Warning: It is possible for a game to have a PSSP at a dominated strategy! Exempli gratia:

         b1
a1  [     3   …
a2  [     3   …
a3  [     1   …

[Only the first column of this example survives in the source.]
Exercises

1. Show that if a game has two PSSPs, given by (a, b) and (a′, b′), then g(a, b) = g(a′, b′).
2. Show that if a game has a unique PSSP, given by (a⁰, b⁰), then a⁰ is not dominated by any other a′ ∈ A_S, with a similar result for B. [Hint: WLOG let (a1, b1) be a unique PSSP in a finite game. Suppose, for a contradiction, that a2 dominates a1. Show that (a2, b1) also gives a PSSP.]
If A has a pure strategy, a, that is dominated by a pure strategy a′ ∈ A_S then we delete the row of the pay-off matrix corresponding to a. We may lose a PSSP by deleting a row [or column] like this, but it will not affect the existence [or otherwise] of a PSSP in the game. See 2. above. It is good practice to delete dominated rows and columns and to look for PSSPs.
Not all games have a PSSP. Here's an example of one without [L = 4, U = 5]:

         b1  b2  b3
a1  [     1   ·   · ]
a2  [     5   4   6 ]
a3  [     3   7   4 ]

[a1's remaining entries are illegible in the source; a1 is dominated and can be deleted.] We can now delete b3 [compare it with b1] to give:

         b1  b2
a2  [     5   4 ]
a3  [     3   7 ]
[End lecture 4, 16/10/14]
They walk towards each other. Each must decide when [i.e. at what distance] they should fire. Rules:
1. A duellist who hits the other one wins 1 point and the loser loses 1 point
2. If a duellist fires and misses, the other can walk up and shoot at point blank range, therefore winning
3. If both fire at the same time and both hit or both miss, each gets a pay-o of 0
This is a two-person zero-sum game. So the pure strategies for X and Y are X_S = Y_S = [0, 1]. Let X fire at a distance x apart; let Y fire at a distance y apart; x, y ∈ [0, 1].
We'll assume P(X hits when distance apart is x) = p1(x) and P(Y hits when distance apart is y) = p2(y). Both X and Y know the functions p1(x) and p2(y); p1(x) and p2(y) are monotonic decreasing and continuous, with p1(0) = p2(0) = 1. The expected pay-offs to X depend on whether:

X fires first       x > y
Y fires first       x < y
Both fire together  x = y

We get:

g(x, y) = { 1·P(X hits) + (−1)·P(X misses),                        x > y
          { 1·P(Y misses) + (−1)·P(Y hits),                        x < y
          { 1·P(X hits)·P(Y misses) + (−1)·P(X misses)·P(Y hits),  x = y

Using P(X hits) = p1(x) and P(Y hits) = p2(y):

g(x, y) = { 2p1(x) − 1,     x > y
          { 1 − 2p2(y),     x < y
          { p1(x) − p2(x),  x = y
Consider the pure strategy ⟨d⟩, that is, to fire at distance apart d, where p1(d) + p2(d) = 1. [∃ such a solution.] N.B. d satisfies p1(d) = 1 − p2(d), i.e. P(X hits) = P(Y misses) and vice-versa. Suggest X and Y both play this pure strategy ⟨d⟩. We find that (⟨d⟩, ⟨d⟩) gives a PSSP. To see this, calculate:

g(x, ⟨d⟩) = { 2p1(x) − 1,     x > d
            { 1 − 2p2(d),     x < d
            { p1(d) − p2(d),  x = d

Note that for x > d we have 2p1(x) − 1 ≤ 2p1(d) − 1 = p1(d) − p2(d), and 1 − 2p2(d) = p1(d) − p2(d). Also p1(d) − p2(d) = g(⟨d⟩, ⟨d⟩), so g(x, ⟨d⟩) ≤ g(⟨d⟩, ⟨d⟩) for all x [and similarly for Y], and the pure value of the game is p1(d) − p2(d).
1.2.3 Randomised Strategies
If ∃ a PSSP, it's clear to both players what they need to do; we're home and dry. If ∄ a PSSP then we will find that it is beneficial to introduce randomised strategies. Consider the following example:

        b1  b2
a1  |    3   4
a2  |    5   2

No PSSP. Suppose the game is to be played many times and suppose A always plays a1. Then B would always play b1 and A would get 3. If A always plays a2 then B would play b2 and A would get 2. I.e. if A always sticks to one pure strategy, they can't get more than 3 = max(2, 3). With randomised strategies A can do better than 3; in fact, they can get 3½ on average. Also, randomised strategies introduce an element of surprise.
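The claim that A can do better than 3 can be checked by a grid search over A's mixing probability (a sketch; the function names are ours):

```python
# A mixes a1 with probability p. Against each pure reply of B the expected
# pay-off is linear in p; A's guarantee is the smaller of the two, and
# maximising that guarantee over p gives 3.5, beating the pure-strategy 3.
G = [[3, 4],
     [5, 2]]

def guarantee(p):
    vs_b1 = p * G[0][0] + (1 - p) * G[1][0]   # 5 - 2p
    vs_b2 = p * G[0][1] + (1 - p) * G[1][1]   # 2 + 2p
    return min(vs_b1, vs_b2)

best = max(range(1001), key=lambda k: guarantee(k / 1000)) / 1000
print(best, guarantee(best))  # p = 0.75 gives guarantee 3.5
```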
Definition: A randomised [or mixed] strategy for A is a probability distribution over A_S, A's pure strategies [with a similar definition for B].

If the game is finite and A_S = {a1, …, am}, B_S = {b1, …, bn}, then a randomised strategy for A is a set of probabilities α = (p1, …, pm) where pi ≥ 0 and Σi pi = 1, such that A chooses pure strategy ai with probability pi. Similarly β = (q1, …, qn) where qi ≥ 0, Σi qi = 1. Sometimes we write α = p1 a1 + p2 a2 + ⋯ + pm am or α = p1⟨a1⟩ + ⋯ + pm⟨am⟩.

N.B. Pure strategies are just special cases of randomised strategies. E.g. the pure strategy a1 corresponds to α := (1, 0, …, 0) =: (p1, p2, …, pm).
A and B don't reveal their [randomised] strategies to each other pre-play. If A chooses α and B chooses β then the pay-off to A, in a finite game, is:

g(α, β) = E[gain to A] = Σi Σj gij pi qj

Here G = (gij) is the pay-off matrix. Notice that the term pi qj occurs because the players choose their randomised strategies independently. Since the strategy choices are independent, this can also be written as:

Σi pi ( Σj gij qj ) = Σi pi g(ai, β)

[For infinite games the sums are replaced by integrals: g(α, β) = ∫∫ g(a, b) dα(a) dβ(b).]
Question 1

Part I

         b1  b2  b3  b4
a1  |     a   a   b   b
a2  |     c   d   c   d
a3  |     c   e   c   e

WLOG a ≤ b. If we look at column b3 and compare it with b1 we see that b3 is dominated by b1, and similarly b4 is dominated by b2. This leaves us with the following:

         b1  b2
a1  |     a   a
a2  |     c   d
a3  |     c   e

WLOG let's take d ≤ e. So a2 is dominated by a3. Notice that we have not gained any PSSPs along the way. We now have:

         b1  b2
a1  |     a   a
a3  |     c   e

WLOG c ≤ e. The second strategy for B in this 2 × 2 game is dominated by the first. We are then left with:

         b1
a1  |     a
a3  |     c
Part II

G = [ 1  2  2 ]
    [ 0  2  1 ]
    [ 0  0  3 ]
Question 2

Part I

WLOG we can assume the pay-offs are 1, 2, 3, 4 in some order and that g(a1, b1) = 1. We can write down the 6 possible pay-off matrices:

[ 1  2 ]  [ 1  2 ]  [ 1  3 ]  [ 1  3 ]  [ 1  4 ]  [ 1  4 ]
[ 3  4 ]  [ 4  3 ]  [ 2  4 ]  [ 4  2 ]  [ 2  3 ]  [ 3  2 ]
Part II

Consider a 3 × 3 game.

P(PSSP) = Σ_{(i,j)} P(PSSP in cell (i, j))

Since the pay-offs are distinct [if there were more than one PSSP, two entries in the pay-off matrix would be the same, contradicting distinctness], we have the union of 9 disjoint events. Consider:

P(PSSP in cell (i, j) | entries in the cells in row i and col. j consist of, say, a, b, c, d, e) = 4/5!

E.g. (i, j) is the middle cell. The entry in cell (i, j) must be the middle value in {a, b, c, d, e}; the 4 remaining values must be placed such that the 2 largest values go in row i and the 2 smallest values go in column j [2 orders in each case]. ∃ 5! possible arrangements, of which 2 × 2 = 4 are favourable.

This conditional probability is the same for all selections of 5 symbols out of the 9, so unconditionally P(PSSP in cell (i, j)) = 4/5!, and so:

P(PSSP) = 9 × 4/5! = 3/10
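The answer 3/10 can be sanity-checked by simulation (a sketch; continuous random draws have distinct entries with probability 1):

```python
# Estimate the probability that a random 3x3 matrix with distinct entries
# has a pure-strategy saddle-point; the exact answer derived above is 3/10.
import random

def has_pssp(G):
    return any(G[i][j] == min(G[i]) and G[i][j] == max(row[j] for row in G)
               for i in range(3) for j in range(3))

random.seed(0)
trials = 100_000
hits = sum(has_pssp([[random.random() for _ in range(3)] for _ in range(3)])
           for _ in range(trials))
print(hits / trials)  # close to 0.3
```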
Question 3

Assume that the volume of available business is proportional to the population and that between them the two companies take all of the business [so it becomes a constant-sum game]. Pay-offs are the Large Company's percentage of the business:

                        Small Company
                      W    X    Y    Z
  Large    W   |     60   48   56   64
  Company  X   |     72   60   64   68
           Y   |     64   56   60   72
           Z   |     56   52   48   60

E.g. g(X, Y) = (0.8 × 20) + (0.8 × 40) + (0.4 × 20) + (0.4 × 20) = 64, since L is closer to X and W but farther from Y and Z than S is. ∃ a PSSP at (X, X) and L can expect to get 60% of the business. Constant-sum game.

When the population distribution changes, let p = 3q. The population distribution becomes:

W: 20(1 + 2q)%    X: 40(1 − 3q)%    Y: 20(1 + 2q)%    Z: 20(1 + 2q)%

N.B. the sum is 100.
If p = 3/8 [i.e. q = 1/8] then (X, X), (X, Y), (Y, X) and (Y, Y) all become PSSPs:
                        Small Company
                       W         X         Y         Z
  Large    W   |      60      48+16q    56−8q    64−32q
  Company  X   |    72−16q      60      64−32q   68−24q
           Y   |    64+8q     56+32q      60     72−16q
           Z   |    56+32q    52+24q    48+16q     60

[If the stores can locate anywhere between W and Z, then ∃ a PSSP at (X, X).]
Extra Solutions
Let's also answer the exercises posed in lecture 4:

1. If a game has 2 PSSPs then they have the same pay-off.

   WLOG ∃ PSSPs at distinct locations in the pay-off matrix at (a1, b1) and (a2, b2). If these are in the same row or column they must have the same pay-off. If not, then we have:

   g(a1, b1) ≤ g(a1, b2) ≤ g(a2, b2) ≤ g(a2, b1) ≤ g(a1, b1)

   [each inequality because (a1, b1) or (a2, b2) is a PSSP: smallest in its row and largest in its column], so all four pay-offs are equal.

2. If (a1, b1) is a PSSP and a2 dominates a1, show that (a2, b1) also gives a PSSP. [Hint: show g(a1, b1) = g(a2, b1).]

   Clearly g(a2, b1) ≥ g(a1, b1) [a2 dominates a1] and g(a1, b1) ≥ g(a2, b1) [(a1, b1) is a PSSP, so g(a1, b1) is largest in its column]. So g(a1, b1) = g(a2, b1), and g(a2, b1) ≥ g(a, b1) ∀a. Also, g(a2, b1) = g(a1, b1) ≤ g(a1, b) ∀b [as (a1, b1) is a PSSP] ≤ g(a2, b) ∀b [a2 dominates a1]. So g(a2, b1) is the smallest in its row and largest in its column.
[End lecture 6, 23/10/14]
We only consider the cases where pairs of randomised strategies have finite pay-offs. Let:

A* = {α : g(α, β) is finite ∀ β}
B* = {β : g(α, β) is finite ∀ α}
Example:

        b1  b2
a1  [    3   4 ]        Let α = (1/3, 2/3), β = (3/4, 1/4)
a2  [    5   1 ]

g(α, β) = 3·(1/3)(3/4) + 4·(1/3)(1/4) + 5·(2/3)(3/4) + 1·(2/3)(1/4)
        = (1/3)[3·(3/4) + 4·(1/4)] + (2/3)[5·(3/4) + 1·(1/4)]
        = 15/4
What are the analogues of L, U when we allow random strategies?

Lemma 2. For β ∈ B*, sup_{α∈A*} g(α, β) = sup_{a∈A_S} g(a, β) [and similarly, for α ∈ A*, inf_{β∈B*} g(α, β) = inf_{b∈B_S} g(α, b)].

Proof sketch: g(α, β) = Σi pi g(ai, β) ≤ Σi pi sup_{a∈A_S} g(a, β) = sup_{a∈A_S} g(a, β) for every α, so sup_{α∈A*} g(α, β) ≤ sup_{a∈A_S} g(a, β); the reverse inequality holds because A_S ⊆ A*.

Definition: α* is a maximin strategy for A if inf_β g(α*, β) = sup_α inf_β g(α, β), or equivalently [by lemma 2] such that inf_b g(α*, b) = sup_α inf_b g(α, b). The RHS is called the lower value of the game and is denoted by V_L [compare this with L, where we only look at pure strategies]. Clearly sup_α inf_b g(α, b) ≥ sup_a inf_b g(a, b) [A_S ⊆ A*], i.e. V_L ≥ L.

Similarly, a minimax strategy for B is a randomised strategy β* such that sup_α g(α, β*) = inf_β sup_α g(α, β). The RHS = upper value of the game = V_U. We showed [lemma 1] that L ≤ U. The same argument shows V_L ≤ V_U. We also have V_U ≤ U, so we have L ≤ V_L ≤ V_U ≤ U.
If V_L = V_U then their common value, denoted by V, is called the value of the game, and (α*, β*) define a saddle-point of the game [where α* represents the maximin and β* a minimax] and we have:

inf_β g(α*, β) = sup_α g(α, β*) [= g(α*, β*)]

[Compare this with a PSSP.] If α* and β* define a saddle-point then we say that α* and β* are good strategies for A and B respectively. In this case, the quantity V = value = amount A should pay B for the privilege of playing the game. V = 0 defines a fair game.
Let's find a minimax strategy for B. [We'll see a better method later.] Example:

        b1  b2
a1  [    1   3 ]
a2  [    2   1 ]

L = 1, U = 2 ⟹ V_L, V_U ∈ [1, 2].

Let β = (q1, q2) where qi ≥ 0 and q1 + q2 = 1. We will find q1 and q2 to make β minimax for B. Calculate:

g(a1, β) = q1 + 3q2 = 3 − 2q1
g(a2, β) = 2q1 + q2 = 1 + q1

We need to choose q1 so that the larger of these two, 3 − 2q1 and 1 + q1, is minimised. [N.B. we only need to consider g(a1, β) and g(a2, β) by lemma 2.] The two lines cross where 3 − 2q1 = 1 + q1: q1 = 2/3, q2 = 1/3. So β* = (2/3, 1/3) is minimax and V_U = 5/3.

Similarly, we find α* = (1/3, 2/3) is maximin for A and V_L = 5/3. As these two are equal, the game has value 5/3. This is a very long-winded method, and later on we'll see a better method of solving 2 × 2 games.
Note if AS and/or BS are/is not finite, then there is no guarantee that the suprema and infima above are
actually attained by any randomised strategies.
Definition: If a game has a value and if the maximin and minimax strategies for A, B respectively exist, then
the game is called strictly determined. We will prove later that all finite games are strictly determined.
All perfect information games have a PSSP [including chess] but we won't prove this.
To solve a game means to find VL , VU and maximin and minimax strategies for A, B when these exist. The
following result enables us to check whether a guessed solution to a game is actually a solution.
[End lecture 7, 28/10/14]
In the class on Thursday 6th November, we'll go over questions 4, 7, 12, and 11.
The following lemma tells us how to check a guessed solution to a game is actually a solution:

Lemma 3. If α*, β* are randomised strategies for A, B respectively such that for all a ∈ A_S, b ∈ B_S we have:

g(a, β*) ≤ g(α*, b)

then the game has value V = g(α*, β*), and α*, β* are maximin and minimax respectively.

Proof.

V_U = inf_β sup_α g(α, β) ≤ sup_α g(α, β*) = sup_a g(a, β*) [by lemma 2]
    ≤ inf_b g(α*, b) [by hypothesis] = inf_β g(α*, β) ≤ sup_α inf_β g(α, β) = V_L

Hence V_U ≤ V_L. But we know that V_L ≤ V_U, so V_L = V_U [= V, say], which is the value of the game:

V = sup_α g(α, β*) = inf_β g(α*, β)

[because the inequalities above have become equalities]. Hence α* is maximin and β* is minimax. Finally, since g(α*, β*) lies between the terms in the above equation, we have V = g(α*, β*).
Examples:

1.       b1  b2  b3
   a1  [  1   3   3 ]
   a2  [  2   1   2 ]

   This is the earlier 2 × 2 example with an extra column b3. Guess α* = (1/3, 2/3) and β* = (2/3, 1/3, 0). Then g(a1, β*) = g(a2, β*) = 5/3 and g(α*, b1) = g(α*, b2) = 5/3, g(α*, b3) = (1/3)·3 + (2/3)·2 = 7/3. So g(a, β*) ≤ g(α*, b) for all a, b and, by lemma 3, V = 5/3.

2. [A continuous game on [0, 1] with g(a, b) = |a − b|.] Guess α* = ½⟨0⟩ + ½⟨1⟩ and β* = ⟨½⟩:

   g(a, β*) = g(a, ⟨½⟩) = |a − ½| ≤ ½,  a ∈ [0, 1]
   g(α*, b) = ½g(⟨0⟩, b) + ½g(⟨1⟩, b) = ½(b + 1 − b) = ½

   So g(a, β*) ≤ g(α*, b) for all a, b: by lemma 3, α* is maximin, β* is minimax and V = ½.
Army A can attack one and only one target. Army B must choose one target to defend, but any target connected immediately to that chosen target is also automatically defended. E.g. if B chooses 2 then 1, 3 and 4 are also defended. [The target graph (figure lost) has edges 1–2, 2–3, 2–4 and 4–5.] If A attacks a defended target he/she loses a point [i.e. receives −1] and B receives +1. If A attacks an undefended target then A receives +1 and B receives −1. This is a zero-sum game [always check]!

                    B defends
                1    2    3    4    5
  A        1 [ −1   −1    1    1    1 ]
  attacks  2 [ −1   −1   −1   −1    1 ]
           3 [  1   −1   −1    1    1 ]
           4 [  1   −1    1   −1   −1 ]
           5 [  1    1    1   −1   −1 ]

Delete the inadmissible a2 by comparing it with a1; similarly delete a4 by comparing it with a5. Delete b1 by comparing it with b2; delete b3 by comparing it with b2.
[End lecture 8, 29/10/14]
Doing this yields the following matrix:

                B defends
                 2    4    5
  A        1 [ −1    1    1 ]
  attacks  3 [ −1    1    1 ]
           5 [  1   −1   −1 ]
4 and 5 are the same so we can delete one; 1 and 3 are the same so we can also delete one of those. This yields:
              B defends
                2    4
  A        3 [ −1    1 ]
  attacks  5 [  1   −1 ]

Sub-game, S: (½, ½) for each player is a solution. In the whole game, β* = (0, ½, 0, ½, 0) [and similarly α* = (0, 0, ½, 0, ½)]. Value? The sub-game [[−1, 1], [1, −1]] has value 0, so V = 0.
Definition: α is an equaliser strategy (ES) for A if g(α, b) is the same for every b ∈ B_S [similarly for B]. E.g.:

        b1  b2
a1  [    0   2 ]
a2  [    1   0 ]

Here α* = (1/3, 2/3) and β* = (2/3, 1/3) are equaliser strategies and V = 2/3. Now append a column b3 = (2, 1):

        b1  b2  b3
a1  [    0   2   2 ]
a2  [    1   0   1 ]

We would find that α* = (1/3, 2/3) would still be maximin and β* = (2/3, 1/3, 0) would still be the minimax [check using lemma 3], but α* would no longer be an equaliser strategy because g(α*, b3) = 4/3, i.e. a maximin strategy is not necessarily an equaliser strategy [same for minimax]. Likewise, an equaliser strategy is not necessarily maximin or minimax.
Exempli gratia:

        b1  b2
a1  [    2   4 ]
a2  [    2   4 ]

Every randomised strategy β for B is an equaliser strategy [g(a1, β) = g(a2, β) for every β], but only β = (1, 0) is minimax.
Lemma 4. If α* is an ES for A and β* is an ES for B, then α* is maximin, β* is minimax, and the game has value V = g(α*, β*).

Proof sketch: α* is an ES ⟹ g(α*, β) = g(α*, β*) ∀β, so inf_β g(α*, β) = g(α*, β*); β* is an ES ⟹ g(α, β*) = g(α*, β*) ∀α, so sup_α g(α, β*) = g(α*, β*). By definition V_U ≤ sup_α g(α, β*) = g(α*, β*) = inf_β g(α*, β) ≤ V_L; because the reverse inequality V_L ≤ V_U trivially holds, all are equal, so α* is maximin and β* is minimax. [Exercise: check the details.]
1.2.6 Solving 2 × 2 Games

        b1  b2
a1  [    x   y ]
a2  [    z   w ]

Assume the game has no PSSP [so either x < y and z > w, or x > y and z < w]. Let:

α* = ( |z − w| / (|x − y| + |z − w|),  |x − y| / (|x − y| + |z − w|) )

N.B. the denominator ≠ 0, and α* is a proper vector of probabilities. Firstly, notice that α* represents a randomised strategy. Secondly, suppose x < y and z > w. Then:

g(α*, b1) = (x|z − w| + z|x − y|) / (|x − y| + |z − w|)
          = (x(z − w) + z(y − x)) / (|x − y| + |z − w|)
          = (zy − wx) / (|x − y| + |z − w|)

and

g(α*, b2) = (y|z − w| + w|x − y|) / (|x − y| + |z − w|)
          = (zy − wx) / (|x − y| + |z − w|)

So α* is an equaliser strategy for A if x < y and z > w. If x > y and z < w we find, in a similar fashion, that g(α*, b1) = g(α*, b2) = (zy − wx) / (|x − y| + |z − w|). In both cases α* is an ES for A. Similarly:

β* = ( |y − w| / (|y − w| + |x − z|),  |x − z| / (|y − w| + |x − z|) )

is an ES for B. So the game has equaliser strategies for A and for B and, by lemma 4, α* is maximin, β* is minimax and V = (zy − wx) / (|x − y| + |z − w|). Exercise: compare with the result for the airfields example in section 1.2.5.
The G⁻¹ Method

If m = n then G, the pay-off matrix, is square. If G⁻¹ exists we can use it to check if the game has a simple solution [a pair of equaliser strategies]. If the game has a simple solution this method gives it.

Let G = (gij) be the n × n pay-off matrix. Calculate G⁻¹ if it exists. If G⁻¹ does not exist, try adding a constant to each entry of G to give an invertible matrix. [The game-theory problem will be unaffected. This will work, i.e. ∃ such a constant, provided the rows of G together with the row vector of all ones span Rⁿ.]

Assume that G⁻¹ exists. We will look for a pair of ESs [one for A, one for B]. Let:

α = (p1, …, pn),  β = (q1, …, qn)

α is an ES if g(α, bj) = Σᵢ pi gij = k ∀j, i.e. αG = k1 where 1 = (1, …, 1), so α = k1G⁻¹. Similarly, β is an ES if Gβᵀ = k′1ᵀ, so βᵀ = k′G⁻¹1ᵀ.

We need to show that these two, α and β, represent proper randomised strategies. This will be so provided the entries of α are all the same sign [similarly for β; 0 is allowed]:

α consists of k × (the row vector of the column sums of G⁻¹)
βᵀ consists of k′ × (the vector of the row sums of G⁻¹)

So for these expressions to be randomised strategies, G⁻¹ must have all its row sums with the same sign [or 0] and all its column sums with the same sign [or 0]. If this property does not hold then ∄ a simple solution. If these conditions are met, normalising so the probabilities sum to 1:

α = 1G⁻¹ / (sum of all entries of G⁻¹)
βᵀ = G⁻¹1ᵀ / (sum of all entries of G⁻¹)
V = k = 1 / (sum of all entries of G⁻¹)
Example: G is a 3 × 3 matrix with first column (5, 7, 3) [the remaining entries are illegible in the source]. G⁻¹ comes out with a positive factor 1/185, which we can ignore when checking signs. We add the row and column sums outside the G⁻¹ matrix: they are all positive, so the game has a simple solution:

α* = (2/17, 5/51, 40/51),  β* = (11/17, 14/51, 4/51),  V = 185/51 = 1 / (sum of entries of G⁻¹)
Warning: this is not necessarily the best method. Unless G is very easy to invert it is often better to seek
equaliser strategies by writing down the relevant equations.
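The G⁻¹ method can be carried out with exact arithmetic; a sketch (the helper names are ours), checked on the 2 × 2 game with value 5/3 solved earlier:

```python
# G^{-1} method: invert G by Gauss-Jordan elimination over the rationals,
# then alpha* = (column sums of G^{-1}) / (sum of all entries),
# beta* = (row sums) / (sum), V = 1 / (sum of all entries of G^{-1}).
# Assumes G is invertible and a simple solution exists (sums of one sign).
from fractions import Fraction as F

def invert(G):
    n = len(G)
    A = [[F(G[i][j]) for j in range(n)] + [F(int(i == j)) for j in range(n)]
         for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                A[r] = [a - A[r][col] * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]

def g_inverse_method(G):
    n = len(G)
    Ginv = invert(G)
    total = sum(sum(row) for row in Ginv)
    alpha = [sum(Ginv[i][j] for i in range(n)) / total for j in range(n)]
    beta = [sum(row) / total for row in Ginv]
    return alpha, beta, 1 / total

alpha, beta, V = g_inverse_method([[1, 3], [2, 1]])
print(alpha, beta, V)  # [1/3, 2/3], [2/3, 1/3], 5/3
```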
1.2.8 Equilibrium Pairs

Definition: (α*, β*) is an equilibrium pair if:

g(α, β*) ≤ g(α*, β*) ≤ g(α*, β)  ∀ α ∈ A*, β ∈ B*
Lemma 5. If α*, β* are maximin and minimax and if the game has a value, then V = g(α*, β*) and (α*, β*) is an equilibrium pair.

Proof.

V_L = inf_β g(α*, β) [α* is maximin] ≤ g(α*, β*) ≤ sup_α g(α, β*) = V_U [β* is minimax]

Since the game has a value, we know V_L = V_U = V, so everything in the line above is equal and V = g(α*, β*). Now to show (α*, β*) is an equilibrium pair:

g(α, β*) ≤ sup_α g(α, β*) [= g(α*, β*)] and g(α*, β*) [= inf_β g(α*, β)] ≤ g(α*, β)

so (α*, β*) is an equilibrium pair.
1.2.9 S-Games
Pre-amble

If we have a set of points x1, …, xn ∈ Rᵐ we can form their convex hull, C(x1, …, xn), which consists of all points of the form q1x1 + q2x2 + ⋯ + qnxn where qi ≥ 0 ∀i and Σⁿᵢ₌₁ qi = 1.

E.g. with m = 2, imagine a rubber band closing in on the points. [Figure: four points whose convex hull is a triangle and its interior, C(x1, x2, x3, x4).]

In general, C(x1, …, xn) is a convex set, i.e. for any 2 points in C, the line segment joining them is also in C. [Exercise.]
Separation theorem: if C1 and C2 are disjoint convex sets in Rᵐ, then ∃ a hyperplane a·x = k [of dimension m − 1] which separates the two convex sets. Here x = (x1, …, xm) is a point in Rᵐ. E.g. with m = 2 one can obviously always draw a line [of dimension 1 = (2 − 1)] that separates the two disjoint convex sets C1 and C2:

a·x > k, x ∈ C2;  a·x < k, x ∈ C1

In our diagram, C1 and C2 include their boundaries. However, not all convex sets include their boundaries, e.g. {(x1, x2) ∈ R² : x1² + x2² < 1}. We could have 2 disjoint convex sets, C1 and C2, where C2 includes its boundary but C1 does not, but one of the boundary points of C1 is in C2; then only the weak inequalities hold:

a·x ≥ k, x ∈ C2;  a·x ≤ k, x ∈ C1 ∪ (boundary of C1)
Exempli gratia:

G = [ 3  2  1  2 ]
    [ 4  1  5  3 ]

Each pure strategy bj for B corresponds to the point (g(a1, bj), g(a2, bj)) ∈ R². Let β = (2/3)b1 + (1/3)b2: β lies on the line segment joining b1 and b2 and lies in C(b1, b2, b3, b4). [Any other β ∈ B* will also lie in the convex hull of b1, b2, b3, b4.] More generally, β = q1b1 + ⋯ + qnbn [qi ∈ [0, 1], Σ qi = 1] lies in the hull, with weights q1, …, qn at b1, …, bn respectively.
Let S be the set of all points in Rm that represent randomised strategies for B. S is called the risk set for B.
[A can have a risk set. If we want to distinguish between the 2 risk sets, we call them S(B) and S(A)].
S [for B] is a convex set. It's the convex hull of the points in Rᵐ that represent the pure strategies for B. In our example, S is the triangle with vertices b1, b2, b3 and its interior. If m = 2, S is a convex polygon and its interior. In higher dimensions, if m > 2, S is a convex multi-dimensional polyhedron and its interior.
B must choose a point in S as her randomised strategy.
What about A? A typical randomised strategy for A is α = (α1, …, αm) where αi ∈ [0, 1] and Σᵐᵢ₌₁ αi = 1. This defines a set of hyperplanes in Rᵐ:

H_k : α1x1 + ⋯ + αmxm = k,  k a constant

So we have a set of hyperplanes {H_k}. As k varies the H_k form a set of parallel hyperplanes. Example [m = 2] [the arrow indicates the direction, which follows from k increasing]: [figure]. The pay-off to A is g(α, β) = α·s, where s ∈ S is the point representing β in S [as long as H_k intersects S].

N.B. all points on the same hyperplane have the same pay-off [namely k]. Think of this as a new game in which B's randomised strategies are points of S, the risk set, and A's pure strategies are the m coordinates in Rᵐ. The pay-off to A is α·s. This is called an S-game.
[Figure: the risk set for G = [3 2 1 2; 4 1 5 3], with β = (2/3)b1 + (1/3)b2 = the point (8/3, 3) marked.]
Question 4

R = rock, S = scissors, P = paper.

         R   P   S
 R  [    0  −1   1 ]
 P  [    1   0  −1 ]
 S  [   −1   1   0 ]

α* = (1/3, 1/3, 1/3) is maximin and β* = (1/3, 1/3, 1/3) is minimax. V = 0.
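A quick check that the uniform mix equalises every column of the rock-paper-scissors matrix, so lemma 4 applies and V = 0:

```python
# Expected pay-off to A against each pure column when A plays (1/3, 1/3, 1/3).
G = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]
alpha = (1/3, 1/3, 1/3)
payoffs = [sum(alpha[i] * G[i][j] for i in range(3)) for j in range(3)]
print(payoffs)  # [0.0, 0.0, 0.0]
```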
Question 7

Zero-sum game because all targets are equally valuable to both sides.

G = [pay-off matrix of 0s and ±1s; garbled in the source]
The obvious thing to do is to delete inadmissible pure strategies. This leads to:

         b1  b6
a2  |     0   1
a5  |     1   0

Consider α_S = (½, ½), β_S = (½, ½) in this sub-game; for the whole game we have α* = (½, 0, 0, 0, 0, ½) and β* = (0, ½, 0, 0, ½, 0).

2. Exercise:
         b1  b4  b5  b8
    |     0   1   1   1
    |     1   0   0   1
    |     1   0   0   1
    |     1   1   1   0

The middle two rows are identical, as are columns b4 and b5, so this reduces to:

         b1  b5  b8
a1  |     0   1   1
a2  |     1   0   1
    |     1   1   0

with α = (1/3, 1/3, 1/3), β = (1/3, 1/3, 1/3) and V = 2/3.
         b1  b2
    [     x   y ]
    [     z   w ]
Question 11

Suppose A bids a and B bids b.

A gets: { 400 − a,  a > b
        { b,        a < b
        { 200,      a = b

B gets: { a,        a > b
        { 400 − b,  a < b
        { 200,      a = b
With bids restricted to multiples of 100, A's pay-offs [in units of 100] are:

               B bids
            0   1   2   3   4
 A    0 [   2   1   2   3   4 ]
 bids 1 [   3   2   2   3   4 ]
      2 [   2   2   2   3   4 ]
      3 [   1   1   1   2   4 ]
      4 [   0   0   0   0   2 ]

There is a PSSP at (2, 2) [both bid 200], giving A a pay-off of 200.
If we change this to a zero-sum game by subtracting 200 from each pay-off, the value = 0: this is a fair game. If any real-numbered bid from 0 to 400 is allowed, consider A and B both playing/bidding ⟨200⟩. This is a PSSP. Check this using lemma 3.
Question 12

By contradiction. Suppose ∃ β such that g(α, β) < g(α, β*) ∀ α ∈ A*; this contradicts the definition of β* as minimax.
Question 8

Part I

Here we have a PSSP:

G = [ 3  4  3 ]
    [ 2  6  1 ]
    [ 5  4  0 ]

The PSSP is at (a1, b3): g(a1, b3) = 3 is the smallest in its row and the largest in its column, so the pure value is 3.
Part II

G = [ x  0  0 ]
    [ 0  y  0 ]
    [ 0  0  z ]

p = yz/D,  q = xz/D,  r = xy/D,  where D = xy + xz + yz

If x, y, z all have the same sign then (p, q, r) represents a randomised strategy [an equaliser strategy for both players]. V = xyz/D. If they didn't all have the same sign, delete inadmissible strategies → a game with value 0.

Part III

Add 1 to each entry in Part II.
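A quick check of the Part II formulas with illustrative values x, y, z = 1, 2, 3:

```python
# Diagonal game diag(x, y, z): D = xy + xz + yz, (p, q, r) = (yz, xz, xy)/D,
# V = xyz/D. With x, y, z = 1, 2, 3: D = 11, (p, q, r) = (6/11, 3/11, 2/11).
from fractions import Fraction as F

x, y, z = F(1), F(2), F(3)
D = x * y + x * z + y * z
p, q, r = y * z / D, x * z / D, x * y / D
V = x * y * z / D
print((p, q, r), V)  # (6/11, 3/11, 2/11), V = 6/11
# equaliser check: the expected pay-off against each pure column equals V
assert p * x == q * y == r * z == V
```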
[End lecture 12, 06/11/14]
Thursday 13th November. 40 minutes long. Justify your answers! [Test 2: 11th December.]
We will represent minimax and maximin strategies geometrically. Let:

Q_λ = {(x1, …, xm) ∈ Rᵐ : xi ≤ λ, i ∈ {1, …, m}}

Q_λ is the set of points in Rᵐ whose maximum coordinate is ≤ λ. [λ may not be small or even positive!] [Figure for m = 2.] Q_λ is convex and it includes its boundary. [The interior of Q_λ is also convex.]

Let S_λ = S ∩ Q_λ. S_λ contains those points of the risk set, S, whose maximum coordinate is ≤ λ. If λ is very large and positive then the whole of S will lie in Q_λ and S_λ = S; if λ is large but negative, then Q_λ may not intersect S at all. Somewhere in between we have the smallest λ such that Q_λ intersects S [S_λ ≠ ∅]. [Since we assumed that the pay-offs are bounded from below, this λ is finite.]

We let:

λ_M = inf{λ : S_λ = S ∩ Q_λ ≠ ∅}

Claim: S_M = S ∩ Q_{λ_M} is the set of minimax points for B. This is because it contains those points of S that minimise the maximum coordinate, i.e. they minimise sup_a g(a, β) [= sup_α g(α, β)].
Typically, for m = 2, we have: [figure]. Here, S_M consists of the single point s_M alone. Note that other situations can arise, e.g. multiple minimax points, or a minimax point for B that is a pure strategy for B. [Figures.]
Typically, for m = 2, we have [the direction of the lines is determined by α; the solid line is the separating line that yields the limiting k]: [figure]. The coefficients of the separating line give the maximin strategy for A.

In many situations the separating line in the case m = 2 will lie along an edge of S. Suppose we have: [figure]. H_k has equation α1x1 + α2x2 = k. The point of contact P has x1 > x2, so we can increase the pay-off to A by increasing α1 [and so decreasing α2]. This rotates the line through P clockwise [P remains fixed]. However, we cannot go beyond the edge PQ because otherwise, as it reaches Q, the point of contact changes. Similarly, if Q were the point of contact we could increase the pay-off to A by increasing α2 [as x2 > x1 for Q] and thereby rotating the line anti-clockwise. The limiting [optimal] situation is that the optimal direction for A is parallel to PQ.

Notes [for m = 2]:
1. The slope of the separating line is never positive [α1 + α2 = 1, αi ≥ 0]
2. The separating line passes through (λ_M, λ_M) [proved later]
3. Typically [m = 2], we get the first diagram below: the direction of the separating line gives us the direction for A's maximin strategy. Sometimes the separating line [m = 2] is not along an edge of S, and there may be more than one maximin strategy. [Illustrated in the other two diagrams below.]

We will show [later] that the separating line [m = 2] passes through (λ_M, λ_M). And, as we've already seen, sometimes the separating line is not along an edge of the risk set S.
[End lecture 13, 11/11/14]
Class Thursday 20th November.
1.2.10 The Minimax Theorem

Recall: B is trying to minimise sup_{ai} g(ai, β) = max_{ai} g(ai, β). [The ai are pure strategies for A.]

Notation: In an S-game, g(α, β) = g(α, s) where s is the point representing β in the S-game. We can write the pay-off to A as g(α, β) = g(α, s) = α · s [a dot product], with α = (α1, …, αm) and s = (s1, …, sm).
Theorem 1 (The Minimax Theorem, a very famous theorem proved by Von Neumann). Every finite game is strictly determined [i.e. has a value, and maximin [for A] and minimax [for B] strategies].
Proof. S is the convex hull of finitely many points, so S is closed [i.e. it includes all its limit points] and bounded. Let sM ∈ S be a minimax point for B [sM ∈ S because S contains its limit points], and let M = max_i sM_i = min_{s ∈ S} max_i s_i, the minimax value.
Recall: QM = {(x1, …, xm) ∈ Rm : xi ≤ M, i = 1, 2, …, m}. Let T be the interior of QM, so T = {(x1, …, xm) ∈ Rm : xi < M, i = 1, 2, …, m}. T is a convex set and T ∩ S = ∅ because all points in S have at least one coordinate ≥ M [by definition of M]. S is also convex [and includes its boundary] so ∃ a separating hyperplane a · x = k where a = (a1, …, am). [N.B. these ai are not pure strategies for A, they're the components of a row vector a; x = (x1, …, xm).] This separates S and T:
a · x ≥ k for x ∈ S
a · x ≤ k for x ∈ T ∪ boundary of T [i.e. QM]    (1)
Each ai ≥ 0: if some ai < 0 we could let xi → −∞ within QM and violate (1). Not every ai is 0, so Σi ai > 0. Let πi = ai / Σi ai and V = k / Σi ai. [π is a vector of probabilities whose entries sum to 1; it represents a randomised strategy for A.]
We'll show that π is maximin for A and V = value of the game. Dividing (1) by Σi ai:
π · s ≥ V for s ∈ S    (2)
π · x ≤ V for x ∈ QM    (3)
Taking x = (M, …, M) ∈ QM in (3): M = π · x ≤ V. Also sM ∈ S ∩ QM, so π · sM = V by (2) and (3). Since π · sM ≤ max_i sM_i = M [π · s is a weighted average of the coordinates of s], we get
M ≤ V = π · sM ≤ M, so V = M and π · s ≥ V = M for all s ∈ S.    (4)
Looking at the two extreme ends of this inequality we can use lemma 3 to show that π is maximin and sM is minimax [we knew already that sM is a minimax point], and V = M is the value of the game.
      b1 b2 b3 b4 b5
a1 |  4  5  8  2  6
a2 |  1  8  5  6  6
In this diagram, the dotted lines denote the boundaries of QM. sM = (22/7, 22/7) [the intersection of the line x1 = x2 and the line joining b1 and b4], and V = 22/7. The minimax for B is (4/7)b1 + (3/7)b4 [M1GLA!], i.e. (4/7, 0, 0, 3/7, 0). Here
x1 = g(a1, λ), x2 = g(a2, λ).
So A's maximin strategy is (5/7)a1 + (2/7)a2.
The only admissible strategies for B are those on the line segment joining b1 and b4. In fact, from the pay-off matrix, b2, b3 and b5 are inadmissible. We could have deleted these to give:
      b1 b4
a1 |  4  2
a2 |  1  6
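As a sanity check, a 2-row game like this can be solved exactly in a few lines. This is a sketch under our own naming (solve_2xn is not from the notes): the value function v(p) = min_j [p·g(a1, bj) + (1−p)·g(a2, bj)] is piecewise linear and concave, so its maximum is attained at p = 0, p = 1, or where two pay-off lines cross.

```python
from fractions import Fraction as F

def solve_2xn(G):
    """Exact value and maximin probability p (on row a1) for a 2-row
    zero-sum game G, by checking the corners of the lower envelope."""
    n = len(G[0])
    slope = [F(G[0][j] - G[1][j]) for j in range(n)]
    icept = [F(G[1][j]) for j in range(n)]

    # Candidate p's: the endpoints and every pairwise line crossing in [0, 1].
    cands = {F(0), F(1)}
    for j in range(n):
        for k in range(j + 1, n):
            if slope[j] != slope[k]:
                p = (icept[k] - icept[j]) / (slope[j] - slope[k])
                if 0 <= p <= 1:
                    cands.add(p)

    def envelope(p):  # guaranteed pay-off to A when playing (p, 1 - p)
        return min(s * p + c for s, c in zip(slope, icept))

    best_p = max(cands, key=envelope)
    return envelope(best_p), best_p

value, p = solve_2xn([[4, 5, 8, 2, 6],
                      [1, 8, 5, 6, 6]])   # value 22/7 at p = 5/7
```

This reproduces V = 22/7 and the maximin strategy (5/7, 2/7) found geometrically above.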
2. A has 3 strategies and B has 2. There are no obvious dominated pure [inadmissible] strategies. Plot the risk set S(A) for A. n = 2:
      b1 b2
a1 |  3  5
a2 |  4  1
a3 |  0  8
The dotted lines mark the boundary of the limiting Q for A. tM, the intersection of the line x1 = x2 and the segment from a1 to a2, gives the maximin point for A: (3/5)a1 + (2/5)a2. [Exercise.]
The separating line is along a1a2. This has equation x2 + 4x1 = 17. We just have to normalise this:
(4/5)x1 + (1/5)x2 = 17/5 = V
so λ = (4/5, 1/5) and π = (3/5, 2/5, 0).
3.
      b1 b2 b3
a1 |  3  3  5
a2 |  5  2  3
PSSP at (a1, b2). Here there are multiple minimax points for B; they join (3, 3) to b2 = (3, 2). The separating line here is simply x1 = 3, i.e. g(a1, λ) = 3, so π = a1. b2 is the only admissible strategy for B; the other minimax points are inadmissible.
b2 here is Bayes with respect to π [π is represented by the parallel lines]. Exercise: Show that π, λ are maximin and minimax respectively ⟺ each is Bayes with respect to the other.
Admissibility in S-Games
Intuitively, admissible strategies for B are those points of S in the south-west corner of our diagrams. More precisely (assuming a finite game): assume S is bounded and includes its boundary. For s ∈ S let Q(s) = {x ∈ Rm : xi ≤ si, i = 1, …, m} where x = (x1, …, xm), s = (s1, …, sm). N.B. this is different from Q earlier.
Below are two diagrams with m = 2 in which Q(s) ∩ S ≠ {s} [i.e. s is inadmissible] and Q(s) ∩ S = {s} [i.e. s is admissible].
Lemma 6. If a Bayes strategy s ∈ S with respect to π ∈ A* is unique then it is admissible.
Proof. Suppose s is inadmissible and let π = (π1, …, πm). Then ∃ s′ ≠ s such that s′i ≤ si for all i, with at least one strict inequality. Hence, by multiplying by πi and summing, Σi πi s′i ≤ Σi πi si [we need ≤ rather than < in case one of the πi is 0], i.e. g(π, s′) ≤ g(π, s). But s is Bayes with respect to π, so g(π, s) ≤ g(π, s′); so g(π, s) = g(π, s′), which contradicts the uniqueness of s as the unique Bayes strategy for B with respect to π.
Lemma 7. If s is Bayes with respect to π = (π1, …, πm) where πi > 0 ∀i then s is admissible.
Proof. Suppose s is inadmissible. Then ∃ s′ ∈ S such that
g(ai, s′) ≤ g(ai, s) ∀i, with g(aj, s′) < g(aj, s) for some j.
Since every πi > 0, g(π, s′) = Σi πi g(ai, s′) < Σi πi g(ai, s) = g(π, s), contradicting s being Bayes with respect to π.
Lemma 8 [the converse]. If s ∈ S is admissible then s is Bayes with respect to some π.
Proof. If s is admissible then s ∈ ∂(S), and S and Q(s)∖{s} are disjoint convex sets, so ∃ a separating hyperplane between them. The coefficients of this hyperplane give the required π.
1.2.12
1. Cheat! There are many game solvers on the web that solve finite games. See the following site which
will only work with numerical values: http://banach.lse.ac.uk/form.html
[End lecture 15, 18/11/14]
2. Linear programming for finite games: the simplex method algorithm. We won't look at this approach.
3. (a) Look for a PSSP.
(b) Delete any obvious inadmissible strategies. The dominating strategy is usually a pure strategy, but not necessarily:
      b1 b2 b3
a1 |  3  2  4
a2 |  4  5  1
Here (1/2)b2 + (1/2)b3 gives (3, 3), which dominates b1 = (3, 4).
4. Look at sub-games [i.e. we take a subset of A's pure strategies and a subset of B's]:
Sometimes we can find a solution to a sub-game that, when extended to the whole game, gives a solution to the whole game. [One can simply try putting 0's in for the other pure strategies in an attempt to solve the whole game. Use lemma 3 to check.] In fact, it can be shown that ∃ at least one solution that can be obtained by extending a simple solution [i.e. a pair of ESs] to some sub-game [QM intersects S on a face or edge or point of S, in some linear subspace of S, and the pay-off on the separating hyperplane is the same at all points of this intersection]. Example:
3 5 2
4 1 3
0 8 2
The G⁻¹ method does not work here: there's no simple solution [all the row sums are > 0, but this is not true for all the column sums]. We need to look at sub-games. Try the following:
      b1 b2 b3
a1 |  3  5  2
a2 |  4  1  3
b1 is inadmissible, leaving
5 2
1 3
so πS = (2/5)a1 + (3/5)a2, λS = (1/5)b2 + (4/5)b3 with value 13/5. Try π = (2/5, 3/5, 0), λ = (0, 1/5, 4/5). [This fails the lemma 3 check: λ gives 16/5 > 13/5 against a3.] Try instead:
      b1 b2 b3
a2 |  4  1  3
a3 |  0  8  2
This extends to the whole game with V = 11/4 [exercise].
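The "use lemma 3 to check" step is mechanical: a candidate (π, λ, v) solves the full game iff π guarantees A at least v against every column while λ holds every row to at most v. A small sketch (the function name solves_game is ours):

```python
from fractions import Fraction as F

def solves_game(G, pi, lam, v):
    """Lemma-3-style check that (pi, lam) with value v solves the
    zero-sum game G."""
    m, n = len(G), len(G[0])
    a_ok = all(sum(pi[i] * G[i][j] for i in range(m)) >= v for j in range(n))
    b_ok = all(sum(G[i][j] * lam[j] for j in range(n)) <= v for i in range(m))
    return a_ok and b_ok

G = [[3, 5, 2], [4, 1, 3], [0, 8, 2]]
# The first sub-game's extension fails (lam concedes too much to a3) ...
first = solves_game(G, [F(2, 5), F(3, 5), 0], [0, F(1, 5), F(4, 5)], F(13, 5))
# ... while the a2,a3 sub-game extends to a solution with V = 11/4.
second = solves_game(G, [0, F(3, 4), F(1, 4)], [0, F(1, 8), F(7, 8)], F(11, 4))
```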
Question 5
Suppose (π, λ) and (π′, λ′) are both equilibrium pairs [zero-sum]. Then:
g(π, λ) ≤ g(π, λ′)   [(π, λ) in equil.: B cannot gain by changing]
g(π, λ′) ≤ g(π′, λ′)   [(π′, λ′) in equil.]
g(π′, λ′) ≤ g(π′, λ)   [(π′, λ′) in equil.]
g(π′, λ) ≤ g(π, λ)   [(π, λ) in equil.]
Hence g(π, λ) = g(π, λ′) = g(π′, λ′) = g(π′, λ). To show (π, λ′) is in equilibrium we need to show that for all π″ ∈ A*, λ″ ∈ B* we have
g(π″, λ′) ≤ g(π, λ′) ≤ g(π, λ″)   [the definition of an equilibrium pair]
To see this, note: g(π″, λ′) ≤ g(π′, λ′) = g(π, λ′) [(π′, λ′) in equil., and from above]. Also g(π, λ′) = g(π, λ) ≤ g(π, λ″) [(π, λ) in equil.]. Similarly (π′, λ) is in equilibrium.
Distribution of this document is illegal
Question 6
      b1 b2 b3
a1 |  1  2  3
a2 |  2  3  2
a3 |  3  2  x
Part I
Two methods: the equations for a pair of ESs, or the G⁻¹ method. What makes the row and column sums have the same sign? Answer: 1 ≤ x ≤ 3. Then
π = λ = ((3 − x)/4, (x − 1)/4, 1/2) and v = (x + 7)/4.
Part II
G = 1 2 3
    2 3 2
    3 2 x
1. x > 3: b3 is inadmissible [compare it with b1]. Now delete a1 [compare it with a2]. We are left with:
      b1 b2
a2 |  2  3
a3 |  3  2
Let's guess: π = (1/2, 1/2), λ = (1/2, 1/2). πS, λS are ESs for the sub-game, so are maximin and minimax for the sub-game. Extend this to the whole game. We get that the value = 5/2.
2. x < 1: We see that a2 is maximin [and it happens to be an ES]. There are many separating lines, all going through (2, 2). B has minimax strategy qb1 + (1 − q)b3 in the sub-game for 1/2 ≤ q ≤ (2 − x)/(3 − x) [one way of seeing this is by asking: for which values of q does lemma 3 tell us we have a solution to the sub-game? Another way of forming this inequality is noting that the separating line is of the form (y − 2) = m(x − 2) where m lies between the gradients of the lines joining (2, 2) and (3, x), and (2, 2) and (1, 3), so x − 2 ≤ m ≤ −1]. The corresponding strategy on {b1, b3} is
λ = (m/(m − 1), −1/(m − 1)).
Extend this to the whole game. The value = 2.
Question 9
Let
So c > g(, ) 8.
So
Question 14
      a  b
G = 0  a  b
   −a  0  c
   −b −c  0
This has no PSSPs. If a = 0 we look at possibilities for b and c. If both are negative, then (a3, b3) is a PSSP and V = 0. If at least one of b and c is ≥ 0 then [exercise] show that ∃ a PSSP. So a ≠ 0. Similarly b, c ≠ 0.
To show a, b have opposite signs, look at these two cases:
a, b are both > 0
a, b are both < 0
In both cases we get PSSPs. Show a, c have the same sign, again by contradiction.
Look for equaliser strategies. Note that the pay-off matrix is anti-symmetric (G = −Gᵀ) and V = 0. Write down the equations for an ES for A [same for B]. We find:
π = λ = (c, −b, a)/(a − b + c)
to be a pair of ESs, and hence π, λ are maximin and minimax respectively. Check that these are randomised strategies.
Question 13
Lynda only gave numerical solutions to this question. Here's my solution:
Part I
      b1 b2 b3 b4
a1 |  1  3  2  7
a2 |  8  2  4  1
The line x1 = x2 intersects the line segment b2b3, so the separating line contains the segment b2b3. Writing it as y = mx + c:
2 = 3m + c and 4 = 2m + c  ⟹  m = −2, c = 8  ⟹  y + 2x = 8.
Normalising: (2/3)x + (1/3)y = 8/3, so π = (2/3, 1/3) and V = 8/3.
As for λ = (0, λ2, 1 − λ2, 0):
3λ2 + 2(1 − λ2) = 8/3 and 2λ2 + 4(1 − λ2) = 8/3  ⟹  λ2 = 2/3
so λ = (0, 2/3, 1/3, 0).
Part II
G = 1 4 3
    3 2 4
    5 1 4
π = (4/7, 0, 3/7), λ = (3/7, 4/7, 0), V = 19/7.
[End lecture 17, 20/11/14]
1.3 Non-Zero-Sum Games
1.3.1 Introduction
Non-Cooperative Games
Examples:
1. Prisoner's Dilemma:
Two suspects, A and B, are held in different cells and they cannot communicate. The officer in charge suspects that they are guilty of a major crime but does not have enough evidence to convict them. Each prisoner is given two choices:
DON'T CONFESS = a1 for A, b1 for B
CONFESS = a2 for A, b2 for B
If both were to confess, (a2, b2), they each would receive eight years in prison. If neither confesses, they get one year each in prison because they can be convicted of some other minor crime. If one confesses and the other does not, the one who confesses gets three months and the other gets ten years:
                              B
                   b1 = don't confess   b2 = confess
A a1 = don't confess  (−1, −1)           (−10, −0.25)
  a2 = confess        (−0.25, −10)       (−8, −8)
Multiply by 4 and add 40:
      b1        b2
a1  (36, 36)  (0, 39)
a2  (39, 0)   (8, 8)
Whatever B chooses, A would prefer a2. Similarly, whatever A chooses, B would prefer b2. However if A plays a2 and B plays b2 then both do rather badly. What appears to be best for them as individuals is not good when they both choose to confess.
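The dominance argument can be checked by enumerating pure-strategy equilibrium pairs directly. A minimal sketch (pure_equilibria is our own helper, not from the notes):

```python
def pure_equilibria(gA, gB):
    """Pure pairs (i, j) where a_i is a best reply to b_j and vice versa."""
    m, n = len(gA), len(gA[0])
    return [(i, j)
            for i in range(m) for j in range(n)
            if all(gA[i][j] >= gA[k][j] for k in range(m))
            and all(gB[i][j] >= gB[i][l] for l in range(n))]

# Transformed Prisoner's Dilemma from the table above.
gA = [[36, 0], [39, 8]]
gB = [[36, 39], [0, 8]]
eqs = pure_equilibria(gA, gB)   # only (a2, b2): both confess
```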
2. Battle of the Sexes:
              B
      b1        b2
a1  (2, 1)    (−1, −1)
a2  (−1, −1)  (1, 2)
Assume there's no communication about the decision. The man says to himself: I want a1, and she wants b2, but we both do badly like that. So, if I choose a2 [i.e. give in to her] and she chooses b2 then we both do pretty well. However B will argue in a similar way and give in to him by playing b1. This gives the combination (a2, b1), which is bad for both. Moral: communicate in advance!
3. Evolution:
Dawkins [The Selfish Gene], Maynard Smith. Worked models for animal behaviour. In a population of hawks and doves, each member behaves either like a hawk [H] or a dove [D]. When two members confront each other, we have the rules:
- If 2 hawks meet they always fight until one gets badly injured [cost = 100]. On average, the loss to a hawk in an H-H confrontation is 25 [half the time they win 50, half the time they lose 100]
- When a hawk and a dove meet, the hawk wins 50 because the dove runs away [dove gets 0]
- When 2 doves meet no one gets hurt, but one [who gets 0] runs away; the other gets 50. However both lose 10 for wasting time in a staring match. In a D-D confrontation the average gain = (1/2)·50 − 10 = 15.
We can model this as a two-person game: 2 players, A and B. Each can choose H or D. We get a pay-off table:
        B
        H            D
H  (−25, −25)   (50, 0)
D  (0, 50)      (15, 15)
Question: is there a mixture [i.e. a randomised strategy] of H and D that is stable? I.e. a mixture
such that any small deviation from the stable state is soon brought back to it. Looking for an
equilibrium.
[End lecture 18, 25/11/14]
4. Cuban missile crisis in 1962:
Russians wanted to put nuclear weapons on Cuba, threatening the USA. The US had two strategies:
- Naval blockade to stop the Russians sending further weapons
- Air strike to wipe out existing weapons on Cuba, followed by an invasion of Cuba
Russia also had two strategies:
- Withdraw (W)
- Not withdraw
                    Russians
               W                     Not W
USA Blockade   Compromise for both   Soviet victory, more powerful weapons
    Airstrike  USA victory           Nuclear war
In the end, Russia withdrew and persuaded Kennedy not to invade Cuba. See Game Theory and the Humanities by S.J. Brams, MIT Press.
Definitions:
1. The pair (π*, λ*) is an equilibrium pair if
gA(π*, λ*) ≥ gA(π, λ*) ∀π ∈ A*
gB(π*, λ*) ≥ gB(π*, λ) ∀λ ∈ B*
The definition is due to Nash [1950] → Nobel prize. Neither A nor B has an incentive to change strategy. N.B. This definition agrees with our earlier definition for zero-sum games, why? We call (π*, λ*) an equilibrium pair.
2. Two equilibrium pairs (π1, λ1) and (π2, λ2) are interchangeable if (π1, λ2) and (π2, λ1) are also equilibrium pairs.
3. A non-cooperative game is called Nash solvable if every pair of equilibrium pairs is interchangeable. By convention, any game with 0 or 1 equilibrium pair is Nash solvable.
Examples of equilibrium pairs:
1. Chicken [the starred entries in the following are the E. pairs [pure]]:
We have a long straight road on a single track. To swerve or not to swerve? That is the question. It's of course more macho not to swerve, but one doesn't want to get killed!
                         B
               swerve    don't swerve
A swerve       (2, 2)    (1, 4)*
  don't swerve (4, 1)*   [both crash: worst for both]
2. Battle of the Sexes:
      b1        b2
a1  (2, 1)    (−1, −1)
a2  (−1, −1)  (1, 2)
Here there are two equilibrium pairs [(a1, b1) and (a2, b2)] but ∃ also a pair of randomised strategies in equilibrium:
π = (3/5)a1 + (2/5)a2, λ = (2/5)b1 + (3/5)b2
There is a situation where it is easy to find equilibrium pairs:
Lemma 9. In a non-zero-sum game played non-cooperatively [or cooperatively], if π is an equaliser strategy for A using B's pay-offs, and λ is an equaliser strategy for B using A's pay-offs, then (π, λ) is an equilibrium pair.
Proof. gA(ai, λ) = CA ∀i and gB(π, bj) = CB ∀j [CA, CB are constants], so
gA(π′, λ) = CA = gA(π, λ) ∀π′ and gB(π, λ′) = CB = gB(π, λ) ∀λ′.
Neither player can gain by deviating, so (π, λ) is in equilibrium.
Examples:
1. Battle of the Sexes
      b1        b2
a1  (2, 1)    (−1, −1)
a2  (−1, −1)  (1, 2)
A's pay-offs are
 2 −1
−1  1
and in this game B has equaliser strategy λ = (2/5, 3/5). B's pay-offs are
 1 −1
−1  2
and in this game A has equaliser strategy π = (3/5, 2/5). N.B. Not all equilibrium pairs occur this way.
2. Evolutionary Model
        H            D
H  (−25, −25)   (50, 0)
D  (0, 50)      (15, 15)
A's pay-offs are
−25  50
  0  15
which gives λ = (35/60, 25/60) = (7/12, 5/12). Similarly π = (7/12, 5/12). So a population with 7/12 hawks and 5/12 doves is stable or in equilibrium. This is called an evolutionary stable strategy.
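Both equaliser strategies above come from the same 2×2 computation, which is easy to script. A sketch (equalising_row_mix is our own name; it assumes the game has an interior equaliser, i.e. a non-zero denominator):

```python
from fractions import Fraction as F

def equalising_row_mix(M):
    """Probability mix (p, 1-p) over the rows of the 2x2 matrix M making
    the expected entry the same in both columns (an equaliser strategy)."""
    d = M[0][0] - M[1][0] - M[0][1] + M[1][1]
    p = F(M[1][1] - M[1][0], d)
    return (p, 1 - p)

# Battle of the Sexes: pi equalises B's pay-offs, lam equalises A's
# (A's matrix is symmetric, so its transpose is itself).
pi = equalising_row_mix([[1, -1], [-1, 2]])     # (3/5, 2/5)
lam = equalising_row_mix([[2, -1], [-1, 1]])    # (2/5, 3/5)

# Hawk-Dove: transpose A's pay-offs so the column mix becomes a row mix.
ess = equalising_row_mix([[-25, 0], [50, 15]])  # (7/12, 5/12)
```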
Definition: Two equilibrium pairs (π1, λ1) and (π2, λ2) are equivalent if gA(π1, λ1) = gA(π2, λ2) and gB(π1, λ1) = gB(π2, λ2). [In a zero-sum game any 2 equilibrium pairs are both equivalent and interchangeable.]
Examples: Look at Chicken and the Battle of the Sexes. Equivalent? Interchangeable? In both Chicken and the Battle of the Sexes the equilibrium pairs of pure strategies are neither equivalent nor interchangeable.
Geometrical Interpretation of Non-Zero-Sum, Non-Cooperative Games
We plot the points (x, y) where x = pay-off to A and y = pay-off to B. Suppose π = pa1 + (1 − p)a2, λ = qb1 + (1 − q)b2 in a game where each player has two pure strategies. Then
x = pq gA(a1, b1) + p(1 − q) gA(a1, b2) + q(1 − p) gA(a2, b1) + (1 − p)(1 − q) gA(a2, b2)
and similarly for y with gB in place of gA. The set of such points corresponding to randomised strategies for A and B is called the pay-off set, S.
Firstly, any point of S must be in the convex hull of the 4 points (ai, bj) [i, j ∈ {1, 2}] because 0 ≤ pq ≤ 1, 0 ≤ p(1 − q) ≤ 1 &c. and pq + p(1 − q) + q(1 − p) + (1 − p)(1 − q) = 1.
However not every point in the convex hull is necessarily an element of S because the coefficients pq, p(1 − q), q(1 − p) and (1 − p)(1 − q) are constrained [i.e. they have a particular structure]; this arises from the fact that the players choose their strategies independently.
Example: Battle of the Sexes:
      b1        b2
a1  (2, 1)    (−1, −1)
a2  (−1, −1)  (1, 2)
(x, y) = pq(2, 1) + p(1 − q)(−1, −1) + q(1 − p)(−1, −1) + (1 − p)(1 − q)(1, 2)
       = q[p(2, 1) + (1 − p)(−1, −1)] + (1 − q)[p(−1, −1) + (1 − p)(1, 2)]
Here, we let p vary in [0, 1] for each fixed q. S consists of the 2 straight line segments [(−1, −1) to (2, 1) and (−1, −1) to (1, 2)], the parabolic arc and the region interior to these. Note that there are points in the convex hull that are not in S.
Note the following:
- We would have produced the same S if we had used p instead of q in the expressions for (x, y)
- All points on the line segment (−1, −1) to (2, 1) are in S [take q = 1] and all points on the line segment joining (−1, −1) to (1, 2) are in S [take q = 0]
- The line segment joining (1, 2) and (2, 1) is not in S. Note that (−1, −1) and (2, 1) are in the same row of the table of pay-offs, as are (−1, −1) and (1, 2), whereas (1, 2) and (2, 1) have no row or column in common
Exercise: Show that the equation of the parabolic arc is 5(x − y + 1)² = 4(3x − 2y + 1).
In general, S depends on the orientation and location of the four points representing the pure strategies. Example [2 × 2 game]:
      b1 b2
a1 |  α  β
a2 |  γ  δ
Possibilities [we will plot α, β, γ and δ]:
Here, the line segments joining α and β, γ and δ, α and γ, β and δ all lie in S [set p = 1, p = 0, q = 1, q = 0 respectively]. However the line segments joining α and δ or β and γ may not be in S.
Sometimes S comprises the whole of the convex hull of the four points representing pairs of pure strategies. E.g. if the four points α, β, γ, δ are the vertices of a convex quadrilateral, labelled in this order, we have diagrams (a) or (b). If M is in the convex hull, we can find p, q ∈ [0, 1] such that π = (p, 1 − p), λ = (q, 1 − q) are represented by the point M.
In general, this will not be so and various different diagrams can result. In (d) we have a curve on the lower part and two straight lines on the upper part. The whole of the shaded area = S. The lines drawn are in equal increments. [In (c) there are two straight lines on the lower part, and a curve on the upper.] Other possibilities can occur.
[End lecture 20, 27/11/14]
Theorem 2 (Nash 1950). Every finite 2-person game has at least one equilibrium pair of strategies.
Preamble: Let π = (π1, …, πm), λ = (λ1, …, λn) be any randomised strategies for A, B respectively. Define
ri = max{gA(ai, λ) − gA(π, λ), 0}, i = 1, …, m
sj = max{gB(π, bj) − gB(π, λ), 0}, j = 1, …, n
Example:
      b1      b2
a1  (4, 0)  (0, 2)
a2  (3, 1)  (4, 0)
Let π = (1/3, 2/3), λ = (1/4, 3/4). Then gA(π, λ) = 17/6, so r1 = max{gA(a1, λ) − 17/6, 0} = 0 and r2 = max{gA(a2, λ) − 17/6, 0} = 11/12. [Exercise.]
Proof (Outline). Let
πi′ = (πi + ri)/(1 + Σk rk) and λj′ = (λj + sj)/(1 + Σk sk)
and put π′ = (π1′, …, πm′), λ′ = (λ1′, …, λn′). [These are vectors of probabilities whose entries sum to 1, i.e. randomised strategies; check as an exercise.]
Define f(π, λ) = (π′, λ′). We now use Brouwer's Fixed Point Theorem [without proof]: a continuous function from a closed bounded convex set C into C has a fixed point. f satisfies the requirements of this theorem: f is a continuous map from a closed bounded convex subset of R^{m+n} into itself. So f has a fixed point, i.e. ∃(π, λ) such that f(π, λ) = (π, λ). We will show that the pair (π, λ) is in equilibrium ⟺ f(π, λ) = (π, λ).
Firstly ⟹: if (π, λ) is an equilibrium pair then, by definition, gA(ai, λ) ≤ gA(π, λ) ∀i, so ri = 0 ∀i and hence πi′ = πi ∀i and so π′ = π. Similarly λ′ = λ, so (π, λ) is a fixed point of f.
Secondly ⟸: by contradiction. Suppose (π, λ) is not an equilibrium pair. Then either [or both]:
∃π⁰ such that gA(π⁰, λ) > gA(π, λ)
∃λ⁰ such that gB(π, λ⁰) > gB(π, λ)
We'll just consider the first of these situations as the second is very similar. Let π⁰ = (π1⁰, …, πm⁰). We know that gA(π⁰, λ) = Σi πi⁰ gA(ai, λ), hence ∃ at least one i such that gA(ai, λ) > gA(π, λ). [Otherwise there would be a contradiction with the first case: gA(π⁰, λ) = Σi πi⁰ gA(ai, λ) ≤ gA(π, λ) Σi πi⁰ = gA(π, λ).] For this i we have ri > 0 and, as a consequence, Σk rk > 0.
However, gA(π, λ) = Σi πi gA(ai, λ) = Σ_{k: πk > 0} πk gA(ak, λ), hence gA(ak, λ) ≤ gA(π, λ) for some k with πk > 0 [otherwise gA(π, λ) > gA(π, λ), clearly a contradiction; this uses Σ_{k: πk > 0} πk = 1]. For this particular k we have rk = 0 and
πk′ = πk/(1 + Σi ri) < πk.
Hence πk′ ≠ πk, so π′ ≠ π and (π, λ) is not a fixed point of f.
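The r_i in the preamble's example can be checked numerically; the sketch below recomputes gA(π, λ) = 17/6 and the vector r (the helper name g is ours):

```python
from fractions import Fraction as F

def g(M, pi, lam):
    """Expected pay-off M(pi, lam) under randomised strategies pi, lam."""
    return sum(pi[i] * M[i][j] * lam[j]
               for i in range(len(pi)) for j in range(len(lam)))

# A's pay-offs from the example above.
gA = [[4, 0], [3, 4]]
pi, lam = [F(1, 3), F(2, 3)], [F(1, 4), F(3, 4)]
v = g(gA, pi, lam)   # 17/6

pure = lambda i: [1 if k == i else 0 for k in range(2)]
r = [max(g(gA, pure(i), lam) - v, 0) for i in range(2)]   # [0, 11/12]
```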
Definitions: (π, λ) is jointly admissible if ∄(π⁰, λ⁰) such that gA(π⁰, λ⁰) ≥ gA(π, λ) and gB(π⁰, λ⁰) ≥ gB(π, λ), with at least one inequality strict. A game is solvable in the strict sense if it has a jointly admissible equilibrium pair. Example:
      b1      b2
a1  (1, 1)  (0, 0)
a2  (0, 0)  (2, 2)
(a1, b1), with pay-offs (1, 1), is in equilibrium but is not jointly admissible. (a2, b2), with pay-offs (2, 2), is a solution in the strict sense.
Prisoner's Dilemma has no jointly admissible equilibrium pairs, so it is not solvable in the strict sense.
Repetition of Non-Cooperative Non-Zero-Sum Games [or Supergames]
Richard Dawkins, The Selfish Gene; Poundstone, Prisoner's Dilemma. Consider:
                           B [you]
                   cooperate   defect
A [me] cooperate [nice]  (3, 3)   (0, 5)
       defect [nasty]    (5, 0)   (1, 1)
If you only play one game, it's best to defect [though then both do badly]. But how should we play if we repeat the game a large number of times? You need to think about your strategy. Generally, both players do reasonably well if both cooperate, but there is always the temptation to defect to gain more. Would your opponent forgive you? I somehow doubt it.
In 1981 Robert Axelrod [American political scientist] conducted an experiment. He held a competition and asked people to submit strategies for playing repeated Prisoner's Dilemma games, and also included a program that played randomly. The participants weren't told how many rounds; they played each other for about 200 games!
A point (x, y) ∈ S is admissible if there is no (x′, y′) ∈ S with x′ ≥ x and y′ ≥ y, with at least one inequality strict.
Definition: The set of admissible points is called the pareto optimal set.
S is a convex polygon. Typically:
sA = sup_π inf_λ gA(π, λ)
I.e. look at A's pay-offs alone and calculate the value of the game to A. This is the maximum A can guarantee getting, ignoring B's pay-offs. A won't settle for less than this. sA is A's security level. Similarly:
sB = sup_λ inf_π gB(π, λ)
Any acceptable outcome (x, y) must have x ≥ sA and y ≥ sB. Examples:
1. Battle of the Sexes: A's pay-off matrix is
 2 −1
−1  1
giving sA = 1/5; similarly sB = 1/5.
2. Prisoner's Dilemma:
A's pay-offs:
36   0
39  [8]
sA = 8
B's pay-offs:
36  39
 0  [8]
sB = 8
Again, in general, be careful: the second numbers in the pay-off table are gains to B.
The security levels cut the pareto optimal set at the points L′ and M′ [PSSPs]. A cannot reasonably expect to get more than the x-coordinate of M′ = 38 1/3, and B cannot reasonably expect to get more than the y-coordinate of L′. Exercise: What is this value?
Question: What is a fair solution? I.e. which points on L′M′ should represent a solution?
The Nash Arbitration Procedure [Also Called the Shapley Solution]
Choose s ∈ S to maximise (s1 − sA)(s2 − sB) [the excess over sA multiplied by the excess over sB] over all points s = (s1, s2) in N. Why choose this function? The advantages are:
- Symmetric in s1 − sA and s2 − sB
- Scale invariant
[End lecture 22, 03/12/14]
Question 10
Part I
π = (1/4, 1/4, 1/4, 1/4)
Part II
G = 1 2 3
    6 1 2
    2 3 2
There are two approaches:
1. Write down the equations which we need for a pair of ESs
2. The G⁻¹ method
Either way: π = (1/3, 1/6, 1/2), λ = (1/6, 1/3, 1/2), v = 7/3.
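The ES equations in Part II are just a linear system: πᵀG = v·1 together with Σπi = 1. A small exact solver as a sketch (solve_linear is our own; it assumes the system is non-singular):

```python
from fractions import Fraction as F

def solve_linear(A, b):
    """Gauss-Jordan elimination with exact fractions: solve A x = b."""
    n = len(A)
    M = [[F(A[i][j]) for j in range(n)] + [F(b[i])] for i in range(n)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                M[r] = [x - M[r][c] * y for x, y in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

# Unknowns (pi1, pi2, pi3, v): each column of G gives pi.column - v = 0,
# plus the normalisation pi1 + pi2 + pi3 = 1.
G = [[1, 2, 3], [6, 1, 2], [2, 3, 2]]
rows = [[G[0][j], G[1][j], G[2][j], -1] for j in range(3)] + [[1, 1, 1, 0]]
x = solve_linear(rows, [0, 0, 0, 1])   # pi = (1/3, 1/6, 1/2), v = 7/3
```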
Question 15
      b1      b2
a1  (x, z)  (y, z)
a2  (x, w)  (y, w)
All pairs of pure strategies are in equilibrium. We need to show that every pair of randomised strategies is in equilibrium. Let π = pa1 + (1 − p)a2 and λ = qb1 + (1 − q)b2.
For A (B is similar): for any π⁰,
gA(π⁰, λ) = q gA(π⁰, b1) + (1 − q) gA(π⁰, b2) = qx + (1 − q)y = gA(π, λ)
since gA(·, b1) ≡ x and gA(·, b2) ≡ y. So no deviation changes A's pay-off, and similarly for B; hence (π, λ) is in equilibrium.
Question 16
Let (1 ,
1)
and (2 ,
2)
)=
0)
2
1 + (1
2.
gA (1 ,
1)
+ (1
)[gA (1 ,
)2 ,
gA (,
= gA (,
= gA (,
1)
1)
+ (1
+ (1
2)
)[gA (,
)gA (,
2)
+ gA (2 ,
and (2 ,
2)
1)
+ gA (,
1 )]
+ (1
)2 gA (2 ,
2)
+ (1
)2 gA (,
2)
2)
) 8
0)
Question 17
      b1      b2
a1  (1, 4)  (9, 0)
a2  (7, 1)  (3, 3)
gA(a1, b1) = 1 < 7 = gA(a2, b1), so (a1, b1) is not in equilibrium. Check that the other three pairs of pure strategies are not in equilibrium. Look for ESs for each player using the other's pay-offs.
For A: equalise
4 0
1 3
[these are gains to B] → π = (1/3, 2/3).
For B: equalise
1 9
7 3
[these are gains to A] → λ = (1/2, 1/2).
[Find gA(ai, λ) and gB(π, bj) to check that (π, λ) is in equilibrium.]
Question 18
Suppose π = (p, 1 − p), λ = (q, 1 − q) are in equilibrium [transformed Prisoner's Dilemma]:
gA(π, λ) = 36pq + 39q(1 − p) + 8(1 − p)(1 − q) = 5pq + 31q − 8p + 8
We know gA(π, λ) ≥ gA(a1, λ) and gA(π, λ) ≥ gA(a2, λ) = 31q + 8, so p(5q − 8) ≥ 0 ⟹ p = 0. If p = 0, the same argument with B's pay-offs gives q = 0, so (a2, b2) is the only equilibrium pair.
Example:
      b1      b2
a1  (2, 1)  (3, 0)
a2  (0, 4)  (2, 5)
The dot on the pareto optimal set, i.e. the line segment between (2, 5) and (3, 0), represents the Shapley solution.
A's pay-offs:
[2] 3
 0  2
PSSP = a1, sA = 2. B's pay-offs [gains to B]:
[1] 0
 4  5
PSSP = b1, sB = 1.
The Shapley solution is (s1, s2) where (s1 − 2)(s2 − 1) is maximised over N, i.e. along the relevant part of the line 5s1 + s2 = 15. We find (s1 − 2)(s2 − 1) = (s1 − 2)(14 − 5s1) = −5s1² + 24s1 − 28, a quadratic in s1. Differentiation tells us that the maximum occurs when s1 = 12/5, s2 = 3. Check that this is in N. N.B. It is a global maximum.
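The maximisation can also be done numerically along the pareto segment; a rough grid-search sketch (all names ours) rather than the calculus above:

```python
def shapley_point(sA, sB, p1, p2, steps=100000):
    """Approximate maximiser of (s1 - sA)(s2 - sB) on the segment p1 to p2."""
    best = None
    for k in range(steps + 1):
        t = k / steps
        s1 = p1[0] + t * (p2[0] - p1[0])
        s2 = p1[1] + t * (p2[1] - p1[1])
        val = (s1 - sA) * (s2 - sB)
        if best is None or val > best[0]:
            best = (val, s1, s2)
    return best[1], best[2]

# Example above: pareto segment (2, 5) -> (3, 0), security levels 2 and 1.
s1, s2 = shapley_point(2, 1, (2, 5), (3, 0))   # close to (12/5, 3)
```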
Exercises [Cooperative Games]: Give examples:
1. Not all equilibrium pairs are jointly admissible
2. Not all jointly admissible pairs are in equilibrium
3. It is possible for π to be inadmissible [ignoring B's pay-offs] and vice-versa for λ, yet (π, λ) is in equilibrium
2 Utility
2.1 Introduction
Consider the choice between: decision d1 = accept 100; decision d2 = get 200 with probability 1/2 and 0 with probability 1/2. Most people prefer d1 even though both have the same expected pay-off.
Now consider the St. Petersburg Paradox [Bernoulli]. Toss a fair coin until you get tails; you get 2^n if the first tail occurs at toss n.
E(pay-off) = Σ_{n=1}^∞ 2^n (1/2)^n = Σ_{n=1}^∞ 1 → ∞
Although the expected pay-off is infinite, people won't pay much to play it. So the expected pay-off is not a good measure to compare decisions about money.
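The divergence is visible term by term (each term is 2^n · 2^{-n} = 1), while under a concave utility the same gamble has finite expected utility. For example with u(z) = log z [our choice, for illustration only], E[u] = Σ 2^{-n} log(2^n) = 2 log 2:

```python
import math

# Partial sum of E[u] = sum over n of 2^-n * log(2^n); the tail beyond
# n = 59 is negligible, so this is within 1e-9 of 2 * log 2.
expected_utility = sum(2.0 ** -n * math.log(2.0 ** n) for n in range(1, 60))
```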
In addition, the results of making decisions may not even be monetary; they may be prestige, goodwill &c.,
which are hard to quantify. We need a scale for value. We also may have more than one objective. We need
to make a comprehensive list of possible decisions and, for each decision, a list of consequences.
Let R = the set of all possible consequences of a set of decisions. The unknown events need to be taken into
account. The elements of R may not be numerical, and may involve unknown events. We have R = {C}, C
is a consequence.
Example: You are invited to your friend's house one evening at 8 p.m., but you do not know if they will provide dinner. Two decisions: d1 = eat beforehand; d2 = do not eat beforehand. Your friends will either provide dinner [event θ1] or not [event θ2]. The true value of θ is unknown. There are four consequences: C1 = (d1, θ1), C2 = (d1, θ2), C3 = (d2, θ1), C4 = (d2, θ2). We have a set of consequences {C1, …, C4} and P, a probability distribution over these. An element of P is a set of probabilities p1, p2, p3, p4 [pi ≥ 0, Σi pi = 1].
The notation (p1C1, p2C2, p3C3, p4C4) denotes a gamble or a lottery in which Ci occurs with probability pi.
In our first example, R = {0, 100, 200} ↔ {C0, C100, C200}. We were asked to compare the two lotteries (1 C100) and (1/2 C0, 1/2 C200).
2.2 The Lottery Axioms
By comparing lotteries we will develop a common scale for utility [i.e. value]. We can think of decision-making as comparing lotteries. R = {C1, …, Cm}, Ci = the i-th consequence. A lottery is L = (p1C1, …, pmCm) where 0 ≤ pi ≤ 1, Σi pi = 1. We can form a compound lottery such as L = (pL1, (1 − p)L2) where L1 and L2 are lotteries, which means you receive the result of L1 with probability p and the result of L2 with probability 1 − p.
Definition: We write L1 < L2 if the decision-maker prefers L2 to L1; L1 > L2 [L1 preferred to L2]; L1 ∼ L2 [indifferent between L1 and L2]; L1 ≲ L2 [i.e. L1 < L2 or L1 ∼ L2].
Example at the friend's house:
       θ1                 θ2
d1   C1 [full / costs]   C2 [costs]
d2   C3 [best]           C4 [starve]
Example [non-transitive preferences; three dice R, B, G]:
P(R > B) = 25/36 = 0.694
P(B > G) = 21/36 = 0.583
P(G > R) = 21/36 = 0.583
All > 1/2, yet the preferences are non-transitive.
[End lecture 24, 09/12/14]
Axiom 3 (Substitutability of Lotteries).
1. If L = (p1L1, …, pnLn) and L1 ∼ L1′, then L ∼ (p1L1′, p2L2, …, pnLn)
2. If L = (p1L1, …, pnLn) and Li ∼ (qi1C1, …, qimCm) [the q's are probabilities] then we should be indifferent: L ∼ (Σi piqi1 C1, …, Σi piqim Cm) [like the law of total probability]
Criticism: Some people like a multi-stage lottery [those who like gambling]. Axiom 3 ignores this.
Here's an example from the 2013/14 course: Ci = win i. Let L = (1/5 L1, 4/5 L2), where L1 = (1/2 C2000, 1/2 C0) and L2 = (1 C200). Then by axiom 3 we should be indifferent:
L ∼ (1/10 C2000, 1/10 C0, 4/5 C200)
Axiom 4 (Principle of Irrelevant Alternatives). If L1, L2 and L are 3 lotteries then for any α ∈ (0, 1) we should have L1 < L2 ⟺ (αL1, (1 − α)L) < (αL2, (1 − α)L).
I.e. introducing a new lottery, L, should not change our preferences between L1 and L2.
Axiom 5 (Continuity). For any consequences C1 < C < C2, ∃ α ∈ (0, 1) such that C ∼ ((1 − α)C1, αC2). However judging this α may be difficult.
Corollary. The α in Axiom 5 is unique.
Proof. Axiom 5 tells us that ∃ at least one such α. Suppose [for a contradiction] that ∃ α1 < α2 such that
C ∼ ((1 − α1)C1, α1C2) and C ∼ ((1 − α2)C1, α2C2).
Let L be the lottery ((1 − α1)C1, α1C2) [so C ∼ L] and let β = (α2 − α1)/(1 − α1). [Check β ∈ (0, 1).] Consider the compound lottery (βC2, (1 − β)L). By axiom 3 it reduces to
((1 − β)(1 − α1)C1, (β + (1 − β)α1)C2) = ((1 − α2)C1, α2C2) ∼ C.
On the other hand, since C ∼ L, axiom 3 also gives (βC2, (1 − β)L) ∼ (βC2, (1 − β)C). Hence C ∼ (βC2, (1 − β)C). But C < C2, so axiom 4 [with the irrelevant lottery C] gives C = (βC, (1 − β)C) < (βC2, (1 − β)C) ∼ C, a contradiction.
The Existence of a Unique Utility Function
Fix C1 < C2. For each consequence C with C1 ≲ C ≲ C2, write C ∼ ((1 − α)C1, αC2) [note that ∃ α and it's unique, by Axiom 5 and the corollary] and define u so that
u(C) = (1 − α)u(C1) + αu(C2)
with u(C1) < u(C2) fixed arbitrarily; any such u has the form u(C) = aα + b with a > 0. For a lottery L = (p1C1, …, pmCm) with Ci ∼ ((1 − αi)C1, αiC2),
u(L) = a Σi piαi + b = Σi pi(aαi + b) = Σi pi u(Ci)
so u(L) can be thought of as expected utility [and u is unique up to positive linear transformations]. If we accept axioms 1 to 5 then we should base our preferences on maximising the expected utility.
The Utility of Money
Let u(z) denote the utility of an amount of money z. Assumptions:
- u(0) = 0
- u(z) is increasing in z
- u(z) is twice differentiable
- u(z) is concave: u″(z) ≤ 0 because u′(z) decreases as z increases, which reflects the idea that an extra 10 to a rich person means very little compared with what it means to a poor person
- u(z) is linear near z = 0
- u(z) is bounded above; after 10^1000 no one would want any more
[End lecture 25, 10/12/14]
u(z) is very different for z < 0. Typical functions used to model u(z) are:
- u(z) = a log(1 + bz), a, b > 0 constants and b small (to get linearity near the origin)
- u(z) = z/(z + λ), λ > 0 constant. λ represents willingness to take risks: a high λ, love of risk; a low λ, risk aversion with this u(z)
- u(z) = 1 − e^{−λz}, λ > 0 constant
A risk-averse person prefers the certainty of small amounts of money. A risk-loving person prefers speculative
gains of large amounts to certain gains of small amounts.
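The effect of concavity can be made concrete with a certainty equivalent: the sure amount judged equal in utility to a gamble. A sketch using u(z) = a log(1 + bz) with a = 1 [the function name and the numbers are our own choices]:

```python
import math

def certainty_equivalent(b, z1, z2):
    """Sure amount with the same u(z) = log(1 + b z) utility as a
    50-50 gamble between z1 and z2."""
    eu = 0.5 * math.log(1 + b * z1) + 0.5 * math.log(1 + b * z2)
    return (math.exp(eu) - 1) / b   # invert u

ce = certainty_equivalent(0.01, 0, 200)   # about 73.2, well below the mean 100
```

A risk-averse decision-maker's certainty equivalent falls below the expected pay-off, matching the d1/d2 example at the start of the chapter.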
3 Bayesian Methods
If X, Y are continuous random variables then the conditional P.D.F. of Y given X = x is
fY|X(y|x) = fX,Y(x, y)/fX(x) = fX|Y(x|y)fY(y) / ∫ fX|Y(x|y)fY(y) dy
If x is fixed, fX(x) = constant and fY|X(y|x) ∝ fX,Y(x, y) = fX|Y(x|y)fY(y).
Example: Y ∼ Exp(λ), λ > 0, and X|Y = y ∼ Exp(y). We will calculate the conditional distribution of Y|X:
fY|X(y|x) ∝ fX|Y(x|y)fY(y) = ye^{−xy} λe^{−λy} ∝ ye^{−(x+λ)y}, y ∈ (0, ∞).
I.e. given X = x, Y has a Gamma(2, x + λ) distribution.
Bayesian Inference
Suppose we have some observations x1, …, xn, which are realisations of random variables X1, …, Xn that have a joint P.D.F. depending on an unknown parameter θ [θ may be vector-valued but will be scalar in an exam]. E.g. X1, …, Xn independent and identically distributed N(θ, 1) random variables where θ is unknown [this is called a random sample]. The Bayesian approach to estimating θ assumes that θ itself has a P.D.F. before we collect the data, i.e. we regard θ as a random variable. Denote this P.D.F. π(θ). We call it a prior P.D.F. for θ. It represents our knowledge and beliefs about θ before we collect the data.
Here's an example from the 2013/14 course: A patient sees a doctor. The patient either has a disease, D, or not. The doctor must diagnose which it is: θ = 1 is the event that the patient has D; θ = 0 is the event that the patient does not have D. The doctor believes P(θ = 1) = α, P(θ = 0) = 1 − α.
The doctor now does some tests, i.e. collects data. We will call this data X. We get:
P(θ = 1 | X = x) ∝ fX|θ(x | θ = 1) α
P(θ = 0 | X = x) ∝ fX|θ(x | θ = 0) (1 − α)
These are our updated beliefs about θ. In general, we use Bayes' Theorem as follows:
π_{θ|X}(θ|x) ∝ fX|θ(x|θ) π(θ)
x = (x1, …, xn), X = (X1, …, Xn). π_{θ|X}(θ|x) is called the posterior P.D.F. of θ. fX|θ(x|θ) is the joint P.D.F. of the data given a particular θ. As a function of θ, this joint P.D.F. is called the likelihood function, i.e. posterior P.D.F. of θ ∝ likelihood × prior P.D.F. of θ.
Examples:
1. X ∼ Bin(n, θ), n known, θ unknown. We're told π(θ) = 6θ(1 − θ), θ ∈ [0, 1]. Using Bayes:
π(θ|x) ∝ (n choose x) θ^x (1 − θ)^{n−x} · 6θ(1 − θ) ∝ θ^{x+1}(1 − θ)^{n−x+1}, θ ∈ [0, 1].
Hence the posterior P.D.F. of θ, given X = x, is that of a Beta(x + 2, n − x + 2) random variable. The prior distribution looks like something resembling a semi-circle; its posterior counterpart is more peaked, i.e. the variance has decreased. The posterior P.D.F. updates our feelings about θ in light of the data.
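The Beta update is a one-liner to verify; a sketch (the function name is ours):

```python
from fractions import Fraction as F

def beta_posterior(n, x):
    """Posterior for the Beta(2, 2) prior pi(theta) = 6 theta (1 - theta)
    with X ~ Bin(n, theta): a Beta(x + 2, n - x + 2) distribution."""
    a, b = x + 2, n - x + 2
    return a, b, F(a, a + b)   # parameters and posterior mean a/(a+b)

a, b, mean = beta_posterior(10, 7)   # Beta(9, 5) with mean 9/14
```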
2. X N (, 02 ), 02 is known, is unknown. We want to estimate . The prior distribution of [we are
told] is () = N (0 , 02 ) where 0 , 02 are both known. Use Bayes:
1 (x )2 ( 0 )2
=) (|x) / exp
+
[= prior likelihood]
2
2
02
0
The first term in the [ ] comes from the likelihood, and the second term from the prior. Think of this
as a function
proportional to another normal P.D.F.
of . Complete thesquare in . Get an 2expression
0 + 02
1
1
x
! |x N m, p where m = p 2 + 2 and p = 2 2
0
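The completed square amounts to "precisions add; means combine precision-weighted". A sketch with our own variable names (tau2 for the prior variance, to avoid clashing with the data variance):

```python
def normal_posterior(x, sigma2, mu0, tau2):
    """Posterior N(m, 1/p) for theta given one observation x ~ N(theta, sigma2)
    and the prior theta ~ N(mu0, tau2)."""
    p = 1 / sigma2 + 1 / tau2           # posterior precision
    m = (x / sigma2 + mu0 / tau2) / p   # precision-weighted mean
    return m, 1 / p

m, var = normal_posterior(x=3.0, sigma2=1.0, mu0=0.0, tau2=1.0)  # m=1.5, var=0.5
```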
Non-Informative Priors
If θ can only take a finite set of k values we might set π(θ) = 1/k. If θ ∈ [a, b] then take π(θ) = 1/(b − a), θ ∈ [a, b].
Problem: Any non-linear function of θ, say g(θ), is not uniformly distributed over [g(a), g(b)]. There's a worse problem if the range of values is infinite. If, for example, π(θ) = constant [finite] then ∫π(θ)dθ is infinite. This is an example of an improper prior as it doesn't integrate to 1. However, we can still use Bayes' Theorem as we often get a proper posterior distribution for θ [integrates to 1].
E.g. take X₁, ..., Xₙ a random sample from Exp(λ), λ unknown [λ > 0] [random sample means X₁, ..., Xₙ
are independently and identically distributed]. We assume π(λ) ∝ 1/λ, an improper prior. Then:
   π(λ|x) ∝ λⁿ exp(−λ Σᵢ₌₁ⁿ xᵢ) · (1/λ)   [by independence]   = λ^{n−1} exp(−λ Σᵢ₌₁ⁿ xᵢ)   [λ > 0]
Provided n ≥ 1 this is proper: λ | x ~ Gamma(n, Σᵢ₌₁ⁿ xᵢ).
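A quick sketch of this update (the sample values are invented; the Gamma(n, Σxᵢ) form is the one derived above, and its mean n/Σxᵢ is the standard shape/rate formula):

```python
# Improper prior pi(lambda) ∝ 1/lambda with an Exp(lambda) sample gives a
# proper Gamma(n, sum x_i) posterior (for n >= 1), so the posterior mean of
# lambda is n / sum(x).

def exp_posterior_mean(xs):
    n, s = len(xs), sum(xs)
    shape, rate = n, s       # posterior is Gamma(n, sum x_i)
    return shape / rate

xs = [0.5, 1.5, 2.0]         # made-up exponential sample
lam_hat = exp_posterior_mean(xs)
```

Note the posterior mean n/Σxᵢ is just the reciprocal of the sample mean, i.e. the maximum likelihood estimate, which is typical of this kind of non-informative prior.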
4 Decision Theory
4.1 Introduction
Decision making involves uncertainty. Firstly, make a list, D, of all possible decisions or actions. Secondly, make
a list of the unknowns. These are called the states of nature, Θ.
E.g. do you take an umbrella? You don't know if it will rain or not. Θ = {0, 1}: θ = 0, it rains; θ = 1, it does not
rain. For any d ∈ D and θ ∈ Θ we assume that we can evaluate the consequences and the utility of this
consequence. In decision theory, we talk of losses; let us define the following loss function:
   L(θ, d) = −utility of the action d when θ is the true state of nature
from which we can form a table of the states of nature {θᵢ}ᵢ₌₁^m against the actions/decisions {dⱼ}ⱼ₌₁^n, with entries
equal to L(θᵢ, dⱼ). We assume that all the Ls are finite. It will look a bit like the pay-off matrix of a two-person
zero-sum game: the rows, the states of nature, represent the set of pure strategies for player A [Nature] and
the columns the actions for player B [the decision maker]. We are trying to minimise L. However, there are differences:
• Nature is not playing like an intelligent opponent, i.e. it is not trying to maximise L
• What are the randomised strategies for both players? For Nature a randomised strategy is a
  probability distribution over Θ, i.e. a prior P.D.F. [or P.M.F.] π(θ) for θ
• We (i.e. player B) do not need to use any element of surprise on Nature, so there's less need for a
  randomised strategy for the decision maker
• We know π(θ), so we are trying to choose the best d ∈ D in the light of Nature playing the randomised
  strategy π(θ). I.e. we need to look for a Bayes strategy for the decision maker with respect to π(θ)
We should also note that even if we consider randomised actions [i.e. probability distributions δ over D], it can be shown
that:
   inf_{d∈D} L(θ, d) = inf_δ L(θ, δ)
So we lose nothing by restricting ourselves to pure strategies for the decision maker.
Definitions:
1. r(π, d) = E_π L(θ, d) = Σ_θ π(θ) L(θ, d) [or the corresponding integral] is the expected loss of d under π
2. A decision d* ∈ D is called a Bayes Decision or Bayes Action with respect to π(θ) if it minimises r(π, d),
   i.e. if r(π, d*) ≤ r(π, d) ∀ d ∈ D
3. If d* is a Bayes Decision then r(π, d*) is called the Bayes Loss of π
Example: Let π(θ₁) = 1/4, π(θ₂) = 3/4, with loss table:

        d₁   d₂
   θ₁    0    3
   θ₂    5    2

   r(π, d₁) = 0 × 1/4 + 5 × 3/4 = 15/4
   r(π, d₂) = 3 × 1/4 + 2 × 3/4 = 9/4

So d₂ is the Bayes Decision, with Bayes Loss 9/4.
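The worked example can be checked mechanically (a sketch using exact rational arithmetic; the prior and loss table are exactly the ones above):

```python
# Bayes decision for the worked example: prior pi = (1/4, 3/4) over
# (theta1, theta2) and the loss table L(theta, d) given in the notes.

from fractions import Fraction as F

prior = [F(1, 4), F(3, 4)]
L = {"d1": [F(0), F(5)],    # L(theta1, d1) = 0, L(theta2, d1) = 5
     "d2": [F(3), F(2)]}    # L(theta1, d2) = 3, L(theta2, d2) = 2

# r(pi, d) = E_pi L(theta, d) for each action
r = {d: sum(p * loss for p, loss in zip(prior, losses))
     for d, losses in L.items()}
bayes_decision = min(r, key=r.get)   # the action minimising expected loss
```

Using `Fraction` keeps the 15/4 and 9/4 exact rather than floating-point approximations.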
When data x is available and we use a decision rule d(x), the Bayes Risk is:
   r(π, d) = ∫_θ π(θ) [ ∫_x f(x|θ) L(θ, d(x)) dx ] dθ
where the integral in the square bracket is R(θ, d). The next step assumes we can interchange the order of
integration, which we can do as we're dealing with reality and expect not to have any pathological example:
   r(π, d) = ∫_x f_X(x) [ ∫_θ π(θ|x) L(θ, d(x)) dθ ] dx
using π(θ|x) = f(x|θ)π(θ) / f_X(x).
Looking at the last integral, once we have observed x we should minimise the [...] inside the main integral. If we
were to do this for each x then we would have minimised r(π, d). I.e. once we know x we need to minimise, for
any particular x:
   ∫_θ π(θ|x) L(θ, d(x)) dθ
So minimising the Bayes Risk ⟺ minimising the expected posterior loss ⟺ maximising the expected posterior
utility.
utility.
Example: We observe x from X, where X ~ N(θ, 1) [θ is unknown]. The prior distribution of θ is N(0, τ²). Then:
   θ | x ~ N( τ²x/(1 + τ²), τ²/(1 + τ²) )   [See Section 3]
We need a loss function. The most common in inference is square error loss:
   L(θ, d(x)) = (θ − d(x))²
There was some more material on this covered in the 2013/14 lecture series, which I'll include in an appendix
in section 6.1.
Lemma 13. Under square error loss [and a proper posterior distribution for θ], the Bayes Decision Rule is to
estimate θ by the mean of the posterior distribution for θ.
Proof. The expected posterior loss of a decision rule d(x) is:
   ∫ π(θ|x) (θ − d(x))² dθ
This is a ∪-shaped quadratic in d(x). We differentiate with respect to d(x) and set it equal to 0:
   d/d{d(x)} ∫ π(θ|x) (θ − d(x))² dθ = −2 ∫ π(θ|x) (θ − d(x)) dθ = 0
   ⟹ ∫ θ π(θ|x) dθ = ∫ π(θ|x) d(x) dθ = d(x)
i.e. d(x) is the mean of the posterior distribution of θ. N.B. ∫ π(θ|x) dθ = 1.
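The lemma can also be checked numerically on a toy discrete posterior (the distribution below is made up; the lemma is stated for a proper continuous posterior, but the quadratic argument is identical):

```python
# Numerical check of Lemma 13: under squared error loss the expected posterior
# loss E[(theta - d)^2] is minimised at d = posterior mean.

posterior = {0.0: 0.2, 1.0: 0.5, 2.0: 0.3}   # toy map theta -> pi(theta | x)
post_mean = sum(t * p for t, p in posterior.items())

def expected_loss(d):
    """Expected posterior loss of decision d under squared error."""
    return sum(p * (t - d) ** 2 for t, p in posterior.items())

# scan a grid of candidate decisions; the best sits next to the posterior mean
grid = [i / 100 for i in range(-100, 301)]
best = min(grid, key=expected_loss)
```

The grid minimiser lands within one grid step of the posterior mean 1.1, as the ∪-shaped quadratic guarantees.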
Examples:
1. In our X ~ N(θ, 1) example, with square error loss, the prior is N(0, τ²). The posterior distribution of θ
   is N( τ²x/(1 + τ²), τ²/(1 + τ²) ) [see Section 3].
   The mean of the posterior distribution for θ, i.e. the Bayes Rule, d*, for estimating θ with respect to π(θ),
   is d*(x) = τ²x/(1 + τ²).
2. X ~ Bin(n, θ). π(θ) = 6θ(1 − θ), θ ∈ [0, 1]. Squared error loss. We have π(θ|x) ∝ θ^{x+1}(1 − θ)^{n−x+1}, i.e.
   the posterior distribution for θ is Beta(x + 2, n − x + 2). The Bayes Rule d* for estimating θ with respect
   to π(θ) is the mean of this Beta distribution, i.e. d*(x) = (x + 2)/(n + 4).
   If instead we were interested in θ², we would use:
      E_{θ|X}(θ² | X = x) = var(θ|X = x) + [E(θ|X = x)]² = (x + 2)(n − x + 2)/[(n + 4)²(n + 5)] + [(x + 2)/(n + 4)]²
   from the Beta(x + 2, n − x + 2) mean and variance.
4.3 Decision Trees
Decision-making is usually sequential; most decision trees have many steps, which we use to represent decision
problems. There are 2 kinds of nodes [vertices]: decision nodes and chance nodes. Quite often these alternate.
Examples:
1. Dinner example: C₁ [overfull]; C₂ [okay but costly]; C₃ [great]; C₄ [starve]:
In this diagram, θ = P(meal), the square represents the decision node and the circles the chance nodes.
It is drawn from left to right: time goes from left to right. The Cᵢ are consequences [in practice with
attached utilities, sometimes called terminal utilities], and the probabilities leaving each chance node sum to 1.
3. A company is thinking of launching a new product that it has developed. The marketing executive
   estimates the profits/losses resulting from different market shares. The market share level, θ, is equal
   to one of 2% or 10%, with prior probabilities 0.3 and 0.7 [pay-offs in utility units]:

      Market Share Level (θ)   10%     2%
      Prior                    0.7    0.3
      Launch                   500   −250
      Don't launch               0      0

   [Cost for scrap is 0 because the research and development costs have already been sunk.]
      E(utility of launch) = 0.7 × 500 + 0.3 × (−250) = 275
      E(utility of scrap) = 0
   This is called a prior analysis of the problem, and it says launch. We might do better if we knew more about the market
   share [i.e. 10% or 2%]. We would launch if 10% and scrap if 2%.
      E(utility under perfect information) = 500 × 0.7 + 0 × 0.3 = 350
   The difference [350 − 275 = 75] is called the expected value of perfect information. It is not worth spending more than this to get information on the unknown market share.
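A sketch of this prior analysis and the EVPI computation (exact rational arithmetic; the numbers are exactly those of the table above):

```python
# Prior analysis and expected value of perfect information (EVPI) for the
# product-launch example: theta in {10%, 2%} with prior (0.7, 0.3).

from fractions import Fraction as F

prior = [F(7, 10), F(3, 10)]
utility = {"launch": [500, -250], "dont_launch": [0, 0]}

# expected utility of each action under the prior
expected = {d: sum(p * u for p, u in zip(prior, us)) for d, us in utility.items()}
best_prior = max(expected.values())            # 275: launch

# under perfect information we pick the best action state by state first
perfect = sum(p * max(us[i] for us in utility.values())
              for i, p in enumerate(prior))    # 0.7*500 + 0.3*0 = 350
evpi = perfect - best_prior                    # 75
```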
Suppose now that the company is offered a market research proposal at a cost of 10 utility units. The
market researchers will report one of:
• Market share will be high
• Market share will be low
Unfortunately "high" and "low" do not correspond exactly with 10% and 2%, as the M.R. company may
get things wrong:

      Market Research Report   High     Low
      Actual share 10%         0.85  (0.15)
      Actual share  2%         0.25    0.75

(0.15) = P(the market research report says low | the share is actually 10%).
Note the terminal utilities on the right hand side of the tree. All probabilities are conditional on everything to the
left. Exempli gratia:
i. P(MR says high) = P(high|10%)P(10%) + P(high|2%)P(2%) = 0.85 × 0.7 + 0.25 × 0.3 = 0.67
ii. P(10% | MR says high) = 0.85 × 0.7 / 0.67 = 0.89 by Bayes
How do we solve this? We fold back the tree from right to left. The purple numbers are obtained from
this folding back. The quadruple purple lines indicate to us not to take these decisions.
We take expectations at chance nodes; we maximise at decision nodes. In the top right corner, we take
expectations: 0.89(500 − 10) + 0.11(−250 − 10).
Repeat this for the other chance nodes on the right. Attach these expectations to the relevant chance nodes.
Then maximise at the decision nodes you meet next. For a more algorithmic specification:
work from right to left, taking expectations at chance nodes and maxima at decision nodes.
Work from right to left
4. The Marriage Problem. When should one get married?! Assume n suitable partners will be presented to
   you in a random order, one by one [n is known]. Having seen person j you can rank persons 1, 2, ..., j.
   You then must decide to reject person j [and move on to person j + 1] or accept person j. You cannot
   backtrack; you cannot accept one seen earlier. [If you were able to see all at once you could rank them
   all.] If you reach person n you must accept them. pᵢ > pⱼ denotes person i is preferred to person j.
   There are no ties.
   Terminal utility = P(best person has been selected). We want to maximise this. Consider the case n = 4.
   We need to calculate the terminal utilities. For example:
      P(p₃ is best | p₃ > p₂ > p₁) = (1/8)/(1/6) = 3/4
   If p₂ > p₁ then accept p₂. Otherwise see p₃ and only accept p₃ if p₃ > p₂ > p₁.
   In general the optimal strategy is a threshold rule: reject the first s = n − k candidates outright, then
   accept the first candidate better than everyone seen so far. Folding back, with u_{n−k} the value of the rule
   that starts accepting from person n − k + 1 onwards:
      u_{n−k} = max{ (n−k)/n [ 1/(n−k) + 1/(n−k+1) + ... + 1/(n−1) ], u_{n−k+1} }
   and the first branch is chosen as long as
      1/(n−k) + 1/(n−k+1) + ... + 1/(n−1) ≥ 1
   holds. Since
      1/(n−k) + ... + 1/(n−1) ≈ log( (n−1)/(n−(k+1)) ) → 1
   at the optimal threshold, we get s ≈ n/e, and the success probability
      (n−k)/n [ 1/(n−k) + ... + 1/(n−1) ] → 1/e ≈ 0.368
   After the threshold, choose the best so far, id est the first one better than all those seen so far.
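The threshold rule is easy to evaluate exactly (a sketch; `success_prob` implements the standard formula P(s) = (s/n) Σ_{j=s+1..n} 1/(j−1) for the rule that rejects the first s candidates, then accepts the first best-so-far):

```python
# Success probability of the threshold rule for the marriage (secretary)
# problem, and the optimal threshold for a given n.

def success_prob(s, n):
    """P(select the overall best) when rejecting the first s of n candidates."""
    if s == 0:
        return 1.0 / n          # accept the first person seen
    return (s / n) * sum(1.0 / (j - 1) for j in range(s + 1, n + 1))

def best_threshold(n):
    return max(range(n), key=lambda s: success_prob(s, n))

s4 = best_threshold(4)          # for n = 4: reject 1, then take the next best-so-far
p1000 = success_prob(best_threshold(1000), 1000)   # close to 1/e ~ 0.368
```

For n = 4 the optimal rule rejects only the first candidate and wins with probability 11/24, and as n grows the success probability approaches 1/e as stated above.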
The M4S11 mastery question will be on n-person cooperative games. The following books may be useful:
Introducing Game Theory and its Applications by Mendelson, Chapman and Hall
Introduction to Game Theory by Morris, Springer Verlag
The Theory of Games by Wang, Oxford
We're expected to study the following topics:
1. Coalitions
2. Characteristic Functions
3. Imputations
4. The core of an n-person cooperative game
Definition. Let P be the set consisting of all N players. A coalition, S, is a subset of P. The corresponding
counter-coalition to S is simply Sᶜ = P\S.
Clearly there are 2^N coalitions. We call P the grand coalition and its complement is the empty coalition, ∅.
With a coalition S, it's natural to think of the game as having two players: S and Sᶜ. We can rewrite this
game in bi-matrix form, with the first coordinate of each tuple the sum of the pay-offs to the players in S and
the second coordinate the sum of the pay-offs to the players in Sᶜ.
Definition. The maximin value for the coalition S, denoted v(S), is called the characteristic function. Its
domain is the set of all coalitions.
Note: Obviously v(P) = the largest total pay-off which the set of all players can achieve, and v(∅) = 0.
Theorem (Superadditivity). Let S and T be two disjoint coalitions. Then:
   v(S ∪ T) ≥ v(S) + v(T)
Proof. By the definition of a characteristic function, there's a joint strategy for the members of S such that the
total pay-off to the members of S is at least v(S). Similarly, there's a joint strategy for the members of T such that
the total pay-off to them is at least v(T). Since S and T are disjoint, if players in the two coalitions play according
to these strategies, then the total pay-off to the union is guaranteed to be at least v(S) + v(T), and hence the
maximin value for the union is at least this value, i.e. superadditivity holds.
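A characteristic function is conveniently represented as a map from coalitions to values, and superadditivity can then be checked by brute force (a sketch; the 3-player numbers are invented for illustration):

```python
# Characteristic function as a dict over frozensets, plus a brute-force
# superadditivity check: v(S ∪ T) >= v(S) + v(T) for all disjoint S, T.

from itertools import combinations

players = frozenset({1, 2, 3})
v = {frozenset(): 0,
     frozenset({1}): 0, frozenset({2}): 0, frozenset({3}): 0,
     frozenset({1, 2}): 2, frozenset({1, 3}): 2, frozenset({2, 3}): 3,
     players: 4}

def is_superadditive(v):
    for S, T in combinations(list(v), 2):
        if S.isdisjoint(T) and v[S | T] < v[S] + v[T]:
            return False
    return True

ok = is_superadditive(v)
```

For games this small the 2^N coalitions are trivially enumerable; the same dictionary representation is reused below when checking imputations and the core.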
Definition. A game in characteristic function form comprises a set of players P = {P₁, ..., P_N} together with
a function v defined on all subsets of P such that v(∅) = 0 and superadditivity holds, i.e. v(S ∪ T) ≥ v(S) + v(T)
for disjoint subsets S and T of P.
Definition. An N-person game, v, in characteristic function form is said to be inessential if:
   v(P) = Σᵢ₌₁^N v({Pᵢ})
and essential if:
   v(P) > Σᵢ₌₁^N v({Pᵢ})
Definition. An imputation is an N-tuple x = (x₁, ..., x_N) of payments to the players satisfying:
1. Individual Rationality: xᵢ ≥ v({Pᵢ}) for each i
2. Collective Rationality: We have:
   Σᵢ₌₁^N xᵢ = v(P)
The first condition is an intuitively obvious imposition: if xᵢ were < v({Pᵢ}) then player Pᵢ would do better
on their own. For the second condition, consider why Σᵢ₌₁^N xᵢ ≤ v(P): suppose x occurs, i.e. a coalition S forms with such a profit split. Then, using superadditivity:
   Σᵢ₌₁^N xᵢ = Σ_{Pᵢ∈S} xᵢ + Σ_{Pᵢ∈Sᶜ} xᵢ = v(S) + v(Sᶜ) ≤ v(P)
Theorem. Let v be an N-person game in characteristic function form. If v is inessential, then it has only
one imputation:
   x = (v({P₁}), ..., v({P_N}))
and if v is essential then it has infinitely many imputations.
Proof. Suppose that v is inessential and that x is an imputation. If for some j we had xⱼ > v({Pⱼ}) then
Σᵢ₌₁^N xᵢ > Σᵢ₌₁^N v({Pᵢ}) = v(P), contradicting collective rationality.
Now suppose that v is essential and let:
   α = v(P) − Σᵢ₌₁^N v({Pᵢ}) > 0
Then for any N-tuple (ε₁, ..., ε_N) of positive numbers summing to α we have xᵢ = v({Pᵢ}) + εᵢ, which clearly defines
an imputation. So, with infinitely many choices for (ε₁, ..., ε_N), there are infinitely many imputations when a game is
essential.
For an essential game, there are too many imputations. So we need a way to single out the ones which merit
the title of a solution. The following definition attempts to formalise the notion of one imputation being
preferred over another.
Definition. Let v be a game in characteristic function form, let S be a coalition and let x and
y be imputations. We say that x dominates y through the coalition S if the following two conditions hold:
1. xᵢ > yᵢ ∀ Pᵢ ∈ S
2. Σ_{Pᵢ∈S} xᵢ ≤ v(S)
The second condition merely tells us that such an x is feasible: the coalition can attain enough pay-off so
as to distribute the pay-offs as prescribed. Now we'll meet a solution concept, albeit one that's fundamentally
flawed in that it is sometimes empty!
Definition. Let v be a game in characteristic function form. The core of v consists of all imputations which are not
dominated by any other imputation through any coalition.
So if x is in the core, then no group of players has reason to form a coalition and replace x with a different
imputation. At first, it looks difficult to decide whether x is in the core, but the following theorem will help
make this easier.
Theorem. Let v be a game in characteristic function form with N players and let x be an imputation. x
is in the core of v if and only if:
   Σ_{Pᵢ∈S} xᵢ ≥ v(S)   for every coalition S
Proof. Suppose this formula holds for every coalition S. If some other imputation w dominated x through a
coalition S then:
   Σ_{Pᵢ∈S} wᵢ > Σ_{Pᵢ∈S} xᵢ ≥ v(S)
contradicting the feasibility condition Σ_{Pᵢ∈S} wᵢ ≤ v(S). So x is in the core.
Now suppose that x is in the core and suppose, for a contradiction, that S is a coalition such that:
   Σ_{Pᵢ∈S} xᵢ < v(S)
Note that S ≠ P, otherwise collective rationality in the definition of an imputation would be violated. Next,
there has to exist a Pⱼ ∈ Sᶜ such that xⱼ > v({Pⱼ}). If this weren't true then by superadditivity:
   Σᵢ₌₁^N xᵢ < v(S) + Σ_{Pᵢ∈Sᶜ} xᵢ ≤ v(S) + v(Sᶜ) ≤ v(P)
contradicting collective rationality. Now let k be the number of players in S, choose ε > 0 with
ε ≤ v(S) − Σ_{Pᵢ∈S} xᵢ and ε ≤ xⱼ − v({Pⱼ}), and define:
   wᵢ = xᵢ + ε/k for Pᵢ ∈ S
   wⱼ = xⱼ − ε
   wᵢ = xᵢ for all other i
Then w dominates x through S, and so the assumption that x is in the core is contradicted.
The next Corollary gives a more convenient form of the above result, which will allow us to calculate the core
of a game more easily.
Corollary. Let v be a game in characteristic function form with N players and let x be an N-tuple of numbers.
Then x is an imputation in the core if and only if the following two conditions hold:
1. Σᵢ₌₁^N xᵢ = v(P)
2. Σ_{Pᵢ∈S} xᵢ ≥ v(S) for every coalition S
Proof. Certainly an imputation in the core satisfies the two above conditions. Now let x satisfy both conditions. The second condition applied to one-player coalitions shows that individual rationality holds. The first
condition is collective rationality, and so x is an imputation. And it's certainly in the core, by the previous theorem.
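The Corollary gives a direct membership test (a sketch; the illustrative 3-player game and the two candidate imputations are made up):

```python
# Core membership via the Corollary: x is in the core iff sum(x) = v(P) and,
# for every coalition S, sum of x_i over S is at least v(S).

from itertools import chain, combinations

players = (1, 2, 3)
v = {frozenset(): 0,
     frozenset({1}): 0, frozenset({2}): 0, frozenset({3}): 0,
     frozenset({1, 2}): 2, frozenset({1, 3}): 2, frozenset({2, 3}): 3,
     frozenset(players): 4}

def coalitions(players):
    """All subsets of the player set, as tuples."""
    return chain.from_iterable(combinations(players, r)
                               for r in range(len(players) + 1))

def in_core(x, v, players):
    if sum(x.values()) != v[frozenset(players)]:
        return False                            # collective rationality fails
    return all(sum(x[i] for i in S) >= v[frozenset(S)]
               for S in coalitions(players))

x_good = {1: 1, 2: 1.5, 3: 1.5}
x_bad = {1: 4, 2: 0, 3: 0}   # {2,3} would defect: they get 0 < v({2,3}) = 3
```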
Definition. Let v be a game in characteristic function form. We say that v is constant-sum if, for every
coalition S, we have:
   v(S) + v(Sᶜ) = v(P)
Further, it's zero-sum if it's constant-sum and v(P) = 0.
The concepts of zero-sum and constant-sum are not the same in normal and characteristic function forms; the
two aren't equivalent. It's possible for a game which is not constant-sum in its normal form to be constant-sum
in its characteristic function form.
Theorem. If an N-person game is zero-sum in its normal form, then it is also zero-sum in its characteristic
function form.
Proof. Suppose that the game weren't zero-sum in its characteristic function form, i.e. for some coalition S we
have v(S) ≠ −v(Sᶜ). Then, if the players in the coalition S and the counter-coalition Sᶜ adopt the strategies
that give them the maximin pay-offs v(S) and v(Sᶜ) in the normal form game, then the total pay-off to all
players would be v(S) + v(Sᶜ) ≠ 0. But this is a contradiction: since the game is zero-sum in the normal form
we know that Σᵢ₌₁^N πᵢ(x₁, ..., x_N) = 0 for all strategies x₁, ..., x_N for players P₁, ..., P_N.
Theorem. If an N-person game is constant-sum in its normal form, then it is also constant-sum in its
characteristic function form.
Proof. Let c be the constant value of the normal-form game Γ, i.e. Σᵢ₌₁^N πᵢ(x₁, ..., x_N) = c for all choices of
strategies x₁, ..., x_N for players P₁, ..., P_N respectively. We define a new game Γ′ by subtracting c/N from
every pay-off in Γ. Then πᵢ′(x₁, ..., x_N) = πᵢ(x₁, ..., x_N) − c/N for every choice of i and for all choices of
strategies. Then Γ′ is zero-sum, and thus the characteristic function form of Γ′, u, is zero-sum. But it's easy
to see that the characteristic function v of Γ is related to u by the formula:
   v(S) = u(S) + kc/N
where k is the number of players in the coalition S. So clearly v is constant-sum: v(S) + v(Sᶜ) = u(S) + u(Sᶜ) + c = c = v(P).
Theorem. If v is both essential and constant-sum, then its core is empty.
Proof. Suppose v has players {Pᵢ}ᵢ₌₁^N. We'll prove this by showing that if v is constant-sum and there's an
imputation x in its core then v must be inessential. We know, for any player Pⱼ, that xⱼ ≥ v({Pⱼ}) by
individual rationality. Since x is in the core, we also have:
   Σ_{i≠j} xᵢ ≥ v({Pⱼ}ᶜ) = v(P) − v({Pⱼ})
by the constant-sum property. But Σ_{i≠j} xᵢ = v(P) − xⱼ by collective rationality, so xⱼ ≤ v({Pⱼ}).
Hence the inequality is actually an equality and so xⱼ = v({Pⱼ}). Since this holds
for every j, the game is inessential.
Definition. A game v in characteristic function form is called simple if all of the below hold:
• v(S) is either 0 or 1 for every coalition S
• v(P) = 1
• v({Pᵢ}) = 0 ∀ Pᵢ ∈ P
In a simple game, a coalition S with v(S) = 1 is called a winning coalition and a coalition with v(S) = 0 is
called a losing one.
6.1 Appendix - 2013/14 Material
   L(θ, d₁) = −Aθ + B(1 − θ)
   L(θ, d₂) = 0
(b) Calculation of the Bayes Decision Rule. Suppose now that we have a prior distribution for θ and
that it is Beta(α, β), where α and β are known. So:
   π(θ|x) ∝ θ^x (1 − θ)^{n−x} · θ^{α−1} (1 − θ)^{β−1} = θ^{α+x−1} (1 − θ)^{β+n−x−1}
i.e. the posterior distribution of θ is Beta(α + x, β + n − x), with mean (α + x)/(α + n + β).
We want to choose the action [d₁ or d₂] to minimise the expected posterior loss, so we calculate:
   E_{θ|X} L(θ, d₁) = −(A + B) E_{θ|X}(θ | X = x) + B = −(A + B)(α + x)/(α + n + β) + B
So, we drill if (A + B)(α + x)/(α + n + β) ≥ B, i.e. when x ≥ [(n + β)B − αA]/(A + B).
Example to show risk function [assume squared error loss]: R(θ, d_c) = E_{X|θ} L(θ, d_c(X)), where d_c(x) = c x̄,
c = constant and x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ, with X̄ defined similarly. Here E X̄ = θ and var X̄ = 1/n, so:
   R(θ, d_c) = E_{X|θ}(cX̄ − θ)² = var(cX̄ − θ) + (mean(cX̄ − θ))² = c²/n + θ²(c − 1)²
In the range [A, B], d_{1/2} is better than d₁, but we don't know θ! Note that if c > 1 then:
   R(θ, d₁) = 1/n < c²/n + θ²(c − 1)² = R(θ, d_c)
as above, so d_c is inadmissible for c > 1.
Under absolute error loss L(θ, d) = |θ − d|, the expected posterior loss is:
   ∫ π(θ|x) |θ − d(x)| dθ = ∫_{−∞}^{d} (d − θ) π(θ|x) dθ + ∫_{d}^{∞} (θ − d) π(θ|x) dθ
Now let us differentiate with respect to d [N.B. the limits of the integrals involve d itself]:
   ∫_{−∞}^{d} π(θ|x) dθ − ∫_{d}^{∞} π(θ|x) dθ = 0   ⟹   ∫_{−∞}^{d} π(θ|x) dθ = 1/2
so the Bayes Rule is the posterior median [Exercise]. We have used Leibniz's rule:
   ∂/∂a ∫_{−∞}^{g(a)} f(a, x) dx = ∫_{−∞}^{g(a)} (∂f/∂a) dx + g′(a) f(a, g(a))
6.2 Mid-Term 1 - 13/11/14
Part I
   G = ( 3 6
         1 0 )
(a₁, b₁) is a PSSP because 3 < 6 [row 1] and 3 > 1 [column 1]. The game thus has value V = 3.
Part II
   G = ( 1 0 2
         0 2 1
         2 1 0 )
Consider α = β = (1/3, 1/3, 1/3): α is maximin and β is minimax. V = 1.
Part III
   G = ( 1 4 6
         2 0 3
         1 2 8 )
b₃ is inadmissible [compare it with b₁ or b₂]. So we delete b₃, and after doing so we see that a₃ is now inadmissible
[compare it with a₁]. So we then delete a₃. We're left with:
   ( 1 4
     2 0 )
A has ES α = (2/5, 3/5) and B has ES β = (4/5, 1/5).
Hence α is maximin in this sub-game and β is minimax. For the whole game, α = (2/5, 3/5, 0) and β = (4/5, 1/5, 0)
are maximin and minimax respectively. V = 8/5.
Part IV
x, y ∈ ℝ_{>0}:
   G = (  x  2x
         2y   y )
(a₁, b₁) is a PSSP if x ≥ 2y [in this case V = x]; similarly (a₂, b₂) is a PSSP if y ≥ 2x [V = y].
When 1/2 < x/y < 2, we look for equaliser strategies:
   α = ( y/(x+y), x/(x+y) ) is an ES for A
   β = ( (2x − y)/(x+y), (2y − x)/(x+y) ) is an ES for B
Hence α is maximin and β is minimax, with V = xy/(x+y) + 2xy/(x+y) = 3xy/(x+y).
Part V
   G = ( 1   x   x²
         x   x²  x³ )
1. x > 0, x ∈ ℝ: if 0 < x ≤ 1 then (a₁, b₃) is a PSSP, since x² ≤ 1, x² ≤ x and x² ≥ x³. V = x².
2. x = −2 [considering subgames]:
   G = (  1  −2   4
         −2   4  −8 )
There aren't any obvious inadmissible strategies. The idea here is to solve the three 2 × 2 sub-games to see if any can be
extended to a solution of the whole game. Deleting b₃, the remaining sub-game gives α = (2/3, 1/3) and β = (2/3, 1/3);
extending, α = (2/3, 1/3) and β = (2/3, 1/3, 0) solve the whole game, with V = 0.
If we had chosen to disregard b₁ instead of b₃ we would have obtained an alternate solution to the whole
game: α = (2/3, 1/3), β = (0, 2/3, 1/3).
The other 2 × 2 sub-game has a PSSP at (a₁, b₁), but Lemma 3 shows [very easily] that this cannot be
extended to a solution of the whole game. In fact, any β = λ(2/3, 1/3, 0) + (1 − λ)(0, 2/3, 1/3) with α = (2/3, 1/3),
for λ ∈ [0, 1], gives a solution.
6.3 Mid-Term 2 - 11/12/14
Part I
a. Note that (3, 8) lies on the line joining (2, 9) and (7, 4) [x + y = 11]. S, the risk set, is the triangle with
   vertices (2, 9), (7, 4) and (8, 2).
b. The line segments (2, 9) → (7, 4) and (7, 4) → (8, 2) are the admissible strategies for A.
c. s_M = λ(7, 4) + (1 − λ)(3, 8) = (3 + 4λ, 8 − 4λ) lies on y = x if λ = 5/8. So α = (5/8)a₃ + (3/8)a₄ is maximin for
   A. The line joining (3, 8) and (7, 4) is x/2 + y/2 = 11/2, so w = (1/2, 1/2) is minimax and V = 11/2. An alternative
   maximin strategy for A is α̃ = (7/10)a₃ + (3/10)a₁ [or any strategy of the form pα + (1 − p)α̃ with p ∈ [0, 1]].
d. With weights λ = (2/5, 3/5), the maximum of the four values below is 31/5:
      g(a₁, λ) = (2/5) × 2 + (3/5) × 9 = 31/5
      g(a₂, λ) = (2/5) × 8 + (3/5) × 2 = 22/5
      g(a₃, λ) = (2/5) × 7 + (3/5) × 4 = 26/5
      g(a₄, λ) = (2/5) × 3 + (3/5) × 8 = 30/5
Part II
Part III
(a₃, b₁) is an equilibrium pair because 3 > 2 > 1 [column 1] and 6 > 5 [row 3], for A and B respectively.
(a₁, b₂) is an equilibrium pair because 3 > 1 and 3 > 2 [column 2] and 2 = 2 [row 1], for A and B respectively.
A's pay-offs are:
   ( 1 3 2
     2 1 3
     3 2 1 )
An equaliser strategy for B using these pay-offs is β = (1/3)b₁ + (1/3)b₂ + (1/3)b₃.
B's pay-offs are:
   ( 2 2 2
     3 5 1
     6 5 5 )
An equaliser strategy for A using these pay-offs is α = 1·a₁, as g_B(a₁, bⱼ) = 2 for j = 1, 2, 3.
Since α and β equalise each other's pay-offs, (α, β) is an equilibrium pair.
6.4
The answers that follow were all written by Lynda White, and have been copied into this document.
Question 19
The pairs of pure joint strategies form a rectangle and the pay-off set consists of two triangles:
(a₁, b₂) and (a₂, b₁) are both admissible and in equilibrium, but they are not interchangeable, so the game is
not solvable in the strict sense.
Question 20
The pay-off set is the whole of the convex set determined by the four points representing pairs of pure strategies.
The pairs (aᵢ, b₂) [i = 1, 2] are both equilibrium pairs and so, therefore, is any pair of the form (α, b₂). Show
that there are no other equilibrium pairs by considering x = 2pq − q + 4, y = p − q + 2.
The game is Nash solvable. The security levels for A and B are 3 and 2 respectively, and the Shapley solution
is (a₁, b₂), as (a₁, b₂) is the only admissible pair of strategies. Draw a diagram!
Question 21
a. If α = (p, 1 − p) and β = (q, 1 − q) then [equating expected pay-offs in terms of x and y]
      p = (10 − x − 3y)/(20 − 4x − 4y)   and   q = (10 − 3x − y)/(20 − 4x − 4y)
   i. Hence either p = 1 and q ≥ 3/4, or p = 0 and q ≤ 3/4, or q = 3/4.
      If p = 1: g_B(α, β) = 4pq − p − 2q + 3 ≥ g_B(α, b₁) ⟹ 2q + 2 ≥ 4 ⟹ q ≥ 1 > 3/4
      If p = 0: g_B(α, β) = 4pq − p − 2q + 3 ≥ g_B(α, b₂) ⟹ 3 − 2q ≥ 3 ⟹ q ≤ 0 < 3/4
      Hence q = 3/4 and g_B(α, β) = 2p + 3/2. But g_B(α, b₁) = 3p + 1 and g_B(α, b₂) = 3 − p ⟹ p = 1/2. That
      is, p = 1/2, q = 3/4 is the only equilibrium pair and the game is Nash solvable.
   ii. If the game is played cooperatively, s_A = s_B = 5/2. The negotiation set is the line joining (7/2, 5/2) to
      (5/2, 7/2) and the Shapley solution is given by the point (3, 3), by symmetry. This corresponds to α = a₁
      and β = (1/2)b₁ + (1/2)b₂.
b.
   i. Non-cooperative game: since each edge of the convex hull of the four pure strategy points can be
      achieved by suitable p and q, the pay-off set is the whole of the convex hull [draw a diagram]. The
      jointly admissible strategies are those corresponding to the line segment joining (2, 5) and (3, 0). The
      point (2, 1) represents a pair of pure strategies in equilibrium. Using a method similar to that in a.i.,
      show that this is the only equilibrium pair. The game is Nash solvable.
   ii. Cooperative game: show that s_A = 2, s_B = 1. The negotiation set N is the line segment joining (2, 5)
      to (11/4, 5/4) [whose equation is y = 15 − 5x], and the Shapley solution is (s₁, s₂) where (s₁ − 2)(s₂ − 1)
      is maximum over N. Differentiation gives s₁ = 12/5, s₂ = 3, which is in fact in N. The Shapley
      solution is α = (2/5)a₁ + (3/5)a₂, β = b₂.
Question 22
Draw a diagram. The pay-off set is a square. The admissible strategy pairs correspond to points on the line
segment joining (a − b, a + b) to (a, b). The security levels for A and B are respectively 0 and b. The Shapley
solution maximises s₁(s₂ − b) such that as₁ + bs₂ = a² + b². This gives s₁ = a/2 and s₂ = (a² + 2b²)/(2b) by differentiation.
If a/2 > a − b, i.e. if b > a/2, the Shapley solution is given by x = a/2 and y = (a² + 2b²)/(2b), which is in the negotiation
set. Otherwise the Shapley solution is at (a − b, a + b).
Question 23
L1 > L2 ⟹ u(5000) > 0.1u(25000) + 0.89u(5000) + 0.01u(0), and L4 > L3 ⟹ 0.1u(25000) + 0.9u(0) >
0.11u(5000) + 0.89u(0).
Rearrange each of these to give two contradictory inequalities. Many people would say L1 > L2 and L4 > L3,
but if you believe the Lottery axioms these preferences are contradictory.
Question 24
Plot a graph! Note that u(z) and u′(z) are both continuous at z = 1/λ.
u′(z) = 2λ − 2λ²z for 0 ≤ z ≤ 1/λ, and zero otherwise. Hence u(z) is a non-decreasing function of z, reflecting
the fact that people prefer more money to less. u″(z) = −2λ² for 0 ≤ z ≤ 1/λ, and zero otherwise. So u(z) is a
concave function of z, reflecting the fact that people tend to avoid taking risks.
Also u(z) → 1 as z → ∞, which reflects the fact that for most people there is an upper limit to the amount of
money they want. High values of λ correspond to high risk aversion.
u(L1) = 0.4(20λ − 100λ²) + 0.4(10λ − 25λ²) and u(L2) = 0.1(20λ − 100λ²) + 0.9(10λ − 25λ²), since 1/λ > 10.
From this we deduce that L1 > L2 iff λ < 2/35.
When λ = 0.01, 1/λ = 100, and both 10 < 100 and 5 < 100, so the quadratic branch of u applies. Hence
u(L1) = 0.4u(10) + 0.4u(5) = 0.4(0.2 − 0.01) + 0.4(0.1 − 0.0025) and u(L2) = 0.1u(10) + 0.9u(5) =
0.1(0.2 − 0.01) + 0.9(0.1 − 0.0025).
2
Question 25
1. u(L) = (p × 0.67) + ((1 − p) × 0.59) + ...
Question 26
Let X be the number of people out of the 5 who win the lottery. Then X ~ Bin(5, 2/3). The reward to each
person is then (1/5)(10000X − 10000(5 − X)) = 4000X − 10000. The expected utility for each person is therefore
Σⱼ₌₀⁵ u(4000j)P(X = j) = (1/3⁵){u(0) + 10u(4000) + 40u(8000) + 80u(12000) + 80u(16000) + 32u(20000)}.
Apart from u(0) and u(20000), we do not know any of these utilities exactly, but if we assume that the utility
function is concave we have:
   u(4000) ≥ 0.4u(2500) + 0.6u(5000)
   u(8000) ≥ 0.4u(5000) + 0.6u(10000)
   u(12000) ≥ 0.6u(10000) + 0.4u(15000)
   u(16000) ≥ 0.8u(15000) + 0.2u(20000)
Hence E(utility) ≥ 0.89, which is greater than the utility [= 0.85] of each person's current assets, so it is worth
each person taking part in the lottery.
Question 27
a. For xᵢ = m, m + 1, ... we have P(Xᵢ = xᵢ) = C(xᵢ − 1, m − 1) θ^m (1 − θ)^{xᵢ − m}. By Bayes, with a Beta(α, β) prior:
      π(θ|x) ∝ θ^{mn + α − 1} (1 − θ)^{Σxᵢ − mn + β − 1}
   Hence the posterior distribution of θ is Beta(mn + α, Σxᵢ − mn + β), and we can then find the posterior
   mean and variance.
b. π(λ|x) ∝ λⁿ e^{−λΣxᵢ} · λ^{a−1} e^{−λ/b} = λ^{a+n−1} e^{−λ(Σxᵢ + 1/b)}. Hence the posterior distribution of λ is
   Gamma(a + n, Σxᵢ + 1/b), and we can then find the posterior mean and variance.
Question 28
   π(θ|x) ∝ (1/θⁿ) · (b a^b / θ^{b+1})   for θ > a and 0 < x₁, ..., xₙ < θ
These two conditions on θ can be expressed as θ > max{a, max{xᵢ}}. The result follows immediately.
Question 29
   π(β|y) ∝ exp{ −(1/2) Σ (yᵢ − βxᵢ)² }
Now complete the square in β and show that the posterior distribution of β is normal with mean Σxᵢyᵢ / Σxᵢ² and
variance 1/Σxᵢ².
Question 30
Let M be the random variable representing the smaller amount, m, and let X and Y be the amounts in your
envelope and the other envelope respectively. For any m, we have:
   P(X = m | M = m) = P(X = 2m | M = m) = 1/2
since the two envelopes are allocated at random. By Bayes:
   P(M = x | X = x) = P(X = x | M = x)π(x) / [ P(X = x | M = x)π(x) + P(X = x | M = x/2)π(x/2) ]
                    = π(x) / [ π(x) + π(x/2) ]
Hence:
   E(reward|swap) = E(Y | X = x) = (x/2)P(Y = x/2 | X = x) + 2x P(Y = 2x | X = x)
                  = (x/2)P(M = x/2 | X = x) + 2x P(M = x | X = x)
                  = x [ π(x/2) + 4π(x) ] / [ 2(π(x) + π(x/2)) ]
Question 31
a. R(θ, d_c) = E_{X|θ}(cX − θ)² = c² var(X) + (c − 1)²θ² = c²θ + (c − 1)²θ², since X ~ Poisson(θ).
b. Under the square error loss the Bayes Rule for estimating θ is the posterior mean. The posterior density
   of θ is π(θ|x) ∝ e^{−θ}θ^x · θ^{α−1}e^{−βθ}.
   Hence the posterior distribution of θ is Gamma(α + x, 1 + β) and E(θ|x) = (α + x)/(1 + β), i.e. the Bayes Rule
   is d_B(x) = (α + x)/(1 + β).
   The risk of d_B is:
      R(θ, d_B) = var_{X|θ}( (α + X)/(1 + β) ) + ( E_{X|θ}[(α + X)/(1 + β)] − θ )² = θ/(1 + β)² + ( (α − βθ)/(1 + β) )²
   Averaging over the Gamma(α, β) prior, with E(θ) = α/β and var(θ) = α/β², the Bayes Risk is:
      r(π, d_B) = α / (β(1 + β))
Question 32
[This material wasn't covered in lectures.]
Question 33
R(θ, d_c) = E_{X|θ}(cX − θ)² = c² var(X) + [(c − 1)θ]², where var(X) = θ(θ − 1). We therefore get:
   R(θ, d_c) = c²θ(θ − 1) + [(c − 1)θ]² = [c² + (c − 1)²]θ² − c²θ
Now:
   R(θ, d_c) − R(θ, d₁) = (c² − 1)θ(θ − 1) + [(c − 1)θ]² > 0 for c > 1
so d₁ is better than d_c whenever c > 1.
When θ has a prior P.D.F. proportional to 1/θ² you can find, by Bayes' Theorem [and some integration to get the
normalising constant]:
   π(θ|x) = x(x + 1)(θ − 1)^{x−1} / θ^{x+2},   θ > 1
You can then find the Bayes Rule by calculating the expected posterior mean of θ:
   E(θ|x) = ∫₁^∞ θ · x(x + 1)(θ − 1)^{x−1} θ^{−(x+2)} dθ
which [after an obvious substitution] gives E(θ|x) = x + 1.
However, it is easier to note that the prior distribution of φ = 1/θ is Uniform[0, 1], and so the posterior P.D.F. of
φ is x(x + 1)(1 − φ)^{x−1}φ. We then calculate E(θ|x) = E(1/φ | x) = ∫₀¹ x(x + 1)(1 − φ)^{x−1} dφ = x + 1.
Question 34
[This material wasn't covered in lectures.]
Question 35
R(θ, d_c) = E_{X|θ}(c ΣXᵢ² − θ)² = var(c ΣXᵢ²) + [(cn − 1)θ]² = [2c²n + (cn − 1)²]θ², which is minimised with
respect to c when c = c₀ = 1/(n + 2).
Let λ = 1/θ. Then π(λ|x) ∝ λ^{n/2} exp(−(λ/2)Σxᵢ²) · λ^{m−1} exp(−λ/β). So the posterior distribution of λ is:
   Gamma( m + n/2, 1/β + (1/2)Σxᵢ² )
The Bayes Rule under square error loss is the posterior mean of θ = 1/λ:
   E(θ|x) = (1/β + (1/2)Σxᵢ²) / (m + n/2 − 1) = (2/β + Σxᵢ²) / (2m + n − 2) = c′(2/β + Σxᵢ²) with c′ = 1/(2m + n − 2)
[exercise]. Note c′ ≠ 1/(n + 2) in general.
Question 36
If 12 < < 32 we find, on folding back, that choosing box A gives expected utility 2 + max 2 , 3 43 and that
choose box B gives expected utility +2
4 . Hence the utility of box A is less than that of box B if and only if:
3 3
1
max
,
<
2
4
2 4
By plotting the three relevant lines, we see that this will always be the case and we therefore choose box B. If
> 23 , we find that the utilities of choosing the two boxes are both and so it doesnt matter which box we
choose.
1
1+
+
=
2
4
4
1
2
1+
4
2
1+
Question 37
Let F denote a faulty machine, R denote the number of alarms that ring and H denote overhaul.
With perfect information, the expected utility is 0.8 × 1000 + 0.2 × 700 = 940. Without further information, the
expected utility of H is 0.8 × 800 + 0.2 × 700 = 780 and the expected utility of not overhauling is
0.8 × 1000 + 0.2 × 0 = 800.
Hence the expected value of perfect information is 940 − max{780, 800} = 140. Since each scanning device
costs 50, we can take n = 0, 1, 2.
1. For n = 1 show that P(R = 0) = P(R = 1) = 0.5
2. For n = 2 show that P(R = 0) = 0.29, P(R = 1) = 0.42, P(R = 2) = 0.29
Hence calculate the conditional probabilities of F and F̄ given the possible values of R for the cases n = 1 and
n = 2.
Taking into account the costs of the scanning devices, the optimal decision is n = 1 [utility = 812]. One should
use one scanning device and then overhaul if the alarm sounds. [See the below diagram, shamelessly copied
from Lynda's notes.]
Question 38
Let M = minor fault; S = serious fault; T_M = examination indicates M; T_S = examination indicates S. We
have:
   P(M) = 0.9            P(S) = 0.1
   P(T_M) = 37/50        P(T_S) = 13/50
   P(M | T_M) = 36/37    P(S | T_M) = 1/37
   P(M | T_S) = 9/13     P(S | T_S) = 4/13
Folding back the decision tree, we see that the detailed examination should be selected if 13/50 + c < 4/10, i.e. if
c < 7/50, i.e. if the cost of the examination is less than 140,000.
n = 1:
   P(R = 0) = P(R = 0 | F)P(F) + P(R = 0 | F̄)P(F̄)
   P(F | R = 0) = P(R = 0 | F)P(F) / P(R = 0)
&c.
n = 2:
   P(R = 0) = P(R = 0 | F)P(F) + P(R = 0 | F̄)P(F̄)
   P(R = 1) = P(R = 1 | F)P(F) + P(R = 1 | F̄)P(F̄)