
Math A77 : Homework 2

[due Mon Sep 22]

1. Choose an $m \in \{150, 151, \dots, 5000\}$ and an $a \in \{2, 3, \dots, m-1\}$ so that the linear congruential generator
(LCG)
$$R_n = (a R_{n-1}) \bmod m$$
has full period. [Matlab has a function isprime that you might find useful.] Report your choice of $a$ and $m$.
For the seed $R_0 = 1$, generate $R_1, \dots, R_{m-1}$ and define $N_k$ to be the number of the $R_j$'s that are equal to $k$.
More formally,
$$N_k = \#\{j : R_j = k;\ j = 1, \dots, m-1\}.$$
What should $N_k$ be for each $k = 1, \dots, m-1$? Attach a plot of $N_k$ versus $k$ for $k = 1, \dots, m-1$.
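As a rough illustration of the recurrence in problem 1, here is a minimal sketch in Python (not Matlab, which the assignment itself uses). The function names lcg_sequence and has_full_period are hypothetical, and the sketch assumes that "full period" means the sequence visits every value in 1, ..., m-1 before repeating. The toy modulus m = 7 is deliberately outside the range allowed in the problem.

```python
from collections import Counter


def lcg_sequence(a, m, seed, count):
    """Return R_1, ..., R_count for R_n = (a * R_{n-1}) mod m with R_0 = seed."""
    values = []
    r = seed
    for _ in range(count):
        r = (a * r) % m
        values.append(r)
    return values


def has_full_period(a, m, seed=1):
    """Assumed meaning of full period: R_1..R_{m-1} hits each of 1..m-1 once."""
    counts = Counter(lcg_sequence(a, m, seed, m - 1))
    return all(counts[k] == 1 for k in range(1, m))


# Toy example only; m = 7 is not a valid choice for the homework.
print(lcg_sequence(3, 7, 1, 6))  # [3, 2, 6, 4, 5, 1]
print(has_full_period(3, 7))     # True
print(has_full_period(2, 7))     # False (2 cycles through 2, 4, 1 only)
```

The same loop structure carries over to Matlab with mod(a*r, m).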
Different PRNGs in Matlab
The latest version of Matlab has many different methods for creating pseudo-random uniform(0, 1) sequences.
You can create multiple sequences (often called streams) using different seeds or different methods and
compare them. Here's how to create 3 different streams using 3 different methods.
method1 = RandStream('mcg16807');
method2 = RandStream('swb2712');
method3 = RandStream('mt19937ar');
Method 1 was Matlab's default before 1995. It is an LCG with $m = 2^{31} - 1$ and $a = 7^5$. It is too simple for
modern applications. Method 2 was Matlab's default from 1995 to 2006. It is not an LCG and it is awful. I once
spent many hours debugging before I realized that the problem was with this PRNG. Method 3 is the Mersenne
Twister, which is currently a very popular PRNG and which has been Matlab's default since 2007. Assuming you
have a recent version of Matlab, if you just use rand you will get method 3 without
having to specify anything. To generate an $r \times c$ array of pseudo-random numbers using one of these streams
and store it in the matrix U, type
U = rand(method?,r,c);
where you replace the ? with one of the numbers 1, 2, or 3 to get the desired method. [On later homework
sets, when we are finished exploring PRNGs and just want the best PRNG, you can use U = rand(r,c);
without specifying a method. This uses the default, method 3, which seems to be the best of
Matlab's choices.]
Demo
Convince yourself that the PRNGs are deterministic. For example, create two copies of method 3, say
method3a = RandStream('mt19937ar'); and method3b = RandStream('mt19937ar');
Now rand(method3a) and rand(method3b) should behave identically. Call them repeatedly to verify.
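The same determinism check can be sketched outside Matlab. The following is a Python analogue, not the Matlab demo itself: Python's standard random module also uses the Mersenne Twister, and two generators built from the same seed should produce identical draws. The seed 12345 and the variable names are arbitrary choices for illustration. Unlike the RandStream calls above, Python needs the seed given explicitly, because an unseeded random.Random() is seeded from the operating system.

```python
import random

# Two independent generator objects started from the same (arbitrary) seed.
stream_a = random.Random(12345)
stream_b = random.Random(12345)

draws_a = [stream_a.random() for _ in range(5)]
draws_b = [stream_b.random() for _ in range(5)]

# Identical seeds give identical sequences, draw for draw.
print(draws_a == draws_b)  # True
```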
Four different PRNGs
A. The LCG that you created in problem 1 [after converting to (0, 1) by dividing each $R_n$ by $m$].
B. Matlab method 1 above.
C. Matlab method 2 above.
D. Matlab method 3 above.

The law of large numbers


Recall that the law of large numbers (LLN) states that
$$\frac{1}{n} \sum_{k=1}^{n} h(X_k) \to E[h(X)]$$
as $n \to \infty$ for any iid sequence $X, X_1, X_2, \dots$ for which $E[h(X)]$ exists. So if a pseudo-random sequence
$U_1, U_2, \dots, U_n$ behaves like an iid Uniform(0, 1) sequence, then we had better have
$$\frac{1}{n} \sum_{k=1}^{n} h(U_k) \approx E[h(U)]$$
for large $n$, where $U$ is Uniform(0, 1).
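To make the LLN statement concrete, here is a minimal Python sketch (not Matlab) that computes running averages of h(U_k). It uses h(u) = u^2 purely as an illustration, for which E[h(U)] = 1/3 when U is Uniform(0, 1). The helper name running_means, the seed, and the sample size are all assumptions made for this example.

```python
import random


def running_means(values):
    """Return the list of averages (1/n) * sum_{k<=n} values[k] for n = 1, 2, ..."""
    total = 0.0
    means = []
    for n, v in enumerate(values, start=1):
        total += v
        means.append(total / n)
    return means


rng = random.Random(2024)                      # arbitrary seed for reproducibility
u = [rng.random() for _ in range(100000)]      # pseudo-random Uniform(0, 1) draws
means = running_means([x ** 2 for x in u])     # illustrative choice h(u) = u^2

# The running average should settle near E[U^2] = 1/3 as n grows.
print(means[9], means[999], means[-1])
```

Plotting means against n on a log-scaled x-axis shows the early fluctuations and the later settling in one picture.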


2. For each of the four PRNGs (A–D) defined above, let $U_1, \dots, U_{10000}$ be a sequence from the PRNG. Plot
$\frac{1}{n} \sum_{k=1}^{n} U_k$ versus $n$ and also $\frac{1}{n} \sum_{k=1}^{n} U_k^5$ versus $n$ for $n = 1, \dots, 10000$. What would you expect these sums
to converge to as $n \to \infty$ for a truly iid sequence? Interpret your results. [1]
Correlation between successive pairs
For a pair of random variables $(X, Y)$ the correlation coefficient ($\rho$) is defined as
$$\rho = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}.$$
If $X$ and $Y$ are independent, then the numerator factors into a product of expectations, each of which is
zero, so $\rho = 0$. If $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$ is a sequence of pairs, then we can compute the sample
correlation coefficient ($\hat{\rho}$) via
$$\hat{\rho} = \frac{\sum_{k=1}^{n} (X_k - \bar{X})(Y_k - \bar{Y})}{\sqrt{\sum_{k=1}^{n} (X_k - \bar{X})^2} \, \sqrt{\sum_{k=1}^{n} (Y_k - \bar{Y})^2}}$$
where $\bar{X} = \frac{1}{n} \sum_{k=1}^{n} X_k$ and $\bar{Y} = \frac{1}{n} \sum_{k=1}^{n} Y_k$. If $X, X_1, X_2, \dots$ are iid and $Y, Y_1, Y_2, \dots$ are iid, then (again
by the LLN) we have $\hat{\rho} \to \rho$ as $n \to \infty$. So if a pseudo-random sequence $U_1, U_2, \dots$ behaves like an iid
sequence, then the successive pairs $(U_1, U_2), (U_3, U_4), \dots, (U_{2n-1}, U_{2n})$ should behave like independent
pairs, which means the sample correlation coefficient between $U_{2k-1}$ and $U_{2k}$ should be close to zero for large
$n$.
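The sample correlation formula can be written out term by term. The following is a minimal Python sketch (not Matlab); the function name sample_correlation, the seed, and the sequence length are hypothetical choices, and the generator is Python's built-in Mersenne Twister rather than any of the four PRNGs A–D.

```python
import math
import random


def sample_correlation(xs, ys):
    """Sample correlation: sum of cross-deviations over the product of root sums of squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = (math.sqrt(sum((x - x_bar) ** 2 for x in xs))
                   * math.sqrt(sum((y - y_bar) ** 2 for y in ys)))
    return numerator / denominator


rng = random.Random(7)                     # arbitrary seed
u = [rng.random() for _ in range(20000)]
first = u[0::2]                            # U_1, U_3, U_5, ...
second = u[1::2]                           # U_2, U_4, U_6, ...

# Successive pairs (U_{2k-1}, U_{2k}) from a good generator should be nearly uncorrelated.
print(sample_correlation(first, second))
```

As a sanity check on the function itself, perfectly linear data such as xs = [1, 2, 3, 4] and ys = [2, 4, 6, 8] should give a value of 1 up to rounding.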
3. For each of the four PRNGs (A–D) defined above, let $U_1, \dots, U_{20000}$ be a sequence from the PRNG. Make
a 2D scatterplot of the first 1000 successive pairs: plot(u(1:2:1999),u(2:2:2000),'.') Do they
look independent? Compute the sample correlation coefficient between successive pairs. What would you
expect the sample correlation coefficient to converge to as $n \to \infty$ for a truly iid sequence? Interpret your
results.

[1] You may find it interesting to look at the x-axis using a log scale. The Matlab command semilogx works exactly like plot except
that it uses a log scale for the x-axis. Also, you have to make a lot of plots in this HW. Here's how to put many on the same page. The Matlab
command subplot(r,c,k) will create an $r \times c$ grid of plots on the current figure and bring the kth plot (ordered left to right and then
top to bottom) into focus. Any future plotting commands will use this plot until you type subplot(r,c,j), which switches to the jth plot.
Typing orient tall or orient landscape before printing may be helpful, too.

Higher dimensional uniformity


If we group a pseudo-random sequence $U_1, U_2, \dots, U_{nd}$ into disjoint blocks of length $d$, say, $(U_1, \dots, U_d)$,
$(U_{d+1}, \dots, U_{2d})$, ..., $(U_{(n-1)d+1}, \dots, U_{nd})$, then the $d$-dimensional blocks should be uniformly distributed
over the $d$-dimensional unit cube $(0, 1)^d$. It is very challenging to engineer a PRNG that remains uniform in
high dimensions (large $d$). The Mersenne Twister is supposedly uniform up to about 600 dimensions.
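The block construction can be sketched in a few lines. This is a minimal Python illustration (not Matlab) with d = 3 and a made-up region of the unit cube, chosen to be different from the set used in the problem that follows. The names blocks and in_region, the seed, and the sample size are assumptions for this example. For truly iid Uniform(0, 1) coordinates, the region below has probability (1/2)(1/2) = 1/4.

```python
import random


def blocks(values, d):
    """Split values into disjoint consecutive blocks of length d (any leftover tail is dropped)."""
    return [values[i:i + d] for i in range(0, len(values) - d + 1, d)]


def in_region(block):
    """Hypothetical test region in (0, 1)^3: first coordinate < 1/2 and third > 1/2."""
    return block[0] < 0.5 and block[2] > 0.5


rng = random.Random(99)                       # arbitrary seed
d, n = 3, 20000
u = [rng.random() for _ in range(n * d)]      # U_1, ..., U_{nd}
vectors = blocks(u, d)                        # n blocks of length d

# Fraction of blocks landing in the region; compare with the iid value 1/4.
fraction = sum(in_region(b) for b in vectors) / len(vectors)
print(fraction)
```

In Matlab, the call rand(method,d,n) produces the same grouping directly, with each column acting as one block.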
4. Consider the following subset of the 30-dimensional unit cube:

$$E = \{(u_1, \dots, u_{30}) \in (0, 1)^{30} : u_1 \in (0, 1/4),\ u_{16} \in (1/4, 1/2),\ u_{28} \in (1/2, 1)\}$$

(a) Suppose $V_1, \dots, V_{30}$ are iid uniform(0, 1). What is $P((V_1, \dots, V_{30}) \in E)$? Notice that the event
$(V_1, \dots, V_{30}) \in E$ is just the event that $V_1 < 1/4$ and $1/4 < V_{16} < 1/2$ and $1/2 < V_{28}$, simultaneously.
(b) For each of the four PRNGs (A–D) defined above, generate a sequence of 10000 30-dimensional uniform
pseudo-random vectors. More specifically, create $U_1, \dots, U_{300000}$ using the PRNG and group them into
disjoint consecutive blocks of length 30: $(U_1, \dots, U_{30})$, $(U_{31}, \dots, U_{60})$, ..., $(U_{300000-29}, \dots, U_{300000})$.
An easy way to do this in Matlab is u = rand(method,30,10000); so that each column is a 30-D
pseudo-random vector. What fraction of your 10000 pseudo-random vectors are in the set $E$? Interpret
your results. [You should see something very weird with PRNG C (Matlab method 2). If not, something
is wrong.]
(c) For each PRNG in part (b), let T be the number of vectors (out of 10000) that are in the set E. If the
PRNG was truly iid uniform, what would be the distribution of T ? Using T as a test statistic, at what
level of significance do you reject the null hypothesis that the PRNG generates iid uniform(0, 1) random
variables?
5. (Extra credit) Play around and see if you can break the Mersenne Twister. More specifically, construct a statistical hypothesis test that reliably rejects the null hypothesis that PRNG D (Matlab method 3) is iid uniform.