Part 6: Random Number Generation

Fall 2011
Part 6: Random Number

Generation
CSC 446/546
Agenda
1. Properties of Random Numbers
2. Generation of Pseudo-Random Numbers (PRN)
3. Techniques for Generating Random Numbers
4. Tests for Random Numbers
Fall 2011
5. Caution
CSC 446/546

(1)
A sequence of random numbers R1, R2, , must have two important statistical
properties:
Uniformity
Independence.
Random Number, Ri, must be independently drawn from a uniform
distribution with pdf:
pdf for random
numbers
1, 0 x 1
f ( x)
0, otherwise
2
x
E ( R ) xdx
0
2
1
V ( R ) x dx E R
1
Fall 2011
CSC 446/546
3 1
1

2
0
1 1 1

3 4 12

(2)
Uniformity: If the interval [0,1] is divided into n
classes, or subintervals of equal length, the expected
number of observations in each interval is N/n, where
N is the total number of observations
Fall 2011
Independence: The probability of observing a value in

a particular interval is independent of the previous
value drawn
CSC 446/546
2. Generation of Pseudo-Random
Numbers (PRN) (1)
Pseudo, because generating numbers using a known
method removes the potential for true randomness.
If the method is known, the set of random numbers can be
replicated!!
Goal: To produce a sequence of numbers in [0,1] that
Fall 2011
simulates, or imitates, the ideal properties of random

numbers (RN) - uniform distribution and independence.
CSC 446/546
Numbers (PRN) (2)
Problems that occur in generation of pseudo-random
numbers (PRN)
Generated numbers might not be uniformly distributed
Generated numbers might be discrete-valued instead of
continuous-valued
Mean of the generated numbers might be too low or too
high
Variance of the generated numbers might be too low or too
high
Fall 2011
There might be dependence (i.e., correlation)
CSC 446/546
Numbers (PRN) (3)
Departure from uniformity and independence for a
particular generation scheme can be tested.
Fall 2011
If such departures are detected, the generation scheme

should be dropped in favor of an acceptable one.
CSC 446/546
Numbers (PRN) (4)
Important considerations in RN routines:
The routine should be fast. Individual computations are
inexpensive, but a simulation may require many millions of
random numbers
Portable to different computers ideally to different
programming languages. This ensures the program produces
same results
Have sufficiently long cycle. The cycle length, or period
represents the length of random number sequence before
previous numbers begin to repeat in an earlier order.
Replicable. Given the starting point, it should be possible to
generate the same set of random numbers, completely
independent of the system that is being simulated
Fall 2011
Closely approximate the ideal statistical properties of

uniformity and independence.
CSC 446/546
3. Techniques for Generating Random

Numbers
3.1 Linear Congruential Method (LCM).
Most widely used technique for generating random numbers
3.2 Combined Linear Congruential Generators (CLCG).
Extension to yield longer period (or cycle)
Fall 2011
3.3 Random-Number Streams.
CSC 446/546

Numbers (1): Linear Congruential Method
(1)
To produce a sequence of integers, X , X , between 0 and m-1 by
1
following a recursive relationship:
X i 1 (aX i c) mod m, i 0,1,2,...

The
multiplier
The
increment
The
modulus
X0 is called the seed

The selection of the values for a, c, m, and X0 drastically affects
the statistical properties and the cycle length.
If c 0 then it is called mixed congruential method
Fall 2011
When c=0 it is called multiplicative congruential method

CSC 446/546
10

(2)
The random integers are being generated in the
range [0,m-1], and to convert the integers to
random numbers:
Fall 2011
Xi
Ri , i 1,2,...
m
CSC 446/546
11

(3)
EXAMPLE: Use X0 = 27, a = 17, c = 43, and m = 100.

The Xi and Ri values are:
X1 = (17*27+43) mod 100 = 502 mod 100 = 2,
R1 = 0.02;
X2 = (17*2+43) mod 100 = 77 mod 100 =77,
R2 = 0.77;
X3 = (17*77+43) mod 100 = 1352 mod 100 = 52 R3 = 0.52;
Notice that the numbers generated assume values only from the set
I = {0,1/m,2/m,.., (m-1)/m} because each Xi is an integer in
the set {0,1,2,.,m-1}
Fall 2011
Thus each Ri is discrete on I, instead of continuous on interval [0,1]

CSC 446/546
12

(4)
Maximum Density
Such that the values assumed by Ri, i = 1,2,, leave no large
gaps on [0,1]
Problem: Instead of continuous, each Ri is discrete
Solution: a very large integer for modulus m (e.g., 231-1, 248)
Maximum Period
To achieve maximum density and avoid cycling.
Achieved by: proper choice of a, c, m, and X0.
Most digital computers use a binary representation of numbers
Fall 2011
Speed and efficiency are aided by a modulus, m, to be (or close

to) a power of 2.
CSC 446/546
13

(5)
Maximum Period or Cycle Length:
For m a power of 2, say m=2b, and c0, the longest possible period is
P=m=2b, which is achieved when c is relatively prime to m
(greatest common divisor of c and m is 1) and a=1+4k, where k is
an integer
For m a power of 2, say m=2b, and c=0, the longest possible period is
P=m/4=2b-2, which is achieved if the seed X0 is odd and if the
multiplier a is given by a=3+8k or a=5+8k for some k=0,1,.
Fall 2011
For m a prime number and c=0, the longest possible period is P=m1, which is achieved whenever the multiplier a has the property
that the smallest integer k such that ak-1 is divisible by m is
k=m-1
CSC 446/546
14

(6)
Example: Using the multiplicative congruential method, find the period of the generator for
a=13, m=26=64 and X0=1,2,3 and 4
10
11
12
13
14
15
16
Xi
13
41
21
17
29
57
37
33
45
53
49
61
25
Xi
26
18
42
34
58
50
10
Xi
39
59
63
51
23
43
47
35
27
31
19
55
11
15
Xi
52
36
20
m=64, c=0; Maximal period P=m/4 = 16 is achieved by using odd seeds X0=1 and X0=3 (a=13
is of the form 5+8k with k=1)
With X0=1, the generated sequence {1,5,9,13,,53,57,61} has large gaps
Fall 2011
Not a viable generator !! Density insufficient, period too short

CSC 446/546
15

(7)
Example: Speed and efficiency in using the generator on
a digital computer is also a factor
Speed and efficiency are aided by using a modulus m

either a power of 2 (=2b)or close to it
Fall 2011
After the ordinary arithmetic yields a value of aXi+c, Xi+1

can be obtained by dropping the leftmost binary digits
and then using only the b rightmost digits
CSC 446/546
16

(8)
Example: c=0; a=7 =16807; m=2 -1=2,147,483,647 (prime #)
5
31
Period P=m-1 (well over 2 billion)

Assume X0=123,457
X1=75(123457)mod(231-1)=2,074,941,799
R1=X1/231=0.9662
X2=75(2,074,941,799) mod(231-1)=559,872,160
R2=X2/231=0.2607
X3=75(559,872,160) mod(231-1)=1,645,535,613
R3=X3/231=0.7662
.
Fall 2011
Note that the routine divides by m+1 instead of m. Effect is negligible for such
large values of m.
CSC 446/546
17

Numbers (2): Combined Linear Congruential
Generators (1)
With increased computing power, the complexity of
simulated systems is increasing, requiring longer
period generator.
Examples: 1) highly reliable system simulation

requiring hundreds of thousands of elementary
events to observe a single failure event;
2) A computer network with large number of nodes,
producing many packets
Fall 2011
Approach: Combine two or more multiplicative

congruential generators in such a way to produce a
generator with good statistical properties
CSC 446/546
18

Numbers (2): Combined Linear
Congruential
Generators (2)
LEcuyer suggests how this can be done:
If Wi,1, Wi,2,.,Wi,k are any independent, discrete
valued random variables (not necessarily identically
distributed)
If one of them, say Wi,1 is uniformly distributed on
the integers from 0 to m1-2, then
Wi
W
j 1
i, j
mod m1 1
Fall 2011
is uniformly distributed on the integers from 0 to m1-2

CSC 446/546
19

Congruential Generators (3)
Let Xi,1, Xi,2, , Xi,k, be the ith output from k different
multiplicative congruential generators.
The jth generator:
Fall 2011
Has prime modulus mj and multiplier aj and

period is mj-1
Produced integers Xi,j is approx ~ Uniform
on integers in [1, mj-1]
Wi,j = Xi,j -1 is approx ~ Uniform on integers
in [0, mj-2]
CSC 446/546
20

Numbers (2): Combined Linear Congruential
Generators (4)
Suggested form:
j 1
(
1
)
X i , j mod m1 1
j 1
Fall 2011
X i
Xi
m ,
Hence, Ri 1
m 1
1 ,
m1
Xi 0
Xi 0
The maximum
(mpossible
1)...(mfor
1) a
1 1)( m2 period
k such
generatorPis:
2 k 1
CSC 446/546
21

Congruential Generators (5)
Example: For 32-bit computers, LEcuyer [1988] suggests combining k = 2

generators with m1 = 2,147,483,563, a1 = 40,014, m2 = 2,147,483,399 and
a2 = 20,692. The algorithm becomes:
Step 1: Select seeds
Fall 2011
X1,0 in the range [1, 2,147,483,562] for the 1st generator

X2,0 in the range [1, 2,147,483,398] for the 2nd generator.
Step 2: For each individual generator,
X1,j+1 = 40,014 X1,j mod 2,147,483,563
X2,j+1 = 40,692 X1,j mod 2,147,483,399.
Step 3:
Xj+1 = (X1,j+1 - X2,j+1 ) mod 2,147,483,562.
Step 4:
Return
X j 1
, X j 1 0
2,147,483,
563
R j 1
2,147,483,562 , X j 1 0
2,147,483,563
Step 5:
Set j = j+1, go back to step 2.
Combined generator has period: (m1 1)(m2 1)/2 ~ 2 x 1018
CSC 446/546
22

Numbers (3): Random-Numbers Streams
(1)
The seed for a linear congruential random-number generator:
Is the integer value X0 that initializes the random-number

sequence.
Any value in the sequence can be used to seed the generator.
A random-number stream:
Refers to a starting seed taken from the sequence X0, X1, , XP.
If the streams are b values apart, then stream i could defined
by starting seed:
Si X b ( i 1) for i 1,2, , P b
Fall 2011
Older generators: b = 105; Newer generators: b = 1037.
CSC 446/546
23

Numbers (3): Random-Numbers Streams
(2)
A single random-number generator with k streams can
act like k distinct virtual random-number generators
To compare two or more alternative systems.
Fall 2011
Advantageous to dedicate portions of the pseudo-random

number sequence to the same purpose in each of the
simulated systems.
CSC 446/546
24
4. Tests for Random Numbers (1):

Principles (1)
Desirable properties of random numbers: Uniformity and
Independence
Number of tests can be performed to check these properties been
achieved or not
Two type of tests:
Frequency Test: Uses the Kolmogorov-Smirnov or the Chisquare test to compare the distribution of the set of numbers
generated to a uniform distribution
Fall 2011
Autocorrelation test: Tests the correlation between numbers

and compares the sample correlation to the expected
correlation, zero
CSC 446/546
25

Principles (2)
Two categories:
Testing for uniformity. The hypotheses are:
H0: Ri ~ U[0,1]
H1: Ri ~ U[0,1] /
Failure to reject the null hypothesis, H0, means that
evidence of non-uniformity has not been detected.
Testing for independence. The hypotheses are:
H0: Ri ~ independently distributed

/
Fall 2011
H1: Ri ~ independently distributed

Failure to reject the null hypothesis, H0, means that
evidence of dependence has not been detected.
CSC 446/546
26

Principles (3)
For each test, a Level of significance must be stated.
The level is the probability of rejecting the null hypothesis H0
when the null hypothesis is true:
= P(reject H0|H0 is true)
The decision maker sets the value of for any test
Fall 2011
Frequently is set to 0.01 or 0.05
CSC 446/546
27

Principles (4)
When to use these tests:
If a well-known simulation languages or random-number
generators is used, it is probably unnecessary to test
If the generator is not explicitly known or documented, e.g.,
spreadsheet programs, symbolic/numerical calculators, tests
should be applied to many sample numbers.
Types of tests:
Theoretical tests: evaluate the choices of m, a, and c without
actually generating any numbers
Fall 2011
Empirical tests: applied to actual sequences of numbers

produced. Our emphasis.
CSC 446/546
28

Frequency Tests (1)
Test of uniformity
Two different methods:
Kolmogorov-Smirnov test
Chi-square test
Both these tests measure the degree of agreement between the
distribution of a sample of generated random numbers and the
theoretical uniform distribution
Fall 2011
Both tests are based on null hypothesis of no significant difference

between the sample distribution and the theoretical distribution
CSC 446/546
29

Frequency Tests (2) Kolmogorov-Smirnov
Test
(1)
Compares the continuous cdf, F(x), of the uniform distribution with
the empirical cdf, SN(x), of the N sample observations.
We know:
F ( x ) x, 0 x 1
If the sample from the RN generator is R1, R2, , RN, then the
empirical cdf, SN(x) is:
S N ( x)
number of R1 , R2 ,..., Rn which are x

N
Fall 2011
The cdf of an empirical distribution is a step function with jumps at

each observed value.
CSC 446/546
30

Test (2)
Test is based on the largest absolute deviation statistic between F(x) and SN(x)
over the range of the random variable:
D = max| F(x) - SN(x)|

The distribution of D is known and tabulated (A.8) as function of N
Steps:
1. Rank the data from smallest to largest. Let R(i) denote ith smallest
observation, so that R(1)R(2)R(N)
2. Compute
i 1
i
R i ; D max R i
1i N N
1 i N
N
D max
3. Compute D= max(D+, D-)
Fall 2011
4. Locate in Table A.8 the critical value D, for the specified significance
level and the sample size N
5. If the sample statistic D is greater than the critical value D, the null
hypothesis is rejected. If D D, conclude there is no difference
CSC 446/546
31

Test (3)
Example: Suppose 5 generated numbers are 0.44, 0.81, 0.14, 0.05,

0.93.
Step 1:
Step 2:
R(i)
0.05
0.14
0.44
0.81
0.93
i/N
0.20
0.40
0.60
0.80
1.00
i/N R(i)
0.15
0.26
0.16
0.07
R(i) (i-1)/N
0.05
0.04
0.21
0.13
Arrange R(i) from

smallest to largest
D+ = max {i/N R(i)}
D- = max {R(i) - (i-1)/N}
Step 3: D = max(D+, D-) = 0.26

Step 4: For = 0.05,
D = 0.565 > D
Fall 2011
Hence, H0 is not rejected.

CSC 446/546
32

Frequency Tests (3) Chi-Square Test (1)
Chi-square test uses the sample statistic:
n is the # of classes
(Oi Ei )
Ei
i 1
n
02
Ei is the expected
# in the ith class
Oi is the observed
# in the ith class
Approximately the chi-square distribution with n-1 degrees of

freedom (where the critical values are tabulated in Table A.6)
For the uniform distribution, Ei, the expected number in the each
class is:
N
Ei , where N is the total # of observation
n
Valid only for large samples, e.g. N >= 50
Fall 2011
Reject H0 if 02 > ,N-12

CSC 446/546
33

Fall 2011
Example 7.7: Use Chi-square test for the data shown below with
=0.05. The test uses n=10 intervals of equal length, namely [0,0.1),
[0.1,0.2), ., [0.9,1.0)
CSC 446/546
34

Fall 2011
The value of 02=3.4; The critical value from table A.6 is 0.05,92=16.9.
Therefore the null hypothesis is not rejected
CSC 446/546
35
4. Tests for Random Numbers (3): Tests

for Autocorrelation (1)
The test for autocorrelation are concerned with the dependence
between numbers in a sequence.
Consider:
Though numbers seem to be random, every fifth number is a large

number in that position.
Fall 2011
This may be a small sample size, but the notion is that numbers in
the sequence might be related
CSC 446/546
36

Testing the autocorrelation between every m numbers (m is a.k.a.
the lag), starting with the ith number
The autocorrelation im between numbers: Ri, Ri+m, Ri+2m, Ri+(M+1)m
i (M 1 )m N
M is the largest integer such that
Hypothesis:
H 0 : im 0,
if numbers are independent
H1 : im 0,
if numbers are dependent
If the values are uncorrelated:
Fall 2011
For large values of M, the distribution of the estimator of im, im

denoted
is approximately normal.
CSC 446/546
37

im
Z0
im
Test statistics is:
Z0 is distributed normally with mean = 0 and variance = 1, and:
1
im
M 1
im
R
k 0
i km
Ri (k 1 )m 0.25
13M 7
12(M 1 )
If im > 0, the subsequence has positive autocorrelation

High random numbers tend to be followed by high ones, and vice versa.
Fall 2011
If im < 0, the subsequence has negative autocorrelation

Low random numbers tend to be followed by high ones, and vice versa.
CSC 446/546
38

After computing Z0, do not reject the hypothesis of independence if
z/2Z0 z/2
Fall 2011
is the level of significance and z/2 is obtained from table A.3
CSC 446/546
39

Example: Test whether the 3rd, 8th, 13th, and so on, for the output on Slide 37 are
auto-correlated or not.
Hence, = 0.05, i = 3, m = 5, N = 30, and M = 4. M is the largest integer
such that 3+(M+1)530.
1 (0.23)(0.28) (0.25)(0.33) (0.33)(0.27)

35
0.25
4 1 (0.28)(0.05) (0.05)(0.36)
0.1945
13(4) 7
0.128
12( 4 1 )
0.1945
Z0
1.516
0.1280
Fall 2011
35
From Table A.3, z0.025 = 1.96. Hence, the hypothesis is not rejected.
CSC 446/546
40

Shortcoming:
The test is not very sensitive for small values of M, particularly
when the numbers being tested are on the low side.
Problem when fishing for autocorrelation by performing numerous
tests:
If = 0.05, there is a probability of 0.05 of rejecting a true
hypothesis.
If 10 independent sequences are examined,
Fall 2011
The probability of finding no significant

autocorrelation, by chance alone, is 0.9510 = 0.60.
Hence, the probability of detecting significant
autocorrelation when it does not exist = 40%
CSC 446/546
41
5. Caution
Caution:
Even with generators that have been used for years, some of
which still in use, are found to be inadequate.
This chapter provides only the basics
Fall 2011
Also, even if generated numbers pass all the tests, some

underlying pattern might have gone undetected.
CSC 446/546
42

Part 6: Random Number Generation

Încărcat de

Informații document

Descriere originală:

Titlu original

Drepturi de autor

Formate disponibile

Partajați acest document

Partajați sau inserați document

Opțiuni de partajare

Vi se pare util acest document?

Este necorespunzător acest conținut?

Drepturi de autor:

Formate disponibile

Part 6: Random Number Generation

Încărcat de

Drepturi de autor:

Formate disponibile

Fall 2011

Part 6: Random Number

1. Properties of Random Numbers

1. Properties of Random Numbers

Independence: The probability of observing a value in

simulates, or imitates, the ideal properties of random

There might be dependence (i.e., correlation)

If such departures are detected, the generation scheme

Closely approximate the ideal statistical properties of

3. Techniques for Generating Random

3.3 Random-Number Streams.

3. Techniques for Generating Random

following a recursive relationship:

X i 1 (aX i c) mod m, i 0,1,2,...

X0 is called the seed

When c=0 it is called multiplicative congruential method

3. Techniques for Generating Random

3. Techniques for Generating Random

EXAMPLE: Use X0 = 27, a = 17, c = 43, and m = 100.

X2 = (17*2+43) mod 100 = 77 mod 100 =77,

X3 = (17*77+43) mod 100 = 1352 mod 100 = 52 R3 = 0.52;

Thus each Ri is discrete on I, instead of continuous on interval [0,1]

3. Techniques for Generating Random

Speed and efficiency are aided by a modulus, m, to be (or close

3. Techniques for Generating Random

3. Techniques for Generating Random

Not a viable generator !! Density insufficient, period too short

3. Techniques for Generating Random

Speed and efficiency are aided by using a modulus m

After the ordinary arithmetic yields a value of aXi+c, Xi+1

3. Techniques for Generating Random

Period P=m-1 (well over 2 billion)

3. Techniques for Generating Random

Examples: 1) highly reliable system simulation

Approach: Combine two or more multiplicative

3. Techniques for Generating Random

is uniformly distributed on the integers from 0 to m1-2

3. Techniques for Generating Random

The jth generator:

Has prime modulus mj and multiplier aj and

3. Techniques for Generating Random

3. Techniques for Generating Random

Example: For 32-bit computers, LEcuyer [1988] suggests combining k = 2

X1,0 in the range [1, 2,147,483,562] for the 1st generator

3. Techniques for Generating Random

Is the integer value X0 that initializes the random-number

Older generators: b = 105; Newer generators: b = 1037.

3. Techniques for Generating Random

Advantageous to dedicate portions of the pseudo-random

4. Tests for Random Numbers (1):

Autocorrelation test: Tests the correlation between numbers

4. Tests for Random Numbers (1):

Testing for uniformity. The hypotheses are:

H0: Ri ~ independently distributed

H1: Ri ~ independently distributed

4. Tests for Random Numbers (1):

= P(reject H0|H0 is true)

The decision maker sets the value of for any test

Frequently is set to 0.01 or 0.05

4. Tests for Random Numbers (1):

Empirical tests: applied to actual sequences of numbers

4. Tests for Random Numbers (2):