$$\sum_{i=1}^{n} i^2 < 10000.$$
CHAPTER 4. DOING THINGS REPEATEDLY 20
One approach to this problem is to add successive terms to the sum while the sum is
less than 10000, stopping as soon as the sum reaches or exceeds this amount. The
following statements accomplish this:
DATA _NULL_;
   NUMSUM = 0;
   INDEX = 0;
   DO WHILE (NUMSUM < 10000);
      INDEX = INDEX + 1;
      NUMSUM = NUMSUM + INDEX**2;
   END;
   /* The last term added pushed NUMSUM to 10000 or more, so step back one. */
   INDEX = INDEX - 1;
   FILE 'sum.out';
   PUT INDEX;
RUN;
QUIT;
The final value of INDEX is the solution n. This single number should be contained
in the file sum.out after executing the above lines of code.
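The stopping rule can be checked outside SAS. The following is a minimal Python sketch, used here only as an independent cross-check. The function names `largest_n_below` and `sum_of_squares` are hypothetical. The sketch mirrors the DO WHILE loop, including the final step back by one, and confirms the answer with the closed form $\sum_{i=1}^{n} i^2 = n(n+1)(2n+1)/6$.

```python
def largest_n_below(limit, power=2):
    """Largest n with 1**power + 2**power + ... + n**power < limit.

    Mirrors the DATA step: keep adding terms while the running sum is
    below the limit, then step back one because the last term pushed
    the sum to the limit or beyond.
    """
    total = 0
    index = 0
    while total < limit:
        index += 1
        total += index ** power
    return index - 1


def sum_of_squares(k):
    """Closed form for 1**2 + 2**2 + ... + k**2."""
    return k * (k + 1) * (2 * k + 1) // 6


n = largest_n_below(10000)
print(n, sum_of_squares(n), sum_of_squares(n + 1))  # prints: 30 9455 10416
```

The second and third numbers show why 30 is the answer: the sum up to 30 is below 10000 and the sum up to 31 is not.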
4.3.2 Exercises
1. Write a SAS program which finds the largest n satisfying
$$\sum_{i=1}^{n} i^3 < 20000.$$
2. Write a SAS program which finds the largest n satisfying n! < 100000.
3. Write a SAS program which finds the smallest n satisfying n! > 100000.
Chapter 5
Simulation
5.1 Generation of Pseudorandom Numbers
We begin our discussion of simulation with a brief exploration of the mechanics of pseudorandom
number generation. Pseudorandom numbers are useful in simulation studies.
We will briefly describe a common method for simulating independent uniform random
variables on the interval [0,1]. A multiplicative congruential random number generator produces
a sequence of pseudorandom numbers, $u_1, u_2, u_3, \ldots$, which are approximately independent
uniform random variables on the interval [0,1]. We now describe how to construct
such a generator.
Let m be a large integer, and let b be another integer which is smaller than m; b is often
somewhere around the square root of m. To begin, an integer $x_0$ is chosen between 1 and
$m - 1$. $x_0$ is called the seed. It is best chosen in some non-systematic manner.
Once the seed has been chosen, the generator proceeds as follows:
$$x_1 = b x_0 \pmod{m}, \qquad u_1 = x_1 / m.$$
$u_1$ is the first pseudorandom number. Since $x_1$ is an integer between 0 and $m - 1$,
dividing by m ensures that $u_1$ lies between 0 and 1. If m and b are chosen properly, it
is difficult to predict the value of $u_1$ given the value of $x_0$ only. The second pseudorandom
number is then obtained in the same manner:
$$x_2 = b x_1 \pmod{m}, \qquad u_2 = x_2 / m.$$
$u_2$ is another pseudorandom number, which is approximately independent of $u_1$. The method
continues using the following formulas:
$$x_n = b x_{n-1} \pmod{m}, \qquad u_n = x_n / m.$$
This method produces numbers which are in reality non-random, but if done properly,
the numbers appear to be random (i.e. unpredictable).
Different values of b and m give rise to pseudorandom number generators of varying
quality. If they are not chosen with some care, then the generator will produce numbers that
do not appear to be random. A number of statistical tests have been developed for assessing
the quality of a pseudorandom number generator.
5.1.1 Example
The following lines of SAS create a file called RANDOM.DAT which contains 5 pseudorandom
numbers based on the multiplicative congruential generator
$$x_n = 171 x_{n-1} \pmod{30269}, \qquad u_n = x_n / 30269$$
with initial seed $x_0 = 23121$.
/* Rudimentary Pseudorandom Number Generator */
DATA _NULL_;
   FILE 'RANDOM.DAT';
   B = 171;
   M = 30269;
   SEED = 23121;
   X = SEED;
   DO I = 1 TO 5;
      X = MOD(B*X, M);   /* x_n = b*x_(n-1) mod m */
      U = X/M;           /* u_n = x_n / m */
      PUT X U;
   END;
RUN;
QUIT;
The results stored in the file RANDOM.DAT are as follows. The first column
consists of the integers $x_1, x_2, \ldots, x_5$. The second column consists of numbers ranging
between 0 and 1; these are the uniform pseudorandom numbers $u_1, u_2, \ldots, u_5$.
18721 0.61849
23046 0.76137
5896 0.19479
9339 0.30853
22981 0.75923
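The table can be reproduced independently. Here is a minimal Python sketch of the same multiplicative congruential generator, used only as a cross-check of the SAS output. `mcg` is a hypothetical helper name. Printing $u_n$ to five decimals should give the five rows of the table above.

```python
def mcg(seed, b=171, m=30269, count=5):
    """Multiplicative congruential generator.

    Returns [(x_1, u_1), ..., (x_count, u_count)] where
    x_n = b * x_{n-1} mod m and u_n = x_n / m.
    """
    x = seed
    pairs = []
    for _ in range(count):
        x = (b * x) % m
        pairs.append((x, x / m))
    return pairs


for x, u in mcg(23121):
    print(x, f"{u:.5f}")
```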
A related operation is used internally by SAS to produce pseudorandom numbers automatically
with the function UNIFORM.
5.1.2 Example
The following lines of SAS create a file called RANDOM.DAT which contains 50 uniform
pseudorandom numbers based on the SAS generator UNIFORM with initial seed $x_0 = 27218$.
/* Example demonstrating use of SAS RNG with fixed seed. */
DATA _NULL_;
   SEED = 27218;
   FILE 'RANDOM.DAT';
   DO I = 1 TO 50;
      U = UNIFORM(SEED);
      PUT U;
   END;
RUN;
QUIT;
It is often of interest to look at the distribution of a set of pseudorandom numbers.
For the numbers generated in the previous example, we would proceed as follows:
DATA RANDOM;
   INFILE 'RANDOM.DAT';
   INPUT U;
RUN;
PROC CHART DATA=RANDOM;
   VBAR U;
RUN;
QUIT;
The bars of the histogram should all be roughly the same height if the numbers
are really uniformly distributed, although with only 50 numbers some variation in bar
height is to be expected.
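A rough text version of this check is easy to sketch. The Python below is a minimal illustration that uses Python's own `random.Random` generator rather than SAS's UNIFORM, so the individual numbers differ. It draws 1000 uniforms, counts them in ten equal-width bins and prints one star per five observations. `bin_counts` is a hypothetical helper. With 1000 numbers, each bin should hold about 100, give or take about 10.

```python
import random


def bin_counts(values, nbins=10):
    """Count values from [0, 1) in nbins equal-width bins."""
    counts = [0] * nbins
    for u in values:
        # min() guards against a value of exactly 1.0 landing past the last bin
        counts[min(int(u * nbins), nbins - 1)] += 1
    return counts


rng = random.Random(27218)  # Python's Mersenne Twister, seeded for reproducibility
sample = [rng.random() for _ in range(1000)]
counts = bin_counts(sample)
for k, c in enumerate(counts):
    print(f"[{k / 10:.1f}, {(k + 1) / 10:.1f})  {'*' * (c // 5)}")
```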
5.1.3 Exercises
1. Generate 200 random numbers using the generator from the first example with
an initial seed of 2018.
2. Write a program (or modify the second program in the second example) which
produces a histogram of the numbers produced in the previous exercise.
3. Generate 200 random numbers using the SAS UNIFORM generator from example
2 with an initial seed of 2018. Produce a histogram of this simulated data.
4. Modify the generator of the first example so that it produces 200 random
numbers from the generator
$$x_n = 172 x_{n-1} \pmod{30307}$$
with initial seed $x_0 = 17218$.
5. Generate 1000 pseudorandom numbers using the SAS function UNIFORM, and
store them in a file called UNIF.DAT.
6. Modify the above program to simulate the random variable $Y = 1/(U + 1)$,
where U is a uniform random variable on the interval [0,1]. Specifically,
generate 1000 values of this random variable and put them in a file called
RANDOM.DAT.
Also, plot the histogram of the random numbers $y_1, \ldots, y_{1000}$. Since Y is no
longer a uniform random variable, the histogram will no longer be flat;
what is the shape of the distribution?
7. Write a program which generates 100 independent observations on a uniformly
distributed random variable on the interval [0, 100]. Estimate the mean, variance
and standard deviation of such a uniform random variable.
8. Use the FLOOR function together with UNIFORM to simulate 100 random integers
between 0 and 99.
5.2 Simulation of Bernoulli Trials
A Bernoulli trial is an experiment in which there are 2 possible outcomes. For example, a
light bulb may work or it may not work; these are the only possibilities. For another example,
consider a student who guesses on a multiple choice test question which has 5 options; the
student may guess correctly with probability 0.2 and incorrectly with probability 0.8.
Suppose we would like to know how well such a student would do on a multiple choice
test consisting of 100 questions. We can get an idea by using simulation:
Each question corresponds to an independent Bernoulli trial with probability of success
equal to 0.2. We can simulate the correctness of the student for each question by generating
an independent uniform random number. If this number is less than .2, we say that the
student guessed correctly; otherwise, we say that the student guessed incorrectly.
This will work because the probability that a uniform random variable is less than .2 is
exactly .2, while the probability that a uniform random variable exceeds .2 is exactly .8,
which is the same as the probability that the student guesses incorrectly. Thus, the uniform
random number generator is simulating the student. The SAS version of this is as follows:
DATA _NULL_;
   SEED = 12883;
   FILE 'STUDENT.ANS';
   PUT 'CORRECT U';                 /* header line */
   DO QUESTION = 1 TO 100;
      U = UNIFORM(SEED);
      IF U < .2 THEN CORRECT = 1;   /* correct guess with probability .2 */
      ELSE CORRECT = 0;
      PUT CORRECT U;
   END;
RUN;
QUIT;
The first column of the file STUDENT.ANS contains the results of the student's guesses. A 1
is recorded each time the student guesses the answer correctly, while a 0 is recorded each
time the student is wrong. The second column records the value of the variable U; note
that whenever its value is less than .2, the value of CORRECT is 1, and whenever U is .2 or
more, the value of CORRECT is 0.
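The same comparison with .2 can be sketched in Python. This is a minimal illustration, not the SAS program: `simulate_guesses` is a hypothetical name, and Python's generator replaces UNIFORM, so the individual values differ. Over many simulated questions the proportion of 1s should settle near .2.

```python
import random


def simulate_guesses(n_questions, p_correct, rng):
    """Return (correct, u) pairs; correct is 1 exactly when u < p_correct."""
    results = []
    for _ in range(n_questions):
        u = rng.random()
        correct = 1 if u < p_correct else 0
        results.append((correct, u))
    return results


rng = random.Random(12883)  # seed reused only for reproducibility
answers = simulate_guesses(100, 0.2, rng)
score = sum(c for c, _ in answers)
print("number correct out of 100:", score)
```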
5.2.1 Exercises
1. Write a SAS program which simulates a student guessing at a True-False test
consisting of 40 questions.
2. Write a SAS program which simulates 500 light bulbs, each of which has
probability .99 of working.
3. Write a SAS program which simulates a binomial random variable Y with
parameters n = 25 and p = .4. (Y is the sum of 25 independent Bernoulli
random variables with p = .4.)
Now, modify the program so that it generates 100 of these binomial random
variables and writes them to a file called binom.dat. In order to do this,
you will need to nest one DO group inside another.
Write another program which reads the data from binom.dat into a SAS
data set and produces a histogram. Estimate the mean and variance using
PROC MEANS. Compare these estimates with their theoretical counterparts.
Recall that the theoretical mean of a binomial random variable is $np$ and
the theoretical variance is $np(1 - p)$.
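The nested structure of exercise 3 can be sketched in Python as follows. This is a minimal sketch, not SAS: `binomial_by_sums` is a hypothetical helper, and Python's generator stands in for UNIFORM. The inner loop sums 25 Bernoulli indicators to form one binomial value, and the outer loop repeats this. The sketch uses 20000 replications instead of 100 so that the estimates land close to $np = 10$ and $np(1-p) = 6$.

```python
import random


def binomial_by_sums(n, p, rng):
    """One binomial(n, p) value as a sum of n Bernoulli(p) indicators (inner loop)."""
    return sum(1 for _ in range(n) if rng.random() < p)


rng = random.Random(2018)  # hypothetical seed
draws = [binomial_by_sums(25, 0.4, rng) for _ in range(20000)]  # outer loop
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(f"mean {mean:.3f} (theory 10), variance {var:.3f} (theory 6)")
```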
5.3 The Logistic Model
In many biostatistical applications, interest centers on a dose-response relationship. For
example, what dosage of a carcinogenic substance will produce cancer in a given percentage
of a population? One would expect that higher dosages of carcinogen will yield higher rates
of cancer. A first attempt at modelling this kind of relationship might be
$$p = \beta_0 + \beta_1 x$$
where p is the proportion of the population that would acquire cancer at dosage x, and $\beta_0$ and
$\beta_1$ are constants. This model is linear and has roughly the right behaviour if $\beta_1$ is
positive (p increases with the dosage). However, it will give values of p outside the interval
[0, 1] if x is too large or too small.
The logistic model is often used as an alternative to handle this kind of problem. It
is based on the logit transformation, which maps values in (0, 1) to $(-\infty, \infty)$. The logit
transformation is given by $\eta(p) = \log(p/(1 - p))$. Its inverse is given by the logistic function
$p(\eta) = \exp(\eta)/(1 + \exp(\eta))$.
We can then model the dose-response relationship with
$$\eta(p) = \beta_0 + \beta_1 x$$
where $\beta_0$ and $\beta_1$ are constants. This model says that when the dosage is x, the proportion
of the population acquiring cancer will be p, where
$$p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}.$$
Example
Write SAS code to simulate the responses of 20 subjects who have been exposed to
varying amounts of carcinogen under the logistic model assumption with $\beta_0 = -1.5$
and $\beta_1 = 0.7$. Assume that the dosages are given by $x = 0.1, 0.2, \ldots, 2.0$. Output
should be printed to a file called doseresponsesim.txt.
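DATA _NULL_;
   SEED = 81818; B0 = -1.5; B1 = 0.7;
   FILE 'doseresponsesim.txt';
   PUT 'Response Dosage';
   /* Loop over an integer counter: adding 0.1 repeatedly can overshoot 2.0
      in floating point, which would silently drop the last dosage. */
   DO I = 1 TO 20;
      X = I/10;                  /* dosages 0.1, 0.2, ..., 2.0 */
      U = UNIFORM(SEED);
      TMP = EXP(B0 + B1*X);
      P = TMP/(1+TMP);           /* logistic probability of cancer at dosage X */
      IF U < P THEN CANCER = 1;
      ELSE CANCER = 0;
      PUT CANCER X;
   END;
RUN;
QUIT;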
Upon running the code, you should see that as x increases, the incidence of
cancer tends to increase (i.e. 1s become more frequent in the first column of simulated
data), although with only 20 subjects the pattern is noisy.
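The model's probabilities can be tabulated alongside the simulated responses. The following Python sketch is an illustration, not SAS. The helper names are hypothetical, and Python's generator replaces UNIFORM. It computes $p = e^{\eta}/(1+e^{\eta})$ with $\eta = \beta_0 + \beta_1 x$ for the example's values $\beta_0 = -1.5$ and $\beta_1 = 0.7$. The probabilities run from about 0.19 at x = 0.1 to about 0.48 at x = 2.0, which is why the increase in 1s is real but gradual.

```python
import math
import random

B0, B1 = -1.5, 0.7  # values from the example above


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(eta):
    return math.exp(eta) / (1 + math.exp(eta))


doses = [i / 10 for i in range(1, 21)]  # 0.1, ..., 2.0 without accumulated rounding
probs = [inv_logit(B0 + B1 * x) for x in doses]
rng = random.Random(81818)
responses = [1 if rng.random() < p else 0 for p in probs]
for x, p, y in zip(doses, probs, responses):
    print(f"{x:.1f}  {p:.3f}  {y}")
```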
Exercises
1. Run the code for the logistic model given in the above example. Then change the slope
parameter $\beta_1$ to $-0.7$. How does this affect the pattern in the response?
2. Modify the code given in the example so that dosages are given by 1.5, 1.7, 1.9, ..., 3.5.
3. Modify the example code so that the output enters a SAS dataset called DOSERESP.
Next, use the PLOT procedure to plot CANCER against X. Experiment with various
values of $\beta_0$ and $\beta_1$ in order to see how these values affect the pattern of response.
5.4 Binomial Random Numbers
The RANBIN function can be used to automatically generate binomial random numbers.
Syntax:
Y = RANBIN(seed,n,p);
The seed is any positive integer, while n and p are the binomial parameters. The function
assigns a random binomial realization to the variable Y.
5.4.1 Example
Suppose 12% of a large population has recently been infected by a virus whose
incubation period is 2 weeks long, but whose presence can be detected by a blood
test. Suppose random testing for the virus is conducted, and 15 individuals are
tested each hour. Simulate the number of positive test results for each hour over
a 24-hour period. Record the simulated numbers of positive test results in a file
called viruscounts.txt.
Since 15 individuals are tested each hour and each individual has a 0.12 probability
of being infected, independent of the state of the other individuals, the number
of positive test results in one hour is a binomial random variable with n = 15
and p = 0.12. To simulate the numbers of positive test results for each hour in a
24-hour period, we need to generate 24 binomial random numbers:
/* Simulation of infected individuals */
DATA _NULL_;
   SEED = 3728;
   N = 15;
   P = .12;
   FILE 'viruscounts.txt';
   PUT 'HOUR NUMBER OF INFECTED';
   DO HOUR = 1 TO 24;
      INFECTED = RANBIN(SEED,N,P);   /* binomial(15, .12) count for this hour */
      PUT HOUR INFECTED;
   END;
RUN;
QUIT;
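Before looking at the simulated file, it helps to know what to expect. This Python sketch is illustrative only and involves no SAS functions. It evaluates the binomial probabilities for n = 15 and p = .12 directly. The mean is $np = 1.8$ positive results per hour, and the chance of an hour with no positives is $0.88^{15} \approx 0.147$. Roughly 3 or 4 of the 24 simulated hours should therefore show a 0.

```python
import math

n, p = 15, 0.12

# P(X = k) for a binomial(n, p) count, k = 0, ..., n
pmf = [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))  # equals n * p
prob_zero = pmf[0]                              # equals (1 - p) ** n
print(f"mean per hour {mean:.3f}, P(no positives) {prob_zero:.4f}")
print(f"expected hours with no positives out of 24: {24 * prob_zero:.2f}")
```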
5.4.2 Exercises
1. Generate 1000 binomial variates with n = 18 and p = .75 using RANBIN. Then use
PROC MEANS to estimate the average and variance. Compare with the theoretical mean
and variance. Repeat for binomial variates with n = 50 and p = .4.
2. Generate 50 binomial variates $B_1, B_2, \ldots, B_{50}$, having n = 20 and where p satisfies
$$\log\left(\frac{p}{1-p}\right) = 2.0 + 0.5x$$
where $x = 0.1, 0.2, 0.3, \ldots, 5.0$. Use the PLOT procedure to plot B against x and note
the pattern of plotted points.
3. Refer to the previous question. Calculate the expected value of $B_i$, for $i = 1, 2, \ldots, 50$.
Plot these expected values against x.
5.5 Poisson Random Numbers
We can generate Poisson random numbers using SAS with the RANPOI function. It is similar
to the RANBIN function, but there is only one parameter instead of two.
Syntax:
Y = RANPOI(seed, lambda);
In this case, lambda is the mean of the Poisson random variable.
5.5.1 Example
Suppose traffic accidents occur at an intersection with a mean of 3.7 per year.
Simulate the annual number of accidents for a 10-year period, assuming that the
numbers occurring from year to year are independent.
/* Example of Poisson variate generation -- Simulation of Traffic
   Accidents */
DATA _NULL_;
   SEED = 497765;
   LAMBDA = 3.7;
   FILE 'ACCIDENT.DAT';
   PUT 'YEAR NUMBER OF ACCIDENTS';
   DO YEAR = 1 TO 10;
      ACCIDENT = RANPOI(SEED, LAMBDA);   /* Poisson count with mean 3.7 */
      PUT YEAR ACCIDENT;
   END;
RUN;
QUIT;
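RANPOI does the sampling for us, but it may help to see one simple way a Poisson count can be produced from uniforms. The Python sketch below uses Knuth's multiplication method, which is not necessarily the algorithm RANPOI uses. It keeps multiplying uniforms until the product falls to $e^{-\lambda}$ or below. The count returned is one less than the number of uniforms used, and it is Poisson with mean $\lambda$. The method is adequate for small means like 3.7 but slow for large ones. `poisson_knuth` is a hypothetical name.

```python
import math
import random


def poisson_knuth(lam, rng):
    """Poisson(lam) draw: multiply uniforms until the product is <= exp(-lam)."""
    threshold = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


rng = random.Random(497765)
accidents = [poisson_knuth(3.7, rng) for _ in range(10)]  # one count per year
print(accidents)
```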
5.5.2 Exercises
1. Modify the above program to simulate the number of accidents per year for
15 years, when the average rate is 2.8 accidents per year.
2. Simulate the number of surface defects in the finish of a sports car for 20 cars,
where the mean is 1.2 defects per car.
3. Estimate the mean and variance of a Poisson random variable whose mean
rate is 7.2 by simulating 1000 such variates and using PROC MEANS. Compare
with the theoretical values, recalling that the variance and mean are equal for
Poisson random variables.
4. A commonly used model is the Poisson regression model
$$\log(\lambda) = \beta_0 + \beta_1 x$$
where $\beta_0$ and $\beta_1$ are constants. Take $\beta_0 = 3$ and $\beta_1 = 0.5$, and suppose
$x = 0.1, 0.2, 0.3, \ldots, 4.0$. Calculate the corresponding values of $\lambda$. (Store these
values in a SAS variable called lambda.)
5. Refer to the previous question. Simulate Poisson random variates whose means are
these values of $\lambda$. Plot the Poisson variates against the corresponding values of x.
5.6 Exponential Random Numbers
The exponential distribution can be used as a simple model for the time until a component
fails, or until a light bulb burns out.
A random variable T has an exponential distribution with mean $\lambda$ if
$$P(T \le t) = 1 - e^{-t/\lambda}$$
for any non-negative t. The mean or expected value of T is $\lambda$ and the variance of T is
$\lambda^2$.
The simplest way to simulate exponential random variables is to generate a uniform
random variable U on [0,1] and set
$$1 - e^{-T/\lambda} = U.$$
Solving this for T, we have
$$T = -\lambda \log(1 - U).$$
T defined in this way has an exponential distribution with mean $\lambda$, because for $t \ge 0$,
$P(T \le t) = P(U \le 1 - e^{-t/\lambda}) = 1 - e^{-t/\lambda}$. The
SAS function RANEXP can be used to generate random exponential variates with mean 1.
Syntax:
T = RANEXP(seed);
This produces an exponential variate T having mean 1. To change the mean to lambda, we
must use
T = lambda * RANEXP(seed);
5.6.1 Example
/* SIMULATION OF N EXPONENTIAL LAMBDA RANDOM VARIATES */
DATA _NULL_;
   SEED = 12238;
   LAMBDA = 2.5;
   N = 10;
   FILE 'EXPO.RVS';
   DO I = 1 TO N;
      T = RANEXP(SEED)*LAMBDA;   /* exponential with mean LAMBDA */
      PUT T;
   END;
RUN;
QUIT;
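The inverse-transform formula $T = -\lambda \log(1 - U)$ from the text can be sketched directly. This is a minimal Python illustration, not SAS. Python's generator replaces RANEXP, and `exponential_inverse` is a hypothetical name. The sketch draws 50000 values with $\lambda = 2.5$ and compares the sample mean and variance with $\lambda = 2.5$ and $\lambda^2 = 6.25$.

```python
import math
import random


def exponential_inverse(mean, rng):
    """Exponential draw with the given mean via T = -mean * log(1 - U)."""
    u = rng.random()  # U in [0, 1), so 1 - U is in (0, 1] and the log is defined
    return -mean * math.log(1.0 - u)


rng = random.Random(12238)
LAMBDA = 2.5
sample = [exponential_inverse(LAMBDA, rng) for _ in range(50000)]
mean = sum(sample) / len(sample)
var = sum((t - mean) ** 2 for t in sample) / (len(sample) - 1)
print(f"mean {mean:.3f} (theory {LAMBDA}), variance {var:.3f} (theory {LAMBDA ** 2})")
```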
5.6.2 Exercises
1. Suppose that a certain type of battery has a lifetime which is exponentially
distributed with mean 55 hours. Simulate 1000 such lifetimes to estimate the
mean and variance of the lifetime for this type of battery. Compare with the
theoretical mean and variance.
2. The central limit theorem says that the sample mean for a random sample
of size n from a population with mean $\mu$ and variance $\sigma^2$ is approximately
normally distributed with mean $\mu$ and variance $\sigma^2/n$, where the approximation
improves as n increases.
The following programs provide a demonstration for the case where the underlying
population is exponentially distributed:
/* PROGRAM 1: Computation of averages of samples of size N coming
   from exponential lambda populations */
DATA _NULL_;
   SEED = 12238;
   LAMBDA = 2.5;
   NSAMPLES = 1000; /* We are going to simulate NSAMPLES
                       independent samples of size N, computing the average
                       in each case. */
   N = 10;
   FILE 'EXPO.AVG';
   DO NSAMPLE = 1 TO NSAMPLES;
      TSUM = 0;
      DO I = 1 TO N;
         T = RANEXP(SEED)*LAMBDA;
         TSUM = TSUM + T;   /* Accumulating the sample
                               values to form a sum */
      END;
      TAVG = TSUM/N;        /* TAVG = average of the current
                               sample. */
      PUT TAVG;             /* Storing sample averages for
                               use in next program where they will be
                               plotted as a histogram. */
   END;
RUN;
QUIT;
/* PROGRAM 2: Histogram of averages to demonstrate CLT */
DATA EXPO_AVG;
   INFILE 'EXPO.AVG';
   INPUT TAVG;
RUN;
PROC CHART DATA=EXPO_AVG;
   VBAR TAVG;
RUN;
PROC MEANS DATA=EXPO_AVG MEAN VAR;
   VAR TAVG;
   /* We have included this procedure to compare
      the mean and variance of the averages with what is
      expected by the theory */
RUN;
QUIT;
Run the above programs for N = 3, 6, 10, 20, 30, 40. Note how the histogram
begins to resemble the familiar bell-shaped curve as N increases. How large
would you say N should be in order for the normal approximation to be considered
accurate, when the underlying population is exponential?
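The same experiment can be sketched in Python. This is illustrative only: Python's generator and the inverse-transform formula stand in for RANEXP, and `sample_averages` is a hypothetical name. The exponential population has mean $\lambda$ and variance $\lambda^2$. The averages of samples of size N should therefore have mean $\lambda$ and variance $\lambda^2/N$, which is 0.625 for $\lambda = 2.5$ and N = 10.

```python
import math
import random


def sample_averages(n, nsamples, lam, rng):
    """Averages of nsamples independent samples of size n from an exponential with mean lam."""
    avgs = []
    for _ in range(nsamples):
        total = sum(-lam * math.log(1.0 - rng.random()) for _ in range(n))
        avgs.append(total / n)
    return avgs


rng = random.Random(12238)
LAMBDA, N = 2.5, 10
avgs = sample_averages(N, 20000, LAMBDA, rng)
m = sum(avgs) / len(avgs)
v = sum((a - m) ** 2 for a in avgs) / (len(avgs) - 1)
print(f"mean of averages {m:.3f} (theory {LAMBDA}); variance {v:.4f} (theory {LAMBDA ** 2 / N})")
```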
5.7 Normal Random Numbers
Standard normal random variables can be generated using the RANNOR function in SAS.
Syntax:
Z = RANNOR(seed);
This produces a value of a normal random variable Z which has mean 0 and variance 1.
Recall that if X is normally distributed with mean $\mu$ and variance $\sigma^2$, then
$$X = \mu + \sigma Z$$
where Z is a standard normal random variable (mean 0 and variance 1). Therefore, to simulate
a normal random variable X having mean mu and standard deviation sigma, use
X = mu + sigma*RANNOR(seed);
5.7.1 Example
Use simulation to estimate P(Z < 1.25) where Z is a standard normal random
variable.
Idea: Simulate a large number (say, 1000) of standard normal random variates and
compute the proportion that lie below 1.25.
DATA _NULL_;
   FILE 'NORMAL.PRB';
   SEED = 19218;
   N = 1000;
   VALUE = 1.25;
   COUNT = 0;
   DO I = 1 TO N;
      Z = RANNOR(SEED);
      IF Z < VALUE THEN COUNT = COUNT + 1;   /* count draws below VALUE */
   END;
   PROBEST = COUNT/N;                        /* proportion of draws below VALUE */
   PUT 'AN EMPIRICAL ESTIMATE OF P(Z < ' VALUE ') IS ' PROBEST;
RUN;
QUIT;
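The estimate can be compared with the exact value. The following Python sketch is not SAS; `random.gauss` stands in for RANNOR. It repeats the count with 20000 draws and compares the result with the standard normal CDF from `statistics.NormalDist`, which gives $P(Z < 1.25) \approx 0.8944$. For the 1000 draws in the SAS program, the standard error of the estimate is $\sqrt{0.894 \times 0.106/1000} \approx 0.01$, so results such as 0.88 or 0.90 are unremarkable.

```python
import random
from statistics import NormalDist

rng = random.Random(19218)
N, VALUE = 20000, 1.25
count = sum(1 for _ in range(N) if rng.gauss(0.0, 1.0) < VALUE)
prob_est = count / N
exact = NormalDist().cdf(VALUE)  # standard normal CDF at 1.25
print(f"empirical {prob_est:.4f}, exact {exact:.4f}")
```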
5.7.2 Exercises
1. Simulate 100 normal random variates having mean 51 and standard deviation
5.2. Compute the average and standard deviation of your simulated sample
and compare with the theoretical values.
2. Simulate 1000 standard normal random variates Z, and use your simulated
sample to estimate
(a) P(Z > 2.5).
(b) P(0 < Z < 1.645).
(c) P(1.2 < Z < 1.45).
(d) P(1.2 < Z < 1.3).
Compare with the theoretical values (i.e. consult a normal table).
3. Using the fact that a $\chi^2$ random variable on 1 degree of freedom has the same
distribution as the square of a standard normal random variable, simulate 100
independent values of such a $\chi^2$ random variable, and estimate its mean and
variance. (Compare with the theoretical values: 1, 2.)
4. A $\chi^2$ random variable on n degrees of freedom has the same distribution as
the sum of the squares of n independent standard normal random variables. Simulate
independent values of a $\chi^2$ random variable on 8 degrees of freedom, and estimate its
mean and variance. (Compare with the theoretical values: 8, 16.)
5. A commonly used model is the simple regression model
$$y = \beta_0 + \beta_1 x + \varepsilon$$
where $\beta_0$ and $\beta_1$ are constants and $\varepsilon$ is a normal random variable with mean 0 and
variance $\sigma^2$. Take $\beta_0 = 3$ and $\beta_1 = 0.5$, and suppose $x = 0.1, 0.2, 0.3, \ldots, 4.0$.
(a) Simulate 40 independent normal variates $\varepsilon$, supposing $\sigma = 0.4$. (Store
these values in a SAS variable called epsilon.)
(b) Simulate the corresponding values of y. (Store these values in a SAS variable
called y.)
(c) Plot the normal variates against the corresponding values of x. Note the
pattern on the plot.
6. Re-do the previous question using $\sigma = 1.5$.
7. Repeat, using $\beta_0 = 5$ and $\beta_1 = 2$.
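For exercises 3 and 4, the chi-square construction can be sketched as follows. This is a Python illustration, not SAS; `random.gauss` stands in for RANNOR, and `chi_square_draw` is a hypothetical name. Each draw is a sum of squares of independent standard normals. With 20000 draws on 8 degrees of freedom, the sample mean should be near 8 and the sample variance near 16.

```python
import random


def chi_square_draw(df, rng):
    """Chi-square(df) draw as the sum of squares of df independent standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))


rng = random.Random(2018)
draws = [chi_square_draw(8, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(f"mean {mean:.2f} (theory 8), variance {var:.2f} (theory 16)")
```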
Chapter 6
REFERENCE: Other Data Step Functions
A SAS DATASET, used in some of the examples below:
X1     X2     X3    X4
-1     3      2     2.3
0.1    4      -1    2.1
0.5    -1     -7    2.4
1.9    -1.7   -4    1.9
6.1 Arithmetic Functions
ABS(X) - returns the absolute value of X: |X|.
EXAMPLE: Y=ABS(X1); (Y = 1 0.1 0.5 1.9).
MAX(X1,X2,...,XN) - returns the largest value among the values of the arguments.
EXAMPLE: Y=MAX(X1,X2,X3,X4); (Y = 3 4 2.4 1.9).
MIN(X1,X2,...,XN) - returns the smallest value among the values of the arguments.
EXAMPLE: Y=MIN(X1,X2,X3,X4); (Y = -1 -1 -7 -4).
MOD(N1,N2) - returns the remainder when N1 is divided by N2; a nonzero result has the same sign as N1.
EXAMPLE: Y=MOD(X1,X2); (Y= -1 0.1 0.5 0.2).
SIGN(X) - returns the sign of X, or 0, if X is 0.
EXAMPLE: Y=SIGN(X1); (Y= -1 1 1 1)
SQRT(X) - returns the square root of X: $\sqrt{X}$.
$\Gamma(X) = \int_0^\infty t^{X-1} e^{-t}\,dt$.
LOG(X): the natural logarithm of X.
LOG2(X): the logarithm to the base 2 of X.
LOG10(X): the logarithm to the base 10 of X.
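The examples for ABS, MAX, MIN and SIGN can be checked against the data set at the top of the chapter. Below is a minimal Python sketch in which Python built-ins stand in for the SAS functions. `sign` is a hypothetical helper, since Python has no built-in sign function.

```python
# The example data set: one tuple (X1, X2, X3, X4) per observation.
rows = [(-1, 3, 2, 2.3), (0.1, 4, -1, 2.1), (0.5, -1, -7, 2.4), (1.9, -1.7, -4, 1.9)]


def sign(x):
    """-1, 0 or 1 according to the sign of x (like SIGN(X))."""
    return (x > 0) - (x < 0)


abs_x1 = [abs(r[0]) for r in rows]    # Y=ABS(X1);
max_row = [max(r) for r in rows]      # Y=MAX(X1,X2,X3,X4);
min_row = [min(r) for r in rows]      # Y=MIN(X1,X2,X3,X4);
sign_x1 = [sign(r[0]) for r in rows]  # Y=SIGN(X1);

print(abs_x1, max_row, min_row, sign_x1)
```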
6.4 Trigonometric and Hyperbolic Functions
ARCOS(X): inverse cosine of X.
ARSIN(X): inverse sine of X.
ATAN(X): inverse tangent of X.
COS(X): cosine of X.
COSH(X): hyperbolic cosine of X.
SIN(X): sine of X.
SINH(X): hyperbolic sine of X.
TAN(X): tangent of X.
TANH(X): hyperbolic tangent of X.
6.5 Statistical functions
CSS(X1,X2,...,XN): the corrected sum of squares $\sum_{i=1}^{N} X_i^2 - N\bar{X}^2$.
CV(X1,X2,...,XN): the coefficient of variation - the standard deviation of $X_1, \ldots, X_N$
divided by the mean of $X_1, \ldots, X_N$.
MEAN(X1,...,XN): $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$.
EXAMPLE: Y = MEAN(X1,X2,X3,X4); (Y = 1.575 1.3 -1.275 -0.475).
N(X1,...,XN): number of nonmissing arguments.
EXAMPLE: Y=N(.,4.1,.,3.7,5.7); (Y = 3).
NMISS(X1,...,XN): number of missing values.
EXAMPLE: Y=NMISS(.,4.1,.,3.7,5.7); (Y = 2).
RANGE(X1,...,XN): maximum minus the minimum.
EXAMPLE: Y=RANGE(X1,X2,X3,X4); (Y = 4 5 9.4 5.9).
STD(X1,...,XN): standard deviation.
STDERR(X1,...,XN): standard error (standard deviation divided by $\sqrt{N}$).
SUM(X1,...,XN): $\sum_{i=1}^{N} X_i$.
USS(X1,...,XN): uncorrected sum of squares $\sum_{i=1}^{N} X_i^2$.
VAR(X1,...,XN): variance
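The formulas for MEAN, RANGE, USS, CSS, VAR, STD and STDERR can be sketched together and checked against the MEAN and RANGE examples above. This Python version is illustrative only. `summary` is a hypothetical helper, and it assumes the usual $N - 1$ divisor for VAR and STD, which the list above does not state.

```python
import math

rows = [(-1, 3, 2, 2.3), (0.1, 4, -1, 2.1), (0.5, -1, -7, 2.4), (1.9, -1.7, -4, 1.9)]


def summary(xs):
    """Row statistics following the definitions above (N - 1 divisor assumed for VAR)."""
    n = len(xs)
    mean = sum(xs) / n
    uss = sum(x * x for x in xs)   # uncorrected sum of squares
    css = uss - n * mean ** 2      # corrected sum of squares
    var = css / (n - 1)            # sample variance
    std = math.sqrt(var)
    return {"MEAN": mean, "RANGE": max(xs) - min(xs), "USS": uss, "CSS": css,
            "VAR": var, "STD": std, "STDERR": std / math.sqrt(n)}


for r in rows:
    print({k: round(v, 4) for k, v in summary(r).items()})
```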
6.6 Probability functions
The following functions can be used to determine various probabilities. The syntax is similar
to that used for the random number generator functions.
GAMINV(P,eta): returns the value of x such that
$$P = \frac{1}{\Gamma(\eta)} \int_0^x t^{\eta - 1} e^{-t}\,dt$$
($0 \le P < 1$, and $\eta > 0$).
POISSON(lambda,N): returns the probability that an observation from a Poisson distribution
is less than or equal to N; $\lambda$ is the mean parameter.
i.e. $\mathrm{POISSON}(\lambda, N) = \sum_{j=0}^{N} e^{-\lambda} \frac{\lambda^j}{j!}$.
PROBBNML(p,n,m): returns the probability that an observation from a binomial distribution
with parameters p and n is less than or equal to m.
i.e. $\mathrm{PROBBNML}(p,n,m) = \sum_{j=0}^{m} \binom{n}{j} p^j (1 - p)^{n-j}$.
PROBCHI(x,nu): returns the probability that a random variable with a chi-square distribution
on $\nu$ degrees of freedom falls below x.
PROBF(x,ndf,ddf): returns the probability that a random variable with an F distribution
on ndf numerator degrees of freedom and ddf denominator degrees of freedom falls
below x.
PROBGAM(x,eta): returns the probability that a random variable with a gamma distribution
with shape parameter $\eta$ falls below x.
i.e. $\mathrm{PROBGAM}(x, \eta) = \frac{1}{\Gamma(\eta)} \int_0^x t^{\eta - 1} e^{-t}\,dt$.
PROBIT(x): returns the inverse of the standard normal cumulative distribution function.
That is, if Z is a standard normal random variable, then x is the probability that Z will
take on a value less than PROBIT(x).
PROBNORM(x): returns the probability that a standard normal random variable will fall
below x.
PROBT(x,nu): returns the probability that a random variable with Student's t distribution
on $\nu$ degrees of freedom will fall below x.
TINV(p,nu): returns the pth quantile (the 100p-th percentile) of Student's t distribution
on $\nu$ degrees of freedom.
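Several of these definitions are short enough to evaluate directly, which is a useful check on SAS output. The Python sketch below is illustrative only, and the function names are hypothetical stand-ins, not SAS functions. It sums the POISSON and PROBBNML series as written above and uses `statistics.NormalDist` for PROBNORM and PROBIT. A probability of falling strictly below an integer k, as asked in some exercises, is the cumulative probability at k - 1.

```python
import math
from statistics import NormalDist


def poisson_cdf(lam, n):
    """P(X <= n) for X ~ Poisson(lam): the sum defining POISSON(lambda, N)."""
    return sum(math.exp(-lam) * lam ** j / math.factorial(j) for j in range(n + 1))


def binomial_cdf(p, n, m):
    """P(X <= m) for X ~ Binomial(n, p): the sum defining PROBBNML(p, n, m)."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(m + 1))


def normal_cdf(x):
    """Standard normal CDF, the quantity PROBNORM(x) returns."""
    return NormalDist().cdf(x)


def probit(p):
    """Inverse standard normal CDF, the quantity PROBIT(p) returns."""
    return NormalDist().inv_cdf(p)


# P(X < 5) for a Poisson variable with mean 11.4 is P(X <= 4).
print(poisson_cdf(11.4, 4))
# P(arrival time < 8) when the time is normal with mean 10.4 and standard deviation 1.2.
print(normal_cdf((8 - 10.4) / 1.2))
```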
6.6.1 Example
Find the probability that a random variable with a t distribution on 8 degrees of freedom is
less than 1.4.
i.e. $P(T < 1.4) = ?$, where T is t-distributed on 8 d.f. The following program writes the
correct probability into the file PROB.T.
DATA _NULL_;
   FILE 'PROB.T';
   PROB = PROBT(1.4, 8);   /* P(T < 1.4) for T on 8 d.f. */
   PUT PROB;
RUN;
6.6.2 Exercises
1. Compute the probability that a Poisson random variable with mean rate 11.4
takes on values less than
(a) 1.
(b) 2.
(c) 5.
(d) 11.
(e) 15.
(f) 21.
2. Repeat the previous question for a binomial random variable with p = .45 and
n = 24.
3. The time that it takes a bus to arrive at the next stop is normally distributed
with mean 10.4 minutes and standard deviation 1.2. Compute the probabilities
that the bus will arrive in less than
(a) 5 minutes.
(b) 8 minutes.
(c) 10.5 minutes.
(d) 12.5 minutes.
(e) 13.1 minutes.
(f) 15.2 minutes.