STATISTICAL ANALYSIS
OF SIMULATION OUTPUTS
Sample Statistics
If x1, x2, ..., xn are n observations of the value
of an unknown quantity X, they constitute a
sample of size n for the population on which X
is defined.
Sample mean
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
Sample variance
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
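The two definitions above can be sketched in standard C++ as follows. The function names `sample_mean` and `sample_variance` are hypothetical helpers for illustration and are not part of CSIM.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sample mean: (1/n) times the sum of the observations.
double sample_mean(const std::vector<double>& x) {
    assert(!x.empty());
    double sum = 0.0;
    for (double v : x) sum += v;
    return sum / static_cast<double>(x.size());
}

// Sample variance: sum of squared deviations from the sample mean,
// divided by n - 1 (not n).
double sample_variance(const std::vector<double>& x) {
    assert(x.size() > 1);
    const double mean = sample_mean(x);
    double ss = 0.0;
    for (double v : x) ss += (v - mean) * (v - mean);
    return ss / static_cast<double>(x.size() - 1);
}
```

For the sample 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the sum of squared deviations is 32, so the sample variance is 32/7.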
Central Limit Theorem
If the n mutually independent random
variables x1, x2, ..., xn have the same
distribution, and if their mean $\mu$ and their
variance $\sigma^2$ exist, then
The random variable
$$\frac{\frac{1}{n} \sum_{i=1}^{n} x_i - \mu}{\sigma / \sqrt{n}}$$
is, for large n, approximately distributed
according to the standard normal distribution
(zero mean and unit variance).
Estimating a mean
Assume that we have a sample x1, x2, ..., xn
consisting of n independent observations
of a given population
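The sample mean $\bar{x}$ is an unbiased
estimator of the mean $\mu$ of the population
The confidence interval for $\mu$ at level $1 - \alpha$ is
$$\left( \bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right)$$
with
$$F(z_{\alpha/2}) = 1 - \frac{\alpha}{2}$$
for the standard normal distribution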
Explanations
$1 - \alpha$ expressed in percent is the level of the
confidence interval
90% means $\alpha$ = 0.10
Example:
If $\bar{x}$ = 35, $\sigma$ = 4 and n = 100
The 95% confidence interval for $\mu$ is
35 ± 1.96 × 4/10 = 35 ± 0.784
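The interval for a known standard deviation can be sketched in standard C++ as below. The `Interval` struct and the `mean_ci_known_sigma` function are hypothetical names for illustration; the caller supplies the normal quantile z (1.96 for a 95% interval).

```cpp
#include <cassert>
#include <cmath>

struct Interval {
    double lo;
    double hi;
};

// Confidence interval for the mean when sigma is known:
// xbar +/- z * sigma / sqrt(n), where z is the quantile z_{alpha/2}.
Interval mean_ci_known_sigma(double xbar, double sigma, long n, double z) {
    assert(n > 0 && sigma >= 0.0);
    const double half_width = z * sigma / std::sqrt(static_cast<double>(n));
    return {xbar - half_width, xbar + half_width};
}
```

With a sample mean of 35, a standard deviation of 4, n = 100 and z = 1.96, the half-width is 1.96 × 4/10 = 0.784, giving roughly [34.216, 35.784].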
When $\sigma$ is not known
We can replace $\sigma$ in the preceding formula by
the standard deviation s of the sample
When n < 30, we must read the value of
$z_{\alpha/2}$ from a table of Student's t-distribution
with n - 1 degrees of freedom
When n ≥ 30, we can use the standard
normal values
Confidence Intervals in CSIM
CSIM can automatically compute confidence intervals
for the mean value of any table, qtable and so on.
For everything but boxes
xyx->confidence();
For boxes
bd->time_confidence();
bd->number_confidence();
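$$P\left( \hat{p} - z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \right) = 1 - \alpha$$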
AUTOCORRELATION
The problem
All statistical analysis techniques we have
discussed assume that sample values are
mutually independent
Generally false for quantities such as
waiting times, response times, ...
Not practical
Would require very long simulations
Three good solutions
Batch means
Regenerative method
Time series analysis
Batch means
We group consecutive observations into
batches
We compute the means of these batches
We observe that autocorrelation among batch
means decreases with size of batches
When size increases, each batch includes
(run-length control)
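The grouping step and the autocorrelation check can be sketched in standard C++ as below. This is only an illustration of the idea, not CSIM's run-length control; `batch_means` and `lag1_autocorrelation` are hypothetical helper names, and dropping a trailing partial batch is an assumption made here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Group consecutive observations into batches of batch_size and return
// the mean of each batch. A trailing partial batch is dropped.
std::vector<double> batch_means(const std::vector<double>& obs,
                                std::size_t batch_size) {
    assert(batch_size > 0);
    std::vector<double> means;
    for (std::size_t start = 0; start + batch_size <= obs.size();
         start += batch_size) {
        double sum = 0.0;
        for (std::size_t i = start; i < start + batch_size; ++i) sum += obs[i];
        means.push_back(sum / static_cast<double>(batch_size));
    }
    return means;
}

// Lag-1 autocorrelation estimate: covariance of neighbouring values
// divided by the overall sum of squared deviations.
double lag1_autocorrelation(const std::vector<double>& x) {
    assert(x.size() > 1);
    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= static_cast<double>(x.size());
    double num = 0.0;
    double den = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        den += (x[i] - mean) * (x[i] - mean);
        if (i + 1 < x.size()) num += (x[i] - mean) * (x[i + 1] - mean);
    }
    return den > 0.0 ? num / den : 0.0;
}
```

One way to use it is to compute `lag1_autocorrelation(batch_means(obs, b))` for increasing batch sizes b and keep the first size at which the estimate is close to zero.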
Regenerative method
Most systems with queues go through states
that return them to a state identical to their
original state
The system regenerates itself
Examples:
Whenever a disk array is brought back to its
original state
Whenever a camper rental agency has all its
campers available
Key idea
Define a regeneration interval as an
interval between two consecutive
regeneration points:
Observations collected during the same
points
System must be idle
A relative accuracy
hold(exponential(MIART));
customer();
}
in a separate arrivals process
Make it an infinite loop
Example (IV)
Add
converged.wait();
Best way to let sim process generate
customers and wait for termination
in parallel
Example (V)
The new arrivals process
void arrivals() {
    process(arrivals); // REQUIRED
    for (;;) { // forever loop
        hold(exponential(MIART));
        customer();
    } // forever
} // arrivals
Warnings
Confidence intervals do not take into account
model inaccuracies
While the batch means method eliminates
most effects of measurement autocorrelation,
it is not always 100% effective
The max_time parameter of the run_length()
will not necessarily stop the simulation just
after the specified CPU time
Like the emergency brake of a train
Objective
Partition processes into different classes
Low priority
High priority
report_classes()
Other options
Can change the name of a process class:
c->set_name("high priority");
Atomic decay
Pseudo-random numbers
$r_{n+1} = (a r_n + c) \bmod m$
where
$r_1, r_2, \ldots$ are the random values
m is the "modulus"
$r_0$ is the seed
Two realizations
GCC family of compilers
m = $2^{32}$, a = 69069, c = 5
Microsoft Visual/Quick C/C++
m = $2^{32}$, a = 214013, c = 2531011
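A linear congruential generator with modulus $2^{32}$ can be sketched in standard C++ as below, using the two parameter sets listed above. The `Lcg` struct and `lcg_sequence` helper are hypothetical names for illustration; this shows the recurrence only and does not reproduce how any compiler's `rand()` post-processes its state.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// r_{n+1} = (a * r_n + c) mod 2^32
struct Lcg {
    std::uint32_t a;      // multiplier
    std::uint32_t c;      // increment
    std::uint32_t state;  // current value r_n (initially the seed r_0)

    std::uint32_t next() {
        // Compute in 64 bits, then keep the low 32 bits (mod 2^32).
        const std::uint64_t wide = static_cast<std::uint64_t>(a) * state + c;
        state = static_cast<std::uint32_t>(wide);
        return state;
    }
};

// Return the first count values r_1, r_2, ... of a generator.
std::vector<std::uint32_t> lcg_sequence(Lcg gen, int count) {
    assert(count >= 0);
    std::vector<std::uint32_t> out;
    for (int i = 0; i < count; ++i) out.push_back(gen.next());
    return out;
}
```

For example, `Lcg{69069u, 5u, 0u}` starts from seed 0 and produces 5, then 69069 × 5 + 5 = 345350.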
Problems with pseudorandom
number generators
Much shorter periods for some seed states
Lack of uniformity of distribution
Correlation of successive values
Better RNGs
Use the Mersenne twister
Period is $2^{19937} - 1$
Blum-Blum-Shub
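Outside CSIM, the C++ standard library provides a Mersenne Twister as `std::mt19937`. The sketch below draws a uniform value from it; `uniform_draw` is a hypothetical helper name, and the seed 42 in the usage note is arbitrary.

```cpp
#include <cassert>
#include <random>

// Draw one uniform double between lo and hi from a Mersenne Twister engine.
// The engine is passed by reference so successive calls advance its state.
double uniform_draw(std::mt19937& gen, double lo, double hi) {
    assert(lo < hi);
    std::uniform_real_distribution<double> dist(lo, hi);
    return dist(gen);
}
```

A typical use is `std::mt19937 gen(42); double u = uniform_draw(gen, 3.0, 7.0);`. The raw engine output is fixed by the C++ standard, while the values produced by the distribution classes may differ between library implementations.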
A quote
"Any one who considers arithmetical
methods of producing random digits is,
of course, in a state of sin. For, as has
been pointed out several times, there is no
such thing as a random number; there are
only methods to produce random numbers,
and a strict arithmetic procedure of course is
not such a method."
John von Neumann
CSIM RNGs
By default, CSIM uses a single stream of
random numbers
Can reset the seed using
void reseed(stream *s, long n)
as in
reseed(NIL, 13579)
Continuous distributions
supported by CSIM (I)
double uniform(double min, double max)
double triangular(double min, double max, double mode)
double beta(double min, double max, double shape1, double shape2)
double exponential(double mean)
double gamma(double mean, double stddev)
double erlang(double mean, double var)
Continuous distributions
supported by CSIM (II)
double hyperx(double mean, double var)
double weibull(double shape, double scale)
double normal(double mean, double stddev)
double lognormal(double mean, double stddev)
double cauchy(double alpha, double beta)
double hypoexponential(double mn, double var)
Continuous distributions
supported by CSIM (III)
double pareto(double a)
double zipf(long n)
double zipf_sum(long n, double *sum)
Discrete distributions
supported by CSIM
long uniform_int(long min, long max)
long bernoulli(double prob_success)
long binomial(double prob_success, long num_trials)
long geometric(double prob_success)
long negative_binomial(long success_num, double prob_success)
long poisson(double mean)
Empirical distributions
supported by CSIM
Not clear enough
Using multiple streams
In campers example, sequence of RNs used
to generate arrivals is affected by the
numbers of campers
If agency has fewer campers
s = new stream();
By default, streams are created with seeds
that are spaced 100,000 values apart
Reseeding a stream
Use:
s->reseed(24680);
Delete a stream
delete s;
Using a specific seed
Prefix RNG function with name of stream:
s->uniform(3.0, 7.0)
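The benefit of separate streams can be sketched in standard C++ with two independent engines. This is not CSIM's stream class: the function name `arrivals_with_separate_streams`, the seeds and the distributions are all assumptions chosen for illustration.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Generate n interarrival times from a dedicated arrival stream while the
// rest of the model consumes service_draws values per arrival from a
// second, independent stream.
std::vector<double> arrivals_with_separate_streams(int n, int service_draws,
                                                   double mean_interarrival) {
    assert(n >= 0 && service_draws >= 0 && mean_interarrival > 0.0);
    std::mt19937 arrival_stream(13579);  // hypothetical seed
    std::mt19937 service_stream(24680);  // hypothetical seed
    std::exponential_distribution<double> interarrival(1.0 / mean_interarrival);
    std::uniform_real_distribution<double> service(3.0, 7.0);

    std::vector<double> times;
    for (int i = 0; i < n; ++i) {
        times.push_back(interarrival(arrival_stream));
        // Draws made here never touch arrival_stream.
        for (int k = 0; k < service_draws; ++k) (void)service(service_stream);
    }
    return times;
}
```

Because arrivals have their own stream, the arrival sequence is identical whether the model makes zero or many service draws, which makes runs with different configurations directly comparable.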
A CASE STUDY
RAID array revisited
Reliability of a RAID array
Reliability R(t) of a system is the probability
that it will remain operational over the time
interval [0, t] given that it was operational at
time t = 0
Not the same as availability
Three nines
Confidence intervals
Data loss rate and survival rate are
distributed according to a binomial law
Since n = 100,000, the distributions of both
proportions are approximately normal
Should use
$$\hat{p} - z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
CI for data survival rate
$\hat{p}$ = 99.983% or 0.99983
We compute
$\sqrt{\hat{p}(1-\hat{p})/n}$ =
$\sqrt{(0.99983 \times 0.00017)/100000}$ = 0.0000412
95% CI is
0.99983 ± 1.96 × 0.0000412 =
0.99983 ± 0.000081
[0.999749, 0.999911]
CI for data loss rate
0.00017 ± 0.000081
[0.000089, 0.000251]
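The computation above can be checked with a short standard C++ sketch of the normal-approximation interval for a proportion. The `Interval` struct and `proportion_ci` function are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cmath>

struct Interval {
    double lo;
    double hi;
};

// Normal-approximation confidence interval for a proportion:
// p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).
Interval proportion_ci(double p_hat, long n, double z) {
    assert(n > 0 && p_hat >= 0.0 && p_hat <= 1.0);
    const double se =
        std::sqrt(p_hat * (1.0 - p_hat) / static_cast<double>(n));
    return {p_hat - z * se, p_hat + z * se};
}
```

Calling `proportion_ci(0.99983, 100000, 1.96)` gives an interval of about [0.999749, 0.999911], and `proportion_ci(0.00017, 100000, 1.96)` gives the matching interval for the data loss rate.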
Extensions (I)
Other repair time distributions
Exponential (to compare with results of
stochastic analysis)
Ad hoc (80% of repairs within one day,
of three disks
Two-dimensional RAID arrays