Input Modeling
Purpose & Overview
Data Collection
Histograms
Selecting families of distributions
Parameter estimation
Goodness-of-fit tests
Fitting a non-stationary process
1. HISTOGRAMS
A frequency distribution or histogram is useful in determining the shape of a
distribution
The number of class intervals depends on:
o The number of observations
o The dispersion of the data
o Suggested: the square root of the sample size
For continuous data:
o Corresponds to the probability density function of a theoretical distribution
For discrete data:
o Corresponds to the probability mass function
If few data points are available: combine adjacent cells to eliminate the ragged
appearance of the histogram
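The interval rules above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original example; the function name and the toy data are made up, and the square-root rule is used to pick the number of class intervals.

```python
import math

def histogram(data, num_bins=None):
    """Build (bin_edges, counts) for a sample, using the square-root
    rule of thumb for the number of class intervals when none is given."""
    if num_bins is None:
        num_bins = round(math.sqrt(len(data)))  # suggested: sqrt of sample size
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in data:
        # clamp the maximum value into the last bin
        i = min(int((x - lo) / width), num_bins - 1)
        counts[i] += 1
    edges = [lo + j * width for j in range(num_bins + 1)]
    return edges, counts

# 100 observations -> the square-root rule suggests 10 class intervals
data = [i / 10 for i in range(100)]
edges, counts = histogram(data)
```

With few data points, adjacent cells could be merged after this step to smooth the ragged appearance.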
Vehicle Arrival Example: # of vehicles arriving at an intersection between 7
am and 7:05 am was monitored for 100 random workdays.
There are ample data, so the histogram may have a cell for each possible
value in the data range
Arrivals per Period   Frequency
0                     12
1                     10
2                     19
3                     17
4                     10
5                      8
6                      7
7                      5
8                      5
9                      3
10                     3
11                     1
(Figure: the same data plotted as histograms with different interval sizes.)
2. SELECTING THE FAMILY OF DISTRIBUTIONS
Quantile-quantile (q-q) plots are useful for evaluating distribution fit. Let {xi, i = 1, 2, …, n} be a sample of data from X and {yj, j = 1, 2, …, n} be the observations in ascending order. The q-q plot graphs yj against F−1((j − 0.5)/n), where F is the candidate cdf; if F is appropriate, the plot is approximately a straight line.
Example (continued): Check whether the door installation times follow a normal
distribution.
(Figure: q-q plot of the installation times — an approximately straight line supports the hypothesis of a normal distribution; histogram with the superimposed density function of the normal distribution.)
When evaluating the linearity of a q-q plot, consider:
o The ordered values are ranked and hence not independent, so it is unlikely for the points to be scattered evenly about the line.
o The variance of the extremes is higher than that of the middle, so linearity of the points in the middle of the plot is more important.
A q-q plot can also be used to compare two samples of data, by plotting the ordered values of the two samples against each other.
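The q-q construction above can be sketched as follows. This is an illustrative snippet, not the original worked example: the sample values are hypothetical, and the normal parameters are fitted from the sample mean and standard deviation.

```python
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """Pair ordered observations y_j with the fitted normal quantiles
    F^{-1}((j - 0.5)/n); the pairs fall near a straight line if the fit is good."""
    y = sorted(sample)
    n = len(y)
    # fit a normal distribution using the sample mean and standard deviation
    fitted = NormalDist(mean(sample), stdev(sample))
    q = [fitted.inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]
    return list(zip(q, y))

# hypothetical installation-time values (minutes), not the original data
pts = qq_points([99.79, 99.56, 100.17, 100.33, 100.26,
                 100.41, 99.98, 99.83, 100.23, 100.27])
```

Plotting the first coordinate against the second and checking for linearity (especially in the middle of the plot) mirrors the evaluation described above.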
3. PARAMETER ESTIMATION
The sample mean and sample variance of a sample $X_1, \ldots, X_n$ are:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}, \qquad S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n-1}$$
If the data are discrete and have been grouped in a frequency distribution:
$$\bar{X} = \frac{\sum_{j=1}^{k} f_j X_j}{n}, \qquad S^2 = \frac{\sum_{j=1}^{k} f_j X_j^2 - n\bar{X}^2}{n-1}$$

where $f_j$ is the observed frequency of value $X_j$ and $k$ is the number of distinct values.
When raw data are unavailable (data are grouped into class intervals), the approximate sample mean and variance are:

$$\bar{X} \approx \frac{\sum_{j=1}^{c} f_j m_j}{n}, \qquad S^2 \approx \frac{\sum_{j=1}^{c} f_j m_j^2 - n\bar{X}^2}{n-1}$$

where $f_j$ is the observed frequency in the $j$th class interval, $m_j$ is the midpoint of the $j$th interval, and $c$ is the number of class intervals.
Vehicle Arrival Example (continued): from the frequency table, $n = 100$, $\sum_{j=1}^{k} f_j X_j = 364$, and $\sum_{j=1}^{k} f_j X_j^2 = 2080$.
The sample mean and variance are:

$$\bar{X} = \frac{364}{100} = 3.64, \qquad S^2 = \frac{2080 - 100 \times (3.64)^2}{99} = 7.63$$
The histogram suggests that X has a Poisson distribution.
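The grouped-data computation above can be reproduced directly from the frequency table. A minimal sketch (the variable names are illustrative):

```python
# frequencies from the vehicle-arrival histogram: value X_j -> observed frequency f_j
freq = {0: 12, 1: 10, 2: 19, 3: 17, 4: 10, 5: 8,
        6: 7, 7: 5, 8: 5, 9: 3, 10: 3, 11: 1}

n = sum(freq.values())                              # number of observations
sum_fx = sum(f * x for x, f in freq.items())        # sum of f_j * X_j
sum_fx2 = sum(f * x * x for x, f in freq.items())   # sum of f_j * X_j^2

xbar = sum_fx / n                                   # sample mean
s2 = (sum_fx2 - n * xbar ** 2) / (n - 1)            # sample variance
```

Running this recovers the totals 364 and 2080, the sample mean 3.64, and the sample variance 7.63 quoted above.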
4. GOODNESS-OF-FIT TESTS
Conduct hypothesis testing on input data distribution using:
o Kolmogorov-Smirnov test
o Chi-square test
In a real application there is no single "correct" distribution:
o If very little data are available, it is unlikely that any candidate distribution will be rejected
o If a lot of data are available, it is likely that all candidate distributions will be rejected
Chi-Square test
Intuition: comparing the histogram of the data to the shape of the
candidate density or mass function
Valid for large sample sizes when parameters are estimated by maximum
likelihood
By arranging the n observations into a set of k class intervals or cells, the test statistic is:

$$\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ is the observed frequency in the $i$th interval and $E_i = n p_i$ is the expected frequency, with $p_i$ the theoretical probability of the $i$th interval. The suggested minimum expected frequency is 5; combine adjacent cells if necessary.
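The statistic above is straightforward to compute once the cells are fixed. A minimal sketch; the fair-die data below are hypothetical and are not the vehicle-arrival example:

```python
def chi_square_statistic(observed, probs):
    """Compute chi_0^2 = sum (O_i - E_i)^2 / E_i with E_i = n * p_i."""
    n = sum(observed)
    stat = 0.0
    for o, p in zip(observed, probs):
        e = n * p  # expected frequency; each cell should have e >= 5
        stat += (o - e) ** 2 / e
    return stat

# hypothetical example: a six-sided die rolled 120 times, testing fairness
observed = [25, 17, 19, 22, 16, 21]
stat = chi_square_statistic(observed, [1 / 6] * 6)
```

The resulting value would then be compared against a chi-square critical value with k − s − 1 degrees of freedom, where s is the number of estimated parameters.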
Kolmogorov-Smirnov Test
Compares the empirical cdf of the data with the cdf of the candidate distribution; unlike the chi-square test, it does not require grouping the data into class intervals.
5. FITTING A NON-STATIONARY PROCESS
Suppose we need to model arrivals over a time interval [0, T] in which the arrival rate varies. One approach is to divide [0, T] into equal-length subintervals, count the arrivals in each subinterval over n days, and estimate a piecewise-constant arrival rate. For example, for the interval 9:30–10:00, whose length is Δt = 1/2 hour, observed over n = 3 days with 20, 13, and 12 arrivals, the estimated arrival rate is (20 + 13 + 12) / (3 × 1/2) = 30 arrivals per hour.
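The piecewise rate estimate above can be sketched as follows (the function name and data layout are illustrative assumptions):

```python
def estimate_rates(counts_by_day, dt):
    """Estimate a piecewise-constant arrival rate for a non-stationary process.

    counts_by_day[d][i] = arrivals observed in subinterval i on day d.
    Rate for subinterval i: total arrivals across days / (n_days * dt).
    """
    n_days = len(counts_by_day)
    n_intervals = len(counts_by_day[0])
    return [sum(day[i] for day in counts_by_day) / (n_days * dt)
            for i in range(n_intervals)]

# one subinterval of length dt = 1/2 hour, observed over n = 3 days:
# 20, 13, and 12 arrivals -> estimated rate 30 arrivals per hour
rates = estimate_rates([[20], [13], [12]], dt=0.5)
```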
6. MULTIVARIATE AND TIME-SERIES INPUT MODELS
Covariance and correlation measure the linear dependence between X1 and X2:
o cov(X1, X2) > 0 and ρ > 0: X1 and X2 tend to be above (or below) their means together
o cov(X1, X2) = 0 and ρ = 0: X1 and X2 are uncorrelated
o cov(X1, X2) < 0 and ρ < 0: X1 tends to be above its mean when X2 is below its mean, and vice versa
o The closer ρ is to -1 or 1, the stronger the linear relationship is between X1 and X2.
A time series is a sequence of random variables X1, X2, X3, … that are identically distributed (same mean and variance) but possibly dependent.
o cov(Xt, Xt+h) is the lag-h autocovariance
o corr(Xt, Xt+h) is the lag-h autocorrelation
o If the autocovariance value depends only on h and not on t, the
time series is covariance stationary
The sample covariance and correlation are estimated by:

$$\widehat{\operatorname{cov}}(X_1, X_2) = \frac{1}{n-1}\left[\sum_{j=1}^{n} X_{1j} X_{2j} - n \bar{X}_1 \bar{X}_2\right]$$

$$\hat{\rho} = \frac{\widehat{\operatorname{cov}}(X_1, X_2)}{\hat{\sigma}_1 \hat{\sigma}_2}$$

where $\hat{\sigma}_1$ and $\hat{\sigma}_2$ are the sample standard deviations.
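The estimators can be computed directly from paired samples. A minimal sketch with made-up, perfectly linear data (so the correlation comes out to 1):

```python
from math import sqrt

def sample_corr(x1, x2):
    """rho_hat = cov_hat(X1, X2) / (sigma_hat_1 * sigma_hat_2)."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    # sample covariance: (sum X1j*X2j - n*mean1*mean2) / (n - 1)
    cov = (sum(a * b for a, b in zip(x1, x2)) - n * m1 * m2) / (n - 1)
    # sample variances, same shortcut form
    v1 = (sum(a * a for a in x1) - n * m1 * m1) / (n - 1)
    v2 = (sum(b * b for b in x2) - n * m2 * m2) / (n - 1)
    return cov / sqrt(v1 * v2)

r = sample_corr([1, 2, 3, 4], [2, 4, 6, 8])  # X2 = 2*X1, perfectly linear
```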
X t = µ + φ ( X t −1 − µ ) + ε t , for t = 2,3,...
where ε 2 , ε 3 , … are i.i.d. normally distributed with µε = 0 and variance σ ε2
If X1 is chosen appropriately (and |φ| < 1), then:
o X1, X2, … are normally distributed with mean μ and variance σε²/(1 − φ²)
o Lag-h autocorrelation: ρh = φ^h
To estimate φ, μ, σε²:

$$\hat{\mu} = \bar{X}, \qquad \hat{\sigma}_\varepsilon^2 = \hat{\sigma}^2 (1 - \hat{\phi}^2), \qquad \hat{\phi} = \frac{\widehat{\operatorname{cov}}(X_t, X_{t+1})}{\hat{\sigma}^2}$$

where $\widehat{\operatorname{cov}}(X_t, X_{t+1})$ is the lag-1 autocovariance.
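The AR(1) estimators can be checked by fitting a simulated series. This is an illustrative sketch: the function name is an assumption, the series is generated with known parameters, and the estimates should land close to them.

```python
import random

def fit_ar1(x):
    """Estimate mu, phi, sigma_eps^2 for an AR(1) process from a series x."""
    n = len(x)
    mu_hat = sum(x) / n
    var_hat = sum((v - mu_hat) ** 2 for v in x) / (n - 1)
    # lag-1 autocovariance
    cov1 = sum((x[t] - mu_hat) * (x[t + 1] - mu_hat)
               for t in range(n - 1)) / (n - 1)
    phi_hat = cov1 / var_hat                      # phi_hat = cov1 / sigma^2
    sigma_eps2_hat = var_hat * (1 - phi_hat ** 2)  # sigma_eps^2 = sigma^2 (1 - phi^2)
    return mu_hat, phi_hat, sigma_eps2_hat

# generate X_t = mu + phi (X_{t-1} - mu) + eps_t with known parameters
random.seed(1)
mu, phi, sig_e = 10.0, 0.8, 1.0
x = [mu]
for _ in range(20000):
    x.append(mu + phi * (x[-1] - mu) + random.gauss(0, sig_e))

mu_hat, phi_hat, sigma_eps2_hat = fit_ar1(x)
```

With a long simulated series, the estimates recover the generating parameters to within sampling error.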
To estimate φ, λ (for the exponential AR(1), i.e. EAR(1), model): λ̂ = 1/X̄, with φ̂ estimated from the lag-1 autocorrelation as above.
Summary