
Statistical Inference Course Project

Nilrey Jim D. Cornites


November 7, 2016
Overview
This is part 1 of the course final project, a simulation exercise. In this project we simulate
exponential random variables and compare the distribution of their sample means with what the Central
Limit Theorem predicts. We explore the distributions of the sample mean and the sample variance, compare
them with the theoretical mean and variance, and show that the sample mean is approximately normal.

Simulation
The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter.
The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. For this
project, we set lambda = 0.2 for all of the simulations. We run 1,000 simulations of 40 exponentials each
and investigate the resulting distributions. The simulated data are stored in the matrix simdata, where
each row holds one sample of 40 draws.
## Initialize variables
n=40
lambda=0.2
sim=1000
## Simulate data
simdata<-matrix(rexp(n*sim, lambda), sim, n)
head(simdata)
##           [,1]      [,2]      [,3]      [,4]       [,5]      [,6]
## [1,]  6.374017  1.293578  6.807734 4.8760658 12.0715314  6.173311
## [2,]  2.728914  6.691434  8.567115 4.9139389  0.4717333 11.626117
## [3,]  5.627465 16.347664 19.043141 2.0302783  0.7286167  1.273446
## [4,] 34.403983  1.780344  3.768525 0.6006413 24.9243717  9.476574
## [5,] 18.008220  7.402575  5.253922 3.6546294  2.3218509  2.211779
## [6,]  2.535587  4.123374 23.700231 0.5615619  0.3891447  6.396274
##           [,7]       [,8]      [,9]     [,10]      [,11]       [,12]
## [1,] 6.1720331 11.0516545  2.201348 1.0945661  3.1208264  5.38401103
## [2,] 2.7707858  4.4809916  9.124007 0.6012095  0.8303089  3.29590280
## [3,] 2.1275496  6.0462539  3.505114 0.9817137  8.9204362  1.22502308
## [4,] 0.2809769  2.2604836  4.495021 3.0629984  2.1797546 12.41520979
## [5,] 7.1493059  1.9848335 12.788744 8.3167803  4.4419968  0.50188772
## [6,] 0.5588886  0.3570148  4.783951 3.2055609 11.1635941  0.04747952
##          [,13]      [,14]     [,15]    [,16]     [,17]     [,18]
## [1,] 8.2335945  8.1065753 3.9142549 2.206253 8.6187280  5.112030
## [2,] 1.7538093 19.5773227 9.8805489 3.073041 4.6470782 14.844922
## [3,] 0.8994896  2.2065014 1.3356708 4.530257 2.2380568  5.468162
## [4,] 3.1080904  0.5568631 6.9326036 3.814846 3.6630580  0.536950
## [5,] 7.4295333  4.7441684 2.9834501 2.792538 0.6233632  5.969806
## [6,] 3.8941004  4.6457556 0.2220027 2.076916 0.8832747 13.972879
##           [,19]     [,20]     [,21]     [,22]     [,23]     [,24]
## [1,]  0.2740115  1.667255 2.3401018 2.6862130  0.284484  9.456210
## [2,] 15.9671583  3.213886 3.0524908 1.6261002  6.721435  1.788739
## [3,]  0.5631134 10.440686 0.6380763 3.5314491 12.316609 26.245530
## [4,] 11.1701952 29.711443 0.4860411 5.6820109 10.201205  5.326447
## [5,]  2.7908121  2.735825 1.3276071 0.1235972 14.920358  3.626721
## [6,] 16.3086822 10.016590 2.0261978 5.0113552  5.602207  4.097713
##          [,25]      [,26]     [,27]    [,28]     [,29]      [,30]
## [1,] 10.553766  3.5064372 10.441034 3.615220 4.7003211  3.3003316
## [2,]  3.807650  0.6349264  1.406944 3.825338 7.3543720 23.1054812
## [3,]  0.826214  3.2423867  1.801570 1.401550 3.9474464  4.0300639
## [4,]  1.749978  1.2506832  4.361131 4.131545 5.4733720  0.8419272
## [5,] 10.613995  4.7991193  4.846980 9.539735 4.9017263  0.8415794
## [6,]  5.232062 12.7441401 17.433514 1.030981 0.8235978  1.7920308
##           [,31]     [,32]     [,33]    [,34]       [,35]       [,36]
## [1,]  0.3171620 5.1804119 0.3659964 3.019532  1.41805268  3.90805740
## [2,]  8.1732008 0.4482673 8.2431870 2.206819 15.35910379  0.02920346
## [3,] 11.3834769 1.9000791 3.4666655 2.339095  4.99923038  3.21762208
## [4,]  6.1696920 0.1758730 1.0370420 9.116804  2.98056526 11.06418033
## [5,]  0.2862768 1.9083156 4.3992995 1.196008  3.70573056  4.31692276
## [6,]  1.9186464 7.2134157 6.4802019 6.257538  0.02982305  2.51704287
##          [,37]      [,38]       [,39]     [,40]
## [1,]  9.784537 11.0647776  1.81726952 3.4669022
## [2,]  1.362403  0.4534317  2.62294679 0.1777096
## [3,]  2.141255 16.6764086  0.07660868 2.1100548
## [4,]  6.769260  5.1296096  0.43110864 7.7171714
## [5,] 15.669165  4.8358899 13.19780803 1.9477044
## [6,] 28.278434  4.2590572  3.90618492 4.0703393
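
The simulation code sets no random seed, so the numbers printed by head(simdata) change from run to run. A minimal sketch of a reproducible version follows. The helper name simulate_exp and the seed value 2016 are assumptions for illustration and do not come from the original report.

```r
## Hypothetical helper: draw `sim` samples of size `n` from Exp(lambda),
## one sample per row, after fixing the random seed.
simulate_exp <- function(sim, n, lambda, seed) {
  set.seed(seed)  # assumed seed; any fixed integer works
  matrix(rexp(n * sim, rate = lambda), nrow = sim, ncol = n)
}

a <- simulate_exp(sim = 1000, n = 40, lambda = 0.2, seed = 2016)
b <- simulate_exp(sim = 1000, n = 40, lambda = 0.2, seed = 2016)

dim(a)           # 1000 rows (simulations) by 40 columns (draws)
identical(a, b)  # TRUE: the same seed reproduces the same matrix
```

With a fixed seed, every figure quoted in the text can be regenerated exactly, although the values will differ from the unseeded run shown in this report.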

Sample Mean versus Theoretical Mean


The theoretical mean and standard deviation are both 1/lambda = 1/0.2 = 5. In theory, the expectation of
the sample mean is equal to the population (theoretical) mean. To show this, we use the apply function
to compute the mean of each row and then average the sample means.
mns<-apply(simdata, 1, mean)
mn<-mean(mns)
mn ## sample mean center

## [1] 4.985551
The simulated means are centered at 4.9855507, which is very close to the theoretical center of 5. The
distribution of sample means is therefore centered around the theoretical mean of the distribution.
It is also useful to look at the variability of the sample mean. In theory, the variance of the sample
mean is sigma^2/n, and its square root, sigma/sqrt(n), is commonly known as the standard error of
the mean. The code below compares the theoretical and the simulated standard error.
theorySE<-5/sqrt(40) ## theoretical standard error
actualSE<-sd(mns)
print(c(theorySE, actualSE))

## [1] 0.7905694 0.7857306


The simulated standard error (0.7857306) is very close to the theoretical value (0.7905694).
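
To illustrate how the standard error sigma/sqrt(n) shrinks as the sample size grows, the sketch below repeats the comparison for several values of n. The sample sizes 10 and 160, the seed, and the name se_table are assumptions chosen for illustration; the report itself only uses n = 40.

```r
set.seed(1)              # arbitrary seed, assumed for reproducibility
lambda <- 0.2
sim <- 1000
sizes <- c(10, 40, 160)  # hypothetical sample sizes; the report uses 40

## For each n: theoretical SE (1/lambda)/sqrt(n) versus the sd of 1000 simulated means
se_table <- t(sapply(sizes, function(n) {
  draws <- matrix(rexp(n * sim, rate = lambda), nrow = sim, ncol = n)
  c(n = n,
    theorySE = (1 / lambda) / sqrt(n),
    actualSE = sd(rowMeans(draws)))
}))
print(se_table)
```

Each fourfold increase in n should roughly halve both columns, since the standard error scales with 1/sqrt(n).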

Sample Variance versus Theoretical Variance


Recall that the theoretical variance is 1/lambda^2 = 1/0.04 = 25. The sample variance is a good estimate
of the theoretical variance, i.e. the distribution of the sample variance is centered around the
theoretical variance. To show this, we compute the variance of each row of simdata and compare the
average of these variances with the theoretical variance.
vs<-apply(simdata, 1, var)
v<-mean(vs)
print(c(1/.2^2, v))
## [1] 25.00000 25.02464
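
The centering around 25 depends on var() dividing by n - 1. As a hedged illustration, the sketch below compares that estimator with the version that divides by n, which is expected to come out lower on average by the factor (n - 1)/n. The seed and the names vs_unbiased and vs_biased are assumptions, not part of the original report.

```r
set.seed(2)  # arbitrary seed, assumed for reproducibility
n <- 40
lambda <- 0.2
sim <- 1000
simdata <- matrix(rexp(n * sim, rate = lambda), nrow = sim, ncol = n)

vs_unbiased <- apply(simdata, 1, var)   # var() divides by n - 1
vs_biased <- vs_unbiased * (n - 1) / n  # the same sums of squares divided by n

print(c(theory = 1 / lambda^2,
        unbiased = mean(vs_unbiased),
        biased = mean(vs_biased)))
```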

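library(ggplot2)
g<-ggplot(data.frame(vs), aes(x=vs))
## Title and axis labels are those shown on the rendered plot
g+geom_histogram(binwidth=5, color="black", aes(y=..density..))+labs(title="Sample Variance Density of 40 Numbers from Exponential Distribution", x="Variance of 40 Selections", y="Density")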

[Figure: histogram titled "Sample Variance Density of 40 Numbers from Exponential Distribution"; x-axis: Variance of 40 Selections (0 to 80); y-axis: Density (0.00 to 0.04).]
The histogram of the sample variances shows that their distribution is centered around the theoretical
variance, which illustrates how well the sample variance s^2 estimates the population variance.

Distribution approximating to normal


In theory, as n gets larger, the distribution of the sample mean of exponential random variables approaches
a normal distribution with mean equal to the population mean and variance equal to the population variance
over n, as previously discussed. The histogram of the simulated means, overlaid with a normal density
curve, closely resembles a Gaussian distribution.
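g<-ggplot(data.frame(mns), aes(x=mns))
## Title and axis labels are those shown on the rendered plot
g<-g+geom_histogram(binwidth=lambda, fill="white",color="black", aes(y=..density..))+labs(title="Sample Mean Density of 40 Numbers from Exponential Distribution", x="Mean of 40 Selections", y="Density")
g+stat_function(fun=dnorm,args=list(mean=1/.2, sd=actualSE),color = "red", size = 1.0)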

[Figure: histogram titled "Sample Mean Density of 40 Numbers from Exponential Distribution" with a red density curve; x-axis: Mean of 40 Selections; y-axis: Density (0.0 to 0.4).]
The red curve is the density function of the normal distribution with mean 1/lambda = 5 and standard
deviation actualSE, the simulated standard error of the mean.
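
A numerical check can complement the visual comparison. The sketch below standardizes the simulated means with the theoretical mean and standard error, then compares them with the standard normal distribution. The seed, the chosen quantile levels, and the names z, coverage and probs are assumptions for illustration only.

```r
set.seed(3)  # arbitrary seed, assumed for reproducibility
n <- 40
lambda <- 0.2
sim <- 1000
mns <- rowMeans(matrix(rexp(n * sim, rate = lambda), nrow = sim, ncol = n))

mu <- 1 / lambda                # theoretical mean
se <- (1 / lambda) / sqrt(n)    # theoretical standard error
z <- (mns - mu) / se            # standardized sample means

## Share of standardized means within +/- 1.96; a normal distribution gives about 0.95
coverage <- mean(abs(z) <= 1.96)
print(coverage)

## Empirical quantiles of z next to standard normal quantiles
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
print(rbind(empirical = quantile(z, probs), normal = qnorm(probs)))
```

Because the exponential distribution is right-skewed, some asymmetry may remain in the tails at n = 40 even though the overall agreement is close.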
