The first thing I did for this problem was to create a series of functions, each of which gives one of the component densities for me to use. I then compiled these functions into a list. The last variable will be used later to produce output based on the 4 component probabilities.
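As a minimal sketch of the setup described above, the list of component samplers and the probability vector might look like the following. The means and standard deviations are placeholder values chosen for illustration, not the ones used in the original analysis; only the names norms and problty and the probabilities c(.2,.3,.1,.4) appear in the code later in this report.

```r
# Illustrative component samplers; the means and sds are assumed values.
norms = list(
  function(n) rnorm(n, mean = 3,  sd = 1),
  function(n) rnorm(n, mean = 5,  sd = 1),
  function(n) rnorm(n, mean = 8,  sd = 2),
  function(n) rnorm(n, mean = 12, sd = 3)
)

# Mixing probabilities for the 4 components (they must sum to 1).
problty = c(.2, .3, .1, .4)

# Each list element can be called with n to draw n values from that component.
set.seed(1)
draw = norms[[2]](n = 3)
```

Storing the samplers as functions of n is what lets the later code call components[[y]](n=1) without knowing anything about the individual distributions.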
#Function taking sample from our density
#Help from Duncan Temple Lang was given in office hours to write this function.
samp_dens_1 <- function(n, problty, components){
  compon = numeric(n)
  answer = numeric(n)
  for (i in 1:n){
    #pick one component index according to problty
    y = sample(c(1:length(components)), 1 , replace = TRUE, prob = problty)
    compon[i] = y
    #draw a single value from the chosen component
    U = components[[y]](n=1)
    answer[i] = U
  }
  return(answer)
}
samp_dens_1(n = 9, problty = problty, components = norms)
## [1]  5.068311 12.329605  6.125661  4.085809  5.187219  3.653186 15.947611
## [8]  4.741648  2.727713
The function returns n sample values from a mixture of Normals. It accepts three parameters: the number of sample values (n), the probabilities for sampling from each component (problty), and a list of functions representing the densities of the components (components). Inside the loop it also records, in compon, which of the distributions each number is generated from, although only the sampled values are returned.
#Compare distributions
answers_1 = samp_dens_1(n = 10000, problty = problty, components = norms)
#show quantile plot for the two distributions and make sure it is a straight line
The plot above checks that the two functions produce similar results. I used Q-Q plots to compare the output from the two functions. The plot shows that the distributions from the two functions are very similar, since the quantiles of the two samples fall along a roughly straight line when plotted against each other.
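The comparison described above could be reproduced along the following lines. This is only a sketch: the plotting code is not shown in this report, so the two samples here come from a stand-in two-component Normal mixture (the rmix helper and its parameters are hypothetical), and base R's qqplot() and abline() are used for the comparison.

```r
# Hypothetical stand-in sampler: a two-component Normal mixture.
rmix = function(n) {
  k = sample(1:2, n, replace = TRUE, prob = c(.4, .6))
  rnorm(n, mean = c(3, 6)[k], sd = c(1, 2)[k])
}

set.seed(141)
x = rmix(10000)   # plays the role of the output of function 1
y = rmix(10000)   # plays the role of the output of function 2

# Q-Q plot of the two samples; points close to the line y = x indicate
# that the two samples come from similar distributions.
qq = qqplot(x, y, main = "Q-Q plot of the two samples",
            xlab = "Quantiles, function 1", ylab = "Quantiles, function 2")
abline(0, 1, col = "red")

# Numerical check: largest gap between matching quantiles of the two samples.
probs = seq(.05, .95, by = .05)
max_gap = max(abs(quantile(x, probs) - quantile(y, probs)))
```

A small max_gap relative to the spread of the data supports the same conclusion as a straight Q-Q line.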
samp_dens_1_run<- function(n, problty=c(.2,.3,.1,.4), components = norms){
  compon = numeric(n)
  answer = numeric(n)
  for (i in 1:n){
    y = sample(c(1:length(components)), 1 , replace = TRUE, prob = problty)
    compon[i] = y
    U = components[[y]](n=1)
    answer[i] = U
  }
  return(answer)
}
samp_dens_1_run(10, problty = c(.2,.3,.1,.4), components = norms)
##  [1]  2.842619  8.351622  3.842018 12.676749 12.135348 10.280814  7.434478
##  [8]  3.098023  4.934412  3.443973
The function above is function 1 with default arguments, so that it can be run repeatedly for different sample sizes.
samp_dens_2_run <- function(n, prob = c(.2,.3,.1,.4), components = norms){
  #draw all n component labels in a single call
  compon = sample(c(1:length(components)), n , replace = TRUE, prob = prob)
  #count how many values are needed from each component
  tab = table(compon)
  storage = list()
  for (i in names(tab)) {
    #draw all values for component i at once
    storage[[i]] = components[[as.numeric(i)]](n = tab[[i]])
  }
  return(storage)
}
## $`4`
##  [1] 14.452658  4.478555  2.508859 13.464259  6.777307 13.905548 18.477334
##  [8]  7.838756 12.674486  8.550978 12.558988 13.412830 14.207680  6.613511
## [15]  4.166077  5.547381  7.695239  9.123781 12.808309  8.550782  9.322842
## [22]  8.652894  9.681145 12.171426  5.438197 12.193603  9.124177  2.836161
## [29] 11.171202 15.108634  9.407813  4.046711 16.366509 15.279492  7.147838
## [36]  8.888420 15.536749 14.273155 12.396337 10.559881  7.626454
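The function above is function 2 with default arguments, so that it can be run repeatedly for different sample sizes. Unlike function 1, it draws all of the component labels in a single call to sample() and then generates the values for each component in one batch, returning a list with one vector per component.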
time_1 = sapply(sampsize, function(x){system.time(samp_dens_1_run(n=x))})[3,]
time_2 = sapply(sampsize, function(x){system.time(samp_dens_2_run(n=x))})[3,]
boxplot(time_elapsed_2, main = "Time elapsed for function 2 sample sizes", xlab = "Sample Size", ylab = "Time (in seconds)", xaxt = 'n')
Comparing the two boxplots, the second function runs much more quickly than the first, especially for larger sample sizes. Also, the larger the sample size, the longer the function takes to complete. After experimenting with different parameters in the component densities, there was no obvious effect on the time elapsed. For larger numbers of components in the mixture (tried for k=4,5,6), however, more time elapsed.
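The timing comparison can be sketched in a self-contained way as follows. The names comps, p, sizes, samp_loop and samp_vec are hypothetical stand-ins for the report's objects, and the component parameters are assumed values; the point is only to show how system.time() (which reports seconds) can be applied to a loop-based and a batched sampler over several sample sizes.

```r
# Illustrative components and mixing probabilities (assumed values).
comps = list(function(n) rnorm(n, 3, 1), function(n) rnorm(n, 10, 2))
p = c(.4, .6)

# Loop version: one component label and one value per iteration.
samp_loop = function(n) {
  out = numeric(n)
  for (i in seq_len(n)) {
    k = sample(seq_along(comps), 1, prob = p)
    out[i] = comps[[k]](1)
  }
  out
}

# Batched version: all labels at once, then one draw per component.
samp_vec = function(n) {
  k = sample(seq_along(comps), n, replace = TRUE, prob = p)
  out = numeric(n)
  for (j in unique(k)) out[k == j] = comps[[j]](sum(k == j))
  out
}

sizes = c(100, 1000, 10000)   # hypothetical sample sizes

# The "elapsed" element of system.time() is wall-clock time in seconds.
t_loop = sapply(sizes, function(n) system.time(samp_loop(n))[["elapsed"]])
t_vec  = sapply(sizes, function(n) system.time(samp_vec(n))[["elapsed"]])

timings = data.frame(n = sizes, loop = t_loop, vectorized = t_vec)
print(timings)
```

Unlike the list returned by function 2, samp_vec() puts each batch back into the positions of its labels, so the result is a plain vector in draw order.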
2.1 Random Number Generation
a. Triangular Distribution
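#compute density of triangular distribution
#(function header and first condition reconstructed from the call below and the remaining branches)
dtriang = function(x, a, b, c){
  store = numeric(length(x))
  for (i in seq_along(x)){
    if (a <= x[i] & x[i] < c) store[i] = 2*(x[i]-a)/((b-a)*(c-a))
    if (c <= x[i] & x[i] < b) store[i] = 2*(b-x[i])/((b-a)*(b-c))
    if (x[i] < a | x[i] > b) store[i] = 0
  }
  return (store)
}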
#call dtriang() with the specified values
dtriang(x=c(3,4,8,2),a=1,b=6,c=3)
## [1] 0.4000000 0.2666667 0.0000000 0.2000000
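The function computes the density of the triangular distribution at one or more values of x, for values of a, b and c specified by the caller. The output agrees with hand calculation: at the mode x = c = 3 the density is 2/(b-a) = 0.4, and x = 8 lies outside [a, b], so its density is 0. From this we can determine that it is working properly.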
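A further check is that a density should integrate to 1. The sketch below uses a vectorized variant of the density (dtriang_vec is a hypothetical name, and the code assumes a < c < b) together with base R's integrate(); it is an illustration of the check, not the function used in the report.

```r
# Vectorized triangular density on [a, b] with mode c (assumes a < c < b).
dtriang_vec = function(x, a, b, c) {
  ifelse(x < a | x > b, 0,
    ifelse(x < c,
           2 * (x - a) / ((b - a) * (c - a)),    # rising side
           2 * (b - x) / ((b - a) * (b - c))))   # falling side
}

# Same inputs as the call above.
vals = dtriang_vec(c(3, 4, 8, 2), a = 1, b = 6, c = 3)

# Numerical integral of the density over its support [a, b].
area = integrate(dtriang_vec, lower = 1, upper = 6, a = 1, b = 6, c = 3)$value
```

If vals matches the output above and area is close to 1, the formula is behaving as a proper density for these parameters.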
#Make function ptriang that takes one or more values of the RV and computes triang dist.
  }
  return(ptri)
}
ptriang(x = c(.5, 2, 4, 9), a = 1, b = 6, c = 3)
## [1] 0.0000000 0.1000000 0.7333333 1.0000000
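The function computes, for a Triangular distribution, the probability of the value being less than or equal to each value of x. The output is as expected: x = 0.5 lies below a and gives 0, x = 9 lies above b and gives 1, and for x = 2 the probability is (2-1)^2/((6-1)(3-1)) = 0.1. From this we know that the function is working properly.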
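Since the body of ptriang() does not appear in this report, the following is only a sketch of how a triangular CDF could be written. The name ptriang_sketch is hypothetical and the code assumes a < c < b; it uses the standard closed-form CDF of the triangular distribution.

```r
# Sketch of a triangular CDF on [a, b] with mode c (assumes a < c < b).
ptriang_sketch = function(x, a, b, c) {
  p = numeric(length(x))          # values below a keep probability 0
  lower = x >= a & x <= c         # rising part of the density
  upper = x > c & x < b           # falling part of the density
  p[lower] = (x[lower] - a)^2 / ((b - a) * (c - a))
  p[upper] = 1 - (b - x[upper])^2 / ((b - a) * (b - c))
  p[x >= b] = 1                   # everything at or above b
  p
}

# Same inputs as the call above.
cdf_vals = ptriang_sketch(c(.5, 2, 4, 9), a = 1, b = 6, c = 3)
```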
#Using inverse cdf of the original function, sampling from distribution.
From the histogram above, we can see that the function has generated a triangular density. This was done by sampling from the triangular distribution defined by the first function, using its inverse CDF.
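The sampling code is not shown in this report, so the following is a sketch of inverse-CDF sampling for the triangular distribution under the assumption a < c < b. The name rtriang_sketch is hypothetical; the idea is to draw U from Uniform(0, 1) and solve F(x) = U on whichever side of the mode U falls.

```r
# Inverse-CDF sampler for the triangular distribution (assumes a < c < b).
rtriang_sketch = function(n, a, b, c) {
  u = runif(n)
  Fc = (c - a) / (b - a)   # value of the CDF at the mode c
  ifelse(u < Fc,
         a + sqrt(u * (b - a) * (c - a)),          # invert the rising side
         b - sqrt((1 - u) * (b - a) * (b - c)))    # invert the falling side
}

set.seed(141)
draws = rtriang_sketch(10000, a = 1, b = 6, c = 3)

# Histogram on the density scale; it should look triangular with a peak near c.
hist(draws, breaks = 50, freq = FALSE, main = "Sampled triangular density")
```

The sample mean of the draws should be close to (a + b + c)/3, the mean of the triangular distribution.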
2.2 b. Acceptance/Rejection Sampling
#Taking a look at the function
source(url("http://eeyore.ucdavis.edu/stat141/homework/nodeDensity.R"))
nd=nodeDensity
plt=persp(outer(0:100,0:100.,nd), phi=30, theta=30, main ="Target Density", xlab="X")
plt
##               [,1]       [,2]       [,3]       [,4]
## [1,]  1.732051e+00 -0.5000000  0.8660254 -0.8660254
## [2,]  1.000000e+00  0.8660254 -1.5000000  1.5000000
## [3,] -1.537228e-17  0.4348286  0.2510484 -0.2510484
## [4,] -1.366025e+00 -1.0490381 -2.9150635  3.9150635
contour(outer(0:100,0:100.,nd))
The two plots above clearly show the target density.
#Finding max of function
max(outer(0:100,0:100.,nd))
## [1] 3.983295
    return(1/100^2)
  else
    return(0)
}
This specifies the plane that needs to lie above the function for our sampling density.
rprop = function(n) runif(n, 0, 100)
library(MASS)
print(pas)
results=
return(samp)
}
After plotting the original function and finding its maximum, I was able to create a function that samples from the original distribution using acceptance/rejection sampling, so that the sample mimics the original. The plot of the sample is similar to the plot of the target density, which means a success! The efficiency is around 34%, meaning that about 34% of the proposed points are accepted as draws from the original distribution.
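The full sampler is not reproduced in this report, so the following is a sketch of acceptance/rejection sampling with a flat envelope over the square [0, 100] x [0, 100]. The target() function is a hypothetical stand-in, not the nodeDensity function sourced above, and ar_sample and fmax are assumed names; the efficiency it reports therefore differs from the 34% quoted above.

```r
# Hypothetical stand-in for the target density on [0, 100] x [0, 100];
# it is NOT the nodeDensity function used in the report.
target = function(x, y) 4 * exp(-((x - 50)^2 + (y - 50)^2) / (2 * 20^2))
fmax = 4   # height of the flat envelope; the stand-in never exceeds 4

# Acceptance/rejection sampling with a uniform proposal over the square.
ar_sample = function(n) {
  samp = matrix(numeric(0), ncol = 2)
  proposed = 0
  while (nrow(samp) < n) {
    m = 2 * (n - nrow(samp))        # propose points in batches
    x = runif(m, 0, 100)
    y = runif(m, 0, 100)
    u = runif(m, 0, fmax)           # uniform height under the envelope
    keep = u <= target(x, y)        # accept points that fall under the target
    samp = rbind(samp, cbind(x, y)[keep, , drop = FALSE])
    proposed = proposed + m
  }
  # Efficiency: accepted points divided by proposed points.
  list(samp = samp[seq_len(n), , drop = FALSE],
       efficiency = nrow(samp) / proposed)
}

set.seed(141)
res = ar_sample(5000)
```

The efficiency equals the volume under the target divided by the volume under the envelope, which is why a tighter envelope (a plane just above the maximum of the target) wastes fewer proposals.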