Tutoring Fall 2009 Exam 1 Notes
1 Confidence Interval
Confidence intervals provide a range estimate for some target parameter (mean, variance, etc.) and are associated with a stated level of certainty, called the level of confidence.
They have the form $\hat{\theta} \pm ME$, where $\hat{\theta}$ is the point estimate of the parameter and $ME$ is the margin of error.
If the data follow a t-distribution, the statistical number is called a t-number and is found similarly; however, every t-number has a certain number of degrees of freedom (df) associated with it. For a one-sample test with n observations, df = n - 1.
$$SE = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
where $\hat{p}$ is the point estimate for the data set. The statistical number is still based on how the data is distributed.
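As a rough illustration, the sketch below builds a confidence interval for a proportion from the standard error above. It assumes the sample is large enough that a z-number (normal distribution) applies; the function name and the example counts are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Return (lower, upper) for a sample proportion, assuming a z-number applies."""
    p_hat = successes / n                    # point estimate
    se = sqrt(p_hat * (1 - p_hat) / n)       # standard error of the proportion
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided z-number
    me = z * se                              # margin of error
    return p_hat - me, p_hat + me


# Hypothetical data: 40 "successes" out of 100 observations.
lower, upper = proportion_confidence_interval(successes=40, n=100)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

If the data call for a t-number instead, the same structure holds; only the statistical number changes.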
If the population size (N) is finite and known, we can use the Finite Population Correction (FPC) to get a better measure of the standard error. The standard error under FPC is
$$\widehat{SE} = \frac{s}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$$
where s is the standard deviation of the sample, n is the number of observations in the sample, and N is the size of the (finite) population.
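A minimal sketch of the FPC standard error, using the same symbols s, n, and N as above. The function name and the numbers in the example are illustrative only.

```python
from math import sqrt


def standard_error_fpc(s, n, N):
    """Standard error of the mean with the finite population correction."""
    if not 0 < n <= N or N < 2:
        raise ValueError("need 0 < n <= N and N >= 2")
    uncorrected = s / sqrt(n)               # usual standard error
    correction = sqrt((N - n) / (N - 1))    # finite population correction
    return uncorrected * correction


# Hypothetical sample: s = 10, n = 25 drawn from a population of N = 100.
print(standard_error_fpc(s=10, n=25, N=100))
```

Note that the correction shrinks the standard error as n approaches N, and gives zero when the whole population is sampled.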
where $\hat{p}_x$ and $\hat{p}_y$ are the sample proportions of the first and second samples, respectively.
Once we have the pooled variance, we use the following formula to find the standard error:
$$SE = \sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}$$
so the margin of error is
$$ME = t_{\alpha/2} \sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}$$
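The two formulas above can be sketched as follows. The pooled variance and the t-number are taken as inputs here, since they come from an earlier step and from a t-table; the function names and example values are hypothetical, and the df used to look up the t-number (nx + ny - 2) is the usual convention for a pooled estimate, assumed rather than taken from these notes.

```python
from math import sqrt


def pooled_standard_error(sp2, nx, ny):
    """Standard error of the difference in means, given the pooled variance sp2."""
    return sqrt(sp2 / nx + sp2 / ny)


def pooled_margin_of_error(sp2, nx, ny, t_number):
    """Margin of error: the t-number times the pooled standard error."""
    return t_number * pooled_standard_error(sp2, nx, ny)


# Hypothetical inputs: pooled variance 4.0, two samples of 8 observations each.
# t_number = 2.145 is the t-table value for alpha/2 = 0.025 with df = 14 (assumed).
se = pooled_standard_error(sp2=4.0, nx=8, ny=8)
me = pooled_margin_of_error(sp2=4.0, nx=8, ny=8, t_number=2.145)
print(se, me)
```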
4 Stratified Sampling
Sometimes it makes sense to split up our overall population into k different categories, called strata. Each stratum has its own size, mean, and standard deviation. If we want to estimate the overall mean for the entire population, we can use the stratified sample estimate for our point estimate. It has the following formula:
$$\bar{X}_{strata} = \frac{1}{N} \sum_{j=1}^{k} N_j \bar{X}_j$$
where $N$ is the overall population size, $N_j$ is the size of the population in the jth stratum, and $\bar{X}_j$ is the mean of the jth stratum.
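A short sketch of the stratified estimate: each stratum mean is weighted by its stratum size, and the total is divided by the overall population size. The function name and the three strata in the example are made up for illustration.

```python
def stratified_mean(sizes, means):
    """Weighted mean over strata: (1/N) * sum of N_j * mean_j."""
    if len(sizes) != len(means):
        raise ValueError("sizes and means must have one entry per stratum")
    N = sum(sizes)  # overall population size
    return sum(N_j * x_j for N_j, x_j in zip(sizes, means)) / N


# Hypothetical strata: sizes 100, 300, 600 with means 10, 20, 30.
print(stratified_mean(sizes=[100, 300, 600], means=[10.0, 20.0, 30.0]))
```

Larger strata pull the overall estimate toward their own mean, which is why the result here sits closer to 30 than to 10.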
If we needed to compute the standard error for the mean of a stratum, we could use the finite population correction (as the size of the population is known).
where N is the old population size of each stratum, σ is the old standard deviation of each stratum, and n is the new population size (of the overall data set of all strata).
6 Outliers
The easiest way to check for outliers is to see if they fall below the lower fence, or above the upper fence.
1. Calculate the interquartile range (IQR), which is the difference between the third quartile (Q3) and the first quartile (Q1) of the data. 50% of your data lies in this range.
2. The lower fence is Q1 - 1.5(IQR).
3. The upper fence is Q3 + 1.5(IQR).
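The steps above can be sketched as below. Quartiles can be computed by several conventions that give slightly different answers on small data sets; this sketch assumes the "inclusive" method of Python's `statistics.quantiles`, which may not match the convention used in the book. Function names and data are illustrative.

```python
from statistics import quantiles


def fences(data):
    """Return (lower_fence, upper_fence) using the 1.5 * IQR rule."""
    # Assumption: "inclusive" quartile convention; other conventions differ slightly.
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data):
    """Return the values that fall below the lower fence or above the upper fence."""
    lower, upper = fences(data)
    return [x for x in data if x < lower or x > upper]


# Hypothetical data set with one suspicious value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(fences(sample))
print(find_outliers(sample))
```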
7 Hypothesis Testing
Be familiar with the terminology:
• H0: the null hypothesis. This is what we are assuming to be true.
• H1: the alternative hypothesis. This is the other possibility, if we find enough evidence to reject the null hypothesis.
• Rejection Region: If our sample estimate falls in this region, we have enough evidence to reject the null hypothesis.
• Critical Value: Determines the rejection region; it is based on a certain level of confidence.
• P-value: For a sample, tells us the smallest error level (significance level) at which we could still reject the null hypothesis given the data; this is associated with the largest level of confidence we could have.
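To tie the terms together, here is a minimal sketch of a two-sided test on a mean. It assumes a z-number applies (population standard deviation known, normal data), which is a simplification chosen for illustration; the function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Test H0: mean == mu0 against H1: mean != mu0, assuming sigma is known."""
    std_normal = NormalDist()
    z = (sample_mean - mu0) / (sigma / sqrt(n))         # test statistic
    critical_value = std_normal.inv_cdf(1 - alpha / 2)  # edge of the rejection region
    p_value = 2 * (1 - std_normal.cdf(abs(z)))          # smallest alpha that rejects H0
    reject = abs(z) > critical_value                    # statistic in rejection region?
    return z, critical_value, p_value, reject


# Hypothetical data: sample mean 52 from n = 100, testing mu0 = 50 with sigma = 10.
z, crit, p, reject = two_sided_z_test(sample_mean=52, mu0=50, sigma=10, n=100)
print(f"z = {z:.2f}, critical value = {crit:.2f}, p-value = {p:.4f}, reject H0: {reject}")
```

Rejecting when the statistic falls past the critical value and rejecting when the p-value is below alpha are the same decision, stated two ways.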
8 Bins
Here’s the table from the book, which tells us the recommended number of bins based on different sample sizes: