
Gator Tutoring
Fall 2009
Exam 1 Notes

1 Confidence Interval
Confidence intervals provide an estimate for some target parameter (mean, variance, etc.) and are associated with a level of uncertainty, called the level of confidence. They have the form
θ̂ ± ME
where θ̂ is the point estimate of the parameter and ME is the margin of error.

2 One-Sample Testing


2.1 Point Estimates
The point estimate is your "best guess" for the true value of the parameter you're trying to estimate (mean, variance, etc.). Often, you will estimate the population mean (μ), and your point estimate will be the sample mean (X̄). You may also estimate the proportion (p) of the population having a certain characteristic, and your point estimate would be
\hat{p} = \frac{X}{n}
where X is the number of observations that have this characteristic in your sample, and n is the number of observations in your sample. The point estimate for the proportion is denoted p̂ to distinguish it from the "true" proportion p (i.e., the one you are trying to estimate).
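As a quick illustration (not part of the original notes), here is a minimal Python sketch of both point estimates; the sample data and the "greater than 4.0" characteristic are made up.

# Hypothetical sample data (made up for illustration).
sample = [4.1, 3.8, 5.0, 4.6, 4.3, 3.9]

# Point estimate of the population mean: the sample mean.
x_bar = sum(sample) / len(sample)

# Point estimate of a proportion: X observations with the characteristic out of n.
X = sum(1 for value in sample if value > 4.0)
n = len(sample)
p_hat = X / n

print(x_bar, p_hat)  # approximately 4.283 and 0.667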

2.2 Margin of Error


The margin of error of a confidence interval has the form
ME = k(SE)
where k is a number from a statistical table (which represents how many standard deviations away from the mean the data falls) and SE is the standard error.

2.2.1 Statistical Number

The number k is based on the distribution of the data and the level of confidence. For a normal distribution, the number is called a z-number and is looked up by the area that lies above it (in the upper tail); for a two-sided interval with confidence level 1 − α, this upper-tail area is α/2 (the z_{α/2} used in the formulas below). The z-table given on the test lists these upper-tail areas and their corresponding z-values.
If the data has a t-distribution, the number is called a t-number and is found similarly; however, every t-number has a certain degrees of freedom (df) associated with it. For a one-sample test with n observations, the df is n − 1.
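If a statistical table is not at hand, the same z- and t-numbers can be looked up in software. A minimal Python sketch using scipy (the 95% confidence level and n = 20 are just example choices):

from scipy import stats

confidence = 0.95
alpha = 1 - confidence
n = 20  # hypothetical sample size

# z-number with alpha/2 in the upper tail (normal distribution)
z = stats.norm.ppf(1 - alpha / 2)         # about 1.96

# t-number with alpha/2 in the upper tail and n - 1 degrees of freedom
t = stats.t.ppf(1 - alpha / 2, df=n - 1)  # about 2.093

print(z, t)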

2.2.2 Standard Error

If the data has a z-distribution or a t-distribution, the standard error SE is
SE = \frac{s}{\sqrt{n}}
where s is the standard deviation of the sample and n is the number of observations in the sample.
If you're estimating the proportion of the population having a certain characteristic, the standard error SE is
SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
where p̂ is the point estimate for the data set. The statistical number is still based on how the data is distributed.
If the population size (N) is finite and known, we can use the Finite Population Correction (FPC) to get a better measure of the standard error. The standard error under the FPC is
\hat{SE} = \frac{s}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}
where s is the standard deviation of the sample, n is the number of observations in the sample, and N is the size of the (finite) population.
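Putting the pieces together, here is a hedged sketch of a one-sample confidence interval for the mean, using the t-number and the standard error, with and without the FPC (the data and the population size N = 100 are invented):

import math
from scipy import stats

sample = [12.1, 11.4, 13.0, 12.6, 11.9, 12.8, 12.2, 11.7]  # made-up data
n = len(sample)
x_bar = sum(sample) / n
s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))  # sample standard deviation

alpha = 0.05
t = stats.t.ppf(1 - alpha / 2, df=n - 1)  # statistical number k

se = s / math.sqrt(n)                     # standard error
print("CI:", x_bar - t * se, x_bar + t * se)

# With the finite population correction (hypothetical N = 100).
N = 100
se_fpc = se * math.sqrt((N - n) / (N - 1))
print("CI with FPC:", x_bar - t * se_fpc, x_bar + t * se_fpc)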

3 Two-Sample Testing


3.1 Point Estimates
We are also often interested in estimating the difference in means of two samples; this is known as a two-sample test. The point estimate for the difference is
D = \bar{X} - \bar{Y}
where X̄ and Ȳ are the sample means of the first and second samples, respectively.
We can also estimate the difference in the proportions of two samples. The point estimate for the difference is
\hat{p}_x - \hat{p}_y
where p̂_x and p̂_y are the sample proportions of the first and second samples, respectively.

3.2 Margin of Error

The margin of error for a two-sample test is still of the form ME = k(SE).

3.2.1 Difference in Means - Population Variances Known

If the population variance (σ²) is known for each data set, the statistical number comes from a z-table. The standard error is
SE = \sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}
so the margin of error is
ME = z_{\alpha/2} \sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}
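A minimal Python sketch of this margin of error, with invented summary statistics (known variances, sample sizes, and a 95% confidence level):

import math
from scipy import stats

var_x, var_y = 4.0, 6.5    # known population variances (hypothetical)
n_x, n_y = 40, 35          # sample sizes (hypothetical)

z = stats.norm.ppf(0.975)  # z_{alpha/2} for a 95% confidence level
se = math.sqrt(var_x / n_x + var_y / n_y)
me = z * se
print(me)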

3.2.2 Difference in Means - Population Variances Unknown

Typically, we don't know the population variances. If the population variances are not known, the statistical number comes from a t-table, which has a certain degrees of freedom. For a two-sample test with n_x observations in the first sample and n_y observations in the second sample, the df is n_x + n_y − 2. If the variances (although unknown) are assumed to be the same for both samples, we use something called the pooled variance. It has the following formula:
S_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}

Once we have the pooled variance, we use the following formula to find the standard error:
SE = \sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}
so the margin of error is
ME = t_{\alpha/2} \sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}
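A short Python sketch of the pooled-variance margin of error, again with invented summary statistics:

import math
from scipy import stats

s2_x, s2_y = 4.2, 5.1   # sample variances (hypothetical)
n_x, n_y = 15, 12       # sample sizes (hypothetical)

df = n_x + n_y - 2
s2_pooled = ((n_x - 1) * s2_x + (n_y - 1) * s2_y) / df

t = stats.t.ppf(0.975, df=df)  # t_{alpha/2} for a 95% confidence level
se = math.sqrt(s2_pooled / n_x + s2_pooled / n_y)
me = t * se
print(me)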

3.2.3 Difference in Proportions

The statistical number used when estimating the difference in proportions comes from a z-table. The standard error is
SE = \sqrt{\frac{\hat{p}_x(1-\hat{p}_x)}{n_x} + \frac{\hat{p}_y(1-\hat{p}_y)}{n_y}}

so the margin of error is
ME = z_{\alpha/2} \sqrt{\frac{\hat{p}_x(1-\hat{p}_x)}{n_x} + \frac{\hat{p}_y(1-\hat{p}_y)}{n_y}}
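A brief Python sketch of the interval for a difference in proportions, with hypothetical sample proportions and sizes:

import math
from scipy import stats

p_x, p_y = 0.62, 0.54      # sample proportions (hypothetical)
n_x, n_y = 200, 180        # sample sizes (hypothetical)

z = stats.norm.ppf(0.975)  # z_{alpha/2} for a 95% confidence level
se = math.sqrt(p_x * (1 - p_x) / n_x + p_y * (1 - p_y) / n_y)
me = z * se
estimate = p_x - p_y
print(estimate - me, estimate + me)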

4 Stratified Sampling
Sometimes it makes sense to split up our overall population into k different categories, called strata. Each stratum has its own size, mean, and standard deviation. If we want to estimate the overall mean for the entire population, we can use the stratified sample estimate as our point estimate. It has the following formula:
\bar{X}_{strata} = \frac{1}{N} \sum_{j=1}^{k} N_j \bar{X}_j
where N is the overall population size, N_j is the size of the jth stratum, and X̄_j is the mean of the jth stratum.
If we needed to compute the standard error for the mean of a stratum, we could use the finite population correction (as the size of the population is known).
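A minimal Python sketch of the stratified point estimate; the stratum sizes and stratum means below are made up:

# Hypothetical strata: (N_j, mean of stratum j)
strata = [(500, 23.1), (300, 19.4), (200, 27.8)]

N = sum(N_j for N_j, _ in strata)  # overall population size
x_strata = sum(N_j * mean_j for N_j, mean_j in strata) / N
print(x_strata)  # weighted average of the stratum means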
 

4.1 Allocation in Stratified Sampling

For a given total sample size, there are a couple of ways to allocate your observations among the strata.
The first method is the proportional allocation method. To determine the number of observations to sample from the jth stratum, we use the following formula:
n_j = n \frac{N_j}{N}
where N_j is the size of the jth stratum, N is the overall population size (across all strata), and n is the total sample size (across all strata). This gives us the size n_j of the sample we should take from the jth stratum.
The second method is the optimal allocation method. The formula for the size of the jth stratum's sample is
n_j = n \frac{N_j \sigma_j}{\sum_{i=1}^{k} N_i \sigma_i}
where N_j is the size of the jth stratum, σ_j is the standard deviation of the jth stratum, and n is the total sample size (across all strata). Strata that are larger or more variable receive more of the sample.
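The two allocation rules are easy to compare side by side. A small Python sketch with made-up stratum sizes and standard deviations:

# Hypothetical strata: (N_j, sigma_j)
strata = [(500, 4.0), (300, 9.0), (200, 2.5)]
n = 100  # total sample size to allocate

N = sum(N_j for N_j, _ in strata)

# Proportional allocation: n_j = n * N_j / N
proportional = [n * N_j / N for N_j, _ in strata]

# Optimal allocation: n_j = n * N_j * sigma_j / sum_i(N_i * sigma_i)
total_weight = sum(N_j * sigma_j for N_j, sigma_j in strata)
optimal = [n * N_j * sigma_j / total_weight for N_j, sigma_j in strata]

print(proportional)  # [50.0, 30.0, 20.0]
print(optimal)       # the more variable second stratum gets a larger share

In practice the allocations would be rounded to whole numbers of observations.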

5 Median Estimation (The ".4n – 2" rule)

Sometimes we want to find a confidence interval for the median value of our data. The ".4n – 2" rule provides a quick way to do this.
1. First, calculate the following number:
.4n − 2
where n is the number of observations in the sample.
2. Round the number you obtained in Step 1 to the nearest integer, and call this number r.
Then, use the rth-smallest observation in your data set as the lower bound, and the rth-largest observation in your data set as the upper bound, and you will have a confidence interval of approximately 95%.
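A quick Python sketch of the rule, using an invented data set:

data = [3, 7, 8, 12, 13, 14, 18, 21, 22, 25, 27, 30, 31, 35, 40]  # made-up sample
n = len(data)

r = round(0.4 * n - 2)   # the ".4n - 2" rule
ordered = sorted(data)

lower = ordered[r - 1]   # r-th smallest observation
upper = ordered[n - r]   # r-th largest observation
print(lower, upper)      # approximate 95% confidence interval for the median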

6 Outliers
The easiest way to check for outliers is to see if they fall below the lower fence or above the upper fence.
1. Calculate the interquartile range (IQR), which is the difference between the third quartile (Q3) of the data and the first quartile (Q1) of the data. 50% of your data lies in this range.
2. The lower fence is Q1 − 1.5(IQR).
3. The upper fence is Q3 + 1.5(IQR).
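A short Python sketch of the fence check, with a made-up data set; note that different quartile conventions give slightly different fences:

import statistics

data = [5, 7, 8, 9, 10, 11, 12, 13, 14, 40]  # made-up sample with one suspect value

q1, _, q3 = statistics.quantiles(data, n=4)  # first and third quartiles
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(lower_fence, upper_fence, outliers)  # 40 falls above the upper fence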

7 Hypothesis Testing
Be familiar with the terminology:
• H0: the null hypothesis. This is what we are assuming to be true.
• H1: the alternate hypothesis. This is the other possibility, if we find enough evidence to reject the null hypothesis.
• Rejection Region: If our sample estimate falls in this region, we have enough evidence to reject the null hypothesis.
• Critical Value: Determines the rejection region; it is based on a certain level of confidence.
• P-value: For a sample, the smallest error level α at which we could still reject the null hypothesis (which corresponds to the largest level of confidence we could have), given the data.

7.1 Which test?

• If the null hypothesis is of the form μ > c, use a lower-tail test (reject H0 if X̄ is too small).
• If the null hypothesis is of the form μ < c, use an upper-tail test (reject H0 if X̄ is too large).
• If the null hypothesis is of the form μ = c, use a two-sided test (reject H0 if X̄ is too small or too large).

7.2 Finding the Critical Value/P-value

To find the critical value that we will use to define our rejection region, simply find the upper-tail area α associated with the given level of confidence in a statistical table, and reference the statistical number (z-number, t-number, etc.) that corresponds with that α.
We can then rearrange our confidence interval formula to solve for k, the statistical number:
k = \frac{\hat{\theta} - \theta}{s / \sqrt{n}}
where θ̂ is the estimated parameter, θ is the "true" parameter (which we are assuming has the value given by the null hypothesis), s is the sample standard deviation, and n is the number of observations in the sample. If this number is in the rejection region (given by the critical value), we will reject the null hypothesis.
So, if we were estimating the mean, and if our sample had a z-distribution, we could find the z-number of our sample by
z = \frac{\bar{X} - \mu}{s / \sqrt{n}}
We could then compare this with the critical number to see if it falls in the rejection region. This process would be similar for estimating the mean with a t-distribution.
The p-value is the smallest α that we can have while still rejecting our null hypothesis (i.e., still making a statistically significant conclusion). To solve for it, we simply need to find the upper-tail α that corresponds with the statistical number
from our sample. We find this the same way we did when we were comparing our sample to the critical number.
For example, if we used the formula
z = \frac{\bar{X} - \mu}{s / \sqrt{n}}
and found our z-number to be 1.28, we could just look this up in our tables, and the percentage of data in the upper tail (α) would be our p-value. The process is similar for a t-distribution.
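As a minimal Python sketch of this calculation (the numbers are invented), for an upper-tail test of H0: μ = 50 against H1: μ > 50:

import math
from scipy import stats

x_bar, mu_0, s, n = 51.2, 50.0, 4.0, 36   # hypothetical sample summary

z = (x_bar - mu_0) / (s / math.sqrt(n))   # test statistic
p_value = stats.norm.sf(z)                # area in the upper tail beyond z

print(z, p_value)  # z is about 1.8, p-value about 0.036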

8 Bins
Here's the table from the book, which tells us the recommended number of bins based on different sample sizes:

Sample Size        Number of Bins
Fewer than 50      5-7
50 to 100          7-8
101 to 500         8-10
501 to 1000        10-11
1001 to 5000       11-14
More than 5000     14-20
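If it is useful to have this lookup in code form, here is a small helper that mirrors the table (the function name is just illustrative):

def recommended_bins(sample_size):
    # Recommended range of histogram bins for a given sample size, per the table above.
    if sample_size < 50:
        return (5, 7)
    if sample_size <= 100:
        return (7, 8)
    if sample_size <= 500:
        return (8, 10)
    if sample_size <= 1000:
        return (10, 11)
    if sample_size <= 5000:
        return (11, 14)
    return (14, 20)

print(recommended_bins(250))  # (8, 10)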
