Discrete random variable – a variable that has a countable number of possible values
Continuous random variable – a variable that can take any of the infinitely many values in an interval of numbers
Probability distribution – an assignment of probabilities to the values of a random variable.
A discrete probability distribution has to satisfy two criteria:
The probability of each value of X is between 0 and 1: 0 ≤ P(X) ≤ 1
The probabilities of all X values add up to 1: ΣP(X) = 1
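The two criteria can be checked mechanically. The following is a minimal sketch; the function name and the two example lists are hypothetical illustrations, not part of the original notes.

```python
import math

def is_valid_discrete_distribution(probabilities, tolerance=1e-9):
    """Check the two criteria for a discrete probability distribution."""
    # Criterion 1: every probability lies between 0 and 1 inclusive.
    each_in_range = all(0.0 <= p <= 1.0 for p in probabilities)
    # Criterion 2: the probabilities add up to 1 (allowing for rounding error).
    sums_to_one = math.isclose(sum(probabilities), 1.0, abs_tol=tolerance)
    return each_in_range and sums_to_one

# Hypothetical example: number of heads in two fair coin tosses (X = 0, 1, 2).
coin_toss = [0.25, 0.50, 0.25]
# Hypothetical counterexample: adds up to 1 but contains a negative "probability".
not_a_distribution = [0.5, 0.6, -0.1]

print(is_valid_discrete_distribution(coin_toss))           # True
print(is_valid_discrete_distribution(not_a_distribution))  # False
```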
Random – every member of the population has an equal chance of being chosen.
Systematic – list the entire population, pick a random starting point, then select every nth member
Stratified – the population is split into groups, then a random sample is taken from each group
Cluster – the population is split into groups, then one or more whole groups are chosen at random
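The four sampling methods can be sketched in a few lines each. This is only an illustration under assumed inputs: the population of 100 ID numbers, the two groups "A" and "B", and all function names are hypothetical.

```python
import random

def simple_random_sample(population, n, rng):
    # Every member has the same chance of being chosen.
    return rng.sample(population, n)

def systematic_sample(population, step, rng):
    # Random starting point among the first `step` members, then every step-th member.
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(strata, n_per_group, rng):
    # `strata` maps a group label to its members; draw a random sample from EVERY group.
    return {label: rng.sample(members, n_per_group) for label, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    # Choose whole groups at random and keep all members of the chosen groups.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {label: clusters[label] for label in chosen}

rng = random.Random(0)                  # fixed seed so the sketch is repeatable
population = list(range(1, 101))        # hypothetical population of 100 ID numbers
groups = {"A": population[:50], "B": population[50:]}  # hypothetical groups

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(groups, 5, rng)
cluster = cluster_sample(groups, 1, rng)
```

Note the difference between the last two: a stratified sample contains some members from every group, while a cluster sample contains every member of some groups.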
Variance – the average squared deviation of the data from the mean
Standard Deviation – the typical (roughly average) deviation of the data from the mean
*The SD is the square root of variance.
Sample Standard Deviation = s, the SD of a sample of n observations taken from a population.
Population Standard Deviation = σ (sigma), the SD of the entire population, which is usually unknown.
Sample Variance = s²
Population Variance = σ²
Sample Mean or x̄ (xbar) – the average of a variable X in a sample. It is an estimate of the population mean.
Population Mean = μ (mu)
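As a rough illustration of the difference between the sample and population versions, Python's standard `statistics` module has one function for each. The data list below is a made-up example; the sample versions divide by n − 1 and the population versions divide by n.

```python
import statistics

data = [4, 8, 6, 5, 3, 7]  # hypothetical observations

x_bar = statistics.mean(data)        # sample mean
s2 = statistics.variance(data)       # sample variance s² (divides by n - 1)
s = statistics.stdev(data)           # sample SD s = sqrt(s²)
sigma2 = statistics.pvariance(data)  # population variance σ² (divides by n)
sigma = statistics.pstdev(data)      # population SD σ = sqrt(σ²)

print(x_bar, s2, sigma2)
```

The sample variance is always a little larger than the population variance computed from the same numbers, because it divides the same sum of squared deviations by n − 1 instead of n.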
Mean & Standard Deviation of a Sample Mean (also called the Sampling Distribution of the Mean):
Let x̄ be the mean of a random sample of size n from a population having mean μ and SD σ. Then:
The mean of all possible values of x̄ is μ_x̄ = μ
This says that the mean of the sample mean is the same as the population mean.
The SD (standard error) of all possible values of x̄ is σ_x̄ = σ/√n
Example: Let X be the height of 15-year-old boys in the US. Studies show that the heights of 15-year-old boys in the US are normally
distributed with an average height of 67 inches and an SD of 2.5 inches. A random experiment consists of choosing sixteen
15-year-old boys at random. Find the mean and SD of x̄, that is, the mean and SD of the average height of a random
sample of 16 boys.
Solution:
The mean of the sample means is the same as the population mean: μ_x̄ = μ = 67
The SD of the sample means is the population SD divided by the square root of the sample size: σ_x̄ = σ/√n =
2.5/√16 = 0.625
*Notice that the mean of a sample mean is always the same as the mean of the population, but the SD is smaller.
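The height example can be reproduced with a small helper. This is a sketch; the function name is a hypothetical choice and the inputs are the values from the example.

```python
import math

def sampling_distribution_of_mean(mu, sigma, n):
    """Return (mean of x-bar, standard error of x-bar) for samples of size n."""
    # The mean of x-bar equals the population mean; its SD shrinks by sqrt(n).
    return mu, sigma / math.sqrt(n)

mean_of_xbar, se_of_xbar = sampling_distribution_of_mean(mu=67, sigma=2.5, n=16)
print(mean_of_xbar, se_of_xbar)  # 67 0.625
```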
Sampling Distribution of a Sample Mean – if a population is distributed N(μ, σ), then the sample mean x̄ of n
independent observations has the N(μ, σ/√n) distribution.
Draw a random sample of size n from any population having population mean μ and finite SD σ. When n is large
(n ≥ 30), the sampling distribution of x̄ is approximately normal: x̄ ~ N(μ, σ/√n)
Central Limit Theorem – guarantees that the sample mean will be approximately normally distributed when the sample size is large
(usually 30 or more), no matter what shape the population distribution has, as long as the population SD is finite.
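A quick simulation can make the theorem concrete. The sketch below assumes a strongly skewed population (an exponential distribution with mean 1 and SD 1, chosen only for illustration) and draws many samples of size 30; the seed, the number of repetitions, and the function names are arbitrary choices.

```python
import random
import statistics

def draw_exponential(rng):
    # One observation from a skewed population with mean 1 and SD 1.
    return rng.expovariate(1.0)

def simulate_sample_means(draw, n, repetitions, rng):
    # Take `repetitions` samples of size n and record the mean of each sample.
    return [
        statistics.fmean([draw(rng) for _ in range(n)])
        for _ in range(repetitions)
    ]

rng = random.Random(42)  # fixed seed so the run is repeatable
sample_means = simulate_sample_means(draw_exponential, n=30, repetitions=5000, rng=rng)

mean_of_means = statistics.fmean(sample_means)  # should be close to mu = 1
sd_of_means = statistics.stdev(sample_means)    # should be close to sigma/sqrt(n) = 1/sqrt(30)
print(round(mean_of_means, 3), round(sd_of_means, 3))
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is far from normal.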
Calculator Steps: Press Apps, scroll down to STATS/LIST EDITOR, press Enter, then press F5 (Distr), scroll down to 4
(Normal CDF), press Enter, then enter the numbers.
Solution: By the CLT, the mean of the sampling distribution (μ_x̄) equals the mean of the population, μ = 18.07. The SD of the
sampling distribution by the CLT is σ_x̄ = σ/√n = 5.27/√25 = 1.054
Solution: The population mean is 18.07. This is smaller than the median of 19. Therefore the distribution is negatively skewed;
the mean is pulled in the direction of the outliers.
C) Which of the graphs correspond to the distribution of the population, distribution of a single sample and the sampling
distribution of the mean?
Solution: By the CLT, the sampling distribution of the mean is bell-shaped, which is graph 3. Graph 1 has only 25 observations, so it is the distribution of a
single sample. The distribution of the population is graph 2, because it is the only one left.
D) Find the probability that for next term’s class they have a sample mean of more than 20.
Solution: x̄ is approximately normally distributed with mean 18.07 and SD 5.27/√25 = 1.054, so find P(x̄ > 20).
Drawing & shading the sampling distribution: Excel: P(x̄ > 20) = 1-NORM.DIST(20, 18.07, 5.27/SQRT(25), TRUE) = 0.0335.
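For readers without Excel, the same upper-tail probability can be sketched with Python's standard library. `NormalDist(...).cdf(x)` plays the role of NORM.DIST(x, mean, SD, TRUE); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 18.07, 5.27, 25           # values from the example
xbar_dist = NormalDist(mu, sigma / sqrt(n))  # sampling distribution of x-bar

# Upper tail: P(x-bar > 20) = 1 - P(x-bar <= 20)
p_more_than_20 = 1 - xbar_dist.cdf(20)
print(round(p_more_than_20, 4))  # about 0.0335
```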
Sampling Distribution of a Sample Mean – if a population is distributed N(μ, σ), then the sample mean x̄ of n
independent observations has the N(μ, σ/√n) distribution.
Examples:
Let X be the height of 15-year-old boys in the US. Studies show that the heights of 15-year-old boys in the US are
normally distributed with an average height of 67 inches and an SD of 2.5 inches. A random experiment consists of
randomly choosing sixteen 15-year-old boys. Find the probability that the mean height of those sampled is at least 69.5 inches.
Solution: First find the distribution of the sample mean: x̄ is N(67, 0.625), since 2.5/√16 = 0.625
P(x̄ ≥ 69.5) = P((x̄ − μ)/σ_x̄ ≥ (69.5 − μ)/σ_x̄) = P((x̄ − 67)/0.625 ≥ (69.5 − 67)/0.625)
= P(z ≥ 4) = 0.00003
Calculator Steps: Press Apps, scroll down to STATS/LIST EDITOR, press Enter, then press F5 (Distr), scroll down to 4
(Normal CDF), press Enter, then enter the numbers.
Z-Score (CLT) = (x̄ − μ) / (σ/√n) – use when the question is about the mean of a group (a sample), not one individual
P(X ≤ x) or P(X < x) – less than, below: =NORM.DIST(x, mean, SD, TRUE)
P(x1 < X < x2) or P(x1 ≤ X ≤ x2) – between: =NORM.DIST(x2, mean, SD, TRUE) - NORM.DIST(x1, mean, SD, TRUE)
P(X ≥ x) or P(X > x) – greater than, top, more: =1-NORM.DIST(x, mean, SD, TRUE)
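The three Excel patterns map onto three small Python helpers. This is a sketch using the standard library; the helper names are hypothetical, and for a sample mean the `sd` argument would be the standard error SD/√n.

```python
from statistics import NormalDist

def p_less_than(x, mean, sd):
    # Same as =NORM.DIST(x, mean, sd, TRUE)
    return NormalDist(mean, sd).cdf(x)

def p_between(x1, x2, mean, sd):
    # Same as =NORM.DIST(x2, mean, sd, TRUE) - NORM.DIST(x1, mean, sd, TRUE)
    dist = NormalDist(mean, sd)
    return dist.cdf(x2) - dist.cdf(x1)

def p_greater_than(x, mean, sd):
    # Same as =1-NORM.DIST(x, mean, sd, TRUE)
    return 1 - NormalDist(mean, sd).cdf(x)

# The height example: x-bar is N(67, 0.625), so P(x-bar >= 69.5) = P(z >= 4).
p_tall_sample = p_greater_than(69.5, 67, 0.625)
print(p_tall_sample)  # roughly 0.00003
```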
Example #2
The average teacher’s salary in Connecticut is $57,337. Suppose that the distribution of salaries is normal with a SD of
$7500.
A) What is the probability that a randomly selected teacher makes less than $55,000 per year?
Z = (X − mean)/SD = (55,000 − 57,337)/7500 = -0.3116, so P(X < 55,000) = P(z < -0.3116) ≈ 0.378, or use the calculator with upper and lower limits.
*Note: this is the probability for one person only, not for a group of people.
B) If we sample 100 teachers’ salaries, what is the probability that the sample mean is less than $55,000?
Solution: Find P(x̄ < 55000). This is a group, so we use the CLT z-score formula: z = (x̄ − mean)/(SD/√n)
Z = (55000 − 57337)/(7500/√100) = -3.116. Using Excel to simplify:
=NORM.DIST(55000, 57337, 7500/SQRT(100), TRUE) = 0.0009167
*If the z-score is less than -3 or greater than 3, it would be a rare event.
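The contrast between parts A and B (one teacher versus the mean of 100 teachers) can be sketched side by side. The variable names are illustrative; the only change between the two parts is the SD used.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 57337, 7500  # population mean and SD from the example
cutoff = 55000

# Part A: one randomly selected teacher, so use the population SD.
z_single = (cutoff - mu) / sigma
p_single = NormalDist(mu, sigma).cdf(cutoff)

# Part B: the mean of n = 100 teachers, so use the standard error SD/sqrt(n).
n = 100
standard_error = sigma / sqrt(n)
z_mean = (cutoff - mu) / standard_error
p_mean = NormalDist(mu, standard_error).cdf(cutoff)

print(round(z_single, 4), round(p_single, 3))  # -0.3116 and about 0.378
print(round(z_mean, 3), round(p_mean, 7))      # -3.116 and about 0.0009167
```

A single salary below $55,000 is quite common, but a sample mean that low for 100 teachers would be a rare event.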
Confidence Intervals:
Point Estimate for a Parameter – a single statistic (such as the sample mean x̄) used to estimate the parameter.
A 100(1 − α)% confidence interval has this chance of producing an interval containing the true value of the parameter.
Common confidence levels – 90% (α = 0.10 or 10%), 95% (α = 0.05 or 5%), 99% (α = 0.01 or 1%)
Confidence level (level of confidence) = 1 − α
Alpha (α) – represents the probability that the parameter’s actual value is not captured in the interval; it is the
probability of being incorrect.
**When a symmetric distribution, such as a normal distribution, is used, confidence intervals are ALWAYS of the form:
point estimate ± margin of error
Margin of error – defines the “radius” of the interval necessary to obtain the desired confidence level. It depends on the
desired confidence level.
**Higher levels of confidence come at a cost: larger margins of error and therefore less precise estimates**
Critical Value – determines the number of SDs (standard errors) in the margin of error, based on the desired α level.
Confidence intervals – use 90% (α = 0.10 or 10%), 95% (α = 0.05 or 5%), 99% (α = 0.01 or 1%)
Normal Sampling Distribution – use α = 0.05 (95% confidence)
Applet - http://www.rossmanchance.com/applets/ConfSim.html
Example #1: Use Excel or your calculator to find the critical value z_α/2 for a 95% confidence interval.
Solution: =NORM.INV(lower tail area, mean, SD). For the positive z-score, the lower tail area is 1 − α/2 = 1 − 0.05/2 = 0.975
Excel: =NORM.INV(0.975, 0, 1) gives z_α/2 = 1.96
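The same lookup can be sketched in Python, where `NormalDist().inv_cdf(area)` corresponds to NORM.INV(area, 0, 1). The helper name `z_critical` is a hypothetical choice.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Positive critical value z_(alpha/2) for a given confidence level."""
    alpha = 1 - confidence_level
    # Lower tail area for the positive critical value is 1 - alpha/2.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))  # 1.645, 1.96, 2.576
```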
Example #2: State the statistical and real world interpretations of the following confidence intervals.
A) Suppose a 95% confidence interval for the mean age at which a woman got married in 2013 is
26 < μ < 28
Solution: Statistical Interpretation: There is a 95% chance that the interval 26 < μ < 28 contains the mean
age at which a woman got married in 2013.
Real World Interpretation: The mean age at which a woman got married in 2013 is between 26 and 28 years.
B) Suppose a 99% confidence interval for the proportion of Americans who have tried marijuana as of 2017 is
0.55 < p < 0.61
Solution: Statistical: There is a 99% chance that the interval 0.55 < p < 0.61 contains the proportion of
Americans who have tried marijuana as of 2017.
Real World: The proportion of Americans who have tried marijuana as of 2017 is between 0.55 and 0.61.
A higher level of confidence makes a wider interval: as the confidence level increases, the interval becomes wider.
Sample Size for Estimating a Mean – n = (z_α/2 * SD / E)^2, where E is the margin of error, rounded up to the next whole number *use this when you want to make an inference about a mean
Example #1: A researcher is interested in estimating the average salary of teachers. She wants to be 95% confident that
her estimate is correct. In a previous study, she found the population SD was $1175. How large a sample is needed to be
accurate within $100?
Solution: First, find z_α/2 for 95% confidence. Excel: z_α/2 = 1.96. Then n = (1.96 * 1175 / 100)^2 = 530.38, so round up to n = 531.
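The sample-size formula for a mean, applied to the teacher salary example, can be sketched as follows. The function name is hypothetical; the result is rounded up because a fraction of an observation cannot be sampled and rounding down would miss the target margin of error.

```python
import math

def sample_size_for_mean(z, sigma, margin_of_error):
    """n = (z * sigma / E)^2, rounded up to the next whole number."""
    return math.ceil((z * sigma / margin_of_error) ** 2)

# z = 1.96 (95% confidence), SD = $1175, E = $100
n_needed = sample_size_for_mean(z=1.96, sigma=1175, margin_of_error=100)
print(n_needed)  # 531
```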
Sample Size for Estimating a Proportion – n = p * q * (z_α/2 / E)^2, where q = 1 − p and E is the margin of error, rounded up to the next whole number *use this when you want to make an inference about a
proportion
Example #2: A study found that 73% of pre-K children ages 3 to 5 whose mothers had a bachelor’s degree or higher were
enrolled in early childhood care and education programs.
A) How large a sample is needed to estimate the true proportion within 3% with 95% confidence?
B) How large a sample is needed if you had no prior knowledge of the proportion?
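Both parts can be worked with the sample-size formula for a proportion. The sketch below assumes z = 1.96 for 95% confidence and, for part B, the usual conservative choice p = 0.5 (the value at which p * q is largest) when nothing is known about the proportion; the function name is hypothetical.

```python
import math

def sample_size_for_proportion(z, margin_of_error, p=0.5):
    """n = p * q * (z / E)^2, rounded up; p defaults to 0.5 when no prior estimate exists."""
    q = 1 - p
    return math.ceil(p * q * (z / margin_of_error) ** 2)

# Part A: prior estimate p = 0.73, E = 0.03
n_with_prior = sample_size_for_proportion(z=1.96, margin_of_error=0.03, p=0.73)
# Part B: no prior knowledge, so assume p = 0.5
n_no_prior = sample_size_for_proportion(z=1.96, margin_of_error=0.03)

print(n_with_prior, n_no_prior)  # 842 1068
```

Without a prior estimate the required sample is larger, which is the price of assuming the worst case.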
*Choose a simple random sample of size n from a population having unknown population proportion p.
The 100(1 − α)% confidence interval estimate for p is given by p̂ ± z_α/2 * √(p̂ * q̂ / n), where p̂ = x/n (# of
successes / # of trials) and q̂ = 1 − p̂ (the complement).
1. State the random variable and the parameter in words. X = # of successes, p = proportion of successes
C) To determine the sampling distribution of p̂, you need to show that n * p̂ ≥ 5 and n * q̂ ≥ 5
5. Real World: This is where you state what interval contains the true proportion.
Example #1: A concern was raised in Australia that the percentage of deaths of Aboriginal prisoners was higher than the
percentage of deaths of non-Aboriginal prisoners, which is 0.27%. A sample of six years (1990-1995) of data was collected,
and it was found that out of 14,495 Aboriginal prisoners, 51 died (“Indigenous death in,” 1996). Find a 95% confidence interval
for the proportion of Aboriginal prisoners who died.
4. Statistical: There is a 95% chance that 0.002554 < p < 0.004482 contains the proportion of Aboriginal
prisoners who died.
5. Real World: The proportion of Aboriginal prisoners who died is between 0.0026 and 0.0045.
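The interval quoted in step 4 can be reproduced from the formula p̂ ± z_α/2 * √(p̂ * q̂ / n). This is a sketch with a hypothetical function name; it also checks the "at least 5 successes and 5 failures" condition before using the normal approximation.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence_level):
    """Normal-approximation confidence interval for a population proportion."""
    failures = n - successes
    # Condition for the normal approximation: n*p-hat >= 5 and n*q-hat >= 5.
    if successes < 5 or failures < 5:
        raise ValueError("need at least 5 successes and 5 failures")
    p_hat = successes / n
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)  # z_(alpha/2)
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

# 51 deaths out of 14,495 prisoners, 95% confidence
lower, upper = proportion_confidence_interval(51, 14495, 0.95)
print(round(lower, 6), round(upper, 6))  # about 0.002554 and 0.004482
```

Since the whole interval lies above 0.0027 only in part, this interval alone does not settle the original concern; it just reports the plausible range for the proportion.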
Example #2: A researcher studying the effects of income levels on breastfeeding of infants hypothesizes that countries
where the income level is lower have a higher rate of infant breastfeeding than higher-income countries. It is known that
in Germany, considered a high-income country by the World Bank, 22% of all babies are breastfed. In Tajikistan,
considered a low-income country by the World Bank, researchers found that in a random sample of 500 new mothers,
125 were breastfeeding their infants. Find a 90% confidence interval for the proportion of mothers in low-income
countries who breastfed their infants.
Solution: 1. X = # of women who breastfeed in a low-income country
p = proportion of women who breastfeed in a low-income country
2. x = 125 and n − x = 500 − 125 = 375. Both are at least 5, so the sampling distribution of p̂ is approximately normal.
3. Excel: p̂ = 125/500 = 0.25 and the margin of error is 1.645 * √(0.25 * 0.75/500) ≈ 0.032, so 0.218 < p < 0.282
4. Statistical: There is a 90% chance that 0.218 < p < 0.282 contains the proportion of women in low-income
countries who breastfeed their infants.
5. Real World: The proportion of women in low-income countries who breastfeed their infants is between 0.218
and 0.282.
Example #3: A local county has a very active adult education venue. A random sample of the population showed that
189 out of 400 persons 16 years old or older participated in some type of formal adult education activities, such as basic
skills training, apprenticeships, personal interest courses, and part-time college or university degree programs. Estimate
the true proportion of adults participating in some kind of formal education program with 98% confidence.
Example #1: Suppose we select a random sample of 100 pennies in circulation in order to estimate the average age of all
pennies that are still in circulation. The sample average, in years, was found to be Xbar = 14.6. For the sake of this
example, let us assume that the population SD of ages is 4 years. Find a 95% confidence interval for the true average age
of pennies that are still in circulation.
Solution: Excel: =NORM.INV(lower tail area, mean, SD), so =NORM.INV(0.975, 0, 1) = 1.96
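With the critical value in hand, the interval itself follows the "point estimate ± margin of error" form. The sketch below assumes the usual z-interval for a mean when the population SD is known, x̄ ± z_α/2 * σ/√n; that formula is not written out in these notes, so treat it as an assumption.

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n = 14.6, 4, 100       # sample mean, assumed population SD, sample size

z = NormalDist().inv_cdf(0.975)      # same as =NORM.INV(0.975, 0, 1), about 1.96
margin = z * sigma / sqrt(n)         # margin of error (assumed z-interval formula)
lower, upper = x_bar - margin, x_bar + margin

print(round(lower, 3), round(upper, 3))  # about 13.816 and 15.384
```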
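Example #1: Find the critical values −t_α/2 and +t_α/2 for a 90% confidence interval with n = 10.
Solution: df = n − 1 = 9 and the lower tail area is 1 − 0.10/2 = 0.95, so =T.INV(0.95, 9) = 1.833, giving critical values of ±1.833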
Example #2: The rates of home ownership are normally distributed. A random sample of 8 states is listed below.
Estimate the population mean for the rate of home ownership with 99% confidence.
Solution: First, find the t critical value using df = n − 1 = 7 and 99% confidence. The lower tail area is 1 − (0.01/2) = 0.995, so t_α/2 = T.INV
(0.995, 7) = 3.4995
x̄ ± t_α/2, n−1 * (s/√n) = 69.7125 ± 3.4995 * (4.4483/√8) = (64.2088, 75.2162)
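The arithmetic of the t-interval can be checked with a short sketch. Python's standard library has no t-distribution, so the critical value 3.4995 is taken from the T.INV result above rather than computed; the function name is hypothetical, and the sample mean and SD are the summary values quoted in the solution.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_critical):
    """Confidence interval x-bar +/- t * s / sqrt(n), with t supplied by the caller."""
    margin = t_critical * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Summary values from the home ownership example; t from =T.INV(0.995, 7)
lower, upper = t_interval(x_bar=69.7125, s=4.4483, n=8, t_critical=3.4995)
print(round(lower, 4), round(upper, 4))  # about 64.2088 and 75.2162
```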
EXCEL directions: Type in the data, select the Data Analysis tool under the Data tab, select Descriptive Statistics, click OK,
enter the input range, and check Summary Statistics and Confidence Level for Mean under the options. Find the lower limit by taking
mean − confidence level and the upper limit by taking mean + confidence level (the “Confidence Level” value Excel reports is the margin of error).