
STAT 241 – Unit 5 Notes

Discrete random variable – a variable that has a countable number of possible values
Continuous random variable – a variable that has an infinite number of possible values within an interval of numbers
Probability distribution – an assignment of probabilities to the values of the random variable.
A discrete probability distribution has to satisfy two criteria:
The probability of each value of X is between 0 and 1 (0 ≤ P(X) ≤ 1)
The probabilities of all X values add to 1 (ΣP(X) = 1)
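
As a quick check of the two criteria, here is a minimal Python sketch. The helper name is_valid_discrete_distribution and the coin-flip distribution are illustrative assumptions, not part of the course material.

```python
import math

def is_valid_discrete_distribution(probabilities, tol=1e-9):
    """Return True if every probability is in [0, 1] and they sum to 1."""
    probs = list(probabilities)
    each_in_range = all(0.0 <= p <= 1.0 for p in probs)
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tol)
    return each_in_range and sums_to_one

# Hypothetical example: number of heads in two fair coin flips.
coin_flips = {0: 0.25, 1: 0.50, 2: 0.25}
print(is_valid_discrete_distribution(coin_flips.values()))  # True
print(is_valid_discrete_distribution([0.5, 0.6]))           # False (sums to 1.1)
```
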
Random – every member of the population has an equal chance of being chosen.
Systematic – list the entire population, then pick every nth member, starting from a randomly chosen point
Stratified – the population is split into groups, then a random sample is taken from each group
Cluster – the population is split into groups, and then one or more whole groups are chosen
Variance – the average squared deviation of the data from the mean
Standard deviation (SD) – a measure of the typical deviation of the data from the mean
*The SD is the square root of the variance.
Sample standard deviation = s, the SD of a sample of n observations taken from a population.
Population standard deviation = σ (sigma), the SD of the entire population, which is usually unknown.
Sample variance = s²
Population variance = σ²

*A smaller SD or variance indicates more consistency in the data.
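
The difference between the sample and population versions can be seen with Python's standard statistics module. This is only a sketch, and the data set is made up for illustration.

```python
import statistics

# Hypothetical data set (not from the notes).
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_variance = statistics.pvariance(data)     # divides by N (whole population)
pop_sd = statistics.pstdev(data)              # square root of population variance
sample_variance = statistics.variance(data)   # divides by n - 1 (sample)
sample_sd = statistics.stdev(data)            # square root of sample variance

print("population:", pop_variance, pop_sd)
print("sample:", round(sample_variance, 4), round(sample_sd, 4))
```

The sample versions come out slightly larger because they divide by n - 1 rather than n.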


*If two or more datasets have about the same mean and scale (unit), then it is reasonable to compare their SDs or
variances to compare consistency.
*If the datasets have different scales or means, it is not appropriate to compare SD or variance.

Sample mean (xbar) – the average of a variable X in a sample. It is an estimate of the population mean.

Population mean = μ (mu)

The distribution of xbar is called the sampling distribution of xbar.

Mean & Standard Deviation of a Sample Mean (also called the sampling distribution of the mean)

• Let xbar be the mean of a random sample of size n from a population having mean μ and SD σ. Then
the mean of all possible values of xbar is μx = μ.
• This says that the mean of the sample mean is the same as the population mean.
• The SD (standard error) of all possible values of xbar is σx = σ/sqrt(n).

Central Limit Theorem Applet: http://onlinestatbook.com/stat_sim/sampling_dist/index.html

Draw and Shade Applet: http://homepage.divms.uiowa.edu/~mbognar/applets/normal.html

Example: Let X be the height of 15-year-old boys in the US. Studies show that the heights of 15-year-old boys in the US
are normally distributed with an average height of 67 inches and a SD of 2.5 inches. A random experiment consists of
choosing 16 15-year-old boys at random. Find the mean and SD of xbar, that is, the mean and SD of the average height
of a random sample of 16 boys.

Solution:

The mean of the sample means is the same as the population mean: μx = μ = 67

The SD of the sample means is the population SD divided by the square root of the sample size: σx = σ/sqrt(n) =
2.5/sqrt(16) = 0.625
*Notice that the mean of a sample mean is always the same as the mean of the population, but the SD is smaller.
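
The calculation above can be reproduced with a short Python sketch. The function name sampling_distribution_of_mean is a hypothetical helper chosen for illustration.

```python
import math

def sampling_distribution_of_mean(pop_mean, pop_sd, n):
    """Return (mean, SD) of the sample mean for random samples of size n."""
    return pop_mean, pop_sd / math.sqrt(n)

# Heights of 15-year-old boys: population mean 67, SD 2.5, sample size 16.
mean_xbar, sd_xbar = sampling_distribution_of_mean(pop_mean=67, pop_sd=2.5, n=16)
print(mean_xbar, sd_xbar)  # 67 0.625
```
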

Sampling Distribution of a Sample Mean – if a population is distributed N(mean, SD), then the sample mean (xbar) of n
independent observations has the N(mean, SD/sqrt(n)) distribution.

Central Limit Theorem

Draw a random sample of size n from any population having population mean μ and finite SD σ. When n is large
(n ≥ 30), the sampling distribution of xbar is approximately normal: xbar ~ N(mean, SD/sqrt(n))

Central Limit Theorem – guarantees that the sample mean will be approximately normally distributed when the sample
size is large (usually 30 or more), no matter what shape the population distribution has.

Z-Score = (xbar – mean) / (SD/sqrt(n))

Using calculator = normalcdf (lower limit, upper limit, mean, SD)

• On the TI-89, use –infinity as the lower limit for "less than" probabilities

Calculator Steps: Press Apps, scroll down to STATS/LIST EDITOR, press Enter, then press F5 (Distr) and scroll down to 4
(Normal CDF), press Enter, then enter the numbers.

3 types of population distributions: uniform, exponential, log-normal

Each population distribution has corresponding sampling distributions for sample sizes 2, 5, 12, and 30


Example: The population of midterm scores for all students taking a PSU Business Statistics course has a known SD of
5.27. The mean of the population is 18.07 and the median of the population is 19. A sample of 25 was taken and the
sample mean was 18.07. We want to know what the sampling distribution of the mean looks like.

Here are 3 graphs using the Sampling Distribution Applet.

A) What is the mean and SD of the sampling distribution?

Solution: By the CLT, the mean of the sampling distribution (μx) equals the mean of the population: μx = μ = 18.07.
The SD of the sampling distribution is σx = SD/sqrt(n) = 5.27/sqrt(25) = 1.054

B) Would you expect midterm exam scores to be skewed or bell-shaped?

Solution: The population mean is 18.07, which is smaller than the median of 19. Therefore the distribution is
negatively skewed: the mean is pulled in the direction of the outliers.

C) Which of the graphs correspond to the distribution of the population, distribution of a single sample and the sampling
distribution of the mean?

Solution: By the CLT, the sampling distribution of the mean is bell-shaped, which is graph 3. Graph 1 has only 25
observations, so it is the distribution of a single sample. The distribution of the population is graph 2, the one that
remains.

D) Find the probability that for next term’s class they have a sample mean of more than 20.

Solution: xbar is normally distributed with mean = 18.07 and SD = 5.27/sqrt(25) = 1.054, and we want P(xbar > 20).

Drawing & shading the sampling distribution: Excel: P(xbar > 20) = 1-NORM.DIST(20, 18.07, 5.27/sqrt(25), true) = 0.0335.
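
The same probability can be checked outside Excel. The sketch below uses NormalDist from Python's standard statistics module as a stand-in for NORM.DIST; the variable names are illustrative.

```python
import math
from statistics import NormalDist

pop_mean, pop_sd, n = 18.07, 5.27, 25

# Sampling distribution of xbar: N(mean, SD/sqrt(n)).
xbar_dist = NormalDist(mu=pop_mean, sigma=pop_sd / math.sqrt(n))

# P(xbar > 20) is one minus the lower-tail (cumulative) probability.
p_above_20 = 1 - xbar_dist.cdf(20)
print(round(p_above_20, 4))
```
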

Examples:
Let X be the height of 15-year-old boys in the US. Studies show that the heights of 15-year-old boys in the US are
normally distributed with an average height of 67 inches and a SD of 2.5 inches. A random experiment consists of
randomly choosing 16 15-year-old boys. Find the probability that the mean height of those sampled is at least 69.5 inches.

Solution: 1st, find the sampling distribution of the sample mean: xbar is N(67, 0.625), since 2.5/sqrt(16) = 0.625.

2nd, find the probability that xbar is at least 69.5 inches.

P(xbar ≥ 69.5) = P((xbar – 67)/0.625 ≥ (69.5 – 67)/0.625)

= P(z ≥ 4) = 0.00003
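
A minimal Python sketch of the two steps (standardize, then look up the upper-tail area), using the standard library's NormalDist in place of a z-table; the variable names are illustrative.

```python
import math
from statistics import NormalDist

pop_mean, pop_sd, n = 67, 2.5, 16

standard_error = pop_sd / math.sqrt(n)       # SD of xbar
z = (69.5 - pop_mean) / standard_error       # z-score of the sample mean
p_at_least = 1 - NormalDist().cdf(z)         # upper-tail area beyond z

print(z, p_at_least)
```
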

Z-Score = (x – mean) / SD - use when there is only one person

Using calculator = normalcdf (lower limit, upper limit, mean, SD)

• On the TI-89, use –infinity as the lower limit for "less than" probabilities

Calculator Steps: Press Apps, scroll down to STATS/LIST EDITOR, press Enter, then press F5 (Distr) and scroll down to 4
(Normal CDF), press Enter, then enter the numbers.

Z-Score (CLT) = (xbar – mean) / (SD/sqrt(n)) - use when it's a group of people

Excel =NORM.DIST(X, mean, SD/sqrt(sample size n), true)

P(X ≤ x) or P(X < x) – less than, below - =NORM.DIST(x, mean, SD, true)
P(x1 < X < x2) or P(x1 ≤ X ≤ x2) – between - =NORM.DIST(x2, mean, SD, true) – NORM.DIST(x1, mean, SD, true)
P(X ≥ x) or P(X > x) – greater than, top, more - =1-NORM.DIST(x, mean, SD, true)
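
For readers working outside Excel, the three patterns can be sketched in Python. The function names prob_below, prob_between, and prob_above are hypothetical helpers; NormalDist(...).cdf plays the role of NORM.DIST(..., true).

```python
from statistics import NormalDist

def prob_below(x, mean, sd):
    """P(X < x), like =NORM.DIST(x, mean, sd, TRUE)."""
    return NormalDist(mean, sd).cdf(x)

def prob_between(x1, x2, mean, sd):
    """P(x1 < X < x2): difference of two lower-tail areas."""
    dist = NormalDist(mean, sd)
    return dist.cdf(x2) - dist.cdf(x1)

def prob_above(x, mean, sd):
    """P(X > x): one minus the lower-tail area."""
    return 1 - NormalDist(mean, sd).cdf(x)

# Illustration on the standard normal distribution (mean 0, SD 1).
print(prob_below(0, 0, 1))
print(round(prob_between(-1.96, 1.96, 0, 1), 3))
print(round(prob_above(1.96, 0, 1), 3))
```

For a sample mean, pass SD/sqrt(n) as the sd argument, as in the Excel formula above.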

Example #2

The average teacher’s salary in Connecticut is $57,337. Suppose that the distribution of salaries is normal with a SD of
$7500.

A) What is the probability that a randomly selected teacher makes less than $55,000 per year?

Solution: Find P(X < 55,000) using the Z-score

Z = (X – mean) / SD = (55,000 – 57,337) / 7500 = -0.3116, or use the calculator with upper and lower limits.

=normalcdf (-infinity, 55000, 57337, 7500) = 0.3777

*Note that this is the probability for only one person, not a group of people

B) If we sample 100 teachers' salaries, what is the probability that the sample mean is less than $55,000?

Solution: Find P(xbar < 55000). This is a group, so we use the CLT Z-score formula = (xbar – mean) / (SD/sqrt(n))
Z = (55000 – 57337) / (7500/sqrt(100)) = -3.116, or use Excel to simplify:
=NORM.DIST(55000, 57337, 7500/sqrt(100), true) = 0.0009167
*If the z-score is less than -3 or greater than 3, it would be a rare event
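
The contrast between parts A and B (one teacher versus the mean of 100 teachers) can be sketched in Python with the standard library; variable names are illustrative.

```python
import math
from statistics import NormalDist

pop_mean, pop_sd = 57337, 7500

# Part A: a single teacher, so use the population SD directly.
p_single = NormalDist(pop_mean, pop_sd).cdf(55000)

# Part B: the mean of n = 100 teachers, so use SD/sqrt(n).
n = 100
p_group = NormalDist(pop_mean, pop_sd / math.sqrt(n)).cdf(55000)

print(round(p_single, 4), round(p_group, 7))
```

The only change between the two parts is the SD: dividing by sqrt(n) makes the same $55,000 cutoff about ten times further from the mean in standard-deviation units.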

Confidence Intervals:
Point estimate for a parameter – a statistic used to estimate the parameter.
A 100(1 – α)% confidence interval has this chance of producing an interval containing the true value of the parameter.
Common confidence levels – 90% (α = 0.10 or 10%), 95% (α = 0.05 or 5%), 99% (α = 0.01 or 1%)
Confidence level (level of confidence) = 1 – α
Alpha (α) – represents the probability that the parameter's actual value is not captured in the interval; it is the
probability of being incorrect.

**When a symmetric distribution, such as a normal distribution, is used, confidence intervals are ALWAYS of the form:
point estimate ± margin of error

Margin of error – defines the "radius" of the interval necessary to obtain the desired confidence level. It depends on the
desired confidence level.

**Higher levels of confidence come at a cost: larger margins of error and less precise estimates**
Critical value – the number of SDs on either side of the point estimate needed for the desired α level.

Confidence intervals – use 90% (α = 0.10 or 10%), 95% (α = 0.05 or 5%), 99% (α = 0.01 or 1%)
Normal sampling distribution – use α = 0.05 (95% confidence)

Two critical values: –Zα/2 and +Zα/2

If the sample size is small (n < 30), the population being sampled must be normal.

If the sample size is large (n ≥ 30), the CLT guarantees the sampling distribution is approximately normal.
Zα/2 = NORM.S.INV(1-(α/2))

EXCEL = NORM.INV(lower tail area, mean, SD)


P(X ≤ x) or P(X < x) – lower, bottom, below, less than - =NORM.INV(area, mean, SD)
P(x1 < X < x2) or P(x1 ≤ X ≤ x2) – between - =NORM.INV(1-area/2, mean, SD), where area is the total area in the two tails
P(X ≥ x) or P(X > x) – upper, top, more than - =NORM.INV(1-area, mean, SD)

Applet - http://www.rossmanchance.com/applets/ConfSim.html

Example #1: Use Excel or your calculator to find the critical values Zα/2 for a 95% confidence interval.
Solution: =NORM.INV(lower tail area, mean, SD). For the positive z-score, the lower tail area is 1 – α/2 = 0.975
Excel: =NORM.INV(0.975, 0, 1) gives Zα/2 = 1.96
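
The same critical values can be found in Python, where NormalDist().inv_cdf plays the role of NORM.S.INV. This is a sketch; z_critical is a hypothetical helper name.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Positive critical value Z(alpha/2) for a two-sided confidence interval."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
```

The negative critical value is the same number with a minus sign, because the standard normal curve is symmetric about 0.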

Example #2: State the statistical and real world interpretations of the following confidence intervals.

A) Suppose a 95% confidence interval for the mean age at which women got married in 2013 is
26 < μ < 28

Solution: Statistical Interpretation: There is a 95% chance that the interval 26 < μ < 28 contains the mean
age at which women got married in 2013.
Real World Interpretation: The mean age at which women got married in 2013 is between 26 and 28 years of age.

B) Suppose a 99% confidence interval for the proportion of Americans who have tried marijuana as of 2017 is
0.55 < p < 0.61

Solution: Statistical: There is a 99% chance that the interval 0.55 < p < 0.61 contains the proportion of
Americans who have tried marijuana as of 2017.

Real World: The proportion of Americans who have tried marijuana as of 2017 is between 0.55 and 0.61

Effect of Confidence Level on Width:

A higher level of confidence makes a wider interval. As the confidence level increases, the interval becomes wider.

Sample Size for a Desired Margin of Error:

Sample Size for Estimating a Mean – n = (Zα/2 * SD / E)^2, rounded up to the next whole number *use this when you want inference for a mean

Example #1: A researcher is interested in estimating the average salary of teachers. She wants to be 95% confident that
her estimate is correct. In a previous study, she found the population SD was $1175. How large a sample is needed to be
accurate within $100?

Solution: 1st – find Zα/2 for 95% confidence; from Excel, Zα/2 = 1.96

n = ((1.96 * 1175) / 100)^2 = 530.38, which rounds up to 531
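
A small Python sketch of this formula, assuming the rounded critical value 1.96 used in the notes; sample_size_for_mean is a hypothetical helper name.

```python
import math

def sample_size_for_mean(z, sd, margin_of_error):
    """n = (z * SD / E)^2, always rounded UP to the next whole observation."""
    return math.ceil((z * sd / margin_of_error) ** 2)

# Teacher salary example: 95% confidence, SD = $1175, E = $100.
n_needed = sample_size_for_mean(z=1.96, sd=1175, margin_of_error=100)
print(n_needed)  # 531
```

Rounding up rather than to the nearest integer matters: a sample of 530 would leave the margin of error slightly above $100.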

Sample Size for Estimating a Proportion – n = p * q * (Zα/2 / E)^2, where q = 1 – p *use this when you want inference for a
proportion

Example #2: A study found that 73% of pre-K children ages 3 to 5 whose mothers had a bachelor's degree or higher were
enrolled in early childhood care and education programs.

A) How large a sample is needed to estimate the true proportion within 3% with 95% confidence?

Solution: n = 0.73 * 0.27 * (1.96 / 0.03)^2 = 841.3104, which rounds up to 842

B) How large a sample is needed if you had no prior knowledge of the proportion?

Solution: With no prior estimate, use p = q = 0.5: n = 0.5 * 0.5 * (1.96 / 0.03)^2 = 1067.1111, which rounds up to 1068
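
Both parts can be reproduced with a short Python sketch, again assuming the rounded critical value 1.96; sample_size_for_proportion is a hypothetical helper name, and p defaults to 0.5 for the no-prior-knowledge case.

```python
import math

def sample_size_for_proportion(z, margin_of_error, p=0.5):
    """n = p * q * (z / E)^2, rounded up; p = 0.5 gives the largest n."""
    q = 1 - p
    return math.ceil(p * q * (z / margin_of_error) ** 2)

n_with_prior = sample_size_for_proportion(z=1.96, margin_of_error=0.03, p=0.73)
n_no_prior = sample_size_for_proportion(z=1.96, margin_of_error=0.03)
print(n_with_prior, n_no_prior)  # 842 1068
```
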


Z-Interval for a Proportion (p):

Confidence interval for one population proportion (1-Prop Z-Interval)

*Choose a simple random sample of size n from a population having unknown population proportion p

The 100(1 – α)% confidence interval estimate for p is given by phat ± Zα/2 * sqrt(phat*qhat/n), where phat = x/n (# of
successes / # of trials) and qhat = 1 – phat (the complement)

Steps for the Confidence Interval:

1. State the random variable and the parameter in words. X = # of successes, p = proportion of successes

2. State and check the assumptions for the confidence interval

A) A simple random sample of size n is taken

B) The conditions for the binomial distribution are satisfied

C) To determine the sampling distribution of phat, you need to show that n*phat ≥ 5 and n*qhat ≥ 5

3. Find the sample statistic and the confidence interval

Sample proportion: phat = x/n

Confidence interval: phat ± Zα/2 * sqrt(phat*qhat/n)

4. Statistical interpretation: in general this looks like

There is a 100(1 – α)% chance that phat – Zα/2 * sqrt(phat*qhat/n) < p < phat + Zα/2 * sqrt(phat*qhat/n)
contains the true proportion.

5. Real World: This is where you state what interval contains the true proportion.

Example #1: A concern was raised in Australia that the percentage of deaths of Aboriginal prisoners was higher than the
percentage of deaths of non-Aboriginal prisoners, which is 0.27%. A sample of six years (1990-1995) of data was collected,
and it showed that out of 14,495 Aboriginal prisoners, 51 died (“indigenous death in,” 1996). Find a 95% confidence interval
for the proportion of Aboriginal prisoners who died.

Solution: 1. State the random variable & parameter in words

X = # of Aboriginal prisoners who died, p = proportion of Aboriginal prisoners who died

2. State & check the assumptions for a confidence interval

C) x = 51 and n – x = 14495 – 51 = 14444. Both are greater than 5, so the sampling distribution of phat is approximately normal.

3. Sample proportion: phat = x/n = 51/14495 = 0.003518

Confidence interval: Zα/2 = 1.96, since the confidence level is 95%
Margin of error = Zα/2 * sqrt(phat*qhat/n) = 1.96 * sqrt(0.003518(1 – 0.003518)/14495) = 0.000964
phat – E < p < phat + E = 0.003518 – 0.000964 < p < 0.003518 + 0.000964 = 0.002554 < p < 0.004482

4. Statistical: There is a 95% chance that 0.002554 < p < 0.004482 contains the proportion of Aboriginal
prisoners who died.
5. Real World: The proportion of Aboriginal prisoners who died is between 0.0026 and 0.0045.
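
Steps 2C and 3 can be sketched in Python as follows. The function name proportion_z_interval is hypothetical, and the critical value comes from NormalDist rather than a rounded 1.96, so the last digit can differ slightly from hand calculations.

```python
import math
from statistics import NormalDist

def proportion_z_interval(successes, n, confidence_level):
    """Return (lower, upper) for a one-proportion z-interval."""
    # Assumption check from step 2C: at least 5 successes and 5 failures.
    if successes < 5 or n - successes < 5:
        raise ValueError("need at least 5 successes and 5 failures")
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Example #1: 51 deaths out of 14,495 prisoners, 95% confidence.
lower, upper = proportion_z_interval(successes=51, n=14495, confidence_level=0.95)
print(round(lower, 6), round(upper, 6))
```

The same function applies to any of the proportion examples in these notes by changing the three arguments.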

Example #2: A researcher studying the effects of income levels on breastfeeding of infants hypothesizes that countries
where the income level is lower have a higher rate of infant breastfeeding than higher income countries. It is known that
in Germany, considered a high income country by the World Bank, 22% of all babies are breastfed. In Tajikistan,
considered a low income country by the World Bank, researchers found that in a random sample of 500 new mothers,
125 were breastfeeding their infants. Find a 90% confidence interval for the proportion of mothers in low income
countries who breastfeed their infants.
Solution: 1. X = # of women who breastfeed in a low income country
p = proportion of women who breastfeed in a low income country

2. x = 125 & n – x = 500 – 125 = 375. Both are greater than 5, so the sampling distribution is approximately normal.
3. Using Excel: phat = 125/500 = 0.25, and the interval is 0.218 < p < 0.282
4. Statistical: There is a 90% chance that 0.218 < p < 0.282 contains the proportion of women in low income
countries who breastfeed their infants.
5. Real World: The proportion of women in low income countries who breastfeed their infants is between 0.218
and 0.282

Example #3: A local county has a very active adult education venue. A random sample of the population showed that
189 out of 400 persons 16 years old or older participated in some type of formal adult education activities, such as basic
skills training, apprenticeships, personal interest courses, and part-time college or university degree programs. Estimate
the true proportion of adults participating in some kind of formal education program with 98% confidence.

Solution: 1. X = # of adults participating in a formal education program
p = proportion of adults participating in a formal education program
2. x = 189 and n – x = 400 – 189 = 211. Both are greater than 5.
3. Using Excel: phat = 189/400 = 0.4725, and the interval is 0.414 < p < 0.531
4. Statistical: There is a 98% chance that 0.414 < p < 0.531 contains the proportion of adults participating in formal
education programs.
5. Real World: The proportion of adults participating in formal education programs is between 41% and 53%.

T-Interval for a Mean Using Excel

Confidence interval for a population mean (population SD known) – Xbar ± Zα/2 * (SD/sqrt(n))

Point estimate = Xbar
Margin of error = Zα/2 * (SD/sqrt(n))

Example #1: Suppose we select a random sample of 100 pennies in circulation in order to estimate the average age of all
pennies that are still in circulation. The sample average, in years, was found to be Xbar = 14.6. For the sake of this
example, let us assume that the population SD of ages is 4 years. Find a 95% confidence interval for the true average age
of pennies that are still in circulation.

Solution: Excel =NORM.INV(lower tail area, mean, SD) =NORM.INV(0.975, 0, 1) = 1.96

Xbar ± Zα/2 * (SD/sqrt(n))
= 14.6 ± 1.96 * (4/sqrt(100))
= 14.6 ± 0.784
= (13.816, 15.384) or 13.816 < μ < 15.384
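
The penny example can be checked with a short Python sketch. mean_z_interval is a hypothetical helper; it assumes the population SD is known, as the example does.

```python
import math
from statistics import NormalDist

def mean_z_interval(xbar, pop_sd, n, confidence_level):
    """Return (lower, upper) for a mean when the population SD is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * pop_sd / math.sqrt(n)
    return xbar - margin, xbar + margin

# 100 pennies, sample mean 14.6 years, assumed population SD 4 years.
lower, upper = mean_z_interval(xbar=14.6, pop_sd=4, n=100, confidence_level=0.95)
print(round(lower, 3), round(upper, 3))
```
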

T distribution – another symmetric distribution for a continuous random variable.

Properties:
1. Symmetric, bell shaped
2. Centered at 0: mean μ = median = mode = 0
3. The spread of a t distribution is determined by the degrees of freedom (df), which equal the
sample size – 1
4. As the degrees of freedom increase, the t distribution approaches the standard normal curve.
5. The total area under the curve is equal to 1 or 100%

DF (degrees of freedom) = n – 1

T Interval for a Mean:

Population mean where SD is unknown = Xbar ± tα/2, n-1 * (s/sqrt(n))
Point estimate = Xbar
Margin of error = tα/2, n-1 * (s/sqrt(n))
EXCEL =T.INV(lower tail area, DF)

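Example #1: Find the critical values –tα/2 and +tα/2 for a 90% confidence interval with n = 10.
Solution: DF = n – 1 = 9 and the lower tail area is 1 – 0.10/2 = 0.95, so =T.INV(0.95, 9) = 1.833. The critical values are –1.833 and +1.833.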

Example #2: The rates of home ownership are normally distributed. A random sample of 8 states is listed below.
Estimate the population mean for the rate of home ownership with 99% confidence.

66.0, 75.8, 70.9, 73.9, 63.4, 68.5, 73.3, 65.9

Solution: 1st, find the t critical value using DF = n – 1 = 7 and 99% confidence: the lower tail area is
1 – (0.01/2) = 0.995, so tα/2 =T.INV(0.995, 7) = 3.4995

2nd, find the sample mean and sample SD: Xbar = 69.7125 and s = 4.4483

Xbar ± tα/2, n-1 * (s/sqrt(n)) = 69.7125 ± 3.4995 * (4.4483/sqrt(8)) = (64.2088, 75.2162)

EXCEL directions: Type in the data, select the Data Analysis tool under the Data tab, select Descriptive Statistics, click OK,
enter the range, and check Summary Statistics and Confidence Level for Mean under the options. Find the lower limit by taking
mean – confidence level and the upper limit by taking mean + confidence level
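
The home ownership interval can be reproduced with Python's standard library. This is a sketch: the standard library has no inverse t distribution, so the critical value 3.4995 is copied from the T.INV result in the notes rather than computed.

```python
import math
import statistics

rates = [66.0, 75.8, 70.9, 73.9, 63.4, 68.5, 73.3, 65.9]

# Assumption: t critical value for 99% confidence, df = 7, taken from
# the notes (=T.INV(0.995, 7)); it is not computed here.
t_critical = 3.4995

n = len(rates)
xbar = statistics.mean(rates)     # sample mean
s = statistics.stdev(rates)       # sample SD (divides by n - 1)
margin = t_critical * s / math.sqrt(n)
lower, upper = xbar - margin, xbar + margin

print(round(xbar, 4), round(s, 4))
print(round(lower, 4), round(upper, 4))
```
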
