
Sampling distribution, interval estimation and hypothesis testing (Chapters 7, 8 and 9)

Review
The numerical characteristic of a sample is called a statistic.
The numerical characteristic of a population is called a parameter.
We now want to estimate an UNKNOWN population parameter with a KNOWN sample statistic.
Examples
We use the sample ...          To estimate the population ...
mean \(\bar{x}\)               mean \(\mu\)
standard deviation \(s\)       standard deviation \(\sigma\)
proportion \(\bar{p}\)         proportion \(p\)

Simple random sample (finite population): a sample of size n selected from a finite population of size N such that each possible sample of size n has the same probability of being selected. The random numbers used to select it can be generated by a computer. Read on sampling without replacement and sampling with replacement.

Point estimators
How do we estimate population parameters from sample statistics? To estimate the population mean \(\mu\) from the sample mean \(\bar{x}\), we follow these steps:
1) Take a simple random sample of size n from a population of size N.
2) Calculate the sample mean \(\bar{x}\).
3) \(\bar{x}\) becomes the point estimator of \(\mu\).
Similarly, s is the point estimator of \(\sigma\) and \(\bar{p}\) is the point estimator of p.
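A minimal Python sketch of drawing a simple random sample from a finite population, assuming the population is just a list of unit labels 1 to N. The labels, N = 100, n = 5 and the seeds are hypothetical, chosen only for illustration. `random.sample` draws without replacement and gives every subset of size n the same chance; `random.choices` draws with replacement, so a unit can be picked more than once.

```python
import random


def simple_random_sample(population, n, replace=False, seed=None):
    """Draw a sample of size n from a finite population (a list).

    replace=False: sampling without replacement; every subset of size n
    is equally likely and no unit can appear twice.
    replace=True: sampling with replacement; a unit may appear repeatedly.
    """
    rng = random.Random(seed)  # separate generator so results are reproducible
    if replace:
        return rng.choices(population, k=n)
    return rng.sample(population, n)


# Hypothetical sampling frame: units labelled 1..N
N, n = 100, 5
frame = list(range(1, N + 1))

print(simple_random_sample(frame, n, seed=1))                # no repeats
print(simple_random_sample(frame, n, replace=True, seed=1))  # repeats possible
```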

Example: A simple random sample of 5 months of sales data provided the following information:
Month    1     2    3    4    5
Sales   94   100   85   94   92
Develop point estimates of the population mean and standard deviation.
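The arithmetic for this example can be checked with Python's standard `statistics` module. Note that `stdev` divides by n − 1, which is the usual sample standard deviation s.

```python
from statistics import mean, stdev

sales = [94, 100, 85, 94, 92]  # months 1 to 5

x_bar = mean(sales)  # point estimate of the population mean
s = stdev(sales)     # point estimate of the population standard deviation (n - 1 divisor)

print(f"x_bar = {x_bar}, s = {s:.3f}")
```

Here \(\bar{x} = 465/5 = 93\) and \(s = \sqrt{116/4} = \sqrt{29} \approx 5.385\).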

Sampling distribution
The point estimators are random variables (we don't know their values until the sample is taken), so they have a distribution. The distribution of a sample statistic such as \(\bar{p}\), \(\bar{x}\) or s is called its sampling distribution. Knowing the sampling distribution of a sample statistic and its properties enables us to make probability statements about how close the sample statistic is to the population parameter.
For the sampling distribution of the sample mean \(\bar{x}\), we care about 3 issues:
a) The mean of \(\bar{x}\)
The mean (expected value) of the sample mean is written as \(E(\bar{x}) = \mu\). We say that \(\bar{x}\) is an unbiased estimator of \(\mu\).
b) The standard deviation of \(\bar{x}\)
This is written as \(\sigma_{\bar{x}}\) and is also called the standard error of \(\bar{x}\).
- If the population is infinite (large), or the sample is at most 5% of the population (\(n/N \le 0.05\)), then the standard error is \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\).
- If the population is finite (small), that is \(n/N > 0.05\), then the standard error is \(\sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1}}\,\frac{\sigma}{\sqrt{n}}\).
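The two standard-error cases can be folded into one small Python function. This is a sketch: the 5% cutoff is the rule of thumb stated in this section, the finite population correction is \(\sqrt{(N-n)/(N-1)}\), and the values in the usage lines (\(\sigma = 10\), n = 25, N = 1000 or 100) are hypothetical.

```python
import math


def standard_error_mean(sigma, n, N=None):
    """Standard error of the sample mean x_bar.

    N=None means an infinite (very large) population. The finite population
    correction sqrt((N - n) / (N - 1)) is applied only when n / N > 0.05.
    """
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se


# Hypothetical values: sigma = 10, n = 25
print(standard_error_mean(10, 25))          # infinite population
print(standard_error_mean(10, 25, N=1000))  # n/N = 0.025, no correction
print(standard_error_mean(10, 25, N=100))   # n/N = 0.25, correction applied
```

The correction factor is always below 1, so ignoring it for a small population would overstate the standard error.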

c) The shape
(i) If the population has a normal or approximately normal distribution, then \(\bar{x}\) is also normally or approximately normally distributed.
(ii) If the population is NOT normally distributed, we use the central limit theorem (CLT) to identify the shape of the sampling distribution of \(\bar{x}\). The CLT states that, when selecting random samples of size n from a population, the sampling distribution of the sample mean \(\bar{x}\) becomes approximately normal as the sample size n increases (usually when \(n \ge 30\)).
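A quick simulation can illustrate the CLT. The sketch below assumes an exponential population with mean 1 and standard deviation 1 (strongly right-skewed, so clearly not normal), an arbitrary choice for illustration. It draws 5000 samples of size n = 30 and summarizes the resulting sample means: their mean should be close to 1, their standard deviation close to \(\sigma/\sqrt{n} = 1/\sqrt{30} \approx 0.183\), and a histogram of them would look roughly bell-shaped.

```python
import random
from statistics import mean, pstdev


def simulate_sample_means(n, reps, seed=0):
    """Return `reps` sample means, each computed from n draws of an
    exponential population with mean 1 and standard deviation 1."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]


n = 30
means = simulate_sample_means(n=n, reps=5000)

print(f"mean of the sample means: {mean(means):.3f} (population mean = 1)")
print(f"sd of the sample means:   {pstdev(means):.3f} (sigma / sqrt(n) = {1 / n ** 0.5:.3f})")

# Share of sample means within 1.96 standard errors of 1; near 0.95 if x_bar is close to normal
se = 1 / n ** 0.5
print(f"within 1.96 SE: {sum(abs(m - 1) <= 1.96 * se for m in means) / len(means):.3f}")
```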

Example: Suppose you have a population of 4 scores: 2, 3, 4 and 5. We draw all possible samples of size 2 (with replacement) from this population. Because the population is small, we can work out the sampling distribution of \(\bar{x}\) exactly by listing every possible sample and its mean.
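Listing the samples by hand is the intended exercise; this Python sketch does the same enumeration so you can check your table. It uses `itertools.product` to generate all 4 × 4 = 16 ordered samples of size 2 with replacement. Since sampling is with replacement, no finite population correction applies, so the standard deviation of \(\bar{x}\) should equal \(\sigma/\sqrt{2}\).

```python
from collections import Counter
from itertools import product
from statistics import mean, pstdev

population = [2, 3, 4, 5]

# All ordered samples of size 2 drawn with replacement: 4 * 4 = 16
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]

# Sampling distribution of x_bar: each sample has probability 1/16
dist = Counter(sample_means)
for value in sorted(dist):
    print(f"x_bar = {value}: P = {dist[value]}/{len(samples)}")

print("mean of x_bar:", mean(sample_means), "| population mean:", mean(population))
print("sd of x_bar:", pstdev(sample_means), "| sigma / sqrt(2):", pstdev(population) / 2 ** 0.5)
```

The mean of \(\bar{x}\) equals \(\mu = 3.5\), which illustrates that \(\bar{x}\) is unbiased.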

Standardizing the sample mean, \(\bar{x}\)
If \(\bar{x}\) is normal or approximately normal, we can standardize it as follows:
\(z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}\)

Example Automobile service time at the garage takes an average of 120 minutes with a standard deviation of 24 minutes. Assuming the population is normal, compute the standard error if the sample size is (i) 9 automobiles

(ii) 36 automobiles

(iii) 64 automobiles

(iv) What is the relationship between the standard error and the sample size?

(v) Using a sample size of 9:
- What is the probability that the sample mean service time will be over 135 minutes?
- What is the probability that the sampling error, \(|\bar{x} - \mu|\), is within 18 minutes?
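A Python sketch of parts (i) to (v), using `statistics.NormalDist` from the standard library (Python 3.8+) for normal probabilities. It reads part (v) as being about the sample mean \(\bar{x}\) of 9 services, and "sampling error within 18 minutes" as \(|\bar{x} - \mu| \le 18\).

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 120, 24  # minutes; population assumed normal
z = NormalDist()     # standard normal distribution

# (i)-(iii): standard error for several sample sizes
for n in (9, 36, 64):
    print(f"n = {n:2d}: standard error = {sigma / sqrt(n):.1f} minutes")

# (v) with n = 9
se = sigma / sqrt(9)
p_over_135 = 1 - z.cdf((135 - mu) / se)         # P(x_bar > 135)
p_within_18 = z.cdf(18 / se) - z.cdf(-18 / se)  # P(|x_bar - mu| <= 18)

print(f"P(x_bar > 135) = {p_over_135:.4f}")
print(f"P(|x_bar - mu| <= 18) = {p_within_18:.4f}")
```

The standard errors are 8, 4 and 3 minutes: because the standard error is \(\sigma/\sqrt{n}\), quadrupling the sample size halves it. Exact normal probabilities may differ slightly in the last digit from answers read off a z table.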

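Sampling distribution of the sample proportion, \(\bar{p}\)

\(\bar{p}\) is the point estimator of p:
\(\bar{p} = \frac{x}{n}\)
where x is the number of elements in the sample that have the characteristic of interest. In estimating p, we assume x is binomial, so \(\bar{p}\) is a binomial count divided by n. Again, we are interested in the mean, standard deviation and shape of \(\bar{p}\).
(i) Mean: \(E(\bar{p}) = p\), so \(\bar{p}\) is an unbiased estimator of p.
(ii) Standard deviation (standard error): \(\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}}\) when \(n/N \le 0.05\); for a finite population with \(n/N > 0.05\), multiply by \(\sqrt{\frac{N-n}{N-1}}\).
(iii) Shape: \(\bar{p}\) is approximately normal when \(np \ge 5\) and \(n(1-p) \ge 5\).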
Example
Government reports indicate that 30% of companies in the US are owned by women. We take a sample of 250 companies.
(i) Is \(\bar{p}\) approximately normally distributed?
(ii) What is the probability that more than a quarter of the sampled companies are owned by women?
(iii) Find the probability that the sample proportion is within 0.06 (6 percentage points) of the population proportion.
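A Python sketch of this example, using \(\sigma_{\bar{p}} = \sqrt{p(1-p)/n}\) and the usual normal-approximation check (\(np \ge 5\) and \(n(1-p) \ge 5\)). It omits the finite population correction, assuming the number of US companies is far larger than 250, and it uses no continuity correction.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.30, 250
z = NormalDist()  # standard normal distribution

# (i) normal approximation is reasonable if np >= 5 and n(1 - p) >= 5
is_normal = n * p >= 5 and n * (1 - p) >= 5
print(f"np = {n * p:.0f}, n(1 - p) = {n * (1 - p):.0f}, approximately normal: {is_normal}")

se = sqrt(p * (1 - p) / n)  # standard error of p_bar

# (ii) P(p_bar > 0.25)
p_more_than_quarter = 1 - z.cdf((0.25 - p) / se)

# (iii) P(|p_bar - p| <= 0.06)
p_within = z.cdf(0.06 / se) - z.cdf(-0.06 / se)

print(f"standard error = {se:.4f}")
print(f"P(p_bar > 0.25) = {p_more_than_quarter:.4f}")
print(f"P(|p_bar - p| <= 0.06) = {p_within:.4f}")
```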

What are the properties of point estimators?
- Unbiased: the expected value of the sample statistic equals the population parameter.
- Efficient: it has a small standard error.
- Consistent: the estimator tends to get closer to the population parameter as the sample size increases.

Confidence intervals of the population mean (Chapter 8)
In point estimation, we compute a single statistic value to estimate the population parameter. We now want to estimate the population mean, \(\mu\), using an interval estimate. An interval estimate is more informative than a point estimate because it also conveys how precise the estimate is.
Confidence interval (CI) = point estimate ± margin of error
For \(\mu\): CI = \(\bar{x}\) ± margin of error
For p: CI = \(\bar{p}\) ± margin of error
(A) Estimating the CI of \(\mu\) when \(\sigma\) is KNOWN: use z
Margin of error = critical value × standard error = \(z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\), so CI = \(\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\).
Assumptions of this CI:
- We have a simple random sample.
- \(\sigma\) is known.
- The population is infinite and approximately normal (or n is large enough, usually \(n \ge 30\), for the CLT to apply).
Example 1
The average age of 49 randomly selected people is 42. The population standard deviation is 12.6 years. Give the 90%, 95% and 99% confidence intervals of the population mean.
Example 2
A simple random sample of 60 items resulted in a sample mean of 80. The population standard deviation is 15.
- Compute the 95% CI for the population mean.
- Suppose the sample size increased to 120. What is the effect on the interval estimate?
Example 3
We want to estimate the average starting salary at a top law firm to within $1000. How large should our sample be if preliminary studies indicate that \(\sigma\) of the salaries is $8500? Use a 94% CI.
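A Python sketch of the \(\sigma\)-known interval \(\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}\) applied to Examples 1 and 2, plus the standard sample-size formula \(n = (z_{\alpha/2}\,\sigma/E)^2\), rounded up, where E is the desired margin of error, applied to Example 3. The critical values come from `NormalDist().inv_cdf` rather than a printed table, so the last digit may differ slightly from answers using table values such as 1.645, 1.96 and 2.576.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence):
    """CI for mu when sigma is known: x_bar +/- z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_crit * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


def required_n(sigma, margin, confidence):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin (rounded up)."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z_crit * sigma / margin) ** 2)


# Example 1
for conf in (0.90, 0.95, 0.99):
    lo, hi = z_interval(42, 12.6, 49, conf)
    print(f"{conf:.0%} CI: ({lo:.2f}, {hi:.2f})")

# Example 2: n = 60 versus n = 120
print("n = 60: ", z_interval(80, 15, 60, 0.95))
print("n = 120:", z_interval(80, 15, 120, 0.95))

# Example 3
print("required n:", required_n(8500, 1000, 0.94))
```

Doubling n from 60 to 120 shrinks the margin of error by a factor of \(\sqrt{2}\), so the interval becomes narrower.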

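(B) Estimating the CI of \(\mu\) when \(\sigma\) is UNKNOWN: use the t statistic
If \(\sigma\) is unknown, use the sample to get \(\bar{x}\) and s. In this case, the standardized sample mean \(t = \frac{\bar{x} - \mu}{s/\sqrt{n}}\) follows a t distribution with n − 1 degrees of freedom, NOT the standard normal distribution (assuming the population is approximately normal). The features of the t distribution are:
- Roughly bell-shaped, just like z.
- Symmetrical, but with heavier tails than z.
- The larger the degrees of freedom, the closer t is to z.
The values of t are determined by (i) the degrees of freedom, df = n − 1, and (ii) the significance level (the upper-tail area).
Example 1
Find \(t_{0.01,12}\), \(t_{0.05,41}\) and \(t_{0.025,23}\).
The CI using t:
\(\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\)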

Example 2
A sample of 40 items has a standard deviation of 11 and a mean of 70. Find the 95% confidence interval for the population mean.
Example 3
A random sample of yogurts has the following calorie counts: 75, 81, 103, 93, 71, 86. Give the 99% CI for the average calorie content of all yogurts if calories are normally distributed.
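Python's standard library has no t distribution, so this sketch computes t critical values itself: it integrates the t density with Simpson's rule and inverts the CDF by bisection. That numerical approach is an assumption chosen to keep the block self-contained; in practice a t table or a library such as SciPy would be used. The block reproduces the table look-ups in Example 1 and the intervals in Examples 2 and 3.

```python
import math
from statistics import mean, stdev


def t_cdf(x, df, steps=1000):
    """P(T <= x) for Student's t with df degrees of freedom.

    Integrates the density from 0 to |x| with Simpson's rule (steps must be
    even) and uses the symmetry of t around 0.
    """
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    area = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_crit(upper_tail, df):
    """t value with area `upper_tail` to its right, i.e. t_{upper_tail, df}."""
    lo, hi = 0.0, 50.0
    for _ in range(50):  # bisection on the upper-tail area
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > upper_tail:
            lo = mid  # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(x_bar, s, n, confidence):
    """CI for mu when sigma is unknown: x_bar +/- t_{alpha/2, n-1} * s / sqrt(n)."""
    alpha = 1 - confidence
    margin = t_crit(alpha / 2, n - 1) * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Example 1: table look-ups
for tail, df in [(0.01, 12), (0.05, 41), (0.025, 23)]:
    print(f"t_({tail}, {df}) = {t_crit(tail, df):.3f}")

# Example 2: summary statistics only
print("Example 2:", t_interval(70, 11, 40, 0.95))

# Example 3: raw data
calories = [75, 81, 103, 93, 71, 86]
print("Example 3:", t_interval(mean(calories), stdev(calories), len(calories), 0.99))
```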

Chapter 9: HYPOTHESIS TESTING, TYPE I AND II ERRORS AND USE OF P-VALUES
Big Picture: In hypothesis testing, we use sample statistics to decide which of two statements about a population/model parameter or property is the more "reasonable". These hypotheses are logically complementary, so one of the two must be true and the other false.
(i) Benchmark or null hypothesis (H0). This is an assumption to be challenged. It represents our default position on the issue in question, barring strong evidence to the contrary. It is the status quo: "take no action", "things are as we expected", "the product meets its claimed specification". We assume this hypothesis holds.
H0: what we assume to be true unless shown otherwise.
- Always uses =, ≤ or ≥.
(ii) Alternative or research hypothesis (Ha). It represents "there is a new trend", "do things differently", "different thinking is required", "the manufacturer's claim is not supported". This is the hypothesis we want to test.
Ha: what we are trying to prove.
- We gather sample data to provide evidence that it is true.
- This is the more important hypothesis.
Example
H0: A person is not guilty
Ha: A person is guilty
Note that H0 and Ha are complementary.

We have 3 possibilities:
Left tail:    H0: \(\mu \ge \mu_0\)    Ha: \(\mu < \mu_0\)
Right tail:   H0: \(\mu \le \mu_0\)    Ha: \(\mu > \mu_0\)
Two tail:     H0: \(\mu = \mu_0\)      Ha: \(\mu \ne \mu_0\)

Thus the aim in hypothesis testing is to decide whether or not the sample data are consistent with what we would expect to see (the status quo, or current assumption) if the null hypothesis were true. If the answer is no, we say that there is a statistically significant difference between what the null hypothesis claims and what the sample seems to indicate. We then reject H0, since the observed discrepancy is too large to be explained by chance (sampling error) alone. Which way we go depends on the level of significance of the test, whether it is compared with critical values of the test statistic or with the p-value of the test. Explain the level of significance (denoted \(\alpha\)).

Type I and type II errors


As with confidence interval procedures, hypothesis testing includes measures of reliability: the probabilities of making one of the 2 possible wrong decisions (errors). Here is a chart of the 4 decision possibilities:

                    H0 is true    H0 is false
Reject H0               1              3
Don't reject H0         2              4

Example: H0: A person is innocent; Ha: A person is not innocent (is guilty).
a) What are the Type I and Type II errors?
b) Having decided on a rejection rule, what is the probability of each of the 4 outcomes above? 1.  2.  3.  4.

What is a p-value? How is it used in decision making?

Why don't we always just set \(\alpha\) (the significance level) to be very, very low?

To find the p-value of the test, and make the decision, we turn the relevant sample statistic(s) into a test statistic whose probability distribution we know (z, t, \(\chi^2\) or F).

We use the corresponding tables to get critical values and tail probabilities (p-values) for these distributions. Here is a useful, if somewhat arbitrary, table of how to use the p-value to gauge significance:

p-value               Evidence against H0
p > 0.10              Weak or none (not significant)
0.05 < p ≤ 0.10       Moderate
0.01 < p ≤ 0.05       Strong
p ≤ 0.01              Very strong (very significant)

In summary, we have 2 decision rules:
(i) P-value approach:

- If the computed p-value (observed significance level) < the stated significance level \(\alpha\), the result is statistically significant: reject H0. The choice of \(\alpha\) reflects how much risk of a Type I error you can tolerate; rejecting a true H0 could lead to an incorrect and expensive business decision.
- If the computed p-value ≥ \(\alpha\), fail to reject H0 (statistically insignificant).
(ii) The critical value approach:
- If the computed test statistic is more extreme than the critical value from the tables (it falls in the rejection region, in the direction of Ha), reject H0 (statistically significant).
- Otherwise, fail to reject H0 (statistically insignificant).
Steps in Hypothesis Testing
(i) Develop H0 and Ha [one-tailed test (Ha uses < or >) or two-tailed test (Ha uses ≠)].
(ii) Specify the significance level (\(\alpha\)).
(iii) Collect the sample data and compute the test statistic.
- If \(\sigma\) is known, use the z test statistic \(z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\).
- If \(\sigma\) is unknown, use the t test statistic \(t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}\) with n − 1 degrees of freedom.
(iv) Use the p-value or critical value approach to make a decision (reject or fail to reject H0).
(v) Draw your conclusion.
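A Python sketch of steps (iii) and (iv) for the \(\sigma\)-known (z) case. It returns both the p-value decision and the critical-value decision so you can see that the two approaches agree. The function name `z_test` and the `tail` labels are illustrative choices, not standard names. The usage lines apply it to Example 1 (left-tailed) and Example 3 (two-tailed) below.

```python
from math import sqrt
from statistics import NormalDist


def z_test(x_bar, mu0, sigma, n, alpha, tail):
    """One-sample z test for mu when sigma is known.

    tail = "left"  -> Ha: mu < mu0
    tail = "right" -> Ha: mu > mu0
    tail = "two"   -> Ha: mu != mu0
    Returns (z, p_value, critical_value, reject_by_p_value, reject_by_critical_value).
    """
    nd = NormalDist()
    z = (x_bar - mu0) / (sigma / sqrt(n))
    if tail == "left":
        p_value = nd.cdf(z)
        crit = nd.inv_cdf(alpha)          # negative, about -1.645 for alpha = 0.05
        reject_crit = z < crit
    elif tail == "right":
        p_value = 1 - nd.cdf(z)
        crit = nd.inv_cdf(1 - alpha)
        reject_crit = z > crit
    elif tail == "two":
        p_value = 2 * (1 - nd.cdf(abs(z)))
        crit = nd.inv_cdf(1 - alpha / 2)  # compare |z| with z_{alpha/2}
        reject_crit = abs(z) > crit
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return z, p_value, crit, p_value < alpha, reject_crit


# Example 1: H0: mu >= 32.79 vs Ha: mu < 32.79
ex1 = z_test(30.63, 32.79, 5.60, 50, 0.05, "left")
# Example 3: H0: mu = 8 vs Ha: mu != 8
ex3 = z_test(8.5, 8, 3.2, 120, 0.05, "two")

for name, (z, p, crit, rej_p, rej_c) in [("Example 1", ex1), ("Example 3", ex3)]:
    print(f"{name}: z = {z:.3f}, p-value = {p:.4f}, critical value = {crit:.3f}, "
          f"reject H0 (p-value): {rej_p}, reject H0 (critical value): {rej_c}")
```

In Example 1 the p-value is about 0.003 < 0.05, so H0 is rejected; in Example 3 it is about 0.087 > 0.05, so we fail to reject H0.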

Example 1
For the US, the mean monthly internet bill is $32.79 per household. A sample of 50 households in a southern state showed a sample mean of $30.63. Use a population standard deviation of $5.60. Formulate hypotheses and test whether the sample data support the conclusion that the mean monthly internet bill in the southern state is less than the national mean of $32.79. Use \(\alpha\) = 0.05.

Example 2
Annual per capita consumption of milk is 21.6 gallons (Statistical Abstract of the US, 2006). Being from the Midwest, you believe milk consumption is higher there and wish to support your opinion. A sample of 16 individuals from the Midwestern town of Webster City showed a sample mean annual consumption of 24.1 gallons with a sample standard deviation of 4.8 gallons. Develop hypotheses and test whether the mean annual consumption in Webster City is higher than the national average. Use \(\alpha\) = 0.05.

Example 3
CCN and Actmedia provided a television channel targeted to individuals waiting in supermarket checkout lines. The channel showed news, short features and advertisements. The length of the program was based on the assumption that the population mean time a shopper stands in a supermarket checkout line is 8 minutes. A sample of 120 shoppers showed a mean time of 8.5 minutes, while the population standard deviation is assumed known to be 3.2 minutes. Formulate the hypotheses for this application. Test this assumption and determine whether the actual mean waiting time differs from this standard. Use \(\alpha\) = 0.05.

Example 4
A manufacturer of paints claims that one can of its paint covers 380 sq ft. A random sample of 8 cans covered the following areas (sq ft): 331, 312, 350, 347, 303, 412, 365, 420. Does this prove the average is not 380 sq ft? Use \(\alpha\) = 0.01.
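Examples 2 and 4 have \(\sigma\) unknown, so they need the t statistic with n − 1 degrees of freedom. Python's standard library has no t distribution, so this sketch computes the t CDF numerically (Simpson's rule on the t density, an implementation choice for self-containment, not the table method used in class) and derives the p-values from it. The helper names `t_cdf` and `t_test` are hypothetical.

```python
import math
from statistics import mean, stdev


def t_cdf(x, df, steps=1000):
    """P(T <= x) for Student's t, via Simpson's rule on [0, |x|] and symmetry."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    area = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_test(x_bar, mu0, s, n, tail):
    """One-sample t test for mu; returns (t, p_value). tail: 'left', 'right' or 'two'."""
    t = (x_bar - mu0) / (s / math.sqrt(n))
    df = n - 1
    if tail == "left":
        p_value = t_cdf(t, df)
    elif tail == "right":
        p_value = 1 - t_cdf(t, df)
    elif tail == "two":
        p_value = 2 * (1 - t_cdf(abs(t), df))
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return t, p_value


# Example 2: H0: mu <= 21.6 vs Ha: mu > 21.6, alpha = 0.05
t2, p2 = t_test(24.1, 21.6, 4.8, 16, "right")
print(f"Example 2: t = {t2:.3f}, p-value = {p2:.4f}, reject H0: {p2 < 0.05}")

# Example 4: H0: mu = 380 vs Ha: mu != 380, alpha = 0.01
areas = [331, 312, 350, 347, 303, 412, 365, 420]
t4, p4 = t_test(mean(areas), 380, stdev(areas), len(areas), "two")
print(f"Example 4: t = {t4:.3f}, p-value = {p4:.4f}, reject H0: {p4 < 0.01}")
```

In Example 2, t ≈ 2.083 with 15 df gives a p-value between 0.025 and 0.05, so H0 is rejected at \(\alpha\) = 0.05. In Example 4, \(\bar{x} = 355\) and t ≈ −1.654 with 7 df gives a two-tailed p-value between 0.10 and 0.20, so we fail to reject H0 at \(\alpha\) = 0.01.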

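Power of the Test
Recall: a Type I error is rejecting H0 when it is true, with P(Type I error) = \(\alpha\). A Type II error is failing to reject H0 when it is false, with P(Type II error) = \(\beta\).
When we fail to reject H0, we need to analyze the Type II error and the power of the test.
Probability of a Type II error = \(\beta\)
Power of the test = \(1 - \beta\)
The power of the test is the probability of correctly rejecting H0 when it is false (the probability of making the right decision when H0 is false).
Power curve = the power of a particular test for every value of \(\mu\) under Ha.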

Example
A car maker claims that the new car can do 26 MPG. A consumer group takes a sample of 30 cars and wants to test at \(\alpha\) = 0.05 whether the MPG is below 26. \(\sigma\) is assumed known to be 1.4 MPG. Find the probability of a Type II error and the power of the test assuming (i) \(\mu\) = 25.8 and (ii) \(\mu\) = 25.
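A Python sketch of the Type II error calculation for this left-tailed z test. It first finds the cutoff \(c = \mu_0 - z_{\alpha}\,\sigma/\sqrt{n}\) (reject H0 when \(\bar{x} < c\)), then computes \(\beta = P(\bar{x} \ge c)\) when the true mean is \(\mu\). The function name is hypothetical, and the grid of \(\mu\) values in the last loop, which traces a few points of the power curve, is an arbitrary choice.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error_left(mu0, mu_true, sigma, n, alpha):
    """beta for the left-tailed z test H0: mu >= mu0 vs Ha: mu < mu0.

    H0 is rejected when x_bar < c, where c = mu0 - z_alpha * sigma / sqrt(n).
    A Type II error happens when x_bar >= c even though the true mean is mu_true.
    """
    nd = NormalDist()
    se = sigma / sqrt(n)
    c = mu0 + nd.inv_cdf(alpha) * se  # inv_cdf(alpha) = -z_alpha
    return 1 - nd.cdf((c - mu_true) / se)


for mu in (25.8, 25.0):
    beta = type_ii_error_left(26, mu, 1.4, 30, 0.05)
    print(f"mu = {mu}: beta = {beta:.4f}, power = {1 - beta:.4f}")

# A few points on the power curve
for mu in (25.9, 25.8, 25.6, 25.4, 25.2, 25.0):
    print(f"mu = {mu}: power = {1 - type_ii_error_left(26, mu, 1.4, 30, 0.05):.3f}")
```

The cutoff is about 25.58 MPG. For \(\mu\) = 25.8, \(\beta \approx 0.806\) (power ≈ 0.194); for \(\mu\) = 25, \(\beta \approx 0.012\) (power ≈ 0.988). Power rises as the true mean moves further below 26.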
