Semester 1, 2013
Ira M. Anjasmara
Sample distribution
Recall: a sample is a selection from a population that is deemed to be representative of that population. Many samples can be taken from the same population, and each sample should be unbiased, i.e., collected in a random fashion. Statistics can then be calculated from each sample, e.g., the mean, variance, and standard deviation. We wish to infer information about the population based on all the samples, rather than just one.
Hypothesis Testing
If many different samples are selected from the same population, we get a range of different sample means. For example, suppose we are interested in the mean height of Indonesian women: a sample of 100 women will give us a sample mean, but in general this will be different from the true (population) mean. Another sample of 100 women will give a different sample mean, also different from the true mean. All the possible sample means taken from a population have their own distribution, known as the sampling distribution of the mean.
The sampling distribution of the mean is the probability distribution for all possible values of the sample mean, x̄. If we take enough samples, the mean of these sample means will tend towards the population mean, i.e.:

E[x̄] = μ (1)

or: the expected value of x̄ equals the population mean.
The standard error of the mean, σ_x̄, for a sample from a finite population is:

σ_x̄ = (σ/√n) √((N − n)/(N − 1)) (2)

where σ is the standard deviation of the population being sampled, n is the sample size, and N is the population size (often unknown).
If n/N < 0.05, i.e. the sample size is much smaller than the population size, then eq. (2) reduces to:

σ_x̄ = σ/√n (3)

From now on, we will assume that the n/N condition is always met.
Assume that many random samples of size n = 49 are to be taken from a large population with mean μ = 100 and standard deviation σ = 21. What are the mean and standard deviation of the values of all the sample means? We know that repeating the sampling process will generate different sample means due to the different samples selected. The mean of these x̄ values is E[x̄] = μ = 100. Since the population is large relative to the sample, the standard deviation of the x̄ values is:

σ_x̄ = σ/√n = 21/√49 = 3
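This result can be checked with a quick simulation (a sketch: the population is assumed normal here for convenience, although the result holds for any population with finite variance):

```python
import random
import statistics

random.seed(1)
mu, sigma, n = 100, 21, 49       # population mean, SD, and sample size
n_samples = 5000                 # number of repeated samples

# Draw many samples of size n and record each sample mean
means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(n_samples)]

print(round(statistics.fmean(means), 1))   # close to mu = 100
print(round(statistics.stdev(means), 1))   # close to sigma / sqrt(n) = 3
```

The mean of the simulated sample means sits very close to μ = 100, and their spread close to σ/√n = 3, as the theory predicts.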
Figure 1: Illustration of the Central Limit Theorem for three populations (from Anderson et al. (1991), Introduction to Statistics, West Co.)
Since this distribution has a shape that cannot be represented by a mathematical equation, we cannot find the answer to the question: what is p(x̄ < 1340)? (the red-shaded area).
Now we can ask: what is p(x̄ < 1340)? (the red-shaded area).
We can use the sampling distribution of the mean to compute the probability of selecting a sample that will provide a value of x̄ within any specified distance from the population mean. If we approximate the sampling distribution of x̄ to a normal distribution, through the central limit theorem, we may compute a z-value, where:

z = (x̄ − μ)/σ_x̄
Note that here the z-value is defined using x̄ rather than x, and the standard error of the mean rather than the standard deviation.
From the standard normal tables, the area enclosed between z = 0 (the mean) and z = 1.53 is A = 0.4370. So the area we want (shaded) is: 0.5 − 0.4370 = 0.0630, i.e., p(x̄ > 55) = 0.0630.
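The table lookup can be reproduced with Python's standard library, since `statistics.NormalDist` provides the standard normal CDF:

```python
from statistics import NormalDist

z = 1.53
# Area between the mean (z = 0) and z = 1.53
A = NormalDist().cdf(z) - 0.5
tail = 0.5 - A                      # area in the upper tail
print(round(A, 4), round(tail, 4))  # 0.437 0.063
```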
Hypothesis testing is a process for making a statistical decision based on information contained in the sample. A statistical hypothesis is an assumption, statement, or question concerning one or more populations, which may or may not be true. The truth of the assumption can only be known for certain if we examine the whole population, which is impractical. Thus, the aim of hypothesis testing is to decide whether the assumption is true based on random samples.
We make an assumption about the value of a population statistic (e.g., mean, variance): this is called the null hypothesis, denoted H0.
We then test the null hypothesis, usually through an experiment on a sample. If the results of the test (based on the sample) are inconsistent with the null hypothesis, then we reject H0. If the results are consistent with the null hypothesis, then this does not necessarily imply that H0 is true, or that we accept it, only that there is insufficient evidence to reject H0.
The null hypothesis is phrased so that the status quo is preserved, i.e., things don't change. For example, in (Western) criminal law, a defendant on trial for a crime is innocent until proven guilty. Therefore, we make: H0: innocent, i.e., the null hypothesis states that he will still be innocent after the trial, preserving the status quo. Think in terms of a courtroom: the null hypothesis is like the defence lawyer, pleading innocence; the alternative hypothesis is like the prosecution lawyer, attempting to prove guilt.
When testing a hypothesis, the aim is to use sample data to refute the null hypothesis (and not to prove the alternative hypothesis). Then, if there is any doubt as to the validity of the alternative hypothesis, we revert to the null hypothesis. In mathematical terms, the null hypothesis is a statement of a population statistic, and not a sample statistic, because the population statistic represents the accepted wisdom, while the sample statistic represents the new evidence. In this chapter, null hypothesis statements will concern μ (and never x̄).
             H0 true             H0 false
Accept H0    correct decision    type II error
Reject H0    type I error        correct decision
A type II error is more difficult to detect than a type I error. Recall, this is when we accept H0 when it is false (i.e., we free the guilty man). In an experiment, if we accept H0 it may be because H0 is actually true, but it may also be because we did not have enough evidence to reject it. This latter case is like the police bungling an investigation, so that the jury has no choice but to free the guilty man.
Significance levels
Type I errors are controlled by setting the level of significance (α) for the test: α = p(type I error occurring), i.e., there is an α chance that we mistakenly reject H0. The value of α gives the area under the probability distribution curve corresponding to the probability of making a type I error. An example for the normal distribution is:
As a null hypothesis can never be rejected with 100% certainty, we test at various levels of significance: a small value of α means a small chance of making the wrong decision, and thus a large chance of making the right decision. Suppose we reject H0 at α = 0.01. This is a more significant result than if H0 were rejected at α = 0.05, because 0.01 represents only a 1% chance of mistakenly rejecting it, whereas 0.05 represents a 5% chance. Bear in mind, though, that of all the tests being carried out around the world at the 0.05 level, 5% of them result in a false rejection of the null hypothesis. The choice of value of α is subjective, but should always be greater than zero. The chosen value should be at most 0.1, but the most popular choice is 0.05.
1. Formulate hypotheses
2. Determine number of tails
3. Determine significance level
4. Determine critical z-value
5. Determine rejection region
6. Determine test statistic
7. Perform the test
8. Draw conclusion
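The steps above can be sketched as a single function (a minimal sketch for a 1-tailed lower test, Ha: μ < μ0; the function name and the example numbers are illustrative, not from the slides):

```python
from math import sqrt
from statistics import NormalDist

def lower_tail_z_test(x_bar, mu0, sigma, n, alpha):
    """Steps 3-7 for H0: mu >= mu0 vs Ha: mu < mu0."""
    se = sigma / sqrt(n)                       # standard error of the mean
    z_crit = -NormalDist().inv_cdf(1 - alpha)  # rejection region: z < z_crit
    z = (x_bar - mu0) / se                     # test statistic
    return z, z_crit, z < z_crit               # True -> reject H0

# Hypothetical numbers: sample mean 48 against mu0 = 50, sigma = 6, n = 36
z, z_crit, reject = lower_tail_z_test(48, 50, 6, 36, 0.05)
print(round(z, 2), round(z_crit, 2), reject)
```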
The number of tails of a test comes from a graphical representation of the hypothesis:
It is known that a certain quantity has a value μ0. Recent tests find that this quantity actually has a value x̄. For a 1-tailed test: if x̄ < μ0, then we have Ha: μ < μ0; if x̄ > μ0, then we have Ha: μ > μ0. For a 2-tailed test we always have: Ha: μ ≠ μ0. In general: if in doubt, use a 2-tailed test.
Use the value of α to get a value of z from the normal tables, i.e., what value of z gives an area in the rejection region of α? This critical value will be used to test the null hypothesis (see Step 7). For a given value of α, the value of z will depend on whether we have a 1-tailed or 2-tailed test: in a 1-tailed test, all of α is in one rejection region, so find z_α; in a 2-tailed test, α is split into two rejection regions, each with area α/2, so find z_{α/2}.
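With Python's `statistics.NormalDist`, the critical value is just the inverse CDF evaluated at 1 − α (1-tailed) or 1 − α/2 (2-tailed):

```python
from statistics import NormalDist

alpha = 0.05
z_alpha = NormalDist().inv_cdf(1 - alpha)       # 1-tailed critical value
z_half  = NormalDist().inv_cdf(1 - alpha / 2)   # 2-tailed critical value
print(round(z_alpha, 2), round(z_half, 2))      # 1.64 1.96
```

(Printed tables often quote 1.65 for z_0.05; the exact value is 1.6449.)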
The boundary of the rejection region is determined by the value of z_α. Its location is determined by the form of the alternative hypothesis (<, >, or ≠):
1-tailed: H0: μ ≤ μ0; Ha: μ > μ0 (rejection region in the upper tail)

1-tailed: H0: μ ≥ μ0; Ha: μ < μ0 (rejection region in the lower tail)
2-tailed: H0: μ = μ0; Ha: μ ≠ μ0 (rejection regions in both tails)
z = (x̄ − μ)/σ_x̄ (4)

Recall: x̄ is the mean of the sample taken to test the hypotheses, μ is the population mean, and σ_x̄ = σ/√n, where σ is the population standard deviation. NB: if σ is unknown, then as long as n ≥ 30, we may use the sample standard deviation (s) as an approximation.
Compare the test statistic against its critical value, i.e., plot the position of z on the z-axis, and check its position relative to z_α and the rejection region: if z lies in the rejection region, reject H0; if z does not lie in the rejection region, do not reject H0. Always state the significance level at which you make your decision.
Always refer back to the wording of the original problem: do not just leave the answer as "reject H0"; and always include the significance or confidence level in your answer.
Example 1
A count of vehicles travelling past a point on Albany Highway in 30 seconds (during peak hour) is supposed to be about 25 vehicles, with a standard deviation of 4.3 vehicles. When a further 40 measurements were taken, it was found that the mean was 23.5 vehicles per 30 seconds. Can we be 99% certain that the vehicular traffic is less than 25 vehicles per 30 seconds? Take 25 vehicles as the population mean. We therefore want to test whether the sample mean from the new data (23.5) indicates that this value is too high. We have: μ = 25, σ = 4.3, x̄ = 23.5, n = 40, α = 0.01.
Step 1: Formulate the alternative hypothesis: Ha: μ < 25, i.e., test whether the true population mean is actually less than the established value. Formulate the null hypothesis: H0: μ ≥ 25, i.e., assume the given population mean is correct, and the sample data are misleading. Step 2: Determine the number of tails. This is a 1-tailed test, because the alternative hypothesis is directional (<).
Step 3: Determine the level of significance. We are told that the confidence level is 99%, therefore α = 0.01. Step 4: Determine the critical value of z. We have a 1-tailed test, so we need to find z_α = z_0.01. From the standard normal distribution table, we have: z_0.01 = z(0.5 − 0.01) = z(0.49) = 2.33.
Step 5: Determine the rejection region. The null hypothesis will be rejected if the evidence supports μ < 25. Since we are testing μ < 25, we are in the left-hand side of the normal curve, therefore the rejection region is z < −2.33.
Step 6: Determine the test statistic (z-score) from the sample data:

z = (x̄ − μ)/σ_x̄ = (23.5 − 25)/(4.3/√40) = −2.21 (5)
Step 7: Compare the test statistic against its critical value: −2.21 > −2.33, therefore z, and hence x̄, the sample mean, do not lie in the rejection region. Hence, we do not reject H0 at the 0.01 significance level. Step 8: Our sample measurement is compatible with the supposed population mean at the 99% confidence level. Therefore it follows that the true mean is not less than 25 vehicles per 30 seconds.
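Steps 4 to 7 of this example can be reproduced directly, using the figures given above:

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, x_bar, n, alpha = 25, 4.3, 23.5, 40, 0.01
z = (x_bar - mu0) / (sigma / sqrt(n))        # test statistic
z_crit = -NormalDist().inv_cdf(1 - alpha)    # critical value for Ha: mu < mu0
print(round(z, 2), round(z_crit, 2))         # -2.21 -2.33
print(z < z_crit)                            # False -> do not reject H0
```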
Exercise
The value of a well-observed angle was known to be 30° 15′ 30″. A new theodolite was tested against this angle for calibration. A sample of 36 arcs produced a mean of 30° 15′ 32″, with an SD of 6″. Is this value significantly different from the standard value at the 5% level of significance? Take 30° 15′ 30″ as the population mean. We therefore want to test whether the sample mean from the new data (30° 15′ 32″) indicates that this value is incorrect. We have: μ = 30° 15′ 30″, s = 6″, x̄ = 30° 15′ 32″, n = 36, α = 0.05.
Confidence Interval
The degree of confidence is defined as 1 − α. It is usually expressed as a percentage, called the confidence level (CL). Confidence is the probability that the mean (sample or population) lies within a confidence interval (CI). The confidence interval represents the region within which 100(1 − α)% of all the sample means will lie:

CI = μ ± z_{α/2} σ_x̄, or p(μ − z_{α/2} σ_x̄ ≤ x̄ ≤ μ + z_{α/2} σ_x̄) = 1 − α
The CI also represents the region in which we are 100(1 − α)% likely to find the population mean, μ:

CI = x̄ ± z_{α/2} σ_x̄, or p(x̄ − z_{α/2} σ_x̄ ≤ μ ≤ x̄ + z_{α/2} σ_x̄) = 1 − α

This means that we don't need to know μ in order to determine the confidence interval. For instance, for α = 0.05, the CL = 95%, and z_{α/2} = 1.96.
The quantity z_{α/2} σ_x̄ is called the margin of error. This is not the precision of the data (which is just σ_x̄); rather, it gives the maximum allowable error. It is obviously desirable to have a low margin of error, because a low margin of error indicates that we have pinned down the mean quite precisely. However, a low margin of error implies a low z_{α/2}, and thus a low confidence level. Conversely, a high confidence percentage gives a large margin of error (because z_{α/2} is larger). So having high confidence does not imply that we have good data: it just means that we have allocated a wider range in which to place the measurement.
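The trade-off can be seen numerically, assuming (for illustration only) a standard error of 1:

```python
from statistics import NormalDist

se = 1.0                                  # hypothetical standard error of the mean
for cl in (0.90, 0.95, 0.99):
    z_half = NormalDist().inv_cdf(1 - (1 - cl) / 2)
    print(cl, round(z_half * se, 2))      # margin of error grows with confidence
```

Higher confidence levels widen the interval (roughly 1.64, 1.96, and 2.58 standard errors), exactly as described above.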
Since the confidence interval can give a range of values where we are 100(1 − α)% likely to find μ or x̄, we can use it to perform a 2-tailed hypothesis test (but not a 1-tailed one). If significance is the probability of rejecting the null hypothesis when it is actually true (making a type I error), then confidence can be thought of as our degree of certainty in making the correct decision. Remember, a low value of α means a low probability of mistakenly rejecting H0, and therefore a high confidence in making the right decision. Conversely, a higher value of α means a higher probability of mistakenly rejecting H0, and therefore a lower confidence in making the right decision.
Consider an extreme example, where α = 0. In a hypothesis test we have a zero probability of making a mistake (type I error), and we are extremely confident of our decision, with a confidence level of 100%. However, don't be fooled: the critical value is z_{α/2} = ∞, so the margin of error is infinite. So even if x̄ was a long, long way from μ, we still wouldn't reject H0. In fact, we could never reject H0. Obviously, real-world examples would never have a zero significance level. But this example shows that as α decreases, it becomes harder to reject H0.
Now consider: H0: μ = μ0; Ha: μ ≠ μ0, where μ0 is the hypothesized value for the population mean. By analogy with the 8 steps to hypothesis testing for a 2-tailed test, we can see that the do-not-reject-H0 region is given by: μ0 ± z_{α/2} σ_x̄. So if the sample mean does not fall in this region, we must reject H0.
Example 1
The value of a well-observed angle was known to be 30° 15′ 30″. A new theodolite was tested against this angle for calibration. A sample of 36 arcs produced a mean of 30° 15′ 32″, with an SD of 6″. Is this value significantly different from the standard value at the 5% level of significance? We have: μ = 30° 15′ 30″, s = 6″, x̄ = 30° 15′ 32″, n = 36, α = 0.05. So, the hypotheses are: H0: μ = 30° 15′ 30″; Ha: μ ≠ 30° 15′ 30″. We can use the confidence interval method because this is a 2-tailed test. The critical value of z is: z_{α/2} = z_0.025 = 1.96. The standard error of the mean is: σ_x̄ = 6″/√36 = 1″.
So the confidence limits are: ±z_{α/2} σ_x̄ = ±(1.96 × 1″) = ±1.96″. And the confidence interval about the population mean is: CI = 30° 15′ 30″ ± 1.96″ = 30° 15′ 28.04″ to 30° 15′ 31.96″.

The sample mean (30° 15′ 32″) does not lie within this interval. Hence, we reject H0 with 95% confidence. Our sample measurement is incompatible with the supposed population mean at 0.05 significance. Therefore it follows that the true mean is not 30° 15′ 30″ at this level.
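Working in arcseconds past 30° 15′, this interval can be reproduced as follows:

```python
from math import sqrt
from statistics import NormalDist

# Values in arcseconds past 30 deg 15 min
mu0, s, x_bar, n, alpha = 30.0, 6.0, 32.0, 36, 0.05
se = s / sqrt(n)                             # = 1 arcsecond
z_half = NormalDist().inv_cdf(1 - alpha / 2)
lo, hi = mu0 - z_half * se, mu0 + z_half * se
print(round(lo, 2), round(hi, 2))    # 28.04 31.96
print(lo <= x_bar <= hi)             # False -> reject H0 at alpha = 0.05
```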
Example 2
What if we do the test at the 0.01 level of significance (99% confidence)? The critical value of z is now: z_{α/2} = z_0.005 ≈ 2.58. And the confidence limits are: ±z_{α/2} σ_x̄ = ±2.58″. The confidence interval about the population mean is now: CI = 30° 15′ 30″ ± 2.58″ = 30° 15′ 27.42″ to 30° 15′ 32.58″.

The sample mean (30° 15′ 32″) does lie within this interval. Hence, we do not reject H0 with 99% confidence. Our sample measurement is now compatible with the supposed population mean at 0.01 significance. Therefore it follows that the true mean is 30° 15′ 30″ at this level.
Conclusion
The data haven't changed, but the outcomes have. In Example 1 we rejected H0, but in Example 2 we didn't. Whereas in Example 1 we had a 5% chance of mistakenly rejecting H0, in Example 2 we only had a 1% chance of mistakenly rejecting it. So in Example 2 we accepted H0, not because the data got better or worse, but because we were allowed more freedom via a larger margin of error. In terms of statistical theory, while setting a low significance level means a low probability of mistakenly rejecting H0 (making a type I error), it raises the probability of making a type II error, i.e., mistakenly accepting H0 (and thus accepting any old rubbish!).
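The contrast between the two examples can be reproduced in a few lines: same data, two significance levels (angles expressed in arcseconds past 30° 15′):

```python
from math import sqrt
from statistics import NormalDist

mu0, s, x_bar, n = 30.0, 6.0, 32.0, 36   # arcseconds past 30 deg 15 min
se = s / sqrt(n)
for alpha in (0.05, 0.01):
    z_half = NormalDist().inv_cdf(1 - alpha / 2)
    inside = mu0 - z_half * se <= x_bar <= mu0 + z_half * se
    print(alpha, "do not reject H0" if inside else "reject H0")
```

At α = 0.05 the sample mean falls outside the interval (reject); at α = 0.01 it falls inside (do not reject), matching Examples 1 and 2.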
P-Values
You might ask, then, at what significance (or confidence) level do we only just reject H0? This is measured by the P-value: the P-value is the probability, if H0 were true, that the test statistic would take a value at least as extreme as that actually observed. It can be loosely thought of as the probability that a variable would assume a value greater than or equal to the observed value strictly by chance, assuming that H0 is true. Or, even more loosely, as the statement "there's only a P chance that this result could have happened by coincidence".
Suppose you took a random sample from a normally-distributed population with mean μ0. You would not, in general, expect the mean of this sample, x̄0, to be exactly equal to μ0, but if you are pretty sure that the population mean really is μ0 (i.e., the null hypothesis), then x̄0 shouldn't be too far off. Now suppose that you found that x̄0 was much greater than μ0. This could be due to one of two reasons. Either:
1. the null hypothesis is false; or,
2. the result is just a fluke, a coincidence of your random sampling happening to choose data that had a large mean; in this case we would not want to change the null hypothesis.
What the P-value does is to help us choose between these two options.
The P-value is the chance of obtaining your results if the null hypothesis is true. Or, in terms of our example: P = p(x̄ ≥ x̄0), given a true H0. Put another way, the P-value tells us that random sampling from the population would lead to a value larger than x̄0 in 100P% of experiments, and smaller than x̄0 in 100(1 − P)% of experiments. The smaller the P-value, the less likely it is that the result we got was simply the result of chance, and the more likely it is that the result indicates that the null hypothesis is incorrect. Suppose we had found that P = 0.3. This means that if the experiment were performed 100 times, then we would expect to observe a sample mean of at least x̄0 30 times, even though the true mean is μ0. So getting a value of x̄0 would not be that unusual, and we would not need to change the null hypothesis.
However, if we had found that P = 0.01, then if the true mean were μ0, getting a result of x̄0 purely by chance would only happen once out of every 100 trials, which is quite rare. The low P-value, therefore, suggests that our result is not likely to be the product of pure chance, and it is very likely that the H0 assumption (mean = μ0) is wrong. In summary (loosely speaking):

P-value   Fluke?                    H0 decision
high      yes, a fluke              do not reject
low       no, a good measurement    reject
For P-values that are neither large nor small, any decision on the null hypothesis should be made by assigning a significance level.
Example 1a
A certain quantity has an accepted value (population mean) of 648, with a (population) standard deviation of 20.46. A new experiment of 36 observations finds that the (sample) mean is 655. What is the probability of observing a sample mean of 655 or larger from a population with a true mean of 648? We can quickly find that z = 2.053, and therefore P = p(x̄ ≥ 655) ≈ 0.02.
This small P-value implies that we would be unlikely (2 times out of every hundred) to measure a value of 655 purely by chance if the true mean really is 648. This suggests that the null hypothesis (H0: μ = 648) might not be true.
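The z-score and P-value quoted above follow directly from the sampling distribution of the mean:

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, x_bar, n = 648, 20.46, 655, 36
z = (x_bar - mu0) / (sigma / sqrt(n))     # standard error = 20.46 / 6 = 3.41
P = 1 - NormalDist().cdf(z)               # p(sample mean >= 655 | H0 true)
print(round(z, 3), round(P, 2))           # 2.053 0.02
```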
Example 1b
However, suppose the new experiment found the sample mean to be 651. The z-score is now 0.88, and P = p(x̄ ≥ 651) ≈ 0.19. This means that, given a true mean of 648, the chance of observing a sample mean of at least 651 is much higher than before (19%). So although 651 is different from 648, we could reasonably say that the difference is small enough to be allowable by chance. Note that the wording "if H0 were true" in the definition of P-values is important. This is because the resulting P-value is dependent upon the z-score, which in turn is dependent upon the value of the population mean in H0. Thus, you have to assume H0 is true before you can calculate the P-value.
As a final word, care must be taken when interpreting P-values. In particular:
1. The P-value is not the probability that the null hypothesis is true; and 1 − P is not the probability of the alternative hypothesis being true.
2. The P-value is not the probability that a finding has occurred by chance. It is the probability of obtaining a certain result when the true mean has a known value, which could then be interpreted as having occurred by chance, or not.
3. The P-value is not the probability of mistakenly rejecting the null hypothesis; that is the significance level.
4. The significance level of the test is not determined by the P-value. The significance level is a value that should be decided upon by the analyst before the data are viewed, and is compared against the P-value calculated after the test has been performed.
Example 2a
A certain quantity has an accepted value of 648, with a population standard deviation of 20.46. A new experiment of 36 observations finds that the sample mean is 655. Do the new data provide enough evidence to cause us to change the accepted value, at the 0.05 significance level?

a) Critical score method: H0: μ ≤ 648; Ha: μ > 648. α = 0.05, z_0.05 = 1.65, z = 2.053. z > z_0.05, so reject H0 at α = 0.05. Must change the accepted value.

b) P-value method: H0: μ ≤ 648; Ha: μ > 648. α = 0.05, z = 2.053, P = 0.02. P < α, so reject H0 at α = 0.05. Must change the accepted value.
Example 2b
As for Example 2a, except the sample mean is 651.

a) Critical score method: H0: μ ≤ 648; Ha: μ > 648. α = 0.05, z_0.05 = 1.65, z = 0.88. z < z_0.05, so do not reject H0 at α = 0.05. Do not change the accepted value.

b) P-value method: H0: μ ≤ 648; Ha: μ > 648. α = 0.05, z = 0.88, P = 0.19. P > α, so do not reject H0 at α = 0.05. Do not change the accepted value. In fact, we would not reject H0 until α reached 0.19.
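The two methods can be run side by side; they always agree, since z > z_α exactly when P < α:

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 648, 20.46, 36, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha)       # critical score for Ha: mu > mu0
for x_bar in (655, 651):                       # sample means from Examples 2a and 2b
    z = (x_bar - mu0) / (sigma / sqrt(n))
    P = 1 - NormalDist().cdf(z)                # P-value method
    assert (z > z_crit) == (P < alpha)         # the two methods agree
    print(x_bar, "reject H0" if P < alpha else "do not reject H0")
```

This prints "reject H0" for 655 and "do not reject H0" for 651, as in the worked examples.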
So, the P-value can be viewed as the smallest level α at which the data are significant. The Example 1a data are significant at the 0.05 level, but not at the 0.01 level; in fact, they are significant at the 0.02 level. However, while they might appear to be strongly related, the P-value and significance level are quite different concepts. That is, the P-value cannot be viewed as the probability of mistakenly rejecting a null hypothesis.