
Hypothesis Testing

Statistics dan Probability

Semester 1, 2013

Ira M. Anjasmara

Jurusan Teknik Geomatika

Sample distribution

Recall: a sample is a selection from a population that is deemed to be representative of that population. Many samples can be taken from the same population, and each sample should be unbiased: i.e., collected in a random fashion. Then, statistics can be calculated from each sample: e.g., the mean, variance, standard deviation. We wish to infer information about the population based on all the samples, rather than just one.


Sampling distribution of the mean

If many different samples are selected from the same population we get a range of different sample means. For example, suppose we are interested in the mean height of Indonesian women: a sample of 100 women will give us a sample mean, but in general this will be different to the true (population) mean. Another sample of 100 women will give a different sample mean, also different to the true mean. All the possible sample means taken from a population have their own distribution, known as the sampling distribution of the mean.


Sampling distribution of the mean

The sampling distribution of the mean is the probability distribution for all possible values of the sample mean, x̄. If we take enough samples, the mean of these sample means will tend towards the population mean, i.e.:

E[\bar{x}] = \mu    (1)

or, the expected value of x̄ equals the population mean.


Standard error of the mean


Just as we can find the mean of all the sample means, we can find their standard deviation. The standard deviation of all the sample means measures the dispersion of all possible values of x̄ across all possible samples. It is called the standard error of the mean, and is defined by:

\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}    (2)

where σ is the standard deviation of the population being sampled, n is the sample size, and N is the population size (often unknown).


Standard error of the mean

If n/N < 0.05, i.e. the sample size is much smaller than the population size, then eq. (2) reduces to:

\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}    (3)

From now on, we will assume that this n/N condition is always met.


Standard error of the mean: example

Assume that many random samples of size n = 49 are to be taken from a large population with mean μ = 100 and standard deviation σ = 21. What are the mean and standard deviation of the values of all the sample means? We know that repeating the sampling process will generate different sample means due to the different samples selected. The mean of these x̄ values is E[x̄] = μ = 100. Since the population is large relative to the sample, the standard deviation of the x̄ values is:

\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{21}{\sqrt{49}} = 3
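
As a quick check, here is a hedged simulation sketch in Python (NumPy); the normal shape chosen for the population is an assumption made purely for illustration, since only μ, σ and n are given above.

import numpy as np

# The normal population shape is assumed only for this simulation;
# E[x-bar] = mu and sigma/sqrt(n) hold regardless of the population shape.
rng = np.random.default_rng(42)
mu, sigma, n, n_samples = 100, 21, 49, 100_000

sample_means = rng.normal(mu, sigma, size=(n_samples, n)).mean(axis=1)

print("mean of sample means   :", sample_means.mean())       # ~ 100
print("standard error (theory):", sigma / np.sqrt(n))        # = 3
print("standard error (sim.)  :", sample_means.std(ddof=1))  # ~ 3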


Central limit theorem


The central limit theorem states: in selecting simple random samples of size n from a population with mean μ and standard deviation σ, the sampling distribution of x̄ approaches a normal probability distribution with mean μ and standard deviation σ_x̄ when n ≥ 30. This is true for any population distribution, not just normally-distributed ones. Furthermore, if the population being sampled itself has a normal distribution, then the sampling distribution of the mean is normally distributed for any value of n.
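
The following sketch (Python with NumPy/SciPy) illustrates the theorem; the exponential population is an arbitrary choice, used here only because it is strongly skewed.

import numpy as np
from scipy import stats

# A strongly skewed (exponential) population, chosen only to illustrate the CLT.
rng = np.random.default_rng(0)
pop_mean, n, n_samples = 5.0, 30, 50_000

samples = rng.exponential(scale=pop_mean, size=(n_samples, n))
sample_means = samples.mean(axis=1)

print("population skewness  :", stats.skew(samples.ravel()))  # ~ 2 (very skewed)
print("sample-mean skewness :", stats.skew(sample_means))     # much closer to 0
print("mean of sample means :", sample_means.mean())          # ~ 5
print("theoretical std error:", pop_mean / np.sqrt(n))        # sigma = mean for an exponential
print("empirical std error  :", sample_means.std(ddof=1))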


Figure 1: Illustration of the Central Limit Theorem for three populations (from Anderson et al. (1991), Introduction to Statistics, West Co.)

Central limit theorem: example (1)


Consider a variable x that has the population distribution shown in the figure. Since this distribution has a shape that cannot be represented by a mathematical equation, we cannot find the answer to the question: what is p(x < 1340)? (the red-shaded area).


Central limit theorem: example (2)


We can, however, sample this distribution many times, and as long as the sample size is ≥ 30 we convert the distribution of x into a distribution of the sample means x̄, which, from the central limit theorem, is normally distributed.

Now we can ask: what is p(x̄ < 1340)? (the red-shaded area).

Probabilities using sampling distributions

We can use the sampling distribution of the mean to compute the probability of selecting a sample that will provide a value of x̄ within any specified distance of the population mean. If we approximate the sampling distribution of x̄ by a normal distribution, through the central limit theorem, we may compute a z-value, where:

z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}



Note that here the z-value is defined using x̄ rather than x, and the standard error of the mean rather than the standard deviation.


Probabilities using sampling distributions


So, for the given population of x, we would convert the normal distribution of the sample means (x̄) into a standard normal distribution of z-scores (assuming we knew μ and σ):

And now we can find p(x̄ < 1340) = p(z < 1.4)


Probabilities using sampling distributions: example

For the Statistics mid-semester test, the mean score of 85 students was 48% with a standard deviation of 25%. What is the probability that the mean of a sample of 30 students will be greater than 55%? We have μ = 48, σ = 25, n = 30, and require p(x̄ > 55). First, compute the standard error of the mean:

\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{25}{\sqrt{30}} = 4.56

Then calculate the z-score:

z = \frac{55 - 48}{4.56} = 1.53


Probabilities using sampling distributions: example

From the standard normal tables, the area enclosed between z = 0 (the mean) and z = 1.53 is A = 0.4370:

So the area we want (shaded) is: 0.5 − 0.4370 = 0.0630, i.e., p(x̄ > 55) = 0.0630
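
A hedged numerical check of this example in Python (SciPy), using the values given above:

import math
from scipy.stats import norm

mu, sigma, n = 48, 25, 30
se = sigma / math.sqrt(n)           # standard error of the mean, ~4.56
z = (55 - mu) / se                  # ~1.53

p = 1 - norm.cdf(z)                 # upper-tail area
print(f"SE = {se:.2f}, z = {z:.2f}, p(x-bar > 55) = {p:.4f}")  # ~0.063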


Hypothesis Testing

This is a process for making a statistical decision based on information contained in the sample. A statistical hypothesis is an assumption, statement or question concerning one or more populations, which may or may not be true. The truth of the assumption can only be known for certain if we examine the whole population, which is impractical. Thus, the aim of hypothesis testing is to decide whether the assumption is true based on random samples.


Hypothesis Testing
We make an assumption about the value of a population statistic (e.g., mean, variance): this is called the null hypothesis, denoted H0.

We then define another hypothesis, opposite to the null hypothesis: this is called the alternative hypothesis, denoted Ha or sometimes H1.

We then test the null hypothesis, usually through an experiment on a sample. If the results of the test (based on the sample) are inconsistent with the null hypothesis, then we reject H0. If the results are consistent with the null hypothesis, this does not necessarily imply that H0 is true, or that we accept it, only that there is insufficient evidence to reject H0.

Hypothesis Testing
The null hypothesis is phrased so that the status quo is preserved, i.e., things don't change. For example, in (Western) criminal law, a defendant on trial for a crime is innocent until proven guilty. Therefore, we take:

H0: innocent

i.e., the null hypothesis states that he will still be innocent after the trial, preserving the status quo. Think in terms of a courtroom: the null hypothesis is like the defence lawyer, pleading innocence; the alternative hypothesis is like the prosecution lawyer, attempting to prove guilt.


Hypothesis Testing

When testing a hypothesis, the aim is to use sample data to refute the null hypothesis (and not to prove the alternative hypothesis). Then, if there is any doubt as to the validity of the alternative hypothesis, we revert back to the null hypothesis. In mathematical terms, the null hypothesis is a statement about a population, and not a sample, statistic, because the population statistic is a representation of the accepted wisdom, while the sample statistic is a representation of the new evidence. In this chapter, null hypothesis statements will concern μ (and never x̄).


Hypothesis Testing: Example


The speed of light is known to be 299,792,458 m/s. However, someone comes along and measures it as 299,792,457 m/s. Null hypothesis: the accepted speed-of-light value has been determined over decades of rigorous research, and there is no need to change it:

H0: μ = 299,792,458 m/s

Alternative hypothesis: the new evidence suggests that the value needs changing:

Ha: μ ≠ 299,792,458 m/s

Note that we do not test Ha: x̄ = 299,792,457 m/s. Furthermore, our conclusions after the statistical test is performed will be made in terms of the null hypothesis, for example: "there is no evidence to suggest that H0 should be rejected"; or, "the new evidence suggests that H0 should be rejected".

Errors in hypothesis testing


Because our results are based only on a sample, our decision to reject or accept a hypothesis may be incorrect. There are two types of error:

Type I error: reject H0 when it is true (being too sceptical)
Type II error: accept H0 when it is false (being too gullible)

              H0 true            H0 false
Accept H0     correct decision   Type II error
Reject H0     Type I error       correct decision

A type II error is more difficult to detect than a type I error. Recall, this is when we accept H0 when it is false (i.e., we free the guilty man). In an experiment, if we accept H0 it may be because H0 is actually true, but it may also be because we did not have enough evidence to reject it. This latter case is like the police bungling an investigation, so that the jury has no choice but to free the guilty man.

Errors in hypothesis testing


To guard against making a type II error, we say "do not reject H0" rather than "accept H0". This statement does not discard the possibility that we have mistakenly accepted H0. It embodies the subtle difference between saying "this man is innocent" and "we cannot prove this man guilty". So a statistical test can either reject or fail to reject a null hypothesis, but can never prove it: failing to reject a null hypothesis does not prove it true. It is largely up to the researcher to determine which type of error is the worst to commit.


Errors in hypothesis testing: Example

Example 1: A woman suspects she's pregnant and visits the doctor.

H0             Reality        Action                                    Error     Consequence
not pregnant   not pregnant   informed pregnant (H0 rejected)           Type I    freaks out
not pregnant   pregnant       informed not pregnant (H0 not rejected)   Type II   carries on

Example 2: A man is on trial for murder:

H0         Reality    Action                     Consequence
innocent   innocent   convicted (H0 rejected)    sentenced to death
innocent   guilty     freed (H0 not rejected)    released


Significance levels

Type I errors are controlled by setting the level of significance (α) for the test:

α = p(type I error occurring)

i.e., there is an α chance that we mistakenly reject H0. The value of α gives the area under the probability distribution curve corresponding to the probability of making a type I error. An example for the normal distribution is shown in the figure.


Significance levels

As a null hypothesis can never be rejected with 100% certainty, we test at various levels of significance: a small value of α means a small chance of making the wrong decision, and thus a large chance of making the right decision. Suppose we reject H0 at α = 0.01. This is a more significant result than if H0 were rejected at α = 0.05, because 0.01 represents only a 1% chance of mistakenly rejecting it, whereas 0.05 represents a 5% chance. Bear in mind, though, that of all the tests being carried out around the world at the 0.05 level, 5% of them result in a false rejection of the null hypothesis. The choice of the value of α is subjective, but it should always be greater than zero. The chosen value should be at most 0.1, and the most popular choice is 0.05.

The 8 steps to hypothesis testing

Performing a hypothesis test can be broken down into 8 steps.


1. Formulate the hypotheses
2. Determine the number of tails
3. Determine the significance level
4. Determine the critical z value
5. Determine the rejection region
6. Determine the test statistic
7. Perform the test
8. Draw conclusions
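
Purely as an illustration of these eight steps, here is a minimal sketch of a one-sample z-test in Python (SciPy); the function name and the example numbers at the end are hypothetical, not taken from the notes (apart from σ = 21 and n = 49 from the earlier standard-error example).

import math
from scipy.stats import norm

def z_test(x_bar, mu0, sigma, n, alpha=0.05, tail="two"):
    """Minimal sketch of the 8 steps for a one-sample z-test on a mean."""
    # Steps 1-3: hypotheses, number of tails and significance level are the
    # analyst's choices, passed in here as mu0, tail and alpha.
    # Step 4: critical z value.
    z_crit = norm.ppf(1 - alpha / 2) if tail == "two" else norm.ppf(1 - alpha)
    # Step 6: test statistic (use s in place of sigma if sigma is unknown and n >= 30).
    z_stat = (x_bar - mu0) / (sigma / math.sqrt(n))
    # Steps 5 and 7: locate the statistic relative to the rejection region.
    if tail == "two":
        reject = abs(z_stat) > z_crit
    elif tail == "greater":   # Ha: mu > mu0
        reject = z_stat > z_crit
    else:                     # tail == "less", Ha: mu < mu0
        reject = z_stat < -z_crit
    # Step 8: the conclusion is reported in terms of H0.
    return z_stat, z_crit, reject

# Arbitrary illustrative numbers (sigma and n borrowed from the earlier example):
print(z_test(x_bar=102, mu0=100, sigma=21, n=49, alpha=0.05, tail="two"))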


Step 1 - formulate hypotheses


i) Formulate the alternative hypothesis, Ha. This is the statement of the new result, i.e., the result that is contended to alter the status quo; or, the case for the prosecution. Decide what we are testing for: are we testing whether the new results are less than or greater than the established results (use < or > in the formulation of Ha); or whether they are just different (use ≠ in the formulation of Ha)? The clue comes from the wording of the question. If the question doesn't specifically state "less than" or "greater than" (or wording to that effect), use ≠.


Step 1 - formulate hypotheses


ii) Formulate the null hypothesis, H0. This will be the opposite or inverse of the alternative hypothesis, i.e., the status quo. Use the opposite sign to Ha (≤, ≥, or =). In summary, a hypothesis test concerning the value of a population mean (μ) can take one of three forms:

H0: μ ≤ μ0    Ha: μ > μ0
H0: μ ≥ μ0    Ha: μ < μ0
H0: μ = μ0    Ha: μ ≠ μ0

where μ0 is the numerical value being considered in the hypothesis.



Step 2 - determine number of tails

The number of tails of a test comes from a graphical representation of the hypothesis:


Step 2 - determine number of tails - Example

It is known that a certain quantity has a value μ0. Recent tests find that this quantity actually has a value x̄. For a 1-tailed test: if x̄ < μ0, then use Ha: μ < μ0; if x̄ > μ0, then use Ha: μ > μ0. For a 2-tailed test, always use Ha: μ ≠ μ0. In general: if in doubt, use a 2-tailed test.


Step 3 - determine significance level

Recall, the significance level (α) is the probability of rejecting a true null hypothesis. This value is usually given to you as a fraction between (but not including) 0 and 1. If it is not given, it is up to you to choose a value. Common choices are α = 0.1, 0.05, 0.01 and 0.001, but 0.05 is most widely used. The value of α equals the area of the rejection region: this is the part of the normal distribution where the sample data indicate that H0 should be rejected.


Step 4 - determine critical z value

Use the value of α to get a value of z from the normal tables, i.e., what value of z gives an area in the rejection region equal to α? This critical value will be used to test the null hypothesis (see Step 7). For a given value of α, the value of z will depend on whether we have a 1-tailed or 2-tailed test: in a 1-tailed test, all of α is in one rejection region, so find z_α; in a 2-tailed test, α is split into two rejection regions, each with area α/2, so find z_{α/2}.
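
For reference, a small sketch of how these critical values can be obtained numerically (Python/SciPy; the value of α is just an example):

from scipy.stats import norm

alpha = 0.05                              # example value only
z_one_tailed = norm.ppf(1 - alpha)        # z_alpha    ~ 1.645
z_two_tailed = norm.ppf(1 - alpha / 2)    # z_alpha/2  ~ 1.960

print(f"1-tailed critical value: {z_one_tailed:.3f}")
print(f"2-tailed critical value: {z_two_tailed:.3f}")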


Step 5 - determine rejection region

The boundary of the rejection region is determined by the critical value z_α (or z_{α/2}). Its location is determined by the form of the alternative hypothesis (<, >, or ≠), as illustrated below:


1-tailed: H0: μ ≤ μ0, Ha: μ > μ0 (rejection region in the upper tail)

1-tailed: H0: μ ≥ μ0, Ha: μ < μ0 (rejection region in the lower tail)


2-tailed: H0: μ = μ0, Ha: μ ≠ μ0 (rejection region split between both tails)


Step 6 - determine test statistic

This is calculated from:

z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}    (4)

Recall, x̄ is the mean of the sample taken to test the hypotheses, μ is the population mean, and σ_x̄ = σ/√n, where σ is the population standard deviation. NB: if σ is unknown, then as long as n ≥ 30 we may use the sample standard deviation (s) as an approximation.


Step 7 - perform the test

Compare the test statistic against its critical value, i.e., plot the position of z on the z-axis and check its position relative to z_α and the rejection region: if z lies in the rejection region, reject H0; if z does not lie in the rejection region, do not reject H0. Always state the significance level at which you make your decision.


For example, the following would indicate rejection of H0 :


Step 8 - draw conclusions

Always refer back to the wording of the original problem: do not just leave the answer as "reject H0"; and always include the significance or confidence level in your answer.


Example 1

A count of vehicles travelling past a point on Albany Highway in 30 seconds (during peak hour) is supposed to be about 25 vehicles, with a standard deviation of 4.3 vehicles. When a further 40 measurements were taken, it was found that the mean was 23.5 vehicles per 30 seconds. Can we be 99% certain that the vehicular traffic is less than 25 vehicles per 30 seconds? Take 25 vehicles as the population mean. We therefore want to test whether the sample mean from the new data (23.5) indicates that this value is too high. We have: μ = 25, σ = 4.3, x̄ = 23.5, n = 40, α = 0.01.


Example 1
Step 1 - Formulate the alternative hypothesis: Ha: μ < 25, i.e., test whether the true population mean is actually less than the established value. Formulate the null hypothesis: H0: μ ≥ 25, i.e., assume the given population mean is correct and the sample data are misleading.

Step 2 - Determine the number of tails. This is a 1-tailed test, because the null hypothesis contains an inequality.


Example 1

Step 3 - Determine the level of significance: we are told that the confidence level is 99%, therefore α = 0.01.

Step 4 - Determine the critical value of z: we have a 1-tailed test, so we need to find z_α = z_{0.01}. From the standard normal distribution table, the area between the mean and z is 0.5 − 0.01 = 0.49, giving z_{0.01} = 2.33.


Example 1
Step 5 - Determine the rejection region: the null hypothesis will be rejected if the evidence supports μ < 25, so we have the following situation:

Since we are testing μ < 25, we are in the left-hand side of the normal curve, therefore the rejection region is z < −2.33.

Example 1
Step 6 - Determine the test statistic (z-score) from the sample data:

z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{23.5 - 25}{4.3/\sqrt{40}} = -2.21    (5)

Step 7 - Compare the test statistic against its critical value: −2.21 > −2.33, therefore z, and hence the sample mean x̄, does not lie in the rejection region. Hence, we do not reject H0 at the 0.01 significance level.

Step 8 - Our sample measurement is compatible with the supposed population mean at the 99% confidence level. Therefore there is no evidence, at this level, that the true mean is less than 25 vehicles per 30 seconds.
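
A hedged sketch of Example 1 in Python (SciPy), reproducing the numbers above:

import math
from scipy.stats import norm

mu0, sigma, x_bar, n, alpha = 25, 4.3, 23.5, 40, 0.01

se = sigma / math.sqrt(n)
z_stat = (x_bar - mu0) / se          # ~ -2.21
z_crit = -norm.ppf(1 - alpha)        # lower-tail critical value, ~ -2.33

print(f"z = {z_stat:.2f}, critical value = {z_crit:.2f}")
print("reject H0" if z_stat < z_crit else "do not reject H0", "at the 0.01 level")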

Exercise

The value of a well-observed angle was known to be 30° 15′ 30″. A new theodolite was tested against this angle for calibration. A sample of 36 arcs produced a mean of 30° 15′ 32″, with an SD of 6″. Is this value significantly different from the standard value at the 5% level of significance? Take 30° 15′ 30″ as the population mean. We therefore want to test whether the sample mean from the new data (30° 15′ 32″) indicates that this value is incorrect. We have: μ = 30° 15′ 30″, s = 6″, x̄ = 30° 15′ 32″, n = 36, α = 0.05.


Confidence Interval

The degree of confidence is defined as 1 − α. It is usually expressed as a percentage, called the confidence level (CL). Confidence is the probability that the mean (sample or population) lies within a confidence interval (CI). The confidence interval represents the region within which (1 − α) × 100% of all the sample means will lie:

CI = \mu \pm z_{\alpha/2}\,\sigma_{\bar{x}}

or

p(\mu - z_{\alpha/2}\,\sigma_{\bar{x}} \le \bar{x} \le \mu + z_{\alpha/2}\,\sigma_{\bar{x}}) = 1 - \alpha


Confidence Interval

The CI also represents the region in which we are (1 − α) × 100% likely to find the population mean, μ:

CI = \bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}

or

p(\bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}} \le \mu \le \bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}}) = 1 - \alpha

This means that we don't need to know μ in order to determine the confidence interval. For instance, for α = 0.05, the CL = 95% and z_{α/2} = 1.96.


Confidence Interval

The quantity z_{α/2} σ_x̄ is called the margin of error. This is not the precision of the data (which is just σ_x̄); rather, it gives the maximum allowable error. It is obviously desirable to have a low margin of error, because a low margin of error indicates that we have pinned down the mean quite precisely. However, a low margin of error implies a low z_{α/2}, and thus a low confidence level. Conversely, a high confidence percentage gives a large margin of error (because z_{α/2} is larger). So having high confidence does not imply that we have good data: it just means that we have allocated a wider range in which to place the measurement.
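
A minimal sketch of these quantities in Python (SciPy); σ = 21 and n = 49 are taken from the earlier standard-error example, while the sample mean of 103 is a made-up value used only to illustrate the formulas.

import math
from scipy.stats import norm

x_bar, sigma, n, alpha = 103, 21, 49, 0.05   # x_bar = 103 is hypothetical

se = sigma / math.sqrt(n)             # standard error of the mean = 3
z_half = norm.ppf(1 - alpha / 2)      # z_{alpha/2} ~ 1.96
margin = z_half * se                  # margin of error

print(f"margin of error = {margin:.2f}")
print(f"{100 * (1 - alpha):.0f}% CI: ({x_bar - margin:.2f}, {x_bar + margin:.2f})")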


Confidence intervals vs hypothesis tests

Since the confidence interval gives a range of values where we are (1 − α) × 100% likely to find μ or x̄, we can use it to perform a 2-tailed hypothesis test (but not a 1-tailed one). If significance is the probability of rejecting the null hypothesis when it is actually true (making a type I error), then confidence can be thought of as our degree of certainty in making the correct decision. Remember, a low value of α means a low probability of mistakenly rejecting H0, and therefore a high confidence in making the right decision. Conversely, a higher value of α means a higher probability of mistakenly rejecting H0, and therefore a lower confidence in making the right decision.


Confidence intervals vs hypothesis tests

Consider an extreme example, where α = 0. In a hypothesis test we have a zero probability of making a mistake (type I error), and we are extremely confident of our decision, with a confidence level of 100%. However, don't be fooled: the critical value is z_{α/2} = ∞, so the margin of error is infinite. So even if x̄ were a long, long way from μ we still wouldn't reject H0; in fact, we could never reject H0. Obviously, real-world examples would never have a zero significance level, but this example shows that as α decreases, it becomes harder to reject H0.


Confidence intervals vs hypothesis tests

Now consider:

H0: μ = μ0
Ha: μ ≠ μ0

where μ0 is the hypothesised value for the population mean. By analogy with the 8 steps to hypothesis testing for a 2-tailed test, we can see that the do-not-reject-H0 region is given by:

\mu_0 \pm z_{\alpha/2}\,\sigma_{\bar{x}}

So if the sample mean does not fall in this region, we must reject H0.


Example 1
The value of a well-observed angle was known to be 30° 15′ 30″. A new theodolite was tested against this angle for calibration. A sample of 36 arcs produced a mean of 30° 15′ 32″, with an SD of 6″. Is this value significantly different from the standard value at the 5% level of significance? We have: μ = 30° 15′ 30″, s = 6″, x̄ = 30° 15′ 32″, n = 36, α = 0.05. So the hypotheses are:

H0: μ = 30° 15′ 30″
Ha: μ ≠ 30° 15′ 30″

We can use the confidence interval method because this is a 2-tailed test. The critical value of z is z_{α/2} = z_{0.025} = 1.96. The standard error of the mean is σ_x̄ = 6″/√36 = 1″.

Example 1
So the confidence limits are z_{α/2} σ_x̄ = 1.96 × 1″ = 1.96″, and the confidence interval about the population mean is:

CI = 30° 15′ 30″ ± 1.96″ = 30° 15′ 28.04″ to 30° 15′ 31.96″

The sample mean (30° 15′ 32″, the green arrow in the figure) does not lie within this interval. Hence, we reject H0 with 95% confidence. Our sample measurement is incompatible with the supposed population mean at 0.05 significance. Therefore it follows that the true mean is not 30° 15′ 30″ at this level.

Example 2
What if we do the test at the 0.01 level of significance (99% confidence)? The critical value of z is now z_{α/2} = z_{0.005} ≈ 2.58, and the confidence limits are z_{α/2} σ_x̄ ≈ 2.58″. The confidence interval about the population mean is now:

CI = 30° 15′ 30″ ± 2.58″ = 30° 15′ 27.42″ to 30° 15′ 32.58″

The sample mean (30° 15′ 32″, the green arrow in the figure) does lie within this interval. Hence, we do not reject H0 at 99% confidence. Our sample measurement is now compatible with the supposed population mean at 0.01 significance. Therefore there is no evidence, at this level, that the true mean differs from 30° 15′ 30″.
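
A hedged sketch comparing the two confidence intervals in Python (SciPy), working in arcseconds past 30° 15′ 00″ for convenience:

import math
from scipy.stats import norm

# Work in arcseconds past 30 deg 15 min 00 sec.
mu0, x_bar, s, n = 30.0, 32.0, 6.0, 36
se = s / math.sqrt(n)                     # = 1 arcsecond

for alpha in (0.05, 0.01):
    z_half = norm.ppf(1 - alpha / 2)      # 1.96 and ~2.58
    lo, hi = mu0 - z_half * se, mu0 + z_half * se
    decision = "do not reject H0" if lo <= x_bar <= hi else "reject H0"
    print(f"alpha = {alpha}: CI = ({lo:.2f}, {hi:.2f}) arcsec -> {decision}")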

Conclusion

The data haven't changed, but the outcomes have. In Example 1 we rejected H0, but in Example 2 we didn't. Whereas in Example 1 we had a 5% chance of mistakenly rejecting H0, in Example 2 we only had a 1% chance of mistakenly rejecting it. So in Example 2 we accepted H0 not because the data got better or worse, but because we were allowed more freedom via a larger margin of error. In terms of statistical theory, while setting a low significance level means a low probability of mistakenly rejecting H0 (making a type I error), it raises the probability of making a type II error, i.e., mistakenly accepting H0 (and thus accepting any old rubbish!).


P-Values

You might ask, then, at what significance (or confidence) level do we only just reject H0? This is measured by the P-value. The P-value is the probability, if H0 were true, that the test statistic would take a value at least as extreme as that actually observed. It can be loosely thought of as the probability that a variable would assume a value greater than or equal to the observed value strictly by chance, assuming that H0 is true. Or, even more loosely, as the statement "there's only a P chance that this result could have happened by coincidence".


P-Values
Suppose you took a random sample from a normally-distributed population with mean μ0. You would not, in general, expect the mean of this sample, x̄0, to be exactly equal to μ0, but if you are pretty sure that the population mean really is μ0 (i.e., the null hypothesis), then x̄0 shouldn't be too far off. Now suppose that you found that x̄0 was much greater than μ0. This could be due to one of two reasons. Either:

1. the null hypothesis is false; or
2. the result is just a fluke, a coincidence of your random sampling happening to choose data that had a large mean; in this case we would not want to change the null hypothesis.

What the P-value does is to help us choose between these two options.


P-Values
The P-value is the chance of obtaining your results if the null hypothesis is true. Or, in terms of our example:

P = p(\bar{x} \ge \bar{x}_0), given a true H0

Put another way, the P-value tells us that random sampling from the population would lead to a value larger than x̄0 in 100P% of experiments, and smaller than x̄0 in 100(1−P)% of experiments. The smaller the P-value, the less likely it is that the result we got was simply the product of chance, and the more likely it is that the result indicates that the null hypothesis is incorrect. Suppose we had found that P = 0.3. This means that if the experiment were performed 100 times, we would expect to observe a sample mean of at least x̄0 about 30 times, even though the true mean is μ0. So getting a value of x̄0 would not be that unusual, and we would not need to change the null hypothesis.

P-Values
However, if we had found that P = 0.01, then if the true mean were μ0, getting a result of x̄0 purely by chance would only happen once in every 100 trials, which is quite rare. The low P-value, therefore, suggests that our result is not likely to be the product of pure chance, and it is very likely that the H0 assumption (mean = μ0) is wrong. In summary (loosely speaking):

P-value    Fluke?                      H0 decision
high       yes - a fluke               do not reject
low        no - a good measurement     reject

For P-values that are neither large nor small, any decision on the null hypothesis should be made by assigning a significance level.


Example 1a
A certain quantity has an accepted value (population mean) of 648 with a (population) standard deviation of 20.46. A new experiment of 36 observations finds that the (sample) mean is 655. What is the probability of observing a sample mean of 655 or larger from a population with a true mean of 648? We can quickly find that z = 2.053, and therefore P = p(x̄ ≥ 655) = 0.02.

This small P-value implies that we would be unlikely (2 times out of every hundred) to measure a value of 655 purely by chance if the true mean really is 648. This suggests that the null hypothesis (H0: μ ≤ 648) might not be true.

Example 1b

However, suppose the new experiment found the sample mean to be 651. The z-score is now 0.88, and P = p(x̄ > 651) ≈ 0.19. This means that, given a true mean of 648, the chance of observing a sample mean of at least 651 is much higher than before (19%). So although 651 is different from 648, we could reasonably say that the difference is small enough to be attributable to chance. Note that the wording "if H0 were true" in the definition of P-values is important. This is because the resulting P-value depends on the z-score, which in turn depends on the value of the population mean in H0. Thus, you have to assume H0 is true before you can calculate the P-value.
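
A hedged sketch of these P-value calculations in Python (SciPy):

import math
from scipy.stats import norm

mu0, sigma, n = 648, 20.46, 36
se = sigma / math.sqrt(n)                 # = 3.41

for x_bar in (655, 651):                  # Examples 1a and 1b
    z = (x_bar - mu0) / se
    p = 1 - norm.cdf(z)                   # upper-tail P-value
    print(f"x-bar = {x_bar}: z = {z:.3f}, P = {p:.3f}")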


P-Values
As a final word, care must be taken when interpreting P-values. In particular:

1. The P-value is not the probability that the null hypothesis is true; and 1 − P is not the probability that the alternative hypothesis is true.
2. The P-value is not the probability that a finding has occurred by chance. It is the probability of obtaining a certain result when the true mean has a known value; that result could then be interpreted as having occurred by chance, or not.
3. The P-value is not the probability of mistakenly rejecting the null hypothesis; that is the significance level.
4. The significance level of the test is not determined by the P-value. The significance level is a value that should be decided upon by the analyst before the data are viewed, and it is compared against the P-value calculated after the test has been performed.

P-values and hypothesis tests


As we have seen, hypothesis testing involves comparing an observed test statistic (z) against a critical value determined from the level of significance (z_α), and making a decision based on their relative values. We can also make this decision by comparing the P-value with the significance level, α.

For a 1-tailed test, the relationship between P and α is: if P ≤ α, reject H0 (the data are significant); if P > α, do not reject H0 (the data are not significant).

For a 2-tailed test, the relationship between P and α is: if P ≤ α/2, reject H0 (the data are significant); if P > α/2, do not reject H0 (the data are not significant).


Example 2a

A certain quantity has an accepted value of 648 with a population standard deviation of 20.46. A new experiment of 36 observations finds that the sample mean is 655. Do the new data provide enough evidence to cause us to change the accepted value, at the 0.05 significance level?

a) Critical score method: H0: μ ≤ 648; Ha: μ > 648; α = 0.05; z_{0.05} = 1.65; z = 2.053. Since z > z_{0.05}, reject H0 at α = 0.05: we must change the accepted value.

b) P-value method: H0: μ ≤ 648; Ha: μ > 648; α = 0.05; z = 2.053, so P = 0.02. Since P < α, reject H0 at α = 0.05: we must change the accepted value.


Example 2b

As for Example 2a, except the sample mean is 651.

a) Critical score method: H0: μ ≤ 648; Ha: μ > 648; α = 0.05; z_{0.05} = 1.65; z = 0.88. Since z < z_{0.05}, do not reject H0 at α = 0.05: do not change the accepted value.

b) P-value method: H0: μ ≤ 648; Ha: μ > 648; α = 0.05; z = 0.88, so P = 0.19. Since P > α, do not reject H0 at α = 0.05: do not change the accepted value. In fact, we would not reject H0 until α reached 0.19.
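
A minimal sketch of Examples 2a and 2b in Python (SciPy), comparing the critical-score and P-value decisions:

import math
from scipy.stats import norm

mu0, sigma, n, alpha = 648, 20.46, 36, 0.05
z_crit = norm.ppf(1 - alpha)              # 1-tailed critical value, ~1.645

for x_bar in (655, 651):                  # Examples 2a and 2b
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    p = 1 - norm.cdf(z)
    by_critical = "reject H0" if z > z_crit else "do not reject H0"
    by_p_value = "reject H0" if p <= alpha else "do not reject H0"
    print(f"x-bar = {x_bar}: z = {z:.2f} -> {by_critical}; P = {p:.3f} -> {by_p_value}")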


P-values and hypothesis tests

So, the P-value can be viewed as the smallest significance level at which the data are significant. The Example 1a data are significant at the 0.05 level, but not at the 0.01 level; in fact, they are significant at the 0.02 level. However, while they might appear to be strongly related, the P-value and the significance level are quite different concepts. That is, the P-value cannot be viewed as the probability of mistakenly rejecting a null hypothesis.

