
The magical P-value

The Green Cow April 30, 2013

In past entries I sometimes mentioned P-values. To give the non-statisticians among my readers an idea of what these are, I explained them as the chance of being mistaken. This explanation is a bit vague and not completely accurate. P-values are very important in statistics; you will see them in almost any scientific work. Bold statements are often accompanied by the note P<0.05, and once this value for P is stated, there seems to be no more room for discussion. However, P is not a magical number. Understanding what a P-value means improves anyone's understanding of science. Therefore, I will try to explain here what a P-value is.

Distributions
In order to understand what P-values are, it is necessary to know what distributions are. I will illustrate this with results from a very nice Swiss study.[1][2] Just take a look at figure 1 to start. This figure gives the percentage of 19-year-old Swiss men with different heights over time. The lines in this figure give what is called the distribution of height among these men. Let us focus on the curve for 2008-09. Around 6.5% of the men then had a height of 178.2cm, the average height. (This is the top of the curve.) The minimum and maximum measured heights were 147cm and 208cm, respectively. For each height between these extremes, we can look up what percentage of men had that height. (That is the percentage you can read on the vertical axis.) When you look at the distributions at different times in figure 1, you see that the shape of the curve is more or less the same each time. Distributions with this shape are called normal distributions. This type of distribution has certain properties that are used in calculating P-values. For instance, in a normal distribution 50% of the people are below the average and 50% are above it. In the example of the height of Swiss men, this means that in 2008-09, 50% were taller than 178.2cm.

[1] Swiss people seem to have a talent for these detailed, well-organized studies.
[2] K Staub, FJ Rühli, U Woitek, C Pfister, The average height of 18- and 19-year-old conscripts (N=458,322) in Switzerland from 1992 to 2009, and the secular height trend since 1878, Swiss Med Wkly. 2011;141:w13238

Figure 1: Height of 19-year-old Swiss men over time

For a normal distribution, we can calculate something called the standard deviation. I will not bother you with how it is calculated; most important is to know what it means. The standard deviation is a measure of how wide the curve is. In our example the standard deviation was 6.2cm. The properties of a normal distribution are such that, from the average (178.2cm) and the standard deviation (6.2cm), we know that 68.2% of the Swiss men had a height between 172cm and 184.4cm. (That is 178.2cm-6.2cm and 178.2cm+6.2cm.) To get a better idea of what all this means, compare figure 1 with figure 2. The latter represents data from a hypothetical population with an average height of 178.2cm and a standard deviation of 20cm. You see that this distribution is much wider.[3]
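The numbers in this section can be checked with Python's built-in `statistics.NormalDist`. This is a minimal sketch that assumes the 2008-09 heights follow a normal distribution exactly, with the average (178.2cm) and standard deviation (6.2cm) given above; the 20cm standard deviation is the hypothetical population of figure 2, and the helper `share_between` is an illustrative name.

```python
from statistics import NormalDist

# Assumption: heights of 19-year-old Swiss men in 2008-09 are exactly normal
# with the average and standard deviation quoted in the text.
heights_2008 = NormalDist(mu=178.2, sigma=6.2)

# Hypothetical population from figure 2: same average, much wider curve.
heights_wide = NormalDist(mu=178.2, sigma=20.0)


def share_between(dist: NormalDist, low: float, high: float) -> float:
    """Fraction of the population with a height between low and high (cm)."""
    return dist.cdf(high) - dist.cdf(low)


# Half of the men are taller than the average.
share_taller_than_average = 1 - heights_2008.cdf(178.2)

# One standard deviation on either side of the average: 172.0cm to 184.4cm.
share_within_one_sd = share_between(heights_2008, 172.0, 184.4)

# The same interval covers far fewer people in the wider population.
share_within_interval_wide = share_between(heights_wide, 172.0, 184.4)

print(f"taller than 178.2cm: {share_taller_than_average:.1%}")
print(f"172.0-184.4cm (sd 6.2cm): {share_within_one_sd:.1%}")
print(f"172.0-184.4cm (sd 20cm): {share_within_interval_wide:.1%}")
```

Running it prints 50.0%, 68.3% and 24.3%. The 68.3% is the normal-curve value that the text gives as 68.2%; the last line shows how much less of the wider population falls within the same 12.4cm band.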

Figure 2: Heights in a hypothetical population with an average of 178.2cm and a standard deviation of 20cm

[3] If you have any difficulty seeing that both graphs represent the same thing, imagine a curve connecting the midpoints of the tops of the bars in figure 2.

P-value
What is the connection between these figures and P-values, you will ask. Well, let's see. First: the P is an abbreviation for probability, the chance of something happening. As I mentioned, in the Swiss example there was a 50% chance of being taller than 178.2cm. More precisely, that means that when we choose someone from the Swiss male population at random (that is: by chance), there is a 50% chance that he will be taller than 178.2cm. From the properties of the normal distribution, we also know that there is a chance of roughly 2.5% that he will be taller than 190.6cm, which is two standard deviations above the average (178.2cm + 2 × 6.2cm). In statistical tests we use the same kind of distributions. We are then not looking at heights but at quantities called statistics. I will illustrate this in a minute. We know how these statistics are distributed, for instance by the normal distribution, and they are used to test a certain hypothesis. The chance that a statistic has a value at least as extreme as the one we observed is calculated from an average and a standard deviation, just as I did with the chance that a random 19-year-old Swiss man in 2008-09 was taller than 178.2cm (50%) or 190.6cm (roughly 2.5%). This chance, or probability, calculated assuming that the hypothesis is true, is then reported as the P-value. Typically, if this chance is less than 5%, and thus P<0.05, we reject the hypothesis we tested, that is, we treat it as false. When a result is reported as statistically significant, this is what is meant.
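The tail probability used above can be computed the same way. A minimal sketch, again assuming an exactly normal distribution with average 178.2cm and standard deviation 6.2cm; the helper name `upper_tail_probability` is only illustrative.

```python
from statistics import NormalDist

# Assumption: 2008-09 heights are exactly normal (average 178.2cm, sd 6.2cm).
heights_2008 = NormalDist(mu=178.2, sigma=6.2)


def upper_tail_probability(dist: NormalDist, value: float) -> float:
    """Chance that a randomly chosen value from dist is larger than value."""
    return 1 - dist.cdf(value)


# 190.6cm is two standard deviations above the average (178.2 + 2 * 6.2).
p_taller_than_190_6 = upper_tail_probability(heights_2008, 190.6)

# The height that exactly 2.5% of the men exceed lies 1.96 standard
# deviations above the average.
height_top_2_5_percent = heights_2008.inv_cdf(0.975)

print(f"P(height > 190.6cm) = {p_taller_than_190_6:.4f}")
print(f"2.5% of men are taller than {height_top_2_5_percent:.1f}cm")
```

This prints 0.0228 and 190.4cm: the exact chance of being taller than 190.6cm is about 2.3%, and the familiar 2.5% belongs to 1.96 rather than 2 standard deviations. A P-value is computed the same way, with the height replaced by a test statistic and the distribution replaced by the one that statistic would follow if the tested hypothesis were true.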

Example
This probably all sounds very complicated if you have never taken a course in statistics. Therefore, I will illustrate it with another example on the height of Swiss men. We see in figure 1 that the curve (the distribution) of the heights in 1878-79 lies to the left of the curve in 2008-09. What we want to know is whether this difference is real or could have arisen by chance. If it did not arise by chance, Swiss men truly were shorter in the 19th century than they were in 2008-09. First, we assume that there was no real difference in height over time. This is the hypothesis we want to test. Then we have to calculate a statistic. That sounds very complicated, but in this case we simply use the difference between the means at the two time periods. In 1878-79 the mean height among the Swiss men was 163.3cm. In 2008-09 it was 178.2cm, as we said before. The difference between these two numbers is 14.9cm. That is the value of the statistic. We assumed that there was no real difference between the two time periods, so the true difference would have been 0cm. In that case the statistic would have been 0.

Before we do the test, let's think for a minute. It is possible that by chance we measured the shortest men in 1878-79 and, also by chance, the tallest men in 2008-09. Then there could seem to be a difference between the two time periods while there actually is none; the difference of 14.9cm would then have been found by chance. To know whether this is the explanation, we could measure a lot of Swiss men again and calculate the difference between the time periods again. It would probably be another value. If there was no real difference between the time periods, we expect our statistic to be close to 0cm on average when we repeat our little experiment. If we repeated the experiment again and again, one time we would find a difference of 1cm, another time one of 2cm. It could also be that the men measured for 2008-09 had a smaller average height than those measured for 1878-79; the difference would then be -1cm, so to speak.[4] In that case, the original value of 14.9cm would have to be considered a fluke. If we plotted all those differences between the average of the men in 1878-79 and those in 2008-09, measured in different experiments, the differences would also follow a normal distribution. Just believe me on that for a moment.[5] The average of this normal distribution would be 0, as we assumed this is the true difference. What we are now going to calculate is the chance that our statistic is 14.9cm or more, given that our assumption was true and there actually was no difference in average height over time.
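The thought experiment of repeating the measurements again and again can be simulated. This is a minimal sketch under stated assumptions: both periods are drawn from the same normal distribution (so the hypothesis of no difference is true by construction), with average 178.2cm, standard deviation 6.2cm and a hypothetical 100 men per period. None of these choices come from the Swiss study's 1878-79 data, and `one_experiment` is an illustrative name.

```python
import random
from statistics import NormalDist, fmean, stdev

# Assumptions (hypothetical, for illustration only): no real difference between
# the periods, heights normal with mean 178.2cm and sd 6.2cm, 100 men per period.
POPULATION = NormalDist(mu=178.2, sigma=6.2)
MEN_PER_PERIOD = 100
EXPERIMENTS = 2000


def one_experiment(rng: random.Random) -> float:
    """Measure two groups from the same population; return late mean minus early mean."""
    early = [rng.gauss(POPULATION.mean, POPULATION.stdev) for _ in range(MEN_PER_PERIOD)]
    late = [rng.gauss(POPULATION.mean, POPULATION.stdev) for _ in range(MEN_PER_PERIOD)]
    return fmean(late) - fmean(early)


rng = random.Random(2013)  # fixed seed so the run is reproducible
differences = [one_experiment(rng) for _ in range(EXPERIMENTS)]

# Theory says the differences have spread sd * sqrt(2 / n) around 0.
theoretical_spread = POPULATION.stdev * (2 / MEN_PER_PERIOD) ** 0.5
share_within = sum(abs(d) <= theoretical_spread for d in differences) / EXPERIMENTS

print(f"average difference: {fmean(differences):.3f}cm")  # close to 0
print(f"spread of differences: {stdev(differences):.3f}cm")
print(f"theoretical spread sd*sqrt(2/n): {theoretical_spread:.3f}cm")
print(f"within one spread of 0: {share_within:.1%} (normal curve: about 68%)")
print(f"largest difference seen: {max(differences, key=abs):.2f}cm")  # far below 14.9cm
```

The simulated differences pile up around 0 in a bell shape, with roughly 68% of them within one spread of 0, as for any normal distribution. With these assumed group sizes, no chance difference comes anywhere near 14.9cm.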
Thus, just as we know that the chance of being taller than 190.6cm for a random man from the Swiss population in 2008-09 was roughly 2.5%, we now look at the chance of a random experiment giving a statistic of 14.9cm or more, still assuming that there actually is no difference. We need to correct for the number of men that were measured and for the standard deviation. For the moment I will not bother you with that; if you want to know the details, check out Wikipedia.[6] Basically, the more men are measured, the narrower the distribution of the differences becomes, and the smaller the chance that we find a large difference by mistake. It turns out that, with the large number of men measured by the Swiss researchers in our example, the chance of finding a difference of 14.9cm between the two groups while in reality there is no difference is almost zero. What you will then see in a report is P<0.001, or something like that. The conclusion is then that, if there really were no difference, a difference this large would turn up by chance in less than 1 out of 1000 experiments. We therefore reject our hypothesis that there was no difference.

[4] The size of the difference is still 1cm, but now the group that was shorter in the first experiment is the taller one.
[5] This we know from the so-called Central Limit Theorem.
[6] Here
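The correction for the number of men and the standard deviation can be sketched as a two-sample z-test, one standard way to compare two averages (the Swiss researchers may well have used a different method). The averages 163.3cm and 178.2cm come from the text; the standard deviation of 6.2cm for both periods and all group sizes are hypothetical placeholders, since they are not given here for 1878-79. The function name `difference_p_value` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def difference_p_value(mean_early: float, mean_late: float,
                       sd_early: float, sd_late: float,
                       n_early: int, n_late: int,
                       two_sided: bool = False) -> float:
    """Z-test P-value for a difference in means, assuming the true difference is 0.

    Under that assumption the observed difference is (approximately) normal with
    mean 0 and standard error sqrt(sd_early**2 / n_early + sd_late**2 / n_late).
    """
    standard_error = sqrt(sd_early ** 2 / n_early + sd_late ** 2 / n_late)
    z = (mean_late - mean_early) / standard_error
    if two_sided:
        # Count differences at least this extreme in either direction.
        return 2 * NormalDist().cdf(-abs(z))
    # Chance of a difference at least this large; cdf(-z) stays accurate for
    # large z, where 1 - cdf(z) would round to 0 too early.
    return NormalDist().cdf(-z)


# Averages from the text; sd of 6.2cm and 1000 men per period are hypothetical.
p_swiss = difference_p_value(163.3, 178.2, 6.2, 6.2, 1000, 1000)
# Hypothetical 1cm difference: with few men it is easily a chance result,
# with many men the same difference becomes very unlikely under "no difference".
p_1cm_20_men = difference_p_value(177.2, 178.2, 6.2, 6.2, 20, 20)
p_1cm_2000_men = difference_p_value(177.2, 178.2, 6.2, 6.2, 2000, 2000)

print(f"14.9cm, 1000 men per period: P = {p_swiss:.3g}")
print(f"1cm, 20 men per period:      P = {p_1cm_20_men:.3g}")
print(f"1cm, 2000 men per period:    P = {p_1cm_2000_men:.3g}")
```

The one-sided value matches the question asked in the text (a difference of 14.9cm or more); `two_sided=True` doubles it, giving the two-sided P-value that many reports use. With these placeholder inputs the first P-value is so small that it prints as 0, which a report would write as P<0.001. In every case the number is a chance computed under the assumption of no difference, not the chance that this assumption is true.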

Conclusion
I hope I have explained clearly what a P-value actually is and how it is used in statistics. I know that it probably sounds a bit complicated when you first read about P. Most important is that you realize that it is nothing more or less than a chance, or probability, of something happening: the chance of finding a result at least as extreme as yours if the hypothesis you tested is true. The typical limit for statistical significance, P<0.05, means that even when the tested hypothesis is true, there is still a five percent chance of finding a "significant" result by chance alone. The second thing to remember is that we never prove that something is correct; we can only show that a specific hypothesis fits the data badly and is therefore most likely wrong.
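What the 5% limit does and does not promise can be checked by simulation. A minimal sketch, assuming the hypothesis of no difference is true by construction: both groups are drawn from the same normal population, the standard deviation of 6.2cm is treated as known, and each experiment measures a hypothetical 50 men per group. The names `two_sided_p` and `ALPHA` are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean
from typing import Sequence

# Assumptions (hypothetical): both groups come from the same population, so
# every "significant" result is a false alarm; sigma = 6.2cm is treated as known.
MU, SIGMA = 178.2, 6.2
MEN_PER_GROUP = 50
EXPERIMENTS = 2000
ALPHA = 0.05  # the usual significance limit


def two_sided_p(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Two-sided z-test P-value for the difference in means, with known SIGMA."""
    standard_error = SIGMA * sqrt(1 / len(group_a) + 1 / len(group_b))
    z = (fmean(group_b) - fmean(group_a)) / standard_error
    return 2 * NormalDist().cdf(-abs(z))


rng = random.Random(30)  # fixed seed so the run is reproducible
false_alarms = 0
for _ in range(EXPERIMENTS):
    group_a = [rng.gauss(MU, SIGMA) for _ in range(MEN_PER_GROUP)]
    group_b = [rng.gauss(MU, SIGMA) for _ in range(MEN_PER_GROUP)]
    if two_sided_p(group_a, group_b) < ALPHA:
        false_alarms += 1

false_alarm_rate = false_alarms / EXPERIMENTS
print(f"'significant' although there is no difference: {false_alarm_rate:.1%}")
```

The printed rate lands close to 5%: when the tested hypothesis is true, about one experiment in twenty still crosses the P<0.05 limit. That is the five percent chance the limit allows for; it does not mean that any particular significant result has a 5% chance of being wrong.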
