Sadly, we do not live in such a simple world and statistics come into play. The ratios
might be slightly "skewed" - a math term for "off" - due to the number of individuals you
collected or just "bad luck". The ratios we expect from a dihybrid cross are not always
what we get in the experiment. One way to combat the problem of statistics is to use
statistics!
Let's take a step back and look at some of Mendel's original work with monohybrids
because they are an easier place to start. You will recall that Mendel did a lot of
monohybrid experiments and collected a lot of data from a lot of plants and crosses.
Here's one of those sets of data that I showed you earlier.
P = smooth seeds crossed with wrinkled seeds
F1 = all smooth seeds (so smooth is dominant and wrinkled is recessive)
F2 = 5,474 smooth seeds and 1,850 wrinkled seeds is a ratio of 2.96 : 1
Mendel and the Punnett square tell us that we should have a ratio of 3 : 1 not 2.96 : 1! So
is Mendel wrong? Is the Punnett square wrong? Is our entire understanding wrong?!
That depends upon how different the actual, observed numbers are from the calculated,
expected numbers. But how close is close enough? Is 2.96 : 1 close enough to 3 : 1 that
we should accept Mendel's ideas? Some folks would argue, "Well, Mendel says it should
be 3 : 1 and it is not 3 : 1, so Mendel is wrong!" But someone else would argue, "Hey,
lighten up! I think 2.96 : 1 is close enough to 3 : 1 so I will not reject Mendel's ideas."
Mendel wasn't bothered by the fact that his data was a little off because he knew that,
statistically speaking, he was "within bounds". But what are those bounds and how do
you calculate them? That's when Mendel fell back on his knowledge of math and showed
that these tiny differences were not significant enough to cause him to throw it all away.
In chi-square analysis you compare the number of individuals of a certain phenotype (or
anything else) that you have found in the experiment to the number you expected to have.
That is, you find the difference between the observed and the expected by simply
subtracting one from the other. [It doesn't matter which one you subtract from which - all
you want is the difference.]
Then, just to make that difference bigger and to make it always a positive number, you
square it (multiply it by itself). That gives you the "squared difference".
Then you divide the "squared difference" by what you expected in the first place in order
to give you a "squared difference per expected" for that group. [This step brings the
numbers into a reasonable zone to work with but the reason you do it has to do with the
theory of statistics and I won't go into that!]
Naturally, you have to take into account all the different types, and you do that by adding
together these "squared differences per expected" values. The final sum (of the "squared
differences per expected") gives you a number called the χ². (Scared yet?)
By the way, chi (χ) is the Greek letter for "c" which mathematicians often use as an
abbreviation for "comparisons". The chi-square (χ²) is a "comparison squared".
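The subtract, square, divide and add procedure described above can be sketched in a few lines of Python. This is only an illustration: the function name chi_square is my own choice, and the numbers are Mendel's smooth and wrinkled counts together with the counts a perfect 3 : 1 ratio would give for the same total.

```python
def chi_square(observed, expected):
    """Add up the "squared differences per expected" over all categories."""
    total = 0.0
    for o, e in zip(observed, expected):
        difference = o - e                    # observed minus expected
        squared_difference = difference ** 2  # always positive
        total += squared_difference / e       # divide by the EXPECTED number
    return total

observed = [5474, 1850]  # smooth, wrinkled (Mendel's F2 data)
expected = [5493, 1831]  # the same 7,324 seeds split exactly 3 : 1
chi2 = chi_square(observed, expected)
print(round(chi2, 3))
```

Each pass through the loop handles one category, so the same function works for two categories or for four.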
Step 3: congratulate yourself for having gotten through the toughest part!
Step 4: SUM (ADD up) the "squared differences per expected" from all the categories.
In this case there are only two categories so there are only two values to add. Add the
value you calculated for the smooth (0.066) to the value you calculated for the wrinkled
(0.197) to get 0.263.
Step 5: COMPARE our chi-square value to the value in a chi-square significance table and
determine if our value is significant.
The chi-square significance table has been developed by statisticians. These tables come
in all shapes and sizes depending upon how exact you want to be and how many
categories you are dealing with. For our work we want to know if these results pass a
significance level of 5%. (This is a fairly good level of significance and is often used as a
"cut-off" in experiments like these.)
Chi Square Significance Table
Degrees of Freedom    5% Significance Levels
1                     3.84
2                     5.99
3                     7.81
4                     9.49
OK, what does this mean? What is this "degrees of freedom" stuff?
The simple answer is that your degrees of freedom are one less than the number of
categories you have to work with. [The complicated answer is that degrees of freedom
are the number of values that can be randomly assigned while the total is left unchanged.
Don't worry about it.]
We have two categories, smooth and wrinkled, so we have one degree of freedom and
you see from this table that with one degree of freedom we could be allowed a chi-square
as large as 3.84 and the results would still be an acceptable fit at the 5% level. That is, we
would have to get a chi-square value over 3.84 before we would say that our results were
so far from a 3 : 1 ratio that we would have to reject that ratio (and Mendel's explanation
of how he got that ratio). Or, to put that another way, a chi-square of 3.84 or more would
turn up only 5% of the time by chance alone if the true ratio really were 3 : 1. Our
χ² = 0.263 is far below that cut-off, so the small difference between 2.96 : 1 and 3 : 1 is
easily explained by chance and we have no reason to reject the 3 : 1 ratio in this experiment.
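The table look-up can be written down as a small sketch. The 5% values are the ones from the table above; the names CRITICAL_5_PERCENT, degrees_of_freedom and ratio_is_acceptable are made up for this illustration.

```python
# 5% significance levels from the table, keyed by degrees of freedom
CRITICAL_5_PERCENT = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49}

def degrees_of_freedom(number_of_categories):
    """One less than the number of categories."""
    return number_of_categories - 1

def ratio_is_acceptable(chi2, number_of_categories):
    """True if the chi-square is NOT larger than the 5% table value."""
    df = degrees_of_freedom(number_of_categories)
    return chi2 <= CRITICAL_5_PERCENT[df]

# Smooth versus wrinkled: two categories, chi-square of 0.263
print(degrees_of_freedom(2))
print(ratio_is_acceptable(0.263, 2))
```

Only a chi-square larger than the table value leads us to reject the ratio we tested.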
Let's think a bit more about the chi-square and what it has told us.
Imagine that Mendel's work had come out with exactly the ratio we expected. That is,
imagine Mendel observed in this experiment 5,493 smooth seeds and 1,831 wrinkled
seeds. That is an exact 3 : 1 ratio.
Let's do a quick chi-square on that imaginary result.
Looking first at the smooth seeds we would see that the difference between the observed
and expected is zero! (That's because 5,493 - 5,493 = 0.) When we square zero we still
get zero. If we divide zero by the expected value we get zero!
The same happens when we calculate the values for the wrinkled seeds too.
Now we would add those two values together (because they are the "squared differences
per expected") to get a final χ² = 0.
In other words, when the chi-square equals zero the experimental results are in exactly
the ratio expected! [This rarely happens.]
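You can check this imaginary result quickly. The sketch below uses the exact 3 : 1 numbers from the text as both the observed and the expected counts.

```python
observed = [5493, 1831]  # imaginary "perfect" result: smooth, wrinkled
expected = [5493, 1831]  # exact 3 : 1 split of 7,324 seeds

# Every difference is zero, so every term in the sum is zero
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)
```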
Conversely, the farther the chi-square gets from zero the less likely the ratio "rule" is
being followed. If the chi-square had been 1.9 (instead of 0.263) we would have been less
confident but still within the 5% significance range. (Right?)
Degrees of Freedom    5% Significance Levels
1                     3.84
2                     5.99
3                     7.81
4                     9.49
As a matter of fact, we could have gotten a chi-square value as high as 3.84 and still feel
that we were close enough to the 3 : 1 ratio to not be worried. With a true 3 : 1 ratio, the
chances of getting a chi-square of 3.84 or more by chance (by "accident") are 5%. But if
our chi-square value was larger than 3.84 we would be drifting into uncertainty. If the
chi-square were 13.4 we would not feel at all comfortable and would have a good reason
to suspect that the 3 : 1 ratio did not apply. With a larger and more detailed table we could
even see to what level our confidence had dropped!
Here's another set of results from Mendel's monohybrid cross experiments. Let's do the
chi-square analysis of it.
Here I'll condense the "steps". You'll see it flows a little bit better and there is less
"hand holding" or explanation.
P = green seeds crossed with yellow seeds
F1 = all yellow seeds (So which color is dominant? I hope you agree that yellow
dominates green seeds.)
F2 = 6,022 yellow seeds and 2,001 green seeds
Is this close enough to the 3 : 1 ratio we expect?
Notice that, because we are working with only two categories, the "squared differences"
are the same in both groups (22.56) because the differences are the same. (They MUST be
the same if there are only two groups! Think about it.) However, the "squared differences
per expected" are different because we have different expectations for the two groups
(6,017.25 to be yellow but 2,005.75 to be green) so we divide by different numbers. I point
this out because it can be used to highlight two of the most common mistakes in doing the
chi-square. Your "squared differences" in an experiment with only two categories (one
degree of freedom) must be the same - if they are not you made a math error. However, it
is very unlikely that your "squared differences per expected" are the same unless you
expected the same number for each group (a 1 : 1 ratio) or you made the common mistake
of dividing both groups by the same number. Watch your numbers and pay attention.
Third, sum (add up) the "squared differences per expected" from all the categories.
That's 0.011 + 0.004 = 0.015 so your χ² = 0.015.
Wow, that's even better than before but let's look at the table just to make sure. We are
still working with only one degree of freedom. (Right?)
Degrees of Freedom    5% Significance Levels
1                     3.84
2                     5.99
3                     7.81
4                     9.49
Obviously, the ratio observed in this experiment (6,022 yellow : 2,001 green or a
3.01 : 1) is not so far off from the 3 : 1 ratio as to cause concern.
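If you want to check the yellow and green numbers yourself, here is a minimal sketch. The variable names are my own. It prints, for each group, the expected number, the "squared difference" and the "squared difference per expected", so you can see that the squared differences match while the per-expected values do not.

```python
observed = {"yellow": 6022, "green": 2001}
ratio = {"yellow": 3, "green": 1}   # the 3 : 1 ratio being tested

total = sum(observed.values())      # all the F2 seeds
parts = sum(ratio.values())         # 3 + 1 = 4 parts

chi2 = 0.0
for phenotype, o in observed.items():
    e = total * ratio[phenotype] / parts  # expected number for this group
    squared_difference = (o - e) ** 2     # same value for both groups
    per_expected = squared_difference / e # differs: divided by different E
    chi2 += per_expected
    print(phenotype, e, squared_difference, round(per_expected, 3))

print("chi-square =", round(chi2, 3))
```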
Perhaps you found it difficult to follow through all those steps without a simple
"formula". This is a good time to present the formula in order to show you what you have
been doing and to help you in the future.
χ² = Σ [(O - E)² / E]
"O" is the number observed and "E" is the number expected.
The part within the brackets, (O - E)²/E, is the procedure you use to find the difference
(O - E), then square it, (O - E)², and then divide by the number expected, (O - E)²/E.
That's what you do for all categories (smooth and wrinkled, green and yellow, etc.).
The symbol "Σ" is called "sigma" and is used throughout math to mean "sum". Here it
tells you to add together (sum) the values you calculated for each category.
Some people enjoy equations and some people are panicked by them! Try to get used to
understanding and using this chi-square equation. I will not expect you to memorize the
equation, but I will expect you to "do the chi-square" and this formula will be useful to
help you through all those steps.
First, what would be the "perfect" 1 : 1 ratio among this group of seeds?
There is a total of 6,024 seeds (still) so a 1 : 1 ratio should show us 3,012 yellow seeds and
3,012 green seeds.
OK, that's what we expected. Now let's do the chi-square.
Let's do the greens first. That's (O - E)²/E = (2,937 - 3,012)²/3,012 = 1.867.
The yellows will be (O - E)²/E = (3,087 - 3,012)²/3,012 = 1.867 (again).
[In this example the "squared differences per expected" should be the same because here
you expect the same number in each group (a 1 : 1 ratio) in this two-category puzzle. ONLY
when you have a 1 : 1 ratio to test on a two-category chi-square will you get equal "squared
differences per expected".]
Adding them together gives me the χ² = 3.734 and I compare that with the values in the
table.
Degrees of Freedom    5% Significance Levels
1                     3.84
2                     5.99
3                     7.81
4                     9.49
I see that my new χ², using a 1 : 1 ratio, is low enough to be within the acceptable range.
(This is less than 3.84.) Therefore, the results of this experiment, although far from being
3 : 1, are consistent with 1 : 1. I think they are really 1 : 1!
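Here is a rough sketch that tests the same observed numbers (3,087 yellow and 2,937 green) against both ratios. The function name chi_square is my own, and treating yellow as the "3" in the 3 : 1 ratio is an assumption based on yellow being dominant. Carried to more decimal places the 1 : 1 chi-square comes out at about 3.735; the 3.734 above comes from adding the two 1.867 values.

```python
def chi_square(observed, ratio):
    """Chi-square of observed counts against an expected ratio."""
    total = sum(observed)
    parts = sum(ratio)
    chi2 = 0.0
    for o, r in zip(observed, ratio):
        e = total * r / parts        # expected number for this category
        chi2 += (o - e) ** 2 / e
    return chi2

CUT_OFF = 3.84                       # 5% level, one degree of freedom
observed = [3087, 2937]              # yellow, green

chi2_3_to_1 = chi_square(observed, [3, 1])  # assumes yellow is the "3"
chi2_1_to_1 = chi_square(observed, [1, 1])

print("3 : 1 rejected?", chi2_3_to_1 > CUT_OFF)
print("1 : 1 chi-square:", round(chi2_1_to_1, 3))
print("1 : 1 rejected?", chi2_1_to_1 > CUT_OFF)
```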
Surprised? Well, you shouldn't be. First off, the observed ratios look closer to 1 : 1 than
to 3 : 1. (Right?) Second, I didn't tell you that this experiment was an F2 population of
seeds. (Did I?) Indeed, I made these numbers up to represent the results you would get
from a test cross where the unknown genotype turns out to be heterozygous. In other
words, this is an acceptable ratio if one parent was ss and the other parent was Ss.
The chi-square is used whenever you want to compare the observed results to the ones
you would expect from a certain ratio. That ratio could be 1 : 1 or 3 : 1 or even (oh, no!)
9 : 3 : 3 : 1. That's right! You can use the chi-square to determine if a dihybrid cross is
producing offspring in acceptable ratios. Note, however, that with four categories (instead
of two) you have three degrees of freedom and twice as many calculations to do.
Do the Chi-Square Workshop (Workshop Three) now and then do the SAQs for this
lesson so you will get plenty of practice with the chi-square.
3:1
You should understand that the chi-square compares the NUMBER (not ratio) observed
to the NUMBER (not ratio) expected. You are given the observed numbers and from that
data you might guess what the ratio should be. You then use that "guessed" ratio to
calculate what the expected numbers would be from that guessed ratio.
Calculating the expected number is critical to doing the chi-square and many students
have trouble with that first step - they forget how to do it, use it backwards or don't do it
at all!
Let's work through this important step together so you will understand that logic.
You already know the number observed.
Smooth = 5474
Wrinkled = 1850
Total = 7324
7324 / 4 = 1831
OK, you now have the expected numbers calculated from the expected ratio.
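Since this is the step that trips up so many students, here is a small sketch of it. The function name expected_counts is hypothetical. It divides the total into the "parts" of the ratio and gives each category its share.

```python
def expected_counts(total, ratio):
    """Split a total into expected numbers according to a ratio."""
    parts = sum(ratio)                       # 3 : 1 has 3 + 1 = 4 parts
    return [total * r / parts for r in ratio]

observed = [5474, 1850]                      # smooth, wrinkled
total = sum(observed)                        # 7324
expected = expected_counts(total, [3, 1])    # smooth is the "3", wrinkled the "1"
print(total, expected)
```

Notice that the expected numbers always add back up to the total you started with.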
The best (easiest) way to COMPARE two values is to find their DIFFERENCE (by
SUBTRACTION).
1850 - 1831 = 19
8. What is the square of the difference between the observed and expected smooth?
These "squares of the differences" are too large and must be "NORMALISED" by
dividing each by the number EXPECTED (NOT the number observed). This could be
called the "squared differences per expected".
10. What is the square of the difference between the observed and expected smooth,
divided by the expected number of smooth?
11. What is the square of the difference between the observed and expected wrinkled,
divided by the expected number of wrinkled?
Lastly, we add together these "squared differences per expected" to give us the TOTAL
"squared differences per expected".
0.066 + 0.197 = 0.263
Therefore, the chi-square for this experiment is χ² = 0.263.
OK - so what?
Statisticians have developed chi-square tables, based upon the probabilities that a
particular chi-square value will come about purely by chance. There are two "features" to
consider.
A. Significance Level….
We (scientists) like to use the level of 5% as our significance "cut-off". Any chi-square
larger than the value from the 5% Table indicates an experiment in which the ratios
observed are so far off the ratios expected that we have to conclude that the ratios
expected are wrong!
B. Degrees of Freedom…
The more "classes" (categories) there are, the more room there is for statistical "blips" to
add up, so the acceptable limit of the chi-square increases. The "degrees of freedom" are
one less than the number of classes.
2-1=1
One degree of freedom.
Degrees of Freedom    5% Significance Levels
1                     3.84
2                     5.99
3                     7.81
4                     9.49
Yes! We calculated a χ² = 0.263. With one degree of freedom we could have a chi-square
up to 3.84 before we would become suspicious that the observed data was in a ratio too
far removed from the ratio we tested.
Yellow
Green
Total
Second, enter the data. Remember, data is what is observed. So data goes in the
"observed" (O) column.
Phenotypes    O       E       O-E     (O-E)²    (O-E)²/E
Yellow        4400
Green         1624
Total         6024
Next you fill in the "expected" (E) column. Using the total as a starting point divide that
number into the two sets of data that would produce the 3 to 1 ratio you expect.
Note that it might be easier to do the 1 (green) of the 3 :1 ratio first. However, if you are
comfortable with fractions it shouldn't be too hard to do them in any order.
Phenotypes    O       E                    O-E     (O-E)²    (O-E)²/E
Yellow        4400    6024 X 3/4 = 4518
Green         1624    6024 X 1/4 = 1506
Now fill in the rest of the table. It's a lot of work but, now that you have it all organized,
it should be just a matter of using your calculator correctly. There is no reason to "total"
columns O-E or (O-E)² so leave them blank. However, it is very important to complete the
"total" in the last column, (O-E)²/E, because that is the chi-square!
Phenotypes    O       E       O-E     (O-E)²    (O-E)²/E
Is the chi-square you calculated here within the boundary of "the possible"?
(To answer that, first go back to the Chi Square Significance Table you saw earlier. Then
page back down to here.)
NO! χ² = 12.32 but, with one degree of freedom we cannot accept any ratio that gives us
a chi-square larger than 3.84.
No! We must reject the 3 : 1 ratio. This data is far off the 3 : 1 ratio.
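If you would like to check your table against a calculator that never slips, the sketch below fills in every column for the yellow and green data. The layout and variable names are my own. Unrounded, the total comes out at about 12.33, which agrees with the 12.32 above to within rounding.

```python
# (phenotype, observed, its part of the 3 : 1 ratio)
rows = [("Yellow", 4400, 3), ("Green", 1624, 1)]

total = sum(o for _, o, _ in rows)
parts = sum(r for _, _, r in rows)

chi2 = 0.0
print(f"{'Phenotype':<10}{'O':>6}{'E':>8}{'O-E':>7}{'(O-E)^2':>10}{'(O-E)^2/E':>11}")
for name, o, r in rows:
    e = total * r / parts   # expected number
    diff = o - e            # difference
    sq = diff ** 2          # squared difference
    per_e = sq / e          # squared difference per expected
    chi2 += per_e
    print(f"{name:<10}{o:>6}{e:>8.0f}{diff:>7.0f}{sq:>10.0f}{per_e:>11.3f}")

# Only the O column and the last column get a total
print(f"{'Total':<10}{total:>6}{'':>8}{'':>7}{'':>10}{chi2:>11.3f}")
```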
Before we dive into the chi-square we have to first determine what ratio we will test and
which category (class) fits with each part of the ratio.
Based upon these numbers, which phenotypes are dominant and recessive for the two
loci? (Remember, these are the F2s from a dihybrid cross so they should be close to a
specific ratio that you learned earlier. And you also learned which traits end up in each
part of that ratio.)
A dihybrid cross should produce a 9 : 3 : 3 :1 ratio in the F2s and a simple look at the
numbers will give you an idea of which belongs to each category.
The biggest group is the white shorts so they must be the doubly dominant class. In other
words, white shorts can be assigned the genotype W-S-.
On the opposite end of the ratio, the least represented group, would be the doubly
recessive so the red talls are the "1" in the 9 : 3 : 3 :1 ratio and have the genotype wwss.
You can deduce the other two classes, making up the "3" in the ratio. The white talls have
the genotype W-ss and the red shorts are wwS-.
Now that you have identified each category and assigned it to the ratio, we can begin the
chi-square to determine if it fits.
Let's begin by first arranging our computation table. It will be twice the size of the
previous table. It might help to arrange them in the table in a descending order to
represent the 9 : 3 : 3 : 1 ratio. Draw the appropriate table including the observed
numbers.
Phenotypes                O      E      O-E    (O-E)²    (O-E)²/E
White and short (W-S-)    206
Red and short (wwS-)      83
White and tall (W-ss)     65
Red and tall (wwss)       30
Total                     384
Great! We are ready to start. First determine the "expecteds". It might be easier to do the
"1" part of the ratio first and work up the table. Regardless, take your time and calculate
what the expected numbers should be and fill in the "E" column.
Phenotypes                O      E               O-E    (O-E)²    (O-E)²/E
White and short (W-S-)    206    24 X 9 = 216
Red and short (wwS-)      83     24 X 3 = 72
White and tall (W-ss)     65     24 X 3 = 72
Red and tall (wwss)       30     24 X 1 = 24
I hope you were able to work through that and get these numbers too. Did you check your
math by adding up the column to make sure the E column equals the O column?
Now it is time to fill in the rest of the table and calculate the chi-square.
Go ahead and complete the calculations before paging down.
Phenotypes                O      E               O-E                (O-E)²          (O-E)²/E
White and short (W-S-)    206    24 X 9 = 216    206 - 216 = -10    (-10)² = 100    100 / 216 = 0.463
Red and short (wwS-)      83     24 X 3 = 72     83 - 72 = 11       11² = 121       121 / 72 = 1.681
White and tall (W-ss)     65     24 X 3 = 72     65 - 72 = -7       (-7)² = 49      49 / 72 = 0.681
Red and tall (wwss)       30     24 X 1 = 24     30 - 24 = 6        6² = 36         36 / 24 = 1.500
If you didn't, look over my answer and figure out where you went wrong - and try to
learn from your error so you can do it right next time. [A common mistake occurs in the
last column - many students divide by either the observed or by some other expected
number. Remember to always divide by the expected number for that category.]
OK, you have calculated the chi-square and it is now time to do something with it.
Here's a portion of the Chi Square Significance Table.
Degrees of Freedom    5% Significance Levels
1                     3.84
2                     5.99
3                     7.81
4                     9.49
Some students get through the difficult chi-square but then make a simple mistake at this
point. Some get confused and pick a number out of the ratio and say there are nine classes!
Or three. Or some other number and I cannot figure out where it came from. So, just to
keep yourself thinking clearly, it is smart to list the categories.
Three (4 - 1) degrees of freedom.
Does the 9 : 3 : 3 : 1 ratio fit the data?
Yes! With three degrees of freedom you can have a chi-square as large as 7.81 before we
would be beyond our 5% significance.
Notice that if you had been so foolish as to stick with the one degree of freedom (that we
were using with the monohybrid crosses) you would have decided that the chi-square was
too large and would have (WRONGLY) rejected the ratio!
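The whole dihybrid calculation, including the degrees-of-freedom check, can be sketched as follows. The variable names are my own and the 5% values are copied from the table. It also shows what would have gone wrong with one degree of freedom.

```python
CRITICAL_5_PERCENT = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49}

observed = {
    "white and short (W-S-)": 206,
    "red and short (wwS-)": 83,
    "white and tall (W-ss)": 65,
    "red and tall (wwss)": 30,
}
ratio = {
    "white and short (W-S-)": 9,
    "red and short (wwS-)": 3,
    "white and tall (W-ss)": 3,
    "red and tall (wwss)": 1,
}

total = sum(observed.values())            # 384
one_part = total / sum(ratio.values())    # 384 / 16 = 24

chi2 = 0.0
for phenotype, o in observed.items():
    e = one_part * ratio[phenotype]       # expected for THIS category
    chi2 += (o - e) ** 2 / e

df = len(observed) - 1                    # four categories, so three
print("chi-square =", round(chi2, 3), "with", df, "degrees of freedom")
print("fits 9 : 3 : 3 : 1?", chi2 <= CRITICAL_5_PERCENT[df])
print("wrong answer with one degree of freedom:", chi2 <= CRITICAL_5_PERCENT[1])
```

Counting the categories with len(observed) keeps you from picking a number out of the ratio by mistake.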
1:1
There are in vitro fertilization (IVF) methods that can increase the chances that a girl will
be born or a boy will be born. You can use the chi-square to determine if a particular IVF
clinic is really increasing the chances of having a boy or girl. You could look at the
number of girls and boys born to women who wanted girls or boys and calculate the chi-
square.
If a particular IVF clinic can, indeed, increase the odds, would you expect the chi-square
to be above or below the value of 3.84 (which I got from the table above)?
If the IVF clinic can change the ratio from the expected 1 : 1 then the chi-square,
calculated on the number of daughters or sons born, would be greater than 3.84.
I hope you understand that here we are "hoping" that the ratio will NOT be 1 : 1. (In point
of fact, scientists aren't supposed to "hope" for results but the fact remains that they often
hope a lot!)
You are the district manager of three fast food restaurants and you are looking over the
revenues. You see that store A made $1,000,000, store B made $3,000,000 and store C
brought in $5,000,000. You wonder if that is just a statistical blip. How would you use
the chi-square to test the idea that these stores are different - beyond luck? (Don't do the
chi-square - just tell me how you would set it up.)
You would "expect" a 1 : 1 : 1 ratio in the revenues if they were all the same. In other
words, the total revenues of $9,000,000 would be distributed evenly. You would
expect ...
Store A = $3,000,000
Store B = $3,000,000
Store C = $3,000,000
You could now find, for each store, the difference between expected and observed
revenues, square the difference, divide that by the expected and then add all three
together to get a chi-square value.
Suppose the manager of store A complains that you are not being fair because you
haven't taken into account the differences in local population around each store. His store
serves a smaller community. So, you go to the population records and discover that store
A serves a population that is only a quarter the size of the communities served by stores
B and C. Can you redo the chi-square? How?
The information about the populations tells you that there are four times as many likely
customers for stores B and C as A. You can express that as a ratio of 1 : 4 : 4. If revenues
are dependent upon population you would expect ("expect" is the magic word that means
"here comes a chi-square")
Store A = $1,000,000
Store B = $4,000,000
Store C = $4,000,000
The observed revenues were
Store A = $1,000,000
Store B = $3,000,000
Store C = $5,000,000
Now you would do another chi-square to determine if these numbers fit a 1 : 4 : 4 ratio
(thus showing that revenues are probably dependent upon population).
And finally, what is the degree of freedom for this three-store problem?
There are three categories (Stores A, B and C) so there are two degrees of freedom.
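Setting up the two store puzzles comes down to turning a ratio into expected numbers. Here is a minimal sketch of just that set-up (it does not do the chi-square). The function name expected_from_ratio is hypothetical.

```python
def expected_from_ratio(total, ratio):
    """Share out a total according to the parts of a ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]

observed = [1_000_000, 3_000_000, 5_000_000]   # stores A, B, C
total = sum(observed)                          # $9,000,000

same_for_all = expected_from_ratio(total, [1, 1, 1])    # all stores alike
by_population = expected_from_ratio(total, [1, 4, 4])   # A is a quarter the size

degrees_of_freedom = len(observed) - 1         # three stores, so two

print(same_for_all)
print(by_population)
print(degrees_of_freedom)
```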
These last few puzzles, about sex ratios and revenue ratios, are to show you that the chi-
square has many uses and that all you have to do is identify how to think about the ratios,
expectations and outcomes.