Zoo Biology 5:161-170 (1986)

Ancestry of Alleles and Extinction of Genes in Populations With Defined Pedigrees


E.A. Thompson
Statistical Laboratory, University of Cambridge

The paper presents, without mathematical formalities, two computational approaches to the analysis of pedigrees. One provides a method for inferring the ancestry of rare alleles observed in a population. The other provides the probabilities that specified combinations of founder genes survive to the current population. The emphasis in both approaches is on the importance of joint probabilities. If some founder genes survive, others are less likely to do so as well. If an allele descends from a given founder to a current individual, it is likely to appear also in individuals sharing the same lines of descent.

Key words: complex pedigrees, rare alleles, gene extinction, joint probabilities

INTRODUCTION

Where the whole of a population, or even of a species, descends from a few original founders via complex paths of descent, several questions arise. Often rare alleles are segregating in the pedigree, and it is of interest to infer the original contributor of such alleles. This is particularly so for recessive alleles, where different inferences as to their origin provide different assessments of the current distribution of unobservable carriers. A second set of questions concerns the extinction of genes. At a given autosomal locus, how many distinct founder genes survive? How does the survival of certain founder genes affect that of others? Within a defined pedigree, homologous genes compete for survival, in the sense that only one can be passed on at any given segregation. On the other hand, some genes must survive in an extant population. The importance of joint assessments is summarised by the following apt quotation (concerning a crossword competition) from a national daily paper: "How many of this year's 3600 entrants can reasonably expect to be amongst the first three? Probably about ten."
Received September 17, 1985; accepted October 22, 1985. Address reprint requests to Dr. E.A. Thompson, Department of Statistics, GN22, University of Washington, Seattle, Washington 98195

© 1986 Alan R. Liss, Inc.

Questions of the above kind have been studied in the context of genetically isolated human populations. On Tristan da Cunha, the whole population of 268 individuals descends mainly from eleven early founders [Roberts, 1971]. Inferences about the alleles contributed by some of these are made by Thompson [1978]. For a Mennonite-Amish genealogy, the question was the ancestral origin of a recessive allele [Kidd et al, 1980]. A method facilitating joint computations was given by Thompson [1983a]. In a Newfoundland genealogy [Buehler et al, 1975], the questions were the validity of a single-locus genetic model for an apparently recessive trait, and the estimation of which individuals carry or have carried the recessive allele [Thompson, 1981]. The aim here is to present the general methodology to those interested in the pedigrees of animal populations.

Animal pedigrees differ in detailed structure from human genealogies. The former often show some individuals with very large numbers of mates, and close inbreeding. However, neither of these factors complicates the computational procedures. A more serious problem is multiple loose inbreeding, resulting in lengthy and entangled loops in the genealogy, but this also occurs in human genealogies. Another feature of animal pedigrees is extensive cross-generation mating, which in human genealogies is more limited in depth. This feature can add computational complexity. The subject of drawing genealogies is not at issue here, but it will be convenient to introduce the marriage node graph representation. An example is given in Figure 1. The lines represent individuals: an individual connects his parents' marriage to his own. Where an individual is involved in matings with different individuals, a special symbol (< >) is used at the connecting point of the relevant arcs. For large and complex pedigrees, this representation greatly facilitates the tracing of the ancestry of individuals, and the graph definition of the structure has been used in programs for computations on such pedigrees [Thompson, 1980].
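The marriage node graph can be held as a small data structure. The Python below is a minimal sketch under assumed conventions, not code taken from the programs cited above: each marriage node is keyed by the (mother, father) pair of a mating with offspring, and each individual is an edge running from its parents' marriage node to every marriage node in which it is itself a parent. The toy records and the name `marriage_node_graph` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy records: individual -> (mother, father), or None for a founder.
RECORDS = {
    "A": None, "B": None, "C": None,
    "D": ("A", "B"), "E": ("A", "B"), "G": ("C", "B"),
    "H": ("D", "G"),
}


def marriage_node_graph(records):
    """Build a marriage node graph.

    Each marriage node is the (mother, father) pair of a mating with offspring.
    Each individual is an edge from its parents' marriage node (None for a
    founder) to the list of marriage nodes in which it is itself a parent.
    Returns (edges, multi_mated), where multi_mated lists the individuals with
    more than one mate, i.e. those needing the junction symbol (< >).
    """
    own_marriages = defaultdict(set)
    for parents in records.values():
        if parents is not None:
            mother, father = parents
            own_marriages[mother].add(parents)
            own_marriages[father].add(parents)
    edges = {
        ind: (parents, sorted(own_marriages.get(ind, ())))
        for ind, parents in records.items()
    }
    multi_mated = sorted(ind for ind, ms in own_marriages.items() if len(ms) > 1)
    return edges, multi_mated


edges, multi_mated = marriage_node_graph(RECORDS)
print(multi_mated)   # ['B']
print(edges["D"])    # (('A', 'B'), [('D', 'G')])
```

Here B mates with both A and C, so B is the one individual whose arcs would meet at a (< >) junction; individuals without offspring, such as H, end in no marriage node.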
ANCESTRY OF ALLELES

The classical coefficient of kinship between two individuals is well known to both animal and human geneticists. The simplest overall summary of a relationship, it is defined as the probability that two randomly chosen homologous genes, one from each individual, are identical by descent; that is, are copies of a single gene in some common ancestor, received by each via repeated segregations. Wright [1922] first proposed a method for computing kinship coefficients, and there have been many approaches over the years. However, the advent of recursive block-structured programming languages has made a very simple recursive algorithm computationally feasible. Provided A is not B, nor an ancestor of B, the kinship coefficient $\psi(A,B)$ between A and B satisfies the following well-known equation:

$$\psi(A,B) = \tfrac{1}{2}\left[\psi(M_A,B) + \psi(F_A,B)\right], \qquad (1)$$

where $M_A$ and $F_A$ are the parents of A. If A is an ancestor of B, the symmetry of kinship coefficients,

$$\psi(A,B) = \psi(B,A), \qquad (2)$$

allows equation (1) to be applied to B instead.

[Figure: panels (a) and (b), each labelling the 'Current' individuals]

Fig. 1. Example of a small pedigree (a), with its marriage node graph representation (b).

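Further, if A is B, we have the equation

$$\psi(A,A) = \tfrac{1}{2}\left[1 + \psi(M_A,F_A)\right], \qquad (3)$$

while if A is a founder ($M_A = F_A = 0$) and not an ancestor of B,

$$\psi(A,B) = 0 \quad\text{and}\quad \psi(A,A) = \tfrac{1}{2}. \qquad (4)$$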
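Equations (1) to (4) translate almost directly into a recursive function. The Python below is a minimal sketch, assuming a convention not stated in the paper: individuals are numbered so that parents always precede their offspring. Then, of two distinct individuals, the higher-numbered one can never be an ancestor of the other, which decides when the symmetry step (2) is needed. The full-sib pedigree is a hypothetical toy, and exact fractions are used so results can be checked by hand.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical pedigree: individual -> (mother, father), or None for a founder.
# Individuals are numbered so that parents always precede their offspring.
PEDIGREE = {1: None, 2: None, 3: (1, 2), 4: (1, 2), 5: (3, 4)}


@lru_cache(maxsize=None)
def kinship(a, b):
    """Kinship coefficient psi(a, b) computed from equations (1) to (4)."""
    if a < b:
        # Symmetry (2): recurse on the later-numbered individual, which
        # cannot be an ancestor of the other.
        return kinship(b, a)
    parents = PEDIGREE[a]
    if a == b:
        if parents is None:
            return Fraction(1, 2)                        # eq. (4), founder with itself
        mother, father = parents
        return Fraction(1, 2) * (1 + kinship(mother, father))   # eq. (3)
    if parents is None:
        return Fraction(0)                               # eq. (4), founder vs other
    mother, father = parents
    # Eq. (1): a is neither b nor an ancestor of b.
    return Fraction(1, 2) * (kinship(mother, b) + kinship(father, b))


def inbreeding(x):
    """The inbreeding coefficient of x is the kinship of its parents."""
    parents = PEDIGREE[x]
    return Fraction(0) if parents is None else kinship(*parents)


print(kinship(3, 4), inbreeding(5), kinship(5, 5))   # 1/4 1/4 5/8
```

For the full sibs 3 and 4 the sketch returns the familiar kinship of 1/4, which is also the inbreeding coefficient of their offspring 5. Memoisation keeps repeated sub-calls cheap on larger pedigrees.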

The four equations (1) to (4) are sufficient to define the kinship coefficient between any two individuals, and can be implemented in a recursive language precisely as they stand. In fact, together with arrays or pointers providing the father and mother of each individual, they essentially form a program. Karigl [1981] generalised kinship coefficients to larger numbers of individuals. That precise generalisation is not relevant here, but it leads to the following consideration. Let $g_r(B_1,\ldots,B_r;A)$ denote the probability that a set of r homologous genes, one chosen from each of the individuals $B_1,\ldots,B_r$, all descend from a specified set of founder genes, denoted A. (Note that the genes need not be identical by descent: they may descend from different genes within A.) The index r is not an integral part of the function, but it aids clarity and facilitates programming. These descent functions satisfy equations precisely analogous to equations (1) to (4); Thompson [1983a] explains the derivation, while Thompson [1983b] gives the general formula. If $B_1$ neither is, nor is an ancestor of, any of $B_2,\ldots,B_r$,

$$g_r(B_1,\ldots,B_r;A) = \tfrac{1}{2}\left[g_r(M_1,B_2,\ldots,B_r;A) + g_r(F_1,B_2,\ldots,B_r;A)\right], \qquad (1')$$

where $M_1$ and $F_1$ are the parents of $B_1$. The function $g_r$ is symmetric in all its arguments:

$$g_r(B_1,\ldots,B_r;A) = g_r(B_{\pi(1)},\ldots,B_{\pi(r)};A) \quad\text{for any permutation } \pi, \qquad (2')$$

and if the individual whose parents are next to be considered, $B_1$, is repeated k times ($B_1 = \cdots = B_k$, with $k > 1$),

$$g_r(B_1,\ldots,B_r;A) = 2^{-k}\left[g_{r-k+1}(M_1,B_{k+1},\ldots,B_r;A) + g_{r-k+1}(F_1,B_{k+1},\ldots,B_r;A)\right] + \left(1 - 2^{1-k}\right) g_{r-k+2}(M_1,F_1,B_{k+1},\ldots,B_r;A). \qquad (3')$$

If $B_1$ is a founder with 0, 1 or 2 genes in A, and $k \ge 1$,

$$g_r(B_1,\ldots,B_r;A) = 0, \quad 2^{-k}\, g_{r-k}(B_{k+1},\ldots,B_r;A), \quad\text{or}\quad g_{r-k}(B_{k+1},\ldots,B_r;A), \qquad (4')$$

respectively. Equations (1') to (4') can be implemented as easily as equations (1) to (4), although the greater number of arguments increases computation time for any particular instance.
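The descent recursions for $g_r$ can be sketched in the same way. The Python below is a minimal illustration under stated assumptions, not the program of Thompson [1983a] or of Gilpin and Thompson: individuals are integers numbered so that parents precede offspring, so the highest-numbered argument can never be an ancestor of another and can always be taken as the individual whose parents are considered next; the set A is a frozenset of hypothetical (founder, 0 or 1) gene labels, and one gene is drawn independently for each listed individual. The helper `gamma_beta_alpha` computes, for one founder, the ancestral, bilineal and inbreeding contributions discussed with Table 1, taking the inbreeding contribution as twice $g_2$ for a single founder gene.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical pedigree, parents numbered before offspring; None marks a founder.
PEDIGREE = {1: None, 2: None, 3: (1, 2), 4: (1, 2), 5: (3, 4)}


@lru_cache(maxsize=None)
def descent(inds, genes):
    """g_r(B_1, ..., B_r; A).

    inds  -- sorted tuple of individuals, repeats allowed (one gene per entry)
    genes -- frozenset of founder genes (founder, 0 or 1) forming the set A
    """
    if not inds:
        return Fraction(1)           # an empty set of genes trivially descends
    b = inds[-1]                     # highest number: not an ancestor of any other entry
    k = inds.count(b)                # b is listed k times
    rest = inds[:-k]
    half_k = Fraction(1, 2 ** k)
    if PEDIGREE[b] is None:          # founder case
        c = sum((b, i) in genes for i in (0, 1))
        if c == 0:
            return Fraction(0)
        return (half_k if c == 1 else 1) * descent(rest, genes)
    mother, father = PEDIGREE[b]

    def g(*extra):
        return descent(tuple(sorted(rest + extra)), genes)

    # All k draws from the mother, or all from the father ...
    total = half_k * (g(mother) + g(father))
    if k > 1:
        # ... otherwise both parental genes of b are represented.
        total += (1 - 2 * half_k) * g(mother, father)
    return total


def gamma_beta_alpha(ind, founder):
    """(gamma, beta, alpha) of a non-founder ind with respect to one founder."""
    both = frozenset({(founder, 0), (founder, 1)})
    parents = tuple(sorted(PEDIGREE[ind]))
    gamma = descent((ind,), both)
    beta = descent(parents, both)
    alpha = 2 * descent(parents, frozenset({(founder, 0)}))
    return gamma, beta, alpha


print(gamma_beta_alpha(5, 1))   # (Fraction(1, 2), Fraction(1, 4), Fraction(1, 8))
```

For the toy full-sib mating, founder 1 gives 1/2, 1/4 and 1/8 for individual 5. The bilineal value is twice the inbreeding value because the lines of descent to the two parents are separate, and summing the inbreeding contributions of both founders recovers the inbreeding coefficient 1/4.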

TABLE 1. Probabilities $(\gamma, \beta, \alpha)$ Defined in the Text, for the Individuals and Founders Shown. For Ease of Presentation, All Probabilities in a Row Are Multiplied by the Given Factor

Individual   Factor   Founder B     Founder C     Founder F     Inbreeding coefficient
U            16       6,2,1         6,2,1         4,0,0         1/8
V            64       20,6,3        20,6,3        8,0,0         3/32
Q            32       [illegible]   [illegible]   [illegible]   1/16
R            32       8,2,1         4,0,0         16,8,4        5/32
W            256      72,20,11      56,10,6       80,16,8       25/256

There are some special cases of the probabilities $g_r$ which can illuminate the structure of ancestral contributions. The probability $\gamma = g_1(B;A_2)$, where $A_2$ is the pair of genes in a single founder A, is simply the ancestral contribution of A to B. If $M_B$ and $F_B$ are the parents of B, $\beta = g_2(M_B,F_B;A_2)$ is the bilineal contribution: the probability that both genes of B derive from A. If $A_1$ is a single gene of A, then $\alpha = 2g_2(M_B,F_B;A_1)$ is the inbreeding contribution: the probability that the two genes of B are copies of some single gene of A. The recursive program has been extended so that these $(\gamma, \beta, \alpha)$-type probabilities are computed for any current individual, simultaneously for each of an array of founder individuals [Gilpin and Thompson, in preparation]. The genealogy of Figure 1 has six founders (A, B, C, L, F, K), and the six-dimensional $(\gamma, \beta, \alpha)$ vectors have been computed for all individuals. For a small genealogy such as this, computation time is only a few seconds. Table 1 shows the results for the three major founders B, C and F, and the current individuals U, V, Q, R and W. The first component of each triple, the ancestral contribution, is straightforward. The final component, $\alpha$, summed over founders, gives the inbreeding coefficient. (Note that for individual Q, the founder A also contributes to inbreeding.) More important, for each current-founder pair of individuals, $\beta - \alpha$ is the probability that the two different genes of the founder both descend to the current descendant: the individual is then a copy of the founder at this locus. Where the lines of descent to the mother and the father of the individual are not intertwined, $\beta = 2\alpha$, and an individual who receives two genes from the founder receives two copies of the same one with probability $\alpha/\beta = 1/2$. In general, $\beta \le 2\alpha$, and $\beta - \alpha$ may be small. In seeking current individuals representative of founders, we wish to have as high a value of $\beta - \alpha$ as possible, in order that both genes of the founder may be represented.
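The statement that $\beta = 2\alpha$ when the lines of descent are not intertwined follows from a short calculation. Write $p$ and $q$ for the probabilities that the maternal and the paternal gene of B, respectively, descend from founder A (symbols introduced here only for this derivation), and assume that the two descents are independent and that, given descent from A, each gene is equally likely to be a copy of either of A's two genes:

$$\begin{aligned}
\beta &= \Pr(\text{maternal gene from } A)\,\Pr(\text{paternal gene from } A) = pq,\\
\alpha &= \sum_{i=1}^{2} \frac{p}{2}\cdot\frac{q}{2} = \frac{pq}{2},\\
\beta - \alpha &= \frac{pq}{2}, \qquad \frac{\alpha}{\beta} = \frac{1}{2}.
\end{aligned}$$

When the two lines share segregations, as in the descent through H to both parents of W, the two genes are more often copies of the same founder gene, which raises $\alpha/\beta$ above 1/2.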
In effect, we are seeking individuals whose ancestry to founders is not channelled through only a few individuals. (This practical use of the $(\beta, \alpha)$ probabilities was suggested by N. Flesness, pers. comm.) In our example genealogy, owing to its very limited depth, $\beta = 2\alpha$ for all founders and all current individuals except W. There is, however, some interaction in the descent of genes from B and from C to R and V, the parents of W. H has one copy of a gene of B and one of a gene of C. If either of these descends to both R and V, and thence to W, the two genes of W received from B (or from C) will be copies of the same gene in H, and hence identical. In this example the interaction is not great, as R may also receive genes of B via G, and V may receive genes of B or C via I and J. However, the probabilities $\alpha/\beta$ for W with respect to B and C are greater than 1/2, being 11/20 and 3/5 respectively. If W does receive two genes from B or from C, the
chances are greater than even that they are two copies of only one gene. In an extended genealogy with limited paths of descent, such probabilities may be very high, sometimes even one, indicating that the individual can never carry two different genes of the founder. The more general probabilities $g_r$, for larger sets of founder genes and for larger sets of current individuals, can also be of practical importance. Suppose a certain few individuals are known to carry a recessive allele; for example, they may be the parents of individuals affected with a recessive trait. Suppose we compute the probability $g_r(B_1,\ldots,B_r;A)$ for this set of individuals, and for a variety of sets A. The genes in A may be regarded as hypothesised original copies of the allele. If the descent probability $g_r$ is much higher for a set $A_1$ than for a set $A_2$ of equal size, this implies that $A_1$ is a more likely set of original allelic copies than is $A_2$, since genes known to be of the required allelic type descend jointly from $A_1$ with higher probability than from $A_2$. This interpretation must be treated with some caution. Larger sets of founder genes will clearly give larger probabilities. So also will more recent sets, provided the probabilities are nonzero. There is a bias in the approach, in that only the descent of the affected allele is considered, and not the descent of the normal allele to the many unaffected individuals of the population. Against this, the method has the advantage of computational speed: many alternative sets A can be considered rapidly, to obtain at least a qualitative inference as to likely origins. Moreover, the set A need not consist only of founder genes. Provided it does not involve individuals who are ancestors of each other, A can be a set of hypothesised ancestral copies at any level in the pedigree. Thus the likely paths of descent from founders to the current individuals can be traced.
Thirdly, and perhaps more importantly, it is joint descent to several current genes that is considered. Where currently affected individuals are interrelated, it is the paths by which the allele could descend jointly to all their carrier parents that are relevant.
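The comparison of hypothesised origin sets, with its caution that only sets of equal size are comparable, can be sketched as a small helper. This is hypothetical code, not from the paper: `rank_origin_sets` and its arguments are illustrative names, and `g` stands for any routine that computes $g_r(B_1,\ldots,B_r;A)$ for the known carriers and a candidate set A. Candidate sets are grouped by size and ranked only within each group; the bias from ignoring the normal allele is not corrected.

```python
from itertools import groupby


def rank_origin_sets(carriers, candidate_sets, g):
    """Rank hypothesised origin sets A by the joint descent probability.

    carriers       -- tuple of individuals known to carry the allele
    candidate_sets -- iterable of frozensets, each a hypothesised set A of
                      ancestral copies, not carried by individuals who are
                      ancestors of each other
    g              -- callable g(carriers, A) returning g_r(carriers; A)

    Larger sets trivially give larger probabilities, so sets are grouped by
    size and ranked only within each group. Returns {size: [(prob, A), ...]}
    with the most probable set of each size first.
    """
    ranking = {}
    for size, group in groupby(sorted(candidate_sets, key=len), key=len):
        scored = [(g(carriers, a), a) for a in group]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        ranking[size] = scored
    return ranking


# Made-up scores standing in for g_r values, for illustration only.
toy = {frozenset({"a1"}): 0.30, frozenset({"b1"}): 0.05, frozenset({"a1", "b1"}): 0.60}
result = rank_origin_sets(("x", "y"), list(toy), lambda carriers, a: toy[a])
print([sorted(a) for _, a in result[1]])   # [['a1'], ['b1']]
```

The two-gene set scores highest overall, but it is reported in its own group rather than ranked against the single-gene sets.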
GENE EXTINCTION

A full analysis of gene ancestry would require that the ancestry of all alleles be considered jointly. The relevant probability would be

$$P(\text{all genotypes at this locus are as observed} \mid \text{a specified combination of founder genotypes}). \qquad (5)$$

Computed for each possible founder combination, these probabilities would provide a comparison between alternative ancestral hypotheses. The problem, of course, is in computing (5). The method of Cannings et al [1978] can be employed on relatively small yet complex pedigrees, such as that of the Tristan da Cunha population. The method involves working back from the current population to the ancestors, accumulating information, which is summarised as a set of probabilities of the form (5) for the joint genotype combinations of ancestral individuals across the pedigree. Now suppose we wish to determine the probability of extinction of a certain set of homologous founder genes. Let us label these genes as a certain allelic type (E, for extinct, say) and all other founder genes as type S, for (possibly) surviving genes (see Figure 2). Then the E genes (and perhaps also some S genes) are extinct if and only if the current population consists entirely of genotypes SS. Conversely, starting from the observation of a current population of SS individuals (homozygous for non-extinct genes), we can work backwards through the genealogy to compute the expression (5) for all possible founder E/S combinations. That is, we have precisely the joint extinction probabilities for all combinations of founder genes. Hence we can compute the probability distribution for the number of distinct founder genes. These computations have been performed for the example genealogy of Figure 1, and some of the results are summarised in Table 2. The values shown are the overall probabilities that 0, 1 and 2 genes of each of the major founders B, C and F survive in the three descendant individuals Q, R and W. Note that it is impossible for all six of these founder genes to be extinct in the three current individuals; R must in fact carry one gene from N and thence from B, C or F. It is also impossible for all six genes to survive, or indeed for both those of C and both those of F to survive. All other combinations have a strictly positive probability, although some are very small. The event of maximum probability (0.191) is that the six genes of Q, R and W contain only two genes from B, C and F: one each from B and from F. The event of one gene from each of the three founders has only slightly smaller probability (0.182). From the totals we see that each of B and F has a greater than 50% probability of exactly one gene surviving and a substantial probability that both do so, but C has a greater than 50% probability that both are lost.

[Figure: the pedigree of Figure 1, with the 'Current' individuals marked]

Fig. 2. Ancestral likelihoods as gene extinction probabilities. The E alleles amongst founders (and perhaps also some of the S alleles) are extinct in the current population, which is homozygous SS. The pedigree is that of Figure 1, with the three individuals Q, R and W assumed to constitute the current population. The ancestral set whose probability of extinction is considered here consists of four genes: both genes of founder A and one gene of each of B and K.

Note also the interactions: when both genes of C and one of B are extinct, it is more than 15 times as probable that one or two genes of F survive (0.191 + 0.093) than that both are extinct (0.018), but when both genes of C survive, the ratio of survival to extinction of genes of F is only 3:1.

TABLE 2. Probabilities of Extinction (×1000) for Each Combination of 0, 1 or 2 Extinct Genes, for the Founders B, C and F

No. of extinct genes of C:         0              1              2
No. of extinct genes of F:      0   1   2      0   1   2      0   1   2      Totals for B
No. of extinct genes of B
  0                             0   1   0      3  41  15     23  86  17      186
  1                             0  11   4     32 182  41     93 191  18      572
  2                             0  12   3     35  94  10     44  44   0      242

Totals for C (0, 1, 2 extinct genes):  31, 453, 516
Totals for F (0, 1, 2 extinct genes): 230, 662, 108

An illuminating example, both of dependence in gene extinction and of the relevance of numbers of distinct surviving genes, is provided by the Tristan da Cunha population. If we simply calculate founder contributions, as is currently done in analyses of populations of zoo animals [Ballou and Seidensticker, 1983], we find that eleven early founders contribute 84.5% of current genes, and genealogically they fall clearly into two groups of five, together with an eleventh individual whose genes have a high probability of survival. The contributions of the two groups are almost equal, being 36% and 33%, with the additional individual providing just over 15% [Thomas and Thompson, 1984]. On the other hand, in terms of numbers of distinct genes, one group contributes substantially more than the other. The probabilities of the numbers of surviving genes are shown in Figure 3. Each group has at most nine genes surviving, since in each there is one of the five founders who has only one offspring. However, one group must have at least four surviving genes, while the other may have only two. The first is expected to have seven distinct founder genes surviving, but the second only five. Thus the former group shows much greater variety in its contribution to the current population, while the latter (with almost equal total contribution) is dominated by multiple copies of two or three genes from one founder couple. In human populations, where founders are often couples, sometimes with few offspring contributing to the current population, the main dependence in gene survival is between the genes of founder couples. Any grandchild of such a couple receives (at a given locus) a gene from one member but not the other.
Thence, more generally, there will be a negative dependence between the genes of female founders and genes of male founders. This dependence was examined by Thomas and Thompson [1984] for the case of the Tristan da Cunha population. Dependence was found to be slight in this particular case, probably because there were only two founder couples, the other founders (six males and one female) marrying descendants of these. In animal populations, where individuals often have many more mates, dependence will again probably be slight, but this is an open question of some interest.

[Figure: probability (vertical axis, 0 to 0.4) against no. of genes surviving (horizontal axis, 0 to 10)]

Fig. 3. Gene survival probabilities for the two groups of Tristan da Cunha founders.
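The exact backward computation described above is the paper's method. A cruder but simple way to explore the same survival quantities is gene dropping: label each founder gene uniquely, simulate Mendelian segregation down the pedigree many times, and tabulate which founder genes reach the current individuals. The sketch below is that simulation technique, not the ancestral-likelihood method, and gives Monte Carlo estimates rather than exact probabilities; the pedigree, function names, replicate count and seed are hypothetical.

```python
import random
from collections import Counter

# Hypothetical pedigree: individual -> (mother, father), None for a founder;
# insertion order must list parents before their offspring.
PEDIGREE = {1: None, 2: None, 3: (1, 2), 4: (1, 2), 5: (3, 4)}


def drop_genes(pedigree, rng):
    """One replicate: give every individual a (maternal, paternal) pair of
    founder-gene labels (founder, 0 or 1) by simulated segregation."""
    genotype = {}
    for ind, parents in pedigree.items():
        if parents is None:
            genotype[ind] = ((ind, 0), (ind, 1))
        else:
            mother, father = parents
            genotype[ind] = (rng.choice(genotype[mother]), rng.choice(genotype[father]))
    return genotype


def survival_distribution(pedigree, current, founder, n=20000, seed=1):
    """Estimate P(0, 1 or 2 distinct genes of founder survive in current)."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n):
        genotype = drop_genes(pedigree, rng)
        surviving = {gene for ind in current for gene in genotype[ind]}
        counts[sum(gene[0] == founder for gene in surviving)] += 1
    return [counts[j] / n for j in (0, 1, 2)]


print(survival_distribution(PEDIGREE, current=[5], founder=1))
```

For this toy pedigree the exact values are 1/4, 5/8 and 1/8, which the estimates should approach. Recording the surviving counts for several founders jointly in each replicate would likewise estimate a joint table of the kind shown in Table 2, including dependence between founders.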
EXTENSION OF THE COMPUTATIONAL METHODS

We have emphasised the need to consider both ancestors and current individuals jointly. It is, however, this necessity which limits the computational feasibility of the method used in the previous section for working back through a genealogy. Suppose we have a sibship of ten ES children, whose parents are unobserved. (E and S now denote any allelic types.) Then it is very likely that the parents were EE and SS. If the mother was EE, then the father was almost surely SS, and vice versa. Inferences about the two parents cannot be separated. Suppose, however, there were only two children, who were both SS. Then we know that each parent contributed S on each of two occasions. We have relative likelihoods (of 1:4) that each was ES or SS, but, more importantly, there is now no dependence between the inferences about the two parents. The same ideas may be extended far more generally [Thomas, 1984]. Although in principle ancestors must be considered jointly, there are certain conditions under which inferences are independent (or almost so). Where this is so, computations can be carried out on very much more complex genealogies. Two cases where it is likely
to be so are those of this paper. For the ancestry of a rare allele, those known to carry the allele must be considered jointly, but other individuals normally have little impact upon inferences. For a non-extant allele (the E allele of the gene extinction problem), many sections of the computation contribute almost independently. The precise criteria for such independent treatment, and ways of implementing them in the general program for probability computations on pedigrees, are topics of current research. When successful, they will enable even more complex genealogies to be analysed without increased computation.
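The sibship argument above can be checked numerically. The sketch assumes Hardy-Weinberg priors for the unobserved parents, with an allele frequency of E chosen purely for illustration (the paper specifies no prior), enumerates the nine parental genotype pairs, and tests whether the joint posterior factorises into the product of the two parental marginals. Function names such as `parent_posterior` are hypothetical.

```python
from fractions import Fraction
from itertools import product

GENOTYPES = ("EE", "ES", "SS")


def prior(genotype, p):
    """Hardy-Weinberg prior, given an assumed frequency p of allele E."""
    q = 1 - p
    return {"EE": p * p, "ES": 2 * p * q, "SS": q * q}[genotype]


def transmit_e(genotype):
    """Probability that a parent of this genotype transmits E."""
    return Fraction(genotype.count("E"), 2)


def child_prob(child, mother, father):
    """Mendelian probability of a child's genotype given both parents."""
    tm, tf = transmit_e(mother), transmit_e(father)
    if child == "EE":
        return tm * tf
    if child == "SS":
        return (1 - tm) * (1 - tf)
    return tm * (1 - tf) + (1 - tm) * tf          # heterozygous child


def parent_posterior(children, p):
    """Joint posterior over (mother, father) genotypes given the children."""
    joint = {}
    for m, f in product(GENOTYPES, repeat=2):
        weight = prior(m, p) * prior(f, p)
        for child in children:
            weight *= child_prob(child, m, f)
        joint[(m, f)] = weight
    total = sum(joint.values())
    return {pair: w / total for pair, w in joint.items()}


def is_independent(joint):
    """True if the joint posterior equals the product of its marginals."""
    mother = {g: sum(joint[(g, f)] for f in GENOTYPES) for g in GENOTYPES}
    father = {g: sum(joint[(m, g)] for m in GENOTYPES) for g in GENOTYPES}
    return all(joint[(m, f)] == mother[m] * father[f] for m, f in joint)


p = Fraction(1, 10)   # assumed allele frequency, for illustration only
print(is_independent(parent_posterior(["SS", "SS"], p)))   # True
print(is_independent(parent_posterior(["ES"] * 10, p)))    # False
```

With two SS children the likelihood is a product of one factor per parent, so the posterior factorises, and each parent's odds of ES against SS are the prior odds times the 1:4 likelihood ratio. With ten ES children an EE mother essentially forces an SS father, so the factorisation fails.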
CONCLUSIONS

The methods presented here have all been used to answer questions about complex human genealogies. They are still under development, to handle even larger and more complex genealogies. Complex pedigrees, and the same types of questions concerning them, arise in the analysis of pedigrees of rare species. Where these are now maintained only in captivity, the whole pedigree from a few original founders may be accurately known. Questions as to the numbers of distinct surviving genes are perhaps the most immediately relevant, and the distinction between this survival of variety and variation in total founder contributions has been illustrated. To take a simple example, where only one gene of a founder survives, equalising its contribution with that of another founder which can provide two can only increase the disparity between the representation of the surviving genes, and could even jeopardise the survival of some.
REFERENCES
Ballou, J.; Seidensticker, J. Demographic and genetic status of the captive populations of Sumatran tigers. 1983 INTERNATIONAL TIGER STUDBOOK: 5-39, 1983.
Buehler, S.K.; Firme, F.; Fodor, G.; Fraser, G.R.; Marshall, W.H.; Vaze, P. Common variable immunodeficiency, Hodgkin's disease and other malignancies in a Newfoundland family. LANCET 1:195-197, 1975.
Cannings, C.; Thompson, E.A.; Skolnick, M.H. Probability functions on complex pedigrees. ADVANCES IN APPLIED PROBABILITY 10:26-61, 1978.
Karigl, G. A recursive algorithm for the calculation of identity coefficients. ANNALS OF HUMAN GENETICS 45:299-305, 1981.
Kidd, J.R.; Wolf, B.; Hsia, Y.E.; Kidd, K.K. Genetics of propionic acidemia in a Mennonite-Amish kindred. AMERICAN JOURNAL OF HUMAN GENETICS 32:236-245, 1980.
Roberts, D.F. The demography of Tristan da Cunha. POPULATION STUDIES 25:465-479, 1971.
Thomas, A. THE USE OF APPROXIMATION IN COMPUTING ANCESTRAL LIKELIHOODS FOR LARGE COMPLEX PEDIGREES. Knights Prize Essay, University of Cambridge, 1984.
Thomas, A.; Thompson, E.A. Gene survival in an isolated population: the number of distinct genes on Tristan da Cunha. ANNALS OF HUMAN BIOLOGY 11:101-112, 1984.
Thompson, E.A. Ancestral Inference II. The founder of Tristan da Cunha. ANNALS OF HUMAN GENETICS 42:239-253, 1978.
Thompson, E.A. Recursive routines for computations on pedigrees. Tech. Rept. No. 17, Department of Biophysics, University of Utah, 1980.
Thompson, E.A. Pedigree analysis of Hodgkin's disease in a Newfoundland genealogy. ANNALS OF HUMAN GENETICS 45:279-292, 1981.
Thompson, E.A. A recursive algorithm for inferring gene origins. ANNALS OF HUMAN GENETICS 47:143-152, 1983a.
Thompson, E.A. Gene extinction and allelic origins in complex genealogies. PROCEEDINGS OF THE ROYAL SOCIETY (LONDON), B, 219:241-251, 1983b.
Wright, S. Coefficients of inbreeding and relationship. AMERICAN NATURALIST 56:330-338, 1922.
