Abstract
A methodology is introduced for numerical evaluation, with any given accuracy, of the cumulative probabilities of the proportion of genome shared identical by descent (IBD) on chromosome segments by two individuals in a grandparent-type relationship. Programs are provided in the popular software package Maple for rapidly implementing such evaluations in the cases of grandchild-grandparent and great-grandchild–great-grandparent relationships. Our results can be used to identify chromosomal segments that may contain disease genes. Also, exact P values in significance testing for resemblance of either a grandparent with a grandchild or a great-grandparent with a great-grandchild can be calculated. The genomic continuum model, with Haldane's model for the crossover process, is assumed. This is the model that has been used recently in the genetics literature devoted to IBD calculations. Our methodology is based on viewing the model as a special exponential family and elaborating on recent research results for such families.
The genes at a given locus of two related individuals are said to be identical by descent (IBD) if one is a physical copy of the other or both are physical copies of the same gene in a common ancestor. Calculations associated with the concept of IBD at a single locus or at a finite (small) number of (linked) loci have been published in the genetics literature since the early 1940s (cf. the references in Bickeboller and Thompson 1996a). Such calculations become difficult with the increase of the number of loci and/or relatives. Moreover, considering a finite, however large, number of independent loci cannot account for the possibility of recombination. The latter problem can be overcome by considering the chromosomes as a continuum and modeling the occurrence of crossovers by a point process (cf. Lange 1997, Chap. 12). The latter is usually a Poisson process. It does not account for the possibility of interference but provides a good approximation to reality if long chromosomal regions are considered. This model dates back to Haldane (1919) and Fisher (1949). There has recently been increased interest in IBD calculations for the genomic continuum model. This is mainly due to the availability of data on densely packed loci, which makes the concept of IBD sharing for chromosomal regions or the whole genome of practical importance. The first result concerning IBD calculations, in the framework of the continuum model, is due to Donnelly (1983). He calculates the probability that individuals in a given relationship share any part of the genome IBD.
Note that the proportion of genome shared IBD by related individuals is a random variable unless we consider some trivial cases, such as twins or a parent with a child. Donnelly (1983) calculates the probability that this random variable is positive. Its distribution function is generally unknown. The first result, although not exact, concerning its distribution in the case of c half-sibs, is due to Bickeboller and Thompson (1996a; cf. Bickeboller and Thompson 1996b). They find approximations to it using the Poisson clumping heuristic. Note that no exact result for the distribution function of the aforementioned random variable is available even in the simplest case of two closely related individuals, such as half-sibs or a grandparent and a grandchild. Earlier results provide only the expected value and variance of this random variable and the conditional counterparts of these given information on flanking markers (see Goldgar 1990; Hill 1993; Guo 1994a,b, 1995; Thompson 1995).
Throughout the article only autosomal chromosome segments are considered. Equal map lengths are assumed for male and female (cf. Bickeboller and Thompson 1996a).
In this article we introduce a methodology for numerical evaluation, with any given accuracy, of the cumulative probabilities of the proportion of genome shared IBD on chromosome segments by two individuals in a grandparent-type relationship. We provide Maple V programs for implementing such evaluations in the cases of grandchild-grandparent and great-grandchild–great-grandparent relationships. These programs also evaluate the cumulative probabilities given any information (e.g., inheritance) on one of the flanking markers. These are the first exact distributional results concerning IBD calculations in the framework of the genomic continuum model. Our methodology is applicable to higher-order grandparent-type relationships at the expense of heavier computational effort. It is also applicable to other relationships as long as the associated underlying mathematical models (continuous time Markov chains) do not have too many states. Roughly speaking, the latter means that we consider a small number of closely related individuals, for example, several half-sibs. Our results can be used in identifying chromosomal segments that may contain disease genes. Also, exact P values can be derived in significance testing for resemblance of a grandparent with a grandchild and of a great-grandparent with a great-grandchild.
THE UNDERLYING MATHEMATICAL MODEL
Haldane (1919) and Fisher (1949) have suggested that chromosomes be considered as a continuum and that the occurrence of crossovers along the chromosomes be modeled by a Poisson process. If the distances are measured in morgans, then the rate of the Poisson process is one. Donnelly (1983) elaborated on this model and showed that all crossover processes on a pedigree can be viewed as a continuous time Markov chain, whose states are the vertices of a hypercube. Also, the genome shared IBD by a group of related individuals equals the sojourn time at a set of vertices up to time d, where d is the length (in morgans) of the chromosome segment of interest. For example, the amount of genome inherited by a great-grandchild from a great-grandparent equals the sojourn time at the vertex (1, 1) in a continuous time Markov random walk on the four vertices (1, 1), (1, 0), (0, 1), and (0, 0) of the two-dimensional unit cube, where the holding times at all vertices are exponentially distributed with parameter 2 (cf. Donnelly 1983; Guo 1995, p. 1473). Likewise, the amount of genome inherited by a grandchild from a grandparent equals the sojourn time at state 1 in a continuous time Markov random walk on the two vertices 1 and 0 of the one-dimensional unit cube, where the holding times are exponentially distributed with parameter 1. More specifically, the model for the first relationship (great-grandchild–great-grandparent) is a four-state continuous time Markov chain whose parameters are described as follows. Denote the states (1, 1), (1, 0), (0, 1), and (0, 0) by 1, 2, 3, and 4, respectively. The holding times are exponentially distributed with parameter 2 and the one-step transition probability matrix of the embedded discrete time Markov chain is given by
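The two random walks described above can be checked by direct simulation. The following Python sketch is not one of the Maple V programs of this article; it is a Monte Carlo illustration under stated assumptions. The jump structure (a move to each adjacent vertex with equal probability) is read off from the random-walk description, the initial state is drawn uniformly (which is stationary for these symmetric walks), and the names GRANDPARENT, GREAT_GRANDPARENT, shared_length, and empirical_cdf are hypothetical.

```python
import random

# Assumed jump structure: each move goes to an adjacent vertex, all equally likely.
# Four-state chain: 1 = (1,1), 2 = (1,0), 3 = (0,1), 4 = (0,0).
GREAT_GRANDPARENT = {1: (2, 3), 2: (1, 4), 3: (1, 4), 4: (2, 3)}
# Two-state chain: the states 1 and 0 alternate.
GRANDPARENT = {1: (0,), 0: (1,)}


def shared_length(neighbours, rate, d, start, rng):
    """Sojourn time in state 1 on [0, d] along one simulated path."""
    t, state, shared = 0.0, start, 0.0
    while t < d:
        hold = rng.expovariate(rate)      # exponential holding time
        if state == 1:
            shared += min(hold, d - t)    # clip the last visit at d
        t += hold
        state = rng.choice(neighbours[state])
    return shared


def empirical_cdf(neighbours, rate, d, s, n_paths=20000, seed=1):
    """Monte Carlo estimate of P(T(d) <= s) with a uniform initial state."""
    rng = random.Random(seed)
    states = list(neighbours)
    hits = 0
    for _ in range(n_paths):
        start = rng.choice(states)
        if shared_length(neighbours, rate, d, start, rng) <= s:
            hits += 1
    return hits / n_paths
```

For instance, empirical_cdf(GRANDPARENT, 1.0, 1.0, 0.5) estimates the probability that a grandchild and a grandparent share at most half of a segment of length 1 morgan, and empirical_cdf(GREAT_GRANDPARENT, 2.0, 1.0, 0.5) does the same for the four-state chain. Such estimates carry sampling error and only serve as a rough cross-check of exact evaluations.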
METHODS
Our methodology is based on the following key points. First, the underlying model can be viewed as a member of a special exponential family. Second, recent research results on such families (cf. Stefanov 1991, 1995) are applicable to get explicit expressions for the characteristic functions of relevant stopping times. Third, these characteristic functions are numerically invertible using the system Maple V (Monagan et al. 1997) and some numerical tools. Therefore, their distribution functions are derivable. Fourth, the latter distribution functions yield the distribution function of a sojourn time in a state within any fixed time interval. Subsequently the cumulative probabilities of relevant proportions of genome shared IBD can be calculated. More details follow.
Let {X(t)}_{t≥0} be a continuous-time Markov chain with four states whose parameters are described below. The embedded discrete-time Markov chain has the one-step transition probability matrix
Recall that T(t) (the sojourn time in state 1 up to time t) identifies the length of the genome shared by the two individuals on a chromosome segment of length t. It is easy to see that the distribution functions of T(t) and τ_{s} are related as
The following propositions provide explicit expressions for the characteristic functions of S(τ_{s}) corresponding to different initial states. Their proofs are found in the appendix.
Proposition 1. Assume that p_{i} = q_{i} = 0.5 for each i, λ_{1} = λ_{2} = 2, and X(0) = 1. Then the characteristic function of S(τ_{s}) is given by
Proposition 2. Assume that p_{i} = q_{i} = 0.5 for each i, λ_{1} = λ_{2} = 2, and either X(0) = 2 or X(0) = 3. Then the characteristic function of S(τ_{s}) is given by
Proposition 3. Assume that p_{i} = q_{i} = 0.5 for each i, λ_{1} = λ_{2} = 2, and X(0) = 4. Then the characteristic function of S(τ_{s}) is given by
Proposition 4. Assume that λ_{1} = λ_{2} = 1 and X(0) = 1. Then the characteristic function of U_{2}(ν_{s}) is given by
To compute the cumulative probabilities of the proportion of shared genome we need the conditional cumulative probabilities of S(τ_{s}) and U_{2}(ν_{s}) given the initial state [cf. the identities (2), (3), (4), and (5)]. The above propositions provide the characteristic functions of these conditional distributions. We invert them, using some numerical tools, and find the required cumulative probabilities. The mathematical details of these are provided in the appendix.
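As an illustration of what numerical inversion of a characteristic function involves, the sketch below applies the Gil-Pelaez formula F(x) = 1/2 - (1/π) ∫_0^∞ Im[e^{-itx} φ(t)]/t dt with a plain midpoint rule. This is a generic technique chosen here for illustration; it is an assumption that it resembles the numerical tools used with Maple V, and the function names, truncation point, and step count are hypothetical. The standard normal characteristic function is used only as a sanity check, not as a model quantity.

```python
import cmath
import math


def cdf_from_cf(phi, x, upper=40.0, steps=4000):
    """Approximate F(x) from a characteristic function phi (Gil-Pelaez formula)."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h  # midpoints avoid the removable singularity at t = 0
        total += (cmath.exp(-1j * t * x) * phi(t)).imag / t
    return 0.5 - total * h / math.pi


def normal_cf(t):
    """Characteristic function of the standard normal law (sanity check only)."""
    return math.exp(-0.5 * t * t)
```

The truncation point and step count must be adapted to how quickly the characteristic function of interest decays; a slowly decaying one needs a much larger integration range. At a point carrying positive probability the formula returns the midpoint of the jump, so distributions with atoms require separate care.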
RESULTS
We provide two Maple V programs in the appendix. These evaluate the cumulative probabilities of the genome shared IBD on chromosome segments by two individuals in either a grandchild-grandparent or a great-grandchild–great-grandparent relationship. It takes a few minutes of real time to execute the longer program on either a PC or a UNIX workstation. Tables 1 and 2 provide excerpts of such cumulative probabilities for chromosome segments of lengths 0.5, 1.75, and 3. Table 1 does not contain quantiles larger than the median because the distribution function for the grandchild-grandparent relationship is symmetric.
The user of these programs should enter the lengths (in morgans) of the chromosome segment of interest (d) and the shared part of this (s) in the second and third rows, respectively (note that s/d is the proportion of shared genome). The programs contain hypothetical values for these; the last row is the output that appears on the screen after the program is executed.
Remark: Our programs evaluate the cumulative probabilities for each s, such that 0 ≤ s < d. Evaluations for the trivial case s = d are not needed because the corresponding cumulative probability is clearly equal to one.
The initial probability vectors are denoted by (c_{1}, c_{2}) and (c_{1}, c_{2}, c_{3}, c_{4}), respectively (cf. the last procedure in these programs). These are set up to be the steady-state probabilities. If the user wishes to evaluate cumulative probabilities given information on one of the flanking markers, then he/she should set up the initial probabilities accordingly.
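The role of the initial probability vector can be sketched as follows: by the law of total probability, the cumulative probability is the average of the conditional cumulative probabilities given each initial state, weighted by the c_{i}. The Python function below is a hypothetical helper, not part of the Maple V programs, and the numerical values in the usage note are made up.

```python
def unconditional_cdf(initial_probs, conditional_cdfs):
    """Mix conditional cumulative probabilities with the initial probabilities c_i."""
    if len(initial_probs) != len(conditional_cdfs):
        raise ValueError("one conditional probability is needed per initial state")
    if abs(sum(initial_probs) - 1.0) > 1e-9:
        raise ValueError("initial probabilities must sum to one")
    return sum(c * f for c, f in zip(initial_probs, conditional_cdfs))
```

For example, unconditional_cdf([0.25, 0.25, 0.25, 0.25], [0.1, 0.2, 0.2, 0.5]) weights four made-up conditional values equally, whereas a vector such as [1.0, 0.0, 0.0, 0.0] would correspond to knowing that the chain starts in state 1.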
DISCUSSION
In this article we provide Maple V programs for numerical evaluation, with any given accuracy, of the cumulative probabilities of the proportion of genome shared IBD on chromosome segments by two individuals in either a grandchild-grandparent or a great-grandchild–great-grandparent relationship. These are the first exact distributional results concerning IBD calculations in the framework of the genomic continuum model. The results also yield exact evaluations of the cumulative probabilities given information (e.g., inheritance) on one of the flanking markers. We suggest a couple of applications below. These assume the availability of continuous IBD data. Such data are not yet available but are expected to become available as the Genome Project progresses. In particular, a new technique called genomic mismatch scanning (Nelson et al. 1993) is expected to produce almost continuous IBD data (cf. Cheung and Nelson 1996; McAllister et al. 1996).
Our results can be used in devising tests for identifying chromosomal segments that may contain disease genes. Such segments are expected to have unusually large proportions of genome shared IBD by the affected related individuals. The cumulative distribution function is the relevant quantitative measure for such unusualness. For example, let a chromosomal segment be suspected of carrying responsible genes for a particular disease. The hypothesis to be tested is “the segment does not carry such genes.” Our data consist of observations over the corresponding proportions of genome shared IBD on that segment for n independent pairs of individuals, each in a grandchild-grandparent relationship, and all affected by the disease. Then a relevant test statistic is the minimum of these proportions. Its cumulative probabilities, and subsequently relevant P values, can be evaluated using the enclosed program for the grandchild-grandparent relationship. More specifically, let x be the observed value of this statistic. Then the relevant P value is equal to (1 – F(x))^{n}, where F(x) can be evaluated by the aforementioned program (with s = xd, where it should be recalled that d and s are the lengths of the chromosome segment and the shared part, respectively). Note that this test is in the spirit of the QTL tests based on the common theme of allele sharing (cf. Lynch and Walsh 1998, pp. 523–533).
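The P value of the minimum-sharing test can be written out in a few lines. In the Python sketch below, F_x stands for the cumulative probability F(x) obtained elsewhere (for example, from the grandchild-grandparent program with s = xd); the function name is hypothetical and the numbers in the usage note are illustrative only.

```python
def p_value_min_sharing(F_x, n):
    """P value (1 - F(x))**n for the minimum of n independent sharing proportions."""
    if not 0.0 <= F_x <= 1.0:
        raise ValueError("F(x) must be a probability")
    if n < 1:
        raise ValueError("at least one pair is required")
    return (1.0 - F_x) ** n
```

With a made-up value F(x) = 0.5 and n = 3 pairs, the function returns 0.125; the P value shrinks geometrically as more affected pairs all show at least the proportion x.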
Our results can also be used for exact evaluation of P values in significance testing (that is, when there is no specified alternative hypothesis) for resemblance of either a grandchild with a grandparent or a great-grandchild with a great-grandparent. The crossover processes on different chromosomes are assumed to be independent. Our data consist of x_{1}, x_{2},..., x_{n}, where x_{i} is the observed proportion of genome shared IBD by the two individuals on the ith autosomal chromosome (n = 22 in humans). A relevant test statistic is the maximum of these proportions. Let x be its observed value. Evaluate the corresponding cumulative probabilities (with s = xd_{i} and d_{i} the length of the ith autosomal chromosome) for each of these chromosomes. Then the relevant P value is equal to the product of these probabilities. Note that if there is a specified alternative hypothesis, then other tests, based on a full-likelihood approach, would be expected to be more powerful (cf. Browning 1998).
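The product rule stated above follows from the assumed independence of the crossover processes on different chromosomes. A minimal Python sketch, in which cdf_values is a hypothetical list holding the cumulative probability for each chromosome (evaluated at s = xd_{i} by whatever means) and the function name is likewise an assumption:

```python
import math


def product_of_cdfs(cdf_values):
    """Product of per-chromosome cumulative probabilities evaluated at s = x * d_i."""
    for f in cdf_values:
        if not 0.0 <= f <= 1.0:
            raise ValueError("each entry must be a probability")
    return math.prod(cdf_values)
```

In a human application the list would hold 22 entries, one per autosome; the three-entry call product_of_cdfs([0.5, 0.5, 0.8]) is only a toy input.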
Our methodology is general enough to be applicable to higher-order grandparent-type relationships and other relationships. Our methods for deriving explicit closed-form expressions for relevant characteristic functions are applicable to the sojourn time in any given state in any given finite-state Markov chain. Therefore, they are applicable to higher-order grandparent-type relationships. The associated algebra might become unmanageable if the number of states of the underlying Markov chain is too large. However, Poisson approximations (cf. Bickeboller and Thompson 1996a,b for the case of half-sibs) should work well in such cases, and therefore exact evaluations might not be needed. Furthermore, similar numerical tools and the help of Maple would yield numerical inversions of such characteristic functions. For other relationships the quantity of interest is the sojourn time in a set of states. This can be identified as a sojourn time in a single state for a suitable Markov renewal process that is associated with the underlying Markov chain. Tools for deriving explicit expressions of the relevant characteristic functions associated with such a sojourn time can be found in Stefanov (1995). We are currently investigating other relationships, such as half-sibs.
Acknowledgments
The author is grateful to E. Thompson for helpful discussions on the topic and to the referees and the associate editor for their detailed and constructive comments.
APPENDIX
Proofs of the propositions: The fundamental identity in sequential analysis states that for a finite stopping time the sequential likelihood function is derived from the nonsequential one by substituting the stopping time for the time parameter. Thus, in view of (1), the sequential likelihood function of the chain X, observed up to time τ_{s}, is given by
In view of Stefanov's (1991) results the family given by (A1) is a noncurved exponential family of order six; that is, there is no linear constraint on the Y_{i}'s and the dimension of the parameter θ is six. For formal definitions and basic analytical properties of exponential families one may refer to Barndorff-Nielsen (1978) or Brown (1986). The characteristic function of the canonical statistic (Y_{1}, Y_{2},..., Y_{6}) has an explicit representation in terms of φ(θ) (cf. Barndorff-Nielsen 1978, p. 114). In particular, for the characteristic function of S(τ_{s}) we get
Note that
Proof of Proposition 2: Note that interchanging states two and three does not result in a new transition probability matrix. Therefore, the distribution of S(τ_{s}) is the same for both cases X(0) = 2 and X(0) = 3. Assume the initial state is 2. Similarly to the preceding case we have the linear relationships
Proof of Proposition 3: Recall that the initial state is 4, that is X(0) = 4. Similarly to the above cases the following identities hold:
The proofs of Propositions 4 and 5 follow similar arguments and are therefore omitted. Note also that our methods are based on general results (cf. Stefanov 1991) that are valid for any given finite-state Markov chain. Thus, they can also be applied to higher-order grandparent-type relationships.
Details about the inversion of the relevant characteristic functions: First recall the inversion formula
Recall that we assumed s > 0 (cf. methods). Continuity arguments imply that our formulas also produce the cumulative probability for s = 0. An alternative way to show this is by letting τ_{0} be the waiting time until the first entry into state 1 and noting that this leads to the same formulas.
Most of the algebra associated with deriving closed-form explicit expressions for the aforementioned characteristic functions could also be done in Maple. This might be necessary in cases of higher-order grandparent-type relationships (Table A1).
Footnotes

Communicating editor: S. Tavaré
 Received April 18, 1999.
 Accepted July 3, 2000.
 Copyright © 2000 by the Genetics Society of America