Trend tests for proportional responses in developmental toxicity experiments.

The data from developmental toxicity experiments usually are very difficult to analyze statistically because of the lack of independence among littermates and the random nature of the litter size. Only a few of the models that have been proposed in the literature have accounted for both of these features. One of the models proposed by Van Ryzin is invoked to construct a test of trend (dose response). The construction is achieved via a statistical technique called isotonic regression, which is applied to the moment estimators derived by Van Ryzin. The trend test based on isotonic regression is relatively straightforward to calculate, and when the number of dose groups (including control) is four or fewer, the significance of the observed result is easily determined. An example, in which fetolethality is the end point of interest, demonstrates the test.


Introduction
Developmental toxicity experiments with laboratory animals pose difficult problems in terms of statistical analysis. In such studies it is best to assume that the litter is the experimental unit and not the fetus or pup (1,2). This is because the parental animal is randomized to treatment, the parental animal receives the treatment directly, and fetuses or pups within a litter usually exhibit a litter effect in that they do not respond independently of one another. Ignoring the litter effect could lead to statistical tests that are too liberal in the sense that the Type I error rate is inflated (the null hypothesis is rejected more often than it should be).
Many of the responses that are measured in a typical experiment are proportional in nature, e.g., the proportion of affected fetuses in the litter. The naive approach to a statistical analysis is to regard the proportions as samples from a binomial distribution. However, this is not legitimate for developmental toxicity data because of the litter effect and because the litter size itself is random, i.e., not fixed.
Some of the statistical models and analyses that have been proposed for proportional data from developmental toxicity experiments have attempted to incorporate these aspects. Haseman and Kupper reviewed approaches to analyzing a proportional response variable when comparing a treatment group to a control group in developmental toxicity studies (3). The approaches can be categorized into four groups: (a) generalized binomial models; (b) nonparametric analysis; (c) transformations of proportions; (d) resampling techniques.
The beta-binomial model (4) is the most popular generalized binomial model applied to developmental toxicity data. It is derived by assuming that X, the number of positive responses in a litter, follows a binomial distribution with probability of success $\theta$, $0 < \theta < 1$, and that $\theta$ itself is a random variable following a beta distribution. Then the unconditional distribution of X is the beta-binomial. Whereas the binomial is a one-parameter model and assumes that the fetuses are independent (no litter effect), the beta-binomial is a two-parameter model and assumes a nonnegative correlation among fetuses. For the situation of a control group and more than one dose group, a trend test within the beta-binomial framework can be constructed as a logistic function of dose (5,6). One drawback with the beta-binomial model is that it does not allow for the random nature of the litter size.
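For reference, the beta-binomial distribution described above can be written out explicitly. The block below is a sketch in one common parametrization: the shape parameters a and b of the beta distribution, the mean $\pi$, and the intralitter correlation $\rho$ are notation assumed here for illustration and are not taken from references (4) through (6).

```latex
\[
\Pr[X = x \mid n] = \binom{n}{x}\,\frac{B(x + a,\; n - x + b)}{B(a, b)},
\qquad x = 0, 1, \ldots, n,
\]
\[
E(X) = n\pi, \qquad
\mathrm{Var}(X) = n\pi(1 - \pi)\,\bigl[1 + (n - 1)\rho\bigr],
\qquad \pi = \frac{a}{a + b}, \quad \rho = \frac{1}{a + b + 1}.
\]
```

Because $\rho > 0$ for any finite a and b, the variance exceeds the binomial variance $n\pi(1-\pi)$ whenever n > 1, which is how the model represents the litter effect. The litter size n is treated as fixed in these expressions.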
Van Ryzin considered several different generalized binomial models in which the random litter size is taken into account (7). He derived a moment estimator for the probability of an effect within each group. An unfortunate aspect of his work is that it has not been applied very often, and his models have not been included in any of the simulation studies that have appeared in the statistical literature. One of his models is discussed in more detail in "Methods," and a trend test based on isotonic regression and his moment estimators is constructed.
In an entirely different approach, Rai and Van Ryzin used a binomial framework and incorporated litter size as a covariate in modeling the conditional probability of response (8). They then arrived at the unconditional probability by assuming a Poisson distribution for the litter size. Their purpose was to provide a model for low-dose extrapolation and not necessarily a trend test. Williams has criticized this model because it does not account for extra-binomial variability (6).
In a nonparametric approach, the proportional responses for the litters are ranked and a Kruskal-Wallis test applied, which reduces to the Wilcoxon rank sum test or Mann-Whitney test in the case of only one dose group. Gaylor discussed in detail nonparametric analyses for developmental toxicity studies (9). The nonparametric version of the trend test is known as Jonckheere's test. See Lin and Haseman (10) on how to modify Jonckheere's test to allow for ties, a situation usually encountered with the proportional responses observed in developmental toxicity studies. These rank tests are computationally easy and avoid some of the distributional difficulties of the generalized binomial models. Although random litter sizes do not invalidate this approach, the rank tests may not handle this problem in an optimal manner. For instance, a proportion of 1/3 receives the same rank as 4/12, even though the latter proportion provides more information.
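The loss of information under ranking can be seen in a small sketch. The litters below are hypothetical, and the `midranks` helper is an illustrative function written for this example (it is not taken from any of the cited papers); it assigns average ranks to tied proportions, as rank tests with ties typically do.

```python
from fractions import Fraction


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = average_rank
        start = end + 1
    return ranks


# Hypothetical litters given as (number affected, litter size).
litters = [(0, 8), (1, 3), (4, 12), (5, 10)]
proportions = [Fraction(x, n) for x, n in litters]
ranks = midranks(proportions)
```

The litters with 1/3 and 4/12 affected share the same midrank, so the fourfold difference in litter size has no influence on the rank statistic.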
In the transformation approach, the observed proportions for the litters are transformed to approximate normal random variables so that one-way ANOVA techniques can be applied. Two useful transformations are the arc-sine and the Freeman-Tukey binomial arcsine transformations (3). This is justified if the proportions behave in a binomial manner, which usually is not the case with developmental toxicity data.
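The two transformations can be sketched as follows. The function names are hypothetical, and the Freeman-Tukey form shown is one common convention (some authors multiply the sum by 1/2); the exact form used in reference (3) is not stated here, so this should be read as an assumption.

```python
import math


def arcsine_transform(x, n):
    """Arcsine square-root transform of the observed proportion x/n."""
    return math.asin(math.sqrt(x / n))


def freeman_tukey_transform(x, n):
    """Freeman-Tukey double arcsine transform (assumed form, without the 1/2 factor)."""
    return (math.asin(math.sqrt(x / (n + 1)))
            + math.asin(math.sqrt((x + 1) / (n + 1))))


# Hypothetical litter: 3 affected fetuses out of 12.
y_arcsine = arcsine_transform(3, 12)
y_freeman_tukey = freeman_tukey_transform(3, 12)
```

Each litter contributes one transformed value, and the litter-level values are then compared across groups by one-way ANOVA.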
Gladen invoked jackknife methodology to estimate the probability of response within each treatment group weighted according to litter size (11). After using this method, a one-way ANOVA of the jackknifed estimators, weighted by their estimated variances, can be justified asymptotically. Analogous approaches using other resampling plans, such as the bootstrap (12), have been suggested but their application to developmental toxicity studies has not appeared in the literature.
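The general idea of jackknifing over litters can be sketched as below. This is a generic leave-one-litter-out jackknife of the ratio estimator (total affected divided by total fetuses), which weights litters by size; it is an illustration under that assumption and is not claimed to reproduce the exact procedure of reference (11). The function names and data are hypothetical.

```python
def ratio_estimate(litters):
    """Pooled proportion: total affected divided by total litter size."""
    affected = sum(x for x, _ in litters)
    total = sum(n for _, n in litters)
    return affected / total


def jackknife_ratio(litters):
    """Return the jackknife estimate and its estimated variance."""
    m = len(litters)
    full = ratio_estimate(litters)
    pseudo_values = []
    for j in range(m):
        # Recompute the estimate with litter j left out.
        rest = litters[:j] + litters[j + 1:]
        pseudo_values.append(m * full - (m - 1) * ratio_estimate(rest))
    estimate = sum(pseudo_values) / m
    variance = sum((v - estimate) ** 2 for v in pseudo_values) / (m * (m - 1))
    return estimate, variance


# Hypothetical group of three litters given as (number affected, litter size).
estimate, variance = jackknife_ratio([(1, 10), (2, 10), (3, 10)])
```

The resulting group estimates and variances are the inputs a weighted one-way ANOVA would use.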
A number of researchers have conducted simulation studies to compare the performance of these procedures. Haseman and Kupper reviewed these studies and concluded that none of the above procedures seems to be superior in terms of statistical power and attaining the desired significance level (3). After their review article, other simulation studies appeared in the literature (13,14). However, none of these simulation studies has seriously examined the situation with random litter sizes. Of all the procedures discussed, only the generalized binomial models of Van Ryzin (7) and the nonparametric analyses with rank tests legitimately account for random litter size.

Model
One of the models considered by Van Ryzin (7) is used to illustrate the construction of a trend test via isotonic regression. At this point the model is described for a single litter. Let X denote the number of affected fetuses or pups out of a litter of size N, and let P denote the probability that a fetus or pup is affected. Of the triplet (X, N, P) of random variables, X and N are observable, while P is not. It is assumed that the conditional distribution of X given that N = n and P = p is binomial, i.e.,
$\Pr[X = x \mid N = n, P = p] = \binom{n}{x} p^{x} (1 - p)^{n - x}$, $x = 0, 1, \ldots, n$ and $0 \le p \le 1$. [1]
Next, it is assumed that the expected value of P, denoted by E(P), is the unknown parameter $\mu$, and that the conditional expectation of P given N = n is $\alpha\beta^{n}$, where $\alpha$ and $\beta$ are unknown parameters with $0 < \alpha < 1$ and $0 < \beta < 1$. Van Ryzin (7) thought this regression model of the conditional expectation of P given N = n to be realistic because (a) the probability of response P is not independent of the litter size, and (b) for values of $\beta < 1$, the probability of any individual fetus being affected is a decreasing function of the litter size, i.e., the proportion affected in a large litter is likely to be smaller than the proportion affected in a small litter.
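As a worked step, the conditional-mean assumption can be combined with a litter-size distribution to express the unconditional mean of P. The sketch below writes the conditional expectation of P given N = n as $\alpha\beta^{n}$ and assumes that N is Poisson with mean $\lambda$, which is the model's final assumption; the result follows from those two assumptions and is not quoted from Van Ryzin (7).

```latex
\[
\mu = E(P) = E\bigl[E(P \mid N)\bigr]
    = \sum_{n=0}^{\infty} \alpha\beta^{n}\,\frac{e^{-\lambda}\lambda^{n}}{n!}
    = \alpha e^{-\lambda} \sum_{n=0}^{\infty} \frac{(\lambda\beta)^{n}}{n!}
    = \alpha e^{-\lambda(1 - \beta)} .
\]
```

Under these assumptions $\mu$ is at most $\alpha$, and it decreases as the mean litter size $\lambda$ grows whenever $\beta < 1$.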
The final assumption is that N has an unconditional Poisson distribution:
$\Pr[N = n] = e^{-\lambda}\lambda^{n}/n!$, $n = 0, 1, 2, \ldots$ [2]
Van Ryzin (7) [...] [11]
[...] denote a set of weights for the k groups ($w_i$ represents the inverse of the asymptotic variance of $\hat{\mu}_i$). The pooled estimator of the common expected probability of response under $H_0$ is the weighted average (14) [...] [12]
The algorithm of pooling adjacent groups continues until $\hat{\mu}_1^*, \hat{\mu}_2^*, \ldots, \hat{\mu}_k^*$ satisfy the order imposed by [...] [14]
The test constructed from the statistic in Equation 14 is called the $\bar{\chi}^2$ test (15). Note that it is possible for L to be zero if each isotonic estimate $\hat{\mu}_i^*$, $i = 1, 2, \ldots, k$, equals the pooled estimator, which occurs when all k groups must be pooled to attain the isotonic regression estimates under $H_1$. Although L in Equation 14 is relatively easy to calculate, even its null probability distribution is somewhat involved if k [...] [15]
where $\chi_i^2$ denotes a chi-square random variable with i degrees of freedom and P(1,k), ..., P(k,k) denote a set of probabilities which sum to 1. The derivation of the P(i,k) is beyond the scope of this article, so the interested reader is referred to Barlow et al. (15). In the case of k = 2, 3, or 4 (one, two, or three dose groups), closed form solutions for the P(i,k) exist. First, define the quantities [...]
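The pooling of adjacent groups described in this section can be sketched in a few lines. The code below is a minimal illustration under several assumptions that are not confirmed by the text above: the ordered alternative is taken to be nondecreasing in dose, the pooled estimator is the weighted mean of the group estimates, and the statistic is the weighted sum of squared deviations of the isotonic estimates from the pooled estimator, which is the usual form in the order-restricted inference literature. The function names and the numerical inputs are hypothetical.

```python
def pool_adjacent_violators(estimates, weights):
    """Weighted isotonic (nondecreasing) regression by pooling adjacent groups."""
    blocks = []  # each block: [weighted mean, total weight, number of groups]
    for est, w in zip(estimates, weights):
        blocks.append([est, w, 1])
        # Pool backwards while the last two blocks violate the order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(w1 * m1 + w2 * m2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def pooled_estimate(estimates, weights):
    """Weighted average of the group estimates (common value under H0)."""
    return sum(w * e for e, w in zip(estimates, weights)) / sum(weights)


def trend_statistic(estimates, weights):
    """Assumed form of L: weighted squared distance of isotonic fit from pooled mean."""
    fitted = pool_adjacent_violators(estimates, weights)
    common = pooled_estimate(estimates, weights)
    return sum(w * (f - common) ** 2 for f, w in zip(fitted, weights))


# Hypothetical estimates for control and two dose groups, with equal weights.
fitted = pool_adjacent_violators([0.10, 0.08, 0.20], [100.0, 100.0, 100.0])
L = trend_statistic([0.10, 0.08, 0.20], [100.0, 100.0, 100.0])
```

In this example the first two groups violate the assumed order and are pooled, while the third is left alone. When the estimates decrease with dose, every group is pooled and the statistic is zero, which matches the remark that L can equal zero.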

Results
Rai and Van Ryzin (8) listed the data from a dominant lethal assay (16). Male mice were assigned to one of three radiation groups (0, 300, or 600 rads) and then mated to females. The response of interest is the proportion of dead fetuses out of the number of implantations. The sample sizes for the three groups are $m_1 = 683$, $m_2 = 604$, $m_3 = 486$, respectively. Unfortunately, the investigators had excluded the results of those females with < 4 or 11 implants (16). Although this data set is somewhat unusual in terms of the number of litters, it is used here to illustrate the calculations.
Applying Equations 5 through 9 results in the following moment estimates and estimated standard deviations [...]. The strength of this result is due to the very large sample size ($m_1 + m_2 + m_3 = 1773$) and the clear-cut dose-response effect.

Discussion
The technique of isotonic regression, as discussed in "Methods," can be applied to any model that provides an estimate of the expected probability of response for each dose group. In fact, Van Ryzin developed a number of models to which isotonic regression could be applied (7). However, one model assumes independence of the random variables $P_{ij}$ and $N_{ij}$, and another assumes that $\Pr[N_{ij} = 0]$ is known, neither of which is a likely situation.
Another model that Van Ryzin (7) proposed is analogous to the one discussed in "Methods," except that the geometric distribution replaces the Poisson distribution. Clark has extended this class of models by using the negative binomial distribution (17), which is a generalization of both the one-parameter geometric and Poisson distributions. However, Williams (6) has indicated that the Poisson (and correspondingly the geometric and negative binomial) distribution predicts too large a variance for the litter size data. With respect to the moment estimator $\hat{\mu}_i$ in Equation 6, $i = 1, 2, \ldots, k$, the assumed litter size distribution has only a minimal effect because only a few $V_{ij}$ are positive.
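The relation among the three litter-size distributions can be made concrete by comparing their variances at a common mean. The block below is a sketch in one standard parametrization, with mean $\lambda$ and negative binomial shape parameter r; this notation is assumed here and is not necessarily the one used in references (7) or (17).

```latex
\[
\text{Negative binomial: } \mathrm{Var}(N) = \lambda + \frac{\lambda^{2}}{r},
\qquad
\text{geometric } (r = 1)\text{: } \mathrm{Var}(N) = \lambda(1 + \lambda),
\qquad
\text{Poisson } (r \to \infty)\text{: } \mathrm{Var}(N) = \lambda .
\]
```

For a given mean, the Poisson has the smallest variance of the three, so a litter-size variance that is already overstated by the Poisson is overstated by the geometric and negative binomial distributions as well.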