A comparison of continuous- and discrete-time three-state models for rodent tumorigenicity experiments.

The three-state illness-death model provides a useful way to characterize data from a rodent tumorigenicity experiment. Most parametrizations proposed recently in the literature assume discrete time for the death process and either discrete or continuous time for the tumor onset process. We compare these approaches with a third alternative that uses a piecewise continuous model on the hazards for tumor onset and death. All three models assume proportional hazards to characterize tumor lethality and the effect of dose on tumor onset and death rate. All of the models can easily be fitted using an Expectation Maximization (EM) algorithm. The piecewise continuous model is particularly appealing in this context because the complete data likelihood corresponds to a standard piecewise exponential model with tumor presence as a time-varying covariate. It can be shown analytically that differences between the parameter estimates given by each model are explained by varying assumptions about when tumor onsets, deaths, and sacrifices occur within intervals. The mixed-time model is seen to be an extension of the grouped data proportional hazards model [Mutat. Res. 24:267-278 (1981)]. We argue that the continuous-time model is preferable to the discrete- and mixed-time models because it gives reasonable estimates with relatively few intervals while still making full use of the available information. Data from the ED01 experiment illustrate the results.


Introduction
Rodent tumorigenicity experiments play an important role in evaluating the carcinogenic potential of pesticides, food additives, and drugs. A standard experiment involves about 600 animals of both sexes in each of two strains randomized to a control group or one of two or three exposed groups. Animals are observed over an average lifetime of 18-24 months with the goal of comparing dose groups with respect to tumor development.
It is now well known that time-adjusted statistical analyses are desirable due to toxic effects of the high experimental dose levels typically used (1). Such analyses are complicated, however, by the fact that tumors are detectable only at the time of death. Appropriate methods are available if one assumes that tumors are either nonlethal (2) or instantly lethal (1). However, most tumors are of intermediate lethality, in which case alternative methods of analysis are needed. In recent years, many authors have turned to methods based on fitting the three-state illness-death model depicted in Figure 1. The quantity of interest is $\lambda(t \mid z)$, the tumor incidence rate at time $t$ for an animal exposed at dose level $z$ (3). The functions $\alpha$ and $\beta$ represent the instantaneous death rates at time $t$, with and without tumor, respectively. Note that, in general, the rate of death with tumor may also depend on the time of onset ($x$). While all of the methods proposed in the literature have in common the objective of characterizing or testing for a dose effect on $\lambda(t \mid z)$, they vary considerably in the type of parametrization used, and there has been relatively little discussion regarding the similarities and differences among the various methods available.
The most straightforward approach is to assume that the functions characterizing the transition rates in the three-state process follow some fully specified parametric form (4,5). To relax the need for strong parametric constraints, Borgan et al. (6) propose the use of piecewise exponential models, claiming that these are only loosely parametric if enough change points are allowed. Most of the proposals in the literature, however, formulate the problem in discrete time, arguing in analogy to standard survival analysis that the results are fully nonparametric (3,7-9). In a variation of this approach, Dinse (10) suggests the use of a mixed-time formulation where the death process is modeled in discrete time and tumor onset in continuous time. In practice, the approaches using discrete time end up imposing coarse grouping of the data because the number of allowable distinct death times is limited by the number of sacrifices in the experiment.
To avoid the limitation of requiring one sacrifice per interval, Portier (11) and Portier and Dinse (12) suggest the use of semiparametric models that place parametric restrictions on the tumor incidence function, but use a nonparametric discrete-time parametrization on the death process. Recently, Dinse (13) suggested a different kind of semiparametric model that uses a single parameter to characterize the relationship between $\alpha(t)$ and $\beta(t)$, the hazards for death with and without tumor. More precisely, Dinse suggests using either an additive model [$\alpha(t \mid x) = \beta(t) + \Delta$] or a multiplicative model [$\alpha(t \mid x) = \beta(t)e^{\theta}$]. The advantage of this approach is that the model can be fit to data from experiments with as few as one sacrifice time. Lindsey and Ryan (14) also propose the use of a multiplicative model, but unlike Dinse (13), who uses a mixed-time formulation wherein deaths occur in discrete time and tumor onsets in continuous time, they assume a piecewise exponential model on both $\lambda(t)$ and $\beta(t)$.
The main purpose of this paper is to discuss the impact of modeling in continuous versus discrete versus mixed time, illustrating the conceptual and computational similarities and differences among the three approaches. Related questions about the choice of time-frame have been discussed in the standard survival context by Cox and Oakes (15), Hamerle (16), Heitjan (17), Hoel and Walburg (2), and Xekalaki (18). To provide a common basis for comparison, we focus on the semiparametric multiplicative or proportional hazards model discussed by Dinse (13) and Lindsey and Ryan (14). All three models can be fit using an Expectation Maximization (EM) algorithm (19), treating time of tumor onset as missing data. After describing the models in the next section, we discuss the steps of the EM algorithm and use these results to compare and contrast the models analytically. Next, extensions to further covariate structures are explored. Two special cases are discussed. First, when $\theta = 0$, standard methods for interval-censored data (20) can be applied because death (the censoring mechanism) is independent of the event of interest (tumor onset). It will be seen that the mixed-time formulation is the same as the grouped data survival parametrization described by Kalbfleisch and Prentice (21). Second, the score test for dose effects in the special case of one interval is derived. The score test from the mixed-time model has the same numerator as the well-known lifetime incidence test, which can be biased in the presence of toxicity. The methods are applied to a subset of data from the ED01 study. We illustrate with the data that all three approaches are similar when many intervals are used. Establishing this formally is more difficult because the number of parameters increases with the number of animals and inference using standard likelihood theory is no longer applicable (22). When fewer intervals are used, the example illustrates that the discrete- and mixed-time models yield biased estimates of the hazards for tumor onset. Results are summarized in the last section.

[Figure 1: Three-state illness-death model. Tumor-free animals move to the tumor state at rate $\lambda(x \mid z)$ or to death at rate $\beta(t \mid z)$; animals with tumor die at rate $\alpha(t \mid x, z)$.]

The Models
Let $X$ be the time to first event, either tumor onset or death, let $\delta$ indicate whether the first event is tumor onset ($\delta = 1$) or death ($\delta = 0$), and let $T$ denote time to death. Finally, let $Z$ be a covariate representing exposure level. Presently we will discuss experiments with control ($Z = 0$) and exposed ($Z = 1$) groups, but for now we will concentrate on a single dose group. Suppose that time is broken into $J$ intervals, $I_j = (\tau_{j-1}, \tau_j]$ for $j = 1, \ldots, J$, with the same interval boundaries for both tumor onset and death. Each model can be defined in terms of $(2J+1)$ parameters, $\pi = (\lambda, \beta, \theta)$, where $\lambda$ and $\beta$ are $J \times 1$ vectors. The interpretation of both $\beta$ and $\lambda$ varies with the time parametrization being used.

Continuous Time
When tumor onset and death occur in continuous time, the hazards in Figure 1 are interpretable as instantaneous probabilities of failure per unit time. Under the piecewise constant hazards model, the hazards within each of the $J$ time intervals are:
$$\lambda(x) = \lambda_j \quad \text{for } \tau_{j-1} < x \le \tau_j,$$
$$\beta(t) = \beta_j \quad \text{for } \tau_{j-1} < t \le \tau_j,$$
$$\alpha(t) = \beta_j e^{\theta} \quad \text{for } \tau_{j-1} < t \le \tau_j.$$
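As a minimal sketch of the piecewise constant parametrization, the following Python fragment looks up the three hazards at a given time and integrates one of them to obtain a cumulative hazard. The function names, the 0-based interval indexing, and the numerical rates are illustrative assumptions; only the breakpoints at 12, 18, and 33 months come from the three-interval model used in the application.

```python
import math
from bisect import bisect_left


def interval_index(t, cuts):
    """Return the 0-based j with cuts[j] < t <= cuts[j + 1]."""
    if not cuts[0] < t <= cuts[-1]:
        raise ValueError("t lies outside the follow-up period")
    return bisect_left(cuts, t) - 1


def hazards_at(t, cuts, lam, beta, theta):
    """Return (lambda_j, beta_j, alpha_j = beta_j * exp(theta)) at time t."""
    j = interval_index(t, cuts)
    return lam[j], beta[j], beta[j] * math.exp(theta)


def cumulative_hazard(t, cuts, rates):
    """Integrate a piecewise constant hazard from cuts[0] up to t."""
    total = 0.0
    for j, rate in enumerate(rates):
        lo, hi = cuts[j], cuts[j + 1]
        if t <= lo:
            break
        total += rate * (min(t, hi) - lo)
    return total


# Breakpoints in months; the rates below are hypothetical.
cuts = [0.0, 12.0, 18.0, 33.0]
lam = [0.001, 0.004, 0.010]
beta = [0.002, 0.005, 0.020]
print(hazards_at(20.0, cuts, lam, beta, theta=0.5))
print(math.exp(-cumulative_hazard(20.0, cuts, lam)))  # P(no onset by month 20)
```

The last line ignores death as a competing event; it only illustrates how a piecewise constant hazard accumulates across intervals.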

Discrete Time
Under the discrete-time model, the hazards in Figure 1 correspond to the probability of failing at a particular time given survival to that time. The relationship between $\alpha_j$ and $\beta_j$ is based on the Kalbfleisch and Prentice (21) grouped data parametrization and is used to ensure comparability of $\theta$ among the three models.
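The grouped data parametrization is usually written as $\alpha_j = 1 - (1 - \beta_j)^{e^{\theta}}$. That form is assumed in the sketch below, which checks numerically that it is what a continuous-time proportional hazards model implies once deaths are grouped into an interval. The function names and the numerical values are hypothetical.

```python
import math


def death_prob_with_tumor(beta_j, theta):
    """alpha_j implied by the assumed grouped-data relation
    alpha_j = 1 - (1 - beta_j) ** exp(theta)."""
    if not 0.0 <= beta_j < 1.0:
        raise ValueError("beta_j must lie in [0, 1)")
    return 1.0 - (1.0 - beta_j) ** math.exp(theta)


def grouped_prob(cum_hazard):
    """Probability of failing within an interval whose cumulative hazard
    is cum_hazard, given survival to the start of the interval."""
    return 1.0 - math.exp(-cum_hazard)


# Hypothetical interval: cumulative death hazard 0.3 without tumor,
# multiplied by exp(theta) with tumor.
theta = 0.7
beta_j = grouped_prob(0.3)
alpha_direct = grouped_prob(0.3 * math.exp(theta))
alpha_grouped = death_prob_with_tumor(beta_j, theta)
print(alpha_direct, alpha_grouped)
```

Because both routes give the same value, $\theta$ keeps its interpretation as a log hazard ratio in the discrete-time model, which is the comparability the text refers to.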

Mixed Time
Dinse's (13) mixed-time model assumes deaths occur in discrete time with hazards defined exactly as for the discrete-time model above:
$$\beta_j = \Pr(T = \tau_j \mid T \ge \tau_j, \delta = 0), \qquad \alpha_j = \Pr(T = \tau_j \mid T \ge \tau_j, \delta = 1).$$
Unlike the discrete-time formulation, however, tumors are assumed to occur in continuous time, so that the likelihood involves the conditional probabilities that a tumor developed within the interval, given that no tumor had developed as of the beginning of the interval.

Fitting the Models
Four types of events are possible at any observed event time: death, no tumor (DNT); death with tumor (DWT); sacrifice, no tumor (SNT); and sacrifice with tumor (SWT). The likelihood contributions of these events for all three models are shown in detail in Appendix A. For each formulation, the likelihood contributions of animals with tumor involve integrals or sums over the tumor onset distribution, which make the observed data likelihood difficult to work with directly.
The EM algorithm provides a useful alternative to maximizing the observed data likelihood for all three models and also facilitates comparison of the models. The complete data likelihood is calculated assuming exact times to tumor are known in the piecewise continuous model and assuming the intervals of onset are known in the discrete and mixed models. The E step of the EM algorithm involves finding the expected values of the complete data sufficient statistics conditional on the observed data ($Y$) and assuming the current parameter estimates ($\hat{\pi}$). The complete data likelihood is then maximized (the M step) and the two steps are repeated until the parameter estimates converge.

Complete Data Log Likelihoods
The complete data log likelihoods are based on observed data, as well as on the imputed sufficient statistics from the unobservable data. The observed data in each of the $J$ intervals consist of counts of animals experiencing one of the four possible events. Let $a_j$ and $m_j$ be the number of animals dying or sacrificed without tumor, and $b_j$ and $n_j$ be the number dying or sacrificed with tumor in the $j$th interval. Let the number of animals still alive at the beginning of each interval be denoted by $R_j$. The sufficient statistics imputed from the unobservable data include $N_j$, the number of tumor onsets in the $j$th interval, and $R_j^T$, the number of animals at risk of death with tumor. The continuous-time model also requires $T_j^T$ and $T_j^{NT}$, the times at risk with and without tumor, respectively.
The complete data log likelihood takes a different form under the continuous piecewise constant hazards model, the discrete-time model, and the mixed-time model. The log likelihoods under the continuous- and mixed-time models have the attractive feature of splitting into two pieces that can be maximized independently. The continuous-time model is particularly simple because software already exists to maximize piecewise exponential survival models. Alternatively, iteratively reweighted least squares or a Newton-Raphson algorithm can be easily programmed in a matrix language. Even within an EM framework, maximizing the discrete-time log likelihood is cumbersome and requires a Newton-Raphson algorithm with complicated derivatives. Because the baseline hazards in the discrete- and mixed-time models are required to lie between 0 and 1, constrained maximization techniques are advisable for these two models.
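A minimal sketch of the continuous-time case follows. It assumes the standard piecewise exponential form with tumor presence as a time-varying covariate, in which each event contributes the log of its hazard and each animal contributes its time at risk multiplied by the hazard. The function name, argument names, and numbers are hypothetical. Sacrificed animals are treated as censored, so they enter only through the times at risk.

```python
import math


def continuous_complete_loglik(lam, beta, theta, N, a, b, T_nt, T_t):
    """Return the (onset, death) pieces of the complete data log likelihood.

    lam, beta : per-interval hazards for tumor onset and tumor-free death
    theta     : log ratio of the death hazard with versus without tumor
    N         : (expected) tumor onsets per interval
    a, b      : deaths without and with tumor per interval
    T_nt, T_t : time at risk without and with tumor per interval
    """
    # Onset piece: depends on lambda only.
    onset = sum(n * math.log(l) - l * t for n, l, t in zip(N, lam, T_nt))
    # Death piece: depends on beta and theta only.
    death = sum(
        (aj + bj) * math.log(rate) + bj * theta
        - rate * (tn + math.exp(theta) * tt)
        for aj, bj, rate, tn, tt in zip(a, b, beta, T_nt, T_t)
    )
    return onset, death


# One interval with hypothetical sufficient statistics.
onset_piece, death_piece = continuous_complete_loglik(
    lam=[0.5], beta=[0.1], theta=0.0,
    N=[2.0], a=[1], b=[1], T_nt=[4.0], T_t=[2.0],
)
print(onset_piece, death_piece)
```

Returning the two pieces separately mirrors the factorization noted in the text: the onset piece can be maximized over $\lambda$ without reference to $\beta$ or $\theta$.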
By examining the solutions to the complete data log likelihoods, one can see that differences among the three approaches can largely be explained by their differing assumptions about how events are distributed within the intervals in which they occur. To see this more clearly, consider the maximum likelihood estimates for the baseline tumor onset rates in the $j$th interval.
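For a piecewise exponential model, the onset piece $N_j \log \lambda_j - \lambda_j T_j^{NT}$ is maximized at the occurrence/exposure ratio $N_j / T_j^{NT}$. The sketch below assumes that form for the continuous-time estimate and shows, with hypothetical numbers, how an assumption that inflates the tumor-free time at risk lowers the estimated onset rate. It does not reproduce the discrete- or mixed-time estimators.

```python
def onset_rate_mle(onsets, time_at_risk_no_tumor):
    """Occurrence/exposure estimate N_j / T_j^NT for each interval."""
    return [
        n / t if t > 0 else float("nan")
        for n, t in zip(onsets, time_at_risk_no_tumor)
    ]


# Hypothetical final interval: 6 expected onsets among animals that
# actually contributed 300 tumor-free animal-months.
actual = onset_rate_mle([6.0], [300.0])[0]

# Same onsets, but animals removed early are counted as tumor-free until
# the end of the interval, inflating the exposure to 500 animal-months.
inflated = onset_rate_mle([6.0], [500.0])[0]

print(actual, inflated)
```

The numerator is unchanged in both calls; only the assumed placement of deaths and sacrifices within the interval differs, which is the mechanism the text identifies.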

Expectation Steps
In all three models, the E step involves calculation of the expected number of tumor onsets in interval $j$, $N_j$:
$$E(N_j \mid Y) = \sum_{\{i:\, \delta_i = 1\}} p_j(t_i), \qquad (4)$$
where $p_j(t_i)$ is the conditional probability that an animal acquired its tumor in interval $I_j$ given it died or was sacrificed with tumor at $t_i$. The precise form of $p_j(t_i)$ differs for the three models and is given in Appendix B. In all three cases, of course, $p_j(t_i)$ equals zero for all intervals after the one in which $t_i$ falls.
The continuous-time model requires the additional calculation of the expected times at risk with and without tumor, $T_j^T$ and $T_j^{NT}$. Expressions for these quantities are also given in Appendix B.
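The summation in Equation 4 can be sketched as follows. The conditional probabilities $p_j(t_i)$ are taken as given inputs, since their model-specific forms are in Appendix B; the function name and the two example animals are hypothetical.

```python
def expected_onsets(onset_probs, n_intervals):
    """Sum p_j(t_i) over tumor-bearing animals to get E(N_j | Y).

    onset_probs holds one list per animal observed with tumor; entry j is
    the probability that its tumor arose in interval j (0-based).
    """
    totals = [0.0] * n_intervals
    for probs in onset_probs:
        if len(probs) != n_intervals:
            raise ValueError("one probability per interval is required")
        for j, p in enumerate(probs):
            totals[j] += p
    return totals


# Two hypothetical animals in a three-interval model. The first died with
# tumor in the first interval, so later intervals get probability zero.
probs = [
    [1.0, 0.0, 0.0],
    [0.2, 0.5, 0.3],
]
print(expected_onsets(probs, 3))
```

Each animal's probabilities sum to one, so the expected onsets sum to the number of animals observed with tumor.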

Adding Covariates
For each model, covariates can easily be added to the hazards for death and tumor onset. To allow for a dose effect on tumor onset, for example, a proportional hazards assumption can be placed on $\lambda$, with $Z$ indicating exposure group. Using the Kalbfleisch and Prentice (21) parametrization for the mixed and discrete models maintains comparability of the dose effect estimate across the three models.
Similarly, covariates can be added to the death hazard with no tumor to account for toxic effects of the carcinogen. Hazards for death in the continuous-time model are then:
$$\beta(t, z) = \beta(t)e^{\rho z}, \qquad \alpha(t, z) = \beta(t)e^{\theta + \rho z}.$$
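A short sketch of the two death hazards with a dose covariate is given below, assuming the multiplicative form in which the dose effect $\rho$ scales both hazards and $\theta$ scales only the hazard with tumor. The function name and numerical values are hypothetical.

```python
import math


def death_hazards(beta_j, theta, rho, z):
    """Return (hazard without tumor, hazard with tumor) at dose indicator z."""
    without_tumor = beta_j * math.exp(rho * z)
    with_tumor = beta_j * math.exp(theta + rho * z)
    return without_tumor, with_tumor


# Hypothetical baseline hazard and effects; z = 0 is control, z = 1 exposed.
for z in (0, 1):
    print(z, death_hazards(beta_j=0.01, theta=0.5, rho=0.3, z=z))
```

Under this form the lethality ratio $e^{\theta}$ is the same in both dose groups, so a toxic effect on mortality does not alter the interpretation of $\theta$.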
In general, tests of hypothesis are most easily computed using likelihood ratio tests.
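As an illustration, a likelihood ratio test for a single parameter, such as one dose effect, can be computed from the two maximized log likelihoods. The sketch assumes one degree of freedom and the usual chi-square approximation; the function name and the log likelihood values are hypothetical.

```python
import math


def likelihood_ratio_test(loglik_null, loglik_alt):
    """Return (statistic, p-value) for a one-degree-of-freedom test."""
    stat = 2.0 * (loglik_alt - loglik_null)
    # For a chi-square variable with 1 df, P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


# Hypothetical maximized log likelihoods without and with a dose effect.
stat, p_value = likelihood_ratio_test(loglik_null=-512.4, loglik_alt=-503.1)
print(stat, p_value)
```

Testing several parameters at once would need a chi-square distribution with more degrees of freedom, which this one-parameter shortcut does not cover.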

Some Special Cases
Nonlethal Tumors
When $\theta = 0$, the three-state model can be thought of as a standard interval-censoring problem in which death, the censoring mechanism, is independent of tumor incidence, the event of interest. Animals observed to die with tumor are left-censored, $(0, t)$, and those dying without tumor are right-censored, $(t, \infty)$.
The observed data likelihoods for the continuous- and mixed-time models factor into two independent pieces, one involving only tumor onset parameters and the other only death parameters. The death (censoring) process is noninformative, and standard methods for interval-censored data can be used on the likelihood for tumor onset. The discrete-time likelihood does not factor and so has no analogue in standard survival analysis.
Under the mixed-time model, the hazards for death cancel from the E step of the EM algorithm, and the resulting estimation corresponds to fitting the grouped data proportional hazards model suggested by Kalbfleisch and Prentice (21) and extended by Prentice and Gloeckler (23). The mixed-time formulation that allows for tumor lethality is thus an extension of the grouped-data survival problem. Prentice and Gloeckler (23) require that censored individuals (deaths in the rodent context) be removed at fixed points in an interval, e.g., at the midpoint or at the end. Analogously, the mixed-time model assumes that deaths occur at the end of intervals.

One Interval
In general, testing for dose effects is most easily done using likelihood ratio tests. However, it is useful to consider the score test for dose effects in the simple case of one interval ($J = 1$). For both the mixed- and discrete-time models, the score test can be shown to take the following form: $(b_1 + n_1) - (b + n)p$, where $p$ denotes the proportion of animals allocated to the exposed group, and the subscript 1 refers to counts in the exposed group. This is nothing other than the lifetime incidence test based on total tumor counts. The score test in the continuous-time model, by contrast, is based on expected times at risk with no tumor and uses information about the death times of the animals. This extreme example illustrates one of the important benefits of the continuous-time model over the other two. This point will be further illustrated with the examples in the next section.
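The numerator $(b_1 + n_1) - (b + n)p$ compares the tumor count in the exposed group with the count expected if tumors were spread across groups in proportion to group size. A minimal sketch with hypothetical counts follows; the function and argument names are assumptions.

```python
def lifetime_incidence_numerator(tumors_exposed, tumors_total,
                                 animals_exposed, animals_total):
    """Observed minus expected tumor-bearing animals in the exposed group.

    tumors_exposed corresponds to b_1 + n_1, tumors_total to b + n, and
    animals_exposed / animals_total to the allocation proportion p.
    """
    p = animals_exposed / animals_total
    return tumors_exposed - tumors_total * p


# Hypothetical experiment: 50 control and 50 exposed animals, with 10 and
# 20 tumor-bearing animals, respectively.
print(lifetime_incidence_numerator(20, 30, 50, 100))
```

No death or sacrifice times enter the calculation, which is why this statistic cannot adjust for dose-related differences in survival.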

Application: ED01 Data
The ED01 experiment was conducted at the National Center for Toxicological Research, and involved 24,000 female mice randomized to either a control group or one of seven dose levels of the known carcinogen 2-acetylaminofluorene (AAF) (24). There were eight interim sacrifice times, and a terminal sacrifice at 33 months. We will examine a subset of data from one room, considering control and high-dose groups only. Results are reported on bladder and lung tumors from 671 animals. It is known that bladder tumors show strong dose effects on tumor onset and lung tumors do not. Bladder tumors are more lethal than the nonlethal lung tumors (24). The data are summarized by month in Tables 1 and 2, where numbers of deaths and sacrifices with and without tumor are shown. Note that the bladder tumor incidence rate in the control group is low compared to lung tumor incidence but that many more bladder tumors occur in the high-dose group.
Models with varying numbers of intervals were fit incorporating dose effects on tumor onset and on the death rate ($\rho$). The three-interval model breaks the 33 months of data at 12, 18, and 33 months. The seven-interval model has breakpoints that coincide with scheduled sacrifice times. The model with 24 intervals allows hazards to change at each month. Results for the discrete-time model were virtually identical to those for the mixed-time model, and are not discussed further in this section. Parameter estimates from the mixed and continuous models are shown in Table 3. Significant dose effects are observed for bladder tumors in all three models (p < 0.0001) and no significant results are seen for lung tumors. Quantitatively, results are consistent with each model regardless of the number of intervals.
The two methods can also be compared graphically by looking at cumulative tumor incidence functions. These are shown for 3, 7, and 24 intervals, for bladder and lung tumors, in Figures 2 to 7. Curves are shown for both dose groups. For three and seven intervals, the curves for both models are similar until the final interval, where the mixed-time model shows much lower estimates for the hazards of acquiring tumor than the continuous-time model. This is a result of assuming that animals can only die or be sacrificed at interval boundaries and is most easily understood in the context of a continuous-time model. Tumor onset hazards from the continuous-time model would reduce to those from the mixed-time model if deaths occurred at the beginning of the interval and sacrifices at the end. In this data set there is a large sacrifice at 19 months. If these animals are assumed to survive to 33 months, the tumor-free time at risk is overestimated, resulting in underestimates of the hazards for tumor onset.
Allowing more intervals will decrease the bias due to grouping of the death times. Figures 6 and 7

Conclusions
The relationship of the three modeling assumptions becomes clear after looking at the complete data likelihoods and their maximum likelihood estimators. The continuous-time model will reduce to either the discrete or mixed models when assumptions are made about when deaths, sacrifices, and tumors can occur during an interval.
When tumors are nonlethal, the models reduce to a simple survival analysis with interval censoring on tumor onset. The mixed model is seen to be an extension of the grouped data parametrization of Kalbfleisch and Prentice (21). The continuous piecewise model reduces to the usual parametric case, but the discrete-time model has no analogue in this context.
Looking at the single-interval case, the test for dose effects in the mixed- and discrete-time models reduces to the well-known lifetime incidence test. The continuous-time model incorporates expected times of tumor onset based on the lifetime of the animals and is less likely to be biased than the discrete- or mixed-time model tests when compounds are toxic. Further exploration of the bias and relative efficiency of the estimates using simulation techniques would be useful.
In summary, the results of this paper suggest that the continuous-time model has several advantages over the discrete- and mixed-time models. Like the discrete-time models, it imposes only a weakly parametric (6) structure on the underlying hazards for death and tumor onset. Although it is slightly more computationally intensive (it requires the calculation of expected times at risk without tumor), the continuous-time model does not require the use of constrained maximization techniques, as there are no upper limits on the ranges of the parameters. Because the continuous model uses information about exact death times, the placement of deaths and sacrifices at interval boundaries required by the mixed-time and discrete-time models need not be made. Realistic estimates of the underlying hazards can be obtained with relatively few intervals which, when chosen a priori, allow the application of standard likelihood theory.

Appendix A Observed Data Likelihoods
Suppose the death time $t_i$ for the $i$th animal falls in $I_j$. Then under the three different models, the likelihood contributions for death with and without tumor differ in form. The quantity $E_j(x \mid t_i)$ is the same for naturally dying and sacrificed animals and is the expected time without tumor contributed to the $j$th interval given tumor onset in that interval.
This work was supported by grants CA-48061 and CA-33041 from the National Cancer Institute.