A new method for the analysis of cohort studies: implications of the multistage theory of carcinogenesis applied to occupational arsenic exposure.

Implications of the multistage theory of carcinogenesis for evaluating the effect of exposure to carcinogens in the workplace are described. This theory predicts different patterns of excess risk related to duration of exposure, age at initial exposure, and follow-up time since exposure stopped, depending upon which stage of the carcinogenic process is affected by the carcinogen, i.e., action at an early stage or a late stage. New statistical methodologies are proposed to examine these patterns and are applied to the lung cancer mortality experience from a cohort study of smelter workers exposed to arsenic. Under this multistage hypothesis, the results indicate that arsenic exerts a definite late stage effect though an additional effect at the initial stage cannot be ruled out. The possibilities of biased conclusions resulting from incomplete exposure histories and lack of smoking information are also discussed as well as implications of these results to experimental animal studies.


Introduction
A multistage theory of carcinogenesis was first proposed by Muller (1) and Nordling (2) to account for the observation that mortality rates for many forms of adult human cancer increase with the fifth or sixth power of age. Many different multievent theories have also been proposed and are summarized by Whittemore and Keller (3). A recent two-event theory of carcinogenesis to describe the age-specific occurrence of both childhood and adult tumors has been proposed by Moolgavkar and Venzon (4). These quantitative theories of carcinogenesis relate the frequency and time to occurrence of detectable tumors to the concentration of the carcinogen, the duration of exposure to the carcinogen, the age and susceptibility of the host, and other related factors.
According to the general multistage theory, a single cell gives rise to a malignant tumor only after it has undergone a number of sequential, heritable changes. These cellular changes represent stages in the carcinogenic process and are characterized as being of slow and improbable occurrence. This transformation period is then followed by a promotion and growth period during which the transformed cell produces a colony of descendants by rapid cell division. This general formulation has been shown to describe a number of epidemiological and experimental observations.
As noted earlier, the risk of adult cancer has been found to increase progressively as a function of the fifth or sixth power of age (5). The multistage theory indicates that this increase of risk with age is a reflection of increasing time since exposure to the carcinogenic process began rather than an aging phenomenon, a prediction which has been verified experimentally (6). Another fundamental aspect of this theory is the general characterization of the early and late stages of this multievent cellular transformation process. Under the multistage theory, initiators can be thought of as carcinogens which affect the first stage, or more generally an early stage in the carcinogenic process. These carcinogens are characterized by a long latency period due to the time required to progress through the remaining stages. The latency period, as defined by Armitage and Doll (7), is the time between first exposure and the subsequent clinical appearance of cancer. On the other hand, cocarcinogens which act late in the transformation process are characterized as having a shorter latency period than initiators.
In the present study, we propose statistical methodologies to quantify these concepts of the multistage theory, laying a foundation for the analysis of epidemiologic cohort studies. In our example, examining the patterns of excess lung cancer mortality in a cohort of men occupationally exposed to arsenic, our analysis encompasses three primary objectives: (1) determination of the factors which are associated with this excess risk to man; these factors include level of carcinogenic exposure, duration of exposure, age at initial exposure, and time since exposure ceased; (2) interpretation of the carcinogenic mechanism of action of arsenic, based on the preceding findings; and (3) utilization of these findings for the assessment of human carcinogenic risk for different exposure situations.

Materials
This analysis is based on an epidemiologic study by Lee and Fraumeni of men occupationally exposed to arsenic (8). The original study included 8047 white males employed as copper smelter workers for 12 or more months before December 31, 1956 and whose mortality experience was observed from January 1, 1938 to December 31, 1963. These workers were exposed to various levels of arsenic trioxide and contaminants, such as sulfur dioxide, in the atmosphere. Work areas in the smelter were categorized into three groups (heavy, medium, and light or unspecified) with respect to their relative amounts of atmospheric arsenic. Lee and Fraumeni found a significant excess number of respiratory cancers which was associated with both level and duration of exposure.
Due to a few missing records, the data used for this analysis consist of 8014 workers (132,790 person-years at risk) for whom 139 lung cancer deaths were observed and 44.2 were expected (based on U.S. white male age- and calendar time-specific mortality rates). The cause of death information for the current data was recoded to the 8th Revision of the International Classification of Diseases (ICD for lung cancer = 162).

Methods
Armitage and Doll (7) formulated the multistage theory in mathematical terms as a stochastic process which assumes that a single cell can generate a malignant tumor only after it has undergone a certain number $k$ of heritable changes which occur in a specific order. The Armitage-Doll theory concludes that the background or spontaneous age-specific incidence of adult cancers in the absence of any specific carcinogenic exposure, $I(t)$, at age $t$ is approximately given by Eq. (1):

$$I(t) \sim a_1 a_2 \cdots a_k\, t^{k-1} \qquad (1)$$

where $a_i$ ($i = 1, \ldots, k$) represents the rate of occurrence of the $i$th cellular change, and the time from the last cellular event required for growth to a detectable tumor is assumed to be negligible with respect to the total time $t$. These $a_i$ are assumed to be independent of age for a particular individual but may vary among individuals due to different genetic and environmental factors.
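The power-law behaviour of Eq. (1) can be illustrated numerically. The following is a minimal sketch only: the function name `background_incidence` and the transition rates are hypothetical, constant factors are ignored, and k = 6 is chosen simply because the text cites a fifth- or sixth-power dependence on age.

```python
import math

def background_incidence(t, rates):
    """Background incidence at age t, up to a constant factor:
    the product of the k stage rates times t**(k - 1)."""
    k = len(rates)
    return math.prod(rates) * t ** (k - 1)

# Hypothetical, equal transition rates for a k = 6 stage process.
rates = [1e-3] * 6

# On a log-log scale, the slope between two ages recovers k - 1.
t1, t2 = 40.0, 80.0
slope = (math.log(background_incidence(t2, rates))
         - math.log(background_incidence(t1, rates))) / (math.log(t2) - math.log(t1))
print(f"log-log slope of incidence against age: {slope:.3f}")
```

Under these assumptions, doubling age multiplies the incidence by 2 to the power k - 1, which is why a log-log plot of age-specific mortality against age is a natural first check of the number of stages.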
When an individual is exposed to an additional carcinogenic insult, Whittemore (9) assumes that the cellular event rates become $a_i + \beta_i c(t)$, where $i = 1, \ldots, k$ and where $c(t)$ represents the concentration of the additional carcinogenic exposure at age $t$. It should be noted that some of the $\beta_i$ may be zero, indicating that this particular carcinogen does not affect the $i$th cellular event. These event rates may be written as $a_i + \beta_i c(t) = a_i[1 + r_i c(t)]$, where $r_i = \beta_i/a_i$ represents the relative increase in the $i$th cellular event per unit of carcinogenic exposure. When the additional carcinogenic insult occurs over only a fraction of the individual's lifetime, the pattern of excess cancer risk, i.e., the overall risk minus the background risk, has been shown to be dependent upon the stage(s) of the process, i.e., cellular events affected by the carcinogen (9,10). Therefore, in light of the multistage theory, examination of the patterns observed for a particular carcinogenic exposure may provide information on its mechanism of action, i.e., which stage(s) of the carcinogenic process are affected by the carcinogen. Doll (11) and Doll and Peto (12) have studied the evolution of lung cancer risk in smokers and ex-smokers, whereas Whittemore (9) and Day and Brown (10) examined both epidemiologic and experimental data on other carcinogenic exposures. For the carcinogens examined, they found some which appear to influence only early stages, some which influence only late stages, and some which appear to influence both early and late stages.
As shown by Whittemore and Day and Brown, for exposure to a carcinogen at a constant level, the excess risk is determined by the duration of exposure, the age at which exposure begins, and for those individuals whose exposure has ceased, the time since exposure stopped. For continuous carcinogenic exposure at a constant concentration $c$ beginning at age $t_0$, the excess age-specific cancer risk $E(t)$ at age $t$ when only the first stage is affected is given by Eq. (2):

$$E(t) \propto c\,r_1 (t - t_0)^{k-1} = c\,r_1 d^{k-1} \qquad (2)$$

where $d$ is the duration of exposure at level $c$ and $k$ represents the number of stages in the process. This relationship indicates that if the carcinogen affects only the first stage of the carcinogenic process, then the excess risk should be proportional to the concentration of the carcinogen and an increasing function of the duration of exposure. This relationship also shows that the excess risk is independent of the age at which exposure started, $t_0$, for fixed concentration and duration of exposure.
On the other hand, if the carcinogen affects only the penultimate (next to last) stage of the process, then the excess cancer risk at age $t$ is given by Eq. (3):

$$E(t) \propto c\,r_{k-1}\left[t^{k-1} - t_0^{k-1}\right] = c\,r_{k-1}\left[(t_0 + d)^{k-1} - t_0^{k-1}\right] \qquad (3)$$

This relationship indicates that the age-specific excess cancer risk is proportional to the exposure concentration $c$ and is an increasing function of both the duration of exposure for fixed age started and of age started exposure for fixed duration. Thus an examination of the excess risk as a function of age started exposure, standardized for concentration and duration of exposure, will indicate whether the carcinogen affects an early stage (no relationship) or a late stage (increasing trend).
Similar inferences can be made by examining the excess risk patterns for those individuals who have stopped their exposure. When the carcinogenic exposure begins at age $t_0$, continues at a constant concentration $c$ for an exposure duration of length $d$, then stops, and follow-up continues for a period of length $f$, the excess age-specific cancer risk at age $t = t_0 + d + f$ is given by Eq. (4) when only the first stage is affected by the carcinogen, and by Eq. (5) when only the penultimate stage is affected:

$$E(t) \propto c\,r_1\left[(d + f)^{k-1} - f^{k-1}\right] \qquad (4)$$

$$E(t) \propto c\,r_{k-1}\left[(t_0 + d)^{k-1} - t_0^{k-1}\right] \qquad (5)$$

These relationships indicate that the age-specific excess risk is proportional to the concentration of exposure $c$ and is an increasing function of exposure duration $d$ for any affected stage, early or late. However, when the carcinogen affects only the first stage, the excess risk is independent of the age exposure began, $t_0$, for fixed duration and follow-up time since exposure stopped. When the carcinogen affects only the penultimate stage, the excess risk is independent of follow-up time since exposure stopped, $f$, for fixed duration and age started exposure. It should be noted that Eq. (2) is a special case of Eq. (4) with follow-up time since exposure stopped $f = 0$, and Eqs. (3) and (5) are identical. Thus, an examination of the excess cancer risk as a function of age started exposure, standardized for concentration, duration, and follow-up time, and as a function of follow-up time since exposure stopped, standardized for concentration, duration, and age started exposure, will provide information on whether an early or late stage of the cellular transformation process is affected by the carcinogen. The excess risk being independent of age started exposure would imply an early-stage effect, while independence of follow-up time would imply a late-stage effect.
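The contrasting predictions for first-stage and penultimate-stage action can be checked with a few lines of code. This is a sketch under stated assumptions: the two functional forms are those that follow from the Armitage-Doll approximation and reproduce the properties described above (constant factors omitted), and the function names and all parameter values are hypothetical.

```python
def excess_first_stage(c, r1, d, f, k):
    """Excess risk, up to a constant, when only the first stage is affected.
    Depends on duration d and follow-up f, not on age at first exposure."""
    return c * r1 * ((d + f) ** (k - 1) - f ** (k - 1))

def excess_penultimate_stage(c, rk1, t0, d, k):
    """Excess risk, up to a constant, when only the penultimate stage is
    affected. Depends on age at first exposure t0 and duration d, not on f."""
    return c * rk1 * ((t0 + d) ** (k - 1) - t0 ** (k - 1))

k = 6            # hypothetical number of stages
c, r = 1.0, 1.0  # hypothetical concentration and relative effect

# Late-stage action: for a fixed 10-year duration, excess risk rises with
# the age at which exposure started.
for t0 in (20, 30, 40, 50):
    print("age started", t0, excess_penultimate_stage(c, r, t0, 10, k))

# Early-stage action: for a fixed 10-year duration, excess risk keeps rising
# with follow-up time after exposure stops.
for f in (0, 10, 20):
    print("follow-up", f, excess_first_stage(c, r, 10, f, k))
```

Holding concentration and duration fixed while varying one remaining factor at a time mirrors the standardization strategy described in the text.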
Since the multistage theory predicts different patterns of excess risk for individuals continuously exposed and individuals for whom exposure has ceased, the cohort of workers was examined in two ways. First, the excess lung cancer risk for this cohort was studied while the individuals were exposed to arsenic. Second, we also studied the excess lung cancer risk for those individuals who ended their employment at this smelter, i.e., with cessation of exposure more than one year before their death. Since many workers who died of lung cancer or other diseases died shortly after termination of employment (employment termination being "caused" by their illness), we have assumed an illness period of one year to calculate the person-years at risk for individuals while exposed. Thus, an individual who died within one year of employment termination would contribute all his person-years only to the continuously exposed group, whereas an individual who died after more than one year had elapsed since employment termination would contribute his person-years at risk to both the continuously exposed and stopped exposure groups. We examined other illness period durations and found substantially similar results. The continuously exposed group consists of all 8014 workers who contributed 85,273 person-years at risk during their periods of exposure. There were 70 deaths from lung cancer found and 22.7 expected in this group. The group of workers for whom employment had terminated consists of 4676 workers who contributed 47,517 person-years at risk during the period they were no longer employed at this smelter. There were 69 deaths from lung cancer found and 21.6 expected in this group. The expected numbers of lung cancers are based on U.S. age-specific white male lung cancer mortality rates during the period 1940-1960 (13).
The expected cancer mortality risk was calculated for each individual based on his age and calendar year of observation (in 5-year age and calendar year groups), and these individual expected risks were used to represent the spontaneous or background lung cancer risk used in the examination of excess mortality rates. Since we were unsure that these background mortality estimates based on national statistics were applicable to this particular cohort of workers, we repeated our analyses with different multiplicative factors (between 1/2 and 2) applied uniformly to the U.S. national rates to obtain different background mortality estimates; the results were qualitatively similar. Tables 1 and 2 show summary numbers of individuals, their numbers of person-years at risk, the observed and expected numbers of lung cancer deaths, and the excess lung cancer mortality rates for the continuously exposed and stopped exposure groups, respectively (see Tables 3 and 4 for more detail). These data are categorized into five age started exposure groups (<20, 20-29, 30-39, 40-49, and ≥50), five duration of exposure groups (<10 years, 10-19, 20-29, 30-39, and ≥40 years), three follow-up time since exposure stopped groups (<10 years, 10-19, and ≥20 years), and three level of exposure groups.

Table 1. Number of individuals, person-years at risk, observed and expected number of lung cancer deaths, and crude excess lung cancer mortality rate (continuously exposed group).
The exposure level categories correspond to those defined by Lee and Fraumeni (8): the "heavy" exposure group consists of individuals with one or more years exposure at a worksite associated with "heavy" exposure (385 individuals); the "medium" exposure group consists of those individuals not in the "heavy" group with one or more years exposure at a worksite associated with "heavy" or "medium" exposure (1621 individuals); the remainder made up the "light" exposure category (6008 individuals). We examined the attributable lung cancer mortality risk per person-year, i.e., (observed number − expected number)/person-years at risk, as a function of the variables of interest: age at initial exposure, duration of exposure, time since cessation of exposure, and concentration level of exposure. Since the categorizations of these variables are related to one another (age at initial exposure and duration of exposure, as well as exposure duration and follow-up time, are highly negatively associated, and exposure duration is positively associated with exposure level), the excess cancer rates for one variable should be standardized for the other variables in order to obtain unconfounded rates. The qualitative examination of these unconfounded rates forms the basis for our deductions concerning the mechanism of action of arsenic as a carcinogen, i.e., the conclusions regarding early and/or late affected stages.
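As a worked example of the attributable risk per person-year, the crude (unstandardized) excess rates can be computed from the group totals reported in the Methods. The counts below are taken from the text; the function name `excess_rate` and the observed/expected ratio printed alongside are illustrative additions, and these crude figures are not the standardized rates of the later tables.

```python
def excess_rate(observed, expected, person_years):
    """Attributable risk per person-year: (observed - expected) / person-years."""
    return (observed - expected) / person_years

# (observed deaths, expected deaths, person-years at risk), from the text.
groups = {
    "continuously exposed": (70, 22.7, 85273),
    "stopped exposure": (69, 21.6, 47517),
}

for name, (obs, exp, py) in groups.items():
    rate = excess_rate(obs, exp, py)
    print(f"{name}: O/E = {obs / exp:.2f}, "
          f"excess = {rate * 1e4:.2f} per 10,000 person-years")
```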
Since the estimated excess lung cancer rates in Tables 3 and 4 are based on substantially different person-years at risk (a number of the cells in these tables have no person-years), we used the indirect method of standardization based on standard rates derived from the data itself. Our procedure is a direct extension of that proposed by Mantel and Stark (14) and is described in detail in the statistical appendix. Since this procedure does not provide for statistical hypothesis tests of interest (e.g., homogeneity or equality of the adjusted rates, or trend of the rates over the categorization), a statistical methodology for the analysis of excess cancer risk has been developed and is also described in the statistical appendix. This methodology is shown in the appendix to correspond to the familiar Mantel-Haenszel methods (15,16) for the analysis of overall cancer risk.
In addition to these qualitative methodologies, these data are also examined quantitatively by fitting [... (17)]. For these analyses, the observed numbers of lung cancers given in Tables 3 and 4 [...] by weighted averages, weighted by the person-years at risk within each cell.

Results
For individuals continuously exposed to arsenic, the multistage theory indicates that the excess risk of lung cancer mortality will be an increasing function of exposure duration and concentration, and may or may not depend upon the age at which exposure began. Therefore, the indirect adjustment method was employed on the data in Table 3 to provide excess lung cancer mortality rates for each of the three factors of interest (age at initial exposure, duration of exposure, level of exposure) adjusted for the possible confounding effects of the other two factors. These unconfounded rates are given in Table 5. As these results show, the excess lung cancer mortality is an increasing function of all three factors, and their associated test statistics, given in Table 5, are highly significant (one degree of freedom chi-square tests for trend are 13.2 for exposure level, 49.1 for exposure duration, and 13.1 for age at initial exposure). The excess lung cancer mortality rates for age at initial exposure adjusted for duration and level of exposure range from $0.76 \times 10^{-4}$, for those who started employment at this smelter before 20 years of age, to $71.1 \times 10^{-4}$, for those who started at or beyond 50 years of age. The adjusted excess mortality rates for duration of exposure show a similar increasing pattern, ranging from $0.22 \times 10^{-4}$ for a duration of less than 10 years to $51.1 \times 10^{-4}$ for a duration of 40 years or greater. The categories for level of exposure also show an increasing gradient of excess mortality: $3.37 \times 10^{-4}$ for the light category, $9.54 \times 10^{-4}$ for the medium category, and $15.5 \times 10^{-4}$ for the heavy category. It should be noted that the individuals in the medium and heavy categories were not continuously exposed to medium and heavy levels, but may have been exposed for part of their employment to arsenic levels lower than their category definition.
Therefore, the differences in excess mortality rates for this factor do not reflect the actual differences among the three exposure levels. Figure 1 depicts these relationships of excess mortality to the age at initial exposure and duration of exposure. Figure 1 shows the observed cumulative excess mortality for all exposure level categories combined as a function of exposure duration in 10 year increments for four of the age at initial exposure categories. Only those points showing a positive excess mortality are shown. The 40-49 age category is only shown up through 30 years duration since there are few person-years at risk beyond that point.
We also used GLIM to fit the actual functional forms of the excess cancer risk predicted by the multistage theory [Eqs. (2) and (3)] to these data in Table 3 using the average exposure duration and age began exposure for each cell. The early-stage effect given in Eq. (2) resulted in a likelihood chi-square goodness-of-fit of 68.8, while the late-stage effect of Eq. (3) gave 55.0, both with 62 = 66 − 4 degrees of freedom. Clearly, the late-stage model provides the better fit. Since the excess cancer risk is seen to be an increasing function of age at initial exposure, we conclude that arsenic does not exert its carcinogenic influence solely at the first stage, but appears to act primarily at a late stage in the transformation process. However, on the basis of these data alone, we cannot rule out the possibility that arsenic may influence both an early and late stage of the process.
For individuals whose exposure had stopped, i.e., those who terminated their employment, the multistage theory predicts that their excess cancer risk after cessation of employment will be an increasing function of exposure duration and concentration, and will also be independent of age started exposure (if only the first stage is affected) or follow-up time since exposure stopped (if only the penultimate stage is affected). The indirect adjustment method was applied to the data in Table 4 to produce excess lung cancer mortality rates for each of these four factors adjusted for the possible confounding effects of the other three factors. These unconfounded rates are given in Table 6. These results show that the excess lung cancer mortality is an increasing function of all four factors, and their associated test statistics, given in Table 6, are significant (one degree of freedom chi-square tests for trend are 12.0 for exposure level, 35.5 for exposure duration, 4.5 for age began exposure, and 8.2 for time since exposure stopped). The adjusted excess lung cancer mortality rates for age at initial exposure range from $3.37 \times 10^{-4}$, for those who started employment before 20 years of age, to $28.2 \times 10^{-4}$, for those who started between 40 and 49 years of age. These rates for duration of exposure range from $2.46 \times 10^{-4}$, for a less than 10 year duration, to $76.3 \times 10^{-4}$, for individuals having 40 or more years duration. The rates for time since exposure stopped range from $7.06 \times 10^{-4}$, for a period of less than 10 years, to $20.0 \times 10^{-4}$, for a period of 20 or more years since exposure stopped. The level of exposure factor also shows increasing adjusted excess mortality rates: $7.33 \times 10^{-4}$, $19.7 \times 10^{-4}$, and $27.8 \times 10^{-4}$ for the light, medium, and heavy categories, respectively.
As before, we used GLIM to fit the functional forms for excess cancer risk in Eqs. (4) and (5) to the data in Table 4. This GLIM analysis produced likelihood goodness-of-fit chi-square statistics of 105.6 by using Eq. (4) (only an early-stage effect), and 102.2 by using Eq. (5) (only a late-stage effect), each statistic having 140 = 144 -4 degrees of freedom. Thus, the late-stage model again provides a slightly better fit but does not clearly discriminate between the two hypotheses. Since the results in Table 6 show the excess risk of death from lung cancer to be associated with both age at initial exposure and time since exposure ended, we conclude that arsenic does not appear to exert its influence at only the penultimate stage, but may also have an effect at an early stage. A comparison of the adjusted rates in Table 6 shows that the effect of age at initial exposure upon the excess lung cancer mortality risk appears to be more pronounced than the effect of time since exposure stopped, indicating that the late stage effect of arsenic may be greater than the effect on an early stage of the process.
Since these data do not clearly agree with a hypothesis of arsenic acting solely at a late stage of the transformation process, we used GLIM to examine the hypothesis that arsenic acts at both the first and penultimate stages, but at possibly different magnitudes relative to background. Under this hypothesis, the multistage theory predicts that the excess age-specific cancer risk at age $t = t_0 + d + f$ is given in Eq. (6):

$$E(t) \propto c\left\{ r_1\left[(d + f)^{k-1} - f^{k-1}\right] + r_{k-1}\left[(d + t_0)^{k-1} - t_0^{k-1}\right] + c\,r_1 r_{k-1} d^{k-1} \right\} \qquad (6)$$

where, as before, $c$ is the exposure concentration, $t_0$ the age at initial exposure, $d$ the duration of exposure, and $f$ the follow-up time since exposure stopped (which equals 0 for continuously exposed individuals). The constants $r_1$ and $r_{k-1}$ represent the increased transition rate of the first and penultimate stages relative to the background rates. Therefore, the ratio $r_1/r_{k-1}$ represents a measure of the magnitude of the carcinogenic effect on the first stage relative to the effect on the penultimate stage.
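How the mixed model behaves as the first-stage effect is switched on can be sketched as follows. The functional form used here is the one implied by the Armitage-Doll approximation when both the first and penultimate stages are affected (constant factors omitted); the function name `excess_mixed`, the choice k = 6, and all parameter values are hypothetical and are not the fitted values of this study.

```python
def excess_mixed(c, r1, rk1, t0, d, f, k):
    """Excess risk, up to a constant, with first and penultimate stages affected."""
    first = r1 * ((d + f) ** (k - 1) - f ** (k - 1))          # first-stage term
    penult = rk1 * ((t0 + d) ** (k - 1) - t0 ** (k - 1))      # penultimate-stage term
    both = c * r1 * rk1 * d ** (k - 1)                        # joint term
    return c * (first + penult + both)

k, c, t0, d = 6, 1.0, 20, 10   # hypothetical values

# Ratio r1 / r_{k-1} = 0 (pure late-stage) versus 1 (equal relative effects).
for ratio in (0.0, 1.0):
    rk1 = 1.0
    r1 = ratio * rk1
    values = [excess_mixed(c, r1, rk1, t0, d, f, k) for f in (0, 10, 20)]
    print(f"r1/r_(k-1) = {ratio}: excess at f = 0, 10, 20 ->", values)
```

With a ratio of 0 the printed values do not change with follow-up time, while any positive ratio makes them grow with follow-up; this is the feature of the data that the ratio is meant to capture.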
To obtain an estimate of this relative effectiveness for arsenic, we used GLIM to fit Eq. (6) to all the data combined in both Tables 3 and 4. The results for different ratios of $r_1/r_{k-1}$ are shown in Table 7. Table 7 also shows that the best fit to all the data occurs for the model which assumes that arsenic affects the first and penultimate stages equally relative to their background occurrences, i.e., $r_1/r_{k-1} = 1.0$. However, Table 7 also shows that the model for which arsenic is a pure promoter, i.e., affects only the penultimate stage, does not fit these data significantly worse than the mixed effect model. These data (only 94.8 = 139 − 44.2 excess lung cancers) are apparently too limited to clearly discriminate between the mixed effect and pure effect hypotheses. Therefore, under this multistage model of carcinogenesis, we conclude: (1) arsenic does not act solely at the first stage of the process; and (2) arsenic does act at a late stage, but may act at an earlier stage as well.
The final objective of this analysis is to use the preceding results to assess the potential lifetime risk to an individual exposed to arsenic in a manner similar to this cohort of workers. To provide quantitative estimates of lifetime risk, we have used the results of the GLIM analysis which provide estimates of the parameters in Eq. (6), specifically the exposure concentration constants $c$, one for each exposure level category, and the estimated exponent $k$. This analysis produced maximum likelihood estimates of $k = 6.5$ (which corresponds closely to the estimate of $k = 6.6$ when fitting the background model in Eq. (1) to the U.S. age-specific lung cancer mortality for the period 1940-1965), and concentration constants as follows: for the model assuming only a penultimate stage effect, the constants ($\times 10^{13}$) are 2.42, 6.13 and 9.81 for the light, medium and heavy exposure categories, respectively; for the model assuming an equal first and penultimate stage effect, i.e., $r_1 = r_{k-1}$ in Eq. (6), the constants ($\times 10^{13}$) are 2.17, 5.33 and 8.88, respectively. Table 8 shows the estimated lifetime risk of dying from lung cancer, up to age 80, for a hypothetical worker beginning exposure at age 20 and continuing his exposure for 10-40 years, after which his arsenic exposure would cease. These lifetime risks are adjusted for competing risks from other causes of death as described by Gail (18). The data were grouped into 5-year age intervals, and the background mortality from lung cancer and all other causes were based on U.S. white males for the calendar period 1950-1954, selected as representing a central point for the cohort being studied. Table 8 shows quantitatively how lifetime risk depends upon the exposure concentration, duration of exposure, and upon the presumed mechanism of action. For 10 years exposure duration to a heavy concentration, the lifetime risk is almost doubled (0.046 vs. 0.027) if an effect at the first stage of the carcinogenic process is added to an effect at the penultimate stage.

Table footnotes: chi-square test statistic for trend; ratio of risk to exposed individual/risk to unexposed individual (lifetime risk for unexposed = 0.022).
The multistage theory predicts that if only the penultimate stage is affected, then once exposure stops, the excess risk remains constant at the level attained before termination [see Eq. (5)]. However, if the first stage is also affected, then the excess risk after exposure termination first begins to level out, but then turns around to increase once again as the excess initiated cells move through the remaining stages. As seen in Table 8, the impact of inclusion of a first stage effect can be substantial, especially if the exposure duration is relatively short.
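A lifetime risk adjusted for competing causes of death can be approximated with a simple life-table calculation. The sketch below uses a standard piecewise-constant-hazard approach, which may differ in detail from the procedure of Gail (18); the function name `lifetime_risk` and every hazard value are made up for illustration and are not the U.S. rates used in this analysis.

```python
import math

def lifetime_risk(cause_hazards, other_hazards, width=5.0):
    """Probability of dying of the cause of interest over consecutive age
    intervals of the given width, allowing for competing mortality.
    Hazards are per person-year and assumed constant within each interval."""
    survival = 1.0  # probability of being alive at the start of the interval
    risk = 0.0
    for h_cause, h_other in zip(cause_hazards, other_hazards):
        h_total = h_cause + h_other
        if h_total > 0:
            p_die = 1.0 - math.exp(-h_total * width)
            # Share of deaths in this interval attributed to the cause of interest.
            risk += survival * p_die * h_cause / h_total
            survival *= 1.0 - p_die
    return risk

# Hypothetical hazards for 5-year age intervals from age 20 up to age 80.
ages = range(20, 80, 5)
lung = [1e-13 * (a + 2.5) ** 5 for a in ages]              # made-up background
other = [1e-3 * math.exp(0.08 * (a - 20)) for a in ages]   # made-up competing risk

baseline = lifetime_risk(lung, other)
doubled = lifetime_risk([2.0 * h for h in lung], other)
print(f"baseline lifetime risk: {baseline:.4f}")
print(f"lifetime risk with lung hazard doubled: {doubled:.4f}")
```

Because competing mortality removes people before they can die of lung cancer, doubling the cause-specific hazard at every age raises the lifetime risk by somewhat less than a factor of two.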

Discussion
This methodology, based on a multistage theory of carcinogenesis, allows interpretations of the mechanism of action to be made concerning the findings of an epidemiologic study by focusing on the excess carcinogenic risk attributable to the carcinogen in question. The patterns of these excess risks as they depend upon exposure duration, age at initial exposure, and time since exposure stopped, may identify which stage(s) of the carcinogenic transformation process are affected.
In our examination of a cohort study of smelter workers exposed to arsenic, the results are compatible with a mechanism of action that involves arsenic acting at a late stage of the process. In addition, we have concluded that there may be a contribution to an early stage effect, but the data are too limited to clarify this hypothesis. The evidence that arsenic acts at a late stage in the carcinogenic process is provided by the relationship of increasing excess lung cancer mortality risk with increasing age at initial exposure. If arsenic is truly a late-stage carcinogen, then older individuals would be at greater risk, since they have had time to accumulate more cells in the earlier stages of the process, such cells being particularly susceptible to the carcinogenic action of arsenic. This relationship between age at initial exposure and excess cancer risk is also seen for nasal sinus cancers in workers exposed to nickel (19-21). Evidence for a predominantly late stage effect of arsenic is also seen in the results of the study of Pinto et al. (22) of retired smelter workers. After retirement, the relative risk of respiratory cancer death was found to decrease with increasing age, or time since retirement. As shown by Day and Brown (10), this is consistent with a late stage carcinogenic effect.
These data are not consistent with the hypothesis that arsenic acts during the promotion and growth period of the carcinogenic process, if we believe that promotion and growth is reversible. As an example of reversibility, in a study of DMBA-induced, PMA-promoted skin tumorigenesis, Burns et al. (23) found regression of tumors after termination of the promoting agent. However, the epidemiologic data studied here do not show such regression since the excess lung cancer mortality risk remains even after 20 years since the carcinogenic exposure presumably ceased (see Table 6). This was also found by Pinto et al. (22) in their study of retired smelter workers. This continuance of excess mortality risk after termination of exposure indicates that arsenic does not appear to affect the carcinogenic process in a reversible manner, but rather may have an irreversible effect on a stage of the cellular transformation process. In addition, the relatively long latency periods observed among individuals in this cohort also argue against an effect on the promotion and growth process. In animal experiments of the initiation-promotion effect, this latency period is often small, and most tumors appear within a short time period since initiation of exposure to the promoting agent. However, these data cannot conclusively rule out an effect on the promotion and growth process. In this instance, the growth period may be much longer than this multistage theory hypothesizes and may not be reversible. The observed relationships of excess lung cancer mortality associated with duration of exposure and time since exposure termination (Tables 5 and 6) are also consistent with an effect on a slowly evolving growth process which simply slows down when exposure is terminated (note the difference in slopes of the excess mortality with respect to time).
However, our mechanistic conclusions based on the results of our analysis of this cohort study may be biased by the lack of important relevant information. Since no information on arsenic exposure prior to this employment was obtained for the cohort, the relationship of increasing excess risk with later age at initial employment may be biased by previous exposure to arsenic or other substances. These workers who started employment at older ages may have built up a prior exposure history dependent upon the age at which they started this particular employment. Thus, we cannot rule out the possibility that this finding is actually due to an increased duration of prior exposure. Therefore, in any future studies of occupational carcinogenesis, it is important to obtain a complete work history to determine any exposure prior to the particular employment being studied.
This lack of exposure information is also important to the examination of excess risk after exposure ceases, i.e., employment is terminated. Our findings indicate a moderately increasing risk with time since employment ended, but this also may be biased by individuals who have continued their arsenic exposure at another work site. Of the 69 lung cancers found after ending employment at this smelter, 19 individuals left work before the age of 50 and had at least five years follow-up before their death. Since they were young enough to continue employment, they may well have continued at another work site associated with arsenic or other substances, thus biasing the observed relationship.
An additional piece of important information concerns the smoking history of these men. Smoking histories were not available for the individuals in this cohort. Therefore we could not directly adjust our analysis for the effect of cigarette smoking. However, we did examine this issue in the following indirect manner. Since the cigarette smoking habits of U.S. men have undergone substantial changes in the past, the likelihood of an individual being a cigarette smoker is associated with his calendar year of birth. Therefore we recomputed the indirectly adjusted excess lung cancer mortality rates in Tables 5 and 6 by adjusting for calendar year of birth in addition to the other factors of interest. The calendar year of birth categories we used are <1880, 1880-1889, 1890-1899, 1900-1909, and >1910. The numbers of individuals, person-years at risk, observed and expected numbers of lung cancer deaths, and indirectly adjusted excess lung cancer mortality rates for these categories are shown in Table 9 for both the continuously exposed and stopped exposure groups. Table 9 shows that the excess lung cancer mortality rates, adjusted for the other factors, increase with later calendar years of birth up to the last category, birth year >1910, at which point the adjusted rates drop because few cancer deaths have been observed among these individuals, who have not yet reached the ages at which lung cancer mortality becomes more prevalent. This calendar year of birth trend in the adjusted lung cancer mortality is compared with the trend in background lung cancer mortality for U.S. males in age groups 55-59 and 60-64 (13) in Figure 2. This figure shows that the trends in background and excess lung cancer mortality are very similar, indicating that the factor, presumably cigarette smoking, affecting the background mortality is affecting the excess mortality in a similar manner.
Adjusting for calendar year of birth had little effect on the adjusted excess lung cancer mortality rates for the other factors of interest. The effect on arsenic exposure category was unchanged, while the effect on age at initial exposure and duration of exposure was to make the relationships slightly more pronounced. Therefore, this indirect examination of the possible confounding effects of cigarette smoking indicates that smoking may not be a major confounder of our analysis. However, we emphasize that this examination is indirect and, as such, does not provide conclusive evidence of the lack of a confounding effect. In their studies of smelter workers exposed to arsenic, Rencher et al. (24) and Pinto et al. (22) concluded that smoking did not seriously confound their results; however, the effect of smoking as a confounding factor in our particular analysis cannot be completely ruled out.
As noted in other epidemiologic studies of smelter workers exposed to arsenic, the role of other atmospheric contaminants is difficult to measure. The original Lee and Fraumeni (8) study of these data could not distinguish the influence of arsenic from other agents, most notably sulfur dioxide, which were correlated with the levels of arsenic in the smelting process. Therefore, due to this high correlation, we cannot categorically state that arsenic alone is the active carcinogenic agent.
Presently, arsenic trioxide and related compounds are the only suspect human carcinogens which have not clearly been shown to be carcinogenic in experimental animals. In over 20 animal experiments, arsenic by itself or in combination with known carcinogens was not found to be carcinogenic (25). Of these studies, four were cocarcinogenesis studies involving trivalent arsenic. Baroni et al. (26) studied arsenic trioxide administered in drinking water to Swiss mice after initiation with either a single skin application of DMBA or gavage administration of urethane. Histopathology was performed on the major organs, including the lung, the target site of the occupational carcinogenesis studies. Urethane, by itself, was shown to induce lung adenomas, but no cocarcinogenic activity of arsenic was found. Unfortunately, all the animals treated with both urethane and arsenic trioxide died before the 50th week. Thus, the exposure to arsenic may not have been of sufficient duration to adequately study its presumed promotional or late stage effect. Boutwell (27) found no cocarcinogenic activity of potassium arsenite in skin carcinogenesis after a single skin application of DMBA. Sanderson (28) and Milner (29) also studied the cocarcinogenic effects of arsenic in skin carcinogenesis with similarly negative results.
The evidence that arsenic compounds produce mutational effects in bacteria is inconclusive. However, these compounds have been shown to induce chromosomal aberrations in mammalian cells (25). In addition, a number of epidemiologic studies have shown an increased incidence of chromosomal aberrations in patients treated with arsenical compounds (30-32) and in workers occupationally exposed in a smelter environment (33,34). Brusick (35) states that chromosomal aberrations are manifestations of DNA damage and genotoxicity. Thus, these findings provide evidence that arsenic may act during the cellular transformation period of the carcinogenic process.

Statistical Appendix

Computation of Adjusted Rates
We wish to compare the excess mortality rates among the different categories of one factor while adjusting for possible confounding by other factors. To accomplish this, we use an extension of the method of Mantel and Stark (14) for the computation of indirect-adjusted rates. This method is based on an iterative procedure in which the "standard" rates used in the indirect adjustment are derived internally from the data, since no relevant external standard exists. In the case of overall, as opposed to excess, mortality or morbidity rates, this method has been shown to give maximum likelihood estimates of the rates in a product model (36,37). In our situation we use the method to estimate excess mortality rates, i.e., the difference in rates between observed and expected (based on some external standard).
For a three-factor situation, the methodology can be described in mathematical notation as follows. Let $O_{ijk}$ denote the observed number of responders, $E_{ijk}$ the expected number of responders (based on an external standard such as the age, race, sex, calendar period-specific U.S. mortality rates for the disease of interest), and $N_{ijk}$ the person-years at risk in the $i$th category of factor 1, the $j$th category of factor 2, and the $k$th category of factor 3. Also let $R = \sum (O_{ijk} - E_{ijk}) / \sum N_{ijk}$, with both sums taken over all cells $(ijk)$, denote the crude overall excess mortality rate per person-year, and let $R_{1i}$ ($i = 1, \ldots, I$), $R_{2j}$ ($j = 1, \ldots, J$), and $R_{3k}$ ($k = 1, \ldots, K$) denote the adjusted excess rates for each of the three factors, respectively.
In the case of three factors, the indirect-adjustment procedure consists of a series of three sets of computations, the series being performed iteratively until each set of rates stabilizes, i.e., when each combination of two sets implies the third set and no further changes would be made. We begin this series of computations by obtaining the rates for factor 1 adjusted for the other two factors (we must assume some set of initial rates for the adjustment factors; we use the crude rates but any rates will suffice). The adjusted rate for the $i$th category of factor 1 is given by

$$R_{1i} = R \, \frac{D_{i\cdot\cdot}}{X_{i\cdot\cdot}}, \qquad D_{i\cdot\cdot} = \sum_j \sum_k D_{ijk},$$

where $D_{ijk} = O_{ijk} - E_{ijk}$ represents the excess number of responders in category $(ijk)$. Thus $R_{1i}$ is simply the crude excess rate multiplied by the ratio of the actual excess number of responders, $D_{i\cdot\cdot}$, to the expected excess, $X_{i\cdot\cdot}$, based on the rates $R_{2j}$ and $R_{3k}$ for the other two factors.
The next computation in this series is to compute the adjusted rates for factor 2 in the same way, $R_{2j} = R \, D_{\cdot j \cdot} / X_{\cdot j \cdot}$, where the expected excess $X_{\cdot j \cdot}$ is based on the rates $R_{1i}$ and $R_{3k}$, and the $R_{1i}$, $i = 1, \ldots, I$, are derived from the first step. The final computation in the series is to compute the adjusted rates for factor 3, which are based on the rates computed in the previous two steps. In order to ensure that the adjusted rates are nonnegative, any adjusted rate estimated to be negative during any step in this series is set equal to zero. Upon convergence, which we define as the maximum relative change in rates being smaller than some small critical value (a negative power of 10), the rates are dependent upon the assumed initial set of rates and may not imply a total expected number of excess deaths that is equal to the total observed. Following Mantel and Stark, we make a final adjustment by multiplying each rate by a correction factor of $\sum O / \sum E$, where $\sum O = \sum_i \sum_j \sum_k D_{ijk}$ denotes the total observed excess deaths and $\sum E$ denotes the total expected excess deaths implied by the convergent rates. The total expectation implied by the factor 1 rates is $(\sum E)_1 = \sum_i \sum_j \sum_k R_{1i} N_{ijk}$, that implied by the factor 2 rates is $(\sum E)_2 = \sum_i \sum_j \sum_k R_{2j} N_{ijk}$, and similarly for $(\sum E)_3$. This final modification ensures that each set of adjusted rates implies a total expectation equal to the total observed. These are the rates shown in Tables 5, 6 and 9.
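The iterative adjustment described above can be sketched in Python as follows. This is a minimal illustration, not the authors' program. The function name `mantel_stark_excess`, the nested-list data layout, and the default tolerance are hypothetical. In particular, the form used for the expected excess in a category (person-years times the product of the other two factors' rates, divided by the crude rate) is an assumption about how the product model is scaled; the text states only that the expected excess is based on the other two sets of rates.

```python
from itertools import product


def mantel_stark_excess(O, E, N, tol=1e-8, max_iter=1000):
    """Iteratively adjusted excess rates for three factors (illustrative sketch).

    O, E, N are nested lists indexed [i][j][k]: observed deaths, expected
    deaths from an external standard, and person-years at risk.
    Assumes the crude overall excess rate is positive.
    Returns [R1, R2, R3], one list of adjusted excess rates per factor.
    """
    sizes = (len(N), len(N[0]), len(N[0][0]))
    cells = list(product(*(range(n) for n in sizes)))
    D = {(i, j, k): O[i][j][k] - E[i][j][k] for i, j, k in cells}  # excess deaths
    PY = {(i, j, k): N[i][j][k] for i, j, k in cells}              # person-years
    total_excess = sum(D.values())
    crude = total_excess / sum(PY.values())  # crude overall excess rate R
    rates = [[crude] * n for n in sizes]     # initial rates: the crude rate

    for _ in range(max_iter):
        max_change = 0.0
        for f in range(3):
            g, h = [x for x in range(3) if x != f]  # the two adjustment factors
            updated = []
            for level in range(sizes[f]):
                own = [c for c in cells if c[f] == level]
                actual = sum(D[c] for c in own)
                # ASSUMPTION: expected excess under a product model scaled by R.
                expected = sum(PY[c] * rates[g][c[g]] * rates[h][c[h]]
                               for c in own) / crude
                new = crude * actual / expected if expected > 0 else 0.0
                new = max(new, 0.0)  # negative estimates are set to zero
                old = rates[f][level]
                change = abs(new - old) / max(abs(old), 1e-300)
                max_change = max(max_change, change)
                updated.append(new)
            rates[f] = updated  # later factors use the freshly updated rates
        if max_change < tol:
            break

    # Final correction: rescale each set of rates so that the total excess it
    # implies equals the total observed excess.
    for f in range(3):
        implied = sum(rates[f][c[f]] * PY[c] for c in cells)
        if implied > 0:
            rates[f] = [r * total_excess / implied for r in rates[f]]
    return rates
```

With a balanced 2 x 2 x 2 layout in which every cell has the same person-years and the excess deaths follow an exact product pattern, the sketch settles after two passes and the corrected rates for each factor reproduce that factor's marginal excess rates.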

Tests for Equality of Adjusted Rates
The Mantel-Stark method is useful for estimating the excess rates for each factor adjusted for the other factors, but does not provide a means for any hypothesis test, such as equality or significance of a trend. The following section proposes such tests based on score statistics obtained from the assumed Mantel-Stark product model for the excess risk.
Assume we wish to test the equality or trend of the excess rates for one factor having $I$ categories, adjusted for, or stratified by, another factor having $J$ categories. Note that this adjustment or stratification factor may be a combination of more than one factor (e.g., two adjustment factors having 5 and 3 categories, respectively, would produce one combination factor with 5 x 3 = 15 categories). Let $O_{ij}$, $E_{ij}$, and $N_{ij}$ denote the observed number of responders, the expected number of responders, and the person-years at risk for category $i$ of the factor of interest and category $j$ of the stratification factor. Then, under an assumed product model for excess risk, $O_{ij}$ is distributed as a Poisson variate with mean $E_{ij} + N_{ij}\exp\{\alpha_i + \beta_j\}$, where $\exp\{\alpha_i\}$, $i = 1, \ldots, I$, are the excess rates for the factor of interest, and $\exp\{\beta_j\}$, $j = 1, \ldots, J$, the rates for the stratification factor. One null hypothesis of interest is $H_0: \alpha_i = \alpha$, $i = 1, \ldots, I$, i.e., the excess risk does not depend upon the factor of interest. The unknown constant risk, $\exp\{\alpha\}$, can be absorbed into the $\beta_j$ since we have assumed a product model; therefore, without any loss of generality, this null hypothesis can also be considered as $H_0: \alpha_i = 0$. We propose to test this hypothesis by score statistics (38,39), $S(\alpha_i)$, $i = 1, \ldots, I$. Since the $S(\alpha_i)$ are linearly dependent, i.e., $\sum_i S(\alpha_i) = 0$, a test of the null hypothesis can be based on the asymptotic chi-square statistic $S_{I-1}^{T} \, (\Sigma^{*}_{I-1})^{-1} \, S_{I-1}$, where $S_{I-1}$ is a vector of any $I - 1$ elements of $S(\alpha_i)$, $i = 1, \ldots, I$, and $\Sigma^{*}_{I-1}$ is the corresponding covariance matrix obtained from $V^{*}_{\alpha\alpha}$. Under the null hypothesis, this statistic is distributed as a central chi-square variate with $I - 1$ degrees of freedom.
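As a sketch of where the score statistics come from, the following derivation works directly from the Poisson product model stated above. It is a reconstruction from that model rather than a transcription of the authors' displayed equations, and the shorthand $\mu_{ij}$ for the Poisson mean is introduced here for convenience.

```latex
\begin{align*}
\mu_{ij} &= E_{ij} + N_{ij}\exp\{\alpha_i + \beta_j\}, \\
\log L &= \sum_i \sum_j \left[ O_{ij}\log\mu_{ij} - \mu_{ij} - \log O_{ij}! \right], \\
S(\alpha_i) &= \left.\frac{\partial \log L}{\partial \alpha_i}\right|_{\alpha = 0}
  = \sum_j N_{ij}\exp\{\beta_j\}\,
    \frac{O_{ij} - E_{ij} - N_{ij}\exp\{\beta_j\}}{E_{ij} + N_{ij}\exp\{\beta_j\}}, \\
\sum_i S(\alpha_i) &= \sum_j \left.\frac{\partial \log L}{\partial \beta_j}\right|_{\alpha = 0} = 0
  \quad \text{when each } \beta_j \text{ is at its maximum likelihood estimate.}
\end{align*}
```

The last line shows why the $I$ scores are linearly dependent: summing the scores over $i$ within a stratum gives the likelihood equation for that stratum's $\beta_j$, which is zero at the maximum likelihood estimate.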
However, to compute this statistic, the $\beta_j$ must first be computed from the first partial derivative of the log likelihood under the null hypothesis,

$$\frac{\partial \log L}{\partial \beta_j} = \exp\{\beta_j\} \left[ \sum_i \frac{O_{ij} N_{ij}}{E_{ij} + N_{ij}\exp\{\beta_j\}} - N_{\cdot j} \right],$$

where $N_{\cdot j} = \sum_i N_{ij}$. Setting this derivative to zero leads to the equation

$$\exp\{\beta_j\} = \sum_i \left[ \frac{O_{ij} N_{ij}}{E_{ij}\exp\{-\beta_j\} + N_{ij}} \right] \Big/ N_{\cdot j},$$

which can be solved iteratively for $\beta_j$ by evaluating the right-hand side for some initial value of $\beta_j$, solving for a new $\beta_j$, and continuing the iteration until convergence. The value of $\beta_j$ obtained by this procedure will be the maximum likelihood estimate of $\beta_j$ and can be used in the evaluation of the chi-square test statistic.
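The fixed-point iteration for a single stratum can be sketched as below. The function name `fit_stratum_rate`, the starting value (the stratum's crude excess rate, floored at a small positive number), and the stopping rule are illustrative assumptions, not details given in the text. The sketch works on the rate scale, returning $\exp\{\beta_j\}$ rather than $\beta_j$.

```python
def fit_stratum_rate(O, E, N, tol=1e-10, max_iter=500):
    """Estimate exp(beta_j) for one stratum under the null hypothesis.

    O, E, N are lists over the categories i of the factor of interest:
    observed deaths, expected deaths, and person-years within stratum j.
    Repeats rate <- (1 / N_total) * sum_i O_i * N_i / (E_i / rate + N_i)
    until the relative change falls below tol.
    """
    n_total = sum(N)
    # ASSUMPTION: start from the crude excess rate, kept strictly positive.
    rate = max((sum(O) - sum(E)) / n_total, 1e-12)
    for _ in range(max_iter):
        new = sum(o * n / (e / rate + n) for o, e, n in zip(O, E, N)) / n_total
        if new <= 0.0:
            return 0.0  # no positive excess rate is supported by the data
        if abs(new - rate) <= tol * rate:
            return new
        rate = new
    return rate  # not converged within max_iter; caller may wish to check
```

When every expected count is zero, the update returns the total observed deaths divided by the total person-years after a single pass, which is the closed-form estimate for overall mortality. If the data do not support a positive excess rate, the iterates shrink toward zero and the value returned after `max_iter` passes should be read as approximately zero.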
It should be noted that when this score statistic is applied to overall mortality rather than excess mortality, i.e., by treating $E_{ij} = 0$, then $\exp\{\hat{\beta}_j\} = \sum_i O_{ij} / \sum_i N_{ij} = O_{\cdot j}/N_{\cdot j}$ and $S(\alpha_i) = \sum_j (O_{ij} - N_{ij} O_{\cdot j}/N_{\cdot j})$, the Mantel-Haenszel deviation between the observed and expected numbers of responders summed over the stratification factor. The only difference between this score statistic method and the Mantel-Haenszel approach is in the calculation of the covariance matrix, where the score statistic method uses $N_{\cdot j}^3$ and the Mantel-Haenszel method uses $N_{\cdot j}^2 (N_{\cdot j} - 1)$ in the denominator of the summed covariance terms (40). Therefore, a one degree of freedom chi-square test for trend can be based on the statistic $S^2(\alpha)/\sigma^2$.
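For the overall-mortality special case, the deviation for each category is simply observed deaths minus deaths allocated in proportion to person-years within each stratum, summed over strata. A minimal sketch follows; the function name `mh_deviations` and the `[i][j]` list layout are hypothetical, and the covariance terms needed for the chi-square statistic are not computed here.

```python
def mh_deviations(O, N):
    """Observed-minus-expected deviations summed over strata.

    O[i][j] and N[i][j] are observed deaths and person-years for category i
    of the factor of interest within stratum j (expected counts taken as 0).
    Returns a list with S_i = sum_j (O_ij - N_ij * O_.j / N_.j).
    """
    n_cat, n_strata = len(O), len(O[0])
    deviations = [0.0] * n_cat
    for j in range(n_strata):
        deaths = sum(O[i][j] for i in range(n_cat))        # O_.j
        person_years = sum(N[i][j] for i in range(n_cat))  # N_.j
        for i in range(n_cat):
            deviations[i] += O[i][j] - N[i][j] * deaths / person_years
    return deviations


# Two categories, two strata, equal person-years within each stratum.
example = mh_deviations([[4, 6], [10, 20]], [[1000, 2000], [1000, 2000]])
print(example)  # [-10.0, 10.0]
```

The deviations sum to zero across categories, which is the linear dependence that leaves only $I - 1$ degrees of freedom for the test of equality.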