A RASH analysis of National Toxicology Program data: predictions for 30 compounds to be tested in rodent carcinogenesis experiments.

Relative potencies for 30 compounds scheduled for carcinogenic testing in the 2-year rodent bioassays were estimated based on comparisons with a wide variety of bioassay data for benzo[a]pyrene, nicotine, cisplatin, aflatoxin B1, and cyclophosphamide. Potential for oncogenic transformation of each of the compounds was estimated from short-term bioassays. Promoting strength was assigned on the basis of comparisons of the product of relative potency and test dose with the distribution of similar products obtained for 67 common compounds in the database of Gold et al. A potency class for promotion was assigned on the basis of whether the potency-adjusted test dosage was > 2 sigma below the mean, > 1 sigma below the mean, within +/- 1 sigma of the mean, > 1 sigma above the mean, or > 2 sigma above the mean, as determined from the 67 compounds. The underlying hypothesis is that a weak test dose may have a low probability of revealing a potential carcinogen, whereas a strong dose may have a high probability of producing false-positive results. Predictions are therefore directed at the central 68% of the log-normal frequency distribution according to the assumption that +/- 1 sigma represents the ideal test dose.


Introduction
The goal of this analysis is to use a wide variety of existing data from several sources and relative-potency-based models in an effort to predict the outcome of carcinogenic testing of rats and mice in the standard 2-year bioassay as used by the National Toxicology Program (NTP).
Some of the existing bioassay data are used to estimate the compound-specific potential for initiation. For the compound-specific capacity to promote initiated carcinogenic lesions, a large volume of unedited data are compared to matched tests for reference carcinogens comprising benzo[a]pyrene (B[a]P), nicotine, cisplatin, aflatoxin B1, and cyclophosphamide. In addition to predicting carcinogenic outcome of the rodent bioassays, rank order is assigned for the 30 compounds as requested by the organizers from the National Institute of Environmental Health Sciences (NIEHS) and the NTP.
This paper is part of the NIEHS Predictive-Toxicology Evaluation Project. Manuscript received 5 January 1996; manuscript accepted 3 April 1996.
We thank J. Wachsman and D. Bristol of NIEHS for their pleasant attitudes, exceptional patience, and constant willingness to help acquire needed information.
Substances; R, risk; σ, estimated standard deviation of the log-probability distribution; (Slope)B[a]P, risk coefficient for B[a]P.
Tennant et al. (1) published an analysis on "Prediction of the outcome of rodent carcinogenicity bioassays currently being conducted on 44 compounds by the National Toxicology Program." In an editorial in that same issue of Mutagenesis, Parry (2) noted, "Readers will be aware of others who propose alternative methods for the prediction of the carcinogenicity of chemicals. I would like to take this opportunity to open the debate to other contributors." Because we have successfully used a Rapid Screening of Hazard (RASH) chemical-scoring method for a variety of difficult applications (3-7), it was an excellent opportunity to test if the RASH-derived relative-potency estimates could be used for range-finding for the doses actually tested in the 2-year bioassays or if those estimates of relative potency could be used to predict the carcinogenic outcome of the tests.
Since RASH was based on the hypothesis that toxicity-induced compensatory cell proliferation may be a practical index of carcinogenic promotion for toxicological and radiological insults (8,9), there were no considerations of potency for carcinogenic initiation in any of the various RASH applications other than that by Jones and Easterly (10). The analysis of the 44 compounds (10) avoided the modeling of carcinogenic initiation by using the tabulations by Tennant et al. (1).
This effort will again attempt to demonstrate the range-finding utility of the RASH method by using data from the Registry of Toxic Effects of Chemical Substances (RTECS) (11) and short-term bioassay data from the NTP to estimate whether methods described in this exercise can expedite the NTP range-finding study by reducing the number of test animals required to determine the ideal test doses.
Compound-specific relative potency values and the database of Gold et al. (12,13) are used to model test doses considered to be of the proper magnitude to minimize the probability of obtaining false-positive and false-negative test results. This exercise attempts to demonstrate the utility of a simple, rapid, data-rich screening tool and will not resort to careful literature reviews to refine our initial predictions.
For this analysis, a prototype personal computer version of RASH (called CRASH) (5) has permitted more exhaustive relative-potency estimates than were possible from hand calculations (10,14). The relative potency from CRASH will be used in combination with an estimate of initiation potential as based on short-term test results to predict which test doses are too weak to express possible carcinogens, which test doses are so strong that equivocal results or false positives are possible, and which test doses are just right to produce a carcinogenic outcome that is consistent with other NTP test results in the database of Gold et al. (12,13).

Background
The Environmental Monitoring Plan for the U.S. Synthetic Fuels Corporation (15) stated that the corporation should have the burden of justifying the need to monitor specific unrelated substances and of providing threshold values above which those substances must be monitored. Somewhat overlapping in time, the U.S. Air Force sponsored the development of a hazard assessment rating methodology (HARM) that finally became known as the defense priority model (16,17) for site-specific screening of the Installation Restoration Program. Based on observed linear relationships between carcinogenic risk and either cytotoxicity or compensatory cell proliferation for both ionizing radiations (9,18) and carcinogenic chemicals (8), the RASH method was proposed for both of these applications (19).

Method
Because RASH has been documented exhaustively, it will be summarized very briefly with somewhat greater detail given to the differences between the tedious hand calculations associated with the original RASH and the prototype personal computer version used for this application, CRASH. Jones et al. (14,19) present numerical demonstrations of the RASH evaluations.
RASH. The definition of relative potency (RP) as used in RASH is the dose of a reference compound such as B[a]P divided by the dose of an "interviewing" compound (i.e., a compound being evaluated) or insult "i" that causes the same level of response in a common test system, i.e., $\mathrm{RP} = D_{\mathrm{B[a]P}}/D_i$. In the data used in this analysis, the doses are the lowest published values for a positive result in the test of comparison. The RP index is exactly analogous to the relative biological effectiveness factors used to compare various ionizing radiations to a standard such as 250-kVp X rays or, to use another analogy, to an electric motor rated in terms of horsepower.
In this manner, it is possible to compute an equivalent toxic dosage of an interviewing compound about which little is known regarding carcinogenic or human risk in terms of a standard or (20) and Owen and Jones (7). It is desirable for the reference compounds to have been tested extensively in various bioassays so that several relative-potency values can be computed for each new compound of interest. The median of the array of RP values should be a practical estimate of the composite toxicological potency (14). The distribution of RP values and the stability of the median provide useful information about the uncertainty in doses required to cause different biological effects. For most applications, we have used the interquartile range, i.e., the spread between the 25th and the 75th percentiles, as a practical measure of uncertainty due to random errors and variations in experimental design.
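The RP definition and its summary statistics can be illustrated with a minimal sketch. The function names and the dose pairs below are hypothetical and are not taken from RTECS or from the RASH documentation. The quartiles use Python's "inclusive" interpolation, which is only one of several conventions and is an assumption here.

```python
from statistics import median, quantiles


def relative_potency(ref_dose, test_dose):
    """RP = D_ref / D_i for one matched bioassay (same test, same response)."""
    if ref_dose <= 0 or test_dose <= 0:
        raise ValueError("doses must be positive")
    return ref_dose / test_dose


def summarize_rp(matched_doses):
    """Return (median, 25th percentile, 75th percentile) of the RP array.

    matched_doses is a list of (reference_dose, test_dose) pairs, one pair
    per matched bioassay, with both doses in the same units.
    """
    rps = [relative_potency(ref, test) for ref, test in matched_doses]
    q25, _, q75 = quantiles(rps, n=4, method="inclusive")
    return median(rps), q25, q75


# Hypothetical matched doses (reference, interviewing compound).
pairs = [(10.0, 100.0), (5.0, 10.0), (2.0, 2.0), (50.0, 25.0), (8.0, 1.0)]
med, q25, q75 = summarize_rp(pairs)
print(f"median RP = {med}, interquartile range = {q25} to {q75}")
```

A wide interquartile range relative to the median would signal that the matched bioassays disagree about the compound's potency.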
Most calculations that have been based on the RASH method have considered data unselectively from RTECS on mutagenesis, carcinogenesis, reproductive toxicity, tumorigenesis, acute toxicity, and even irritation, although the user could select bioassays considered to be most relevant to carcinogenic risk. From examples shown in previous publications, the different categories of test data for most compounds usually lead to similar distributions of RP values (10,14).
In previous analyses based on RASH, the compound-specific products of RP x Regulatory Benchmark are reasonably constant for a variety of compounds evaluated by similar considerations. Alternatively, the empirical behavior can be viewed as an inverse proportionality between relative potency and permissible exposure (5,20). After extensive testing of the RASH method by six investigators, each using assumptions in accordance with individual professional and academic backgrounds, the RASH process has been found to be quite robust to different users but somewhat less robust to additional test data, especially when RPs were computed from small numbers of previously matched comparisons.
This particular study is the second application of a CRASH program (5). The standardization and simplification required for the Windows version personal computer program are consistent with the previous findings in that significant changes from earlier publications are primarily due to new test data. Generally this imprecision results from user-specific choices or variations associated with standardized algorithms used in the CRASH program. In contrast, however, additional test data for a compound (that has previously been tested only by Ames tests) and one or two relevant tests for acute toxicity may cause estimates to vary by factors of 3 to 10, while compounds evaluated only from a couple of mutagenesis assays (perhaps Ames test results with and without S9 substrate) may change by factors of 100 or 1000.
CRASH. The CRASH code was designed to be as similar to the original RASH method (14) as possible. The goal was to match each bioassay available for the interviewing or test compound with a similar result for one and only one reference compound. Reference compounds included B[a]P as a primary standard and several secondary standards that were sometimes varied from study to study. Whenever a test result for the interviewing compound was matched successfully, the calculation proceeded to the next bioassay without considering whether matches with other secondary standards were possible. Because computers are almost infinitely faster than humans at matching bioassay results and computing relative potency ratios, more comparisons between the interviewing compound and the several reference compounds provide greater accuracy, provided that the results of many bioassays matched to many different primary reference compounds are used correctly. Used incorrectly, an impressive degree of precision is achieved, but the goal for accuracy is not achieved.
It was recognized from the beginning that the RASH method did not necessarily need to match compounds according to their mechanisms of action because the definition is analogous to that of work, namely, force applied and work achieved. However, the chemical's structure often controls the selection of the bioassays used to test its potency and it is readily seen that inorganic compounds are frequently tested by bioassays and protocols that are uncommon to organics.
For this application, the CRASH analyses will typically be based on matches of all the bioassay results for a particular test compound, with corresponding test results for each of five reference compounds used one by one. The median relative potency is taken for the interviewing compound relative to a particular reference compound. This produces five compound-specific scales, each of which is normalized to unity for the reference standard. At the next step of the analysis, the five scales are standardized to a common scale by normalizing each to have unit potency for B[a]P.
It is imperative to match individual bioassays in this or a logically equivalent manner because if relative potencies are computed without any balance, Ames tests and LD50 results will propagate exponentially in numbers of matches and dominate the results. This uncharacteristic proliferation of "excessive matches" usually leads to great precision but can result in great inaccuracies.
For the same reason and because the CRASH program may be used by individuals who are relatively new to the process, the CRASH code does not run to completion when fewer than three matches are found between a particular test compound and a specified reference compound. Therefore, 3 of the 30 compounds were computed by hand according to the RASH method: phenolphthalein, sodium xylenesulfonate, and isobutene.
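The balanced matching and the three-match minimum described above can be sketched as follows. This is an illustration under stated assumptions, not the CRASH code itself: the data layout (dictionaries keyed by assay name), the function name, and all dose values are hypothetical. Each bioassay of the test compound is matched at most once per reference compound, so no single assay type can multiply its number of matches.

```python
from statistics import median

MIN_MATCHES = 3  # the text states that fewer than three matches yields no estimate


def median_rp_per_reference(test_assays, reference_assays):
    """Median RP of a test compound against each reference compound.

    test_assays:      {assay_name: lowest positive dose of the test compound}
    reference_assays: {reference_name: {assay_name: lowest positive dose}}
    Returns {reference_name: median RP, or None when matches are too few}.
    """
    result = {}
    for ref_name, ref_doses in reference_assays.items():
        # One RP per assay that both compounds share (one-to-one matching).
        rps = [ref_doses[assay] / dose
               for assay, dose in test_assays.items()
               if assay in ref_doses]
        result[ref_name] = median(rps) if len(rps) >= MIN_MATCHES else None
    return result


# Hypothetical doses, for illustration only.
test_compound = {"ames": 50.0, "ld50_oral_rat": 2000.0, "sce": 10.0}
references = {
    "nicotine": {"ames": 100.0, "ld50_oral_rat": 50.0, "sce": 5.0},
    "B[a]P": {"ames": 1.0, "sce": 2.0},  # only two matches
}
estimates = median_rp_per_reference(test_compound, references)
print(estimates)
```

A value of None marks a compound and reference pair that would have to be evaluated by hand.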
Initiation. Results from mutation bioassays are available from RTECS and from the NTP battery of short-term screening tests, which was generously supplied to participants. Both sources will be used. Ames test results with and without metabolic activation have a long, complex trail as possible predictors of initiation and carcinogenesis. Ames test results from RTECS and from the NTP bioassays will be taken as one component of five considerations used to judge the probability of binary initiation, i.e., whether initiation processes can be successfully completed by any conceivable test protocol implementing that particular compound. The second component (indicated by "I" in column headings of Table 1) is based on NTP results for chromosome aberrations and sister chromatid exchanges and RTECS results for specific locus test; DNA damage, repair, synthesis, and inhibition of synthesis; gene conversion and mitotic recombination; cytogenetic analysis; sister chromatid exchanges; mutation in somatic mammalian cells; and oncogenic transformation. The third component, considered more closely related to intracellular dosimetry (indicated by "II" in Table 1), is based on NTP results from the mouse bone-marrow micronucleus assay, and RTECS data on body-fluid assay; dominant lethal test; micronucleus test; phage inhibition capacity; sex chromosome loss and disjunction; sperm morphology; and heritable translocation test. Positive results in this class without support from the other four classes are treated as questionable.
In addition, supplemental considerations were added as seen in Table 1 if existing RTECS data indicated that the compound is oncogenic in either animals or humans. Results given in Table 1 include the overall estimate (in the right-hand column) based on the strength of the total evidence for inducing positive results with respect to oncogenic transformation.
Although we have postulated that initiation has a binary, on/off behavior and is only qualitatively related to carcinogenic potency (8,9), data gaps for the 30 test compounds cause us to model initiation in a stepwise fashion for this application. We do not believe the process actually behaves in this manner, but based on the available data, there is a significant probability that we will not be able to classify a compound correctly either as an initiator or as a noninitiator of the carcinogenic process. Instead of a graduated probability scale, we have used a classification schema based on - for negative; ? for uncertain; + for possible; ++ for moderate; and +++ for strong evidence based compositely on the five classes of data that we have taken to be relevant to carcinogenic initiation of cells. This class assignment will be used in combination with the promotion class, based quantitatively on potency-adjusted doses.
Promotion. The compound-specific potency for promotion will be assigned from the product of the test dose and the median value of relative potency as estimated from results published in RTECS. As described above, the CRASH program was used to compute the relative potency for each of the 30 test compounds relative to five reference compounds: B[a]P, nicotine, cisplatin, aflatoxin B1, and cyclophosphamide. These compounds were selected because they have provided consistent results in past evaluations and because they have an abundance of test data grouped in the RTECS categories of mutagenesis, reproductive toxicity, tumorigenesis, and acute toxicity, except for B[a]P. Four of the reference compounds are rich in test results for acute toxicity and seem well-suited to the 30 compounds to be tested. Estimates of relative potency are given in Table 2. The relative potencies within a particular column are all normalized to a potency of unity for the particular reference compound shown in the column heading.
Fewer than three matches were identified for phenolphthalein and isobutene, so those relative potencies were computed by hand (14). In addition, sodium xylenesulfonate was based only on an acute LD50 value (21) resulting from a literature search; no test data were listed in RTECS.
Scatter plots for the test compounds versus the reference compounds, taken two by two, are shown in Figures 1 and 2. The six panels of Figure 1 illustrate that different reference compounds lead to consistent results except for compounds that have not been tested adequately. Outliers indicated by an asterisk usually result from a small number of matches involving Ames test data or other similarly based bioassays. In this analysis of 30 compounds, many of the relevant bioassay data are based on measures of acute toxicity. B[a]P is one of the earliest known carcinogenic compounds and has never been tested comprehensively in assays for acute toxicity. Thus, as seen in the four panels of Figure 2, the absence of acute toxicity data for B[a]P makes it useless in this particular application. In contrast, the estimates shown in the six panels of Figure 1 provide adequate consistency and (when corrected to a common scale associated with unit potency for B[a]P) will permit compound-specific estimates of the power of the promoting dosage given to both sexes and species.
The data listed in Table 2 were converted to a common scale as seen in Table 3 based on conversion factors of 1, 3. The RP values from Table 3 were used to define a median value as shown in column 8. That value is reproduced in Table 4 and was used to modify the maximum test doses (MaxD) shown in columns 3 to 6 for male rats (MR), female rats (FR), male mice (MM), and female mice (FM), respectively. The potency-adjusted dosages are given in columns 7 to 10 of Table 4. Table 5, obtained from methods described in the appendix, was designed to use the intrinsic capacity of a compound's initiation potential (column 1) and the power of the test protocol with respect to promotion, as shown in the column headings. As seen in Table 5, both considerations were used to predict the sex- and species-specific test outcomes. From the method described in the appendix, the promoting class is assigned from the median of the product of relative potency and test dosage for a database of 67 common compounds in the database of Gold et al. Classification for capacity to effectively promote carcinogenesis was assigned according to whether the potency-adjusted test dose of a particular test compound is >2σ below the mean, >1σ below the mean, within ±σ of the mean, >σ above the mean, or >2σ above the mean as given by doses shown in the headings of Table 5. The combination of promoting class and initiation class, as given in Table 5, determines the compound-specific predictions for the 2-year NTP tests of male and female populations of rats and mice, as shown in Table 6.
Figure 2. Comparisons are the same as those described in Figure 1. Axis label: Potency to B[a]P.
Table 3. Compound-specific relative potency estimates in Table 2
Table notes: RP, relative potency; MaxD, maximum test dose. (a) The assigned initiation class in column 1 and the promotion treatment class as listed in the row of column headings taken together determine the prediction for the outcome of the NTP 2-year testing program in both mice and rats. (b) Data from Gold et al. (12,13).
Environmental Health Perspectives, Vol 104, Supplement 5, October 1996
of prediction (biological and/or chemical); and relevant comments pertaining to route of administration, exposure dose, chemical stability, solubility, alteration in gene expression, etc. The simple chemical screening tools that we have adapted from analyses of historical databases are completely inadequate for such predictive detail.
We have experienced reasonable success with analyses of dose-magnitude type considerations (above). Because our concern has been for issues of risk to human health, we previously assumed that cellular initiation was a pervasive condition. In contrast, animals tested under the NTP protocols are isolated in test environments where initiation stressors are minimized. For those tests, simply assessing the promoting efficacy and ignoring initiation is generally insufficient to predict the outcome of test results. This analysis involves the use of historical data and hypothetical models designed to test whether data from general toxicologic bioassays can be used to quantitatively (but subjectively) assign categories of carcinogenic initiation and promotion. Promotion is modeled from the product of the protocol test dosage and relative potency, as computed from RTECS data. Predictions are made for compounds currently being tested. The activity is novel with respect to most conventional approaches in the biological literature and the organizers should be commended for putting science on the line to evaluate just how much general knowledge has been accumulated from decades of research (2). Although the relative potency factors as used in this application seem to have a reasonably good degree of correlation with the maximum doses tested in the 2-year studies, it is still unknown whether the considerations used to evaluate the initiation potential of test compounds and the compound-specific median relative potency can be used in matrix form to predict the outcome of the 2-year test protocols to a helpful degree. Hence, discussions and conclusions should probably be left unrecorded until test results have been reported, as was the procedure for the previous 44 compounds (22).
To rank the carcinogenic potency of 30 compounds, a scale based on deciles was used. Placement on the scale depends on index compounds placed at the extremes. Different rankings would be expected if the ranking were organized only on the range defined by the 30 test compounds as opposed to a more general scale with saccharine, ethyl alcohol, and vinyl chloride near the low end and aflatoxin B1 and 2,3,7,8-TCDD at the upper end. The median relative potencies (based on mass) for the 30 compounds varied by about 1,000-fold (i.e., 4.51/0.00388). In contrast, the more general range of the 101 compounds considered previously (5) varied by a millionfold based on mass units of dose and twice that based on molar doses. Respective rankings of the 30 compounds in both mass and molar units are indicated according to deciles in Table 7. Also listed in Table 7 are the decile rankings of the 30 compounds on the molar scale as defined by the list of 101 compounds. On this broader scale it is noteworthy that 27 of the 30 are above the median toxicity of category 5, and 4 of the 30 compounds are in the most toxic decile of 10.
For the NTP bioassays, extraordinary care is taken to minimize secondary sources of carcinogenic initiation and this article is our first effort to model carcinogenic initiation. Clearly, the NTP protocol requires that carcinogenic initiation be treated in a realistic manner because an on/off behavior could turn an otherwise adequate promoting dose into a negative carcinogenic test result. However, because our interest is still focused on safety for humans, we provide potency scales in both mass and molar units for all 30 of the test compounds in Table 7, whether or not the compounds are classified as carcinogens.
Appendix: Use of TD50s Tested for 67 Compoun Ideal Test Dosage for R Experiments: A Range-I

Definitions
Potency: used to describe differential toxicity of one substance when compared with another or the effect caused by one dose relative to that of a different dosage of the same substance.
Relative potency: ratio of doses required to cause the same level of toxic effect in both frequency and severity.
Unit potency: indicates that an insult of study had identical toxicity to the test substance or dosage.
TD50: dose of a substance that reduces the number of tumor-free animals by 50%.
MaxD: the highest dose tested in a 2-year study by the NTP.

The estimates in Table A1 were compared for the different reference compounds taken two by two, as shown in the scatter plots of Figure A1. As seen in Figure A1, estradiol is evaluated inconsistently by the various reference compounds, probably because of its hormone action. Occasionally, bis-(2-chloroethyl)ether, sodium saccharine, and, to a lesser degree, vinyl chloride straggle somewhat from the central tendency. In addition, the comparisons involving B[a]P demonstrate more scatter because of the absence of acute toxicity data for B[a]P. Overall, it is clear that order-of-magnitude precision is typical between median relative potency estimates based on each of the compounds.
Using the median potency estimates for each reference compound relative to B[a]P, the results listed in Table A1 can be converted to a common scale as seen in Table A2. The conversion factors used were 3.2, 8.2, 8.08, and 1 for nicotine, cisplatin, aflatoxin, and cyclophosphamide, respectively. The median estimate is given in column 7 and will be used as the characteristic potency for each of the 67 compounds listed in column 2. The range of estimates is given in columns 8 and 9.
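The conversion to a common scale can be sketched as follows. The four factors are the ones quoted above. How they are applied is an assumption: the sketch multiplies each reference-specific RP by its factor, which follows from the RP definition only if each factor is the potency of that reference compound relative to B[a]P. The function name and the input values are hypothetical.

```python
from statistics import median

# Conversion factors quoted in the text for the 67-compound comparison.
TO_BAP_SCALE = {
    "nicotine": 3.2,
    "cisplatin": 8.2,
    "aflatoxin": 8.08,
    "cyclophosphamide": 1.0,
}


def common_scale(rp_by_reference):
    """Return (median, lowest, highest) RP on the B[a]P scale.

    rp_by_reference maps a reference compound to the median RP of the test
    compound against it; None entries (too few matches) are skipped.
    ASSUMPTION: conversion is multiplication by the tabulated factor.
    """
    converted = [rp * TO_BAP_SCALE[ref]
                 for ref, rp in rp_by_reference.items()
                 if rp is not None]
    if not converted:
        raise ValueError("no usable relative potency estimates")
    return median(converted), min(converted), max(converted)


# Hypothetical reference-specific estimates for one compound.
example = {"nicotine": 0.5, "cisplatin": 0.25,
           "aflatoxin": 0.125, "cyclophosphamide": 2.0}
characteristic_rp, low, high = common_scale(example)
print(characteristic_rp, low, high)
```

The median plays the role of the characteristic potency, and the lowest and highest values correspond to the reported range of estimates.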
The median potency values from Table A2 for each of the 67 compounds were used to produce the scatter plots of TD50 or MaxD versus RP for rats and mice, as seen in Figure A2. In Figure A2, the general relationship between potency and either the TD50 or MaxD values seems to hold. Occasionally, hydrogen peroxide (90%), bis(chloromethyl)ether, bis-(2-chloroethyl)ether, and hormone-acting diethylstilbestrol are outliers on the scatter plots. Test results are typically based on 50 animals within a particular sex and species, so some randomness should be expected. However, beyond that randomness, there seems to be some added uncertainty for chemically reactive compounds that may bind to sites not directly related to carcinogenic mechanisms or to sites that act through hormone receptors.
The potency-adjusted TD50 and MaxD values were used to plot a log-probability frequency distribution, as seen in Figure A3. From Figure A3, we can see that the data seem reasonably log-normal within ±σ of the mean. The results appear to deviate above linearity for the tails of the distribution, but this is likely to be a result of a bias for selecting suspected hazards for testing as opposed to random selection from the complete inventory of environmental and industrial pollutants.
Fits of the log-probability distribution to mice or rats (shown cumulatively in Figure A3) and to the combined data set are given in Table A3. As seen in Table A3, the central 68% of the estimates are within a factor of 11 for the TD50 data. The TD50 values may be intrinsically more variable than the MaxD doses because different dose-response models were used from compound to compound. This is supported to some degree by the result that 68% of the MaxD values are within a factor of 7.32 of the distribution mean.
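The meaning of "within a factor of" can be illustrated with a short sketch. It estimates the log-normal parameters from sample moments of the log-transformed doses, which is a simpler technique than the log-probability plot fits reported in Table A3 and is swapped in here only for illustration. The function name and the three doses are hypothetical.

```python
import math
from statistics import mean, stdev


def lognormal_summary(adjusted_doses):
    """Return (geometric mean, multiplicative one-sigma factor).

    About 68% of a log-normal sample lies between
    geometric_mean / factor and geometric_mean * factor.
    """
    logs = [math.log10(dose) for dose in adjusted_doses]
    mu = mean(logs)
    sigma = stdev(logs)  # sample standard deviation of log10(dose)
    return 10 ** mu, 10 ** sigma


# Hypothetical potency-adjusted doses in mg/kg/day.
doses = [0.5, 5.0, 50.0]
geo_mean, factor = lognormal_summary(doses)
print(f"central 68%: {geo_mean / factor:.2f} to {geo_mean * factor:.2f} mg/kg/day")
```

On this reading, the factors of 11 and 7.32 quoted above are the one-sigma spreads expressed on the original dose scale.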

Conclusions
Results from these comparisons suggest that potency-adjusted doses from past NTP test protocols may be used for range finding of ideal test doses for compounds scheduled for future testing. Alternatively, the potency-adjusted doses from past NTP experiments may be used to judge whether a protocol test dose derived from subchronic test results is within the acceptable range, too weak (so that false-negative findings may result), or too strong (so that false-positive conclusions are possible).
For simplicity, it is proposed that the ideal potency-adjusted test dose can be taken as 5 mg/kg/day, with a 68% confidence interval based on a factor of 7. This range is defined by the 67 compounds evaluated. For compounds tested below 5/7 mg/kg/day or above 5 × 7 mg/kg/day, there may be a higher frequency of false negatives and false positives, respectively. That hypothesis is applied to the 30 compounds currently scheduled for testing in the NTP rodent carcinogenesis bioassays.
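The proposed rule can be written out as a minimal sketch. It assumes that the five promotion classes correspond to whole-sigma bands on the logarithm of the potency-adjusted dose, centered on 5 mg/kg/day with one sigma equal to a factor of 7. The function name, the class labels, and the example inputs are hypothetical, and the exact band edges used in Table 5 may differ.

```python
import math

IDEAL_DOSE = 5.0    # mg/kg/day, proposed ideal potency-adjusted test dose
SIGMA_FACTOR = 7.0  # multiplicative one-sigma factor


def promotion_class(max_dose, relative_potency):
    """Classify a test dose by its distance from the ideal dose in sigma units.

    max_dose is the protocol test dose in mg/kg/day; relative_potency is the
    median RP on the B[a]P scale. Returns (z, label).
    """
    adjusted = max_dose * relative_potency  # potency-adjusted dose
    z = math.log(adjusted / IDEAL_DOSE) / math.log(SIGMA_FACTOR)
    if z < -2:
        label = ">2 sigma below the mean"
    elif z < -1:
        label = ">1 sigma below the mean"
    elif z <= 1:
        label = "within 1 sigma of the mean"
    elif z <= 2:
        label = ">1 sigma above the mean"
    else:
        label = ">2 sigma above the mean"
    return z, label


# Hypothetical test doses and potencies.
print(promotion_class(5.0, 1.0))
print(promotion_class(100.0, 0.0005))
print(promotion_class(50.0, 1.0))
```

Doses labeled as below the mean would be suspected of being too weak to reveal a carcinogen, and doses labeled as above the mean would be suspected of risking false positives.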