Statistical analysis of data in mutagenicity assays: rodent micronucleus assay.

To evaluate chemical safety, many kinds of short-term mutagenicity assays are performed together with long-term assays in animals. Rationales and methodology for these assays have been well discussed and documented. No statistical method, however, has been singled out as the method of choice for the evaluation of mutagenicity assay data, although a number of reports on statistical methods to evaluate such data have been published. Among the mutagenicity assays, the micronucleus assay using mouse bone marrow erythropoietic cells has been widely used to assess cytogenetic activities of test chemicals. A statistical evaluation procedure for this assay is proposed herein, combining the use of historical control data and dose-response relationships. The probability of type I errors and the power of this method are compared with those of some other conventional methods by Monte Carlo simulation.


Introduction
Several published guidelines for mutagenicity studies require appropriate statistical treatment of assay data. Although there are a large number of publications on the statistical evaluation of mutagenicity test data, no one method is recommended exclusively. Many of the methods are statistically validated but toxicologically impractical. Thus, many empirical methods are still used for data evaluation. This situation may be attributable to inadequate communication between biologists and statisticians. An attempt at improvement has been made by the U.K. Environmental Mutagen Society, and an overview and recommendations have been published (1).
Many kinds of mutagenicity assays have been performed in the course of safety assessments of chemicals. Gene mutation, chromosomal aberration, and DNA modification are the main end points. The micronucleus assay, an in vivo chromosomal aberration assay, is targeted here to establish a statistical evaluation procedure that is practical and should be readily accepted by biologists (2).

Micronucleus Assay
Male ddY mice, 8 weeks old, were used for the micronucleus assay. The experiment consisted of at least three dose groups and concurrent negative and positive control groups. Each group had six mice. The highest dose level and the sampling times, which were the most important factors in the assay, were optimized experimentally (3). Second and third dose levels were one-half and one-quarter of the highest dose level, respectively. After chemical treatment, mice were killed and femoral marrow cells were smeared on clean glass slides, fixed with methanol for 5 min at room temperature, and stained with Giemsa or acridine orange (4). The frequencies of micronucleated polychromatic erythrocytes (MNPCEs; young erythrocytes with one or more micronuclei) were scored microscopically based on an observation of 1000 polychromatic erythrocytes (PCEs) per animal.

Characteristics of the Micronucleus Assay Data
The characteristics of data obtained from the micronucleus assay are as follows: a) The frequency distribution of MNPCEs, the target cells being analyzed, is binomial, at least in the negative control groups (Fig. 1). b) The historical negative and positive control data can be constructed relatively easily. Control charts of MNPCE data from negative and positive control groups accumulated at the National Institute of Hygienic Sciences, Tokyo, are shown in Figures 2 and 3, respectively. c) Each chemical is assayed at three or more dose levels to assess a dose-response relationship. d) It is not difficult to repeat the assay if the results are marginal or problematic, unlike long-term animal experiments.
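As a rough illustration of what the binomial assumption in item a) implies, the sketch below compares the observed between-animal variance of MNPCE counts with the variance a binomial model would predict for 1000 PCEs per animal. The function name and the six example counts are hypothetical and are not taken from the study; a ratio near 1 is merely consistent with a binomial model and is not a formal test.

```python
from statistics import mean, variance

def binomial_dispersion_ratio(counts, cells_per_animal=1000):
    """Observed sample variance divided by the variance implied by a binomial model.

    counts: MNPCE counts per animal, each out of `cells_per_animal` PCEs.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("all counts are zero; the ratio is undefined")
    p_hat = m / cells_per_animal                      # estimated MNPCE proportion
    expected_var = cells_per_animal * p_hat * (1 - p_hat)
    return variance(counts) / expected_var

# Hypothetical negative control group of six mice (illustrative values only).
example_counts = [1, 2, 3, 2, 0, 4]
ratio = binomial_dispersion_ratio(example_counts)
print(f"dispersion ratio = {ratio:.3f}")
```

A ratio well above 1 would suggest extra-binomial (between-animal) variation, in which case the binomial-based steps of the procedure would need more caution.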

Strategy of the Evaluation
First, the concurrent negative and positive control data are tested to validate the assay system. Second, the treatment-induced response is evaluated for each dose group. Third, the dose-response relationship is assessed. This strategy is comparable to that of the empirical evaluation process used by biologists in toxicological studies.

Proposed Procedure
The procedure consists of the following three steps (Fig. 4) and accepts as significant an overall p-value of ≤ 0.01.
Step 1. The concurrent negative and positive controls are compared with the historical control. If the mean frequency of MNPCEs deviates from the historical mean by >3 standard deviations, the experiment is discounted and a new one is performed for the same chemical.
When type I error was simulated against the number of mice per group, the Cochran-Armitage trend test showed almost constant type I error near the nominal level (Fig. 5). Although type I errors of the other methods were not exactly controlled at 0.01, the probabilities were comparable among the four methods. The statistical powers of the methods were simulated with four groups. The doses were set as d0, d1, d2, d3 = 0, 1, 2, 4 mg/kg.
The population proportions of MNPCEs were set as π0, π1, π2, π3 = 0.2, 0.25, 0.3, 0.4%. The number of mice per group was set as 2-10. The simulated powers of the four methods were ranked as binomial test > three-step method >> Cochran-Armitage trend test > conditional binomial test (Fig. 6).
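To make the simulation design concrete, the following is a minimal Monte Carlo sketch for one of the compared methods, the Cochran-Armitage trend test, using the doses and population proportions stated above. It is not the authors' simulation code. It assumes that cells are pooled within each group (no between-animal variation), 1000 PCEs per mouse, a one-sided normal approximation, a nominal level of 0.01, and 2000 replicates; all function names are illustrative.

```python
import math
import random

def sample_binomial(n, p, rng):
    """Draw one Binomial(n, p) variate by CDF inversion (adequate when n*p is small)."""
    u = rng.random()
    pmf = (1.0 - p) ** n            # P(X = 0)
    cdf = pmf
    k = 0
    while u > cdf and k < n:
        pmf *= (n - k) / (k + 1) * p / (1.0 - p)   # P(X = k+1) from P(X = k)
        k += 1
        cdf += pmf
    return k

def cochran_armitage_p(doses, counts, totals):
    """One-sided (increasing trend) p-value from the normal approximation."""
    big_n = sum(totals)
    p_bar = sum(counts) / big_n
    if p_bar in (0.0, 1.0):
        return 1.0                   # no variation, no evidence of trend
    t = sum(d * (x - n * p_bar) for d, x, n in zip(doses, counts, totals))
    s1 = sum(n * d for d, n in zip(doses, totals))
    s2 = sum(n * d * d for d, n in zip(doses, totals))
    var_t = p_bar * (1.0 - p_bar) * (s2 - s1 * s1 / big_n)
    z = t / math.sqrt(var_t)
    return 0.5 * math.erfc(z / math.sqrt(2.0))     # upper-tail normal probability

def rejection_rate(props, doses, mice, alpha=0.01, n_sim=2000, cells=1000, seed=1):
    """Fraction of simulated experiments in which the trend test rejects at level alpha."""
    rng = random.Random(seed)
    totals = [mice * cells] * len(doses)
    hits = 0
    for _ in range(n_sim):
        counts = [sample_binomial(n, p, rng) for n, p in zip(totals, props)]
        if cochran_armitage_p(doses, counts, totals) < alpha:
            hits += 1
    return hits / n_sim

doses = [0, 1, 2, 4]                                # mg/kg, as in the text
null_props = [0.002] * 4                            # no treatment effect
alt_props = [0.002, 0.0025, 0.003, 0.004]           # 0.2, 0.25, 0.3, 0.4%
type_i = rejection_rate(null_props, doses, mice=6)
power = rejection_rate(alt_props, doses, mice=6)
print(f"type I error ~ {type_i:.3f}, power ~ {power:.3f}")
```

Varying `mice` from 2 to 10 would give curves of the kind described for Figures 5 and 6, for this one method only; the other three methods would need their own test functions.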
The experiment could be rejected if the concurrent control data (e.g., step 1 of the three-step method) deviated greatly from the historical control data, depending on the current experimental conditions. When the deviation was small, however, the robustness of the method was important. When data were biased downward by 10%, i.e., when the frequencies of MNPCEs in all groups were lowered 10% from the expected values (based on historical data), the power of the three-step method was the highest (Fig. 7) in spite of the decrease in type I error. On the other hand, when data were biased upward by 10% (Fig. 8), the binomial test showed the highest power, with the three-step method a close second; moreover, the increase in type I error for the three-step method was not as great as that for the binomial test. Although the type I error for the Cochran-Armitage trend test was constant and nominal even when data were biased in either direction, the power of this method, as well as of the conditional binomial test, was lower than that of the binomial test or the three-step method. The strategy of the proposed three-step method is practical and should be readily accepted by toxicologists. The results concur with toxicological judgment (2). For the overall evaluation of micronucleus test data, reproducibility is also important. If the results of the statistical evaluation disagree with the intuition of the investigator, an additional experiment is recommended to confirm the test result.
Prerequisites to apply the proposed three-step method are as follows: a) Negative and positive historical control data must be available for the relevant mouse strain, and the distribution of the negative control is binomial. b) For every new chemical, dose levels and sampling times must be optimized, possibly by a dose- and sampling-time-finding pilot experiment. c) Slides should be coded and examined without any knowledge about treatment, preferably by the same investigator(s). d) The frequency of MNPCEs should be based on the observation of at least 1000 PCEs per animal. e) Both negative and positive control groups must be included in an experiment. It is most important that the test results are credible and reliable technically and biologically. After statistical evaluation of data, the results of an experiment are often interpreted as fact. But statistical methods cannot evaluate the quality of the data, and they might lead to the impression that there were no problems in the data. Therefore, it is the responsibility of experimental toxicologists to generate reliable data for statistical analyses.

FIGURE 4. Strategy of the evaluation of the micronucleus assay data. First step: the concurrent negative and positive control data are tested to validate the assay system itself, and, if it is not acceptable, a new experiment should be performed. Second step: data from each dose group are evaluated to determine the increase of response compared with the historical control. Third step: the dose-response relationship is assessed. After testing these steps, chemicals would be declared negative or positive in the micronucleus assay.

[Figure legend fragment; x-axis: number of mice per group] (7)]; (-) the proposed three-step method (significance level: 0.05/no. of dose groups for step 2, 0.05 for step 3, and 0.01 overall).
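The step 2 significance level quoted in the figure legend (0.05 divided by the number of dose groups) can be sketched as follows. This is an assumption-laden illustration, not the published procedure: it supposes that step 2 is an upper-tail binomial test of each dose group's pooled MNPCE count against the historical negative control rate, which the text here does not spell out. The function names, the historical rate of 0.2%, and the example counts are hypothetical.

```python
def binomial_upper_tail(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p), accumulated with the pmf recursion."""
    if x <= 0:
        return 1.0
    if x > n:
        return 0.0
    pmf = (1.0 - p) ** n            # P(X = 0)
    cdf_below = 0.0                 # will hold P(X <= x - 1)
    for k in range(x):
        cdf_below += pmf
        pmf *= (n - k) / (k + 1) * p / (1.0 - p)
    return max(0.0, 1.0 - cdf_below)

def step2_flags(counts, cells_per_group, historical_rate, family_alpha=0.05):
    """Flag dose groups whose count exceeds the historical rate at level alpha / (no. of groups)."""
    per_group_alpha = family_alpha / len(counts)
    return [
        binomial_upper_tail(x, cells_per_group, historical_rate) < per_group_alpha
        for x in counts
    ]

# Hypothetical experiment: three dose groups, six mice x 1000 PCEs each,
# assumed historical negative control rate of 0.2% (illustrative values only).
flags = step2_flags(counts=[13, 16, 27], cells_per_group=6000, historical_rate=0.002)
print(flags)
```

Under these assumptions only the highest dose group would be flagged in step 2; the dose-response relationship would then be assessed in step 3 at the 0.05 level.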