Statistical issues on the no-observed-adverse-effect level in categorical response.

The determination of the value of the no-observed-adverse-effect level (NOAEL) when observed responses can be categorized by severity (categorical data) and sample sizes are small is discussed. The common situation of only two categories, where only the presence or absence of an effect is observed, is addressed first (dichotomous data). Three tests for dichotomous data are critically examined, including the Brown-La Vange test, a modified version of that test, and Dunnett's multiple comparison test. Although the modified test is an improvement, all three procedures have shortcomings in determining the value of the NOAEL, particularly when the sample size is small. An alternative method is suggested, based on the Akaike information criterion (AIC), which performs well. This method is extended to severity data with an arbitrary number of categories. Use of a dose-response curve for the NOAEL is discussed.


Introduction
As used here, the no-observed-adverse-effect level (NOAEL) is the highest experimental dose at which there is no statistically significant increase in an adverse toxicological end point. This definition restricts the possible values of the NOAEL to the experimental dose values, the only dose levels at which there are observations. Sometimes a dose-response curve is fit to the data, which provides a way of estimating the NOAEL as the lowest dose corresponding to the point on the curve at which the predicted response equals the control rate plus a specified value equal to an acceptable level of increased risk. At low dose levels, the NOAEL dose may be sensitive to the choice of the dose-response curve fit to the data, particularly in small samples. Consequently, this approach has been suggested for determining the "benchmark dose" as an alternative to the NOAEL, a lower confidence limit to a dose producing some predetermined increase in response rate that will not involve extrapolation far below the experimental range (1). The concept of the NOAEL is central to assessment of risk from systematic toxicants, as currently practiced. Inclusion of the NOAEL value in reported laboratory experiments is recommended by the Pharmaceutical Affairs Bureau, Japanese government (GLP, 1989). A statistical approach that may be used for the NOAEL is to test the hypothesis of no difference in the true response rates between the control group and a treatment group, pairing the control group for a test with each treatment group sequentially. Williams' test functions this way and can be applied when the data are assumed to be sampled from a normal distribution, e.g., when the response is weight gain (4). A nonparametric version of that test, for use when data are from a continuous but non-normal distribution, is described by Shirley (5) and Williams (6). These tests are order restricted, incorporating a priori knowledge that the expected response does not decrease (or increase) as dose level increases.
We are unaware of any test in this class for categorical data applicable when severity of response is recorded. For simple dichotomous data (two categories), the test of Brown-La Vange (7) and a modified version of that test described here are examples of order-restricted conditional tests. For dichotomous responses, considerable attention has been focused on applying dose-response curves for both cancer and noncancer responses. Crump (1) essentially converted his multistage model for cancer data to noncancer application by adding a parameter for a "threshold" dose.
In this paper we are interested in the NOAEL for categorical data, including dichotomous data as a simple case, from the statistical point of view. Issues related to regulatory applications, such as the use of safety factors with the NOAEL, are not discussed. We study first the behavior of three tests with dichotomous data, including the Brown-La Vange method, a modification of this method, and the multiple comparison test of Dunnett. It is shown that, although the modified test is an improvement over the other two tests, all three tests have serious shortcomings when the sample size is small. A new test implementing the Akaike information criterion (AIC) (8) is shown to work well. The AIC test is generalized to an arbitrary number of categories for application with severity data. Finally, application of the AIC with a dose-response model for noncancer end points is outlined, to be more fully developed in a follow-up paper (Yanagawa et al., unpublished data).

Tests for Dichotomous Response Data
An experiment with dichotomous response data is described by the number of experimental subjects at risk ($n_i$), the number with the response of interest ($r_i$), and the exposure level ($d_i$), for $i = 0, 1, \ldots, k$. The subscript zero refers to the control group, making $d_0 = 0$; otherwise the dose values are arbitrary, subject to the order $0 = d_0 < d_1 < \cdots < d_k$. The true but unknown response rate at dose $d_i$ is denoted by $p_i$, $i = 0, 1, \ldots, k$. It is assumed that the samples are random and mutually independent, and that the number of responses $r_i$ at $d_i$ is binomially distributed with parameters $(n_i, p_i)$, $i = 0, 1, \ldots, k$. It is also assumed to be known a priori that the true response rate is nondecreasing as dose increases, i.e., $0 \le p_0 \le p_1 \le \cdots \le p_k \le 1$. Alternatively, one could assume that $1 \ge p_0 \ge p_1 \ge \cdots \ge p_k \ge 0$.
Let $\delta^*$ denote the largest $d_i$ value such that $p_0 = p_i$. The test procedure to be described is a method by which to assign the NOAEL a dose value based on the sample data, conditional on the total number of responses observed over all dose groups, namely, $S(r) = r_0 + r_1 + \cdots + r_k$. In the following sections, we describe the Brown-La Vange (BLV) test, a modified form of it (MBLV), and the Dunnett-type multiple comparison test (DMC). The tests are compared when $k = 2$ for simplicity.

Brown-La Vange Test
Without the constraint $p_0 \le p_1 \le \cdots \le p_k$, the maximum likelihood estimate (MLE) of $p_i$ is $r_i/n_i$. The MLE of $p_i$ under the order restriction, however, is $m_i/n_i$, where $m_i$ is constructed by the pool-adjacent-violators algorithm (9,10). The BLV step-up tests are based on the values of $(m_0, m_1, \ldots, m_k)$, as described for $k = 2$ in the following. Initially, the null hypothesis $H_0^1: p_0 = p_1$ is tested against $H_a^1: p_0 < p_1$. If $H_0^1$ is rejected, then the NOAEL takes the value $d_0$; if it is not rejected, then the NOAEL is $d_1$ or $d_2$, as determined by the subsequent test. Thus we could write $H_0^1: \delta^* = d_1$ or $d_2$ versus $H_a^1: \delta^* = d_0$. Let $t_1 = m_1/n_1 - m_0/n_0$ be the test statistic for $H_0^1$. For a specified test size $\alpha_1$, reject $H_0^1$ if $t_1$ takes a value as large as $k_1$, where $k_1$ is the smallest constant such that $\Pr[t_1 \ge k_1 \mid S(r)] \le \alpha_1$ when $H_0^1$ is true.
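The pool-adjacent-violators step can be illustrated with a minimal Python sketch. The function names `pava_counts` and `t1_statistic` are hypothetical, introduced only for this illustration; the code assumes the nondecreasing order restriction and uses exact fractions so that the small-sample values are not blurred by rounding.

```python
from fractions import Fraction

def pava_counts(r, n):
    """Return m_0..m_k such that m_i / n_i is the nondecreasing
    order-restricted MLE of p_i (pool-adjacent-violators)."""
    # Each block holds [pooled responses, pooled subjects, groups pooled].
    blocks = []
    for ri, ni in zip(r, n):
        blocks.append([Fraction(ri), Fraction(ni), 1])
        # Pool backwards while the previous block's rate exceeds the newest one.
        while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                   > blocks[-1][0] / blocks[-1][1]):
            resp, subj, cnt = blocks.pop()
            blocks[-1][0] += resp
            blocks[-1][1] += subj
            blocks[-1][2] += cnt
    m, idx = [], 0
    for resp, subj, cnt in blocks:
        rate = resp / subj              # common rate shared by the pooled groups
        for _ in range(cnt):
            m.append(rate * n[idx])     # adjusted response m_i = rate * n_i
            idx += 1
    return m

def t1_statistic(r, n):
    """First-step BLV statistic t_1 = m_1/n_1 - m_0/n_0."""
    m = pava_counts(r, n)
    return m[1] / n[1] - m[0] / n[0]

# Example: responses (0, 4, 0) in three groups of 10 violate the ordering
# between d_1 and d_2, so those two groups are pooled.
print(pava_counts([0, 4, 0], [10, 10, 10]))
print(t1_statistic([0, 4, 0], [10, 10, 10]))
```

Because pooling replaces the raw counts by averaged ones, quite different tables can share the same adjusted responses, which is why the statistic based on them takes few distinct values in small samples.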
Here $\mid S(r)$ should be read as "conditional on $S(r) = r_0 + r_1 + r_2$." If $H_0^1$ is rejected, then the NOAEL takes the value $d_0$. If $H_0^1$ is not rejected, then $H_0^2: p_0 = p_2 \mid p_0 = p_1$ should be tested, where $\mid p_0 = p_1$ should be read as "conditional on having not rejected $H_0^1: p_0 = p_1$." The alternative hypothesis can be written $H_a^2: p_0 = p_1 < p_2$. With $t_2 = m_2/n_2 - m_0/n_0$ as the test statistic and a specified test size $\alpha_2$, $H_0^2$ is rejected if $t_2 \ge k_2$, where $k_2$ is the smallest constant such that
$$\Pr(t_2 \ge k_2 \mid S(r), t_1 < k_1) = \frac{\Pr(t_1 < k_1,\, t_2 \ge k_2 \mid S(r))}{\Pr(t_1 < k_1 \mid S(r))} \le \alpha_2 \tag{1}$$
under $H_0^2$.

Dunnett-Type Multiple Comparison Test
Alternatively, we may apply the Dunnett multiple comparison test (DMC) for the NOAEL based on the adjusted response. For a specified test size $\alpha$, this test first selects the smallest constant $k$ such that
$$\Pr(t_2 < k \mid S(r)) \ge 1 - \alpha \tag{2}$$
under $p_0 = p_1 = p_2$, from which the NOAEL is determined as follows: if $t_1 \ge k$, then NOAEL = $d_0$; if $t_1 < k$ and $t_2 \ge k$, then NOAEL = $d_1$; if $t_1 < k$ and $t_2 < k$, then NOAEL = $d_2$.

Modification to the Brown-La Vange Test
This test pools the responses at $d_0$ and $d_1$ if no significant difference is detected between these dose levels, to increase the power of the test. The test is based on the values of $(r_0, r_1, \ldots, r_k)$, the naive responses. The test procedure is the same as that of the Brown-La Vange test except for the test statistic. The test statistic for $H_0^1$ is $u_1 = r_1/n_1 - r_0/n_0$, and the test statistic for $H_0^2$ is $u_2 = r_2/n_2 - (r_0 + r_1)/(n_0 + n_1)$. For a specified test size $\alpha_1$, the test rejects $H_0^1$ if $u_1 \ge k_1^*$, where $k_1^*$ is the smallest constant such that $\Pr[u_1 \ge k_1^* \mid S(r)] \le \alpha_1$ when $H_0^1$ is true. For a specified test size $\alpha_2$, the test rejects $H_0^2$ if $u_2 \ge k_2^*$, where $k_2^*$ is the smallest constant such that
$$\Pr(u_2 \ge k_2^* \mid S(r), u_1 < k_1^*) \le \alpha_2 \tag{3}$$
under $H_0^2$.
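As a rough illustration of the two-step rule, the following Python sketch computes $u_1$ and $u_2$ for $k = 2$ and returns the index of the selected dose. The function names are hypothetical. The critical values are passed in as arguments rather than derived; the values used in the example (0.3 and 0.25, then 0.4 and 0.25) are assumptions chosen for the case $n_0 = n_1 = n_2 = 10$, $S(r) = 4$ treated in the small-sample comparison.

```python
from fractions import Fraction

def mblv_statistics(r, n):
    """Return (u1, u2) for k = 2, computed from the naive responses."""
    u1 = Fraction(r[1], n[1]) - Fraction(r[0], n[0])
    # The second step pools the control group and the lowest dose group.
    u2 = Fraction(r[2], n[2]) - Fraction(r[0] + r[1], n[0] + n[1])
    return u1, u2

def mblv_noael(r, n, k1_star, k2_star):
    """Index i of the dose d_i selected as the NOAEL by the two-step rule."""
    u1, u2 = mblv_statistics(r, n)
    if u1 >= k1_star:      # step 1 rejects p0 = p1
        return 0
    if u2 >= k2_star:      # step 2 rejects p0 = p2 given p0 = p1
        return 1
    return 2

n = (10, 10, 10)
# Assumed critical values for the illustration (not derived here).
print(mblv_noael((0, 3, 1), n, Fraction(3, 10), Fraction(1, 4)))
print(mblv_noael((0, 3, 1), n, Fraction(2, 5), Fraction(1, 4)))
```

With the same observed table, tightening the first-step critical value from 0.3 to 0.4 moves the selected dose from $d_0$ directly to $d_2$, because the second-step statistic for this table is negative.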

Small-Sample Behavior of the Tests
We compare the tests in detail when $k = 2$, $n_0 = n_1 = n_2 = 10$, and $S(r) = 4$. Given $S(r) = 4$, there are 15 possible configurations of the tables with $n_0 = n_1 = n_2 = 10$, as shown in Table 1. The probability of each entry in the table, when $p_0 = p_1 = p_2$, has been computed from a multiple hypergeometric distribution and included in the table. Consequently, the probability is the chance of occurrence of an entry in the absence of an effect.
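The enumeration and the null probabilities can be reproduced with a short Python sketch. The helper names `feasible_tables` and `null_probability` are hypothetical; the probability is the multivariate hypergeometric probability of $(r_0, r_1, r_2)$ given the total, which is what the conditional distribution reduces to when $p_0 = p_1 = p_2$.

```python
from fractions import Fraction
from math import comb

def feasible_tables(n, total):
    """All (r0, r1, r2) with 0 <= r_i <= n_i and r0 + r1 + r2 == total."""
    return [(r0, r1, total - r0 - r1)
            for r0 in range(min(n[0], total) + 1)
            for r1 in range(min(n[1], total - r0) + 1)
            if total - r0 - r1 <= n[2]]

def null_probability(r, n):
    """Multivariate hypergeometric probability of r given sum(r),
    valid under the null hypothesis p0 = p1 = p2."""
    numer = 1
    for ri, ni in zip(r, n):
        numer *= comb(ni, ri)
    return Fraction(numer, comb(sum(n), sum(r)))

n = (10, 10, 10)
tables = feasible_tables(n, 4)
probs = [null_probability(r, n) for r in tables]
for entry, (r, p) in enumerate(zip(tables, probs), start=1):
    print(entry, r, f"{float(p):.4f}")
```

Exact fractions are used so that the probabilities sum to exactly one, which gives a simple check that no configuration has been missed.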
The distributions of statistics $t_1$ and $u_1$ have been tabulated from the entries in Table 1 and are displayed in Table 2. Table 2 shows that, in the case of the conditional test based on the adjusted response, the test statistic $t_1$ takes only four values with positive probability, and the jumps of the cumulative probability are so large that no finite $k_1$ exists when $\alpha_1$ is specified to be less than 0.1253. Similarly, the Dunnett-type multiple comparison test does not select $d_0$ as the NOAEL when $\alpha$ is specified to be less than 0.221. The modified test is also a conditional test, but it is based on the naive response; the statistic $u_1$ takes more values than $t_1$, and the jumps of the cumulative probabilities are relatively small. Thus we may test $H_0^1$ at test sizes less than 10%, e.g., at $\alpha_1 = 0.0515$ or $0.0077$.
The three tests are applied to each entry in Table 1, with the results summarized in Table 3. When entry no. 4 or 5 is observed (Table 1), the modified test selects $d_0$ as the NOAEL at the test sizes $\alpha_1 = 0.10$ and $\alpha_2 = 0.1004$; when entry no. 1, 2, or 6 is observed, $d_1$ is selected as the NOAEL; and when any other entry is observed, $d_2$ is selected as the NOAEL. The probabilities of the correct decision for MBLV under $p_0 = p_1 < p_2$ (case 2) and $p_0 < p_1 = p_2$ (case 3) may be computed using the formula
$$\Pr(u_1 < k_1^*,\, u_2 \ge k_2^* \mid S(r)) = \Pr(u_1 < k_1^* \mid S(r)) \Pr(u_2 \ge k_2^* \mid S(r), u_1 < k_1^*) \tag{4}$$
by specifying the values of $p_0$ and the values of the added risk $p_2 - p_0$ (Figure 1).

Table 1. List of all feasible tables when $n_0 = n_1 = n_2 = 10$ and $S(r) = 4$.
Entry   r_0   r_1   r_2   Probability
1       0     0     4     0.0077
2       0     1     3     0.0438
3       0     2     2     0.0739
4       0     3     1     0.0438
5       0     4     0     0.0077
6       1     0     3     0.0438
7       1     1     2     0.1642
8       1     2     1     0.1642
9       1     3     0     0.0438
10      2     0     2     0.0739
11      2     1     1     0.1642
12      2     2     0     0.0739
13      3     0     1     0.0438
14      3     1     0     0.0438
15      4     0     0     0.0077
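The attained test sizes quoted in this section (0.1253, 0.221, 0.0515, 0.0077, and 0.1004) can be checked numerically. The Python sketch below is only an illustration: all names are hypothetical, and it assumes that the second adjusted statistic is $t_2 = m_2/n_2 - m_0/n_0$, an assumption that is consistent with the quoted value 0.221.

```python
from fractions import Fraction
from math import comb

N_GROUP = 10   # subjects per dose group (n0 = n1 = n2)
S_TOTAL = 4    # conditioning total S(r)

def isotonic_rates(r):
    """Nondecreasing MLE of (p0, p1, p2) by pool-adjacent-violators."""
    blocks = []  # each block: [responses, subjects, number of groups pooled]
    for ri in r:
        blocks.append([ri, N_GROUP, 1])
        # Pool while the previous block's rate exceeds the newest block's rate.
        while len(blocks) > 1 and (blocks[-2][0] * blocks[-1][1]
                                   > blocks[-1][0] * blocks[-2][1]):
            resp, subj, cnt = blocks.pop()
            blocks[-1] = [blocks[-1][0] + resp, blocks[-1][1] + subj,
                          blocks[-1][2] + cnt]
    rates = []
    for resp, subj, cnt in blocks:
        rates.extend([Fraction(resp, subj)] * cnt)
    return rates

tables = [(r0, r1, S_TOTAL - r0 - r1)
          for r0 in range(S_TOTAL + 1) for r1 in range(S_TOTAL - r0 + 1)]
weights = {r: comb(N_GROUP, r[0]) * comb(N_GROUP, r[1]) * comb(N_GROUP, r[2])
           for r in tables}
TOTAL_WEIGHT = comb(3 * N_GROUP, S_TOTAL)

def prob(event):
    """Null probability (p0 = p1 = p2) of the tables satisfying `event`."""
    return Fraction(sum(w for r, w in weights.items() if event(r)), TOTAL_WEIGHT)

def t_stats(r):
    p = isotonic_rates(r)
    return p[1] - p[0], p[2] - p[0]   # (t1, t2); t2 is an assumed definition

def u_stats(r):
    u1 = Fraction(r[1] - r[0], N_GROUP)
    u2 = Fraction(r[2], N_GROUP) - Fraction(r[0] + r[1], 2 * N_GROUP)
    return u1, u2

t1_values = sorted({t_stats(r)[0] for r in tables})
blv_size = prob(lambda r: t_stats(r)[0] >= Fraction(1, 5))   # largest t1 value
dmc_size = prob(lambda r: t_stats(r)[1] >= Fraction(1, 5))
mblv_size_03 = prob(lambda r: u_stats(r)[0] >= Fraction(3, 10))
mblv_size_04 = prob(lambda r: u_stats(r)[0] >= Fraction(2, 5))
not_rejected = prob(lambda r: u_stats(r)[0] < Fraction(3, 10))
mblv_step2 = prob(lambda r: u_stats(r)[0] < Fraction(3, 10)
                  and u_stats(r)[1] >= Fraction(1, 4)) / not_rejected

for name, value in [("BLV, t1 >= 0.2", blv_size), ("DMC, t2 >= 0.2", dmc_size),
                    ("MBLV, u1 >= 0.3", mblv_size_03),
                    ("MBLV, u1 >= 0.4", mblv_size_04),
                    ("MBLV step 2, u2 >= 0.25", mblv_step2)]:
    print(f"{name}: {float(value):.4f}")
```

The coarse support of $t_1$ (four values) against the finer support of $u_1$ is what drives the difference in attainable test sizes.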

Summary: Flaws of the Statistical Tests
The findings from Tables 2 and 3 and Figure 1 are summarized as follows:
a) The BLV test failed to select $d_0$ as the NOAEL at the routine test sizes, i.e., $\alpha_1 = 0.05$ or $0.10$. The same is observed for the DMC test at $\alpha = 0.10$ or $0.20$.
b) For a step-up test, such as the BLV, the influence of the first step is considerable. The key is in the selection of the value of $\alpha_1$. For example, the probability of the correct decision by the BLV (and the DMC as well) is zero in case 3 at the test size $\alpha_1 = 0.10$ ($\alpha = 0.20$), even when the added risk is 0.30, for the reason stated above. It is apparent from Table 3 that if we specify $\alpha_1 = 0.1253$, the behavior of the BLV test is much improved. The problem is that it is not easy to determine the test size to use.
c) The DMC test is not a step-up test but has a similar property to the BLV test. Generally, if sample sizes are small and a test is constructed based on the adjusted responses, then the jumps in the values of the tail probabilities are remarkable, frequently larger than 0.05. It is not justifiable in those situations to carry out a test with a routine test size of 0.05.
d) The modified test (MBLV) removes the difficulty due to the first step and performs better than the BLV or DMC test. With small sample sizes, however, the probability of the correct decision in case 3 is disappointingly small.
e) A puzzling aspect of the modified test may be noted. Suppose that entry no. 4 in Table 1 is observed. If we set $\alpha_1 = 0.05$ and $\alpha_2 = 0.10$, then $d_2$ is selected as the NOAEL, whereas $d_0$ is selected when $\alpha_1 = 0.10$. The selection order of the values $d_i$ as the NOAEL would reasonably follow the pattern $d_2 \to d_1 \to d_0$, instead of jumping from $d_2$ to $d_0$. The same phenomenon occurs with BLV and DMC.
f) We have applied the three tests to other small-sample tables and have observed that the smaller the sample sizes, the larger the values selected as the NOAEL. This behavior, discussed by Crump (1) and others, is unacceptable because smaller samples tend to make the dose levels appear safer.
Brown and Erdreich (7) emphasized calculation of statistical power to detect an effect level of interest before drawing a conclusion. Those calculations, however, are cumbersome. A preferred approach may be to consider jointly the test size and sample size. It is not easy to develop this idea in the framework of statistical testing, but it can be achieved in the framework of model selection. We explore the use of the Akaike information criterion (AIC) for this objective in the next section.

Application of the AIC
We continue with the same notation and conditions described in the previous sections, i.e., $k = 2$ with dichotomous data. Let
$$\gamma_1 = \log\frac{p_1(1 - p_0)}{(1 - p_1)p_0}, \qquad \gamma_2 = \log\frac{p_2(1 - p_0)}{(1 - p_2)p_0}. \tag{5}$$
The parameters $\gamma_1$ and $\gamma_2$ are the log odds ratios of the effect at $d_1$ and $d_2$, respectively, relative to the effect at $d_0$. Note that $p_0 = p_1 = p_2$ if and only if $\gamma_1 = 0$ and $\gamma_2 = 0$, and that the order restriction $p_0 \le p_1 \le p_2$ is equivalent to $\gamma_1 \ge 0$ and $\gamma_2 - \gamma_1 \ge 0$.
The conditional log likelihood conditioned on $S = S(r) = r_0 + r_1 + r_2$ is
$$l(\gamma_1, \gamma_2) = \text{const} + \gamma_1 r_1 + \gamma_2 r_2 - \log \sum{}^{*} \binom{S}{x_1, x_2} \binom{N - S}{n_1 - x_1,\, n_2 - x_2} \exp(\gamma_1 x_1 + \gamma_2 x_2), \tag{6}$$
where $N = n_0 + n_1 + n_2$,
$$\binom{S}{x_1, x_2} = \frac{S!}{(S - x_1 - x_2)!\, x_1!\, x_2!}, \tag{7}$$
and $\sum^{*}$ is the summation that extends over all integers $x_1$ and $x_2$ such that $n_i \ge x_i \ge 0$, $i = 0, 1, 2$, with $x_0 = S - x_1 - x_2$. The AIC is applied to this conditional likelihood, taking the order restriction into account, to determine the NOAEL. This procedure is applied to the entries in Table 1. The results are given in the last row of Table 3. Figure 2 illustrates the probability of the correct decision for case 2 and for case 3. Comparing these results with the outcomes of the preceding tests, one can clearly see the superiority of this method. In particular, the AIC method relieves the problem of selecting the test size described earlier and increases the probability of a correct decision.

Extension of the AIC for a NOAEL in Categorical Data
We extend application of the AIC to determination of the NOAEL in categorical response data. Suppose that there are $b + 1$ categories, and let $r_{ij}$ be the number of responses in the $j$th category at exposure level $d_i$, $i = 0, 1, \ldots, k$; $j = 0, 1, \ldots, b$. Let $p_{ij}$ be the response probability at the $i$th exposure level and $j$th category. It is assumed that the samples are random and mutually independent and that the responses at dose $d_i$, $(r_{i0}, r_{i1}, \ldots, r_{ib})$, are multinomially distributed with parameters $(n_i; p_{i0}, p_{i1}, \ldots, p_{ib})$, $i = 0, 1, \ldots, k$. Let $C_0 \le C_1 \le \cdots \le C_b$ be given scores that are assigned to the categories. For example, we might assign $C_0 = 0, C_1 = 1, \ldots, C_b = b$, or alternatively, the Wilcoxon score could be assigned. We introduce the following model for the response probabilities:
$$\log\frac{p_{ij}}{p_{i0}} = \beta_i (C_j - C_0), \qquad j = 1, 2, \ldots, b;\; i = 0, 1, \ldots, k. \tag{8}$$
It is assumed to be known a priori that $\beta_0 \le \beta_1 \le \cdots \le \beta_k$. Alternatively, one could assume that $\beta_0 \ge \beta_1 \ge \cdots \ge \beta_k$. This assumption generalizes the previous assumption regarding the order restriction of the response probabilities. Put $S(r_j) = r_{0j} + r_{1j} + \cdots + r_{kj}$. The conditional log likelihood of $\{r_{ij}\}$ conditioned on $S(r_j)$, $j = 0, 1, \ldots, b$, is given by
$$l(\gamma_1, \ldots, \gamma_k) = \text{const} + \sum_{i=1}^{k} \gamma_i \sum_{j=1}^{b} r_{ij}(C_j - C_0) - \log \sum{}^{*} \prod_{i=0}^{k} \frac{n_i!}{x_{i0}!\, x_{i1}! \cdots x_{ib}!} \exp\Big(\sum_{i=1}^{k} \gamma_i \sum_{j=1}^{b} x_{ij}(C_j - C_0)\Big), \tag{9}$$
where $\gamma_i = \beta_i - \beta_0$ and $\sum^{*}$ is the summation that extends over all combinations of the integers $\{x_{0j}, x_{1j}, \ldots, x_{kj}\}$ such that $n_i \ge x_{ij} \ge 0$ and $x_{0j} + x_{1j} + \cdots + x_{kj} = S(r_j)$, $j = 0, 1, \ldots, b$. The log likelihood shows that it is sufficient to carry out the statistical inference on $\gamma_1, \gamma_2, \ldots, \gamma_k$ based on the statistics
$$T_i = \sum_{j=1}^{b} r_{ij}(C_j - C_0), \qquad i = 1, 2, \ldots, k. \tag{10}$$
The model (8), which seems somewhat artificial, is a mathematical device to lead to this reasonable result. The order restriction is represented by $\gamma_1 \ge 0$ and $\gamma_i - \gamma_{i-1} \ge 0$, $i = 2, 3, \ldots, k$. The AIC is applied for the determination of the NOAEL taking this restriction into account. For $k = 2$, the procedure is the same as that given in the preceding section.

Use of a Dose-Response Curve
We define ... (11)
Following Crump (1), we may introduce a threshold factor. That extension, and the construction of a reliable confidence interval, will be discussed in a follow-up paper.
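Returning to the dichotomous case with $k = 2$, the conditional log likelihood in $\gamma_1$ and $\gamma_2$ can be evaluated numerically. The Python sketch below is a hedged illustration, not the authors' implementation: the names `conditional_loglik` and `aic` are hypothetical, and the kernel is written as a product of binomial coefficients, which yields the same conditional probabilities. The `aic` helper uses the standard definition, $-2 \times$ (maximized log likelihood) $+ 2 \times$ (number of free parameters); the enumeration of candidate models for the NOAEL is not reproduced here.

```python
from math import comb, exp, log

def conditional_loglik(r, n, gamma1, gamma2):
    """Conditional log likelihood of (r1, r2) given S = r0 + r1 + r2 (k = 2)."""
    s = sum(r)

    def weight(x1, x2):
        # Unnormalized conditional probability of the table (s - x1 - x2, x1, x2).
        return (comb(n[0], s - x1 - x2) * comb(n[1], x1) * comb(n[2], x2)
                * exp(gamma1 * x1 + gamma2 * x2))

    # Sum over all tables that share the same total s.
    norm = sum(weight(x1, x2)
               for x1 in range(min(n[1], s) + 1)
               for x2 in range(min(n[2], s - x1) + 1)
               if s - x1 - x2 <= n[0])
    return log(weight(r[1], r[2])) - log(norm)

def aic(max_loglik, n_free_params):
    """Standard AIC: -2 * maximized log likelihood + 2 * free parameters."""
    return -2.0 * max_loglik + 2.0 * n_free_params

n = (10, 10, 10)
# At gamma1 = gamma2 = 0 the conditional likelihood reduces to the
# multivariate hypergeometric probability of the observed table.
print(exp(conditional_loglik((0, 3, 1), n, 0.0, 0.0)))
# AIC of the no-effect model, which has no free parameters.
print(aic(conditional_loglik((0, 3, 1), n, 0.0, 0.0), 0))
```

Under the order restriction, the maximization would be carried out over $\gamma_1 \ge 0$ and $\gamma_2 - \gamma_1 \ge 0$; any constrained optimizer could be used for that step.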

An Application
Fitzhugh et al. (11) report results of exposing Osborne-Mendel rats for 2 years to diets containing aldrin at 0, 0.5, 2, 10, 50, 100, and 150 ppm. The study reports the degree of liver changes categorized as none, trace, very slight, slight, slight/moderate to moderate, and greater than moderate. For the purpose of illustration, we use a part of the data, as shown in Table 4. The scores are assigned as $C_0 = 0$, $C_1 = 1$, $C_2 = 2$, $C_3 = 3$, and then the AIC procedure is applied. The conditional MLEs of $\gamma_1$, $\gamma_2$, and $\gamma_3$ are obtained as ... We also extended the modified test (MBLV) to apply to these data for comparison. The test leads to $d_2$ as the NOAEL for $\alpha_1 = 0.10$ and $\alpha_2 = 0.10$. The dose-response curve method is also applied for comparison, particularly because it is not restricted to experimental dose values for choice of the NOAEL.
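To show how the scores enter the analysis, the following Python sketch computes the score statistics $T_i = \sum_j r_{ij}(C_j - C_0)$ for each treated group. The counts are hypothetical values made up for this illustration; they are not the Fitzhugh et al. data of Table 4, and the function name `score_statistics` is likewise an assumption.

```python
def score_statistics(counts, scores):
    """T_i = sum over categories j of r_ij * (C_j - C_0), for i = 1..k.

    counts[i][j] is the number of subjects at dose d_i in severity
    category j; row 0 is the control group and is not returned."""
    c0 = scores[0]
    return [sum(r_ij * (c_j - c0) for r_ij, c_j in zip(row, scores))
            for row in counts[1:]]

# Hypothetical severity counts (NOT the aldrin data), 10 animals per group,
# four categories scored C_0 = 0, C_1 = 1, C_2 = 2, C_3 = 3.
counts = [
    [8, 2, 0, 0],   # control, d_0
    [6, 3, 1, 0],   # d_1
    [3, 4, 2, 1],   # d_2
    [1, 3, 3, 3],   # d_3
]
scores = [0, 1, 2, 3]
print(score_statistics(counts, scores))
```

Each $T_i$ is a severity-weighted count, so a shift of animals toward more severe categories at a given dose raises the corresponding statistic.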

Discussion
We have developed several methods of selecting the NOAEL when the responses are measured by severity and also when the sample sizes are small. Our conclusions are as follows: a) If one wants to select the NOAEL from the experimental dose levels $\{d_0, d_1, \ldots, d_k\}$, then implementation of the AIC in the order-restricted likelihood method is preferable to a testing approach, as demonstrated for three alternative test procedures. b) If one wants to select the NOAEL from the full experimental range of doses ($d_0$ to $d_k$), then a dose-response curve is required to estimate responses between observed values. The choice of the dose-response curve may affect the outcome, but fitting the "average dose-response curve" as described is reasonable. The value of $c$ in that model should be chosen carefully. The NOAEL can be based on either relative risk or additive risk, depending on one's objective.