Statistical limitations in relation to sample size.

The statistical difficulties of estimating cancer risks from low doses of a carcinogen are illustrated by examples from radiation carcinogenesis. Although more is known about dose-response relationships for ionizing radiation than for any other environmental carcinogen, estimates of cancer risk from low radiation doses have been extremely controversial; disagreements by factors of 100 or more are not uncommon. Direct estimation, based on data from populations exposed to low doses, is usually impracticable because of sample size requirements. Curve-fitting analyses, by which higher dose data determine lower dose risk estimates, require simple dose-response models if the estimates are to be statistically stable. The current level of knowledge about biological mechanisms of carcinogenesis does not usually permit the confident assumption of a simple model, however; thus frequently the choice is between unstable risk estimates obtained using general models and statistically stable estimates whose stability depends on arbitrary model assumptions.


Introduction
The purpose of this paper is to illustrate some of the statistical difficulties of estimating cancer risks from low doses of a carcinogen, that is, from dose levels producing excess risks that are small relative to normal risk. The illustration is by examples from radiation carcinogenesis. More is known about dose-response relationships for ionizing radiation than for any other environmental carcinogen, and models commonly used in curve fitting have widely accepted radiobiological interpretations. Nevertheless, estimates of cancer risk from low doses of ionizing radiation tend to be extremely controversial; disagreements by factors of 100 or more are common.
Direct Estimation: Hypothesis Testing and Point Estimation
Pochin (1) has discussed the difficulties of estimating the increased health risk to populations in areas of unusually high levels of background radiation. These difficulties follow from the necessity of estimating excess risk as the difference between the observed risk in a population exposed to higher-than-usual radiation levels and that in a population exposed to usual levels. In general, the difference is much smaller than the risks in the two populations; thus, changes in the dose difference between the two populations can double or triple the difference in risk between them without having a noticeable effect on the overall risk in the more heavily exposed population.
*Environmental Epidemiology Branch, National Cancer Institute, Bethesda, Maryland 20205. Environmental Health Perspectives, December 1981.

Example 1
The evidence for a linear dose-response relationship for female breast cancer induced by exposure to sparsely ionizing radiation, such as x-rays or γ-rays, is strong (2). The 1972 BEIR report estimate of risk, 6 excess cases per million women exposed per year of observation for each rad to breast tissue, following a minimal latent period of 10 years after exposure (3), still seems appropriate for women exposed after the age of 20, although not for women exposed at younger ages (2). Consider an idealized experiment in which half of a sample of N women receive a single mammographic examination resulting in 1 rad average tissue dose to both breasts. Suppose the exposed and nonexposed women are otherwise comparable and suppose, for simplicity, that all were 35 years old at the time of exposure, and that follow-up information with respect to breast cancer incidence is available for 20 years following exposure for each woman. Ignoring the first 10 years, we might expect to see 60 excess cancers per million exposed women, in addition to the 19,100 breast cancers normally seen per million U.S. women of that age in a 10-year period (4).
The numbers of breast cancers observed in the exposed and nonexposed women can be assumed to be independent Poisson random variables with means equal to N/2 times 19160 per million for the exposed and times 19100 per million for the nonexposed. The estimated yearly excess risk due to radiation, obtained as the difference between the observed rates in the two populations, has mean $D = 6 \times 10^{-6}$ and standard deviation $S = [(19160 + 19100) \times 10^{-6}/(N/2)]^{1/2}/10 = 0.02766/N^{1/2}$. For simplicity, S will be assumed known, but because we are considering only very large values of N this is not misleading; the usual estimate of S itself has standard deviation inversely proportional to N. For N greater than 10,000, the estimate D has approximately a normal distribution. Finally, we ignore the small difference between the above value for S and that corresponding to the null hypothesis of no excess risk, $S = 0.02764/N^{1/2}$. Accordingly, the calculations given below are based on normal approximations to the distributions of the estimate D, with mean $6 \times 10^{-6}$ and standard deviation S, and the test statistic $T = D/S$, with mean $D/S = 0.000217\,N^{1/2}$ and unit standard deviation. Under these assumptions we can calculate the approximate statistical power of the level 0.05 test of the hypothesis of no radiation effect on breast cancer risk against the alternative that risk increases with increasing dose, and the probability of a negative estimate of risk, both shown as functions of N in the left-hand panel of Figure 1. Power is low for N less than 100 million (it is greater than 50% only for N greater than 60 million), and the chance of a negative estimate of risk is high when power is low. A negative estimate should not be interpreted as evidence that no radiation effect exists, but such an interpretation is often made, nevertheless.
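The power and negative-estimate probabilities described above can be reproduced approximately with a few lines of Python. This is a minimal sketch under the normal approximation stated in the text; the function and constant names are illustrative, and the one-sided critical value 1.645 is the standard level 0.05 normal quantile rather than a number quoted in the paper.

```python
import math

# Rates taken from the text (per woman, over the 10-year observation window).
BASELINE = 19100e-6    # nonexposed women
EXPOSED = 19160e-6     # women exposed to 1 rad
TRUE_EXCESS = 6e-6     # true yearly excess risk D
Z_CRIT = 1.645         # one-sided level 0.05 normal critical value (assumed)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def std_dev(n):
    """Standard deviation S of the estimated yearly excess risk, total sample n."""
    return math.sqrt((EXPOSED + BASELINE) / (n / 2.0)) / 10.0


def power(n):
    """Probability that T = D/S exceeds the critical value."""
    delta = TRUE_EXCESS / std_dev(n)   # mean of the test statistic
    return 1.0 - normal_cdf(Z_CRIT - delta)


def prob_negative(n):
    """Probability that the point estimate of excess risk is below zero."""
    delta = TRUE_EXCESS / std_dev(n)
    return normal_cdf(-delta)


if __name__ == "__main__":
    for n in (1e6, 1e7, 6e7):
        print(f"N = {n:.0e}: power = {power(n):.3f}, "
              f"P(negative estimate) = {prob_negative(n):.3f}")
```

For N = 10 million the sketch gives power of about 17% and a negative-estimate probability of about 25%, in line with the values quoted in the text.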
Figure 1. Example: Hypothetical 20-year follow-up study of breast cancer incidence among N women, half of them exposed and half not exposed at age 35 to a breast-tissue dose of 1 rad. Assumed excess risk among the exposed is six breast cancers per million women per year after a 10-year minimum latency period. Statistical power, the probability of a negative risk estimate, and the minimum and average risk estimates given statistical significance at level 0.05 are plotted as functions of sample size N. Adapted from Land (11).
Even when power is low, the chance of obtaining an estimate that is significantly greater than zero is at least 5%. The minimum value of a statistically significant estimate is graphed in the right-hand panel of Figure 1, and the curve above it is the average value to be expected given statistical significance. For sample sizes corresponding to low power, statistically significant estimates are necessarily too high: for N = 1 million, power is only a little above 5%, the probability of a negative estimate is nearly 50%, and the average statistically significant estimate is about 55 per million per year, or 9 times the true excess. For N = 10 million, power is 17%, the chance of a negative estimate is 25%, and the average significant estimate is 3.2 times the true excess, while for N = 100 million power is near 1, negative estimates are unlikely, and statistical significance imposes no appreciable bias. If all risk estimates received equal attention, and if studies of large populations exposed to low doses of carcinogens were easy to do, the situation illustrated in the right-hand panel of Figure 1 would present no problem, at least in the long run.
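The upward bias of statistically significant estimates can be sketched with the mean of a truncated normal distribution. The code below is an illustration under the same normal approximation, not the computation used for Figure 1; the names are hypothetical, the critical value 1.645 is assumed, and the results need not match the published curves exactly.

```python
import math

TRUE_EXCESS = 6e-6     # true yearly excess risk D (from the text)
Z_CRIT = 1.645         # one-sided level 0.05 normal critical value (assumed)


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def std_dev(n):
    """Standard deviation S of the estimate for total sample size n."""
    return math.sqrt((19160e-6 + 19100e-6) / (n / 2.0)) / 10.0


def significant_estimates(n):
    """Return (minimum, average) of the estimate given significance at level 0.05."""
    s = std_dev(n)
    delta = TRUE_EXCESS / s
    minimum = Z_CRIT * s                       # smallest estimate that rejects
    tail = 1.0 - normal_cdf(Z_CRIT - delta)    # probability of rejecting (power)
    # Mean of a normal variable conditional on exceeding the cutoff.
    average = s * (delta + normal_pdf(Z_CRIT - delta) / tail)
    return minimum, average


if __name__ == "__main__":
    for n in (1e6, 1e7):
        minimum, average = significant_estimates(n)
        print(f"N = {n:.0e}: minimum = {minimum * 1e6:.1f} per million, "
              f"average = {average / TRUE_EXCESS:.1f} x true excess")
```

For N = 10 million the conditional average comes out near 3.2 times the true excess, matching the figure quoted in the text.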
Unfortunately, estimates of an effect often are considered uninteresting if unaccompanied by evidence that the effect in fact exists, and it is a commonplace among scientists that "positive" studies, those in which the null hypothesis of no exposure effect is rejected, are more likely to be reported and published than "negative," and therefore inconclusive, studies. Large studies involve great effort and expense and for that reason are unlikely to go unreported, but many possible effects tend to be investigated using the same body of data, and it is the statistically significant estimates that receive the most attention. A case in point is the various analyses of the mortality data on workers at the Hanford Plutonium Works, collected by Dr. Mancuso and analyzed first by Stewart and Kneale (5) and, later, by others (6-9). It seems fair to say that there has been more attention paid to the two cancers for which everyone has found a statistically significant association with dose (pancreatic cancer and multiple myeloma) than to other cancers, including leukemia, for which no association was found. The point estimates for the two statistically significant sites were very high, even though these cancers, unlike leukemia, are not among those most frequently associated with radiation exposure.
Confidence Intervals. The curves presented in Figure 1, other than the power curve, highlight a common fallacy in the use of statistical methods which can be summed up as a tendency to use only part of the information available from an analysis, either from a desire to make a point or confirm a bias, or from a kind of impatience or mental laziness which leads us to reduce information to a single number. In other words, reporting (or noticing) only point estimates and whether or not the estimates are significantly greater than zero can create an illusion of precision where no precision exists. A strategy based on confidence interval estimation is less likely to be misleading but, perhaps because a confidence interval emphasizes statistical uncertainty while a single number suggests precision, this strategy is too seldom employed.
In the example of Figure 1 the event of rejecting the null hypothesis corresponds to the event that a right-infinite, one-sided, level 0.95 confidence interval for D does not contain zero. The probability of this event, therefore, is given by the power function shown in Figure 1. The probability that the true excess risk will be excluded from the interval is 0.05, regardless of sample size, and the probability that any given larger value is not contained in the interval is a decreasing function of sample size, with an upper limit of 0.05 (Table 1). Exclusion probabilities for positive values smaller than D follow the pattern of the power function, increasing with increasing sample size. In the example, one-sided, left-infinite confidence intervals for D are symmetric with right-infinite intervals in the sense that the probability of exclusion of a value $D + \epsilon$ from a left-infinite interval is the same as that for the value $D - \epsilon$ from a right-infinite interval of the same confidence level. As can be seen from Table 1, the confidence interval approach is less subject to the problems highlighted in the right-hand panel of Figure 1. For example, for a sample of 10 million women, there is a 17% chance that a statistically significant point estimate of risk will be obtained, and if this happens the estimate can be expected to be 3.2 times as large as the true risk. The probability that the true value will be excluded from the 95% right-infinite confidence interval is only 5%, however, and the chance of excluding all values less than twice the true value is only 1%.
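The exclusion probabilities for the right-infinite interval can be checked numerically. The sketch below assumes the interval has the form [estimate − 1.645 S, ∞) under the normal approximation of Example 1; the function names are illustrative and are not taken from the paper.

```python
import math

TRUE_EXCESS = 6e-6     # true yearly excess risk D (from the text)
Z_CRIT = 1.645         # one-sided level 0.95 normal quantile (assumed)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def std_dev(n):
    """Standard deviation S of the estimate for total sample size n."""
    return math.sqrt((19160e-6 + 19100e-6) / (n / 2.0)) / 10.0


def exclusion_prob(value, n):
    """Probability that `value` lies below the lower confidence limit.

    The lower limit is (estimate - Z_CRIT * S), with the estimate
    normally distributed around TRUE_EXCESS with standard deviation S.
    """
    s = std_dev(n)
    return 1.0 - normal_cdf(Z_CRIT + (value - TRUE_EXCESS) / s)


if __name__ == "__main__":
    n = 1e7
    for label, value in (("zero", 0.0),
                         ("true value", TRUE_EXCESS),
                         ("twice true value", 2 * TRUE_EXCESS)):
        print(f"P(exclude {label}) = {exclusion_prob(value, n):.3f}")
```

With N = 10 million, excluding zero has probability equal to the power (about 17%), excluding the true value has probability 5%, and excluding twice the true value has probability about 1%.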
These probabilities correspond to conditional probabilities, given statistical significance, of 0.29 and 0.06, respectively. The probability of obtaining a negative point estimate of risk is 25%, but the chance that estimates greater than half the true risk will be excluded from a one-sided, left-infinite confidence interval of level 0.95 is only 2.3%, and the conditional probability of this, given a negative point estimate, is only 9%.
Sample Size as a Function of Dose. If instead of a 1-rad dose to breast tissue in Example 1 a larger dose were used, the excess risk in the exposed women would be larger, and the power function and other functions graphed in Figure 1 would be shifted. Assuming linearity, or proportionality between dose and excess risk, the effect on power of increasing dose by a given factor for a fixed sample size is approximately the same as that of increasing sample size by the square of that same factor while keeping dose constant. In the example, this relationship holds approximately over the dose range 1-100 rad (Fig. 2). In other words, if 100 million women must be studied to estimate the excess breast cancer risk of 1 rad, only 10 thousand need be studied to estimate the effect of 100 rad.
Curve Fitting. If linearity, or any other simple rule by which low-dose cancer risk can be inferred from high-dose risk, were known to hold in any given case, it clearly would be more efficient to learn about low-dose risk by studying populations exposed to high doses as opposed to low doses. Because the sample size requirements for direct estimation of low-dose risk are so enormous, this is true anyway, but in the absence of knowledge about the shape of the dose-response model there must always be uncertainty about low-dose risk estimates obtained by extrapolation from high-dose data. Even in the case of radiation carcinogenesis, for which radiobiological theory suggests dose-response curves limited, at least for sparsely ionizing radiation of no more than 200 rads or so, to linear-quadratic forms with nonnegative coefficients for dose and the square of dose, differences in the choice of dose-response model can make large differences with respect to estimated risks from low-dose exposures.

Example 2
The leukemia incidence data from the life-span study sample of survivors of the Nagasaki A-bomb for 1950-1971 (10) constitute the most useful existing information on dose-response relationships for leukemia induced by sparsely ionizing radiation. These data yield very different estimates of excess risk at low doses when fitted to a general linear-quadratic dose-response model or to pure linear or pure quadratic models, yet the fitted curves do not differ markedly in their closeness of fit to the data. That is, the chi-square values for lack of fit do not indicate that any one of these models fits the data better than any of the others. This lack of discrimination among competing models is ascribable to lack of statistical power at low doses, as illustrated in the following discussion. Table 2 gives average radiation doses to bone marrow, person-years at risk for grouped data covering the period 1950-1971 and parameter estimates from regression analyses of age-adjusted rates (11). These analyses assumed linear-quadratic, linear and pure quadratic dose-response models. In the present analysis, we assume each of these models, and for each, the estimated parameter values are assumed to be true. For each assumed dose-response function, we consider the statistical properties of curve-fitting analyses using different dose-response models. In particular, statistical power is calculated for level 0.05 hypothesis tests of the coefficients of dose and dose-squared, against positive alternatives. These calculations are based on normal approximations [...] Table 2 is assumed, but the person-years at risk are uniformly multiplied by factors between 0.1 and 10 in order to show the dependence of power on sample size.
Dependence on average dose level is illustrated by parallel calculations in which all dose values are assumed to be reduced to one tenth of their original values. Table 3 summarizes the findings for analyses assuming the linear-quadratic model.
The linear-quadratic model analyses indicate a strong dependence of power on the true parameter value. The power function for tests of the linear coefficient of dose is largest when the linear model is assumed because the assumed linear coefficient is largest according to this model, and it is least when the pure quadratic model, with zero linear coefficient, is assumed. When the linear-quadratic model is assumed, the power for tests of the linear coefficient is high only after the numbers of person-years have been increased by a factor of nearly 10, which explains why analyses of the original data did not discriminate well between the linear-quadratic and pure quadratic models. Similarly, power is low for tests of the dose-squared coefficient unless the assumed sample size is increased, explaining the lack of discrimination between the linear-quadratic and linear models. The values for the reduced dose levels illustrate the formidable sample size requirements for complex curve-fitting analyses of low-dose data.
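The dependence of power on a uniform multiplier of the person-years can be illustrated with a short sketch. It assumes the normal approximation and that the standard error of a fitted coefficient shrinks as the inverse square root of the multiplier. The coefficient and standard error below are hypothetical placeholders, not values from Table 2, and the function name is illustrative.

```python
import math

Z_CRIT = 1.645   # one-sided level 0.05 normal critical value (assumed)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def power_one_sided(coef, se, multiplier=1.0):
    """Approximate power of a one-sided test that a coefficient is positive.

    coef       : assumed true coefficient (hypothetical value)
    se         : standard error of its estimate at the original sample size
    multiplier : factor by which all person-years are multiplied;
                 the standard error is assumed to scale as 1/sqrt(multiplier)
    """
    scaled_se = se / math.sqrt(multiplier)
    return 1.0 - normal_cdf(Z_CRIT - coef / scaled_se)


if __name__ == "__main__":
    # Hypothetical case: true coefficient equal to one standard error.
    for m in (0.1, 1.0, 10.0):
        print(f"multiplier {m:>4}: power = {power_one_sided(1.0, 1.0, m):.2f}")
```

In this hypothetical case power is low at the original sample size and becomes high only when the person-years are multiplied by about 10, the same qualitative pattern the text describes for the linear coefficient. A true coefficient of zero gives power equal to the test level, whatever the multiplier.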
Table 3. Power calculations for linear-quadratic model analyses of Example 2, assuming dose values, person-years at risk, and dose-response functions shown in Table 2. Values correspond to multiples of the person-year array in Table 2, and to the given dose array and the array divided by 10.
Power calculations for the linear coefficient of dose, assuming a linear model analysis, are shown in Table 4, and those for the quadratic coefficient of dose, assuming a pure quadratic model analysis, are shown in Table 5. An important difference between these calculations and those in Table 3 is that the linear-quadratic model is a general one, including the linear and pure quadratic models as special cases. Thus no bias is introduced by doing a linear-quadratic model analysis of data when the true dose-response relationship is linear or pure quadratic, although, as can be seen from a comparison of the tables, there will be a loss of power from using an unnecessarily general model. Using a linear model to analyze data corresponding to a nonlinear dose response does introduce bias, however. In such a case, the linear model analysis estimates the average excess risk over the range of doses represented, but unlike the linear coefficient in a linear-quadratic model analysis (assuming the true dose response is no more complicated), this value cannot be interpreted as the excess risk per rad at low dose levels. Thus the value to be estimated by a linear model analysis depends not only on the true model but also on the dose distribution of the data; for example, the linear-quadratic dose response with linear coefficient equal to 1 per million, and quadratic coefficient equal to 0.01 per million, corresponds to an average excess per rad of 2.34 per million over the dose distribution in Table 2, but only 1.13 per million over the dose distribution scaled down by a factor of 10.
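The dependence of the linear-model estimand on the dose distribution can be made explicit with a short derivation. It is a sketch that assumes a weighted least-squares fit of a straight line with intercept, with weights $w_i$ that are unaffected by rescaling the doses. The symbols $r_0$, $\alpha$, $\beta$, $w_i$, $\bar d$ and $K$ are introduced here for illustration and do not appear in the original.

```latex
\begin{align*}
r(d) &= r_0 + \alpha d + \beta d^2
  && \text{(true linear-quadratic dose response)} \\
b &= \frac{\sum_i w_i (d_i - \bar d)\, r(d_i)}{\sum_i w_i (d_i - \bar d)^2}
   = \alpha + \beta K
  && \text{(slope targeted by the linear fit)} \\
K &= \frac{\sum_i w_i (d_i - \bar d)\, d_i^2}{\sum_i w_i (d_i - \bar d)^2},
  \qquad \bar d = \frac{\sum_i w_i d_i}{\sum_i w_i} \\
d_i \to d_i/10 &\;\Longrightarrow\; K \to K/10
  && \text{(numerator scales as dose cubed, denominator as dose squared)} \\
\alpha = 1,\ \beta = 0.01:\quad 2.34 &= 1 + 0.01\,K \;\Longrightarrow\; K = 134 \text{ rad},
  \qquad 1 + 0.01 \times 13.4 = 1.134 \approx 1.13
\end{align*}
```

Under these assumptions the two averages quoted in the text, 2.34 and 1.13 per million per rad, are mutually consistent: dividing all doses by 10 divides the dose-dependent part of the fitted slope by 10 and leaves the linear coefficient unchanged.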
Similar considerations apply to linear model analyses of pure quadratic dose-response data, and to pure quadratic model analyses of linear and linear-quadratic data. Perhaps the most surprising thing about Tables 4 and 5 is that power, using linear and pure quadratic model analyses, should appear to depend so strongly on the value of the parameter to be estimated and so little on whether or not the model assumed in the analysis is the same as that generating the data. In other words, lack of fit appears to have little to do with power. The second noteworthy observation, which has already been made, is that the protection against bias obtained through use of a more general model has a cost in reduced power.

Summary
There are formidable statistical difficulties associated with refined estimation of risk from exposure to carcinogens at low dose levels. These difficulties are unlikely to be overcome by sample size expansion or by curve fitting, unless it can be established independently that the dose-response relationship is a particularly simple one. Research into the biological mechanisms of carcinogenesis would appear to be an essential part of the estimation process, by which plausible models can be derived. In the case of radiation carcinogenesis, radiobiological theory suggests that linear model analyses, confined to doses under a few hundred rads to low-LET radiation, may give credible upper limits of risk at low doses, in the form of confidence limits. Although more refined solutions may eventually appear, the concept of upper limits based on conservative, simple models is a useful one, adequate for many purposes.