Natural variability and the influence of concurrent control values on the detection and interpretation of low-dose or weak endocrine toxicities.

While defining the no effect level for the 5 alpha-reductase inhibitor finasteride in the Hershberger assay, we encountered an inverted-U low-dose trophic effect on the prostate gland of the rat. Two attempts to confirm this observation were unsuccessful, and we concluded that the positive effect initially observed was associated with normal biologic variability. During the same period we attempted, unsuccessfully, to repeat our own observation of weak uterotrophic activity in the rat for the sunscreen 3-(4-methylbenzylidene)camphor (4MBC). Further evaluation led us to conclude that 4MBC is uterotrophic only when the control uterine weights are at the low end of their normally encountered range. This led us to reevaluate our earlier mouse uterotrophic assay data for bisphenol A (BPA). Originally we had concluded that BPA gave irreproducible evidence of weak uterotrophic activity, but upon ordering the eight experiments we had conducted, according to decreasing control uterine weight, we confirmed reproducible weak uterotrophic activity for BPA when the control uteri were at the low end of their normal range. In this article, we describe these observations, together with a reanalysis of the data associated with several reported instances of weak or low-dose endocrine effects that have proven difficult to confirm in independent laboratories. These include the activity of BPA on the CF1 mouse prostate; the activities of BPA, octylphenol, and nonylphenol on the rat testis; and the effect of polycarbonate caging on control mouse uterine weight. In all of these cases, variability among controls provides a major obstacle to data interpretation and confirmation. Our recommendations on experimental design are also presented, with a view to ending the current impasse on the reality, or otherwise, of low-dose or weak endocrine toxicities.

Two issues in the field of chemically mediated endocrine disruption (ED) are the possible occurrence of ED effects at dose levels lower than the no effect level defined by classical toxicity evaluations, and how to interpret weak chemically induced ED effects that lie within the range of normally encountered control variability (Ashby 2003). We have studied several reports of low-dose ED effects over the past 6 years, so far without being able to confirm any of them; these chemicals include butylbenzylphthalate (BBP) (Ashby et al. 1997; Sharpe et al. 1995, 1998); bisphenol A (BPA) (Nagel et al. 1997; Sakaue et al. 2001; Talsness et al. 2000; Tinwell et al. 2002a); and nonylphenol (NP) (Colerangle and Roy 1996; Lee 1998; Odum and Ashby 2000; Odum et al. 1999). In contrast, we were able to confirm (Tinwell et al. 2002b) the weak uterotrophic activity of 3-(4-methylbenzylidene)camphor (4MBC) reported by Schlumpf et al. (2001a, 2001b), despite the inability of Bolt et al. (2001) to confirm this activity.
Recently, we encountered two instances that may throw light on the origins of these reproducibility problems. The first of these was the observation of an apparent inverted-U low-dose effect for the 5α-testosterone reductase inhibitor finasteride (FIN) in the Hershberger antiandrogen assay, an effect we were unable to reproduce. The second was our inability (after publication of Tinwell et al. 2002b) to reconfirm our own observation of uterotrophic activity for 4MBC. In each case, further studies revealed that control variability between experiments was probably at the root of these reproducibility problems. This conclusion led us to reevaluate earlier findings for BPA in the mouse uterotrophic assay (Tinwell et al. 2000) and to analyze earlier reports of low-dose effects for BPA (Howdeshell et al. 2003; Nagel et al. 1997; Nonneman et al. 1992; Sakaue et al. 2001) and NP (Lee 1998).

Materials and Methods
Chemicals. We obtained testosterone propionate (TP) and FIN from Sigma Chemical Company (Poole, Dorset, UK). Tocopherol-stripped corn oil (ICN Biomedicals Inc., Basingstoke, Hampshire, UK) was used to suspend FIN and TP by homogenization (Ashby 1987). We obtained 4MBC (Merck batch TT805785 029) from ChemQuest (Wilmslow, Cheshire, UK) and diethylstilbestrol (DES) from Sigma Chemical Company. Antarelix (ANT) was a gift from Europeptides (Argenteuil, France). We used arachis oil (AO; Sigma Chemical Company) as the vehicle for 4MBC (homogenized as described previously; Tinwell et al. 2002b), DES, and ANT. All chemicals were stored at room temperature except ANT, which was stored at 4°C. All preparations were shaken vigorously just before and during dosing. All experiments were conducted according to U.K. Home Office guidelines as described in the appropriate Home Office license for animal use.
Hershberger assays. Animals. Male Alpk:APfSD rats (animal breeding unit at AstraZeneca Pharmaceuticals, Macclesfield, Cheshire, UK) were castrated at 6 weeks of age via midline incision followed by 8 days of recovery. In each study, the animals were randomized into groups of six rats of approximately the same initial mean body weight and range. They were group housed at a maximum of six per cage in solid-bottomed polypropylene cages containing sawdust (Wood Treatments Ltd., Macclesfield, Cheshire, UK) and shredded paper as bedding. Fun tubes and houses were provided as environmental enrichment. All animals were given RM1 diet (Special Diet Services Ltd., Witham, Essex, UK) and water ad libitum for the duration of the experiment.
Study design. The experimental method has been described previously (Ashby and Lefevre 2000). Rats were given 10 consecutive daily treatments consisting of a subcutaneous (sc) injection of either the corn oil vehicle or TP (0.4 mg/kg) and an oral dose of either vehicle or the relevant dose of FIN. We used 5 mL/kg and 2 mL/kg dosing volumes for oral and subcutaneous administrations, respectively.
The animals were killed 24 hr after the last treatment by an overdose of halothane (Concord Pharmaceuticals Ltd., Essex, UK). The following tissues were removed and weighed at necropsy: liver and combined kidneys (to 1 mg), Cowper's glands, levator ani/bulbo cavernosus muscles (LA/BC), seminal vesicles (with coagulating gland), ventral prostate, glans penis, and combined adrenal glands (all to 0.1 mg). Changes in tissue weights were evaluated for statistical significance using analysis of variance (ANOVA) and analyses of covariance (ANCOVA) with final body weight.
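As an illustration of the ANOVA step described above, the following is a minimal sketch of a one-way ANOVA F statistic computed across treatment groups. The function name and the tissue weights are hypothetical and are not taken from the study; the ANCOVA adjustment for final body weight is not shown, and converting F to a p-value would require an F-distribution table or a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of tissue weights per dose group.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: group size times squared deviation
    # of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviation of each animal
    # from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical prostate weights (mg), six rats per group (illustrative only).
tp_only = [98.0, 105.0, 110.0, 92.0, 101.0, 107.0]
tp_low_fin = [96.0, 103.0, 99.0, 108.0, 94.0, 100.0]
tp_high_fin = [41.0, 38.0, 45.0, 36.0, 40.0, 43.0]

f_stat, df_b, df_w = one_way_anova_f([tp_only, tp_low_fin, tp_high_fin])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F indicates only that at least one group differs; identifying which dose differs from the TP-only control requires a follow-up comparison.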
Environmental Health Perspectives • VOLUME 112 | NUMBER 8 | June 2004
Uterotrophic assays. Animals and dosing. The experimental methods were as described previously (Odum et al. 2001). Female Alpk:APfSD (Wistar derived) rats were obtained from the barriered animal breeding unit (AstraZeneca). These were 19-20, 20-21, 21-22, or 22-23 days of age at the beginning of the study. The animals were housed up to a maximum of six per cage as described for the Hershberger assay. Each female received an sc injection of the appropriate compound on 3 consecutive days and was killed 24 hr after the last dose. On the basis of previously published data, 4MBC was given at 1 g/kg (Tinwell et al. 2002b) and ANT was given at 300 µg/kg (Odum et al. 2001); 5 µg/kg DES was used as a positive control. We used a dosing volume of 5 mL/kg body weight when animals were exposed to a single compound. However, in studies in which the animals were exposed to test compound and ANT, we used a dosing volume of 2.5 mL/kg body weight. Females were exposed to ANT 15 min before receiving the appropriate test compound (AO, 4MBC, or DES).
Study design. 4MBC had previously been shown to induce a significant increase in uterine weight in immature rats (Tinwell et al. 2002b). Thus, the initial experiment in the present study was designed to repeat those original observations. We also included additional groups of animals exposed to 4MBC together with the gonadotropin-releasing hormone (GnRH) antagonist ANT. In this experiment, 4MBC failed to significantly increase the uterine weight; however, the control uterine weights were greater than usual. Therefore, we repeated the experiment. In the final two experiments, we used animals of different ages at the start of the uterotrophic assays. All animals were killed 24 hr after the last dose by an overdose of fluothane followed by cervical dislocation. The uterus was removed, trimmed free of fat, gently blotted, and weighed. It was then placed in a preweighed vial, dried overnight (24 hr) at 70°C, and reweighed. All data were assessed for statistical significance using a two-sided Student's t-test.
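The two-sided Student's t-test named above can be sketched as follows. This is a minimal illustration assuming the classical pooled-variance form; the function name and the uterine weights are hypothetical, and the p-value would be read from a t table (or a statistics package) at the returned degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Return (t, df) for a pooled-variance Student's t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (mean(sample_a) - mean(sample_b)) / standard_error
    return t_stat, n_a + n_b - 2


# Hypothetical blotted uterine weights (mg); illustrative values only.
treated = [31.0, 28.5, 34.0, 29.0, 33.5, 30.0]
control = [24.0, 22.5, 27.0, 21.0, 25.5, 23.0]

t_stat, df = student_t(treated, control)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

For a two-sided test, the absolute value of t is compared with the critical value for the chosen significance level.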

Results
Hershberger assays. The assays for FIN described in this article represent a continuation of previous experiments (Ashby and Lefevre 2000; Blohm et al. 1986) and studies conducted as part of the current Organisation for Economic Co-operation and Development (OECD) validation of the Hershberger assay. The OECD studies demonstrated that a dose of FIN as low as 200 µg/kg/day inhibited the growth of the seminal vesicles, ventral prostate, and glans penis of TP-treated rats (J. Ashby, unpublished data). Subsequently, we have shown that doses of FIN as low as 8 µg/kg/day are active in the assay (Ashby et al. 2004), with the prostate gland being the most sensitive tissue (Figure 1). The present experiments were designed to determine the no effect level for the inhibitory effect on the growth of the prostate in TP-treated animals. In all of the experiments, a dose of 5,000 µg/kg/day FIN significantly inhibited the growth of the seminal vesicles, ventral prostate, Cowper's glands, and glans penis compared with animals administered only TP (Table 1). In one study (experiment 1, Table 1) growth of the LA/BC was also significantly inhibited.
In the first low-dose experiment (experiment 1, Table 1), doses ranging from 5,000 µg/kg/day to 8 µg/kg/day FIN were all clearly active in the assay. The lowest dose tested (8 µg/kg/day FIN) affected all tissues except the LA/BC and the Cowper's glands. In experiment 2 (Table 1), the only tissues significantly inhibited at 8 µg/kg/day FIN were the seminal vesicles and glans penis. The glans penis was reduced in weight at all dose levels of FIN down to 0.1 µg/kg/day (Figure 2). However, the ventral prostate weight was significantly increased (p < 0.01) for the 2 µg/kg/day dose of FIN. In attempts to repeat this inverse prostate weight dose response (Figure 1), we conducted two further experiments (experiments 3 and 4, Table 1) using a tighter dose range of FIN around the dose that had increased prostate weight in experiment 2. In the first of these repeat studies (experiment 3, Table 1), the only significant weight change observed was an increase in the weight of the Cowper's glands at 5 µg/kg/day FIN. This increase probably reflects biologic variation and is unlikely to be biologically significant (compare values for Cowper's glands weights in experiments 1-4, Table 1). However, seemingly in support of the previous experiment, we observed an inverted-U shaped response for increases in prostate weight around the 2 µg/kg/day dose of FIN, but none of the effects were statistically significant (Figure 1). In experiment 4, we did not observe the increase in Cowper's glands weight seen in experiment 3 (Table 1), and no change in prostate weight was observed below the 5,000 µg/kg/day positive control dose of FIN (Figure 1). In that final experiment, we observed a statistically significant reduction in the weight of the seminal vesicles at 5 µg/kg/day FIN. Inhibition of the growth of the glans penis was not consistently observed in these experiments (Figure 2).
The data sets recorded for the prostate gland during the course of these experiments are shown in Figure 1, both in terms of the weight of the prostate gland relative to the concurrent control value (taken as 100%) and in terms of absolute prostate weights.
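The expression of a tissue weight relative to its concurrent control (taken as 100%) can be sketched as below. The function name and the weights are hypothetical; the example only illustrates why the same absolute weight can appear as an increase or a decrease depending on where the concurrent control happens to fall.

```python
def percent_of_control(treated_mean, control_mean):
    """Express a treated-group mean as a percentage of the concurrent control."""
    if control_mean <= 0:
        raise ValueError("control mean must be positive")
    return 100.0 * treated_mean / control_mean


# Hypothetical prostate weights (mg); the treated mean is identical in both
# cases, only the concurrent control differs between experiments.
treated_mean = 110.0
for control_mean in (100.0, 125.0):
    relative = percent_of_control(treated_mean, control_mean)
    print(f"control {control_mean:.0f} mg -> treated at {relative:.0f}% of control")
```

Presenting both relative and absolute values, as in Figure 1, lets a reader separate a shift in the treated group from a shift in the control.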

Variability in control tissue and body weights.
We calculated a mean of each control tissue weight and of control body weights from the four experiments, yielding a new mean weight value for all controls (corn oil + TP) combined. The SD of the experiment means, divided by this overall mean and multiplied by 100, yields the percentage variation in control weights between experiments. The control prostate weights showed the highest variability between experiments, as follows: prostate, 11%; Cowper's glands, 6%; seminal vesicles, 5%; glans penis, 4%; LA/BC, 4%; liver, 1.8%; kidneys, 1.5%; body weight, 1.1%. The relatively high variability of all of the sex accessory tissues, compared with the liver and kidney weights and body weight, probably relates to the fact that the stimulatory dose of TP used in these Hershberger assays causes submaximal growth of the tissues, being just below the plateau-inducing dose of TP (Ashby and Lefevre 2000). Thus, small pharmacodynamic differences between experiments may yield small differences in the stimulatory dose of TP experienced by the tissues between experiments.
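The between-experiment variability measure described above is a coefficient of variation, and a minimal sketch of it follows. The function name and the control means are hypothetical, and the use of the sample SD (n - 1 denominator) is an assumption, since the text does not state which form was used.

```python
from statistics import mean, stdev


def between_experiment_cv(control_means):
    """Coefficient of variation (%) of control group means across experiments.

    Assumes the sample standard deviation (n - 1 denominator).
    """
    return 100.0 * stdev(control_means) / mean(control_means)


# Hypothetical control prostate means (mg) from four experiments.
control_prostate_means = [90.0, 100.0, 110.0, 100.0]
cv = between_experiment_cv(control_prostate_means)
print(f"between-experiment variability: {cv:.1f}%")
```

Applying the same function to each tissue and to body weight would give a ranking of the kind reported in this section.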
Uterotrophic assays. The data for the uterotrophic studies of 4MBC are presented in Table 2. We have previously shown 4MBC to be reproducibly active in the immature rat uterotrophic assay in this laboratory (Tinwell et al. 2002b) (experiment 1, Table 2). However, in the initial repeat study (experiment 2, Table 2), 4MBC failed to induce a significant increase in uterine weight. In this experiment the control uterine weights were heavier than usual: 35.1 ± 5.9 compared with 23.3 ± 4.3 (experiment 1, Table 2; Tinwell et al. 2002b); thus, a repeat of this study was undertaken (experiment 3, Table 2). In experiment 3, control uterine weights were within our normal range, and 4MBC induced a weak, although significant (p < 0.01), increase in uterine weight. This activity was abolished by the GnRH inhibitor ANT, suggesting that the uterotrophic effects were mediated centrally via a temporal advance in puberty rather than via a direct effect of 4MBC on the uterus. Two further experiments were therefore performed in which we investigated the effect of the initial age of the rat on the activity of 4MBC (experiments 4 and 5, Table 2). Although 4MBC led to an increase in uterine weight in experiments 4 and 5, the significance of the response diminished with the older animals. Thus, the response to 4MBC was highly significant (p < 0.01) when 19-20-day-old animals were used; this was reduced to p < 0.05 with 20-21-day-old animals, and the response was not significant when the older animals (21-22 and 22-23 days old) were used. These effects were reproducible between experiments 4 and 5 (Table 2).

Discussion
FIN inhibits the production of dihydrotestosterone from testosterone and consequently shows activity as a male rat developmental toxin with a lowest active dose of 10 µg/kg/day (Bowman et al. 2003). FIN also shows reproducible antiandrogenic activity in the Hershberger assay (Hershberger et al. 1953) at doses of ≥ 8 µg/kg (Figure 1A; Ashby et al. 2004). The present studies initially demonstrated a statistically significant low-dose trophic effect on the prostate gland at 2 µg/kg (experiment 2, Figure 1A), suggesting an androgenic, as opposed to antiandrogenic, effect. In the first attempted repeat of this trophic effect (experiment 3), we observed a statistically nonsignificant increase in prostate weight at the 2-µg/kg dose, and in experiment 4 no effects were seen (Figure 1A). Because reproducibility of an effect is the primary criterion of its validity, we conclude that FIN does not show low-dose endocrine effects. In seeking an explanation for the initially observed low-dose trophic effect of FIN, we noticed that control prostate weights were the most variable of all of the tissue weights recorded in the Hershberger assay: 11% variability for prostate compared with values as low as 1.1% for body weights (see "Variability in control tissue and body weights" in "Results"). This variability is evident in Figure 1B, which shows absolute prostate weights for the series of experiments with FIN. Thus, the control prostate weight in experiment 1 is essentially the same as the statistically increased prostate weight recorded for the FIN group in experiment 2. This variability in control prostate weights raises the strong possibility that the significant effect seen for the low dose of FIN was a chance observation occurring within the normal interexperimental control range. Similarly, although the variability in control glans penis weights (4%) was lower than that for the prostate (11%), it is probable that the nonreproducible decreases in glans penis weight also induced by low doses of FIN in experiment 2 were associated with normal biologic variability (Figure 2). The key points from these data for FIN are that the increase in prostate weight and the decreases in glans penis weights induced by FIN were all within the control range encountered for these tissues in the present experiments, and the effects induced were not reproducible.

Table 1. Tissue weights from castrated rats administered corn oil (oral), corn oil + TP (0.4 mg/kg sc), or corn oil + TP + FIN (oral), each for 10 consecutive days.
Columns (data not shown): experiment no.; treatment (µg/kg); body weight (mean ± SD): initial weight (g), final weight (g); tissue weight (mean ± SD): liver (g), kidneys (g), adrenals (mg), LA/BC (mg), CG (mg), SV (mg), prostate (mg), glans penis (mg).
The uterotrophic activity of 4MBC (Table 2, Figure 3) was dependent upon the weight of the control uteri in the relevant experiment. There is no way to select for low control uterine weights because there is not a predictive correlation between initial body weight and uterine weights observed 4 days later (Table 2). Thus, it is a matter of chance whether an experiment has control uteri in the low weight range, thereby giving rise to a positive response induced by 4MBC, or whether the control uteri will be in the high range, in which case 4MBC will be negative. The data shown in Figure 3 confirm, as expected, that the uterus grows marginally during postnatal days (PND) 19-25, before the major growth phase starting at puberty (around PND 29-30). By the time the maximum uterine weight for this prepubertal growth is reached (~40 mg), the assay is no longer able to detect 4MBC as uterotrophic. The fact that the prepubertal growth of the uterus, and the uterotrophic activity of 4MBC, is abolished by GnRH inhibition indicates that both effects are centrally mediated, perhaps via modulation of centrally controlled release of low levels of estradiol from the ovaries or the adrenal glands. These data therefore indicate that there are two mechanisms by which an agent can induce a uterotrophic response. The first mechanism, as illustrated by the reference estrogen DES, involves direct action of the estrogen on the uterus leading to maximal growth of the tissue (from 20 to > 100 mg; Table 2). This mechanism, as expected, is not influenced by coadministration of the GnRH inhibitor ANT. The second mechanism, as illustrated by 4MBC, involves the test agent influencing the centrally mediated stimulation of prepubertal growth of the uterus. This activity is blocked by GnRH inhibition, and the magnitude of the uterotrophic effect is limited to the maximum control uterine weight that can be encountered in prepubertal animals (~40 mg).
The fact that the activity of 4MBC in the uterotrophic assays was dependent upon the concurrent control uterine weight led us to reevaluate our earlier data for BPA in the mouse uterotrophic assay (Tinwell et al. 2000) in which eight uterotrophic assays were described for the BPA dose range of 0.02 µg/kg to 300 mg/kg. Although we found some evidence of weak uterotrophic activity at some doses, none of the effects was consistently reproducible. The clearest example of this was for the 200-mg/kg dose of BPA, which gave four weak positive responses and four negative responses (Figure 4A). When those data are reordered from chronological order into the order of descending control uterine weights (Figure 4B), the results become consistent with the effects seen for 4MBC. Thus, the positive BPA results were weak and only became significant when the control uterine weights were low. Further, the magnitude of the effects induced by BPA remained constant: the decreasing control values created the statistically significant effects for BPA.
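The reordering described above, and the way a falling control mean alone can turn an unchanged treated value into a significant result, can be sketched as follows. This is an illustration under stated assumptions, not a reanalysis: all weights are hypothetical, every group is assumed to have six animals and the same SD, the treated mean is held fixed as one reading of "the magnitude of the effects remained constant", and significance is judged against the tabulated two-sided 5% critical value of Student's t for 10 degrees of freedom (2.228).

```python
from math import sqrt

N_PER_GROUP = 6       # assumed group size (hypothetical)
GROUP_SD = 2.0        # assumed common SD in mg (hypothetical)
T_CRIT_DF10 = 2.228   # two-sided 5% critical t for 6 + 6 - 2 = 10 df


def t_from_summary(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Pooled-variance Student's t computed from group summary statistics."""
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / sqrt(pooled_var * (1 / n_t + 1 / n_c))


def significance_by_control(treated_mean, control_means):
    """Order experiments by descending control mean and flag significance."""
    ordered = sorted(control_means, reverse=True)
    results = []
    for control_mean in ordered:
        t_stat = t_from_summary(
            treated_mean, GROUP_SD, N_PER_GROUP,
            control_mean, GROUP_SD, N_PER_GROUP,
        )
        results.append((control_mean, abs(t_stat) > T_CRIT_DF10))
    return results


# Hypothetical control uterine means (mg) in chronological order, with the
# treated mean held at 14.0 mg in every experiment.
for control_mean, significant in significance_by_control(14.0, [11.5, 13.5, 10.5, 12.5]):
    print(f"control {control_mean:.1f} mg -> significant: {significant}")
```

In this sketch the outcome flips from negative to positive purely as the control mean decreases, which is the pattern the reordered data are described as showing.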
These data for FIN, 4MBC, and BPA indicate that concurrent control values and interexperimental variation in control levels can influence the interpretation of weak endocrine toxicities and thereby affect the qualitative outcome of bioassays. This conclusion is possibly relevant to several other instances of low-dose or weak endocrine effects that have proven resistant to independent confirmation. For example, Sharpe et al. (1995) reported that octylphenol (OP) and BBP were able to reduce rat testes weights. We were unable to confirm those effects for BBP (Ashby et al. 1997), and this led Sharpe et al. to repeat their evaluation of OP (Sharpe et al. 1998). In that repeat study, the control testes weights had fallen and OP was no longer active (Sharpe et al. 1998; Figure 5). Uncertain of the origins of these effects, these researchers ran a further control group and recorded control testes weights similar to their original 1995 study (Figure 5; Sharpe et al. 1998). They were unable to account for this variability in control testes weights, and the activity of BBP and OP in the assay remains unconfirmed. A similar situation exists for the effects of NP on the rat prostate gland (Lee 1998). Lee described a study in which NP reduced prostate weight and in which access to five identical control groups was possible. In the first four of these, control prostate weights were approximately 60 mg/100 g body weight, but in the fifth study this value was significantly lower (~40 mg/100 g body weight; Figure 6). Lee (1998) did not comment on this unexpected and statistically significant change in control prostate weight, which itself led to the conclusion that NP was uniquely inactive in that fifth experiment (Figure 6). We were subsequently unable to show any effects for NP (Odum and Ashby 2000) when tested under the conditions described by Lee (1998), and the control prostate weights were similar to the first four studies reported by Lee (1998).
A further example of how the variability of control prostate weights can influence the interpretation and confirmation of chemically induced effects is illustrated by the activity of BPA on the CF1 mouse prostate gland (Figure 7). Nonneman et al. (1992) described the effect of intrauterine position on the subsequent weight of the prostate gland of CF1 mice. Subsequently, Nagel et al. (1997), from the same laboratory, reported the ability of BPA to increase the weight of the mouse prostate gland, with the implication that the estrogenic activity of BPA had simulated the presence of female fetuses being next to male fetuses in utero. Cagen et al. (1999) and Ashby et al. (1999) were unable to confirm those effects of BPA, despite the fact that Cagen et al. (1999) recorded control prostate weights as low as those of Nagel et al. (1997).

Article | Control variability and low-dose ED toxicity
Figure 5. Testis weight (mean ± SD) from 90- to 95-day-old control rats and those exposed to 1,000 µg/L OP either in utero or during lactation (y-axis: absolute testis weight, g; studies: 1995 and 1998). The control testes weights were reduced in the second study compared to the original control data. An additional control group was assessed in the repeat (1998) study. These weights were similar to the original 1995 data. The authors were unable to account for these differences.

Figure 6. Rat prostate weight (mean ± SD) from five groups of controls reported by Lee (1998) and repeat data from Odum and Ashby (2000). All of Lee's study groups (study number shown within bar) had similar mean body weights at termination on postnatal day (PND) 31. Lee (1998) observed positive effects for NP in study 4 (8 mg/kg NP on PND 6-24) but not in study 5 (8 mg/kg NP on PND 13-30); no account was given of the difference in control prostate weights in the fifth experiment compared with those reported in studies 1-4. In the repeat studies of Odum and Ashby (2000), rats were treated with 8 mg/kg NP on PND 6-24. Legend: control, NP.
*Weight significantly reduced compared to concurrent control.

Figure 7. Absolute weight (mean ± SD) of the prostate from control and BPA-treated CF1 mice. Abbreviations: 0M, male fetus positioned between two females; 2M, male fetus positioned between two males; IUP, intrauterine position. Group sizes are shown above each bar. Animals were killed at 6 months of age, except for those reported by Cagen et al. (1999), which were killed at 3 months of age.
*Statistically significantly increased. **Significantly different (p < 0.01) from the 0M mice reported by Nonneman et al. (1992).

Within the present context, the most interesting aspect of the combined data shown in Figure 7 is the variability of the control prostate weights between laboratories and, most particularly, within the laboratory of Nonneman and Nagel. Again, the most interesting questions are what factors influence the unexplained variations in control prostate weights, within and between laboratories, and how these variations influence interpretation of the effects seen for BPA.
A further example of the importance of control variability is illustrated by the reports of weak effects induced by BPA on rat testes (Talsness et al. 2000). We were unable to confirm the studies of either Sakaue et al. (Tinwell et al. 2002) or Talsness et al. after extensive studies. The results reported by Sakaue et al. (2001) and related results from the same laboratory (Sakaue et al. 1999) enable reference to four sets of control observations for rat daily sperm production (DSP) (Figure 8). The most marked reduction in DSP induced by BPA in the two experiments was reported by Sakaue et al. (2001) (Figure 8). However, neither of these effects was significantly different from the control DSP value reported in the same year by Ohsako et al. (2001; bar 2, Figure 8). This therefore provides a further instance of unaccounted control variability enveloping chemically induced effects from another study conducted in the same laboratory. In four independent repeat studies, we found that test and control DSP values were not significantly different from each other and were similar to the BPA values reported by Sakaue et al. (2001) (Figure 8). We also have established that diet was not the cause of the different assay outcomes (Odum et al. 2001).
The remaining question posed by the low-dose studies on FIN is how to interpret effects that show a nonsignificant change, as illustrated by the nonsignificant increase in prostate weight seen for FIN (experiment 3, Figure 1). If the statistical methods used are appropriate, the absence of significance should indicate the absence of a chemically induced effect. An example of this problem is provided by Howdeshell et al. (2003), who discussed in detail a nonsignificant increase (p = 0.31) in uterine weight for CF1 mice housed in worn polycarbonate cages equipped with worn polycarbonate drinking bottles (Figure 9). Howdeshell et al. (2003) associated this effect with the BPA (~300 ng/mL) that was reported to leach from the cages during the course of the experiment, but it may equally have been related to the heavier body weights of the test animals at the start of the experiment (p = 0.17). The group sizes in that study were large (57/group), but this represented the pooling of three replicate experiments. It would have been useful to know the results of those three individual experiments to evaluate whether the nonsignificant increase in uterine weight was reproducible between replicates, especially given the problems we encountered (Tinwell et al. 2000) when evaluating the uterotrophic activity of BPA in mice.
Some endocrine-mediated toxicities are highly specific to the conditions of the test, as exemplified by the specificity of the effects induced by BPA in the vagina of F344 rats but not Sprague-Dawley rats (Long et al. 2000) and by the resistance of C57 mice, as opposed to CD1 mice, to the reproductive tract developmental effects of neonatal DES exposure (Couse et al. 2001). Such clear-cut specificities, usually defined in the same laboratory, are distinct from the several recent failures to confirm endocrine toxicities between laboratories when using ostensibly the same test protocol. We suggest that the latter instances can only be resolved, and avoided in future, if the following actions are taken:
• For the end points being studied, it is useful to have a historical database with which to compare current assay performance and test results. Inclusion of this database into publications would aid resolution of the problems described herein.
• Most laboratories experience variability in the end points they measure. Understanding the origin of these variations is a necessary precursor to concluding weak endocrine activity for test agents.
• When low-dose or weak endocrine effects are indicated, it is important to confirm the observations before publication.
• Unadjusted data from separate experiments should be shown, as opposed to pooled and adjusted data. Adjusted data are subsidiary to the original data.
• A distinction should be drawn between effects that lie within recent historical control levels and those that exceed those levels. Different mechanisms may operate in these two situations, leading to different approaches to data extrapolation.
• Use of appropriate statistical methods enables objective qualitative judgments to be made. Subjective discussion of statistically nonsignificant effects should be avoided.

Figure 8. Comparison of control DSP (mean ± SD) reported from the same laboratory [Sakaue et al. (1999, 2001)] and a different laboratory with the greatest effect induced by BPA. A range of BPA doses was used in these experiments, and only the dose that induced the greatest effect in each experiment is shown: 20 µg/kg (Sakaue et al. 1999); 200 µg/kg; 200 mg/kg. The effect of BPA is not significantly different from the control reported by Ohsako et al. (2001; bar 2: one- or two-sided Student's t-test). Sakaue et al. (1999) and Ohsako et al. (2001) used Holtzman rats, and Sakaue et al. (2001) and Ashby et al. (2003) used Sprague-Dawley rats. However, the identical control DSP values for Holtzman rats (Sakaue et al. 1999, bar 1) and Sprague-Dawley rats (bar 3) indicate that rat strain is not a key variable on control DSP values and that, consequently, it is possible to compare data across strains and experiments for that laboratory.
*Reported by Sakaue et al. (2001) as statistically different from the concurrent control (bars 3 and 4).
Figure 9. Mean (± SD) uterine weights for mice maintained under different housing conditions (Howdeshell et al. 2003