Species sensitivities and prediction of teratogenic potential.

Many chemicals shown to be teratogenic in laboratory animals are not known to be teratogenic in humans. However, it remains to be determined whether the unresponsiveness of humans is due to lessened sensitivity, to generally subteratogenic exposure levels, or to the lack of an appropriate means of identifying human teratogens. On the other hand, with the exception of the coumarin anticoagulant drugs, those agents well accepted as human teratogens have been shown to be teratogenic in one or more laboratory species. Yet no single species has clearly distinguished itself as more advantageous than any other in the detection of human teratogens. Among the species used for testing, the rat and mouse most successfully model the human reaction, but the rabbit is less likely than other species to give a false positive finding. Among species less commonly used for testing, primates offered a higher level of predictability than others. Regarding concordance of target malformations, the mouse and rat produced the greatest number of concordant defects, but they were also responsible for the most nonconcordant responses. Since no single species is clearly more predictive of the human response, it is concluded that safety decisions should be based on all reproductive and developmental toxicity data in light of the agent's known pharmacokinetic, metabolic and toxicologic parameters.


Introduction
The extrapolation of animal data to the human is the foundation of safety evaluation of chemicals and drugs prior to human exposure. As will be shown, there is at present no perfectly suitable animal model from which to make these extrapolations in most cases, nor is it abundantly clear just what endpoints residing in the animal database are most important in this determination. In short, it is conceded that the predictive value of animal teratogenicity tests in extrapolating results into terms of human safety is imperfect. Consider the fact that of over 2800 chemicals now reported to have been assayed in animals for teratogenic potential, only about 14 chemicals or groups of chemicals have been shown to have this propensity in humans (1). Put another way, there are about 1000 or so chemicals that demonstrate some measure of teratogenicity in animals but for which we have no evidence at all that they share this property in humans. However, this discrepancy may only reflect that we cannot clearly recognize teratogenicity in humans. It should be realized from the outset that testing strategies of drugs for teratogenic potential on the one hand need to be clearly distinguished from testing of industrial or environmental chemicals on the other. Conditions of exposure, numbers of potential victims of exposure, benefit considerations, economics and other factors are radically different in the two cases and affect both the philosophy of testing and the acceptable risk (2).
Fortunate as we are that newly identifiable human teratogens have not wreaked havoc in the same manner as in the 10,000 thalidomide victims in 1961 to 1962, or in the 20,000 rubella-infected children in 1964, correlations of animal and human responses may not be providing us with the necessary answers to avoid potentially similar situations in the future. It is the apparent lack of association that precludes accurate prediction of teratogenic potential in extrapolating from animals to humans. We are really left at present with the dilemma of not being capable of selecting which animal species are the most predictive of the likely human response.
Two decades ago, with the thalidomide catastrophe, the rabbit emerged as the sensitive species with respect to this teratogen, highlighted in part because of the negative and/or inconclusive results produced in a large number of other laboratory species tested at the time.
Enthusiasm for wider use of primates ensued soon thereafter, chiefly on taxonomic grounds, but also due to confirmatory results produced with thalidomide and our inexperience in what to look for in an animal model to predict teratogenicity. With the subsequent demonstration that even subhuman primates did not respond positively to several other suspected human teratogens, the fact remains that the human female is the only truly reliable model. However, a number of reasons militate against direct testing. Since the mid and late 1960s, then, virtually every species maintained in the laboratory (and some that were normally not!) has been touted as a potential teratogenic model.
A number of factors relate to the inability to predict accurately and the impreciseness in extrapolating from one species to another, and include genetic heterogeneity (affecting absorption, metabolism and excretion of a given chemical), and variability in diet, size, developmental patterns, intercurrent disease processes, placental transfer, etc. It seems likely that variations in metabolic pathways are a major cause of species differences (3).
The traditional endpoint in assessing teratogenic potential is outright structural malformation. A given chemical may, however, kill fetuses rather than malform them at a given dose in one species, whereas it may result in deformation among survivors at the same dosage in another species. Further, the use of other endpoints representing developmental toxicity, e.g., mortality or effects on growth or function, may be just as appropriate as inducing terata in several regards (see below). Perhaps better correlation between laboratory species and the human might be attained through collection and analysis of all feasible developmental toxicity endpoints. The purpose of this presentation, therefore, is to re-evaluate the species selection process in light of the plethora of information gathered on the potential developmental toxicity and sensitivities of species to therapeutic agents and environmental chemicals.

Historical Perspective
A laboratory animal to be used in evaluating human teratogenic risk ideally would be chosen because it metabolizes and distributes a given chemical and transfers it across the placenta in similar ways to man (4). Unfortunately, response to a given chemical by a particular species of animal is almost as variable as the number of chemicals tested.
Despite a wide range of species having been used in teratology studies (2,5-8), no one species has been clearly demonstrated to everyone's satisfaction to be the one of choice to the exclusion of all others. This is particularly true with regard to the few proven human teratogens, as will be amply demonstrated below. It need only be recalled that the marked teratogenicity of thalidomide in man has been observed in relatively few species. Although some nine subhuman primate species have demonstrated the characteristic limb defects observed in humans when administered that drug, only 8 of 15 other putative human teratogens have been teratogenic in one or more of the various primate species (8).
Rodents are frequently used in teratogenicity studies. While the laboratory rat has been the most frequently used rodent species, the susceptibility of this species to putative teratogens has been variable, and certain teratogens such as cortisone (4,9), thalidomide (5,6), trimethadione (10), and lithium carbonate (11) have elicited a poor teratogenic response. Mice have also been used frequently in teratology studies despite the marked variability of responses observed with different strains (12-14). Differences in intraspecies sensitivity have also been reported for phenytoin (15) and ethanol (16,17). Furthermore, stress-induced enhancement of already comparatively high plasma corticosterone levels is believed to be the underlying mechanism facilitating the stress-related teratogenicity observed in the mouse (18,19). Therefore, the mouse would seem unsuitable for the testing of such agents as sedatives, tranquilizers, hypnotics or agents requiring unusual, manipulative procedures. The rat and rabbit are less prone to stress-induced teratogenicity (20,21). A lack of steroid-induced cleft palates in man led Tuchmann-Duplessis (6) to conclude that the unique susceptibilities encountered with various mouse strains may lead to many false positives. However, it can be said that inbred mouse strains are probably more valid indicators than outbred strains in assessing teratogenic potential (13).
Recently it has been suggested that hamsters may serve as an appropriate rodent species for teratogenicity testing (22). A lack of data on spontaneous malformations and intraspecies differences (23), together with the absence of any distinct advantage over more commonly used rodent species (6), would tend to limit the usefulness of this species. To date, only Canada includes the hamster as a preferred species in teratogenicity testing for regulatory decisions.
The guinea pig has also been suggested as an appropriate rodent species (24,25). Although it has been described as having a reproductive endocrinology more closely related to that of man than other rodents have, its 68-day gestation period, dependence upon an everted yolk sac, and lack of data concerning intraspecies differences may preclude the common use of this species for teratogenicity testing.
Rabbits have been used routinely as the nonrodent species required by most regulatory agencies. As is the case with rats and mice, the selection of this species has been based largely on availability, economy, and long history as a laboratory animal. While the rabbit's responsiveness to thalidomide has supported its appropriateness as a test species (26), it has not been responsive to such putative human teratogens as alcohol (27) and lithium carbonate (11). In addition, this species has limited use in the testing of antibiotics, as induced imbalances in the digestive microflora have led to maternal malnutrition, increased embryolethality and fetal hypoplastic skeletal development (28).
It has been suggested that the ferret may serve as an alternative to rabbits as the nonrodent species of choice (29-32). Special problems of diet, seasonal breeding, length of gestation (42 days), and insufficient historical data may preclude the widespread use of this species. However, unlike rodents and rabbits, but similar to man, the ferret embryo is not maintained by an everted yolk sac placenta. Therefore, this species may have merit in the teratogenic screening of compounds whose mechanism of action involves the placenta (29). The species has also been singled out as especially valuable in assessing behavior in the context of reproductive toxicology (33).
Swine (34,35), dogs (34,36,37) and cats (5,38) have been suggested as possible nonrodent species of choice. All three species have been only variably responsive to the teratogenicity of thalidomide. Further, limitations of space, seasonal estrus, prolonged gestation, differences in xenobiotic metabolism, susceptibility to underlying disease, and inadequate control data have reserved the choice of these species for those occasions when more conventional laboratory animals leave unanswered questions.
Various species of primates have been suggested as animals of possible selection for teratogenicity testing (4,39-44). It must be emphasized that the close parallel between man and monkey with regard to the teratogenicity of thalidomide, norethindrone, testosterone, diethylstilbestrol, and methylmercury (44) has not been demonstrated for other chemicals which are teratogenic in humans. In particular, methotrexate has been shown to be teratogenic in man and rat, but macaque and rhesus monkeys were refractory at doses equivalent to or considerably higher than the usual human dose (46). Similarly, attempts to duplicate in monkeys the aminopterin teratogenicity observed in man showed monkeys more susceptible to embryolethality than malformation (47). On this basis, as well as the limitations due to scarcity of these animals, it has been suggested they not be used for widespread teratogenicity screening, but rather reserved for cases of questionable results from more commonly used laboratory animals or in toxicological evaluation of a very few selected agents (6,47,48).
Despite a lack of agreement as to the most appropriate animal model, a general consensus exists as to the necessary criteria for the selection of such a model. It is generally accepted by scientists and officials responsible for safety evaluation that, for realistic testing, the maternal-placental-embryonic relationship as characterized in mammals is essential. In addition, metabolic rates as well as the pathways of xenobiotic metabolism should be comparable to those of man. Parent compounds and their respective intermediates should undergo distribution, including transplacental crossing, in a manner similar to that in human beings. Also, the patterns of embryonic and fetal structural and metabolic development should parallel those in man. Finally, the ideal animal model should be able to be easily bred, have a short gestation, produce large litters, and be economically housed and easily handled.
While it would be accepted at face value by all that extrapolation to humans should be made from the most sensitive animal species tested, the confidence in such an extrapolation is increased as the number of species tested increases, as will be discussed later. When different results are obtained from different species, it is important to determine which of the test species more closely resembles humans for the relevant underlying mechanisms (49).
Since no single species thus far evaluated fulfills all of the above criteria, the best compromise would appear to be the selection of two or more species. This approach has been recommended by others recently (50). As mentioned, past choices by scientists and regulatory officials alike have often been arbitrary, based solely on past experience and the availability of the test species.

Species Selection by Regulatory Agencies
Without exception, principal regulatory agencies throughout the world require the use of two species to assess the teratogenic potential of drugs and chemicals (Table 1). While tacit approval is given by several agencies for the use of less common species, the requirement in general mandates the use of one nonrodent and presumably, one rodent species. For all intents and purposes, this is meant to imply the rat and rabbit. Although the U.S. Food and Drug Administration (FDA) regulations state that the mouse and rabbit have occasionally been preferable to the rat, rarely in practice has the mouse substituted for the rat. As with other regulatory agencies, the selection of a species other than the mouse, rat or rabbit has most often been reserved for those cases of equivocal results from more conventionally used species. The World Health Organization lists perhaps the broadest range of acceptable alternatives, including: hamsters, guinea pigs, ferrets, cats, pigs, dogs and nonhuman primates, while the U.S. Environmental Protection Agency (EPA) allows for the selection of a nonconventional laboratory species if an appropriate rationale is given. Presumably such a rationale would be based on similarities in xenobiotic metabolism, pharmacokinetics, etc. However, in acknowledging the usual lack of such data, regulatory agencies often recommend a species that is easy to use and characterized for its response to known teratogens. The same stipulations generally exist for other national regulatory agencies, including those of Australia, West Germany, Spain, Argentina, Austria, Denmark, Finland, France, Ireland, Italy, and Sweden. Paradoxically, one country, Norway, requires the use of one species in which the effect of thalidomide has been documented (presumably rabbit or primate) in addition to either rat or mouse.
Table 1 (recommendation entries only; agency names not recoverable):
Commonly used are the rat, mouse, hamster, and rabbit. Preferable are the rat and rabbit.
No recommendation as to acceptable species. However, should use two species, one a nonrodent, if possible.
Use a common strain not of low fecundity and characterized for its response to teratogens.
Frequently used are mouse, rat, and rabbit. However, the use of hamsters, guinea pigs, ferrets, cats, pigs, dogs, and nonhuman primates is encouraged.
The reason for the apparent restriction to rat and rabbit is probably twofold. First, these species have the greatest historical control database available. In this light, Japanese requirements further suggest that the species of choice should have a low background of deformities. It is well known that background malformation rates are not only more consistent within the species but also significantly lower in rats and rabbits than in mice. Second, the scientific consideration is that the species utilized in the assessment of teratogenicity should be that species which most closely resembles the human with respect to metabolism, pharmacokinetics, excretion, etc., and the rat would most likely be the species under study in this regard. Increasing emphasis on the role of pharmacokinetics is perhaps best exemplified by the suggestion of United Kingdom and Canadian regulatory agencies that it may be useful to compare the pharmacokinetics of a drug in nonpregnant and pregnant animals to determine the maternal-fetal distribution of the drug. A final consideration is that a "sensitive" species be used. Regulatory agencies of the U.S., Japan, and Europe have suggested that the species and strain used should be "characterized for its response to teratogens," and as we shall note later, rats and rabbits are susceptible to most known teratogenic insults.

Identification of Human Teratogens
Recognition or identification of teratogens in the human is extremely tenuous for several reasons: therapeutic dosages or exposure levels are generally several orders of magnitude lower than doses purposefully given to animals to elicit malformations; pregnant women are (hopefully!) not usually given long courses of drug therapy nor a large number of drugs in this age of therapeutic nihilism; the means of proving causation or even association in humans requires extensive analysis and a large number of controlled cases, etc. In addition, many chemicals which have demonstrated teratogenic properties in animals have no normal means of exposure to the human populace, and the human response has simply never been put to test. While with one exception all putative teratogens in the human have been teratogenic in one or more laboratory animal species as well, unfortunately it has not been the case that discovery of teratogenic potential of a given chemical was made in animals prior to humans (Table 2). Of the 22 individual chemicals or groups of chemicals listed as human teratogens, eight were first identified as teratogens in the human. Included in the group are of course the notorious thalidomide, aminopterin, and methylmercury, all well known teratogens, perhaps publicized more widely because of their initial identification in the human species.
The method of detection of teratogens in the human remains primarily astute clinicians' reports of individual cases or, with several classic teratogens, e.g., thalidomide and diethylstilbestrol, through almost intuitive observation of clusters of similar cases. The detection of the latter chemicals was aided by the fact that the abnormalities observed in both examples were of such rarity as to direct attention towards their occurrence. Prior to the establishment of thalidomide as a causal agent, for instance, phocomelia or amelia, the defect it induced, occurred only very rarely (82); likewise, vaginal neoplasms had only been recorded in a few instances in the entire world medical literature prior to the causal association with diethylstilbestrol (83). Even so, sufficient numbers of cases are required to ensure recognition; solitary case reports do not demonstrate association, let alone establish causation. Detection of some of the remainder of human teratogens on the list has been much less fortuitous. Consider for instance, that 15 years elapsed between the first recorded association of alcohol consumption and teratogenesis and acceptance in the scientific community that alcohol was the causal agent of a specific syndrome. Contrast this to the suspicion made regarding thalidomide in 1961 and its removal from the market almost worldwide within the course of less than one year.
The reliability of a particular animal study to predict the fate of a chemical when applied to humans is never fully known until sufficient epidemiological studies have shown how the human responds to the substance. Thus, the best evidence of an adverse human health effect may well be a properly conducted epidemiological study (84-87). However, this has not been the case: in only three instances (methylmercury, valproic acid, and the hydantoin anticonvulsants) have epidemiological studies, as opposed to any other means of identification, been the primary determinant of teratogenic potential.
Methods used to identify teratogenic hazards in human populations must, of course, depend on their ability to identify an exposed group and documentation of exposure (88). This is generally easiest for drugs, whose use is usually recorded; it is relatively easy for chemicals used socially for which individuals determine their own pattern of usage and document it fairly reliably; it is fairly difficult for occupational exposures, for which exposed individuals can usually be identified but for which the magnitude of exposure is usually difficult to estimate and exposure to single chemicals is uncommon, and is very difficult for environmental chemicals, for which both the distribution and magnitude of exposures are difficult to document.
Let us examine the process by which animal models can be used to aid in predicting the human response to teratogens: how the model reacts to known human teratogens and how it reacts or overreacts to agents not considered teratogens. In the former case, are the target sites the same? The putative human teratogens are listed in Table 3. The responses to these teratogens elicited in multiple laboratory species are also included: it is apparent that, with one exception (coumarin anticoagulants), every chemical or chemical group known to be teratogenic in humans is also teratogenic in one or more laboratory species. Thus positive animal teratology studies are at least suggestive of potential human response.
With respect to responses of individual species, it appears from this limited list that the ferret, guinea pig, mouse, and rat (in descending order) were the most predictive of the human response. However, in the first two species listed (ferret, guinea pig), testing was limited and may not be representative should more putative human teratogens be put to test in those species. That leaves the mouse and rat most predictive, successfully modeling the human reaction about 70% of the time. In this limited series, the rabbit, hamster, primate, canine, and swine all identified the human response about equally well (about 40-50% of the time). The cat was unacceptable in this regard, identifying only a single human teratogen of the four tested.
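The species comparison above amounts to a simple proportion: of the human teratogens actually tested in a species, how many were positive. A minimal Python sketch of that tally follows. The function name and the use of `None` for untested agents are assumptions made for illustration. The cat record mirrors the text (one positive among four tested), while `sparse_species` is hypothetical and is included only to show why a high rate resting on few tests, as with the ferret and guinea pig, deserves caution.

```python
from typing import Optional, Sequence


def detection_rate(responses: Sequence[Optional[bool]]) -> Optional[float]:
    """Fraction of tested human teratogens that were positive in a species.

    True = teratogenic in the species, False = tested and negative,
    None = not tested (excluded from the denominator).
    """
    tested = [r for r in responses if r is not None]
    if not tested:
        return None  # no tests, so no basis for a rate
    return sum(tested) / len(tested)


# Cat: one positive among four agents tested, as stated in the text.
cat = [True, False, False, False, None, None]
# Hypothetical species with only two agents tested (illustrative, not real data).
sparse_species = [True, True, None, None, None, None]

for name, resp in [("cat", cat), ("sparse_species", sparse_species)]:
    n_tested = sum(r is not None for r in resp)
    print(f"{name}: {detection_rate(resp):.0%} positive of {n_tested} tested")
```

Reporting the number tested alongside the rate keeps a rate based on two agents from being read as equivalent to one based on a full panel.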
In the only other analysis of this type, the mouse was identified as the species most likely to yield a positive result among 38 chemicals from which there were reports of associated birth defects in humans (2).

Predictability of Concordant Malformations
It is considered by many that to be successful, animal models should mimic in the laboratory a similar or precise response to that of the human. Put another way, the limb malformations in humans induced by thalidomide were replicated in certain breeds of rabbits and in all but one primate species tested; these species are therefore considered better models than are those species in which malformations were induced by the drug but were not concordant, that is, not of the same type or pattern as the reference malformation.
Malformations concordant with those induced in humans by putative teratogens have been produced by one or more species of laboratory animals with only a few exceptions (Table 4). Only several specific anticancer alkylating agents, several anticonvulsants, and lithium did not produce a similar pattern of defects in animals to those in humans. These data add credibility to use of animal models in teratology, since more than half of the known human carcinogens that have been adequately tested in animals produced tumors in one or more animal species at organ sites different from those produced in exposed humans (89).
Individual species responses, however, were less perfect, with no species having especially good predictive ability in type of teratogenic response. The mouse and rat produced the greatest number of concordant defects, but they were also responsible for the most nonconcordant responses.

Chemicals Not Teratogenic in Humans
Multispecies comparisons of animals exposed to the same chemical might be useful in assessing risk to humans, the rationale being that there might be clues to interspecies variability and sensitivity differences upon such comparison. Most comparisons that have been made represent direct animal-to-human extrapolation; animal-to-animal comparisons have been neglected in this respect. Many studies in the biological literature give results of testing numerous species for teratogenic potential of a given compound; therefore animal-to-animal comparisons can be made. One study addressing these relationships was reported recently by one of us (8).
A summary of the multispecies comparisons indicates that rabbits and monkeys offer greater predictability of possible human responses than do any of the other species (Table 5). Both species were responsive to all chemicals cited in only one-quarter of the cases. However, it should be stated that the primate in particular has been used for many of the cases to confirm the teratogenicity of chemicals, especially those already suspected of being teratogenic. Other commonly used species, including mouse and rat, reacted negatively about 50 to 60% of the time. Some other species, such as hamster, reacted positively to almost two-thirds of the chemicals examined, indicating little similarity to humans in actual teratogenic response.
In contrast to these results, rat and hamster responses were closest to those of humans in the nonteratogenic situations with respect to dose-response ("best response"). Mouse and pig also had close responses more than half the time, while rabbit and primate, the species most representative of the human response from the perspective of overall reaction, gave best responses only about 50% of the time, as did guinea pig.
Sensitivity comparisons indicated that rabbits and primates again were the most predictive in nonteratogenic situations, but this is to be expected since these species gave nonteratogenic responses more frequently than did the others. Cats were exquisitely sensitive, but the responses to the few chemicals studied may not be representative of their full repertoire. When teratogenic responses were subjected to comparison, rabbit again was the leader, with mouse, rat, and dog not far behind.
Overall, analysis of data from this sort of animal-to-animal extrapolation, though admittedly crude, further indicated the wide variability of animal species in response to biological testing. The data do point out, in particular, however, that the use of rabbits and primates provides the greatest validity currently for predicting potential human teratogenicity.
Another FDA study addressing concordance of animal and human teratogenicity data came to a similar conclusion with respect to the most predictive species (2). In the compilation reported, the monkey and the rabbit gave the best negative response with respect to human nonteratogens, providing correct (nonpositive) responses 80% and 70% of the time, respectively, among some 165 chemicals studied. Concordance for the rat (50%), mouse (35%), and hamster (35%) was not as good.
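The correct-negative percentages quoted above can be restated as false positive rates, which is the form in which the species differences are easiest to compare. The short Python sketch below does only that arithmetic. It assumes each species' response to a human nonteratogen is scored as either positive or nonpositive, so that the two percentages sum to 100; the dictionary and function names are illustrative.

```python
# Correct (nonpositive) responses to human nonteratogens, in percent,
# as quoted in the text for the compilation in reference (2).
correct_negative_pct = {
    "monkey": 80,
    "rabbit": 70,
    "rat": 50,
    "mouse": 35,
    "hamster": 35,
}


def false_positive_pct(correct_pct: int) -> int:
    """Percent of human nonteratogens scored positive in the species.

    Assumes responses are binary (positive or nonpositive).
    """
    if not 0 <= correct_pct <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    return 100 - correct_pct


# Rank species from fewest to most false positives.
ranked = sorted(correct_negative_pct, key=lambda s: false_positive_pct(correct_negative_pct[s]))
for species in ranked:
    fp = false_positive_pct(correct_negative_pct[species])
    print(f"{species}: {fp}% false positive")
```

On these figures the mouse and hamster would flag roughly two of every three human nonteratogens, against one in five for the monkey.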
There are a number of chemicals that are of importance because of occupational exposure which are known animal teratogens but are not known to be teratogenic in humans (Table 6). These chemicals are teratogenic in one or more species of laboratory animals; epidemiologic data are insufficient to determine their potential to adversely affect human development. Because of widespread human exposure to these chemicals in the absence of sufficient epidemiologic data coupled with their known teratogenicity in animals, exposure to these chemicals should be managed conservatively, and epidemiology studies should be considered a high priority.
Another group of chemicals (Table 7) of occupational and environmental importance have anecdotal reports in the literature of adverse reproductive or developmental outcomes. None of these chemicals are known to be teratogenic in humans. With but several exceptions, these chemicals are also nonteratogenic or only equivocally teratogenic in laboratory animals and would not be suspected as human teratogens based on the animal data.

Predictability Based on Use of Nontraditional Endpoints
It is highly likely that analysis of endpoints other than structural malformation might prove to be valuable adjuncts for use in estimation of risk from animal models to the human. Other data emerging from developmental toxicity assessment, such as mortality, growth retardation, fertility rate, and/or functional impairment are the most prominent parameters in this regard. To date, these have not been utilized to the fullest possible extent. Since obvious malformations are unreliable indicators of teratogenic activity in isolated initial screening tests, other parameters could be evaluated that often associate with teratogenicity, that occur more frequently and consistently and consequently are more readily analyzable (91). Palmer (92) pointed out some time ago that the low pregnancy rate, reduced litter size, and poor viability observed in the early animal tests with thalidomide in rodents should have attested to the potential hazard of this chemical, even in the absence of teratogenicity. Combining several endpoints into risk assessment schemes has been recommended for other toxicities, especially mutagenesis (93). Background data for several of these endpoints are given in Table 8.

Discussion and Conclusion
Recognized human teratogens, except for coumarins, are also teratogenic in laboratory animals. However, many chemicals are teratogenic in laboratory animals that are not known to be teratogenic in humans. Whether this reflects lack of sensitivity of humans or lack of appropriate data in humans is unknown. Also, some chemicals might in fact be teratogenic in humans at some level of exposure but are managed such that there is no exposure of people to toxic levels.
There are sufficient epidemiologic data for many drugs to determine the human response and thus assess the predictability of animal data for human sensitivity. In contrast, there are generally insufficient epidemiologic data on most environmental and occupational chemicals to judge the predictability of animal data. Since animal data predict the effects of most chemicals where we have adequate human data, it is prudent to assume that animal data are also predictive of the human response to chemicals for which we have inadequate human data.
Among the species used for teratologic testing, the rat and mouse are the most successful in modeling the human reaction, but the rabbit is less likely than other species to give a false positive finding. No single species is clearly more predictive of the human response than others. Also, the organ systems or tissues affected in laboratory animals are not necessarily predictive of the type of response in humans. Lack of an effect in a particular organ in animals does not predict the absence of an effect in that organ in humans.
Assessment of the safety of drugs and other chemicals regarding teratogenic potential must take into account the following points. The greater the number of species with positive results, the greater the likelihood of an adverse effect in humans. All reproductive and developmental data should be used to predict safety, not just data on malformations. A regulatory approach based on a number of similar considerations has been suggested for carcinogens (95) and seems equally applicable to developmental toxins. The relevancy of the route of exposure and existence of a dose-response relationship are important for all species. Data from any species must be used in the context of the total data base for the agent, including pharmacologic, disposition, and toxicologic data.
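The trade-off between detecting human teratogens and avoiding false positives in a two-species battery can be illustrated with a rough calculation. The Python sketch below is not part of the original analysis and rests on several assumptions: species are treated as responding independently, which real data may well violate; the rat detection rate (0.70) and the rabbit rate (0.45, the midpoint of the quoted 40-50%) come from one compilation, while the correct-negative rates (rat 0.50, rabbit 0.70) come from another; and the function name and the "any"/"all" decision rules are hypothetical.

```python
from typing import List, Tuple


def battery_rates(sens: List[float], spec: List[float], rule: str) -> Tuple[float, float]:
    """Detection rate and correct-negative rate of a multi-species battery.

    sens: per-species probability of a positive result for a human teratogen.
    spec: per-species probability of a nonpositive result for a human nonteratogen.
    rule: "any" scores the agent positive if any species is positive;
          "all" requires every species to be positive.
    Assumes independent responses across species (an illustrative simplification).
    """
    miss_all = hit_all = clear_all = false_all = 1.0
    for se, sp in zip(sens, spec):
        miss_all *= 1 - se    # every species negative, agent is a human teratogen
        hit_all *= se         # every species positive, agent is a human teratogen
        clear_all *= sp       # every species negative, agent is a nonteratogen
        false_all *= 1 - sp   # every species positive, agent is a nonteratogen
    if rule == "any":
        return 1 - miss_all, clear_all
    if rule == "all":
        return hit_all, 1 - false_all
    raise ValueError("rule must be 'any' or 'all'")


# Rat and rabbit, using the approximate figures described in the lead-in.
sens = [0.70, 0.45]
spec = [0.50, 0.70]
for rule in ("any", "all"):
    detect, correct_neg = battery_rates(sens, spec, rule)
    print(f"{rule}: detection {detect:.3f}, correct negative {correct_neg:.3f}")
```

Under these assumptions, treating a positive in either species as a signal raises detection to roughly 0.84 but lowers the correct-negative rate to about 0.35, whereas requiring both species to be positive does the reverse (about 0.32 and 0.85). This is one way to read the point that agreement among several species strengthens the inference of human hazard, while a single positive species is a more sensitive but less specific warning.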