Use of QSARs in international decision-making frameworks to predict health effects of chemical substances.

This article is a review of the use of quantitative (and qualitative) structure-activity relationships (QSARs and SARs) by regulatory agencies and authorities to predict acute toxicity, mutagenicity, carcinogenicity, and other health effects. A number of SAR and QSAR applications, by regulatory agencies and authorities, are reviewed. These include the use of simple QSAR analyses, as well as the use of multivariate QSARs, and a number of different expert system approaches.


Scope, Aims, and Goals
Human exposure to exogenous chemicals, whether through drinking water, foodstuffs, personal products, medicines, or occupational or environmental exposure, is controlled and regulated at local, national, and international levels. Control and regulation of chemical substances are effected through a number of regulatory agencies and authorities and are mandated by legislation. To regulate the use of chemicals successfully, authorities require suitable information concerning likely human health effects. Traditionally, such information has arisen from the use of in vivo animal testing. Increasingly, however, there has been an awareness that test data may be inadequate, inappropriate, or incomplete for many chemical substances. An attractive alternative to the use of animal testing has been the development of methodology that enables predictions of effects to be made directly from chemical structure. Predictions of effects from chemical structure encompass a broad range of techniques and methodologies, generally referred to as quantitative structure-activity relationships (QSARs).
The assumption that biological activity is implicit from chemical structure has been around for well over 100 years. QSARs offer a process to formalize this knowledge and an attempt to form some direct relationships between chemical structures and biological effects. QSARs enabling prediction of human health effects have taken many forms. The approaches used have included numerical models, true QSARs, and more formalized expert system approaches.
Our goal here is to review the international regulatory use of QSARs to predict the health effects of chemical substances [the international regulatory use of QSARs to predict ecologic effects and environmental fate forms the basis of a second review (Cronin et al. 2003)].
The use of QSARs by a number of regulatory agencies to prioritize chemicals for testing and to fill data gaps in risk assessment data sets is described. Although QSARs are applied by agencies worldwide, this review focuses on their use in North America and Europe. It should be emphasized that our purpose is not to review QSARs per se, but their regulatory application; further details on this complex and evolving topic may be obtained from a recent review by Walker et al. (2002) and a web-based database developed by the Organisation for Economic Co-operation and Development (OECD 2002b). QSARs themselves have been the subject of a number of excellent recent reviews (Cronin 2000; Dearden et al. 1997; Hulzebos et al. 1999; Walker. In press).

Regulations and the Use of SARs/QSARs
The European Union. In the European Union (EU), risk assessment of chemical substances is driven by the requirements of Commission Directive 93/67/EEC on Risk Assessment for New Notified Substances and Commission Regulation [European Commission (EC)] No. 1488/94 on Risk Assessment for Existing Substances (EEC 1993a, 1994). To ensure consistency of application of the Environmental Risk Assessment (ERA) process, in 1996 the EU produced a comprehensive technical guidance document (TGD) to support the Directive on New Substances and the Regulation on Existing Substances (EC 1996). This document includes a substantial chapter providing guidance on the use of QSARs in the ERA process in terms of where they should be used, how they should be used, and which ones should be used. Although considerable information is provided in the TGD regarding the prediction of ecologic effects and environmental fate, no formal recommendations are given on the use of QSARs for the prediction of human health effects. The TGD is currently being extensively revised, but the chapter on the use of QSARs is not included in this revision. The TGD can be downloaded from the European Chemicals Bureau (2002).
According to the current EU system of chemicals legislation, new and existing substances are not subject to the same testing requirements, which means that there is a lack of knowledge about the potential danger represented by many existing substances. Existing substances make up about 99% of the total volume of chemicals on the EU market. To address this problem, and other shortcomings of the current EU system, the EC has proposed a new policy on chemicals in which new and existing substances will be subject to the same information requirements. In addition, the new proposals place the burden of performing hazard and risk assessments on industry rather than on the regulatory authorities (EC 2002). The proposed system is called REACH (Registration, Evaluation and Authorisation of Chemicals). It is expected that a legislative proposal to implement the new policy will be drafted in 2003. When the REACH system is introduced, it is possible that additional human health and ecotoxicologic information could be required for up to 30,100 existing chemicals that are currently marketed in volumes greater than 1 metric ton per annum (t.p.a.). Therefore, QSAR and other computer-based methods for predicting toxicity are expected to play an increasingly important role not only for the priority setting of chemicals that need further assessment but also for hazard assessment purposes.

Mark T.D. Cronin, Joanna S. Jaworska, John D. Walker, Michael H.I. Comber, Christopher D. Watts, and Andrew P. Worth. Environmental Health Perspectives, Volume 111, Number 10, August 2003.
Danish Environmental Protection Agency. The Danish Environmental Protection Agency (Danish EPA 2001) has prepared an advisory list for self-classification of dangerous substances using QSAR models. Of approximately 47,000 substances examined, 20,624 substances were identified as requiring classification for one or more of the following dangerous properties: acute oral toxicity, sensitization by skin contact, mutagenicity, carcinogenicity, and danger to the aquatic environment. The Danish EPA stated that "the [QSAR] models used here are now so reliable that they are able to predict whether a given substance has one or more of the properties selected with an accuracy of approximately 70-85%." The Danish EPA has made extensive use of QSARs and has developed a QSAR database that contains predicted data on more than 166,000 substances (OSPAR Commission 2000). A recent publication from the Danish EPA (Tyle et al. 2001) reports the use of QSARs for the high- and medium-production-volume chemicals used in the EU. The Danish EPA used a suite of commercially available and proprietary QSARs for environmental and human health endpoints. The predictions were made off-line and were stored in a CHEM-X database. The database was searchable by Chemical Abstract Service (CAS) number or chemical name. Only discrete organic chemicals can be stored in the database. Expert systems such as MultiCASE (MultiCASE Inc., Beachwood, OH, USA) and Toxicity Prediction by Computer-Assisted Technology (TOPKAT; Accelrys Inc., Cambridge, UK) were used for the predictions (details noted by endpoint below).

German Federal Institute for Health Protection of Consumers and Veterinary Medicine. In Germany, new chemicals are notified to the Federal Institute for Health Protection of Consumers and Veterinary Medicine (BgVV). To provide a tool for the evaluation of physicochemical properties and probable toxic effects of notified substances, the BgVV has developed a computerized database from data sets containing physicochemical and toxicologic properties. The database has been used to develop specific structure-activity relationship (SAR) models for predicting skin and eye irritation/corrosion, which have been incorporated into a decision support system (DSS) (Gerner et al. 2000a, 2000b; Zinke et al. 2000). Recently these and other data have been used to verify skin irritation and corrosion predictions (Hulzebos et al. 2003).

EU TGD for existing substance regulation and notification of new substances.
Existing substance regulation. In 1993, the EU adopted Council Regulation (EEC) 793/93, the Existing Substance Regulation (EEC 1993b), thereby introducing a comprehensive framework for the evaluation and control of "existing" chemical substances. The regulation was intended to complement the already existing rules governed by Council Directive 67/548/EEC (EEC 1967) for "new" chemical substances. An "existing" chemical substance in the EU is defined as any chemical substance listed in the European Inventory of Existing Commercial Substances (http://www.ecb.jrc.it/existing-chemicals), which contains about 100,195 substances manufactured/imported between 1 January 1971 and 18 September 1981. Regulation 793/93 foresees that the evaluation and control of the risks posed by existing chemicals will be carried out in four steps: data collection, priority setting, risk assessment, and risk reduction.
Step 1: Data collection. The regulation was initially concerned, in phases I and II of the data collection step, with the so-called high-production-volume (HPV) chemicals: substances that have been imported or produced in quantities exceeding 1,000 metric tons per year and produced or imported between 23 March 1990 and 23 March 1994. In phase III of the data collection step, companies that produce or import existing substances in quantities between 10 and 1,000 metric tons per year (low-production-volume substances) were required to submit a reduced data set by 4 June 1998. All the data had to be submitted in a specific electronic format, the Harmonised Electronic DataSET, and are incorporated in the International Uniform ChemicaL Database (IUCLID) (Heidorn et al. 2003).
Step 2: Priority setting. In consultation with the member states, the commission regularly draws up lists of priority substances that require immediate attention because of their potential effects on humans or the environment. The commission and member states use the information collected during step 1 as a basis for selecting priority substances. Since 1994, four such priority lists have been published.
Step 3: Risk assessment. Substances on priority lists must undergo an in-depth risk assessment covering the risks posed by the priority chemical to people (covering workers, consumers, and people exposed via the environment) and to the environment (covering the terrestrial, aquatic, and atmospheric ecosystems and accumulation through the food chain). This risk assessment follows the framework set out in Commission Regulation (EC) 1488/94 (EEC 1994) and implemented in the detailed TGD on Risk Assessment for New and Existing Substances. The EU member states act as rapporteurs in the drafting of the risk assessment reports, and the EC mediates meetings that attempt to reach consensus on the conclusions of the risk assessments.
Step 4: Risk reduction. One possible outcome of the risk assessment performed in step 3 is that the chemical is considered to be a "substance of concern" and that "further risk reduction measures, beyond those already in place, are required." In such cases a risk reduction strategy is developed and implemented by means of appropriate legal instruments such as Directive 76/769/EEC (EEC 1976a) on the restrictions in marketing and use of dangerous substances.
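As a minimal sketch, the tonnage bands used in the data collection step can be written as a small lookup. The function name is hypothetical, and the handling of the exact boundaries (1,000 and 10 metric tons per year) is an assumption made for illustration; the text states only "exceeding 1,000" and "between 10 and 1,000".

```python
def data_collection_band(tonnes_per_year: float) -> str:
    """Map an annual production/import tonnage to a Regulation 793/93 band.

    Boundary handling is an illustrative assumption, not taken from the regulation.
    """
    if tonnes_per_year < 0:
        raise ValueError("tonnage cannot be negative")
    if tonnes_per_year > 1000:
        # High-production-volume substances, phases I and II
        return "HPV (phases I and II)"
    if tonnes_per_year >= 10:
        # Low-production-volume substances, phase III, reduced data set
        return "LPV (phase III, reduced data set)"
    return "below the 10 t/year band described here"


for tonnage in (5000, 250, 3):
    print(tonnage, "->", data_collection_band(tonnage))
```

The sketch covers only the volume criterion; the date windows that also apply to each phase are not modeled.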
Notification of new substances. New chemicals that have been notified after 18 September 1981 form a cumulative index called the European List of New Chemical Substances (ELINCS; http://www.ecb.jrc.it/existing-chemicals), which is periodically updated in the Official Journal of the European Communities.
A harmonized European system for the notification of new substances was part of the 6th amendment to Directive 67/548/EEC (Directive 79/831/EEC) (EEC 1976b), which was concerned with the classification, packaging, and labeling of dangerous substances. The 6th amendment was adopted in September 1979 and came into force in all member states on 18 September 1981 (EEC 1976b). A 7th amendment to Directive 67/548/EEC (Directive 92/32/EEC) (EEC 1992) was adopted in April 1992 and took effect from November 1993 and introduced a risk assessment for new notified substances. Approximately 5,000 notifications in total, representing about 3,000 substances, have been submitted since 1981.
In the notification process, a technical dossier on a new substance provides details of the notifier/manufacturer and the identity of the chemical [International Union of Pure and Applied Chemistry (IUPAC) name, CAS number, etc.] and should provide information on the substance such as its production process and proposed uses, as well as physicochemical, toxicologic, and ecotoxicologic data. Proposals for classification and labeling are also submitted, including recommended precautions relating to safety. The amount of data required increases according to the importation/production volume of the chemical.
Toxic Substances Control Act Interagency Testing Committee. The Interagency Testing Committee (ITC) is not a regulatory organization per se; however, there are 16 U.S. government organizations represented on the ITC, many of which have regulatory responsibilities. The ITC was created under section 4(e) of the Toxic Substances Control Act (TSCA; http://www.epa.gov/opptintr/iur/) as an independent advisory committee to the U.S. Environmental Protection Agency (U.S. EPA) Administrator (U.S. EPA 2002c). The ITC was created to identify chemicals in need of testing and to add them to the priority testing list in May and November reports to the U.S. EPA Administrator (Walker 1993a). The ITC has a statutory mandate under TSCA section 4(e) to consider SARs when recommending chemicals for testing. Several U.S. government organizations represented on the ITC have applied SARs, and those that have applied QSARs include the U.S. EPA, the U.S. Agency for Toxic Substances and Disease Registry (ATSDR), and the U.S. Food and Drug Administration (U.S. FDA). The QSAR applications of U.S. government organizations have been previously described (Walker et al. 2002). The health-effects-related QSAR applications of the U.S. EPA, ATSDR, and U.S. FDA are briefly summarized below.
U.S. EPA. Section 5 of TSCA provides for the regulation of new industrial chemicals by the U.S. EPA. The U.S. EPA has received about 38,000 premanufacture notifications (PMNs) for new chemicals and currently receives about 2,000 PMNs per annum. Because the TSCA does not require testing before submission of a PMN, few data are submitted and SARs and QSARs are used to predict health effects.
ATSDR. In 1998, the ATSDR established a computational toxicology laboratory and initiated efforts to use physiologically based pharmacokinetic models, benchmark dose models, and QSARs (El-Masri et al. 2002). The ATSDR uses two commercial computational toxicology models to make toxicity predictions based on QSARs. The ATSDR used one of these models to predict the toxicity of 15 chemicals from a hazardous waste site. The model predicted that 9 of the 15 chemicals have carcinogenicity potential, 6 have developmental toxicity potential, and 6 have mutagenicity potential.
U.S. FDA. The U.S. FDA Center for Drug Evaluation and Research (CDER) recently considered applications of QSARs to support regulatory decisions when toxicology data are unavailable or limited (Matthews and Contrera 1998; Matthews et al. 2000). CDER evaluated the ability of several QSAR-based commercial computational toxicology models to make carcinogenicity predictions for about 400 pharmaceuticals that had been tested in 2-year carcinogenicity studies (Matthews and Contrera In press). As a result of these evaluations, CDER is designing its computational toxicology models to provide reliable toxicologic estimates for FDA endpoints, coverage of U.S. FDA-regulated drugs, and opportunities to predict effects of drugs in humans. To initiate the regulatory applications of QSARs for drugs, CDER is developing an electronic toxicology database. The first database to be developed was the CDER rodent carcinogenicity database (Contrera et al. 1995a, 1995b). Acute, chronic, reproductive, and developmental toxicity and genotoxicity databases are being developed. Additional details are available (U.S. FDA 2002).
Canadian regulatory agencies. Health Canada is currently considering using QSARs and expert systems to provide health effects predictions for the Canadian Domestic Substances List (http://www.ccohs.ca/products/databases/dsl.html), as Environment Canada has done for ecologic effects and environmental fate predictions (MacDonald et al. 2002).
Other organizations involved in the use of SARs/QSARs. Despite not being formal regulatory agencies, two bodies, the European Centre for Validation of Alternative Methods (ECVAM) in the EU and the Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) in the United States, have responsibility for the validation of alternative methods to the use of animals in the safety evaluation of chemical substances. Alternative methods include in vitro tests as well as QSARs and other computer modeling techniques. ECVAM has evaluated the development and validation of expert systems, including those using QSARs for predicting toxicity (Dearden et al. 1997). Other organizations, such as the Fund for the Replacement of Animals in Medical Experimentation (FRAME; Nottingham, England), have also been involved in the assessment of alternative methods.
Another important organization that is involved in the assessment of alternative methods is the OECD. The OECD was responsible for collating the results from a tripartite (United States, EU, Japan) assessment of SARs to predict toxicity (Karcher et al. 1995;OECD 1994). This study considered the predictions made by the EC and U.S. EPA from respective minimum premarket data (MPD). Of the health effects considered, comparisons were made for the predictions of metabolism, skin and eye irritation, skin sensitization, systemic toxicity, mutagenicity, carcinogenicity, and several other endpoints. The results of the study were useful in judging many of the strengths and weaknesses of the U.S. approach, as well as in determining the utility of MPD-type data in improving U.S. assessment capabilities. The SAR/MPD exercise confirmed that although the SAR approach to screening the toxicity of new chemicals is extremely useful in identifying the ones that may be toxic, it is of limited value in predicting the exact level and type of toxicity. It was also noted that the set of chemicals reviewed was not wholly representative of chemicals reviewed for regulatory purposes. With that in mind the exercise may have been a worst-case analysis of the ability of the SARs to predict which chemicals may present an "unreasonable risk to human health (or the environment)" [for more details on this comparative study, refer to U.S. EPA (2002d)].

Expert Systems to Predict Toxicity
There are a number of software packages for the prediction of human health effects and related toxicities. These are described by the general term "expert systems" (Dearden et al. 1997). Such systems allow toxicity to be predicted directly from chemical structure and have been used by regulatory agencies and industry alike because of their ease of use and rapid application. Many may also be run in batch mode to allow screening of large numbers of compounds. Although expert systems for toxicity prediction provide a convenient means of predicting human health effects, little is currently known regarding their suitability, relevance, or accuracy for toxicity prediction for many types of chemicals. Commonly used commercial expert systems that are capable of predicting a number of human health endpoints are introduced in this section. Specific modules, models, or rule bases are described in relation to the relevant endpoint. Further, their applications are described in greater detail in the following sections addressing individual endpoints. These commercial packages have been evaluated and used, both formally and informally, by a number of agencies, including the Danish EPA, U.S. EPA, U.S. FDA, and the U.K. Health and Safety Executive (U.K. HSE). Details of the organizations using these programs are noted below and are also available from Walker et al. (2002) and OECD (2002b).

TOPKAT
TOPKAT is a statistically based system that consists of a suite of QSAR models. It is marketed by Accelrys Inc. [for more details, see Accelrys Inc. (2002)]. Models are normally derived after the analysis of large data sets of toxicologic information, usually retrieved from the literature. Molecules are characterized by any of a large number of structural, topologic, and electrotopologic indices. Models are developed using regression analysis for continuous endpoints, and discriminant analysis for categorizing toxicity data.

Computer-Automated Structure Evaluation (CASE) Methodology
CASE methodology and all its variants were developed by Klopman and colleagues (Klopman 1992; Klopman and Rosenkranz 1991). There are a multitude of models for a variety of endpoints and hardware platforms [for more details, see MultiCASE Inc. (2002)]. The CASE approach uses a probability assessment to determine whether structural fragments are associated with toxicity. To achieve this, molecules are split into structural fragments up to a certain path length. Probability assessments determine whether the fragments significantly promote or inhibit toxicity. To create models, structural fragments are incorporated into a regression analysis. There are many forms of the CASE models; the software is variously called CASE, MultiCASE (MCASE), CASETOX, and TOXALERT depending on the endpoint being modeled and the hardware platform.
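The idea of splitting a molecule into fragments up to a given path length can be illustrated with a small sketch. This is not the CASE algorithm itself: the function name, the graph representation (atom labels plus a bond list), the default length limits, and the choice to ignore bond orders and hydrogens are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def linear_fragments(atoms: Dict[int, str], bonds: List[Tuple[int, int]],
                     min_len: int = 2, max_len: int = 4) -> Set[str]:
    """Enumerate linear atom paths containing min_len..max_len atoms."""
    # Build an undirected adjacency list from the bond list
    neighbours: Dict[int, List[int]] = {i: [] for i in atoms}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    found: Set[str] = set()

    def extend(path: List[int]) -> None:
        if len(path) >= min_len:
            forward = "-".join(atoms[i] for i in path)
            backward = "-".join(atoms[i] for i in reversed(path))
            # Store one direction only so that C-O and O-C count once
            found.add(min(forward, backward))
        if len(path) == max_len:
            return
        for nxt in neighbours[path[-1]]:
            if nxt not in path:  # simple paths only, no atom revisited
                extend(path + [nxt])

    for start in atoms:
        extend([start])
    return found


# Heavy-atom skeleton of ethanol: C0-C1-O2
ethanol = linear_fragments({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)], max_len=3)
print(sorted(ethanol))  # ['C-C', 'C-C-O', 'C-O']
```

In a fragment-based approach of this kind, the occurrence of each fragment across active and inactive training compounds would then feed the probability assessment described above.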

Deductive Estimation of Risk from Existing Knowledge (DEREK) for Windows
DEREK for Windows is a knowledge-based expert system for the prediction of toxicologic hazard (LHASA Ltd., Leeds, England). It uses a knowledge base that contains alerts describing structure-toxicity relationships, with an emphasis on the understanding of mechanisms of toxicity and metabolism (http://www.chem.leeds.ac.uk/luk/index.html). At the time of writing, there were a total of 296 alerts covering a wide range of toxicologic endpoints. An alert consists of a toxicophore (a substructure known or thought to be responsible for the toxicity of a number of chemicals) alongside associated literature references, comments, and examples. DEREK for Windows also contains an argumentation model. This allows the program to associate levels of likelihood with predictions and gives it the ability to reason about the effects of the physicochemical and known toxicologic properties of a chemical. It is also able to extrapolate a prediction for one toxicologic endpoint to a second related endpoint, to take advantage of general toxicologic principles to fill gaps in available data. Therefore, because DEREK for Windows predictions no longer rely solely on the presence of alerts, confidence in the predicted absence of toxicologic activity may also be expressed in some cases (Marchant CA. Personal communication).
Two regulatory agencies have purchased a license for the DEREK for Windows system. These are the U.K. HSE and the Agence Française de Sécurité Sanitaire des Aliments in France. Currently, DEREK for Windows is used by the U.K. HSE only for internal and informal use and is not used to support any regulatory decisions.

HazardExpert
HazardExpert is a rule-based system using known toxic fragments collected from in vivo experimentation (Compudrug, Budapest, Hungary). The knowledge base was developed based on the list of toxic fragments reported by more than 20 experts. In addition to toxicity, HazardExpert also estimates toxicokinetic effects such as bioaccumulation and bioavailability on the basis of predicted physicochemical values. A further application is its integration with the MetabolExpert expert system (Compudrug, Budapest, Hungary) to enable it to predict the toxicity of both the parent compound and the metabolites.

Optimized Approach Based on Structural Indices Set (OASIS)
OASIS Forecast software was developed by Mekenyan et al. (1990, 1994). OASIS Forecast is a shell system for screening chemical inventories for physicochemical and toxic endpoints, accounting for the conformational flexibility of chemicals. The software was designed for personal computers with Microsoft Windows and is an interfacing program that screens chemicals by making use of QSAR models. Models for predicting biological activities related to health effects are available for estrogen and androgen binding affinity and mutagenicity. A metabolism model is also being developed. Additional information on OASIS is provided in the companion article in this mini-monograph (Cronin et al. 2003) and via the Internet (Laboratory of Mathematical Chemistry 2002).

Substructure-Based Computerized Chemical Selection Expert System (SuCCSES)
SuCCSES was developed to facilitate the ITC review of large groups of chemicals with similar substructures (and modes of action, if available); it is not used to develop QSARs. SuCCSES and its substructures have been described previously (Walker and Brink 1989; Walker 1991, 1995).
SuCCSES was developed based on historical information and expert opinions. Historical information was obtained from the ITC's scoring exercises 1, 2, 3, 4, and 5 that were convened from 1978 to 1983 (Walker 1993a, 1993b, 1995). For health effects, numerous international experts were sent a questionnaire listing more than 100 different chemical substructures and were asked to predict (based on their field of expertise related to human health effects and knowledge of modes or mechanisms of action) the potential for chemicals containing any of the substructures to cause acute, chronic, mutagenic, carcinogenic, developmental, reproductive, or neurotoxic effects or membrane irritation. Opinions from these health effects experts were converted to codes that identified chemical substructures and indicated the potential of chemicals containing one or more substructures to cause specific health effects. Additional information on SuCCSES is provided in the companion article in this mini-monograph (Cronin et al. 2003). Details have been published previously (Walker and Brink 1989; Walker 1991, 1995). The substructures in SuCCSES that were associated with membrane irritation were included in a recent publication (Hulzebos et al. 2003). A forthcoming book chapter (Walker and Gray. In press) summarizes the development of SuCCSES and its applicability to the ITC's statutory mandate to use SARs before recommending chemicals for testing in May and November reports to the U.S. EPA Administrator. SuCCSES is not available to the public because it contains confidential business information.

Prediction of Acute and Chronic Toxicity
Some of the expert systems developed to predict acute and chronic toxicity are described below.

TOPKAT Model for Rat Oral LD50
The Rat Oral LD50 module of the TOPKAT package comprises 19 QSAR models and the data from which these models are derived: experimental acute median lethal dose (LD50) values of approximately 4,000 chemicals from the open literature. Each quantitative structure-toxicity relationship (QSTR) model assesses oral LD50 for the rat for a specific class of chemicals.

TOPKAT Model for Rat Chronic Lowest Observed Adverse Effect Level
The Rat Chronic Lowest Observed Adverse Effect Level (LOAEL) module of the TOPKAT package comprises five QSAR models and the data from which the models are derived. These models were developed from 393 uniform experimental LOAEL values selected after critical review of the open literature, U.S. National Toxicology Program (U.S. NTP) technical reports, and the U.S. EPA databases.

TOPKAT Model for Rat Inhalation Toxicity LC50
The Rat Inhalation Toxicity LC50 module of the TOPKAT package comprises five QSAR models and the data from which these models were derived. These multiple regression models were derived from experimental median lethal concentration (LC50) values on more than 643 chemicals after review of the open literature. Reviewed literature data ranged over various time limits; only exposure times in the range of 0.5-14 hr were accepted. Endpoints were modeled as $\log_{10}(1/C) - \log_{10}(t)$, where $C$ is the concentration in moles/m$^3$ and $t$ is the exposure time in hours. The chemicals are grouped into five class-specific models: single benzenes; heteroaromatics and multiple benzenes; alicyclics; acyclics with halogens; and acyclics without halogens. Each QSTR model assesses acute LC50 to rat of a specific class of chemicals in units of moles per cubic meter per hour.
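The endpoint transformation described above can be sketched as a short function. The function and parameter names are hypothetical, and the input values in the usage line are invented purely to show the arithmetic; only the formula and the accepted 0.5-14 hr exposure window come from the text.

```python
import math


def lc50_endpoint(conc_mol_per_m3: float, exposure_hours: float) -> float:
    """Return log10(1/C) - log10(hours of exposure) for one LC50 record."""
    if conc_mol_per_m3 <= 0:
        raise ValueError("concentration must be positive")
    # The text states that only exposure times of 0.5-14 hr were accepted
    if not 0.5 <= exposure_hours <= 14:
        raise ValueError("exposure time outside the accepted 0.5-14 hr range")
    return math.log10(1.0 / conc_mol_per_m3) - math.log10(exposure_hours)


# Illustrative values only: 0.01 mol/m^3 over a 4-hr exposure
value = lc50_endpoint(0.01, 4.0)
print(round(value, 3))  # 1.398
```

Subtracting the logarithm of the exposure time places records measured over different durations on a common scale, so that a lower concentration or a shorter exposure both give a larger endpoint value.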

TOPKAT Model for Rat Maximum Tolerated Dose
The Rat Maximum Tolerated Dose module of the TOPKAT package comprises three QSAR models and the data from which these models are derived. A total of 256 uniform experimental data from the U.S. NTP carcinogenesis reports are grouped into three class-specific models: single benzenes, heteroaromatics and multiple benzenes, and aliphatics. Two dosing regimens are commonly used, gavage and addition of compound to water, and both were considered in the modeling process. To reflect this difference, two models are available to the user and selectable from the menu, depending upon the method of dosing. Endpoints have been modeled as $\log_{10}(1/C)$, where $C$ is the molar concentration of dosed compound.

Regulatory Use
Danish EPA. The Danish EPA has reported the use of the TOPKAT mouse LD50 model to predict toxicity for compounds for which experimental data were not available (Danish EPA 2001).
The BgVV. The BgVV has developed a database from regulatory test results that has been used to develop specific SAR models to predict local skin and eye irritation and corrosion (EU classifications R34, R35, R36, R38, and R41) (Commission of the European Communities 2001). These models have been incorporated into a DSS (Gerner et al. 2000a, 2000b; Zinke et al. 2000). The DSS is mainly a rule-based approach, with rules based not only on substructural molecular features but also on physicochemical properties such as molecular weight, aqueous solubility, and the logarithm of the octanol-water partition coefficient (log Kow). The rules have been developed and validated on a total of 1,562 compounds (of which 385 are classified as hazardous) for oral toxicity, 1,043 compounds (44 hazardous) for dermal toxicity, and 154 compounds (35 hazardous) for inhalation toxicity. The DSS is designed to predict EU risk phrases such as R34, R35, etc.

Prediction of Mutagenicity
Mutagenicity is an important human health endpoint. It represents a genotoxic event. A considerable number of chemicals have been tested for mutagenicity, and these have formed the basis of a number of QSAR analyses and expert systems. Mutagenicity data may be used in two ways for modeling. First, and most commonly in expert systems, they may be used in a qualitative manner to predict the possibility of a mutagenic event. Second, and more commonly in individual QSAR analyses, relative potency may be quantified and predicted.

QSARs
As with other endpoints, QSARs have been developed for classes of chemicals. These tend to provide good relationships of potency because for some classes, such as the aromatic amines (Hatch et al. 2001), all compounds can be considered to be acting by the same, or a very similar, mechanism of action. Other more general models have been developed, for instance, for aromatic compounds with a nitro functional group. For these models, statistical fit tends to be poorer, and the mechanisms more diverse. As with all models on chemical classes, these QSARs cannot be applied outside the chemical class on which they have been trained and so are of only limited value for regulatory application unless they can be formalized into some hierarchical framework to allow chemicals to be assigned to classes. QSARs for predicting mutagenicity have been reviewed recently [Patlewicz et al. In press (a)].

Expert Systems
HazardExpert. HazardExpert contains a number of rules for the prediction of mutagenicity.
DEREK for Windows. DEREK for Windows contains 76 structural alerts for mutagenicity.
TOPKAT. The Ames mutagenicity module of the TOPKAT package is composed of 10 QSAR models and the data from which these models are derived. Each model applies to a specific class of chemicals. These QSARs are linear discriminant analysis models based on positive and negative categories. The model is derived from 1,866 uniform studies selected after critical review of open-literature histidine reversion assays using Salmonella typhimurium strains. The QSARs compute the probability of a submitted chemical structure being a mutagen in the histidine reversion assay; a probability below 0.3 indicates a nonmutagen, and a probability above 0.7 signifies a mutagen. The probability range between 0.3 and 0.7 refers to the "indeterminate" zone.
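The probability cut-offs described above can be written as a small decision function. This is a minimal sketch of the stated thresholds only, not of the TOPKAT discriminant models themselves; the function name, the labels, and the treatment of values exactly equal to 0.3 or 0.7 (assigned here to the indeterminate zone) are assumptions.

```python
def classify_probability(p: float, lower: float = 0.3, upper: float = 0.7) -> str:
    """Map a computed probability onto the three zones described in the text.

    Hypothetical helper: below `lower` is a negative call, above `upper`
    is a positive call, and anything in between is indeterminate.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if p < lower:
        return "negative"
    if p > upper:
        return "positive"
    return "indeterminate"


# Hypothetical probabilities for three submitted structures
calls = [classify_probability(p) for p in (0.12, 0.55, 0.91)]
```

Keeping an explicit indeterminate zone means that a prediction near 0.5 is reported as undecided rather than forced into one of the two classes.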
CASE. The CASE algorithm has been trained on a number of mutagenicity databases. These models provide an estimate of the likelihood of the toxicologic events occurring.
OASIS. The OASIS forecast software includes a suite of modules. One of these modules, Common Reactivity Pattern (COREPA), is a pattern-recognition method for identifying common stereoelectronic (reactivity) patterns of structurally diverse chemicals that exert similar biological effects. The COREPA approach is not dependent upon predetermined toxicophores or alignment of conformers to a lead compound. COREPA was used to identify structural requirements for eliciting mutagenic effects (Mekenyan OG. Personal communication). Elucidation of this pattern required examination of the conformational flexibility of the compounds, revealing areas in the multidimensional descriptor space that were most populated by the conformers of mutagenic chemicals and least populated by nonmutagenic ones (including chemicals that become mutagenic after metabolic activation). The QSAR analysis was based on Salmonella data from the U.S. NTP (http://www.toxnet.nlm.nih.gov). The training set was confined to a single strain, TA100, because of the complexity of the data. The mutagenicity profile was described as a hierarchically ordered set of rules based on ranges of parameter variations. The structural factors controlling the effect were global reactivity of chemicals [Egap = E(HOMO) - E(LUMO)] combined with their ability to take part in SE2 (local electronic charges) and SE1 (reactive fragments) electrophilic reactions. These significant factors were tuned by additional structural requirements associated with molecular polarity and surface. Based on derived reactivity patterns, a descriptor profile (decision tree) was established for identifying mutagenic chemicals. The model correctly identified 137 of 148 (93%) of the direct-acting mutagens in the training set, and 789 of 820 (96%) of the nonmutagens in the training set.
A system that identifies those chemicals that require metabolic activation has also been developed. This model correctly identified 201 of 229 (88%) of the chemicals in a training set (Mekenyan OG. Personal communication).
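The percentages quoted for the OASIS models follow directly from the reported counts. The sketch below recomputes them; the variable names are illustrative, and the overall concordance is a derived figure that is not reported in the text. All values refer to training-set performance only.

```python
def fraction_correct(correct: int, total: int) -> float:
    """Return the fraction of correctly identified chemicals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


# Counts reported for the direct-acting mutagenicity model (training set)
sensitivity = fraction_correct(137, 148)   # mutagens correctly identified
specificity = fraction_correct(789, 820)   # nonmutagens correctly identified

# Derived here, not reported in the text: overall training-set concordance
concordance = fraction_correct(137 + 789, 148 + 820)

# Count reported for the metabolic activation model (training set)
activation_rate = fraction_correct(201, 229)

for name, value in [
    ("sensitivity", sensitivity),
    ("specificity", specificity),
    ("concordance", concordance),
    ("activation model", activation_rate),
]:
    print(f"{name}: {value:.1%}")
```

Because these figures describe the chemicals used to build the models, they indicate goodness of fit rather than performance on untested chemicals.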

Regulatory Use
Danish EPA. The Danish EPA applied a tiered selection of models for the prediction of mutagenicity. The models used include the MCASE Model A2E (Ashby and Tennant 1991; structural alerts for DNA reactivity), model A62 (induction of micronuclei), model A2H [Salmonella (Ames) mutagenicity], model A61 (chromosomal aberrations), and model A2F (mutations in mouse lymphoma), as well as the TOPKAT Salmonella (Ames) mutagenicity model. The predictions from these models were integrated to allow systematic evaluation, along with expert evaluation, for the prediction of the EU mutagenic classification R40 (Commission of the European Communities 2001).

Prediction of Carcinogenicity
Carcinogenicity still remains one of the most difficult toxicologic endpoints to assess and comprehend experimentally. Because of the cost, difficulty, and length of time of the experimental measurement, the prediction of this endpoint is very attractive. A large number of systems and models dedicated to the prediction of carcinogenicity have been developed (Richard 1998). The prediction of carcinogenicity has also benefited from two blind trials organized by the U.S. NTP. These have demonstrated that carcinogenicity is generally only poorly predicted, and the best models tend to be those that can integrate mechanism-based reasoning with biological data (Richard and Benigni 2002).

Mini-Monograph | Regulatory use of QSARs for human health effects
Environmental Health Perspectives • VOLUME 111 | NUMBER 10 | August 2003

QSARs
QSARs for carcinogenicity were reviewed by Cronin and Dearden (1995a) and Patlewicz et al. [In press (a)]. A relatively small number of QSARs exist for distinct chemical classes. In such examples the assumption is made that structurally similar chemicals may act by similar mechanisms of action. Good examples are provided by Franke et al. (2001), who demonstrated both the modeling of activity (carcinogenic vs. noncarcinogenic) and the potency of aromatic amines (from the carcinogenicity potency database). As with all class-based QSARs, their use is restricted by the domain of the QSAR.

Expert Systems
A large number of expert systems exist for prediction of carcinogenicity. Some systems and approaches are solely dedicated to the prediction of carcinogenicity; others are part of systems that cover a greater number of toxicologic endpoints. Some good reviews exist on the possibilities for predicting carcinogenicity (Richard 1998; Richard and Benigni 2002; Hulzebos et al. 1999).
DEREK for Windows. At the time of our writing this article, DEREK for Windows contained 46 alerts for the prediction of carcinogenicity. Further, the argumentation model in the DEREK for Windows system allows predictions of carcinogenicity in appropriate species to be extrapolated from predictions for endpoints known to be related to carcinogenicity, such as peroxisome proliferation.
HazardExpert. At the time of writing HazardExpert contained a number of rules for the prediction of carcinogenicity.
OncoLogic. OncoLogic (LogiChem Inc., Boyertown, PA, USA) is an expert system that assesses the potential of chemicals to cause cancer. It is marketed by LogiChem Inc. which, established in 1986, is owned and operated by a group of biochemical and computer science professionals. OncoLogic predicts the potential carcinogenicity of chemicals by applying the rules of SAR analysis and incorporating what is known about mechanisms of action and human epidemiologic studies. OncoLogic was developed in cooperation with the U.S. EPA Structure Activity Team involved in the PMN process. OncoLogic has the ability to reveal its line of reasoning, just as human experts can. After supplying the appropriate information about the structure of the compound, an assessment of the potential carcinogenicity and the scientific line of reasoning used to arrive at the assessment outcome are produced. This information provides a detailed justification of a chemical's cancer-causing potential. OncoLogic can evaluate the following classes of compounds: fibers, polymers, metals, metalloids, and metal-containing compounds as well as organic chemicals.
TOPKAT. The TOPKAT software comprises a number of modules for the prediction of carcinogenicity. Each is described in more detail below.
The U.S. FDA Rodent Carcinogenicity module of the TOPKAT package is composed of eight QSTR models and the data from which these models are derived. Each QSTR model relates to a specific sex/species combination (male rat, female rat, male mouse, and female mouse), each of which is further divided into carcinogen versus noncarcinogen and multiple- versus single-site models. These discriminant models, derived from data provided by the U.S. FDA CDER under a material transfer agreement, compute the probability of a submitted chemical structure being a carcinogen. In the first-stage model, carcinogen versus noncarcinogen, a computed probability below 0.3 indicates a noncarcinogen, and a probability above 0.7 signifies a carcinogen. The second-stage model, multiple versus single site, can then be applied to carcinogens. The probability range between 0.3 and 0.7 refers to the "indeterminate" zone.
The NTP Rodent Carcinogenicity Module of the TOPKAT package comprises four QSTR models and the data from which these models are derived. Each QSTR model relates to a specific sex/species combination: male rat, female rat, male mouse, and female mouse. These discriminant models, derived from uniform studies selected after critical review of technical reports on 366 rodent carcinogenicity tests conducted by the National Cancer Institute (NCI) and the U.S. NTP using inbred rats and hybrid mice, compute the probability of a submitted chemical structure being a carcinogen.
The Weight-of-Evidence Rodent Carcinogenicity Module of the TOPKAT package comprises a single QSTR model and the data from which this model is derived. The QSTR model scores the chemical using the U.S. FDA CDER weight-of-evidence protocol, which scores the chemical as a carcinogen if a) it is a multiple-site carcinogen in at least one sex/species combination (male or female/rat or mouse) or b) it is a single-site carcinogen in at least two sex/species combinations. This discriminant model, derived from data provided by the CDER and from uniform studies selected after critical review of technical reports on rodent carcinogenicity studies conducted by the NCI and the U.S. NTP, computes the probability of a submitted chemical structure being a carcinogen.
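The weight-of-evidence scoring rule described above can be expressed compactly. The sketch below encodes only the two stated criteria used to score a chemical as a carcinogen; it is not the QSTR model itself, and the function name, the data layout, and the labels "multiple", "single", and "none" are illustrative assumptions.

```python
from typing import Mapping

# The four sex/species combinations named in the text
COMBINATIONS = ("male rat", "female rat", "male mouse", "female mouse")


def weight_of_evidence_carcinogen(findings: Mapping[str, str]) -> bool:
    """Apply the two weight-of-evidence criteria to per-combination findings.

    `findings` maps a sex/species combination to "multiple" (multiple-site
    carcinogen), "single" (single-site carcinogen), or "none". Missing
    combinations are treated as "none" (an assumption of this sketch).
    """
    multiple = sum(1 for c in COMBINATIONS if findings.get(c) == "multiple")
    single = sum(1 for c in COMBINATIONS if findings.get(c) == "single")
    # a) multiple-site in at least one combination, or
    # b) single-site in at least two combinations
    return multiple >= 1 or single >= 2


# Hypothetical study outcomes
example = {"male rat": "single", "female mouse": "single"}
is_carcinogen = weight_of_evidence_carcinogen(example)
```

A single-site finding in only one sex/species combination does not satisfy either criterion, so such a chemical is not scored as a carcinogen under this rule.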
CASE. There are a number of modules for the prediction of carcinogenicity available in the MCASE software. These include the NTP Rodent assay (model developed on 313 compounds), NTP Mouse (319 compounds
Other less formalized models.
RASH. The rapid screening of hazards (RASH) method predicts carcinogenic potential based on the observed relative potencies of tested chemicals in different short-term bioassays. It is not fully automatic and requires a human expert to select relevant comparisons (Jones and Easterly 1996).
Purdy's model. Purdy (1996) reported a hierarchical model consisting of QSARs based mainly on chemical reactivity that was developed to predict the carcinogenicity of organic chemicals to rodents. The model is composed of QSARs based on hypothesized mechanisms of action, metabolism, and partitioning. A large number of physicochemical predictors were used to individually model different mechanisms of action. The model correctly classified 96% of the carcinogens in the training set of 306 chemicals and 90% of the carcinogens in the test set of 301 chemicals.

Regulatory Use
Danish EPA. Predictions of potential carcinogenicity were made using a number of QSAR approaches. An initial assessment of the compounds was made by the prediction of mutagenicity (as described above). The focus of the prediction acknowledged that although many compounds could promote carcinogenicity via a nongenotoxic mechanism, the screening would identify only those compounds associated with genotoxicity. Subsequent to the prediction of genotoxicity, the TOPKAT NTP and U.S. FDA models for carcinogenicity (all species and sexes) were applied. In addition, two MCASE models based on the carcinogenicity potency database were used.
U.S. FDA. The U.S. FDA has been instrumental in the release of data and information from regulatory submissions. Matthews and Contrera (1998) report the development of MULTICASE for the prediction of carcinogenicity using data released from the U.S. FDA under a cooperative research and development agreement (CRADA). The model developed with the U.S. FDA data had greatly improved predictivity.
Other reports from the U.S. FDA describe the use of TOPKAT to make predictions of the carcinogenicity of pharmaceutical substances. The results of a trial using TOPKAT to predict the carcinogenicity of chemicals tested by the U.S. NTP were disappointing, with a low rate of successful prediction (Prival 2001). It should be emphasized that the results of this trial should not be taken in isolation. The performance of TOPKAT is unlikely to be significantly different from that of other expert systems. This trial simply confirmed the difficulty in predicting this endpoint and that computational prediction of carcinogenicity is complex.
U.S. EPA. The U.S. EPA Office of Pollution Prevention and Toxics (OPPT) regularly uses the SARs contained within the OncoLogic system to assess the carcinogenic potential of substances (Woo et al. 1995).
NCI. The NCI's use of SARs is illustrated by the review of juglone (CAS Registry No. 481-39-0), a potentially toxic natural product, reported in . The NCI Chemical Selection Working Group reviewed three structurally related chemicals and associated genotoxicity data and concluded that juglone should be recommended for carcinogenicity testing to the U.S. NTP.

Reproductive Toxicity/Developmental Toxicity
Along with carcinogenicity, the experimental assessment of reproductive toxicity and developmental toxicity is one of the most costly, time-consuming, and mechanistically complex endpoints to perform. Cronin and Dearden (1995b) reviewed QSARs for the prediction of reproductive toxicity. Because of the paucity of published data, there are relatively few published QSARs. Typically, many of the more successful approaches to predicting developmental toxicity, in particular, have resulted from the analysis of distinct chemical classes.

DEREK for Windows. DEREK for Windows has a small number of alerts for reproductive toxicity, developmental toxicity, and teratogenic effects.
HazardExpert. HazardExpert has a number of rules for teratogenic effects.
TOPKAT. The Developmental Toxicity Potential Module of the TOPKAT package comprises three QSAR models and the data from which these models are derived. Each model applies to a specific class of chemicals. These discriminant models, derived from uniform experimental studies selected after critical review of approximately 3,000 open literature citations, compute the probability of a submitted chemical structure being a developmental toxicant in the rat; a probability below 0.3 indicates no potential for developmental toxicity, and probability above 0.7 signifies developmental toxicity potential. The probability range between 0.3 and 0.7 refers to the "indeterminate" zone.

Prediction of Eye Irritation
Eye irritation is a complex and emotive toxicologic endpoint to assess experimentally. Regulatory classifications of ocular toxicity are made from the assessment of several different endpoints. Because the toxic effect may be elicited by either physical (corrosive) or biological effects, efforts to predict eye irritation have often been inadequate.

QSARs
A large number of approaches to predict eye irritation using QSARs have been applied. These have been reviewed recently by Cronin et al. (In press) and Patlewicz et al. [In press (b)]. Many of the efforts have centered on the modeling of eye irritation as a nonlinear event (e.g., Worth and Cronin 1999), membrane interaction (Kulkarni et al. 2001), or more traditional QSAR analyses (e.g., Abraham et al. 1998a, 1998b). Recently, Worth (2000; In press) extended the OECD tiered assessment regime to incorporate physical (pH) data, a QSAR model, and in vitro data.

Expert Systems
DEREK for Windows. DEREK for Windows contains a total of 33 alerts for irritation; 29 of these include consideration of irritation of the eye.
HazardExpert. HazardExpert contains a number of rules for irritation.
TOPKAT. The Ocular Irritancy module of the TOPKAT package comprises 15 QSARs and the data from which these models are derived. Each model applies to a specific class of chemicals, each of which is further subdivided into three groups on the basis of severity. These models, based on 1,453 uniform studies selected after critical review of open literature, compute the probability of a submitted chemical structure being an ocular irritant in the Draize eye irritation test. In the first stage, nonirritants and mild irritants combined are classified in contrast to moderate and severe irritants combined. At the second stage, nonirritants are separated from mild irritants, and moderate separated from severe irritants.
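The two-stage scheme described above amounts to a small decision hierarchy. The sketch below illustrates only that structure: the function name, the meaning assigned to each probability, and the 0.5 cut-off are assumptions made for illustration and are not taken from TOPKAT.

```python
def ocular_irritancy_class(p_stage1: float, p_stage2: float, cutoff: float = 0.5) -> str:
    """Combine two hypothetical stage outputs into one of four severity classes.

    p_stage1: probability that the chemical belongs to the combined
              moderate/severe group rather than the nonirritant/mild group.
    p_stage2: probability of the more severe member of whichever group
              stage 1 selected (severe vs. moderate, or mild vs. nonirritant).
    cutoff:   assumed decision threshold for both stages.
    """
    for p in (p_stage1, p_stage2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    if p_stage1 >= cutoff:
        # Stage 2 separates moderate from severe irritants
        return "severe" if p_stage2 >= cutoff else "moderate"
    # Stage 2 separates nonirritants from mild irritants
    return "mild" if p_stage2 >= cutoff else "nonirritant"


# Hypothetical stage outputs for one chemical
label = ocular_irritancy_class(0.8, 0.2)
```

In this arrangement the second-stage model is consulted only within the group chosen at the first stage, so an error at the first stage cannot be corrected later.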
CASE. The MultiCASE software comprises a model for eye irritation, developed from the results of 207 Draize tests.

Regulatory Use
The BgVV. The BgVV has developed a database from regulatory test results that has been used to develop specific SAR models for predicting eye irritation/corrosion, which have been incorporated into a DSS (Gerner et al. 2000a, 2000b; Zinke et al. 2000). The DSS is mainly a rule-based approach, the rules being based not only on substructural molecular features but also on physicochemical properties such as molecular weight, aqueous solubility, and log Kow. The rules have been developed and validated on a total of 1,484 compounds (of which 405 are classified as being hazardous). The DSS is designed to predict EU risk phrases.

Skin Irritation/Corrosivity
The assessment of skin irritancy and corrosivity is important for chemicals that may be dermally applied or for occupational exposure by this route.

QSARs
There have been relatively few QSARs of skin irritation or corrosivity, and these have been reviewed recently by Cronin et al. (In press), Hulzebos et al. (2003), and Patlewicz et al. [In press (b)].

Expert Systems
DEREK for Windows. DEREK for Windows contains a total of 33 alerts for irritation, 25 of which include consideration of irritation of the skin.
HazardExpert. HazardExpert contains a number of rules for irritation.
TOPKAT. The Rabbit Skin Irritation Module of the TOPKAT package comprises 13 QSAR models and the data from which these models are derived. Each model applies to a specific class of chemicals, and each model is further subdivided into two or three groups based on severity. Compounds and data were collected from national and international journals as well as U.S. government sources for a total of 1,258 compounds. The chemicals are grouped into five class-specific models: heteroaromatics and multiple benzenes, alicyclics, single benzenes, and two classes of acyclics. Each class-specific model in turn has severity-specific submodels.

Regulatory Use
The BgVV. The BgVV database has been used to develop specific SAR models for predicting skin irritation/corrosion. These models have been incorporated into a DSS (Gerner et al. 2000a, 2000b; Zinke et al. 2000). As with the discussion for eye irritation (above), the DSS is mainly a rule-based approach, the rules being based not only on substructural molecular features but also on physicochemical properties such as molecular weight, aqueous solubility, and log Kow. The rules have been developed and validated on a total of 1,508 compounds (of which 199 are classified as being hazardous). The DSS is designed to predict EU risk phrases.

Prediction of Skin Sensitization
Skin sensitization is another important toxicologic endpoint for substances that may come in contact with the skin. Essentially, skin sensitization is an immunologic response, and as such, there are no validated in vitro alternatives to in vivo testing.

QSARs
Skin sensitization requires two fundamental processes to proceed: the passage of a chemical through the skin, and the interaction of the chemical with a skin protein to trigger the immunologic response. A number of QSAR analyses have been performed. Basketter et al. (1992) demonstrated that the potency of skin sensitization for a series of haloalkanes was related to their ability to cross the skin, and their relative alkylating potential once at the site of action. Other analyses have been more multivariate in nature (Cronin and Basketter 1994; Magee et al. 1994). QSARs for skin sensitization have been reviewed by Rodford et al. (In press).

Expert Systems
DEREK for Windows. DEREK for Windows contains a total of 59 alerts for skin sensitization and five alerts for photoallergenicity. The predictive performance of these alerts has been assessed by Barratt and Langowski (2000). In addition, an argumentation model in the DEREK for Windows system allows predictions in these areas to take account also of the percutaneous absorption of the chemical of interest as calculated from the Potts and Guy (1992) equation. Chemicals for which percutaneous absorption is calculated to be low are associated with a reduced level of likelihood of activity (Marchant CA. Personal communication).
HazardExpert. HazardExpert contains a number of rules for all types of sensitization.
TOPKAT. The Skin Sensitization Module of the TOPKAT package is a suite of two modules, one for nonsensitizers versus sensitizers and the other for weak/moderate versus strong sensitizers. Each module comprises two QSAR models, each applicable to a specific class of chemicals, and the data from which these models were derived; 335 uniform studies selected after critical review of guinea pig maximization test assays in the open literature were used to develop these models.
CASE. A CASE model for skin sensitization has been developed for the human exposure of 1,034 chemicals.

Regulatory Use
Danish EPA. The Danish EPA used two approaches to predict skin sensitization. The first was the use of the TOPKAT skin sensitization module. Compounds predicted to be strong allergens were considered likely to fulfill the criteria for EU classification R43 (Commission of the European Communities 2001). Second, the MCASE allergic contact dermatitis model was applied. Again, compounds that were predicted to be very active were considered to meet the criteria for R43 classification.
The BgVV. The BgVV has initiated a process of validation and development of skin sensitization alerts. These alerts have been incorporated into a DSS (Gerner et al. 2000a, 2000b; Zinke et al. 2000). The performance of the alerts has been assessed using a database of 1,039 chemicals (of which 403 are classified as being skin sensitizers). Some weaknesses in the alerts were identified (Zinke et al. 2002). The DSS is designed to predict EU risk phrases.

Prediction of Percutaneous Absorption
The assessment of the ability of a chemical to cross the skin is important for risk assessment of dermal toxicity but need not necessarily be considered as a toxicity test per se. There are a variety of in vitro and in vivo methodologies to assess percutaneous absorption. Probably the most widespread and potentially useful is the use of excised human skin in vitro.

QSARs
QSARs for skin permeability are well reviewed by Moss et al. (2002) and Walker et al. (2003). The passage of chemicals across the skin may be viewed as a passive diffusion process. As such, most success from modeling skin permeability has come from the use of descriptors for hydrophobicity and molecular size. Also, a number of issues regarding data quality from historical sources have made modeling more complex.
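A widely quoted example of a model built on hydrophobicity and molecular size is the Potts and Guy (1992) relationship, usually written as log Kp = 0.71 log Kow - 0.0061 MW - 6.3, with Kp in cm/s. The coefficients in the sketch below are reproduced from that commonly cited form rather than from this article and should be checked against the original publication before use; the function names and example inputs are illustrative.

```python
def potts_guy_log_kp(log_kow: float, mol_weight: float) -> float:
    """Estimate log10 of the skin permeability coefficient Kp (cm/s).

    Uses the commonly quoted Potts and Guy form:
        log Kp = 0.71 * log Kow - 0.0061 * MW - 6.3
    """
    if mol_weight <= 0:
        raise ValueError("molecular weight must be positive")
    return 0.71 * log_kow - 0.0061 * mol_weight - 6.3


def kp_cm_per_hour(log_kow: float, mol_weight: float) -> float:
    """Convert the estimate to cm/h (3,600 seconds per hour)."""
    return (10.0 ** potts_guy_log_kp(log_kow, mol_weight)) * 3600.0


# Hypothetical compound: log Kow = 2.0, molecular weight = 100
log_kp = potts_guy_log_kp(2.0, 100.0)
```

The signs of the two coefficients reflect the passive diffusion picture: predicted permeability rises with hydrophobicity and falls with molecular size.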

Syracuse Research Corporation's Dermwin Program. This program estimates the dermal permeability coefficient (Kp) and the dermally absorbed dose per event (DAevent) of organic compounds from their chemical structure, using Syracuse Research Corporation's (Syracuse, NY, USA) LogKow (KOWWIN) program to estimate Kow. The estimation methodology was taken from the U.S. EPA (1992). The program uses one general estimation equation and three class-specific estimation equations to predict Kp. DAevent is predicted by two separate methods (an adapted equation of Fick's first law and another method, both of which are indicated in the U.S. EPA report) and requires a) input of the duration of the event and b) concentration of the chemical in water (a default water solubility using the method in the Syracuse Research Corporation's WsKow program is calculated for the user if no value is entered).
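To show how the two required inputs enter such a calculation, the sketch below uses the simplest steady-state form of Fick's first law, in which the absorbed dose per event is the product of the permeability coefficient, the concentration in water, and the event duration. Whether Dermwin uses exactly this form is an assumption; the function name, units, and example values are illustrative.

```python
def dermally_absorbed_dose(kp_cm_per_h: float,
                           conc_mg_per_cm3: float,
                           duration_h: float) -> float:
    """Steady-state estimate of dermally absorbed dose per event (mg/cm^2).

    Assumed form: DAevent = Kp * Cw * t_event, with
        Kp      in cm/h
        Cw      in mg/cm^3 (concentration of the chemical in water)
        t_event in h
    """
    if min(kp_cm_per_h, conc_mg_per_cm3, duration_h) < 0:
        raise ValueError("inputs must be non-negative")
    return kp_cm_per_h * conc_mg_per_cm3 * duration_h


# Hypothetical inputs: Kp = 0.001 cm/h, Cw = 1.0 mg/cm^3, 2-hour event
dose = dermally_absorbed_dose(0.001, 1.0, 2.0)
```

In this simple form the estimate scales linearly with both inputs, which is why the event duration and the water concentration must be supplied (or defaulted) before a dose can be reported.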
Random walk model. The random walk model is a new mathematical model for permeability of chemicals in aqueous vehicle through skin (Frasch 2002). The rationale for this model is to represent diffusion by its fundamental molecular mechanism, that is, random thermal motion. Diffusion is modeled as a two-dimensional random walk through the biphasic (lipid and corneocyte) stratum corneum.

Regulatory Use
U.K. HSE. The U.K. HSE has funded two studies into the use and validation of a knowledge-based system for the prediction of dermal absorption, the system being based on SARs (Dick and Williams 1998; Wilkinson and Williams 2001). However, the HSE does not make routine use of these findings, and the findings do not reflect HSE policy.
ITC. Walker et al. (2003) described the regulatory application of QSARs to predict dermal absorption of compounds. The permeability coefficient was predicted by a series of simple QSARs that were based either on hydrophobicity and molecular size or on hydrophobicity alone.

Use of (Q)SARs to Assess the Human Health Effects of HPV Chemicals
Under the U.S. EPA HPV Chemical Challenge Program (Challenge Program) (Walker et al. In press) the chemical industry is being challenged to voluntarily compile a screening information data set (SIDS) for chemicals on the U.S. HPV list. The SIDS, which has been internationally agreed to by member countries of the OECD, provides basic screening data needed for an initial assessment of the physicochemical properties, environmental fate, and human and environmental effects of chemicals. The information used to complete the SIDS can come either from existing data or from new tests conducted as part of the Challenge Program. The Challenge Program chemical list, available online (U.S. EPA 2002b), consists of about 2,800 HPV chemicals reported under the TSCA 1990 and 1994 Inventory Update Rule. The large number of chemicals on the list makes it important to reduce the number of tests to be conducted, where this is scientifically justifiable. SARs may be used to reduce testing in at least three different ways: a) by identifying a number of structurally similar chemicals as a group, or category, and allowing selected members of the group to be tested with the results applying to all other category members; b) by applying SAR principles to a single chemical that is closely related to one or more better characterized chemicals (analogues), so that the analogue data are used to characterize the specific endpoint value for the HPV candidate chemical; and c) by using a combination of the analogue and category approaches for individual chemicals. For example, one could search for a "nearest chemical class," as opposed to a nearest single chemical analogue, to estimate a SIDS endpoint.

Guidance on the Use of SARs for the Prediction of Human Health Effects of HPV Chemicals
The SIDS manual (OECD 2002a), with guidance on the use of SAR in the OECD SIDS program, consists mainly of citations to OECD and other documents. There is no specific guidance for the use of SAR in assessing mammalian toxicity. The manual also lists some examples of the potential use of SAR: groups of isomers with similar SAR profiles; close homologues; and availability of information on precursors, breakdown products, and metabolites/degradation products of specific chemicals.
SARs for health effects (summarized in Table 1) are different from the other SIDS endpoints. This is because of the variety of scenarios (acute vs. chronic exposure conditions, in vitro vs. in vivo tests) and endpoints (e.g., general toxicity, organ-specific effects, mutagenicity, developmental effects, effects on fertility). Therefore, generic QSAR models are either not readily available or not widely accepted [for a review, see Hulzebos et al. (1999)], and an analogue approach is a reasonable way to proceed.

Scope and Applications in the Use of (Q)SARs in the U.S. HPV Challenge Program
The use of SAR/QSAR in the U.S. HPV Challenge Program is expected to decrease the number of new tests required to develop a SIDS for each HPV chemical. Their use, by either the category or individual chemical approach, will necessarily be limited by the nature of the SIDS endpoint, the amount and adequacy of the existing data, and the type of SAR/QSAR analysis performed. Measured data developed using acceptable methods are preferred over estimated values. The development and use of SAR/QSAR in the Challenge Program will be different for each of the major categories of SIDS (i.e., physicochemical properties, environmental fate, ecotoxicity, and health effects). In the final analysis, because the goal of the program is to adequately characterize the hazard of HPVs, a careful, reasonable, and transparent argument using measured data and estimation techniques will need to be presented.
The estimation of toxicity to mammals is complicated because there are a variety of endpoints (mutagenicity vs. general toxicity vs. reproductive/developmental toxicity) and exposure (in vitro vs. in vivo and acute vs. chronic) conditions. In addition, the available SAR programs are very different from each other and unique to certain endpoints, and most are not validated [for a review, see Hulzebos et al. (1999)]. Therefore, in all cases, SAR estimations for a health endpoint must be accompanied by experimental data with a close analogue.

Predictions for Individual Chemicals
For individual chemicals, SAR is applied in two ways: a) by the use of (usually quantitative) predictive models based on well-validated data sets (QSAR) and b) by comparing the chemical with one or more closely related chemicals, or analogues, and using the analogue data in place of testing the chemical. In the case of models, the comparison has essentially been incorporated into the model.
In developing a SAR, proposers (i.e., developers who propose a SAR) need to consider the following steps for each HPV chemical they are interested in sponsoring:

SIDS endpoint: Acute toxicity; General toxicity (repeated dose); Genetic toxicity (effects on the gene and chromosome); Reproductive/developmental toxicity
SAR model: Nearest analogue analysis using expert judgment

characterize the hazard of an HPV, the above-mentioned models could not replace an actual test. However, there is an opportunity to use SARs for health endpoints in the Challenge Program. Given the complexity of health endpoints and the amount of uncertainty in many models, OPPT has historically used an expert judgment/nearest-analogue approach to SARs for predicting such effects in assessing new chemicals. OPPT suggests that a similar approach be applied in the Challenge Program. The goal is to find toxicity data for an analogue that can be used to address the testing needs of an HPV chemical. This is best done on an endpoint-by-endpoint and case-by-case basis. Valid analogues should have close structural similarity and the same functional groups. In addition, the following parameters should be compared between the chemical and its analogue(s): physicochemical properties (physical state, molecular weight, log Kow, water solubility); absorption potential; mechanism of action of biological activity; and metabolic pathways/kinetics of metabolism. A high correlation between the HPV chemical and the putative analogue for most of these parameters improves the chance that a SAR approach will be reasonable and acceptable. A more convincing argument can be made for the use of surrogate data if there are toxicity studies in common (i.e., ones that are not necessarily SIDS endpoints but have been done with both the analogue and the HPV candidate chemical) that demonstrate the toxicologic similarity of the chemicals.
The following are possible examples of the use of surrogate data to characterize individual chemicals:
• Chemicals that are essentially the same in vivo: For example, different salts of the same anion or cation. The salts must fully dissociate in vivo, and the counter ion must not contribute any more (or less) toxicity.
• A chemical that metabolizes to one or more compounds that have been tested: The metabolism must be rapid and complete.
• Chemicals that have only minor structural differences that are not expected to have an impact on toxicity: All functional groups must be the same.
Table 2 provides a summary of the SAR models discussed above.

Main Findings
A framework of QSARs has been established by regulatory agencies worldwide (Table 3). By far the greatest use and application of QSARs have resulted from the TSCA and the efforts of the U.S. EPA and U.S. FDA. The regulatory use of QSARs in Europe and elsewhere in the world is less widespread and formalized and is generally on a local (national) level by individual agencies.

Future Outlook
Because of the perceived need to assess the human health effects of a large number of existing substances, it is likely that QSARs and other computational approaches for predicting human health effects will be increasingly applied for the purposes of priority setting, hazard assessment, and risk assessment. For QSARs that are intended for hazard and risk assessment purposes, it will be particularly important to establish the limitations and predictive capacities of the models. This can be achieved only by proper validation under the auspices of organizations or platforms that are independent of both the QSAR developers and the end users (industry and/or regulatory authorities). In addition to the use of models for regulatory assessment, the increased release of confidential data for modeling is a necessity, and it is becoming more likely through initiatives such as the U.S. FDA CRADA.
In the EU, the REACH system is likely to have important implications for the development, validation, and application of QSARs and other computer-based approaches for predicting chemical toxicity. In particular, the EC white paper (EC 2001) has envisaged that assessments of one or more physicochemical, toxicologic, and ecotoxicologic properties of up to 30,000 existing chemicals, which are currently marketed in volumes greater than 1 metric ton per year, will be required by the end of 2012. Furthermore, in its conclusions on the white paper (Council of Ministers 2001), the Environment Council of the European Commission has called upon the commission ". . . to explore ways in which chemicals of concern can be identified to allow prioritisation for taking action, developing clear and transparent screening criteria, essential information requirements, and exploring the use of chemical grouping and modelling techniques. . . ." (Council Conclusion 37).
Given the limitations in the testing capacity of EU industry, it seems likely that the envisaged deadline for obtaining the required information will be met only if QSAR approaches are used wherever it is scientifically feasible to do so. For example, QSAR models could be used to prioritize chemicals for further testing, to identify certain types of toxic hazard (possibly in order to derogate from further testing), or to provide estimates of toxic potency for use in risk assessments.