Prioritizing testing of organic compounds detected as gas phase air pollutants: structure-activity study for human contact allergens.

Organic compounds that are used or generated anthropogenically in large quantities in cities can be identified through their presence in the urban atmosphere and in air pollutant source emissions. Compounds identified by this method were screened to evaluate their potential to act as contact allergens. The CASE and MULTICASE computer programs, which are based on the detection of structure-activity relationships (SAR), were used to evaluate this potential. These relationships first are determined by comparing chemical structures to biological activity within a learning set comprised of 458 compounds, each of which had been tested experimentally in human trials for its sensitization potential. Using the information contained in this learning set, CASE and MULTICASE predicted the activity of 238 compounds found in the atmosphere for their ability to act as contact allergens. The analysis finds that 21 of 238 compounds are predicted to be active contact allergens (probability >0.5), with potencies ranging from mild to very strong. The compounds come from chemical classes that include chlorinated aromatics and chlorinated hydrocarbons, N-containing compounds, phenols, alkenes, and an S-containing compound. Using the measured airborne concentrations or emission rates of these compounds as an indication of the extent of their use, together with their predicted potencies, provides an efficient method to prioritize the experimental assessment of contact sensitization of untested organic compounds that can be detected as air pollutants.

Skin sensitization or allergic contact dermatitis (ACD) is a common condition associated with immunologically mediated dermal inflammation in response to contact with certain compounds. Identification and regulation of allergens has taken place historically in the occupational setting. Since ACD is a frequent condition in the general population, there is a need to identify allergens that are present in environmental settings.
One particularly efficient method that can be used to conduct a systematic search for organic chemicals that are in widespread use is to examine their concentration in the atmosphere of cities. The atmosphere acts as the reservoir for the organics that evaporate from solvents, paint strippers, household cleaning chemicals, and fuels, as well as industrial chemicals. By examining those organic chemicals that are measurable in the atmosphere or in air pollutant source emissions, one can quickly obtain a broad overview of the entirety of the volatile organic compound use in the city as a whole. Even when the atmospheric concentration of a particular chemical is too low to directly represent a significant exposure to the skin, the presence of the volatile chemical often signals that there was the potential for skin contact with the liquid phase of that chemical at the place where it originally was in use.
Traditional methods for identifying sensitizers include animal and human response tests, which are often time-consuming and expensive. Various structure-activity relationship (SAR) models have been developed to evaluate the sensitization potential of compounds that have not been tested experimentally. These models involve the identification of structural alerts that are responsible for allergic sensitization. One such model is based on the CASE/MULTICASE system developed by Rosenkranz and Klopman (1,2).
Most current SAR models of allergic contact dermatitis are based on an assumed mechanism of activity. It is generally accepted that ACD involves the initial penetration of the allergen through the skin followed by its binding to proteins to initiate an immune response. It can be inferred that characteristics that enhance the ability of the chemical to penetrate the skin (such as lipophilicity) and react with proteins (such as electrophilicity) might be found in allergens. In fact, most SAR models of ACD incorporate these assumptions (3).
The CASE/MULTICASE system (C/MC) is different from many other SAR models in that it operates independently of an assumed mechanism of activity, i.e., it is knowledge based. C/MC analyzes compounds of experimentally determined activity, looking for structural features that are statistically associated with active chemicals. The C/MC system offers a fast and reliable method for evaluating a large number of structurally diverse compounds. Its current limitations are that metallic species cannot be analyzed, and chemicals must contain at least two bonded nonhydrogen atoms. The C/MC system has been applied to a number of biological endpoints including ACD (3), mutagenicity (4), and carcinogenicity (5,6). It is ideal for analyzing the sensitization potential of the numerous organic compounds that are ubiquitous as a result of their presence in polluted air. The aim of this study is to apply the predictive capabilities of C/MC to the gas phase organic compounds identified in urban air and in air pollution source emissions in order to prioritize compounds for further testing.

Methods
Classification of organic air pollutants. The organic compounds that are the focus of this study were obtained from two separate sets of data collected in the Los Angeles, California, area. On 8-9 September 1993, extensive measurements of both gas-phase and particle-phase organics were made at five field monitoring sites during a severe photochemical smog event (7). From this study, 156 different gas-phase organic compounds present in outdoor air were quantified and 48-hr average concentrations were calculated for each site. The second study consists of compounds identified in the emissions from pollution sources. The 11 largest source types that contributed to nonmethane anthropogenic gas-phase organic air pollutant emissions in Southern California during August 1987 were resurveyed (8). These are (in decreasing order of importance) noncatalyst-equipped motor vehicle exhaust, catalyst-equipped motor vehicle exhaust, industrial surface coatings, solvent-borne architectural surface coatings, gasoline headspace vapors, whole gasoline vapors, domestic solvents, industrial adhesives, diesel engine exhaust, composite thinning solvents, and waterborne architectural surface coatings. The organics present in these top source emitters, as well as in a few other sources, were grouped into one master list that consisted of 82 compounds in addition to those measured in the atmospheric samples. A composite list of 238 organic compounds that represent all volatile organic compounds found either in the source or ambient data sets was generated from the two studies and was submitted to the C/MC system for analysis.
CASE and MULTICASE methodology. A learning set containing compounds that have been tested for human ACD activity was compiled from a literature survey. Chemicals were selected if standard human maximization tests (HMT) had been performed or if three or more case reports of positive patch tests were cited. (The literature citations are available upon request.) The result was a database composed of 458 compounds, of which 229 were active contact allergens and 229 were inactive. In fact, a total of 695 inactive chemicals was identified. Since previous studies had indicated that the best SAR model was obtained with a learning set composed of an equal number of active and inactive chemicals (9), three test models were developed, each containing the 229 active sensitizers but coupled with different nonoverlapping sets of 229 inactive chemicals. No differences were found between the three models regarding the structural alerts identified. Accordingly, one model was randomly selected for further evaluation. The list of chemicals composing the model is available upon request. Each chemical in the model was assigned a potency based on the concentration of chemical used for the challenge dose, and in HMTs, on the sensitization rate (3). Potencies were designated as follows: 10, inactive; 25, marginally active; 39, weakly active; 49, moderately active; 59, strongly active; and 69, extremely active. Two chemicals detected as gas-phase air pollutants, α-pinene and 1,2-propanediol, were present in the learning set (Table 1).
Table 1 notes. Abbreviations: ND, below detection limits; C/MC, CASE/MULTICASE; MC, MULTICASE.
(a) Range of average ambient concentration of gas-phase organic compounds measured during a Los Angeles smog event on 8-9 September 1993; high and low numbers are the range of 2-day average concentrations measured among four urban sites during that particular event (7).
(b) Air basin-wide mass emission rates in the South Coast Air Basin that surrounds Los Angeles for 27 August 1987 (8).
(c) Rank is determined from the product of Bayes' probability value (see Table 2) × MC units (potency) × atmospheric concentration. The chemical yielding the highest product receives the rank of 1.
(d) Compound is present in the source emission library, but the basin-wide emission rate is not calculated.
(e) Rank cannot be calculated because the measured mass emission rate is not known.
(f) Chemical was included in the table because the probability of activity (0.48) was very close to the cutoff value of 0.50.
(g) Glyoxal is known experimentally to be a strong contact sensitizer, but is too small to be fragmented by C/MC.
(h) Emission rate given is for all isomers with the same molecular formula, not just the compound shown.
The C/MC systems used in this study were developed as tools for predicting the activity of chemicals with respect to specific biological and/or toxicological endpoints (4). CASE and MULTICASE (MULTICASE Inc., Beachwood, OH) examine all possible molecular fragments containing 2-10 contiguous heavy atoms that can be formed from decomposition of an organic molecule. (Accordingly, a molecule must contain at least three nonhydrogen atoms to be processed by the C/MC system.) Fragments that are statistically associated with biological activity or inactivity are identified. The fragments are generated from chemicals in a learning set composed of biologically active and inactive compounds. Fragments significantly enriched in the active group of chemicals are considered activating fragments (biophores), while those associated predominantly with inactive compounds are considered deactivating (biophobes). CASE uses the significant fragments to classify a chemical submitted for analysis. Classification is expressed as the probability of being active or inactive. CASE also derives a quantitative estimation of potency by generating a global multivariate linear regression equation using structural determinants as well as the log P (octanol:water partition coefficient).
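The fragment-generation step described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the CASE implementation: the molecule is a hand-written heavy-atom graph, fragments are taken to be simple linear paths of 2-10 atoms, and bond orders and attached hydrogens are ignored. The names `enumerate_fragments`, `atoms`, and `bonds` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def enumerate_fragments(atoms: Dict[int, str],
                        bonds: List[Tuple[int, int]],
                        min_len: int = 2,
                        max_len: int = 10) -> Set[str]:
    """Return labels of all simple paths containing min_len..max_len heavy atoms.

    Assumption: a fragment is a linear, non-repeating path through the
    heavy-atom graph; hydrogens and bond orders are not modeled.
    """
    neighbors: Dict[int, List[int]] = {index: [] for index in atoms}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    fragments: Set[str] = set()

    def extend(path: List[int]) -> None:
        if len(path) >= min_len:
            forward = "-".join(atoms[i] for i in path)
            backward = "-".join(atoms[i] for i in reversed(path))
            # Store one direction-independent label per path.
            fragments.add(min(forward, backward))
        if len(path) == max_len:
            return
        for nxt in neighbors[path[-1]]:
            if nxt not in path:
                extend(path + [nxt])

    for start in atoms:
        extend([start])
    return fragments


# Dichloromethane drawn as Cl-C-Cl (hydrogens omitted): three heavy atoms.
dcm_atoms = {0: "Cl", 1: "C", 2: "Cl"}
dcm_bonds = [(0, 1), (1, 2)]
dcm_fragments = enumerate_fragments(dcm_atoms, dcm_bonds)
print(sorted(dcm_fragments))
```

In a full system each fragment would then be counted across the active and inactive members of the learning set to test for statistical enrichment; that step is omitted here.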
The MULTICASE program differs in that it uses a hierarchical approach to identify biophores that account for activity. MULTICASE uses molecular fragments as well as 2-dimensional distance descriptors to find the biophore accounting for the majority of the active class of chemicals. Chemicals containing the biophore are removed from further consideration, and subsequent biophores are identified that explain the activity of the remaining compounds. MULTICASE then groups compounds that contain each biophore and examines them for additional descriptors (modulators) responsible for increasing or decreasing the potency of molecules containing the biophore. Modulators may be additional structural fragments, distance descriptors, or physical-chemical properties such as electronic energies calculated by the Huckel method (HOMO, highest occupied molecular orbital, or LUMO, lowest unoccupied molecular orbital) or transport indices (lipophilicity, water solubility). The modulators are considered only in chemicals that contain the biophore under consideration. MULTICASE predicts the probability of activity as well as the potency of untested chemicals based on the presence of biophores and modulators.
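The hierarchical search described above resembles a greedy covering procedure, which the following sketch illustrates. It is an assumption-laden toy, not the MULTICASE algorithm: it scores each fragment by how many still-unexplained actives contain it minus how many inactives contain it, sets aside the compounds explained by the winner, and repeats. Statistical significance testing, distance descriptors, and modulators are all left out, and the compound names and fragment labels are invented.

```python
from typing import Dict, List, Set


def select_biophores(actives: Dict[str, Set[str]],
                     inactives: Dict[str, Set[str]]) -> List[str]:
    """Greedily pick fragments that explain the remaining active compounds."""
    remaining = dict(actives)
    chosen: List[str] = []
    while remaining:
        # Sorting makes tie-breaking deterministic.
        candidates = sorted(set().union(*remaining.values()))
        if not candidates:
            break

        def score(fragment: str) -> int:
            in_actives = sum(fragment in frags for frags in remaining.values())
            in_inactives = sum(fragment in frags for frags in inactives.values())
            return in_actives - in_inactives

        best = max(candidates, key=score)
        if score(best) <= 0:
            break  # no fragment favors the remaining actives
        chosen.append(best)
        # Compounds containing the chosen biophore are removed from consideration.
        remaining = {name: frags for name, frags in remaining.items()
                     if best not in frags}
    return chosen


# Invented toy learning set: compound name -> set of fragment labels.
toy_actives = {
    "A1": {"Cl-C=C", "C=C-C"},
    "A2": {"Cl-C=C"},
    "A3": {"N-C-C"},
}
toy_inactives = {
    "I1": {"C=C-C"},
    "I2": {"C-C-C"},
}
biophores = select_biophores(toy_actives, toy_inactives)
print(biophores)
```

The subtraction of inactive hits is one simple way to keep ubiquitous fragments from being chosen; a real implementation would use a statistical test instead.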
A chemical submitted for testing to C/MC will be assigned a predicted probability of activity and a predicted potency. Any fragment present in a chemical being examined that is not encountered in the learning set is identified by a warning of uncertainty as to its relevance for biological activity. Test chemicals that contain no biophores are presumed inactive.
The C/MC system generates four individual predictions of activity for each test compound: the CASE potency estimate (in CASE units), the CASE-generated probability that the compound is active, the MULTICASE potency estimate (in MULTICASE units), and the MULTICASE-generated probability that the compound is active. The sensitivity, specificity, and concordance for each parameter, as defined below and in Table 2, are then combined using Bayes' theorem (5,6) to yield an overall prediction of activity.
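One way four positive or negative calls can be folded into a single probability is sketched below. It is an illustration under stated assumptions, not necessarily the authors' exact procedure: Bayes' theorem is applied sequentially and the four parameters are treated as conditionally independent. The prior and the sensitivity and specificity figures are placeholder values chosen for the example, and the names `bayes_update`, `combine`, and `PERFORMANCE` are hypothetical.

```python
from typing import Dict, Tuple


def bayes_update(prior: float, positive: bool,
                 sensitivity: float, specificity: float) -> float:
    """Posterior probability of activity after one positive or negative call."""
    if positive:
        p_call_if_active = sensitivity            # P(+ | active)
        p_call_if_inactive = 1.0 - specificity    # P(+ | inactive)
    else:
        p_call_if_active = 1.0 - sensitivity      # P(- | active)
        p_call_if_inactive = specificity          # P(- | inactive)
    numerator = p_call_if_active * prior
    return numerator / (numerator + p_call_if_inactive * (1.0 - prior))


def combine(prior: float,
            calls: Dict[str, bool],
            performance: Dict[str, Tuple[float, float]]) -> float:
    """Update the prior once per parameter (assumes conditional independence)."""
    probability = prior
    for name, positive in calls.items():
        sensitivity, specificity = performance[name]
        probability = bayes_update(probability, positive, sensitivity, specificity)
    return probability


# Placeholder (sensitivity, specificity) pairs, NOT the values of Table 2.
PERFORMANCE = {
    "case_units": (0.70, 0.90),
    "case_probability": (0.75, 0.90),
    "mc_units": (0.80, 0.90),
    "mc_probability": (0.75, 0.90),
}
PRIOR = 0.20  # placeholder prior probability of activity

all_positive = combine(PRIOR, {name: True for name in PERFORMANCE}, PERFORMANCE)
all_negative = combine(PRIOR, {name: False for name in PERFORMANCE}, PERFORMANCE)
print(f"++++ pattern: {all_positive:.3f}")
print(f"---- pattern: {all_negative:.3f}")
```

Because each parameter is binary, four parameters yield 16 possible call patterns, and the same function can tabulate a probability for each of them.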

Results
A validation study was performed to determine the predictive capability of the C/MC system when applied to human contact allergens. Chemicals were removed from the learning set (5% each time); the model was then reestablished using this reduced learning set and tested for its ability to predict the activity of the removed chemicals. The results of this n-fold validation study are shown in Table 2. The cutoff value for each parameter was modeled for optimal predictivity by varying the four indices within reasonable ranges (C/MC units between 17 and 45; C/MC probabilities between 0.45 and 0.75). The cutoffs selected (see Table 2) gave the best overall concordance between experimental and predicted results.
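The hold-out scheme can be sketched as follows. The text does not say how the 5% subsets were drawn, so this sketch assumes disjoint, randomly shuffled subsets that together cover the learning set once; the function name `make_holdout_folds` and the fixed seed are hypothetical, and model rebuilding is represented only by a comment.

```python
import random
from typing import List


def make_holdout_folds(n_compounds: int,
                       fraction: float = 0.05,
                       seed: int = 0) -> List[List[int]]:
    """Split compound indices into disjoint hold-out subsets of about `fraction` each."""
    indices = list(range(n_compounds))
    random.Random(seed).shuffle(indices)
    fold_size = max(1, round(n_compounds * fraction))
    return [indices[i:i + fold_size] for i in range(0, n_compounds, fold_size)]


N_COMPOUNDS = 458  # size of the learning set
folds = make_holdout_folds(N_COMPOUNDS)

for held_out in folds:
    held_out_set = set(held_out)
    training = [i for i in range(N_COMPOUNDS) if i not in held_out_set]
    # A real study would rebuild the model on `training` here and then
    # predict the activity of the compounds listed in `held_out`.
    assert len(training) + len(held_out) == N_COMPOUNDS

print(len(folds), "hold-out subsets")
```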
Comparison of the predictions with the known activities for each of the selected chemicals yielded the sensitivity, specificity, and concordance for each parameter.
The sensitivity (number of correctly predicted actives divided by the total number of experimentally known actives) ranged from 0.67 for CASE units to 0.81 for MULTICASE units. The specificity (number of correctly predicted inactive chemicals divided by the known inactives) ranged from 0.88 for CASE probability to 0.93 for CASE units. The concordance (number of accurately predicted active and inactive compounds divided by the total number of active and inactive chemicals evaluated) ranged from 0.80 to 0.86.
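The three definitions above translate directly into code. The sketch below uses a small invented set of observed and predicted activities purely to exercise the formulas; the function name `classification_metrics` is hypothetical, and the function assumes at least one active and one inactive compound.

```python
from typing import Dict, Sequence


def classification_metrics(observed: Sequence[bool],
                           predicted: Sequence[bool]) -> Dict[str, float]:
    """Sensitivity, specificity, and concordance for binary activity calls."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    true_positives = sum(o and p for o, p in zip(observed, predicted))
    true_negatives = sum((not o) and (not p) for o, p in zip(observed, predicted))
    n_active = sum(observed)
    n_inactive = len(observed) - n_active
    return {
        # correctly predicted actives / experimentally known actives
        "sensitivity": true_positives / n_active,
        # correctly predicted inactives / experimentally known inactives
        "specificity": true_negatives / n_inactive,
        # all correct predictions / all chemicals evaluated
        "concordance": (true_positives + true_negatives) / len(observed),
    }


# Invented example: 4 known actives followed by 6 known inactives.
toy_observed = [True, True, True, True, False, False, False, False, False, False]
toy_predicted = [True, True, True, False, False, False, False, False, False, True]
toy_metrics = classification_metrics(toy_observed, toy_predicted)
print(toy_metrics)
```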
Table 3 notes. (a) MULTICASE and CASE probabilities are the probabilities that the compound is an active human ACD sensitizer. (b) Values are assigned when the result for a compound falls at or above (+) or below (-) the cutoff values given in Table 2.
Volume 105, Number 9, September 1997 * Environmental Health Perspectives
An overall probability that a chemical is active was obtained by application of Bayes' theorem. The method utilizes the sensitivity and specificity of each of the parameters described above together with a prior estimate of the probability of the chemical being active. For the current study, the prior probability was arbitrarily estimated to be 0.20 based upon the number of chemicals that have been documented to be human contact allergens. Bayes' theorem was used to update the probability estimate using each of the prediction models. Table 3 lists the Bayesian probabilities for each of the 16 possible patterns of predictions obtained from the validation study. The Bayesian probabilities were then used to classify the activity of each of the chemicals detected in the atmospheric monitoring experiments or in source emissions. Those chemicals having a probability >0.50 were listed as potentially active contact sensitizers. It should be noted that only 1 of the 238 chemicals examined yielded a Bayesian probability close to the 0.50 value.
For this reason, α-pinene (probability of activity = 0.48) is included as a putative positive in Table 1. The probability of activity of each test chemical, multiplied by the predicted potency (estimated by MULTICASE units) and by either the airborne concentration or mass emission rate into the atmosphere, was used to rank the chemicals predicted to be active in order of greatest priority for future testing (see Table 1).
A typical prediction generated by C/MC is presented in Figure 1. For p-dichlorobenzene, MULTICASE identifies the chloroarene moiety as a biophore. The biophore is found in 19 compounds in the learning set, 17 of which are contact allergens. Also shown are the electronegativity characteristics of the compound, which modulate the activity and slightly increase its potency. MULTICASE calculates a probability of 0.857 that this compound is active and predicts a potency of 58 units. It concludes that the compound is active. CASE analysis identifies two further refined chloroarene biophores and a third disubstituted arene fragment. CASE predicts that the compound is active (0.997 probability) with a predicted potency of 43 units. Based on this pattern of predictions (++++), Bayes' probability of ACD activity for p-dichlorobenzene is 0.999 (see Table 3).
Table 1 provides detailed information for the 21 vapor phase organic compounds present in ambient air or in air pollutant source emissions that are either known to be active or are predicted by C/MC to be contact allergens. Data include the experimental activities of those chemicals that have been examined in human testing protocols, the MC potency units, the Bayesian probability associated with each chemical, and the range of the 48-hr average ambient concentration during a Los Angeles photochemical smog episode for those compounds that were identified in the ambient air.
The highest concentration shown represents the urban air monitoring site that reported the highest average concentration, while the lower value represents the urban monitoring site that reported the lowest average concentration [not including the offshore island background monitoring site studied during the experiments of Fraser et al. (7)]. The active compounds identified in the ambient air are prioritized (rank) for further testing. Also shown in Table 1 is the air basin-wide mass emission rate in kilograms per day within the South Coast Air Basin that surrounds Los Angeles for each compound listed, followed by a rank ordering of the compounds based on the product of the probability that the chemical is active times its predicted potency and times its air basin-wide mass emissions rate in the Los Angeles area.
Figure 1 (CASE portion). Predictions based on CASE:
* 93% chance of being active due to substructure (confidence level = 100%): C-C-C=CH-CH=
* 91% chance of being active due to substructure (confidence level = 100%): CH=CG-CH=
* 74% chance of being active due to substructure (confidence level = 100%): C=CH-CH=C-
* The probability that this molecule is a human contact sensitizer is 0.997.
* The compound is predicted to be very active (43 CASE units).
Of the 238 chemicals measured in the ambient air or in air pollutant source emissions, 7 had been tested experimentally. The C/MC system could not evaluate one of the chemicals (glyoxal) because it is too small to be fragmented by C/MC. However, the system accurately predicted the activity of each of the other 6 chemicals. Benzene, ethylbenzene, 2-heptanone, and carvone were each correctly predicted to be inactive; α-pinene and 1,2-propanediol were predicted to be active (Table 1). MULTICASE identifies seven different portions of the α-pinene ring system as structural alerts, but notes that the fragments are present only in a small number of recognized sensitizers in the learning set. For this reason, MC predicts that the chemical is active but has marginal/weak activity (30 MC units). The Bayesian probability of activity for α-pinene is 0.48, which is close to the cutoff of 0.50. Accordingly, α-pinene is listed among the chemicals considered to be active.
In Table 1, the five compounds ranked highest in priority for further testing according to activity-weighted atmospheric concentration are dichloromethane, tetrachloroethylene, 1-hexene, trichloroethylene, and 1,1-dichloroethylene. The five compounds ranked highest in priority according to activity-weighted mass emission rate are dichloromethane, α-pinene, 1-hexene, ethanamine, and N,N-dimethyl methanamine.
Table 4 lists the compounds that have been tested experimentally and found to be inactive, and compounds that are presumed to be inactive. As indicated previously, α-pinene had a probability of activity of 0.48 and, accordingly, was included as a putative positive chemical.
Table 4. Compounds predicted to be inactive as contact sensitizers. (b) Warning that the chemical contains a fragment that C/MC has not seen previously.

Discussion
Contact sensitivity is a delayed-onset allergic inflammation of the skin. Numerous chemicals have been reported to cause this disorder in humans, and many more chemicals have been identified from animal tests. Structural alerts associated with contact sensitizers are electrophilic and nucleophilic moieties, as well as some aromatic fragments (3).
Several factors have been shown to influence the development of sensitivity.
The dose of chemical applied to the skin, as well as the frequency of exposure, affect the sensitization rate. A dose-response relationship has been shown in animal and human testing, with high dose suppression noted in some animal systems. Adjuvants have been shown to enhance the sensitization rate. Compounds with adjuvant activity include detergents, fatty substances, aluminum hydroxide gel, and bacterial peptides. Because sensitization is believed to proceed through skin penetration, recruitment, and activation of inflammatory cells, irritants are also believed to enhance sensitization.
The CASE/MULTICASE system was able to evaluate 148 of the 156 chemicals identified in atmospheric samples and 78 of the additional 82 chemicals identified only in source emissions. Chemicals not evaluated because they were too small to be fragmented included formaldehyde, methanol, glyoxal, and chloromethane. C/MC also issued warnings associated with some predictions. In such cases, the system notes that the test chemical contains a fragment that is not found among the chemicals in the learning set and warns that the influence of this fragment on the biologic activity is uncertain. Such chemicals are noted in Table 4.
C/MC identified 21 chemicals with a high probability of ACD activity. Of these, 2 have already tested positive in humans (i.e., α-pinene and 1,2-propanediol). The only active compound not listed in Table 1 is glyoxal. Glyoxal is known experimentally to be a strong contact allergen, but it is too small to be fragmented and analyzed by C/MC. Two compounds, peroxyacetyl nitrate (PAN) and peroxypropionyl nitrate (PPN), that have an overall probability of activity below 0.50 (see Table 4) are worth mentioning. C/MC predicts these compounds to be inactive since neither contains a previously identified biophore. They both contain fragments (NO2-O-, NO2-O-O-) that are unknown to the learning set. Free radical generators, such as the peroxide structure bonded to the nitro and carbonyl groups contained in PAN and PPN, are considered to be reactive with skin proteins and are thus suspected allergens (10).
In order to prioritize those compounds predicted to be active contact allergens, each was separately ranked according to probability of activity- and potency-weighted ambient concentration and according to activity- and potency-weighted mass emission rate. The reason for two ranking schemes is that some compounds are identified in source emissions but not in the ambient air samples, and conversely some compounds are found in the ambient air but not in source emissions. Certain compounds are either emitted at a very low rate or undergo degradation due to atmospheric chemical reactions and are absent from the ambient database. Other compounds identified in the ambient air but not in any source emissions, most notably peroxyacetyl nitrate and peroxypropionyl nitrate, are formed in the atmosphere by atmospheric chemical reactions.
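The ranking arithmetic is simple enough to show as a short sketch. It multiplies probability of activity, predicted potency, and an exposure surrogate (ambient concentration or mass emission rate), then ranks in decreasing order of the product, leaving unranked any compound whose surrogate was not measured. The compound names and all numbers are invented placeholders, and `Candidate` and `rank_candidates` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    probability: float         # Bayesian probability of activity
    potency: float             # predicted potency in MC units
    exposure: Optional[float]  # ambient concentration or emission rate; None if unknown


def rank_candidates(candidates: List[Candidate]) -> Dict[str, Optional[int]]:
    """Rank 1 goes to the largest probability x potency x exposure product."""
    scored = [(c.probability * c.potency * c.exposure, c.name)
              for c in candidates if c.exposure is not None]
    scored.sort(key=lambda item: item[0], reverse=True)
    ranks: Dict[str, Optional[int]] = {
        name: position for position, (_, name) in enumerate(scored, start=1)
    }
    for c in candidates:
        if c.exposure is None:
            ranks[c.name] = None  # rank cannot be calculated
    return ranks


# Invented placeholder data, not values from Table 1.
toy_candidates = [
    Candidate("compound A", 0.999, 60.0, 2.0),
    Candidate("compound B", 0.90, 90.0, 0.5),
    Candidate("compound C", 0.60, 30.0, 10.0),
    Candidate("compound D", 0.80, 40.0, None),
]
toy_ranks = rank_candidates(toy_candidates)
print(toy_ranks)
```

Running the function once with ambient concentrations and once with emission rates gives the two separate rank orderings.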
As mentioned, the top five compounds ranked according to activity-weighted mass emission rate are dichloromethane, α-pinene, 1-hexene, ethanamine, and N,N-dimethyl methanamine. Dichloromethane, the highest ranked compound, is an excellent solvent that is found in products as diverse as paint strippers, adhesives, and architectural surface coatings. It has a very high probability of activity (0.999) and is predicted to be highly potent (61 MC units). 1-Hexene also has a very high probability of being active (0.999) and is predicted to be extremely potent (91 MC units). This chemical is emitted directly to the atmosphere from noncatalyst-equipped gasoline-powered motor vehicle exhaust, catalyst-equipped gasoline-powered vehicle exhaust, whole gasoline evaporation, and gasoline headspace vapors. Skin contact can result from exposure to liquid gasoline. Ethanamine and N,N-dimethyl methanamine are two potentially active agents directly emitted from animal waste decomposition. α-Pinene, whose activity was discussed earlier, is thought to originate mainly from biogenic sources. Other prominent groups of compounds that are predicted to be active ACD sensitizers include a variety of chlorinated organic solvents (e.g., tetrachloroethylene, trichloroethylene, and 1,1-dichloroethylene) that are used in industrial cleaning operations, and substituted phenols that are emitted from motor vehicle exhaust and other sources.
In this study, 238 organic compounds present in the urban atmosphere in Southern California and in source emissions were narrowed to 21 compounds that are likely human contact allergens. It is recognized that many of the organic compounds found in polluted urban air are present in very low concentrations and that these concentrations may be too low to cause initial sensitization or subsequent allergic reactions. However, pollutants present at low concentrations in urban air are present at much higher concentrations near their source of emission from the workplace or the home. By examining the polluted urban atmosphere, we are able to detect chemicals that are in widespread use and for which the probability of human contact at high concentrations elsewhere is substantial.