Conditional Toxicity Value (CTV) Predictor: An In Silico Approach for Generating Quantitative Risk Estimates for Chemicals

Background: Human health assessments synthesize human, animal, and mechanistic data to produce toxicity values that are key inputs to risk-based decision making. Traditional assessments are data-, time-, and resource-intensive, and they cannot be developed for most environmental chemicals owing to a lack of appropriate data. Objectives: As recommended by the National Research Council, we propose a solution for predicting toxicity values for data-poor chemicals through development of quantitative structure–activity relationship (QSAR) models. Methods: We used a comprehensive database of chemicals with existing regulatory toxicity values from U.S. federal and state agencies to develop QSAR models. We compared QSAR-based model predictions to those based on high-throughput screening (HTS) assays. Results: QSAR models for noncancer threshold-based values and cancer slope factors had cross-validation-based Q² of 0.25–0.45, mean model errors of 0.70–1.11 log10 units, and applicability domains covering >80% of environmental chemicals. Toxicity values predicted from QSAR models developed in this study were more accurate and precise than those based on HTS assays or mean-based predictions. A publicly accessible web interface to make predictions for any chemical of interest is available at http://toxvalue.org. Conclusions: An in silico tool that can predict toxicity values with an uncertainty of an order of magnitude or less can be used to quickly and quantitatively assess risks of environmental chemicals when traditional toxicity data or human health assessments are unavailable. This tool can fill a critical gap in the risk assessment and management of data-poor chemicals. https://doi.org/10.1289/EHP2998
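The cross-validation-based Q² reported above is commonly defined as one minus the ratio of the predicted residual sum of squares (PRESS) from held-out predictions to the total sum of squares about the observed mean. The sketch below computes it in plain Python on log10-transformed toxicity values, together with a root-mean-square error in log10 units. Treating the reported "mean model error" as an RMSE is an assumption made only for illustration, and the function names `cv_q2` and `rmse_log10` are hypothetical.

```python
import math


def cv_q2(observed_log10, cv_predicted_log10):
    """Cross-validated Q^2 = 1 - PRESS / SS_total.

    Inputs are log10 toxicity values; each prediction is assumed to come from a
    model that did not see that chemical during training (e.g., k-fold CV).
    """
    if len(observed_log10) != len(cv_predicted_log10) or len(observed_log10) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed_log10) / len(observed_log10)
    # Predicted residual sum of squares from the held-out predictions
    press = sum((y - y_hat) ** 2 for y, y_hat in zip(observed_log10, cv_predicted_log10))
    # Total sum of squares of the observations about their mean
    ss_total = sum((y - mean_obs) ** 2 for y in observed_log10)
    if ss_total == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - press / ss_total


def rmse_log10(observed_log10, cv_predicted_log10):
    """Root-mean-square error in log10 units (an assumed error metric)."""
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(observed_log10, cv_predicted_log10)]
    return math.sqrt(sum(squared) / len(squared))
```

A Q² of 0 means the held-out predictions have the same squared error as predicting the observed mean for every chemical, which makes mean-based predictions a useful reference point for the Q² range above.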


Table of Contents
Illustrative risk characterizations
Figure S1. Principal component analysis loadings. The top twenty descriptors in each of the first three principal components are shown, with their percentage contributions. Definitions of each molecular descriptor can be found online at: https://cdk.github.io/cdk/1.5/docs/api/org/openscience/cdk/qsar/descriptors/molecular/packagesummary.html
Figure S2. Example of mechanistic interpretation of the QSAR models for the RfD, NOAEL, BMD, and BMDL. For each toxicity value, the top panel ranks the molecular descriptors by their frequency of use in the model. The top twenty are denoted by the dashed lines and are shown separately in the middle panel with the descriptor names. The bottom panel compares the values of the top two descriptors between the highest- and lowest-potency toxicity values. RfD = Reference dose; NOAEL = No observed adverse effect level; BMD = Benchmark dose; BMDL = Benchmark dose lower confidence limit; QSAR = quantitative structure–activity relationship. Definitions of each molecular descriptor can be found online at: https://cdk.github.io/cdk/1.5/docs/api/org/openscience/cdk/qsar/descriptors/molecular/packagesummary.html
Figure S3. Example of mechanistic interpretation of the QSAR models for the OSF and CPV. For each toxicity value, the top panel ranks the molecular descriptors by their frequency of use in the model. The top twenty are denoted by the dashed lines and are shown separately in the middle panel with the descriptor names. The bottom panel compares the values of the top two descriptors between the highest- and lowest-potency toxicity values. OSF = Oral slope factor; CPV = Cancer potency value; QSAR = quantitative structure–activity relationship. Definitions of each molecular descriptor can be found online at: https://cdk.github.io/cdk/1.5/docs/api/org/openscience/cdk/qsar/descriptors/molecular/packagesummary.html
Figure S4. Example of mechanistic interpretation of the QSAR models for the RfC and IUR. For each toxicity value, the top panel ranks the molecular descriptors by their frequency of use in the model. The top twenty are denoted by the dashed lines and are shown separately in the middle panel with the descriptor names. The bottom panel compares the values of the top two descriptors between the highest- and lowest-potency toxicity values. RfC = Reference concentration; IUR = Inhalation unit risk; QSAR = quantitative structure–activity relationship. Definitions of each molecular descriptor can be found online at: https://cdk.github.io/cdk/1.5/docs/api/org/openscience/cdk/qsar/descriptors/molecular/packagesummary.html
Figure S5. In each panel, the x-axis is the margin of exposure (MOE = toxicity value/exposure) or hazard quotient (HQ = exposure/toxicity value) derived from CTV (left panels) or HTS assays (right panels), which is compared with the MOE or HQ derived using regulatory toxicity values on the y-axis. Comparisons are made for regulatory NOAELs (panels A and B), BMDLs (panels C and D), or RfDs (panels E and F). In all cases, the CTV predictions are based on cross-validation (panels A, C, and E). Each panel also includes lines indicating equality and 10-fold above or below equality (grey solid and dotted lines), nominal risk characterization thresholds (MOE = 100; HQ = 1), the number of compounds n, and the adjusted R² based on a linear model of log-transformed toxicity values. RfD = Reference dose; NOAEL = No observed adverse effect level; BMDL = Benchmark dose lower confidence limit; OED05 = High-throughput screening-based oral equivalent dose lower 5% confidence limit.

Illustrative risk characterizations
To explore the possible risk assessment implications of using the Conditional Toxicity Value (CTV) predictor rather than other methods, we calculated illustrative risk characterization values using (1) CTV predictions, (2) high-throughput screening (HTS) assay-based oral equivalent dose (OED) estimates from Wetmore (2015), and (3) the "gold standard" regulatory NOAEL, BMDL, or RfD values. Because risk characterization values require exposure estimates, we used the upper 95% exposure estimate from ExpoCast as the exposure value for illustration (Sipes et al. 2017; Wambaugh et al. 2013). We then calculated margins of exposure (MOEs) between that exposure level and the NOAEL or BMDL (for CTV and "gold standard" regulatory values) and between that exposure level and the 5th percentile OED05 (for HTS). We also calculated hazard quotients (HQs) as the ratio of exposure to the RfD for CTV and "gold standard" regulatory values.
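The two risk metrics defined above reduce to simple ratios, with exposure and toxicity value expressed in the same units (e.g., mg/kg-day). A minimal sketch follows; the function names and the numbers in the usage lines are hypothetical and are not taken from the ExpoCast, CTV, or regulatory datasets.

```python
def margin_of_exposure(toxicity_value, exposure):
    """MOE = toxicity value / exposure (e.g., NOAEL, BMDL, or OED05 over exposure)."""
    if toxicity_value <= 0 or exposure <= 0:
        raise ValueError("toxicity value and exposure must be positive")
    return toxicity_value / exposure


def hazard_quotient(exposure, reference_dose):
    """HQ = exposure / RfD; values above 1 mean exposure exceeds the RfD."""
    if exposure <= 0 or reference_dose <= 0:
        raise ValueError("exposure and reference dose must be positive")
    return exposure / reference_dose


# Hypothetical inputs in mg/kg-day
moe = margin_of_exposure(toxicity_value=100.0, exposure=0.5)  # 200.0
hq = hazard_quotient(exposure=0.5, reference_dose=2.0)         # 0.25
print(moe, hq)
```

Because the MOE divides the toxicity value by exposure while the HQ divides exposure by the toxicity value, a larger MOE and a smaller HQ both indicate a wider gap between exposure and the toxicity value.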
For HTS-based HQs, we applied a nominal "uncertainty factor" of 1000 for illustration, so that the HTS-based "RfD" = OED05/1000. This choice follows the proposal of Crump et al. (2010) that RfDs based on in vitro studies could be derived by applying an additional uncertainty factor for in vitro-to-in vivo extrapolation. We then evaluated the degree to which CTV- and HTS-based risk characterizations replicated the risk values calculated using the "gold standard" regulatory toxicity values. This evaluation considered both consistency with the "gold standard" regulatory values and whether the methods led to different "decision" outcomes, based on whether each satisfied the criteria of MOE > 100 or HQ < 1.
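The HTS-based "RfD" and the decision comparison described above can be sketched as follows. The constants mirror the illustrative values in the text (factor of 1000, MOE threshold of 100, HQ threshold of 1); how values exactly at a threshold are classified is an assumption of this sketch, and all function names and example numbers are hypothetical.

```python
HTS_UNCERTAINTY_FACTOR = 1000.0  # nominal, illustrative factor described in the text
MOE_THRESHOLD = 100.0            # MOE below this is treated as a concern
HQ_THRESHOLD = 1.0               # HQ above this is treated as a concern


def hts_based_rfd(oed05):
    """Illustrative HTS-based "RfD" = OED05 / 1000."""
    return oed05 / HTS_UNCERTAINTY_FACTOR


def is_moe_concern(moe):
    # Strict inequality at the threshold is an assumption of this sketch.
    return moe < MOE_THRESHOLD


def is_hq_concern(hq):
    # Strict inequality at the threshold is an assumption of this sketch.
    return hq > HQ_THRESHOLD


def same_hq_decision(exposure, rfd_reference, rfd_alternative):
    """True if both RfDs lead to the same concern / no-concern outcome."""
    hq_ref = exposure / rfd_reference
    hq_alt = exposure / rfd_alternative
    return is_hq_concern(hq_ref) == is_hq_concern(hq_alt)


# Hypothetical example (mg/kg-day): regulatory RfD vs. HTS-based "RfD"
exposure = 0.5
regulatory_rfd = 1.0                  # HQ = 0.5, no concern
hts_rfd = hts_based_rfd(oed05=250.0)  # 0.25, so HQ = 2.0, concern
print(same_hq_decision(exposure, regulatory_rfd, hts_rfd))  # False
```

In this made-up case the extra factor of 1000 pushes the HTS-based HQ above 1 while the regulatory HQ stays below it, which is the kind of decision disagreement the evaluation was designed to detect.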
The results of these risk characterization illustrations are shown in Supplemental Figure S5.
In all cases, as with the original toxicity values described in the main text, the CTV predictions for NOAELs (n = 36) and BMDLs (n = 14) resulted in MOEs that were more accurate and more precise (smaller absolute deviations and larger R²) than MOEs based on HTS assays and in vitro-to-in vivo extrapolation (IVIVE), when compared with "gold standard" point-of-departure (POD)-derived MOEs. Risk characterizations using the RfD involve calculating an HQ instead of an MOE; these were available for more compounds (n = 51) and gave similar results. Interestingly, for none of the compounds did the risk characterization using the "gold standard" regulatory toxicity values indicate a concern, defined as MOE < 100 or HQ > 1. The same was true for the risk characterizations based on CTV-derived toxicity values. In contrast, HTS-based risk characterizations flagged some compounds as having a potential risk concern even though both the "gold standard"- and CTV-derived risk characterizations indicated acceptable MOEs or HQs, suggesting that HTS-based risk characterizations may be more "conservative." Overall, when compared with the "gold standard" of using regulatory toxicity values, CTV gives more precise and more accurate risk characterizations than those derived from HTS assays and IVIVE. However, these results should be considered illustrative, given the additional assumptions and uncertainties involved in these calculations (e.g., exposure values, minimum MOE, uncertainty factor for HTS-based RfDs) compared with the direct comparison of predicted toxicity values described in the main text.
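The accuracy and precision comparisons above, together with the 10-fold lines and adjusted R² described for Figure S5, can be computed from paired risk values on a log10 scale. The sketch below is one way to do so in plain Python; it assumes a one-predictor ordinary least-squares fit of log10 regulatory values on log10 predicted values, and the function names and test values are hypothetical.

```python
import math


def log10_ratios(reference, predicted):
    """log10(predicted / reference) for each compound; 0 means exact agreement."""
    return [math.log10(p) - math.log10(r) for r, p in zip(reference, predicted)]


def mean_abs_log10_deviation(reference, predicted):
    """Mean absolute deviation in log10 units (smaller = more accurate)."""
    ratios = log10_ratios(reference, predicted)
    return sum(abs(d) for d in ratios) / len(ratios)


def fraction_within_fold(reference, predicted, fold=10.0):
    """Share of compounds whose prediction lies within `fold`-fold of the reference."""
    limit = math.log10(fold)
    ratios = log10_ratios(reference, predicted)
    return sum(1 for d in ratios if abs(d) <= limit) / len(ratios)


def adjusted_r2_log10(reference, predicted):
    """Adjusted R^2 of an OLS fit of log10(reference) on log10(predicted), one predictor."""
    x = [math.log10(p) for p in predicted]
    y = [math.log10(r) for r in reference]
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 compounds")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("log10 values must vary")
    r2 = sxy ** 2 / (sxx * syy)  # with one predictor, R^2 equals the squared Pearson r
    return 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
```

The fraction within 10-fold corresponds to counting the points that fall between the dotted lines in each Figure S5 panel.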