Research August 2012 | Volume 120 | Issue 8
A Three-Stage Algorithm to Make Toxicologically Relevant Activity Calls from Quantitative High Throughput Screening Data
Keith R. Shockley
Background: The ability of a substance to induce a toxicological response is better understood by analyzing the response profile over a broad range of concentrations than at a single concentration. In vitro quantitative high throughput screening (qHTS) assays are multiple-concentration experiments with an important role in the National Toxicology Program’s (NTP) efforts to advance toxicology from a predominantly observational science at the level of disease-specific models to a more predictive science based on broad inclusion of biological observations.
Objective: We developed a systematic approach to classify substances from large-scale concentration–response data into statistically supported, toxicologically relevant activity categories.
Methods: The first stage of the approach finds active substances with robust concentration–response profiles within the tested concentration range. The second stage finds substances with activity at the lowest tested concentration not captured in the first stage. The third and final stage separates statistically significant (but not robustly statistically significant) profiles from responses that lack statistically compelling support (i.e., “inactives”). The performance of the proposed algorithm was evaluated with simulated qHTS data sets.
Results: The proposed approach performed well for 14-point concentration–response curves with typical levels of residual error (σ ≤ 25%) or when maximal response (|RMAX|) was > 25% of the positive control response. The approach also worked well in most cases for smaller sample sizes when |RMAX| ≥ 50%, even with as few as four data points.
Conclusions: The three-stage classification algorithm performed better than one-stage classification approaches based on overall F-tests, t-tests, or linear regression.
Citation: Shockley KR. 2012. A Three-Stage Algorithm to Make Toxicologically Relevant Activity Calls from Quantitative High Throughput Screening Data. Environ Health Perspect 120:1107–1115; http://dx.doi.org/10.1289/ehp.1104688
Address correspondence to K.R. Shockley, Biostatistics Branch, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC 27709 USA. Telephone: (919) 541-3033. Fax: (919) 541-4311. E-mail: email@example.com
I thank R. Tice (Biomolecular Screening Branch, National Institute of Environmental Health Sciences (NIEHS)) and G. Kissling and S. Peddada (Biostatistics Branch, NIEHS) for reviewing the manuscript and providing helpful suggestions. I thank S. Harris (SRA International, Inc.) for help with programming the curve fitting procedures. Finally, I thank F. Parham (Biomolecular Screening Branch, NIEHS) for assistance with analyzing the androgen receptor agonist data.
This research was supported in part by the Intramural Research Program of the National Institutes of Health, NIEHS.
The author declares he has no actual or potential competing financial interests.
Received: 2 November 2011
Accepted: 10 May 2012
Advance Publication: 10 May 2012
Final Publication: 1 December 2012
The goals of the Tox21 collaboration are to prioritize chemicals for in vivo testing, identify mechanisms of toxicity, and predict adverse responses to environmental chemicals in humans (Collins et al. 2008; Shukla et al. 2010). Low throughput animal and tissue models are yielding ground to high throughput screening (HTS) methods that enable the simultaneous assessment of large numbers of compounds. For applications of HTS in traditional drug discovery, assays are usually conducted at a single test concentration (e.g., 10 μM) to find compounds with strong pharmacological activity while reducing the risk for false positives. This strategy is not as relevant for toxicological research and toxicity testing, which also seeks to find compounds with weak activity while reducing the risk for false negatives. However, quantitative high throughput screening (qHTS) provides an opportunity to meet Tox21 objectives, holding the potential for wide chemical coverage and reduced cost of testing on a per-substance basis. Moreover, the ability of a substance to induce a toxicological response is better understood by analyzing the response profile over a broad concentration range than by evaluating effects at one or a few concentrations.
The Tox21 collaboration began formally in 2008 with Phase I (Proof of Concept) consisting of qHTS studies conducted at the National Institutes of Health Chemical Genomics Center (NCGC) in 1,536-well format and mid-throughput studies conducted in support of the U.S. Environmental Protection Agency’s (EPA) ToxCast™ program. In conjunction with Tox21 Phase I, the NTP and U.S. EPA have produced an extensive set of concentration–response data on some 2,800 substances screened at the NCGC in > 70 qHTS assays and on 320 substances tested across > 500 in vitro and lower organism in vivo assays by various contract and government laboratories. In Tox21 Phase II, qHTS data will soon be produced for a library containing approximately 10,000 compounds. Analyses of Phase I data indicate reproducible levels of compound behavior that match previously known toxicological responses (Huang et al. 2008). These experiments are typically analyzed using a heuristics-based curve classification algorithm that does not use uncertainty in model fits to make activity calls (Inglese et al. 2006). However, classification of chemical activity has also been based on clustering by pattern dissimilarity (Zhang et al. 2009), a heuristics approach incorporating curve fit p-values (Huang et al. 2011), testing for significance of response using mathematical models (Parham et al. 2009), or a preliminary test estimation (PTE) procedure robust to variance structure (S. Peddada, personal communication).
Because of the potential for complex concentration–response behavior, toxicological evaluation has traditionally been based on manual scrutiny of concentration–response (or dose–response) data. But the large data volume generated by qHTS renders manual inspection of individual profiles prohibitively laborious, subjective, and prone to human error. Indeed, the human eye cannot consistently discriminate calls based on small (but statistically relevant) trends or differences, and conventional curve fit diagnostics are not feasible when considering the large number of compounds used within qHTS studies. Heuristics approaches to screen qHTS data sets may identify candidates with positive activity, but such methods are not based on the principles of statistical hypothesis testing. On the other hand, statistical assessments based on fits to a nonlinear function may not capture important responses occurring outside of the specified model framework. For instance, a maximal response at the lowest tested concentration will not be adequately explained by fitting the conventional Hill equation (Hill 1910). Given these considerations, there is currently no suitable approach for making statistically rigorous activity calls in an automated manner for the massive amount of data emerging from large-scale toxicity testing within the NTP and Tox21 qHTS efforts. In addition, the operating characteristics of the few activity call algorithms published to date have not yet been systematically explored.
To meet this need, we propose a three-stage framework based on formal statistical testing of toxicologically relevant hypotheses. Although much of the data generated to date has been produced from unreplicated designs, this approach can accommodate various levels of replication and provides a consistent platform for making activity calls. In the first stage of the algorithm, compounds with a robust concentration–response relationship are identified by comparing the best fit to a nonlinear model with a horizontal line (no concentration–response) and classified as “active.” Compounds not detected as “active” in the first stage are tested for activity at the lowest tested concentration in the second stage. Finally, compounds with a statistically less robust concentration–response are classified as “inconclusive” and distinguished from “inactive” calls in the third stage. Receiver operating characteristic (ROC) curves of simulated qHTS data are used to assess the overall ability of the algorithm to detect active compounds under toxicologically relevant conditions produced in simulated data sets.
Development of the algorithm. Our approach assumes that the toxicological importance of a response profile generated in qHTS applications should be determined by a robust framework to impartially classify tested substances and limit the return of false negatives. A set of simple decision rules are used to make consistent activity calls from the wealth of complex response patterns resulting from high throughput chemical profiling. These decision rules are formalized with statistical procedures and automated through a systematic computational workflow. Substances classified as “actives” have response values exceeding the assay detection limit (see below) and may fall into one of two different categories: a) compounds with concentration–response curves within the tested concentration range, and b) compounds eliciting maximal responses at the lowest tested concentration. The first category of substances can be subdivided into two subgroups: a) compounds producing statistically and toxicologically robust concentration–response trends supported by multiple data points in different regions of each profile (see “Stage 1: Test for robust concentration–response,” below), and b) compounds that may fit sigmoidal curves better than flat lines in a mathematical or statistical sense but are comparatively nonrobust by toxicological standards (e.g., curves with only one data point exceeding the detection limit). Substances underlying such nonrobust concentration–response profiles are labeled “inconclusive.” “Inconclusive” calls may arise due to low levels of replication, variability in assay performance, or confounding of factors in nonrandomized designs (e.g., experimental drift of scanning machines).
A three-stage algorithm (Figure 1) is proposed to classify each substance in a tested chemical library as “active,” “inactive,” or “inconclusive” (Table 1). There are two types of actives: a) ACTIVE*[±1] substances describe robust concentration–response curves, and b) ACTIVE*[±2] agents are putatively active at or below the lowest tested concentration. Less robust responses are assigned INCONCLUSIVE*[±3], and substances with no discernible activity within the tested concentration range are classified as INACTIVE*. Numbers inside brackets refer to the stage where the call was made (i.e., STAGE 1, STAGE 2, or STAGE 3 in Figure 1). The “+” or “–” sign inside each bracket corresponds to the direction of the response. Accordingly, ACTIVE*[+1] and INCONCLUSIVE*[+3] describe response curves in which the response signal tends to increase with increasing concentration (activators), whereas ACTIVE*[–1] and INCONCLUSIVE*[–3] describe response curves in which the response signal tends to decrease with increasing concentration (inhibitors). Calls labeled ACTIVE*[+2] (or ACTIVE*[–2]) refer to substances with mean responses significantly greater (or lower) than the detection limit of the assay.
Detection limits define a response range in which the normalized signal can be reliably measured within a given experiment, and are usually set to 3 SD above or below the normalized signal intensities observed in negative control plates. A detection limit of 25–30% of the positive control is typical within Tox21 efforts. The positive detection limit for activator assays is found by adding the assay noise level to the control response; the negative detection limit for inhibitor assays is calculated by subtracting the assay noise level from the control response.
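The detection-limit calculation described above amounts to taking the negative-control mean ± 3 SD. A minimal sketch (the paper's implementation is in R; the negative-control values below are hypothetical):

```python
import numpy as np

def detection_limits(neg_control, k=3.0):
    """Detection limits as the negative-control mean +/- k SD of the
    normalized signal (k = 3 SD is typical, per the text)."""
    mu = np.mean(neg_control)
    sd = np.std(neg_control, ddof=1)
    return mu + k * sd, mu - k * sd

# Hypothetical normalized negative-control responses (% of positive control)
neg = np.array([-2.0, 1.5, 0.5, -1.0, 2.0, -0.5, 1.0, -1.5])
pos_limit, neg_limit = detection_limits(neg)
```

The positive limit is used for activator assays and the negative limit for inhibitor assays, as in the text.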
Example concentration–response profiles and their activity calls from qHTS data generated with the NTP compound library used in Tox21 Phase I are shown in Figure 2. More extensive data can be found in NTP’s Chemical Effects in Biological Systems database (Waters et al. 2008).
The following form of the Hill equation model is used here:

Ri = R0 + (RMAX – R0)/[1 + (AC50/Ci)^SLOPE] + error,  [1]
where Ri is the response value for concentration i, R0 is the baseline activity (lower asymptote of the sigmoidal curve for activators, upper asymptote of the sigmoidal curve for inhibitors), RMAX is the maximal activity (upper asymptote for activators, lower asymptote for inhibitors), and |RMAX – R0| defines the maximal response compared to baseline activity for activators and inhibitors. In Equation 1, Ci refers to the test concentration of each response, AC50 is the concentration yielding 50% of the maximal response produced by the tested compound (i.e., EC50 for activators, IC50 for inhibitors), SLOPE determines the shape of the curve and error is the residual error of the model. [For technical details describing curve fitting, see Supplemental Material, p. 3 (http://dx.doi.org/10.1289/ehp.1104688)].
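Equation 1 (without its error term) can be expressed directly; the parameter values below are illustrative, not drawn from the paper:

```python
import numpy as np

def hill(C, R0, RMAX, AC50, SLOPE):
    """Four-parameter Hill model of Equation 1, without the error term."""
    return R0 + (RMAX - R0) / (1.0 + (AC50 / C) ** SLOPE)

conc = np.logspace(-4, 2, 14)  # 14 test concentrations (uM), illustrative
resp = hill(conc, R0=0.0, RMAX=100.0, AC50=0.1, SLOPE=1.0)
```

At C = AC50 the model returns the midpoint (R0 + RMAX)/2, i.e., 50% of the maximal response, which is what defines AC50.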
Stage 1: Test for robust concentration–response. The objective of the first stage is to find chemicals with a robust dose–response relationship within the tested concentration range. To satisfy the robust criterion, response profiles should exhibit statistical significance in both unweighted nonlinear least squares (NLS) and weighted nonlinear least squares (WNLS) regression approaches [see Supplemental Material, p. 3 (http://dx.doi.org/10.1289/ehp.1104688)]. The NLS approach weights all data points equally and, consequently, may not discriminate adequately between a profile with data along two clearly defined asymptotes and a less robust profile in which one asymptote is supported by a single point (Figure 2). In contrast, the WNLS criterion weights each response point i from n concentrations (Ri, i = 1, …, n), so that more influence is given to neighboring data points with similar response levels than neighboring data points with very different responses. An active compound from Stage 1 (i.e., ACTIVE*[±1]) will
- Have a maximum response greater than the positive detection limit (for activators) or a minimum response less than the negative detection limit (for inhibitors).
- Fit the four-parameter Hill model better than a horizontal line using both NLS and WNLS regression at a selected significance level.
- Show increasing response (RMAX > R0 for ACTIVE*[+1] calls) or decreasing response (RMAX < R0 for ACTIVE*[–1] calls) with increasing concentration.
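The Stage 1 comparison of the Hill fit against a horizontal line can be sketched as an extra-sum-of-squares F-test. The paper's implementation uses the R package drc and additionally requires a weighted (WNLS) fit detailed in its Supplemental Material; the sketch below covers only the unweighted (NLS) side, with illustrative starting values and hypothetical data:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import f as f_dist

def hill(C, R0, RMAX, AC50, SLOPE):
    return R0 + (RMAX - R0) / (1.0 + (AC50 / C) ** SLOPE)

def stage1_nls_test(C, R):
    """Extra-sum-of-squares F-test: 4-parameter Hill fit vs. the
    1-parameter horizontal line (no concentration-response)."""
    rss_flat = np.sum((R - R.mean()) ** 2)          # null: flat line at mean
    p0 = [R.min(), R.max(), np.median(C), 1.0]      # rough starting values
    popt, _ = curve_fit(hill, C, R, p0=p0, maxfev=10000)
    rss_hill = np.sum((R - hill(C, *popt)) ** 2)
    df1, df2 = 3, len(R) - 4                         # 4 vs. 1 parameters
    F = ((rss_flat - rss_hill) / df1) / (rss_hill / df2)
    return f_dist.sf(F, df1, df2), popt

# Illustrative strong activator with small residual error
C = np.logspace(-4, 2, 14)
rng = np.random.default_rng(1)
R = hill(C, 0.0, 100.0, 0.1, 1.0) + rng.normal(0.0, 1.0, C.size)
p_value, popt = stage1_nls_test(C, R)
```

An ACTIVE*[±1] call additionally requires the detection-limit and direction criteria from the bulleted list above.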
Stage 2: Test for activity at lowest tested concentration. In the second stage, compounds not detected as active in the first stage are evaluated by comparing the distribution of measured responses to the detection limit of the assay. Compounds with activity at the lowest tested concentration are considered to be relatively potent since their AC50 values are presumably less than (or equal to) the lowest tested concentration. To find these “potent responders,” tests for mean response greater than the detection limit of the assay are performed using weighted t-tests with the same weighting scheme used for WNLS [see Supplemental Material, p. 4 (http://dx.doi.org/10.1289/ehp.1104688)]. The weighted t-test ensures that outlier responses are not given unwarranted weight in statistical assessment. A compound active in this stage will have activity at the lowest tested concentration that is greater than the positive assay detection limit (ACTIVE*[+2]) or decreased activity at the lowest tested concentration that is lower than the negative assay detection limit (ACTIVE*[–2]).
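Stage 2 reduces to a one-sided, one-sample t-test of the responses against the detection limit. The paper uses a weighted t-test whose weights follow the WNLS scheme in its Supplemental Material; the unweighted version below is a simplification, and the data and limit are hypothetical:

```python
import numpy as np
from scipy import stats

def stage2_t_test(R, pos_limit):
    """One-sided test of mean response > the positive detection limit
    (unweighted simplification of the paper's weighted t-test)."""
    t, p_two = stats.ttest_1samp(R, pos_limit)
    return p_two / 2.0 if t > 0 else 1.0 - p_two / 2.0

# Hypothetical profile that is already near-maximal at every concentration,
# as expected when AC50 lies at or below the lowest tested concentration
R = np.array([55.0, 60.0, 58.0, 62.0, 57.0, 61.0, 59.0, 63.0,
              56.0, 60.0, 58.0, 62.0, 59.0, 61.0])
p = stage2_t_test(R, pos_limit=30.0)
```

For inhibitors the same test is run against the negative detection limit with the inequality reversed.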
Stage 3: Test for nonrobust concentration–response. In the third stage, compounds not detected as active in the first or second stage are evaluated. Compounds with significant fits to the Hill model using either NLS or WNLS, but not both, are classified as INCONCLUSIVE*[±3]. For example, profiles found in Stage 3 may be incomplete curves in which the curve fit is driven by a single data point at a high test concentration (Figure 2). An inconclusive compound will
- Have a maximum response greater than the positive detection limit (for activators) or a minimum response less than the negative detection limit (for inhibitors).
- Fit the four-parameter Hill model better than a horizontal line (no response) in either the unweighted (NLS) or weighted (WNLS) curve fit.
- Show increasing response (RMAX > R0 for INCONCLUSIVE*[+3] calls) or decreasing response (RMAX < R0 for INCONCLUSIVE*[–3] calls) with increasing concentration.
Compounds that are not classified as active or inconclusive in Stage 1, Stage 2, or Stage 3 are assigned INACTIVE*.
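Taken together, the three stages form a simple decision cascade. The sketch below takes the per-stage test outcomes as precomputed booleans (the helper names are mine, not the paper's) and returns the call labels used in Table 1:

```python
def activity_call(exceeds_limit, nls_sig, wnls_sig, increasing, potent):
    """Decision cascade of the three-stage algorithm (Figure 1).
    exceeds_limit: response crosses the relevant detection limit
    nls_sig/wnls_sig: Hill fit beats a flat line under NLS/WNLS
    increasing: fitted RMAX > R0 (activator) vs. RMAX < R0 (inhibitor)
    potent: weighted t-test significant at the lowest concentration"""
    sign = "+" if increasing else "-"
    if exceeds_limit and nls_sig and wnls_sig:     # Stage 1: robust response
        return "ACTIVE*[%s1]" % sign
    if potent:                                     # Stage 2: potent responder
        return "ACTIVE*[%s2]" % sign
    if exceeds_limit and (nls_sig or wnls_sig):    # Stage 3: nonrobust response
        return "INCONCLUSIVE*[%s3]" % sign
    return "INACTIVE*"                             # no compelling evidence
```

Note the ordering matters: a profile that is significant under only one of NLS or WNLS falls through Stage 1 and, unless caught in Stage 2, lands in the inconclusive category.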
Simulation study. Activators (RMAX > R0) and inhibitors (RMAX < R0) simulated from Equation 1 will produce fully symmetric profiles and yield identical performance metrics for equivalent R0, |RMAX – R0|, AC50 and SLOPE. Therefore, simulations were conducted only for activators. A total of 10,000 hypothetical substances were simulated for each concentration–response data set with R0 = 0%, where each data set included 2,000 simulated actives (|RMAX| ≥ 25% of positive control activity) and 8,000 simulated inactives (RMAX = 0%). Three different simulation settings were explored (see Table 2). Case 1 explored the performance of the algorithm under different error structures. Case 2 assessed the effect of the SLOPE parameter when residual error was set to a value typical of qHTS data [σ = 25%; see Supplemental Material, Table S1 (http://dx.doi.org/10.1289/ehp.1104688)]. Case 3 examined the performance of the algorithm when various numbers of data points (1, 3, 5, 7, or 10) were removed to produce a smaller effective sample size n. The R package “drc” (Ritz and Streibig 2005) was used to fit all concentration–response models, and outlier detection followed a two-step algorithm that was combined with curve fitting (Wang et al. 2010). (For more information about the simulation see Supplemental Material, pp. 4–5).
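Under the assumptions just described (R0 = 0, Gaussian residual error, inactives with RMAX = 0), one simulated data set can be generated as follows; the concentration grid is illustrative, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
conc = np.logspace(-4, 2, 14)   # 14 test concentrations (uM), illustrative

def simulate_qhts(n_active, n_inactive, rmax, ac50, slope=1.0, sigma=25.0):
    """Responses (% of positive control) from Equation 1 with R0 = 0 and
    N(0, sigma^2) residual error; inactives have RMAX = 0."""
    active = rmax / (1.0 + (ac50 / conc) ** slope)
    truth = np.vstack([np.tile(active, (n_active, 1)),
                       np.zeros((n_inactive, conc.size))])
    return truth + rng.normal(0.0, sigma, truth.shape)

# 2,000 simulated actives and 8,000 simulated inactives, as in the text
data = simulate_qhts(2000, 8000, rmax=50.0, ac50=0.1)
```

Each row is one hypothetical substance; Case 3 can be mimicked by randomly dropping columns from each row.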
Type I error rates were estimated for null hypothesis cases (RMAX = 0%) by evaluating the empirical proportions of trials in which the algorithm assigned a simulated null hypothesis as active (ACTIVE*[±1] or ACTIVE*[±2]). For computational purposes, INCONCLUSIVE*[±3] calls were treated as inactive. Sensitivities were estimated by evaluating the empirical proportions of true active cases (|RMAX| ≥ 25%) assigned as active (ACTIVE*[±1] or ACTIVE*[±2]). In all cases, the significance level (α) for statistical testing was set to 0.05.
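Computing the empirical rates from a vector of calls is straightforward; as in the text, INCONCLUSIVE*[±3] calls are counted as inactive (the example calls below are hypothetical):

```python
import numpy as np

def error_rates(calls, truly_active):
    """Empirical type I error and sensitivity from activity calls;
    INCONCLUSIVE*[+/-3] calls are treated as inactive, per the text."""
    called_active = np.array([c.startswith("ACTIVE") for c in calls])
    type_i = called_active[~truly_active].mean()       # false positive rate
    sensitivity = called_active[truly_active].mean()   # true positive rate
    return type_i, sensitivity

calls = ["ACTIVE*[+1]", "INACTIVE*", "INCONCLUSIVE*[+3]", "ACTIVE*[+2]"]
truth = np.array([True, True, False, False])
t1, sens = error_rates(calls, truth)
```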
The area under receiver operating characteristic (ROC) curves was used as the primary statistic to assess performance. ROC graphs describe the relationship between sensitivity (true positive rate, or power) and 1 – specificity (false positive rate, or type I error rate) of a classification method and are not influenced by skewed class distributions or unequal classification error costs (Fawcett 2006). The area under the curve (AUC) of each ROC graph was calculated using the trapz() function in the R package “caTools” (Tuszynski 2009). AUC ranges from 0.0 to 1.0 and gives the probability that the algorithm correctly classifies true actives and true inactives based on the known parameter values used to simulate the data. Random performance is indicated by AUC = 0.5. Here, AUC ≥ 0.75 is taken to indicate good performance, whereas AUC ≥ 0.9 indicates excellent performance.
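The AUC computation is plain trapezoidal integration over the (false positive rate, true positive rate) operating points; the paper uses the equivalent trapz() from the R package caTools. The points below are hypothetical:

```python
import numpy as np

def roc_auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    fpr, tpr = np.asarray(fpr, float), np.asarray(tpr, float)
    order = np.argsort(fpr)                    # integrate left to right
    x, y = fpr[order], tpr[order]
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

# Hypothetical operating points from a threshold sweep
auc = roc_auc([0.0, 0.1, 0.3, 1.0], [0.0, 0.6, 0.9, 1.0])
```

A diagonal ROC curve gives AUC = 0.5, the random-performance baseline mentioned above.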
Analysis of androgen receptor agonist assay data. Chemical genomics profiling data from a previously published androgen agonist assay were obtained for the 1,408 compounds in the NTP Tox21 compound collection (Huang et al. 2011). Compounds in that study were dissolved in dimethyl sulfoxide and tested at 14 concentrations ranging from 4.90 × 10⁻⁴ μM to 76.63 μM. For p < 0.05, the three-stage algorithm proposed here classified 82 compounds as active (26 activators and 58 inhibitors), 100 compounds as inconclusive (55 activators and 44 inhibitors), and the remaining 1,225 compounds as inactive. These calls were compared to activity calls generated by a curve class procedure (Huang et al. 2011) and the Parham method (Parham et al. 2009) and results obtained from single-stage tests, including F-tests based on NLS or WNLS curve fits, robust linear regression, Student’s t-tests and weighted t-tests (Table 3). There was substantial overlap and notable differences between these outcomes even though all approaches used the same statistical significance threshold (p < 0.05). Of the 26 activator hits identified by the three-stage algorithm, the curve class method placed 15 in curve class 1 (full sigmoidal response profiles), 8 in curve class 2 (partial response profiles with one asymptote), and 1 compound each into curve classes 3 (single point activity), 4 (inactive) and 5 (undefined). The Parham method shared 19 of the 26 three-stage actives, with 2 inconclusive activators, 1 inconclusive inhibitor, and 4 inactives. The NLS and WNLS methods each contained the same 26 compounds in common with the three-stage approach, but the robust linear regression approach had only 21 of the 26 actives in common. Calls based on Student’s t-test and the weighted t-test shared 11 and 7 compounds, respectively, in common with the three-stage approach. The full comparison between approaches is presented in Table 3.
Overview of simulation studies. The performance of the algorithm was investigated for all 171 simulated qHTS data sets by examining combinations of AC50 (three levels), |RMAX| (three levels), and R0 (one level) for three different cases (Table 2). Case 1 varied the error structure, Case 2 varied the Hill slope, and Case 3 varied the number of available data points. An AUC corresponding to each ROC curve was calculated for each parameter configuration, except when σ = 5% (9 data sets) since no false positives were returned under this condition. AUCs from the remaining 162 data sets were compared between the proposed algorithm and overall F-tests comparing the fit of the Hill model with a straight line (NLS or WNLS), t-tests (Student’s t-tests or weighted t-tests), and robust linear regression, as shown in Supplemental Material, Figure S1 (http://dx.doi.org/10.1289/ehp.1104688). In general, performance was not good (AUC ≤ 0.75) when |RMAX| = 25%, but the proposed algorithm showed similar or improved performance compared to overall F-tests in almost every scenario, and performed considerably better than overall F-tests for AC50 = 0.001 μM. The proposed method usually outperformed t-tests when AC50 = 10 μM (fewer data points with detectable responses), but did not perform as well as t-tests in some instances when AC50 < 10 μM (increased number of detectable responses). The proposed approach outperformed robust linear regression in almost every scenario. Compared to the proposed method, t-tests generally had smaller type I error rates (see Supplemental Material, Figure S2), but t-tests also had noticeably reduced power when AC50 > 0.001 μM (see Supplemental Material, Figure S3).
Case 1: 14-point concentration–response curves. A total of 54 simulated qHTS data sets were used to evaluate the proposed algorithm for nine configurations involving changes in AC50 (three levels) and |RMAX| (three levels) for R0 = 0 and SLOPE = 1 under six different residual error structures (Table 2). Residual errors were modeled as ε ~ N(0, σi²) for σi = (5%, 10%, 25%, 50%, 100%, and f(Ci)), where σi is expressed as percent of positive control activity at concentration i and f(Ci) = 9.7355 + 0.1146 × Ci. The function f(Ci) is based on the best fit line between σi and concentration derived from qHTS data generated from human nuclear receptor agonist-mode assays [Huang et al. 2011; see also Supplemental Material, Table S1 (http://dx.doi.org/10.1289/ehp.1104688)]. Table 4 summarizes the operating characteristics of the proposed approach for Case 1. Type I (false positive) error rates did not exceed 0.05 for true inactives when σi = (5%, 10%, 25%, f(Ci)), remained close to 0.05 for σi = 50%, and consistently exceeded 0.05 when σi = 100% (see also Supplemental Material, Figure S2). Notably, type I error rates increased with increasing residual error, with no false positives at σi = 5%. For known actives, the proposed approach exhibited greater power with increasing |RMAX|. The power decreased with increasing residual error and was almost always above 80% when |RMAX| = 100% in constant error (σi = 25%) and heteroscedastic error (σi = f(Ci)) scenarios (see Supplemental Material, Figure S3). As shown in Table 4, the proposed algorithm performed well (AUC ≥ 0.75) for scenarios with typical levels of residual error (σ ≤ 25% in most cases in Supplemental Material, Table S1), with even better performance (AUC ≥ 0.9) for |RMAX| > 25%. Table 4 also illustrates that an increasing proportion of activity calls are ACTIVE*[±2] (rather than ACTIVE*[±1]) with increasing residual error.
Figure 3 summarizes the performance of the proposed approach using contour plots and indicates that AUC > 0.75 for all levels of AC50 within the tested range when |RMAX| > 25%. Performance diminished with increasing residual error, and for σ = 100% the approach was only better than random prediction for large maximal responses (|RMAX| > 75%) and lower potencies (AC50 < 1 μM).
Case 2: Evaluating the SLOPE parameter. Combinations of AC50 (three levels), |RMAX| (three levels), and SLOPE (seven levels) were used to investigate the performance of 63 parameter configurations for a range of SLOPE parameter settings (Table 2). As shown in Figure 4, performance was similar for most SLOPE settings, where SLOPE varied from SLOPE = 10⁻⁴ to SLOPE = 100 and |RMAX| took one of three values (25%, 50%, 100%). At |RMAX| = 25%, the proposed algorithm performed poorly for every parameter configuration, while at |RMAX| = 100%, the proposed approach performed similarly well for every SLOPE parameter value examined. When |RMAX| = 50%, most parameter configurations yielded similar performance, except when SLOPE ≤ 0.5.
Case 3: Effects of sample size. A defined number of data points (0, 1, 3, 5, 7, or 10) were randomly removed from each substance’s 14-point concentration–response curve in order to evaluate the operating characteristics of the proposed algorithm for different sample sizes n (Tables 2 and 5). While type I error rates generally increased with n, they were < 0.03 in every case examined here. Power to detect actives increased with increasing n. The performance of the proposed approach was good in most cases (AUC ≥ 0.75) with |RMAX| set to 50% or 100% and AC50 set to 0.001 μM or 0.1 μM, even with as few as four data points. However, the power was greatly reduced when |RMAX| = 25%. As shown in Table 5, an increasing proportion of activity calls are ACTIVE*[±1] (rather than ACTIVE*[±2]) with increasing sample size.
Assessment of health risks posed by an environmental chemical generally proceeds through costly and time intensive studies such as the 2-year rodent bioassay. These in vivo assays can take several years to complete and cost millions of dollars. Yet, an estimated 30,000 unique chemicals are in wide commercial use (Judson et al. 2008; Muir and Howard 2006) and most of these substances have not been tested for adverse effects on humans or the environment. Accordingly, there is a need to prioritize chemicals for standard toxicity testing and to find alternative strategies to evaluate the large inventory of potentially harmful substances (Judson et al. 2010). qHTS holds potential to meet these objectives by augmenting the low throughput animal and tissue testing models with approaches that simultaneously assess large numbers of compounds over a wide chemical space with reduced cost per substance.
Chemical prioritization efforts and structure activity prediction modeling often utilize activity calls as input (e.g., Johnson et al. 2009; Martin et al. 2011; Reif et al. 2010) and, consequently, depend on consistent and reliable methods for making activity calls from the underlying data. However, the incomplete concentration–response profiles frequently observed in qHTS data render nonlinear statistical modeling and parameter testing challenging. It is not possible to determine whether response variances are homoscedastic (constant) or heteroscedastic (not constant) in unreplicated data sets, and few degrees of freedom may be available for statistical testing after curve fitting and outlier detection. Furthermore, traditional methods to assess nonlinear regression model fits depend on graphical diagnostics, but visual inspection of residual plots is not feasible in the qHTS analysis context that can involve thousands of compounds and hundreds of assays. An approach to activity call evaluation was developed here in response to these concerns.
The proposed three-stage activity call algorithm accommodates large volumes of qHTS data and does not require replicate assessments. Actives and inconclusives must produce a response that exceeds the assay detection limit and meet a prespecified statistical significance threshold. However, while the p-values obtained from statistical testing are based on uncertainty in model fits, in practice the true errors are not known. Calculated error estimates may be too low (producing false positives) or too large (producing false negatives) in some instances. In this study, data were simulated under a variety of scenarios (Table 2) to quantify algorithm performance over a broad range of possible profiles. Nevertheless, when dealing with large chemical libraries it may be useful to employ moderated test statistics like those developed for DNA microarray analyses to stabilize variance components [e.g., Cui et al. (2005); Smyth (2004)].
Similar to a PTE approach (S. Peddada, personal communication), the method described here performs well under conditions of homoscedasticity and heteroscedasticity (Table 4). The algorithm also performs well for moderate-to-high response levels across a broad range of parameter space (Figure 3) and with as few as four data points when |RMAX| ≥ 50% and AC50 ≤ 0.1 μM (Table 5). The method can identify substances with full concentration–response curves as well as compounds inducing activity below the lowest tested concentration. The procedure effectively distinguishes substances with robust concentration–response profiles (ACTIVE*[±1] and ACTIVE*[±2]) and compounds with nonrobust concentration–response profiles (INCONCLUSIVE*[±3]). Even so, inconclusive calls may correspond to real activity and can be considered active when there is increased concern to minimize false negatives (e.g., toxicity studies). Compounds without sufficient evidence for activity within the tested concentration range are placed into a final category (INACTIVE*).
The effects of sample size (n) are summarized in Table 5. A small n may result from study designs with < 14 data points, data discarded due to experimental failure, or outlier removal during curve fitting. Type I error rates were < 0.03 in every case examined here (σ = 25%), whereas power and performance varied across parameter configurations. The performance of the classification algorithm was good (AUC > 0.9 in most cases) for almost all examined sample sizes (n = 4, 7, 9, 11, 13, 14) with |RMAX| = 100% and AC50 set to 0.001 μM or 0.1 μM. The algorithm performed well (AUC ≥ 0.75) under almost all scenarios in which |RMAX| exceeded 25%, the modeled detection limit of the qHTS assay.
The three-stage algorithm can be implemented in two steps in the freely available statistical software R (R Development Core Team, Vienna, Austria). Step 1 generates NLS and WNLS curve fits. Step 2 generates activity calls and other summary statistics from the output of Step 1. A computer with an Intel® Xeon® E5430 processor (2.66 GHz) and 2.92 GB of RAM was used with the Microsoft Windows® XP Professional Service Pack 3 operating system to obtain execution times for 1, 10, 100, and 1,000 chemicals. Due to possible memory constraints, it is recommended to use Linux machines when analyzing more than a few thousand chemicals at a time. For Step 1, the run times were approximately (in seconds) 1.5, 6.2, 53.1, and 520.4, respectively, for NLS curve fits and 2.0, 8.2, 53.8, and 532.3, respectively, for WNLS curve fits. For Step 2, the run times to generate activity calls were approximately (in seconds) 0.1, 0.2, 1.9, and 18.5, respectively. The R code for the three-stage algorithm and all simulated data are available upon request.
An automated approach was developed to reliably classify concentration–response data into toxicologically relevant categories: actives (ACTIVE*[±1] or ACTIVE*[±2]), inconclusives (INCONCLUSIVE*[±3]), and inactives (INACTIVE*). The algorithm strategically uses both unweighted and weighted statistical testing in a multiple-decision framework. Active substances are subdivided into two types: ACTIVE*[±1] compounds exhibit concentration–response curves within the tested concentration range, whereas ACTIVE*[±2] substances have already achieved maximal response (or nearly maximal response) at the lowest tested concentration. The approach performed better than single-stage testing approaches and provides insight into nonlinear modeling in high throughput toxicology.
Huang R, Southall N, Cho M-H, Xia M, Inglese J, Austin CP. 2008. Characterization of diversity in toxicity mechanism using in vitro cytotoxicity assays in quantitative high throughput screening. Chem Res Toxicol 21:659–667.
Inglese J, Auld DS, Jadhav A, Johnson RL, Simeonov A, Yasgar A, et al. 2006. Quantitative high-throughput screening: a titration-based approach that efficiently identifies biological activities in large chemical libraries. Proc Natl Acad Sci USA 103:11473–11478.
Johnson RL, Huang R, Jadhav A, Southall N, Wichterman J, MacArthur R, et al. 2009. A quantitative high-throughput screen for modulators of IL-6 signaling: a model for interrogating biological networks using chemical libraries. Mol Biosyst 5:1039–1050.
Judson RS, Houck KA, Kavlock RJ, Knudsen TB, Martin MT, Mortensen HM, et al. 2010. In vitro screening of environmental chemicals for targeted testing prioritization—the ToxCast project. Environ Health Perspect 118:485–492.
Shukla SJ, Huang R, Austin CP, Xia M. 2010. Foundation review: The future of toxicity testing: a focus on in vitro methods using a quantitative high-throughput screening platform. Drug Discov Today 15:997–1007.
Tuszynski J. 2009. caTools: moving window statistics, GOF, Base64, ROC AUC, etc. R package version 1.10. Available: http://CRAN.R-project.org/package=caTools [accessed 28 May 2010].
Waters M, Stasiewicz S, Merrick BA, Tomer K, Bushel P, Paules R, et al. 2008. CEBS—Chemical effects in biological systems: a public data repository integrating study design and toxicity data with microarray and proteomics data. Nucleic Acids Res 36:D892–D900.