Cytotoxicity Burst? Differentiating Specific from Nonspecific Effects in Tox21 in Vitro Reporter Gene Assays

Background: High-throughput screening of chemicals with in vitro reporter gene assays in Tox21 has produced a large database on cytotoxicity and specific modes of action. However, the validity of some of the reported activities is questionable due to the "cytotoxicity burst," which refers to the supposition that many stress responses are activated in a nonspecific way at concentrations close to cell death. Objectives: We propose a pragmatic method to identify whether reporter gene activation is specific or cytotoxicity-triggered by comparing the measured effects with baseline toxicity. Methods: Baseline toxicity, also termed narcosis, is the minimal toxicity any chemical causes. Quantitative structure–activity relationships (QSARs) developed for baseline toxicity in mammalian reporter gene cell lines served as anchors to define the chemical-specific threshold for the cytotoxicity burst and to evaluate the degree of specificity of the reporter gene activation. Measured 10% effect concentrations were related to measured or QSAR-predicted 10% cytotoxicity concentrations, yielding specificity ratios (SR). We applied this approach to our own experimental data and to ∼8,000 chemicals that were tested in six of the high-throughput Tox21 reporter gene assays. Results: Confirmed baseline toxicants activated the reporter genes at concentrations around those causing cytotoxicity, consistent with activation triggered by the cytotoxicity burst. In six Tox21 assays, 37%–87% of the active hits were presumably caused by the cytotoxicity burst (SR < 1), and only 2%–14% were specific (SR ≥ 10) relative to experimental cytotoxicity, whereas 75%–97% were specific relative to baseline toxicity. This difference was caused by a large fraction of chemicals showing excess cytotoxicity. Conclusions: The specificity analysis of measured in vitro effects identified whether a cytotoxicity burst had likely occurred.
The SR analysis not only prevented false positives, but it may also serve as a measure of relative effect potency and can be used for quantitative in vitro–in vivo extrapolation and risk assessment of chemicals. https://doi.org/10.1289/EHP6664


Introduction
The increasing abundance, number, and heterogeneity of anthropogenic chemicals in our environment call for high-throughput effect screening of chemicals while complying with the strategy to reduce, refine, and replace animal testing (Burden et al. 2015; National Research Council 2007). The application of in vitro cell-based bioassays has emerged in the last decade, and their implementation in miniaturized test systems increased their throughput, culminating in collaborative high-throughput screening (HTS) approaches like the Tox21 program (Tice et al. 2013), in which thousands of chemicals were tested in a battery of ∼70 in vitro bioassays (Attene-Ramos et al. 2013), or ToxCast™, in which a smaller number of chemicals were tested in hundreds of cellular bioassays at the receptor and pathway level (Judson et al. 2010) but also in cell-free assays (Sipes et al. 2013). The resulting large database is freely available on the U.S. EPA's Chemistry Dashboard (U.S. EPA 2019) and has been linked to in vivo databases (Hu et al. 2015) and applied for hazard assessment (Pham et al. 2016; Reif et al. 2010) and for quantitative in vitro-in vivo extrapolation (QIVIVE) in risk assessment (Sipes et al. 2017; Thomas et al. 2019; Wetmore 2015).
In vitro assays that use cells stably transfected with reporter genes coupled to the biological receptor of interest are particularly promising tools for classifying chemicals according to their mode of action and/or their potential to disturb biological pathways in humans (Krewski et al. 2020). Upon interaction of a chemical with the receptor, the reporter gene is expressed, triggering the synthesis of reporter proteins and enzymes, e.g., luciferase or β-lactamase, which can be detected by adding appropriate substrates to quantify the enzyme activity. ToxCast™/Tox21 includes reporter gene assays for hormone activity [e.g., estrogen receptors (Filer et al. 2014; Huang et al. 2014) or others (Kleinstreuer et al. 2015)], for activation of metabolic enzymes [e.g., aryl hydrocarbon receptor AhR (Brennan et al. 2015; Murk et al. 1996)], and for adaptive stress responses.
One important confounding factor of in vitro reporter gene activation is the cytotoxicity of the dosed chemicals. Cytotoxicity has been assessed in parallel to reporter gene activation in many of the Tox21 assays. The time dependence of cytotoxicity was assessed for two cell lines (HepG2 and HEK293) using metabolic activity or loss of cell membrane integrity as measures of cytotoxicity. The kinetics of cytotoxicity depended on the underlying pathway: activation of γ-H2AX, which is related to genotoxicity pathways, was particularly fast, followed by pathways associated with mitochondrial disruption. However, no quantitative picture emerged. In our experience, the commonly used fluorometric cell viability assays that detect metabolic activity or loss of cell membrane integrity are prone to artifacts and, as observed by Hsieh et al. (2017), produce inconsistent results. We therefore routinely screen cytotoxicity during reporter gene assay measurements by assessing cell confluence with a cell imager directly after dosing and prior to measuring the activity of the reporter protein. For most chemicals, this imaging method provided effect concentrations similar to those of the fluorometric cell viability assay, but it avoids artifacts such as apparently high metabolic activity when the cells have already disappeared under the microscope.
The minimal toxicity any chemical can elicit is baseline toxicity (or narcosis), which results from the disturbance of the structure and functioning of the cell membrane by the presence of chemicals in the membrane (van Wezel and Opperhuizen 1995). Many functions are lost in response to baseline toxicity, but mitochondria are especially affected: as their membranes become permeable, they lose their capacity for energy transduction, leading to ATP depletion (Vinken and Blaauboer 2017). Lipophilic chemicals that have high sorption affinities to phospholipid membranes trigger baseline toxicity at lower dosed concentrations than hydrophilic chemicals, but the cytotoxic concentrations in the cell membranes do not differ much between different chemicals causing baseline toxicity (van Wezel and Opperhuizen 1995). We confirmed for eight reporter gene cell lines that chemicals triggered baseline toxicity when reaching a critical membrane concentration of approximately 70 mmol/L lipid. We developed quantitative structure-activity relationships (QSARs) for these cell lines to predict the 10% inhibitory concentrations (IC10) from a single chemical parameter, the liposome-water partition constant (K lip/w). Known baseline toxicants (Vaes et al. 1998) were used to derive the QSARs. The QSARs were very similar across cell lines, with differences caused by the assay conditions, mainly the serum content of the medium.
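The critical membrane concentration can be translated into an aqueous effect concentration with a simple equilibrium partitioning calculation, C_lip = D lip/w × C_w. The sketch below is an illustration under stated assumptions, not the QSAR used in this study: it assumes equilibrium between the freely dissolved chemical and the membrane lipid and ignores binding to serum proteins, medium lipids, and plastic, which the cell-line-specific QSARs account for implicitly. The function name is hypothetical.

```python
# Sketch: convert a critical membrane concentration into a freely dissolved
# aqueous concentration via equilibrium partitioning, C_lip = D_lip/w * C_w.
# Assumption: equilibrium only; sorption to serum proteins, medium lipids,
# and plastic is ignored (the cell-line QSARs capture such effects implicitly).

CRITICAL_MEMBRANE_CONC = 0.070  # mol per L lipid (~70 mmol/L lipid, from the text)


def critical_free_concentration(log_d_lipw: float,
                                c_mem_crit: float = CRITICAL_MEMBRANE_CONC) -> float:
    """Freely dissolved concentration (mol/L water) at which the membrane
    concentration reaches the critical value."""
    return c_mem_crit / 10.0 ** log_d_lipw


# Each log unit of D_lip/w lowers the critical aqueous concentration tenfold.
for log_d in (1.0, 3.0, 5.0):
    c_w = critical_free_concentration(log_d)
    print(f"log D_lip/w = {log_d:.1f} -> C_w,crit = {c_w:.1e} mol/L")
```

This is why lipophilic chemicals reach baseline toxicity at lower dosed concentrations than hydrophilic ones, even though the membrane concentration at the effect is similar.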
Close to cell death, cells activate many cellular signaling pathways; hence, exposure to high concentrations of chemicals may lead to a nonspecific activation of the reporter gene, a phenomenon termed the "cytotoxicity burst." Judson et al. (2016) developed a statistical approach to identify the cytotoxicity burst in large in vitro platforms that include thousands of data points. Another measure of chemical selectivity was the number of assays activated by a chemical, together with the ratio of the effect concentration in the most sensitive assay to the 10th percentile of the distribution across all assays (Thomas et al. 2013). Fay et al. (2018) recently refined the diagnosis of the cytotoxicity burst phenomenon. They proposed a diagnostic odds ratio that differentiates assays that respond in the range of cytotoxic concentrations from those that are active at much lower concentrations and are hence true responders. They also compared the baseline toxicity concentrations predicted by Fischer et al. (2017) with the threshold for the cytotoxicity burst and concluded that the cytotoxicity burst phenomenon is more complex than baseline toxicity.
The goal of the present study was to quantify the degree of specificity of a chemical for each reporter gene assay directly from the experimental effect data, without the need to analyze them in the context of other chemicals and bioassays. We hypothesized that if reporter gene activation occurs at concentrations close to baseline toxicity, it is likely not a specific effect but a consequence of the cytotoxicity burst. To test this hypothesis, we measured the cytotoxicity and reporter gene activation of seven confirmed baseline toxicants and eight additional chemicals of environmental and toxicological relevance in eight standardized and widely used in vitro reporter gene assays. For chemicals that are more cytotoxic than predicted for baseline toxicity, we investigated the relationship between the receptor or pathway affected and the degree of cytotoxicity enhancement over baseline toxicity. To capture the big picture of an HTS in vitro platform, we applied the specificity analysis to the Tox21 in vitro reporter gene database to evaluate what role the cytotoxicity burst may play.

Experimental Study Chemicals
We selected seven confirmed baseline toxicants (2-Phenylphenol, 3-Nitroaniline, 4-Chloro-3-methylphenol, 4-Pentylphenol, 2-Allylphenol, 2-Butoxyethanol, 2,4,5-Trichloroaniline) that had been used to set up baseline toxicity QSARs for eight reporter gene cell lines, and seven additional chemicals that are frequently found in environmental samples (Bisphenol A, Quinoxyfen, Fluoranthene) or in foodstuff (Genistein, Coumarin, Zingerone, 8-Gingerol). Their names and physicochemical properties are listed in Table S1. None of the tested chemicals fell below the volatility cutoff of a medium-air partition constant of 10,000.
The experimental liposome-water partition constants (K lip/w) of the seven baseline toxicants were taken from Vaes et al. (1997). For the second set, the K lip/w values stem from Kwon et al. (2006) and van der Heijden and Jonker (2009). We included ionizable chemicals that dissociate into a neutral (f neutral) and an ionized fraction (f ionized) in the in vitro assay medium at pH 7.4. From the acidity constants pKa and the K lip/w of the neutral and charged species (Table S1), ionization-corrected liposome-water distribution ratios [D lip/w (pH 7.4)] can be calculated (Equation 1) or measured directly (Henneberger et al. 2020).
$$D_{\mathrm{lip/w}}(\mathrm{pH}\ 7.4) = f_{\mathrm{neutral}} \times K_{\mathrm{lip/w,neutral}} + f_{\mathrm{ionized}} \times K_{\mathrm{lip/w,ionized}} \tag{1}$$
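Equation 1 needs the neutral and ionized fractions, which follow from the Henderson-Hasselbalch equation. The sketch below combines both steps; it assumes a monoprotic acid or base, and the function names and example values (a hypothetical acid with pKa 7.4, log K lip/w of 3.0 for the neutral and 2.0 for the ionized species) are illustrative only.

```python
import math


def fraction_neutral(pka: float, ph: float = 7.4, is_acid: bool = True) -> float:
    """Henderson-Hasselbalch fraction of the neutral species of a monoprotic
    acid (is_acid=True) or base (is_acid=False)."""
    if is_acid:
        return 1.0 / (1.0 + 10.0 ** (ph - pka))
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def log_d_lipw(log_k_neutral: float, log_k_ionized: float, pka: float,
               ph: float = 7.4, is_acid: bool = True) -> float:
    """Equation 1: D_lip/w = f_neutral*K_lip/w,neutral + f_ionized*K_lip/w,ionized."""
    f_neutral = fraction_neutral(pka, ph, is_acid)
    f_ionized = 1.0 - f_neutral
    d = f_neutral * 10.0 ** log_k_neutral + f_ionized * 10.0 ** log_k_ionized
    return math.log10(d)


# Hypothetical acid, 50% ionized at pH 7.4: D = 0.5*1000 + 0.5*100 = 550
print(round(log_d_lipw(3.0, 2.0, pka=7.4), 2))  # 2.74
```

Because the ionized species sorbs less strongly, D lip/w drops below K lip/w of the neutral species as soon as a chemical is substantially ionized.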

Reporter Gene Assays
Eight reporter gene assays (Table 1) were performed as described in Neale et al. (2017). Cell confluency served as a surrogate for cell viability, as previously described. Briefly, 2,500 to 5,000 cells in 30 µL medium were plated in each well of a black 384-well polystyrene microtiter plate with clear bottom (AREc32: #3764; all other cell lines: BioCoat #356663, Corning), leaving the last column without cells as a control for the GeneBLAzer cell lines, and incubated for 24 h at 37°C and 5% CO2 to let the cells attach. All medium components were purchased from Gibco. The media were 90% DMEM + GlutaMAX plus 10% FBS, 100 U/mL penicillin, and 100 µg/mL streptomycin for AhR-CALUX and AREc32 cells; 90% phenol red-free DMEM, 10% dialyzed FBS, 0.1 mM NEAA, 25 mM HEPES, 1 mM sodium pyruvate, 100 U/mL penicillin, 100 µg/mL streptomycin, and 4 mM GlutaMAX for ARE-BLA; and 98% Opti-MEM supplemented with 2% charcoal-stripped FBS, 100 U/mL penicillin, and 100 µg/mL streptomycin for all other GeneBLAzer cell lines. During the initial 24 h, the cell number did not increase visibly, but the cells attached. Plated cell numbers were adjusted between 2,500 and 5,000 cells per well, depending on the cell line, so that confluency was ∼30% to 50% prior to dosing and no more than 80% after 24 h of exposure. Before dosing and after an additional 24 h of exposure, cell confluency was measured with an IncuCyte S3 live cell imaging system (Essen BioScience).

Dosing of Chemicals
Chemical stocks dissolved in DMSO, or neat liquid chemicals, were dosed into medium at 4× the final concentrations on 384-well plates with a Tecan D300e Digital Dispenser (Tecan), and then 10 µL per well of chemicals in medium were transferred with a 96-channel pipetting head (Hamilton Microlab Star) into the cell plates, which contained cells in 30 µL medium as described above. The chemicals were dosed at final concentrations up to three times their predicted IC10 for baseline toxicity; the dose range is depicted in the concentration-response curves (Figures S1-S15), with different symbols for each independent experiment (n ≥ 3).
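The dosing scheme implies a fourfold dilution in the well: 10 µL of a 4× solution added to 30 µL of medium gives the 1× final concentration. The sketch below checks this arithmetic and builds a dosing plan that tops out at three times the predicted IC10,baseline. The number of levels and the twofold dilution factor are hypothetical choices for illustration, not the series used in this study.

```python
def final_concentration(working_conc: float, added_ul: float = 10.0,
                        well_ul: float = 30.0) -> float:
    """Concentration in the well after adding `added_ul` of dosing solution
    to `well_ul` of medium that already contains the cells."""
    return working_conc * added_ul / (added_ul + well_ul)


def dosing_plan(ic10_baseline: float, top_multiple: float = 3.0,
                n_levels: int = 8, dilution_factor: float = 2.0):
    """Hypothetical serial dilution: (final, 4x working) concentration pairs,
    starting at top_multiple * IC10,baseline and diluting downward."""
    top = top_multiple * ic10_baseline
    plan = []
    for i in range(n_levels):
        final = top / dilution_factor ** i
        plan.append((final, 4.0 * final))  # 4x solution prepared in medium
    return plan


# Example for a hypothetical IC10,baseline of 100 µM (1e-4 mol/L):
for final, working in dosing_plan(1e-4):
    print(f"final {final:.2e} M  <- working {working:.2e} M")
```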

Data Evaluation
Cytotoxicity was expressed as percent inhibition of cell viability relative to unexposed cells, calculated from the ratio of the confluency of exposed cells to that of unexposed cells. The inhibitory concentration for 10% cytotoxicity, IC10 (Equation 2), was determined from the linear range of the concentration-cytotoxicity curves (% cytotoxicity = slope × concentration), as described previously.
$$\mathrm{IC}_{10} = \frac{10\%}{\text{slope}} \tag{2}$$
The effect concentrations EC10 were calculated analogously from the linear concentration-effect curves (% effect = slope × concentration) with Equation 3.
$$\mathrm{EC}_{10} = \frac{10\%}{\text{slope}} \tag{3}$$
For AREc32 and ARE-BLA, the effect concentration causing an induction ratio (IR) of 1.5, EC IR1.5, was derived from the linear regression of IR against concentration (IR = 1 + slope × concentration) for IR < 4 (Equation 4).
$$\mathrm{EC}_{\mathrm{IR}1.5} = \frac{0.5}{\text{slope}} \tag{4}$$
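Equations 2-4 amount to a least-squares regression through the origin followed by inverting the fitted line at the target effect level. The sketch below illustrates this with hypothetical data. The standard error is propagated with the first-order delta method, which is an assumption standing in for the procedure of Escher et al. (2018) that is cited for the Tox21 analysis; all function names are hypothetical.

```python
import math


def fit_through_origin(conc, resp):
    """Least-squares slope of resp = slope * conc (no intercept) and its SE."""
    sxx = sum(c * c for c in conc)
    slope = sum(c * r for c, r in zip(conc, resp)) / sxx
    n = len(conc)
    rss = sum((r - slope * c) ** 2 for c, r in zip(conc, resp))
    # One fitted parameter, so the residual variance has n - 1 degrees of freedom.
    se_slope = math.sqrt(rss / (n - 1) / sxx) if n > 1 else float("nan")
    return slope, se_slope


def effect_concentration(conc, resp, level=10.0):
    """Equations 2 and 3: concentration where the fitted line reaches `level`
    (10% for IC10 or EC10); SE via the delta method (assumption)."""
    slope, se_slope = fit_through_origin(conc, resp)
    ec = level / slope
    se_ec = level / slope ** 2 * se_slope
    return ec, se_ec


def ec_ir15(conc, ir, ir_max=4.0):
    """Equation 4: fit IR - 1 = slope * conc using IR < ir_max; EC_IR1.5 = 0.5/slope."""
    pts = [(c, r - 1.0) for c, r in zip(conc, ir) if r < ir_max]
    c_used = [c for c, _ in pts]
    y_used = [y for _, y in pts]
    return effect_concentration(c_used, y_used, level=0.5)


# Hypothetical data: concentrations in µM, effect in %.
ec10, se = effect_concentration([1, 2, 3, 4], [5, 10, 15, 20])
print(ec10, se)  # 2.0 0.0
```

Fitting without an intercept forces the line through zero effect at zero concentration, which is what makes the single slope sufficient to define IC10, EC10, and EC IR1.5.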

Baseline Toxicity QSARs
Baseline toxicity QSARs of the form given in Equation 5, previously developed for the eight cell lines (Table 1), were used to predict the IC10 for baseline toxicity (IC10,baseline) of the study chemicals using the ionization-corrected log D lip/w (pH 7.4) calculated with Equation 1.
$$\log\left(\frac{1}{\mathrm{IC}_{10,\mathrm{baseline}}\,(\mathrm{M})}\right) = \text{slope} \times \log D_{\mathrm{lip/w}}(\mathrm{pH}\ 7.4) + \text{intercept} \tag{5}$$
We replaced the K lip/w of the neutral species in the original QSAR with the ionization-corrected D lip/w (pH 7.4) in Equation 5 to also include charged chemicals. The extension of QSARs for neutral chemicals to ionizable chemicals was previously described for bacteria and for the zebrafish embryo toxicity test (Klüver et al. 2019). A potential ion-trapping effect does not need to be accounted for as long as the pH does not deviate much from 7.4, because ion trapping becomes relevant only if the intracellular and extracellular pH differ by more than one pH unit.
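Applying Equation 5 is a single log-linear evaluation followed by back-transformation to a molar concentration. In the sketch below, the slope and intercept are placeholder values chosen for illustration only; the actual cell-line-specific coefficients are those reported in Table 1.

```python
def ic10_baseline(log_d_lipw: float, slope: float, intercept: float) -> float:
    """Equation 5: log(1/IC10,baseline [M]) = slope * log D_lip/w + intercept.
    Returns IC10,baseline in mol/L."""
    log_inverse_ic10 = slope * log_d_lipw + intercept
    return 10.0 ** (-log_inverse_ic10)


# Placeholder coefficients for illustration only (NOT the Table 1 values).
HYPOTHETICAL_SLOPE, HYPOTHETICAL_INTERCEPT = 0.8, 1.0
for log_d in (1.5, 3.0, 4.5):
    ic10 = ic10_baseline(log_d, HYPOTHETICAL_SLOPE, HYPOTHETICAL_INTERCEPT)
    print(f"log D_lip/w = {log_d}: IC10,baseline = {ic10:.2e} M")
```

With a positive slope, the predicted IC10,baseline decreases as log D lip/w increases, reproducing the expectation that more lipophilic chemicals are more potent baseline toxicants.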

Specificity Analysis
The toxic ratio (TR, Equation 6) is a measure of enhanced cytotoxicity, i.e., how much more potent a chemical is than predicted for baseline toxicity.
$$\mathrm{TR} = \frac{\mathrm{IC}_{10,\mathrm{baseline}}}{\mathrm{IC}_{10}} \tag{6}$$
It is commonly accepted that TR ≥ 10 is associated with specific or reactive toxicity (Maeder et al. 2004), and any chemical with TR < 10 is considered a baseline toxicant.
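We defined the specificity ratio (SR) as the ratio between the cytotoxicity (IC10) and the effect concentration (EC10 or EC IR1.5). The SR can relate either to the experimental IC10 (SR cytotoxicity, Equation 7) or to the predicted IC10,baseline (SR baseline, Equation 8), as conceptually illustrated in Figure 1.
$$\mathrm{SR}_{\mathrm{cytotoxicity}} = \frac{\mathrm{IC}_{10}}{\mathrm{EC}_{10}} \quad\text{or}\quad \mathrm{SR}_{\mathrm{cytotoxicity}} = \frac{\mathrm{IC}_{10}}{\mathrm{EC}_{\mathrm{IR}1.5}} \tag{7}$$
$$\mathrm{SR}_{\mathrm{baseline}} = \frac{\mathrm{IC}_{10,\mathrm{baseline}}}{\mathrm{EC}_{10}} \quad\text{or}\quad \mathrm{SR}_{\mathrm{baseline}} = \frac{\mathrm{IC}_{10,\mathrm{baseline}}}{\mathrm{EC}_{\mathrm{IR}1.5}} \tag{8}$$
In analogy with the threshold for TR, we considered SR < 1 as not specific, 1 ≤ SR < 10 as moderately specific (with high uncertainty), 10 ≤ SR < 100 as specific, and SR ≥ 100 as highly specific.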
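The three ratios and the specificity classes translate directly into code. This is a minimal sketch with hypothetical concentrations; a ratio of exactly 1 is assigned to "moderately specific" here, following the class boundaries 1 ≤ SR < 10, 10 ≤ SR < 100, and SR ≥ 100.

```python
def toxic_ratio(ic10_baseline: float, ic10: float) -> float:
    """Equation 6: TR = IC10,baseline / IC10."""
    return ic10_baseline / ic10


def sr_cytotoxicity(ic10: float, ec: float) -> float:
    """Equation 7; ec is EC10 or EC_IR1.5."""
    return ic10 / ec


def sr_baseline(ic10_baseline: float, ec: float) -> float:
    """Equation 8; ec is EC10 or EC_IR1.5."""
    return ic10_baseline / ec


def classify_sr(sr: float) -> str:
    """Specificity classes; SR exactly 1 counts as 'moderately specific'."""
    if sr < 1:
        return "not specific"
    if sr < 10:
        return "moderately specific"
    if sr < 100:
        return "specific"
    return "highly specific"


# Hypothetical concentrations in mol/L:
ic10, ic10_base, ec10 = 2.0e-5, 4.0e-5, 1.0e-6
print(toxic_ratio(ic10_base, ic10),
      classify_sr(sr_cytotoxicity(ic10, ec10)),
      classify_sr(sr_baseline(ic10_base, ec10)))
```

Because SR baseline = TR × SR cytotoxicity, the two specificity ratios agree whenever TR is close to 1 and diverge for chemicals with excess cytotoxicity.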

Tox21 Data Extraction and Processing
Concentration-effect data for the 8,628 chemicals from the Tox21 10K library with available chemical identifiers (CAS number and DSSTox ID) were extracted for ARE-BLA, AhR-CALUX, PPARγ-BLA, AR-BLA, ERα-BLA, and GR-BLA (see Table 1 for Tox21 assay notation). Reporter gene activation and cell viability data were downloaded from the Tox21 Concentration-Response Browser as percent of the maximum effect of the positive control (= 100%) (https://sandbox.ntp.niehs.nih.gov/tox21curve-visualization, last accessed 10 June 2019). No data for AREc32 were available in Tox21. The database includes effect data from the U.S. EPA, the National Toxicology Program, and the Food and Drug Administration laboratories. In Tox21, some chemicals were tested multiple times in one or more laboratories; provided that the same chemical ID was tested (same vendor and purity), the concentration-effect data were merged and reevaluated with the same linear concentration-response analysis as that used for the experimental data measured in this study, yielding a single IC10 and EC10 for each chemical and assay. In the BLA bioassays, cell viability was quantified via adenosine triphosphate (ATP) with CellTiter-Glo, and the decrease of the signal served as the measure of cytotoxicity. In AhR-CALUX, cytotoxicity was quantified with CellTiter-Fluor based on protease activity. Note that cell viability was measured by microscopy in our own experiments; thus, differences in sensitivity between the two cytotoxicity measurement methods may be a source of error. For the Tox21 ARE-BLA, PPARγ-BLA, AR-BLA, and ERα-BLA, the reporter gene constructs, cell lines, and specific effect measurement techniques were the same as those used in our experiments. AhR agonism was measured by CALUX luciferase quantification in both Tox21 and our experiments, but Tox21 used the HG2L7.5c1 cells derived from HepG2 (He et al. 2011), whereas we applied H4L7.5c2 cells (Brennan et al. 2015).
Another difference was that we used IR as the effect measure for ARE-BLA. For GR-BLA, we used the construct based on HEK293T, whereas Tox21 applied the HeLa-derived GR-BLA. Cytotoxicity data were available for ARE-BLA and AhR-CALUX, but for the other cell lines, cytotoxicity was derived from the antagonist-mode assays. This substitution can be justified because, in the antagonist mode, the potent reference agonists were added at very low, constant concentrations at which they did not cause any cytotoxicity.
The data were fitted by linear regression as described previously, using MATLAB R2018a with the code detailed in Texts S1 and S2. EC10 and IC10 for all chemicals in the six assays were calculated with Equations 3 and 2, respectively, and standard errors were derived according to Escher et al. (2018). Concentrations triggering >30% effect or >50% cytotoxicity were excluded from the respective fits, because linearity was observed only for the lower portion of the concentration-response curves in in vitro reporter gene assays. Chemicals with EC10 or IC10 > 0 and below their maximum test concentration were classified as active or cytotoxic, respectively. EC10 and IC10 values with relative standard errors >50% were classified as "inconclusive" and excluded from the specificity analysis. The remaining EC10 and IC10 values were analyzed for TR, SR cytotoxicity, and SR baseline with Equations 6-8.
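The screening rules in this paragraph (fit only the lower part of each curve, then call each result active, inconclusive, or not active) can be sketched as two small helpers. This is an illustration under assumptions, not the MATLAB code of Texts S1 and S2: the label "inactive" is a name chosen here for results that are not classified as active, and the relative standard error is taken as SE/EC.

```python
def fit_window(conc, resp, max_resp):
    """Keep only the lower, approximately linear part of a curve
    (max_resp = 30 for effect curves, 50 for cytotoxicity curves)."""
    return [(c, r) for c, r in zip(conc, resp) if r <= max_resp]


def classify_call(ec, se, max_test_conc, max_rse=0.5):
    """'active' if 0 < EC < highest tested concentration and the relative
    standard error is at most max_rse; 'inconclusive' if the relative SE is
    larger; otherwise 'inactive' (hypothetical label)."""
    if ec is None or not (0.0 < ec < max_test_conc):
        return "inactive"
    if se / ec > max_rse:
        return "inconclusive"
    return "active"


print(fit_window([1, 2, 3], [10, 25, 40], max_resp=30))
print(classify_call(1e-5, 1e-6, max_test_conc=1e-4))
```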
The IC10,baseline of all Tox21 chemicals was predicted with the baseline toxicity QSARs (Table 1). The log K lip/w of the neutral species was predicted with a log K ow-based QSAR (Endo et al. 2011; Equation 9), and the log D lip/w (pH 7.4) with Equation 10, which was derived from Equation 1 by assuming a 10 times lower affinity of the ionized species for phospholipid liposomes (Bittermann et al. 2016; Escher et al. 2020). The fractions of the neutral and ionized species were calculated with the Henderson-Hasselbalch equation from the acidity constants pKa, which were estimated with ACD/Percepta pKa using the GALAS algorithm (www.acdlabs.com/software/pka/).
$$\log K_{\mathrm{lip/w}} = 1.01 \times \log K_{\mathrm{ow}} + 0.12 \tag{9}$$
$$D_{\mathrm{lip/w}}(\mathrm{pH}\ 7.4) = K_{\mathrm{lip/w,neutral}} \times \left(f_{\mathrm{neutral}} + 0.1 \times f_{\mathrm{ionized}}\right) \tag{10}$$
IC10,baseline was predicted only for chemicals with 1 < log D lip/w (pH 7.4) < 5; the others were marked "outside QSAR domain." The entire workflow of the Tox21 data analysis is outlined in the SI, Figure S18.
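Equations 9 and 10 and the applicability-domain check chain together as follows. This is a minimal sketch, assuming the neutral fraction has already been obtained (the study estimated pKa values with ACD/Percepta, which is not reproduced here); the function names and example inputs are hypothetical.

```python
import math


def log_klipw_neutral(log_kow: float) -> float:
    """Equation 9: log K_lip/w = 1.01 * log K_ow + 0.12."""
    return 1.01 * log_kow + 0.12


def log_dlipw_ph74(log_kow: float, f_neutral: float) -> float:
    """Equation 10: the ionized species is assumed to sorb 10 times less."""
    k_neutral = 10.0 ** log_klipw_neutral(log_kow)
    d = k_neutral * (f_neutral + 0.1 * (1.0 - f_neutral))
    return math.log10(d)


def in_qsar_domain(log_d: float) -> bool:
    """Applicability domain used for the Tox21 predictions: 1 < log D < 5."""
    return 1.0 < log_d < 5.0


# Hypothetical neutral chemicals with log K_ow of 4.8 and 5.0:
for log_kow in (4.8, 5.0):
    log_d = log_dlipw_ph74(log_kow, f_neutral=1.0)
    print(f"log Kow {log_kow}: log D {log_d:.2f}, in domain: {in_qsar_domain(log_d)}")
```

For a fully ionized chemical, Equation 10 lowers log D lip/w by exactly one log unit relative to the neutral form, which can move a chemical into or out of the applicability domain.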

Cytotoxicity and Reporter Gene Activation of Baseline Toxicants
All seven baseline toxicants triggered cytotoxicity in the eight reporter gene assays, and the IC10 values were consistent with those previously reported. The TR values of the chemicals in all assays were within one order of magnitude around 1 (0.1 ≤ TR < 10); hence, the measured IC10 values were similar to the IC10,baseline predicted by the baseline toxicity QSARs (Figure 2A).
We previously proposed that all concentrations above the IC10 must be omitted before analyzing reporter gene activation. Here, we purposely violated this rule to extrapolate the EC IR1.5 and EC10 whenever possible; in a few cases, valid activation would have been detected even after applying the IC10 cutoff. Because we measured both cytotoxicity and reporter gene activation for the baseline toxicants, SR cytotoxicity was calculated according to Equation 7. In AREc32, all confirmed baseline toxicants activated the oxidative stress response (Figure 2B and Figure S1), albeit with very low SR cytotoxicity ranging from 0.3 to 4.3 (Table S2). Only five chemicals activated ARE-BLA (Figure S2), and none exceeded the SR threshold of 10 [SR cytotoxicity 0.5 to 5.0 (Table S3)]; five chemicals activated AhR-CALUX (Figure S3), with SR cytotoxicity 0.4 to 8.1 (Table S4).
In PPARγ-BLA (Figure S4), 2,4,5-Trichloroaniline showed a specific effect that appeared not to be influenced by cytotoxicity, with an SR cytotoxicity of 28 (Figure 2B and Table S5). For the hormone receptors, no SR cytotoxicity could be derived except for 4-Chloro-3-methylphenol and 4-Pentylphenol in ERα-BLA (Tables S6-S9); nevertheless, low activity was recorded at cytotoxic concentrations (Figures S4-S8).

Figure 1. Conceptual illustration of the proposed specificity analysis framework. The line corresponds to the quantitative structure–activity relationship (QSAR) for baseline toxicity. The experimental effect concentration [log(1/EC)] is depicted by a blue square; the experimental inhibitory concentration leading to 10% cytotoxicity [log(1/IC10)] is depicted by a red circle. The distance between log[1/IC10 (QSAR)] and the experimental log(1/IC10) is the toxic ratio, log TR. The distance between the experimental log(1/IC10) and the experimental log(1/EC) is the specificity ratio log SR cytotoxicity. The distance between log[1/IC10 (QSAR)] and the experimental log(1/EC) is the specificity ratio log SR baseline. For better legibility, log(1/y) was omitted in the graph, but all measures are negative logarithms. Note: EC, effect concentration; exp., experimental; QSAR, quantitative structure–activity relationship; SR, specificity ratio; TR, toxic ratio.

Cytotoxicity and Reporter Gene Activation of Environmental Chemicals
To further explore the cytotoxicity burst, we selected seven environmentally relevant chemicals with diverse physicochemical properties, five of which (Bisphenol A, Quinoxyfen, Fluoranthene, Genistein, Coumarin) were also tested in Tox21 and reported to be active in the Tox21 reporter gene assays corresponding to the assays performed in this study (https://comptox.epa.gov/dashboard; Tables S1-S6). The concentration-response curves in all eight reporter gene assays are depicted in Figures S9-S16, and the IC10 and effect concentrations are listed in the corresponding Tables S10-S17. When the experimental IC10 values were plotted against all baseline QSARs (Figure S19), visual inspection indicated that the more hydrophobic chemicals often fell below the baseline, which could be an artifact of sorptive losses or degradation. The QSARs were developed with a training set of chemicals whose log K lip/w ranged from 0.60 (2-Butoxyethanol) to 4.31 (4-Pentylphenol), but three of the environmental chemicals (Quinoxyfen, 8-Gingerol, and Fluoranthene) exceeded this range (Table S1). The TR analysis of the IC10 for cytotoxicity (Tables S10-S17, Figure 3A) revealed that, apart from one outlier (Genistein in PR-BLA, with a TR of 15), all environmental chemicals caused baseline toxicity in the cytotoxicity end point.
As expected for known estrogen receptor agonists, Genistein (SR cytotoxicity = 118, SR baseline = 441) and Bisphenol A (SR cytotoxicity = 98, SR baseline = 282) were highly specific in ERα-BLA (Figure 3B), and because their TR were <10, SR cytotoxicity and SR baseline were in the same range. Unexpectedly, SR ≥ 10 was found for Genistein in AREc32 (SR cytotoxicity = 38) and for Quinoxyfen in PPARγ-BLA (SR cytotoxicity = 25). All other chemicals either did not activate the reporter gene or had SR cytotoxicity < 10, indicative of moderately specific or nonspecific effects. Bisphenol A had an SR cytotoxicity of 1-1.5 and an SR baseline of 1.2-4 in AREc32, ARE-BLA, and AhR-CALUX, which indicates that these activations were of low specificity and that the estrogenic effect was the only specific pathway among the tested assays.

Specificity Analysis of Tox21 Effect Data
The corresponding cell lines in Tox21 are identical to ours, with the exception of those for AhR-CALUX and GR-BLA (Table 1). Table 2 summarizes the results of the toxic ratio analysis, and Tables 3 and 4 those of the specificity analysis for the six Tox21 reporter gene assays. Individual results are given in Tables S1-S6. The SR cytotoxicity analysis was limited to chemicals for which both EC10 and IC10 were measured. Because a relatively narrow and constant concentration range was tested for all chemicals in Tox21, a large proportion of the chemicals were not tested up to baseline cytotoxic concentrations; therefore, false negatives are possible.
The SR baseline and TR analyses included more chemicals than the SR cytotoxicity analysis because the QSAR-predicted IC10,baseline values were used as an anchor for baseline-associated cytotoxicity. The IC10,baseline could be derived for approximately 60% of the chemicals (59% for ARE-BLA, 57% for AhR-CALUX, 60% for PPARγ-BLA, 57% for AR-BLA, 57% for ERα-BLA, and 57% for GR-BLA); the remainder fell outside the applicability domain of the baseline toxicity QSAR, 1 < log D lip/w (pH 7.4) < 5. Figure 4A and Table 2 report the TR values for all cytotoxic chemicals in the Tox21 reporter gene assays. Approximately 50% of the experimental IC10 values were within one order of magnitude of the IC10,baseline; hence, their TR was <10, and the measured cytotoxicity was baseline-associated. This analysis included both neutral and ionizable chemicals. In principle, the baseline toxicity QSAR can be extended to ionizable chemicals by replacing log K lip/w with the ionization-corrected log D lip/w (pH 7.4), but this has not yet been demonstrated for mammalian cell lines. We therefore split the data set into two subsets of predominantly neutral (f neutral > 98%) and (partially) charged (f neutral < 98%) chemicals; the resulting TR analysis is depicted in Figure S20 and Table S18. Between 44% and 61% of the chemicals were (partially) charged (Table S18). There was little difference between the TR ranges of neutral and charged chemicals (Figure S20), with slightly more chemicals with TR > 10 among the charged chemicals, which can be explained by the fact that some specific modes of action, such as uncoupling, require charged chemicals. Therefore, neutral and charged chemicals were evaluated together.
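The split into predominantly neutral and (partially) charged subsets, followed by a comparison of the fraction of chemicals with TR > 10, can be sketched as below. The record layout (dictionaries with `f_neutral` and `tr` keys) is hypothetical, and a neutral fraction of exactly 98%, which the text leaves unassigned, is placed in the charged subset here as an assumption.

```python
def split_by_charge(records, threshold=0.98):
    """Predominantly neutral: f_neutral > threshold; (partially) charged: the rest.
    Assumption: f_neutral == threshold is counted as charged."""
    neutral = [r for r in records if r["f_neutral"] > threshold]
    charged = [r for r in records if r["f_neutral"] <= threshold]
    return neutral, charged


def fraction_above(records, key="tr", threshold=10.0):
    """Fraction of records whose ratio exceeds the threshold (NaN if empty)."""
    if not records:
        return float("nan")
    return sum(r[key] > threshold for r in records) / len(records)


# Hypothetical records:
records = [
    {"f_neutral": 0.999, "tr": 2.0},
    {"f_neutral": 0.50, "tr": 30.0},
    {"f_neutral": 0.98, "tr": 0.5},
    {"f_neutral": 1.00, "tr": 12.0},
]
neutral, charged = split_by_charge(records)
print(len(neutral), len(charged), fraction_above(neutral), fraction_above(charged))
```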
The other 50% of the cytotoxic chemicals triggered cytotoxicity at lower concentrations than predicted by the baseline QSARs, especially in ARE-BLA and AhR-CALUX, where 55% and 56% of the cytotoxicity data, respectively, exceeded the TR = 10 threshold. In addition, 5% to 25% of the chemicals with experimental IC10 and matching IC10,baseline had TR < 1 (Table 2). These numbers reflect the uncertainty and variability of the assay results, because in theory a true baseline toxicant has TR = 1. Deviations to higher TR can be related to specific cytotoxicity, but those below 1 must reflect uncertainty as well as measurement artifacts. SR cytotoxicity was <10 for a large proportion of the chemicals for which both IC10 and EC10 were reported (Figure 4B), and, apart from ARE-BLA, the majority of those even had SR cytotoxicity < 1 (Table 3). Taking all assays together, only 9% of the active chemicals had SR cytotoxicity > 10 and were classified as specifically active. The highest proportion of specifically active chemicals was found for AR-BLA (14%), with another 21% in the range 1 ≤ SR cytotoxicity ≤ 10, which still means that 65% of all chemicals reported as activating the androgen receptor triggered cytotoxicity at concentrations similar to those of reporter gene activation. For GR-BLA, only 2 of the 121 active chemicals exceeded the range of cytotoxicity by a factor of 10. Only troglitazone in PPARγ-BLA (SR cytotoxicity = 1,742) and only ciclesonide in GR-BLA (SR cytotoxicity = 3,390) were highly specific, whereas the other chemicals were either of low specificity (16% and 12%, respectively, with 1 < SR cytotoxicity < 10) or did not exceed an SR cytotoxicity of 1. Sixteen (8%) and three (1.8%) chemicals were highly specific in the AR-BLA and ERα-BLA assays, respectively, with SR cytotoxicity > 1,000. All were well-known agonists of these hormone receptors.
In contrast, no chemical exceeded an SR cytotoxicity of 1,000 in the ARE-BLA and AhR-CALUX assays.
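The binning used in Tables 2 and 3 (>10, 1 to 10, <1) can be reproduced with a few lines. This is a minimal sketch with hypothetical input values; note that the table bins place a ratio of exactly 10 in the middle bin, whereas the SR classification in the Specificity Analysis section counts SR = 10 as specific.

```python
from collections import Counter

BINS = ("> 10", "1-10", "< 1")


def bin_ratio(value: float) -> str:
    """Bins used in Tables 2 and 3: > 10, 1 <= x <= 10, < 1."""
    if value > 10:
        return "> 10"
    if value >= 1:
        return "1-10"
    return "< 1"


def bin_summary(values):
    """Count and percentage of values in each bin (works for TR or SR)."""
    n = len(values)
    if n == 0:
        return {b: (0, 0.0) for b in BINS}
    counts = Counter(bin_ratio(v) for v in values)
    return {b: (counts[b], 100.0 * counts[b] / n) for b in BINS}


print(bin_summary([0.5, 0.8, 3.0, 10.0, 25.0]))
```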
The SR baseline analysis indicated that a larger proportion of the evaluated Tox21 chemicals were specifically active in the reporter gene assays (Table 4 and Figure 4C). Taking all chemicals together, ∼60% of the active chemicals in the assays were specific with SR baseline > 10, whereas 23% to 38% fell into the range 1 ≤ SR baseline ≤ 10 and were classified as moderately specific, with fewer than 25% having SR baseline < 1.
As in the SR cytotoxicity analysis, more chemicals were highly specific in the hormone receptor assays AR-BLA and ERα-BLA, in a few cases even exceeding an SR baseline of 10^6. Estriol, a metabolite of estradiol and estrone and a known ER ligand, had an SR baseline of 45,000 in ERα-BLA, and fluorometholone, a synthetic glucocorticoid, had an SR baseline > 10^6 in AR-BLA. For ARE-BLA and AhR-CALUX, the range of SR baseline among the specific chemicals was narrow, with only very few chemicals exceeding an SR baseline of 10^4.

Specific Toxicity
We found that 32% to 56% of all chemicals in Tox21 had TR > 10 and could therefore be classified as specifically acting. This is a higher proportion than in the experimental data set of 15 compounds, where most chemicals were classified as baseline toxicants, 7 of which had been selected specifically because they are known baseline toxicants (Vaes et al. 1998). However, in a classification of a large set of ecotoxicity data with different mode-of-action classification tools, 27% to 69% of chemicals were assigned as baseline toxicants, depending on the method (Kienzler et al. 2017). The remaining chemicals are not necessarily expected to have TR > 10, because they also include chemicals that could not be assigned to any mode-of-action class. No such estimates exist for cytotoxicity, but the ranges found in the present analysis seem realistic in comparison with the analysis of ecotoxicity data (Kienzler et al. 2017).

Identification of the Cytotoxicity Burst
Some of the reporter gene assays could be activated by the known baseline toxicants, but, as expected for baseline toxicants, the SR cytotoxicity values ranged within an order of magnitude around 1 (Figure 2B), the only exception being 2,4,5-Trichloroaniline. Thus, the proposed classification (1 ≤ SR < 10 moderately specific, 10 ≤ SR < 100 specific, and SR ≥ 100 highly specific) appeared reasonable, because the confirmed baseline toxicants fell into the range SR < 10. Of course, it is possible that the initial classification of baseline toxicants according to Verhaar et al. (1992, 2000) was faulty, but with respect to cytotoxicity, they all had TR < 10. Furthermore, these chemicals are not reactive and are too small to bind to hormone receptors. The observed reporter gene activation by baseline toxicants supports our earlier proposed data evaluation strategy: all chemical concentrations >IC10 should be excluded from the analysis of reporter gene activation to ensure that the derived effect concentrations are not influenced by the cytotoxicity burst (SR < 1). Considering that the present analysis demonstrated that the cytotoxicity burst is a critical confounding factor in in vitro reporter gene assays, we suggest an even more cautious approach when using reporter gene assay data for risk assessment: any chemical with 1 < SR cytotoxicity < 10 should be further scrutinized. A particularly striking example is butoxyethanol, with an SR cytotoxicity of 0.31 in AREc32, which is even below 1: it would not have been identified as an active chemical if the data had been analyzed only up to the IC10; however, if all activity data had been considered, it would have been mistaken for a specifically acting chemical (false positive).

Table 2 (footnotes). b Number of chemicals n for which a TR could be derived, n(TR). c Binning into the TR categories (TR > 10, 1 ≤ TR ≤ 10, TR < 1); percentages of each category are given in parentheses.

Table 3. Specificity ratio SR cytotoxicity analysis for the six Tox21 in vitro reporter gene assays. Columns: reporter gene assay; n(chemicals)^a; n(SR cytotoxicity)^b; n(SR cytotoxicity > 10)^c; n(1 ≤ SR cytotoxicity ≤ 10)^c; n(SR cytotoxicity < 1)^c. b Number of chemicals n for which an SR cytotoxicity could be derived, n(SR cytotoxicity). c Binning into the SR cytotoxicity categories (SR cytotoxicity > 10, 1 ≤ SR cytotoxicity ≤ 10, SR cytotoxicity < 1); percentages of each category are given in parentheses.
The cytotoxicity burst was even more pronounced for the selected environmental chemicals, of which an even larger proportion triggered the cytotoxicity burst (SR cytotoxicity < 1) or were only moderately specific (1 ≤ SR cytotoxicity < 10) (Figure 3B).
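The classification proposed in this section can be written as a small decision function. This is a sketch assuming SR = IC10/EC10 with both values in the same concentration unit, so that SR < 1 means the reporter gene response appears only at or above cytotoxic concentrations; the function names and the IC10/EC10 pair in the example are hypothetical, while the SR of 0.31 is the butoxyethanol value quoted in the text.

```python
def specificity_ratio(ic10: float, ec10: float) -> float:
    """SR = IC10 / EC10 (same unit); SR < 1 means the effect appears only above IC10."""
    if ic10 <= 0 or ec10 <= 0:
        raise ValueError("concentrations must be positive")
    return ic10 / ec10


def classify_specificity(sr: float) -> str:
    """Map an SR onto the classification proposed in the text."""
    if sr < 1:
        return "cytotoxicity burst"  # effect at or above cytotoxic concentrations
    if sr < 10:
        return "moderately specific"  # within the scatter seen for baseline toxicants
    if sr < 100:
        return "specific"
    return "highly specific"


if __name__ == "__main__":
    print(classify_specificity(0.31))  # SR reported for butoxyethanol in AREc32
    print(classify_specificity(specificity_ratio(ic10=50.0, ec10=2.0)))  # hypothetical pair
```

The same function applies to SR cytotoxicity and SR baseline; only the IC10 passed in differs.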
Not all literature data are subject to such rigorous data treatment; hence, it is very likely that some of the specific activities reported in the literature are false positives caused by the cytotoxicity burst. Cytotoxicity does not equate to baseline toxicity, but baseline toxicity can serve as a reference, and SR baseline (Equation 7) can be a proxy for identifying a potential cytotoxicity burst in the absence of SR cytotoxicity. SR baseline may even be better suited to identify specific effects in cases where a specific mechanism other than the receptor/pathway targeted by the reporter gene assay led to cell death, i.e., for chemicals with TR ≥ 10.
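The relationship between the two specificity ratios can be made explicit. Assuming SR baseline = IC10,baseline/EC10 (Equation 7 is not reproduced in this section, so this form is an assumption consistent with how it is used here) and SR cytotoxicity = IC10/EC10, SR baseline equals SR cytotoxicity multiplied by TR. The hypothetical helper below always computes SR baseline, adds SR cytotoxicity and TR when a measured IC10 exists, and flags SR baseline as the preferred measure when TR ≥ 10; that preference rule is an illustrative reading of the paragraph above, not a published decision rule.

```python
from typing import Dict, Optional


def specificity_report(ec10: float, ic10_baseline: float,
                       ic10: Optional[float] = None) -> Dict[str, object]:
    """Return SR_baseline, plus SR_cytotoxicity and TR when a measured IC10 exists.

    All concentrations must be in the same unit. Because
    SR_baseline = IC10,baseline / EC10, SR_cytotoxicity = IC10 / EC10 and
    TR = IC10,baseline / IC10, it follows that SR_baseline = SR_cytotoxicity * TR.
    """
    report: Dict[str, object] = {"SR_baseline": ic10_baseline / ec10}
    if ic10 is None:
        # No cytotoxicity within the tested range: SR_baseline is the only proxy.
        report["preferred"] = "SR_baseline"
        return report
    tr = ic10_baseline / ic10
    report["SR_cytotoxicity"] = ic10 / ec10
    report["TR"] = tr
    # TR >= 10: cell death is likely driven by another specific mechanism,
    # so SR_baseline may be the more informative specificity measure.
    report["preferred"] = "SR_baseline" if tr >= 10 else "SR_cytotoxicity"
    return report
```

For example, specificity_report(ec10=2.0, ic10_baseline=400.0, ic10=20.0) gives an SR cytotoxicity of only 10 but an SR baseline of 200, because the chemical is 20 times more cytotoxic than predicted for baseline toxicity.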
We did not include p53-BLA in our analysis because it is hard to derive an EC10 in this bioassay: in almost all cases, the p53 pathway is activated close to cell death, so cytotoxicity masks activation, and genuine activation is hard to differentiate from the cytotoxicity burst.

Comparison of Experimental EC and Tox21 Database
To make our own data comparable with the Tox21 database, we reevaluated the concentration-response curves and derived EC10 and IC10 values for the Tox21 assays, whose effect concentrations are typically reported as ACC (Filer et al. 2016; Judson et al. 2016). Neither the ACC nor the previously used AC10 was suitable for the TR and SR analyses, because the baseline toxicity QSARs are available only for IC10, and the ACC is not associated with a fixed effect level but is rather a measure of the lowest concentration that robustly shows a statistically significant effect. In this respect, the ACC is rather similar to the lowest observed effect concentration (LOEC). The EC10 was derived from the absolute 10% effect relative to the maximum effect triggered by a potent reference compound; these types of raw data could be extracted from the Tox21 Concentration-Response Browser as outlined in "Materials and Methods." Our goal was not to evaluate the uncertainty of Tox21 data or to propose an alternative effect measure but to extract robust data for the TR and SR analyses. Therefore, we evaluated all data sets for one compound together, even if they stemmed from different laboratories. If they differed too much (relative standard errors >50%), the entire data set was excluded and classified as "inconclusive"; for most chemicals, however, there was high consistency among labs, and the joint evaluation of all data sets yielded robust and representative EC10 values, as demonstrated for ERα-BLA in Figure S20.
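The joint evaluation across laboratories can be illustrated with a simple linear concentration-effect model. This section does not state which regression model was used, so the sketch below assumes a linear fit through the origin of the low-effect part of the curve (effects up to an assumed 30% of the reference maximum), with EC10 = 10%/slope and the relative standard error of the slope taken as the relative standard error of the EC10; only the 50% exclusion threshold comes from the text. Function names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple, Union

# (concentrations, effect in % of the reference compound's maximum effect)
Dataset = Tuple[Sequence[float], Sequence[float]]


def ec10_linear(conc: Sequence[float], effect_pct: Sequence[float],
                max_effect_pct: float = 30.0) -> Tuple[float, float]:
    """Fit effect = slope * conc through the origin; return (EC10, relative SE)."""
    pts = [(c, e) for c, e in zip(conc, effect_pct) if e <= max_effect_pct]
    if len(pts) < 3:
        raise ValueError("need at least three points in the linear range")
    sxx = sum(c * c for c, _ in pts)
    slope = sum(c * e for c, e in pts) / sxx
    if slope <= 0:
        raise ValueError("no positive concentration-effect trend")
    # Residual variance with n - 1 degrees of freedom (one fitted parameter).
    resid_ss = sum((e - slope * c) ** 2 for c, e in pts)
    se_slope = math.sqrt(resid_ss / (len(pts) - 1) / sxx)
    # EC10 = 10 / slope; to first order its relative SE equals that of the slope.
    return 10.0 / slope, se_slope / slope


def pooled_ec10(datasets: List[Dataset], max_rse: float = 0.5) -> Union[float, str]:
    """Evaluate all data sets of one chemical jointly; 'inconclusive' if RSE > max_rse."""
    conc = [c for cs, _ in datasets for c in cs]
    eff = [e for _, es in datasets for e in es]
    ec10, rse = ec10_linear(conc, eff)
    return "inconclusive" if rse > max_rse else ec10
```

Pooling the raw points, rather than averaging per-lab EC10 values, means that a single discordant laboratory inflates the residual scatter and pushes the whole chemical into the "inconclusive" class, mirroring the exclusion of the entire data set described above. Concentrations above IC10 would be removed before this step.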
The EC10 and IC10 values agreed well between our measurements and the Tox21 database for ARE-BLA (Figure S22A), AhR-CALUX (Figure S22B), and PPARγ-BLA (Figure S22C). The only exception was ERα-BLA, but here the SRs were so low that ERα is clearly not relevant for the chemicals that showed no agreement, such as quinoxyfen with an SR cytotoxicity of 5. The good agreement of most data (Figure S23) indicated that, despite different assay protocols and plate formats, the responses are fairly robust. In the present study, we worked in 384-well plates with larger medium volumes (40 µL) than the Tox21 reporter gene assays, which were performed in 1,536-well plates with 4-6 µL of medium. The use of 384-well plates and larger medium volumes typically reduces chemical losses from the system in comparison with 1,536-well plates, owing to the higher storage capacity of the medium proteins and slower uptake into the well-plate plastic (Fischer et al. 2018), but the difference appeared to be rather small.
Table 4. Specificity ratio SR baseline analysis for the six Tox21 in vitro reporter gene assays.
Reporter gene assay | n(chemicals)^a | n(SR baseline)^b | n(SR baseline > 10)^c | n(1 ≤ SR baseline ≤ 10)^c | n(SR baseline < 1)^c
^b Number of chemicals for which an SR baseline could be derived, n(SR baseline).
^c Binning into the SR baseline categories (SR baseline > 10, 1 ≤ SR baseline ≤ 10, SR baseline < 1). Values in parentheses are the percentages of chemicals in each bin.
Figure 3. (A) Toxic ratios and (B) specificity ratios of the environmental chemicals (Bisphenol A, Quinoxyfen, Fluoranthene, Genistein, Coumarin, 8-Gingerol, Zingerone) in all reporter gene assays. Underlying data are in Tables S10-S17. The solid line is an SR cytotoxicity of 1, and the dashed lines are the thresholds of 10 and 0.1. If the effects did not exceed 10%, then no IC10 and/or EC10 could be derived, and there is no symbol in the figure. Underlying data are in Tables S2-S9.

Analysis of Tox21 Database
Approximately 50% of the chemicals triggered cytotoxicity at concentrations at least 10 times lower than their predicted IC10,baseline. This finding indicates that a specific effect or reactive toxicity led to premature cell death. The mechanisms leading to cytotoxicity need not be the same as the receptor/pathway associated with the reporter gene. A larger proportion of chemicals were specifically toxic (TR ≥ 10) in ARE-BLA (55%) and AhR-CALUX (56%) than in PPARγ-BLA (45%) and the hormone receptor assays (32%-41%). Well-known toxicants stood out, such as digitoxin (TR = 43,000 in ARE-BLA, 3,400 in PPARγ-BLA, 115 in AR-BLA, 99 in ERα-BLA, and 2,200 in GR-BLA; no data in AhR-CALUX), which is cardiotoxic and highly cytotoxic; its proposed mechanisms are related to the oxidative stress response and interferon-related pathways (Prassas et al. 2011), which explains the much higher TR in ARE-BLA. The chemical with the next highest TR was the reactive dye 1,8-dihydroxy-4,5-dinitroanthraquinone, with a TR of 4,600 in ARE-BLA and again much more moderate TRs in the other assays.
The cytotoxicity burst phenomenon may have led to a significant number of false positives in all evaluated Tox21 reporter gene assays. The SR cytotoxicity analysis clearly showed that a large proportion (86% to 97%) of the measured EC10 values were close to or above cytotoxic concentrations (SR cytotoxicity < 10) and were thus potentially impacted by the cytotoxicity burst. Only 1.7%-14.2% of the chemicals were classified as specifically active (SR cytotoxicity ≥ 10), and another large fraction (12%-54%) fell in the range 1 ≤ SR < 10. Given the variability of SR cytotoxicity in our own experimental data for confirmed baseline toxicants (Figure 2B), it is likely that the range 1 ≤ SR < 10 included not only chemicals of low specificity but also some whose effects were triggered by the cytotoxicity burst.
For practical reasons, all Tox21 chemicals were tested in the same concentration range (typically between ∼0.001 and ∼100 µM), so only a small proportion of the chemicals reached IC10 concentrations. Thus, the number of chemicals with both an experimental IC10 and an EC10 was reduced. A larger proportion of chemicals were classified as specific in the SR baseline analysis than in the SR cytotoxicity analysis, partly because the SR baseline analysis could be performed for more chemicals in the absence of experimental cytotoxicity data (up to three times more data points included) and partly because chemicals whose cytotoxicity was triggered by a different pathway had smaller SR cytotoxicity values. A large proportion of the active chemicals (∼60%, with the exception of ERα-BLA) were likely truly specific (Figure 4C). Nevertheless, for 3% to 25% of the active chemicals with an EC10, the cytotoxicity burst presumably led to a false-positive EC10, and a further 23%-30% were only moderately specific; given the experimental variability, some of these could also have been impacted by the cytotoxicity burst. For ERα-BLA, 25% of the EC10 values exceeded IC10,baseline (SR baseline < 1; Figure 4C); thus, the cytotoxicity burst particularly confounded the outcome of this assay.

Conclusion
Our findings clearly confirm previous reports of the cytotoxicity burst phenomenon and provide a systematic way to identify and interpret it. For chemicals with a high TR (i.e., those that are more toxic than predicted for baseline toxicity), one needs to be especially cautious, because the specific effect may in some cases also cause enhanced cytotoxicity, e.g., in the two oxidative stress response assays, AREc32 and ARE-BLA. In these assays, the activation of defense mechanisms might go hand in hand with the higher cytotoxicity caused by specific effects. In contrast, the activation of the hormone receptors in ERα-BLA, AR-BLA, PR-BLA, and GR-BLA is not likely to be associated with increased cytotoxicity, so any activation of hormone receptors at cytotoxic concentrations is likely to be a result of the cytotoxicity burst phenomenon.
Figure 4. (A) Toxic ratio TR, (B) specificity ratio SR cytotoxicity, and (C) specificity ratio SR baseline of chemicals that triggered the specific effect (EC10) and/or were cytotoxic (IC10) within the measured concentration range in the six Tox21 reporter gene assays. The underlying data are in Tables S1-S6. The total number of chemicals included in the analysis is given in the top row [number of chemicals for which a TR could be derived, n(TR); number of chemicals for which an SR cytotoxicity could be derived, n(SR cytotoxicity); number of chemicals for which an SR baseline could be derived, n(SR baseline)]. The percentages at the top refer to all data with TR/SR > 10 (diamond symbols); the percentages at the bottom refer to all data with TR/SR ≤ 10 (circle symbols).
For routine applications of cell-based bioassays, we recommend evaluating activity data only at concentrations smaller than IC10 to avoid false-positive responses (SR < 1). Likewise, measures can be taken to avoid false negatives with respect to the highest tested concentration. The baseline toxicity QSARs can help predict the concentrations at which minimal toxicity (baseline toxicity) is expected. We previously suggested dosing chemicals up to their solubility limit in medium or up to their predicted IC10,baseline, whichever is lower. Although it is challenging in HTS experiments to adjust the concentration range for each chemical, it might be possible to group chemicals according to their physicochemical properties (hydrophobicity, speciation, etc.) and test them groupwise at fixed concentrations depending on their expected IC10 range. Both the medium solubility limit and IC10,baseline can be predicted for neutral chemicals with the log Kow of the chemicals as the sole descriptor (Fischer et al. 2019). For ionizable chemicals, speciation additionally needs to be considered. By ensuring that the tested concentration range reaches cytotoxicity, one can calculate both the SR cytotoxicity and the SR baseline as diagnostic tools for evaluating the specificity of a chemical. This approach will also help to reduce the uncertainty of HTS data: Watt and Judson (2018) identified data with high variability in an uncertainty analysis of the Tox21 data, and it is conceivable that eliminating data triggered by nonspecific effects will decrease this uncertainty.
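The dosing rule and the idea of groupwise testing can be sketched as follows. The baseline QSAR is written in the generic form log(1/IC10,baseline [M]) = a·log Kow + b; the coefficients a and b used in the example are hypothetical placeholders, not the published assay-specific QSARs, and the solubility limits are supplied as inputs rather than predicted. Grouping chemicals by the decade of their top concentration is one possible, hypothetical grouping scheme for neutral chemicals; speciation of ionizable chemicals is ignored.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def ic10_baseline_qsar(log_kow: float, a: float, b: float) -> float:
    """IC10,baseline in M from log10(1/IC10,baseline) = a * logKow + b (placeholder coefficients)."""
    return 10.0 ** -(a * log_kow + b)


def top_concentration(log_kow: float, solubility_m: float, a: float, b: float) -> float:
    """Highest concentration to test: solubility limit or IC10,baseline, whichever is lower."""
    return min(solubility_m, ic10_baseline_qsar(log_kow, a, b))


def group_by_decade(chemicals: Iterable[Tuple[str, float, float]],
                    a: float, b: float) -> Dict[int, List[str]]:
    """Group (name, logKow, solubility in M) by the log10 decade of the top concentration."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for name, log_kow, solubility_m in chemicals:
        top = top_concentration(log_kow, solubility_m, a, b)
        groups[math.floor(math.log10(top))].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical chemicals and QSAR coefficients (a = 1, b = 1), for illustration only.
    chems = [("chem_A", 1.2, 1e-2), ("chem_B", 2.5, 1e-2), ("chem_C", 5.0, 5e-7)]
    print(group_by_decade(chems, a=1.0, b=1.0))
```

Each group could then share one dilution series whose highest concentration is the lower bound of its decade, so that no member is dosed above its own solubility limit or predicted IC10,baseline; in the example, the hydrophobic chem_C is limited by solubility rather than by baseline toxicity.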
Although we have validated the classification method only with a small selection of reporter gene assays, the approach is easily transferable to other reporter gene assays, because the baseline toxicity QSARs are based on constant critical membrane concentrations and can be predicted for any cell-based bioassay, provided that the lipid and protein composition of the cells and medium are known.
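Why a baseline QSAR anchored in a constant critical membrane concentration can be transferred between assays can be illustrated with a simplified equilibrium mass balance. Assuming that baseline toxicity occurs at a constant critical membrane (lipid) concentration, that lipids and proteins are the only sorptive phases in the well, and that partitioning is at equilibrium (losses to plastic and headspace are ignored), the nominal IC10,baseline is the freely dissolved concentration at the critical membrane burden, scaled up by the sorptive capacity of the assay system. All names, partition constants, and volume fractions below are hypothetical; this is not the mass balance model used to derive the published QSARs.

```python
def nominal_ic10_baseline(cmc_lipid: float, k_lip_w: float, k_prot_w: float,
                          phi_lip: float, phi_prot: float) -> float:
    """Nominal concentration (mol/L) at which the membrane lipid reaches cmc_lipid.

    cmc_lipid: critical membrane concentration (mol per L lipid, assumed constant)
    k_lip_w, k_prot_w: lipid-water and protein-water partition constants (L/L)
    phi_lip, phi_prot: volume fractions of lipid and protein in the well (L/L)
    """
    # Freely dissolved concentration at which the lipid phase reaches the critical burden.
    c_free = cmc_lipid / k_lip_w
    # Total = dissolved + lipid-bound + protein-bound (water volume fraction approximated as 1).
    return c_free * (1.0 + k_lip_w * phi_lip + k_prot_w * phi_prot)


if __name__ == "__main__":
    # Same hypothetical chemical in two media that differ only in protein content.
    low_protein = nominal_ic10_baseline(0.1, 1e4, 1e3, phi_lip=1e-5, phi_prot=1e-4)
    high_protein = nominal_ic10_baseline(0.1, 1e4, 1e3, phi_lip=1e-5, phi_prot=1e-3)
    print(low_protein, high_protein)
```

In this simplified picture, only the sorptive capacity term changes between assays, so the same critical membrane concentration yields assay-specific nominal IC10,baseline values once the lipid and protein contents are known.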
For the application of in vitro test methods in risk assessment, it is important that mode-of-action-specific QIVIVE is performed only with data that are not compromised by the cytotoxicity burst.