Elucidating Gene-by-Environment Interactions Associated with Differential Susceptibility to Chemical Exposure

Background: Modern societies are exposed to vast numbers of potentially hazardous chemicals. Despite demonstrated linkages between chemical exposure and severe health effects, there are limited, often conflicting, data on how adverse health effects of exposure differ across individuals. Objectives: We tested the hypothesis that population variability in response to certain chemicals could elucidate a role for gene–environment interactions (GxE) in differential susceptibility. Methods: High-throughput screening (HTS) data on thousands of chemicals in genetically heterogeneous zebrafish were leveraged to identify a candidate chemical (Abamectin) with response patterns indicative of population susceptibility differences. We tested the prediction by generating genome-wide sequence data for 276 individual zebrafish displaying susceptible (Affected) vs. resistant (Unaffected) phenotypes following identical chemical exposure. Results: We found GxE associated with differential susceptibility in the sox7 promoter region and then confirmed gene expression differences between phenotypic response classes. Conclusions: The results for Abamectin in zebrafish demonstrate that GxE associated with naturally occurring, population genetic variation play a significant role in mediating individual response to chemical exposure. https://doi.org/10.1289/EHP2662


Introduction
Little is known about why the effects of environmental agents, including industrial chemicals, manufacturing by-products, metals, pesticides, and herbicides differ between individuals within and across populations (Abdo et al. 2015;National Academies of Sciences, Engineering, and Medicine et al. 2016;Blaser et al. 2013). There is strong evidence that gene-environment interactions (GxE) play an important role in health outcomes, and that these interactions are likely a major source of the heterogeneity in chemical response. It is well established that GxE are an important component of the etiology of complex traits and diseases (Hunter 2005). Pharmacogenomic studies have consistently demonstrated that differential susceptibility to chemical exposure is directly related to genetic variation (Johnson 2003;Motsinger-Reif et al. 2013). Thus, for any genetically diverse population, latent genetic variation may contribute to observed differential susceptibility when challenged by chemical exposure. While it is accepted that such GxE interactions are important, there remain significant challenges-both experimental and statistical-in detecting and characterizing such interactions (Rappaport and Smith 2010;Zeise et al. 2013).
The zebrafish (Danio rerio), with over 26,000 protein-coding genes consisting of orthologues for over 70% of human genes, has gained momentum as a model organism in vertebrate genomics (Howe et al. 2013;Lieschke and Currie 2007). Toxicological and pharmacological applications in chemical biology have also seized upon the many benefits of zebrafish, including the short generation time, well-characterized development, and early maturation as clear embryos (Kimmel et al. 1995). Various morphological end points are easy to observe, and effects of multiple chemical exposures on these outcomes have been broadly studied (Asharani et al. 2011;Bai et al. 2010;Truong et al. 2014;Usenko et al. 2007). These advantages have led to an upward trend in high-throughput zebrafish chemical screens, especially toward screens of many chemicals in many fish, primarily in 96-well plates (Rennekamp and Peterson 2015). Thus, there exists potential for large-scale studies of chemical bioactivity that integrate genetic information to probe mechanisms underlying morphologic response to chemical exposures during development (Baer et al. 2014) or even across multiple generations (Knecht et al. 2017;Kovács et al. 2015).
Zebrafish populations differ from many model organisms in that the standard husbandry practices can be designed to maintain diversity (Nasiadka and Clark 2012), meaning that most laboratory populations contain an unknown level of genetic diversity (Brown et al. 2012). While this diversity is attractive in translating to questions of human and ecological health, it raises critical questions of how unmeasured interindividual genetic variation might contribute to susceptibility differences in response to chemical exposure. Uncharacterized genetic diversity can manifest as apparent error effects within and across laboratories (Rennekamp and Peterson 2015).
Comparisons between named strains and interlab populations of zebrafish have shown variability in several phenotypes, providing the rationale that constitutive genetic variation may contribute to the variability in exposure response (Lange et al. 2013). Unfortunately, partitioning this variability among genetic, environmental, and phenotypic factors is hindered by (non)systematic differences in experimentation, statistical analysis, and most importantly, by a lack of available genetic data for the strains evaluated. Despite the small samples (one to two individual fish) or pooled strategies used in studies aiming to characterize genetic diversity, results have shown between 5 and 15 million single-nucleotide polymorphisms (SNPs) segregating in a zebrafish population, with roughly half of the variants showing evidence of population specificity (Butler et al. 2015;LaFave et al. 2014;Obholzer et al. 2012;Patowary et al. 2013).
To address these limitations, we directly interrogated GxE interactions by individually sequencing entire genomes from a large sample of zebrafish using refined experimental and statistical methods for characterizing phenotypic responses to chemical exposure (Garcia et al. 2016;Truong et al. 2016;Zhang et al. 2016Zhang et al. , 2017. The analytical methods were developed using years of data from high-throughput studies HTS of diverse chemicals in the Tanguay Laboratory's (Oregon State University, Corvallis, OR) Tropical 5D Zebrafish line (T5D). The T5D line is an outbred population of heretofore unknown genetic heterogeneity that has been used to screen thousands of chemicals for adverse biological responses Truong et al. 2014). We leveraged these HTS data to identify a chemical (Abamectin) that elicited biological response patterns indicative of population genetic differences, and then generated genome-wide sequence data to compare individuals displaying differential susceptibility to chemical exposure. To our knowledge, this is the first genome-wide association study (GWAS) using individually sequenced zebrafish drawn from a diverse population. Our approach inverts the paradigm of typical GxE research, where rather than interrogating the same list of usual chemical suspects, we only ask questions of particular compounds that have strong evidentiary support for differential population susceptibility.

Overview of Experimental Approach
First, we exploited the large-scale design of a high-throughput screening (HTS) system that has tested thousands of chemicals in an outbred zebrafish population of unknown genetic heterogeneity to select Abamectin as a chemical that displayed phenotypic patterns of high variability between individual responses. Second, we characterized the correlation structure across morphological end points in order to describe a specific multivariate Affected vs. Unaffected phenotype in zebrafish exposed to Abamectin. Third, we performed range-finding studies to narrow our estimate of the Abamectin critical concentration to that at which a stable 50:50 Affected:Unaffected proportion was observed. Fourth, samples were exposed to the critical concentration of Abamectin, from which we isolated DNA from Affected samples that displayed our multivariate phenotype and Unaffected samples that did not respond to chemical exposure. Fifth, whole-genome sequences were generated from DNA individually isolated from each Affected and Unaffected sample. These data were then used to identify GxE as genetic variants associated with differential response to chemical exposure.

Experimental Population and Developmental Screening System
Adult T5D (wildtype) zebrafish were housed at Sinnhuber Aquatic Research Laboratory at Oregon State University. All experiments in this manuscript used this population, which originated from a minimum of 25 small group crosses, each group containing three males and three females. Each generation was propagated using equal proportions of offspring from the original group crosses. Adult zebrafish were group spawned to produce embryos for the developmental screening system detailed in Truong et al. (2014). In this system, embryos were dechorionated (Truong et al. 2011) and placed into individual wells of a 96-well plate. Chemical exposures were initiated at 6 hpf and evaluated at 120 hpf (hours post fertilization) for 17 morbidity and mortality end points. The end points evaluated were MORT: mortality at 120 hpf, YSE: yolk sac edema, AXIS: bent body axis, EYE: eye, SNOU: snout, JAW: jaw, OTIC: otic, PE: pericardial edema, BRAI: brain, SOMI: somite, PFIN: pectoral fin, CFIN: caudal fin, CIRC: circulation, PIG: pigmentation, TRUN: trunk length, SWIM: swim bladder, and NC: notochord distortion. Each end point was recorded as present or absent for each individual embryo. These binary vectors of 17 morphological end points per individual were used for subsequent analysis.

Identifying Chemicals with Evidence of Differential Population Response
Screening data on 1,060 chemicals from Truong et al. (2014) were evaluated to prioritize chemicals for GWAS mapping. Our aim was to identify the chemical with maximal evidence of differential response for optimal GxE power. Chemical choice was based on morphological data from a large-scale, concentration-response study of chemically exposed zebrafish embryos assessed for developmental toxicity end points at 120 hpf. In searching for patterns indicative of population variability over a broad, multipoint concentration series, our prioritization metric highlighted maximal population variability in response across all binary morphological end points to identical environmental exposures. The full concentration-response data for 1,060 chemicals (n = 32 samples tested across each of five chemical concentrations) were analyzed. We first produced a sublist of chemicals that had at least one morphological end point (other than mortality) observed at near 50% incidence (32-68%) at a minimum of two concentrations. This measure ensured that there was developmental variability between fish exposed to that chemical (i.e., high proportions of both Affected and Unaffected individuals), that a stable critical concentration could be estimated, and that the response variability held for more than one concentration. Given our goal of maximizing interindividual response variability, the latter principle ensured a less steep concentration-response curve and a higher probability that reproducible patterns of differential population response to exposure would be observed in subsequent trials.

Generating Data at a Critical Exposure Concentration to Elicit Maximal Population Variability
Two rounds of range-finding experiments were performed to estimate a critical concentration of test chemical (Abamectin, CASRN 71751-41-2, Sigma-Aldrich) that induced 50% incidence of effect so that the genetic association study could draw evenly from Affected and Unaffected individuals. For each round, adult zebrafish were group spawned to produce embryos that were exposed to several concentrations of Abamectin, following the developmental screening protocol detailed above. First, 576 individual zebrafish embryos were distributed into six 96-well plates, with 12 individuals per plate exposed to a concentration (0, 0.03, 0.1, 0.3, 1, 3, 5, or 10 lM) using a 20-mM stock of Abamectin digitally dispensed using an HP D300e (Hewlitt Packard HP D300e). Second, a narrower range including 0, 0.5, 0.7, and 0:9 lM was performed using 192 individuals (48 per concentration). From these data, the final critical concentration of 0:6 lM was selected to approximate a 50% effect size while ensuring sufficient numbers of completely clean (i.e., absence of adverse developmental end points) Unaffected samples.
After the concentration was established, individual zebrafish were exposed to the critical concentration of 0:6 lM of Abamectin to identify individuals responding differently to equivalent environmental concentrations. We also included untreated (negative) controls on each plate (Exposed: control ratio at 72:24 per plate) to ensure that we could detect global plate effects and have confirmatory samples to sequence if unexpected genotype distributions had been encountered. The total number of samples exposed was 768 (8 plates with 96 individuals), including 576 Exposed:192 Control. From the exposed embryos and knowledge of end point-end point relationships and developmental cascades (Zhang et al. , 2017, we identified samples of maximum phenotypic clarity (i.e., no intermediates) to define a clean multivariate phenotype of individuals Affected by exposure vs. individuals Unaffected by the same Abamectin exposure. For Affected status, an individual had to display all of the following specific end points: altered eye, snout, jaw, and axis development, plus pericardial and yolk sac edema (EYE, SNOU, JAW, AXIS, PE, and YSE). The Unaffected phenotypic status was applied to individual embryos having absence of any morphological defect (i.e., normally developed embryos). A total of 276 (138 Affected, 138 Unaffected) zebrafish exposed at the critical concentration were randomly selected to be sequenced for genome-wide association mapping. While other individuals may have been classifiable into one of our two extreme, Affected/Unaffected classes, we optimized sequencing resources towards fully penetrant phenotypes in order to maximize discriminatory power.

Genotyping by Sequencing
Genomic DNA was extracted from individual larvae (Quick-DNA™ 96-Kit, Cat # D3011; Zymo Research), with a subset selected for sequencing. For each sample, 350 ng of DNA was used for library preparation. Prior to library prep, the quality and quantity was verified using a fluorometric plate reader and bioanalyzer. Samples were sheared to ∼ 320 base pair ðbpÞ, and 100 ng of each sample was used in the WaferGen DNA amplification library prep kit (Takara Bio, USA). After the library prep, each sample was quantified to verify similar input for sequencing. The samples were sequenced on an HiSeq3000 (Illumina) with 12 samples per lane ( ∼ five times coverage) and 150-bp paired-end sequencing. All library preparation and sequencing was performed at Oregon State University's Center for Genome Research and Biocomputing (http://cgrb.oregonstate.edu/ core).

Quality Control and Alignment
FastQC (Babraham Bioinformatics version 0.11.3) tests for sequence quality, Guanine Cytosine (GC) content, sequence length distribution, sequence duplication levels, overrepresented sequences, adaptor content, and kmer content were used to ensure fidelity of results. For each sample (DNA from an individual zebrafish), reads were aligned to the Genome Reference Consortium GRCz10 (Howe et al. 2013) reference genome with Bowtie2 (version 2.1.0; open source) (Langmead and Salzberg 2012), using standard settings. Potential PCR duplicates were then removed using Samtools rmdup (Li et al. 2009).

Variant Calling and Filtering
Variant calls were generated for each individual at every variant site. A variant call was made at any site (across the entire genome, including all chromosomes and mitochondrial DNA, excluding nonchromosomal material or scaffolds not aligned within a chromosome) where there was sufficient evidence of a nonreference base for at least one individual. GATK version 3.5 (McKenna et al. 2010) HaplotypeCaller was used to call genotypes on all samples simultaneously (joint genotyping). This leverages data across samples to assign genotypes for individuals with low coverage at certain bases using a Bayesian likelihood model for genotyping. Reads with a mapping quality (MQ) below 20 were not included, and a minimum phred-scaled confidence threshold of 10 was required. Genotypes are reported for every individual at every variant site for which there remained reads.
The GATK VariantFiltration tool was used to implement the GATK Best Practices (DePristo et al. 2011) hard filtering recommendations (filter SNPs with quality by depth <2; phred-scaled Fisher's exact test p-value >60; root mean score MQ <35; MQ Mann-Whitney rank sum < − 12:5; read position Mann-Whitney rank sum < − 8; strand odds ratio >3). The MQ threshold of 35 was adjusted from GATK's recommendation of 40. This is due to the GATK workflow's use of a different aligner, which outputs a larger range of MQ scores for each base, averaging 60 for high confidence reads. The maximum MQ outputted from the Bowtie2 aligner is only 42 (for a perfectly aligned read with no mismatches to the reference). A final filtering refinement and file conversion for subsequent analysis was performed using VCFtools [VCFtools version 0.1.14 (Danecek et al. 2011)]. SNPs with mean depth (across samples) below 2 were excluded. This was done to reduce false-positive calls for SNPs that were based on too few reads per individual.

Association Analysis
An allelic association analysis (Fisher's exact tests) was run for each SNP across the genome using PLINK 1.9 (Purcell et al. 2007). For each SNP, the counts of the minor and major alleles in the cases (morphologically Affected individuals) and controls (morphologically Unaffected individuals) were used to calculate the exact hypergeometric probability of observing those four counts under the null hypothesis that allele counts in cases and control do not differ. A p-value below a = 0:05 provides sufficient evidence that the allele counts for each group do statistically differ. To adjust for multiple testing, a Bonferroni correction was used, leading to a significant call for any SNP with a p-value <2:55 × 10 −9 (0.05/19,597,672).

Validation
For validation of candidate SNPs identified in our GWAS, gene expression was measured following Abamectin exposure on a new set of individuals that were randomly sampled from the population. The developmental screening protocol was identical to that described earlier. At 120 hpf, 84 individuals (balanced between control, Affected, and Unaffected) were randomly selected for realtime polymerase chain reaction (RT-PCR) gene expression analysis. Embryos from wells of interest (Affected and Unaffected) were independently snap frozen using liquid nitrogen and then copurified for RNA and DNA using ZR-Duet™ MiniPrep (Cat #D7001; Zymo Research). A list of the forward and reverse primers (targeting a SNP just upstream of sox7, which had the smallest GWAS p-value) used for the experiment can be found in Table S1 (Integrated DNA Technologies). The mRNA gene expression analysis was performed as described in Chlebowski et al. (2017).
Expression of target gene sox7 and housekeeping gene beta actin were measured as threshold cycle (C T ) value using relative standard curves to optimize how much input was in the reaction. Prior to analysis of RT-PCR results, one control individual was discarded for an abnormally high C T value (C T > 32; for other samples, 27 < C T < 30), and six individuals (two Affected; four Unaffected) were discarded due to insufficient amplification signal. A two-sample t-test was first conducted to ensure that beta actin was not differentially expressed in control individuals compared to exposed (Affected and Unaffected) individuals (p = 0:58). The statistical analysis of sox7 expression was then based on the log 2 fold change calculated for each individual using the DDC T method: Individual i is one of the total number of control, Affected, and Unaffected individuals, and c is the number of control individuals. The fold change for each individual is with respect to the Environmental Health Perspectives 067010-3 126(6) June 2018 average control. A two-sample t-test was then conducted comparing log 2 (fold change) for Affected vs. Unaffected individuals.

Response Patterns Indicative of Differential Susceptibility
We exploited the HTS data from Truong et al. (2014) to identify chemicals with empirical evidence of population susceptibility differences, as illustrated in Figure 1A. Out of 1,060 chemicals screened in full concentration-response by Truong et al. 2014, there were 19 that met our population variability heuristic (Table  S2). From this empirical shortlist, Abamectin had the most robustly variable response (i.e., highest proportion of responders) across the most concentrations. Given the broad concentration spacing (log 10 ) of the HTS design, certain individuals (Unaffected) tolerated concentrations of Abamectin that were orders of magnitude higher than concentrations causing severe abnormalities in other individuals (Affected). Figure 1B illustrates the progression of rangefinder experiments aimed at identifying the critical concentration for morphological effects induced by Abamectin. Power estimates showed that in the absence of prior knowledge of allele frequencies or population genetic structure, the optimal study design should include The curve with the asterisk in the upper left-hand corner of the panel represents a chemical response suggestive of differential population susceptibility, whereas all other curves depict steeper toxic points of departure (i.e., less spread in the range of concentrations eliciting effects across the population) or lack of response. (B) Rangefinders: Successive screens to find the critical concentration as that at which approximately 50% incidence is observed. The heatmaps show horizontal blocks (separated by whitespace) of identical concentrations, whose height corresponds to the number of zebrafish tested. Within each concentration block, each row is the vector of observed morphological end points (17 columns; see "Methods" section) for an individual. As per the legend at the lower right, blue represents no end point incidence, red represents incidence of an end point, and grey cross-hatching represents mortality. (C) Critical concentration exposure: Example of a single exposure plate, where 72 individuals (in single wells) were exposed to 0:6 lM Abamectin at 6 hpf, plus 24 individuals exposed to vehicle [dimethylsulfoxide (DMSO)] controls. Developmental morphology screening was performed at 120 hpf to identify Affected individuals with the phenotype of altered eye (EYE), snout (SNOU), jaw (JAW), pericardial edema (PE), yolk sac edema (YSE), and axis development (AXIS) vs. Unaffected individuals with no observed defects. (D) Individual DNA extraction: Individuals classified as Affected and Unaffected were randomly selected for whole-genome sequencing. a balanced phenotypic ratio of Affected:Unaffected samples. Therefore, rangefinders aimed to find the concentration eliciting an even ratio of our complete phenotype. The first round narrowed the variable-response concentrations of the original HTS data to a range between 100 ng. The second round tightened the target range between 0:03-10 lM. From these data, 0:6 lM was chosen as the critical concentration as intermediate between the high incidence at 0:7 lM and the low incidence at 0:5 lM.

Identifying Individuals for Genomic Sequencing
Of the 576 fish exposed to Abamectin at the critical concentration, we observed 3% mortality at the 120 hpf evaluation. Of the surviving exposed fish, 155 (28%) displayed the fully penetrant Affected phenotype (see Figure 1C, image of Affected individual with arrows pointing to specific end points), 200 fish (36%) were Unaffected, and the remainder of surviving individuals were scored as having an intermediate phenotype that consisted of a subset of the full Affected end point set. This is evidence of population variability in response to chemical exposure. From the fish at the two phenotypic extremes, 276 individuals were randomly chosen for full-genome sequencing (138 Affected and 138 Unaffected). By first focusing on a clean Affected group of fish that scored exactly the same on the morphologic measures, we reduced potential variability coming from sources other than genetics and thereby increased the power to detect association. In unexposed control individuals, only 1% showed any specific morphological deformity (none of whom showed the specific phenotype of interest), vs. 65% of exposed individuals (p < 10 −16 ).

Genetic Polymorphisms Associated with Gene-by-Environment
FastQC output indicated that reads were 151 bp in length. The overall alignment rate was 89% for each sample. GC content for each sample was approximately 37%, which is consistent with the zebrafish genome (Han and Zhao 2008 Association analysis was conducted for each SNP across the genome using Fisher's exact test in PLINK 1.9 (Purcell et al. 2007) to assess allele counts in the Affected vs. Unaffected individuals. To adjust for multiple testing, we subjected our nominal p-value (a = 0:05) to a Bonferroni correction of (0.05/19,597,672), yielding a genome-wide significance threshold of p < 2:55 × 10 −9 . The strict statistical significance criteria highlighted three SNPs that exceeded the genome-wide significance thresholds (Figure 2 and Table S3). The SNPs (ranked in order of ascending p-value) were mapped to genic regions of sox7, erf, and cfap74.

Validation
From the genome-wide data, a G ! T variant in the promoter region (569 bp upstream) of sox7 (the top hit in Figure 2) was significantly associated with severe developmental end points after exposure. This SNP had the smallest p-value and was observed at high prevalence in the population tested, highlighting a promising Figure 2. Genome-wide association study (GWAS) results for Abamectin. The Manhattan plot shows the genomic coordinate for each SNP on the horizontal axis (grouped into chromosomes) vs. its association with phenotypic status on the vertical axis (as the negative logarithm of p-value). The horizontal line indicates the Bonferroni-adjusted significance threshold. The dots above this line indicate candidate SNPs (cfap on chromosome 8, erf on chromosome 19, sox7 on chromosome 20) for validation as genetic factors associated with differential susceptibility (i.e., Affected vs. Unaffected phenotypes) to Abamectin exposure. target for functional validation. The SNP region upstream of sox7 and expression primer design for the validation study are highlighted in Figure 3A and Table S1. The lower portion of Figure 3A shows the T allele frequency differences at this SNP between Affected (45%) and Unaffected (12%) individuals. In the validation study, using a new sample of individuals from the population, sox7 showed significantly lower expression (p = 0:02; Figure 3B) at 120 hpf for Affected (n = 26) vs. Unaffected individuals (n = 24) after exposure to 0:6 lM Abamectin.

Discussion
Our evidence from Abamectin-exposed zebrafish showed that interindividual (i.e., population) genetic variation contributes to differential response to environmental chemical exposures. To reach this conclusion, we first exploited the large-scale, systematic design of HTS data (Truong et al. 2014) to select a target chemical (Abamectin) whose exposure produced patterns of differential response in the exposed population. Next, we generated genome-wide sequence data for individual zebrafish displaying susceptible vs. resistant phenotypes following identical chemical exposure. Finally, we identified a genetic region near sox7 associated with this GxE effect and confirmed gene expression differences between susceptibility groups. This approach addresses a critical need in the face of an expanding chemical exposome (Wild 2005). Select individuals or entire communities may be especially susceptible to adverse health effects from chemical exposure through common consumer products, occupational hazards, environmental emergencies, or geographic location, such as Superfund sites (Tilley et al. 2017;Brette et al. 2014;Judson et al. 2010). Models for diverse populations are needed to explore this interindividual susceptibility (French et al. 2015). The T5D zebrafish line used in our experiments presents such a genetically heterogeneous population, containing levels of interindividual diversity required for studies of differential susceptibility (Balik-Meisner et al. 2018).
In contrast to the pooled samples commonly used for these types of experiments in zebrafish (Butler et al. 2015;Obholzer et al. 2012), we followed individuals (in single-fish wells) from immediately postfertilization throughout the entire environmental exposure course, all phenotypic assessments, and generation of genetic information. Analysis of individual-level differences in behavioral and morphological responses to chemicals provided solid phenotypic anchors for genetic results. Indeed, our results demonstrated reproducible population variability in a multivariate phenotype that showed consistent concentration-response to chemical treatment across successive rounds of narrowing concentration. The fine-scale quantification of phenotype and exposure environment enabled us to elucidate the role that genetic variation can play in differential susceptibility. This specificity, where we can identify individual-level genetic variation that affects response to individual chemical environments, may bring new precision to personalized toxicity prediction and risk assessment.
Importantly, the authors note that the test compound, Abamectin, is not a genotoxic compound that would have been expected to alter DNA sequence (Oliveira et al. 2016). Abamectin is a standard compound formulation of avermectin B1a and B1b and member of the structurally complex mectin class of compounds. It is used to control insects in agriculture by acting through glutamate-gated chloride channels [Gamma-aminobutyric acid (GABA) receptor] and as an anthelmintic agent to treat common intestinal worms (Campbell 1989). Abamectin has evidence of population variability in response to pharmaceutical applications (Aljedani and Almehmadi 2016;Churcher et al. 2009;Khaldoun-Oularbi et al. 2013;Slimko et al. 2002).
A novel SNP upstream of sox7 was associated with GxE at a genome-wide significance level. There is strong evidence that this gene, a transcription factor, plays a critical role in development related to our Affected phenotype. Ablation of Sox7 in mice leads to developmental delays, pericardial edema, and yolk sac defects (Wat et al. 2012). In zebrafish, sox7 mutants have arterial block and pericardial edema after 72 hpf (Hermkens et al. 2015). Our functional validation experiments showed statistically significant suppression of sox7 gene expression in Affected individuals vs. those Unaffected following chemical exposure. Although these expression differences were observed in a new, unbiased sample of individuals, there are likely more factors at play in diverting exposed individuals from normal development toward a severely abnormal phenotype. Additional experiments will be needed that probe key time points along the coordinated cascade of vertebrate  (20:19,166,444) in Affected and Unaffected individuals from the genome-wide association study (GWAS). Sequence logos are centered at the SNP site, denoted as position 0. The relative letter height corresponds to the frequency of the base at that position. (B) Notched boxplots showing the distribution of log 2 (fold change) of sox7 expression for Control (unexposed), Unaffected (exposed), and Affected (exposed) groups. The boxplots show the median (thick horizontal lines), the upper and lower quartiles (top and bottom of boxes, respectively), 1.5 times the interquartile range (whiskers), and any outlier samples outside the whisker range of the observed expression for each group. developmental events (Zhang et al. 2017). We present the current results as a step toward unraveling this etiology by highlighting promising genetic candidates for detailed study.
Further experimentation must be undertaken before we can completely unravel the causative etiology of genetic associations with differential susceptibility. Indeed, initial deep sequencing efforts indicated that the sox7 SNP identified by our GWAS is highly correlated with nearby insertion/deletion variation. Thus, as with any genome-wide scan, the markers identified as significant highlight regions that, when considered in tandem with biological knowledge, are candidates for targeted follow-up. For example, the second SNP passing our significance threshold fell within the erf gene, for which there is evidence from other species for a plausible connection to our Affected phenotype. An Erf mutation in humans and mice is linked to craniosynostosis and related eye, snout, and jaw deformities (Twigg et al. 2013).
In order to provide a clear test of whether our system could detect GxE related to differential population susceptibility, we implemented a straightforward approach of well-characterized genetic analysis tools. By focusing on the reduction of measurement error in phenotypes and having knowledge of an individual's entire exposure history, we augmented our statistical detection power. However, this does not discount the possibility that higher-order interactions between genetic factors (i.e., epistasis) are involved (Motsinger et al. 2007). Future work will include methods aimed at assessing this gene-gene interaction space (Zhao et al. 2015).

Conclusions
Our approach leverages whole-organism, HTS data to introduce a data-driven strategy for identifying chemicals with putative GxE effects. This search for empirical patterns of differential response as an indicator of possible genetic explanations diverges from studies that aim to further characterize the small subset of chemicals with demonstrated GxE effects. Only with strong evidentiary support for differential population susceptibility do we assay the role of genetics in response to particular compounds. Moreover, such an unbiased approach may identify candidate GxE chemicals from a broader search space than traditionally studied, more accurately representing the diversity of real-world chemical space. In conclusion, by linking experimentation with bioinformatical prediction, we can make more informed choices on new experimental directions and avoid unnecessary expenditures of time and money-chasing effects that are unlikely to reflect genetic variation.