Small-Magnitude Effect Sizes in Epigenetic End Points are Important in Children’s Environmental Health Studies: The Children’s Environmental Health and Disease Prevention Research Center’s Epigenetics Working Group

Background: Characterization of the epigenome is a primary interest for children’s environmental health researchers studying the environmental influences on human populations, particularly those studying the role of pregnancy and early-life exposures on later-in-life health outcomes. Objectives: Our objective was to consider the state of the science in environmental epigenetics research and to focus on DNA methylation and the collective observations of many studies being conducted within the Children’s Environmental Health and Disease Prevention Research Centers, as they relate to the Developmental Origins of Health and Disease (DOHaD) hypothesis. Methods: We address the current laboratory and statistical tools available for epigenetic analyses, discuss methods for validation and interpretation of findings, particularly when magnitudes of effect are small, question the functional relevance of findings, and discuss the future for environmental epigenetics research. Discussion: A common finding in environmental epigenetic studies is the small-magnitude epigenetic effect sizes that result from such exposures. Although it is reasonable and necessary that we question the relevance of such small effects, we present examples in which small effects persist and have been replicated across populations and across time. We encourage a critical discourse on the interpretation of such small changes and further research on their functional relevance for children’s health. Conclusion: The dynamic nature of the epigenome will require an emphasis on future longitudinal studies in which the epigenome is profiled over time, over changing environmental exposures, and over generations to better understand the multiple ways in which the epigenome may respond to environmental stimuli. 
Citation: Breton CV, Marsit CJ, Faustman E, Nadeau K, Goodrich JM, Dolinoy DC, Herbstman J, Holland N, LaSalle JM, Schmidt R, Yousefi P, Perera F, Joubert BR, Wiemels J, Taylor M, Yang IV, Chen R, Hew KM, Freeland DM, Miller R, Murphy SK. 2017. Small-magnitude effect sizes in epigenetic end points are important in children’s environmental health studies: the Children’s Environmental Health and Disease Prevention Research Center’s Epigenetics Working Group. Environ Health Perspect 125:–526; http://dx.doi.org/10.1289/EHP595


Introduction
Epigenetics is defined as the mechanisms by which mitotically heritable perpetuation of gene activity occurs without modification of the underlying gene sequence. The most commonly studied epigenetic mechanisms are methylation of DNA cytosine residues and the post-translational modification of histone proteins. The entirety of the epigenetic features of the genome is referred to as the epigenome. This layer of regulatory information is essential for proper development of cellular function and determination of cellular identity. Unlike the genome, the epigenome is variable by cell, tissue type, and developmental stage. These mechanisms also represent an adaptive intermediary that interprets and responds to environmental stimuli, resulting in alterations in gene expression. Thus, epigenetic and epigenomic characterization has rapidly become a primary interest for children's environmental health researchers studying the influence of the environment on human populations, particularly exposures during pregnancy and early life and their impact on childhood and later-in-life health and disease outcomes. Indeed, extensive human epidemiological and animal model data indicate that environmental influences such as stress (Vidal et al. 2014), socioeconomic status (Olden et al. 2014), and exposures to various environmental factors including toxicants (e.g., lead, arsenic, mercury, bisphenol A, cigarette smoke) (Cardenas et al. 2015; Goodrich et al. 2015; Joubert et al. 2012; Koestler et al. 2013; Nahar et al. 2014), nutritional factors (Hoyo et al. 2011; Steegers-Theunissen et al. 2009), parental body mass index (Soubry et al. 2013, 2015), gestational diabetes (Finer et al. 2015), and maternal antibiotic use (Vidal et al. 2013) during critical periods of prenatal and postnatal development influence developmental trajectories, thereby imparting permanent changes in phenotypic expression of the genome and chronic disease susceptibility.
DNA methylation is the most intensively studied epigenetic modification. It involves the covalent addition of a methyl group (-CH3) to the 5-carbon of a cytosine moiety, generating 5-methylcytosine (5-mC) (Figure 1), which occurs predominantly in the context of cytosines that precede guanines (5´-CpG-3´ dinucleotides, or CpGs). Hydroxymethylation, in which a hydroxymethyl group replaces the hydrogen atom at the 5-carbon position in cytosine, is a closely related derivative that was conventionally thought to be an intermediate product during 5-methylcytosine demethylation but may also have a role in gene regulation (Hahn et al. 2014; Shen et al. 2014). CpGs are highly underrepresented in the genome, yet an average of 70% of these are methylated in most tissues. The remainder are unmethylated, often found in "CpG islands" that exist throughout the genome and are often present at the 5´ promoter and/or exon region of genes. Nearly 60% of human promoters are characterized by a high CpG content. However, CpG density alone does not influence gene expression. Instead, regulation of transcription often depends on DNA methylation status. In general, promoter-associated CpG islands are unmethylated at transcriptionally active genes, whereas promoter methylation is typically associated with gene silencing. In contrast, intragenic methylation is often positively associated with gene transcription. Thus the impact of DNA methylation on gene activity can vary dramatically depending on context.
volume 125 | number 4 | April 2017 • Environmental Health Perspectives
Compelling epidemiological evidence of a link between early-life exposure and later disease has been reported (Barker 1988, 1995; Barker and Osmond 1988; Barker et al. 1989; Hales et al. 1991; Leon et al. 1998). Environmental influences that can disrupt development include nutritional factors, endocrine-disrupting agents, and physiological and psychological stressors. Embryonic and fetal development requires the well-orchestrated formation of key structures. This is carried out in part by the epigenetic modifications that are established during two major epigenetic reprogramming events (Figure 2). The first occurs during gametogenesis, when the vast majority of the DNA methylation information is erased and then reestablished. The second occurs postfertilization when the paternal genome is rapidly erased of most DNA methylation marks followed by erasure of the maternal methylation information. New DNA methylation is established around the time of implantation, before germ layer specification. An exposure that occurs during pregnancy has the capacity to affect three generations at one time, including the mother (F0), the developing child (F1), and the developing gametes within the developing embryo/fetus (F2), which undergo reprogramming in humans from about 4 to 12 weeks gestation. There are regions of the genome that are able to resist postfertilization reprogramming, including imprinted genes (a group of monoallelically expressed genes defined by parent-of-origin dependent methylation and expression), some repetitive elements, and the recently identified group of genes referred to as "escapees" that carry DNA methylation information forward from the prior generation (Tang et al. 2015). Perturbations during these critical developmental windows can lead to responses that likely result in irreversible changes to tissue structure and function (e.g., altered cell type, number and function).
In turn, these changes can manifest later in life and have the capacity to modulate physiological function and susceptibility to disease. Research is also emerging that investigates the placenta as a target tissue by which to study exposures at the maternal-fetal interface (Li Q et al. 2015; Paquette et al. 2015; Schroeder and LaSalle 2013).
A common finding in environmental epigenetic studies is the small-magnitude epigenetic effect sizes that are associated with exposure. It is reasonable and necessary that we question the relevance of such small effect sizes. What is the functional consequence, and do these small differences become magnified over the course of our lives, raising risk for cellular malfunction and disease? It may be the case that we do not find larger effect sizes (e.g., as observed in cancer) not because they do not exist, but rather because such large shifts may be incompatible with continued development. We also must consider the literal meaning of "small" effect sizes. A small difference in DNA methylation, for example, is small only in the context of the population of cells examined as a whole. In any given somatic cell, the autosomes are diploid, which means that at any given CpG site, methylation is either present or absent on each chromosome. Within a cell, each autosomal CpG dinucleotide is thus 0% methylated, 50% methylated, or 100% methylated when accounting for the diploid state of the chromosomes. A small difference in methylation means that a small fraction of the cells exhibits this difference at a particular CpG. Depending on the nature and identity of that cell, such a difference could substantially affect that cell's function and, because of mitotic heritability of DNA methylation, the function of that cell's progeny.
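The arithmetic behind this argument can be illustrated with a minimal sketch. The cell counts below are hypothetical and chosen only for illustration; the function name `bulk_methylation` is likewise an assumption, not something taken from any study discussed here. Each cell contributes 0, 0.5, or 1 at a CpG, and the bulk measurement is the average over cells.

```python
from fractions import Fraction

ALLOWED_STATES = {Fraction(0), Fraction(1, 2), Fraction(1)}

def bulk_methylation(cell_states):
    """Average per-cell methylation at one CpG across a population of diploid cells."""
    states = [Fraction(s) for s in cell_states]
    if any(s not in ALLOWED_STATES for s in states):
        raise ValueError("a diploid cell can only be 0, 0.5, or 1 methylated at a CpG")
    return sum(states) / len(states)

# Hypothetical population of 1,000 cells: 800 fully methylated, 200 unmethylated.
unexposed = [1] * 800 + [0] * 200

# Hypothetical exposed population: 40 of the fully methylated cells
# have lost methylation on one of their two alleles.
exposed = [1] * 760 + [Fraction(1, 2)] * 40 + [0] * 200

difference = bulk_methylation(unexposed) - bulk_methylation(exposed)

print(float(bulk_methylation(unexposed)))  # 0.8
print(float(bulk_methylation(exposed)))    # 0.78
print(float(difference))                   # 0.02
```

In this toy case a 2% difference in the bulk measurement corresponds to a 50% change within each of the 40 affected cells, which is the sense in which a "small" effect is small only at the population level.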
Here we focus on the epigenetics and epigenomics research being conducted within the Children's Environmental Health and Disease Prevention Research Centers, or Children's Centers, as it relates to the "Developmental Origins of Health and Disease (DOHaD)" hypothesis (Barker 1995), which proposes that adverse events during early life program an increased risk for numerous adult diseases. Our objective is to discuss the state of the science in environmental epigenetics research and, in particular, to focus on the collective observation of many studies published thus far that for nearly any given exposure, the magnitude of effect on DNA methylation is relatively small. We will address the current laboratory and statistical tools available for epigenetic analyses, discuss methods for validation and interpretation of findings, particularly when effect sizes are small, question the functional relevance of findings, and discuss the future for environmental epigenetics research.
Figure 1. Two major epigenetic modifications. DNA methylation involves the transfer of a methyl group from S-adenosylmethionine to the 5 position of the cytosine ring, most often on cytosines followed by guanines in the DNA sequence. This results in the formation of 5-methylcytosine. Histone modifications are another major type of epigenetic modification, and involve the post-translational transfer of, for example, methyl, acetyl, ubiquitin, or phosphate groups to specific amino acid residues on the N-terminal tail of the histone proteins. The N-terminal tails protrude from the center of the nucleosome core (shown on right) and are accessible for these types of modifications. A linker histone (H1) is bound to DNA outside the nucleosome and is thought to help keep the DNA correctly positioned in relation to the nucleosome core.

Targeted CpG Measurement
Because DNA methylation (5mC) does not change the detectable sequence of DNA, genetic methods to assay DNA methylation have relied on variations of three basic approaches: bisulfite conversion, methyl-sensitive restriction enzymatic digestion, or 5mC antibody detection or enrichment. Treatment of DNA with sodium bisulfite causes the deamination of cytosine to uracil, but 5-methylcytosine is protected from deamination. Any cytosines detected in the DNA sequence after conversion were therefore methylated in the original sequence. Methyl-sensitive restriction enzymes cut only when the recognition site is methylated or only when it is unmethylated, depending on the enzyme, and are most effective when paired with an isoschizomer (a restriction endonuclease that recognizes the same sequence) that differs in methylation sensitivity, such as HpaII (blocked by CpG methylation) and MspI (not blocked). 5mC antibody detection or enrichment methods rely on the specificity of monoclonal antibodies to 5mC.
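The logic of bisulfite conversion can be sketched in a few lines. This is a simplified illustration that assumes complete conversion, a single strand, and no sequencing error; the sequence, the methylated positions, and the function names are hypothetical. Unmethylated cytosines are deaminated to uracil and read as thymine after amplification, so they appear as T below.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Return the sequence as read after bisulfite treatment and PCR.

    Unmethylated C is read as T; methylated C (0-based positions given
    in methylated_positions) is protected and still read as C.
    """
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)

def call_methylation(original, converted):
    """Map each cytosine position in the original to True if it was methylated."""
    return {
        index: converted[index] == "C"
        for index, base in enumerate(original.upper())
        if base == "C"
    }

original = "ACGTCCGATCGA"      # hypothetical sequence; cytosines at positions 1, 4, 5, 9
methylated = {1, 9}            # hypothetical methylated CpG cytosines

converted = bisulfite_convert(original, methylated)
calls = call_methylation(original, converted)

print(converted)  # ACGTTTGATCGA
print(calls)      # {1: True, 4: False, 5: False, 9: True}
```

Comparing the converted read with the reference recovers the methylation state of every cytosine, which is what makes bisulfite-based assays quantitative at single-CpG resolution.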
Although all methods are effective at discriminating methylation differences using a variety of downstream targeted assays, restriction enzyme-based approaches have a disadvantage in being limited only to assay sites recognized by the enzymes used (5-6% of total methylated CpGs), though this may be tempered somewhat by the ability to combine different enzymes to expand coverage. Antibody-based methods rely on enrichment of methylated DNA, so are less quantitative and specific to individual CpG sites than bisulfite conversion or enzyme-based approaches (Laird 2010).
For targeted gene loci of interest, bisulfite treatment of DNA is followed by polymerase chain reaction (PCR) amplification using primers designed to recognize the converted sequence. Using the traditional Sanger sequencing method, PCR products are cloned and individual alleles sequenced. Pyrosequencing (PSQ) is a "sequencing by synthesis" platform that can quantify the proportion of individual nucleotides at a given position in a sequence [e.g., single-nucleotide polymorphisms (SNPs) or, relevant herein, cytosine versus thymine], providing the ability to detect small differences in methylation among samples or groups due to much greater depth of coverage than Sanger sequencing (Tost and Gut 2007). EpiTYPER offers a similar depth advantage for quantifying sequence mixtures, but instead uses a base-specific cleavage and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) approach (Thompson et al. 2009).
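Why depth matters for detecting small differences can be shown with a rough sketch. It assumes, as a simplification, that each sequenced clone or template molecule is an independent binomial draw; this is not how a pyrosequencing signal is actually computed, and the counts and the function name are hypothetical.

```python
import math

def methylation_estimate(c_count, t_count):
    """Estimate the methylated fraction at one CpG and its binomial standard error.

    c_count: molecules read as C (methylated) after bisulfite conversion
    t_count: molecules read as T (unmethylated)
    """
    total = c_count + t_count
    if total == 0:
        raise ValueError("no molecules observed at this position")
    fraction = c_count / total
    standard_error = math.sqrt(fraction * (1 - fraction) / total)
    return fraction, standard_error

# Hypothetical shallow measurement: 10 cloned alleles, 5 methylated.
shallow_fraction, shallow_se = methylation_estimate(5, 5)

# Hypothetical deep measurement: 1,000 template molecules, 500 methylated.
deep_fraction, deep_se = methylation_estimate(500, 500)

print(f"shallow: {shallow_fraction:.2f} +/- {shallow_se:.3f}")
print(f"deep:    {deep_fraction:.2f} +/- {deep_se:.3f}")
```

Under this assumption a hundredfold increase in depth shrinks the standard error tenfold, so a difference of a few percentage points that is invisible with ten clones becomes resolvable at high depth.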

Assessment of Global DNA Methylation
For assessing the impact of environmental exposures relevant to children, a global assessment of total levels of DNA methylation is often desired. The major challenge to the field is that most of the global DNA methylation assays have not been compared for accuracy with a more gold-standard approach such as bisulfite sequencing, and thus may be influenced by a variety of reagent or amplification biases (Laird 2010). A recent community-based benchmarking study of DNA methylation assays concluded that global DNA methylation assays showed lower correlations with each other compared to methods for absolute methylation detection of targeted regions (Bock et al. 2010). High-performance liquid chromatography (HPLC) tandem mass spectrometry (LC-MS/MS) can accurately compare total 5mC with total cytosine in a sample, but it requires large amounts of DNA and may be a less sensitive method than other approaches (Lisanti et al. 2013). Analysis of common repetitive sequences such as LINE-1 by bisulfite treatment and PSQ is one of the most common methods for clinical or epidemiologic samples. PSQ of Alu repeats also has been performed, but the global methylation levels are much lower than those of LINE-1 or genome-wide sequencing, suggesting that complexity of sequence variation of this repeat or the evolutionary context is influencing methylation results (Lisanti et al. 2013; Nelson et al. 2011). LUMA uses a methyl-sensitive restriction digestion followed by PSQ, but was found to be less accurate than LINE-1 or LC-MS/MS on the same samples (Lisanti et al. 2013).
Figure 2. DNA methylation dynamics throughout the human life span. During gametogenesis, DNA methylation is erased in the primordial germ cells (PGCs), which then acquire new methylation profiles that are in large part sex-dependent, including the methylation present at imprinted genes. At fertilization, the parental pronuclei are erased of nearly all methylation (imprinted genes and "escapees" resist this demethylation; see text). Around the time of implantation, new DNA methylation information is established on the diploid chromosomes in a manner that will aid differentiation of cells to become trophoblast versus embryonic tissues, formation of the three germ layers and then differentiation into the somatic tissues. Many scientists believe that the highly dynamic nature of the genome-wide methylation profiles during these reprogramming and rapid growth periods of development represent windows of vulnerability where an environmental exposure could cause detrimental shifts in methylation by disrupting the fidelity of these reprogramming processes.

Genome-Scale Approaches
Microarrays have long been the method of choice for profiling epigenetic marks on a genomic scale, with several platforms and protocols available for DNA methylation (Schones and Zhao 2008). Many of the early platforms used restriction enzyme digests and methylated DNA immunoprecipitation (MeDIP) with an anti-methylcytosine antibody to identify regions of differential methylation by hybridization to oligonucleotide arrays produced in house and by companies such as Agilent and Nimblegen. These include Comprehensive High-throughput Arrays for Relative Methylation (CHARM), in which restriction enzyme McrBC is used to cut methylated DNA and compare to the uncut input DNA (methylated plus unmethylated), among others (Ladd-Acosta et al. 2010). These approaches have resolution sufficient to detect regions of differential methylation and have been used successfully in studies of target tissue in which exposure or disease produced substantial methylation differences among experimental groups (Irizarry et al. 2009; Ji et al. 2010). The coverage of genomic elements (e.g., promoters, gene bodies, CpG islands, shores) depends on the density of probes present on the platform used.
More recently, Illumina developed arrays that allow assessment of single CpG sites, as opposed to regions, at a more quantitative level using bisulfite conversion, enabling absolute quantification of methylation levels and detection of small exposure- or disease-associated methylation differences both in target and surrogate tissues (Breton et al. 2009; Morales et al. 2012). The first Illumina 27k array provided coverage for only CpG islands in the human genome, whereas the newer Illumina Infinium HumanMethylation450 BeadChip ("450K array") provided comprehensive coverage for 99% of RefSeq genes with 20 probes per gene on average covering both promoter and gene body as well as CpG islands in the genome (5 probes on average), CpG island shores (5 probes on average), and more distant CpG motifs such as CpG shelves (4 probes on average). This has been the most commonly used platform for genomic analysis of DNA methylation in human cohorts and is especially advantageous for children's studies with limited samples, because only 250 ng DNA per sample is needed. However, this platform is not available for model organisms commonly used in epigenetic research including mice. In early 2016, Illumina replaced the 450K array with the Infinium MethylationEPIC (EPIC) array, which retains > 90% of the original probe content while adding 350,000 CpGs in enhancer regions to improve detection of differential methylation at > 850,000 methylation sites and still requiring only 250 ng DNA per sample (Moran et al. 2016).
Next-generation sequencing technologies are alternative and increasingly used platforms for genomic assessment of altered methylation (Plongthongkum et al. 2014). They include methods that detect regions of differential methylation based on peak finding such as the sequencing analog of MeDIP (MeDIP-seq), Methylation-sensitive Restriction Enzyme sequencing (MRE-seq), and Methyl-CpG Binding Domain (MBD) protein-enriched genome sequencing (MBD-seq). Similar to analogous array-based technologies, these platforms enable detection of more pronounced methylation differences at the level of a region. More quantitative approaches rely on bisulfite conversion and include reduced-representation bisulfite sequencing (RRBS) (Boyle et al. 2012), in which MspI digestion is used to enrich for the most CpG-rich regions of the genome. Also, target enrichment methods based on hybridization to oligonucleotides interrogate the most informative areas of the genome, regardless of their CpG density. Both RRBS and hybridization-based target enrichment approaches allow for assessment of absolute levels of DNA methylation at each CpG site and for detection of small methylation changes. However, RRBS coverage is restricted mostly to CpG islands, and coverage varies between individual samples. Hybridization-based capture approaches can be customized to target genes or regions of interest, but this approach showed lower reproducibility compared with amplicon-based bisulfite sequencing of targeted regions. Whole-genome bisulfite sequencing (WGBS) techniques have not been used widely in exposure and disease studies in human cohorts and animal models due to the expense and the complexity involved in the analysis of such large data sets.
However, for most epidemiology studies high coverage of individual CpG sites is not required, and indexed sequencing libraries from 100 ng of DNA can achieve depth of 0.2× to 3× coverage at a fraction of the cost, and represent the most unbiased representation of CpGs in the genome. AmpliconBS, in which 10-20 targeted PCR amplicons from bisulfite DNA are pooled and sequenced, outperformed most other absolute targeted DNA methylation assays in a community-based benchmarking study (Bock et al. 2010).
At the present time, however, most publicly available data sets have been collected on the Illumina 450K array platform, and analysis methods for this platform have reached maturity, whereas those for sequencing-based approaches are still under development (Plongthongkum et al. 2014). Using this platform therefore offers a great advantage of easy comparison across different studies and relatively broad availability of published studies for validation purposes.

Integrative Data Analysis for DNA Methylation in Birth Cohort Studies: Challenges of Data Processing and Statistical Analysis
Early-life exposures typically produce relatively small effects on DNA methylation. Thus, maximizing data reliability via stringent quality control and data processing procedures, as well as statistical power to detect small-scale changes, is crucial for identifying environmental epigenetic links. Here we discuss these principles with regard to birth cohort and other longitudinal children's studies evaluating environmental factors as they apply to two widely used bisulfite-treatment methodologies: a) quantitative targeted DNA methylation analysis by PSQ and b) epigenome-wide analysis with the Infinium 450K or EPIC array [we refer readers to recent publications that provide more detail on specific aspects of the 450K array pipeline, data processing, and analysis (Heiss and Brenner 2015; Maksimovic et al. 2015; Morris and Beck 2015; Robinson et al. 2014; Yuan et al. 2015)].
Approaches to analyze DNA methylation data from birth cohorts or other longitudinal children's cohorts fall into three broad categories based on the timing of available data and the hypotheses: a) cross-sectional, b) longitudinal, and c) mediational analyses. Longitudinal analysis is optimal to assess the impacts of early-life and concurrent exposures on DNA methylation and intraindividual variability in DNA methylation "drift" over time (Issa 2014). The ultimate goal is to assess whether epigenetic change acts as a mediator between environment and outcome (e.g., in utero exposure and altered childhood growth trajectory). Linear regression and structural equation modeling are both commonly used for mediational analysis (Baron and Kenny 1986; Li 2011). Scale restriction makes detailed assessment of all interrogated CpG sites within a region or across the genome as mediators difficult. Thus, first applying dimension reduction methods such as principal component analysis to the data can help investigators select a smaller number of variables to represent methylation at key regions in mediational analysis. When analyzing DNA methylation data to address hypotheses in any of the three categories, the nature of DNA methylation data, which are continuous but bounded and often follow a beta distribution, must be considered. Variance stabilizing transformations should be considered to avoid violating the assumption of constant variance in normal regression, and beta regression should be used when DNA methylation is not normally distributed.
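The dimension reduction step can be sketched as follows. This is a minimal, dependency-free illustration that extracts only the first principal component by power iteration on the covariance matrix; the methylation values and the function name are hypothetical, and a real analysis would normally use an established numerical library.

```python
import math

def first_principal_component(rows, n_iter=200):
    """Return (loadings, scores) of the first principal component.

    rows: one list per sample, holding methylation values at the CpGs of a region.
    Power iteration is used; it assumes the starting vector is not
    orthogonal to the leading eigenvector of the covariance matrix.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix between CpG sites (p x p).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    loadings = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        product = [sum(cov[a][b] * loadings[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0:
            break
        loadings = [x / norm for x in product]
    # One summary score per sample, usable as a single mediator variable.
    scores = [sum(c[j] * loadings[j] for j in range(p)) for c in centered]
    return loadings, scores

# Hypothetical methylation fractions: five samples, three correlated CpGs in one region.
region = [
    [0.10, 0.12, 0.11],
    [0.20, 0.22, 0.19],
    [0.30, 0.31, 0.33],
    [0.40, 0.42, 0.41],
    [0.50, 0.49, 0.52],
]
loadings, scores = first_principal_component(region)
print([round(x, 3) for x in loadings])
print([round(x, 3) for x in scores])
```

The single score per sample stands in for the correlated CpGs of the region, which reduces the number of candidate mediators that must be tested.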

Key Covariates for DNA Methylation Analysis
Regardless of the source of DNA methylation data or type of analysis, covariates and confounders to consider when assessing relationships between environmental factors and DNA methylation in neonatal samples or childhood samples minimally include gestational age, sex, maternal smoking status, socioeconomic status, and race (Goodrich et al. 2015; Joubert et al. 2012; Murphy et al. 2012; Vilahur et al. 2014; Yousefi et al. 2015a). Given sex differences observed in DNA methylation and response to environmental exposures, sex-stratified analyses or examination of sex-exposure interactions are also worthwhile statistical pursuits when sample size allows (Vilahur et al. 2014).
Common source tissues for DNA collected in neonatal and children's studies (e.g., placenta, buccal, blood, saliva) are heterogeneous with regard to cell type composition. Several studies have demonstrated that the degree of DNA methylation at specific loci is dependent on the type of tissue under examination (Davies et al. 2012; De Bustos et al. 2009; Lowe et al. 2015), and this variation can exceed the variation across individuals (Lokk et al. 2014). Cell-type heterogeneity within tissues can confound statistical analyses when cellular composition between controls and cases is divergent. Thus, when DNA is not obtained from sorted cells, adjustment for cell-type percentages in the main model or in subsequent sensitivity analyses will increase the reliability of associative findings whenever differential counts are available (Burris et al. 2013; Huen et al. 2014; Tarantini et al. 2013; Yousefi et al. 2015b). This is especially important in children's environmental health research because some exposures (e.g., arsenic) and age can affect both DNA methylation (Koestler et al. 2013; Yuan et al. 2015) and cell-type populations (Bellamy et al. 2000; Cheng et al. 2004; Kile et al. 2014). Houseman et al. proposed a method, based on data from a reference sample of isolated purified leukocyte subtypes (Houseman et al. 2012), that has been refined using 450K data available on leukocyte subtypes (Reinius et al. 2012) and more recently using data from cord blood leukocyte subtypes. This method allows for changes in the relative proportions of cells associated with exposure or phenotype to be assessed by estimating the proportion of individual cell types, and this could provide important insights into the true effects of exposures on children's health outcomes. The accuracy, reliability, and utility of this estimation from array-based DNA methylation data were subsequently demonstrated in a series of reports (Accomando et al. 2014; Koestler et al. 2013).
As more reference data become available for additional leukocyte types or for various specific cell types from other tissues, potentially from data available through the Roadmap Epigenome Project, these types of estimations could become more widely available. Until that point, Zou et al. (2014) and Houseman et al. (2012, 2014) have developed reference-free methodologies, which use a surrogate variable type approach to control for cellular heterogeneity in the absence of a reference data set, approaches well-suited for environmental epidemiology studies making use of non-blood biological samples for analysis (e.g., placenta). However, the use of reference-free methods assumes that outcome-related changes will be larger than cell type-specific changes, which may not always be the case.
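The idea behind reference-based cell-type estimation can be sketched as a constrained least-squares problem: the methylation of a mixed sample at cell-type-discriminating CpGs is modeled as a weighted sum of reference profiles, with non-negative weights. The sketch below is a simplified stand-in that uses projected gradient descent rather than the constrained projection used in the published method, and the reference values, cell-type labels, and function name are all hypothetical.

```python
def estimate_cell_proportions(sample, reference, n_iter=2000):
    """Estimate non-negative cell-type weights for one mixed sample.

    sample:    methylation fractions at K discriminating CpGs.
    reference: K rows (one per CpG), each listing the methylation
               fraction of that CpG in every purified cell type.
    Minimizes the squared error between sample and reference * weights
    by projected gradient descent (weights are clipped at zero).
    """
    n_cpgs, n_types = len(reference), len(reference[0])
    # Step size from the squared Frobenius norm, a safe bound on the curvature.
    step = 1.0 / sum(x * x for row in reference for x in row)
    weights = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [
            sum(reference[i][j] * weights[j] for j in range(n_types)) - sample[i]
            for i in range(n_cpgs)
        ]
        gradient = [
            sum(reference[i][j] * residual[i] for i in range(n_cpgs))
            for j in range(n_types)
        ]
        weights = [max(0.0, weights[j] - step * gradient[j]) for j in range(n_types)]
    return weights

# Hypothetical reference profiles: 6 CpGs x 3 cell types (columns: type A, B, C).
reference = [
    [0.9, 0.1, 0.1],
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.9],
    [0.1, 0.1, 0.9],
]
# Hypothetical mixed sample built from 60% A, 30% B, 10% C.
true_weights = [0.6, 0.3, 0.1]
sample = [sum(r * w for r, w in zip(row, true_weights)) for row in reference]

estimated = estimate_cell_proportions(sample, reference)
print([round(w, 3) for w in estimated])
```

The estimated proportions can then be entered as covariates in the main model, or compared between exposure groups to see whether the exposure is associated with a shift in cell composition.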

Statistical Model Selection for Targeted DNA Methylation Analysis
Statistical model selection with regard to treatment of individual CpG sites is important when examining associations between exposures and DNA methylation at targeted regions (e.g., PSQ data). In simulation studies, maximum statistical power was achieved when using a generalized linear model (GLM) that treated methylation at CpG sites within the bisulfite sequenced region as repeated measures with unstructured variances and covariances (Goodrich et al. 2015). This modeling strategy has the ability to identify exposure-DNA methylation relationships for the entire region as well as at individual CpG sites with the addition of an interaction term. An alternative modeling strategy that captures both intragenic CpG site-specific differences and variation between technical replicates utilizes linear mixed-effects regression with random effects for sites and replicates (Burris et al. 2012; Huen et al. 2014; Vilahur et al. 2014). The aforementioned models are used primarily for cross-sectional or longitudinal studies with methylation data at a single time point (e.g., prenatal exposure and DNA methylation in childhood). Analysis methods for longitudinal studies with DNA methylation data from multiple time points (e.g., birth and adolescence) include generalized estimating equations (GEE), which treat DNA methylation data from the same individual at different times as a cluster (Hou et al. 2014; Zeger et al. 1988). Mixed-effects models for repeated measures also can be used to examine the association of exposure with methylation at a targeted region (e.g., LINE-1 repetitive elements) from multiple time points (Baccarelli et al. 2009).

Illumina Infinium HumanMethylation450 and MethylationEPIC BeadChips
Before epidemiological analysis can be performed with 450K or EPIC BeadChip data, as with any data file, it is imperative to perform quality assurance and quality control checks and data preprocessing to ensure that technical variation has been minimized and that remaining observations are free from several common sources of bias. Here we provide a brief overview of the typical steps involved and software offerings available for these preprocessing steps (Figure 3, steps 1-4). All analysis pipelines described here for 450K data can be applied to data from the new EPIC BeadChip. Following preprocessing, all software options can return a matrix of methylation percentages, or β values ranging from unmethylated (0) to completely methylated (1), for all retained samples and CpGs. Analysis can be run on this β scale, or the β values can be logit transformed to M-values to avoid heteroscedasticity when modeling (Du et al. 2010).
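The transformation between the two scales is a base-2 logit, M = log2(β / (1 − β)). A minimal sketch follows; the clamping constant `eps`, which keeps β values of exactly 0 or 1 finite, is an assumption added here and is not part of the definition.

```python
import math

def beta_to_m(beta, eps=1e-6):
    """Convert a methylation beta value in [0, 1] to an M-value (base-2 logit)."""
    clamped = min(max(beta, eps), 1.0 - eps)  # avoid log of zero or division by zero
    return math.log2(clamped / (1.0 - clamped))

def m_to_beta(m_value):
    """Convert an M-value back to the beta scale."""
    return 2.0 ** m_value / (2.0 ** m_value + 1.0)

for beta in (0.02, 0.5, 0.8, 0.98):
    print(beta, round(beta_to_m(beta), 3))
```

A β of 0.5 maps to M = 0, and β values near 0 or 1 are stretched out on the M scale, which is why the variance is more nearly constant across the range after transformation.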

450K Statistical Methods: Linear Models
To date, epidemiological analysis with 450K data has generally relied on linear modeling approaches similar to those for PSQ, only on a larger scale due to the increased number of CpGs interrogated. However, as algorithmic batch effect removal is often performed during 450K preprocessing, explicitly modeling batch as a random effect or additively as a model covariate may not be required. Several methodologies have been proposed for removal of batch effects (Fortin et al. 2014; Heiss and Brenner 2015; Storey 2007, 2008; Maksimovic et al. 2015; Pidsley et al. 2013; Teschendorff et al. 2011), and ComBat (Johnson et al. 2007) appears to be one of the most effective. When this is the case, an ordinary GLM can be used in cross-sectional analyses to determine the change in DNA methylation per unit change in an exposure of interest, adjusting for the key covariates explored above. In the longitudinal setting, again standard linear methods such as mixed effects or GEE models are appropriate (Figure 3, step 5).
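A cross-sectional analysis of this kind amounts to fitting the same covariate-adjusted linear model at every CpG and collecting the exposure coefficient. The sketch below shows that loop with ordinary least squares solved from the normal equations; the data, CpG labels, and function name are hypothetical, and it reports coefficients only (standard errors, p-values, and multiple-testing correction are omitted).

```python
def ols(design, response):
    """Least-squares coefficients via the normal equations and Gauss-Jordan elimination.

    design:   one row per sample, including a leading 1.0 for the intercept.
    response: one value per sample.
    Assumes the design matrix has full column rank.
    """
    p = len(design[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(row[a] * row[b] for row in design) for b in range(p)]
        + [sum(row[a] * y for row, y in zip(design, response))]
        for a in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * z for x, z in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]

# Hypothetical cohort of six children: exposure level and sex (0/1) as a covariate.
exposure = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
sex = [0, 1, 0, 1, 1, 0]
design = [[1.0, e, s] for e, s in zip(exposure, sex)]

# Hypothetical M-values per CpG; "cg_a" is constructed as 1.0 + 0.5*exposure - 0.2*sex.
m_values = {
    "cg_a": [1.0 + 0.5 * e - 0.2 * s for e, s in zip(exposure, sex)],
    "cg_b": [0.3, 0.1, 0.4, 0.2, 0.1, 0.3],
}

# Change in methylation (M-value) per unit exposure, adjusted for sex, at each CpG.
exposure_effect = {cpg: ols(design, y)[1] for cpg, y in m_values.items()}
print({cpg: round(b, 3) for cpg, b in exposure_effect.items()})
```

In practice the same loop runs over several hundred thousand CpGs, which is why correction for multiple testing becomes a central part of the analysis.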

450K Statistical Methods: limma-Based Estimators
In addition to ordinary regression performed with standard statistical software, use of the limma linear modeling Bioconductor package has become a popular option in 450K data analysis (Smyth 2005). The limma package has been incorporated into common 450K analysis pipelines (e.g., the "dmpFinder" function in minfi and the "champ.MVP" function in ChAMP) (Morris et al. 2014). The limma model allows for stable estimates when performing analysis with small sample sizes (Smyth 2005).
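The stability of limma estimates at small sample sizes is commonly attributed to empirical Bayes shrinkage of each CpG's residual variance toward a prior value shared across CpGs. The sketch below shows only that shrinkage step and the resulting moderated t-statistic. It is not the limma implementation: limma estimates the prior degrees of freedom `d0` and prior variance `s0_sq` from the data, whereas here they are hypothetical fixed inputs, as are all numeric values.

```python
import numpy as np

def moderated_t(coef, unscaled_sd, s2, df_resid, d0, s0_sq):
    """Moderated t-statistic from a posterior (shrunken) variance.

    coef:        per-CpG effect estimates
    unscaled_sd: per-CpG unscaled standard deviation of the coefficient
    s2:          per-CpG residual variances
    df_resid:    residual degrees of freedom
    d0, s0_sq:   prior degrees of freedom and prior variance (assumed given)
    """
    s2_post = (d0 * s0_sq + df_resid * s2) / (d0 + df_resid)
    t_mod = coef / (unscaled_sd * np.sqrt(s2_post))
    return t_mod, s2_post, d0 + df_resid

# Hypothetical 3-versus-3 comparison: the first CpG has an implausibly small variance
coef = np.array([0.2, 0.2])
s2 = np.array([0.0001, 0.04])
unscaled_sd = np.sqrt(1 / 3 + 1 / 3)
t_ord = coef / (unscaled_sd * np.sqrt(s2))
t_mod, s2_post, df_total = moderated_t(coef, unscaled_sd, s2,
                                       df_resid=4, d0=4.0, s0_sq=0.02)
```

In this example the first CpG's very small sample variance is pulled toward the prior, so its moderated statistic is far less extreme than the ordinary t-statistic.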

450K Statistical Methods: Causal Approaches
The most widely used approach to mediation analysis is the Baron and Kenny framework (Baron and Kenny 1986), which requires a series of regression models to determine whether a variable can be considered a mediator. This approach is hindered by its low power to detect an effect (Fritz and MacKinnon 2007). Further, the presence of mediation is indirectly inferred by looking at the relationship of a) the independent variable with the mediator and b) the mediator with the dependent variable rather than estimating the actual indirect effect itself (Hayes 2009). Parametric linear models are appealing in the context of array-based DNA methylation data analysis, but it may be preferable to implement semi- or nonparametric models that involve fewer assumptions. Two types of methodologies that have been applied to genomic and epigenomic studies are Targeted Minimum Loss-Based Estimation (TMLE) (Figure 3, step 6) and Mendelian Randomization.
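The series of regressions used in the Baron and Kenny framework can be sketched as follows, together with the indirect effect computed directly as a product of coefficients. This is a minimal illustration on simulated data with linear effects and no additional covariates; the variable names and effect sizes are hypothetical.

```python
import numpy as np

def ols(y, *predictors):
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated exposure -> mediator (e.g., methylation) -> outcome
rng = np.random.default_rng(1)
n = 500
exposure = rng.normal(size=n)
mediator = 0.5 * exposure + rng.normal(size=n)
outcome = 0.3 * exposure + 0.4 * mediator + rng.normal(size=n)

c_total = ols(outcome, exposure)[1]                # outcome ~ exposure
a = ols(mediator, exposure)[1]                     # mediator ~ exposure
_, c_direct, b = ols(outcome, exposure, mediator)  # outcome ~ exposure + mediator
indirect = a * b                                   # product-of-coefficients estimate
```

In this purely linear setting the total effect equals the direct effect plus the product a × b, so the indirect effect can be reported as an estimate in its own right rather than inferred from two separate associations.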
TMLE is a doubly robust, semiparametric efficient estimation method that is tailored to minimize bias and maximize precision, as proven by theory (Chambaz et al. 2011; Robertson 2005; van der Laan et al. 2007). It first obtains an initial estimate of the regression of the outcome on the target variable and the confounders, and then applies a targeted bias reduction step that incorporates an estimate of the propensity score. When SuperLearner is used for these estimation steps, it provides a substantial modeling advantage because it uses cross-validation to select the best weighted combination of estimators from a user-defined library of candidate estimators and has been shown to be theoretically and practically superior to any of the individual candidate estimators in the library (van der Laan and Dudoit 2003; van der Vaart et al. 2006). The model library can include as diverse a set of models as can be conceived by the analyst: for example, any flavor of linear model, spline-based techniques (Friedman 1991), regression tree algorithms such as Random Forest (Breiman 2001) or Bayesian Regression Trees (Chipman et al. 2010), or many others could all be used, each with many different tuning settings. The TMLE method can readily be implemented using the TMLE R package (Gruber and van der Laan 2012). Additionally, the TMLE theory has recently been optimized to perform similar estimation in the longitudinal setting (Petersen et al. 2014), and now a dedicated L-TMLE software package has also been released (Figure 3, step 6) (https://github.com/lendle/tmlecte).
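The two TMLE stages described above (an initial outcome regression followed by a targeting step that uses the propensity score) can be sketched for a binary exposure and binary outcome as follows. This is a simplified illustration, not the interface of the R package: plain logistic regression stands in for SuperLearner, the function names are hypothetical, and the data are simulated.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

def fit_logistic(X, y, offset=None, n_iter=50):
    """Logistic regression by Newton-Raphson; returns the coefficients."""
    offset = np.zeros(len(y)) if offset is None else offset
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta + offset)
        grad = X.T @ (y - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def tmle_ate(y, a, w, bound=0.01):
    """Targeted estimate of the average effect of binary a on binary y."""
    n = len(y)
    ones, zeros = np.ones(n), np.zeros(n)
    clip = lambda q: np.clip(q, 1e-6, 1 - 1e-6)
    # Stage 1: initial outcome regression Q(A, W)
    XQ = np.column_stack([ones, a, w])
    bq = fit_logistic(XQ, y)
    q_a = clip(expit(XQ @ bq))
    q_1 = clip(expit(np.column_stack([ones, ones, w]) @ bq))
    q_0 = clip(expit(np.column_stack([ones, zeros, w]) @ bq))
    # Propensity score g(W) = P(A = 1 | W), bounded away from 0 and 1
    Xg = np.column_stack([ones, w])
    g = np.clip(expit(Xg @ fit_logistic(Xg, a)), bound, 1 - bound)
    # Stage 2: targeting step, fluctuating Q along the "clever covariate"
    h_a = a / g - (1 - a) / (1 - g)
    eps = fit_logistic(h_a[:, None], y, offset=logit(q_a))[0]
    q_1_star = expit(logit(q_1) + eps / g)
    q_0_star = expit(logit(q_0) - eps / (1 - g))
    return float(np.mean(q_1_star - q_0_star))

# Simulated data with one confounder (all values hypothetical)
rng = np.random.default_rng(2)
n = 5000
w = rng.normal(size=n)
a = rng.binomial(1, expit(0.5 * w)).astype(float)
y = rng.binomial(1, expit(-0.5 + 1.0 * a + 0.8 * w)).astype(float)
ate = tmle_ate(y, a, w)

# Reference value from the known data-generating model
w_big = rng.normal(size=200000)
true_ate = float(np.mean(expit(0.5 + 0.8 * w_big) - expit(-0.5 + 0.8 * w_big)))
```

In practice the logistic fits would be replaced by a flexible learner library, which is where the advantage attributed to SuperLearner arises.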
TMLE is an optimal way to perform detailed mediation analysis. The mediating role expected for biological factors such as DNA methylation can be conceptualized as the natural indirect effect (NIE) described in the causal inference literature (Figure 3, step 6) (Lendle et al. 2013; Petersen et al. 2006). Under a counterfactual framework, the NIE is simply the difference between the natural direct effect (NDE), or the effect of the exposure on the outcome holding the intermediate variable at what would have been its value at a reference exposure level, and the total effect of the exposure on the outcome. Software to estimate each of these quantities (NIE, NDE, and the total effect) by TMLE has recently been made available in the tmlecte package (https://github.com/lendle/tmlecte).
The Mendelian randomization approach has been utilized in epidemiologic studies as another methodology for causal inference (Davey Smith and Hemani 2014; Davey Smith 2012, 2015). It relies on use of genetic polymorphisms that are a) highly associated with the modifiable intermediate but b) not associated with the health outcome of interest except through that intermediate. The strength of this approach is that the estimate of the relationship of the highly correlated genetic variant with the outcome of interest is less prone to biases related to unmeasured confounding and reverse causation. Mendelian randomization has also been applied to epigenomic studies (Binder and Michels 2013; Richmond et al. 2016). To study mediation in particular, a two-step process has been described (Relton and Davey Smith 2012). The first step involves identification of a genetic variant that is strongly associated with the environmental exposure of interest (e.g., smoking, phthalates). Next, a genetic proxy highly associated with DNA methylation (e.g., at a CpG site or region) is also utilized. From there, the causal relationships between the exposure and the intermediate and also between the intermediate and the outcome can be estimated. Limitations of this approach include the requirement of larger sample sizes and the potential for genetic confounding that can be introduced by population structure (Relton and Davey Smith 2015).
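The logic of using a genetic variant as a proxy can be illustrated with the simplest single-instrument estimator, the ratio of the variant-outcome association to the variant-intermediate association (often called the Wald ratio). The sketch below is an assumption-laden illustration on simulated data, not the two-step procedure itself; the allele frequency, effect sizes, and variable names are hypothetical.

```python
import numpy as np

def slope(predictor, response):
    """Simple regression slope of response on predictor."""
    c = np.cov(predictor, response)
    return c[0, 1] / c[0, 0]

# Simulated data with an unmeasured confounder u (all values hypothetical)
rng = np.random.default_rng(3)
n = 20000
g = rng.binomial(2, 0.3, size=n).astype(float)  # genotype coded 0/1/2
u = rng.normal(size=n)                          # unmeasured confounder
x = 0.5 * g + u + rng.normal(size=n)            # intermediate (e.g., methylation)
y = 0.4 * x + u + rng.normal(size=n)            # outcome; true effect of x is 0.4

naive_est = slope(x, y)                # confounded observational estimate
wald_est = slope(g, y) / slope(g, x)   # instrument-based ratio estimate
```

Because the genotype is independent of the confounder in this simulation, the ratio estimate recovers the true effect of 0.4 on average, whereas the naive regression of the outcome on the intermediate is biased upward.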

450K Statistical Methods: DMRs
As DNA methylation analysis proceeds, researchers have increasingly focused on identifying differentially methylated regions (DMRs), also known as regions of altered methylation. DMRs are of interest for two reasons: a) CpG sites are not expected to function independently, but rather in groups to regulate gene expression, and b) observed differences in methylation at individual sites are more likely to be believed if neighboring sites show similar changes. Due to the increasing interest, approaches for DMR identification have proliferated in the last few years (Butcher and Beck 2015; Jaffe et al. 2012; Pedersen et al. 2012; Peters et al. 2015; Sofer et al. 2013). An overview of currently available methods is shown in Table 1. These fall into two conceptual categories: a) those that perform individual CpG analysis first and then combine results into DMR groupings (Butcher and Beck 2015; Jaffe et al. 2012; Pedersen et al. 2012; Peters et al. 2015), and b) those that group CpGs first and draw inference after the fact (Sofer et al. 2013). In the first group, measures of site-level results (e.g., an effect size or p-value) are typically aggregated across genomic coordinates according to smoothing functions, correlation structure, and/or genomic annotation, followed by drawing statistical inference on putative DMRs according to method-specific definitions. The second approach, of which aclust is the only current example, applies a clustering algorithm to reduce dimensionality prior to performing statistical tests of association.
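The site-first category described above can be illustrated with a deliberately simple sketch that groups neighboring CpGs with small site-level p-values into candidate regions. It does not reproduce any of the cited methods, which use smoothing, correlation structure, or annotation and also perform region-level inference; the thresholds `p_cut`, `max_gap`, and `min_sites` are hypothetical.

```python
def call_candidate_dmrs(positions, pvalues, p_cut=0.05, max_gap=500, min_sites=3):
    """Group adjacent CpGs with p < p_cut into candidate regions.

    positions must be sorted in ascending order on a single chromosome.
    Returns a list of (start, end, number_of_sites) tuples.
    """
    regions, current = [], []
    for pos, p in zip(positions, pvalues):
        if p < p_cut and (not current or pos - current[-1] <= max_gap):
            current.append(pos)          # extend the open region
        else:
            if len(current) >= min_sites:
                regions.append((current[0], current[-1], len(current)))
            # start a new region only if this site itself passes the cutoff
            current = [pos] if p < p_cut else []
    if len(current) >= min_sites:
        regions.append((current[0], current[-1], len(current)))
    return regions

# Hypothetical site-level results on one chromosome
positions = [100, 250, 400, 2000, 2100, 2150, 2300, 5000]
pvalues = [0.001, 0.01, 0.02, 0.5, 0.003, 0.04, 0.01, 0.001]
dmrs = call_candidate_dmrs(positions, pvalues)
```

Even in this toy version, the called boundaries depend entirely on the chosen thresholds, which is one reason that region definitions differ across methods.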
Although several DMR-finding packages exist, this field is still early in its development, and several aspects of method performance require additional characterization. This includes additional validation of the functional impact of identified DMRs in terms of gene expression (Yuan et al. 2015). Further, sensitivity analysis on DMR calls has been rare to date. For example, for site-first-type approaches little is known about how effect-size outliers may drive the dimensions of called DMRs. Similarly, the stability and accuracy of DMR boundaries have not been sufficiently evaluated. Another obstacle that all DMR-finding methods must confront is how to appropriately adjust for multiple comparisons, because it is often difficult to determine what constitutes an "independent" test.
DMR finding in the context of longitudinal cohorts, especially those involving infants and children, raises still further considerations. Foremost is the issue of the temporal stability of DMRs called by existing methods. Although much attention has been devoted to age-related changes for individual CpGs, this topic has only just begun to be explored at the level of DMRs in studies involving children (Yuan et al. 2015).
Overall, many of the obstacles faced in developing robust DMR-finding algorithms stem from the lack of a clear definition for DMRs. This can be especially problematic in the sparse-data scenario of array-based DNA methylation analysis where many of the useful data are missing. However, as data from WGBS become increasingly available and DMR functional characterization proliferates, these methods are likely to improve.

Data Integration and Visualization
Following quality control, data processing, and statistical analyses, visualization of descriptive data and analysis results can be implemented using a variety of approaches. Typically packages in R can be used as well as independent coding or use of general graphics tools. Common useful plots for visualizing DNA methylation data include a) pairwise correlation of methylation values across CpGs according to genomic location; b) Manhattan plots displaying -log10(p-values) from statistical analysis according to genomic location of CpGs; c) general heat maps to display correlation of methylation values and/or coefficients from statistical models; and d) lollipop-like visualization to compare methylation values across samples, tissues, or other categories. Approaches implemented depend on the type of data analyzed.
R packages that can implement some or all of the above include MethVisual (Zackay and Steinhoff) and others (Table 2). Most of these enable implementation of site-level as well as region-level DNA methylation analysis based on the 450K array including analysis pipeline and processing steps. Although most are implemented with R code, some tools such as coMET and MethTools offer a Shiny web service that can be used as an alternative to the programming method for generating plots, increasing the opportunity for use by researchers working outside of R.

Approaches for Validating/Replicating Loci that Emerge as Top Hits from Primary Analysis
To understand the likelihood that technically and biologically "real" associations have been identified between an environmental exposure and differences in DNA methylation, several approaches for validating or replicating results can be employed. These include technological or platform validation, comparing results with other results published in the literature, replication using a different population, and meta-analysis. Technological validation typically involves using another platform, such as PSQ if results were originally generated on the 450K, to measure DNA methylation of a handful of CpG sites of interest in the same population in which the original associations were identified. Many individual CpG sites on the 450K array appear to cross-validate well with PSQ (Roessler et al. 2012). Correlation coefficients can then be computed to directly compare the two measurements in the same individuals.
Perhaps the ideal approach for replicating environmental exposure-CpG methylation associations would be to conduct the exact same methylation measurements in a separate yet comparable population with similar measures of environmental exposure. The same statistical modeling approach can be employed in both populations, making direct comparison of results, including magnitudes and direction of effect, feasible. The disadvantages of this approach are the difficulty of identifying a comparable population and the time and costs associated with conducting the replication measurements. A good example of this approach is in the paper by Joubert et al. (2012) in which CpG loci associated with maternal smoking were initially identified using the 450K platform in the Norwegian Mother and Child Cohort study (MoBa), and then 26 significant loci were assessed in a separate 450K analysis in the Newborn Epigenetics STudy (NEST). In both cohorts, the platform was the same, methylation was measured in cord blood, exposure was categorized in a similar way (any smoking by the mother during pregnancy), Caucasian/European ancestry participants were included in the analyses (subset of NEST), and the statistical model and covariates were aligned. This approach also has been used in several studies that first identified CpG sites using arrays, and then validated the loci using PSQ (Breton et al. 2009; Devaney et al. 2015; Lazarus et al. 2015).
An alternative approach for large studies is to split the population into a discovery group and a replication group. A question of adequate sample size for the replication study often also arises. For practical considerations, often the replication population is smaller than the original population (Argos et al. 2015;Joubert et al. 2012). However, the proportion exposed should also be taken into account. For example, the NEST population (n = 36) used for replication of the MoBa findings included 18 smokers (50% exposed) and 18 nonsmokers (50% unexposed), which enhanced statistical power given the relatively small sample size (Joubert et al. 2012). Although there are no standard guidelines in place when choosing a replication analysis, a strategy that is anticipated to achieve adequate statistical power to detect the observed effect size is warranted. Overall, this approach has been successfully used and greatly enhances the confidence in observed results when the original results are replicated.
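The point that the proportion exposed affects power in a small replication sample can be made concrete with a normal-approximation power calculation. The sketch below is an illustration only: the standardized effect size of 1.0 and the unbalanced comparison design are hypothetical, and a two-sided two-sample z-test is assumed rather than the models used in the cited studies.

```python
from statistics import NormalDist

def two_group_power(effect_size, n_exposed, n_unexposed, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    effect_size is the mean difference in units of the common standard deviation.
    """
    z = NormalDist()
    se = (1 / n_exposed + 1 / n_unexposed) ** 0.5
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size / se
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Same total sample size (n = 36), different proportions exposed
balanced = two_group_power(1.0, 18, 18)    # 50% exposed, as in the NEST replication
unbalanced = two_group_power(1.0, 4, 32)   # hypothetical design with about 11% exposed
```

Under these assumptions the balanced design has roughly 85% power and the unbalanced design less than 50%, although the total sample size is identical.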
Last, in recent years the creation of large consortia in which like datasets are combined in a harmonized fashion to increase the power to detect associations has gained appeal. Several consortia with a focus on epigenetics have been formed including many GWAS (genome-wide association studies) consortia [CHARGE (Childhood Autism Risks from Genetics and the Environment), WHI (Women's Health Initiative), GIANT (Genetic Investigation of ANthropometric Traits), others], some of which also have DNA methylation data for adults (CHARGE), and newborns and children (PACE). The Pregnancy and Child Epigenetics Consortium (PACE) was created in 2013, and now combines data sets for > 20 cohorts. Recently, a first PACE paper focused on the effects of maternal smoking on the 450K data in the cord blood from 13 participating cohorts has been published (Joubert et al. 2016). It has identified 6,073 loci differentially methylated at genome-wide significance including 2,965 CpGs that are novel, which is orders of magnitude more loci than identified in any previous study on effects of maternal smoking. Remarkably, it has also replicated most of the main results previously found in individual studies.
Consortium analyses can be extremely powerful in answering a variety of study questions, depending on the availability of exposures and end points measured in the consortium participants. Consortium analyses typically require each study to independently implement a common analysis protocol and provide the results to a central location for meta-analysis. This can accommodate multiple studies, many more than replication analyses, and may be more stable to population heterogeneity, depending on the participants. The ability to accommodate a greater number of studies, increasing sample sizes into the thousands, has substantial impact on statistical power. The approach also promotes data sharing, as often required by the National Institutes of Health (NIH). However, strong coordination and communication across research groups are required to carry out successful meta-analysis, which often requires greater work "up-front" than simpler replication analyses.
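The central meta-analysis step described above can be sketched with a fixed-effect, inverse-variance-weighted combination of cohort-level results for a single CpG. The weighting scheme is an assumption made for illustration, because the text does not specify which meta-analytic model a given consortium uses, and the cohort estimates below are hypothetical.

```python
import numpy as np

def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance-weighted pooled estimate and its standard error."""
    estimates = np.asarray(estimates, dtype=float)
    weights = 1.0 / np.asarray(std_errors, dtype=float) ** 2
    pooled = np.sum(weights * estimates) / np.sum(weights)
    pooled_se = np.sqrt(1.0 / np.sum(weights))
    return pooled, pooled_se

# Hypothetical per-cohort methylation differences (beta scale) for one CpG
cohort_estimates = [-0.020, -0.015, -0.025]
cohort_ses = [0.005, 0.010, 0.005]
pooled, pooled_se = fixed_effect_meta(cohort_estimates, cohort_ses)
z_score = pooled / pooled_se
```

The pooled standard error is smaller than that of any single cohort, which is the source of the gain in power when many studies contribute results.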
Regardless of approach, not all loci will replicate. There are a number of reasons why replication may not be achieved, though it is often difficult to discern the precise reason for any given analysis. Possible reasons for failure to replicate include a) the original result was a false positive, b) technical or biological differences in the laboratory measurement of DNA methylation introduced a bias or measurement error, c) exposure assessment varied between studies, or d) the statistical approach differed between the original and replication analyses. In fact, epigenetics studies may have stricter replication requirements compared with studies with genotyping data (GWAS) due to technical and true variation across study populations. Nevertheless, studies demonstrating lack of replication provide important information (Oliver et al. 2013; Wei et al. 2012), reduce publication bias, and may improve interpretation of complex data.
[Table 2 fragment: Data analysis and visualization; R; appropriate for various types of DNA methylation data; specific to analysis and data needs; independent of data input and format requirements of packages but may require more analysis time and skill compared to other methods.]

Magnitudes of Effect
The goal of epigenetic studies linking environmental exposures and children's health is to aid in the understanding of how environmental factors can influence health phenotypes at birth and over the course of a lifetime. Thus, it is important not only to identify valid and replicable variation in DNA methylation or other epigenetic mechanisms with environmental factors or outcomes, but to begin to consider how this variation can be contributing to phenotypes. Understanding the functional importance of environment-associated DNA methylation variation is challenged by the generally small to moderate differences being observed in relation to various environmental exposures. Initial studies of in utero exposure and DNA methylation in offspring focused on repetitive element DNA methylation, as a marker of global DNA methylation status. For example, in a Bangladeshi cohort, comparing the highest to lowest quartiles of maternal urinary arsenic was associated with increased LINE-1 methylation of 1.36% [95% confidence interval (CI): 0.52, 2.21%] (Kile et al. 2012). Among Mexican-American children in rural California, a 1-log increase in maternal serum o,p´-DDT levels was associated with a reduced Alu methylation of 0.37% (Huen et al. 2014). Contrast these differences with the reductions that could be observed comparing pathologically normal and tumor tissues, where differences can be 5-20% for LINE-1 (Cho et al. 2010; Matsuda et al. 2012; Stricker et al. 2012; Zhang et al. 2012) and 5-10% for Alu (Cho et al. 2010; Matsuda et al. 2012). In cancer, this marked hypomethylation of repetitive elements is thought to contribute to widespread genomic instability, which is a hallmark of most malignancies, but the functional importance of relatively small differences in these repetitive elements observed in nonpathologic tissues remains an outstanding question (reviewed by Nelson et al. 2011).
Studies focused on exposure-associated differences in the methylation status of specific candidate genes, as well as more recent epigenome-wide association studies, have commonly found only small effect estimates in regard to differences in methylation by exposure. The differences in methylation observed between groups of exposed versus unexposed individuals, or in relation to some exposure, are generally on the scale of 2-10%, although in some cases even smaller differences have been reported (Table 3). What is striking is that in many cases strong statistical significance (i.e., small p-values) is reported with these small differences, suggesting that there is little variability in the measured values. In a number of cases, these differences have been validated in different study populations and even among different ages. This is particularly true for the work that has been done linking maternal smoking during pregnancy and DNA methylation in infant blood, further suggesting the robustness of these relatively small effects (Joubert et al. 2012; Knopik et al. 2012; Lee et al. 2015; Markunas et al. 2014; Richmond et al. 2015).
One of the most common ways to determine the functional consequence of an observed change in methylation is to study the impact of methylation on gene transcription. Made more powerful by simultaneous extraction and analysis of DNA and RNA from the same cell populations, DNA methylation levels can be correlated with the RNA levels to determine if there is a positive, a negative, or no correlation. In most cases, DNA methylation in gene promoters is negatively associated with transcription, whereas methylation in gene bodies is positively correlated with expression (Ball et al. 2009), consistent with the known effects of DNA methylation on chromatin condensation and transcriptional activity.
Small changes in methylation can have a strong effect on transcriptional activity. Analysis of the imprinted insulin-like growth factor II (IGF2) gene in umbilical cord blood determined that for every 1% change in methylation at the IGF2 differentially methylated region, there was a halving (increased methylation) or doubling (decreased methylation) of IGF2 transcription. This change is equivalent to what would be expected if this gene had a complete loss of imprinted expression. The scale of this change is also equivalent to what is often observed in cancer due to loss of imprinting. Another study examining associations between mercury exposure (measured from toenails) and DNA methylation in placenta as this relates to neurodevelopmental outcomes found over 300 CpGs that had methylation differences greater than ~12.5%, comparing tertiles. The methylation levels of the CpGs analyzed in EMID2 were also moderately inversely correlated with transcription (correlation coefficients, -0.33 to -0.45). Study of DNA methylation associated with arsenic exposure in blood also identified correlations between methylation and expression for 28 CpGs, of which about one-third were positively correlated and one-third negatively correlated with expression (Argos et al. 2015). The remainder had multiple gene expression probes associated with each CpG, with the gene probes showing both positive and negative correlations with expression.
It is important to note that beyond the potential functional ramifications for changes in DNA methylation, the covalent nature of this molecular modification and its mitotic heritability provide a means to utilize the particular changes, alone or in combination, as biomarkers of a) past exposure, b) disease risk, or c) disease presence. DNA methylation-based tests are already in use for detection of colorectal carcinoma (e.g., Cologuard®; Exact Sciences, Madison, WI), and are currently being developed for a number of other types of malignancies. Other methylation changes may be able to predict risk of developing a disease (Cui et al. 2003), information useful for implementation of strategies to reduce risk. Methylation changes may also provide biological documentation of historical exposures or adverse conditions, such as that reported for the individuals subjected to famine conditions in utero during the Dutch Hunger Winter in the 1940s, in which exposure was associated with small but significant changes in methylation that were detectable in peripheral blood leukocytes six decades past the exposure (Heijmans et al. 2008).

Genomic Contributions to DNA Methylation Variation
It is increasingly apparent that future investigations in environmental epigenetics will also have to consider genomic context. In a study by Soto-Ramirez et al. (2013), the IL-4R SNP rs3024685 carried a significant risk for asthma only when controlled for IL-4R methylation. In a study of children ages 2-4 years in Spain, researchers showed that hypomethylation of a CpG site in the arachidonate 12-lipoxygenase gene not only correlated with wheezing, but also correlated with the genotype for haplotype-tagging SNP rs312466 (Morales et al. 2012). Genomic variation in the promoter of the nitric oxide synthase (NOS2) gene in combination with air pollution exposure affected iNOS methylation levels (Salam et al. 2012). Specifically, increased 7-day average PM2.5 (particulate matter ≤ 2.5 μm) exposure was associated with lower iNOS methylation, NOS2 promoter haplotypes were globally associated with NOS2 promoter methylation, and there was a 3-way interaction among one common promoter haplotype, iNOS methylation level, and PM2.5 exposure on exhaled nitric oxide levels. A recent study of paraoxonase gene PON1 demonstrated how one can characterize multiple sources of variability (genetic, epigenetic, and expression) to determine important modulators of candidate susceptibility genes. Using causal mediation analysis, the study provided evidence that DNA methylation mediates the relationship between PON1 -108 genotype and PON1 expression measured by arylesterase activity (Huen et al. 2015).
Another example of the influence of underlying genetic variation was seen in the Brisbane Systems Genomics Study family cohort, which determined that the genetic contribution to CpG methylation state was highly variable and was dependent on degree of heritability. The effect size of such highly heritable cis-acting SNPs explained 50-85% of the variation in methylation at these sites (Shah et al. 2014). The importance of incorporating both genetic and environmental covariates in longitudinal study design was illustrated by Shah et al. (2014) in the Lothian Birth Cohort, in which single nucleotide variation was associated with CpG methylation in 12/37 (32%) of CpG sites that had previously been identified as strongly associated with smoking exposures. A further evaluation of the two CpG sites with highest repeatability and heritability found underlying SNP effects that explained 10% of the methylation variation, which was similar to the original effect size of smoking (Shah et al. 2014).
In this case, estimates of both genetic and environmental contributions are significantly associated with CpG methylation variation and drift or lack of drift over time.

Tissue or Cell Type Specific Effects
Most studies of the environmental impact on epigenetics in a children's health context are using accessible biological samples, including peripheral or cord blood, placenta, or buccal samples. These samples are constituted by a heterogeneous collection of cells. The differences in extent of DNA methylation observed between exposure groups or outcomes thus represent the fraction of the alleles within that given heterogeneous sample which demonstrate methylation. Essentially there is a dilution effect for the magnitude of changes or differences in methylation amongst this sample. To avoid this, one suggestion might be to try to reduce the heterogeneity by enriching for certain cell populations. For example, in blood, one could focus on a specific lymphocyte subtype, such as CD4+ cells, which could be isolated using magnetic bead or FACS (fluorescence-activated cell sorting) technology. Although a desirable approach, there are still some limitations which need to be considered. First is the selection of the cell of interest, which often is not known or which may differ depending on the type of phenotype being interrogated. Second, even technically proficient cell enrichment does not lead to a perfectly homogeneous cell population (even within a given cell type, there are separate clonal outgrowths derived from different stem cell populations), so dilution of the effect may still be an issue. The technical difficulty of this type of enrichment also cannot be overlooked. In blood and most tissues, such purification is really only possible with freshly collected samples, because intact cell membranes and the cell type specific epitopes on those membranes are required for isolation. In addition, although FACS approaches could allow for multiple cell types to be isolated simultaneously, this requires significant expertise and appropriately validated, reproducible, reliable antibodies that can be used to select cell populations.
This makes applying such enrichment techniques technically challenging in most existing cohort studies, because these studies are making use of archived samples, which can no longer be subjected to such enrichment. Despite these advances, even in EWAS (epigenome-wide association studies) controlling for cell composition, findings of specific differentially methylated loci or genes associated with exposure or outcomes may still represent cellular composition effects. An example might be activation of specific leukocytes (e.g., NK cells, monocytes) to their active forms. Although these cells may still exhibit similarities in their surface moieties, at the DNA level, methylation may be involved in these final stages of differentiation. If environmental factors drive these differentiation processes, they might be observed as differentially methylated loci. A recent study by Bauer et al. (2015) demonstrated this possibility, identifying a specific T-cell subset characterized by hypomethylation of cg19859270, within the GPR15 gene, a locus that has repeatedly been identified to be hypomethylated amongst smokers. Although this does lead to different interpretations of findings, these findings are nonetheless important, and in fact might provide a better understanding of the functional impact of observed differential methylation.
Although identifying such tissue-specific effects may be important in indicating changes in the cellular landscape related to environmental exposures, there still remains an outstanding question of whether there can be environmentally induced epigenetic changes that could be more broadly identified across tissues. Such findings in humans would parallel those observed in the murine agouti models, where early developmental effects can lead to widespread epigenetic alterations, which in those cases leads to coat color and metabolic effects in the animals (Bernal and Jirtle 2010; Dolinoy et al. 2006, 2007; Jirtle 2014). These effects are specifically observed at regions of hypervariable methylation, known as metastable epialleles, which would represent genomic regions that demonstrate low within-person (across tissue) variability in DNA methylation, but higher between-person variability. These loci would be particularly sensitive to environmental insults during the early cleavage, gastrulation, and initial embryonic stages, allowing for the consistency of the methylation status across various tissues from different embryonic lineages. A recent genome-wide scan using bisulfite sequencing revealed the presence of approximately 100 of these metastable epiallelic regions in the human genome, and found that one in the genomically imprinted VTRNA2-1 noncoding RNA was environmentally labile, being affected by the nutritional availability during the conception and early gastrulation period in a number of different cohorts examined (Silver et al. 2015). Additional studies focused on these potentially environmentally labile regions are warranted and may provide the opportunity to demonstrate true epigenetic changes linked to environmental exposures experienced during the earliest points of development.
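The dilution effect described earlier in this section for heterogeneous samples can be made concrete with simple arithmetic: if only one cell type responds to an exposure, the difference measured in the bulk sample is the cell-type-specific difference scaled by that cell type's share of the sample. The sketch below assumes that cell proportions do not themselves differ between exposure groups; the cell fractions and the 20% difference are hypothetical.

```python
def bulk_methylation_difference(cell_fractions, cell_type_differences):
    """Expected bulk difference as a fraction-weighted sum over cell types.

    cell_fractions:        mapping of cell type -> proportion of the sample
    cell_type_differences: mapping of cell type -> methylation difference
                           (exposed minus unexposed) within that cell type;
                           cell types not listed are assumed unchanged
    """
    return sum(frac * cell_type_differences.get(cell, 0.0)
               for cell, frac in cell_fractions.items())

# Hypothetical blood sample in which only CD4+ T cells respond to the exposure
fractions = {"CD4+ T cells": 0.15, "other leukocytes": 0.85}
differences = {"CD4+ T cells": 0.20}   # 20% difference within CD4+ cells
observed = bulk_methylation_difference(fractions, differences)
```

In this example a 20% difference confined to a cell type that makes up 15% of the sample appears as a 3% difference in the bulk measurement, which is within the range of effect sizes commonly reported.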

Epigenome Editing
The development of technologies for locus-specific epigenome editing remains a central challenge in functional genomics, with future applicability to children's environmental health. Developing these technologies may allow for highly targeted assessments of the functional significance of novel findings of altered DNA methylation or histone posttranslational modifications. Many current technologies act globally and cannot target individual loci. For example, pharmaceutical agents, such as azacytidine, are widely used to inhibit DNA methyltransferases, resulting in global hypomethylation in dividing cells (Yang et al. 2010). An advantage of global approaches lies in their well-characterized use as human therapeutics and for basic research in cell lines and animals. Disadvantages, however, include their pleiotropic effects caused by indiscriminate epigenomic activity and propensity to affect biochemical pathways separate from the epigenome.
New methods of locus-specific epigenetic editing have been recently developed that rely upon transgenic technologies. For example, fusions of epigenome-modifying enzymes to programmable DNA-binding proteins hold promise for targeting DNA methylation (Maeder et al. 2013) as well as histone acetylation (Hilton et al. 2015) and epiproteomes (Waldrip et al. 2014) at specific loci; but they have drawbacks, for example, because every zinc-finger domain must be custom evolved to target a specific sequence, and target motifs are size limited. One recent innovation in the field of target-specific DNA methylation is the development of a suite of tools, based on the Piwi-interacting RNA (piRNA) system, to accurately induce DNA methylation of targeted loci in adult tissues (work presently being done under NIH grant ES026877; https://directorsblog.nih.gov/tag/pirna/). The major strength in the piRNA approach is that induced changes in DNA methylation will be propagated by endogenous epigenetic maintenance pathways. Thus, piRNA treatment for both laboratory and clinical use will be acute and systemic, rather than chronic with potentially decreasing effectiveness.

Gains from Longitudinal Studies
Although most epigenomic studies to date have been cross-sectional, the prospect of longitudinal studies holds much promise. For example, the first integrative personal 'omics profiling (iPOP) efforts in 2012 revealed significant dynamic 'omics changes in peripheral blood mononuclear cells (PBMCs) and serum from one generally healthy individual, demonstrating that these comprehensive molecular portraits reflected real-time physiological states and physiological state changes in this individual (Chen and Snyder 2013). An important lesson from this personalized medicine proof-of-principle study is that each individual is his or her own best control over time. Different individuals have different baselines, and intrapersonal changes may be masked by interpersonal differences when using a case-control design. Mouse models such as the one by Kanzleiter et al. (2015) have also demonstrated longitudinal methylomic differences in skeletal muscle cells in response to exercise training. The authors reported 2,762 differentially methylated genes associated with exercise training, and ~13% of these methylomic differences were also associated with differential expression of the corresponding genes. The majority of the affected genes function in muscle growth and differentiation, as well as in metabolic regulation.
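The point that intrapersonal changes may be masked by interpersonal differences can be illustrated with a small simulation. The sketch below is purely illustrative and is not drawn from any of the studies cited here: the baseline spread, the 2% shift in methylation, the measurement noise, and the sample size are all hypothetical values chosen for demonstration. It compares a test statistic that treats two time points as independent groups with one that uses each person as his or her own control.

```python
import math
import random
import statistics

def simulate(n=30, baseline_sd=0.10, shift=0.02, noise_sd=0.005, seed=1):
    """Return (before, after) methylation fractions for n hypothetical people.

    All parameter values are assumptions made for illustration only.
    """
    rng = random.Random(seed)
    before, after = [], []
    for _ in range(n):
        baseline = rng.gauss(0.50, baseline_sd)  # person-specific baseline
        before.append(baseline + rng.gauss(0, noise_sd))
        after.append(baseline + shift + rng.gauss(0, noise_sd))
    return before, after

def unpaired_t(x, y):
    """Welch t statistic treating the two time points as independent groups."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(y) - statistics.mean(x)) / se

def paired_t(x, y):
    """Paired t statistic using each person as his or her own control."""
    diffs = [b - a for a, b in zip(x, y)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

before, after = simulate()
t_unpaired = unpaired_t(before, after)
t_paired = paired_t(before, after)
print(f"unpaired t = {t_unpaired:.2f}, paired t = {t_paired:.2f}")
```

Under these assumptions the same small shift yields a much larger paired statistic than unpaired statistic, because the between-person variance in baseline cancels out of the within-person differences.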

Moving beyond DNA Methylation
Population-based studies have focused predominantly on DNA methylation as the epigenetic mark of choice. However, other epigenetic marks, including chromatin modifications, microRNAs (miRNAs), and noncoding RNAs, warrant further consideration as the technological and economic hurdles of assessing these marks in large numbers decrease.
Chromatin modifications have long been identified as important epigenomic markers and have been associated with multiple diseases such as cancer (Singh et al. 2015; Su et al. 2015), diabetes, and obesity (Schones et al. 2015). Different sequencing methods have been developed to probe high-dimensional chromatin structures (Rao et al. 2014) as well as chromatin-transcription factor interactions (Kellis et al. 2014). All these epigenomic factors may affect downstream gene expression and regulation, which might further lead to changes in physiological states.
In recent years miRNAs have emerged as another epigenetic regulatory mechanism that may play a role in disease onset/pathology by regulating protein interactions. The role of miRNA regulation in cancer is well established. More recent studies show associations with other diseases, particularly allergic diseases such as asthma and atopic dermatitis (Chen and Qiao 2015; Kan et al. 2015; Knopik et al. 2012; Lv et al. 2014; Omran et al. 2013; Perry et al. 2015; Salam 2014). The majority of these studies have identified miRNAs as potential biomarkers (Kan et al. 2015; Li JJ et al. 2015; Lv et al. 2014; Sawant et al. 2015; Simpson et al. 2014). Multiple in vitro and animal studies indicate that miRNAs have a role in asthma development and pathogenesis. The 3´ UTR of the asthma susceptibility gene HLA-G is targeted by three different miRNAs: miR-148a, miR-148b, and miR-152 (Tan et al. 2007). Multiple miRNAs have been implicated in playing a proinflammatory role in asthma (Kumar et al. 2010; Lu et al. 2009; Mattes et al. 2009; Polikepahad et al. 2010). In a recent study in pediatric asthma patients, Nakano et al. (2013) showed a role for hsa-mir-15a in altering VEGF-a expression in peripheral CD4 T cells. Pediatric subjects with asthma had lower expression of hsa-mir-15a in their CD4 T cells, which was associated with higher expression of VEGF-a. More in-depth mechanistic studies are needed to understand how miRNAs can modulate protein expression and thereby affect downstream immune mechanisms in normal and disease conditions. Taken together, these studies show an important role for miRNA regulation in chronic childhood allergic diseases such as asthma and atopic dermatitis, and warrant further investigation into the role of these miRNAs in regulating the immune system.
Hydroxymethylation has recently been shown to potentially carry biological functions, rather than being merely an intermediate product of 5-methylcytosine demethylation (Hahn et al. 2014; Shen et al. 2014). DNA hydroxymethylation has been found to be involved in transcription and chromatin regulation (Iurlaro et al. 2013) and to contribute to olfactory neuron cellular identity (Colquitt et al. 2013) and to monocyte-osteoclast differentiation (de la Rica et al. 2013; Klug et al. 2013), and the loss of 5-hmC has been reported to be an epigenetic hallmark of melanoma (Lian et al. 2012). Therefore, the DNA hydroxymethylome could well serve as another epigenomic profile that can provide mechanistic insights into health and disease. As with DNA methylation, measured effect sizes of these alternative epigenetic marks may also be small, and warrant inclusion in the broader discourse about interpretation of such small differences associated with exposures.

Data Integration
As 'omics data grow, the need for computationally efficient methods of integrating these data sets to better predict disease risk or to better explain the biological systems underlying disease has reached a critical juncture. This need is evident in recent publications proposing various sophisticated bioinformatics strategies to integrate the variety of epigenomic and other 'omics data sets produced by scientists around the world (Génin and Devoto 2015; Gomez-Cabrero et al. 2014; Pineda et al. 2015; Saha et al. 2014; Wachter and Beißbarth 2015; Zierer et al. 2015). In addition, large consortium efforts such as the NIH Roadmap Epigenomics Mapping Consortium curate data on DNA methylation, mRNA expression, and changes in histones and in chromatin accessibility, annotating these data across a sweeping array of human cell types and creating genome-wide annotation maps. In turn, these maps can be used to produce novel studies of epigenomic changes in development and disease, as well as of the relations among genomic and epigenomic variations (Roadmap Epigenomics Consortium et al. 2015). This type of data warehouse is a valuable tool that can inform not only data integration efforts, particularly from a systems biology perspective, but also in silico data validation efforts as discussed earlier.

Conclusion
Our objective in this review was to discuss the state of the science in environmental epigenetics research within the broader context of children's environmental health. We have presented a review of the technological tools available for assessing epigenetic marks, methods for data analysis and visualization, and methods for functional follow-up of identified loci. We note that a common finding in environmental epigenetics studies is the small magnitudes of effect that result from environmental exposures. Although it is reasonable and necessary that we question the relevance of such small effects, we present examples in which small effects persist and have been replicated across populations and across time. We encourage a critical discourse on the interpretation of such small changes and further research on their functional relevance for children's health and adult disease susceptibility. It may be the case that we do not find larger effect sizes, not because they do not exist, but rather because such large shifts may be incompatible with continued development.
Children's environmental health research has made great strides in recent years; yet it is clear that the dynamic nature of the epigenome will require an emphasis on future longitudinal studies in which the epigenome is profiled over time, over changing environmental exposures, and over generations to better understand the multiple ways in which the epigenome may respond to environmental stimuli. Such longitudinal studies will improve our ability to identify small changes and to assess the consistency of these changes across time and in relation to specific events across development and into adulthood.