Cadmium, Smoking, and Human Blood DNA Methylation Profiles in Adults from the Strong Heart Study

Background: The epigenetic effects of individual environmental toxicants in tobacco remain largely unexplored. Cadmium (Cd) has been associated with smoking-related health effects, and its concentration in tobacco smoke is higher in comparison with other metals. Objectives: We studied the association of Cd and smoking exposures with human blood DNA methylation (DNAm) profiles. We also evaluated the implication of findings to relevant methylation pathways and the potential contribution of Cd exposure from smoking to explain the association between smoking and site-specific DNAm. Methods: We conducted an epigenome-wide association study of urine Cd and self-reported smoking (current and former vs. never, and cumulative smoking dose) with blood DNAm in 790,026 CpGs (methylation sites) measured with the Illumina Infinium Human MethylationEPIC (Illumina Inc.) platform in 2,325 adults 45–74 years of age who participated in the Strong Heart Study in 1989–1991. In a mediation analysis, we estimated the amount of change in DNAm associated with smoking that can be independently attributed to increases in urine Cd concentrations from smoking. We also conducted enrichment analyses and in silico protein–protein interaction networks to explore the biological relevance of the findings. Results: At a false discovery rate (FDR)-corrected level of 0.05, we found 6 differentially methylated positions (DMPs) for Cd; 288 and 17, respectively, for current and former smoking status; and 77 for cigarette pack-years. Enrichment analyses of these DMPs displayed enrichment of 58 and 6 Gene Ontology and Kyoto Encyclopedia of Genes and Genomes gene sets, respectively, including biological pathways for cancer and cardiovascular disease. In in silico protein-to-protein networks, we observed key proteins in DNAm pathways directly and indirectly connected to Cd- and smoking-DMPs. Among DMPs that were significant for both Cd and current smoking (annotated to PRSS23, AHRR, F2RL3, RARA, and 2q37.1), we found statistically significant contributions of Cd to smoking-related DNAm. Conclusions: Beyond replicating well-known smoking epigenetic signatures, we found novel DMPs related to smoking. Moreover, increases in smoking-related Cd exposure were associated with differential DNAm. Our integrative analysis supports a biological link for Cd and smoking-associated health effects, including the possibility that Cd is partly responsible for smoking toxicity through epigenetic changes. https://doi.org/10.1289/EHP6345


Introduction
Numerous studies have evaluated the association of tobacco smoke, a complex mixture of compounds, with several epigenetic marks, in particular with blood DNA methylation (DNAm) (Harlid et al. 2014;Joubert et al. 2012;Shenker et al. 2013;Sun et al. 2013;Tsaprouni et al. 2014;Wan et al. 2012;Zeilinger et al. 2013).For instance, DNAm of multiple CpGs (methylation sites) in the AHRR [Aryl Hydrocarbon Receptor Repressor, which mediates dioxin toxicity and is involved in cell growth and differentiation (Haarmann-Stemmann et al. 2007)] and F2RL3 [coagulation factor II receptor-like 3, also known as PAR-4, which plays a role in blood coagulation, inflammation, and response to pain, (Heuberger and Schuepbach 2019)] genes has been associated with former and current smoking status, cumulative smoking, and shorter time since smoking cessation in several studies (Bojesen et al. 2017;Breitling et al. 2012;Fasanelli et al. 2015;Gao et al. 2015;Joehanes et al. 2016;Reynolds et al. 2015;Zhang et al. 2013b).In addition to AHRR and F2RL3, other genes have also been consistently and specifically associated with smoking in methylome-wide epidemiological studies investigating as many as 450,000 CpGs (Gao et al. 2015;Joehanes et al. 2016).The potential epigenetic effects of individual environmental toxicants in tobacco, however, remain largely unexplored.
Cadmium (Cd) is one of the main metals found in cigarettes (Nordberg et al. 2015;Tellez-Plaza et al. 2012).Its concentrations are extremely high in comparison with those from other metals such as arsenic or lead (Chiba and Masironi 1992).It is one of the most relevant heavy metals in terms of potentially explaining adverse health effects of smoking because it has been associated with tobacco-related health outcomes, including cardiovascular disease (CVD) and cancer (Agarwal et al. 2011;Andersson et al. 2018;García-Esquinas et al. 2014;Navas-Acien et al. 2004).Increased exposure to Cd has been associated with differences in global DNAm in human (Ruiz-Hernandez et al. 2015) and animal (Nica et al. 2017;Šrut et al. 2017) studies, possibly related to Cd-induced alterations of DNA methyltransferases (Benbrahim-Tallaa et al. 2007;Jiang et al. 2008;Poirier and Vlasova 2002;Takiguchi et al. 2003;Yuan et al. 2013) and ten-eleven translocation (TET) (Ravichandran 2017;Xiong et al. 2017) enzymes activity.Some reports have suggested that TETs activity is influenced by oxidative stress (Delatte et al. 2015).Cd has been related to oxidative stress (Chia et al. 2011;Domingo-Relloso et al. 2019;Lee et al. 2009;Moitra et al. 2014;Srivastava et al. 2014) and could interfere in the DNAm pathways by promoting redox unbalance.We hypothesized that Cd exposure from tobacco could induce changes in genomic DNAm and mediate well-known smokingrelated methylation alterations.In vitro and in vivo studies have shown both gene-specific hypermethylation and hypomethylation in relation to Cd exposure (Hossain et al. 2012;Sanders et al. 2014;Vidal et al. 2015;Virani et al. 2016;Zhang et al. 2013a).The number of studies evaluating the association of Cd exposure with differences in CpG-specific DNAm in humans, however, is not sufficient (Ruiz-Hernandez et al. 2015).
Participants of the Strong Heart Study (SHS), a prospective cohort that includes 13 American Indian tribes, have substantially higher concentrations of urinary Cd in comparison with other large studies, such as the Multi-Ethnic Study of Atherosclerosis (MESA) and National Health and Nutrition Examination Survey (NHANES) (Deen et al. 2017).Cd exposure in the SHS was previously associated with smoking-related health effects, including total, lung, and pancreas cancer mortality (García-Esquinas et al. 2014); incident peripheral arterial disease (Tellez-Plaza et al. 2013b); and incident CVD (Tellez-Plaza et al. 2013a).The objective of this research was to conduct an epigenome-wide association study (EWAS) of Cd and smoking exposure with differences in human blood DNAm profiles.As a secondary objective, we assessed the implication of findings to relevant DNAm and demethylation pathways and evaluated whether the smoking-related increase in urine Cd levels can explain the association between smoking and genomic DNAm profiles.To achieve these aims, we evaluated the association of urinary Cd and smoking (selfreported active and former smoking, and cumulative smoking dose) with blood DNAm in 790,026 CpGs determined with the Infinium Methylation EPIC array in the SHS.We used a causal inference mediation approach (Pearl 2012) to estimate how much of the DNAm changes associated with smoking can be independently attributed to smoking-related increases in Cd exposure, as measured in urine (an established biomarker of Cd internal dose).In addition, the association analyses were followed by a gene set and molecular regulatory elements enrichment and in silico evaluation of relevant biological pathways for Cd and smoking influences in DNAm metabolism.

Study Population
The SHS is a prospective cohort study funded by the National Heart, Lung, and Blood Institute to investigate CVDs and its risk factors in American Indian adults (Lee et al. 1990).In 1989-1991, a total of 4,549 men and women ages 45-75 y who were members of 13 tribes based in Arizona, Oklahoma, North Dakota, and South Dakota accepted invitations to participate.All individuals were asked to fast overnight (8 h or more).Participants without sufficient urine for metal analyses were excluded (n = 576) (Figure S1).Due to tribal request, samples from one of the tribes were not selected for DNAm analyses, leaving 3,516 participants.Among them, participants who were free of CVD and not missing urinary metals or other variables of interest at baseline (1989)(1990)(1991) were eligible for blood DNAm analyses (n = 3,106).Sufficient blood was available for DNAm analysis in 2,351 participants.After laboratory analyses, we removed data from 18 individuals without classical bimodal distribution in DNAm levels and from 8 individuals with low median intensity levels, leaving a total of 2,325 participants for this study.These participants were similar to those eligible in sociodemographic and anthropometric characteristics (Table S1).

Participant Characteristics and Self-Reported Smoking
Trained and certified nurses and medical examiners collected information on sociodemographic factors (age, sex, study region, education level), medical history, smoking status (never, former, current), and cumulative smoking dose (cigarette pack-years) in a personal interview.Participants having smoked ≥100 cigarettes in their lifetime and smoking at the time of the interview were considered current smokers.Noncurrent smokers who had smoked >100 cigarettes in their lifetime were classified as former smokers.Cigarette pack-years were calculated as the number of 20-cigarette packs smoked per day times the number of years the person smoked, with zero assigned to never smokers.The trained nurses and medical examiners conducted a physical exam, including anthropometric measures [height and weight to measure body mass index (BMI)], and collected fasting blood and spot urine samples.Plasma creatinine was measured by an alkaline picrate rate method (Lee et al. 1990).We calculated estimated glomerular filtration rate (eGFR) from recalibrated plasma creatinine, age, and sex using the Chronic Kidney Disease -Epidemiology Collaboration formula (Levey et al. 2009).

Urinary Cadmium Determinations
Morning spot urine samples were collected in polypropylene tubes, frozen within 1 to 2 h of collection, shipped buried in dry ice, and stored at −70 C in the Penn Medical Laboratory, MedStar Research Institute, Washington, DC, USA.In 2009, urine samples were shipped to the Trace Metals Laboratory at Graz University, Austria, to measure Cd and other metals using inductively coupled plasma-mass spectrometry (ICP-MS) (Agilent 7700x ICP-MS; Agilent Technologies).The limit of detection (LOD) for urine Cd was 0:015 lg=L.In one participant below the LOD, the Cd concentration was imputed as the LOD divided by the square root of two.Urine Cd concentrations were corrected for molybdenum oxide interference.Other laboratory details and extensive quality control/quality assurance have been published (Scheer et al. 2012).To account for urine dilution, urine Cd concentrations were expressed in micrograms per gram of urine creatinine.Urine creatinine was measured at the Laboratory of the National Institute of Diabetes and Digestive and Kidney Disease, Epidemiology and Clinical Research Branch (Phoenix, Arizona, USA) by an alkaline picrate rate method.DNA was measured using the Illumina MethylationEPIC BeadChip (850K), which provides a measure of DNAm at a single nucleotide resolution at >850,000 CpGs.Samples were randomized across and within plates to remove potential batch artifacts and confounding effects, and replicate and across-plate control samples were included on every plate.All the preprocessing was conducted using R version 3.6.1.Data were read in six different batches (of ∼ 400 individuals each) and combined using the R package minfi (version 1.18.4).CpGs with a p-detection value greater than 0.01 in more than 5% of the individuals (6,159 CpGs) were removed.Single sample normalization was conducted using the preprocessNoob function in R package minfi (Fortin et al. 2017;Triche et al. 2013), which includes a background correction with dye-bias normalization for Illumina Infinium methylation arrays.As a result of these preprocessing preliminary analyses, we had data from 2,325 individuals and 860,079 CpGs.Cross-hybridizing probes, sex chromosomes, and SNP probes with minor allele frequency >0:05 (McCartney et al. 2016) were removed for analysis.The final number of CpGs for analysis was 790,026.Quality checks, data normalization, statistical preprocessing, and beta-value calculation, which ranges from 0 to 1 and represents the proportion of unconverted cytosines (Cs) in bisulfiteconverted DNA at specific locations, were performed using the R package minfi (Fortin et al. 2017).We estimated Houseman cell proportions (CD8T, CD4T, NK, B cells, monocytes, and granulocytes), also using the R package minfi, to use them as adjustment variables in the regression models.We detected and corrected for potential batch effects by sample plate, sample row, and DNA isolation time with the combat function (sva R package).We conducted annotation of CpGs to the nearest gene according to the Illumina Infinium MethylationEPIC Manifest File (version 1.0 B4) (Fortin et al. 2017;Illumina Inc. 2019).

Statistical Methods
DMPs.Four predictors of DMPs were assessed in separate analyses: total urinary Cd concentrations (log-transformed in lg=g of creatinine units), current smoking status (vs.never smoking), former smoking status (vs.never smoking), and cumulative smoking dose (pack-year units).Beta values (i.e., DNAm proportions at a given CpG) were converted into M values (logit transformation of DNAm proportions) before conducting linear regression models as implemented with the R package limma (version v3.28.14).Empirical Bayes shrinkage (conducted by limma) has been reported to work better in M values than in beta values due to the heteroscedastic nature of the beta values (Teschendorff et al. 2018).In addition, the logistic transformation monotonically maps values in the interval (0,1) to the whole real line (−1,1) and has been previously used to transform proportion data to fulfill linear modeling assumptions (Chen et al. 2017, Warton andHui et al. 2011).Thus, we decided to logit-transform b values and then apply linear regression.Models adjusted for biologically relevant variables: age at baseline, sex, study region, BMI, Houseman blood cell composition, and five principal components to account for genetic population stratification (Price et al. 2006).We further adjusted for smoking status (never, former, current) and eGFR in Cd models.We accounted for multiple comparisons by controlling for an FDR (Riancho et al. 2016) at a 0.05 cutoff.Genomic inflation can lead to inflated p-values and false positives in EWAS studies (van Iterson et al. 2017), and it was addressed by calculating a set of surrogate variables (SVs) (Leek et al. 2012) from fully adjusted Cd and smoking models using the R package SmartSVA (version 0.1.3)to estimate SVs out of the unexplained DNAm variability and further adjusting the models for them.Although in subsequent bioinformatic analysis we focused on DMPs at an FDR-corrected threshold of statistical significance of 0.05, in the supplementary material we provide the full EWAS output, including nonstatistically significant DMPs (Excel Tables S1-S4).
Mediation analysis.The main question addressed by causal mediation analysis is to what extent is the total effect of changing an exposure E mediated by an intermediate variable M on the causal pathway to outcome Y (the indirect effect) and to what extent is the effect not mediated by M (the direct effect) (Pearl 2012;Valeri and Vanderweele 2013).In our mediation framework the "exposure" E of interest is smoking, a surrogate for many toxic compounds in tobacco.We know that smoking is an important predictor of DNAm (outcome Y), and being a current or former smoker as well as accumulated smoking are related to changes in site-specific DNAm, independently of age, sex, and confounders, in comparison with never smokers.The intermediate variable M that we are specifically interested in investigating is urinary Cd, a well-established exposure biomarker, because smoking can cause changes in Cd biomarkers through direct intake through the lungs.Alternatively, the association of Cd biomarker to DNAm is substantiated based on published experimental studies reporting that increased exposure to Cd induces differences in DNAm.
We used the "product of coefficients" method (Pearl 2012) to assess the contribution of increase in urinary Cd concentrations from smoking (current, former, and cumulative smoking in separate models) to the corresponding absolute change in DNAm proportions at specific CpGs (i.e., smoking-related changes in DNAm proportions mediated by Cd).First, for the outcome models, because collapsibility is required for mediation analysis (Valeri and Vanderweele 2013), we used beta regression with an identity link for additively modelling untransformed methylation proportions (instead of the M-values) in a generalized linear regression setting with the mgcv package (version 1.8-31) of the R software.Specifically, we fitted beta regression models for DNAm (dependent variable in separate models for each CpG that was significant at 0.05 FDR-corrected p-values for Cd and current smoking (excluding former smokers), for Cd and former smoking (excluding current smokers), and for Cd and pack-years, by smoking variables and log-transformed urine Cd (independent variables) and confounders (age, sex, BMI, eGFR, study center, cell counts, and genetic principal components).Second, for the mediator model, we fitted linear regression models for log-transformed urine Cd concentration (dependent variable) by smoking variables (i.e., current and former smoking and pack-years as independent variables in separate models) and the same confounders included in the outcome model.Third, we calculated the effect of smoking on DNAm mediated through Cd (i.e., "indirect" effect) as the product of the mean change in log-Cd concentrations by smoking (smoking regression coefficient from the mediator model) and the absolute change in methylation percentages associated with log-transformed Cd concentrations (Cd regression coefficient from the outcome model).Mediated effects were expressed as the absolute amount of the smokingassociated change in DNAm attributable to Cd ("absolute mediated effect"), as well as the percentage of the smoking-associated change in DNAm attributable to Cd out of the "total effect," defined as the sum of the indirect and direct effects ("relative mediated effect").Additionally, 95% confidence intervals were derived by simulation from the estimated model coefficients and covariance matrices (adapted from Lange and Hansen 2011).
Sensitivity analyses.We conducted a number of sensitivity analyses.First, we repeated the models for the top 25 significant DMPs for current vs. never smokers in analyses stratified by center and sex.Second, we repeated the DMP analysis for the six significant DMPs for urinary Cd restricting the analysis to never smokers, to assess whether the association between Cd and DNAm was only reflecting smoking status.Third, as we do not have cotinine measurements in this study and smoking is a self-reported variable, we Environmental Health Perspectives 067005-3 128(6) June 2020 used the EpiSmokEr tool (Bollepalli et al. 2019) to predict smoking status based on DNAm data.Fourth, we reran the Cd models excluding the eGFR variable as well as additionally adjusting for specific gravity.Last, because probe type bias might be an issue, we applied Regression on Correlated Probes (RCP) normalization after snoob normalization as implemented by the R package ENmix (version 1.8.0) (Niu et al. 2016) to see whether the results changed.
Enrichment analyses.Among Cd-and smoking-related DMPs with a FDR-corrected p-value <0:05, we conducted a hypergeometric analysis to identify enriched gene sets using Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa and Goto 2000;Kanehisa et al. 2019) and Gene Ontology (GO) (Ashburner et al. 2000; The Gene Ontology Consortium 2018) as conducted by the gometh function of the Bioconductor missMethyl package (version 1.6.2) (Phipson et al. 2015).Among DMPs with a nominal p-value <10 × 10 −5 , we additionally investigated enrichment for molecular regulatory elements of statistically significant DMPs in comparison with nonstatistically significant DMPs, including annotated functional gene regions (TSS1500 and TSS200, 5 0 UTR, 1 st Exon, Gene body, and 3 0 UTR) and regions defined by the CpG density context (Open Sea, N-shelf, N-shore, Island, S-shore, and S-shelf).Similarly, we also investigated enrichment for 15-chromatin states retrieved from the ROADMAP Epigenomics Mapping Consortium (ChromHMM version 1.10) (Bernstein et al. 2010;Ernst and Kellis 2017;ROADMAP Epigenomics Project).The mapping of DNAm changes to the most probable chromatin state (combination of histone marks associated to varying degrees of expression) using ChromHMM is a way to identify DNA regulatory activity in a tissuespecific manner, including coding and noncoding portions of the genome, which provides potential functional mechanisms to EWAS findings.For regulatory elements enrichment analysis, p-values were corrected using the Bonferroni method considering 15 chromatin states, 6 categories relatives to functional gene region, and 6 categories relative to CpG density context.
Molecular characterization and in silico protein-network analysis of DNAm metabolism.To conduct the molecular characterization of DNAm metabolism we searched KEGG (Kanehisa and Goto 2000) public database for key pathways directly or indirectly related to DNAm (including factors related to methyl groups' depletion) as of 26 June 2019, using the following descriptors: "Glutathione metabolism" (hsa00480), "Citrate cycle" (hsa00020), "One carbon pool by folate" (hsa00670), "Selenocompound metabolism" (hsa00450), and "Cysteine and methionine metabolism" (hsa00270).The search strategy retrieved a total of 158 proteins.TET1, TET2, TET3, SELENOP, AICDA, TDG, and APEX1 genes were not retrieved by our search strategy, but we manually added them to the final list of key proteins because of their potential role in active DNA demethylation according to previous reports (Chia et al. 2011;Lee et al. 2009;Ruiz-Hernandez et al. 2015).
Subsequently, among DMPs with a FDR-corrected p-value <0:05, we created a list of unique protein-coding genes attributed to Cd-DMPs and to the union of the identified smoking related-Figure 1. Summary of results from the epigenome-wide association study of cadmium (Cd) and smoking with DNA methylation (DNAm) and subsequent bioinformatics analysis.After considering a false discovery rate (FDR)-corrected p-value < 0.05, we obtained 6 Cd-differentially methylated positions (DMPs) and 312 smoking DMPs in a genomic exploration of ∼ 790,000 CpG sites with the Infinium MethylationEPIC BeadChip.We excluded nonprotein-coding genomic regions and duplicate genes and conducted gene set enrichment analysis.We subsequently evaluated protein interaction networks among proteinencoding genes from the STRING database, resulting in a protein interaction network of 271 nodes and 1,802 interactions.We observed highly connected nodes, which were directly or indirectly related to DNAm metabolism key proteins.

Environmental Health Perspectives
067005-4 128(6) June 2020 DMPs (Figure 1).Finally, we constructed a protein interaction network of proteins encoded by the identified smoking and Cd-related genes with respect to the DNAm key proteins selected from the search strategy described above by obtaining protein interactions information from STRING database version 11.0 (Szklarczyk et al. 2019).The STRING database provides a confidence score (from 0 to 1) to indicate the estimated likelihood that the annotated interaction between a given pair of proteins is biologically meaningful, specific, and reproducible, according to the evidence derived from in-house predictions, homology transfers, and the externally maintained databases (Szklarczyk et al. 2019).We displayed protein interaction networks with Cytoscape (version 3.7.1)(Shannon et al. 2003) using the yfiles Organic layout.As a result, 4 and 113 Cd and smoking-related nodes, respectively, were included in the network, in addition to the DNAm related-nodes, which included 159 nodes with high confidence scores (CHAC1, CHAC2, GSTT1, GSTT2, INMT and NAT8B nodes were excluded from the original list of 165 DNAm metabolism selected proteins because they showed connections with low confidence scores).

Differentially Methylated Positions
We found differential methylation in 6 CpGs for urinary Cd at the FDR-corrected significance level of 0.05 (Table 2; Figure S2; Excel Table S1).Two CpGs (cg14391737 and cg17739917) were new from the Illumina Infinium MethylationEPIC array and therefore have not been reported before.One CpG was annotated to AHRR, one to F2RL3, one to PRSS23, two to 2q37.1, and one to RARA.Among those six significant CpGs in the overall population, four remained significant after restricting the models to never smokers (Table 2).A total of 288 CpGs were differentially methylated in relation to current smoking (Table 3; Figure S2; Excel Table S2).A total of 149 CpGs were new from the Illumina Infinium MethylationEPIC array.The top DMPs annotated to smoking-related genes in a previous meta-analysis was replicated in our study (Table 3, Joehanes et al. 2016).Among the top 25 DMPs for current smoking (Table 3), 5 were annotated to AHRR, 2 to F2RL3, 1 to PRSS23, 3 to 2q37.1, and 1 to RARA.A total of 17 CpGs were differentially methylated in relation to former smoking (8 of them were also associated with current smoking) (Figure S2; Table S2; Excel Table S3), 5 of them being new from the Illumina Infinium MethylationEPIC array.A total of 77 CpGs were differentially methylated in relation to cigarette pack-years [62 of them also associated with current smoking (Figure S2; Table S3; Excel Table S4), and 29 of them being new from the Illumina Infinium MethylationEPIC array].The genomic inflation factors (GIF) associated with the nominal p-values for the Cd, current and former smoking, and pack-years analyses before SV adjustment were 1.49, 2.91, 1.02, and 1.30, respectively.In the final models, which included SVs adjustment, the corresponding GIFs were, respectively, 1.05, 1.02, 1.02, and 1.003.

Sensitivity Analyses
Results were similar when excluding the eGFR variable from the Cd models.However, we still wanted to keep eGFR in the final model because the filtration rate affects metal levels excreted in the urine (Zheng et al. 2015).Adjusting for specific gravity slightly attenuated the results but did not change them in a substantial way.Environmental Health Perspectives 067005-5 128(6) June 2020 In stratified analysis by sex and center for the top 25 DMPs in current smoking models, all the DMPs remained statistically significant and directionally consistent (Tables S4 and S5) except for those from Arizona, where the results where attenuated and 4 of the 25 sites did not reach the statistical significance, probably due to the much smaller sample size in comparison with Oklahoma and North and South Dakota.The contingency table of the selfreported smoking status vs. predicted values by EpiSmokEr can be found in Table S8.Overall, only 1,179 of the 2,325 patients were classified correctly.However, most of the people predicted to be current smokers also reported to be current smokers.We saw similar results when applying RCP and snoob normalizations to those obtained when only applying snoob normalization (data not shown).

Mediation Analysis
The number of significant DMPs associated with smoking decreased from 288 to 247 when adjusting for urinary Cd levels (data not shown).The mediated effect in the relationship between current smoking and DNAm through urinary Cd was statistically significant for the six CpGs that were commonly differentially methylated for both Cd and current smoking using the FDR-corrected p-values (Table 4).Median (minimum, maximum) total effects for the association between current smoking and DNAm were −6:83% (−13:19%, −5:70%).The corresponding absolute mediated effects through Cd were −0:70% (−0:88%, −0:44%), respectively.Total effects for former smoking (four CpGs) and cigarette pack-years (six CpGs) and corresponding mediated effects by Cd are shown in Tables S6 and S7.Excel Tables S5, S6, and S7 show mediation analysis for each of the CpGs associated with the smoking variables (current smoking, former smoking, and pack-years).

In Silico Protein-Interaction Analysis
The list of candidate proteins for DNAm-related pathways are shown in Excel Table S10.From the protein-protein interaction, we obtained a network of 271 nodes with 1,802 interactions (Figures 1 and 3).Among nodes with more than 20 interactions, we mostly found smoking-only nodes (MYO1G, GFI1, CDKN1A, GNG12, and CYP1B1), smoking and methylation metabolism nodes (ANPEP), and 1 Cd and smoking node (F2RL3).Seven   a Linear regression models were fitted using logit-transformed DNAm proportions as dependent variables separately for each CpG and were adjusted, age (y), sex (male/female), BMI (kg=m 2 ), Houseman cell proportions (CD8T, CD4T, NK, B cells, monocytes and granulocytes), five genetic principal components and study center (Arizona, Oklahoma, or North and South Dakota).
b Not available CpGs in the metanalysis by (Joehanes et al. 2016) reflect that the metanalysis was based in the Illumina Infinium HumanMethylation450 BeadChip array.
Environmental Health Perspectives 067005-7 128(6) June 2020  Absolute changes (mean differences) in methylation percent comparing current with never smokers (i.e., "Direct effect") and absolute changes in DNAm % associated with one-unit increase in log-transformed urine Cd were both estimated as the regression coefficients from beta models with an identity link (i.e., "Outcome model") which included the following independent variables: current smoking status ("Exposure"), log-transformed Cd ("Mediator"), age (years), sex (men/ women), BMI, Houseman cell proportions (CD8T, CD4T, NK, B cells, monocytes and granulocytes), five genetic principal components, and study center (Arizona, Oklahoma, or North and South Dakota).
b Effects mediated through Cd (i.e., "Indirect effects") were calculated using the "product of coefficients" method that multiplies the coefficient for the mean difference in log-transformed cadmium concentrations comparing current with never smokers (i.e., 0.38 [95% CI: 0.32, 0.44] corresponding to coefficient [95% CI] for current smoking status from the linear regression "mediator model" where the outcome is log-transformed Cd adjusted for the same variables as the outcome model, data not shown) by the absolute change in DNAm % associated with one-unit increase in log-Cd concentrations indirect effects were expressed in absolute terms (difference in smoking-related DNAm % change) and relative to the "Total effect" [DNAm % changes (95% CIs) comparing current with never smokers] calculated as the sum of the direct and indirect effects.Note that if the mediation analysis assumptions hold, the estimated "Total effects" in the setting of the "product of coefficients" method is numerically identical to the regression coefficient associated with current smoker status in a beta regression as in the outcome model without Cd.
To illustrate the estimation of mediated effects in the setting of the product of coefficients methods, we provide an example: In absolute terms, the mediated effect of Cd in cg14391737 (column 8) was obtained as −2:32 % ðcolumn 6Þ × 0:38 ðfrom the mediator model mentioned aboveÞ = −0:88 %.The corresponding mediated effect in relative terms (column 9) was −0:88 %=− 6:72 %ð"Total effect" from column 10Þ × 100 = 13:1 %.The 95% CIs for the total and mediated effects were derived by simulation from the estimated model coefficients and covariance matrices.
Environmental Health Perspectives 067005-8 128(6) June 2020 smoking-only nodes (GFI1, CYP1B1, NFE2L2, TSHZ1, SIN3B, and ARRB1) were directly connected to 27 DNAm metabolism nodes.The four Cd and smoking nodes were indirectly connected to DNAm metabolism nodes through six smoking-only nodes.The Cytoscape session is provided so that readers can interactively explore the network (Cytoscape Additional_file.cyssupplementary file; publicly available Cytoscape software can be downloaded at https://cytoscape.org/index.html), and the general statistics from the Cytoscape file can be found in Excel Tables S11 (network nodes) and S12 (network edges).

Discussion
After a conservative FDR correction, we found DMPs associated with current smoking, cumulative smoking dose, and urinary Cd levels, being all Cd-DMPs also smoking-DMPs.We replicated the association of well-established smoking-associated DNAm sites, including sites mapped to the genes AHRR, F2RL3, PRSS23, RARA, and 2q37.1.In addition, we found several smoking-associated DMPs that had not been previously reported.Our mediation analysis supports that Cd is a partial mediator of the association between smoking and differential DNAm at specific sites.The integrative in silico analysis provides further support for interconnected epigenetic mechanisms of both Cd-and smoking-associated health effects, including the possibility of Cd influencing smoking-related DNAm changes at relevant loci.
A large evidence base supports the association between smoking and differential methylation of several genes, with AHRR, F2RL3, and 2q37.1 being the most documented (Bojesen et al. 2017;Fasanelli et al. 2015;Gao et al. 2015;Joehanes et al. 2016;Reynolds et al. 2015;Zhang et al. 2013b).Little is known about the 2q37.1 region.Although it is one of the most commonly deleted subtelomeric regions (Leroy et al. 2013), the health implications of the differential methylation in this region are not clear.The two Cd-related DMPs located in this region were no longer significant when restricting the Cd models to never smokers, whereas all the other DMPs were still significant among never smokers.
Whether epigenetic modifications are a biological mechanism for well-known smoking-related health effects remains a focus of ongoing investigations.In fact, altered DNAm of AHRR and F2RL3 has been suggested to be part of the biological links between cigarette smoking and several diseases, especially lung cancer and atherosclerosis (Reynolds et al. 2015), in addition to being strong predictors of all-cause mortality in several studies (de Smith et al. 2017;Fasanelli et al. 2015;Zhang et al. 2014).Other smoking-related DMPs in our data such as CPOX, KIAA0087, PRSS23, GPR15, and RARA, among others, have also been consistently documented (Harlid et al. 2014;Tsaprouni et al. 2014).
Two different Illumina microarrays have been used in previous studies, the Illumina Infinium HumanMethylation27 BeadChip array (which interrogates DNAm at 27,000 CpGs located mostly in promoter regions) and Illumina HumanMethylation450 BeadChip array (including 450,000 CpGs).This is the first human study to use the Illumina Infinium MethylationEPIC BeadChip array to assess differential DNAm in relation to smoking, which led to novel insights.For instance, some new genes that warrant further investigation include a zinc finger protein (ZNF83), with a role of coordinating zinc ions (Cassandri et al. 2017), PTPN1, which is involved in oncogenic transformations (Zhu et al. 2007), and RAB32, a member of the RAS oncogene family, whose overexpression has been shown in 20% to 25% of all human tumors and up to 90% in pancreatic cancer (Downward 2003).For former smoking, however, we only found six significant CpGs, which indicates that current smoking is more relevant for methylation changes than former smoking.In fact, several studies suggested that methylation changes associated to smoking can be reversed on cessation Environmental Health Perspectives 067005-9 128(6) June 2020 (Li et al. 2018;Tsaprouni et al. 2014).It is interesting to note that significant current, former, and cumulative smoking-DMPs were enriched several categories of blood-specific enhancer regions.These findings support the hypothesis that identified DMPs may be biologically relevant, because it is known that enhancers regulate gene expression in a tissue-specific manner, in this case blood, which is our target tissue.Overall, our findings from molecular regulatory elements enrichment analysis are consistent with the hypothesis of epigenetic dysregulations being involved in mechanisms by which smoking induces diseases.An interesting aspect is that Cd exposure has been consistently associated with several human health disorders in common with smoking, including CVD (Chowdhury et al. 2018;Tellez-Plaza et al. 2013a) and several types of cancer (García-Esquinas et al. 2014;McElroy et al. 2017), as well as kidney disease (Grau-Perez et al. 2017;Navas-Acien et al. 2009;Orr and Bridges 2017), bone disease (Akesson et al. 2006;Chen et al. 2009), and premature mortality (Larsson and Wolk 2016).Cd has also been related to biomarkers of inflammation (Fagerberg et al. 2017) and oxidative stress (Domingo-Relloso et al. 2019;Nemmiche 2016) in epidemiological studies.Given that smoking is known to be a major source of Cd exposure (Nordberg et al. 2015), several studies have assessed whether Cd could be a mediator between smoking and adverse health effects.In a study conducted in a Swedish cohort (Andersson et al. 2018), about two-thirds of the association between smoking and atherosclerosis was found to be mediated by Cd.In a representative sample of the U.S. general population, Cd partially explained the association of smoking on peripheral arterial disease, which is characterized by the atherosclerotic narrowing of the arteries of the lower extremities (Navas-Acien et al. 2004).Cd also displayed a potential role as mediator of the association between smoking and lung cancer (García-Esquinas et al. 2014).
To our knowledge, however, no studies have evaluated the potential mediating role of Cd in the association between smoking and DNAm.This hypothesis is relevant, given the increasing evidence on the connection of DNAm with relevant biological pathways by which environmental exposures could cause disease.Although some studies have assessed the potential effects of Cd exposure in DNAm, many of them were conducted in experimental settings or interrogated only a reduced number of specific CpGs and genes (Ruiz-Hernandez et al. 2015).Few EWAS, however, have been conducted for Cd exposure in adults.None of our DMPs for Cd have been found in previous studies; nevertheless, these studies were not comparable with ours.One EWAS Figure 3. Protein-interaction network between proteins attributed to cadmium (Cd)-and smoking-differentially methylated positions (DMPs) and key proteins in DNA methylation (DNAm) metabolism pathways from KEGG.The nodes correspond to proteins involved in DNAm-related pathways, and proteins encoded by genes associated to Cd-DMPs and smoking-DMPs.The size of the nodes is proportional to the number of connections.The STRING database provides a confidence score (from 0 to 1) to indicate the estimated likelihood that the annotated interaction between a given pair of proteins is biologically meaningful, specific, and reproducible, according to the evidence derived from in-house predictions, homology transfers, and the externally maintained databases (Szklarczyk et al. 2019).Increasingly darker solid edge lines indicate a protein interaction with increasingly higher confidence scores.The nodes included among the top 25 statistically significant Cd-and/or smoking-DMPs in our EWAS show thicker node lines.
Environmental Health Perspectives 067005-10 128(6) June 2020 involved placental Cd and placental DNAm and is therefore not comparable to blood DNAm given the of DNAm (Everson et al. 2018).Another EWAS specifically looked at differences in DNAm age by urinary Cd concentrations (Demanelis et al. 2017).A study evaluated DNAm differences associated with Cd in 4.6 million CpGs.However, the sample size was very small (n = 17), and none of the sites were significant after correcting for multiple comparisons, so top hits may be unstable (Sanders et al. 2014).In our data, changes in Cd levels from smoking were associated to DNAm changes in the most broadly known smoking-associated genes, suggesting common epigenetic dysregulations due to smoking and Cd exposure at these sites.Nonetheless, only 6 DMPs were found for Cd as opposed to the 288 found for current vs. never smoking.There are more than 7,000 chemicals in cigarette smoke, from which 93 are classified as harmful and potentially harmful constituents (HPHCs), which include Cd and other metals (Morgan et al. 2017).More studies exploring the epigenetic effects of other harmful chemicals from smoking are needed.
The protein-protein interaction network, on the other hand, deepens into the potentially close relationship between DNAm metabolism pathways and genes attributed to smoking and Cd-DMPs.For instance, the Aminopeptidase N (ANPEP), which belongs to the glutathione metabolism pathway showed the highest number of connections (directly interacting with 16 other methylation-related proteins and 15 smoking nodes including F2RL3 and RARA, also associated with Cd).It is interesting to note that negative APN (the ANPEP encoded protein) immunoreactivity has been suggested as a prognostic factor for prostate cancer (Sørensen et al. 2013), which has been associated with both Cd and smoking (García-Esquinas et al. 2014; U.S. Department of Health and Human Services 2010).Moreover, the F2RL3 gene, which encodes a protease-activated receptor that plays a role in blood coagulation, inflammation, response to pain, and cancer, is also highly connected in the network and directly interacts with RARA, which has been associated with increased sensitivity to retinoids in T-cell lymphoma in experimental settings (Wang et al. 2017).Other highly connected smoking-DMPs in the network, such as MYO1G, GFI1, CDKN1A, GNG12, and CYP1B1, have been previously reported as smoking-DMPs in other epigenetic studies (Gonseth et al. 2016;Steenaard et al. 2015;Stueve et al. 2017;Wiklund et al. 2019) and have also been implicated in cancer (Groth-Pedersen et al. 2012;Möröy and Khandanpour 2019;Yao et al. 2018) or cardiometabolic diseases (Dempsie et al. 2013;Parmar et al. 2018).All of them are directly connected to Cd-associated nodes.
Our study has some limitations.First, because smoking is a major source of Cd, and smoking status was based on self-report, a possible concern is whether Cd serves as a biomarker of tobacco exposure rather than reflecting Cd per se.However, in a confirmatory sensitivity analysis for the Cd-associated DMPs among never smokers, we found that four of the six signals remained significant after excluding current and former smokers.This supports the hypothesis that Cd in urine could come from other exposure sources such as diet.A previous study (Olmedo et al. 2017) evaluated the association between several dietary products and urinary Cd levels in the SHS, reporting the potential importance of processed meat products as a dietary source of Cd.Occupational exposure to Cd occurs mainly during mining, smelting, and work with Cd-containing minerals (ATSDR 2012).We do not have occupational data in our study; however, some people might do handmade work with minerals and Cd pigments (e.g., jewelry making) and might be exposed to Cd that way.In addition, secondhand tobacco smoke exposure could also be related to urinary Cd levels in never smokers.In a sensitivity analysis with EpiSmokEr, most of the misclassification was from participants who self-reported to be former smokers but for which the algorithm predicted them to be never smokers.Given that it is unlikely that people would indicate they smoked in the past without doing it (i.e., reporting to be a former smoker when he or she is a never smoker), this aspect could be related to the previously reported reversible nature of some of the smoking-related epigenetic marks (Reynolds et al. 2015).In the same way, many selfreported current smokers were predicted to be former smokers by the algorithm.This prediction could be related to the small number of cigarettes per day smoked by many smokers in our population (median cigarette pack was 4, which would mean that one person could have smoked, for example, 2 cigarettes per day through 40 y, or 4 cigarettes per day through 20 y).Finally, we cannot discard misclassification due to secondhand smoke, because in this study there was a median (IQR) of 1 (0, 6) h of self-reported exposure to secondhand smoke per day.In addition, mediation analysis is subject to strong assumptions (Pearl 2012;Valeri and Vanderweele 2013), for instance, the assumption of no unmeasured confounders.Residual confounding cannot be discarded in observational studies.It is important to note that we obtained essentially identical mediated effects when we used the difference of coefficient methods instead of the product of coefficient methods (data not shown), which indicates that the no exposure-mediator interaction assumption holds in our data.
Another limitation is the fact that we conducted mediation analysis in the setting of a cross-sectional study design.However, cigarettes are known to contain Cd, so the directionality between smoking and Cd exposure relation over time is clear.The directionality between Cd exposure and blood DNAm is less clear, but-given the long half-life of urinary Cd (15-30 y) and the fact that Cd is an exogenous nonessential metal-it is likely that Cd exposure precedes blood DNAm.Finally, although replication of results in independent study populations is required in studies conducting genomic explorations of the methylome, this concern may be a relatively minor one in our case, given the robustness of our data with respect to well-replicated smoking-related signals across a wide range of other population studies (Gao et al. 2015;Joubert et al. 2012).
The strengths of our study include the large sample size, including methylation data for the largest amount of CpGs possible with current microarray technology (Illumina Infinium MethylationEPIC BeadChip); the high quality of the study protocols; and the availability of information to account for potential confounders.
In conclusion, this work supports the hypothesis that Cd, one of the principal toxicants in cigarettes, could be partially explaining well-established associations of smoking with DNAm changes.Cd exposure could thus be linked to smoking adverse health outcomes through epigenetic mechanisms.Although smoking exposure has declined in recent decades, public health interventions are still needed for controlling smoking exposure among the population.Additional strategies are also needed to reduce Cd exposure from other sources beyond tobacco (Ruiz-Hernandez et al. 2017).Findings from our in silico study needs functional confirmation in experimental settings, as relevant proteins in our protein-interaction network can contribute to the understanding of Cd and smoking-related adverse health outcomes and mechanisms.

Figure 2 .
Figure2.Enrichment analysis of top significant cadmium (Cd) and smoking-related differentially methylated positions (DMPs) (p-value < 10 −5 ) for 15-chromatine states from the ROADMAP project.The area of the solid dots is directly proportional to the strength of the statistical evidence in favor of being annotated to a given chromatin state category comparing statistically significant vs. nonstatistically significant DMPs in our data.

Table 1 .
Median (IQR) of urine cadmium (lg=g) and accumulated smoking dose (pack-years) levels and percentage of former and current smoking status by participant's characteristics.

Table 3 .
Top 25 differentially methylated positions comparing current smoking status with never smoking status (i.e., current smoking-DMPs).

Table 4 .
Changes in DNA methylation percentages current with never smokers attributable to changes in urinary Cd concentrations ("mediated effect").