
Environmental Health Perspectives


Research Volume 124 | 2016

Environ Health Perspect; DOI:10.1289/ehp.1509834

Genome-Wide Analysis of DNA Methylation and Cigarette Smoking in a Chinese Population

Xiaoyan Zhu,1,2* Jun Li,1,2,3* Siyun Deng,1,2 Kuai Yu,1,2 Xuezhen Liu,1,2 Qifei Deng,1,2 Huizhen Sun,1,2 Xiaomin Zhang,1,2 Meian He,1,2 Huan Guo,1,2 Weihong Chen,1,2 Jing Yuan,1,2 Bing Zhang,1,2 Dan Kuang,1,2 Xiaosheng He,1,2 Yansen Bai,1,2 Xu Han,1,2 Bing Liu,1,2 Xiaoliang Li,1,2 Liangle Yang,1,2 Haijing Jiang,1,2 Yizhi Zhang,1,2 Jie Hu,1,2 Longxian Cheng,4 Xiaoting Luo,5 Wenhua Mei,5 Zhiming Zhou,6 Shunchang Sun,6 Liyun Zhang,7 Chuanyao Liu,1,2 Yanjun Guo,1,2 Zhihong Zhang,1,2 Frank B. Hu,3,8,9 Liming Liang,3,10 and Tangchun Wu1,2

Author Affiliations
1Department of Occupational and Environmental Health, and 2Ministry of Education Key Lab for Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China; 3Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA; 4Department of Cardiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China; 5Department of Cardiology, People’s Hospital of Zhuhai, Zhuhai, China; 6Department of Cardiology, Bao’an Hospital, Shenzhen, China; 7Department of Cardiology, Wuhan Central Hospital, Wuhan, China; 8Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA; 9Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA; 10Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA


  • Background: Smoking is a risk factor for many human diseases. DNA methylation has been related to smoking, but genome-wide methylation data for smoking in Chinese populations are limited.

    Objectives: We aimed to investigate epigenome-wide methylation in relation to smoking in a Chinese population.

    Methods: We measured the methylation levels at > 485,000 CpG sites (CpGs) in DNA from leukocytes using a methylation array and conducted a genome-wide meta-analysis of DNA methylation and smoking in a total of 596 Chinese participants. We further evaluated the associations of smoking-related CpGs with internal polycyclic aromatic hydrocarbon (PAH) biomarkers and their correlations with the expression of corresponding genes.

    Results: We identified 318 CpGs whose methylation levels were associated with smoking at a genome-wide significance level (false discovery rate < 0.05), among which 161 CpGs annotated to 123 genes were not associated with smoking in recent studies of Europeans and African Americans. Of these smoking-related CpGs, methylation levels at 80 CpGs showed significant correlations with the expression of corresponding genes (including RUNX3, IL6R, PTAFR, ANKRD11, CEP135 and CDH23), and methylation at 15 CpGs was significantly associated with urinary 2-hydroxynaphthalene, the most representative internal monohydroxy-PAH biomarker for smoking.

    Conclusion: We identified DNA methylation markers associated with smoking in a Chinese population, including some markers that were also correlated with gene expression. Exposure to naphthalene, a byproduct of tobacco smoke, may contribute to smoking-related methylation.

  • Citation: Zhu X, Li J, Deng S, Yu K, Liu X, Deng Q, Sun H, Zhang X, He M, Guo H, Chen W, Yuan J, Zhang B, Kuang D, He X, Bai Y, Han X, Liu B, Li X, Yang L, Jiang H, Zhang Y, Hu J, Cheng L, Luo X, Mei W, Zhou Z, Sun S, Zhang L, Liu C, Guo Y, Zhang Z, Hu FB, Liang L, Wu T. 2016. Genome-wide analysis of DNA methylation and cigarette smoking in a Chinese population. Environ Health Perspect 124:966–973;

    *These authors contributed equally to this work.

    Address correspondence to T. Wu, Department of Occupational and Environmental Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, 13 Hangkong Rd., Wuhan 430030, Hubei, China. Telephone: 86-27-83692347. E-mail:

    We thank all individuals who participated in the present study.

    This study is supported by the National Key Basic Research and Development Program (973 Project, grant 2011CB503806), the National Natural Scientific Foundation of China (81230069), the China Medical Board of New York, and the Fundamental Research Funds for the Central Universities, HUST.

    The authors declare they have no actual or potential competing financial interests.

    Received: 17 February 2015
    Accepted: 22 December 2015
    Advance Publication: 12 January 2016
    Final Publication: 1 July 2016




Tobacco kills nearly 6 million people per year on account of direct tobacco use or indirect smoke exposure [World Health Organization (WHO) 2014]. Cigarette smoking, the primary method of tobacco consumption, is a major cause of preventable diseases (including cardiovascular diseases, respiratory diseases, and cancers) (Cunningham et al. 2014; Rea et al. 2002; Sosnowski and Przewoźniak 2015) and mortality (Ezzati and Lopez 2003; Mathers and Loncar 2006). Various human carcinogens have been identified in cigarette smoke, including polycyclic aromatic hydrocarbons (PAHs) [International Agency for Research on Cancer (IARC) 2004; Centers for Disease Control and Prevention (CDC) 2010; Rodgman et al. 2000]. Although the adverse health effects of smoking are well acknowledged, less is known about its underlying mechanisms of toxicity, especially at the molecular level.

DNA methylation is an epigenetic modification of the genome that is involved in regulating gene expression and genome stability (Lee and Pausova 2013). Methylation status can be modified by both genetic and environmental factors, and it can integrate the effects of both gene and environment on a phenotype or disease (Feil and Fraga 2012; Schadt 2009). Previous studies using targeted approaches (global methylation and candidate gene methylation) have established potential links between smoking and DNA methylation (Furniss et al. 2008; Philibert et al. 2010; Smith et al. 2007), but it was not until the widespread use of genome-wide methylation technologies that hundreds of smoking-related methylation markers were discovered and their relationships with smoking-related diseases were evaluated (Besingi and Johansson 2014; Breitling et al. 2011; Elliott et al. 2014; Harlid et al. 2014; Joubert et al. 2012; Markunas et al. 2014; Shenker et al. 2013; Sun et al. 2013; Zeilinger et al. 2013). Previous genome-wide methylation analyses of smoking have been conducted in Europeans (Guida et al. 2015; Shenker et al. 2013; Zeilinger et al. 2013) and African Americans (Dogan et al. 2014; Philibert et al. 2013; Sun et al. 2013); however, populations of middle-income countries such as China, the biggest cigarette producer and consumer in the world, have not been evaluated.

To investigate epigenome-wide methylation alterations in relation to cigarette smoking in a Chinese population, we measured DNA methylation levels at > 485,000 CpG sites (CpGs) in peripheral blood leukocytes and conducted a genome-wide meta-analysis of DNA methylation and smoking in a total of 596 Chinese participants. Furthermore, we investigated the correlations of smoking-related CpGs with the expression of annotated genes as well as their associations with urinary monohydroxy-PAH (OH-PAH) metabolites.


Study Participants

In the present study, the genome-wide meta-analysis of DNA methylation and smoking was conducted in 596 Chinese participants selected from the Coke Oven Cohort, acute coronary syndrome (ACS) patients from Wuhan and Guangdong, China, and the Wuhan-Zhuhai (WHZH) Cohort (see Figure S1 for a flowchart of the study).

The Coke Oven Cohort. A total of 1,628 coke-oven workers (COW) were recruited from a coke-oven plant in Wuhan, China in 2010 (Li et al. 2012). We included 144 workers in the present study based on the following criteria: a) donated blood and urine samples; b) had baseline total urinary OH-PAH (ΣOH-PAH) levels in the high tertile; c) had worked in the plant for more than 5 years; d) had no self-reported diseases or discomfort; e) had no fever or infectious conditions within 2 weeks of the baseline examination; f) did not take prescribed medicine in the past month; and g) had a body mass index (BMI) of 18.0–30.5. After quality controls for methylation and genotyping data were performed, 137 individuals (abbreviated as COW-1) remained in the present study.

Acute coronary syndrome patients. The present study also included 103 clinically confirmed acute coronary syndrome (ACS) patients from Wuhan, China (recruited in Union Hospital and Wuhan Central Hospital), and 103 ACS patients from Guangdong, China (recruited in Bao’an Hospital and People’s Hospital of Zhuhai). Eligible patients a) were diagnosed with acute myocardial infarction or unstable angina pectoris by professional clinicians; b) did not have complications including congenital heart disease, cardiomyopathy, autoimmune disease, acute infection, tuberculosis, chronic obstructive pulmonary disease, diabetes mellitus, severe kidney or liver disease, hyperthyroidism, or malignant neoplasms; and c) donated blood samples at the earliest convenient time on the first day of admission. We included 101 patients from Wuhan (abbreviated as ACS-1) and 97 patients from Guangdong (abbreviated as ACS-2) who passed quality controls for both methylation data and genotyping data in the present analysis.

The Wuhan–Zhuhai (WHZH) Cohort. The WHZH Cohort is a community-based cohort established in 2011 with 4,812 individuals (3,053 from Wuhan and 1,759 from Zhuhai) recruited at baseline (Song et al. 2014). From all participants who a) had no acute or chronic diseases or any kind of discomfort; b) showed no sign of abnormalities in clinical examinations; c) had no fever or infectious conditions within 2 weeks of the baseline examination; d) did not take prescribed medicine in the past month; and e) donated both blood and urine samples, a total of 180 Wuhan residents were selected as healthy controls for the ACS patients in Wuhan (matched for age, sex, and BMI, n = 103) and/or healthy and low–PAH-exposed controls for COWs in Wuhan (matched for age, sex, and BMI, and with urinary ΣOH-PAH in the low tertile, n = 144; ACS patients and COWs shared 64 controls). A total of 103 Guangdong residents were selected as healthy controls for ACS patients from Guangdong (matched for age, sex, and BMI). We included 162 Wuhan residents and 99 Guangdong residents (abbreviated as WHZH) who passed quality controls for both methylation and genotyping data in the present analysis.

Subjects for investigating methylation-expression correlations. To investigate the correlation between DNA methylation and gene expression, we recruited 144 individuals who participated in regular health examinations at the Health Examination Center of Dongfeng Central Hospital (Dongfeng Motor Corporation and Hubei University of Medicine) in Shiyan, China during April and May of 2015. The selected participants met the following criteria: a) were 20 to 70 years of age; b) had no self-reported diseases or discomfort; c) had no fever or infectious conditions within 2 weeks of the baseline examination; d) took no prescribed medicine in the past month; and e) donated both blood and urine samples. The methylation and expression data for all 144 subjects (abbreviated as SY) passed quality control and were included in the present analysis.

Our study was approved by the Ethics Committee of Tongji Medical College, and written informed consent was obtained from each participant. We required all participants to consume a bland diet and to fast for at least 12 hr before donating blood samples. Biological samples from all study panels were collected according to the same protocol and were stored under similar conditions.

Laboratory Assays

Illumina HumanMethylation450 BeadChip. Genomic DNA was extracted from whole blood using a BioTeke Whole Blood DNA Extraction Kit (BioTeke) and was then stored at –80°C. One microgram of each sample was bisulfite converted using a Zymo EZ DNA Methylation kit (Zymo Research) according to the manufacturer’s instructions and was then diluted to a concentration of 60 ng/μL. DNA methylation was assayed at > 485,000 CpGs using a HumanMethylation450 BeadChip (Illumina) with 4-μL bisulfite-converted samples.

HumanHT-12 v4 Expression BeadChip. Leukocytes were isolated from whole blood immediately after blood collection, and the total RNA of blood leukocytes was isolated using TRIzol® LS solution (Invitrogen) according to the manufacturer’s instructions. Gene expression was profiled by a commercial company (ETMD, Beijing, China) using a HumanHT-12 v4 Expression BeadChip array according to standard protocols from Illumina. We acquired raw expression values using GenomeStudio (Illumina) and normalized the expression data using quantile-quantile normalization with the “beadarray” package (Dunning et al. 2007) in R 3.1.2 (R Core Team 2014). All unexpressed signals were assigned as 0 before analysis.

Urinary creatinine and OH-PAH measurement. The urinary measures of creatinine and 12 OH-PAH metabolites in the WHZH (Song et al. 2014) and COW (Deng et al. 2014; Li et al. 2012) cohorts have been previously reported. All urine samples were collected in sterile conical tubes and were stored at –20°C until the laboratory assays were performed. The identification and quantification of PAH metabolites were based on retention time, mass-to-charge ratio, and peak area using a linear regression curve obtained from separate internal standard solutions. Among the 12 urinary OH-PAH metabolites, 10 noncarcinogenic metabolites, including 1-hydroxynaphthalene, 2-hydroxynaphthalene, 2-hydroxyfluorene, 9-hydroxyfluorene, 1-hydroxyphenanthrene, 2-hydroxyphenanthrene, 3-hydroxyphenanthrene, 4-hydroxyphenanthrene, 9-hydroxyphenanthrene, and 1-hydroxypyrene were above the limits of quantification (LOQ) and were hence included in the present analysis, whereas the 2 carcinogenic metabolites, 6-hydroxychrysene and 3-hydroxybenzo[a]pyrene, were below the LOQ (Deng et al. 2014; Li et al. 2012) and, therefore, were not used in the present analysis. The OH-PAH levels were calibrated to urinary creatinine and were presented as micromoles per millimole creatinine.

Quality Controls for Genome-Wide Data

We randomized sample pairs of cases (disease or exposed group) and matched controls across different plates and beadchips to minimize batch effects. We used the minfi package (Aryee et al. 2014) to preprocess the IDAT files. Signal outliers were identified by multidimensional scaling (MDS) analysis. We examined potential sample mix-ups by matching genotypes of the 65 single nucleotide polymorphisms (SNPs) on the Methylation450k Beadchips with the genotypes of the same SNPs obtained from the genome-wide association study (GWAS) data. Methylation probes were excluded if they: a) were the 65-SNP probes; b) had a missing rate > 20% across samples (missing was defined as follows for a probe of a certain sample: detection p value > 0.01 or bead counts < 3); or c) potentially contained or extended on SNPs with MAF > 0.05 in the 1000 Genomes Project 20110521 release for the ASN population, or cross-hybridized to other genomic locations (41,296 probes). Samples were excluded if they a) were MDS outliers; b) were mix-up samples; c) had a missing rate > 0.05 across probes; or d) failed GWAS quality controls, including unexpected duplicates or relatives (in IBD analysis, PI_HAT>0.185), sex discrepancies, heterozygosity outliers, or individual call rate < 0.98. After filtering, methylation values at 431,369 CpGs were normalized using the dasen method in the wateRmelon package (Pidsley et al. 2013). Methylation values with detection p value > 0.01 or bead counts < 3 were assigned as NA before further analysis.
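The missing-rate filters described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' pipeline (which used the minfi and wateRmelon packages in R). The function names and the probe-by-sample nested-list layout are assumptions made for the example, and the order in which probe and sample filters were applied is not stated in the text.

```python
# Thresholds quoted in the text; all names below are hypothetical.
DETECTION_P_MAX = 0.01     # detection p value above this counts as missing
MIN_BEAD_COUNT = 3         # bead count below this counts as missing
PROBE_MISSING_MAX = 0.20   # probes missing in > 20% of samples are excluded
SAMPLE_MISSING_MAX = 0.05  # samples missing in > 5% of probes are excluded


def missing_mask(det_p, beads):
    """Return a probe-by-sample boolean mask of missing measurements."""
    return [
        [p > DETECTION_P_MAX or b < MIN_BEAD_COUNT for p, b in zip(p_row, b_row)]
        for p_row, b_row in zip(det_p, beads)
    ]


def probes_to_keep(mask):
    """Indices of probes whose missing rate across samples is at most 20%."""
    return [i for i, row in enumerate(mask) if sum(row) / len(row) <= PROBE_MISSING_MAX]


def samples_to_keep(mask):
    """Indices of samples whose missing rate across probes is at most 5%."""
    n_probes = len(mask)
    n_samples = len(mask[0])
    return [
        j for j in range(n_samples)
        if sum(mask[i][j] for i in range(n_probes)) / n_probes <= SAMPLE_MISSING_MAX
    ]


# Toy data: 3 probes (rows) by 5 samples (columns), made up for illustration.
toy_det_p = [
    [0.001, 0.5, 0.02, 0.001, 0.001],
    [0.001, 0.001, 0.001, 0.001, 0.001],
    [0.001, 0.001, 0.001, 0.001, 0.001],
]
toy_beads = [
    [5, 5, 5, 5, 5],
    [5, 5, 2, 5, 5],
    [5, 5, 5, 5, 5],
]
toy_mask = missing_mask(toy_det_p, toy_beads)
```

The same mask can also mark the individual values that were set to NA before analysis.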

Statistical Analysis

Genome-wide analyses of DNA methylation and smoking. Participants who had smoked an average of > 1 cigarette/day over the previous 6 months were defined as current smokers; participants who had stopped smoking for > 6 months were defined as former smokers; and participants who had never smoked during their lifetimes were classified as never smokers. Individuals who drank alcohol > 1 time/week over the previous 6 months were defined as current drinkers; individuals who had stopped drinking for > 6 months were defined as former drinkers; and individuals who had never had alcohol were defined as never drinkers. Surrogate variable analysis (SVA) was conducted separately in each panel using the SVA package (Leek et al. 2012). Variables used in the SVA included smoking status (coded as 0, 1, and 2 for never, former, and current smokers, respectively), age (years, as a continuous variable), sex (coded as 1 and 2 for male and female, respectively), drinking status (coded as 0, 1, and 2 for never, former, and current drinkers, respectively), and BMI (kilograms per meter squared, as a continuous variable). Surrogate variables (SVs) can capture major unknown variations of the genome-wide data that cannot be explained by included variables. Association analyses were performed separately in each panel using linear regression models, with inverse-normal transformed (INT) methylation beta values included as dependent variables and smoking status, age, sex, drinking status, BMI, and SVs included as independent variables. In the analyses of the COW and WHZH cohorts, ΣOH-PAHs were also included in the models as covariates because ΣOH-PAHs were considered in sample selection in these two panels. Results from all four panels were combined using a fixed effect meta-analysis with a sample-size weighted method to obtain p values and an inverse-variance weighted method to obtain estimates of effect size. The significance threshold for the genome-wide meta-analysis was a false discovery rate (FDR) < 0.05. The analyses were performed in R 3.1.2 (R Core Team 2014).
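The combination step can be illustrated with a short sketch. The text names a sample-size weighted method for p values and an inverse-variance weighted method for effect sizes, but gives no formulas or software, so the code below assumes the commonly used forms: a Stouffer-type Z score with weights equal to the square root of each panel's sample size, the standard fixed-effect inverse-variance estimate, and the Benjamini-Hochberg procedure for the FDR. All function names are hypothetical, and the per-panel betas, standard errors, and p values are made up; only the panel sizes (137, 101, 97, and 261) come from the study.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def sample_size_weighted_p(betas, pvalues, sizes):
    """Two-sided p value from a sample-size weighted Z-score combination."""
    # Convert each two-sided p value to a Z score carrying the sign of the effect.
    zs = [math.copysign(-_STD_NORMAL.inv_cdf(p / 2.0), b) for b, p in zip(betas, pvalues)]
    weights = [math.sqrt(n) for n in sizes]
    z = sum(w * zi for w, zi in zip(weights, zs)) / math.sqrt(sum(w * w for w in weights))
    return math.erfc(abs(z) / math.sqrt(2.0))


def inverse_variance_estimate(betas, ses):
    """Fixed-effect pooled effect size and its standard error."""
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    return pooled, math.sqrt(1.0 / total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# One CpG across the four panels (COW-1, ACS-1, ACS-2, WHZH); values are invented.
panel_sizes = [137, 101, 97, 261]
panel_betas = [-0.40, -0.35, -0.30, -0.45]
panel_ses = [0.10, 0.12, 0.12, 0.08]
panel_ps = [1e-4, 4e-3, 1.2e-2, 1e-7]

meta_p = sample_size_weighted_p(panel_betas, panel_ps, panel_sizes)
meta_beta, meta_se = inverse_variance_estimate(panel_betas, panel_ses)
```

In a genome-wide run, `benjamini_hochberg` would be applied to the meta-analysis p values of all tested CpGs, and sites with an adjusted value below 0.05 would be reported.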

Correlation between CpGs and gene expression. CpGs and expression probes were paired based on annotation files provided by Illumina, which provided information on genomic locations and gene annotations for both expression probes and CpG probes. Associations between methylation and expression were estimated using linear regression models, with inverse-normal–transformed expression values as dependent variables and methylation values, age, and sex as independent variables. For each CpG, the significance threshold was defined as 0.05/number of expression probes of the corresponding gene.
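As an illustration of the two ingredients named above, the sketch below shows a rank-based inverse-normal transformation and the per-CpG significance threshold. The text does not say which variant of the transformation was used; the code assumes the common rank-based form with Blom's offset of 3/8 and average ranks for ties. Function names are hypothetical.

```python
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def inverse_normal_transform(values, offset=0.375):
    """Map values to normal quantiles of their ranks (offset 3/8 is an assumption)."""
    n = len(values)
    std_normal = NormalDist()
    return [
        std_normal.inv_cdf((r - offset) / (n - 2.0 * offset + 1.0))
        for r in average_ranks(values)
    ]


def per_cpg_threshold(n_expression_probes, alpha=0.05):
    """Significance threshold for one CpG: alpha divided by the gene's probe count."""
    return alpha / n_expression_probes


transformed = inverse_normal_transform([3.0, 1.0, 2.0])
```

Under this rule, a CpG whose gene is covered by two expression probes is tested at 0.05/2 = 0.025, and a CpG whose gene has a single probe is tested at 0.05.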

Urinary PAH metabolites and smoking-related methylation alterations. We evaluated which urinary OH-PAHs could be used as representative biomarkers of smoking exposure by calculating the contribution of smoking to each OH-PAH metabolite [defined as the difference in R² between the models with and without smoking status; other covariates were age, drinking status, BMI, occupation, geographical region and beadchip operation date (geographical regions were coded as 1 and 2 for Wuhan and Guangdong, respectively)] using linear regression models in males from the WHZH cohort. The associations between methylation values of the smoking-related CpGs and urinary 2-hydroxynaphthalene levels were analyzed separately in males from the WHZH cohort and the Coke Oven cohort. Mediation analysis was performed to evaluate whether 2-hydroxynaphthalene showed mediation effects of smoking on methylation alterations in males from the WHZH cohort with adjustment for age, drinking status, BMI, occupation, differential leukocyte proportion, geographical region and beadchip operation date (Valeri and Vanderweele 2013). The association analyses were conducted in R 3.1.2 (R Core Team 2014), and the mediation analyses were performed in SAS 9.2 (SAS Institute Inc.).
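The contribution of smoking, defined above as the difference in R² between models with and without smoking status, can be sketched with ordinary least squares. The code is a minimal plain-Python illustration with hypothetical function names and invented toy data; the study's models included more covariates and were fitted in R.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def r_squared(y, covariates):
    """R² of a least-squares fit of y on an intercept plus the covariate columns."""
    rows = [[1.0] + list(row) for row in covariates]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def smoking_contribution(y, smoking, other_covariates):
    """R² of the model with smoking minus R² of the model without it."""
    full = [[s] + list(row) for s, row in zip(smoking, other_covariates)]
    reduced = [list(row) for row in other_covariates]
    return r_squared(y, full) - r_squared(y, reduced)


# Toy data (invented): smoking coded 0/1/2, with age as the only other covariate.
toy_smoking = [0, 1, 2, 0, 1, 2]
toy_age = [[30.0], [40.0], [35.0], [50.0], [45.0], [60.0]]
toy_metabolite = [2.0 * s + 0.1 * a[0] for s, a in zip(toy_smoking, toy_age)]
toy_delta = smoking_contribution(toy_metabolite, toy_smoking, toy_age)
```

Because the reduced model is nested in the full model, the difference is never negative, and it can be read as the share of variance in the metabolite explained by smoking beyond the other covariates.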


Basic Characteristics of the Participants

The genome-wide meta-analysis contained a total of 596 participants recruited from China, including 137 coke-oven workers (107 males; mean age = 46.51 years), 198 ACS patients (including 101 from Wuhan with 81 males and a mean age of 58.96 years, and 97 from Guangdong with 78 males and a mean age of 59.37 years), and 261 community residents from the WHZH cohort (206 males, mean age = 53.84 years). The characteristics of the study populations are summarized in Table 1.

Table 1 – Characteristics of the study participants (n or mean ± SD).

Genome-Wide Analysis of DNA Methylation and Smoking

In our genome-wide methylation meta-analysis, we identified 318 CpGs whose methylation levels were associated with smoking at the genome-wide significance level (FDR < 0.05, Figure 1). Of these, 161 CpGs annotated to 123 genes were not reported to be significantly associated with smoking in previous genome-wide studies of methylation and smoking in Europeans (Guida et al. 2015; Shenker et al. 2013; Zeilinger et al. 2013) or in African Americans (Dogan et al. 2014; Philibert et al. 2013; Sun et al. 2013) (see Table S1). The association results for the top 40 smoking-related CpGs (FDR < 0.01) are presented in Table 2, and the association results of the 318 smoking-related CpGs in each panel are presented in Table S2. For most of the 318 CpGs, we observed a graded trend in methylation levels from never to former to current smokers; the methylation differences between current smokers and nonsmokers were larger than the differences between former smokers and nonsmokers (see Table S3).

Figure 1 – Manhattan plot and Q-Q plot of the p values of the associations between methylation and cigarette smoking in the genome-wide meta-analysis. In the Manhattan plot (upper panel), the x-axis indicates genomic locations of the CpGs, the y-axis indicates –log10 (p-values) of the associations, and the red line indicates the –log10 (p-value) at false discovery rate (FDR) = 0.05. In the Q-Q plot (lower panel), the x-axis shows the expected –log10 (p-values), whereas the y-axis indicates the observed –log10 (p-values).

Table 2 – The 40 CpGs associated with cigarette smoking in the genome-wide meta-analysis (FDR < 0.01).

Correlations with the Expression of Annotated Genes

We further investigated whether the methylation values of the smoking-related CpGs were correlated with the expression of corresponding genes in an independent set of 144 healthy individuals whose methylome and gene-expression profiles were both measured (Table 1). Seventy-seven of the 318 smoking-related CpGs were excluded from the analysis either because no expression probes were designed for the genes or because of the low expression rate in blood leukocytes. Of the remaining 241 CpGs (a total of 414 CpG-expression probe pairs) that had qualified expression data for the annotated genes, we observed that methylation levels at 80 CpGs were associated with the expression of their corresponding genes (p < 0.05/number of expression probes of the corresponding gene; e.g., on the body of RUNX3, p = 1.57 × 10⁻⁷ for cg10951873 and ILMN_1787461; on the body of IL6R, p = 1.98 × 10⁻⁹ for cg09257526 and ILMN_1696394 and p = 5.61 × 10⁻⁶ for cg09257526 and ILMN_1754753; within 1,500 bp of the transcription start site of CEP135, p = 1.82 × 10⁻² for cg26542660 and ILMN_1693766; on the body of CDH23, p = 9.45 × 10⁻³ for cg10750182 and ILMN_1779934; within 1,500 bp of the transcription start site of PTAFR, p = 2.07 × 10⁻¹⁶ for cg20460771 and ILMN_1746836; in the 5´ untranslated region of ANKRD11, p = 1.03 × 10⁻⁸ for cg01107178 and ILMN_2108709) (see Table S4).

Associations of Smoking-Related CpGs and Urinary 2-Hydroxynaphthalene

Because the majority of smokers in our study were males (98.52%) and to avoid effects owing to occupational exposures, the analysis was mainly conducted in males from the WHZH cohort. We first tested which OH-PAH metabolite was the most representative biomarker for smoking. We observed that smoking could account for 18.0% of the variation of urinary 2-hydroxynaphthalene, larger than the variations explained by smoking for the other 9 OH-PAH metabolites (see Table S5).

We then assessed the association between methylation levels at the 318 smoking-related CpGs and urinary 2-hydroxynaphthalene levels (see Table S6) and found 15 significant associations after performing Bonferroni corrections (p < 1.57 × 10⁻⁴) (Figure 2). When restricting the analysis only to nonsmokers, these associations were greatly attenuated (Figure 2; see also Table S6), suggesting that the correlations between DNA methylation and urinary 2-hydroxynaphthalene were mainly attributable to smoking. We further investigated whether 2-hydroxynaphthalene could be a mediator of these smoking-induced methylation alterations and found that among the 15 CpGs associated with 2-hydroxynaphthalene, the smoking-related methylation variation at 12 CpGs (including cg05575921, cg23916896, cg24090911, and cg26703534 on AHRR) might be partially mediated by their associations with urinary 2-hydroxynaphthalene levels (p < 0.05) (Table 3).
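The Bonferroni threshold quoted above follows from dividing the nominal level of 0.05 by the 318 CpGs tested. A short check, with a hypothetical helper name:

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# 318 smoking-related CpGs were tested against urinary 2-hydroxynaphthalene.
threshold = bonferroni_threshold(318)
formatted = f"{threshold:.2e}"  # "1.57e-04", matching the threshold in the text
```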

Figure 2 – Associations of the 15 smoking-related CpGs and urinary 2-hydroxynaphthalene levels in males from the Wuhan–Zhuhai (WHZH) cohort and the Coke Oven cohort. The four panels plot beta estimates with 95% CIs (x-axes) by CpG (y-axes) for males from the WHZH cohort, nonsmokers from the WHZH cohort, males from the Coke Oven cohort, and nonsmokers from the Coke Oven cohort.

Table 3 – Mediation analysis of 15 significant CpGs whose methylation levels were correlated with urinary 2-hydroxynaphthalene in males from the Wuhan–Zhuhai (WHZH) cohort.

Although subjects from the Coke Oven cohort had occupational exposures to PAHs, similar association patterns between smoking, 2-hydroxynaphthalene, and methylation at these CpGs were observed in male subjects from the Coke Oven cohort after adjustment for 1-hydroxypyrene, an occupational exposure marker for coke-oven workers (Figure 2; see also Table S6).


In the present study, we identified 318 smoking-related CpGs through a genome-wide meta-analysis of DNA methylation in several Chinese populations. Among the identified CpGs, 161 annotated to 123 genes were not associated with smoking in recent studies of Europeans (Guida et al. 2015; Shenker et al. 2013; Zeilinger et al. 2013) or African Americans (Dogan et al. 2014; Philibert et al. 2013; Sun et al. 2013). We also observed that methylation levels at some smoking-related CpGs might affect the expression of corresponding genes, and some smoking-related methylation alterations might be partly mediated by exposure to naphthalene.

Although China is the largest consumer and producer of tobacco in the world (Gu et al. 2009), genome-wide studies of DNA methylation and smoking have not been conducted in Chinese populations. The present study identified 318 smoking-related CpGs in a Chinese population, 157 of which have been reported by previous methylation studies, suggesting that smoking-related methylation alterations were mainly consistent across Chinese and Western populations. The 161 CpGs that have not been previously reported in Europeans or African Americans may represent novel smoking-related sites or sites specific to the Chinese population, which calls for replication by further studies among other Chinese populations. Most of the identified loci were annotated on genes involved in the metabolism of smoking-released chemicals [e.g., AHRR is a repressor of the nuclear receptor for aryl hydrocarbons that is involved in xenobiotic metabolism (Shenker et al. 2013)] or that might be involved in smoking-related health effects [e.g., methylation of F2RL3 mediates the detrimental impacts of smoking and is related to mortality caused by coronary heart disease (Breitling et al. 2012; Zhang et al. 2014)].

DNA methylation might be a potential link between smoking and human diseases. In the present study, the smoking-related methylation changes to RUNX3, IL6R, PTAFR, and ANKRD11 (cardiovascular-related genes) and CEP135 and CDH23 (cancer-related genes) corresponded to increased gene expression. RUNX3 encodes a member of the runt domain-containing family of transcription factors, which might have important functions in innate and adaptive immune cell types and might be associated with several inflammatory-related diseases (Lotem et al. 2015). Interleukin 6 is a cytokine with vital roles in inflammatory responses, and its dysregulation has been implicated in many health problems (Ferreira et al. 2013). PTAFR encodes a receptor for platelet-activating factor (PAF) that plays a significant role in proinflammatory processes (Ninio et al. 2004). In addition to their critical role in hemostasis and thrombosis, platelets are also involved in regulating inflammatory and immune responses (von Hundelshausen and Weber 2007). ANKRD11 might be involved in apoptosis pathways (e.g., p53 signaling) (Lim et al. 2012; Neilsen et al. 2008), which have been reported to play key roles in the pathogenesis of cardiovascular diseases (Lee and Gustafsson 2009). It has been speculated that smoking-induced abnormal physiological processes might be important mechanisms in the development of cardiovascular diseases (Frostegård 2013). Our findings that smoking was associated with methylation of cardiovascular-related genes and that this methylation was correlated with expression of the corresponding genes suggest that DNA methylation might contribute to disease progression through immune reactions, inflammation responses, and apoptosis induced by smoking.

CEP135 encodes a centrosomal protein that acts as a scaffolding protein during early centriole biogenesis (Kim et al. 2008). Centrosomes play crucial roles in many processes (including organizing mitotic spindle poles), and centrosome aberrations (Nigg 2002) are found in many human tumors (Rusan and Peifer 2007). Notably, antimitotic compounds have been identified in tobacco-smoke condensate, and smoking could induce mitotic abnormalities (Qiao et al. 2003; Vogt Isaksen 2004). CDH23 encodes cadherin 23, which acts as a mediator at intercellular junctions and in cellular differentiation and cell migration (Agarwal 2014). Previous studies showed that CDH23 was up-regulated in breast cancer tissues and was involved in metastatic processes (Binai et al. 2013). Recent evidence has suggested that active smoking played a potentially causal role in breast cancer (Reynolds et al. 2009). Smoking is also a well-established cause of many cancers (e.g., lung, colon, and stomach) (Gandini et al. 2008). Therefore, it is possible that methylation alterations are potential mechanisms of smoking-induced adverse effects and cancers.

Cigarette smoking is a major source of PAH exposure, particularly naphthalene exposure (Ding et al. 2005; Jacob et al. 2013). We estimated that cigarette smoking accounted for 18.0% of the variation in urinary 2-hydroxynaphthalene among males in the WHZH cohort, which supports further investigation of urinary 2-hydroxynaphthalene as a possible biomarker of internal exposure to smoking-sourced PAHs. Smoking-related alterations of AHRR methylation might be caused by exposure to PAHs (Shenker et al. 2013). AHRR encodes a repressor of the aryl hydrocarbon receptor (AhR) (Harlid et al. 2014). Previous studies have suggested that the AhR pathway is important in the metabolism of various xenobiotics including PAHs (Zeilinger et al. 2013) and is modified in response to exposure to smoking (Besingi and Johansson 2014). Our present data suggested that smoking-released naphthalene might alter the AhR pathway by changing the methylation levels of vital genes in the AhR pathway.
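To make the "proportion of variation accounted for" statement concrete, the sketch below computes the coefficient of determination (R² = 1 − SS_res/SS_tot) from a simple least-squares regression of a metabolite level on smoking status. This section does not state how the 18.0% figure was derived, so the single-predictor model, the function name `variance_explained`, and the toy values are all assumptions made for illustration, not the study's data or method.

```python
def variance_explained(exposure, outcome):
    """Return R^2 from a simple least-squares regression of outcome on exposure."""
    n = len(exposure)
    mean_x = sum(exposure) / n
    mean_y = sum(outcome) / n
    # Centered sums of squares and cross-products
    sxx = sum((x - mean_x) ** 2 for x in exposure)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(exposure, outcome))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(exposure, outcome))
    ss_tot = sum((y - mean_y) ** 2 for y in outcome)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data (NOT from the study): 1 = smoker, 0 = non-smoker;
# the outcome stands in for a log-transformed urinary metabolite level.
smoker = [0, 0, 0, 0, 1, 1, 1, 1]
log_metabolite = [1.0, 2.0, 3.0, 4.0, 2.0, 3.0, 4.0, 5.0]

r2 = variance_explained(smoker, log_metabolite)
print(f"Share of variation explained by smoking: {r2:.1%}")
```

With a binary exposure, this R² equals the between-group share of the total variance; a model with additional covariates would require a partial R² instead.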

Different cells and tissues have distinct DNA methylation signatures (Ohgane et al. 2008). The use of peripheral blood DNA in the present study is reasonable for two reasons. First, peripheral blood is an important carrier for many xenobiotics absorbed into human bodies (Barr et al. 2007); peripheral blood cells have direct contact with the internal forms of xenobiotics and react to them (Bonassi et al. 2007). Second, blood samples are the most convenient to collect in large-scale studies, and using blood cells allows comparison of our results with those of other studies. A limitation of using blood leukocytes as the source of DNA for methylation analyses is that methylation varies among leukocyte subtypes, and the distribution of leukocyte subtypes may vary in association with exposure, thus resulting in potential confounding of associations between exposures and methylation (Reinius et al. 2012). Because a previous study suggested that factor-based “batch” correction methodology (such as surrogate variable analysis) can not only control for batch effects but also empirically estimate and control for cell-type composition (Jaffe and Irizarry 2014), we adopted surrogate variables in our genome-wide methylation analyses to limit the effects of batch and cellular composition simultaneously. When investigating associations between smoking-related CpGs and urinary 2-hydroxynaphthalene, we adjusted for differential white blood cell proportions in the analysis models. However, we cannot rule out the potential for residual confounding related to leukocyte subtype variations or to other factors. In addition, given the cross-sectional study design, we could not establish a temporal relationship between smoking and DNA methylation.
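The confounding concern described above can be illustrated with a small, noise-free toy example: when a leukocyte subtype fraction differs between smokers and non-smokers and also affects methylation, the crude smoking coefficient is biased, and adding the cell fraction as a covariate recovers the true effect. This is a minimal sketch of covariate adjustment only; it does not implement surrogate variable analysis, and the variable names, effect sizes, and data are hypothetical.

```python
def centered_sums(u, v):
    """Sum of cross-products of u and v after centering each on its mean."""
    mu = sum(u) / len(u)
    mv = sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def crude_slope(x, y):
    """Slope from regressing y on x alone."""
    return centered_sums(x, y) / centered_sums(x, x)


def adjusted_slope(x, z, y):
    """Coefficient of x from regressing y on x and z (two-predictor OLS)."""
    sxx, szz, sxz = centered_sums(x, x), centered_sums(z, z), centered_sums(x, z)
    sxy, szy = centered_sums(x, y), centered_sums(z, y)
    return (szz * sxy - sxz * szy) / (sxx * szz - sxz ** 2)


# Hypothetical data: smokers have a higher fraction of one leukocyte subtype.
smoking = [0, 0, 0, 0, 1, 1, 1, 1]
cell_fraction = [0.50, 0.55, 0.60, 0.65, 0.60, 0.65, 0.70, 0.75]

# Assumed data-generating model: the true smoking effect on methylation
# is -0.05, and the cell fraction has its own effect of -0.2.
methylation = [0.80 - 0.05 * s - 0.2 * c
               for s, c in zip(smoking, cell_fraction)]

crude = crude_slope(smoking, methylation)
adjusted = adjusted_slope(smoking, cell_fraction, methylation)
print(f"crude: {crude:.3f}, adjusted for cell fraction: {adjusted:.3f}")
```

In this construction the crude estimate overstates the smoking effect because part of the cell-composition effect is attributed to smoking; real data would also contain noise and unmeasured factors, which is why residual confounding cannot be excluded.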

Conclusions
On the basis of a genome-wide analysis of smoking and DNA methylation in a Chinese population, we identified 318 smoking-related CpGs, among which 161 CpGs annotated to 123 genes have not been previously reported in Europeans or in African Americans. Some smoking-related CpGs might play a role in gene regulation. We also found that naphthalene might be one of the smoking-released chemicals inducing the methylation alterations that we observed in smokers. Additional studies are needed to replicate our findings, to determine their potential relevance to health outcomes, and to elucidate underlying mechanisms that link smoking and DNA methylation.

References
Agarwal SK. 2014. Integrins and cadherins as therapeutic targets in fibrosis. Front Pharmacol 5:131, doi: 10.3389/fphar.2014.00131.

Aryee MJ, Jaffe AE, Corrada-Bravo H, Ladd-Acosta C, Feinberg AP, Hansen KD, et al. 2014. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics 30:1363–1369.

Barr DB, Bishop A, Needham LL. 2007. Concentrations of xenobiotic chemicals in the maternal-fetal unit. Reprod Toxicol 23:260–266.

Besingi W, Johansson A. 2014. Smoke-related DNA methylation changes in the etiology of human disease. Hum Mol Genet 23:2290–2297.

Binai NA, Carra G, Löwer J, Löwer R, Wessler S. 2013. Differential gene expression in ERα-positive and ERα-negative breast cancer cells upon leptin stimulation. Endocrine 44:496–503.

Bonassi S, Znaor A, Ceppi M, Lando C, Chang WP, Holland N, et al. 2007. An increased micronucleus frequency in peripheral blood lymphocytes predicts the risk of cancer in humans. Carcinogenesis 28:625–631.

Breitling LP, Salzmann K, Rothenbacher D, Burwinkel B, Brenner H. 2012. Smoking, F2RL3 methylation, and prognosis in stable coronary heart disease. Eur Heart J 33:2841–2848.

Breitling LP, Yang R, Korn B, Burwinkel B, Brenner H. 2011. Tobacco-smoking-related differential DNA methylation: 27K discovery and replication. Am J Hum Genet 88:450–457.

CDC (Centers for Disease Control and Prevention). 2010. 2010 Surgeon General’s Report—How Tobacco Smoke Causes Disease: The Biology and Behavioral Basis for Smoking-Attributable Disease. Atlanta, GA:CDC. Available: [accessed 13 September 2014].

Cunningham TJ, Ford ES, Rolle IV, Wheaton AG, Croft JB. 2014. Associations of self-reported cigarette smoking with chronic obstructive pulmonary disease and co-morbid chronic conditions in the United States. COPD 12:276–286, doi: 10.3109/15412555.2014.949001.

Deng Q, Huang S, Zhang X, Zhang W, Feng J, Wang T, et al. 2014. Plasma microRNA expression and micronuclei frequency in workers exposed to polycyclic aromatic hydrocarbons. Environ Health Perspect 122:719–725, doi: 10.1289/ehp.1307080.

Ding YS, Trommel JS, Yan XJ, Ashley D, Watson CH. 2005. Determination of 14 polycyclic aromatic hydrocarbons in mainstream smoke from domestic cigarettes. Environ Sci Technol 39:471–478.

Dogan MV, Shields B, Cutrona C, Gao L, Gibbons FX, Simons R, et al. 2014. The effect of smoking on DNA methylation of peripheral blood mononuclear cells from African American women. BMC Genomics 15:151, doi: 10.1186/1471-2164-15-151.

Dunning MJ, Smith ML, Ritchie ME, Tavaré S. 2007. beadarray: R classes and methods for Illumina bead-based data. Bioinformatics 23:2183–2184.

Elliott HR, Tillin T, McArdle WL, Ho K, Duggirala A, Frayling TM, et al. 2014. Differences in smoking associated DNA methylation patterns in South Asians and Europeans. Clin Epigenetics 6:4, doi: 10.1186/1868-7083-6-4.

Ezzati M, Lopez AD. 2003. Estimates of global mortality attributable to smoking in 2000. Lancet 362:847–852.

Feil R, Fraga MF. 2012. Epigenetics and the environment: emerging patterns and implications. Nat Rev Genet 13:97–109.

Ferreira RC, Freitag DF, Cutler AJ, Howson JM, Rainbow DB, Smyth DJ, et al. 2013. Functional IL6R 358Ala allele impairs classical IL-6 receptor signaling and influences risk of diverse inflammatory diseases. PLoS Genet 9:e1003444, doi: 10.1371/journal.pgen.1003444.

Frostegård J. 2013. Immunity, atherosclerosis and cardiovascular disease. BMC Med 11:117, doi: 10.1186/1741-7015-11-117.

Furniss CS, Marsit CJ, Houseman EA, Eddy K, Kelsey KT. 2008. Line region hypomethylation is associated with lifestyle and differs by human papillomavirus status in head and neck squamous cell carcinomas. Cancer Epidemiol Biomarkers Prev 17:966–971.

Gandini S, Botteri E, Iodice S, Boniol M, Lowenfels AB, Maisonneuve P, et al. 2008. Tobacco smoking and cancer: a meta-analysis. Int J Cancer 122:155–164.

Gu D, Kelly TN, Wu X, Chen J, Samet JM, Huang JF, et al. 2009. Mortality attributable to smoking in China. N Engl J Med 360:150–159.

Guida F, Sandanger TM, Castagné R, Campanella G, Polidoro S, Palli D, et al. 2015. Dynamics of smoking-induced genome-wide methylation changes with time since smoking cessation. Hum Mol Genet 24:2349–2359.

Harlid S, Xu Z, Panduri V, Sandler DP, Taylor JA. 2014. CpG sites associated with cigarette smoking: analysis of epigenome-wide data from the Sister Study. Environ Health Perspect 122:673–678, doi: 10.1289/ehp.1307480.

IARC (International Agency for Research on Cancer). 2004. Tobacco smoke and involuntary smoking. IARC Monogr Eval Carcinog Risks Hum 83.

Jacob P III, Abu Raddaha AH, Dempsey D, Havel C, Peng M, Yu L, et al. 2013. Comparison of nicotine and carcinogen exposure with water pipe and cigarette smoking. Cancer Epidemiol Biomarkers Prev 22:765–772.

Jaffe AE, Irizarry RA. 2014. Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome Biol 15:R31, doi: 10.1186/gb-2014-15-2-r31.

Joubert BR, Håberg SE, Nilsen RM, Wang X, Vollset SE, Murphy SK, et al. 2012. 450K epigenome-wide scan identifies differential DNA methylation in newborns related to maternal smoking during pregnancy. Environ Health Perspect 120:1425–1431, doi: 10.1289/ehp.1205412.

Kim K, Lee S, Chang J, Rhee K. 2008. A novel function of CEP135 as a platform protein of C-NAP1 for its centriolar localization. Exp Cell Res 314:3692–3700.

Lee KW, Pausova Z. 2013. Cigarette smoking and DNA methylation. Front Genet 4:132, doi: 10.3389/fgene.2013.00132.

Lee Y, Gustafsson AB. 2009. Role of apoptosis in cardiovascular disease. Apoptosis 14:536–548.

Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. 2012. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28:882–883.

Li X, Feng Y, Deng H, Zhang W, Kuang D, Deng Q, et al. 2012. The dose–response decrease in heart rate variability: any association with the metabolites of polycyclic aromatic hydrocarbons in coke oven workers? PLoS One 7:e44562, doi: 10.1371/journal.pone.0044562.

Lim SP, Wong NC, Suetani RJ, Ho K, Ng JL, Neilsen PM, et al. 2012. Specific-site methylation of tumour suppressor ANKRD11 in breast cancer. Eur J Cancer 48:3300–3309.

Lotem J, Levanon D, Negreanu V, Bauer O, Hantisteanu S, Dicken J, et al. 2015. Runx3 at the interface of immunity, inflammation and cancer. Biochim Biophys Acta 1855:131–143.

Markunas CA, Xu Z, Harlid S, Wade PA, Lie RT, Taylor JA, et al. 2014. Identification of DNA methylation changes in newborns related to maternal smoking during pregnancy. Environ Health Perspect 122:1147–1153, doi: 10.1289/ehp.1307892.

Mathers CD, Loncar D. 2006. Projections of global mortality and burden of disease from 2002 to 2030. PLoS Med 3:e442, doi: 10.1371/journal.pmed.0030442.

Neilsen PM, Cheney KM, Li CW, Chen JD, Cawrse JE, Schulz RB, et al. 2008. Identification of ANKRD11 as a p53 coactivator. J Cell Sci 121(pt 21):3541–3552.

Nigg EA. 2002. Centrosome aberrations: cause or consequence of cancer progression? Nat Rev Cancer 2:815–825.

Ninio E, Tregouet D, Carrier JL, Stengel D, Bickel C, Perret C, et al. 2004. Platelet-activating factor-acetylhydrolase and PAF-receptor gene haplotypes in relation to future cardiovascular event in patients with coronary artery disease. Hum Mol Genet 13:1341–1351.

Ohgane J, Yagi S, Shiota K. 2008. Epigenetics: the DNA methylation profile of tissue-dependent and differentially methylated regions in cells. Placenta 29(suppl A):S29–S35.

Philibert RA, Beach SR, Gunter TD, Brody GH, Madan A, Gerrard M. 2010. The effect of smoking on MAOA promoter methylation in DNA prepared from lymphoblasts and whole blood. Am J Med Genet B Neuropsychiatr Genet 153B:619–628.

Philibert RA, Beach SR, Lei MK, Brody GH. 2013. Changes in DNA methylation at the aryl hydrocarbon receptor repressor may be a new biomarker for smoking. Clin Epigenetics 5:19, doi: 10.1186/1868-7083-5-19.

Pidsley R, Wong CCY, Volta M, Lunnon K, Mill J, Schalkwyk LC. 2013. A data-driven approach to preprocessing Illumina 450k methylation array data. BMC Genomics 14:293, doi: 10.1186/1471-2164-14-293.

Qiao D, Seidler FJ, Violin JD, Slotkin TA. 2003. Nicotine is a developmental neurotoxicant and neuroprotectant: stage-selective inhibition of DNA synthesis coincident with shielding from effects of chlorpyrifos. Brain Res Dev Brain Res 147:183–190.

R Core Team. 2014. R: A Language and Environment for Statistical Computing. Vienna, Austria:R Foundation for Statistical Computing. Available: [accessed 5 July 2014].

Rea TD, Heckbert SR, Kaplan RC, Smith NL, Lemaitre RN, Psaty BM. 2002. Smoking status and risk for recurrent coronary events after myocardial infarction. Ann Intern Med 137:494–500.

Reinius LE, Acevedo N, Joerink M, Pershagen G, Dahlén SE, Greco D, et al. 2012. Differential DNA methylation in purified human blood cells: implications for cell lineage and studies on disease susceptibility. PLoS One 7:e41361, doi: 10.1371/journal.pone.0041361.

Reynolds P, Goldberg D, Hurley S, Nelson DO, Largent J, Henderson KD, et al. 2009. Passive smoking and risk of breast cancer in the California Teachers Study. Cancer Epidemiol Biomarkers Prev 18:3389–3398.

Rodgman A, Smith CJ, Perfetti TA. 2000. The composition of cigarette smoke: a retrospective, with emphasis on polycyclic components. Hum Exp Toxicol 19:573–595.

Rusan NM, Peifer M. 2007. A role for a novel centrosome cycle in asymmetric cell division. J Cell Biol 177:13–20.

Schadt EE. 2009. Molecular networks as sensors and drivers of common human diseases. Nature 461:218–223.

Shenker NS, Polidoro S, van Veldhoven K, Sacerdote C, Ricceri F, Birrell MA, et al. 2013. Epigenome-wide association study in the European Prospective Investigation into Cancer and Nutrition (EPIC-Turin) identifies novel genetic loci associated with smoking. Hum Mol Genet 22:843–851.

Smith IM, Mydlarz WK, Mithani SK, Califano JA. 2007. DNA global hypomethylation in squamous cell head and neck cancer associated with smoking, alcohol consumption and stage. Int J Cancer 121:1724–1728.

Song Y, Hou J, Huang X, Zhang X, Tan A, Rong Y, et al. 2014. The Wuhan-Zhuhai (WHZH) cohort study of environmental air particulate matter and the pathogenesis of cardiopulmonary diseases: study design, methods and baseline characteristics of the cohort. BMC Public Health 14:994, doi: 10.1186/1471-2458-14-994.

Sosnowski R, Przewoźniak K. 2015. The role of the urologist in smoking cessation: why is it important? Urol Oncol 33:30–39.

Sun YV, Smith AK, Conneely KN, Chang Q, Li W, Lazarus A, et al. 2013. Epigenomic association analysis identifies smoking-related DNA methylation sites in African Americans. Hum Genet 132:1027–1037.

Valeri L, Vanderweele TJ. 2013. Mediation analysis allowing for exposure-mediator interactions and causal interpretation: theoretical assumptions and implementation with SAS and SPSS macros. Psychol Methods 18:137–150.

Vogt Isaksen C. 2004. Maternal smoking, intrauterine growth restriction, and placental apoptosis. Pediatr Dev Pathol 7:433–442.

von Hundelshausen P, Weber C. 2007. Platelets as immune cells: bridging inflammation and cardiovascular disease. Circ Res 100:27–40.

WHO (World Health Organization). 2014. Tobacco. Fact Sheet No. 339. Available: [accessed 26 August 2014].

Zeilinger S, Kühnel B, Klopp N, Baurecht H, Kleinschmidt A, Gieger C, et al. 2013. Tobacco smoking leads to extensive genome-wide changes in DNA methylation. PLoS One 8:e63812, doi: 10.1371/journal.pone.0063812.

Zhang Y, Yang R, Burwinkel B, Breitling LP, Holleczek B, Schöttker B, et al. 2014. F2RL3 methylation in blood DNA is a strong predictor of mortality. Int J Epidemiol 43:1215–1225.
