Open access
Brief Report
1 December 2016

Statistical Approaches for Assessing Health Effects of Environmental Chemical Mixtures in Epidemiology: Lessons from an Innovative Workshop

Publication: Environmental Health Perspectives
Volume 124, Issue 12
Pages A227 - A229


Quantifying the impact of exposure to environmental chemical mixtures is important for identifying risk factors for diseases and developing more targeted public health interventions. The National Institute of Environmental Health Sciences (NIEHS) held a workshop in July 2015 to address the need to develop novel statistical approaches for multi-pollutant epidemiology studies. The primary objective of the workshop was to identify and compare different statistical approaches and methods for analyzing complex chemical mixtures data in both simulated and real-world data sets. At the workshop, participants compared approaches and results and speculated as to why they may have differed. Several themes emerged: a) no one statistical approach appeared to outperform the others, b) many methods included some form of variable reduction or summation of the data before statistical analysis, c) the statistical approach should be selected based upon a specific hypothesis or scientific question, and d) related mixtures data should be shared among researchers to more comprehensively and accurately address methodological questions and statistical approaches. Future efforts should continue to design and optimize statistical approaches to address questions about chemical mixtures in epidemiological studies.


There is great interest in quantifying the impact of exposure to environmental chemical mixtures on human health. As shown in biomonitoring studies, children and adults are exposed to a large number of environmental chemicals across the life span (Aylward et al. 2013; CDC 2012; Exley et al. 2015; Frederiksen et al. 2014). Many are potentially toxic, but little is known about health effects from exposure to complex mixtures (Carlin et al. 2013; Claus Henn et al. 2014; Goodson et al. 2015; Grandjean and Landrigan 2014; Johns et al. 2012). By examining chemical mixtures, instead of one chemical at a time, it may be possible to more accurately identify risk factors for diseases with environmental origins and develop more targeted public health interventions.
In 2011, the National Institute of Environmental Health Sciences (NIEHS) hosted a workshop on chemical mixtures entitled “Advancing Research on Mixtures: New Perspectives and Approaches for Predicting Adverse Human Health Effects.” This workshop brought together experts from epidemiology, toxicology, exposure science, risk assessment, and statistics to identify key challenges in mixtures research and to suggest approaches for addressing those challenges (Carlin et al. 2013). An important theme that emerged was the need for further collaboration between experts to help bridge the gap between toxicological and epidemiological studies that involve chemical mixtures. This cross-disciplinary collaboration is a necessary step in understanding exposure to real-world mixtures and the associated health effects. Another key concept that came from the workshop was the need to develop novel statistical approaches that would predict and evaluate effects associated with exposure to mixtures. In addition, the NIEHS has incorporated into its 2012–2017 Strategic Plan (Goal 4) the need for further study of the health effects associated with combined exposures (NIEHS 2012). This goal includes the assessment of joint action of multiple environmental exposures, including chemicals, nonchemical stressors (e.g., socioeconomic, behavioral factors), infectious agents, the microbiome, and nutritional components on toxicity and disease. Moreover, there is a need to identify interactions resulting from combined exposures, determine how the combined exposures affect human health outcomes, and identify preventive measures to mitigate the potential impact of these exposures.


To follow up on the themes from the 2011 workshop, and in an effort to focus on statistical approaches for multi-pollutant (i.e., mixtures) epidemiology studies, the NIEHS convened another workshop in July 2015. This workshop—“Statistical Approaches for Assessing Health Effects of Environmental Chemical Mixtures in Epidemiology Studies”—was designed to bring together experts from the fields of environmental epidemiology and biostatistics (NIEHS 2015). The primary objective of the workshop was to identify and compare different approaches and methods for analyzing chemical mixture data in epidemiological studies.
An innovative approach was used to attract and engage potential workshop participants and conduct a working meeting. This approach involved having participants apply various statistical methods to two simulated data sets and one real-world data set before the workshop. Each data set included a single continuous health outcome (Y), multiple chemical exposures, and additional non-exposure variables (e.g., potential confounders). Experts were offered an opportunity to test their statistical methods of choice on the data sets and later exhibit their findings at the workshop.
The first step in the process was to make the simulated data sets available to potential participants approximately six months before the workshop. These data sets have since been made publicly available on the NIEHS web site. Participants were asked to analyze the data sets using their specific statistical approach(es) and to submit an abstract describing their approach(es). Second, the real-world data set was made available to those who submitted abstracts based on their analyses of the simulated data sets. The methods used to create the two simulated data sets were known only by the workshop organizers and were revealed to those participants who had submitted their analyses to the workshop organizers prior to the workshop. This allowed potential participants to compare their results to the known (i.e., “truth”) results and to reflect on why their results may have differed.
The planning committee received 33 abstracts from academia, government, and industry. Based on these abstracts, a subset of individuals was invited to present their approaches and statistical model(s) at the meeting.

The Data Sets

Simulated data set 1 (n = 500) was designed to represent a prospective cohort epidemiologic study with seven continuous, log-normal exposures and one binary variable stipulated to be a confounder that required adjustment. Assumptions built into the data set included no loss to follow-up, no missing or censored data, no mismeasurement of the variables, and no other potential biases. It was also assumed that the seven exposure variables and the binary variable were neither intermediate variables nor colliders. The data set was designed such that there were high correlations between exposures, the binary variable was a strong confounder, and directions of effect for the exposures differed. Random, normally distributed noise was added to the outcome variable, and only part of the variation in the outcome was explained by the independent variables. In addition, compared with the second simulated data set, this data set had fewer exposure variables and less unexplained variation (e.g., random noise), but it included non-linear exposure–response functions and interactions between exposures.
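To make the design described above concrete, the following is a minimal sketch of how a data set with these general features could be generated. It is not the organizers' simulation: the function name, the correlation of 0.7, the coefficients, the interaction term, and the confounder effect are all hypothetical values chosen for illustration.

```python
import numpy as np

def simulate_mixture_data(n=500, rho=0.7, seed=0):
    """Simulate correlated log-normal exposures, a binary confounder, and a noisy outcome.

    All numeric values here are illustrative assumptions, not the workshop's parameters.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical effects on the log-exposure scale: mixed directions, some null
    beta = np.array([0.5, -0.4, 0.0, 0.3, 0.0, 0.0, -0.2])
    p = beta.size

    # Binary confounder Z
    z = rng.binomial(1, 0.5, size=n)

    # Exchangeable correlation among log exposures (unit variances, correlation rho)
    cov = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)
    # Z shifts every log exposure, so Z is associated with the exposures
    log_x = rng.multivariate_normal(np.zeros(p), cov, size=n) + 0.5 * z[:, None]
    x = np.exp(log_x)  # log-normal exposures

    # Outcome: linear in log exposure (hence non-linear in x), one exposure-exposure
    # interaction, a confounder effect, and normally distributed noise
    y = (log_x @ beta
         + 0.2 * log_x[:, 0] * log_x[:, 1]
         + 1.0 * z
         + rng.normal(0.0, 1.0, size=n))
    return x, z, y

x, z, y = simulate_mixture_data()
```

Because Z affects both the exposures and the outcome in this sketch, an analysis that omits Z would be confounded, which mirrors the adjustment requirement described for simulated data set 1.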
Simulated data set 2 (n = 500) represented data from a cross-sectional study of 14 exposure variables. This data set included three potential confounders (two continuous and one binary), a strong correlation between exposures, and strong effect measure modification by a binary confounder (e.g., sex). The exposure variables had complex correlations based on real-world biomarker data from the National Health and Nutrition Examination Survey (NHANES). The second simulated data set featured more exposure variables and more unexplained variation than the first simulated data set, but contained linear exposure–response functions and no interactions between exposures. To understand the nature of the challenges presented to the workshop participants, the reader can find additional information regarding the complexity of the simulated data sets and the assumptions that were built into the data sets on the workshop web site.
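Effect measure modification by a binary variable, as described for simulated data set 2, means that the exposure–outcome slope differs between strata of that variable. The sketch below illustrates one common way to estimate stratum-specific slopes, using a product term in an ordinary least squares model. The single exposure, the slopes of 0.2 and 1.0, and the variable names are hypothetical and are not taken from the workshop data.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Hypothetical binary modifier (e.g., sex) and a single standardized exposure
sex = rng.binomial(1, 0.5, size=n)
x = rng.normal(size=n)

# Assumed truth for this illustration: slope 0.2 when sex = 0, slope 1.0 when sex = 1
y = 0.2 * x + 0.8 * x * sex + 0.5 * sex + rng.normal(size=n)

# OLS with an exposure-by-modifier product term
design = np.column_stack([np.ones(n), x, sex, x * sex])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

slope_sex0 = coef[1]            # estimated slope in the sex = 0 stratum
slope_sex1 = coef[1] + coef[3]  # estimated slope in the sex = 1 stratum
```

A model without the product term would report a single averaged slope and would miss the difference between strata.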
The third data set was a modified real-world data set (n = 270) that came from a prospective pregnancy and birth cohort study of mothers and children where the results (i.e., truth) were unknown (Braun et al. 2016b). This data set included 22 exposure variables: 14 polychlorinated biphenyls (PCBs), 4 polybrominated diphenyl ethers (PBDEs), and 4 organochlorine pesticides. The outcome consisted of scores on the Mental Development Index (MDI; a measurement of cognition) (Bayley 1969) at ages 1–3 years; covariates included child’s sex and mother’s age, education, race, and smoking status during pregnancy.
For each analysis, workshop participants were encouraged to work in multidisciplinary teams including epidemiologists, statisticians, and toxicologists. They were asked to address the following qualitative and quantitative questions in their analyses:
Which exposures potentially contributed to the outcome? Are there any that did not? (qualitative).
How much did the exposures potentially contribute to the outcome? (quantitative).
Was there evidence of “interaction?” Be explicit with your definition of interaction [toxicologists, epidemiologists, and biostatisticians tend to think about this quite differently (Howard and Webster 2013)].
What was the effect of joint and cumulative exposure to the mixture? (qualitative).
What is the estimate of the function Y = f(X_1, …, X_p)? (quantitative).
Workshop participants were also asked to provide specific details about their methodologies and how their assumptions may have influenced the results. These included providing a basic overview of the method(s) used, the rationale for using their approach(es), any transformation or preparation of the data necessary to use the approach(es), and assumptions inherent to the approach(es) and built into the model (e.g., departures from linearity, dose–response shapes, interactions, modifiers, and different potencies for exposures). Participants were also asked to include information about the statistical software they used (e.g., R, SAS) and to provide the statistical code used in their analysis. They were encouraged to state whether or not they used an existing package or procedure and to identify whether they had to significantly modify an existing package or procedure, or develop completely new code. The statistical code submitted by participants is available on the workshop’s web site.
Finally, participants were asked to compare the outcomes of their analyses to the correct answers (i.e., truth) associated with the two simulated data sets. If they did not achieve the correct answers for either data set, they were asked to speculate as to why this might have occurred and if changing assumptions would have enabled them to reach the correct result. In addition, the participants were requested to summarize the main strengths and weaknesses of their approach, note any particular challenges they encountered during their analysis (e.g., lack of toxicity data information, limitations in number of exposures that could be evaluated at one time), and recommend next steps.


Numerous statistical approaches were proposed at this workshop and can be categorized as classification and prediction, exposure–response surface estimation, variable selection, and variable shrinkage strategies (Table 1). In general, most of these techniques involved reduction or summation of the exposures in some way. For comparison purposes, some investigators evaluated the commonly implemented linear regression (ordinary least squares) approach. All methods were applied to both simulated data sets 1 and 2, and some were applied to the real-world data set.
Table 1. Examples of approaches presented at the NIEHS workshop “Statistical Approaches for Assessing Health Effects of Environmental Chemical Mixtures in Epidemiology Studies.”
Approach | Category
Single chemical analysis | Classic linear regression (ordinary least squares)
Multiple regression | Classic linear regression (ordinary least squares)
Visualization, structural equation modeling (SEM), and principal component analysis (PCA) | Classification and prediction
Informed sparse PCA and segmented regression | Classification and prediction
Bayesian g-formula | Classification and prediction
PCA | Classification and prediction
Classification and regression trees (CART) | Classification and prediction
Bayesian profile regression | Classification and prediction
Random forest | Classification and prediction
Multivariate adaptive regression splines (MARS) | Classification and prediction
Bayesian non-parametric regression | Classification and prediction
Bayesian additive regression trees (BART) and negative sparse PCA (NSPCA) | Classification and prediction
Conformal predictions | Classification and prediction
Bayesian kernel machine regression (BKMR) | Exposure–response surface estimation
Building Bayesian networks | Exposure–response surface estimation
Exposure surface smoothing (ESS) | Exposure–response surface estimation
Modes of action (results presented for Z = 0 strata) | Other
Feasible solution algorithm (FSA) | Other
Exploratory data analysis (EDA) | Other
Novel approach and least-angle regression (LARS) | Variable selection
Machine learning | Variable selection
Two-step variable selection and least absolute shrinkage and selection operator (LASSO) | Variable selection
Two-step shrinkage-based regression | Variable selection
Factor mixture models | Variable selection
Subset and bootstrap | Variable selection
Variable selection regression (VSR) | Variable selection
Bayesian estimation of weighted sum | Variable shrinkage strategies
Shrinkage methods (LASSO/LARS) | Variable shrinkage strategies
Weighted quantile sum regression (WQS) | Variable shrinkage strategies
LASSO | Variable shrinkage strategies
Several general observations emerged from the discussion of these approaches. First, participants agreed that no one statistical approach seemed to perform better than another at the qualitative level for the simulated data sets. Rather, there was extensive variability across the methods and less alignment with the correct answers (i.e., truth) with increasing data complexity (i.e., simulated data set 1 was less complex than simulated data set 2). The various approaches also differed in their ability to address collinearity or correlated variables, interaction between exposures, and model assumptions. Second, many methods included some form of variable and data reduction or transformation, either prior to or while conducting the analysis. Third, there is a need to define specific types of scientific questions and hypotheses related to chemical mixtures that can be addressed by epidemiologic studies (Braun et al. 2016a). More specifically, a statistical method should be chosen based upon a specific scientific question, and the use of complementary methods should be considered when exploring scientific hypotheses. The fact that no one statistical approach appeared to perform better than another may be related to the fact that the organizers did not initially pose specific study questions for the analysis. In addition, the way in which the outcomes of interest were conceptualized and analyzed varied among the participants. The organizers designed the data sets with the assumption of prediction, such that the correct and incorrect answers could be easily determined, and it appears that most of the workshop participants also assumed a predictive model approach because no specific hypotheses were identified. Fourth, the limitations of the data analyzed in this workshop must be recognized and addressed to the extent possible. The simulated data sets contained continuous exposure variables, but real-world data often include categorical data.
The data sets also contained a restricted number of observations (small sample size) limiting statistical power for some methods as well as other issues inherent in many epidemiologic studies (e.g., co-pollutant correlation, paucity of relevant toxicology data, and insufficient information on potential confounding variables). These issues may be addressed with larger detailed data sets (e.g., larger and more complex simulated data sets and consortium-based or pooled data); existing statistical strategies (e.g., imputation); more collaboration between epidemiologists, toxicologists, and statisticians; or the development of novel methods.
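The role of co-pollutant correlation noted above can be illustrated with a small sketch comparing two of the approaches in Table 1, single chemical analysis and multiple regression, both fit by ordinary least squares. The data are simulated under assumptions made only for this example: two exposures with a correlation of about 0.9, of which only the first affects the outcome. The helper name `ols` is hypothetical.

```python
import numpy as np

def ols(design, y):
    """Return ordinary least squares coefficients for the given design matrix."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 500

# Two highly correlated exposures (correlation about 0.9); only x1 affects y
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=n)
y = 1.0 * x1 + rng.normal(size=n)

ones = np.ones(n)

# Single chemical analysis: x2 modeled alone picks up the effect of x1
single_x2 = ols(np.column_stack([ones, x2]), y)[1]

# Multiple regression: both exposures modeled jointly
joint = ols(np.column_stack([ones, x1, x2]), y)
joint_x1, joint_x2 = joint[1], joint[2]
```

In this setup the single chemical estimate for x2 is expected to be well above zero even though x2 has no effect, while the jointly adjusted estimate is expected to be near zero but less precise. With many correlated exposures and a modest sample size, that loss of precision is one reason variable reduction and shrinkage strategies are attractive.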

Conclusions and Future Directions

This workshop and its format were novel in the field of environmental chemical mixtures and epidemiology because participants were asked to conduct statistical analyses of specific model data sets and to compare their results to those of other statistical approaches. Based on the attendance, number of abstracts submitted, and enthusiastic discussion at the workshop, this format was successful in bringing statisticians and epidemiologists together to work on a common problem. The questions participants were asked to address in their analyses helped focus the discussion on the desired outcome, specifically, “Which exposures contributed to the outcome?” and “Are there any that did not?” Based on the results from the presentations and abstracts, substantial variability between the methods was evident. Therefore, a useful future activity would be to systematically characterize the variation in results across methods that are sufficiently comparable with respect to effect estimation and statistical significance. Based on the workshop discussions and comparisons across currently used methods, further development of methods is needed to adequately determine the health effects of mixtures and combined exposures. We encourage the ongoing use of the posted simulated data sets to facilitate collaboration across environmental health disciplines and improve our understanding of the health impacts of chemical mixtures.


Aylward LL, Kirman CR, Schoeny R, Portier CJ, Hays SM. 2013. Evaluation of biomonitoring data from the CDC National Exposure Report in a risk assessment context: perspectives across chemicals. Environ Health Perspect 121(3):287-294.
Bayley N. 1969. Bayley Scales of Infant Development. 1st Edition. San Antonio, TX: Psychological Corporation.
Braun JM, Gennings C, Hauser R, Webster TF. 2016a. What can epidemiological studies tell us about the impact of chemical mixtures on human health? Environ Health Perspect 124(1):A6-A9.
Braun JM, Kalloo G, Chen A, Dietrich KN, Liddy-Hicks S, Morgan YX, et al. 2016b. Cohort profile: Health Outcomes and Measures of the Environment (HOME) study. Int J Epidemiol.
Carlin DJ, Rider CV, Woychik R, Birnbaum LS. 2013. Unraveling the health effects of environmental mixtures: an NIEHS priority. Environ Health Perspect 121(1):A6-A8.
CDC (Centers for Disease Control and Prevention). 2012. Fourth National Report on Human Exposure to Environmental Chemicals, Updated Tables, September 2012. [accessed 1 December 2014].
Claus Henn B, Coull BA, Wright RO. 2014. Chemical mixtures and children’s health. Curr Opin Pediatr 26(2):223-229.
Exley K, Aerts D, Biot P, Casteleyn L, Kolossa-Gehring M, Schwedler G, et al. 2015. Pilot study testing a European human biomonitoring framework for biomarkers of chemical exposure in children and their mothers: experiences in the UK. Environ Sci Pollut Res Int 22(20):15821-15834.
Frederiksen H, Jensen TK, Jorgensen N, Kyhl HB, Husby S, Skakkebaek NE, et al. 2014. Human urinary excretion of non-persistent environmental chemicals: an overview of Danish data collected between 2006 and 2012. Reproduction 147(4):555-565.
Goodson WH, Lowe L, Carpenter DO, Gilbertson M, Manaf Ali A, Lopez de Cerain Salsamendi A, et al. 2015. Assessing the carcinogenic potential of low-dose exposures to chemical mixtures in the environment: the challenge ahead. Carcinogenesis 36(suppl 1):S254-S296.
Grandjean P, Landrigan PJ. 2014. Neurobehavioural effects of developmental toxicity. Lancet Neurol 13(3):330-338.
Howard GJ, Webster TF. 2013. Contrasting theories of interaction in epidemiology and toxicology. Environ Health Perspect 121(1):1-6.
Johns DO, Stanek LW, Walker K, Benromdhane S, Hubbell B, Ross M, et al. 2012. Practical advancement of multipollutant scientific and risk assessment approaches for ambient air pollution. Environ Health Perspect 120(9):1238-1242.
NIEHS (National Institute of Environmental Health Sciences). 2012. 2012–2017 Strategic Plan. Advancing Science, Improving Health: A Plan for Environmental Health Research. Publication No. 12-7935. [accessed 11 October 2016].
NIEHS. 2015. Statistical Approaches for Assessing Health Effects of Environmental Chemical Mixtures in Epidemiology Studies [Workshop], 13–14 July 2015, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina. [accessed 11 October 2016].

PubMed: 27905274


Published online: 1 December 2016



Kyla W. Taylor
National Institute of Environmental Health Sciences, National Institutes of Health, Department of Health and Human Services, Research Triangle Park, North Carolina, USA
Bonnie R. Joubert
National Institute of Environmental Health Sciences, National Institutes of Health, Department of Health and Human Services, Research Triangle Park, North Carolina, USA
Joe M. Braun
Department of Epidemiology, Brown University, Providence, Rhode Island, USA
Caroline Dilworth
National Institute of Environmental Health Sciences, National Institutes of Health, Department of Health and Human Services, Research Triangle Park, North Carolina, USA
Chris Gennings
Department of Preventive Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA
Russ Hauser
Departments of Environmental Health and Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
Jerry J. Heindel
National Institute of Environmental Health Sciences, National Institutes of Health, Department of Health and Human Services, Research Triangle Park, North Carolina, USA
Cynthia V. Rider
National Institute of Environmental Health Sciences, National Institutes of Health, Department of Health and Human Services, Research Triangle Park, North Carolina, USA
Thomas F. Webster
Department of Environmental Health, Boston University School of Public Health, Boston, Massachusetts, USA
Danielle J. Carlin
National Institute of Environmental Health Sciences, National Institutes of Health, Department of Health and Human Services, Research Triangle Park, North Carolina, USA


Address correspondence to K.W. Taylor, National Institute of Environmental Health Sciences, P.O. Box 12233, MD K2-04, Research Triangle Park, North Carolina 27709 USA. Telephone: (919) 316-4707.

Competing Interests

J.M.B. was financially compensated for conducting a re-analysis of a study of child lead exposure for the plaintiffs in a public nuisance case related to childhood lead poisoning. None of these activities were directly related to the present study.

The other authors declare they have no actual or potential competing financial interests.

Funding Information

This work was supported by the following NIEHS grants: R00 ES020346 and R01 ES024381.
