Review of Epidemiological Studies of Drinking-Water Turbidity in Relation to Acute Gastrointestinal Illness

Background: Turbidity has been used as an indicator of microbiological contamination of drinking water in time-series studies attempting to discern the presence of waterborne gastrointestinal illness; however, the utility of turbidity as a proxy exposure measure has been questioned. Objectives: We conducted a review of epidemiological studies of the association between turbidity of drinking-water supplies and incidence of acute gastrointestinal illness (AGI), including a synthesis of the overall weight of evidence. Our goal was to evaluate the potential for causal inference from the studies. Methods: We identified 14 studies on the topic (distinct by region, time period and/or population). We evaluated each study with regard to modeling approaches, potential biases, and the strength of evidence. We also considered consistencies and differences in the collective results. Discussion: Positive associations between drinking-water turbidity and AGI incidence were found in different cities and time periods, and with both unfiltered and filtered supplies. There was some evidence for a stronger association at higher turbidity levels. The studies appeared to adequately adjust for confounding. There was fair consistency in the notable lags between turbidity measurement and AGI identification, which fell between 6 and 10 d in many studies. Conclusions: The observed associations suggest a detectable incidence of waterborne AGI from drinking water in the systems and time periods studied. However, some discrepant results indicate that the association may be context specific. Combining turbidity with seasonal and climatic factors, additional water quality measures, and treatment data may enhance predictive modeling in future studies. https://doi.org/10.1289/EHP1090


Introduction
Treatment of drinking water by utilities has markedly improved public health by decreasing the risk of waterborne infection in regions served by these water supplies (Ford 2016). Nevertheless, acute gastrointestinal illness (AGI) incidence attributable to community water systems has been estimated to be as high as 12-16.4 million cases annually in the United States (Colford et al. 2006; Messner et al. 2006). There are substantial challenges to quantifying waterborne illness from drinking water. Microbiological pathogens are diverse, including viruses, bacteria, and protozoa; therefore, it is not realistic to routinely measure all of the pathogens that contribute to AGI within any particular water system (e.g., for screening purposes). Proxy measures may give a global indication of potential microbiological contamination, at the cost of nonspecificity.
Turbidity, a measure of the cloudiness of water, has often been used as a proxy for microbiological contamination. Some studies have found that turbidity was correlated with microbiological contamination in source water and filtered drinking water (LeChevallier et al. 1991a, 1991b). However, turbidity is a nonspecific measure of the scattering of light by particles suspended in water and is thus influenced by various types of particulates, including silt, clay, and organic matter, whose prevalence can differ among water systems (Burlingame et al. 1998). Particulates are of concern in water systems not only because microbes are themselves particulates, but more importantly because other, nonmicrobial particulates may indicate the presence of pathogens (e.g., runoff may produce spikes in turbidity as both sediments and pathogens are washed into source waters) and may also shield pathogens from disinfectants. Because turbidity is a nonspecific indicator of particulates, turbidity itself is not directly linked to health concerns and is not expected to be a consistent indicator of the microbiological quality of water.
Despite its limitations, turbidity has been judged sufficiently relevant to be included in the drinking-water regulations of many developed countries. The U.S. rules on turbidity (U.S. EPA 2015), which have been amended and revised since the 1989 Surface Water Treatment Rule, currently require filtered water supplies to monitor turbidity at each individual filter at 15-min intervals. The rules specify that turbidity of the combined filter effluent should be ≤0.3 nephelometric turbidity units (NTU) in at least 95% of the measurements taken each month and that no single measurement should exceed 1 NTU. The U.S. requirements are less stringent for unfiltered water supplies whose source water is considered sufficiently protected against microbiological contamination, allowing a maximum turbidity of 5 NTU. Although turbidity regulations differ among developed countries, rules establishing maximum limits are in place in many regions worldwide, including European countries, Canada, Australia, Japan, and South Africa (CWWA 2002). In several regions, water utilities strive to optimize filtration performance to achieve turbidity levels below the regulatory limits (CWWA 2002), such as the voluntary goal of 0.1 NTU set by the American Water Works Association Partnership for Safe Water (AWWA 2014).
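The following Python sketch shows how the two numeric conditions for filtered supplies described above combine, by checking one month of combined filter effluent readings. It is a simplification: the function name and its flat list of readings are hypothetical. The actual U.S. rule also includes monitoring, reporting, and treatment-technique provisions that are not modeled here.

```python
from typing import Sequence


def meets_filtered_turbidity_rule(
    readings_ntu: Sequence[float],
    limit_ntu: float = 0.3,
    required_percent: int = 95,
    max_single_ntu: float = 1.0,
) -> bool:
    """Hypothetical check of one month of combined filter effluent readings (NTU).

    Encodes only the two conditions described in the text:
    at least 95% of readings <= 0.3 NTU, and no single reading > 1 NTU.
    """
    if not readings_ntu:
        raise ValueError("no readings supplied")
    n_within = sum(1 for r in readings_ntu if r <= limit_ntu)
    # Integer comparison avoids floating-point error in the 95% threshold
    percent_ok = 100 * n_within >= required_percent * len(readings_ntu)
    no_exceedance = max(readings_ntu) <= max_single_ntu
    return percent_ok and no_exceedance


# 96 of 100 readings at or below 0.3 NTU and none above 1 NTU: meets both conditions
print(meets_filtered_turbidity_rule([0.1] * 96 + [0.5] * 4))
```

A month with only 94% of readings at or below 0.3 NTU would fail the first condition. A single reading above 1 NTU fails the second condition, whatever the percentage.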
Several epidemiological studies have investigated the association between turbidity of drinking-water supplies and incidence of AGI. These studies have generally been analyzed as time series, correlating regional turbidity levels with AGI counts over time. Mann et al. (2007) reviewed six of the studies published before 2007 and concluded that an association between turbidity and AGI is likely in some settings or over a certain range of turbidity. Several additional studies on this topic have been published since the Mann review. They were conducted in Le Havre and Nantes, France (Beaudeau et al. 2012, 2014b) and in Eastern Massachusetts, Atlanta, and New York City in the United States (Beaudeau et al. 2014a; Hsieh et al. 2015; Tinker et al. 2010). Some have interpreted positive results as indicating a detectable excess risk of AGI from drinking water in the populations studied. However, several issues complicate causal inference from this research, including the uncertain utility of turbidity as a universal proxy for microbiological contamination, confounding by season and climatic variables, and regional differences in source-water quality and treatment practices.
Given a previously described association (Mann et al. 2007) and multiple new studies on the topic, we conducted a review to synthesize the evidence for an association between drinking-water turbidity and AGI incidence. In addition to synthesis, we aimed to describe differences between the studies that may explain discrepant results. Our goal was to evaluate the potential for causal inference from the studies, in other words, the extent to which the observed associations may be interpreted as indicating the presence of waterborne AGI from drinking water in the regions and time periods studied. Because our focus was on causal inference rather than on the strength of associations, we used qualitative rather than quantitative review methods to synthesize and compare information from the studies. A secondary aim, based on our review, was to identify the approaches that appear most promising for future research. This body of work may provide insight into the utility of turbidity measurements and the time-series study design for detecting increased risk of waterborne AGI from microbiological contamination of drinking-water supplies.

Methods
We conducted a literature search to identify and select epidemiological studies that have examined drinking-water turbidity in relation to AGI. Our review was open to inclusion of studies written in English and published as articles in peer-reviewed journals, as government reports, or in gray literature including conference proceedings and dissertations. Our inclusion criteria did not specify any particular time period or region. We identified studies by conducting searches in MEDLINE (PubMed), JSTOR, and ProQuest databases, using search terms such as "turbidity AND water AND gastrointestinal." In addition, we reviewed the citation lists of published reports to find additional studies, and conducted a Web of Science database linkage of identified peer-reviewed papers to all subsequent studies appearing in the peer-reviewed literature that cited one of the articles. We reviewed the abstract for each retrieved study to evaluate its suitability for inclusion in the literature review. The steps and results of our literature search are detailed in Table 1. One additional study that was published shortly after we conducted our database search met our criteria and was also included (Hsieh et al. 2015).
We extracted the following information from each study:

- design, region, and time period;
- exposure (turbidity) measurement;
- outcome (AGI) definition and identification;
- statistical methods;
- assessment of confounding and other potential biases;
- effect modification (interaction) by age or other factors;
- the authors' overall conclusions.

For each study region, we recorded the authors' description of source water, water treatment practices, and turbidity levels. We noted the few studies that were conducted during a recognized outbreak of waterborne AGI. We specified the water sample collection points for the turbidity measurements included in the research (e.g., finished effluent, source water, or within the distribution system). We also recorded the summary turbidity metric used in regression models (e.g., daily average, maximum). Studies evaluated multiple lags to test hypotheses about the length of time between an increase in turbidity and a subsequent increase in AGI incidence. We summarized each study's approach for testing lags and recorded the lags with notable results. We also compared the notable lag times across the studies to evaluate the consistency of the collective findings. For each study, we summarized the potential confounders considered and discussed important omissions. We also noted the impact of adjustment for individual covariates, where presented by the study authors. We described where the association was stronger in, or limited to, a specific subgroup of the study population or the data, such as in analyses by age group, season, water supply service area, or AGI definition. We noted subanalyses conducted in each study, such as sensitivity analyses to evaluate the influence of extreme turbidity levels, and described where results differed from the overall analysis.
Each of the coauthors reviewed the studies independently, before coming together to discuss and critically evaluate the methods of the individual studies and overarching issues with the body of literature.

Design and Methods of the Studies
We identified 14 studies reporting on the association between measured turbidity of drinking-water supplies and incidence of AGI (Aramini 2002; Bateson 2001; Beaudeau et al. 2012, 2014a, 2014b; Egorov et al. 2003; Gilbert et al. 2006; Hsieh et al. 2015; Lim et al. 2002; Morris et al. 1998; Naumova et al. 2003; Schwartz et al. 1997, 2000; Tinker et al. 2010). These were considered 14 distinct studies because they were conducted in different regions, time periods, and/or population subgroups. Several other reports were reviewed but were not considered distinct studies for one of two reasons. Some reported results found in one of the references cited above (Aramini 2000; Bateson et al. 2000; Beaudeau et al. 1999; Tinker 2007). Another covered the same region, time period, and population (Morris et al. 1996). The first study was conducted following a documented outbreak of waterborne cryptosporidiosis in Milwaukee, Wisconsin, in 1993 (Morris et al. 1996, 1998). Investigators evaluated the association between turbidity levels and AGI both during and before the outbreak, using a broad definition of AGI not specific to cryptosporidiosis. Additional studies have been conducted elsewhere in the United States (Philadelphia, Seattle, Atlanta, Eastern Massachusetts, and New York City), Canada (Vancouver, Edmonton, and Quebec City), France (Le Havre and Nantes), and Russia (Cherepovets). All of the studies were designed as time series. Turbidity measurements and AGI counts were summarized within small time increments (usually daily), and the summary measures were correlated over the study period to estimate the relative incidence of AGI across turbidity levels. Table 2 describes the source water, drinking-water treatment methods (including filtration and type of disinfection), and turbidity levels of the water supplies studied. Table 3 shows details of the studies, including AGI definition, covariates, and findings.
Some of the studies were conducted under conditions that would not comply with current U.S. water quality regulations (Table 2). In the studies of outbreak and pre-outbreak conditions in Milwaukee (Morris et al. 1996, 1998; Naumova et al. 2003), turbidity at the Linwood (North) plant would not have complied with the current U.S. requirement of ≤0.3 NTU in 95% of samples, even before the outbreak period. In the studies in Le Havre, France (Beaudeau et al. 2012) and Cherepovets, Russia, finished-water turbidity frequently exceeded 0.3 NTU. A number of other studies examined water systems with turbidity levels generally <0.3 NTU. These include unfiltered water supplies from protected sources, such as in Seattle (Bateson 2001), Eastern Massachusetts (Beaudeau et al. 2014a), New York City (Hsieh et al. 2015), and Vancouver (Aramini 2002). They also include filtered supplies derived from surface-water sources in Philadelphia (Schwartz et al. 1997), Atlanta (Tinker et al. 2010), Edmonton (Lim et al. 2002), and Nantes, France (Beaudeau et al. 2014b). Because of changing regulations and treatment practices, the turbidity levels reported for a particular region are directly representative only of the time period studied.
The studies defined exposure using turbidity measurements performed as part of standard monitoring of drinking-water supplies. The specific instrumentation and methods used for measurement were not stated in most of the studies; however, regulatory bodies such as the U.S. EPA stipulate sampling methods for compliance with regional rules (U.S. EPA 2015). The collection points for turbidity measurement in the studies were primarily finished (filtered) effluent from treatment plants and source (raw) water of either unfiltered or filtered water supplies. Only one study included measurements taken within the distribution system (Hsieh et al. 2015). Multiple daily measurements were summarized for statistical analyses into daily mean, median, minimum, or maximum values or were summarized over longer time increments as a moving average.
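The summarization step can be sketched in Python as follows, assuming time-stamped readings in NTU from a single collection point. The function names are hypothetical. The moving average is a trailing window over consecutive daily values and assumes no missing days. The studies' own averaging windows and handling of gaps varied and are not specified here.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import mean, median
from typing import Dict, Iterable, List, Optional, Tuple


def daily_summaries(
    readings: Iterable[Tuple[datetime, float]],
) -> Dict[date, Dict[str, float]]:
    """Collapse multiple readings per day into daily mean, median, min, and max (NTU)."""
    by_day: Dict[date, List[float]] = defaultdict(list)
    for timestamp, ntu in readings:
        by_day[timestamp.date()].append(ntu)
    # Sorted by calendar day so the output can be used directly as a time series
    return {
        day: {"mean": mean(v), "median": median(v), "min": min(v), "max": max(v)}
        for day, v in sorted(by_day.items())
    }


def trailing_moving_average(values: List[float], window: int) -> List[Optional[float]]:
    """Trailing moving average over consecutive days; the first window-1 entries are None."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


readings = [
    (datetime(2020, 1, 1, 0, 0), 0.10),
    (datetime(2020, 1, 1, 6, 0), 0.30),
    (datetime(2020, 1, 2, 0, 0), 0.20),
    (datetime(2020, 1, 3, 12, 0), 0.60),
]
daily = daily_summaries(readings)
daily_max = [s["max"] for s in daily.values()]
print(trailing_moving_average(daily_max, 3))
```

Which daily statistic feeds the moving average (mean, maximum, or minimum) is a modeling choice. The studies differed on this point.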
In most of the studies, cases of AGI were defined by AGI-related diagnosis codes (e.g., ICD-9) for hospital admissions, emergency department visits, or physician office (outpatient) visits (Table 3). The sources of data for case identification were:

- Medicare records, reported by the Health Care Financing Administration (Bateson 2001; Beaudeau et al. 2014a; Naumova et al. 2003; Schwartz et al. 2000);
- specific regional health care providers (Morris et al. 1996, 1998; Schwartz et al. 1997; Tinker et al. 2010);
- the government healthcare system in Canada (Aramini 2002; Lim et al. 2002).

Other AGI case definitions were based on prescription drug sales in France (Beaudeau et al. 2012, 2014b), calls to a health information line in Quebec City, Canada (Gilbert et al. 2006), and self-reported symptoms during short-term longitudinal data collection in Russia. The study conducted in New York City used syndromic surveillance of the chief complaint reported at emergency department visits to classify a diarrhea syndrome (Hsieh et al. 2015).
The studies typically applied generalized additive Poisson regression models to the time-series data. An early application of generalized additive models (GAMs) to the association between water turbidity and AGI was given by Schwartz et al. (1997), and other researchers have since followed a similar method with some variations. The studies frequently applied nonparametric smoothing methods, such as local polynomial regression (LOESS) or smoothing splines. These were used to adjust for possible nonlinear effects of seasonal cycles, long-term trends, climatic factors, or other covariates. Studies also adjusted for potential autocorrelation by including an autoregressive term representing lagged values of the AGI outcome (Aramini 2002; Beaudeau et al. 2012, 2014b; Gilbert et al. 2006; Lim et al. 2002; Morris et al. 1998; Naumova et al. 2003). Turbidity was modeled using either a linear term or nonparametric smoothing, or both types of relationship were explored in separate models. All of the studies evaluated multiple lag times representing the latency between turbidity measurement and AGI case identification. The lag times were generally tested in multiple, separate models. A few studies also considered multiple lags simultaneously in distributed lag models (Hsieh et al. 2015; Schwartz et al. 1997; Tinker et al. 2010). Significance testing, model fit, and graphical methods such as temporal exposure response surface (TERS) plots were used to determine the importance of lagged effects. Several authors implemented modeling approaches equipped to deal with possible overdispersion of the AGI count data:

- assuming a quasi-Poisson distribution (Beaudeau et al. 2012, 2014b; Hsieh et al. 2015);
- using negative binomial regression (Hsieh et al. 2015);
- using robust regression (M-estimation) (Bateson 2001; Schwartz et al. 1997);
- scaling variance estimates to account for overdispersion (Tinker et al. 2010).

Additionally, several studies assessed the influence of extreme turbidity values by excluding the highest values (Bateson 2001; Beaudeau et al. 2012, 2014a, 2014b; Hsieh et al. 2015). Other approaches were used to assess the sensitivity of results to modeling choices and model fit. These included comparison of a case-control study design with the time-series analysis (Aramini 2002; Lim et al. 2002) and a split-sample approach to fit and test models in different subsets of the data (Beaudeau et al. 2012, 2014b).
Table 1 (partial). Literature search steps, including a JSTOR search for turbidity AND gastro# AND "drinking water" and a Web of Science cited-reference search. In the JSTOR search, the wild-card symbol # retrieves any word beginning with "gastro," such as gastrointestinal and gastroenteritis. The Web of Science search identified subsequent articles that cited each selected peer-reviewed publication.
Table 3. Design and results of time-series studies of the association between turbidity of drinking-water supplies and risk of acute gastrointestinal illness.
Environmental Health Perspectives 086003-11
Table 3 shows the study findings. Positive associations between turbidity and AGI incidence were found in unfiltered water supplies from protected sources, such as in Seattle (Bateson 2001), Eastern Massachusetts (Beaudeau et al. 2014a), New York City (Hsieh et al. 2015), and Vancouver (Aramini 2002). Associations were also observed in filtered supplies with relatively low turbidity levels, in Philadelphia (Schwartz et al. 1997, 2000) and Nantes, France (Beaudeau et al. 2014b). Findings were robust to alternative model specifications, such as negative binomial models and M-estimation (Bateson 2001; Hsieh et al. 2015; Schwartz et al. 1997). They were also robust to exclusion of extreme turbidity values above the 98th percentile (Beaudeau et al. 2012, 2014a, 2014b; Hsieh et al. 2015). The results of Tinker et al. (2010) from Atlanta are mixed. The multiday association lagged over 0 to 20 d was significant for only one of eight filtered water supplies, although there was a positive association with source (raw) water turbidity. Only one study reported no relationship between turbidity and AGI. It examined a filtered water supply in Edmonton with low turbidity levels (Lim et al. 2002). The investigators observed some statistically significant increases in AGI at specific lags. However, they found no overall "significant" relationship according to their criteria, which involved model fit and statistical significance at lags on at least 2 consecutive days.
Comparison of estimated effect sizes was hindered by differences between the studies, such as the turbidity contrasts used in risk estimation, the assumption of a linear or nonlinear association between turbidity and AGI incidence, and the turbidity exposure metric used in the time series (e.g., daily or multiday average, maximum). As shown in Table 3, some relatively large effects were estimated for large turbidity contrasts. In Milwaukee, AGI incidence increased by 73-182% per 0.5-NTU increase in 2-wk average turbidity during the outbreak (Morris et al. 1996). In Quebec City, AGI increased by 33-76% for a contrast of 0.48 NTU in daily mean turbidity (Gilbert et al. 2006). Smaller effects were generally found with smaller turbidity contrasts, such as 7-9% increases in AGI per 0.04 NTU daily mean turbidity in Philadelphia (Schwartz et al. 1997) and a 3% increase per 0.01 NTU daily mean turbidity in Nantes, France (Beaudeau et al. 2014b). Larger effects were estimated in Le Havre, France (23-27% increases in AGI per 0.1 NTU at the Saint Laurent plant) and Cherepovets, Russia (64% increase in AGI per 0.27 NTU). Both regions were considered vulnerable to contamination during the time periods studied (Beaudeau et al. 2012; Egorov et al. 2003).
Other effects were disproportionately small. In Atlanta, AGI increased by 6% per 10-NTU increase in 3-d maximum turbidity of source (raw) water (Tinker et al. 2010). In New York City during the spring season, AGI increased by 5% per approximately 1.0-NTU increase in distribution system turbidity (Hsieh et al. 2015); this turbidity interquartile range was estimated from a figure. Interpretation of these comparisons is speculative, as the broad variation in effect sizes may reflect differences in methods as well as true heterogeneity of effect among the studies.
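The contrast problem can be made concrete under one assumption: a log-linear (Poisson-type) dose-response. Under that assumption, a reported percentage increase per contrast Δ implies a log rate ratio per NTU of ln(1 + p/100)/Δ, which can then be re-expressed for any other contrast. The Python sketch below applies this rescaling to two figures quoted above. The function name is hypothetical, and the rescaled values are arithmetic illustrations, not estimates reported by the studies.

```python
import math


def rescale_percent_increase(pct: float, contrast_ntu: float, new_contrast_ntu: float) -> float:
    """Re-express a % increase in AGI per `contrast_ntu` as a % increase per
    `new_contrast_ntu`, assuming log(rate) is linear in turbidity:
    RR(d) = exp(beta * d), with beta = ln(1 + pct/100) / contrast."""
    beta_per_ntu = math.log(1 + pct / 100) / contrast_ntu
    return (math.exp(beta_per_ntu * new_contrast_ntu) - 1) * 100


# Figures quoted in the text, rescaled to a common 0.1-NTU contrast (illustration only)
nantes = rescale_percent_increase(3, 0.01, 0.1)        # 3% per 0.01 NTU
philadelphia = rescale_percent_increase(7, 0.04, 0.1)  # lower bound, 7% per 0.04 NTU
print(f"Nantes: {nantes:.1f}% per 0.1 NTU; Philadelphia: {philadelphia:.1f}% per 0.1 NTU")
```

Under this assumption, the Nantes estimate corresponds to roughly a 34% increase per 0.1 NTU, and the lower Philadelphia estimate to roughly an 18% increase. These rescaled figures still inherit every difference in exposure metric, lag, and model form noted above. Nonlinearity, such as that reported for Nantes (Beaudeau et al. 2014b), would make a single log-linear slope a poor summary.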

Study Findings
In some studies, the association between turbidity and AGI incidence was limited to relatively high turbidity values. Two studies stratified data by season (Bateson 2001; Hsieh et al. 2015). In both, positive findings were limited to the season with the highest turbidity levels: winter, defined as October through March, in Bateson (2001), and spring in Hsieh et al. (2015). In the spring season in New York City, median distribution system turbidity levels were generally greater than 1.0 NTU. Hsieh et al. (2015) found that the association was attenuated and lost statistical significance when the top 20% of values were excluded. Further evidence of an association limited to higher turbidity levels comes from studies of the 1993 Milwaukee outbreak (Morris et al. 1996, 1998; Naumova et al. 2003). Turbidity peaked at 1.7 NTU during the outbreak period, and excluding that period eliminated the association at the Howard Avenue (South) plant. Turbidity at that plant was lower than 0.2 NTU for most of the remaining study period. Even during the outbreak period, the association appeared strongest at the highest turbidity levels, with a sharp increase in AGI incidence at turbidity greater than 1.0 NTU (estimated from TERS plots in Morris et al. 1998). The association observed in Cherepovets, Russia, a region with relatively high turbidity levels, was limited to participants who consumed nonboiled tap water. Furthermore, this association appeared only at the higher turbidity levels observed in the study, with daily averages greater than approximately 0.9 NTU (estimated from figure). Evidence of nonlinearity was also found in Nantes, France, in a water supply with very low overall turbidity, averaging 0.05 NTU during the study period (Beaudeau et al. 2014b).
In this study, there was no clear association at turbidity levels lower than 0.045 NTU, and a nearly linear positive association with AGI incidence beyond this point. Complicating interpretation, some studies found evidence that the association was restricted to lower turbidity values. The association during the winter season in Seattle was observed only when the data were restricted to days with source-water (unfiltered supply) turbidity <1 NTU (80.5% of days) (Bateson 2001). In the study of an unfiltered water supply in Eastern Massachusetts (Beaudeau et al. 2014a), risk of AGI increased with turbidity up to 0.33 NTU, with no apparent additional increase in risk at higher levels.
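This paragraph describes two contrasting shapes. In Nantes, a threshold was followed by a roughly linear increase; in Eastern Massachusetts, the increase leveled off. Both can be written as simple transformations of turbidity on the log-rate scale. The Python sketch below is illustrative only. The hinge and cap parameterizations are one common way to encode such shapes and are not the smoothing methods used in the cited studies. The function names and the slope value are hypothetical. The 0.045- and 0.33-NTU breakpoints are the values described in the text.

```python
import math


def hinge_rate_ratio(turbidity_ntu: float, threshold_ntu: float, beta_per_ntu: float) -> float:
    """Rate ratio relative to any level at or below the threshold, assuming
    log(rate) = const + beta * max(0, turbidity - threshold)."""
    return math.exp(beta_per_ntu * max(0.0, turbidity_ntu - threshold_ntu))


def capped_rate_ratio(turbidity_ntu: float, cap_ntu: float, beta_per_ntu: float) -> float:
    """Rate ratio relative to 0 NTU, assuming log(rate) = const + beta * min(turbidity, cap),
    i.e., risk rises up to the cap and is flat beyond it."""
    return math.exp(beta_per_ntu * min(turbidity_ntu, cap_ntu))


BETA = 3.0  # hypothetical log rate ratio per NTU, for illustration only

# Threshold-then-linear shape (breakpoint described for Nantes)
for ntu in (0.02, 0.045, 0.06, 0.10):
    print(f"hinge  {ntu:.3f} NTU -> RR {hinge_rate_ratio(ntu, 0.045, BETA):.3f}")

# Rise-then-plateau shape (breakpoint described for Eastern Massachusetts)
for ntu in (0.10, 0.33, 0.50):
    print(f"capped {ntu:.3f} NTU -> RR {capped_rate_ratio(ntu, 0.33, BETA):.3f}")
```

In practice the breakpoint itself is uncertain. A smoothing spline, as used in several of the studies, lets the data suggest the shape instead of fixing it in advance.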
Several studies examined source (raw) water turbidity in addition to turbidity of finished water. In Nantes, France, source-water turbidity was significantly associated with AGI incidence. However, this association lacked the consistency across lags that was observed for finished (filtered) water (Beaudeau et al. 2014b). In Atlanta, source-water turbidity was associated with daily counts of AGI-related emergency department visits, but finished (filtered) water turbidity was not (Tinker et al. 2010). The authors reported little correlation between turbidity measures from source and finished water. Nevertheless, the pattern of association of 3-d moving average turbidity with AGI in the Atlanta study (Tinker et al. 2010) was somewhat similar for minimum source-water turbidity and average finished-water turbidity (estimated from figure). In both, the effect rose from a 0-d lag to a peak at a 6- to 7-d lag, dipped to a low at an 11- to 13-d lag, and rose again to a peak at a 15-d lag. None of the effects for finished water were statistically significant. Similarly, source-water turbidity at the Radicatal plant in Le Havre, France was strongly associated with AGI-related prescriptions. Finished (filtered) effluent turbidity had an effect of similar magnitude that was not statistically significant (Beaudeau et al. 2012). In Edmonton, neither source-water nor filtered-water turbidity showed an overall relationship with AGI (Lim et al. 2002). The study by Hsieh et al. (2015) in New York City was the only study to include turbidity measurements taken within the distribution system. The authors observed an association between distribution system turbidity (unfiltered supply) and diarrheal events during the spring season. This association was almost completely explained by variation in source-water turbidity (Hsieh et al. 2015).
Season was an important confounder, and every study adjusted for it, either by modeling covariates or by stratification. Hsieh et al. (2015) and Bateson (2001) presented only season-specific analyses. Other investigators demonstrated that they had effectively adjusted for the nonlinear effects of season and other time trends. They did so by showing plots of the residuals over the study period or by describing the absence of correlation in the residuals (Tinker et al. 2010). Air temperature, day of the week, and holidays were also important confounders in most studies. Several studies found that precipitation was not a confounder of the association between turbidity and AGI (Aramini 2002; Beaudeau et al. 2014b; Gilbert et al. 2006; Hsieh et al. 2015; Lim et al. 2002). Other studies reported an association between turbidity and AGI even after adjustment for precipitation (Bateson 2001; Tinker et al. 2010). The studies in Canada (Aramini 2002; Lim et al. 2002) evaluated residual confounding from season and weather indirectly. They conducted concurrent time-series and case-control studies including the same AGI cases, with controls identified from healthcare visits for acute respiratory illness. Turbidity, as a proxy for pathogens, was not expected to cause acute respiratory illness. However, the seasonal pattern of the control diagnoses was expected to resemble that of AGI (peaking in the winter months), providing indirect adjustment for seasonal trends. Aramini (2002) found similar results in the case-control and time-series studies in Vancouver, as shown in TERS plots in the study. In both designs, excess risk of AGI peaked around the same turbidity levels and at similar lag times. Lim et al. (2002) reported null results with both study designs in Edmonton. The similarity of results between the two study designs suggests adequate adjustment for seasonal trends in the time-series analyses. Egorov et al. (2003) was the only study to collect individual-level information on potential confounders. The investigators used this information to adjust the time series for behavioral factors such as recreational water contact, attendance at summerhouses (with implied lack of running water), and out-of-town trips. The association between drinking-water turbidity and AGI was strengthened by adjustment for these individual behaviors. However, potential biases from self-reported AGI in this study limit interpretation.
There was limited and inconsistent evidence for effect modification by age or AGI definition. Several studies that examined multiple age groups found that associations were strongest in children (Beaudeau et al. 2014b; Hsieh et al. 2015; Morris et al. 1996; Tinker et al. 2010) or the oldest of the elderly (Bateson 2001; Schwartz et al. 2000). However, one study found no association among the elderly (Hsieh et al. 2015). Some studies conducted separate analyses for AGI counts based on different case definitions. In Milwaukee, Morris et al. (1996) found generally higher relative risks for the association of turbidity with AGI from emergency department visits and hospital admissions (combined) than from outpatient physician office visits. In Philadelphia children, the association with turbidity did not differ remarkably between AGI defined from emergency visits and from hospitalizations (Schwartz et al. 1997). The results of the Vancouver study (Aramini 2002), analyzed by AGI definition and age group, do not show any clear patterns of effect modification; however, not all results were shown.
In addition to turbidity, several studies included in their models multiple variables describing source water, treatment, and water quality. The study in Eastern Massachusetts examined confounding by other measurements taken in the water system, including fecal coliforms, UV absorbance, algae, cyanobacteria, and water temperature (Beaudeau et al. 2014a). Adjustment for these variables did not remarkably change the main association of AGI incidence with turbidity of the unfiltered water supply. However, algae-corrected turbidity was more strongly associated than overall turbidity with AGI-related hospital admissions and improved model fit. Source-water temperature was strongly and inversely associated with AGI. Replacing the separate model terms for algae-corrected turbidity and water temperature with a bidimensional spline term, which models the interaction between the two variables, also significantly improved model fit (Beaudeau et al. 2014a). The study conducted in Nantes, France found a strong association of AGI with finished (filtered) water turbidity on days with high river flow and a weaker association at low flow, with a significant interaction between the two variables (Beaudeau et al. 2014b). In Le Havre, France, additional coagulation-flocculation-settling was imposed during times of high turbidity (Beaudeau et al. 2012). Including an interaction term indicating days with this additional treatment significantly improved the model fit for finished effluent turbidity. On days without settling, finished effluent turbidity was associated with a statistically significant increased risk of AGI in both children and adults. On days with the additional settling treatment, the association was inconsistent between children and adults.
Most studies evaluated lags between turbidity measurement and AGI outcome spanning from 1 to 13 or 14 d, and a few considered lags extending beyond 20 d (Aramini 2002; Beaudeau et al. 2014a; Gilbert et al. 2006). Some investigators presented results for the entire span of lags examined, in tables or figures such as three-dimensional TERS plots (Aramini 2002; Beaudeau et al. 2012; Egorov et al. 2003; Gilbert et al. 2006; Hsieh et al. 2015; Lim et al. 2002; Morris et al. 1998). In other studies, limited results were shown. Some studies took measures to avoid false positive results arising from multiple testing, such as choosing an a priori lag structure for their main analysis (Beaudeau et al. 2014a; Tinker et al. 2010), reporting only lags with statistical significance for multiple days in a row (Gilbert et al. 2006; Lim et al. 2002), or reporting only the best-fitting model (Bateson 2001). Others addressed multiple comparisons by estimating the likelihood of the total number of positive associations observed, relative to the number expected under the null hypothesis (Schwartz and Levin 1999; Schwartz et al. 2000).
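One simple way to formalize the comparison of observed and expected numbers of positive associations is a binomial tail probability. The Python sketch below assumes independent lag-specific tests, which adjacent, autocorrelated lags violate, so the result is only a rough guide; the function name and the per-lag null probability of 0.025 (a significant positive result under a two-sided 0.05 test) are illustrative choices, not necessarily those of Schwartz and Levin (1999) or Schwartz et al. (2000).

```python
from math import comb


def prob_at_least_k(k, m, p_null=0.025):
    """P(X >= k) for X ~ Binomial(m, p_null).

    k: number of lags (out of m tested) with a significant positive association.
    p_null: chance of a significant positive result at one lag under the null.
    Treats lags as independent, which adjacent lags usually are not.
    """
    if not 0 <= k <= m:
        raise ValueError("k must lie between 0 and m")
    return sum(comb(m, j) * p_null**j * (1 - p_null) ** (m - j) for j in range(k, m + 1))


# Hypothetical example: 4 significant positive lags out of 14 examined.
expected = 14 * 0.025
print(f"expected under the null: {expected:.2f}; P(>= 4) = {prob_at_least_k(4, 14):.5f}")
```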

Discussion
A positive association between turbidity of drinking water and AGI incidence has been observed in different cities and time periods, in regions with varying characteristics of source water, with both unfiltered and filtered supplies, and with varying turbidity levels. There is some consistency in the lag times of the associations (i.e., the time between the measured turbidity and identification of the AGI case), which fall between 6 and 10 d in many studies. The studies appear to adjust adequately for possible biases of the time-series design, including confounding by seasonal cycles and other time trends, although it is acknowledged that unknown biases may still occur. There is evidence for nonlinearity of the association from several studies, showing stronger associations at higher levels of turbidity. The collective results reveal a broad association between turbidity and AGI that suggests low-level incidence of waterborne AGI from drinking water within the systems and time periods studied.
The potential for causal interpretation was strongest from studies that presented results for all individual lags and found a pattern of increased risk at several consecutive lags, rather than a single peak (Aramini 2002; Beaudeau et al. 2012; Hsieh et al. 2015). Causal inference was also strengthened when investigators found similar results between their main analysis and alternate modeling strategies (Aramini 2002; Hsieh et al. 2015; Schwartz et al. 1997; Schwartz et al. 2000) or with model validation such as fitting and testing in a split sample (Beaudeau et al. 2012, 2014b). The potential for causal inference was more limited from studies that showed fewer, selected results (Bateson 2001; Schwartz et al. 1997, 2000) or that averaged results across multiple lag days, such as from distributed lag models (Tinker et al. 2010). Findings of effects limited to low turbidity levels are inconsistent with the hypothesized causal relationship (Bateson 2001; Beaudeau et al. 2014a), although not inconceivable. Some studies had potential biases that rendered the results suggestive, at most, such as possible bias from self-reported AGI or diagnosis bias that could have followed news reporting of the 1993 outbreak in Milwaukee, although this bias would not have affected positive results found during the preoutbreak period (Morris et al. 1996; Morris et al. 1998; Naumova et al. 2003).
Given that the ratio of particulates to pathogens can vary greatly among water systems, turbidity alone may not be adequately correlated with microbiological contamination to serve as a useful proxy in every situation. The findings from New York City (Hsieh et al. 2015) and Seattle (Bateson 2001), in which an association was observed only in the season with the highest turbidity, and those from Atlanta (Tinker et al. 2010) and Le Havre, France (Beaudeau et al. 2012), in which there was a significant association with source (raw)-water turbidity but not with finished (filtered)-water turbidity, suggest that in some water systems the utility of turbidity as a proxy for microbiological contamination may be limited to high-turbidity scenarios. Nevertheless, because other studies found an association at low levels of turbidity similar to those measured in Atlanta finished water (Beaudeau et al. 2014b; Schwartz et al. 1997, 2000), the discrepancies also suggest that the utility of turbidity as a proxy may be context specific, dictated by watershed characteristics, treatment system approaches and performance, and local environmental factors related to season. For this reason, future studies would be most informative not by asking "Is there an association?" but, rather, "Under what conditions does an association exist?" The body of work demonstrates the efficacy of studies correlating turbidity with AGI counts in time-series for preliminary investigation of the safety of water supplies. However, the apparent context-specific nature of turbidity in its association with AGI has implications for the optimal conduct of such studies, including design issues and modeling strategies aimed toward elucidating specific, rather than generalized, associations. Additionally, based on our review, we propose recommendations for enhancing the potential for causal interpretation from future studies.

Exposure Measurement
Turbidity measurements for the purpose of monitoring filtration performance are generally taken using continuously operating, automated turbidimeters at each individual filter [as per U.S. EPA guidance (U.S. EPA 2004)]. The measurement of turbidity using automated turbidimeters is known to be imprecise, particularly at low levels (Burlingame et al. 1998). However, averaging multiple measurements per day for use in time-series regression yields a summary measure with lower error than any individual measurement. Some studies used averaging periods longer than 1 d, such as Morris et al. (1996), who examined 2-wk average turbidity of Milwaukee water (lagged by 1 wk) in relation to 2-wk AGI counts, and Tinker et al. (2010), who analyzed Atlanta water turbidity as a 3-d moving average across lags. Although such averaging greatly reduces the error of individual measurements, the loss of variation over a longer averaging period may dilute any true association. Two studies relied on a single turbidity measurement, the daily maximum, as their defined daily exposure (Bateson 2001), and such single measurements are more subject to error than averages. The error and resultant misclassification of turbidity measurements would occur nondifferentially with respect to the AGI outcome (when considering a linear relationship), but because low-level turbidity values are left-truncated at 0, it is unclear how such measurement error might bias the results. Evaluating both short-term averages (such as daily) and single measurements (such as the daily maximum or minimum) of turbidity may be a reasonable approach to fitting the most predictive model in a study, when coupled with evaluation of the influence of the averaging period and of extreme outliers. People consume water outside of their home, and potentially outside their service area, and also consume bottled water.
Individual use patterns may result in nonrepresentativeness of one regional turbidity value for exposure of individual cases; this is an inherent weakness given the ecological nature of data used in the time-series design. The lack of individual-level information on nontap water consumption and resultant exposure misclassification detracts from evaluation of the etiologic association between microbiological quality of a drinking-water supply (as indicated by turbidity as a proxy measure) and AGI incidence. However, it should be noted that the lack of information does not detract from the results of these studies for estimation of the population-wide AGI risk associated with regional drinking-water quality, given real patterns of use, which is the interpretation most relevant for management of a specific water system.
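The exposure summaries compared in this section (daily mean, daily maximum, multi-day moving average) can be illustrated with a short Python sketch. The function names are hypothetical; the trailing (rather than centered) 3-d window is an assumption, since how each study aligned its moving average is not reproduced here; and the sigma/sqrt(n) reduction in error for a daily mean holds only if the errors of readings within a day are independent, which serially correlated turbidimeter readings may not be.

```python
from math import sqrt
from statistics import mean


def daily_summaries(readings_by_day):
    """Daily mean and daily maximum from per-day lists of turbidity readings (NTU)."""
    daily_mean = [mean(day) for day in readings_by_day]
    daily_max = [max(day) for day in readings_by_day]
    return daily_mean, daily_max


def trailing_moving_average(series, window=3):
    """Mean of the current and previous window-1 values; None until enough values exist."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[i + 1 - window : i + 1]) / window)
    return out


def se_of_daily_mean(sd_single, n_readings):
    """Error SD of a mean of n independent readings, each with error SD sd_single."""
    return sd_single / sqrt(n_readings)


# Hypothetical readings for three days.
means, maxima = daily_summaries([[0.10, 0.12, 0.30], [0.20, 0.25, 0.22], [0.90, 0.40, 0.35]])
print(means, maxima, trailing_moving_average(means))
```

Note how the single high reading on day 3 dominates the daily maximum but is damped in the daily mean, and damped further in the 3-d average, which illustrates both the error reduction and the loss of variation discussed above.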
Because the utility of turbidity as an exposure proxy for microbiological contamination may be context specific, epidemiological research would be enhanced by combining turbidity with additional seasonal and climatic variables, water quality, and treatment measures to optimize predictive models of AGI incidence. Indeed, the association between turbidity and AGI differed by season in the only studies that reported this stratification (Bateson 2001; Hsieh et al. 2015), and studies that evaluated multiple water indices found significant interactions of turbidity with factors such as streamflow, water temperature, and system operations in their associations with AGI (Beaudeau et al. 2012, 2014b). Climatic variables considered as confounders in the studies, such as season and air temperature, could theoretically determine the conditions in which turbidity is most correlated with microbiological contamination; as such, these variables should first be evaluated as effect modifiers and second as confounders. Two studies suggest that the association between turbidity and AGI is stronger for source (raw) water than for finished (filtered) water (Tinker et al. 2010; Beaudeau et al. 2012). This might seem counterintuitive, as finished water would be expected to be more representative of human exposure than raw water (except for exposure through recreational contact). The findings may be spurious, or they may be confounded by recreational contact with surface waters, but they may alternatively be interpreted as indicating that raw-water turbidity is more strongly correlated with microbiological contamination than filtered-water turbidity. One would expect raw-water pathogen loads to vary greatly, even over orders of magnitude, which should exceed the variability following an engineered and tightly monitored process such as water filtration.
Again, with the goal of elucidating the specific conditions under which an association exists, raw-water turbidity should be routinely evaluated in addition to, and in combination with, finished-water turbidity.
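Evaluating season, streamflow, or treatment conditions as effect modifiers typically means adding a turbidity-by-stratum product term to the log-linear count model. The Python sketch below shows how stratum-specific rate ratios and confidence intervals follow from such a model's coefficients; the function name, the binary stratum coding, and all numerical inputs are hypothetical, and the covariance term must be taken from the fitted model's variance-covariance matrix.

```python
from math import exp, sqrt


def stratum_rate_ratio(b_turb, b_int, var_turb, var_int, cov_turb_int, stratum, delta=1.0, z=1.96):
    """Rate ratio and 95% CI for a `delta`-unit turbidity increase within one stratum.

    Assumes a log-linear (Poisson-type) model containing
        b_turb * turbidity + b_int * turbidity * stratum,
    with stratum coded 0 (reference, e.g., low flow) or 1 (e.g., high flow).
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    est = b_turb + b_int * stratum
    # Var(b_turb + s * b_int) = Var(b_turb) + s^2 Var(b_int) + 2 s Cov(b_turb, b_int)
    var = var_turb + stratum**2 * var_int + 2 * stratum * cov_turb_int
    se = sqrt(var)
    return exp(est * delta), exp((est - z * se) * delta), exp((est + z * se) * delta)


# Hypothetical coefficients, variances, and covariance (not from any reviewed study).
for s in (0, 1):
    rr, lo, hi = stratum_rate_ratio(0.02, 0.08, 0.0004, 0.0009, -0.0003, s)
    print(f"stratum {s}: RR {rr:.3f} (95% CI {lo:.3f}, {hi:.3f})")
```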

Outcome Assessment
Nondifferential misclassification of AGI is certainly present in the studies. Many of the studies identified cases based on a broad grouping of diagnosis codes for infectious gastroenteritis and also included a catch-all code for noninfectious AGI, as well as symptom codes. The grouping of specific diagnoses into broad categories (such as all infectious gastroenteritis including ICD-9 codes 001-009) results in less misclassification for the group, overall, than for specific diagnoses. The most representative set of diagnosis codes for waterborne AGI has not been determined; however, studies that evaluated the sensitivity of study results to variations in case definition found little impact on their findings (Schwartz et al. 1997; Tinker et al. 2010). More extensive evaluation of the sensitivity of results to various case definitions would be prudent, given the expected misclassification in diagnosis codes.
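A sensitivity analysis of case definitions amounts to recounting daily cases under alternative code groupings and refitting the same model. The Python sketch below shows the counting step only; the 001-009 grouping comes from the text, whereas the broader definition (adding 558.9, other and unspecified noninfectious gastroenteritis, and the 787.0x nausea/vomiting and 787.91 diarrhea symptom codes) is an illustrative assumption, not the definition used by any study reviewed. The parser also assumes codes are recorded with a decimal point.

```python
def parse_icd9(code):
    """Split an ICD-9-CM code written with a decimal point ('008.8') into
    (3-digit category as int, subcode string). V and E codes return (None, '')."""
    head, _, tail = code.strip().upper().partition(".")
    if not head.isdigit():
        return None, ""
    return int(head), tail


def infectious_ge(code):
    """Narrow definition: intestinal infectious diseases, ICD-9 001-009."""
    category, _ = parse_icd9(code)
    return category is not None and 1 <= category <= 9


def broad_agi(code):
    """Illustrative broad definition: 001-009 plus 558.9, 787.0x, and 787.91."""
    category, sub = parse_icd9(code)
    if category is None:
        return False
    if 1 <= category <= 9:
        return True
    if category == 558 and sub == "9":
        return True
    return category == 787 and (sub.startswith("0") or sub == "91")


def daily_case_counts(visits, definition):
    """Count visits per day carrying at least one code that meets the definition."""
    counts = {}
    for day, codes in visits:
        if any(definition(c) for c in codes):
            counts[day] = counts.get(day, 0) + 1
    return counts


# Hypothetical visit records: (date, list of diagnosis codes).
visits = [
    ("2010-06-01", ["008.8", "787.91"]),
    ("2010-06-01", ["558.9"]),
    ("2010-06-02", ["401.9"]),
]
for name, rule in {"001-009 only": infectious_ge, "broad": broad_agi}.items():
    print(name, daily_case_counts(visits, rule))
```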
The AGI outcome definitions are also subject to undercounting, resulting from lack of sensitivity of the data sources used. For example, although Medicare claims can be expected to be fairly complete in capturing hospital admissions for persons ≥65 y in the United States, hospitalizations are not a complete representation of all incident AGI cases. Other sources of data, such as prescriptions for AGI-related drugs or calls to a health information line, are also likely to represent only a fraction of all AGI cases on any given day. Undercounting was reflected in the Schwartz et al. (1997) study in Philadelphia, in which AGI cases in children were identified through emergency department visits to one hospital, and rates differed considerably between the water treatment service area located closest to the hospital (18.9 per 1,000 person-years) and the service area located farthest away (1.5 per 1,000 person-years). Such undercounting of AGI, likely common to all the studies, reduces the study power but will not cause bias, as long as the variation in counts over time is representative of the variation in the underlying true counts (i.e., day-to-day increases or decreases in the number of AGI-related hospitalizations are representative of day-to-day changes in the true underlying number of AGI cases). This assumption may be reasonable unless turbidity is related to the severity of AGI symptoms. In that situation, AGI identified from data sources capturing more severe cases (e.g., hospitalizations, emergency department visits) may be disproportionately represented. This suggests that studies should explore effect modification by the sources of data used for AGI identification, in order to detect associations that may differ by case severity.
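The severity argument above can be checked with simple expected-count arithmetic. In the hypothetical Python sketch below, higher turbidity adds only severe cases. When mild and severe cases are captured with equal probability, the observed rate ratio equals the true one despite heavy undercounting; a data source that captures mostly severe cases (as hospitalization data might) yields a larger ratio. The function names, counts, and capture probabilities are invented for illustration.

```python
def true_rate_ratio(high, low):
    """True rate ratio; high and low are (mild, severe) mean daily case counts."""
    return sum(high) / sum(low)


def observed_rate_ratio(high, low, p_mild, p_severe):
    """Rate ratio (high- vs. low-turbidity days) seen in a data source that records
    mild cases with probability p_mild and severe cases with probability p_severe."""
    obs_high = p_mild * high[0] + p_severe * high[1]
    obs_low = p_mild * low[0] + p_severe * low[1]
    return obs_high / obs_low


# Hypothetical counts: turbidity adds only severe cases.
high, low = (90, 30), (90, 10)
print(true_rate_ratio(high, low))
print(observed_rate_ratio(high, low, 0.1, 0.1))    # equal capture probabilities
print(observed_rate_ratio(high, low, 0.01, 0.5))   # severity-dependent capture
```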

Confounding
It is difficult to conceive of a variable not examined in the studies that could plausibly confound the relationship between turbidity and AGI incidence to the extent that it would produce a false positive association consistently across the many study regions and time periods examined. Season is a major concern for confounding: AGI exhibits seasonal cycles that have nothing to do with microbiological water quality (for example, low air exchange rates in winter make indoor transmission of pathogens more common), and season (and related climatic factors) is also highly correlated with source-water turbidity. Most studies used flexible nonlinear terms to estimate seasonal trends and potential confounding effects of air temperature or other climatic variables, and several studies described adequate adjustment based on independence of the residuals over time (Schwartz et al. 2000; Tinker et al. 2010). In other studies, little information was reported on the model fit with respect to the covariates. The studies in Vancouver and Edmonton, Canada (Aramini 2002; Lim et al. 2002) found similar patterns of association of turbidity with AGI in both time-series and case-control analyses, which suggests that seasonal trends were accounted for in the time-series analyses and strongly argues against false positive findings in the Vancouver study. In addition to primary consideration of seasonal and climatic factors as effect modifiers in future analyses (discussed above) and secondary consideration as confounders, improved reporting of the adequacy of seasonal trend adjustment would enhance causal interpretation of the results.
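A residual-independence check of the kind described above can be sketched as follows, assuming residuals (for example, deviance or Pearson residuals) are already available from a fitted time-series model; the fitting step is not shown, and the function names are hypothetical. The plus-or-minus 1.96/sqrt(n) band is a large-sample approximation for white noise; portmanteau tests such as Ljung-Box are a more formal alternative.

```python
from math import pi, sin, sqrt


def autocorrelation(x, lag):
    """Sample autocorrelation of series x at a positive lag."""
    n = len(x)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(x)")
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    if denom == 0:
        raise ValueError("series has zero variance")
    num = sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag))
    return num / denom


def flag_autocorrelated_lags(residuals, max_lag=14, z=1.96):
    """Lags whose residual autocorrelation lies outside the approximate white-noise band."""
    bound = z / sqrt(len(residuals))
    flagged = {}
    for k in range(1, max_lag + 1):
        r = autocorrelation(residuals, k)
        if abs(r) > bound:
            flagged[k] = r
    return flagged


# Hypothetical residuals containing an unadjusted weekly cycle.
residuals = [sin(2 * pi * t / 7) for t in range(140)]
print(sorted(flag_autocorrelated_lags(residuals)))
```

A flagged lag of 7 d in such a plot would point to an unmodeled day-of-week pattern rather than to the turbidity term itself.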
Multiple studies have reported an association between heavy precipitation and increases in AGI incidence using the time-series study design (Guzman Herrador et al. 2015). Overadjustment is a concern, as precipitation directly affects turbidity levels by washing particulates, along with pathogens, into source waters (thus, precipitation may be a proxy for turbidity). Nevertheless, adjustment for precipitation was inconsequential in the studies that examined it, possibly suggesting that other local factors influence turbidity to a greater extent than precipitation. It seems most prudent to exclude precipitation from models of the turbidity-AGI relationship, as it may interfere with accurate estimation of the turbidity effect.
An advantage of the time-series design is its inherent adjustment for potential confounders that do not greatly vary by the time scale used in the analysis (e.g., socioeconomic status and neighborhood factors, which do not vary by day); for this reason, individual-level measurements are often not needed. Despite this practical advantage, targeted data collection on potential confounders would be useful to inform the evidence base for causal inference from studies of drinking-water turbidity and AGI incidence. The hybrid design employed by Egorov et al. (2003) in Russia illustrates collection and examination of individual-level information (recreational water contact, consumption of nonboiled tap water, out-of-town trips) for inclusion in a time-series analysis; however, the small size of the study (367 participants), which made such data collection possible, also limits study power, and this strategy is likely not feasible in regions with lower rates of AGI and more pristine drinking-water conditions. Rather than collecting individual-level data from each person in a study, efforts could instead focus on data collection from a representative sample of persons in the region to obtain information on the variation of potential confounders over time. For example, recreational contact with source waters is known to pose an increased risk of AGI (Sunger and Haas 2015). Such contact would vary by season, and could thus confound the association between drinking-water turbidity and AGI. The sheer number of persons exposed through drinking water greatly overwhelms the number of persons exposed through recreational contact, which argues against the potential for unmeasured recreational water contact to cause a spurious association between turbidity and AGI.
Nevertheless, studies would benefit from collection of additional information on recreational water contact by regional residents, such as the frequency and variability of contact over time (by season, day of the week, and holidays). This distributional information would allow for application of probabilistic bias analysis methods to place bounds on the possible bias from the (unmeasured) confounder on the risk estimates for the main effect of interest (Lash et al. 2009).
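The probabilistic bias analysis suggested above can be sketched in its simplest form. For a binary unmeasured confounder, the conventional external-adjustment bias factor is [p1(RR_CD - 1) + 1] / [p0(RR_CD - 1) + 1], where p1 and p0 are the prevalences of the confounder (here, recent recreational water contact) among the exposed and unexposed and RR_CD is the confounder-AGI rate ratio. Applying it to high- versus low-turbidity days, drawing the three bias parameters from uniform ranges, and ignoring random error in the observed estimate are simplifying assumptions; the ranges shown are hypothetical placeholders for the kind of survey data described above, not values from any study, and the function names are hypothetical.

```python
import random
from statistics import median, quantiles


def bias_factor(p1, p0, rr_cd):
    """Ratio of confounded to unconfounded RR for a binary unmeasured confounder."""
    return (p1 * (rr_cd - 1) + 1) / (p0 * (rr_cd - 1) + 1)


def probabilistic_bias_analysis(rr_obs, p1_range, p0_range, rr_cd_range, n_sims=10000, seed=1):
    """Median and 2.5th/97.5th percentiles of bias-adjusted RRs.

    Each bias parameter is drawn from a uniform distribution over its range.
    Random error in rr_obs is ignored in this simplified sketch.
    """
    rng = random.Random(seed)
    adjusted = []
    for _ in range(n_sims):
        p1 = rng.uniform(*p1_range)
        p0 = rng.uniform(*p0_range)
        rr_cd = rng.uniform(*rr_cd_range)
        adjusted.append(rr_obs / bias_factor(p1, p0, rr_cd))
    cuts = quantiles(adjusted, n=40)  # 39 cut points at 2.5% steps
    return median(adjusted), cuts[0], cuts[-1]


# Hypothetical inputs: observed RR 1.2; contact more common on high-turbidity days.
print(probabilistic_bias_analysis(1.2, (0.05, 0.10), (0.01, 0.04), (1.5, 3.0)))
```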

Evaluation of Multiple Lags
All of the studies evaluated multiple lag times to test multiple hypotheses regarding the latency between exposure and AGI. The lags represent the elapsed time from the point of sampling (usually as effluent from the water utility) until presentation of an AGI case to the health system, theoretically incorporating the time before water reaches the consumer's tap, incubation periods for common waterborne microbiological infections, and delay in seeking medical care for AGI. Given the ecological nature of the time-series study design, the lag time represents the average lag among AGI cases within the region studied. Testing multiple lags is admittedly exploratory, but necessary given that different pathogens have different incubation periods and an investigator typically does not have knowledge at the outset about the pathogens of concern within a particular water system. Many of the studies found the most significant or prominent associations with lag times of 6-10 d. Lag times from 6 to 10 d may be consistent with latency periods for common waterborne causes of AGI, such as viral gastroenteritis, giardiasis, and cryptosporidiosis, but this depends on the amount of time water spends in each of the distribution systems studied. The study of the Milwaukee outbreak, with known contamination by Cryptosporidium, found prominent lags between 6 and 10 d, as well as longer lags of 13-16 d, which the authors speculated may reflect secondary transmission of the initial waterborne outbreak (Morris et al. 1998; Naumova et al. 2003). Differing lag times between studies may suggest different waterborne pathogens or distribution system residence times among the study regions; however, inference regarding specific pathogens will ultimately rely on identification of specific agents in water samples.
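Because each lag is typically examined in its own model, the data preparation amounts to shifting the turbidity series against the AGI series. The Python sketch below shows that step; the helper names are hypothetical, and using a common first outcome day (the largest lag examined) for every lag, so that all lag-specific models are fitted to the same days, is a design choice assumed here rather than one every reviewed study reports.

```python
def lagged(series, lag):
    """Element t of the result is series[t - lag]; None where that day is unavailable."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    n = len(series)
    if lag >= n:
        return [None] * n
    return [None] * lag + list(series[: n - lag])


def lag_pairs(turbidity, agi_counts, lag, first_day):
    """(turbidity on day t - lag, AGI count on day t) for t >= first_day."""
    if len(turbidity) != len(agi_counts):
        raise ValueError("series must cover the same days")
    if first_day < lag:
        raise ValueError("first_day must be at least lag")
    exposure = lagged(turbidity, lag)
    return [(exposure[t], agi_counts[t]) for t in range(first_day, len(agi_counts))]


# Hypothetical daily series covering 30 days.
turbidity = [0.2 + 0.01 * d for d in range(30)]
agi = [5 + (d % 4) for d in range(30)]
max_lag = 14
by_lag = {lag: lag_pairs(turbidity, agi, lag, first_day=max_lag) for lag in range(max_lag + 1)}
print(len(by_lag[0]), len(by_lag[max_lag]))
```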
In this type of analysis in which associations are explored across multiple lags, the pattern of association may be more important than the statistical significance of individual associations. The most satisfying presentations of multiple lags tested showed the entire range of results for the association between turbidity and AGI at all lag times examined, in tables or figures (Aramini 2002; Beaudeau et al. 2012; Egorov et al. 2003; Gilbert et al. 2006; Hsieh et al. 2015; Lim et al. 2002; Morris et al. 1998). This comprehensive presentation allows the reader to assess whether associations represent generalized patterns of increasing risk across multiple lags or appear as single associations showing no pattern with consecutive lags. A separate issue arises from summarizing results over multiple lags examined, as from a distributed lag model, which may dilute any association limited to a few consecutive days. Comparison of lag times between different studies would be optimized by consistency in presentation of all results, such as for single-day lags over a period of at least 2 wk spanning lags from 6 to 10 d.
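The pattern-based reading of multiple lags, like the criterion of significance on several days in a row used by Gilbert et al. (2006) and Lim et al. (2002), can be made explicit in a few lines. The Python sketch below scans lag-specific lower 95% confidence limits for the rate ratio and returns runs of consecutive lags lying above 1; the function name, the minimum run length, and the example values are hypothetical analyst choices, not taken from those studies.

```python
def consecutive_positive_runs(lower_limits, min_run=2, null_value=1.0):
    """Runs of consecutive lags whose lower confidence limit exceeds null_value.

    lower_limits[i] is the lower 95% limit of the rate ratio at lag i (days).
    Returns (first_lag, last_lag) pairs for runs at least min_run lags long.
    """
    runs, start = [], None
    for lag, lower in enumerate(lower_limits):
        if lower > null_value:
            if start is None:
                start = lag
        else:
            if start is not None and lag - start >= min_run:
                runs.append((start, lag - 1))
            start = None
    # Close a run that continues through the last lag examined.
    if start is not None and len(lower_limits) - start >= min_run:
        runs.append((start, len(lower_limits) - 1))
    return runs


# Hypothetical lower limits for lags 0-14.
lower = [0.97, 0.99, 0.95, 1.00, 0.98, 0.99, 1.02, 1.04, 1.03, 1.01, 0.99, 0.96, 1.01, 0.98, 0.97]
print(consecutive_positive_runs(lower, min_run=3))
```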

Modeling Strategies
With some exceptions (Beaudeau et al. 2014a; Hsieh et al. 2015; Schwartz et al. 2000), plots of the raw data and their relationship to the fitted model, residual diagnostic plots, and other diagnostic information were not shown, making it difficult for the reader to ascertain whether the model is a good fit. Authors rarely described formal testing of the Poisson distributional assumption that the variance of the counts equals their mean, which may be violated when spikes occur in AGI counts, although multiple authors employed alternate models equipped to handle overdispersed count data. The linearity of the relationship between AGI and lagged turbidity is another assumption that was not explicitly justified in most of the studies, and evidence for a nonlinear association from several studies underscores the need for evaluation of the suitability of a linear fit to the relationship within a particular water supply. Additionally, in most studies, there was little information reported on the model fit with respect to the covariates. Imperfect model fit with respect to turbidity or the covariates does not necessarily undermine basic conclusions about the existence of a relationship between AGI and lagged turbidity, but may affect estimates of this relationship, for example by introducing some bias into parameter estimates or by invalidating estimated standard errors and thus the stated level of significance, which can lead to false conclusions either away from or toward the null. Improved reporting of distributional assumptions and model fit would allow greater confidence in interpretation of results. Special attention should be paid in the studies to the fit of models with respect to extreme values (spikes) of either turbidity or AGI.
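A simple check of the Poisson assumption, given observed daily counts and the fitted means from whatever model the analyst used, is the Pearson dispersion statistic: the Pearson chi-square divided by the residual degrees of freedom. Values well above 1 suggest overdispersion, for which quasi-Poisson or negative binomial models are common remedies. The Python sketch below assumes the fitted values are already available; the function name and example numbers are hypothetical.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square / residual df for a count model.

    observed: daily AGI counts; fitted: model-predicted means for the same days;
    n_params: number of estimated model parameters.
    """
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    df = len(observed) - n_params
    if df <= 0:
        raise ValueError("need more observations than parameters")
    chi2 = 0.0
    for y, mu in zip(observed, fitted):
        if mu <= 0:
            raise ValueError("fitted means must be positive")
        chi2 += (y - mu) ** 2 / mu
    return chi2 / df


# Hypothetical counts with one spike relative to a flat fitted mean.
observed = [4, 6, 5, 3, 7, 5, 25, 4, 5, 6]
fitted = [6.0] * 10
print(round(pearson_dispersion(observed, fitted, n_params=1), 2))
```

A single spike, as in this example, can drive the statistic well above 1 on its own, which is one reason the text above stresses attention to extreme values.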

Future Directions
Moving forward, it may be useful to conduct additional studies of turbidity in relation to AGI to screen drinking-water supplies for source-water quality and treatment effectiveness. Time-series studies are recommended as a reasonable first step for epidemiological evaluation of a water supply; although the time-series design has weaknesses that limit causal inference, such studies are relatively inexpensive to conduct, given ongoing generation of turbidity (and other water quality) measurements by water systems and increasing availability of electronic health records. These studies may help identify regions, seasons, and source-water conditions of potential concern, which could then guide more targeted research and data collection to help explain those high-risk conditions. Targeted data collection, such as data to better understand water use or confounder distributions over time, or water sampling for pathogens, would be useful to inform the evidence base for causal inference from time-series studies of drinking-water turbidity and AGI incidence. Sampling efforts to describe the relationship between turbidity and pathogens across regions under various conditions (different source waters, climatic conditions, and treatment approaches) may add valuable information about the specific contexts in which turbidity is most useful as a proxy for microbiological contamination. Ultimately, the collective results from a variety of preliminary studies may help to effectively allocate funding for more extensive studies (e.g., randomized trials) to regions most likely to benefit from the information.
There are important gaps in the conditions that have been studied. Cities with pristine source waters that employ filtration have not been well studied. This may reflect a tendency among researchers to undertake, and among funding agencies to sponsor, studies in areas in which positive associations are likely to be found. This approach is reasonable initially, when the existence of any sort of association would be unclear, but as a substantial number of studies show positive associations, the scope of future studies should be broadened so that the conditions under which such associations exist can be better demarcated. Following this line of reasoning, a priority for future study is the evaluation of systems employing enhanced disinfection, such as UV. An association between turbidity and AGI has been found in chlorinated and ozonated supplies; the association might not be present when UV disinfection is used, given that this form of disinfection is effective even against highly resistant pathogens such as Cryptosporidium oocysts (Nasser 2016). Three of the treatment plants studied in Atlanta (Tinker et al. 2010) employed UV disinfection, but the paper did not list the type of treatment specific to each water plant. New York City and Boston are particularly attractive locations for studies of UV disinfection, as the method has been recently adopted in those regions, and previous studies found an association between turbidity and AGI (Beaudeau et al. 2014a; Hsieh et al. 2015). Hence, a subsequent study could assess whether the implementation of UV disinfection has removed the observed association between turbidity and illness. Studies at sites using other advanced disinfection methods might be similarly valuable. Multi-region investigations using unified methods would be an efficient approach to further our understanding of the relationship, particularly given contrasts of interest between study regions (such as in turbidity levels or treatment methods).
The studies we reviewed were generally designed to evaluate AGI risks in relation to drinking-water turbidity as a proxy for microbiological contamination of source-water that is not fully eliminated by treatment. These studies (with the exception of Hsieh et al. 2015) did not evaluate AGI risk caused by microbiological contamination introduced within the distribution system (such as through main breaks, intrusions, and biofilms).
Distribution system contamination has been implicated in waterborne disease outbreaks (Ford 2016), and furthermore may be a cause of endemic (nonoutbreak) waterborne AGI, as suggested by an epidemiological study linking longer water residence time in the distribution system with increased AGI incidence in Atlanta (Tinker et al. 2009). In future research, it may be useful to partition out health risks between various inputs to the water system (i.e., potential points of contamination), in order to understand more fully where to target interventions. Hsieh et al. (2015) addressed the question by simultaneously modeling both distribution system turbidity and source-water turbidity in relation to AGI. The association with distribution turbidity was almost completely explained by source-water turbidity (Hsieh et al. 2015), suggesting that the fraction of AGI associated with turbidity was not caused by contamination introduced within the distribution system. Likewise, Beaudeau et al. (2014b) found significantly increased risk associated with turbidity of finished effluent, with simultaneous adjustment for pipe break service interventions (which themselves had a small, but not statistically significant, relationship with AGI). Additional studies with multivariable evaluation of source-water/effluent turbidity in addition to indicators of potential contamination within the distribution system may shed light on the relative importance of contamination inputs from various points in the water system.
Our understanding of the relationship between drinking-water turbidity and AGI incidence would be improved through consistency of analysis and reporting in future studies. Greater consistency will more readily allow direct comparison of results across regions, water systems, and time periods, as well as enable meta-analyses to quantitatively summarize the effect and evaluate factors contributing to heterogeneity of effect. To enhance consistency, we suggest evaluation of daily average turbidity (rather than longer averaging periods or alternate turbidity metrics) and evaluation of daily lags with presentation of results for all lag times examined. Although authors may determine that longer exposure or lag periods lead to the best-fitting model in their study, presentation of the results for daily average turbidity and daily lag times in a supplement or on request would allow such comparison. We also suggest fitting both linear and nonlinear associations between turbidity and AGI incidence, with presentation of the linear association, where appropriate, even if limited to certain levels of turbidity or one season. The nonlinear association across multiple lags is effectively summarized using TERS plots; however, providing risk estimates (and confidence intervals) for the contrast between defined turbidity levels at key lag times is needed to allow accurate comparison of the magnitude of effect across studies. Quantitative comparison (and summarization) of results across studies may add to the weight of evidence for a causal interpretation of waterborne AGI from drinking water in the regions and time periods studied, and could ultimately lead to more accurate estimation of the total risk (attributable risk) through this exposure pathway.
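To illustrate the reporting and summarization recommended above, the Python sketch below converts a log-linear turbidity coefficient into a rate ratio for a defined contrast, then pools study estimates with inverse-variance fixed-effect weights and reports Cochran's Q and I-squared as heterogeneity summaries. It assumes confidence intervals that are symmetric on the log scale, and all numbers are hypothetical. Given the context-specific associations described in this review, a random-effects model (e.g., DerSimonian-Laird) would usually be preferable for an actual meta-analysis; fixed-effect pooling is shown only as the simplest case.

```python
from math import exp, log, sqrt


def rr_for_contrast(beta, se, delta, z=1.96):
    """RR and 95% CI for a `delta`-NTU contrast under a log-linear model (delta > 0)."""
    return exp(beta * delta), exp((beta - z * se) * delta), exp((beta + z * se) * delta)


def pool_fixed_effect(estimates, z=1.96):
    """Inverse-variance fixed-effect pooling of (rr, lower, upper) tuples.

    Recovers each log-scale SE from its CI width. Returns the pooled RR,
    its 95% CI, Cochran's Q, and I^2.
    """
    logs, weights = [], []
    for rr, lower, upper in estimates:
        se = (log(upper) - log(lower)) / (2 * z)
        logs.append(log(rr))
        weights.append(1 / se**2)
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, logs)) / total
    se_pooled = sqrt(1 / total)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, logs))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return exp(pooled), exp(pooled - z * se_pooled), exp(pooled + z * se_pooled), q, i2


# Hypothetical: beta = 0.05 per NTU, SE = 0.02, contrast of 2 NTU.
print(rr_for_contrast(0.05, 0.02, 2.0))
studies = [(1.10, 1.02, 1.19), (1.25, 1.05, 1.49), (1.04, 0.96, 1.13)]
print(pool_fixed_effect(studies))
```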

Conclusions
In summary, multiple time-series studies have observed an association between turbidity of drinking-water supplies and risk of AGI. Associations have been observed in unfiltered and filtered water systems and at levels of relatively high and low turbidity. The positive studies suggest an underlying risk of waterborne AGI during the time periods at which the systems were studied. Nevertheless, inconsistencies between the studies indicate that the utility of turbidity as a proxy for microbiological contamination may be context specific. The body of work demonstrates the efficacy of studies correlating turbidity with AGI counts in time-series for preliminary investigation of the safety of water supplies. The context-specific nature of the association between drinking-water turbidity and AGI suggests that future research will be most effective if strategized toward elucidating specific, rather than generalized, associations; for example, through exploration of effect modification by seasonal variables, stream conditions, and water treatment. Time-series results supplemented by targeted data collection to help determine whether there is indeed a causal link may be useful as a means to assess the effectiveness of utilities in managing various conditions posing increased risk for exposure to microbiological agents of AGI.