Within-Day, Between-Day, and Between-Week Variability of Urinary Concentrations of Phenol Biomarkers in Pregnant Women

Background: Toxicology studies have shown adverse effects of developmental exposure to industrial phenols. Evaluation in humans is challenged by potentially marked within-subject variability of phenol biomarkers in pregnant women, which is poorly characterized. Objectives: We aimed to characterize within-day, between-day, and between-week variability of phenol urinary biomarker concentrations during pregnancy. Methods: In eight French pregnant women, we collected all urine voids over a 1-wk period (average, 60 samples per week per woman) on three occasions (15±2, 24±2, and 32±1 gestational weeks) in 2012–2013. Aliquots from each day and from the whole week were pooled within-subject. We assayed concentrations of 10 phenols in these pools and, for two women, in all spot (unpooled) samples collected during a 1-wk period. We characterized variability using intraclass correlation coefficients (ICCs) with spot samples (within-day variability), daily pools (between-day variability), and weekly pools (between-week variability). Results: For most biomarkers, the within-day variability was high (ICCs between 0.03 and 0.50). The between-day variability, based on samples pooled within each day, was much lower, with ICCs >0.60 except for bisphenol S (0.14, 95% confidence interval [CI]: 0.00, 0.39). The between-week variability differed between compounds, with triclosan and bisphenol S having the lowest ICCs (<0.3) and 2,5-dichlorophenol the highest (ICC >0.9). Conclusion: During pregnancy, phenol biomarkers showed strong within-day variability, while the variability between days of a given week was more limited. A single biospecimen is not enough to characterize exposure efficiently; collecting biospecimens during a single week may be enough to represent whole-pregnancy exposure well for some, but not all, phenols. https://doi.org/10.1289/EHP1994


Introduction
Phenols include high-production-volume chemicals widely used in everyday products. For example, bisphenols are used in the manufacture of epoxy resins and certain plastic polymers found in food and beverage containers and in other consumer products (Chen et al. 2016; INERIS 2014; NTP 2008). Parabens are used as preservatives in cosmetics, food, beverages, and pharmaceuticals; benzophenone-3, an ultraviolet filter, is used in plastics and cosmetics; triclosan is used for its antibacterial properties in personal care products, clothing, and kitchenware (Cosmetic Ingredient Review Expert Panel 2008; Krause et al. 2012; NLM 2016); 2,4-dichlorophenol is used in the production of certain pesticides; and 2,5-dichlorophenol is a major metabolite of 1,4-dichlorobenzene, which is used in mothballs and room deodorizers (Crinnion 2010; HSDB 2016).
Concern exists regarding the health effects of phenols, which are potential endocrine disruptors, particularly following exposure during fetal life (Braun 2017). Most biomarker-based studies in humans have relied on biomarker concentrations measured in very few (one to three) spot biospecimens per pregnant woman. For chemicals with strong within-subject temporal variation, relying on a small number of biospecimens is expected to characterize the average exposure (e.g., over a day, a week, or longer) imperfectly, to cause exposure misclassification, and consequently to bias dose-response functions (Carroll et al. 2006; Perrier et al. 2016). The biological half-life of phenols in pregnant women is not known and could differ strongly from that in nonpregnant women, as is the case for cotinine, a urinary biomarker of tobacco smoke exposure, whose elimination has been found to be about twice as fast during pregnancy as postpartum (Dempsey et al. 2002). Studies in nonpregnant adults reported a short (<12-h) half-life for some phenols (Janjua et al. 2007; Sandborgh-Englund et al. 2006; Völkel et al. 2002). Consequently, a single spot biospecimen is probably of limited relevance as a proxy of exposure over time windows of one day or longer. This issue matters given the expected impact of exposure misclassification on bias in dose-response functions relating biomarker levels to health parameters (Perrier et al. 2016). Several studies evaluated the reproducibility of urinary phenol concentrations during pregnancy (Bertelsen et al. 2014; Braun et al. 2011, 2012; Guidry et al. 2015; Jusko et al. 2014; Meeker et al. 2013; Philippat et al. 2013; Smith et al. 2012; Stacy et al. 2017). These studies generally relied on two or three spot biospecimens collected from each pregnant woman several weeks or months apart.
Such a design cannot characterize the within-day or within-week variability in biomarker concentrations. Based on complete urine collections over several days in eight nonpregnant participants, two studies reported high within-subject and between-day variability for bisphenol A (Ye et al. 2011), whereas this variability was relatively small for some parabens, triclosan, and benzophenone-3 (Koch et al. 2014). High within-day variability of bisphenol A concentrations was also reported in 66 pregnant women with complete urine collection during one or two days (Fisher et al. 2015). For pregnancy, estimates of the within-subject variability of phenols other than bisphenol A are lacking.
An accurate description of the variability of phenol urinary concentrations during pregnancy is crucial for adopting biospecimen sampling strategies that limit exposure misclassification in etiological studies. Our aim was therefore to characterize the within-day, between-day (within a week), and between-week variability of urinary concentrations of 10 phenols in pregnant women.

Study Participants
This study relied on a subgroup of the feasibility study conducted between July 2012 and July 2013 in the planning of the SEPAGES cohort (Suivi de l'Exposition à la Pollution Atmosphérique durant la Grossesse et Effets sur la Santé [Assessment of Air Pollution Exposure during Pregnancy and Effects on Health]). In this feasibility study, 40 women with a singleton pregnancy and living in the Grenoble urban area (France) were recruited from private obstetrical practices, before 17 gestational weeks (calculated from the date of the last menstrual period). The exclusion criteria included inability to write or speak French, being <18 y of age, planning to give birth outside of one of the four maternity hospitals of the Grenoble urban area, and not being enrolled in the French social security system. All participating women and their partners provided written informed consent for themselves and their offspring for biological measurements and data collection (Ouidir et al. 2015). The SEPAGES feasibility cohort was approved by the appropriate ethical committees [CPP (Comité de Protection des Personnes Sud-Est); CNIL (Commission Nationale de l'Informatique et des Libertés); CCTIRS (Comité Consultatif sur le Traitement de l'Information en matière de Recherche dans le domaine de la Santé); ANSM (Agence Nationale de sécurité du Médicament et des produits de santé)]. The involvement of the Centers for Disease Control and Prevention (CDC) laboratory did not constitute engagement in human subject research.

Study Design and Urine Collection
The urine collection protocol is described in Figure 1. Urine was collected during 7 consecutive days at three periods of pregnancy (first collection week, median: 13 gestational weeks, minimum-maximum [min-max]: 10-18 gestational weeks; second collection week, median: 23, min-max: 21-26; third collection week, median: 32, min-max: 29-33; Table 1). Thirty of the 40 women participating in the SEPAGES feasibility study were asked to collect ∼60 mL of each urine void and to record in daily diaries the micturition time of collected and missed voids. The remaining 10 women were asked to collect ∼60 mL of only three urine voids per day and were therefore not considered in the present study. The women collected the urine in polypropylene containers and stored it in a refrigerator (4°C [39°F]) at home. When they were away from home, collected urine was stored in a cooler with ice packs. Specimens were retrieved two or three times a week by the study staff and brought in coolers to the Inserm research center [Institute for Advanced Biosciences (IAB), Grenoble, France]. Each sample was aliquoted into 2-mL polypropylene cryovials (up to five vials per sample) and frozen at −80°C (−112°F) until pooling or shipping for analysis. Because of cost constraints, we quantified phenol biomarkers only in the subgroup of eight women with the lowest rate of missed voids. Among these women, two collected a sample of every urine void (no missed void, subgroup 1), while the other six collected more than 95% of their weekly urine voids (subgroup 2).

Pooling Procedure
We thawed aliquots at 4°C (39°F), vortexed them, and pooled them in polypropylene containers according to the protocol detailed in Figure 1. For each woman and each study day, we combined equal volumes of urine from all samples of the day, yielding seven within-subject daily pools for a 1-wk period (days 1 to 7). For each woman, we then prepared three weekly pools by combining equal volumes of the seven daily pools from each collection week (weeks 1, 2, and 3).
Immediately after preparation, pooled samples were placed in 2-mL polypropylene cryovials and frozen at −80°C (−112°F). The pools and all aliquots from spot samples to be analyzed were kept frozen until shipment on dry ice to the CDC laboratory in Atlanta (Georgia, USA). At the CDC laboratory, all urine samples were stored at or below −70°C (−94°F) until analysis.
Figure 1. Urine collection, pooling procedure, and biomarker assays in the whole study population (N = 8 women) and in the nested subgroup 1 (N = 2 women).
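Because every sample contributes the same volume, the concentration of an equal-volume pool is, assuming perfect mixing and negligible assay error, the arithmetic mean of the contributing spot concentrations. One consequence for the log10-scale analyses is that the log10 of a pool is never smaller than the mean of the log10 spot values (it is the log of an arithmetic, not a geometric, mean). The minimal Python sketch below uses hypothetical concentrations, not study data; the function names are illustrative.

```python
import math


def equal_volume_pool(concentrations):
    """Concentration of a pool made from equal volumes of each sample.

    With equal volumes V, pooled mass = V * sum(c_i) in a volume n * V,
    so the pooled concentration is the arithmetic mean of the c_i
    (assumes perfect mixing and no assay error).
    """
    return sum(concentrations) / len(concentrations)


def mean_log10(concentrations):
    """Mean of log10 concentrations, i.e., log10 of the geometric mean."""
    return sum(math.log10(c) for c in concentrations) / len(concentrations)


# Hypothetical spot concentrations (µg/L) for one woman on one day.
day_spots = [0.5, 2.0, 8.0, 1.0]

pooled = equal_volume_pool(day_spots)
print(f"daily pool: {pooled:.3f} µg/L")                     # 2.875
print(f"log10(pool): {math.log10(pooled):.3f}")             # 0.459
print(f"mean of log10 spots: {mean_log10(day_spots):.3f}")  # 0.226
```

The gap between the last two values grows with the spread of the spot concentrations, so pools and spot samples are not interchangeable on the log scale.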

Phenols, Creatinine, and Specific Gravity Measurements
The total urinary concentrations of 2,4- and 2,5-dichlorophenols, benzophenone-3, bisphenol A, bisphenol S, triclosan, and butyl-, methyl-, ethyl-, and propylparabens were quantified at the CDC using a modification of an online solid-phase extraction high-performance liquid chromatography-isotope dilution-tandem mass spectrometry method (Zhou et al. 2014). Limits of detection (LODs) are listed in Table 2. Depending on the compound and concentration range, the coefficients of variation of quality control measurements ranged between 3.4% and 14.7%: they were higher for methylparaben, bisphenol A, benzophenone-3, and triclosan (5.8-14.7%) and lower for the other compounds (3.4-6.4%). Moreover, the team in charge of urine collection added eight replicates, blinded to the laboratory, to the samples assayed for phenols. Correlation coefficients between biomarker concentrations in samples and their replicates ranged from 0.95 (bisphenols A and S) to 1.00 (2,4-dichlorophenol, ethyl-, propyl-, and butylparabens, and triclosan). Two urine dilution markers were also quantified in the same samples: urinary creatinine, measured at the CDC using a Roche/Hitachi MODULAR ANALYTICS Urine Work Area (SWA) P (photometric analysis) module (Roche Diagnostics), and urinary specific gravity, measured at room temperature with a handheld Atago PAL 10-S refractometer (Atago) at the Inserm Grenoble laboratory.
We analyzed a total of 216 samples (136 spot samples, 8 × 7 = 56 daily pools, and 8 × 3 = 24 weekly pools; Figure 1) for phenol biomarkers, creatinine, and specific gravity. For all women, we analyzed the seven daily pools of the first study week (56 daily pools) and the three weekly pools (24 weekly pools). Additionally, for the two women in subgroup 1, we analyzed all spot samples of week 1 (114 samples in total) and 2 spot samples randomly selected among those from the two other collection weeks (1 random spot sample in each collection week). Finally, for each of the six women in subgroup 2, we analyzed 3 spot samples randomly selected among all the samples from the three collection weeks (one in each collection week; Figure 1), so that, for all eight women, we could assay 1 random spot sample for each of the three collection weeks.
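The sample counts given above can be checked with a short tally; every number in this sketch comes from the text (114 week-1 spot samples for the two women of subgroup 1, one random spot per remaining collection week, seven daily pools and three weekly pools per woman).

```python
# Sample tally for the phenol assays, using only counts stated in the text.
n_sub1, n_sub2 = 2, 6          # women in subgroup 1 and subgroup 2
n_women = n_sub1 + n_sub2

spot_week1_sub1 = 114          # every void of week 1, both subgroup-1 women
spot_random_sub1 = n_sub1 * 2  # one random spot in each of weeks 2 and 3
spot_random_sub2 = n_sub2 * 3  # one random spot in each of weeks 1-3

spot = spot_week1_sub1 + spot_random_sub1 + spot_random_sub2
daily_pools = n_women * 7      # week-1 daily pools only
weekly_pools = n_women * 3     # one pool per collection week

total = spot + daily_pools + weekly_pools
print(spot, daily_pools, weekly_pools, total)  # 136 56 24 216
```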

Statistical Analyses
Concentrations below the LOD were replaced by the instrumental readings or, when the instrumental reading was zero, by the compound-specific lowest non-zero instrumental reading divided by the square root of 2. We log10-transformed the urinary concentrations of phenol biomarkers to achieve approximately normal distributions. Spearman correlation coefficients were used to assess correlations of biomarker concentrations between sample types (unpooled samples, daily and weekly pools) for a given biomarker, and between biomarkers for a given sample type.
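The substitution rule for low concentrations and the log10 transform can be written as a short Python sketch. It assumes, for illustration, that the input list holds the instrumental readings (µg/L) for all samples of one compound and that readings are non-negative; values above the LOD are the reported concentrations and values below it are kept as read, so only exact zeros need replacing. The readings and function names are hypothetical.

```python
import math


def substitute_zero_readings(readings):
    """Replace zero instrumental readings for one compound.

    Readings below the LOD are kept as instrumental values; a reading of
    exactly zero becomes the compound's lowest non-zero reading / sqrt(2).
    Assumes readings are non-negative and at least one is non-zero.
    """
    lowest_nonzero = min(r for r in readings if r > 0)
    fill = lowest_nonzero / math.sqrt(2)
    return [r if r > 0 else fill for r in readings]


def to_log10(values):
    """log10-transform concentrations before variance modeling."""
    return [math.log10(v) for v in values]


# Hypothetical readings for one compound (µg/L); suppose the LOD is 0.1.
readings = [0.0, 0.04, 0.3, 1.2]
clean = substitute_zero_readings(readings)
logs = to_log10(clean)
print([round(v, 4) for v in clean])  # [0.0283, 0.04, 0.3, 1.2]
```

Keeping below-LOD instrumental readings preserves their ordering, which a single fixed substitute (such as LOD/√2 for every non-detect) would not.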
Our assessment of variability relied on intraclass correlation coefficients (ICCs) estimated using one-way random-effects ANOVA models. The same approach was used for all ICC estimates. ICCs close to zero indicate poor reproducibility of concentrations within the considered period, whereas values close to one indicate high reproducibility. When an ICC estimate was negative (which can happen with ANOVA models; Wang et al. 1992), we considered the ICC not computable and reported only the 95% confidence interval (CI), with its lower bound truncated to zero.
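A one-way random-effects ANOVA ICC, with the negative-estimate convention described above, can be sketched in Python with NumPy and SciPy. The confidence interval here is the classical F-based interval for the one-way model, with an effective cluster size n0 for unequal cluster sizes; this choice is an assumption for illustration, and the STATA routine used by the authors may compute intervals differently, so the values need not match the tables. The function name is hypothetical.

```python
import numpy as np
from scipy import stats


def icc_oneway(groups, alpha=0.05):
    """ICC from a one-way random-effects ANOVA with an F-based CI.

    groups: one sequence of log10 concentrations per cluster (e.g., per
    woman). Returns (icc, (lower, upper)); icc is None when the estimate
    is negative, and the CI bounds are truncated at zero.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    sizes = np.array([g.size for g in groups])
    n_total = sizes.sum()
    grand_mean = np.concatenate(groups).mean()

    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_b, df_w = k - 1, n_total - k
    ms_b, ms_w = ss_between / df_b, ss_within / df_w

    # Effective cluster size; equals n when every cluster has n values.
    n0 = (n_total - (sizes ** 2).sum() / n_total) / df_b
    icc = (ms_b - ms_w) / (ms_b + (n0 - 1) * ms_w)

    # Classical F-based interval for the one-way model.
    f_obs = ms_b / ms_w
    f_low = f_obs / stats.f.ppf(1 - alpha / 2, df_b, df_w)
    f_up = f_obs * stats.f.ppf(1 - alpha / 2, df_w, df_b)
    lower = (f_low - 1) / (f_low + n0 - 1)
    upper = (f_up - 1) / (f_up + n0 - 1)
    ci = (max(lower, 0.0), max(upper, 0.0))
    return (float(icc) if icc >= 0 else None), ci


# Hypothetical log10 values: two women, two samples each.
print(icc_oneway([[1.0, 2.0], [3.0, 4.0]]))  # ICC = 7/9
print(icc_oneway([[1.0, 3.0], [1.0, 3.0]]))  # negative estimate -> None
```

For the daily pools, each group would hold one woman's seven log10 values; for the weekly pools, her three values.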
To characterize the within-day variability, we defined the ICC as the ratio of the between-day variance to the total variance (sum of within- and between-day variances). We relied on the two women of subgroup 1 (114 samples in total, collected during the first collection week). To remove the between-subject variability before estimating the ICCs representing within-day variability, the woman- and compound-specific weekly mean was subtracted from the spot concentrations.
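The within-day ICC described above can be sketched by centering each woman's spot values on her weekly mean and then treating each woman-day as a cluster of voids. The function name and the (woman, day, value) record layout are illustrative assumptions; unequal numbers of voids per day are handled with the usual effective cluster size, and only the point estimate is computed.

```python
import numpy as np


def within_day_icc(records):
    """Within-day ICC: between-day variance / (between- + within-day).

    records: (woman_id, day, log10_concentration) tuples for spot samples.
    Each woman's weekly mean is subtracted first to remove between-woman
    variability; each (woman, day) is then one cluster of voids.
    """
    by_woman = {}
    for woman, _, y in records:
        by_woman.setdefault(woman, []).append(y)
    weekly_mean = {w: np.mean(ys) for w, ys in by_woman.items()}

    clusters = {}
    for woman, day, y in records:
        clusters.setdefault((woman, day), []).append(y - weekly_mean[woman])
    groups = [np.array(v) for v in clusters.values()]

    k = len(groups)
    sizes = np.array([g.size for g in groups])
    n_total = sizes.sum()
    grand = np.concatenate(groups).mean()
    ms_b = sum(g.size * (g.mean() - grand) ** 2 for g in groups) / (k - 1)
    ms_w = sum(((g - g.mean()) ** 2).sum() for g in groups) / (n_total - k)
    n0 = (n_total - (sizes ** 2).sum() / n_total) / (k - 1)  # voids/day vary
    return float((ms_b - ms_w) / (ms_b + (n0 - 1) * ms_w))


# Hypothetical log10 values: two women, two days, two voids per day.
demo = [("A", 1, 1.0), ("A", 1, 3.0), ("A", 2, 5.0), ("A", 2, 7.0),
        ("B", 1, 2.0), ("B", 1, 4.0), ("B", 2, 6.0), ("B", 2, 8.0)]
print(round(within_day_icc(demo), 3))  # 0.684
```

A low value means that voids from the same day differ nearly as much as voids from different days, so one spot sample poorly represents its day.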
We assessed the within-week (between-day) and between-week variability with ICCs calculated as the ratio of the between-subject variance to the total variance (sum of within- and between-subject variances). For the between-day variability, we ran models based on all 56 daily pools of study week 1 (eight women, each with seven daily pools). For the between-week variability, we used all 24 weekly pools (eight women, each with three weekly pools).
To allow comparisons with previous studies that relied on two to three spot samples collected during pregnancy, we additionally estimated ICCs based on the three random spot samples collected during pregnancy (each sample being randomly selected in each collection week for the eight women). To assess the robustness of the findings to the statistical methods, we also computed ICCs using random intercept linear mixed models (maximum likelihood estimates) instead of ANOVA models. To assess the potential impact of urinary dilution on variance estimates for phenol biomarkers, the ANOVA analyses were repeated using phenol biomarker concentrations corrected for creatinine (ratio of the phenol biomarker concentrations to the creatinine concentration in the same sample) and for specific gravity using a formula previously described (Philippat et al. 2013), or by including creatinine concentration or specific gravity as a covariate in the random intercept linear mixed models. Data were analyzed using STATA (version 12.1; Stata Corporation).
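The two dilution corrections can be sketched as follows. The creatinine correction is the ratio stated in the text; for specific gravity, the sketch assumes the widely used form C × (SG_ref − 1)/(SG − 1) with the median specific gravity of the study samples as SG_ref, which may not match every detail of the formula in Philippat et al. (2013). Function names and values are hypothetical.

```python
from statistics import median


def creatinine_corrected(conc_ug_per_l, creatinine_mg_per_dl):
    """Phenol concentration divided by creatinine in the same sample.

    Creatinine is converted from mg/dL to g/L (1 mg/dL = 0.01 g/L), so the
    result is in µg phenol per g creatinine.
    """
    return conc_ug_per_l / (creatinine_mg_per_dl * 0.01)


def sg_corrected(conc, sg, sg_reference):
    """Specific-gravity correction: conc * (SG_ref - 1) / (SG - 1)."""
    return conc * (sg_reference - 1.0) / (sg - 1.0)


# Hypothetical samples: (phenol µg/L, creatinine mg/dL, specific gravity).
samples = [(2.0, 150.0, 1.024), (0.8, 60.0, 1.010), (1.5, 100.0, 1.016)]
sg_ref = median(sg for _, _, sg in samples)  # 1.016

for conc, crea, sg in samples:
    print(round(creatinine_corrected(conc, crea), 2),
          round(sg_corrected(conc, sg, sg_ref), 2))
```

On the log10 scale, the creatinine correction amounts to subtracting log10(creatinine), and constant unit factors shift every value equally, so they do not affect ICCs.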

Study Population and Samples
At enrollment, women were 29.6 y of age on average [standard deviation (SD): 3.8]; most were primiparous (63%) and did not smoke during pregnancy (88%), and all had a college education (Table 1). Women collected from 3 to 15 urine samples per day (132 to 240 samples per woman in total). Detection frequencies were generally between 79% (triclosan) and 100% (methyl- and propylparabens), except for benzophenone-3, which was detected in only 35% of the samples. The highest between-compound correlation coefficients were observed between structurally similar compounds (e.g., between the two dichlorophenols and among the four parabens), and between creatinine and specific gravity (see Table S1), regardless of the type of sample (unpooled, daily, or weekly pools). Correlations between creatinine and urinary concentrations of phenol biomarkers ranged from −0.06 (propylparaben) to 0.83 (2,4-dichlorophenol) in spot samples of subgroup 1, from −0.28 (triclosan) to 0.61 (bisphenol A) in daily pools of all studied women, and from −0.15 (triclosan) to 0.56 (2,4-dichlorophenol) in weekly pools.

Within-Day Variability
For the spot samples, urinary concentrations of most phenols and of creatinine varied within-woman by several orders of magnitude throughout the first collection week and within a day (Figure 2). For all biomarkers, including creatinine and specific gravity, ICCs were low to moderate (Table 2), with the highest ICC observed for bisphenol S (0.50; 95% CI: 0.26, 0.73) and the lowest for ethylparaben (0.03; 95% CI: 0.00, 0.15) and specific gravity (0.03; 95% CI: 0.00, 0.15). Creatinine and specific gravity standardization slightly increased ICCs (by 0.02 to 0.07) for 5 and 9 of the 10 compounds, respectively (see Table S2). Using linear mixed models instead of ANOVA to estimate ICCs led to similar results (see Tables S3 and S4).

Between-Week Variability Based on Weekly Pools
On average, 9.0 wk (SD: 1.9) elapsed between collection weeks 1 and 2, and 8.3 wk (SD: 1.2) between collection weeks 2 and 3 (range, 5.6-12.0 wk). For 7 of the 10 phenol biomarkers, detection frequencies in weekly pools were similar to those in daily pools (>79%; Table 4), whereas they were somewhat higher in weekly pools for butyl- and ethylparabens (88% and 79%, compared with 80% and 70%, respectively, in daily pools) and lower for benzophenone-3 (38% vs. 50% in daily pools). Between two study weeks, concentrations of almost all phenols varied by several orders of magnitude for some women (Figure 4). ICCs for 2,5-dichlorophenol; butyl-, methyl-, and propylparabens; and creatinine were >0.8, while they were <0.6 for the other biomarkers (Table 4, Figure 5), with the lowest value for bisphenol S (0.26; 95% CI: 0.00, 0.73). The ICC for triclosan could not be computed but was probably low (95% CI: 0.00, 0.44). ICCs slightly decreased (by 0.01 to 0.2) for most compounds when using creatinine- or specific gravity-corrected concentrations, but stayed within the uncorrected confidence intervals (see Table S5). Adjusting for creatinine or specific gravity did not change the results (see Tables S3 and S6).

Discussion
To our knowledge, this study is the first to evaluate the within-day, between-day, and between-week variability of 10 phenol biomarkers, as well as of creatinine and specific gravity, in pregnant women. Most compounds showed very high variability over the course of a day (ICCs generally <0.3), while the between-day variability of daily averages over a week was much lower. The pattern was reversed for bisphenol S, which had stronger between-day than within-day variability. The variability of weekly averages several weeks apart showed more heterogeneous patterns across compounds, with low between-week variability for some compounds (2,5-dichlorophenol and butyl-, propyl-, and methylparabens) and high variability for others (ethylparaben, bisphenol S, and triclosan). Urinary dilution or creatinine levels did not explain much of the observed within-subject variability in phenol biomarkers.

Strengths and Limitations
A key strength of our study is the participation of pregnant women who agreed to collect samples from each micturition over three weeks. This study considered a large number of phenols, including bisphenol S and others for which the literature is rather sparse. Also, unlike previous studies, our design allowed characterization of the temporal variability of phenol biomarkers over several time windows during pregnancy, in particular within the day and between the days of a week. For eight blinded samples analyzed in duplicate, we observed very high (≥0.95) correlation between the two analyses of the same urine sample for all biomarkers, making it very unlikely that the reported ICCs were strongly influenced by analytical error. One limitation is that our estimate of the within-day variability relied on samples collected by only two women, whereas the estimates of the between-day and between-week variability relied on eight women. Volumes of urine voids were not recorded, which prevented us from calculating excretion rates. Caution is required in interpreting estimates for benzophenone-3, the compound with the lowest detection frequency in all sample types (27% to 54%), and in interpreting the results of the weekly pool analyses, given the large confidence intervals. Even though we restricted our study population to women with few missed voids (<5%), pools were created using all available urine samples, and these missed voids may be a source of error. Because the present study was restricted to a specific population and specific chemicals, our results should be generalized to other populations or compounds with great caution.

Study Population
Our study relied on a small population of women who agreed to collect repeated urine samples for several weeks. It was not meant to be representative of the general population or of all pregnant women from France or even the Grenoble area. Among those approached, women with a high education level or an interest in environmental or health issues were more likely to participate. Consequently, the behaviors (use of personal care products, diet, etc.) of our population are unlikely to represent those of all pregnant women. This may have led to an underestimation of the between-woman variability in urinary concentrations of some of the considered compounds; however, overrepresenting women who use few personal care products (as may be the case for some highly educated women aware of the health concerns regarding the use of health care products during pregnancy) or who have a diet low in industrial phenols may also have led to underestimating the within-subject variability, so the net effect on ICCs is difficult to anticipate. Other factors that might influence the toxicokinetics of xenobiotics, such as physical activity (Persky et al. 2003), may also have differed in our population. Most phenol urinary concentrations were lower than those reported in previous cohorts of pregnant women (Guidry et al. 2015; Meeker et al. 2013; Philippat et al. 2013; Smith et al. 2012) and in 1,230 U.S. nonpregnant women (CDC 2015). Besides behaviors, the composition of consumer products, the regulation of chemicals in each country, and the analytical methods of the laboratories performing the assays may also have differed across studies.
Table 3. Between-day variability: descriptive statistics of the non-transformed biomarker concentrations (µg/L) for the within-woman daily pooled samples from subgroup 1 and subgroup 2 (8 women, n = 56 daily pools, one daily pool for each day of the first week of collection), and ICCs based on log10-transformed phenol biomarker concentrations, creatinine concentration, and specific gravity. Values were not standardized for creatinine or specific gravity.
Figure 3. Between-day (within a week) variability of pooled daily samples: urinary concentrations of 10 phenols (µg/L), creatinine concentration (mg/dL), and specific gravity on a log10 scale in the within-woman daily pooled samples from subgroup 1 and subgroup 2 (8 women, n = 56 daily pools, one daily pool for each day of the first week of collection). Note that to facilitate visualization, each biomarker is displayed on a specific scale.

Variability over the Course of Pregnancy
Our analyses based on three random spot samples were meant to describe the ability of a simple sampling approach to capture whole-pregnancy exposure and to allow comparison with previous studies, which relied on up to three spot samples per participant to assess biomarker concentration variability during pregnancy (Bertelsen et al. 2014; Braun et al. 2011, 2012; Guidry et al. 2015; Jusko et al. 2014; Meeker et al. 2013; Philippat et al. 2013; Smith et al. 2012; Stacy et al. 2017). None of these studies investigated bisphenol S. The moderate ICCs (between 0.4 and 0.5) observed for 2,4-dichlorophenol and butyl- and ethylparaben urinary concentrations were consistent with previous reports (Guidry et al. 2015; Meeker et al. 2013; Philippat et al. 2013; Smith et al. 2012), while we observed higher ICCs for 2,5-dichlorophenol and methylparaben (>0.8 compared with 0.4-0.5 in these previous studies). Also, considering their rather large confidence intervals, ICCs for bisphenol A (0.4; 95% CI: 0.0, 0.8) and propylparaben (0.7; 95% CI: 0.4, 1.0) were within the range of previous studies (∼0.3 for bisphenol A and from 0.3 to 0.6 for propylparaben), although at its upper end. In contrast, ICCs for benzophenone-3 (0.3; 95% CI: 0.0, 0.8) and triclosan (0.1; 95% CI: 0.0, 0.6) were at the lower end of the range of previously reported results, which were between 0.3 and 0.6 (Bertelsen et al. 2014; Meeker et al. 2013; Philippat et al. 2013; Stacy et al. 2017). Detection rates in our population were low (<55%) for these two compounds, which might have decreased ICCs compared with studies with higher detection rates: with many samples below the LOD, concentrations are more homogeneous between women, so the within-subject variability is proportionally larger.
Our study is, to the best of our knowledge, the first to rely on within-subject weekly pools instead of random spot samples to describe the variability of select phenols during pregnancy. Compared with the results based on random spot samples from our study, ICCs based on weekly pools tended to be higher, but the overall conclusion was similar to that of the three-sample approach: variability differed between compounds, being low for 7 of the 10 phenols (ICCs >0.59) and high (ICCs <0.4) for the other compounds (ethylparaben, bisphenol S, and triclosan).

Between-Day (within-Week) Variability
For most biomarkers, the between-day variability of urinary concentrations over a week was low (ICCs >0.7 for 8 of the 10 phenols, and ICC of 0.6 for bisphenol A). For bisphenol S, between-day variability was high (0.14; 95% CI: 0.0, 0.39). To the best of our knowledge, no previous study has relied on within-subject daily pools to investigate the variability of phenol urinary concentrations over several consecutive days. Two previous studies relied on 24-h simulated urine concentrations (volume-weighted averages of all daily urine voids) in a nonpregnant population of eight males and females (who collected all their complete urine voids and recorded urine volumes) to characterize the between-day variability of bisphenol A over a week (Ye et al. 2011) and of several phenols over four consecutive days (Koch et al. 2014). We observed somewhat lower variability of bisphenol A urinary concentrations in daily pools (ICC 0.6; 95% CI: 0.30, 0.89) than these previous studies in nonpregnant subjects (ICCs between 0.12 and 0.28), whereas for ethyl-, methyl-, and propylparabens, triclosan, and benzophenone-3, ICCs (between 0.73 and 0.98) were in close agreement with those reported by Koch et al. (2014) (between 0.71 and 0.99). These findings suggest good reproducibility of daily average urinary concentrations over a week for most target phenols, but not for the bisphenols.
Table 4. Between-week variability: descriptive statistics of the non-transformed biomarker concentrations (µg/L) for the within-woman weekly pooled samples from subgroup 1 and subgroup 2 (8 women, n = 24 weekly pools, one weekly pool for each of the three weeks of collection), and ICCs based on log10-transformed phenol biomarker concentrations, creatinine concentration, and specific gravity. Values were not standardized for creatinine or specific gravity.
Figure 4. Between-week variability of weekly samples: urinary concentrations of 10 phenols (µg/L), creatinine concentration (mg/dL), and specific gravity on a log10 scale in the within-woman weekly pooled samples from subgroup 1 and subgroup 2 (8 women, n = 24 weekly pools, one weekly pool for each of the three weeks of collection). Note that to facilitate visualization, each biomarker is displayed on a specific scale.

Within-Day Variability
Within-day variability was high for all phenol biomarkers (ICCs ≤0.50), showing that a random spot sample collected within a day does not accurately represent the daily average. For bisphenol A (0.21; 95% CI: 0.01, 0.41), this result is in line with findings from Ye et al. (2011), who measured urinary bisphenol A concentrations in all spot urine samples collected over a 1-wk period from eight nonpregnant participants (ICCs, 0.12-0.21 for the within-day variability). It also agrees with the low reproducibility of bisphenol A concentrations in urine samples from a single day (ICCs between 0.31 and 0.33) reported in a study of 66 pregnant women who provided all their urine voids during 1 or 2 d as well as spot samples at different time points during and after pregnancy (Fisher et al. 2015). The high within-day variation in phenol urinary concentrations is probably related to the presumably very short half-life of phenols in pregnant women, as suggested by studies in nonpregnant adults (Janjua et al. 2007; Sandborgh-Englund et al. 2006; Völkel et al. 2002), and to the episodic nature of exposure, the main suspected sources being diet (bisphenols from food containers, parabens used as preservatives in some industrial food preparations) and personal care products (parabens, triclosan, benzophenone-3). That the within-day variability exceeded the between-day variability for most compounds could reflect exposure-driving behaviors being similar from one day to the next.
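The link between a short half-life and high within-day variability can be illustrated with a simplified one-compartment, first-order elimination model after a single exposure event. The half-lives below are hypothetical values chosen within the <12-h range reported for some phenols in nonpregnant adults; real exposures are repeated, and excretion kinetics in pregnancy are unknown.

```python
def fraction_remaining(hours_since_exposure, half_life_hours):
    """Fraction of an absorbed dose not yet eliminated, assuming
    first-order (one-compartment) elimination after a single exposure."""
    return 0.5 ** (hours_since_exposure / half_life_hours)


for half_life in (6.0, 12.0):  # hypothetical half-lives, in hours
    fractions = [fraction_remaining(t, half_life) for t in (3, 12, 24)]
    print(f"t1/2 = {half_life:>4} h:", [f"{f:.3f}" for f in fractions])
```

With a 6-h half-life, a quarter of a morning dose remains after 12 h and about 6% after 24 h, so under this model each void mostly reflects exposures of the preceding hours.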

Urine Dilution and Within-Subject Pooling
Creatinine and specific gravity are commonly used to adjust for urine dilution in studies relying on urinary biomarkers (Barr et al. 2005; Boeniger et al. 1993). The two markers were strongly correlated (correlation coefficients, 0.86-0.92) in spot samples and in daily and weekly pools. As previously reported for creatinine, creatinine concentration and specific gravity showed high within-subject variation over a 1-wk period (Boeniger et al. 1993). In our study, correcting phenol concentrations for either creatinine or specific gravity did not greatly improve ICCs, suggesting that these parameters do not explain much of the biomarkers' variability.
Figure 5. Intraclass correlation coefficients (error bars: 95% confidence intervals) for the within-day variability using the unpooled spot samples from subgroup 1 (2 women, n = 114 spot samples collected over the first week of collection, triangle markers), the between-day variability using the within-woman daily pooled samples from subgroup 1 and subgroup 2 (8 women, n = 56 daily pools, one daily pool for each day of the first week of collection, square markers), and the between-week variability using the within-woman weekly pooled samples from subgroup 1 and subgroup 2 (8 women, n = 24 weekly pools, one weekly pool for each of the three weeks of collection, circle markers). For triclosan, only the confidence interval (truncated at zero) is shown because the between-week ICC estimate was negative.
Table 5. Alternative estimate of between-week variability based on three random spot samples: descriptive statistics of the non-transformed biomarker concentrations (µg/L) for the random spot samples from subgroup 1 and subgroup 2 (8 women, n = 24 random spot samples, one sample in each of the three weeks of collection), and ICCs based on log10-transformed phenol biomarker concentrations, creatinine concentration, and specific gravity. Values were not standardized for creatinine or specific gravity.
Because total urine volumes were unknown, we created within-subject daily pools using an equal volume of each spot sample collected that day; results might have differed if pooling volumes had been based on specific gravity or creatinine concentration. Creatinine is a body waste product excreted primarily by glomerular filtration (Barr et al. 2005; Boeniger et al. 1993; Cheung and Lafayette 2013). Excretion profiles of phenols are not well characterized in humans, and specifically in pregnant women. However, the low correlation between creatinine or specific gravity and most of our exposure biomarkers may reflect that the urinary excretion processes for creatinine and for these phenols differ, and that urine dilution may not substantially affect biomarker concentrations. In some research areas, correction for urine dilution relies on parameters other than creatinine or specific gravity, as an alternative to creatinine standardization. For example, the Integral Quotient Normalization approach in metabolomics adjusts each sample using the median value of all biomarkers (Lindon et al. 2007).
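Why the choice of pooling volumes matters follows from a simple mass balance: mixing aliquots of volume v_i with concentrations c_i yields a pool whose concentration is the volume-weighted mean sum(c_i * v_i) / sum(v_i). The minimal Python sketch below assumes ideal mixing; the function name and the void volumes are hypothetical (void volumes were not recorded in this study), and weighting by void volume is only one illustrative alternative to equal-volume pooling. Weights derived from specific gravity or creatinine would plug into the same function.

```python
def pool_concentration(concs, aliquot_volumes=None):
    """Concentration of a pool made by mixing aliquots of spot samples.

    Mixing conserves mass, so the pool concentration is the volume-weighted
    mean sum(c_i * v_i) / sum(v_i). With equal aliquot volumes
    (aliquot_volumes=None) this reduces to the arithmetic mean of the
    spot concentrations, as in equal-volume pooling.
    """
    if not concs:
        raise ValueError("need at least one spot sample")
    if aliquot_volumes is None:
        aliquot_volumes = [1.0] * len(concs)
    if len(aliquot_volumes) != len(concs):
        raise ValueError("one aliquot volume per spot sample is required")
    total_mass = sum(c * v for c, v in zip(concs, aliquot_volumes))
    return total_mass / sum(aliquot_volumes)


# Hypothetical day: a small concentrated void and a large dilute void.
spot_concs = [12.0, 3.0]       # ug/L
void_volumes = [150.0, 450.0]  # mL (hypothetical)

equal_volume_pool = pool_concentration(spot_concs)
volume_weighted_pool = pool_concentration(spot_concs, void_volumes)
```

In this toy case, the equal-volume pool gives 7.5 µg/L, whereas aliquots proportional to void volume give 5.25 µg/L (the total mass excreted divided by the total volume voided), so the two schemes can diverge when concentrated voids are small.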

Implications for Sampling Strategies in Etiological Studies
We confirmed that studies aiming to characterize the health effects of pregnancy exposure to compounds with high within-subject variability, such as most of those considered here, should generally collect several biospecimens per subject to reduce exposure misclassification (Perrier et al. 2016). More importantly, we report for the first time period-specific ICCs for selected phenols, which can be used to refine the urine sampling scheme of epidemiologic studies aiming to characterize the health effects of such exposures. As shown in Table 4, for some compounds (e.g., 2,5-dichlorophenol and several parabens), if a good estimate of the exposure averaged over a specific pregnancy week is available (e.g., by pooling aliquots of all urine voids collected over that week), this estimate can conveniently be used as an estimate of the average exposure over all pregnancy weeks. For triclosan and bisphenol S, whose between-week ICCs are <0.3, assessing exposure during a small number of weeks is unlikely to provide a reasonable estimate of the whole-pregnancy exposure average. For these compounds, focusing on a few specific weeks of pregnancy may be inefficient, and one may instead collect random samples throughout pregnancy to estimate the pregnancy average. Based on the simulation by Perrier et al. (2016) and on the ICCs estimated in the current study, two to three dozen urine samples would be required for compounds such as triclosan and bisphenol S, whereas for compounds such as 2,4-dichlorophenol, with an ICC close to 0.6 for the pregnancy window, pooling approximately five urine samples would substantially limit bias in the dose-response function. Under the assumption of classical-type error, relying on a single spot urine sample in etiological studies is likely to attenuate the dose-response function by 30% (for propylparaben) to 50% or more (Perrier et al. 2016).
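The orders of magnitude above can be checked against a closed-form approximation rather than a simulation. Under classical, nondifferential error in a simple linear model, the slope is attenuated by the reliability of the exposure measure, and the reliability of the mean of k exchangeable samples follows the Spearman-Brown formula k * ICC / (1 + (k - 1) * ICC). The Python sketch below is that textbook approximation, not the simulation approach of Perrier et al. (2016); it assumes independent errors across samples and treats a pool as the average on the scale where the ICC was estimated, which is a simplification because equal-volume pooling averages raw rather than log-transformed concentrations. The function names are hypothetical.

```python
def reliability_of_mean(icc, k):
    """Spearman-Brown: reliability of the mean of k exchangeable samples,
    given the single-sample ICC."""
    return k * icc / (1.0 + (k - 1) * icc)


def attenuation(icc, k=1):
    """Approximate relative attenuation of a linear exposure-outcome slope
    under classical, nondifferential error: 1 - reliability."""
    return 1.0 - reliability_of_mean(icc, k)


# Illustrative values only (not estimates from this study).
single_spot_bias = attenuation(0.5)                  # one spot sample, ICC 0.5
five_pooled_reliability = reliability_of_mean(0.6, 5)
three_dozen_reliability = reliability_of_mean(0.2, 36)
```

With these illustrative inputs, a single sample with ICC 0.5 gives about 50% attenuation; five samples with ICC 0.6 reach a reliability of about 0.88 (roughly 12% attenuation); and an ICC of 0.2 needs 24 to 36 samples to reach about 0.86 to 0.90, consistent with the two to three dozen samples quoted above.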
Only for 2,5-dichlorophenol and methylparaben, for which ICCs based on three random spot samples during pregnancy (Table 5) were equal to 0.85, may one or two spot urine samples collected randomly during pregnancy provide a reasonable estimate of the whole-pregnancy exposure average. If the exposure window of interest lasts one week (for example, the week when a specific fetal organ starts developing, or the week at the end of which some biological parameter is assessed in the mother), assessing exposure during a single day of that week should suffice for dichlorophenols, triclosan, and parabens, whereas for benzophenone-3 and bisphenols A and S it is safer to assess exposure during several days of the week. If the window of interest is a specific day, collecting one spot sample is likely not enough for any of the studied phenols, given that the within-day ICCs were all <0.5. For biomonitoring (as opposed to etiological studies), bias in dose-response functions is not an issue, and collecting a single spot biospecimen per subject might be a good option if the population is large enough. Collecting a random sample rather than the first morning void is likely to provide a much better estimate of the population average.
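The reasoning about how many samples are "enough" for a given window can be made explicit by inverting the Spearman-Brown formula: the number of exchangeable samples k whose mean reaches a target reliability R is k = R(1 - ICC) / (ICC(1 - R)). The Python sketch below assumes classical error and exchangeable, independently collected samples; the function name and the target reliabilities are hypothetical choices left to the investigator, not values proposed in this study.

```python
import math


def samples_needed(icc, target):
    """Smallest number of exchangeable samples whose mean reaches the target
    reliability, from the inverted Spearman-Brown formula
    k = target * (1 - icc) / (icc * (1 - target))."""
    if not (0.0 < icc < 1.0 and 0.0 < target < 1.0):
        raise ValueError("icc and target must lie strictly between 0 and 1")
    k = target * (1.0 - icc) / (icc * (1.0 - target))
    # Small tolerance so floating-point noise (e.g. 4.000000000000001)
    # does not add an extra sample.
    return max(1, math.ceil(k - 1e-9))


# Whole-pregnancy window, ICC 0.85 for random spot samples, target 0.9.
pregnancy_samples = samples_needed(0.85, 0.9)
# Single-day window, within-day ICC of 0.5, target 0.8.
daily_samples = samples_needed(0.5, 0.8)
```

For an ICC of 0.85 and a target reliability of 0.9, the formula gives k of about 1.6, that is, two samples, in line with the "one or two spot samples" above; with a within-day ICC of 0.5, reaching a reliability of 0.8 for a single day would take four spot samples that day.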

Conclusion
Biospecimen sampling strategy for accurate exposure assessment is a key issue in epidemiological studies of chemicals with short half-lives, such as phenols. Our findings confirm that exposure misclassification may be high when only a small number of random spot samples is collected. Future etiological studies should adopt a carefully designed biospecimen sampling scheme instead of defaulting to a single biospecimen per subject. Our results suggest that collecting more than one biospecimen per day, preferably over several days during pregnancy, is likely to reduce exposure misclassification.