Whence Healthy Children? | Mini-Monograph Prospective Pregnancy Study Designs for Assessing Reproductive and Developmental Toxicants

The determinants of successful human reproduction and development may act as early as periconceptionally, underscoring the need to capture exposures during these critical windows when assessing potential toxicants. To identify such toxicants, couples must be studied longitudinally prior to conception without regard to a couple's ability to ascertain a clinically recognized pregnancy. We examined the utility and feasibility of prospective pregnancy study designs by conducting a systematic review of the literature to summarize relevant information regarding the planning, implementation, and success of previously published prospective pregnancy studies. Information concerning design elements and participation was abstracted from 15 eligible studies (from a total of 20 identified studies) using a standardized form. The primary author of each study was contacted to review our summary of their work and obtain missing information. Our findings confirm the ability to recruit women/couples from diverse populations using a variety of recruitment strategies. Among the studies we reviewed, 4-97% of eligible individuals were successfully contacted, with enrollment rates ranging from 42 to 100%. Length of follow-up varied from 3 to 12 months. A high percentage of women provided urine (57-98%) and blood (86-91%) specimens and most male partners (94-100%) provided semen samples. These data support the feasibility of this design.

A growing body of evidence challenges the traditional view that only in utero exposures are of concern for the health of the developing fetus. Specifically, reproductive biologists, epidemiologists, and toxicologists recognize the potential importance of parental exposures during critical periconceptional windows, in addition to exposures during organogenesis (Chapin et al. 2004; Selevan et al. 2000). A spectrum of human health end points can be conceptualized for study, as reflected in the evaluative guidelines set forth by various regulatory agencies and organizations (California Environmental Protection Agency 1991; European Commission 2002; International Programme on Chemical Safety 2001; Moore et al. 1995; U.S. Environmental Protection Agency 1991). Recent biomedical advances hold promise for population-based studies of this type, which could address many of the critical data gaps that confront this field. Strategies for weighing scientific evidence regarding reproductive and developmental toxicity highlight study design as a criterion for evaluating the strength of available evidence (Subcommittee on Reproductive and Developmental Toxicology 2001). Although experimental study designs provide the strongest evidence, they are not an ethical option for assessing the effect(s) of potentially toxic exposures on human reproductive and developmental end points. Hence, observational designs are the only choice for epidemiologic investigation. Among observational studies, data from properly designed and implemented prospective cohort studies usually receive more weight than data from retrospective cohort or case-control studies. This is mainly because the investigator can ensure the temporal ordering of exposure(s) and outcome(s), measure exposure more accurately, measure relevant covariates at multiple time points, and minimize potential information biases such as recall bias (Adams 2001; Andersson et al. 2000; Reichman and Hade 2001; Werler et al. 1989). For example, a recent study found poor reliability and recall bias in women's retrospective reports of exposure to chemicals during pregnancy (Till et al. 2002).
Several cohort studies have followed human development by studying pregnant women (Golding et al. 2001; Niswander and Gordon 1972). These studies, however, could not ascertain exposures (or collect biospecimens) during critical periconceptional windows and could not assess early reproductive outcomes (those occurring before a clinically recognized pregnancy).
The most comprehensive and informative observational design is a prospective cohort study that measures exposures longitudinally (on both parents) beginning prior to pregnancy and continuing throughout pregnancy (if it occurs) and beyond. This study design, which we call a prospective pregnancy study with preconception enrollment, allows for the assessment of early exposures and a complete range of reproductive and developmental outcomes, key information for avoiding bias in evaluating effect(s) of potential toxicants (Tingen et al. 2004).
Prospective pregnancy studies are often described as difficult, intensive, and expensive to conduct, with limited overall yield. In this article, we examine the empirical evidence on the utility and feasibility of prospective pregnancy study designs for identifying reproductive and developmental toxicants. Although most prospective pregnancy studies focus on the determinants of sensitive end points (e.g., time to pregnancy and early pregnancy loss), a review of these issues is beyond the scope of this article.
Our work is based on a systematic literature review to summarize relevant information on the planning, implementation, and relative success of this design.

Search Strategy
We conducted a MEDLINE (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi) search in May 2002 to locate published prospective pregnancy studies, using the following search terms: prospective studies [MeSH term] AND (fertility OR fecundity OR time to pregnancy OR urine OR pregnancy). We sought to identify all large epidemiologic prospective pregnancy studies with preconception enrollment and a follow-up period of at least 3 months. We also reviewed the references cited in each study to ensure that all relevant published works had been identified. Our initial search yielded 18 studies, of which 13 were selected for review. Five studies were excluded for the following reasons: a) a clinical study focusing on postimplantation pregnancy (Miller et al. 1980); b) small sample size (n = 24, 13, and 13, respectively) (Hilgers et al. 1978; Li et al. 2002b; Sanders and Bruce 1997); and c) a prospective study comprising only women with clinically recognized pregnancies (Li et al. 2002a). We later added two studies that were published while we were finalizing this work (Buck et al. 2002; Wang et al. 2003), for a total of 15 studies available for review.
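A short tally can check the selection arithmetic above. This is a minimal Python sketch. The counts come from the text; the variable names and the grouping of exclusion reasons are illustrative choices, not part of the original review.

```python
from collections import Counter

# Counts reported in the Search Strategy; the labels are illustrative.
identified_in_search = 18
excluded = Counter({
    "clinical study of postimplantation pregnancy": 1,  # Miller et al. 1980
    "small sample size": 3,  # Hilgers 1978; Li 2002b; Sanders and Bruce 1997
    "only clinically recognized pregnancies": 1,  # Li et al. 2002a
})
added_later = 2  # Buck et al. 2002; Wang et al. 2003

selected_from_search = identified_in_search - sum(excluded.values())
total_identified = identified_in_search + added_later
total_reviewed = selected_from_search + added_later

print(f"{selected_from_search} selected from search, "
      f"{total_reviewed} reviewed out of {total_identified} identified")
```

The totals agree with the "15 eligible studies (from a total of 20 identified studies)" stated in the abstract.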

Data Collection
We developed a standardized data abstraction form that included author and year of primary (or methodologically oriented) publication; size of the target population; number of individuals contacted; number of eligible individuals; number of study participants; length of follow-up; type(s) of data collection, specifically, use of daily diaries and biospecimen collection (namely, urine and blood); semen collection; number of people dropping out of the study; and type(s) of incentives offered for participation.
Requests for specific information were sent to all primary authors in June 2002, and all responded. The authors were asked to review and approve our summaries of their work and to provide missing information where possible. Both published and unpublished data obtained from the authors were summarized for our review. Given the sampling strategies employed, several investigators were unsure of, or unable to enumerate, the exact size of the target population. Thus, the eligibility and participation percentages presented here should be regarded as best estimates.
Table 1 summarizes the sampling and recruitment strategies of the 15 selected prospective pregnancy studies. The first prospective pregnancy study with preconception recruitment was published in 1984 (France et al. 1984). By definition and selection, all studies used a prospective design with women/couples recruited before becoming pregnant. All but four studies (Brown et al. 1997; Ellish et al. 1996; Hakim et al. 1995; Zinaman et al. 1996) required that women/couples enroll before discontinuing contraception, to ensure that the first ovarian cycle, measured in terms of the menstrual cycle, was at risk for pregnancy. Six authors estimated the size of their target population (Bonde et al. 1998; Brown et al. 1997; Buck et al. 2002; Ellish et al. 1996; Eskenazi et al. 1995; Hakim et al. 1995). Nine studies did not enumerate a denominator because they relied on community volunteers responding to recruitment advertisements or other such solicitations (Colombo and Masarotto 2000; de Mouzon et al. 1988; France et al. 1984; Sweeney et al. 1988, 1989; Vartiainen et al. 1994; Wang et al. 2003; Wilcox et al. 1988; Zinaman et al. 1996).
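Some studies lacked a target-population denominator, so it helps to state which proportions can be computed from which counts. The Python sketch below is a hypothetical helper: the class, function, and field names are ours, and the example counts are invented for illustration. It returns the population-based proportions only when a denominator is known.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecruitmentCounts:
    contacted: int
    eligible: int
    enrolled: int
    # None when the target population could not be enumerated,
    # e.g., studies relying on volunteers answering advertisements.
    target_population: Optional[int] = None


def recruitment_rates(c: RecruitmentCounts) -> dict:
    """Return the recruitment proportions computable from the available counts."""
    rates = {
        "eligible_among_contacted": c.eligible / c.contacted,
        "enrolled_among_eligible": c.enrolled / c.eligible,
    }
    if c.target_population is not None:
        rates["contacted_among_target"] = c.contacted / c.target_population
        rates["enrolled_among_target"] = c.enrolled / c.target_population
    return rates


# Hypothetical counts, for illustration only.
registry_study = RecruitmentCounts(contacted=2000, eligible=200, enrolled=100,
                                   target_population=10000)
volunteer_study = RecruitmentCounts(contacted=300, eligible=250, enrolled=200)

print(recruitment_rates(registry_study))
print(recruitment_rates(volunteer_study))
```

For a volunteer-based study, only the two conditional proportions are defined. This is why the eligibility and participation figures above are best estimates rather than population-based rates.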

Results
Participants have been recruited from a number of diverse referent populations (general or medical communities, job sites, population-based registries) and on the basis of recreational exposures (e.g., anglers). Most investigators studied women; only four studies focused on couples (Bonde et al. 1998; Colombo and Masarotto 2000; de Mouzon et al. 1988; Zinaman et al. 1996). All but three studies (Hakim et al. 1995; Sweeney et al. 1988, 1989) were restricted to presumably fecund women, which limits our understanding of the exposure profiles of couples with impaired fecundity. One author specifically addressed the yield of mixed recruitment strategies: targeted letters were the most successful (72%), followed by health care providers (12%), health maintenance organization (HMO) newsletters (9%), clinic posters (4%), radio and television announcements (1%), and other methods (2%) (Brown et al. 1997).
When recruitment details were available (Table 2), the percentage of women/couples who were successfully contacted ranged from 2% in a population-based study of first-pregnancy planners (Bonde et al. 1998) to 67% in a study of women working in the semiconductor industry (Eskenazi et al. 1995). Of particular note is the relatively high percentage of contacts (46%) that one group of investigators achieved in a population-based setting by mailing questionnaires to women of reproductive age listed in the New York State registry of licensed drivers (Ellish et al. 1996). Because denominator information was lacking, the percentage of women/couples successfully contacted or eligible for enrollment could not be determined for every study.
The percentage of eligible women/couples among those contacted ranged from 4% in a population-based study that targeted women of reproductive age (Ellish et al. 1996) to 95% in a volunteer community-based sample of couples desiring pregnancy (Zinaman et al. 1996) and 97% in a group of newly married textile workers in China (Wang et al. 2003). Notably, the numbers of women contacted and deemed eligible appeared to vary with the recruitment strategy: investigators who publicized their eligibility criteria during the recruitment process contacted fewer women, but more of their contacts were eligible for participation.
Table 1. Sampling and recruitment strategies (first author, year: referent population; sampling unit; recruitment method)
Brown (1997): HMO women of reproductive age; Women; Letters to female HMO members (also media and health providers)
Buck (2002): Anglers and partners; Women; Letters
Colombo (2000): Women seeking medical care; Couples; Fertility awareness teaching centers
de Mouzon (1988): Community; Couples; Media and letters
Ellish (1996): Motor vehicle registry; Women; Letters
Eskenazi (1995): Semiconductor workers; Women; Letters (also informational meetings)
France (1984): Women seeking medical care; Women; Media and fertility awareness teachers
Hakim (1995): Semiconductor workers; Women (a); Outreach talks and posters
Sweeney (1988): Community; Women (a); Media and letters
Sweeney (1989): Motor vehicle registry and telephone directory; Women (a); Letters
Vartiainen (1994): Community; Women; Media
Wang (2003): Newlywed textile workers; Women; Letters
Wilcox (1988): Community
Zinaman (1996): Community; Couples (c); Media, physician, and acquaintance referral
Media include television, radio, and newspaper/poster announcements. (a) The sampling units were not required to be free of known fecundity or fertility impairments. (b) Men were enrolled after the study was implemented; baseline questionnaire data are available for approximately two-thirds of husbands (personal communication with authors). (c) The female partner of the couple had to be free of fecundity impairments.
Participation rates appeared to be influenced by both the recruitment strategy and the study design features. Rates ranged from 42% among women originally enrolled in a larger cohort study with a less intensive protocol to 100% among community volunteers meeting eligibility criteria at one urban medical center. One group of investigators examined the degree to which pregnancy intentions influenced a woman's decision to participate (Sweeney et al. 1989). Only 2% of enrolled women reported actively trying to conceive during the 3-month study period; 46% reported using oral contraceptives or intrauterine devices, 24% reported using barrier methods or monitoring their cervical mucus and basal body temperature (BBT) to avoid pregnancy, 18% reported being sexually inactive, 8% reported being sexually active but not using contraception, and 2% reported being infertile.
Table 3 summarizes the follow-up and specimen collection details for each selected study. The length of follow-up varied with study purpose and the intensity of data collection, ranging from 3 to 12 months. The least intensive protocols included, at minimum, baseline interviews and some prospective recording of relevant study factors, with or without the collection of biologic specimens.
Twelve (80%) studies used daily diaries, for periods ranging from 1 month to 12 at-risk menstrual cycles. The type of data collected in these diaries varied but typically included exposure(s) of interest, menstruation, fecundity signs (namely, vaginal mucus discharge and/or BBT), sexual intercourse, lifestyle behaviors (e.g., cigarette smoking and alcohol, caffeine, or vitamin/mineral consumption), illnesses, medications, and home pregnancy test results. Among studies for which compliance rates were available, rates ranged from 80 to 98%, with the exception of one study that reported a 38% completion rate for the entire study protocol (France et al. 1984).
Four types of biospecimens have been collected in prospective pregnancy studies: urine, blood, semen, and breast milk. Biospecimen compliance rates were quite high among the studies for which information was available, ranging from 57 to 98% for urine, 86 to 93% for blood, 94 to 100% for semen, and 97% for a single postpartum breast milk sample. Our review suggests that once enrolled, women (and male partners, if applicable) will provide a variety of specimens for study purposes.
The reported study dropout rates varied widely, in part depending on how withdrawals were handled. (Some authors counted withdrawals as ineligible.) Moreover, some investigators requested that women/couples participate as long as possible, while others asked a priori for participation for a set period of time (e.g., 6 months). The lowest dropout rate (3%) was reported by one group of investigators in their 6-month prospective pregnancy study of community volunteers desiring pregnancy (Wilcox et al. 1988). France and associates reported the highest dropout rate (62%) in their study of couples desiring pregnancy who wished to preselect the sex of their child (France et al. 1984). Of the 148 women who dropped out of that study, 28% cited a change in pregnancy plans, 18% stated that the study was too demanding, 12% felt the study was too stressful, and 7% failed to become pregnant. Among other studies reporting reasons for dropout, the most common reasons were changes in pregnancy plans or health status (Bonde et al. 1998;Brown et al. 1997;Buck et al. 2002;Ellish et al. 1996;Sweeney et al. 1988).
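Simple arithmetic can cross-check the France et al. (1984) figures. The Python sketch below derives an implied enrollment and approximate counts per dropout reason from the reported 148 dropouts and 62% dropout rate. These derived numbers are our back-calculations, not values reported by the authors. Because the published percentages are rounded, the results are approximate.

```python
dropouts = 148
dropout_rate = 0.62  # the highest dropout rate among the reviewed studies

# Back-calculated, not reported: 148 / 0.62 is about 238.7.
implied_enrolled = round(dropouts / dropout_rate)
# About 0.38, consistent with the 38% completion rate reported for this study.
implied_completion = 1 - dropout_rate

reason_shares = {
    "change in pregnancy plans": 0.28,
    "study too demanding": 0.18,
    "study too stressful": 0.12,
    "failed to become pregnant": 0.07,
}
# Approximate number of dropouts citing each reason.
reason_counts = {reason: round(dropouts * share)
                 for reason, share in reason_shares.items()}
# Share of dropouts whose reasons were not among those listed.
unlisted_share = 1 - sum(reason_shares.values())

print(implied_enrolled, reason_counts, f"{unlisted_share:.0%} other/unlisted")
```

The listed reasons cover roughly two-thirds of dropouts, leaving about a third unaccounted for in the summary.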
Prospective pregnancy studies have offered varying levels of incentives for participation. Notably, four (27%) authors reported offering no incentives (Colombo and Masarotto 2000; Sweeney et al. 1988, 1989; Vartiainen et al. 1994). The largest incentive was US$500, given to couples upon completion of a protocol that required multiple clinic visits and sensitive procedures such as midcycle postcoital tests (Zinaman et al. 1996). Among U.S. studies reporting the use of incentives, the smallest was US$10, given either weekly (Wilcox et al. 1988) or every 2 months (Ellish et al. 1996) to women who participated in a protocol that included daily diaries and urine collection (the former study had an attrition rate of 3% and the latter 7%). A recent study conducted in China paid women US$1 per three urine samples provided (Wang et al. 2003). Only two studies reported providing feedback to participants in the form of summarized menstrual cycle information (Buck et al. 2002; Hakim et al. 1995).

Discussion
This review suggests that prospective pregnancy studies are a relatively new, powerful, and feasible design for examining the relation between biological, environmental, and lifestyle exposures and various reproductive and developmental outcomes. Prospective pregnancy studies have greatly furthered our understanding of human reproduction and development. Notable advances include estimates of the incidence of early [i.e., human chorionic gonadotrophin (hCG)-identified] pregnancy loss and the elucidation of daily and cumulative probabilities of conception. Such information is crucial for accurately measuring the reproductive effects of exposures along the continuum of susceptible windows of human development.
Table notes: Abbreviations: f/u, follow-up; N/A, information not applicable. (a) Cycles refer to menstrual cycles, whereas months refer to calendar time. (b) Personal communication with author(s). (c) Information not available.
Although more contacts may be required to identify a woman eligible for preconception enrollment in a prospective pregnancy study, the participation rates of eligible women are comparable to those seen in prospective studies of pregnant women. For example, 60% of eligible women enrolled in the Pregnancy, Infection, and Nutrition study, a prospective cohort study of risk factors for preterm birth in North Carolina (Siega-Riz et al. 2001). In a captured HMO population, 39% of eligible pregnant women were successfully recruited into a population-based prospective cohort study in the Kaiser Permanente Medical Care Program in Northern California (Li et al. 2002a).
To address the lack of a sampling frame for women at risk of pregnancy, one investigator used commercially available telephone directories (Lobdell et al. 2003). These inexpensive (< US$100) computerized directories contain the names, addresses, and telephone numbers of U.S. households, with each entry linked to basic census information. The census information enables investigators to assess sociodemographic differences between respondents and nonrespondents, as well as those who could not be reached because of inaccurate contact information. Targeted sampling is also possible by weighting or stratifying on ZIP code or area code if a specific subpopulation is desired.
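Stratifying or weighting on ZIP code can be sketched as stratified random sampling with design weights. The Python example below is a minimal illustration under our own assumptions:

- The directory is a list of dictionaries with a hypothetical `zip` field.
- The investigator chooses the per-stratum sample sizes (here oversampling one ZIP code).
- Each sampled household carries a weight equal to its stratum size divided by the number drawn from that stratum.

It does not reflect any particular commercial directory product.

```python
import random
from collections import defaultdict


def stratified_sample(records, strata_key, n_per_stratum, seed=None):
    """Simple random sample within each stratum, with design weights.

    n_per_stratum maps stratum value -> number of records to draw;
    strata not listed are skipped. Returns (record, weight) pairs, where
    weight = stratum size / number drawn from that stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[rec[strata_key]].append(rec)

    sample = []
    for value in sorted(strata):
        members = strata[value]
        k = min(n_per_stratum.get(value, 0), len(members))
        if k == 0:
            continue
        weight = len(members) / k
        sample.extend((rec, weight) for rec in rng.sample(members, k))
    return sample


# Hypothetical directory: 80 households in one ZIP code, 20 in another.
directory = [{"household_id": i, "zip": "11111" if i < 80 else "22222"}
             for i in range(100)]

# Oversample the smaller ZIP code (10 of 20) relative to the larger (8 of 80).
drawn = stratified_sample(directory, "zip", {"11111": 8, "22222": 10}, seed=1)
print(len(drawn), sum(weight for _, weight in drawn))
```

Summing the weights recovers the size of the sampling frame, which is a quick check that the weights are correct. Weighted analyses then let an oversampled subpopulation contribute in proportion to its share of the directory.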
An often-cited concern regarding the utility of prospective pregnancy studies is that participants are not representative of pregnant women as a whole because approximately half of all pregnancies in the United States are unintended (Henshaw 1998). Approximately 46% of unintended pregnancies result in live births (many are electively terminated) (Kaunitz and Schnare 2001). Little empirical evidence exists to assess whether the prospective pregnancy study design results in a biased estimate of effect because of differing exposure scenarios among women with intended versus unintended pregnancies. However, the possibility of differing exposure profiles should always be given careful consideration, as women who plan their pregnancies are healthier, smoke and drink less, and have better diets than women who do not (Brown and Eisenberg 1995). Similarly, women who change unhealthy or risky behaviors are reported to be more educated, more likely to be employed, and from higher socioeconomic backgrounds than women who do not change behaviors (Beck et al. 2002;Joyce et al. 2000b;Kost et al. 1998).
Though yet unproven, the xenobiotic exposure profiles of women may also vary by pregnancy intention status. For example, hazardous waste sites and industrial sources of environmental pollution are often located in low-income communities (Farber and Krieg 2002;Morello-Frosch et al. 2002;Wilson et al. 2002) whose residents typically do not participate in research studies in the absence of targeted recruiting. Further, lifestyle factors such as cigarette smoking, alcohol use, and medications can influence the effects of environmental chemical exposures in humans (Anwar 1993;McCauley 1998). Given the potential for differing exposure profiles among pregnant women, coupled with the likelihood that some behaviors will be modified during pregnancy, the possible interactive effects of toxic agents and divergent lifestyle profiles during the periconceptional period (including those that are paternally mediated) must be evaluated. Prospective pregnancy study designs are the only reliable approach for such inquiry.
Additional concerns have been raised regarding the generalizability of prospective pregnancy studies because of research suggesting that women with intended pregnancies have fewer adverse pregnancy outcomes compared with mothers with unintended pregnancies (Piccinino and Mosher 1998). However, data from the National Longitudinal Survey of Youth suggest that differences in pregnancy outcomes by pregnancy intentions might be explained by the women's socioeconomic status rather than by planning status per se (Joyce et al. 2000a).
The conceptualization and measurement of intended or planned pregnancies has recently come under intense scrutiny, with many researchers in the field suggesting that more accurate measures are needed (Klerman 2000;Luker 1999;Sable 1999;Stanford et al. 2002;Trussell et al. 1999). For example, one study reported that 25% of women gave discordant responses to questions designed to assess pregnancy intentions in two large population-based surveys (Kaufmann et al. 1997). Discrepancies in pregnancy intention responses were associated with age, marital status, income, education, parity, time since pregnancy, and pregnancy outcome.
As with any epidemiologic investigation, researchers must weigh the relative importance of external validity in relation to internal validity (Grimes and Schulz 2002;Rothman and Greenland 1998). Given the difficulty in defining the exact size of the population from which participants in prospective pregnancy studies are recruited, empirical evaluation of external validity is often not possible. Although results from prospective pregnancy studies may not be generalizable to all women of reproductive age, they are likely to yield important observations that prompt additional studies.
As demonstrated in other pregnancy-related studies (Wyatt et al. 2002), prospective pregnancy studies with semen collection were successful in obtaining specimens from most male participants (Bonde et al. 1998; Vartiainen et al. 1994; Zinaman et al. 2000). Couple-based studies permit exploration of developmental toxicants that may be mediated through exposure of the embryo or fetus to the components of seminal fluid via intracanicular exposure or by absorbance of seminal fluid components into the bloodstream of the mother (Benziger and Edelson 1983; Sandberg et al. 1968). Semen collection provides the opportunity to measure biological and chemical components of the seminal fluid (Lay et al. 2001; Younglai et al. 2002), perform standard sperm analyses, and even examine spermatozoal gene expression profiles (Ostermeier et al. 2002). The routine collection of semen specimens would further the assessment of human reproductive function, as these data could identify paternally mediated developmental effects. Semen analyses afford an opportunity to identify biomarkers that could delineate causal mechanisms of paternal toxicant exposure and/or fertility.
Our review suggests that study participants were generally willing to participate in studies even when they included time-consuming and/or invasive protocols for extended periods of time. Future studies may yield even higher rates of participation as technologic advances are incorporated into study protocols. Examples of relatively inexpensive technologies that could be implemented include specially programmed handheld devices to record menstrual cycle symptoms (Wyatt et al. 2002), home fertility monitors based on daily urine dipsticks (Behre et al. 2000) or salivary or vaginal probes (Fehring and Schlaff 1998), one-step luteinizing hormone tests (Nielsen et al. 2001), fingerprick blood spots (Worthman and Stallings 1997), home semen collection (Royster et al. 2000), and mouthwash methods for collecting genomic DNA (Lum and Le Marchand 1998). These technologies will be a useful addition to the biomarkers of fecundity and ovulation currently in use (e.g., vaginal mucus and BBT) (Stanford et al. 2002). For example, one recent study suggests early pregnancies can be detected with home pregnancy test kits (Buck et al. 2002). These kits have high sensitivity and specificity for detecting hCG concentrations of 25 mIU/mL, the level anticipated on the day following expected menstruation when conception has occurred (Ehrenkranz 2002). Because the timing of ovulation can vary in healthy women, this approach would be most accurate if used with a marker for ovulation (Wilcox et al. 2001).
Our assessment of the utility and feasibility of prospective pregnancy studies has several limitations. Only published prospective pregnancy studies were summarized for review. Though we made every effort to learn of all large-scale prospective pregnancy studies undertaken to date, both published and unpublished, the possibility remains that we may have missed some studies. Further, although it would have been valuable to be able to include estimates of study costs and personnel, most investigators were unable to provide us with that information.
In summary, recruiting women/couples for prospective pregnancy studies before conception is feasible both for those planning pregnancy and for those at risk of pregnancy. Among the population-based studies of women of reproductive age examined in this review (Bonde et al. 1998; Brown et al. 1997; Ellish et al. 1996), the number of participants divided by the size of the target population ranged from 0.8 to 4%. Using a conservative estimate, it therefore appears that about 120 women of reproductive age would need to be approached to identify one eligible woman/couple planning pregnancy who might be willing to participate in a study of this type. Our review suggests that once recruited, women/couples are often willing to complete very intensive protocols, even if only a modest incentive is provided. In one study, when an urban sample of women was offered a choice of four protocols of varying intensity, 74% opted to participate in the most intensive protocol (Sweeney et al. 1989).
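The number of women to approach is the reciprocal of the yield (participants divided by target population). The Python sketch below uses a hypothetical function name and exact fractions to avoid floating-point rounding. The 0.8% and 4% yields come from the text; the 500-participant recruitment goal is an invented example. The reciprocal of 0.8% is 125. The small difference from the authors' conservative figure of about 120 may reflect rounding of the reported 0.8%.

```python
import math
from fractions import Fraction


def approaches_needed(participants_wanted, yield_percent):
    """Women to approach to enroll `participants_wanted`, given a yield in percent.

    yield_percent is passed as a string (e.g., "0.8") so Fraction parses it exactly.
    """
    yield_fraction = Fraction(yield_percent) / 100
    return math.ceil(participants_wanted / yield_fraction)


for yield_percent in ("0.8", "4"):
    per_participant = approaches_needed(1, yield_percent)
    print(f"{yield_percent}% yield: {per_participant} approached per participant")

# Hypothetical goal: enroll 500 women/couples at the conservative 0.8% yield.
print(approaches_needed(500, "0.8"))
```

Rounding up with `math.ceil` gives the smallest whole number of approaches expected to reach the goal at the stated yield.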
As previously noted, individuals from underrepresented minority or economically disadvantaged groups should be targeted for recruitment, given their potentially higher risk of exposure to toxicants and possibly greater susceptibility (Sexton 1997). In so doing, investigators should consider factors reported to enhance participation, such as building trust with community participants (Shavers et al. 2002). Finally, couples experiencing fecundity-related impairments, including those undergoing assisted reproductive technologies, might represent another group suitable for study, in that exposure(s) to toxicants may be impairing their ability to conceive or carry a pregnancy to term.