Statistical Issues: Barr et al. Respond

Mage et al. criticize our article (Barr et al. 2006), stressing six “… factual and conceptual errors that need to be called to the readers’ attention.” We appreciate their careful reading of our work, and they raise several important points regarding survey design. However, we take issue with some of their statements. 
 
Many investigations are designed to generalize the results of the research performed within a sample population to a larger population. In these types of investigations, enrollment of a representative sample is a necessary condition for making inferences to the larger population through known selection probabilities that are then used for applying sampling weights to study results. However, in order to generalize the results, these studies must have an adequate sample size, high response rate, and, importantly, a preliminary assessment of whether the factors for probability selection and weighting will be relevant to the condition being measured. Although representative samples are desirable and have been achieved in many studies, some studies in rare or difficult-to-reach populations cannot practically meet the criteria mentioned above. 
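How known selection probabilities become sampling weights can be illustrated with a minimal sketch. It assumes a simple weighted (ratio-type) mean in which each sampled unit is weighted by the inverse of its selection probability; the function name, exposure values, and probabilities are hypothetical and are not drawn from any study discussed here.

```python
def weighted_mean(values, selection_probs):
    """Weighted mean with each unit weighted by 1 / its selection probability."""
    weights = [1.0 / p for p in selection_probs]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical exposure measurements: the first two units come from a stratum
# sampled with probability 0.5, the last two from a stratum sampled at 0.1.
values = [10.0, 12.0, 2.0, 4.0]
selection_probs = [0.5, 0.5, 0.1, 0.1]

unweighted = sum(values) / len(values)            # treats the sample as the population
weighted = weighted_mean(values, selection_probs)  # rescales to the sampled population
```

The two numbers differ because the oversampled stratum is scaled back down. Without known selection probabilities, as in a convenience sample, no such correction is available.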
 
Farmworkers are often not the ones applying pesticides (i.e., they are not applicators); quite often these farmworkers are unaware of the actual pesticides being applied or when they are applied. The potential for undue exposure may be more likely if farmworkers are not properly informed of the application or reentry times or if they do not understand the potential exposure scenarios. 
 
Research investigations involving farmworker exposures can present particular difficulties in selecting a representative sample of the population. Data for developing relevant sampling frames and selection probabilities are often limited by demographic and work factors. Obtaining high response rates for farmworkers can also be challenging because of problems associated with access, high mobility, geographic dispersion, trust, and cultural practices (Arcury et al. 2006). However, these populations remain important and potentially vulnerable populations that should be included in research investigations, even if the conditions for using a probability sample cannot always practically be met. 
 
In any particular situation, the decision to use probability sampling will depend on the hypothesis. However, important and relevant research questions can be investigated without selecting a representative sample of the population, as noted by Mage et al. regarding particle exposure studies in high-risk subpopulations. Nonprobability samples can provide useful information on particular hard-to-reach populations, for intensive examination of conditions and factors related to exposures, or for hypothesis generation. By forcing all studies to conform to the same design, we may not be able to answer specific research questions. 
 
The studies cited by Mage et al. were designed as probability samples, but each had differential drop-out rates during selection and sampling. Potential drop-out non-representativeness can be accounted for if the effect on exposure or the outcome variable is known. However, in some populations, the details of the accounting are not easily accomplished. Mage et al. cited the National Health and Nutrition Examination Survey (NHANES) as “another excellent example of proper probability-based sample selection.” However, even NHANES III (1988–1994) used a nonprobability sample for the environmental subset (Hill et al. 1995). These data provided an invaluable first look at U.S. population exposures and served as a basis to add statistical sampling for environmental chemicals to the current NHANES series. These data have also been used to estimate doses in the U.S. population for comparison to the U.S. Environmental Protection Agency’s reference doses (Mage et al. 2004). In fact, it is often difficult to design a population-based study without preliminary data. 
 
We have participated in the design and implementation of studies using both probability and nonprobability sampling that have added invaluable information on various population exposures. We recognize the practical difficulties and challenges for meeting the criteria for representative sampling in farmworker populations and also the important information that such studies can provide. Although we agree that without a probability sample, the results should only apply to individuals in studies and should not be generalized to a population, we disagree with the contention made by Mage et al. that no useful information can come from studies using samples that do not fulfill their criteria.

Should hormesis, as Thayer et al. (2006) implied in the title of their letter in the November 2006 issue of Environmental Health Perspectives, be dismissed by scientists, regulators, and others as simply a new faith-based religion? No. Hormesis is a data-based biological reality, one that challenges the low-dose assumptions that currently drive risk assessment processes used by regulatory and public health agencies worldwide.
As we discussed in our recent commentary (Cook and Calabrese 2006), we believe that default assumptions, however well intentioned, should not trump data in the formulation of public health policy. Published scientific information supporting the hormetic nonmonotonic dose-response curve is extensive. The most recent comes from an article based on a large National Cancer Institute antitumor drug screening database, which reports that effects at low-level exposures are inconsistent with the threshold model and supportive of the hormetic model.
We believe the current regulatory mandated approach of narrowly gathering effect data at high doses of exposure and then dogmatically imputing an excess burden of harmful outcomes monotonically down to and below the markedly lower levels that actually occur in the environment is wrong. This approach is wrong because it censors the observations that can be considered (only high-dose adverse effects and often just the worst-case sentinel effect) and requires the use of nonscientific assumptions that are either untested or untestable. The hormetic model addresses both of those shortcomings. It encourages the collection of data across a broader range of dose and thereby allows evaluation of both risks and benefits (specific and holistic) that would occur at these lower levels. In addition, findings based on the hormesis model are subject to tests using empirical data.
Without evidence, Thayer et al. (2006) argued that we were wrong to suggest that public health might be better served by setting exposure standards at levels using data collected based on the hormetic model. We strongly disagree. With the additional information, we believe policies could be developed that would not only prevent excess disease or death over background but also promote better health, quite possibly for both the general public and more sensitive subgroups.
Although we differ with Thayer et al. (2006) on a number of points, we all seem to agree that hormesis exists. Building on that consensus, perhaps we all can also agree with the perspective recently presented by Rietjens and Alink (2006): the discipline of toxicology should refocus its efforts to better address the regulatory issues of low-dose effects and risk-benefit analysis.

Statistical Issues in Farmworker Studies
Barr et al. (2006) surveyed statistical issues related to farmworker exposure studies. However, they made several factual and conceptual errors that need to be called to the readers' attention.
First, Barr et al. (2006) claimed incorrectly that "representativeness" is optional and not a necessary condition for a well-designed investigation. For convenience samples, "[T]he results only pertain to the sample itself, and should not be used to make quantitative statements about any population - including the population from which the sample was selected" [U.S. Environmental Protection Agency (EPA) 2003]. Barr et al. (2006) stated that "because responses from convenience samples are likely to be better than that for a representative sample, they may actually be more 'representative.'" The fallacy of this statement is shown by a hypothetical CNN call-in response to a question from 100% of its viewers that perfectly represents all CNN viewers. In this illustrative example, the 100% response would not represent the entire population of the United States as well as a probability-based survey of the U.S. population that included non-CNN viewers and achieved an 80% response rate.
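The call-in illustration can be sketched numerically. Everything in the block is a hypothetical assumption made for illustration: the population size, the share of viewers, the proportions answering "yes," and, importantly, the assumption that nonresponse in the probability sample is unrelated to the answer.

```python
import random

rng = random.Random(0)

# Hypothetical population of (is_viewer, answers_yes) pairs:
# 10% are viewers (70% of whom say yes), 90% are not (40% say yes).
population = (
    [(True, True)] * 7_000 + [(True, False)] * 3_000
    + [(False, True)] * 36_000 + [(False, False)] * 54_000
)
truth = sum(yes for _, yes in population) / len(population)

# Call-in poll: every viewer responds (100% response), nobody else is reached.
callin = [yes for viewer, yes in population if viewer]
callin_estimate = sum(callin) / len(callin)

# Probability sample: simple random sample of the whole population,
# with each selected person responding with probability 0.8.
sample = rng.sample(population, 2_000)
respondents = [yes for _, yes in sample if rng.random() < 0.8]
probability_estimate = sum(respondents) / len(respondents)
```

Under these assumptions the call-in estimate describes viewers exactly and the population poorly, while the probability sample with incomplete response stays close to the population value.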
In their article, Barr et al. (2006) claimed that "perfectly random sampling across all relevant factors is therefore almost universally impractical." However, Acquavella et al. (2004) monitored a probability sample of pesticide applicators; the U.S. EPA provided several TEAM (Total Exposure Assessment Methodology) studies using a scientific probability design (Thomas 1993; Wallace 1991; Wallace et al. 1987), as did the World Health Organization, U.S. EPA, and Harvard University for the government of Kuwait during the 1991 oil fires (Mage DT, Wallace LA, Kollander M, personal communication). The Centers for Disease Control and Prevention's (CDC) National Health and Nutrition Examination Survey study (CDC 2003) is another excellent example of proper probability-based sample selection.
According to Barr et al. (2006), it is possible to identify and "sample known or anticipated 'hot spots' of [pesticide] exposure." There are only two categories of applicators expected to be at high risk of a high pesticide exposure event: the inexperienced applicators who are still learning how to apply pesticides safely, and those applicators who do not follow the mandatory manufacturer's label requirements in violation of federal law (Mage et al. 2002). Whereas the former cohort might be identified by a screening question about prior numbers of applications, there is no certain way to identify the latter group, who will likely not admit to taking shortcuts or refusing to use required personal protective equipment, because they might be incriminating themselves. Finally, such an applicator may succumb to the Hawthorne effect [not mentioned by Barr et al. (2006) as a caveat], defined by Last (1988) as "the effect of being under study upon the persons being studied."
Barr et al. (2006) claimed that "some form of convenience sampling is typically adopted in practice." Unfortunately, this claim is true; some of these authors did use convenience sampling in previous studies (Curwin et al. 2002, 2005) in which subjects were recruited by "word of mouth." A friend or neighbor recruited by an enrolled subject might not be "an independent sample" if he or she has some similar characteristics (e.g., crops grown, acreage, age, race, education, sex) as the recruiter. This haphazard practice of using volunteers for convenience, or even subjects based on expert choice, limits the validity of the study, as theoretical confidence intervals and significance p-values become meaningless.
VOLUME 114 | NUMBER 12 | December 2006 • Environmental Health Perspectives

Perspectives Correspondence
The correspondence section is a public forum and, as such, is not peer-reviewed. EHP is not responsible for the accuracy, currency, or reliability of personal opinion expressed herein; it is the sole responsibility of the authors. EHP neither endorses nor disputes their published commentary.

"The weakness of all nonprobability sampling is its subjectivity that precludes the development of a theoretical framework for it." (Kalton 1983)
Finally, as former U.S. EPA scientists who pioneered agency exposure science, we are disappointed that this article was cleared for publication by the U.S. EPA because it is not in accordance with U.S. EPA (and other agency) requirements to follow the Office of Management and Budget's (OMB) data collection policies (OMB 2006) that require "selecting samples using generally accepted statistical methods (e.g., probabilistic methods that can provide estimates of sampling error)." The U.S. EPA (2003) stated: "Probability sampling must be used at each stage of respondent selection. You may encounter difficulties in clearing the survey through OMB if you do not insist that probability selection methods be used."
Recent samples of high-risk subpopulations and their exposures to particles were undertaken by the U.S. EPA using doctor-identified subjects, and these were therefore not probability-based samples. The OMB allowed these studies but required that a statement be made in all resulting publications that the results could be applied only to the participants, even if chosen in this case by expert judgment, and must not be extrapolated to larger populations. We believe a similar statement should be made in all publications of studies using alternatives to probability-based sampling.
In summary, Barr et al. (2006) attempted to review survey design practices, but they do not seem to understand that the convenience samples they advocate apply only to the subjects selected and not to the larger populations from which they are taken.

Cox Models for Ecologic Time-Series Data?
A recent article proposed using Cox regression with time-dependent covariates to estimate the acute health effects of air pollution. The authors' results were similar to those they obtained in a previous case-crossover analysis (Filleul et al. 2004), and they claimed that the Cox model approach is more precise. Understanding their results and why the claim is misleading requires considering how case-crossover and Cox model analyses work.
The case-crossover design requires a choice of referent strategy or a method for choosing control time periods (referent windows). With a valid referent strategy, that is, a localizable design (Janes et al. 2005a, 2005b), a conditional likelihood is constructed by conditioning on the number of events experienced by each person over the study period. Conveniently, there is no information on the exposure effect from people who do not have an event, so no information is lost by dropping them from the analysis. The information comes from variations in exposure within person and within referent window. We must assume that all variables that confound the variation in risk within an individual across a referent window have been measured. The estimated β is the value that equates the exposure on the index day to its expected value over the referent window, averaged over all subjects.
The Cox model uses the same principle of equating the observed and expected exposure, but across people rather than within a person. Time points with no events do not contribute information for estimating the exposure effect and may be discarded. The information comes from comparisons between people at the same point in time. We must assume that all variables that confound variation in risk between individuals at the same point in time have been measured. The estimated β is the value for which the exposure for the person with the event equals its expected value over the at-risk cohort, averaged over all time points.
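In symbols, the two estimating principles just described can be sketched as follows. The notation is introduced here for illustration only and is not taken from the cited articles: x is a single exposure covariate, W_i is the referent window of subject i (including the index time t_i), R_j is the set of subjects at risk at the j-th event time t_j, i_j is the subject who has that event, and ties and additional covariates are ignored.

```latex
% Case-crossover (within person): observed minus expected exposure over each referent window
\sum_{i} \left[ x_{i}(t_i)
  - \frac{\sum_{s \in W_i} x_{i}(s)\, e^{\beta x_{i}(s)}}{\sum_{s \in W_i} e^{\beta x_{i}(s)}} \right] = 0

% Cox model (between persons): observed minus expected exposure over each risk set
\sum_{j} \left[ x_{i_j}(t_j)
  - \frac{\sum_{k \in R_j} x_{k}(t_j)\, e^{\beta x_{k}(t_j)}}{\sum_{k \in R_j} e^{\beta x_{k}(t_j)}} \right] = 0
```

The two equations have the same form and differ only in the comparison set: other times for the same person in the first, other people at the same time in the second. If every subject shares the same exposure at a given time, each bracketed term in the second equation is zero for any β, so a Cox analysis on that time scale carries no information about β.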
If the same time scale is used for the case-crossover and Cox analyses, the two sets of information do not overlap: the case-crossover analysis is purely within person; the Cox model analysis is purely between persons. When exposure measurements vary both over time and by individual, the two analyses provide independent estimates of risk. In a data set that includes only chronic exposure measurements, there is no temporal exposure variation so the Cox model captures all the information. Conversely, in an ecologic time-series data set, there is no variation in exposure between people at a given time; therefore, the case-crossover analysis uses all of the information.
In order to estimate acute effects with ecologic exposure measurements using a Cox model, the authors used age as the time scale. That is, they chose β, so that the exposure for an individual who died at a given age is equal to the average exposure for at-risk individuals at exactly that age. Because all individuals have the same exposure measurement on any given day, this is equivalent to comparing exposure on the day of death with exposure on a selected set of other days determined by the dates other members of the cohort reach that age. That is, it is a case-crossover design, albeit one with an unusual choice of referent strategy. Note also that the Cox regression estimating equations are exactly the same as those used in conditional logistic regression, making the case-crossover and Cox regression estimates identical. This Cox model approach is a case-crossover design. Theoretical development is needed to determine whether it is a localizable design. It is more efficient than a semisymmetric bidirectional case-crossover design only because more referent time points are used.
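The equivalence argued above can be made concrete with a minimal sketch. It assumes integer ages in days, no delayed entry into the cohort, no tied deaths, and a single pollution series shared by everyone; the function names and the toy data are hypothetical and are not taken from the analyses under discussion.

```python
import math


def referent_days(case, birth_day, exit_age):
    """Calendar days on which at-risk cohort members reach the case's age at death.

    The case's own day of death is included, as in a Cox risk set.
    """
    age = exit_age[case]
    return [birth_day[k] + age for k in range(len(birth_day)) if exit_age[k] >= age]


def score(beta, pollution, birth_day, exit_age, died):
    """Cox partial-likelihood score with age as the time scale.

    Because exposure depends only on the calendar day, each risk-set term
    compares pollution on the day of death with pollution on the referent days.
    """
    total = 0.0
    for i in range(len(birth_day)):
        if not died[i]:
            continue
        days = referent_days(i, birth_day, exit_age)
        weights = [math.exp(beta * pollution[d]) for d in days]
        expected = sum(w * pollution[d] for w, d in zip(weights, days)) / sum(weights)
        total += pollution[birth_day[i] + exit_age[i]] - expected
    return total


# Toy cohort: calendar day of birth, age (in days) at death or censoring, death indicator.
birth_day = [0, 2, 5]
exit_age = [10, 9, 12]
died = [True, True, False]
pollution = [float(d) for d in range(18)]  # hypothetical daily series indexed by calendar day

days_for_second_death = referent_days(1, birth_day, exit_age)
score_at_zero = score(0.0, pollution, birth_day, exit_age, died)
```

In this sketch the "risk set" for each death is simply a list of calendar days, which is what a referent window is. A subject who dies at an age that no one else in the cohort reaches has only their own day in that list and contributes nothing to the score.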
We see at least two potential biases associated with this design. First, it is not clear that the strong seasonality and time trends in air pollution and mortality data are controlled with this referent strategy; typically, referent windows are designed to be small to control for time-dependent confounders by design. This referent strategy necessitates controlling such factors by modeling, as these authors have done. Second, there may be minor bias due to subjects who die very young or very old being dropped from the analysis because they have no referents (no one else is at risk at that age).
We read with interest the letter by Lumley et al. regarding our article, and we appreciate their comments and interesting suggestions.
Our results showed that the Cox model approach gave more precise results for cohort data than the case-crossover design. As stated by Lumley et al., the Cox model is more efficient than the semisymmetric bidirectional case-crossover design because more referent time points are used. In fact, because the case-crossover design is a within-people approach, people who do not have the event are not included in the analysis, whereas they are in the Cox model and so contribute to the information for estimating the exposure effect.
Lumley et al. specify that the estimating equations for the Cox regression are the same as those used in the conditional logistic regression for the case-crossover design and that we applied them to the same data. Despite that, we cannot say that the Cox model is a case-crossover design with an unusual choice of referent strategy. As we stated in our article, the results of both approaches are very similar, and when a cohort is available, the Cox model should be applied because survival analysis uses all available information and increases the power of the study. The case-crossover analysis is a within-person approach; the referent time points are chosen by the operator, and the design is the same for all the subjects. The Cox model is a between-people approach; the referent time points cannot be chosen because they depend on the number of live subjects who will be included in the risk set, which varies at each time of death. Moreover, with age used as the basic time scale, the dispersion of the referent time points included in the risk set around the time of death varies at each age of death. Otherwise, the number of referent time points is almost always higher in the Cox model than in the case-crossover design (i.e., two referent time points in the bidirectional design).
We do not agree with the statement of Lumley et al. that "They chose β; thus, the exposure for an individual who died at a given age is equal to the average exposure for at-risk individuals at exactly that age."
In fact, we assess β as the exposure for a person who died at a given age compared with the exposures for at-risk people at exactly that age: β is the mean effect for an increase in air pollution concentration on the mortality, whatever the age. Thus, in both cases, when the exposure is either a chronic measurement or an ecologic time-series data set, the Cox model captures all of the information available, whereas the case-crossover design cannot be used with chronic exposure measurements. Therefore, the Cox model should prove particularly useful in the future to simultaneously analyze both the chronic (long-term) and the short-term effects of air pollution concentrations.
Concerning the first possible bias noted by Lumley et al., the adjustment of the results for the seasonality effect and for time trends in air pollution concentration is more of an advantage than a disadvantage. These pieces of information are very easy to take into account with truncated power basis splines (Heuer 1997) without data collection. Moreover, this process allows for the assessment of the magnitude of these effects, which is not possible with the case-crossover design. The second bias noted by Lumley et al. on the extreme age of death is a very minor bias that was not present in our study. This bias appears only if there is no risk set for the first or the last subject who has the event.
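For readers unfamiliar with the term, a truncated power basis can be sketched as below. This is the generic textbook construction, offered as an assumption about the kind of basis meant; the degree, knot positions, and function name are hypothetical and are not taken from Heuer (1997) or from the study itself.

```python
def truncated_power_basis(t, knots, degree=3):
    """Return [t, t^2, ..., t^degree, (t - k1)_+^degree, (t - k2)_+^degree, ...].

    (u)_+ denotes max(u, 0), so each knot term is zero before its knot
    and a smooth polynomial after it.
    """
    polynomial = [t ** p for p in range(1, degree + 1)]
    truncated = [max(t - k, 0.0) ** degree for k in knots]
    return polynomial + truncated


# Example: basis values at calendar time t = 3.0 with hypothetical knots at 1.0 and 4.0.
row = truncated_power_basis(3.0, knots=[1.0, 4.0])
```

Evaluating such a row for each calendar day and adding the columns as covariates lets a regression model fit a smooth seasonal or long-term trend in calendar time.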
Furthermore, numerous results from time-series studies have shown an association between mortality and particulate air pollution, and the results observed were similar (Filleul et al. 2001; Goldberg et al. 2001; Samet et al. 2000). Despite that, causality was discussed (Filleul et al. 2003) and statistical methods have sometimes been criticized. For example, generalized additive models using nonparametric smoothing could lead to biased estimates and to underestimation of the true variance (Dominici et al. 2002; Ramsay et al. 2003). Thus, using the Cox model could be an alternative approach if data are available. Our study is the first in which a Cox model has been used to study the short-term effect of air pollution. We found that the Cox method and case-crossover design gave the same results as time series. This information supports the hypothesis of a causal relationship between mortality and air pollution.