The question of declining sperm density revisited: an analysis of 101 studies published 1934-1996.

In 1992 Carlsen et al. reported a significant global decline in sperm density between 1938 and 1990 [Evidence for Decreasing Quality of Semen during Last 50 Years. Br Med J 305:609-613 (1992)]. We subsequently published a reanalysis of the studies included by Carlsen et al. [Swan et al. Have Sperm Densities Declined? A Reanalysis of Global Trend Data. Environ Health Perspect 105:1228-1232 (1997)]. In that analysis we found significant declines in sperm density in the United States and Europe/Australia after controlling for abstinence time, age, percent of men with proven fertility, and specimen collection method. The declines in sperm density in the United States (approximately 1.5%/year) and Europe/Australia (approximately 3%/year) were somewhat greater than the average decline reported by Carlsen et al. (approximately 1%/year). However, we found no decline in sperm density in non-Western countries, for which data were very limited. In the current study, we used similar methods to analyze an expanded set of studies. We added 47 English language studies published in 1934-1996 to those we had analyzed previously. The average decline in sperm count was virtually unchanged from that reported previously by Carlsen et al. (slope = -0.94 vs. -0.93). The slopes in the three geographic groupings were also similar to those we reported earlier. In North America, the slope was somewhat less than the slope we had found for the United States (slope = -0.80; 95% confidence interval (CI), -1.37 to -0.24). Similarly, the decline in Europe (slope = -2.35; CI, -3.66 to -1.05) was somewhat less than reported previously. As before, studies from other countries showed no trend (slope = -0.21; CI, -2.30 to 1.88). These results are consistent with those of Carlsen et al. and our previous results, suggesting that the reported trends are not dependent on the particular studies included by Carlsen et al. and that the observed trends previously reported for 1938-1990 are also seen in data from 1934-1996.

In 1992 Carlsen et al. (1) stated that "...reports published worldwide indicate clearly that sperm density has declined appreciably during 1938-1990." Subsequently, this conclusion has been supported by findings from some studies (2-4), but not by others (5-7). The critical issues raised concerning this study fall, broadly, into three categories. Some authors suggested that poor or highly variable data invalidated any inference about trends in sperm counts (8,9). Others questioned the validity of the statistical methods used in this analysis (8,10,11). Bias due to changing study populations (12) or confounding by factors such as age and abstinence time (time between sample collection and last ejaculation) were also suggested (4,8).
We conducted several analyses designed to examine these concerns. The first, published in 1997, reanalyzed the studies used by Carlsen et al. (1) to examine model selection, confounding, and selection bias (13). In that paper, we noted that estimates of mean sperm density from the United States and Europe declined somewhat more rapidly than had been reported by Carlsen et al. (1). In other parts of the world, where studies were few and most were quite recent, there were insufficient data to evaluate this question. We also found that controlling for confounding bias, to the extent possible, provided additional support for the conclusions of Carlsen et al. (1) rather than reducing the estimated decline in sperm density. In the second analysis, published in 1999, we looked at sperm counting methods and the reliability of measurements from these historical studies (14). We found no evidence that counting methods had changed appreciably or that counts from older studies were less reliable than those from recent studies.
The current study extends our previous analyses in three ways. First, we conducted an independent literature review to evaluate possible bias in the selection of studies used by Carlsen et al. (1). Second, we examined the robustness of the models utilized in that analysis (and ours) by applying these models to an expanded data set. Finally, we assessed the consistency of post-1990 data with trends in sperm density from studies published before 1990.

Analysis of Carlsen et al. study. Carlsen et al. (1) screened studies published from 1930 to mid-1990 to identify studies that included estimates of sperm density. They excluded studies that included men in infertile couples, men who were referred because of genital abnormalities, and studies that selected men on the basis of their sperm count. Studies that used nonmanual methods for counting sperm were also excluded. Carlsen et al. (1) included 61 studies published between 1938 and 1990. The authors estimated the rate of change in mean sperm density as a function of publication year by fitting a simple regression model.
Current analysis. The current analysis includes 54 of the 61 studies analyzed by Carlsen et al. (1). As in our previous paper (13), we excluded three non-English language studies (15-17) because it was not practical for us to systematically review the non-English language literature on this subject. We also excluded two studies that included men who conceived only after an infertility work up (18,19), because these studies did not meet the eligibility criteria of Carlsen et al. (1). Finally, we did not include any studies with fewer than 10 subjects, which resulted in two additional exclusions (20,21). The most recent study in Carlsen et al.'s analysis (1) and our 1997 reanalysis (13) was published in June 1990. To extend the study period, we conducted a search of Medline (National Library of Medicine, Bethesda, MD) for English-language studies published between 1990 and 1996 and found 19 that met these eligibility criteria. We also conducted a less systematic search of the 60-year period 1930-1990 and identified 28 additional eligible studies. Therefore, the current analysis is based on 101 English-language studies published in 1934-1996 (54 "Carlsen" studies and 47 "non-Carlsen" studies), each with at least 10 men and all satisfying the eligibility criteria published by Carlsen et al. (1). The 47 "non-Carlsen" studies are summarized in the Appendix.
Each of these 101 studies was reviewed independently by two of us to systematically abstract detailed information on potential confounders and several measures of semen quality. These variables included mean (or median) sperm density, publication year, study location (state and country), study goal (to estimate population parameters, other), criteria for recruiting study subjects (proven fertility, prevasectomy, potential sperm donor, other), percent of men with proven fertility, semen collection method (masturbation into container, other, unspecified), sperm counting methods (manual, not reported), number of samples per individual, age (mean or range), and abstinence time (mean or range, protocol requirement if applicable). The completeness of the data for each variable was also recorded.
Previous analyses, including ours (13), have looked at the trend in sperm density as a function of publication year. However, because time of sample collection always predated publication, often by several years, we decided to use the time of sample collection, or its estimate, rather than the year of study publication. For the 22 studies that reported the beginning and end of the sample collection period, which often spanned several years, we used the midpoint to estimate the year of sample collection. The median lag time from the midpoint year to publication was 3 years for these studies. Therefore, when the dates of sample collection were unavailable, we subtracted 3 years from the publication year to estimate the year of sample collection. Finally, to obtain intercepts that were more easily interpretable and to aid in convergence of more complex models, we subtracted 1900 from the estimated year of sample collection.
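The rule for estimating the year of sample collection can be sketched in a few lines of Python. This is only an illustration of the procedure described above, not the code used for the analysis; the function and parameter names are hypothetical.

```python
from typing import Optional

MEDIAN_LAG_YEARS = 3  # median lag from collection midpoint to publication (from the text)
YEAR_OFFSET = 1900    # subtracted so that intercepts are easier to interpret


def estimated_collection_year(publication_year: int,
                              collection_start: Optional[int] = None,
                              collection_end: Optional[int] = None) -> float:
    """Return the estimated year of sample collection as years since 1900.

    The argument names are hypothetical; only the rule itself comes from the text.
    """
    if collection_start is not None and collection_end is not None:
        # Collection period reported: use its midpoint.
        year = (collection_start + collection_end) / 2
    else:
        # Collection dates unavailable: subtract the median lag from the publication year.
        year = publication_year - MEDIAN_LAG_YEARS
    return year - YEAR_OFFSET
```

For example, a study published in 1934 with no reported collection dates is assigned 1931, which is coded as 31.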
The arithmetic mean sperm density was reported in all but six studies. For these six studies we estimated the difference between the arithmetic mean and the reported summary measure (median or geometric mean) using data from studies for which multiple summary measures were available. For the five studies that reported median sperm density only, we estimated the arithmetic mean by adding 12.0 to the median, whereas for the single study that reported only a geometric mean, we added 22.7 to approximate the arithmetic mean.
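The conversion to an approximate arithmetic mean amounts to adding a fixed offset that depends on the summary measure reported. A minimal Python sketch follows, using the two offsets given above; the dictionary keys and the function name are hypothetical labels chosen for illustration.

```python
# Offsets (in units of 10^6/mL) added to a reported summary measure to
# approximate the arithmetic mean. The values 12.0 and 22.7 are from the text;
# the key names are hypothetical.
OFFSET_TO_ARITHMETIC_MEAN = {
    "arithmetic_mean": 0.0,
    "median": 12.0,
    "geometric_mean": 22.7,
}


def approximate_arithmetic_mean(value: float, measure: str) -> float:
    """Approximate the arithmetic mean sperm density from the reported measure."""
    if measure not in OFFSET_TO_ARITHMETIC_MEAN:
        raise ValueError(f"unknown summary measure: {measure}")
    return value + OFFSET_TO_ARITHMETIC_MEAN[measure]
```

Under this rule, a study reporting only a median of 60 × 10^6/mL would enter the analysis with an approximate mean of 72 × 10^6/mL.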
We followed an analysis strategy similar to the one we used previously (13). After conducting a simple linear regression, we stratified the 101 studies into three broad geographic groupings: North America (44 studies, published 1934-1996), Europe (34 studies, published 1949-1996), and other countries (23 studies, published 1978-1995). We then used multiple regression models (using procedures for linear and nonlinear regression as well as generalized linear models) to fit linear, step, spline, and quadratic models (22). In these models we included confounders that were related to sperm density and/or year in univariate analyses. Interactions between year and region, which can indicate geographic differences in the rates at which sperm density changed, were examined in all multiple regression models. To assess the extent to which each variable confounded the relationship between sperm density and year, we calculated the slope (in the model without interaction terms) with and without that variable included in the model. The magnitude of confounding is estimated by the degree of discrepancy between these two estimates (23). As with previous analyses, data from each study were weighted by the number of men included in that study, and sperm densities are given in units of 10^6/mL.
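As an illustration of the weighting and of the confounding measure, the following Python sketch fits a simple weighted least-squares line (each study weighted by its number of men) and expresses the discrepancy between two slopes as a percent change. It is a simplified sketch, not the multiple regression procedure used here: the data are made up, the function names are hypothetical, and taking the slope without the covariate as the reference value is an assumption.

```python
from typing import Sequence, Tuple


def weighted_linear_fit(x: Sequence[float], y: Sequence[float],
                        w: Sequence[float]) -> Tuple[float, float]:
    """Weighted least-squares fit of y = intercept + slope * x."""
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def percent_change_in_slope(slope_without: float, slope_with: float) -> float:
    """Percent change in the slope when a covariate is added.

    Assumption: the slope from the model without the covariate is the reference.
    """
    return 100.0 * (slope_with - slope_without) / slope_without


# Made-up illustrative data: year (minus 1900), mean density (10^6/mL), number of men.
years = [40.0, 60.0, 80.0]
densities = [110.0, 90.0, 70.0]
n_men = [10, 200, 50]
intercept, slope = weighted_linear_fit(years, densities, n_men)
```

Because the made-up points lie exactly on a line, the weights do not change the fitted slope here; with real study means, larger studies pull the line toward their values.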

Results
The estimated year of sample collection in these 101 studies ranged from 1931 to 1994 (publication year 1934-1996). As shown in Table 1, the majority of new studies were published after 1980. Mean sperm density and mean publication year from studies with and without information about year(s) of sample collection did not differ appreciably. The geographic distribution of these 101 studies, representing 28 countries and 19 states within the United States, was similar to that in previous analyses, but with a somewhat greater proportion of European studies (Table 2). We made two changes in our geographic strata; the stratum we previously labeled United States is now denoted as North America in order to include a (new) Canadian study. In addition, Australia, which was previously included with European studies, is now included with "other countries."
Simple linear model. For comparison with Carlsen et al. (1), we first replicated their simple linear regression. As shown in Table 3, the slope for the regression line in the expanded data set (-0.94 × 10^6/mL per year; p < 0.0001) is very similar to that found for the original 61 studies (-0.93 × 10^6/mL per year; p < 0.0001). These estimates differ only slightly from the slope we reported in our 1997 analysis (13) (-0.95 × 10^6/mL per year; p < 0.0001). The fit of the regression line to the 101 data points is shown in Figure 1.
Assessing confounding and interaction. To select variables for our analysis, we initially included all variables for which we had abstracted data and we noted the percent change in the slope that resulted when we removed them one at a time. Several of these were unrelated to sperm density or publication year (change < 10%) and were dropped from further analysis. These variables were the number of samples per subject, whether the years of sample collection were reported, whether the arithmetic mean was reported, and purpose of the study. Although removing age changed the slope by only 1.2%, we included this variable in the final model because it is a basic demographic variable often included in analyses of sperm density. The method of counting sperm was also included (although removing it changed the slope by only 6%). In this expanded set of studies, recruiting criteria and the percent of men with proven fertility were highly correlated, so only one of these variables (fertility) was retained for further analyses. The following variables were included in all subsequent multiple regression models: geographic region, age, abstinence time, percentage of men with proven fertility, method of counting sperm, and method of sample collection (Table 4). Of these, all but the method of counting sperm had been included in our previous analysis (13). Because one of the goals of this study was to examine the effect of adding new studies, we also kept a variable that indicated whether the study had been included by Carlsen et al. (1), even though removing it from the model had little effect on the slope. Despite the incompleteness of data on many covariates, the inclusion of the variables contained in Table 4 did improve model fit.
When the simple linear model was compared with the multivariate linear model, including these covariates, the adjusted R^2 increased from 0.22 to 0.59. In addition to including these covariates singly, we examined interaction terms to allow for different slopes in the three geographic regions. In our previous analysis (13), the three slopes that we estimated differed considerably (-1.50, -3.13, and +1.56, respectively, for the United States, Europe/Australia, and other countries). In the current analysis the European slope (-2.35) still differed from that for North America (difference in slopes -1.55; CI, -2.90 to -0.21), indicating significant interaction (Figure 2). Although we did include the slope of the best fitting line for other countries (-0.60), the fit to a linear model for data from these countries was not good and the confidence interval was very broad. Given the limited data, there was no evidence that this slope differed appreciably from those from other regions (Table 5).
Nonlinear models. We also fit a number of nonlinear models (quadratic, spline, and step) using the same set of covariates that were used for the linear model (Table 4). Olsen et al. (8) suggested that these models were preferable to Carlsen's simple linear model (1). In our 1997 analysis (13), we showed this was not the case, once geographic region and the interaction of region and year were included in the model.
In our previous study (13), we had not seen any difference between the spline and linear models except a slight (nonsignificant) change in the United States post-1970 (from -1.52 to -1.47; p for spline term = 0.97). When a spline model was fit to the current expanded data set, the pre-1970 North American studies showed a somewhat steeper decline than those published after 1970 (-0.93 vs. -0.55), although this difference was still not significant (p for spline term = 0.71).
In our 1997 analysis (13), quadratic terms could not be estimated and we found no evidence of curvature within any of the three regions studied. In the present analysis, it was possible to estimate the quadratic term, but its addition did not improve the fit of the model; the quadratic terms were negligible and none approached statistical significance. Thus, we again found no evidence to support either curvature, or a "leveling off" in the rate of decline in recent years.
In our 1997 analysis (13), we also fit a step function and found a significant post-1970 decrease in sperm density in all regions relative to pre-1970 data (which were entirely from the United States). Again, results were similar in the current analysis. When a step function was fit, comparing the mean sperm density for North America before and after 1970, a large step was seen (138 × 10^6/mL vs. 113 × 10^6/mL; p for difference < 0.001). The pre-1970 mean from North America was also significantly higher than the mean for studies from other countries (p < 0.001), whereas the mean for all post-1970 European studies fell between the pre- and post-1970 North American means.
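In its simplest, unadjusted form, a step comparison reduces to two weighted means, one on each side of the step year. The Python sketch below illustrates this under stated assumptions: it ignores the covariates included in the multiple regression models, it assigns a study dated exactly 1970 to the later period (a choice the text does not specify), and it uses made-up data and hypothetical names.

```python
from typing import Iterable, Sequence, Tuple

STEP_YEAR = 70  # 1970 expressed as years since 1900


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Mean of values weighted by study size."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def step_means(studies: Iterable[Tuple[float, float, int]],
               step_year: float = STEP_YEAR) -> Tuple[float, float]:
    """Return (mean before the step, mean from the step onward).

    Each study is (year since 1900, mean density in 10^6/mL, number of men).
    Assumption: a study dated exactly at the step year counts as 'after'.
    """
    studies = list(studies)
    before = [(d, n) for y, d, n in studies if y < step_year]
    after = [(d, n) for y, d, n in studies if y >= step_year]
    mean_before = weighted_mean([d for d, _ in before], [n for _, n in before])
    mean_after = weighted_mean([d for d, _ in after], [n for _, n in after])
    return mean_before, mean_after


# Made-up illustrative studies, not data from this analysis.
example = [(45, 140.0, 100), (60, 130.0, 100), (75, 115.0, 50), (90, 105.0, 150)]
mean_before, mean_after = step_means(example)
```

The size of the step is the difference between the two means; in the models reported here that difference was estimated with the other covariates in the model.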

[Figure 2. Horizontal axis: year of sample collection; series: North America, Europe, Other.]
These models fit the data well (adjusted R^2 values were between 0.56 and 0.61), but not quite as well as the models fit the data in our previous analyses (13) (Table 5). As in our 1997 analysis (13), when multiple regression models that include terms for the interaction of geographic region and year are used, there is no support for the use of a nonlinear model.

Discussion
As we stated previously (13), control for confounding in these analyses can be only partial because of incomplete data. Therefore, it is possible that residual confounding remains. How large is this likely to be? One of the strongest confounders in this analysis was the type of population studied. We examined this factor in two ways: the percent of men with proven fertility and the type of study population (sperm donor, prevasectomy, etc.). Because these variables were highly correlated, we retained only one (the percent of men with proven fertility) in the final model. When this variable was added to the other variables in the multiple regression model, it increased the magnitude of the slope considerably (37.2%).
Zavos and Goodpasture (24) reported that sperm concentration is higher when semen samples are obtained using a collection device during intercourse than when the same subjects collect samples by masturbation (p < 0.01), a result that has been reported by others (25). In the current analysis, studies that did not require collection by masturbation tended to be earlier (mean publication year 1970 vs. 1978). Therefore, this variable was a strong (positive) confounder; when it was added to the model, the magnitude of the slope decreased 34.1%.
Carlsen et al. (1) required that sperm be counted by manual methods in all the studies that they included in their analysis. Nevertheless, because manual counting devices have changed somewhat over the study period, when reviewing these studies, we abstracted information on the specific counting method that was used. When the particular counting device was not specified, we assumed it was manual. Nonmanual methods are a relatively recent advance and are still considered experimental, so that studies that use nonmanual methods are likely to specify the use of such methods. In 62 of these 101 studies, the counting device was specified to be the hemocytometer, the method that has been continually recommended by the World Health Organization since 1980 (26,27). The only other counting method that was specified, the Makler chamber (28), was mentioned in only 2 of the 101 studies. Thus, we found no evidence that the introduction of newer counting devices has resulted in lower sperm counts.
In fact, when systematic changes have been introduced by newer methods, they tended to result in higher counts (14). In any case, this variable appeared to have little effect on the observed decline in sperm density.
Some researchers have criticized the use of sperm count estimates from early in the study period, arguing that greater measurement error was likely in these historical studies. Greater imprecision in earlier studies could not have produced the negative slope we observed in Western countries. A change in the variability of sperm counts would, however, violate a basic assumption underlying the regression methods used in these analyses, the assumption of constant variance. Was this assumption justified? To answer this question we looked for a trend in the standard deviation of sperm density in these historical studies. We modeled the standard deviation (which was reported in 34 studies) as a function of year and found no evidence of a trend (slope = -0.24; p = 0.22) (14). We also used a multiple regression model to examine possible confounding of this relationship, but found no evidence of this. We concluded, therefore, that there has been no significant change in the standard deviation of sperm density over time.
Geographic region and the interactions of region and year were important covariates in these analyses. However, these geographic groupings are large and heterogeneous. For example, the category "other countries" included Thailand, India, Hong Kong, Brazil, Australia, Kuwait, Nigeria, Israel, Libya, Tanzania, Peru, Egypt, China, and Saudi Arabia. Several studies suggested that mean sperm density and trends in semen quality may vary considerably, even within small areas (29,30), so that it would have been desirable to stratify studies into narrower geographic categories if sufficient data had been available. Unfortunately, because many of these countries contributed only one study, it was not possible to use narrower geographic strata.
Abstinence time is known to be strongly related to sperm density (31-33). In this analysis, when abstinence time was added to a linear model that included all other variables, the magnitude of the slope decreased by 10.6%, suggesting moderate confounding. Although the inclusion of abstinence time in the model appears to have reduced confounding to some extent, control for this variable was undoubtedly incomplete because fewer than one-third of these studies reported abstinence times. An additional 49% of studies noted that abstinence times were restricted by study protocol but, as has been demonstrated, these protocols are only advisory. Auger et al. (2) noted that only 66% of men adhered to the protocol-specified abstinence time of 3-5 days. On the other hand, to account for the observed decline in sperm density, abstinence time would have had to decline appreciably over the study period. The evidence for this is not strong; studies with longer abstinence times (none < 3 days) were published only slightly earlier than those that included some abstinence times < 3 days (1976 vs. 1983).
After controlling for abstinence time and other covariates, the addition of age to the model increased the magnitude of the slope by only 1.2%. However, we found little evidence that age is an important predictor of sperm density. Information was quite incomplete for this variable. Twenty-five studies contained no information on age, and these tended to be older studies (mean publication year 1962). For the remaining studies, many only included an age range, so that we were only able to categorize age into broad categories. Nevertheless, we chose to retain this variable in the model for comparability to other analyses.
The current analysis suggests that the previously reported trends have continued, at least until 1996. We have also shown that the studies initially used by Carlsen et al. (1) did not represent a biased selection of the English language literature. Nevertheless, it is likely that neither this publication nor further statistical analyses of historical data will resolve the continuing debate over declining sperm counts. Critics will continue to challenge the reliability of historical data, and most will agree that residual confounding, which may be appreciable, cannot be completely eliminated.
The entire issue of declining sperm count has gained in importance because of the recognition of several other trends that reflect a decline in male reproductive health.
Testicular cancer incidence has increased significantly for at least the past 20 years in most of the Caucasian populations that have been studied (30,34,35). Trends in rates of cryptorchidism are consistent with those for testicular cancer, for which cryptorchidism is a significant risk factor (30). These increases in rates of testicular cancer and male genital tract abnormalities, like decreasing sperm density, have primarily been seen in Western countries. Several authors have suggested that these trends, together with decreases in semen quality, may reflect a more generalized increase in testicular dysfunction (30,36,37). Although few of these trend studies have examined possible causes, common environmental exposures are plausible. If environmental factors have produced some, or all, of the temporal changes in sperm density, the regional differences that have been reported in semen quality, even within countries (6,38), may also reflect variation in these environmental factors.
Studies that examine differences in semen quality between geographically diverse cohorts may help identify such factors. An ongoing network of international studies, begun in 1997, was designed to address this question. In these collaborative prospective studies, the use of common study protocols, analytic methods, and quality control procedures should minimize extraneous interstudy differences. These studies should provide unbiased estimates of variability among cities that have been reported to differ widely in semen quality, provide baseline levels of male biomarkers for future studies, and generate hypotheses of environmental causes of variation in these parameters.