Effect of air pollution on lung cancer: a Poisson regression model based on vital statistics.

This article describes a Poisson regression model for time trends of mortality to detect the long-term effects of common levels of air pollution on lung cancer, in which the adjustment for cigarette smoking is not always necessary. The main hypothesis to be tested in the model is that if the long-term and common-level air pollution had an effect on lung cancer, the death rate from lung cancer could be expected to increase gradually at a higher rate in the region with relatively high levels of air pollution than in the region with low levels, and that this trend would not be expected for other control diseases in which cigarette smoking is a risk factor. Using this approach, we analyzed the trend of mortality in females aged 40 to 79, from lung cancer and two control diseases, ischemic heart disease and cerebrovascular disease, based on vital statistics in 23 wards of the Tokyo metropolitan area for 1972 to 1988. Ward-specific mean levels per day of SO2 and NO2 from 1974 through 1976 estimated by Makino (1978) were used as the ward-specific exposure measure of air pollution. No data on tobacco consumption in each ward is available. Our analysis supported the existence of long-term effects of air pollution on lung cancer.


Introduction
Very high levels of air pollution have been shown to have some short-term effects on human health. One typical example is the London smog episode of 1952 (1)(2)(3)(4). A sophisticated study design and analysis method is not necessary to detect these short-term health effects. By contrast, there is uncertainty about the long-term effects of common levels of air pollution, because there are several difficulties in analyzing those effects, including the lack of quantitative data on individual cumulative exposures, the misspecification of the relevant unknown latency, and the existence of many confounders that might be misclassified or measured with error. Further, most studies exploring the association of air pollution and lung cancer are based on a cross-sectional ecologic study design (5)(6)(7)(8).
A natural approach to examining the long-term health effects of common levels of air pollution is to compare the time trend of incidence or mortality of the target disease among regions having different air pollution exposure levels and to detect the subtle changes, if any, expected to be seen in the patterns. Following this idea, we designed a retrospective cohort study based on vital statistics data in the Tokyo metropolitan area between 1972 and 1988. As air pollution measures, SO2 and NO2 were used. The death registration system in Japan is and has been reasonably complete and reliable.
This paper was presented at the 4th Japan-US Biostatistics Conference on the Study of Human Cancer held 9-11 November 1992 in Tokyo, Japan.
Recently, Trichopoulos et al. (9) compared the time trends of standardized lung cancer mortality between Athens and the rest of Greece, taking into account tobacco consumption trends, but they failed to detect an effect of air pollution on lung cancer mortality. Their approach is similar to that of this article, but their design and analysis seem insufficiently sophisticated to detect the subtle change.

Materials and Methods

Mortality and Population Data
Mortality data, restricted to females aged 40 to 79 in 23 wards of the Tokyo metropolitan area outlined in Figure 1, were read from the mortality tapes of Japanese Vital Statistics for the years 1972 through 1988. This population segment was targeted because we felt that middle-aged women tend to spend more time in their home neighborhoods every day than men, who may spend many hours at work in other wards.
Annual population data were obtained from the Annual Health Report of Tokyo (10). The causes of death examined were lung cancer (ICD9 = 162) as the target disease, and ischemic heart disease (ICD9 = 410-414) and cerebrovascular disease (ICD9 = 430-438) as control diseases.
Mortality and population data in each of 23 wards were tabulated in two-way, age-by-period contingency tables with unequal person-years at risk in each cell to perform a cohort analysis. Age was divided into 14 segments: 13 three-year segments (40 to 42, 43 to 45, ..., 76 to 78) and the single year 79. Time was divided into 6 periods: five three-year periods (1972 to 1974, 1975 to 1977, ..., 1984 to 1986) and the final two-year period 1987 to 1988. This structure is illustrated in Table 1. The diagonals of the table (from upper left to lower right) in Figure 2 define approximate birth cohorts with 5-year intervals. A total of nine birth cohorts are obtained and used in this study.
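The cell structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the birth-year range is approximated as calendar year minus age, ignoring month of birth. It shows why a three-year age segment crossed with a three-year period spans about five birth years, and why cells on the same diagonal share that span.

```python
def age_segment(age):
    """Index of the age segment: 40-42 -> 0, ..., 76-78 -> 12, 79 -> 13."""
    if not 40 <= age <= 79:
        raise ValueError("age outside the 40-79 study range")
    return (age - 40) // 3


def period_segment(year):
    """Index of the period: 1972-1974 -> 0, ..., 1987-1988 -> 5."""
    if not 1972 <= year <= 1988:
        raise ValueError("year outside the 1972-1988 study range")
    return (year - 1972) // 3


def birth_year_range(age_idx, period_idx):
    """Approximate (earliest, latest) birth year covered by one table cell."""
    lo_age = 40 + 3 * age_idx
    hi_age = min(lo_age + 2, 79)       # the last age segment is the single year 79
    lo_year = 1972 + 3 * period_idx
    hi_year = min(lo_year + 2, 1988)   # the last period has only two years
    return lo_year - hi_age, hi_year - lo_age


# Cells on the same diagonal (age index and period index both advance by one)
# cover the same approximate birth cohort.
first_cell = birth_year_range(0, 0)   # ages 40-42 in 1972-1974
next_cell = birth_year_range(1, 1)    # ages 43-45 in 1975-1977
```

Here both cells on the diagonal map to the same five-year span of birth years, which is the sense in which the diagonals define approximate birth cohorts.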
In the analysis, we ignored the inflow and outflow of population. We also excluded two wards, Chuoh and Chiyoda, located in the central part of Tokyo, because these two wards are typical business districts. The population size in these areas is small compared with other wards, and the outflow rate from these wards to urban areas outside the 23 targeted wards has increased, partly due to the rapid increase of land prices in recent years.
Table 1. Three-year by three-year contingency structure for cohort analysis. Period columns: 1972-1974, 1975-1977, 1978-1980, 1981-1983, 1984-1986, 1987-1988.
Air Pollutants
Tokyo metropolitan ward-specific mean exposure levels of SO2 and NO2 per day during 1974 to 1976, estimated by Makino (11), were used as the exposure measure in each of 23 wards (Figure 2). There are two major reasons why these estimates were used in this study. First, these estimates were considered to be approximately proportional to the cumulative exposure levels of these two pollutants up to 1972, although this is difficult to verify because ward-specific data on SO2 and NO2 before 1972 are incomplete in Tokyo. Second, as shown in Figure 3, the relative rank of ward-specific concentrations of these two pollutants among 23 wards has not changed substantially over the past 20 years. Therefore, in this study, specification of the unknown latency between exposure and mortality is not considered critical.
to use an approach that does not always need adjustment for cigarette smoking. In practice, it is unlikely that the exposure level of air pollutants such as SO2 or NO2 increases in proportion to the increase in tobacco consumption. Some unpublished data suggest no association between the levels of NO2 and smoking habits in middle-aged women in the 1975 survey, and respiratory symptoms among residents in the Tokyo metropolitan area (K. Makino, personal communication).

Hypothesis and Statistical Model
The main hypothesis in our approach is that if long-term, common-level air pollution had an effect on lung cancer, the death rate from lung cancer would be expected to increase gradually at a higher rate in the region with relatively high levels of SO2 and NO2 than in the region with low levels, and that this trend would not be expected for ischemic heart disease and cerebrovascular disease.
As a primary statistical model to test this hypothesis, we considered the following Poisson regression model for the time trend of mortality $\lambda(t)$:

$$\log \lambda_{ij}(t) = \alpha_i + \gamma_j + (\tau_i + \beta_j)\,t \qquad [1]$$

where $i$ indexes the birth cohort ($i = 1, \ldots, 9$). Since there are many potential sources of variation in population-based data, it is very likely that the variance is larger than the mean. To deal with this extra-Poisson variation, we used a quasilikelihood approach in which the over-dispersion parameter is estimated by the Pearson $\chi^2$ goodness-of-fit statistic divided by its degrees of freedom (12). Model fitting was carried out using Generalized Linear Interactive Modeling (GLIM) (13), and a one-tailed p-value was used as the indicator of significance.

Preliminary Analysis
Figure 4 shows the ward-specific time trend of the age-adjusted death rate (the standard population is from the 1985 national census) from lung cancer for females aged 40 to 79, for the years 1972 to 1988. From this figure, we cannot observe any meaningful difference among wards, but we can see an overall trend of gradually increasing death rates. Therefore, a linear time trend analysis was performed for each of the ward-specific time trends, and then the estimated slope of the secular trend (rate of increase per year) was linearly regressed on NO2, SO2, and their product NO2 x SO2, independently. In Figure 5, the estimated slope was plotted against NO2. The [...] trend of lung cancer mortality. The association between SO2 levels and slope estimates was weaker.

Figure 5. Association of the slope of the secular trend of age-adjusted death rate from lung cancer with NO2 concentrations.
Figure 6. Association of the slope estimate $\beta$ in the model (Equation 1) with NO2 concentrations.
(a) One-tailed p-value was calculated by the quasilikelihood approach (12), considering the over-dispersion.
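The two-stage preliminary analysis (a linear trend per ward, then a regression of the ward slopes on the pollutant level) can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `ols` and `two_stage_trend` are hypothetical, the ward labels and all numbers are made up, and ordinary least squares is assumed at both stages.

```python
def ols(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_trend(years, rates_by_ward, exposure_by_ward):
    """Stage 1: secular slope per ward. Stage 2: regress slopes on exposure."""
    wards = sorted(rates_by_ward)
    slopes = {w: ols(years, rates_by_ward[w])[1] for w in wards}
    x = [exposure_by_ward[w] for w in wards]
    y = [slopes[w] for w in wards]
    intercept, coef = ols(x, y)
    return slopes, intercept, coef


# Made-up, noise-free data: the yearly increase in the age-adjusted rate is
# 0.1 + 0.01 * NO2, so stage 2 should recover a coefficient of 0.01.
years = list(range(1972, 1989))
no2 = {"A": 20.0, "B": 30.0, "C": 40.0}
rates = {
    w: [5.0 + (0.1 + 0.01 * no2[w]) * (yr - 1972) for yr in years]
    for w in no2
}
slopes, intercept, coef = two_stage_trend(years, rates, no2)
```

A positive stage 2 coefficient corresponds to death rates rising faster in wards with higher pollutant levels, which is the pattern the hypothesis predicts for lung cancer.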

Fitting Poisson Regression Model
From the results of fitting the model (Equation 1) for lung cancer, we plotted the estimated slope $\hat{\beta}_j$ (on a logarithmic scale, and thus different from the slope above) against the NO2 concentration in Figure 6. The association between NO2 and the slope estimates is seen more clearly than the association shown in Figure 5. A summary of fitting Equation [...] On the other hand, no significant association with SO2 or NO2 was found for the slope estimates of ischemic heart disease and cerebrovascular disease.
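To make the fitting step concrete, the sketch below fits a deliberately simplified version of the model: a single stratum with log rate = a + b t and person-years as the offset, rather than the full cohort-and-ward structure of Equation 1. It uses iteratively reweighted least squares and estimates the over-dispersion parameter as the Pearson chi-square statistic divided by its degrees of freedom, as described in the text. The function names, the death counts, and the normal approximation for the one-tailed p-value are assumptions made for illustration; the original analysis used GLIM.

```python
import math


def fit_poisson_trend(deaths, person_years, times, max_iter=50, tol=1e-10):
    """Fit log(rate) = a + b * t by iteratively reweighted least squares."""

    def moments(a, b):
        # Fitted means and the weighted sums that define the 2 x 2 system.
        mu = [n * math.exp(a + b * t) for n, t in zip(person_years, times)]
        sw = sum(mu)
        swt = sum(m * t for m, t in zip(mu, times))
        swtt = sum(m * t * t for m, t in zip(mu, times))
        return mu, sw, swt, swtt

    a = math.log(sum(deaths) / sum(person_years))  # start from the crude rate
    b = 0.0
    for _ in range(max_iter):
        mu, sw, swt, swtt = moments(a, b)
        # Working response on the log-rate scale (offset already removed).
        z = [a + b * t + (d - m) / m for d, m, t in zip(deaths, mu, times)]
        swz = sum(m * zi for m, zi in zip(mu, z))
        swtz = sum(m * t * zi for m, t, zi in zip(mu, times, z))
        det = sw * swtt - swt ** 2
        a_new = (swtt * swz - swt * swtz) / det
        b_new = (sw * swtz - swt * swz) / det
        done = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if done:
            break

    mu, sw, swt, swtt = moments(a, b)
    det = sw * swtt - swt ** 2
    pearson = sum((d - m) ** 2 / m for d, m in zip(deaths, mu))
    phi = pearson / (len(deaths) - 2)     # over-dispersion: Pearson X2 / df
    se_b = math.sqrt(phi * sw / det)      # quasi-likelihood standard error
    return a, b, phi, se_b


def one_tailed_p(b, se_b):
    """Upper-tail p-value for H0: b = 0, using a normal approximation."""
    return 0.5 * math.erfc(b / (se_b * math.sqrt(2.0)))


# Hypothetical counts for six periods in one stratum (not data from the study).
times = [0, 3, 6, 9, 12, 15]              # years since the first period
person_years = [10000.0] * 6
deaths = [8, 14, 9, 16, 11, 18]
a_hat, b_hat, phi_hat, se_hat = fit_poisson_trend(deaths, person_years, times)
p_value = one_tailed_p(b_hat, se_hat)
```

Scaling the variance by the estimated dispersion leaves the point estimate of the slope unchanged and only changes its standard error, which is why the quasilikelihood adjustment affects significance but not the fitted trend.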

Discussion
In this analysis, we used the ward-specific NO2 and SO2 levels estimated by Makino (11). His estimation is based on two-dimensional spline interpolation using daily data from 38 measurement stations scattered in the Tokyo metropolitan area. Therefore, measurement error or estimation error is not negligible, and an errors-in-variables formulation for Equations 1 and 2 is required as follows:

$$\log \lambda_{ij}(t) = \alpha_i + \gamma_j + (\tau_i + \mu + \theta z_j)\,t$$

where $z_j$ is the true exposure level, and its relation to the observed level $x_j$ might be

$$x_j = \nu + \xi z_j + \varepsilon_j, \qquad \varepsilon_j \sim N(0, \sigma_\varepsilon^2).$$

Unfortunately, because information on the estimation error $\sigma_\varepsilon$ is not provided, we cannot apply this model. However, the naive approach ignoring the measurement error usually underestimates $\theta$ in absolute value and has reduced power for the test of $H_0: \theta = 0$ (14). That is, our results are also attenuated, and the true effect of air pollution on lung cancer might be larger than that estimated.
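The attenuation argument can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not in the paper: a plain linear outcome instead of the Poisson model, an observed exposure equal to the true exposure plus independent normal error, and arbitrary variances. In that classical setting the naive slope is expected to shrink toward zero by the factor var(z) / (var(z) + var(error)).

```python
import random


def ols_slope(x, y):
    """Least squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def simulate_attenuation(theta=1.0, sigma_z=1.0, sigma_eps=1.0,
                         n=20000, seed=0):
    """Return (naive slope, slope predicted by the attenuation factor)."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, sigma_z) for _ in range(n)]          # true exposure
    x = [zi + rng.gauss(0.0, sigma_eps) for zi in z]         # measured exposure
    y = [theta * zi + rng.gauss(0.0, 0.1) for zi in z]       # outcome
    naive = ols_slope(x, y)                                  # ignores the error
    predicted = theta * sigma_z ** 2 / (sigma_z ** 2 + sigma_eps ** 2)
    return naive, predicted


naive, predicted = simulate_attenuation()
```

With equal variances for the true exposure and the error, the predicted naive slope is half the true effect, which is the direction of bias invoked above: an effect estimated from error-prone exposure levels tends to understate the true one.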
In our analysis, we did not adjust the result for cigarette smoking simply because data on tobacco consumption are not available. If such data can be obtained, they can be used for adjustment. Practically speaking, however, it is unlikely that ward-specific tobacco consumption is strongly correlated with the ward-specific level of NO2 or SO2. Although our results rest largely on this important but unproven assumption, we conclude that our analysis suggests the existence of long-term effects of common levels of air pollution on lung cancer.