Validating National Kriging Exposure Estimation
Environ Health Perspect 115:2-8 (2007). doi:10.1289/ehp.10205 available via http://dx.doi.org [Online 19 June 2007]
Referencing: GIS Approaches for the Estimation of Residential-Level Ambient PM Concentrations
In their article, Liao et al. (2006) argued that it is feasible to use national scale daily kriging to estimate ambient air pollution exposure, even in locations where monitoring data are limited. In addition, they argued that national scale kriging is preferable to regional kriging and that automated variogram estimation is preferable to manual. The advocated methodology seems appealing when compared with the more standard approach of estimating ambient exposure separately in individual metropolitan areas (Dockery et al. 1993; Jerrett et al. 2005; Miller et al. 2007; Pope et al. 1995) because it simplifies exposure assessment for multicity studies and allows inclusion of subjects far from monitoring sites. Liao et al. (2006) also suggested estimating daily variograms without accounting for day-to-day relationships and variations in data availability. This is a convenient simplification if it produces reliable results, but the evidence is not convincing.
The primary focus of the article by Liao et al. (2006) is on kriging daily ambient PM10 (particulate matter with aerodynamic diameter ≤ 10 µm) based on the U.S. Environmental Protection Agency Air Quality System (AQS) measurements. Three cross-validation statistics were reported, namely prediction error (PE), standardized prediction error (SPE), and root mean square standardized (RMSS). PE is the difference between predicted and measured concentrations at each site; SPE is the PE divided by the estimated SE; and RMSS is the SD of the SPEs across sites. PE and SPE can be regarded as measures of bias, and RMSS is a measure of the accuracy of the SE estimates (RMSS should be near 1, with RMSS > 1 indicating that the estimated SEs are too small). Cross-validation SE statistics were not reported, but the SE at Women's Health Initiative (WHI) subject addresses is reported for some models.
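The three statistics can be sketched in a few lines of Python. This is a minimal illustration, not the ArcView implementation: the function name, the dictionary keys, and the four-site data are hypothetical, and the RMSS is computed here as the root mean square of the SPEs, which coincides with their SD when the mean SPE is zero.

```python
import math
import statistics


def cross_validation_stats(predicted, measured, std_errors):
    """Summarize leave-one-out results across monitoring sites.

    predicted, measured, std_errors: equal-length lists, one entry per site.
    """
    # PE: predicted minus measured concentration at each site
    pe = [p - m for p, m in zip(predicted, measured)]
    # SPE: PE divided by the estimated SE at the same site
    spe = [e / s for e, s in zip(pe, std_errors)]
    return {
        "mean_pe": statistics.fmean(pe),
        "mean_spe": statistics.fmean(spe),
        # Root mean square of the SPEs (assumption: equals the SD of the
        # SPEs only when their mean is zero)
        "rmss": math.sqrt(statistics.fmean([x * x for x in spe])),
        # Unsigned summary of prediction error (square root of the MSE)
        "rms": math.sqrt(statistics.fmean([e * e for e in pe])),
    }


# Hypothetical values (ug/m3) for four sites, for illustration only
stats = cross_validation_stats(
    predicted=[22.0, 30.0, 18.0, 27.0],
    measured=[20.0, 33.0, 19.0, 25.0],
    std_errors=[2.0, 3.0, 1.0, 2.0],
)
print(stats)
```

In this toy example the mean PE and mean SPE are both zero and the RMSS is 1, yet the RMS error is still about 2.1 µg/m3, which illustrates why signed summaries alone do not describe prediction accuracy.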
The goal of kriging is accurate predictions at locations without measurements. This could be verified by a cross-validation mean square error (MSE) or similar summary of unsigned prediction error. In lieu of this, a reasonable alternative is to examine SE and RMSS together. If the RMSS is near 1 (< 1) then it is reasonable to regard the mean estimated SE as a valid estimate (upper bound) for the MSE. However, Liao et al. (2006) did not always report both the RMSS and SE, and some of their conclusions are erroneously supported by only one of these. We also note that limiting cross-validation to AQS sites may not be representative of performance at subject addresses.
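The link between SE, RMSS, and MSE can be written out as a short sketch. The notation is introduced here for illustration only: n cross-validation sites, with PE_i and SE_i the prediction error and estimated SE at site i, and the RMSS taken as the root mean square of the SPEs.

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{PE}_i^{2},
\qquad
\mathrm{RMSS}^{2} = \frac{1}{n}\sum_{i=1}^{n}
  \left(\frac{\mathrm{PE}_i}{\mathrm{SE}_i}\right)^{2}.

% Assuming the estimated SEs are roughly constant across sites,
% SE_i \approx \overline{\mathrm{SE}}:
\mathrm{MSE} \approx \mathrm{RMSS}^{2}\,\overline{\mathrm{SE}}^{2}
\quad\Longrightarrow\quad
\sqrt{\mathrm{MSE}} \approx \mathrm{RMSS}\cdot\overline{\mathrm{SE}}.
```

Under that assumption, an RMSS near 1 makes the mean SE an approximation to the root MSE, and an RMSS below 1 makes it an upper bound. When the SEs vary strongly across sites the relationship is only approximate.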
The primary claim of Liao et al. (2006) is that the "data support the overall validity of kriging-based estimation approaches to estimate location-specific PM concentrations across the contiguous United States." The authors argued that the average cross-validation PE and RMSS statistics are "acceptable" for 95% of days. However, PE and RMSS alone do not provide a reliable estimate of the prediction accuracy. An RMSS near 1 suggests that the SE is a good estimate of prediction accuracy, but the cross-validation SE was not reported. In another section of the article, the daily mean SE of predicted PM10 at WHI subject locations was reported to be 27.35 µg/m3, which is high compared with the overall mean concentration of 26.29 µg/m3.
Liao et al. (2006) claimed that national kriging is preferable to regional kriging, and they compared their national model to one in which the continental United States is divided into five regions. They reasoned that the two models perform equally well under cross-validation based on comparisons of SPE and RMSS, so other issues such as missing data and locations near regional boundaries argue for a national approach (we note that the boundary issue is easily addressed by overlapping regions). However, because they did not report SE for the regional model, it is impossible to verify their claim that the national model performs equally well.
Finally, Liao et al. (2006) claimed that automated variogram estimation is preferable to manual. Based on 6 days of data, the authors argued that the manually fit model is worse because it produces somewhat larger SEs. However, the RMSS values on these days for the automatically fit model were all > 4, whereas they were near 1 for the manually fit model. Thus, comparing SEs to assess model accuracy is not valid because the SEs for the automatically fit model are unreliable. In fact, because the SEs were fairly similar and the RMSSs were significantly larger for the automatic fit, we would be inclined to favor the manual fit.
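The following sketch shows why a smaller reported SE does not imply a more accurate model when the RMSS is far from 1. It assumes the SEs are roughly constant across sites, so that the RMS prediction error is approximately the RMSS times the mean SE. The SE values of 10 and 12 µg/m3 are hypothetical, chosen only to represent "somewhat larger" SEs for the manual fit; the RMSS values of 4 and 1 follow the figures quoted above.

```python
def implied_rms_error(mean_se, rmss):
    """Approximate RMS prediction error implied by a mean SE and an RMSS.

    Assumes roughly constant SEs across sites, so RMS error ~= RMSS * SE.
    """
    return rmss * mean_se


# Hypothetical mean SEs (ug/m3); RMSS of 4 is the lower bound quoted for
# the automatic fit, and 1 stands in for "near 1" for the manual fit.
automatic = implied_rms_error(mean_se=10.0, rmss=4.0)
manual = implied_rms_error(mean_se=12.0, rmss=1.0)

print(f"automatic fit: implied RMS error ~ {automatic:.1f} ug/m3")
print(f"manual fit:    implied RMS error ~ {manual:.1f} ug/m3")
```

With these illustrative inputs the automatic fit reports the smaller SE but has the larger implied prediction error.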
In summary, the methodology proposed by Liao et al. (2006) for national kriging would be appealing if it could be shown to be reliable. However, the reported statistics are not convincing.
The authors declare they have no competing financial interests.
Adam A. Szpiro
Lianne Sheppard
Paul D. Sampson
Sun-Young Kim
University of Washington
Seattle, Washington
References
Dockery D, Pope C, Xu X, Spengler J, Ware J, Fay M, et al. 1993. An association between air pollution and mortality in six US cities. N Engl J Med 329:1753–1759; doi:10.1056/NEJM199312093292401.
Jerrett M, Burnett R, Ma R, Pope C, Krewski D, Newbold K, et al. 2005. Spatial analysis of air pollution and mortality in Los Angeles. Epidemiology 16:727–736; doi:10.1097/01.ede.0000181630.15826.7d.
Liao D, Peuquet DJ, Duan Y, Whitsel EA, Dou J, Smith RL, et al. 2006. GIS approaches for the estimation of residential-level ambient PM concentrations. Environ Health Perspect 114:1374–1380; doi:10.1289/ehp.9169.
Miller K, Siscovick D, Sheppard L, Shepherd K, Sullivan J, Anderson G, et al. 2007. Long-term exposure to air pollution and incidence of cardiovascular events in women. N Engl J Med 356:447–458; doi:10.1056/NEJMoa054409.
Pope C, Thun M, Namboodiri M, Dockery D, Evans J, Speizer F, et al. 1995. Particulate air pollution as a predictor of mortality in a prospective study of US adults. Am J Respir Crit Care Med 151:669–674.
National Kriging Exposure Estimation: Liao et al. Respond
Environ Health Perspect 115:2-8 (2007). doi:10.1289/ehp.10205R available via http://dx.doi.org [Online 19 June 2007]
Szpiro et al. suggest that our findings (Liao et al. 2006) do not adequately support using national-scale, log-normal ordinary kriging to estimate daily mean concentrations of PM10 (particulate matter with aerodynamic diameter ≤ 10 µm) at unmonitored locations in the contiguous United States. They posit that the absence of the cross-validation SE prevents evaluating the validity of kriging estimation, as we implemented in this context, and the comparability of both regional- versus national-scale kriging and manually modified versus semiautomated, default-calculated semivariograms.
Little literature is available on the use of kriging methods to estimate daily air pollution data for large population–based multicenter epidemiologic studies. The four studies cited by Szpiro et al. (Dockery et al. 1993; Jerrett et al. 2005; Miller et al. 2007; Pope et al. 1995) all used cohort analyses for which only long-term average exposures are required, and only one of those (Jerrett et al. 2005) actually involved interpolation methods at all, although the study was restricted to a single city. In contrast, our objective was to create an interpolated daily pollutant concentration database for a multisite population-based epidemiologic study.
The square root of the cross-validation mean square error (MSE) mentioned by Szpiro et al. is also termed the "root-mean-square (RMS) prediction error," which is the empirical SE based on the mean square of the prediction errors, as opposed to SE, the mathematical formula for the RMS prediction error. RMS and SE, both of which are available from ArcView (ESRI Inc., Redlands, CA), are often considered jointly as an alternative measure (to RMSS) of the validity of spatial analysis.
The average RMS and SE from 366 daily PM10 spherical model cross-validations based on year 2000 PM10 data were 19.48 and 16.19 µg/m3, respectively, from log-normal ordinary kriging, and 26.43 and 25.60 µg/m3, respectively, from ordinary kriging. The validity of the model is supported by RMSS alone (≈ 1), by the similarity of RMS and SE, and by SPE (≈ 0). Additionally, the average daily SD of PM10 measured at the monitor locations was 27.20 µg/m3. Comparing SD with the kriging RMS provides a measure of the reduction in error due to interpolation. If RMS is less than the SD, then the kriging approach has some benefit compared with using long-run averages. From both ordinary and log-normal kriging, especially for the latter, we see a notable reduction in RMS compared with SD. Meanwhile, substantial variability remains, suggesting that kriging error should be taken into account when using the kriged values.
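The comparisons described above reduce to simple arithmetic on the reported averages. The sketch below uses only the figures quoted in this paragraph; the function and variable names are hypothetical.

```python
SD_MONITORS = 27.20  # average daily SD of measured PM10 (ug/m3)

# Average cross-validation RMS and SE (ug/m3) as quoted in the text
CROSS_VALIDATION = {
    "log-normal kriging": {"rms": 19.48, "se": 16.19},
    "ordinary kriging": {"rms": 26.43, "se": 25.60},
}


def summarize(rms, se, sd):
    """Ratio of empirical to model-based error, and RMS reduction vs. SD."""
    return {
        "rms_over_se": rms / se,
        "reduction_vs_sd": 1.0 - rms / sd,
    }


summary = {
    name: summarize(v["rms"], v["se"], SD_MONITORS)
    for name, v in CROSS_VALIDATION.items()
}

for name, s in summary.items():
    print(
        f"{name}: RMS/SE = {s['rms_over_se']:.2f}, "
        f"reduction vs SD = {s['reduction_vs_sd']:.1%}"
    )
```

On these figures the RMS is roughly 28% below the SD for log-normal kriging and roughly 3% below it for ordinary kriging, with RMS/SE ratios of about 1.20 and 1.03, respectively.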
Szpiro et al. also implicitly criticize our use of daily kriging when the objective was to interpolate daily data. Spatial–temporal models have potentially greater power than a 1-day-at-a-time spatial analysis but are not easy to apply in practice, with large datasets and many missing values.
Regional kriging could be superior to national kriging if the spatial dependence parameters (range, sill, and nugget) vary substantially from region to region, in which case a national kriging model could result in misspecified covariances. However, regional kriging also uses fewer data points to estimate those parameters and could result in greater errors. We would welcome theoretical or empirical studies that could cast further light on this trade-off. However, as far as our article (Liao et al. 2006) is concerned, our main purpose was to note that the national kriging method appears to be competitive when assessed by overall RMS error. We compared the results of regional- and national-scale kriging on a small set (17%) of days when the largest number of monitors (≥ 400) were reporting data, a scenario heavily favoring regional spatial interpolation strategies. On the remaining days when only 120–400 monitors were reporting data, regional kriging was inherently problematic given the restricted availability of monitors within regions. Szpiro et al. suggest that the problems of interpolation near the boundary could be solved by "overlapping," but this is only one of the issues encountered using regional kriging: one would still need to decide how to consistently define the regions, considering that the number of available data points changes substantially from day to day, to achieve a meaningful reduction in RMS error.
Based on the 12 "optimal" days in 2000, the average RMS and SE were 12.68 and 12.82 µg/m3, respectively, from the national scale kriging, compared with 12.22 and 12.49 µg/m3, respectively, from regional-scale kriging (Liao et al. 2006). These results, together with RMSS and SPE we reported, support our conclusion that national kriging performs comparably to regional kriging even when restricted to optimal days.
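For a sense of scale, the national and regional figures above can be compared directly. This is a small arithmetic sketch using only the averages quoted in this paragraph; the variable and function names are hypothetical.

```python
# Average RMS and SE (ug/m3) on the selected days, as quoted in the text
national = {"rms": 12.68, "se": 12.82}
regional = {"rms": 12.22, "se": 12.49}


def rms_to_se_ratio(stats):
    """Empirical RMS error divided by the model-based SE."""
    return stats["rms"] / stats["se"]


# Relative excess of the national RMS over the regional RMS
relative_rms_gap = national["rms"] / regional["rms"] - 1.0

print(f"national RMS/SE: {rms_to_se_ratio(national):.3f}")
print(f"regional RMS/SE: {rms_to_se_ratio(regional):.3f}")
print(f"national RMS exceeds regional RMS by {relative_rms_gap:.1%}")
```

Both RMS/SE ratios come out slightly below 1, and the national RMS is a little under 4% higher than the regional RMS on these days.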
Szpiro et al. correctly note that it is possible to improve the RMSS values by manual adjustment. However, typically we found that when one of the validity measures (RMSS, PE, or SPE) was improved by manual adjustment, other measures became worse. It is difficult to manually adjust models to improve all cross-validation parameters simultaneously. Manually adjusting daily semivariograms is not feasible when kriging over 10 years. Moreover, the predicted SE at unmeasured locations was uniformly lower in automatically fit models.
Szpiro et al. are correct that cross-validation may not be representative of the performance at participant address locations, although it is unclear what alternative methods they would like us to use. The ability to do semi-automatic cross-validations was a major attraction of ArcView and, despite limitations, is the best tool we know for validating spatial predictions.
The semiautomated kriging approach presents considerable advantages in estimating daily residential-level pollutant concentrations in large cohorts over long periods. Our proposed method (Liao et al. 2006) used log-normal kriging based on a spherical model to interpolate daily data on a national scale, and the weighted least squares method of parameter estimation without manual adjustment. We believe that the cross-validation statistics, presented in our article and amplified here, provide adequate support for these recommendations against reasonable alternatives that we considered.
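For readers unfamiliar with the spherical model and its range, sill, and nugget parameters, a minimal sketch of the standard spherical semivariogram follows. It is the textbook form, not the ArcView code; the function name, the parameterization by partial sill (sill minus nugget), and the example values are assumptions made for illustration.

```python
def spherical_semivariogram(h, nugget, partial_sill, range_):
    """Textbook spherical semivariogram at separation distance h.

    nugget: discontinuity at the origin
    partial_sill: sill minus nugget (assumed parameterization)
    range_: distance beyond which the semivariance is constant
    """
    if h < 0:
        raise ValueError("distance must be non-negative")
    if h == 0:
        return 0.0  # semivariance is zero at zero separation
    if h >= range_:
        return nugget + partial_sill  # the sill
    r = h / range_
    return nugget + partial_sill * (1.5 * r - 0.5 * r**3)


# Hypothetical parameters on a log-concentration scale, distances in km
params = {"nugget": 0.1, "partial_sill": 0.9, "range_": 500.0}
for distance in (0.0, 250.0, 500.0, 1000.0):
    gamma = spherical_semivariogram(distance, **params)
    print(f"h = {distance:6.1f} km -> gamma = {gamma:.5f}")
```

A weighted least squares fit would choose the three parameters to minimize the weighted squared differences between this curve and the empirical semivariogram for each day.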
The authors declare they have no competing financial interests.
Duanping Liao
Donna J. Peuquet
Hung-Mo Lin
Yinkang Duan
Pennsylvania State University
Hershey, Pennsylvania
Eric A. Whitsel
Richard L. Smith
Gerardo Heiss
University of North Carolina
Chapel Hill, North Carolina
References
Dockery D, Pope C, Xu X, Spengler J, Ware J, Fay M, et al. 1993. An association between air pollution and mortality in six US cities. N Engl J Med 329:1753–1759; doi:10.1056/NEJM199312093292401.
Jerrett M, Burnett R, Ma R, Pope C, Krewski D, Newbold K, et al. 2005. Spatial analysis of air pollution and mortality in Los Angeles. Epidemiology 16:727–736; doi:10.1097/01.ede.0000181630.15826.7d.
Liao D, Peuquet DJ, Duan Y, Whitsel EA, Dou J, Smith RL, et al. 2006. GIS approaches for the estimation of residential-level ambient PM concentrations. Environ Health Perspect 114:1374–1380; doi:10.1289/ehp.9169 [Online 8 June 2006].
Miller K, Siscovick D, Sheppard L, Shepherd K, Sullivan J, Anderson G, et al. 2007. Long-term exposure to air pollution and incidence of cardiovascular events in women. N Engl J Med 356:447–458; doi:10.1056/NEJMoa054409.
Pope C, Thun M, Namboodiri M, Dockery D, Evans J, Speizer F, et al. 1995. Particulate air pollution as a predictor of mortality in a prospective study of US adults. Am J Respir Crit Care Med 151:669–674.