Assessing the Distribution of Air Pollution Health Risks within Cities: A Neighborhood-Scale Analysis Leveraging High-Resolution Data Sets in the Bay Area, California

Background: Air pollution-attributable disease burdens reported at global, country, state, or county levels mask potential smaller-scale geographic heterogeneity driven by variation in pollution levels and disease rates. Capturing within-city variation in air pollution health impacts is now possible with high-resolution pollutant concentrations. Objectives: We quantified neighborhood-level variation in air pollution health risks, comparing results from highly spatially resolved pollutant and disease rate data sets available for the Bay Area, California. Methods: We estimated mortality and morbidity attributable to nitrogen dioxide (NO2), black carbon (BC), and fine particulate matter [PM ≤2.5 μm in aerodynamic diameter (PM2.5)] using epidemiologically derived health impact functions. We compared geographic distributions of pollution-attributable risk estimates using concentrations from a) mobile monitoring of NO2 and BC; and b) models predicting annual NO2, BC, and PM2.5 concentrations from land-use variables and satellite observations. We also compared results using county vs. census block group (CBG) disease rates. Results: Estimated pollution-attributable deaths per 100,000 people at the 100-m grid-cell level ranged across the Bay Area by a factor of 38, 4, and 5 for NO2 [mean=30 (95% CI: 9, 50)], BC [mean=2 (95% CI: 1, 2)], and PM2.5 [mean=49 (95% CI: 33, 64)], respectively. Applying concentrations from mobile monitoring and land-use regression (LUR) models in Oakland neighborhoods yielded similar spatial patterns of estimated grid-cell–level NO2-attributable mortality rates. Mobile monitoring concentrations captured more heterogeneity [mobile monitoring mean=64 (95% CI: 19, 107) deaths per 100,000 people; LUR mean=101 (95% CI: 30, 167)]. 
Using CBG-level disease rates instead of county-level disease rates resulted in 15% larger attributable mortality rates for both NO2 and PM2.5, with more spatial heterogeneity at the grid-cell–level [NO2 CBG mean=41 deaths per 100,000 people (95% CI: 12, 68); NO2 county mean=38 (95% CI: 11, 64); PM2.5 CBG mean=59 (95% CI: 40, 77); and PM2.5 county mean=55 (95% CI: 37, 71)]. Discussion: Air pollutant-attributable health burdens varied substantially between neighborhoods, driven by spatial variation in pollutant concentrations and disease rates. https://doi.org/10.1289/EHP7679


Introduction
Air pollution is associated with a large burden of death and disability worldwide, with fine particulate matter [PM ≤2.5 μm in aerodynamic diameter (PM2.5)] estimated to be responsible for 4.9 million deaths globally in 2015 (GBD 2017 Risk Factor Collaborators 2018). Nitrogen dioxide (NO2), a traffic-related air pollutant, is also linked with adverse health outcomes, although it is often not quantified in pollution-attributable disease burden studies, potentially because coarsely resolved concentration estimates are often unable to capture highly spatially variable patterns in NO2 (Anenberg et al. 2017). Recent advances in the understanding of the health effects of NO2, meta-analyses (U.S. EPA 2016), and published recommendations from a committee of scientists (Atkinson and Butland 2018) provide guidance on evaluating and interpreting NO2, as a marker of the mixture of traffic air pollution, in health impact assessments.
Much of the air pollution disease burden is concentrated in cities. Cities are home to about half the world's population (United Nations 2019) and 80% of the U.S. population (U.S. Census Bureau 2018). Many cities also experience both high air pollution levels (Krzyzanowski et al. 2014; Marlier et al. 2016) and health inequity challenges (Grant et al. 2017; Kioumourtzoglou et al. 2015; Stephens 2018). However, estimated health impacts from air pollution have typically been reported at the country, state, or county level, masking potential heterogeneity in impacts at fine spatial scales.
Understanding how air pollution-related health risks vary within cities could help inform policies aimed at improving public health and reducing population disparities in exposure and risk in urban areas. Recent efforts have estimated air pollution health impacts at the city level, finding dramatic variation in health risks across cities globally (Anenberg et al. 2019). However, only a limited number of studies have assessed air pollution mortality risks at the neighborhood level; these have focused on individual cities and have generally not compared the advantages and disadvantages of different concentration data sources (Brønnum-Hansen et al. 2018; Kheirbek et al. 2013; Kihal-Talantikite et al. 2018; Martenies et al. 2018; Mueller et al. 2017, 2018, 2020; Pierangeli et al. 2020). In addition, these previous city-scale studies may not have captured the spatial distribution of air pollution-related health risks because the grid sizes they used can dilute hotspots of high concentrations co-located with large populations (Fenech et al. 2018; Korhonen et al. 2019; Li et al. 2016; Punger and West 2013). Beyond horizontal grid size, the resolution of the emissions inputs used to estimate concentrations can also influence the resulting estimates of air pollution-related health impacts. Two studies examining the effects of both horizontal grid and emissions resolution on health burden estimates report mixed results. Paolella et al. (2018) reported that coarse-resolution concentration estimates were less able to identify disparities in health impacts, whereas Thompson et al. (2014) found limited differences in PM2.5-attributable health impacts with varying emissions and grid resolution.
Despite these mixed results, finer-resolution exposure estimates may reduce the potential for exposure misclassification. Estimating air pollution health impacts at the "hyperlocal" scale (resolving neighborhoods within cities) is now possible with high-resolution pollutant concentrations derived from mobile monitoring and modeling, complemented by satellite remote sensing. Here, we exploit a novel, extremely high-spatial-resolution pollution concentration data set from mobile monitoring of NO2 and black carbon (BC) using Google Street View (hereafter, Street View) cars throughout the Bay Area, California, from 2015 to 2017. These measurements have previously been used to create street-level annual average concentrations of NO2 and BC, a land-use regression (LUR) model, and an epidemiological analysis of relationships between long-term exposure to NO2 and cardiovascular disease (CVD) outcomes (Alexeeff et al. 2018), all for Oakland, California. Jointly, these efforts demonstrated the application of highly resolved concentration data to analyze intra-urban variation in pollutant exposure and the associated health risks. Building upon these efforts, here we use Street View concentrations to assess air pollution health impacts at the neighborhood scale. To our knowledge, our analysis is the first to use air pollution levels from sensor-aided mobile monitoring in a health impact assessment. Because most cities globally lack the highly spatially resolved concentration data available for the Bay Area, we compare pollutant-attributable health risks estimated using the Street View concentrations vs. less data- and resource-intensive predictive models. These predictive models use land-use variables and satellite observations of aerosol optical depth (AOD) and can be applied in any city globally to create high-resolution annual average concentrations of NO2 and PM2.5 (Larkin et al. 2017; van Donkelaar et al. 2016).
Neighborhood-level health risks from air pollution are driven not just by exposure levels but also by baseline disease rates, which themselves vary within cities (e.g., Fann et al. 2012), influencing attributable mortality estimates (Chowdhury and Dey 2016; Hubbell et al. 2009). Prior air pollution morbidity and mortality assessments have typically used baseline disease rates at the state, county, or national level owing to the limited availability of more highly resolved health data (Alotaibi et al. 2019; Caiazzo et al. 2013; Cohen et al. 2017; Fann et al. 2012, 2017; Zhang et al. 2018). Here, in addition to comparing across concentration data sets, we also assess the influence of baseline disease rates with varying spatial resolutions [i.e., county-level vs. census block group (CBG)-level baseline disease rates] on estimated pollution-attributable health risks.
The San Francisco Bay Area of California has a population of >7 million people. This case study of the Bay Area, where high-resolution concentration and disease rate data are available, allows us to explore intra-urban disparities in air pollutant exposure, pollution-attributable health risks, and pollution-attributable disease burdens, three related but distinct metrics used in policy contexts. The objectives of our study were to a) identify the degree of spatial heterogeneity in air pollution-related health impacts at the neighborhood scale within a city; b) compare the spatial patterns of air pollution disease burdens estimated using different concentration and baseline disease rate data sets; and c) draw lessons for conducting neighborhood-scale air pollution health impact assessments in cities where highly resolved concentration and baseline disease rate data sets are not available. We anticipate that our results can inform best practices (currently under development) for assessing air pollution-related health risks within cities globally, as well as efforts by policymakers to address disparities in the health impacts of air pollution.

Methods
We used epidemiologically derived health impact functions to estimate mortality and morbidity that may be attributable to NO2, BC, and PM2.5 on a 100-m grid for the Bay Area, using different concentration inputs and baseline disease rates at varying spatial resolutions. We used the Bay Area Air Quality Management District's (BAAQMD) nine-county definition of the Bay Area, which includes Alameda, Contra Costa, Marin, Napa, San Francisco, Santa Clara, San Mateo, Solano, and Sonoma counties (Figure 1). Within the Bay Area, we focused on Alameda County, for which we were able to obtain CBG-level disease rates, and within Alameda County, on West, Downtown, and East Oakland, where the Street View cars measured pollution levels (Table 1). Oakland is home to a major container port and four large interstates (I-880 to the south and west; I-80 and I-580 to the north; and I-980 transecting West and Downtown Oakland), as well as numerous rail yards and rail lines. East and West Oakland have been designated by the California Environmental Protection Agency (EPA) Environmental Justice Task Force as priority communities bearing disproportionate pollution burdens (Environmental Justice Task Force 2017).

Health Impact Function
For each pollutant-outcome pair, we derived concentration-response factors (CRFs) from relative risk (RR) estimates (Table 1) identified through a literature review using PubMed and Google Scholar (see the Supplemental Material "Literature Review," Tables S1-S15, and Figures S1-S11). We used epidemiological studies covering large geographic areas rather than single cities, assuming that large studies more fully account for population variation and confounding factors and have more statistical power. Where available, we used pooled risk estimates from meta-analyses. We applied a log-linear function in all analyses, based on current evidence for PM2.5 and, for NO2, on a combination of limited evidence for linear vs. log-linear functions and only small differences between the two at the concentrations in our study. Equation 1 describes the log-linear health impact function used for all pollutant-health end point pairs:

$$y_{h,i,a} = m_{h,i,a} \times P_{i,a} \times \left(1 - e^{-\beta_{h,a}\,\Delta x_i}\right) \tag{1}$$

where $y_{h,i,a}$ represents the number of cases of health outcome $h$ in age group $a$ attributable to the pollutant in grid cell $i$; $m_{h,i,a}$ represents the baseline disease rate for health end point $h$, age group $a$, and grid cell $i$; $P_{i,a}$ represents the population count for grid cell $i$ and age group $a$; and $1 - e^{-\beta_{h,a}\Delta x_i}$ represents the attributable fraction, with $\beta_{h,a}$ the natural log of the RR per unit concentration for health end point $h$ and age group $a$, and $\Delta x_i$ the concentration above the baseline in grid cell $i$. We accounted for uncertainty by calculating the attributable cases at the 2.5th and 97.5th percentiles of the RR estimates. All health impact calculations were conducted in R (version 3.5.3; R Development Core Team). For all pollutants, we assumed no low-concentration threshold because a recent study identified health impacts at PM2.5 concentrations as low as 2 μg/m3 (Crouse et al. 2012) and a recent NO2 epidemiological study included concentrations as low as 2 ppb (Khreis et al. 2017). Given that we applied a log-linear function to both PM2.5 and BC, we likewise assumed no threshold for BC.
For NO2, the U.S. EPA (2016) determined that there are causal and likely causal relationships between respiratory effects and short-term and long-term exposure, respectively. Because we were able to obtain baseline disease rates for pediatric asthma emergency room (ER) visits and pediatric asthma incidence, both of which are respiratory outcomes included in the U.S. EPA's "Integrated Science Assessment for Nitrogen Oxides" (U.S. EPA 2016), we included these health end points for short- and long-term exposure to NO2. Recent meta-analyses have also determined that there is a likely causal relationship between long-term exposure to NO2 and increased risk of mortality (COMEAP 2018), and potentially CVD mortality, the cause-specific mortality end point most commonly included among the studies in the meta-analysis. We estimated impacts of NO2 on all-cause and CVD mortality. Although we examined NO2, there remains active debate on whether long-term NO2 exposure has an independent causal relationship with mortality and other health outcomes. NO2 is, however, a well-established marker of localized traffic-related air pollution, such as ultrafine particles and polycyclic aromatic hydrocarbons, and is used as a proxy to estimate the mortality burden due to highly variable local traffic-related air pollution (Atkinson and Butland 2018), which is important for urban air pollution policy decision making.
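The health impact function in Equation 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' R code: the function names are hypothetical, the CRF is computed as ln(RR) divided by the concentration increment over which the RR is reported, and the example inputs (RR = 1.06 per 10 μg/m3 with bounds 1.04 and 1.08, a baseline rate of 80 per 10,000, 1,000 people, Δx = 9 μg/m3) are made-up values rather than entries from Table 1.

```python
import math


def attributable_cases(baseline_rate: float, population: float,
                       rr: float, rr_increment: float, delta_x: float) -> float:
    """Log-linear health impact function (Equation 1) for one grid cell and age group.

    baseline_rate: annual cases per person (e.g., 0.008 for 80 per 10,000)
    population:    persons in the grid cell and age group
    rr:            relative risk per `rr_increment` units of concentration
    rr_increment:  concentration increment over which `rr` is reported
    delta_x:       concentration above the baseline (counterfactual) level
    """
    beta = math.log(rr) / rr_increment                # CRF: ln(RR) per unit concentration
    attributable_fraction = 1.0 - math.exp(-beta * delta_x)
    return baseline_rate * population * attributable_fraction


def attributable_cases_with_ci(baseline_rate, population, rr_central, rr_low, rr_high,
                               rr_increment, delta_x):
    """Central estimate plus bounds computed from the 2.5th/97.5th percentile RRs."""
    return tuple(
        attributable_cases(baseline_rate, population, rr, rr_increment, delta_x)
        for rr in (rr_central, rr_low, rr_high)
    )


if __name__ == "__main__":
    central, low, high = attributable_cases_with_ci(
        baseline_rate=0.008, population=1000,
        rr_central=1.06, rr_low=1.04, rr_high=1.08,   # hypothetical RR and CI bounds
        rr_increment=10.0, delta_x=9.0)
    print(f"{central:.2f} ({low:.2f}, {high:.2f})")
```

Summing this quantity over age groups and grid cells gives an area total, and repeating it with the lower and upper RRs gives the uncertainty bounds described above.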
For PM2.5, we included health end points determined by the U.S. EPA to be causal or likely to be causal, including all-cause mortality, CVD mortality, CVD hospitalizations among the elderly, and pediatric asthma incidence and ER visits (U.S. EPA 2019). For BC, the U.S. EPA concluded that there is currently insufficient evidence to identify any one component of PM2.5 as more strongly associated with health outcomes than total PM2.5 mass, although some studies found associations between long-term exposure to BC and all-cause and CVD mortality, and between short-term BC exposure and CVD hospitalizations (U.S. EPA 2019). We therefore included all-cause mortality, CVD mortality, and CVD hospitalizations for BC. Because applying the log-linear model to individual PM2.5 components can distort risk estimates given nonlinearity at the low end of the concentration-response curve, we performed a sensitivity analysis in which we assumed that the BC contribution to PM2.5-attributable mortality was the same as its contribution to PM2.5 concentrations.
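The BC sensitivity analysis amounts to scaling PM2.5-attributable mortality by BC's share of PM2.5 mass. A minimal sketch, assuming the scaling is applied cell by cell (the text does not say whether it was applied per grid cell or in aggregate); the function name is hypothetical.

```python
def bc_share_of_pm25_deaths(pm25_attributable_deaths: float,
                            bc_conc: float, pm25_conc: float) -> float:
    """Assign BC the same share of PM2.5-attributable deaths as its share of PM2.5 mass.

    bc_conc and pm25_conc must be in the same units (e.g., ug/m3).
    A cell with no PM2.5 is assigned no BC-attributable deaths.
    """
    if pm25_conc <= 0:
        return 0.0
    bc_mass_fraction = bc_conc / pm25_conc
    return pm25_attributable_deaths * bc_mass_fraction
```

For example, a cell with 10 PM2.5-attributable deaths, 0.5 μg/m3 BC, and 10 μg/m3 PM2.5 would be assigned 0.5 BC-attributable deaths under this approach, independent of the BC-specific RR.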

NO2, BC, and PM2.5 Concentrations
We used multiple pollutant concentration data sets, including mobile monitoring (BC and NO2) and predictive models: LUR models for the United States and globally (NO2), and satellite-based models for the United States (BC and PM2.5) and globally (PM2.5). Maps of concentrations for each pollutant, data set, and geographical extent are provided in Figures S12-S28.
For the mobile monitoring data set, two Street View cars equipped with fast-response instrumentation [NO2 via cavity attenuated phase shift spectroscopy (Model T500U, Teledyne Inc.) and BC via photoacoustic absorption spectroscopy (Droplet Measurement Technologies)] repeatedly drove every road in West, Downtown, and East Oakland during daytime hours (~0900-1800 hours) on weekdays between 28 May 2015 and 21 December 2017, producing >3 million data points (Apte et al. 2017; Aclima et al. 2019). These measurements were aggregated to independent drive-pass means, and the median of the drive-pass means was then calculated for each 30-m road segment, reflecting long-term spatial differences in concentrations (Messier et al. 2018). The resulting data set indicated substantial spatial variability at fine scales, with median concentrations for road segments within the same city block varying by up to a factor of five. Here, we further aggregated the 30-m segment values to a 100 m × 100 m grid using the mean of all the mobile measurement points in each grid cell. This yielded annual average NO2 concentrations ranging from 3.37 to 45 ppb [mean = 12.7, standard deviation (SD) = 6.6] and annual average BC concentrations ranging from 0.2 (limit of detection) to 2.59 μg/m3 (mean = 0.47, SD = 0.35).
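The three-step aggregation described above (drive-pass means, segment medians, grid-cell means) can be sketched as follows. The record layout, the function names, and the precomputed segment-to-grid-cell lookup are assumptions for illustration; the sketch averages segment medians within each cell, which is one reading of the grid aggregation step, not a reproduction of the authors' processing code.

```python
from collections import defaultdict
from statistics import mean, median


def segment_medians(records):
    """records: iterable of (segment_id, drive_pass_id, concentration).

    1) Average all individual readings within each drive pass on each segment.
    2) Take the median of those drive-pass means for each segment.
    """
    by_pass = defaultdict(list)
    for segment_id, pass_id, value in records:
        by_pass[(segment_id, pass_id)].append(value)

    pass_means = defaultdict(list)
    for (segment_id, _pass_id), values in by_pass.items():
        pass_means[segment_id].append(mean(values))

    return {seg: median(means) for seg, means in pass_means.items()}


def grid_means(seg_medians, segment_to_cell):
    """3) Average segment medians into 100-m grid cells.

    segment_to_cell maps segment_id -> grid cell id (hypothetical lookup, e.g.,
    built from each segment's midpoint).
    """
    by_cell = defaultdict(list)
    for seg, value in seg_medians.items():
        by_cell[segment_to_cell[seg]].append(value)
    return {cell: mean(values) for cell, values in by_cell.items()}
```

Taking the median across drive passes, rather than pooling all readings, limits the influence of a few passes that happened to coincide with unusual short-lived plumes.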
For NO2, LUR models offer full spatial coverage in addition to the very high spatial resolution needed to capture near-roadway concentrations (Hystad et al. 2011). Here, we used a global LUR model that estimated annual average NO2 at 100 m × 100 m resolution for 2011 using satellite measurements, numerous land-use predictor variables, and annual measurement data from 5,220 air monitors in 58 countries (Larkin et al. 2017). The resulting NO2 concentrations for 2011 in the Bay Area ranged from 1 to 37 ppb (mean = 8, SD = 4); the model explained 54% of the variance in global NO2 concentrations (adjusted R2 = 0.54), with an absolute mean error of 3.7 ppb. This data set has been applied in recent health impact assessments quantifying the global burden of pediatric asthma incidence attributable to NO2. Because the global LUR model was not calibrated specifically for the United States, we also estimated results using a U.S.-specific LUR model (Bechle et al. 2015). Street View measurements were not used to develop the global LUR model; therefore, we did not expect the spatial distributions of the two concentration data sets to match. We report estimates using the global LUR model as the main results to inform best practices for neighborhood-scale health impact assessments in cities globally.
Although PM2.5 was not measured by the Street View cars, it is more spatially homogeneous than NO2 and can therefore be estimated using more coarsely resolved predictive models. We therefore used surface concentrations derived from satellite observations of AOD from both global (van Donkelaar et al. 2016) and U.S.-specific models (Di et al. 2016). The global PM2.5 data set [0.01° × 0.01° (~1 km2) resolution] combined AOD from three satellite products, Goddard Earth Observing System (GEOS)-Chem chemical transport modeling, and geographically weighted regression incorporating in situ PM2.5 measurements from surface monitors. The model accounted for 81% of the variance in PM2.5 and yielded annual average surface PM2.5 concentrations ranging from 3 to 18.5 μg/m3 (mean = 9, SD = 2.8) across the Bay Area for 2016. The global PM2.5 data set included BC within total PM2.5 mass but did not estimate it separately; the authors recently developed a North American product that uses similar methods to estimate PM2.5 and speciated PM2.5 components, also at 0.01° × 0.01° resolution. Although U.S. estimates for BC explained 68% of the total variance in BC, performance for BC in the U.S. Northwest was considerably lower (R2 = 0.29). For the North American data set in the Bay Area, BC concentrations for 2016 ranged from 0.1 to 0.7 μg/m3 (mean = 0.3, SD = 0.1), and PM2.5 concentrations were slightly lower than those from the global model, ranging from 2.9 to 11 μg/m3 (mean = 5.9, SD = 1.5). For PM2.5, we compared health burden estimates using global vs. North American satellite-derived estimates, whereas for BC, our main analysis compared the satellite-derived model with Street View mobile monitoring concentrations. Given that satellite-derived PM2.5 concentrations are highly uncertain (Diao et al. 2019), we also estimated results using a more statistically based PM2.5 model for the United States (Di et al. 2016, 2017).
Note: RRs are reported per 10 μg/m3 for PM2.5, per 10 ppb for NO2, and per 1 μg/m3 for BC. RRs for NO2 reported per 10 μg/m3 were converted to RRs per 10 ppb assuming an ambient air pressure of 1 atmosphere and a temperature of 25°C. Adults, 25-99 years of age; BC, black carbon; CBG, census block group; CI, confidence interval; CVD, cardiovascular disease; elderly, 65-99 years of age; ER, emergency room; NO2, nitrogen dioxide; PM2.5, fine particulate matter; pediatric, 0-17 years of age.
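The NO2 unit conversion in the table note follows from the ideal gas law. A minimal sketch, assuming a molar mass of 46.0055 g/mol for NO2 and rescaling the log-linear coefficient so that ln(RR) scales linearly with the concentration increment; the constant and function names are hypothetical.

```python
import math

R = 8.314462618      # J/(mol K), molar gas constant
T = 298.15           # K (25 degrees C)
P = 101_325.0        # Pa (1 atm)
MW_NO2 = 46.0055     # g/mol

# Molar volume of an ideal gas in L/mol at 25 degrees C and 1 atm (about 24.47 L/mol)
MOLAR_VOLUME_L = R * T / P * 1000.0
# Mass concentration (ug/m3) equivalent to 1 ppb NO2 (about 1.88)
UG_PER_PPB = MW_NO2 / MOLAR_VOLUME_L


def rr_per_10ug_to_per_10ppb(rr_per_10ug: float) -> float:
    """Rescale an NO2 RR from per 10 ug/m3 to per 10 ppb under a log-linear model."""
    beta_per_ug = math.log(rr_per_10ug) / 10.0       # ln(RR) per 1 ug/m3
    return math.exp(beta_per_ug * 10.0 * UG_PER_PPB)  # 10 ppb expressed in ug/m3
```

Because 10 ppb corresponds to roughly 18.8 μg/m3 at these conditions, an RR expressed per 10 ppb is larger than the same association expressed per 10 μg/m3.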

Baseline Disease Rates and Demographics
Maps of baseline disease rates for all health end points and spatial resolutions are provided in Figures S29-S35. We obtained all-cause and CVD mortality rates at both the CBG and county levels (Table S16). For the CBG level, we obtained counts and rates for all-cause and CVD mortality [CVD mortality defined by International Statistical Classification of Diseases, 10th Revision (ICD-10; WHO 2016) codes I10-I75] from the Alameda County Public Health Department for adults and the elderly. CBG rates were based on 7-y averages of death counts (2011-2017) over average population counts for 2012, 2014, and 2016 (Eayres and Williams 2004) and were age-adjusted to the 2000 U.S. standard population (Pickle and White 1995).
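Direct age standardization, as used for the CBG mortality rates, weights each age-specific rate by that age group's share of a standard population. A minimal sketch with hypothetical age groups and counts; the actual 2000 U.S. standard population weights and the age bands used by the Alameda County Public Health Department are not reproduced here, and the function name is hypothetical.

```python
def age_adjusted_rate(deaths_by_age, population_by_age, standard_pop_by_age,
                      per=100_000):
    """Directly age-standardized rate.

    deaths_by_age:       average annual deaths per age group (e.g., a 7-y total / 7)
    population_by_age:   average population per age group
    standard_pop_by_age: standard population counts per age group (weights)
    """
    total_standard = sum(standard_pop_by_age.values())
    rate = 0.0
    for age_group, std_pop in standard_pop_by_age.items():
        crude = deaths_by_age[age_group] / population_by_age[age_group]
        rate += crude * std_pop / total_standard   # weight by standard-population share
    return rate * per
```

Standardizing in this way lets CBGs with older or younger populations be compared on the same footing before the rates enter Equation 1.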
In addition, CBGs with counts <10 were suppressed to protect confidentiality (Brillinger 1986). Combined, these methods reduce interannual variability in small-area (CBG-level) baseline disease rates and resulted in a conservative mean relative standard error of 15 (range = 7-58, SD = 5) for 1,046 CBGs. For all-cause mortality ages ≥25 y, there were 6 (0.5%) missing block groups, and for all-cause mortality ages ≥65 y, there were 9 (0.86%) missing block groups. For CVD mortality, there were 37 (3.53%) missing block groups for ages ≥25 y and 71 (6.79%) missing block groups for ages ≥65 y. To impute missing CBG baseline disease rates, we used the average of the rates in the five nearest neighboring CBGs. We obtained age-adjusted county-level all-cause and CVD mortality data for 2016 from CDC WONDER (CDC 2018), using the CVD codes (ICD-10 I00-I78) that most closely matched our CBG disease categories. CBG baseline mortality rates showed more spatial heterogeneity than county rates: annual all-cause mortality for adults ranged from 29 to 331 per 10,000 across CBGs, compared with 21 to 38 per 10,000 across counties. We were unable to obtain CBG-level baseline disease rates for nonmortality end points. For CVD hospitalization rates, we used county-level rates from BenMap-CE 1.4.14 (BenMap), the U.S. EPA software for conducting health impact assessments (Sacks et al. 2018). Rates were available in BenMap for the elderly in 5-y age groups: ages 65-69, 70-74, 75-79, 80-84, and 85-99 y. The BenMap program uses 2010 U.S. Census data as the denominator when pooling age groups into a single rate. We applied the 5-y age group rates to the 10-y age groups (65-74, 75-84, and ≥85 y) available from the 2010 U.S. Census and used the U.S. Census data from BenMap as the denominator. We weighted the rates by age group count to create an aggregated CVD hospitalization rate for each county (n = 9). 
CVD hospitalization rates (ICD-9 codes 390-429) in 2014 ranged from 296 to 604 per 10,000 for ages 65-99 y across counties in the Bay Area. For asthma ER visits (ICD-9 code 493/ICD-10 code J45), we used county-level and ZIP-code-level rates from the California Department of Public Health for 2016 (CDPH 2017, 2019), and we used county rates to impute missing ZIP-code rates (17% of the pediatric population and 10% of the adult population). Across ZIP codes in the Bay Area, 2016 baseline rates of asthma ER visits ranged from 1 to 154 per 10,000 among children and from 1 to 175 per 10,000 among adults. For pediatric asthma incidence, we applied a California statewide baseline rate for 2008 of 107 per 10,000 persons (n = 96,550) (Milet et al. 2013) because more recent and finer-resolution data were not available. Preprocessing of baseline disease rates was conducted in ArcMap (version 10.4; Esri).
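The five-nearest-neighbor imputation of missing CBG mortality rates described above can be sketched as below. Measuring nearness as the Euclidean distance between CBG centroids in a projected coordinate system is an assumption, since the text does not state the distance metric; the function name and data layout are hypothetical.

```python
import math


def impute_knn(rates, centroids, k=5):
    """Fill missing CBG rates with the mean of the k nearest non-missing CBGs.

    rates:     dict cbg_id -> rate, or None where suppressed/missing
    centroids: dict cbg_id -> (x, y) in a projected (metre-based) coordinate system
    Only observed rates are used as donors, so imputed values never feed other imputations.
    """
    observed = [cbg for cbg, r in rates.items() if r is not None]
    filled = dict(rates)
    for cbg, rate in rates.items():
        if rate is not None:
            continue
        origin = centroids[cbg]
        nearest = sorted(observed, key=lambda c: math.dist(origin, centroids[c]))[:k]
        filled[cbg] = sum(rates[c] for c in nearest) / len(nearest)
    return filled
```

With only about 1% to 7% of CBGs missing, most donors sit in adjacent block groups, so the imputed rates inherit local rather than countywide patterns.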
We used nighttime (i.e., permanent resident) population counts from the LandScan USA data set at 100 m × 100 m resolution for 2017, the year that most closely aligned with the temporal coverage of our pollutant and baseline disease rate data sets (Oak Ridge National Laboratories 2020; Bhaduri et al. 2007). We considered the nighttime population to be more consistent than the daytime population with the common epidemiological approach of assigning exposure based on home address. LandScan USA employs a multidimensional dasymetric modeling technique that spatially redistributes U.S. Census data to inhabited land-use areas. Because LandScan USA does not include age breakdowns, we calculated the fraction of the total population in different age groups using age-specific counts from the Gridded Population of the World (version 4) for 2010, available at 1-km resolution from the Socioeconomic Data and Applications Center at the Center for International Earth Science Information Network at Columbia University (CIESIN 2019). To ascertain whether using fine-resolution baseline disease rates identifies disparities in pollutant-attributable disease between population subgroups, we estimated the percentage of pollutant-attributable cases in Alameda County occurring in CBGs with >50% minority (Black, Asian, Hispanic, Pacific Islander, and American Indian) population, using U.S. Census CBG-level population counts for 2010, the most recent year available.
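Splitting the LandScan totals by age amounts to applying the age fractions of the enclosing 1-km cell to each 100-m cell. A minimal sketch, assuming the age structure is uniform within each 1-km cell and unchanged between 2010 and 2017; the function name and example counts are hypothetical. Because the age groups in Table 1 overlap (adults 25-99 y include the elderly 65-99 y), fractions are taken relative to the 1-km cell's total population rather than the sum of the listed groups.

```python
def split_by_age(total_pop_100m, coarse_age_counts, coarse_total_pop):
    """Allocate a 100-m LandScan population total to (possibly overlapping) age groups.

    coarse_age_counts: age-specific counts for the 1-km GPW cell containing the
                       100-m cell, e.g., {"0-17": 250, "25-99": 600, "65-99": 150}
    coarse_total_pop:  total population of that 1-km cell (all ages)
    """
    return {age: total_pop_100m * count / coarse_total_pop
            for age, count in coarse_age_counts.items()}
```

For example, a 100-m cell with 200 residents inside a 1-km cell of 1,000 people, 250 of whom are 0-17 y, would be assigned 50 children.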
Because our intention was to inform best practices for cities around the world conducting within-city health impact assessments, we compared the spatial distributions of NO2- and BC-attributable disease burdens estimated using Street View vs. globally available modeled concentrations from LUR- (Larkin et al. 2017) and satellite-based models (van Donkelaar et al. 2016).
For this comparison, we used the Getis-Ord local statistic in ArcGIS Pro (version 2.7; Esri), which provides a Z-score with an accompanying p-value indicating whether each grid cell has estimated pollutant-attributable cases that are higher or lower than those of surrounding grid cells (Ord and Getis 1995). We also compared the influence of CBG vs. county-level disease rates on the spatial patterns of estimated air pollution-related all-cause mortality in Alameda County, given that baseline disease rates are not typically available at the CBG scale.
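For readers without ArcGIS Pro, one common form of the Getis-Ord local statistic, Gi* (which includes the focal cell in its own neighborhood), can be computed directly with NumPy. This sketch assumes binary weights over a square window around each grid cell; ArcGIS Pro offers several other spatial conceptualizations (e.g., fixed distance bands), and the text does not specify which was used, so z-scores from this sketch will not necessarily match those reported here. The function name is hypothetical.

```python
import numpy as np


def getis_ord_gi_star(values, radius=1):
    """Local Gi* z-scores for a 2-D grid.

    Binary weights: every cell within a (2*radius+1) x (2*radius+1) square window,
    including the focal cell, has weight 1; windows are truncated at grid edges.
    A constant grid (SD = 0) yields NaN.
    """
    x = np.asarray(values, dtype=float)
    n = x.size
    x_bar = x.mean()
    s = np.sqrt((x ** 2).mean() - x_bar ** 2)
    rows, cols = x.shape

    window_sum = np.zeros_like(x)    # sum_j w_ij * x_j
    window_count = np.zeros_like(x)  # sum_j w_ij (equals sum_j w_ij**2 for binary weights)
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            # focal cells whose neighbor at offset (dr, dc) lies inside the grid
            r0, r1 = max(0, -dr), min(rows, rows - dr)
            c0, c1 = max(0, -dc), min(cols, cols - dc)
            window_sum[r0:r1, c0:c1] += x[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
            window_count[r0:r1, c0:c1] += 1.0

    numerator = window_sum - x_bar * window_count
    denominator = s * np.sqrt((n * window_count - window_count ** 2) / (n - 1))
    return numerator / denominator


if __name__ == "__main__":
    grid = np.ones((10, 10))
    grid[4:7, 4:7] = 10.0                 # hypothetical hotspot of attributable cases
    z = getis_ord_gi_star(grid)
    print(np.round(z[5, 5], 2), np.round(z[0, 0], 2))
```

Cells with z > 1.96 or z < -1.96 correspond to hot or cold spots at two-sided p < 0.05.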
We also conducted several policy-relevant sensitivity analyses to assess the pollutant-attributable health impacts that could be avoided if air pollutant concentrations were reduced to lower levels. We specifically assessed two hypothetical scenarios in which concentrations of each pollutant were reduced to the minimum and median grid-cell-level concentrations of each data set. These scenarios are conceptually similar to pollution reduction targets for West Oakland established in the West Oakland Community Action Plan (BAAQMD and West Oakland Environmental Indicators Project 2019) as part of efforts by the State of California to identify and reduce air pollution among disproportionately exposed California communities, as required by 2017 Assembly Bill 617 (Cal AB 617 2017).
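The two reduction scenarios can be sketched by replacing each cell's Δx in Equation 1 with its excess over a target concentration. This is an assumption about how the counterfactual was set up (the text gives only the targets): cells already at or below the target contribute nothing, and the function names are hypothetical.

```python
import math
from statistics import median


def avoided_cases(cells, beta, target):
    """Cases avoided if every cell above `target` were reduced to `target`.

    cells: list of (concentration, baseline_rate_per_person, population) per grid cell
    beta:  ln(RR) per unit concentration
    """
    total = 0.0
    for conc, rate, pop in cells:
        delta = max(conc - target, 0.0)                  # only the excess above the target
        total += rate * pop * (1.0 - math.exp(-beta * delta))
    return total


def scenario_targets(cells):
    """Minimum and median grid-cell concentrations of one data set."""
    concs = [conc for conc, _, _ in cells]
    return {"minimum": min(concs), "median": median(concs)}
```

Reducing to the minimum always avoids at least as many cases as reducing to the median, because every cell's excess is at least as large under the lower target.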

Influence of Varying Pollutant Concentration Data Sets and Baseline Disease Rates
We next focused on West, Downtown, and East Oakland (~30 km2), the part of the Bay Area where Street View measurements of NO2 and BC are available and can be compared with concentrations from predictive models. Total estimated NO2-attributable deaths in Oakland approximately doubled when using the LUR [77 annual attributable deaths (95% CI: 23, 127)] compared with the Street View concentrations [39 (95% CI: 12, 66)] (Figure 2). NO2-attributable mortality rates ranged across 100-m grid cells by a factor of 11 and 26 using LUR and Street View estimates, respectively (LUR range = 32-342 annual attributable deaths per 100,000, mean = 101, SD = 31; Street View range = 15-396, mean = 64, SD = 39).
Despite moderate-to-low correlation between concentrations from the Street View monitoring and predictive models, we found similar spatial clusters of NO2- and BC-attributable fractions using both concentration data sets. For the grid cells in the Oakland area for which we had both Street View and LUR (NO2) or U.S. satellite-derived (BC) results, a large fraction of grid cells (NO2: 45%, n = 1,619; BC: 37%, n = 1,334) were fully concordant using the Z-score statistic, meaning that both concentration data sets identified clusters of attributable cases that were both significant (p < 0.05) and in the same direction (either a higher or lower value cluster) (Figures S36 and S37). Another 37% (n = 1,309) and 22% (n = 799) of grid cells for NO2 and BC, respectively, had directional concordance, but the concentration data sets identified differing significance in the clusters. We also found that 13% (n = 452) and 15% (n = 546) of grid cells for NO2 and BC, respectively, were directionally discordant between the two concentration data sets but had the same significance level. About 5% (n = 193) and 25% (n = 899) of grid cells for NO2 and BC, respectively, were completely discordant. Although both data sets identified similar hotspots, the Street View data set yielded a wider range of Z-scores (NO2 range = −3.98 to 10.02, mean = −0.06, SD = 2.42; BC range = −3.80 to 12.77, mean = −0.09, SD = 2.20) than the LUR and satellite-derived concentrations (NO2 range = −4.07 to 7.51, mean = 0.19, SD = 2.17; BC range = −8.46 to 2.53, mean = −0.04, SD = 2.06).
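The four concordance categories can be expressed as a comparison of the sign and significance of paired Z-scores for each grid cell. A sketch under stated assumptions: significance is taken as two-sided p < 0.05 (|z| > 1.96), and a cell where both data sets give non-significant Z-scores of the same sign is counted as fully concordant. The text defines full concordance in terms of significant clusters, so the authors' exact binning of such cells may differ; the function names are hypothetical.

```python
from collections import Counter

Z_CRIT = 1.96  # two-sided p < 0.05


def classify_pair(z_a: float, z_b: float) -> str:
    """Compare one grid cell's Z-scores from two concentration data sets."""
    same_direction = (z_a >= 0) == (z_b >= 0)
    same_significance = (abs(z_a) > Z_CRIT) == (abs(z_b) > Z_CRIT)
    if same_direction and same_significance:
        return "fully concordant"
    if same_direction:
        return "directionally concordant, differing significance"
    if same_significance:
        return "directionally discordant, same significance"
    return "completely discordant"


def concordance_shares(z_pairs):
    """Fraction of grid cells falling in each concordance category."""
    counts = Counter(classify_pair(a, b) for a, b in z_pairs)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}
```

Because every pair falls into exactly one category, the shares sum to 1, matching the way the percentages above account for (nearly) all grid cells.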
We next estimated the percentage of pollutant-attributable mortality and morbidity cases occurring in CBGs with >50% minority population. Using CBG instead of county baseline disease rates resulted in a larger percentage of pollutant-attributable cases in CBGs with >50% minority population in Alameda County (Table S17 and Figures S38-S40). For example, using CBG disease rates, we estimated that 75% of NO2-attributable mortality occurred in majority-minority CBGs, compared with 72% when using county-level disease rates. The differences between applications of CBG vs. county-level disease rates were small but generally consistent across pollutants and health end points. Within Oakland, CBGs with the highest percentage of minority residents and the highest estimated NO2-attributable mortality rates were located in West Oakland near I-880, a high-traffic-volume truck route, and in Chinatown, in the southeastern part of Downtown Oakland (Figures S41 and S42).
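The majority-minority percentages above are ratios of summed attributable cases. A minimal sketch, assuming grid-cell cases have already been summed to CBGs and that minority population counts come from the 2010 Census; the function name and example numbers are hypothetical.

```python
def share_in_majority_minority(cbgs, threshold=0.5):
    """Percentage of attributable cases located in CBGs whose minority share exceeds `threshold`.

    cbgs: iterable of (attributable_cases, minority_population, total_population)
    """
    total = 0.0
    in_majority_minority = 0.0
    for cases, minority, population in cbgs:
        total += cases
        if population > 0 and minority / population > threshold:
            in_majority_minority += cases
    return 100.0 * in_majority_minority / total
```

Running the same calculation twice, once with cases computed from CBG rates and once from county rates, gives the kind of comparison reported above (e.g., 75% vs. 72% for NO2-attributable mortality).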

Discussion
We estimated the spatial distribution of NO2-, BC-, and PM2.5-attributable health impacts at the neighborhood scale within the Bay Area, California, where high-spatial-resolution concentrations from mobile monitoring and CBG-level disease rate data sets are available. We found 38-, 4-, and 5-fold variation in mortality attributable to NO2, BC, and PM2.5, respectively, across grid cells in the Bay Area, indicating that pollution-attributable risks can vary considerably within individual cities. This variation was observable regardless of whether predictive models or mobile monitoring concentration data sets were used, although the mobile monitoring concentrations revealed more spatial heterogeneity. Spatial heterogeneity in air pollution-attributable health risks was more pronounced when we applied CBG rather than county-level baseline disease rates. Depending on the concentration and baseline disease data sets used, estimated mortality rates (annual deaths per 100,000) in Oakland at the 100-m grid-cell level varied by a factor of 2-26 for NO2, 2-39 for BC, and 2-8 for PM2.5. We found the least heterogeneity using county baseline disease rates and concentration estimates from a global model, and the greatest variation using Street View concentrations with CBG baseline disease rates. Using concentrations from Street View mobile monitoring and predictive models yielded similar spatial patterns in air pollution-attributable health risks, in part because baseline disease rates, which were the same for both concentration data sets, also play an important role.
Comparing the influence of baseline disease rates and concentrations on spatial distribution, we found that neighborhoods with the highest air pollution-attributable health risks were not necessarily those with the highest pollutant concentrations. Each additional input to the health impact function changed the spatial distribution of the estimated health burden. For example, calculating the percentage of mortality that can be attributed to air pollution incorporates only the CRF and concentrations (Figure 3A). When baseline disease rates were also incorporated to estimate attributable mortality rates (Figure 3B), areas of elevated risk shifted to different parts of West and Downtown Oakland. Finally, when population was included to estimate pollutant-attributable disease burdens (Figure 3C), spatial patterns shifted yet again. Therefore, considering concentrations alone, without incorporating baseline disease rates and population distribution, may not adequately identify the neighborhoods with the greatest pollutant-attributable health risks and burdens. The relative importance of disease rates and concentrations for spatial heterogeneity in risk estimates depended on the data set and its spatial resolution, as well as on the risk metric used (attributable fraction, attributable rate, or attributable cases).
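The three risk metrics named above add one input at a time. The following is a minimal sketch that assumes the commonly used log-linear health impact function, in which the attributable fraction is 1 - exp(-beta * deltaC) and beta = ln(RR) / increment. The RR, the grid cells, and all names are hypothetical. The sketch shows how the more polluted cell can lead on attributable fraction (Figure 3A-style) but not on attributable rate or cases (Figure 3B and 3C-style).

```python
import math
from dataclasses import dataclass


@dataclass
class GridCell:
    conc: float            # annual mean concentration (units must match the RR increment)
    baseline_rate: float   # baseline deaths per person per year
    population: float      # residents in the cell


def beta_from_rr(rr: float, increment: float) -> float:
    """Log-linear slope implied by a relative risk per `increment` of concentration."""
    return math.log(rr) / increment


def attributable_fraction(conc: float, beta: float, counterfactual: float = 0.0) -> float:
    """Share of baseline deaths attributable to concentrations above the counterfactual."""
    delta = max(conc - counterfactual, 0.0)
    return 1.0 - math.exp(-beta * delta)


def attributable_rate_per_100k(cell: GridCell, beta: float, counterfactual: float = 0.0) -> float:
    """Attributable deaths per 100,000 residents (adds the baseline rate)."""
    return attributable_fraction(cell.conc, beta, counterfactual) * cell.baseline_rate * 100_000


def attributable_cases(cell: GridCell, beta: float, counterfactual: float = 0.0) -> float:
    """Attributable deaths per year (adds population)."""
    return attributable_fraction(cell.conc, beta, counterfactual) * cell.baseline_rate * cell.population


# Hypothetical cells: A is more polluted; B has a higher baseline rate and more residents.
beta = beta_from_rr(1.10, 10.0)  # assumed RR of 1.10 per 10 concentration units
a = GridCell(conc=20.0, baseline_rate=0.005, population=100)
b = GridCell(conc=10.0, baseline_rate=0.012, population=300)

for name, cell in [("A", a), ("B", b)]:
    print(name,
          round(attributable_fraction(cell.conc, beta), 3),
          round(attributable_rate_per_100k(cell, beta), 1),
          round(attributable_cases(cell, beta), 3))
# A 0.174 86.8 0.087
# B 0.091 109.1 0.327
```

The `counterfactual` argument is one simple way to represent a low-concentration threshold: concentrations at or below it contribute no attributable risk.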
Our aggregated estimate of 3,080 (95% CI: 2,100, 4,020) PM2.5-attributable annual deaths in the Bay Area was approximately double a previously published estimate of 1,500 deaths attributable to PM2.5 in San Francisco that used satellite-derived PM2.5 concentrations from the Global Burden of Disease Study (Shaddick et al. 2018). Our estimate was also higher than the BAAQMD (2017) Clean Air Plan estimate of ∼2,500 annual deaths attributable to anthropogenic PM2.5 emissions, which used county-level baseline disease rates and population estimates, a Community Multiscale Air Quality (CMAQ) model estimate of PM2.5, and the mean of 12 different CRFs (BAAQMD 2017). Analyses of various PM2.5 concentration estimates have indicated substantial differences in spatial patterns, complicating comparability between risk assessments (Diao et al. 2019). We also used different disease rates, concentration-response functions, and low-concentration thresholds than these previous assessments.
Mobile monitoring offers a spatially explicit observational record but has incomplete spatial coverage. Predictive models using land-use variables and satellite remote sensing have the advantage of complete spatial coverage, but their estimated concentrations are uncertain. In areas where Street View and predictive model data overlapped for the same pollutant (NO2), we found higher NO2-attributable deaths when using the LUR [77 annual deaths (95% CI: 23, 137)] than when using the Street View concentrations [39 (95% CI: 12, 66)]. NO2-attributable mortality rates estimated using Street View and LUR were more strongly correlated (R = 0.67) than the underlying concentrations, owing to the smoothing effect of applying the same baseline disease rates to both. These results indicate that the Street View data set detects extremes in concentrations and associated health burdens that the LUR concentration data set does not identify. Comparing these results was complicated by inherent differences in the data sets. First, NO2 concentrations decreased between 2011, the year of the LUR concentration estimates, and 2015, when the Street View mobile monitoring occurred (Duncan et al. 2016). Second, the Street View data set captured high near-roadway exposures, whereas the LUR model represented broader spatial average concentrations with shallower concentration gradients with distance from main thoroughfares and highways, resulting in higher concentrations in residential areas. In addition, Street View measurements were taken during the daytime, which may underestimate daily NO2 concentrations by 15-20% because photolysis depresses daytime ambient NO2. However, this underestimate may have been partly offset because the Street View data set excluded weekends, when concentrations are lower. Given these differences, NO2 concentrations from the two data sets were not well correlated (R = 0.55), and the LUR concentrations were higher overall, with less spatial variability.
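The smoothing effect described above arises because both concentration data sets are multiplied by the same spatially varying baseline rates, which induces correlation in the products. The following toy sketch isolates that mechanism. The concentrations are deliberately constructed to be uncorrelated, and the RR and baseline rates are hypothetical. Real data sets would not behave this cleanly.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def attributable_rate(conc, baseline_per_100k, beta):
    """Log-linear attributable mortality rate per 100,000 (zero counterfactual)."""
    return (1.0 - math.exp(-beta * conc)) * baseline_per_100k


beta = math.log(1.10) / 10.0               # hypothetical RR of 1.10 per 10 units
baseline = [200.0, 200.0, 800.0, 800.0]    # hypothetical shared baseline rates per 100,000
conc_mobile = [20.0, 30.0, 20.0, 30.0]     # hypothetical mobile-monitoring cells
conc_model = [30.0, 20.0, 20.0, 30.0]      # hypothetical predictive-model cells

rates_mobile = [attributable_rate(c, y, beta) for c, y in zip(conc_mobile, baseline)]
rates_model = [attributable_rate(c, y, beta) for c, y in zip(conc_model, baseline)]

print(round(pearson(conc_mobile, conc_model), 2))    # 0.0
print(round(pearson(rates_mobile, rates_model), 2))  # 0.99
```

The larger the spatial variation in baseline rates relative to the variation in concentrations, the more the shared rates dominate the correlation between the two sets of risk estimates.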
Our study was limited in several ways. Although concentration and population data sets are increasingly available at high resolution, baseline disease rates are still difficult to obtain at urban and intra-urban scales. For example, for asthma incidence, we were only able to apply a statewide incidence rate, although asthma prevalence data show spatial heterogeneity within the Bay Area and California. Expanding disease surveillance and increasing access to highly resolved baseline disease rates in cities around the world would improve health impact assessment estimates and the capacity to detect areas within cities that have elevated pollution-attributable health risks. Our analysis of racial disparities in air pollution health risks was also limited because we did not incorporate race- and ethnicity-specific baseline disease rates or RR estimates. In addition, we applied CRFs from meta-analyses and large, nationwide cohort studies that were of high quality and had high statistical power, although their study populations may not have matched the population distribution in our analysis, introducing additional unquantifiable uncertainty.
Our pollutant-specific estimates cannot be summed because we applied single-pollutant epidemiological models; as such, there could be substantial overlap between the deaths estimated to be attributable to each individual pollutant. Some of the relationship between NO2 and adverse health outcomes may be accounted for by concurrent PM2.5 exposures, resulting in overlap between the attributable deaths presented for NO2 and PM2.5, although PM2.5 alone does not fully capture the effects of near-road traffic pollutants that are more strongly correlated with NO2. Similarly, estimates presented here for PM2.5 are inclusive of BC and other PM2.5 species. Therefore, results for BC should be interpreted as a subset of PM2.5-attributable health outcomes. We calculated results for both BC and PM2.5 because, although PM2.5 has multiple sources, BC is a combustion-related particle that represents the impacts of traffic-related PM2.5 as opposed to pollution from other regional sources (Janssen et al. 2011). The low magnitude of the BC results precluded us from drawing strong conclusions from this comparison. The high spatial heterogeneity of the Street View BC concentrations resulted in poor correlation between the two data sets (R = 0.03), although the smoothing effect of applying the same CBG baseline disease rates to each data set increased the correlation between BC-attributable mortality rates (R = 0.36). The effective use of a two-pollutant CRF in health impact assessment relies upon RR estimates from studies able to meaningfully parse relationships between correlated pollutants (Dominici et al. 2010; Stafoggia et al. 2017), which were not available for most of our pollutant-health outcome pairs. In one sensitivity analysis for which a two-pollutant model CRF for NO2 was available, we applied a PM2.5-adjusted RR estimate for NO2 and CVD mortality among the elderly from Eum et al. (2019), which resulted in lower total attributable case estimates (254 attributable all-cause deaths within Alameda County using the unadjusted CRF vs. 121 deaths using the adjusted CRF), indicating that a considerable portion of the burden estimated for NO2 may be attributable to PM2.5.
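The size of the two-pollutant adjustment can be expressed as the share of the unadjusted NO2 burden that remains. The following short calculation uses only the two Alameda County figures reported above. It is illustrative arithmetic, not an additional analysis.

```python
unadjusted_deaths = 254  # Alameda County, unadjusted NO2 CRF (reported above)
adjusted_deaths = 121    # Alameda County, PM2.5-adjusted NO2 CRF (Eum et al. 2019)

remaining = adjusted_deaths / unadjusted_deaths  # share of the burden retained
reduction = 1 - remaining                        # share removed by the adjustment
print(f"{remaining:.0%} retained, {reduction:.0%} reduction")  # 48% retained, 52% reduction
```

On this comparison, roughly half of the unadjusted NO2-attributable burden is no longer attributed to NO2 once PM2.5 is accounted for. Because the adjusted and unadjusted CRFs come from different studies, the figure should be read as indicative rather than as a precise overlap estimate.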
As demonstrated, pollutant concentrations varied substantially within cities, whereas air pollution cohort studies in the United States have often compared exposures between cities. No long-term North American cohort studies have analyzed within-city variation in RR estimates for PM2.5 and all-cause mortality (Vodonos et al. 2018). The meta-analysis we employed for the relationship between NO2 and all-cause mortality relied upon a mix of studies examining within-city (i.e., the ESCAPE cohorts) and between-city (i.e., the Harvard Six Cities cohort) exposure comparisons. As a sensitivity analysis, we estimated results using a within-city RR estimate (Table 1) from a subcohort of the Canadian Census Health and Environment Cohort, which found significant relationships between NO2 and all-cause mortality for within-city exposure comparisons but not for between-city exposure comparisons (Crouse et al. 2015). This within-city RR estimate was also included in the meta-analysis for our main CRF. Although the choice of CRF changed the overall magnitude of aggregated air pollution-attributable health impacts for each pollutant, it did not affect our main conclusions about intra-urban heterogeneity because we applied the CRF uniformly across the domain. Future use of statistical methods able to assess correlated exposures (Stafoggia et al. 2017) will allow for improved application of two-pollutant model estimates in health impact assessment and policy-making.
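Applying a single CRF uniformly mainly rescales the estimates: for small beta * deltaC, the log-linear attributable rate is close to proportional to beta. The following toy sketch uses hypothetical cells and two hypothetical RRs. It shows the total burden changing while the ranking of cells stays the same. Because the attributable fraction is concave in concentration, rank stability is not guaranteed for every data set, especially with large RRs or concentrations.

```python
import math


def attributable_rate(conc, baseline_per_100k, rr, increment=10.0):
    """Log-linear attributable mortality rate per 100,000 (zero counterfactual)."""
    beta = math.log(rr) / increment
    return (1.0 - math.exp(-beta * conc)) * baseline_per_100k


def rank_cells(rates):
    """Cell indices ordered from highest to lowest attributable rate."""
    return sorted(range(len(rates)), key=lambda i: rates[i], reverse=True)


# Hypothetical cells: (concentration, baseline deaths per 100,000 per year)
cells = [(10.0, 600.0), (20.0, 500.0), (30.0, 300.0), (15.0, 900.0)]

for rr in (1.05, 1.20):  # two hypothetical CRFs, each applied to every cell
    rates = [attributable_rate(c, y, rr) for c, y in cells]
    print(rr, round(sum(rates), 1), rank_cells(rates))
# Both RRs give the cell order [3, 1, 2, 0]; only the total burden differs.
```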
Interpreting and communicating the uncertainties in a health impact assessment is a known challenge (Nethery and Dominici 2019) because each input parameter to the health impact function carries its own uncertainty. We estimated uncertainty using the CIs of the RR estimates, but we were unable to quantify uncertainty in the pollutant concentrations, baseline disease rates, and population estimates. Gridded population estimates are also increasingly available at fine spatial resolution. Prior to selecting a population data set, we examined WorldPop (Tatem 2017) estimates for 2016, which are also available at a 100 m × 100 m resolution. Although the WorldPop estimates were provided at high resolution, we found that they lacked the spatial heterogeneity present in other population data sets. Although population estimates remain a source of uncertainty in our assessment, we believe that using a data set specific to the United States that incorporates both census and satellite data reduced part of this uncertainty. Among our pollutant concentration data sets, the Street View data set included only daytime, weekday measurements and therefore may not have fully captured long-term annual averages. The LUR data set incorporates in situ data, although its concentrations are ultimately estimates of the sum of all oxidized atmospheric odd-nitrogen species (NOy) rather than observations of actual NO2 concentrations (Dickerson et al. 2019). Future work can make use of more spatially refined estimates of PM2.5 (Di et al. 2019). In addition, satellite-derived NO2 models have now been developed (Di et al. 2020) that can be compared with results from LUR models.
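Propagating only the RR confidence interval, as described above, amounts to re-evaluating the health impact function at the lower and upper RR bounds. Because the attributable fraction increases monotonically with RR, those evaluations bound the estimate. The following is a minimal sketch assuming a log-linear function. The RR, its CI, the concentration, the baseline rate, and the population are hypothetical, and all other inputs are held fixed, mirroring the limitation noted above.

```python
import math


def attributable_deaths(rr, conc, increment, baseline_rate, population):
    """Log-linear HIF: deaths attributable to `conc` above a zero counterfactual."""
    beta = math.log(rr) / increment
    return (1.0 - math.exp(-beta * conc)) * baseline_rate * population


# Hypothetical inputs: RR 1.10 (95% CI: 1.03, 1.17) per 10 units, applied to
# 10,000 residents with 800 baseline deaths per 100,000 per year.
rr_central, rr_low, rr_high = 1.10, 1.03, 1.17
conc, increment = 10.0, 10.0
baseline_rate, population = 800 / 100_000, 10_000

estimates = {
    label: attributable_deaths(rr, conc, increment, baseline_rate, population)
    for label, rr in [("central", rr_central), ("lower", rr_low), ("upper", rr_high)]
}
print({k: round(v, 1) for k, v in estimates.items()})
# {'central': 7.3, 'lower': 2.3, 'upper': 11.6}
```

Uncertainty in concentrations, baseline rates, or population would require a different approach (for example, Monte Carlo sampling over those inputs), which this sketch does not attempt.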
In addition, we assumed a causal relationship between NO2 and all-cause mortality, although the putative agent(s) in the traffic-related pollution mixture are unknown, adding to uncertainty in our estimates. Epidemiological studies often use NO2 as a marker of traffic-related air pollution because it is easily measured and for consistency in characterizing spatial patterns in traffic-related air pollution (Beckerman et al. 2008; Levy et al. 2014). However, none of the studies in the meta-analysis we used to derive RRs for NO2 and all-cause and CVD mortality adjusted for traffic-related particles or other chemicals in the traffic pollution mixture, including BC and PM2.5. It therefore remains unclear whether NO2 itself is associated with mortality or whether NO2 serves as a proxy for other elements of the traffic-related air pollution mixture. Following COMEAP recommendations, the NO2 mortality impacts should be interpreted as a metric of the overall mortality burden due to the mixture of near-field traffic-related air pollution.
Another challenge with hyperlocal air pollution health impact assessment that requires further exploration is accurately capturing pollution exposure given population movement. We believe this limitation was mitigated for two reasons: a) air pollution disproportionately affects the very young and very old, who tend to stay closer to home throughout the day (Chambers et al. 2017; Spalt et al. 2016); and b) most air pollution epidemiological studies use residential address to assign exposure, so accounting for population movement would be inconsistent with the epidemiological studies from which we drew concentration-response relationships. In addition, using only residential address in exposure assessment within epidemiological studies has been found to underestimate the health effects of PM2.5 by about 10% (Nyhan et al. 2019). Similarly, without information on the time-varying activity patterns of our population, we were unable to incorporate time-activity data in our risk assessment; however, exposure misclassification likely contributed less than other variables (e.g., RRs) to uncertainty in our health impact results. Our results indicate that population movement out of highly polluted areas may substantially reduce the population's pollutant-attributable health burden. This points to the need for more epidemiological analyses using exposure assessment techniques beyond central site monitors, as well as techniques that account for people's movements rather than assigning exposure at the residential address. This factor should be explored in greater detail to understand how population movement affects actual exposure levels and estimated health impact assessment results.
A related limitation is that the high-resolution concentration data sets we used did not match the exposure assessment techniques used in the epidemiological studies from which we derived our CRFs. Those studies most frequently used stationary monitors and, increasingly, LUR and satellite-based models, which may be more coarsely resolved than the data sets we used here. Thus, the CRFs we applied may be inconsistent with our exposure estimates.
This work suggests several best practices for conducting air pollution health impact assessments in cities globally. First, we found that applying fine-scale mobile monitoring or LUR- and satellite-derived air pollution data in health impact assessment reveals large and unequal distributions of the air pollution burden in cities. This indicates that the spatial distribution of air pollution impacts could be routinely assessed in city air quality health impact assessments. Second, the distributions of air pollution and of its risks and burdens did not follow the same patterns, owing to large underlying spatial health disparities (reflected in baseline disease rates) and population distribution. Although most research in this area has focused on producing increasingly fine-resolution estimates of air pollutant concentrations, similar emphasis has not been given to estimating or measuring spatial patterns of disease. Ignoring health disparities results in underestimating air pollution impacts in areas already burdened by poor health and masks the disproportionate impact faced by disadvantaged communities within cities. This has important environmental justice implications, and local disease rates should be incorporated as a best practice into city air pollution health impact assessments.

Conclusions
We found that air pollution health risks vary considerably within cities and that information on the spatial distribution of pollutant concentrations alone is insufficient to identify areas of elevated risk and burden of disease attributable to air pollution. We anticipate that these findings will apply to other health impact assessments conducted at the local scale, given that spatial heterogeneity in disease rates is not unique to the Bay Area. Pollutant concentrations from predictive models and from mobile monitoring measurements identified similar spatial patterns of pollution-attributable disease because disparities in baseline disease rates drive a substantial portion of the heterogeneity in air pollution-attributable health risks. For areas with limited resources or where intensive mobile monitoring is not feasible, LUR- and satellite-derived models may be sufficient for identifying intra-urban areas of elevated risk, but additional research is needed to determine whether these findings hold in other areas. In addition, LUR- and satellite-derived models typically do not account for the mix of vehicle types and traffic volume and, therefore, may be improved with information about roadway concentrations captured in mobile monitoring data sets. Future work may seek to integrate multiple sources of pollutant concentration information to leverage the advantages of each. It is also important to expand reporting of disease rates at subcity scales.