Mapping Human Vulnerability to Extreme Heat: A Critical Assessment of Heat Vulnerability Indices Created Using Principal Components Analysis

Background: Extreme heat poses current and future risks to human health. Heat vulnerability indices (HVIs), commonly developed using principal components analysis (PCA), are mapped to identify populations vulnerable to extreme heat. Few studies critically assess implications of analytic choices made when employing this methodology for fine-scale vulnerability mapping. Objective: We investigated sensitivity of HVIs created by applying PCA to input variables and whether training input variables on heat–health data produced HVIs with similar spatial vulnerability patterns for Detroit, Michigan, USA. Methods: We acquired 2010 Census tract and block group level data, land cover data, daily ambient apparent temperature, and all-cause mortality during May–September, 2000–2009. We used PCA to construct HVIs using: a) “unsupervised”—PCA applied to variables selected a priori as risk factors for heat-related health outcomes; b) “supervised”—PCA applied only to variables significantly correlated with proportion of all-cause mortality occurring on extreme heat days (i.e., days with 2-d mean apparent temperature above month-specific 95th percentiles). Results: Unsupervised and supervised HVIs yielded differing spatial vulnerability patterns, depending on selected land cover input variables. Supervised PCA explained 62% of variance in the input variables and was applied on half the variables used in the unsupervised method. Census tract–level supervised HVI values were positively associated with increased proportion of mortality occurring on extreme heat days; supervised PCA could not be applied to block group data. Unsupervised HVI values were not associated with extreme heat mortality for either tracts or block groups. Discussion: HVIs calculated using PCA are sensitive to input data and scale. Supervised HVIs may provide marginally more specific indicators of heat vulnerability than unsupervised HVIs. 
PCA-derived HVIs address correlation among vulnerability indicators, although the resulting output requires careful contextual interpretation beyond generating epidemiological research questions. Methods with reliably stable outputs should be leveraged for prioritizing heat interventions. https://doi.org/10.1289/EHP4030


Introduction
Extreme heat poses a current and future threat to human health (Crimmins et al. 2016). In response to this threat, public health practitioners and researchers are tasked with developing preparedness, response, and mitigation plans and policies that protect those who are experiencing and who will experience most of the health burden related to extreme temperatures.
Considerable progress has been made in recent years to understand the relationship between extreme heat and human health (Anderson and Bell 2011; Anderson et al. 2013), and findings from epidemiological studies have laid the groundwork for identifying population characteristics associated with adverse effects of extreme heat on human health (Basu and Samet 2002; Curriero et al. 2002; Gronlund 2014; Ostro et al. 2009; Schwartz 2005). Socioeconomic and demographic factors such as older age (Bouchama and Knochel 2002; Ostro et al. 2009), racial and ethnic minority status, low income, having less than a high school education (Hajat et al. 2005; Semenza et al. 1996; Stafoggia et al. 2006), being unmarried (Jones et al. 1982), air conditioning prevalence (O'Neill et al. 2005), and social factors such as living alone or having access to transportation (Semenza et al. 1996) have been associated with increased risk of mortality during extreme heat events. Measures of green space such as the percent impervious surface (Barnett et al. 2006; Hass et al. 2016) and having access to green space (Dadvand et al. 2016; Medina-Ramon and Schwartz 2007; Yeager et al. 2018) have garnered attention as protective area-level characteristics. Further, there is widespread agreement that the distribution of these heat-related risks varies across populations and communities (Ebi et al. 2018). With numerous variables to consider when determining a population's risk of health effects from extreme heat, public health practitioners who are looking to translate research into actionable preventive programs are challenged to simplify a complex relationship.
A commonly used approach to assess human health risk during hot weather is to characterize it in terms of measurable vulnerability. Vulnerability research grew in popularity in the context of social vulnerability, a unitless measure of the extent to which a population is resilient to natural disasters and hazards (Cutter et al. 2003; Flanagan et al. 2011). Although vulnerability can be defined in numerous ways (Cutter et al. 2003; Fussel 2007; Harlan et al. 2006; NRC 2010), it broadly consists of environmental, demographic, and population-specific health and societal characteristics. One definition of vulnerability is presented by Wilhelmi and Hayden (2010), who define vulnerability within a multifaceted top-down and bottom-up framework that draws on populations' exposure, sensitivity, and adaptive capacity. The dynamic interactions between exposure, sensitivity, and adaptive capacity make characterizing vulnerability (a fluid, population- and locale-specific concept) challenging.
In the United States, state and local public health practitioners are identifying populations and locations most vulnerable to environmental hazards such as extreme heat to design and implement protective interventions (Manangan et al. 2014). Public health departments in Michigan (Seroka et al. 2011), Minnesota (Minnesota Climate and Health Program 2012), New York State (Nayak et al. 2018), and San Francisco, California (San Francisco Department of Public Health 2013), for instance, have drawn from and used the methods put forth in Reid et al. (2009) to develop heat vulnerability indices and maps that consider current and future climate conditions. These methods have become the conventional approaches for incorporating environmental, demographic, and socioeconomic data to capture population-level heat vulnerability. Individually, characteristics associated with vulnerability can be quantitatively and qualitatively measured. Because of data limitations, vulnerability is often represented via proxy indicators rather than a direct measure. For example, a researcher may choose to represent heat exposure indirectly through estimating prevalence of impervious surfaces or lack of vegetation, both of which are associated with the urban heat island, or more directly through remotely sensed land surface temperatures (Bao et al. 2015; Wolf and McGregor 2013). These measures are then aggregated to create a single index of vulnerability. Characterizing vulnerability as a single measure is often discussed, or actually used, as a tool for translating research into action via vulnerability maps (Harlan et al. 2013; Johnson et al. 2012; Reid et al. 2009; Wolf and McGregor 2013) that can inform policy and planning (Bradford et al. 2015; Hoppe et al. 2018; Johnson et al. 2012; Nayak et al. 2018).
Principal components analysis (PCA) is a technique commonly used to construct heat vulnerability indices (Harlan et al. 2013; Reid et al. 2009). PCA is a dimension-reduction technique that can distill multiple, potentially correlated variables into new, independent constructs/factors; typically, the number of constructs is much smaller than the number of variables in the original data set. This technique can be an appealing approach for handling heat vulnerability data sets. The growing use of PCA to construct vulnerability indices for heat could extend to other climate-related exposures, such as floods, aeroallergens, and wildfires. Despite the increasing trend in developing single measures of vulnerability via indices (Bao et al. 2015), there have been few studies that have assessed the appropriateness of the methods or reliability of the products themselves (Reid et al. 2012; Tate 2012).
Social vulnerability indices that have been constructed using PCA and non-PCA methods (Cutter et al. 2003; Flanagan et al. 2011) have been assessed. Validation studies of social vulnerability indices have indicated that the mapped products are sensitive to input data, suggesting that they should be interpreted with caution (Schmidtlein et al. 2008; Tate 2012). Although different methodologies for constructing vulnerability indices exist, here we focus on PCA, a methodology that is commonly used to construct HVIs, and conduct a critical assessment of indices produced with input data that were intended to capture similar constructs relevant to heat exposure (i.e., vegetated land cover or lack thereof) but were derived from different publicly available data sources.
In recognition of the growing interest in identifying intraurban patterns of heat-related vulnerability, we explore three questions regarding PCA-derived heat vulnerability indices, using Detroit, Michigan, USA, as a case study. First, how and to what extent are heat vulnerability indices sensitive to physical environment input variables, specifically land cover measures, when describing spatial patterns of heat vulnerability? Second, what is the relationship between a heat vulnerability index (HVI) and all-cause mortality (2000-2009) on extreme heat days at both fine (i.e., block group) and neighborhood (i.e., tract) levels? Third, does screening for which variables are used when creating a heat vulnerability index based on their association with the health outcome (i.e., a supervised HVI) produce the same spatial patterns?

Study Location
Detroit, which covers 142 square miles, is home to roughly 670,000 residents, more than 80% of whom are African American; roughly 35% live below the poverty line; and about 14% are over the age of 65 (www.census.gov/acs). Although located in the northern United States, it is common for Detroit to experience prolonged periods of heat and high humidity during the summer months. The City of Detroit and neighboring Southeast Michigan municipalities have been planning for heat events via an established network of cooling centers, outreach and education, utility assistance programs, and community emergency response teams (Sampson et al. 2013). The demographic and socioeconomic profiles of the resident population reflect a high level of sensitivity to high temperatures, suggesting this population is particularly at risk during extreme heat events (Gronlund et al. 2015).

Data Sources and Variable Selection
We created HVIs that represent the period between 2000 and 2009 in Detroit, Michigan. The Cities of Highland Park and Hamtramck, which are located within the boundaries of the City of Detroit, were treated as part of the City of Detroit for this analysis. Following the methodology established by Reid et al. (2009), we determined a priori the variables to include in the calculation of the HVI, with the addition of four different variables to represent nongreen space: nontree canopy, nonvegetation including water, nontrees, and distance to water (Table 1). Variables were defined so that an increase in value would correspond to a hypothesized increase in heat vulnerability. Demographic variables were extracted from the American Community Survey (ACS) 5-y estimates for 2006-2010, for the census tract and block group geographies for the City of Detroit, Michigan (www.census.gov/acs). Our analysis was conducted at both the census tract and block group levels given the interest in understanding intraurban patterns of heat vulnerability (Christenson et al. 2017; Johnson et al. 2012).
Environmental Health Perspectives 097001-2 128(9) September 2020
Variables included proportions of the following groups in each tract or block group: over the age of 65, living alone, over the age of 65 and living alone, less than a high school education, living at or below the poverty level, and of race/ethnic minority status. Minority status (in relation to the Metropolitan Statistical Area; www.census.gov/acs) was defined as being not white and not Hispanic.
Variables to represent heat exposure were obtained from different sources and iteratively included in the calculation of the HVI to assess the sensitivity of PCA to input variables. Land cover, including the prevalence of impervious surface and nonvegetated land cover, is highly associated with heterogeneous intraurban heat exposure due to the urban heat island (Weng et al. 2004). We tested different variables for inclusion in the HVI to represent the proportion of nonvegetative land cover in each census tract or block group, specifically the proportion of impervious surfaces, nontree canopy, nonvegetation, and nontree areas. Although correlated (Figure 1), these variables estimate vegetative land cover differently. For instance, nontree canopy coverage is not equivalent to a measure of percentage of nontrees; the tree canopy could cover more area than the percentage of trees. Tree canopy coverage, however, is not always available for the geography and time period of interest. Vegetation variables in heat-health analyses are not always represented using the same metric (Yeager et al. 2018). Because vegetative land cover can be modified within a city (it is possible to change the amount, location, and type of vegetation), we consider vegetative land cover an index variable amenable to intervention by a given municipality (e.g., 10% increase in vegetation by geographic unit).
Land cover data were derived from three products. Each variable was defined to reflect the hypothesis that less vegetated land cover increases heat vulnerability. First, we extracted the impervious surface layer from the 2006 National Land Cover Database (NLCD) (http://www.mrlc.gov/nlcd06_leg.php). NLCD is available for the conterminous United States and is often used for characterizing vegetative and impervious land cover (Pearsall 2017). The 30-m resolution product has been used in heat-health studies to characterize nonvegetated land cover. In this analysis, the impervious surface layer ("Impervious") was calculated as a proportion of each tract and block group and represented the commonly used nongreen space characterization. Second, we used the 30-m 2001 NLCD tree canopy layer ("Nontree canopy") to calculate the proportion of a tract or block group that is not covered by tree canopy. The 2001 NLCD data set was the only publicly available tree canopy assessment for the City of Detroit for the study period (Homer et al. 2007). The NLCD tree canopy layer represents a snapshot of the tree canopy for the study area.
We developed HVIs using fine-scale land cover data to estimate fine-scale vulnerability to heat, because some analyses indicate that NLCD underestimates vegetation (Nowak and Greenfield 2010). To do this, we used 1-m resolution aerial photography of the metropolitan Detroit area from late spring 2005, which we acquired from the Southeast Michigan Council of Governments (SEMCOG) Imagery product (SEMCOG 2005). Land cover classifications from this source included proportions of impervious surface, bare earth, open space, trees, and water. We defined "Nonvegetation" (Equation 1) at the block group and tract levels as:

$$\text{Nonvegetation} = 1 - \sum \left(\text{Open Space} + \text{Trees} + \text{Water}\right) \qquad (1)$$

We developed a final vegetative land cover variable, aerial photograph-derived "nontrees" (Equation 2), to represent the sole contribution of vegetation that is not trees, calculated as:

$$\text{Nontrees} = 1 - \left(\text{Trees}\right) \qquad (2)$$

All land cover variables were averaged and assigned to census tracts and block groups in ArcMap using the Zonal Statistics tool on 2010 Census TIGER shapefiles.
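As an illustration only, Equations 1 and 2 can be written as two small Python functions. The function names and the land cover proportions for the example tract are hypothetical and are not taken from the SEMCOG data.

```python
def nonvegetation(open_space: float, trees: float, water: float) -> float:
    """Equation 1: share of an area not classified as open space, trees, or water."""
    return 1.0 - (open_space + trees + water)


def nontrees(trees: float) -> float:
    """Equation 2: share of an area not classified as trees."""
    return 1.0 - trees


# Hypothetical land cover proportions for one tract (they sum to 1).
tract = {"impervious": 0.50, "bare_earth": 0.05,
         "open_space": 0.25, "trees": 0.15, "water": 0.05}

tract_nonvegetation = nonvegetation(tract["open_space"], tract["trees"], tract["water"])
tract_nontrees = nontrees(tract["trees"])
print(round(tract_nonvegetation, 2), round(tract_nontrees, 2))
```

Both values rise as vegetated cover falls, in line with the convention that higher values correspond to higher hypothesized vulnerability.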
Distance to water, which has been demonstrated to have a cooling effect in urban microclimates (Steeneveld et al. 2014), was calculated as a straight-line distance from the Detroit River to the centroid of each tract and block group in ArcGIS (ArcMap, version 10.6). The measurements were scaled by dividing by the largest distance to have a value between 0 and 1, so that 1 indicated the furthest distance from the river, with further distances hypothesized to confer higher vulnerability to heat exposure.
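A minimal sketch of the scaling step described above, assuming the straight-line distances have already been measured. The function name and the example distances are hypothetical.

```python
def scale_by_largest(distances):
    """Divide each distance by the largest one so that values fall in (0, 1]."""
    largest = max(distances)
    if largest <= 0:
        raise ValueError("at least one distance must be positive")
    return [d / largest for d in distances]


# Hypothetical centroid-to-river distances (any consistent length unit).
scaled = scale_by_largest([2.0, 5.0, 10.0])
print(scaled)  # the area furthest from the river receives the value 1.0
```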

Characterizing Extreme Heat
It is well established that mortality increases significantly at higher temperatures (Anderson and Bell 2009) and that apparent temperature on the day prior to and the day of death (AT01) captures the acute effect of heat (Barnett and Åström 2012). Hourly daily mean temperature and dew-point data were extracted from the National Centers for Environmental Information for airport weather stations in Detroit and were used to calculate apparent temperature (Global Surface Summary of the Day 2012). We calculated the 2-d mean apparent temperature (AT01) by averaging the apparent temperature for the day of and the day prior to death. We defined extreme heat days as days during the study period (2000-2009) on which AT01 exceeded the month-specific 95th percentile for Detroit during the summer months (May-September).
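The definition of an extreme heat day can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in the study: the percentile uses linear interpolation between order statistics (the paper does not specify a percentile method), all function names are hypothetical, and the toy series is invented. Apparent temperature is taken as given; its calculation from temperature and dew point is not shown.

```python
from collections import defaultdict
from datetime import date, timedelta


def percentile(values, q):
    """Percentile by linear interpolation between order statistics (an assumption)."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def two_day_mean(daily_at):
    """AT01: mean of the apparent temperature on a day and on the day before."""
    one_day = timedelta(days=1)
    return {day: (value + daily_at[day - one_day]) / 2
            for day, value in daily_at.items() if day - one_day in daily_at}


def extreme_heat_days(daily_at, q=0.95, months=(5, 6, 7, 8, 9)):
    """Days whose AT01 exceeds the month-specific q-th percentile of AT01."""
    at01 = {d: v for d, v in two_day_mean(daily_at).items() if d.month in months}
    by_month = defaultdict(list)
    for day, value in at01.items():
        by_month[day.month].append(value)
    thresholds = {month: percentile(vals, q) for month, vals in by_month.items()}
    return sorted(d for d, v in at01.items() if v > thresholds[d.month])


# Toy series: 31 May to 30 June 2005, flat at 20 except a two-day spike.
daily_at = {date(2005, 5, 31) + timedelta(days=i): 20.0 for i in range(31)}
daily_at[date(2005, 6, 15)] = 40.0
daily_at[date(2005, 6, 16)] = 40.0
heat_days = extreme_heat_days(daily_at)
print(heat_days)
```

In the toy series only the second day of the spike is flagged, because the 2-d mean on the first spike day is pulled down by the cooler day before it.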

Unsupervised PCA and HVI Calculation
The first method for calculating the HVI applied PCA (PROC FACTOR, SAS version 9.3; SAS Institute Inc.) to demographic variables that have been associated with heat-related mortality (proportion >65 years of age, living alone, >age 65 and living alone, less than high school education, at or below poverty level, and of race/ethnic minority status) plus one of the four measures of nonvegetative land cover. Because these variables were selected a priori, without reference to the health outcome, we refer to this method as "unsupervised" (Bair et al. 2006). Following Reid et al. (2009), we rotated the factor pattern, retained factors with eigenvalues >1, normalized the factor scores, summed the scores to calculate a final HVI value, and classified the HVI values by standard deviation. A total of eight unsupervised HVIs (four at the census tract level and four at the block group level) were calculated and mapped.
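The sequence of steps above (PCA, retention of factors with eigenvalues >1, rotation, normalized scores, summation) can be sketched with NumPy. This is not a reproduction of the SAS PROC FACTOR run: varimax is assumed as the rotation, the sign of each rotated factor is flipped so that its loadings sum to a positive value (an assumption made here so that higher scores point toward higher vulnerability), the final classification by standard deviation is omitted, and the input data are synthetic.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Return the orthogonal varimax rotation matrix for a loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        if s.sum() < criterion * (1 + tol):  # no meaningful improvement
            break
        criterion = s.sum()
    return rotation


def unsupervised_hvi(x):
    """x: (areas x variables). Returns HVI values, rotated loadings, eigenvalues."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                         # retain factors with eigenvalues > 1
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    rotation = varimax(loadings)
    rotated = loadings @ rotation
    scores = (z @ eigvecs[:, keep] / np.sqrt(eigvals[keep])) @ rotation
    # Assumption: orient each factor so that its loadings sum to a positive value.
    signs = np.where(rotated.sum(axis=0) < 0, -1.0, 1.0)
    rotated, scores = rotated * signs, scores * signs
    scores = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)
    return scores.sum(axis=1), rotated, eigvals


# Synthetic data: 200 areas, 7 variables driven by two latent constructs.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
x = np.column_stack(
    [latent[:, 0] + 0.5 * rng.normal(size=200) for _ in range(3)]
    + [latent[:, 1] + 0.5 * rng.normal(size=200) for _ in range(4)]
)
hvi, rotated_loadings, eigenvalues = unsupervised_hvi(x)
print(rotated_loadings.shape, hvi[:3])
```

Because eigenvector signs are arbitrary, any implementation has to fix an orientation before scores are summed; the choice made here is one simple convention among several.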

Agreement Maps
In addition to mapping scores for each of the four individual HVIs for each census tract and block group, we present maps that illustrate the agreement between each of the HVIs. Specifically, we present maps that show, for each tract or block group, a) the difference between highest HVI value and the lowest HVI value obtained for the given tract or block group; and b) the number of individual HVIs with scores in the highest quartile for each census tract or block group (range 0-4). The agreement maps offer an alternative perspective that may be useful for determining areas with higher vulnerability relative to other locations in a given area.
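The two agreement measures can be computed per area as sketched below. The example assumes that "highest quartile" means a score above the 75th percentile of that HVI, as returned by `statistics.quantiles` with its default method; the index names and scores are invented.

```python
from statistics import quantiles


def agreement(hvis):
    """hvis maps an HVI name to a list of scores, one per area, in the same order.

    Returns (a) the highest-minus-lowest score per area and (b) the number of
    HVIs whose score for that area lies above that HVI's 75th percentile.
    """
    names = list(hvis)
    cutoffs = {name: quantiles(hvis[name], n=4)[2] for name in names}
    by_area = list(zip(*(hvis[name] for name in names)))
    score_range = [max(area) - min(area) for area in by_area]
    top_count = [sum(score > cutoffs[name] for name, score in zip(names, area))
                 for area in by_area]
    return score_range, top_count


# Four hypothetical HVIs scored for eight areas.
hvis = {
    "impervious":     [1, 2, 3, 4, 5, 6, 7, 8],
    "nontree_canopy": [8, 7, 6, 5, 4, 3, 2, 1],
    "nonvegetation":  [1, 2, 3, 4, 5, 6, 7, 8],
    "nontrees":       [1, 2, 3, 4, 5, 6, 8, 7],
}
score_range, top_count = agreement(hvis)
print(score_range)
print(top_count)
```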

Supervised PCA and HVI Calculation
We next applied a supervised PCA (Bair et al. 2006) approach, with variables selected based on associations with heat-related mortality. For this purpose, we obtained daily, geocoded mortality data from the Michigan Department of Community Health (MDCH) for the years 2000-2009. Institutional Review Boards (IRBs) for the University of Michigan (UM) and MDCH approved this study (UM IRB: HUM00067448). Daily nonaccidental deaths [International Classification of Diseases 10th revision (ICD-10): A00-R99, T67, X30] were assigned a census tract identifier and a block group identifier [in ArcGIS (ArcMap, version 10.6)], and subsequently aggregated at the census tract level and block group level. We limited the analysis data set to May-September, when extreme heat days (days on which the 2-d mean apparent temperature exceeded the month-specific 95th percentile for Detroit) were most likely to occur.
To determine which variables to use in the supervised HVI, we first estimated the proportion of all-cause mortality that occurred on extreme heat days vs. other days during the May-September period. Then, we regressed the proportion of all-cause mortality that occurred on an extreme heat day on each variable used in the creation of the unsupervised HVI. We examined assumptions of independence and normality, finding that the errors were approximately normally distributed. Variables that were at least moderately associated (p < 0.20) with all-cause deaths occurring on an extreme heat day were selected and used to calculate the HVI. We chose a lenient p-value threshold in order to err on the side of including potential variables for constructing the supervised HVI. Only two variables (impervious surface and living below poverty) would have been included had the threshold been set at 0.05; a third, living alone, would have been added if it had been set at 0.10.
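The screening step can be sketched as a loop of simple linear regressions. This sketch swaps in a large-sample normal approximation for the t distribution when computing the p-value of the slope, so it is only a rough stand-in for the regression output used in the study, particularly with few areas. Variable names and values are invented.

```python
from statistics import NormalDist, mean


def slope_p_value(x, y):
    """Slope of y on x and a two-sided p-value (normal approximation to t)."""
    n = len(x)
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    standard_error = (sse / (n - 2) / sxx) ** 0.5
    if standard_error == 0:          # perfect fit
        return slope, 0.0
    z = slope / standard_error
    return slope, 2 * (1 - NormalDist().cdf(abs(z)))


def screen(candidates, outcome, threshold=0.20):
    """Keep candidate variables whose slope p-value falls below the threshold."""
    return [name for name, values in candidates.items()
            if slope_p_value(values, outcome)[1] < threshold]


# Hypothetical proportion of deaths on extreme heat days in eight tracts.
outcome = [0.02, 0.03, 0.05, 0.04, 0.06, 0.07, 0.09, 0.08]
candidates = {
    "poverty":   [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    "unrelated": [1, -1, 1, -1, -1, 1, -1, 1],
}
selected = screen(candidates, outcome)
print(selected)
```

Raising or lowering `threshold` reproduces the kind of sensitivity noted above, where stricter cutoffs admit fewer variables.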
We conducted an alternate calculation of the ratio of deaths on extreme heat days vs. nonextreme heat days, adjusting for citywide seasonal and long-term trends, as described in detail in Table S1. However, we decided against using this approach because the ratio of deaths on extreme heat days vs. nonextreme heat days was almost perfectly correlated with the ratio derived using our default model (correlation >0.99, Table S1), and our primary interest was in examining how well HVI explained differences among census tracts.
Because the number of all-cause deaths on an extreme heat day is a variable that is likely to be spatially correlated, we assessed the assumption of independence of the residuals in the linear regression models. Particularly, we wanted to evaluate whether the residuals displayed spatial correlation. For this purpose, we fitted simple linear regressions with the proportion of all-cause deaths occurring on an extreme heat day per tract and block group as the outcome and each individual variable as the sole covariate. We used the OLS tool in ArcMap. We derived the residuals of each linear regression model and computed Moran's I to assess the presence of residual spatial correlation and, thus, a need to account for spatial correlation in the error terms of the linear regression model. As the Moran's I for the residuals corresponding to each simple linear regression were nonsignificant (data not shown), we did not perform spatial regression.
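For readers unfamiliar with Moran's I, the statistic applied to regression residuals can be sketched as below. The spatial weights matrix is a hypothetical binary adjacency matrix, and significance is assessed here with a random permutation test, which is a swapped-in technique and not a description of how the ArcMap tool computes significance.

```python
import random


def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weights matrix."""
    n = len(values)
    mean_value = sum(values) / n
    dev = [v - mean_value for v in values]
    weight_sum = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / weight_sum) * (cross / sum(d * d for d in dev))


def permutation_p_value(values, weights, n_perm=999, seed=0):
    """Two-sided pseudo p-value from random reassignment of values to areas."""
    rng = random.Random(seed)
    expected = -1.0 / (len(values) - 1)      # expectation under no autocorrelation
    observed = abs(morans_i(values, weights) - expected)
    shuffled = list(values)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(morans_i(shuffled, weights) - expected) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Four hypothetical areas in a row; neighbours share an edge.
weights = [[0, 1, 0, 0],
           [1, 0, 1, 0],
           [0, 1, 0, 1],
           [0, 0, 1, 0]]
residuals_trend = [1.0, 2.0, 3.0, 4.0]         # similar values cluster together
residuals_alternating = [1.0, -1.0, 1.0, -1.0]  # neighbours are dissimilar
i_trend = morans_i(residuals_trend, weights)
i_alternating = morans_i(residuals_alternating, weights)
p_trend = permutation_p_value(residuals_trend, weights)
print(i_trend, i_alternating, p_trend)
```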

Comparison of Unsupervised and Supervised HVIs
To evaluate the robustness of our findings across the different approaches used to derive HVI, we conducted simple OLS regression analyses, regressing the proportion of all-cause mortality occurring on extreme heat days on the tract- and block group-specific HVI values obtained using both unsupervised and supervised PCA.
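The share of variance in the outcome explained by an HVI in such a single-predictor regression is the R² of the fit. A minimal sketch, with invented inputs and a hypothetical function name:

```python
def simple_ols(x, y):
    """Slope, intercept, and R-squared for a regression of y on a single x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


# An outcome that tracks the index exactly, and one that does not track it at all.
perfect = simple_ols([1, 2, 3, 4], [2, 4, 6, 8])
unrelated = simple_ols([1, 2, 3, 4], [1, -1, -1, 1])
print(perfect, unrelated)
```

An R² near zero means that the index, however it is built, accounts for almost none of the between-area variation in the outcome.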

Results
During the period of our analysis, Detroit contained 913 populated census block groups and 308 populated census tracts (Table 1). Population characteristics were similar between census tract and block group calculations. City residents were primarily African American and lived in areas with low vegetated land cover. Land cover measures differed from each other, with most of the city, on average, having almost no tree canopy coverage and about half of the city covered with nonvegetation, including water (Table 1). Land cover measures were highly correlated with each other, as were age and living alone status (Figure 1). The first factor in all eight unsupervised HVIs, at both geographic scales, was composed of three variables: over the age of 65, living alone, and over the age of 65 and living alone (Table 2). The first factor represents variables that describe age/isolation. The remaining five variables loaded onto the second and third factors; across all iterations, minority status loaded separately from education and income variables, which consistently loaded together.
Table 2. Variance explained and factor loadings for PCA outputs for tract level and block group level unsupervised HVIs calculated by including impervious surface (NLCD-derived), nontree canopy (NLCD-derived), nonvegetation including water (aerial-derived), and nontrees (aerial-derived), respectively, for Detroit, Michigan, USA.
Variables that indicated lack of vegetation either loaded with minority status or onto the factor containing education/income. Land cover variables did not indicate the same direction. Impervious surface, nonvegetation, and nontrees all loaded in the negative direction with minority status, indicating that tracts and block groups with higher percent minority populations had higher vegetated land cover, whereas the nontree canopy coverage variable loaded with education and income variables, indicating that locations with higher proportions of residents with low income and low educational attainment occurred in areas where there was less tree canopy coverage. The directions of the factor loadings were consistent between census tract and block group analyses. On average, the three factors for tract HVIs accounted for 67% of the variance in the data; the three factors for block group HVIs accounted for about 64% of the variance (Table 2).

Unsupervised HVI Calculation
Simple pairwise correlations between each of the four unsupervised HVI scores (based on the different measures of nonvegetative land cover) produced moderate to high correlations ranging from 0.69 to 0.97 at the census tract level and 0.64 to 0.97 at the block group level. Qualitative comparison across both census tract and block group maps of the four individual unsupervised HVI scores (Figures S1 and S2, respectively) indicated inconsistent patterns of spatial heterogeneity. For example, Figure 2D shows high vulnerability block groups appearing further to the west and north than reflected by the high vulnerability tracts in Figure 1D. These inconsistencies were not so substantial as to completely alter the general spatial pattern, however.
Maps showing the agreement among the four different unsupervised HVIs according to differences in the highest and lowest HVI score for each tract or block group (Figure 2) indicated spatial patterns that were generally similar between both spatial scales. Both sets of agreement maps suggest that inconsistencies among scores for the four unsupervised HVIs were relatively high in the northwest and southeastern portions of the city, and in a few locations in central and north central neighborhoods. However, the block group map indicated variation within tracts. For example, in the southwestern area of the city, unsupervised HVIs at the tract level generally agreed, whereas the block group map suggests that the unsupervised HVIs did not agree on the severity of heat vulnerability in some smaller areas.
A map of the number of individual HVIs with scores in the highest quartile for each census tract (Figure 3A) suggests that tracts in the central and eastern part of the study area and two tracts in the southwestern part of the city were the most vulnerable based on multiple HVIs. On the other hand, areas that did not appear vulnerable based on this metric when mapped at the tract level did indicate areas with multiple HVI scores in the highest quartile when mapped at the block group level (Figure 3B), including several southwestern block groups. Further, 585 block groups (64%) had a score in the highest quartile for at least one of the four HVIs, compared with 113 (37%) of census tracts. This finding suggests that HVIs defined at the census tract level may provide a more sensitive means of identifying the areas with the highest vulnerability.
The total number of deaths in Detroit during the 10-y study period was 32,717. Neither tract nor block group unsupervised HVI scores were associated with deaths on extreme heat days (AT01) when HVI was modeled as a continuous variable (Table 3). When unsupervised HVIs were modeled as categorical variables (using approximately equal score intervals) using scores of 0-3 as the reference category, the top four interval HVI categories were positively associated with mortality on extreme heat days for all HVIs in both census tracts and block groups. However, associations did not increase in magnitude with higher scores in most cases, and trend tests were not statistically significant (p > 0.05), with the exception of the HVI constructed with tree canopy at the tract level. Regressing the proportion of deaths occurring on an extreme heat day against unsupervised HVIs resulted in R² values near 0.00 when the analysis was performed at both the census tract and block group levels.
When unsupervised HVIs were categorized according to quintiles at both the census tract level and block group level, most associations with mortality on extreme heat days remained positive but were closer to the null, without any consistent patterns with increasing HVI scores (Table S2).
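The categorical analysis can be illustrated with a short sketch. It assumes half-open, equal-width score intervals (so a score of exactly 3 falls in the second interval) and no covariates other than the category indicators; under that assumption each OLS coefficient equals the difference between a category's mean outcome and the reference category's mean outcome. The interval width, names, and data are hypothetical.

```python
from collections import defaultdict


def equal_interval_category(score, lower, width):
    """Index of the equal-width interval [lower + k*width, lower + (k+1)*width)."""
    return int((score - lower) // width)


def category_contrasts(scores, outcome, lower, width, reference=0):
    """Mean outcome in each score category minus the mean in the reference category."""
    groups = defaultdict(list)
    for score, value in zip(scores, outcome):
        groups[equal_interval_category(score, lower, width)].append(value)
    means = {category: sum(vals) / len(vals) for category, vals in groups.items()}
    return {category: category_mean - means[reference]
            for category, category_mean in sorted(means.items())
            if category != reference}


# Hypothetical HVI scores and proportions of deaths on extreme heat days.
scores = [1, 2, 4, 5, 7, 8]
outcome = [0.04, 0.06, 0.05, 0.07, 0.06, 0.06]
contrasts = category_contrasts(scores, outcome, lower=0, width=3)
print(contrasts)
```

In this toy example both higher categories differ from the reference by the same amount, mirroring a pattern of positive associations that do not increase with higher scores.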

Supervised HVI
At the census tract level, the variables used to calculate a supervised HVI (selected based on p < 0.20 for associations with mortality on extreme heat days, Table 4) were: age 65 and older, living alone, living below the poverty level, percent nonvegetated, and percent nontree canopy. We did not derive a supervised HVI at the block group level because only two predictors (living alone and percent nontrees) met the criterion for inclusion.
The tract-level supervised HVI resulted in two factors: The first included living under the poverty level and vegetated land cover variables, and the second factor combined age and living alone (Table 5). Combined, the two factors accounted for 62% of the variance in the variables included in the supervised PCA. The two factors could be characterized as residential environment and age/social isolation, respectively. More generally, these could be considered representative of the exposure and sensitivity components of vulnerability, although poverty may also be closely related to health conditions increasing sensitivity to heat. The mapped scores for the supervised HVI (Figure 4) suggest that census tracts with the greatest heat vulnerability are located in the central portion of the city.
Supervised HVI scores were positively associated with the proportion of all-cause deaths occurring on an extreme heat day (Table 6). However, the magnitude of the association was small and did not increase monotonically with higher score categories. The percent variance explained by the regression models against the proportion of deaths occurring on an extreme heat day was extremely low, ranging between 0.00 and 0.01, for both unsupervised and supervised HVIs.

Discussion
The goals of these analyses were threefold: to determine the influence of input variables representing nonvegetative land cover on PCA-derived HVI mapping products; to evaluate how changing the spatial scale of the analysis affects our interpretation of where vulnerable populations are located; and to investigate whether spatial patterns produced by unsupervised vs. supervised HVI calculations differed. To do this, we constructed a supervised HVI and screened the variables used in the creation of the index based on their marginal association with the health outcome. Our findings suggest that PCA-derived HVI metrics are sensitive to input variables. Specifically, substituting different proxy measures for vegetated land cover resulted in differing spatial distributions of vulnerability scores, with low agreement between the unsupervised HVIs. Calculating and mapping the index values at different geographic scales (census tracts and block groups) also produced inconsistent patterns for locating the most vulnerable areas.
A number of studies have constructed unsupervised HVIs for urban areas (Bao et al. 2015). In these studies, the factor loadings are inconsistent between locations, the HVI results are inconsistent between geographic resolutions, and the HVI results and factor loading are sometimes highly sensitive to choices of input variables. The census tract level national HVI presented by Reid et al. (2009) elucidated heat vulnerability across the United States and within some urban areas, yet relied on relatively coarse scale data (i.e., tract level) and, in a validation study, generally predicted overall population health rather than heat-related health (Reid et al. 2012). Finer scale indices (i.e., block group level), such as those done for Chicago, Illinois (Johnson et al. 2012), and Phoenix, Arizona (Harlan et al. 2013), demonstrated finer variation in intraurban heat vulnerability. Consistent with the results presented here, the aforementioned studies observed different resulting factors from their PCA calculations, despite having similar data sources and variables. For instance, in Phoenix, socioeconomic and elderly/isolation characteristics loaded separately and were distinct from the land cover factor. By contrast, in Chicago, economic status and age were grouped together, obscuring any distinction of age and socioeconomic status in vulnerability. In Chicago, lower education and Hispanic variables loaded together, and African-American race and land surface temperature loaded together. In this Detroit study, unsupervised and supervised HVI results consistently grouped elderly and isolation characteristics. In the unsupervised HVIs, educational attainment and income consistently loaded together but notably loaded separately from minority status.
This differed from the unsupervised HVI in Phoenix, where minority status loaded onto the same factor as land cover. A recent meta-analysis determined that some of the strongest predictors of heat-related mortality were elderly ages (65 and older and 75 and older) and low socioeconomic group status (Benmarhnia et al. 2015). Other reviews have identified race and ethnicity as characteristics of vulnerability (Gronlund 2014). In the HVIs presented here, these characteristics were not identified as stand-alone factors and often moved between factor loadings when using different methods.
Table 3. Linear regression estimates (b) and 95% confidence interval for the association of proportion of all-cause deaths on extreme heat days on unsupervised HVI scores characterized by land cover type (continuous and equal interval categorizations), by geography for Detroit, Michigan, USA (2000
Modifying vegetated land cover, such as increasing tree canopy coverage, is a common climate and health adaptation strategy (Stone et al. 2014) via reducing urban heat (Ziter et al. 2019), ameliorating fine particulate air pollution (Nowak et al. 2013), and improving psychosocial health (Ulmer et al. 2016). Mapping an HVI that incorporates a health metric (e.g., mortality, as well as other health end points and indicators of quality of life and thermal comfort) could be important for decision-makers who are concerned about health equity when choosing siting for tree-planting campaigns or strategies to better maintain existing vegetation. Our results did not identify a vegetated land cover variable that was clearly superior to other variables for developing an HVI for Detroit. However, these results do not suggest that vegetated land cover wouldn't provide protection to Detroit residents during extreme heat days. These results do indicate that an HVI map may not provide the information needed to determine either the location or the type of vegetated land cover that would be needed to provide protection during extreme heat. Chuang and Gober (2015) found that a generic vulnerability index, constructed at the tract level, was sensitive to scale and relatively imprecise in predicting heat-related morbidity in Phoenix. Comparisons within index groupings agree with results from Maier et al. (2014) that adverse heat-health outcomes are positively associated with HVI scores, with the exception of the areas with the lowest vulnerability scores. Locations of vulnerable populations, however, changed depending on both the scale and input data used in the index creation. This calls into question the appropriateness of HVIs for use in decision-making given the lack of consistency or robustness across differing approaches.
To our knowledge, this is the first study that constructs a supervised HVI using mortality data to select the variables used for the index calculation. Identifying local characteristics known to contribute to adverse heat-related health outcomes is considered best practice for developing interventions or communication strategies for reducing the impacts of heat on health (Bao et al. 2015). Based on our screening to select only the variables that contributed to heat-related mortality in Detroit, a supervised HVI provides some assurance that the index values reflect an indication of vulnerability for this region. Heat-related morbidity and mortality can be specific to a place, with populations having geographically differential responses to ambient conditions (Vaidyanathan et al. 2019). PCA is a method used for dimension reduction and can be very useful for identifying the principal modes of variations in the data set. However, PCA was not developed as a method to identify a subset of variables among many variables that are most predictive of an outcome. There are a variety of methods, such as the overlay approach (Manangan et al. 2014), a simple additive strategy (Flanagan et al. 2011), or complex weighting schemes that could be employed to construct HVIs that are tailored to a specific use. Vulnerability maps that are used to convey a message to decision-makers and the public may benefit from simple methods that can be easily explained. For some HVIs, there is arguably sufficient epidemiological evidence to extract weights for use in an additive index that does not rely on factor loadings.
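As a rough illustration of an additive index that does not rely on factor loadings, the sketch below standardizes each indicator and sums the results, with optional weights. Standardizing to z-scores, the indicator names, and the values are all assumptions made for this example. This is not the procedure of Flanagan et al. (2011) or of this study, and any weights would in practice need to come from epidemiological evidence.

```python
import statistics


def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def additive_index(indicators, weights=None):
    """Weighted sum of standardized indicators, one score per spatial unit.

    indicators: dict of name -> list of values (same unit order in each list).
    weights: optional dict of name -> weight; equal weights if omitted.
    """
    names = list(indicators)
    if weights is None:
        weights = {name: 1.0 for name in names}
    standardized = {name: z_scores(indicators[name]) for name in names}
    n_units = len(indicators[names[0]])
    return [
        sum(weights[name] * standardized[name][i] for name in names)
        for i in range(n_units)
    ]


# Invented values for four tracts (illustration only).
indicators = {
    "pct_poverty": [10, 20, 30, 40],
    "pct_age_65_plus": [5, 15, 10, 20],
}
scores = additive_index(indicators)
```

Because each term is a visible, standardized indicator, a score can be traced back to the variables that drive it, which is harder to do with component loadings.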
In our work, inconsistencies across unsupervised and supervised HVIs may reflect the lack of data that characterizes a population's adaptive capacity or ability to cope and adapt to extreme heat (Morss et al. 2011). Air conditioning, a commonly used proxy for adaptive capacity, is used in the index calculated in Reid et al. (2009) but is excluded in the methods presented here due to lack of reliable estimates of air conditioning prevalence data at tract or block group scale. Behavioral measures, such as personal cooling (e.g., swimming, taking showers), are arguably more indicative of adaptation, but require survey data (Bélanger et al. 2015). Adaptive capacity may affect the Detroit population in at least two ways that are not captured in an HVI. First, adaptive capacity may influence heat vulnerability differently in direction or magnitude than exposure or sensitivity variables. Second, adaptive capacity is not uniform across the Detroit population, further contributing to spatial heterogeneity of vulnerability (Hayden et al. 2011). In either case, omitting a measure of adaptive capacity in an HVI could substantially alter the interpretation of vulnerability.
Inconsistencies among census tract and block group HVIs exemplify the modifiable areal unit problem, whereby summaries of spatial phenomena vary depending on the size and shape of the geographic unit over which the data are aggregated (Schuurman et al. 2007). Few heat vulnerability studies have considered the impact that large margins of error, especially for block group level geographies, may have on identifying spatial patterns of heat vulnerability (Spielman et al. 2014). Large margins of error in block group estimates may systematically introduce bias into the HVIs (Jung et al. 2019). Block group estimates are calculated as 5-y averages, rather than as point-in-time estimates. In a location that has experienced substantial demographic change, as Detroit had at the beginning of 2000, uncertainty in these estimates could have implications for interpreting the spatial distribution of population characteristics. Further, that we cannot point to one unit as superior over the other in this study reflects a lack of understanding of geographic and contextual units over which a vulnerability characteristic primarily operates (Kwan 2012). For instance, a characteristic such as poverty might operate at a census tract level through resources readily available within a short distance from an individual's home, such as publicly available cooled indoor locations. However, poverty might also operate primarily at an individual level or for a household through other means, such as whether a person possesses the financial resources to own and operate an air conditioner. If an individual characteristic is highly heterogeneous within a census tract, then aggregation over a smaller unit, such as a block group, might better reflect spatial heterogeneity in vulnerability across a city. Further, determining break points for mapping individual characteristics or HVI values can also influence how spatial patterns are interpreted and communicated.
Ultimately, selecting whether to create an HVI at the census tract or block group level is at the developer's and end-user's discretion. These tools should be interpreted and applied with a full, contextual understanding of the intended use.
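The effect of aggregation can be seen in a small numerical sketch. It assumes that a tract-level rate is the population-weighted mean of its block-group rates. The populations and poverty rates are invented, and the helper name `tract_level_rate` is hypothetical.

```python
def tract_level_rate(block_groups):
    """Population-weighted mean of a block-group rate within one tract.

    block_groups: list of (population, rate) pairs.
    """
    total_pop = sum(pop for pop, _ in block_groups)
    return sum(pop * rate for pop, rate in block_groups) / total_pop


# Two hypothetical tracts, each made of two block groups: (population, poverty rate).
tract_a = [(1000, 0.05), (1000, 0.45)]  # highly heterogeneous
tract_b = [(1000, 0.25), (1000, 0.25)]  # homogeneous

rate_a = tract_level_rate(tract_a)
rate_b = tract_level_rate(tract_b)

# Range of block-group rates hidden inside each tract-level value.
spread_a = max(r for _, r in tract_a) - min(r for _, r in tract_a)
spread_b = max(r for _, r in tract_b) - min(r for _, r in tract_b)
```

Both tracts receive the same tract-level value, yet one of them contains a block group with a much higher rate. A tract-level map would rank the two tracts identically, whereas a block-group map would not.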
Because the HVI is often promoted as a tool that can be used to identify relative spatial priorities for heat-related interventions, absolute HVI values have the potential to be misleading. Constructing agreement maps, however, provides an opportunity to communicate the sensitivity of maps to their input variables. The top quartile HVI score designation represents the highest priority areas. These maps would be of greatest use to health practitioners and policy makers if areas are consistently deemed vulnerable to heat. They could provide reasonable rationale for developing place-based interventions. The agreement maps suggest that cities opting to use a particular set of variables to characterize vulnerability in an HVI may allocate resources to neighborhoods that another HVI may classify as relatively low priority. Communicating the results of composite indices has been successful when integrating end-user suggestions and ground-truthing in the development stages (Weber et al. 2015). As Wolf et al. (2015) found, the process of developing heat vulnerability indices can aid in communicating the need for policy change but may not be sufficient for prioritizing interventions. For planning purposes, end users must be aware of the inherent subjectivity in HVI design and the influence variable selection can have on mapped HVI products.
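One way to build such an agreement map is sketched below: each index flags its top-quartile units, and the flags are counted per unit. The tract identifiers, index names, scores, and the tie-handling rule are assumptions made for illustration and do not reproduce the Detroit analysis.

```python
def top_quartile_flags(scores):
    """Flag units in the top quartile of scores (ties at the cutoff included)."""
    ranked = sorted(scores.values(), reverse=True)
    n_top = max(1, len(ranked) // 4)
    cutoff = ranked[n_top - 1]
    return {unit: score >= cutoff for unit, score in scores.items()}


def agreement_counts(indices):
    """Count, per unit, how many indices place it in their top quartile."""
    flags = [top_quartile_flags(scores) for scores in indices.values()]
    units = list(flags[0])
    return {unit: sum(f[unit] for f in flags) for unit in units}


# Invented HVI scores for eight tracts under three land cover inputs.
indices = {
    "hvi_tree_canopy": {
        "t1": 2.1, "t2": 1.8, "t3": 0.4, "t4": -0.2,
        "t5": -0.5, "t6": -0.9, "t7": -1.1, "t8": -1.6,
    },
    "hvi_impervious": {
        "t1": 1.9, "t2": 0.3, "t3": 1.7, "t4": -0.1,
        "t5": -0.4, "t6": -1.0, "t7": -1.2, "t8": -1.2,
    },
    "hvi_nongreen": {
        "t1": 2.0, "t2": 1.5, "t3": 0.2, "t4": 0.0,
        "t5": -0.6, "t6": -0.8, "t7": -1.0, "t8": -1.3,
    },
}

agreement = agreement_counts(indices)
```

Units flagged by every index are the most defensible priorities. Units flagged by only one index show where the choice of input variables, rather than the underlying population, drives the classification.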
Future research could consider how other epidemiological, environmental, and more statistically robust approaches to identifying intraurban heat vulnerability compare with HVIs in their ability to predict heat-related health impacts (Bennett et al. 2014; Heaton et al. 2015; Hondula et al. 2015; Klein Rosenthal et al. 2014; Uejio et al. 2011). Comparative analyses of temperature patterns, health end points (e.g., cause-specific morbidity and mortality, emergency service request calls), and their relationship with HVIs would also contribute valuable insight into how HVIs may detect spatial patterns of heat vulnerability and how these patterns may differ across different health outcomes. Including metrics to capture temperature patterns, which may characterize differences in exposure, introduces additional complexities to HVI interpretation. Inostroza et al. (2016) incorporated land surface temperatures into an HVI for Santiago, Chile, and identified clear spatial patterns of heat vulnerability for exposure variables, which differed in complexity from sensitivity variables. The trade-offs inherent in the calculation of HVIs continue to emerge. Recent research indicates that a single PCA-derived HVI score alone may not be strongly associated with total heat-related mortality. Mallen et al. (2019) constructed a PCA-derived HVI score using the Reid et al. (2009) method at the census tract level in Dallas, Texas. A simple spatial regression between this HVI score and statistically attributed heat-related mortality resulted in an R² of only 0.03. However, in a multiple spatial regression keeping the indicators used to construct the HVI score as separate variables, the R² rose to 0.40. Additionally, only age, education, diabetes, and lack of central air conditioning were significant predictors of heat-related mortality.
Mallen et al. (2019) recommended that vulnerability indicators be modeled as individual variables, because this approach, unlike a combined index, can inform a distinct policy or planning response for each variable (e.g., lack of central air conditioning). Similarly, when regressed with proportion of deaths occurring on an extreme heat day, the unsupervised and supervised HVIs in this analysis resulted in very low R², potentially indicating that the indices were inadequately capturing spatial variations in heat risk across the study area. We also recommend modeling of individual vulnerability indicators in a given city because the relationship between these indicators and health outcomes may vary by city and potentially may vary by time period or heat event under differing conditions of duration or intensity. Furthermore, epidemiological regressions of heat-related health outcomes on the proximate mechanisms driving vulnerability, such as lack of air conditioning and lack of green space, would establish their relative importance and would allow vulnerability maps to be constructed from these individual variables, so that future vulnerability could be predicted from future values (e.g., of air conditioning, green space).

Limitations
As is common when constructing many types of composite indices, we assumed that all input variables were equally likely to contribute to the overall measure of vulnerability. The intent of the analysis presented here was to replicate the PCA method that is commonly used for constructing HVIs. Population-level health metrics (e.g., prevalence of cardiovascular disease or diabetes) and air conditioning prevalence, although they may better reflect heat vulnerability, were omitted from these indices because the estimates were available only at coarse spatial resolutions (e.g., county level). Last, the comparative assessment presented here is specific to Detroit, meaning that other locations may or may not observe similar results. Demographic and environmental characteristics vary from location to location. Although the PCA-derived HVI, whether unsupervised or supervised, may yield products that seemingly indicate areas of high heat vulnerability, limitations in the data (e.g., missing observations, large confidence intervals) and a lack of context about the study area could produce misleading or inaccurate representations of heat vulnerability. Additionally, we focused on heat-associated mortality rather than heat-associated hospitalizations or emergency department visits. Potentially, an entirely different mapping of vulnerability would have resulted using a different health outcome. Heat vulnerability map users should consider that these maps could potentially differ not only by mapping method but also by the type of heat-related outcome (mortality or various heat-related morbidities).

Conclusions
We demonstrated that PCA-derived HVIs for Detroit are sensitive to input data and mapping choices when employing unsupervised and supervised methodologies. The different approaches resulted in spatial variability, although their construction employed similar, but not identical, input variables. Both methodologies produced positive associations between all-cause mortality occurring on extreme heat days and higher vulnerability. The identified locations of highest vulnerability, however, were dependent on the input data used in the index creation. The supervised HVI, because it inherently captures health impact in comparison with the unsupervised HVI, provides a more specific, although generic, indication of vulnerability to extreme heat exposure. HVIs calculated using PCA are sensitive to input data and, when mapped, can indicate patterns of heat vulnerability that may not capture the nuance of the data used to construct the index. Other literature has shown that PCA-derived HVIs did not always correlate well with the actual heat-related health outcomes, and from statistical theory we know that a PCA-based index does not always lead to an index that is correlated with the health outcome. We recommend that users carefully consider the contextual appropriateness of using PCA-derived HVIs for decision-making around policies for heat interventions. It is incumbent on end users to interpret the resulting HVIs in the context of the study population. Instead, PCA-derived HVIs may better serve as screening tools (i.e., tools for generating research questions) that can then be investigated in epidemiological studies, and different types of HVIs that may be more intuitive and straightforward could be used for prioritizing specific actions.