Child Survival and Early Lifetime Exposures to Ambient Fine Particulate Matter in India: A Retrospective Cohort Study

Background: Ambient fine particulate matter [PM ≤2.5μm in aerodynamic diameter (PM2.5)] is a major health risk for children, particularly in South Asia, which currently experiences the highest PM2.5 levels globally. Nevertheless, there is comparatively little epidemiological evidence from this region to quantify the effects of PM2.5 on child survival. Objectives: We estimated the association between PM2.5 exposure and child survival in India. Methods: We constructed a large, retrospective, and nationally representative cohort of children <5 years of age, born between 2009–2016, from the publicly available, cross-sectional 2015–2016 Demographic Health Surveys in India. In utero and post-delivery lifetime average ambient PM2.5 exposures were estimated with data from satellite remote sensing, meteorology, and land use information (model R2= 0.82). We used Cox proportional hazards regression to estimate the association between both average in utero and post-delivery lifetime PM2.5 and all-cause child mortality, controlling for individual- and household-level covariates, seasonality, location, and meteorology. Results: Over 7,447,724 child-months of follow-up, there were 11,559 deaths at <5 years of age reported by the children’s mothers. The mean concentrations of 9-month in utero and post-delivery lifetime average ambient PM2.5 exposure were 71.1 μg/m3 (range: 20.9–153.5 μg/m3) and 73.7 μg/m3 (range: 14.0–247.3 μg/m3), respectively. Estimated child mortality adjusted hazard ratios were 1.023 [95% confidence interval (CI): 1.008, 1.038] and 1.013 (95% CI: 1.001, 1.026) per 10-μg/m3 increase of in utero and post-delivery lifetime PM2.5, with both exposures in the model. Discussion: This study adds to the growing body of evidence about the adverse health effects of PM2.5 by demonstrating the association between exposure, both in utero and post-delivery, on child survival at the national level in India. Strategies to reduce ambient air pollution levels, including steps to minimize in utero and early life exposures, are urgently needed in India and other countries where exposures are above recommended guideline values. https://doi.org/10.1289/EHP8910


Introduction
Fine particulate matter [airborne particles with an aerodynamic diameter of ≤2:5 lm (PM 2:5 )] has been shown to be associated with adverse health effects in children, including preterm birth (Balakrishnan et al. 2018;Huynh et al. 2006;Li et al. 2018;Stieb et al. 2012), low birth weight (LBW) (Bell et al. 2007;Pedersen et al. 2013), low height-for-age z-score (Spears et al. 2019), respiratory infections (Gurley et al. 2013;Smith et al. 2011), and mortality (Heft-Neal et al. 2018;Son et al. 2017;Woodruff et al. 2006). In India, ambient PM 2:5 levels are among the highest in the world, with an estimated 91:7 lg=m 3 population-weighted annual average (India State-Level Disease Burden Initiative Air Pollution Collaborators 2021) in 2019. The Global Burden of Disease (GBD) Project identified ambient PM 2:5 as the second largest risk factor for disease burden in India (GBD 2019 Diseases and Injuries Collaborators 2020) and estimated that >980,000 premature deaths every year are associated with ambient PM 2:5 air pollution (India State-Level Disease Burden Initiative Air Pollution Collaborators 2021), >49,000 of which are premature deaths in children at <5 years of age (India State-Level Disease Burden Initiative Air Pollution Collaborators 2019, 2021). To date, however, most evidence on child mortality due to exposure to ambient air pollution during pregnancy and early childhood was from regions with lower ambient PM 2:5 levels (Glinianaia et al. 2004;Hans et al. 2011;Son et al. 2017;Woodruff et al. 1997Woodruff et al. , 2006. Although prospective pregnant mother-child cohort studies have recently been initiated in India to examine the association between air pollution exposure and indicators of child health, such as birthweight (Balakrishnan et al. 2015(Balakrishnan et al. , 2018Clasen et al. 2020), to our knowledge, estimates of child mortality have not been reported. Furthermore, owing to the low density of ambient air pollution monitoring stations in India and other low-and middle-income countries (LMICs) (Brauer et al. 2019;Landrigan et al. 2018), current studies of air pollution and infant mortality in India and LMICs are mainly based on annual average ambient PM 2:5 prediction models (Shaddick et al. 2018;van Donkelaar et al. 2016). Such exposure estimates may be prone to error or misclassification, especially for outcomes such as child mortality, which has been associated with monthly or daily changes in air pollution levels (Hans et al. 2011;Son et al. 2017). The recent expansion in the number of ground PM 2:5 monitoring stations in India and the development of spatial PM 2:5 prediction models enable more spatiotemporally resolved estimation of ambient PM 2:5 exposure throughout India.
The primary objective of our study was to evaluate the association between child mortality and ambient PM 2:5 exposure during in utero and post-delivery lifetime periods among a nationally representative, retrospective cohort based on maternal-reported data in the Demographic and Health Surveys [DHS, also known as the National Family Health Survey 2015-2016 (NFHS-4)] in India. To estimate time-dependent ambient PM 2:5 exposure for the cohort, we developed a monthly ambient PM 2:5 model with 0.1°resolution between 2009-2018 over the entire Indian subcontinent. Our ground station-validated model was developed using machine learning, with multiple inputs from satellite remote sensing, meteorological data, and land use information.

Study Design and Population
We conducted a retrospective cohort study of a nationally representative sample of children 0-59 months of age born throughout India (as of 2019, 29 states and 5 Union Territories, excluding the Andaman and Nicobar Islands and Lakshadweep Union Territories), whose mothers participated in the 2015-2016 DHS/ NFHS-4 between January 2015 and November 2016. The design of the DHS/NFHS-4 has been described elsewhere (IIPS and ICF 2017). Briefly, DHS/NFHS-4 is cross-sectional survey using a two-stage stratified sampling framework to collect information from a nationally representative sample of households with women 15-49 years of age and all of their children. Each respondent answered survey questions separately for all children no matter whether the child was still alive at the time of interview. We included children born to women no more than 5 y preceding the survey. Three tiers of all-cause mortality-including neonatal mortality (death at <1 month of age), infant mortality (death at <12 months of age), and child mortality (death at <60 months of age)-were outcome variables in our study. Child birth year and month, child death year and month (if the child died before interview), and potential confounders were reported by mothers during the DHS/NFHS-4 interview during the 2015-2016 period. The child-, maternal-, and household-level covariates were also reported by mothers at the end of the follow-up period. We calculated follow-up time in months, starting from child birth until child death as the failure event and the month of the DHS/NFHS-4 survey or passing the at-risk age as the censoring event.
Because this study analyzed only publicly available data without personal identifiers, the institutional review board (IRB) of Emory University determined this study not to be human subject research and did not require IRB review. Data analyses were conducted between March and September 2019.

Ambient PM 2:5 Exposure and Meteorological Variables
In this study, we developed a random forest (RF) model to predict monthly ambient PM 2:5 concentrations at a 0.1°spatial resolution ( ∼ 11 km at the equator) over the entire Indian subcontinent, from 2009 to 2018 (10 y). This approach has been shown to have reliable historical prediction capabilities, to allow a variety of input variables, and to have good agreement with ground measurements from previous studies for modeling ambient PM 2:5 concentrations in East Asia and Latin America (Vu et al. 2019;Xiao et al. 2018). Other advantages of RFs are that these models can work with correlated predictors, can provide variable importance vectors to compare the influence of variables, and are not likely to overfit (Xiao et al. 2018).
First, we collected daily ambient PM 2:5 concentration between 2015 and 2018 from air monitoring stations across India from the Central Pollution Control Board (https://app.cpcbccr.com/ccr/#/ caaqm-dashboard-all/caaqm-landing) and hourly data at the U.S. Embassy and Consulates in India from AirNow (https://airnow. gov/). We excluded daily PM 2:5 concentrations that were outside of 3 standard deviations of the mean for the log-transformed data (0.05%; exclusions were values outside of 4:2-845 lg=m 3 ) because we believed these measurements may not be plausible. For hourly PM 2:5 concentration data from U.S. embassies or consulates, we removed data that lacked valid quality checks and daily average concentrations with at <18 hourly measurements (2.2%).
Daily average PM 2:5 measurements from stations within the same 0:1 × 0:1 grid were averaged, resulting in 56,834 grid-day PM 2:5 measurements (Table S1). We averaged gridded daily PM 2:5 measurements into gridded monthly PM 2:5 averages if more than 15 d had valid daily measurements for a given grid cell in a given month. We ended up with 1,446 grid-months of PM 2:5 measurements during 2017-2018 from 134 ground air pollution monitoring stations for model development and 465 gridmonths of PM 2:5 measurements during 2015-2016 from 57 stations for model hindcasting evaluation.
We used multiple satellite-retrieved aerosol, nitrogen dioxide (NO 2 ), fire spot data, and vegetation index products; global meteorological and aerosol reanalysis models; land use information; and population density data for the period between 2009 and 2018 as inputs for our model. Briefly, we used Moderate Resolution Imaging Spectroradiometer (MODIS) Collection 6 level-2 aerosol optical depth (AOD) products (MOD04_L2 and MYD04_L2) at a 10-km resolution collected by the Aqua and Terra satellites, from the Distributed Active Archive Center (https://ladsweb.modaps. eosdis.nasa.gov/) (Levy and Hsu 2015). We used Deep Blue (DB) and Deep Blue/Dark Target combined parameters (Combined) from MODIS retrials and assigned and averaged the centroid of each retrial to create 0:1 × 0:1 grid cells and calculate monthly average AOD value for DB and Combined algorithms. We then conducted gap-filling of the missing AOD values for the period similar to previous study (Xiao et al. 2018). Table S2 summarizes the AOD data missingness patterns, and Figure S1 shows the Combined AOD parameter and DB AOD parameter spatial distribution after gap-filling. The aerosol absorbing index in visible and ultraviolet light and tropospheric NO 2 density data were obtained from the Ozone Monitoring Instrument, downloaded from the Goddard Earth Sciences Data and Information Service Center (https://mirador.gsfc.nasa.gov/). We also collected active fire data obtained from the Information for Resource Management System (https://earthdata.nasa.gov/earth-observation-data/near-real-time/ firms) and the Normalized Difference Vegetation Index from the MODIS Vegetation Indices in the Distributed Active Archive Center (https://ladsweb.modaps.eosdis.nasa.gov/).
Daily meteorological data and aerosol diagnostics were obtained from the Goddard Earth Observing System Data Assimilation System 5/Modern-Era Retrospective analysis for Research and Applications (MERRA-2; https://gmao.gsfc.nasa. gov/reanalysis/MERRA-2/) (Bosilovich et al. 2015). Elevation data was collected from the National Aeronautics and Space Administration's Advanced Spaceborne Thermal Emission and Reflection Radiometer Global Digital Elevation Model (version 2). Highway, primary road, and other road density parameters were obtained from the Global Roads Inventory Project (Meijer et al. 2018). Yearly varying population numbers for 2009-2018 were obtained from the LandScan Global Population Database. All of the above variables were assigned into 0:1 × 0:1 grid cells, and their monthly averages were used as predictors in the PM 2:5 model. Details of these variables are listed in Table S3.
We trained the RF model on gridded monthly mean PM 2:5 concentrations from the 2017-2018 period (N = 1,446) and evaluated model performance using standard, spatial, and temporal 10-fold cross-validation (CV). Spatial CV refers to using data from 90% of grids for developing the model and then testing the model on the remaining 10% of grids; temporal CV relies on using 90% of monthly data to develop the model and then testing the model based on the remaining 10% of monthly data. We also evaluated the hindcasting performance of the model using monthly data from the 2015-2016 period (N = 456). Root-mean-squared error (RMSE) and R 2 values from 10-fold cross-validation and from hindcasting were reported to evaluate the historical prediction performance of the model. We then estimated population-weighted annual mean PM 2:5 concentrations over India during 2009-2018, evaluated the change of PM 2:5 levels over those 10 y, and compared our estimates with other published databases (India State-Level Disease Burden Initiative Air Pollution Collaborators 2019; Shaddick et al. 2018;van Donkelaar et al. 2016). Predicted ambient monthly PM 2:5 levels were matched to each child with geocoordinates at the cluster level [a primary sampling unit (PSU) or a segment of a PSU, of around 100-150 households] provided by DHS/NFHS-4. For each child, we assigned the 9-month average PM 2:5 concentrations prior to birth as the in utero PM 2:5 exposure. We treated post-delivery average PM 2:5 exposures as time dependent and defined them as the average PM 2:5 concentration from the calendar month of child birth through each month the child was at risk, until the month of death or censoring.
Meteorological variables including monthly air temperatures at 2 m and monthly rainfall precipitation were included as well, and they were also matched with the geo-coordinates of households (in the same way as performed for PM 2:5 exposure). Monthly air temperatures at 2 m values were extracted from MERRA-2 data (Bosilovich et al. 2015), and monthly rainfall precipitation in mm/ month were extracted from Climate Hazards Group InfraRed Precipitation with Station data (Funk et al. 2015). R (version 3.5; R Development Core Team) with the randomForestSRC (version 4.6) package was used for PM 2:5 modeling, prediction, and assignment of PM 2:5 and meteorological variables for each child.

Statistical Analysis
We used Cox proportional hazards regression models with followup in calendar months as the time variable and child death as the outcome. We included 9-month in utero and time-dependent, postdelivery average ambient PM 2:5 exposure to estimate adjusted hazard ratios (HRs) relating child mortality risk with 10-lg=m 3 increases of PM 2:5 exposure. In this model, the estimated effects of PM 2:5 exposure during in utero and post-delivery periods were adjusted for each other. We also used single-exposure models with PM 2:5 exposures from either the in utero or post-delivery periods for comparison. We used a set of confounders from the previous literature and a directed acyclic graph ( Figure S2) to identify covariates to be included in the Cox regression model (Heft-Neal et al. 2018;Spears et al. 2019;Subramanian et al. 2009). The prespecified time-independent covariates included child sex, birth month and year, birth order, and location of birth (institutional birth or not); multibirths; and mother's age at child birth, height, marital status, education (whether above secondary level), maternal smoking (yes/no), secondhand smoke exposure in the home (yes/no), wealth index, urban or rural location of households, geographical region/zonal council (Northern, Central, North Eastern, Eastern, Western, and Southern, based on https://www.mha.gov.in/zonalcouncil), primary cooking fuel, and toilet facilities. The household wealth index was calculated and provided by DHS/NFHS-4 using principle component analysis of household ownership of assets (Staveteig and Mallick 2014). We dichotomized toilet facilities as whether they were safely managed based on the World Health Organization (WHO) and United Nations Children's Fund Joint Monitoring Program (Croft et al. 2018; WHO/UNICEF JMP for Water Supply, Sanitation and Hygiene 2018), and dichotomized cooking fuel as clean (electricity, liquefied petroleum gas, natural gas, and biogas) or dirty (kerosene, coal, wood, straw, agriculture crop residue, and dung). Time-dependent covariates included monthly rainfall precipitation and monthly temperature. Child birth year and month, sex, and the geographical region of households were modeled as strata. Unique baseline hazard functions were given for all 927 strata of combinations of these strata. Robust variance estimation for coefficients was used. For tied outcomes, Breslow's method was used. Table S4 provides the details on the variables used in the Cox regression models. Although we did not include children born with LBW because it has been found to be a potential intermediate variable between air pollution and childhood death (Hernández-Díaz et al. 2006), we conducted sensitivity analysis adjusting for LBW in our study population, as described below. We also tested the association between LBW and in utero PM 2:5 exposure, and between LBW and child mortality. We stratified the Cox regression by child birth month and year, child sex, and geographical regions (927 strata) to account for spatiotemporal and sex differences in child survival. In addition, we included the sample probability weight from DHS/NFHS-4 in the Cox regression (Equation 1). h g ðt,Z,XðtÞÞ = h 0g ðtÞ exp ½b T Z + d T XðtÞ: (1) h g ðtÞ is the hazard function (probability of death) at month t, and g refers to the (1-927) strata of child birth month, sex, and geographical location. Z is the vector containing time-independent variables, including in utero PM 2:5 exposure, and XðtÞ are the timedependent variables including post-delivery PM 2:5 exposure. The association between PM 2:5 and child mortality was reported as the HR per 10-lg=m 3 increase of PM 2:5 , and we also calculated the HR per interquartile range (IQR) increase of PM 2:5 (40:6 lg=m 3 ) to compare other studies. To evaluate whether effects of PM 2:5 differed across covariates, we conducted subgroup stratified analysis of the two-exposure models. The proportional hazards assumption for in utero and post-delivery average PM 2:5 exposure was tested on the basis of Schoenfeld residuals on time, and we found it did not violate the proportional hazards assumption (p > 0:05). The correlation matrix of the predictor estimation from the Cox regression model was calculated to check multicollinearity, and the Akaike information criterion (AIC) was calculated to compare single-and two-exposure models. We conducted subgroup analysis by conducting separate models by stratifying variables including sex, LBW, delivery mode, whether multibirth, maternal age, maternal height, education, maternal smoking, secondhand smoking, household wealth index, cook fuel type, toilet type, urbanity, marital status, and geographical location, and we tested the hypothesis that there were no interaction effects between in utero and post-delivery PM 2:5 effects with each of the stratifying variables.
To examine the robustness of our results, we conducted additional sensitivity analyses. We added LBW back to the model (modify 1); conducted analysis using stratified variables as timeindependent covariates (modify 2); adding additional clusterlevel covariates, including the percentage of household with improved toilets and clean cooking fuels (modify 3); performing unweighted analysis without sample weight (modify 4); using generalized estimating equations to account for the additional correlation within household (modify 5) and cluster level (modify 6); replacing post-delivery lifetime PM 2:5 exposure with monthly exposure (modify 7); and performing crude analysis without adjusting covariates (modify 8). In addition, we also conducted an analysis treating in utero and post-delivery PM 2:5 as categorical variables using quartiles of exposure in the model and tested the linear trend of the categorized exposure.
Last, we calculated the population attributable fraction (PAF) as the burden of child mortality from ambient PM 2:5 exposure.
We estimated the PAF from the modeled in utero PM 2:5 exposure distribution by every 10th percentile, and the risk of mortality (derived from the single-exposure model only) at a given exposure level compared baseline exposure levels. Stata (version 14; StataCorp) and R (version 3.6; R Development Core Team) were used in the statistical analyses.

Results
The RF PM 2:5 prediction model characterized monthly PM 2:5 levels well, with CV R 2 = 0.82 at the monthly level (RMSE of 25 lg=m 3 ) in 2017-2018. The spatial and temporal CV R 2 (RMSE) of the RF model was 0.77 (26 lg=m 3 ) and 0.77 (25 lg=m 3 ) respectively, indicating a stable and good fit of monthly ambient PM 2:5 measurements from ground stations. Compared with historical measurements in 2015-2016, our model provided good predictions of monthly PM 2:5 with R 2 (RMSE) of 0.82 (32 lg=m 3 ). Figure S4 shows the performance of the CV and hindcasting of the RF model. Figure S3 shows changes in model performance across different numbers of decision trees and the variable importance matrix for the model. The historical prediction shows that our model underestimated PM 2:5 levels during periods with extreme levels, when monthly PM 2:5 levels were >350 lg=m 3 . Regardless, our model still had relatively good hindcasting capacity to predict historical monthly PM 2:5 levels. The annual within-sample RMSE from the model CV in 2017-2018 was 10:9 lg=m 3 , and the annual RMSE from historical hindcasting in 2015-2016 was 21:7 lg=m 3 , with better performance compared with previous ambient air pollution models used in the GBD study in the South Asia Region (Shaddick et al. 2018), which had within-sample RMSE of 17:6 lg=m 3 . Figure 1 shows the predicted annual PM 2:5 in India in 2018 and the annualized change of PM 2:5 over a 10-y period. The predicted annual population-weighted mean PM 2:5 in India in 2018 was 71:7 lg=m 3 . The highest annual PM 2:5 concentrations (>120 lg=m 3 ) were in the Indo-Gangetic Plain Region, covering the states of Haryana, Uttar Pradesh, and Bihar and the National Capital Territory of Delhi. These values are three times >40 lg=m 3 , the recommended limit set by the National Ambient Air Quality Standards of India (Gautam 2009). We observed increased ambient PM 2:5 concentrations over a 10-y period in most areas, with the strongest increase of ambient PM 2:5 concentrations at ∼ 1 lg=m 3 per year in the Indo-Gangetic Plain Region and along the west coast of the Western Region, whereas the Northeastern Region and the north part of the Northern Region showed decreases in ambient PM 2:5 levels. Details of predicted PM 2:5 in India ( Figures S5 and S6), major metropolitan areas (Delhi, Mumbai, Kolkata, Chennai, and Bangalore) (Table S5), and population-weighted PM 2:5 exposures from our model vs. other global models are listed in Table S6. The predicted ambient PM 2:5 air pollution can be downloaded at https://doi.org/ 10.15139/S3/RFILYH.
The present study included data from 259,627 live-birth children born 5 y preceding the survey from 175,865 women in the DHS/NFHS-4 data. We excluded 6,839 (2.6%) children in the analysis due to a) living in the Andaman and Nicobar Islands and Lakshadweep Union Territories, which are islands more than 200 km from the India subcontinent (n = 2,282, 0.9%); b) not living with their mothers (n = 1,912, 0.7%); or c) missing variables (n = 2,645, 1.1%). A total of 252,788 children born between January 2010 and November 2016 were included from 27,853 clusters (Table 1). In the follow-up period of 7,447,724 childmonths, 11,559 deaths were reported in children <5 years of age by their mothers. Among these deaths, 7,520 deaths occurred before the first month of life, and 10,862 deaths occurred during the first year of life (including 7,520 neonatal deaths). For the entire cohort, the mean (SD) of 9-month in utero and postdelivery lifetime average PM 2:5 exposure until failure or censoring was 71.1 (28.2) and 73.3 ð29:8Þ lg=m 3 , respectively. The characteristics of the cohort are shown in Table 1, stratified by 9month in utero average PM 2:5 exposure, at levels <49:7 lg=m 3 , 49.7 to <62:4 lg=m 3 , 62.4 to <90:3 lg=m 3 , and ≥90:3 lg=m 3 . In general, compared with children in the lowest quartile, children in the highest quartile of exposure were more likely to reside in rural areas (80.1% vs. 70.8%), were born to mothers with shorter stature (47.4% vs. 34.2% of mothers were <149:9 cm in height), lower educational levels (44.3% vs. 19.4% were below the secondary education level), and lived in households with a lower wealth index (35.6% vs. 15.6% were in the lowest quartile of the wealth index). Children residing in central (Chhattisgarh,  Table 2). The correlation between average in utero and post-delivery lifetime PM 2:5 exposure for the same child was 0.74. Table 1. Unweighted characteristics of children <5 years of age by four quartiles of 9-month in utero PM 2:5 exposure who were born between 2010-2016 in the Demographic and Health Surveys [DHS, also known as the National Family Health Survey 2015-2016 (NFHS-4)] in India.
We found little evidence of effect modification by most covariates on PM 2:5 exposure on mortality in subgroup analysis. Figure 2 shows the estimated in utero and post-delivery lifetime average ambient PM 2:5 effects on child mortality across covariates, and the results were similar for infant mortality and neonatal mortality ( Figures S7 and S8). We observed some heterogeneity of postdelivery PM 2:5 -mortality HRs across different maternal height, maternal age, and LBW, indicating some interactions between these covariates and post-delivery PM 2:5 exposure. Numerical values for these results are shown in Table S11. For children with LBW, we observed in utero PM 2:5 -mortality HRs of <1. In addition, we found that LBW was associated with both in utero PM 2:5 exposure and child mortality, as shown in additional analyses (Tables S12 and S13). Our results were robust to different variable selections (modify 1-3 in Figure S9, Table S14), unweighted analysis (modify 4 in Figure S9, Table S14), and Cox regressions with generalized estimating equations to account for the correlation of outcomes within households and clusters (modifies 5 and 6 in Figure S9, Table S14), inclusion of time-dependent post-delivery monthly PM 2:5 exposure to replace post-delivery lifetime PM 2:5 exposure (modify 7 in Figure S9, Table S14), and crude models with single or two exposures (modify 8 in Figure S9, Table S14). The results for the categorized exposure models are listed in Table  S15, showing a consistent linear trend for both categorized in utero and post-delivery PM 2:5 exposure on mortality.   Cluster is a primary sampling unit (PSU) or a segment of PSU, of around 100-150 households, in the two-stage clustering sampling design of the DHS/NFHS-4. b 9-month average of ambient PM 2:5 exposure prior to child birth month.
c Average of post-delivery lifetime ambient PM 2:5 exposure from month of child birth until the month of death, or month of DHS/NFHS-4 interview of the child's mother (censoring). Note: Adjusted hazard ratios and 95% confidence intervals for all-cause neonatal mortality, infant mortality and child mortality are shown for 10-lg=m 3 increase of ambient PM 2:5 during 9-month in utero period before child birth and post-delivery periods. Models stratified on child sex, birth month and year, geographical zone (927 strata), adjusted for birth order, multibirth, birth location, mother's age, height, marital, education, maternal smoking, household wealth, secondhand smoking, cooking fuel, improved toilet, urban or rural location of households, monthly temperature, monthly precipitation. PM 2:5 , particulate matter ≤2:5 lm in aerodynamic diameter (fine particulate matter).
a Two-exposure models include both ambient PM 2:5 exposure during 9-month in utero period before child birth and post-delivery lifetime average until death, or censoring. b Single-exposure models include either one of PM 2:5 exposure during 9-month in utero period before child birth or post-delivery average until death, or censoring.   Table S8. All model specifications are the same as for the main analysis except for the subgroup analysis of low birth weight (LBW), which is not included in the main analysis. Red circles (A) represent effects of in utero PM 2:5 exposure, and blue triangles (B) represent the effects of post-delivery lifetime average PM 2:5 exposure. p-Values for testing differences among strata for each stratifying variable are listed. Note: PM 2:5 , particulate matter ≤2:5 lm in aerodynamic diameter (fine particulate matter); Q, quartile; SHS, secondhand smoking.
As for the child mortality burden from air pollution, we estimated that 17.6% (95% CI: 12.3%, 22.4%) of mortality could be reduced if exposures were reduced to 10 lg=m 3 and that 9.3% (95% CI: 6.5%, 11.9%) of mortality could be reduced if exposures were reduced to the National Ambient Air Quality Standard of India of 40 lg=m 3 (Table S16).

Discussion
In a large, nationally representative retrospective cohort of children <5 years of age in India, we found a consistent association between exposure to ambient PM 2:5 during in utero and postdelivery periods and elevated child mortality. The estimated associations were stronger in single-exposure models compared with two-exposure models, perhaps due to the strong correlation between in utero and post-delivery exposures. Similar results were observed for infant and neonatal mortality. Even though in utero and post-delivery lifetime PM 2:5 exposure showed some multicollinearity in the two-exposure models, two-exposure models had smaller AICs compared with single-exposure models, indicating the proper fit of the two-exposure models. To our knowledge, this is the first cohort study conducted in India to estimate the effects of ambient PM 2:5 exposure on child survival in a nationally representative cohort of children.
To estimate monthly PM 2:5 concentrations, we developed a model based on multiple inputs from satellite observations, meteorological data sets, and land use information from between 2009 and 2018. We identified disproportionally high levels of PM 2:5 in the Indo-Gangetic Plain Region, where the population density is high. In addition, we found increasing levels of ambient PM 2:5 over the past 10 y, emphasizing an urgent need to control ambient air pollution in India. Children included in our study had high exposure levels to ambient PM 2:5 at nearly twice the National Ambient Air Quality Standard of India (40 lg=m 3 ) (Gautam 2009) and more than seven times the WHO Air Quality Guidelines (10 lg=m 3 ) (WHO 2005). Given the high levels of ambient PM 2:5 throughout India, reducing ambient air pollution levels could provide a relatively large reduction in child mortality and substantial health benefits.
The child mortality-HR estimates per 10 lg=m 3 of PM 2:5 exposure in our study were in the range of 1-3%, lower than previous estimates in LMICs. In sub-Saharan Africa, Heft-Neal et al. (2018) found a 9% increase in risk of infant mortality associated with a 10-lg=m 3 increase in annual PM 2:5 concentration, and Loomis et al. (1999) found a 7% increase of infant mortality associated with a 10-lg=m 3 increase of PM 2:5 concentration during the 3-5 d before death in Mexico City. In the present study, given that the PM 2:5 exposure range was three times higher than those in previous studies, the estimated increases of mortality per IQR (one IQR = 40:6 lg=m 3 ) increase of PM 2:5 were in the range of 5-12%, more similar to earlier findings (Heft-Neal et al. 2018;Loomis et al. 1999). One study, which assessed early life ambient PM 2:5 exposure and child mortality in 43 LMICs, did not find a significant association between air pollution and child mortality (Goyal et al. 2019). This could be due to their use of annual PM 2:5 exposure as a proxy for exposure at <1 year of age, which thus introduced measurement error and which may have biased results toward the null (Zeger et al. 2000). Another study conducted in Beijing using a time-series design also did not find a significant association between current-month PM 2:5 exposure and infant mortality (Wang et al. 2019).
The present study has several strengths. First, we used machine learning to develop a high-quality model of ambient PM 2:5 concentrations that had monthly temporal resolution and high spatial resolution that could reflect seasonal differences in ambient air pollution. These estimates of ambient PM 2:5 levels could be potentially applied in future health impact assessment and epidemiological analysis in India. Second, we used a large, retrospective, and nationally representative cohort of children <5 years of age in India based on the well-documented and -conducted DHS/NHFS-4 survey. Third, we applied Cox regression with time-dependent air pollution, temperature, and rainfall precipitation on child mortality using the calendar month as the time variable. This approach allowed us to examine the longitudinal association between air pollution and child mortality and compare post-delivery PM 2:5 exposure for children with the same follow-up time. This approach could largely reduce exposure misclassification compared with previous studies (Goyal et al. 2019;Heft-Neal et al. 2018), which were based on yearly PM 2:5 averages to assess a binary outcome of child death at the end of 1 year of age. Our analysis controlled for potential bias from seasonality and different weather patterns, in addition to using individual-level, household-level, and cluster-level covariates. Notably, we did not include LBW as a covariate in the model due to the potential selection bias of including LBW as a covariate, and for children with LBW, we found protective effects of in utero PM 2:5 exposure. This is similar to the protective effects of maternal smoking on infant mortality for children with LBW observed in the United States as a result of selection bias (Hernández-Díaz et al. 2006). Finally, the PM 2:5 -mortality associations were consistently positive in single-and two-exposure models, and these associations were robust in additional sensitivity analyses.
Our study also has some limitations. First, no national PM 2:5 monitoring data were available during the study period of 2009-2014, so we lacked ground-level PM 2:5 measurement data before 2015 with which to validate our PM 2:5 prediction model. The monitoring stations were more heavily concentrated in urban areas, which could have led to differential errors across the country as well. Similar to previous PM 2:5 prediction models (Xiao et al. 2018), our model tended to underestimate some very high PM 2:5 values, when monthly PM 2:5 concentrations were >350 lg=m 3 ( Figure S3). Second, we used only total ambient PM 2:5 mass concentration at the cluster level as our exposure for participants at the household level, without estimation of different PM 2:5 components, and did not include the impact of other ambient air pollutants, such as PM with an aerodynamic diameter of ≥10 lm, NO 2 , and ozone, or direct household air pollution exposures from cooking, which are significant in many parts of India. Third, our retrospective cohort was based on a cross-sectional survey of the DHS/NFHS-4. We focused only on all-cause mortality reported in the survey, whereas it is likely that ambient PM 2:5 is associated with specific cause of death. Child birth year, birth month, vital information, and covariates were collected at the end of the study and recalled by mothers and may have differed from values at the time the children were born. In addition, we did not have the children's gestational age at birth and assigned in utero PM 2:5 exposure levels as a 9-month average prior to the birth, which may not be the case for preterm infants who have a shorter in utero exposure period. Given this data collection limitation, our study could have suffered from recall bias and misclassification. However, we believe most of the maternal and household-level covariates will be nondifferential misclassification of outcomes and could most likely bias our results toward the null. Last, as with all observational studies, our study may have potential unmeasured confounders that were inadequately controlled for in this analysis. Our results indicate a need for future prospective cohort studies (Balakrishnan et al. 2015), including intervention trials of reducing air pollution levels (Clasen et al. 2020), to investigate air pollution effects on child health.
In conclusion, this study found increased mortality risk associated with ambient PM 2:5 during the 9-month in utero and postdelivery periods for children <5 years of age. This study, based on a nationally representative retrospective cohort in India, provides evidence suggesting that in utero and post-delivery PM 2:5 exposure contribute to child mortality in developing countries. Given the high levels of ambient PM 2:5 throughout India, expanding air pollution monitoring stations (both in rural and urban areas), adding more epidemiological research, and making a substantial effort to reduce ambient air pollution and early life exposures are needed.