Contribution of Long-Term Exposure to Outdoor Black Carbon to the Carcinogenicity of Air Pollution: Evidence regarding Risk of Cancer in the Gazel Cohort

Background: Black carbon (BC), a component of fine particulate matter [particles with an aerodynamic diameter ≤2.5 μm (PM2.5)], may contribute to carcinogenic effects of air pollution. Until recently, however, there has been little evidence to evaluate this hypothesis. Objective: This study aimed to estimate the associations between long-term exposure to BC and risk of cancer. This study was conducted within the French Gazel cohort of 20,625 subjects. Methods: We assessed exposure to BC by linking subjects’ histories of residential addresses to a map of European black carbon levels in 2010 with back- and forward-extrapolation between 1989 and 2015. We used extended Cox models, with attained age as time-scale and time-varying cumulative exposure to BC, adjusted for relevant sociodemographic and lifestyle variables. To consider latency between exposure and cancer diagnosis, we implemented a 10-y lag, and as a sensitivity analysis, a lag of 2 y. To isolate the effect of BC from that of total PM2.5, we regressed BC on PM2.5 and used the residuals as the exposure variable. Results: During the 26-y follow-up period, there were 3,711 incident cancer cases (all sites combined) and 349 incident lung cancers. Median baseline exposure in 1989 was 2.65 × 10−5/m [interquartile range (IQR): 2.23–3.33], and exposure generally decreased slightly over time. Using 10 y as a lag-time in our models, the adjusted hazard ratio per each IQR increase of the natural log-transformed cumulative BC was 1.17 (95% confidence interval: 1.06, 1.29) for all-sites cancer combined and 1.31 (0.93, 1.83) for lung cancer. Associations with BC residuals were also positive for both outcomes. Using 2 y as a lag-time, the results were similar. Discussion: Our findings for a cohort of French adults suggest that BC may partly explain the association between PM2.5 and lung cancer. Additional studies are needed to confirm our results and further disentangle the effects of BC, total PM2.5, and other constituents.
https://doi.org/10.1289/EHP8719


Introduction
Strong evidence accumulated over recent decades has led to the classification of outdoor air pollution and fine particulate matter [particles with an aerodynamic diameter ≤2.5 μm (PM2.5)] as carcinogenic (Loomis et al. 2013; Pedersen et al. 2017; Raaschou-Nielsen et al. 2013; IARC 2016). Yet the separate effects of each PM2.5 component (sulfates, nitrate, ammonium, organics, metals, etc.) are rarely quantified (Beelen et al. 2015; Ostro et al. 2011; Raaschou-Nielsen et al. 2016). Black carbon (BC), a component of PM2.5, comes from incomplete combustion processes, mainly from anthropogenic sources such as fossil fuel or biomass burning (Chylek et al. 2015; Koelmans et al. 2006). The first health concerns about exposure to BC appeared decades ago (Mumford et al. 1990); since then, reports have accumulated linking exposure to BC with increased morbidity and mortality, including lung cancer mortality (Anenberg et al. 2012; Grahame et al. 2014; Hvidtfeldt et al. 2019; Yang et al. 2019), lower lung function and slower cognitive development in children (Paunescu et al. 2019; Sunyer et al. 2015), increased bone loss, and decreased cognitive function in the elderly (Wurth et al. 2018). Although evidence has accumulated on the toxicity of BC, we still know little about the effects of chronic low-level exposure on cancer risk, partly because the paucity of data on long-term exposure to BC in the general population has left little opportunity for such studies. Recently, the ELAPSE project estimated annual outdoor BC concentrations between 1990 and 2015 at fine resolution over Europe (de Hoogh et al. 2018). In this study, we aimed to investigate the relationships between long-term exposure to BC and incident all-site and lung cancer in the population-based French Gazel cohort with a 26-y follow-up.

Study Population
The Gazel cohort enrolled 20,625 participants in 1989 from the French national gas and energy company, Electricité-de-France Gaz-de-France (Goldberg et al. 2015). These participants, aged 35-50 y at enrollment, completed a detailed self-administered questionnaire at baseline, then a follow-up questionnaire sent every year, with a high response rate during the follow-up (>80% from 1990 to 1992, and >70% from 1993 to 2015). Participants' histories of main residential addresses were collected and geocoded for each year since 1989. To minimize misclassification while retaining the largest possible number of participants, we excluded participants with the poorest exposure coverage [i.e., more than 20% of missing geocodes during their follow-up (due to stays outside mainland France)]. This residential history was collected on an annual basis, so each address corresponds to a calendar year. For the remaining participants, we used last observation carried forward to impute any missing addresses. We counted residential changes during the study period using a threshold of a 1-km difference between geocodes to define a substantial residential change; during the study period, we observed 13,981 changes of more than 1 km among 9,112 participants. Geocoding precision ranged from postal code (13%) to address level (48%).
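The last-observation-carried-forward step and the 1-km criterion described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not say how the distance between geocodes was computed, so the haversine (great-circle) formula, the function names, and the example coordinates are all hypothetical, and the study's own processing was done in R.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (approximation)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def locf(yearly_geocodes):
    """Replace missing yearly geocodes (None) with the last observed one."""
    filled, last = [], None
    for geocode in yearly_geocodes:
        if geocode is not None:
            last = geocode
        filled.append(last)
    return filled

def count_moves(yearly_geocodes, threshold_km=1.0):
    """Count year-to-year geocode changes larger than threshold_km."""
    filled = [g for g in locf(yearly_geocodes) if g is not None]
    return sum(
        1
        for prev, cur in zip(filled, filled[1:])
        if haversine_km(*prev, *cur) > threshold_km
    )

# Hypothetical (lat, lon) history: one missing year, one small shift, one move.
history = [(48.8566, 2.3522), None, (48.8570, 2.3525), (45.7640, 4.8357)]
n_moves = count_moves(history)
```

In this hypothetical history, the missing year inherits the previous geocode and the small shift of a few tens of meters falls below the threshold, so only the final long-distance relocation is counted as a residential change.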
We excluded 823 participants with any primary incident cancer diagnosed or censored before 1999, to take account of a potential 10-year lag between exposure and incidence/censoring (see "Statistical Analyses" section). This approach led to a slightly different study population for analyses on all-site and lung cancer. We also excluded 90 participants who were lost to follow-up (because they definitively left the company) or who asked to be removed from the study and to have their data deleted. Our study included participants who died during the follow-up without a diagnosis of cancer; they were censored at the date of death. Further, in the lung cancer analysis, we compared lung cancer cases to subjects not developing any cancer; thus we excluded participants with other cancers at any time during the study period (1999-2015) from the study population. Our study population included 19,348 and 15,694 participants for the analyses on primary incident all-site and primary incident lung cancer, respectively (Figure S1).
The Gazel study protocol was approved by the French authority for data confidentiality (Commission Nationale de l'Informatique et des Libertés No. 105,728) and by the Ethics Evaluation Committee of the Institut national de la santé et de la recherche médicale (Inserm, National Institute of Health and Medical Research) (IRB0000388, FWA00005831). The invitation to participate was sent by post to eligible persons, accompanied by a document detailing the project, the voluntary nature of their participation, the data collected, the conditions of security and confidentiality and the future use of the data. The subjects solicited were invited to complete a questionnaire indicating their consent.

Cancer Incidence
Incident cancer cases were ascertained from three sources: a) French national health administrative databases containing listings of incident cancers (more details below) during the study period (1999-2015); b) company records, which have systematically recorded all cancers (except nonmelanoma skin cancers) diagnosed among current employees [with pathology reports and the date of diagnosis, and coded according to the International Classification of Diseases (ICD)] (Goldberg et al. 1996); and c) cancer diagnoses self-reported by participants via the follow-up questionnaires from 2008 onward. Participants who gave consent were contacted to collect medical information on the date of diagnosis and the type of cancer. The linkage of participants to the French national health administrative databases, which record each use of the health system, allowed cancers to be identified from data on hospitalizations and from the "Chronic Diseases" register (diseases, including cancer, for which all treatment is reimbursed) with dates and diagnoses (Tuppin et al. 2017).
The ICD-10 classification system was used to code the type of cancer, with the whole ICD-10 chapter except C77-79 (secondary malignant neoplasms) and C44 (nonmelanoma skin cancers); we used C34 to identify lung cancer.

Exposure Assessment
For each subject of our study population, we estimated BC, PM2.5, and nitrogen dioxide (NO2) exposure in each year from 1989 to 2015, based on the subject's residential address linked to data from land use regression (LUR) models developed at a fine spatial scale (100 × 100 m) over Europe (de Hoogh et al. 2018). This linkage also accounts for any residential address change over the years (see "Study Population" section). BC and PM2.5 measurement data came from two sources: PM2.5 absorbance in samples collected in the ESCAPE project for BC (436 sites), and regulatory monitoring data maintained in the AirBase European database for PM2.5 (543 sites). For the year 2010, LUR models were developed by regressing the measured pollutant concentrations against a range of predictor variables (including land-use variables, road density, and altitude, as well as satellite-derived and chemical transport modeled pollutant estimates), followed, for PM2.5 only, by universal kriging to explain spatial autocorrelation in the residuals. The full models (based on all monitoring sites) explained 72%, 54%, and 59% of the spatial variation in measured PM2.5, BC, and NO2 concentrations, respectively. Models were further validated, and shown to be robust, using a five-fold hold-out validation strategy, which explained 66%, 51%, and 57% of the spatial variation in the respective measured PM2.5, BC, and NO2 concentrations (de Hoogh et al. 2018). Finally, the estimated concentrations for 2010 were rescaled annually for the years 1990-2015, by European Nomenclature of Territorial Units for Statistics-1 (NUTS-1) regions (i.e., European Union-defined administrative regions within countries) in France, using back- and forward-extrapolation. This rescaling process was based on annual mean estimates from the 26 × 26 km Danish Eulerian Hemispheric Model, downscaled from the original 50 × 50 km resolution using bilinear interpolation (Brandt et al. 2012). In addition, in this study, we further back-extrapolated PM2.5 exposure to 1989.
To visualize spatiotemporal differences in BC exposure over France, we mapped the differences in BC exposure for Gazel nonmover participants between the periods 1995-2000 and 2000-2005. We calculated the relative change, in percent, for each nonmover participant and each pair of years. To improve the maps' readability, we averaged the results on a 5 × 5 km grid.

Covariables
Based on recognized and suspected risk factors, we a priori selected the following sets of variables as potential confounders and/or effect modifiers: Sociodemographic and occupational variables. Sex, education (attending school for 6-11 y, 12-13 y, 14-15 y, other secondary education, other diploma), and socioeconomic status (SES; low: blue-collar workers or clerks; medium: first-line supervisors or sales representatives; high: management), all at baseline. We also included a synthetic summary of occupational exposure to nine known lung carcinogens (asbestos, cadmium, chlorinated solvents, chromium, coal gasification, coal-tar pitch, creosotes, crystalline silica, and hydrazine) over the whole employment period [categorized into none, one, two, or at least three carcinogens, from each Gazel participant's career-long history linked to the French job-exposure matrix MATEX (Imbernon 1991)].
Lifestyle variables. Time-varying variables for tobacco (cumulative smoking pack-years), alcohol intake (abstinent, light drinker, moderate drinker, heavy drinker, unclear pattern), family status (single or not), body mass index (BMI, weight in kilograms divided by the square of height in meters). Some questions were only asked occasionally, such as passive smoking at home or at work (yes or no) in 1990 and 1996, or fruit and vegetable intake (never or less than once a week; once or twice a week; more than twice a week but not every day; every day or almost) in 1998, 2004, 2009 and 2014. We processed these two variables to make them time-varying annually, attributing the data collected in 1990 and 1996 to each year between 1989-1995 and 1996-2015, respectively, for passive smoking, and the data collected in 1998, 2004, 2009, and 2014 to each year between 1989-1998, 1999-2004, 2005-2010, and 2011-2015, respectively, for fruit and vegetable intake.
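The attribution of occasional questionnaire answers to calendar years described above amounts to a lookup from survey year to period. A minimal Python sketch follows; the period boundaries are copied from the text, whereas the function name, dictionary layout, and example answers are hypothetical.

```python
# Survey year -> (first year, last year) of the period it is attributed to,
# as stated in the text.
PASSIVE_SMOKING_PERIODS = {1990: (1989, 1995), 1996: (1996, 2015)}
FRUIT_VEG_PERIODS = {
    1998: (1989, 1998),
    2004: (1999, 2004),
    2009: (2005, 2010),
    2014: (2011, 2015),
}

def expand_to_years(answers, periods):
    """Map each survey-year answer onto every calendar year of its period.

    A survey year absent from `answers` yields None for its whole period.
    """
    yearly = {}
    for survey_year, (start, end) in periods.items():
        for year in range(start, end + 1):
            yearly[year] = answers.get(survey_year)
    return yearly

# Hypothetical participant.
passive = expand_to_years({1990: "no", 1996: "yes"}, PASSIVE_SMOKING_PERIODS)
fruit_veg = expand_to_years(
    {1998: "every day", 2004: "every day",
     2009: "once or twice a week", 2014: "every day"},
    FRUIT_VEG_PERIODS,
)
```

Each resulting dictionary covers all 27 calendar years from 1989 to 2015, so the occasional questions can enter the models alongside the annually collected time-varying covariates.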
Contextual variables. For all participants and every year, we calculated the distance to the nearest major road. At the municipality level, we obtained the population density in 1989, 2000, and 2010, from which we derived an urban classification: urban (high population density), semiurban (intermediate population density), and rural (low population density). The population density cutoffs are based on the European urban/rural classification. To define whether the participants had lived solely in urban, solely in semiurban, or solely in rural areas, or in mixed areas, over the whole follow-up, we used the information from the three years for which we obtained such data. Also at the municipality level, we obtained the French deprivation index (Rey et al. 2009), calculated for 2009 for all participants who were still alive and therefore geocoded in 2009. To take into account participants who died before this variable could be calculated in 2009, we categorized the values of the French deprivation index into tertiles and added the missing values as a separate category, so as not to lose any participant from the analyses because of this variable.
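The handling of the deprivation index (tertiles plus an explicit category for missing values) can be illustrated with the Python sketch below. The labels, the function name, and the use of the default quantile method of `statistics.quantiles` are assumptions made for illustration; the text does not specify how the tertile cut points were computed.

```python
import statistics

def tertile_categories(values):
    """Label each value 'T1', 'T2', or 'T3'; None becomes 'missing'."""
    observed = [v for v in values if v is not None]
    # Two cut points split the observed values into three groups.
    cut1, cut2 = statistics.quantiles(observed, n=3)

    def label(value):
        if value is None:
            return "missing"
        if value <= cut1:
            return "T1"
        if value <= cut2:
            return "T2"
        return "T3"

    return [label(v) for v in values]

# Hypothetical index values, one participant without a 2009 geocode.
labels = tertile_categories([1, 2, 3, None, 4, 5, 6, 7, 8, 9])
```

Keeping "missing" as its own level means that no participant is dropped from a model that adjusts for the deprivation index.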

Imputations
For baseline variables, missing values ranged from 0% (sex) to 2.1% (education). Throughout the follow-up, missing data ranged from 21% (alcohol consumption) to 29% (BMI). We imputed all variables (except air pollution exposure and contextual variables) longitudinally for each participant using multiple imputations by chained equations from the R packages MICE and MICEADDS (van Buuren and Groothuis-Oudshoorn 2011), generating 10 datasets with 10 iterations each, with good convergence. All the variables described above were used as predictors. For the geographic variables (exposure to pollutants, deprivation index, distance to the road, or urban classification), because the other predictors cannot predict them accurately, we restored the initial missing values in our final dataset. We used the functions "2l.pmm" and "2l.only.pmm" for time-varying and time-independent variables, respectively. Following the MICE package manual (https://cran.r-project.org/web/packages/mice/mice.pdf), we assessed convergence visually using Figure S2; the streams are expected to mingle well without showing a clear trend. Only for fruit and vegetable intake did we observe poor mingling of the streams, due to the correlation between the responses to this question asked in 1998, 2004, 2009, and 2014. Because we aimed at pooling these values into one time-dependent categorical variable, we disregarded this issue. Model-based estimates were pooled following Rubin's rules.
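For readers unfamiliar with Rubin's rules, the pooling step can be sketched in Python as follows. With m imputed datasets, the pooled estimate is the mean of the per-dataset estimates, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name and the numeric inputs are hypothetical; in the study itself, pooling was done in R.

```python
import math
import statistics

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances with Rubin's rules.

    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)        # mean within-imputation variance
    between = statistics.variance(estimates)    # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)

# Hypothetical log-hazard-ratio estimates from three imputed datasets.
log_hr, se = pool_rubin([0.10, 0.20, 0.30], [0.01, 0.01, 0.01])
```

The pooled standard error exceeds the per-dataset standard error whenever the estimates differ across imputations, which is how the uncertainty due to missing data is carried into the final confidence intervals.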

Statistical Analyses
We used extended Cox models with BC as a time-dependent variable, with attained age as the underlying time scale, to validate the proportional hazard assumption, while annually describing the exposure to air pollutants. We estimated the associations between incident cancer and BC, as a single pollutant, or taking PM2.5 into account, following an approach based on residuals (Mostofsky et al. 2012). Since we have a rare opportunity to utilize a long time series of BC exposure to study the long-term association with cancer incidence, we used cumulative exposures for each participant from baseline to incidence or censoring and adjusted for calendar time and age at inclusion (Pencina et al. 2007).
• 1. Single-pollutant approach: We included cumulative time-dependent black carbon exposure, cumulative time-dependent PM2.5 exposure, or cumulative time-dependent NO2 exposure separately as a single pollutant in our main model. For these three pollutants, we used a spline function with 3 degrees of freedom (df) to test for nonlinearity. Based on visual inspection, the response to exposures approximately followed a natural logarithm-shaped curve for all-site cancer incidence (Figure S3), but not for cumulative NO2 and lung cancer. To consider these nonlinear relationships, and to facilitate interpreting the results of the Cox models, we natural log-transformed the cumulative annual BC and PM2.5 time-dependent exposures for both outcomes, and the NO2 time-dependent exposure for all-site cancer only.
• 2. Residual approach for black carbon: The Spearman's correlation coefficient between PM2.5 and BC was 0.74. Including both of those exposure variables in a regression model can distort the true effects of one or both of those variables. As an attempt to isolate the effect of BC from that of PM2.5, we followed the approach of Mostofsky et al. (2012), who suggested using the residuals of a regression between the constituent of PM2.5 and PM2.5 total mass. To do so, we first regressed BC (dependent variable) on PM2.5 (independent variable). The residuals of this regression should represent the variations of BC independently of PM2.5. When the regression is correctly specified, the residuals should be uncorrelated with PM2.5. We included BC as natural log-transformed cumulative exposures in the regression, and cumulative PM2.5 using a spline function with 4 df. This specification decorrelated the BC residuals from the cumulative PM2.5 exposure. In Cox models using BC residuals as exposure, the coefficient represents the risk associated with higher levels of cumulative black carbon exposure, while holding cumulative PM2.5 exposure constant.
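The residual approach in item 2 can be illustrated with a deliberately simplified Python sketch. It regresses ln-transformed cumulative BC on cumulative PM2.5 with a straight line and checks that the residuals are uncorrelated with PM2.5. Note the assumptions: the study modeled PM2.5 with a 4-df spline rather than a straight line, and the function names and exposure values below are hypothetical.

```python
import statistics

def ols_residuals(y, x):
    """Residuals of a least-squares regression of y on x with an intercept."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    a_bar, b_bar = statistics.fmean(a), statistics.fmean(b)
    num = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    den = (sum((ai - a_bar) ** 2 for ai in a)
           * sum((bi - b_bar) ** 2 for bi in b)) ** 0.5
    return num / den

# Hypothetical cumulative PM2.5 and ln(cumulative BC) for five participants.
cum_pm25 = [150.0, 200.0, 250.0, 300.0, 350.0]
ln_cum_bc = [2.5, 2.9, 2.8, 3.2, 3.1]

bc_residuals = ols_residuals(ln_cum_bc, cum_pm25)
check = pearson(bc_residuals, cum_pm25)  # numerically zero by construction
```

By construction, least-squares residuals are uncorrelated with the regressor, so the residuals carry only the part of BC variation not explained by PM2.5; they would then replace BC as the exposure in the Cox model.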
We did not further adjust for PM2.5, because it would provide information to interpret the effect of the other constituents of PM2.5, which was not the aim of this study. To test the independence of the effects of BC from those of NO2, we did the same analysis using the residuals of a regression between BC and NO2, because BC and NO2 were also highly correlated (Spearman's correlation coefficient of 0.89), which precluded a bi-pollutant analysis with the two pollutants in the same model. We provided hazard ratios for one interquartile range (IQR) of all variables of exposure to air pollution (after natural log-transforming cumulative BC and PM2.5 for both outcomes, as well as cumulative NO2 for all-site cancer), and of the residual variables.
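Expressing a hazard ratio per IQR increase is a rescaling of the Cox coefficient: for a coefficient beta (per unit of the exposure variable) with standard error se, the HR per IQR is exp(beta × IQR), with confidence limits exp((beta ± 1.96 × se) × IQR). The Python sketch below shows this conversion; the function name and the numeric inputs are hypothetical and are not taken from the study's fitted models.

```python
import math

def hr_per_iqr(beta, se, iqr, z=1.96):
    """Hazard ratio and 95% CI for an increase of one IQR in the exposure.

    beta and se are the Cox coefficient and its standard error per unit of
    the (possibly ln-transformed) exposure variable.
    """
    point = math.exp(beta * iqr)
    lower = math.exp((beta - z * se) * iqr)
    upper = math.exp((beta + z * se) * iqr)
    return point, lower, upper

# Hypothetical coefficient on the ln scale and hypothetical IQR.
hr, ci_low, ci_high = hr_per_iqr(beta=0.314, se=0.10, iqr=0.5)
```

Scaling by the IQR makes estimates comparable across exposures measured in different units, which is why the same convention is used for BC, PM2.5, NO2, and the residual variables.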
To consider cancer latency, it is customary, especially when using time-varying variables in the statistical analyses, to discount exposures that occurred recently, because these are unlikely to have affected the cancer risk (Rothman et al. 2008). Thus, we implemented a 10-y lag period during which exposure was not counted for any time-dependent variables except for passive smoking and for fruit and vegetable consumption, which already included a time lag due to the way these variables were collected and interpolated (therefore using passive smoking values in 1996 only). The inclusion of contextual variables was explored only in sensitivity analyses. We used several levels of adjustment when using BC as exposure: a) Model 1 included sex, and calendar time and age at inclusion as continuous variables; b) Model 2 also included cumulative smoking pack-years and passive smoking (yes/no as defined in 1996); and c) Model 3 (main model) was additionally adjusted for baseline education, SES, and occupational exposure to lung carcinogens and for time-varying alcohol consumption, family status, BMI, and fruit and vegetable consumption. We used the main model to derive estimates of the associations between PM2.5, NO2, or BC residuals and incident cancer. We calculated the Akaike Information Criterion (AIC) of each model, according to the level of adjustment or according to the exposure included.
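A lagged cumulative exposure of the kind described above can be sketched in Python as follows. The sketch assumes that cumulative exposure is the sum of annual mean concentrations and that, with a lag of L years, exposure at risk year t accumulates through year t − L inclusive; both points, along with the function names and the constant exposure series, are illustrative assumptions rather than details stated in the text.

```python
import math

def lagged_cumulative(annual_exposure, year, lag=10):
    """Sum annual exposures from the first recorded year through year - lag."""
    cutoff = year - lag
    return sum(value for y, value in annual_exposure.items() if y <= cutoff)

def ln_lagged_cumulative(annual_exposure, year, lag=10):
    """Natural log of the lagged cumulative exposure (None if nothing counted)."""
    total = lagged_cumulative(annual_exposure, year, lag)
    return math.log(total) if total > 0 else None

# Hypothetical participant with a constant annual BC level from 1989 to 2015.
exposure = {year: 2.0 for year in range(1989, 2016)}

at_1999 = lagged_cumulative(exposure, 1999)              # only 1989 counts
at_2015 = lagged_cumulative(exposure, 2015)              # 1989 through 2005
at_2015_lag2 = lagged_cumulative(exposure, 2015, lag=2)  # 1989 through 2013
```

Under these assumptions, a participant at risk in 1999 has exactly one year of counted exposure, which is consistent with the exclusion of cancers diagnosed before 1999 in the 10-y lag analysis.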
We used multiple imputations by chained equations to conduct all the analyses (see "Imputations" section) unless specified otherwise.
Sensitivity analyses were conducted for BC or PM2.5 separately (single-pollutant models) by a) implementing a 2-y lag period with the same study population as that of the analysis implementing the 10-y lag period; and then, again with a 10-y lag period, by b) further adjusting for the French deprivation index; c) restricting the study population to the participants with address-level geocodes throughout their follow-up; d) restricting the study population to the participants with at least 20 y of follow-up (thus an exposure window of at least 10 y); and e) considering three other ways to deal with missing data: conducting these analyses with complete cases only, considering missing values as a category (therefore categorizing continuous variables by quartiles), or imputing missing data as the median (for continuous variables) or the mode (for categorical variables). As a further sensitivity analysis to explore the nonlinear relationship between BC exposure and all-site cancer, we developed a two-piece linear model by including an interaction term between BC exposure and a Boolean variable, with the most appropriate cutoff found to be 24 × 10−5/m (as visually observed in Figure S3 and with the maximum likelihood among values from 15 to 30 × 10−5/m).
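The idea of a two-piece linear model with a likelihood-selected cutoff can be sketched in Python as follows. The sketch uses a hinge parameterization (one term below the cutoff, one above), which is a common alternative to the exposure-by-Boolean interaction described in the text and is not claimed to be the authors' exact specification. The function names are hypothetical, and `log_likelihood` stands for any user-supplied function that fits the model at a given cutoff and returns its log-likelihood.

```python
def two_piece_terms(exposure, cutoff):
    """Split an exposure into the parts below and above a cutoff.

    A model with both terms has one slope below the cutoff and another above.
    """
    below = min(exposure, cutoff)
    above = max(exposure - cutoff, 0.0)
    return below, above

def best_cutoff(candidates, log_likelihood):
    """Return the candidate cutoff with the highest log-likelihood."""
    return max(candidates, key=log_likelihood)

# Candidate cutoffs from 15 to 30, as in the text; the likelihood below is a
# toy stand-in for refitting the Cox model at each cutoff.
candidates = range(15, 31)
chosen = best_cutoff(candidates, lambda c: -(c - 24) ** 2)
terms_low = two_piece_terms(20.0, chosen)
terms_high = two_piece_terms(30.0, chosen)
```

In practice each candidate would require refitting the survival model, so the grid is kept coarse; the toy likelihood here only demonstrates the selection step.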
For all-site cancer, to search for any effect modification by sex, smoking status, urban classification, and distance to the nearest major road, we fitted the single-pollutant models for BC or PM2.5 separately in the following subsets: female or male, ever or never smokers, solely urban or solely semiurban or solely rural during the follow-up, and closer or farther than 500 m from the nearest major road over the whole follow-up. Stratified analyses were not done for lung cancer because of the small number of cases.
We conducted all the analyses with R (version 3.5.1; R Development Core Team 2018) with the SURVIVAL package (Therneau 2015; Therneau and Grambsch 2000).

Study Population and Exposure
Study population characteristics are shown in Table 1. More than 70% were men. Mean age at baseline was 43.5 y, and median follow-up period was 27 y. Among those who eventually got cancer, the mean baseline age was 44.5 y, and the median follow-up until diagnosis was about 21 y. Most had 11 y or fewer of schooling, and most were employed in intermediate-level jobs. Slightly fewer than half had been regular smokers at some time. Former and current smokers had accumulated 15.4 pack-years on average at baseline; those diagnosed with incident all-site or lung cancer had accumulated 18 and 32 pack-years at baseline, respectively. We classified 71.4% of participants as light drinkers, and 8.8% as heavy drinkers. Almost 70% of the participants ate fruit and vegetables almost every day, and the median BMI was 24.3.
The exposure assessment yielded BC concentrations ranging between 0.7 and 8.9 × 10−5/m with a median of 1.9 between 1989 and 2015 (Table 2), with a modest decline over time (Figure 1) that differed slightly between regions (Supplementary Figure S4). The cumulative exposure ranged between 1.5 and 104.1 × 10−5/m with a median of 19.7. The exposure assessment yielded PM2.5 concentrations ranging from 2.6 to 57.3 μg/m3 with a median of 21.6 between 1989 and 2015, and a cumulative exposure ranging between 3.0 and 691.8 μg/m3 with a median of 252.8 (Figure 1; Table 2).
Between 1989 and 2015, a total of 4,354 incident primary cancers (excluding nonmelanoma skin cancers) were diagnosed in the cohort, of which 410 were lung cancers. Our analysis lagged exposures by 10 y and thus included cancers diagnosed from 1999 onward only (n = 3,711 for all-site cancers; n = 349 for lung cancers; see Table S1 for diagnoses by calendar year). All-site cancer cases included prostate (1,301 cases, 34%), breast (378 cases, 10%), and colorectal (362 cases, 9%) cancers; other sites accounted for less than 8% each (Table S2).

Associations between Long-Term Exposure to Black Carbon and Cancer Incidence
All-site cancer. BC exposure was significantly associated with increased cancer incidence in single-pollutant models (Table 3), with a hazard ratio (HR) of 1.17 [95% confidence interval (CI): 1.06, 1.29] for one IQR increase in ln-transformed cumulative BC exposure using the main model. Results were similar for models adjusted only for age, calendar time, and sex (Model 1), or models that also included smoking pack-years and passive smoking (Model 2). AICs were similar among the three models, but smallest for the main model. Long-term PM2.5 exposure was also significantly associated with all-site cancer, with an HR of 1.20 (95% CI: 1.06, 1.34) using the main model for one IQR increase in ln-transformed cumulative PM2.5 exposure, but the association with long-term NO2 exposure was smaller and statistically nonsignificant, with an HR of 1.05 (95% CI: 0.97, 1.14) for one IQR increase in ln-transformed cumulative NO2. The BC residual approach (using the residuals of the cumulative BC exposure regressed on the cumulative PM2.5 exposure) yielded an HR of 1.05 (95% CI: 1.00, 1.11) for an IQR increase of the residuals. This means that, holding total PM2.5 constant, an increase in BC and closely linked constituents (and consequently, a decrease in other PM2.5 constituents) was associated with a nonsignificant increase in the risk of incident all-site cancer. We estimated a similar and significant HR of 1.05 (95% CI: 1.02, 1.10) using the residuals of BC regressed on NO2. AICs were similar for fully adjusted models of all-site cancer in relation to the different exposure variables, but slightly smaller for the model of cumulative BC exposure.
The association between cumulative BC and all-site cancer incidence (single-pollutant model) was similar when we implemented a 2-y lag period instead of a 10-y lag period (Table S3). The association between cumulative BC and all-site cancer was also similar in sensitivity analyses with further adjustment for the deprivation index and when restricting our analysis to address-level geocoded participants (1,664 cases vs. 3,711 cases), but stronger when restricting our study population to the participants followed at least 20 y (i.e., with a minimal exposure window of 10 y, 1,962 cases, HR 1.40; 95% CI: 1.20, 1.64 for one IQR increase) (Figure 2, Table S4). Using a missing data category or imputing missing data as the median/mode provided similar associations, whereas using the complete-case data did not yield any substantial association. Using a two-piece linear model, we found the optimal cutoff at 24 × 10−5/m of cumulative black carbon; below and above this cutoff, we estimated HRs of 1.22 (95% CI: 1.07, 1.40) and of 1.08 (95% CI: 1.00, 1.17), respectively.
The main association between long-term BC exposure and all-site cancer was also stable in population subsets defined by sex, smoking status, and distance to the nearest major road (Figure 2; Table S4). We estimated slightly different associations according to population density, with higher HRs among rural and semiurban than among urban participants. For PM2.5, the main association was also stable in population subsets, with slightly decreasing values from urban to rural participants, contrary to the results for BC.
Lung cancer. We estimated a positive association between long-term BC exposure and incident lung cancer, with an HR of 1.31 (95% CI: 0.93, 1.83) for one IQR increase in ln-transformed cumulative BC exposure, supported by the BC residual approach, which yielded a significant HR of 1.24 (95% CI: 1.05, 1.47) for one IQR increase (Table 3). This means that, holding total PM2.5 constant, we estimated an increased risk of incident lung cancer for an increased BC content. In comparison with the main model estimate, the association between lung cancer and one IQR increase in ln-transformed cumulative BC was positive but weaker when adjusted only for age, time, and sex (Model 1, HR 1.19; 95% CI: 0.87, 1.63) and was weakest for Model 2 (with additional adjustment for cumulative pack-years of smoking and secondhand smoking, HR 1.07; 95% CI: 0.77, 1.47) (Table 3). AICs were similar, but smallest for the main model. Long-term PM2.5 exposure was not associated with lung cancer, with an HR of 1.01 (95% CI: 0.68, 1.51) for one IQR increase in ln-transformed cumulative PM2.5 exposure; long-term NO2 exposure was also not associated with lung cancer, with an HR of 1.06 (95% CI: 0.88, 1.27) for one IQR increase of the cumulative NO2 exposure. However, the residuals of BC regressed on NO2 were associated with an increased risk of lung cancer, with an HR of 1.14 (95% CI: 1.01, 1.28).
This means that, holding NO2 levels constant, we estimated an increased risk of incident lung cancer for increased levels of BC. AICs were similar for fully adjusted models of lung cancer in relation to the different exposure variables, but AICs were smallest for the models of BC residuals.
Associations between lung cancer and cumulative BC and BC residuals were positive but closer to the null when exposures were lagged for 2 y instead of 10 y (e.g., for black carbon: HR 1.21; 95% CI: 0.85, 1.71, compared with HR of 1.31 for a 10-y lag) (Table S3). For cumulative PM2.5, the association with lung cancer was stronger but still nonsignificant when exposures were lagged for 2 y (HR 1.08; 95% CI: 0.68, 1.72, compared with HR 1.01 for a 10-y lag). The estimated associations between lung cancer and cumulative BC or cumulative PM2.5 were similar to the main model when additionally adjusted for the deprivation index, but HRs were stronger when the analysis was restricted to participants with address-level geocodes (110 vs. 349 cases, HR 1.66; 95% CI: 0.85, 3.24 for BC) and to those with ≥20 y of follow-up (meaning a minimal exposure window of 10 y) (225 cases, HR 1.67; 95% CI: 1.05, 2.66 for BC) (Figure 3, Table S5). Associations were also stronger when missing covariate data were modeled using a missing indicator category and when missing data were imputed as the median or mode, but the association became inverse when based on a complete-case analysis (101 cases, HR 0.71; 95% CI: 0.35, 1.43 for BC).

Discussion
In the French Gazel cohort, a predominantly male population of French adults who were employees of the national gas and energy company, long-term BC exposure was positively associated with incident all-site cancer and lung cancer based on single-pollutant models of cumulative exposure, and on models of BC residuals used to estimate the effect of BC as a PM2.5 constituent, while holding the effect of total PM2.5 constant. Associations were also positive for BC residuals regressed on NO2. In general, results were robust to sensitivity analyses, and we found no substantial effect modification by sex, smoking, or urban classification.

Note: All-site cancer cases were defined as the whole ICD-10 chapter except C77-79 (secondary malignant neoplasms) and C44 (nonmelanoma skin cancers). Participants were excluded from the analysis if they were diagnosed with cancer before 1999. Exposures were lagged 10 y. IQR: Interquartile range.
a p-Value of Wilcoxon tests for differences between participants diagnosed with all-site cancer or not.
The Gazel cohort provides a relatively large and well-characterized study population with many variables collected throughout the follow-up. BC, PM2.5, and NO2 exposures were estimated using a validated LUR model, at fine scale and over a long time period, using an accepted methodology to back- and forward-extrapolate exposures between 1990 and 2015. We used time-dependent Cox models. We handled confounding by total PM2.5 exposure or by NO2 via a residual approach that yielded findings corroborating the inference that BC could at least partially explain the health effects of total PM2.5, seemingly independently of NO2.
Though not always statistically significant, the analyses generally yielded point estimates above the null, and the analysis using the residuals of black carbon regressed on PM2.5 or on NO2 provided significant associations. Further, associations with all cancers were similar when we implemented a 2-y lag instead of a 10-y lag, but HRs for lung cancer and BC (cumulative and residuals) were closer to the null when exposures were lagged by only 2 y, and confidence intervals were wider than for corresponding estimates using a 10-y lag. This 2-y lag could have induced more misclassification, therefore yielding point estimates closer to the null; we hypothesize that 2 y could be too short a latency period for lung cancer. In addition, the residual approach needs a careful interpretation: higher levels of BC mean higher levels of components correlated with BC, and thus, at constant total PM2.5, this implies lower levels of all other components that are not correlated with BC. Considering this, for all-site cancer, we found that the risk estimate yielded using the residuals regressed on PM2.5 was not significant, contrarily to the one using BC. This suggests that the association using BC was probably confounded by total PM2.5 or by co-occurring components in PM2.5.

Black carbon (10−5/m) and PM2.5 (μg/m3) are depicted by yearly boxplots in black (minimum, 25th percentile, median, 75th percentile, outliers calculated as 75th percentile plus 1.5 times the interquartile range, and maximum) and violin plots in gray (two rotated kernel density plots depicting the probability of each exposure level and informing on the skewedness of the distribution).

Figure 2.
Associations between cumulative black carbon (left) and PM2.5 (right) and all-site incident cancer in the main, sensitivity, and stratified analyses in the Gazel cohort, with the number of identified cancer cases among the number of participant-year over the follow-up. Hazard ratios and confidence intervals expressed for one IQR increase in ln-transformed cumulative exposure to black carbon or PM2.5 in separate single-pollutant Cox model with attained age as underlying time-scale and time-dependent variables, adjusted for sex, cumulative smoking pack-years, passive smoking, alcohol use, BMI, education, socioeconomic status, family status, fruit and vegetable consumption, occupational exposure to lung carcinogens, age at inclusion, and calendar time. Exposures were lagged 10 y. Participants were excluded from the analysis if they were diagnosed with cancer before 1999. See Table S4 for corresponding numeric data. Unless specified otherwise, these model-based estimates were computed using MICE to address missing data and were pooled following Rubin's rules. Note: BMI, body mass index; IQR, interquartile range.
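Hazard ratios in the figures are expressed per IQR increase in ln-transformed cumulative exposure. The sketch below shows one way a Cox coefficient per unit of ln exposure could be rescaled to that contrast. The coefficient, standard error, and exposure values are made up, and `hr_per_iqr` is a hypothetical helper, not part of the study's code.

```python
import math
import statistics

def hr_per_iqr(beta, se, iqr, z=1.96):
    """Hazard ratio and approximate 95% CI for an increase of `iqr` units.

    `beta` and `se` are a log-hazard coefficient and its standard error
    per one unit of the (ln-transformed) exposure.
    """
    hr = math.exp(beta * iqr)
    lower = math.exp((beta - z * se) * iqr)
    upper = math.exp((beta + z * se) * iqr)
    return hr, lower, upper

# Made-up cumulative exposures (not study data), then ln-transformed.
cumulative = [20.0, 25.0, 30.0, 36.0, 42.0, 50.0, 61.0]
ln_cumulative = [math.log(v) for v in cumulative]

# IQR of the ln-transformed exposure (quartile method is an assumption).
q1, _, q3 = statistics.quantiles(ln_cumulative, n=4)
iqr_ln = q3 - q1

# Illustrative coefficient and standard error per unit of ln exposure.
hr, lower, upper = hr_per_iqr(beta=0.5, se=0.2, iqr=iqr_ln)
```

Scaling by the IQR makes estimates for pollutants measured in different units, such as BC and PM2.5, comparable on a common contrast.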
On the contrary, in the lung cancer analysis, we unexpectedly found no association with PM2.5, but we did find an association with BC (although still nonsignificant), and a significant and substantial association using the residuals of BC; one hypothesis would be that BC and its co-occurring components might be suitable chemicals to at least partly explain the association between outdoor air pollution and lung cancer. These co-occurring chemicals may come from the same sources as BC and constitute primary components of PM2.5 such as metals, as opposed to secondary components, such as secondary aerosols made of, e.g., sulfates or nitrates. We found significant associations using the residuals of BC regressed on NO2, suggesting that the associations found for BC were not confounded by NO2. Despite providing novel evidence on BC exposure and incident cancer, our approaches still did not allow for fully disentangling the effects of BC, its correlated pollutants, and total PM2.5. The BC and PM2.5 measurements used in the LUR model came from two different measurement networks. Yet we think this is unlikely to hamper our findings, because we found a strong correlation between the two exposure estimates (as expected from the literature) and reasonable levels for BC. The association between cumulative BC and all cancers did not appear to differ by distance to the nearest major road (<500 m vs. ≥500 m). A slightly stronger association among semiurban and rural dwellers than among urban dwellers might reflect differences in BC sources, particle sizes, or coexposures, but may also be explained by random variation.
Research on health impacts of ambient BC is quite recent. In 2013, the World Health Organization's (WHO) review of evidence on health aspects of air pollution (WHO 2013) concluded that annual exposure to BC could be involved in cardiovascular diseases and cardiopulmonary mortality. Studies published after this review have corroborated the above-mentioned conclusions, notably for cardiovascular diseases (Kirrane et al. 2019), and started unraveling the effects of long-term exposure to BC on respiratory health (Mordukhovich et al. 2015; Rice et al. 2016; Wang et al. 2019; Yang et al. 2018). In their review on the likelihood of PM2.5 constituents causing adverse health effects, Yang et al. (2019) found only two studies dealing with long-term exposures to BC or organic carbon and respiratory mortality, which prevented the authors from conducting a meta-analysis. Indeed, only a few epidemiological studies have focused on residential outdoor long-term exposure to BC. To our knowledge, none dealt with incident cancer. In their review, Grahame et al. (2014) suggested BC was related to lung cancer mortality, according to studies investigating diesel effects. Hvidtfeldt et al. (2019) found no statistically significant association between lung cancer mortality and BC with a point estimate close to the null in a population-based and gender-balanced Danish cohort of approximately 50,000 participants with 1,223 cases of lung cancer out of 10,193 total cases. In most articles and as highlighted by Luben et al. (2017) for cardiovascular mortality, the main challenge consists of disentangling the effects of BC and those of total PM2.5 (and we think this remains true whatever the health outcome of interest). Because BC is part of PM2.5, the above-described associations with health outcomes intertwine with those of total PM2.5.
Additionally, the WHO warned that the epidemiological findings on BC may not reflect BC's effects by itself, but that BC estimates could proxy harmful co-occurring combustion-derived chemicals (WHO Regional Office for Europe 2013). Indeed, BC is more influenced by small-scale sources than PM2.5 is and thereby more linked to local traffic and wood burning.
BC could trigger oxidative stress, inflammation [especially lung inflammation as described in ANSES (2019)], and DNA methylation (Niranjan and Thakur 2017), by itself or due to its co-occurrence with other forms of carbon or with metals. Chemical analyses of different sizes of PM2.5 revealed that BC was found in all size bins (Bein et al. 2005); yet black carbon seems mostly concentrated in the finer fractions, that is, particulate matter with a diameter <1 μm or even ultrafine particles with a diameter <0.1 μm (Li et al. 2003; Pérez et al. 2010), those which go deeper into the lungs. Diesel ultrafine particles include mostly BC, which can enter lung cells in vitro (Steiner et al. 2016).

Figure 3.
Associations between cumulative black carbon (left) and PM2.5 (right) and lung incident cancer in the main and sensitivity analyses in the Gazel cohort, with the number of identified cancer cases among the number of participant-year over the follow-up. Hazard ratios and confidence intervals expressed for one IQR increase in ln-transformed cumulative exposure to black carbon or PM2.5 in separate single-pollutant Cox model with attained age as underlying time-scale and time-dependent variables, adjusted for sex, cumulative smoking pack-years, passive smoking, alcohol use, BMI, education, socioeconomic status, family status, fruit and vegetable consumption, occupational exposure to lung carcinogens, age at inclusion and calendar time. Exposures were lagged 10 y. Participants were excluded from the analysis if they were diagnosed with cancer before 1999. See Table S5 for corresponding numeric data. Unless specified otherwise, these model-based estimates were computed using MICE to address missing data and were pooled following Rubin's rules. Note: BMI, body mass index; IQR, interquartile range.
These very fine particles may reach blood circulation, but the capability of BC to do so remains only partly understood, as shown in observational and experimental studies (Saenen et al. 2017; Shimada et al. 2006). On the other hand, metals contained in such tiny particles (which are often correlated with BC) may also enter the circulation and trigger the biological effects mentioned above (Nakane 2012).
More measurements and fine-scale exposure models are needed to support studies such as ours in other areas, to increase understanding of how BC may affect health and cancer, and to provide data for policy makers. As suggested by the WHO, BC could act as a better indicator of exposure to combustion sources than PM2.5 does. As many other studies do, this study emphasizes the need for disentangling the effects of PM2.5 as a mass, as the sum of various constituents such as BC, and as the sum of various size bins (such as ultrafine particles). In particular, we need more tools to do so, including methods complementing the use of residuals.
In Gazel, as in most observational studies on outdoor air pollution, we could not obtain exposure data for individuals at work or during their commutes. But even if we could not obtain a complete lifetime exposure with left-censored participants, we still obtained up to 16 y of air pollution exposure at residential addresses, taking into account any residential address change during the follow-up. In addition, in our sensitivity analysis using participants with at least 20 y of follow-up (i.e., a minimal exposure window of 10 y), we estimated associations between cumulative BC and both all-site and lung cancer with higher point estimates and lower p-values. Although we used three different sources to identify cancer cases, 85% were identified through one of them: the French national health administrative databases. Furthermore, 80% of cases were identified after 2007, when there was overlap among all sources, and the case identification was consistent between the three sources (national databases, company health records, and self-reported diagnoses). The number of all-site cancer cases allowed for satisfactory statistical power, but the estimated associations with air pollutants may be driven by the associations between air pollutants and the most frequently identified cancer sites in our study population (prostate and breast). The small number of lung cancer cases considerably reduced our statistical power in the sensitivity analyses and prevented us from conducting stratified analyses for lung cancer. In addition, for lung cancer, the model adjusted for age, sex, time, and smoking characteristics yielded a smaller and nonsignificant association. Smoking is a major risk factor for lung cancer yet can be confounded by other covariates such as SES, which may explain this inconsistent result.
For both cancer types, due to the amount of missing data in each of the covariates included in the main model, using complete cases only led to selecting half of the study population, yielding results very different from those of the models using imputed data (with three different imputation methods) and very likely biased estimates. We are more confident in the results using multiple imputation, as generally recommended (Raghunathan 2004; White and Carlin 2010). Although the Gazel cohort recruited participants with a range of occupations and incomes, it was not representative of the general population. Additional studies are needed to confirm the findings of this study in other populations and locations.
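The figure captions state that estimates from the multiply imputed datasets were pooled following Rubin's rules. A minimal sketch of that pooling step for a single log hazard ratio is given below. The per-imputation estimates are made up, `pool_rubin` is a hypothetical helper, and the normal-approximation interval is a simplification of the t-based interval usually used with Rubin's rules.

```python
import math

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and variances with Rubin's rules.

    Returns the pooled estimate and its total variance, which combines
    the within-imputation and between-imputation variance components.
    """
    m = len(estimates)
    q_bar = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total

# Made-up log hazard ratios and variances from three imputed datasets.
log_hrs = [0.10, 0.14, 0.12]
variances = [0.004, 0.005, 0.0045]

pooled_log_hr, total_var = pool_rubin(log_hrs, variances)

# Back-transform to the hazard-ratio scale (normal approximation, assumed).
pooled_hr = math.exp(pooled_log_hr)
ci_lower = math.exp(pooled_log_hr - 1.96 * math.sqrt(total_var))
ci_upper = math.exp(pooled_log_hr + 1.96 * math.sqrt(total_var))
```

The between-imputation term widens the interval to reflect uncertainty about the missing values, which single mean imputation and the missing-indicator method do not account for.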

Conclusions
In this study on BC and the risk of incident cancer in a predominantly male population of French adults, we found positive associations between BC and incident all-site and lung cancers that were generally consistent across sensitivity analyses. Further, we used a residual approach to theoretically isolate the effect of BC from that of PM2.5, and it also provided positive associations. Therefore, this study suggests BC could at least partly explain the carcinogenic effects of outdoor air pollution. Further studies on the long-term associations between cancer risk and BC exposure are warranted and, in addition to seeking to replicate our findings, should include more approaches to disentangle the effects of BC from its co-occurring chemicals and total PM2.5.