Breast screening, prognostic factors and survival: Results from the Swedish two county study

The results of the Swedish two-county study are analysed with respect to tumour size, nodal status and malignancy grade, and the relationship of these prognostic factors to screening and to survival. It is shown that these factors can account for much of the difference in survival between incidence screen detected, interval and control group cancers, but to a lesser extent for cancers detected at the prevalence screen, where length bias is greatest. Furthermore, examination of the relationships among the prognostic factors and mode of detection indicates that malignancy grade, as a measure of inherent malignant capacity, evolves as a tumour grows. The proportion of cancers with poor malignancy grade is several fold lower for cancers of diameter less than 15 mm than for cancers greater than 30 mm, independent of the length bias of screening. The implications of these findings for screening frequency are briefly discussed.

It has been shown that mortality from breast cancer can be reduced by mass screening using mammography (Shapiro et al., 1982; Tabar et al., 1985), a reduction resulting from earlier diagnosis. The natural history of breast cancer, however, is clearly heterogeneous, with substantial variation among tumours in their malignant potential, rate of growth and prognosis. Further, little is known of the rate at which prognosis deteriorates as a tumour develops or, conversely, how prognosis improves as the time of diagnosis is advanced.
It is known that screening does reduce rates of larger tumours and of metastases (Fagerberg et al., 1985; Tabar et al., 1987). Moreover, these factors affect survival, as does malignancy grade. However, these relationships have not been fully quantified in a screening context, so the mechanism whereby screening can reduce mortality is not fully understood. The purpose of the present paper is to examine, using the results of the Swedish two-county study: (1) the relationships among the prognostic factors: tumour size, nodal involvement and malignancy grade; (2) the change in these factors brought about by screening; (3) the extent to which the change in the distributions of prognostic factors achieved by screening can account for the mortality reduction; (4) whether malignancy grade is affected by an advance in the time of diagnosis or whether it is an inherent characteristic; and (5) whether the prognostic variables listed above, or a subset of them, can be used as surrogate variables for the final endpoint, breast cancer mortality, i.e. can future mortality be accurately predicted using these variables, for screened and unscreened populations?

Subjects and methods
The data are from the Swedish two-county trial of mammographic screening for breast cancer, and are confined to women aged 40-69 at entry, among whom compliance was good. In this age group, 66,741 women were invited to regular mammographic screening and 48,678 women were not (Tabar et al., 1989). Principal results and further details of the trial are given elsewhere (Tabar et al., 1985). Screening in the control group began in the year of the first publication of the results with respect to mortality. The breast cancers included in the present study were those diagnosed between the date of entry to the trial and the date at which the control group was invited to screening. Tumours in situ were not included. The total number of breast cancer cases thus considered was 1,582. Follow-up for breast cancer mortality for this analysis terminated on 31st December 1990, with an average follow-up period since randomisation of 11 years. Deaths from causes other than breast cancer were treated as censored observations in the survival analysis. The criteria for determining cause of death are given by Tabar et al. (1989).
Tumour size was determined histologically, and was available for all but four cases. Axillary lymph nodes were histologically examined in 1,499 cases (95%) and malignancy grade was determined in 1,369 (87%). For brevity we use the term 'node status' to refer to the factor with three classes at time of diagnosis: nodes negative; nodes positive without distant metastases; and distant metastases. Malignancy grade (Bloom & Richardson, 1957; Scarff & Torloni, 1968) was determined by one pathologist in each county but, as the results demonstrate, there were differences between the two counties in the proportions of grades 1, 2 and 3, probably reflecting subjectivity in classification of tumour grade rather than a difference in the two tumour populations. No such differences were observed between counties for tumour size or node status. Statistical analysis of associations among tumour characteristics was performed using log-linear modelling and logistic regression (Aitkin et al., 1989). These methods yield likelihood ratio (deviance) chi-squared tests for significance of associations and odds ratio estimates of relative risks (for example, of being node positive for a given grade relative to grade 1). Survival analysis was performed using proportional hazards regression (Cox, 1972).
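The two quantities named above can be illustrated for the simplest case of a two-way table. The following is a minimal sketch, not the authors' analysis: the function names and all counts are hypothetical, and the paper's models include several covariates rather than a single 2x2 table. For one binary factor, the odds ratio from logistic regression equals the cross-product ratio computed here, and the deviance test of association is the likelihood ratio statistic G².

```python
import math


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. grade 3, grade 1); columns are
    (node positive, node negative).
    """
    return (a * d) / (b * c)


def deviance_chi2(table):
    """Likelihood ratio (deviance, G^2) statistic for independence
    of rows and columns in a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if observed > 0:  # a zero cell contributes nothing to G^2
                g2 += 2.0 * observed * math.log(observed / expected)
    return g2


# HYPOTHETICAL counts, for illustration only (not from the trial):
# grade 3: 60 node positive, 40 node negative
# grade 1: 20 node positive, 80 node negative
example = [[60, 40], [20, 80]]
example_or = odds_ratio(60, 40, 20, 80)
example_g2 = deviance_chi2(example)
```

With one degree of freedom, a G² value above 3.84 corresponds to significance at the 5% level.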
Mode of detection in the study group was categorised as detected at the prevalence (first) screening test, at a later screening test, in the interval between screening tests, or in a woman who refused screening.
For some purposes, cancers among the refusers and interval cancers are combined with screen-detected cancers, excluding the prevalence screen, to form a set of incident tumours in the study group which approximately corresponds to the incident tumours arising clinically in the control group. If a screening test is of relatively high sensitivity, then those cancers diagnosed in the period from immediately after a screening test to immediately after a subsequent screening test form a set of tumours from which length bias has been removed (Day et al., 1984). If this group of cancers is augmented with those occurring in the refusers, then the resulting 'unbiased set' can be compared with the set of cancers occurring in the control arm during a corresponding time period; the two sets of cancers are basically equivalent except that the former have been diagnosed earlier. Their comparison allows one to assess the effect of earlier diagnosis on tumour size, nodal involvement and malignancy grade, without distortion of length bias.
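The construction of the 'unbiased set' described above amounts to a filter on trial arm and detection mode. The sketch below shows that selection under stated assumptions: the record layout, field names and mode labels are hypothetical, chosen only for illustration.

```python
# Detection modes in the study group that make up the 'unbiased set':
# everything except cancers found at the prevalence (first) screen.
UNBIASED_MODES = {"later_screen", "interval", "refuser"}


def unbiased_set(cases):
    """Return study-group cases from which length bias has been largely removed.

    `cases` is a list of dicts with hypothetical keys:
      "arm":  "study" or "control"
      "mode": "prevalence_screen", "later_screen", "interval" or "refuser"
              (None for control-group cases)
    """
    return [
        case for case in cases
        if case["arm"] == "study" and case["mode"] in UNBIASED_MODES
    ]


# HYPOTHETICAL records, for illustration only.
cases = [
    {"arm": "study", "mode": "prevalence_screen"},
    {"arm": "study", "mode": "later_screen"},
    {"arm": "study", "mode": "interval"},
    {"arm": "study", "mode": "refuser"},
    {"arm": "control", "mode": None},
]
selected = unbiased_set(cases)
```

The comparison group is then simply the control-arm cases diagnosed over the corresponding period.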

Results
The relationship between size, nodal status, malignancy grade and detection mode
Table I presents the univariate distribution for the three prognostic variables, by mode of detection and for malignancy grade by county. The malignancy grade distribution clearly varies by county. All three prognostic variables, as expected, are significantly related to detection mode, being more favourable among screen detected cancers; cancers among refusers tend to have poor malignancy grade, to be very large and to have distant metastases.
The relationship between the three prognostic variables has been examined in two ways. First, we have considered the proportion of cancers with positive nodes in relation to the size and malignancy grade of the primary tumours, i.e. the probability of dissemination as determined by the characteristics of the primary tumour. Tables IIa and IIb crosstabulate the proportion node positive with size and grade; a strong relationship with both is evident. Grade and size are also closely related (Table III), so a logistic regression was performed of proportion node positive against size, grade, age, detection mode and county, as shown in Table IV. The major factor is clearly tumour size, although there was a moderate but significant residual effect of grade. Detection mode remained significant, with an appreciably lower proportion of node positive cancers among the screen detected, particularly cancers detected at later screens. Interval and control group cancers performed similarly. There was no indication that the relative effect of malignancy grade on dissemination varied with the size of the tumour.
The second way in which the interrelationships of the prognostic factors have been examined is in terms of the primary tumour itself, that is, the relationship between size and malignancy. The proportion with the worst malignancy grade (grade 3 tumour) was regressed (logistic regression) against size, detection mode, age and county, as shown in Table Va. Size again is the overwhelmingly dominant factor. There is some residual effect, however, of detection mode, with interval cancers and especially cancers in the refusers displaying a higher proportion of poor grade cancers than would be predicted on the basis of size. The poor grade of interval cancers occurs mainly in cancers less than 2 cm in diameter.
The increase with size of the proportion of malignancy grade 3 tumours could arise in two ways. One explanation would require that malignancy grade deteriorates as a tumour grows. An alternative explanation would invoke length bias, i.e. malignancy grade remains unchanged as a tumour grows but grade 1 or 2 tumours grow more slowly and hence provide greater opportunity for diagnosis at a smaller size. One can test which hypothesis is correct by comparing two sets of tumours from which length bias has been largely removed, namely the control group and the 'unbiased set' described in the methods section, consisting of study group cancers less those diagnosed at the prevalence screen. The distribution by size and malignancy grade of these two sets of cancers is given in Table VI. The 'unbiased set' is smaller in size, indicating the effect of screening, but also more favourable in malignancy grade. If one applies the joint distribution of size and malignancy grade seen in the control group to the size distribution observed in the 'unbiased set', one obtains an expected malignancy grade distribution for the 'unbiased set'. As can be seen from Table VI, this expected distribution corresponds closely to the observed, demonstrating that the difference in malignancy grade distribution between the two groups of cancers can be accounted for solely in terms of size (and confirming the absence of length bias). This comparison indicates strongly that malignancy grade worsens as a tumour grows, as suggested by the relationship with increasing size in Table V and by the changing proportion of grade 1 and 3 cancers seen in Table III.
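The expected grade distribution described above can be read as a standardisation: within each size class, the grade proportions of the control group are applied to the number of 'unbiased set' cancers of that size, and the results are summed over size classes. The following is a minimal sketch of that reading. The function name, the size classes and all counts are hypothetical; the paper's own figures are in Table VI.

```python
def expected_grade_counts(control_counts, target_size_totals):
    """Expected number of cancers per grade in a target group, assuming the
    target group has the control group's grade distribution within each
    size class.

    control_counts:      {size_class: {grade: count}} for the control group
    target_size_totals:  {size_class: count} for the target ('unbiased') set
    """
    expected = {}
    for size_class, n_target in target_size_totals.items():
        grade_counts = control_counts[size_class]
        n_control = sum(grade_counts.values())
        for grade, count in grade_counts.items():
            # control proportion of this grade at this size, times target size total
            expected[grade] = expected.get(grade, 0.0) + n_target * count / n_control
    return expected


# HYPOTHETICAL counts, for illustration only (not the Table VI data).
control = {
    "small": {1: 30, 2: 50, 3: 20},
    "large": {1: 10, 2: 40, 3: 50},
}
unbiased_sizes = {"small": 200, "large": 100}

expected = expected_grade_counts(control, unbiased_sizes)
# grade 1: 200*0.3 + 100*0.1 = 70; grade 2: 100 + 40 = 140; grade 3: 40 + 50 = 90
```

Close agreement between such expected counts and the observed grade counts in the 'unbiased set' is what supports the conclusion that size alone accounts for the grade difference.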
Survival in terms of size, nodal status, malignancy grade and detection mode
Figure 1 displays survival by detection mode. Refusers have particularly poor survival. Interval cancers are similar to, in fact do slightly better than, cancers in the control group. The screen detected cancers as expected have much better survival, the survival of those detected at the prevalence screen and those detected at later screens being almost indistinguishable. Univariate proportional hazards regression analysis gives the results shown in Table VII, for the three prognostic tumour characteristics and for detection mode.
The primary question that has been investigated is the extent to which the survival differences by detection mode can be accounted for by the characteristics of the cancers at diagnosis. The results of a multivariate survival analysis (with each factor's effect adjusted for all the other factors) are given in Table VIII. All of the factors are highly significant. Size, nodal status and malignancy grade each retain their univariate significance with a smooth increase in risk in the expected direction, albeit less extreme. The change in the relative hazards for the different detection modes is interesting. Adjusted survival curves for cancers in the control group, interval cancers and those detected at later screens are similar. The adjusted survival of cancers detected at the prevalence screen, however, is considerably better than that in the three former groups, and that of cancers diagnosed among the refusers is appreciably worse. The adjusted survival analysis was repeated using only women aged 50-69 at entry to the study, to assess the performance of the three tumour characteristics in the age groups most frequently targeted for screening at present. The results are shown in Table IX, and are similar to those in Table VIII. There was no significant difference between survival in incident screen detected tumours and that in controls, nor between survival in incident screen tumours and interval cancers, when adjusted for the three factors grade, size and node status.
Table footnote: Significance of difference between observed and expected, P = 0.7.
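For reference, the proportional hazards model (Cox, 1972) that underlies the relative hazards in Tables VII to IX can be written as below. The symbols are notation assumed here for illustration and are not taken from the paper; since each factor is categorical, each $x_k$ stands for the indicator variables of one factor.

```latex
h(t \mid x) = h_0(t)\,
  \exp\!\left(\beta_{\mathrm{size}} x_{\mathrm{size}}
            + \beta_{\mathrm{node}} x_{\mathrm{node}}
            + \beta_{\mathrm{grade}} x_{\mathrm{grade}}
            + \beta_{\mathrm{mode}} x_{\mathrm{mode}}\right),
\qquad
\mathrm{RH}_k = \exp(\beta_k)
```

In a univariate analysis only one term appears in the exponent; in the multivariate analysis all terms are fitted together, so each relative hazard $\mathrm{RH}_k$ is adjusted for the other factors.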

Discussion
The two main results of this paper are, first, that the favourable prognosis of screen detected cancers (apart from prevalence screen detected cancers) can largely be accounted for by three tumour characteristics: diameter of the primary cancer, nodal status and malignancy grade; and second, that the malignancy of at least some cancers evolves as the cancer grows. Adjustment for the three characteristics altered the relative hazard of the incident screen tumours compared with the controls from 0.26 to 0.66. The alteration was more striking in those aged 50-69 at entry to the study. Taking the first point, the three prognostic factors appear adequate to describe survival in cancers detected at incident screens, interval cancers and those in the control group. This result is somewhat surprising. For given size, there is known to be major heterogeneity in the behaviour of breast cancers, related to varying underlying malignancy. Malignancy grade is usually regarded as an unsatisfactory way to summarise this malignant potential, since it is clearly subjective. In this study, malignancy grade was assessed separately and independently by one pathologist in each county. Differences between pathologists are evident even at a crude level, as one can see in Table I. Malignancy grade had therefore to be considered as a separate variable in each county. Nevertheless, the relationship of malignancy grade with size and nodal status, and with survival, was virtually the same in the two counties. This finding strongly indicates that the two pathologists were assessing the same underlying variable, but scoring it differently. Furthermore this underlying variable is an adequate description of malignant capacity of a cancer, especially for tumours in those aged 50-69, in those cancers not detected at first screening or in women who refused screening. Malignancy grade is incapable of accounting fully for the favourable survival of prevalence screen cancers.
Length bias among these cancers is a particular problem; one would expect to detect a disproportionate number of very slow growing cancers, the lack of malignancy of which is beyond the capacity of a three-point malignancy grade to express. The poor prognosis of cancers among the refusers, even when adjusted for the three tumour characteristics, is more difficult to understand. It is possible, however, that the poor survival in this group may be related, in addition to tumour characteristics, to attitudes to health care reflected in a refusal of screening. This is borne out by the fact that the refuser cases are also more likely than compliers to die of causes other than breast cancer (relative hazard = 1.60).
Nevertheless, these two groups apart, one can see that for incident cancers among the compliers in the study group (interval plus later screens) and for cancers in the control group, which are all incident, survival is well accounted for by tumour size, nodal status and malignancy grade. The survival analysis indicates that malignancy grade is a meaningful measure of malignant capacity for incident cancers (i.e. groups from which serious length bias has been removed) in both study and control groups. The results of Tables III and VI then suggest that for many cancers this malignant capacity increases as the tumour grows. The increase in malignant capacity has been discussed by Ponten and his co-authors (Ponten et al., 1990). They conclude that the evidence is against it occurring, citing results that DNA ploidy is similar in (incidence) screen detected as in clinically detected cancers, even though the former are diagnosed on average some 3 years earlier.
On the other hand, giving support to the possibility that the malignant capacity of a cancer may evolve are recent results demonstrating that many cancers display considerable heterogeneity in terms of thymidine labelling index (TLI) and steroid hormone receptors, and a lower degree of within tumour heterogeneity for DNA ploidy (Meyer & Wittliff, 1991). Heterogeneity provides a potential for differential growth rates of different cell populations in a tumour. The situation would be greatly clarified if biochemical or genetic measures of malignancy could be developed, demonstrably related to survival and accounting for the effects of the subjective measure of malignancy grade, and for which evidence analogous to Table VI could be adduced.
The implications for breast screening if malignancy evolves with tumour growth are important. It would suggest that the benefit from screening comes not only from the smaller size at which cancers are detected, but also from an overall reduction in the degree of malignancy. Screening more frequently would, then, in addition to reducing the number of interval cancers, improve appreciably the prognostic characteristics of the screen detected cancers.
It is interesting to note that much of the effect of size, both its independent effect on survival and its relationship with tumour grade, is absent if one considers only tumours greater than 2 cm in diameter, that is, cancers which are generally detectable clinically. Most of the size effect occurs in the difference between these tumours and those smaller than 2 cm in diameter, where detection by mammography is of greatest relevance. In clinically detected cancers one would not expect to observe a major independent effect for size on either survival (Haybittle et al., 1982) or on malignancy grade.
Screening women over 50 years of age every 33 months reduces breast cancer mortality by some 40% (over a 10-year period, for the women screened). The results of this paper suggest that more frequent screening may yield substantially improved benefits.