On the utility of the lognormal model for analysis of breast cancer survival in Sweden 1961-1973

Parametric models have been suggested as an alternative to conventional life table techniques for interpretation of observed survival patterns in cancer. This paper extends earlier work on breast cancer by studying the fit of Boag's lognormal model to the survival of 8,170 breast cancer cases reported to the Swedish Cancer Registry during 1961-1963. The model was also used to analyse the upward survival trend for breast cancer cases in Sweden during 1961-1973. The model fitted the 1961-1963 data well for the entire

As an alternative to the conventional 5or 10-year life table survival rates, several authors have suggested the use of parametric statistical models to interpret observed survival patterns in cancer (Boag, 1948;Berkson & Gage, 1952;Mueller & Jeffries, 1975;Fox, 1979;Campos, 1972;Haybittle, 1959;1965). The interest in such models has been stimulated by claims that they permit, for example, reliable predictions of long term results from available short term data, and assessment of whether an improvement in survival is due to long term cures or merely to protracted survival with cancer (Mould & Boag, 1975;Mould et al., 1976).
One of the difficulties with parametric models is that it is never possible to prove that a particular model is accurate, but only to reject it if the predicted values differ significantly from observed data. Validation of a model should therefore ideally be based on different populations that include large numbers of patients with long follow-up.
Two previous investigations concerning patients with carcinoma cervix uteri and breast cancer have assessed the relative merits of different models (Mould & Boag, 1975;Rutqvist et al., 1984). Both studies showed that Boag's lognormal model provided the best overall fit to the observed survival data. For breast cancer, the lognormal model was the only model which did not show a significant lack of fit. Both studies were based on Correspondence: L.E. Rutqvist. Received 11 February 1985;and in revised form 3 September 1985. large case materials but the minimum follow-up times were fairly short (10 and 6 years respectively).
The present paper extends the earlier work on breast cancer by studying a population of more than 8,000 cancer cases from the Swedish Cancer Registry with follow-up times ranging from 18 to 21 years. The aim of the study was to examine if the lognormal model fitted the findings on this new material with longer follow-up, and to assess the consistency of predictions of long term results from short term data.
Furthermore an attempt has been made to validate a 2-parameter lognormal model. Boag's original model includes 3 parameters which have to be estimated simultaneously. In order to reduce the standard errors of the estimates the original model might be turned into a 2-parameter model by keeping one of the parameters describing the cancer specific survival fixed at an assumed value. For small populations, large standard errors might otherwise make the estimated parameter values meaningless.
A report based on cases diagnosed in a limited geographical area of Sweden has indicated that there may have been an upward survival trend for breast cancer during 1961-1973(Rutqvist, 1984. Considerable uncertainty exists, however, as to the proper interpretation of such a trend. It could simply be due to earlier diagnosis without death from breast cancer being delayed or avoided (lead time bias). Such bias might also be a confounding factor in analyses of age-related differences in survival. The lognormal model was therefore used L.E. RUTQVIST to interpret the observed survival pattern of cases reported to the Swedish Cancer Registry during 1961-1963 and 1971-1973 Registry during 1961Registry during -1963. The respective number of cases in the age groups <50, 50-69 and . 70 y was 1,934, 3,980 and 2,256. The Cancer Registry is nation-wide and is based on obligatory reports on new cancer cases from both clinicians and pathology/cytology laboratories. The Registry's coverage of diagnosed breast cancer has been estimated to 98% (Mattsson & Wallgren, 1984). Follow-up data were obtained from the Swedish Registry of Causes of Deaths, which receives reports on all persons dying in Sweden and on Swedish citizens who die abroad. Data from the registry were available on deaths before December 31, 1981. The observation period thus ranged from 18 to 21 years. The observed and relative survival rates were calculated by means of the life table method. The relative survival is the ratio between the observed survival and the expected survival of an age-matched general population (Ederer et al., 1961). If the relative survival becomes constant, i.e. if the survival curve runs parallel to the time axis, the remaining patients are experiencing a mortality which is equal to that of a general population and they are often considered to constitute a cured group of patients. The expected survival was derived from life tables for the general Swedish female population for periods approximately covering the follow-up period. The standard error of the relative survival was calculated according to Ederer et al. (1961).
The Swedish Cancer Registry does not contain data on tumour stage. The analyses presented here therefore relate to all cases reported because of invasive breast cancer.
The lognormal model This model was originally proposed for cancer survival by Boag in 1948. It is based on the observation that survival time distributions in several malignant diseases are skew with a marked tail and that the logarithms of the survival times are approximately normally distributed. The model assumes that a population of cancer patients consists of a cured group, c, which is only subject to normal mortality risks, and an uncured group, (1-c), which is subject to both the normal mortality and to a disease-specific mortality. These two forces of mortality are assumed to act independently. The survival of a group of patients in terms of the model can be formulated as: where Pt is the probability to survive to the time t.
Pt is the probability to escape the normal mortality risk to the time t, m is the mean, and s the standard deviation of the lognormal survival time distribution. The disease-specific mortality of the uncured group is thus expressed in terms of the two parameters m and s. The antilogarithm of m is an estimate of the median survival of uncured patients. All three parameters (c, m and s) were estimated simultaneously with a maximum likelihood method according to Boag (1949). An iterative procedure was used starting from rough estimates of the parameter values which were improved at each cycle of computation to converge to the solutions.
For the analysis, it was necessary to estimate the number of deaths attributable to breast cancer during each yearly interval, i.e. excess deaths. These deaths were assumed to occur independently. The number of excess deaths for the year i was calculated as the difference between the observed number of deaths during i (Oi) and the expected number (Ei). Ei was calculated as the product of the number of woman-years at risk during i (Ni) and the midpoint estimate of the hazard rate (instantaneous death rate) from 'normal' causes of death (ti); N'i = [Ni-(Oi/2)-(WJ/2)], where Ni is the number of women entering i and Wi the number withdrawn alive; mi = [qi/(l -qi/2)], where qi is the conditional probability of dying from 'normal' causes during i; qi= [1 -(Pi/Pi -)], where Pi is the expected survival at the end of i. Estimates of P were derived from life tables for the general Swedish female population for periods approximately covering the years of follow-up.
It is possible that the maximum likelihood equations for a particular group of patients cannot be solved with the mentioned iterative procedure due to failure of convergence towards any parameter values. A conceptually different technique might then be used: the minimum x2 method. Starting from rough estimates, the parameter values are changed stepwise and for each set of estimates, the theoretical distribution of deaths is compared to the observed distribution. The deaths are compared for, e.g. yearly intervals, and a x2 value is UTILITY OF THE LOGNORMAL MODEL 877 computed, the set of estimates yielding the lowest overall x2 value is selected. These estimates might differ slightly from those obtained with the maximum likelihood method depending on the choice of intervals. With the minimum x2 method, equal weight is given to each interval of the observation period for determination of the parameter values whereas the maximum likelihood method gives equal weight to each deceased case, hence intervals with large number of deaths will be relatively more important. One disadvantage with the minimum x2 method is that standard errors of the estimated values cannot be computed.
2-parameter lognormal model It has been suggested that the standard deviation of the lognormal model might be kept fixed and only the two remaining parameters, m and c, kept floating when solving the maximum likelihood equations (Boag, 1948;Haybittle, 1959;Mould & Boag, 1975;Mould et al., 1976). If the number of patients in an analysis is small, large standard errors might otherwise make the estimated values meaningless. The rationale for this technique is that the standard deviation for a particular disease might be considered to be a constant. The selected value of s should preferably be based on data for a large patient population. For breast cancer, s was estimated to 0.60 log-years in a study including more than 14,000 patients from the Cancer Registry of Norway . However, this value is lower than the value estimated for all cases in the 1961-1963 series (0.71 log-years, see Results). The technique of only two floating parameters was therefore used, keeping s fixed at either 0.60 logyears or 0.71 log-years, in addition to Boag's original technique with three floating parameters.
Tests for lack offit A minimum chi-squared test was used to assess the agreement or disagreement between the observed survival and the theoretical survival according to the lognormal model. This method is similar to the mentioned minimum x2 method for estimation of parameter values. The estimates obtained with the maximum likelihood method with either two or three floating parameters were used in these tests. The theoretical and observed number of deaths were compared for yearly intervals. The degrees of freedom were (n-k-1), where n was the number of intervals and k the number of estimated parameters. A runs test was also performed in order to detect correlated errors, i.e. a possible nonrandomness of the temporal occurrence of deviations from the model during the following-up period.
The statistical significance of differences in relative survival was tested by comparing the 10 year rates. The parametric analysis using the lognormal model was restricted to patients aged below 70 years since the model might not be applicable for older patients (see Results).

Results
The relative survival of all patients in the 1961-1963 series is shown in Figure 1. No cured fraction was observed because the relative survival declined continuously during the follow-up period, at 20 years it was 41 +1% (s.e.). Figure 2 shows the material by age at primary diagnosis (<50, 50-69, .70y). The continuous decline in relative survival was observed in all age groups. For cases aged above 70 years, however, the curve showed an erratic pattern after 15 years which probably was due to the small number of patients at risk. The survival was generally higher for young than for old cases. At 15 years, for instance, the relative survival for cases aged <50, 50-69 and .70 y was 55 +1 %, 45 +1 % and 31+ 3% respectively. Time (y) Figure 1 Relative survival of female breast cancer cases reported to the Swedish Cancer Registry during 1961-1963 (0, n=8,170) and 1971-1973 (0, n = 10,655). The 95% confidence intervals are indicated.  Table I shows estimates of the parameters of the lognormal model and their standard errors. All three parameters were estimated simultaneously with the maximum likelihood method. The estimated cured proportion for all cases in the 1961-1963 series was 27+2%, the estimated median survival of uncured cases was 4.9 years. The corresponding figures for cases aged < 70 years was 33 + 2% and 4.5 years. For the age-group .70 years, no estimates were obtained because the iterative procedure used for solving the maximum likelihood equations failed to converge towards any parameter values.
To assess the relevance of the lognormal model, tests for goodness-of-fit (minimum x2 test and runs test) were performed using the estimated parameter values. The result for all cases is shown in Table II. The theoretical distribution of deaths did not deviate significantly (P> 0.05) from the observed distribution and the model could thus not be rejected. The results for the separate age-groups are summarized in Table III. No significant deviation was observed for cases aged <70 years. For the older patients, the tests were made using parameter estimates obtained with the minimum x2 method, but even they yielded a highly significant lack of fit (P<0.001). Table IV shows estimates of the parameter values obtained when using the technique of keeping the standard deviation fixed. The calculations were made for all cases in the 1961-1963 series and for the age-groups <70 and .70 years. As expected, the standard errors tended to be lower as compared to those obtained with three floating parameters ( Table I). The estimates showed considerable variation, however, depending on the value of s. With s fixed at 0.60 log-years, for instance, the estimated cure rate for all cases was 34+2% as compared to 27+2% with s fixed at 0.71 log-years (P<0.001). The estimated mean, on the other hand, was significantly higher (P<0.001). The results for the two age-groups showed similar inconsistencies.    Table   II.  Table III summarizes the tests for goodness-of-fit of the 2-parameter lognormal model, the parameter values were those shown in Table IV. This model was only accepted for all cases with s fixed at 0.71 log-years. The model was rejected for the two agegroups with either value of s, and for all cases with s fixed at 0.60 log-years.
The parameter values (3-parameter model) for all cases in the 1961-1963 series were estimated using survival data for only 5, 10 and 15 years of followup (Table V'). The estimates obtained with the 10and 15-year data were similar to those obtained with data for the entire follow-up period even though the standard errors were slightly higher. The 5-year estimates, on the other hand, deviated considerably. The estimated cured fraction, for instance, was 41 + 6% as compared to 27 + 2% with the 18-21 year data (P < 0.05).
The survival time trend 1961-1973 Figure 3 shows the relative survival of all patients by period of diagnosis. The survival at 10 years was 52 + 1 % (s.e.) for the 1961-1963 series, and 54 + 1 % for the 1971-1973 series (P <0.05). Further analysis by age (<50, 50-69, _ 70 y) showed that the increase at 10 years was only significant for the age-group 50-69 years for which it rose from 49+1% to 54+1% (P<0.001) (Figure 2). For patients aged <50 years and >70 years, the increases were smaller, 12-3%, and not statistically significant (Figures 4 and 5). Table I shows estimates of the 3 parameters of the lognormal model and their standard errors for cases aged < 70 years by period of diagnosis and by age (<50, 50-69 y). Older patients were excluded from the analysis because of the mentioned poor fit of the model. The estimated cured proportion rose from 33% in the 1961-1963 series to 40% in the Significant increases (P<0.05) of the estimated cured fraction were observed both for cases aged below 50 years (42% to 53%) and 50-69 years (29% to 34%). The estimated cured proportion was thus consistently higher for the younger cases in both series. The estimated median survival of uncured cases, however, was not significantly different between the two age-groups during either period.  models include the cured proportion of patients and differ only in the analytical form of the diseasespecific survival for non-cured cases. The cured proportion is in this context defined as the proportion of patients who experience a death rate from all causes of death which is similar to that of an age-matched general population. As mentioned earlier, it is not possible to prove that any model is true. It is only possible to reject it if it can be shown that the theoretical survival according to the model significantly deviates from the observed survival in a studied population. Tests for goodness-of-fit are therefore more reliable when based on large case materials. It is understandable that different models have been suggested in the literature since they were often established on fairly restricted data. In two previous investigations it was shown that Boag's lognormal model provided the best overall fit to the observed survival data for carcinoma  (0) 1961-1963, 1971-1973, n=3,585. The 95% confidenc indicated.

Discussion
The interest in parametric survival mainly focused on relatively simple functions with 2 or 3 parameters. T] are usually assumed to have biolog and thus to be clinically meaningful. YEARS cervix uteri ana tor breast cancer (Moula & Doag, red <50 years 1975;Rutqvist et al., 1984). The other studied n =1,934; (0) models, including the 'extrapolated actuarial', the e intervals are Weibull and various exponential models, were either rejected or were found to yield less consistent results. The current study also failed to disclose a siianificant lack of fit with the loanormal model except for cases aged above 70 years. The model was rejected for this age group. There are several possible reasons for the lack of fit. It has, for instance, been reported that erroneous registration of old breast cancer cases was common in the Swedish Cancer Registry during the early 1960s. In the age-group >70 years, it was estimated that about 7% of the total number of cases should be excluded from the registry files because of registration errors (Rutqvist & Wallgren, 1983). These cases were mostly patients with recurrences from breast cancer diagnosed during previous years, and hence poor survival. Among younger women, such errors were found to be less frequent. If erroneous cases had been excluded from the current study, it is possible that the fit of the lognormal model would have been better. On the other hand, tne mouei mignt oe too simplistic, ano pernaps does not accurately describe the forces of mortality ed .70 years or their interactions in old breast cancer patients. n = 2,256; (0) The model assumes that death from intercurrent ,e intervals are causes and from breast cancer occur independently but this assumption might not hold good. In old patients, debilitating conditions such as chronic heart and lung diseases might hasten death from disseminated breast cancer thereby producing I models has deviations from the model. Nevertheless, in view of mathematical the fact that the model could not be rejected for all he parameters cases nor for cases aged less than 70 years (Table   ical correlates  L.E. RUTQVIST attractive because it is consistent both with late excess mortality and with a cured fraction. Estimates of parameter values based on followup data for 10 years were similar to those obtained using data for the entire follow-up period (Table  V). This suggests that the model could be used for prediction of long term results from short term data. However, the relative survival declined during the entire follow-up period ( Figure 1) and a cured fraction could not be observed. Hence, extrapolations from the model should be cautiously judged until supported by observed data.
One disadvantage with the lognormal model is that large populations are necessary in order to obtain stable estimates of the parameter values. The 95% confidence interval of the estimated cured proportion for cases aged less than 50 years in the 1961-1963 series, for instance, was found to be 36-47% even though this group consisted of more than 1,900 cases. In order to reduce the standard errors, Boag suggested that the standard deviation of the model might be kept fixed and only the remaining two parameters be estimated. The original 3parameter model is thus converted into a less flexible 2-parameter model. The rationale for this technique is that the standard deviation might be considered as a constant for a given disease, and was supported by the finding that estimates of the cured proportion were similar even with fairly large variations in the assumed value of the standard deviation (Mould & Boag, 1975). This result was based on studies on carcinoma cervix uteri and on head and neck cancers. However, the estimated value of the standard deviation in the current study (0.71 log-years) was not similar to that estimated in a previous study (0.60 log-years) based on more than 14,000 breast cancer cases from the Norwegian Cancer Registry . Furthermore, estimates of the cured proportion and the mean of the lognormal distribution were significantly different when the standard deviation was fixed at 0.60 log-years as compared to 0.71 logyears (Table IV). Similar findings were reported by Haybittle (1959). Tests for goodness-of-fit also disclosed significant deviation with the standard deviation fixed at 0.60 log-years (Table III). In view of these inconsistencies the present results support Haybittle's conclusion that the 2-parameter lognormal model is not advisable for analysis of breast cancer survival. The standard deviation is probably affected by the stage distribution and thus probably varies from one case material to another. Therefore, no a priori assumption of its value is possible. The reason why some of the previous studies did not produce significant deviations could be that the case materials were smaller or that the studied populations were more homogeneous with regard to disease outcome than is usually the case in breast cancer. This study confirmed previous information (Rutqvist, 1984) by showing an upward survival trend for breast cancer in Sweden during [1961][1962][1963][1964][1965][1966][1967][1968][1969][1970][1971][1972][1973]. The trend was unequally distributed and was only significant for cases aged 50-69 years. The relative survival of patients aged below 50 years and above 70 years showed only minor, insignificant increases. The potential problem of lead time bias should, however, be considered when judging the survival trend. Unfortunately no methods are at present available which permit determination of lead time differences between patient populations diagnosed during different periods.
Lead time bias is not a confounding factor if the outcome of treatment is measured in terms of the proportion of patients who are cured. In the current study, the relative survival declined during the entire follow-up period and a cured group of patients could not be observed. This finding accords with many other long term follow-up studies which have shown that the excess mortality from breast cancer probably persists at least up to 30-40 years after diagnosis (Adair et al., 1974;Baum, 1976;Rutqvist & Wallgren, 1985). The higher relative survival for more recently diagnosed cases ( Figure 1) and for young as compared to old cases (Figure 2) might therefore be explained by lead time bias.
To get round this problem, the lognormal model was utilized to interpret the observed survival patterns. According to the model, the survival trend was the result of a 7% increase (95% confidence interval: 0-13%, P<0.05) of the cured proportion.
The estimated median survival of uncured cases, on the other hand, was similar during 1961-1963 and 1971-1977 (4.5 and 4.6 years). This result suggests that the trend was the result of long-term cures and not merely due to protracted survival with cancer. Similarly the higher relative survival rates for young as compared to old patients were also reflected in higher estimated proportions of cures.
Breast cancer incidence rates have increased in most countries in the Western world during the past decades. It has been suggested that this could be the result of an increased diagnosis of 'biologically benign' breast cancer, i.e. breast lumps exhibiting all histologic characteristics of cancer, but which have relatively benign biological properties (Fox, 1979;Doll & Peto, 1981). An increased proportion of patients with such lesions among the more recently diagnosed cases might explain an increased proportion of 'cured cases' and consequently also an upward survival trend. Due to the limited data on the natural time history of breast cancer, it is not known whether 'biologically benign' breast cancers exist, nor if they have biased the incidence trend. It therefore remains controversial whether the survival trend reported here, is the result of an improved outcome for patients with serious breast cancer or if it is simply artifactual.