Bioinformatics: The Path to Species Comparison

Systems biology relies on integrating genetic, proteomic, and metabolic data, and on understanding interdependent cellular and intercellular events that are constantly in flux. To accomplish this feat, researchers have relied on DNA and protein sequence databases and high-throughput expression analysis techniques such as microarrays to produce ever-growing libraries of expression data. DNA and protein sequences can be quickly compared using tools such as BLAST (Basic Local Alignment Search Tool), a program that identifies similar genes in different organisms. Now scientists are applying this computational approach to protein interaction networks, which are the means by which proteins communicate.
 
“As we move from a focus on sequences to one on networks, we need a tool similar to BLAST,” says Trey Ideker, an assistant professor of bioengineering at the University of California, San Diego. The software program PathBLAST was developed to fill this need by a group consisting of researchers from Ideker’s lab and the lab of Brent Stockwell, now an assistant professor of biological sciences at Columbia University. At the time, both Ideker and Stockwell were fellows at the Whitehead Institute for Biomedical Research in Cambridge, Massachusetts, and worked on the program development with Richard Karp, a professor of bioengineering and mathematics at the University of California, Berkeley, known for his work in combinatorial algorithms and bioinformatics. 
 
The PathBLAST program rapidly compares protein interaction networks across two different organisms using fast-executing algorithms. The program searches for high-scoring alignments involving one path from each network. The proteins of the first path are paired with putative homologs—or proteins presumed to have a common origin and function—from the other species and occurring in the same order in the second path. PathBLAST is built as a plug-in to Cytoscape, a widely used software platform. Scientists use Cytoscape to visualize molecular interaction networks and integrate these interactions with gene expression profiles and other data. 
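The idea of pairing one path from each network, with homologous proteins occurring in the same order, can be illustrated with a small brute-force sketch. This is not PathBLAST's algorithm or scoring scheme; the function names, the toy networks, and the homolog pairs are all hypothetical, and only exact position-by-position matches are considered.

```python
def simple_paths(network, length):
    """Yield every simple path containing `length` proteins.

    `network` maps each protein to the set of proteins it interacts with
    (an undirected interaction network stored as adjacency sets).
    """
    def extend(path):
        if len(path) == length:
            yield tuple(path)
            return
        for neighbour in sorted(network.get(path[-1], ())):
            if neighbour not in path:  # keep the path simple (no repeats)
                yield from extend(path + [neighbour])

    for start in sorted(network):
        yield from extend([start])


def aligned_paths(net_a, net_b, homologs, length):
    """Return pairs of paths whose proteins are homologous in the same order."""
    paths_b = list(simple_paths(net_b, length))
    hits = []
    for path_a in simple_paths(net_a, length):
        for path_b in paths_b:
            if all((a, b) in homologs for a, b in zip(path_a, path_b)):
                hits.append((path_a, path_b))
    return hits


# Hypothetical toy data: a four-protein chain and a three-protein chain.
yeast = {"Y1": {"Y2"}, "Y2": {"Y1", "Y3"}, "Y3": {"Y2", "Y4"}, "Y4": {"Y3"}}
bacterium = {"B1": {"B2"}, "B2": {"B1", "B3"}, "B3": {"B2"}}
homologs = {("Y1", "B1"), ("Y2", "B2"), ("Y3", "B3")}

hits = aligned_paths(yeast, bacterium, homologs, 3)
for path_a, path_b in hits:
    print(" - ".join(path_a), "<->", " - ".join(path_b))
```

Exhaustive enumeration like this is only practical for tiny networks. It is meant to make the notion of a conserved path concrete, not to suggest how networks with thousands of interactions are searched.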
 
“The important stuff in biology is revealed by comparing things,” says Ideker. “By comparing protein interaction networks of two different species or even within species, we can identify pathways and complexes that have been conserved over evolution.” These evolutionarily conserved pathways allow interpretation of the network of a poorly understood organism based on its similarity to that of a well-known species. This comparison could provide a model of signaling and regulatory pathways that are related to a response to an environmental toxicant. It could also help target drugs to pathways that are present in a pathogenic organism but absent from its human host. Such a model could furthermore help identify drugs that would repair damaged pathways or even cause new ones to be formed. 
 
The PathBLAST development group published a paper in the 30 September 2003 issue of Proceedings of the National Academy of Sciences in which they identified the conserved pathways within the yeast Saccharomyces cerevisiae and the bacterium Helicobacter pylori. For example, the authors found that one pathway that was critical in catalyzing DNA replication and another in protein degradation were conserved in both organisms as a single network. Within seconds, the program had determined that the bacterium contained 1,465 interactions among 732 proteins, and the yeast contained 14,489 interactions among 4,688 proteins. 
 
This report proved that the method works for matching conserved networks from among all the networks in two species, according to software engineer Brian Kelley, a member of Stockwell’s lab. Kelley says, “The next step is to prove the software in a novel application where you start with a given disease network and see if it is conserved in other species. Once you prove this utility, then the use of PathBLAST will skyrocket.” Kelley adds that research into the mTOR cell growth–triggering protein pathway may prove to be that application. This pathway is composed of a complex of proteins that respond to nutrient cues; understanding it will clarify the role that nutrients and metabolism play in disease. 
 
Other researchers have taken a complementary approach by comparing what’s known about a disease to a known network. At Beyond Genomics in Waltham, Massachusetts, researchers measure quantitative differences between transcripts, proteins, and metabolites across a given disease model, determine correlations within the data set, and then compare the experimentally derived network with a known biological network or pathway. 
 
“As the protein interaction databases become more heavily populated with interactions among higher eukaryotes, PathBLAST and related approaches will start to shine as they can help elucidate the set of core biological networks for a given genome,” says Tom Plasterer, the principal scientist for bioinformatics at Beyond Genomics. “These networks—when coupled with a tightly defined experimental context—will be invaluable in understanding mechanisms of disease, where one expects compensatory and subtly differing biological networks to emerge.” 
 
The PathBLAST website is hosted by the Whitehead Institute and available at http://www.pathblast.org/; it will soon be mirrored at the San Diego Supercomputer Center at the University of California, San Diego. And as for whether industry will embrace PathBLAST, Ideker says, “It’s still early. Speculating too far about these technologies is like asking industry in 1980, ‘Is genome sequencing going to revolutionize your drug discovery pipeline?’ Even in 2004, the verdict is still out on that one!”


Background
Colorectal cancer is the second most common cause of cancer-related death in the developed world [1]. In consequence, advances in our understanding and treatment of colorectal cancers can potentially have a huge impact on cancer morbidity and mortality. Currently much of our understanding of cancer behaviour, including the prediction of likely patient outcomes, is based on histopathological parameters, and treatment is tailored to individual patients on this basis. At present TNM stage, tumour type and resection margin status are the most widely used parameters in planning adjuvant treatment. Tumour grade of differentiation, vascular invasion and, more recently, perineural invasion and tumour border configuration have also been used to assist the clinician in predicting colorectal tumour behaviour and hence subsequent patient management [2].
It is well recognised that clinical response and recurrence rates vary within the conventionally staged groups and that this reflects variation in the genetic and molecular make-up of these tumours. Molecular changes occur within cancer cells during tumour progression; these changes provide a potential insight into tumour development and metastasis.
Refining prognostic markers allows treatment to be more accurately tailored to individual patients and suggests potential mechanisms through which tumour progression occurs, which in turn could provide targets for novel therapies.
MUC1 is a membrane bound glycoprotein which has been demonstrated to be predictive of tumour progression and worsening prognosis in both gastric [3][4][5] and colorectal cancer [6,7] including those related to HNPCC [8]. This increased expression has been seen more predominantly at the invasive tumour front [9].
MUC3 is also a trans-membrane glycoprotein which is seen in both colorectal cancers and normal colon [10]. Studies have shown an association between MUC3 expression and poor prognosis in a number of cancers including pancreatic [11], breast [12], gastric [13] and renal [14]. There is some evidence suggesting that MUC3 expression is reduced in colorectal cancers and that this varies between histological types [15]. The cellular distribution is also seen to be affected; apolar distribution is thought to reflect abnormal transport systems [16].
Whilst previous studies have suggested that tumour expression of MUC1 may be a useful prognostic factor in colorectal carcinoma [6,9], these studies did not include the presence or absence of vascular invasion in their analysis, although this is known to be a highly significant prognostic factor in colorectal cancer [17]. We assessed the prognostic value of MUC1 on a larger set of colorectal tumours and included vascular invasion in our analysis to determine whether MUC1 was truly independent, as the previous studies have suggested. We also wanted to assess whether MUC3 demonstrated any prognostic influence on colorectal cancers, as seen in other tumours.
Since its first description in 1998, tissue micro-array (TMA) analysis [18] has been employed for the immunohistochemical analysis of target protein expression in a wide range of primary tumour types. Initial fears that the reduced amount of individual tumour tissue analysed using this technique might not be representative of the tumour as a whole appear largely unfounded [19]. The strengths of this approach lie in its ability to provide a rapid turnover of results from very large patient cohorts, whilst reducing variability in experimental conditions and reducing costs [20]. Recently, in an attempt to overcome some of the reporting deficiencies inherent in prognostic tumour marker studies, a set of guidelines, the REporting recommendations for tumour MARKer prognostic studies (REMARK), has been proposed [21]. The reporting of this study therefore adheres to the REMARK guidelines. This TMA of colorectal cancer patients has previously been validated, with a p53(-)/Bcl-2(+) phenotype, loss of HLA and over-expression of MICA all being independent markers of poor survival [22][23][24]. TMAs have also been utilised to study MUC1 and MUC3 expression in breast cancer [12].
We have therefore used TMA technology to analyse expression of MUC1 and MUC3 in a series of 462 paraffin embedded colorectal tumour specimens, in conjunction with a detailed database of clinicopathological variables including disease specific survival. We propose that tumours lacking expression of MUC1 and MUC3 will be more likely to metastasise, due to previously observed loss of cell-cell adhesion, and that this will therefore lead to more aggressive cancers with poorer prognosis.

Patients and study design
The study population comprised a series of 462 consecutive patients undergoing elective surgical resection of a histologically proven sporadic primary colorectal cancer at the University Hospital, Nottingham, UK (table 1). These patients were treated between 1st January 1994 and 31st December 2000; this time period allowed meaningful assessment of the prognostic markers studied. All patients treated during this time-frame were considered eligible for inclusion in the study. Tumours were classified as mucinous carcinoma when more than 50% of tumour volume consisted of mucin [25].
Only cases where the relevant pathological material was unavailable were excluded from the study. Follow-up was calculated from the time of resection of the original tumour, with all surviving cases being censored for data analysis at 31st December 2003. This produced a median follow up of 37 months (range 0-116) for all patients and 75 months (range 36-116) for survivors.
A prospectively maintained database was used to record relevant clinicopathological data, with data provided from the UK Office for National Statistics; this was available in more than 99% of cases. The information collected was independently validated through case note review of deceased patients. Disease specific survival was used as the primary end point; however, data were also collected on various other relevant clinical and histopathological parameters, which are summarised in table 1. No formal sample size calculation was performed, although the inclusion of over 450 cases is in excess of most studies of prognostic tumour markers.
Adjuvant chemotherapy consisting of 5 FU and folinic acid was reserved for those patients with positive lymph nodes, although surgical and adjuvant treatment was at the discretion of the supervising physician.
Prior ethical review of the study was conducted by the Nottingham Local Research and Ethics Committee, who granted approval for the study.
Construction of the array blocks incorporated a wide spectrum of electively resected colorectal tumours and was found to be broadly representative of the colorectal cancer population in the UK. 266 (58%) patients were male and 196 (42%) female. The median age at the time of surgery was 72 years, consistent with a median age at diagnosis of colorectal cancer of 70-74 years in the UK [26]. 69 (15%) tumours arrayed were TNM stage 1, 174 (38%) stage 2, 155 (34%) stage 3 and 54 (11%) stage 4; there were 3 cases of in-situ disease. These figures are comparable with national figures for distribution of stage 1-4 at diagnosis of 11, 35, 26 and 29% respectively [27]. The majority of tumours (392, 85%) were adenocarcinomas, and were most frequently of a moderate histological grade (353, 77%). 128 (28%) tumours were noted to have histological evidence of extramural vascular invasion, 224 (48%) had no evidence of vascular invasion, and this information was not available in 110 (24%) cases.
At the time of censoring for data analysis 228 (49%) patients had died from their disease, 64 (14%) were deceased from all other causes, and 169 (37%) were alive. The median five-year disease-specific survival for the cohort was 58 months, comparable with a national average of approximately 45% five-year survival for colorectal cancer in the UK [27].

Specimen characteristics
All tumours received following resection in the operating theatre were incised, fixed immediately in 10% neutral buffered formalin followed by standard processing through to embedding in paraffin wax, ensuring optimal tissue fixation and preservation for histological examination.
Tissue micro-arrays were constructed as described previously [18]. For each tumour, 5 µm section slides stained with haematoxylin-eosin were first used to locate representative areas of viable tumour tissue. 0.6 mm needle core-biopsies from the corresponding areas on the paraffin-embedded tumour blocks were then placed at prespecified coordinates in recipient paraffin array blocks using a manual tissue-arrayer (Beecher Instruments, Sun Prairie, WI). Array blocks were constructed with between 80 and 150 cores in each, with analysis of a single core from each case. Fresh 5 µm sections were obtained from each TMA block and placed on coated glass slides to allow the immunohistochemical procedures to be performed, preserving maximum tissue antigenicity.

Immunohistochemistry
Immunohistochemical analysis of MUC1 and MUC3 expression was performed using a routine streptavidin-biotin peroxidase method. Tissue array sections were first deparaffinised with xylene, rehydrated through graded alcohol and immersed in methanol containing 0.3% hydrogen peroxide for 20 minutes to block endogenous peroxidase activity. In order to retrieve antigenicity, sections were immersed in 500 ml of pH 9.0 EDTA buffer and heated for 10 min in an 800 W microwave at high power, followed by 10 min at low power. Endogenous avidin/biotin binding was blocked using an avidin/biotin blocking kit (Vector Labs, USA). In order to block nonspecific binding of the primary antibody, all sections were then treated with 100 µl of 1/5 normal swine serum (NSS) in TBS for 15 min.

Evaluation of MUC1 and MUC3 staining
The tumour cores were assessed with regard to distribution and intensity of staining by two observers (TJD and AHA), both with extensive experience in the analysis of tissue micro-arrays. Tumours were classified according to a semi-quantitative system in a coded manner, with the observers blinded to the clinical and pathological parameters of the case. In the few cases (<5%) where there was a discrepancy between the classification of cores, a review using a double headed microscope was performed and a consensus reached.
For MUC1, tumours were scored according to the proportion of viable tumour cells within the tumour core which displayed unequivocal staining: None = 0, <5% = 1, 5-29% = 2, 30-59% = 3, >60% = 4, in line with previous studies using the same antibody [6,28]. As the expression of MUC3 was more uniform throughout positive tumour cores, the intensity of staining was used as the discriminator, with tumours categorised as showing negative, low, moderate or high intensity of MUC3 expression. For the purposes of survival analysis tumours were further categorised as either negative or positive for marker expression. Tumours were considered positive for MUC1 expression when at least 30% of cells demonstrated positive staining, in keeping with previous studies [6,28]. For MUC3, tumours displaying moderate or high intensity staining were considered positive, with the remainder considered negative.
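The scoring and dichotomisation rules above can be written out as a short sketch. The function names are illustrative only, and one boundary choice is an assumption: the text gives the top MUC1 category as ">60%", so a core with exactly 60% of cells stained is placed in category 4 here.

```python
MUC3_INTENSITIES = ("negative", "low", "moderate", "high")


def muc1_score(percent_stained):
    """Map the percentage of stained viable tumour cells to a 0-4 score."""
    if not 0 <= percent_stained <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percent_stained == 0:
        return 0
    if percent_stained < 5:
        return 1
    if percent_stained < 30:
        return 2
    if percent_stained < 60:
        return 3
    return 4  # assumption: exactly 60% is grouped with the top category


def muc1_positive(percent_stained):
    """MUC1 is positive when at least 30% of cells stain (scores 3 and 4)."""
    return muc1_score(percent_stained) >= 3


def muc3_positive(intensity):
    """MUC3 is positive for moderate or high staining intensity."""
    if intensity not in MUC3_INTENSITIES:
        raise ValueError(f"unknown intensity: {intensity!r}")
    return intensity in ("moderate", "high")


# Hypothetical cores, for illustration only.
for pct in (0, 3, 29, 30, 75):
    print(pct, muc1_score(pct), muc1_positive(pct))
print(muc3_positive("low"), muc3_positive("high"))
```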

Statistical analysis
Statistical analysis of the study data was performed using the SPSS package (version 14 for Windows, SPSS Inc., Chicago, IL). Pearson chi-square (χ²) tests were used to determine the significance of associations between categorical variables. Disease-specific survival calculations included all patients whose death related to colorectal cancer. In contrast, patients whose deaths resulted from non-colorectal cancer related causes were censored at the time of death. Kaplan-Meier curves were used to assess factors which influenced survival. The statistical significance of differences in disease-specific survival between groups with differing MUC1 and MUC3 expression was estimated using the log-rank test. The Cox proportional-hazards model was used for multivariate analysis in order to determine the relative risk and independent significance of individual factors. In all cases p-values < 0.05 were considered statistically significant.
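To make the handling of censoring concrete, the following is a minimal sketch of the Kaplan-Meier product-limit estimator in plain Python. It is not the SPSS procedure used in the study, and the follow-up times below are invented for illustration. A death from colorectal cancer counts as an event; deaths from other causes and patients alive at the end of follow-up are treated as censored.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with an event.

    times  -- follow-up in months for each patient
    events -- True if the patient died of colorectal cancer at that time,
              False if the observation was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        at_risk -= leaving
    return curve


# Hypothetical data: six patients, three disease-specific deaths.
months = [6, 12, 12, 20, 30, 40]
died_of_crc = [True, True, False, True, False, False]

curve = kaplan_meier(months, died_of_crc)
for t, s in curve:
    print(f"{t:>3} months: S(t) = {s:.3f}")
```

Censored patients contribute to the number at risk up to their last follow-up time but never reduce the survival estimate directly, which is why censoring non-colorectal deaths yields a disease-specific rather than an overall survival curve.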

Patient and histopathological variables and prognosis
Univariate relationships between known patient/tumour characteristics and DSS were evaluated using the log-rank test (see Table 2). There appeared to be no significant differences in DSS between patients of either gender. Similarly when patient age was considered in three groups (patients 64 years or younger at the time of surgery, patients 65-79 years, and those 80 years and over), no significant differences in DSS were noted. The strongest association of clinicopathological variables with DSS was seen with TNM staging (log-rank = 211.37, p < 0.0001), showing a progressive reduction in DSS with increasing tumour stage.

Tumour marker expression
MUC1
Analysis of MUC1 expression was possible in 403 of the 462 tumours on the TMA (87%), with the remainder being lost during antigen retrieval or not demonstrating viable tumour cells within the core. This level of core loss is within the rates described by previous authors using TMAs [29,30]. The majority of staining was seen within Representative examples of positive and negative staining for each antigen are shown in figure 1.

Relationships between tumour markers and standard clinicopathological variables
MUC1
For the purposes of analysis the tumours were divided into those with positive or negative expression, as described previously [6,28]. There did not appear to be any relationship between any of the clinicopathological variables, including stage, and MUC1 expression (see table 3).
Figure: Immunohistochemical staining of tissue microarray cores with MUC1 and MUC3 antibodies.

MUC3
As the majority of tumour cells within each core expressed a uniform staining pattern, the cores were classified according to intensity of staining as opposed to the proportion of cells staining. Cores were deemed positive if moderate or strong staining was seen. Using this system 286 (74%) tumours were positive and 101 (26%) negative. No correlation between MUC3 cytoplasmic expression and any clinicopathological variables, including stage, was seen (see table 4). Equally there was no correlation of membranous staining with any clinicopathological variables (data not shown).

Relationship between tumour markers and patient survival
Correlation between MUC1 and MUC3 expression and DSS was assessed using Kaplan-Meier plots and log rank testing (see table 2, figures 2 and 3). A significant association was seen between tumours with high MUC1 expression and a reduced DSS (mean DSS 54 months vs. 65 months; p = 0.038). In contrast, there was no correlation between MUC3 expression and DSS.
In order to determine the relative influence of MUC1 and other patient and tumour variables known to affect prognosis, a multivariate analysis was performed using the Cox proportional hazards model. We included only those variables which had been shown to be significantly related to DSS on univariate analysis, i.e. intramural vascular invasion and TNM stage (see table 5). In this model, vascular invasion (p < 0.001) and TNM staging (p < 0.001) were seen to retain independent prognostic significance. High expression of MUC1 was also seen to be an independent prognostic marker of poor outcome, with a hazard ratio of 1.339 (95% CI 1.002-1.790, p = 0.048) when compared with tumours demonstrating low MUC1 expression.
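As a rough consistency check on the reported hazard ratio, the log hazard ratio, its standard error and an approximate p-value can be recovered from the confidence interval. This sketch assumes a Wald-type interval that is symmetric on the log scale, which is the usual form for Cox model output but is not stated in the text; the function name is illustrative.

```python
import math


def wald_from_ci(hazard_ratio, lower, upper, z_crit=1.959964):
    """Recover beta, its standard error, z and a two-sided p-value.

    Assumes the interval is exp(beta +/- z_crit * se), i.e. a Wald
    interval on the log hazard ratio scale.
    """
    beta = math.log(hazard_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return beta, se, z, p


beta, se, z, p = wald_from_ci(1.339, 1.002, 1.790)
print(f"beta = {beta:.3f}, se = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

Under that assumption the recovered p-value is a little under 0.05, in line with the reported p = 0.048; small differences are expected because the published figures are rounded.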

Discussion
This study investigates the role of MUC1 and MUC3 as prognostic markers in colorectal cancer. Previous studies have suggested a link between MUC1 and MUC3 expression and poor prognosis both in colorectal and other tumour types [3][4][5][6][7][8]. These studies have frequently suffered from small sample sizes and/or heterogeneous methodology and study populations. The current study comprises the largest analysis of MUC1 and MUC3 expression in colorectal cancer to date, including 462 consecutively treated patients who were representative of the colorectal cancer population within the UK. With a comprehensive data set of clinicopathological variables and patient outcome, over a median 3 year postoperative period, a thorough analysis of the relationship between these variables and disease specific survival was possible.
In our study population 32% of tumours were positive for MUC1. This compares favourably with the work of previous authors, who used the same semi-quantitative scoring system and found 32% and 43% MUC1 positivity in colorectal tumours respectively [6,28].
In our study population MUC1 expression was not related to any of the clinicopathological variables examined. Some previous studies demonstrated that increased MUC1 expression was related to increasing TNM or Dukes stage [31][32][33]; however, a number of other studies are in line with our findings [9,34]. Variations in the findings of the current and previous studies may relate to differences in immunohistochemical protocols, antibodies used, scoring systems and area of the tumour examined; for example, Hiraga et al and Kimura et al only assessed staining at the invasion front [31,32]. A large study by Lugli et al examined the prognostic significance of MUC1 and MUC2 in relation to differing mismatch repair status in colorectal cancer, with tumours divided into three subgroups. Significant correlations were found in the "mismatch repair proficient group" between MUC1 positivity and tumour stage and grade [33]. There was no such correlation in our cohort; however, our analysis did not involve sub-stratification of the population, which may explain the dissimilar results.
Univariate and multivariate analysis of our patient population confirmed that TNM staging and vascular invasion are strong independent prognostic markers in colorectal cancer. Of particular interest was the large effect vascular invasion had on survival. Presence of vascular invasion reduced mean DSS significantly (38 vs 75 months, p < 0.0001), yet no previous studies investigating the prognostic value of MUC1 have included this strong predictor of survival in their analysis. Our data confirm that high expression of MUC1 in colorectal cancer confers a worse prognosis on both univariate and multivariate analysis, even when taking into account the potentially confounding influence of vascular invasion status.
The association of MUC1 with poor prognosis has been linked to effects on cell adhesion and the potential for metastasis. Regimbald et al [35] showed that MUC1 was a ligand for ICAM-1 in breast cancer and might have a pivotal role in haematogenous spread, and it has been speculated that this mechanism may occur in colorectal cancer [36]. MUC1 is also seen to have effects on the extra cellular matrix components through inhibition of kalinin and laminin [37,38].
MUC1 has been demonstrated to affect beta-catenin, a nuclear transcription factor whose intracellular distribution has been shown to influence progression of colorectal cancer [39]. It has been suggested that MUC1 exerts some of its effects through interaction with beta-catenin, with over-expression of MUC1 leading to increased levels of nuclear beta-catenin [40]. A recent study has shown that the co-expression of MUC1 and nuclear beta-catenin at the invasion front of colorectal tumours may be correlated with a worse prognosis [9].
MUC3 expression was present in moderate to high levels in 76% of tumours assessed. Some studies have suggested that MUC3 may in fact be down-regulated in colorectal cancer compared with normal colon [10,15]. We did not see any correlation between the clinicopathological variables and MUC3; in particular there was no correlation with tumour stage, as is seen with gastric cancers [13]. Furthermore, MUC3 expression did not appear to correlate with prognosis, in contrast to reports in other tumour types [11][12][13][14]. Rakha et al demonstrated MUC3 expression in 91% of breast cancers, which was associated with increased local recurrence and lymph node stage. They argued that membranous expression of MUC3 was a poor prognostic feature, which correlated with higher grade and poorer Nottingham Prognostic Index (NPI) [12]. Wang reported that increased MUC3 expression in gastric cancer worsened prognosis, with no significant differences in expression seen in relation to patient sex, tumour location, grade of differentiation, serosal invasion, or Lauren's type. However, MUC3 expression was higher in those with metastasis (p < 0.01) and in clinical stage III-IV disease compared to I-II (p < 0.05). MUC3 was not detected in the normal gastric mucosa [13]. MUC3 showed a progressive increase in expression with pancreatic intraepithelial neoplasia of increasing dysplasia and was also highly expressed in ductal adenocarcinoma [11].
Normal lung tissues exhibited a distinct pattern of mucin gene expression, with high levels of MUC1 and low levels of MUC3 immunoreactivity and mRNA. In contrast, lung adenocarcinomas, especially well-differentiated cancers, exhibited increased MUC1 and MUC3 mRNA levels [41].
Copin et al found that coexpression of MUC3 and MUC1 was constant among lung adenocarcinomas [42].

Conclusion
We have demonstrated that, using TMA technology and a large cohort of colorectal cancer patients with robust long term follow up data, biomarkers of prognosis can be reliably assessed. Our data demonstrate a role for MUC1 in the progression of colorectal cancer, probably through its effects on cell adhesion and metastasis. MUC1 expression appears to function as an independent prognostic marker in colorectal cancer even when the conventional variables of tumour stage and vascular invasion status are included in the analysis.