Toxicogenomics through the eyes of informatics: conference overview and recommendations.

The Virginia Bioinformatics Institute, in conjunction with the National Institute of Environmental Health Sciences, hosted a conference, "Toxicogenomics through the Eyes of Informatics," in Bethesda, Maryland, USA, on 12-13 May 2003. Researchers from around the world met to discuss how the application of bioinformatics tools, methodologies, and technologies will enhance our understanding of how cells and organisms respond to toxins. Conference topics included statistical methods, quantitative molecular data sets, computational algorithms for data analysis, computational modeling and simulation, challenges and opportunities in computational biology, and information technology infrastructure for data and tool management. This meeting report is a summary of conference presentations, survey results, current toxicogenomics concerns, and future directions of the toxicogenomics community. In conclusion, this report discusses toxicogenomics as related to environmental agents, cell-chemical reactions, and gene-environment interactions.


Background
Recombinant proteins are widely used in medicine as diagnostic reagents, drugs or vaccines. Moreover, the rapid discovery of new pharmaceutical proteins leads to an increased demand for their production [1]. To date, production of recombinant proteins mainly relies on microbial fermentation or on insect and mammalian cell cultures. These systems allow highly controlled manufacturing procedures essential for product quality. However, they have disadvantages in terms of cost and scalability. Pathogen contamination of animal cell cultures also represents an important safety issue.
Plants are now being recognised as an alternative system for the production of recombinant proteins. Plant expression systems allow large scale production of recombinant proteins with accurate folding and assembly of protein complexes [2][3][4][5]. Importantly, plant systems offer the possibility of lowering production costs by a factor of 10 to 100 compared to traditional systems [1,6]. Chloroplasts have proven a useful cellular compartment for protein accumulation owing to their large size and number. Moreover, transplastomic plants (produced by introducing DNA into the chloroplast genome) enable high yields of recombinant proteins due to a high transgene copy number and limited epigenetic phenomena [7,8]. Contamination of wild and crop species by pollen flow is also largely circumvented by organellar transformation [7]. Important issues for industrial production of plant-derived recombinant proteins are extraction and purification [9]. Standard protocols include homogenisation of plant biomass followed by chromatographic methods. However, the high abundance of secondary compounds, especially in tobacco, is problematic for chromatographic procedures. Therefore, a cost-effective preliminary (or alternative) purification step is required [6].
The chloroplast has several interior compartments: the envelope membranes, the thylakoid membranes and lumen, the stroma and plastoglobules. So far, the stroma has been the major target for protein accumulation in plastids [8]. Plastoglobules are low density lipoprotein particles attached to the thylakoid membranes [10] (see Fig. 1) and coated with proteins of the PAP-fibrillin family [11][12][13]. The recently determined proteome [14,15] of Arabidopsis plastoglobules identified eight PAP-fibrillin proteins ("plastoglobulins").
In this study, we examined the sequence requirement for correct targeting of the plastoglobulin of 34 kDa (PGL34, [AGI: At3g58010]). We then used the genomic sequence of PGL34 to target a fluorescent protein (YFP) to plastoglobules and took advantage of the low density of plastoglobules for rapid enrichment of the recombinant protein by flotation centrifugation, YFP allowing detection throughout the procedure.

Sequence requirement for targeting of PGL34-GFP to plastoglobules
In the absence of data on the sequence requirement for plastoglobule targeting, we designed a series of C-terminal GFP fusion constructs that removed portions of the PGL34 coding sequence (Fig. 2A). PGL34 is a representative member of the Arabidopsis PAP-fibrillin family.
Three domains with somewhat higher hydrophobic scores can be recognised in the Kyte and Doolittle hydropathy plot of PGL34 ([16], Fig. 2A). These domains comprise residues 80-94 (H1), 142-161 (H2) and 273-282 (H3). Furthermore, any domain responsible for targeting or anchoring plastoglobulins to plastoglobules may be conserved among the different members of the PAP-fibrillin family. Sequence alignments (not shown) revealed that a central domain in PGL34 (residues 103 to 132, Fig. 2A) is conserved among the PAP-fibrillin family.

Figure 1 Plastoglobule lipoprotein particles in chloroplasts. A. Schematic diagram of a chloroplast. The organelle is delimited by a double membrane system (envelope). The interior of chloroplasts comprises the aqueous stroma, the thylakoid membranes (green) and starch granules (yellow). Plastoglobules (black circles) are lipid particles associated with thylakoids. B. Transmission electron micrograph of an Arabidopsis chloroplast. Plastoglobules are visible as dark round bodies after postfixation with osmium tetroxide.

Figure 2 Transient expression of truncated PGL34-GFP fusions in protoplasts. A. DNA constructs encoding fragments of or full length AtPGL34 coding sequence fused to GFP. The transit peptide of PGL34 is shaded. The Kyte and Doolittle hydropathy plot of PGL34 is shown and domains with higher hydropathy scores (H1-H3), as well as a domain conserved among PAP-fibrillin proteins (black bar), are indicated. B. Fluorescence of the GFP fusion proteins (GFP) was detected in transformed protoplasts by confocal laser scanning microscopy. Arrows indicate strong GFP signals overlapping with weak chlorophyll autofluorescence signals (chlorophyll). Merge: overlap of chlorophyll and GFP signals. Scale bars: 5 μm. C. Detection of GFP fusion proteins in transformed protoplasts by immunoblotting using anti-GFP antibodies. D. Cotransformation of protoplasts with full length and truncated PGL34. Protoplasts coexpressing PGL34-CFP and PGL34 1-133 -YFP, PGL34 1-170 -YFP or PGL34 1-290 -YFP were analysed by confocal microscopy. CFP fluorescence (CFP) and YFP fluorescence (YFP) were monitored sequentially using distinct excitation wavelengths and detection windows. Chlorophyll: chlorophyll autofluorescence, merge: superposition of YFP and CFP signals (green and red pseudocolours, respectively). Bar length: 5 μm.

Both homology and hydropathic criteria were therefore taken into account for the choice of the deletion constructs. The shortest construct (PGL34 1-56 -GFP) only comprised 3 residues in addition to the predicted transit peptide and was designed as a stromal control. The second construct, comprising the amino acids 1-133 (PGL34 1-133 -GFP), contained the H1 domain as well as the conserved central domain. The construct PGL34 1-170 -GFP additionally included the H2 motif. The fourth construct (PGL34 1-290 -GFP) comprised H1, H2 and H3 motifs, as well as the conserved domain. An additional N-terminal deletion construct (PGL34 1-56..134-308 -GFP), lacking residues 57-133, contained the H2 and H3 motifs but not H1 and the central domain.
As shown in Fig. 2B, PGL34 1-56 -GFP and PGL34 1-133 -GFP gave diffuse signals broadly overlapping with the autofluorescence of the chlorophyll. In several chloroplasts, however, peak GFP and chlorophyll fluorescence did not overlap, suggesting that both fusion proteins localised in the stroma. In contrast, when the PGL34 1-290 -GFP construct was expressed in protoplasts, the fusion protein localised to small punctate structures, similarly to the full length protein. A distinct pattern was observed with the PGL34 1-170 -GFP and PGL34 1-56..134-308 -GFP constructs: fewer and larger fluorescent spots were observed in chloroplasts, suggesting mistargeting or protein aggregation. The same observation was made with constructs comprising residues 1-215 or 1-255 of PGL34 (not shown).
To verify the integrity of each GFP-fusion protein, transformed protoplasts were analysed by Western blotting using an anti-GFP serum (Fig. 2C). With the exception of PGL34 1-56..134-308 -GFP, which could not be detected due to low expression levels, all GFP-fusion proteins migrated at the expected mass. Mistargeting (or aggregation) of PGL34 deletion constructs implies that the corresponding fusion proteins do not colocalise with full-length PGL34. We therefore cotransformed protoplasts with PGL34-CFP and PGL34 deletion constructs fused to YFP (Fig. 2D). As revealed in merged fluorescent images, CFP and YFP signals did not overlap in protoplasts expressing PGL34 and PGL34  . In contrast, PGL34-CFP and PGL34 1-290 -YFP colocalised, indicating that the short C-terminal hydrophilic part of the protein is dispensable for targeting. The strong punctate signals observed in protoplasts expressing PGL34  were not labelled with PGL34-CFP, confirming mistargeting and/or aggregation of the truncated protein.

Transgenic plants expressing YFP fused at the C-terminus of PGL34
The results obtained with PGL34-GFP deletion constructs suggested that most of the protein sequence is required for plastoglobule targeting. To address the potential of plastoglobule targeting for protein purification, Arabidopsis plants were transformed with the coding sequence of YFP replacing the stop codon in the genomic sequence of PGL34 (PGL34 g -YFP, Fig. 3A). The chimeric coding sequence was placed under the control of the PGL34 promoter and terminator in order to minimise the risk of gene silencing. Eleven primary transformants (T1) were isolated and homozygous lines containing a single insertion locus were selected by segregation analysis (not shown). Expression of PGL34 g -YFP in these lines was analyzed using both Western blotting and fluorescence microscopy. A wide range of transgene expression levels was observed (Fig. 3B), possibly reflecting positional effects of the T-DNA insertions [17]. A double band was detected in the transgenic plant extracts, the upper band possibly reflecting partial cleavage of the chloroplast targeting sequence.
The T2 line 5.2, accumulating the highest level of recombinant protein, was chosen for subsequent analysis. The concentration of recombinant protein in crude extracts was quantified by comparing immunoblot signals with serial dilutions of purified GFP produced in Escherichia coli (Fig. 3C). PGL34-YFP was calculated to account for approximately 0.2% (w/w) of total leaf proteins.
The strongest YFP fluorescence was detected in leaves (Fig. 3D), whereas roots showed comparatively lower YFP signals, consistent with the expression pattern of PGL34 determined by microarray analysis [18]. A detailed picture of PGL34-YFP signals was obtained by confocal microscopy analysis of transgenic leaves (Fig. 3E). Punctate signals were observed in epidermal as well as in mesophyll cells. YFP patterns were similar to those observed in protoplasts transiently expressing PGL34-YFP (Fig. 3B), indicating targeting of the YFP cargo to plastoglobules in planta.

Phenotype of plants accumulating PGL34-YFP
Transgenic plants accumulating PGL34-YFP were indistinguishable from the wild-type when grown in short or long-day conditions (Fig. 4A). The germination rate of transgenic seeds was equal to wild type (Pearson Chi-Square test, n = 358, p = 0.33). No difference in fresh weight was observed between wild type and transgenic plants (Fig. 4C), indicating that growth was not altered in plants overexpressing PGL34 g -YFP.
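The germination comparison above relies on a Pearson chi-square test. As a minimal sketch of how such a test works for a 2x2 table, the snippet below computes the statistic and its p-value with one degree of freedom. The counts are hypothetical placeholders chosen only so that they sum to n = 358; the per-genotype counts are not given in the text, so the output is not expected to reproduce the reported p = 0.33.

```python
import math


def pearson_chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of genotype and germination.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# HYPOTHETICAL counts (germinated, not germinated); only the total n = 358 is from the text.
hypothetical_counts = [
    [170, 9],   # wild type
    [166, 13],  # transgenic
]
chi2, p = pearson_chi_square_2x2(hypothetical_counts)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

A p-value well above 0.05, as reported in the text, means the data give no evidence that germination rates differ between genotypes.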
Under sub-optimal photosynthetic conditions, photosystem II (PSII) is subject to photoinhibition, characterised by a decrease of the maximum PSII quantum efficiency (Fv/Fm, [19]). Fluorometry revealed similar Fv/Fm values in transgenic and wild type leaves (Fig. 4B), ruling out major inhibitory effects of plastoglobule targeting on photosynthesis.
Expression of PGL34 g -YFP in planta
SDS-PAGE analysis of the low-density fractions revealed a prominent band with an apparent mass of 55 kDa, consistent with the predicted mass of PGL34-YFP (28.3 kDa for processed PGL34 + 26.9 kDa for YFP). This band was recognized by anti-GFP antibodies (Fig. 5B). Similar to immunodetection on total leaf extracts (Fig. 3A), an additional faint band, migrating at 57 kDa, was also recognized by the antibody. Analysis by tandem mass spectrometry confirmed that both bands contained PGL34-YFP (not shown). The upper band may reflect partial cleavage of the chloroplast targeting sequence or post-translational modifications of PGL34-YFP. However, mass spectrometric analysis of tryptic peptides derived from the two protein bands yielded no additional evidence with regard to the two possibilities.
To estimate the enrichment factor resulting from the gradient flotation, the amount of recombinant protein in the 5% sucrose step (fraction no. 6 of the gradient) was determined by comparing immunoblot signals with a dilution series of purified GFP (not shown). PGL34-YFP accounted for about 50% of the protein content. Based on this and the PGL34-YFP concentration in leaves (0.2% w/w), the gradient flotation step led to an approximately 250-fold enrichment of the recombinant protein.
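The enrichment factor follows directly from the two percentages quoted in the text. The short sketch below makes the arithmetic explicit; the function and variable names are illustrative and do not come from the original work.

```python
def fold_enrichment(share_after, share_before):
    """Ratio of the target protein's share of total protein after vs. before a step."""
    if share_before <= 0:
        raise ValueError("share_before must be positive")
    return share_after / share_before


# Values quoted in the text, as percent (w/w) of total protein.
percent_in_leaf_extract = 0.2   # PGL34-YFP in total leaf protein
percent_in_fraction_6 = 50.0    # PGL34-YFP in the low-density gradient fraction

enrichment = fold_enrichment(percent_in_fraction_6, percent_in_leaf_extract)
print(f"approximately {round(enrichment)}-fold enrichment")
```

Because both inputs are immunoblot-based estimates, the result is best read as an order-of-magnitude figure rather than a precise value.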

PGL34 homologues in cultivated plant species
To determine whether proteins similar to PGL34 are present in cultivated plant species, sequence homology searches were performed. As shown in Fig. 6, protein sequences with significant BLAST hits were found in monocots and in a broad range of dicot taxa. Pairwise comparisons performed with the MatGAT software [20] revealed that the protein sequences shown in Fig. 6 were at least 40% similar. Arabidopsis PGL34 shared for example 79% similarity and 63% identity with the protein sequence deduced from a tobacco EST assembly. These data indicate that proteins from the PAP/fibrillin family are highly conserved in the plant kingdom.

Purification of PGL34-YFP by flotation centrifugation

Figure 6 Presence of PGL34 homologues in crop species. Sequence alignment of proteins similar to PGL34. Amino acid sequences were aligned using ClustalW. Transit peptides were predicted using the TargetP software and removed from the protein sequences. The predicted transit peptide length of PGL34 was 53 amino acids. The alignment was formatted with the

Sequence requirement for protein targeting to plastoglobules
The presence of proteins associated with plastoglobules implies that mechanisms ensuring protein assembly in plastoglobules must exist in plastids. However, nothing is known regarding these mechanisms and plastoglobulins do not share conserved sequence motifs with other plant or animal lipid body proteins. Several types of proteins associated with lipid bodies have been described in prokaryotic and eukaryotic cells (reviewed in ref. [21]). They have highly diverse physicochemical properties and topologies, reflecting various modes of association with the lipidic structures. In desiccation-tolerant seeds, oil bodies are coated with oleosins [22]. A 72-residue central hydrophobic domain in the proteins, often referred to as the "proline knot motif", is essential for their association with oil bodies [23][24][25]. In mammalian cells, under certain conditions, caveolins accumulate at the surface of cytoplasmic lipid droplets. Using deletion constructs, Ostermeyer et al. [26] demonstrated the requirement of a hydrophobic domain for lipid droplet targeting of caveolin-1. Hydrophobic domains were also shown to play important roles in targeting and anchoring perilipin to lipid droplets in adipocytes [27]. Whereas several lipid body proteins are characterised by hydrophobic domains, others lack large apolar regions. Adipophilin (also termed Adipocyte Differentiation-related Protein, ADRP), which localises to the periphery of cytosolic lipid bodies in mammalian cells, has no obvious lipid-binding motif (hydrophobic domains or amphipathic α-helices; [21]) and discontinuous stretches of the protein are necessary for targeting to lipid bodies [28].
Although plastoglobulins associate with lipid bodies, their overall amino acid composition is not hydrophobic (Grand average of hydropathicity (GRAVY) index = -0.173 for PGL34) and it is apparent from hydropathy plots that the proteins, in contrast to oleosins, lack a strongly hydrophobic domain (see Fig. 2A). Therefore, association of PGLs with plastoglobules may rather rely on interactions with surface lipids, as proposed by Kim et al. [29], similarly to ADRP [21]. Based on our observation that almost the complete sequence of PGL34 is necessary to maintain proper targeting (Fig. 2), we propose that correct folding of PAP/fibrillins, rather than a sequence determinant, is required for assembly in plastoglobules.
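For readers unfamiliar with the hydropathy measures used here, the following sketch shows how a GRAVY index and a Kyte and Doolittle sliding-window profile are computed from a protein sequence. The scale values are the published Kyte and Doolittle values. The window size of 9 and the toy peptide are assumptions for illustration only; the peptide is not PGL34, and the window used for Fig. 2A is not stated in the text.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter code).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean Kyte-Doolittle value over all residues."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def hydropathy_profile(sequence, window=9):
    """Mean hydropathy in each sliding window (window size is an assumed default)."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    return [gravy(seq[i:i + window]) for i in range(len(seq) - window + 1)]


# Hypothetical toy peptide, NOT the PGL34 sequence.
toy_peptide = "MKTLLIVAGSDEEKRLLVVAAGG"
print(f"GRAVY = {gravy(toy_peptide):.3f}")
print([round(value, 2) for value in hydropathy_profile(toy_peptide)])
```

A negative GRAVY value, such as the -0.173 reported for PGL34, indicates a protein that is hydrophilic on average, while peaks in the windowed profile mark locally hydrophobic stretches such as H1 to H3.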

Similarities between plastoglobules and seed oil-bodies
A successful technology for production and recovery of recombinant proteins expressed in seeds from oilseed plants has recently been developed at SemBioSys [30,31]. In the so-called Stratosome™ system, proteins of interest fused to an oleosin moiety are targeted to oilbodies and recovered by subsequent centrifugation steps [32]. Biologically active human insulin has notably been produced from Arabidopsis seeds using this system [33].
Plastoglobules and seed oilbodies share similarities. They are of low density due to their high lipid content and contain only a few different proteins [14,15,34], both properties being advantageous for purification by density partitioning. Whereas oleosin-coated lipid bodies accumulate predominantly in the cell cytoplasm of oleaginous seeds, in tapetum cells and in pollen grains [22], plastoglobules are ubiquitously found in all types of plastids [35], notably in leaf chloroplasts. In high-biomass leaf crops such as tobacco, plastoglobule targeting would allow downstream processing of leaf material similar to the oilbody-oleosin system in seeds.

Future perspectives
In this proof-of-concept study, a strong enrichment in PGL34-YFP was achieved. We expect that the yield of plastoglobulin-fusion proteins will be readily improved in follow-up studies using tissues accumulating plastoglobules or related fibrils, such as fruits or senescing leaves [36][37][38], and/or using plastid genetic engineering [7]. The plastoglobule-plastoglobulin system is expected to be generally useful, since plastoglobulins as well as plastoglobules are highly conserved throughout the plant kingdom and many homologues have been identified in genome and EST sequencing projects (Fig. 6). In particular, leafy crop plants such as tobacco may be ideal to exploit the potential of the plastoglobule targeting system in molecular farming approaches.

Conclusion
The data presented in this paper demonstrate the potential of protein accumulation in plastoglobules. Sequestration of recombinant proteins in the particles may limit deleterious effects on photosynthetic light and dark reactions. Plastoglobule accumulation combines the advantages of chloroplast localization with, as for seed oil-bodies, a simple enrichment step prior to standard chromatographic purification.

DNA vectors for plant transformation
The complete coding sequence of PGL34 (At3g58010), excluding the stop codon, was amplified by PCR from a cDNA clone (U15686) obtained from the Arabidopsis Biological Resource Center (ABRC, [39]). Forward (5'-cat gcc ATG GCA TTG ATC CAA CAT GG-3') and reverse (5'-cat gcc atg gcA CTG TTG TAT TCA AGA TTC TCT ACA AC-3') primers included NcoI sites. The PCR product was ligated in the NcoI site of either pCL60 [40] or pCL62 [14], resulting in C-terminal GFP or CFP fusions respectively, under the control of the CaMV 35S promoter and the nos terminator.
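As a quick sanity check on the primer design described above, the sketch below locates the NcoI recognition sequence (CCATGG) in both primers as they are printed in the text. The helper names are illustrative. Note that the ATG inside the site of the forward primer is the first codon of the upper-case, gene-specific part of that primer.

```python
NCOI_SITE = "CCATGG"  # NcoI recognition sequence (palindromic)

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def normalise(primer):
    """Remove the spacing used in the text and convert to upper case."""
    return "".join(primer.split()).upper()


def reverse_complement(sequence):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(normalise(sequence)))


def find_site(primer, site=NCOI_SITE):
    """0-based index of the first occurrence of the site, or -1 if absent."""
    return normalise(primer).find(site)


# Primer sequences copied from the text (5' to 3').
FORWARD = "cat gcc ATG GCA TTG ATC CAA CAT GG"
REVERSE = "cat gcc atg gcA CTG TTG TAT TCA AGA TTC TCT ACA AC"

for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
    print(name, "NcoI site at index", find_site(primer))
```

Because CCATGG is its own reverse complement, the same site is present on both strands of the amplified product, which is what allows the fragment to be ligated into a single NcoI site.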

Transient and stable A. thaliana transformation
Transient transformation of protoplasts was done using the polyethylene glycol method as described in ref. [41], but reducing cellulase and macerozyme (Serva) concentrations to 1% and 0.25% (w/v) respectively. Fluorescence in transformed protoplasts was monitored 48 h after transformation by confocal laser scanning microscopy. GFP was detected using the FITC (488 nm) laser line from a LEICA TCS 4D microscope (LEICA Microsystems). For double fluorescent experiments, CFP and YFP were detected sequentially with a LEICA SP2 AOBS microscope, using 458 and 514 nm laser lines and 460-510 nm and 520-588 nm detection windows, respectively. Chlorophyll autofluorescence was monitored using either 594 nm or TRITC (568 nm) excitation wavelengths.
Stable transformation of A. thaliana plants with pCAMBIA3300:PGL34-YFP was carried out using the floral dip method as described [42,43]. Transformants were selected on plates containing phosphinothricin.

Protein extraction and immunoblot analysis
Proteins were isolated from Arabidopsis leaves according to ref. [44]. Proteins (25 μg) were concentrated by chloroform-methanol precipitation [45], separated by SDS-PAGE and blotted onto nitrocellulose membranes. Membranes were stained with amidoblack (= naphthol blue black) for total protein detection or used for immunodetection. Blots were probed with anti-GFP (gift from Dr. E. Schäfer) and anti-CAB (kindly provided by Dr. K. Apel) sera, or with affinity-purified antibodies specific for TOC75 [46] and PGL35 [14]. For quantification, chemiluminescent immunoblot signals were analysed using a Bio-Rad ChemiDoc XRS system.

Gradient flotation centrifugation of chloroplast membranes
Leaves from Arabidopsis plants grown on soil for 4 weeks were harvested and immersed in tap water in the dark at 4°C for 30 min. Leaf material was homogenized in HB buffer (450 mM sorbitol, 20 mM Tricine/KOH pH 8.4, 10 mM EDTA, 10 mM NaHCO3, 1 mM MnCl2) with a Waring blender and filtered through cheesecloth and Miracloth. Chloroplasts were sedimented (2 min, 700 × g), washed with TrE (50 mM Tricine/HCl pH 7.5, 2 mM EDTA) and hypertonically lysed for 10 min in TrE + 0.6 M sucrose supplemented with 0.5% (v/v) protease inhibitor cocktail (Sigma P9599). The lysate was frozen at -80°C, thawed on ice, diluted 3 times with TrE buffer and homogenised with a Potter homogeniser. Total membranes, corresponding to 10 mg of chlorophyll, were sedimented at 100,000 × g and resuspended in 3 ml 45% sucrose in TrE buffer. Membranes were overlaid with a discontinuous sucrose gradient consisting of 2 ml 38% sucrose, 2 ml 20% sucrose, 1.4 ml 15% sucrose and 2.7 ml 5% sucrose in TrE buffer and centrifuged for 17 h at 100,000 × g and 4°C (SW41Ti rotor, Beckman). Fractions of 0.5 ml were collected starting from the top of the gradient and used for confocal microscopy analysis or Western blotting (see above).
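To relate fraction numbers to the sucrose steps of this gradient, the following sketch maps each 0.5 ml fraction, counted from the top, onto the nominal layer volumes listed above. It assumes ideal, unmixed layers and exact fraction volumes, which a real gradient after 17 h of centrifugation will only approximate; the function name is illustrative.

```python
# Layers from top to bottom as (sucrose %, volume in ml), nominal values from the protocol.
LAYERS_TOP_DOWN = [(5, 2.7), (15, 1.4), (20, 2.0), (38, 2.0), (45, 3.0)]
FRACTION_VOLUME_ML = 0.5


def layers_in_fraction(fraction_number, layers=LAYERS_TOP_DOWN,
                       fraction_ml=FRACTION_VOLUME_ML):
    """Return {sucrose %: ml} contributed by each layer to a 1-based fraction from the top."""
    if fraction_number < 1:
        raise ValueError("fraction_number is 1-based")
    start = (fraction_number - 1) * fraction_ml
    end = start + fraction_ml
    contributions = {}
    layer_top = 0.0
    for percent, volume in layers:
        layer_bottom = layer_top + volume
        # Overlap between the fraction and this layer, measured in ml from the top.
        overlap = min(end, layer_bottom) - max(start, layer_top)
        if overlap > 1e-9:
            contributions[percent] = round(overlap, 3)
        layer_top = layer_bottom
    return contributions


total_ml = sum(volume for _, volume in LAYERS_TOP_DOWN)
print(f"total gradient volume: {total_ml:.1f} ml")
for number in range(1, 8):
    print(number, layers_in_fraction(number))
```

Under these nominal volumes, fraction 6 covers 2.5 to 3.0 ml from the top and therefore straddles the boundary between the 5% and 15% sucrose steps.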

Fluorometry
Maximum quantum efficiency of photosystem II (Fv/Fm) was measured using a Handy Plant Efficiency Analyser chlorophyll fluorometer (Hansatech Instruments, Norfolk, UK). Detached leaves from 4-week-old plants were dark-adapted for 20 min prior to measurements.
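For reference, the Fv/Fm parameter is conventionally derived from two fluorescence readings on a dark-adapted leaf. The definition below is the standard one and is not spelled out in the text; the symbols F_0 (minimal fluorescence) and F_m (maximal fluorescence after a saturating light pulse) are the usual notation.

```latex
\frac{F_v}{F_m} = \frac{F_m - F_0}{F_m}
```

A drop in this ratio relative to the wild type would indicate photoinhibition of PSII.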

Authors' contributions
PV carried out most of the cloning work, transient and stable plant transformation, as well as the microscopic and biochemical analyses and helped to draft the manuscript. CB participated in the elaboration of the experimental design and did the sequence analysis. FK conceived the study, drafted the manuscript and was responsible for the research program. All authors read and approved the final manuscript.