Retroviruses as genetic tools to isolate transcriptionally active chromosomal regions.

By exploiting the ability of retroviruses to move genes into random sites of mammalian genomes and by exploiting some features of their replication, retrovirus vectors have been developed that select for instances in which the virus integrates into expressed genes. Since integrated proviruses tag transcriptionally active sites, the vectors provide a means to identify and isolate promoters active in different cell types. Furthermore, the viruses may be useful as insertional mutagens, since they select for instances in which integration occurs into expressed sites. This reduces the number of integrants needed to screen for loss of gene function and may enable genes controlling phenotypes in mammalian cells to be isolated.


Introduction
Like transposons in bacteria or retrotransposons in lower eukaryotes (1-6), retroviruses are genetic elements capable of moving genes into the genomes of mammalian cells. Upon integration, retroviruses may cause either a recessive loss of gene function due to gene disruption or a dominant gain of function due to transcriptional activation of adjacent genes. Integrations near cellular promoters or enhancers may also activate viral gene expression.
By exploiting the ability of retroviruses to move genes into random sites of mammalian genomes and by taking advantage of some features of their replication, retrovirus vectors have been developed that select for instances in which the virus integrates into expressed genes (7,8). After briefly reviewing some aspects of retroviral replication, the present article discusses the development and uses of retrovirus vectors to tag transcriptionally active chromosomal regions.
Retroviral genomes code for structural proteins of the virus particle (gag and env) and for enzymes (pol) found in particles (protease, reverse transcriptase, and integrase) (2,9-11).
Shortly after infection, viral RNA is converted into DNA by reverse transcriptase. Prior to integration, terminal sequences of the viral genome join and are duplicated such that the retroviral genome is flanked by long terminal repeats (LTRs), each containing the U3, R, and U5 regions. Circular forms of the viral genome, containing either one or two copies of the LTR, are also found in virus-infected cells. There is evidence that formation of circular molecules with two tandem LTRs creates cis-acting recognition sequences for the enzymes catalyzing integration. Thus, sequences containing a spleen necrosis virus LTR-LTR junction, when inserted into DNA at an internal site, were sufficient to promote integration. This suggests that circular genomes containing two LTRs can serve as integration precursors (12,13). However, several investigators have shown that, at least in vitro, linear viral DNA can integrate directly without forming a circularized intermediate (14,15). The integration steps appear similar to those observed in E. coli during phage Mu transposition, involving cleavage at the integration site, joining of the retroviral 3' ends to protruding 5' ends of the target DNA, and repair of remaining single-stranded gaps (14). Whether linear DNA also serves as an integration precursor in vivo remains to be established.
LTR sequences are maintained in the integrated retrovirus (also termed provirus) except that two nucleotides (nt) are lost from each end (2,9,10). Cellular DNA sequences are also unaltered except that upon integration, 4 to 6 nt are duplicated such that the provirus is flanked at each end by 4 to 6 bp repeats (8,10,13).
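The changes at the provirus ends described above can be illustrated with a toy string model. The Python sketch below is not part of the original work: the function name `integrate`, the example sequences, and the 4-nt duplication length are illustrative assumptions, and the model follows only one DNA strand.

```python
def integrate(target: str, viral: str, site: int, dup_len: int = 4) -> str:
    """Insert a toy provirus into `target` at position `site`.

    Two nucleotides are trimmed from each end of the viral DNA, and the
    `dup_len` target nucleotides starting at `site` end up on both sides
    of the provirus as a short direct repeat.
    """
    if not 4 <= dup_len <= 6:
        raise ValueError("dup_len must be between 4 and 6")
    if site < 0 or site + dup_len > len(target):
        raise ValueError("site out of range for this target")
    if len(viral) <= 4:
        raise ValueError("viral DNA too short to trim")

    provirus = viral[2:-2]                   # 2 nt lost from each end
    repeat = target[site:site + dup_len]     # duplicated target sequence
    return (target[:site] + repeat + provirus + repeat
            + target[site + dup_len:])


# Hypothetical sequences, chosen only to make the repeats easy to see.
host = "GGGACGTCCC"
virus = "AATGTACGCATT"
integrated = integrate(host, virus, site=3, dup_len=4)
```

In this example the provirus `TGTACGCA` is flanked on both sides by the duplicated target sequence `ACGT`, while the rest of the host sequence is unchanged.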
As a provirus, the retroviral genome is replicated with cellular DNA and transcribed as a cellular gene. Provirus transcription is controlled by promoter/enhancer sequences located in the U3 region of the 5' LTR. Transcripts initiate at the junction between U3 and R (cap site) in the 5' LTR and terminate at the R-U5 junction in the 3' LTR. RNA is synthesized by cellular RNA polymerase II and processed by the cellular enzymes. Full-length (genomic) RNA is transported from the nucleus to the cytoplasm and either packaged into virus particles that bud from the cell or translated to yield precursors of the gag and pol proteins. A fraction of the RNA is spliced to yield mRNA encoding env.
It is possible to adapt retroviruses to transduce genes into mammalian genomes. Provided that certain control sequences within the LTRs remain unaltered (16,17), much of the retroviral genome can be deleted without impairing its ability to replicate in cells that express proteins necessary for reverse transcription, integration, and particle formation. To do this, vector DNA is transfected into cell lines that contain complete retroviral genomes. The helper virus cannot assemble into particles due to a small deletion that removes sequences (termed ψ) required for assembly of viral RNA into infectious virions. Since vector DNA does not contain the ψ deletion, recombinant transcripts are packaged and expelled from the cells as virus particles (Fig. 1). In addition to ψ, gag sequences also enhance the ability of the vectors to be packaged (16,18-23).
Retroviruses appear to integrate randomly throughout the genome, although about one-fifth of all integrations involve highly preferred sites (24). Integration sometimes results in mutations that either inactivate or augment expression of genes in the vicinity of the provirus (9,25-27). Gene inactivation may be caused by insertions into exons (28-30), which interrupt open reading frames, or into introns (31,32), which alter normal splicing patterns. Activation of genes adjacent to the provirus involves transcriptional enhancement either by upstream U3 promoters or nearby U3 enhancers. By some as yet unknown mechanism, the activity of 3' LTRs is inhibited by 5' LTRs (33). Thus, in most cases where downstream cellular genes are activated by 3' LTR promoters, 5' LTRs are either rearranged or lost (34-40).

Development of a Retroviral Promoter Trap
Transcriptional activation of proviral genes has been observed in cells in which the LTR is transcriptionally inactive. In some instances activation was due to insertions near cellular promoters or enhancers (41-44). Thus, in principle it is possible to tag transcriptionally active chromosomal regions by infecting cells with retrovirus vectors that contain a selectable marker (e.g., antibiotic resistance) or reporter gene (e.g., lacZ) and selecting for recombinants in which the genes are expressed. However, several factors have undermined the practical use of retroviruses as probes for transcriptionally active chromosomal regions. First, enhancers in the LTRs may influence the expression of adjacent genes and thus interfere with the detection of cellular sequences that regulate transcription in a tissue-specific manner (17,44-47). Second, 3' RNA processing signals and AUG codons within the left-hand LTR may interfere with translation of proviral transcripts initiated by nearby cellular promoters, since it has been shown that AUG or termination codons upstream of the protein synthesis initiating codon often inhibit efficient translation and lead to aberrant translation products (48-52). For these reasons, a retroviral vector aimed at detecting cellular promoters should lack both enhancers and viral sequences between proviral genes and flanking DNA.
The strategy we have used (7,8) involved inserting a selectable marker, histidinol dehydrogenase (hisD) (53), into the 3' LTR of an enhancerless Moloney murine leukemia virus (MoMuLV) (Fig. 2). The extra sequence did not interfere with the ability of the virus to be passaged, and the his sequences were duplicated normally. Coding sequences of hisD in the 5' LTR were placed just 30 nt from the flanking cellular DNA (Fig. 3A). This removed intervening viral sequences that could interfere with transcriptional activation of provirus genes by cellular promoters. The virus also contained a second selectable marker, neomycin-phosphotransferase (neo), expressed from an internal herpes simplex virus thymidine kinase (tk) promoter, to provide an independent measure of virus infectivity (Fig. 2). Virus-producing cell lines were generated by transfecting U3His vectors into NIH 3T3 cell lines expressing packaging-defective ecotropic (ψ2) and amphotropic (PA317) helper viruses. Viruses recovered from cloned producer lines were titered on NIH 3T3 cells, selecting either with G418 (neomycin derivative) or L-histidinol. Neo titers of U3His viruses were high and similar to what we and others have obtained with other MoMuLV vectors (19,22,54-57), indicating that insertion of his sequences into the LTR did not significantly interfere with virus replication or integration. However, comparing the neomycin and histidinol titers suggested that provirus integration was 2500-fold less likely to convert cells to a histidinol-resistant (Hisr) phenotype than to a neomycin-resistant (Neor) phenotype.
In principle, the potential to express histidinol resistance could be an intrinsic but inefficient property of each provirus, or alternatively may require secondary events such as mutations or transcriptional activation by adjacent cellular sequences. Several experiments (7) suggested that the capacity to transduce histidinol resistance is neither an intrinsic property of the provirus nor the consequence of mutations: a) clones initially selected in G418 did not survive when transferred to medium containing L-histidinol, indicating that most proviruses did not confer histidinol resistance; b) the number of doubly resistant colonies produced after plating U3His infected cells in medium containing both G418 and L-histidinol was similar to the number of colonies obtained after selection in L-histidinol alone, implying that only a subset of the proviruses conferring neomycin resistance was capable of expressing histidinol resistance; c) virus titers were similar for U3His vectors with or without tk promoters, indicating that the ability to passage histidinol resistance did not require the tk promoter; d) proviruses in Hisr clones lacked gross sequence rearrangements as judged by Southern blot analysis; and e) proviruses rescued from Hisr clones after superinfection with wild-type MoMuLV did not transduce his any more efficiently (as compared with neo) than did the original U3His vectors.
To further examine why only certain proviruses expressed his, transcription of provirus sequences in Hisr and Neor clones was analyzed by Northern blot hybridization (Figs. 3A-C). The results indicated that the histidinol-resistant phenotype is conferred by transcripts initiating in the nearby provirus flanking region. Both Neor and Hisr clones expressed 4.9- and 3.3-kbp provirus transcripts, whereas only Hisr clones expressed two additional transcripts of 6.5 and 1.7 kbp (Fig. 3B). Mapping these RNAs according to their ability to hybridize to his- and neo-specific probes (Figs. 3B and C) suggested that the 4.9- and 3.3-kbp RNAs started at the 5' LTR and at the tk promoter, respectively, and terminated in the 3' LTR, whereas the 6.5- and 1.7-kb RNAs in Hisr clones appeared to initiate outside the provirus and terminate at polyadenylation sites in the 3' and 5' LTRs, respectively (Fig. 3A). The sizes of the smaller (1.7 kbp) transcripts in Hisr clones were never quite the same but varied by as much as 100 bp (Fig. 3B). This is the result one might expect if the proviruses were located at different distances from cellular promoters, and the size of each transcript depended on the amount of appended cellular RNA.
It is of interest that the steady-state levels of the 3.3 kbp tkneohis mRNA were either undetectable or significantly lower in Hisr clones than in Neor lines (Figs. 3B and C). Accordingly, Hisr clones exhibited variable resistance to G418 ranging from 2 to 2 x 10^3 colony forming units (CFU) per 10^5 cells (7). Although the mechanisms are unknown, the phenomenon may reflect suppression of the tk promoter by an upstream cellular promoter. Such suppression has been frequently observed with retrovirus vectors (58-63) and was initially described by Emerman and Temin (64,65). By using a retrovirus vector that contained two marker genes under the control of different promoters, these investigators showed that transcription of the 5' gene was suppressed when there was selection for the 3' gene and vice versa, whereby the suppressed genes produced about 10 to 50% less product than when they were selected (65).
Ribonuclease protection assays confirmed that transcripts conferring histidinol resistance are initiated in the nearby flanking region. Accordingly, labeled RNA probes complementary to the provirus coding strand protected fragments of exactly the size expected for transcripts colinear with provirus sequences extending from the ClaI site to the 5' end of the LTR (Fig. 3A). Thus, the ability of U3His viruses to convert cells to a Hisr phenotype requires that the provirus acquire a promoter from the flanking cellular DNA.
Several investigators have isolated cellular promoters or enhancers by linking random DNA fragments to the coding sequence of a selectable marker, introducing the DNA into recipient cells, and selecting for clones that result if the gene is expressed (66-70). However, the strategy has several limitations that the U3His vectors avoid. First, DNA-mediated gene transfer is less efficient than retrovirus transduction. Second, introduced genes are frequently amplified in cells surviving selection, thus increasing background and necessitating screening of multiple clones or performing secondary transfections in order to identify clones containing only one gene copy. Third, potential promoter-enhancer elements identified after DNA transfer are not expressed in their normal chromosomal locations.
Transfected enhancerless genes have been used to identify transcriptionally active chromosomal regions (71-73). In some cases expression appeared to be regulated in a tissue-specific manner (71,73). However, cloning the transcriptionally active genes by this approach is difficult because enhancers may be located at considerable distance from and on either side of the integrants (10).

Promoter Trap Vectors As Genetic Tools to Isolate Cellular Promoters
The approach we employed to isolate cellular promoters from U3His expressing cell lines involved the polymerase chain reaction (PCR) (74-77). PCR allows over 10^9 copies of a small segment of DNA to be made by repeated cycles of denaturation, annealing of specific oligonucleotide primers, primer extension, and chain elongation by a thermostable DNA polymerase (Taq polymerase). The standard technique requires oligonucleotides complementary to sequences on opposite strands and on each end of the DNA fragment to be amplified. Although in U3His provirus expressing cell lines genomic sequences are flanked only on one side by a region of known sequence, it is possible to link both ends of the flanking cellular DNA to the provirus. The procedure involves digesting cellular DNA with an enzyme that generates small enough fragments to be amplified and ligating the DNA to obtain circular molecules where both ends of the cellular DNA are flanked by the provirus (Fig. 4A) (78,79).
Synthetic oligonucleotide primers complementary to sequences on both sides of the PvuII site at position 72 of the U3His provirus were constructed that would permit extension reactions proceeding away from each other (Fig. 4A). To obtain fragments that amplify more efficiently, genomic DNA was digested to completion with HinfI, yielding an average fragment length of 800 bp. Religating HinfI protruding ends positioned 5' proviral flanking fragments between the proviral priming sites. To avoid background PCR products originating from circles formed at the 3' end of the provirus, the DNA was digested with PvuII. As shown in Figure 4A, this technique separates the priming sites onto different DNA fragments, thus precluding their geometric amplification. Such separation is less likely to occur at the left (5') end because PvuII sites are an order of magnitude less frequent than HinfI sites in mammalian DNA. PCR products from histidinol-resistant lines were different in size, as one might expect if the proviruses are located at different distances from a flanking HinfI site and the size of each segment depends on the amount of appended cellular DNA (Fig. 4B). In each case the amplified template represented the provirus-flanking region junction segment, extending between the two HinfI sites. Thus, HinfI digestion yielded two fragments: a 560 bp fragment representing the proviral region extending between the PvuII and HinfI sites and a 100 to 500 bp fragment representing the variable flanking sequence appended to the provirus region upstream of the PvuII site (Fig. 4C) (8). In some instances, because of multiple proviral integrations, more than one amplification product was observed (Fig. 4A). Several clones exhibited a 680 bp minor band reflecting amplification of a few circular molecules from the 3' end of the provirus (Fig. 4B). As the amplified flanking sequences contained between 100 and 500 nt, some are likely to contain promoter elements.
These sequences have been cloned into Bluescript vectors for further analysis.
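The logic of amplifying an unknown flank from a circularized fragment can be illustrated with a short Python sketch. It is a simplified string model and not the authors' protocol: all sequences are hypothetical, only the top strand is followed, HinfI overhangs and the PvuII step are ignored, and the names `digest` and `inverse_pcr` are introduced here for illustration. The only enzyme fact used is the HinfI recognition site (GANTC, cut after the G).

```python
import re
from typing import List, Optional

HINFI = re.compile(r"GA[ACGT]TC")  # HinfI recognition site, cut G^ANTC


def digest(dna: str) -> List[str]:
    """Cut a linear top strand after the G of every HinfI site."""
    cuts = [m.start() + 1 for m in HINFI.finditer(dna)]
    edges = [0] + cuts + [len(dna)]
    return [dna[a:b] for a, b in zip(edges, edges[1:])]


def inverse_pcr(fragment: str, left_site: str, right_site: str) -> Optional[str]:
    """Amplify around a self-ligated (circular) fragment.

    `left_site` and `right_site` are the top-strand priming sites, with the
    primers pointing away from each other. The product runs from the right
    site, across the ligation junction, through the unknown flank, and back
    to the left site. Returns None if the sites are missing or misordered.
    """
    i = fragment.find(left_site)
    j = fragment.find(right_site)
    if i < 0 or j < i + len(left_site):
        return None
    return fragment[j:] + fragment[:i + len(left_site)]


# Hypothetical toy locus: cellular DNA, then the 5' end of a provirus.
flank = "CCATGGCACT"                 # unknown cellular sequence to recover
left_site, right_site = "AAAGGG", "TTTCCC"
genome = ("TTTTGAATC" + flank        # cellular DNA with a HinfI site
          + left_site + "CAGCTG" + right_site   # provirus, PvuII site between primers
          + "AAGACTC" + "GGGG")      # HinfI site inside the provirus

fragments = digest(genome)
circle = next(f for f in fragments if left_site in f and right_site in f)
product = inverse_pcr(circle, left_site, right_site)
```

In this toy case the product contains the cellular flank between the two proviral priming sites, which mirrors how the flanking sequence becomes accessible to primers derived only from the provirus.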

U3lacZ Vectors As Genetic Tools to Analyze Patterns of Gene Regulation
Although U3His vectors provide a means to isolate promoters active in different cell types, the vectors select for promoters that express his at levels sufficient to confer resistance. This precludes detection of promoters inactive in the target cells. For these reasons, it seemed important to develop a vector that carries a reporter gene instead of a selectable marker in U3. The strategy involved replacing hisD sequences in U3 of GgTKNeoHisU3en(-) by the E. coli lacZ gene, a reporter gene whose expression is monitored by histochemical staining. The resulting GgTKNeoU3lacZen(-) vector was transfected into ψ2 producer cells, and the viruses recovered from cloned producer lines were titered on NIH 3T3 cells by selecting in G418. Titers of GgTKNeoU3lacZen(-) were high and similar to U3His titers, indicating that the insertion of the 3 kbp lacZ gene into the LTR did not interfere with virus replication or integration (S. Reddy, H. von Melchner, and H. Ruley, in preparation). Southern blot analysis of several U3lacZ infected NIH 3T3 clones revealed that the elongated LTR can duplicate, thus placing lacZ sequences 30 nt downstream of the flanking cellular DNA (S. Reddy, H. von Melchner, and H. Ruley, in preparation). Approximately 0.5% of Neor clones express lacZ on transcripts initiating in flanking cellular sequences, suggesting that U3lacZ vectors may be used to isolate promoters regulated during different stages of cellular development.

U3His Vectors As Insertional Mutagens
Because selection for U3His expression reduces the size of the integration target to genomic sequences immediately downstream of cellular promoters, U3His vectors may be effective insertional mutagens. It is possible to estimate the maximum number of integration sites that enable NIH 3T3 cells to express histidinol resistance. The total integration target for neo transduction cannot exceed the size of the genome (3 x 10^9 bp). Because integration was about 2500-fold less likely to confer histidinol resistance than neomycin resistance, the integration target for his is approximately 10^6 nt. Integrations that activate his are also likely to occur near transcriptional start sites to avoid appending AUG codons upstream of his coding sequences. In genes, the distance from the transcriptional start site to the first AUG averages 50 to 100 nt (50). Dividing the maximum integration target by the size of the average integration site yields a maximum of 1 x 10^4 to 2 x 10^4 integration sites capable of expressing his at levels sufficient to confer resistance.
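The arithmetic behind this estimate can be written out explicitly. The short Python sketch below uses only the figures given in the text (genome size, the 2500-fold titer difference, and the 50 to 100 nt window); the variable and function names are illustrative, and the rounding of the target to 10^6 nt follows the text.

```python
GENOME_BP = 3_000_000_000      # upper bound on the neo integration target
NEO_TO_HIS_RATIO = 2500        # his transduction is 2500-fold less frequent

# Integration target compatible with his expression (unrounded).
his_target_nt = GENOME_BP // NEO_TO_HIS_RATIO

# The text rounds this value to approximately 10^6 nt.
ROUNDED_TARGET_NT = 1_000_000


def max_sites(target_nt: int, window_nt: int) -> int:
    """Maximum number of non-overlapping integration windows in the target."""
    return target_nt // window_nt


# Window = average distance from transcriptional start site to first AUG.
low_estimate = max_sites(ROUNDED_TARGET_NT, 100)
high_estimate = max_sites(ROUNDED_TARGET_NT, 50)
```

Without rounding, the target is 1.2 x 10^6 nt and the same division gives 1.2 x 10^4 to 2.4 x 10^4 sites, so the rounding does not change the order of magnitude.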
Although this calculation probably overestimates the number of integration sites, the value is still small enough to suggest that U3His vectors may be used as insertional mutagens. Thus, the provirus would serve as a molecular tag to clone any gene whose function is linked to an observable phenotype, assuming that the gene is expressed at levels sufficient to promote histidinol resistance. Cells derived from a collection of 10^4 to 10^5 independent Hisr clones should contain proviruses in all expressed sites including transcriptionally active genes. Consequently, selection for Hisr cell populations should reduce the number of integrants needed to screen for the loss of gene function. Isolation of clones expressing null phenotypes may be improved by the use of hypodiploid cells (e.g., CHO) and in cases where gene inactivation yields a selectable phenotype (e.g., tumor suppressing genes or antioncogenes). Alternatively, homozygous loss of gene function may be accomplished by breeding mice that are derived from embryonal stem cells infected with U3His vectors and that contain germ line integrations of the provirus.

Conclusions
By tagging transcriptionally active genes, promoter trap vectors provide a means to isolate promoters active in different cell types. While U3His viruses may be used to identify and clone genes that are differentially expressed in cells generating observable phenotypes during development (e.g., embryonal or hemopoietic stem cells), U3lacZ vectors may reveal, in addition, patterns of gene regulation that accompany alterations in cellular phenotypes (e.g., differentiation or malignant transformation). Sequences flanking the 5' ends of his or lacZ expressing proviruses may be amplified by the polymerase chain reaction, cloned, and used as probes to isolate cDNA clones derived from transcripts of genes expressed during distinct stages of cellular development.
Finally, U3His vectors may make effective insertional mutagens because they select for instances in which integration occurs into expressed sites. This reduces the number of integrants needed to screen for loss of gene function and may enable genes controlling phenotypes in mammalian cells to be isolated.