In addition to protein-coding genes, mammalian pol II (RNA polymerase II) transcribes independent genes for some non-coding RNAs, including the spliceosomal U1 and U2 snRNAs (small nuclear RNAs). snRNA genes differ from protein-coding genes in several key respects and some of the mechanisms involved in expression are gene-type-specific. For example, snRNA gene promoters contain an essential PSE (proximal sequence element) unique to these genes, the RNA-encoding regions contain no introns, elongation of transcription is P-TEFb (positive transcription elongation factor b)-independent and RNA 3′-end formation is directed by a 3′-box rather than a cleavage and polyadenylation signal. However, the CTD (C-terminal domain) of pol II closely couples transcription with RNA 5′ and 3′ processing in expression of both gene types. Recently, it was shown that snRNA promoter-specific recognition of the 3′-box RNA processing signal requires a novel phosphorylation mark on the pol II CTD. This new mark plays a critical role in the recruitment of the snRNA gene-specific RNA-processing complex, Integrator. These new findings provide the first example of a phosphorylation mark on the CTD heptapeptide that can be read in a gene-type-specific manner, reinforcing the notion of a CTD code. Here, we review the control of expression of snRNA genes from initiation to termination of transcription.
- C-terminal domain
- RNA polymerase II (pol II)
- RNA processing
- small nuclear RNA (snRNA)
The vertebrate snRNA (small nuclear RNA) genes transcribed by pol II (RNA polymerase II) are ubiquitously and generally highly expressed, providing an ideal model system to elucidate both fundamental and gene-type-specific mechanisms underlying transcription and co-transcriptional RNA processing.
The structure of human snRNA genes transcribed by pol II
The human snRNA genes transcribed by pol II are exemplified by the best-characterized genes encoding the spliceosomal U1 and U2 snRNAs. These genes are repeated many times in the human genome and, in the case of the U2 snRNA genes, are located within 6.1 kb tandem repeats on chromosome 17. The U1 genes are more loosely clustered, within repeat units of up to 45 kb on chromosome 1. The genes are simple and compact, with short TATA-less promoters minimally containing an enhancer-like DSE (distal sequence element) and an essential snRNA gene-specific PSE (proximal sequence element). snRNA gene transcripts are not spliced and the 3′-ends are not polyadenylated, presumably to discourage association with the translation machinery. The snRNA gene-specific 3′-box, located 9–19 bp downstream of the RNA-encoding region, is required for correct 3′-end formation of snRNAs. A diagram of the major cis-acting elements is shown in Figure 1. The primary transcript of the U1 genes extends to just beyond the 3′-box, whereas the primary transcript of the U2 genes extends for up to 1 kb [5,6] (Figure 1), most likely because of differences in transcription termination (see below).
snRNA gene promoters recruit gene-type-specific pre-initiation complexes (Figure 2)
Several factors that activate transcription of both mRNA and snRNA genes (e.g. Oct-1 and Staf) bind to sites in the DSE. Oct-1 potentiates interaction of the multisubunit snRNA gene-specific transcription factor, PTF (PSE-binding transcription factor)/SNAPc (snRNA activator protein complex)/PBP (PSE-binding protein), with the PSE [3,7–9]. In common with mRNA genes, the pol II-dependent snRNA genes require the general factors TBP (TATA-box-binding protein), TFIIB (transcription factor IIB), TFIIA, TFIIE and TFIIF for transcription in vitro [3,10]. A more complete characterization of the transcription machinery has been hampered by the relatively low level of in vitro transcription from templates with a U1/U2 promoter. There is currently no direct evidence that TFIIH is required for transcription. However, phosphorylation of Ser5 of the Y1SPTSPS7 heptad repeat (Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7) of the CTD (C-terminal domain) of pol II by the CDK7 (cyclin-dependent kinase 7) subunit of TFIIH probably activates co-transcriptional 5′ capping of all pol II transcripts (see below), suggesting that the kinase at least is associated with snRNA genes. In support of this, pol II transcribing snRNA genes is phosphorylated on Ser5 of the CTD repeat. TFIID, the TAFII (TBP/TBP-associated factor II) complex, which is involved in transcription of mRNA-coding genes, cannot substitute for TBP in transcription of the human U1 gene in vitro, and the form of TBP used is unclear. ChIP (chromatin immunoprecipitation) analysis indicates that TAF-5 (TAFII100) is associated with snRNA genes, but it is not clear which other TAFs, if any, are involved. Thus PTF/SNAPc/PBP is, so far, the only component of the pre-initiation complex that is specific to pol II-transcribed snRNA genes.
Interestingly, some snRNA genes, including those for U6 spliceosomal snRNA and 7SK RNA, are transcribed by pol III despite promoter structures very similar to those of the U1 and U2 genes. Pol III-dependent snRNA genes are characterized by the presence of an additional TATA box at –25 relative to the site of transcription initiation, which favours recruitment of pol III-specific transcription factors.
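The promoter logic described above can be summarized as a small decision rule. The following Python sketch is only an illustration of that rule as stated in the text: the function name and its two boolean inputs are hypothetical, and real promoter recognition depends on additional factors and sequence context that are not modelled here.

```python
def predicted_polymerase(has_pse: bool, has_tata_box: bool) -> str:
    """Illustrative rule only: which polymerase an snRNA-type promoter favours.

    has_pse      -- promoter contains the essential proximal sequence element
    has_tata_box -- promoter also contains a TATA box at about -25
    """
    if not has_pse:
        # The PSE is the defining element of snRNA gene promoters.
        return "not an snRNA-type promoter"
    # A PSE plus a TATA box favours pol III-specific factors (e.g. U6, 7SK);
    # a PSE without a TATA box is the pol II arrangement (e.g. U1, U2).
    return "pol III" if has_tata_box else "pol II"


if __name__ == "__main__":
    print("U1/U2-like promoter:", predicted_polymerase(True, False))
    print("U6/7SK-like promoter:", predicted_polymerase(True, True))
```

The rule treats the TATA box as the only discriminating feature, which reflects the summary given here rather than a complete model of polymerase selection.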
The function of the 3′-box
The 3′-end formation of human snRNAs transcribed by pol II occurs in several steps initiated by recognition of the cis-acting 3′-box. This 13–16-nt-long element directs the production of a 3′ extended pre-snRNA, which is a specific substrate for subsequent processing that generates the mature 3′-end after transport to the cytoplasm [14–16].
The demonstration that transcription of the human U2 gene continues for approx. 800 nt beyond the 3′-box in vivo strongly suggested that this element directs RNA processing rather than transcription termination. In addition, the 3′-box can direct accurate 3′-end formation of RNA substrates in vitro, providing further evidence that it is a bona fide RNA-processing element. Both protein and RNA components are required for in vitro processing and at least one heat-labile component is involved. In common with in vitro polyadenylation, processing is activated by a 7-methyl-G cap on substrate RNA. There is no evidence for the involvement of mRNA-processing factors in 3′-box recognition in vitro [17,18]. More recently, a large complex termed Integrator has been shown to play a role in pre-snRNA 3′-end formation in vivo [19,20]. This complex contains homologues of the CPSF-100 (cleavage and polyadenylation specificity factor-100) and CPSF-73 subunits of CPSF, termed respectively either RC74 and RC68 or Int9 and Int11. ChIP analysis indicates that Integrator subunits are associated with U1 and U2 genes in vivo, and RNAi (RNA interference)-mediated knockdown of Int9 or Int11 causes a defect in pre-snRNA production. There is now good evidence that CPSF-73 is the cleavage activity for 3′-end formation of polyadenylated RNAs. Its homologue, RC68/Int11, is therefore a good candidate for the endonuclease implicated in 3′-box-directed processing in vitro. In line with the findings from analysis of in vitro 3′-box-directed processing, no factors involved in mRNA 3′-end formation have been found associated with the Integrator complex. Taken together, these findings indicate that the 3′-box is an RNA-processing element analogous to the polyadenylation signal commonly found in protein-coding genes, but recognized by a distinct set of processing factors.
Interestingly, two 3′-box-dependent processing products are produced in vitro: one corresponding to the in vivo pre-snRNA and one slightly longer at the 3′-end. Production of pre-snRNA may therefore involve endonucleolytic cleavage by RC68/Int11 followed by limited exonucleolytic trimming.
Integrating the transcription of snRNA genes with RNA 3′-end formation (Figure 3)
In vivo, the 3′ box is only recognized in transcripts from an snRNA promoter [23,24], emphasizing the close link between transcription and RNA processing. Recent work has demonstrated that the CTD of the large subunit of pol II and its phosphorylation are required for 3′-box-dependent RNA 3′-end formation in vivo [6,25], indicating that processing occurs co-transcriptionally. The CTD plays a major role in co-ordinating transcription with co-transcriptional RNA processing in expression of protein-coding genes (recently reviewed in [26,27]). In mammalian pol II, this structure comprises 52 repeats with the consensus Y1SPTSPS7 and its deletion can cause a major defect in capping, splicing and polyadenylation of pre-mRNAs. The CTD undergoes reversible phosphorylation in vivo during transcription. Phosphorylation of Ser5 at initiation by the CDK7 subunit of TFIIH allows the CTD to interact with and activate factors that cap the 5′-end of the nascent RNA. Subsequent phosphorylation of Ser2 by P-TEFb (positive transcription elongation factor b), comprising CDK9 and a cyclin T subunit, increases the ability of the CTD to activate splicing and cleavage/polyadenylation. In yeast, Ser2 phosphorylation potentiates interaction with the polyadenylation factor Pcf11. Activation of polyadenylation in mammalian in vitro systems by phospho-CTD and inhibition of polyadenylation in Drosophila by CDK9 inhibitors suggest that a similar mechanism operates in higher eukaryotes. Ser2 phosphorylation is also required to overcome an early block to transcription caused by the N-TEFs (negative transcription elongation factors), DSIF [DRB (5,6-dichloro-1β-D-ribofuranosylbenzimidazole) sensitivity-inducing factor] and NELF (negative elongation factor), and for the conversion of pol II into a processively elongating form during transcription of protein-coding genes.
The CTD of pol II is also phosphorylated during transcription of snRNA genes, although the transcription cycle differs from that of protein-coding genes in some respects. ChIP studies indicate that P-TEFb is associated with U2 snRNA genes and that the CTD of pol II transcribing U2 genes is phosphorylated on Ser2 in addition to Ser5. Inhibitors of P-TEFb abolish recognition of the 3′-box in vivo [6,11], and mutation of Ser2 to alanine, which cannot be phosphorylated, reduces 3′-box-dependent processing [11,20]. Taken together, these results suggest that phosphorylation of Ser2 by CDK9 plays a key role in 3′ box-dependent processing in vivo. However, Ser2 phosphorylation by P-TEFb does not appear to be necessary for elongation of transcription of snRNA genes. Interestingly, repeats 1–25 of the wild-type CTD, which are largely consensus, activate recognition of the 3′ box more effectively than the repeats between 27 and 52, which generally lack a serine residue in position 7. In contrast, both halves of the CTD are equally effective in activating polyadenylation. In accordance with these findings, selective mutation of Ser7 to alanine affects recognition of the 3′ box but does not affect polyadenylation of transcripts from several protein-coding genes [20,31]. Interestingly, mutation of Ser7 to alanine also abrogates recruitment of Integrator to the U1 and U2 snRNA genes. It has now been demonstrated that Ser7 is phosphorylated during transcription of snRNA genes and that Integrator binds specifically to the phosphorylated CTD in vitro only if there is a serine residue in position 7. Thus, as shown for activation of capping and polyadenylation, CTD phosphorylation potentiates interactions with a 3′ box-specific processing factor, resulting in increased recruitment and possibly also allosteric activation. These studies on snRNA 3′-end formation have therefore highlighted new aspects of the role of the pol II CTD in RNA processing.
Since Ser2 phosphorylation is also implicated in recognition of the 3′ box, it is possible that phosphorylation of both Ser2 and Ser7 is required for efficient recruitment of the Integrator complex.
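The idea of a CTD code read through Ser7 can be made concrete with a toy model of a single heptad repeat. The Python sketch below is a minimal illustration under stated assumptions: the class `Heptad`, the function `can_bind_integrator` and the `require_ser2` switch are hypothetical names introduced here, and the requirement for Ser2 phosphorylation is the possibility raised in the text, not an established rule.

```python
from dataclasses import dataclass, field

CONSENSUS = "YSPTSPS"  # Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7


@dataclass
class Heptad:
    """One CTD repeat with a set of phosphorylated positions (1-based)."""
    sequence: str = CONSENSUS
    phospho: set = field(default_factory=set)

    def has_serine(self, position: int) -> bool:
        return self.sequence[position - 1] == "S"

    def phosphorylate(self, position: int) -> None:
        # Only a serine can carry the mark in this toy model, so a
        # Ser-to-Ala mutant repeat stays unphosphorylated at that position.
        if self.has_serine(position):
            self.phospho.add(position)


def can_bind_integrator(repeat: Heptad, require_ser2: bool = True) -> bool:
    """Assumed reading rule: phospho-Ser7 is needed; phospho-Ser2 optionally."""
    needed = {2, 7} if require_ser2 else {7}
    return needed <= repeat.phospho


if __name__ == "__main__":
    wild_type = Heptad()
    ser7_to_ala = Heptad("YSPTSPA")  # hypothetical S7A mutant repeat
    for repeat in (wild_type, ser7_to_ala):
        for position in (2, 5, 7):
            repeat.phosphorylate(position)
    print("wild type binds Integrator:", can_bind_integrator(wild_type))
    print("S7A mutant binds Integrator:", can_bind_integrator(ser7_to_ala))
```

In this sketch the S7A repeat fails to recruit Integrator whichever rule is chosen, whereas setting `require_ser2=False` lets a repeat carrying only phospho-Ser7 qualify; the two settings correspond to the alternatives that remain to be distinguished experimentally.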
It was previously shown that the protein-coding gene-specific transcription factor TFIID mediates interaction between the polyadenylation factor CPSF and the CTD. Likewise, the snRNA promoter-specific factor PTF may act in concert with phosphorylation of Ser7 of the CTD to specifically recruit Integrator and other factors involved in 3′ box-dependent processing and/or recruit the Ser7 kinase to snRNA genes (see Figure 3). This would provide a simple explanation for the link between a PSE-containing promoter and recognition of the 3′ box in vivo.
Interestingly, in addition to affecting 3′ processing, mutation of Ser7 to alanine causes a defect in transcription without affecting pol II recruitment. Integrator recruitment may therefore also be required for efficient initiation to occur.
The role of the 3′ box in transcription termination
Mutation of the 3′ box affects termination of transcription of snRNA genes. Since polyadenylation signals are also required for transcription termination, the molecular mechanisms operating in snRNA and protein-coding genes may be analogous even though a different complement of processing factors is involved. The Torpedo Model of termination suggests that degradation of the nascent transcript from the unprotected 5′-end produced by cleavage at the polyadenylation site destabilizes the polymerase. In support of this model, the 5′–3′ exonuclease Xrn2 is required for efficient termination of transcription of protein-coding genes in human cells. Recognition of the polyadenylation signal can also result in changes in the interaction of factors with the pol II CTD and/or CTD phosphorylation, all of which may contribute to termination (Allosteric Model). Likewise, the 3′ box may promote termination as a result of RNA cleavage and/or changes in the association of factors with the CTD or changes in CTD phosphorylation.
Transcription of the U2 snRNA gene extends for approx. 800 bp after the 3′ box, while transcription of the U1 gene terminates abruptly shortly after the 3′ box. In vivo footprinting suggests that a factor or complex binds to the DNA just downstream of the U1 3′ box. The footprinted region effectively terminates transcription when it is placed downstream of the U2 3′ box, implicating factors binding to this region in the termination process. There are no sequence similarities to the ‘U1 terminator’ 800 bp downstream of the U2 3′ box, suggesting that different strategies are used to terminate transcription of these two highly related snRNA genes.
There has been much progress towards understanding the control of expression of the pol II-transcribed mammalian snRNA genes in recent years. However, there are still many outstanding questions. A more complete understanding of the mechanisms involved in transcription and co-transcriptional RNA processing awaits the identification of all the factors involved in pre-initiation complex assembly, pol II recruitment and 3′ box-directed processing.
Transcription: A Biochemical Society Focused Meeting held at the University of Manchester, U.K., 26–28 March 2008 as part of the Gene Expression and Analysis Linked Focused Meetings. Organized and Edited by Stefan Roberts (Manchester, U.K.) and Robert White (Beatson Institute, Glasgow, U.K.).
Abbreviations: CDK7, cyclin-dependent kinase 7; ChIP, chromatin immunoprecipitation; CTD, C-terminal domain; DSE, distal sequence element; CPSF, cleavage and polyadenylation specificity factor; N-TEF, negative transcription elongation factor; pol II, RNA polymerase II; P-TEFb, positive transcription elongation factor b; PSE, proximal sequence element; PBP, PSE-binding protein; PTF, PSE-binding transcription factor; SNAPc, snRNA activator protein complex; snRNA, small nuclear RNA; TBP, TATA-box-binding protein; TAF, TBP/TBP-associated factor; TFIIB, transcription factor IIB; USE, upstream sequence element
- © The Authors Journal compilation © 2008 Biochemical Society