Gene Expression in Development and Disease

Coupling genetics and post-genomic approaches to decipher the cellular splicing code at a systems-wide level

Yilei Liu, David J. Elliott


Nuclear RNA processing is a critical stage in eukaryotic gene expression, and is controlled in part by the expression and concentration of nuclear RNA-binding proteins. Different nuclear RNA-binding proteins are differentially expressed in different cells, helping the spliceosome to decode pre-mRNAs into alternatively spliced mRNAs. Recent post-genomic technology has exposed the complexity of nuclear RNA processing, and is starting to reveal the mechanisms and rules through which networks of RNA-binding proteins can regulate multiple parallel pathways. Identification of multiple parallel processing pathways regulated by nuclear RNA-binding proteins is leading to a systems-wide understanding of the rules and consequences of alternative nuclear RNA processing.

  • RNA-binding protein
  • serine- and arginine-rich protein (SR protein)
  • splicing
  • systematic evolution of ligands by exponential enrichment (SELEX)
  • UV cross-linking and immunoprecipitation (CLIP)

A nuclear code of RNA-binding proteins is important for decoding pre-mRNAs

Most human genes are split up into coding exons separated by long non-coding introns. Splicing is an essential step in gene expression carried out by a macromolecular complex named the spliceosome which removes introns and concomitantly joins exons. Spliceosomal recognition of exons and introns within pre-mRNA depends on three sequence elements at exon/intron junctions. These are the 5′ splice site, the branchpoint sequence and the 3′ splice site. In metazoans, these sequence elements can be highly degenerate, which has implications for how easily they are recognized by the spliceosome [1]. Besides the three primary elements, efficient splicing of exons is also controlled by other cis-elements in both exons and introns called splicing enhancers and splicing repressors depending on their effect on splicing. A number of nuclear RNA-binding proteins bind to splicing enhancers and repressors and mediate effects on splicing [2,3]. These proteins are either essential components of the spliceosome like ASF/SF2, or proteins that directly or indirectly interact with the splicing machinery to positively or negatively affect exon recognition. As well as splicing, RNA-binding proteins also play a key role in 3′-end formation of mRNAs and in the nuclear metabolism of various non-coding RNAs [4].

Post-genomic technologies have been developed to globally analyse splicing pathways in different cells and tissues [5,6]. Microarrays have been designed to monitor splicing changes either by detecting signals from individual exons within mRNAs (exon arrays) or by detecting spliced exon junctions (since junctions will have different sequences depending on which exon combinations are spliced together). Alternative splicing has also been monitored using deep sequencing approaches which can provide information on splicing patterns in whole transcriptomes [7,8]. Together these approaches indicate a high frequency of alternative splicing in human cells with above 90% of human genes encoding more than one mRNA isoform. Some tissues have particularly high levels of alternative splicing. Among these are the brain and the testis [9]. The testis is the site of male germ cell development which is one of the major ongoing developmental pathways in the adult. The levels of splicing of exons which are conserved between humans and mice (and therefore likely to play a fundamental rather than a tissue-specific role) are particularly high in the brain, while the testis shows more non-conserved splicing [10,11].

What controls these cell type patterns of alternative splicing? One important factor is the differential expression of RNA-binding proteins both anatomically and developmentally including in the brain and testis [12]. Analysis of the expression of RNA-binding proteins in the developing mouse brain by in situ hybridization indicated that many are only expressed in specific regions of the brain, while others are only expressed in neurons [13]. Anatomically, expression of the nuclear T-STAR (testis signal transduction and activation of RNA) RNA-binding protein is mainly detected in the developing brain and the testis [14]. The testis also contains a number of cell type-specific RNA-binding proteins not expressed at other anatomic locations including hnRNP (heterogeneous nuclear ribonucleoprotein) G-T and RBMY (RNA-binding motif protein, Y-linked), and expression of the splicing activator Tra2β is up-regulated in the testis [15–18].

Deciphering the cellular splicing code

Different RNA-binding proteins have distinct RNA target sequences [19]. The location of these target sequences is crucial for understanding the mechanism of the splicing regulation by these proteins and revealing downstream transcripts regulated by their expression in different cell types and developmental stages.

The development of SELEX (systematic evolution of ligands by exponential enrichment) has enabled binding targets of specific RNA-binding proteins to be identified from pools of random RNAs through competitive binding selections. Usually SELEX does not identify a single target sequence, but rather a degenerate consensus sequence of approx. 4–8 nt in length. Examples of such degenerate motifs are found for the SR (serine- and arginine-rich) protein family of splicing regulators [20]. The SR protein-derived SELEX-binding sites have been used to design position weight matrices which have been incorporated into motif search tools such as ESEfinder [21]. These search tools can identify potential binding sites within input RNAs which can then be experimentally validated. SELEX has also been used successfully to reveal a high affinity RNA-binding site for the germ-cell-specific nuclear binding protein RBMY [22]. This RBMY-binding site is composed of a highly conserved 5 nt sequence, C(A/U)CAA, placed within a stem loop structure. The RBMY RNA-binding motif has been characterized by mutagenesis, with protein–RNA interactions structurally visualized by NMR and quantitatively measured by EMSA (electrophoretic mobility-shift assay).
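The idea behind a position weight matrix search can be illustrated with a minimal Python sketch. It is not the ESEfinder implementation: the aligned 'winner' sequences, the pseudocount, the uniform background and the score threshold are all hypothetical values chosen for illustration, and the function names `build_pwm` and `scan` are assumptions of this example.

```python
import math

BASES = "ACGU"

# Hypothetical aligned SELEX 'winner' sequences (illustrative only, not real data).
SELEX_WINNERS = ["CACAA", "CUCAA", "CACAA", "CUCAA", "CACAA", "CUCAU"]


def build_pwm(seqs, pseudocount=0.5, background=0.25):
    """Build a log-odds position weight matrix from equal-length aligned sequences."""
    total = len(seqs) + pseudocount * len(BASES)
    pwm = []
    for i in range(len(seqs[0])):
        column = [s[i] for s in seqs]
        # Log2 ratio of the smoothed base frequency to a uniform background.
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(rna, pwm, threshold):
    """Return (start, window, score) for each window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(rna) - width + 1):
        window = rna[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


pwm = build_pwm(SELEX_WINNERS)
# Hypothetical input RNA (must contain only A, C, G and U); threshold is arbitrary.
hits = scan("GGACACAAGGCUCAAGG", pwm, threshold=5.0)
for start, window, score in hits:
    print(start, window, round(score, 2))
```

Candidate sites returned by a scan of this kind are predictions only and, as noted above, still need experimental validation.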

Although successful in identifying high affinity RNA target sequences, the conventional SELEX protocols have some drawbacks. First and foremost, because the selection is usually performed on a pool of in vitro synthesized RNA random 20-mers, the final ‘winner’ may not actually exist in the genome. The binding of protein to RNA is a dynamic reaction. In living cells, RNA-binding proteins need to associate with and dissociate from pre-mRNAs, which requires a moderate binding affinity. In contrast, conventional SELEX protocols select the strongest binders, from which the proteins might be unable to detach efficiently. These artificial RNA sequences are not likely to represent the physiological binding targets in the genome. Secondly, SELEX motifs can sometimes be very short and highly degenerate. Since these short degenerate motifs occur frequently in the genome, it is difficult to use SELEX data to pick out real functional RNA-binding targets against a huge background of possible genomic targets.

Most recently some of these disadvantages have been circumvented by the development of the CLIP (UV cross-linking and immunoprecipitation) procedure which can identify the physiological binding targets of specific RNA-binding proteins [23]. CLIP involves a UV cross-linking step that in situ ‘freezes’ all the protein–RNA complexes within intact cells. Then the cells are lysed and the protein–RNA complexes of interest are purified by immunoprecipitation with a specific antibody. After a few steps of processing, a pool of short cDNA CLIP tags is obtained which correspond to the binding targets for that particular protein. It is possible to map a CLIP tag of approx. 50 nt back on to the mouse or human genome sequences by BLAT searching, assuming that the CLIP tag is not a repetitive sequence. The latest versions of the CLIP procedure use deep sequencing technology and high throughput bioinformatics to obtain a global view of all the RNA-binding targets within the transcriptome [24–26]. Experimental results indicate that at the whole transcriptome level the detected RNA-binding sites for the neural RNA-binding Nova protein are very reproducible between brains of individual mice [24], and binding sites for the RNA-binding protein Fox-2 (feminizing locus on the X-2) are similar in different cell lines [26]. These results suggest that the association of RNA-binding proteins within nuclei will be a stable feature of the nuclear transcriptome which might be annotated in the future on genome browsers like that at UCSC (

Even from the generation of transcriptome-wide RNA target maps for a small number of RNA-binding proteins to date, some rules are starting to emerge. One is that RNA-binding protein target sites in vivo are likely to be somewhat widely dispersed in the transcriptome, including in regions not traditionally considered to be involved in splicing regulation such as deep introns and intergenic regions. Deep intronic target sites may correspond in some cases with unannotated alternatively spliced exons, although since introns are usually longer than exons, they are also more likely to contain sequences with affinity for RNA-binding proteins purely by chance. Intergenic regions which might have regulatory roles in gene expression are frequently transcriptionally active. Another discovery from CLIP-based identification of physiological RNA target sites is that RNA-binding proteins can bind to RNA with a broader spectrum of binding sequences than often predicted from SELEX. The optimum consensus binding motifs of the Fox-2 and Nova RNA-binding proteins had been identified by SELEX before CLIP experiments were devised [27–29]. In vivo target RNAs identified by CLIP that contain the Fox-2 or Nova consensus motif account for only a relatively small proportion of total target RNAs; for example, only 10–20% of Nova CLIP tags contain high binding affinity YCAY clusters [30] and only 33% of Fox-2 tag clusters harbour a GCAUG-binding motif [26]. Hence it is possible that RNA-binding proteins may modify their binding behaviour when present in complexes which additionally contain other RNA-binding proteins or co-regulators. Analysis of ASF/SF2-binding target sequences in the transcriptome similarly indicated that only exonic CLIP sequences fitted a recognizable consensus sequence [31]. Another observation has been that not all sequences in the transcriptome which contain a consensus target-binding motif are bound by Fox-2 protein [26]. These unbound sites are probably occluded by other RNA-binding proteins within the nucleus, or are inaccessible to protein binding because of RNA secondary structure. This underscores the fact that the physiological binding targets of RNA-binding proteins may vary in different cellular contexts or biological systems.
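The kind of calculation behind statements such as 'only a fraction of CLIP tags contain the consensus motif' can be sketched in a few lines of Python. This is an illustrative sketch only: the tag sequences are invented, the helper names `compile_motif`, `count_sites` and `fraction_with_motif` are assumptions of this example, and the choice of three sites as a 'cluster' is arbitrary rather than the definition used in the cited studies. The motifs YCAY (Y = C or U) and GCAUG are those named in the text.

```python
import re

# Subset of IUPAC nucleotide codes needed for the motifs discussed in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U",
         "Y": "[CU]", "R": "[AG]", "N": "[ACGU]"}


def compile_motif(motif):
    """Compile an IUPAC motif into a regex; the lookahead allows overlapping sites."""
    body = "".join(IUPAC[ch] for ch in motif)
    return re.compile(f"(?=({body}))")


def count_sites(tag, pattern):
    """Count motif occurrences in one tag, including overlapping occurrences."""
    return len(pattern.findall(tag))


def fraction_with_motif(tags, pattern, min_sites=1):
    """Fraction of tags carrying at least min_sites copies of the motif."""
    if not tags:
        return 0.0
    carrying = sum(1 for tag in tags if count_sites(tag, pattern) >= min_sites)
    return carrying / len(tags)


YCAY = compile_motif("YCAY")
GCAUG = compile_motif("GCAUG")

# Hypothetical CLIP tag sequences (invented for illustration, not real data).
tags = [
    "GGUCAUUCAUCCAUGG",  # three YCAY sites
    "GGGCAUGAAAGG",      # one GCAUG site, no YCAY
    "AAGGAAGGAAGG",      # neither motif
    "AAUCACGGAAGG",      # one YCAY site
]

print(fraction_with_motif(tags, YCAY))               # tags with any YCAY site
print(fraction_with_motif(tags, YCAY, min_sites=3))  # tags with a 3-site 'cluster'
print(fraction_with_motif(tags, GCAUG))              # tags with a GCAUG site
```

On real data the same counts would be compared against a suitable background set of sequences, since short degenerate motifs also occur frequently by chance.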

An important strategy in dissecting the downstream functional effects on RNA targets caused by protein binding has been to deplete these proteins from cells either using genetic approaches (for example in knockout mice [24]) or using siRNA (small interfering RNA) (in cultured cells [26]). RNA splicing pathways are disrupted in mice with null alleles of some cell type-specific RNA-binding proteins such as Nova, ASF/SF2 and SC35 [32–35]. Transcriptome-level mapping of target RNA-binding sites coupled with genetic/siRNA approaches has enabled general rules for splicing regulation to be drawn up. An important general rule is that binding of an RNA splicing regulator within a transcript does not necessarily lead to a functional effect on splicing of the target RNA. Based on the more detailed studies of a few RNA-binding proteins to date, functional RNA-binding sites which result in splicing regulation by RNA-binding proteins are usually close (within 200 nt) to the regulated exon–intron splice junction and conserved between species [26,36]. This correlates well with the higher observed levels of intron conservation close to alternatively spliced exons (usually spanning approx. 100 nt either side) [37–39].

The actual position of the specific nuclear RNA-binding protein target site relative to the regulated exon is also important for the outcome of the splicing regulation. The most prominent example to date is provided by the Fox-2 protein, which activates alternative splicing when it binds to target sites downstream of alternatively spliced exons and represses it when it binds upstream [26]. The regulatory RNA map of Nova shows a similar trend, but with exonic Nova sites blocking spliceosome assembly and intronic sites close to alternatively spliced exons promoting spliceosome assembly [36].
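The positional rule described for Fox-2 can be written down as a small lookup. The following Python sketch is a deliberately simplified encoding and not a published model: the `Exon` class, the `predict_fox2_effect` function, the transcript-oriented coordinate system and the hard 200 nt cut-off (borrowed from the tendency for functional sites to lie within 200 nt of the splice junction) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    """Alternatively spliced exon in transcript (5'->3') coordinates."""
    start: int  # first exonic nucleotide (0-based, inclusive)
    end: int    # one past the last exonic nucleotide (exclusive)


def predict_fox2_effect(exon, site, window=200):
    """Toy positional rule for a single intronic binding-site position.

    Upstream sites within `window` nt are labelled 'repression' and
    downstream sites within `window` nt 'activation'. The hard cut-off
    is an illustrative simplification of a statistical tendency.
    """
    if exon.start <= site < exon.end:
        return "exonic site: not covered by this rule"
    if site < exon.start:
        distance = exon.start - site      # nt upstream of the exon
        effect = "repression"
    else:
        distance = site - exon.end + 1    # nt downstream of the exon
        effect = "activation"
    return effect if distance <= window else "no effect predicted"


# Hypothetical exon and binding-site coordinates.
exon = Exon(start=1000, end=1100)
for site in (950, 1150, 400, 1050):
    print(site, predict_fox2_effect(exon, site))
```

A rule of this form only summarizes a trend across many exons; as discussed above, binding of a regulator does not always have a functional effect on splicing, so any individual prediction would need experimental testing.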

Many of the deep intronic binding sites observed for RNA-binding proteins might not affect splicing, but these protein–RNA interactions might be relatively long lived, since deep introns might be bound by fewer competing RNA-binding proteins than sites close to intron–exon junctions. On the other hand, the observed wide distribution of RNA–protein binding sites may suggest some further potential functions of the RNA-binding proteins in addition to splicing control. Consistent with this, SR proteins also play an important role in nuclear mRNA export [40,41]. CLIP experiments have shown that in addition to pre-mRNA targets, hnRNP A1 binds to a miRNA (microRNA) precursor and acts as an auxiliary factor involved in miRNA processing [42], and ASF/SF2 likewise binds to miRNAs and other non-coding RNAs [25]. It has also been reported that Nova1 controls the selection of alternative polyadenylation sites by binding to the YCAY elements flanking the poly(A) site of some nascent pre-mRNAs. Interestingly, the position of the target binding site relative to the poly(A) site determines the effect of Nova on the use of the poly(A) site [24].

Future challenges

The basic mechanism of splicing and rules for alternative splicing regulation were largely worked out on a single gene by gene basis. However, within living cells alternative splicing enables even single splicing regulator proteins to bind to and co-ordinately regulate an array of target transcripts in parallel at the post-transcriptional level. Hence in the future, rather than single processing events, an important goal will be global analysis of the effects of RNA splicing regulation at a systems-wide level within the transcriptome (see Figure 1). This will enable more comprehensive rules of splicing regulation to be established, as well as the kinds of biological responses which are regulated in concert by each single RNA-binding protein (i.e. their downstream effects on proteins expressed in a cell, on cell biology and on development). For example, the Nova protein specifically regulates splicing of mRNAs involved in synaptic plasticity [43]. Such systems-wide analyses of the function of specific splicing regulators are important, since the role of RNA-binding proteins in development has in general been much less intensively studied than the role of transcription factors, but all evidence suggests splicing factors are probably just as important. For example, global defects in splicing regulation occur as a result of disruptions in the levels of nuclear RNA-binding proteins in myotonic dystrophy, resulting in pleiotropic disease symptoms according to the particular pre-mRNAs affected [44]. It will also be important to correlate empirical RNA-binding maps of different nuclear proteins with each other, particularly those of physically interacting splicing regulators. This will probably reveal complicated interaction maps. For example, transcriptome-wide studies of ASF/SF2 and Fox-2 RNA splicing regulators have indicated that each protein binds to the pre-mRNAs of other splicing regulators, suggesting that RNA-binding proteins form a complex network regulating both themselves and downstream gene expression within cells.

Figure 1 Systems-wide effects of RNA-binding proteins

Even single RNA-binding proteins are able to co-ordinately regulate multiple downstream targets in parallel in the transcriptome. These target transcripts can be identified at a systems-wide level by global approaches such as CLIP and using splicing junction arrays to monitor responses of nuclear RNA processing pathways to specific splicing regulator protein depletion. Some RNA splicing regulators are involved in networks of cross-regulation with other RNA-binding proteins and transcription factors, and a number of RNA splicing regulators also autoregulate their own levels through splicing. The biological effects of RNA splicing regulators in the nucleus will manifest at the cellular level in downstream changes in protein expression which impact on both developmental decisions and cellular metabolism. Understanding the biological role of nuclear RNA-binding proteins at a systems-wide level through molecular and genetic approaches is the key to interpreting the downstream effects of these splicing regulators.


This work was supported by the Biotechnology and Biological Sciences Research Council [grant number BB/D013917/1] and the Wellcome Trust [grant number WT080368MA]. Y. L. was supported in part by the Overseas Research Students Awards Scheme.


  • Gene Expression in Development and Disease: 13th Tenovus-Scotland Symposium, an Independent Meeting held at University of Glasgow, Glasgow, U.K., 16–17 April 2009. Organized and Edited by Sheila Graham (Glasgow, U.K.).

Abbreviations: CLIP, UV cross-linking and immunoprecipitation; Fox-2, feminizing locus on the X-2; hnRNP, heterogeneous nuclear ribonucleoprotein; miRNA, microRNA; RBMY, RNA-binding motif protein, Y-linked; SELEX, systematic evolution of ligands by exponential enrichment; SR, serine- and arginine-rich

