The analysis of deep sequencing data allows for a genome-wide overview of all the small RNA molecules (the ‘sRNome’) that are present in a single organism. In the present paper, we review the processing of CRISPR (clustered regularly interspaced short palindromic repeats) RNA, C/D box sRNA (small non-coding RNA) and tRNA in Nanoarchaeum equitans. The minimal and fragmented genome of this tiny archaeon permits a sequencing depth that enables the identification of processing intermediates in the study of RNA processing pathways. These intermediates include circular C/D box sRNA molecules and tRNA half precursors.
- C/D box small non-coding RNA (C/D box sRNA)
- clustered regularly interspaced short palindromic repeats (CRISPR)
- genomic fragmentation
- split gene
The emergence of RNA-Seq high-throughput sequencing methodologies has provided valuable tools for the genome-wide analysis of the RNA content of a cell. To obtain an overview of all the small RNA molecules that are present in an archaeal cell, several studies have utilized RNA-Seq, which helped to identify novel small RNA families and to analyse their maturation and possible functions. Such studies include analyses of the transcriptomes of Methanosarcina mazei Gö1, Sulfolobus solfataricus P2 and Pyrobaculum species, among others. The growing repertoire of small non-coding RNAs that can be identified in archaea includes (i) C/D box sRNAs (small non-coding RNAs) and H/ACA sRNAs (small RNAs involved in the modification of rRNA), (ii) regulatory sense and antisense sRNAs [5–7], (iii) crRNAs [CRISPR (clustered regularly interspaced short palindromic repeats) RNAs] (components of the prokaryotic immune system), and (iv) other unclassified RNA molecules.
One interesting candidate for the study of a complete ‘minimal’ RNome present in an archaeal cell is Nanoarchaeum equitans. This tiny archaeon grows in hydrothermal vents and is always found attached to a second archaeon, Ignicoccus hospitalis. N. equitans harbours a minimal and highly compact genome that has a size of only 490 kb. Operons that are conserved among other archaea are widely dispersed throughout the genome and several genes are found to be split [10,11]. N. equitans appears to lack the enzymes that are required for the production of nucleotides and is expected to rely on their import from I. hospitalis. This immediately raises the question of which RNA molecules would be produced under such constraints.
The nanoarchaeal sRNome
The analysis of the RNome of an organism via RNA-Seq methodology requires consideration of potential biasing influences during RNA preparation or cDNA library preparation. One attempt to minimize such bias is to treat all investigated RNA samples enzymatically in order to convert 5′-terminal triphosphate and hydroxy groups into suitable substrates for adapter ligation. After such treatment, the most abundant RNA molecules in the nanoarchaeal cell are, as expected, the 5S, 16S and 23S rRNAs. N. equitans is a hyperthermophile with an AT-rich genome, which allows for the easy identification of such structural RNAs owing to the apparent difference in GC content between 32% for the entire genome and 66–73% for rRNA and tRNA genes [14,15] (Figure 1). Other small RNAs that are highly abundant in the cell are the C/D box and H/ACA box sRNAs and the crRNAs (Figure 1). Both RNA families are frequently found in archaea and are detailed below.
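The GC-content contrast described above lends itself to a simple sliding-window scan. The following is a minimal sketch rather than the method used in the cited studies; the function names, the window size, the step and the 60% threshold are illustrative assumptions chosen to sit between the genome-wide 32% and the 66–73% reported for structural RNA genes.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_rich_windows(genome: str, window: int = 60, step: int = 10,
                    threshold: float = 0.6):
    """Return merged (start, end) intervals covered by windows whose
    GC fraction reaches the threshold. All defaults are assumptions."""
    intervals = []
    last_start = max(len(genome) - window, 0)
    for start in range(0, last_start + 1, step):
        end = min(start + window, len(genome))
        if gc_fraction(genome[start:end]) >= threshold:
            if intervals and start <= intervals[-1][1]:
                # Overlaps or touches the previous hit: extend it.
                intervals[-1] = (intervals[-1][0], end)
            else:
                intervals.append((start, end))
    return intervals


# Toy example: an AT-rich background with one GC-rich island at 200..280.
toy_genome = "AT" * 100 + "GC" * 40 + "AT" * 100
print(gc_rich_windows(toy_genome))
```

Because overlapping windows are merged, the reported interval is somewhat wider than the GC-rich island itself; candidate regions would still need to be confirmed as rRNA or tRNA genes by annotation.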
Several interesting features were observed for the nanoarchaeal tRNA-processing pathways that showcase exceptions to tRNA maturation mechanisms that were believed to be universal. First, N. equitans is the only known organism that can survive without the activity of the ribonucleoprotein RNase P, as both the protein subunits and the catalytic RNA are absent from this organism. The major function of RNase P is the removal of 5′-terminal leader sequences from tRNA precursors, and this ribozyme is therefore responsible for the mature 5′-termini of tRNA molecules. N. equitans circumvents the need for this activity by (i) creating leaderless tRNA precursors, (ii) adding extra nucleotides that allow proper transcription initiation for tRNAs that require a pyrimidine as the first tRNA base, and (iii) presumably tolerating tRNAs that contain 5′-triphosphate termini. The enzymes responsible for 3′-terminal maturation of tRNAs (the endonuclease tRNase Z and the CCA-adding enzyme) are still present in the cell. A different endonuclease, the tRNA-splicing endonuclease, is required for the removal of tRNA introns. Four tRNAs contain introns, one of which forms a stable RNA that contains a k-turn motif. In N. equitans, however, the splicing endonuclease also fulfils a second function, as it is the enzyme that facilitates the trans-splicing of tRNA half molecules [18,19]. Six tRNA genes are split in the genome of N. equitans. The 5′- and 3′-tRNA half genes encode the split tRNA body and a 12–14 nt GC-rich sequence that finds its matching reverse complementary sequence only in the complementary tRNA half molecule. These sequences are then thought to form a duplex that facilitates folding of the tRNA body. The junction of the duplex and the tRNA body forms a structural motif that is recognized and cleaved by the tRNA-splicing endonuclease. Thus tRNA trans-splicing guarantees the presence of a complete set of functional tRNAs in N. equitans.
A heterotetrameric splicing endonuclease variant co-evolved with the occurrence of split tRNAs and tRNA precursors with highly variable intron locations [21,22].
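The pairing of tRNA halves through reverse complementary GC-rich stretches can be illustrated with a short sketch. It assumes, for simplicity, that the GC-rich stretch sits at the very 3′-end of the 5′-half transcript and at the very 5′-end of the 3′-half transcript, and that only perfect Watson–Crick pairs count (G–U wobble pairs are ignored). The sequences and function names are hypothetical and are not taken from the N. equitans genome.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5' to 3'."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def joining_duplex_length(five_half: str, three_half: str,
                          max_len: int = 20) -> int:
    """Length of the perfect duplex formed between the 3'-end of the
    5'-half and the 5'-end of the 3'-half (assumed terminal positions)."""
    five_half, three_half = five_half.upper(), three_half.upper()
    limit = min(max_len, len(five_half), len(three_half))
    length = 0
    for n in range(1, limit + 1):
        if reverse_complement(five_half[-n:]) != three_half[:n]:
            break  # first unpaired position ends the duplex
        length = n
    return length


# Hypothetical tRNA halves joined by a 12 nt GC-rich stretch.
gc_stretch = "GGCGCCCGGCGG"
five_half = "GCAUUGGA" + gc_stretch
three_half = reverse_complement(gc_stretch) + "AAGCUUCC"
print(joining_duplex_length(five_half, three_half))  # 12
```

Searching all tRNA half candidates against each other with such a function would reflect the statement that each GC-rich sequence finds its partner only in the complementary half, although real precursors with variable 3′-termini would first need trimming.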
The depth of RNA-Seq analysis of nanoarchaeal small RNAs allowed the identification of the tRNA half precursors in the cell. Most precursors consist precisely of the tRNA half body and the GC-rich stretch for tRNA half joining. However, whereas the 5′-termini of these tRNA halves are defined and conserved promoters are detectable, the 3′-termini are highly variable, which also sheds light on the understudied process of transcription termination in archaea. Several tRNA half genes are flanked by oligo(T) sequences that are known to play a role in archaeal transcription termination. However, it is apparent that tRNA half transcripts can terminate without such an oligo(T) stretch or avoid termination despite the presence of such a sequence motif. One example is the 5′-tRNA(His) precursor transcript, which does not show a distinct termination event at six thymidine residues located downstream of the tRNA half, but appears to be part of the 5′-UTR (untranslated region) of the adjacent valyl-tRNA synthetase gene (Figure 2). Most of the tRNA half genes are located directly next to a gene that is oriented in the reverse direction, and one tRNA half gene is positioned adjacent to a CRISPR cluster (Figure 2). These observations imply that tRNA molecules can be assembled in trans from different tRNA half precursors that vary in their 3′-terminal ends. A potential regulatory role of tRNA halves that are located in the 5′-UTR of genes requires further analysis.
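Candidate oligo(T) termination signals downstream of a tRNA half gene can be located with a simple pattern search. This sketch only finds the sequence motif; as noted above, the presence of such a stretch does not imply that termination occurs there. The minimum run length of five residues and the function name are illustrative assumptions.

```python
import re


def oligo_t_stretches(dna: str, min_len: int = 5):
    """Return (start, length) for every run of at least min_len
    consecutive T residues on the given strand (0-based start)."""
    pattern = re.compile("T{%d,}" % min_len)
    return [(m.start(), len(m.group())) for m in pattern.finditer(dna.upper())]


# Hypothetical downstream region with two oligo(T) stretches.
downstream = "ACGTTTTTTGCATTTACGTTTTTA"
print(oligo_t_stretches(downstream))  # [(3, 6), (18, 5)]
```

Comparing these positions with the observed 3′-ends of RNA-Seq reads would show which stretches coincide with actual termination events.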
C/D box sRNA
The most abundant RNA molecule (after rRNAs) identified in the RNome of N. equitans is a member of the C/D box sRNA family. These molecules were first identified in 2000 as the archaeal counterparts of the eukaryotic snoRNAs (small nucleolar RNAs). The sRNA molecules contain guide sequences that are utilized in the context of a ribonucleoprotein complex to identify the targets for 2′-O-methylation (C/D box) and pseudouridylation (H/ACA box). The C/D box sRNAs were found to be abundant in hyperthermophilic archaea, whereas often only a few sRNA genes are found in mesophilic archaea. RNA-Seq analyses identified 26 C/D box sRNAs and one H/ACA box sRNA in N. equitans. The most abundant sRNA is encoded as a dicistronic tRNA–sRNA construct and is probably produced via tRNase Z endonucleolytic cleavage. Similar dicistronic arrangements are found in plants. Little is known about the maturation of C/D box sRNAs, which was shown to include circular sRNA molecules [26,27]. Permuted RNA-Seq sequencing reads of N. equitans sRNAs confirmed the presence of circular RNA molecules. Some of the sRNA genes are located next to split protein-coding genes. One example is the N-terminal fragment of the reverse gyrase gene, which is flanked by two sRNA genes, one of which has a permuted order of the conserved C and D RNA motifs. Therefore only a circularized version of this sRNA would ensure functionality. The intriguing possible role of C/D box sRNAs in genome fragmentation will require further analysis of the sRNA maturation pathway. Previous observations support such an impact on genome evolution. First, a C/D box sRNA can be encoded as a tRNA intron. Here, the tRNA-splicing machinery creates a circular trans-acting sRNA and the fragmented tRNA is restored. Secondly, in mammals, snoRNAs are described as potential mobile genetic elements that can copy themselves to different genomic locations [30,31].
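The logic behind permuted reads as evidence for circular RNA can be sketched as follows: a read that spans the circularization junction does not match the linear gene sequence, but it does match the gene sequence joined to a second copy of itself. The example assumes exact matching and reads no longer than the gene; it is not the analysis pipeline of the cited studies, and the sequences and names are hypothetical.

```python
def classify_read(read: str, gene: str) -> str:
    """Classify a read against one sRNA gene as 'linear', 'permuted'
    (spans the 3'-5' junction of a circle) or 'unmapped'.
    Exact matching only; a simplifying assumption."""
    read, gene = read.upper(), gene.upper()
    if not read or len(read) > len(gene):
        return "unmapped"
    if read in gene:
        return "linear"
    if read in gene + gene:
        # Matches only across the junction of the doubled sequence.
        return "permuted"
    return "unmapped"


# Hypothetical sRNA gene and three reads.
gene = "ACGTTGCATGCCTGAA"
reads = [gene[3:10], gene[-5:] + gene[:5], "GGGGGGG"]
print([classify_read(r, gene) for r in reads])
# ['linear', 'permuted', 'unmapped']
```

Real data would require an aligner that tolerates mismatches, and permuted reads can also arise from tandem gene copies, so the genomic context should be checked before a circular molecule is inferred.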
One surprising observation is the presence of two CRISPR clusters in the genome of N. equitans [10,13]. The CRISPR/Cas (CRISPR-associated) system is an antiviral defence system present in archaea and bacteria, which uses small interfering RNAs (crRNAs) to target foreign nucleic acids. A CRISPR cluster consists of repeat sequences interspaced by unique spacer sequences that can be derived from previous encounters with mobile genetic elements (e.g. viruses). These clusters are adjoined by a set of cas genes coding for proteins involved in the functionality of the immune system (adaptation of new spacers, processing of crRNAs and interference with foreign nucleic acids) [33–36]. In N. equitans, RNA-Seq revealed a high abundance of mature crRNAs derived from both CRISPR clusters. Processing of crRNA occurred within the repeat elements, resulting in mature crRNAs with an 8 nt 5′-tag and gradually trimmed 3′-ends.
Similar results were found in RNA-Seq studies of Methanococcus maripaludis C5 small RNA, where crRNAs of the single CRISPR locus also show an 8 nt 5′-tag and a trimmed 3′-end. In this archaeon, the Cas6b enzyme was identified to cleave the precursor crRNA in the repeat region, resulting in the 8 nt 5′-tag. Unlike other Cas6 enzymes investigated [38–41], Cas6b of M. maripaludis possesses a catalytic centre that contains not one but two conserved histidine residues, which were both shown to play an important role in the processing of mature crRNA. N. equitans contains all of the Cas proteins that are thought to be required for the functionality of the CRISPR/Cas system.
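The structure of a mature crRNA described above (an 8 nt 5′-tag derived from the end of the upstream repeat, the spacer, and a trimmed remainder of the downstream repeat) can be sketched as a small parser. The repeat and spacer below are made-up sequences, not the N. equitans repeats, and the function name is hypothetical; the approach also assumes exact matches and that the spacer does not itself end in a repeat-like sequence.

```python
def parse_crrna(read: str, repeat: str, tag_len: int = 8):
    """Split a mature crRNA read into 5'-tag, spacer and the remaining
    3'-repeat fragment. Returns None if the read lacks the expected tag."""
    read, repeat = read.upper(), repeat.upper()
    tag = repeat[-tag_len:]
    if not read.startswith(tag):
        return None
    body = read[tag_len:]
    # After cleavage, at most the repeat minus its last tag_len nt can
    # remain at the 3'-end; look for the longest surviving prefix.
    max_k = min(len(body), len(repeat) - tag_len)
    for k in range(max_k, 0, -1):
        if body.endswith(repeat[:k]):
            return {"tag": tag, "spacer": body[:-k], "repeat_3prime": repeat[:k]}
    return {"tag": tag, "spacer": body, "repeat_3prime": ""}


# Made-up repeat and spacer; the 3'-end retains 6 nt of the next repeat.
repeat = "CCGGAATTCCGGTTAACCATTGAAAG"
spacer = "TGCATCGATCGTAGCTAGG"
read = repeat[-8:] + spacer + repeat[:6]
print(parse_crrna(read, repeat))
```

Collecting the lengths of the `repeat_3prime` fragments over many reads would give a picture of the gradual 3′-trimming.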
The observed abundance of some small RNA families in N. equitans raises several questions. The apparent evolutionary pressure to minimize the genome resulted in the loss of many essential genes, including the otherwise universal RNase P and the genes required for the biosynthesis of nucleotides. It is therefore plausible that the need to import these nucleotides from I. hospitalis would result in a reduced nanoarchaeal transcriptome that keeps the usage of the imported nucleotides to a minimum. However, this is in contrast with the observation of abundant crRNA production in the cell. Several archaeal genomes lack CRISPR clusters entirely, and crRNA production was shown to be down-regulated in other organisms. N. equitans follows the previously observed trend that hyperthermophiles produce a much higher number of crRNAs and C/D box sRNAs than mesophilic archaea. It appears that rRNA modification and, maybe more surprisingly, the defence against viruses are of the utmost importance even in an organism that has to survive with a highly reduced genome. CRISPR systems might also help to prevent an increase in genome size via prophage integration. Viruses that infect N. equitans are not known, but are expected to be abundant in hydrothermal vents, and the nanoarchaeon's host, I. hospitalis, harbours several CRISPR systems. A possible rationale for the increased CRISPR activity in extreme geothermal environments is the limited host range of viruses, which might predominantly replicate via a lysogenic cycle. A proposed function of CRISPR/Cas systems in DNA repair [46,47] provides another possible explanation for the high abundance of these systems in extreme environments.
This work was supported by the Deutsche Forschungsgemeinschaft [grant number FOR1680] and the Max-Planck Society.
We thank André Plagens for advice and discussion.
Molecular Biology of Archaea 3: An Independent Meeting held at the Max Planck Institute for Terrestrial Microbiology, Marburg, Germany, 2–4 July 2012. Organized and Edited by Sonja-Verena Albers (Max Planck Institute for Terrestrial Microbiology, Germany), Bettina Siebers (University of Duisburg-Essen, Germany) and Finn Werner (University College London, U.K.).
Abbreviations: CRISPR, clustered regularly interspaced short palindromic repeats; Cas, CRISPR-associated; crRNA, CRISPR RNA; sRNA, small non-coding RNA; snoRNA, small nucleolar RNA; UTR, untranslated region
© The Authors Journal compilation © 2013 Biochemical Society