Chloroplast gene expression is mainly regulated at the post-transcriptional level by numerous nuclear-encoded RNA-binding protein factors. In the present study, we focus on two RNA-binding proteins: cpRNP (chloroplast ribonucleoprotein) and PPR (pentatricopeptide repeat) protein. These are suggested to be major contributors to chloroplast RNA metabolism. Tobacco cpRNPs are composed of five different proteins containing two RNA-recognition motifs and an acidic N-terminal domain. The cpRNPs are abundant proteins and form heterogeneous complexes with most ribosome-free mRNAs and the precursors of tRNAs in the stroma. The complexes could function as platforms for various RNA-processing events in chloroplasts. It has been demonstrated that cpRNPs contribute to RNA stabilization, 3′-end formation and editing. The PPR proteins occur as a superfamily only in higher plant species. They are predicted to be involved in RNA/DNA metabolism in chloroplasts or mitochondria. Nuclear-encoded HCF152 is a chloroplast-localized protein that has 12 PPR motifs. The null mutant of Arabidopsis, hcf152, is impaired in the 5′-end processing and splicing of petB transcripts. HCF152 binds the petB exon–intron junctions with high affinity. The number of PPR motifs controls its affinity and specificity for RNA. It has been suggested that each of the highly variable PPR proteins is a gene-specific regulator of plant organellar RNA metabolism.
- helical repeat protein
- pentatricopeptide repeat (PPR)
- RNA metabolism
- RNA recognition motif
Most chloroplast genes of higher plants are organized into clusters, and are co-transcribed as large polycistronic precursor RNAs that are subsequently processed into shorter RNA species. Precursor (pre), intermediate and mature RNAs are relatively stable and accumulate to their respective steady-state levels. Post-transcriptional RNA processing of pre-RNAs, which includes RNA cleavage/trimming, RNA splicing and RNA stabilization, is an important step in the control of chloroplast gene expression. In general, RNA processing is mediated by numerous nuclear-encoded protein factors in the chloroplasts of higher plants and in the unicellular green alga Chlamydomonas reinhardtii [2,3].
cpRNP (chloroplast ribonucleoprotein) as a fundamental factor in chloroplast RNA metabolism
Among the first chloroplast RNA-binding proteins to be identified was a group of proteins of approx. 30 kDa. In tobacco, five cpRNPs, with masses ranging from 28 to 33 kDa and designated cp28 to cp33, were isolated by a one-step process of ssDNA (single-stranded DNA) column chromatography. These proteins are nuclear-encoded and have two RNA-recognition motifs and an acidic N-terminal domain. The cpRNPs are widely distributed in land plant species. On the basis of phylogenetic comparisons, these cpRNPs can be classified into three groups: I (cp28 and cp31), II (cp29A and cp29B) and III (cp33). The Arabidopsis genome encodes eight proteins of these groups.
The cpRNPs bind ssDNA and the ribonucleotide homopolymers poly(G) and poly(U) in vitro, although they are capable of binding any chloroplast RNA in vitro. When chloroplast proteins are UV-cross-linked with several mRNA probes, a subset of proteins of approx. 30 kDa is usually detected in the chloroplasts of land plants and green algae. Immunoprecipitation of stromal extracts with anti-cpRNP antibodies and Northern-blot analysis revealed that cpRNPs are associated in vivo with various species of chloroplast mRNAs and intron-containing precursor tRNAs, but not with mature tRNAs. These results suggest that cpRNPs do not have distinct binding sites on chloroplast RNAs and that they participate, in general, in RNA metabolism. Recently, accumulating biochemical and genetic evidence has suggested some physiological roles for these proteins in chloroplast gene expression.
We first analysed the cpRNP molecules that accumulate in chloroplasts (Table 1). These cpRNPs are surprisingly abundant stromal proteins (approx. 3×10⁵ molecules/chloroplast), with an abundance one-tenth that of RuBisCO and greater than that of the total chloroplast mRNA molecules, including the most abundant psbA mRNA (approx. 14000 molecules). Fractionation analysis revealed that cpRNPs form RNP particles with ribosome-free RNAs. Interestingly, more than 90% of pre-ribosomal psbA mRNA in the stroma is bound to cpRNPs. Furthermore, microarray analysis showed that at least 80% of mRNA species and the precursors of stable RNAs are contained in cpRNP–RNA complexes (T. Nakamura, M. Sugiura and M. Sugita, unpublished work). These analyses revealed the presence of an RNA pool in the chloroplast. Incubation of RNA with normal chloroplast extract or a cpRNP-depleted extract showed that RNA is rapidly degraded in the absence of cpRNPs. RNA stability was restored to normal levels by supplementation with Escherichia coli-expressed cpRNPs. cpRNPs were detected only as RNA-interacting proteins under these conditions, indicating that cpRNPs are direct mediators of RNA stabilization.
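The copy numbers quoted above imply a large molar excess of cpRNP over even the most abundant chloroplast mRNA. The following minimal Python sketch works through that arithmetic using only the approximate figures given in the text; the variable and function names are illustrative, and the RuBisCO figure is merely back-calculated from the "one-tenth" statement rather than a measured value.

```python
# Approximate copy numbers per chloroplast, taken from the text above.
CPRNP_PER_CHLOROPLAST = 3e5          # approx. 3 x 10^5 cpRNP molecules
PSBA_MRNA_PER_CHLOROPLAST = 14_000   # approx. 14000 psbA mRNA molecules


def molar_ratio(protein_copies: float, rna_copies: float) -> float:
    """Return the number of protein molecules available per RNA molecule."""
    return protein_copies / rna_copies


# cpRNP molecules per psbA mRNA molecule (psbA is the most abundant mRNA).
cprnp_per_psba = molar_ratio(CPRNP_PER_CHLOROPLAST, PSBA_MRNA_PER_CHLOROPLAST)

# Assumption: "one-tenth that of RuBisCO" is read as an exact factor of 10,
# so this is only an order-of-magnitude estimate.
implied_rubisco_copies = CPRNP_PER_CHLOROPLAST * 10

print(f"cpRNP molecules per psbA mRNA: about {cprnp_per_psba:.0f}")
print(f"Implied RuBisCO copies per chloroplast: about {implied_rubisco_copies:.0e}")
```

On these figures there are roughly 20 cpRNP molecules for every psbA mRNA molecule, which is consistent with cpRNPs being able to coat the ribosome-free RNA pool.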
Emerging evidence has identified additional functions of these cpRNPs. Spinach 28 RNP was isolated as a factor required for mRNA 3′-end formation. Another cpRNP, cp31, was implicated in the editing processes of psbL mRNA. ATRBP35 from avocado, which is classified as a group II cpRNP, was shown to bind the RNA of ASBVd (avocado sunblotch viroid). Recombinant ATRBP35 was capable of facilitating the hammerhead-mediated self-cleavage of dimeric ASBVd transcripts. Therefore cpRNPs may be involved in distinct steps of post-transcriptional gene expression by directly protecting RNAs from ribonuclease, facilitating proper RNA folding for ribozymes and intron-containing RNAs, and recruiting site-specific factors that mediate RNA metabolism. Interestingly, recent proteomic identification of plastid ribosomal proteins showed that a portion of the cpRNP is found as an extraribosomal protein. Most mRNAs encoding photosynthesis-related proteins (psbA, rbcL and petD) are ribosome-free and these RNAs accumulate in cpRNP–RNA complexes. The translation of these RNAs is modulated by various environmental and developmental conditions, especially by light. It has been proposed that the specific interaction of the psbA mRNA 5′-untranslated region with RB47 activates light-regulated translation in green algae. However, the recent finding described in the present study suggests the possibility that the release of cpRNP could be a regulatory step in light-activated chloroplast mRNA translation. Indeed, some cpRNPs, although not all, can be phosphorylated in a light-dependent manner, and it has been reported that phosphorylated cpRNPs have weaker affinity for RNA. Therefore cpRNPs are proposed to be global mediators of chloroplast RNA metabolism, which connect transcription and translation in the chloroplast.
HCF152, a gene-specific factor for chloroplast gene expression
Many of the factors discovered with biochemical approaches play general roles as components of the basic gene expression machinery. In contrast, most factors identified with genetic approaches are specifically required for the expression of small subsets of chloroplast genes and are involved in post-transcriptional steps. Nuclear mutations that disrupt chloroplast gene expression define genes that participate in chloroplast gene expression. Over the past few years, the analysis of many photosynthetic mutants of green algae and vascular plants has provided the basis of these genetic approaches and some of the regulatory factors have been successfully cloned. To date, approx. 20 nuclear loci have been isolated that encode proteins involved in a variety of post-transcriptional steps of chloroplast gene expression. Most recently, direct interactions with plastid RNA probes have been demonstrated for HCF152 and Tab2 [15,16].
A group of nuclear-encoded proteins that are involved in chloroplast gene expression have homology with enzymes involved in RNA maturation processes (such as peptidyl-tRNA hydrolase, pyridoxamine 5′-phosphate oxidase and pseudo-uridine synthase). However, these enzymic activities do not seem to be required for their function in chloroplasts. Another group shares homology with proteins of other organisms, but has no obvious characterized motifs. Tab2 is required for the translation of chloroplast psaB mRNA, and displays 31–46% sequence identity with several orthologues found only in eukaryotic and prokaryotic organisms that perform oxygenic photosynthesis. A direct and specific interaction has been identified between Tab2 and the psaB 5′-untranslated region, although Tab2 has no known RNA-binding motifs. The chloroplast RNA-splicing factors, CRS1 and CRS2, are required for the splicing of group II introns in the maize chloroplast [17,18]. CRS1, which is required solely for the splicing of atpF pre-mRNA, is a founding member of a family of plant proteins that contain a novel RNA-binding domain of ancient origin, designated CRM (chloroplast RNA splicing and ribosome maturation). CRS2 is required for the splicing of nine of the ten chloroplast RNAs in subgroup IIB. CRS2 is closely related to bacterial peptidyl-tRNA hydrolase. Subsequent studies identified CRS2-associated factors 1 and 2 (CAF1 and CAF2). These are members of a protein family in maize that includes the previously identified group II intron-splicing factor, CRS1. The similarity between these proteins is confined to repeated segments corresponding to the CRM domain initially identified in CRS1. In vivo, CRS2–CAF1 and CRS2–CAF2 complexes contain their target group II intron RNAs. Therefore it has been proposed that the CRM domain is an ancient RNA-binding module that has diversified to mediate specific interactions with various highly structured RNAs.
The last group of proteins have repeated motifs of 34–38 amino acids. This group contains Chlamydomonas Mbb1 and Arabidopsis HCF107, which contain TPR (tetratricopeptide repeat) motifs; Chlamydomonas Tbc2, which has uncharacterized 38-amino-acid repeats; and CRP1, HCF152 and PGR3, which have PPR (pentatricopeptide repeat) motifs [15,20,21].
The non-photosynthetic mutant of Arabidopsis, hcf152, is impaired in its processing of the chloroplast polycistronic transcript, psbB-psbT-psbH-petB-petD, resulting in the non-production of the photosynthetic cytochrome b6f complex. In this mutant, reduced amounts of spliced petB RNAs are detected, explaining the observed protein deficiencies. Furthermore, the mutant is affected in the accumulation of transcripts cleaved between the genes psbH and petB. The nucleus-encoded HCF152 gene encodes a PPR protein composed of 12 PPR motifs. The PPR motif, which was defined using a bioinformatics approach, is similar to but distinct from the TPR motif, a 34-amino-acid repeat responsible for protein–protein interactions. Proteins with PPR motifs have been identified in mutants with impaired DNA and RNA metabolism or male sterility. The Arabidopsis genome contains approx. 500 PPR proteins of this family, compared with only a few in yeasts and humans. So far, proteins of the PPR family have not been identified among the Archaea or prokaryotes, including the cyanobacteria, which are believed to be closely related to the chloroplast ancestor (EMBL-EBI proteome database). This observation suggests that this nucleus-encoded protein family has evolved into factors required for organellar gene expression. Indeed, when the 452 PPR proteins were analysed for their localization in the cell, 189 were predicted by the Target P program to occur in the mitochondria and 96 in the chloroplasts.
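The predicted localization counts above can be expressed as fractions of the analysed set. The short Python sketch below does so using only the three numbers given in the text (452 analysed, 189 mitochondrial, 96 chloroplast); the names are illustrative, and the "other or unassigned" category is simply the remainder, an assumption made here because the text does not say how those proteins were classified.

```python
# Counts taken from the text above.
TOTAL_PPR_ANALYSED = 452
PREDICTED_COUNTS = {
    "mitochondria": 189,
    "chloroplast": 96,
}

# Assumption: every protein not predicted for either organelle is lumped
# into one remainder category; the source gives no breakdown for it.
other = TOTAL_PPR_ANALYSED - sum(PREDICTED_COUNTS.values())
counts = {**PREDICTED_COUNTS, "other or unassigned": other}

# Convert each count into a percentage of the analysed set.
percentages = {
    location: 100 * n / TOTAL_PPR_ANALYSED for location, n in counts.items()
}

for location, pct in percentages.items():
    print(f"{location}: {counts[location]} proteins ({pct:.1f}%)")
```

On these numbers, roughly two-fifths of the analysed PPR proteins are predicted to be mitochondrial and about one-fifth chloroplast-localized, so well over half are predicted to be organellar.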
Analyses of the HCF152 strain and E. coli-expressed HCF152 protein revealed the molecular nature of this protein in chloroplast gene expression. HCF152 is a minor stromal protein, with 1000–1500 molecules in each chloroplast, which is less than the number of petB transcripts (Table 1). Two HCF152 polypeptides form a dimer (180 kDa) and no RNA components are included. Recombinant HCF152, as well as native HCF152, binds with high affinity to petB exon–intron junctions, the splicing of which is impaired in the mutant. When truncated proteins composed of different numbers of PPR motifs were analysed for RNA binding, it was found that two PPR motifs are required for RNA binding, albeit with very low affinity. The affinity and site specificity for RNA increased significantly when proteins composed of more PPR motifs were analysed, with the highest and intact activity observed for the full-length protein composed of 12 PPR motifs. Therefore it is suggested that HCF152 is a nuclear-encoded chloroplast RNA-binding protein that may be involved in the processing or stabilization of the petB transcript by binding to exon–intron junctions. Furthermore, the PPR motif is an RNA-binding domain and repetition of the motif seems to determine the affinity and specificity of the binding of the protein to the target RNA sequence. Analysis of the PPR proteins encoded by the Arabidopsis genome identified an average of 11 repetitions of this motif and often 7–16 repetitions were found. The particular amino acid sequence in each PPR motif varies and probably contributes to the RNA-binding properties. Recent analysis of another PPR protein located mainly in the human mitochondria, LRP130 (called BSF in Drosophila), showed that the PPR motifs contribute in a very limited way to the RNA-binding properties in that the deletion of seven out of nine motifs did not change the binding properties of the protein to poly-ribonucleotide homopolymers.
From a structural viewpoint, the PPR protein belongs to a large family of helical repeat proteins [22,27]. The atomic structure of several such proteins has been resolved: β-catenin with 12 Arm motifs of 42 amino acids; the A subunit of protein phosphatase 2A with 15 HEAT motifs of 39 amino acids; Pumilio with eight Puf motifs of 36 amino acids; and protein phosphatase 5 with three TPR motifs of 34 amino acids. These motifs consist of two or three α-helices, with the coiled region between the helices. All helical repeat proteins have a crescent shape. The HEAT, Arm and TPR motifs are known to be responsible for protein–protein interactions and the concave surface of the crescent is the site of these interactions. However, the Puf motif uses the same surface for RNA binding. The structure of Pumilio, when complexed with the substrate RNA, reveals that the RNA is bound to the concave surface of the protein. The repeated nature of the protein allows recognition of a single RNA base by each of the eight repeats using three amino acid side chains. A point mutation in the concave surface of Puf motifs induces the loss of RNA-binding activity or specificity. The repetition of PPR motifs probably allows the use of the same concave surface for interactions with RNA and/or DNA in a similar manner. Furthermore, we propose that each of the 500 PPR proteins in Arabidopsis has a distinct affinity and specificity for organellar RNAs.
The biogenesis of the chloroplast is controlled by numerous nucleus-encoded factors that mediate the cross-talk between the plastid and nuclear genomes by regulating organellar gene expression, mainly at the post-transcriptional level. Recent biochemical and genetic approaches have identified the molecules responsible for the distinct steps of chloroplast post-transcriptional gene expression. The molecular roles of these factors are being revealed. Many of the factors have been found in high-molecular-mass complexes, presumably in the non-ribosomal fraction (Figure 1). Further biochemical and genetic studies will address the co-ordination between gene-specific factors and basic factors in the complex gene expression of the chloroplast.
Post-Transcriptional Regulation of Plant Gene Expression: Focused Meeting held at the University of East Anglia, Norwich, U.K., 15–17 April 2004. Edited by A.J. Michael (Institute of Food Research, Norwich) and John W.S. Brown (Scottish Crop Research Institute, Dundee, U.K.). Sponsored by Ambion, Biotechnology and Biological Sciences Research Council, Daiwa Foundation, The Gatsby Charitable Foundation, The Institute of Food Research, The John Innes Centre, The Scottish Crop Research Institute and VWR International.
Abbreviations: cpRNP, chloroplast ribonucleoprotein; PPR, pentatricopeptide repeat; ssDNA, single-stranded DNA; TPR, tetratricopeptide repeat
- © 2004 The Biochemical Society