Retroviruses are enveloped plus-strand RNA viruses that can cause cancer, immunodeficiency and neurological disorders in humans and animals. Retroviruses have several unique properties, such as a genomic RNA in a dimeric form found in the virus, and a replication strategy called ‘copy-and-paste’ during which the plus-strand genomic RNA is converted into a double-stranded DNA, subsequently integrated into the cellular genome. Two essential viral enzymes, reverse transcriptase (RT) and integrase (IN), direct this ‘copy-and-paste’ replication. RT copies the genomic RNA generating the double-stranded proviral DNA, while IN catalyzes proviral DNA integration into the cellular DNA, then called the provirus. In that context, a major component of the virion core, the nucleocapsid protein (NC), was found to be a potent nucleic-acid chaperone that assists RT during the conversion of the genomic RNA into proviral DNA. Here we briefly review the interplay of NC with viral nucleic-acids, which enables rapid and faithful folding and hybridization of complementary sequences, and with active RT, thus providing assistance to the synthesis of the complete proviral DNA. Because of its multiple roles in retrovirus replication, NC could be viewed as a two-faced Janus-chaperone acting on viral nucleic-acids and enzymes.
Retroviruses constitute a large family of small enveloped plus-strand RNA viruses (Figure 1A) that are widespread in eukaryotes and can cause cancer, immunodeficiency and neurological disorders in humans and animals [1–4]. Retroviruses have unique properties, such as a genomic RNA in a dimeric form, where two molecules of full-length viral RNA (FL-vRNA) with 5′-cap and 3′-polyA structures, similar to cellular mRNAs, are associated as a structured 60S-RNA within the virus core [5–7] (Figure 1B). In addition, the FL-vRNA has long untranslated 5′ and 3′ regions (UTRs) that have functions in the structure, expression [8–10], and replication of the genomic RNA (gRNA). Retroviruses also have a replication strategy called ‘copy-and-paste’ during which the plus-strand gRNA is converted into a double-stranded DNA, subsequently integrated into the cellular genome. Two essential viral enzymes, reverse transcriptase (RT) and integrase (IN), direct this ‘copy-and-paste’ replication strategy. RT copies the gRNA generating the double-stranded DNA flanked by the long terminal repeats (LTR) called the proviral DNA [11–14], while IN catalyzes proviral DNA integration into the cellular DNA, then called the provirus [15,16]. In that context, a major component of the virion core, the nucleocapsid protein (NC), originally described as a nucleic-acid binding protein (NABP), was shown to be a potent chaperone providing assistance to the formation of the genomic RNA in a dimeric condensed form and its reverse transcription [17–20]. Here we briefly review the interplay of NC with the viral nucleic-acids (NA), which chaperones rapid and faithful folding and hybridization of complementary sequences, and with active RT during the reverse transcription process with the possible occurrence of recombinations. In view of the multiple roles of NC in virus replication, we propose that NC could be a two-faced Janus-chaperone acting on both viral nucleic-acids and enzymes.
Nucleocapsid protein is an essential factor for retrovirus replication
Nucleocapsid is a major component of the virus core
In the interior of the core, hundreds of nucleocapsid (NC) molecules are tightly associated with the dimeric RNA genome in a condensed form (Figure 1B) [21,22]. NC is a highly conserved viral protein with small zinc fingers (ZnFs) flanked by basic domains, which tightly binds nucleic acids (NAs). Also, NC has potent NA chaperoning activities that promote key conformation changes and interactions of the viral NA during reverse transcription (see Figure 2, steps B, D and E) [24–26] and assembly, and, as such, are required for virus infectivity [24,27]. All these viral components, namely the dimeric FL vRNA, the reverse transcriptase (RT) and integrase (IN) enzymes, and the structural NC protein are found within the core structure in the infectious spherical viral particle 110–130 nm in diameter (Figure 1A,B). These proteins and NA components are considered to form the replicating genomic ribonucleoprotein complex or nucleocapsid, where the dimeric FL vRNA is coated by ∼1800–2000 molecules of NC, together with 60–80 molecules of the RT and IN enzymes [29,30]. In addition, molecules of cellular RNA are present in the nucleocapsid, notably the replication primer tRNA and other tRNAs and ribosomal RNAs [24,31]. Little is known about the architecture of the viral nucleocapsid, except that the coating of the dimeric RNA genome by NC molecules (every 8 nt on average [21,34,32]) has key functional implications:
(i) formation of a condensed viral RNP where the genomic RNA (gRNA) becomes, at least in part, resistant to cellular nucleases, (ii) a molecular crowding phenomenon which facilitates viral DNA synthesis by the RT enzymes [34–36] and (iii) the possible binding of NC to the newly made proviral DNA which in combination with IN molecules ensures protection of the long terminal repeat (LTR) ends of the proviral DNA and its integration into the host genome [37–40].
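The coating figures quoted above can be cross-checked with simple arithmetic. The sketch below is only an illustration: it multiplies the NC copy number (∼1800–2000) by the average spacing of one NC per 8 nt and splits the result over the two strands of the dimeric genome. The function names are hypothetical, and treating the coating as uniform and restricted to the two gRNA molecules is a simplifying assumption (cellular RNAs in the core are ignored).

```python
def coated_nucleotides(nc_copies: int, nt_per_nc: int = 8) -> int:
    """Nucleotides covered if each NC molecule occupies nt_per_nc nucleotides."""
    return nc_copies * nt_per_nc


def per_strand(total_nt: int, strands: int = 2) -> float:
    """Share of the coated nucleotides per RNA molecule of the dimeric genome."""
    return total_nt / strands


# Copy numbers and the 8-nt spacing are the values quoted in the text.
for copies in (1800, 2000):
    total = coated_nucleotides(copies)
    print(f"{copies} NC -> {total} nt coated, {per_strand(total):.0f} nt per strand")
```

Under these assumptions the quoted copy numbers cover roughly 14,400–16,000 nt in total, i.e. 7,200–8,000 nt per RNA molecule, which is of the same order as a retroviral genome of several kilobases.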
NC possibly drives formation of the capsid shell
Another important aspect of NC, still a matter of speculation, concerns its possible interaction with the capsid protein (CA) and the functional implications of this interaction. In fact, the viral RNP complex has been proposed to drive the initial association of CA molecules, followed by polymerization and formation of a sheet surrounding the RNP. In mature virions, the RNP is embedded in the virion core formed of a lattice of CA hexamers that play prominent roles during the early phase of virus replication, from entry to proviral DNA integration [30,35]. CA is indeed engaged in multiple interactions with host cell cofactors, such as Cyclophilin A, CPSF6 and the nucleoporin NUP153, impacting on the levels of viral DNA synthesis and targeted integration. The multifunctional nature of CA is further attested by its roles in escaping the restriction factor TRIM5α and in the immune evasion of human immunodeficiency virus type 1 (HIV-1). Therefore, the capsid with its lattice could be viewed as a molecular ensemble protecting and chaperoning the RNP complex in its journey from core entry into naive cells to the conversion of the gRNA into the double-stranded proviral DNA and its integration into active sites of the chromatin located at or near nuclear pores. Given the overall structure of the core, NC may also participate in the immune evasion of HIV-1 due to the tight binding of a large number of NC molecules to the viral NAs in the incoming particle and during reverse transcription; this, in turn, may well hide the viral NAs from the immune sensors during the early phase of HIV replication [45–47].
Core maturation activates the reverse transcription complex
A highly simplified scheme of virus assembly stipulates that the newly made Gag molecules first bind the gRNA via the NC domain, which has little chaperoning activity in the Gag context [48,49]. This, in turn, causes dimerization of the gRNA acting as a scaffold, Gag oligomerization and GagPol incorporation. The late phase of virus assembly takes place at the plasma membrane with the recruitment of the viral envelope and a step-by-step processing of Gag and GagPol precursors by the protease (PR). This results in the generation of mature active Pol enzymes and Gag structural proteins, notably NC causing core condensation and remodeling (see ‘The interplay of NC protein with NAs’, iv) [35,50–52].
The mature active NC protein is formed of either one (γ-retroviruses such as MuLV) or two copies (lentiviruses such as HIV-1) of a folded CCHC ZnF motif (Figure 3A), flanked by unfolded basic domains [53,54]. Along this line, we discuss how the multiple interactions of NC with NAs could account for its seminal roles in proviral DNA synthesis [17,33,55–57]. Next, we briefly describe the roles of NC in the reverse transcription of the FL vRNA by RT, notably how NC assists RT from primer tRNA-promoted initiation of minus-strand cDNA synthesis to the completion of a functional proviral DNA flanked by the LTR and its maintenance in the infected cell (Figure 2).
The interplay of NC protein with NAs
The binding of NC protein to NAs has several important structural consequences as listed below. NC is an NA-binding protein that binds NAs via several modes: through basic residues to any single-stranded molecule with affinities in the micromolar range [27,59–62], and through hydrophobic interactions between the NC zinc fingers and TGG/UGG-containing sequences with nanomolar affinities [63–66], where the G residues play a central role in NA chaperoning. Upon binding, NC adopts an antiparallel orientation with respect to the RNA, while it is parallel in the case of DNA.
NC function in virus replication is correlated with its ability to act as an NA chaperone, which, as noted above, catalyzes NA conformational changes that result in the most thermodynamically stable structures [27,65,66]. This NA-chaperoning activity can be divided into the following steps:
(i) NC promotes fraying of an NA hairpin structure. RNA molecules are highly flexible and have the capacity to form a hairpin-like structure at their 3′-end that could be recognized by RT to prime reverse transcription in vitro. Interestingly, binding of NC molecules to the TAR RNA hairpin structure alters the entire RNA folding, inducing its destabilization [67,68], which in turn prevents self-priming of reverse transcription [18,69] and contributes to the fidelity of viral DNA synthesis (Figure 2).
(ii) NC directs annealing of NAs with complementary sequences. This rapid reaction takes place under physiological conditions through several modes. In fact, NCp7 can hybridize cTAR to dTAR [67,70] and TAR to dTAR by a stem–stem zipper process (Figures 2D and 3A), whereas (+)PBS/(−)PBS annealing occurs through a loop–loop intermediate pathway (Figures 2H and 3B) [72,73]. Thus, different mechanisms are involved in this NC-NA annealer activity during the two obligatory strand transfers, 5′ to 3′ and 3′ to 5′, in the course of reverse transcription (see Figure 2).
(iii) NC assists NA rearrangements. Single-stranded NA molecules are flexible and can thus adopt a large number of conformations, but only one or a few are functionally relevant. The functional conformation, thought to be the most stable one, is gained by NA structure remodeling by NC [25,26]. This folding process appears to be achieved by an entropy exchange mechanism between NC molecules and the NA, resulting from rapid ON/OFF binding kinetics.
(iv) NC causes molecular aggregation and crowding. Binding of NC to NA results in the formation of nucleoprotein complexes that rapidly become large aggregates as seen by electron microscopy. Such NC-based complexes were also found to recruit the viral enzymes RT and IN [36,77]. An important consequence of nucleoprotein formation is that very high concentrations of the components are reached. This in turn causes molecular crowding that is considered to take place in living cells; this phenomenon can dramatically influence molecular interactions and biochemical reactions, notably viral DNA synthesis by RT [34,36,78,79].
According to in vitro assays, the NA-chaperoning properties of NC are remarkably conserved among retroviruses such as HIV, FIV, MuLV and RSV, except for the low activity of NC of HTLV-1, which also poorly replicates in vivo [40,80].
The multiple roles of NC in proviral DNA synthesis
The roles of NC in proviral DNA synthesis by the viral RT enzyme, which is central to the replication of all retroviruses, are shown in Figure 2. This complex process takes place during the early phase of virus replication, from viral core entry into the infected cell to the completion of proviral DNA, and its integration into the host genome. However, data on α- and γ-retroviruses and on lentiviruses show that cDNA synthesis may well start at the end of virus morphogenesis, possibly during core maturation and condensation during which mature NC is generated upon PR-mediated Gag processing [82–84]. Therefore, mature retroviral particles may contain newly made minus-strand cDNA (up to 5–10%) in addition to RNA. In the case of HIV-1, it appears that basic peptides present in the seminal fluid can stimulate this late cDNA synthesis and concomitantly virus infectivity in primary CD4+ T cells [18,85].
A detailed model of reverse transcription with its crucial aspects was first published in 1979. Ten years later, the key roles of NC in reverse transcription became the subject of several publications (reviewed in ref. ). Based on the initial model, each step of proviral DNA synthesis has been studied in depth in vitro and ex vivo, notably the two obligatory DNA-strand transfers as well as recombination events. More recently, the core shell formed of CA hexamers (Figure 1) was found to provide protection to the replicative machine, the nucleocapsid. It was also found that completion of proviral DNA synthesis required core dissociation, a process called uncoating in newly infected cells, reminiscent of the requirement for detergents, such as NP40, for the synthesis of infectious viral DNA by purified MoMuLV (Moloney murine leukemia virus) virions.
Each step of reverse transcription is illustrated in Figure 2, focusing on the role of NC in a simple manner where only one intact FL vRNA molecule is represented without NC and RT molecules.
Figure 2, steps A and B show that NC chaperones the formation of a specific reverse transcription initiation complex. The cellular replication primer tRNA is annealed to the genomic primer-binding site (PBS) in a two-step process, whereby it is placed by the NC region of Gag polyprotein during virus assembly and subsequently annealed by NC concomitantly with PR-mediated maturation and core condensation [56,57]. During this annealing reaction and later on, there is a complex interplay between NC molecules, primer tRNA and the genomic 5′-UTR region, notably involving the PBS and the primer tRNA 3′-end and the highly modified anticodon loop in vitro. Several factors appear to facilitate primer tRNA recognition by the RT enzyme such as its affinity for both NC and tRNA [17,34] and its large excess within the nucleocapsid structure together with the estimated very high concentrations of NC and RNA nucleotides in the order of 0.1 and 1 M, respectively. At the same time, other factors might influence the fidelity of reverse transcription initiation such as breaks in the gRNA, where hairpin-like structures could be used in a self-priming reaction. However, the ability of NC to bind and unwind such structures appears to extensively prevent this initiation reaction in vitro.
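The very high concentrations quoted above (about 0.1 M NC and 1 M RNA nucleotides) can be related to copy numbers by elementary arithmetic. The sketch below is illustrative only: it takes the upper NC copy number given earlier (∼2000 molecules) together with 0.1 M, derives the volume these two figures imply, and then asks how many nucleotides 1 M would correspond to in that same volume. The function names are hypothetical, and treating the nucleocapsid as a single well-mixed compartment is an assumption made here, not a statement from the cited studies.

```python
AVOGADRO = 6.02214076e23  # molecules per mole
NM3_PER_LITRE = 1e24      # 1 L = 1e-3 m^3 = 1e24 nm^3


def implied_volume_litres(copies: float, molar: float) -> float:
    """Volume in which `copies` molecules give a concentration of `molar` (mol/L)."""
    return copies / (AVOGADRO * molar)


def copies_at(molar: float, volume_litres: float) -> float:
    """Number of molecules present at `molar` (mol/L) in `volume_litres`."""
    return molar * AVOGADRO * volume_litres


# ~2000 NC molecules at ~0.1 M (values quoted in the text).
volume = implied_volume_litres(2000, 0.1)
print(f"implied volume: {volume * NM3_PER_LITRE:.3g} nm^3")

# Nucleotides corresponding to ~1 M in the same volume.
print(f"nucleotides at 1 M: {copies_at(1.0, volume):.0f}")
```

With these inputs the implied volume is about 3.3 × 10^4 nm^3, and 1 M nucleotides corresponds to about 20,000 nt, i.e. two RNA molecules of roughly 10 kb each, so the two quoted concentrations are mutually consistent with a dimeric genome.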
Figure 2, steps C and D show that NC promotes minus-strand strong-stop cDNA [ssscDNA(−)] synthesis and transfer. RT initiates ssscDNA(−) synthesis by elongation of the CCA 3′-end of primer tRNA and continues until the 5′-end of R is reached. Concomitant with ssscDNA(−) synthesis, the gRNA template is progressively degraded by the RNaseH due to the formation of a DNA:RNA hybrid during reverse transcription. NC was found to augment the RT-RNaseH activity and to facilitate the release of the RNA oligonucleotides (∼18, 8 and 5 nt in length for HIV) generated by RNaseH. This in turn will generate a single-stranded ssscDNA(−), which will be hybridized to the genomic 3′-end via the RNA R(+) and the ssscDNA R(−) sequences (see Figures 2 and 3A; panels A–C) [17,24]. In the case of HIV and ASLV, 5′–3′ genomic interactions are thought to facilitate the ssscDNA(−) transfer [86–88]. This NC-promoted hybridization reaction is very rapid and allows RT to resume reverse transcription at the 3′-end of U3 until the 3′-end of the genomic PBS is reached.
Figure 2, steps E and F show that NC promotes plus-strand strong-stop DNA [sssDNA(+)] synthesis. RT synthesizes the cDNA(−), which generates a long RNA:DNA hybrid where the RNA template is evenly degraded by RT-RNaseH giving rise to oligonucleotides that could potentially serve as primers for DNA(+) synthesis. However, only one sequence corresponding to a small purine-rich sequence called the PPT (polypurine tract) located 5′ to the U3 region is currently used to prime plus-strand DNA synthesis. NC appears to be responsible for such a specificity of plus-strand DNA initiation at the PPT [89,90], since in its absence multiple DNA(+) initiations can take place at non-PPT sites in vitro.
Synthesis of the plus-strand sssDNA(+) (gray line) initiated at the PPT and arrested at the methylated A residue at position 58 of primer tRNA results in the formation of a double-stranded DNA encompassing U3, R and U5, thus corresponding to the 3′-LTR. This also causes the release of the remaining part of primer tRNA by the RT-associated RNaseH activity.
Figure 2, steps G and H show that NC assists plus-strand DNA transfer. This plus-strand DNA transfer reaction is initiated by a loop–loop interaction involving the PBS plus and minus sequences. Then, NC causes a rapid hybridization giving rise to a double-stranded PBS (see Figure 3B, panel B) that allows two RT-directed polymerization reactions to take place, extension of the plus-strand DNA on the one hand and 5′-LTR DNA synthesis by minus-strand displacement on the other hand (reviewed in ref. ).
Figure 2, steps I and J show that NC ensures completion of proviral DNA synthesis.
The plus-strand DNA is completed by RT, which results in the formation of the double-stranded proviral DNA flanked by two LTRs. The proviral DNA is found in the preintegration complex in the newly infected cell, imported into the nucleus and ultimately integrated by the IN enzyme into the host genome. This process is facilitated by NC, and necessitates both the ZnF and the basic residues and formation of a nucleoprotein complex in vitro. This chaperoning role of NC in integration appears to be remarkably conserved among retroviral NC proteins.
Thus, NC is both a major structural component of the interior of the retroviral particle and a key factor exerting a tight control over the synthesis of a bona fide proviral DNA by the RT enzyme, from tRNA-primed initiation to completion; in the case of HIV-1, this complete process, from virus entry to proviral DNA integration, requires from 14–16 h in cell lines to 24–36 h in activated primary human CD4+ T cells.
Implication of NC in genetic variability
As an indispensable cofactor of reverse transcription, NC is also implicated in the genetic variability of retroviruses that is generated by recombination events during the reverse transcription of heterozygous genomes. In fact, one salient feature of retroviruses is the dimeric nature of the RNA genome, which can be homozygous with two identical FL vRNAs or heterozygous with two distinct FL vRNAs. As a matter of fact, HIV-1-infected cells harbor several distinct integrated proviruses and thus can produce both homozygous and heterozygous virions, corresponding to a virus population formed of quasispecies. Once cells are infected by virus quasispecies, proviral DNA synthesis by RT takes place during which frequent DNA-strand transfer events can occur resulting in template switching, ∼5–15 times per cycle. Recombination results from the alternate reverse transcription by RT of the two distinct templates constituting the heterozygous genome. The basic mechanism relies on the ability of RT to switch template (Figure 4), especially at the level of pause sites corresponding to RNA break points and/or stable secondary structures (Figure 4A). At the same time, point mutations appear to occur at the site of cDNA transfer, thus contributing to the genetic variability of the newly made proviral DNA (reviewed in ref. ).
Both cDNA transfer and nucleotide misincorporation are facilitated by NC protein by means of its annealing and chaperoning activities and its interaction with RT causing a notable increase in template residency and in RT-RNaseH activity (Figure 4B,C). These recombination events result in the efficient generation of virus diversity [94,95,97], a major issue vis-à-vis anti-HIV therapies (HAARTs) and the immune response.
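The template-switching mechanism described above can be illustrated with a toy model. The sketch below is not a simulation of RT: it simply copies position by position from one of two equal-length templates and changes template at chosen switch points, which is enough to show how a heterozygous genome yields a mosaic product. The function name, the marker sequences and the switch positions are hypothetical, and strand polarity, pausing, RNaseH cleavage and misincorporation are all ignored.

```python
from typing import Iterable


def copy_with_switches(template_a: str, template_b: str,
                       switch_points: Iterable[int]) -> str:
    """Copy along two aligned templates, changing template at each switch point.

    Copying starts on template_a; whenever the current position is listed in
    switch_points, the copy continues on the other template from that position.
    """
    if len(template_a) != len(template_b):
        raise ValueError("templates must be aligned and of equal length")
    templates = (template_a, template_b)
    switches = set(switch_points)
    current = 0  # index of the template being copied
    product = []
    for pos in range(len(template_a)):
        if pos in switches:
            current = 1 - current  # jump to the other template
        product.append(templates[current][pos])
    return "".join(product)


# Two marker templates make the origin of each copied position visible.
mosaic = copy_with_switches("A" * 10, "B" * 10, switch_points=[3, 7])
print(mosaic)  # AAABBBBAAA
```

With no switch points the product is identical to the first template, as expected for a homozygous genome; the ∼5–15 switches per cycle quoted in the text would correspond to a product made of roughly 6–16 alternating segments in this simplified picture.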
Functional interactions between NC and the viral enzymes RT and IN
(i) NC–RT interactions have an impact on the fidelity of viral DNA synthesis. As summarized above, the interplay of NC with the viral NAs directs the specific initiation of (−)- and (+)-strand DNA synthesis and at the same time prevents false initiations by RT. Moreover, NC was found to interact with the active RT enzyme, impacting on the fidelity of DNA synthesis [77,98]. Formation of an RT/NC complex involves the palm and the zinc fingers, respectively, in the case of HIV-1. This results in a large increase in the time of residence of active RT molecules on the viral RNA template to the benefit of polymerization processivity, while at the same time eliciting some nucleotide excision-repair activity of RT, thus influencing the fidelity of cDNA synthesis.
(ii) NC–IN interactions influence proviral DNA integration. Before undergoing integration by the viral integrase, the newly made proviral DNA should remain intact, notably the terminal inverted repeats (IRs; Figure 2J), because they are specifically targeted by IN for the integration reaction. The integrity of the LTR and IR of the proviral DNA is ensured by the binding of NC molecules together with active tetrameric molecules of IN in vitro. In support of this, mutations in HIV-1 NC, notably those disrupting the conserved CCHC ZnFs (see below), result in the trimming of the proviral DNA ends, thus impairing its integration. Also, NC is probably chaperoning the IN enzyme, since it has been reported to greatly facilitate concerted DNA integration in vitro, via interactions with the DNA ends and IN [37,39,40].
(iii) The viral chaperones virus infectivity factor (VIF) and transactivator of transcription (TAT) influence viral DNA synthesis. In HIV-1, two other viral factors with NA-chaperoning activity may directly impact on the reverse transcription process and its variability. The first one is VIF, which was shown to facilitate tRNALys,3 hybridization to the PBS and the ssscDNA(−) transfer [100–102]. In addition, VIF targets and counteracts the cellular restriction factor APOBEC3G through proteasome degradation and/or translation inhibition [104,105]. However, some APOBEC3G can be incorporated into the newly formed virions, which causes dC to dU transitions by a deamination reaction in the newly made minus-strand cDNA. This turns out to be quite common in vivo, thus contributing to the virus genetic variability.
The second viral factor is the TAT transactivator, which is a small basic peptide indispensable for the transcription transactivation of the provirus, via an interaction with the TAR RNA motif. Interestingly, it resembles NC in that it is a member of the large family of intrinsically unstructured proteins with NA-binding and -chaperoning properties, while also being multifunctional. Moreover, TAT and NCp7 were found to act co-operatively in promoting the cTAR/dTAR annealing reaction. Along this line, TAT is considered to be a helper factor for viral DNA synthesis, notably for the strand transfers. In agreement with this notion, a TAT null mutant was recently shown to inhibit the reverse transcription process.
The multiple roles of retroviral NC proteins go far beyond retroviruses, since a similar NC has been found in the yeast retrotransposon Ty3, a founder member of a large group of LTR retrotransposons known as the TY3/GYPSY clade or Metaviridae with a eukaryote-wide distribution. In fact, Ty3 NC is a small protein with a ZnF flanked by unstructured basic sequences, possessing NA binding, annealing, fraying and chaperoning properties that direct the dimerization of the gRNA and the annealing of replication primer tRNAMet,i, and thus is involved in reverse transcription in a manner similar to retroviruses [114–116].
Since the first publications on the implications of NC protein of α- and γ-retroviruses and lentiviruses in proviral DNA synthesis, and as the immature polyprotein precursor GagNC in virion assembly (not reviewed here), nearly 30 years have passed during which NC has emerged as a central player in the early phase of virus replication from entry to the synthesis of a complete bona fide proviral DNA and its integration into the host genome. How NC functions is still poorly understood, except that both high- and low-affinity interactions are taking place between a single NC and a small NA molecule in vitro. However, the nature of the interactions between oligomeric NC, including NC–NC, and RNA/DNA molecules is still a matter of speculation. In addition, how these multiple NC interactions with the viral RNA/DNA and the RT enzyme provide assistance to the synthesis of a complete bona fide proviral DNA is not known. A likely hypothesis would be that NC as an intrinsically disordered protein could adopt many different active conformations according to the substrate NA or enzyme. These NC conformations would co-operate in providing assistance to NA folding and interactions and enzymatic activity. Thus, NC could be viewed as a two-faced Janus chaperone acting on both NA and protein conformations and activities.
The multifunctional nature of HIV NC together with its high degree of conservation strongly argued for the development of specific inhibitors (NCi). A first series of NCis with zinc ejector properties, such as PATE, DIBA, NOBA and SAMT [118,119], were characterized in vitro, but turned out to be toxic, preventing their use in the treatment of AIDS. NC binders that preferentially bind the hydrophobic pocket at the top of the folded finger motif were then developed. These compounds were found highly promising, being endowed with efficient antiviral activity and rather low toxicity. Alternatively, viral NA partners of HIV NC such as stem loops 1–4 involved in gRNA dimerization [122,123] and 5′ TAR (Figure 3A) [124,125] were used to design compounds able to interfere with the NC-chaperoning activity, but these were shown to only modestly inhibit NC chaperone activity in vitro [126,127], with little antiviral activity. In conclusion, NCis are still actively developed, with the prospect of getting sustained antiretroviral activity with a high barrier for resistance.
CA, capsid protein; FL vRNA, full-length viral RNA; gRNA, genomic RNA; HIV-1, human immunodeficiency virus type 1; IN, integrase; IR, inverted repeat; LTR, long terminal repeat; MoMuLV, Moloney murine leukemia virus; NA, nucleic acid; NC, nucleocapsid protein; PBS, primer tRNA-binding site; PPT, polypurine tract; RT, reverse transcriptase; ssscDNA(−), minus-strand strong-stop complementary DNA; TAT, transactivator of transcription; UTR, untranslated region; VIF, virus infectivity factor; ZnF, zinc finger.
Supported by Centre National de la recherche scientifique (CNRS, France), Institut National de la Santé et de la recherche médicale (INSERM, France) and European Project THINPAD ‘Targeting the HIV-1 Nucleocapsid Protein to fight Antiretroviral Drug Resistance’ (FP7-Grant Agreement  to Y.M).
The Authors declare that there are no competing interests associated with the manuscript.
Thanks are due to Julien Godet for fruitful discussion.
- © 2016 The Author(s); published by Portland Press Limited on behalf of the Biochemical Society