## Abstract

In the present article, we review a derivation of the numbers of RNA complexes of an arbitrary topology. These numbers are encoded in the free energy of the Hermitian matrix model with potential *V*(*x*)=*x*^{2}/2−*stx*/(1−*tx*), where *s* and *t* are respective generating parameters for the number of RNA molecules and hydrogen bonds in a given complex. The free energies of this matrix model are computed using the so-called topological recursion, which is a powerful new formalism arising from random matrix theory. These numbers of RNA complexes also have profound meaning in mathematics: they provide the number of chord diagrams of fixed genus with specified numbers of backbones and chords as well as the number of cells in Riemann's moduli spaces for bordered surfaces of fixed topological type.

- free energy
- Hermitian matrix model
- random matrix theory
- RNA complex

In the present article, we review a solution to a problem studied and solved in [1]. Let us consider a complex consisting of an arbitrary number of RNA chains with various nucleotides connected by Watson–Crick bonds both intra- and inter-chain. We seek the number of such topologically inequivalent complexes consisting of a given number of RNA chains and bonds. To explain the topology, it is useful to rephrase this problem in terms of so-called chord diagrams. Chord diagrams comprise a number of so-called backbones, which are represented by disjoint, oriented and labelled intervals lying along a fixed line, that are connected by so-called chords taken as semi-circles lying above this fixed line. Each backbone is identified with the sugar-phosphate backbone (hence the terminology) of a single RNA molecule oriented from its 5′ to its 3′ end. Chords correspond to Watson–Crick base pairs, where we add an associated chord taking care that its end-points in each backbone occur in the correct order corresponding to the primary structure, i.e. the word in the four-letter alphabet of nucleic acids that determines the RNA molecule. End-points of chords lie at distinct interior points of the backbones. In this way, a complex of interacting RNA molecules determines a chord diagram. We assume that nucleotides not participating in base pairs play no role in this model, i.e. there are no isolated vertices, and that chord diagrams are connected, i.e. the corresponding RNA complexes are also connected. Now, distinct isomorphism classes of chord diagrams correspond to the topologically inequivalent configurations we wish to count. An example of an RNA configuration (consisting of a single RNA chain) and the corresponding chord diagram (on one backbone), are shown in Figure 1. We also stress that the results presented in what follows can be applied to various complexes of (bio)polymers, not necessarily RNA complexes. However, for definiteness, we are going to discuss these results just in the RNA context.

It turns out that the counting of RNA complexes, or chord diagrams, can be refined by introducing an additional parameter, which naturally characterizes the topology of these structures. This parameter arises as the genus *g*≥0 of an auxiliary topological surface with boundary which can be associated with a given chord diagram by thickening its backbones and chords, as shown in Figure 2. This surface has some number *r*≥1 of boundary components. If we denote the number of backbones and chords respectively by *b* and *n*, then the Euler characteristic of these auxiliary surface can be expressed as *b*−*n*=2−2*g*−*r*; this relationship can be used in particular to determine the genus *g*. For example, in Figure 2, we have *b*=2 backbones, *n*=4 chords and *r*=4 boundary components, which implies that the corresponding surface has genus *g*=0. Let us remark parenthetically that [2] provides a context-free grammar and polynomial algorithm for computing minimum-free energy configurations of a single RNA molecule while allowing for certain pseudo-knot patterns of arbitrarily high genus that arise by suitably iterating genus one; since the *c _{g}*

_{,b}(

*n*) grows rapidly in

*g*or

*b*for

*n*in the appropriate range, it is unlikely that a similar higher-genus approach is reasonable short of a full-blown and as-yet-unknown field theory for RNA.

To sum up, a given RNA complex or chord diagram is characterized by its number of backbones *b*, its number of chords *n* and its genus *g*. Let us denote the number of topologically inequivalent such diagrams by *c _{g}*

_{,b}(

*n*). These are the numbers we wish to determine. In what follows, we explain how to determine the following generating functions of these numbers (1) using random matrix theory, and in particular, the so-called topological recursion. Before we present how these generating functions can be found, let us provide some examples. In the special case of one backbone and genus

*g*=0, it is not hard to see that

*c*

_{0,1}(

*n*) are Catalan numbers with the generating function (2) so that

*c*

_{0,1}(0)=

*c*

_{0,1}(1)=1,

*c*

_{0,1}(2)=2,

*c*

_{0,1}(3)=5, and so on. More generally, configurations which involve only one backbone can be enumerated by the simplest matrix model with the quadratic (Gaussian) potential [3]. However, this Gaussian model cannot describe configurations involving many backbones, and constructing a model which works for many backbones is an important motivation for our work; we describe this new model below. Using this new model, we find, for example, that generating functions of diagrams in genus zero and one on four backbones with an arbitrary number of chords take the form (3)

This means that *c*_{0,4}(3)=72, *c*_{0,4}(4)=2448, for example. Whereas computation of these numbers by explicit enumeration of chord diagrams is possible for relatively low and fixed *g*, *b* and *n*, as illustrated, e.g. in the appendix in [1], it quickly becomes involved. On the other hand, the random matrix theory allows us to determine generating functions *C _{g}*

_{,b}(

*z*) in an algorithmic way, in principle for all

*g*and

*b*. As yet another example, for four and five backbones, in genus two and three, we find the following generating functions (4)

We can explain now how the above generating functions can be determined from the random matrix theory. As shown in [1], all chord diagrams can be enumerated by the following matrix model, i.e. an integral over Hermitian matrices *H* of size *N*,
(5)
where the so-called matrix model potential is given by
(6)

These expressions should be understood as follows. The integral in eqn (5) with the potential given in eqn (6) arises from the combinatorial analysis of chord diagrams, and its form was found in [1] using well-known techniques in random matrix models, analogous, e.g., to those used in [4]. In the limit of large matrix size *N*→∞, the free energy of this model *F*=log*Z* has the genus expansion in *N* given in the exponent on the right-hand side of eqn (5) (the extra summand −*N*^{2}*s* in the exponential in eqn 5 is purely for convenience). In general, finding the free energy of a matrix model of the above form [with arbitrary potential *V*(*x*)] is a very difficult task. On the other hand, such free energies *F _{g}*, for the potential given in eqn (6), are essentially the objects we are after. Namely, from the combinatorial description discussed in [1], it follows that the free energies

*F*encode the generating functions (eqn 1) (7) where the constant terms reproduce the free energies of the Gaussian matrix model [characterized by the quadratic potential

_{g}*V*(

*x*)=

^{1}/

_{2}

*x*

^{2}], where

*B*

_{2g}denote Bernoulli numbers. The extra factor

*b*! arises because

*c*

_{g}_{,b}(

*n*) counts chord diagrams with labelled backbones as opposed to unlabelled ones as arise in the matrix model description (and where a permutation of backbones must nevertheless preserve backbone orientations). We also see that the genus expansion into free energies

*F*agrees with the genus

_{g}*g*of the auxiliary surfaces described above.

The problem of enumerating RNA complexes therefore reduces to the problem of performing the matrix integral and determining free energies *F _{g}* in eqn (5). As we have already mentioned, in general this is a very difficult problem; however, recently a beautiful algorithmic solution to this problem has been given. To find free energies, one should solve the so-called loop equations of the matrix model, which are equations (known as Ward identities in quantum field theory) satisfied by certain multilinear correlators

*W*

_{n}^{(g)}(

*p*

_{1},…,

*p*) in this model. The leading order equation among those identities specifies a so-called spectral curve, i.e. an algebraic curve which characterizes distribution of eigenvalues in the matrix model in the

_{n}*N*→∞ limit. It also turns out that all correlators

*W*

_{n}^{(g)}(

*p*

_{1},…,

*p*) and loop equations they satisfy can be encoded entirely in terms of this spectral curve. These loop equations can be solved in a recursive way [5,6], and, in this manner, free energies

_{n}*F*(for

_{g}*g*≥2) are completely determined by correlators

*W*

_{1}

^{(g)}(

*p*). This entire procedure requires just the knowledge of the spectral curve, which can be regarded as the initial condition of the recursion (and a universal form of the solution to loop equations) and no other details of a matrix model from which this curve was derived. An important achievement of Eynard and Orantin [7] was to realize that one can use the recursive solution of loop equations to assign correlators

*W*

_{n}^{(g)}(

*p*

_{1},…,

*p*) and

_{n}*F*to an arbitrary algebraic curve, not necessarily of matrix model origin. On the other hand, it is guaranteed that

_{g}*F*computed for the spectral curve of a matrix model reproduce the free energies.

_{g}In order to solve the matrix model (eqn 5) with the potential (eqn 6), we can therefore use the formalism of this topological recursion. This has indeed been done in [1], and the main steps of this solution are as follows. First, we need to determine the spectral curve of the model (eqn 5). This can be done by the analysis of a distribution of eigenvalues in the large *N* limit. Because the potential (eqn 6) is a deformation of the quadratic function, it has a single minimum, and in the equilibrium configuration, eigenvalues spread around this minimum. For large *N*, the eigenvalues are distributed along an interval with end-points *a* and *b*, which defines a cut in a certain auxiliary complex plane. Such a one-cut solution defines the corresponding spectral curve which has genus zero, and it turns out to be given by the following algebraic equation for two complex variables *x* and *y*:
(8)
where
(9)

Although the end-points of the cut *a* and *b* cannot be given in a closed form, it can be found that they are determined by the following system of equations
(10)

From the knowledge of the curve (eqn 8) and the formalism of the topological recursion, we can now determine *F _{g}* for

*g*≥2 (

*F*

_{0}and

*F*

_{1}must be determined separately, independently of the recursion). In particular, we find the following exact result for the free energy at genus two: (11) where

We also obtain an exact result for the free energy *F*_{3} which is yet more complicated, with its precise form given in [1]. Expanding these results in the form given in eqn (7) and using the perturbative expansion of *a* and *b* in *s* which follows from eqn (10), we can determine appropriate generating functions *C _{g}*

_{,b}(

*z*), such as those given in eqn (4). This procedure can be continued in an algorithmic manner, and, with sufficient computational power, one can determine exact forms of

*F*for any

_{g}*g*, and so the corresponding

*C*

_{g}_{,b}(

*z*) and finally all

*c*

_{g}_{,b}(

*n*).

To sum up, we have shown how random matrix theory and the topological recursion can be used to enumerate topologically inequivalent RNA complexes. This result opens many other perspectives and research possibilities. On one hand, one can consider asymptotics of the numbers *c _{g}*

_{,b}(

*n*) which we have found, and analyse their statistical properties. One can also compare these theoretical predictions with the structure of RNA configurations observed in Nature. On the other hand, in our discussion, a relationship to matrix models arises naturally if we translate RNA configurations into the form of chord diagrams; therefore

*c*

_{g}_{,b}(

*n*) count the number of diagrams characterized by data (

*g*,

*b*,

*n*). Such chord diagrams arise in many other problems in mathematics and physics (e.g. in knot theory or algebraic geometry) and our results should shed new light on those other fields as well. In particular, a simple transformation of the numbers

*c*

_{g}_{,b}(

*n*) count a subclass of chord diagrams called ‘shapes’, which give the number of cells in the ideal cell decomposition [8] of Riemann's moduli space for a surface of genus

*g*≥0 with

*b*≥1 boundary components provided 2

*g*+

*b*>2. The computation reported in the present paper is therefore at once of significance in computational biology and in geometry, and represents a remarkable confluence of biology, mathematics and physics.

## Funding

This work was supported by the Danish National Research Foundation Center of Excellence grant ‘Center for Quantum Geometry of Moduli Spaces’, the Marie-Curie IOF Fellowship and the Foundation for Polish Science.

## Acknowledgments

P.S. thanks the Isaac Newton Institute for Mathematical Sciences, Cambridge, for hospitality.

## Footnotes

Topological Aspects of DNA Function and Protein Folding: An Independent Meeting held at the Isaac Newton Institute for Mathematical Sciences, Cambridge, U.K., 3–7 September 2012, as part of the Isaac Newton Institute Programme Topological Dynamics in the Physical and Biological Sciences (16 July–21 December 2012). Organized and Edited by Andrew Bates (University of Liverpool, U.K.), Dorothy Buck (Imperial College London, U.K.), Sarah Harris (University of Leeds, U.K.), Andrzej Stasiak (University of Lausanne, Switzerland) and De Witt Sumners (Florida State University, U.S.A.).

- © The Authors Journal compilation © 2013 Biochemical Society