## Abstract

The formation of DNA loops is a ubiquitous theme in biological processes, including DNA replication, recombination and repair, and gene regulation. These loops are mediated by proteins bound at specific sites along the contour of a single DNA molecule, in some cases many thousands of base pairs apart. Loop formation incurs a thermodynamic cost that is a sensitive function of the length of looped DNA as well as the geometry and elastic properties of the DNA-bound protein. The free energy of DNA looping is logarithmically related to a generalization of the Jacobson–Stockmayer factor for DNA cyclization, termed the *J* factor. In the present article, we review the thermodynamic origins of this quantity, discuss how it is measured experimentally and connect the macroscopic interpretation of the *J* factor with a statistical-mechanical description of DNA looping and cyclization.

- cyclization kinetics
- DNA elasticity
- *J* factor
- partition function
- site-specific recombination

## Introduction

Simultaneous binding of multiple protein factors to two or more DNA sequences is a common aspect of gene expression, genetic recombination, DNA replication and DNA repair. The target sites involved in these interactions are frequently located at considerable distances apart along the same DNA molecule, which requires that the intervening DNA form a looped structure. Looped nucleoprotein assemblies, for example some gene-regulatory complexes in mammalian systems [1–3], can involve many proteins and also segments of DNA that are thousands of base pairs long. Such systems are challenging enough to characterize *in vitro*; having an accurate *in vivo* picture of these processes requires also knowing how DNA flexibility and folding are affected by DNA-bound architectural proteins in prokaryotes [4,5] or chromatin organization [6] and the binding of non-histone proteins [7,8] in eukaryotes. Advances in our quantitative understanding of DNA loop formation therefore depend on detailed consideration of the mechanics of loop formation that takes into account DNA elasticity, local and global aspects of DNA organization, and the plasticity of protein–DNA and protein–protein interactions.

In the present article, we address the thermodynamic cost, in terms of free energy, of forming specific DNA loop conformations. The overall free energy of assembling a particular protein-mediated loop amounts to the sum of the free energies for protein–protein interactions, protein–DNA binding and DNA distortion through wrapping, bending and looping. The protein–protein and protein–DNA terms are typically regarded as independent and separable from that of DNA distortion, although these assumptions can clearly be questioned in the context of a chromatin environment. We focus in the present article on the free energy cost of DNA looping in terms of the statistical distribution of polymer conformations. Mechanical constraints applied to DNA segments at the loop ends strongly perturb the polymer statistics, and finding a general solution is a challenging problem in statistical mechanics. However, treating the looping problem as a generalization of the more extensively analysed case of DNA cyclization has helped to illuminate general principles [9]. In particular, the DNA-size- and helical-phase-dependence of loop formation can be understood in terms of the size and conformation of the protein complex that mediates looping.

The dependence of the free energy of loop formation on DNA size contains non-trivial enthalpic and entropic contributions [9,10]. The behaviour of looped domains involving DNA contour lengths smaller than the persistence length, *P* (~150 bp or 50 nm at moderate ionic strength), is dominated by the bending and twisting rigidity of the double helix and thus the free energy cost of loop formation is largely enthalpic. In contrast, loops that are much larger than *P* incur minimal enthalpic cost, but are entropically unfavourable. Effects of DNA tertiary structure and topology on looping such as supercoiling or knotting of DNA domains are not always considered, but remain nonetheless important.

## The thermodynamics of polymer cyclization

The problem of polymer cyclization was first considered by Jacobson and Stockmayer [11], who formulated the free energy cost of cyclization in terms of a ratio of respective equilibrium constants for formation of a circular polymer chain from a linear monomer, *K*_{c}, and conversion of two linear monomers into a linear dimer, *K*_{d}. This ratio is commonly known as the Jacobson–Stockmayer factor, *J*, also called the *J* factor,
$$J = \frac{K_c}{K_d} = \exp\left[-\frac{\Delta G_c^0 - \Delta G_b^0}{RT}\right] \tag{1}$$
where Δ*G*_{c}^{0} and Δ*G*_{b}^{0} are the respective Gibbs free energy changes in the standard state for the cyclization and dimerization reactions. Equivalently, *J* is the equilibrium constant for forming a circular and a linear monomer from a linear dimer [12]. By including the free energy change for the dimerization reaction in eqn (1), the contribution of bond formation between terminal residues is subtracted from the overall cyclization free energy. Thus *J* specifically quantifies the effect on the global free energy of the polymer of constraining the chain to a circular conformation.
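As a numerical illustration of the logarithmic relationship between *J* and free energy, the following minimal Python sketch converts a *J* value into the free-energy difference Δ*G*_{c}^{0} − Δ*G*_{b}^{0} = −*RT* ln *J* and back. It assumes that *J* is expressed in mol/l relative to a 1 M standard state; the function names and the default temperature of 298.15 K are illustrative choices rather than part of the original treatment.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J mol^-1 K^-1


def looping_free_energy(j_molar, temperature=298.15):
    """Return dG_c - dG_b (J/mol) for a J factor given in mol/l.

    Assumes a 1 M standard state, so that the argument of the
    logarithm is dimensionless.
    """
    if j_molar <= 0:
        raise ValueError("J must be positive")
    return -R_GAS * temperature * math.log(j_molar)


def j_from_free_energy(delta_g, temperature=298.15):
    """Inverse relation: J (mol/l) from dG_c - dG_b (J/mol)."""
    return math.exp(-delta_g / (R_GAS * temperature))
```

Under these assumptions, a *J* value of 10^{−8} M corresponds to a conformational free-energy cost of roughly 46 kJ/mol at 298 K, and each 10-fold decrease in *J* adds approximately 5.7 kJ/mol.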

For a random-flights polymer, the orientations of successive chain segments are independent, and the effect of the end constraints involves only the confinement of both ends of the chain to a common volume element, δ*V*. Then,
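$$\Delta G_c^0 - \Delta G_b^0 = \left\{\Delta G^* - RT\ln\left[W(0)\,\delta V\right]\right\} - \left\{\Delta G^* - RT\ln\left[N_{\mathrm{Av}}\,\delta V\right]\right\} = -RT\ln\left[\frac{W(0)}{N_{\mathrm{Av}}}\right] \tag{2}$$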
where *W*(0) is the probability density for the end-to-end distance *h* (see Figure 1) evaluated at *h*=0, *N*_{Av} is Avogadro's number, and Δ*G** is the free energy of bond formation between the first and last residues of the linear chain [13]. It then follows from eqn (1) that
$$J = \frac{W(0)}{N_{\mathrm{Av}}} \tag{3}$$
which connects the *J* factor to the statistical fraction of chain conformations having both chain ends within an infinitesimal distance of one another. *J* can therefore also be considered as the effective concentration of one chain end in the vicinity of the other.

In the case of a random-flights polymer, *W*(*h*) has a closed-form solution for arbitrary *h* (including *h*=0), which leads to a simple relationship for *J*,
$$J = \frac{1}{N_{\mathrm{Av}}}\left(\frac{3}{2\pi n \ell^2}\right)^{3/2} \tag{4}$$
where *n* is the number of chain segments of length ℓ. This formula is equivalent to Jacobson and Stockmayer's original expression. Note that the volume element δ*V* does not appear, having been eliminated in eqn (2).
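The random-flights result can be evaluated numerically with a few lines of Python. The sketch below uses the standard Gaussian-chain expression *W*(0) = [3/(2π*n*ℓ^{2})]^{3/2} divided by Avogadro's number, with lengths converted to decimetres so that *J* is returned in mol/l. The function name and the choice of nanometre input units are assumptions made for illustration.

```python
import math

N_AV = 6.02214076e23  # Avogadro's number, mol^-1


def j_random_flight(n_segments, segment_length_nm):
    """J factor (mol/l) of a Gaussian random-flights chain.

    n_segments        -- number of statistical segments, n
    segment_length_nm -- segment length, l, in nanometres
    """
    if n_segments <= 0 or segment_length_nm <= 0:
        raise ValueError("n and l must be positive")
    # 1 nm = 1e-8 dm, and 1 dm^3 = 1 litre.
    length_dm = segment_length_nm * 1e-8
    # Probability density for zero end-to-end distance, per dm^3.
    w0 = (3.0 / (2.0 * math.pi * n_segments * length_dm ** 2)) ** 1.5
    return w0 / N_AV
```

For example, if a DNA molecule of approx. 3000 bp is idealized as ten statistical segments of 100 nm (twice the persistence length), `j_random_flight(10, 100.0)` gives a value close to 1.7×10^{−8} M. The *n*^{−3/2} scaling means that quadrupling the chain length lowers *J* 8-fold. This idealization ignores bending and twisting rigidity and is therefore only meaningful for chains much longer than *P*.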

In the early 1980s, DNA cyclization measurements on restriction fragments approximately 200–1500 bp in size [14–16] motivated more sophisticated treatments of the ring-closure problem (Figure 1). Advanced methods were necessary in order to account for the effects of bending and twisting rigidity in DNA molecules on this length scale. Twisting rigidity in particular leads to a strong periodic dependence of *J* on the fractional number of helical turns. A perturbation method for homogeneous and isotropically flexible DNA circles was introduced by Shimada and Yamakawa [17] that successfully accounts for helical phasing, bending and twisting rigidities in small circles. Subsequently, Monte Carlo methods became available to compute *J* from ensembles of simulated helical wormlike chains [12,18,19]. An advantage of simulation-based methods over that of Shimada and Yamakawa [17] is that applications are not limited to homogeneous chains or those with uniform bending and twisting rigidities [12].

The statistical-mechanical interpretation of the *J* factor can be generalized further to include effects of limited bending and twisting flexibility. Flory et al. [13] introduced an extension of the Jacobson–Stockmayer theory for RIS (rotational-isomeric-state) models of semi-flexible polymers. Their formula for *J* is a product of *W*(0) with two conditional probability densities evaluated at specific parameter values that define the geometry of chain ends in the cyclized conformation, specifically
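$$J = \frac{4\pi\,W(\mathbf{0})\,\Gamma_{\mathbf{0}}(1)\,\Phi_{\mathbf{0},1}(\tau_0)}{N_{\mathrm{Av}}} \tag{5}$$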
where the spatial probability density *W* is now a function of the end-to-end vector **h** and evaluated at **h**=0, Γ_{**h**}(γ) is the conditional probability density for the scalar product, γ=**l**_{1}·**l**_{n}, of tangent vectors at the ends of the chain, given **h**, and Φ_{**h**,γ}(τ) is the conditional probability density for the twist angle τ given **h** and γ (Figure 1). The last quantity is evaluated at the ‘natural’ twist angle expected for a pair of adjacent segments modulo 2π; for sufficiently long chains, the last factor may need to be replaced by a sum of conditional probability terms to account for the fact that chain closure can sample topological states with different linking numbers or topoisomers [20]. Given the generality of the RIS model, it is clear that this approach is well justified for helical wormlike chains such as DNA. Moreover, because the problem reduces to evaluating a set of probability density functions, *J* can readily be computed from Monte Carlo simulations of cyclized chains [12,18,19,21,22].
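The idea of estimating a probability density from a simulated ensemble can be illustrated with a deliberately simplified Python sketch. It treats only the spatial factor *W*(0), and it does so for a freely jointed (random-flights) chain rather than a helical wormlike chain; the angular factors Γ and Φ would require tracking segment orientations and twist, which is not attempted here. The capture radius, chain length and function names are all illustrative assumptions.

```python
import math
import random


def random_unit_vector(rng):
    """Draw a direction uniformly distributed on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    s = math.sqrt(1.0 - z * z)
    return s * math.cos(phi), s * math.sin(phi), z


def estimate_w0(n_segments, n_chains, capture_radius, seed=0):
    """Monte Carlo estimate of W(0) for unit-length segments.

    Counts chains whose end-to-end distance falls inside a small
    sphere and divides by the sphere volume (units: segment^-3).
    """
    rng = random.Random(seed)
    radius_sq = capture_radius ** 2
    hits = 0
    for _ in range(n_chains):
        x = y = z = 0.0
        for _ in range(n_segments):
            dx, dy, dz = random_unit_vector(rng)
            x += dx
            y += dy
            z += dz
        if x * x + y * y + z * z < radius_sq:
            hits += 1
    volume = 4.0 / 3.0 * math.pi * capture_radius ** 3
    return hits / (n_chains * volume)


def gaussian_w0(n_segments):
    """Gaussian-chain value of W(0) for unit-length segments."""
    return (3.0 / (2.0 * math.pi * n_segments)) ** 1.5
```

With ten segments and a capture radius of one segment length, the estimate is expected to fall somewhat below the Gaussian value, both because the capture sphere is finite and because the Gaussian form is only approximate for short chains. Shrinking the radius reduces the first bias at the cost of fewer captured chains, which is the basic efficiency problem faced by simulations of ring closure.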

## Distinctions between DNA looping and cyclization

Although the boundary conditions imposed on the ends of the chain by cyclization are well specified by the structure and flexibility of DNA molecules, the constraints involved in DNA loop formation are wholly dependent on the geometry of the protein or protein complex that mediates the loop. A search for possible boundary conditions that fit a given set of experimental *J* values is highly inefficient, which makes Monte Carlo simulation impractical for analysing experimental data. The issue of protein flexibility is also usually neglected in considering the thermodynamics of loop formation, typically because little information is available concerning intramolecular motions in proteins and multiprotein complexes.

We developed a method for computing the *J* factor for looped protein–DNA complexes, generalizing an approach used to compute *J* for the cyclization of sequence-dependent DNA circles [23]. The method combines computation of the equilibrium conformation of the DNA circle with subsequent evaluation of statistical thermodynamic quantities using a harmonic approximation [23]. In this model, the DNA conformation is described by parameters defined at dinucleotide steps, i.e. tilt, roll and twist [24], which allows straightforward incorporation of intrinsic or protein-induced DNA curvature at the base-pair level. The method is similar in principle to the approach taken by Shimada and Yamakawa [17] in that the theory takes advantage of small fluctuations around one stable mechanical configuration in small DNA circles (e.g. less than ~1000 bp). We treat the protein subunits mediating the loop as virtual base pairs in the cyclized molecule, forming a connected set of rigid bodies with a limited number of degrees of freedom between the subunits. Once the mechanical equilibrium conformation of the circle is found, fluctuations around the equilibrium conformation are taken into account with the harmonic approximation. The new method is approximately four orders of magnitude more efficient than Monte Carlo simulation and has comparable accuracy [23], making this algorithm suitable for fitting experimental *J* factor data using non-linear least-squares methods [10].

Examining the behaviour of *J* as a function of protein geometry revealed that there are significant quantitative differences between DNA cyclization and looping [9]. These differences are manifested in the amplitude and phase of the dependence of *J* on loop size. Protein-specific geometry and flexibility can couple to the DNA twist in a loop to give unexpected deviations in the periodicity of *J* compared with that expected from cyclization theory. Unlike cyclization, looping can give rise to multiple coexisting conformations that involve the same protein structure but different loop geometries. These details should be considered in analysing DNA loop formation both *in vitro* and *in vivo*.

## Measuring DNA looping

Two particular aspects of looped DNA structures have been exploited in experiments *in vitro* and *in vivo*. One is the co-operative binding of a protein to its two cognate sites, which can be demonstrated by footprinting methods [25]. DNA looping can increase the occupancies of both binding sites; in particular, it can significantly enhance protein association to the lower-affinity site because of the tethering effect of DNA looping. This is believed to be a general mechanism by which many transcription factors recruit RNA polymerases in gene regulation [26,27] (we note, however, that recent data suggest that a more complex paradigm may operate in a chromatin environment [28]). Another hallmark is the helical dependence of loop formation for sufficiently small loops [29,30], which arises because of DNA's limited torsional flexibility and the requirement for correct torsional alignment of the two protein-binding sites.
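The tethering effect can be made concrete with a toy thermodynamic model, sketched below in Python. The model is an assumption introduced here for illustration and is not taken from the cited studies: the high-affinity site is taken to be permanently occupied, and the low-affinity site can be empty, filled by a second protein from solution (statistical weight [P]/*K*_{D}) or filled through loop closure by the protein already bound at the strong site (weight *J*/*K*_{D}), where *J* acts as an effective local concentration.

```python
def weak_site_occupancy(protein_molar, kd_weak_molar, j_molar=0.0):
    """Fractional occupancy of the low-affinity site in a toy model.

    protein_molar -- free protein concentration (mol/l)
    kd_weak_molar -- dissociation constant of the weak site (mol/l)
    j_molar       -- J factor of the loop (mol/l); 0 disables looping
    """
    if kd_weak_molar <= 0:
        raise ValueError("dissociation constant must be positive")
    # Combined statistical weight of the two occupied states.
    weight = (protein_molar + j_molar) / kd_weak_molar
    return weight / (1.0 + weight)
```

In this picture, looping matters whenever *J* exceeds the free protein concentration. With 1 nM protein and a weak-site *K*_{D} of 100 nM, the occupancy is approx. 1% without looping, but rises to approx. 50% if *J* is 100 nM.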

Many methods have been used to directly observe DNA looping *in vitro*, such as scanning-probe [31] and electron [24] microscopy, and single-molecule techniques [32]. *In vivo* assays based on helical dependence, in which the DNA length between two protein-binding sites is varied and excess repression or activation of a reporter gene is measured [29,33], have been a powerful tool in bacterial systems. Several techniques have been developed in the last decade to investigate loop formation in the cells of higher organisms. 3C (chromosome conformation capture) technology and variants thereof [34] make use of non-specific protein–protein cross-linking combined with digestion and religation of protein-bound DNA fragments to identify long-range interactions across complex genomes.

Most of the available techniques give only relative values of *J* because it is not normally possible to determine the value of *K*_{d} that appears in eqn (1) under the exact conditions of the *in vitro* or *in vivo* experiment. The most rigorous and reliable approach for measuring absolute values of *J in vitro* is to determine *K*_{c} and *K*_{d} from measurements of pairs of forward and reverse rate constants for loop formation and dimerization respectively. In enzymatically catalysed reactions such as ligase-dependent cyclization [35–37], the intramolecular and intermolecular reactions are typically monitored in separate experiments. This is because the intermolecular pathway generally requires significantly higher concentrations of DNA substrate and enzyme for the reaction to occur at a measurable rate. After correcting for effects of substrate and enzyme concentration by extrapolation, the equilibrium and rate constants obtained in these experiments are regarded as apparent values.

Recently, we showed that the kinetics of site-specific recombination mediated by the Cre recombinase of bacteriophage P1 gives quantitative measurements of the absolute value of *J* for DNA loops in the size range 870–3050 bp [38]. Recombinase-based measurements of loop formation have an important advantage over ligase-catalysed cyclization in that the reaction does not require free DNA ends. Thus DNA looping assays can be carried out on covalently closed or supercoiled DNA, a strict requirement for direct *J* factor measurements *in vivo*.

The complexity of the recombinase mechanism presents a significant challenge in these measurements. Unlike the ligase reaction, simple Michaelis–Menten enzyme kinetics models do not apply; instead, the recombination kinetics data must be analysed by solving systems of ODEs (ordinary differential equations). The ODEs contain a set of rate constants for elementary recombinase binding and dissociation steps, *k*_{1}, *k*_{−1}, *k*_{2} and *k*_{−2}, in addition to those for site synapsis and recombination, *k*_{3}, *k*_{−3}, *k*_{4} and *k*_{−4} [38]. Unlike the recombinase binding/dissociation steps, the last four rate constants differ for the inter- and intra-molecular mechanisms. The value of *J* is obtained from the quotient *k*_{3}^{(c)}*k*_{−3}^{(b)}/*k*_{−3}^{(c)}*k*_{3}^{(b)} for the synapsis/recombination steps shown in Figure 2. Here *k*_{3}^{(c)} and *k*_{−3}^{(c)} are the apparent forward and reverse rate constants respectively for intramolecular recombination site synapsis, whereas *k*_{3}^{(b)} and *k*_{−3}^{(b)} are the corresponding values for the intermolecular reaction. In practice, all four apparent rate constants, i.e. *k*_{3}, *k*_{−3}, *k*_{4} and *k*_{−4}, need to be determined for both the intra- and inter-molecular reactions. Thus reliable fits to the data require significant numbers of data points for each kinetic curve. In our method, we monitor site synapsis and recombination in real time using the FRET (fluorescence resonance energy transfer) signal obtained from fluorophore-labelled substrate DNAs. A typical analysis fits the four free parameters to between 500 and 1000 data points; overall uncertainties in *J* are conservatively estimated to be in the range 20–50%, which is comparable with those for ligase-catalysed cyclization.
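The two computational ingredients of this analysis, forming *J* from apparent rate constants and integrating a system of kinetic ODEs, can be sketched in Python as follows. The three-state scheme used here (substrate ⇌ synaptic complex ⇌ product) is a hypothetical simplification chosen for illustration; the full mechanism in [38] also includes the recombinase binding and dissociation steps. The function names and the fixed-step fourth-order Runge–Kutta integrator are likewise assumptions rather than the authors' implementation.

```python
def j_from_rate_constants(k3_c, km3_c, k3_b, km3_b):
    """J = (k3_c / km3_c) / (k3_b / km3_b).

    The result is in mol/l when the intermolecular forward rate
    constant k3_b is in M^-1 s^-1 and the others are in s^-1.
    """
    return (k3_c * km3_b) / (km3_c * k3_b)


def derivatives(state, k3, km3, k4, km4):
    """Rate equations for the simplified scheme S <-> Y <-> P."""
    s, y, p = state
    ds = -k3 * s + km3 * y
    dp = k4 * y - km4 * p
    dy = -ds - dp  # total DNA is conserved
    return ds, dy, dp


def rk4_step(state, rates, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shifted(slope, factor):
        return tuple(v + factor * g for v, g in zip(state, slope))

    g1 = derivatives(state, *rates)
    g2 = derivatives(shifted(g1, dt / 2.0), *rates)
    g3 = derivatives(shifted(g2, dt / 2.0), *rates)
    g4 = derivatives(shifted(g3, dt), *rates)
    return tuple(
        v + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for v, a, b, c, d in zip(state, g1, g2, g3, g4)
    )


def integrate(state, rates, dt, n_steps):
    """Integrate from the initial state for n_steps steps of size dt."""
    for _ in range(n_steps):
        state = rk4_step(state, rates, dt)
    return state
```

In a fitting procedure, a routine such as `integrate` would be called repeatedly inside a non-linear least-squares loop that adjusts the rate constants until the simulated time course matches the measured FRET signal. A simple consistency check is that, at long times, the populations approach the ratios *k*_{3}/*k*_{−3} and *k*_{4}/*k*_{−4} required by detailed balance.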

## Statistical-mechanical partition functions and DNA looping

For a system confined to a fixed volume and at constant temperature, there is a direct relationship between *J* and the canonical ensemble partition functions, *Z*_{loop} and *Z*_{lin}, for a loop and the corresponding linear chain respectively [23]:
(6)
Although we use the Gibbs free energies of the two forms as the arguments in the exponential, the correct free energy function is formally the Helmholtz free energy. We consider this difference to be negligible, as is normally the case with systems in condensed phases.

*Z* takes the form
(7)
where *h* is Planck's constant and β=1/(*k*_{B}*T*). Here the Hamiltonian is the sum of kinetic energy terms plus the total potential energy *U*_{tot}, expressed as functions of non-canonical momenta {*p*,*l*} and co-ordinates {*x*,θ}, with Δ the Jacobian for the transformation from canonical to non-canonical variables [39]. The integration is performed over all segments of the chain *k* and the three degrees of freedom for each segment *a*. Transformation from canonical to non-canonical co-ordinates is necessary to facilitate factorization of the kinetic energy contribution to the integral, namely
(8)

This expression for the partition function is trivial to evaluate for the linear chain, but a major challenge to compute for the closed loop because of the presence of multiple non-linear constraints.

A closed form expression is available for *J* in the case of the harmonic approximation. The derivation in terms of the partition function ratio given in eqn (6) is too lengthy to provide in the present article; instead we give the result (for details, see [23])
(9)
where *E*_{s} is the elastic energy of the loop in its minimum-energy conformation. Eqn (9) contains two matrices **A** and **F** whose elements are functions of the elastic constants of the chain and also the first and second derivatives of the non-linear constraint functions with respect to angular parameters evaluated at the mechanical-equilibrium conformation.

The range of applicability of the harmonic approximation has not been rigorously determined, but it is clear that its use becomes problematic for loops that contain significant levels of excess supercoiling. This is because the harmonic approximation exclusively consists of local (i.e. nearest-neighbour) interactions between rigid-body base pairs and protein subunits. The lack of long-range contributions to the chain's potential energy is therefore expected to become a serious limitation for DNA loops much larger than 500 bp, a size regime where multiple topoisomer species become populated during random closure of DNA circles [40].

## Summary

DNA cyclization and loop formation are related processes that can be described by closely similar thermodynamic and statistical-mechanical formalisms. The thermodynamic quantity relevant to both processes is a ratio of equilibrium constants known as the Jacobson–Stockmayer factor or *J* factor. The *J* factor is an extremely useful quantity because it gives the conformational free energy cost of forming a DNA loop independent of the free-energy changes associated with protein–DNA and protein–protein interactions accompanying loop formation.

Most experimental approaches for measuring looping probabilities report on relative, rather than absolute, values of the *J* factor; therefore rigorous measurements of this quantity remain a significant challenge. We summarize a recent approach based on the kinetics of the Cre site-specific recombination reaction that enabled the *J* factor to be measured without the ligation of free DNA ends. This method can be readily implemented with covalently closed DNA molecules, which is a critical requirement for DNA looping measurements *in vivo*. Such advancements in experimental methods for measuring *J* will probably motivate future theoretical and computational approaches for evaluating looping free energies in complex nucleoprotein systems.

## Funding

This work was supported by a grant from the National Institutes of Health (NIH)/National Science Foundation (NSF) Joint Program in Mathematical Biology [grant number DMS-0800929 from the National Science Foundation (to S.D.L.)] and from the National Institutes of Health [grant number 5SC3GM083779-03 (to A.H.)].

## Footnotes

Topological Aspects of DNA Function and Protein Folding: An Independent Meeting held at the Isaac Newton Institute for Mathematical Sciences, Cambridge, U.K., 3–7 September 2012, as part of the Isaac Newton Institute Programme Topological Dynamics in the Physical and Biological Sciences (16 July–21 December 2012). Organized and Edited by Andrew Bates (University of Liverpool, U.K.), Dorothy Buck (Imperial College London, U.K.), Sarah Harris (University of Leeds, U.K.), Andrzej Stasiak (University of Lausanne, Switzerland) and De Witt Sumners (Florida State University, U.S.A.).

**Abbreviations:**
FRET, fluorescence resonance energy transfer;
ODE, ordinary differential equation;
RIS, rotational-isomeric-state

© The Authors Journal compilation © 2013 Biochemical Society