Bacterial gene expression is regulated by DNA elements that often lie far apart along the genomic sequence, but come close together during genetic processing. The intervening residues form loops, which are organized by the binding of various proteins. For example, the Escherichia coli Lac repressor protein binds DNA operators, separated by 92 or 401 bp, and suppresses the formation of gene products involved in the metabolism of lactose. The system also includes several highly abundant architectural proteins, such as the histone-like (heat-unstable) HU protein, which severely deform the double helix upon binding. In order to gain a better understanding of how the naturally stiff DNA double helix forms the short loops detected in vivo, we have developed new computational methods to study the effects of various non-specific binding proteins on the three-dimensional configurational properties of DNA sequences. The present article surveys the approach that we use to generate ensembles of spatially constrained protein-decorated DNA structures (minicircles and Lac repressor-mediated loops) and presents some of the insights gained from the correspondence between computation and experiment about the potential contributions of architectural and regulatory proteins to DNA looping and gene expression.
- computer simulation
- DNA looping
- J factor
- lac operon
- Lac repressor
All of the processes necessary for the survival of a living system hinge on its ability to store and read the genetic information encoded in its DNA. The packaging of a long genome into the eukaryotic nucleus or prokaryotic nucleoid is complicated by the necessity of maintaining the accessibility of the DNA for genetic processing. The binding of multiple proteins to DNA plays an important role in reading and compacting the genome. Many regulatory proteins, such as the bacterial repressor proteins, bind two or more widely separated sites along DNA, forcing the intervening sequence into a loop. Other architectural proteins, like the bacterial nucleoid proteins, deform the DNA at isolated sites of contact while concomitantly wrapping the double helix on their surfaces. Whereas the regulatory proteins generally bind specific sequences, the architectural proteins bind in a more sequence-neutral fashion, allowing for compaction of DNA as a whole.
How these proteins work in combination with DNA to control the overall structure and function of the genetic material is beginning to come to light. For example, deletion of the histone-like (heat-unstable) architectural protein HU in Escherichia coli cells reduces the repression of genes controlled by the Lac repressor protein [2–4]. The loss of HU seemingly disables the looping of DNA between the operator sites recognized by the repressor. The architectural protein, which assembles as a dimer, introduces some of the largest known protein-induced deformations of DNA [5,6]. By contrast, the Lac repressor, a complex of four identical proteins, introduces much lesser distortions in the operator sequences attached to its binding headpieces [7,8]. The precise locations of the two headpieces, however, constrain the spatial pathways of the intervening DNA.
The looping of DNA induced by the Lac repressor occurs on a length scale shorter than the natural scale of DNA deformation. The distortions of local double-helical structure found at ambient temperatures, i.e. bends of 5–6° between adjacent base pairs, 4–5° fluctuations in twist and 0.2–0.6 Å (1 Å=0.1 nm) displacements of successive base pairs, lead to spatial arrangements more extended on average than the pathways needed to form the ~100-bp loops detected in vivo. The sharp bends associated with HU binding and the accompanying changes in helical twist and axis dislocation seemingly counteract the propensity of short pieces of DNA to adopt stiff, relatively straight structures. Conventional interpretations of gene expression data in terms of an ideal wormlike representation of DNA attribute the observed levels of gene expression found in wild-type cells to an increase in both chain flexibility and the double-helical repeat [4,10,11]. The key roles of the architectural and repressor proteins in DNA looping and gene expression are lost in such interpretations.
As a first step in deciphering the complex interplay of DNA and different types of proteins during genetic processing, we have developed new methods to simulate both the cyclization of DNA and the looping of the double helix mediated by the Lac repressor protein in the presence of randomly bound HU [12–15]. The present review describes our computational approach and the new insights into DNA ring closure and protein-mediated DNA looping gained from the correspondence between the predicted cyclization/looping propensities and experimental data. The combined effects of HU and repressor on DNA loop formation hint at ways in which multiple proteins may co-ordinate the packaging and processing of the genetic message.
We treat the double helix at the level of base-pair steps, using six rigid-body parameters to specify the arrangements of successive base pairs and a potential that allows for elastic deformations of the long thin molecule from the canonical B-DNA structure. The pathway of protein-free DNA is constructed, one base-pair step at a time, from randomly sampled sets of rigid-body parameters subject to the elastic potential. The protein-bound DNA is treated implicitly in terms of the rigid-body parameters of DNA found in known high-resolution protein–DNA structures. The HU-bound fragments are assigned the sets of parameters that describe the pathways of DNA observed in the crystal complexes with the Anabaena protein. The DNA attached to the Lac repressor is described by the parameters adopted in the high-resolution crystal complex of a stable dimer with a symmetric operator. Both HU and repressor are included as ‘side groups’ of the DNA, i.e. the atoms of protein are expressed in the reference frame of one of the bases in the bound molecular complexes.
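The step-by-step construction of protein-free DNA described above can be illustrated with a minimal sketch. It assumes a purely harmonic potential with no coupling between the six step parameters, so that Boltzmann sampling reduces to drawing each parameter from an independent Gaussian. The rest-state values and the fluctuation widths below are illustrative assumptions, chosen only to fall near the ranges quoted earlier in the text; they are not the parameter set used in the simulations, and the function names are hypothetical.

```python
import random

# Assumed rest state of an idealized B-DNA step (angles in degrees, distances in angstroms).
REST = {"tilt": 0.0, "roll": 0.0, "twist": 34.3,
        "shift": 0.0, "slide": 0.0, "rise": 3.4}

# Assumed root-mean-square fluctuations (illustrative values only).
SIGMA = {"tilt": 5.0, "roll": 5.0, "twist": 4.5,
         "shift": 0.4, "slide": 0.4, "rise": 0.2}


def step_energy(step):
    """Harmonic deformation energy of one base-pair step, in units of kBT."""
    return 0.5 * sum(((step[k] - REST[k]) / SIGMA[k]) ** 2 for k in REST)


def sample_step(rng):
    """Draw one set of six step parameters from the Boltzmann distribution.

    For an uncoupled harmonic potential this is an independent Gaussian
    for each parameter, centered on the rest state.
    """
    return {k: rng.gauss(REST[k], SIGMA[k]) for k in REST}


def sample_chain(n_steps, seed=0):
    """Build a chain one base-pair step at a time from sampled parameters."""
    rng = random.Random(seed)
    return [sample_step(rng) for _ in range(n_steps)]
```

A call such as `sample_chain(108)` would return the step parameters for a 109-bp duplex. Converting these parameters into three-dimensional base-pair frames requires a rigid-body transformation at every step, which is omitted here.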
Each site of possible HU binding on a simulated protein-free duplex is visited, and the protein is placed on the basis of the assumed binding probability. One of the currently available HU–DNA step motifs is selected at random and introduced at an accepted binding site, thereby altering the configuration of DNA. The binding probability is based on the levels of protein detected during the exponential growth phase, roughly one HU dimer for every 150 bp of the E. coli genome [17,18]; the precise value can be chosen for each simulation. The ease of DNA loop formation is estimated from the number of simulated configurations of a linear molecule with terminal base pairs positioned so as to overlap the base pairs at the ends of the operators bound to the Lac repressor. The formation of a covalently closed duplex is detected by adding a virtual base pair to the 3′-end of the DNA and checking its coincidence with the first base pair of the chain. The arrangement of operators on the Lac repressor is based on the relative disposition of the two dimeric halves of the tetramer. The probability of DNA cyclization/looping is reported in terms of the Jacobson–Stockmayer J factor, the well-known ratio of the equilibrium constant for cyclization/looping compared with the bimolecular association of a linear molecule of the same length and composition. The greater the value of J, the lower the free energy of cyclization or looping and the greater its probability.
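The random placement of HU along a duplex can be sketched as follows. This is an illustration under stated assumptions, not the published procedure: the footprint length, the number of available step motifs and the rule that an accepted dimer blocks the steps it covers are all hypothetical choices made for the example.

```python
import random


def place_hu(n_steps, bp_per_dimer=150, footprint=14, n_motifs=3, seed=0):
    """Visit each candidate site and bind HU with probability 1/bp_per_dimer.

    Returns a list of (start_step, motif_index) pairs.
    footprint and n_motifs are hypothetical values used for illustration.
    """
    rng = random.Random(seed)
    p_bind = 1.0 / bp_per_dimer
    bound = []
    i = 0
    while i + footprint <= n_steps:
        if rng.random() < p_bind:
            # Accepted site: pick one of the available motifs at random.
            bound.append((i, rng.randrange(n_motifs)))
            i += footprint  # the covered steps cannot host a second dimer
        else:
            i += 1
    return bound
```

The selected motif would then overwrite the sampled step parameters of the covered steps, which is how bound HU alters the configuration of the chain in this picture.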
We consider the looping of DNA between the headpieces of the V-shaped arrangement of the Lac repressor found in the crystalline state as well as the ease of placing the double helix between the ends of a tetramer subject to the large-scale opening detected in low-resolution experiments [20–27]. The opened states are generated by rotation about an axis through the four-helix bundle and penalized by a free energy term proportional to the small contact interface believed to stabilize the crystalline form. Once the repressor opens, all subsequent configurations are assigned the same penalty. The looping computations also take account of the four distinct orientations of DNA operator sequences on the LacR-binding headpieces. The detailed treatment of protein and DNA structure makes it possible to visualize the spatial arrangements of cyclic and looped DNA and to gain new insight into the possible effects of architectural proteins on gene organization and expression. Minimum-energy structures illustrative of the sampled loops come from recursive solution of the variational equations that express the balance of forces and moments acting on the unbound base pairs [12,14].
Although our approach does not take explicit account of the supercoiling of DNA essential for Lac repressor-mediated DNA loop formation, we can determine the topological properties of the simulated pathways, i.e. the writhing and twisting numbers of the protein–DNA assemblies, with efficient new computational methods. This information is useful in understanding the different responses of DNA and the contributions of HU to looped systems where the operator sites are in and out of register with the binding headpieces of the Lac repressor.
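One way in which a flat opening penalty could enter a weighted count of looped configurations is sketched below. The text states only that every opened state carries the same penalty; the Boltzmann weighting, the way the samples are represented and the function names are assumptions made for illustration. The default penalty of 1.8 kBT is the value quoted in the looping results of this article.

```python
import math


def opening_weight(opening_angle_deg, penalty_kT=1.8):
    """Relative weight of a repressor state.

    The closed (V-shaped) form has weight 1; every opened form carries the
    same Boltzmann factor, whatever its opening angle.
    """
    if opening_angle_deg <= 0.0:
        return 1.0
    return math.exp(-penalty_kT)


def weighted_loop_fraction(samples, penalty_kT=1.8):
    """Weighted fraction of sampled configurations that close a loop.

    samples: iterable of (opening_angle_deg, closes_loop) pairs, a
    hypothetical representation of the simulated ensemble.
    """
    total = 0.0
    looped = 0.0
    for angle, closes in samples:
        w = opening_weight(angle, penalty_kT)
        total += w
        if closes:
            looped += w
    return looped / total if total > 0.0 else 0.0
```

Raising `penalty_kT` in this sketch suppresses the contribution of loops that form only on opened templates, in line with the observation that the discrepancies for HU-free DNA grow when the cost of opening is increased.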
Enhancement of DNA ring closure with HU
As illustrated in Figure 1, the random binding of HU has striking effects on the cyclization properties of short (100–150 bp) DNA duplexes. Binding of HU at the level of one dimer per 200 bp of DNA increases the J factors by three to five orders of magnitude over the cyclization propensities of ideal protein-free chains of the same length. Moreover, in contrast with protein-free DNA, where the chances of ring closure increase with chain length, the J factors of short HU-bound molecules are roughly comparable in chains of lengths differing by multiples of a helical turn (10–11 bp). Increasing the concentration of HU dampens the amplitude of oscillations in the J factor with chain length and shifts the locations of local maxima and minima. Doubling the amount of HU enhances the J factor by as much as an order of magnitude in the formation of some small less easily closed minicircles. The effects are less pronounced at chain lengths where cyclization is more probable, i.e. at in-phase values of chain length where there is a local peak in J.
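Because the J factor is a ratio of equilibrium constants, a ratio of two J factors translates directly into a difference in the free energy of ring closure. The short sketch below makes that conversion explicit; the function name is hypothetical and the result is expressed in units of kBT.

```python
import math


def free_energy_change_kT(j_with, j_without):
    """Free-energy difference (in kBT) implied by a ratio of J factors.

    A negative value means that ring closure is more favorable
    in the first system than in the second.
    """
    if j_with <= 0 or j_without <= 0:
        raise ValueError("J factors must be positive")
    return -math.log(j_with / j_without)
```

On this scale, an enhancement of three orders of magnitude corresponds to a stabilization of about 6.9 kBT and one of five orders of magnitude to about 11.5 kBT.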
The HU concentration reported for the simulations refers to the probability of one HU dimer binding, on average, over a given length of linear unconstrained DNA (50, 100 and 200 bp in Figure 1, upper panel). The average number of HU dimers bound to the cyclic DNA species collected under these conditions is quite different. The successfully formed minicircles take up more HU than the imposed binding levels, and, remarkably, there are almost no minicircles with fewer than two bound HU molecules, even at the lowest binding levels (Figure 1, lower panel). The number of bound HU dimers on closed 100–150-bp duplexes is roughly double the number expected in randomly sampled DNA chains under conditions where the binding probability is one HU dimer per 200 bp and up to 4-fold the expected number in minicircles formed under conditions where the binding probability is one HU dimer per 100 bp.
Contributions of HU to protein-mediated DNA looping
The presence of HU similarly enhances the looping of DNA between the binding headpieces of the Lac repressor (Figure 2). The J factors of the DNA anchored between the two halves of the protein mimic the complex chain-length-dependent variation in looping propensities determined from gene-expression studies of wild-type and HU-depleted E. coli cells [2–4]. The values assigned to the experiments come from solution of the coupled equilibria that relate the reported gene activity to the possible associated states of the repressor and DNA [11,15,32]. The computed J factors are weighted averages of the looping probabilities determined for the placement of DNA operators on the V-shaped structure adopted in the crystalline state and on tetramers allowed to interconvert freely between the V-shaped form and a fully extended arrangement where the DNA binding sites lie on opposite sides of the tetrameric assembly. Here the free-energy penalty of chain opening is set at 1.8 kBT, near the lower end of values estimated from the surface area of the contact interface in the closed complex, and the HU-binding probability is one dimer per 500 bp of DNA.
The composite looping propensities capture the peaks and valleys, including the primary and secondary peaks, in the gene-expression data. The simulations performed in the absence of HU show the decrease in looping detected in mutant cells, but fall substantially below the observed values. The amplitude of the simulated cyclization profile of HU-free DNA also exceeds that deduced from mutant cells. The latter discrepancies become even greater if the cost of repressor opening exceeds the estimate used here. The loops formed in the presence of HU, by contrast, closely match the observed data regardless of the opening penalty. A close fit between computation and experiment requires the HU binding level to be substantially lower than the cellular stoichiometry.
The structures of the simulated loops show how HU might contribute to gene expression in vivo. The loops formed most easily, when the DNA operators are in closest register with the binding headpieces of the Lac repressor, tend to follow antiparallel pathways. That is, the DNA enters and exits the tetrameric assembly from opposing directions, forming relatively smooth turns through apices located closer to one of the ends of the loops than the other (Figure 3). The strong propensity of HU to build up around the apices may help to stabilize the antiparallel forms, which dominate but occur with lesser likelihood in DNA chains of the same length closed in the absence of HU. The architectural protein introduces slight changes in the antiparallel loops and seemingly contributes to a wider variety of repressor-bound states. The range of writhing numbers of 109-bp HU-bound loops exceeds that of HU-free loops of the same length (Figure 4). The average writhing numbers of the loops, however, are independent of HU levels. Note that the computed values include a contribution associated with the closure of the loop through the Lac repressor and that only the differences in writhe between loops formed in the presence and absence of HU are meaningful in this context. Because the attachment of HU preserves the spatial pathways and writhing numbers of the 109-bp loops, the free DNA must compensate for the HU-induced reduction in twist, ~0.2 helical turns per bound dimer, through comparable levels of overtwisting. The latter deformations keep the DNA operators in close register with the binding headpieces on the Lac assembly and the sum of the twist and writhe invariant. The Lac repressor retains a closed V-shaped arrangement, whether or not HU is present (Figure 3).
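The twist bookkeeping in this argument can be made concrete with a small sketch. If the writhe is unchanged and the sum of twist and writhe is fixed, the unbound steps must absorb the twist removed by each HU dimer. The code assumes, purely for illustration, that the compensating overtwist is spread uniformly over the protein-free steps; the function name is hypothetical and the number of free steps is left as an input because it depends on the HU footprint.

```python
def compensating_overtwist_deg(n_free_steps, n_hu_dimers,
                               undertwist_turns_per_dimer=0.2):
    """Average overtwist per free base-pair step, in degrees.

    Assumes constant writhe and a uniform distribution of the compensating
    twist over the unbound steps (an illustrative simplification).
    """
    if n_free_steps <= 0:
        raise ValueError("n_free_steps must be positive")
    total_deg = 360.0 * undertwist_turns_per_dimer * n_hu_dimers
    return total_deg / n_free_steps
```

With one bound dimer and on the order of 100 free steps, the required overtwist is less than 1° per step, well within the 4–5° thermal fluctuations in twist noted in the introduction.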
The less easily formed Lac repressor-mediated loops, which differ in length from the more easily formed loops by 5–6 bp, undergo substantial changes upon addition of HU. The out-of-phase loops attach to the repressor not only in antiparallel orientations similar to those adopted by the more easily formed 109-bp loops, but also in parallel orientations where the DNA enters and exits the protein assembly in the same direction (Figure 3). Moreover, the parallel loops form almost exclusively on opened Lac repressor templates in the absence of HU. The addition of HU tends to close the repressor and increases the variety of looped structures in terms of both the orientation of DNA and the number of bound proteins. The HU-bound loops attach to the repressor in all four possible orientations with a 3:2 preference for parallel over antiparallel arrangements. By contrast, only one of the parallel orientations occurs in loops free of the architectural protein. HU-bound loops of 115 bp bind up to three dimers in the antiparallel arrangements and two on the parallel forms. The greater variety of loops formed in the presence of HU underlies the computed enhancement of the J factor (Figure 2) and the significantly broadened distribution of the writhing number (Figure 4). The binding of two or three HU dimers on the majority of 115-bp loops reduces the helical stress on DNA while concomitantly altering the double-helical pathways (writhing numbers). The unbound DNA segments adopt relatively straight configurations and take advantage of the HU-induced changes in twist to bring the operator sites in register with the Lac repressor.
How the naturally stiff DNA double helix forms the short loops implicated in the regulation of bacterial genes has long been a mystery. Base-pair-level treatments of spatially constrained protein-decorated DNA molecules show how architectural and regulatory proteins present in the cell might contribute to these looping processes. The natural deformability of the double helix in combination with the DNA pathways found in high-resolution HU–DNA complexes accounts for the measured chain-length- and concentration-dependence of DNA ring closure in the presence of the architectural protein. The random binding of HU, at probabilities corresponding to cellular levels, enhances the computed likelihood of cyclization of short DNA chains several orders of magnitude over that of protein-free molecules of the same length. The simulated looping of similarly sized pieces of DNA between the binding headpieces of the Lac repressor occurs even more easily. The correspondence between computation and experiment suggests that the amount of HU available to the lac operon may fall below stoichiometric levels or that HU-bound DNA may deform to a lesser extent in the cell.
Opening of the Lac repressor along the lines suggested by low-resolution experiments does not account for the looping propensities of HU-depleted cells. Other molecular components found in abundance in the bacterial nucleoid, but not yet considered in the calculations, may contribute to the observed looping. These factors may include other highly abundant nucleoid proteins that deform the double helix, subtle distortions in the spatial pathways of the DNA operators, or melting of the promoter sites on the DNA sequences looped between the repressor headpieces.
Finally, the random binding of HU introduces folding pathways inaccessible to protein-free DNA loops. Successfully closed HU-bound loops adopt a wider variety of configurations than those formed in the absence of the protein, particularly when the DNA helical repeat is out of phase with the protein-binding headpieces. The loops closed in the presence of HU contain a greater number of bound protein molecules than anticipated from the assumed binding levels. The added proteins contribute to the wider variety of looped structures and the simulated enhancement in loop formation.
This work was generously supported by the U.S. Public Health Service [research grant number GM34809 and instrumentation grant number RR022375] and the Human Frontiers in Science Program [grant number RGP0051/2009]. M.A.G. gratefully acknowledges support from a U.S. Department of Education Graduate Assistance in Areas of National Need Fellowship.
D.S. and W.K.O. thank the Newton Institute for Mathematical Sciences at the University of Cambridge for providing a stimulating environment to carry out parts of this work.
Topological Aspects of DNA Function and Protein Folding: An Independent Meeting held at the Isaac Newton Institute for Mathematical Sciences, Cambridge, U.K., 3–7 September 2012, as part of the Isaac Newton Institute Programme Topological Dynamics in the Physical and Biological Sciences (16 July–21 December 2012). Organized and Edited by Andrew Bates (University of Liverpool, U.K.), Dorothy Buck (Imperial College London, U.K.), Sarah Harris (University of Leeds, U.K.), Andrzej Stasiak (University of Lausanne, Switzerland) and De Witt Sumners (Florida State University, U.S.A.).
© The Authors Journal compilation © 2013 Biochemical Society