All proteins require physical interactions with other proteins in order to perform their functions. Most of them oligomerize into homomers, and a vast majority of these homomers interact with other proteins, at least part of the time, forming transient or obligate heteromers. In the present paper, we review the structural, biophysical and evolutionary aspects of these protein interactions. We discuss how protein function and stability benefit from oligomerization, as well as evolutionary pathways by which oligomers emerge, mostly from the perspective of homomers. Finally, we emphasize the specificities of heteromeric complexes and their structure and evolution. We also discuss two analytical approaches increasingly being used to study protein structures as well as their interactions. First, we review the use of biological networks and graph theory for the analysis of protein interactions and structure. Secondly, we discuss recent advances in techniques for detecting correlated mutations, with an emphasis on their role in identifying pathways of allosteric communication.
- protein complex
- quaternary structure
Over the last two decades, it has become clear that most biological functions can only be described using a system of interacting biological molecules. This observation has shifted the emphasis of biological science from classical molecular biology, with its focus on individual biological molecules, towards systems biology. In order to understand and analyse these systems, new computational and experimental methods are emerging. We therefore begin the present review with a short overview of the concepts of biological modularity, networks and graph theory, which are being applied to an increasing number of types of molecular data.
Further on, we review the structural, biophysical and evolutionary aspects of these interactions. We start by discussing adaptation and oligomerization, highlighting examples from the literature showing how protein function and/or stability benefit from proteins forming oligomers. Then, we use the example of p53 to show that oligomerization can, at the same time, affect a protein in both an advantageous and a detrimental manner.
The ubiquitous nature of oligomerization cannot be understood without considering the evolutionary pathways of protein complexes. We therefore discuss how protein interactions evolve faster than protein folds and how easily novel symmetrical interactions emerge. Once a protein complex is sufficiently populated, it is continuously modified and optimized by natural selection. The idea of detecting co-evolving residues, which one would expect to find in such evolutionary dynamic systems, has received considerable attention over the years. We identify some recent advances in techniques for detecting such residues, which have significant implications for understanding and predicting protein structure, function and interactions.
Interactions between proteins are central to almost every biological process, and the importance of oligomerization for the evolution of allostery was recognized very early on . We review the classical view of protein allostery and its relationship to the different conformational states of protein subunits. Furthermore, we discuss protein allostery in the context of protein dynamics and the energetic coupling of distal sites, and address in detail the specific example of the lac repressor.
The protein complexes we survey throughout the majority of the present review are oligomers of identical subunits, i.e. homomers. Although global surveys at the level of protein structure analysis and functional genomics experiments point towards the fact that homomerization is ubiquitous [2–4], the vast majority of these homomers assemble into higher-order heteromeric complexes in a cellular context. The ubiquitous nature of physical interactions between proteins is particularly powerfully conveyed by the network representation of these interactions (Figure 1, level 1). At the same time, our structural understanding of heteromers lags far behind that of homomers. Luckily, many of the principles of protein interaction and evolution we discuss throughout the present review, such as epistasis, co-evolution and indirect allosteric mutations, are shared between homomers and heteromers. In addition, principles of symmetry and allosteric mechanisms apply equally to homomers and heteromers. We therefore conclude with a discussion of heteromeric complexes and the structural and evolutionary diversity afforded by the incorporation of multiple distinct subunit types.
Protein modularity and networks
Although Nature does not function as an engineer , many of its systems share an important principle with engineered networks: modularity . Modularity of natural systems can be studied from different perspectives, and, in the present review, we examine structural features of physical protein interactions. Protein complexes constitute functional modules, as members of a protein complex engage in stronger interactions within the complex than with external components, and a protein complex can be reconstituted in a functional form independent of the rest of the network . An enzymatic protein complex, e.g. thiocyanate hydrolase (see Figure 6a), concentrates different molecular functions, and a signal transduction system, e.g. the MAPK (mitogen-activated protein kinase)/ERK (extracellular-signal-regulated kinase) signalling pathway, is an extended module, isolated by the specificity of its interactions. In both cases, complex formation enables the proteins to perform an important biological function. There are both functional and structural differences between these two types of protein complexes, and they are often grouped as obligate and transient complexes respectively. However, it may be more appropriate to think in terms of ranges of protein interactions: from obligate-to-transient, strong-to-weak or rigid-to-dynamic interactions .
Insulation of functional modules from each other prevents potentially harmful cross-talk, yet high connectivity of protein interaction modules enables one function to influence and thus regulate another, an important feature of biological modules . Nature, being a rather parsimonious tinkerer, also connects its networks through protein reuse, whereby a protein participates in more than one complex. A protein might, for example, perform different functions if it is expressed in different tissues or have a second, moonlighting, role . Even an organism as simple as Mycoplasma pneumoniae has extensive physical interconnections between approximately one-third of its heteromeric complexes .
The modularity of biological systems facilitates their representation as networks, enabling the use of graph theory to systematically and quantitatively analyse these highly complex systems. Protein systems networks can be analysed at three levels, two of which involve interactions between proteins, whereas the third involves interactions between individual residues (Figure 1).
The highest, most coarse-grained, level describes large-scale protein–protein interactions, such as the interaction network of the yeast proteome shown in Figure 1. The information needed to build such networks is obtained through high-throughput experiments, such as yeast two-hybrid  or TAP (tandem affinity purification)-tag purification combined with MS .
The next two levels of interaction require structural information. The middle level of protein networks (for example, the ATP synthase protein complex network in Figure 1) also represents interactions between proteins. In contrast with top-level protein–protein interaction networks, it describes only a subset of interactions, but contains information about stoichiometry, and the interface size and symmetry.
Finally, at the most detailed level, atomic contacts between residues in a single protein or protein complex can also be represented as a network of residue contacts (for example, the ATP synthase residue contact network in Figure 1). Residue interaction graphs have been used to systematically represent protein folds , identify functional residues  or analyse protein dynamics . Later in the present review, we discuss in more detail how residue–residue contact graphs can be used to study conformational changes linked to allostery.
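As a minimal sketch of how a residue contact network can be built, the following Python fragment connects two residues whenever their representative atoms lie within a distance cutoff. The function name `contact_graph`, the 4.0 Å cutoff and the five coordinates are illustrative assumptions, not values taken from the ATP synthase structure or from the studies cited above.

```python
from itertools import combinations
from math import dist


def contact_graph(coords, cutoff=8.0):
    """Return adjacency sets: residues i and j are linked if within `cutoff` (in Å)."""
    graph = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if dist(coords[i], coords[j]) <= cutoff:
            graph[i].add(j)
            graph[j].add(i)
    return graph


# Hypothetical representative-atom coordinates for five residues (toy data).
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
    (3.8, 3.8, 0.0),
]

graph = contact_graph(toy_coords, cutoff=4.0)
# Node degree (number of contacts) is the simplest per-residue network property.
degree = {i: len(neighbours) for i, neighbours in graph.items()}
```

Once the graph exists, standard graph measures (degree, shortest paths, centrality) can be computed per residue; which measure is informative depends on the question being asked.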
Networks representing interactions between subunits in protein complexes
A protein complex can be represented as a weighted contact graph of subunits, in which the edge weights can represent the size of the intermolecular interfaces (Figure 1, level 2). Networks of individual protein complexes contain a small number of nodes and edges compared with networks from the other two levels, but can still be very powerful. For example, the network representation of protein complexes allows simple graph matching between different complexes and hence construction of a hierarchical classification of protein complexes, as in the 3DComplex database .
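The idea of graph matching between complexes can be sketched as follows. Each complex is stored as a dictionary of subunit pairs with a hypothetical interface area as the edge weight, and two complexes are compared by topology alone using a brute-force isomorphism test, which is practical only because such graphs have few nodes. This is an illustration under those assumptions, not the procedure used by the 3DComplex database; subunits without any interface are ignored.

```python
from itertools import permutations


def edges_of(graph):
    """Set of undirected edges of a subunit graph, ignoring the weights."""
    return {frozenset(pair) for pair in graph}


def same_topology(g1, g2):
    """Brute-force isomorphism test for small subunit contact graphs."""
    nodes1 = sorted({n for pair in g1 for n in pair})
    nodes2 = sorted({n for pair in g2 for n in pair})
    if len(nodes1) != len(nodes2) or len(g1) != len(g2):
        return False
    target = edges_of(g2)
    for perm in permutations(nodes2):
        mapping = dict(zip(nodes1, perm))
        if {frozenset((mapping[a], mapping[b])) for a, b in g1} == target:
            return True
    return False


# Edge weights are hypothetical interface areas in square ångströms.
ring_tetramer = {("A", "B"): 900.0, ("B", "C"): 900.0,
                 ("C", "D"): 900.0, ("D", "A"): 900.0}
other_ring = {("W", "X"): 750.0, ("X", "Y"): 750.0,
              ("Y", "Z"): 750.0, ("Z", "W"): 750.0}
chain_tetramer = {("A", "B"): 900.0, ("B", "C"): 400.0, ("C", "D"): 900.0}
star_tetramer = {("A", "B"): 900.0, ("A", "C"): 900.0, ("A", "D"): 900.0}
```

Here the two rings match despite different subunit labels and interface sizes, whereas the chain and the star do not match each other even though both have four subunits and three interfaces.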
Protein complex networks also represent a framework for more sophisticated analyses that provide deeper insight into the symmetry and modularity of complexes. Specifically, the complexity of protein quaternary structure can be measured quantitatively by determining the minimum amount of information necessary to describe the protein complex in terms of self-assembling units . To do this, the proteins are treated as building blocks with a set of specific pairwise binary attractive interactions which undergo a stochastic assembly process. The set of building blocks that corresponds to the most concise description can be found by employing an algorithm that labels the building blocks iteratively .
This approach is based on the concept of Kolmogorov complexity [17,18], which can be extended to physical structures [16,19]. Formally, the Kolmogorov complexity of a given string of data is the length of the shortest program on a Universal Turing Machine that reproduces it. Just as a string of data can be described using different programs, a physical structure can be described by different sets of self-assembly building blocks and their interactions. Hence the simplest such set of building blocks, i.e. that which requires the shortest description, gives us a quantitative measure of the complexity of the structure. Moreover, this description will highlight any symmetry, and, more generally, any modularity, present in the structure, as the minimization of the assembly description must take into account any repeated sets of building blocks that are connected in the same way within the structure. Using this approach, we showed that in terms of the 3DComplex quaternary structure topologies , most protein complexes are modular, and simple topologies are much more frequent than complex ones .
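Kolmogorov complexity itself is not computable, so any practical measure is an approximation. As a rough illustration of why repeated building blocks shorten a description, the sketch below uses the length of a zlib-compressed string as a stand-in for description length. This is a substitute technique chosen for illustration, not the iterative labelling algorithm described above, and the strings are arbitrary toy data rather than encodings of real complexes.

```python
import random
import zlib


def compressed_length(description: str) -> int:
    """Length in bytes of the zlib-compressed description (a crude upper-bound proxy)."""
    return len(zlib.compress(description.encode("ascii"), 9))


random.seed(0)
alphabet = "ABCDEFGH"

# A 'modular' description: one 64-symbol unit repeated eight times.
unit = "".join(random.choice(alphabet) for _ in range(64))
modular = unit * 8

# An 'irregular' description of the same length with no built-in repeats.
irregular = "".join(random.choice(alphabet) for _ in range(64 * 8))

modular_size = compressed_length(modular)
irregular_size = compressed_length(irregular)
```

The modular string compresses to far fewer bytes than the irregular one of equal length, mirroring the point that symmetry and modularity reduce the information needed to specify a structure.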
Adaptive aspects of homomeric protein complexes
Summarizing data from large-scale protein–protein interaction screens using the network representation discussed above demonstrates the ubiquitous nature of physical interactions between proteins (Figure 1, level 1). Most proteins form homomers [2,4], and a vast majority of those homomers interact with other proteins, at least part of the time . This system of interactions constitutes biological units and gives them modular properties, which impose a certain degree of independence and therefore evolvability. This means that the modules of physically interacting proteins are first conserved and then occasionally reshuffled by evolution. We start this section by describing diverse examples where oligomerization benefits protein function and/or stability. We then use the extensive literature on p53 biophysics and function to show how oligomerization can be both advantageous and disadvantageous depending on which properties are considered. This equilibrium between function, stability and evolvability is characteristic of proteins, which brings us to the main topic of the following section on evolutionary dynamics of protein interactions.
Homomers and protein function
In some cases, homomeric oligomerization facilitates the formation of an active site from residues contributed by more than one polypeptide chain. There are many examples of enzymes for which this occurs, such as dihydrodipicolinate synthase and HIV protease. In the case of HIV protease, a face-to-face homodimeric interface presented a simple way of evolving a symmetric active site. New protein interfaces emerge in evolution much more easily than new folds; enabling a protein interaction is therefore a good way of bringing about new structural combinations of amino acids. Whatever the initial trigger might be, once an active site in the interface has emerged, the protein oligomer is ‘trapped’ by selection pressure, and the interface is conserved.
There are also less direct ways in which oligomerization can enable the molecular function of a protein, for example by optimizing the protein dynamics, as association into higher oligomeric states affects the protein collective motions (as reviewed in ). Monomeric TIM (triose phosphate isomerase) contains all of the residues forming the active site, but, although there is no co-operativity between the subunits, only the homodimer is enzymatically active. The proposed mechanism involves transmission of the dynamics of the dimeric interface to the loop covering the active site .
Oligomerization has also been proposed to benefit enzymes catalysing oxidation processes, such as bacterial periplasmic hydrogenases. These enzymes are continually inactivated under aerobic conditions, as O2 oxidizes the active-site Ni(II), and then reactivated by electrons replenished from the surrounding membrane molecules, such as reduced quinols. However, a more efficient way of reducing the active-site Ni(II) is to draw electrons directly from another protein, a mechanism that has been shown to apply to certain hydrogenases.
Independently of whether the activity of an enzyme is directly or indirectly connected to oligomerization, it provides a potential means of regulation, as oligomerization is highly dependent on protein concentration . There are many more examples where oligomerization is directly involved in protein function, including the dimerization of caspase 9 and GPCRs (G-protein-coupled receptors) (reviewed in ).
Returning to the concept of modularity, protein oligomerization is a parsimonious way of increasing complexity. When explaining allostery, Monod et al.  made the point that having a symmetrical oligomer enhances the sensitivity of selection, since every mutation counts twice. The concepts of allostery and co-operativity in the context of protein oligomerization are discussed in further detail later in the present review.
Homomers and protein stability
Larger proteins have a lower surface/core ratio and thus Goodsell and Olsen  argue that their extensive internal interactions and reduced solvent-exposed surface area make them more stable against denaturation. That is why small proteins often have to resort to disulfide bonds or specific metal-binding sites to achieve stability. Large proteins, however, are difficult to maintain, and the same effect can be achieved by oligomerization . Packing of atoms in the protein interfaces is, in principle, the same as their packing in the protein core , and both protein folding and interface formation are governed by the same biophysical principles .
Oligomerization is potentially a simple way to increase stability, but an extensive analysis of protein structures from Thermotoga maritima, a thermophilic organism, showed that oligomerization plays a minor role in adaptation to high-temperature conditions . Other mechanisms, including an increase in the numbers of salt bridges and structural compactness, seem to be more significant. There are, however, numerous anecdotal examples where either a loss of interaction decreased the stability of the protein or, conversely, decreased stability was compensated for by oligomerization. Inactivating mutations in the human homotetrameric fructose-bisphosphate aldolase B cause hereditary fructose intolerance. One of the most common point mutations associated with the disease, A149P, turns the protein into a dimer with decreased thermal stability  (Figure 2).
An extreme case of overlap between protein folding and oligomerization is a mechanism of oligomerization called domain swapping . Domain swapping is, in many aspects, a process similar to aggregation [34,35], but domain-swapped oligomers can stay functional, which is confirmed by numerous examples of domain-swapped oligomers .
Advantages and disadvantages of oligomerization: a case study of the p53 TF (transcription factor)
Oligomerization of TFs
Many prokaryotic and eukaryotic TFs act as homo- or hetero-dimers . One benefit of dimerization is higher affinity and specificity of DNA binding. Homodimeric TFs bind DNA on both strands in a clamp-like manner and a dimer binds twice the number of bases compared with a monomer . In addition, TF dimerization enables a negative autoregulation mechanism, which reduces genetic network noise levels. Proteins can either dimerize upon DNA binding (monomeric pathway) or bind DNA as preformed dimers (dimeric pathway), with the latter providing a better noise-reduction mechanism .
Although dimerization is common among TFs, higher-order oligomerization also occurs. HSFs (heat-shock factors) act as trimers , whereas the master regulator p53 acts as a tetramer. Tetrameric p53 has evolved from a dimer, as can be deduced from its ancestral homologues  and its symmetry . Above, we have discussed several aspects of oligomerization, from stability to function. In this section, we explore which of these may explain the adaptive advantages of p53 tetramerization.
p53 DNA recognition
Four p53 DNA-binding domains bind a canonical four-site (→←→←) RE (response element) . Full-length p53 first dimerizes in solution via its C-terminal oligomerization domain . Then the two p53 core domains bind a DNA RE half-site, forming an additional symmetrical (dimeric) interface, which is stabilized by both protein–protein and protein–DNA interactions, and finally a translational (tetrameric) interface with the dimer bound to the second half-site  (Figure 3). The co-operativity of DNA binding comes from protein–protein interactions between the core domains of the two dimers, as the oligomerization domain mutant still binds DNA co-operatively, although with decreased affinity .
It was shown previously that the p53 ‘transcriptional universe’ also includes non-canonical half- (→←) and three-quarter (→← →) RE sites. Although two p53 proteins bind to a half-site, it is not a simple one-monomer-to-one-quarter-site binding, as some p53 residues interact with the quarter-site covered mostly by the other p53 molecule. Also, as with other TFs, protein–protein interactions form an essential part of the DNA–protein complex. It is thus not surprising that a p53 dimer does not bind a quarter-site RE (E. Natan and A.R. Fersht, unpublished work) and there are no known quarter-site REs (→). This range of REs, which introduces considerable variation and evolutionary flexibility into the system, is attributed to the tetrameric and co-operative nature of p53 DNA binding. The existence of non-canonical sites also implies that tetramerization allows for a certain degree of mutational robustness. If p53 can functionally bind a three-quarter site, it could equally bind a four-site RE with one quarter-site mutated. This claim, however, comes with several open questions. What is the oligomeric state of p53 at the mutant four-site RE? Will it be a tetramer with one p53 molecule loosely bound (Figure 3)? And what would be the efficiency of its transcriptional regulation?
Robustness to mutations and the dominant-negative effect in oligomers
Is an oligomer composed of wild-type and mutant subunits sufficiently functional or, in the case of p53, can heteromeric p53 be tolerated in a manner similar to a heterogeneous RE site? A similar general question was raised early on, discussing the relationship between different alleles in a heterozygous cell and the degree of the overall functionality as a function of oligomerization .
Most p53 mutations related to cancer can be divided into class I mutants, which affect the residues involved in DNA binding, but not the protein conformation, and class II mutants, which change the native p53 conformation. When a class I mutation in one of the p53 alleles occurs, the expression levels of both the mutant and the wild-type p53 will be, at least initially, equal. Slow dissociation kinetics and high interface affinity [43,48] mean that p53 dimerizes co-translationally [49,50]. One can thus expect only three types of tetrameric species in a heterozygous cell: wt4, wt2mut2 and mut4, in a 1:2:1 ratio and with mutational robustness depending on the activity of the wt2mut2 tetramer. In theory, any activity of the wt2mut2 species higher than half of the wt4 activity means that tetramerization plays a role in buffering mutations (Figure 3c).
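The arithmetic behind the 1:2:1 ratio and the 'half of wt4' threshold can be made explicit with a short sketch. It assumes that co-translationally formed wt2 and mut2 dimers pair at random, that wt4 activity is set to 1 and that mut4 is inactive; the function names are ours and purely illustrative.

```python
from fractions import Fraction


def tetramer_species(p_mut=Fraction(1, 2)):
    """Expected fractions of wt4, wt2mut2 and mut4 when homodimers pair at random."""
    p_wt = 1 - p_mut
    return {
        "wt4": p_wt ** 2,
        "wt2mut2": 2 * p_wt * p_mut,
        "mut4": p_mut ** 2,
    }


def mean_activity(hybrid_activity, p_mut=Fraction(1, 2)):
    """Population-average activity relative to wt4 = 1, assuming mut4 = 0."""
    species = tetramer_species(p_mut)
    return species["wt4"] * 1 + species["wt2mut2"] * hybrid_activity


species = tetramer_species()                    # 1/4 : 1/2 : 1/4
break_even = mean_activity(Fraction(1, 2))      # hybrid at half of wt4 activity
buffered = mean_activity(Fraction(3, 4))        # hybrid above half of wt4 activity
```

With equal expression of both alleles, a non-oligomerizing protein would retain half of the wild-type activity. The tetramer population matches that value exactly when the wt2mut2 species has half of the wt4 activity, and exceeds it whenever the hybrid is more active than that, which is the sense in which tetramerization buffers the mutation.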
It has long been emphasized that many p53 mutations have a DNE (dominant-negative effect). However, in most of the earlier experiments, the mutant was overexpressed (, and references therein, ), and such an excess of the mutant in the cell cannot take place without the loss of the wild-type allele. More recent experiments show that, often, when the two alleles are expressed equally, a wild-type phenotype is observed [54,55]. Furthermore, Demidenko et al. hypothesize that some pairs of different mutants, e.g. those affecting different p53 domains, can complement each other and exhibit a wild-type phenotype when expressed in the same cell. The authors provide an example where one p53 mutant is DNA-binding-deficient (mutDNA) and the other is transactivation-deficient, i.e. cannot bind its transcription co-activators (mutTrans). If the mutants form a mutDNA2mutTrans2 tetramer, the transactivation-deficient dimer can bind DNA and, through protein–protein interactions, facilitate the low-affinity binding of the DNA-binding-deficient dimer (Figure 3b). At the same time, the DNA-binding-deficient mutant can successfully transactivate. In short, the structural assembly, which enables p53 to bind DNA co-operatively, can also allow the complex to overcome deficiencies.
Some p53 mutants do, however, exhibit a strong DNE and enable complete inactivation of p53. These mutants are often structural, i.e. their mutations cause protein instability [57–60]. For example, a mutation in the hydrophobic core of p53 would show a strong DNE, as it causes aggregation of the mutant protein, which in turn induces severe instability of the wild-type protein. On the other hand, tumours that carry p53 with DNA-contact mutations, which normally do not affect the native conformation, show higher rates of loss of heterozygosity, probably because their negative effect was not dominant.
In summary, p53 oligomerization confers mutational robustness, which increases adaptation and thus fitness , but most p53 research is conducted in the context of cancer, and thus cannot be considered in terms of fitness. However, p53 has more fundamental functions, which can be discussed in the adaptive context, e.g. in metabolism  or stem cell differentiation [64,65].
Furthermore, when discussing oligomerization and mutational robustness, one needs to also take into account the effects of loss of heterozygosity, i.e. loss of one of the alleles. These events are less common than point mutations, but oligomerization might, in those cases, be detrimental. For a monomer, the levels of a functional protein will fall to approximately 50%, but, for an oligomer, the levels can fall much lower. Owing to p53 autoregulation and its dependency on the cellular concentration threshold for oligomer formation, levels of functional p53 fall to 25% upon loss of heterozygosity .
Local concentration: increase in binding frequency and aggregation tendency
Close proximity of similar hydrophobic patches can lead to aggregation and amyloids . There is evidence that homologous adjacent domains of multidomain proteins are under selection pressure to reduce misfolding on the basis of their rapid sequence divergence . In the case of multidomain homo-oligomers, interactions via the oligomerization domains will inevitably increase the local concentration of other domains. If these domains have unstable or disordered regions, as is the case with p53, their proximity could impair stability.
Thermodynamic stability experiments show that full-length p53 is less stable than its monomeric constructs, but, at the same time, more stable than a tetrameric construct lacking only the disordered N- and C-termini . Five highly conserved amino acids at the N-terminus are sufficient to restore stability to the levels of the full-length protein. This short fragment interacts with the unstable DNA-binding domain and is not required for DNA binding, but rather seems to act as a guardian, which protects the homomer from accelerated aggregation.
In summary, above we have surveyed oligomerization of p53 in terms of its (i) function, (ii) mutational robustness, and (iii) stability. Tetramerization enables the co-operative nature of p53 DNA binding and a certain degree of mutational robustness, which are both advantageous for regulation of transcription and protein interactions. At the same time, tetramerization of a multidomain protein increases the local concentration of protein domains, which do not need to interact. In the case of p53, a short stabilizing region compensates for this disadvantage. Sometimes the scenario is the opposite: oligomerization increases stability, but at the same time impairs function. The interplay between stability and function is characteristic of evolvable proteins. In the following section, we survey how protein interactions emerge and how they evolve.
Evolutionary pathways and evolutionary dynamics of protein complexes
When discussing different adaptive aspects of oligomerization, it would be more correct to refer to them as adaptive benefits rather than reasons for oligomerization. As André et al.  have emphasized, evolution can only optimize a complex that is already significantly populated, meaning that its binding energy first needs to be high enough to overcome the loss in entropy. Indeed, as mentioned earlier, protein interactions emerge in evolution with relative ease in comparison with evolution of new enzymatic functions or new structures . Beltrao and Serrano  estimate that, in the Homo sapiens protein interaction network, approximately 1000 interactions change, or rewire, per 1 million years.
It was shown recently that protein interfaces require, on average, only a few mutations to evolve from a typical protein surface patch. Interestingly, when searching for these easily evolvable protein interactions, a clear trend emerges: low-energy complexes are highly enriched in symmetrical complexes composed of structurally similar (or identical) subunits, both in simulated [69,73] and real systems. Homo-oligomers represent a large fraction of protein complexes, as shown by analysis of known protein complex structures, as well as systematic analyses of protein interaction networks. Symmetry clearly plays an important part in the evolution of interactions. For example, there is a clear enrichment of complexes with self-complementary interfaces (complexes with dihedral symmetry) over those with asymmetric face-to-back interfaces (complexes with cyclic symmetry). One of the reasons could be that the self-complementary interfaces in complexes with dihedral symmetry are stronger on average, since each mutation that contributes to an increase in binding energy counts twice.
Oligomeric state and indirect mutational pathways
There have been some impressive achievements in the de novo engineering of protein interactions [75–77], and their focus has been on engineering an interaction by designing a novel interface, with the mutations introduced directly at the interface positions. However, fructose-bisphosphate aldolase B illustrates how a mutation outside the interface can also indirectly play an important role in the evolution of interactions. A point mutation (A149P) changes the protein from a native tetramer to a non-functional dimer with decreased thermostability (Figure 2). Intriguingly, the mutation is not in the interface. Malay et al.  solved the structures of the mutant at two different temperatures (4°C and 18°C) and showed how mutation from alanine to proline causes local disorder, which propagates to other regions of the structure. This structural perturbation influences the active site of the enzyme, making it less active, even at low temperatures. The same mutation also explains the loss of tetrameric state. Disorder is introduced to the 110–129 loop, which forms the tetrameric interface in the wild-type enzyme . One could thus say that this mutation has an allosteric knock-on effect on the oligomeric state (Figure 2), analogous to allosteric effects of small molecules, which are known to affect protein–protein interactions. In this particular example, the allosteric mutation has a highly deleterious effect, but this does not preclude the opposite case in which mutations indirectly influence oligomerization in a way that is advantageous. Cases where several mutations would be involved raise interesting questions about possible epistatic interactions between residues that may be distant in three-dimensional space.
Evolutionary pathways of oligomerization and residue co-evolution
Formation of both homomeric and heteromeric complexes requires cognate binding patches to exist. Mutations on the protein surface may enable new complexes to form, and this change may be consolidated by subsequent mutations at other sites. In addition, mutations that occur in the interior of a protein can change the physicochemical properties of the binding surfaces, such as their conformations, potentially destabilizing the interactions they make, as illustrated by the aldolase B example in Figure 2.
In some cases, these interactions can be rescued by compensatory mutations that occur, or pre-exist, at other sites. Such groups of mutations will then be selected for in evolution. This scenario is termed ‘co-evolution’ of residues, and can be important in the evolution of a protein's ability to engage in different complexes. Although, in some cases, we understand the effects of individual amino acid changes, little is known about how groups of two or more amino acids mutate to control phenotypes such as interaction specificity. This is partly due to the complexity of the phylogenetic models required, and partly due to the complexity of protein structures. For example, even in the best-studied allosteric proteins, as is discussed below, the mechanism by which induced changes propagate through the molecule is not well understood.
If groups of amino acids co-evolve to control protein phenotypes, then we would expect to observe correlations between the mutation patterns of such a group of amino acids in MSAs (multiple sequence alignments). Although detecting correlations between groups of residues is prohibitive both computationally and in terms of the number of sequences required, many algorithms have been developed to detect co-evolving pairs of residues from MSAs [78–85]. It is hypothesized that highly correlated residues are likely to be in close structural proximity to each other. Indeed, recent statistical work has found that correlated residue pairs among Drosophila species are more likely to be close in sequence, and that structurally proximal residue pairs that change between human and rat tend to co-evolve.
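One simple pairwise score of this kind is the mutual information between two alignment columns. The sketch below is a bare-bones illustration on a four-sequence toy alignment; it omits the sequence weighting, gap handling and background corrections that practical methods apply, and it is not meant to reproduce any specific algorithm cited above.

```python
from collections import Counter
from math import log2


def column(msa, i):
    """Residues found at alignment position i, one per sequence."""
    return [seq[i] for seq in msa]


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    col_i, col_j = column(msa, i), column(msa, j)
    count_i, count_j = Counter(col_i), Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    return sum(
        (c / n) * log2((c / n) / ((count_i[a] / n) * (count_j[b] / n)))
        for (a, b), c in count_ij.items()
    )


# Toy alignment (hypothetical sequences): columns 0 and 1 co-vary perfectly,
# column 2 is fully conserved, column 3 varies independently of column 0.
toy_msa = ["AKLE", "AKLD", "GRLE", "GRLD"]

mi_covarying = mutual_information(toy_msa, 0, 1)
mi_conserved = mutual_information(toy_msa, 0, 2)
mi_independent = mutual_information(toy_msa, 0, 3)
```

In this toy case the co-varying pair scores 1 bit, whereas the pair involving a conserved column and the independently varying pair both score 0, illustrating that a fully conserved position carries no co-variation signal.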
Applying co-evolution methods to the prediction of intermolecular interactions requires the assembly of MSAs of potentially interacting pairs. Although early results were not encouraging [83,88–90], a more recent correlation analysis was used to predict specificity-determining residues between histidine kinases and response regulators . The set of specificity determining residues identified in this system contains examples of residues that co-evolve with more than one other residue, suggesting the presence of higher-order interactions between groups of amino acids. A similar hypothesis was put forward in work identifying intramolecular correlations in the serine protease family of enzymes; in this study, it was experimentally demonstrated that distinct groups of amino acids were able to modify different protein phenotypes .
In a correlation analysis that considers pairs of amino acids, co-evolution of a group of amino acids will result in high correlation scores between all possible pairs in the group. Conversely, if we consider a chain of co-evolving amino acids where, for example, residues i and j co-evolve because they are structurally proximal, as do residues j and k, then, in many cases, we will see a transitive correlation between residues i and k, which may be structurally distal. Given that the correlation score assigned to each pair of residues is known to depend strongly on their respective conservation in the MSA, these induced, or transitive, correlations may result in a high rate of false-positive predictions.
Methods from statistical physics or probability theory can be used to build a global model of the protein sequence that allows such transitive correlations to be removed from the set of observed correlations. The algorithms developed previously are local algorithms: they measure the correlation Cij between each pair of residues i and j, independent of the alignment context (e.g. Cij will not change if column k of the alignment is changed). In contrast, in a global method, the correlation score assigned to each pair of residues depends on the rest of the alignment. An implementation of these ideas that used a Bayesian network method to build a global probability model and remove transitive correlations was able to significantly improve prediction of protein–protein interactions from sequence alignments. Subsequent work has applied these ideas to the prediction of residue–residue interaction in both the inter- and intra-molecular cases. Removing the transitive correlations results in an orders-of-magnitude reduction in the level of false-positive predictions [93–96].
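The difference between a local and a global score can be illustrated with a deliberately simplified analogue. The sketch below is not the Bayesian network method or any of the cited global models; it assumes continuous, Gaussian-like variables and uses the textbook first-order partial correlation to show how conditioning on residue j removes the transitive correlation between residues i and k. The numerical values are hypothetical.

```python
from math import sqrt

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after conditioning on z."""
    return (r_xy - r_xz * r_yz) / sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))

# Hypothetical chain i - j - k: i and j are directly coupled, as are j and k,
# and i and k are correlated only through j.
r_ij = 0.8
r_jk = 0.8
r_ik = r_ij * r_jk  # purely transitive correlation (0.64)

# Local view: the pair (i, k) looks strongly correlated.
local_ik = r_ik

# Global view: conditioning on j removes the induced correlation,
# whereas the direct coupling between i and j survives conditioning on k.
direct_ik = partial_correlation(r_ik, r_ij, r_jk)
direct_ij = partial_correlation(r_ij, r_ik, r_jk)
```

Here the local score for the pair (i, k) is high, whereas its conditioned score vanishes; the directly coupled pair (i, j) retains a substantial score. Global sequence models pursue the same goal for categorical amino acid data, where the computation is considerably more involved.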
The implementation of global probability models that successfully identify co-evolving residues raises a number of important questions. Using a global probability model dramatically improved both contact prediction and prediction of protein–protein interaction partners by removing chains of co-varying residues. Previously, it has been suggested that such chains of co-varying residues are themselves key to understanding allosteric mechanisms, and it is not yet clear how to resolve these two points of view.
Another important unresolved question involves the concerted action of groups of co-evolving residues. In each of the global models implemented to date, higher-order interactions between groups of amino acids are explicitly excluded from the model. However, in the mutational analyses described above, it was shown that groups of amino acids control protein phenotypes such as interaction specificity and substrate specificity [84,91]. In particular, the analysis of the interfaces between homomeric or heteromeric complexes suggests that a number of amino acids are likely to be involved in co-ordinating interaction specificity [4,97].
Recent work on HIV identified such a correlated group of amino acids in the Gag protein within which multiple mutations were less likely to be tolerated by the virus. Members of this group were found to lie on the interfaces between molecular subunits of the viral capsid, suggesting that mutations within the group destabilize the structural interactions necessary for viral function. This suggests that further application of correlated mutation analysis to proteins involved in complex formation might yield interesting insights into the evolutionary constraints that interface residues are required to satisfy, and help in the development of mutational strategies for protein complex engineering.
Analysis of residue co-evolution has been applied to prediction and engineering of allosteric pathways, as well as protein interfaces [81,99]. In many cases, the allosteric pathways are intimately linked to oligomerization, and, in the next section, we discuss this connection.
Allostery and protein oligomerization
Allostery is fundamental to many biochemical pathways, from cell signalling to metabolic regulation, and a recently developed Allosteric Database classifies more than 300 proteins as allosteric. Although some of the proteins in this database are monomeric, the majority form homo- or hetero-oligomers. In this section, we discuss why oligomerization is so common among allosteric proteins.
Allostery is intimately associated with intrinsic flexibility and intra/inter-molecular communication between different parts of a protein. Structures of many allosteric proteins are a combination of semi-rigid domains connected by flexible hinges. Assembly of subunits into an oligomer constrains the intrinsic dynamics of each monomer, but, on the other hand, creates novel collective motions by means of intersubunit communication. This communication between different parts of the structure, or in this case, different subunits of an oligomer, enables a high level of allosteric control.
An interesting example where oligomerization promotes allostery is NAGK (N-acetyl-L-glutamate kinase), a member of the amino-acid kinase family. Depending on the organism, NAGKs can be hexameric and allosterically regulated by arginine, or homodimeric and arginine-insensitive. The formation of an extra trimeric interface in the hexamer provides an additional mode of collective motion of the dimeric blocks, in turn promoting allosteric communication (Figure 4a). In summary, a combination of selective dynamics in monomers and their assembly into a complex seems to have been recurrently optimized to achieve the modes of motion necessary for allostery. Even though it may be clear that an oligomeric arrangement can promote additional communications essential for allostery, the atomic mechanisms and pathways by which this occurs remain elusive.
Two early models explained allostery through conformational changes of homodimeric proteins. The MWC (Monod–Wyman–Changeux) model introduced the concept of equilibrium between different conformations assuming concerted and symmetrical changes from one structure to another. In contrast, the KNF (Koshland–Némethy–Filmer) model proposed sequential structural changes upon ligand binding, introducing the idea of different conformations between the initial and final state.
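For readers who prefer a quantitative statement, the concerted model can be sketched with the standard MWC saturation function. The symbols used below (alpha, L, c and n) follow the usual textbook formulation rather than anything defined in the present review, and the parameter values are hypothetical.

```python
def mwc_saturation(alpha, n, L, c):
    """Fractional saturation of an n-site protein in the MWC model.

    alpha: ligand concentration divided by K_R (dissociation constant of the R state)
    n:     number of equivalent binding sites (2 for a homodimer)
    L:     allosteric constant [T0]/[R0] in the absence of ligand
    c:     K_R / K_T, ratio of the dissociation constants of the two states
    """
    numerator = (alpha * (1 + alpha) ** (n - 1)
                 + L * c * alpha * (1 + c * alpha) ** (n - 1))
    denominator = (1 + alpha) ** n + L * (1 + c * alpha) ** n
    return numerator / denominator

# Without a T state (L = 0), binding is hyperbolic: half saturation at alpha = 1.
y_hyperbolic = mwc_saturation(1.0, n=2, L=0.0, c=0.1)

# With a strongly favoured low-affinity T state, the same ligand concentration
# gives much lower saturation, and the full curve becomes sigmoidal.
y_cooperative = mwc_saturation(1.0, n=2, L=100.0, c=0.01)
```

The sequential KNF model cannot be written in this form, because it assigns a separate conformational change to each binding step rather than a single concerted transition.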
In recent years, our understanding of allostery has changed and the boundaries between these two seemingly exclusive models have blurred. Proteins are dynamic and exist in an ensemble of conformations, populated according to their Boltzmann distribution. More than one conformation may be able to bind a ligand, which can, in turn, promote subsequent rearrangements and shift a population distribution.
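The ensemble view can be illustrated with a two-state toy calculation. The free energies below are hypothetical and serve only to show how a ligand that stabilizes a rare conformation shifts the population distribution; the function name is ours.

```python
from math import exp

R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1

def boltzmann_populations(free_energies_kj, temperature=298.15):
    """Equilibrium populations of conformational states from their free energies."""
    weights = [exp(-g / (R * temperature)) for g in free_energies_kj]
    partition = sum(weights)
    return [w / partition for w in weights]

# Hypothetical ensemble: a ground state (0 kJ/mol) and a binding-competent
# state lying 10 kJ/mol higher in free energy.
apo = boltzmann_populations([0.0, 10.0])

# Assume that ligand binding stabilizes the second state by 15 kJ/mol.
holo = boltzmann_populations([0.0, 10.0 - 15.0])
```

In this example, the binding-competent state is populated at the level of a few per cent without ligand and becomes the dominant state once it is stabilized, which is the population shift described above.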
The debate, however, remains about the mechanism of allosteric communication, especially regarding the relationship between the emergence of the binding conformation and the binding event itself. There are numerous examples showing that induced-fit or conformational selection, or both, play important roles in allostery, and several explanations have been put forward showing predominance of one mechanism over the other [105–107]. At the macroscopic level, the main difference is whether the ligand binds to a pre-existing lowly populated conformation resembling the bound state (conformational selection) or, instead, binding to the unbound state promotes structural changes (induced-fit). The conformational selection model has different implications in terms of the evolutionary constraints on individual residues compared with the induced-fit model, as discussed further below.
The idea of an allosteric pathway is related to the allosteric mechanism, and refers to a potential structural pathway that energetically couples two binding sites. The conformational selection mechanism allows for multiple pathways, but the induced-fit model implies a much narrower and more structured route of allosteric communication. Conformational and dynamic transitions within a protein have been successfully probed by mutagenesis and double-mutant cycles, and many computational approaches have been used to correlate protein structure to structural or dynamic coupling between residues (see  for a review).
Most information regarding allosteric pathways comes from high-resolution structures, which can be represented as networks of residue contacts as in Figure 1, level 3. These networks can then be used to analyse structural changes of allosteric proteins in different states. This approach has been used to detect key residues involved in the allosteric communication, and global communication networks involving tertiary and quaternary motions within a protein complex. In general, network representation methods show that residue contact structural networks form ‘small world’ networks with high local residue linkage and sparser long-range connectivity determined by specific residue clusters with critical roles in allosteric communication [112–115] (Figure 4b). However, as proposed by Cooper and Dryden and seen in the case of dimeric bacterial methionine regulator protein [117,118], allostery can occur without significant changes in the backbone conformation of a protein. This can make the identification of networks of residues that mediate the allosteric communication between distal sites even more challenging and, since the conformational change might be very subtle, even more useful.
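A residue contact network of this kind can be sketched in a few lines. The example below assumes one representative coordinate per residue (for instance the Cα atom) and a distance cut-off of 8 Å; both choices, the function names and the toy coordinates are assumptions made for illustration and do not reproduce the procedure of any cited study.

```python
from itertools import combinations
from math import dist

def contact_graph(coords, cutoff=8.0):
    """Adjacency sets linking residues whose coordinates lie within cutoff (in Å)."""
    graph = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if dist(coords[i], coords[j]) <= cutoff:
            graph[i].add(j)
            graph[j].add(i)
    return graph

def clustering_coefficient(graph, node):
    """Fraction of pairs of neighbours of a node that are themselves in contact."""
    neighbours = sorted(graph[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))

# Hypothetical coordinates: three residues packed together and one distant residue.
toy_coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 5.0, 0.0), (20.0, 0.0, 0.0)]
toy_graph = contact_graph(toy_coords)
packed = clustering_coefficient(toy_graph, 0)
isolated = clustering_coefficient(toy_graph, 3)
```

High clustering coefficients combined with short path lengths between distant residues are the hallmarks of the ‘small world’ behaviour mentioned above; comparing such graphs between different states of a protein highlights the contacts that are rewired.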
It has been suggested that networks of evolutionarily correlated residue pairs provide information about pathways of allosteric communication both within protein monomers, and between members of protein complexes. For example, in the GroEL–GroES chaperonin system, networks of residues in GroES coupled to residues in GroEL were identified via a correlated mutation analysis. These networks involve both short- and long-range intersubunit coupling, and were suggested to reflect pathways of information transfer between distal locations. Similarly, an SCA (statistical coupling analysis) was used to identify chain-like networks of amino acid interactions within single PDZ domains that link active-site residues with distant sites, where non-additive binding energy from double mutant cycles was shown to correlate with SCA pairwise correlation scores. A network of residues linking the ligand-binding pocket of GPCRs to conformational changes that occur at the G-protein-binding sites was also identified using SCA. Perhaps most pertinent to the question of allostery is the use of SCA to identify a physically connected pathway of packing interactions that links the allosteric haem pairs across the tetramerization interface in haemoglobin.
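The non-additive binding energy obtained from a double mutant cycle can be written down explicitly. The sketch below uses the standard definition of the coupling energy as the deviation of the double mutant from the sum of the two single-mutant effects; the function name and the binding free energies are hypothetical.

```python
def coupling_energy(dg_wt, dg_mut_a, dg_mut_b, dg_double):
    """Double-mutant-cycle coupling energy (same units as the inputs).

    A value of zero means that the two mutations act additively, i.e. the
    two positions are energetically independent.
    """
    ddg_a = dg_mut_a - dg_wt
    ddg_b = dg_mut_b - dg_wt
    ddg_ab = dg_double - dg_wt
    return ddg_ab - (ddg_a + ddg_b)

# Hypothetical binding free energies in kJ/mol.
coupled = coupling_energy(dg_wt=-40.0, dg_mut_a=-36.0, dg_mut_b=-35.0, dg_double=-33.0)
independent = coupling_energy(dg_wt=-40.0, dg_mut_a=-36.0, dg_mut_b=-35.0, dg_double=-31.0)
```

In the first case, the double mutant loses less binding energy than expected from the single mutants, indicating that the two positions are energetically coupled; in the second case, the effects are strictly additive.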
Allostery in the lac repressor
The bacterial LacI transcription regulator, i.e. the lac repressor, controls lactose metabolic enzymes and is a classical allosteric protein. It is also an example where flexibility and oligomerization have been evolutionarily optimized to ensure diverse specificities and allosteric control.
LacI belongs to the LacI/GalR family of dimeric or tetrameric allosteric regulators, where each monomer comprises a DNA-binding domain, a linker region and a so-called periplasmic small-ligand-binding domain (Figure 4c). The DNA-bound form of the protein is always dimeric, although the tetramer (dimer of dimers) is also thought to be functionally significant in mediating looping between binding sites separated in sequence. Numerous high-resolution structures in different conformational states [120,121] and a plethora of genetic and biochemical data make LacI one of the most studied proteins. The existence of LacI point mutations with inverse phenotypic response relative to the wild-type and the different ligand phenotypic responses of paralogous families highlight the intrinsic evolvability of LacI. Different methods identified the linker region as playing a key role in the functional rearrangement of the two domains and in the allosteric communication between them [120,123,124]. The existence of such a region is now proposed to be a fundamental feature of many allosteric proteins. Furthermore, rearrangements in the dimeric interface seem to form part of a primary allosteric pathway, illustrating the role of oligomerization [111,126–129]. Additionally, two other allosteric pathways were identified by targeted molecular dynamics and are also represented in Figure 4.
The architecture of the allosteric pathways of LacI, consisting of domains connected by a hinge and an intersubunit interface, is also seen in other proteins, such as the nucleotide-binding region of Hsp70 (heat-shock protein 70) or human/rat DNA polymerase β. Although it may be premature to generalize, it is beginning to appear that this pathway architecture may be a common principle across many allosteric proteins. Flexibility, order–disorder transitions and interface communication between domains and subunits appear to be recurring themes in the allosteric mechanisms of many proteins. These patterns create specific evolutionary constraints on families with allostery.
The LacI family of allosteric transcriptional regulators are homomeric, and allostery is often linked to symmetric complexes. However, these can be heteromeric, as in the case of haemoglobin or ATCase (aspartate transcarbamoylase), for instance. In the next section, we shift our focus from structure, evolution and allostery of homomers to the structure and evolution of heteromers.
Structural and evolutionary diversity of heteromeric complexes
As mentioned above, the focus of the present review so far has been mainly on homomeric protein complexes. One of the reasons for this bias is that, to date, the structures of many more homomeric than heteromeric protein complexes have been determined. In fact, while the number of published structures of protein complexes has been rapidly increasing, the fraction of those from heteromers has declined substantially since the 1980s (Figure 5). Although seemingly in contrast with observations that a large fraction of proteins participates in heteromeric protein–protein interactions in vivo, this probably reflects biases in structure-determination methodologies. Whereas early structural studies focused primarily on proteins purified directly from cell extracts, which included many heteromeric complexes, the subsequent adoption of recombinant protein technologies encouraged studies of individual gene products. This trend may have increased in recent years given the huge output of structural genomics projects. Although progress is being made on the high-throughput modelling of heteromeric complexes from monomeric or homomeric structures [131,132], tremendous opportunities remain in structural biology for the characterization of new heteromers.
Owing to their multiple subunit types, heteromers can adopt a far greater range of quaternary structures than is possible for homomers (Figure 6a). At the simplest level, the different subunits of a heteromeric complex can be homologous with each other and adopt homomer-like quaternary structure, as is the case for haemoglobin, which resembles a homotetramer. Alternatively, heteromers can have more complicated, but still highly symmetrical, topologies, such as thiocyanate hydrolase which repeats its three different subunits four times each. Some heteromers are highly asymmetrical, such as RNA polymerase II, which contains one copy of each of its ten different heteromeric subunits. Finally, there are heteromers with both symmetrical and asymmetrical regions, such as ATP synthase, in which a cyclic ring of ten c subunits is connected via an asymmetric stalk region to another subcomplex containing three α and three β subunits. The symmetry mismatch between the different regions of this complex is thought to be crucial for its functioning as a rotary motor.
Not only are heteromers more diverse than homomers in their quaternary structures, but also their subunits are significantly more flexible. Flexibility is intimately related to complex formation, as more flexible proteins tend to undergo larger conformational changes upon binding [137,138]. The increased flexibility of heteromers may be indicative of fundamentally different interaction mechanisms for heteromers compared with homomers. Homomers are inherently symmetrical, and thus each subunit in a complex must undergo the same conformational change upon binding. However, interactions between heteromeric subunits can involve asymmetric conformational changes (e.g. one subunit keeps the same conformation upon binding, while the other experiences a large change). These asymmetric interactions are more likely to require flexible subunits that can adjust their conformations to the structures of their binding partners.
An extreme case of asymmetric binding and subunit flexibility involves the intrinsically disordered proteins, which have received considerable attention in recent years because of their associations with important biological functions as well as various diseases. These proteins can be partially or fully disordered in isolation, yet are often observed to fold upon complex formation. Increasing intrinsic disorder has also been correlated with the size of protein complexes. Moreover, not all intrinsically disordered proteins become fully folded upon binding. Instead, there exist ‘fuzzy’ complexes, in which subunits can retain significant flexibility or disorder, despite being bound. Recent ensemble-modelling strategies using diverse experimental data have begun to shed light on the fascinating structural properties of these highly dynamic complexes [143–145].
We recently introduced a simple structural measure useful for studying protein flexibility called the relative solvent-accessible surface area (Arel). We found that Arel could be used to predict both intrinsic flexibility and the magnitude of conformational changes upon binding from the structures of either free proteins or bound subunits. Furthermore, intrinsically disordered subunits could be identified from bound complexes by their very high Arel values. We anticipate considerable utility for Arel, both in studies of protein flexibility and conformational changes and as a general tool for structural characterization.
Compared with homomers, heteromers also have more diverse evolutionary mechanisms. Whereas homomer quaternary structure can generally only evolve by changing the number of repeated subunits along defined symmetry-related pathways, heteromers can also gain or lose distinct subunits, allowing evolution to proceed in an orthogonal dimension. This can be through the evolution of new interfaces to bind new subunits, or through gene duplications, as is likely to be the case for haemoglobin (Figure 6a), where the gene encoding a homomeric subunit gradually diverges into two increasingly dissimilar genes encoding heteromeric subunits.
An especially interesting evolutionary mechanism relevant to heteromers is gene fusion, in which two separate genes become fused into one. Since gene fusion often occurs between genes encoding interacting proteins [147,148], this provides a way for the number of distinct subunits in a complex to be reduced, while leaving the overall structure largely unchanged. For example, structures of three different forms of the urease complex have been published (Figure 6b). In Klebsiella aerogenes, the subunits are encoded by three different genes, two of which become fused in the Helicobacter pylori form of the enzyme. Finally, all three genes are fused in the homomeric jack bean (Canavalia ensiformis) urease. Thus fusion represents a mechanism by which a heteromer can evolve into a homomer. The reverse process, gene fission, is also possible as a mechanism for increasing the number of subunits, although it is much less frequent than fusion.
Conclusions and perspectives
In the present review, we have discussed structure, evolution and allostery of homomers and heteromers. Both types of complexes are based on one or more cognate interfaces. An interesting and rarely addressed question is what distinguishes a native functional interface from a spurious non-functional interaction, of which there must be many in the crowded cell. Meenan et al. recently managed to ‘trap’ a non-functional interface using a clever disulfide cross-link. This allowed them to compare two homologous interfaces with seven orders of magnitude difference in binding affinity. The authors identified subtle and indirect structural differences that distinguish an interface with a femtomolar Kd from a homologous ‘non-interface’ with a Kd in the micromolar range. At the same time, some biologically interesting complexes, e.g. ones formed by ubiquitin, have binding affinities in the high micromolar range. This illustrates how important it is to discuss protein interactions and interfaces in a cellular and biological context, ideally taking into account local protein concentrations. Although individual researchers are aware of this, and the differences between transient (e.g. signalling) and obligate interactions have been known for a long time [8,155,156], the community would greatly benefit from a systematic approach combining biophysical and structural data on protein interfaces with quantitative proteomics data on protein concentrations.
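The interplay between affinity and concentration can be made explicit with the standard equilibrium expression for a 1:1 complex. The sketch below is a generic calculation, not an analysis of the systems cited above; the function name and all concentrations and Kd values are hypothetical.

```python
from math import sqrt

def complex_concentration(a_total, b_total, kd):
    """Equilibrium concentration of AB for A + B <-> AB (all values in mol/l)."""
    s = a_total + b_total + kd
    return (s - sqrt(s * s - 4.0 * a_total * b_total)) / 2.0

def fraction_bound(a_total, b_total, kd):
    """Fraction of protein A found in the complex at equilibrium."""
    return complex_concentration(a_total, b_total, kd) / a_total

# Two proteins at 1 µM each, with a femtomolar Kd: essentially all bound.
tight = fraction_bound(1e-6, 1e-6, 1e-15)

# The same concentrations with a 100 µM Kd: about 1 % bound.
weak = fraction_bound(1e-6, 1e-6, 1e-4)

# The same weak interaction at a local concentration of 1 mM: mostly bound.
weak_local = fraction_bound(1e-3, 1e-3, 1e-4)
```

Under these assumptions, an interaction that appears negligible at bulk concentrations becomes substantially populated when the partners are concentrated locally, which is why affinity alone is a poor guide to biological relevance.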
Finally, it is worth emphasizing that the future of research on the biophysics and evolution of protein interactions is bright. There are exciting developments related to both experimental and computational techniques for studying the evolutionary and biophysical principles of protein complexes. These include next-generation sequencing, which has produced an explosion of data on protein sequence families, opening a new frontier in protein sequence analysis, as well as more sensitive, comprehensive and novel proteomics methods. Together with increased computational power, this is catalysing new bioinformatics and molecular evolution methods for interpreting the data. Cheap and fast ways of synthesizing DNA mean that it is easier to produce engineered proteins, which makes analysis of mutants more accessible. Together, these and other technologies will usher in a new era in our understanding of proteins and their interactions.
T.P. is supported by a Laboratory of Molecular Biology Medical Research Council (LMB-MRC) Scholarship. J.A.M. is supported by a Long-Term Fellowship from the Human Frontier Science Program. F.L.S. is supported by a fellowship from Fundação para a Ciência e a Tecnologia [grant number SFRH/BPD/73058/2010]. L.J.C. is supported by an Engineering and Physical Sciences Research Council fellowship [grant number EP/H028064/1]. S.E.A. is supported by the Royal Society. S.A.T. is supported by the Medical Research Council [grant number U105161047] and the European Research Council.
We thank Sjors H.W. Scheres for a critical reading of the paper and helpful comments.
Colworth Medal Lecture:
Abbreviations: Arel, relative solvent-accessible surface area; DNE, dominant-negative effect; GPCR, G-protein-coupled receptor; MSA, multiple sequence alignment; NAGK, N-acetyl-L-glutamate kinase; RE, response element; SCA, statistical coupling analysis; TF, transcription factor
- © The Authors Journal compilation © 2012 Biochemical Society