CD (circular dichroism) spectroscopy is a well-established technique in structural biology. SRCD (synchrotron radiation circular dichroism) spectroscopy extends the utility and applications of conventional CD spectroscopy (using laboratory-based instruments) because the high flux of a synchrotron enables collection of data at lower wavelengths (resulting in higher information content), detection of spectra with higher signal-to-noise levels and measurements in the presence of absorbing components (buffers, salts, lipids and detergents). SRCD spectroscopy can provide important static and dynamic structural information on proteins in solution, including secondary structures of intact proteins and their domains, protein stability, the differences between wild-type and mutant proteins, the identification of natively disordered regions in proteins, and the dynamic processes of protein folding and membrane insertion and the kinetics of enzyme reactions. It has also been used to effectively study protein interactions, including protein–protein complex formation involving either induced-fit or rigid-body mechanisms, and protein–lipid complexes. A new web-based bioinformatics resource, the Protein Circular Dichroism Data Bank (PCDDB), has been created which enables archiving, access and analyses of CD and SRCD spectra and supporting metadata, now making this information publicly available. To summarize, the developing method of SRCD spectroscopy has the potential for playing an important role in new types of studies of protein conformations and their complexes.
- circular dichroism spectroscopy (CD spectroscopy)
- macromolecular complex
- protein characterization
- Protein Circular Dichroism Data Bank (PCDDB)
- spectroscopic method
- synchrotron radiation circular dichroism spectroscopy (SRCD spectroscopy)
CD spectroscopy is a widely used technique that has been employed in many laboratories, both academic and industrial, for more than 40 years, to investigate protein structure and function. Recently, there has been a renaissance in the use of this method for structural biology studies, as it has proved very useful for examining structures of proteins and their complexes under a range of conditions, and studies of protein dynamics, especially protein folding and unfolding. In addition, there has emerged an enhanced version of the method, SRCD (synchrotron radiation circular dichroism) spectroscopy, the subject of the present review, which has enabled new studies heretofore not possible with conventional CD spectroscopy.
CD spectroscopy is an absorption technique that is based on the differential absorbance of left- and right-circularly polarized light by chiral molecules, typically (for proteins) in the UV (ultraviolet) wavelength region. In conventional laboratory-based CD instruments, the light source is usually a xenon arc lamp. However, if instead the very bright light produced by a synchrotron ring is used as the light source, the photon flux input into the sample is much higher across the entire UV wavelength range [1,2]. A number of important consequences arise from these increased light levels producing a significantly enhanced method for the examination of protein conformations and protein interactions.
Advantages of SRCD spectroscopy for studies of proteins
The increased photon flux is at the heart of the advantages that SRCD has over conventional CD spectroscopy. The enhanced amount of light input into a given sample will result in an increase in the signal-to-noise ratio in the data measured. The improved signal-to-noise levels mean that: (i) smaller amounts of material can be used (an advantage for precious samples such as membrane proteins or the small samples of proteins produced as part of structural genomics programmes); (ii) faster measurements can be made using the same amount of protein [comparable signal-to-noise levels can be achieved in approx. 10% of the time, thereby increasing the number of samples that can be examined (a major advantage for high-throughput screening)]; or (iii) using similar amounts of material and measurement times, much more accurate measurements can be made, with lower noise levels, thus enabling comparisons of proteins with much subtler differences.
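The trade-off between flux, measurement time and noise described above can be made concrete with a small calculation. The sketch below assumes shot-noise-limited detection, so that the signal-to-noise ratio scales with the square root of the number of detected photons (flux multiplied by time); this scaling and the 10-fold flux gain used in the example are illustrative assumptions, not values taken from a particular beamline.

```python
import math

def relative_snr(flux_ratio: float, time_ratio: float) -> float:
    """S/N relative to a reference measurement.

    Assumes shot-noise-limited detection: S/N ~ sqrt(flux * time).
    Both arguments are ratios with respect to the reference instrument.
    """
    return math.sqrt(flux_ratio * time_ratio)

def time_fraction_for_equal_snr(flux_ratio: float) -> float:
    """Fraction of the reference measurement time needed to match its S/N."""
    return 1.0 / flux_ratio

# Hypothetical 10-fold gain in detected photons relative to a lamp-based instrument.
flux_gain = 10.0
print(time_fraction_for_equal_snr(flux_gain))  # fraction of the original time
print(relative_snr(flux_gain, 1.0))            # S/N gain at equal measurement time
```

Under these assumptions, a 10-fold gain in detected photons corresponds to the same signal-to-noise ratio in about one-tenth of the time, or to a roughly 3-fold lower noise level if the measurement time is kept the same.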
The synchrotron light source provides the high flux over a much wider wavelength range than is attainable in a conventional CD instrument, enabling data collection in aqueous solutions down to wavelengths as low as 168 nm, and in dehydrated films to as low as 140 nm. Because of the increased amounts of data produced, the information contents of SRCD spectra are higher, and hence more types of structural information can be derived from them [3,4].
However, the improvements associated with SRCD come at a cost (literally). The resulting advantages of SRCD must be balanced with the disadvantages of having to schedule experiments at synchrotrons in advance and to travel to remote sites. To demonstrate that the extra effort is worthwhile, it has been crucial not only to show that SRCD can provide enhancements with respect to experiments that can be done with conventional CD instruments, but also to show that, in cases where CD and SRCD measurements can both be made, they produce comparable (albeit improved in the case of SRCD) spectra. Measurements by CD and SRCD on the same protein under the same conditions have shown that, if the instruments are well calibrated [6,7], the spectra they produce in the far-UV region above 190 nm are essentially identical, except that the CD spectra will have a lower signal-to-noise ratio. At lower wavelengths, however, even though some conventional CD instruments are purported to be able to make measurements to as low as 175–180 nm, comparisons show that the measured CD peak around 190 nm is much broader on the low-wavelength side when compared with the same peak in an SRCD spectrum [8,9], and that some low-wavelength peaks can also appear to be slightly shifted in position. This is because, in order to provide sufficient light into the sample in an instrument using a xenon lamp and with optical elements such as silica windows that themselves may absorb light, the slits are opened fully, allowing in light of a broader wavelength range, which in turn changes the shape of the peaks. Such slit changes are not required when using a synchrotron light source, which maintains the same high flux across the full spectral range. Hence, not only can SRCD spectroscopy achieve lower-wavelength measurements, but also the spectra produced are not distorted by instrumental artefacts resulting from attempts to provide the highest light flux possible.
SRCD studies of protein conformations
The following section describes examples of where the characteristics of SRCD either improved or enabled studies that would not have been possible using conventional CD.
Identification of secondary structures and fold types
CD spectroscopy is most commonly used to determine the secondary structure of a protein. This is because different secondary structure types have specific spectral characteristics, and, to a first approximation, the spectrum of a protein is represented by the net sum of the spectra of the different types of secondary structure present, weighted by their percentages present in the protein. In the far-UV wavelength region, α-helical secondary structures produce two negative peaks at approx. 222 and 208 nm and a positive peak at ~190 nm; β-sheet secondary structures produce a single negative peak around 212–215 nm and a positive peak around 190–195 nm.
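The first-approximation model described above, in which a protein spectrum is the sum of secondary-structure component spectra weighted by their fractions, can be sketched as a small least-squares problem. The basis spectra below are made-up illustrative numbers (not reference data), the function names are hypothetical, and the fit is an unconstrained least-squares solution; the empirical algorithms used in practice rely on reference datasets of real protein spectra and additional constraints.

```python
# Illustrative basis spectra (arbitrary units) at a few far-UV wavelengths.
# These values are invented for the sketch; they are NOT reference data.
WAVELENGTHS_NM = [190, 200, 208, 215, 222]
BASIS = {
    "helix": [70.0, 15.0, -33.0, -30.0, -35.0],
    "sheet": [30.0, 10.0, -8.0, -18.0, -12.0],
    "other": [-20.0, -40.0, -12.0, -3.0, 2.0],
}

def mix(fractions):
    """Forward model: sum of basis spectra weighted by fractional content."""
    n = len(WAVELENGTHS_NM)
    return [sum(fractions[k] * BASIS[k][i] for k in BASIS) for i in range(n)]

def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_fractions(spectrum):
    """Unconstrained least squares via the normal equations (B^T B) f = B^T s."""
    keys = list(BASIS)
    btb = [[sum(x * y for x, y in zip(BASIS[p], BASIS[q])) for q in keys] for p in keys]
    bts = [sum(x * s for x, s in zip(BASIS[p], spectrum)) for p in keys]
    return dict(zip(keys, solve(btb, bts)))

# Round trip: build a synthetic spectrum, then recover its composition.
true_fractions = {"helix": 0.5, "sheet": 0.2, "other": 0.3}
recovered = fit_fractions(mix(true_fractions))
print({k: round(v, 3) for k, v in recovered.items()})
```

Because the synthetic spectrum is noise-free and built from the same basis, the fit returns the input fractions; with real data the quality of the result depends on the noise level, the wavelength range and how well the reference spectra represent the protein.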
Figure 1 gives examples of spectra of the four major classes of proteins, as defined by the CATH classification system: mostly helical, mostly sheet, mixed helix/sheet and few secondary structures (irregular). One striking feature is that, in the far-UV region above 190 nm, the peaks for the mostly sheet protein are of much lower magnitude than are the peaks for the mostly helical protein. The consequence of this in the mixed helical/sheet protein is that the shape of the curve tends to be dominated by the helical component. This means that, whereas empirical algorithms used for secondary structure analyses from CD data produce excellent results with respect to the values determined from crystal structures for mostly helical proteins, they tend to be less accurate for mostly sheet and mixed proteins. However, at wavelengths in the VUV (vacuum ultraviolet) region below 190 nm, because there are additional features arising from helical and sheet structures and the curves are of opposite sign, the contributions of the β-sheet structures become more evident. The consequence of this is that SRCD-based analyses of β-sheet-rich proteins tend to be more accurate than CD-based analyses [11,12].
The second level of classification within the CATH system is protein architecture (‘A’ in the acronym). Whereas the CD spectra of mostly helical proteins with the same architecture tend to be very similar (Figure 2), the CD spectra of mostly sheet proteins with the same architecture tend to vary considerably (Figure 3). This is because there is considerable structural variation in the geometries of β-sheets (parallel and antiparallel orientations, with different twists and roll geometries), which is reflected in their spectra. A consequence of this variation is that, using cluster analysis methods to group proteins according to similar SRCD spectra (with no input information regarding sequence or structure), they can be clustered into proteins with different types of folds. This is only possible if data down to 170 nm are included in the analyses, suggesting that the extra data present in the SRCD spectra provide additional information content on protein architecture and supersecondary structures.
The structures of only a few proteins of the fourth CATH class, those which contain ‘few secondary structures’, have been determined; these include so-called ‘natively unfolded’ or ‘natively disordered’ proteins. This class of protein also includes denatured proteins. The type of spectrum produced by this class (the example in green in Figure 1) clearly has features distinct from the other three main secondary structure classes. The major peak tends to be a negative peak around 200 nm, which is very distinct from the helical and sheet peaks. Many folded proteins also include some ‘irregular’ structure (which used to be called ‘random coil’, but this is a misnomer as it is neither random nor coil in nature, and hence now tends to be called ‘other’). However, CD measurements of peaks in the low-wavelength region tend to be noisier, thereby making the quantification of irregular structures problematic. Analyses are complicated further because different types of ‘irregular’ or disordered structures produce spectrally different features (Figure 4). Furthermore, Matsuo et al. have shown that the SRCD spectra of unfolded proteins, especially in the low-wavelength range, will vary depending on how a protein has been denatured. Hence distinguishing types of irregular structures from each other, and from polyproline-II helices, which also tend to produce negative peaks at similar wavelengths, is another example in which the low-wavelength data present in SRCD spectra are of value.
To take advantage of the additional information content present at low wavelengths in SRCD data, new reference datasets have been produced for use in empirical analyses [11,15,16]. These reference datasets, especially if they contain many protein spectra which broadly cover protein secondary structure and fold space, have been shown to significantly improve the analyses of all types of protein structures with respect to conventional CD analyses.
Examination of environmental effects
SRCD spectroscopy has the capability to address a recurring question in many structural biology studies: does a protein have the same conformation under crystallization conditions or the conditions required for NMR measurements as it does under ‘physiological’ conditions? Crystals are often formed at high salt concentrations in the presence of additives of various sorts, and NMR samples are often subjected to low pH at high protein concentrations or other extremes. Whereas conventional CD spectroscopy often cannot examine proteins under these extreme conditions, SRCD, by virtue of its high flux, which enables light penetration into samples with absorbing components, can enable such measurements. Most of the added components are not chiral and hence do not give rise to CD signals, but do decrease the total light transmitted, thereby decreasing the signal-to-noise ratio. SRCD has been used to examine both soluble and membrane proteins under crystallization conditions [11,17], including in the presence of detergents, salts and additives. SRCD has also enabled examination of protein structures in other physical forms that are generally refractory to accurate measurements by conventional CD spectroscopy, such as fibres and dried films [9,19].
Determination of structural consequences of mutations
Small (even single-residue) changes in the sequences of proteins can often produce significant structural differences, including the inability of the entire protein to fold correctly. CD spectroscopy has been employed in many studies to demonstrate such differences. However, mutations (especially those associated with diseases) often produce relatively subtle structural differences that cannot be detected with accuracy by conventional CD as they produce changes that may alter the structure of only a few residues out of a whole protein. But if those differences are critical to, for instance, either the structure of the active site or the molecular surface or interfaces, a more sensitive technique is needed to explore the changes. In the case of the phosphoribosylpyrophosphate synthetase protein associated with the disease gout, a change of one amino acid from asparagine to serine produced a small, but significant, change in the peaks at ~190 and 220 nm; the magnitude of this change corresponded to a 1% difference in helix content. This difference could be rationalized on the basis of the crystal structure as changes to the local hydrogen-bonding pattern in a way that would change the φ/ψ angles of a segment of polypeptide backbone, resulting in the change from a disordered helix to an ordered one. This level of change would not have been detectable by CD, and neither was a crystal structure available for the mutant, so SRCD enabled the interpretation of the mutational effects.
As a second example, for the eye lens protein γD-crystallin, a single amino acid change from proline to threonine (but not proline to serine) results in a form of congenital cataract. The secondary structural change associated with this mutation was too small to be detected by CD (the differences were smaller than the error bars associated with the measurements in that case), but could clearly be detected using SRCD (Figure 5). Once detected, the reduced solubility of the mutant protein leading to the insoluble cataract aggregates could be rationalized by comparisons of the crystal structure of the wild-type and the molecular simulations of the modelled mutant structure on the basis of the secondary structural changes identified by SRCD.
These two examples demonstrate clearly the value of the enhanced sensitivity of SRCD for structural studies and its complementarity to protein crystallographic studies.
Studies of protein dynamics
CD can be a useful technique for studying the dynamics of protein folding and/or unfolding. It is often limited, however, by three features: the speed and the wavelengths at which it can monitor the changes, and the relatively large amount of material required. The speed limitation arises for several reasons: the speed and dead time of mixing in a stopped-flow chamber, the rate of modulation of the left- and right-polarization (typically 50 kHz), and the limited size of the signal, which means that, by necessity, the dwell time/averaging at each data point will be long. In SRCD, there is the potential for overcoming all of these limitations. Microfluidic mixers for continuous flow studies are being developed that can obviate the dead times associated with stopped-flow studies, the use of the inherent polarization of the light in the synchrotron ring is being tested as an alternative to modulation, and the high signal-to-noise ratio enabled by the beam flux means that the averaging time can be less, or the chamber volume smaller, thereby reducing the requirement for the amount of protein. Most conventional CD instruments fitted with stopped-flow accessories can only measure the data with any degree of accuracy for the peaks around 222 nm, as the peaks in the 208–210 or 190 nm range have lower signal-to-noise ratios (exacerbated by the short averaging times used in fast kinetic measurements). This limitation means the conventional CD technique has been very valuable for measuring the kinetics of unfolding or folding involving helical elements, but not very good at monitoring the changes involving irregular structures (which would be monitored at 190 nm) or sheet structures (monitored at ~210 nm). Measurements with an SRCD beamline at the lower wavelengths are now possible, and will thus enable the monitoring of changes in all secondary-structure elements present.
This could be enhanced further in the future with the development of ‘white light’ detection systems, which will enable the simultaneous collection of the full spectrum. Although there is much potential for SRCD studies of protein dynamics, there are, as yet, few examples of its application to protein systems [25,26] owing to a lack of beamlines that have the necessary instrumentation installed. However, many of the existing, and most of the new, beamlines are developing such facilities, so this will be an important application of SRCD for the future.
SRCD and CD studies of intermolecular interactions and complex formation
In the early days of molecular biology, structural approaches tended to consider isolated macromolecules or to break down complexes into their component parts in order to simplify systems and make them of the size and complexity amenable to the then-available structural techniques. However, it has become clear that many proteins do not exist or function as isolated macromolecules, but rather as complexes involving other proteins, nucleic acids, carbohydrates, lipids or even small-molecule partners. ‘Interactome’ analyses have shown further that huge networks of proteins exist that consist of both stable and dynamic complexes that assemble and disassemble during metabolic and other functional processes. Advances in high-resolution structure determination techniques such as protein crystallography and NMR spectroscopy have meant that it is now possible to examine large whole complexes such as ribosomes, as opposed to previous studies on individual ribosomal proteins in isolation. Such studies have been essential for understanding the functioning of the complexes and their components.
With the recent focus on systems biology, the nature of intermolecular interactions has come to the fore, and with it the need for methods that can characterize those interactions in different ways. There are at present a large number of methods that are capable of identifying different features of protein interactions in solution (as is evident from the other articles in this issue of Biochemical Society Transactions). Some can identify on a gross overview scale which molecules form partners, at least transiently, some can identify the strength of the interactions or the energetics involved in complex formation, and still others are capable of identifying the structural consequences on the partners when complexes are formed.
In the following sections, the types of interactions and the nature of the information that can be derived from CD and SRCD spectroscopies are discussed, as well as the advantages and disadvantages of these methods with respect to other methods for detecting interactions, and the relative merits of using SRCD spectroscopy as opposed to CD spectroscopy in studies of complex formation.
Types of interactions
In general, there are two major types of interactions involved in complex formation: induced-fit and rigid-body. Induced-fit interactions are ones in which there is a change in the structure of one or both components when they bind together, often creating a pocket in one of the partner molecules for the other to bind into when no such pocket exists in either of the isolated proteins; alternatively, it can involve refolding of one of the partners so that it can fit into an existing pocket in the other partner. Induced-fit interactions often involve changes in secondary structures, commonly by folding or unfolding or refolding of one type of secondary structure into another type of secondary structure; such changes often involve loops or turns at the periphery of the protein. Induced-fit interactions can also involve the folding of ‘natively unfolded’ regions of a protein upon interaction with a partner molecule. Alternatively, they can involve tertiary changes resulting from rotation or translation between domains and/or about a hinge region. These types of interactions can result in different environments for some of the residues without affecting the local secondary structure. Induced-fit interactions often involve relatively large regions of the protein, producing substantial amounts of buried interface areas in the complex. These types of interactions are relatively difficult to predict a priori by bioinformatics methods, as the existing surfaces on the individual proteins may differ considerably from the shape and charge characteristics present in the binding pocket in the interface region of the complex.
In contrast, rigid-body interactions can occur without either secondary- or tertiary-structural changes in either of the components. These types of interactions can produce changes to the local environments for residues that lie at the interface, including decreases to their mobility, formation of hydrogen bonds or hydrophobic interactions and/or changes to their surrounding dielectric environment. They can involve either large or small buried interfaces, and the interfaces can be either hydrophobic or hydrophilic in nature. Often the binding pockets, which exist in the absence of the intermolecular interactions, can be detected, and binding sites and partners can be predicted on the basis of such surface characteristics as radius of curvature, geometric complementarity or electrostatic surface features. Many docking algorithms have been developed on the basis of assumptions of these types of interactions.
There are a number of structural databases of protein complexes derived from crystal structures, such as 3DComplex. Predictions of docking and binding using these databases generally assume rigid-body interactions. However, only when the crystal structures of both partners in isolation and in complex are available can the details of the nature of the interactions (with or without conformational changes) be ascertained from crystal structures. Alternatively, solution methods which enable monitoring of the structures of the components before and after complex formation can distinguish between the types of interactions, and also detect what types of structural changes occur on complex formation.
Information from CD and SRCD spectroscopy in studies of complex formation
CD has been used extensively to examine interactions that result in conformational changes upon complex formation. It can only monitor changes in the far-UV region if they involve net changes to the secondary structure of one or both components, or in the near-UV region if they involve alterations in tertiary structures which result in perturbation of the environments of aromatic amino acids. This has meant that interactions that do not involve such changes (such as either rigid-body interactions, or interactions that involve movements of secondary-structure elements, but without changing the types and amounts of secondary structure present) are not detectable by CD. Very recently, however, it has been shown that, using SRCD, where the VUV wavelength region is also accessible, molecular interactions of the rigid-body type can be detected due to changes in the charge-transfer transitions that can be monitored at these wavelengths. Furthermore, rigid rotational changes to secondary-structural elements can be detected in some systems, such as membranes, where oriented CD (and in particular oriented SRCD) can examine interactions involving changes to the orientations of secondary-structural elements such as transmembrane helices.
In addition to requiring specific types of changes, the net changes in the CD spectrum have to be of a sufficient magnitude to be significantly detectable with respect to the signal-to-noise levels in the measurements. This effectively limits detection by conventional CD spectroscopy to changes that involve approx. 10% or more of the molecular structure changing, which would be a very large change indeed. For a protein of molecular mass 50 kDa, this would mean that roughly 50 amino acids would have to be altered in order to be detectable. Often interactions involve only a few residues, which is challenging to detect in large proteins by any solution method. Nevertheless, there are a significant number of interactions that involve moderate size changes, involving, say, 10–20 amino acids. With the advent of SRCD spectroscopy and its vastly improved signal-to-noise levels, changes involving as little as 1–2% of the molecule can be detected with confidence, hence expanding the method to detection of many more interactions.
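The arithmetic behind these detection limits can be checked with a short calculation. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not stated in the text; the function name is hypothetical.

```python
MEAN_RESIDUE_MASS_DA = 110.0  # assumed average mass per amino acid residue

def residues_changed(mass_kda: float, fraction_changed: float) -> float:
    """Approximate number of residues corresponding to a fractional structural change."""
    n_residues = mass_kda * 1000.0 / MEAN_RESIDUE_MASS_DA
    return n_residues * fraction_changed

# A 50 kDa protein at the conventional CD limit (approx. 10%)
# and at the SRCD limits quoted above (1-2%).
for fraction in (0.10, 0.02, 0.01):
    print(f"{fraction:.0%}: ~{residues_changed(50.0, fraction):.0f} residues")
```

Under this assumption, a 50 kDa protein has roughly 450 residues, so a 10% change corresponds to about 45 residues (in line with the "roughly 50" quoted above), whereas a 1–2% change corresponds to only about 5–9 residues.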
As with many of the other methods described in this issue of Biochemical Society Transactions, CD can be used to determine qualitatively whether binding occurs, or quantitatively what the binding constant for the interaction is. A disadvantage for CD has been that, in order to determine binding constants, it is necessary to experimentally cover a wide range of concentrations, sometimes involving high concentrations and large amounts of one or both of the components. This is particularly true for measurements in the near-UV region of the spectrum, where the chiral absorbance of the aromatic components that are monitored is generally quite low; hence, unless the binding constants are quite large, the protein quantity requirements can often make such measurements unrealistic. Because of its higher signal-to-noise levels, SRCD is effectively more sensitive, and so measurable signals can be obtained using less protein, making a wider range of binding constant measurements accessible by this technique.
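As an illustration of how a binding constant might be extracted from a titration monitored at a single wavelength, the sketch below fits a simple single-site binding isotherm to the change in CD signal. The model, the assumption that the free ligand concentration is approximately equal to the total (ligand in excess), the synthetic data and all names are assumptions made for the example; they do not describe the analysis used in any of the studies cited here.

```python
def single_site(ligand, kd, ds_max):
    """Change in CD signal for single-site binding, free ligand ~ total ligand."""
    return ds_max * ligand / (kd + ligand)

def fit_single_site(ligand_conc, signal_change, n_grid=2000):
    """Estimate (Kd, dS_max) by scanning Kd on a log-spaced grid.

    For each trial Kd the best dS_max follows from linear least squares,
    so only a one-dimensional search is needed.
    """
    def best_for(kd):
        x = [l / (kd + l) for l in ligand_conc]
        ds_max = sum(a * b for a, b in zip(x, signal_change)) / sum(a * a for a in x)
        sse = sum((ds_max * a - b) ** 2 for a, b in zip(x, signal_change))
        return sse, ds_max

    lo = min(ligand_conc) / 100.0
    hi = max(ligand_conc) * 100.0
    grid = [lo * (hi / lo) ** (i / n_grid) for i in range(n_grid + 1)]
    kd = min(grid, key=lambda k: best_for(k)[0])
    return kd, best_for(kd)[1]

# Synthetic, noise-free titration (concentrations in micromolar; values invented).
conc_um = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0, 100.0, 200.0]
signal = [single_site(c, kd=20.0, ds_max=5.0) for c in conc_um]
kd_fit, ds_max_fit = fit_single_site(conc_um, signal)
print(f"Kd ~ {kd_fit:.1f} uM, dS_max ~ {ds_max_fit:.2f}")
```

The titration has to span concentrations well below and well above the dissociation constant for the fit to be well determined, which is why the amount of material needed can become limiting when the signal per molecule is small.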
Finally, a major advantage of CD or SRCD spectroscopy relative to other methods that simply identify whether interactions occur is that the data have a higher information content than just the positive detection of a binding event; they can also be used to define the nature of the changes caused by the binding on the structures of the molecules involved.
Protein complexes detected by SRCD spectroscopy
The following are examples of the use of SRCD to examine complexes. Each example includes a discussion of why SRCD has been essential for its success.
Protein–protein or protein–peptide interactions
SRCD has been used to examine the rigid-body complex formation of carboxypeptidase A with its protein inhibitor latexin. In that study, because the crystal structures of both the isolated proteins and the protein complex are known, it has been possible to ascertain that there is no change in secondary structure of either of the components when the complex is formed. As was expected, in the far-UV spectral region (~200–260 nm) where secondary-structural changes would be detected, no difference was found between the measured spectrum of the complex and the calculated sum of the spectra of the two components, whether CD or SRCD (Figure 6) was used. However, in the VUV region (below 200 nm) the SRCD experimental and calculated spectra of the complex showed significant differences owing to the interactions present in the complex (Figure 6). These differences have been proposed to arise either from the additional charge-transfer transitions that occur between adjacent secondary-structural elements in the complex or from changes to the aromatic residues at the interface region that can give rise to subtle alterations in signals in the low-wavelength region. The individual and complex spectra were also examined using conventional CD under similar conditions, but because in this wavelength range the signal-to-noise levels were much lower and the error bars much larger, significant differences could not be detected with confidence. This, then, is an example of where CD could not be used to detect complex formation, but SRCD was able to detect the formation of a rigid-body complex, thus demonstrating an important way in which SRCD expands the capacity of the technique.
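The comparison described above, between the measured spectrum of a complex and the calculated sum of the spectra of its components, can be sketched as follows. The numbers are invented for illustration (they are not the data in Figure 6), the spectra are assumed to be in the same units and at matching molar concentrations, the errors are combined in quadrature, and the 2σ threshold is an arbitrary choice for the example.

```python
import math

def difference_spectrum(complex_s, comp_a, comp_b, err_c, err_a, err_b, n_sigma=2.0):
    """Return (difference, combined error, significant?) at each wavelength.

    difference = measured complex - (component A + component B);
    errors are propagated in quadrature, assuming independent measurements.
    """
    rows = []
    for c, a, b, ec, ea, eb in zip(complex_s, comp_a, comp_b, err_c, err_a, err_b):
        diff = c - (a + b)
        sigma = math.sqrt(ec ** 2 + ea ** 2 + eb ** 2)
        rows.append((diff, sigma, abs(diff) > n_sigma * sigma))
    return rows

# Invented example values at four wavelengths (arbitrary units).
wavelengths_nm = [180, 190, 210, 222]
complex_s = [12.0, 31.0, -20.0, -22.0]
comp_a = [5.0, 14.0, -9.0, -10.0]
comp_b = [4.0, 15.0, -11.0, -12.0]

low_noise = [0.3] * 4   # hypothetical small error bars
high_noise = [2.0] * 4  # hypothetical large error bars

flags_low = [r[2] for r in difference_spectrum(complex_s, comp_a, comp_b,
                                               low_noise, low_noise, low_noise)]
flags_high = [r[2] for r in difference_spectrum(complex_s, comp_a, comp_b,
                                                high_noise, high_noise, high_noise)]
print(dict(zip(wavelengths_nm, flags_low)))
print(dict(zip(wavelengths_nm, flags_high)))
```

In this toy example the same low-wavelength differences are flagged as significant with the small error bars but not with the large ones, which mirrors the argument made above about why the rigid-body complex was detectable by SRCD but not by conventional CD.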
An example of a protein–peptide interaction resulting from induced-fit complex formation that required SRCD in order to be detectable is the binding of the C-terminal peptide of rhodopsin to the protein arrestin (P.A. Hargrave and B.A. Wallace, unpublished work), a complex formed during visual cycle signalling in rod photoreceptors. The small peptide (19 residues) has a completely unfolded conformation in isolation (Figure 7). Bovine arrestin, a heterotrimeric protein comprising 405 residues per subunit, adopts a mostly sheet conformation. SRCD was used to compare the difference between the spectrum of arrestin alone and the spectrum of the arrestin–peptide complex. In this case, the difference spectrum resulted from the spectra of two large proteins, one in isolation and one in complex with a peptide that contributed less than 5% of the total number of peptide bonds in the sample. The resulting difference spectrum (Figure 7), although noisy, clearly demonstrated that there was additional secondary structure present in the complex corresponding to approx. 20 helical amino acids. Because CD methods only measure net spectra, and because in this case both partners were polypeptides and thus produced signals in the same spectral region, this difference could have arisen due either to an increase in the helix content of the arrestin partner, an unlikely event for what was already a fully folded protein, or to the refolding of the unstructured polypeptide into a helical conformation, which was much more likely, especially since the magnitude of the change was appropriate for a peptide of the length of the rhodopsin peptide. The latter explanation is consistent with an NMR study done on the protein–peptide complex that showed the peptide folded when present in the complex. What made SRCD essential for the binding study was the small size of the change that was measured.
A difference spectrum between two such large parent spectra would not have given a significant and measurable signal in conventional CD. It is also notable that, whereas similar conclusions have been derived from the solution NMR studies on the protein–peptide complex, the amount of material needed for the SRCD studies was approx. two orders of magnitude less, and the measurements and interpretations could be made in hours rather than months.
Membrane proteins are a class of proteins that are important both functionally and pharmacologically. They comprise roughly one-quarter to one-third of the open reading frames in all genomes, and are the targets of more than two-thirds of all currently marketed pharmaceutical drugs. However, because they require the presence of lipid or detergent partner molecules to retain their solubility and functionality, they have often proved to be refractory to structural studies by crystallography and NMR spectroscopy. In principle, they can be studied by CD methods, although membrane samples can give rise to a number of potential artefacts such as differential scattering, absorption flattening and wavelength shifts that need to be considered in interpreting the results [9,17]. Investigating the nature of the interactions of membrane proteins with specific types of lipids has been an important topic for structural and functional studies. However, a limitation for CD studies of such complexes has been the ratio of lipid-to-protein in the sample. Although the lipids generally do not give rise to significant chiral signals, they do absorb light in the far-UV region. Because CD measures the difference in absorption between left- and right-circularly polarized light, the lipid component, which absorbs both equally, contributes no net signal, but it does decrease the amount of light that reaches the detector, and hence the signal-to-noise levels. Effectively, that means that studies generally use low lipid-to-protein ratios, which may not accurately reflect the environment of a lipid bilayer, especially as high lipid-to-protein ratios have been shown to be essential for stability, folding and function of a number of membrane proteins. A significant advantage of the higher flux in SRCD is that higher lipid-to-protein ratios can be used (see Figure 8A in which the ratio is 2000:1). These types of measurements could not be made with a conventional CD instrument.
Consequently, SRCD has enabled studies which examine the effects of different lipid components on the structure of a protein. For example (Figure 8B), it was possible to demonstrate the requirement for the presence of sphingomyelin in the lipid mixture for proper folding of the pore-forming protein equinatoxin. In this case, the ability to achieve a high lipid ratio was essential, as the protein is lytic at high protein ratios, destroying the integrity of the lipid bilayer.
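The link between lipid absorbance, light reaching the detector and signal-to-noise described above can be illustrated with a minimal sketch. It assumes the Beer-Lambert law for transmission and a purely shot-noise-limited detector (signal-to-noise proportional to the square root of the detected photons), which is a simplification of a real instrument. The absorbance values and the 1000-fold flux ratio are hypothetical numbers chosen only for illustration.

```python
import math


def transmitted_fraction(absorbance: float) -> float:
    """Fraction of incident light that reaches the detector (Beer-Lambert law)."""
    return 10.0 ** (-absorbance)


def relative_signal_to_noise(incident_flux: float, absorbance: float) -> float:
    """Relative S/N assuming photon shot noise only (an assumption of this sketch).

    S/N scales with the square root of the number of detected photons.
    """
    return math.sqrt(incident_flux * transmitted_fraction(absorbance))


# Hypothetical sample: the achiral lipid adds absorbance but no CD signal.
protein_absorbance = 0.8
lipid_absorbance = 1.0
total_absorbance = protein_absorbance + lipid_absorbance

# Hypothetical flux values in arbitrary units (not measured beamline figures).
conventional = relative_signal_to_noise(1.0, total_absorbance)
synchrotron = relative_signal_to_noise(1000.0, total_absorbance)

print(f"S/N gain from higher flux: {synchrotron / conventional:.1f}x")
```

Under these assumptions, each additional absorbance unit contributed by the lipid costs a factor of √10 in signal-to-noise, which a higher incident flux can offset.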
Another issue for lipid complex formation, especially with peptides as opposed to proteins, is that often the sample contains an equilibrium mixture of conformations. If the peptide is disordered in solution in the absence of the lipid partner, upon association with the lipid, it can fold into an ordered structure, but often the sample will be a mixture of bound ordered peptide and free disordered peptide. An example of this was the N-terminal lytic peptide from the equinatoxin II protein. As was found for the intact protein mentioned above, SRCD was essential to achieve the appropriate lipid ratios so that the lipid bilayers did not lyse or aggregate. However, as in all CD studies, the spectrum was the result of all the components in the sample and hence the derived secondary structure is that of the mixture of structures. In this case, the calculated helical content of the sample was found to be ~40%, which could have meant that all of the peptides in the sample had structures that were roughly 40% helical or that 40% of the peptides were completely helical, or that there was a mixture of structures somewhere in between these two extremes. In this case, the synergy between NMR studies and the SRCD studies was evident, because NMR studies suggested that the peptide it could detect was almost entirely helical, but was not able to detect unfolded peptides in the sample. By combining the results from the two techniques, it was possible to determine the proportions of the molecules that were folded and unfolded.
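The reasoning behind combining the two techniques can be sketched as a simple two-state mixture calculation. This is an illustrative model, not the analysis used in the original study: it assumes the sample contains only a bound (folded) and a free (disordered) population, and that the observed helical content is the population-weighted average of the two. The function name and the input values other than the ~40% observed helicity are hypothetical.

```python
def bound_fraction(observed_helix: float,
                   helix_bound: float,
                   helix_free: float = 0.0) -> float:
    """Fraction of peptide in the bound state for a two-state mixture.

    Solves observed = p * helix_bound + (1 - p) * helix_free for p.
    All helicities are fractions between 0 and 1.
    """
    if helix_bound <= helix_free:
        raise ValueError("bound state must be more helical than free state")
    p = (observed_helix - helix_free) / (helix_bound - helix_free)
    if not 0.0 <= p <= 1.0:
        raise ValueError("observed helicity lies outside the two-state range")
    return p


# CD gives the ensemble average (~40% helix). If NMR indicates that the bound
# peptide is essentially fully helical (taken here as 1.0, an assumption), and
# the free peptide is taken as non-helical, the folded population follows.
p_folded = bound_fraction(observed_helix=0.40, helix_bound=1.0)
print(f"folded: {p_folded:.0%}, unfolded: {1 - p_folded:.0%}")
```

If the bound peptide were assumed to be less than fully helical, the same observed average would imply a larger folded population, which is why the independent NMR estimate of the bound-state structure is needed.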
The Protein Circular Dichroism Data Bank (PCDDB): a new resource for SRCD and CD data sharing and analyses
To enable the life sciences community to take advantage of the information content in CD and SRCD spectroscopic data, and to provide a service for all users/producers of SRCD and CD spectroscopy, a new resource has recently been developed, namely the PCDDB. This is an accession and deposition data bank for CD and SRCD spectra and their associated metadata which makes the data freely available online. It parallels the Protein Data Bank (PDB), a long-established data bank of crystallographic and NMR structures and data, which is highly accessed and has found great use in many applications in the life sciences.
Contents of PCDDB records
The PCDDB, despite the ‘protein’ in its name (a homage to the PDB, which was originally established to contain data for proteins), will eventually include spectral data on macromolecules of all sorts, including proteins, nucleic acids and carbohydrates. The aim is also to include not only a wide range of proteins, but also a wide range of experimental conditions for each protein (multiple entries for the same protein). These records will act as a research source for many types of experiments, including protein classification, protein folding and functional/binding studies. In addition to spectra of individual proteins, spectra of protein complexes, both with other macromolecules and with small ligands, will be included. It will also include series of thermal and chemical denaturation spectra that will provide comparison data on different modes of protein folding/unfolding by different types of proteins.
Each entry includes not only the net CD (or SRCD) spectrum, but also the high-tension (otherwise known as the high-voltage) spectrum, which is related to the absorption spectrum, is collected on the sample at the same time as the CD spectrum and is one indicator of the spectral quality. Depositors are encouraged to include ‘raw’ sample spectra and baselines as well as information on the processing procedures used. Each entry includes an extensive amount of metadata regarding the characterization, identity and sequence of the protein (including Uniprot and PDB identifiers, if available), the sample contents and conditions and the instrumental conditions used for collection. The spectrum for each entry is available online in graphics format as well as in a downloadable text format of wavelength against Δε value suitable for porting into commercial spreadsheets for plotting, or for input into analysis software, such as that available on the Dichroweb server. The metadata are downloadable as a simple text file. The databank curation procedures then link the entry to the Uniprot and PDB databanks, and to PubMed and other citation indexes as appropriate, and calculate the secondary structures from any linked PDB file using the DSSP algorithm for inclusion in the metadata.
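As an illustration of how a downloaded wavelength-against-Δε text file might be read for plotting or further analysis, the following is a minimal sketch. The exact layout of the PCDDB download is not specified here, so the parser assumes a hypothetical format: one data point per line, with wavelength (nm) and Δε as the first two whitespace-separated numeric columns, and any other lines (headers, comments) skipped.

```python
from typing import List, Tuple


def parse_spectrum(text: str) -> List[Tuple[float, float]]:
    """Parse (wavelength, delta_epsilon) pairs from a two-column text block.

    Assumed layout (hypothetical): whitespace-separated numeric columns;
    lines whose first two fields are not numbers are ignored.
    """
    points = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            wavelength, delta_epsilon = float(fields[0]), float(fields[1])
        except ValueError:
            continue  # header or comment line
        points.append((wavelength, delta_epsilon))
    return points


# Made-up values, for demonstration only.
example = """\
# wavelength_nm  delta_epsilon
190.0  5.21
191.0  4.87
192.0  4.10
"""

for wavelength, delta_epsilon in parse_spectrum(example):
    print(wavelength, delta_epsilon)
```

Returning plain tuples keeps the sketch free of dependencies; the same pairs could be passed directly to a plotting library or written out for a spreadsheet.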
Datasets included in the PCDDB are standardized and checked for both completeness and quality. Standardization is accomplished by ensuring format interoperability with other software and searchability with web-based tools. The checking for completeness is carried out through autocuration methods which prevent deposition until all fields for a file are complete, and the checking for quality is carried out through a validation procedure which tests a number of characteristics of the data.
The data included in the PCDDB are validated with an automatic online validation software package, ValiDichro (B. Woollett, A.J. Miles, L. Whitmore, R.W. Janes and B.A. Wallace, unpublished work) and the validation report is appended to the spectral entry so that the users of the data have an indication of the quality of the data that they have downloaded. At present, there are 12 validation criteria (although this will be augmented later following beta-testing consultation), and each criterion leads to the annotation of ACCEPTABLE (fully meets the quality criterion), FLAG (partially meets the criterion, and which can include an explanatory comment by the author) or REJECT (does not meet the minimum level for this criterion standard); the validation report is date/version-stamped so that when later versions are released, the user can see which criteria were applied at the time of deposition.
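The structure of such a validation report can be sketched as a small data model. This is not the ValiDichro implementation: the three annotation levels and the date/version stamp come from the description above, while the class names, field names and the criterion names in the example are hypothetical placeholders.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List


class Annotation(Enum):
    ACCEPTABLE = "ACCEPTABLE"  # fully meets the quality criterion
    FLAG = "FLAG"              # partially meets it; author may add a comment
    REJECT = "REJECT"          # does not meet the minimum level


@dataclass
class CriterionResult:
    name: str
    annotation: Annotation
    author_comment: str = ""


@dataclass
class ValidationReport:
    software_version: str      # records which criteria set was applied
    validated_on: date
    results: List[CriterionResult] = field(default_factory=list)

    def counts(self) -> Counter:
        """Number of criteria at each annotation level."""
        return Counter(result.annotation for result in self.results)


# Hypothetical criterion names, for illustration only.
report = ValidationReport(
    software_version="0.1-beta",
    validated_on=date(2010, 1, 15),
    results=[
        CriterionResult("criterion_a", Annotation.ACCEPTABLE),
        CriterionResult("criterion_b", Annotation.FLAG, "see deposition notes"),
        CriterionResult("criterion_c", Annotation.ACCEPTABLE),
    ],
)
print(report.counts())
```

Storing the version and date on the report itself, rather than on each criterion, mirrors the idea that a user can later see which set of criteria was in force at the time of deposition.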
The validation procedure is critical to ensuring the integrity of the databank, and follows the model of the reports from the crystallographic validation software PROCHECK, MolProbity and WHAT_CHECK that accompany entries in the PDB.
The SP175 dataset
In the first release at the end of 2009, the PCDDB was available as an accession-only databank, in order to provide a period of general beta-testing and user feedback; the full deposition databank version will become available later in 2010.
The initial entries were the 71 protein spectra that comprise the SP175 reference dataset that was created for analysis of protein secondary structures and which is currently available (and highly utilized) as an option on the Dichroweb server. The aggregate name of the spectra in this dataset, SP175, indicates that it includes SRCD data (down to 175 nm) on soluble proteins. The proteins cover a wide range of secondary-structural space [11,43] as well as protein-fold space, as defined by the CATH classification method. In addition, it includes at least one example of each CATH protein superfamily. The protein spectra used to produce Figures 1–3 of the present review are derived from these spectral entries. One reason that this dataset was chosen for the first release is that it includes high-quality spectra, all of which meet the validation criteria. It should be noted that, although the spectra are SRCD spectra, cross-calibration studies have shown that, in the higher-wavelength range (i.e. above 190 nm), they are identical with CD spectra, and can be used interchangeably for any purposes (such as spectral comparisons or calculations) as if they were CD spectra [6,7,8,11,45].
The aim of the PCDDB is to include the original data of any published SRCD and CD spectra, thus acting as a repository for good practice as well as a data-sharing resource for the whole scientific community.
In recent years, the concept of data sharing has been adopted by various scientific funding and policy bodies throughout the world, such as the US Department of Energy, National Institutes of Health and National Science Foundation, the Research Councils of the U.K., the Commission of the European Communities, and the Australian Prime Minister's Science, Engineering and Innovation Council (PMSEIC) Working Group for Data Sharing. Their policies expect the timely sharing of data by funded researchers while recognizing the right of the data-producing groups to get fair credit for their work. Many journals, too, are adopting data- and material-sharing policies which require their authors to provide such information as a condition of publication. A key feature of these policies is the requirement that publication must include the submission of relevant information to public repositories.
The PDB has provided such a repository for the crystallographic and NMR communities for many years, long before any data-sharing requirements. It has not only benefited the depositor as a simple and enduring means of archiving their data externally, but has benefited the structural biology, bioinformatics and wider life sciences communities, enabling access to both the structure and the actual data and metadata. The PCDDB is a new, but similar, resource created to facilitate the data sharing of CD and SRCD spectroscopic data. This databank provides a facile means for the authors to fulfil data-sharing requirements by making the data available freely on the web, while ensuring the authors get credit for their data, as any download of the data is directly associated with their literature citations.
The present review has described recent developments in the use of the method of SRCD spectroscopy for structural characterization of proteins and their complexes. The applications of SRCD described have built on, and extended, those possible with the well-established technique of conventional CD spectroscopy. Although the first SRCD beamlines were built 30 years ago [1,2], it has only been in the last few years that the method has found use in biology. This was in large part because a number of the essential proof-of-principle studies had not been done until the turn of this century. These include demonstration of the additional information content present in the low-wavelength data [4,5,46], its value for examining proteins in heretofore unachievable conditions such as high salt and additives, the cross-calibration of SRCD and CD instruments to show consistency between the two related methods [6,7], the availability of software and reference datasets to enable the enhanced analyses of the additional data [11,12,14–17,42], the development of new sample cells enabling data collection at low wavelengths [48–50], and the consideration of potential effects on samples caused by the high-intensity beams and ways to overcome them [51–55]. In addition, over the same time period, significant developments in instrumentation design have made the beamlines more effectively usable [56–60].
Consequently, the era of SRCD usage for the study of biological systems effectively began in the 2000s. The present review has included discussions of a number of examples of usage that currently exist, as well as discussions of the future potential of the method in other types of studies. It is expected that the popularity and number of studies will grow rapidly now that many new SRCD beamlines are in operation internationally (Table 1), with increased capabilities and capacities which will enable new types of experiments to be undertaken.
The present review has also described another development that will benefit both CD and SRCD studies: the PCDDB, which is an archive and data-sharing facility for spectroscopic data. This means that data produced by these methods will be accessible for general use, not only for spectroscopic studies, but also for bioinformatics and structural biology in general.
In summary, new developments in both hardware and software are extending the utility of one of the workhorse techniques of structural biology, CD spectroscopy, to enable its use for new applications in structural and functional genomics, most dramatically as the result of the development of SRCD beamlines.
This work was supported by Biotechnology and Biological Sciences Research Council project grants to B.A.W., a Biotechnology and Biological Sciences Research Council Bioinformatics and Biological Resources grant to B.A.W. and R.W.J., Biotechnology and Biological Sciences Research Council US-partnering and China-partnering grants to B.A.W. and R.W.J., and beamtime grants from the Synchrotron Radiation Source (SRS) Daresbury, the Institute for Storage Ring Facilities at Aarhus (ISA), Beijing Synchrotron Radiation Facility (BSRF), National Synchrotron Radiation Research Centre (NSRRC), the Soleil Synchrotron and the National Synchrotron Light Source (NSLS).
We thank the following collaborators with whom we carried out the studies reviewed in this article for samples and helpful discussions: Professor Paul Hargrave (University of Florida, Gainesville, FL, U.S.A.), Dr Nathan Cowieson (formerly Queensland University, now Monash University, Clayton, Victoria, Australia), Professor Paula Booth (Bristol University, Bristol, U.K.), and Professor Frances Separovic (Melbourne University, Melbourne, Victoria, Australia). We also thank Dr Andrew Miles, Dr Frank Wein (now at the Soleil Synchrotron) and the late Dr Paul Evans from the Wallace Laboratory at Birkbeck College for help with data collection and sample preparations.
AstraZeneca Award Lecture:
Abbreviations: CD, circular dichroism; PCDDB, Protein Circular Dichroism Data Bank; SRCD, synchrotron radiation circular dichroism; UV, ultraviolet; VUV, vacuum ultraviolet
- © The Authors Journal compilation © 2010 Biochemical Society