## Abstract

X-ray and neutron scattering and analytical ultracentrifugation provide multiparameter structural and compositional information on proteins that complements high-resolution protein crystallography and NMR studies. They are ideal methods to use in three situations: when a large protein cannot be crystallized, when scattering provides the only means to obtain a solution structure, or when a crystal structure has been determined and needs to be validated in solution. Once these results have been obtained, we apply automated constrained modelling methods based on known subunit crystal structures to identify the best-fit structure. Using our antibody structures as examples, we describe how appropriate starting models are generated, how they are randomized for trial-and-error scattering fits, how the final best-fit models are identified and how these are interpreted in terms of function. We discuss our structure determinations for IgA and IgD, an IgA–human serum albumin complex, dimeric IgA and the secretory component that associates with it, and chimaeras of mouse IgG with two complement proteins. Constrained modelling confirms the experimental data analysis and produces families of best-fit molecular models. Its use has clarified several aspects of antibody structure and function in solution.

- analytical ultracentrifugation
- chimaeric antibody
- constrained modelling
- hinge conformation
- neutron scattering
- X-ray scattering

## Introduction

Scattering studies the overall structure of macromolecules in random orientations in solution [1]. The experimental output is the scattering curve *I*(*Q*) as a function of *Q* (Figures 1 and 2), where *Q*=4π sin θ/λ (2θ=scattering angle; λ=wavelength). Guinier plots of ln *I*(*Q*) versus *Q*^{2} at low *Q* give the forward scattered intensity *I*(0), from which the molecular mass is derived, and the radius of gyration *R*_{G} (and, for elongated macromolecules, the cross-sectional radius of gyration *R*_{XS}). The Fourier transform of the full *I*(*Q*) curve gives the distance distribution function *P*(*r*), which yields the maximum dimension of the macromolecule and its shape in real space. In comparison with other structural methods, scattering provides a multiparameter description of a protein structure under near-physiological conditions. High-flux sources such as those at the ESRF (European Synchrotron Radiation Facility) and ILL (Institut Laue-Langevin) in Grenoble, France, and the Diamond and ISIS/TS2 facilities in Oxfordshire, U.K., now permit more extensive applications of scattering by accessing lower sample concentrations and larger *Q* ranges. To match these instrumental developments, we developed a method of automated constrained solution scattering modelling [2,3]. In this method, trial-and-error curve fits extract structural information to a precision of 0.5–1.0 nm: arrangements of the subunits in known crystal structures are first randomized, and the resulting models are then compared with the data to identify the best fits (Figure 1).
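The Guinier step can be made concrete with a short sketch. The pure-Python functions below are hypothetical helpers, not the authors' software: they compute *Q* from the scattering angle and fit straight lines to ln *I*(*Q*) versus *Q*^{2} (slope −*R*_{G}^{2}/3) and to ln[*Q I*(*Q*)] versus *Q*^{2} (slope −*R*_{XS}^{2}/2). Choosing the fitting range and propagating errors, which matter in practice, are left to the caller.

```python
import math


def momentum_transfer(two_theta_rad, wavelength_nm):
    """Q = 4*pi*sin(theta)/lambda in nm^-1, where 2*theta is the scattering angle."""
    return 4.0 * math.pi * math.sin(two_theta_rad / 2.0) / wavelength_nm


def _line_fit(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def guinier_rg(q, intensity):
    """Fit ln I(Q) = ln I(0) - (R_G^2 / 3) Q^2 over the supplied low-Q points.

    Returns (I(0), R_G). For compact particles the Q range is commonly
    restricted to roughly Q * R_G < 1.3; the caller is responsible for this.
    """
    slope, intercept = _line_fit([x * x for x in q], [math.log(i) for i in intensity])
    if slope >= 0:
        raise ValueError("non-negative Guinier slope: check the Q range")
    return math.exp(intercept), math.sqrt(-3.0 * slope)


def guinier_rxs(q, intensity):
    """Cross-sectional fit: ln[Q I(Q)] = const - (R_XS^2 / 2) Q^2; returns R_XS."""
    xs = [x * x for x in q]
    ys = [math.log(x * i) for x, i in zip(q, intensity)]
    slope, _ = _line_fit(xs, ys)
    if slope >= 0:
        raise ValueError("non-negative cross-sectional slope: check the Q range")
    return math.sqrt(-2.0 * slope)


# Synthetic, noise-free Guinier region for R_G = 6.1 nm (Q * R_G stays below 1.3).
q_low = [0.05 + 0.01 * k for k in range(16)]
i_low = [100.0 * math.exp(-(6.1 ** 2) * x * x / 3.0) for x in q_low]
i0_fit, rg_fit = guinier_rg(q_low, i_low)  # should recover I(0) = 100 and R_G = 6.1 nm
```

With real data the same two fits are applied to the experimental curve and to every modelled curve, so that model and experiment are compared on identical terms.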

X-ray scattering is distinguished by high primary beam intensities and minimal instrumental errors from wavelength polychromaticity and beam divergence. However, radiation damage is common in X-ray scattering and must be explicitly excluded before modelling is performed. X-ray scattering reveals the hydrated dimensions of the macromolecule. Neutron scattering is mainly distinguished by the use of ^{2}H_{2}O buffers, the absence of radiation damage and the general invisibility of the hydration shell. Neutrons become uniquely informative when deuteration is used to label the macromolecule(s) of interest. More detailed background is given in other reviews [3,4].

Modelling determines a three-dimensional structural model that best accounts for the observed scattering curve. Although unique structure determinations are not possible because the molecules are randomly oriented, modelling is able to rule out structures that are incompatible with the scattering curves. Hence the basic premise of constrained modelling is that the scattering fits can reject models that fit more poorly. Constrained modelling originated with small sphere models and the Debye equation [1]. Initially, assemblies of small spheres, sometimes guided by electron micrograph images, were manually adjusted until they accounted for the scattering curve [5]. In contrast, current constrained modelling uses known atomic structures directly to generate fits. A large number of conformationally randomized but stereochemically correct structures are prepared (Figure 1), of which fewer than 1% will yield good curve fits. This strategy was first applied manually to model pentameric human IgM in terms of known Fab and Fc crystal structures from IgG [6], and was then automated to model human IgA1 (Figure 2a) [2,7]. The resulting best-fit models provide biologically useful information on domain arrangements, even though they correspond to medium structural resolutions.

## Algorithm for constrained scattering modelling

### Trial structures

The prerequisite is a full starting co-ordinate model, including all carbohydrate chains if present (Figure 1). The three major constraints are: (i) the known sequence and composition, which fix the macromolecular volume; (ii) relevant homologous crystal or NMR structures or good homology models, which fix the domain shapes; and (iii) the known covalent peptide linkers between the subunits, which limit the structures allowed. Conformational variation is introduced through these linkers. They are initially modelled as an extended β-strand structure. Molecular Dynamics then randomizes the linkers to generate libraries of 500–10000 conformers, which is usually sufficient to explore the conformational variants of interest.
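The idea of a conformer library can be illustrated with a deliberately crude stand-in for the Molecular Dynamics step. The sketch below is an assumption-laden substitute, not the published procedure: it grows self-avoiding Cα traces with a fixed 0.38 nm Cα–Cα virtual bond and a hypothetical 0.40 nm clash threshold. In a real model the next domain would be attached at the free end of each trace, and each resulting conformer would be passed on to the curve calculation.

```python
import math
import random

CA_CA = 0.38    # nm; virtual C-alpha to C-alpha distance for trans peptides
MIN_SEP = 0.40  # nm; hypothetical clash threshold for non-adjacent C-alpha atoms


def random_direction(rng):
    """Uniformly distributed unit vector (normalized Gaussian components)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]


def random_linker(start, n_residues, rng, max_tries=10000):
    """Grow a self-avoiding C-alpha trace of n_residues beyond `start`.

    `start` is the C-alpha of the last residue of the preceding domain.
    Returns the list of positions (start included), or None on failure.
    """
    trace = [tuple(start)]
    for _ in range(max_tries):
        if len(trace) == n_residues + 1:
            return trace
        d = random_direction(rng)
        new = tuple(trace[-1][k] + CA_CA * d[k] for k in range(3))
        # Reject steps that clash with any earlier, non-adjacent bead.
        if all(math.dist(new, p) >= MIN_SEP for p in trace[:-1]):
            trace.append(new)
    return trace if len(trace) == n_residues + 1 else None


def conformer_library(n_conformers, n_residues, seed=0):
    """Collect n_conformers successful linker traces (hypothetical helper)."""
    rng = random.Random(seed)
    library = []
    while len(library) < n_conformers:
        trace = random_linker((0.0, 0.0, 0.0), n_residues, rng)
        if trace is not None:
            library.append(trace)
    return library
```

A random walk like this ignores backbone torsion preferences and side chains, which is precisely what the Molecular Dynamics randomization described above accounts for; it is shown only to make the "library of stereochemically plausible linkers" concept tangible.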

### Curve simulations

Scattering curves are calculated using the Debye equation. Several hundred Debye spheres that replace thousands of atoms provide sufficient detail for modelling in a computationally efficient manner [1]. The optimal cube side and atom cut-off for the sphere conversion are determined using one of the most extended models. The scattering curve *I*(*Q*) is essentially computed by summing over all the distances *r* from each sphere to the remaining spheres. For X-rays, no instrumental corrections are required, but a hydration shell of 0.3 g of water/g of glycoprotein has to be added to the model. A hydration shell is well represented as a monolayer of water surrounding the protein surface [8]. To add this, extra spheres are placed around every sphere in the model, and duplicate and excess spheres are then removed [9]. For neutrons, only unhydrated models need to be considered. While no neutron curve corrections are needed at low *Q*, the physically large neutron camera leads to wavelength spread (typically 10%) and beam divergence effects (typically 0.016 radians) at large *Q*. The calculated curve is convolved with a Gaussian function to account for these. In addition, a flat background arising from the incoherent scattering of protons requires a correction of 0.5–2.7% of *I*(0) [10,11].
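The sphere conversion and the Debye summation can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' programs: atoms are binned into cubes of side `cube_side`, a cube holding at least `cutoff` atoms becomes one sphere at its centre, each sphere is given the radius of equal volume to its cube (an assumption), and *I*(*Q*) is summed directly over all sphere pairs and normalized to *I*(0)=1. The hydration shell and the neutron smearing and background corrections described above are omitted.

```python
import math
from collections import Counter


def atoms_to_spheres(atoms, cube_side, cutoff):
    """Grid atoms (nm) into cubes; a cube holding >= cutoff atoms becomes a sphere at its centre."""
    cells = Counter(tuple(math.floor(c / cube_side) for c in atom) for atom in atoms)
    return [tuple((k + 0.5) * cube_side for k in cell)
            for cell, count in cells.items() if count >= cutoff]


def equal_volume_radius(cube_side):
    """Radius of a sphere whose volume equals one cube (an assumption of this sketch)."""
    return cube_side * (3.0 / (4.0 * math.pi)) ** (1.0 / 3.0)


def sphere_form_factor(x):
    """Amplitude of a homogeneous sphere, 3(sin x - x cos x)/x^3, with x = Q * radius."""
    if x < 1e-3:
        return 1.0 - x * x / 10.0  # series expansion avoids cancellation near x = 0
    return 3.0 * (math.sin(x) - x * math.cos(x)) / x ** 3


def debye_curve(spheres, q_values, radius):
    """Debye equation for n identical spheres, normalized so that I(0) = 1.

    I(Q) = g(Q)^2 [n + 2 * sum_{i<j} sin(Q r_ij) / (Q r_ij)] / n^2
    """
    n = len(spheres)
    dists = [math.dist(spheres[i], spheres[j])
             for i in range(n) for j in range(i + 1, n)]
    curve = []
    for q in q_values:
        total = float(n)
        for r in dists:
            qr = q * r
            total += 2.0 * (math.sin(qr) / qr if qr > 0 else 1.0)
        curve.append(sphere_form_factor(q * radius) ** 2 * total / n ** 2)
    return curve
```

The double loop over pairs is quadratic in the number of spheres, which is why replacing thousands of atoms by a few hundred spheres matters; histogramming the pair distances first is a common way to speed up the same sum.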

### Comparisons with X-ray and neutron data

In an automated procedure driven by UNIX scripts, the Guinier *R*_{G} and *R*_{XS} values of each model are extracted from the modelled curve for comparison with experiment. The number of spheres in each model is compared with the expected total to rule out steric overlap. The goodness-of-fit *R*-factor (=100×Σ|*I*(*Q*)_{exp}−*I*(*Q*)_{cal}|/Σ|*I*(*Q*)_{exp}|) is computed. Distances that define each co-ordinate model are also computed (e.g. the separation between the N- and C-termini). This output is sorted to identify the best fits (Figure 1). Models are retained if their Guinier-fitted *R*_{G} and *R*_{XS} values are within 5% or ±0.3 nm of the experimental values, and if they have at least 95% of the expected number of spheres. The retained models are then ranked by *R*-factor. Good *R*-factors are less than 5%.
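The filtering and ranking rules above translate directly into code. The sketch below is illustrative only (the original procedure is driven by UNIX scripts): `TrialModel` and the helper names are hypothetical, the "within 5% or ±0.3 nm" rule is read as passing if either limit is met, and the experimental and modelled curves are assumed to be already on the same scale (for example, both normalized to *I*(0)=1), because the *R*-factor depends on that scaling.

```python
from dataclasses import dataclass, field


@dataclass
class TrialModel:
    name: str
    rg: float       # nm, Guinier R_G of the modelled curve
    rxs: float      # nm, Guinier R_XS of the modelled curve
    n_spheres: int  # spheres remaining after conversion
    curve: list = field(default_factory=list)  # modelled I(Q) on the experimental Q grid


def r_factor(i_exp, i_cal):
    """R = 100 * sum|I_exp - I_cal| / sum|I_exp| (percent)."""
    num = sum(abs(e - c) for e, c in zip(i_exp, i_cal))
    return 100.0 * num / sum(abs(e) for e in i_exp)


def within_tolerance(model_value, exp_value, rel=0.05, abs_nm=0.3):
    # Assumption: "within 5% or +/-0.3 nm" passes if EITHER limit is met.
    diff = abs(model_value - exp_value)
    return diff <= rel * abs(exp_value) or diff <= abs_nm


def screen_and_rank(models, i_exp, rg_exp, rxs_exp, n_expected, min_fraction=0.95):
    """Apply the R_G, R_XS and sphere-count filters, then sort survivors by R-factor."""
    kept = [m for m in models
            if within_tolerance(m.rg, rg_exp)
            and within_tolerance(m.rxs, rxs_exp)
            and m.n_spheres >= min_fraction * n_expected]
    return sorted((r_factor(i_exp, m.curve), m.name) for m in kept)
```

The head of the returned list holds the best-fit candidates; in the workflow described here, those with *R*-factors below about 5% would then go forward to visual inspection.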

### Interpretation of the best-fit structures

It is necessary to demonstrate that a sufficient number of randomized structures have been screened (Figure 1). A plot of *R*-factor versus *R*_{G} should give a V-shaped distribution whose minimum lies close to the best-fit *R*_{G} value, which in turn should agree with the experimental *R*_{G} value. The best fit is confirmed by visual inspection of the experimental and modelled *I*(*Q*) and *P*(*r*) curves (Figure 2). The best-fit models are also inspected visually for a stereochemically reasonable outcome. The α-carbon co-ordinates for approx. 10–12 best-fit models are deposited in the PDB, where they qualify because they correspond to experimental structure determinations. Sedimentation coefficients are calculated from the best-fit models for comparison with ultracentrifugation data [12]. The dimensions of the best-fit models are compared with electron microscopy results if available.
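To show how a sedimentation coefficient follows from a sphere model, the sketch below uses the Kirkwood approximation for *n* identical beads, *f* = *n*ζ/[1 + (*a*/*n*)Σ_{i≠j}1/*r*_{ij}] with ζ = 6πη*a*, and then *s* = *M*(1 − *v̄*ρ)/(*N*_{A}*f*). It is a generic textbook method swapped in for illustration, not the program behind reference [12]. The solvent values (water at 20°C) are assumptions, and the bead radius, molecular mass *M* and partial specific volume *v̄* must be supplied by the user.

```python
import math

N_A = 6.02214076e23       # mol^-1
ETA_WATER_20C = 1.002e-3  # Pa s, viscosity of water at 20 C (assumed solvent)
RHO_WATER_20C = 0.9982    # g/ml, density of water at 20 C (assumed solvent)


def kirkwood_friction(spheres_nm, radius_nm, eta=ETA_WATER_20C):
    """Translational friction coefficient (kg/s) of identical beads, Kirkwood approximation.

    f = n * zeta / (1 + (a / n) * sum_{i != j} 1 / r_ij),  zeta = 6 * pi * eta * a
    """
    n = len(spheres_nm)
    a = radius_nm * 1e-9
    zeta = 6.0 * math.pi * eta * a
    inv_sum = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # Each unordered pair contributes twice to the sum over i != j.
            inv_sum += 2.0 / (math.dist(spheres_nm[i], spheres_nm[j]) * 1e-9)
    return n * zeta / (1.0 + a * inv_sum / n)


def sedimentation_coefficient(mass_g_per_mol, vbar_ml_per_g, spheres_nm, radius_nm,
                              rho_g_per_ml=RHO_WATER_20C, eta=ETA_WATER_20C):
    """s = M(1 - vbar * rho) / (N_A f), returned in Svedberg units (1 S = 1e-13 s)."""
    m_kg = mass_g_per_mol * 1e-3 / N_A  # mass of one molecule in kg
    buoyancy = 1.0 - vbar_ml_per_g * rho_g_per_ml
    f = kirkwood_friction(spheres_nm, radius_nm, eta)
    return m_kg * buoyancy / f / 1e-13
```

For a single bead the expression reduces to Stokes' law, which is a useful sanity check; the hydrodynamic result is sensitive to how bead radii and hydration are chosen, so such values are best used comparatively.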

## Antibody structures determined by constrained modelling

Constrained modelling can be classified into four types: types 1, 2, 3 and *N* [13]. Type 1 analyses involve the simple association of oligomeric proteins with no covalent linkers between the subunits. Type 2, 3 and *N* analyses involve, respectively, multidomain proteins with two, three or more covalently linked domains or subunits.

### Type 3: modelling of monomeric human and chimaeric antibodies

Monomeric antibodies comprise two Fab fragments and one Fc fragment joined by a hinge region. The hinge conformation is central to antibody function. The hinge is structurally diverse in the five antibody classes: it reaches up to 64 residues in length in IgD, and is replaced by an extra domain pair in IgE and IgM. There are very few crystal structures for the IgG class and none for the IgA, IgM, IgE and IgD classes. These crystal structures were obtained in unphysiological high-salt buffers and are single snapshots of symmetric or asymmetric structures frozen by intermolecular contacts. Constrained scattering modelling is well suited to clarifying the hinge structure and flexibility that govern antibody interactions with antigens and receptors. Here, the only variable is the hinge conformation, and relevant crystal structures are known for the Fab and Fc fragments.

IgA is the second most abundant antibody in serum after IgG, and occurs mostly as monomers, with approx. 10% present as J-chain-linked dimers. The human IgA1 isotype has an O-glycosylated 23-residue hinge region. Reduced flexibility in this hinge was demonstrated by the appearance of two peaks rather than one in the X-ray *P*(*r*) curve (Figure 2a). The same conformation could be used for both heavy-chain hinges, so 2-fold symmetry was imposed. Molecular Dynamics generated 12000 2-fold symmetrical hinge conformations, including hinges deliberately set to be longer in order to populate extended conformations; of these, 104 extended T-shaped hinge structures gave good scattering fits. Thus IgA1 was found to have a longer antigenic reach than several other antibodies (Figure 3) [7]. The human IgA2(m1) allotype of IgA2 has a shorter 10-residue hinge. Its modelling was complicated by the disulfide bridge joining the two Fab fragments at the C-termini of their light chains. Approximately half of the best-fit models from constrained modelling gave IgA2 structures with the correct cysteine–cysteine spacing. Connecting this bridge gave T- and Y-shaped models that retained good curve fits [10].
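Two ideas from this paragraph, building a 2-fold symmetric model from one half and keeping only models in which a symmetry-related disulfide can form, can be sketched as follows. The choice of the *z* axis as the 2-fold axis, the helper names and the Cα–Cα separation window of roughly 0.44–0.68 nm for a disulfide are all assumptions made for illustration; they are not taken from the IgA1 or IgA2 studies.

```python
import math


def twofold_partner(coords):
    """Image of half-model coordinates under a 180-degree rotation about the z axis."""
    return [(-x, -y, z) for x, y, z in coords]


def symmetric_model(half):
    """Full model = half model plus its 2-fold symmetry mate."""
    return list(half) + twofold_partner(half)


def disulfide_possible(ca_a, ca_b, lo_nm=0.44, hi_nm=0.68):
    """True if two cysteine C-alpha atoms lie in an assumed disulfide C-alpha separation range."""
    return lo_nm <= math.dist(ca_a, ca_b) <= hi_nm


def keep_bridgeable(half_models, cys_index):
    """Keep 2-fold symmetric models whose two copies of residue `cys_index` could form a disulfide.

    Each half model is a list of C-alpha positions; the symmetry mate of
    (x, y, z) is (-x, -y, z), so the two copies lie 2 * sqrt(x^2 + y^2) apart.
    """
    kept = []
    for half in half_models:
        cys = half[cys_index]
        mate = twofold_partner([cys])[0]
        if disulfide_possible(cys, mate):
            kept.append(symmetric_model(half))
    return kept
```

Imposing symmetry halves the number of free hinge variables, which is one reason it is attractive when the data allow it; as the chimaeric antibody work below shows, it has to be abandoned when the hinge becomes too flexible.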

Human IgD, which has a long O-glycosylated hinge, occurs most abundantly as a membrane-bound antibody on the surface of mature B-cells. The X-ray *R*_{G} of 6.9 nm for a myeloma IgD showed that it is more extended than IgA1, whose X-ray *R*_{G} is 6.1 nm. The analysis of 8500 randomized IgD hinge structures yielded models with low *R*-factors and good curve fits (Figure 2). The hinge conformation in the best-fit models corresponds to a semi-extended but principally T-shaped arrangement of the Fab and Fc fragments [14]. Comparison with IgA1 suggested that IgD and IgA1 are more similar than might have been expected (Figure 3).

Chimaeric antibodies are engineered to replace the antibody Fab fragments with proteins that have a different biological function, while the Fc fragment is retained. The hinge region is often engineered as a long peptide. Two constrained modelling analyses of such chimaeras showed that, as a result, the antibody structure became significantly more flexible at the hinge, so that 2-fold symmetry could no longer be used to obtain good curve fits [15,16].

### Type *N*: modelling of IgA SC (secretory component)

Many multidomain proteins contain more than three domains. Constrained modelling of these type *N* structures is important to establish whether these proteins form extended domain structures or are folded back. For example, the cell-surface tumour marker protein CEA (carcinoembryonic antigen), with seven Ig domains and 50% glycosylation, was shown by constrained modelling fits to be extended, with slight bends between the domains [17].

An interesting case is that of SC, which binds to dimeric IgA (see below). SC has five heavily glycosylated variable Ig domains, D1–D5. SC and its D1–D3 and D4–D5 fragments were found to have similar overall lengths of 10–13 nm. This was unexpected because fully extended SC would be 20 nm long (Figure 1). A J-shaped structure for SC was determined by constrained modelling (Figure 2d) starting from the crystal structure of the D1 domain [11]. Two strategies were followed. In the first, the D1–D3 and D4–D5 fragments were fitted individually; then only the linker between the best-fit D1–D3 and D4–D5 models was randomized to identify the best-fit SC structure using X-ray and neutron data for SC (Figure 1, white circles). In the second, all four linkers between the five domains were randomized (Figure 1, grey circles). Together, the two strategies ensured that a full range of conformations had been assessed before a best-fit SC structure was determined.
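The length argument can be checked on any model by computing its pair-distance histogram, which is the model counterpart of *P*(*r*), together with its maximum dimension. In the hypothetical example below each Ig domain is crudely represented by a single bead, with bead centres 4 nm apart; these coarse models are illustrative assumptions, not the published SC models. A straight chain of five such beads spans 16 nm between centres (about 20 nm once the outer half-domains at each end are added), whereas a folded-back arrangement is much shorter.

```python
import math


def model_pr(spheres, bin_width=0.5):
    """Histogram of all sphere-sphere distances (nm): an unnormalized model P(r).

    Returns (bin_centres, counts, d_max). d_max is the largest centre-to-centre
    distance; adding one sphere diameter gives the outer length of the model.
    """
    dists = [math.dist(a, b) for i, a in enumerate(spheres) for b in spheres[i + 1:]]
    d_max = max(dists, default=0.0)
    counts = [0] * (int(d_max // bin_width) + 1)
    for r in dists:
        counts[int(r // bin_width)] += 1
    centres = [(k + 0.5) * bin_width for k in range(len(counts))]
    return centres, counts, d_max


# Hypothetical one-bead-per-domain models: five domain centres 4 nm apart,
# either fully extended or folded back into a J-like shape.
straight = [(4.0 * k, 0.0, 0.0) for k in range(5)]
folded = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (8.0, 4.0, 0.0), (4.0, 4.0, 0.0)]
d_straight = model_pr(straight)[2]  # 16 nm between the outermost centres
d_folded = model_pr(folded)[2]      # about 8.9 nm
```

For a Debye sphere model the same histogram, computed with a fine bin width, is what is compared with the experimental *P*(*r*) curve.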

### Types 1 and 2: modelling of oligomeric antibodies

SC associates with dimeric IgA to form SIgA (secretory IgA). In secretions, SIgA constitutes a major mucosal defence mechanism. To model dimeric IgA1, two T-shaped structures for monomeric IgA1 were used as the starting point. Here, a type 1 modelling strategy was used because the covalent links between the monomers and a joining (J) chain were unknown. The two monomers were positioned end to end. A series of 5869 *x*-, *y*- and *z*-axis rotations of one monomer about the other evaluated the orientations and translations between them. A small family of near-planar dimer structures resulted [18].
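A type 1 search of this kind can be sketched as a grid of rotations of one monomer about its own centre, combined with a fixed end-to-end offset. The grid spacing, the offset and the function names below are hypothetical; the paper's own set of 5869 rotations is not reproduced. Note that a regular grid of *x*-, *y*- and *z*-axis angles samples some orientations more than once, so a production search would normally prune equivalent rotations. Each trial dimer would then be passed through the sphere conversion, Debye curve and filtering steps described earlier.

```python
import itertools
import math


def rotation_matrix(ax, ay, az):
    """R = Rz(az) Ry(ay) Rx(ax); angles in radians."""
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def transform(rot, centre, shift, coords):
    """Rotate coords about `centre`, then translate by `shift`."""
    out = []
    for p in coords:
        v = [p[k] - centre[k] for k in range(3)]
        out.append(tuple(sum(rot[r][c] * v[c] for c in range(3)) + centre[r] + shift[r]
                         for r in range(3)))
    return out


def dimer_trials(monomer, shift, step_deg=30):
    """Yield ((ax, ay, az), dimer) for a grid of rotations of a second copy placed at `shift`."""
    centre = tuple(sum(p[k] for p in monomer) / len(monomer) for k in range(3))
    angles = [math.radians(a) for a in range(0, 360, step_deg)]
    for ax, ay, az in itertools.product(angles, repeat=3):
        partner = transform(rotation_matrix(ax, ay, az), centre, shift, monomer)
        yield (ax, ay, az), list(monomer) + partner
```

Because no covalent linker is modelled in this type 1 approach, the only stereochemical check needed at this stage is that the two copies do not overlap, which the sphere-count filter already provides.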

The modelling of the covalent complex between IgA1 and human serum albumin utilized the T-shaped structure of the former and the crystal structure of the latter. This type 2 modelling showed that an extended peptide linker joined IgA1 and human serum albumin [19].

## Summary and future considerations

Solution scattering is applicable to a broad range of macromolecules. Data interpretation is much improved by the use of molecular modelling using tight constraints based on available atomic structures in the PDB. The key procedure is the generation of a full range of stereochemically correct conformations for fitting. The benefit is the provision of significant new insights on structure–function relationships, especially when no crystal structures are available or are apparently not achievable, and sometimes even after a crystal structure is determined. Our antibody studies have provided much needed insight into their hinge conformations that could not have been elucidated from the known IgG crystal structures, therefore revealing new aspects of their specific function. The method can be extended to higher antibody oligomers. To date, 22 structures have been deposited in the PDB as a permanent archive of the modelling, of which eight are antibody-related.

## Acknowledgments

We thank the Wellcome Trust and BBSRC (Biotechnology and Biological Sciences Research Council) for support. We are particularly grateful to Dr Theyencheri Narayanan, Dr Stephanie Finet and Dr Pierre Panine at ESRF and Dr Richard K. Heenan and Dr Stephen M. King at ISIS (Rutherford Appleton Laboratory, Didcot, U.K.) for their help.

## Footnotes

Bringing Together Biomolecular Simulation and Experimental Studies: A Biochemical Society Focused Meeting in conjunction with the Molecular Graphics and Modelling Society held at Manchester Interdisciplinary Biocentre, Manchester, U.K., 10–11 September 2007. Organized and Edited by Mike Sutcliffe (Manchester, U.K.).

**Abbreviations:**
SC, secretory component;
SIgA, secretory IgA

- © The Authors Journal compilation © 2008 Biochemical Society