Biochemical Society Transactions

8th International Symposium on Cytochrome P450 Biodiversity and Biotechnology

Proteomic analysis of cytochromes P450: a mass spectrometry approach

Y. Wang, A. Al-Gazzar, C. Seibert, A. Sharif, C. Lane, W.J. Griffiths

Abstract

In human, the CYP (cytochrome P450) superfamily comprises 57 genes arranged in 18 families and 42 subfamilies. These genes encode enzymes involved in the metabolism of drugs, foreign chemicals, fatty acids, eicosanoids and cholesterol. Additionally, they play roles in bile acid biosynthesis, steroid synthesis and metabolism, and vitamin D3 synthesis and metabolism. Mutations in many CYP genes cause inborn errors of metabolism and contribute to increased risk of cancer. MS provides a convenient method for the identification and quantification of CYP enzymes, and in the present paper we review the current state of the technology for such an analysis.

  • cytochrome P450
  • liquid chromatography–MS
  • one-dimensional gel electrophoresis
  • proteomics
  • quantification
  • tandem MS

Introduction

At present, there are more than 270 different CYP (cytochrome P450) gene families, with 18 recorded in mammals (http://drnelson.utmem.edu/CytochromeP450.html). In human, there are 57 CYP genes arranged in 18 families and 42 subfamilies [1]. In contrast with human, there are at least 249 active CYP genes in the mustard plant Arabidopsis thaliana constituting 1% of its genome [2] (http://drnelson.utmem.edu/CytochromeP450.html). Sequence comparisons indicate extensive similarity between CYPs identified in human and bacteria, suggesting a common ancestor for the CYP superfamily some three billion years ago [3]. CYP enzymes act on endogenous small molecule substrates introducing oxidative, peroxidative and reductive changes, and the enormous diversity of small molecules in plants is reflected in their high number of active CYP genes. CYP enzymes also metabolize exogenous compounds including drugs, environmental pollutants and plant metabolites. Metabolism of foreign chemicals usually results in detoxification; however, CYP enzymes can generate toxic metabolites that can contribute to increased risks of cancer and birth defects [2].

CYP enzymes are arranged into families and subfamilies on the basis of percentage amino acid sequence identity [3] (http://drnelson.utmem.edu/CytochromeP450.html). CYPs that share ≥40% identity are assigned to a particular family designated by an Arabic numeral; those sharing ≥55% identity make up a particular subfamily designated by a letter; for example, CYP7A and CYP7B share ≥40% but <55% identity. If an additional member of the CYP7A subfamily were identified, it would be classified as CYP7A2. The biological functions of CYP enzymes include metabolism of endogenous substrates and synthesis of hydrophobic lipids such as bile acids, steroid hormones and fatty acids. Although CYP enzymes have a major role in drug metabolism, most CYP gene products in vertebrates probably first evolved for important life functions, before then developing plant-metabolite-degradation and drug-metabolizing abilities [2].
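
The identity thresholds quoted above can be written as a small decision rule. The sketch below is only an illustration of those two cut-offs; the function name `classify_pair` is hypothetical, and actual nomenclature assignments involve curation that a single percentage does not capture.

```python
def classify_pair(identity_percent: float) -> str:
    """Relationship between two CYPs implied by the identity thresholds in the text.

    >=55% identity: same subfamily; >=40% but <55%: same family only.
    """
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity_percent >= 55.0:
        return "same subfamily"
    if identity_percent >= 40.0:
        return "same family, different subfamily"
    return "different family"
```

Under this rule, a pair such as CYP7A and CYP7B, which share at least 40% but less than 55% identity, falls into the middle branch.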

Our interest in CYP enzymes is 2-fold: on the one hand, our research concerns cholesterol metabolism, bile acid biosynthesis, and steroid synthesis and metabolism [4–6], and on the other, the fate of clinically administered drugs [7,8]. In this mini-review, we describe how MS can be used to investigate the CYP proteome.

Proteomics

Proteomics attempts to analyse the entire protein complement of the genome. It is challenging in that proteomes change qualitatively and quantitatively to reflect physiological states. At present, the most powerful techniques for interrogation of the proteome are based on MS. There are many variations of MS-based proteomic techniques, some of which are discussed below and are extensively reviewed elsewhere [9,10]. MS-based technologies share the process of fragmenting proteins into peptides by proteolysis, followed by the use of MS to extract sufficient information from either a collection of peptides or individual peptides to identify the proteins conclusively (Scheme 1). Quantitative proteomics is usually accomplished by using stable isotopes that are incorporated into the sample peptides by one of a variety of strategies. In CYP proteomics, the proteins of interest are limited to those derived from CYP genes; however, these proteins are always analysed as part of a complex mixture of proteins, as methodology is not in place to specifically co-purify all members of the CYP superfamily into a single fraction.

Scheme 1 MS-based approaches to protein identification

Left panel: PMF approach. Using the 1-DE PMF approach, Galeva and Altermann [19] identified CYP2A1, CYP2B1, CYP2B2, CYP2C11, CYP2D2, CYP2D5 and CYP3A1 in rat liver microsomes from control and/or PB-treated animals. Right panel: LC-MS/MS approach. Using the 1-DE–LC-MS/MS approach, Nisar et al. [7] identified CYP1A2, CYP2A1, CYP2A2, CYP2B3, CYP2C6, CYP2C7, CYP2C11, CYP2C12, CYP2C13, CYP2C22, CYP2C23, CYP2C24, CYP2D1, CYP2D2, CYP2D3, CYP2D4, CYP2D5, CYP2E1, CYP3A18, CYP4A2, CYP4A3, CYP4F1, CYP17 and CYP19 in rat liver microsomes and Lane et al. [8] identified CYP1A2, CYP2A6, CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP2E1, CYP3A4, CYP4A11, CYP4F2, CYP4F11, CYP8B1 and CYP27A1 in microsomal fractions from human liver and colorectal metastases.

Proteomics technology

Most proteomics strategies are based on the extraction of protein from a sample, followed by protein fractionation or separation, often by 1-DE (one-dimensional gel electrophoresis) or 2-DE. Separated proteins are then subjected to proteolysis, usually using trypsin, and analysed by MS. When 2-DE is used, proteins can be separated into spots containing just one or a simple mixture of proteins, and identification can be made by PMF (peptide mass ‘fingerprinting’) methodology [11,12]. Briefly, the masses of the tryptic peptides generated by in-gel digestion of the protein(s) in a 2-DE spot are measured and searched against a protein database. The experimentally measured masses of the tryptic peptides are compared with theoretical masses generated by in silico tryptic digestion of each protein in the database. The best correlation between a set of theoretical peptide masses from a protein and the experimental set of peptide masses identifies the protein (Scheme 1). The PMF strategy is reliable for 2-DE spots containing one protein or a simple mixture of proteins [13], but is less trustworthy for complex mixtures generated from 1-DE bands. An alternative gel-based strategy is founded on reversed-phase HPLC and MS/MS (tandem MS). Proteins separated into 1-DE bands (or 2-DE spots) are in-gel digested with trypsin, and the mixture of peptides from all the proteins in a band is partially separated by HPLC. As peptides elute from the column, they are ionized and their masses are measured, and the most abundant peptides are subjected to MS/MS. In the MS/MS process, the peptide dissociates into smaller fragment-ions, and then the experimental MS/MS spectrum is compared with theoretical MS/MS spectra of all peptides with the appropriate mass generated by in silico tryptic digestion of an entire protein or DNA database. The best correlation between experimental and theoretical spectra defines the parent protein [14,15] (Scheme 1). 
A variation of the HPLC-MS/MS approach to protein identification is to digest an unseparated protein mixture with trypsin and then perform LC-MS/MS (where LC is liquid chromatography) on the entire complement of tryptic peptides. In this case, two dimensions of LC [2D-LC (two-dimensional LC)] are usually incorporated for peptide separation. In the first dimension, peptides are separated on a cation-exchange column, and then subjected to reversed-phase HPLC in the second dimension, followed by MS/MS analysis. Both LC columns may be arranged in series and on-line with the mass spectrometer [16], or fractions may be collected from the cation-exchange column, and subsequently injected on the HPLC column after desalting and solvent exchange [17].
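
The PMF comparison described above (in silico tryptic digestion, theoretical peptide masses, best correlation with the measured masses) can be sketched in a few lines. This is a deliberately simplified illustration, not the scoring used by any particular search engine: it assumes cleavage after K or R except before P, no missed cleavages or modifications, singly protonated monoisotopic masses, and a plain count of matching masses as the score. The function names and the toy database in the usage note are hypothetical.

```python
# Standard monoisotopic residue masses (Da), rounded to five decimals.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # Da, added once per peptide
PROTON = 1.007276   # Da, for the [M+H]+ ion


def tryptic_digest(sequence):
    """Cleave C-terminal to K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        last = i == len(sequence) - 1
        if last or (residue in "KR" and sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    return peptides


def mh_mass(peptide):
    """Monoisotopic mass of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + PROTON


def pmf_score(observed_masses, sequence, tol=0.1):
    """Number of observed masses matching a theoretical tryptic peptide."""
    theoretical = [mh_mass(p) for p in tryptic_digest(sequence)]
    return sum(any(abs(o - t) <= tol for t in theoretical) for o in observed_masses)


def best_match(observed_masses, database, tol=0.1):
    """Name of the database entry with the highest match count."""
    return max(database, key=lambda name: pmf_score(observed_masses, database[name], tol))
```

For example, with a toy database `{"P1": "AAAKGGGR", "P2": "SSSKTTTR"}` and observed masses computed from the peptides AAAK and GGGR, `best_match` returns `"P1"`. Tightening `tol` mimics the effect of better mass accuracy: fewer chance matches to unrelated entries.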

Proteomics instrumentation

PMF is usually performed using MALDI (matrix-assisted laser-desorption ionization)–TOF (time-of-flight) instruments. Peptide samples are ionized by the MALDI process and their mass-to-charge ratios (m/z) are measured by TOF MS. In the MALDI process, singly protonated peptides are usually formed, [M+H]+, making the m/z scale effectively a mass scale. When performing PMF, the accuracy of the peptide mass measurement greatly affects the confidence with which a protein can be identified [18]. A mixture of peptide masses measured to an accuracy of 0.01 Da will provide much more confident protein identification than the same mixture measured to an accuracy of 0.1 Da [18].

MS/MS experiments can be performed on MALDI-generated ions, but for this application ionization is more commonly by ES (electrospray). ES tends to generate multiply protonated tryptic peptides, [M+nH]n+, where n is often 2 or 3. MS/MS instruments essentially consist of two mass analysers arranged in series (either in space or time); the first analyser selects the ion of interest to be fragmented; CID (collision-induced dissociation) generates fragment ions, which are then mass analysed in a second mass analysis event. MS/MS data usually provide more confident protein assignments than PMF. Assuming that a good-quality MS/MS spectrum is generated, it can be confidently assigned to an amino acid sequence of a tryptic peptide derived from a protein within a database (if the relevant protein is within the database). The amino acid sequence will be characteristic of a protein, or at least a small number of proteins, and if more than one peptide is identified from the same protein, the protein can in many cases be unequivocally identified (Figure 1).
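
Because ES produces [M+nH]n+ ions, the same peptide appears at different m/z values depending on its charge. A minimal sketch of the conversion, assuming only protonation (no other adducts); the function names are illustrative:

```python
PROTON = 1.007276  # Da


def mz(neutral_mass, charge):
    """m/z of the [M+nH]n+ ion for a peptide of the given neutral mass."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON) / charge


def neutral_mass(mz_value, charge):
    """Neutral mass M recovered from an observed m/z and charge state."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz_value - PROTON)
```

A peptide of neutral mass 1000 Da would therefore be observed near m/z 1001.007 when singly protonated (as in MALDI) and near m/z 501.007 as the doubly protonated ion.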

Figure 1 Amino acid sequence of human CYP2A6

Peptides identified by 1-DE–LC-MS/MS are indicated in boldface.

CYP proteomics

Identification

CYPs are membrane-associated proteins and perform poorly on 2-DE [19]. However, 1-DE (SDS/PAGE) is far more compatible with membrane proteins. One proteomic strategy that has been applied for CYP profiling is to separate microsomal proteins, which include many CYPs, by 1-DE into bands, digest the mixtures of proteins within each band with trypsin and perform PMF on each band [19] (Scheme 1). As PMF is most suitable for simple protein mixtures [13], this approach is not optimal for identifying proteins in 1-DE bands generated by the separation of complex mixtures of proteins from e.g. microsomes. A much more reliable method for CYP profiling is to perform LC-MS/MS on the peptides derived from each band [7,8]. MS/MS spectra will be generated for the most abundant peptides and by performing a database search proteins will be identified. Using this technology, we have identified CYP proteins in rat, mouse, and recently human liver, colorectal metastases and intrahepatic cholangiocarcinoma tissue [7,8].

Quantification

Proteomics is concerned not only with the profiling of proteins, but also with determination of their quantitative levels and monitoring how these levels change with time. Stable isotope labelling methods provide the ‘gold standard’ for such quantitative studies. Stable isotopes, usually 13C, 15N, 18O and 2H, can be incorporated into proteins metabolically or, as is more usual, through chemical modification of reactive amino acid side chains or through chemical or enzymatic modification of their N- or C-termini. Isotope-labelled peptides have essentially the same physicochemical properties as unlabelled counterparts, and give similar chromatographic behaviour and identical signal intensity in mass spectra. However, labelled peptides differ in mass by an increment that is directly related to the number of incorporated isotopes. This allows a comparison of their signal intensities, from which the relative protein abundance can be inferred [10].

At least three different isotope labelling strategies have been applied to CYP proteomics. Jenkins et al. [20] recently published data on the relative quantification of CYP proteins in control mouse liver and in liver from mice treated with PB (phenobarbital), a CYP-inducing drug. They used ICAT (isotope coded affinity tag) technology [21,22]. The ICAT reagent consists of a thiol-reactive group attached to a biotin tag through a linker. The linker is labelled with either eight 2H or nine 13C atoms in the heavy labelled form, and 12C and 1H atoms in the light form. Equal quantities of microsomal proteins from control mouse liver and treated mouse liver were reacted with ICAT reagents. The proteins from the control sample were reacted with light reagent and those from the treated mouse were reacted with the heavy reagent. The thiol-reactive groups react with cysteine residues and tag the proteins with the ICAT labels. The heavy and light ICAT-labelled microsomal proteins were then mixed together and digested with trypsin. Cysteine-containing peptides are tagged with a biotin group and can be isolated by avidin affinity chromatography. Jenkins et al. [20] used the second-generation ICAT reagent, which incorporates an acid-labile group that allows the biotin portion of the tag to be removed after labelled peptides have been isolated [20]. The isotope-labelled cysteine-containing tryptic peptides were then subjected to LC-MS and MS/MS analysis. The peptides were identified from their MS/MS spectra and the relative abundance of light and heavy labelled peptides was determined from the MS spectra. Peaks corresponding to a pair of differentially labelled peptides are separated in mass according to the isotope label, and the ratio of their peak areas reflects their relative abundance. Assuming the identified cysteine-containing peptides are unique to a single protein, i.e. CYP isoform, the ratio of peak areas will reflect the relative abundance of this protein in the two samples analysed. Using ICAT methodology, Jenkins et al. [20] identified eight CYP isoforms (Figure 2). The disadvantage of the ICAT methodology in CYP analysis is that protein identification is solely based on the amino acid sequence of cysteine-containing tryptic peptides. This requires a CYP protein to give a cysteine-containing tryptic peptide with an amino acid sequence unique to that protein. What is more, this peptide must be of a length and hydrophobicity compatible with LC-MS/MS analysis. This is not always the case for closely related subfamily members. For example, in their PB work, Jenkins et al. [20] were unable to find cysteine-containing peptides that differentiate CYP2B10 from CYP2B20, CYP2D9 from CYP2D11, or CYP3A11 from CYP3A16, although peptides common to these pairs of proteins were identified [20].
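
The two quantities that matter for an ICAT pair, the mass separation between the light and heavy forms and the ratio of their peak areas, follow directly from the description above. The sketch below assumes one tag per cysteine and uses standard isotope mass differences; the function names and the peak areas in the usage note are hypothetical.

```python
# Standard isotope mass differences (Da).
D_MINUS_H = 2.014102 - 1.007825   # 2H versus 1H
C13_MINUS_C12 = 1.003355          # 13C versus 12C


def icat_pair_shift(n_cysteines, label="13C"):
    """Mass difference (Da) between heavy and light forms of a tagged peptide.

    label="13C": nine 13C atoms per tag; label="2H": eight 2H atoms per tag.
    """
    if label == "13C":
        per_tag = 9 * C13_MINUS_C12
    elif label == "2H":
        per_tag = 8 * D_MINUS_H
    else:
        raise ValueError("label must be '13C' or '2H'")
    return n_cysteines * per_tag


def heavy_to_light_ratio(light_area, heavy_area):
    """Treated (heavy) over control (light) abundance from MS peak areas."""
    if light_area <= 0:
        raise ValueError("light peak area must be positive")
    return heavy_area / light_area
```

For a peptide with a single cysteine, the pair is separated by about 9.03 Da with the 13C reagent, or about 8.05 Da with the 2H reagent. Illustrative areas of 1.0e5 (light) and 3.0e5 (heavy) would indicate a 3-fold higher level in the treated sample, provided the peptide is unique to one CYP isoform.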

Figure 2 Schematic description of protein quantification methods

(A) The Figure shows the ICAT approach. Using ICAT and LC-MS/MS and -MS, Jenkins et al. [20] obtained relative quantification data for liver microsomal proteins CYP1A1/2, CYP2B10/20, CYP2B10/13/20, CYP2B9/10/13/20, CYP2C29, CYP2C29/39, CYP2C40, CYP2C50, CYP2C37/50, CYP2D9/11, CYP2D10, CYP2D10/22/26, CYP2D13, CYP2E1, CYP2F2, CYP3A11/16/41 and CYP7B1 from control and PB-treated mice. By the addition of synthetic peptides labelled with ICAT to protein digests, absolute quantitative data were obtained, e.g. CYP2E1 was present in control liver microsomes at a level of 35 pmol/mg of protein. (B) The Figure shows N- and C-terminal labelling methods. Quantification is performed by LC-MS when C-terminal 16/18O labelling is used, while LC-MS/MS provides quantitative data for iTRAQ N-terminal labelling experiments. Using 16/18O labelling, Lane et al. [23] found the liver microsomal proteins CYP1A2, CYP2B10, CYP2B20, CYP2C37, CYP2C38, CYP2C39, CYP3A11 and CYP39A1 to be up-regulated, CYP2C40, CYP2E1, CYP3A41 and CYP27A1 to be down-regulated and CYP2A12 and CYP2D10 to be unchanged in mice treated with TCPOBOP. Using iTRAQ, we have found CYP2A6 to be down-regulated in human cholangiocarcinoma tissue. (C) The Figure shows the MS/MS spectrum of the iTRAQ-labelled peptide GTEVYPMLGSVLR from CYP2A6 that is down-regulated in human cholangiocarcinoma tissue.

To avoid the problem of relying exclusively on cysteine-containing peptides for protein identification and quantification, we have used an alternative strategy based on enzymatic isotope labelling of all tryptic peptides with 18O or 16O [23,24]. In brief, equal amounts of microsomal protein from control mice and TCPOBOP {1,4-bis[2-(3,5-dichloropyridyloxy)]benzene}-treated mice were run in parallel on 1-DE. Equivalent bands in the two lanes were excised and proteins were digested in-gel with trypsin. Following digestion, the resultant tryptic peptides from the treated mice were incubated with trypsin in H₂¹⁸O, while those from the control were incubated with trypsin in H₂¹⁶O. Trypsin has the effect of exchanging the two oxygen atoms at the carboxy group of the C-terminus of a tryptic peptide with oxygen atoms from water; thus, the peptides from treated mice were elevated in mass by 4 Da above the mass of equivalent tryptic peptides from the control. The light and heavy peptide mixtures were then combined and analysed first by LC-MS/MS and then by LC-MS. LC-MS/MS was used to identify the peptides and LC-MS was used to determine the relative abundance of light and heavy peptide pairs. Using this methodology, 16 individual CYPs were identified, eight of which were up-regulated, and four were down-regulated in the TCPOBOP mice (Figure 2). Significantly, CYP2B10 and CYP2B20 were differentiated, unlike in the ICAT study of Jenkins et al. [20].
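
In an LC-MS spectrum, the 4 Da separation between 16O- and 18O-labelled forms appears as an m/z spacing of roughly 4/z for an ion of charge z. The sketch below picks out such pairs from a peak list and reports the heavy/light intensity ratio. It assumes complete exchange of both C-terminal oxygens, as described above, and ignores overlap between isotope envelopes; the function name, tolerance and peak values are illustrative.

```python
O18_MINUS_O16 = 17.999160 - 15.994915   # Da, one 18O replacing one 16O
PAIR_SHIFT = 2 * O18_MINUS_O16          # two oxygens exchanged, about 4.0085 Da


def find_pairs(peaks, charge, tol=0.01):
    """Return (light_mz, heavy_mz, heavy/light ratio) for matching peak pairs.

    peaks: list of (m/z, intensity) tuples from a single MS spectrum.
    """
    spacing = PAIR_SHIFT / charge
    pairs = []
    for light_mz, light_int in peaks:
        for heavy_mz, heavy_int in peaks:
            if light_int > 0 and abs((heavy_mz - light_mz) - spacing) <= tol:
                pairs.append((light_mz, heavy_mz, heavy_int / light_int))
    return pairs
```

With peaks `[(500.000, 100.0), (502.004, 250.0), (600.000, 50.0)]` and `charge=2`, the first two peaks are reported as a pair with a ratio of 2.5, which under the design above would correspond to up-regulation in the treated (18O) sample.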

We are also exploring a gel-free method for relative protein quantification. Using the so-called ‘iTRAQ’ (isotope tags for relative and absolute quantification) technology [25], we have made a comparison of the CYP content of normal human liver and intrahepatic cholangiocarcinoma tissue. Equal quantities of microsomal protein from normal liver and carcinoma tissue were digested with trypsin and then the peptides from the normal liver were reacted with the iTRAQ-114 reagent, while those from the carcinoma were reacted with iTRAQ-117 reagent. The iTRAQ reagent consists of an amine reactive group, a balance group and a reporter group. At present, the iTRAQ reagent is suitable for the relative quantification of four samples in parallel, although in our study we compared only two samples. Briefly, the iTRAQ reagent comes in four forms, iTRAQ-114, -115, -116 and -117, all of which have the same nominal mass and add 145 Da upon reaction with the N-terminus of a peptide. Labelled peptides from the normal liver and carcinoma tissue were combined and separated on-line by 2D-LC, i.e. cation-exchange, and reversed-phase LC. A peptide of given amino acid sequence labelled with the iTRAQ-114 is isobaric and will co-elute with its iTRAQ-117 (and -115 and -116) equivalent, and the pair (foursome) will not be differentiated by LC-MS, but rather give a peak of summed ion current. Upon MS/MS of the composite peak, the peptide labelled with iTRAQ-114 will give a reporter ion at m/z 114, and that labelled with iTRAQ-117 will give a peak at m/z 117 (iTRAQ-115 and -116 give ions at m/z 115 and 116 respectively), and the relative abundance of these ions will give a measure of the relative abundance of the peptide in its two labelled forms, and hence the parent protein derived from normal liver and carcinoma (Figure 2). The MS/MS spectrum also contains amino acid sequence information that allows the peptide to be identified. 
As with the 16/18O C-terminal labelling method, every peptide (except those N-terminally blocked) will be labelled with the iTRAQ reagent, so protein identification and relative quantification are made on the basis of the analysis of a population of peptides rather than only those containing cysteine, as is the case with the ICAT methodology. Shown in Figure 2 are the iTRAQ data for the identification and relative quantification of CYP2A6 from human liver and carcinoma tissue.
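
Reading relative abundance from the reporter-ion region of an MS/MS spectrum can be sketched as follows. The reporter positions are taken as approximately m/z 114.1 to 117.1 (the text gives nominal values of 114 to 117), the matching window is an assumption, and no isotope-impurity correction is applied; the function names and the example spectrum are hypothetical.

```python
# Approximate reporter-ion positions; exact values depend on the reagent.
REPORTER_MZ = {114: 114.1, 115: 115.1, 116: 116.1, 117: 117.1}


def reporter_intensities(spectrum, tol=0.1):
    """Sum the intensity found within tol of each reporter-ion position.

    spectrum: list of (m/z, intensity) tuples from one MS/MS scan.
    """
    totals = {channel: 0.0 for channel in REPORTER_MZ}
    for peak_mz, intensity in spectrum:
        for channel, centre in REPORTER_MZ.items():
            if abs(peak_mz - centre) <= tol:
                totals[channel] += intensity
    return totals


def relative_abundance(spectrum, numerator=117, denominator=114, tol=0.1):
    """Ratio of two reporter channels, e.g. carcinoma (117) over normal (114)."""
    totals = reporter_intensities(spectrum, tol)
    if totals[denominator] == 0:
        raise ValueError("no signal in the denominator reporter channel")
    return totals[numerator] / totals[denominator]
```

For a spectrum such as `[(114.11, 1000.0), (117.11, 400.0), (175.12, 300.0)]`, the 117/114 ratio is 0.4, which with the labelling scheme above would indicate a lower level of the peptide in the carcinoma sample. The remaining peaks in the spectrum carry the sequence information used for identification.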

The absolute concentration of a protein in a sample can also be determined by the use of stable isotope methods. For many years, stable isotope dilution MS has been used to determine the absolute concentration of the products of CYP-catalysed reactions [26], and similar principles can be used to determine the absolute concentration of the proteins themselves [27]. Assuming that a protein can be digested quantitatively into peptides with e.g. trypsin, and a peptide can be identified that is unique to that protein, e.g. GTVVVPTLDSVLYDNQEFPDPEK is unique to CYP2E1 in human, then the abundance of that peptide will be a measure of the abundance of the parent protein. By synthesizing an isotope-labelled version of the peptide and adding it to the digest in a known amount, the relative abundance of the native and isotope-labelled peptides can be measured by LC-MS or MS/MS, from which the absolute amount of parent protein can be determined [28]. We have applied this methodology to the absolute quantification of CYP2E1 in human liver microsomes. The microsomal protein was run on 1-DE, the CYP-containing bands were excised, and the proteins were digested in-gel with trypsin. A known amount of isotope-labelled peptide GTVVVPT(*L)DSVLYDNQEFPDPEK (*L is a leucine residue labelled with six 13C and one 15N atoms), which is unique to CYP2E1 in human, was added with the trypsin digestion buffer and the peptides were analysed by LC-MS and LC-MS/MS. The masses of the native peptide and its isotope-labelled analogue are known, and the MS/MS experiment is designed to specifically fragment these precursor ions and measure the abundance of defined fragment ions in a so-called MRM (multiple reaction monitoring) analysis. The relative abundance of the endogenous and labelled peptides can be determined by comparing the fragment-ion currents derived from the two precursor ions. 
As the amount of added labelled peptide is known, and assuming quantitative digestion and extraction from the 1-DE band, the absolute amount of CYP2E1 can be determined. In this manner, the amount of CYP2E1 in human liver microsomes from a 66-year-old woman was determined to be of the order of 100 pmol/mg of microsomal protein.
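
The isotope-dilution arithmetic behind this determination is a single proportion. The sketch below assumes, as the text does, quantitative digestion and recovery, and an equal MS response for the native and labelled peptides; the function names and all numerical inputs in the usage note are hypothetical and are not the measurements behind the value reported above.

```python
# Mass added by a leucine carrying six 13C and one 15N (Da).
LABEL_SHIFT = 6 * 1.003355 + 0.997035


def absolute_amount_pmol(native_area, labelled_area, spiked_pmol):
    """Amount of native peptide from the native/labelled signal ratio."""
    if labelled_area <= 0:
        raise ValueError("labelled standard signal must be positive")
    return native_area / labelled_area * spiked_pmol


def pmol_per_mg(amount_pmol, protein_mg):
    """Normalise the peptide (and hence protein) amount to protein loaded."""
    if protein_mg <= 0:
        raise ValueError("protein amount must be positive")
    return amount_pmol / protein_mg
```

As a worked illustration, if 5 pmol of labelled peptide were spiked in and the native fragment-ion signal were twice that of the standard, the band would contain 10 pmol of the native peptide; for 0.1 mg of microsomal protein loaded, this corresponds to 100 pmol/mg. The labelled precursor sits about 7.017 Da above the native one, or about 7.017/z in m/z for charge z.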

We are currently designing a panel of synthetic isotope-labelled tryptic peptides, each one of which has an amino acid sequence unique to an individual CYP subfamily member and is LC-MS/MS-compatible. This panel will be added to a biological sample and analysed by LC-MS/MS. As the retention time, mass and fragmentation pattern of each peptide will be known, it will be possible to design an MS/MS method that will be specific for individual CYP quantification. At a given retention time, the instrument will be programmed to monitor by MRM a given pair of endogenous and isotope-labelled peptides, then as the next pair elutes from the column move on to their analysis. Modern mass spectrometers have fast ‘scanning’ times and multiple pairs of peptides can be monitored by MRM ‘simultaneously’. Using this methodology, the mass spectrometer will only invest time on the analysis of selected CYP-specific peptides and not waste time on the acquisition of redundant data from other peptides.
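
The scheduling idea, monitoring each endogenous/labelled pair only around its expected retention time, can be represented with a simple lookup. This is a conceptual sketch rather than instrument-control code: the `MrmPair` structure, the window width, and every entry in the example panel (names, retention times, m/z values) are placeholders invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MrmPair:
    cyp: str                    # CYP the peptide is unique to
    retention_min: float        # expected retention time (min)
    light_precursor_mz: float   # endogenous peptide
    heavy_precursor_mz: float   # isotope-labelled standard


def active_pairs(panel, time_min, window_min=1.0):
    """Pairs whose retention window (centred on retention_min) includes time_min."""
    return [p for p in panel if abs(p.retention_min - time_min) <= window_min / 2]


# Placeholder panel: none of these values are measured data.
EXAMPLE_PANEL = [
    MrmPair("CYP_A", 12.0, 600.30, 603.81),
    MrmPair("CYP_B", 12.3, 710.40, 713.91),
    MrmPair("CYP_C", 25.0, 820.50, 824.01),
]
```

At 12.2 min, `active_pairs(EXAMPLE_PANEL, 12.2)` returns the first two entries, which would be monitored together, while the third is ignored until its own window opens. Overlapping windows correspond to the ‘simultaneous’ monitoring of multiple pairs mentioned above.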

Conclusions

The first generation of proteomic studies, largely based on 2-DE and MALDI–PMF, predominantly identified cytosolic proteins; however, as proteomic methodology has moved to a 1-DE–LC-MS/MS or 2D-LC-MS/MS platform, membrane proteins have become accessible. CYP proteins are readily identified by these latter technologies and can be easily quantified by the incorporation of stable-isotope labels [20]. The commercial availability of iTRAQ reagents and 16/18O-labelling kits greatly facilitates quantitative studies [23]. With respect to quantification, two other initiatives should be monitored. The first follows the work of Beynon et al. [29] and their idea to design and express isotope-labelled recombinant proteins that when digested with trypsin will give a series of peptides with sequences unique to selected proteins, which can then be used for quantification of these proteins, e.g. the CYP superfamily. Beynon et al. [29] have named their technology QCAT. Secondly, also along the lines of generating isotope-labelled peptides with amino acid sequences unique to individual proteins, Aebersold and co-workers [10] have introduced the concept of proteotypic peptides, where a library of LC-MS/MS-compatible (in terms of mass, fragmentation pattern and retention time) tryptic peptides, one with a unique sequence for every protein within a genome, would be generated in an isotope-labelled form. This library could then be accessed and members used for the quantification of specified proteins, or conceivably an entire proteome!

Acknowledgments

This work was supported by funding from the U.K. Biotechnology and Biological Sciences Research Council (grant nos. BB/C515771/1 and BB/C511356/1) and The School of Pharmacy, University of London.

Footnotes

  • 8th International Symposium on Cytochrome P450 Biodiversity and Biotechnology: Independent Meeting held at Swansea Medical School, Swansea, Wales, U.K., 23–27 July 2006. Organized and Edited by D. Kelly, D. Lamb and S. Kelly (Swansea, U.K.).

Abbreviations: CYP, cytochrome P450; LC, liquid chromatography; 2D-LC, two-dimensional LC; 1-DE, one-dimensional gel electrophoresis; ES, electrospray; ICAT, isotope coded affinity tag; iTRAQ, isotope tags for relative and absolute quantification; MALDI, matrix-assisted laser-desorption ionization; MRM, multiple reaction monitoring; MS/MS, tandem MS; PB, phenobarbital; PMF, peptide mass ‘fingerprinting’; TCPOBOP, 1,4-bis[2-(3,5-dichloropyridyloxy)]benzene; TOF, time-of-flight
