Biochemical Society Transactions

Proteins with a BPI/LBP/PLUNC-Like Domain: Revisiting the Old and Characterizing the New

Bioinformatics of the TULIP domain superfamily

Klaus O. Kopec, Vikram Alva, Andrei N. Lupas

Abstract

Proteins of the BPI (bactericidal/permeability-increasing protein)-like family contain either one or two tandem copies of a fold that usually provides a tubular cavity for the binding of lipids. Bioinformatic analyses show that, in addition to its known members, which include BPI, LBP [LPS (lipopolysaccharide)-binding protein], CETP (cholesteryl ester-transfer protein), PLTP (phospholipid-transfer protein) and PLUNC (palate, lung and nasal epithelium clone) protein, this family also includes other, more divergent groups containing hypothetical proteins from fungi, nematodes and deep-branching unicellular eukaryotes. More distantly, BPI-like proteins are related to a family of arthropod proteins that includes hormone-binding proteins (Takeout-like; previously described to adopt a BPI-like fold), allergens and several groups of uncharacterized proteins. At even greater evolutionary distance, BPI-like proteins are homologous with the SMP (synaptotagmin-like, mitochondrial and lipid-binding protein) domains, which are found in proteins associated with eukaryotic membrane processes. In particular, SMP domain-containing proteins of yeast form the ERMES [ER (endoplasmic reticulum)-mitochondria encounter structure], required for efficient phospholipid exchange between these organelles. This suggests that SMP domains themselves bind lipids and mediate their exchange between heterologous membranes. The most distant group of homologues we detected consists of uncharacterized animal proteins annotated as TM (transmembrane) 24. We propose to group these families together into one superfamily that we term the TULIP (tubular lipid-binding) domain superfamily.

  • BPI-like family protein
  • cholesteryl ester-transfer protein (CETP)
  • palate, lung and nasal epithelium clone (PLUNC)
  • tubular lipid-binding (TULIP)
  • synaptotagmin-like, mitochondrial and lipid-binding protein domain (SMP domain)

Introduction

LPS (lipopolysaccharides) are major contributors to the pathogenicity of Gram-negative bacteria. On infection, animals recognize LPS using LBP (LPS-binding protein) and initiate an immune response to boost inflammation [1,2]. When the primary structure of LBP was determined 20 years ago, sequence comparisons showed its homology with BPI (bactericidal/permeability-increasing protein) and CETP (cholesteryl ester-transfer protein) [3]. BPI is also a part of the innate immune response; however, it antagonizes LBP by neutralizing LPS and therefore dampening the inflammatory reaction [4,5]. The tertiary structure of BPI was solved in 1997 [1] and comprises two tandem domains, connected by a seven-stranded anti-parallel β-sheet. Despite statistically insignificant sequence similarity, the two domains have the same fold, consisting of a long α-helix wrapped in a highly curved anti-parallel β-sheet. The extent of the structural similarity and the unusual nature of the fold led Kleiger et al. [6] to propose that the two domains arose from a common ancestor. BPI binds phosphatidylcholine in a tubular cavity, and this hydrophobic tunnel is potentially also the binding site for LPS in LBP. The second aforementioned LBP homologue, CETP, is involved in the transfer of lipids between HDL (high-density lipoprotein) and LDL (low-density lipoprotein) in blood [7]. CETP adopts the same tandem domain architecture as BPI; its crystal structure was solved with cholesteryl ester bound inside the hydrophobic cavity and a phospholipid at the entrance to the cavity in each of the domains. LBP, CETP and BPI are the founding members of a protein superfamily, which has been progressively extended over the years due to improvements in the sensitivity of sequence comparison methods and the growth of sequence databases. We refer to the domain defining this superfamily, corresponding to the N-terminal domain of BPI, as TULIP (tubular lipid-binding) domain [8]. 
Since the homology of the C-terminal domain of BPI to TULIP seems likely, but cannot be established by statistical measures of sequence similarity, we refer to it as TULIP-like.

The increase in sensitivity from sequence–sequence comparisons (BLAST [9]) to sequence–profile comparisons (PSI-BLAST [10]) allowed for the inclusion of the PLUNC (palate, lung and nasal epithelium clone) proteins into the TULIP superfamily [11]. PLUNC proteins are speculated to be involved in host defence because of their homology with LBP and BPI, and because of their localization in tissues often exposed to pathogens, e.g. the oral cavity, but their biological functions are poorly characterized [12]. PLUNC proteins fall into two subgroups: LPLUNC (long PLUNC) proteins, which comprise both TULIP and TULIP-like domains, and SPLUNC (short PLUNC) proteins, which only have the TULIP domain [13].
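The gain from sequence–profile comparison can be illustrated with a minimal Python sketch. It is not how PSI-BLAST is implemented: the toy alignment, the uniform background frequencies and the pseudocount of 1 are assumptions made here for illustration, and `build_profile` and `score` are hypothetical helper names.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one dict of log-odds scores (in bits) per alignment column.

    Assumes an ungapped alignment of equal-length sequences and a uniform
    background frequency of 1/20 for every amino acid.
    """
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts.get(aa, 0) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile, sequence):
    """Sum the column scores of an ungapped sequence laid onto the profile."""
    return sum(column[aa] for column, aa in zip(profile, sequence))


# Invented toy family, not real TULIP sequences.
toy_alignment = ["LKVAG", "LRVSG", "IKVAG", "LKIAG"]
profile = build_profile(toy_alignment)

# A residue combination that occurs in no single family member still scores
# well, because the profile pools the variation seen across the family.
family_like = score(profile, "IRIAG")
unrelated = score(profile, "WDPCH")
```

In this toy case `family_like` is positive and `unrelated` is negative, although "IRIAG" differs from every sequence in the alignment at two or more positions.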

Further gains in sensitivity resulting from methods that compare sequences to profile HMMs (hidden Markov models) established the homology between the BPI-like family and a group of proteins from trypanosomes called ESAG5 (expression site-associated gene 5) [14]. The ESAG5 proteins were the first members of the BPI-like family that were found outside of metazoans and higher plants. The name ESAG stems from the expression of the proteins from the same telomeric locus as VSG (variant surface glycoprotein), whose variability enables trypanosomes to evade the host immune response, but the role of ESAG5 in the infection process remains unknown.

In the last few years, the ability to perform pairwise comparisons of profile HMMs has led to another substantial increase in sequence search sensitivity. We have therefore recently undertaken a bioinformatic study of BPI-like proteins using two state-of-the-art HMM comparison methods, HHpred and HHsenser [15–17], and have been able to extend the boundaries of the superfamily considerably [8].
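One ingredient of profile–profile comparison is a score for aligning two profile columns. The sketch below uses the log-sum-of-odds column score, which, to our understanding, is the score described for HHsearch, the method underlying HHpred. It is a simplified illustration only: transition probabilities and all other terms of a real HMM–HMM comparison are omitted, and the uniform background and the toy columns are assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}  # assumed uniform


def column_score(q, p, background=BACKGROUND):
    """Log-sum-of-odds score (bits) for aligning profile columns q and p.

    q and p map each amino acid to its probability in that column.
    """
    return math.log2(sum(q[aa] * p[aa] / background[aa] for aa in AMINO_ACIDS))


def conserved_column(residue, weight=0.81):
    """Toy column: `weight` on one residue, the rest spread evenly."""
    rest = (1.0 - weight) / (len(AMINO_ACIDS) - 1)
    return {aa: (weight if aa == residue else rest) for aa in AMINO_ACIDS}


same = column_score(conserved_column("L"), conserved_column("L"))
different = column_score(conserved_column("L"), conserved_column("D"))
uninformative = column_score(BACKGROUND, BACKGROUND)
```

Two columns conserved for the same residue score positively, columns conserved for different residues score negatively, and two background-like columns score zero, so conservation patterns contribute evidence even when the individual sequences share little identity.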

We have used a cluster analysis that maps out the sequence relationships between superfamily members (Figure 1 and Supplementary Table S1 available at http://www.biochemsoctrans.org/bst/039/bst0391033add.htm) in order to describe the evolution of TULIP domain-containing proteins. In the analysis of superfamilies, cluster analyses are preferable to classical phylogenies for several reasons [18]: (i) Resolution: phylogenies lose resolution in the deep nodes as the number of sequences increases, because branching decisions are always taken hierarchically from the leaves to the root and therefore the effects of conflicting data accumulate as the computation progresses towards the root. Cluster maps do not show this problem, as all data are considered with equal weight in every round of map equilibration. Indeed, cluster maps increase their resolution with the number of sequences, as clusters become better delimited with their size. (ii) Accuracy: phylogenies become less accurate as the number of sequences grows, because the multiple alignments on which they are based accumulate errors, because the likelihood of including false-positive sequences (which distort the topology of the tree) increases, and because highly divergent sequences are shuffled to the root of the tree, where they are artificially joined into a basal clade (long-branch attraction). Cluster maps do not involve multiple alignments, being built on pairwise comparisons, and their accuracy increases with the number of sequences as the larger number of pairwise relationships averages out random errors in the accuracy of pairwise sequence comparison. Cluster maps are also insensitive to false positives and highly divergent sequences, as the former simply drift to the edge of the map while the latter have ample topological space into which to move without interfering with each other. (iii) Computational complexity: in phylogenetic analyses, the time needed to find the optimal tree increases exponentially, or at least cubically depending on the algorithm, with the number of sequences, such that trees of more than a few thousand sequences become computationally prohibitive. The computational complexity of cluster maps only increases approximately quadratically with the number of sequences, such that maps with several tens of thousands of sequences can be computed in a matter of hours on a single CPU (central processing unit).
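The difference in scaling can be made concrete with a short Python sketch. The count of unrooted bifurcating tree topologies, (2n − 5)!! for n sequences, is a standard combinatorial result that we add here for illustration; it is not taken from the analysis above, and practical tree-building programs use heuristics rather than enumerating every topology. The function names are hypothetical.

```python
def pairwise_comparisons(n):
    """Number of distinct sequence pairs among n sequences: n(n - 1)/2."""
    return n * (n - 1) // 2


def unrooted_binary_trees(n):
    """Number of unrooted bifurcating topologies for n labelled sequences.

    Equals (2n - 5)!! = 3 * 5 * ... * (2n - 5) for n >= 3.
    """
    if n < 3:
        return 1
    count = 1
    for odd in range(3, 2 * n - 4, 2):  # 3, 5, ..., 2n - 5
        count *= odd
    return count


for n in (5, 10, 20):
    print(n, pairwise_comparisons(n), unrooted_binary_trees(n))
```

For 10 sequences there are 45 pairs to compare but 2 027 025 possible unrooted topologies, which is why an exhaustive tree search becomes impractical long before an all-against-all comparison does.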

Figure 1 Cluster map of the TULIP domain superfamily

TULIP domains were clustered in CLANS [18] based on their all-against-all pairwise similarities as measured by BLAST P-values. Dots represent sequences. Line colouring reflects BLAST P-values; the brighter a line, the lower the P-value. Sequences within one group are shown in one colour; sequences that could not be assigned to a group are shown in black. Broken lines divide the cluster map into the BPI, Takeout and SMP domain-like families and data on structure and domain composition of the groups are shown. Groups without explicit domain composition are canonical; individual sequences might have compositions that differ from the ones shown for their groups. Broken outlines indicate domains present in some but not all proteins of a group. The blow-up shows a clustering of only the PLUNC group. The structure shown as representative of the BPI group is BPI (1EWF); the one shown for the Takeout/JHBP cluster is Takeout 1 (3E8T). Groups of known structure are marked with a star. Accession details for representatives of all clusters are provided in Supplementary Table S1 at http://www.biochemsoctrans.org/bst/039/bst0391033add.htm.
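The idea of a map that equilibrates under pairwise attraction can be sketched in a few lines of Python. This is an illustration written for this text, not the CLANS implementation: the force law, all parameter values and the function name `equilibrate` are assumptions, and the P-values are invented toy numbers.

```python
import math
import random


def equilibrate(n, pvalues, rounds=300, attract=0.05, repel=0.1,
                max_step=0.1, seed=0):
    """Return 2D positions for n sequences after force-directed rounds.

    pvalues maps index pairs (i, j) with i < j to a P-value in (0, 1).
    Every pair repels; pairs with a P-value also attract, more strongly
    the lower the P-value.
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(n)]
    weights = {pair: -math.log10(p) for pair, p in pvalues.items()}
    max_weight = max(weights.values())
    for _ in range(rounds):
        move = [[0.0, 0.0] for _ in range(n)]
        # Every pair is visited in every round: roughly quadratic cost.
        for i in range(n):
            for j in range(i + 1, n):
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = -repel / dist  # negative values push the pair apart
                weight = weights.get((i, j))
                if weight is not None:
                    force += attract * dist * weight / max_weight
                move[i][0] += force * dx / dist
                move[i][1] += force * dy / dist
                move[j][0] -= force * dx / dist
                move[j][1] -= force * dy / dist
        for i in range(n):
            step = math.hypot(move[i][0], move[i][1])
            scale = min(1.0, max_step / step) if step > 0 else 0.0
            pos[i][0] += move[i][0] * scale
            pos[i][1] += move[i][1] * scale
    return pos


# Two invented groups of three sequences; hits occur only within a group.
within = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
positions = equilibrate(6, {pair: 1e-50 for pair in within})
```

Sequences connected by significant hits are pulled into a common cluster, while sequences without connections are free to drift away from it, which corresponds to the behaviour described for false positives above.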

In the present paper, we define three constituent families for the TULIP superfamily, one containing BPI, LBP, CETP and PLUNC (BPI-like), another containing arthropod Takeout and JHBPs (juvenile hormone-binding proteins) (Takeout-like) and the third containing SMPs (synaptotagmin-like, mitochondrial and lipid-binding proteins).

The BPI-like family

The core group of this family is formed by the closely related BPI, LBP, PLTP (phospholipid-transfer protein), CETP, BPI-like 2 and lipid-binding proteins from plants. Although grouped in a fairly tight cluster in Figure 1, these proteins can be separated into individual clusters with more stringent clustering criteria, showing that the divisions implied by the different names are not evolutionarily arbitrary. The tightest relationship is seen between BPI and LBP, which have a common outgroup in proteins from fish, suggesting that they resulted from a duplication event at the root of terrestrial vertebrates.
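How a tight cluster resolves into separate groups at higher stringency can be shown with a minimal Python sketch. It uses single-linkage grouping at a P-value cutoff, which is an assumption made for illustration and not necessarily the criterion applied in the cluster map; the sequence names and P-values are invented placeholders.

```python
def clusters(names, pvalues, cutoff):
    """Group sequences linked, directly or indirectly, by P-values <= cutoff."""
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the representative, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), p in pvalues.items():
        if p <= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(group) for group in groups.values())


names = ["a1", "a2", "b1", "b2"]
toy_pvalues = {
    ("a1", "a2"): 1e-80,
    ("b1", "b2"): 1e-60,
    ("a1", "b1"): 1e-20,
    ("a2", "b2"): 1e-15,
}

loose = clusters(names, toy_pvalues, cutoff=1e-10)
strict = clusters(names, toy_pvalues, cutoff=1e-40)
```

With the loose cutoff all four placeholder sequences form one cluster; with the strict cutoff they separate into the `a` and `b` groups, because only the within-group connections survive.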

Several satellite groups radiate from this core group. One of these contains the PLUNC proteins (see blow-up in Figure 1), which are mainly found in two clusters, one closer to BPI containing LPLUNC3/Rya3, LPLUNC4/Ry2g5, SPLUNC6 and BPI-like 3, and one more divergent, formed by LPLUNC1, SPLUNC proteins 1–3 and latherin. We conclude that the SPLUNC proteins are polyphyletic, with SPLUNC proteins 1–3 and latherin originating by deletion from LPLUNC1 and SPLUNC6 from LPLUNC4. LPLUNC5 is clearly separate and approximately equidistant to these two groups. Outside the PLUNC group, and making connections of about equal statistical significance to PLUNC proteins and to BPI, lies a group of proteins annotated as LPLUNC2 and/or BPI-like 1, which thus appear to represent a separate evolutionary development from both PLUNC proteins and BPI.

Three other satellite groups are formed by uncharacterized proteins that are either entirely or largely genus-specific (Figure 1): the aforementioned ESAG5 proteins from trypanosomes, a group of proteins from Giardia, and a group of nematode proteins, mainly from Caenorhabditis elegans.

All proteins of the BPI-like family mentioned so far, with the exception of SPLUNC proteins, are formed by a tandem of TULIP and TULIP-like domains, and lack additional domains. The architecture is, however, different in the last important satellite group, in which proteins are characterized by their large size (about twice the size of other proteins in the BPI-like family) and by the fact that they only contain the TULIP domain, typically towards their C-terminus. This group consists of an intermediate cluster of phylogenetically heterogeneous proteins from slime moulds, diatoms and amoebae and a more divergent cluster of proteins almost exclusively from fungi, which itself separates into two paralogous subclusters at higher clustering stringency. Three of the proteins in the intermediate cluster contain three tandem PDZ domains C-terminally to the TULIP domain; otherwise, the domains of these proteins could not be annotated with current databases.

A number of additional proteins and protein clusters, which we have not labelled at this time, radiate from the core group. They originate mainly from deeply branching eukaryotes (amoebae, ciliates, slime moulds, choanoflagellates, kinetoplastids and unicellular green algae), but also from nematodes. Most show the tandem of TULIP and TULIP-like domains typical for BPI-like proteins, but several are very large (about 1000 residues), contain only the TULIP domain in single or double copy at their N-terminus and have an extended TM (transmembrane) region with nine predicted TM helices at their C-terminus.

The Takeout-like family

This family mainly consists of two groups of sequences. The larger of these two is further removed from the BPI-like family and contains insect proteins, many of which are annotated as Takeout and/or JHBP. The name-giving protein of this family, Drosophila Takeout, connects circadian rhythms with feeding behaviour [19,20] and also affects male courtship behaviour [21]. The structure of its orthologue from the moth Epiphyas postvittana in complex with ubiquinone-8 shows a fold very similar to that of the N-terminal half of BPI-like proteins, with the ligand bound in the same place within the central tubular cavity (PDB code 3E8T [22]). Few other Takeout homologues have been characterized to date, but various findings suggest that many may be involved in chemosensory perception [23,24] or hormone delivery [25]. The best understood of these are the JHBPs of Lepidoptera, which bind the terpenoids that control insect life cycle in the haemolymph and deliver them to the target tissues [21,26]. The crystal structure is known for two of these, one from silkworm in complex with juvenile hormone III (PDB code 2RQF) and the other from honeycomb moth (PDB code 2RCK [20]). The structures are again very similar to the TULIP fold, including the mode in which the hydrophobic ligand is bound. This similarity led the authors of the crystal structures to connect the Takeout and JHBPs to the N-terminal domain of the BPI-like family, a connection that we could confirm by sequence comparisons [8]. No similarity outside the topology of the fold can be found to the TULIP-like, C-terminal domain of BPI, and this domain must be considered specific to the BPI-like family at this time.

The second main group within this family is closer to the BPI-like proteins and consists of a diffuse collection of arthropod allergens, one of which (dust mite allergen Der p 7; PDB code 3H4Z [21]) is also of known structure and unsurprisingly shows the TULIP fold. Der p 7 is one of the major causative agents of dust mite allergy in humans [23,24]. Although its exact function is still unclear, it is known that it evokes strong IgE antibody [25] and T-cell responses in patients with mite allergy [26]. Der p 7 was shown to bind bacterial lipopeptide PB (polymyxin B) with weak affinity and has been speculated to promote TH2 immunity through co-stimulation of TLR2 (Toll-like receptor 2) pathways [21].

Peripheral to the arthropod allergens is a group of loosely connected proteins, which are closest to the BPI-like family in the cluster map by virtue of making multiple, statistically highly significant connections to the BPI core group. We propose that they represent modern descendants of intermediate stages in the origin of insect Takeout proteins from a BPI-like ancestor. Several proteins of this outgroup show an up-to-4-fold amplification of the TULIP domain. Spurred by this observation, we re-investigated the core Takeout cluster and found individual instances of proteins with multiple copies of the TULIP domain. However, at this time, it is unclear whether these proteins arose by duplication and divergence from a single TULIP domain or by fusion.

One group of Takeout-like proteins with two TULIP domains takes an unusual position in the cluster map, in that its N-terminal domain belongs to the Takeout group, whereas its C-terminal domain forms part of the allergen group. The simplest evolutionary explanation for the origin of these proteins is that they arose by fusion of one TULIP domain from each group. A second explanation is that the location of sequences in the cluster map (Figure 1) lays out a path for the origin of the insect-specific Takeout proteins from the ancestral BPI-like family, which is common to all eukaryotes. In this second explanation, (i) the group of arthropod allergens originated from the BPI-like family; (ii) subsequently one of its members duplicated the TULIP domain; (iii) the N-terminal copy of the two diverged away from the arthropod allergen group; and (iv) this copy became the founding member of the Takeout group through deletion of the C-terminal domain.

The last cluster we found in the Takeout-like family is genus-specific and contains proteins from Ixodes (tick). It seems reasonable to expect that more species-specific clusters will be identified as genome projects provide better coverage of the arthropods.

The SMP domain-like family

The third family comprises a large number of eukaryotic membrane-associated proteins that contain a variant of the TULIP domain termed the SMP domain [8,27]. In contrast to the majority of proteins from the BPI- and Takeout-like families that consist solely of TULIP and TULIP-like domains, proteins with SMP domains often contain additional domains. In the original description of the SMP domain [27], proteins were assigned to groups based either on the nature of these additional domains or on the cellular localization of the proteins [C2 domain-containing synaptotagmin-like, PH (pleckstrin homology) domain-containing HT-008, PDZK8 and mitochondrial proteins]. Most of these proteins are poorly studied and the SMP domain itself is functionally uncharacterized.

In addition to the aforementioned SMP domain-containing proteins, we detected two further groups in this family (Figure 1). The first one contains Mdm34 (mitochondrial distribution and morphology 34) proteins, which are close homologues of the previously known SMPs Mmm1 (maintenance of mitochondrial morphology 1) and Mdm12; all three proteins are found only in fungi and are associated with mitochondria. These proteins, like all other members of the SMP domain-like family, have a single TULIP domain, which in Mdm34 is N-terminal and accompanied by an uncharacterized C-terminal domain, in Mmm1 is C-terminal and preceded by a TM helix, and in Mdm12 forms the entire protein. These three proteins, together with the mitochondrial outer membrane β-barrel Mdm10, form the ERMES [ER (endoplasmic reticulum)-mitochondria encounter structure], recently described to be a molecular tether between mitochondria and the ER in yeast [28]. In addition to its function as a tether, this complex was shown to be necessary for the efficient transfer of phospholipids between the two organelles. Yet it remains unknown whether the complex is merely a tether that allows other proteins to carry out the phospholipid transfer or whether the complex itself acts as the transporter. Based on the membership of the SMP domain-like family in the TULIP superfamily of lipid/hydrophobic ligand-binding domains and the abundance of the SMP domain in ERMES, we proposed that this complex might mediate the transport of phospholipids between the ER and mitochondria [8].

The second group consists of uncharacterized animal proteins annotated as TM 24, which are distant homologues of the other SMP domain-like family members. In addition to the TULIP domain, these proteins contain a C2 as well as a WW domain (protein–protein interaction domain containing two conserved tryptophan residues). In the cluster map, they are distant to the other SMPs (Figure 1) as they make most of their connections via a single intermediate sequence from Branchiostoma. It is thus at present unclear whether they will move closer to the SMP core group as more intermediate sequences become available through genome projects or whether they will emerge as the founding members of a fourth family of TULIP domains.

Conclusions

In the present review, we have discussed the evolutionary relationship of the BPI-, Takeout- and SMP domain-like families that constitute the TULIP superfamily. Based on the structural and functional knowledge on BPI and Takeout proteins, the membrane association and additional domains of SMPs, as well as the homology of these three families, we propose that the TULIP domain is a structural scaffold for the binding of large lipids/hydrophobic ligands. Our results suggest that the evolutionary roots of this domain lie in the BPI-like family, as it is the only family that contains proteins from basal eukaryotes in addition to those of animals, plants and fungi. Presumably, the C-terminal TULIP-like domain now prevalent in most BPI-like proteins arose by duplication and diversification of the TULIP domain in early eukaryotic evolution, but its homology with TULIP domains is too distant to be established at this time via sequence comparison methods. Several members of the BPI-like family subsequently lost the TULIP-like domain by deletion; we briefly discussed this in the context of the polyphyletic origin of SPLUNC proteins. One of these deletion events presumably lies at the root of SMP domain-containing proteins, whose phylogenetic spectrum suggests an origin after the establishment of the eukaryotic cell structure, but before the emergence of true multicellularity. In a separate deletion event, the Takeout-like family evolved from a BPI-like precursor at the base of the arthropod lineage. Based on the domain structure of various present-day Takeout-like proteins, we propose that this evolution proceeded by consecutive duplication, diversification and deletion events.

Funding

This work was supported by institutional funds from the Max-Planck-Society.

Footnotes

  • Proteins with a BPI/LBP/PLUNC-Like Domain: Revisiting the Old and Characterizing the New: A Biochemical Society Focused Meeting held at New Business School, University of Nottingham, U.K., 5–7 January 2011. Organized and Edited by Colin Bingle (Sheffield, U.K.) and Sven-Ulrik Gorr (University of Minnesota School of Dentistry, Minneapolis, MN, U.S.A.).

Abbreviations: BPI, bactericidal/permeability-increasing protein; CETP, cholesteryl ester-transfer protein; ER, endoplasmic reticulum; ERMES, ER-mitochondria encounter structure; ESAG5, expression site-associated gene 5; HMM, hidden Markov model; JHBP, juvenile hormone-binding protein; LPS, lipopolysaccharide; LBP, LPS-binding protein; Mdm, mitochondrial distribution and morphology; Mmm1, maintenance of mitochondrial morphology 1; PLUNC, palate, lung and nasal epithelium clone; LPLUNC, long PLUNC; SMP, synaptotagmin-like, mitochondrial and lipid-binding protein; SPLUNC, short PLUNC; TM, transmembrane; TULIP, tubular lipid-binding
