The microtubule-associated protein tau (MAPT or tau) is of great interest in the field of neurodegeneration as there is a well-established genetic link between the MAPT gene locus and tauopathies, a diverse group of neurodegenerative dementias and movement disorders. The genomic architecture in the region spanning the MAPT locus contains a ~1.8 Mb block of linkage disequilibrium characterized by two major haplotypes: H1 and H2. Recent studies have established strong genetic association between the MAPT locus and neurodegenerative disease and uncovered haplotype-specific differences in expression and alternative splicing of MAPT transcripts. Integrating genetic association data and gene expression data to understand how non-coding genetic variation at a gene locus affects gene expression and leads to susceptibility to disease is a high priority in disease genetics, and the MAPT locus provides an excellent paradigm for this. In the absence of protein-coding changes caused by haplotype sequence variation, altered levels of protein expression or altered ratios of isoform expression are excellent candidate mechanisms to link the MAPT genetic disease association with biological function. Novel transgenic and endogenous genetic models are required to understand the role of MAPT sequence variation in mechanisms of disease susceptibility.
Intracellular aggregations of abnormally hyperphosphorylated microtubule-associated protein tau (MAPT or tau), known as NFTs (neurofibrillary tangles), are the major pathological feature of tauopathies, a diverse group of neurodegenerative dementias and movement disorders which includes AD (Alzheimer's disease), PSP (progressive supranuclear palsy), CBD (corticobasal degeneration), frontotemporal dementia and argyrophilic grain disease. Identification of tau protein as the major component in NFTs positions the MAPT locus as a leading causal candidate gene in these neurodegenerative diseases [1,2]. Tau is a major neuronal microtubule-associated protein expressed predominantly in the neurons of the central and peripheral nervous systems [3,4]. Tau protein isoforms are expressed from the MAPT locus located on chromosome 17q21 which consists of 16 exons spanning 134 kb (http://genome.ucsc.edu/) (Figure 1). MAPT transcripts are temporally and spatially regulated by alternative splicing. The alternative splicing of exons 2, 3 and 10 generates six protein isoforms in the human adult central nervous system, producing proteins ranging from 352 to 441 amino acids in size [6,7]. The inclusion or exclusion of exons 2 and 3 generates tau protein with zero, one or two N-terminal inserts (0N, 1N, 2N tau). Transcripts expressing exon 10 (exon 10+) generate proteins with four microtubule-binding repeats (4R tau), whereas those lacking exon 10 (exon 10−) generate three-repeat tau (3R tau). Whereas the adult expresses six isoforms, human fetuses express only the shortest tau isoform, lacking exons 2, 3 and 10 [8,9]. An additional protein isoform over 100 kDa is found in the peripheral nervous system; this is generated by inclusion of exon 4A in transcripts.
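The combinatorics of the six adult isoforms can be sketched in a few lines of Python. This is an illustrative sketch only: the 352-residue length of the shortest isoform comes from the text, whereas the per-insert lengths (29 residues for each N-terminal insert, 31 residues for the exon 10 repeat) and the treatment of 1N tau as containing exon 2 alone are assumptions of the sketch, chosen so that the lengths span the 352 to 441 amino acid range quoted above.

```python
from itertools import product

# 352 aa for the shortest isoform is taken from the text.
# The insert lengths below are assumptions of this sketch, chosen to
# reproduce the 352-441 aa range; they are not stated in the article.
BASE_LENGTH = 352       # 0N3R: exons 2, 3 and 10 all excluded
N_INSERT_LENGTH = 29    # assumed residues per N-terminal insert (exon 2 or 3)
EXON10_LENGTH = 31      # assumed residues added by the exon 10 repeat


def tau_isoforms():
    """Return {isoform name: length in amino acids} for the six adult isoforms."""
    isoforms = {}
    for n_inserts, repeats in product((0, 1, 2), (3, 4)):
        length = BASE_LENGTH + n_inserts * N_INSERT_LENGTH
        if repeats == 4:  # exon 10 included
            length += EXON10_LENGTH
        isoforms[f"{n_inserts}N{repeats}R"] = length
    return isoforms


if __name__ == "__main__":
    # List isoforms from shortest to longest.
    for name, length in sorted(tau_isoforms().items(), key=lambda kv: kv[1]):
        print(f"{name}: {length} aa")
```

Three N-terminal states combined with two repeat states give the six isoforms; the fetal isoform corresponds to the 0N3R entry.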
Investigation of polymorphisms within the MAPT gene led to the elucidation of two extended haplotypes, H1 and H2, covering the entire locus. This block of linkage disequilibrium spans a region covering approximately 1.8 Mb and is thought to exist as a result of the inversion of a 900 kb segment of the H2 chromosome with respect to its H1 counterpart. Interestingly, although the H2 haplotype remains largely invariant, exhaustive sequence analysis has shown that recombination has continued within the H1 haplotype, generating a number of sub-haplotypes of H1 [12,14].
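For readers less familiar with the term, linkage disequilibrium between two biallelic markers is commonly summarized by D, D' and r². The following Python sketch shows the standard definitions; the function name and the example frequencies are hypothetical and are not taken from any MAPT dataset.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, D_prime, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1
          and allele B at locus 2
    p_a:  frequency of allele A
    p_b:  frequency of allele B
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")

    # D: deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b

    # D': |D| scaled by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between the alleles at the two loci.
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


if __name__ == "__main__":
    # Hypothetical frequencies: two markers that always travel together.
    print(linkage_disequilibrium(p_ab=0.8, p_a=0.8, p_b=0.8))
    # Hypothetical frequencies: two independent markers.
    print(linkage_disequilibrium(p_ab=0.25, p_a=0.5, p_b=0.5))
```

Markers that tag the same extended haplotype give D' and r² close to 1, whereas markers separated by historical recombination give values closer to 0.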
The common disease–common variant hypothesis proposes that commonly occurring alleles in the genome underlie most common diseases. The disease variants have been sought by numerous genetic association studies which seek to determine whether allele frequencies differ between patient and control groups. The most recent high-powered GWASs (genome-wide association studies) have yielded new insights into the risk associated with the common variants within the H1 haplotype and neurodegenerative diseases.
Progressive supranuclear palsy
PSP is recognized as the second most common parkinsonian neurodegenerative disorder after idiopathic PD (Parkinson's disease). PSP is neuropathologically characterized by neuronal globose neurofibrillary tangles and neuropil threads, and by glial tau pathology including tufted astrocytes. The tau aggregates are formed predominantly of 4R tau, defining PSP as a 4R tauopathy.
The first genetic association between MAPT and neurodegenerative disease was the identification of an association between PSP and a polymorphic marker found in MAPT intron 9. This genetic association was subsequently expanded to include the entire MAPT H1 haplotype [11,18] and later refined to show the strongest association with H1-specific haplotype-tagging SNPs (single nucleotide polymorphisms) (rs242557, rs3785883, rs2471738), suggesting that it is variation within the H1 haplotype itself that is the risk factor for developing PSP. The fine mapping of these SNPs indicates that the association is conveyed by a region covering a minimal distance of ~56 kb, from 20 kb upstream of exon 1 to 2.2 kb downstream of exon 9.
In 2011, a GWAS was performed on 2165 PSP patients and 6807 controls. This study confirmed the highly significant association of H1 polymorphisms with PSP with extremely small P values (e.g. rs8070723, P=1.5×10^−116). In the pathologically confirmed series of samples, this association translated into a calculated OR (odds ratio) of 5.5, which exceeds the well-established risk of APOE4 (apolipoprotein E4) in AD (OR=3.7). Further examination of the risk alleles showed that after controlling for the H1/H2 inversion, three SNPs continued to exhibit a highly significant association, most notably rs242557 (P=9.5×10^−18) which was identified previously as an important PSP-risk allele [12,14].
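As a reminder of how an allelic odds ratio is derived from case and control allele counts, a minimal Python sketch follows. The counts in the example are hypothetical and are not the data from the PSP GWAS; the confidence interval uses Woolf's logit method, which is one common choice and is not necessarily the method used by the study.

```python
import math


def odds_ratio(case_risk, case_other, control_risk, control_other):
    """Return (odds ratio, 95% CI lower, 95% CI upper) from a 2x2 allele count table.

    The confidence interval uses Woolf's method (normal approximation
    on the log odds ratio). All four counts must be greater than zero.
    """
    or_value = (case_risk * control_other) / (case_other * control_risk)
    se_log_or = math.sqrt(
        1 / case_risk + 1 / case_other + 1 / control_risk + 1 / control_other
    )
    lower = math.exp(math.log(or_value) - 1.96 * se_log_or)
    upper = math.exp(math.log(or_value) + 1.96 * se_log_or)
    return or_value, lower, upper


if __name__ == "__main__":
    # Hypothetical allele counts, for illustration only.
    or_value, lower, upper = odds_ratio(
        case_risk=950, case_other=50, control_risk=780, control_other=220
    )
    print(f"OR = {or_value:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An OR above 1 with a confidence interval that excludes 1 indicates that the allele is over-represented among cases.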
AD is neuropathologically characterized by the presence of both extracellular neuritic plaques formed of Aβ (amyloid β-peptide) and intracellular NFTs. It is therefore perhaps surprising that in the first large GWASs of AD cohorts, there was no significant association of AD with MAPT polymorphisms [21,22]. A subsequent meta-analysis of the AD genetic association data by the team at AlzGene likewise showed no association of MAPT with AD. However, a more recent analysis of LOAD (late-onset AD) consisting of 3940 cases and 13373 controls observed a significant association of the MAPT locus (P=0.009), although, notably, no single marker reached genome-wide significance. This analysis suggests that there exist multiple independent associations across the MAPT gene with LOAD, although each is likely to be of weak effect.
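The notion of genome-wide significance for a single marker can be made concrete with a short Python sketch. The threshold used here, a Bonferroni correction of 0.05 for about one million independent common variants (roughly 5×10^−8), is a widely used convention assumed for illustration; the article does not state which threshold the cited studies applied, and locus-level statistics such as the LOAD result are assessed differently.

```python
import math

# Assumed convention: Bonferroni correction of alpha = 0.05 for roughly
# one million independent common variants. Not taken from the article.
ASSUMED_INDEPENDENT_TESTS = 1_000_000
GENOME_WIDE_ALPHA = 0.05 / ASSUMED_INDEPENDENT_TESTS  # about 5e-8


def is_genome_wide_significant(p_value, alpha=GENOME_WIDE_ALPHA):
    """Return True if a single-marker P value passes the assumed threshold."""
    return p_value < alpha


if __name__ == "__main__":
    # Single-marker P values quoted in the text for the PSP GWAS.
    reported = {
        "rs8070723 (PSP)": 1.5e-116,
        "rs242557 (PSP, after controlling for H1/H2)": 9.5e-18,
    }
    for marker, p_value in reported.items():
        print(marker, is_genome_wide_significant(p_value))
    print("threshold close to 5e-8:", math.isclose(GENOME_WIDE_ALPHA, 5e-8))
```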
The most surprising GWAS findings were those that identified MAPT variants among the most highly genetically associated with PD, a neurodegenerative disease not traditionally classified as a tauopathy [24–26]. Several previous studies on much smaller cohorts had reported an association of MAPT with PD [27,28], although this remained controversial and had not always been replicated. The large GWASs and subsequent meta-analysis by the PDGene forum (http://www.pdgene.org/) now place MAPT as the top-ranked gene for sporadic PD.
In the light of the highly significant genetic association data, it is clear that understanding how non-coding genetic variation at the MAPT locus affects gene expression and leads to susceptibility to disease is a high priority in understanding the molecular mechanisms of PSP and PD pathology. As the MAPT risk alleles and haplotypes produce no protein coding changes, the leading theories to explain the risk susceptibility to disease focus on differences in expression and alternative splicing.
Functional effects of genotype
Total MAPT expression
One leading theory to explain the neurodegenerative risk susceptibility conferred by the H1 MAPT haplotype proposes that DNA sequence variants drive expression differences between the two haplotypes. Several studies have attempted to study the effect of non-coding variation in the promoter regulatory region of the MAPT locus on expression of the gene.
Differences in the transcriptional activities of the MAPT promoter haplotypes have been demonstrated by studies in cell lines or using reporter gene assays. Early studies focusing on the promoters gave some indication that the promoters do indeed show different activities, as a 1 kb fragment of the H2 promoter had a 1.2-fold reduction in transcriptional activity compared with its H1 counterpart. Other studies have attempted to assess the effect of the PSP-risk-associated allele at rs242557. When this SNP was placed upstream of a 1.1 kb fragment of the MAPT H1 promoter, the non-risk allele showed greater transcriptional activity. In contrast, another study using this same SNP placed downstream of the MAPT promoter region showed that the H1 haplotype construct exhibits 4.2-fold greater expression than the H2 promoter. Although both studies attempted to narrow down the effect of this risk allele, their conflicting results probably arise because their experimental designs used small fragments of regulatory sequence isolated from their correct genomic context.
More physiologically relevant experiments have assayed expression in human post-mortem brain tissue. Comparing expression of the H1 and H2 haplotypes within heterozygous pathology-free post-mortem human brain samples, no allelic difference was observed in MAPT haplotype expression in either the frontal cortex or globus pallidus, a finding which has been replicated by another group. The latter study did, however, find that there was a relative decrease in MAPT H1 expression with increased age. Another study used real-time PCR to assay allele-specific expression of MAPT and found modestly (11–13%) greater expression from chromosomes carrying a variant of H1: H1C.
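The logic of an allele-specific expression comparison in heterozygous samples can be sketched as follows. This is a generic illustration rather than the protocol of any of the cited studies: it assumes that the transcript (cDNA) signal for each haplotype is normalized to the genomic DNA signal from the same H1/H2 heterozygote, in which the two alleles are present at a 1:1 ratio, and all names and values are hypothetical.

```python
def allelic_expression_ratio(cdna_h1, cdna_h2, gdna_h1, gdna_h2):
    """Return the H1:H2 expression ratio normalized to genomic DNA.

    In an H1/H2 heterozygote the two alleles are equimolar in genomic DNA,
    so dividing by the gDNA ratio corrects for assay bias between the two
    allele-specific signals. A result of 1.0 means no allelic difference.
    """
    if min(cdna_h1, cdna_h2, gdna_h1, gdna_h2) <= 0:
        raise ValueError("all signals must be positive")
    return (cdna_h1 / cdna_h2) / (gdna_h1 / gdna_h2)


if __name__ == "__main__":
    # Hypothetical signal intensities, for illustration only.
    ratio = allelic_expression_ratio(
        cdna_h1=560.0, cdna_h2=500.0, gdna_h1=1000.0, gdna_h2=1000.0
    )
    print(f"H1:H2 allelic expression ratio = {ratio:.2f}")
```

Because both alleles are measured within the same sample, this design controls for between-individual differences in tissue quality and total expression.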
Detailed analyses of expression have also been carried out by the consortia which undertook GWASs. To assess the possibility of polymorphisms affecting expression of MAPT transcripts, GWAS publications examined correlation between gene expression and genotypes. In the MAPT genomic region, the International Parkinson Disease Genomic Consortium found that MAPT-risk SNPs were associated with increased expression (P<2×10^−16) and decreased methylation (P=3.68×10^−6). The consortium which performed the PSP GWAS examined 387 normal subjects and found significant association of expression levels with SNPs across the inversion region (MAPT, ARL17A, PLEKHM1 and LRRC37A4). This indicates that either the orientation of this genomic region or a polymorphism within it determines expression. Interestingly, although the PSP GWAS showed significant association of both the H1 haplotype and variation within H1, the global expression of MAPT was unable to account for the risk conferred by the rs242557 allele. As the risk conferred by rs242557 is unaccounted for, other modes of expression regulation should also be considered.
MAPT alternative splicing
Another potential mechanism by which MAPT haplotypes may confer susceptibility to neurodegeneration is through the imbalanced expression of alternative transcripts. There is much evidence to suggest that isoform imbalance may play a role in tauopathies. An imbalance is observed in the familial dementia FTDP-17 (frontotemporal dementia with parkinsonism linked to chromosome 17), in which MAPT exon 10 splice site mutations act to increase the inclusion of exon 10 in transcripts [35,36]. This provides evidence that an imbalance in the ratio of 3R and 4R tau isoforms is sufficient to cause disease. Further support for the role of imbalanced tau isoforms in neurodegeneration comes from PSP, in which there is a significant increase in 4R MAPT mRNA transcript in brain regions highly affected by neurodegeneration.
Expression studies focusing on the inclusion of exon 10 at the MAPT locus have identified differential expression of alternatively spliced transcripts from the two MAPT haplotypes. In post-mortem human brain tissues, the H1 MAPT variants overexpress the disease-associated exon 10 compared with H2 [32,33]. In addition, the protective H2 haplotype has a 2-fold greater expression of transcripts containing the alternatively spliced exons 2 and 3. In the light of the allelic differences in expression of alternatively spliced transcripts bearing exons 3 and 10, differences in tau isoform expression may be an important factor in generating susceptibility to sporadic tauopathies.
Function of the isoforms
Many studies have been published elucidating functional roles of tau protein. However, these publications have often based their findings on the use of one tau isoform, highly expressed in cell culture models. In the light of the expression differences observed between the risk and non-risk MAPT haplotypes, investigations that consider functional differences between isoforms may offer insight into the disease process. For example, tau is known to have a role in the regulation of microtubule dynamic instability [40,41]. It has been shown that tau isoforms affect this function to different degrees, with the 4R isoform reducing this instability to a greater extent than 3R [42,43]. Tau protein has also been shown to affect axonal transport with tau overexpression perturbing this process both in vitro and in vivo. Again, tau's functional role within this process is modulated by the different isoforms: the longest tau isoform (2N4R) was shown to be a less potent inhibitor of both kinesin and dynein than the shortest tau isoform (0N3R) [46–48]. Furthermore, 4R tau isoforms affect the trafficking of mitochondria to a greater extent than 3R isoforms.
MAPT genetic models
Great strides towards building better genetic models for tauopathies came with the first PAC (P1 artificial chromosome)-transgenic tau mouse model published in 2000. These transgenic mice were the first mouse models to express a full-length human MAPT gene driven by the native human promoter. The human gene is expressed and transcripts undergo splicing to generate six human tau isoforms. When crossed on to a mouse tau-null background, the MAPT-PAC tau shows normal localization to the axons. However, over time, tau relocates to the cell body and, by 9 months of age, accumulations similar to those observed in AD can be identified. Although these mice express a wild-type tau, the level of expression is 4-fold greater than endogenous. Additionally, this model does not show the same expression pattern found in human adults as the mice show more exon 10− (3R) transcript than the human control samples [50,51]. This model supports the theory that an overexpression of a non-mutant tau can lead to disease pathology either by altered total or specific isoform expression.
The great advantage of generating transgenic models using the entire genomic locus is that they make it possible to study the effect of non-coding variation on the function of a gene, although such studies are limited by the species of the model.
Recent technological advances in the generation of genetic models will help to circumvent limitations due to species differences in the expression of genetic loci. One transgenic model our laboratory has developed relies upon an efficient viral delivery and expression system for genomic DNA loci >100 kb in size, which we have termed the iBAC, or infectious bacterial artificial chromosome, based on herpes simplex virus type 1 amplicon vectors. We have successfully used the iBAC system to express the complete MAPT locus, under the physiological control of its native promoter, in neuronal culture models. Another approach studies endogenous MAPT in human stem cells. It has already been demonstrated that differentiated human embryonic stem cells show a similar pattern of tau expression to human post-mortem samples. There is therefore great scope for future in vitro studies of how MAPT locus polymorphisms may affect tau physiology and pathology using human induced pluripotent stem cells from donors of differing genotypes.
It is clear that the MAPT locus has a key role in a variety of neurodegenerative disorders and that future work must seek to elucidate the interaction of the genetics with functional disease outcomes. In the light of the increasing body of evidence demonstrating a difference in expression and alternative splicing between risk and non-risk haplotypes, it is vital that these future scientific endeavours consider the role of different tau isoforms in disease mechanisms using models that allow the investigation of a whole genomic locus.
We thank Alzheimer's Research UK and CurePSP for their generous funding. T.M.C. is funded by the Sir Terry Pratchett Research Fellowship from Alzheimer's Research UK.
The Biology and Pathology of Tau and its Role in Tauopathies II: A Biochemical Society Focused Meeting held at Robinson College, Cambridge, U.K., 8–9 January 2012. Organized and Edited by Amritpal Mudher (Southampton, U.K.) and Makis Skoulakis (BSRC Alexander Fleming, Greece).
Abbreviations: AD, Alzheimer's disease; GWAS, genome-wide association study; iBAC, infectious bacterial artificial chromosome; LOAD, late-onset AD; MAPT, microtubule-associated protein tau; NFT, neurofibrillary tangle; OR, odds ratio; PAC, P1 artificial chromosome; PD, Parkinson's disease; PSP, progressive supranuclear palsy; SNP, single nucleotide polymorphism
© The Authors Journal compilation © 2012 Biochemical Society