National Cancer Institute, Center for Cancer Research, Laboratory of Genomic Diversity, National Institutes of Health
 
 

Naoya Yuhki, D.D.S., Ph.D
Genetics Section
Staff Scientist

 


Complete MHC Sequence, Annotation, and SNP Analysis

The major histocompatibility complex (MHC) plays key roles in controlling both the adaptive and innate immune systems. In the adaptive immune system, MHC class I and class II antigens recognize, bind, and present peptides to cytotoxic and helper T-cells, respectively. They initiate cell-to-cell communication between antigen-presenting cells and T-cells by forming immunological synapses, activating both T-cell subtypes for cellular and humoral immunity. In addition, a number of gene clusters in this complex encode proteins that play important roles in antigen processing: the proteasome subunits LMP2 and LMP7, the antigen transporters TAP1 and TAP2, Tapasin for antigen loading onto class I antigens, and the DM and DO molecules for antigen loading onto class II antigens. In the innate immune system, both classical (HLA-A, -B, -C in human) and non-classical (HLA-E) class I antigens, plus class I-related molecules (MIC-A, -B), interact with natural killer (NK) receptors (killer immunoglobulin-like receptors [KIR] and natural killer cell lectin-like receptors [NKG] in human; Ly-49 and NKG antigens in mouse) to inhibit or activate NK-cell functions.

In addition to its immunological importance, the MHC provides important tools for studying molecular evolution. The extreme polymorphism of both class I and class II antigens, identified in most vertebrates, provides a large repertoire of peptide-binding grooves with which to adapt to various pathogens. Natural and balancing selection play pivotal roles in generating and maintaining these polymorphisms.

The multigene cluster organization of the MHC has also given rise to a number of theories explaining its genesis. In addition, paralogous chromosomal regions found at three other locations in human (chr. 6p21.3 for the MHC; 9q33-34, 1, and 19 for the others) and in jawed vertebrates raise questions about the origin of the MHC.

A large-scale sequencing project for the HLA was launched and completed for the 3.6 Mb of the classical class I, II, and III regions to reveal the molecular history of this important gene complex. It identified 224 tightly linked genes, including 128 expressed genes and 96 pseudogenes. More recently, the MHC has been extended to 4.6 Mb, comprising five subregions: 1) extended class II (280 kb); 2) class II (700 kb); 3) class III (1000 kb); 4) class I (1600 kb); and 5) extended class I (1000 kb). In contrast to this large, complex structure in HLA, the chicken MHC B-locus presents a "minimal essential MHC" organization spanning 92 kb and including 19 functional genes, raising questions about the structure of other MHC systems.

More recently, a variety of host restriction genes have been identified in humans and other mammals that modulate retrovirus infectivity, replication, assembly, and/or cross-species transmission. One of these host-encoded genes, APOBEC3 (apolipoprotein B mRNA-editing enzyme catalytic), is capable of terminally editing feline foamy virus in the absence of the virally encoded Bet protein but not in its presence, similar to the interplay of APOBEC3 and the HIV-encoded protein Vif (Lochelt et al., 2005). The editing capacity of APOBEC3 appears to be species-specific and limits cross-species transmission of retroviruses. To identify and characterize APOBEC genes in the feline genome, we searched for APOBEC-related sequences in the scaffolds of the partial (2x) genome sequence of the domestic cat and compared them phylogenetically to their human and dog counterparts.

(A) Comparative Genomic Structure of Major Histocompatibility Complex:

Comparison of the genomic structure of three mammalian MHCs (human HLA, canine DLA, and feline FLA) revealed remarkable structural differences between HLA and the other two. The 4.6 Mb HLA sequence was compared with the 3.9 Mb DLA sequence, taken from two supercontigs generated by the 7x whole genome shotgun assembly, and with the 3.3 Mb FLA draft sequence. For FLA, we confirmed that: (i) the feline FLA was split into two pieces within the TRIM gene family found in human HLA, (ii) the class II, III, and I regions were placed in the pericentromeric region of the long arm of chromosome B2, and (iii) the remaining FLA was located in the subtelomeric region of the short arm of chromosome B2 (Figure 1). Exactly the same chromosome break was found in the canine DLA structure, where the class II, III, and I regions were placed in a pericentromeric region of chromosome 12, while the remaining region was located in a subtelomeric region of chromosome 35. This suggests that the chromosome break occurred once, before the split of felids and canids more than 55 MYA. However, significant differences were found between DLA and FLA in the gene content of both the pericentromeric and subtelomeric regions. The differences in gene number and amplicon structure of class I genes, together with two other class I genes found on two additional canine chromosomes (7 and 18), suggest the dynamic nature of the evolution of MHC class I genes.

(B) Completion of BAC-based sequence and annotation of feline major histocompatibility complex:

Gene annotation of the major histocompatibility complex (MHC) regions in the domestic cat was completed using the GENSCAN and BLAST programs, identifying 317 possible coding regions: 128 possible functional genes with human homologues and 189 pseudogenes or unidentified genes (Figure 2a and 2b). The feline MHC is located in the pericentromeric region of the long arm of chromosome B2. As mentioned above, it was split into two regions by a break in the distal class I region, and the distal segment was translocated to a subtelomeric region of the same chromosome by a chromosomal inversion. The first region spans 2.976 Mbp and encodes six classical class II antigens (three DRA and three DRB antigens), nine antigen processing molecules (DOA/DOB, DMA/DMB, TAPASIN, LMP2/LMP7, and TAP1/TAP2), nineteen class I antigens (FLAI-A to FLAI-S), and four class I-related (MIC) molecules. The second region spans 0.362 Mbp and encodes no class I genes (the corresponding HLA-A region in human contains at least eleven class I genes); ten framework genes, including three olfactory receptor genes, were found there. In addition, sequences from three major feline endogenous retrovirus groups (FeLV-subtype A-like, ECE-RD114-like, and Porcine Leukemia virus-like) were found within a 100 Kbp interval in the middle of the class I region, in the pericentromeric region of the long arm of chromosome B2.

(C) Evaluation of assembly quality of 2x Whole Genome Shotgun Sequence of Abyssinian female cat based on feline MHC sequence:

A 2x cat whole genome shotgun (WGS) sequence assembly was completed from DNA isolated from a female Abyssinian cat, Cinnamon. This assembly was aligned to cat chromosomes based on approximately 1,800 markers, conserved sequence blocks, and canine genome sequences. To evaluate this assembly and alignment, sequence contigs from the 2x cat WGS were aligned with three MHC models using the CROSSMATCH program (Figure 3): the completed BAC-based sequence from this study (RPCI 86 BAC library, from a male domestic cat, Gus), the canine 7x WGS sequence assembly (canFam2), and the human WGS sequence assembly (hg17). Comparing the canine-genome-based and cat-BAC-based alignments, 1,339 WGS contigs were correctly aligned to the BAC MHC sequence, while 486 WGS contigs were correctly aligned to the canine MHC sequence (36.3%; 486/1,339 contigs). With the human MHC model, 389 contigs were correctly aligned (29.0%; 389/1,339 contigs). Fifty-four percent of coding genes (those with more than 50% of total exons) were found based on the canine MHC model. This percentage rose to 69% and 85% for the gene-rich MHC class II and class III regions. Only 32% of coding genes were correctly aligned in the MHC class I region based on the canine MHC model, suggesting two possible explanations for this poor result: (i) the gene contents of the cat and dog class I regions differ substantially near heterochromatin regions, and (ii) species-specific gene turnover in the class I region.

(D) Single Nucleotide Polymorphism in feline MHC:

The Cinnamon MHC lies in a homozygous region based on SNP count (2 in 10 Kbp). We analyzed SNP levels between two MHC haplotypes, Cinnamon and Gus (BAC). A total of 11,654 SNPs were found in the 3.34 Mbp feline MHC (0.0034 SNP per bp), a SNP rate three times higher than that of heterozygous regions of the Cinnamon genome but comparable to the human MHC SNP count and rate: 16,013 SNPs in 4.75 Mbp (0.00337 SNP per bp). However, extremely high SNP rates (40 to 100 times higher than the average heterozygous region in the Cinnamon genome) were found in the class II DR region, at the class II/class III border, in class I, and in the pericentromeric and subtelomeric regions (Figure 3 and Table 1).
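The per-base rates quoted above are simple ratios of SNP count to region length. The short Python sketch below recomputes them from the counts and lengths given in the text; because the region lengths are themselves rounded, the last digit may differ slightly from the quoted rates. The dictionary labels and the function name are illustrative choices, not names from the original analysis.

```python
# SNP counts and region lengths as quoted in the text.
regions = {
    "Cinnamon MHC (homozygous region)": (2, 10_000),
    "feline MHC (Cinnamon vs. Gus)": (11_654, 3.34e6),
    "human MHC": (16_013, 4.75e6),
}

def snp_density(snp_count, length_bp):
    """Return the number of SNPs per base pair for a region."""
    return snp_count / length_bp

densities = {name: snp_density(n, bp) for name, (n, bp) in regions.items()}

for name, d in densities.items():
    # The average spacing between SNPs is often easier to picture than a rate.
    print(f"{name}: {d:.5f} SNP/bp (about one SNP every {1 / d:.0f} bp)")
```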

(E) Establishment of feline fosmid LGD database and web-based cloning system:

To establish a web-based fosmid cloning system for the Cinnamon genome, 1,806 fosmid 384-well plates were received from Agencourt (MA) and stored in assigned locations (racks/shelves) of a -80 °C freezer in LGD. A fosmid database of 1,288,606 fosmid clones, with sequence trace IDs, plate and well IDs, and freezer location IDs, was established and linked to the GARFIELD browser and NCBI trace IDs. In this system, if you have potential orthologues (e.g., human, mouse, dog, or yeast) of genes of interest, you can search for fosmid trace IDs either (1) by gene ID/symbol in the GARFIELD browser or (2) by discontiguous megablast of the orthologous sequences against the Felis catus WGS at the NCBI BLAST site; the fosmid freezer location ID is then retrieved from the fosmid database by trace ID. We have tested 704 fosmids, and 87.5% of them could be located with more than 99% accuracy.
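The final step of this workflow, going from a trace ID to a physical freezer location, amounts to a keyed lookup. The Python sketch below illustrates the idea under stated assumptions: the record fields mirror the ones listed above, but the class name, the function name, the ID formats, and the rack/shelf labels are all hypothetical, and the actual LGD database schema is not described here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FosmidRecord:
    trace_id: str          # sequence trace ID (hypothetical format)
    plate_id: int          # which 384-well plate holds the clone
    well: str              # well position on the plate, e.g. "A01"
    freezer_location: str  # rack/shelf label in the -80 °C freezer (hypothetical)

# Hypothetical example rows; real IDs and location labels will differ.
RECORDS = [
    FosmidRecord("TRACE0001", 12, "A01", "rack3/shelf2"),
    FosmidRecord("TRACE0002", 12, "H17", "rack3/shelf2"),
    FosmidRecord("TRACE0003", 845, "P24", "rack9/shelf1"),
]

# Index by trace ID so that each lookup is a single dictionary access.
BY_TRACE = {rec.trace_id: rec for rec in RECORDS}

def locate_fosmids(trace_ids):
    """Map trace IDs (for example, from a BLAST hit list) to fosmid records.

    Returns (found, missing): found is a list of FosmidRecord objects and
    missing is a list of trace IDs with no entry in the database.
    """
    found, missing = [], []
    for tid in trace_ids:
        rec = BY_TRACE.get(tid)
        if rec is None:
            missing.append(tid)
        else:
            found.append(rec)
    return found, missing

found, missing = locate_fosmids(["TRACE0003", "TRACE9999"])
for rec in found:
    print(rec.trace_id, "-> plate", rec.plate_id, "well", rec.well,
          "at", rec.freezer_location)
print("not in database:", missing)
```

Reporting the trace IDs that have no database entry separately makes it easy to see which hits cannot be retrieved from the freezer.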

APOBEC3 - Different feline retroviruses are targeted by multiple feline APOBEC3 proteins generated through gene duplications and transcript fusion

The APOBEC3 (A3) family of cytidine deaminases is believed to have undergone a primate-specific evolutionary expansion, because rodents carry only one A3 gene whereas primates carry seven (-A, -B, -C, -DE, -F, -G, -H). In this study we asked how many A3 genes are present in other mammals, using the domestic cat (Felis catus) as a model system. By a combination of RACE PCR and shotgun sequencing, we found that the cat expresses five different A3 proteins, while the genome contains three closely related A3C genes (-a, -b, -c) and one A3H gene. In addition to these 1-domain A3s, we detected a 2-domain A3CH mRNA composed of the fused reading frames of -Ca and -H. This is in contrast to another carnivore, the dog, which has only two A3 genes (-A, -H). Neither cat nor dog encodes an A3G. To determine whether evolutionary pressure on these genes is detectable in Felidae, we characterized the orthologous cDNAs from several big cats. We found that the identity of the A3Cs to the A3Ca of the domestic cat ranges from 86% (Lion) to 95% (Puma). The identity of the A3Hs is much less variable: 95% (Leopard) to 96% (Tiger, Puma, Lynx). Unexpectedly, feline A3C did not inhibit the replication of feline immunodeficiency virus (FIV) or FIVΔvif, but strongly reduced the infectivity of feline foamy virus (FFV)Δbet. FIVΔvif was suppressed by A3H and even more by A3CH, neither of which was active against FFVΔbet. Feline leukaemia virus replication was only moderately inhibited by A3C and -CH. The FIV Vif protein induced degradation of all feline A3s in feline as well as in human cells. HIV-1wt was not inhibited by feline A3C. In the genomes of permissive viruses, the feline A3Cs induced specific G to A hypermutations with individual preferred recognition sites.

(A) Development of PCR Sequencing based SNP Typing Module for non-human MHC:

MHC typing contributes to many scientific fields, including disease association, population and speciation history, organ transplantation and therapy, gene evolution, and animal behavior. However, detailed MHC typing methods are available only for human populations, because of the labor involved and the resources required, such as databases and antibodies. Recent progress in genome projects now makes MHC typing possible for many other species.

We have developed MHC typing modules for the feline MHC DRB locus, based on PolyPhred v5.4 SNP detection, post-processing Perl scripts, ClustalW multiple sequence alignment, and the standalone megablast program (Figure 5). Two databases, of more than 100 DRB allelic sequences and 5,000 heterozygous DRB sequences, have been generated in this process so far. Efforts are in progress to complete this database for all feline DRB loci and for class I loci as well.
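To illustrate the general idea of typing a heterozygous PCR sequence against an allele database, here is a minimal Python sketch. It is not the PolyPhred/Perl/megablast pipeline described above: it substitutes a simple exhaustive search for allele pairs whose combined bases reproduce the IUPAC ambiguity codes observed at each position. The allele names and the 8-bp sequences are invented toy data, and the function names are hypothetical.

```python
from itertools import combinations_with_replacement

# IUPAC nucleotide codes: each code stands for a set of possible bases.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
}

# Hypothetical allele database (toy sequences, not real DRB alleles).
ALLELES = {
    "DRB*toy01": "ACGTACGT",
    "DRB*toy02": "ACGTGCGT",
    "DRB*toy03": "ACTTACGA",
}

def explains(genotype, seq1, seq2):
    """True if the two allele sequences together produce the observed genotype."""
    if not (len(genotype) == len(seq1) == len(seq2)):
        return False
    # At every position, the pair of allele bases must equal the base set
    # encoded by the observed (possibly ambiguous) genotype character.
    return all(IUPAC.get(g) == {a, b} for g, a, b in zip(genotype, seq1, seq2))

def type_sample(genotype, alleles=ALLELES):
    """Return all allele pairs whose combination matches the genotype exactly."""
    pairs = combinations_with_replacement(sorted(alleles.items()), 2)
    return [(n1, n2) for (n1, s1), (n2, s2) in pairs if explains(genotype, s1, s2)]

het_result = type_sample("ACGTRCGT")  # heterozygous A/G at the fifth base
hom_result = type_sample("ACGTACGT")  # no ambiguity codes: homozygous call
print(het_result)
print(hom_result)
```

An exhaustive pair search is quadratic in the number of alleles, which is manageable for a database of a few hundred sequences; a sample that returns no pair would be a candidate for a new allele.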

These systems will be used to type MHCs for available samples in LGD in order to address disease resistance and population structure and history in non-human mammalian species.

Other immune gene complexes, such as the LRC and NKC for NK receptor genes, will be addressed with the established sequencing and annotation systems, based on the current 2x and upcoming 7x whole genome shotgun sequencing projects. Interestingly, the feline and canine genomes do not have the KIR or Ly49 multigene families observed in primates and rodents, respectively, raising the question of which gene complexes control NK functions in carnivore immune systems. This future project will address these questions.

(B) Human EST/cDNA Filtering to Identify New Pathogens

A database of 6.1 million human EST and full-length cDNA sequences was screened and filtered against available established genome and pathogen sequence databases. This filtering yielded 3,400 known bacterial and 400 known viral sequences, plus 924 unknown EST sequences with weak amino acid sequence similarity to known pathogens and 82 EST sequences with no homology to known sequences at either the nucleotide or amino acid level. The tissue distribution and pathological status of the original tissues from which the EST sequences were generated were examined, and several known pathogens were confirmed, including papilloma virus in cervical tissues and EB virus in B-cell lymphomas. These results suggest the possibility of finding new pathogens that may cause or trigger many human diseases, including cancers, autoimmune diseases, diabetes, and Alzheimer's disease, by screening tissue DNA/RNA with this EST microarray system.
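The filtering logic can be pictured as sorting each EST into one category according to its best database matches. The Python sketch below is an illustration only, under several assumptions: the similarity searches are taken as already done and summarized per EST, the "human" filtering category is an assumed example of an established genome database, and the field names, category labels, and EST IDs are hypothetical.

```python
from collections import Counter

def classify_est(hit):
    """Assign one EST to a category from its (hypothetical) search summary.

    `hit` is a dict with two keys:
      nt_match      - None, or the source of the best nucleotide-level match:
                      "human", "bacterial", or "viral"
      weak_aa_match - True if only a weak amino acid similarity to a known
                      pathogen was found
    """
    nt = hit["nt_match"]
    if nt == "human":
        return "filtered out (human)"
    if nt in ("bacterial", "viral"):
        return f"known {nt}"
    if hit["weak_aa_match"]:
        return "unknown, weak pathogen similarity"
    return "unknown, no similarity"

# Hypothetical search summaries keyed by EST ID.
ests = {
    "EST_a": {"nt_match": "human", "weak_aa_match": False},
    "EST_b": {"nt_match": "viral", "weak_aa_match": False},
    "EST_c": {"nt_match": None, "weak_aa_match": True},
    "EST_d": {"nt_match": None, "weak_aa_match": False},
    "EST_e": {"nt_match": "bacterial", "weak_aa_match": False},
}

categories = {est_id: classify_est(hit) for est_id, hit in ests.items()}
summary = Counter(categories.values())
for category, count in summary.items():
    print(f"{category}: {count}")
```

Checking nucleotide-level matches before amino acid similarity means that an EST with a confident known match is never reported as a candidate new pathogen.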