Abstract
The major histocompatibility complex (MHC) molecule plays a central role in the adaptive
immunity of jawed vertebrates. Allelic variations have been studied extensively in some primate
species; however, a comprehensive description of the number of genes remains incomplete. Here,
a bioinformatics program was developed to identify three MHC Class I exons (EX2, EX3 and
EX4) from Whole Genome Sequencing (WGS) datasets. With this algorithm, MHC Class I exon
sequences were extracted from 30 WGS datasets of primates, representing Apes, Old World
and New World monkeys, and prosimians. The number of genes varies widely between
species. From human WGS, six viable genes (HLA-A, -B, -C, -E, -F, and -G) and four
pseudogene sequences (HLA-H, -J, -L, -V) are obtained. These genes serve to identify the
phylogenetic clades of MHC-I in primates. The results indicate that the human clades of HLA-A, -B and
-C were generated shortly after the separation of Old World monkeys. The clades pertaining to
HLA-E, -H and -F are found in all primate families except prosimians. The clades defined by
HLA-G, -L and -J contain sequences from Old World monkeys. Specific clades are found in the
four primate families. The evolution of these genes is consistent with birth-and-death processes
with a high turnover rate.
Keywords: Major Histocompatibility Complex, MHC Evolution, Immunologic Repertoire,
Mammalian Evolution, Gene Discovery
etc.), although this is a nomenclature that is simply used to make a distinction from the
MHCI and MHCII genes.

MHC class I (MHC-I) molecules are encoded by genes with at least 6 exons. The exons of
particular interest are exons 2 and 3 (EX2 and EX3, respectively), which encode the protein
domains α1 and α2 that are responsible for presenting peptides to TCR. These domains also
represent regions of the highest allelic variation. These genes are essential for the innate and
adaptive immune responses and are subject to environmental and evolutionary pressures,
together with likely coevolution effects through their interaction with TCR and KIR (de Groot
et al., 2015; Garcia et al., 2009).

In humans, the MHC-I region consists of six genes. The HLA-A, -B and -C genes are highly
polymorphic and are expressed by all cells. These molecules are considered the classical MHC-I
molecules and their role is to present peptides to cytotoxic T lymphocytes. Unlike the classical
MHC genes, the non-classical genes, HLA-E, -F and -G, exhibit limited polymorphism. In
particular, HLA-E is a CD94/NKG2A ligand, HLA-G is found only in the trophoblast (Castro
et al., 1996; Djurisic & Hviid, 2014; Lynge-Nilsson et al., 2014) and HLA-F is involved in NK cell
signaling (Lee et al., 2010). Both HLA-F and HLA-G are expressed by a restricted set of cell
populations.

Apart from these MHC-I genes in humans, complete genes have been identified that are not
expressed, for different reasons: some are pseudogenes (e.g., the known pseudogenes HLA-H, -K,
-J and -L) (Moscoso et al., 2006; Heinrichs & Orr, 1990), and there are also individual MHC-like
exons that are not found in tandem with other valid structural exons (e.g., those possessing
stop codons) needed to form a functional MHC-I molecule (Horton et al., 2004).

All humans possess the haplotype with six MHC-I expressed genes. Orthologs to these
haplotypes have been sought in Apes, Old World, and New World monkeys. Previous studies
have described MHC-I genes orthologous to those in humans, and the human pseudogene,
HLA-H, is actually a viable MHC-I gene in chimpanzees and gorillas (Wilming et al., 2013). At
present, there are detailed descriptions of the MHC loci in Apes, but few such descriptions in
evolutionarily more distant primates. For these more distant primate species, several RNA
studies have attempted to characterize allelic variability (de Groot et al., 2012). Nonetheless,
these sequences have not been studied within the context of the germline MHC genes from all
these species.

In recent years, genome sequences of primates have become publicly available in the form of
assembled WGS datasets. These assemblies consist of relatively large contigs containing most of
the genomic sequences. In this paper, we describe the analysis of MHC-I exon sequences EX2,
EX3 and EX4 from primates that were identified from WGS data using a new bioinformatics
tool, called MHCfinder, freely available at http://vgenerepertoire.org/. The primate datasets we
utilized in this study are indicated in the phylogenetic ordering of Figure 1, obtained from
molecular studies (Perelman et al., 2011; Rogers & Gibbs, 2014), with divergence times further
confirmed by sets of published works summarized by TimeTree (Hedges et al., 2006). The wealth
of detailed genomic information in the sequences found by MHCfinder may help to clarify the
evolutionary processes that have shaped the MHC-I genes in primates.
bioRxiv preprint first posted online Feb. 15, 2018; doi: http://dx.doi.org/10.1101/266064. The copyright holder for this preprint
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
Figure 1: The phylogenetic tree of the primates indicating the species studied in this work. The tree is based upon
divergence times obtained from Hedges et al. (2006) and previous molecular phylogenetic studies (Perelman et al.,
2011; Rogers & Gibbs, 2014).
homology criteria with a supervised machine learning classifier.

The MHCfinder program extracts exons EX2 and EX3, which encode the α1 and α2 domains,
respectively, together with exon EX4, which encodes the constant domain. Although other exons
constitute the full MHC-I gene, these three exons are of particular interest for characterizing
and comparing MHC-I genes within and amongst species. Thus, MHCfinder ignores the leader
peptide (L, given by EX1), as well as all peptides corresponding to the transmembrane and
cytoplasmic regions (Tm, indicated by EX5 through EX8). While the exon/intron structure of
MHC Class I is thought to be universal across jawed vertebrates, the specific intron spacing
between the exon sequences EX2, EX3, and EX4 varies considerably. As such, the algorithm only
imposes a simple structural requirement that EX2, EX3 and EX4 are found in a tandem
arrangement along the DNA sequence, but places no hard restrictions on the intron separation.

Figure 2 summarizes the principal steps of the MHCfinder algorithm. The program was
implemented as a multi-threaded application in the Python programming language, with the
Biopython library (Cock et al., 2009) for low-level sequence analysis and the scikit-learn library
(Pedregosa et al., 2011) for machine learning tasks. First, a Tblastn query built from consensus
protein sequences of known MHC-I exons (EX2, EX3, and EX4 from humans) is made against all
available primate WGS datasets. The search result is a listing of candidate WGS contigs likely to
contain valid exons, together with the position of the matching nucleotide sequence and
similarity scores; this listing is referred to as a hit table. The algorithm processes each line of
the hit table, analyzing a nucleotide region larger than the nucleotide positions in the hit, so as
to determine the precise start/stop positions of the exon reading frame (defined by AG and GT
motifs, respectively). Once the exons are identified, they are translated into amino acid (AA)
sequences by checking all valid reading frames. Those sequences containing stop codons in the
reading frame are discarded, while valid exons are saved and converted into numerical feature
vectors (i.e., an array of numbers that characterizes the string of amino acids). A simple
transformation of AA to feature vector was used (based upon the frequency of each AA and of
pairs of AA), because it was found to discriminate sequences better than other more
sophisticated transformation procedures (e.g., those based upon positional physicochemical
properties of each AA within the sequence).

From the feature vector representation, a machine learning procedure with a Random Forest
(Breiman, 2001) was used to classify the sequence into one of the exon types: EX2, EX3, and
EX4. Supervised training is performed by defining these classes with sets of annotated exons
from H. sapiens and defining a null set from random background sequences. Binary
classification is carried out for each exon type with a background/signal ratio of 3:1,
determined empirically. The prediction precision is improved by multiple training/prediction
iterations; positively identified sequences are included in the training set for subsequent
training/predictions. This process is referred to as iterative supervised learning, and is a
common machine learning technique whereby new information is continually accrued to the
knowledge base for improving prediction accuracy (e.g., modern speech recognition has
benefited from such techniques).

In our gene finding algorithm, MHCfinder, a probable functionally expressed MHC-I gene must
contain a tandem arrangement of the three viable exons (i.e., those that do not
[Figure 2 diagram: exons are converted to feature vectors (e.g., FeatVec = 0,1,5,9,2,8,15,0,0,1...). Step 4: from positive exons, determine structure (valid gene, pseudogene, or undetermined); "indeterminate" examples lie at the end or start of a contig.]
Figure 2: The steps in the MHC-I prediction algorithm. The selection of valid MHC-I exons is based upon a Tblastn
pre-selection, an exon reading-frame identification procedure, and classification with a random forest method.
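The reading-frame check and the amino acid to feature-vector transformation described in the Methods can be illustrated with a minimal sketch. It is not the MHCfinder code: the function names (`translate`, `open_frames`, `feature_vector`) are hypothetical, only the forward strand and the standard genetic code are considered, the search for the AG/GT boundaries is omitted, and raw counts stand in for the frequencies mentioned in the text.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
AA_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
# Index of every ordered amino acid pair (400 entries).
PAIR_INDEX = {a + b: i for i, (a, b) in enumerate(product(AA_ALPHABET, repeat=2))}


def translate(dna):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def open_frames(region):
    """Return (frame, peptide) for each forward frame without a stop codon."""
    frames = []
    for frame in range(3):
        peptide = translate(region[frame:])
        if peptide and "*" not in peptide:
            frames.append((frame, peptide))
    return frames


def feature_vector(peptide):
    """Counts of the 20 amino acids followed by counts of the 400 adjacent pairs."""
    singles = [peptide.count(a) for a in AA_ALPHABET]
    pairs = [0] * len(PAIR_INDEX)
    for i in range(len(peptide) - 1):
        j = PAIR_INDEX.get(peptide[i:i + 2])
        if j is not None:  # skip pairs containing 'X' or other symbols
            pairs[j] += 1
    return singles + pairs
```

Under these assumptions, a candidate region would be passed through `open_frames`, and each surviving peptide through `feature_vector`, before classification. Dividing the counts by the peptide length would give frequencies, if that normalization is preferred.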
contain stop codons in the reading frame), EX2-EX3-EX4, along the germline sequence.
Nonetheless, the genome also contains viable exons that are homologous to MHC-I constituent
exons but do not form tandem arrangements (i.e., they are isolated or an exon is missing) and
thus do not express MHC-I molecules. In humans, we found 31 exons, 18 of which are
constituents of six functional MHC-I genes; the other exons, while viable, must be pseudogenes.
Similar results are seen in all the other species studied.
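The iterative supervised learning described in the Methods, in which positively identified sequences are added to the training set before retraining, can be sketched as a simple loop. To keep the example self-contained, a toy nearest-centroid rule stands in for the Random Forest used in the paper. All function names and the toy data are assumptions made for illustration; only the 3:1 background/signal ratio is taken from the text.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def distance(u, v):
    """Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def train(signal, background):
    """Toy stand-in for the Random Forest: one centroid per class."""
    return centroid(signal), centroid(background)


def is_signal(model, vector):
    """Binary decision: is the vector closer to the signal centroid?"""
    signal_centroid, background_centroid = model
    return distance(vector, signal_centroid) < distance(vector, background_centroid)


def iterative_training(signal, background, candidates, rounds=5):
    """Retrain repeatedly, moving positively identified candidates into the signal set."""
    signal, pending = list(signal), list(candidates)
    model = train(signal, background)
    for _ in range(rounds):
        accepted = [v for v in pending if is_signal(model, v)]
        if not accepted:
            break  # nothing new was identified, so retraining would not change the model
        signal.extend(accepted)
        pending = [v for v in pending if not is_signal(model, v)]
        model = train(signal, background)
    return model, signal, pending


# Toy one-dimensional data with a 3:1 background/signal ratio.
model, signal, pending = iterative_training(
    signal=[[0.0]],
    background=[[10.0], [10.0], [10.0]],
    candidates=[[4.0], [5.5], [9.0]],
)
```

In this toy run the candidate `[5.5]` is rejected in the first round and accepted only after `[4.0]` has been added to the training set. This shows how accrued examples can change later predictions. It also shows why a confidence threshold may be worth adding in practice, since borderline acceptances shift the decision boundary.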
Tree construction. To study the phylogenetic relationships from the MHC-I exons, we
constructed a large phylogenetic tree by aligning sequences with ClustalO and then used phyML
with the WAG matrix (part of the FastTree software (Price et al., 2010)). In all cases, 500
bootstrapped samples were made. While trees were studied using the three exon sequences
(EX2-EX3-EX4), the final analysis was made only with EX2, since this exon provides the most
discriminatory information.

3. Results

Exon sequences EX2, EX3 and EX4 of MHC-I were obtained from 30 WGS primate datasets
using our software tool, MHCfinder, described in the Methods section. These sequences are
homologous to the human MHC-I and were found by an iterative supervised learning procedure
(Methods). As described, these sequences are flanked by splicing signals AG/GT and have an
ORF starting two nucleotides after the AG and terminating one nucleotide before the last GT.
Exons are considered correct if these conditions are met, while exons possessing stop codons
within the reading frame are discarded. Those exons found in a tandem arrangement (i.e., with
EX2-EX3-EX4 and with nominal intron spacing) are considered candidate MHC-I genes (referred
to as probably viable genes throughout the rest of the paper, since they have the necessary
conditions to be expressed). Valid exons that do not participate in a tandem arrangement are
considered probable pseudogenes (referred to as pseudogenes throughout) or indeterminate if
they are found at the extreme ends of the contigs (referred to throughout the text as
indeterminate). Figure 3 shows graphical maps of the MHC-I exons found in contigs of G.
gorilla. The exons found in tandem configurations, EX2-EX3-EX4, are indicated as probably
viable MHC-I genes. Table 1 lists the number of exons and candidate viable genes per species
found from different WGS datasets. Differences in the number of exons between WGS datasets
from the same species are indicative of the variability between individuals of the same species,
as well as of the maturity/completeness and sequencing methods used in constructing the
assemblies.

At present, there are 42 WGS of Homo sapiens in the NCBI repository. From these datasets,
MHCfinder was used to find all MHC-I exon sequences and the phylogenetic tree of Figure 4
was constructed using the deduced amino acid sequences of EX2. The program tags the EX2
sequence as belonging to one of the following categories: (1) a probable expressed MHC-I gene
(since it is part of a tandem arrangement EX2-EX3-EX4), (2) a pseudogene (because it lacks
either EX3, EX4 or both), or (3) an indeterminate gene (because it is found at the extreme edge
of the contig, but could form a viable gene; see graphic of Figure 2). In the resulting tree, six
viable genes (HLA-A, -B, -C, -E, -F and -G) and four pseudogenes (HLA-H, -J, -L and -V) can be
discerned. Also, the tree demonstrates that considerable variability exists amongst the classic
genes (A, B, and C), forming several lineages. However, with respect to the nonclassical genes
(E, F, and G), the sequences are invariable. Also, the known pseudogene sequences L, J and V
are conserved, while the H pseudogene forms separate lineages; this may be related to its
proximity to the HLA-A locus (Grimsley et al., 1998).

Hominidae diversified approximately 20 million years ago (Mya). Eight WGS datasets were
used to study the MHC-I exons from the Ape family; the number of exons identified from the
WGS of Ape species is provided in Table 1. The number of probable viable genes per species is
between two and eight. The
Table 1: MHC class I exons and genes of primates. All values for the contig N50 were > 15 kbp, except N. larvatus
(for which no exons were found). For some assemblies, a reliable number for the coverage could not be ascertained;
these are indicated by N/A (not available).
Species          WGS      N50(kbp)  Cov.     EX2  EX3  EX4  Genes
Lemuriformes
E. flavifrons    LGHW01   27.3      52x      1    1    1    1
E. macaco        LGHX01   20.0      21x      1    0    1    0
P. coquereli     JZKE01   28.1      104.7x   6    6    7    6
M. murinus       ABDC02   182.9     221.6x   9    10   9    6
Lorisiformes
O. garnettii     AAQR03   27.1      137x     13   11   13   8
Tarsiformes
T. syrichta      ABRT02   38.2      48x      7    8    10   5
Platyrrhini
C. capucinus     LVWQ01   41.2      81x      23   20   24   15
A. nancymaae     JYKP01   28.5      113.4x   28   32   28   12
C. jacchus       ACFV01   29.3      6.6x     22   20   20   13
C. jacchus       BBXK01   61        N/A      5    6    8    3
S. boliviensis   AGCE01   38.8      80x      15   17   22   8
Cercopithecoidea
N. larvatus      JMHX01   13.3      290      -    -    -    -
C. angolensis    JYKR01   38.4      86.8x    8    8    10   4
R. roxellana     JABR01   77.2      53.7x    20   14   22   8
M. leucophaeus   JYKQ01   31.3      117.2x   6    7    16   3
C. sabaeus       AQIB01   90.4      95x      9    10   10   6
M. fascicularis  AQIA01   86        68x      17   19   21   12
M. mulatta       JSUE03   107       47.4x    31   28   32   17
M. mulatta       AANU01   25.7      N/A      26   34   32   17
M. nemestrina    JZLF01   107       113.1x   24   20   28   12
C. atys          JZLG01   113       192.0x   24   19   22   12
P. anubis        AHZZ01   40.3      92x      30   21   20   16
Apes
H. sapiens       ABBA01   100                13   8    10   6
P. troglodytes   AADA01   108.4     N/A      9    8    6    3
P. troglodytes   AACZ04   384.8     70x      12   10   10   8
P. paniscus      AJFE02   67        26x      10   8    8    6
G. gorilla       CABD03   53        N/A      10   9    8    5
G. gorilla       CYUI03   >1000     N/A      7    11   8    5
P. abelii        ABGA01   15.6      6x       13   10   9    3
N. leucogenys    ADFV01   35        5.6x     6    6    5    2
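The relation between the exon and gene columns of Table 1 can be checked with a few lines of arithmetic. The sketch below assumes that every gene in the "Genes" column accounts for exactly one EX2, one EX3 and one EX4. The helper name `exon_summary` and the choice of rows are illustrative only, and the values are copied from the table.

```python
# (EX2, EX3, EX4, Genes) copied from a few rows of Table 1.
TABLE_ROWS = {
    "H. sapiens ABBA01": (13, 8, 10, 6),
    "G. gorilla CABD03": (10, 9, 8, 5),
    "M. mulatta JSUE03": (31, 28, 32, 17),
    "E. macaco LGHX01": (1, 0, 1, 0),
}


def exon_summary(ex2, ex3, ex4, genes):
    """Split the exon total into exons inside tandem genes and the remainder."""
    total = ex2 + ex3 + ex4
    in_genes = 3 * genes  # one EX2, EX3 and EX4 per tandem gene (assumption)
    return {
        "total_exons": total,
        "exons_in_genes": in_genes,
        "other_exons": total - in_genes,
        # A tandem gene needs one exon of each type, so this bounds the gene count.
        "max_possible_genes": min(ex2, ex3, ex4),
    }


summaries = {name: exon_summary(*row) for name, row in TABLE_ROWS.items()}
```

For the human dataset this reproduces the 31 exons and the 18 exons in six genes quoted in the text. The remaining 13 exons lie outside tandem arrangements.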
[Figure 3 diagram: contig maps with boxes for exons EX2, EX3 and EX4; labelled loci include HLA-V, HLA-J, HLA-H, HLA-F, HLA-G, HLA-E, HLA-C and HLA-B.]
Figure 3: Graphical representation of the MHC-I exons obtained in G. gorilla (CABD03) with the MHCfinder
program. Each contig is represented by a line and exons by boxes. Tandem exon arrangements (EX2-EX3-EX4) are
colored blue, indicating a high likelihood of being a viable and functionally expressed MHC-I gene (explained in
text). Exons that are not part of tandem arrangements, and are thus indicative of pseudogenes, are marked in red.
Exons found at the extreme ends of the contigs may be considered indeterminate if they could form tandem
arrangements with exons in other contigs (colored in light blue).
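The three categories used in Figure 3 (tandem gene, pseudogene, indeterminate) can be expressed as a small tagging routine. This is a sketch under stated assumptions rather than the MHCfinder implementation. The `Exon` type, the `tag_exons` function, the tag strings and the `edge_margin` value are all hypothetical. Exons are assumed to lie on the forward strand, and no check on intron spacing is made.

```python
from typing import NamedTuple


class Exon(NamedTuple):
    kind: str   # "EX2", "EX3" or "EX4"
    start: int  # position on the contig
    end: int


def tag_exons(exons, contig_length, edge_margin=5000):
    """Tag each exon as 'MHC' (tandem EX2-EX3-EX4), 'Pseu' or 'Indet'."""
    ordered = sorted(exons, key=lambda e: e.start)
    tags = {}
    i = 0
    while i < len(ordered):
        window = ordered[i:i + 3]
        if [e.kind for e in window] == ["EX2", "EX3", "EX4"]:
            # Three consecutive exons in the expected order form a candidate gene.
            for e in window:
                tags[e] = "MHC"
            i += 3
            continue
        e = ordered[i]
        # Exons close to either contig end might pair with exons on another contig.
        near_edge = e.start < edge_margin or contig_length - e.end < edge_margin
        tags[e] = "Indet" if near_edge else "Pseu"
        i += 1
    return tags


# Hypothetical contig of 100 kbp with one tandem gene and two stray exons.
example = [
    Exon("EX2", 20000, 20270),
    Exon("EX3", 20500, 20776),
    Exon("EX4", 21400, 21676),
    Exon("EX3", 50000, 50276),
    Exon("EX4", 98000, 98276),
]
tags = tag_exons(example, contig_length=100000)
```

In this example the first three exons are tagged as a tandem gene, the isolated EX3 in the middle of the contig as a pseudogene, and the EX4 near the contig end as indeterminate. A fuller treatment would also handle the reverse strand and could bound the allowed intron spacing.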
[Figure 4 tree: phylogenetic tree of Homo sapiens EX2 amino acid sequences (positions 1-89). Each leaf is labelled with its WGS dataset and tagged MHC, Pseu or Indet. Consensus sequences are included, and clades are marked HLA-A, -B, -C, -E, -F, -G, -H, -J, -L and -V.]
HC /1- 89
Ho mo _sa
H
Ho mo _s
_
o_
/1 -8
o
-8 9
H om
Ho
-8 9
H om
9
1-89
-89
Figure 4: The phylogenetic tree of 333 EX2 sequences of MHC-I found with our MHCfinder algorithm from the
42 WGS datasets of Homo sapiens. The probable viable genes are colored blue, pseudogenes are colored red, and
genes that are indeterminate are colored light blue. The sequences in brown correspond to consensus sequences
that are used to identify clades. The nodes of the principal clades have been labeled. The tree was constructed after
aligning sequences with ClustalO and using the phyML (part of the Fasttree software (Price et al., 2010)) with the
gamma parameter, WGS matrix, and 500 bootstrapped samples.
sis) and 15 (in C. capucinus). Lemuriformes and Tarsiiformes represent the oldest extant primates that diverged from other primates approximately 76 Mya. Compared to other taxa of this study, few exons were found from the datasets of Eulemur species, perhaps because these sequenced assemblies are still in a nascent stage.

Using MHCfinder, 20 WGS datasets from the Simians family (not including humans) were studied, from which 210 viable genes were found. Of the 3 MHC-I exons studied (i.e., EX2, EX3 and EX4), EX2 contains the most information to discriminate genes for classification. A phylogenetic tree (Figure 5) was constructed from 410 EX2 exons found in the primate WGS datasets, consisting of both viable (those having exons in tandem) and nonviable genes. In the tree, clades were collapsed to improve the visualization of results. Human sequences from the HLA isotypes HLA-A, -B, -C, -E, -F, -G, -H, -J, -V and -L were aligned with the sequences from non-human primates in order to identify clades homologous to those in humans. Apart from these homologous human clades, other clades were also identified. Three clades containing only sequences from Cercopithecus species are designated MHC-I-cerco-1, MHC-I-cerco-2 and MHC-I-cerco-3. Another three clades with sequences exclusive to Platyrrhini are also identified, indicated by MHC-I-platy-1, MHC-I-platy-2 and MHC-I-platy-3. There are also three other clades that have sequences from at least two primate orders and have been labeled MHC-I-p1, MHC-I-p2 and MHC-I-p3. The MHC-I-p1 clade contains the largest number of sequences, with representatives from all primate species except Hominids, perhaps suggesting a concrete function that was made superfluous in the differentiation of Hominid species.

bioRxiv preprint first posted online Feb. 15, 2018; doi: http://dx.doi.org/10.1101/266064. The copyright holder for this preprint
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.

Figure 5: The phylogenetic tree of 417 EX2 sequences of MHC-I found in primate (Prosimians, Platyrrhini, Cercopithecus and Hominidae) WGS datasets with the software MHCfinder. Primate orders are distinguished by color codes, indicated in the legend. Clades that are suggestive of representing a gene or group of genes have been collapsed. Each collapsed clade is labeled with the human isoform orthologs; however, those clades not containing human genes are assigned new names (e.g., MHC-I-p1, etc.). The nodes of the principal clades have been labeled. The tree was constructed after aligning sequences with ClustalO and using the phyML (part of the Fasttree software (Price et al., 2010)) with the gamma parameter, WGS matrix, and 500 bootstrapped samples. The consensus sequences for identifying clades are marked in black.

Most of the sequences from the HLA-ABC clade (i.e., consisting of HLA-A, -B and -C sequences) are from Hominid species. This clade also contains a single sequence from Cercopithecus and seven sequences from Platyrrhini. HLA-A, -B and -C have a common origin and must have been generated after the separation of the Hominids from the Cercopithecus, as previously described by other authors (Piontkivska & Nei, 2003). The sequences from these clades constitute the classical HLA class I genes. Therefore, it is likely that Cercopithecus and Prosimian species developed other genes that took over the basic functions of these classical HLA class I genes found in Hominids.

From the tree of Figure 5, the HLA-E and -F clades contain sequences from all the primate orders except Prosimians. The HLA-G, -H, -J and -L clades have sequences from Apes and Cercopithecus. With the analysis of MHCfinder, most of the EX2 sequences of Cercopithecus in the HLA-G clade are identified as pseudogenes, consistent with experimental results (Castro et al., 1996). All the sequences of the HLA-V clade are pseudogenes and are only found in Hominids.

Three clades (indicated by the name cerco) are found that contain sequences only from Cercopithecus species. These sequences were generated after the separation of Old World monkeys from the Hominidae family. In a similar way, three clades contain sequences only from Platyrrhine species (indicated by the name platy).

Another three clades exist that are composed of sequences from at least two primate families, but lack sequences from Hominids. Of these, the MHC-I-p1 clade (where p is used to indicate primate) is the largest. In particular, the MHC-I-p1 and MHC-I-p2 clades contain sequences from Prosimians, Platyrrhines and Cercopithecus, while the third clade of this group lacks sequences from Cercopithecus.

3.1. Allele assignment to clades
Several studies concerning the gene alleles of MHC-I have been described in non-human primates and are available at the IPD-MHC database (www.ebi.ac.uk/ipd/mhc) (Robinson et al., 2013; de Groot et al., 2012). These alleles were aligned with the germline sequences to classify alleles as classical MHC-I (with allelic variation) or nonclassical MHC-I (without allelic variation).
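As a rough illustration of the assignment step, the sketch below places each allele with its most similar germline sequence. It is a simplification and not the procedure used in this work: the analysis here relies on a phylogenetic tree, whereas the sketch substitutes a nearest-neighbour rule based on percent identity over sequences that are assumed to be already aligned to equal length. The function names and the toy sequences are hypothetical.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return sum(x == y for x, y in columns) / len(columns)


def assign_alleles(alleles: dict[str, str], germline: dict[str, str]) -> dict[str, str]:
    """Map each allele name to the name of its most similar germline sequence."""
    return {
        name: max(germline, key=lambda g: identity(seq, germline[g]))
        for name, seq in alleles.items()
    }


def alleles_per_gene(assignment: dict[str, str]) -> dict[str, int]:
    """Count how many alleles were assigned to each germline sequence."""
    return dict(Counter(assignment.values()))


# Toy, made-up nucleotide sequences (assumption: not real MHC data).
germline = {"germline_1": "ACGTACGT", "germline_2": "TTGTACCA"}
alleles = {"allele_1": "ACGTACGA", "allele_2": "ACGAACGT", "allele_3": "TTGTACCA"}

assignment = assign_alleles(alleles, germline)
counts = alleles_per_gene(assignment)
```

Under this simplification, a germline sequence that attracts several distinct alleles would be a candidate for a classical gene (with allelic variation), and one that attracts a single allele would be a candidate for a nonclassical gene. Where to draw that line is left open here.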
M. mulatta has been extensively studied to reveal such allelic variation of MHC-I. We obtained
allele sequences of M. mulatta from the IPD-MHC database and aligned them to the germline
sequences found by MHCfinder. Figure 6 shows the resulting phylogenetic tree. To reduce
sequence redundancy, only one representative allele from each allele lineage was included in
the tree construction. The Mamu-A alleles belong to the genes that define the MHC-I-p2 and
MHC-I-cerco-3 clades. Previous publications have described difficulties in assigning specific
orthology to the Mamu-B alleles (Liu et al., 2013). The tree of Figure 6 resolves this
difficulty by showing that Mamu-B alleles correspond to genes belonging to the MHC-I-p1,
MHC-I-cerco-1, MHC-I-cerco-2 and HLA-H clades.
C. jacchus is considered the reference organism for studies of allelic variation of MHC-I
genes in New World monkeys. Many published studies (Cao et al., 2015; van der Wiel et al.,
2013; Kono et al., 2014) have concluded that the MHC-I alleles in C. jacchus are orthologous
to the HLA-G gene. Figure 6 shows the phylogenetic tree of germline EX2 sequences obtained
from MHCfinder together with allele sequences from the IPD-MHC database. The tree confirms
the results of these studies, since the majority of alleles are from gene
C. jacchus-ACFV01-13|MHC, which belongs to the HLA-G clade. However, the alleles of the
lineage Caja-G*18 belong to the MHC-I-platy-3 clade.

4. Discussion

A bioinformatics program, MHCfinder (freely available at http://vgenerepertoire.org/), was
developed that identifies the MHC-I exons (EX2, EX3 and EX4) from WGS datasets. The algorithm
determines the in-frame exon nucleotide sequences, deduces the amino acid sequence, and
transforms it to a 400-element feature vector. Each element of the feature vector represents
the frequency of occurrence of each AA and pairs of AA (i.e., one of the 20 × 20
possibilities) within the protein sequence. A supervised learning procedure is used to train
a machine learning classifier to recognize exons homologous to those known in humans. We
found that feature vectors based upon this simple frequency-of-occurrence transform give a
higher classification accuracy (attaining a predictive precision of 98%) than other, more
sophisticated transforms based upon positional physicochemical properties. If the exon
architecture is known, small modifications could render the algorithm a more general gene
finding method, thereby facilitating rapid identification of gene-specific exons from any
species whose genome has been sequenced.
With MHCfinder, exon sequences of MHC-I were obtained from 30 primate WGS datasets. The
sequences have been made available in the repository vgenerepertoire.org. The program
identifies individual exons, referred to here as viable, meaning that they have a valid
reading frame. If the exons are found with the canonical MHC-I exon/intron structure (i.e.,
they are arranged in tandem, EX2-EX3-EX4, within the same WGS contig and have a valid intron
spacing), then MHCfinder considers this a viable gene (i.e., one that is likely functionally
expressed). Nonetheless, problems associated with WGS datasets (e.g., incorrect sequence
assembly or low coverage of certain regions) may result in an underestimate of the total
number of viable MHC-I genes found by MHCfinder. In WGS datasets with relatively high
coverage and N50 > 15k (which is the case for most datasets we used), the results of
MHCfinder agreed with annotated gene results, when available. While the use of WGS
[Figure 6 image: circular phylogenetic tree, not reproduced. Tip labels are M. mulatta and
C. jacchus germline EX2 sequences, Mamu-A and Mamu-B alleles, Caja-B and Caja-G alleles, and
human consensus sequences. Labeled clades: HLA-ABC, HLA-E, HLA-F, HLA-G, HLA-H,
MHC-I-p1 + MHC-I-cerco-2, MHC-I-p2 + MHC-I-cerco-3, MHC-I-cerco-1, MHC-I-platy-2,
MHC-I-platy-3.]
Figure 6: Phylogenetic tree built from germline EX2 sequences from five WGS assemblies: two from C. jacchus and
three from M. mulatta. The allelic sequences of M. mulatta (blue) and C. jacchus (green) obtained from the
IPD-MHC database were then aligned with the EX2 WGS sequences. Clades are collapsed to improve the visualization of
the results. Consensus sequences (brown) were also aligned with the EX2 sequences to identify clades.
datasets by MHCfinder cannot guarantee the exact number of MHC-I genes within a particular
species, it can support claims in studies spanning several species or taxonomic orders.
For the first time, a large number of germline MHC-I gene sequences have been obtained across
a wide range of primate species. Until now, sequences were obtained from the genomes or cDNA
sequences of only a few specific primates (Heimbruch et al., 2015; Uda et al., 2005; Yan et
al., 2013). Moreover, this work can shed more light on
previous studies of nonhuman primates that have identified expressed MHC-I alleles,
attempting to classify these sequences without exact knowledge of the genes present in these
species.
Given the ability to construct phylogenetic trees from the MHC-I exons of various species, it
is possible to group these genes by evolutionary origin and to infer orthologs and paralogs
more precisely. Here, sequences of exon EX2 were used to construct such trees, EX2 being the
most discriminative constituent of the MHC-I gene, while EX3 and EX4 alone cannot resolve
clades. The results indicate that the diversification of these genes has been driven by birth
and death processes, thereby explaining the large number of pseudogenes.
The results presented here demonstrate that the classical HLA-A, -B and -C genes found in
Hominids were generated recently, coinciding with the separation from Old World monkeys. In
non-Hominid primates, orthologs to these genes can no longer be found, but instead there are
paralogs that proceeded from a common ancestor. The sequences from Cercopithecus are
practically absent, except for one sequence from M. mulatta, which is of interest since it
corresponds to a gene having a large amount of allelic variability. Similar to these genes,
which are quasi-specific to Hominids, there are other genes that are specific to
Cercopithecus and Platyrrhini. Taken together, data from this study provide evidence of rapid
birth and death processes. The absence of orthologs to HLA-A, -B and -C in Cercopithecus, and
to a lesser extent in Platyrrhini, may explain the generation and/or expansion of other
clades consisting of genes that have the functions of the classical MHC-I. This hypothesis is
supported in several cases, as seen in M. mulatta, for which an allelic variation is seen
that is not present in Hominids, while in C. jacchus a gene belonging to the HLA-G clade was
identified that is responsible for most of the allelic variation.
The presence of orthologs to the nonclassical genes (HLA-E, HLA-F, HLA-G and HLA-H) in
non-Hominid primates indicates an origin prior to the separation from New World monkeys. In
Prosimians, no orthologs were found, thereby raising the question of whether the separation
between the classical and nonclassical MHC-I is general to all mammals or specific to
Platyrrhini and Catarrhini. Similar results have been reported by other authors (Piontkivska
& Nei, 2003; Fukami-Kobayashi et al., 2005).
The birth and death processes observed in the data are similar to those that occur in the
variable (V) regions of immunoglobulin (IG) and T-cell receptor (TCR) genes. In the case of
MHC-I, the situation is more complex because a viable replication process involves at least
six separate exons. The birth and death process must be related to the adaptability and
survival of the species. This evolutionary mechanism creates a greater genetic variability in
genes that must present antigen, as well as in those that must recognize antigen.
Nonetheless, it is still unknown why some species possess more V genes and/or MHC-I genes
than others, or whether the absolute number of such genes provides the species with
immunological advantages. More studies are needed to clarify whether correlations exist
between the numbers of these genes and to establish if coevolution processes are at play.
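To make the 400-element feature transform mentioned at the start of the Discussion more
concrete, the following minimal sketch computes the relative frequency of each of the 20 × 20
ordered amino-acid pairs in a protein sequence. It is an illustration under stated
assumptions, not the MHCfinder implementation: reading "pairs of AA" as adjacent, ordered
pairs is our assumption, and the names pair_frequency_vector and PAIR_INDEX are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIRS)}


def pair_frequency_vector(protein: str) -> list:
    """Return 400 relative frequencies of adjacent amino-acid pairs.

    Assumption: a "pair" is two consecutive residues, and order matters.
    Pairs containing a non-standard symbol (e.g. X or *) are skipped.
    """
    counts = [0] * len(PAIRS)
    total = 0
    seq = protein.upper()
    for first, second in zip(seq, seq[1:]):
        index = PAIR_INDEX.get(first + second)
        if index is not None:
            counts[index] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    return [count / total for count in counts]


# Arbitrary short peptide used only to exercise the function.
vector = pair_frequency_vector("GSHSMRY")
print(len(vector), vector[PAIR_INDEX["GS"]])
```

A vector of this kind could then be passed to any supervised classifier; the text does not
specify which classifier was used, so none is shown here.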
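The viable-gene criterion stated in the Discussion (EX2, EX3 and EX4 in tandem on the same
contig with a valid intron spacing) can be sketched as a simple scan over exon hits. This is
only an illustration: the ExonHit record, the forward-strand-only handling, and the intron
length bounds (50 to 1500 bp) are hypothetical placeholders, since the text does not give the
thresholds used by MHCfinder.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ExonHit:
    contig: str  # WGS contig identifier
    exon: str    # "EX2", "EX3" or "EX4"
    start: int   # 1-based inclusive start on the forward strand
    end: int     # 1-based inclusive end


def find_viable_genes(hits, min_intron=50, max_intron=1500):
    """Return (EX2, EX3, EX4) triples that are consecutive on one contig.

    Assumptions: forward-strand hits only; intron bounds are placeholders.
    """
    by_contig = defaultdict(list)
    for hit in hits:
        by_contig[hit.contig].append(hit)

    genes = []
    for contig_hits in by_contig.values():
        ordered = sorted(contig_hits, key=lambda h: h.start)
        # Slide a window of three neighbouring exon hits along the contig.
        for ex2, ex3, ex4 in zip(ordered, ordered[1:], ordered[2:]):
            labels = (ex2.exon, ex3.exon, ex4.exon)
            gaps = (ex3.start - ex2.end - 1, ex4.start - ex3.end - 1)
            if labels == ("EX2", "EX3", "EX4") and all(
                min_intron <= gap <= max_intron for gap in gaps
            ):
                genes.append((ex2, ex3, ex4))
    return genes


# Made-up coordinates: contig c1 carries a complete triple, c2 lacks EX3.
hits = [
    ExonHit("c1", "EX2", 100, 369),
    ExonHit("c1", "EX3", 611, 886),
    ExonHit("c1", "EX4", 1466, 1741),
    ExonHit("c2", "EX2", 10, 279),
    ExonHit("c2", "EX4", 900, 1175),
]
print(len(find_viable_genes(hits)))
```

Exons on contigs that fail this test would still count as individual viable exons, which is
one way a fragmented assembly can lead to an underestimate of the number of viable genes.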