2012
School of Biology
PhD Thesis
An investigation of human protein interactions using the
comparative method
by
Saif Ur-Rehman
1. Candidate's declarations:
I, Saif Ur-Rehman hereby certify that this thesis, which is approximately 52,000 words in length, has been written
by me, that it is the record of work carried out by me and that it has not been submitted in any previous
application for a higher degree.
I was admitted as a research student in [09, 2006] and as a candidate for the degree of PhD in [09, 2007]; the
higher study for which this is a record was carried out in the University of St Andrews between [2006] and [2010].
Date signature of candidate
2. Supervisor's declaration:
I hereby certify that the candidate has fulfilled the conditions of the Resolution and Regulations appropriate for the
degree of PhD in the University of St Andrews and that the candidate is qualified to submit this thesis in
application for that degree.
Date signature of supervisor
3. Permission for electronic publication: (to be signed by both candidate and supervisor)
In submitting this thesis to the University of St Andrews I understand that I am giving permission for it to be made
available for use in accordance with the regulations of the University Library for the time being in force, subject to
any copyright vested in the work not being affected thereby. I also understand that the title and the abstract will
be published, and that a copy of the work may be made and supplied to any bona fide library or research worker,
that my thesis will be electronically accessible for personal or research use unless exempt by award of an
embargo as requested below, and that the library has the right to migrate my thesis into new electronic forms as
required to ensure continued access to the thesis. I have obtained any third-party copyright permissions that may
be required in order to allow such access and migration, or have requested the appropriate embargo below.
The following is an agreed request by candidate and supervisor regarding the electronic publication of this thesis:
Add one of the following options:
(ii) Access to [all or part] of printed copy but embargo of [all or part] of electronic publication of thesis for a period
of 2 years (maximum five) on the following ground(s):
publication would preclude future publication;
Date signature of candidate
signature of supervisor
A supporting statement for a request for an embargo must be included with the submission of the draft copy of the
thesis. Where part of a thesis is to be embargoed, please specify the part and the reasons.
Abstract
There is currently a large increase in the speed of production of DNA sequence data as next
generation sequencing technologies become more widespread. As such there is a need for
rapid computational techniques to functionally annotate data as it is generated. One
computational method for the functional annotation of protein-coding genes is via detection
of interaction partners. If the putative partner has a functional annotation then this annotation
can be extended to the initial protein via the established principle of guilt by association.
This work presents a method for rapid detection of functional interaction partners for
proteins through the use of the comparative method. Functional links are sought between
proteins through analysis of their patterns of presence and absence amongst a set of 54
eukaryotic organisms. These links can be either direct or indirect protein interactions. These
patterns are analysed in the context of a phylogenetic tree.
The method used is a heuristic combination of an established accurate methodology
involving comparison of models of evolution the parameters of which are estimated using
maximum likelihood, with a novel technique involving the reconstruction of ancestral states
using Dollo parsimony and analysis of these reconstructions through the use of logistic
regression. The methodology achieves comparable specificity to the use of gene coexpression as a means to predict functional linkage between proteins.
The application of this method permitted a genome-wide analysis of the human
genome, which would have otherwise demanded a potentially prohibitive amount of
computational resource.
Proteins within the human genome were clustered into orthologous groups. 10 of
these proteins, which were ubiquitous across all 54 eukaryotes, were used to reconstruct a
phylogeny. An application of the heuristic predicted a set of functional protein interactions in
human cells. 1,142 functional interactions were predicted. Of these predictions 1,131 were
not present in current protein-protein interaction databases.
Acknowledgements
I thank the BBSRC for funding my work and making this project possible. I would also like
to thank my supervisor, Dr Daniel Barker for his tireless support and good advice. I would
like to thank Professor Mike Ritchie, Dr Jeff Graves and Dr Anne Smith for acting as my
committee. I would also like to thank Dr Rona Ramsay, Ji-Hiyun Lim, Wim Verleyen, Dr
Christoph Echtermeyer and Maria Keays for helpful discussion. I would also like to thank Dr
Neil Symington and Dr Herbert Fruchtl for their helpful advice in using cluster-computing
resources.
On a personal note I would like to thank my wife Cathryn for her constant unconditional
support without which I could not have brought this project to fruition. I would also like to
thank my friends Ken Armstrong and Jack Levell who played a large part in keeping me
balanced over the course of my work. Finally I would like to thank my parents for their love
and support, which has been a source of strength.
Abstract
Acknowledgements
Chapter 1
1.1 History
1.2 DNA/RNA
1.2.1 RNA
1.3 Proteins
1.3.1 Protein secondary structure
1.3.2 Protein tertiary structure
1.3.3 Protein quaternary structure
1.3.4 Protein domains
1.3.5 Protein motifs
1.4 Genes
1.4.1 Structure of a gene
1.4.1.1 Regulatory region of a gene
1.4.2 Transcription
1.4.2.1 Post Transcriptional processing
1.4.2.1.1 Genetic Code
1.4.2.1.2 Open reading frames
1.4.2.1.3 Exons/Introns
1.4.2.2 Post Transcriptional processing (cont)
1.4.2.2.1 RNA splicing
1.4.2.2.2 Capping
1.4.2.2.3 Polyadenylation
Chapter 2
2.1 Introduction
Chapter 3
3.1 Introduction
3.1.1 Hamming distance
3.1.2 Comparative method
3.1.2.1 Phylogenetic profile analysis using the comparative method
Chapter 4
4.1 Introduction
4.1.1 Ancestral state reconstruction
4.1.1.1 Parsimony
4.1.1.2 Likelihood
Chapter 5
5.1 Introduction
5.1.1 PPI databases
5.1.1.1 MIPS
5.1.1.2 BIND
5.1.1.3 MINT
5.1.1.4 INTACT
5.1.1.5 HPRD
5.1.1.6 DIP
5.1.1.7 REACTOME
5.1.1.8 STRING
5.1.1.9 I2D
5.1.1.10 KEGG
5.1.1.11 BIOGRID
5.1.1.12 Discussion
Chapter 6
6.1 Summary of Project
6.1.1 Repeat Analysis
6.2 Conclusion
6.3 Future directions
6.3.1 Computational extensions
6.3.2 Consensus profiles
6.3.3 Correlated evolution of proteins with the presence or absence of phenotypes
6.3.4 Drug Targets
References
Appendix A Description of divergence of Java implementation of Inparanoid algorithm from Perl implementation
Appendix B Individual Gene trees for genes in super matrix utilised in construction of Phylogeny
Appendix C: Predictions made by constrained ML
Appendix D Concatenated Filtered Alignment
Chapter 1
Introduction to computational annotation of protein coding genes
1.1 History
The discovery in the 1940s (Avery et al. 1944) and confirmation in the 1950s (Hershey and
Chase 1952) of DNA (deoxyribonucleic acid) as the physical basis for inheritance was a
milestone in biological research. It provided a means to examine the materials and
processes underlying phenotypic traits and provided a conceptual link to the other natural
sciences. This was rapidly followed by the elucidation of the three dimensional structure of
B-DNA (Watson and Crick 1953) which is the form of DNA prevalent in living cells as it is
conducive to nucleosome formation (Richmond and Davey 2003). This structure was the now
famous double helix. It had been previously established (Beadle and Tatum 1941) that genes
exist as discrete regions within the genome whose sequence codes for the sequence of a
corresponding chain of amino acids. The genome of an organism is the full set of hereditary
material it possesses (Alberts 2010). This is RNA in the case of some viruses and DNA in the
case of all cellular organisms (Brown 2006). The discovery of the genetic code
(Crick et al. 1961) provided information on the mechanism for this production which
operates via initial intermediary transcription into RNA (ribonucleic acid) and then
translation into proteins. (Some genes also code for RNA products such as tRNAs and other
non-coding RNAs (Brown 2006)).
The first feasible method for determining the sequence of DNA was the Maxam-Gilbert chemical degradation method (Maxam and Gilbert 1977). This method was however
supplanted by the near simultaneous invention by Frederick Sanger of the chain termination
method of DNA sequencing (Sanger et al. 1977), which was both safer and more
efficient (Brown 2006). This led to the sequencing of the first full genome of an organism,
that of bacteriophage φX174 (Sanger et al. 1978). Another contribution by Sanger was
that of shotgun sequencing. This entails the shattering of a piece of DNA into random
fragments and the sequencing of those fragments. The sequences of the fragments are then
assembled through searching for overlaps between them. This method facilitated the
sequencing of a number of relatively larger viral and prokaryotic genomes such as
Bacteriophage MS2 (Fiers et al. 1976).
In 1996 Saccharomyces cerevisiae became the first eukaryote to have its genome sequenced (Goffeau
et al. 1996) via a large collaborative effort. This was followed by the publication of the first
multicellular eukaryotic genome, that of Caenorhabditis elegans, in 1998 (C. elegans Sequencing
Consortium 1998), and the draft genomes of the vertebrate Homo sapiens soon followed in
2001 (Venter et al. 2001). With the application of industrial streamlining and automation to
sequencing efforts over the last 20 years, and more recently with the onset of next
generation sequencing technologies, there has been almost exponential growth in sequence
databases such as NCBI GenBank (Benson et al. 2009). Sequence data without further
processing and annotation cannot shed any light on either biological function or evolutionary
relationships between organisms. This means that there has been a focus on the development
of highly accurate high throughput methods for functional annotation of genes and other
functional genomic elements in recent years, as the gap between rates of data generation
and rates of accurate and verifiable annotation widens (Zhu et al. 2007).
1.2 DNA/RNA
DNA itself is made up of a linear backbone of alternating deoxyribose sugar and phosphate
residues (Strachan and Read 2004). There is a nitrogenous base attached to the 1′ (one prime)
carbon of each individual sugar residue. There are two forms of nitrogenous base present
within DNA. One form possesses a single heterocyclic ring of carbon and
nitrogen atoms. Bases of this form are known as pyrimidines (Strachan and
Read 2004). The second form of base consists of two fused heterocyclic rings of carbon
and nitrogen atoms. These bases are known as purines (Strachan and Read 2004). There are
two pyrimidines represented within DNA (Strachan and Read 2004). These are cytosine and
thymine commonly represented by the abbreviations C and T respectively (Brown 2006).
There are also two purines present, adenine and guanine represented as A and G (Brown
2006). The stability of the double helix structure of DNA is maintained through hydrogen
bond formation between the pyrimidine-purine pair C and G and hydrogen bond formation
between the remaining pyrimidine-purine pair T and A as well as base stacking interactions
between adjacent bases (Yakovchuk et al. 2006). Due to structural constraints base pairing
can only occur between a pyrimidine and a purine (Brown 2006).
The linear backbone of DNA/RNA is maintained by a phosphodiester bond formed
between the 3′ (3 prime) carbon atom of the sugar and the 5′ (5 prime) carbon of the
succeeding sugar (Strachan and Read 2004). The backbone is terminated by a sugar where
the 5′ carbon is not linked to a succeeding sugar residue. This point is known as the 5′ end.
Similarly the other end of the molecule lacks a phosphodiester bond on the 3′ carbon and is
known as the 3′ end (Strachan and Read 2004). The sequence of DNA is usually described
in the 5′→3′ direction, as this is the direction of DNA replication as well as transcription of
RNA using DNA as a template (Strachan and Read 2004). Thus a feature along a DNA
molecule is referred to as being upstream of another feature if it is closer to the 5′ end. The
length of a DNA molecule is measured in units of individual base pairs (bp).
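The pairing rules and the 5′→3′ convention described above can be illustrated with a short Python sketch. It is an illustration only and is not part of the analysis reported in this thesis; the function names reverse_complement and is_upstream are hypothetical, and sequences are assumed to be plain strings over A, C, G and T written 5′→3′.

```python
# Watson-Crick pairing as described in the text: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of `seq`, itself written 5'->3'.

    The two strands run in opposite directions, so the complement
    must be reversed to keep the 5'->3' reading convention.
    """
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_upstream(pos_a: int, pos_b: int) -> bool:
    """True if feature A is closer to the 5' end than feature B.

    Positions are 0-based indices into a string written 5'->3'
    (an assumption of this sketch).
    """
    return pos_a < pos_b


print(reverse_complement("ATGC"))  # GCAT
```

Applying reverse_complement twice returns the original sequence, which is a convenient sanity check on the pairing table.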
DNA is a biopolymer and as such can be fully represented by the sequence of its
constituent nucleotide bases. Determination of this sequence for a complete organism
effectively represents the DNA blueprints for the construction of that organism, i.e. the amino
acid sequences of its constituent proteins and RNA molecules, as well as the regulatory
sequences that regulate production of these molecules both spatially and temporally.
1.2.1 RNA
RNA is constructed of similar residues; however, the sugar is ribose as opposed to
deoxyribose and the pyrimidine base thymine is replaced with the base uracil commonly
represented by the abbreviation U (Strachan and Read 2004). There is a diverse population of
RNA molecules produced by the eukaryotic genome. These molecules are involved with a
number of processes essential to life, including protein synthesis and regulation of gene
expression. A breakdown of general RNA types and their functions is presented in Table 1.1.
Abbreviated name    Full name        Primary function
mRNA                Messenger RNA    Provides a template for protein synthesis.
tRNA                Transfer RNA
rRNA                Ribosomal RNA
snRNA
snoRNA
miRNA               Micro RNA
siRNA
Table 1.1: General types of RNA molecules with function (Blow 2004).
1.3 Proteins
Protein molecules are polymers comprised of one or more chains of amino acids. A chain of
amino acids can also be referred to as a polypeptide chain. Amino acids are molecules that
consist of an amino group, a carboxyl group, an R group and a hydrogen atom (Berg et al.
2001). These components are all linked to a central carbon atom known as the α carbon
(Berg et al. 2001). A polypeptide chain is formed when a peptide bond is formed between
the amino group of one amino acid and the carboxyl group of another. All polypeptide chains
have a free amino group at one end and a free carboxyl group at the other. These are known
as the N-terminus and C-terminus respectively (Alberts 2002). The sequence of a polypeptide
chain is presented as moving from the N-terminus to the C-terminus (Alberts 2002). A linear
polypeptide chain is also considered the primary structure of a protein (Brown 2006).
It is the R group that distinguishes amino acids (Berg et al. 2001). R groups vary in
factors such as size, shape, charge, hydrogen-bonding capacity, hydrophobic
character, and chemical reactivity (Berg et al. 2001). There are 20 naturally occurring
amino acids that are typically utilised by living cells (Alberts 2002).
1.3.1 Protein secondary structure
The interactions of the R, carboxyl, and amine groups of individual amino acids in a
polypeptide chain with each other cause polypeptide chains to fold into characteristic
conformations. These conformations are known as the secondary structure of a protein. There
are two main types of secondary structure (Brown 2006).
The α helix: This is a structure formed by interactions between the carboxyl groups
and amine groups of amino acids which are separated by a number of intermediate
amino acids (Berg et al. 2001).
The β sheet: This is a structure formed by the interactions between two polypeptide
chains running either parallel or antiparallel to each other (Brown 2006).
1.3.2 Protein tertiary structure
The tertiary structure of a protein is formed by the folding up of the secondary structural
constructs formed by the polypeptide chain into a three dimensional configuration (Brown
2006). This configuration is held together by a number of chemical forces including hydrogen
bonding between individual amino acids and the interactions of hydrophobic amino acids
with water (Brown 2006).
1.3.3 Protein quaternary structure
The quaternary structure of a protein is formed by the interactions of multiple polypeptide
chains. Quaternary structure is a hallmark of proteins with a complex function (Brown 2006).
1.3.4 Protein domains
A protein domain can be defined as a substructure produced by any part of
a polypeptide chain that can fold independently into a compact, stable structure (Alberts
2002). There are a number of recurrent protein domains that are functionally important within
the eukaryotic cell. These include:
Helix turn helix: This is a domain comprised of two α helices separated by a short
strand of amino acids. It is functionally important due to its ability to bind DNA
(Brennan and Matthews 1989).
Leucine zipper: This motif is important in that it facilitates the formation of protein
quaternary structure by the dimerisation of two leucine rich regions of separate
polypeptides (Brown 2006). It is a motif that is found in a number of proteins that
bind DNA (Brown 2006).
Zinc finger: The zinc finger motif is a set of polypeptide chains whose interactions are
stabilised by the presence of zinc ions. It is also present in DNA binding proteins
(Brown 2006).
1.4 Genes
As mentioned above the blueprints for the production of given protein and RNA molecules
within an organism are contained in subsections of its genome known as genes. A more
specific current definition presented by Pesole (Pesole 2008) defines a gene as a
discrete genomic region whose transcription is regulated by one or more promoters
and distal regulatory elements and which contains the information for the synthesis of
functional proteins or non-coding RNAs, related by the sharing of a portion of genetic
information at the level of the ultimate products (proteins or RNAs).
1.4.1 Structure of a gene
As implied by that definition a gene is made up of two distinct parts. These are firstly a
transcribed area, which is the portion of DNA that is actually converted into RNA and
secondly regulatory regions, which can occur either upstream or downstream of the
transcribed region. Regulatory regions within the vicinity of a gene provide recognition
signals for proteins known as transcription factors. These proteins regulate the transcription
rate of a gene by either carrying out the actual transcription, or by binding to DNA and either
promoting or silencing transcription (Maston et al. 2006). As the binding of the proteins to
these regions provides this functionality, the regions are known as transcription factor
binding sites.
Figure 1.1: General structure of a gene. Adapted from (Maston et al. 2006).
1.4.1.1 Regulatory region of a gene
A typical regulatory region associated with a gene consists of a promoter element and distal
regulatory elements (Maston et al. 2006). The promoter element consists of a core promoter
and proximal promoter elements and typically spans less than 1 kb (kilobase) (Maston
et al. 2006). The core promoter of a gene is the region of DNA at which the proteins
primarily responsible for transcription bind and initiate the process of transcription. Well-studied elements of the eukaryotic core promoter include the TATA box and the initiator or
Inr sequence (Brown 2006; Strachan and Read 2004). The TATA box generally has a
consensus sequence of 5′-TATAWAW-3′ where W is A or T (Brown 2006). The Inr
sequence has a consensus 5′-YYCARR-3′, where Y is C or T, and R is A or G (Brown 2006).
The TATA box and Inr sequence are present upstream of a large number of
eukaryotic genes. Most of the elements of the core promoter are
comprised of near identical DNA sequences.
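Consensus sequences written with ambiguity codes, such as those given above for the TATA box and the Inr sequence, translate directly into pattern searches. The following Python sketch is illustrative only and was not used in this work; the helper names and the example sequence are hypothetical, and only the three ambiguity codes defined in the text (W, Y and R) are supported.

```python
import re

# Ambiguity codes as defined in the text: W = A or T, Y = C or T, R = A or G.
AMBIGUITY = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "Y": "[CT]", "R": "[AG]",
}

TATA_BOX = "TATAWAW"  # consensus from the text (Brown 2006)
INR = "YYCARR"        # consensus from the text (Brown 2006)


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Compile a consensus string into a regular expression.

    A lookahead is used so that overlapping matches are also reported.
    """
    body = "".join(AMBIGUITY[code] for code in consensus)
    return re.compile("(?=(" + body + "))")


def find_motif(seq: str, consensus: str) -> list:
    """Return the 0-based start positions of every match, reading 5'->3'."""
    return [m.start() for m in consensus_to_regex(consensus).finditer(seq.upper())]


# Hypothetical sequence constructed for this example.
example = "GGCTATAAAAGGCCTCAGACC"
print(find_motif(example, TATA_BOX))  # [3]
print(find_motif(example, INR))       # [13]
```

A match of this kind only shows that a sequence is compatible with the consensus. It does not by itself show that the site is a functional promoter element.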
The proximal promoter is generally located a few hundred base pairs upstream of the
core promoter element (Maston et al. 2006). This region of DNA typically contains binding
sites for other proteins, which contribute to the transcription of the gene but are not the
primary mechanism (Maston et al. 2006).
Distal regulatory regions tend to be further away from the transcribed portion of the gene and
contain elements that either activate or repress the transcription of the gene. Elements that
activate transcription are known as enhancers and conversely elements that repress it are
known as silencers (Raab and Kamakaka 2010).
1.4.2 Transcription
A family of enzymes known as RNA polymerases carry out the process of transcription of
DNA into RNA in eukaryotic cells (Brown 2006). This process is known as transcription as
the fundamental chemical language is not changed (Alberts 2002). There are three RNA
polymerases typically encoded by the eukaryotic genome (Strachan and Read 2004). RNA
polymerase I and RNA polymerase III tend to transcribe genes which code for functional
RNA molecules, while RNA polymerase II is generally utilised for the production of RNA
which is further translated into a protein (Alberts 1998). Transcription proceeds via the
following general steps (Brown 2006):
A protein known as TATA binding protein (TBP) binds to the TATA box sequence.
This causes a bend in the DNA molecule.
This bend provides a recognition signal for other transcription factors to bind to the
DNA creating a structure known as the preinitiation complex (PIC) (Brown 2006).
The formation of the PIC also disrupts base pairing thus creating a single stranded
DNA template from which the RNA molecule is synthesised.
RNA polymerase binds to the PIC and then moves along the single strand of DNA
creating a complementary RNA molecule that conforms to base pairing rules. This
RNA molecule is known as the primary transcript.
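The final step above, synthesis of an RNA molecule complementary to the single-stranded DNA template, can be sketched in a few lines of Python. This is a simplified illustration rather than a model of the enzymatic process; the function name transcribe is hypothetical, and the template is assumed to be supplied in the 3′→5′ orientation so that the returned RNA reads 5′→3′.

```python
# Base pairing between a DNA template base and the RNA base added opposite it.
# Uracil (U) takes the place of thymine in RNA.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_3_to_5: str) -> str:
    """Return the primary transcript for a DNA template strand.

    The template is assumed to be written 3'->5' (an assumption of this
    sketch), so the RNA produced base by base is already in 5'->3' order.
    """
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


print(transcribe("TACGGT"))  # AUGCCA
```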
Figure 1.2: Starting positions for possible ORFs within a double stranded DNA molecule.
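The starting positions in Figure 1.2 can be enumerated computationally: reading in triplets from offsets 0, 1 and 2 on each of the two strands gives six possible reading frames. The Python sketch below is an illustration under that assumption only; the function names are hypothetical and no attempt is made to locate start or stop codons.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def six_frames(seq: str) -> dict:
    """Split a double-stranded sequence into its six reading frames.

    Keys are (strand, offset) pairs, where strand is "+" for the given
    strand and "-" for its reverse complement. Values are lists of
    complete triplets; trailing bases that do not fill a triplet are dropped.
    """
    frames = {}
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[(strand, offset)] = [
                s[i:i + 3] for i in range(offset, len(s) - 2, 3)
            ]
    return frames


frames = six_frames("ATGAAATAG")
print(frames[("+", 0)])  # ['ATG', 'AAA', 'TAG']
print(frames[("-", 0)])  # ['CTA', 'TTT', 'CAT']
```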
1.4.2.1.3 Exons/Introns
ORFs as discussed above are subsections of the primary transcript or pre-mRNA molecule.
ORFs are interrupted within pre-mRNA by sections known as introns (Brown 2006). The
sections of the ORF thus separated by the introns are known as exons (Brown 2006). Thus in
order to produce a molecule containing the full, uninterrupted ORF it is necessary to excise
the introns and splice the exons together as shown in Figure 1.4.
It is not necessary however for all the exons within a given ORF to be utilised (Brown 2006)
as shown in Figure 1.5. Different permutations of exons can be created to produce different
protein molecules. This process is known as alternative splicing and is responsible for the
disparity between the number of genes within a eukaryotic genome and the number of
proteins it is capable of producing (Strachan and Read 2004). Alternative splicing is a feature
of higher eukaryotes and contributes to overall protein diversity (Black 2003). Estimates of
how many human gene products are alternatively spliced include 60% (Black 2003) and 74%
(Johnson et al. 2003).
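The relationship between exons, introns and alternatively spliced products can be made concrete with a small Python sketch. It is a deliberate simplification and not a description of any method used in this thesis: exon coordinates are assumed to be known half-open intervals, the first and last exons are always retained, exon order is preserved, and the names splice and isoforms are hypothetical.

```python
from itertools import combinations


def splice(pre_mrna: str, exons: list, keep=None) -> str:
    """Join the selected exons, discarding everything between them.

    `exons` is a list of (start, end) half-open intervals in transcript
    order. `keep` lists the indices of the exons to retain; by default
    all exons are used.
    """
    chosen = exons if keep is None else [exons[i] for i in keep]
    return "".join(pre_mrna[start:end] for start, end in chosen)


def isoforms(pre_mrna: str, exons: list) -> list:
    """Enumerate products that skip any subset of the internal exons.

    Assumes at least two exons. Real alternative splicing is far more
    constrained; this only illustrates why one gene can yield several
    distinct products.
    """
    internal = range(1, len(exons) - 1)
    products = []
    for size in range(len(internal) + 1):
        for subset in combinations(internal, size):
            keep = [0, *subset, len(exons) - 1]
            products.append(splice(pre_mrna, exons, keep))
    return products


# Hypothetical transcript: exons in upper case, introns in lower case.
pre = "AAAguccagCCCguuuagGGG"
exon_coords = [(0, 3), (9, 12), (18, 21)]
print(splice(pre, exon_coords))    # AAACCCGGG
print(isoforms(pre, exon_coords))  # ['AAAGGG', 'AAACCCGGG']
```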
1.4.2.2 Post Transcriptional processing (cont)
Having discussed the necessity of posttranscriptional modification, it is now possible to
move on to the mechanisms by which splicing is carried out, as well as covering other
elements of posttranscriptional processing.
1.4.2.2.1 RNA splicing
As mentioned above the primary transcript or pre-mRNA is treated so as to excise intronic
sequences and splice together exonic sequences. In order for this process to occur a necessary
first step is the recognition of the borders between exons and introns. These areas are known
as splice junctions (Strachan and Read 2004). It has been observed in a large number of cases
that introns in pre-mRNA commence with the sequence GU and end with the sequence AG
(Strachan and Read 2004). These dinucleotides are not in themselves sufficient to signal a
splice junction (Strachan and Read 2004) as splice junctions have been observed to show a
greater degree of conservation (Breathnach et al. 1978). In vertebrates the following motifs
have been observed at splice junctions (Brown 2006).
In these consensus sequences the " symbol indicates the border between an exon and
intron or vice versa (Brown 2006). Py indicates that the nucleotide is a pyrimidine and N
indicates that any nucleotide could be present at this position (Brown 2006). In addition to
the conserved sequences at splice junctions, introns also contain a conserved sequence around
40 bp from the 3' end of the intron, known as the branch sequence (Strachan and Read
2004). A large RNA-protein complex known as the spliceosome carries out the
process of RNA splicing (Strachan and Read 2004). The spliceosome is one of the
largest molecular machines in the human cell, containing ~170 distinct proteins (Valadkhan
and Jaladat 2010).
The process of RNA splicing typically involves the following sequence (Brown 2006;
Strachan and Read 2004):
Cleavage of the 5' splice junction, detaching the exon from the intron at one end.
Attachment of the cleaved 5' end of the intron to the branch sequence, forming a lariat-like
structure.
Removal of the intronic lariat-like RNA structure and ligation of the two
exons.
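The net result of the steps above can be illustrated with a short sketch. This is not a model of the spliceosome: the function name `splice`, the toy transcript and the intron coordinates are assumptions made for the example, and the GU...AG check encodes only the necessary but not sufficient boundary rule described earlier.

```python
def splice(pre_mrna, introns):
    """Remove introns, given as (start, end) half-open index pairs, and join the exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        # Most introns begin with GU and end with AG; reject anything else.
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} lacks GU...AG boundaries")
        exons.append(pre_mrna[position:start])  # exon preceding this intron
        position = end
    exons.append(pre_mrna[position:])  # final exon
    return "".join(exons)


# Hypothetical toy transcript: exon 1 + intron + exon 2
toy_transcript = "AUGGCC" + "GUAAGUCCCUAACAG" + "UUUUAA"
mature = splice(toy_transcript, [(6, 21)])
```

Passing a different list of intron coordinates for the same transcript would mimic the exon skipping involved in alternative splicing.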
1.4.2.2.2 Capping
Another step in the posttranscriptional modification of transcripts of protein-coding genes is capping. This
process is the first step in posttranscriptional processing of eukaryotic pre-mRNAs (Alberts
2002). It entails the addition of a methylated nucleoside (a nucleoside is a molecule
consisting of a deoxyribose or ribose sugar bound to a nitrogenous base (Brown 2006)) to the
5' end of the transcript (Strachan and Read 2004). This process protects the
transcript from rapid degradation via ribonuclease digestion (Strachan and Read 2004).
1.4.2.2.3 Polyadenylation
After the termination of transcription, the primary transcript is also modified via the addition of
about 200 adenosine nucleotides to the 3' end of the transcript (Alberts 2002). This structure
is known as a poly-A tail. The process is thought to facilitate the transport of the mature
mRNA molecule into the cytoplasm (Strachan and Read 2004).
1.4.3 Translation
After a protein-coding gene has been transcribed and its transcript processed, the mature mRNA
migrates to the cytoplasm, where a process known as translation occurs. This process
entails the production of a polypeptide chain that is specified by the transcript via the genetic
code. The mature mRNA molecule is not synonymous with an ORF (Strachan and Read
2004). Generally an ORF is a subsection within the mature transcript. The ORF is flanked by
sequences known as the 5' UTR and 3' UTR (UTR = untranslated region) (Brown 2006), as
illustrated in Figure 1.6.
The larger subunit is known as the 60S subunit and consists of three different types of
ribosomal RNA (rRNA) molecule and up to 50 ribosomal proteins (Strachan and Read 2004).
The smaller subunit is known as the 40S subunit and contains a single rRNA molecule and
over 30 ribosomal proteins (Strachan and Read 2004). The two subunits of the ribosome exist
as separate entities and attach for the process of translation.
The other molecule that provides the physical basis for the implementation of the
genetic code is transfer RNA (tRNA). tRNA has a secondary structure consisting of four
double helical structures, as illustrated in Figure 1.7. tRNA attaches to an amino acid at its 3'
end. The anticodon arm of the tRNA molecule has a triplet sequence which is
complementary to the codon of the amino acid to which it is bound. Thus tRNA links
codons to their corresponding amino acids.
The process of translation typically proceeds via the following steps (Strachan and Read
2004):
The two subunits of the ribosome attach to each other and also to a mature mRNA
molecule at the methylated cap at the 5' end.
The next tRNA, corresponding to the next codon, will then enter the ribosome.
The amino acid attached to the first tRNA will detach from the tRNA and attach
to the amino acid attached to the 3' end of the second tRNA.
When a stop codon is encountered, an enzyme known as a release factor causes the
ribosome to dissociate and release the protein molecule.
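The decoding logic behind these steps can be sketched as follows. The sketch uses the standard genetic code and makes two simplifying assumptions that are not claims about the cellular mechanism: translation begins at the first AUG in the sequence and proceeds codon by codon until a stop codon. The function name `translate` and the toy mRNA are illustrative only.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): amino_acid
               for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon: the chain is released
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical mRNA: 5' UTR + AUG GCC UUU UAA + 3' UTR
peptide = translate("GG" + "AUGGCCUUUUAA" + "GG")
```

The untranslated flanking bases in the toy sequence play the role of the 5' UTR and 3' UTR described earlier in this section.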
1.5 Genomics
The term genome can be defined as the entire genetic complement of a living organism
(Brown 2006). The field of study around ascertaining information about the genome of a
living organism is thus known as genomics. The primary step of any full genomic study is the
determination of the DNA sequence of the genome of the organism in question. Once this has
been determined the next step is annotating the sequence.
1.5.1 Genome annotation
The full genome of an organism is generally a mosaic of functional and non-functional
elements. The percentage of an organism's genome that is functional is variable. In the case
of the human genome it has been calculated that potentially between 2.56% and 3.25% is
functional (Lunter et al. 2006).
Functional elements in a genome include:
Genes.
CpG Islands: These are stretches of DNA rich in the dinucleotide CG. These areas of DNA
are subject to methylation, which is a form of epigenetic control over gene
transcription (Kawaji and Hayashizaki 2008).
Genome annotation can be described as the systematic location of these functional
elements within a genome sequence (structural annotation) and the ascertainment of that
function (functional annotation). The location of functional elements is based on the principle
of sequence specifying function. Thus the sequence of a functional element will vary in some
detectable way from the remainder of the background sequence.
1.5.2 Genome and cDNA assembly
The initial challenge after the generation of sequencing data is that the output of DNA
sequencing generally consists of reads of short stretches of DNA. These reads range in length from
> 700 bp for Sanger sequencing (Hert et al. 2008), through ~200 bp for pyrosequencing
(Sundquist et al. 2007), down to ~50 bp for ligation-based sequencing methods
(McKernan et al. 2009).
These short reads have to be assembled into a full sequence for the whole genome.
This process is known as contig assembly. Contig assembly is carried out through scanning a
set of short reads for overlaps. The discovery of an overlap indicates that two fragments are
contiguous and should be connected. This process is necessary both at the level of the full
genome and at the level of the individual gene (Wang et al. 2005a).
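The overlap principle can be illustrated with a naive greedy sketch. It assumes error-free reads taken from a single strand and a minimum overlap of three bases, both of which are simplifications chosen for the example; the function names are hypothetical, and real assemblers use far more sophisticated data structures and error models.

```python
def overlap(a, b, min_length=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if too short)."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads, min_length=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    length = overlap(a, b, min_length)
                    if length > best[0]:
                        best = (length, i, j)
        length, i, j = best
        if length == 0:
            break  # no overlaps remain; the reads stay as separate contigs
        merged = reads[i] + reads[j][length:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


# Hypothetical short reads covering one stretch of sequence
contigs = greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"])
```

Each pass is quadratic in the number of reads, which is acceptable for a toy example but not for genome-scale data.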
1.5.3 Gene detection
Given a fully sequenced and assembled genome lacking annotation there are a number of
computational techniques available to delineate coding sequence. These can be divided into
two main subtypes: extrinsic and intrinsic (Borodovsky et al. 1994). Extrinsic methods utilise
comparisons of sequence data to an external reference point while intrinsic methods evaluate
sequences based on properties that are internal to the sequence (Borodovsky et al. 1994).
Construction of a cDNA library is one of the standard methods of extrinsic gene
detection. cDNA stands for complementary DNA and is created through application of an
enzyme known as reverse transcriptase to mature mRNA. Reverse transcriptase as the name
implies reverses the process of transcription and creates a DNA strand complementary to the
single stranded mRNA. Further steps are then taken in order to create a double stranded DNA
molecule (Strachan and Read 2004).
A library of cDNA sequences is compiled through the collection of mRNA molecules
from cells under various experimental conditions. This RNA is then converted to cDNA
using the enzyme reverse transcriptase. The resultant cDNA is then amplified using the
polymerase chain reaction (PCR) (Mount 2004) and then sequenced. The library of sequences
thus generated corresponds to the sequence of protein coding genes within the genome minus
the introns. These sequences are then systematically mapped onto the genomic sequence
using local alignment algorithms. This technique is known as cis-alignment. There are a
number of local sequence alignment programs that can be used to carry out these alignments. Exonerate
is one such program. It utilises a bounded dynamic programming approach (Slater and Birney
2005) to generate local alignments. Dynamic programming is discussed in more detail later in
this chapter. Another program that can be utilised is Spidey (hosted by the NCBI). This
program employs the Blast heuristic algorithm (Altschul et al. 1990) to generate its
alignments. SIM4 is another program that utilises an algorithm based on Blast but tailored to
the specific problem of mapping cDNA to genomic DNA by factoring in introns and
potential sequencing errors (Florea et al. 1998).
The Ensembl automatic genome annotation system (Curwen et al. 2004; Potter et al.
2004) uses the algorithm GeneWise (Birney et al. 2004) to map cDNA to full genomic data
and the algorithm GenomeWise (Birney et al. 2004) to create a final putative structure for the
gene in question after the initial alignment. Cis-alignment can be considered to be one of the
most reliable methods for protein coding gene detection/prediction (Brent 2008).
In cases where cDNA libraries are not available or incomplete for the organism under
consideration it is also possible to use cDNA sequences of homologous genes from either the
same species or a different species in order to detect coding sequence. This technique is also
referred to as trans-alignment and is central to various gene prediction tools (Brent 2008).
The GeneWise algorithm (Birney et al. 2004) is also used in this context by the Ensembl
pipeline (Potter et al. 2004). Extrinsic methods for genome annotation are far more cost and
labour intensive than the strictly in-silico intrinsic approach.
Intrinsic approaches to gene detection are predominantly computational and as such
require an explicit definition/description in order to delineate between coding and non-coding
sequence (Picardi and Pesole 2010). Picardi (Picardi and Pesole 2010) gives a good working
definition of a gene for detection purposes, which defines a gene as a transcribed region of
DNA whose expression is regulated by cis acting elements such as upstream promoters.
Examples of tasks undertaken as a part of intrinsic gene detection include:
detection drastically reduces the search space for potential genes in the case of
prokaryotes.
Promoter region detection: Genes are typically associated with one to several
promoter regions. In prokaryotes these include the upstream Pribnow box with the
consensus sequence TATAAT. This sequence is homologous to the eukaryotic
TATA box (Berg et al. 2007). Detection of these motifs within a sequence upstream
of an ORF strengthens the case for a potential gene.
Internal splice junction detection: As the sequence of exon-intron borders is broadly
conserved, discovery of splice junctions can also contribute to the case for a
prospective gene.
These features can be detected within a stretch of sequence using various
techniques to model sequence motifs, ranging from simple regular expressions to hidden
Markov models and position weight matrices (Picardi and Pesole 2010). Examples of specific
applications of the intrinsic approach to gene prediction include SNAP (Korf 2004) and
Genscan (Burge and Karlin 1997), both of which utilise Markov models in order to detect
delineating features of genes. The primary weakness of the intrinsic approach lies in the fact
that it requires a representative sample of protein coding genes specifically from the
organism under consideration in order to operate (Aubourg and Rouze 2001).
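As an illustration of one of the motif models mentioned above, the following sketch builds a position weight matrix from a handful of aligned sites and scans a sequence for the best-scoring window. The training sites, the pseudocount and the uniform background frequency are all assumptions made for the example; they are not taken from SNAP, Genscan or any real promoter data set.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds position weight matrix from equal-length aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for position in range(len(sites[0])):
        column = [site[position] for site in sites]
        pwm.append({base: math.log2(((column.count(base) + pseudocount) / total)
                                    / background)
                    for base in "ACGT"})
    return pwm


def best_hit(sequence, pwm):
    """Return (score, start index) of the highest-scoring window."""
    width = len(pwm)
    scores = [(sum(pwm[k][sequence[i + k]] for k in range(width)), i)
              for i in range(len(sequence) - width + 1)]
    return max(scores)


# Hypothetical aligned sites resembling the TATAAT consensus
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT", "GATAAT"]
pwm = build_pwm(sites)
score, position = best_hit("GGCGCTATAATGCGC", pwm)
```

Unlike a regular expression, the matrix tolerates mismatches at weakly conserved positions while penalising them at strongly conserved ones.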
1.5.4 Functional annotation of genes
After a putative gene has been identified the next stage is determination of the exact
biological role of the product coded for. This process can be carried out computationally or
by entirely laboratory based techniques.
1.5.4.1 Laboratory based techniques
Laboratory based techniques for determination of biological function involve alteration of the
gene in question. This is done either in the organism of study (in the case of prokaryotes, unicellular
eukaryotes and those higher eukaryotes which are deemed suitable) or, for
organisms where modification would be impractical or unethical, such as Homo sapiens,
through alteration of the homologous gene in a model organism. The main model organism of choice
for the study of mammalian gene function is Mus musculus (Kim et al. 2010). The main
alterations that are possible include:
Knockouts: This entails the removal of the gene in order to observe the effects of its
absence. This technique is only effective if the gene in question is not essential to
organism survival and has a visible/measurable effect on phenotype (Moore 1999).
derived functional annotation can be applied to all members. Alignment methods can be
applied at either the gene or the protein level.
There are three primary ways of carrying out pairwise sequence alignments.
Dot matrix analysis: This method entails arranging one sequence horizontally and the
other sequence vertically, starting from the left end of the horizontal
sequence. Matches between the two sequences are then marked with a dot. Areas of
similarity can then be viewed as diagonal lines between the two sequences (Mount
2004).
judged through use of the BLOSUM62 substitution matrix (Mount 2004). Given the
rapid expansion of most of the large sequence databases it is typical to use heuristic
algorithms as a search tool.
Profile Hidden Markov models have been used by Eddy (Eddy 1998) to create a
scoring system, which allows detection of remotely homologous sequences. Hidden
Markov models score the probability of a discrete chain of events based on model
parameters whose values are unknown (Durbin 1998).
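Of the pairwise alignment methods listed above, dot matrix analysis is the simplest to sketch. The example below marks every exact single-character match, without the windowing or thresholding that practical tools apply; the function names and the toy sequences are assumptions made for illustration.

```python
def dot_matrix(seq_a, seq_b):
    """Rows follow seq_b (vertical), columns follow seq_a (horizontal)."""
    return [[a == b for a in seq_a] for b in seq_b]


def render(seq_a, seq_b):
    """Draw the matrix as text, with '*' for a match and '.' for a mismatch."""
    rows = ["  " + seq_a]
    for base, row in zip(seq_b, dot_matrix(seq_a, seq_b)):
        rows.append(base + " " + "".join("*" if hit else "." for hit in row))
    return "\n".join(rows)


# Hypothetical sequences sharing the substring TTAC
matrix = dot_matrix("GATTACA", "CTTACG")
picture = render("GATTACA", "CTTACG")
```

Printing `picture` shows the shared region as a run of matches along a diagonal, which is the visual signal the method relies on.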
Alignment methods can also be applied to the three dimensional structures of protein
molecules as well as sequence (Hasegawa and Holm 2009). This method is potentially useful
in cases where sequence divergence reaches a point where two proteins can no longer be
identified as homologous. However as the rate of structure generation lags behind sequence
generation by a considerable degree this method can only be applied in a small subset of
cases.
Detection of a significant alignment with a gene of known function can be used to attach
the same function to a gene of unknown function. Martin (Martin et al. 2004) used GO terms
(Ashburner et al. 2000) in conjunction with Blast (Altschul et al. 1990) to achieve this with
some success. There is however a danger with alignment based methods of a Chinese
whispers effect. Suppose, for example, that a gene p with known function a displays 90%
identity, using some form of pairwise alignment algorithm, with a gene q of unknown function.
Assigning function a to gene q would seem to be intuitively legitimate. However if gene q
were assigned function a and the process were iterated a number of times, a situation could arise
where a gene x is assigned function a despite having little or no sequence similarity to the
original gene p. Examples of incorrect annotation by automated methods of homology
detection occur in the case of genes where translations of the antisense strand of the coding
region are entered into databases such as GenBank (Linial 2003).
1.5.4.2.2 Genome context methods
The recent proliferation of genome data has made it possible to detect and assign function to
proteins through examination of their genomic context. Genome context methods compare
the context of a gene (i.e. the arrangement of its homologues) across
genomes. Context methods are based on the principle of guilt by association, which
is the hypothesis that genes which show proximity or association by some measure, e.g.
phyletic distribution or chromosomal ordering, are functionally associated (Aravind 2000).
Thus through demonstration of functional association or interaction between one gene/protein
of known function with one of unknown function, the latter entity may be annotated with the
function of the former.
1.5.4.2.2.1 Rosetta stone
The Rosetta stone method or detection of domain fusion was recognised through work by
Marcotte (Marcotte et al. 1999) and Enright (Enright et al. 1999) which showed that sets of
separate proteins in one organism which exist in a unified (fused) homologous form in
another organism are likely to be interaction partners. As fusion events are comparatively
rare and generally affect genes that are tightly functionally coupled this method is effective at
detection of interaction partners (Kensche et al. 2008). However the rareness of these events
lowers the overall coverage of this method.
1.5.4.2.2.2 Gene neighbour
Examination of nine bacterial and archaeal genomes by Dandekar (Dandekar
et al. 1998) showed that the proteins encoded by genes which showed conserved physical
order along a chromosome tended to interact physically.
1.5.4.2.2.3 Interolog detection
The term interolog was introduced by Walhout (Walhout et al. 2000). Given a pair of proteins that
interact in one organism, if both proteins involved in the interaction are conserved in
another organism a similar interaction can be inferred in the second organism. Such a conserved
interaction is known as an interolog. This method
has shown comparable accuracy with large-scale experimental data (Yu et al. 2004b).
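The inference rule can be sketched in a few lines. The sketch assumes that a one-to-one ortholog mapping between the two organisms is already available, which is itself a non-trivial problem; the protein names and the function name `infer_interologs` are hypothetical.

```python
def infer_interologs(interactions, orthologs):
    """Transfer interactions from a source organism to a target organism.

    interactions: iterable of (protein_a, protein_b) pairs observed in the source.
    orthologs: dict mapping a source protein to its ortholog in the target.
    """
    predicted = set()
    for a, b in interactions:
        # Only transfer the interaction if both partners are conserved.
        if a in orthologs and b in orthologs:
            predicted.add(frozenset((orthologs[a], orthologs[b])))
    return predicted


# Hypothetical source interactions and ortholog table
source_interactions = [("yA", "yB"), ("yB", "yC"), ("yC", "yD")]
ortholog_table = {"yA": "hA", "yB": "hB", "yC": "hC"}
predictions = infer_interologs(source_interactions, ortholog_table)
```

Using unordered pairs reflects the fact that a binary interaction has no direction; the pair involving yD is dropped because that protein has no ortholog in the target.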
1.5.4.2.2.4 Phylogenetic profiling
Phylogenetic profiling is a method that operates on the hypothesis that functionally linked
proteins evolve in a correlated manner (Pellegrini et al. 1999). Consider for example a
group of genes/proteins, which exist as a self-contained modular group and are associated
with a particular cellular function. If this associated function was no longer needed by a given
set of organisms the selective pressure to maintain all the genes/proteins within that group
would be lowered thus leading to an eventual correlated cascade of losses for the genes in
question. Genes are primarily lost through pseudogenisation, which is the conversion of a
functional gene to a non-functional copy. This can be caused by mutations that cause the
premature truncation of a transcript through the creation of a premature stop codon or a
mutation in upstream cis-regulatory sequences thus removing the potential for transcription
(Brown 2006). Pseudogenes can also be formed through retrotransposition of mature mRNA
(Graur et al. 1989).
Thus through examination of multiple genomes for correlations in the presence and
absence of proteins potential functional linkages can be detected. A phylogenetic profile is
typically a binary string representing the presence or absence of a homolog of a given
gene/protein. Predictions are made through examination of levels of similarity between these
strings. These predictions are suggestive rather than specific, as it is unclear
what the nature of a functional linkage between two proteins with similar profiles might be.
The relationship could be a direct physical interaction, such as that between subunits involved in
heterodimerisation, or more indirect, such as the link between a transcription factor and the
product of its associated gene.
The first use of phylogenetic profiles to predict functional linkages used Hamming
distance as a metric in order to cluster similar profiles (Pellegrini et al. 1999). The Hamming
distance of two strings can be defined as the number of points at which they differ (Hamming
1950). There have been various extensions and reinterpretations of the method since then
(Ranea et al. 2007). Some of these involved examination of profiles using higher logical
operations to carry out more complex comparisons of profiles (Bowers et al. 2004; Antonov
and Mewes 2008). The method was also applied to protein domains rather than whole
sequences (Pagel et al. 2004b). Work by Ranea utilised domain information from the Gene3D
database to create phylogenetic profiles of the presence and absence of structural domains
within genomes (Ranea et al. 2007). This method thus bypasses the problem of identification
of genes that are functionally homologous by focussing on the presence and absence of
predefined domains within proteins. Chen and Vitkup used examination of correlation
coefficients to measure similarity in phylogenetic profiles (Chen and Vitkup 2006). They
observed that the method was successful in identifying genes that were members of the same
metabolic pathways (Chen and Vitkup 2006).
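The Hamming distance comparison of binary profiles described above can be sketched as follows. The gene names and profiles are invented for illustration, with each character recording the presence (1) or absence (0) of a homolog in one of eight hypothetical genomes; this is not a reproduction of the clustering procedure used by Pellegrini et al.

```python
def hamming(profile_a, profile_b):
    """Number of genomes at which two presence/absence profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same set of genomes")
    return sum(a != b for a, b in zip(profile_a, profile_b))


def closest_profiles(query, profiles):
    """Rank all other genes by Hamming distance to the query gene's profile."""
    return sorted((hamming(profiles[query], profile), name)
                  for name, profile in profiles.items() if name != query)


# Hypothetical profiles across eight genomes
profiles = {
    "geneA": "11010011",
    "geneB": "11010010",
    "geneC": "00101100",
}
ranking = closest_profiles("geneA", profiles)
```

Here geneB differs from geneA in a single genome and would be the stronger candidate for a functional link, whereas geneC shows the exact opposite pattern.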
As a tool, phylogenetic profiling could be used to detect errors in genome annotation
through the detection and display of gene absences that are not plausible in closely
related species. A similar approach has in fact been used by Pinney to detect and annotate
enzyme-coding genes in the protist E. tenella (Pinney et al. 2005).
Other extensions to the method that utilise the phylogenetic
relationships of the organisms include work by Barker and Pagel (Barker and Pagel 2005).
This method made use of an explicit phylogeny and ancestral reconstruction over the
phylogeny based on a continuous-time Markov model. The likelihood of a model of
dependent or contingent evolution was compared with the likelihood of a model of
independent evolution over the phylogeny. This method was then further extended by
investigating the effects of constraining the rate at which genes could be acquired over the
phylogeny (Barker et al. 2007).
Other methods of incorporating phylogenetic information included the work by Vert
(Vert 2002), which utilised support vector machines, as well as the work by Cokus (Cokus et
al. 2007), which utilised phylogeny as a heuristic by ordering profiles by the phylogenetic
closeness of the organisms involved.
1.5.4.2.2.5 Comparative methods
Comparing phylogenetic profiles over a phylogenetic tree can be considered to be an
application of the comparative method to traits at the molecular level. The comparative
method is a well-established method in biology (Harvey and Pagel 1991). The fundamental
idea underpinning the comparative method is to ask how the state of one factor (which can be a
trait or environmental condition) influences the state of another in the context of the
topology of a phylogenetic tree (Maddison 1990). Testing for correlations without
considering the phylogeny will detect correlations in gene content based on phylogenetic
relationships rather than functional linkage. For example the set of all genes that are intrinsic
to the class Mammalia will share similar phylogenetic profiles. This does not however
suggest that they are all functionally linked.
There are a number of tests that have been developed in order to test the correlations
in the states of traits over a phylogeny. Ridley (Ridley 1983) developed one of the earliest of
these tests. This test involved the construction of a 2x2 contingency table where the state of
each trait was considered as a categorical variable defined at each node in the tree. The
method assumed the construction of an accurate phylogeny and accurate reconstruction
of the ancestral state at each node within the phylogeny. Ridley's method did not however
differentiate between dependent and independent variables in measuring the significance of a
given set of changes (Maddison 1990). Nor did the method take into account the sequence of
changes in the states of traits (i.e. was a change in state A followed by a change in state B or
vice versa). This makes the results of the method difficult to interpret (Maddison 1990).
Felsenstein (Felsenstein 1985b) developed another test for correlations in traits
over a phylogeny. This test was developed to measure continuous data and modelled changes
over a tree as a Brownian process. Another test for detection of correlations in traits and/or
external environmental conditions was devised by Grafen. This test was a phylogenetically
corrected regression, which did not rely on any form of ancestral reconstruction (Grafen
1989).
Maddison developed a similar test to Ridley's in 1990 (Maddison 1990). It however
did distinguish between dependent and independent variables by defining areas of a phylogeny
to be in state A or state B depending on the state of one of the traits under consideration. The
test then measured how many of the changes in the other trait occurred in the area of the tree
that was in state A compared to how many changes were possible over the whole tree.
One of the issues with the tests described above was the fact that none of them
integrated information on the branch lengths of the phylogeny. This meant that
a change in the state of a given trait was treated as equally likely over any branch of a phylogenetic tree
regardless of its length. However a change is clearly less likely on a short branch than on a
longer branch. Work by Pagel took this into account by integrating branch lengths into a test
for correlated evolution (Pagel 1994). The parameters defined by this work were utilised by
Barker and Pagel in their approach to phylogenetic profile analysis (Barker and Pagel 2005).
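The effect of branch length can be illustrated with the transition probability of a two-state continuous-time Markov model. This is a generic textbook result used here only to show the dependence on branch length; it is not a reproduction of the test in Pagel (1994), and the rate values and function name are assumptions chosen for the example.

```python
import math


def transition_probability(rate_gain, rate_loss, branch_length, start_state):
    """Probability that a binary trait ends a branch in the opposite state.

    rate_gain is the 0 -> 1 rate and rate_loss the 1 -> 0 rate of a
    two-state continuous-time Markov model.
    """
    total = rate_gain + rate_loss
    decay = 1.0 - math.exp(-total * branch_length)
    if start_state == 0:
        return (rate_gain / total) * decay  # absent at the start, present at the end
    return (rate_loss / total) * decay      # present at the start, absent at the end


# Hypothetical rates; compare a short branch with a long one
short_branch = transition_probability(1.0, 1.0, 0.1, start_state=0)
long_branch = transition_probability(1.0, 1.0, 2.0, start_state=0)
```

With equal rates the probability rises from near zero on very short branches towards one half on very long ones, which is the behaviour that tests ignoring branch lengths cannot capture.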
1.5.4.2.2.6 Mirror trees
Another method of detecting potential protein interactions is known as mirror trees. This
method involves the detection of protein interactions through the construction and
comparison of phylogenetic trees of proteins within a single genome (Pazos and Valencia
2001). The rationale behind this method is similar to that of phylogenetic profiling. However,
correlation is sought not in the presence and absence of homologous genes but in the pattern
of sequence evolution of interacting proteins. Trees are compared by examining the distance
matrices of homologous sequences for correlations. These matrices are the inputs used in the
construction of the trees in question. The phylogenetic tree of any given protein in a genome
will however carry signal from the speciation events which shaped the genome of the
organism in question. An extension of the method has been developed to take this
background similarity into account (Pazos et al. 2005). Hakes and others have however pointed out that
the evolutionary pressures as well as the functional constraints on duplicated genes differ
depending on whether the mechanism of duplication was whole genome duplication or
small-scale duplication (Hakes et al. 2007). This indicates that sequence divergence and functional
evolution are not necessarily correlated (Robertson and Lovell 2009). Thus any similarity in
the phylogenetic trees of functionally linked genes is more likely to be due to chance or,
as mentioned above, to background similarity.
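The core comparison in the mirror tree approach, a correlation between two distance matrices, can be sketched as below. The sketch assumes that both matrices list the same species in the same order and uses a plain Pearson correlation with no correction for the background speciation signal discussed above; the matrices and function names are invented for illustration.

```python
import math


def upper_triangle(matrix):
    """Flatten the pairwise distances above the diagonal into a list."""
    size = len(matrix)
    return [matrix[i][j] for i in range(size) for j in range(i + 1, size)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


def mirror_tree_score(dist_a, dist_b):
    """Correlation between the distance matrices of two protein families."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Hypothetical distance matrices for two protein families over four species
dist_a = [[0.0, 0.2, 0.5, 0.9],
          [0.2, 0.0, 0.4, 0.8],
          [0.5, 0.4, 0.0, 0.6],
          [0.9, 0.8, 0.6, 0.0]]
dist_b = [[0.0, 0.3, 0.6, 1.0],
          [0.3, 0.0, 0.5, 0.7],
          [0.6, 0.5, 0.0, 0.8],
          [1.0, 0.7, 0.8, 0.0]]
score = mirror_tree_score(dist_a, dist_b)
```

A high score on its own would not distinguish co-evolution from shared speciation history, which is the limitation raised in the paragraph above.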
1.5.5 Storage of functional information
With the exponential increase in sequence data generated through the 2000s
there have been a number of attempts to organise and contextualise functional
information surrounding genomic entities.
1.5.5.1 GO
A notable attempt to do this has been the establishment of a controlled vocabulary with which
to describe the functional role of a gene as well as its physical location within the cell. The
vocabulary is known as the Gene Ontology (GO) (Ashburner et al. 2000). GO associates a set
of terms with gene products. These terms are known as GO terms and fall into three general
domains. These are
Cellular component: This is the physical location within the cell where the gene
product is generally to be found.
Biological process: This is the biological pathway or process that the gene product has
been localised in.
Molecular function: This operates at a lower level than the biological process domain and
describes the specific molecular capabilities of the molecule in question. An example
of a molecular function could be the ability to bind a particular metal.
Terms are organised as a network starting from the root terms defined above. As the
network is traversed starting from a root term, terms become more specific, i.e. if term B
is directly below term A in the ontology then term B is a subclass of term A.
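The subclass structure can be sketched as a mapping from each term to its parents, from which all more general terms are recovered by walking upwards. The fragment below is a toy hierarchy whose term names are illustrative and are not claimed to match the current ontology exactly; the function name `ancestors` is likewise an assumption.

```python
def ancestors(term, parents):
    """Collect every more general term reachable from the given term."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Toy fragment of a molecular function hierarchy (child -> list of parents)
toy_parents = {
    "zinc ion binding": ["metal ion binding"],
    "metal ion binding": ["ion binding"],
    "ion binding": ["binding"],
    "binding": ["molecular_function"],
}
general_terms = ancestors("zinc ion binding", toy_parents)
```

Because each term maps to a list of parents, the same traversal works when a term has more than one parent, as in a network rather than a strict tree.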
1.5.5.2 KEGG
Another database that localises gene products within functional pathways is KEGG (Kyoto
Encyclopedia of Genes and Genomes) (Kanehisa 1997; Kanehisa et al. 2006). KEGG
maintains a list of functional pathways for processes that occur within the cell. These
processes are arranged in a similar manner to GO in that they start from general categories
and become more specific.
1.6 Transcriptomics
The transcriptome of a cell can be considered to be the sum total of the portions of its genome that are
transcribed into RNA. Studying the transcriptome can also yield insights into the
functionality of gene products.
1.6.1 Microarrays
At the transcriptomic level the putative function of a gene can be at least partially determined
through establishing the association of the expression of a particular gene with a particular
external condition or treatment. This can be achieved through the use of glass slides known
as microarrays (Mount 2004). These slides have oligonucleotides attached to them, each of which
is a subsection of one of a set of genes. Cells of the organism under study are subjected to different
experimental conditions. mRNA is then extracted from these cells, converted to cDNA and
labelled with a distinct fluorescent dye for each condition. By examining the relative degrees of fluorescence for the
colours associated with the two versions of the cDNA of the gene of interest it is possible to
measure levels of gene expression in response to a given experimental condition. A variant of
this involves using full cDNA molecules as the contents of the chip.
1.6.2 Other methods for transcriptome examination
Expression levels for a given environmental condition can also be measured through direct
sequencing and counting, using SAGE (serial analysis of gene expression). In
this method mRNA is extracted from the cells of interest. A small section is excised from
each mRNA molecule. A tag is then connected to each separate subsection. These
subsections are then amplified and the tags counted thus providing a measure of gene
expression levels (Velculescu et al. 1997). Another protocol for sequencing mRNA to detect
gene expression levels has also been developed. This protocol is known as RNA-Seq and is
made feasible through the utilisation of the high throughput nature of next generation
sequencing (Wang et al. 2009b).
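The counting step shared by these sequencing-based approaches can be sketched as follows. The tag sequences and the tag-to-gene lookup table are hypothetical, and the sketch assumes that each tag identifies exactly one gene, which does not always hold in practice.

```python
from collections import Counter


def count_tags(tags, tag_to_gene):
    """Count sequenced tags per gene, ignoring tags that match no known gene."""
    counts = Counter()
    for tag in tags:
        gene = tag_to_gene.get(tag)
        if gene is not None:
            counts[gene] += 1
    return counts


# Hypothetical tag lookup table and sequenced tags
tag_lookup = {"CATGAAAC": "geneA", "CATGTTTG": "geneB"}
sequenced = ["CATGAAAC", "CATGTTTG", "CATGAAAC", "CATGCCCC", "CATGAAAC"]
expression = count_tags(sequenced, tag_lookup)
```

The resulting counts are relative measures: comparing them between conditions requires the same sequencing depth or an explicit normalisation step.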
1.7 Proteomics
Proteomics, in a similar way to genomics and transcriptomics, is the study of the full protein
complement produced by a cell. The proteomic level is the point where the connection
between macromolecules and measurable phenotypes is first made. Proteins can be
considered as making up close to the totality of both structural (e.g. microtubules) and active
(e.g. enzymes) components of a cell. The function of a protein can be determined by the
determination of its structure and/or the determination of its interaction partners.
1.7.1 Protein Structure
There are two main methods utilised to determine the three dimensional structure of a protein
molecule (Brown 2006). These are:
X-Ray crystallography: This procedure involves the production of a crystal from the
protein of interest. X-rays are then fired through this crystal to acquire a backscatter
diffraction pattern. This diffraction pattern can then be used to reconstruct the
structure of the protein. X-ray crystallography is limited by the fact that it requires the
protein to be able to crystallise (Brown 2006).
The other form of interaction between proteins is the indirect interaction. An example
could be two proteins that have a role in a given metabolic pathway but whose production is
temporally and spatially separated. A specific example is the interaction
between SHC-transforming protein and mitogen-activated protein kinase 1 over several steps
of the insulin-signalling pathway (Sasaoka and Kobayashi 2000).
The full collection of all protein interactions within a cell has been labelled the interactome.
1.7.2.1 Experimental detection of protein interactions
Protein interactions can be detected using a variety of techniques. The main techniques
include:
Yeast two-hybrid: The yeast two-hybrid technique is a widely used method for detecting
protein interactions (Marcotte et al. 1999). This technique exploits the S. cerevisiae
GAL4 transcription factor, which has two domains that require physical proximity in
order to operate. One of these domains binds DNA and the other activates
transcription. A protein interaction can be detected by fusing the two genes of interest
to these domains, one gene per domain, on separate plasmids and inserting these
plasmids into a yeast cell that carries a reporter gene downstream of the GAL4 binding
site. Reporter gene transcription is only possible if the protein products of the two
genes of interest are able to maintain a physical interaction (Griffiths 2002). The
primary drawbacks of this method are that all interactions must take place in the
nucleus, which removes a large number of proteins from their native cell compartment,
and that only binary protein interactions can be tested (von Mering et al. 2002). The
yeast two-hybrid method also has a high rate of false positives. One reason for this is
that pairs of proteins that bind to each other in the assay are not necessarily ever
expressed at the same time or in the same tissue (Vidalain et al. 2004). In addition,
some proteins, such as heat shock proteins, are inherently promiscuous in their binding
(Vidalain et al. 2004).
Proteome chips: In a manner similar to the use of microarrays for the measurement of
gene expression levels described above, microarrays can also be used with proteins.
By printing the translated products of 5800 ORFs from S. cerevisiae onto a microarray
chip, Zhu et al. (2001) were able to detect 33 novel interactions for the
multifunctional calcium-binding protein calmodulin. The drawbacks of this method are
that it is low throughput and that it is again restricted to binary interactions.
Mass spectrometry of purified complexes: In order to detect interactions that are not
binary, complexes of proteins can be isolated using techniques such as tandem affinity
purification. This technique entails the tagging of a protein of interest with a tag that
allows the purification of the main protein and any complex partners that it might
have. These complexes can be characterised through the use of mass spectrometry
(von Mering et al. 2002).
al. 2009; Scott and Barton 2007). This system makes novel predictions through the
combination of different informative features.
Chapter 4 describes the construction of the data filter, which is based on Dollo
parsimony. The filter reduces the size of the overall search space, facilitating the use of
the method for whole genome comparisons. This is achieved through the elimination of pairs
of proteins whose function cannot be detected via examination of patterns of presence and
absence.
Chapter 5 presents a network of predictions generated as a putative human
interactome of those proteins that are susceptible to this line of enquiry. This network is
analysed for consistency with known data. A set of novel predictions is presented.
Finally, Chapter 6 sums up this work and presents details of potential future
directions.
Chapter 2
Reconstruction of eukaryotic phylogeny as precursor to comparative
analysis
2.1 Introduction
Examination of the evolutionary histories of organisms is a fundamental step for any form of
study of biological function as adaptation can only be examined within an historical context
(Harvey and Pagel 1991). As a phylogeny is by definition an evolutionary history of species
(Harrison and Langdale 2006) it is a necessary step within the process of a comparative
study. In terms of examination of changes in gene content within a probabilistic framework it
provides the necessary topology over which such changes occur. This is a fundamental
parameter in any such model.
2.1.1 Homology
The fundamental object of any phylogenetic study, whether molecular or morphological, is
the comparison of homologous structures within the organisms under consideration. When
genomic data are under consideration, homologous structures within organisms correspond to
those genomic elements that were present in the last common ancestor of the set of
organisms under consideration. These elements can provide a measure of divergence (Fitch
1970). If functional (which is implied by conservation), these elements can either maintain
their ancestral function or, if sufficiently diverged, have a new (or no) function. In
discussions of elements of genomes (genes) there are a number of subclasses of homologous
relationships. These are:
Orthology: Genetic elements are orthologous if they are the direct product of
divergence from a common ancestral species (speciation) (Fitch 1970).
Paralogy: Genetic elements are paralogous if they are the product of a duplication
event within a given species. Mechanisms of duplication include retrotransposition
(insertion of reverse transcribed RNA back into a genome) and unequal crossover
leading to tandem duplication of a portion of a chromosome (Hurles 2004). It is
thought that these duplication events are a major force in creating and broadening
genetic repertoires (Zhang 2003).
Xenology: Genetic elements are xenologous if they are the product of a direct
exchange of DNA between organisms (Fitch 2000). These exchanges are known to be
far more prevalent in prokaryotes given their lack of a true nucleus and the existence
of plasmids (free floating segments of DNA) in some prokaryotes. Genes have also
been observed as xenologous in eukaryotes. Xenologous genes in eukaryotes can be
acquired via organelles, which are the product of endosymbiosis such as the
mitochondrion and chloroplasts (Blanchard and Lynch 2000).
It is important for the purposes of phylogenetic reconstruction to be able to draw a
distinction between genes that are paralogous and genes that are orthologous. If paralogous
genes are compared between species, the distance between them does not necessarily reflect
the overall genetic divergence between the species under consideration. Genetic elements
that are orthologous provide information on levels of divergence between speciation events,
whereas those that are paralogous provide data on duplication events.
A converse relationship to homology is that of analogy where through convergent
evolution genes that share no common ancestry develop and maintain sequence similarity due
to similar demands placed on the organisms in question by their environment. A classic
example of this at the molecular level is that of the convergent evolution of the enzyme
lysozyme in both the langur monkeys of the Indian subcontinent (Semnopithecus entellus)
and ruminants due to the similar requirements imposed by a herbivorous diet (Swanson et al.
1991).
2.1.2 Molecular evolution
The fundamental idea at the heart of modern biology is that of random mutations guided by
natural selection producing adaptations, which allow an organism to thrive in a given
ecological niche. The large-scale study of evolution at a molecular level has only recently
become possible due to advances in DNA sequencing technologies. This has been extremely
useful, as random mutations occur at the molecular level and DNA and amino acid sequences
are the fundamental comparable common denominator across morphologically and
physiologically diverse species (Nei and Kumar 2000).
At the DNA level there are four basic types of mutation (Nei and Kumar 2000). These are:
Deletions: Deletions are the opposite of insertions. Deletions within an ORF can also
cause a frame shift (Brown 2006).
Substitutions: These mutations are also referred to as point mutations and involve the
substitution of a nucleotide with any other nucleotide. Substitutions do not necessarily
have to involve a single nucleotide (Brown 2006). There are two types of
substitution. Transitions entail the replacement of a purine with another purine,
e.g. A to G, or of a pyrimidine with another pyrimidine, e.g. C to T. Transversions
involve the replacement of a purine with a pyrimidine or vice versa (Nei and Kumar
2000).
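The distinction between transitions and transversions can be illustrated with a minimal Python sketch. The function name and the restriction to the four unambiguous DNA bases are assumptions made for illustration only.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-nucleotide substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        raise ValueError("only A, C, G and T are supported in this sketch")
    if ref == alt:
        raise ValueError("identical bases are not a substitution")
    # Same chemical class (purine to purine, or pyrimidine to pyrimidine): transition.
    if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        return "transition"
    # Otherwise the class changes: transversion.
    return "transversion"


print(classify_substitution("A", "G"))  # transition
print(classify_substitution("A", "C"))  # transversion
```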
In the case of a phylogenetic tree leaf nodes are extant taxonomic units or taxa and
internal nodes are proposed hypothetical common ancestors as illustrated in Figure 2.1. A
subsection of a phylogenetic tree can be referred to as a clade (Nei and Kumar 2000).
Figure 2.1: Sample phylogenetic tree. In this tree the extant taxa are nodes A, B and C
while node E is an ancestral node for A and C.
2.1.3.1 Species trees and gene trees
There are two main types of phylogenetic tree that are commonly investigated. These are:
Species trees: The topology of these phylogenetic trees represents the branching order
of species. Thus internal nodes are hypothetical common ancestors for the nodes that
succeed them. The splits at these ancestral nodes represent speciation events. A
speciation event is considered to be the moment in time when two species became
reproductively isolated from each other (Nei and Kumar 2000).
Gene trees: Gene trees measure the degree of divergence between homologous genes
within and/or across species. Thus internal nodes in a gene tree represent a
hypothetical gene that existed prior to a mutation event that created its two immediate
descendants (Nei and Kumar 2000).
Figures 2.2 and 2.3 illustrate the differences between gene trees and species trees.
$$\prod_{i=3}^{t} (2i - 5) \qquad (1)$$
The number of possible rooted bifurcating topologies B(t) can be counted using the following
formula:
$$B(t) = \frac{(2t - 3)!}{2^{t-2} \, (t - 2)!} \qquad (2)$$
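To give a feel for how quickly tree space grows, the following sketch counts bifurcating topologies for t taxa. It assumes the standard results that there are (2t - 5)!! unrooted and (2t - 3)!! rooted labelled bifurcating trees; the function names are illustrative only.

```python
def count_unrooted(t: int) -> int:
    """Unrooted bifurcating topologies for t >= 3 taxa: product of (2i - 5) for i = 3..t."""
    if t < 3:
        raise ValueError("need at least 3 taxa")
    total = 1
    for i in range(3, t + 1):
        total *= 2 * i - 5
    return total


def count_rooted(t: int) -> int:
    """Rooted bifurcating topologies for t >= 2 taxa: product of (2i - 3) for i = 2..t."""
    if t < 2:
        raise ValueError("need at least 2 taxa")
    total = 1
    for i in range(2, t + 1):
        total *= 2 * i - 3
    return total


for taxa in (4, 10, 20):
    print(taxa, count_unrooted(taxa), count_rooted(taxa))
```

For ten taxa there are already more than 34 million rooted topologies, which is why exhaustive evaluation of every tree quickly becomes impractical.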
Table 2.1: Rates of nucleotide substitution for the Jukes-Cantor model (Nei and Kumar
2000).
The methods of tree estimation that utilise these models of evolution include the
distance method, tree estimation by Bayesian methods, and tree estimation by maximum
likelihood (Felsenstein 2004).
In distance methods an evolutionary model provides a measure of evolutionary
distance between taxa, whereas in probabilistic methodologies such as maximum likelihood
and Bayesian methods it provides a measure of probability for a given set of substitutions
between taxa. Evolutionary models can be derived from a priori assumptions about the
evolutionary process or can be constructed empirically by examining the rate of observed
substitutions in homologous sequences. Examples of empirically calculated substitution
matrices for amino acids include the PAM matrices created in the seminal work by Margaret
Dayhoff (Dayhoff et al. 1978) and, more recently, the WAG (Whelan and Goldman 2001) and
LG matrices (Le and Gascuel 2008).
2.1.4 Detection of homology in molecular data
In order to construct a phylogenetic tree that represents the evolutionary history of a
set of taxa using molecular data, it is necessary to compare homologous sequences. More
specifically, it is necessary to detect orthologous genes/proteins. These genes/proteins
are the most appropriate measure of genetic divergence between species, as an equal level
of genetic divergence will have occurred since the speciation event causing the split.
There are a number of algorithms that are utilised in the selection of homologous
genes/proteins and their subsequent classification as orthologous or paralogous. These
include:
Reciprocal Best Hits (RBH): This procedure is implemented by the COGs (Tatusov et
al. 2003) database hosted by the NCBI. The underlying rationale of the algorithm is
that orthologous genes between two species will possess more similarity with each
other than with any other gene. This similarity is generally established using pairwise
sequence alignment algorithms such as BLAST (Altschul et al. 1990) or the
Smith-Waterman algorithm (Smith and Waterman 1981).
InParanoid: This algorithm extends the idea behind RBHs by using them to seed
orthologous clusters, and then by an application of an iterative inclusion process
constructs a set of gene/protein families (Remm et al. 2001).
OrthoMCL: This process also utilises RBHs as seed pairs for clusters. Similarity
relations between genes/proteins are then represented as a graph and additional
paralogous sequences are determined through a process of graph clustering (Li et al.
2003).
Reciprocal smallest distance (RSD): This procedure does not utilise RBHs and
instead, for a set of hits for a given query protein, over a given E-value (Expect
value), conducts pairwise alignments between each of the hits and the original query.
Hits that are alignable to a given threshold are then subjected to further analysis to
calculate the number of amino acid substitutions or distance between them and the
original query. The hit with the shortest distance is then used to reverse the process. If
the reversal yields the original query then the two sequences are declared orthologous
(Wall et al. 2003).
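The reciprocal best hit idea underlying several of these procedures can be sketched in a few lines of Python. This is an illustration only: the input format (dictionaries of similarity scores keyed by query and subject identifiers) and the gene names are hypothetical, and real pipelines would derive the scores from BLAST or Smith-Waterman output.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict mapping (query, subject) -> similarity score (higher is better).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores for genes of species A searched against species B and vice versa.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50, ("a2", "b2"): 120, ("a3", "b2"): 90}
b_vs_a = {("b1", "a1"): 198, ("b2", "a2"): 118, ("b2", "a3"): 80}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1'), ('a2', 'b2')]
```

In this toy example a3 also has b2 as its best hit, but b2 prefers a2, so a3 is left without an ortholog. This illustrates why the method yields only one-to-one relationships.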
perform better than its rivals (Hulsen et al. 2006). This work showed the InParanoid
algorithm tied as the best performer with simple reciprocal best hits at identification of
orthologs. However, reciprocal best hits in practice only yield one-to-one orthologous
relationships, which reduces the coverage of the method (Hulsen et al. 2006). OrthoMCL
was shown to perform a close second to the InParanoid algorithm in benchmarking tests
(Hulsen et al. 2006). Subsequent benchmarking work (Altenhoff and Dessimoz 2009) showed
that OrthoMCL outperformed InParanoid to an extent, achieving higher coverage at lower
levels of specificity. However, where benchmarking was applied to data and organisms
common to both reviews, the results were seen as broadly congruent (Altenhoff and
Dessimoz 2009).
2.1.5 Multiple sequence alignment
Given a set of orthologous sequences further processing is required in order to convert them
into a suitable input for a phylogenetic tree estimation procedure. This input is known as a
multiple sequence alignment (MSA) (Edgar and Batzoglou 2006). The process involves
creating an optimal alignment between three or more protein sequences. Insertions and
deletions between orthologous proteins are represented by introducing gaps into the
alignment. Alignments are scored through the use of substitution matrices. The process
converts orthologous sequences into a rectangular array where each column of the array
corresponds to a homologous attribute between the taxa under consideration (Edgar and
Batzoglou 2006).
Forms of multiple sequence alignment include:
Iterative: In order to reduce the errors introduced by the progressive approach to MSA,
the iterative approach realigns sub-groups of the sequences repeatedly (Mount 2004).
Examples of iterative MSA programs include MUSCLE (Edgar 2004) and DIALIGN
(Morgenstern et al. 1998). The performance of the iterative approach can be improved
by the inclusion of consistency information between the growing MSA and the
pre-computed pairwise alignments, as used by some of the algorithms within the MAFFT
(Multiple alignment by fast Fourier transform) program (Katoh et al. 2002).
The quality of a multiple alignment is crucial to the accuracy of the phylogenetic tree
created via its analysis (Blair and Murphy 2011). This is especially true when there are
gaps in the alignment (Talavera and Castresana 2007). Benchmarking tests have therefore
been carried out to examine the performance of the various algorithms currently available.
These tests found that MAFFT (running in its iterative, consistency-enhanced mode) using
the Smith-Waterman algorithm (Smith and Waterman 1981) for its initial pairwise alignment
outperformed its nearest rivals (Ahola et al. 2006; Nuin et al. 2006). This mode of MAFFT
is known as MAFFT-L-INS-i.
2.1.5.1 Multiple sequence alignment quality filtration
Given the effects of MSA quality on phylogenetic analysis, it is argued that filtration of
areas that are problematic to align will improve the outcome of subsequent phylogenetic
analyses (Talavera and Castresana 2007). It is common practice to edit MSAs by hand before
analysing them further, though it is considered that the subjectivity of this process makes
all results thus gained irreproducible (Blair and Murphy 2011). This process has therefore
been semi-automated by programs such as Gblocks (Talavera and Castresana 2007) and trimAl
(Capella-Gutierrez et al. 2009). These programs retain sections of MSAs that are highly
conserved and remove gaps in the alignment.
Gblocks will either remove all gapped positions in its stringent mode or, in its relaxed
mode, only remove gapped positions if gaps are present in more than half the sequences in
the alignment (Talavera and Castresana 2007). trimAl will remove columns from an alignment
based on a conservation threshold defined by the user, i.e. how much of the original
alignment the user wishes to conserve (Capella-Gutierrez et al. 2009). In benchmarking
tests, optimum performance for Gblocks in enhancing tree estimation was observed using its
relaxed mode (Capella-Gutierrez et al. 2009).
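The gap-based part of this kind of filtering can be sketched as follows. This is a simplified illustration and not the algorithm of Gblocks or trimAl, both of which also take conservation and block length into account. The function name, the gap character and the threshold parameter are assumptions.

```python
def filter_gappy_columns(alignment, max_gap_fraction=0.5, gap="-"):
    """Drop alignment columns whose fraction of gaps exceeds max_gap_fraction.

    alignment: list of equal-length aligned sequences (strings).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences in an alignment must have the same length")
    n_seqs = len(alignment)
    keep = [
        col for col in range(length)
        if sum(seq[col] == gap for seq in alignment) / n_seqs <= max_gap_fraction
    ]
    return ["".join(seq[col] for col in keep) for seq in alignment]


toy = ["MK-LV", "MKALV", "M--LV", "M--L-"]
print(filter_gappy_columns(toy))                        # gaps tolerated in up to half the rows
print(filter_gappy_columns(toy, max_gap_fraction=0.0))  # every gapped column removed
```

With the default threshold only the third column (three gaps out of four sequences) is removed, whereas a threshold of zero keeps only the two fully ungapped columns.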
2.1.6 Methods to estimate phylogenetic trees
The focus of this section as mentioned above shall be on the analysis of molecular data
though the methods described are applicable to any form of measurable polymorphic trait.
These data provide a measure of distance between the species under consideration.
The first subdivision in types of methods of phylogenetic analysis is between discrete
character state and distance matrix methods (Salemi and Vandamme 2003). Discrete
character state methods examine the differences in state of a set of discrete characters or
traits. Distance matrix methods utilise the distance between sets of data through the
creation of a matrix of pairwise distances and the application of clustering techniques.
Subtypes of the character state method include maximum parsimony, which does not utilise
an explicit model of evolution, and maximum likelihood, which does (Salemi and
Vandamme 2003).
2.1.6.1 Distance methods
Distance methods were originally developed to construct phenograms, i.e. diagrams that
reflect the similarity between a given group of taxa without consideration of
ancestor/descendant relationships (Salemi and Vandamme 2003; Sneath and Sokal 1973), as
opposed to phylogenies. Distance methods can, however, also be applied to elucidating
phylogeny under the assumption of equal rates of mutation in cases where a quick initial
result is required.
Distance methods of phylogeny depend on the construction of a matrix of pairwise
distances for the trait data of the organisms under consideration. These data are generally
nucleotide and/or amino acid sequence data, though the method is also applicable to any
other form of discrete descriptive data. In the case of amino acid or nucleotide data,
distances are estimated according to evolutionary models, which allow a meaningful
calculation of the evolutionary distance between two species.
The simplest form of evolutionary distance measure is the proportion of differing sites
between two sequences, p. This is calculated through a simple count of differing sites,
n_d, and division by the total number of sites, n, as shown in Equation 3 (Nei and Kumar
2000).
$$p = \frac{n_d}{n} \qquad (3)$$
substitutions accumulate per site. Thus, in order to represent this information,
substitutions can be modelled as a Poisson process over time, and the probability of k
mutations over time t can then be calculated with the standard Poisson distribution
function, where λ is the rate of mutations per unit time and e is the base of the natural
logarithm (Nei and Kumar 2000).
$$p(k;t) = \frac{e^{-\lambda t} (\lambda t)^{k}}{k!} \qquad (4)$$
This probability can then be used to calculate a distance between two sequences. This
distance is referred to as the Poisson corrected distance (Nei and Kumar 2000).
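As a small worked illustration, the sketch below computes the proportion of differing sites p for two aligned sequences and then applies a Poisson correction. It assumes the commonly quoted form of the correction, d = -ln(1 - p); the function names and the toy sequences are illustrative only.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    n_d = sum(a != b for a, b in zip(seq_a, seq_b))
    return n_d / len(seq_a)


def poisson_distance(p: float) -> float:
    """Poisson-corrected distance, assuming d = -ln(1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return -math.log(1.0 - p)


p = p_distance("ACDEFGHIKL", "ACDQFGHIRL")  # two differences over ten sites
print(p, poisson_distance(p))
```

The corrected distance is always at least as large as p, reflecting the possibility that more than one substitution has occurred at some sites.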
The Poisson corrected distance assumes a homogeneous rate of mutations /
substitutions over a molecular sequence. This assumption, however, does not hold, as
different areas of a sequence (coding or non-coding in the case of nucleotides, for
example) will be subject to differing selective pressures and hence differing mutation
rates (Nei and Kumar 2000).
This information is integrated into calculations of distance via the observation that
variation in rates of substitution over a sequence follows a gamma distribution (Nei and
Kumar 2000).
Having created a matrix of pairwise distances between the sequences under
comparison, this matrix can then be used to generate a phylogenetic tree via clustering.
A commonly used form of clustering in the generation of distance-based trees is
neighbour joining. This algorithm follows these steps (Brown 2006):
Construction of a fully multifurcating star-shaped tree including all taxa under
consideration.
Selection of a pair of taxa and their removal from the star to form a tree
consisting of a clade containing that pair and a clade containing the rest of
the star, followed by calculation of the total branch length of this tree.
Iteration of this process for all possible pairs, storing the results of the branch
length calculation.
Identification of the pair that yields the interim tree with the shortest total
branch length.
Placement of this pair on its own branch, after which the process is iterated
until a fully bifurcating tree is retrieved.
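The procedure can be sketched in Python as below. Note that this sketch uses the widely used Q-matrix formulation of neighbour joining rather than explicitly building and scoring every interim star tree. The function name, the node labels ("node1", "node2", ...) and the edge-list output format are assumptions made for illustration.

```python
from itertools import combinations


def neighbour_joining(names, matrix):
    """Return an unrooted tree as a list of (node_x, node_y, branch_length) edges."""
    dist = {a: {b: float(matrix[i][j]) for j, b in enumerate(names)}
            for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all others.
        r = {x: sum(dist[x][y] for y in nodes if y != x) for x in nodes}
        # Pick the pair that minimises Q(i, j) = (n - 2) d_ij - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * dist[p[0]][p[1]] - r[p[0]] - r[p[1]])
        counter += 1
        u = f"node{counter}"
        d_ij = dist[i][j]
        len_i = 0.5 * d_ij + (r[i] - r[j]) / (2 * (n - 2))
        edges.append((u, i, len_i))
        edges.append((u, j, d_ij - len_i))
        # Distances from the new internal node to every remaining node.
        dist[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d_uk = 0.5 * (dist[i][k] + dist[j][k] - d_ij)
                dist[u][k] = d_uk
                dist[k][u] = d_uk
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, dist[a][b]))
    return edges


taxa = ["a", "b", "c", "d", "e"]
d = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
for edge in neighbour_joining(taxa, d):
    print(edge)
```

The toy matrix is additive, so the terminal branch lengths recovered (2, 3, 4, 2 and 1 for a to e) reproduce the input distances exactly.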
Another method of tree estimation involving distance matrices is least squares fitting
in which for each tree the residual sum of squares is calculated between pairs of taxa. This
method is known as the Fitch-Margoliash method. This involves applying the following
equation (Nei and Kumar 2000).
$$R_s = \sum_{i<j} (d_{ij} - e_{ij})^{2} \qquad (5)$$
Where d_ij is the observed distance in the matrix between taxa i and j, and e_ij is the
patristic distance between the taxa. The patristic distance between two taxa is the sum of
the branch lengths that make up the shortest path between the two taxa. The tree with the
lowest R_s is selected by the method. Generally tree space is searched using a heuristic
search method as described below in Section 2.1.6.3.
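The quantities involved can be illustrated with a short sketch that computes patristic distances on a tree and sums the squared residuals against an observed distance matrix. It assumes the unweighted form of the residual sum of squares (a weighted variant divides each term by the square of the observed distance). The edge-list tree format and the function names are hypothetical.

```python
def patristic_distance(edges, start, goal):
    """Sum of branch lengths along the unique path between two nodes of a tree."""
    adjacency = {}
    for x, y, length in edges:
        adjacency.setdefault(x, []).append((y, length))
        adjacency.setdefault(y, []).append((x, length))
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, total = stack.pop()
        if node == goal:
            return total
        for neighbour, length in adjacency.get(node, []):
            if neighbour != parent:  # never walk back along the edge just used
                stack.append((neighbour, node, total + length))
    raise ValueError("the two taxa are not connected in this tree")


def residual_sum_of_squares(edges, observed):
    """Unweighted sum over taxon pairs of (observed - patristic) squared."""
    return sum((d_ij - patristic_distance(edges, i, j)) ** 2
               for (i, j), d_ij in observed.items())


# Toy three-taxon tree with a single internal node x.
tree = [("x", "A", 1.0), ("x", "B", 2.0), ("x", "C", 3.0)]
observed = {("A", "B"): 3.0, ("A", "C"): 4.5, ("B", "C"): 5.0}
print(residual_sum_of_squares(tree, observed))  # 0.25
```

Only the A to C pair deviates from its patristic distance (4.5 observed against 4.0 on the tree), giving a residual of 0.25.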
Other standard techniques for this process are clustering methods such as UPGMA
(unweighted pair group method with arithmetic mean), which group organisms by degree
of closeness in the matrix. The underlying assumption of UPGMA is that the evolutionary
process occurs at a consistent pace, i.e. follows a molecular clock (Felsenstein 2004).
Thus, in cases where the data do not follow a molecular clock, UPGMA will deliver
misleading results, as it will cluster species on short branches with each other
(Felsenstein 2004).
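A compact sketch of UPGMA clustering is given below. It is illustrative only: the output format (a list of merges with node heights) and the representation of clusters as nested tuples are assumptions, and ties between equally close pairs are broken arbitrarily.

```python
from itertools import combinations


def upgma(names, matrix):
    """Average-linkage clustering; returns a list of (cluster_a, cluster_b, node_height)."""
    size = {name: 1 for name in names}
    dist = {frozenset((a, b)): float(matrix[i][j])
            for (i, a), (j, b) in combinations(enumerate(names), 2)}
    merges = []
    while len(size) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        a, b = sorted(pair, key=str)
        merges.append((a, b, dist[pair] / 2.0))  # node height is half the distance
        new = (a, b)
        for k in size:
            if k not in (a, b):
                # Size-weighted average keeps every original taxon equally weighted.
                dist[frozenset((new, k))] = (
                    size[a] * dist[frozenset((a, k))]
                    + size[b] * dist[frozenset((b, k))]
                ) / (size[a] + size[b])
        dist = {p: d for p, d in dist.items() if a not in p and b not in p}
        size[new] = size.pop(a) + size.pop(b)
    return merges


labels = ["A", "B", "C", "D"]
clock_like = [[0, 2, 8, 8],
              [2, 0, 8, 8],
              [8, 8, 0, 4],
              [8, 8, 4, 0]]
for merge in upgma(labels, clock_like):
    print(merge)
```

The toy matrix is ultrametric (clock-like), which is the situation in which UPGMA can be expected to recover the correct grouping.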
Another commonly applied method is minimum evolution, which creates a tree where
the overall amount of evolution (measured by the total branch lengths of the tree from root to
tip) is minimised (Salemi and Vandamme 2003). Again tree space is traversed by heuristic
search as described below.
Distance methods are fast compared to character-based methods and, given a dataset
with relatively constant rates of evolution and closely related taxonomic units, fairly
accurate (Felsenstein 2004). However, they suffer from a systemic issue: if the taxa under
consideration display variability in rates of evolution along a sequence at different
points in a tree, this cannot be detected, as all distances between the sequences are
calculated locally, i.e. between pairs of species (Felsenstein 2004).
2.1.6.2 Discrete character state methods
Discrete character state methods operate on matrices in which attributes or characters
are assigned to each taxon under consideration. Possible trees are then evaluated against
this matrix in an attempt to satisfy an optimality criterion (Salemi and Vandamme 2003).
One of the two most popular optimality criteria is parsimony, which entails minimisation
of the amount of change required over a given tree to produce the data observed in the
matrix. The other widely used criterion for selection of trees is likelihood. This method
frames the tree as a hypothesis for the matrix of observed data and evaluates the
likelihood of the tree given that data (Felsenstein 2004). Maximisation of the likelihood
function yields the optimum tree.
2.1.6.2.1 Maximum Parsimony
The use of parsimony as a criterion for judging potential trees was first introduced by
Camin and Sokal (Camin and Sokal 1965). The rationale for preferring the more
parsimonious tree is based on the principle of Ockham's razor, which can be stated as
follows: a simpler explanation for an observed phenomenon is to be preferred to a more
complex ad hoc explanation (Steel and Penny 2000).
Specific variants of parsimony that can be utilised are (Felsenstein 2004):
Parsimony on an ordinal scale: this deals with the case where changes in a
multi state character are considered on an ordinal scale. Thus only changes
that are adjacent are allowed (Felsenstein 2004).
Evaluating the number of character changes required over a particular tree for a given
character matrix is computationally easy and can be calculated rapidly through applications
of dynamic programming algorithms such as:
The Fitch algorithm (Fitch 1971): operates by carrying out a post-order
traversal of the phylogenetic tree (Felsenstein 2004). At each internal node the
set of potential ancestral states is set to the intersection of the state sets of
its immediate descendant nodes, if that intersection is not empty. If the
intersection is empty, the state set of the node is set to the union of the state
sets of the two descendant nodes and one change is counted.
The Sankoff algorithm (Sankoff 1975): is similar but not identical to the Fitch
algorithm (Felsenstein 2004). A cost matrix is created which stores the cost of
all possible changes of state within the context of the data under consideration.
Ancestral node states are then assigned by selecting the state with the minimal
cost.
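The Fitch procedure for a single character can be sketched as follows. The tree representation (nested two-element tuples with leaf names as strings) and the function name are assumptions made for illustration.

```python
def fitch(tree, leaf_states):
    """Post-order Fitch pass for one character on a rooted binary tree.

    tree: a leaf name (str) or a 2-tuple of subtrees.
    leaf_states: dict mapping leaf name -> observed state.
    Returns (set of candidate states at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, leaf_states)
    right_set, right_cost = fitch(right, leaf_states)
    common = left_set & right_set
    if common:
        # Descendants agree on at least one state: no extra change is needed here.
        return common, left_cost + right_cost
    # No shared state: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


tree = ((("A", "B"), "C"), ("D", "E"))
states = {"A": "C", "B": "A", "C": "C", "D": "A", "E": "G"}
root_states, changes = fitch(tree, states)
print(sorted(root_states), changes)  # ['A', 'C', 'G'] 3
```

Summing the change counts over all characters in the matrix gives the parsimony score of the tree.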
$$L(H \mid D) = k \, P(D \mid H) \qquad (6)$$
Where k is an arbitrary constant. Use of this constant allows relative comparison of
likelihoods (Edwards 1992). To paraphrase an example from (Durbin 1998): in the case of a
die, if our hypothesis is that the die is fair then the probability of any outcome is equal
to 1/6 (approximately 0.167). If we go on to roll 5 sixes then this forms our observed
data. The likelihood of the hypothesis is then proportional to (1/6)^5, or approximately
0.000129. Hypotheses can thus be judged on their relative abilities to explain observed
results. A hypothesis with a higher estimate of the probability of rolling a 6 would be a
better fit to the observed data in the case of the die. Thus likelihood provides a
framework with which to select a hypothesis or model appropriate to the observed data.
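The die example can be checked numerically with a minimal sketch. The loaded-die hypothesis with a probability of 0.5 for a six is a hypothetical alternative introduced here purely for comparison, and the arbitrary constant k is taken to be 1.

```python
def likelihood_all_sixes(p_six: float, n_rolls: int) -> float:
    """Likelihood (up to a constant) of a hypothesis about P(six), given n_rolls sixes in a row."""
    return p_six ** n_rolls


fair = likelihood_all_sixes(1 / 6, 5)
loaded = likelihood_all_sixes(0.5, 5)  # hypothetical loaded die
print(f"fair:   {fair:.6f}")
print(f"loaded: {loaded:.6f}")
print(f"ratio:  {loaded / fair:.1f}")
```

Because only the ratio of likelihoods matters, the constant k cancels: the loaded-die hypothesis explains five consecutive sixes 3^5 = 243 times better than the fair-die hypothesis.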
In the case of phylogeny each tree is a hypothesis explaining the distribution of the
traits under consideration. The phylogeny with the maximum likelihood is selected as the
optimal tree. The likelihood of a tree can be measured through the application of a
substitution model of evolution, which models the probability of individual evolutionary
events over the tree. Empirically calculated substitution models can be used as a substitute for
the calculation of a set of probabilities, which permits the application of more generalised
rules of evolution to each individual phylogenetic study. Empirical models of evolutionary
events can be created through the examination of homologous sequences in different species.
Models currently in use for amino acid based phylogenies include the WAG (Whelan and
Goldman 2001) and LG (Le and Gascuel 2008) substitution models.
If a model is badly specified and a poor fit for the data then likelihood methods can
return an inaccurate tree with high statistical support (Keane et al. 2006). There are a
limited number of cases where parsimony methods can outperform likelihood-based methods;
the region of parameter space in which this occurs has been called the inverse Felsenstein
zone, or Farris zone (Siddall 1998). It has been shown, however, that these cases are
extremely rare in real data (Swofford et al. 2001), and in cases where it is
computationally feasible maximum likelihood has become one of the dominant paradigms in
phylogeny reconstruction.
2.1.6.2.3 Bayesian Methods
Another criterion related to likelihood is the posterior probability of a tree given a
matrix of observations and a prior probability for the tree. The posterior probability of
a hypothesis is the probability of the hypothesis being true given some observed data. The
posterior probability of a tree given a multiple alignment is calculated through the
application of Bayes' theorem, which is defined as:
$$P(X \mid Y) = \frac{P(Y \mid X) \, P(X)}{P(Y)} \qquad (7)$$
Where X and Y are separate events and P(Y|X) is the conditional probability of event Y given
event X has occurred. P(X) is known as the prior probability of event X. P(X|Y) is the
posterior probability of event X given event Y has occurred. P(X) represents a subjective prior
belief in the probability of X occurring.
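As a small numerical illustration of Bayes' theorem, the sketch below computes posterior probabilities for a handful of competing hypotheses. It assumes the hypotheses are mutually exclusive and exhaustive, so that P(Y) can be obtained by summing P(Y|X)P(X) over them. The tree labels and likelihood values are invented for the example.

```python
def posterior(priors, likelihoods):
    """Posterior P(X|Y) for each hypothesis X, given priors P(X) and likelihoods P(Y|X)."""
    # P(Y) by the law of total probability over the supplied hypotheses.
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    return {h: priors[h] * likelihoods[h] / evidence for h in priors}


# Three hypothetical candidate trees with a flat prior and made-up likelihoods.
priors = {"tree1": 1 / 3, "tree2": 1 / 3, "tree3": 1 / 3}
likelihoods = {"tree1": 0.02, "tree2": 0.01, "tree3": 0.01}
for name, prob in posterior(priors, likelihoods).items():
    print(name, round(prob, 3))
```

With a flat prior the posterior is simply the normalised likelihood, so tree1 receives a posterior probability of 0.5 and the other two trees 0.25 each.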
In the case of phylogenetic analysis X is a phylogenetic tree and Y is a given multiple
alignment. It is, however, non-trivial to evaluate the posterior probabilities over all
possible tree topologies exhaustively (Huelsenbeck et al. 2001). This process has been made
feasible by sampling the distribution of posterior probabilities of trees. The posterior
probability of a particular tree is estimated as the proportion of times it is visited
during traversal of tree space. Tree space traversal is facilitated by the use of
Metropolis-coupled MCMC (Markov chain Monte Carlo) methods, first introduced by the
doctoral work of Li and Mau (Pickett and Randle 2005).
The algorithm returns a set of trees sampled from the posterior distribution. An
individual phylogeny is then generally assembled from the returned sample through using
majority rule consensus methods (Cranston and Rannala 2007).
Bayesian methods suffer from the potential source of bias of prior probabilities
(Holder and Lewis 2003). This issue can be ameliorated through the use of flat or
uninformative priors. Flat priors can however still bias a Bayesian phylogenetic study
towards trees with particular configurations of clades (Pickett and Randle 2005).
2.1.6.3 Heuristic search methods
Given the large number of topologies possible for even a small number of taxa, the
estimation of phylogenetic trees is a problem that is intractable by brute force
searching. Thus the space of all possible trees is usually searched heuristically
(Felsenstein 2004). This entails the selection of a random first tree, which is then
evaluated using whatever measure has been defined to assess the quality of a tree.
Examples of possible quality measures for a phylogeny include, as previously discussed,
parsimony, likelihood and distance. The tree is then altered, thus moving to a new point in
tree space. This new tree is then evaluated. This process is iterated until a local
optimum has been reached within the space. This point is not guaranteed to be a global
optimum within the space (Felsenstein 2004).
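The search strategy can be sketched generically as a hill-climbing loop. In the sketch below a toy one-dimensional landscape stands in for tree space; in a real analysis the states would be trees, the neighbours would be generated by tree rearrangement moves and the score would be parsimony, likelihood or a distance criterion. All names are illustrative.

```python
def hill_climb(start, neighbours, score):
    """Greedy search: move to the best-scoring neighbour until no neighbour improves."""
    current, current_score = start, score(start)
    while True:
        best = max(neighbours(current), key=score, default=None)
        if best is None or score(best) <= current_score:
            return current, current_score  # local optimum reached
        current, current_score = best, score(best)


# Toy landscape: positions 0..5 with the scores below; neighbours are adjacent positions.
landscape = [1, 3, 2, 5, 9, 4]


def adjacent(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]


def height(i):
    return landscape[i]


print(hill_climb(0, adjacent, height))  # (1, 3): stuck on a local optimum
print(hill_climb(2, adjacent, height))  # (4, 9): reaches the global optimum
```

The two runs show why the result depends on the starting point, and why repeating the search from several starting trees is a common way of widening coverage of the space.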
Examples of alterations/moves that are used to traverse tree space include (Felsenstein 2004):
Nearest neighbour interchange (NNI): This process involves the swapping of adjacent
branches within a tree. This is a local rearrangement of the tree.
Subtree pruning and regrafting (SPR): This process involves the removal or pruning
of a subtree from an overall tree and reattaching it at another point. As opposed to
NNI this is a global rearrangement of the tree.
Tree bisection and reconnection (TBR): This involves the deletion of an interior
branch to split a tree into separate trees and then all possible connections are made
between the branch set of the first tree and the second. This is also a global
rearrangement of the tree.
Global rearrangements are more radical moves within the tree space and are thus less
likely to become trapped in local optima. Modern phylogeny estimation programs generally
provide the option to carry out either form of rearrangement. The advantage of using
local rearrangements is greater speed in arriving at an optimum tree. Examples of
programs that offer this choice include a number of programs within the PHYLIP suite
(Felsenstein 1989) and PhyML (Guindon and Gascuel 2003). PhyML is generally as
accurate as other phylogeny estimation programs while being considerably faster
(Dereeper et al. 2008). Programs within PHYLIP can carry out multiple searches through
the space, jumbling the input order of the taxa to widen space coverage. The
programs within PHYLIP that offer heuristic search are:
PROTPARS
DNAPARS
DNACOMP
DNAML
DNAMLK
PROML
PROMLK
RESTML
FITCH
KITSCH
NEIGHBOR
CONTML
PARS
MIX
DOLLOP
$$\mathrm{AIC}_i = -2 \ln L_i + 2 p_i$$
Where Li is the likelihood of model i and pi is the number of parameters in model i. The BIC
has a higher penalty for parameter richness than the AIC and is calculated via the following
equation (Felsenstein 2004).
$$\mathrm{BIC}_i = -2 \ln L_i + p_i \ln n \qquad (9)$$
Where Li is the likelihood of model i and pi is the number of parameters in model i and n is
the number of data points in the dataset.
An example of a likelihood-based model selection procedure, as used by the model
selection tool ModelGenerator (Keane et al. 2006), is as follows.
A simple guide tree is constructed from the dataset using neighbour joining.
Each model to be evaluated is then examined over this guide tree and the dataset to
calculate the likelihood of that model.
The model with the lowest AIC/BIC is presented as the best model for the given dataset.
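The final step of the procedure above can be sketched in Python as follows. This is an illustration only and not ModelGenerator's code: the function names, the two model names and their log-likelihoods are hypothetical, and the likelihoods are assumed to have already been computed on the fixed guide tree.

```python
import math


def aic(ln_l, n_params):
    """Akaike Information Criterion for one model."""
    return -2.0 * ln_l + 2.0 * n_params


def bic(ln_l, n_params, n_points):
    """Bayesian Information Criterion; the penalty grows with dataset size."""
    return -2.0 * ln_l + n_params * math.log(n_points)


def best_model(fits, n_points, criterion="bic"):
    """fits maps a model name to (log-likelihood, number of parameters)."""
    def value(name):
        ln_l, p = fits[name]
        return bic(ln_l, p, n_points) if criterion == "bic" else aic(ln_l, p)
    return min(fits, key=value)


# Hypothetical log-likelihoods obtained on a fixed guide tree.
fits = {"simple": (-1000.0, 1), "rich": (-997.0, 3)}
print(best_model(fits, n_points=500, criterion="aic"))  # rich
print(best_model(fits, n_points=500, criterion="bic"))  # simple
```

With these made-up values the two criteria disagree: the small gain in likelihood justifies the two extra parameters under the AIC but not under the heavier BIC penalty.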
The NNI distance: This distance can be considered an edit distance, analogous
to the Levenshtein distance used to compare strings of text. It is the number of
NNI operations it would take to transform one of the trees into the other
(Felsenstein 2004).
The Branch Score distance: This measure uses branch lengths as well as
topology to calculate the distance between two trees (Felsenstein 2004).
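A hedged sketch of the idea behind a branch-length-aware tree distance is given below. It assumes each tree has already been reduced to a mapping from splits (the set of taxa on one side of a branch, chosen consistently in both trees) to branch lengths; a split absent from a tree is treated as having length zero. The sum of squared differences is returned here. Whether a given program reports this sum or its square root should be checked against that program's documentation.

```python
def branch_score(tree_a, tree_b):
    """Sum of squared branch-length differences over the union of splits.

    Each tree is a dict mapping a split (frozenset of taxon names) to the
    length of the branch that induces it.
    """
    splits = set(tree_a) | set(tree_b)
    return sum((tree_a.get(s, 0.0) - tree_b.get(s, 0.0)) ** 2 for s in splits)


# Hypothetical trees: one shared split with different lengths, and one
# split unique to each tree.
tree_a = {frozenset("A"): 0.2, frozenset("AB"): 0.1}
tree_b = {frozenset("A"): 0.3, frozenset("AC"): 0.1}
print(round(branch_score(tree_a, tree_b), 6))  # 0.03
```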
Another way to assess the relative quality of a pair of estimated trees with respect to a given dataset is
the Kishino-Hasegawa (KH) test. This is a test of how well individual homologous
sites within a dataset support a given tree in contrast to another tree (Goldman et al. 2000).
Both trees are selected a priori as the possible best hypotheses for an observed dataset. The
test was first introduced by Hasegawa and Kishino (1989). The underlying rationale
is that if the trees are equally well supported by the dataset, the expected difference between
their likelihoods is zero. Using the notation of Goldman et al. (2000), the test can be carried
out via the following procedure.
Given two trees T1 and T2 calculated by a given quality criterion, e.g. parsimony or
likelihood.
Assuming for the purposes of explanation that the quality criterion is likelihood, the
likelihoods of T1 and T2 (with respect to a given dataset D) are L1 and L2 respectively.
The underlying hypothesis of the test is that T1 and T2 do not explain D equally well, or
δ ≠ 0, where δ = L1 − L2 is the difference between the likelihoods of the two trees.
In order to test these hypotheses it is necessary to calculate how extreme the observed
value of δ is with respect to the distribution of δ.
The likelihoods of the trees T1 and T2 are then recalculated for each bootstrapped
dataset.
This provides a distribution for δ against which the position of the initial δ can be
compared via a two-tailed test. A two-tailed test is used as there is no a priori
expectation of which tree is to be preferred (Goldman et al. 2000).
The test assumes that all columns within the dataset are independent and identically
distributed according to the evolutionary history of the taxa under consideration.
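The procedure above can be sketched in Python as shown below. Two simplifications are assumed and should be read as such. First, per-site log-likelihoods under each tree are taken as given, and bootstrap datasets are formed by resampling sites without re-optimising any parameters (a shortcut often called RELL), rather than by recalculating full likelihoods. Second, the site differences are centred on their mean so that the resampled distribution has an expected difference of zero. The function name and the example values are hypothetical.

```python
import random


def kh_test(site_lnl_1, site_lnl_2, n_boot=1000, seed=0):
    """Return (observed difference, two-tailed bootstrap p-value)."""
    if len(site_lnl_1) != len(site_lnl_2):
        raise ValueError("both trees must be scored on the same sites")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(site_lnl_1, site_lnl_2)]
    n_sites = len(diffs)
    delta_obs = sum(diffs)
    mean_diff = delta_obs / n_sites
    centred = [d - mean_diff for d in diffs]  # impose the null hypothesis
    extreme = 0
    for _ in range(n_boot):
        # Resample sites with replacement and sum their centred differences.
        delta_boot = sum(centred[rng.randrange(n_sites)] for _ in range(n_sites))
        if abs(delta_boot) >= abs(delta_obs):  # two-tailed comparison
            extreme += 1
    return delta_obs, extreme / n_boot


# Hypothetical per-site log-likelihoods: tree 1 is better at every site.
lnl_tree_1 = [-1.0] * 50
lnl_tree_2 = [-2.0 - 0.01 * (i % 3) for i in range(50)]
delta, p_value = kh_test(lnl_tree_1, lnl_tree_2)
print(delta, p_value)
```

Because the sites are resampled independently, the sketch relies on the same independence assumption stated above.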
The KH test is used to compare trees that are selected a priori as possible best
explanations for a given dataset. To examine how well a given best estimated tree matches an
underlying dataset relative to another tree, the SOWH (Swofford-Olsen-Waddell-Hillis) test
can be used in conjunction with the KH test. This test is an example of parametric
bootstrapping (Goldman et al. 2000). Essentially the test involves the construction of a tree
with a given quality criterion over a given dataset. This initial tree is then used to create
multiple datasets, simulated using the parameters that define the tree. Each of these datasets
is then used to create a new tree. The likelihoods of these new trees can then be compared to
the likelihood of the initial tree relative to the simulated datasets. This creates a set of
likelihood differences, the distribution of which can be compared to the difference between
the initial tree and other trees of interest. This test is not widely used due to the computational
demands of the construction of multiple trees and of simulated datasets.
2.1.10 Phylogenetic analysis using gene presence
Leaving aside sequence data there are other aspects to the genomes of a set of organisms that
also provide signal indicating their evolutionary divergence. One of these aspects is proteome
content, i.e. which genomic features are present in a genome and which are absent (Snel et al.
1999). This aspect is essentially the phylogenetic profile of the genomic feature. The
presence and absence of the genomic feature can be treated as a binary trait and then used as
an input to standard phylogenetic tree estimation procedures. Methods for the
estimation of trees from discrete traits predate the methods described above for the analysis
of molecular sequence data. The issues and criteria for estimation procedures surrounding
tree estimation from this form of data remain the same, as the underlying process for the
generation of the data is the same.
As an illustrative example consider five species with six genes. Any of these genes can either be
present or absent. The absence of a gene is coded as 0 and the presence of a gene is coded as
1.
Consider the following distribution of the genes over the five species, named A to E:
A  111010
B  111111
C  000001
D  111100
E  000000
A phylogenetic tree clustering these species can be estimated from this data. An example
tree reconstructed using Dollo parsimony, as implemented in the PHYLIP
program DOLLOP (Felsenstein 1989), is shown in the figure below.
Figure 2.5: Dollo parsimonious estimation of a phylogenetic tree from example data.
As species E is devoid of all six genes it is placed as an outgroup relative to the other four.
2.2 Methods
In order to carry out a comparative study on human protein function a phylogeny was
constructed to provide a framework on which to evaluate the distribution of human proteins
over the eukaryotic kingdom. Sets of phylogenetic profiles, which detail this distribution,
were also generated.
2.2.1 Data Selection
Given the computational nature of this project as well as the abundance of molecular data it
was clear that its use was to be preferred over morphological data. Also given the wide
morphological divergence of eukaryotes isolating individual features to be compared was not
considered a plausible option.
The next point of consideration was whether to utilise nucleotide or amino acid
molecular data. It is known that over long periods of evolution nucleotide data is more
likely to become saturated with multiple back substitutions, as a nucleotide site has only four
possible states as opposed to the 20 possible states of an amino acid
site. This can lead to an underestimate of genetic distance (Harrison and Langdale 2006;
Salemi and Vandamme 2003). Thus, despite nucleotide data outperforming amino acid data
over shorter time frames, such as the period covering the divergence of the division
Angiospermae (Simmons et al. 2002) and of the subphylum Vertebrata (Townsend et al.
2008), it was felt that amino acid data was a more appropriate choice as a measure of genetic
distance over all eukaryotes.
2.2.2 Data Acquisition
Having decided to utilise amino acid data the next step was to acquire usable data. The
protein sets of 54 eukaryotic genomes were downloaded on the 16th and 17th of August 2007,
of which 41 were accessed from the NCBI RefSeq database (using the Entrez data
retrieval interface at http://www.ncbi.nlm.nih.gov/sites/gquery) (Pruitt et al. 2005) and the
remainder from the Sanger Centre (ftp://ftp.sanger.ac.uk), Genoscope
(http://www.genoscope.cns.fr/spip/spip.php?lang=en), TIGR (now the JCVI;
http://www.jcvi.org/), the Broad Institute (http://www.broadinstitute.org), Ensembl (using
BioMart at http://www.ensembl.org/biomart) and lastly SilkDB
(http://silkworm.genomics.org.cn) (Wang et al. 2005b). These databases were employed as
they were (at the time of access) the sources utilised by the KEGG database (Kanehisa et al.
2006). An additional archaeon, Methanosarcina acetivorans, was downloaded from the NCBI
RefSeq database (Pruitt et al. 2005) in order to root the phylogeny using the outgroup
criterion. This method entails using an organism that falls outside the known taxonomy of the
group under consideration to provide a point of reference for the overall topology of the tree
(Felsenstein 2004). It is widely accepted after work by Carl Woese (Woese et al. 1990) that
the Archaea form a sister group to the Eucarya. Thus it was felt that an archaeon was an
appropriate choice for an outgroup species. Full details of all data sources and species can be
seen in the table below.
Organism                        Database          Common Name
[name missing in source]        RefSeq            NA
Anopheles gambiae               Ensembl           Mosquito
Arabidopsis thaliana            RefSeq            Thale Cress
Ashbya gossypii                 RefSeq            NA
Aspergillus fumigatus           RefSeq            NA
Aspergillus niger               RefSeq            NA
Bombyx mori                     SilkDB            Silkworm
Bos taurus                      RefSeq            Cow
Caenorhabditis briggsae         Ensembl           NA
Caenorhabditis elegans          RefSeq            NA
Candida albicans                RefSeq            NA
Candida glabrata                RefSeq            NA
Canis familiaris                RefSeq            Dog
Ciona intestinalis              RefSeq            Sea squirt
Cryptococcus neoformans         RefSeq            NA
Cryptosporidium hominis         RefSeq            NA
Cryptosporidium parvum          RefSeq            NA
Danio rerio                     RefSeq            Zebrafish
Debaryomyces hansenii           RefSeq            NA
Dictyostelium discoideum        RefSeq            NA
Drosophila melanogaster         RefSeq            Fruitfly
Drosophila pseudoobscura        RefSeq            NA
Encephalitozoon cuniculi        RefSeq            NA
Entamoeba histolytica           RefSeq            NA
Gallus gallus                   RefSeq            Chicken
Homo sapiens                    RefSeq            Human
Kluyveromyces lactis            RefSeq            NA
Leishmania major                Sanger            NA
Macaca mulatta                  RefSeq            Rhesus macaque
Magnaporthe grisea              RefSeq            NA
Monodelphis domestica           RefSeq            [missing in source]
Mus musculus                    RefSeq            Mouse
Neurospora crassa               Broad Institute   NA
Oryza sativa                    RefSeq            Rice
Ostreococcus lucimarinus        RefSeq            NA
Pan troglodytes                 RefSeq            Chimpanzee
Paramecium tetraurelia          Genoscope         NA
Pichia stipitis                 RefSeq            NA
Plasmodium falciparum           RefSeq            NA
Plasmodium knowlesi             Sanger            NA
Plasmodium yoelii               TIGR              NA
Populus trichocarpa             JGI               [missing in source]
Rattus norvegicus               RefSeq            Rat
Saccharomyces cerevisiae        RefSeq            Brewer's yeast
Schizosaccharomyces pombe       RefSeq            Fission yeast
Strongylocentrotus purpuratus   RefSeq            NA
Takifugu rubripes               Ensembl           Pufferfish
Tetrahymena thermophila         RefSeq            NA
Theileria annulata              RefSeq            NA
Theileria parva                 RefSeq            NA
Trichomonas vaginalis           RefSeq            NA
Trypanosoma brucei              RefSeq            NA
Trypanosoma cruzi               RefSeq            NA
Ustilago maydis                 RefSeq            NA
Yarrowia lipolytica             RefSeq            NA
2.2.3 Pairwise Alignment
In order to gauge the relatedness of individual proteins in the organisms it was necessary to
use a pairwise alignment algorithm that would deliver a measure of similarity between any
two given sequences. The Smith-Waterman algorithm (Smith and Waterman 1981) was
selected as it is guaranteed to locate optimal regions of local similarity. Speed is an issue with
the Smith-Waterman algorithm; however, an accelerated implementation developed by
Michael Farrar and provided within the FASTA package made its use feasible (Farrar 2007; Pearson
and Lipman 1988).
A necessary pre-processing step was to subject all sequences to low-complexity
filtering to remove regions of sequence that are non-random but not biologically
significant, such as regions of compositional bias. Thus all sequences were fed into the SEG
program (Wootton and Federhen 1993) with the parameter x, which masks out regions of
low-complexity sequence and replaces them with the lower-case character x.
Each protein set was then split into its individual proteins and each protein compared
against the protein set of every other organism in the dataset in order to locate sequences that were
significantly similar. Each comparison was run with a gap-opening penalty of -12 and a gap
extension penalty of -2. The substitution matrix BLOSUM62 (Henikoff and Henikoff 1992)
was used to score the alignments. The results of these searches were then parsed and the
pertinent data, i.e. raw Smith-Waterman score, E value, bit score and the coordinates of the
alignments along the sequences, were stored in a relational database structure in MySQL to
facilitate further analysis.
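To illustrate the scoring recursion that underlies these searches, a minimal Python sketch of Smith-Waterman local alignment scoring with affine gap penalties is given below. It is not Farrar's accelerated implementation. The match/mismatch scoring function stands in for BLOSUM62 and is hypothetical, and the gap convention assumed here (a gap of length k costs gap_open + (k - 1) * gap_extend) is an assumption that may differ from the convention of the program actually used.

```python
def smith_waterman_score(a, b, score, gap_open=-12, gap_extend=-2):
    """Best local alignment score of sequences a and b with affine gaps."""
    neg_inf = float("-inf")
    prev_h = [0] * (len(b) + 1)        # best score ending at (i - 1, j)
    prev_f = [neg_inf] * (len(b) + 1)  # best score ending in a vertical gap
    best = 0
    for i in range(1, len(a) + 1):
        cur_h = [0] * (len(b) + 1)
        cur_f = [neg_inf] * (len(b) + 1)
        e = neg_inf                    # best score ending in a horizontal gap
        for j in range(1, len(b) + 1):
            e = max(e + gap_extend, cur_h[j - 1] + gap_open)
            cur_f[j] = max(prev_f[j] + gap_extend, prev_h[j] + gap_open)
            diag = prev_h[j - 1] + score(a[i - 1], b[j - 1])
            # The zero lets an alignment restart anywhere: this is what makes
            # the alignment local rather than global.
            cur_h[j] = max(0, diag, e, cur_f[j])
            best = max(best, cur_h[j])
        prev_h, prev_f = cur_h, cur_f
    return best


# Hypothetical scoring scheme standing in for a substitution matrix.
simple = lambda x, y: 5 if x == y else -4
print(smith_waterman_score("AAAACCCC", "AAAAGCCCC", simple, -3, -1))  # 37
```

In the example eight residues match (8 x 5 = 40) and one single-residue gap is opened (-3), giving 37.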
2.2.4 Orthology Determination
In order to select data which would allow the measurement of evolutionary divergence over
the species it was necessary to cluster the proteins into orthologous clusters. As the
Inparanoid procedure (Remm et al. 2001) had been observed to perform well in this function
it was decided to utilise this procedure. The Inparanoid procedure as described in work
published by Remm (Remm et al. 2001) is detailed below.
2.2.4.1 Inparanoid
Given a set of n pairwise alignments between organism A and organism B, the Inparanoid
algorithm returns a set of clusters s using sequence similarity as an inverse distance. A
pairwise alignment of two proteins, protein a and protein b, is in this case a composite input
consisting of:
Bit score aSb: This is the result of the normalisation of a raw pairwise alignment score with
respect to the scoring system (Karlin and Altschul 1990). Normalisation places all
scores on the same scale, which is a fundamental prerequisite for use as a distance
metric.
Alignment lengths: The length of the alignments along both proteins alength and
blength.
The latter two length inputs are used to eliminate short hits using a minimum length
cut-off. These short hits may reflect functionally homologous (potentially orthologous)
domains as opposed to whole proteins, which are inherited intact as a discrete unit from the
last common ancestor of the two species under consideration. The bit score is employed as a
cut-off to limit the radius of the clustering step. The Inparanoid algorithm runs the following
steps in a pairwise comparison (Remm et al. 2001).
Read in all hits excluding those that fall below score and length cutoffs.
For each best-hit protein from A-B examine the reciprocal relationship B-A.
All reciprocal best hits are stored as a set of seed pairs for orthologous clusters.
For each seed pair paralogous genes are grouped around them if the largest score that
the putative paralog has in the set of all scores is against the putative ortholog in the
seed cluster of the organism under consideration.
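The first three steps listed above (filtering by score, finding best hits and retaining reciprocal best hits as seed pairs) can be sketched in Python as follows. This is an illustration and not the Inparanoid code: the function names and protein identifiers are hypothetical, the alignment length cut-off and the grouping of paralogs around each seed pair are omitted, and ties between equally scoring best hits are resolved simply by keeping the first hit encountered.

```python
def best_hits(scores):
    """Map each query protein to its highest-scoring subject protein."""
    best = {}
    for (query, subject), bits in scores.items():
        if query not in best or bits > scores[(query, best[query])]:
            best[query] = subject
    return best


def seed_pairs(a_vs_b, b_vs_a, min_bits=50.0):
    """Reciprocal best hits between proteomes A and B above a score cut-off."""
    keep_ab = {pair: s for pair, s in a_vs_b.items() if s >= min_bits}
    keep_ba = {pair: s for pair, s in b_vs_a.items() if s >= min_bits}
    best_ab, best_ba = best_hits(keep_ab), best_hits(keep_ba)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical bit scores keyed by (query, subject).
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 90, ("a2", "b2"): 150,
          ("a3", "b1"): 120, ("a4", "b3"): 40}
b_vs_a = {("b1", "a1"): 200, ("b1", "a3"): 120, ("b2", "a2"): 150,
          ("b2", "a1"): 90, ("b3", "a4"): 40}
print(seed_pairs(a_vs_b, b_vs_a))  # [('a1', 'b1'), ('a2', 'b2')]
```

Protein a3 has b1 as its best hit, but b1 scores higher against a1, so the relationship is not reciprocal; a4 and b3 fall below the score cut-off.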
2.2.4.2 Implementation
An implementation of the Inparanoid algorithm in the Perl language (Remm et al. 2001) was
acquired from the Inparanoid website (http://inparanoid.sbc.su.se). However, as this
implementation proved not to be amenable to the analysis of bespoke output, it
was decided to re-implement the procedure described above.
The Inparanoid algorithm was implemented through the application of object-oriented
(OO) software design principles. OO principles involve the characterisation of a
problem domain as a collection of interacting objects where functionality inherent to each
object is implemented as internal to that particular object (Pressman 2001). Objects within a
problem domain are generally identified through the identification of nouns within a problem
statement (Pressman 2001). Perusal of the algorithm specification provided by Remm et al.
(2001) led to the design described below.
2.2.4.2.1 Design
The main object that the procedure required in order to operate was identified as a Cluster
object, which was implemented with the following attributes and operations.
It was decided to use the programming language Java to implement the above design
as the language provides functionality that facilitates the OO paradigm. The
implementation was then tested against the author-provided Perl implementation to ensure
correctness.
The Java implementation deviated slightly from the author-provided Perl
implementation in two respects. The first deviation was the use of higher-precision
double values to represent the bit scores of
alignments as opposed to the use of integers by Perl. This led to cases where pairs whose scores had
been rounded in the Perl implementation (and were thus marked as reciprocal best hits) were not
marked as reciprocal best hits by the Java implementation (examples in Appendix A).
The second deviation was to cluster orthologous genes with two equal reciprocal best
hits between the species at the stage of sorting. This change in the order of steps has no overall effect
on the groups produced by the implementation. The implementation was run using a
bit score cut-off of 50 and an alignment length cut-off of 50% of the length of the
longer protein.
2.2.4.3 Application
The data generated from the similarity searches was then clustered to identify orthologous
genes using the constructed Inparanoid implementation. The study was carried out using H.
sapiens as a reference species. Orthologous groups were sought in each organism for every
protein within the human proteome. This study can be considered unbalanced as no
information was collected on proteins that are absent in H. sapiens (Davey et al. 2007).
As the dataset under consideration was amino acid sequence data there was a choice
as to how to deal with alternatively spliced isoforms of the same protein. As the goal of this
project was to examine the presence and absence of the proteins under consideration it was
decided that the retention of all isoforms in the dataset and clustering them as inparalogs was
appropriate. This would allow an examination of correlations in gain and loss of the protein
as an independent phenotypic entity.
2.2.5 Phylogenetic profiles
The output from the clustering step was then used to generate phylogenetic profiles, which as
mentioned previously are binary strings of presence and absence of an orthologous group for
a given gene in the reference species.
In order to create phylogenetic profiles of each protein within the human
proteome the following steps were undertaken.
A list of each GI identifier for the set of human proteins was generated.
For each entry in this list the relevant identifier was scanned against all files
containing orthology predictions in alphabetical order.
If the entry was present within the orthology prediction file for a given organism that
position within the profile string was marked as 1. Otherwise that position was
marked as 0. The order of the profiles is alphabetical. Therefore for example the
profile 100000000000000000000000100000000000000000000000000000 indicates a
protein with an orthologous group present in Anopheles gambiae and Homo sapiens
but absent in all other species under consideration.
Two sets of profiles were generated, one including the outgroup for use in ortholog
selection and the other excluding the outgroup for use in prediction of functional
linkage.
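The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the orthology predictions are assumed to have been loaded into one set of human protein identifiers per organism, and the identifiers, the four-species list and the function names are all hypothetical.

```python
def build_profile(protein_id, species_order, orthologs_by_species):
    """Binary presence/absence string, one character per species."""
    return "".join(
        "1" if protein_id in orthologs_by_species.get(species, set()) else "0"
        for species in species_order
    )


def species_present(profile, species_order):
    """Decode a profile back into the list of species marked as present."""
    return [s for s, bit in zip(species_order, profile) if bit == "1"]


# Hypothetical identifiers and orthology predictions for four species.
species_order = sorted(["Homo sapiens", "Anopheles gambiae",
                        "Mus musculus", "Danio rerio"])
orthologs_by_species = {
    "Anopheles gambiae": {"P1"},
    "Danio rerio": set(),
    "Homo sapiens": {"P1", "P2"},
    "Mus musculus": {"P2"},
}
print(build_profile("P1", species_order, orthologs_by_species))  # 1010
print(species_present("0011", species_order))  # ['Homo sapiens', 'Mus musculus']
```

Keeping the species list in a single fixed (here alphabetical) order is what makes profiles of different proteins directly comparable position by position.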
Table 2.2 lists the organisms under study along with their proteome sizes and number of
orthologous groups with reference to H. sapiens. Figure 2.7 shows the distribution of
proteome sizes in the organisms under consideration. Figure 2.8 shows the number of
proteins clustered in each organism.
Organism                        No. of Proteins   No. of Clusters   [header missing in source]
Strongylocentrotus purpuratus   42373             6638              13387
Takifugu rubripes               22428             10891             11046
Tetrahymena thermophila         26235             2014              2051
Theileria annulata              3795              1000              1016
Theileria parva                 4079              1005              2044
Trichomonas vaginalis           59681             1823              2567
Trypanosoma brucei              8772              1531              3596
Trypanosoma cruzi               19606             1737              2079
Ustilago maydis                 6548              3756              3762
Yarrowia lipolytica             6545              4092              4132
Table 2.2: List of organisms used in study along with data source and proteome size as well
as number of orthologous groups in alphabetical order (cont).
Figure 2.8: Distribution of number of proteins placed within clusters in each organism. N=55.
Table 2.3 shows the top ten profiles within the human genome and provides an interpretation.
Profile
Count
Interpretation
000000000000000000000000100000000000000000000000000000
5150
Present in species
Homo sapiens.
000000000000000000000000100000000010000000000000000000
2281
Present in Tribe
Hominini.
000000100001000000000001100101100010000001000000000000
1089
Present in Phylum
Chordata.
000000000000000000000000100100000010000000000000000000
622
Present in Order
Primates.
000000100001000000000000100101100010000001000000000000
495
Present in Infraclass
Eutheria.
000000000000000000000000100100000000000000000000000000
466
Present in species
Homo sapiens and
Macaca mulatta.
000000100001000000000001100101100010000001000000000000
378
Mammalia and
000000100001000010000001100101100010000001000000000000
000000100001000000000000100100100010000001000000000000
Present in Class
Class Aves
342
Present in Class
Mammalia
000000100001000010000001100101100010000001000000000000
281
Present in the
Phylum Chordata
with the exception
of the Species
Takifugu rubripes
000000100001100010000001100101100010000001000100000000
235
Present in Class
Mammalia, Class
Actinopterygii and
Class Aves
116805340
glycyl-tRNA synthetase
GARS
32307132
NFS1
4506605
RPL23
4506743
ribosomal protein S8
RPS8
SRP54
4507215
isoform 1
excision repair cross-
ERCC3
5031815
KARS
H(+)-transporting two-sector
ATP6V1D
7706757
ATPase
5803092
Methionine aminopeptidase 2
METAP2
PSMC1
24430151
Table 2.4: Single copy ubiquitous genes extracted via analysis of profiles.
2.2.5.2 Proteome content data/tree
The phylogenetic profiles as developed provided a matrix of presence and absence of every
human protein in the dataset across the remaining 53 eukaryotes and 1 outgroup archaeon. As
this form of data also contains phylogenetic signal, i.e. shows the divergence in the
proteomes of the given species over time, it was decided to subject this data to a phylogenetic
analysis in addition to the main phylogenetic analysis carried out on the multiple sequence
alignments of the homologous proteins. The tree was reconstructed using Dollo parsimony
(Farris 1977) via the program DOLLOP (Felsenstein 1989). This tree can be seen in Figure
2.17.
In order to carry this out the profiles were transposed, so that rather than showing the
distribution of human proteins over a set of species they showed the pattern of presence and
absence of human proteins over a single species. Thus instead of a matrix of 33,473 proteins
by 55 species the end product was a matrix of 55 species by 33,473 proteins. In other words
each species was assigned a binary string of length 34,373 where 1 indicated the presence of
a particular human protein and 0 its absence.
This matrix was converted into PHYLIP format through the truncation of species
names to 10 characters and the addition of header information about the size of the matrix.
Finally this formatted file was input to the DOLLOP program (Felsenstein 1989). The
program was run with its default settings.
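The transposition and formatting steps described above can be sketched in Python as follows. The sketch assumes the PHYLIP layout of a header line giving the number of species and the number of characters, followed by one line per species whose name occupies exactly ten characters. The function name and toy data are hypothetical. A uniqueness check is included because naive truncation of binomial names to ten characters can collide (for example, several Plasmodium species would all become "Plasmodium"), so distinct abbreviations are presumably needed in practice.

```python
def to_phylip(profiles, species_order):
    """Transpose per-protein profiles into a species-by-protein PHYLIP matrix.

    profiles: list of strings, one per protein, one character per species.
    """
    names = [name[:10].ljust(10) for name in species_order]
    if len(set(names)) != len(names):
        raise ValueError("truncated species names are not unique")
    lines = ["%5d %5d" % (len(species_order), len(profiles))]
    for column, name in enumerate(names):
        # Collect the entry for this species from every protein profile.
        lines.append(name + "".join(profile[column] for profile in profiles))
    return "\n".join(lines) + "\n"


# Hypothetical input: three proteins profiled over two species.
print(to_phylip(["10", "11", "00"], ["Alpha", "Beta"]), end="")
```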
In order to examine the level of support for the initial output tree, 100 bootstrap
replicates were created with SEQBOOT (Felsenstein 1989). These 100 replicates were
submitted to DOLLOP to produce 100 bootstrap trees. These trees were combined into a
consensus tree using CONSENSE (Felsenstein 1989).
2.2.6 Multiple alignment
As an initial step to generate a phylogenetic tree using the orthologous proteins selected, a
multiple alignment of each of the 10 proteins was constructed utilising MAFFT (Multiple
Alignment using Fast Fourier Transform) (Katoh et al. 2002) with the L-INS-i algorithm for 1000
iterations. Each alignment was then subjected to Gblocks filtration (Talavera and Castresana
2007) to remove columns that were poorly aligned. Gblocks was run in its relaxed mode.
These alignments were then concatenated to form a supermatrix in
order to generate a measure of divergence across the genomes as opposed to at a single locus,
as has been suggested by Rokas et al. (2003). The full alignment can be seen in Appendix D.
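The concatenation step can be sketched in Python as below. It assumes each filtered alignment is held as a mapping from taxon name to aligned sequence. Padding a taxon that is absent from one alignment with gap characters is an assumption added for generality; with single-copy genes present in every species it would not be triggered. The function name and toy sequences are hypothetical.

```python
def concatenate_alignments(alignments, missing="-"):
    """Join per-gene alignments (taxon -> aligned sequence) into a supermatrix."""
    taxa = sorted(set().union(*alignments))
    parts = {taxon: [] for taxon in taxa}
    for alignment in alignments:
        widths = {len(seq) for seq in alignment.values()}
        if len(widths) != 1:
            raise ValueError("sequences within an alignment differ in length")
        width = widths.pop()
        for taxon in taxa:
            # Pad with gap characters if a taxon lacks this gene.
            parts[taxon].append(alignment.get(taxon, missing * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Hypothetical two-gene example.
gene_1 = {"A": "MK-", "B": "MKL"}
gene_2 = {"A": "GG", "C": "GA"}
print(concatenate_alignments([gene_1, gene_2]))
# {'A': 'MK-GG', 'B': 'MKL--', 'C': '---GA'}
```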
2.2.7 Model selection
In order to select an evolutionary model which provided a statistically accurate measure of
genetic divergence, ModelGenerator (Keane et al. 2006) was used to select the model that best
fitted the concatenated multiple alignment. It requires as an argument the number of gamma
categories used to account for heterogeneity in substitution rates. The argument was given a value
of 4 gamma categories as this has been observed to be a sufficient number to create a
near-optimum fit of a model (Yang 1994).
It has been observed that individual gene trees can be highly incongruent with species
trees (Cranston et al. 2009). Thus it is possible for the inference of a species tree to be misled
by non-phylogenetic signal from the individual genes (Cranston et al. 2009). In order to
examine potential incongruence between gene alignments each individual alignment was first
analysed separately. The model selected for the complete supermatrix was the LG
substitution matrix (Le and Gascuel 2008). The LG matrix was selected with the additional
parameter Γ, which allows different rates of evolution across the sequence. Each gene alignment
was also best fitted by the LG matrix, with the variations shown in Table 2.5.
The LG matrix is generated by a model of evolution that takes into account mutation rate
heterogeneity over sites, thus yielding better results than its predecessors WAG and JTT (Le
and Gascuel 2008). The models selected were the best fit as judged by both the Akaike
Information Criterion (AIC) and the Bayesian Information Criterion (BIC) in all but
two cases (GARS and ERCC3). Where they disagreed the BIC was selected over the AIC
(Yang 2008), thus lowering the possibility of overfitting a more complex model to the data.
The models selected for the individual orthologs were exactly the same as the model
for the concatenated alignment except in the case of SRP54, where the additional parameter I
was selected, indicating that a proportion of the alignment was invariant.
Gene      Model
GARS      LG+Γ
NFS1      LG+Γ
ATP6V1D   LG+Γ
KARS      LG+I+Γ
SRP54     LG+I+Γ
PSMC1     LG+Γ
METAP2    LG+Γ
ERCC3     LG+Γ
RPL23     LG+Γ
RPS8      LG+Γ
2.2.9 Comparison of protein content tree with super matrix tree
In order to assess whether the protein content tree was a significantly worse
hypothesis of the evolutionary relationships of the species under study than the supermatrix
tree, the PHYLIP program
PROML (Felsenstein 1989) was utilised. PROML was given the supermatrix alignment as a
dataset and the two trees as user-supplied trees to evaluate against the dataset. PROML then
ran the KH test on the two trees to examine the difference in the likelihoods of the trees
relative to the dataset.
2.3 Results
The individual gene trees can be seen in Appendix B. In order to examine potential
differences in phylogenetic signal between the individual gene trees, TREEDIST
(Felsenstein 1989) was used to generate a distance matrix between the 10 gene trees.
TREEDIST uses the Branch Score distance (Kuhner and Felsenstein 1994) to calculate the
distance between two trees. This distance takes into account the branch lengths of the input
trees as well as their overall topology. This matrix was then used to generate a dendrogram
using UPGMA (Unweighted Pair Group Method with Arithmetic mean) clustering as shown
in Figure 2.9.
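The clustering step can be sketched in Python as follows. This is a minimal UPGMA illustration and not the software used to draw the dendrogram: it returns only the nested order of merges (no branch heights), and the gene labels and distances in the example are hypothetical.

```python
from itertools import combinations


def upgma(names, distances):
    """Cluster labels by UPGMA; returns the merges as nested tuples.

    distances maps a pair (a, b) of labels to their distance.
    """
    size = {name: 1 for name in names}  # cluster -> number of leaves
    d = {frozenset(pair): value for pair, value in distances.items()}
    while len(size) > 1:
        # Merge the closest pair of current clusters.
        x, y = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        merged = (x, y)
        for z in size:
            if z != x and z != y:
                # Average distance, weighted by the sizes of the merged clusters.
                d[frozenset((merged, z))] = (
                    size[x] * d[frozenset((x, z))] + size[y] * d[frozenset((y, z))]
                ) / (size[x] + size[y])
        size[merged] = size.pop(x) + size.pop(y)
    return next(iter(size))


# Hypothetical distances between three gene trees.
distances = {("g1", "g2"): 2.0, ("g1", "g3"): 6.0, ("g2", "g3"): 8.0}
print(upgma(["g1", "g2", "g3"], distances))  # ('g3', ('g1', 'g2'))
```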
The gene trees fell into three main clusters. In order to examine the degree of incongruence of
each cluster with the supermatrix species tree, the trees contained within each cluster were
submitted to CONSENSE (Felsenstein 1989) in order to view the consensus trees.
CONSENSE was run with the majority rule setting, where a group has to appear in more than
50% of the input trees in order to be retained in the consensus tree. Figure 2.10
shows the outlier tree estimated from ERCC3 and Figures 2.11 and 2.12 show the consensus
trees.
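The majority rule criterion can be sketched in Python as below. This is an illustration and not the CONSENSE implementation: each input tree is assumed to have been reduced to its set of clades (each clade a set of taxon names), and the function name and example trees are hypothetical.

```python
from collections import Counter


def majority_rule_clades(trees):
    """Return clades found in more than half of the trees, with their frequency.

    trees: list of collections of clades, each clade a frozenset of taxa.
    """
    counts = Counter(clade for tree in trees for clade in set(tree))
    return {
        clade: n / len(trees)
        for clade, n in counts.items()
        if n > len(trees) / 2  # strictly more than 50% of the input trees
    }


# Hypothetical clade sets from three trees on taxa A to E.
ab, cd, ce = frozenset("AB"), frozenset("CD"), frozenset("CE")
consensus = majority_rule_clades([{ab, cd}, {ab, cd}, {ab, ce}])
print(sorted("".join(sorted(clade)) for clade in consensus))  # ['AB', 'CD']
```

The same counting underlies bootstrap support values, where the input trees are those estimated from the bootstrap replicates.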
Figure 2.11: Consensus Tree for Cluster 2 containing genes: RPS8, ATP6V1D, PSMC1 and
METAP2.
Figure 2.12: Consensus Tree 2 for Cluster 3 containing genes: GARS, NFS1, RPL23, SRP54
and KARS.
Both consensus trees preserve the kingdoms of Plantae, Animalia and Fungi though
the order of branching is lost. However both clusters demonstrate a broad congruence with
the fully concatenated ML tree, which can be seen in Figure 2.13. This is a useful measure of
the degree of overlap of phylogenetic signal contributed from each of the individual genes.
Figure 2.13 is an illustration of the topology of the tree with the animals, fungi and plants
highlighted. Figure 2.14 also overlays the bootstrap support values for each proposed clade.
Figure 2.15 presents the reconstructed phylogeny with a measure of support, which is the
proportion of the individual gene trees that supported a clade (Bratke 2009). Figure 2.16
shows the topology of the tree in combination with branch lengths.
Figure 2.13: ML tree of 54 eukaryotes without branch lengths created from a super matrix of
the concatenated alignments of all genes listed in Table 2.5. The clades containing animals,
fungi and plants are coloured blue, red and green respectively.
Figure 2.14: ML tree of 54 eukaryotes without branch lengths created from a super matrix of
the concatenated alignments of all genes listed in Table 2.5. Bootstrap support values are
only shown at each node where support was less than 1000 (not universally supported across
1000 bootstrap replicates).
Figure 2.15: ML tree of 54 eukaryotes created from the concatenated alignments of genes
listed in Table 2.5. Support Values are proportion of individual gene trees, which show a
given clade (Bratke 2009).
Figure 2.17: Proteome content phylogeny with bootstrap support only shown at each node
which was not 100% supported out of 100 bootstraps.
Tree Comparison
The results of the KH test carried out using PROML (Felsenstein 1989) showed that the
proteome content tree is a significantly worse hypothesis of the evolutionary relationships
between the organisms as shown in the table below.
Tree                     Log likelihood
Supermatrix tree         -131725.8
Proteome content tree    -133941.0
PROML (Felsenstein 1989) reported that the log likelihood of the proteome content tree was
significantly worse than that of the protein supermatrix tree.
2.4 Discussion
2.4.1 ML tree
In terms of current thought about supergroups within eukaryotes, the phylogeny
reconstructed as seen in Figure 2.13 is incongruent. However, given that it is based on a
concatenation of nuclear genes this is not surprising: Parfrey et al. (2006) showed
that there is generally weak support for most putative eukaryotic supergroups in phylogenies
built using proteins coded by nuclear genes. The supergroup Opisthokonta is however
supported, as would be expected from that work (Parfrey et al. 2006), though it does subsume
the Amoebozoa. The ML tree is consistent with known eukaryotic trees (Baldauf et al. 2000)
in placing Plantae as an outgroup to fungi and metazoans. The base of the tree is inconsistent
with known trees in its placement of E. histolytica as an early branching eukaryote with T.
vaginalis, when it is thought that E. histolytica branches higher up in the tree as a member of
the Amoebozoa supergroup (Parfrey et al. 2006). However there is no clear synapomorphy
(shared derived character) which defines the group Amoebozoa. There is also a lack of
unambiguous support for the existence of the group as a whole within the nuclear genome
(Parfrey et al. 2006). This placement is also not novel, as both organisms lack mitochondria
and have been grouped together at the base of Eukaryota by phylogenetic analyses of small
subunit (SSU) RNA genes, though their ultimate placement is not certain (Vanacova et al.
2003).
Within the animals the tree is consistent with the Coelomata hypothesis, which places
the Nematoda as an outgroup to both arthropods and vertebrates. This grouping is fairly
common in phylogenies derived using molecular data (Wolf et al. 2004) despite being held to
be false (Aguinaldo et al. 1997). It is thought to be an artefact of long branch attraction
due to rapid evolution along the C. elegans lineage (Telford 2004). The classes Mammalia (H.
sapiens, P. troglodytes, B. taurus, M. domestica, C. familiaris, M. musculus and R.
norvegicus), Aves (G. gallus) and Osteichthyes (T. rubripes, D. rerio) are all maintained in
the order in which they are generally found in broad vertebrate phylogenies (e.g. Stuart et
al. 2002).
The tree also arranges its four plant species as expected (Rodriguez-Ezpeleta et al.
2005), with the alga O. lucimarinus forming an outgroup to the monocot O. sativa and the
two dicots A. thaliana and P. trichocarpa.
Within the fungi the tree is consistent with known fungal phylogenies (Fitzpatrick et
al. 2006). The subkingdom Dikarya forms a separate clade. Within Dikarya, in the phylum
Ascomycota, the subphyla Saccharomycotina (S. cerevisiae, C. albicans, C. glabrata, P.
stipitis, D. hansenii, Y. lipolytica, K. lactis, A. gossypii), Taphrinomycotina (S. pombe) and
Pezizomycotina (A. fumigatus, A. niger, N. crassa, M. grisea) are grouped as separate clades.
Another phylum in Dikarya, Basidiomycota (U. maydis, C. neoformans), also forms a separate
clade within the tree. The microsporidium E. cuniculi branches as an outgroup to the Dikarya.
Within Saccharomycotina the WGD species (fungi which have undergone whole genome
duplication) (C. glabrata, S. cerevisiae) are presented as a clade. The CTG group (fungi
which utilise the codon CTG to encode serine instead of leucine) (P. stipitis, D. hansenii, C.
albicans) (Fitzpatrick et al. 2006) is also recovered as a separate clade within the tree.
Within the CTG group the tree disagrees with some published trees (Wang et al. 2009a) by
placing P. stipitis and C. albicans together with D. hansenii as an outgroup. Figure 2.6 shows
that this fungal topology was highly supported by the bootstrap analysis.
The Chromoalveolates are grouped together in one clade. Within this clade the
Apicomplexa form a monophyletic group, with the Ciliates as an outgroup.
These groupings are congruent with published trees (Burki et al. 2008; Rodriguez-Ezpeleta et
al. 2007).
2.4.2 Proteome content phylogeny
The proteome content phylogenetic tree presented in Figure 2.17 shows a degree of
topological congruence with the tree based on concatenated protein sequences shown in
Figure 2.13, in that it preserves the animals as a monophyletic group. The branching order of
the taxa is however different. It also shows the fungi as a clustered group (though not
monophyletic). However, given that PROML reports that it is a significantly worse fit to the
alignment of homologous proteins, it is clearly a worse representation of the dataset than the
ML supermatrix tree.
2.4.3 Conclusion
The phylogeny reconstructed by applying maximum likelihood to the concatenated
supermatrix of 10 eukaryotic proteins appeared to be a plausibly accurate reflection of the
relationships between the taxa. This plausibility was assessed both by inspection by eye
and by comparison with previously published eukaryotic phylogenies.
The proteome content phylogeny reconstructed by Dollo parsimony on the other hand
was shown to be a significantly worse representation of the evolutionary relationships
between the species. As such it was the ML supermatrix tree that was utilised as the
framework for comparative analysis of protein function within the species.
The work described in this chapter also produced phylogenetic profiles for each human
protein across 54 other eukaryotes.
Chapter 3
Comparison of methods of prediction of functional linkage in proteins
3.1 Introduction
This chapter presents the comparison of four systems of inferring functional links between
proteins using phylogenetic profiles. These systems were:
1) Hamming distance: Phylogenetic profiling was initially used without taking into
account species phylogeny and treating the state of each point in the profile as
independent (Pellegrini et al. 1999). Using profiling in this way entailed comparison
of profiles using the string comparison algorithm Hamming distance (Hamming
1950), which is a count of the points at which two strings differ.
2) Use of the comparative method (Barker and Pagel 2005; Pagel 1994) in the context of
phylogenetic profiling (Pellegrini et al. 1999) with constrained rates of gene gain
(Barker et al. 2007) over the phylogeny developed in Chapter 2 to detect protein
interactions. An implementation of the method, BayesTraits (Pagel et al. 2004a) was
used in order to calculate the relevant likelihoods.
3) Co-expression of mRNAs corresponding to given proteins: Proteins that physically
interact or are required to be produced in some form of spatio-temporal order tend
to show correlations (positive or negative) in the expression of their underlying
mRNA molecules. This method has been shown to be effective in detecting
interactions in Saccharomyces cerevisiae (von Mering et al. 2002) as well as in
Arabidopsis thaliana in combination with examination of other genomic features (De
Bodt et al. 2009). Use of this system presents a comparison of an un-curated
high-throughput physical experimental system with an equivalently un-curated
computational system.
4) Use of a Bayesian classifier to combine disparate sources of evidence comprising
gene co-expression, orthology, post-translational modification, co-localisation,
intrinsic disorder, domain co-occurrence and network analysis data in order to predict
protein interactions (McDowall et al. 2009).
3.1.1 Hamming distance
The distance measure Hamming distance is named after its creator Richard Hamming who
introduced it in his work (Hamming 1950). As mentioned above and in previous chapters, it
is the distance between two strings of equal length calculated as a count of the points where
they differ. A string can be represented as a vector of characters. As an illustrative example,
given the two strings x and y as defined below:
x = [c, a, t]
y = [h, a, t]
the Hamming distance between the two strings is 1, as they differ at one position.
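As a minimal sketch of this calculation (written in Python purely for illustration; the function name hamming_distance and the two presence/absence profiles are assumptions and do not come from this work), the count of differing positions can be obtained as follows:

```python
def hamming_distance(x, y):
    """Count the positions at which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(a != b for a, b in zip(x, y))


# The example from the text: "cat" and "hat" differ at one position.
distance_strings = hamming_distance("cat", "hat")

# Hypothetical presence (1) / absence (0) profiles over five species.
profile_a = [1, 0, 1, 1, 0]
profile_b = [1, 1, 1, 0, 0]
distance_profiles = hamming_distance(profile_a, profile_b)

print(distance_strings, distance_profiles)
```

The same function applies unchanged to phylogenetic profiles, since a profile is simply a fixed-length sequence of presence/absence states.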
Over a given branch if the state of G1 at the ancestral node is 0 then there is a
probability of a gain, i.e. moving to state 1 at the descendant node over the time period
represented by the branch. Conversely if the ancestral state is 1 then there is a corresponding
probability of a loss. These probabilities are represented as P01(t) and P10(t) where t is equal
to the time interval represented by the branch. There are also the probabilities of no
transitions which are represented by P00(t) and P11(t). The probabilities P01(t) and P10(t) can
also be considered the rate of transitions.
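These four probabilities can be illustrated with the standard two-state continuous-time Markov model, in which a constant rate of gain and a constant rate of loss determine the probabilities over a branch of length t. The sketch below assumes that model; it is not the BayesTraits implementation, and the names rate_gain and rate_loss and the numerical values are hypothetical.

```python
import math


def transition_probabilities(rate_gain, rate_loss, t):
    """Two-state Markov model: probabilities of each transition over time t.

    rate_gain is the instantaneous rate of 0 -> 1, rate_loss of 1 -> 0.
    """
    total = rate_gain + rate_loss
    if total <= 0 or t < 0:
        raise ValueError("rates must sum to a positive value and t must be >= 0")
    # Fraction of the way to equilibrium reached after time t.
    approach = 1.0 - math.exp(-total * t)
    p01 = (rate_gain / total) * approach
    p10 = (rate_loss / total) * approach
    return {"P00": 1.0 - p01, "P01": p01, "P10": p10, "P11": 1.0 - p10}


# Hypothetical rates and branch length.
probs = transition_probabilities(rate_gain=0.025, rate_loss=0.1, t=2.0)
print(probs)
```

Under this model the probability of a gain grows with branch length towards rate_gain / (rate_gain + rate_loss), and the probabilities leaving each ancestral state sum to one.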
The comparative method as applied to phylogenetic profiling is an examination of
whether the state of a second gene/protein has an effect on the state of the first. Thus
introducing a second gene / protein G2 there is a corresponding set of probabilities for G2. In
order to examine whether the state of G2 has an effect on the state of G1 the probabilities or
transition rates P01(t), P10(t), P00(t) and P11(t) for G1 can be split in order to factor in the state
of G2. Thus for example P01(t) can be split into two probabilities one corresponding to the rate
of gain of G1 if G2 is present and the other corresponding to the rate of gain of G1 if G2 is
absent. These transition rates were the basic parameters used by Barker and Pagel (Barker
and Pagel 2005) as represented in the following figure.
Figure 3.1: Parameters for modelling state transitions for pairs of genes as used by
Barker and Pagel (Barker and Pagel 2005) (Figure is directly reproduced).
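Assuming the numbers at the corners of the above figure represent the states of G1 and G2,
the parameters presented above represent the following rates of transition.
Parameter   Description
q13         Gain of G1 in the absence of G2
q31         Loss of G1 in the absence of G2
q12         Gain of G2 in the absence of G1
q21         Loss of G2 in the absence of G1
q34         Gain of G2 in the presence of G1
q43         Loss of G2 in the presence of G1
q24         Gain of G1 in the presence of G2
q42         Loss of G1 in the presence of G2
Table 3.1: Description of rate parameters used by Barker and Pagel (Barker and Pagel 2005).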
Given these rates it is possible to investigate whether the state of G2 has an effect on the
transition rates for G1. In order to carry out this investigation two competing models /
hypotheses were constructed using these parameters (Barker and Pagel 2005). One was a
dependent model, where the presence of G1 is somehow contingent on the presence or
absence of G2; the other was an independent model, where there was no connection (Barker
and Pagel 2005). The dependent model makes the assumption that the rate of gain/loss of G1
is somehow affected by the state of G2. Thus for example the rate of gain of G1 in the presence
of G2 (q24) will be different from the rate of gain of G1 in the absence of G2 (q13). Conversely
the independent model makes the assumption that the state of G2 has no effect on the
transition rates of G1. To detect gains and losses over a phylogenetic tree Barker and Pagel
calculated the likelihood of these two competing hypotheses about the distribution of pairs
of proteins in the constituent species (Barker and Pagel 2005). The premise of the work was
that the dependent model would prove a better fit to the observed data if the transition rates
for a given protein were affected by the state of the other. In order to detect correlated
evolution the two competing models were thus defined as follows:
The independent model of evolution, where the probabilities of gain and loss of A
were independent of the state of B. In order to create this model the parameters
involving gain and loss of A were constrained to be equal irrespective of the state of B
and vice versa. Using the symbols for the rates shown in Figure 3.1 this entailed
setting the transition rates as q13=q24, q42=q31, q21=q43 and q12=q34. This reduced the
number of parameters for the independent model to four.
parameters that maximised the likelihood of each of the two models was calculated in turn
between pairs of profiles. These likelihoods were calculated by summing the likelihoods of all
possible ancestral reconstructions at each internal node of the tree, thus removing the need to
rely on a single set. Having calculated the likelihoods of both models, the goodness of
fit of the models to the observed data was compared using the likelihood ratio statistic, LR.
This can be calculated using the following equation (Yang 2006):
LR = 2\left[\ln L(\text{dependent model}) - \ln L(\text{independent model})\right]    (1)
This method of fitting correlated and uncorrelated models of evolution for
pairs of proteins using maximum likelihood (ML) (Barker and Pagel 2005) while
constraining the rate of protein gain (Barker et al. 2007) shall from here on be referred to as
constrained ML.
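The comparison of the two models can be sketched as follows, assuming the usual form of the likelihood ratio statistic (twice the difference between the maximised log-likelihoods of the dependent and independent models). The log-likelihood values and protein names below are hypothetical and serve only to show how pairs would be scored and ranked; in this work the likelihoods themselves are calculated by BayesTraits.

```python
def likelihood_ratio(loglik_dependent, loglik_independent):
    """Return LR = 2 * (lnL_dependent - lnL_independent)."""
    return 2.0 * (loglik_dependent - loglik_independent)


# Hypothetical maximised log-likelihoods (dependent, independent) per pair.
example_fits = {
    ("protein_a", "protein_b"): (-40.0, -70.0),
    ("protein_a", "protein_c"): (-55.5, -58.0),
    ("protein_b", "protein_c"): (-61.0, -61.2),
}

lr_scores = {
    pair: likelihood_ratio(dep, indep)
    for pair, (dep, indep) in example_fits.items()
}

# Rank pairs from strongest to weakest evidence of correlated evolution.
ranked_pairs = sorted(lr_scores, key=lr_scores.get, reverse=True)
print(ranked_pairs)
```

A larger LR indicates that the dependent model fits the pair of profiles better than the independent model, which is why pairs are ranked in descending order.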
3.1.3 Co-expression as measured by microarray
As mentioned in Chapter 1 a microarray is a chip usually made of glass, with fluorescently
labelled oligonucleotide probes representing subsections of genes. The degree of fluorescence
from these probes corresponds to the abundance of a given mRNA in a sample and therefore
the level of expression (Quackenbush 2002; Wodicka et al. 1997). In a typical microarray
experiment cells are subjected to different treatments, or harvested from organisms with
differing phenotypic or disease states (e.g. cancerous tissue vs. normal in human cancer
patients). The probes on a given microarray chip are generally designed to map to a site
within a coding region of a given gene (Brown 2006).
To establish whether a given experimental condition corresponds with differential
expression of a given gene, a descriptive statistic illustrating the central tendency of
expression (usually the median) is calculated for the entire set of samples. If a given gene is
found to be expressed at a statistically significant higher level than the central value, this
gene is interpreted as up-regulated. Similarly if a gene is expressed at a significantly lower
level then that gene is interpreted as down-regulated. Probes on a microarray chip can map to
a single gene, or members of a gene family depending on the specificity of the probe design
(Heyer et al. 1999).
3.1.4 Bayesian classifier
The fourth system of prediction of functional linkage to be considered was that utilised by the
PIPs server (http://www.compbio.dundee.ac.uk/www-pips) (McDowall et al. 2009). This
system (Scott and Barton 2007) utilised a combination of sources of evidence for protein
interactions. These sources were:
Subcellular localisation, domain co-occurrence, and post-translational modification
co-occurrence: These features of a protein can also be informative as to its interaction
partners. The PIPs system (Scott and Barton 2007) combines these as a joint source of
evidence.
Protein disorder: This measure is based on the observation that the unstructured
regions within protein molecules are often involved in transient protein interactions.
Singh et al. (2007) showed that intrinsic disorder is enriched in date hubs, proteins
that maintain multiple interactions but at different times.
Network topology similarity: This measure utilises the principle that proteins that
interact will share other interacting partners.
These five predictors are combined using a naive Bayesian classifier to generate a single
score based on the posterior odds ratio of interaction after calculation of likelihood ratios
over each of the individual predictor modules (Scott and Barton 2007).
In order to explain what an odds ratio is it is necessary first to define odds. Odds are a
method of presenting the probability of an event by relating this probability to the probability
of the event not occurring. Thus the odds of an event are simply the probability of an event
occurring divided by the probability of the event not occurring (Sokal and Rohlf 1995). The
odds of a given event can be calculated by the equation:
odds(e) = \frac{p(e)}{1 - p(e)}    (2)
An odds ratio is thus the ratio of two odds. It can be used to measure the effect size
of a given factor on the probability of an event. Thus if for example the probability of heart
disease in people who consume a high fat diet is calculated as 1/4 and the converse probability
of heart disease in individuals who do not consume a high fat diet is calculated as 1/8, the
The posterior odds ratio utilised by PIPs (Scott and Barton 2007) was calculated by
utilising a prior odds ratio calculated by using a prior probability of interaction estimated as
1/400. This prior odds ratio was then multiplied by the likelihood ratios yielded by each of the
individual predictor modules. The product of this calculation is the posterior odds ratio.
As in the example, the posterior odds ratio corresponds to the posterior probability of
interacting, e.g. a score of 2 translates to the probability of interacting being twice as high as
the probability of not interacting (McDowall et al. 2009; Scott and Barton 2007).
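The arithmetic behind odds, odds ratios and posterior odds can be sketched as follows. The probabilities 1/4 and 1/8 and the prior probability of 1/400 are taken from the text; the three module likelihood ratios are hypothetical values chosen only to illustrate the multiplication, and the function names are assumptions rather than part of PIPs.

```python
from fractions import Fraction
from math import prod


def odds(p):
    """Odds of an event with probability p."""
    return p / (1 - p)


def posterior_odds(prior_probability, likelihood_ratios):
    """Prior odds multiplied by the likelihood ratio of each predictor."""
    return odds(prior_probability) * prod(likelihood_ratios)


def odds_to_probability(o):
    """Convert odds back into a probability."""
    return o / (1 + o)


# Odds ratio for the heart disease example (probabilities 1/4 and 1/8).
odds_ratio = odds(Fraction(1, 4)) / odds(Fraction(1, 8))

# Prior probability of interaction of 1/400, as used by PIPs, combined
# with three hypothetical module likelihood ratios.
prior = Fraction(1, 400)
score = posterior_odds(prior, [20, 10, 4])

print(odds_ratio, odds(prior), score, float(odds_to_probability(score)))
```

Exact fractions are used so that the intermediate values can be checked by hand: a prior probability of 1/400 corresponds to prior odds of 1/399, and a posterior odds of exactly 2 would correspond to a probability of interaction of 2/3.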
3.2 Methods
3.2.1 Assessing quality
A common way of measuring the accuracy of a binary classification system is
the use of sensitivity and precision. Sensitivity can be defined as the
probability that a true positive is predicted, and precision as the probability that a
positive prediction is correct (Baldi and Brunak 2001). In order to calculate these measures
some terminology must be introduced.
True positives (TP): The number of positive predictions made by a binary classifier that
lie within the positive training set.
False positives (FP): The number of positive predictions made by a binary classifier that
lie within the known negative training set.
False negatives (FN): The number of items in the known positive set, which were not
predicted by a binary classifier.
Given these values precision and sensitivity can be calculated as follows (Baldi and
Brunak 2001; Barker et al. 2007; von Mering et al. 2003):
precision = \frac{TP}{TP + FP}    (3)
sensitivity = \frac{TP}{TP + FN}    (4)
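Equations (3) and (4) translate directly into code. The sketch below is illustrative only; the counts are hypothetical, and returning None when a denominator is zero is a design choice made here, not something specified in this work.

```python
def precision(tp, fp):
    """Proportion of positive predictions that are true positives."""
    total = tp + fp
    return tp / total if total else None


def sensitivity(tp, fn):
    """Proportion of the known positive set that is predicted."""
    total = tp + fn
    return tp / total if total else None


# Hypothetical counts: 5 correct predictions, no false positives,
# and 9,995 known positives that were not predicted.
example_precision = precision(tp=5, fp=0)
example_sensitivity = sensitivity(tp=5, fn=9995)
print(example_precision, example_sensitivity)
```

The example shows how a classifier can be perfectly precise while recovering only a very small fraction of the positive set.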
training set, as they were present in a B-A orientation in the positive set. Similarly two pairs
of proteins had to be removed from the negative testing set.
The ratio of the size of negative to positive datasets in this case was roughly 11 to
1. This is biologically unrealistic as current estimates of the size of the full human interactome
range from 154,000-369,000 (Hart et al. 2006) to 650,000 (Stumpf et al. 2008). Stumpf
estimated the potential size of the interactome by treating known experimentally verified data
as a sub-network of the true network and extrapolating from the sub-network to the full
network (Stumpf et al. 2008). Hart on the other hand employed the idea that two independent
samples (experiments) from the complete interactome or subspace of the interactome of size
N would be expected to share k interactions by random chance under the hypergeometric
distribution (Hart et al. 2006). Thus Hart estimated the size of N using actually observed
intersections between experiments (Hart et al. 2006).
If these numbers are subtracted from the number of all potential interactions,
1,120,441,729 (calculated as all possible pairs from the version of RefSeq held), the remaining
ratios of negative to positive range from 1722:1 to 8617:1. For any full genome-wide survey
it would be necessary to scale all the precision and sensitivity scores from the training set
ratios to ratios constructed from estimates of the interactome size. This issue is addressed
more fully in Chapter 5.
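The ratio arithmetic, and one possible way of rescaling precision to a different class ratio, can be sketched as follows. The total number of pairs is read here as 1,120,441,729 and the interactome size of 650,000 is the estimate of Stumpf et al. (2008) quoted above. The rescaling function is an assumption made for illustration (it supposes that the true and false positive rates of the classifier are unchanged when the class ratio changes) and is not necessarily the approach taken in Chapter 5.

```python
def negative_to_positive_ratio(total_pairs, interactome_size):
    """Number of non-interacting pairs per interacting pair."""
    return (total_pairs - interactome_size) / interactome_size


def rescale_precision(precision, observed_ratio, target_ratio):
    """Re-express precision measured at one negative:positive ratio at another.

    Assumes the true and false positive rates stay the same, so the number
    of false positives per true positive scales with the class ratio.
    """
    if precision <= 0.0:
        return 0.0
    false_per_true = (1.0 - precision) / precision
    return 1.0 / (1.0 + false_per_true * target_ratio / observed_ratio)


TOTAL_PAIRS = 1_120_441_729  # figure quoted in the text, as read here
ratio_650k = negative_to_positive_ratio(TOTAL_PAIRS, 650_000)

# Hypothetical precision of 0.5 measured at a ratio of roughly 11:1.
rescaled = rescale_precision(0.5, observed_ratio=11.0, target_ratio=ratio_650k)
print(int(ratio_650k), rescaled)
```

Under this assumption a precision of 1 is unaffected by the change in ratio, whereas any lower precision falls sharply as the proportion of negatives grows.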
3.2.3 Hamming distance
The ability of Hamming distance to differentiate between the positive and negative training
set was measured with a lower distance corresponding to a higher score. Precision/sensitivity
were evaluated at every integer within a range of Hamming distance cut-offs ranging from 0
to 54.
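The evaluation at each integer cut-off can be sketched as below. The sketch assumes that a pair is predicted to be linked when its Hamming distance is less than or equal to the cut-off; the distances and labels are hypothetical toy data, and the function name sweep_cutoffs is an assumption.

```python
def sweep_cutoffs(distances, in_positive_set, max_cutoff=54):
    """Return (cutoff, precision, sensitivity) for every integer cut-off.

    distances: Hamming distance for each pair.
    in_positive_set: True if the pair is in the positive set, False if it
    is in the negative set.
    """
    n_positive = sum(in_positive_set)
    results = []
    for cutoff in range(max_cutoff + 1):
        tp = sum(1 for d, pos in zip(distances, in_positive_set)
                 if d <= cutoff and pos)
        fp = sum(1 for d, pos in zip(distances, in_positive_set)
                 if d <= cutoff and not pos)
        prec = tp / (tp + fp) if (tp + fp) else None
        sens = tp / n_positive if n_positive else None
        results.append((cutoff, prec, sens))
    return results


# Hypothetical toy data: five pairs, three of them in the positive set.
toy_distances = [0, 2, 5, 1, 7]
toy_labels = [True, True, False, False, True]
table = sweep_cutoffs(toy_distances, toy_labels)
print(table[0], table[2], table[54])
```

Raising the cut-off can only add predictions, so sensitivity never decreases as the cut-off increases, while precision may rise or fall.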
3.2.4 Constrained ML
To use phylogenetic profiling in a phylogenetically aware manner to detect correlations in
gain and loss the software package BayesTraits was utilised (Pagel et al. 2004a). This has
been used in previous work (Barker and Pagel 2005) to demonstrate that detection of
correlations in gain and loss of particular genes can be used as a tool with which to detect
functional interactions.
The script bms_runner (Barker et al. 2007) was used to examine the performance of
different rates of gain in predicting functional interactions amongst the training sets in order
to select an optimal rate. The script utilised the phylogenetic profiles and phylogeny
described in Chapter 2 as well as the positive and negative training sets to evaluate the
performance of different rates of gain. bms_runner (Barker et al. 2007) creates input for the
program BayesTraits (Pagel et al. 2004a) to evaluate the relative likelihood of correlated
evolution at a range of rates of gain. bms_runner creates a non-redundant set of profiles
(Barker et al. 2007) before passing them on to BayesTraits for comparisons. Thus 113,132
protein pairs in the training set were reduced to a set of 54,906 non-redundant pairs of
profiles.
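The reduction from protein pairs to non-redundant profile pairs can be illustrated with the sketch below. It is not the bms_runner code: it simply groups protein pairs that share the same unordered pair of profiles, so that a score need only be computed once per group. Treating the order of the two profiles as irrelevant is an assumption, and the protein identifiers and profiles are hypothetical.

```python
def group_by_profile_pair(protein_pairs, profiles):
    """Map each unique unordered pair of profiles to its protein pairs."""
    groups = {}
    for protein_a, protein_b in protein_pairs:
        key = tuple(sorted((profiles[protein_a], profiles[protein_b])))
        groups.setdefault(key, []).append((protein_a, protein_b))
    return groups


# Hypothetical presence/absence profiles over three species.
profiles = {"P1": "110", "P2": "011", "P3": "110", "P4": "011"}
protein_pairs = [("P1", "P2"), ("P3", "P4"), ("P2", "P3"), ("P1", "P3")]

groups = group_by_profile_pair(protein_pairs, profiles)
print(len(protein_pairs), "protein pairs ->", len(groups), "profile pairs")
```

A score calculated for one profile pair can then be assigned to every protein pair in the corresponding group.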
A number of rates of gain were evaluated for precision and sensitivity over the
training data, ranging from 1 × 10^-6 up to placing no restriction on gain. An LR score was
calculated for each profile pair at each rate of gain and assigned to each protein pair
corresponding to that profile pair. bms_runner then evaluates precision and sensitivity at a
range of cut-offs, commencing at the minimum LR encountered and moving up by a decreasing
interval until a value close to the maximum LR is reached (Barker et al. 2007). The program then
provides a table that includes the following information for this range of cut-offs.
LR cutoff    No of predictions    Precision    Sensitivity
Table 3.2: Column headings for data matrix returned by bms_runner (Barker et al. 2007).
3.2.5 Co-expression of mRNA
The co-expression of two genes in association with a given environmental condition can be
considered a potential indicator of functional linkage. In order to compare the performance of
the ML reconstruction method in predicting protein functional interactions against the
co-expression of mRNA, the results of all microarray experiments held in the EBI's ArrayExpress
database were downloaded.
This data was pre-processed and thus contained expression data at the gene level
rather than at the probe level. As oligonucleotide probes only map to small subsections of a
gene and can also hybridise with multiple targets, the relationship between probes and genes is
many-to-many. This many-to-many relationship was collapsed by data processing carried out
by ArrayExpress on each individual experiment.
A total of 377 experiments were downloaded. Each experiment record contained
information on genes whose expression level varied significantly in response to the
experimental treatment/tissue state. The size of individual experiments ranged from 1
gene to a maximum of 15,987. The mean number of genes showing significant variation per
experiment was approximately 3,143.
A sample line from the downloaded data is shown below for illustrative purposes:
Gene Symbol    STAT1
Ensembl ID     ENSG00000115415
Species        Homo sapiens
Factor         Disease state
Value          normal
Accession      E-GEOD-3790
Expression     DOWN
p Value        0.0423247888020165
Table 3.3: Sample processed data from ArrayExpress for experiment E-GEOD-3790 (Hodges
et al. 2006).
E-GEOD-3790 is a study on gene expression in brain tissue afflicted with Huntington's
disease (Hodges et al. 2006). The factor column has a number of potential values
corresponding to the annotation of the individual samples. In this case it corresponds to
whether the tissue comes from a patient diagnosed with Huntington's disease as opposed to
normal tissue. The value column shows the value of the factor (in this case normal). The p
value column shows the significance of the identified differential expression. Thus the data
presented above shows that the gene STAT1 is significantly (p<0.05) down-regulated in
tissue annotated as normal.
To use the training datasets to measure the ability of gene co-expression to predict
protein interactions it was necessary to convert the training data from protein pairs to gene
pairs. Using a translation key provided by the International Protein Index (IPI) (Kersey et al.
2004) each RefSeq Gi number was mapped to the associated gene name. As individual genes
can produce multiple proteins via the process of alternative splicing there is not a one-to-one
correspondence between the number of genes and the number of proteins in either training set.
With some Gi entries missing in the translation key this created a positive gene training set of
4319 genes / 8057 gene pairs and a negative set of 2833 genes / 89549 gene pairs.
In order to predict functional linkage the following procedure was followed.
For each experimental condition pairs of genes were marked as functionally linked if
their expression levels went up or down in response to the given experimental
condition.
True positives were counted if the genes existed as a pair in the positive set.
False positives were counted if the genes existed as a pair in the negative set.
False negatives were counted as the pairs in the positive set that were absent from the
set of predictions.
This data was processed using a program implemented in Java, which followed the steps
presented below. The input to the program was a file containing a set of lines as shown above
for a single experiment.
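The Java program itself is not reproduced here. The following Python sketch only illustrates the counting procedure described above, under stated assumptions: every pair of genes reported as significantly changed in an experiment is treated as a prediction, the significance threshold alpha of 0.05 is an assumption, and the record layout, gene names and training pairs are hypothetical.

```python
from itertools import combinations


def predict_linked_pairs(records, alpha=0.05):
    """All unordered pairs of genes significantly changed in one experiment."""
    genes = {r["gene"] for r in records if r["p_value"] < alpha}
    return {frozenset(pair) for pair in combinations(sorted(genes), 2)}


def count_outcomes(predicted, positive_pairs, negative_pairs):
    """Return (TP, FP, FN) for one experiment's predictions."""
    tp = len(predicted & positive_pairs)
    fp = len(predicted & negative_pairs)
    fn = len(positive_pairs - predicted)
    return tp, fp, fn


# Hypothetical records for a single experiment.
records = [
    {"gene": "GENE_A", "expression": "DOWN", "p_value": 0.04},
    {"gene": "GENE_B", "expression": "UP", "p_value": 0.01},
    {"gene": "GENE_C", "expression": "UP", "p_value": 0.20},
    {"gene": "GENE_D", "expression": "DOWN", "p_value": 0.001},
]
positive_pairs = {frozenset(("GENE_A", "GENE_B")),
                  frozenset(("GENE_A", "GENE_C"))}
negative_pairs = {frozenset(("GENE_B", "GENE_D"))}

predicted = predict_linked_pairs(records)
tp, fp, fn = count_outcomes(predicted, positive_pairs, negative_pairs)
print(tp, fp, fn)
```

Pairs are stored as frozensets so that a pair recorded as A-B in one set and B-A in another is recognised as the same pair.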
3.3 Results
3.3.1 Hamming distance
Figure 3.2 shows the performance of phylogenetic profiling over the training sets using
Hamming distance.
Figure 3.2: Performance of phylogenetic profiling using Hamming distance over the training
data.
Hamming distance as a measure does not perform well over the training data
achieving a maximum precision of 0.08796.
3.3.2 Constrained ML
The ability of constrained ML (Barker et al. 2007; Barker and Pagel 2005) to distinguish
between the positive and negative training sets was tested at a number of rates of gain. The
results can be seen in Figure 3.3.
Figure 3.3: Performance of constrained ML (Barker et al. 2007; Barker and Pagel 2005) over
training data at different rates of gain.
Figure 3.3 shows points over a range of sensitivity between 0 and 1. Sensitivity over the
whole training set ranges from 1, at an LR cut-off of 0 where all pairs of proteins are
predicted to be functionally linked, to 0 at the points where no pairs from the positive set are
predicted to be functionally linked.
Precision ranges from 0.0809 (a base level that is derived from the ratio of the size of
the positive set to the size of the negative set) up to 1 which is the point at which all
predictions made at a given LR cut-off lie in the positive set, i.e. are true positives.
Of the two metrics (precision/sensitivity) it is precision that appears to be the
strong suit of constrained ML. This is probably due to the fact that correlated evolution will
not occur in all cases of protein interactions. A large number of protein interactions will
contain members that are phylogenetically ubiquitous. In some of these cases the interaction
will be essential to maintenance of normal eukaryotic cellular function. In other cases even if
an interaction is being lost or gained in an organism, its individual members might still be
present (the interaction being lost due to some form of temporal/spatial separation of the
members). Thus, as low sensitivity is inevitable with this method, it was decided to focus on
rates of gain that achieve 100% precision. Figure 3.4 places all rates of gain on a single plot
and zooms into a range of sensitivities between 0 and 0.001.
Figure 3.4: Performance of constrained ML (Barker et al. 2007; Barker and Pagel 2005) over
training data magnified to a scale of sensitivity ranging from 0 to 0.001. For clarity some of
the worse performing rates of gain are removed.
Figure 3.4 shows that the rate 0.025 is the clear best performer as it delivers
predictions with a precision of 1 at the highest sensitivity. The LR cut-off at this point is
58.54. The sensitivity at this cut-off for the rate 0.025 is 0.000545. This rate is thus chosen as
the exemplar rate to represent the method in comparisons and to utilise for further analysis.
The findings of (Barker et al. 2007) were borne out in this investigation as lower rates of gain
were generally seen as the best performers.
Over the training data constrained ML (Barker et al. 2007; Barker and Pagel 2005)
with the rate of gain constrained to 0.025 makes five predictions from the positive set shown
in Table 3.1
RefSeq Accessions | Annotation Protein A | Annotation Protein B | Interaction type | Verified By
NP_001789, NP_004144 | Cyclin dependent kinase 2 | Origin recognition complex subunit 1 | Direct | Protein microarray (Ramachandran et al. 2004)
NP_005617, NP_006266 | Splicing factor, arginine/serine-rich | Splicing factor, arginine/serine-rich 6 | Complex | Site-directed mutagenesis (Monsalve et al. 2000)
NP_001789, NP_001790 | Cyclin dependent kinase 2 | kinase 7 | Direct | In-vitro experimentation (Garrett et al. 2001)
NP_001347, NP_003391 | DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 3 | Exportin 1 | Direct | In-vivo/in-vitro experimentation (Yedavalli et al. 2004)
NP_066953, NP_000935 | Peptidyl-prolyl cis-trans isomerase A | Serine/threonine-protein phosphatase 2B catalytic subunit alpha | Direct | Yeast 2-hybrid (Stelzl et al. 2005)
Table 3.1: True positive proteins predicted by constrained ML (Barker et al. 2007; Barker
and Pagel 2005) from the training data with rate of gain constrained to 0.025 at an LR cutoff
of 58.54.
The examination of the training datasets yielded a similar result to that of Barker et al.
(2007), in that lower rates of gain tended to perform better.
As 0.025 was selected as the optimum rate of gene gain over the training data it was
also tested on the testing data to cross validate this selection. Figure 3.5 shows the results of
this cross validation check.
Figure 3.5: Performance of constrained ML (Barker et al. 2007; Barker and Pagel 2005) over
test data with rate of gain constrained to 0.025.
Figure 3.5 shows that constraining the rate of gain to 0.025 also achieves a precision
of 1 over the testing data. This precision occurs at an LR cutoff of 53.3. The sensitivity at
this point is 0.00054. At this cutoff constrained ML (Barker et al. 2007; Barker and Pagel
2005) makes 5 predictions that are true positives as shown in Table 3.2.
RefSeq Accessions | Annotation Protein A | Annotation Protein B | Interaction type | Verified By
NP_002583, NP_001347 | proliferating cell nuclear antigen | ATP-dependent RNA helicase DDX3X isoform 1 | Direct | In-vitro experimentation
NP_001118, NP_001119 | adaptor-related protein complex 1 beta 1 subunit isoform a | AP-1 complex subunit gamma-1 isoform b | Direct | Yeast 2-hybrid (Takatsu et al. 2001)
NP_000391, NP_001790 | TFIIH basal transcription factor complex helicase XPD subunit isoform 1 | kinase 7 | Direct | In-vitro experimentation
NP_005517, NP_004497 | protein 1 | protein 2 isoform a | Direct | In-vivo/in-vitro experimentation (He et al. 2003)
NP_066953, NP_000936 | peptidyl-prolyl cis-trans isomerase A | calcineurin subunit B type 1 | Complex | In-vitro experimentation (Huai et al. 2002)
Table 3.2: True positive proteins predicted by constrained ML (Barker et al. 2007; Barker
and Pagel 2005) from the testing data with rate of gain constrained to 0.025 at an LR cutoff
of 50.217.
3.3.2.1 Likelihood ratio statistic
The likelihood ratio statistic (LR) derived from the comparison of the independent and
dependent models of evolution is asymptotically distributed as a χ² variate with degrees of
freedom equal to the difference in the number of parameters between the two models, which in
this case equals 4, under assumptions about the size of the phylogeny and the speed of
evolution of the character under consideration (Barker and Pagel 2005; Pagel 1997). Thus if
the LR falls within the critical region of the distribution it is considered significant. A
histogram showing the theoretical χ² distribution with 4 degrees of freedom is shown below
in Figure 3.6.
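Under the asymptotic assumption described above, the significance of a given LR can be read from the upper tail of the χ² distribution with 4 degrees of freedom, which has the closed form exp(-x/2)(1 + x/2). The sketch below uses that closed form; the function name is an assumption, and the result is only meaningful to the extent that the asymptotic assumption holds for the data in question.

```python
import math


def chi2_sf_4df(x):
    """Upper-tail probability P(X >= x) for a chi-squared variate with 4 d.f."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    return math.exp(-half) * (1.0 + half)


# The conventional 5% critical value for 4 degrees of freedom is about 9.488.
p_at_critical = chi2_sf_4df(9.4877)

# Tail probability at the LR cut-off of 58.54 reported for the training data.
p_at_cutoff = chi2_sf_4df(58.54)
print(p_at_critical, p_at_cutoff)
```

An LR greater than roughly 9.49 would therefore fall in the 5% critical region of the theoretical distribution.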
The distribution of LRs in the positive and negative set as well as over the combined
training data differs from this theoretical distribution as can be seen in Figures 3.7, 3.8, 3.9
and 3.10.
Figure 3.7: Distribution of likelihood ratio statistic for constrained ML with the rate
of gain constrained to 0.025 over the positive training set.
Figure 3.8: Distribution of likelihood ratio statistic for constrained ML with the rate
of gain constrained to 0.025 over the negative training set.
Figure 3.9: Distribution of likelihood ratio statistics for constrained ML (Barker et al.
2007; Barker and Pagel 2005) with the rate of gain constrained to 0.025 over the complete
training dataset.
Minimum         0.08932
1st Quartile    7.85
Median          10.51
Mean            11.14
Maximum         74.80
Table 3.3: Descriptive statistics for the distribution of likelihood ratios for the rate of gain
0.025 over the complete training data.
Figure 3.10: Distribution of likelihood ratio statistics for constrained ML within the
rate of gain 0.025 over the complete training dataset, the positive training dataset and the
negative training dataset compared with the theoretical %2 distribution with 4 degrees of
freedom.
The distribution of LR statistics over the training data therefore appears to differ from the
theoretical χ² distribution with 4 degrees of freedom.
This difference was also tested via a two-sample Kolmogorov-Smirnov test for
goodness of fit between a generated theoretical χ² distribution with 4 degrees of freedom and
the LR statistic score distribution over the training data using R (R Development Core Team
2011). This also showed a difference between the two distributions (D = 0.9993,
p-value < 2.2e-16).
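The comparison above can be illustrated with a minimal sketch. The analysis itself was run in R; the Python below is only an illustration using the standard library, and the "observed" sample is a hypothetical stand-in (a χ² sample shifted upwards), not the training data. It computes the D statistic only, not the p-value.

```python
import random
from bisect import bisect_right


def ks_two_sample_statistic(sample_a, sample_b):
    """Largest vertical gap between the two empirical CDFs (the D statistic)."""
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The largest gap between two step functions is reached at an observed value.
    for value in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, value) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, value) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def chi_square_sample(df, size, rng):
    """Draw from chi-square(df), i.e. a gamma distribution with shape df/2 and scale 2."""
    return [rng.gammavariate(df / 2.0, 2.0) for _ in range(size)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
theoretical = chi_square_sample(4, 2000, rng)
# Hypothetical stand-in for the observed LR scores: a chi-square sample shifted upwards.
observed = [value + 8.0 for value in chi_square_sample(4, 2000, rng)]
print("D =", round(ks_two_sample_statistic(observed, theoretical), 3))
```

A value of D close to 1 means that the two empirical distributions barely overlap, which is the situation reported for the training data.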
This may be due to a violation of the assumptions of the model with regard to the speed
of character transition.
The overall frequency of higher LR statistics does appear to be higher in the positive
set, which is further validation for the constrained ML method.
3.3.3 Co-expression of mRNA
The results for each microarray experiment measured over the training data are given below
in Figure 3.11.
Figure 3.11: Precision/ sensitivity results for 377 microarray experiments over the
training datasets.
As before the area of interest in Figure 3.11 is the point at which precision equals 1.
This is because the average correlation between transcript abundance and peptide abundance
has been observed to be fairly low in primates at around 0.33 (Fu et al. 2007). Thus mRNA
co-expression is unlikely to be capable of high sensitivities in protein-protein interaction
detection. Figure 3.12 is a magnification of this area.
Figure 3.12: precision/ sensitivity results for microarray experiments over the training
datasets magnified to a scale of sensitivity ranging from 0 to 0.01.
Mean precision over all 377 microarray experiments was 0.2141 and mean sensitivity
was 0.1195. Out of the 377 total 18 experiments achieved a precision of 1. Details of these
experiments are shown in Table 3.4.
Accession      Size    Description of experiment    Sensitivity
E-GEOD-4567    166                                  0.0006547359
               168                                  0.0003274752
E-GEOD-3183    255                                  0.0002183168
               266                                  0.0002183168
               474                                  0.0008728860
               28                                   0.0002182929
               191                                  0.0004366336
               538                                  0.0030511060
               254                                  0.0008728860
               87                                   0.0003274752
               212                                  0.0002182929
E-TABM-577     95                                   0.0001091584
               129                                  0.0001091584
               126                                  0.0008728860
               293                                  0.0014181302
               55                                   0.0002182929
               1225                                 0.0008731718
               333                                  0.0007641087
Figure 3.13: Precision/ sensitivity results for predictions from the PIPs server over six
cut-offs over the training dataset.
Figure 3.14: Precision/sensitivity results for predictions from the PIPs server over six
cut-offs over the training dataset, zoomed in to a maximum sensitivity of 0.15.
None of the score cut-offs over the predictions from the PIPs server achieved a full
precision of 1. However none of them fell under 0.9 either as seen in Table 3.5.
Cutoff     Predictions    Precision    Sensitivity
0.25       79441          0.9135546    0.14366504
1.00       37606          0.9395973    0.11068444
2.50       25598          0.9533333    0.09825928
25.00      5394           0.9949239    0.04742318
250.00     1232           0.9865772    0.01832689
2500.00    498            0.9883721    0.01067973
Table 3.5: Precision/sensitivity results for predictions from the PIPs server over six cut-offs.
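For reference, the two measures reported in the tables above can be computed as in the sketch below. It assumes the standard definitions (precision as the fraction of predictions that are true interactions, sensitivity as the fraction of true interactions that are predicted); the counts in the example are hypothetical and are not taken from the training data.

```python
def precision(true_positives, false_positives):
    """Fraction of predicted interactions that are true interactions."""
    predicted = true_positives + false_positives
    return true_positives / predicted if predicted else 0.0


def sensitivity(true_positives, false_negatives):
    """Fraction of true interactions that were predicted."""
    actual = true_positives + false_negatives
    return true_positives / actual if actual else 0.0


# Hypothetical counts: 100 predictions, 90 of them correct, out of 1000 true interactions.
print(precision(90, 10), sensitivity(90, 910))
```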
Figure 3.12: All methods compared over training dataset. Legend explanation (PIPs=PIPs
server, MA= microarray experiment and PP= phylogenetic profiling measuring correlation in
gain and loss over a phylogeny with constrained rate of gain).
3.4 Discussion
Arguably the best performing method out of all three methods is the PIPs server (McDowall
et al. 2009) as it achieves the highest rates of combined precision and sensitivity over the
training data. The success of the PIPs server in terms of accuracy and coverage is attributable
to its use of multiple, disparate sources of evidence. The other two methods both focus on
particular types of interactions.
Phylogenetic profiling measured with constrained ML over a phylogeny is limited to
proteins that have been gained and lost in a correlated fashion over a phylogeny. Thus protein
interactions between phylogenetically ubiquitous partners cannot be detected. Similarly it
cannot detect interactions between interactors with potentially redundant partners.
Microarrays are more flexible in the types of interaction they are capable of detecting.
However individual experiments are limited in the types of interactions that they can uncover
by the experimental conditions under which their constituent mRNAs were extracted. They
are also biased toward stable complexes (von Mering et al. 2002). Another limitation in the
use of microarray experiments in the prediction of protein interactions is the fact that
expression levels of a gene at the transcription level do not correlate strongly with overall
levels of protein production at the translational level (Gygi et al. 1999). This is due to
regulation at the posttranscriptional level by factors such as mRNA half-life, codon usage and
ribosome occupancy and density (Wu et al. 2008). The best performing microarray
experiments outperformed constrained ML in terms of sensitivity.
However given the difference in cost and labour intensiveness between a microarray
experiment and a computational analysis employing phylogenetic profiling, the latter can
clearly be a useful tool in the functional annotation of identified genes within a newly
sequenced genome.
3.4.1 Low Sensitivities
None of the methods as described and utilised above is particularly sensitive in
detecting protein-protein interactions. Constrained ML and gene co-expression are insensitive
to protein-protein interactions for the reasons described above.
The PIPs server as the best performer achieves a sensitivity of 0.14 at a high level of
precision. However this still corresponds to a 14% chance of detecting a possible protein
interaction despite its integration of various forms of supporting evidence. It is possible that it
is this integration of evidence that renders PIPs insensitive. If for example the likelihood ratio
returned by one of its predictor modules was high with the rest all being low, the overall
posterior odds ratio score would be low. Thus the individual sensitivities of the module
predictors are averaged out.
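The effect described above can be sketched numerically. The example assumes that the module scores are combined as a product of likelihood ratios multiplied by the prior odds, as in a naive Bayes framework; the prior odds and the likelihood ratios used here are hypothetical values chosen only for illustration.

```python
from math import prod


def posterior_odds(prior_odds, likelihood_ratios):
    """Posterior odds as prior odds times the product of per-module likelihood ratios."""
    return prior_odds * prod(likelihood_ratios)


PRIOR_ODDS = 0.001  # hypothetical prior odds of interaction

# One module strongly supports the interaction, the other three do not.
mixed_evidence = [50.0, 0.1, 0.2, 0.5]
print(posterior_odds(PRIOR_ODDS, mixed_evidence))

# The same strong module considered on its own.
print(posterior_odds(PRIOR_ODDS, [50.0]))
```

Under this assumption the single supportive likelihood ratio is outweighed by the three low ones, so the combined score falls below the prior odds even though one source of evidence was strong.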
It seems that maximising coverage of the interactome is beyond the scope of each of
the predictive methods considered in this chapter. To use the analogy of the interactome as a
dark room, none of these methods are equivalent to an overhead light that illuminates every
corner of the room. Rather each method is more like a lamp that casts a pool of light on its
immediate surroundings. It is only by lighting a number of these lamps that the entire room
can be illuminated.
Chapter 4
Design and implementation of data filter
4.1. Introduction
The constrained maximum likelihood (ML) method used to detect proteins which share
correlated evolutionary histories as described in Chapter 3 and in work by Barker et al.
(Barker et al. 2007; Barker and Pagel 2005) estimates values for parameters which model the
transition rates of the gain and loss of discrete characters (Pagel 1994) by integrating over all
possible ancestral states at each node within the phylogenetic tree.
As pointed out by Barker (Barker et al. 2007) placing a constraint on the rate of
acquisition of new proteins increases the ability of the likelihood method to discriminate
between proteins that interact and those that do not. The determination of an optimum rate of
gain reduces the scale of the problem of parameter estimation (Barker and Pagel 2005) as it
reduces the numbers of parameters to be fitted to 2 for the independent model and 4 for the
dependent model.
The detection of potential functional interactors for a single given protein using this
method is possible. However, given the low sensitivity of the method (see Chapter 3), the
probability of detecting a functional interaction for any given single protein or even a set of
proteins is low. A complete genome-wide survey, by contrast, would detect all protein pairs that
displayed evidence of correlated evolution.
The procedure is, however, prohibitively slow for a complete genome-wide survey
without access to a significant amount of computing power. A timed training run over the
training dataset for a single rate of gain took approximately 110 CPU-hours to conclude
54,906 comparisons of non-redundant phylogenetic profile pairs on a single core of a 3 GHz
dual-core Intel Xeon processor (see Section 4.5). As there are 60,615,555 possible
non-redundant pairs of phylogenetic profiles in the version of the human proteome currently
held, a full genome comparison would take 121,825.05 CPU-hours or 13.9 CPU-years on a
single core of the same processor. The speed of constrained ML (Barker
et al. 2007; Barker and Pagel 2005) was also measured in work presenting a genome order
based approach to phylogenetic profiling (Cokus et al. 2007). In this case it was found to
range between 5 and 15 seconds per pair of proteins (Cokus et al. 2007). This caused the authors
to utilise a subset of their data in their benchmarking study of constrained ML (Cokus et al.
2007).
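The extrapolation above can be reproduced approximately with the short calculation below. It uses the rounded figure of 110 CPU-hours, so it yields a slightly lower total than the 121,825.05 CPU-hours quoted, which presumably rests on the unrounded timing. The figure of 11,011 profiles is an inference from the pair count and is not stated in the text.

```python
from math import comb

TRAINING_PAIRS = 54_906      # comparisons in the timed training run
TRAINING_CPU_HOURS = 110.0   # rounded timing quoted in the text
GENOME_PAIRS = 60_615_555    # non-redundant profile pairs in the proteome
HOURS_PER_YEAR = 24 * 365


def extrapolate_cpu_hours(pairs, reference_pairs=TRAINING_PAIRS,
                          reference_hours=TRAINING_CPU_HOURS):
    """Scale the measured run time linearly with the number of pairs."""
    return pairs * reference_hours / reference_pairs


seconds_per_pair = TRAINING_CPU_HOURS * 3600 / TRAINING_PAIRS
genome_hours = extrapolate_cpu_hours(GENOME_PAIRS)
genome_years = genome_hours / HOURS_PER_YEAR

# Inference, not stated in the text: the pair count equals "11,011 choose 2".
inferred_profiles = 11_011
print(comb(inferred_profiles, 2) == GENOME_PAIRS)
print(round(seconds_per_pair, 2), round(genome_hours), round(genome_years, 1))
```

The per-pair time of roughly seven seconds falls within the range of 5 to 15 seconds per pair reported by Cokus et al. (2007).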
Potentially access to multi-core CPUs and/or computing clusters could ameliorate this
to a certain extent. As application of constrained ML (Barker et al. 2007; Barker and Pagel
2005) involves sequential comparison of pairs of phylogenetic profiles, it is a process that is
easily amenable to parallelisation via splitting the task into a smaller set of tasks, which can
be launched in parallel. Task farming is applied in computational biology to tasks that are
potentially intractable if tackled serially, e.g. analysis of gel electrophoresis data (Dowsey et
al. 2003) or analysis of microarray data (Hill et al. 2008). However, even with the application
of task farming it is clear that a full genome-wide survey is not feasible for this method on
any average-sized eukaryotic genome.
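A minimal sketch of task farming over profile pairs follows. The function `compare_profiles` is a hypothetical stand-in for a single constrained ML comparison, and the thread pool stands in for whatever worker pool or cluster scheduler would be used in practice; none of these names come from the original implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations, islice


def chunked(iterable, size):
    """Split an iterable into lists of at most `size` items (one list per task)."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def compare_profiles(profile_a, profile_b):
    """Hypothetical stand-in for one pairwise comparison: count of matching positions."""
    return sum(a == b for a, b in zip(profile_a, profile_b))


def run_task(chunk):
    """Process one task, i.e. one chunk of profile pairs."""
    return [(name_a, name_b, compare_profiles(prof_a, prof_b))
            for (name_a, prof_a), (name_b, prof_b) in chunk]


def task_farm(profiles, chunk_size=2, workers=4):
    """Farm all non-redundant profile pairs out to a pool of workers."""
    pairs = combinations(sorted(profiles.items()), 2)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(run_task, chunked(pairs, chunk_size))
        return [row for task_result in results for row in task_result]


example_profiles = {"P1": "0101", "P2": "0111", "P3": "1010"}
for row in task_farm(example_profiles):
    print(row)
```

Because each pair is scored independently of every other pair, the tasks need no communication with each other, which is what makes the problem straightforward to parallelise.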
This chapter details the development of a data filter to remove protein pairs that
display little or no evidence of correlated evolution. There are two main types of filter
evaluated. The first type is a simple distance based test (Hamming distance) as shown in
Chapter 3 and utilised in early work on phylogenetic profiling (Pellegrini et al. 1999).
Potentially proteins that display evidence of correlated evolution will have phylogenetic
profiles that have a lower Hamming distance from each other. Thus even though Hamming
distance applied in isolation performs poorly as seen in Chapter 3, it may serve as a filter for
proteins which do not display evidence of correlated evolution in combination with the
second type of filter.
The second type of filter will utilise a single set of reconstructed ancestral states. By
using a single set of reconstructed states and a simpler method for the detection of evidence
of correlated evolution, proteins that do not display any such evidence may be filtered out.
This chapter describes the implementation and comparison of five filters, which utilise a
single set of reconstructed ancestral states to detect signs of correlated evolution. As a large
amount of the computations performed by constrained ML (Barker et al. 2007; Barker and
Pagel 2005) involve estimation of the transition rate parameters by integrating over all
possible ancestral states, use of a single set of reconstructed ancestral states reduces the scope
of the problem. Through the use of an effective and accurate data filter a genome-wide
survey for an average eukaryotic organism could be rendered feasible.
The end product of the research described in this chapter is just such a filter, based on
logistic regression of a set of empirically evaluated predictors/parameters which reflect
correlated evolution between a pair of proteins. The filter is approximately 2208 times faster
than constrained ML and achieves a reasonable degree of precision/sensitivity over the
training data in its own right. Thus application of this filter can facilitate a heuristic search
for genes/proteins displaying evidence of correlated evolution over an entire
genome/proteome.
In order to describe the process of filter development/evaluation it will firstly be
necessary to present an overview of ancestral state reconstruction.
4.1.1 Ancestral state reconstruction
The procedures involved in the reconstruction of the states of characters and traits in extinct
ancestral species are similar to those involved in phylogeny reconstruction. This is due to the
similarity of the issues involved. The reconstruction procedures for character states thus
utilise similar criteria with which to judge putative reconstructions. Ancestral reconstruction
is a useful tool for investigating hypothetical evolutionary scenarios and has been used to
investigate many biological questions, for example the demonstration of homoplasy in
the evolution of lysozyme (Malcolm et al. 1990; Messier and Stewart 1997; Stewart et al.
1987). It is also a prerequisite step for a number of comparative method tests (Maddison
1990; Ridley 1983).
4.1.1.1 Parsimony
A parsimonious reconstruction of ancestral states over a phylogenetic tree would
entail the selection of the internal states that minimise change. Thus if, for example, two
terminal nodes within a given clade had the same state, that state would be
assigned to the node immediately preceding them.
Algorithms such as the Fitch (Fitch 1971) and Sankoff (Sankoff 1975) algorithms as
described in Chapter 2 are employed as a step within phylogeny reconstruction (Albert
2006). However given a particular already constructed phylogenetic tree they can be
employed to reconstruct a set of ancestral node values which minimises evolutionary change
over that particular tree (Felsenstein 2004). The algorithms themselves do not reconstruct
individual states at each internal node but instead construct sets of potential states at each
node. These potential states can be resolved into a singular state reconstruction through the
application of algorithms such as ACCTRAN (Accelerated transformation) (Swofford and
Maddison 1987), which reconstructs ancestral states by placing points of change as close to
the root of the tree as possible (Agnarsson and Miller 2008). The converse approach to
ACCTRAN is DELTRAN (delayed transformation) (Swofford and Maddison 1987), which
reconstructs ancestral states by placing points of change as close to the tips of the tree as
possible (Agnarsson and Miller 2008). ACCTRAN and DELTRAN are the most commonly
used methods for collapsing node state sets into individual node states, though of the two
ACCTRAN is the more widely employed (Agnarsson and Miller 2008).
Parsimony methods fail to consider different branch lengths in different parts of the
tree (Yang et al. 1995). Parsimony based methods have also been criticised for their lack of
statistical soundness (Elias and Tuller 2007). Parsimony methods are also unable to
distinguish between reconstructions that are equally parsimonious (Koshi and Goldstein
1996).
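As an illustration of how parsimony builds sets of potential states, the sketch below implements the bottom-up pass of the Fitch algorithm for a single character on a fixed binary tree. The tree encoding (nested tuples with tip names as strings) and the example data are assumptions made for this sketch; resolving the sets into single states with ACCTRAN or DELTRAN is not shown.

```python
def fitch(node, tip_states):
    """Return (set of candidate states at this node, minimum number of changes below it)."""
    if isinstance(node, str):  # a tip: its state is observed
        return {tip_states[node]}, 0
    left, right = node
    left_set, left_changes = fitch(left, tip_states)
    right_set, right_changes = fitch(right, tip_states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state, so no extra change is needed here.
        return shared, left_changes + right_changes
    # The children disagree: keep both options and count one change.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical four-taxon tree and presence/absence pattern.
tree = (("A", "B"), ("C", "D"))
states = {"A": 1, "B": 1, "C": 0, "D": 1}
print(fitch(tree, states))
```

In this example the clade (C, D) receives the ambiguous set {0, 1}, which is the kind of node that ACCTRAN and DELTRAN resolve differently.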
4.1.1.2 Likelihood
Just as likelihood is employed as an optimality criterion for phylogeny
generation, it can also be used in the context of ancestral state reconstruction. Maximum
likelihood techniques are used to estimate the parameters of the specified model of evolution
(Yang 2006). Once these parameters are estimated they can be utilised to calculate the
posterior probability of ancestral states using Bayes' theorem (Yang 2006). The state with the
highest posterior probability is then assigned to the node under consideration. This procedure
has been defined as empirical Bayes (Yang 2006). Empirical Bayes can be used either to
assign a character state to a single node in a tree, a process known as marginal
reconstruction, or to assign a set of character states to all ancestral nodes in the tree
simultaneously (Yang 2006). This latter process is known as joint reconstruction (Yang 2006).
Empirical Bayes can be contrasted with hierarchical Bayes where rather than estimating a
single value for the parameters of a model of evolution a prior probability distribution is
assigned for each unknown parameter (Yang 2006). The posterior probability for a given
ancestral state is then calculated by integrating over all possible values of parameters
(Huelsenbeck and Bollback 2001). Again the putative state with the highest posterior
probability is then assigned to each ancestral node.
Work by Koshi and Goldstein used the empirical Bayes method to reconstruct the
sequence of ancestral ribonuclease (Koshi and Goldstein 1996). The performance of
parsimony and the empirical Bayes method was also compared in a reconstruction of
lysozyme c by Yang et al. (Yang et al. 1995). This work found that empirical Bayes
outperformed parsimony but both methods suffered when the sites within the multiple
alignments being reconstructed were highly variable and the distance from the ancestral
nodes to the extant species was high (Yang et al. 1995).
An interesting application of empirical Bayes reconstruction was carried out by
Gaschen et al. (Gaschen et al. 2002). This work entailed reconstruction of the
sequence of the ancestor to various regional variants of the HIV-1 virus in order to contribute
to the creation of a potential vaccine (Gaschen et al. 2002).
4.2 Filters
4.2.1 Hamming distance filter
The original work which introduced the methodology of phylogenetic profiling as a means of
detection of functional interaction between genes (Pellegrini et al. 1999) utilised Hamming
distance (Hamming 1950) as a measure of similarity of profiles. Kensche et al. also
examined this method in a review of phylogenetic profiling methods, and found it to perform
reasonably well over a dataset composed of the protein sequences of 25 fungi (Kensche et
al. 2008). Hamming distance did not perform well over the training data, as seen in Chapter 3.
However, it was possible that it could reduce the search space for an application of
constrained ML. As a potential heuristic it offers speed, as Hamming distance is one of the
simplest comparisons that can be carried out between two strings. Hamming distance
was therefore investigated as a potential filter, possibly to be used in conjunction with a filter
based on a single set of reconstructed states.
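The Hamming distance between two phylogenetic profiles, and its use as a filter, can be sketched as follows. Profiles are assumed to be equal-length strings of 0s and 1s; the threshold is a hypothetical parameter and not a value used in this work.

```python
def hamming_distance(profile_a, profile_b):
    """Number of positions (species) at which two profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same set of species")
    return sum(a != b for a, b in zip(profile_a, profile_b))


def passes_filter(profile_a, profile_b, max_distance):
    """Keep a pair for further analysis only if its profiles are close enough."""
    return hamming_distance(profile_a, profile_b) <= max_distance


print(hamming_distance("010101", "010111"))
print(passes_filter("010101", "101010", max_distance=2))
```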
4.2.2 Ancestral state reconstruction filter
The first consideration in the development of a heuristic/filter based on a single set of
reconstructed characters was which criterion to use to reconstruct that set. Likelihood as a
criterion yields more accurate results as discussed above. However as the aim of this heuristic
approach was to develop a method that reduced the search space for an application of the
computationally intensive constrained ML (Barker et al. 2007; Barker and Pagel 2005) to
phylogenetic profiling, it was decided to use the simpler though less accurate criterion of
parsimony.
4.2.2.1 Dollo parsimony
Dollo parsimony operates under the assumption that once a complex trait has been lost it
cannot be re-acquired (Albert 2006). Given that the character under investigation is the
presence and absence of genes/proteins in eukaryotic organisms it was decided that Dollo
parsimony was the appropriate variant to use. Dollo parsimony has been previously used to
investigate the propensity of particular genes to be lost over the course of evolutionary time
in eukaryotes (Krylov et al. 2003). It was chosen by the authors due to the relative rarity of
lateral gene transfer events in eukaryotes (Krylov et al. 2003).
Dollo parsimony has also been utilised to investigate gene gain in poxviruses
(McLysaght et al. 2003). The results of this use however may have been affected by the fact
that poxviruses were later observed to acquire genetic material from infected hosts (Hughes
and Friedman 2005). Kensche also evaluated the efficacy of Dollo reconstructions of profiles
as a method of phylogenetic profiling (Kensche et al. 2008). Kensche utilised a distance
measure d(A,B) between the Dollo parsimonious reconstructions of the phylogenetic profiles
of two (orthologous groups of) proteins A and B calculated as:
$$ d(A, B) = \sum_{i \in branches} \left| \left( anc(a_i) - desc(a_i) \right) - \left( anc(b_i) - desc(b_i) \right) \right| \tag{1} $$
where branches denoted the set of branches in the phylogenetic tree, $anc(a_i)$ was defined as
the state of orthologous group A at the ancestral node of branch i, $desc(a_i)$ was defined as the
state of orthologous group A at the descendant node of branch i, $anc(b_i)$ was defined as the
state of orthologous group B at the ancestral node of branch i and $desc(b_i)$ was defined as the
state of orthologous group B at the descendant node of branch i (Kensche et al. 2008). The
distance d(A,B) was a count of branches where either orthologous group was gained or lost
independently. The method performed as well as more sophisticated techniques on the data
analysed by Kensche (Kensche et al. 2008).
One of the methods evaluated by Barker as a potential source of signal for correlated
evolution was also examination of Dollo parsimony based reconstructions of phylogenetic
profiles over a phylogeny (Barker et al. 2007). Dollo parsimony was utilised as it reflected
the idea of setting the rate of acquisition of a complex trait (in this case a protein) to a preset
low level (Barker et al. 2007). Pairs of proteins were scored on branches of the tree where
they were jointly lost and jointly gained to form a score referred to as Dollo-pos (Barker et al.
2007). Branches where proteins were not gained or lost together were also counted and
subtracted from Dollo-pos to form a score referred to as Dollo-overall (Barker et al. 2007).
Both these scores however did not perform particularly well over the data examined (Barker
et al. 2007). Dollo-overall however performed significantly better than Dollo-pos (Barker et
al. 2007).
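The two scores can be sketched as below, following the verbal description above rather than the original implementation. Each protein is represented by a list of (ancestral state, descendant state) pairs, one per branch and in the same branch order for both proteins; this encoding and the example data are assumptions made for illustration, as is the treatment of opposite events (a gain in one protein and a loss in the other) as not occurring together.

```python
def dollo_scores(branches_a, branches_b):
    """Return (Dollo-pos, Dollo-overall) for two proteins reconstructed on the same tree."""
    together = 0  # branches where both proteins are gained, or both are lost
    apart = 0     # branches where a change is not shared by the two proteins
    for (anc_a, desc_a), (anc_b, desc_b) in zip(branches_a, branches_b):
        event_a = desc_a - anc_a  # +1 gain, -1 loss, 0 no change
        event_b = desc_b - anc_b
        if event_a != 0 and event_a == event_b:
            together += 1
        elif event_a != 0 or event_b != 0:
            apart += 1
    return together, together - apart


# Hypothetical reconstructions over three branches.
protein_a = [(0, 1), (1, 1), (1, 0)]
protein_b = [(0, 1), (1, 0), (1, 0)]
print(dollo_scores(protein_a, protein_b))
```

In this example the two proteins are gained together on the first branch and lost together on the third, while the second branch carries a loss of the second protein only.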
Thus given the fact that Dollo parsimony based tests had been moderately successful
at detecting correlated evolution, a series of potential data filters/heuristics for examination
of phylogenetic profiles using constrained ML (Barker et al. 2007; Barker and Pagel 2005)
based on a single set of reconstructed ancestral states over the phylogeny using Dollo
parsimony were investigated.
4.2.2.2 Maddison Test for correlated evolution
To use the reconstructed ancestral state data a test to detect correlated evolution using the
comparative method that utilised a set of reconstructed ancestral states over a given
phylogenetic tree was needed. One candidate test was a contingency table based test
presented by Ridley where a gain or loss of a character was considered in the light of whether
it occurred in the presence or absence of another character over a phylogenetic tree (Ridley
1983). This test however does not separate which character is dependent and which is
independent.
A second candidate test considered was a procedure described by Wayne Maddison
(Maddison 1990) for the comparison of the association of changes in one binary character
with the given state of another. This test was designed to carry out this analysis assuming a
given phylogenetic tree and a set of reconstructed characters (Maddison 1990). This test has
been referred to as a test for concentrated changes (Felsenstein 2004).
The fundamental idea behind the Maddison test is to test whether changes in one trait
or character are concentrated in an area of a tree where a second trait or character is in a given
state. As an illustrative example consider a fictional monophyletic group of related cow-like
animals. These animals do not possess horns. The phylogenetic relationships of these
animals are fully resolved and understood as well as the ancestral states for all morphological
and molecular traits. Now imagine that this group overall has no ability to metabolise valine.
Finally imagine the ability to metabolise valine is independently acquired by a sub-clade of
our fictional group and this leads to the development of horns in this sub-clade.
If we wished to test whether the ability to metabolise valine leads to horn
development, the Maddison test would return the probability of the observed configuration of
valine metabolism / horn presence. This probability would be calculated by firstly calculating
the total number of ways to acquire horns in the presence of the ability to metabolise valine
over the phylogenetic tree. Secondly the number of ways to acquire horns over the entire tree
irrespective of the state of the ability to metabolise valine is calculated. By dividing the first
value by the second a probability can be calculated. If horns are concentrated in parts of the
tree where valine metabolism is also present this probability will be lower.
To reiterate the test works through counting all possible ways of having a set of
observed changes in a character over a phylogeny and then counting how many ways there
are of having the same number of changes in parts of the tree where a second character is in a
given state. Thus if correlated evolution is occurring changes in the first character will be
concentrated in areas of the tree where the second character is in the causative state. Consider
as a second example two proteins, which carried out the same function. If the presence of the
first protein made the second protein redundant then losses in the second protein could be
concentrated in areas where the first protein was present.
The drawbacks of the test are the fact that it treats all forms of evolutionary change as
equally likely and its inability to take into account branch lengths (Pagel 1994). However as
the motivation behind the implementation of the test was its use as a simple data filter to
remove protein pairs that showed little or no evidence of correlated evolution it was decided
that the Maddison test (Maddison 1990) was an appropriate test.
4.3 Methods
To create Dollo parsimony based reconstructions of each phylogenetic profile over the
phylogeny presented in Chapter 2, the program DOLLOP from the PHYLIP package
(Felsenstein 1989) was used. The program implements the Dollo parsimony reconstruction
algorithm described in work by Farris (Farris 1977).
Given a binary trait T that can take on 2 possible values coded as [0,1], DOLLOP
implements Dollo parsimony by seeking to explain a given observed configuration of
presence and absence for T over a set of taxa over a phylogenetic tree by allowing one gain
(transition from 0 to 1) and multiple reversions (transition from 1 to 0) (Felsenstein 1989).
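The single-gain rule can be sketched as follows. This is not DOLLOP itself but a minimal illustration of the principle on a fixed tree: the trait is gained once, on the branch leading to the most recent common ancestor of all tips that carry it, and is lost wherever a clade below that point has no tip carrying it. The tree encoding (name, left, right) and the ladder-shaped example tree are hypothetical and are not the tree used in this work.

```python
def label(node):
    return node if isinstance(node, str) else node[0]


def children(node):
    return [] if isinstance(node, str) else list(node[1:])


def count_present(node, present, counts):
    """Record, for every node, how many tips below it carry the trait."""
    if isinstance(node, str):
        counts[node] = 1 if node in present else 0
    else:
        counts[label(node)] = sum(count_present(c, present, counts) for c in children(node))
    return counts[label(node)]


def dollo_states(tree, present):
    """Return {node name: 0 or 1} under a single gain followed by losses only."""
    counts = {}
    total = count_present(tree, present, counts)
    states = {}

    def assign(node, inside):
        name = label(node)
        if not inside and counts[name] == total:
            # The gain sits at the deepest node that still covers every tip with the trait.
            inside = all(counts[label(c)] < total for c in children(node))
        states[name] = 1 if inside and counts[name] > 0 else 0
        for child in children(node):
            assign(child, inside)

    assign(tree, False)
    return states


# Hypothetical ladder tree with tips A..F and the pattern 010101 (B, D and F present).
tree = ("n1", "A", ("n2", "B", ("n3", "C", ("n4", "D", ("n5", "E", "F")))))
print(dollo_states(tree, {"B", "D", "F"}))
```

On this example the reconstruction needs one gain (on the branch leading to n2) and two losses (on the branches leading to C and E).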
As an illustrative example consider the tree below and a trait with the distribution
010101.
From
To
Changed
State
root
No
Absent
Entamoeba
No
Absent
histolytica
1
No
Absent
Trichomonas
No
Absent
vaginalis
2
No
Absent
No
Absent
The main object in the preceding figure is the Transition Matrix object. This object has 2
main attributes and 2 main functions.
The States: This is a list of Transition objects. Transition objects contain the same 4
attributes as shown in Table 4.1.
The Position Map: This is a Tree Map, which contains a position within the tree as a
key and the state of a given trait at that position as a value. Thus this attribute can be
queried for the state (present or absent) of a given trait at any point in the tree.
Calculate clade: This function returns all parts of a tree descended from a given node.
Thus if a trait is gained or lost at Node n, the function will return the monophyletic
group consisting of n and all its descendants.
Create Position Map: This function traverses the States list and utilises the Calculate
clade function to populate the Position Map.
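The structure described above can be sketched as follows. This is an illustrative Python rendering and not the original implementation: the field names, the `children` dictionary used to describe the tree, and the assumption that the States list is ordered from the root towards the tips are all hypothetical, and a plain dictionary stands in for the Tree Map.

```python
from dataclasses import dataclass, field


@dataclass
class Transition:
    source: str    # "From" node of the branch
    target: str    # "To" node of the branch
    changed: bool  # whether the trait changes state along the branch
    state: str     # state of the trait at the "To" node ("Present" or "Absent")


@dataclass
class TransitionMatrix:
    states: list    # Transition objects, assumed ordered from the root towards the tips
    children: dict  # hypothetical tree description: node -> list of child nodes
    position_map: dict = field(default_factory=dict)

    def calculate_clade(self, node):
        """Return the node together with everything descended from it."""
        clade = [node]
        for child in self.children.get(node, []):
            clade.extend(self.calculate_clade(child))
        return clade

    def create_position_map(self):
        """Fill the position map; a change on a branch applies to the whole clade below it."""
        for transition in self.states:
            if transition.changed or transition.target not in self.position_map:
                for position in self.calculate_clade(transition.target):
                    self.position_map[position] = transition.state


# Hypothetical five-branch example.
tree_children = {"root": ["1"], "1": ["A", "2"], "2": ["B", "C"]}
matrix = TransitionMatrix(
    states=[
        Transition("root", "1", False, "Absent"),
        Transition("1", "A", False, "Absent"),
        Transition("1", "2", True, "Present"),
        Transition("2", "B", False, "Present"),
        Transition("2", "C", True, "Absent"),
    ],
    children=tree_children,
)
matrix.create_position_map()
print(matrix.position_map)
```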
Define $W_{root}(x, y \mid b)$ as the total number of ways to have x gains and y losses of
character A over the tree starting at the root node given that the state of character A is b at the
root of the tree.
Define $B_{root}(p, q \mid x, y, b)$ as the total number of ways to have p gains and q losses of
character A in subset k given x gains and y losses over the entire tree starting at the root node
given that the state of character A is b at the root of the tree.
The test for correlated evolution is thus calculated by
$$ p(obs) = \frac{B_{root}(p, q \mid x, y, b)}{W_{root}(x, y \mid b)} \tag{2} $$
Solving Equation 2 provides the probability p(obs) of having p gains and q losses of
character A in subset k given that a total of x gains and y losses occur over the whole tree under
the null hypothesis of no correlated evolution. If gains and losses of character A are in some
way dependent on whether character B is in state s then we could expect those gains and
losses to be concentrated in subset k. $W_{root}(x, y \mid b)$ and $B_{root}(p, q \mid x, y, b)$ are calculated
through the use of a dynamic programming approach starting at the tips of the tree and
proceeding in a post order fashion (Maddison 1990).
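For very small trees the two counts can be obtained by brute force, which may help to make the definitions concrete. The sketch below enumerates every way of placing at most one change on each branch, tallies gains and losses overall and within subset k, and divides the two counts. The edge-list encoding, the node names and the example subset are hypothetical; the dynamic programming approach of Maddison (1990) is what makes the calculation practical on real trees.

```python
from collections import Counter
from itertools import product


def count_scenarios(edges, root_state, subset):
    """Tally scenarios by (gains, losses, gains in subset, losses in subset).

    `edges` lists (parent, child) branches with every parent appearing before its
    children; `subset` holds the child nodes whose branches lie in subset k.
    """
    tally = Counter()
    for pattern in product((False, True), repeat=len(edges)):
        state = {"root": root_state}
        gains = losses = sub_gains = sub_losses = 0
        for (parent, child), changed in zip(edges, pattern):
            current = state[parent]
            if changed:
                if current == 0:
                    gains += 1
                    sub_gains += child in subset
                else:
                    losses += 1
                    sub_losses += child in subset
                current = 1 - current
            state[child] = current
        tally[(gains, losses, sub_gains, sub_losses)] += 1
    return tally


def p_obs(edges, root_state, subset, x, y, p, q):
    """Ways with p gains and q losses in the subset, divided by all ways with x gains, y losses."""
    tally = count_scenarios(edges, root_state, subset)
    w = sum(n for (g, l, _, _), n in tally.items() if (g, l) == (x, y))
    b = tally[(x, y, p, q)]
    return b / w if w else 0.0


# Hypothetical tree ((A, B), C); subset k contains the branches leading to A and B.
EDGES = [("root", "N"), ("N", "A"), ("N", "B"), ("root", "C")]
print(p_obs(EDGES, 0, {"A", "B"}, x=1, y=0, p=1, q=0))
```

With a root state of 0 there are four ways of placing a single gain on this tree, two of which fall within the subset.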
4.3.1.2 Calculation of total number of ways of having x gains and y losses over the tree
In order to calculate $W_{root}(x, y \mid b)$ over the entire tree for a character A, a matrix containing
the number of ways of having 0 to x gains and 0 to y losses for both potential values of b (0 or
1) has to be calculated for each node in the tree.
For a leaf node there are 0 ways of having x gains and y losses at the node for all
values of x and y which are greater than 0. There is one way of having 0 gains and 0 losses at
a leaf node.
For a non-leaf node K there are four calculations to make. A non-leaf node is only
processed after both its descendants have been visited. Firstly assume that the state of
character A at K is 0 and that all gains and losses occur after the node's immediate
descendants L and M. The number of ways of having x gains and y losses at node K given a
state of 0 can then be calculated by the expression
$$ \sum_{i=0}^{x} \sum_{j=0}^{y} W_L(i, j \mid 0) \times W_M(x - i, y - j \mid 0) $$
This is based on the principle that for every way of having i gains and j losses on node L
there are (x-i) gains and (y-j) losses on node M. Thus if for example there was 1 gain and 1
loss to distribute over node K then if both of them occurred post descendant L then no changes
would occur post descendant M. If only the gain occurred post L then the loss would occur
post node M.
The second part of this calculation is based on the assumption that one of the changes
occurs between K and one of its child nodes, for example M. As the state of character A at
node K is 0 the change is a gain, so the state of character A at node M is now 1 and one of
the x gains has already occurred. Thus the number of ways to have the remaining number of
gains and losses can be calculated by the expression
$$ \sum_{i=0}^{x-1} \sum_{j=0}^{y} W_L(i, j \mid 0) \times W_M(x - 1 - i, y - j \mid 1) $$
The third part of the calculation covers the eventuality that the change happens between K
and its other child L. Thus the number of ways remaining to have x gains and y losses is
calculated by the expression
$$ \sum_{i=0}^{x-1} \sum_{j=0}^{y} W_L(i, j \mid 1) \times W_M(x - 1 - i, y - j \mid 0) $$
Finally assume changes occur between K and both of its child nodes L and M. The states of
both nodes will be 1 and there will be two fewer gains to distribute over the remainder of
the tree. Thus the fourth part of the calculation is:
$$ \sum_{i=0}^{x-2} \sum_{j=0}^{y} W_L(i, j \mid 1) \times W_M(x - 2 - i, y - j \mid 1) $$
Summing up the results of these four expressions will provide the number of ways of
having x gains and y losses at non-leaf node K given that the state of character A is 0. The
calculation of the number of ways of having x gains and y losses if the state of character A is
1 at node K is a mirror image of the process described above (Maddison 1990).
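The recursion can be sketched as below. This is an illustrative implementation and not the code used in this work: the four cases, together with their mirror image for a node in state 1, are folded into a loop over the possible states of the two children, and the tree is encoded as nested tuples with tip names as strings (an assumption made for the sketch).

```python
def ways_table(node, max_gains, max_losses):
    """Return W[b][x][y]: ways of having x gains and y losses below a node in state b."""
    size_x, size_y = max_gains + 1, max_losses + 1
    table = {b: [[0] * size_y for _ in range(size_x)] for b in (0, 1)}
    if isinstance(node, str):
        # A tip has nothing below it: only "0 gains and 0 losses" is possible.
        table[0][0][0] = 1
        table[1][0][0] = 1
        return table
    left, right = node
    w_l = ways_table(left, max_gains, max_losses)   # processed before the parent
    w_m = ways_table(right, max_gains, max_losses)
    for b in (0, 1):
        for x in range(size_x):
            for y in range(size_y):
                total = 0
                for state_l in (0, 1):
                    for state_m in (0, 1):
                        # A child in a different state means a change on its branch.
                        changes = (state_l != b) + (state_m != b)
                        rest_x = x - (changes if b == 0 else 0)  # gains left to place
                        rest_y = y - (changes if b == 1 else 0)  # losses left to place
                        if rest_x < 0 or rest_y < 0:
                            continue
                        for i in range(rest_x + 1):
                            for j in range(rest_y + 1):
                                total += (w_l[state_l][i][j]
                                          * w_m[state_m][rest_x - i][rest_y - j])
                table[b][x][y] = total
    return table


# Hypothetical tree ((A, B), C) with up to 2 gains and 2 losses.
W = ways_table((("A", "B"), "C"), 2, 2)
print(W[0][1][0], W[0][1][1], W[0][2][0])
```

With the root in state 0, a single gain can be placed on any of the four branches, a gain followed by a loss must start on the branch leading to the clade (A, B), and two gains can be placed in four ways because a second gain cannot occur below the first.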
4.3.1.3 Calculation of total number of ways of having p gains and q losses in subset k
given x gains and y losses over the entire tree
This calculation of Broot ( p,q | x, y,b) is very similar to the one described above. As above a
matrix containing the number of ways of having 0 to p gains in subset k, 0 to q losses in
subset k given 0 to x gains and 0 to y losses over the whole tree for either potential values of b
!
(0 or 1) has to be calculated for each node in the tree.
For a leaf node there are 0 ways of having p gains and q losses in subset k given x
gains and y losses overall for all values of p, q, x and y which are greater than 0. There is one
way of having 0 gains and 0 losses in subset k given 0 gains and 0 losses overall.
As above a non-leaf node is only processed when both its children have been visited.
For a non-leaf node K with character A having state 0 with children L and M there are again
four calculations to be made. The first calculation counts the possibilities where both changes occur after the child nodes. This number is calculated through the expression
$$\sum_{i=0}^{x}\sum_{j=0}^{y}\sum_{f=0}^{p}\sum_{g=0}^{q} B_L(f, g \mid i, j, 0) \times B_M(p-f,\; q-g \mid x-i,\; y-j,\; 0).$$
The second calculation counts the possibilities where one of the changes occurs between node K and node M. Whether this change is counted as within subset k depends on whether node M lies within subset k. To facilitate calculation, Maddison (1990) defined a number $Z_M$, which is set to 1 if M lies within k and to 0 otherwise. Thus the second calculation is evaluated by the expression
$$\sum_{i=0}^{x-1}\sum_{j=0}^{y}\sum_{f=0}^{p-Z_M}\sum_{g=0}^{q} B_L(f, g \mid i, j, 0) \times B_M(p-f-Z_M,\; q-g \mid x-i-1,\; y-j,\; 1).$$
The third calculation counts the possibility of one of the changes occurring between node K and node L. This is
$$\sum_{i=0}^{x-1}\sum_{j=0}^{y}\sum_{f=0}^{p-Z_L}\sum_{g=0}^{q} B_L(f, g \mid i, j, 1) \times B_M(p-f-Z_L,\; q-g \mid x-i-1,\; y-j,\; 0).$$
The fourth calculation counts the possibilities where changes occur between K and L as well as K and M. This is
$$\sum_{i=0}^{x-2}\sum_{j=0}^{y}\sum_{f=0}^{p-Z_L-Z_M}\sum_{g=0}^{q} B_L(f, g \mid i, j, 1) \times B_M(p-f-Z_L-Z_M,\; q-g \mid x-i-2,\; y-j,\; 1).$$
The summation of the solutions of the four expressions yields the total number of
ways to have p gains and q losses of character A within subset k given x gains and y losses of
character A overall under node K given the state of character A is 0. As above this process is
mirrored when the state of character A is 1 (Maddison 1990).
4.3.1.4 Permutation effects
The Maddison test for correlated evolution (Maddison 1990) is potentially susceptible to two effects in the context of examination of protein phylogenetic profiles. Maddison's test was designed to test specific hypotheses about correlated evolution. For example, one of the first applications of the test was to data on the association of gregariousness in butterflies with unpalatable larvae (Sillén-Tullberg 1988; Maddison 1990). Thus in a pairwise comparison of characters one character is held static as a reference while the locations of changes in the other, dynamic character are examined over the tree. The terms static and dynamic shall be used in this sense in all subsequent references.
In the case of examinations of correlated evolution in phylogenetic profiles, however, it is not possible to state whether we are testing for the dependence of the distribution of protein A on the state of protein B or vice versa. The first effect is thus permutation.
The second effect is based on how subset k is defined. As phylogenetic profiles compare patterns of presence and absence of genes, subset k can be defined either as the presence of protein B or as the absence of protein B.
This second effect is however precluded, as defining subset k as the absence of protein B merely shifts the number of changes sought within k. Consider the tree shown in Figure 4.4. Suppose, for example, that protein A was gained once within the clade containing Species 1 and Species 2 and that protein B was present in that clade but nowhere else within the tree. The ancestral state of B would then be reconstructed parsimoniously as shown in Figure 4.5.
If k is defined as the presence of B then the test is investigating the probability of 1
gain within k with 1 gain over the entire tree. If k was defined as the absence of B then the
test is investigating the probability of 0 gains within k with 1 gain over the entire tree.
Thus over the sample tree, if k is defined as the presence of B, there are 3 ways to have 1 gain of A within k: one on the branch leading to Species 1, one on the branch leading to Species 2 and one on the branch leading to the clade. There are 9 ways of having 1 gain over the entire tree. Thus the probability of 1 gain in k is 3/9 or 0.33. If on the other hand k is defined as the absence of B then there are 0 gains within k with 1 gain overall. There is one gain to be accounted for, and this gain can only occur within the clade containing Species 1 and Species 2. Thus as before there are 3 ways of having one gain within that clade and 9 ways of having one gain over the whole tree, so the associated probability remains the same, i.e. 0.33.
Figure 4.4: Sample phylogeny of 5 hypothetical species. The numbers on the tree
represent presence and absence of protein B. The arrow points out the point post which
protein A was acquired.
Figure 4.5: Sample phylogeny of 5 hypothetical species. The black area of the tree
corresponds to a Fitch parsimonious reconstruction (Fitch 1971) carried out by Mesquite
(Maddison and Maddison 2010) of protein B if protein B has the phylogenetic profile 11000
(where the order of species in the profile is the same as the numerical order of the species). It
is also the Dollo parsimonious reconstruction. This black area corresponds to subset k if it is
defined as the presence of B. Conversely the white area of the tree corresponds to k if it is
defined as the absence of B.
A further example of this concept can be considered by using the initial example provided in 4.2.2.2 involving our fictional cow-like species. In that case the probability of acquiring horns in the presence of valine was calculated as 1/6. If we were to examine the probability of acquiring horns in the absence of valine then the denominator remains the same and there are 0 gains of horns in k (the area of the tree where the ability to metabolise valine is absent). There is 1 way of having 0 gains of horns. Thus the probability of the observed configuration remains the same.
Thus the results shown by the Maddison-Dollo test over the training data as described in Chapter 3 were identical with respect to the choice of how subset k is defined.
4.3.1.5 Evaluation of Maddison test as heuristic for constrained ML
The Maddison test (Maddison 1990) as described above, modified to accommodate the assumptions of Dollo parsimony (Section 4.3.2), was implemented in Java. This entailed writing 3,563 lines of code. The implementation was supplied with the set of reconstructed states for each protein pair and the phylogenetic tree on which they were reconstructed.
In order to remove permutation-based effects from the analysis, the training dataset as described in Chapter 3 was doubled so that it included protein pairs in both the A-B and the B-A orientations. The Maddison test was run on this expanded training set. Thus each protein pair in the training set had two associated probability scores. The lower of these two scores was selected, as the lower the probability of the observed distribution of gains and losses the stronger the evidence for correlated evolution. In order to use this probability as an ascending score the score was defined as 1-p. The test was run with subset k defined as the absence of trait B.
The ability of the test to detect protein interactions was then judged according to the
criterion of precision/sensitivity as defined in Chapter 3.
This process was then repeated with k defined as the presence of trait B in order to
verify the observation that there was no effect on whether k was defined as the absence or the
presence of trait B.
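The orientation-free scoring described in this section can be illustrated with a short sketch. The Python below is a minimal illustration under stated assumptions, not the Java implementation referred to above: the dictionary keyed by ordered protein pairs and the name `score_pairs` are hypothetical, and the probabilities are made-up example values.

```python
def score_pairs(p_values):
    """Collapse probabilities for both orientations of each pair into one
    ascending score, 1 - min(p(A-B), p(B-A)).

    `p_values` maps an ordered pair (a, b) to the probability returned by
    the test with `a` as the dynamic and `b` as the static character
    (an assumed convention for this sketch).
    """
    scores = {}
    for (a, b), p in p_values.items():
        key = tuple(sorted((a, b)))          # unordered pair identifier
        other = p_values.get((b, a), p)      # fall back if one orientation is missing
        scores[key] = 1.0 - min(p, other)
    return scores


example = {("A", "B"): 0.20, ("B", "A"): 0.05}
print(score_pairs(example))
```

Taking the minimum before subtracting from 1 means the pair is scored by whichever orientation gives the stronger evidence for correlated evolution.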
4.3.2 Modification of test to match Dollo constraints
The calculation of the null distribution under the standard Maddison approach (Maddison
1990) allows for all sequences and permutations of gains and losses as allowed by Fitch
parsimony (Fitch 1971). In order to reconcile the test to the assumptions of Dollo parsimony
the test was modified to remove all possibility of a gain following a loss. This was achieved
by examining the state of the character under consideration at the root node. If the root node
was 0 then the standard test as described in Equation 2 was utilised. However if the state was
1 then the following test was used.
p(obs) =
(3)
If a character is acquired at the root of the tree then no gains can be allowed to occur after a loss; thus a is calculated for 0 gains and y losses from the root.
To illustrate this difference consider the following tree.
Figure 4.6: Example tree to illustrate the imposition of Dollo parsimonious constraints
on the Maddison test for correlated evolution.
Assume the state of a character C was reconstructed as 1 at the root node of the tree
and there was one gain and one loss to be distributed over the tree. If gains were allowed to
follow losses then there are 4 ways of having one gain and one loss over the tree. These are:
A loss between node 2 and node 6 followed by a gain between node 6 and
node 7.
A loss between node 2 and node 6 followed by a gain between node 6 and
node 8.
A loss between node 2 and node 3 followed by a gain between node 3 and
node 4.
A loss between node 2 and node 3 followed by a gain between node 3 and
node 5.
However with the added Dollo parsimony constraint there are no ways of having one gain and one loss over the tree in Figure 4.6 if the state at the root node is 1.
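The counts in this example can be checked by brute force. The sketch below is illustrative Python rather than the code used in this work. It assumes the topology implied by the branches listed above (node 2 as the root with children 3 and 6, node 3 with children 4 and 5, node 6 with children 7 and 8), since the figure itself is not reproduced here; the function names are hypothetical.

```python
from itertools import product

# Topology inferred from the branches named in the text (an assumption).
CHILDREN = {2: (3, 6), 3: (4, 5), 6: (7, 8)}
ROOT = 2
BRANCHES = [(p, c) for p, kids in CHILDREN.items() for c in kids]
PARENT = {c: p for p, c in BRANCHES}
NON_ROOT = sorted(PARENT)


def follows_loss(node, loss_children):
    """True if a loss lies on the path from the root down to `node`."""
    while node != ROOT:
        if node in loss_children:
            return True
        node = PARENT[node]
    return False


def count_histories(root_state, gains, losses, dollo=False):
    """Enumerate all node-state assignments and count those with exactly
    the requested numbers of gains and losses."""
    ways = 0
    for states in product((0, 1), repeat=len(NON_ROOT)):
        state = dict(zip(NON_ROOT, states))
        state[ROOT] = root_state
        gain_branches = [(p, c) for p, c in BRANCHES
                         if state[p] == 0 and state[c] == 1]
        loss_children = {c for p, c in BRANCHES
                         if state[p] == 1 and state[c] == 0}
        if len(gain_branches) != gains or len(loss_children) != losses:
            continue
        # Dollo constraint: reject any history with a gain after a loss.
        if dollo and any(follows_loss(p, loss_children)
                         for p, _ in gain_branches):
            continue
        ways += 1
    return ways


print(count_histories(1, 1, 1))              # unconstrained count
print(count_histories(1, 1, 1, dollo=True))  # with the Dollo constraint
```

Exhaustive enumeration is only practical for very small trees, but it gives an independent check on the recursive counts.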
4.3.3 Differential parsimony
The distance as defined in work by Phillip Kensche (Kensche et al. 2008) and reiterated in
Section 4.3.1 was implemented and its performance examined over the training data.
4.3.4 Dollo-pos/ Dollo-overall
Both measures as described by Barker et al. (Barker et al. 2007) were implemented and
examined in the light of the testing data.
4.3.5 Test based on logistic regression
In order to test for correlated evolution, the reconstructed ancestral states were used to calculate potential predictor variables, which bore a correspondence to the transition rate parameters used by Barker and Pagel (Barker and Pagel 2005; Pagel 1994) as described in Chapter 3. These parameters represent the rates of transition in state for a discrete binary character given a particular state for a second discrete binary character.
Given a single set of reconstructed ancestral states, these transitions can be empirically counted over the reconstructed states. For example, if a protein is lost on a given branch of the phylogeny and a second protein is present on that branch according to the reconstructed states, this can be counted as a loss of one protein in the presence of the other.
The Dollo parsimony based reconstructions of each phylogenetic profile contain
within them the state of the associated protein at every given point in the tree. It was thus
possible to compare the state of any given ancestral branch within the tree for any two given
proteins. The Dollo reconstruction data is framed in terms of transitions between two nodes.
This meant that it was possible to compare a transition in the state of a given protein with the
state of the other protein at the same point in the tree.
In order to use the Dollo reconstructions of each phylogenetic profile as potential
predictors of functional interaction using logistic regression the reconstructions had to be
framed in terms of being potential predictors of correlated evolution. The possible states that
a protein could be in at any given transition in the tree were coded as:
0: Absent
1: Present
2: Lost
3: Gained
Each profile was associated with a matrix of transitions, which was constructed using
the reconstruction of the ancestral states of the profile over the tree. Pairwise comparisons
were then carried out. Thus at each transition point the state of protein A was compared to
that of protein B.
In order to avoid permutation effects the order in which protein pairs were considered
was made redundant. This was performed by framing transitions in terms of changes in
proteins as opposed to changes in a particular protein. If for example protein A was lost in
the presence of protein B this would be counted as the loss of a protein in the presence of
another, not the loss of A in the presence of B. Pairwise comparisons were thus carried out at
each node of the tree to create the predictors shown in Table 4.2. The lower case s stands for
scenario.
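The order-independent counting described above can be sketched as follows. This is an illustrative Python fragment, not the code used in this work: it assumes each profile has already been converted to a list of coded transition states (0 to 3) in the same branch order, and the names `scenario_counts` and `PREDICTORS` are hypothetical.

```python
from itertools import combinations_with_replacement

ABSENT, PRESENT, LOST, GAINED = 0, 1, 2, 3

# Ten unordered state pairs: s00, s01, s02, s03, s11, s12, s13, s22, s23, s33.
PREDICTORS = [f"s{a}{b}" for a, b in combinations_with_replacement(range(4), 2)]


def scenario_counts(states_a, states_b):
    """Count, over all transitions, how often each unordered pair of coded
    states occurs for two proteins."""
    if len(states_a) != len(states_b):
        raise ValueError("both profiles must cover the same transitions")
    counts = {name: 0 for name in PREDICTORS}
    for a, b in zip(states_a, states_b):
        low, high = sorted((a, b))   # sorting removes the A-B / B-A distinction
        counts[f"s{low}{high}"] += 1
    return counts


protein_a = [ABSENT, PRESENT, LOST, GAINED]
protein_b = [PRESENT, PRESENT, PRESENT, ABSENT]
print(scenario_counts(protein_a, protein_b))
```

Because the two states are sorted before counting, swapping the two proteins gives identical predictor values.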
Table 4.2 predictors: s00, s01, s02, s03, s11, s12, s13, s22, s23, s33.
Preliminary trials were carried out on the training data, which found the results of logistic regression and a linear discriminant function to be broadly similar. However the values of predictors associated with gains are not distributed normally, as the number of gains is restricted to 1. In such cases logistic regression is the recommended technique (Lei and Koehly 2003) as it makes no assumptions of normality regarding the distribution of the predictor variables. Thus logistic regression was selected as an appropriate method of testing whether the predictor variables contribute to the outcome of interaction as well as determining the degree to which they contribute. Logistic regression is carried out via an application of the logistic function, which can be defined as shown in Equation 4.
$$p(\text{Interaction}) = \frac{e^{a+bX}}{1+e^{a+bX}} \qquad (4)$$
Here a is the y-intercept of a regression line, e is the base of the natural logarithm and b is the coefficient of a predictor variable X for a set of predictors $(X_0, X_1, X_2, \ldots, X_i)$. The logistic function, which is also known as the sigmoid function, returns a value within the open interval (0, 1) for values in the range of real numbers from $-\infty$ to $+\infty$.
This probability is converted into the odds of an interaction versus no interaction with the expression $p/(1-p)$, where p is equal to the probability of an interaction (Sokal and Rohlf 1995).
Solving the expression, substituting Equation 4 as a value for p, yields the following equation (Sokal and Rohlf 1995).
$$\frac{p}{1-p} = e^{a+bX} \qquad (5)$$
Finally the odds of an interaction are converted into the log odds or logit of an interaction:
$$\ln\left(\frac{p}{1-p}\right) = a + bX \qquad (6)$$
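The chain from Equation 4 to Equation 6 can be checked numerically. The following Python sketch is for illustration only; the function names and the example values of a, b and X are assumptions and do not come from the fitted model.

```python
import math


def logistic(a, b, x):
    """Equation 4: probability from intercept a, coefficient b and predictor x."""
    z = a + b * x
    return 1.0 / (1.0 + math.exp(-z))   # algebraically equal to e^z / (1 + e^z)


def odds(p):
    """Odds of an interaction versus no interaction."""
    return p / (1.0 - p)


def logit(p):
    """Equation 6: natural logarithm of the odds."""
    return math.log(odds(p))


a, b, x = -0.5, 0.3, 4.0            # arbitrary illustrative values
p = logistic(a, b, x)
print(p, odds(p), logit(p))         # odds equal e^(a+bX); logit equals a+bX
```

The round trip shows why the model is linear on the logit scale even though the predicted probability is bounded.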
The optimal values for the coefficients and intercept of the regression line are estimated by maximum likelihood (Sokal and Rohlf 1995). The full positive training set of 9,161 protein pairs was used as examples of proteins that interact. A random subsample of 9,161 protein pairs was then selected from the negative training set as examples of proteins that do not interact. The sizes of the negative and positive sets were set to be equal to allow the linear model to create a regression line which matched the distribution of the predictors in both sets rather than being biased by a larger negative set.
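The balancing step can be sketched as below. This is a minimal Python illustration under stated assumptions: the function name, the fixed seed and the use of 1 and 0 as class labels are choices made for the example, and the model fitting itself was carried out in R.

```python
import random


def balanced_training_set(positives, negatives, seed=0):
    """Pair every positive example with an equally sized random subsample
    of the negatives, returning (pair, label) tuples."""
    rng = random.Random(seed)                       # seeded for repeatability
    sampled = rng.sample(negatives, k=len(positives))
    return ([(pair, 1) for pair in positives]
            + [(pair, 0) for pair in sampled])


positives = [("P1", "P2"), ("P3", "P4")]
negatives = [("N%d" % i, "M%d" % i) for i in range(10)]
training = balanced_training_set(positives, negatives)
print(len(training))
```

Sampling without replacement ensures that no negative pair is counted twice in the balanced set.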
Counts were then calculated for each parameter for each pair of profiles within the new
training dataset. The statistical package R (R Development Core Team 2011) was then used
to fit a generalised linear model between the two binary variables using a binomial (logit)
link function. The predictor variables were considered to be continuous.
Predictor variables s13, s22, s23 and s33 were found to cause singularities within the model.
s13 was found to be perfectly correlated with s03 and s33 as Dollo parsimony only allows one
acquisition of a complex trait. s22 and s33 only occur rarely as seen below.
Minimum | 1st Quartile | Median: 0 | Mean: 0.5259 | 3rd Quartile | Maximum: 14
Minimum | 1st Quartile | Median: 0 | Mean: 0.1 | 3rd Quartile | Maximum
Predictor | Coefficient | p value <0.05
s00 | 0.02179 | 0.1278 (Not significant)
s01 | 0.02607 | 0.0845 (Not significant)
s02 | 0.03775 | 0.0244
s03 | -0.24041 |
s11 | 0.06446 |
s12 | 0.04350 | 0.0189

Predictor | Coefficient | p value <0.05
s02 | 0.019429 |
s03 | -0.177835 | 0.00115
s11 | 0.043359 |
s12 | 0.018787 | 0.00107
Table 4.6: Coefficients of the logistic regression equation derived via examination of
the reduced set of predictors.
The full equation of the regression line is shown below as Equation 7.
$$y = 0.019429\,s_{02} - 0.177835\,s_{03} + 0.043359\,s_{11} + 0.018787\,s_{12} - 0.791849 \qquad (7)$$
y is equal to the logit score of the probability of two proteins interacting versus the
probability of them not interacting. The logit scores were then transformed into a probability
of interaction via an application of the logistic function. This probability was used as the
score.
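As an illustration of how Equation 7 is turned into a score, the sketch below applies the reported coefficients to a set of predictor counts. It is written in Python for illustration and is not the R code used for model fitting; the function names and the example counts are hypothetical.

```python
import math

# Coefficients and intercept as reported in Equation 7.
COEFFICIENTS = {"s02": 0.019429, "s03": -0.177835,
                "s11": 0.043359, "s12": 0.018787}
INTERCEPT = -0.791849


def logit_score(counts):
    """Equation 7: linear predictor y for a dictionary of predictor counts.
    Predictors missing from `counts` are treated as zero."""
    return INTERCEPT + sum(coef * counts.get(name, 0)
                           for name, coef in COEFFICIENTS.items())


def interaction_probability(counts):
    """Logistic transform of the logit score, used as the final score."""
    return 1.0 / (1.0 + math.exp(-logit_score(counts)))


example_counts = {"s02": 10, "s03": 1, "s11": 40, "s12": 5}  # made-up values
print(interaction_probability(example_counts))
```

With all counts at zero the score reduces to the logistic transform of the intercept alone, so any probability above that baseline reflects the positive predictors outweighing s03.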
The predictor with the largest coefficient in absolute terms is s03. As this coefficient is negative, this would suggest that a protein being gained in the absence of the other is indicative of the two proteins not being functionally linked. The other significant terms, which contribute positively to the probability of an interaction, are s02, s11 and s12. Losses appear to be a defining event for correlated evolution, whether a loss in the presence of a protein or a loss in the absence of a protein. This could potentially be confirmation of work postulating that gene loss is relatively the most important event shaping gene content and determining phenotype. However this may be attributable to the use of Dollo parsimony for ancestral state reconstruction. A loss of a protein in the presence of another might suggest some form of redundancy-based loss. A loss of a protein in the absence of another, on the other hand, might suggest a cascade of losses of a group of proteins that carry out a particular function that is no longer needed in a particular lineage.
s11 is the final significant predictor, which implies that two proteins coexisting for periods of time at different points in the phylogeny are also more likely to be functionally linked. This predictor can be thought of as similar to the method used by Cokus (Cokus et al. 2007), except that whereas that work carried out a horizontal comparison of the distribution of presences and absences of proteins across a set of genomes clustered by similarity, the predictor s11 measures co-occurrence both horizontally across species and vertically over a set of putative ancestors as reflected by the phylogenetic tree.
4.4 Results
The results for the five tests carried out were measured in terms of precision and sensitivity over the training data. The intersection of the predictions made by the tests with the 6 predictions made by the constrained ML method (Barker et al. 2007; Barker and Pagel 2005) at its optimum rate of gain of 0.025 and its optimum likelihood ratio cut-off of 58.54 over the training dataset was also examined. As pointed out in Chapter 3 this combination of rate of gain and likelihood ratio cut-off yielded 5 predictions, all of which were true positives. Maintenance of this intersection is judged to be a key criterion for any data filter to form part of a heuristic approach. Thus in order to use a test as a data filter for constrained ML (Barker et al. 2007; Barker and Pagel 2005) the highest acceptable cut-off for the test was considered to be the point at which all 5 predictions made by the ML method were preserved.
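The evaluation criteria can be sketched as follows. The Python below is illustrative only: it assumes the standard definitions of precision (true positives over all predictions) and sensitivity (true positives over all known positives), since the formal definitions are given in Chapter 3, and it assumes a pair is predicted when its score is greater than or equal to the cut-off. All names and example values are hypothetical.

```python
def precision_sensitivity(scores, positives, cutoff):
    """Precision and sensitivity of the predictions made at `cutoff`.

    `scores` maps a protein pair to its score; `positives` is the set of
    pairs known to interact.
    """
    predicted = {pair for pair, s in scores.items() if s >= cutoff}
    true_pos = len(predicted & positives)
    precision = true_pos / len(predicted) if predicted else 0.0
    sensitivity = true_pos / len(positives) if positives else 0.0
    return precision, sensitivity


def highest_preserving_cutoff(scores, ml_predictions):
    """Highest cut-off at which every constrained ML prediction still
    passes the filter: the lowest score among those predictions."""
    return min(scores[pair] for pair in ml_predictions)


scores = {"p1": 0.90, "p2": 0.80, "p3": 0.40, "p4": 0.95}
positives = {"p1", "p3"}
print(precision_sensitivity(scores, positives, 0.5))
print(highest_preserving_cutoff(scores, {"p1", "p2"}))
```

Any cut-off above the value returned by `highest_preserving_cutoff` would drop at least one of the ML predictions from the filtered set.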
4.4.1 Maddison test for correlated evolution
Figure 4.7: Performance of the Maddison test for correlated evolution on the training data with k defined as the absence of trait B. The figure shows the performance between cut-offs of 0.9999 and 1 rising by increments of 1 × 10^-7.
Figure 4.8: Performance of the Maddison test for correlated evolution on the training data with k defined as the presence of trait B. The figure shows the performance between cut-offs of 0.9999 and 1 rising by increments of 1 × 10^-7.
The Maddison test for correlated evolution (Maddison 1990) performs reasonably well on the training data as can be seen in Figure 4.7. There is a maximum intersection of 1 in this range of cut-offs with the 5 predictions made by constrained ML (Barker et al. 2007; Barker and Pagel 2005) at its optimum rate of gain and optimum LR cut-off. It is however a computationally intensive process. Each node in the tree is evaluated, firstly, for all combinations of states, gains and losses for the calculation of the null distribution and, secondly, for all combinations of gains within subset k, losses within subset k, gains over the whole tree, losses over the whole tree and states. Binary tree traversal is carried out in linear time (Felsenstein 1973). The time taken, however, rises at a much steeper rate as the number of gains and losses to be accounted for increases (Maddison 1990).
The mirror property of the algorithm, i.e. that the calculations mirror each other for
any given combination of states (Maddison 1990) cannot be utilised in this particular case as
gains are restricted to 1 by the use of Dollo parsimony.
4.4.2 Differential parsimony
Figure 4.9: Precision and sensitivity measure of differential parsimony over training data.
Differential parsimony (Kensche et al. 2008) was not very successful on the training data as shown in Figure 4.9. It reached a maximum precision of 0.22547 with a sensitivity of 0.023. There was an intersection of 3 at this point with the predictions made by constrained ML at its optimum rate of gain and LR cut-off.
4.4.3 Dollo-pos
Figure 4.10: Precision and sensitivity measure of Dollo-pos over training data. Range of cut-offs at which predictions are made: 0 to 14.
Dollo-pos (Barker et al. 2007) was fairly successful over the training data with a
maximum precision of 0.57 at a cut-off of 13 as shown in Figure 4.10. The sensitivity at this
point was 0.00043. There was an intersection of 2 with the 6 predictions made by constrained
ML (at its optimum rate of gain and LR cut-off) at this cut-off. In order to use Dollo-pos as a
potential data filter the lowest acceptable cut-off that maintained an intersection with all 6
predictions made by constrained ML was 1. At this cut-off the precision of Dollo-pos was
0.105 and the sensitivity was 0.389.
4.4.4 Dollo-overall
Figure 4.11: Precision and sensitivity measure of Dollo-overall over training data. Range of
cut-offs at which predictions are made: -23 to 14.
The results for Dollo-overall (Barker et al. 2007) were found to be broadly similar to
Dollo-pos (Barker et al. 2007) as seen in Figure 4.11. However the effective range of cut-offs
for Dollo-overall is shifted down. The maximum precision achieved was 0.5 at a cut-off of
11. The sensitivity at this point was 0.00002. The lowest cut-off at which the intersection
with the 6 predictions made by constrained ML (at its optimum rate of gain and LR cut-off)
was maintained was -21. At this cut-off the precision of Dollo-overall was 0.08074 and the
sensitivity was 0.99.
4.4.5 Logistic regression
Logistic regression performed well on the training data as can be seen below in Figure 4.12.
Figure 4.12: Precision and sensitivity measure of logistic regression over training data. Range
of probability cut-offs at which predictions are made: 0 to 1. Cut-offs were incremented by
0.001.
The maximum precision achieved by logistic regression was 0.736 at a sensitivity of
0.01. This was achieved at a probability cut-off of 0.967. The lowest probability cut-off at
which the intersection with predictions made by constrained ML was maintained was 0.85.
The precision achieved at this point was 0.479 and the sensitivity was 0.0598. The method
that achieved the highest precision while maintaining a full intersection with the predictions
made by constrained ML was thus logistic regression.
In order to cross validate the ability of the logistic regression based filter to
discriminate between proteins that interact and those that do not, the filter was run on the
testing data. Figure 4.13 shows the performance of the filter over the testing data.
Figure 4.13: Precision and sensitivity measure of logistic regression over testing data. Range
of probability cut-offs at which predictions are made: 0 to 1. Cut-offs were incremented by
0.001.
As the s predictors used to determine the logit score used in the logistic regression filter are based on the transition rate parameters used to detect correlated evolution by the constrained ML technique (Barker et al. 2007; Barker and Pagel 2005; Pagel 1994), it was expected that there would be a correlation between a high logit score for a protein pair and a high LR (likelihood ratio statistic) score using the exemplar rate of gain elucidated in Chapter 3 (0.025). The distribution of LR scores is extremely skewed towards the lower end, as can be seen in Chapter 3. All true positive protein pairs detected by constrained ML (Barker et al. 2007; Barker and Pagel 2005) at the optimum rate of gain lie within the 99th percentile of LR scores. This is due to the fact that many protein pairs within the training dataset display little or no evidence of correlated evolution, i.e. the low sensitivity of the method as shown in Chapter 3.
It is only in the upper ranges of the LR scores that protein pairs that interact are distinguished from those that do not interact. This phenomenon was also observed by Barker as well as by Kensche (Barker et al. 2007; Kensche et al. 2008). Thus in order to display the relationship between the two prediction systems, the logit derived probability scores of protein pairs with corresponding LR scores that lay in the 95th percentile of the distribution of LR scores were selected. This came to a set of 5,658 protein pairs.
Figure 4.14 shows the logit derived probability scores of these 5,658 protein pairs plotted against their corresponding LR scores. The relationship between the values is displayed via an overlaid regression line. There is a large amount of scatter around the regression line; however the relationship between the two variables is found to be significant (p value < 0.001).
Figure 4.14: Linear regression line (adjusted R^2 = 0.1049) drawn over a plot of logit
derived probability scores against likelihood ratio statistics over the training data. Vertical
dotted line shows optimum cut-off for likelihood ratio statistic. Horizontal dotted line shows
optimum cut-off for the logit derived probability score.
Figure 4.14 shows that there is a positive relationship between the LR score generated
by constrained ML (Barker et al. 2007; Barker and Pagel 2005) and the logit derived
probability scores generated by application of Equation 6 to the set of reconstructed ancestral
states.
Figure 4.15 shows that a similar relationship is observed over the testing data set (p
value < 0.001).
Figure 4.15: Linear regression line (adjusted R^2 = 0.0995) drawn over a plot of logit derived
probability scores against likelihood ratio statistics over the testing data. Vertical dotted line
shows optimum cut-off for likelihood ratio statistic determined by the training data.
Horizontal dotted line shows optimum cut-off for the logit derived probability score
determined by training data.
Process | Duration: Minutes/Hours
Maddison 1990) | minutes (approximately).
Dollo parsimony reconstructions (Farris 1977; Felsenstein 1989) |
Constrained ML (Barker and 2007) | minutes.
Table 4.7: Times taken by each of the three methods on the training data. The time given for
the Maddison Dollo test is an extrapolation from a run on 12.5% of the training data. This
12.5% was selected randomly. Times given in minutes are rounded to the nearest second.
Times given in hours are rounded to the nearest minute. All tests were run on an Intel Xeon
3.0 GHz processor.
The reduction in potential protein pairs to be investigated via an application of the
logistic regression filter using the probability score cut-off of 0.85 is 111,902 (113,1321132). This is a reduction of 98.9%. As the filter discriminates between proteins, which show
evidence of correlated evolution, the remaining 1.1% should be enriched for proteins
amenable to investigation via phylogenetic profiling using ML reconstructions with
constrained rates of gain (Barker et al. 2007; Barker and Pagel 2005). Thus an application of
the filter to the full human genome followed by an application of phylogenetic profiling using
constrained ML (Barker et al. 2007; Barker and Pagel 2005) will potentially yield a large set
of interactions from within the human genome some of which may be novel.
All code implementing the procedures described in this chapter is available on request from
the author.
Chapter 5
Genome-wide prediction of protein functional interactions in humans using
a heuristic approach
5.1 Introduction
The interactome of an organism can be defined as the complete set of molecular interactions that occur within its full complement of cell types (Yu et al. 2008). This study focuses on the elucidation of interactions between proteins (both direct and indirect) in the human proteome. Protein-protein interactions (PPIs) have been defined as physical interactions between proteins (De Las Rivas and Fontanillo 2010). PPIs are detected by methods such as the yeast 2-hybrid and tandem affinity purification coupled to mass spectrometry (TAP-MS) as well as co-immunoprecipitation (De Las Rivas and Fontanillo 2010). Interactions between proteins that are indirect can be detected by gene co-expression, as investigated in Chapter 3, or by techniques like double mutant synthetic lethality (De Las Rivas and Fontanillo 2010). Indirect protein interactions are also detected by TAP-MS, as proteins that share membership of a complex do not necessarily maintain a direct physical interaction. Computational interaction detection methods as described in Chapter 1 can contribute to this effort by pointing out putative interactions, which can then be further verified. This chapter describes the application of the logistic regression-based data filter developed in Chapter 4 in combination with constrained ML (Barker et al. 2007; Barker and Pagel 2005) phylogenetic profile analysis to detect potential novel protein-protein interactions as well as novel indirect interactions between proteins.
5.1.1 PPI databases
As experimental data on protein-protein interactions have accumulated there have been a number of attempts to organise and annotate them. There are thus a number of databases that contain data on human protein-protein interactions. As any attempt to examine the quality of predicted PPIs requires comparison with known data, a brief overview of the major PPI databases is presented below.
5.1.1.1 MIPS
MIPS (Mammalian Protein-Protein Interaction Database) (Pagel et al. 2005) is a PPI database which contains 1,812 experimentally verified human protein-protein interactions (PPIs). It only includes published data from individual experiments as opposed to large-scale high-throughput surveys (Pagel et al. 2005).
5.1.1.2 BIND
BIND (Biomolecular Interaction Network Database) contains data on three main interaction types (Bader et al. 2001). These are binary interactions, molecular complexes and pathway data (Bader et al. 2001).
5.1.1.3 MINT
MINT (Molecular INTeraction), in contrast to MIPS, contains data from large-scale high-throughput experiments (Chatr-aryamontri et al. 2007). As of 2009 it contained data derived from more than 19,000 experiments and 25,105 curated human PPIs (Ceol et al. 2010).
5.1.1.4 INTACT
IntAct is one of the larger PPI databases containing over 200,000 curated binary protein
interactions (Aranda et al. 2010). IntAct follows an extremely specific curation process with
information from experiments being recorded in high detail using a number of controlled
vocabularies to facilitate further data analysis (Aranda et al. 2010).
5.1.1.5 HPRD
The HPRD (Human Protein Reference Database) is a human specific PPI database. There are
currently 45,207 interactions held in the HPRD (Prasad et al. 2009). It contains manually
curated data on protein interactions derived from both high throughput surveys as well as
single experiments (Prasad et al. 2009).
5.1.1.6 DIP
The DIP (Database of Interacting Proteins) is one of the earlier PPI databases. It contains data
derived from manual curation of the literature as well as from structural information on
complexes derived from the PDB (Protein Data Bank) (Salwinski et al. 2004).
5.1.1.7 REACTOME
The REACTOME database holds data on PPIs in the context of the biological pathways that
underpin cellular processes and is also manually curated (Haw et al. 2011).
5.1.1.8 STRING
The STRING database holds data on PPIs that are experimentally verified and also adds a set
of computationally predicted PPIs (von Mering et al. 2005). It contains PPI information on
630 organisms (Jensen et al. 2009). The total number of interactions held by STRING
exceeds 50,000,000 (Jensen et al. 2009).
5.1.1.9 I2D
The I2D (Interologous Interaction) database contains the full literature-derived predictions
from the databases HPRD, BIOGRID, IntAct, BIND and MINT as well as computationally
predicted interactions (Brown and Jurisica 2005). The sources of evidence utilised for
computational predictions include domain co-occurrence, gene co-expression and intersection
of GO terms. I2D contains 133,250 unique entries for detected protein interactions between
13,490 proteins.
5.1.1.10 KEGG
KEGG, as mentioned in Chapter 1, localises gene products within functional pathways
(Kanehisa 1997; Kanehisa et al. 2006). In this respect it is similar to REACTOME.
5.1.1.11 BIOGRID
The BIOGRID database also contains curated data. It has 49,378 interactions involving
human proteins (Stark et al. 2006).
5.1.1.12 Discussion
There is an overlap between these databases as they are all based on examination of similar
experimental data (De Las Rivas and Fontanillo 2010). Given that current estimates of the
human interactome size are around 650,000 including non-direct functional interactions
(Stumpf et al. 2008), a large number of interactions remain to be characterised.
5.1.2 Power law
In order to examine the statistical properties of PPI networks, these networks are usually
analysed as graphs (Jeong et al. 2001). An interesting observation is that the degree distribution
within some of these graphs appears to follow a power law (Jeong et al. 2001); the degree of a
vertex within a graph is the number of edges connected to that vertex. That is, the
number of vertices within a graph with degree k is approximately proportional to $k^{-x}$, where x is a constant
(Alon 2007). What this entails is that for any given protein within the PPI network the
probability of having a large degree (many interactions) is low. There will however be
proteins within the network that will have a large number of interacting partners. These
proteins have been referred to as hubs (Han et al. 2004). It has been hypothesised that
there are two forms of protein hub (Han et al. 2004). These are date hubs, which interact
with different partners at different times, and party hubs, which interact with multiple
partners simultaneously (Han et al. 2004).
Networks with similar degree distributions have been observed in both natural and
man-made networks such as the neural arrangements of C. elegans and the power grid of the
western United States (Watts and Strogatz 1998).
In the case of PPI networks however, as there is a clear physical limit to the number of
partners that a given molecule can interact with, the power law distribution over a
PPI network will sharply decay at the upper end of the distribution, as hub proteins reach
saturation point with a given number of interaction partners. Similarly the lower end of the
distribution may not match a power law, as the cellular environment and other physicochemical factors will affect the probability of being an entirely monogamous interacting
partner in a binary interaction. Examples of PPI networks that do not exhibit a power law in
degree distribution have been pointed out in the literature, e.g. by Tanaka et al. (2005).
5.2 Methods
The first step in carrying out a full genome-wide survey was to develop a list of all possible
ordered pairs of proteins within the version of RefSeq (Pruitt et al. 2005) used. This came to a
total of 560,237,601 pairs. The logistic regression-based filter implemented in Chapter 4
was applied to the ordered pairs of profiles at its optimum probability cut-off of 0.85.
Removing all pairs that scored beneath this threshold resulted in a total of 5,312,880 pairs of
proteins. This was a reduction of approximately 90% of the total search space. This set of
reduced profile pairs was then analysed by constrained ML (Barker et al. 2007; Barker and
Pagel 2005) with the rate of gain parameters restricted to the optimum rate of 0.025. This
analysis was carried out using a cluster consisting of 260 2 GHz dual core Opteron 270
processors.
The results of the constrained ML analysis were then filtered for pairs with a
likelihood ratio (LR) statistic score of higher than 58.54 (this was the optimum LR score
determined in Chapter 3). This led to a set of 20,605 predicted interactions between protein
pairs, consisting of 2,188 individual proteins. In order to examine predicted interactions
between members of the same orthologous group predictions were then converted to
predictions between orthologous groups. These orthologous groups were identified as the
groups clustered by the Inparanoid (Remm et al. 2001) implementation described in Chapter
2 resulting in a predicted set of 7,150 interactions between orthologous group pairs,
consisting of 1,417 individual orthologous groups.
5.2.1 Short Branch filtration
Examination of the distribution of interactions amongst the individual proteins showed that
some of the individual proteins were predicted to have an extremely large number of
interacting partners. The maximum number of interaction partners was predicted for the
protein with RefSeq GI number 148613856 (described as probable ATP-dependent RNA
helicase DDX17 isoform 3 on the NCBI website). This was predicted to have 1,503
interactions. However given the overall distribution of interaction partners within the set of
predictions as shown below in Figure 5.1 these extreme numbers seem to be implausible.
Another protein, with the RefSeq GI number 29029591 (labelled putative ribosomal RNA
methyltransferase 1 isoform b on the NCBI website), was predicted to take part in 1,430
interactions. The phylogenetic profile of this protein was:
001101001001010001111010100010000001001000010001111010
This corresponds to the protein being present in Ashbya gossypii, Aspergillus fumigatus,
Bombyx mori, Caenorhabditis elegans, Canis familiaris, Cryptococcus neoformans,
Debaryomyces hansenii, Dictyostelium discoideum, Drosophila melanogaster, Drosophila
pseudoobscura, Entamoeba histolytica, Homo sapiens, Magnaporthe grisea, Paramecium
tetraurelia, Plasmodium knowlesi, Schizosaccharomyces pombe, Theileria annulata,
Theileria parva, Trichomonas vaginalis, Trypanosoma brucei and Ustilago maydis. This is
an extremely unbalanced distribution over the tree as illustrated in Figure 5.2.
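As a minimal sketch of how such a profile can be read, the snippet below decodes a presence/absence string against an ordered list of species. The function name and the placeholder species labels are illustrative assumptions; the actual column order of the profiles is not given in this section.

```python
def present_species(profile, species_order):
    """Return the species whose position in the profile holds a '1'."""
    if len(profile) != len(species_order):
        raise ValueError("profile and species list must have the same length")
    return [name for bit, name in zip(profile, species_order) if bit == "1"]


profile = "001101001001010001111010100010000001001000010001111010"
# Placeholder labels only: the real column order is not stated here.
species_order = [f"species_{i + 1}" for i in range(len(profile))]
present = present_species(profile, species_order)
```

The profile has 54 positions, 21 of which are set to 1, which agrees with the 21 species listed above.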
It was hypothesised that the unbalanced distribution of this profile contributed to its
display of a high likelihood ratio statistic (LR) score with a large number of proteins. In
particular it was hypothesised that the profiles of prediction-heavy proteins might contain
losses on the branches leading to P. troglodytes and M. mulatta. As the branches leading to
these taxa are short (see Chapter 2) this may contribute to spuriously high LR scores. In
order to investigate this hypothesis a list of RefSeq GIs for proteins lost on either the branch
leading to P. troglodytes or the branch leading to M. mulatta was extracted from the overall set
of phylogenetic profiles.
The following procedure was then applied iteratively.
1. Set cut-off to 0.
2. Select all protein GIs from predicted interactions where the number of predicted interactions
for that protein is > cut-off.
3. Examine the intersection of the GIs in the selected list with the set of GIs of proteins lost on short
branches.
4. Increment cut-off by 1 and repeat from step 2.
At the point where the cut-off was equal to 298, the intersection between the two sets was
100% as shown in the Venn diagram below.
Figure 5.3: Intersection of proteins lost in P. troglodytes and M. mulatta with proteins
with > 298 predicted interaction partners.
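The iterative cut-off procedure can be sketched as follows. This is a minimal illustration rather than the code used for the thesis; the function name, the toy identifiers and the stopping rule (stop at the first cut-off where every selected protein is also in the short-branch set) are assumptions made for the example.

```python
from collections import Counter


def first_full_overlap_cutoff(predictions, short_branch_gis):
    """Find the smallest cut-off at which every protein with more than
    `cutoff` predicted partners is also in `short_branch_gis`.

    Returns (cutoff, selected_proteins), or None if no such cut-off exists.
    """
    # Count the predicted interactions each protein takes part in.
    partners = Counter()
    for a, b in predictions:
        partners[a] += 1
        if b != a:
            partners[b] += 1

    cutoff = 0
    while True:
        selected = {gi for gi, n in partners.items() if n > cutoff}
        if not selected:
            return None
        if selected <= short_branch_gis:  # intersection is 100% of the selection
            return cutoff, selected
        cutoff += 1


# Toy data: "A" is a hub that was lost on a short branch.
toy_predictions = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
toy_short_branch = {"A"}
result = first_full_overlap_cutoff(toy_predictions, toy_short_branch)
```

On the toy data the loop stops at a cut-off of 2, where only the hub "A" remains selected.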
The 16 proteins in this intersection alone accounted for 13,082 (63%) of the total predicted protein
interactions. It is impossible to tell whether these proteins are genuinely evolving in a
correlated fashion or whether the predictions are merely an artefact of the loss on a short branch.
A post-processing step was therefore applied which removed any prediction involving
proteins with profiles that matched this pattern. The 16,301 proteins with profiles that
contained a 0 at either the position representing P. troglodytes or the position representing M.
mulatta were removed from the set of predicted interactions. This led to the removal of
19,463 predicted interactions between proteins, leaving a reduced set of 1,142 predictions.
An examination of the training data showed that 2 of the 5 predicted interactions by
constrained ML (Barker et al. 2007; Barker and Pagel 2005) at its selected optimum rate of
gain (0.025) during the training step (see Chapter 3) involved the protein
(16936528/NP_001789) which would have been removed by this post processing step. This
reduces the sensitivity of the method by 40%.
5.2.2 GO term enrichment
A plausible method to examine the potential accuracy of the predicted interactions is to test
whether the GO terms associated with the predicted interaction partners are enriched for
particular terms.
In order to subject the data to GO (Gene Ontology) term analysis (Ashburner et al.
2000) the set of 1,142 predicted interactions between protein pairs was converted to
predictions between gene pairs. This was carried out using IDconverter (Alibes et al. 2007).
This produced a set of 273 interactions between pairs of genes, involving 183 individual genes.
In order to investigate the validity of predicted interactions the set of interactions
between genes was converted into a network of interactions. The network can be represented
as an undirected graph. The graph in this case would be undirected as there is no way of
inferring any form of directional relationship between putative predictions.
The predicted interactions were converted into a graph through insertion of the
characters xx between each predicted pair. This converted the predicted interactions into a
format known as the simple interaction format, which is usable by a platform known as
Cytoscape (Shannon et al. 2003). Cytoscape is a program that allows visualisation and
analysis of network data (Shannon et al. 2003) and is widely used for such analyses. The
broad structure of the resultant graph is shown in Figure 5.4.
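The conversion to the simple interaction format can be sketched as below. This is an illustrative assumption of how the step might look: the function name and gene labels are hypothetical, and a tab is used as the delimiter, which the simple interaction format accepts alongside spaces.

```python
def to_sif(pairs, relation="xx"):
    """Format gene pairs as SIF lines: <source> <relation> <target>."""
    return "\n".join(f"{a}\t{relation}\t{b}" for a, b in pairs)


# Hypothetical gene labels, used only to show the output layout.
sif_text = to_sif([("GENE_A", "GENE_B"), ("GENE_A", "GENE_C")])
```

Each line of the resulting text describes one undirected edge, with "xx" acting as the interaction type.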
Figure 5.4: Graph of 273 interactions between genes as predicted by the application of
constrained ML post data filtering. Each vertex in the graph is one gene. The edges in the
graph represent a predicted functional interaction between two vertices.
In order to examine the quality of the predictions the graph was subjected to clique
analysis to break up the network into sub-graphs which are densely connected. The
Cytoscape plugin ClusterViz (Cai 2010) was utilised to deconstruct the network into
sub-clusters. The plugin was used with the FAG-EC agglomerative hierarchical algorithm (Li et
al. 2008), which builds up sub-clusters through analysis of the clustering coefficient of each
edge in the graph. The clustering coefficient measures the density of connections between a
given edge and its neighbours. It does this by calculating the number of triangles that a given
edge is part of and dividing this number by the number of triangles that might potentially
include it given the degree (number of incident edges) of its adjacent nodes (Radicchi et al.
2004). FAG-EC was run with a specified cut-off of sub-cliques of at least size 3. This was
because GO terms can be found to be significantly enriched in pairs of proteins even if an
annotation is attached to just one protein in the pair.
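The edge clustering coefficient described above can be sketched as follows. This follows the wording of the text (triangles containing the edge divided by the maximum possible number of such triangles); some published definitions add 1 to the numerator, and returning 0.0 when no triangle is possible is an assumption of this sketch. The helper names are hypothetical.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def edge_clustering_coefficient(adj, u, v):
    """Triangles containing edge (u, v) over the maximum possible number,
    taken here as min(deg(u) - 1, deg(v) - 1)."""
    triangles = len(adj[u] & adj[v])  # shared neighbours close a triangle
    possible = min(len(adj[u]) - 1, len(adj[v]) - 1)
    # Assumption: an edge that cannot be part of any triangle scores 0.0.
    return triangles / possible if possible > 0 else 0.0


# Toy graph: a triangle A-B-C with a pendant node D attached to C.
toy_adj = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
ecc_ab = edge_clustering_coefficient(toy_adj, "A", "B")
ecc_cd = edge_clustering_coefficient(toy_adj, "C", "D")
```

Edges inside the triangle score 1.0, while the pendant edge C-D scores 0.0, so an agglomerative procedure that merges along high-scoring edges would group A, B and C first.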
FAG-EC was also run with a weak module definition. This identifies modules as
sub-cliques within graphs where the sum of the in-degrees of the nodes within a module is higher
than the sum of their out-degrees (Li et al. 2008). The in-degree of a node within an undirected
graph is defined as the number of edges connecting it to other nodes in the same
subgraph (Li et al. 2008). The out-degree of a node is defined as the number of edges
connecting it to the rest of the graph excluding its subgraph (Li et al. 2008).
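A minimal sketch of the weak module test, assuming the graph is held as an adjacency map of neighbour sets; the function name and toy graph are hypothetical and not taken from ClusterViz.

```python
def is_weak_module(adj, module):
    """True if the summed in-degree of the module's nodes exceeds their
    summed out-degree, using the definitions given in the text."""
    module = set(module)
    in_sum = sum(len(adj[node] & module) for node in module)
    out_sum = sum(len(adj[node] - module) for node in module)
    return in_sum > out_sum


# Toy graph: a triangle A-B-C with a pendant node D attached to C.
toy_graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
triangle_is_weak = is_weak_module(toy_graph, {"A", "B", "C"})
pair_is_weak = is_weak_module(toy_graph, {"C", "D"})
```

The triangle qualifies (in-degree sum 6 against out-degree sum 1), whereas the pair C-D does not (2 against 2).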
The application of this algorithm yielded 10 connected sub-cliques. GO term
enrichment was examined through the use of the Cytoscape plugin Bingo (Maere et al. 2005).
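Bingo operates through examination of all GO terms associated with a given network. There
are a number of sources of evidence by which a term may be associated with a gene
(Ashburner et al. 2000), each identified by an evidence code.
In order to utilise reliable sources of evidence, terms that were determined using the evidence
codes ISS (inferred from sequence or structural similarity), IEA (inferred from electronic
annotation), NAS (non-traceable author statement) and ND (no biological data available) were excluded.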
Bingo operates by calculating the probability of the association of a given set of terms
with a cluster of genes given a background distribution of terms associated with a reference
set of genes. This is calculated using the hypergeometric test (Maere et al. 2005). The
probability of a given set of genes being associated with a given GO term follows the
hypergeometric distribution, which is equivalent to the binomial distribution but utilising
sampling without replacement (Sokal and Rohlf 1995). The probability of a cluster C of r
genes being associated with a given GO term g (if evaluated against a background set of N
genes and assuming the total number of genes associated with g is t) can be calculated. The
background probability of any given gene being associated with g is t/N and the probability
of g not being associated with a gene is (1-t/N). Thus the probability of x genes inside C
being associated with g can be calculated using the formula

$$P(x) = \frac{\binom{(t/N)N}{x}\binom{(1-t/N)N}{r-x}}{\binom{N}{r}}$$

where $\binom{k}{Y}$ is the number of combinations of k items taken Y at a time (Sokal and Rohlf 1995).
The effects of multiple testing are reduced through application of the Bonferroni
correction. This correction scales down the threshold at which p values are found to be significant
by dividing it by n, the number of tests performed (Sokal and Rohlf 1995). A procedure
involving the hypergeometric test is common in GO enrichment tools and is also utilised by
ClueGO (Bindea et al. 2009), GOrilla (Eden et al. 2009) and GOEAST (Zheng and Wang
2008) amongst others.
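The hypergeometric calculation and the Bonferroni threshold can be sketched as follows, using the symbols from the text (x annotated genes in a cluster of r, t annotated genes in a background of N). The function names are hypothetical, and summing the upper tail to obtain a p value is an assumption about how such a test is commonly applied rather than a description of the plugin's internals.

```python
from math import comb


def hypergeom_pmf(x, r, t, N):
    """Probability that exactly x of r sampled genes carry the term,
    when t of the N background genes carry it."""
    return comb(t, x) * comb(N - t, r - x) / comb(N, r)


def enrichment_p_value(x, r, t, N):
    """Upper-tail probability of seeing x or more annotated genes."""
    return sum(hypergeom_pmf(i, r, t, N) for i in range(x, min(r, t) + 1))


def bonferroni_significant(p_values, alpha=0.05):
    """Flag p values that pass the Bonferroni-corrected threshold alpha / n."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Toy numbers: background of 10 genes, 4 annotated, cluster of 3 genes.
p_exact = hypergeom_pmf(2, 3, 4, 10)
p_tail = enrichment_p_value(2, 3, 4, 10)
flags = bonferroni_significant([0.001, 0.03], alpha=0.05)
```

With two tests and alpha of 0.05 the corrected threshold is 0.025, so only the first toy p value is flagged.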
Bingo was run against a background set of genes, which consisted of the full set of
human genes held in Entrez Gene.
5.2.3 Intersection with other data sources
To examine the extent to which the predictions made by the filter in combination with
constrained ML (Barker et al. 2007; Barker and Pagel 2005) intersected with known data, the
predictions were compared to a known set of PPIs. The
I2D database (Brown and Jurisica 2005) was chosen as it contained data from all the other major PPI
databases. Thus the Interologous Interaction Database (I2D) version 1.95 was downloaded.
Predictions were converted from RefSeq GI numbers to their corresponding UniProt
(Apweiler et al. 2010) primary accession. Only Swiss-Prot accessions were used, as these are
high-confidence protein entries that have been manually annotated (Apweiler et al. 2010).
As mentioned above there were 1,142 predictions made between RefSeq GI pairs.
This conversion reduced the set of predictions to 278 predictions as a complete mapping of
RefSeq to UniProt is lacking.
5.3 Results
5.3.1 GO Enrichment
Table 5.1 shows details of the sub-clusters generated by ClusterViz (Cai 2010) ordered in
descending order by size.
Cluster No.    No. of genes    No. of interactions
1              97              188
Genes (cluster 1):
PRPF31 RRP9 SNRPE SUPT4H1 TNPO1
COPB1 FASN GLRX5 RPS25 WWOX
DHDDS EXO1 H2AFY ATP5C1 RPS29
RER1 PHF5A PIGL RPS21 POLR2G PSMD8
TP53RK ABT1 ANAPC10 TCEA2 NOP10
POLR2L SF3B5 LZTR1 TUBGCP2 CDS1
MAK16 CTDSPL RBM34 KIFC1 GFPT1
PPP1CC UBE2D4 BYSL PSMA6 FDX1L
TFB1M C20orf118 KIAA1609 UBE2V1
NAPNAPB RLBP1 RPF1 PSMC4
TRAPPC6B RBMX2 RHOC TOP2B UBE2I
CDK5 FKBP4 CCT6A CDK7 CKS2 CTDSP1
DIMT1L FAM96B FKBP5 GNB1 GUK1
HSPE1 KIF19 LSM7 POLE2 PSMB1
RIOK2 RPL13 RPL19 RPL30 RPL37A
RPS23 SEC11C SLC2A6
SMARCAL1 TRAPPC1 TRAPPC4
TRAPPC6A TXNL4B UBE2V2 VBP1 VPS45
ZDHHC21 ERCC2 SPO11 TRIT1 SHMT2
GDPD1 DOLK DUSP5 LIG1 TRMT112
Cluster No. No. of genes No. of interactions
2
Genes
MTMR2 MTM1
MTMR9 MTMR1
10
ZRSR2 PPP1CB
RPL31
DERL1 DERL2
DNAJC12
H2AFV ATG4A
UNG
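Cluster No. 1
Enriched GO terms
44238
44237
8152 metabolic process
44260 cellular macromolecule metabolic process
43170 macromolecule metabolic process
6414 translational elongation
6412 translation
44267 cellular protein metabolic process
6368 RNA elongation from RNA polymerase II promoter
6354 RNA elongation
10467 gene expression
19538 protein metabolic process
44265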
No significant enrichment
19224
termination of RNA polymerase II
transcription
43653 mitochondrial fragmentation during apoptosis
No significant enrichment
Cluster No.
6
Enriched GO terms
44248 cellular catabolic process
31090 organelle membrane
9056 catabolic process
51649 establishment of localization in cell
42288 MHC class I protein binding
51641 cellular localization
42287 MHC protein binding
30307 positive regulation of cell growth
42221 response to chemical stimulus
45793 positive regulation of cell size
65008 regulation of biological quality
19048 virus-host interaction
45927 positive regulation of growth
51701 interaction with host
7242 intracellular signaling cascade
44419 interspecies interaction between organisms
44404 symbiosis, encompassing mutualism through
parasitism
6950 response to stress
6810 transport
51234 establishment of localization
22415 viral reproductive process
16032 viral reproduction
1558 regulation of cell growth
51179 localization
8361 regulation of cell size
44267 cellular protein metabolic process
19538 protein metabolic process
40008 regulation of growth
44260 cellular macromolecule metabolic process
51869 response to stimulus
16021 integral to membrane
7154 cell communication
43170 macromolecule metabolic process
30968 endoplasmic reticulum unfolded protein response
31224 intrinsic to membrane
44446 intracellular organelle part
44422 organelle part
43283 biopolymer metabolic process
7165 signal transduction
44425 membrane part
51706 multi-organism process
22414 reproductive process
8284 positive regulation of cell proliferation
6986 response to unfolded protein
No Significant enrichment.
Table 5.2: GO enrichment in sub-cliques within predicted interaction network (cont).
Evidence of interaction
118600991 13129120
21361657 4758304
Table 5.2: Intersection between I2D (Brown and Jurisica 2005) and predictions by logistic
regression/constrained ML (Barker et al. 2007; Barker and Pagel 2005).
There are thus 1,131 predictions made by logistic regression/constrained ML (Barker
et al. 2007; Barker and Pagel 2005), which are potentially novel. All predictions made can be
seen in Appendix C.
5.3.3 Network statistics
The degree distribution of nodes within the graph appears to follow a power law. This could
potentially also be indicative of the correctness of the predictions made by constrained ML.
This pattern is observed in both the full and the reduced graphs as shown in Figures 5.5 and
5.6.
Figure 5.5: Degree distribution for full graph of protein interactions. Line is a fitted power law
of the form $y = ax^{b}$, fitted by least squares regression ($R^{2} = 0.694$).
Figure 5.6: Degree distribution for graph of protein interactions post short branch filtration.
Line is a fitted power law of the form $y = ax^{b}$, fitted by least squares regression
($R^{2} = 0.768$).
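A degree distribution and a least squares power law fit of the kind shown in these figures can be sketched as below. The text does not say how the regression was carried out, so fitting a straight line to the log-transformed values is an assumption of this sketch, as are the function names and toy data.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Map each degree k to the number of vertices that have that degree."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares on log x and log y.
    Returns (a, b, r_squared); all xs and ys must be positive."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    syy = sum((y - mean_y) ** 2 for y in ly)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return a, b, r_squared


# Toy star graph: one hub of degree 3 and three leaves of degree 1.
toy_distribution = degree_distribution([("A", "B"), ("A", "C"), ("A", "D")])

# Synthetic counts that follow y = 100 * x**-2 exactly.
a, b, r_squared = fit_power_law([1, 2, 4, 8], [100.0, 25.0, 6.25, 1.5625])
```

On the synthetic data the fit recovers an exponent of -2 with an R-squared of 1; real degree counts would scatter around the fitted line, as the R-squared values quoted in the captions indicate.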
5.4 Discussion
A full genome wide investigation of human protein interactions by constrained ML (Barker et
al. 2007; Barker and Pagel 2005) in combination with the logistic regression-based data filter
seems to be a potentially fruitful source of new protein interactions. The enrichment of GO
terms in some sub-cliques of the resultant network suggests that the system has an ability to
make predictions with some basis in reality and thus a proportion of the set of predictions
made are both novel and accurate.
5.4.1 GO enrichment
GO enrichment was investigated conservatively by excluding the GO evidence code IEA.
This evidence code is associated with 90% of GO annotations (Buza et al. 2008). However,
despite removing terms associated with this code as well as terms associated with the codes
ISS, ND and NAS, a reasonable degree of enrichment was still observed.
The terms enriched appear to be associated with processes that are divergent across
eukaryotes, such as transcription (enriched in sub-clique 1) (Coulson and Ouzounis 2003).
This demonstrates that it is only proteins that show a degree of variability in
their distribution pattern that are susceptible to this line of investigation.
5.4.2 Intersection with known data
The level of intersection with the I2D database is fairly low. Using the estimate of
interactome size provided by Stumpf et al. (2008), and assuming every interaction in I2D
(Brown and Jurisica 2005) is correct, I2D would correspond to a coverage level of
133,250/650,000 or approximately 20%. Thus the probability of any given accurate prediction being within
this database would be 0.2, and the converse probability of an accurate prediction not being
in the database would be 1-0.2 or 0.8.
If every prediction made by the heuristic approach were accurate, then the observed
result of an intersection of 11 and a complement of 1,131 would be highly improbable ($0.8^{1131}$,
or ~0). The lack of intersection between the two datasets could be due to the bias in PPI
databases towards particular physical detection systems such as yeast two-hybrid. Approximately 37%
of the binary interactions held in HPRD (Mishra et al. 2006) were detected using yeast
two-hybrid.
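The arithmetic behind this argument can be checked with a short sketch. The function names are hypothetical, and the calculation simply mirrors the simplification used in the text (the probability that all 1,131 predictions are absent from the database, with the coverage rounded to 0.2).

```python
import math


def coverage(known_interactions, interactome_size):
    """Fraction of the estimated interactome held in a database."""
    return known_interactions / interactome_size


def log10_prob_all_absent(n_absent, p_absent):
    """log10 of p_absent ** n_absent, computed in log space."""
    return n_absent * math.log10(p_absent)


i2d_coverage = coverage(133_250, 650_000)    # about 0.205, rounded to 0.2 in the text
log10_p = log10_prob_all_absent(1_131, 0.8)  # log10 of 0.8 ** 1131
```

Working in log space avoids relying on extremely small floating-point values; the result is a probability on the order of 10^-110, which is effectively zero.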
The issue of RefSeq to Uniprot mapping is also pertinent in contributing to this lack
of intersection as over 75% of the predictions were lost post mapping.
Finally it is also unlikely that there is 100% accuracy in all PPIs held in I2D.
5.4.3 Weaknesses
Clearly the result of a precision of 1 as achieved on the training and testing data cannot be
extended to a full genome wide survey. The fact that predictions are made through
comparisons of the phylogenetic distribution of proteins suggests that one weakness of the
method could be an inability to distinguish between paralogs/isoforms and proteins showing
evidence of correlated evolution. However this issue is far from clear-cut as there is evidence
to show that homologous proteins are more likely to interact (Ispolatov et al. 2005; Orlowski
et al. 2007). Thus it is possible that the success of the phylogenetic profile method is partly
based on this observation. This is a potentially confounding issue for the method. However,
examination of interactions between predicted orthologous groups can ameliorate this. Of the
1,142 pairs of proteins predicted to be functionally linked by this
study, 221 lie within the same orthologous group as identified by the Inparanoid
implementation (Remm et al. 2001).
Thus predictions between members of the same orthologous group are not particularly
widespread over the data examined.
The other weakness of phylogenetic profiling in general that applies to this set of
predictions is potentially inaccurate profiles. Profiles can be inaccurate for a number of
reasons including low coverage sequencing, poor annotation or incorrect assumptions in
homolog identification. The need for the short branch filtration step undertaken before further analysis is
potentially attributable to this phenomenon.
5.4.3.1 Scaling
The precision and sensitivity results observed over the training data were based on a
biologically unrealistic ratio of 10:1 of negative to positive examples of interacting proteins.
The results observed can be adjusted for the whole genome by scaling to a more realistic
ratio. A possibly more realistic ratio can be calculated using estimates of interactome size.
These range from 154,000-369,000 (Hart et al. 2006) to 650,000 (Stumpf et al. 2008). If these
numbers are subtracted from the number of all potential interactions, 560,237,601 (calculated as
all possible pairs from the version of RefSeq held), estimated ratios of negative to positive range
from approximately 860:1 to 3,636:1. Assume for the sake of argument that the ratio of 860:1 is
adopted (via an assumption of an interactome size of 650,000). Recall that the size of the
positive set in the training data is 9,161 pairs of known interactions. Thus as an illustrative
example, if a given predictive method yielded a precision of 0.5 and a sensitivity of 0.05 over
the training data, this would correspond to making 916 predictions of which 50% were correct
(TP=458, FP=458 and FN=8703). In order to scale the data the following numbers need to be
calculated:
$$P(TP) = \frac{TP}{PS} \tag{1}$$

$$P(FP) = \frac{FP}{NS} \tag{2}$$

$$P(FN) = \frac{FN}{PS} \tag{3}$$

where PS = size of the positive set and NS = size of the negative set.
For the example above P(TP)=458/9161=0.049, P(FP)=458/103971=0.004 and
P(FN)=8703/9161=0.95. Thus by multiplying these probabilities by the estimated full
interactome size (in this case 650,000) the sizes of TP and FN can be calculated over the full
interactome. In order to calculate the size of FP the size of a potential negatome (protein pairs that
do not interact) must be calculated. This can be calculated as the estimated size of the
interactome subtracted from the number of all possible interactions (in this case 560,237,601 - 650,000 = 559,587,601). Given these numbers the values of TP, FP, and FN over the full
interactome for this example would be 31,850, 2,238,216.5 and 617,500 respectively, leading
to a scaled precision of 0.014 and a scaled sensitivity of 0.049 over the whole interactome. In
cases where precision = 1 scaling will not affect this value as there are no false positives
predicted.
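The scaling steps above can be sketched as a single function. The function and argument names are hypothetical; the inputs are the example values quoted in the text.

```python
def scale_to_interactome(tp, fp, fn, ps, ns, interactome, all_pairs):
    """Scale training-set counts to the whole interactome.

    ps and ns are the sizes of the positive and negative training sets;
    the negatome is taken as all_pairs minus the interactome estimate.
    """
    negatome = all_pairs - interactome
    tp_scaled = tp / ps * interactome  # P(TP) * interactome size
    fn_scaled = fn / ps * interactome  # P(FN) * interactome size
    fp_scaled = fp / ns * negatome     # P(FP) * negatome size
    precision = tp_scaled / (tp_scaled + fp_scaled)
    sensitivity = tp_scaled / (tp_scaled + fn_scaled)
    return tp_scaled, fp_scaled, fn_scaled, precision, sensitivity


tp_s, fp_s, fn_s, scaled_precision, scaled_sensitivity = scale_to_interactome(
    tp=458, fp=458, fn=8703, ps=9161, ns=103971,
    interactome=650_000, all_pairs=560_237_601,
)
```

Because this sketch does not round the intermediate probabilities, its outputs differ slightly from the rounded figures quoted above (for instance a scaled precision of roughly 0.013 rather than 0.014).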
Probabilities of predicted interactions being genuine can also be calculated via an
alternate route applying Bayes theorem with the prior probability of an interaction being
derived from an estimate of interactome size. Thus applying Bayes theorem the posterior
probability of an interaction can be calculated using the following parameters (Yang 2006):
Thus the posterior probability of a predicted interaction being genuine can be calculated
using Bayes theorem as presented in Equation 4:
P(I | Pos) =
(4)
Bayes theorem however is only applicable in cases where precision < 1 as the posterior
probability is 1 when precision =1.
This can be simply demonstrated using basic algebra and recasting the terms.
P(I | Pos) =
(5)
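As a hedged illustration of the Bayesian route, the sketch below applies the standard form of Bayes' theorem to a positive prediction. The parameter names are assumptions made for this example, since the parameter list and equation bodies are not present in this copy of the text; the prior is taken as the interactome estimate divided by the number of possible pairs.

```python
def posterior_interaction(prior, sensitivity, false_positive_rate):
    """P(interaction | positive prediction) by Bayes' theorem.

    prior               -- P(I), prior probability that a pair interacts
    sensitivity         -- P(Pos | I)
    false_positive_rate -- P(Pos | not I)
    At least one of sensitivity and false_positive_rate must be non-zero.
    """
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


prior = 650_000 / 560_237_601
# With no false positives (precision = 1) the posterior is 1 for any prior.
posterior_perfect = posterior_interaction(prior, 0.05, 0.0)
# With the example rates used in the scaling section the posterior is small.
posterior_example = posterior_interaction(prior, 458 / 9161, 458 / 103971)
```

With these example rates the posterior comes out at roughly 0.013, in line with the scaled precision obtained by the counting approach.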
observed in the sub-clusters, as well as the results of previous work on the phylogenetic
profile method (Barker et al. 2007; Barker and Pagel 2005; Bowers et al. 2004; Cokus et al.
2007; Kensche et al. 2008; Pagel et al. 2004b; Pellegrini et al. 1999; Vert 2002) amongst
others, it appears that the method is capable of discerning between proteins that are
functionally linked and proteins that are not. Thus the novel predictions made could
potentially be genuine interactions that are as yet uncharacterised.
Chapter 6
Conclusions and further work
6.1 Summary of Project
The goal of this project has been an investigation into detection of human protein interactions
using the comparative method. More specifically, the goal was the development of a novel heuristic
approach to allow application of the effective but computationally intensive constrained ML
(Barker et al. 2007; Barker and Pagel 2005) approach to phylogenetic profile analysis on a
genome-wide scale. This application was intended to allow the generation of novel
predictions of protein interactions.
A database of all against all comparisons of the proteomes of 54 eukaryotic organisms
plus 1 archaeon was created. This was used as input to an implementation of the
Inparanoid (Remm et al. 2001) procedure to cluster the contents of the proteomes into
orthologous groups. Using the human proteome as a reference point, phylogenetic profiles
were then constructed for each protein within the human proteome.
Ten proteins that were universally present in single copies in all organisms under
consideration were then selected through analysis of the phylogenetic profiles and
orthologous groups. The versions of these single copy proteins from each species were then
aligned to create a multiple sequence alignment for each protein. These multiple sequence alignments were then
concatenated to create one single combined alignment. This combined alignment provides a
measure of divergence between the 55 organisms under consideration. The concatenated
multiple sequence alignment was then used to reconstruct a phylogenetic tree of the 54
eukaryotes under consideration using the archaeon as an outgroup with which to root the tree.
This phylogeny was broadly congruent with current thought on eukaryotic evolution (see
Chapter 2) .
microarrays. The highest performing microarray experiments also achieved a precision of 1.
The highest performing microarray experiment E-MEXP-1224 (Garman et al. 2009) achieved
a sensitivity of 0.003.
Constrained ML (Barker et al. 2007; Barker and Pagel 2005) was also compared to
the PIPs server (McDowall et al. 2009) which uses a semi-naive Bayesian classifier (Scott
and Barton 2007) in order to evaluate multiple sources of evidence for potential protein
interactions. At its highest cut-off the Bayesian classifier achieved a precision of 0.9883721
and a sensitivity of 0.01.
al. 2007; Barker and Pagel 2005) achieved a precision of 1 over the training data it was
utilised for further analysis.
The application of constrained ML (Barker et al. 2007; Barker and Pagel 2005) to a
full genome-wide survey was found to be impractical due to time considerations. Thus a
heuristic was developed which approximated the ability of constrained ML (Barker et al.
2007; Barker and Pagel 2005) to distinguish between proteins that interact and those that do
not. This heuristic was based on the reconstruction of ancestral states using Dollo parsimony
(Farris 1977) over the phylogenetic tree. Two novel potential heuristics were developed,
implemented and tested using the Dollo parsimonious reconstruction. The first was an
implementation of a test for correlated evolution which calculates the probability of the
concentration of a set of gains and losses of a protein in the areas of a phylogenetic tree
where a second protein was either present or absent (Maddison 1990). The second potential
heuristic was based on logistic regression using empirical counts of the presence, absence,
gain or loss of one protein given the presence, absence, gain or loss of the other as predictor
variables.
The Maddison test (Maddison 1990) based heuristic performed reasonably well in its
own right as a method of detecting functional interactions. It achieved a maximum precision
of 0.857 with a sensitivity of $6.54 \times 10^{-4}$ over the training data at a score cut-off of
0.9999997999999475. However it proved not to be efficient enough in terms of speed to be
justified for use as a heuristic. It also did not maintain an intersection with the 5 predictions
made by constrained ML (Barker et al. 2007; Barker and Pagel 2005) at its optimum rate of
gain (0.025) and at its optimum likelihood ratio (LR) statistic score cut-off (58.3).
Maintenance of an intersection with these predictions was considered a necessary property of
an effective heuristic.
The heuristic that utilised logistic regression achieved a precision of 0.736 with a
sensitivity of 0.01 at its optimum cut-off of 0.967. It also maintained an intersection with the
predictions made by constrained ML (Barker et al. 2007; Barker and Pagel 2005) (at its
optimum rate of gain and LR cut-off) up to a cut-off of 0.85. At a cut-off of 0.85 the heuristic
made 1,230 predictions, which amounted to a reduction of the search space of potential
proteins by 98.9%.
The heuristic based on logistic regression was then applied to the full human genome
in order to filter out protein pairs that displayed little or no evidence of correlated evolution.
The heuristic reduced the size of the search space by 90% over the whole genome.
Chapter 6
Figure 6.3: Process flow for research carried out in Chapter 4. Note: Validation sets
were used to validate all methods. The connectors have been left out for clarity.
Once the heuristic had been applied, a full genome-wide survey was launched.
The survey found that a large majority of predicted protein
interactions involved proteins that had been lost on short branches in the phylogeny. These
predictions were removed from the overall set of predictions. The prediction set was then
recast as a network of interactions.
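Recasting a set of pairwise predictions as a network can be sketched as follows. The protein identifiers are hypothetical, and the use of connected components to split the network into sub-networks is an assumption chosen for simplicity; it is not necessarily the clustering procedure used in this project.

```python
from collections import defaultdict


def build_network(pairs):
    """Build an undirected adjacency map from predicted interacting pairs."""
    adjacency = defaultdict(set)
    for a, b in pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def connected_components(adjacency):
    """Split the network into sub-networks of mutually reachable proteins."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical predictions.
pairs = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
subnetworks = connected_components(build_network(pairs))
```

Each resulting sub-network could then be passed to a GO term enrichment tool as a gene list.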
The results of the genome-wide survey were then examined by generating sub-networks from
the complete network and testing these sub-networks for
enrichment in Gene Ontology (GO) (Ashburner et al. 2000) terms. GO term enrichment was
found in 57% of the clusters generated. The intersection of the predictions made by
constrained ML (Barker et al. 2007; Barker and Pagel 2005) with the I2D database (Brown
and Jurisica 2005) was also examined. The intersection with the I2D database (Brown and
Jurisica 2005) was low, suggesting that any correct predictions generated by this project are
also novel predictions of protein interaction. The genome-wide survey yielded a final set of
1,131 predictions of protein interaction.
Having acquired these, the programs BayesTraits (Pagel et al. 2004a) and bms_runner
(Barker et al. 2007) should be downloaded.
To determine the optimum rate of protein gain for use in the constrained ML procedure,
bms_runner should be used to evaluate multiple rates of gain. The LR scores for all
proteins at the optimum rate of gain should be kept.
Once this rate is determined, the next step is the ancestral state reconstruction. To
carry out these reconstructions it will be necessary to download the program
DOLLOP, part of the PHYLIP package (Felsenstein 1989).
DOLLOP should be run with the U option, which allows it to use the supplied
phylogenetic tree. (Note that bms_runner uses a NEXUS-formatted tree, while DOLLOP
needs a PHYLIP-format tree.) DOLLOP should be run on every profile in the dataset.
Thus the end product of this step is a set of ancestral reconstructions over the tree for
each profile.
At this point code written by the author (available on request) can be used to process
these reconstructions. This code will take in the reconstructions and return a dataset
consisting of the s parameters described in Chapter 4 calculated for each protein.
These data can then be processed using the standard statistical package R (R Development
Core Team 2011) in order to carry out logistic regression. Once regression has been
carried out, it should yield a linear equation for calculating a logit-based score for the
probability of interacting.
Again, code available from the author can now be utilised. This code will take in the
specified coefficients for the s parameters calculated in R, the Dollo reconstructions of
the proteins, the LR scores of the proteins at the optimum rate of gain and the validation
data, and return the optimum logit cut-off, that is, the cut-off that preserves the
performance of constrained ML (Barker et al. 2007; Barker and Pagel 2005).
At this point a dataset of all possible pairs of profiles should be prepared. Code from the
author can be used to apply the linear equation to each of these pairs to calculate the logit
score. These pairs can now be filtered by the optimum cut-off.
Once a reduced set has been created, constrained ML can be applied to this set (Barker
et al. 2007; Barker and Pagel 2005).
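The filtering steps above (apply the fitted linear equation to every pair of profiles, then keep the pairs that pass the cut-off) can be sketched as follows. This is not the author's code. The feature function, feature names, coefficients and profiles are hypothetical stand-ins for the s parameters of Chapter 4, which are derived from the Dollo reconstructions and are not reproduced here. Applying the cut-off to the probability rather than to the raw logit is also an assumption.

```python
import math
from itertools import combinations


def logit_score(features, intercept, coefficients):
    """Linear predictor from a fitted logistic regression."""
    return intercept + sum(coefficients[name] * value for name, value in features.items())


def probability(logit):
    """Convert a logit to a probability of interacting."""
    return 1.0 / (1.0 + math.exp(-logit))


def filter_pairs(profiles, feature_fn, intercept, coefficients, cutoff):
    """Score all pairs of profiles and keep those at or above the cut-off."""
    kept = []
    for a, b in combinations(sorted(profiles), 2):
        features = feature_fn(profiles[a], profiles[b])
        score = probability(logit_score(features, intercept, coefficients))
        if score >= cutoff:
            kept.append((a, b, score))
    return kept


def toy_features(p, q):
    """Hypothetical predictors; the real ones come from the Dollo reconstructions."""
    n = len(p)
    return {
        "both_present": sum(x == "1" and y == "1" for x, y in zip(p, q)) / n,
        "mismatch": sum(x != y for x, y in zip(p, q)) / n,
    }


# Hypothetical profiles and coefficients.
profiles = {"A": "1100", "B": "1100", "C": "0011"}
coefficients = {"both_present": 4.0, "mismatch": -6.0}
reduced_set = filter_pairs(profiles, toy_features, 0.0, coefficients, cutoff=0.85)
```

Only the pairs in the reduced set would then be passed on to the computationally intensive constrained ML step.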
6.2 Conclusion
This project has investigated the use of the comparative method, specifically constrained ML
(Barker et al. 2007; Barker and Pagel 2005), as a means to detect protein-protein interactions.
It has generated a set of predictions that, if validated by further laboratory-based investigation,
could contribute to knowledge about the human interactome. It has also developed a method
that allows the application of the computationally intensive constrained ML (Barker et al.
2007; Barker and Pagel 2005) approach to phylogenetic profiling on a genome-wide scale.
The ability of the comparative method to unearth protein interactions can only be
enhanced by the current rate of data generation, given the rapid uptake of next-generation
sequencing technologies such as the Roche 454 GS FLX sequencer, the Illumina Genome
Analyser and the Applied Biosystems SOLiD sequencer, which can generate gigabases of
sequence data in a matter of days (Mardis 2008). As more organisms are sequenced the
quality of reconstructed phylogenies and consequently the efficacy of the comparative
method in detecting associations between traits should improve due to increased taxon
sampling (Heath et al. 2008).
Given this increased pace of data generation it is also necessary to develop fast and
effective computational techniques for functional annotation of proteins. Detection of protein
interactions can be used to functionally annotate proteins via the principle of guilt by
association (Aravind 2000). Thus the combination of the developed heuristic with
constrained ML (Barker et al. 2007; Barker and Pagel 2005) can contribute to annotation
efforts. This method has been seen not to be very sensitive; thus the probability of it making
any predictions at all for a given protein is low. Used in a high-throughput, unsupervised
context, however, the method is potentially capable of detecting novel interactions as one
tool amongst many.
Among the methods of detecting protein interactions examined over the course of this
study was the PIPs server (McDowall et al. 2009), which, as mentioned above, combines
multiple sources of evidence in order to detect potential protein interactions (Scott and Barton
2007) utilising a Bayesian classifier. A similar approach of using combined evidence in a
Bayesian framework was previously taken by Jansen et al. (2003). This combination of
diverse sources of evidence as a means to elucidate protein interactions has also been applied
by Mohamed et al. (2010), utilising a classifier based on a majority vote from a
collection of decision trees. Other approaches such as support vector machines and single
decision trees have also been investigated by Qi et al. (2006).
Potentially the application of constrained ML (Barker et al. 2007; Barker and Pagel
2005) in combination with the heuristic in a genome-wide manner could be utilised as a
source of contributory evidence in a similar framework.
6.3 Future directions
Constrained ML (Barker et al. 2007; Barker and Pagel 2005) has been seen to be capable of
detecting protein-protein interactions at a reasonable level of accuracy. With the data
accumulated over the course of this project there are a number of further avenues of
investigation and areas of extension.
6.3.1 Computational extensions
The procedure followed in order to utilise constrained ML (Barker et al. 2007; Barker and
Pagel 2005) for a genome-wide survey of H. sapiens involved the use of bespoke scripts and
various programs provided by a plethora of authors as cited throughout this text. To facilitate
the application of this tool by other users it will be necessary to create an interface and
combine the functionality of the programs utilised into one computational procedure.
The construction of phylogenetic profiles for all proteins in all species held in the
current dataset and the provision of these profiles online via a web interface would also
facilitate this process. The data generated by this project, as presented in Appendix D, could
also be presented via an online database: either an extant protein interaction database such as
STRING or I2D, or a bespoke database, which would have to be constructed.
6.3.2 Consensus profiles
The application of constrained ML (Barker et al. 2007; Barker and Pagel 2005) to detection
of protein interactions is carried out in a pairwise fashion. Bowers et al. (2004) extended the
idea of pairwise comparisons to three-way comparisons using Boolean logic operators.
This method attempted to detect dependencies of the presence and absence of a
given gene on the presence and absence of two other genes. A similar technique could be
utilised to integrate matching profiles into consensus profiles. By classifying mismatches as
missing information, consensus phylogenetic profiles could be constructed to represent groups
of proteins. The program BayesTraits (Pagel et al. 2004a), which is utilised to apply the
constrained ML approach, handles missing data by reconstruction of the missing data as an
extension of ancestral state reconstruction. Thus, when a plausible reconstruction is reached at
the immediate ancestral node of the taxon with the missing data, the state of the taxon can be
estimated using rate transition parameters (Pagel 1994). Consensus profiles will utilise a
mismatch character X to represent missing information. Thus if, for example, we compare the
following two profiles, each covering four species:
1010
1110
the consensus profile of the two would be:
1X10
In comparisons of consensus profiles, the X character will remain unchanged if matched
against another X, shift to 0 if matched against a 0, and shift to 1 if matched against a 1. Thus
a 1 or a 0 in a consensus profile will always be present in more than 50% of its constituent
profiles.
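The merging rules described above can be written down directly. The following is a minimal sketch in Python; the function names are illustrative, and applying the same rules both when building a consensus and when comparing consensus profiles is an assumption.

```python
def merge_symbols(x, y):
    """Merge two profile characters drawn from '0', '1' and 'X'."""
    if x == y:
        return x      # identical characters, including X against X, are kept
    if x == "X":
        return y      # X shifts to the 0 or 1 it is matched against
    if y == "X":
        return x
    return "X"        # a 0 against a 1 is a mismatch, recorded as missing


def consensus(profile_a, profile_b):
    """Build the consensus of two phylogenetic profiles of equal length."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same species")
    return "".join(merge_symbols(x, y) for x, y in zip(profile_a, profile_b))


print(consensus("1010", "1110"))  # 1X10
```

Matching the consensus 1X10 against a further profile 1010 under these rules resolves the X to 0, giving 1010.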
Some of these groups will represent clade-specific distributions of proteins. Others
will represent distributions of proteins correlated with the distribution of a given function over
the species under consideration. Comparison of a protein of as yet unascertained
function against consensus profiles would connect the protein either to a clade-specific group
or to a group possessing a function connected to the presence of the protein. Thus a protein
that showed correlated evolution with a consensus profile could potentially be functionally
linked to all constituent members of that profile. At a higher level, if two consensus profiles
show evidence of correlated evolution with each other, this could suggest functional linkage
between two groups of proteins, e.g. the functional interaction of one pathway with another.
6.3.3 Correlated evolution of proteins with the presence or absence of phenotypes
Given the data currently generated, an interesting avenue of investigation would be the
comparison of the presence and absence of given phenotypes with the presence and absence
of given proteins. This process can detect proteins that underlie the phenotype of interest.
This method was developed by Levesque et al. (2003) and used to detect genes
associated with cell motility. It was also applied to associating a number of phenotypes with
given proteins (Jim et al. 2004; Slonim et al. 2006). The method was found to be
reasonably effective with traits that were evenly distributed among the organisms under
consideration (Jim et al. 2004). A further application of the method by Gonzalez and Zimmer
(2008) examined the association of optimal growth pH with given genotypes.
Gonzalez and Zimmer (2008) utilised a threshold with which to discretise continuous
phenotypes: if the measured value of a phenotype was over a given threshold,
the phenotype was declared present. Applications of this method
have so far utilised measures such as string distance (Jim et al. 2004; Levesque et al.
2003) and mutual information (Gonzalez and Zimmer 2008; Slonim et al. 2006) to compare
the phylogenetic profiles of genes and given phenotypes. Use of a phylogenetically aware
method such as constrained ML (Barker et al. 2007; Barker and Pagel 2005) would enhance
the method and potentially yield more accurate results. Given the range of eukaryotic
organisms currently held, potential traits to be investigated could include multicellularity,
aerobic respiration and parasitism.
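The non-phylogenetic comparisons used by the earlier studies cited above can be sketched as follows: a continuous phenotype is discretised with a threshold, and the resulting presence/absence profile is compared with a gene profile by Hamming distance (one string distance measure) and by mutual information. The pH values, the threshold and the gene profile are hypothetical, and the exact measures used in the cited studies may differ in detail.

```python
import math
from collections import Counter


def discretise(values, threshold):
    """Declare the phenotype present (1) where the measured value exceeds the threshold."""
    return [1 if value > threshold else 0 for value in values]


def hamming(profile_a, profile_b):
    """Number of species in which the two profiles disagree."""
    return sum(x != y for x, y in zip(profile_a, profile_b))


def mutual_information(profile_a, profile_b):
    """Mutual information, in bits, between two discrete profiles."""
    n = len(profile_a)
    joint = Counter(zip(profile_a, profile_b))
    count_a, count_b = Counter(profile_a), Counter(profile_b)
    return sum(
        (c / n) * math.log2((c / n) / ((count_a[x] / n) * (count_b[y] / n)))
        for (x, y), c in joint.items()
    )


# Hypothetical optimal growth pH for four species, and a gene profile over the same species.
optimal_ph = [7.5, 6.0, 8.2, 5.5]
phenotype = discretise(optimal_ph, threshold=7.0)
gene = [1, 0, 1, 0]
distance = hamming(gene, phenotype)
information = mutual_information(gene, phenotype)
```

Neither measure takes the phylogeny into account, which is the limitation that a phylogenetically aware method such as constrained ML would address.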
6.3.4 Drug Targets
Keeping with the theme of parasitism, there are a number of disease-causing parasitic
organisms in the dataset currently held. These are:
Plasmodium falciparum
Plasmodium knowlesi
Plasmodium yoelii
Trypanosoma brucei
Trypanosoma cruzi
Leishmania major
Trichomonas vaginalis
Theileria annulata
Theileria parva
Encephalitozoon cuniculi
These include T. cruzi and T. brucei, which cause Chagas disease (Lescure et al.
2010) and sleeping sickness (Ralston et al. 2009) respectively. Also included in the dataset
are three members of the malaria-causing genus Plasmodium. Take for example P.
falciparum. There is currently resistance to all five groups of anti-malarial drugs (Hayton and
Su 2004). The detection of protein interactions in P. falciparum could potentially aid in the
development of new anti-malarial drugs. Using this species as a reference point, phylogenetic
profiles for its proteome could be constructed. An application of the logistic regression-based
heuristic would make all-against-all comparisons using constrained ML (Barker et al. 2007;
Barker and Pagel 2005) feasible. These studies could detect novel protein
interactions within P. falciparum. Disruption of protein-protein interactions is one potential
avenue for drug development. This could be pursued via approaches such
as peptidomimetics (Hruby 1997), which involve the construction of a molecule that mimics
the properties of one of the interacting partners. The construction of phylogenetic profiles
could also reveal proteins and protein interactions that are unique to P. falciparum. These
molecules could potentially be targeted with a lower risk of side effects in the host organism.
A similar procedure could be followed with all other parasitic organisms in the dataset.
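Identifying proteins unique to a reference species from a set of phylogenetic profiles can be sketched as below. The species ordering, the protein names and the profiles are hypothetical, and "unique" is taken here to mean present in the reference species and absent from every other species in the dataset.

```python
def species_specific(profiles, species, target):
    """Return the proteins present in `target` and in no other species.

    profiles: maps a protein name to a string of '0'/'1', one character per species
    species:  species names, in the same order as the profile characters
    """
    index = species.index(target)
    return sorted(
        name
        for name, profile in profiles.items()
        if profile[index] == "1" and profile.count("1") == 1
    )


# Hypothetical dataset.
species = ["P. falciparum", "H. sapiens", "T. brucei"]
profiles = {"protA": "100", "protB": "110", "protC": "001"}
unique_to_parasite = species_specific(profiles, species, "P. falciparum")
```

A less strict variant, requiring only absence from the host species, would follow the same pattern.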
References
Agnarsson I, Miller JA (2008) Is Acctran Better Than Deltran? Cladistics 24:1032
Aguinaldo AM, Turbeville JM, Linford LS, Rivera MC, Garey JR, Raff RA, Lake JA (1997)
Evidence for a Clade of Nematodes, Arthropods and Other Moulting Animals.
Nature 387:489
Ahola V, Aittokallio T, Vihinen M, Uusipaikka E (2006) A Statistical Score for Assessing
the Quality of Multiple Sequence Alignments. BMC Bioinformatics 7:484
Albert VA (2006) Parsimony, Phylogeny, and Genomics. Oxford University Press, Oxford
Alberts B (1998) Essential Cell Biology : An Introduction to the Molecular Biology of the
Cell. Garland, New York
Alberts B (2002) Molecular Biology of the Cell. Garland Science
Alberts B (2008) Molecular Biology of the Cell. Garland Science, New York ; Abingdon
Alberts B (2010) Essential Cell Biology. Garland Science, New York ; London
Alibes A, Yankilevich P, Canada A, Diaz-Uriarte R (2007) Idconverter and Idclight:
Conversion and Annotation of Gene and Protein Ids. BMC Bioinformatics 8
Alon U (2007) An Introduction to Systems Biology Design Principles of Biological
Circuits. Chapman & Hall / CRC
Altenhoff AM, Dessimoz C (2009) Phylogenetic and Functional Assessment of Orthologs
Inference Projects and Methods. PLoS Comput Biol 5:e1000262
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic Local Alignment
Search Tool. J Mol Biol 215:403
Altschul SF, Wootton JC, Gertz EM, Agarwala R, Morgulis A, Schaffer AA, Yu YK (2005)
Protein Database Searches Using Compositionally Adjusted Substitution
Matrices. FEBS Journal 272:5101
C. elegans Sequencing Consortium (1998) Genome Sequence of the Nematode C. Elegans:
A Platform for Investigating Biology. Science 282:2012
Antonov AV, Mewes HW (2008) Complex Phylogenetic Profiling Reveals Fundamental
Genotype-Phenotype Associations. Computational Biology and Chemistry 32:412
Apweiler R, Martin MJ, O'Donovan C, Magrane M, Alam-Faruque Y, Antunes R, Barrell D,
Bely B, Bingley M, Binns D, Bower L, Browne P, Chan WM, Dimmer E, Eberhardt R,
Fedotov A, Foulger R, Garavelli J, Huntley R, Jacobsen J, Kleen M, Laiho K, Leinonen R,
Legge D, Lin Q, Liu WD, Luo J, Orchard S, Patient S, Poggioli D, Pruess M, Corbett M, di
Martino G, Donnelly M, van Rensburg P, Bairoch A, Bougueleret L, Xenarios I, Altairac S,
Auchincloss A, Argoud-Puy G, Axelsen K, Baratin D, Blatter MC, Boeckmann B, Bolleman
J, Bollondi L, Boutet E, Quintaje SB, Breuza L, Bridge A, deCastro E, Ciapina L, Coral D,
Coudert E, Cusin I, Delbard G, Doche M, Dornevil D, Roggli PD, Duvaud S, Estreicher A,
Famiglietti L, Feuermann M, Gehant S, Farriol-Mathis N, Ferro S, Gasteiger E, Gateau A,
Gerritsen V, Gos A, Gruaz-Gumowski N, Hinz U, Hulo C, Hulo N, James J, Jimenez S,
Jungo F, Kappler T, Keller G, Lachaize C, Lane-Guermonprez L, Langendijk-Genevaux P,
Lara V, Lemercier P, Lieberherr D, Lima TD, Mangold V, Martin X, Masson P, Moinat M,
Morgat A, Mottaz A, Paesano S, Pedruzzi I, Pilbout S, Pillet V, Poux S, Pozzato M, Redaschi
N, Rivoire C, Roechert B, Schneider M, Sigrist C, Sonesson K, Staehli S, Stanley E, Stutz A,
Sundaram S, Tognolli M, Verbregue L, Veuthey AL, Yip LN, Zuletta L, Wu C, Arighi C,
Arminski L, Barker W, Chen CM, Chen YX, Hu ZZ, Huang HZ, Mazumder R, McGarvey P,
Natale DA, Nchoutmboube J, Petrova N, Subramanian N, Suzek BE, Ugochukwu U,
Vasudevan S, Vinayaka CR, Yeh LS, Zhang J (2010) The Universal Protein Resource
(Uniprot) in 2010. Nucleic Acids Research 38:D142
Aranda B, Achuthan P, Alam-Faruque Y, Armean I, Bridge A, Derow C, Feuermann M,
Ghanbarian AT, Kerrien S, Khadake J, Kerssemakers J, Leroy C, Menden M,
Michaut M, Montecchi-Palazzi L, Neuhauser SN, Orchard S, Perreau V, Roechert B,
van Eijk K, Hermjakob H (2010) The Intact Molecular Interaction Database in
2010. Nucleic Acids Res 38:D525
Aravind L (2000) Guilt by Association: Contextual Information in Genome Analysis.
Genome Research 10:1074
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K,
Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S,
Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G (2000) Gene
Ontology: Tool for the Unification of Biology. The Gene Ontology Consortium.
Nat Genet 25:25
Aubourg S, Rouze P (2001) Genome Annotation. Plant Physiology and Biochemistry
39:181
Avery OT, Macleod CM, McCarty M (1944) Studies on the Chemical Nature of the
Substance Inducing Transformation of Pneumococcal Types : Induction of
Transformation by a Desoxyribonucleic Acid Fraction Isolated from
Pneumococcus Type Iii. J Exp Med 79:137
Bader GD, Donaldson I, Wolting C, Ouellette BF, Pawson T, Hogue CW (2001) Bind--the
Biomolecular Interaction Network Database. Nucleic Acids Res 29:242
Baldauf SL, Roger AJ, Wenk-Siefert I, Doolittle WF (2000) A Kingdom-Level Phylogeny
of Eukaryotes Based on Combined Protein Data. Science 290:972
Baldi P, Brunak S (2001) Bioinformatics : The Machine Learning Approach. MIT Press,
Cambridge, Mass.
Barker D, Meade A, Pagel M (2007) Constrained Models of Evolution Lead to Improved
Prediction of Functional Linkage from Correlated Gain and Loss of Genes.
Bioinformatics 23:14
Barker D, Pagel M (2005) Predicting Functional Gene Links from Phylogenetic-Statistical
Analyses of Whole Genomes. PLoS Comput Biol 1:e3
Beadle GW, Tatum EL (1941) Genetic Control of Biochemical Reactions in Neurospora.
Proc Natl Acad Sci U S A 27:499
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW (2009) Genbank. Nucleic
Acids Res 37:D26
Berg JM, Tymoczko JL, Stryer L (2001) Biochemistry. W. H. Freeman and CO., New York
Berg JM, Tymoczko JL, Stryer L (2007) Biochemistry. W. H. Freeman, New York
Bindea G, Mlecnik B, Hackl H, Charoentong P, Tosolini M, Kirilovsky A, Fridman WH,
Pages F, Trajanoski Z, Galon J (2009) Cluego: A Cytoscape Plug-in to Decipher
Functionally Grouped Gene Ontology and Pathway Annotation Networks.
Bioinformatics 25:1091
Birney E, Clamp M, Durbin R (2004) Genewise and Genomewise. Genome Res 14:988
Black DL (2003) Mechanisms of Alternative Pre-Messenger Rna Splicing. Annu Rev
Biochem 72:291
Blair C, Murphy RW (2011) Recent Trends in Molecular Phylogenetic Analysis: Where
to Next? J Hered 102:130
Blanchard JL, Lynch M (2000) Organellar Genes - Why Do They End up in the Nucleus?
Trends in Genetics 16:315
Blow MJ (2004) A Survey of RNA Editing in the Human Brain. Sanger Institute,
University of Cambridge, Cambridge
Borodovsky M, Rudd KE, Koonin EV (1994) Intrinsic and Extrinsic Approaches for
Detecting Genes in a Bacterial Genome. Nucleic Acids Res 22:4756
Bowers PM, Cokus SJ, Eisenberg D, Yeates TO (2004) Use of Logic Relationships to
Decipher Protein Network Organization. Science 306:2246
Bratke K (2009) Comparative Analysis of Poxvirus Genome Evolution. University of
Dublin, Trinity College, Dublin
Breathnach R, Benoist C, O'Hare K, Gannon F, Chambon P (1978) Ovalbumin Gene:
Evidence for a Leader Sequence in mRNA and DNA Sequences at the Exon-Intron
Boundaries. Proc Natl Acad Sci U S A 75:4853
Brennan RG, Matthews BW (1989) The Helix-Turn-Helix DNA Binding Motif. J Biol
Chem 264:1903
Brent MR (2008) Steady Progress and Recent Breakthroughs in the Accuracy of
Automated Genome Annotation. Nat Rev Genet 9:62
Brown KR, Jurisica I (2005) Online Predicted Human Interaction Database.
Bioinformatics 21:2076
Brown TA (2006) Genomes 3. Garland Science Pub., New York
Bruno WJ, Halpern AL (1999) Topological Bias and Inconsistency of Maximum
Likelihood Using Wrong Models. Molecular Biology and Evolution 16:564
Burge C, Karlin S (1997) Prediction of Complete Gene Structures in Human Genomic
DNA. Journal of Molecular Biology 268:78
Burki F, Shalchian-Tabrizi K, Pawlowski J (2008) Phylogenomics Reveals a New
'Megagroup' Including Most Photosynthetic Eukaryotes. Biology Letters 4:366
Buza TJ, McCarthy FM, Wang N, Bridges SM, Burgess SC (2008) Gene Ontology
Annotation Quality Analysis in Model Eukaryotes. Nucleic Acids Research
36(2):e12
Cai JC, G. Wang , J (2010) ClusterViz: A Cytoscape Plugin for Graph Clustering and
Visualization Central South University, Changsha
Camin JH, Sokal RR (1965) A Method for Deducing Branching Sequences in Phylogeny.
Evolution 19:311
Capecchi MR (2005) Gene Targeting in Mice: Functional Analysis of the Mammalian
Genome for the Twenty-First Century. Nat Rev Genet 6:507
Capella-Gutierrez S, Silla-Martinez JM, Gabaldon T (2009) Trimal: A Tool for Automated
Alignment Trimming in Large-Scale Phylogenetic Analyses. Bioinformatics
25:1972
Cavalli-Sforza LLE, Edwards AWF (1964) Reconstruction of Evolutionary Trees.
Phenetic and Phylogenetic Classification 6:67-76
Ceol A, Aryamontri AC, Licata L, Peluso D, Briganti L, Perfetto L, Castagnoli L, Cesareni G
(2010) Mint, the Molecular Interaction Database: 2009 Update. Nucleic Acids
Research 38:D532
Chalfie M, Tu Y, Euskirchen G, Ward WW, Prasher DC (1994) Green Fluorescent Protein
as a Marker for Gene-Expression. Science 263:802
Chatr-aryamontri A, Ceol A, Palazzi LM, Nardelli G, Schneider MV, Castagnoli L, Cesareni
G (2007) Mint: The Molecular Interaction Database. Nucleic Acids Res 35:D572
Chen LF, Vitkup D (2006) Predicting Genes for Orphan Metabolic Activities Using
Phylogenetic Profiles. Genome Biology 7:R17
Chen XW, Jeong JC, Dermyer P (2011) Kups: Constructing Datasets of Interacting and
Non-Interacting Protein Pairs with Associated Attributions. Nucleic Acids Res
39:D750
Coin F, Marinoni JC, Rodolfo C, Fribourg S, Pedrini AM, Egly JM (1998) Mutations in the
Xpd Helicase Gene Result in Xp and Ttd Phenotypes, Preventing Interaction
between Xpd and the P44 Subunit of Tfiih. Nature Genetics 20:184
Cokus S, Mizutani S, Pellegrini M (2007) An Improved Method for Identifying
Functionally Linked Proteins Using Phylogenetic Profiles. BMC Bioinformatics
8:S7
Coulson RMR, Ouzounis CA (2003) The Phylogenetic Diversity of Eukaryotic
Transcription. Nucleic Acids Res 31:653
Cranston KA, Hurwitz B, Ware D, Stein L, Wing RA (2009) Species Trees from Highly
Incongruent Gene Trees in Rice. Systematic Biology 58:489
Cranston KA, Rannala B (2007) Summarizing a Posterior Distribution of Trees Using
Agreement Subtrees. Systematic Biology 56:578
Crick FH, Barnett L, Brenner S, Watts-Tobin RJ (1961) General Nature of the Genetic
Code for Proteins. Nature 192:1227
Cunningham FX, Lafond TP, Gantt E (2000) Evidence of a Role for Lytb in the
Nonmevalonate Pathway of Isoprenoid Biosynthesis. Journal of Bacteriology
182:5841
Curwen V, Eyras E, Andrews TD, Clarke L, Mongin E, Searle SM, Clamp M (2004) The
Ensembl Automatic Gene Annotation System. Genome Res 14:942
Dandekar T, Snel B, Huynen M, Bork P (1998) Conservation of Gene Order: A
Fingerprint of Proteins That Physically Interact. Trends Biochem Sci 23:324
Davey R, Savva G, Dicks J, Roberts IN (2007) Mpp: A Microarray-to-Phylogeny Pipeline
for Analysis of Gene and Marker Content Datasets. Bioinformatics 23:1023
Dayhoff MO, Schwartz RM, Orcutt BC (1978) A Model of Evolutionary Change in
Proteins. Atlas of Protein Sequence and Structure 5:345
De Bodt S, Proost S, Vandepoele K, Rouze P, Van de Peer Y (2009) Predicting
Protein-Protein Interactions in Arabidopsis Thaliana through Integration of Orthology,
Gene Ontology and Co-Expression. BMC Genomics 10:288
De Las Rivas J, Fontanillo C (2010) Protein-Protein Interactions Essentials: Key
Concepts to Building and Analyzing Interactome Networks. PLoS Comput Biol
6:e1000807
Dereeper A, Guignon V, Blanc G, Audic S, Buffet S, Chevenet F, Dufayard JF, Guindon S,
Lefort V, Lescot M, Claverie JM, Gascuel O (2008) Phylogeny.Fr: Robust
Phylogenetic Analysis for the Non-Specialist. Nucleic Acids Research 36:W465
Dowsey AW, Dunn MJ, Yang GZ (2003) The Role of Bioinformatics in Two-Dimensional
Gel Electrophoresis. Proteomics 3:1567
Durbin R (1998) Biological Sequence Analysis : Probabilistic Models of Proteins and
Nucleic Acids. Cambridge University Press, Cambridge New York
Eddy SR (1998) Profile Hidden Markov Models. Bioinformatics 14:755
Eden E, Navon R, Steinfeld I, Lipson D, Yakhini Z (2009) Gorilla: A Tool for Discovery
and Visualization of Enriched Go Terms in Ranked Gene Lists. BMC
Bioinformatics 10:48
Edgar RC (2004) Muscle: Multiple Sequence Alignment with High Accuracy and High
Throughput. Nucleic Acids Res 32:1792
Edgar RC, Batzoglou S (2006) Multiple Sequence Alignment. Curr Opin Struct Biol 16:368
Edgell DR, Belfort M, Shub DA (2000) Barriers to Intron Promiscuity in Bacteria. J
Bacteriol 182:5281
Edwards AWF (1992) Likelihood. Johns Hopkins University Press, Baltimore ; London
Elias I, Tuller T (2007) Reconstruction of Ancestral Genomic Sequences Using
Likelihood. Journal of Computational Biology 14:216
Enright AJ, Iliopoulos I, Kyrpides NC, Ouzounis CA (1999) Protein Interaction Maps for
Complete Genomes Based on Gene Fusion Events. Nature 402:86
Farrar M (2007) Striped Smith-Waterman Speeds Database Searches Six Times over
Other Simd Implementations. Bioinformatics 23:156
Farris JS (1977) Phylogenetic Analysis under Dollo's Law. Systematic Zoology 26:77
Farris JS (1978) Inferring Phylogenetic Trees from Chromosome Inversion Data.
Systematic Zoology 27:275
Felsenstein J (1973) Maximum Likelihood and Minimum-Steps Methods for Estimating
Evolutionary Trees from Data on Discrete Characters. Systematic Zoology 22:240
Felsenstein J (1978) Cases in Which Parsimony or Compatibility Methods Will Be
Positively Misleading. Syst Zool 27:401
Felsenstein J (1979) Alternative Methods of Phylogenetic Inference and Their
Interrelationship. Systematic Zoology 28:49
Felsenstein J (1985a) Confidence-Limits on Phylogenies - an Approach Using the
Bootstrap. Evolution 39:783
Felsenstein J (1985b) Phylogenies and the Comparative Method. The American Naturalist
125:1
Felsenstein J (1989) Phylip - Phylogeny Inference Package (Version 3.2). Cladistics 5:164
Felsenstein J (2004) Inferring Phylogenies. Sinauer Associates, Sunderland, Mass.
Fiers W, Contreras R, Duerinck F, Haegeman G, Iserentant D, Merregaert J, Min Jou W,
Molemans F, Raeymaekers A, Van den Berghe A, Volckaert G, Ysebaert M (1976)
Complete Nucleotide Sequence of Bacteriophage Ms2 Rna: Primary and
Secondary Structure of the Replicase Gene. Nature 260:500
Fitch WM (1970) Distinguishing Homologous from Analogous Proteins. Syst Zool 19:99
Fitch WM (1971) Toward Defining Course of Evolution - Minimum Change for a
Specific Tree Topology. Syst Zool 20:406
Fitch WM (2000) Homology a Personal View on Some of the Problems. Trends Genet
16:227
Fitzpatrick DA, Logue ME, Stajich JE, Butler G (2006) A Fungal Phylogeny Based on 42
Complete Genomes Derived from Supertree and Combined Gene Analysis. BMC
Evol Biol 6:99
Florea L, Hartzell G, Zhang Z, Rubin GM, Miller W (1998) A Computer Program for
Aligning a Cdna Sequence with a Genomic DNA Sequence. Genome Res 8:967
Fu N, Drinnenberg I, Kelso J, Wu JR, Paabo S, Zeng R, Khaitovich P (2007) Comparison of
Protein and Mrna Expression Evolution in Humans and Chimpanzees. PLoS One
2:e216
Garman KS, Acharya CR, Edelman E, Grade M, Gaedcke J, Sud S, Barry W, Diehl AM,
Provenzale D, Ginsburg GS, Ghadimi BM, Ried T, Nevins JR, Mukherjee S, Hsu D,
Potti A (2009) A Genomic Approach to Colon Cancer Risk Stratification Yields
Biologic Insights into Therapeutic Opportunities (Vol 105, 19432, 2008).
Proceedings of the National Academy of Sciences of the United States of America
106:6878
Garrett S, Barton WA, Knights R, Jin P, Morgan DO, Fisher RP (2001) Reciprocal
Activation by Cyclin-Dependent Kinases 2 and 7 Is Directed by Substrate
Specificity Determinants Outside the T Loop. Molecular and Cellular Biology
21:88
Gaschen B, Taylor J, Yusim K, Foley B, Gao F, Lang D, Novitsky V, Haynes B, Hahn BH,
Bhattacharya T, Korber B (2002) Aids - Diversity Considerations in Hiv-1 Vaccine
Selection. Science 296:2354
Ge XJ, Yamamoto S, Tsutsumi S, Midorikawa Y, Ihara S, Wang SM, Aburatani H (2005)
Interpreting Expression Profiles of Cancers by Genome-Wide Survey of Breadth
of Expression in Normal Tissues. Genomics 86:127
Gillis B, Gavin IM, Arbieva Z, King ST, Jayaraman S, Prabhakar BS (2007) Identification
of Human Cell Responses to Benzene and Benzene Metabolites. Genomics 90:324
Goffeau A, Barrell BG, Bussey H, Davis RW, Dujon B, Feldmann H, Galibert F, Hoheisel
JD, Jacq C, Johnston M, Louis EJ, Mewes HW, Murakami Y, Philippsen P, Tettelin
H, Oliver SG (1996) Life with 6000 Genes. Science 274:546
Goldman N, Anderson JP, Rodrigo AG (2000) Likelihood-Based Tests of Topologies in
Phylogenetics. Systematic Biology 49:652
Goloboff PA, Catalano SA, Mirande JM, Szumik CA, Arias JS, Kallersjo M, Farris JS (2009)
Phylogenetic Analysis of 73 060 Taxa Corroborates Major Eukaryotic Groups.
Cladistics 25:211
Gonzalez O, Zimmer R (2008) Assigning Functional Linkages to Proteins Using
Phylogenetic Profiles and Continuous Phenotypes. Bioinformatics 24:1257
Grafen A (1989) The Phylogenetic Regression. Philosophical Transactions of the Royal
Society of London Series B-Biological Sciences 326:119
Graur D, Shuali Y, Li WH (1989) Deletions in Processed Pseudogenes Accumulate Faster
in Rodents Than in Humans. Journal of Molecular Evolution 28:279
Griffiths AJF (2002) Modern Genetic Analysis : Integrating Genes and Genomes. W.H.
Freeman and Co., New York
Guindon S, Gascuel O (2003) A Simple, Fast, and Accurate Algorithm to Estimate Large
Phylogenies by Maximum Likelihood. Syst Biol 52:696
Gygi SP, Rochon Y, Franza BR, Aebersold R (1999) Correlation between Protein and
Mrna Abundance in Yeast. Molecular and Cellular Biology 19:1720
Hakes L, Pinney JW, Lovell SC, Oliver SG, Robertson DL (2007) All Duplicates Are
Not Equal: The Difference between Small-Scale and Genome Duplication.
Genome Biology 8:R209
Hamming RW (1950) Error Detecting and Error Correcting Codes. Bell System
Technical Journal 29:147
Han JDJ, Bertin N, Hao T, Goldberg DS, Berriz GF, Zhang LV, Dupuy D, Walhout AJM,
Cusick ME, Roth FP, Vidal M (2004) Evidence for Dynamically Organized
Modularity in the Yeast Protein-Protein Interaction Network. Nature 430:88
Harrison CJ, Langdale JA (2006) A Step by Step Guide to Phylogeny Reconstruction.
Plant J 45:561
Hart GT, Ramani AK, Marcotte EM (2006) How Complete Are Current Yeast and
Human Protein-Interaction Networks? Genome Biol 7:120
Harvey PH, Pagel MD (1991) The Comparative Method in Evolutionary Biology.
Oxford University Press, Oxford
Hasegawa H, Holm L (2009) Advances and Pitfalls of Protein Structural Alignment.
Curr Opin Struct Biol 19:341
Hasegawa M, Kishino H (1989) Confidence-Limits on the Maximum-Likelihood Estimate
of the Hominoid Tree from Mitochondrial-DNA Sequences. Evolution 43:672
Haw R, Hermjakob H, D'Eustachio P, Stein L (2011) Reactome Pathway Analysis to
Enrich Biological Discovery in Proteomics Datasets. Proteomics 11:3598
Hayton K, Su XZ (2004) Genetic and Biochemical Aspects of Drug Resistance in Malaria
Parasites. Curr Drug Targets Infect Disord 4:1
References
He HY, Soncin F, Grammatikakis N, Li YL, Siganou A, Gong JL, Brown SA, Kingston RE,
Calderwood SK (2003) Elevated Expression of Heat Shock Factor (HSF) 2A
Stimulates HSF1-Induced Transcription During Stress. Journal of Biological
Chemistry 278:35465
Heath TA, Hedtke SM, Hillis DM (2008) Taxon Sampling and the Accuracy of
Phylogenetic Analyses. Journal of Systematics and Evolution 46:239
Henikoff S, Henikoff JG (1992) Amino Acid Substitution Matrices from Protein Blocks.
Proc Natl Acad Sci U S A 89:10915
Hershey AD, Chase M (1952) Independent Functions of Viral Protein and Nucleic Acid
in Growth of Bacteriophage. J Gen Physiol 36:39
Hert DG, Fredlake CP, Barron AE (2008) Advantages and Limitations of Next-Generation
Sequencing Technologies: A Comparison of Electrophoresis and Non-Electrophoresis Methods. Electrophoresis 29:4618
Heyer LJ, Kruglyak S, Yooseph S (1999) Exploring Expression Data: Identification and
Analysis of Coexpressed Genes. Genome Research 9:1106
Higgins DG, Sharp PM (1988) Clustal: A Package for Performing Multiple Sequence
Alignment on a Microcomputer. Gene 73:237
Hill J, Hambley M, Forster T, Mewissen M, Sloan TM, Scharinger F, Trew A, Ghazal P
(2008) SPRINT: A New Parallel Framework for R. BMC Bioinformatics 9
Hobolth A, Christensen OF, Mailund T, Schierup MH (2007) Genomic Relationships and
Speciation Times of Human, Chimpanzee, and Gorilla Inferred from a
Coalescent Hidden Markov Model. PLoS Genet 3:e7
Hodges A, Strand AD, Aragaki AK, Kuhn A, Sengstag T, Hughes G, Elliston LA, Hartog C,
Goldstein DR, Thu D, Hollingsworth ZR, Collin F, Synek B, Holmans PA, Young
AB, Wexler NS, Delorenzi M, Kooperberg C, Augood SJ, Faull RL, Olson JM, Jones
L, Luthi-Carter R (2006) Regional and Cellular Gene Expression Changes in
Human Huntington's Disease Brain. Hum Mol Genet 15:965
Holder M, Lewis PO (2003) Phylogeny Estimation: Traditional and Bayesian
Approaches. Nature Reviews Genetics 4:275
Hruby VJ (1997) Prospects for Peptidomimetic Drug Design. Drug Discovery Today 2:165
Huai Q, Kim HY, Liu YD, Zhao YD, Mondragon A, Liu JO, Ke HM (2002) Crystal
Structure of Calcineurin-Cyclophilin-Cyclosporin Shows Common but Distinct
Recognition of Immunophilin-Drug Complexes. Proceedings of the National
Academy of Sciences of the United States of America 99:12037
Huelsenbeck JP, Bollback JP (2001) Empirical and Hierarchical Bayesian Estimation of
Ancestral States. Systematic Biology 50:351
Huelsenbeck JP, Ronquist F, Nielsen R, Bollback JP (2001) Bayesian Inference of
Phylogeny and Its Impact on Evolutionary Biology. Science 294:2310
Hughes AL, Friedman R (2005) Poxvirus Genome Evolution by Gene Gain and Loss.
Molecular Phylogenetics and Evolution 35:186
Hulsen T, Huynen MA, de Vlieg J, Groenen PM (2006) Benchmarking Ortholog
Identification Methods Using Functional Genomics Data. Genome Biol 7:R31
Hurles M (2004) Gene Duplication: The Genomic Trade in Spare Parts. PLoS Biol
2:E206
Ispolatov I, Yuryev A, Mazo I, Maslov S (2005) Binding Properties and Evolution of
Homodimers in Protein-Protein Interaction Networks. Nucleic Acids Res 33:3629
Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, Chung S, Emili A, Snyder M,
Greenblatt JF, Gerstein M (2003) A Bayesian Networks Approach for Predicting
Protein-Protein Interactions from Genomic Data. Science 302:449
Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, Doerks T, Julien P, Roth A,
Simonovic M, Bork P, von Mering C (2009) STRING 8 - a Global View on Proteins
and Their Functional Interactions in 630 Organisms. Nucleic Acids Research
37:D412
Jeong H, Mason SP, Barabasi AL, Oltvai ZN (2001) Lethality and Centrality in Protein
Networks. Nature 411:41
Jessop CE, Chakravarthi S, Garbi N, Hammerling GJ, Lovell S, Bulleid NJ (2007) ERp57 Is
Essential for Efficient Folding of Glycoproteins Sharing Common Structural
Domains. EMBO J 26:28
Jim K, Parmar K, Singh M, Tavazoie S (2004) A Cross-Genomic Approach for Systematic
Mapping of Phenotypic Traits to Genes. Genome Research 14:109
Johnson JM, Castle J, Garrett-Engele P, Kan Z, Loerch PM, Armour CD, Santos R, Schadt
EE, Stoughton R, Shoemaker DD (2003) Genome-Wide Survey of Human
Alternative Pre-mRNA Splicing with Exon Junction Microarrays. Science
302:2141
Kanehisa M (1997) A Database for Post-Genome Analysis. Trends Genet 13:375
Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T,
Araki M, Hirakawa M (2006) From Genomics to Chemical Genomics: New
Developments in KEGG. Nucleic Acids Res 34:D354
Karlin S, Altschul SF (1990) Methods for Assessing the Statistical Significance of
Molecular Sequence Features by Using General Scoring Schemes. Proc Natl Acad
Sci U S A 87:2264
Katoh K, Misawa K, Kuma K, Miyata T (2002) MAFFT: A Novel Method for Rapid
Multiple Sequence Alignment Based on Fast Fourier Transform. Nucleic Acids
Res 30:3059
Kawaji H, Hayashizaki Y (2008) Genome Annotation. Methods Mol Biol 452:125
Keane TM, Creevey CJ, Pentony MM, Naughton TJ, McInerney JO (2006) Assessment of
Methods for Amino Acid Matrix Selection and Their Use on Empirical Data
Shows That Ad Hoc Assumptions for Choice of Matrix Are Not Justified. BMC
Evolutionary Biology 6
Kensche PR, van Noort V, Dutilh BE, Huynen MA (2008) Practical and Theoretical
Advances in Predicting the Function of a Protein by Its Phylogenetic
Distribution. J R Soc Interface 5:151
Kersey PJ, Duarte J, Williams A, Karavidopoulou Y, Birney E, Apweiler R (2004) The
International Protein Index: An Integrated Database for Proteomics
Experiments. Proteomics 4:1985
Kim IY, Shin JH, Seong JK (2010) Mouse Phenogenomics, Toolbox for Functional
Annotation of Human Genome. BMB Rep 43:79
Kim MJ, Romero R, Kim CJ, Tarca AL, Chhauy S, LaJeunesse C, Lee DC, Draghici S,
Gotsch F, Kusanovic JP, Hassan SS, Kim JS (2009) Villitis of Unknown Etiology Is
Associated with a Distinct Pattern of Chemokine Up-Regulation in the Feto-Maternal and Placental Compartments: Implications for Conjoint Maternal
Allograft Rejection and Maternal Anti-Fetal Graft-Versus-Host Disease. Journal
of Immunology 182:3919
Knight RD, Landweber LF, Yarus M (2001) How Mitochondria Redefine the Code. J Mol
Evol 53:299
Knowles DG, McLysaght A (2009) Recent De Novo Origin of Human Protein-Coding
Genes. Genome Res 19:1752
Korf I (2004) Gene Finding in Novel Genomes. BMC Bioinformatics 5:59
Koshi JM, Goldstein RA (1996) Probabilistic Reconstruction of Ancestral Protein
Sequences. Journal of Molecular Evolution 42:313
Krane DE, Raymer ML (2003) Fundamental Concepts of Bioinformatics. Pearson
Education International, San Francisco
Krylov DM, Wolf YI, Rogozin IB, Koonin EV (2003) Gene Loss, Protein Sequence
Divergence, Gene Dispensability, Expression Level, and Interactivity Are
Correlated in Eukaryotic Evolution. Genome Research 13:2229
Kuhner MK, Felsenstein J (1994) Simulation Comparison of Phylogeny Algorithms under
Equal and Unequal Evolutionary Rates. Mol Biol Evol 11:459
Kummel D, Oeckinghaus A, Wang C, Krappmann D, Heinemann U (2008) Distinct
Isocomplexes of the TRAPP Trafficking Factor Coexist inside Human Cells. FEBS
Lett 582:3729
Lande J, Gimino V, Berryman T, Hertz MI, King RA (2003) Gene Expression Profiling of
Bronchoalveolar Lavage Cells in Acute Lung Rejection. American Journal of
Human Genetics 73:421
Le SQ, Gascuel O (2008) An Improved General Amino Acid Replacement Matrix. Mol
Biol Evol 25:1307
Lei PW, Koehly LM (2003) Linear Discriminant Analysis Versus Logistic Regression: A
Comparison of Classification Errors in the Two-Group Case. Journal of
Experimental Education 72:25
Lequesne WJ (1974) Uniquely Evolved Character Concept and Its Cladistic Application.
Systematic Zoology 23:513
Lescure FX, Le Loup G, Freilij H, Develoux M, Paris L, Brutus L, Pialoux G (2010) Chagas
Disease: Changes in Knowledge and Management. Lancet Infectious Diseases
10:556
Levesque M, Shasha D, Kim W, Surette MG, Benfey PN (2003) Trait-to-Gene: A
Computational Method for Predicting the Function of Uncharacterized Genes.
Current Biology 13:129
Lewinski MK, Bisgrove D, Shinn P, Chen H, Hoffmann C, Hannenhalli S, Verdin E, Berry
CC, Ecker JR, Bushman FD (2005) Genome-Wide Analysis of Chromosomal
Features Repressing Human Immunodeficiency Virus Transcription. Journal of
Virology 79:6610
Li L, Stoeckert CJ Jr, Roos DS (2003) OrthoMCL: Identification of Ortholog Groups for
Eukaryotic Genomes. Genome Res 13:2178
Li M, Wang JX, Chen J (2008) A Fast Agglomerate Algorithm for Mining Functional
Modules in Protein Interaction Networks. BMEI 2008: Proceedings of the
International Conference on Biomedical Engineering and Informatics, Vol 1:3
Linial M (2003) How Incorrect Annotations Evolve - the Case of Short Orfs. Trends in
Biotechnology 21:298
Lunter G, Ponting CP, Hein J (2006) Genome-Wide Identification of Human Functional
DNA Using a Neutral Indel Model. PLoS Comput Biol 2:2
Macagno A, Molteni M, Rinaldi A, Bertoni F, Lanzavecchia A, Rossetti C, Sallusto F (2006)
A Cyanobacterial LPS Antagonist Prevents Endotoxin Shock and Blocks
Sustained TLR4 Stimulation Required for Cytokine Expression. Journal of
Experimental Medicine 203:1481
Maddison WP (1990) A Method for Testing the Correlated Evolution of Two Binary
Characters - Are Gains or Losses Concentrated on Certain Branches of a
Phylogenetic Tree. Evolution 44:539
Maddison WP, Maddison DR (2010) Mesquite: A Modular System for Evolutionary
Analysis. Version 2.73
Maere S, Heymans K, Kuiper M (2005) BiNGO: A Cytoscape Plugin to Assess
Overrepresentation of Gene Ontology Categories in Biological Networks.
Bioinformatics 21:3448
Malcolm BA, Wilson KP, Matthews BW, Kirsch JF, Wilson AC (1990) Ancestral
Lysozymes Reconstructed, Neutrality Tested, and Thermostability Linked to
Hydrocarbon Packing. Nature 345:86
Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D (1999) Detecting
Protein Function and Protein-Protein Interactions from Genome Sequences.
Science 285:751
Mardis ER (2008) The Impact of Next-Generation Sequencing Technology on Genetics.
Trends Genet 24:133
Martin DM, Berriman M, Barton GJ (2004) GOtcha: A New Method for Prediction of
Protein Function Assessed by the Annotation of Seven Genomes. BMC
Bioinformatics 5:178
Maston GA, Evans SK, Green MR (2006) Transcriptional Regulatory Elements in the
Human Genome. Annual Review of Genomics and Human Genetics 7:29
Maxam AM, Gilbert W (1977) New Method for Sequencing DNA. Proc Natl Acad Sci U S
A 74:560
McDowall MD, Scott MS, Barton GJ (2009) PIPs: Human Protein-Protein Interaction
Prediction Database. Nucleic Acids Res 37:D651
McKernan KJ, Peckham HE, Costa GL, McLaughlin SF, Fu YT, Tsung EF, Clouser CR,
Duncan C, Ichikawa JK, Lee CC, Zhang Z, Ranade SS, Dimalanta ET, Hyland FC,
Sokolsky TD, Zhang L, Sheridan A, Fu HN, Hendrickson CL, Li B, Kotler L, Stuart
JR, Malek JA, Manning JM, Antipova AA, Perez DS, Moore MP, Hayashibara KC,
Lyons MR, Beaudoin RE, Coleman BE, Laptewicz MW, Sannicandro AE, Rhodes
MD, Gottimukkala RK, Yang S, Bafna V, Bashir A, MacBride A, Alkan C, Kidd JM,
Eichler EE, Reese MG, De la Vega FM, Blanchard AP (2009) Sequence and
Structural Variation in a Human Genome Uncovered by Short-Read, Massively
Parallel Ligation Sequencing Using Two-Base Encoding. Genome Research
19:1527
McLysaght A, Baldi PF, Gaut BS (2003) Extensive Gene Gain Associated with Adaptive
Evolution of Poxviruses. Proceedings of the National Academy of Sciences of the
United States of America 100:15655
Messier W, Stewart CB (1997) Episodic Adaptive Evolution of Primate Lysozymes.
Nature 385:151
Mishra GR, Suresh M, Kumaran K, Kannabiran N, Suresh S, Bala P, Shivakumar K,
Anuradha N, Reddy R, Raghavan TM, Menon S, Hanumanthu G, Gupta M, Upendran
S, Gupta S, Mahesh M, Jacob B, Mathew P, Chatterjee P, Arun KS, Sharma S,
Chandrika KN, Deshpande N, Palvankar K, Raghavnath R, Krishnakanth R, Karathia
H, Rekha B, Nayak R, Vishnupriya G, Kumar HG, Nagini M, Kumar GS, Jose R,
Deepthi P, Mohan SS, Gandhi TK, Harsha HC, Deshpande KS, Sarker M, Prasad TS,
Pandey A (2006) Human Protein Reference Database--2006 Update. Nucleic
Acids Res 34:D411
Mohamed TP, Carbonell JG, Ganapathiraju MK (2010) Active Learning for Human
Protein-Protein Interaction Prediction. BMC Bioinformatics 11 Suppl 1:S57
Monsalve M, Wu ZD, Adelmant G, Puigserver P, Fan ML, Spiegelman BM (2000) Direct
Coupling of Transcription and mRNA Processing through the Thermogenic
Coactivator PGC-1. Molecular Cell 6:307
Moore KJ (1999) Utilization of Mouse Models in the Discovery of Human Disease Genes.
Drug Discov Today 4:123
Morgenstern B, Frech K, Dress A, Werner T (1998) DIALIGN: Finding Local Similarities by
Multiple Sequence Alignment. Bioinformatics 14:290
Mount DW (2004) Bioinformatics : Sequence and Genome Analysis. Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, N.Y.
Needleman SB, Wunsch CD (1970) A General Method Applicable to the Search for
Similarities in the Amino Acid Sequence of Two Proteins. J Mol Biol 48:443
Nei M, Kumar S (2000) Molecular Evolution and Phylogenetics. Oxford University Press
Nooren IMA, Thornton JM (2003) Structural Characterisation and Functional
Significance of Transient Protein-Protein Interactions. Journal of Molecular
Biology 325:991
Nuin PA, Wang Z, Tillier ER (2006) The Accuracy of Several Multiple Sequence
Alignment Programs for Proteins. BMC Bioinformatics 7:471
Nye TMW, Lio P, Gilks WR (2006) A Novel Algorithm and Web-Based Tool for
Comparing Two Alternative Phylogenetic Trees. Bioinformatics 22:117
O'Donnell RK, Kupferman M, Wei SJ, Singhal S, Weber R, O'Malley B, Cheng Y, Putt M,
Feldman M, Ziober B, Muschel RJ (2005) Gene Expression Signature Predicts
Lymphatic Metastasis in Squamous Cell Carcinoma of the Oral Cavity.
Oncogene 24:1244
Ohta S, Shiomi Y, Sugimoto K, Obuse C, Tsurimoto T (2002) A Proteomics Approach to
Identify Proliferating Cell Nuclear Antigen (PCNA)-Binding Proteins in Human
Cell Lysates - Identification of the Human CHL12/RFCs2-5 Complex as a Novel
PCNA-Binding Protein. Journal of Biological Chemistry 277:40362
Ooi SL, Pan X, Peyser BD, Ye P, Meluh PB, Yuan DS, Irizarry RA, Bader JS, Spencer FA,
Boeke JD (2006) Global Synthetic-Lethality Analysis and Yeast Functional
Profiling. Trends Genet 22:56
Orengo C, Jones D, Thornton JM (2003) Bioinformatics : Genes, Proteins, and
Computers. BIOS Scientific ; Distributed in the U.S. by Springer-Verlag, Oxford
New York
Orlowski J, Kaczanowski S, Zielenkiewicz P (2007) Overrepresentation of Interactions
between Homologous Proteins in Interactomes. FEBS Letters 581:52
Page RDM, Holmes EC (1998) Molecular Evolution : A Phylogenetic Approach.
Blackwell Science, Oxford
Pagel M (1994) Detecting Correlated Evolution on Phylogenies - a General-Method for
the Comparative-Analysis of Discrete Characters. Proceedings of the Royal
Society of London Series B-Biological Sciences 255:37
Pagel M (1997) Inferring Evolutionary Processes from Phylogenies. Zoologica Scripta
26:331
Pagel M, Meade A, Barker D (2004a) Bayesian Estimation of Ancestral Character States
on Phylogenies. Syst Biol 53:673
Pagel P, Kovac S, Oesterheld M, Brauner B, Dunger-Kaltenbach I, Frishman G, Montrone C,
Mark P, Stumpflen V, Mewes HW, Ruepp A, Frishman D (2005) The MIPS
Mammalian Protein-Protein Interaction Database. Bioinformatics 21:832
Pagel P, Wong P, Frishman D (2004b) A Domain Interaction Map Based on Phylogenetic
Profiling. J Mol Biol 344:1331
Parfrey LW, Barbero E, Lasser E, Dunthorn M, Bhattacharya D, Patterson DJ, Katz LA
(2006) Evaluating Support for the Current Classification of Eukaryotic
Diversity. PLoS Genet 2:e220
Parida L (2008) Pattern Discovery in Bioinformatics : Theory & Algorithms. Chapman &
Hall/CRC, London
Pazos F, Ranea JAG, Juan D, Sternberg MJE (2005) Assessing Protein Co-Evolution in the
Context of the Tree of Life Assists in the Prediction of the Interactome. Journal of
Molecular Biology 352:1002
Pazos F, Valencia A (2001) Similarity of Phylogenetic Trees as Indicator of Protein-Protein Interaction. Protein Eng 14:609
Pearson WR, Lipman DJ (1988) Improved Tools for Biological Sequence Comparison.
Proc Natl Acad Sci U S A 85:2444
Pellegrini M (2001) Computational Methods for Protein Function Analysis. Curr Opin
Chem Biol 5:46
Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO (1999) Assigning
Protein Functions by Comparative Genome Analysis: Protein Phylogenetic
Profiles. Proc Natl Acad Sci U S A 96:4285
Pesole G (2008) What Is a Gene? An Updated Operational Definition. Gene 417:1
Picardi E, Pesole G (2010) Computational Methods for Ab Initio and Comparative Gene
Finding. Methods Mol Biol 609:269
Pickett KM, Randle CP (2005) Strange Bayes Indeed: Uniform Topological Priors Imply
Non-Uniform Clade Priors. Molecular Phylogenetics and Evolution 34:203
Pinney JW, Shirley MW, McConkey GA, Westhead DR (2005) MetaSHARK: Software for
Automated Metabolic Network Prediction from DNA Sequence and Its
Application to the Genomes of Plasmodium Falciparum and Eimeria Tenella.
Nucleic Acids Research 33:1399
Posada D, Buckley TR (2004) Model Selection and Model Averaging in Phylogenetics:
Advantages of Akaike Information Criterion and Bayesian Approaches over
Likelihood Ratio Tests. Syst Biol 53:793
Potter SC, Clarke L, Curwen V, Keenan S, Mongin E, Searle SM, Stabenau A, Storey R,
Clamp M (2004) The Ensembl Analysis Pipeline. Genome Res 14:934
Prasad TSK, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, Telikicherla
D, Raju R, Shafreen B, Venugopal A, Balakrishnan L, Marimuthu A, Banerjee S,
Somanathan DS, Sebastian A, Rani S, Ray S, Kishore CJH, Kanth S, Ahmed M,
Kashyap MK, Mohmood R, Ramachandra YL, Krishna V, Rahiman BA, Mohan S,
Ranganathan P, Ramabadran S, Chaerkady R, Pandey A (2009) Human Protein
Reference Database-2009 Update. Nucleic Acids Research 37:D767
Pressman R (2001) Software Engineering: A Practitioner's Approach. McGraw-Hill
Pruitt KD, Tatusova T, Maglott DR (2005) NCBI Reference Sequence (RefSeq): A Curated
Non-Redundant Sequence Database of Genomes, Transcripts and Proteins.
Nucleic Acids Res 33:D501
Qi YJ, Bar-Joseph Z, Klein-Seetharaman J (2006) Evaluation of Different Biological Data
and Computational Classification Methods for Use in Protein Interaction
Prediction. Proteins-Structure Function and Bioinformatics 63:490
Quackenbush J (2002) Microarray Data Normalization and Transformation. Nat Genet
32 Suppl:496
Raab JR, Kamakaka RT (2010) Opinion Insulators and Promoters: Closer Than We
Think. Nature Reviews Genetics 11:439
Radicchi F, Castellano C, Cecconi F, Loreto V, Parisi D (2004) Defining and Identifying
Communities in Networks. Proceedings of the National Academy of Sciences of the
United States of America 101:2658
Radom-Aizik S, Hayek S, Shahar I, Rechavi G, Kaminski N, Ben-Dov I (2005) Effects of
Aerobic Training on Gene Expression in Skeletal Muscle of Elderly Men.
Medicine and Science in Sports and Exercise 37:1680
Ralston KS, Kabututu ZP, Melehani JH, Oberholzer M, Hill KL (2009) The Trypanosoma
Brucei Flagellum: Moving Parasites in New Directions. Annual Review of
Microbiology 63:335
Ramachandran N, Hainsworth E, Bhullar B, Eisenstein S, Rosen B, Lau AY, Walter JC,
LaBaer J (2004) Self-Assembling Protein Microarrays. Science 305:86
Ramazzina I, Folli C, Secchi A, Berni R, Percudani R (2006) Completing the Uric Acid
Degradation Pathway through Phylogenetic Comparison of Whole Genomes.
Nature Chemical Biology 2:144
Ranea JA, Yeats C, Grant A, Orengo CA (2007) Predicting Protein Function with
Hierarchical Phylogenetic Profiles: The Gene3D Phylo-Tuner Method Applied to
Eukaryotic Genomes. PLoS Comput Biol 3:e237
R Development Core Team (2011) R: A language and environment for statistical
computing. R Foundation for Statistical Computing Vienna, Austria
Reghunathan R, Jayapal M, Hsu LY, Chng HH, Tai D, Leung BP, Melendez AJ (2005)
Expression Profile of Immune Response Genes in Patients with Severe Acute
Respiratory Syndrome. BMC Immunology 6
Remm M, Storm CE, Sonnhammer EL (2001) Automatic Clustering of Orthologs and in-Paralogs from Pairwise Species Comparisons. J Mol Biol 314:1041
Richmond TJ, Davey CA (2003) The Structure of DNA in the Nucleosome Core. Nature
423:145
Ridley M (1983) The Explanation of Organic Diversity : The Comparative Method and
Adaptations for Mating. Clarendon Press, Oxford
Robertson DL, Lovell SC (2009) Evolution in Protein Interaction Networks: Co-Evolution, Rewiring and the Role of Duplication. Biochem Soc Trans 37:768
Rodriguez-Ezpeleta N, Brinkmann H, Burey SC, Roure B, Burger G, Loffelhardt W, Bohnert
HJ, Philippe H, Lang BF (2005) Monophyly of Primary Photosynthetic
Eukaryotes: Green Plants, Red Algae, and Glaucophytes. Curr Biol 15:1325
Rodriguez-Ezpeleta N, Brinkmann H, Burger G, Roger AJ, Gray MW, Philippe H, Lang BF
(2007) Toward Resolving the Eukaryotic Tree: The Phylogenetic Positions of
Jakobids and Cercozoans. Current Biology 17:1420
Rokas A, Williams BL, King N, Carroll SB (2003) Genome-Scale Approaches to Resolving
Incongruence in Molecular Phylogenies. Nature 425:798
Russell SJ, Norvig P, Canny J (2003) Artificial Intelligence : A Modern Approach.
Prentice Hall, Upper Saddle River, N.J.
Salemi M, Vandamme A-M (2003) The Phylogenetic Handbook : A Practical Approach
to DNA and Protein Phylogeny. Cambridge University Press, Cambridge, U.K. ;
New York
Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D (2004) The Database
of Interacting Proteins: 2004 Update. Nucleic Acids Res 32:D449
Sanger F, Coulson AR, Friedmann T, Air GM, Barrell BG, Brown NL, Fiddes JC, Hutchison
CA, Slocombe PM, Smith M (1978) Nucleotide-Sequence of Bacteriophage-PhiX174. J Mol Biol 125:225
Sanger F, Nicklen S, Coulson AR (1977) DNA Sequencing with Chain-Terminating
Inhibitors. Proc Natl Acad Sci U S A 74:5463
Sankoff D (1975) Minimal Mutation Trees of Sequences. SIAM Journal on Applied
Mathematics 28:35
Sasaoka T, Kobayashi M (2000) The Functional Significance of Shc in Insulin Signaling
as a Substrate of the Insulin Receptor. Endocrine Journal 47:373
Scott MS, Barton GJ (2007) Probabilistic Prediction and Ranking of Human Protein-Protein Interactions. BMC Bioinformatics 8:239
Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B,
Ideker T (2003) Cytoscape: A Software Environment for Integrated Models of
Biomolecular Interaction Networks. Genome Research 13:2498
Shortle D, Ackerman MS (2001) Persistence of Native-Like Topology in a Denatured
Protein in 8 M Urea. Science 293:487
Siddall ME (1998) Success of Parsimony in the Four-Taxon Case: Long-Branch
Repulsion by Likelihood in the Farris Zone. Cladistics-the International Journal of
the Willi Hennig Society 14:209
Sillen-Tullberg B (1988) Evolution of Gregariousness in Aposematic Butterfly Larvae - a
Phylogenetic Analysis. Evolution 42:293
Simmons MP, Ochoterena H, Freudenstein JV (2002) Amino Acid Vs. Nucleotide
Characters: Challenging Preconceived Notions. Molecular Phylogenetics and
Evolution 24:78
Singh GP, Ganapathi M, Dash D (2007) Role of Intrinsic Disorder in Transient
Interactions of Hub Proteins. Proteins-Structure Function and Bioinformatics
66:761
Slater GS, Birney E (2005) Automated Generation of Heuristics for Biological Sequence
Comparison. BMC Bioinformatics 6
Slonim N, Elemento O, Tavazoie S (2006) Ab Initio Genotype-Phenotype Association
Reveals Intrinsic Modularity in Genetic Networks. Molecular Systems Biology
Smith TF, Waterman MS (1981) Identification of Common Molecular Subsequences. J
Mol Biol 147:195
Sneath PHA, Sokal RR (1973) Numerical Taxonomy : The Principles and Practice of
Numerical Classification. W. H. Freeman, San Francisco
Snel B, Bork P, Huynen MA (1999) Genome Phylogeny Based on Gene Content. Nat
Genet 21:108
Sokal RR, Rohlf FJ (1995) Biometry : The Principles and Practice of Statistics in
Biological Research. W.H. Freeman, New York
Spira A, Beane J, Shah V, Liu G, Schembri F, Yang XM, Palma J, Brody JS (2004) Effects
of Cigarette Smoke on the Human Airway Epithelial Cell Transcriptome.
Proceedings of the National Academy of Sciences of the United States of America
101:10143
Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M (2006) BioGRID: A
General Repository for Interaction Datasets. Nucleic Acids Research 34:D535
Steel M, Penny D (2000) Parsimony, Likelihood, and the Role of Models in Molecular
Phylogenetics. Mol Biol Evol 17:839
Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H, Stroedicke M, Zenkner
M, Schoenherr A, Koeppen S, Timm J, Mintzlaff S, Abraham C, Bock N, Kietzmann
S, Goedde A, Toksoz E, Droege A, Krobitsch S, Korn B, Birchmeier W, Lehrach H,
Wanker EE (2005) A Human Protein-Protein Interaction Network: A Resource
for Annotating the Proteome. Cell 122:957
Stevens PF, Augier A (1983) Augustin Augier's "Arbre Botanique" (1801), a
Remarkable Early Botanical Representation of the Natural System. Taxon
32:203
Stewart CB, Schilling JW, Wilson AC (1987) Adaptive Evolution in the Stomach
Lysozymes of Foregut Fermenters. Nature 330:401
Strachan T, Read AP (2004) Human Molecular Genetics. Garland Press, New York
Stuart GW, Moffett K, Leader JJ (2002) A Comprehensive Vertebrate Phylogeny Using
Vector Representations of Protein Sequences from Whole Genomes. Mol Biol
Evol 19:554
Stumpf MP, Thorne T, de Silva E, Stewart R, An HJ, Lappe M, Wiuf C (2008) Estimating
the Size of the Human Interactome. Proceedings of the National Academy of
Sciences of the United States of America 105:6959
Sundquist A, Ronaghi M, Tang HX, Pevzner P, Batzoglou S (2007) Whole-Genome
Sequencing and Assembly with High-Throughput, Short-Read Technologies.
PLoS One 2
Swanson KW, Irwin DM, Wilson AC (1991) Stomach Lysozyme Gene of the Langur
Monkey - Tests for Convergence and Positive Selection. J Mol Evol 33:418
Swofford DL, Maddison WP (1987) Reconstructing Ancestral Character States under
Wagner Parsimony. Mathematical Biosciences 87:199
Swofford DL, Waddell PJ, Huelsenbeck JP, Foster PG, Lewis PO, Rogers JS (2001) Bias in
Phylogenetic Estimation and Its Relevance to the Choice between Parsimony and
Likelihood Methods. Systematic Biology 50:525
Takatsu H, Futatsumori M, Yoshino K, Yoshida Y, Shin HW, Nakayama K (2001) Similar
Subunit Interactions Contribute to Assembly of Clathrin Adaptor Complexes
and COPI Complex: Analysis Using Yeast Three-Hybrid System. Biochemical and
Biophysical Research Communications 284:1083
Talavera G, Castresana J (2007) Improvement of Phylogenies after Removing Divergent
and Ambiguously Aligned Blocks from Protein Sequence Alignments. Syst Biol
56:564
Tanaka R, Yi TM, Doyle J (2005) Some Protein Interaction Data Do Not Exhibit Power
Law Statistics. FEBS Letters 579:5140
Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM,
Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV,
Vasudevan S, Wolf YI, Yin JJ, Natale DA (2003) The COG Database: An Updated
Version Includes Eukaryotes. BMC Bioinformatics 4:41
Telford MJ (2004) Animal Phylogeny: Back to the Coelomata? Curr Biol 14:R274
Tian B, Nowak DE, Jamaluddin M, Wang SF, Brasier AR (2005) Identification of Direct
Genomic Targets Downstream of the Nuclear Factor-Kappa B Transcription
Factor Mediating Tumor Necrosis Factor Signaling. Journal of Biological
Chemistry 280:17435
Tierney EP, Tulac S, Huang STJ, Giudice LC (2003) Activation of the Protein Kinase A
Pathway in Human Endometrial Stromal Cells Reveals Sequential Categorical
Gene Regulation. Physiological Genomics 16:47
Townsend JP, Lopez-Giraldez F, Friedman R (2008) The Phylogenetic Informativeness of
Nucleotide and Amino Acid Sequences for Reconstructing the Vertebrate Tree. J
Mol Evol 67:437
Valadkhan S, Jaladat Y (2010) The Spliceosomal Proteome: At the Heart of the Largest
Cellular Ribonucleoprotein Machine. Proteomics 10:4128
Vanacova S, Liston DR, Tachezy J, Johnson PJ (2003) Molecular Biology of the
Amitochondriate Parasites, Giardia Intestinalis, Entamoeba Histolytica and
Trichomonas Vaginalis. International Journal for Parasitology 33:235
Vanharanta S, Pollard PJ, Lehtonen HJ, Laiho P, Sjoberg J, Leminen A, Aittomaki K, Arola
J, Kruhoffer M, Orntoft TF, Tomlinson IP, Kiuru M, Arango D, Aaltonen LA (2006)
Distinct Expression Profile in Fumarate-Hydratase-Deficient Uterine Fibroids.
Human Molecular Genetics 15:97
Velculescu VE, Zhang L, Zhou W, Polyak K, Basrai M, Bassett D, Hieter P, Vogelstein B,
Kinzler KW (1997) Serial Analysis of Gene Expression (SAGE). American Journal
of Human Genetics 61:A36
Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M,
Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman
JR, Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas
PD, Zhang J, Gabor Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick
VA, Zinder N, Levine AJ, Roberts RJ, Simon M, Slayman C, Hunkapiller M, Bolanos
R, Delcher A, Dew I, Fasulo D, Flanigan M, Florea L, Halpern A, Hannenhalli S,
Kravitz S, Levy S, Mobarry C, Reinert K, Remington K, Abu-Threideh J, Beasley E,
Biddick K, Bonazzi V, Brandon R, Cargill M, Chandramouliswaran I, Charlab R,
Chaturvedi K, Deng Z, Di Francesco V, Dunn P, Eilbeck K, Evangelista C, Gabrielian
AE, Gan W, Ge W, Gong F, Gu Z, Guan P, Heiman TJ, Higgins ME, Ji RR, Ke Z,
Ketchum KA, Lai Z, Lei Y, Li Z, Li J, Liang Y, Lin X, Lu F, Merkulov GV, Milshina
N, Moore HM, Naik AK, Narayan VA, Neelam B, Nusskern D, Rusch DB, Salzberg
S, Shao W, Shue B, Sun J, Wang Z, Wang A, Wang X, Wang J, Wei M, Wides R,
Xiao C, Yan C, Yao A, Ye J, Zhan M, Zhang W, Zhang H, Zhao Q, Zheng L, Zhong
F, Zhong W, Zhu S, Zhao S, Gilbert D, Baumhueter S, Spier G, Carter C, Cravchik A,
Woodage T, Ali F, An H, Awe A, Baldwin D, Baden H, Barnstead M, Barrow I,
Beeson K, Busam D, Carver A, Center A, Cheng ML, Curry L, Danaher S, Davenport
L, Desilets R, Dietz S, Dodson K, Doup L, Ferriera S, Garg N, Gluecksmann A, Hart
B, Haynes J, Haynes C, Heiner C, Hladun S, Hostin D, Houck J, Howland T,
Ibegwam C, Johnson J, Kalush F, Kline L, Koduru S, Love A, Mann F, May D,
McCawley S, McIntosh T, McMullen I, Moy M, Moy L, Murphy B, Nelson K,
Pfannkoch C, Pratts E, Puri V, Qureshi H, Reardon M, Rodriguez R, Rogers YH,
Romblad D, Ruhfel B, Scott R, Sitter C, Smallwood M, Stewart E, Strong R, Suh E,
Thomas R, Tint NN, Tse S, Vech C, Wang G, Wetter J, Williams S, Williams M,
Windsor S, Winn-Deen E, Wolfe K, Zaveri J, Zaveri K, Abril JF, Guigo R, Campbell
MJ, Sjolander KV, Karlak B, Kejariwal A, Mi H, Lazareva B, Hatton T, Narechania
A, Diemer K, Muruganujan A, Guo N, Sato S, Bafna V, Istrail S, Lippert R, Schwartz
R, Walenz B, Yooseph S, Allen D, Basu A, Baxendale J, Blick L, Caminha M,
Carnes-Stine J, Caulk P, Chiang YH, Coyne M, Dahlke C, Mays A, Dombroski M,
Donnelly M, Ely D, Esparham S, Fosler C, Gire H, Glanowski S, Glasser K, Glodek
A, Gorokhov M, Graham K, Gropman B, Harris M, Heil J, Henderson S, Hoover J,
Jennings D, Jordan C, Jordan J, Kasha J, Kagan L, Kraft C, Levitsky A, Lewis M, Liu
X, Lopez J, Ma D, Majoros W, McDaniel J, Murphy S, Newman M, Nguyen T,
Nguyen N, Nodell M, Pan S, Peck J, Peterson M, Rowe W, Sanders R, Scott J,
Simpson M, Smith T, Sprague A, Stockwell T, Turner R, Venter E, Wang M, Wen M,
Wu D, Wu M, Xia A, Zandieh A, Zhu X (2001) The Sequence of the Human
Genome. Science 291:1304
Vert JP (2002) A Tree Kernel to Analyse Phylogenetic Profiles. Bioinformatics 18 Suppl
1:S276
Vidalain PO, Boxem M, Ge H, Li S, Vidal M (2004) Increasing Specificity in High-Throughput Yeast Two-Hybrid Experiments. Methods 32:363
Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E (2009) Ensemblcompara
Genetrees: Complete, Duplication-Aware Phylogenetic Trees in Vertebrates.
Genome Res 19:327
von Mering C, Jensen LJ, Snel B, Hooper SD, Krupp M, Foglierini M, Jouffre N, Huynen
MA, Bork P (2005) String: Known and Predicted Protein-Protein Associations,
Integrated and Transferred across Organisms. Nucleic Acids Res 33:D433
von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P (2002)
Comparative Assessment of Large-Scale Data Sets of Protein-Protein
Interactions. Nature 417:399
von Mering C, Zdobnov EM, Tsoka S, Ciccarelli FD, Pereira-Leal JB, Ouzounis CA, Bork P
(2003) Genome Evolution Reveals Biochemical Networks and Functional
Modules. Proc Natl Acad Sci U S A 100:15428
Walhout AJ, Sordella R, Lu X, Hartley JL, Temple GF, Brasch MA, Thierry-Mieg N, Vidal
M (2000) Protein Interaction Mapping in C. Elegans Using Proteins Involved in
Vulval Development. Science 287:116
Wall DP, Fraser HB, Hirsh AE (2003) Detecting Putative Orthologs. Bioinformatics
19:1710
Wang D, Hsieh M, Li WH (2005a) A General Tendency for Conservation of Protein
Length across Eukaryotic Kingdoms. Mol Biol Evol 22:142
Wang H, Xu Z, Gao L, Hao B (2009a) A Fungal Phylogeny Based on 82 Complete
Genomes Using the Composition Vector Method. BMC Evol Biol 9:195
Wang J, Xia Q, He X, Dai M, Ruan J, Chen J, Yu G, Yuan H, Hu Y, Li R, Feng T, Ye C, Lu
C, Wang J, Li S, Wong GK, Yang H, Wang J, Xiang Z, Zhou Z, Yu J (2005b)
Silkdb: A Knowledgebase for Silkworm Biology and Genomics. Nucleic Acids
Res 33:D399
Wang Z, Gerstein M, Snyder M (2009b) Rna-Seq: A Revolutionary Tool for
Transcriptomics. Nat Rev Genet 10:57
Watson JD, Crick FH (1953) Molecular Structure of Nucleic Acids; a Structure for
Deoxyribose Nucleic Acid. Nature 171:737
Watts DJ, Strogatz SH (1998) Collective Dynamics of 'Small-World' Networks. Nature
393:440
Wheeler TJ, Kececioglu JD (2007) Multiple Alignment by Aligning Alignments.
Bioinformatics 23:i559
Whelan S, Goldman N (2001) A General Empirical Model of Protein Evolution Derived
from Multiple Protein Families Using a Maximum-Likelihood Approach.
Molecular Biology and Evolution 18:691
Whitaker JW, McConkey GA, Westhead DR (2009) Prediction of Horizontal Gene
Transfers in Eukaryotes: Approaches and Challenges. Biochem Soc Trans 37:792
Wodicka L, Dong H, Mittmann M, Ho MH, Lockhart DJ (1997) Genome-Wide Expression
Monitoring in Saccharomyces Cerevisiae. Nat Biotechnol 15:1359
Woese CR, Kandler O, Wheelis ML (1990) Towards a Natural System of Organisms:
Proposal for the Domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci U
S A 87:4576
Wolf YI, Rogozin IB, Koonin EV (2004) Coelomata and Not Ecdysozoa: Evidence from
Genome-Wide Phylogenetic Analysis. Genome Research 14:29
Wootton JC, Federhen S (1993) Statistics of Local Complexity in Amino-Acid-Sequences
and Sequence Databases. Computers & Chemistry 17:149
Wu G, Nie L, Zhang WW (2008) Integrative Analyses of Posttranscriptional Regulation
in the Yeast Saccharomyces Cerevisiae Using Transcriptomic and Proteomic
Data. Current Microbiology 57:18
Yakovchuk P, Protozanova E, Frank-Kamenetskii MD (2006) Base-Stacking and Base-Pairing Contributions into Thermal Stability of the DNA Double Helix (Vol 34,
Pg 564, 2006). Nucleic Acids Res 34:1082
Yang Z (2006) Computational Molecular Evolution. Oxford University Press, Oxford
Yang Z (2008) Computational Molecular Evolution. Oxford University Press, Oxford
Yang ZH (1994) Maximum-Likelihood Phylogenetic Estimation from DNA-Sequences
with Variable Rates over Sites - Approximate Methods. Journal of Molecular
Evolution 39:306
Yang ZH, Kumar S, Nei M (1995) A New Method of Inference of Ancestral Nucleotide
and Amino-Acid-Sequences. Genetics 141:1641
Yedavalli VSRK, Neuveut C, Chi YH, Kleiman L, Jeang KT (2004) Requirement of Ddx3
Dead Box Rna Helicase for Hiv-1 Rev-Rre Export Function. Cell 119:381
Yu HY, Braun P, Yildirim MA, Lemmens I, Venkatesan K, Sahalie J, Hirozane-Kishikawa
T, Gebreab F, Li N, Simonis N, Hao T, Rual JF, Dricot A, Vazquez A, Murray RR,
Simon C, Tardivo L, Tam S, Svrzikapa N, Fan CY, de Smet AS, Motyl A, Hudson
ME, Park J, Xin XF, Cusick ME, Moore T, Boone C, Snyder M, Roth FP, Barabasi
AL, Tavernier J, Hill DE, Vidal M (2008) High-Quality Binary Protein Interaction
Map of the Yeast Interactome Network. Science 322:104
Yu HY, Luscombe NM, Lu HX, Zhu XW, Xia Y, Han JDJ, Bertin N, Chung S, Vidal M,
Gerstein M (2004a) Annotation Transfer between Genomes: Protein-Protein
Interologs and Protein-DNA Regulogs. Genome Research 14:1107
Yu HY, Luscombe NM, Lu HX, Zhu XW, Xia Y, Han JDJ, Bertin N, Chung S, Vidal M,
Gerstein M (2004b) Annotation Transfer between Genomes: Protein-Protein
Interologs and Protein-DNA Regulogs. Genome Res 14:1107
Zhang JZ (2003) Evolution by Gene Duplication: An Update. Trends in Ecology &
Evolution 18:292
Zheng Q, Wang XJ (2008) Goeast: A Web-Based Software Toolkit for Gene Ontology
Enrichment Analysis. Nucleic Acids Research 36:W358
Zhu H, Bilgin M, Bangham R, Hall D, Casamayor A, Bertone P, Lan N, Jansen R,
Bidlingmaier S, Houfek T, Mitchell T, Miller P, Dean RA, Gerstein M, Snyder M
(2001) Global Analysis of Protein Activities Using Proteome Chips. Science
293:2101
Zhu X, Gerstein M, Snyder M (2007) Getting Connected: Analysis and Principles of
Biological Networks. Genes Dev 21:1010
Appendix A
Appendix A Description of divergence of Java implementation of Inparanoid algorithm
from Perl implementation.
In order to examine the differences in output between the novel Java implementation of the
Inparanoid algorithm and the version (2.0) distributed by Remm et al. (Remm et al. 2001) the
following test was run.
Organism A: Saccharomyces cerevisiae.
Organism B: Encephalitozoon cuniculi.
BLASTP (version 2.2.18) was run with the FASTA-formatted file containing all proteins
for Saccharomyces cerevisiae as the query and the FASTA-formatted file containing all
proteins for Encephalitozoon cuniculi as the database. The program formatdb was used to
format each database for BLASTP.
The substitution matrix BLOSUM62 was used to score alignments. The converse search was
also run, with Encephalitozoon cuniculi as the query and Saccharomyces cerevisiae as the
database, and each organism was also searched against itself. The parameters -v and -b
were set to the number of proteins in the database file, and the parameter -z was set to
the theoretical maximum database size used by Remm et al. (2001) to maintain consistent
values for relevant statistics such as K and λ, described below.
The exact syntax of the commands is given below:
blastall -i Saccharomyces_cerevisiae -d Saccharomyces_cerevisiae -p blastp -v 5883 -b 5883 -F "m S" -M BLOSUM62 -z 5000000 -V
blastall -i Saccharomyces_cerevisiae -d Encephalitozoon_cuniculi -p blastp -v 1996 -b 1996 -F "m S" -M BLOSUM62 -z 5000000 -V
blastall -i Encephalitozoon_cuniculi -d Saccharomyces_cerevisiae -p blastp -v 5883 -b 5883 -F "m S" -M BLOSUM62 -z 5000000 -V
blastall -i Encephalitozoon_cuniculi -d Encephalitozoon_cuniculi -p blastp -v 1996 -b 1996 -F "m S" -M BLOSUM62 -z 5000000 -V
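The four runs listed above follow one pattern: every ordered (query, database) pair of the two proteomes, with -v and -b set to the size of the database. A minimal Python sketch of that pattern is given here for illustration only. The function name and the dictionary of proteome sizes are hypothetical helpers, not part of the Inparanoid package, and the sketch only builds the command strings without running BLAST.

```python
from itertools import product


def blastall_commands(proteome_sizes, max_db_size=5_000_000):
    """Build one blastall command string per ordered (query, database) pair.

    proteome_sizes maps a FASTA/database name to its number of proteins.
    """
    commands = []
    for query, database in product(proteome_sizes, repeat=2):
        n = proteome_sizes[database]  # -v and -b follow the database size
        commands.append(
            f"blastall -i {query} -d {database} -p blastp "
            f'-v {n} -b {n} -F "m S" -M BLOSUM62 -z {max_db_size} -V'
        )
    return commands


# Protein counts are the -v/-b values used in the commands above.
sizes = {"Saccharomyces_cerevisiae": 5883, "Encephalitozoon_cuniculi": 1996}
for command in blastall_commands(sizes):
    print(command)
```

Generating the commands from a single table of proteome sizes avoids mismatches between the database name and the -v/-b values when more organisms are added.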
The output from these commands was fed to the Perl script blast_parser.pl provided in the
Inparanoid package, which produces formatted output in the following order:
Protein Id1.
Protein Id2.
Bit Score.
E value.
Protein A Length.
Protein B Length.
Identity percentage.
Similarity percentage.
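As an illustration of how a record in this field order might be read downstream, the following Python sketch parses one line into a typed record. The field order is taken from the list above. The whitespace delimiter, the record and function names, and every value in the example line other than the two protein identifiers and the bit score are assumptions made for the example, not output from the test run.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One parsed BLAST hit, in the field order listed above."""
    protein_id1: str
    protein_id2: str
    bit_score: float
    e_value: float
    length_a: int
    length_b: int
    identity_pct: float
    similarity_pct: float


def parse_hit(line: str) -> Hit:
    """Parse one whitespace-separated line (delimiter is an assumption)."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, found {len(fields)}")
    return Hit(
        fields[0], fields[1],
        float(fields[2]), float(fields[3]),
        int(fields[4]), int(fields[5]),
        float(fields[6]), float(fields[7]),
    )


# Placeholder line: only the identifiers and the bit score come from the text.
example = "NP_015092 NP_586462 56.2 1e-09 350 280 30.0 48.0"
hit = parse_hit(example)
print(hit.protein_id1, hit.protein_id2, hit.bit_score)
```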
S' = \frac{\lambda S - \ln K}{\ln 2} \qquad (1)

where S' = bit score, S = raw score, K = constant associated with search space size and λ = constant.
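Equation (1) can be evaluated directly. The short Python sketch below is illustrative only: the function name is hypothetical, and the values of λ and K in the example are placeholders rather than the values reported by the BLAST runs described above.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Convert a raw alignment score S to a bit score S' using equation (1)."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Placeholder parameters, chosen only to show the call.
print(round(bit_score(100, lam=0.267, k=0.041), 1))
```

Because the bit score is a real number, any implementation that rounds it to an integer before comparing hits can treat two slightly different scores as equal.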
Reciprocal best hits as marked by Perl implementation but not by Java implementation.
Protein pair 1
Protein A: NP_015092 (Saccharomyces cerevisiae )
Protein B: NP_586462 (Encephalitozoon cuniculi)
A-B bit score = 56.2 (Rounded down to 56 by Perl). This is the best score in the A-B
direction.
B-A bit score = 55.5 (Rounded up to 56 by Perl).
Mean bit score = 55.85 (Rounded to 56 by Perl).
However, NP_586462 has another significant hit against Saccharomyces cerevisiae,
NP_013908, with a mean bit score of 56.2 (rounded down to 56 by Perl).
The Java implementation does not recognise NP_015092 and NP_586462 as reciprocal best
hits because 56.2 > 55.85.
Protein pair 2
Protein A: NP_014520 (Saccharomyces cerevisiae ).
Protein B: NP_586468 (Encephalitozoon cuniculi).
A-B Bit score = 61.6 (Rounded up to 62 by Perl).
B-A Bit score = 61.6 (Rounded up to 62 by Perl).
Mean bit score = 61.6 (Rounded up to 62 by Perl).
NP_586468 has another significant score against Saccharomyces cerevisiae, which is
NP_014097 with a mean bit score of 62.0.
The Java implementation does not recognise NP_014520 and NP_586468 as reciprocal best
hits as 62.0 > 61.6.
Protein pair 3
Protein A: NP_013648 (Saccharomyces cerevisiae ).
Protein B: NP_597364 (Encephalitozoon cuniculi).
A-B Bit score = 111.0
B-A Bit score = 112.0
Mean bit score = 111.5 (Rounded up to 112 by Perl).
NP_597364 has another significant score against Saccharomyces cerevisiae, which is
NP_013546 with a mean bit score of 112.0
The Java implementation does not recognise NP_013648 and NP_597364 as reciprocal best
hits as 112.0 > 111.5.
Protein pair 4
Protein A: NP_013182 (Saccharomyces cerevisiae ).
Protein B: NP_586039 (Encephalitozoon cuniculi).
A-B Bit score = 60.5
B-A Bit score = 60.5
Mean bit score= 60.5 (Rounded up to 61 by Perl).
NP_586039 has two other significant scores against Saccharomyces cerevisiae, which are
NP_010629 and NP_010630, which both have mean bit scores of 60.85.
The Java implementation does not recognise NP_013182 and NP_586039 as reciprocal best
hits as 60.85 > 60.5.
Protein pair 5
Protein A: NP_010407 (Saccharomyces cerevisiae).
Protein B: NP_597607 (Encephalitozoon cuniculi).
A-B Bit score = 246.0
B-A Bit score = 245.0
Mean bit score = 245.5 (Rounded up to 246 by Perl).
NP_597607 has another significant score against Saccharomyces cerevisiae, which is
NP_013197, which has a mean bit score of 246.
The Java implementation does not recognise NP_010407 and NP_597607 as reciprocal best
hits as 246 > 245.5.
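The five cases above share one cause: mean bit scores that differ as real numbers become equal once rounded to integers. The Python sketch below illustrates this with the mean scores quoted above. It is a simplification under stated assumptions, not a reproduction of either implementation: rounding is modelled as round-half-up, a tie is assumed to be accepted as a best hit, and the function names are hypothetical.

```python
import math


def round_half_up(x: float) -> int:
    """Round to the nearest integer, with halves rounded up (assumed behaviour)."""
    return math.floor(x + 0.5)


def is_best_exact(candidate: float, competitors: list) -> bool:
    """Best-hit test on unrounded mean bit scores."""
    return all(candidate >= c for c in competitors)


def is_best_rounded(candidate: float, competitors: list) -> bool:
    """Best-hit test after rounding every mean bit score to an integer."""
    return all(round_half_up(candidate) >= round_half_up(c) for c in competitors)


# (mean bit score of the pair, mean bit scores of the competing hits)
PAIRS = {
    "Protein pair 1": (55.85, [56.2]),
    "Protein pair 2": (61.6, [62.0]),
    "Protein pair 3": (111.5, [112.0]),
    "Protein pair 4": (60.5, [60.85, 60.85]),
    "Protein pair 5": (245.5, [246.0]),
}

for name, (mean, rivals) in PAIRS.items():
    print(name, "rounded:", is_best_rounded(mean, rivals),
          "exact:", is_best_exact(mean, rivals))
```

Under these assumptions every pair passes the rounded test and fails the exact one, which matches the pattern of pairs marked by the Perl implementation but not by the Java implementation.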
Differences in Cluster Output
A number of groups differ between the two implementations on this test data. This is
because the two implementations store different score values, which affects the criterion
for reciprocal best hits as well as the criteria for merging and deleting clusters. However,
the primary purpose of ortholog selection in this project, which is detecting the presence
or absence of proteins, is still achieved: the number of Saccharomyces cerevisiae proteins
found to be present in Encephalitozoon cuniculi was identical.
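The presence and absence criterion can be checked independently of how proteins are clustered. The Python sketch below is a hedged illustration: the function name is hypothetical, and the example uses a single merge reported in this appendix (two Perl groups that the Java implementation places in one group), with the assignment of identifiers to Saccharomyces cerevisiae supplied by hand.

```python
def present_in_other_species(groups, species_a_ids):
    """Return species-A proteins that share a group with a protein of the other species."""
    present = set()
    for group in groups:
        members = set(group)
        a_members = members & species_a_ids
        other_members = members - species_a_ids
        if a_members and other_members:
            present |= a_members
    return present


# Assumed species assignment for the example identifiers.
yeast_ids = {"NP_011424", "NP_012263"}

perl_groups = [["NP_011424", "NP_586425"], ["NP_012263", "NP_597203"]]
java_groups = [["NP_011424", "NP_586425", "NP_012263", "NP_597203"]]

perl_present = present_in_other_species(perl_groups, yeast_ids)
java_present = present_in_other_species(java_groups, yeast_ids)
print(perl_present == java_present)
```

Merging or splitting groups changes the cluster count but leaves this presence set unchanged, as long as each protein still shares a group with at least one protein from the other organism.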
Groups that differ between implementations.
There are 16 groups that differ between the two implementations. The Java
implementation produces 616 groups, while the Perl implementation produces 619.
Orthologous Group 1
Perl Inparanoid implementation
NP_009501
NP_586181
NP_014887
Java implementation clusters NP_009501 and NP_014887 with a separate protein
XP_955683.
Orthologous Group 2
Perl Inparanoid implementation
NP_011424 NP_586425
Orthologous Group 3
Perl Inparanoid implementation
NP_012263 NP_597203
Groups 2 and 3 are merged into one group by the Java implementation.
Orthologous Group 4
Perl Inparanoid implementation
NP_012610.
XP_955636
NP_010056.
NP_009928.
NP_010504.
Group 4 does not contain NP_009928 in output from the Java implementation.
Orthologous group 5
Perl Inparanoid implementation
NP_012710.
NP_597625
NP_014074.
Orthologous group 6
Perl Inparanoid implementation
NP_014293.
NP_597270
NP_014752.
NP_012264.
Groups 5 and 6 are merged into one group by the Java implementation.
Orthologous group 7
Perl Inparanoid implementation
NP_011573.
XP_965975
NP_011975.
NP_013418.
Orthologous group 7 has an additional paralog added in Saccharomyces cerevisiae by the
Java implementation: NP_009928.
Orthologous group 8
Perl Inparanoid implementation
NP_011651.
NP_597477
NP_597286
NP_014604.
Orthologous group 9 has an additional paralog added in Saccharomyces cerevisiae by the
Java implementation: NP_015007. This paralog replaces NP_014604.
Orthologous Group 10
Perl Inparanoid implementation
NP_014045.
NP_586473
NP_015007.
Java implementation clusters NP_014045 and NP_015007 with a separate protein
NP_597286.
Orthologous Group 11
Perl Inparanoid implementation
NP_012603.
NP_584802
NP_597429
Group 11 does not contain NP_597429 in output from the Java implementation.
Orthologous Group 12
Perl Inparanoid implementation
NP_010144.
NP_597320
NP_010089.
NP_586125
NP_014737.
Orthologous Group 13
Perl Inparanoid implementation
NP_009723.
NP_597558
NP_015274.
Orthologous groups 12 and 13 are merged into one group by the Java implementation.
Orthologous Group 14
Perl Inparanoid implementation
NP_009800
NP_584705
NP_010629
NP_586039
NP_010630
NP_013182
NP_011960
NP_012316
NP_014486
NP_010632
NP_011962
NP_012321
NP_011964
NP_013724
NP_116644
NP_010845
NP_014470
NP_010036
NP_012692
NP_011411
NP_014081
NP_010087
NP_010143
NP_010825
NP_014538
NP_010785
NP_010675
NP_116613
NP_011805
NP_010034
NP_012694
NP_009857
NP_010082
Orthologous group 14 has an additional paralog added in Saccharomyces cerevisiae by the
Perl implementation: NP_010082.
Orthologous Group 15
Perl Inparanoid implementation
NP_012710.
NP_597625
NP_014074.
Orthologous Group 16
Perl Inparanoid implementation
NP_014293.
NP_597270
NP_014752.
NP_012264.
Orthologous groups 15 and 16 are merged into one group by the Java implementation.
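A listing of differing groups such as the one above can be produced by comparing the two outputs as sets of member sets. The Python sketch below is a minimal illustration with a hypothetical function name. It uses three Perl groups from this appendix (groups 2, 3 and 11) together with the Java groupings described for them, not the full output of either implementation.

```python
def differing_groups(groups_a, groups_b):
    """Return (groups only in A, groups only in B), compared by exact membership."""
    set_a = {frozenset(group) for group in groups_a}
    set_b = {frozenset(group) for group in groups_b}
    return set_a - set_b, set_b - set_a


perl_groups = [
    ["NP_011424", "NP_586425"],               # group 2
    ["NP_012263", "NP_597203"],               # group 3
    ["NP_012603", "NP_584802", "NP_597429"],  # group 11
]
java_groups = [
    # groups 2 and 3 merged into one group
    ["NP_011424", "NP_586425", "NP_012263", "NP_597203"],
    # group 11 without NP_597429
    ["NP_012603", "NP_584802"],
]

only_perl, only_java = differing_groups(perl_groups, java_groups)
print(len(only_perl), len(only_java))
```

Comparing by exact membership counts a merge as several differing groups on one side and one on the other, which is one reason the total number of groups can differ between implementations even when few proteins are affected.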
Appendix B
Appendix B Individual Gene trees for genes in super matrix utilised in construction of
Phylogeny
Appendix C
Appendix C: Predictions made by constrained ML
Protein 1
Protein 2
Description
Description
PREDICTED:
apolipoprotein A-I binding protein
91984773
precursor
89042891
sigma 2 subunit
PREDICTED: similar to peptidylprolyl
23110944
113429091
isomerase A isoform 1
meiotic recombination protein SPO11
23110944
38201680
23110944
113414586
isoform b
PREDICTED: similar to CG17293-PA
PREDICTED: similar to Ubiquitin-63E
11024714
ubiquitin B precursor
113423966
11024714
ubiquitin B precursor
5454144
7705785
113414586
CG11624-PA, isoform A
ubiquitin D
PREDICTED: similar to CG17293-PA
PREDICTED: similar to peptidylprolyl
7705785
113429091
isomerase A isoform 1
meiotic recombination protein SPO11
7705785
38201680
isoform b
4557896
myotubularin
41350318
2 isoform a
51467029
protein S26
2 isoform a
4557896
myotubularin
7705477
28872761
myotubularin-related protein 1
2 isoform a
2 isoform a
38201710
transcription elongation factor A protein
4507385
2 isoform a
4557896
myotubularin
2 isoform a
4507129
E
RNA, U3 small nucleolar interacting protein
2 isoform a
4759276
2 isoform a
116812591
2 isoform a
4557719
DNA ligase I
2 isoform a
56549681
4557896
myotubularin
18491016
exonuclease 1 isoform b
2 isoform a
4758922
biosynthesis, class L
4506233
7019319
2 isoform a
transcription elongation factor A protein
4507385
2 isoform a
transcription elongation factor A protein
4507385
2 isoform a
10863925
polypeptide L
dehydrodolichyl diphosphate synthase
2 isoform a
45580738
isoform b
2 isoform a
4557896
myotubularin
150170706
19923424
2 isoform a
8923942
40254869
2 isoform a
transcription elongation factor A protein
4507385
2 isoform a
41327715
2 isoform a
4507311
2 isoform a
suppressor of Ty 4 homolog 1
PREDICTED: similar to large subunit
113427044
2 isoform a
4506651
4557896
myotubularin
113429091
isomerase A isoform 1
4557896
myotubularin
113414586
2 isoform a
4505947
phosphoribosyl pyrophosphate
4506127
polypeptide G
phosphoribosyl pyrophosphate synthetase 1-
synthetase 1
28557709
like 1
PREDICTED: similar to adaptor-related
153791910
89042891
4506541
4557719
38348232
89042891
homolog
113414586
homolog
(SIG-20)
PREDICTED: similar to 60S ribosomal
homolog
(SIG-20)
homolog
isomerase A isoform 1
PREDICTED: similar to adaptor-related
38348232
89041736
40254869
homolog
38201680
isoform b
13775200
SF3b10
62909985
13775200
SF3b10
4505331
13775200
SF3b10
113429091
isomerase A isoform 1
meiotic recombination protein SPO11
13775200
SF3b10
38201680
isoform b
PREDICTED: similar to adaptor-related
13775200
SF3b10
89042891
13775200
SF3b10
4502743
13775200
SF3b10
111494251
precursor
ras homolog gene family, member C
13775200
SF3b10
111494248
precursor
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
13775200
SF3b10
113431146
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
13775200
SF3b10
113418826
(SIG-20)
13775200
SF3b10
113414586
PSF2
isomerase A isoform 1
PREDICTED: similar to adaptor-related
89042891
7662482
transmembrane protein 15
4557719
DNA ligase I
meiotic recombination protein SPO11
4826675
cyclin-dependent kinase 5
38201680
isoform b
PREDICTED: similar to peptidylprolyl
4826675
cyclin-dependent kinase 5
113429091
4826675
cyclin-dependent kinase 5
4557719
4507213
isomerase A isoform 1
DNA ligase I
113414586
113414586
cyclin-dependent kinase 5
89042891
4826675
cyclin-dependent kinase 5
89041736
35493987
113429091
N-ethylmaleimide-sensitive factor
44917606
isomerase A isoform 1
PREDICTED: similar to adaptor-related
89042891
N-ethylmaleimide-sensitive factor
44917606
4557719
DNA ligase I
PREDICTED: similar to adaptor-related
16945972
89041736
N-ethylmaleimide-sensitive factor
44917606
35493987
N-ethylmaleimide-sensitive factor
4505331
113414586
N-ethylmaleimide-sensitive factor
44917606
89041736
16945972
89042891
35493987
38201680
isoform b
16945972
7019405
133925811
transportin 1 isoform 1
89042891
133925811
transportin 1 isoform 1
23510381
transportin 1 isoform 2
precursor
38201680
precursor
38201680
isoform b
isoform b
PREDICTED: similar to adaptor-related
precursor
89042891
precursor
4557719
DNA ligase I
precursor
118600973
118600973
precursor
ras homolog gene family, member C
111494251
precursor
89041736
119943098
dihydropyrimidine dehydrogenase
113429091
precursor
isomerase A isoform 1
ras homolog gene family, member C
111494248
precursor
precursor
4506717
71772583
71772583
47717139
precursor
ras homolog gene family, member C
111494251
precursor
ras homolog gene family, member C
111494251
precursor
ras homolog gene family, member C
111494248
precursor
47717139
precursor
45580738
isoform b
dehydrodolichyl diphosphate synthase
precursor
45580738
isoform b
38201710
38201710
56549681
56549681
14249398
PHD-finger 5A
14249398
PHD-finger 5A
precursor
ras homolog gene family, member C
111494248
precursor
ras homolog gene family, member C
111494248
precursor
ras homolog gene family, member C
111494251
precursor
ras homolog gene family, member C
111494248
precursor
ras homolog gene family, member C
111494251
precursor
ras homolog gene family, member C
111494251
precursor
10863925
polypeptide L
PREDICTED: similar to adaptor-related
precursor
89042891
precursor
4557719
DNA ligase I
PREDICTED: similar to adaptor-related
precursor
89041736
precursor
4502859
4506717
precursor
glucose-6-phosphate dehydrogenase
109389365
isoform a
89042891
glucose-6-phosphate dehydrogenase
109389365
isoform a
89041736
precursor
10863925
polypeptide L
meiotic recombination protein SPO11
51173724
bystin
38201680
isoform b
PREDICTED: similar to peptidylprolyl
51173724
bystin
113429091
isomerase A isoform 1
51173724
bystin
113414586
41406094
13236516
41281768
89042891
41349495
38201680
isoform b
41406094
11141871
41349495
113414586
41281768
4503183
41349495
113429091
isomerase A isoform 1
PREDICTED: similar to adaptor-related
41281768
89041736
41406094
31455614
41406094
4557719
35493996
113414586
35494003
113414586
DNA ligase I
113429091
isomerase A isoform 1
PREDICTED: similar to peptidylprolyl
35493996
113429091
isomerase A isoform 1
meiotic recombination protein SPO11
35493996
38201680
isoform b
meiotic recombination protein SPO11
35494003
38201680
isoform b
meiotic recombination protein SPO11
31581534
tRNA isopentenyltransferase 1
38201680
isoform b
isoform 3
4557719
DNA ligase I
31543831
tubulin, gamma 1
6996005
isoform 3
6996005
31543831
tubulin, gamma 1
4557719
DNA ligase I
PREDICTED: similar to peptidylprolyl
31581534
tRNA isopentenyltransferase 1
29826282
protein phosphatase 1G
31581534
tRNA isopentenyltransferase 1
113429091
4505999
113414586
89042891
19913408
protein phosphatase 1G
PREDICTED: similar to CG17293-PA
PREDICTED: similar to adaptor-related
isoform 3
30581111
isomerase A isoform 1
89041736
113414586
19913408
113429091
isomerase A isoform 1
meiotic recombination protein SPO11
19913408
38201680
isoform b
PREDICTED: similar to adaptor-related
13236516
89041736
PREDICTED: similar to peptidylprolyl
13236516
113429091
isomerase A isoform 1
13236516
113414586
13236516
38201680
isoform b
PREDICTED: similar to adaptor-related
13236516
89042891
10835049
89042891
member 3
113429091
isomerase A isoform 1
PREDICTED: similar to adaptor-related
89041736
5729877
89042891
isoform 1
4557719
isoform 1
89042891
isoform 1
DNA ligase I
113429091
isomerase A isoform 1
isoform 1
71772583
4506717
isoform 1
proteasome 26S ATPase subunit 4
5729991
isoform 1
10863925
polypeptide L
PREDICTED: similar to adaptor-related
5729877
5729991
89041736
8923942
isoform 1
PREDICTED: similar to adaptor-related
6005764
89042891
isoform 1
89041736
isoform 1
45580738
isoform b
56549681
isoform 1
ATP-binding cassette, sub-family C
4557481
(CFTR/MRP), member 2
113429091
isomerase A isoform 1
PREDICTED: similar to peptidylprolyl
4507785
113429091
isomerase A isoform 1
4507785
113414586
4507785
38201680
113429091
isoform b
isomerase A isoform 1
PREDICTED: similar to Ubiquitin-like
113422449
protein FUBI
PREDICTED: similar to adaptor-related
4504277
89042891
4503183
89041736
4503183
89042891
153252132
4506005
isoform 1
153251913
153252132
113414586
33286434
116256336
33286434
6996005
30520314
89042891
7657339
113414586
151101384
151101386
73622130
7662010
BolA-like protein 2
85797673
bolA-like protein 2B
10190686
19718751
89041601
149944735
15011936
149944735
88980535
protein S26
PREDICTED: similar to 40S ribosomal
149944735
88982349
protein S26
PREDICTED: similar to 40S ribosomal
149944735
113420084
protein S26
PREDICTED: similar to 40S ribosomal
149944735
89025350
149944735
113430282
protein S26
PREDICTED: similar to 40S ribosomal
149944735
88987217
protein S26
PREDICTED: similar to 40S ribosomal
149944735
150010661
SEC14-like 5
113429703
89042891
protein S26
protein complex 1 sigma 2 subunit
PREDICTED: similar to peptidylprolyl
150170706
113429091
isomerase A isoform 1
meiotic recombination protein SPO11
38201710
28626498
38201680
4557719
isoform b
DNA ligase I
PREDICTED: similar to adaptor-related
47778943
syntaxin 16 isoform a
89042891
38201710
89041736
38201710
4758496
38201710
113414586
38201710
113418826
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
38201710
113431146
(SIG-20)
PREDICTED: similar to peptidylprolyl
38201710
113429091
isomerase A isoform 1
45580738
isoform b
8923942
47717139
4557719
complementation group 2 protein
excision repair cross-complementing
rodent repair deficiency,
15834617
10863925
polypeptide L
89042891
89041736
113429091
isomerase A isoform 1
38201680
assembly protein
89042891
isoform b
assembly protein
89041736
4503719
148727247
89042891
8923942
148727247
45580738
alpha isoform 2
148596961
isoform b
PREDICTED: similar to adaptor-related
89042891
148596938
148536853
alpha isoform 2
113429091
145275210
113418826
isomerase A isoform 1
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
145275210
113431146
(SIG-20)
145275210
113414586
145275187
tRNA-(N1G37) methyltransferase
113429091
isomerase A isoform 1
PREDICTED: similar to peptidylprolyl
145275210
113429091
isomerase A isoform 1
meiotic recombination protein SPO11
145275210
38201680
126723390
121582655
124256496
34419635
isoform b
ankyrin repeat domain 35
heat shock 70kDa protein 6 (HSP70B')
PREDICTED: similar to adaptor-related
126723390
89041736
(putative)
89042891
121582655
89041736
118600973
113418826
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
118600973
113431146
(SIG-20)
isoform 1
13129120
isoform 1
118498359
38201680
113414586
isoform b
PREDICTED: similar to CG17293-PA
PREDICTED: similar to peptidylprolyl
118600973
113429091
isomerase A isoform 1
113414586
isoform 1
RER1 retention in endoplasmic
116812591
reticulum 1
62909985
reticulum 1
113414586
reticulum 1
38201680
reticulum 1
isoform b
PREDICTED: similar to peptidylprolyl
113429091
isomerase A isoform 1
PREDICTED: similar to 60S ribosomal
reticulum 1
(SIG-20)
PREDICTED: similar to 60S ribosomal
reticulum 1
member 3
89042891
member 3
116256336
(SIG-20)
89041736
110347439
116256336
4506005
115387112
ubiquitin-like 5
13236510
115387112
ubiquitin-like 5
113414586
isoform 1
ubiquitin-like 5
PREDICTED: similar to CG17293-PA
PREDICTED: similar to adaptor-related
FGR
89042891
89042891
FGR
112382377
113430896
isoform a
89042891
isoform a
FGR
89041736
FGR
113429091
113429091
containing 1 isoform 1
89041736
89042891
110347439
110347439
containing 1 isoform 1
110349738
isomerase A isoform 1
PREDICTED: similar to adaptor-related
isomerase A isoform 1
PREDICTED: similar to peptidylprolyl
(E2-EPF5)
89042891
6996005
10190696
110347439
isoform 2
89042891
110347439
113413881
110347439
10190686
110349799
89041736
114
zinc finger protein 286
protein complex 1 sigma 2 subunit
109452595
109452593
109255245
113414586
109255245
6996005
member 1
89042891
member 1
89042891
95147356
89042891
95147356
89041736
89041736
89042891
93141204
methyltransferase like 2B
93004102
89041736
113414586
89145417
methyltransferase like 7A
89042891
5B
isomerase A isoform 1
nucleolar protein family A, member 2
isoform b
8923444
isoform a
PREDICTED: similar to adaptor-related
77812670
89042891
77812670
89041736
myosin head domain containing 1
75812980
isoform 3
isoform 3
89042891
homolog
isoform 1
89042891
113429091
isomerase A isoform 1
isomerase A isoform 1
meiotic recombination protein SPO11
isoform 1
38201680
isoform b
isoform 1
71772583
4758496
71772583
32130516
113414586
71772583
113431146
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
71772583
113418826
(SIG-20)
71772583
62909985
71772583
113414586
71772583
19718751
8393719
peroxisomal enoyl-coenzyme A
(putative)
PREDICTED: similar to adaptor-related
70995211
hydratase-like protein
71772583
4506717
71772583
38016127
89042891
71772583
4502743
cyclin-dependent kinase 7
71772583
4557719
DNA ligase I
meiotic recombination protein SPO11
71772583
38201680
isoform b
PREDICTED: similar to peptidylprolyl
71772583
113429091
isomerase A isoform 1
68509270
113414586
68303635
mutS homolog 3
89042891
68226422
32401427
62955833
89042891
antigen 10
89042891
64276486
antigen 10
89041736
62955833
48717485
62865890
113429091
isomerase A isoform 1
PREDICTED: similar to adaptor-related
62460637
importin 4
89041736
62865890
45580738
62865890
4557719
isoform b
DNA ligase I
ubiquitin-conjugating enzyme E2D 4
62865890
8393719
(putative)
PREDICTED: similar to adaptor-related
62865890
89042891
62865890
8923942
PREDICTED: similar to adaptor-related
62865890
89041736
62865890
56549681
62240994
62240992
62234438
41350318
62234438
44680154
62234461
41350318
62234461
62234438
62234438
4502703
62234438
21536371
telomerase-associated protein 1
62234461
44680154
58533179
7657548
60279265
38201680
isoform b
meiotic recombination protein SPO11
58533179
60279265
38201680
7657546
isoform b
Sec61 gamma subunit
PREDICTED: similar to adaptor-related
60279265
89042891
58533179
113429091
isomerase A isoform 1
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
58533179
113431146
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
58533179
113418826
(SIG-20)
58533179
113414586
57165436
serine/threonine kinase 16
57165434
serine/threonine kinase 16
PREDICTED: similar to peptidylprolyl
56549681
88943062
56549681
113423887
isomerase A isoform 1
PREDICTED: similar to adaptor-related
56549681
89041736
56549681
31543091
56549681
22035624
phosphatidate cytidylyltransferase 1
PREDICTED: similar to adaptor-related
56549683
89042891
56549681
4502743
56549681
113418826
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
56549681
113431146
(SIG-20)
PREDICTED: similar to peptidylprolyl
56549681
113422777
isomerase A isoform 1
PREDICTED: similar to peptidylprolyl
56549681
89042897
isomerase A isoform 1
56549681
38016127
56549681
38201680
56549681
5729840
isoform b
tubulin, gamma complex associated protein 2
ubiquitin-conjugating enzyme E2D 4
56549681
8393719
(putative)
PREDICTED: similar to peptidylprolyl
56549681
88943041
PREDICTED: similar to peptidylprolyl
56549681
88953813
isomerase A isoform 1
PREDICTED: similar to TBC1 domain
family member 3 (Rab GTPase-activating
protein PRC17) (Prostate cancer gene 17
56549681
113426831
56549681
4557719
subunit 4 isoform b
56550057
isoform a
PREDICTED: similar to adaptor-related
56699411
89042891
56549683
89041736
56549681
113429091
isomerase A isoform 1
PREDICTED: similar to adaptor-related
56549681
89042891
56549681
4758496
56549681
6912680
isoform a
PREDICTED: similar to adaptor-related
56699411
89041736
56549681
62909985
55956895
89041736
56549113
89042891
56118223
choline/ethanolaminephosphotransferase
5174415
56549113
89041736
SMT3 suppressor of mif two 3 homolog
54792071
2 isoform b precursor
54792069
isoform a precursor
oxoglutarate (alpha-ketoglutarate)
dehydrogenase (lipoamide) isoform 1
51873036
precursor
51944950
phosducin-like 2
member 12
41054844
5-phosphatase, A isoform 2
89041736
chromosomes 4-like 1
89042891
chromosomes 4-like 1
89042891
5-phosphatase, A isoform 2
89042891
5-phosphatase, A isoform 2
chromosomes 4-like 1
18765707
chromosomes 4-like 1
113429091
113429091
subunit 11 isoform 2
50409781
subunit 11 isoform 2
18777675
11 isoform 2
APC11 anaphase promoting complex subunit
subunit 11 isoform 2
50409750
11 isoform 2
APC11 anaphase promoting complex subunit
isomerase A isoform 1
APC11 anaphase promoting complex subunit
isomerase A isoform 1
PREDICTED: similar to peptidylprolyl
phosphatase isoform 2
PREDICTED: similar to peptidylprolyl
member 12
PREDICTED: similar to adaptor-related
11 isoform 2
APC11 anaphase promoting complex subunit
subunit 11 isoform 2
50409750
11 isoform 2
APC11 anaphase promoting complex
50409796
subunit 11 isoform 2
50409789
subunit 11 isoform 2
50409750
subunit 11 isoform 2
50409789
subunit 11 isoform 2
18777675
subunit 11 isoform 2
50409781
subunit 11 isoform 2
50409796
subunit 11 isoform 2
18777675
subunit 11 isoform 2
50409781
11 isoform 2
PREDICTED: similar to adaptor-related
polypeptide A'
89042891
11 isoform 2
APC11 anaphase promoting complex subunit
11 isoform 2
APC11 anaphase promoting complex subunit
11 isoform 2
APC11 anaphase promoting complex subunit
11 isoform 2
APC11 anaphase promoting complex subunit
11 isoform 2
APC11 anaphase promoting complex subunit
11 isoform 2
APC11 anaphase promoting complex subunit
11 isoform 2
3, polypeptide A2
89042891
50083277
89042891
subunit 11 isoform 2
50409750
subunit 11 isoform 2
18777675
11 isoform 2
APC11 anaphase promoting complex subunit
subunit 11 isoform 2
18777675
11 isoform 2
APC11 anaphase promoting complex subunit
11 isoform 2
PREDICTED: similar to adaptor-related
3, polypeptide A2
89041736
DDI1, DNA-damage inducible 1,
48717485
homolog 1
89042891
homolog 1
89041736
6996005
leucine-zipper-like transcription
47717139
regulator 1
leucine-zipper-like transcription
47717139
regulator 1
38201680
isoform b
leucine-zipper-like transcription
47717139
regulator 1
4758496
4557719
DNA ligase I
leucine-zipper-like transcription
47717139
regulator 1
solute carrier family 25 member 3
47132595
isoform b precursor
45580738
isoform b
isoform b
113414586
isoform b
21536371
telomerase-associated protein 1
23397458
isoform b
dehydrodolichyl diphosphate synthase
45580742
isoform a
45580738
isoform b
isoform a
113414586
isoform b
isomerase A isoform 1
PREDICTED: similar to peptidylprolyl
isoform b
88943062
isoform b
89042897
isomerase A isoform 1
dehydrodolichyl diphosphate synthase
45580738
isoform b
isomerase A isoform 1
isoform b
5729840
isoform b
38201680
isoform b
isoform b
4506707
isoform b
(SIG-20)
PREDICTED: similar to 60S ribosomal
isoform b
(SIG-20)
isoform b
41872631
38016127
isoform b
dehydrodolichyl diphosphate synthase
45580738
isoform b
4505775
isoform a
precursor
isomerase A isoform 1
PREDICTED: similar to adaptor-related
isoform b
89042891
32130516
isoform b
dehydrodolichyl diphosphate synthase
45580738
isoform b
isomerase A isoform 1
isoform b
62909985
113426831
isoform b
beta isoform 1
4506005
isoform 1
isoform b
17978477
isoform b
4557719
DNA ligase I
meiotic recombination protein SPO11
isoform a
38201680
isoform b
isoform b
4758496
4502743
cyclin-dependent kinase 7
isoform b
dehydrodolichyl diphosphate synthase
45580738
isoform b
42516576
WW domain-containing oxidoreductase
isoform b
7706523
glutaredoxin 5
isoform 1
WW domain-containing oxidoreductase
isoform b
18860884
isoform 2
isoform b
7705369
isoform a
(SIG-20)
PREDICTED: similar to 60S ribosomal
isoform a
(SIG-20)
isoform b
31455614
23510381
transportin 1 isoform 2
isoform b
dehydrodolichyl diphosphate synthase
45580738
isoform b
88953813
isomerase A isoform 1
PREDICTED: similar to peptidylprolyl
isoform b
88943041
45238849
89041736
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
113418826
(SIG-20)
113414586
1
myotubularin-related protein 2 isoform
44680154
isomerase A isoform 1
41350318
19923424
myotubularin-related protein 9
21536371
telomerase-associated protein 1
1
myotubularin-related protein 2 isoform
44680154
1
transcription elongation factor A 1
45439355
isoform 2
113414586
45238849
89042891
28872761
myotubularin-related protein 1
1
acyl-CoA synthetase long-chain family
42794754
member 3
42794752
member 3
PREDICTED: similar to adaptor-related
42516563
UDP-glucuronate decarboxylase 1
89041736
41872631
89042891
42516563
UDP-glucuronate decarboxylase 1
89042891
42516576
glutaredoxin 5
89042891
41872631
89041736
19923424
myotubularin-related protein 9
2
myotubularin-related protein 2 isoform
41350316
isomerase A isoform 1
PREDICTED: similar to 60S ribosomal
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
113431146
(SIG-20)
PREDICTED: similar to peptidylprolyl
113429091
isomerase A isoform 1
21536371
telomerase-associated protein 1
28872761
myotubularin-related protein 1
2
myotubularin-related protein 2 isoform
41350318
113414586
41327715
113429091
41327715
89042891
isomerase A isoform 1
protein complex 1 sigma 2 subunit
PREDICTED: similar to adaptor-related
41349441
41327715
89042891
113414586
41349441
89041736
ubiquitin-conjugating enzyme E2
40806167
variant 1 isoform a
113429091
isomerase A isoform 1
113414586
ubiquitin-conjugating enzyme E2
40806167
variant 1 isoform a
ubiquitin-conjugating enzyme E2
40806167
variant 1 isoform a
38201680
isoform b
domain containing 9
113414586
domain containing 9
(SIG-20)
PREDICTED: similar to 60S ribosomal
domain containing 9
(SIG-20)
PREDICTED: similar to peptidylprolyl
39725636
domain containing 9
113429091
isomerase A isoform 1
38708309
113428755
38327644
89041736
38327644
89042891
38327644
62909985
isoform b
29742309
L31
meiotic recombination protein SPO11
38201680
isoform b
L31
isoform b
7706343
4504221
guanylate kinase 1
4506193
7657198
dimethyladenosine transferase
4507797
4506699
4506643
isoform b
meiotic recombination protein SPO11
38201680
isoform b
meiotic recombination protein SPO11
38201680
isoform b
meiotic recombination protein SPO11
38201680
isoform b
meiotic recombination protein SPO11
38201680
isoform b
meiotic recombination protein SPO11
38201680
isoform b
meiotic recombination protein SPO11
38201680
isoform b
13129120
isoform b
7706423
4557719
DNA ligase I
isoform b
meiotic recombination protein SPO11
38201680
isoform b
10863925
polypeptide L
14249398
PHD-finger 5A
62909985
7705477
isoform b
meiotic recombination protein SPO11
38201680
isoform b
meiotic recombination protein SPO11
38201680
isoform b
meiotic recombination protein SPO11
38201680
isoform b
11321585
subunit
SWI/SNF-related matrix-associated actin-
isoform b
21071060
isoform b
4506631
isoform b
15150809
SEC11-like 3
isoform b
7657546
8923475
thioredoxin-like 4B
isoform b
meiotic recombination protein SPO11
38201680
isoform b
isoform b
4758384
8922905
RIO kinase 2
4506701
isoform b
meiotic recombination protein SPO11
38201680
isoform b
meiotic recombination protein SPO11
38201680
isoform b
18105063
isoform b
(SIG-20)
PREDICTED: similar to 60S ribosomal
isoform b
(SIG-20)
isoform b
8923942
4502643
38201680
isoform b
isoform a
isoform b
4758922
biosynthesis, class L
isoform b
15431295
15431297
isoform b
meiotic recombination protein SPO11
38201680
isoform b
isomerase A isoform 1
isoform b
7706667
4507311
suppressor of Ty 4 homolog 1
isoform b
isoform b
protein HIP)
isoform b
32189369
isoform b
4503729
FK506-binding protein 4
PREDICTED: similar to 40S ribosomal
isoform b
89035017
isoform b
4507873
isoform b
4507129
E
solute carrier family 2 (facilitated glucose
isoform b
8923733
transporter), member 6
isoform b
113414586
meiotic recombination protein SPO11
38201680
isoform b
4502859
4506609
isoform b
isoform b
32813443
isoform b
phosphatase 1 isoform 2
L31
meiotic recombination protein SPO11
isoform b
6912680
isoform a
7657548
isoform b
meiotic recombination protein SPO11
38201680
isoform b
isoform b
51467029
protein S26
10864021
isoform b
meiotic recombination protein SPO11
38201680
isoform b
isoform b
subunit, 58kDa
PREDICTED: similar to DNA primase large
113418086
subunit, 58kDa
isoform b
4506717
polypeptide B''
89042891
isoform b
4502743
cyclin-dependent kinase 7
5729840
isoform b
meiotic recombination protein SPO11
38201680
isoform b
4504523
isoform b
113419590
protein S28
isoform b
4506651
isoform b
113427044
isoform b
23397458
isoform b
4759276
34147513
2
PREDICTED: similar to adaptor-related
89042891
113414586
89041736
glycerophosphodiester
32698962
4557719
DNA ligase I
isoform 3
32967276
32967280
isoform 3
serologically defined colon cancer
32130516
antigen 1
isomerase A isoform 1
PREDICTED: similar to peptidylprolyl
31542547
dullard homolog
113429091
isomerase A isoform 1
PREDICTED: similar to peptidylprolyl
31542507
32130516
113429091
10863925
isomerase A isoform 1
32130516
antigen 1
polypeptide L
antigen 1
18765707
phosphatase isoform 2
PREDICTED: similar to adaptor-related
31542547
dullard homolog
89042891
antigen 1
29553970
30425538
4506717
63029935
4557719
DNA ligase I
PREDICTED: similar to adaptor-related
31455614
89042891
31455614
89041736
30410779
89041736
29553970
89041736
29553970
63029943
28872761
myotubularin-related protein 1
113429091
28872761
myotubularin-related protein 1
19923424
isomerase A isoform 1
myotubularin-related protein 9
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
28872761
myotubularin-related protein 1
113418826
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
28872761
myotubularin-related protein 1
113431146
(SIG-20)
PREDICTED: similar to adaptor-related
29553970
28827774
89042891
89042891
dual-specificity tyrosine-(Y)-
phosphorylation regulated kinase 4
28872761
myotubularin-related protein 1
89042891
isoform 2
24430186
24371241
4505795
isoform 1
phosphatidylinositol glycan, class C
PREDICTED: similar to adaptor-related
24430186
89042891
23510381
transportin 1 isoform 2
8923942
23397458
8923942
23397458
10863925
polypeptide L
PREDICTED: similar to adaptor-related
23510381
transportin 1 isoform 2
89042891
23397458
23199991
89042891
4503093
22202633
113429091
isomerase A isoform 1
PREDICTED: similar to peptidylprolyl
22035624
phosphatidate cytidylyltransferase 1
113429091
isomerase A isoform 1
PREDICTED: similar to peptidylprolyl
21624654
spermatogenesis associated 5
113429091
isomerase A isoform 1
PREDICTED: similar to adaptor-related
21362110
89041736
21624654
spermatogenesis associated 5
89042891
21362110
89042891
protein complex 1 sigma 2 subunit
PREDICTED: similar to peptidylprolyl
21450653
113429091
isomerase A isoform 1
21361144
113414586
21361376
89042891
21361144
113429091
isomerase A isoform 1
precursor
4758304
20270343
89042891
SWI/SNF-related matrix-associated
actin-dependent regulator of chromatin
21071060
a-like 1
4502743
cyclin-dependent kinase 7
SWI/SNF-related matrix-associated
actin-dependent regulator of chromatin
21071060
a-like 1
113414586
SWI/SNF-related matrix-associated
actin-dependent regulator of chromatin
21071060
a-like 1
isomerase A isoform 1
PREDICTED: similar to peptidylprolyl
19718751
113429091
isomerase A isoform 1
DNA directed RNA polymerase II
19718751
10863925
polypeptide L
skeletal muscle and kidney enriched inositol
19718751
18765707
19718751
4506717
phosphatase isoform 2
ribosomal protein S29 isoform 1
PREDICTED: similar to adaptor-related
18860916
5'-3' exoribonuclease 2
89042891
PREDICTED: similar to peptidylprolyl
19913428
vacuolar H+ATPase B2
113429091
isomerase A isoform 1
autophagy-related cysteine endopeptidase 2
19718751
30795252
isoform a
autophagy-related cysteine endopeptidase 2
19718751
19923424
myotubularin-related protein 9
19718751
30795248
113414586
8923942
isoform b
PREDICTED: similar to CG17293-PA
nucleolar protein family A, member 3
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
18105063
113418826
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
18105063
18491016
exonuclease 1 isoform b
18105063
113431146
4557719
113414586
(SIG-20)
DNA ligase I
PREDICTED: similar to CG17293-PA
PREDICTED: similar to peptidylprolyl
18105063
113429091
isomerase A isoform 1
PREDICTED: similar to adaptor-related
89042891
17978519
17978477
89042891
4502859
17978519
113429091
isomerase A isoform 1
PREDICTED: similar to 40S ribosomal
15011936
88980535
protein S26
PREDICTED: similar to 40S ribosomal
15011936
15150809
SEC11-like 3
89041601
113414586
PREDICTED: similar to 40S ribosomal
15011936
88982349
protein S26
PREDICTED: similar to 40S ribosomal
15011936
113420084
protein S26
PREDICTED: similar to peptidylprolyl
15150809
SEC11-like 3
113429091
isomerase A isoform 1
PREDICTED: similar to 40S ribosomal
15011936
89025350
15011936
113430282
protein S26
PREDICTED: similar to 40S ribosomal
15011936
88987217
protein S26
PREDICTED: similar to 40S ribosomal
15011936
14249398
PHD-finger 5A
113429703
4758496
protein S26
H2A histone family, member Y isoform 2
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
14249398
PHD-finger 5A
113431146
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
14249398
PHD-finger 5A
113418826
(SIG-20)
PREDICTED: similar to peptidylprolyl
14249398
PHD-finger 5A
113429091
isomerase A isoform 1
protease-like
113414586
14249398
PHD-finger 5A
113414586
14149696
SEC31 homolog B
113429091
isomerase A isoform 1
13236510
ubiquitin-like 5
113414586
11863130
113429091
10863925
isoform 1
isomerase A isoform 1
polypeptide L
89042891
10864021
113429091
isomerase A isoform 1
ubiquitin-conjugating enzyme E2D 4
polypeptide L
8393719
(putative)
4502743
cyclin-dependent kinase 7
4557719
DNA ligase I
polypeptide L
DNA directed RNA polymerase II
10863925
polypeptide L
DNA directed RNA polymerase II
10863925
polypeptide L
38016127
62909985
polypeptide L
DNA directed RNA polymerase II
10863925
polypeptide L
6912680
isoform a
4758496
polypeptide L
DNA directed RNA polymerase II
10863925
polypeptide L
isomerase A isoform 1
polypeptide L
5729840
polypeptide L
10864021
89041736
113414586
polypeptide L
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
polypeptide L
(SIG-20)
polypeptide L
113414586
8923942
4506005
isoform 1
8923942
4502743
cyclin-dependent kinase 7
8923942
4557719
DNA ligase I
8923942
4758496
8923942
38016127
8923942
113426831
8923942
113414586
8923942
113431146
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
8923942
113418826
(SIG-20)
PREDICTED: similar to adaptor-related
8923942
89042891
8923942
89041736
HARP11
89042891
8923942
113429091
isomerase A isoform 1
PREDICTED: similar to adaptor-related
HARP11
89041736
PREDICTED: similar to adaptor-related
7706326
89042891
7706753
89042891
7706667
89041736
7706657
89041736
7706667
113429091
isomerase A isoform 1
PREDICTED: similar to adaptor-related
7706326
89041736
member 11 precursor
89042891
7706657
89042891
7706667
7705483
89042891
113414586
7705483
89041736
7705483
7657198
dimethyladenosine transferase
89042891
113414586
7657522
89042891
7657198
dimethyladenosine transferase
113429091
isomerase A isoform 1
PREDICTED: similar to peptidylprolyl
7657548
113429091
isomerase A isoform 1
PREDICTED: similar to adaptor-related
7657546
89042891
7657548
113431146
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
7657548
113418826
(SIG-20)
7657548
113414586
isoform a
isomerase A isoform 1
isoform a
4557719
isoform a
89042891
DNA ligase I
member 8
89042891
5902002
89042891
precursor
31542539
member 8
protein 2
member 3
PREDICTED: similar to adaptor-related
89041736
113429091
isomerase A isoform 1
113414586
protein 2
tubulin, gamma complex associated
5729840
protein 2
6996005
5454144
ubiquitin D
113423966
CG11624-PA, isoform A
tubulin, gamma complex associated
5729840
protein 2
4557719
protein 2
89042891
protein 2
89041736
DNA ligase I
protein 3
89042891
5032133
89042891
5031635
cofilin 1 (non-muscle)
113429091
isomerase A isoform 1
protein phosphatase 1, catalytic subunit, beta
1-like 2
4506005
isoform 1
4758496
89042891
4759302
6996005
4759302
89041736
polypeptide K
113414586
polypeptide K
113429091
isomerase A isoform 1
113414586
biosynthesis, class L
phosphatidylinositol glycan anchor
4758922
biosynthesis, class L
4502743
cyclin-dependent kinase 7
phosphatidylinositol glycan anchor
4758922
biosynthesis, class L
isomerase A isoform 1
PREDICTED: similar to 60S ribosomal
biosynthesis, class L
(SIG-20)
PREDICTED: similar to 60S ribosomal
biosynthesis, class L
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
4507873
113431146
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
4507873
113418826
(SIG-20)
complementation group 3
9910180
ACN9 homolog
PREDICTED: similar to peptidylprolyl
4507947
tyrosyl-tRNA synthetase
113429091
isomerase A isoform 1
4507873
113414586
ubiquitin-conjugating enzyme E2
4507797
variant 2
isomerase A isoform 1
PREDICTED: similar to peptidylprolyl
4507873
113429091
isomerase A isoform 1
113414586
ubiquitin-conjugating enzyme E2
4507797
variant 2
113429091
isomerase A isoform 1
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
4506699
113418826
4506699
113431146
(SIG-20)
protein L26 (Silica-induced gene 20 protein)
(SIG-20)
4506717
4758496
89042891
4506717
113418826
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
4506717
113431146
4506699
4506717
62909985
4506717
113414586
4506699
4758496
4502743
(SIG-20)
H2A histone family, member Y isoform 2
hypothetical protein LOC140711
PREDICTED: similar to CG17293-PA
cyclin-dependent kinase 7
ubiquitin-conjugating enzyme E2D 4
4506717
8393719
(putative)
PREDICTED: similar to peptidylprolyl
4506699
113429091
isomerase A isoform 1
PREDICTED: similar to adaptor-related
4507047
89041736
4506717
38016127
4506717
4502743
cyclin-dependent kinase 7
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
4506701
113431146
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
4506701
113418826
(SIG-20)
4506701
113414586
4506699
113414586
polypeptide B''
4506717
89042891
4557719
4506715
113422526
4506715
113423050
4506715
89034184
4506715
88959151
protein S28
PREDICTED: similar to 40S ribosomal
4506715
88953906
protein S28
PREDICTED: similar to peptidylprolyl
4506717
113429091
isomerase A isoform 1
4506643
113414586
4506617
89042891
4506643
113429091
isomerase A isoform 1
4506609
113414586
4506609
113429091
isomerase A isoform 1
4506193
113414586
4506193
113429091
isomerase A isoform 1
PREDICTED: similar to peptidylprolyl
113429091
isomerase A isoform 1
protein phosphatase 1, catalytic subunit,
4506005
beta isoform 1
4557719
DNA ligase I
PREDICTED: similar to peptidylprolyl
4506233
4505621
113429091
22165364
isomerase A isoform 1
mitochondrial ribosomal protein L38
PREDICTED: similar to adaptor-related
4505795
89042891
member 1
89041736
4504511
member 1
113429091
isomerase A isoform 1
4504221
guanylate kinase 1
113414586
4504007
89042891
4504007
89041736
4504221
guanylate kinase 1
4502703
113429091
21536371
isomerase A isoform 1
telomerase-associated protein 1
PREDICTED: similar to adaptor-related
4503301
89042891
6A isoform a
89042891
63029935
89041736
phosphoribosyl pyrophosphate
28557709
synthetase 1-like 1
89042891
63029935
63029943
phosphoribosyl pyrophosphate
28557709
synthetase 1-like 1
89041736
PREDICTED: similar to adaptor-related
66912162
148747574
histone 2, H2bf
89042891
21945058
63029935
89042891
118402582
113429091
isomerase A isoform 1
PREDICTED: similar to adaptor-related
38016127
89041736
38016127
51467029
protein S26
PREDICTED: similar to adaptor-related
38016127
89042891
38016127
4557719
37595752
lamin B receptor
37595750
38016127
113429091
isomerase A isoform 1
PREDICTED: similar to adaptor-related
38348260
38016127
32189369
89041736
6996005
113414586
32189369
113429091
isomerase A isoform 1
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
32189369
113431146
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
32189369
113418826
32484973
113429091
(SIG-20)
isomerase A isoform 1
PREDICTED: similar to peptidylprolyl
32528306
113429091
isomerase A isoform 1
PREDICTED: similar to adaptor-related
32483374
nucleolar protein 5A
89042891
31795544
113429091
isomerase A isoform 1
PREDICTED: similar to adaptor-related
31543091
31795544
89042891
113414586
31543091
113429091
autophagy-related cysteine
30795252
endopeptidase 2 isoform a
30795248
isomerase A isoform 1
isoform b
PREDICTED: similar to adaptor-related
subunit 8
89041736
31543091
89041736
31795544
113431146
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
31795544
113418826
(SIG-20)
PREDICTED: similar to adaptor-related
28376621
89042891
28376621
89041736
28173554
histone H2B
89042891
28559085
28559083
autophagy-related cysteine
30795252
endopeptidase 2 isoform a
113414586
113414586
autophagy-related cysteine
30795248
endopeptidase 2 isoform b
autophagy-related cysteine
30795248
endopeptidase 2 isoform b
113429091
autophagy-related cysteine
30795252
isomerase A isoform 1
PREDICTED: similar to peptidylprolyl
endopeptidase 2 isoform a
113429091
isomerase A isoform 1
PREDICTED: similar to adaptor-related
24586679
89041736
24586675
slingshot homolog 3
89042891
24586679
89042891
27436969
2 isoform 2
4504825
22538446
22538444
22001417
gemin 5
21536371
telomerase-associated protein 1
2 subunit
89041736
2 subunit
89042891
21362084
89041736
21362084
89042891
peptidase-like
113414586
PREDICTED: similar to peptidylprolyl
21314720
113429091
isomerase A isoform 1
PREDICTED: similar to adaptor-related
20911035
peptidylprolyl isomerase-like 4
89042891
serine hydroxymethyltransferase 2
19923315
(mitochondrial)
89042891
WW domain-containing oxidoreductase
18860884
isoform 2
89042891
WW domain-containing oxidoreductase
18860884
isoform 2
7706523
isoform 1
4557719
DNA ligase I
serine hydroxymethyltransferase 2
19923315
(mitochondrial)
WW domain-containing oxidoreductase
18860884
isoform 2
89041736
15431297
113429091
isomerase A isoform 1
15431297
113414586
16306568
89042891
16306566
histone H2B
89042891
15431297
113418826
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
15431297
113431146
polypeptide 20
(SIG-20)
PREDICTED: similar to peptidylprolyl
113429091
isomerase A isoform 1
PREDICTED: similar to adaptor-related
13376747
89042891
PREDICTED: similar to adaptor-related
14043026
89042891
14043026
113429091
isomerase A isoform 1
13430872
nucleolar protein 10
113414586
13430872
nucleolar protein 10
113429091
isomerase A isoform 1
13129120
113414586
beta-1 subunit
89041736
beta-1 subunit
89042891
11056006
kelch-like 12
89041736
11056006
kelch-like 12
89042891
12758125
89041736
12758125
113429091
isomerase A isoform 1
PREDICTED: similar to adaptor-related
12758125
89042891
11386163
ELAV-like 4
89042891
8922905
RIO kinase 2
113429091
isomerase A isoform 1
113414586
10190686
10190696
10190686
18765707
phosphatase isoform 2
solute carrier family 2 (facilitated
8923733
62909985
10190686
113413881
114
PREDICTED: similar to peptidylprolyl
113429091
isomerase A isoform 1
PREDICTED: similar to 60S ribosomal
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
113418826
(SIG-20)
8922905
RIO kinase 2
113414586
10190686
6996005
7706343
113414586
7705477
113414586
7705748
113429091
isomerase A isoform 1
113414586
LSm7
89041736
7706497
cytidylate kinase
113414586
7705369
113414586
7705369
89042891
WW domain-containing oxidoreductase
7706523
isoform 1
89042891
PREDICTED: similar to peptidylprolyl
7706343
113429091
7705477
4557719
isomerase A isoform 1
DNA ligase I
PREDICTED: similar to peptidylprolyl
7706497
cytidylate kinase
113429091
isomerase A isoform 1
PREDICTED: similar to adaptor-related
8922388
89042891
7657508
ring-box 1
89042891
7705477
113431146
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
7705477
113418826
WW domain-containing oxidoreductase
7706523
(SIG-20)
PREDICTED: similar to adaptor-related
isoform 1
89041736
7705477
113429091
LSm7
isomerase A isoform 1
PREDICTED: similar to peptidylprolyl
113429091
isomerase A isoform 1
PREDICTED: similar to adaptor-related
7705748
89042891
7019405
89042891
7019405
89041736
6912280
113429091
isomerase A isoform 1
7019319
113414586
7657315
Lsm3 protein
6996005
113414586
10190696
113414586
7019319
113429091
isomerase A isoform 1
PREDICTED: similar to adaptor-related
5729953
5902034
89042891
113414586
4557719
DNA ligase I
113429091
isomerase A isoform 1
4503729
FK506-binding protein 4
polypeptide A
113414586
4759224
89041736
4758384
4503729
protein 2
113414586
4759224
89042891
polypeptide A
113429091
isomerase A isoform 1
PREDICTED: similar to adaptor-related
4758384
89042891
4557719
DNA ligase I
4557719
62909985
DNA ligase I
hypothetical protein LOC140711
H2A histone family, member Y isoform
4758496
4506651
4557719
DNA ligase I
113431146
(SIG-20)
PREDICTED: similar to peptidylprolyl
113429091
isomerase A isoform 1
skeletal muscle and kidney enriched inositol
4557719
DNA ligase I
18765707
protein 2
phosphatase isoform 2
PREDICTED: similar to peptidylprolyl
113429091
isomerase A isoform 1
protein phosphatase 1, catalytic subunit,
4557719
DNA ligase I
4506007
gamma isoform
PREDICTED: similar to adaptor-related
89042891
4557719
DNA ligase I
4503981
aminotransferase
4557719
DNA ligase I
4502743
cyclin-dependent kinase 7
N-ethylmaleimide-sensitive factor
4557719
DNA ligase I
4505331
4557719
DNA ligase I
8393719
(putative)
PREDICTED: similar to large subunit
113427529
4557719
DNA ligase I
89042891
4507311
suppressor of Ty 4 homolog 1
113418826
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
4507311
suppressor of Ty 4 homolog 1
113431146
(SIG-20)
PREDICTED: similar to adaptor-related
4507369
tyrosine aminotransferase
89042891
4507311
suppressor of Ty 4 homolog 1
113429091
isomerase A isoform 1
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
4506631
113431146
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
4506631
113418826
(SIG-20)
PREDICTED: similar to peptidylprolyl
4506631
113429091
isomerase A isoform 1
PREDICTED: similar to peptidylprolyl
4506629
113429091
isomerase A isoform 1
4507311
suppressor of Ty 4 homolog 1
113414586
4557719
DNA ligase I
113426831
4506629
27482992
protein HIP)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
4557719
DNA ligase I
113418826
(SIG-20)
4557719
DNA ligase I
113414586
4507133
polypeptide G
113429091
4506203
113429091
isomerase A isoform 1
isomerase A isoform 1
PREDICTED: similar to adaptor-related
4557719
DNA ligase I
4506631
89041736
113414586
4506629
113428574
protein HIP)
113414586
10)
heat shock 10kDa protein 1 (chaperonin
4504523
10)
isomerase A isoform 1
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
4503729
FK506-binding protein 4
113431146
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
4503729
FK506-binding protein 4
113418826
(SIG-20)
PREDICTED: similar to peptidylprolyl
4505235
113429091
isomerase A isoform 1
PREDICTED: similar to peptidylprolyl
4505773
prohibitin
113429091
89042891
N-ethylmaleimide-sensitive factor
4505331
isomerase A isoform 1
89042891
4504257
89042891
4504261
89042891
4504269
89042891
PREDICTED: similar to 60S ribosomal
alpha isoform of regulatory subunit
4506019
(SIG-20)
PREDICTED: similar to 60S ribosomal
(SIG-20)
PREDICTED: similar to adaptor-related
4504263
89042891
N-ethylmaleimide-sensitive factor
4505331
89041736
10)
(SIG-20)
PREDICTED: similar to 60S ribosomal
10)
(SIG-20)
PREDICTED: similar to adaptor-related
4505997
protein phosphatase 1D
89042891
gamma isoform
4502743
cyclin-dependent kinase 7
89042891
113414586
4502743
cyclin-dependent kinase 7
89041736
148222882
113429091
isomerase A isoform 1
113414586
isoform b
S-phase kinase-associated protein 1A
25777711
isoform a
isomerase A isoform 1
PREDICTED: similar to adaptor-related
21166389
89042891
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
4502743
cyclin-dependent kinase 7
113431146
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
4502743
cyclin-dependent kinase 7
113418826
(SIG-20)
PREDICTED: similar to peptidylprolyl
25777713
isoform b
113429091
isomerase A isoform 1
23592238
glucose transporter 14
113414586
113414586
isoform a
glucose transporter 14
113429091
isomerase A isoform 1
PREDICTED: similar to peptidylprolyl
4502859
113429091
isomerase A isoform 1
S-phase kinase-associated protein 1A
isoform b
25777711
isoform a
PREDICTED: similar to adaptor-related
4502743
cyclin-dependent kinase 7
89042891
63029943
89041736
4758650
113413289
specific 2)
PREDICTED: similar to adaptor-related
58615669
89042891
58615665
17981855
58615673
17981853
58615673
58615663
58615673
17981862
58615673
13128862
histone deacetylase 3
62909985
58615672
113414586
8923475
like
113414586
51491914
89042891
58615672
17981862
62909985
113429091
isomerase A isoform 1
58615666
17981859
58615669
58615666
58615669
17981856
58615669
17981859
113414586
58615666
17981856
13128862
histone deacetylase 3
113429091
isomerase A isoform 1
PREDICTED: similar to adaptor-related
63029943
89042891
58615663
17981853
phosphatase 1 isoform 2
113414586
17981859
32813443
89042891
113429091
isomerase A isoform 1
phosphatase 1 isoform 2
PREDICTED: similar to 60S ribosomal
small nuclear ribonucleoprotein
4507129
polypeptide E
(SIG-20)
PREDICTED: similar to 60S ribosomal
polypeptide E
89041736
89041736
32813443
32813443
113429091
17981859
113429091
phosphatase 1 isoform 2
113431146
phosphatase 1 isoform 2
113418826
polypeptide E
113429091
17981856
4507129
89042891
isomerase A isoform 1
polypeptide G
(SIG-20)
PREDICTED: similar to peptidylprolyl
89042891
(SIG-20)
isomerase A isoform 1
isomerase A isoform 1
PREDICTED: similar to peptidylprolyl
(SIG-20)
113429091
isomerase A isoform 1
113414586
polypeptide E
RNA pseudouridylate synthase domain
27734887
containing 3
14249470
member 3
containing 4
PREDICTED: similar to peptidylprolyl
113429091
isomerase A isoform 1
isoform 1
32967276
10190696
8923475
113413881
114
thioredoxin-like 4B
113414586
15431295
113429091
isomerase A isoform 1
PREDICTED: similar to peptidylprolyl
8923475
thioredoxin-like 4B
113429091
isomerase A isoform 1
PREDICTED: similar to adaptor-related
sigma 1 subunit
89041736
polypeptide 3, 34kDa
(SIG-20)
PREDICTED: similar to 60S ribosomal
polypeptide 3, 34kDa
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
8923475
thioredoxin-like 4B
113418826
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
8923475
15431295
5802970
thioredoxin-like 4B
113431146
(SIG-20)
113414586
113414586
PREDICTED: similar to adaptor-related
21396484
89042891
(putative)
113429091
isomerase A isoform 1
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
15431295
113418826
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
15431295
113431146
polypeptide 3, 34kDa
(SIG-20)
isomerase A isoform 1
PREDICTED: similar to adaptor-related
sigma 1 subunit
89042891
polypeptide 3, 34kDa
113414586
5802970
113429091
isomerase A isoform 1
isomerase A isoform 1
PREDICTED: similar to large subunit
113427529
113414586
protein S28
PREDICTED: similar to peptidylprolyl
113429091
isomerase A isoform 1
isomerase A isoform 1
29742309
89041601
L31
PREDICTED: similar to 40S ribosomal
protein S26
88982349
L31
protein S26
PREDICTED: similar to 40S ribosomal
protein S26
88987217
88980535
protein S26
113420393
protein S26
protein S26
113420084
protein S26
88982349
isomerase A isoform 1
protein S26
protein S26
PREDICTED: similar to APG4 autophagy 4
113413585
homolog B isoform a
113414586
113414586
protein L31
PREDICTED: similar to ribosomal
113427093
protein L31
napsin A preproprotein
89042891
protein L31
113414586
113431146
isomerase A isoform 1
89042897
isomerase A isoform 1
protein) (SIG-20)
113418826
(SIG-20)
PREDICTED: similar to adaptor-related
4504265
89042891
4504271
89042891
protein S26
88987217
protein S26
PREDICTED: similar to 40S ribosomal
protein S26
88987217
protein S26
PREDICTED: similar to 60S ribosomal
isomerase A isoform 1
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20
113431146
protein) (SIG-20)
protein S26
88980535
protein S26
88980535
protein S26
PREDICTED: similar to 40S ribosomal
protein S26
89041601
protein S26
PREDICTED: similar to 40S ribosomal
isomerase A isoform 1
88982349
protein S26
protein S26
113414586
113414586
113429091
isomerase A isoform 1
51467029
protein S26
89025350
isomerase A isoform 1
isomerase A isoform 1
113418084
protein S26
subunit, 58kDa
PREDICTED: similar to DNA primase large
113418086
subunit, 58kDa
PREDICTED: similar to 60S ribosomal
protein S26
(SIG-20)
protein) (SIG-20)
51467029
protein S26
PREDICTED: similar to 40S ribosomal
89025350
PREDICTED: similar to 40S ribosomal
89025350
88980535
protein S26
protein) isoform 1
89042891
protein) (SIG-20)
89041736
(SIG-20)
PREDICTED: similar to 40S ribosomal
isomerase A isoform 1
113419590
protein S28
113414586
113414586
89042328
S18 isoform 4
PREDICTED: similar to ribosomal protein
41150652
S18 isoform 1
PREDICTED: similar to adaptor-related
4505289
diphosphomevalonate decarboxylase
89042891
(SIG-20)
protein) (SIG-20)
27482992
113430282
protein HIP)
protein S26
113420393
protein S26
PREDICTED: similar to 40S ribosomal
isomerase A isoform 1
89035017
113414586
113414586
113418826
(SIG-20)
protein) (SIG-20)
(SIG-20)
PREDICTED: similar to 60S ribosomal
protein L26 (Silica-induced gene 20 protein)
4506651
113431146
isomerase A isoform 1
(SIG-20)
PREDICTED: similar to large subunit
113427044
4506651
113429091
isomerase A isoform 1
isomerase A isoform 1
PREDICTED: similar to postmeiotic
113418682
113414586
113414586
113429091
isomerase A isoform 1
PREDICTED: similar to peptidylprolyl
88953813
isomerase A isoform 1
88943041
113414586
protein S26
89041601
protein S26
88980535
protein S26
PREDICTED: similar to large subunit
4506651
113427529
113427044
89041736
isomerase A isoform 1
89040714
protein (KS)
protein) isoform 1
89041736
88987217
protein S26
protein S26
protein S26
PREDICTED: similar to 40S ribosomal
88980535
protein S26
PREDICTED: similar to 40S ribosomal
113429703
protein S26
PREDICTED: similar to adaptor-related
4758754
napsin A preproprotein
89041736
protein S26
88982349
protein S26
PREDICTED: similar to 40S ribosomal
113429703
protein S26
protein S26
88982349
isomerase A isoform 1
88987217
113427613
88953906
88959151
88953906
89034184
88959151
89034184
88959151
protein S28
88953906
88953906
protein S28
PREDICTED: similar to 40S ribosomal
113422526
protein S28
PREDICTED: similar to 40S ribosomal
protein S28
PREDICTED: similar to 40S ribosomal
protein S28
PREDICTED: similar to 40S ribosomal
protein S28
PREDICTED: similar to 40S ribosomal
protein S28
PREDICTED: similar to 40S ribosomal
protein S28
PREDICTED: similar to 40S ribosomal
L31
PREDICTED: similar to 40S ribosomal
protein S26
PREDICTED: similar to ribosomal protein
protein S26
PREDICTED: similar to 40S ribosomal
protein S26
protein S26
89025350
PREDICTED: similar to adaptor-related
89042891
expressed gene 1
PREDICTED: similar to 60S ribosomal
(SIG-20)
protein) (SIG-20)
89042891
protein) (SIG-20)
L31
PREDICTED: similar to 60S ribosomal
protein L31
(SIG-20)
PREDICTED: similar to large subunit
4506651
113427044
isomerase A isoform 1
27482992
protein HIP)
PREDICTED: similar to 40S ribosomal
protein S26
89025350
isomerase A isoform 1
protein S26
89041601
protein HIP)
protein S26
88980535
protein S26
Appendix D Concatenated Filtered Alignment
10
Anopheles_gambiae/1-3220
Arabidopsis_thaliana/1-3286
Ashbya_gossypii/1-3281
A_fumigatus/1-3241
Aspergillus_niger/1-3300
Bombyx_mori/1-2389
Bos_taurus/1-3273
C_briggsae/1-3231
Caenorhabditis_elegans/1-3289
Candida_albicans/1-3266
Candida_glabrata/1-3282
Canis_familiaris/1-3286
Ciona_intestinalis/1-2712
Cryptococcus_neoformans/1-3301
Cryptosporidium_hominis/1-3020
C_parvum/1-3281
Danio_rerio/1-3286
Debaryomyces_hansenii/1-3166
Dictyostelium_discoideum/1-3275
D_melanogaster/1-3287
Drosophila_pseudoobscura/1-3286
Encephalitozoon_cuniculi/1-3199
Entamoeba_histolytica/1-3271
Gallus_gallus/1-3286
Homo_sapiens/1-3286
Kluyveromyces_lactis/1-3281
Leishmania_major/1-3286
Macaca_mulatta/1-3182
Magnaporthe_grisea/1-3286
Methanosarcina_acetivorans/1-3043
Monodelphis_domestica/1-3196
Mus_musculus/1-3286
Neurospora_crassa/1-3264
Oryza_sativa/1-3218
Ostreococcus_lucimarinus/1-3287
Pan_troglodyte/1-3286
Paramecium_tetraurelia/1-3282
Pichia_stipitis/1-3290
P_falciparum/1-3285
P_knowlesi/1-3284
Plasmodium_yoelii/1-3285
Populus_trichocarpa/1-3286
Rattus_norvegicus/1-3286
Saccharomyces_cerevisiae/1-3284
Schizosaccharomyces_pombe/1-3284
Strongylocentrotus_purpuratus/1-3252
Takifugu_rubripes/1-2930
Tetrahymena_thermophila/1-3270
T_annulata/1-3248
Theileria_parva/1-3273
Trichomonas_vaginalis/1-3198
T_brucei/1-3264
Trypanosoma_cruzi/1-3266
Ustilago_maydis/1-3291
Yarrowia_lipolytica/1-3278
20 30 40 50 60 70 80
R A R M E D L L K R R F F Y DQ S F A I Y - - - - - - -G G I T G QY D F G P MG C A L K S NM I NT WR Q F F V L E E QM L E V DC S I L T P E P V L K A S G HV D
R K A V V NT L E R R L F Y I P S F K I Y - - - - - - - S G V A G L F DY G P P G C A I K S NV L S F WR QH F I L E E NM L E V DC P C V T P E V V L K A S G HV D
R E N L E S V L K R R F F F A P A F E L Y - - - - - - -G G V S G L Y DY G P P G C A F QA N I V DV WR K H F I L E E DM L E V DC T M L T P Y E V L K T S G HV D
R T V L D S M L R R R L F Y T P S F D I Y - - - - - - -G G V S G L Y DY G P P G T A L L NN I V D L WR K H F V L E E DM L E V DC T M L T P H E V L K T S G HV D
R T L F E S L L K R R L F Y T E S F E I Y R T S G N L T G D S R G L Y DY G P P G C A L Q S N I V D L WR K H F V L Q E DM L E L DC T I L T P E E V F K T S G HV D
R A K M E D L I K R R F F Y DQ S F A I Y - - - - - - -G G I T G Q F D F G P MG C A L K S NM I H L WK K F F I L Q E QM L E V E C S I L T P E P V L K A S G HV E
R A K M E DT L K R R F F Y DQA F A I Y - - - - - - -G G V S G L Y D F G P V G C A L K NN I I QT WR QH F I Q E E Q I L E I DC T M L T P E P V L K T S G HV D
R L K L E D L L K R R F F Y DQ S F A I Y - - - - - - -G G V T G L Y D F G P MG C A L K A NM L QQWR K H F I L E E G M L E V DC T S L T P E P V L K A S G HV D
R L K L E D L L K R R F F Y DQ S F A I Y - - - - - - -G G V T G L Y D F G P MG C S L K A NM L Q E WR K H F I L E E G M L E V DC T S L T P E P V L K A S G HV D
R D S L E QT L K R R F F F A P S F E I Y - - - - - - -G G V A G L F D F G P P G C A F QNNV I DA WR K H F I L E E DM L E V E A T M L T P HDV L K T S G HV D
R E K L E S V L R G R F F Y A P A F D L Y - - - - - - -G G V S G L Y DY G P P G C S F QA NV V DQWR K H F I L E E DM L E V DC T M L T P Y E V L K T S G HV D
R A K M E DT L K R R F F Y DQA F A I Y - - - - - - -G G V S G L Y D F G P V G C A L K NN I I QT WR QH F I Q E E Q I L E I DC T M L T P E P V L K T S G HV D
R QQM E DT L K R R F F Y G QA F E L Y - - - - - - -G G V S G L Y D F G P V G C M L K NN I I S E WK QH F I L HDQM L E I E C T M L T P E P V L R A S G H I E
K S T L DA L L A R R F F F A P S F E I Y - - - - - - -G G V A G L Y DY G P T G S A L QA N I L DA WR K HY I I E E DM L E L DT T I MT L S DV L K T S G HV D
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -M L E I S A T C L T P Y N P L K A S G HV D
R E A L E N L L K R R F F I A P S F E I Y - - - - - - -G G V A G L F DY G P P G C A L K S E V E S F WR R H F V L A E DM L E I S A T C L T P Y N P L K A S G HV D
R T K M E DT L K R R F F Y DQA F A I Y - - - - - - -G G V S G L Y D F G P V G C A L K NN I L QV WR QH F I Q E E Q I L E I DC T M L T P E P V L K T S G HV D
R E S L E QV L K R R F F F A P A F E I Y - - - - - - -G G V S G L Y DY G P P G C A L QA N I MDT WR K H F I L E E DM L E V DC T M L T P H E V L K T S G HV D
R A G L E D L MK R R F F I T Q S F S I Y - - - - - - -G G QA G L Y DY G P P G C A V K A N L I N L WR QH F V L N E DM S E V DC V S V T P E QV L K A S G HV A
R A K M E D L L K R R F F Y DQ S F A I Y - - - - - - -G G I T G QY D F G P MG C A L K S N I L A L WR QY F A L E E QM L E V DC S I L T P E P V L K A S G HV E
R A K M E D L L K R R F F Y DQ S F A I Y - - - - - - -G G I T G QY D F G P MG C A L K S N I L S L WR QY F A L E E QM L E V DC S I L T P E P V L K A S G HV E
QQQ I E Q I L K K R F F I T Q S A Y I Y - - - - - - -G G V S G L Y D L G P P G L S I K T N I L S L WR K H F V L E E DM L E I E T T T M L P HDV L K A S G HV D
K A K L D E I L K QR NMV I Q S Y E I Y - - - - - - -G G I A G L Y DMG P L G C A L K QN I L Q F WR K H F T T Y E N F F E V E G P I L T P K C V L A A S G HT A
R V K M E DT L K R R F F Y DQA F S I Y - - - - - - -G G V S G L Y D F G P V G C A L K NN I I QA WR QH F I Q E E Q I L E I DC T M L T P E P V L K T S G HV D
R A K M E DT L K R R F F Y DQA F A I Y - - - - - - -G G V S G L Y D F G P V G C A L K NN I I QT WR QH F I Q E E Q I L E I DC T M L T P E P V L K T S G HV D
R E S L E S V L K R R F F Y A P A F E L Y - - - - - - -G G V S G L Y DY G P P G C S F QA N I V DV WR K H F V L E E DM L E V DC T M L T P Y E V L K T S G HV D
R T E F E DT C R R R F F Y G L A F D P Y - - - - - - -G G T A G L Y D L G P T MC A MK S NM L H F WR QH F V I E E S MC E V DT T C L T P E E V F K A S G HV T
R A K M E DT L K R R F F Y DQA F A I Y - - - - - - -G G V S G L Y D F G P V G C A L K NN I I QT WR QH F I Q E E Q I L E I DC T M L T P E P V L K T S G HV D
R G A L DT I L R R R M F Y T P S F E I Y - - - - - - -G G V S G L Y DY G P P G C A L QA N I I DA WR K H F V L E DDM L E V DC S V L T P A DV L K T S G HV D
Y E K V F E L A K R R G F L WN S F E L Y - - - - - - -G G S R G F Y DY G P L G S T L K R R I E QV WR E F Y V I Q E G HM E I E C P T I G I E E V F I A S G HV G
R V K M E DT L K R R F F Y DQA F A I Y - - - - - - -G G E C E - - - -G P G L G S L A P WA V S S DR S V L R L P Q S L A G R R C S L G W P E - - - - - - - - -
R A K M E DT L K R R F F Y DQA F A I Y - - - - - - -G G V S G L Y D F G P V G C A L K NN I I QA WR QH F I Q E E Q I L E I DC T M L T P E P V L K T S G HV D
K G A L E S M L R R R M F F A P S F D I Y - - - - - - -G G V A G L Y DY G P P G C A L QA N I I D I WR K H F V L E E DM L E V DC T A L T P HDV L K T S G HV D
R QA V V NT L E R K L F Y I P S F K I Y - - - - - - -R G V A G L Y DY G P P G C A V K A NV L A F WR QH F V L E E NM L E V DC P C V T P E V V L K A S G HV E
R A K L G Q L L E G R L F Y I P S F K I Y - - - - - - -G G V A G L Y DY G P P G C A V K S NV QQ F WR QH F V L E E S M L E V E C P A V T P E P V L R A S G HV E
R A K M E DT L K R R F F Y DQA F A I Y - - - - - - -G G V S G L Y D F G P V G C A L K NN I I QT WR QH F I Q E E Q I L E I DC T M L T P E P V L K T S G HV D
R E A L E QV I K R R F I Y Q P A F S L Y - - - - - - -G G V A G L Y DY G P V G C A I K T N I E QY WR E H F I I E E D L F E I A A T I L T P E P V L K A S G HV D
R E S L E QV L K R R F F F A P A F D I Y - - - - - - -G G V S G L Y DY G P P G C A F QA NV V DT WR K H F V L E E DM L E V DC T M L T P HDV L K T S G HV D
R T K L E N L V K R K F F Y T N S F E I Y - - - - - - -G G A S G L F DY G P S G C L L K S E L E N L WR C H F I Y Y D E M L E I S G S C V T P Y QV L K T S G HV D
R T K I DN L A K R K L F Y T N S F E I Y - - - - - - -G G S S G L I DY G P S G C L L K S E L E N L WR Y H F I F Y D E M L E I S A T C I T P Y T V L K T S G HV D
R S K L E S L I K R R L F Y T N S F E I Y - - - - - - -G G V S G L I DY G P S G C L L K Y E L E K L WR NH F V F Y D E M L E I K G T C I T P Y S V L K T S G HV D
R QA V V NT L E R R L F F I P S F K I Y - - - - - - -R G V A G L Y DY G P P G C A V K S NV L A F WR QH F V L E E NM L E V DC P C V T P E V V L K A S G HV D
R A K M E DT L K R R F F Y DQA F A I Y - - - - - - -G G V S G L Y D F G P V G C A L K NN I I QT WR QH F I Q E E Q I L E I DC T M L T P E P V L K T S G HV D
R DK L E S T L R R R F F Y T P S F E I Y - - - - - - -G G V S G L F D L G P P G C Q L QNN L I R L WR E H F I M E E NM L QV DG P M L T P Y DV L K T S G HV D
R T Q F E E L MK K R F F F S P S F Q I Y - - - - - - -G G I S G L Y DY G P P G S A L Q S N L V D I WR K H F V I E E S M L E V DC S M L T P H E V L K T S G HV D
R A K M E DT L K R R F Y Y DQ S Y A I Y - - - - - - -G G V S G L Y D F G P T G C A MK A N F I N I WR NH F I I E E G M L E V D S A I L A P E NV F K A S G HV E
R T K M E DT L K R R F F Y DQA F A I Y - - - - - - -G G V S G L Y D F G P MG C A L K NN I L QV WR QH F I Q E E Q I L E I DC T M L T P E P V L K T S G HV D
R K Y F E D L I K R R Y F F NQG F E I Y - - - - - - -G G V A G L Y DY G P P G C A I K NN L L K L WR E H F I L E E DM L E I S S T C I T P Y P V F K A S G HV D
K V DC E N L L R R R F F Y T N S F E I Y - - - - - - -G G S A G L F D F G P P G C A L K S E L E R L WR E H F V V F D E M L E V S C T C I T P H P V L K S S G HV D
K V DC E N L L R R R F F Y A N S F E I Y - - - - - - -G G S A G L F D F G P P G C A L K S E L E R L WR E H F I V F D E M L E V S C S C I T P H P V L K S S G HV D
R A T A E D L E V S G F F WV P S F E I Y - - - - - - -G S V A G I Y D L G P T G C A I E R N F L QK WR DH F V L E DDM L E V R C S A L T P R P V L DA S G HT E
R S E F E DT C R R R F F F G L A F D P Y - - - - - - -G G S A G L Y DMG P P L C A MK A N L L A HWR QH F V L A E S MC E V DT T C L T P Q E V F V T S G HV T
R A E F E DT C R R R F F F G L A F D P Y - - - - - - -G G S A G L Y D L G P P L C A MK A N L L S Y WR QH F V L E E NMC E V DT T S L T P E E V F K A S G HV V
R S Q L E V L MT K R F F Y I Q S F E I Y - - - - - - -G G V G G L Y DY G P T G A A L QA N I I NQWR NH F I I E E E M L E L DT T I MT L S DV L K T S G HV D
R E T L DA V L K R R F F Y A P A F E I Y - - - - - - -DG V S G L Y DY G P P G C A L QT R I I DT WR DH F V L E DDM L E V DT T M L T P H E V L K T S G HV D
90
Anopheles_gambiae/1-3220
Arabidopsis_thaliana/1-3286
Ashbya_gossypii/1-3281
A_fumigatus/1-3241
Aspergillus_niger/1-3300
Bombyx_mori/1-2389
Bos_taurus/1-3273
C_briggsae/1-3231
Caenorhabditis_elegans/1-3289
Candida_albicans/1-3266
Candida_glabrata/1-3282
Canis_familiaris/1-3286
Ciona_intestinalis/1-2712
Cryptococcus_neoformans/1-3301
Cryptosporidium_hominis/1-3020
C_parvum/1-3281
Danio_rerio/1-3286
Debaryomyces_hansenii/1-3166
Dictyostelium_discoideum/1-3275
D_melanogaster/1-3287
Drosophila_pseudoobscura/1-3286
Encephalitozoon_cuniculi/1-3199
Entamoeba_histolytica/1-3271
Gallus_gallus/1-3286
Homo_sapiens/1-3286
Kluyveromyces_lactis/1-3281
Leishmania_major/1-3286
Macaca_mulatta/1-3182
Magnaporthe_grisea/1-3286
Methanosarcina_acetivorans/1-3043
Monodelphis_domestica/1-3196
Mus_musculus/1-3286
Neurospora_crassa/1-3264
Oryza_sativa/1-3218
Ostreococcus_lucimarinus/1-3287
Pan_troglodyte/1-3286
Paramecium_tetraurelia/1-3282
Pichia_stipitis/1-3290
P_falciparum/1-3285
P_knowlesi/1-3284
Plasmodium_yoelii/1-3285
Populus_trichocarpa/1-3286
Rattus_norvegicus/1-3286
Saccharomyces_cerevisiae/1-3284
Schizosaccharomyces_pombe/1-3284
Strongylocentrotus_purpuratus/1-3252
Takifugu_rubripes/1-2930
Tetrahymena_thermophila/1-3270
T_annulata/1-3248
Theileria_parva/1-3273
Trichomonas_vaginalis/1-3198
T_brucei/1-3264
Trypanosoma_cruzi/1-3266
Ustilago_maydis/1-3291
Yarrowia_lipolytica/1-3278
100 110 120
R F A D L MT K DV K NG E C F R L D P I T G ND L T E P I E F N L M F G T Q I G P L R
K F T D L MV K D E K T G T C Y R A D P DT K N P L S D P Y P F N L M F QT S I G P MR
K F S DWMC QD P K S G E I F R A D P V T G E T L E P P K A F N L M F E T A I G P L R
K F A DWMC K D P K T G E I F R A D P T T DG N L L P P V A F N L M F QT S I G P L R
K F E DWMC K D F K K G D F L R A D P DG DA P V S S P V P F N L M F K T T V G P L R
R F A D L MT K D I K T G E C F R L D P I S G ND L T P P I E F N L M F NT Q I G P L R
K F A D F MV K D L K NG E C F R A D P T T G ND L S P P V P F N L M F K T F I G P L R
R F A DWMV K DT K NG E C F R A D P I T G ND L T E P I A F N L M F P T Q I G P L R
R F A DWMV K DMK NG E C F R A D P I T G ND L T E P I A F N L M F P T Q I G P L R
R F S DWMC K D L K T G E I F R A D P S T G G K L E P P V E F N L M F DT A I G P L R
K F S DWMC R D L K T G E I F R A D P V T G E P L E P P MA F N L M F E T A I G P L R
K F A D F MV K DV K NG E C F R A D P NT G ND L S P P V S F N L M F K T F I G P L R
R F A D L MV K D E K T G A C F R A D P V T G N E I S D P MD F N L M F QT T I G P L R
K F A DWMV K DV K NG E I Y R A D P T T G N E V S E P V E F N L M F E S N I G P L R
R F T D S M I T D I K T N E Y Y R A D P - S G G E W S E P Y P F N L M F R T K I G P MR
R F T D S M I T D I K T N E Y Y R A D P - S G G E W S E P Y P F N L M F R T K I G P MR
K F A DY MV K DV K NG E C F R A D P S T G ND L T P P I S F N L M F QT S I G P L R
K F A DWMC R D L K T G E I F R A D P A T DG P L E L P I E F N L M F E T A I G P L R
K F A D F MV K D E V T K A F F R A D P E T G NA L T E P Y P F N L M F QT Q I G P L R
R F A D L MV K DV K T G E C F R L D P L T G ND L T E P I E F N L M F A T Q I G P L R
R F A D L MV K DV K T G E C F R L D P L T G ND L T E P I E F N L M F A T Q I G P L R
K F C D I L V F D E V S G DC F R A DT - L G NK L S K S QQ F N L M F G T Q I G Y L R
K F S DY MV K D L K NG C C Y R A D P DT G ND L S E P L A F N L M F A T D I G P L R
K F A D F MV K DMK NG E C F R A D P I T G ND L S P P V S F N L M F K T S I G P L R
K F A D F MV K DV K NG E C F R A D P I T G ND L S P P V S F N L M F K T F I G P L R
K F S DWMC K D P K T G E I F R A D P V S G DK L E P P R A F N L M F E T A I G P L R
R F NDV MV R DT V T G E C I R A D P -K G N P F S D P F P F N L M F A T H I G P MR
K F A D F MV K DV K NG E C F R A D P I T G ND L S P P V S F N L M F K T F I G P L R
K F A DWMC K D P K T G D I F R A D P A T G L L P T P P V S F N L M F S T S I G P L R
G F S D P L C E C MNC K E A F R A D P E C G G E F E DA Y E F N L M F K T T I G P L R
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - S WA WR R E V A G L C
K F A D F MV K DV K NG E C F R A D P T T G ND L S P P V P F N L M F QT F I G P L R
K F A DWMC K D P K NG D I L R A D P A T G V Q P E P P V A F N L M F QT A I G P MR
K F T D L MV K D E K T G T C Y R A D P DT K N P L S D P Y P F N L M F QT S I G P MR
K F T D L MV NDV V T K DC F R A D P V T G ND L S E P Y P F N L M F P T Q I G P L R
K F A D F MV K DV K NG E C F R A D P I T G ND L S P P V S F N L M F K T F I G P L R
R F T D L L V C D S K T G T G Y R A D P E T G ND L T D P T P F N L M L P T I I G P L R
K F A DWMC K D L K T G E I F R A D P A T G G K L E P P V E F N L M F E T A I G P L R
R F T D L M I R DV V T NDC Y R A D P - L K ND L S E P F P F N L M F QT K I G P L R
R F T D L M I R DA V T G D F Y R A D P -G K ND F V G P F P F N L M F QT R I G P L R
R F T D L M I K D I V T K DC Y R A D P - E K ND L S D P F P F N L M F QT K I G P L R
K F T D L MV K D E K T G T C Y R A D P DT K N P L S D P Y P F N L M F QT S I G P MR
K F A D F MV K DV K NG E C F R A D P T T G ND L S P P V P F N L M F QT F I G P L R
K F T DWMC R N P K T G E Y Y R A D P V T NDV L DA L T S F N L M F E T K I G A L R
K F A DWMC K D P A T G E I F R A D P A T NG E L E T P R Q F N L M F E T Q I G P L R
R F A D F MV K DG K T G E C F R A D P T T NND L S D P M E F N L M F A T A I G P L R
K F A DY MV K DV K NG E C F R A D P T T G ND L T P P I S F N L M F QT S I G P L R
R F T D L MV K DV K NG A G HR A D P DT G N E L G F P E P F N L M F G T P I G P L R
R F T D L MV K N L S NG DC Y R A D P - E G D E F S K P F P F N L M F S T S I G P L R
R F T D L MV K N L S NG DC Y R A D P - E G ND F S K P F P F N L M F S T S I G P L R
K F ND L M L T DMT T K A L Y R A D P - E G N E F S E P A P F N L M F NT R V G P L R
R F NDV MV R DT V T G E C I R A D P -K G NA L S D P F P F N L M F S T S I G P MR
R F NDA MV R DT V T G E C I R A D P -K G NA L S E P F P F N L M F S T S I G P MR
K F A DWMC K DT K T G E I F R A D P E S G N E V S E P V E F N L M F E S Y I G P L R
K F A DWMC R D L A S G E I F R A D P V T G G P L E K P M E F N L M F E T A I G P L R
130 140 150 160
P E T A QG I F V N F K R L L E F -NQG R L P F A A A Q I G N S F R N E I S
P E T A QG I F V N F K D L Y Y Y -NG K K L P F A A A Q I G QA F R N E I S
P E T A QG Q F L N F NK L L E F -NNG K T P F A S A S I G K S F R N E I S
P E T A QG Q F L N F QK L L E F -NQQ S M P F A S A S I G K S F R N E I S
P E T A QG Q F L N F K K L L DY -NQN S M P F A S A S I G K S F R N E I S
P E T A QG I F V N F K R L L E F -NQG R L P F A A A Q I G N S F R N E I S
P E T A QG I F L N F K R L L E F -NQG K L P F A A A Q I G N S F R N E I S
P E T A QG I F V N F K R L L E F -NQG K L P F A A A Q I G L G F R N E I S
P E T A QG I F V N F K R L L E F -NQG K L P F A A A Q I G L G F R N E I S
P E T A QG Q F L N F NK L L E F -NNDK M P F A S A S I G K S F R N E I A
P E T A QG Q F L N F NK L L E F -NNG K T P F A S A S I G K S F R N E I S
P E T A QG I F L N F K R L L E F -NQG K L P F A A A Q I G N S F R N E I S
P E T A QG I F L N F K R L L E F -NQG K L P F S A V Q I G M S F R N E I S
P E T A QG H F V N F A R L L E F -NNG K V P F A S A Q I G K S F R N E I A
P E T A QG I F V N F K R L Y E Y -NG K K L P F S V A Q I G L G F R N E I A
P E T A QG I F V N F K R L Y E Y -NG K K L P F S V A Q I G L G F R N E I A
P E T A QG I F L N F K R L L E F -NQG K L P F A A A Q I G N S F R N E I S
P E T A QG Q F L N F S K L L DC -NN E K M P F A S A S I G K S F R N E I S
P E T A QG I F T N F G K L Y E Y -NG K K L P F A A A Q I G NA F R N E I A
P E T A QG I F V N F K R L L E F -NQG K L P F A V A Q I G N S F R N E I S
P E T A QG I F V N F K R L L E F -NQG K L P F A V A Q I G N S F R N E I S
P E T A QG Q F L N F K K L C E Y -NNDK L P F A S A S I G K A Y R N E I S
P E T A QG I F T M F K R N L E F -NG G K V P F G V T Q I G NV F R N E I A
P E T A QG I F L N F K R L L E F -NQG K L P F A A A Q I G N S F R N E I S
P E T A QG I F L N F K R L L E F -NQG K L P F A A A Q I G N S F R N E I S
P E T A QG Q F L N F NK L L E F -NNG K T P F A S A S I G K S F R N E I S
P E L A QG I I L N F K R L MD S G NA QR M P F A G A C V G T A F R N E I A
P E T A QG I F L N F K R L L E F -NQG K L P F A A A Q I G N S F R N E I S
P E T A QG Q F L N F A K L L E Y -NNQQM P F A S A S I G K S Y R N E I S
P E T A QG M F V D F QR L S R F -Y R DK L P F G A V Q I G K S Y R N E I A
PAW PR A L LC - - - - LGT T - PGGR LAV A - - - - - - - - - - - -
P E T A QG I F L N F K R L L E F -NQG K L P F A A A Q I G N S F R N E I S
P E T A QG Q F L N F A K L L E Y -NA G NM P F A S A S I G K S Y R N E I A
P E T A QG I F V N F K D L Y Y Y -NG QK L P F A A A Q I G QA F R N E I S
P E T A QG I F V N F R D L L Y Y -NG G K L P F A A A Q I G Q S F R N E I A
P E T A QG I F L N F K R L L E F -NQG K L P F A A A Q I G N S F R N E I S
P E T A QG M F L N F A R L L E Q -NG G R V P F G A A Q I G L G F R N E I A
P E T A QG Q F L N F A K L L E F -NN E K M P F A S A S I G K S F R N E I A
P E T A QG I F V N F K K L L E Y -NG G K T P F A G A Q L G L G F R N E I S
P E T A QG I F V N F K K L L E Y -NG G K M P F A G A Q I G L G F R N E I S
P E T A QG I F V N F K K L L E Y -NG G K M P F A G A Q I G L G F R N E I S
P E T A QG I F V N F K D L Y Y Y -NG NK L P F A A A Q I G QA F R N E I S
P E T A QG I F L N F K R L L E F -NQG K L P F A A A Q I G N S F R N E I S
P E T A QG Q F L N F NK L L E I -NQG K I P F A S A S I G K S F R N E I S
P E T A QG Q F L N F S R L L E F -NNG K V P F A S A MV G K A F R N E I S
P E T A QG I F V N F K R L L E F -NQG R L P F G A A Q I G T A F R N E I S
P E T A QG I F L N F K R L L E F -NQG K L P F A A A Q I G N S F R N E I S
P E T A QG M F V N F NR L N E F -NG G R I P F A A A Q I G L G F R N E I A
P E T A QG I F V N F NR L L E F -NG G K I P F A A A Q I G L G F R N E I S
P E T A QG I F V N F T R L L E F -NG G K I P F A A A Q I G L G F R N E I S
P E T A QG I F V N F T R L L NA -NR G S L P F A A A QV G A G Y R N E I S
P E L A QG I I L N F K R L L DT G NA QR M P F A C A S I G T A F R N E I A
P E L A QG I I L N F K R L L D S G NA QR M P F A G A C I G T A F R N E I A
P E T A QG H F V N F QR L L E F -NNG R V P F A S A Q I G K S F R N E I S
P E T A QG Q F L N F NK L L DC -NNT K M P F A S A S I G K S F R N E I S
170
Anopheles_gambiae/1-3220
Arabidopsis_thaliana/1-3286
Ashbya_gossypii/1-3281
A_fumigatus/1-3241
Aspergillus_niger/1-3300
Bombyx_mori/1-2389
Bos_taurus/1-3273
C_briggsae/1-3231
Caenorhabditis_elegans/1-3289
Candida_albicans/1-3266
Candida_glabrata/1-3282
Canis_familiaris/1-3286
Ciona_intestinalis/1-2712
Cryptococcus_neoformans/1-3301
Cryptosporidium_hominis/1-3020
C_parvum/1-3281
Danio_rerio/1-3286
Debaryomyces_hansenii/1-3166
Dictyostelium_discoideum/1-3275
D_melanogaster/1-3287
Drosophila_pseudoobscura/1-3286
Encephalitozoon_cuniculi/1-3199
Entamoeba_histolytica/1-3271
Gallus_gallus/1-3286
Homo_sapiens/1-3286
Kluyveromyces_lactis/1-3281
Leishmania_major/1-3286
Macaca_mulatta/1-3182
Magnaporthe_grisea/1-3286
Methanosarcina_acetivorans/1-3043
Monodelphis_domestica/1-3196
Mus_musculus/1-3286
Neurospora_crassa/1-3264
Oryza_sativa/1-3218
Ostreococcus_lucimarinus/1-3287
Pan_troglodyte/1-3286
Paramecium_tetraurelia/1-3282
Pichia_stipitis/1-3290
P_falciparum/1-3285
P_knowlesi/1-3284
Plasmodium_yoelii/1-3285
Populus_trichocarpa/1-3286
Rattus_norvegicus/1-3286
Saccharomyces_cerevisiae/1-3284
Schizosaccharomyces_pombe/1-3284
Strongylocentrotus_purpuratus/1-3252
Takifugu_rubripes/1-2930
Tetrahymena_thermophila/1-3270
T_annulata/1-3248
Theileria_parva/1-3273
Trichomonas_vaginalis/1-3198
T_brucei/1-3264
Trypanosoma_cruzi/1-3266
Ustilago_maydis/1-3291
Yarrowia_lipolytica/1-3278
180 190 200 210 220 230 240
P R S G L I R V R E F T MC E I E H F C D P Q -A K NH P K F E NV A DT V MT L Y S A C NQA V - -A S G L V A N E T L G Y F MA R I QMY L HR I G I L P E R L R
P R QG L L R V R E F T L A E I E H F V D P E -NK S H P K F S DV A K L E F L M F P R E E QA V - -A K G T V NN E T L G Y F I G R V Y L F L T R L G I DK E R L R
P R S G L L R V R E F L MA E I E H F V D P E -NK NH P R F D E V K N L K L K F L P K G V QA V - -A S G M I DNQT L G Y F I A R I Y Q F L T K I G V D E E K L R
P R A G L L R V R E F L MA E I E HY V D P E G G K K HHR F E E V K D I E MA F L NR NV QA V - - E T G MV DN E T L G Y F I A R I Q L F L L K L G V D P NK L R
P R S G L L R V R E F L MA E I E H F V D P E G G K K HA K F D L V K D L Q L S F L DR A T QA V - - S S K MV DN E T L G Y F L G R I Y I F L L K I G V DT NK V R
P R S G L L R V R E F T MC E I E H F C D - - -V K E H P K F E S V K NT QM L L Y S A DNQA V - - S K G I V NN E T L G Y F MA R I HMY M L A V G I D P K R L R
P R S G L I R V R E F T MA E I E H F V D P S - E K DH P K F QNV A D L Y L Y L Y S A K A QA V - - E QG V I NN S V L G Y F I G R I Y L Y L V K V G V S P E K L R
P R QG L I R V R E F T MC E I E H F V D P E -DK R F P K F A K V A D E K L V L F S A C NQA V - -A NK T V A N E T L G Y Y MA R C HQ F L MK V G I DG R R L R
P R QG L I R V R E F T MC E I E H F V D P E -DK S L A K F A K V A DQK L V L F S A C NQA V - -A K K T V A N E T L G Y Y MA R C HQ F L MK V G I DG R R L R
P R A G L L R V R E F L MA E I E HY V D P E - S K S H P K F E DV K D I K L K F L P K NV QA V - - S S G MV DN E T L G Y F I A R I Y L F L V K I G V DT NR L R
P R A G L L R V R E F L MA E I E H F V D P L -DK S H P K F H E V K D I K L S F L P R N I QA V - -A S K MV DN E T L G Y F I A R I Y L F L I K I G V DDT K L R
P R S G L I R V R E F T MA E I E H F V D P S - E K E H P K F QNV A D L H L Y L Y S A K A QA V - -DQG V I NN S V L G Y F I G R I Y L Y L T K V G V S P DK L R
P R S G L I R V R E F QMG E I E H F V D P L -R K E H P L F S T V K D I K V P L Y S S K A QA V - - E DG T I DN E T L G Y F MG R I Y L F C V K V G I D P F K F R
P R QG L L R V R E F T MA E I E HY V D P L -DK R HA R F N E V K DV V L T L L A K G V QA V - -A E G I V DN E T L G Y F L G R T Q L F L T K I G I D P A R L R
P R NG L L R V R E F QMA E I E H F I H P D -R K DH P K F DDV A F K C L P L Y S S K T QA V HG E E K I I NN E T L A Y F L S R T Y D F L I S I G I N P DG I R
P R NG L L R V R E F QMA E I E H F I H P D -R K DH P K F DDV A L K C L P L Y S S K T QA V HG E E K I I NN E T L A Y F L S R T Y D F L I S I G I N P DG I R
P R S G L I R V R E F T MA E I E H F V D P N - E K V H F K F S NV A D L D I M L Y S S K A QA V - - E QG V I NN S V L G Y F I G R I Y L Y L V K V G V A K DK L R
P R A G L L R V R E F L MA E I E HY V D P D -NK S H S R F D E I K D L K L K F L P K G V QA V - - S S G MV DN E T L G Y F L A R I Y S F L I K I G V D P S R L R
P R A G L L R V R E F T MA E I E H F V N P N -NK T H P K F N E I K DV E A N L L S S D S QA V - - E K K L I DN E T L A Y F MA R T QQ F L HT V G I K P A G L R
P R S G L I R V R E F T MA E I E H F C D P V - L K DH P K F G N I K S E K L T L Y S A C NQA V - -A S K L V A N E T L G Y Y MA R I QQ F L L A I G I K P E C L R
P R S G L I R V R E F T MA E I E H F C D P T -QK DH P K F G NV K D E K MT L Y S A C NQA V - - S A K L V A N E T L G Y Y MA R I QQ F L L A I G I K P E C L R
P R S G L L R V R E F DQA E I E H F V L T D - E K DH P K F S T V QG I K L K L MHHDA S A I - - E R G I V C N E T MG Y Y I G R T A L F L I E L G I DR E L L R
P R NG L L R V R E F T L A E I E Y F V L P D -K K T H S N F S DV E N L S V Q L Y P R E L QA V - -NDG I I N S Q L L A Y F MG R T F K F L I E L G I P A E H I R
P R S G L I R V R E F T MA E I E H F V D P S - E K NH P K F Q S V A D L N I L L Y S S K A QA V - -QQG V I NN S V L G Y F I G R I Y L F L T K V G V S P DK L R
P R S G L I R V R E F T MA E I E H F V D P S - E K DH P K F QNV A D L H L Y L Y S A K A QA V - - E QG V I NNT V L G Y F I G R I Y L Y L T K V G I S P DK L R
P R S G L L R V R E F L MA E I E H F V D P N -DK S HK R F QD I K D I K L K F L P R E V QA V - -A T K L V DN E T L G Y F I A R I Y Q F L I K I G V D P E R L R
P R S A L I R V R E F T L A E I E H F V N P S -NK NH E K F DR V R DV E I WA W P R H F QA V - - E A K V I DNQT L G Y F MG R V A L F L T S I G V - -R F Y R
P R S G L I R V R E F T MA E I E H F V D P S - E K DH P K F QNV A D L H L Y L Y S A K A QA V - - E QG V I NNT V L G Y F I G R I Y L Y L T K V G I S P DK L R
P R S G L L R V R E F L MA E I E H F V D P E S G K K H P R F A E V A D I E L E L L DR E T QA V - -K DG L V DN E T L G Y F L A R I H L F L E K I G V DK S K L R
P R QG V I R L R E F T QA E C E L F V D P R -NK K H P N F E R F A DK E L V L Y S QA A QA V - - E T G V I A H E I L G Y N I A L T N E F L T K V G I D P E K L R
- - - - - - - - - - - - - - - - -R I C S P R -C QK P P - - - - - - - - - L E L L T S S L R P - - -A R G V I NN S V L G Y F L G R I Y L F L T K A G V C A E R L R
P R S G L I R V R E F T MA E I E H F V D P T - E K DH P K F Q S V A D L C L Y L Y S A K A QA V - - E QG V I NN S V L G Y F I G R I Y L Y L T K V G I S P DK L R
P R G G L L R V R E F L MA E I E H F V D P A G HK K H E R F H E V A D I E L A L L DR NV QA V - -K QK I V DN E T L G Y F L A R I H L F L K K I G V DQ S K I R
P R QG L L R V R E F T L A E I E H F V D P E -DK S H P K F V DV A D L E F L M F P R E L QA V - - S K G T V NN E T L G Y F I G R V Y L F L T R L G I DK NR L R
P R A G L L R V R E F T QA E I E H F V H P E -HK E H P R F A E V A DT V L S L F S QDA QA V - - S K G I I A N E T L G Y F I A R C H L F L V Q I G I DT NR L R
P R S G L I R V R E F T MA E I E H F V D P S - E K DH P K F QNV A D L H L Y L Y S A K A QA V - - E QG V I NNT V L G Y F I G R I Y L Y L T K V G I S P DK L R
P R G G L L R C R E F QMA E I E Y F V D P T E K S T F K K F NK Y I N L E I P L L S R Q L QA V - -K E G I I NN E T L A Y F I C R T Y L Y L V E I G I N P V N I R
P R A G L L R V R E F L MA E I E H F V D P N -DK S H P K F K DV QD I K L R F L P K DV QA V - - S S G MV DNQT L G Y F L A R V Y Q F L I K V G V DT DR L R
P R NG L L R V R E F QMA E I E Y F V N P K -K K NH E K Y Y L F K Y L M L P L Y P R DNQA V - - E K N I I A N E A L A Y F L A R T Y L F L L K C G I NK DG I R
P R NG L L R V R E F E MA E I E Y F V N P E -K K C H E K Y H L F K H L I L P L Y P R E E QA V - -T K G I I A N E A L A Y F L A R T Y L F L L K C G I NK DG L R
P R NG L L R V R E F E MG E I E Y F F N P E -K S K H E K Y D L Y K H L V L P L Y P R T NQA V - -NNG I I C N E A L A Y F L A R T Y L F L L K C G I K K DG I R
P R QG L L R V R E F T L A E I E H F V D P E -DK S H P K Y S E V A D L E F L M F P R E QQA V - - S K G I V NN E T L G Y F I G R V Y L F L T H L G I DK DR L R
P R S G L I R V R E F T MA E I E H F V D P T - E K DH P K F P S V A D L Y L Y L Y S A K A QA V - - E QG V I NN S V L G Y F I G R I Y L Y L T K V G I S P DK L R
P R S G L L R V R E F L MA E I E H F V D P L -NK S HA K F N E V L N E E I P L L S R R L QA V - -N S G MV E N E T L G Y F MA R V HQ F L L N I G I NK DK F R
P R S G L L R V R E F L MA E V E H F V D P K -NK E HDR F D E V S HM P L R L L P R G V QA V - -K K G I V DNT T L G Y F MA R I S L F L E K I G I DMNR V R
P R S G L L R V R E F T MC E I E H F I D P T -NK DH P K F DT V A N L A I P L F P V DR QA V - - E K G M I K S R V L G Y F MG R T F L F M I K V G I D P K K L R
P R S G L I R V R E F T MA E I E H F V D P K - E K V HQK F A NV A D L E I L L Y S S K A QA V - - E QG V I NN S V L G Y F I G R I Y L Y L I K V G V A K DK L R
P R NG L Y R V R E F DMA E I E H F F D P K -R P E H P K F K Y V K D L K L P L L T A K S QA V - -K S G T V S N E T HA Y F I G R T F L F L V E A G V NQNN I R
P R NG L L R V R E F P MA E I E Y F V N P K - F K T H E K F P E F K NT V L P L L T R DQQA V - - S S G I V G N E A L A Y F L A R T F L F L K R V G I N E A G L R
P R NG L L R V R E F P MA E I E Y F V N P K - F K T H E K F P E F S HV V L P L V T R DQQA V - - S S G MV G N E A L A Y F L A R T F L F L K R V G I N E A G L R
P R NG L V R C R E F QMA E I E H F A D P E Q L NN F P K F E T V K N L K V K L F P A S I QA I - -A QHV V S HK T L G Y Y I G R V Y L F L C E I G I Q P DT I R
P R A N L I R V R E F T L A E I E H F V N P N -DK T H E K F A L V K DV E I WMWA R K QQA V - -A QK I I DN E T L A Y F I A R T A Q F L E A V G A - -R Y V R
P R A N L I R V R E F T L A E I E H F V N P N -DK S H E K F E S V R G T E F WA W S R E L QA V - -A K K I I DN E T L G Y F I A R T V L F L E A V G L - -R F L R
P R A G L L R V R E F T MA E I E H F V D P E -DK NHDR F D E V K H I NV P L L A K DV QA V - - S A G I I DNQT L G Y F I G R I Y L F L V K I G I DA T R L R
P R S G L L R V R E F T MA E I E H F V D P L -DK DHHR F D E V K DV K L R F L A K DV QA V - - E T G L V DNK T L G Y F L A R I Y L F L I K I G V N P DR L R
260
Anopheles_gambiae/1-3220
Arabidopsis_thaliana/1-3286
Ashbya_gossypii/1-3281
A_fumigatus/1-3241
Aspergillus_niger/1-3300
Bombyx_mori/1-2389
Bos_taurus/1-3273
C_briggsae/1-3231
Caenorhabditis_elegans/1-3289
Candida_albicans/1-3266
Candida_glabrata/1-3282
Canis_familiaris/1-3286
Ciona_intestinalis/1-2712
Cryptococcus_neoformans/1-3301
Cryptosporidium_hominis/1-3020
C_parvum/1-3281
Danio_rerio/1-3286
Debaryomyces_hansenii/1-3166
Dictyostelium_discoideum/1-3275
D_melanogaster/1-3287
Drosophila_pseudoobscura/1-3286
Encephalitozoon_cuniculi/1-3199
Entamoeba_histolytica/1-3271
Gallus_gallus/1-3286
Homo_sapiens/1-3286
Kluyveromyces_lactis/1-3281
Leishmania_major/1-3286
Macaca_mulatta/1-3182
Magnaporthe_grisea/1-3286
Methanosarcina_acetivorans/1-3043
Monodelphis_domestica/1-3196
Mus_musculus/1-3286
Neurospora_crassa/1-3264
Oryza_sativa/1-3218
Ostreococcus_lucimarinus/1-3287
Pan_troglodyte/1-3286
Paramecium_tetraurelia/1-3282
Pichia_stipitis/1-3290
P_falciparum/1-3285
P_knowlesi/1-3284
Plasmodium_yoelii/1-3285
Populus_trichocarpa/1-3286
Rattus_norvegicus/1-3286
Saccharomyces_cerevisiae/1-3284
Schizosaccharomyces_pombe/1-3284
Strongylocentrotus_purpuratus/1-3252
Takifugu_rubripes/1-2930
Tetrahymena_thermophila/1-3270
T_annulata/1-3248
Theileria_parva/1-3273
Trichomonas_vaginalis/1-3198
T_brucei/1-3264
Trypanosoma_cruzi/1-3266
Ustilago_maydis/1-3291
Yarrowia_lipolytica/1-3278
270
F R QHMG N E MA HY A C DC WDA E C L T - S Y G W I
F R QH L A N E MA HY A A DC WDA E I E S - S Y G W I
F R QHM S N E MA HY A T DC WDA E L K T - S Y G W I
F R QHMA N E MA HY A A DC WDA E L L T - S Y G W I
F R QHMA N E MA HY A T DC WDA E L QT -T Y G W I
F R QHMG N E MA HY A C DC WDA E C L S - S Y G W I
F R QHM E N E MA HY A C DC WDA E S K T - S Y G W I
F R QH L S N E MA HY A QDC WDA E I L T - S Y G W I
F R QH L S N E MA HY A QDC WDA E I L T - S Y G W I
F R QHM S N E MA HY A S DC WDA E L E T - S Y G W I
F R QHMA N E MA HY A A DC WDA E L K T - S F G W I
F R QHM E N E MA HY A C DC WDA E S K T - S Y G W I
F R QHMQN E MA HY A C DC WDA E C K T - S Y G WV
C R QHMA N E MA HY A T DC WD F E I Q S - S Y G W I
F R QH L S T E MA HY A S DC WDA E V L T - S Y G W I
F R QH L S T E MA HY A S DC WDA E V L T - S Y G W I
F R QHMDN E MA HY A C DC WDA E T K T - S Y G W I
F R QHM S N E MA HY A A DC WDA E L HT - S Y G W I
F R QHQK N E MA HY A QDC WDA E I L S - S Y G WV
F R QHM S N E MA HY A C DC WDA E I L T - S Y G WV
F R QHM S N E MA HY A C DC WDA E I L T - S Y G WV
F R QHK K D E MA HY A K G C WDA E I Y T - S Y G W I
F R QH L K T E MA HY A K DC WDA E I R L - S Y G WV
F R QHM E N E MA HY A C DC WDA E S K T - S Y G W I
F R QHM E N E MA HY A C DC WDA E S K T - S Y G W I
F R QHMA N E MA HY A A DC WDA E L QT - S Y G W I
F R QHQ S T E MA HY A QDC WDA E L L T - S Y G W I
F R QHM E N E MA HY A C DC WDA E S K T - S Y G W I
F R QHMA N E MA HY A C DC WDA E L L T - S Y G W I
F R QH L T D E MA HY A I DC WDA E I E T DR F G WV
F R QHMDN E MA HY A C DC WDA E A R T - S Y G W I
F R QHM E N E MA HY A C DC WDA E S K T - S Y G W I
F R QHMG N E MA HY A C DC WDA E L L T - S S G WV
F R QH L P N E MA HY A A DC WDA E I E C - S Y G W I
F R QH L K H E MA HY A A DC WDA E I QC - S Y G W I
F R QHM E N E MA HY A C DC WDA E S K T - S Y G W I
F R QHQA D E MA HY S S DC WDA E I E M - S S G WV
F R QHMG N E MA HY A S DC WDA E L QT - S Y G W I
F R QH L K T E MA HY A NDC WDA E I L T - S F G F I
F R QH L P T E MA HY A NDC WDA E I L T - S Y G F I
Y R QH L E K E MA HY A NDC WDA E I L T - S Y G Y I
F R QH L A N E MA HY A A DC WDA E I E S - S Y G W I
F R QHM E N E MA HY A C DC WDA E S K T - S Y G W I
F R QH L K N E MA HY A T DC WDG E I L T - S Y G W I
F R QHM S N E MA HY A C DC WDA E I QC - S Y G W I
F R QHM F N E MA HY A T DC WDA E T K T - S Y G WV
F R QHMDN E MA HY A C DC WDA E T K T - S Y G W I
F R QHM S N E MA HY A C DC WDA E I E F - S HG F K
F R QHMA N E MA HY A S DC WDA E I L T - S Y G WV
F R QHT A N E MA HY A S DC WDA E I L T - S Y G WV
F R MHR K N E MA HY A R E C WDA E I Y T K T L G W L
F R QH L R N E MA HY A QDC WDA E L L T - S Y G WV
F R QHQR D E MA HY A QDC WDA E L L T - S Y G WV
F R QHM S N E MA HY A S DC WDA E I HT - S Y G W I
F R QHM S N E MA HY A T DC WDA E L HT - S Y G W I
280
290
300
310
320
330
E C V G C A DR S A Y D L T QHT NA T - - - - - - -G V K L V A E K K L P A P K A A I G K A F K K E A K A
E C V G I A DR S A Y D L R A H S DK S - - - - - - -G T P L V A E E K F A E P K K E L G L A F K G NQK N
E C V G C A DR S A Y D L T V HA NK T - - - - - - -K T A L V V R E K L DV P K K L F G P K F R K DA P K
E C V G C A DR S A Y D L T V HK NK T - - - - - - -G A P L V V R E P R A E P K K K F G P R F K K DG K A
E C V G C A DR S A Y D L T V H S R K T - - - - - - -K E P L V V R E P R R E P K P K L G P L F K K NA K A
E C V G C A DR S A Y D L T QHT K A T - - - - - - -G I R L A A E K K L P A P K A A I G K A F K K D S QA
E I V G C A DR S C Y D L S C HA R A T - - - - - - -K V P L V A E K P L K E P K G A I G K A Y K K DA K L
E C V G NA DR A C Y D L QQHY K A T - - - - - - -NV K L V A E K K L P E P MA L L G K K Y K K E A K K
E C V G NA DR A C Y D L QQHY K A T - - - - - - -NV K L V A E K K L P E P MA L L G K S F K K DA K K
E C V G C A DR S A Y D L S V H S A R T - - - - - - -G E K L V A R QT L A E P K K K F G P K F R K DA G T
E C V G C A DR S A Y D L T V HA NK T - - - - - - -K E K L V V R QK L E T P K K L F G P K F R K DA P K
E I V G C A DR S C Y D L S C HA R A T - - - - - - -K V P L V A E K P L K E P K G A I G K A Y K K DA K L
E C V G C A DR S C Y D L K C H S QA A - - - - - - -K V N L S A E R P L P E P K QA V G K A F K K DA K K
E C V G C A DR S A Y D L T V H S V R T - - - - - - -K Q P L R V QQR L DQ P A K A F G MK F K K DA T M
E C A G HA DR S C Y D L L QH S K A T - - - - - - -K T D L F A S E K Y D E P K P L I G K T F K Q E A S L
E C A G HA DR S C Y D L L QH S K A T - - - - - - -K T D L F A S E K Y D E P K P L I G K T F K Q E A S L
E I V G C A DR S C Y D L L C HA R A T - - - - - - -K V P L V A E K P L K E P K G A I G K A Y K K DA K F
E C V G C A DR S A Y D L S V H S A R T - - - - - - -N E K L V V R Q P L P E P K K K F G P K F R K DA G T
E C V G HA DR S C Y D L K V HA T E S - - - - - - -K S N L S A Y E E F K E P P G A I S K K HR A A V S P
E C V G C A DR S A Y D L G QHT A A T - - - - - - -G V R L V A E K R L P A P K QA L G K T F K K E A K N
E C V G C A DR S A Y D L G QHT A A T - - - - - - -G V R L V A E K R L P A P K QA L G K T F K K E A K T
E C V G I A DR A C Y D L S C H E DG S - - - - - - -K V D L R C K R R L A E P K K E WG A K L R DR F S V
E C V G HA DR G D F D L S NHA R C S - - - - - - -K V DQ S V F I A Y D E P K G V MG K K Y K K D S QK
E I V G C A DR S C Y D L S C HA R A T - - - - - - -K V P L I A E K L L K E P K G A I G K A Y K K DA K V
E I V G C A DR S C Y D L S C HA R A T - - - - - - -K V P L V A E K P L K E P K G A I G K A Y K K DA K L
E C V G C A DR S A Y D L T V H S NK T - - - - - - -K E K L V V R E A L E T P K K L F G P K F R K DA P K
E C V G I A DR S A Y D L T QH S NA S - - - - - - -K K D L C A R E E Y D E P K G L I G K T F G K K A G E
E I V G C A DR S C Y D L S C HA R A T - - - - - - -K V P L V A E K P L K E P K G A I G K A Y K K DA K L
E C V G C A DR S A Y D L S V HA K K T - - - - - - -NA P L I V R QR L P E P K K K F G P K F K K DA K A
E I V G I A DR T DY D L K A HA R V S - - - - - - -K T D L Y V Y V E Y D E P MG K L G P L F K G K A K A
E I V G C A DR S C Y D L T C H S R A T - - - - - - -K V P L V A E K L L R E P K A A I G R T Y K K DA R L
E I V G C A DR S C Y D L S C HA R A T - - - - - - -K V P L V A E K P L K E P K G A V G K A Y K K DA K L
E C V G C A DR S A Y D L T V HA K K T - - - - - - -G A P L V V R E T L E T P S K K F G P T F R K DA K T
E C V G I A DR S A Y D L R A H S DK S - - - - - - -G V P L V A H E K F S K P K K D L G L A F K G NQK M
E C V G L A DR S A F D L K A H S DK S - - - - - - -K V D L V A Y E R F DK P K K V L G K A F K K DA K P
E I V G C A DR S C Y D L S C HA R A T - - - - - - -K V P L V A E K P L K E P K G A I G K A Y K K DA K L
E C V G L A DR S A Y D L NA H S E A T - - - - - - -G QK L QA A R K F K V P K QK I G K E L K K DG MA
E C V G C A DR S A Y D L S V HA A R T - - - - - - -NA S L V V R Q P L P E P K K K F G P K F K K DG G A
E V V G HA DR S A Y D L QHHMK Y T - - - - - - -G A N L Y A C E K Y N E P K A K I G HT F K S E QNK
E V V G HA DR S A Y D L K NHMK V T - - - - - - -G A N L Y A C E K Y DT P K A K MG MK F K S QQNV
E C V G HA DR S A Y D L K HHMNA T - - - - - - -G S N L Y G C QK Y DK P K A K I G MK F K S DQNK
E C V G I A DR S A Y D L R A HT DK S - - - - - - -G V P L V A H E K F S E P K K E L G L S F K G NQK K
E I V G C A DR S C Y D L S C HA R A T - - - - - - -K V P L V A E K P L K E P K G A V G K A Y K K DA K L
E C V G C A DR A A F D L T V H S K K T - - - - - - -G R S L T V K QK L DT P K K F F G S K F K QK A K L
E C V G C A DR S A Y D L S V H S K A T - - - - - - -K T P L V V Q E A L P E P R K K F G P R F K R DA K A
E C V G NA DR S C Y D L T C HA K H S - - - - - - -K V A MV A E K K L P E P K S V MG K A F K K E A K V
E I V G C A DR S C Y D L A C HA R V T - - - - - - -K V P L V A E K P L K E P K G A L G K A Y K K DA K I
E C V G L A NR S A F D L E S HT K G S - - - - - - -G V K L L A A R R L P E P K K E I F K A L K G DG N E
E V V G HA DR MA Y D L MC H S K S T - - - - - - -N S Q L V A HHR Y DN P K P E I G K A F K S DQK I
E V V G HA DR MA Y D L MC H S K S T - - - - - - -N S Q L V A HHR Y DT P K P E I G K A F K A DQK I
E C V G I A DR Q S WD L S R HA K Y T T K K G DA E S S P L Y L S A P L DT P K S A I G K I F R K DA K E
E C V G V A DR S A Y D L T QH S G A T - - - - - - -K K D L C A R E E F A E P K G A I G K A F G K NA G D
E C V G I A DR S A Y D L T QH S A A S - - - - - - -K K D L F A R E E F A E P K G A I G K A F G K NA G E
E C V G C A DR S A Y D L T V H S K R T - - - - - - -K K D L V V QK A HK E P K K N L G P K F K K DA K F
E C V G C A DR S A Y D L S V H E A R T - - - - - - -K V K L QV QQK L DA P K K K F G P L L K K A A K P
340
Anopheles_gambiae/1-3220
Arabidopsis_thaliana/1-3286
Ashbya_gossypii/1-3281
A_fumigatus/1-3241
Aspergillus_niger/1-3300
Bombyx_mori/1-2389
Bos_taurus/1-3273
C_briggsae/1-3231
Caenorhabditis_elegans/1-3289
Candida_albicans/1-3266
Candida_glabrata/1-3282
Canis_familiaris/1-3286
Ciona_intestinalis/1-2712
Cryptococcus_neoformans/1-3301
Cryptosporidium_hominis/1-3020
C_parvum/1-3281
Danio_rerio/1-3286
Debaryomyces_hansenii/1-3166
Dictyostelium_discoideum/1-3275
D_melanogaster/1-3287
Drosophila_pseudoobscura/1-3286
Encephalitozoon_cuniculi/1-3199
Entamoeba_histolytica/1-3271
Gallus_gallus/1-3286
Homo_sapiens/1-3286
Kluyveromyces_lactis/1-3281
Leishmania_major/1-3286
Macaca_mulatta/1-3182
Magnaporthe_grisea/1-3286
Methanosarcina_acetivorans/1-3043
Monodelphis_domestica/1-3196
Mus_musculus/1-3286
Neurospora_crassa/1-3264
Oryza_sativa/1-3218
Ostreococcus_lucimarinus/1-3287
Pan_troglodyte/1-3286
Paramecium_tetraurelia/1-3282
Pichia_stipitis/1-3290
P_falciparum/1-3285
P_knowlesi/1-3284
Plasmodium_yoelii/1-3285
Populus_trichocarpa/1-3286
Rattus_norvegicus/1-3286
Saccharomyces_cerevisiae/1-3284
Schizosaccharomyces_pombe/1-3284
Strongylocentrotus_purpuratus/1-3252
Takifugu_rubripes/1-2930
Tetrahymena_thermophila/1-3270
T_annulata/1-3248
Theileria_parva/1-3273
Trichomonas_vaginalis/1-3198
T_brucei/1-3264
Trypanosoma_cruzi/1-3266
Ustilago_maydis/1-3291
Yarrowia_lipolytica/1-3278
350
360
370
380
390
400
410
I T E A L P S V I E P S F G I G R I MY S L L E H S F R MR R S Y F S L P P V V A P L K C S V L P L S NNA E F A P F V K K I S S A L T S V DV S HK V DD S S G S I
V V E S L P S V I E P S F G I G R I I Y C L Y E HC F S T R L N L F R F P P L V A P I K C T V F P L V QNQQ F E E V A K V I S K E L A S V G I S HK I D I T G T S I
V E NY L P NV I E P S F G I G R I I Y A I F E H S F W S R R S V L S F P P L V A P T K V L L V P L S NNA D L A E V V T E V S R V L R K E Q I P F K V DD S G V S I
V A A A V P NV I E P S F G I G R I L Y S M I E HV Y WA R R G V L S F P P A I A P T K V L I V P L S T HA S F R P L L QQ L MT K L R R MG I S NR V DD S S A S I
V E A A L P NV I E P S F G F G R I F Y S L L E HV Y WHR R G V L S L P I S V A P T K V L I V P L S T HQD F V P I T K R I T E D L R E L G I S C R A D E S S A S I
I NDT L - - - - - - - - - - - - - - - - - - -N S F - - - - - - - - - - - - -A P MK C V V L P L S G NA E F Q P F V R D L S Q E L I T V DV S HK V DD S S G S I
V M E Y L P S V I E P S F G L G R I MY T V F E HT F QV R R T F F S F P A V V A P F K C S V L P L S QNQ E F M P F V K E L S E A L T R HG I S HK V DD S S G S I
I QT A L P S V I E P S Y G I G R I MY A L L E H S F R QR R T F L A F K P L V A P I K C S I L P I S A NDT L V P V MDA V K E E L S HY E L S Y K V DD S S G T I
I QT S L P S V I E P S Y G I G R I MY A L L E H S F R QR R T F L A F K P L V A P I K C S V L P I S A NDT L I P V MDA V K E E L S R F E M S Y K V DD S S G T I
V E K W L P NV I E P S F G I G R I L Y S I F E HQ F WC R R G V L S L P P I V A P T K V L L V P L S NN S E L Q P I V K K V S QA L R K E K I P F K V DD S S A S I
V E A Y L P NV I E P S F G I G R I I Y S I F E H S F W S R R A V L S F P P L V A P T K V L L V P L S NHK D L A P V T A QV S K I L R K E Q I A F R V DD S G V S I
V L E Y L P NV I E P S F G L G R I MY T V F E HT F HV R R T F F S F P A V V A P F K C S V L P L S QNQ E F M P F V K E L S E A L T R NG V S HK V DD S S G S I
V QA H L P S V I E P S F G F G R L L Y S T L E HN F K I R R T F F S L P A V I A P Y K C S L L P L S NK P D F N P F I T S L S L A L K K L G I S HK V DT S S G S I
I K E T L P NV I E P S F G I G R I L Y C V L E HT Y WA R R G V L S L P A L V A P I K C L I V S I S QDA Q L R S K I H E V S R E MR K R G I A S R V DD S S A T I
V T E A L P G V I E P S F G I G R I I Y C L L E H S F K I R R S Y L S L P A L I A P V K C S I L P I S S NA I F ND L I N L L HK S F I NHG I S C K V DT S S A S I
V T E A L P G V I E P S F G I G R I I Y C L L E H S F K I R R S Y L S L P A L I A P V K C S I L P I S S NA I F ND L I N L L HK S F I NHG I S C K V DT S S A S I
A M E Y L P NV I E P S F G I G R I MY S I F E HT F R I R R T Y F S F P A T V A P Y K C S V L P L S QNQ E F M P F V R E L S E A L T R NG V S HK V DD S S G S I
V E NW L P NV I E P S F G I G R I L Y S I F E HQ F WA R R T V L S L P P L V A P T K V L L V P L S S NA E L Q P I V K K I S A F L R K E QV P F K V DD S S A S I
I K K Y L P HV I E P S F G L G R I I Y S I L E QNY Y T R R G V L S L P A I I A P V K A S I L P L T S S DR I A P F V QT I S K A L K E A N I S T K V DDT G NA I
I T DA L P S V V E P S F G I G R I MY S L L E H S F QC R R C Y F T L P P L V A P I K C S I L P L S NNT D F Q P Y T QK L S S A L T K A E L S HK V DD S S G S I
I T E A L P S V V E P S F G I G R I MY A L L E H S F QC R R C Y F T L P P L V A P L K C S I L P L S NNA E F Q P Y T QK L S S S L T K A E L S HK V DD S S G S I
L M E T V P DV I E P S F G I G R I L Y A L I E H S F Y L R R P V F R F K P A I A P V QC A I G Y L I H F D E F N E H I L N I K R F L T DNG L V V HV N E R S C S I
L F A Y A P NV I E P S F G V G R V L T A V L E H S F WV R K S V L S I P A S I A P V K V G L F P L L T K L E F NNK I A E I E K I C K NG F L S F K S NT T A V A I
V M E Y L P NV I E P S F G I G R I MY T V F E HT F R I R R T F F S F P A V V A P F K C S V L P L S QNQ E F M P F V K E L S E A L T R NG I S HK V DD S S G S I
V M E Y L P NV I E P S F G L G R I MY T V F E HT F HV R R T F F S F P A V V A P F K C S V L P L S QNQ E F M P F V K E L S E A L T R HG V S HK V DD S S G S I
V E A R L P NV I E P S F G I G R I I Y S I F E H S F W S R R A V L S F P P L V A P T K V L L V P L L NN P E L S K I T A QV S Q I L R K E Q I P F K V D E S G V S I
V MA Y L P S V I E P S F G I G R I L Y C L L E Q S Y WV R R A V F S F S P L L A P QK V A L L P L MV K P E L L A T I S E I R Q E L V MR G I S V R V DD S S V T I
V M E Y L P -Y F G I K I G L L R NG Y S HT Q L T - - - - - - - - - - - P W L K P QV C T - - - - - - - - - - - - - - - - L QK A C S R - - - - - - - - - - - - - V E T V L P NV I E P S F G I G R I L Y C L L E HNY WT R R G V L S F T P V V A P T K V L I V P L S R HDD F V P F V QK I S QK L R S V G V S S R V DD S S A T I
V A DA L P HV I E P S Y G I DR I F Y G I M E HA F D E E R L V MH F S S A V A P V QV A V L P L L T R K E L A D P A K E I I A K L R E K T L L V NY DD S G -T I
V L DY L P NV I E P S F G L G R I MY T V F E HT F R V R R T F F S F P A I V A P Y K C S V L P L S QNQ E F A P F V R E L S E A L T R NG V S HK V DD S S G S I
V L E Y L P S V I E P S F G L G R I MY T I L E HT F HV R R T F F S F P A V V A P F K C S V L P L S QNQ E F M P F V K E L S E A L T R NG V S HK V DD S S G S I
V E A A I P NV I E P S F G I G R I L Y S L L E HNY WV R R G V I S F P P A V A P V K V L I V P I S S K A E F A P HV R R L S QK L R S V G I S S R V DD S S A S I
V V E A L P S V I E P S F G I G R I I Y C L F E H S F Y T R L NV F R F P P I V A P I K C T V F P L V K NQ E F DDA A K V I DK A L T T A G I S H I I DT T A I S I
V T DA L P S V I E P S F G I G R I MY C M F E HA F Y I R K T V L R L T P V V A P I K T T I F P L V NDDK L NA I A A E MNK M L T T NG I S A K L DA T A I S V
V M E Y L P NV I E P S F G L G R I MY T V F E HT F HV R R T F F S F P A V V A P F K C S V L P L S QNQ E F M P F V K E L S E A L T R HG V S HK V DD S S G S I
L I K Y V P HV I E P A F G I G R I L QA I I E H S F NQR K T F F K F S P R V A P V K C S I L S V V Q S E E F DNV I F E L T S S L K K L G I S C K T DNA G V A L
I QK W L P NV I E P S F G I G R I L Y S I F E HQY WA R R G V L S L P P L V A P T K V L L V P L S N S A D L Q P I V T K V S A Y L R K QQ I P F K V DD S S A S I
I Y A C L P NV I E P S F G I G R L I F C I L E H S F R I R R QY L S L P Y K L A P I K C S I L S I S NNK A F Y P Y I K Q I QM L L NQY N I S S K I DN S S V S I
I Y QW L P NV I E P S F G I G R L I F C I I E H S F R T R R HY L S L P Y T L A P I K C S V L T I S NHK T F I P F V K QV QM I L N E F S I S S K I DN S S V S I
I Y K I L P NV I E P S F G I G R L I F C I L E H S F R V R R HY L S L P Y A L S P I K C S V L S I S NNK E F Y P Y I K Q I QT I L S E NN I S C K L DN S S V S I
V V E A L P S V I E P S F G I G R I I Y C L Y E H S F Y MR QNV F R F P P L V A P I K C T V F P L V QNQQY E DV A K I I S K S L T A A G I S HK I D I T G T S I
V L E Y L P S V I E P S F G L G R I MY T I L E HT F HV R R T F F S F P A V V A P F K C S V L P L S QNQ E F M P F V K E L S E A L T R NG V S HK V DD S S G S I
I E S V L P NV I E P S F G L G R I I Y C I F DHC F QV R R G F F S F P L Q I A P I K V F V T T I S NNDG F P A I L K R I S QA L R K R E I Y F K I DD S NT S I
V E E A M P NV I E P S F G L G R I L Y V L M E HA Y WT R R G V L S F P A S I A P I K A L I V P L S R NA E F A P F V K K L S A K L R N L G I S NK I DD S NA N I
V V E H L P S V I E P S F G V G R I L Y S I L E HN F K V R R T Y F T L P P I I A P Y K C C V L P L S S NK D F E P L V K T L A QA L S NA S I S HK V D S S S G S I
A MD F L P NV I E P S F G I G R I MY T I F E HT F HV R R T F F S F P A T V A P Y K C S V L P L S QNQ E F V P F V R E L S E E L T R NG V S HK V DD S S G S I
L T K L I P Y V I E P S F G V G R I F S A I L E H S F R MR R T F F H L P P K I S P I K C S I L P V I S H E K Y NDA I HK L K V G L T K V G V S S K V DDT G HA I
L L NH L P C V I E P S F G L G R L I F S I L E H S Y R V R R K Y V A L NK S I A P T K C S V L P L S S K E V F E P L I T R V QA H L R R L G I S HK V DK T G A S I
I L NH L P C V I E P S F G L G R L I F S I L E HA Y R V R R K Y V S L HK S I A P T K C S I L P L S S K E V F E P L I S R V Q S Q L R S L G I S HK V DK T G A S I
I MDA L N I T V C G DK T V T Y E MY N I NDT V V T T R - - - - F F P NA L S S - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - V M E Y L P S V I E P S F G V G R I L Y A L L E Q S Y WV R R A V F G L R P L I A P QK V A V F P L L MK P E L I R T V E E I K E R M L L HG I S T R T DD S G A S I
V M E Y L P S V I E P S F G V G R I L Y A L L E Q S Y WV R R A V F S MR P V I A P QK V A V L P L L V K P E L L R V V E E I R G DMV L R G I S T R T DD S G A S I
V E E A I P NV I E P S F G I G R I F Y S L L E H S F WT R R G V L S L P P L V A P I K A S I V P I S S N E K L S P L V K QV S R K L R S A G V A S R V DD S NA S I
V E E W F P NV I E P S F G I G R I L Y S L I E HC F WT R K G V L S F P P R I A P T K V L V V P L S S QK E L A P F T Q E V S K K L R QA R I S A K V DD S S A S I
420
Anopheles_gambiae/1-3220
Arabidopsis_thaliana/1-3286
Ashbya_gossypii/1-3281
A_fumigatus/1-3241
Aspergillus_niger/1-3300
Bombyx_mori/1-2389
Bos_taurus/1-3273
C_briggsae/1-3231
Caenorhabditis_elegans/1-3289
Candida_albicans/1-3266
Candida_glabrata/1-3282
Canis_familiaris/1-3286
Ciona_intestinalis/1-2712
Cryptococcus_neoformans/1-3301
Cryptosporidium_hominis/1-3020
C_parvum/1-3281
Danio_rerio/1-3286
Debaryomyces_hansenii/1-3166
Dictyostelium_discoideum/1-3275
D_melanogaster/1-3287
Drosophila_pseudoobscura/1-3286
Encephalitozoon_cuniculi/1-3199
Entamoeba_histolytica/1-3271
Gallus_gallus/1-3286
Homo_sapiens/1-3286
Kluyveromyces_lactis/1-3281
Leishmania_major/1-3286
Macaca_mulatta/1-3182
Magnaporthe_grisea/1-3286
Methanosarcina_acetivorans/1-3043
Monodelphis_domestica/1-3196
Mus_musculus/1-3286
Neurospora_crassa/1-3264
Oryza_sativa/1-3218
Ostreococcus_lucimarinus/1-3287
Pan_troglodyte/1-3286
Paramecium_tetraurelia/1-3282
Pichia_stipitis/1-3290
P_falciparum/1-3285
P_knowlesi/1-3284
Plasmodium_yoelii/1-3285
Populus_trichocarpa/1-3286
Rattus_norvegicus/1-3286
Saccharomyces_cerevisiae/1-3284
Schizosaccharomyces_pombe/1-3284
Strongylocentrotus_purpuratus/1-3252
Takifugu_rubripes/1-2930
Tetrahymena_thermophila/1-3270
T_annulata/1-3248
Theileria_parva/1-3273
Trichomonas_vaginalis/1-3198
T_brucei/1-3264
Trypanosoma_cruzi/1-3266
Ustilago_maydis/1-3291
Yarrowia_lipolytica/1-3278
430
440
450
460
470
480
490
G R R Y A R T D E I A I P Y G I T I D F DT L K - E P HT V T L R E R D S MK QV R I G L D E V A NT I R D L A T G - -R T S WK Y E P - - P - - I P T R V G K K K G
G K R Y A R T D E L G V P F A I T V D S - - - - - -DT S V T I R E R D S K DQV R V T L K E A A S V V S S V S E G - -K MT WK F E P - -A A - P P A R V G R K QG
G K R Y A R ND E L G T P F G I T I D F E S I K - -DG S V T L R E R D S T R QV R G S V T D I I R A I R D I T Y N - -G V T WK Y E P - - P - -V Q S K I G R K K G
G K R Y A R ND E L G T P F G I T V D F Q S V K - -DNT F T L R DR DT T K QV R A S E D E I L QA I K S L V DG - - E K T WK Y E P - - P P P P T T R I G R K K G
G K R Y A R ND E L G I P L G I T I D F D S V K - -DG T I T L R E R D S T K QV R A S K E E I L G A I E S L I S G - -K MNWR Y E P - - P P P P T T R L G R K K G
G R R Y A R T D E L G V P Y A V T V D F DT I K - E P HT V T L R E R D S MR QV R L P MA DV P T V V R D L S N S - -K I L W - - - - - - - - - - - - - - - - - - G R R Y A R T D E I G V A F G I T I D F DT V NK T P HT A T L R DR D S MR Q I R A E V S E L P S V V R D L A NG - - S I MWK Y E P - - P - -V P T R V G K K K G
G R R Y A R T D E I G I P F G I T V D F E S G K T T P Y T V T I R HA E T M S Q I R L E V S E L G R L I S D L V S G - -R QQWK Y E A - - P - - I P S R I G K K K G
G R R Y A R T D E I G I P F G I T V D F D S L K T T P F T V T I R HA E T M S Q I R L E V S E L G R L I S D L V A G - -R QQWK Y E A - - P - - I P S R I G K K K G
G K R Y A R ND E L G T P F G I T I D F D S V K - -DD S V T L R E R D S T K QV R G S I Q E I V E A I K D I T Y N - -DG T WK Y E P - - P - -V E S K F G K K K G
G K R Y A R ND E L G T P F G I T I D F D S V K - -DG S V T L R E R D S T K QV R G S V E A V I K A V R E I T Y N - -G A S WK Y E P - - P - -V Q S K F G R K K G
G R R Y A R T D E I G V A F G I T I D F DT V NK T P HT A T L R DR D S MR Q I R A E V S E L P S V V C D L A NG - - S I T WK Y E P - - P - -V P T R V G K K K G
G K R Y A R T D E I A V P F G I T V D F DT V K I E P HT A T L R E R D S L V Q I R A T V E E I P Q I V Y D L V Q E - -NT T WK Y E R - - P - - L P T R V G K R R G
G K K Y A R ND E L G T P F G C T V D F A T I Q - -NG T MT L R E R D S T S Q L I G P I E DV I S V V DQ L V K G - -V L DWK W E P - - P - -V P T R I G K K K G
G R R Y A R T D E I G I P F G I T I D F Q S V K - -DDT V T L R E R D S MK QV R I S S S E V P S V I S K I I NQ - -Q I T WK L E S - -A - - P P P M E MK R K G
G R R Y A R T D E I G I P F G I T I D F Q S V K - -DDT V T L R E R D S MK QV R I S S S E V P S V I S K I I NQ - -Q I T WK L E S - -A - - P P P M E MK R K G
G R R Y A R T D E I G V A F G I T I D F DT V NK N P HT A T L R DR D S MR Q I R A E V G E L P E I I R D L A NG - -A I T WK Y E P - - P - - I P T R V G K R K G
G K R Y A R ND E L G T P F G I T I D F D S V K - -D E S V T L R DR D S T K QV R G S L E D I V E A I K D I A Y N - -NV S WK Y E P - - P - -V E S K F G K K R G
G R K Y A R T D E I G V P F G V T I D F QT V E - -DNT V T L R E R DT T K QV R I P I S E L A S T L R K L C D L - -T V S WK Y Q P - - P P - P P T Q F G K K K G
G R R Y A R T D E I A I P Y G I T V D F DT L K - E P HT V T L R DR NT MK QV R V G L E E V V G V V K D L S T A - -R T T WK Y E P - - P - - I P T R V G K K K G
G R R Y A R T D E I A I P Y G I T V D F DT L K - E P HT V T L R DR NT MK QV R V G L E QV V G V V K D L A T A - -R T S WK Y E P - - P - - I P T R V G K K K G
G R K Y S S C D E L G I P F F I T F D P D F L K - -DR MV T I R E R D S MQQ I R V DV E K C P S I V L E Y I R G - -Q S R WN L QD - -T - -T T I N L R R R R E
G K K Y A QA D E A G I P F DV T V DY T S L S - -DNT V T L R DR DT T K Q I R I P I DK L V E T V HA L T Q L H P T T T F - - - - - - - - -M S MT L G K K R E
G R R Y A R T D E V G V A F G I T I D F DT V NR T P HT A T L R DR D S MR Q I R A E V S E L P A I I R D L A NG - -Y L T WK Y E P - - P - -V P T R V G K K K G
G R R Y A R T D E I G V A F G V T I D F DT V NK T P HT A T L R DR D S MR Q I R A E I S E L P S I V QD L A NG - -N I T WK Y E P - - P - -V P T R V G K K K G
G K R Y A R ND E L G T P F G V T I D F D S V T - -DG S I T L R E R D S T K QV R G S V A DV I K A I R E I T Y Q - -G V S WK Y E P - - P - -V E S K F G R K K G
G K K Y A R V D E L G I P F A I T C D F E G - - - -DG S V T L R E R DT A S QV R V P K L E V A S V V V D L C N P L Q P L T WK W E P - - P - -V A P E I G K R K G
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -K Y E P - - P - -V P T R V G K K K G
G R R Y S R ND E L G T P L G I T V D F QT V K - -DG T I T L R DR DT T V QV R A DQDK I V E A I Q E L V S G - -NK V WK Y E P - - P P R P T T R V G R K K G
G R R Y R R ND E I G T P Y S V T V DY DT L Q - -DG T V T I R DR D S MR QV R A P I NG I E NV L Y E L I Y R - -G R D F - - - - - - - - - - - - - - - - - - G K R Y A R T D E I G V A F G I T I D F DT V NK T P HT A T L R DR D S MR Q I R A E I S E L P K V V C S L A NG - -T MT WK Y E P - - P - -V P T R V G K K K G
G R R Y A R T D E I G V A F G I T I D F DT V NK T P HT A T L R DR D S MR Q I R A E V S E L P NV V R D L A NG - -N I T WK Y E P - - P - -V P T R V G K K K G
G R R Y A R ND E L G T P F G L T I D F QT L Q - -DG T F T L R E R D S T R QV R A E E E K I V DA I K A L V E G - - S K T WK Y E P - - P P K P T T R I G R K K G
G R R Y A R T D E I G V P F A V T V D S - - - - - -A T S V T I R E R D S K E Q I R V G I D E V A S V V K Q L T DG - -Q S T WK F E P - - P A -A P S R V G R K QG
G K R Y A R T D E L G V P F A V T V DHR S V T - - E NT V T V R E R D S C G QV R V P I P E V P G L L G R L C K M - -T V DWK Y V P - - P A - P P MR V G K K K G
G R R Y A R T D E I G V A F G V T I D F DT V NK T P HT A T L R DR D S MR Q I R A E I S E L P S I V R D L A NG - -N I T WK Y E P - - P - -V P T R V G K K K G
G K K Y A R T D E I G I P F A I T V DK E T L T - -A Q S V T L R E I E T T K QV R V P I A E V P R L I L E L S A G - - L I L WK K P P - - P - - P P QR V G R K K G
G K R Y A R ND E L G T P F G I T I D F D S I K - -D E S V T L R E R D S T K QV R G S F E DV V A A I K E I T Y T - -G T T WK Y E P - - P - -V E S K F G K K R G
G K K Y A R T D E I G I P F A V T I D F QT L K - -DK T I T L R E R D S M L Q I R I S M S H L V D I I N S M L HA - -K K NWK L E S - -V - - P I S HMG K K K G
G K K Y A R T D E I G I P F A V T I D F QT L K - -DK T V T L R E R D S M L QV R I D L S D L V E I V T S L L R Q - -K K T WK L E S - -A - - P P S H I G K R K G
G K K Y A R I D E I G I P F A V T I D F QT L K - -DK T I T L R DR D S M L Q I R V N I S E V S D I I N S L L S Q - -K S S WK L E S - - S - - P P S H I G K R K G
G K R Y A R T D E L G V P F A I T V D S - - - - - -T S S V T I R E R D S K DQ I R V NV E E A A S V V K S V T DG - -HT T WK F E P - -A A - P P A R V G R K QG
G R R Y A R T D E I G V A F G I T I D F DT V NK T P HT A T L R DR D S MR Q I R A E V S E L P S V V R D L A NG - -N I T WK Y E P - - P - -V P T R V G K K K G
G K K Y A R ND E L G T P F G I T I D F E T I K - -DQT V T L R E R N S MR QV R G T I T DV I S T I DK M L HN P D E S DWK Y E P - - P - -V Q S K F G R K K G
G R R Y A R ND E L G T P F G L T V D F E T L Q - -N E T I T L R E R D S T K QV R G S QD E V I A A L V S MV E G - -K S S F K Y E P - - P - -V P T R T G R R K G
G R R Y A R T D E I S V P F C I T V D F D S L K - E P HT V T L R DR DT F E QV R T L V S DV A D I I R D L S S D - -K I R WK Y E P - - P - -V P T R V G K K K G
G R R Y A R T D E I G V A F G I T I D F DT V NK T P HT A T L R DR D S MR Q I R A E V R E L P G I I R D L A NG - -T L S WK Y E P - - P - - I P T R V G K K K G
G R R Y A R T D E L G I P F G I T I DNDT L V - -DD S V T L R E I L T T K Q I R I P I NDV F R V V S D L A DG - - L I T WK DQM - - P - - - - - - - -R E K G K R Y A R T D E I G I P F C V T L D F Q S V N - -DDT V T L R E R DT MQQV R I K L DD L G E L I NN L L K D - -D I T WQ S R S - -D - -Q P V T F G K R K R
G K R Y A R T D E I G V P F C V T L D F Q S V N - -DDT V T L R E R D S MQQV R V K L DQV G Q L L S N L L K - - -D I T W - - - - - - - - - - - - -M I K R I K
- - - - - - - - - - - - - - - - - - - - - - - - - -HR S V S A E S - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - E E E S V N P K K K L T E E E R K K R
G K K Y A R V D E L G V P F C V T C DM E N - - - -DG C V T L R E R D S A QQV R I P K E K V A D I V A E MC R P L R P R E WK W E P - - P - -V A T D I G K K K G
G K K Y A R V D E L G I P F C V T C D F E T - - - -DG C V T L R E R D S A R QV R I P R E A V A DV V A E L S R P L R P R E WK WQ P - - P - -V A S D I G K K K G
G R R Y A R ND E L G T P F A C T L D F A S L S - -K G T MT L R E R DT T A QR I G P I DQV I DV I R Q L C DG - - S L DWK W E P - - P - - L P T R V G K K K G
G K R Y A R ND E MG T P F G I T V D F DT V K - -DN S V T L R E R D S T R QV R G S I DA V I A A I NV MT A D - -DV A WK Y E P - - P - -V NT R S HR K K G
500
Anopheles_gambiae/1-3220
Arabidopsis_thaliana/1-3286
Ashbya_gossypii/1-3281
A_fumigatus/1-3241
Aspergillus_niger/1-3300
Bombyx_mori/1-2389
Bos_taurus/1-3273
C_briggsae/1-3231
Caenorhabditis_elegans/1-3289
Candida_albicans/1-3266
Candida_glabrata/1-3282
Canis_familiaris/1-3286
Ciona_intestinalis/1-2712
Cryptococcus_neoformans/1-3301
Cryptosporidium_hominis/1-3020
C_parvum/1-3281
Danio_rerio/1-3286
Debaryomyces_hansenii/1-3166
Dictyostelium_discoideum/1-3275
D_melanogaster/1-3287
Drosophila_pseudoobscura/1-3286
Encephalitozoon_cuniculi/1-3199
Entamoeba_histolytica/1-3271
Gallus_gallus/1-3286
Homo_sapiens/1-3286
Kluyveromyces_lactis/1-3281
Leishmania_major/1-3286
Macaca_mulatta/1-3182
Magnaporthe_grisea/1-3286
Methanosarcina_acetivorans/1-3043
Monodelphis_domestica/1-3196
Mus_musculus/1-3286
Neurospora_crassa/1-3264
Oryza_sativa/1-3218
Ostreococcus_lucimarinus/1-3287
Pan_troglodyte/1-3286
Paramecium_tetraurelia/1-3282
Pichia_stipitis/1-3290
P_falciparum/1-3285
P_knowlesi/1-3284
Plasmodium_yoelii/1-3285
Populus_trichocarpa/1-3286
Rattus_norvegicus/1-3286
Saccharomyces_cerevisiae/1-3284
Schizosaccharomyces_pombe/1-3284
Strongylocentrotus_purpuratus/1-3252
Takifugu_rubripes/1-2930
Tetrahymena_thermophila/1-3270
T_annulata/1-3248
Theileria_parva/1-3273
Trichomonas_vaginalis/1-3198
T_brucei/1-3264
Trypanosoma_cruzi/1-3266
Ustilago_maydis/1-3291
Yarrowia_lipolytica/1-3278
510
520
530
540
550
560
570
P DA A L K L P QV - - - - - -T P HT R C R L K L L K L E R I K DY L L M E E E F I R NQ E DD L R G S P M S V G T L E E I I DDNHA I V S T S V G S E HY V S I
P E A A A R L P T V - - - - - -T P S T K C K L R L L K L E R I K DY L L M E E E F V A NQ E DD L R G T P M S V G N L E E L I D E NHA I V S S S V G P E Y Y V G I
P A T A E K L P NV - - - - - -Y P S T R C K L K L L R M E R I K DH L L L E E E F V T N S E E E I R G T P L S I G T L E E I I DDDHA I V T S P T T P D F Y V S I
P S A A S K L P D I - - - - - - F P T S R C K L R Y L R MQR V HDH L L L E E E Y V E NM E DDMR G S P MG V G N L E E L I DDDHA I V S S A T G P E Y Y V S I
P S T A S K L P D I - - - - - - F P T S R C K L R Y L R MQR V HDH L L L E E E Y V E NM E DDMR G S P MG V G N L E E L I DDDHA I V S S A T G P E Y Y V S I
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -M E E E F I R NQ E - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
P DV A S K L P L V - - - - - -T P HT QC R L K L L K L E R I K DY L L M E E E F I R NQ E DD L R G T P M S V G T L E E I I DDNHA I V S T S V G S E HY V S I
P DA A S K L P A V - - - - - -T P HA R C R L K L L K S E R I K DY L L M E Q E F I QNQ E D E L R G T P MA V G S L E E I I DDQHA I V S T NV G S E HY V N I
P DA A S K L P A V - - - - - -T P HA R C R L K L L K S E R I K DY L L M E Q E F I QNQ E D E L R G T P MA V G S L E E I I DDQHA I V S T NV G S E HY V N I
P DT A V K L P S V - - - - - -Y P NT R C K L K L L K L E R I K DH L L L E E E F V T NQ E D E L R G Y P MA I G T L E E I I DDDHA I V S S T A S S E Y Y V S I
P A T A E K L P NV - - - - - -V P S T R C K L K L L R M E R I K DH L L L E E E F V T N S E E E I R G N P L S I G T L E E I I DDDHA I V T S P T M P DY Y V S I
P DA A S K L P L V - - - - - -T P HT QC R L K L L K L E R I K DY L L M E E E F I R NQ E DD L R G T P M S V G T L E E I I DDNHA I V S T S V G S E HY V S I
P DA A NK L P S V - - - - - -T P HT R C R L R L L K Q E R I K DY L L M E E E Y I R NQ E DD L R G S P M S V G T L E E I I D E NHA I V S T S V G S E HY V S I
P DA S S R L P A V - - - - - -Y P T T R C K L K L L K M E R I QDY L L M E E E F V S NQA D E L R G S P MG V G T L E E I I DDDHA I V S S G G G S E Y Y V G I
P P QY A R L P A V - - - - - -V P NA K C R L R L L K Y E R I K DY L MM E Q E F I T S M E DD L R G S P MN I G T L E E I I D E NHA I V S S S V G S E Y Y V N I
P P QY A R L P A V - - - - - -V P NA K C R L R L L K Y E R I K DY L MM E Q E F I T S M E DD L R G S P MN I G T L E E I I D E NHA I V S S S V G S E Y Y V N I
P DA A S K L P L V - - - - - -T P HT QC R L K L L K QDR I K DY L L M E E E F I R NQ E DD L R G T P M S V G T L E E I I DDNHA I V S T S V G S E HY V S I
P DT A V K L P S V - - - - - -Y P S T R C K L K L L K L E R I K DH L L L E E E F V T NQ E D E L R G Y P M S I G T L E E I I DDDHA I V S S T A G S E Y Y V S I
A E T S T K L P V I - - - - - -T P H S K C K L K Q L K L E R I K DY L L M E Q E F L QNY D E E L R G D P L T V G N L E E I I DDNHA I V S S T V G P E HY V R I
P DA A MK L P QV - - - - - -T P HT R C R L K L L K L E R I K DY L MM E D E F I R NQ E DD L R G T P M S V G N L E E I I DDNHA I V S T S V G S E HY V S I
P DA A MK L P L V - - - - - -T P HT R C R L K L L K L E R I K DY L MM E D E F I R NQ E DD L R G T P M S V G N L E E I I DDNHA I V S T S V G S E HY V S I
G K A A S K P P QV - - - - - -Y P L MK C K L R Y L K L K K L A H L L S L E DN I L S L C E E Q L R G S P L S V G T L E E F V DDHHG I I T T G V G L E Y Y V N I
Y G NNNK L P Q I - - - - - -N P R T QC N L K K L R L E R L K D I L L I QR D F I E NQ E E E L R G S P L E V S K L H E M I DDHHA I I S S G NT MQY C V P V
P DA A S K P P L V - - - - - -T P HT QC R L K L L K L E R I K DY L L M E E E F I R NQ E DD L R G T P M S V G T L E E I I DDNHA I V S T S V G S E HY V S I
P DA A S K L P L V - - - - - -T P HT QC R L K L L K L E R I K DY L L M E E E F I R NQ E DD L R G T P M S V G T L E E I I DDNHA I V S T S V G S E HY V S I
P S T V E K L P S V - - - - - -Y P S T R C K L K L L R M E R I K DH L L L E E E Y V T N S E DD I R G T P L S I G T L E E I V DDDHA I V T S P T T P DY Y V S I
P DA A T R I P K V - - - - - -Y P NR A C L L R K Y R L E R C K DY L L L E E E F L R T I N E D I R G T P L E V A T L E E A V DD S HA I V S I S -G T E Y Y V P L
P DA A S K L P L V - - - - - -T P HT QC R L K L L K L E R I K DY L L M E E E F I R NQ E DD L R G T P M S V G T L E E I I DDNHA I V S T S V G S E HY I S I
T S A A A K L P A I - - - - - -Y P T S R C K L R L L R MQR T HDH L L L E E E F V E NQ E DDMR G S P MG V G T L E E M I DDDHA I V S S T T G P E Y Y V S I
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - E Q L T E P P L F I A T I L E V NG E I A L I R QHG NNQ E - - - -V
P DA A S K L P L V - - - - - -T P HT QC R L K L L K L E R I K DY L L M E E E F I R NQ E DD L R G T P M S V G T L E E I I DDNHA I V S T S V G S E HY V S I
P DA A S K L P L V - - - - - -T P HT QC R L K L L K L E R I K DY L L M E E E F I R NQ E DD L R G T P M S V G T L E E I I DDNHA I V S T S V G S E HY V S I
A S A A A K L P S V - - - - - -Y P T S R C K L R L L R MQR I HDH L L L E E E Y V E NQ E DDMR G S P MG V G V L E E L I DDDHA I V S S T S G P E Y Y V S I
P E A A A R L P NV - - - - - -A P L S K C R L R L L K L E R V K DY L L M E E E F V A A Q E DD L R G T P M S V G S L E E I I D E S HA I V S S S V G P E Y Y V G I
I E G S T R L P NV - - - - - -A P Q S K C K L R M L K L E R V K DY L L M E E E F V G NQ E D E MR G A P M S V G S L E E I I DDT HG I V S S S I G P E Y Y V N I
P DA A S K L P L V - - - - - -T P HT QC R L K L L K L E R I K DY L L M E E E F I R NQ E DD L R G T P M S V G T L E E I I DDNHA I V S T S V G S E HY V S I
V E QA S K L P A A - - - - - -T P I T K C R L K L L K N E R I K DY L L L E Q E F I E NQQ E Q L R G T P M I V G T L E E F V N E NHA I V S S S V G P E S Y S G I
P DT A V K L P S V - - - - - -Y P NT R C K L K L L K L E R I K DH L L L E E E F V T NQ E D E L R G Y P M S I G T L E E I I DDDHA I V S S T A G S E Y Y V S I
T S G H S K L P NV - - - - - -T P NT K C R L K L L K L E R I K DY L L L E E E Y I T NQ E DD L R G S P V S V G T L E E L I D E NHG I I A T S V G P E Y Y V N I
V P G H S K L P T V - - - - - -T P NT K C R L K L L K L E R I K DY L L L E E E F I T NQ E DD L R G S P M S V G T L E E L I D E NHG I I A T S V G P E Y Y V N I
A T G H S K L P T V - - - - - -T P NT K C R L K L L K L E R I K DY L L L E E E F I T NQ E DD L R G S P M S V G T L E E L I D E NHG I I A T S V G P E Y Y V N I
P E A A A R L P T V - - - - - -T P HT K C K L R L L K M E R I K DY L L M E E E F V A NQ E DD L R G S P M S V G N L E E L I D E NHA I V S S S V G P E Y Y V G I
P DA A S K L P L V - - - - - -T P HT QC R L K L L K L E R I K DY L L M E E E F I R NQ E DD L R G T P M S V G T L E E I I DDNHA I V S T S V G S E HY V S I
P A T A E K L P N I - - - - - -Y P S T R C K L K L L R M E R I K DH L L L E E E F V S N S E E E I R G N P L S I G T L E E I I DDDHA I V T S P T M P DY Y V S I
P DA S A K L P T V - - - - - - I P T T R C R L R L L K MQR I HDH L L M E E E Y V QNQ E D E I R G T P M S V G T L E E I I DDDHA I V S T A -G P E Y Y V S I
P DT A S K L P QV - - - - - -T P HT K C R L R L L K M E R I K DY L L M E E E F I R NQ E DD L R G T P M S V G S L E E I I DDNHA I V S A S V G S E Y Y V S I
P DA A S K L P L V - - - - - -T P HT QC R L K L L K Q E R I K DY L L M E E E F I R NQ E DD L R G T P M S V G T L E E I I DDNHA I V S T S V G S E HY V S I
- -K P A P V R I V - - - - - -T P I S K C R L R Q L K L DR I K DY L L M E Q E F I R NQ E E Q L R G S P M L I G T L E E F I D E DHA I V S S - I G P E Y Y A N I
Q L A P V R I P T V - - - - - -T P N S K C R L R L L K L E R I K DY L L L E E E Y I T NK S DD I R G S P M S V G T L E E I I D E NHA I V T S S I G P E Y Y V N I
P V L T NR L P L V NV K G K L T P N S K C R L R L L K L E R I K DY L L L E E E Y I T NK S DD I R G S P M S V G T L E E I I D E NHA I V T S S I G P E Y Y V N I
G A K NT H I P T V - - - - - -T P NA K C Q L R L L K L E R V K DW L K M E E E F I NNC E E E V R G S P MMV G T L E E I V DDDHA I V S R S V -QD F Y V T I
P DA A A K L P K I - - - - - -Y P S R A C L L K Q L R L E R C K DY L L L E D E L L T M I T DA L R G M P L E V G T L E E V I DDT HA I V S T A -G S E Y Y V A M
P DT A A K L P K I - - - - - -Y P V K A C L L K Q L R L E R C K DY L L L E E E L L K T I G DA L R G M P L E V G T L E E V I DDT HA I V S T A -G S E Y Y V P M
P D S S S K L P T V - - - - - -Y P NT R C R L K L L K L E R I K DH L L L E E E F V QNQ E DD L R G S P MA V G T L E E I I DD E HA I V S S A T G P E Y Y V S I
P E NA NK L P G V - - - - - -Y P T T R C K L K L L K M E R I K DH L L L E E E F V QNQ E DT L R G S P MG V G T L E E I I DDDHA I V S S T S G P E Y Y V S I
Alignment columns (ruler ticks): 590 600 610 620 630 640 650 660
Anopheles_gambiae/1-3220  LSFVDKD---QLEPGCSVLLNHKVHAVVGVLGDDTDPMVTVMKLEKAPQETYADIGGLDTQIQEIKESVELPLTHPEYYEEMG
Arabidopsis_thaliana/1-3286  LSFVDKD---QLEPGCSILMHNKVLSVVGILQDEVDPMVSVMKVEKAPLESYADIGGLEAQIQEIKEAVELPLTHPELYEDIG
Ashbya_gossypii/1-3281  LSFVDKE---LLEPGCSVLLHHKTMSIVGVLQDDADPMVSVMKIDKSPTESYNDIGGLEAQIQEIKEAVELPLTHPELYEEMG
A_fumigatus/1-3241  MSFVDKD---LLEPGASILLHHKSVSVVGVLTEESDPLVSVMKLDKAPTESYADIGGLESQIQEVRESVELPLLHPELYEEMG
Aspergillus_niger/1-3300  MSFVDKD---LLEPGASILLHHKSVSVVGVLTEESDPLVSVMKLDKAPTESYADIGGLETQIQEVRESVELPLLHPELYEEMG
Bombyx_mori/1-2389  ----------------------KVHAVVGVLGDDTDPMVSVMKLEKAPQETYADIGGLDTQIQEIKESVELPLTHPEYYEEMG
Bos_taurus/1-3273  LSFVDKD---LLEPGCSVLLNHKVHAVIGVLMDDTDPLVTVMKVEKAPQETYADIGGLDNQIQEIKESVELPLTHPEYYEEMG
C_briggsae/1-3231  MSFVDKE---QLEPGCSVLLNHKNHAVIGVLSDDTDPMVSVMKLEKAPQETYADVGGLDQQIQEIKEAVELPLTHPEYYEEMG
Caenorhabditis_elegans/1-3289  MSFVDKE---QLEPGCSVLLNHKNHAVIGVLSDDTDPMVSVMKLEKAPQETYADVGGLDQQIQEIKEAVELPLTHPEYYEEMG
Candida_albicans/1-3266  MSFVDKG---LLEPGCSVLLHHKTVAVVGVLQDDADPMVSVMKLDKSPTESYADIGGLESQIQEIKESVELPLTHPELYEEMG
Candida_glabrata/1-3282  LSFVDKE---LLEPGCSVLLHHKTMSIVGVLQDDADPMVSVMKIDKSPTESYSDIGGLESQIQEIKESVELPLTHPELYEEMG
Canis_familiaris/1-3286  LSFVDKD---LLELGCSVLLNHKVHAVIGVLMDDTDPLVIVMKVEKTPQETYADIRALDNQIQEIKESVELPLTHPEYYEEMG
Ciona_intestinalis/1-2712  LSFVDKD---LLEPGCTVLMNHKVHAVVGFLGDDVDPLVTVMKLEKAPKESYADIGGLDTQITEIKESVELPLTHPEYYEEMG
Cryptococcus_neoformans/1-3301  MSFVDKD---LLEPGCSVLLHHKTHAVVGVLADDTDPMVSVMKLDKAPTESYADIGGLESQIQEIKESVELPLTHPELYEEMG
Cryptosporidium_hominis/1-3020  LSFVDKN---QLEPGSSVLLHNKVYSVVGIMNDEVDPLVSVMKVDKAPLESYADIGGLEQQIQEIKEAVEIPLTHPELYDDIG
C_parvum/1-3281  LSFVDKN---QLEPGSSVLLHNKVYSVVGIMNDEVDPLVSVMKVDKAPLESYADIGGLEQQIQEIKEAVEIPLTHPELYDDIG
Danio_rerio/1-3286  LSFVDKD---LLEPGCSVLLNHKVHAVIGVLMDDTDPLVTVMKVEKAPQETYADIGGLDNQIQEIKESVELPLTHPEYYEEMG
Debaryomyces_hansenii/1-3166  MSFVDKG---LLEPSCSVLLHHKTVSIVGVLQDDADPMVSVMKLDKSPTESYADIGGLESQIQEIKEAVELPLTHPELYEEMG
Dictyostelium_discoideum/1-3275  MSFVDKS---KLYLGATVLLNNKTLSVVGVIDGEVDPMVNVMKVEKAPTESYSDIGGLEAQVQEMKEAIELPLTHPELYEEIG
D_melanogaster/1-3287  LSFVDKD---QLEPGCSVLLNHKVHAVVGVLSDDTDPMVTVMKLEKAPQETYADIGGLDTQIQEIKESVELPLTHPEYYEEMG
Drosophila_pseudoobscura/1-3286  LSFVDKD---QLEPGCSVLLNHKVHAVVGVLSDDTDPMVTVMKLEKAPQETYADIGGLDTQIQEIKESVELPLTHPEYYEEMG
Encephalitozoon_cuniculi/1-3199  MSFVDKD---LLEPGCTVLLNYKDNSVVGVLEGEMDPMVNVMKLEKAPSETYADIGGLEEQIQEIKESVELPLTNPELYQEMG
Entamoeba_histolytica/1-3271  LSIVDRE---LLEPGVQVLTHNHNKAIVGVLQNDEDPHVSVMKVDKAPLESYADVGGLEKQIQEIKEAVELPLSHPELYEEIG
Gallus_gallus/1-3286  LSFVDKD---LLEPGCSVLLNHKVHAVIGVLMDDTDPLVTVMKLEKAPQETYADIGGLDNQIQEIKESVELPLTHPEYYEEMG
Homo_sapiens/1-3286  LSFVDKD---LLEPGCSVLLNHKVHAVIGVLMDDTDPLVTVMKVEKAPQETYADIGGLDNQIQEIKESVELPLTHPEYYEEMG
Kluyveromyces_lactis/1-3281  LSFVDKE---LLEPGCSVLLHHKTMSVVGVLQDDADPMVSVMKMDKSPTENYSDIGGLEAQIQEIKEAVELPLTHPELYEEMG
Leishmania_major/1-3286  MSFVDKE---QLELGCSVLLHDRQHSIVGVLKDDVDPLVSVMKVDKAPEDTYADIGGLEQQIQEIKEAVEFPLSHPELYDEIG
Macaca_mulatta/1-3182  LSFVDKD---LLEPGCSVLLNHKVHAVIGVLMDDTDPLVTVMKVEKAPQETYADIGGLDNQIQEIKESVELPLTHPEYYEEMG
Magnaporthe_grisea/1-3286  MSFVDKD---LLEPGASVLLHHKSVSIVGVLTDDADPLVSVMKLDKAPTESYADIGGLEQQIQEVRESVELPLLHPELYEEMG
Methanosarcina_acetivorans/1-3043  LTQIPEECLGKIEPGMRVAVN-GAYSIISIVSRAADVRAQVMELINSPGVDYSMIGGLDDVLQEVRESVELPLTEPELFEDLG
Monodelphis_domestica/1-3196  LSFVDKD---LLEPGCSVLLNHKVHAVIGVLMDDTDPLVTVMKVEKAPQETYADIGGLDNQIQEIKESVELPLTHPEYYEEMG
Mus_musculus/1-3286  LSFVDKD---LLEPGCSVLLNHKVHAVIGVLMDDTDPLVTVMKVEKAPQETYADIGGLDNQIQEIKESVELPLTHPEYYEEMG
Neurospora_crassa/1-3264  MSFVDKD---LLEPGASVLLHHKSVSIVGVLTDDTDPAVSVMKLDKAPTESYADIGGLEQQIQEVRESVELPLLHPELYEEMG
Oryza_sativa/1-3218  LSFVDKD---QLEPGCSILMHNKVLSVVGILQDEVDPMVSVMKVEKAPLESYADIGGLDAQIQEIKEAVELPLTHPELYEDIG
Ostreococcus_lucimarinus/1-3287  ASFVDKS---QLEPGCAVLLHHKNSAVVGTLADDVDPMVSVMKVDKAPLESYADVGGLEDQIQEIKEAVELPLTHPELYEDIG
Pan_troglodyte/1-3286  LSFVDKD---LLEPGCSVLLNHKVHAVIGVLMDDTDPLVTVMKVEKAPQETYADIGGLDNQIQEIKESVELPLTHPEYYEEMG
Paramecium_tetraurelia/1-3282  MSFVDKD---QLEPGCSVLLNQRSYAVVGIMQDEIDPLLNVMKVDKAPLESYADIGGLEQQIQEIKEAVELPLTHPEIYEDMG
Pichia_stipitis/1-3290  MSFVDKG---LLEPGCSVLLHHKTVSVVGVLQDDADPMVSVMKLDKSPTESYADIGGLESQIQEIKEAVELPLTHPELYEEMG
P_falciparum/1-3285  LSFVDKD---LLEPGCSVLLNNKTNSVVGILLDEVDPLVSVMKVEKAPLESYADIGGLESQIQEIKEAVELPLTHPELYEDIG
P_knowlesi/1-3284  LSFVDKD---LLEPGCSVLLNNKTNSVVGILLDEVDPLVSVMKVEKAPLESYADIGGLESQIQEIKEAVELPLTHPELYEDIG
Plasmodium_yoelii/1-3285  LSFVDKD---LLEPGCSVLLNNKTNSVVGILLDEVDPLVSVMKVEKAPLESYADIGGLESQIQEIKEAVELPLTHPELYEDIG
Populus_trichocarpa/1-3286  LSFVDKD---QLEPGCAILMHNKVLSVVGLLQDEVDPMVSVMKVEKAPLESYADIGGLDAQIQEIKEAVELPLTHPELYEDIG
Rattus_norvegicus/1-3286  LSFVDKD---LLEPGCSVLLNHKVHAVIGVLMDDTDPLVTVMKVEKAPQETYADIGGLDNQIQEIKESVELPLTHPEYYEEMG
Saccharomyces_cerevisiae/1-3284  LSFVDKE---LLEPGCSVLLHHKTMSIVGVLQDDADPMVSVMKMDKSPTESYSDIGGLESQIQEIKESVELPLTHPELYEEMG
Schizosaccharomyces_pombe/1-3284  MSFVDKD---MLEPGCSVLLHHKAMSIVGLLLDDTDPMINVMKLDKAPTESYADIGGLESQIQEIKEAVELPLTHPELYEEMG
Strongylocentrotus_purpuratus/1-3252  LSFVDKD---QLEPGCTVLLNHKVLAIVGVLGDDTDPMVSVMKLEKAPQESYADIGGLDTQIQEIKESVELPLTHPEYYEEMG
Takifugu_rubripes/1-2930  LSFVDKD---LLEPGCSVLLNHKVHAVIGVLMDDTDPLVTVMKVEKAPQETYADIGGLDNQIQEIKESVELPLTHPEYYEEMG
Tetrahymena_thermophila/1-3270  LSFVDKD---QLEPGSTVLLNNRTMAVVGIMQDEVDPMLNVMKVEKAPLECYADIGGLEQQIQEVKEAVELPLTHPEIYEDMG
T_annulata/1-3248  LSFVDKE---LLEPGCSVLLHNKTNSIVGILLDDVDPLVSVMKVEKAPLESYDDIGGLEEQIQEIKEAVELPLTRPELYDDIG
Theileria_parva/1-3273  LSFVDKE---LLEPGCSVLLHNKTNSIVGILLDDVDPLVSVMKVEKAPLESYDDIGGLEEQIQEIKEAVELPLTRPELYDDIG
Trichomonas_vaginalis/1-3198  SSFVDRK---ALQIGCSVLLHEKALTIVGLLDDDANPLVDVMKVENAPLESFADIGGLEDQIVDIKEAVELPLTHPEQFDEIG
T_brucei/1-3264  LSFVDKE---KLELGCSVLLHDRYHNVVGLLESNTDPLVSVMKVDKAPQETYADIGGLEDQIQEIKEAVEFPLSHPELFDEVG
Trypanosoma_cruzi/1-3266  LSFVDKE---KLELGCSVLLHDRQHSVVGVLQNSIDPHVSIMKVEKAPQETYADIGGLEEQIQEIKEAVEFPLSHPELYDEVG
Ustilago_maydis/1-3291  MSFVDKD---LLEPGCSVLLHHKAMAIVGVLSDDADPMVSVMKLDKAPSESYADIGGLETQIQEIKEAVELPLTHPELYEEMG
Yarrowia_lipolytica/1-3278  MSFVDKD---LLEPGCSVLLHHKTVSVVGVLQDDADPMVSVMKLDKAPTESYADIGGLESQIQEIKESVELPLTHPELYEEMG
Alignment columns (ruler ticks): 670 680 690 700 710 720 730 740
Anopheles_gambiae/1-3220  IKPPKGVILYGPPGTGKTLLAKAVANQTSATFLRVVGSELIQKYLGDGPKLVRELFRVAEEHAPSIVFIDEIDAVGTKRYDSN
Arabidopsis_thaliana/1-3286  IKPPKGVILYGEPGTGKTLLAKAVANSTSATFLRVVGSELIQKYLGDGPKLVRELFRVADDLSPSIVFIDEIDAVGTKRYDAN
Ashbya_gossypii/1-3281  IKPPKGVILYGAPGTGKTLLAKAVANQTSATFLRIVGSELIQKYLGDGPRLCRQIFKVAAENAPSIVFIDEIDAIGTKRYDSN
A_fumigatus/1-3241  IKPPKGVILYGAPGTGKTLLAKAVANQTSATFLRIVGSELIQKYLGDGPRLVRQIFQVAAEHAPSIVFIDEIDAIGTKRYDST
Aspergillus_niger/1-3300  IKPPKGVILYGAPGTGKTLLAKAVANQTSATFLRIVGSELIQKYLGDGPRLVRQIFQVAAEHAPSIVFIDEIDAIGTKRYDST
Bombyx_mori/1-2389  IKPPKGVILYGPPGTGKTLLAKAVANQTSATFLRVVGSELIQKYLGDGPKLVRELFRVAEEHAPSIVFIDEIDAVGTKRYDSN
Bos_taurus/1-3273  IKPPKGVILYGPPGTGKTLLAKAVANQTSATFLRVVGSELIQKYLGDGPKLVRELFRVAEEHAPSIVFIDEIDAIGTKRYDSN
C_briggsae/1-3231  IRPPKGVILYGCPGTGKTLLAKAVANQTSATFLRIVGSELIQKYLGDGPKMVRELFRVAEENAPSIVFIDEIDAVGTKRYDSN
Caenorhabditis_elegans/1-3289  IRPPKGVILYGCPGTGKTLLAKAVANQTSATFLRIVGSELIQKYLGDGPKMVRELFRVAEENAPSIVFIDEIDAVGTKRYDSN
Candida_albicans/1-3266  IKPPKGVILYGAPGTGKTLLAKAVANQTSATFLRIVGSELIQKYLGDGPRLCRQIFQIAADHAPSIVFIDEIDAIGTKRYEST
Candida_glabrata/1-3282  IKPPKGVILYGAPGTGKTLLAKAVANQTSATFLRIVGSELIQKYLGDGPRLCRQIFKVAAENAPSIVFIDEIDAIGTKRYDSN
Canis_familiaris/1-3286  IKPPKGVILYGPPGTGKTLLAKAVANQTSATFLRVVGSQLIQKYLGNGPKLIRELFRVVEEHAPSIVFIDEIDAIGTKRYDSN
Ciona_intestinalis/1-2712  IKPPKGVILYGPPGTGKTLLAKAVANQTSATFLRVVGSELIQKYLGDGPKLVRELFRVAEEHAPSIVFIDEIDAIGTKRYESN
Cryptococcus_neoformans/1-3301  IRPPKGVILYGVPGTGKTLLAKAVANQTSATFLRIVGSELIQKYLGDGPKLVRELFRVAEENAPSIVFIDEIDAIGTKRYDST
Cryptosporidium_hominis/1-3020  IKPPKGVILYGPPGTGKTLLAKAVANETSATFLRVVGSELIQKYLGDGPKLVRELFRVAEENAPSIVFIDEIDAVGTKRHDSQ
C_parvum/1-3281  IKPPKGVILYGPPGTGKTLLAKAVANETSATFLRVVGSELIQKYLGDGPKLVRELFRVAEENAPSIVFIDEIDAVGTKRHDSQ
Danio_rerio/1-3286  IKPPKGVILYGAPGTGKTLLAKAVANQTSATFLRVVGSELIQKYLGDGPKLVRELFRVAEEHAPSIVFIDEIDAIGTKRYDSN
Debaryomyces_hansenii/1-3166  IKPPKGVILYGAPGTGKTLLAKAVANQTSATFLRIVGSELIQKYLGDGPRLCRQIFQIAGELAPSIVFIDEIDAIGSKRYESS
Dictyostelium_discoideum/1-3275  IKPPKGVILYGEPGTGKTLLAKAVANQTSATFLRVVGSELIQKYLGDGPKLVRELFRVADECAPSIVFIDEIDAVGTKRYDSQ
D_melanogaster/1-3287  IKPPKGVILYGPPGTGKTLLAKAVANQTSATFLRVVGSELIQKYLGDGPKLVRELFRVAEEHAPSIVFIDEIDAVGTKRYDSN
Drosophila_pseudoobscura/1-3286  IKPPKGVILYGPPGTGKTLLAKAVANQTSATFLRVVGSELIQKYLGDGPKLVRELFRVAEEHAPSIVFIDEIDAVGTKRYDSN
Encephalitozoon_cuniculi/1-3199  IKPPKGVILYGLPGTGKTLLAKAVANQTSATFLRVVGTELIQEYLGEGPKLVRELFRVADMHAPSIIFIDEIDAIGGKRYNTS
Entamoeba_histolytica/1-3271  IKPPKGVILYGPPGTGKTLLAKAVANETSATFLRIVGSELIQKYLGDGPKLVRELFQAAKDSAPSIVFIDEIDAVGTKRYDAH
Gallus_gallus/1-3286  IKPPKGVILYGPPGTGKTLLAKAVANQTSATFLRVVGSELIQKYLGDGPKLVRELFRVAEEHGPSIVFIDEIDAIGTKRYDSN
Homo_sapiens/1-3286  IKPPKGVILYGPPGTGKTLLAKAVANQTSATFLRVVGSELIQKYLGDGPKLVRELFRVAEEHAPSIVFIDEIDAIGTKRYDSN
Kluyveromyces_lactis/1-3281  IKPPKGVILYGAPGTGKTLLAKAVANQTSATFLRIVGSELIQKYLGDGPRLCRQIFKVAAENAPSIVFIDEIDAIGTKRYESN
Leishmania_major/1-3286  IKPPKGVILYGVPGTGKTLLAKAVANRTSATFLRVVGSELIQKYSGEGPKLVRELFRVAEEHSPAIVFIDEIDAIGTKRYDTD
Macaca_mulatta/1-3182  IKPPKGVILYGPPGTGKTLLAKAVANQTSATFLRVVGSELIQKYLGDGPKLVRELFRVAEEHAPSIVFIDEIDAIGTKRYDSN
Magnaporthe_grisea/1-3286  IKPPKGVILYGAPGTGKTLLAKAVANQTSATFLRIVGSELIQKYLGDGPRLVRQLFQVAAENAPSIVFIDEIDAIGTKRYDST
Methanosarcina_acetivorans/1-3043  IEPPSGVLLHGAPGTGKTLIAKAIASQAKATFIRMSGSDLVQKFVGEGSRLVKDIFQLARDKSPSILFIDEIDAVGSMRTYDG
Monodelphis_domestica/1-3196  IKPPKGVILYGPPGTGKTLLAKAVANQTSATFLRVVGSELIQKYLGDGPKLVRELFRVAEEHAPSIVFIDEIDAIGTKRYDSN
Mus_musculus/1-3286  IKPPKGVILYGPPGTGKTLLAKAVANQTSATFLRVVGSELIQKYLGDGPKLVRELFRVAEEHAPSIVFIDEIDAIGTKRYDSN
Neurospora_crassa/1-3264  IKPPKGVILYGAPGTGKTLLAKAVANQTSATFLRIVGSELIQKYLGDGPRLVRQLFQVAAENAPSIVFIDEIDAIGTKRYDST
Oryza_sativa/1-3218  IRPPKGVILYGEPGTGKTLLAKAVANSTSATFLRVVGSELIQKYLGDGPKLVRELFRVADELSPSIVFIDEIDAVGTKRYDAH
Ostreococcus_lucimarinus/1-3287  IKPPKGVILYGAPGTGKTLLAKAVANSTSATFLRIVGSELIQKYLGDGPKLVRELFRVADEMSPSIVFMDEIDAVGTKRYDSQ
Pan_troglodyte/1-3286  IKPPKGVILYGPPGTGKTLLAKAVANQTSATFLRVVGSELIQKYLGDGPKLVRELFRVAEEHAPSIVFIDEIDAIGTKRYDSN
Paramecium_tetraurelia/1-3282  IKPPKGVILYGEPGTGKTLLAKAVANETSATFLRVVGSELIQKYQGDGPKLVRELFRVAEEHAPSIVFIDEIDAVGTKRYDSH
Pichia_stipitis/1-3290  IKPPKGVILYGAPGTGKTLLAKAVANQTSATFLRIVGSELIQKYLGDGPRLCRQIFQIAGEHAPSIVFIDEIDAIGTKRYEST
P_falciparum/1-3285  IKPPKGVILYGPPGTGKTLLAKAVANETSATFLRVVGSELIQKYLGDGPKLVREMFKVAEEHAPSIVFIDEIDAVGTKRYEAT
P_knowlesi/1-3284  IKPPKGVILYGPPGTGKTLLAKAVANETSATFLRVVGSELIQKYLGDGPKLVREMFKVAEDHAPSIVFIDEIDAVGTKRYEAT
Plasmodium_yoelii/1-3285  IKPPKGVILYGPPGTGKTLLAKAVANETSATFLRVVGSELIQKYLGDGPKLVREMFKVAEDHAPSIVFIDEIDAVGTKRYEAT
Populus_trichocarpa/1-3286  IKPPKGVILYGEPGTGKTLLAKAVANSTSATFLRVVGSELIQKYLGDGPKLVRELFRVADDLSPSIVFIDEIDAVGTKRYDAH
Rattus_norvegicus/1-3286  IKPPKGVILYGPPGTGKTLLAKAVANQTSATFLRVVGSELIQKYLGDGPKLVRELFRVAEEHAPSIVFIDEIDAIGTKRYDSN
Saccharomyces_cerevisiae/1-3284  IKPPKGVILYGAPGTGKTLLAKAVANQTSATFLRIVGSELIQKYLGDGPRLCRQIFKVAGENAPSIVFIDEIDAIGTKRYDSN
Schizosaccharomyces_pombe/1-3284  IKPPKGVILYGAPGTGKTLLAKAVANQTSATFLRVVGSELIQKYLGDGPRLVRQLFNAAEEHSPSIVFIDEIDAIGTKRYDAQ
Strongylocentrotus_purpuratus/1-3252  IRPPKGVILYGAPGTGKTLLAKAVANQTSATFLRVVGSELIQKYLGDGPKLVRELFRVAEEHAPSIVFIDEIDAIGTKRYESN
Takifugu_rubripes/1-2930  IKPPKGVILYGPPGTGKTLLAKAVANQTSATFLRVVGSELIQKYLGDGPKLVRELFRVAEEHAPSIVFIDEIDAIGTKRYDSN
Tetrahymena_thermophila/1-3270  IKPPKGVIMYGPPGTGKTLLAKAVANETSATFLRIVGSELIQKYAGEGPKLVRELFRVAEEHAPSIVFIDEIDAVGSKRYNTS
T_annulata/1-3248  IKPPKGVILYGPPGTGKTLLAKAVANETSATFLRVVGSELIQKYLGEGPKLVREMFKVAEDNAPSIIFIDEIDAIGTKRYDAT
Theileria_parva/1-3273  IKPPKGVILYGPPGTGKTLLAKAVANETSATFLRVVGSELIQKYLGEGPKLVREMFKVAEDNAPSIIFIDEIDAIGTKRYDAT
Trichomonas_vaginalis/1-3198  VQPPKGVILFGPPGTGKTLLARAVAKSTSATFLRVVGSELIQKYLGEGPKLVRELFKTAHELAPSIVFIDEIDAVGTKRYDST
T_brucei/1-3264  VKPPKGVILYGVPGTGKTLLAKAVANQTSATFLRVVGSELIQKYSGEGPKLVRELFRVAEENSPSIVFIDEIDAIGTKRYDTD
Trypanosoma_cruzi/1-3266  IKPPKGVILYGVPGTGKTLLAKAVANQTSATFLRVVGSELIQKYSGDGPKLVRELFRVAEENSPSIVFIDEIDAIGTKRYDTD
Ustilago_maydis/1-3291  IRPPKGVILYGVPGTGKTLLAKAVANQTSATFLRVVGSELIQKYLGDGPKLVRELFRVADEHAPSIVFIDEIDAVGTKRYDSN
Yarrowia_lipolytica/1-3278  IKPPKGVILYGAPGTGKTLLAKAVANQTSATFLRIVGSELIQKYLGDGPRLCRQIFQIAAEHAPSIVFIDEIDAIGTKRYEST
750
Anopheles_gambiae/1-3220
Arabidopsis_thaliana/1-3286
Ashbya_gossypii/1-3281
A_fumigatus/1-3241
Aspergillus_niger/1-3300
Bombyx_mori/1-2389
Bos_taurus/1-3273
C_briggsae/1-3231
Caenorhabditis_elegans/1-3289
Candida_albicans/1-3266
Candida_glabrata/1-3282
Canis_familiaris/1-3286
Ciona_intestinalis/1-2712
Cryptococcus_neoformans/1-3301
Cryptosporidium_hominis/1-3020
C_parvum/1-3281
Danio_rerio/1-3286
Debaryomyces_hansenii/1-3166
Dictyostelium_discoideum/1-3275
D_melanogaster/1-3287
Drosophila_pseudoobscura/1-3286
Encephalitozoon_cuniculi/1-3199
Entamoeba_histolytica/1-3271
Gallus_gallus/1-3286
Homo_sapiens/1-3286
Kluyveromyces_lactis/1-3281
Leishmania_major/1-3286
Macaca_mulatta/1-3182
Magnaporthe_grisea/1-3286
Methanosarcina_acetivorans/1-3043
Monodelphis_domestica/1-3196
Mus_musculus/1-3286
Neurospora_crassa/1-3264
Oryza_sativa/1-3218
Ostreococcus_lucimarinus/1-3287
Pan_troglodyte/1-3286
Paramecium_tetraurelia/1-3282
Pichia_stipitis/1-3290
P_falciparum/1-3285
P_knowlesi/1-3284
Plasmodium_yoelii/1-3285
Populus_trichocarpa/1-3286
Rattus_norvegicus/1-3286
Saccharomyces_cerevisiae/1-3284
Schizosaccharomyces_pombe/1-3284
Strongylocentrotus_purpuratus/1-3252
Takifugu_rubripes/1-2930
Tetrahymena_thermophila/1-3270
T_annulata/1-3248
Theileria_parva/1-3273
Trichomonas_vaginalis/1-3198
T_brucei/1-3264
Trypanosoma_cruzi/1-3266
Ustilago_maydis/1-3291
Yarrowia_lipolytica/1-3278
760
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S S G E R E I QR T M L E L
S G G E R D I QR T M L E L
S G G E R D I QR T M L E L
S G G E R E I QR T M L E L
S G G E R E V QR T M L E L
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S G G R R E V QR T M L E L
S G G E K E I QR T M L E L
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S G G E K E I QR T M L E L
S S G T K E V QR T M L E L
S G G E R E I QR T M L E L
S G G E R E V QR T M L E L
T S G S A E V NR T M L Q L
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S G G E K E I QR T M L E L
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S G A E R E I QR T M L E L
S G G E R E I QR T M L E L
S G G E R E I QR T M L E L
S G G E K E I QR T M L E L
S G G E K E I QR T M L E L
S G G E K E I QR T M L E L
S S G E R E V QR T M L E L
S G G A K E V QR T M L E L
S S G A K E V QR T M L E L
S G G E R E I QR T L L E L
S G G E R E V QR T M L E L
770
780
L NQ L DG F D - S R G DV K V I MA T NR
L NQ L DG F D - S R G DV K V I L A T NR
L NQ L DG F D -DR G DV K V I MA T NK
L NQ L DG F D -DR G DV K V I MA T NK
L NQ L DG F D -DR G DV K V I MA T NK
L NQ L DG F D - S R G DV K V I MA T NR
L NQ L DG F D - S R G DV K V I MA T NR
L NQ L DG F D - S R G DV K V L MA T NR
L NQ L DG F D - S R G DV K V L MA T NR
L NQ L DG F D -DR G D I K V I MA T NK
L NQ L DG F D -DR G DV K V I MA T NK
L NQ L DG F D - S R G DV K V I MA T NR
L NQ L DG F D - S R G DV K V I MA T NR
L NQ L DG F D -T R G DV K V I MA T NK
L NQ L DG F E -A R G DV K V I MA T NK
L NQ L DG F E -A R G DV K V I MA T NK
L NQ L DG F D - S R G DV K V I MA T NR
L NQ L DG F D -DR G D I K V I MA T NK
L NQ L DG F D -A R T DV K V I MA T NR
L NQ L DG F D - S R G DV K V I MA T NR
L NQ L DG F D - S R G DV K V I MA T NR
L NQ L DG F D -T R ND I K V I MA T NK
L NQ L DG F D -T R G E V K V I I A T NR
L NQ L DG F D - S R G DV K V I MA T NR
L NQ L DG F D - S R G DV K V I MA T NR
L NQ L DG F D -DR G DV K V I MA T NK
L T Q L DG F D - S S NDV K V I MA T NR
L NQ L DG F D - S R G DV K V I MA T NR
L NQ L DG F D -DR G DV K V I MA T NK
L A E MDG F D - P K G NV K V V A A T NR
L NQ L DG F D - S R G DV K V I MA T NR
L NQ L DG F D - S R G DV K V I MA T NR
L NQ L DG F D -DR G DV K V I MA T NK
L NQ L DG F D - S R G DV K V I L A T NR
L NQMDG F D - S R G DV K V I MA T NR
L NQ L DG F D - S R G DV K V I MA T NR
L NQ L DG F D - S R A DV K V I L A T NK
L NQ L DG F D -DR G D I K V I MA T NK
L NQ L DG F D - S R G DV K V I MA T NR
L NQ L DG F D - S R G DV K V I MA T NR
L NQ L DG F D - S R G DV K V I MA T NR
L NQ L DG F D - S R G DV K V I L A T NR
L NQ L DG F D - S R G DV K V I MA T NR
L NQ L DG F D -DR G DV K V I MA T NK
L NQ L DG F DT S QR D I K V I MA T NR
L NQ L DG F D - S R G DV K V I MA T NR
L NQ L DG F D - S R G DV K V I MA T NR
L NQ L DG F D - S R T DV K V I L A T NK
L NQ L DG F D - S Q S DV K V I MA T NK
L NQ L DG F D - S Q S DV K V I MA T NK
L NQ L DG F D -DR G D I K V I MA T NR
L T Q L DG F D - S C NDV K V I MA T NR
L T Q L DG F D - S S NDV K V I MA T NR
L NQ L DG F D -T R HDV K V I MA T NR
L NQ L DG F D -DR G DV K V I MA T NK
790
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
ET LD PA
E S LD PA
E S LD PA
ET LD PA
ET LD PA
ET LD PA
ET LD PA
E S LD PA
E S LD PA
E S LD PA
E S LD PA
ET LD PA
D S LD PA
EN LD PA
E S LD PA
E S LD PA
ET LD PA
E S LD PA
ET LD PA
ET LD PA
ET LD PA
EA LD PA
E S LD SA
ET LD PA
ET LD PA
E S LD PA
DT L D P A
ET LD PA
ET LD PA
D L LD PA
ET LD PA
ET LD PA
E S LD PA
E S LD PA
E S LD PA
ET LD PA
E S LD PA
E S LD PA
D S LD PA
D S LD PA
D S LD PA
E S LD PA
ET LD PA
ET LD PA
SD LD PA
ET LD PA
ET LD PA
E S LD PA
E S LD PA
E S LD PA
ET LD PA
ET LD PA
ET LD PA
E S LD PA
E S LD PA
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
L
I R PGR
LR PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R AGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
LR PGR
I R PGR
I R PGR
I R PGR
LR PGR
LR PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
LR PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
I R PGR
800
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
F DR S I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
I DR K I
810
E F P L PD EK T K R R I
E F P L PD I K T R R R I
L F E N P DV S T K R K I
L F E N P DQNT K K K I
L F E N P DQNT K K K I
E F P L PD EK T K R R I
E F P L PD EK T K K R I
E F P L PD EK T K R R I
E F P L PD EK T K R R I
L F E N P DA NT K K K I
L F EN PD L ST K R K I
E F P L PD EK T K K R I
E F PMPD EK T K R R I
E F P L P DT K T K R H I
E L P N P DC K T K R R I
E L P N P DC K T K R R I
E F P L PD EK T K R R I
L F E N P D S NT K K R I
E F P L PD I K T K R K I
E F P L PD EK T K R R I
E F P L PD EK T K R R I
E F G M P DA A T K K K I
E F P L PD I K T K R K I
E F P L PD EK T K K R I
E F P L PD EK T K K R I
L F EN PD I T T K R K I
E F P F PD EK T K R R I
E F P L PD EK T K K R I
L F E N P DQNT K R K I
EV P L PD EK GR V E I
E F P L PD EK T K K R I
E F P L PD EK T K K R I
L F E N P DQNT K R K I
E F P L PD I K T R R R I
E F P L P DV K T K R H I
E F P L PD EK T K K R I
E F P L P DV K NK K K I
L F E N P DA NT K K K I
Q L P N P DT K T K R R I
Q L P N P DT K T K R R I
Q L P N P DT K T K R R I
E F P L PD I K T R R R I
E F P L PD EK T K K R I
L F EN PD L ST K K K I
L F EN PD EAT K R K I
E F P L PD EK T K R R I
E F P L PD EK T K R R I
E F P V P DMK T K K K I
Q L PN PD SK T K R K I
Q L PN PD SK T K R K I
E L P F P DNK T K L K I
E F P F PD EK T K K MI
E F P F PD EK T K K MI
E F P L P DQK T K MH I
L F EN PD ST T K R K I
820
F N I HT A R MT L A E DV N L
F Q I HT S K MT L A E DV N L
L G I HT S K MN L S A DV D L
F T L HT S K M S L A DDV D L
F T L HT S K M S L G DDV D L
F T I HT S R MT L A DDV N L
F Q I HT S R MT L A DDV T L
F Q I HT S R MT L G DDV N L
F Q I HT S R MT L G K E V N L
L T I HT S K M S L A DDV N L
L G I HT S K MN L S S DV D L
F Q I HT S R MT L A DA V T L
F N I HT A R MT L S DDV NV
F K L HT S R M S L A DDV D I
F Q I HT S K MT L S DDV D L
F Q I HT S K MT L S DDV D L
F Q I HT S R MT V A E DV S L
L H I HT S K M S L A DDV K L
F E I HT A K MN L S E DV N L
F T I HT S R MT L A E DV N L
F T I HT S R MT L A E DV N L
F D I HT S R MT L D E S V N I
F E I HT S K MT L E E G V DM
F Q I HT S R MT L A DDV T L
F Q I HT S R MT L A DDV T L
V G I HT S K MN L A E DV D L
F E I HT S R M S L A E DV D I
F Q I HT S R MT L A DDV T L
F T L HT S K M S L N E DV D L
L K I HT R K MK L A DDV D F
F Q I HT S R MT L A DDV T L
F Q I HT S R MT L A DDV T L
F T L HT S K M S L N E DV D L
F Q I HT S K MT L A DDV N L
F N I HT G R MN L S A DV Q L
F Q I HT S R MT L A DDV T L
F Q I HT S K MN L G E DA N L
L T I HT S K M S L A DDV N L
F Q I HT S K MT M S P DV D L
F Q I HT S K MT M S P DV D L
F Q I HT S K MT M S P DV D L
F Q I HT A R MT L A DDV N L
F Q I HT S R MT L A DDV T L
L G I HT S K MN L S E DV N L
F T I HT S K MN L G E DV N L
F N I HT S R MT L S NDV N L
F Q I HT S R MT V A DDV T L
F E I HT S K MA L G E E V N F
F E I HT S K MT M S K DV D L
F E I HT S K MT M S K DV D L
F Q I HT A NMH L A P DV N L
F E I HT S R M S L A E DV D L
F E I HT S R M S L A E DV D I
F K L HT S R MN L D S DV D L
MG I HT S K MN L NDDV D L
840
Anopheles_gambiae/1-3220
Arabidopsis_thaliana/1-3286
Ashbya_gossypii/1-3281
A_fumigatus/1-3241
Aspergillus_niger/1-3300
Bombyx_mori/1-2389
Bos_taurus/1-3273
C_briggsae/1-3231
Caenorhabditis_elegans/1-3289
Candida_albicans/1-3266
Candida_glabrata/1-3282
Canis_familiaris/1-3286
Ciona_intestinalis/1-2712
Cryptococcus_neoformans/1-3301
Cryptosporidium_hominis/1-3020
C_parvum/1-3281
Danio_rerio/1-3286
Debaryomyces_hansenii/1-3166
Dictyostelium_discoideum/1-3275
D_melanogaster/1-3287
Drosophila_pseudoobscura/1-3286
Encephalitozoon_cuniculi/1-3199
Entamoeba_histolytica/1-3271
Gallus_gallus/1-3286
Homo_sapiens/1-3286
Kluyveromyces_lactis/1-3281
Leishmania_major/1-3286
Macaca_mulatta/1-3182
Magnaporthe_grisea/1-3286
Methanosarcina_acetivorans/1-3043
Monodelphis_domestica/1-3196
Mus_musculus/1-3286
Neurospora_crassa/1-3264
Oryza_sativa/1-3218
Ostreococcus_lucimarinus/1-3287
Pan_troglodyte/1-3286
Paramecium_tetraurelia/1-3282
Pichia_stipitis/1-3290
P_falciparum/1-3285
P_knowlesi/1-3284
Plasmodium_yoelii/1-3285
Populus_trichocarpa/1-3286
Rattus_norvegicus/1-3286
Saccharomyces_cerevisiae/1-3284
Schizosaccharomyces_pombe/1-3284
Strongylocentrotus_purpuratus/1-3252
Takifugu_rubripes/1-2930
Tetrahymena_thermophila/1-3270
T_annulata/1-3248
Theileria_parva/1-3273
Trichomonas_vaginalis/1-3198
T_brucei/1-3264
Trypanosoma_cruzi/1-3266
Ustilago_maydis/1-3291
Yarrowia_lipolytica/1-3278
850
S E L I MA K DD L S G A D I K A I C T E A G L MA
E E F V MT K D E F S G A D I K A I C T E A G L L A
E T L V T S K DD L S G A D I K A MC T E A G L L A
D E F I NQK DD L S G A D I R A I C T E A G L MA
D E F I NQK DD L S G A D I R A I C T E A G L MA
S E L I M S K DD L S G A D I K A I C T E A G L MA
DD L I MA K DD L S G A D I K A I C T E A G L MA
E E F I T A K D E L S G A D I K A MC T E A G L L A
E E F I T A K D E L S G A D I K A MC T E A G L L A
D E I V T G K DD L S G A D I K A I C T E A G L L A
E N L V T S K DD L S G A D I QA MC T E A G L L A
DD L I MA K DD L S G A D I K A I C T E A G L MA
D E HV QA K DD L S G A D I K A I C T E A G L L A
E E L V MT K D E L S G A D I K A V C T E A G L L A
E E F I MA K DD I S G A D I K A I C T E A G L L A
E E F I MA K DD I S G A D I K A I C T E A G L L A
DD L I L A K DD L S G A D I K A I C T E A G L MA
D E L V T S K D E L S G A D I K A MC T E A G L L A
E E F V M S K DD L S G A D I K A I C T E S G L L A
S E L I MA K DD L S G A D I K A I C T E A G L MA
S E L I MA K DD L S G A D I K A I C T E A G L MA
E L L I T SK ED L SGAD I K A I CT EAGMI A
E E F V M S K DD L S G A D I K A I C T E A G L L A
D E L I MA K DD L S G A D I K A I C T E A G L MA
DD L I MA K DD L S G A D I K A I C T E A G L MA
DN L V T S K DD L S G A D I K A MC T E A G L L A
S E F I HA K D E M S G A DV K A I C T E A G L L A
DD L I MA K DD L S G A D I K A I C T E A G L MA
E E F I A QK DD L S G A D I K A I C S E A G L MA
EK LAK V LT GK SGA E I SV I V K EAG I FV
DD L I MA K DD L S G A D I K A I C T E A G L MA
DD L I MA K DD L S G A D I K A I C T E A G L MA
E E F I A QK DD L S G A D I K A I C S E A G L MA
E E F V MT K D E F S G A D I K A I C T E A G L L A
E E F V MA K D E L S G A D I K A L C T E A G L L A
DD L I MA K DD L S G A D I K A I C T E A G L MA
D E F I NA K D E L S G A D I K A MC T E A G L L A
D E L V T S K DD L S G A D I K A I C T E A G L L A
E E FV MSK D E L SGAD I K A I CT EAG L LA
E E FV MSK D E L SGAD I K A I CT EAG L LA
E E FV MSK D E L SGAD I K A I CT EAG L LA
E E F V MT K D E F S G A D I K A I C T E A G L L A
DD L I MA K DD L S G A D I K A I C T E A G L MA
E T L V T T K DD L S G A D I QA MC T E A G L L A
E E L I QC K DD L S G A E I K A I V S E A G L L A
D E Y I T S K DD L S G A D I K A I C T E A G L MA
DD L I L A K DD L S G A D I K A I C T E A G L MA
DT F V HV K DD L S G A D I K A MC T E A G L L A
D E F V V NK DD L S G A D I K A MC T E A G L L A
D E F V V NK DD L S G A D I K A MC T E A G L L A
M E F A NT K D E I S G A D I K A I C S E A G L I A
S E F I HA K D E M S G A D I K A I C T E A G L L A
S E F I HA K E E M S G A D I K A I C T E A G L L A
E E F V A MK DD L S G A D I K S L V T E A G L L A
E E F V S S K D E L S G A D I K A MC T E A G L L A
860
870
880
890
900
910
L R E R R MK V T N E D F K K S K E S V L Y R K K E G T P - E G L Y Y L DA QA T T S MD P R V L DA MM P Y L T
L R E R R MK V T HV D F K K A K E K V M F K K K E G V P - E G L Y Y L DMQA T T P I D P R V F DA MNA S Q I
L R E R R MQV T V E D F K QA K E R V MK NK V E E N L - E G L Y Y L DMQA T T P T D P R V V DT M L K F Y T
L R E R R MR V QMDD F R A A R E R I MK T K QDG G P V E G L Y Y L DMQA T T P V D P R V L DA M L P Y L T
L R E R R MR V QMDD F R A A R E R I MK T K QDG G P V E G L Y Y L DMQA T T P T D P R V L DA M L P Y L T
L R E R R MK V T N E D F K K S K E S V L Y R K K E G T P - E G L Y - - - - - - - - - - - - - - - - - - - - - -
L R E R R MK V T N E D F K K S K E NV L Y K K Q E G T P - E G L Y Y MDV QA T T P L D P R V L DA M L P Y L V
L R E R R MR V T M E D F QK S K E NV L Y R K K E G A P - E E L Y Y L DV QA T S P MD P R V V DA M L P Y M I
L R E R R MR V T M E D F QK S K E NV L Y R K K E G A P - E E L Y Y L DV QA T A P MD P R V V DA M L P Y M I
L R E R R MQV K A E D F K S A K E R V L K NK V E E N L - E G L NY L DV QA T T P V D P R V L DK M L E F Y T
L R E R R MQV T A E D F K QA K E R V MK NK I E E N L - E G L Y Y MDMQA T T P T D P R V L DV M L K F Y T
L R E R R MK V T N E D F K K S K E NV L Y R K Q E G T P - E G L Y Y MDV QA T T P L D P R V L DA M L P Y L V
L R E R R MK V T S E D F K K S K E NV L Y R K N E G A P -QG L Y Y MDA QA T T P L D P R V L DK V M S Y Y V
L R E R R MR V T R T D F T T A R E K V L Y G K D E NT P -A G L Y Y L DMQA T T P MD P R V L DK M L P L F T
L R E R R MR V T Q E D L R K A K E K A L Y R K K G G I P - E G L Y Y F DY QA T T P V D P R V L DK MM P F F T
L R E R R MR V T Q E D L R K A K E K A L Y R K K G G I P - E G L Y Y F DY QA T T P V D P R V L DK MM P F F T
L R E R R MK V T N E D F K K S K E NV L Y K K Q E G T P - E G L Y Y MD F QA T T P MD P R V L DA M L P Y QV
L R E R R MQV K A E D F K A A K E R V L K NK V E E N L - E G L Y Y L DV QA T T P T D P R V L DR M L E F Y T
L R E R R MR V T HT D F K K A K E K V L Y R K T A G A P - E G L Y Y L DMQ S T T P I D P R V L DA M L P L Y T
L R E R R MK V T N E D F K K S K E S V L Y R K K E G T P - E G L Y Y L DA QA T T P MD P R V L DA M L P Y L T
L R E R R MK V T N E D F K K S K E S V L Y R K K E G T P - E G L Y Y L DA QA T T P MD P R V L DA M L P Y L T
L R E R R K T V T MK D F I S A R E K V F F S K QK MV S -A G L Y F L DV Q S T T P V D P R V L DA M L P F Y T
L R E R R MK V NQ E D F K K A K E K V MY R K K E G V P -DG L Y Y L DNNA T T MV D P E V L N S M L P Y F S
L R E R R MK V T N E D F K K S K E N F L Y K K T E G T P - E G L Y Y L DV QA T T P L D P R V L DR M L P Y L T
L R E R R MK V T N E D F K K S K E NV L Y K K Q E G T P - E G L Y Y MDV QA T T P L D P R V L DA M L P Y L I
L R E R R MQV T A QD F K E A K E R V L K NK V E E N L - E G L Y Y L DMQA T T P T D P R V L DT M L K F Y T
L R E R R MK V C QA D F I K G K E NV QY R K DK S T F - S R F Y Y MDNQA T T P L D P R V L DA M L P Y MT
L R E R R MK V T N E D F K K S K E NV L Y K K Q E G T P - E G L Y Y MDV QA T T P L D P R V L DA M L P Y L I
L R E R R MR V QMA D F R A A R E R V L R T K Q E G E P - E G L Y Y L DMQA T T P V D P R V L DA M L P L Y V
L R R R G K E I T MA D F MK A Y E K V V NV Q E P T I P -QA M F Y MDN S A T T P V R K E V V E E M L P Y L T
L R E R R MK V T N E D F K K S K E NV L Y K K Q E G T P - E G L Y Y MDV QA T T P L D P R V L DA M L P Y L I
L R E R R MK V T N E D F K K S K E NV L Y K K Q E G T P - E G L Y Y MDV QA T T P L D P R V L DA M L P Y L V
L R E R R MR V QMA D F R A A R E R V L R T K Q E G E P - E G L Y Y L DMQA T T P I D P R V L DA MM P Y F T
L R E R R MK V T HA D F K K A K E K V M F K K K E G V P - E G L Y Y MDMQA T T P V D P R V L DA M L P F Y L
L R E R R MQV T HA D F S K A K E K V L Y K K K E G V P - E G M F - - -MQA T T P L D P R V L DA M L P Y F T
L R E R R MK V T N E D F K K S K E NV L Y K K Q E G T P - E G L Y Y MDV QA T T P L D P R V L DA M L P Y L I
L R E R R MK I T Q E D F R K A K E K I L Y L K K G N I P - E G L Y Y L D F QA T T P T DY R V L DA M L P Y L T
L R E R R MQV K A DD F K S A K E R V L K NK V E E N L - E G L Y Y L DV QA T T P T D P R V L DK M L T F L T
L R E R R MK I T QA D L R K A R DK A L F QK K G N I P - E G L Y Y L D S QA T T M I D P R V L DK M L P Y MT
L R E R R MK I T QV D L R K A R DK A L Y QK K G N I P - E G L Y Y L D S QA T T M I D P R V L DK MM P Y MT
L R E R R MK I T Q L D L R K A R DK A L Y QK K G N I P - E G L Y Y L D S QA T T M I D P R V L DK MM P Y MT
L R E R R MK V T HT D F K K A K E K V M F K K K E G V P - E G L Y Y L DMQA T S P V D P R V L DA M L P Y Y L
L R E R R MK V T N E D F K K S K E NV L Y K K Q E G T P - E G L Y Y MDV QA T T P L D P R V L DA M L P Y L V
L R E R R MQV T A E D F K QA K E R V MK NK V E E N L - E G L Y Y L DMQA T T P T D P R V L DT M L K F Y T
L R E R R MR V V MDD F R QA R E K V L K T K D E G G P A G G L Y Y MD F QA T S P L DY R V L D S M L P F F T
L R E R R MK V NN E D F K K S K E NV L Y R K T E G T P - E G L Y Y L DA Q S T T P L D P R V MDA MM P Y S V
L R E R R MK V T N E D F K K S K E NV L Y K K Q E G T P - E G L Y Y MD F QA T T P MD P R V L DA M L P Y QV
L R E R R MK V T L DD F T K A K DK V L Y L K K G DT P -DG L Y Y L D F QA T T P L D F R V L DK MM P Y QT
L R E R R MQ I T QA D L MK A K E K V L F QK K G NV P -DV L Y Y L DNQA T T C V D P R V L D S MM P Y L T
L R E R R MQ I T QA D L MK A K E K V L Y QK K G NV P -DV L Y Y L DNQA T T C V D P R V L DA MM P Y L T
L R DG R L M E C QA D F R K G R E MV MY R R K E N I P - E G L Y Y L DT QA T S V L D P R V F DT M I P Y E T
L R DR R MK V C Q S D F V K G K E NV QY R K DK G R F - S K F Y Y L D L Q S T T P L D P R V L DK M L P Y MT
L R DR R MK V C QA D F V K G K E NV QY R K DK S S F - S K F Y Y L D F QA T T P L D P R V L DR M L P Y L T
L R E R R MR V T K K D F T T A R E R V I DR K N E G T P - E G L Y Y L DA Q S T T P V D P R V V DK MM P Y MT
L R E R R MR V T A E D F R T A K E R V MK NK V E E N L - E G L Y Y L DMQA T T P T D P R V L DV M L NY Y T
920
Anopheles_gambiae/1-3220
Arabidopsis_thaliana/1-3286
Ashbya_gossypii/1-3281
A_fumigatus/1-3241
Aspergillus_niger/1-3300
Bombyx_mori/1-2389
Bos_taurus/1-3273
C_briggsae/1-3231
Caenorhabditis_elegans/1-3289
Candida_albicans/1-3266
Candida_glabrata/1-3282
Canis_familiaris/1-3286
Ciona_intestinalis/1-2712
Cryptococcus_neoformans/1-3301
Cryptosporidium_hominis/1-3020
C_parvum/1-3281
Danio_rerio/1-3286
Debaryomyces_hansenii/1-3166
Dictyostelium_discoideum/1-3275
D_melanogaster/1-3287
Drosophila_pseudoobscura/1-3286
Encephalitozoon_cuniculi/1-3199
Entamoeba_histolytica/1-3271
Gallus_gallus/1-3286
Homo_sapiens/1-3286
Kluyveromyces_lactis/1-3281
Leishmania_major/1-3286
Macaca_mulatta/1-3182
Magnaporthe_grisea/1-3286
Methanosarcina_acetivorans/1-3043
Monodelphis_domestica/1-3196
Mus_musculus/1-3286
Neurospora_crassa/1-3264
Oryza_sativa/1-3218
Ostreococcus_lucimarinus/1-3287
Pan_troglodyte/1-3286
Paramecium_tetraurelia/1-3282
Pichia_stipitis/1-3290
P_falciparum/1-3285
P_knowlesi/1-3284
Plasmodium_yoelii/1-3285
Populus_trichocarpa/1-3286
Rattus_norvegicus/1-3286
Saccharomyces_cerevisiae/1-3284
Schizosaccharomyces_pombe/1-3284
Strongylocentrotus_purpuratus/1-3252
Takifugu_rubripes/1-2930
Tetrahymena_thermophila/1-3270
T_annulata/1-3248
Theileria_parva/1-3273
Trichomonas_vaginalis/1-3198
T_brucei/1-3264
Trypanosoma_cruzi/1-3266
Ustilago_maydis/1-3291
Yarrowia_lipolytica/1-3278
930
940
950
960
970
980
A Y Y G N P H S R T HA Y G W E T E E A V E K A R A QV A S L I G A -D P K E V V F T S G A T E S NN I S V K G I G R F K K H I I T T QT E HK C V
H E Y G N P H S R T H L Y G W E A E NA V E NA R NQV A K L I E A - S P K E I V F V S G A T E A NNMA V K G V MH F K K HV I T T QT E HK C V
G L Y G N P H S NT H S Y G W E T S Q E V E K A R K NV A DV I K A -D P K E I I F T S G A T E S NNMA L K G V A R F K NH I I T T R T E HK C V
G I Y G N P H S R T HA Y G W E S E K A V E QA R E Y I A K L I G A -D P K E I I F T S G A T E S NNM S I K G V A R F K K H I I T S QT E HK C V
G I Y G N P H S R T HA Y G W E S E K A V E QA R E HV A K L I G A -D P K E I I F T S G A T E S NNM S I K G V A R F K K H I I T T QT E HK C V
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -HK C V
NY Y G N P H S R T HA Y G W E S E A A M E C A R QQV A S L I G A -D P R E I I F T S G A T E S NN I A I K G V A R F K K H L I T T QT E HK C V
ND F G N P H S R T H S Y G WK A E E G V E QA R K Y V A D L I K A -D P R D I V F T S G A T E S NN L A I K G V A K F K NH I I T L QT E HK C V
ND F G N P H S R T H S Y G WK A E E G V E QA R E HV A N L I K A -D P R D I I F T S G A T E S NN L A I K G V A K F K NH I I T L QT E HK C V
G L Y G N P H S S T HA Y G W E T DK E V E K A R T Y I A DV I NA -D P K E I I F T S G A T E T NNMA I K G V P R F K K H I I T T QT E HK C V
G L Y G N P H S NT HA Y G W E T NK E V E T A R DHV A K V I R A -D P K E I I F T S G A T E S NN L A I K G V G R F K K H I I T T R T E HK C A
NY Y G N P H S R T HA Y G W E S E A A M E HA R QQV A S L I G A -D P R E I I F T S G A T E S NN I A I K G V A R F K K H L I T T QT E HK C V
S Y Y G N P H S R T HA Y G WQA E DA V E V A R QQV A DV I NA -D P R E I I F T S G A T E S NN L A V K G V G R F K K H I I T T Q I E HK C V
E QY G N P H S R T HA Y G W E A E K A V D E A R QQV A Q L V G A -Q P K D I V F T S G A T E S NNM L I K G I A K F K K H I I T T QT E HK C V
E K F G N S H S R T HG Y G W E A E E A V E NA R T N I A N L I K C - L P K E I I F T S G A T E S NNT I I R G V C D I K NH I I T T Q I E HK C V
E K F G N S H S R T HG Y G W E A E E A V E NA R T N I A N L I K C - L P K E I I F T S G A T E S NNT I I R G V C D I K NH I I T T Q I E HK C V
NY Y G N P H S R T HA Y G W E S E S A M E K A R K QV A G L I G A -D P R E I V F T S G A T E S NNM S I K G V A R F K MH I I T T Q I E HK C V
G L Y G N P H S S T H S Y G W E T DK E V E K A R K Y V A DV I NA -D P K E I I F T S G A T E S NNMA V K G V P R F K K H I I T T QT E HK C V
E NY G N P H S K T HA Y G WT S ND L V E DA R E K V S K I I G A -D S K E I I F T S G A T E S G N I A I K G V A R F K NH I I T T V T E HK C I
N F Y G N P H S R T HA Y G W E T E S A V E K A R E QV A T L I G A -D P K E I I F T S G A T E S NN I A V K G V A R F K R HV I T T QT E HK C V
NY Y G N P H S R T HA Y G W E S E T A V E K A R E QV A N L I G A - E T K E I I F T S G A T E S NN I A V K G V A R F K K HV V T T QT E HK C V
T V F G N P H S R T HR Y G WQA E A A V E K A R S QV A S L I G C -D P K E I I F T S G A T E S NN L A L K G V S G F A A H I I T L QT E HK C I
E I Y G N P N S - L HA F G QK A R K A L S D S L D I I Y E C I G A S DDDT V L I T A N S T E G NNT V L K T M L A R R NK I I V S Q I E H P S I
G C Y G N P H S R T HA Y G W E S E A A T E R A R R QV A D L I G A -D P R E V I F T S G A T E S NNMA I K G V A R F K K H I I T T QT E HK C V
NY Y G N P H S R T HA Y G W E S E A A M E R A R QQV A S L I G A -D P R E I I F T S G A T E S NN I A I K G V A R F K K H L I T T QT E HK C V
G L Y G N P H S NT H S Y G W E T NK E I E QA R K Y I A DV I K A -D P K E I I F T S G A T E S NNMA L K G V S R F R NH I I T T R T E HK C V
E E Y G N P N S R T HQY G W S A E E A V E K A R R QV A D L I G A - S P K E I F F T S G A T E C NN I A I K G V G N F K NH I I T L QT E HK C V
NY Y G N P H S R T HA Y G W E S E A A M E R A R QQV A S L I G A -D P R E I I F T S G A T E S NN I A I K G V A R F K K H L I T T QT E HK C V
G V Y G N P H S R T HA Y G W E S E K A V E DA R A HV A S L I G A -D P K E I I F T S G A T E S NNM S I K G V A R F K K H I I T T QT E HK C V
E N F G N P - S S I Y E L G K I S K HA V E NA R K R V A DA I G A - E E N E I Y F T S G G T E S DNWT V K G V A F A G K H I I T S S I E HHA V
NY Y G N P H S R T HA Y G W E S E A A V E HA R QQV A S L I G A -D P R E I I F T S G A T E S NN L A I K G V A R F K K HV I T T QT E HK C V
NY Y G N P H S R T HA Y G W E S E A A M E R A R QQV A S L I G A -D P R E I I F T S G A T E S NN I A I K G V A R F K K H L V T T QT E HK C V
NV Y G N P H S R T HA Y G W E T DK A V E E A R K H I A D L I G A -D P K E I I F T S G A T E S NNM S I K G V A R F K K H I I T S QT E HK C V
S R Y G N P H S R T H L Y G W E S DA A V E E A R A R V A S L V G A -D P R E I F F T S G A T E C NN I A V K G V MR F R R HV V T T QT E HK C V
E QY G N P H S R T HMY G W E T E DA I E K A R G E L A S L I G A -NA K E I V F T S G A T E S NNM S L K G V A R F K K H I I T T T T E HK C V
NY Y G N P H S R T HA Y G W E S E A A M E R A R QQV A S L I G A -D P R E I I F T S G A T E S NN I A I K G V A R F K K H L I T T QT E HK C V
NQY G N P H S K T H S F G W E T E K A V E NA R S Q I A N L I NT -Q P Q S I I F T S G A T E S NNA A L K G L Y G F K NH I I T T QT E HK C V
G MY G N P H S S T HA Y G W E T DK E V E K A R E Y V A A V I K A -D P K E I I F T S G A T E T NNMA I K G V P R F K K H I I T T QT E HK C V
Y I Y G NA H S R NH F F G W E S E K A V E DA R T N L L N L I NG K NNK E I I F T S G A T E S NN L A L I G I C T Y K NH I I T S Q I E HK C I
Y I Y G NA H S R NH F F G W E S E E A V E DA R K N I L H L I NG K NNK E I I F T S G A T E S NN L A L I G I C T Y K NH I I T S Q I E HK C I
Y I Y G NA H S R NH F F G W E S E QA V E DA R A N L I K L L NG NNNK E I I F T S G A T E S NN L A L I G T C T Y K NH I I T S Q I E HK C I
A R Y G N P H S R T H L Y G W E S DQA V E T A R S Q I A D L I G A - S P K E I V F T S G A T E S NN I S V K G V I K F K R HV V T T QT E HK C V
NY Y G N P H S R T HA Y G W E S E A A M E R A R QQV A S L I G A -D P R E I I F T S G A T E S NN I A I K G V A R F K K H L V T T QT E HK C V
G L Y G N P H S NT H S Y G W E T NT A V E NA R A HV A K M I NA -D P K E I I F T S G A T E S NNMV L K G V P R F K K H I I T T R T E HK C V
G I Y G N P H S R T HA Y G W E A E K A V E NA R Q E I A S V I NA -D P R E I I F T S G A T E S NNA I L K G V A R F K K H L V S V QT E HK C V
A Y Y G N P H S R T H S Y G W E S DDA V E HA R K QV A N L I G A -DA R E I I F T S G A T E S NN I S V K G T A R F K K HV I T T QT E HK C V
NY Y G N P H S R T HA Y G W E S E T A M E T A R K QV A D L I G A -D P R E I I F T S G A T E S NNMA I K G V A R F K R HV I T T QT E HK C V
NMY G N P H S R S H E Y G WA T E K A T E DA R A QV A D L I G A -D P K E I T F T S G A T E S NNQA L K G L A A F K K H I I T T Q I E HK C I
HA F G N P H S R T H S Y G W E A E K A V E T A R A DV A N L I NC - E S K NV I F T S G A T E S NN L A I K G S K S F K NHV I T T Q I E HK C V
HA F G N P H S R T H S Y G W E A E K A V E T A R A D I A N L I NC - E S K NV I F T S G A T E S NN L A I K G S K S F K NHV I T T Q I E HK C V
Y V HG NA H S K QHG F G Q E A MA A V E K A R K S V A D L I NA -K P N E I I F T S G A T E C NN I A I K G A MG Y K K HV I V S S I E HK C V
E MY G N P H S R T H S Y G WT A E E A V E K A R T QV A D L I R A - S P K G V F F T S G A T E S NN I A I K G V A NY K NH L I T L QT E HK C V
E R Y G N P H S R T HR Y G WT A E DA V E K A R A E V A D L I G T - S P K G V F F T S G A T E S NN I A I K G V A Y Y K NH I I T L QT E HK C V
NQY G N P H S R T HA Y G W E S E K G V E E G R E H I A S L I G A -D P K E I I F T S G A T E S NNMA I K G V A H F K NH I I T T QT E HK C V
DMY G N P H S R T H S Y G W E T DT A V E K A R E E I A A L I G A -D P K E I I F T S G A T E S NNMV I K G I A R F K R H I I T T QT E HK C I
990
LD SCR A L EG
L D S C R H L QQ
L E A A R S MK D
L D S C R H L QD
L D S C R H L QD
LD SCR A L EG
LD SCR S L EA
LD SCR Y L EN
LD SCR Y L EN
L D S A R HMQD
L EAAR GMI N
LD SCR S L EA
LD SCR A L EN
LD SCR WL ST
L ST LR E L E L
L ST LR E L E L
LD SCR V L ET
L D S A R HMQD
LD SCR H L EM
LD SCR A L EN
LD SCR A L EN
L DT C R N L E E
S E S EK Y LK E
LD SCR S L EA
LD SCR S L EA
L E A A R A MK N
LD SCR Y L EM
LD SCR S L EA
L D S C R H L QD
L HA C A W L E G
LD SCR S L EA
LD SCR S L EA
L D S C R H L QD
L D S C R Y L QQ
LD SCR Q L ER
LD SCR S L EA
LD SCR Y L E E
L D S A R HMQD
L QT C R F L QT
L QT C R Y L QT
L QT C R Y L QT
L D S C R H L QQ
LD SCR S L EA
L E A A R A MMK
LD S LR A LQ E
LD SCR V L EG
LD SCR V L E S
L DT C R N L E E
L QC C R Q L E N
L QC C R Q L E N
I E S A R A L QK
LD SCR Y L EM
LD SCR Y L EM
LD SCR R LQ E
L D S C R Y L QD
1000
Anopheles_gambiae/1-3220
Arabidopsis_thaliana/1-3286
Ashbya_gossypii/1-3281
A_fumigatus/1-3241
Aspergillus_niger/1-3300
Bombyx_mori/1-2389
Bos_taurus/1-3273
C_briggsae/1-3231
Caenorhabditis_elegans/1-3289
Candida_albicans/1-3266
Candida_glabrata/1-3282
Canis_familiaris/1-3286
Ciona_intestinalis/1-2712
Cryptococcus_neoformans/1-3301
Cryptosporidium_hominis/1-3020
C_parvum/1-3281
Danio_rerio/1-3286
Debaryomyces_hansenii/1-3166
Dictyostelium_discoideum/1-3275
D_melanogaster/1-3287
Drosophila_pseudoobscura/1-3286
Encephalitozoon_cuniculi/1-3199
Entamoeba_histolytica/1-3271
Gallus_gallus/1-3286
Homo_sapiens/1-3286
Kluyveromyces_lactis/1-3281
Leishmania_major/1-3286
Macaca_mulatta/1-3182
Magnaporthe_grisea/1-3286
Methanosarcina_acetivorans/1-3043
Monodelphis_domestica/1-3196
Mus_musculus/1-3286
Neurospora_crassa/1-3264
Oryza_sativa/1-3218
Ostreococcus_lucimarinus/1-3287
Pan_troglodyte/1-3286
Paramecium_tetraurelia/1-3282
Pichia_stipitis/1-3290
P_falciparum/1-3285
P_knowlesi/1-3284
Plasmodium_yoelii/1-3285
Populus_trichocarpa/1-3286
Rattus_norvegicus/1-3286
Saccharomyces_cerevisiae/1-3284
Schizosaccharomyces_pombe/1-3284
Strongylocentrotus_purpuratus/1-3252
Takifugu_rubripes/1-2930
Tetrahymena_thermophila/1-3270
T_annulata/1-3248
Theileria_parva/1-3273
Trichomonas_vaginalis/1-3198
T_brucei/1-3264
Trypanosoma_cruzi/1-3266
Ustilago_maydis/1-3291
Yarrowia_lipolytica/1-3278
1010
1020
1030
1040
1050
1060
1070
E G F HV T Y L P V Q S NG L I S M E E L E K A I T P - E T S L V S I MT V NN E I G V K Q P I A E I G R L C V F F HT DA A QA V G K I P L DV NK MN I D L M S I
E G F E V T Y L P V K T DG L V D L E M L R E A I R P -DT G L V S I MA V NN E I G V V Q P M E E I G M I C V P F HT DA A QA I G K I P V DV K K WNV A L M S M
E G F DV T F L NV N E DG L V S L E E L E QA I R P - E T S L V S V M S V NN E I G V V Q P I K E I G A I C V F F H S DA A QA Y G K I P I DV D E MN I D L L S I
E G F E V T Y L P V QNNG L I R M E D L E A A I R P -DT A L V S I MA V NN E I G V I Q P L E E I G K L C V F F HT DA A QA V G K I P L DV NK L N I D L M S I
E G F DV T Y L P V Q S NG L I R M E E L E A A I R P -DT A L V S I MA V NN E I G V I Q P M E E I G K L C I F F HT DG A QA V G K I P L DV NK L N I D L M S I
E G F R I T Y L P V QQNG I I N L K D L E DA I T P - E T S L V S I MT V NN E I G V R Q P I E A I G A I C V F F HT DA A QA V G K V P L DV NT MN I D L M S I
E G F K V T Y L P V K K S G I I D L K E L E A A I Q P -DT S L V S V MT V NN E I G V K Q P I K E I G Q I C V Y F HT DA A QA V G K I P L DV NDMK I D L M S I
E G F K V T Y L P V DK G G MV DM E Q L E Q S I T P - E T C L V S I M F V NN E I G V V Q P I K Q I G E L C V Y F HT DA A QA T G K V P I DV ND L K I D L M S I
E G F K V T Y L P V DK G G MV DM E Q L T Q S I T A - E T C L V S I M F V NN E I G V MQ P I K Q I G E L C V Y F HT DA A QA T G K V P I DV N E MK I D L M S I
E G F E V T Y L P V S S E G L I N L DD L K K A I R K -DT V L V S I MA V NN E I G V I Q P L K E I G K I C V F F HT DA A QA Y G K I P I DV N E MN I D L L S I
E G F DV T F L S V DNQG L I DMK E L E E A I R P -DT C L V S V MA V NN E I G V MQ P L K E I G A L C I Y F HT DA A QA Y G K V P I DV N E MN I D L L S V
E G F QV T Y L P V K K S G I I D L K E L E S A I Q P -DT S L V S V MT V NN E I G V K Q P I A E I G Q I C V Y F HT DA A QA V G K I P L DV NDMK I D L M S I
E G F K V T Y L P V K P NG I V D L K V L E E S F Q P -DT S L V S I I F V NN E I G - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
QG F E V T Y L P V L P NG L V S I N E L K A A L R P -DT S L V S I MA V NN E I G V I Q P L A E I S QA I P L F HT DA A QA V G K I P I DV E A L G I DA M S I
K G F R V T Y L K V NNK G L I S L E E L E K S I I P G E T I L A S I MHV NN E I G V I Q P MN L I G E I C V L F H S DV A QG L G K I N I DV DK WNA D F L S L
K G F R V T Y L K V NNK G L I S L E E L E K S I I P G E T I L A S I MHV NN E I G V I Q P MN L I G E I C V L F H S DV A QG L G K I N I DV DK WNA D F L S L
E G F D I T Y L P V K S NG L I D L K Q L E DT I R P -DT S L V S I MA I NN E I G V K Q P V K E I G H L C V F F HT DA A QA V G K I P V DV T DWK V D L M S I
E G F E V T Y L P V N E E G L I S L DD L R K S I R K -DT S L V S I MA V NN E I G V V Q P L K E I G K I C I F F HT DA A QA Y G K I D I DV N E MN I D L M S I
E G F K V T Y L P V G E NG L V D L E L L K NT I T P -QT S L V T I MA V NN E I G V V Q P I K E I G K I C V F F HT DA A QA V G K I P I DV NDMN I D L L S I
E G F K V T Y L P V L A NG L I D L QQ L E E T I T S - E T S L V S I MT V NN E I G V R Q P V D E I G K L C V F F HT DA A QA V G K V P L DV NA MN I D L M S I
E G F T V T Y L P V QT NG I I D L K Q L E E A L T P - E T S L V S I MA V NN E I G V K Q P I D E I G R L C V F F HT DA A QA V G K I P MDV NA MN I D L M S I
NG V E V T Y L P V G NDG V V D I DDV K K S I K E -NT V L V S I G A V N S E I G T V Q P L K E I G M L C V L F HT DA A QG V G K I Q I DV N E MN I D L L S M
R G I E V I K M P V N E DG V V D P K D L E R L I DD -K T A L V S C MWV NN E T G L I M P V E E L C K I A A L F H S DA T QA MG K I K V S V K DV P V DY L T F
E G F Q I T Y L P V QK NG L I D L K E L E A A F Q P -DT S L V S V MA V NN E I G V K Q P I R D I G E I C V F F HT DA A QA V G K I P L DV ND S K I D L M S I
E G F QV T Y L P V QK S G I I D L K E L E A A I Q P -DT S L V S V MT V NN E I G V K Q P I A E I G R I C V Y F HT DA A QA V G K I P L DV NDMK I D L M S I
E G Y E I T F L NV D E QG L I N L E E L E A A I R P - E T C L V S V MA V NN E I G V MQ P L K E I G E L C V F F HT DA A QA Y G K I P I DV N E MK I D L M S I
E G F E V T Y L P V QK NG I L D L K V L E A A I K P -T T C L V S C MA A HN E I G V L Q P I R E I G A L C V L F HT DA A QA L G K V K V DV NA DN I D L M S M
E G F QV T Y L P V QK S G I I D L K E L E A A I Q P -DT S L V S V MT V NN E I G V K Q P I A E I G Q I C V Y F HT DA A QA V G K I P L DV NDMK I D L M S I
E G F E V T Y L P V QN S G L V D L K E L E A A MR P - E T A L V S I MT V NN E I G V I Q P V E E I G K MC I F F HT DA A QA V G K I P MDV NA MN I D L M S I
QG F E V T Y L P V DR Y G MV S P E E L K NA I R D -DT I L I S I M L A NN E I G T I Q P V E E I G K I S I Y F HT DA V QA I G HV P I DV K K MNV D L L S L
E G F QV T Y L P V QK S G I I D L K E L E A A I Q P -DT S L V S I MT V NN E I G V K Q P I A D I G R I C V Y F HT DA A QA I G K I P L NV NDMK I D L M S I
E G F R V T Y L P V QK S G I I D L K E L E A A I Q P -DT S L V S V MT V NN E I G V K Q P I A E I R Q I C V Y F HT DA A QA V G K I P L DV NDMK I D L M S I
E G F E V T Y L P V K S S G L I DMA E L E A A I R P -DT A I V S I MA V NN E I G V I Q P L E E I G K L C I F F HT DA A QA V G K I P V DV NA MN I D L M S I
E G F E V T Y L P V R P DG L V DV A Q L A DA I R P -DT G L V S V MA V NN E I G V V Q P L E E I G R I C V P F HT DA A QA L G K I P I DV NQMG I G L M S L
E G F DV T Y L P V K E NG L V D L K E L E A A MR D -DT A I V S V MA V NN E I G V I Q P L K A I G E L C I F F HT DG A QA V G K V P MDV NDMN I D L M S I
E G F QV T Y L P V QK S G I I D L K E L E A A I Q P -DT S L V S V MT V NN E I G V K Q P I A E I G R I C V Y F HT DA A QA V G K I P L DV NDMK I D L M S I
K G V E V T Y L P V D S NG L I S L QQ L Q E S I K S -NT L C V S V M L V NN E I G V I QN L K E I S R I C V Y V H S DMA QA I A K I P V DV QD L D I D L G S I
E G F DV T Y L P V D E HG L I S L DD L K A A I R K -DT I L V S V MA V NN E I G V V Q P L K E I G K I C I F F HT DA A QA Y G K I D I DV NDMN I D L L S I
K G F E V T Y L K P DT NG L V K L DD I K N S I K D -NT I MA S F I F V NN E I G V I QD I E N I G N L C I L F HT DA S QA A G K V P I DV QK MN I D L M S M
K G F E V T Y L K P E P NG I V K L E D I E K N I K E -NT I MA S F I HV NN E I G V I QD I E N I G L L C V I F HT DA S QA I G K I P I DV QK MN I D L L S M
K G F E V T Y L K P DA NG L I K L E D L K N S I K E -NT I L A S F I Y V NN E I G V I QD I E N I G K I C I I F HT DA S QA V G K I K I DV QK L N I D L L S L
E G F E V T Y L P V G NDG I V D L E K L K G S I R P -DT G L V S V MA V NN E I G V I Q P M E E I G E I C V P F HT DA A QA L G K I P I DV DK WNV S L M S L
E G F R V T Y L P V QK S G I I D L K E L E A A I Q P -DT S L V S V MT V NN E I G V K Q P I A E I G Q I C L Y F HT DA A QA V G K I P L DV NDMK I D L M S I
E G F E V T F L NV DDQG L I D L K E L E DA I R P -DT C L V S V MA V NN E I G V I Q P I K E I G A I C I Y F HT DA A QA Y G K I H I DV N E MN I D L L S I
E G F E V T F L P V QT NG L I N L D E L R DA I R P -DT V C V S V MA V NN E I G V C Q P L E E I G K I C V F F H S DA A QG Y G K I D I DV NR MN I D L M S I
E G F D I T Y L P V K P NG I I D L K E L E A A F R P -DT V L C S I MA I NN E I G V K Q P MK Q I G E MC V F F HT DA A X A V G K I P V DV NDMK I D L M S I
E G F S V T Y L P V QK NG L V D L E L L E A S I R P -DT S L L S V MT V NN E I G V QQ P I D E I G R I C V F L HT DA A QA V G K I P I NV S DWK V D L M S I
QG Y E I T Y L P V QK NG L V D L E V F K NA I R P -DT L V A S I I L V HN E I G V I QD I K T I G K I C V F F HT DA A QA L G K I P I NV D E MN I D L M S M
E G Y S V T Y L K P DK Y G M I L P D L V R K N I R P - E T F L C S V I HV NN E I G V I QN I S E I G R I C V I F HT DA A Q S F G K L P I D L K N L DV D L L S I
E G Y S V T Y L K P DK Y G M I L P E E V R K N I R P - E T F L C S V I HV NN E I G V I QD I A E I G K V C V I F HT DA A Q S F G K L P I D L K N L E V D L L S I
E G F DA T F L QV G K DG R V D P K E V A K N I R P -DT G L V S C M L V NN E I G S I N P V Q E I S K I C V W F HT DA A QG F G K I P I DV K K I G A N F M S I
E G F E V T Y L P V E K NG I V N L QK L E E A I R P -T T A L V S C MY V NN E I G V I Q P I G E I G K I C V L F HT DA A QA V G K L D I DV DR DN I D L M S V
DG F E V T Y L P V E K NG L V N L QK I E E A I R P -T T A L V S C MY V HN E I G V I Q P I S E I G N L C V L F HT DA A QA L G K V S I DV E R DN I D L M S L
E G F E V T Y L P V Q S NG L I D L K Q L E E A L R P -T T A L V S I MT V NN E I G V I Q P I K E I G Q L L P F F HT DA A QA A G K I R L DV N E L G I D L M S L
E G F E V T Y L P V L S S G L I DMK Q L E A A I R P -DT A L V S I MA V NN E I G V I Q P I A E I G A L C V F F HT DA A QA V G K I P I DV NA DK I DV M S I
Alignment block: column ruler ticks 1090 to 1160 (every 10 columns). Rows appear in the following order (sequence label/residue range):
Anopheles_gambiae/1-3220
Arabidopsis_thaliana/1-3286
Ashbya_gossypii/1-3281
A_fumigatus/1-3241
Aspergillus_niger/1-3300
Bombyx_mori/1-2389
Bos_taurus/1-3273
C_briggsae/1-3231
Caenorhabditis_elegans/1-3289
Candida_albicans/1-3266
Candida_glabrata/1-3282
Canis_familiaris/1-3286
Ciona_intestinalis/1-2712
Cryptococcus_neoformans/1-3301
Cryptosporidium_hominis/1-3020
C_parvum/1-3281
Danio_rerio/1-3286
Debaryomyces_hansenii/1-3166
Dictyostelium_discoideum/1-3275
D_melanogaster/1-3287
Drosophila_pseudoobscura/1-3286
Encephalitozoon_cuniculi/1-3199
Entamoeba_histolytica/1-3271
Gallus_gallus/1-3286
Homo_sapiens/1-3286
Kluyveromyces_lactis/1-3281
Leishmania_major/1-3286
Macaca_mulatta/1-3182
Magnaporthe_grisea/1-3286
Methanosarcina_acetivorans/1-3043
Monodelphis_domestica/1-3196
Mus_musculus/1-3286
Neurospora_crassa/1-3264
Oryza_sativa/1-3218
Ostreococcus_lucimarinus/1-3287
Pan_troglodyte/1-3286
Paramecium_tetraurelia/1-3282
Pichia_stipitis/1-3290
P_falciparum/1-3285
P_knowlesi/1-3284
Plasmodium_yoelii/1-3285
Populus_trichocarpa/1-3286
Rattus_norvegicus/1-3286
Saccharomyces_cerevisiae/1-3284
Schizosaccharomyces_pombe/1-3284
Strongylocentrotus_purpuratus/1-3252
Takifugu_rubripes/1-2930
Tetrahymena_thermophila/1-3270
T_annulata/1-3248
Theileria_parva/1-3273
Trichomonas_vaginalis/1-3198
T_brucei/1-3264
Trypanosoma_cruzi/1-3266
Ustilago_maydis/1-3291
Yarrowia_lipolytica/1-3278
S G HK I Y G P K G I G A L Y V -R R -K P R V R V E A I Q S G G G Q E R G L R S G T V P T P L A V G L G A A C E I A A R E MA Y DHR WM E F L S K R L NG D - - P
S A HK I Y G P K G V G A L Y V -R R -R P R I R L E P L MNG G G Q E R G L R S G T G A T QQ I V G F G A A C E L A MK E M E Y D E K W I K G L Q E R L NG S - -M
S S HK I Y G P K G I G A L Y V -R R -R P R V R M E P L L S G G G Q E R G F R S G T L P P P L V V G L G HA A K L MV E E Y E Y D S A HV R R L S DR L NG S - -A
S S HK I Y G P K G I G A C Y V -R R -R P R V R L E P I I S G G G Q E R G L R S G T L A P H L V V G F G E A C R I A S QDM E Y DR K HV E R L S K R L NG D - -A
S S HK I Y G P K G MG A C Y V -R R -R P R V R L E P I I S G G G Q E R G L R S G T I A P H L V V G F G E A C R I A Y E DM E Y D S K H I A R L S K R L NG D - - P
S G HK V Y G P K G V G A L Y I -R R -R P R V R V E P I Q S G G G Q E R G MR S G T V P T P L V V G L - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
S G HK I Y G P K G V G A I Y I -R R -R P R V R V E A L Q S G G G Q E R G MR S G T V P T P L V V G L G A A C E V A QQ E M E Y DHK R I S K L A E R L NG D - - P
S G HK I Y G P K G A G A L Y V -R R -R P R V R V E A QM S G G G Q E R G L R S G T V A A P L C I G L G E A A R I A G R E M E MDK A HV E R L S R M L NV D - - E
S A HK I Y G P K G A G A L Y V -R R -R P R V R I E A QM S G G G Q E R G L R S G T V A A P L C I G L G E A A K I A DK E MA MDK A HV E R L S QM L NG D - -A
S S HK I Y G P K G I G A C Y V -R R -R P R V R L D P I I T G G G Q E R G L R S G T L A P P L V A G F G E A A R L MK Q E S S F DK R H I E K L S S K L NG C NDA
S S HK I Y G P K G I G A L Y V -R R -R P R V R L E P L L S G G G Q E R G L R S G T L A P P L V A G F G E A A R L MH E E Y NA D I A H I DK L S S K L NG S - -A
S G HK I Y G P K G V G A I Y I -R R -R P R V R V E A L Q S G G G Q E R G MR S G T V P T P L V V G L G A A C E V A QQ E M E Y DHK R I S K L A DR L NG D - - P
----------------------------------------------------------------------------------
S G HK L Y G P K G V G A A Y V -R R -R P R V R L E P L I HG G G Q E R G L R S G T V A A P L V V G L G E A C R I A E N E MA A DHA R I K A L S DR L NG D - -
S A HK V Y G P K G I G A F Y I -R S -K P R R R I K P L I F G G G Q E R G MR S G T M P V P L A V G F G E A C K I A S S E MN S D S I HV K S L Y DK L NG C - -G
S A HK V Y G P K G I G A F Y I -R S -K P R R R I K P L I F G G G Q E R G MR S G T M P V P L A V G F G E A C K I A S S E MN S D S I HV K S L Y DK L NG C - -G
S A HK I Y G P K G V G A L F V -R R -R P R V R L E P L Q S G G G Q E R G L R S G T V P T P L A V G L G A A C E I A QQ E L E Y DHK R V S L L A NR L NG D - - P
S S HK I Y G P K G I G A C Y V -R R -R P R V R L D P I V T G G G Q E R G L R S G T L A P P L V A G F G E A S R L MK E E MD - - - - - - - - - - - - - - - - - -
S G HK I Y G P K G V G A L F V -R R -R P R V R I E P I T T G G G Q E R G I R S G T V P S T L A V G L G A A C D I A L K E MNHDA A WV K Y L Y DR L NG D - - L
S G HK I Y G P K G V G A L Y V -R R -R P R V R L E P I Q S G G G Q E R G L R S G T V P A P L A V G L G A A A E L S L R E MDY DK K WV D F L S NR L NG D - -A
S G HK I Y G P K G V G A L Y V -R R -R P R V R L E P I Q S G G G Q E R G L R S G T V P A S L A V G L G A A A E L S QQ E M E Y DK K W I D F L S NR L NG D - -A
C A HK I Y G P K G I G A L Y V -R R -R P R V R MV P L I NG G G Q E R G L R S G T V A S P L V V G F G K A A E I C S K E MK R D F E H I K E L S K K L NG S - -
T A HK F HG P K G V G A L F I -R A G K P - - - I T P L L HG G E QMG G L R S G T I DT P S V V G MA V A L K K A T HD I N I E NT Y V R K L R DK L V G K - - P
S G HK I Y G P K G V G A I Y V -R R -R P R V R L E P L Q S G G G Q E R G L R S G T V P T P L A V G L G A A C E V A Q E E M E Y DHK R I S Q L A E R L NG D - -R
S G HK I Y G P K G V G A I Y I -R R -R P R V R V E A L Q S G G G Q E R G MR S G T V P T P L V V G L G A A C E V A QQ E M E Y DHK R I S K L S E R L NG D - - P
S S HK I Y G P K G I G A I Y V -R R -K P R V R L D P L I S G G G Q E R G L R S G T L A P P L V A G F G E A A R L MMK E Y E ND S NH I K R L S DK L NG S - -A
S S HK V Y G P K G C G A L Y V -R R -R P R V R L R S P V S G G G Q E R G V R S G T V A A A L V V G MG A A C E V A MK E WK R DA A HT E R L Q E R L NG D - - L
S G HK I Y G P K G V G A I Y I -R R -R P R V R V E A L Q S G G G Q E R G MR S G T V P T P L V V G L G A A C E V A QQ E M E Y DHK R I S K L A E R L NG D - - P
S G HK I Y G P K G I G A C Y V -R R -R P R V R L D P I I S G G G Q E R G L R S G T L A P P L I V G F G E A C R I A K Q E M E Y D S K R V K Y L S DR L NG H - - P
S G HK F G G P K G C G A L Y I -R K - - -G T K I E A F L HG G A Q E R K R R A G T E NV P S I V G L G K A I G L A T G E M E E T NK P L L E MR E R L NG H - - P
S G HK L Y G P K G V G A I Y I -R R -R P R V R V E A L Q S G G G Q E R G MR S G T V P T P L V V G L G A A C E V A Q E E M E NDHK R I S M L A E R L NG D - - P
S G HK L Y G P K G V G A I Y I -R R -R P R V R V E A L Q S G G G Q E R G MR S G T V P T P L V V G L G A A C E L A QQ E M E Y DHK R I S K L A E R L NG D - - P
S S HK I Y G P K G I G A C Y V -R R -R P R V R L D P I I S G G G Q E R G L R S G T L A P P L V V G F G E A C R I A K E E M P Y D S K R I K H L S DR L NG D - - P
S A HK I Y G P K G V G A L Y L -R R -R P R I R V E P QM S G G G Q E R G I R S G T V P T P L V V G F G A A C E I A A K E MDY DHR R A S V L QQR L NG S - -M
S G HK F Y G P K G I G A L Y V -R R -R P R V R M E P I I NG G G Q E R G L R S G T L P T P L I V G I G E A A R V A QK E L QR D E E HV NR L A K R L NG D - -R
S G HK I Y G P K G V G A I Y I -R R -R P R V R V E A L Q S G G G Q E R G MR S G T V P T P L V V G L G A A C E V A QQ E M E Y DHK R I S K L S E R L NG D - - P
S A HK L Y G P K G I G A L Y V -R R -K P R V R L QQ I I HG G G Q E R G L R S G T L A P H L C V G F G K A A E I A L T E L P Y D I QHV DK L Y NR L NG S - - L
S S HK I Y G P MG I G A C Y V -R R -R P R V R L D P I I T G G G Q E R G L R S G T L S P P L V A G F G E A A R L MK E E MDY DK A H I T R L S NK L NG S NN P
S G HK L Y G P K G I G A L Y I K R K -K P N I R L NA L I HG G G Q E R G L R S G T L P T H L I V G F G E A A K V C S L E MNR D E K K V R Y F F NY V NG C - -Q
S G HK L Y G P K G I G A L Y I K R K -K P N L R L NA L I HG G G Q E R G L R S G T L P T H L I V G L G E A A N L G S I E MNR DHK K MK F F F DY V NG C - -Q
S S HK L Y G P K G V G A L Y I K R K -K P N I R L NA I I HG G G Q E R G L R S G T L P T H L I V G L G E A A N I C L S E MDR DNK K MN F F F NY V NG C - -Q
S G HK I Y G P K G V G A L Y M -R R -R P R I R V E P QMNG G G Q E R G I R S G T V P T P L V V G MG A A C E L A K K E M E Y DDK R I R A L H E R MNG S - -V
S G HK L Y G P K G V G A I Y I -R R -R P R V R V E A L Q S G G G Q E R G MR S G T V P T P L V V G L G A A C E L A QQ E M E Y DHK R I S K L A E R L NG D - - P
S S HK I Y G P K G I G A I Y V -R R -R P R V R L E P L L S G G G Q E R G L R S G T L A P P L V A G F G E A A R L MK K E F DNDQA H I K R L S DK L NG S - - P
S A HK I Y G P K G I G A A Y V -R R -R P R V R L E P L I S G G G Q E R G L R S G T L A P S QV V G F G T A A R I C K E E MK Y DY A H I S K L S QR L NG D - - P
S G HK I Y G P K G I G A L Y V -R R -R P R V R V E A L Q S G G G Q E R G MR S G T L P A P L V V G L G A A C E V S QQ E M E Y DHK R I S A L S E R L NG D - - P
S G HK I Y G P K G V G A L Y V -R R -R P R V R L E P L Q S G G G Q E R G L R S G T V P T P L A V G L G A A C S V A QQ E I E Y DHQR V S M L A NR L NG D - - P
S S HK V Y G P K G I G G L Y V -R R -K P K V R I L P I I NG G G Q E R G L R S G T L A P H L C V G F G E A C E I A K R E MDNDK K H I QR L S E K F NG D - -K
S G HK I Y G P K G V G A L F V -R T -K P R I R L Q P I I DG G G Q E R G L R S G T L P T A L V V G L G T A A K I A K M E MK R DQ L HM E N L F F K L NG S I K P
S G HK I Y G P K G V G A L F V -R T -K P R I R L Q P I I DG G G Q E R G L R S G T L P T A L V V G L G T A A K I A K M E M E R DHR HM E N L F F K L NG S I K P
S G HK I HG P K G I G A L Y V - S S -R P R S R V E P I I NG G G Q E R N I R S G T L A V P L I V G L G K A A E I A K R E MK Y D S P Y I E S L G K H L NG S - - L
S S HK I Y G P K G C G A L Y M -R R -R P R V R V R S P V S G G G Q E R G V R S G T I A T P L A V G L G A A C E L A K V E MK R D S E R I A Q L S K R L NG D - -V
S S HK I Y G P K G C G A L Y M -R R -R P R V R V R S P V S G G G Q E R G V R S G T V A T A QV V G MG A A C A I A K V E M E R D S A H I S R L S K R L NG D - - L
S S HK L Y G P MG I G A C Y I -R R -R P R V R L E P I I NG G G Q E R G L R S G T L A P P L I A G F G E A A R L A K Q E L A Y DHA H I S K L S QR L NG D - -
S G HK L Y G P MG I G A C Y V -R R -R P R V R L E P I I T G G G Q E R G L R S G T L A A P L V A G F G E A A R L C R Q E M P Y DT A H I K K L S DK L NG D - -A
Alignment block: column ruler ticks 1170 to 1240 (every 10 columns). Rows follow the same species order as the label list given earlier (Anopheles_gambiae/1-3220 through Yarrowia_lipolytica/1-3278).
V Q S Y P G C I N L S F A Y V E G E S L L MA L - -K DV A L S S G S A C T S A S L E P S Y V L R A I G T D E D L A H S S I R F G I G R F T T I E E V DY T A E K C I
D S R Y V G N L N L S F A Y V E G E S L L MG L - -K E V A V S S G S A C T S A S L E P S Y V L R A L G V D E DMA HT S I R F G I G R F T T K E E I DK A V E L T V
DHR Y P G C V NV S F A F V E G E S L L MA L - -R D I A L S S G S A C T S A S L E P S Y V L HA I G R DDA L A H S S I R F G I G R F T T E A E V DY V I K A I T
E R HY P G C V NV S F A Y I E G E S L L MA L - -K D I A L S S G S A C T S A S L E P S Y V L R A L G S S D E S A H S S I R F G I G R F T T D S E I DY V L K A V Q
DR HY P G C V N I S F A Y I E G E S L L MA L - -K D I A L S S G S A C T S A S L E P S Y V L R A L G S S D E S A H S S I R F G I G R F T T D S E I DY V L K A V Q
- - - - - - - - - - - - - - - -G E S L L MA L - -K DV A L S S G S A C T S A S L E P S Y V L R A I G T D E D L A H S S I R F G L G R F T T I E E V DY T A E K T I
E HHY P G C I N L S F A Y V E G E S L L MA L - -K DV A L S S G S A C T S A S L E P S Y V L R A I G T D E D L A H S S I R F G I G R F T T E E E V DY T V E K C I
K HA Y P G C V N L S F A Y V E G E S L L MA L - -K S I A L S S G S A C T S A S L E P S Y V L R A I G S E E D L A H S S I R F G L G R F T T E E E V K HT I D L C V
R HA Y P G C V N L S F A Y V E G E S L L MA L - -K S I A L S S G S A C T S A S L E P S Y V L R A I G S E E D L A H S S I R F G L G R F T T D E E V K HT I D L C I
K S QY P G C V NV S F A Y I E G E S L L MA L - -K D I A L S S G S A C T S A S L E P S Y V L HA L G A DDA L A H S S I R F G I G R F T T E A E V DY V I QA I N
E K R Y P G C V NV S F A Y V E G E S L L MA L - -R D I A L S S G S A C T S A S L E P S Y V L HA L G K DDA L A H S S I R F G I G R F T T E E E V DY V L K A I T
E HHY P G C I N L S F A Y V E G E S L L MA L - -K DV A L S S G S A C T S A S L E P S Y V L R A I G T D E D L A H S S I R F G I G R F T T E E E V DY T V E K C I
----------------------------------------------------------------------------------
V NG Y P G C V N L S F S Y V E G E S L L MA L - -K D I A L S S G S A C T S A S L E P S Y V L R A L G A A E DMA H S S L R F G I G R F T T E E E I D L V V QR I V
V NR M F G N L N L S F T G V E G E S L MMK L - -Y S L A L S S G S A C T S A S L E P S Y V L R A I G V G E DV A HT S I R F G L G R F T K H E DV DK A V K E I V
V NR M F G N L N L S F T G V E G E S L MMK L - -Y S L A L S S G S A C T S A S L E P S Y V L R A I G V G E DV A HT S I R F G L G R F T K H E DV DK A V K E I V
DQR Y P G C I N L S F A Y V E G E S L L MA L - -K DV A L S S G S A C T S A S L E P S Y V L R A I G A D E D L A H S S I R F G I G R F T T E E E V DY T A E K C I
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -V C - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - I F V - - - - - - - - - - - -
NA R Y Y G N L N I S F S Y V E G E S L L MA I - -K DV A C S S G S A C T S S S L E P S Y V L R S L G V E E DMA H S S I R F G I G R F T T E Q E I DY T I E I L K
K A T Y NG C L N L S F A Y V E G E S L L MA L - -K DV A L S S G S A C T S A S L E P S Y V L R A I G T D E D L A H S S I R F G I G R F T T V E E V DY T A DK C I
V A T Y NG C L N L S F A Y V E G E S L L MA L - -K DV A L S S G S A C T S A S L E P S Y V L R A I G A D E D L A H S S I R F G L G R F T T V E E V DY T A DK C I
E K G F P G C V NV S F P F V E G E S L L MH L - -K D I A L S S G S A C T S A S L E P S Y V L R A L G R DD E L A H S S I R F G I G R F T MA K E I D I V A NK T V
E L R V P NT I L V A F K G V E G E A M L WD L NK HG I A A S T G S A C A S E S L QA N P T F K A MK F G E D L S HT G I R L S L S R F NT E E E I DY T I D I I K
E HR Y P G C I N L S F A Y V E G E S L L MA L - -K DV A L S S G S A C T S A S L E P S Y V L R A I G A D E D L A H S S I R F G I G R F T T E E E I DY T V QK C I
K HHY P G C I N L S F A Y V E G E S L L MA L - -K DV A L S S G S A C T S A S L E P S Y V L R A I G T D E D L A H S S I R F G I G R F T T E E E V DY T V E K C I
DHR Y P G C V N I S F A Y V E G E S L L MA L - -R D I A L S S G S A C T S A S L E P S Y V L HA L G K DDA L A H S S I R F G I G R F T T D E E I DY V I K A I T
K HR L P G N L N I S F S C V E G E S L L MG M - -R DV A V S S G S A C T S A S L E P S Y V L R A L G V DA E NA HT S I R F G I G R F T T A K E V D L V I E E C V
E HHY P G C I N L S F A Y V E G E S L L MA L - -K DV A L S S G S A C T S A S L E P S Y V L R A I G T D E D L A H S S I R F G I G R F T T E E E V DY T V E K C I
DH F Y P G C V NV S F A Y V E G E S L L MA L - -K D I A L S S G S A C T S A S L E P S Y V L R A L G N S D E S A H S S I R F G I G R F T T E R E I DY V L K A V Q
T E R L A NNV NV T F E Y I E G E S L L L L L NA K G I F A S T G S A C N S T S L E P S HV L T A C G V P H E I V HG S L R L S L G R MNT L E DV DR V L E V L P
QQHY P G C I N L S F A Y V E G E S L L MA L - -K DV A L S S G S A C T S A S L E P S Y V L R A I G T D E D L A H S S I R F G I G R F T T E E E V DY T A E K C I
K QHY P G C I N L S F A Y V E G E S L L MA L - -K DV A L S S G S A C T S A S L E P S Y V F R A I G T D E D L A H S S I R F G I G R F T T E E E V DY T A E K C I
NH F Y P G C V NV S F A Y V E G E S L L MA L - -K D I A L S S G S A C T S A S L E P S Y V L R A L G N S D E S A H S S I R F G I G R F T T E Q E I DY V L K A V T
E HR Y P G N L N L S F A Y V E G E S L L MG L - -K E V A V S S G S A C T S A S L E P S Y V L R A L G V E E DMA HT S I R F G I G R F T T E E E V DR A I E L T V
E A R Y HG NV NM S F A Y V E G E S M L MG L - -K E I A V S S G S A C T S A S L E P S Y V L R A L G V N E E MA HT S V R Y G L G R F T T E A E V DR A I E A T V
K HHY P G C I N L S F A Y V E G E S L L MA L - -K DV A L S S G S A C T S A S L E P S Y V L R A I G T D E D L A H S S I R F G I G R F T T E E E V DY T V E K C I
E HR Y K G N L NV S F A F V E G E S L I MA I - -K QV A V S S G S A C T S A S L E P S Y V L R A L G V Q E DMA HT S L R I G I G R F T T E K E V D F L L DQ L S
E S QY P G C V N I S F A Y I E G E S L L MA L - -K D I A L S S G S A C T S A S L E P S Y V L HA L G A DDA L A H S S I R F G I G R F T T E E E V DY V I K A I N
I NR Y Y G NMN I S F L F V E G E S L L M S L - -N E I A L S S G S A C T S S T L E P S Y V L R S I G I S E D I A HT S I R I G F NR F T T F F E V QQ L C I N L V
T NR Y F G NMNV S F L F V E G E S L L M S L - -N E I A L S S G S A C T S S T L E P S Y V L R S I G I S E D I A HT S I R I G F NR F T T F F E V QQ L C E N L V
I NR Y F G NMN I S F L F V E G E S L L M S L - -ND I A L S S G S A C T S S T L E P S Y V L R S I G I T E E I A HT S I R I G F NR F T T F F E V QQ L C K N L V
E R R Y A G N L N L S F A Y V E G E S L L MG L - -K DV A V S S G S A C T S A S L E P S Y V L R A L G V D E DMA HT S I R F G I G R F T T E E E I DR A I E L T V
K QHY P G C I N L S F A Y V E G E S L L MA L - -K DV A L S S G S A C T S A S L E P S Y V L R A I G T D E D L A H S S I R F G I G R F T T E E E V DY T V QK C I
DHR Y P G C V NV S F A Y V E G E S L L MA L - -R D I A L S S G S A C T S A S L E P S Y V L HA L G K DDA L A H S S I R F G I G R F S T E E E V DY V V K A V S
K S R Y P G C V N I S F NY V E G E S L L MG L - -K N I A L S S G S A C T S A S L E P S Y V L R A I G Q S D E NA H S S I R F G I G R F T T E A E I DY A I E NV S
D E T Y P G C V N L S F A Y V E G E S L L MA L - -K DV A L S S G S A C T S A S L E P S Y V L R A I G A Q E D L A H S S I R F G I S R F T T E E E V DY T A E K C V
NQR Y P G C V N L S F A Y V E G E S L L MA L - -K DV A L S S G S A C T S A S L E P S Y V L R A I G A D E D L A H S S I R F G I G R HHR R R S G L HG R K MY L
DQR Y V G N I N I S F E F V E G E S L MMG I - -K QC A V S S G S A C T S A S L E P S Y V L R A L G V N E E L A HT S L R I G F G R F T T D E E V DY L I N L L S
G E R Y F G N L NM S F E F I E G E S L L M S L - - S N F A L S S G S A C T S A S L E P S Y V L R S L DV S E E L A HT S I R F G L G R F T M E S E V DMA L E S I T
G QR Y F G N L NM S F E F I E G E S L L M S L - - S N F A L S S G S A C T S A S L E P S Y V L R S L DV S E E L A HT S I R F G MG R F T I E S E V DMA L D S I T
E HR W F G C V N I S F E A V E G E S L MA T I - - P N F G V S S G S A C T S A S L E P S Y V L K G I G V G D E L A HT S L R I G I S K F T T R E E V DQ F V E L L E
E R R F HG N L N I S F A C V E G E S L L MG M - -K K V A V S S G S A C T S A S L E P S Y V L R A L G I DA E NA HT S I R F G I G R F T T E R E V DV T V E E C A
E K R Y P G N L N I S F S C V E G E S L L MG M - -K NV A V S S G S A C T S A S L E P S Y V L R A L G I DA E NA HT S I R F G I G R F T T E R E I DV T I E E C V
QNG Y P G C L N L T F QY V E G E S L L MA L - -K D I C L S S G S A C T S A S L E P S Y V L R A L G L ND E NA H S S L R F G I G R F T T E E E V DY V A DK I I
E HHY P G C V N I S F A Y V E G E S L L MA L - -K D I A L S S G S A C T S A S L E P S Y V L R A L G A DDA L A H S S I R F G I G R F T T E A E V DY V L K A V Q
Alignment block: column ruler ticks 1250 to 1320 (every 10 columns). Rows follow the same species order as the label list given earlier (Anopheles_gambiae/1-3220 through Yarrowia_lipolytica/1-3278).
K HV T R L R E M S P L - - - - - - - -W E M S K R - - - - - -G R G G S A G G K F R I S L G L P V G A V I NC A DNT G A K N L Y V I A V HG I R G R L NR L P A A
K QV E K L R E M S P L - - - - - - - -Y E M S K R - - - - - -G R G G T S G NK F R M S L G L P V A A T V NC A DNT G A K N L Y I I S V K G I K G R L NR L P S A
E R V E F L R E L S P L - - - - - - - -W E M - - - - - - - - - S G NG A QG T K F R I S L G L P T G A I MNC A DN S G A R N L Y I MA V K G S G S R L NR L P A A
DR V H F L R E L S P L - - - - - - - -W E - - - - - - - - - - - - - - - - - - - - -MT L G L P C G A V MNC C DN S G A R N L Y I I S V K G V G A R L NR L P A A
DR V H F L R E L S P L - - - - - - - -W E M S A R - - - - - -G R G G A S G NK L K MT L G L P C G A V L NC C DN S G A R N L Y I I S V K G I G A R L NR L P A A
R HV E R L R E M S P L - - - - - - - -W E I K T L - - - - - -G R G G S A G A K F R I S L G L P V G A V I NC A DNT G A K N L Y V I A V QG I K G R L NR L P A A
HHV K R L R E M S P L - - - - - - - -W E M S K L - - - - - -G R G G S S G A K F R I S L G L P V G A V I NC A DNT G A K N L Y I I S V K G I K G R L NR L P A A
R E T E R L R E L S P L - - - - - - - -W E M S K R - - - - - -G R G G A S G A K F R I S L G L P V G A V MNC A DNT G A K N L F V I S V Y G I R G R L NR L P S A
R E T NR L R D L S P L - - - - - - - -W E M S K R - - - - - -G R G G A S G A K F R I S L G L P V G A V MNC A DNT G A K N L F V I S V Y G I R G R L NR L P S A
E R V D F L R K M S P L - - - - - - - -W E - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -MNC A DN S G A R N L Y V L A V K G T G A R L NR L P A A
E R V K F L R E L S P L - - - - - - - -W E M - - - - - - - - - S G NG A QG T K F R I S L G L P T G A I MNC A DN S G A R N L Y I MA V K G S G S R L NR L P A A
HHV K R L R E M S P L - - - - - - - -W E M S K R - - - - - -G R G G S S G A K F R I S L G L P V G A V I NC A DNT G A K N L Y I I S V K G I K G R L NR L P A A
- - - - - - - - - - - - - - - - - - - - - -M S X X - - - - - -X X X X X X X X X F R I S L S L P V G A V V NC A DNT G A K N L Y I I A V K G I R G R L NR L P A A
S V V NK L R DM S P L - - - - - - - -W E M S - - - - - - - - I K S A A A G T K F R M S L G L P V G A V MNC A DN S G A K N L Y V I S V I G F G A R L NR L P A A
E S V T L L R K M S P L - - - - - - - -WDM -K R - - - - - -G R G A A G G A K MR I T L G L NV G A L I NC C DN S G G K N L Y I I A V K G T G S C L NR L P S A
E S V T L L R K M S P L - - - - - - - -WDM -K R - - - - - -G R G A A G G A K MR I T L G L NV G A L I NC C DN S G G K N L Y I I A V K G T G S C L NR L P S A
HQV K R L R E M S P L - - - - - - - -W E M S K R - - - - - -G R G G S S G A K F R I S L G L P V G A V I NC A DNT G A K N L Y I I S V K G I K G R L NR L P S A
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - L G L P V G A V V NC C DN S G A R N L Y I V S V K G F G A R L NR L P A A
K NV QR L R DM S P L - - - - - - - -W E M - - - - - - - - - S K A QA V G S NY R V S L G L P V G A V MN S A DN S G A K N L Y V I A V K G I K G R L NR L P S A
K HV E R L R E M S P L - - - - - - - -W E M S K R - - - - - -G R G G T A G G K F R I S L G L P V G A V MNC A DNT G A K N L Y V I A V HG I R G R L NR L P A A
K HV E R L R E M S P L - - - - - - - -W E M S K R - - - - - -G R G G T A G G K F R I S L G L P V G A V MNC A DNT G A K N L Y V I A V HG I R G R L NR L P A A
E A V QK L R E M S P L - - - - - - - -Y E MA A E K K T E V L E K K I S I K P R Y K MT R G I QV E T L MK C A DN S G A K I L R C I G V K R Y R G R L NR L P A A
K S V DR L R Q L S S T - - - - - - - -Y A M P K R - - - - - -G A G G R QG NK F R V T C G L NNA S T V NC A DNT G A K T L T I I S V K G F HG R L NR L P R A
QHV K R L R E M S P L - - - - - - - -W E M S K R - - - - - -G R G G S S G A K F R I S L G L P V G A V I K G A DNT G A K N L Y I I S V K G I K G R L NR L P A A
QHV K R L R E M S P L - - - - - - - -W E M S K R - - - - - -G R G G S S G A K F R I S L G L P V G A V I NC A DNT G A K N L Y I I S V K G I K G R L NR L P A A
E R V D F L R E L S P L - - - - - - - -W E M - - - - - - - - - S G NG A QG T K F R I S L G L P T G A I MNC A DN S G A R N L Y I MA V K G S G S R L NR L P A A
R NV E R L R E L S P L - - - - - - - -WDM -G K - - - - - -DQA NV K G C R F R V S V A L P V G A V V NC A DNT G A K N L Y V I S V K G Y HG R L NR L P S A
QHV K R L R E M S P L - - - - - - - -W E M S K R - - - - - -G R G G S S G A K F R I S L G L P V G A V I NC A DNT G A K N L Y I I S V K G I K G R L NR L P A A
E R V S F L R E L S P L - - - - - - - -W E M -A K - - - - - - L S R G A P G G K L K MT L G L P V G A I MNC A DN S G A R N L Y I I S V K G I G A R L NR L P A G
E I V QK L R NM S P L - - - - - - - -T P - - - - - - - - - - - - - -MK G MR S N I P R A L NA G A Q I A C V DNT G A K V V E I I S V K K Y R G V K NR M P C A
HHV K R L R E M S P L - - - - - - - -W E M S K R - - - - - -G R G G S S G A K F R I S L G L P V G A V I NC A DNT G A K N L Y I I S V K G I K G R L NR L P A A
HHV K R L R E M S P L - - - - - - - -W E M S K R - - - - - -G R G G S S G A K F R I S L G L P V G A V I NC A DNT G A K N L Y I I S V K G I K G R L NR L P A A
E R V G F L R E L S P L - - - - - - - -W E M -A K - - - - - -Q S R G A P G G K L K MT L G L P V G A I MNC A DN S G A R N L Y I I S V K G I G A R L NR L P A G
HQV K K L R DM S P L - - - - - - - -Y E M S K R - - - - - -G R G G S A G NK F R M S L G L P V A A T V NC A DNT G A K N L Y I I S V K G I K G R L NR L P S A
R QV E K L R E M S P L - - - - - - - -W E M S K R - - - - - -G G G NA S G T K Y K M S Y G V P V G A V V NC A DNT G A K N L Y L I A V K R WG S R QNR L P A A
QHV K R L R E M S P L - - - - - - - -W E M S K R - - - - - -G R G G S S G A K F R I S L G L P V G A V I NC A DNT G A K N L Y I I S V K G I K G R L NR L P A A
G A V R K L R E M S P L - - - - - - - -W E M S K R - - - - - -G R G G QV G I K L R I T L A C NV G A V L NC A DN S G A K N I Y V I S T F G I K G H L S R L P S A
E R V E F L R K M S P L - - - - - - - -W E M - - - - - - - - - S G S G A S G NK F R M S L A L P V G A V MNC A DN S G A R N L Y V L A V K G V G A R L NR L P A A
K S V E R L R S I S P L - - - - - - - -Y E M -K R - - - - - -G R A G T L K NK MR I T L S L P V G A L I NC C DN S G G K N L Y I I A V QG F G S C L NR L P A A
K S V K R L R S I S P L - - - - - - - -Y E M -K R - - - - - -G R A G T L K NK MR I T L S L P V G A L I NC C DN S G G K N L Y I I A V QG F G S C L NR L P A A
K S V K R L R S I S P L - - - - - - - -Y E M -K R - - - - - -G R A G T L K NK MR I T L S L P V G A L I NC C DN S G G K N L Y I I A V QG F G S C L NR L P A A
QQV E K L R E M S P L - - - - - - - -Y E M S K R - - - - - -G R G G S A G NK F R M S L G L P V A A T V NC A DNT G A K N L Y I I S V K G I K G R L NR L P S A
HHV K R L R E M S P L - - - - - - - -W E M S K R - - - - - -G R G G S S G A K F R I S L G L P V G A V I NC A DNT G A K N L Y I I S V K G I K G R L NR L P A A
DR V K F L R E L S P L - - - - - - - -W E M - - - - - - - - - S G NG A QG T K F R I S L G L P V G A I MNC A DN S G A R N L Y I I A V K G S G S R L NR L P A A
R QV S F L R NM S P L - - - - - - - -WDM - S R - - - - - -G R G A A S G T K Y R MT L G L P V QA I MNC A DN S G A K N L Y I V S V F G T G A R L NR L P A A
H E V T Q L R E M S P L - - - - - - - -W E - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -G K N L Y I I A V S G I G G R L NR L P NA
P C L QT E G D E S S MG DG S G R HR Y Q - - - - - - - - - -G R G G S S G A K F R I S L G L P V G A V I NC A DNT G A K N L Y I I S V K G I K G R L NR L P S A
E K V S K L R E M S P L - - - - - - - -W E A A A R - - - - - -G R G G QV G T K A K V S L G L P V G A V MNC A DN S G A K N L Y T I A C F G I K G H L S K L P S A
K V V E K L R N L S P L - - - - - - - -Y E M -K R - - - - - -G R G G S G G NK L R V T L G L P V G A L I NC C DN S G G K N L Y L I A V K G T G A C L NR L P S A
K V V E K L R N L S P L - - - - - - - -Y E M -K R - - - - - -G R G G S G G NK L R V T L G L P V G A L I NC C DN S G G K N L Y L I A V K G T G A C L NR L P S A
HA V K H L R D L S P L - - - - - - - -W E M S K R - - - - - -G R T G QQG T K F A MT A G L P V G A V I NC C DN S G A K NM F I I S V R G HK G R L NR L P A A
R T V E R L R E M S P L - - - - - - - -WDM -G K - - - - - -DK A NV K G C R F R V S L A L P V G A V V NC A DNT G A K N L Y I I S V K G Y HG R L NR L P A A
R NV E R L R E M S P L - - - - - - - -WDM -G K - - - - - - E K A NV K G C R F R V S L A L P V G A V V NC A DNT G A K N L Y I I S V K G Y HG R L NR L P A A
K V V NK L R DM S P L - - - - - - - -W E M - - - - - - - - - - S K A A V G T K F R MT L A L P V G A V MNC A DN S G A K N L F V I A V HG I G A R L NR L P A A
E R V N F L R E L S P L - - - - - - - -W E M - - - - - - - - - - - S G A S G T K Y K M S MA L P V G A I MNC A DN S G A R N L Y V I A V K G C G A R L NR L P A A
1330
Anopheles_gambiae/1-3220
Arabidopsis_thaliana/1-3286
Ashbya_gossypii/1-3281
A_fumigatus/1-3241
Aspergillus_niger/1-3300
Bombyx_mori/1-2389
Bos_taurus/1-3273
C_briggsae/1-3231
Caenorhabditis_elegans/1-3289
Candida_albicans/1-3266
Candida_glabrata/1-3282
Canis_familiaris/1-3286
Ciona_intestinalis/1-2712
Cryptococcus_neoformans/1-3301
Cryptosporidium_hominis/1-3020
C_parvum/1-3281
Danio_rerio/1-3286
Debaryomyces_hansenii/1-3166
Dictyostelium_discoideum/1-3275
D_melanogaster/1-3287
Drosophila_pseudoobscura/1-3286
Encephalitozoon_cuniculi/1-3199
Entamoeba_histolytica/1-3271
Gallus_gallus/1-3286
Homo_sapiens/1-3286
Kluyveromyces_lactis/1-3281
Leishmania_major/1-3286
Macaca_mulatta/1-3182
Magnaporthe_grisea/1-3286
Methanosarcina_acetivorans/1-3043
Monodelphis_domestica/1-3196
Mus_musculus/1-3286
Neurospora_crassa/1-3264
Oryza_sativa/1-3218
Ostreococcus_lucimarinus/1-3287
Pan_troglodyte/1-3286
Paramecium_tetraurelia/1-3282
Pichia_stipitis/1-3290
P_falciparum/1-3285
P_knowlesi/1-3284
Plasmodium_yoelii/1-3285
Populus_trichocarpa/1-3286
Rattus_norvegicus/1-3286
Saccharomyces_cerevisiae/1-3284
Schizosaccharomyces_pombe/1-3284
Strongylocentrotus_purpuratus/1-3252
Takifugu_rubripes/1-2930
Tetrahymena_thermophila/1-3270
T_annulata/1-3248
Theileria_parva/1-3273
Trichomonas_vaginalis/1-3198
T_brucei/1-3264
Trypanosoma_cruzi/1-3266
Ustilago_maydis/1-3291
Yarrowia_lipolytica/1-3278
1340 1350 1360 1370 1380 1390 1400
G V G DM F V A T V K K G K P E L R K K V M P A V V I R QR K P F R R R DG V F L Y F E DNA G V I V NNK G E MK G S A I T G P V A K E C A D L W P R I A - - - S N
C V G DMV MA T V K K G K P D L R K K V L P A V I V R QR K P WR R K DG V F MY F E DNA G V I V N P K G E MK G S A I T G P I G K E C A D L W P R I A - - - S A
S L G DMV MA T V K K G K P E L R K K V M P A I V V R Q S K P WR R K DG V Y L Y F E DNA G V I A N P K G E MK G S A I T G P V G K E C A D L W P R I A - - - S N
G V G DMV MA T V K K G K P E L R K K V M P A V V V R Q S K P WR R P DG I Y L Y F E DNA G V I V NA K G E MK G S A I T G P V G K E A A E L W P V S S L L F S N
G V G DMV MA T V K K G K P E L R K K V M P A V V V R Q S K P WR R P DG I Y L Y F E DNA G V I V NA K G E MK G S A I T G P V G K E A A E L W P R I A - - - S N
G S G DM I V A T V K K G K P E L R K K V M P A V V I R QR K P F R R R DG V F I Y F E DNA G V I V NNK G E MK G S A I T G P V A K E C A D L W P R I A - - - S N
G V G DMV MA T V K K G K P E L R K K V H P A V V I R QR K S Y R R K DG V F L Y F E DNA G V I V NNK G E MK G S A I T G P V A K E C A D L W P R I A - - - S N
G V G DM F V C S V K K G K P E L R K K V L QG V V I R QR K Q F R R K DG T F I Y F E DNA G V I V NNK G E MK G S A I T G P V A K E C A D L W P R I A - - -A N
G V G DM F V C S V K K G K P E L R K K V L QG V V I R QR K Q F R R K DG T F I Y F E DNA G V I V NNK G E MK G S A I T G P V A K E C A D L W P R I A - - -A N
A A G DMV MA T V K K G K P E L R K K V M P A I V I R Q S K P WR R R DG V Y L Y F E DNA G V I V N P K G E MK G S A I T G P V A K E C A D L W P R I A - - - S N
S L G DMV MA T V K K G K P E L R K K V M P A I V V R Q S K A WR R K DG V Y L Y F E DNA G V I A N P K G E MK G S A I T G P V G K E C A D L W P R V A - - - S N
G V G DMV MA T V K K G K P X L R K K V H P A V V I R QR K S Y R R K DG V F L Y F E DNA G V I V NNK G E MK G S A I T G P V X K E C A D L W P X I A - - - S N
G V G D I V L A T V K K G K P E L R K K V H P A V I I R Q S K S Y R R K HG QM I Y F E DNA G V I V NQK G E MK G - - - - - - - - - - - - - - - - - - - - - - -
A A G DMV MA S V K K G K P E L R K K V M P A V I C R QR K P WR R R DG I F L Y F E DNA G V I V NA K G E MK G S A I NG P V A K E C A D L W P R I A - - - S N
S I G DMV L A T V K K G K P E L R K K V W P A V I V R QR K A F R R P E G T F L Y F E DNA G V I V N P K G E MK G S A I T G P V G K E C A E L W P K V S - - -A A
S I G DMV L A T V K K G K P E L R K K V W P A V I V R QR K A F R R P E G T F L Y F E DNA G V I V N P K G E MK G S A I T G P V G K E C A E L W P K V S - - -A A
G V G DMV MA T V K K G K P E L R K K V H P A V V I R QR K S Y R R K DG V F L Y F E DNA G V I V NNK G E MK G S A I T G P V A K E C A D L W P R I A - - - S N
S A G DMV MA T V K K G K P E L R K K I M P A I V V R QA R P WR R K DG V Y L Y F E DNA G V I V N P K G E MK G S A I T G P V A K E C A D L W P R I A - - - S N
G V G DMV MA T V K K G K P E L R K K V C T G L V V R QR K HWK R K DG V Y I Y F E DNA G V MC N P K G E V K G N - I L G P V A K E C S D L W P K V A - - -T N
G V G DM F V A T V K K G K P E L R K K V M P A V V I R QR K P F R R R DG V F I Y F E DNA G V I V NNK G E MK G S A I T G P V A K E C A D L W P R I A - - - S N
G V G DM F V A T V K K G K P E L R K K V M P A V V I R QR K P F R R R DG V F I Y F E DNA G V I V NNK G E MK G S A I T G P V A K E C A D L W P R I A - - - S N
A P G D I C V V S V K K G K P E L R K K V HY A I L I R QK K I WR R T DG S H I M F E DNA A V L I NNK G E L R G A Q I A G P V P R E V A DMW P K I S - - - S Q
G C G DMV V A T C K K G K P E Y R K K MHT A V I I R QR R T WR R K DG V T L Y F E DNA A V I V NMK G E MK G S A I T G P V S K E S A D L W P K I S - - - S N
G V G DMV MA T V K K G K P E L R K K V H P A V V I R QR K S Y R R K DG V F L Y F E DNA G V I V NNK G E MK G S A I T G P V A K E C A D L W P R I A - - - S N
G V G DMV MA T V K K G K P E L R K K V H P A V V I R QR K S Y R R K DG V F L Y F E DNA G V I V NNK G E MK G S A I T G P V A K E C A D L W P R I A - - - S N
S L G DMV MA T V K K G K P E L R K K V M P A I V V R Q S K A WR R K DG V F L Y F E DNA G V I A N P K G E MK G S A V T G P V G K E C A D L W P R I A - - - S N
A L G DMV MC S V K K G K P E L R K K V L NA V I I R QR K S WR R K DG T V I Y F E DNA G V I V N P K G E MK G S G I A G P V A K E S A D L W P K I S - - -T H
G V G DMV MA T V K K G K P E L R K K V H P A V V I R QR K S Y R R K DG V F L Y F E DNA G V I V NNK G E MK G S A I T G P V A K E C A D L W P R I A - - - S N
G V G DMV MA T V K K G K P E L R K K V H P A V I V R Q S K P WK R T DG V F L Y F E DNA G V I V N P K G E MK G S A I T G P V G K E A A E L W P R I A - - - S N
G I G DMC V V S V K K G T P E MR K QV L L A V V V R QK Q E F R R P DG L HV S F E DNA MV I T D E E G I P K G T D I K G P V A R E V A E R F P K I G - - -T T
G V G DMV MA T V K K G K P E L R K K V H P A V V I R QR K S Y R R K DG V F L Y F E DNA G V I V NNK G E MK G S A I T G P V A K E C A D L W P R I A - - - S N
G V G DMV MA T V K K G K P E L R K K V H P A V V I R QR K S Y R R K DG V F L Y F E DNA G V I V NNK G E MK G S A I T G P V A K E C A D L W P R I A - - - S N
G V G DMV MA T V K K G K P E L R K K V H P A V I V R Q S K P WK R F DG V F L Y F E DNA G V I V N P K G E MK G S A I T G P V G K E A A E L W P R I A - - - S N
C V G DMV MA T V K K G K P D L R K K V M P A V I V R QR K P WR R K DG V Y MY F E DNA G V I V N P K G E MK G S A I T G P I G K E C A D L W P R I A - - - S A
N P G S MV MA T V K K G K P D L R K K V F P A I I V R QR K P I R R K E G L I I Y F E DNA G V I C N P K G E MK G S A I A G P V A K E C A D L W P R V A - - - S A
G V G DMV MA T V K K G K P E L R K K V H P A V V I R QR K S Y R R K DG V F L Y F E DNA G V I V NNK G E MK G S A I T G P V A K E C A D L W P R I A - - - S N
S I G DMV L C S V K QG K P A L R K K V MQA V V V R QR K P Y R R R E G Y Y I Y F E DNA G V I I N P K G E MK G S A I T G P V G K E A A D L W P K I A - - - S A
S A G DMV MA T V K K G K P E L R K K V M P A I V I R Q S R P WR R K DG V Y L Y F E DNA G V I V N P K G E MK G S A I T G P V A K E C A D L W P R I A - - - S N
S L G DMV L A T V K K G K P D L R K K V L NA I I C R Q S K A WR R H E G Y Y I Y F E DNA G V I V N P K G E MK G S A I T G P V A R E C A E L W P K L S - - - S A
S L G DMV L A T V K K G K P D L R K K V L NA I I T R Q S K A WR R H E G Y F I Y F E DNA G V I V T P R -R MK G S A I T G P V A R E C A E L W P K L S - - - S A
S L G DMV L A T V K K G K P D L R K K V L NA I I T R Q S K A WR R H E G Y F I Y F E DNA G V I V N P K G E MK G S A I T G P V A R E C A E L W P K L S - - - S A
C V G DMV MA T V K K G K P D L R K K V M P A V I V R QR K P WR R K DG V F MY F E DNA G V I V N P K G E MK G S A I T G P I G K E C A D L W P R I A - - - S A
G V G DMV MA T V K K G K P E L R K K V H P A V V I R QR K S Y R R K DG V F L Y F E DNA G V I V NNK G E MK G S A I T G P V A K E C A D L W P R I A - - - S N
S L G DMV MA T V K K G K P E L R K K V M P A I V V R QA K S WR R R DG V F L Y F E DNA G V I A N P K G E MK G S A I T G P V G K E C A D L W P R V A - - - S N
S C G DMV L A T V K K G K P D L R K K I M P A I V V R QR K A WR R K DG V Y L Y F E DNA G V I V N P K G E MK G S A I T G P V A K E C A D L W P R I A - - - S N
G L G DM I V A T V K K G K P E L R K K V M P A V V I R QR K P I R R R E G I V L Y F E DNA G V I V NNK G E MK G S A I T G P V A K E C A D L W P R I A - - - S N
G V G DMV MA T V K K G K P E L R K K V H P A V V I R QR K S Y R R K DG V F L Y F E DNA G V I V NNK G E MK G S A I T G P V A K E C A D L W P R I A - - - S N
S I G DM I L C S V K K G S P K L R K K V L QA I V I R QR R P WR R R DG V F I Y F E DNA G V I A N P K G E MK G S Q I T G P V A K E C A D I W P K V A - - - S N
S V G DMV L A T V K K G R P D L R K K V L P A V I V R QR K A WR R R E G Y F I Y F E DNA G V I V N P K G E MK G S A I NG P V A K E C A E L W P K I S - - -A A
S V G DMV L A T V K K G R P D L R K K V L P A V I V R QR K A WR R R E G Y F I Y F E DNA G V I V N P K G E MK G S A I NG P V A K E C A E L W P K I S - - -A A
S V S D L I V V T C K K G K P A L R K K V S MG V V V R QR A I WR R K DG V V I G F QDNA G V I I NDK G E MK G S A I T G P V A K E A A E L W P K V A - - - S V
A L G DMV MA S V K K G K P E L R R K V L NA V I I R QR K S WR R K DG T V I Y F E DNA G V I V N P K G E MK G S G I A G P V A K E A A E L W P K I S - - -T H
A L G D I V MA S V K K G K P E L R R K V L NA V I I R QR K S WR R K DG T V I Y F E DNA G V I V N P K G E MK G S G I A G P V A K E A A D L W P K I S - - - S H
A A G DMV V A S V K K G K P E L R K K V M P A V V V R QR K P WR R R DG V F L Y F E DNA G V I V N P K G E MK G S A I T G P V A K E C A D I W P R I A - - - S N
G A G DMV MA T V K K G K P E L R K K V M P A I V V R Q S K P WR R K DG V Y L Y F E DNA G V I V N P K G E MK G S A I T G P V A K E C A D L W P R I A - - - S N
1420
Anopheles_gambiae/1-3220
Arabidopsis_thaliana/1-3286
Ashbya_gossypii/1-3281
A_fumigatus/1-3241
Aspergillus_niger/1-3300
Bombyx_mori/1-2389
Bos_taurus/1-3273
C_briggsae/1-3231
Caenorhabditis_elegans/1-3289
Candida_albicans/1-3266
Candida_glabrata/1-3282
Canis_familiaris/1-3286
Ciona_intestinalis/1-2712
Cryptococcus_neoformans/1-3301
Cryptosporidium_hominis/1-3020
C_parvum/1-3281
Danio_rerio/1-3286
Debaryomyces_hansenii/1-3166
Dictyostelium_discoideum/1-3275
D_melanogaster/1-3287
Drosophila_pseudoobscura/1-3286
Encephalitozoon_cuniculi/1-3199
Entamoeba_histolytica/1-3271
Gallus_gallus/1-3286
Homo_sapiens/1-3286
Kluyveromyces_lactis/1-3281
Leishmania_major/1-3286
Macaca_mulatta/1-3182
Magnaporthe_grisea/1-3286
Methanosarcina_acetivorans/1-3043
Monodelphis_domestica/1-3196
Mus_musculus/1-3286
Neurospora_crassa/1-3264
Oryza_sativa/1-3218
Ostreococcus_lucimarinus/1-3287
Pan_troglodyte/1-3286
Paramecium_tetraurelia/1-3282
Pichia_stipitis/1-3290
P_falciparum/1-3285
P_knowlesi/1-3284
Plasmodium_yoelii/1-3285
Populus_trichocarpa/1-3286
Rattus_norvegicus/1-3286
Saccharomyces_cerevisiae/1-3284
Schizosaccharomyces_pombe/1-3284
Strongylocentrotus_purpuratus/1-3252
Takifugu_rubripes/1-2930
Tetrahymena_thermophila/1-3270
T_annulata/1-3248
Theileria_parva/1-3273
Trichomonas_vaginalis/1-3198
T_brucei/1-3264
Trypanosoma_cruzi/1-3266
Ustilago_maydis/1-3291
Yarrowia_lipolytica/1-3278
1430 1440 1450 1460 1470 1480 1490
A G S I - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -MA R K A R I I DV V Y NA S NN E L I R T K
A NA I G - - I S R D S I HK K K R K Y E L G R Q P A NT K L - - S I R V R G G NV K WR A L R L DT G N F S WG S E A V T R K T R I L DV A Y NA S NN E L V R T Q
S G V V G - - I S R D S R HK K K R K F E L G R Q P A NT K I - -G V R T R G G NK K F R A L R I E T G N F S WA S E G V S R K T R I V G V V Y H P S NN E L V R T N
T S S -G - - I S R D S R HK K K R A F E K G R Q P A NT R I - -G V R T R G G NR K F R A L R L E S G N F S WG S E G I S R K T R V I V V A Y H P S NN E L V R T N
S G V V G - - I S R D S R HK K K R A F E K G R Q P S NT R I - -G V R T R G G NQK F R A L R L E S G N F S WG S E G I S R K T R V I V V A Y H P S NN E L V R T N
A S S I G - - I S R DHWHK K K R K Y E L G R P A A NT R L - -G V R S R G G NT K Y R A L R L DT G N F S WG S E C S T R K T R I I DV V Y NA S NN E L V R T K
A G S I G - - I S R DNWHK K K R K Y E L G R P A A NT K I - -G V R V R G G NK K Y R A L R L DV G N F S WG S E C C T R K T R I I DV V Y NA S NN E L V R T K
A G S I G - - I S R D S WHK K K R K F E L G R P A A NT K I - -G V R T R G G N L K Y R A L R L DNG N F S WA S E QT T R K T R I V DT MY NA T NN E L V R T K
A G S I G - - I S R D S WHK K K R K F E L G R P A A NT K I - -G V R T R G G N E K Y R A L R L D S G N F S WA S E QT T R K T R I V DT MY NA T NN E L V R T K
S G V V G - - I S R D S R HK K K R K F E L G R Q P A NT K I - -G V R T R G G NQK F R A L R V E T G N F S WG S E G V S R K T R I A G V V Y H P S NN E L V R T N
S G V V G - - I S R D S R HK K K R K F E L G R Q P A NT K I - -G V R T R G G NQK F R A L R I E T G N F S WA S E G V A K K T R I V G V V Y H P S NN E L V R T N
A G S I G - - I S R DNWHK K K R K Y E L G R P A A NT K I - -G V R V R G G NK K Y R A L R L DV G N F S WG S E C C T R K T R I I DV V Y NA S NN E L V R T K
- - -MG - - I S QD S WHK K K R K F E L G R P A A NT K I - -G V R T R G G NT K F R G L R L DT G N F S WG S E A C A R K T R I I DV MY NA S NN E L L R T K
A G T V G - - I T R D S R HK K K R K F E L G R Q P A MT K L D - S V R T R G G NV K Y R A L R L D S G N F A WG S E S V T R K T R L I QV R Y NA T NN E L L R T Q
A P S I G - - I S R D S R HK K K R K Y E MG R P A S NT K L - -G V R C R G G NK K F R A L R L D S G NY S WG S QG V T R K A R I M E V V Y NA S NN E L V R T K
A P S I G - - I S R D S R HK K K R K Y E MG R P A S NT K L - -G V R C R G G NK K F R A L R L D S G NY S WG S QG I S R K A R I M E V V Y NA S NN E L V R T K
A G S I G - - I S R DNWHK K K R K Y E L G R P A A NT K I - -G I R V R G G NK K Y R A L R L DV G N F S WG S E C C T R K T R I I DV V Y NA S NN E L V R T K
S G V V G - - I S R D S R HK K K R K F E L G R Q P A NT K I - -G V R T R G G N E K F R A L R I E T G N F S WG S E G V A R K T R L A G V V Y H P S NN E L V R T N
A G T I G - - I S R DA L HK K K R K Y E L G R QA A K T K I - -C I R V R G G HQK F R A L R L DT G N F S WA T E K I T R K C R I L NV V Y NA T S ND L V R T N
A S S I G - - I S R D S A HK K K R K F E L G R P A A NT K L - -G V R T R G G NT K L R A L R L E T G N F A WA S E G V A R K T R I A DV V Y NA S NN E L V R T K
A S S I G - - I S R D S A HK K K R K F E L G R P A A NT K L - -G V R T R G G N S K L R A L R L E NG N F A WA S E G V A R K T R I A DV V Y NA S NN E L V R T K
A S S I G - - I NHR G DHK K K R NNR A G S Q P S S T K I - -G V R V R G G NR K Y K A L R L DMG H F K F I T T G K F R MA K L L QV V Y H P S S N E L V R T N
A P T I G - - I T R D S R HK K K K K NT MG R Q P A NT R L - -G V R C R Y G I I K R R A L R L E NG N F S WA S Q S I T K G T K I L NV V Y NA S DND F V R T N
A G S I G - - I S R DNWHK K K R K Y E L G R P P A NT K I - -G V R V R G G NK K Y R A L R L DV G N F S WG S E C C T R K T R I I DV V Y NA S NN E L V R T K
A G S I G - - I S R DNWHK K K R K Y E L G R P A A NT K I - -G V R V R G G NK K Y R A L R L DV G N F S WG S E C C T R K T R I I DV V Y NA S NN E L V R T K
S G V V G - - I S R D S R HK K K R K F E L G R QA A NT K I - -G V R T R G G NQK F R A L R I E T G N F S WA S E G V A R K T R I T G V V Y H P S NN E L V R T N
A P A I G - - I V R S R L HK K R MK A E L G R L P A NT R L - -G V R A R G G N F K I R A L R L DT G N F A WA S E A I A HR V R L L DV V Y NA T S N E L V R T K
A G S I G - - I S R DNWHK K K R K Y E L G R P A A NT K I - -G V R V R G G NK K Y R A L R L DV G N F S WG S E C C T R K T R I I DV V Y NA S NN E L V R T K
S G V V G - - I S R D S R HK QK R A W E A G R Q P A S T K I - -G V R V R G G NT K Y R A L R L D S G N F S WG S E G V T R K T R V I A V A Y H P S NN E L V R T N
A S I I - - -MR WQG S S R G K R K F E MG R E S A E T R I - - S V P T MG G NR K V R L L Q S NV A NV T N P K DG K T V T A P I E T V I DNT A NK HY V R R N
A G S I G - - I S R DNWHK K K R K Y E L G R P P A NT K I - -G V R V R G G NK K Y R A L R L DV G N F S WG S E C C T R K T R I I DV V Y NA S NN E L V R T K
A G S I G - - I S R DNWHK K K R K Y E L G R P A A NT K I - -G V R V R G G NK K Y R A L R L DV G N F S WG S E C C T R K T R I I DV V Y NA S NN E L V R T K
S G V V G - - I S R D S R HK K K R A F E A G R Q P A NT R I - -G V R T R G G NHK Y R A L R L D S G N F A WA S E G C T R K T R V I V V A Y H P S NN E L V R T N
A NA I G - - I S R D S MHK K K R K Y E L G R Q P A NT K L - - S V R V R G G N L K WR A L R L DT G NY S WG S E A V T R K T R I L DV V Y NA S NN E L V R T Q
A S S I G - - I S R D S L HK K K R K Y E L G R Q P A NT K L - - S V R C R G G N I K HR A L R L DT G N F A WG S E NC T R K T R I L DV V Y NA S NN E L V R T K
A G S I G - - I S R DNWHK K K R K Y E L G R P A A NT K I - -G V R V R G G NK K Y R A L R L DV G N F S WG S E C C T R K T R I I DV V Y NA S NN E L V R T K
A G S V G - - I S R D S R HK K K R A F E K G R QA A MT K L V S G I R V R G G N F K F R A L R L S E G N F S WG S QG I A K K A K I V E V V Y H P S NN E L V R T K
S G V V G - - I S R D S R HK K K R K F E L G R Q S A NT K I - -G V R T R G G NQK F R A L R V E T G N F S WG S E G V S R K T R I A T V V Y H P S NN E L V R T N
A S A I G - - I S R DG R HK K K R K Y E L G R P P S NT K L - -G V R G R G R NY K Y R A I K L D S G S F S W P T F G I S K NT R I I DV V Y NA S NN E L V R T K
A S A I G - - I S R DG R HK K K R K Y E L G R P P S NT K L - -G V R G R G K N L K Y R A I K L D S G S F S W P A F G V S K I T R I I DV V Y NA S NN E L V R T K
A S A I G - - I S R DG R HK K K R K Y E L G R P P S NT K L - -G V R G R G R NY K Y R A I K L D S G S F S W P A F G I S K MT R I I DV V Y NA S NN E L V R T K
A NA I G - - I S R D S MHK K K R K Y E L G R Q P A S T K L - - S I R V R G G NV K WR A L R L DT G NY S WG S E A V T R K T R I L DV V Y NA S NN E L V R T Q
A G S I G - - I S R DNWHK K K R K Y E L G R P A A NT K I - -G V R V R G G NK K Y R A L R L DV G N F S WG S E C C T R K T R I I DV V Y NA S NN E L V R T K
S G V V G - - I S R D S R HK K K R K F E L G R Q P A NT K I - -G V R T R G G NK K Y R A L R I E T G N F S WA S E G I S K K T R I A G V V Y H P S NN E L V R T N
A G T V G - - I T R D S R HK K K R K F E L G R Q P S NT R I - -G V R V R G G NK K F R A L R L D S G N F S WG S E G V S K K T R I I QV A Y H P S NN E L V R T N
A S T I G G R I P DDT T R K A HY A L P L A R K K G A K L L - -G V R C MG G N I K R R A L R L DNG N F S WG S E HT T R K T R I I DV V Y NA S NN E L V R T K
A G S I G - - I S R DNWHK K K R K Y E L G R P P A NT K L - -G V R V R G G NK K Y R A L R L DV G N F S WG S E C C T R K T R I I DV V Y NA S NN E L V R T K
A G S V G - - I S R D S K HK K K R A F E K G R P I S MT K L - -T V R V R G G H L K F R A L R L C E G N F S WG S E N I T R K T K I L DV K Y NA T NN E L V R T K
A P S I G - - I S R D S R HK K K R K Y E L G R P S S NT K L - -G V R C R G G N L K F R A L R L D S G N F S WG S QNV T R K T R V MDV V Y NA S S N E L V R T K
A P S I G - - I S R D S R HK K K R K Y E L G R P S S NT K L - -G V R C R G G N L K F R A L R L D S G N F S WG S QNV T R K T R V MDV V Y NA S S N E L V R T K
A P A V G - - I T R MG D L K K K R N F L A G R P S A QT R I - -G V R V R G G N L K MR A L R L E T G T F A WA S E NC T R K T R I L NV T Y H P A DND L V R T N
A P A I G - - I V R S R L HK K R MK A E L G R L P A NT K L - -G V R A R G G N F K L R G L R L DT G N F A WG T E A S A QR A R I L DV V Y NA T S N E L V R T K
A P A I G - - I V R S R L HK K R MK A E L G R L P A HT K L - -G V R A R G G N F K L R G L R L DT G N F A WG T E A I A QR A R I L DV V Y NA T S N E L V R T K
A G T V G - - I T R D S R HK K K R A F E L G R QA A NT R I - -G V R V R G G N L K HR A L R L E S G N F A WG S E H I T A K T R V L G V V Y NA S NN E L V R T N
S G V V G - - I S R D S R HK K K R K F E C G R QG A V T R I - -G V R T R G G NK K F R A I R I E T G N F S WG S E G T T R K T R V L G V S F H P S NN E L I R T N
1500
Anopheles_gambiae/1-3220
Arabidopsis_thaliana/1-3286
Ashbya_gossypii/1-3281
A_fumigatus/1-3241
Aspergillus_niger/1-3300
Bombyx_mori/1-2389
Bos_taurus/1-3273
C_briggsae/1-3231
Caenorhabditis_elegans/1-3289
Candida_albicans/1-3266
Candida_glabrata/1-3282
Canis_familiaris/1-3286
Ciona_intestinalis/1-2712
Cryptococcus_neoformans/1-3301
Cryptosporidium_hominis/1-3020
C_parvum/1-3281
Danio_rerio/1-3286
Debaryomyces_hansenii/1-3166
Dictyostelium_discoideum/1-3275
D_melanogaster/1-3287
Drosophila_pseudoobscura/1-3286
Encephalitozoon_cuniculi/1-3199
Entamoeba_histolytica/1-3271
Gallus_gallus/1-3286
Homo_sapiens/1-3286
Kluyveromyces_lactis/1-3281
Leishmania_major/1-3286
Macaca_mulatta/1-3182
Magnaporthe_grisea/1-3286
Methanosarcina_acetivorans/1-3043
Monodelphis_domestica/1-3196
Mus_musculus/1-3286
Neurospora_crassa/1-3264
Oryza_sativa/1-3218
Ostreococcus_lucimarinus/1-3287
Pan_troglodyte/1-3286
Paramecium_tetraurelia/1-3282
Pichia_stipitis/1-3290
P_falciparum/1-3285
P_knowlesi/1-3284
Plasmodium_yoelii/1-3285
Populus_trichocarpa/1-3286
Rattus_norvegicus/1-3286
Saccharomyces_cerevisiae/1-3284
Schizosaccharomyces_pombe/1-3284
Strongylocentrotus_purpuratus/1-3252
Takifugu_rubripes/1-2930
Tetrahymena_thermophila/1-3270
T_annulata/1-3248
Theileria_parva/1-3273
Trichomonas_vaginalis/1-3198
T_brucei/1-3264
Trypanosoma_cruzi/1-3266
Ustilago_maydis/1-3291
Yarrowia_lipolytica/1-3278
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
I
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
1510 1520 1530 1540 1550 1560
L V K NA I I V I DA S P F R QWY E S HY S K S N L R K Y V K R - - -QK NA K I D P A V E E Q F NA G R L L A C I S S R P G QV G R A DG Y I
L V K S A I V QV DA A P F K QG Y L QHY S NHV QR K L E MR - - -Q E G R A L D S H L E E Q F S S G R L L A C I A S R P G QC G R A DG Y I
L T K S A I V Q I DA T P F R QWY E S HY S R NA E R K WA A R - - -A G DA K I E G A V D S Q F S A G R L Y A C I S S R P G Q S G R C DG Y I
L T K S A V V Q I DA A P F R QWY E A HY S N S V V K K QA A R F - -A DHG K V E P A I E K Q F E S G R L Y A V I A S R P G Q S G R V DG Y I
L T K S A V V Q I DA A P F R QWY E A HY S N S V V K K QA A R F - -A E QG K V E S A V E R Q F E S G R L Y A V V S S R P G Q S G R V DG Y I
L V K NA I V V V DA T P F R QWY E S HY S QK T A R K Y L A R - - -QR L A K V E G A L E E Q F HT G R L L A C V A S R P G QC G R A DG Y I
L V K NC I V L I D S T P Y R QWY E S HY S K K I QK K Y D E R - - -K K NA K I S S L L E E Q F QQG K L L A C I A S R P G QC G R A DG Y V
L V K G A I V S V DA A P F R QWY E A HY S NHT L K K Y T E R - - -QK T A A V DA L L T E Q F NT G R L L A R I S S S P G QV G QA NG Y I
L V K G A I I S V DA A P F R QWY E A HY S HHT MK K Y T E R - - -QK T A A V DA L L I E Q F NT G R L L A R I S S S P G QV G QA NG Y I
L T K S A V V Q I DA T P F R QWY E NHY S R K V E R K L A A R - - - S G A A A I E S A V D S Q F G S G R L Y A V I S S R P G Q S G R C DG Y I
L T K A A I V Q I DA T P F R QWY E A HY S K S A E R K WA A R - - -A A S A K V E S A V D S Q F S A G R L Y A C I S S R P G Q S G R C DG Y I
L V K NC I V L I D S T P Y R QWY E S HY S K K I QK K Y D E R - - -K K NA K I S S L L E E Q F QQG K L L A C I A S R P G QC G R A DG Y V
L V K NA I I Q I D S T P F R QWY E A HY S K K T QK K Y E E R - - -K K E P K V A QA L E E Q F NQG R I L A C I S S R P G Q S G R C DG Y I
L V K G A V V D I DA T P F R QWY E S HY S NHV K R I L E E R - - -K K V A K I D P L L E QQ F R A G R L L A V I S S R P G Q S G R A DG Y I
L V K NA I V V I DA T P F R Q F Y L QR Y S G H L L A T R K A R - - - L MNNV I D P L V E E Q F G I G R L L A C V S S R P G QC G R C DG Y I
L V K NA I V V I DA T P F R Q F Y L QR Y S G H L L A T R K A R - - - L MNNV I D P L V E E Q F G I G R L L A C V S S R P G QC G R C DG Y I
L V K NC V V L V D S T P Y R QWY E S HY S K K V QK K F T L R - - -R K T A K I S P L L E E Q F L QG K L L A C I S S R P G QC G R A DG Y V
L T K A A I V Q I DA T P F K QW F E T HY S R K V E R K L A QR - - - S G A S N I E S A V E HQ F NA G R L Y A A I S S R P G Q S G R C DG Y I
L V K G S I V Q I DA T P Y K QWY E T HY S A S L L A K L A S R - - -A K G R V L D S A I E S Q I G E G R F F A R I T S R P G QV G K C DG Y I
L V K N S I V V I DA T P F R QWY E A HY S E K V MK K Y L E R - - -QK Y G K V E QA L E DQ F T S G R I L A C I S S R P G QC G R S DG Y I
L V K N S I V V I DA T P F R QWY E S HY S E K V MK K Y L E R - - -QK F G K V E QA L E DQ F T S G R I L A C I S S R P G QC G R S DG Y I
L T K S S V V K I S A E P F K ND I K - - - - - - - - - - - - - - - - -DV A R DV D P S L H E S F E K G H L Y A I I T S R P G QV G MA QG HV
L V K G A I I E I D P A P F R L W F L K F Y S K T MQK K Y A K K L E V L K NMK F D E A L L E G F Q S G R V L A C I S S R P G QT G S V E G Y I
L V K NC I V L V D S T P Y R QWY E A HY S K K I QK K Y D E R - - -K K NA K I A S I L E E Q F QQG K L L A C I A S R P G QC G R A DG Y V
L V K NC I V L I D S T P Y R QWY E S HY S K K I QK K Y D E R - - -K K NA K I S S L L E E Q F QQG K L L A C I A S R P G QC G R A DG Y V
L T K A A I V Q I DA T P F R QWY E S HY S K NT E R K WA A R - - -A A E A K I E HA V D S Q F G A G R L Y A A I S S R P G Q S G R C DG Y I
L V K NC I V A V DA A P F K R WY A K HY S P K L QR E WT R R - - -R R NHR V E K A I A DQ L R E G R V L A R I T S R P G Q S G R A DG I L
L V K NC I V L I D S T P Y R QWY E S HY S K K I QK K Y D E R - - -K K NA K I S S L L E E Q F QQG K L L A C I A S R P G QC G R A DG Y V
L T K S A V I Q I DA A P F R QWY E A HY S K S V E K K QA E R F - -A A R G K V D S A L E K Q F E A G R V F A V V S S R P G Q S G R C DG Y I
L T K G S V I R T S MG T - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -A R V T S R P G QDG V V NA V L
L V K NC I V L V D S T P Y R QWY E S HY S K K I QK K Y E E R - - -K K NA K I S P L L E E Q F QQG K L L A C I A S R P G QC G R A DG Y V
L V K NC I V L I D S T P Y R QWY E S HY S K K I QK K Y D E R - - -K K NA K I S S L L E E Q F QQG K L L A C I A S R P G QC G R A DG Y V
L T K S A V V Q I DA A P F R QWY E A HY S K S V E K K QA E R F - -A A A G K V D P A L E K Q F E A G R L Y A V I S S R P G Q S G R C DG Y I
L V K S A I V QV DA A P F K QWY L T HY S NHV V R K L E K R - - -QQT R T L D S H I E E Q F G S G R L L A C I S S R P G QC G R A DG Y I
L V K S A V I A V DA A P F R A WY A QHY S K S V T MK L R S R - - -NQK H E V A K A I D E Q F A T G R L L A I I T S R P G QC G R A DG Y V
L V K NC I V L I D S T P Y R QWY E S HY S K K I QK K Y D E R - - -K K NA K I S S L L E E Q F QQG K L L A C I A S R P G QC G R A DG Y V
L T R G V I V QV DA T P F R QWY A K K Y S R S L I K K L E QR - - -A K DNA I DA L V Q E Q F T NQR L L V R I T S R P G Q S G R A DG Y I
L T K A A I V Q I DA T P F R QWY E NHY S R K V E R K L A S R - - -A G QA A I E S A V DA Q F G S G K L Y A A I S S R P G Q S G R C DG Y I
L V K NC I V V I D S H P F T T WY E NT F T Y G V I K K I - - - - - -G K S K N I D P L L L E Q F K QG R V L A C I S S R P G QC G K A DG Y I
L V K NC I V L I D S H P F T T WY E NT F S Y S V I K K I - - - - - -G K S K Q I D P A L L E Q F K QG R V L A C I S S R P G QC G K A DG Y I
L V K NC I V L I D S H P F T A WY E NT F S Y S V I K K I - - - - - -G K A K Q I D P A L L E Q F K QG R V L A C I S S R P G QC G K A DG Y I
L V K S A I V QV DA A P F K QWY L QHY S NHV I R K L E K R - - -QQV R K L D P H I E E Q F G S G R L L A S I S S R P G QC G R A DG Y I
L V K NC I V L I D S T P Y R QWY E S HY S K K I QK K Y D E R - - -K K NA K I S S L L E E Q F QQG K L L A C I A S R P G QC G R A DG Y V
L T K A A I V Q I DA T P F R QW F E A HY S K NA E R K WA A R - - -A A S A K I E S S V E S Q F S A G R L Y A C I S S R P G Q S G R C DG Y I
L T K S A I V Q I DA A P F R V WY E T HY S K HV QR K H S A R - - - L G D S K V D S A L E T Q F A A G R L Y A V V S S R P G Q S G R C DG Y I
L V K NA I V Q I D S T P F R QWY E A HY S K K V V K K F E E R - - -K K T A K V A QA L E E Q F G T G R L L A C I A S R P G QC G R A DG Y I
L V K NC I I L V D S L P F R QWY E A HY S K K T QK K Y D E R - - -K K T A K I S T L L E E Q F QQG K L L A C I A S R P G QC G R A DG Y I
L V K N S I V E I D S T P F R E WY K L HY S R HV QK R V -K R - - -T K A QA L E K N I E E Q F V S QR I L A C I T S R P G Q S G R A DG Y I
L V K NA I V T V D P T P F K L W F K T HY S E K V - - - - - - - - - - - -A G L V P K T L L E Q F S S G R L L A C I S S R P G QC G R C DG Y V
L V K NA I V T V D P T P F K L W F K T HY S E K V - - - - - - - - - - - -A A L V P R T L L DQ F S S G R L L A C I S S R P G QC G R C DG Y V
L A R G S V V S I DA A P F K QWY E R Q F T DK MT QR WA A N - - -K DG G V V A P E L V A E F DQG R L L A V I T S R P G QC G R A DG Y I
L V K NC I V V V DA A P F R L WY A K HY S S K L K R K W E Y R - - -R K HHK I E K A L A DQ L R E G R L L A R I T S R P G QT G R A DG A L
L V K NC I V V V DA A P F K L WY A K HY S D E L K R K WM L R - - -R E NHK I E K A V A DQ L K E G R L L A R I T S R P G QT A R A DG A L
L V K G C I V QV DA T P F R QA Y E K HY S NNV T R K L E NR - - -R K E G K L D S L V E QQ F G A G R L Y A A V S S R P G Q S G R C DG Y I
L T K S A I V Q I DA T P F R QWY E S Y Y A E A DQA A V A A R - - -QA DA K L D P A V E A Q F G A G R L Y A C V S S R P G Q S G R V DG Y V
1570
L EGK E L E FY
L EGK E L E FY
L EG E E LA FY
L EG E E LA FY
L EG E E LA FY
L EGK E L E FY
L EGK E L E FY
L EGK E LD FY
L EGK E LD FY
L EG E E LA FY
L EG E E LA FY
L EGK E L E FY
L EGK E L E FY
L EGK E L E FY
L EGK E L E FY
L EGK E L E FY
L EGK E L E FY
L EG E E LA FY
L EAK E L E FY
L EGK E L E FY
L EGK E L E FY
L QG D E L K F Y
L EGK E LD FY
L EGK E L E FY
L EGK E L E FY
L EG E E LA FY
L EGA E LQ FY
L EGK E L E FY
L EG E E LA FY
I E------L EGK E L E FY
L EGK E L E FY
L EG E E LA FY
L EGK E L E FY
L EGK E L E FY
L EGK E L E FY
L EGK E L E FY
L EG E E LA FY
I EGD E L L FY
I EGD E L L FY
I EGD E L L FY
L EGK E L E FY
L EGK E L E FY
L EG E E LA FY
L EG E E LH FY
L EGK E LD FY
L EGK E L E FY
L EGK E L E FY
L EG E E LN FY
L EG E E LN FY
L EG E E LA FY
L EGA E LQ FY
L EGA E LQ FY
L EGK E L E FY
L EG E E LA FY
1580
Anopheles_gambiae/1-3220
Arabidopsis_thaliana/1-3286
Ashbya_gossypii/1-3281
A_fumigatus/1-3241
Aspergillus_niger/1-3300
Bombyx_mori/1-2389
Bos_taurus/1-3273
C_briggsae/1-3231
Caenorhabditis_elegans/1-3289
Candida_albicans/1-3266
Candida_glabrata/1-3282
Canis_familiaris/1-3286
Ciona_intestinalis/1-2712
Cryptococcus_neoformans/1-3301
Cryptosporidium_hominis/1-3020
C_parvum/1-3281
Danio_rerio/1-3286
Debaryomyces_hansenii/1-3166
Dictyostelium_discoideum/1-3275
D_melanogaster/1-3287
Drosophila_pseudoobscura/1-3286
Encephalitozoon_cuniculi/1-3199
Entamoeba_histolytica/1-3271
Gallus_gallus/1-3286
Homo_sapiens/1-3286
Kluyveromyces_lactis/1-3281
Leishmania_major/1-3286
Macaca_mulatta/1-3182
Magnaporthe_grisea/1-3286
Methanosarcina_acetivorans/1-3043
Monodelphis_domestica/1-3196
Mus_musculus/1-3286
Neurospora_crassa/1-3264
Oryza_sativa/1-3218
Ostreococcus_lucimarinus/1-3287
Pan_troglodyte/1-3286
Paramecium_tetraurelia/1-3282
Pichia_stipitis/1-3290
P_falciparum/1-3285
P_knowlesi/1-3284
Plasmodium_yoelii/1-3285
Populus_trichocarpa/1-3286
Rattus_norvegicus/1-3286
Saccharomyces_cerevisiae/1-3284
Schizosaccharomyces_pombe/1-3284
Strongylocentrotus_purpuratus/1-3252
Takifugu_rubripes/1-2930
Tetrahymena_thermophila/1-3270
T_annulata/1-3248
Theileria_parva/1-3273
Trichomonas_vaginalis/1-3198
T_brucei/1-3264
Trypanosoma_cruzi/1-3266
Ustilago_maydis/1-3291
Yarrowia_lipolytica/1-3278
1590 1600 1610 1620 1630 1640 1650
L K K I K NK K MV L A D L G R K I T NA L H S L S K A T I I N E E -V L D S M L K E I C T A L L E A DV N I R L V K K L R E NV R Q S A V F K E L V K L V I M F V G
MK K L QK K K MV L A E L G G R I T R A I QQM S NV T I I D E K -A L N E C L N E I T R A L L Q S DV S F P L V K E MQ S N I K E QA I F S E L C K MV V M F V G
V R R L T A K K MV L A D L G K R I NA A V A QA L NNDT DDY V A G V E T M L K A I V T A L L E NDV N I K L V S S V R S N I K QK T V F E E L C A L V V M F V G
QR A I R K - -MV L QD L G R R I NA A V ND L T R S S N L D E K -A F DDM L K E I C A A L L S A DV NV R L V QT L R K S I K QK A V F D E L V A L V I M F V G
QR A I R K - -MV L QD L G R R I NA A V ND L T R S NN L D E K QA F DDM I K E I C A A L L S A DV NV R L V Q S L R K S I K QK A V F D E L V S L V I M F V G
LR K I K SK R - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - L R K I K A R K MV L A D L G R K I T S A L R S L S NA T I I N E E -V L NA M L K E V C T A L L E A DV N I K L V K Q L R E NV K QHA V F K E L V K L V I M F V G
L R K I R A K K MV L A D L G R K I R NA I G K L G QNT V I N E E - E L D L M L K E V C T A L I E S DV H I R L V K Q L K DNV K QK T V F N E L L K L V F M F V G
L R K I R A K K MV L A D L G R K I R NA I G K L G Q S T V I N E G - E L D L M L K E V C T A L I E S DV H I R L V K Q L K DNV K QK T V F N E L L K L V F M F V G
L R R L T A K K MV L A D L G S R L R G A L S S V E S G S - - -DD - E I QQM I K D I C S A L L E S DV NV K L V A K L R G N I K QK I I F D E L C A L V I M F V G
L R R L T A K K MV L A D L G K R I NNA V N S A L S NT E DDY V N S I DG M L K G I S T A L L E A DV N I M L V S K V R NN I R QK T V F D E L C G L I I M F V G
L R K I K A R K MV L A D L G R K I T S A L R S L S NA T I I N E E -V L NA M L K E V C T A L L E A DV N I K L V K Q L R E NV K QHA V F K E L V K L V I M F V G
QR K I K A R K MV L A D L G R K I NNA L R S L S NA T I I N E E -V L Q S M L S E I C R A L L E S DV N I R L V K K L R E NV R Q S A V F R E L V K L V I M F V G
HHK L Q I R K MV L A D L G T R L HG A WNQ L S K A S V I DDK -V I DG V L K E L C A A L L E S DV NV K L V A S L R T K V K QK A V F D E L V A L V L MA V G
K K K L EK K K - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - K K K L E K K K MV L A D L G G Q L A S A I R K F Q S S T I A D E A -A I D L C L K E I A T A L L K A DV NV K L V A Q L R NN I K Q S A V V E E L V N I V I V F V G
L R K I K A K K MV L A D L G R K I T S A L R S L S NA T I I N E E -V L NA M L K E V C A A L L E A DV N I K L V K Q L R E NV K QHA V F K E L V K L V I M F V G
L R R L T A K K MV L A D L G S R L R G A L S NV E S G S - - - E T - E I Q S M I K D I C NA L L E S DV N I K L V A K L R DN I K QK I V F D E L C S L V I M F V G
QR R L QK K K MV L A D L G NQ L S S A L R S L N E T T I V N E D -T I NQ L L K E V G NA L S K S DV S M S L I I QMR K N I K K QV V F D E L I R L I V M F V G
L K K I K S K K MV L A D L G R K I T T A L H S L S K A T V I N E E -A L N S M L K E I C A A L L E A DV N I R L V K Q L R E NV R Q S A V F K E L V K L V V M F V G
L K K I K S K K MV L A D L G R K I T T A L H S L S K A T V I N E E -A L N S M L K E I C A A L L E A DV N I R L V K Q L R E NV R Q S A V F K E L I K L I I M F V G
A DK F NK K S -M I T E L G R S I T NT L S N L L S S P A T DQH - - I E T A I R E I C N S L I L S NV N P R Y V S D L R D E L R QNA V Y E R L V D L V V V F V G
S K K I S DK K MV L S Q L G S S L V T A L R K MT S S T V V D E E -V I NT L L K E I E T S L L G E DV N P I F I R QMV NN I K K D S V F E E L I N L V L MMV G
L R K I K A R K MV L A D L G R K I T S A L R S L S NA T I I N E E -V L NA M L K E V C T A L L E A DV N I K L V K Q L R E NV K QHA V F K E L V K L I I M F V G
L R K I K A R K MV L A D L G R K I T S A L R S L S NA T I I N E E -V L NA M L K E V C T A L L E A DV N I K L V K Q L R E NV K QHA V F K E L V K L V I M F V G
L R R L T A K K MV L A D L G K R I NNA V T NA I S N E QT DY E T T V Q S M L K E I A T A L L E NDV N I R L V S R L R E N I K QK T V F D E L C N L V I M F V G
L K R L E K K K MV L A E L G QK I G QA I HR M S A K S M L G E D -DV K E L MN E I A R A L L QA DV NV T I V K K L QV S I R QNA V F NG L K R I V V M F V G
L R K I K A R K MV L A D L G R K I T S A L R S L S NA T I I N E E -V L NA M L K E V C T A L L E A DV N I K L V K Q L R E NV K QHA V F K E L V K L V I M F V G
QR K L HK - -MV L QD L G R R I NA A V S D L T R A P N L D E K -A - - - - - -K I C A A L L E A DV NV R L V G Q L R K S I K QK A V F D E L V S L V I M F V G
- - - - - - - -MV M E K L G D S L QG A L K K L I G A G R I D E R -T V N E V V K D I QR A L L QA DV NV K L V MG M S QR I K I R I V Y Q E L M E I T I MMV G
L R K I K A R K MV L A D L G R K I T S A L R S L S NA T I I N E E -V L NA M L K E V C T A L L E A DV N I K L V K Q L R E NV K QHA V F K E L V K L V I M F V G
L R K I K A R K MV L A D L G R K I T S A L R S L S NA T I I N E E -V L NA M L K E V C T A L L E A DV N I K L V K Q L R E NV K QHA V F K E L V K L V I M F V G
QR K L HK - -MV L QD L G R R I NA A V S D L T R A P N L D E K -A F DG M L K E I C S A L L E A DV NV R L V G Q L R K S I K QK A V F D E L V R L V I M F V G
MK K L QR K K MV L A Q L G G S I S R A L A QM S NA T V I D E K -V L S DC L N E I S R A L L Q S DV Q F K MV R DMQ S N I K QQA V F T E L C NMV V M F V G
QK K MMK K K MV L ND L G NK I A S A L R S L NA HV V V D E E - L L DA C L K D I T NA L L A S DV A V P L V V R MK K N I V E R A V F K E L T A L V V M F V G
L R K I K A R K MV L A D L G R K I T S A L R S L S NA T I I N E E -V L NA M L K E V C T A L L E A DV N I K L V K Q L R E NV K QHA V F K E L V K L V I M F V G
I K K V E QK K MV L A E L G K S I NA A L QK L S K A P V V D E A - L V DQ I L G E I A MA L L K A DV NA K F I K K L R E DV K QK A V V DG L T R MV I M F V G
L R R L T A K K MV L A D L G S R L R G A L S S V E S A S - - -D E - E I NQM I K DV C T A L L E S DV N I K L V V K L R DN I K QK I I Y D E L V G L I I M F V G
K R K MDK K K MV L T E L G T Q I T NA F R K L QT S T L A DDV -V I E E C L K E I I R A L I L S D I NV S Y L K D I K S N I K QK Y V V E E L I K L V I L F V G
K R K MDK K K MV L T E L G T Q L T S A L QK L QA S A V A DD S -A I E E C L K E V I R A L I L A D I N I S Y L K D I K S N I K QQY V V E E L I N L V I L F V G
K R K MDK K K MV L T E L G A Q L T S A L QK I QA A P V A DDN -V I E E C L K E I V R A L I L A D I NV I Y L K D I K S N I K QK Y V V E E L I K L V I L F V G
MK K I QR K K MV L A Q L G G S I S R A I QQM S NA T I I D E K -A L NDC L N E I T R A L L Q S DV Q F K L V R DMQT N I K QQA I F N E L C K I V V M F V G
L R K I K A R K MV L A D L G R K I T S A L R S L S NA T I I N E E -V L NA M L K E V C T A L L E A DV N I K L V K Q L R E NV K QHA V F K E L V K L V I M F V G
L R R L T A K K MV L A D L G K R I N S A V NNA I S NT QDD F T T S V DV M L K G I V T A L L E S DV N I A L V S K L R NN I R QK T V F D E L C K L I I M F V G
L R R MA P K K MV F A D L G R R L N S A L G D F S K A T S V N E E - L V DT L L K N I C T A L L E T DV NV R L V Q E L R S N I K QK A V F D E L C S L V I MMV G
MR K MR A K K MV L A D L G R K I T S A L K S L S NA T I I D E D -V L N S M L N E I C R A L L E A DV N I R L V K A L K E NV K QT A V F K E L V K L I I M F V G
L R K I K A K K MV L A D L G R K I T S A L R S L S NA T I I N E E -V L NA M L K E V C A A L L E A DV N I K L V K Q L R E NV K QHA V F K E L V K L V I M F V G
I R K L Q S K K MV L A D L G K R I NNA L QQ L NK A P V I D E E - L L NQV L K E I Q L A L L Q S DV NV K Y V A K L K S N I I QQA V V Q E L T QMV V M F V G
R R R MDK K K MV L A E L S NQ I T QA F R K L H S T T V I S E A -V I E E V I G D I V R A L L MA DV NV K L V HK L K E NV K QK I V V D E L V NMV I M F V G
R R R MDK K K MV L A E L S NQ I T K A F R K L H S T T V I S E A -V I E E V I G D I V R A L L MA DV NV K L V HK L K E NV K QK I V V D E L V NMV I M F V G
S DK I A K K K -M L QD L G E K L MG S I K K L S E S K T I D E K -V Y V T F MA E V A K S L I A A DC S K E I V F D F S R R L K E K A V F N E L V K L I F MMV G
L K K L DK K K MV L A E L G QK I G A A I S K M S S K S F V G E D -DV K E F L N E V A R A L L QA DV NV K T V K E L QQNV R QT A V F NG I K K M I V M F V G
L K K L E K K K MV L A E L G QK I G G A I S K M S S K P L L G E D -DV K E F L N E V A R A L L QA DV HV T T V K E L QQT I R QT A V F S G L R K I I V M F V G
V R R L K A S K MV L S D L G R R I N S A F QD L S K V P T V DA A - S I DQ L L K S V C NA L I E A DV NV K L V A N L R S QV K QK A V F DH L V A L V I M F V G
L K K I V S K K MV L E D L G K R I NG A F A N L S K G G D I D E - -A L DA M L K E V C S A L L E S DV N I K L V S Q L R QK V K QK A L F D E L V N L V V M F V G
1670
1680 1690 1700 1710 1720 1730 1740 1750
1760 1770 1780 1790
S G R HK Q E E S L F E E M L A V A NA V N P DN I I F V MDA T I G QA C E A QA K A F K E K V D I G S V
S G R HK Q E A S L F E E MR QV A E A T K P D L V I F V MD S S I G QA A F DQA QA F K Q S V A V G A V
S G R HHQ E DA L F Q E MV E I A Q E V K P NQT I MV L DA S I G QA A E QQ S R A F K E A A D F G A I
S G R HK Q E E E L F T E MT Q I QNA V T P DQT I L V L D S T I G QA A E A Q S A A F K A T A N F G A I
S G R HK Q E E E L F T E MT Q I QT A V T P DQT I L V L D S T I G QA A E A Q S S A F K A T A D F G A I
- - - - - - - - - - - - - - - - - - - - - - P DN I I F V MDA T I G QA C E A QA R A F K DK V D I G S V
S G R HK Q E D S L F E E M L QV A NA I Q P DN I V Y V MDA S I G QA C E A QA K A F K DK V DV A S V
S G R HK Q E A S L F E E M L QV S NA V T P DNV V F V MDA S I G QA C E A QA R A F S QT V DV A S V
S G R HK Q E A S L F E E M L QV S NA V T P DNV V F V MDA S I G QA C E A QA R A F S QT V DV A S V
S G R HR Q E E Q L F T E MV Q I G E A V Q P T QT I MV MDG S I G QA A E S QA R A F K E S S N F G S I
S G R HHQ E E E L F H E MV Q I S NV I K P NQT I MV L DA S I G QA A E QQ S K A F K E S S D F G A I
S G R HK Q E D S L F E E M L QV A NA I Q P DN I V Y V MDA S I G QA C E A QA K A F K DK V DV A S V
S G R HK Q E D S L F E E M L QV Y NA V A P DNV I F V MDA S I G QA C E G QA R A F K E K V DV A S V
S G R HK Q E S E L F E E MV A I G A A V K P DMT L MV L DA S I G QA A E G Q S R A F K D S A D F G A I
- - - - - - - - - - - - -M E QV V M E T N P DDV V F V MD S H I G QA C Y DQA MA F C NA V DV G S V
S G R HK Q E S S L F V E M E QV V M E T N P DDV V F V MD S H I G QA C Y DQA MA F C NA V DV G S V
S G R HK Q E D S L F E E M L QV S NA V Q P DN I V Y V MDA S I G QA C E A QA K A F K DK V DV A S V
S G R HK Q E Q S L F N E M I Q I S E M I V P T QT I MV MDG S I G QA A E S QA K A F K E S S Q F G S I
S G R HK QD S E L F E E MK Q I E T A V K P DNC I F V MD S S I G QA A Y E QA T A F R S S V K V G S I
S G R HK Q E E S L F E E M L A V S NA V S P DN I I F V MDA T I G QA C E A QA K A F K DK V D I G S V
S G R HK Q E E S L F E E M L A V A NA V N P DN I I F V MDA T I G QA C E A QA K A F K DK V D I G S V
S G R HT Q E T E L F T E MK D I I R E I S P S S I V F V MDA G I G Q S A E DQA MG F K R A V DV G S I
S G R HK QDK E L F K E MQ S V R DA I K P D S I I F V MDG A I G QA A F G QA K A F K DA V E V G S V
S G R HK Q E D S L F E E M L QV A NA I Q P DN I V Y V MDA S I G QA C E A QA K A F K DK V DV A S V
S G R HK Q E D S L F E E M L QV A NA I Q P DN I V Y V MDA S I G QA C E A QA K A F K DK V DV A S V
S G R HQQ E D S L F Q E MV E I S QA V K P K QT I MV L DA S I G QA A E HQ S K A F K E S A D F G S I
S G R HK Q E S A L F E E MK QV E E A V K P ND I V F V M S A T DG QA V E E QA R N F K E MV A V G S V
S G R HK Q E D S L F E E M L QV A NA I Q P DN I V Y V MDA S I G QA C E A QA K A F K DK V DV A S V
S G R HR Q E S A L F Q E MMD I QK A V K P D E T I MV L DA S I G QQA E A QA K A F K E A A D F G A I
A G R HA L E A D L I E E M E R I HA V A K P DHK F MV L DA G I G QQA S QQA HA F ND S V G I T G V
S G R HK Q E D S L F E E M L QV A NA I Q P DN I V Y V MDA S I G QA C E A QA K A F K DK V DV A S V
S G R HK Q E D S L F E E M L QV S NA I Q P DN I V Y V MDA S I G QA C E A QA K A F K DK V DV A S V
S G R HR Q E E A L F Q E MMD I QT A V K P D E T I MV L DA S I G QQA E A QA K A F K E A A D F G A I
S G R HK Q E A A L F E E MR QV S E A T K P D L V I F V MD S S I G QA A F DQA QA F K Q S V S V G A V
S G R HK Q E E A L F E E MR E I A S V T E P T MT I F V MD S S I G Q S A S DQA K A F A S T V DV G G V
S G R HK Q E D S L F E E M L QV A NA I Q P DN I V Y V MDA S I G QA C E A QA K A F K DK V DV A S V
S G R HK Q E ND L F E E MK QV E A A V K P DD I V F V MD S S I G QA C F DQA L A F K K A V NV G S V
S G R HR Q E HQ L F Q E MV Q I G E M I Q P T QT I MV MDG S I G QA A E S QA K A F K E S S N F G S I
S G R HK Q E N E L F E E M I QV E N S I Q P E E I I F V I D S H I G Q S C HDQA MA F K N S V S L G S I
S G R HK Q E S E L F E E MK QV E S S I N P E E I V F V I D S H I G Q S C HDQA MA F K N S V T L G S I
S G R HK Q E ND L F E E MK QV E N S I K P E E I V F V I D S H I G Q S C HDQA MA F K N S V K V G S I
S G R HK Q E A A L F E E MR QV S E A T K P D L I I F V MD S S I G QA A F DQA QA F K QMV A V G A V
S G R HK Q E D S L F E E M L QV S NA I Q P DN I V Y V MDA S I G QA C E A QA K A F K DK V DV A S V
S G R HHQ E E E L F Q E M I E I S NV I K P NQT I MV L DA S I G QA A E QQ S K A F K E S S D F G A I
S G R HQQ E Q E L F A E MV E I S DA I R P DQT I M I L DA S I G QA A E S Q S K A F K E T A D F G A V
S G R HK Q E D S L F E E M L QV A NV T S P DN I I F V MDA S I G QA C E S QA K A F K E K V DV A S V
S G R HK Q E D S L F E E M L QV S NA V Q P DN I V Y V MDA S I G QA C E S QA K A F K DK V DV A S V
S G R HK Q E S E L F D E MK QV QA A V N P D E C I F V MDG S I G QA C Y DQA QA F R NA V NV G S V
S G R HK Q E DA L F D E MK L I Y DA V Q P D E V V F V MD S H I G QA C Y DQA S A F NK A V DV G S V
S G R HK Q E DA L F D E MK L I Y DA V Q P D E V V F V MD S H I G QA C Y DQA A A F NK A V DV G S V
S G R HMQ E E A L F A E MK A L A A A V N P H E I I F V MDG T I G QA A Y DQA L G F K NA V G V G S I
S G R HK Q E S A L F E E MK QV QQA V K P ND I V F V M S A T DG QG I E E QA R Q F K E K V P I G S V
S G R HK Q E S A L F E E MK QV Q E A V K P ND I V F V M S A T DG QG I R E QA R Q F K E K V P V G S V
S G R HK Q E Q E L F D E MR E I DT A V T P D L T I MV L DA N I G QA A E A Q S R A F K QA A G Y G A I
S G R HR Q E S E L F T E MV D I G A A V K P D S T I MV L DA S I G QA A E P Q S R A F K DA S D F G S I
1800
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
1810
I T K L DG HA K G G G A
I T K MDG HA K G G G A
L T K MDG HA K G G G A
I T K T DG HA A G G G A
I T K T DG HA A G G G A
I T K L DG HA K G G G A
V T K L DG HA K G G G A
I T K L D S HA K G G G A
I T K L D S HA K G G G A
L T K MDG HA K G G G A
L T K MDG HA K G G G A
V T K L DG HA K G G G A
V T K L DG HA K G G G A
V T K L DG HA K G G G A
I T K L DG HA K G G G A
I T K L DG HA K G G G A
V T K L DG HA K G G G A
L T K MDG HA R G G G A
I T K MDG N S MG G G A
I T K L DG HA K G G G A
I T K L DG HA K G G G A
L T K I DG T T K A G G A
I T K L DG H S NG G G A
V T K L DG HA K G G G A
V T K L DG HA K G G G A
L T K MDG HA R G G G A
V T K L DC QT K G G G A
V T K L DG HA K G G G A
I T K T DG HA S G G G A
I T K L DG T A K G G G A
V T K L DG HA K G G G A
V T K L DG HA K G G G A
I T K T DG HA A G G G A
V T K MDG HA K G G G A
MT K L DG HA K G G G A
V T K L DG HA K G G G A
I T K L DG HA K G G G A
L T K MDG HA K G G G A
I T K I DG HA K G G G A
I T K I DG HA K G G G A
I T K I DG HA K G G G A
I T K MDG HA K G G G A
V T K L DG HA K G G G A
L T K MDG HA R G G G A
I T K L DG HA K G G G A
I T K L DG HA K G G G A
V T K L DG HA K G G G A
I T K L DG HA K G G G A
I T K L DG HA K G G G A
I T K L DG HA K G G G A
I T K L D S NA K G G G A
V T K L DG QA K G G G A
I T K L DG HA K G G G A
V T K L DG HA K G G G A
L T K MDG HA K G G G A
1820
L SAV AAT N S P I I F I G
L SAV AAT K S PV I F I G
I SAV AAT K T PV I F I G
I S A V A A T HT P I I F L G
I S A V A A T HT P I I Y L G
L SAV AAT Q S P I I F I G
L SAV AAT K S P I I F I G
L SAV AV T K S PV I F I G
L SAV AV T K S PV I F I G
I SAV AAT K T P I V F I G
I S A V A A T NT P I A F I G
L SAV AAT K S P I I F I G
L SAV AAT Q S P I I F I G
I SAV AAT K T P I I F LG
L SAV AAT GA P I I F I G
L SAV AAT GA P I I F I G
L SAV AAT K S P I I F I G
I SAV AT T K T P I V F I G
I S A V A A T NT P I I F I G
L SAV AAT Q S P I I F I G
L SAV AAT Q S P I I F I G
I S SV AAT K C P I E FV G
L SAV AAT K S P I I F I G
L SAV AAT K S P I I F I G
L SAV AAT K S P I I F I G
I S A V A S T NT P I I F I G
L SAV AAT R S P I V F I G
L SAV AAT K S P I I F I G
I S A V A A T HT P I V F I G
L SAV S ET K A P I A F I G
L SAV AAT K S P I I F I G
L SAV AAT K S P I I F I G
I S A V A A T HT P I V F I G
L SAV AAT K S PV I F I G
I SAV S ET K A P I L F I G
L SAV AAT K S P I I F I G
L SAV AAT E S P I V F I G
I SAV AAT K T P I V F I G
L SAV AAT GC P I T F I G
L SAV A ST GC P I T F I G
L SAV SA I GC P I T F I G
L SAV AAT K S PV I F I G
L SAV AAT K S P I I F I G
I S A V A A T NT P I I F I G
L SAV AAT K T P I V F I G
L SAV AAT K S PV I F I G
L SAV AAT R S P I I F I G
L SAV AAT E S P I I F I G
L SAV SAT N S P I I F I G
L SAV SAT N S P I I F I G
L SAV AAT N S P I S F I G
L A A V A MT K S P I V F I G
L A A V A MT K S P I V F I G
I SAV AAT K T P I MF I G
I S A V A A T NT P I I F I G
1830
1840 1850 1860 1870 1880 1890 1900
T G E H I DD L E P F K T K P F I S K L L G MG D I E G L I DK V N E L K - - - - L DDN E E L I DK I K HG Q - F T I R DMY E Q F QN I MK MG P F S Q I MG M I
T G E HMD E F E V F DV K P F V S R L L G MG DW S G F V DK L Q E V V P - - -K DQQ P E L L E K L S QG N - F T L R I MY DQ F QN I L NMG P L K E V F S M L
T G E HV HD F E K F S P K S F V S K L L G I G D I E S L L E Q F QT V S N - - -K E DT K A T M E N I QQG R - F T L L D F QK QMQT I MK MG P L S N L A S M I
T G E H L MD L E R F E P K A F V QK L L G MG DMA G L V E HV QA V T K - -D S A A A K E T Y K H I A E G I -Y T L R D F R E N I T S I MK MG P L S K L S G M I
T G E H L MD L E R F E P K A F I QK L L G MG DMA G L V E HV QA V T K - -D S A S A K E T Y K H I S E G I -Y T L R D F R E N I T S I MK MG P L S K L S G M I
T G E H I DD L E P F K T K P F I K K L L G MG D I E G L L DK V N E L K - - - - L DDND E L L E K L K HG Q - F T L R A MY E Q F QN I MK MG P F S Q I MG M I
T G E H I DD F E P F K T Q P F I S K L L G MG D I E G L I DK V N E L K - - - - L DDN E A L I E K L K HG Q - F T L R DMY E Q F QN I MK MG P F S Q I L G M I
T G E H I DD F E I F K P K S F V QK L L G MG D I A G L V DMV ND I G - - - - I S DNK E L V G R L K QG Q - F T L R DMY E Q F QN I MK MG P F S Q I MV M I
T G E H I DD F E I F K P K S F V QK L L G MG D I A G L V DMV ND I G - - - - I QDNK E L V G R L K QG Q - F T L R DMY E Q F QN I MK MG P F S Q I MG M I
T G E HV G D L E I F K P T T F I S K L L G I G D I QG L I E HV Q S L N L -HQD E G HK QT I E H I K E G K - F T L R D F QNQMNN F L K MG P L T N I A S M I
T G E H I HD L E K F S P K S F I S K L L G I G D I E G L F E Q L K T V S N - - -K E DT K A T M E N I QQG K - F T L L D F K K QMQT I MK MG P L S N I A QM I
T G E H I DD F E P F K T Q P F I S K L L G MG D I E G L I DK V N E L K - - - - L DDN E A L I E K L K HG Q - F T L R DMY E Q F QN I MK MG P F S Q I L G M I
T G E H I DD F E M F K T Q P F V R K L L NMG D I E G L I DK V N E L K - - - - L DDN E E L L DK L K HG Q - F T L R DMY E Q F QN I MK MG P F S Q I M S M I
T G E H L ND L E R F A P Q P F I S K L L G MG DMQG L V E HMQDMA R -A N P DR QK D L A K K L E QG K - F T I R DWR E Q L S N I MNMG S I S K I A S M I
T G E H F D E F E P F E T K G F V S R L L G L G D I S G L MA K I N E V V P - - - L DR Q P DMV NR L V QG I - F T L R DMY E Q F QNM L NMG S P S A L L S M I
T G E H F D E F E P F E T K G F V S R L L G L G D I S G L MA K I N E V V P - - - L DR Q P DMV NR L V QG I - F T L R DMY E Q F QNM L NMG S P S A L L S M I
T G E H I DD F E P F K T Q P F I S K L L G MG D I E G L I DK V N E L K - - - - L DDN E E L I DK L K HG Q - F T L R DMY E Q F QN I MK MG P F G Q I MG M I
T G E HA T D L E I F K P T S F I S K L L G I G D I Q S L I E HV Q S L N L -QDD E S HK K T I E N F K E G K - F T L R D F QT QMNN F MK MG P L T N I A S M I
T G E H L T D L E L F D P S T F V S K L L G Y G DMK G M L E K I K E V I P - - - - - E D S T S L K E I A QG K - F T L R S MQQQ F QQ I MQ L G P I DK L V QM I
T G E H I DD L E P F K T K P F V S K L L G MG D I E G L I DK V N E L K - - - - L DG ND E L L E K I K HG H - F T I R DMY E Q F QN I MK MG P F S Q F MNM I
T G E H I DD L E P F K T K P F V S K L L G MG D I E G L I DK V N E L K - - - - L DG N E E L L E K I K HG H - F T I R DMY E Q F QN I MK MG P F S Q I MNM I
T G E G MDD L E A F DA R R F V S R M L G MG DV E G L M E K V G S L G I - - - - -D E K E V V K K L R QG R - F T L G D F Y DQ F QK I L S L G P I S K L L E M I
T G E K V N E I E E F DA E S F V R K L L G MG D L K G I A K L A K D F A E - - -NA E Y K T MV K H L Q E G T - L T V R DWK E Q L S N L QK MG Q L G N I MQM I
T G E H I DD F E P F K T Q P F I S K L L G MG D I E G L I DK V N E L K - - - - L DDN E A L I E K L K HG Q - F T L R DMY E Q F QN I MK MG P F S Q I L G M I
T G E H I DD F E P F K T Q P F I S K L L G MG D I E G L I DK V N E L K - - - - L DDN E A L I E K L K HG Q - F T L R DMY E Q F QN I MK MG P F S Q I L G M I
T G E H I HD F E K F S P K S F V S K L L G I G D I E S L M E R F QT V S D - - -QDDA K NT L E N I QQG K - F T L L D F K NQMQT I MK MG P L S N I A NM I
T G E H F E D F D L F N P E R F V QK M L G MG D I G G L MDT MR DA N - - - - I DG N E E V Y K R L QDG L - F T MR DMY E H L QNV L K MG S V G K I M E M L
T G E H I DD F E P F K T Q P F I S K L L G MG D I E G L I DK V N E L K - - - - L DDN E A L I E K L K HG Q - F T L R DMY E Q F QN I MK MG P F S Q I L G M I
T G E HM L D L E R F A P QQ F I S K L L G MG DMA G L V E HV Q S L K L - - - - -DQK DT I K H I T E G I - F T I R D L R DQ L QN I MK MG P L S K MA G M I
V G E T P E D F E K F E A DR F I S R L L G MG D L K S L M E K A E E S L S - - - - - E E DV NV E A L MQG R - F T L K DMY K Q L E A MNK MG P L K Q I M S M L
T G E H I DD F E P F K T Q P F I S K L L G MG D I E G L I DK V N E L K - - - - L DDN E A L I E K L K HG Q - F T L R DMY E Q F QN I MK MG P F S Q I L G M I
T G E H I DD F E P F K T Q P F I S K L L G MG D I E G L I DK V N E L K - - - - L DDN E A L I E K L K HG Q - F T L R DMY E Q F QN I MK MG P F S Q I L G M I
T G E HM L D L E R F V P NN F I S K L L G MG DMA G L V E HV Q S L K L - - - - -DQK DT I K H I T E G I - F T I R D L R DQ L QN I MK MG P L S K MA G M I
T G E H I D E F E V F DV K P F V S R L L G MG DW S G F MDK I H E V V P - - -T DQQ P E L L QK L S E G T - F T L R L MY E Q F QN I L K MG P I G QV F S M L
T G E H I G E L E A F E T T S F V S K L L G MG D I K G L V E K MN E I V P - - - E E S A E K L M E A F G S G T - F T MR L L Y E Q F QN L QNMG P I S S I M S MV
T G E H I DD F E P F K T Q P F I S K L L G MG D I E G L I DK V N E L K - - - - L DDN E A L I E K L K HG Q - F T L R DMY E Q F QN I MK MG P F S Q I L G M I
E G E H F DD L E S F E A S S F V R R L L G L G D I NK L F Q S V K DV V N - - -MR DQ P Q L I QK L K E G K - F S I R D L QT Q F N S V L K L G S L NQ F M S A I
T G E H I G D L E I F K P T T F I S K L L G I G D I Q S L I E HV QG L N L -QND E NHK QT M E N I K E G K - F T L K D F QNQMNN F L K MG P L T N I A S M I
T G E HV ND F E K F E A K S F V S R L L G L G D I S G L V S T I K E V I D - - - I DK Q P E L MNR L S K G K - F V L R DMY DQ F QNV F K MG S L S K V M S M I
T G E H I ND F E K F E A K S F V S R L L G L G D I NG L V S T L K E V I D - - - I E K Q P Q L I NR L S K G K - F V L R DMY DQ F QNV F K MG S L S K V M S M I
T G E HV ND F E K F E A K S F V S R L L G L G D I DG L V S T L K E V I D - - - I E K Q P Q L I NR I A K G K - F V L R DMY DQ F QNV F K MG S L S K V M S M I
T G E HMD E F E V F DV K P F V S R L L G MG DW S G F MDK I H E V V P - - -MDQQ P E L L QK L S E G N - F T L R I MY E Q F QN L L K MG P I G QV F S M L
T G E H I DD F E P F K T Q P F I S K L L G MG D I E G L I DK V N E L K - - - - L DDN E A L I E K L K HG Q - F T L R DMY E Q F QN I MK MG P F S Q I L G M I
T G E H I HD L E K F S P K S F I S K L L G I G D I E S L F E Q L QT V S N - - -K E DA K A T M E N I QK G K - F T L L D F K K QMQT I MK MG P L S N I A QM I
T G E H I ND L E R F S P R S F I S K L L G L G D L E G L M E HV Q S L D F - - - - -DK K NMV K N L E QG K - F T V R D F R DQ L G N I MK L G P L S K MA S M I
T G E H I DD F E P F K T K P F I S K L L G MG D I E G L I DK V S E L N - - - - L DDN E E L I NK L K HG E - F T L R DMY E Q F QN I MK MG P F G Q I MG M I
T G E H I DD F E P F K T Q P F I S K L L G MG D I E G L I DR V ND L K - - - - L DDN E E L I DK L K HG Q - F T L R DMY E Q F QN I MK MG P F G Q I MG M I
T G E H F E D L E P F N P E S F V K R L L G L G D I K G M I T T V T E A V D - - -M E T QG K A I A N I T K G Q - F S I R D F QA QY K S I L K L G S I NQ F M S M I
T G E H F DD F E P F D P K S F I S R L L G F G D I NG L I NT L K DV I N - - - L E DK P D L L DR I A S A K - F T I R DMY DQ F QN L L K MA P I G K V M S M L
T G E H F DD F E P F D P K S F I S R L L G F G D I NG L I NT L K DV I N - - - L DDK P D L L DR I A S A K - F T I R DMY DQ F QN L L K MA P I G K V M S M L
S G E Q F T D L E W F D P N S F V S R L L G I QD P G V I QR T L E E I D - - - -K E A NK E I A E H I QK G Q - F S F R D L Y NQY K MV L DV G N F N S M L D S I
T G E H F DD F E L F Q P E S F V S R M L G MG DMR A L V D S MK DA N - - - - I DT D S E L Y K R F QDG Q - F T L R DMY E H L QNV L K MG S V S K I MDM I
T G E H F E D F E L F Q P E S F V S R M L G MG DMR A L MDT MK DA N - - - - I DT D S E L Y R R F QDG Q - F T MR DMY E H L QNV L K MG S V S K I MNM I
T G E HA A D L E P F R A Q P F I S K L L G MG D I S G L MDK M E E MQMNG G Q E R QQ E M L K K I G QG G I F S I R DWR E Q L S N I MG MG P L S K I A G M I
T G E H I HD L E A F S P K Q F I S K L L G I G D L QG L M E T MQ S L N L - - - - -DQK K T M E H I Q E G I - F T L A D L R DQMG NM L K MG S L S S I A G M I
1920
1930 1940 1950 1960 1970 1980 1990 2000
2010 2020 2030 2040 2050 2060 2070
K P DN E S R P L WV A P -NG H - - I F L E S F S P V Y K HA HD F L I A I S E P V C R P E H I H E Y K L T A Y S L Y A A V S V G L QT HD I V E Y L K R - - - - L
K P DHG NR P L WA C A -DG K - - I F L E T F S P L Y K QA Y D F L I A I A E P V C R P E S MH E Y N L T P H S L Y A A V S V G L E T E T I I S V L NK - - - - L
K P DHA S R P L W I A P NDG R - - I I L E S F S P L A E QA QD F L V T I A E P V S R P S HV H E Y K I T A Y S L Y A A V S V G L E T DD I I A V L DR - - - - L
K P DHA NR P L W I D P L K G T - - I T L E S F S P L A P QA QD F L T T I A E P L S R P T H L H E Y R L T G N S L Y A A V S V G L Q P T D I I N F L DR - - - - L
K P DHA NR P L W I D P L K G T - - I T L E S F S P L A P QA QD F L T T I A E P L S R P T H L H E Y R L T G N S L Y A A V S V G L L P QD I I N F L DR - - - - L
K P DNA S R P L WV A P -NG H - - I F L E A F S P V Y K HA HD F L I A I A E P V C R - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
K DDHT S R P L WV A P -DG H - - I F L E A F S P V Y K Y A QD F L V A I A E P V C R P T HV H E Y K L T A Y S L Y A A V S V G L QT S D I T DY L R K - - - - L
K A D F S A R P L WV A P -DG H - - I F L E S F S P V Y K HA R D F L I A I S E P V C R P QH I H E Y Q L T A Y S L Y A A V S V G L QT K D I I E Y L E R - - - - L
K G D F T A R P L WV A P -DG H - - I F L E S F S P V Y K HA R D F L I A I S E P V C R P QH I H E Y Q L T A Y S L Y A A V S V G L QT K D I I E Y L E R - - - - L
K P DH F S R P I W I S P NDG R - - I I L E S F S P L A E QA QD F L I T I A E P I S R P S H I H E Y R I T A Y S L Y A A I S V G L E T DD I I S V L NR - - - - L
K P DHA S R P I W I S P S DG R - - I I L E S F S P L A E QA QD F L V T I A E P I S R P S H I H E Y K I T A Y S L Y A A V S V G L E T DD I I S V L DR - - - - L
K DDHG S R P L WV A P -DG H - - I F L E A F S P V Y K Y A QD F L V A I A E P V C R P T HV H E Y K L T A Y S L Y A A V S V G L QT S D I T E Y L R K - - - - L
----------------------------------------------------------------------------------
K G DH S L R P L WV DD -R G N - - I I V E A F A P F A K QA QD F L V A I S E P V S R P A L I H E Y R I T K P S L H S A M S I G L E T K V I I E V L S R - - - - L
K S DHDK R P I WV F P -DG L - - I I I E T F HQ S S K A A C E F L V T I S E P L S R P E L I H E Y Q L T I F S L Y A A V S L G I T V D S I I E T L G K - - - - F
K S DHDK R P I WV F P -DG L - - I I I E T F HQ S S K A A C E F L V T I S E P L S R P E L I H E Y Q L T I F S L Y A A V S L G I T V D S I I E T L G K - - - - F
K NDH S S R P L WV A P -DG H - - I F L E A F S P V Y K Y A QD F L V A I S E P V C R P T HA H E Y K L T A Y S L Y A A V S V G L QT S D I I E Y L QK - - - - L
K P DH F S R P I W I S P I DA R - - I I L E S F S P L A E QA QD F L I T I A E P I S R P S HV H E Y R I T A Y S L Y A A V S V G L E T DD I I L V L NR - - - - L
K QDNK S R P I WV C P -DG H - - I F L E T F S A I Y K QA S D F L V A I A E P V C R P QN I H E Y Q L T P Y S L Y A A V S V G L E T ND I I T V L G R - - - - L
R P DHG NR P L WV A P -NG H - -V F L E S F S P V Y K HA HD F L I A I S E P V C R P E H I H E Y K L T A Y S L Y A A V S V G L QT HD I V E Y L K R - - - - L
R P DHG NR P L WV A P -NG H - -V F L E S F S P V Y K HA HD F L I A I S E P V C R P E H I H E Y K L T A Y S L Y A A V S V G L QT HD I V E Y L K R - - - - L
K E DG E S H P I WV NY -DG L - - I I L E T F R E S S R QA S D F L I A I A E P M S R P L Q I H E F Q I T A Y S L Y A A V S V G L T T S D I I E T L DR - - - - F
K P NH P E L P MWV S S -N L R - - I V V E T S NDM F K E V S DY L S R V A QV K S R M E HMH E Y Q L T P T S I MT A F S F G S T P E A M I S T L E K - - - -Y
K A DNA S R P L WV A P -DG H - - I F L E A F S P V Y K Y A QD F L V A I A E P V C R P T H I H E Y K L T A Y S L Y A A V S V G L QT S D I T E Y L QK - - - - L
K DDHT S R P L WV A P -DG H - - I F L E A F S P V Y K Y A QD F L V A I A E P V C R P T HV H E Y K L T A Y S L Y A A V S V G L QT S D I T E Y L R K - - - - L
K P DHA S R P L W I S P NDG R - -V I L E S F S P L A E QA QD F L V T I A E P V S R P S H I H E Y R I T A Y S L Y A A V S V G L E T E D I I A V L DR - - - - L
G E K C L F V E S R I E S -DG Y I T I I A E S F R R S Y V N I R P F L T T L A E A I S R P S L MH E Y L L T P F S L G A A V S NG I DA A E A T A F L E T HA Y G L
K DDHT S R P L WV A P -DG H - - I F L E A F S P V Y K Y A QD F L V A I A E P V C R P T HV H E Y K L T A Y S L Y A A V S V G L QT S D I T E Y L R K - - - - L
K P DHA NR P L W I N P DK G I - - I I L E S F N P L A E QA QD F L I T I A E P Q S R P T F L H E Y A L T A H S L Y A A V S V G L H P QD I I S T L DR - - - - F
- - - - - - - - - - - - - -QG T - - - - - - - - - - - - - - - - - - - L L I K G NV R V P N S I WD E R S G S F R A P A - - - - - L Y Y R D I V NY L K E - - - -
K DDHA S R P L WV A P -DG H - -V F L E A F S P V Y K Y A QD F L V A I A E P V C R P S HV H E Y K L T A Y S L Y A A V S V G L QT S D I T E Y L K K - - - - L
K G DHT S R P L WV A P -DG H - - I F L E A F S P V Y K Y A QD F L V A I A E P V C R P T HV H E Y K L T A Y S L Y A A V S V G L QT S D I T E Y L R K - - - - L
K P DHDQK P L W I D P E K G T - - I I L E K F S P DA DR V T D F L V T I A E P K S R P H F L H E Y Q L T A H S L Y A G V S I G L Q S K D I I DT L DR - - - - F
K P DHA NR P L WA C A -DG R - - I F L E T F S P L Y K QA Y D F L I A I A E P V C R P E S MH E Y N L T P H S L Y A A V S V G L E T S T I I S V M S K - - - - L
K P DHA NR P L WV C D -DG R - - I F L E S F S P V Y K A A Y D F L I S V A E P V C R P A NMH E Y V L T P H S L Y A A V S V G L E T S T I L S V L DR - - - - L
K DDHT S R P L WV A P -DG H - - I F L E A F S P V Y K Y A QD F L V A I A E P V C R P T HV H E Y K L T A Y S L Y A A V S V G L QT S D I T E Y L R K - - - - L
E I V Q S NK P L I L S P -D L G - - I I V E K F N P L Y E I A F E F L MC V A E P I S R S E L I H E Y V L T QM S MY T A MV L QY S A DD I I R L L D L - - - - L
K P DH F S R P I WM S P -DG R - - I I L E S F S P L A E QA QD F L I T I A E P I S R P S H I H E Y R L T P Y S L Y A A V S V G L E T DD I I S V L S R - - - - L
K K NHMNK P L W I C S -DG F - - I Y L E M F N S C S K QA S D F L I T I A E P I C R P E L I H E F Q L T I F S L Y A A I S V G I T L D E L L I N L DK - - - - F
K K NHMNK P MW I C S -DG F - - I Y L E M F N S C S K QA S D F L I T I A E P I C R P E L I H E F Q L T I F S L Y A A I S V G V T L D E L L V N L DK - - - - F
K K NHMNK P L W I C S -DG F - - I Y L E M F N S C S K QA S D F L I T I A E P I C R P E I I H E F Q L T I F S L Y A A I S V G I T L D E L L L N L DK - - - - F
K P DHA NR P L WA C A -DG R - - I F L E T F S S L Y K QA Y D F L I A I A E P V C R P E S MH E Y N L T P H S L Y A A V S V G L E T E T I I S V L NK - - - - L
K G DHT S R P L WV A P -DG H - - I F L E A F S P V Y K Y A QD F L V A I A E P V C R P T HV H E Y K L T A Y S L Y A A V S V G L QT S D I T E Y L R K - - - - L
R P DHA S R P L W I S P S DG R - - I I L E S F S P L A E QA QD F L V T I A E P I S R P S H I H E Y K I T A Y S L Y A A V S V G L E T DD I I S V L DR - - - - L
K L DHT A R P L W I N P I DG R - - I I L E A F S P L A E QA I D F L V T I S E P V S R P A F I H E Y R I T A Y S L Y A A V S V G L K T E D I I A V L DR - - - - L
K K DHG S R P L W L A P -DG H - - I F L E S F S P V Y K HA HD F L I A I S E P V C R P E N I H E Y K L T A Y S L Y A A V S V G L QT S D I I E Y L R R - - - - L
K DDHA S R P L WV A P -DG H - - I F L E A F S P V Y K Y A QD F L I A I A E P V C R P T H I H E Y K L T A Y S L Y A A V S V G L QT S D I V E Y L QK - - - - L
K DDY R E R P I L I C P -DG I - - I F L E T F N P L Y R V A Y Q F L I S I G E P V QR P L S MHK F T L T K Y S L Y T A MV L QY E P K D I I L C L E K - - - - L
K T NHT A R P L WV C P -DG Y - - L Y L E L F T P V S K QA L D F I V T I A E P V C R P E L I H E Y QV T V F S L Y T A V S V G L S F E E L L NN L NK - - - - F
K NNH S A R P L WV C P -DG Y - - L Y L E L F T P V S K QA L D F I V T I A E P V C R P E L I H E Y QV T V F S L Y T A V S V G L S F E E L L NN L NK - - - - F
L E N S DNR P A I V M P -DG H - - I F V E T F S P F Y S K V V D F I I A I A D P C S R P K Y V Q E Y Q I N P Y S I F S A V S I G L K A K E I I R I L A I - - - - I
- - - - - - - - - - L G P -G G R - - I F I NHG H P A Y P H L MD F L T A C C E P V C R T L Y V S E Y T I S P S S L S A A T A E G T Y S M E MV R NV I R Y F R L D
- - - - - - - - - -V G A -NG S - - L F V NNT H P A Y P H L V D F L T S C C E P V S R T L R M S E Y V I S P S S L S A A S A E G T Y S T A M I R N F I R Y F R L D
K L DHA S R P L W I S P DDG H - - I I L E G F S P L A E QA QD F L I A I A E P V S R P A Y I H E Y K L T P Y S L Y A A V S V G L Q P DD I I E V L NR - - - - L
K P DHA A R P L W I N P E DG R - - I I L E S F S P L A E QA QD F L V T I A E P I S R P S H I H E Y R I T T Y S L Y A A V S V G L E T S D I I S V L NR - - - - L
2080
2090 2100 2110 2120 2130 2140 2150
S K I R L C T L S Y G K K L V L K HNK Y F V E F E V NQ E K I - E V L QK R C I - - E I E F P L L A E Y D F R NDT I N - - - -A D I N I D L K P A - - -A V L R P
S K I HA S T A NY G K K L V L K K NR Y F I E F E I D P A L V - E NV K QR C L P NA L NY P M L E E Y D F R NDNV N - - - - P D L DM E L K P H - - -A Q P R P
S K I K G A T V S Y G K K L V I K HNR Y F V E F E I DR E S V - E L V K R R C Q - - E I DY P V L E E Y D F R NDNR N - - - - P D L E I D L K P S - - -T Q I R P
S K I I D F T K S F G K K V V L K HNR F F V E F E I P N E A V - E S V K A R C Q - -A MG C P A L E E Y D F R ND E I N - - - - P T L D I D L K P N - - -A R I R S
S K I L D F T K S Y G K K V V L K HNR F F V E F E I P N E A V - E P V K A R C Q - -A MG C P A L E E Y D F R ND E I N - - - - P T L D I D L K P A - - -A R I R S
- - - - - - - - - - - - - - - - - - - -Y L V E F E V D P DK I - E V I QK R C I - - E L E H P L L A E Y D F R ND S I N - - - - P D I N I D L K P T - - -A V L R P
S K I K L C T V S Y G K K L V L K HNR Y F V E F E V K Q E M I - E E L QK R C I - -H L E Y P L L A E Y D F R ND S V N - - - - P D I N I D L K P T - - -A V L R P
S K I QMC T V S Y G K K L V L K HNR Y F V E F E I K Q E T I - E T V QR R C I - - E L E Y P L L A E Y D F R NDT MN - - - - P N L G I D L K P S - - -T T L R P
S K V QMC T V S Y G K K L V L K HNR Y F V E F E I K Q E T I - E T V QK R C I - - E L E Y P L L A E Y D F R NDT L N - - - - P N L G I D L K P S - - -T T L R P
S K I K A A T V S Y G K K L V L K HNR Y F V E F E I A HD S V - E I V K R R C Q - -D I E Y P V L E E Y D F R HDA R N - - - - P D L E I D L K P S - - -T Q I R P
S K I K G A T I S Y G K K L V I K HNR Y F V E F E I A N S S V - E I V K R R C Q - - E I DY P V L E E Y D F R NDNR N - - - - P D L E I D L K P S - - -T Q I R P
S K I K L C T V S Y G K K L V L K HNR Y F V E F E V K Q E M I - E E L QK R C I - -H L E Y P L L A E Y D F R ND S V N - - - - P D I N I D L K P T - - -A V L R P
----------------------------------------------------------------------------------
S K I E E WT A S F G K R L V L K DNR Y F L E F E V S G E R M - E DV R R R C K - -D I D L P A L E E Y D F R NDT I N - - - - P N L D I Q L K P M - - -T V I R P
S K I R G HC K L F G K K I V L L E G R Y F V E F E I S G DK V -D I V T MA S F -V S L HR P L L S E Y D F R S D I K N - - - - P N L D I S L K HT - - -T Q I R Y
S K I R G HC K L F G K K I V L L E G R Y F V E F E I S G DK V -D I V T MA S F -V S L HR P L L S E Y D F R S D I K N - - - - P N L D I S L K HT - - -T Q I R Y
S K I K L C T V S Y G K K L V L K HNR Y F V E F E I R Q E M I - E E L QK R C I - -Q L E Y P L L A E Y D F R NDT V N - - - - P D I NMD L K P T - - -A V L R P
S K I R G A T I S Y G K K L V L K HNR Y Y V E F E I A N E S V - E I V K R R C Q - - E I E Y P V L E E Y D F R NDDR N - - - - P D L D I D L K P S - - -T Q I R P
S K V R QC T Q S Y G K K L V L QK NK Y F V E F E I D P QQV - E E V K K R C I - -Q L DY P V L E E Y D F R NDT V N - - - - P N L N I D L K P T - - -T M I R P
S K I R L C T L S Y G K K L V L K HNK Y F I E F E V A Q E K I - E V I QK R C I - - E I E H P L L A E Y D F R NDT NN - - - - P D I N I D L K P A - - -A V L R P
S K I R L C T L S Y G K K L V L K HNK Y F I E F E V S Q E K I - E V I QK R C I - - E I E H P L L A E Y D F R NDT NN - - - - P D I N I D L K P A - - -A V L R P
S K I T E C T L S Y G K K L V MK E S S F F L E L S I E V E E V - E L V K K R C I - - E I DY P L I E E Y D F R NDK V L - - - -R S L Q I D L K P T - - -T I I R S
S K I R QA G E K K QNR L V L I NG K Y Y L Q I E V K QT S V - F K L K K K C K - -K K K V R V Y E E Y H F L R DK Q - - - - -K E L P I Q L R K D - - - -C L R P
S K I K L C T V S Y G K K L V L K R NR Y F V E F E V K Q E M I - E E L QK R C I - -H L DY P L L A E Y D F R ND S V N - - - - P D I N I D L K P T - - -A V L R P
S K I K L C T V S Y G K K L V L K HNR Y F V E F E V K Q E M I - E E L QK R C I - -H L E Y P L L A E Y D F R ND S V N - - - - P D I N I D L K P T - - -A V L R P
S K I K S A T V S Y G K K L V I K HNR Y F V E F E I DNA S V - E I V K K R C Q - - E L DY P V L E E Y D F R NDR R N - - - - P D L D I D L K P S - - -T Q I R P
A E I E S C MK R Y N L R I I I DA E R T L V Q F L L Q S R A M S K V V A A QC V - -V L G L P I QQQY D F E NDT S V - - - -R T A H I S L R T Q - - -T K P R R
S K I K L C T V S Y G K K L V L K HNR Y F V E F E V K Q E M I - E E L QK R C I - -H L E Y P L L A E Y D F R ND S V N - - - - P D I N I D L K P T - - -A V L R P
L K I E V S T K S Y G K K L V L K NT QY F V E F Q I E D E G V - E I V QK R C L - - E L NY P I L E E Y D F R NDT F N - - - - P V L D I D L R P N - - -T QV R P
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - S G I D F E DA V L D L L P C P D L S A A Y E A S G K K L K L R D
S K I K L C T V S Y G K K L V L K HNR Y F V E F E V K Q E M I - E E L QK R C I - -H L E Y P L L A E Y D F R ND S V N - - - - P D I N I D L K P T - - -A V L R P
S K I K L C T V S Y G K K L V L K HNR Y F V E F E V K Q E M I - E E L QK R C I - -C L E Y P L L A E Y D F R NDT L N - - - - P D I N I D L K P T - - -A V L R P
L K I E S C T K S Y G K K L V L NNNK Y F V E F E I P E T A V - E I V QR R C L - -D L G F P I L E E Y D F R ND S NN - - - -A D L E I D L R P N - - -T Q I R P
S K I HA S T A NY G K K L V L K K NR Y F V E F E I D P S QV - E NV K QR C L P NA L N F P M L E E Y D F R NDT V N - - - - P D L E M E L K P Q - - -A R P R P
S K V H E C T E NY G K K L V L QR NK F Y L E F E I E A R QV - E HV K QR C L P G N L G Y P T L E E Y D F R NDT R N - - - - P D L G I E L K P M - - -T R I R P
S K I K L C T V S Y G K K L V L K HNR Y F V E F E V K Q E M I - E E L QK R C I - -H L E Y P L L A E Y D F R ND S V N - - - - P D I N I D L K P T - - -A V L R P
S K I R HHT NN I G QK F F L QDK S Y Y I D F R I V G DY F - -DV A QA L I - -R S S V P L I Q E Y D F T K E K - - - - - -QK L D I N L K P S - - -T K P R L
S K I K S A T I S Y G K K L V L K HNR Y F V E F E I A N E S V - E I V K R R C Q - -D I DY P V L E E Y D F R NDA R N - - - - P D L E I D L K P S - - -T Q I R P
S K I T K S A E S F G K K L V L R E NK Y Y I E F E V NC DK I - E E V K Q E A L -QT MQR P L L M E Y D F R R DK K N - - - - P N L I C S L K S H - - -V Q I R Y
S K I T K S A E S F G K K L V L R E NK Y Y I E F E V NC DK L - E E V K Q E A L -QT MQR P L L M E Y D F R R DK K N - - - - P N L I C S L K S H - - -V Q I R Y
S K I T K S A E S F G K K L V L R E NK Y Y I E F E V NC DK I - E E V K Q E A L -QT MQR P L L M E Y D F R R DK K N - - - - P N L NC S L K S H - - -V Q I R Y
S K I HG S T A NY G K K L V L K K NR Y F I E F E V D P S QV - E NV K QR C L P NA L NY P M L E E Y D F R NDT V N - - - - P D L NM E L K P H - - -A Q P R P
S K I K L C T V S Y G K K L V L K HNR Y F V E F E V K Q E M I - E E L QK R C I - -C L E Y P L L A E Y D F R ND S L N - - - - P D I N I D L K P T - - -A V L R P
S K I K G A T I S Y G K K L V I K HNR Y F V E F E I A N E S V - E V V K K R C Q - - E I DY P V L E E Y D F R NDHR N - - - - P D L D I D L K P S - - -T Q I R P
S K I R A C T V S Y G K K L V L K K NR Y F I E F E I K H S S V - E T I K K R C A - - E I DY P L L E E Y D F R NDN I N - - - - P D L P I D L K P S - - -T Q I R P
S K I K L C T L S Y G K K L V L K HNR Y F V E F E V V QD E I - E N L QK R C I - - E L E Y P L L A E Y D F R NDT R N - - - - P D L S I D L K P T - - -A V L R P
SK I K V - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - S K I T E NT QNY G A R L F L DD S S Y Y L D I Q I I G DH F - - E V T K A V I - -NC S V P L I Q E Y D F E NK S F - - - - -K Q L E I E L K P K - - - I K V R Y
S K I L NT S S A F G K K L V L R D S R Y W I E F E V QQ E K I - E D L K R E A L -QT MR R P L V M E Y D F R K DNN S - - - - P S L NC C I R S N - - - I K I R Y
S K I L NT S S A F G K K L V L R D S R Y W I E F E V QQ E K I - E E L K R E A L -QNMR R P L V M E Y D F R K DNN S - - - - P S L NC C I R S N - - - I K I R Y
S K I E L C C L S V G K K S V L R NT K Y Y I E F Q I K T E S V -R E I R QY A V - -DHN L F I S D E Y D F MNDK T I - - - -DN L G I Q L K NT - - -T R I R P
E QA K V S A NG DV K - -V K K E E T K E V A S QV MDG K M -R NV R E R L Y -K E L S V R A D L F Y DY V QDH S L - - - -HV C D L E L S E N - - -V R L R P
E QT H E S E E S QG K S L V K Q E A T E E T A S QV K DG R L -R NV R E R L F -K E L G V R A D L F Y DY V QDG T L - - - -DV R D L A L A E H - - -V R L R P
S K I R E Y T A S F G K K L V L K QNK Y F V E F E I A E E Y I - E QV K K R C N - - E I G Y P M L E E Y D F R NDQ L N - - - -A D L E I D L K P I - - -T H I R P
S K I H S C T K S Y G K K L V L K HNR Y F V E F E I A P D S V - E T V K K R C Q - - E I DY P V L E E Y D F R NDHG N - - - - P D L D I D L K S S - - -T Q I R P
2160
2170 2180 2190 2200 2210 2220 2230
Y Q E K S L R K M F G NG R A R S G V I V L P C G A G K S L V G V T A C C T V R K R A L V L C N S G V S V E QWK QQ F WG I MV L D E V HT I P A K M F R R V L T I
Y Q E K S L S K M F G NG R A R S G I I V L P C G A G K S L V G V S A A A R I K K S C L C L A T NA V S V DQWA Y Q F WG L L L MD E V HV V P A HM F R K V I S I
Y Q E K S L S K M F G NG R A R S G I I V L P C G A G K T L V G I T A A C T I K K S V I V L C T S S V S V MQWR QQ F WG F I L L D E V HV V P A A M F R R V V S T
Y Q E K S L S K M F G NG R A K S G I I V L P C G A G K T L V G I T A A C T I K K G T I V L C T S S M S V V QWR N E F WG L M I L D E V HV V P A S M F R K V T S A
Y Q E K S L S K M F G NG R A K S G I I V L P C G A G K T L V G I T A G C T I K K G T I V L C T S S M S V V QWR N E F WG L M I L D E V HV V P A S M F R K V T S A
Y Q E K S L R K M F G NG R A R S G V I V L P C G A G K S L V G V T A V C T V R K R A L V L C N S G V S V E QWK QQ F WG L V V L D E V HT I P A K M F R R V L T I
Y Q E K S L R K M F G NG R A R S G V I V L P C G A G K S L V G V T A A C T V R K R C L V L G N S A V S V E QWK A Q F WG L M I L D E V HT I P A K M F R R V L T I
Y Q E K S L R K M F G N S R A R S G V I V L P C G A G K T L V G V T A V T T V NK R C L V L A N S NV S V E QWR A Q F WG L L L L D E V HT I P A K M F R R V L T I
Y Q E K S L R K M F G N S R A R S G V I V L P C G A G K T L V G V T A V T T V NK R C L V L A N S NV S V E QWR A Q F WG L L L L D E V HT I P A K M F R R V L T I
Y Q E K S L S K M F G NG R A R S G I I V L P C G A G K T L V G I T A A C T I R K S V I V L C T S S V S V MQWR QQ F WG F I I L D E V HV V P A QM F R R V V T T
Y Q E K S L S K M F G NG R A R S G I I V L P C G A G K T L V G I T A A C T I K K S V I V L C T S S V S V MQWR QQ F WG F I I L D E V HV V P A A M F R R V V S T
Y Q E K S L R K M F G NG R A R S G V I V L P C G A G K S L V G V T A A C T V R K R C L V L G N S A V S V E QWK A Q F WG L M I L D E V HT I P A K M F R R V L T I
----------------------------------------------------------------------------------
Y Q E M S L A K M F G NG R A R S G I I V L P C G A G K T L V G I T A A C T I K K S A L V L C T S A V S V A QWK QQ F WG F L L L D E V HV T P A DM F R K C I NN
Y Q E QA L R MM F S NG R A R S G I I V L P C G A G K T L T G I T A A C T MR K S I L I L T T S A V A V S QWK F Q F WG L L I F D E V Q F A P A P A F R R I NG I
Y Q E QA L R MM F S NG R A R S G I I V L P C G A G K T L T G I T A A C T MR K S V L I L T T S A V A V S QWK F Q F WG L L I F D E V Q F A P A P A F R R I NG I
Y Q E K S L R K M F G NG R A R S G V I V L P C G A G K S L V G V T A A C T V R K R C L V L G N S S V S V E QWK A Q F WG L I I L D E V HT I P A K M F R R V L T I
Y Q E K S L S K M F G NG R A R S G I I V L P C G A G K T L V G I T A A C T I R K S V I V L C T S S V S V MQWR QQ F WG F I I L D E V HV V P A A M F R R V V T T
Y Q E K S L S K M F G NG R A R S G I I V L P C G A G K S L S G I T A A C T V K K S I L V L C T S A V S V E QWK Y Q F WG L V L L D E V HV V P A A M F R K V L T V
Y Q E K S L R K M F G NG R A R S G V I V L P C G A G K S L V G V T A C C T V R K R A L V L C N S G V S V E QWK QQ F WG I MV L D E V HT I P A K M F R R V L T I
Y Q E K S L R K M F G NG R A R S G V I V L P C G A G K S L V G V T A C C T V R K R A L V L C N S G V S V E QWK QQ F WG I MV L D E V HT I P A K M F R R V L T I
Y Q E I C L NK M F G NG R A R S G I I V L P C G S G K T I V G I T A I S T I K K NC L V L C T S A V S V E QWK QQT WG L L V L D E V HV V P A MM F R R V L S L
HQ E R A L QQ I F DN E MA R S G I V V L P C G A G K T L T A I A A C S K I K R S T I V L T HT T Q S V F QWK E E F WG F I I F D E V HG S T T DN I E K F V C K
Y Q E K S L R K M F G NG R A R S G V I V L P C G A G K S L V G V T A A C T V R K R C L V L G N S A V S V E QWK A Q F WG L M I L D E V HT I P A K M F R R V L T I
Y Q E K S L R K M F G NG R A R S G V I V L P C G A G K S L V G V T A A C T V R K R C L V L G N S A V S V E QWK A Q F WG L M I L D E V HT I P A K M F R R V L T I
Y Q E K S L S K M F G NG R A R S G I I V L P C G A G K T L V G I T A A C T I K K S V I V L C T S S V S V MQWR QQ F WG F I I L D E V HV V P A A M F R R V V S T
Y Q I E A V DA A I HDG T L N S G C L L L P C G A G K T L L G I M L MC K V K K P T L V L C A G S V S V E QWK S Q I Y G L L I L D E V HV M P A E S F R G S L G F
Y Q E K S L R K M F G NG R A R S G V I V L P C G A G K S L V G V T A A C T V R K R C L V L G N S A V S V E QWK A Q F WG L M I L D E V HT I P A K M F R R V L T I
Y Q E K S L S K M F G NG R A K S G I I V L P C G A G K T L V G I T A A C T I K R G V I V L C T S T M S V V QWR D E F WG L M I L D E V HV A P A K M F R R V T S A
Y QA E A L V A W S E N - - E K WG V L V L P T G S G K T L L G I R A I A G C NT P A L V I V P T L D L L E QWK T Q L F G L L V F D E V HH L P A A G Y R S I A E F
Y Q E K S L R K M F G NG R A R S G V I V L P C G A G K S L V G V T A A C T V R K R C L V L G N S A V S V E QWK A Q F WG L M I L D E V HT I P A K M F R R V L T I
Y Q E K S L R K M F G NG R A R S G V I V L P C G A G K S L V G V T A A C T V R K R C L V L G N S A V S V E QWK A Q F WG L M I L D E V HT I P A R M F R R V L T I
Y Q E Q S L S K M F G NG R A K S G I I V L P C G A G K T L V G I T A A C T I K K G V I V L C T S S M S V V QWR Q E F WG L M L L D E V HV V P A DV F R R V I S S
Y Q E K S L S K M F G NG R A R S G I I V L P C G A G K S L V G V S A A C R I K K S C L C L A T NA V S V DQWA F Q F WG L L L MD E V HV V P A HM F R K V I S I
Y Q E K S L S K M F G NG R A R S G I I V L P C G A G K S L T G I A A A A R I R K S C L C L C T S S V S V DQWA A Q F WG C M L L D E V HV V P A A M F R K V I G I
Y Q E K S L R K M F G NG R A R S G V I V L P C G A G K S L V G V T A A C T V R K R C L V L G N S A V S V E QWK A Q F WG L M I L D E V HT I P G K QA G A E L R V
Y Q L R A A K T V I MG DY A K S G L I V L P C G A G K T L V G V L C M S L I K S S T V I I C D S NV S V E QWK R E I WG I C I V D E V HR L P A V Q F QNV L K Q
Y Q E K S L S K M F G NG R A R S G I I V L P C G A G K T L V G I T A A C T I R K S V I V L C T S S V S V MQWR QQ F WG F I I L D E V HV V P A A M F R R V V T T
Y Q E K A L R K M F S NG R S R S G I I V L P C G V G K T L T G I T A A S T I K K S A L F L T T S A V A V E QWK K Q F WG L L V F D E V Q F A P A P S F R R I ND I
Y Q E K A L R K M F S NG R S R S G I I V L P C G V G K T L T G I T A A S T I K K S S L F L T T S A V A V E QWK K Q F WG L L V F D E V Q F A P A P S F R R I ND I
Y Q E K A L R K M F S NG R S R S G I I V L P C G V G K T L T G I T A A S T I K K S S L F L T T S A V A V E QWK K Q F WG L L V F D E V Q F A P A P S F R R I ND I
Y Q E K S L S K M F G NG R A R S G I I V L P C G A G K S L V G V S A A C R I K K S C L C L A T NA V S V DQWA F Q F WG L L L MD E V HV V P A HM F R K V I S I
Y Q E K S L R K M F G NG R A R S G V I V L P C G A G K S L V G V T A A C T V R K R C L V L G N S A V S V E QWK A Q F WG L M I L D E V HT I P A K M F R R V L T I
Y Q E K S L S K M F G NG R A R S G I I V L P C G A G K T L V G I T A A C T I K K S V I V L C T S S V S V MQWR QQ F WG F I I L D E V HV V P A A M F R R V V S T
Y Q E K S L S K M F G NG R A R S G I I V L P C G A G K T L V G I T A A C T I K K S V I V L C T S S V S V MQWR QQ F WG F I L L D E V HV V P A A M F R R V V T T
Y Q E K S L R K M F G NG R A R S G V I V L P C G A G K T L V G V T A S C T V R K R C MV L C T S G V A V E QWR S Q F WG L M I L D E V HT I P A K Q F R R V L T Q
----------------------------------------------------------------------------------
Y Q E R A L K N I F I QK K A R S G L I I L P C G A G K T I V G V I A I E R I K Q S T V I I C D S DV S V DQWR D E L WG V C V I D E V HK L P A NT F QNV L K Q
Y Q E R A L R R M F S NG R A R S G I I V L P C G A G K T L T G I V A A C T V R K S I F V L T T S A V A V E QW I K Q F WG M L I F D E V Q F V P A P A F R R I N E I
Y Q E R A L R R M F S NG R A R S G I I V L P C G A G K T L T G I V A A C T V R K S I F V L T T S A V A V E QW I K Q F WG M L I F D E V Q F V P A P A F R R I N E I
Y Q E K A L T K M F S G G R S I S G I I V L P C G A G K T L V G I A A L A T I NK P T V I V C NNR L T V K QWY NQ I WG L L I L D E V QD S A A NT F R NV T D I
Y QV A S L E R F R S G NK A HQG V I V L P C G A G K T L T G I G A A A T V K K R T I V MC I NV M S V L QWQR E F WG L L L L D E V HT A L A HN F Q E V L NK
Y QV A S L E R F R C G NK A HQG V I V L P C G A G K T L T G I G A A T I L K K R T I V MC I NV I S V L QWQR E F WG L L L L D E V HA A L A HH F Q E V L NK
Y Q E K S L A K M F G NG R A R S G I I V L P C G A G K T L V G I T A A C T I K K S C L V L C T S S V S V MQWR QQ F WG F I L L D E V HV V P A S M F R R V L T K
Y Q E K S L S K M F G NG R A R S G I I V L P C G A G K T L V G I T A A C T I R K S V I V L C T S S V S V MQWR QQ F WG F I I L D E V HV V P A A M F R K V V T N
2250
2260 2270 2280 2290 2300 2310 2320
V Q S HC K L G L T A T L L R E DDK I A D L N F L I G P K L Y E A NW L E L QK R G Y I A R V QC A E V WC P MA P E F Y R E Y L Y V MN P A K F R A C QY L I R Y
T K S HC K L G L T A T L V R E D E K I T D L N F L I G P K L Y E A NW L D L V K G G F I A NV QC A E V WC P MT K E F F A E Y L Y V MN P NK F R A C E F L I R F
I A A HA K L G L T A T L V R E DDK I S D L N F L I G P K L Y E A NWM E L S QK G H I A NV QC A E V WC P MT A E F Y Q E Y L Y I MN P T K F QA C Q F L I QY
I A T Q S K L G L T A T L L R E DDK I K D L N F L I G P K L Y E A NWM E L A E QG H I A K V QC A E V WC P MT T E F Y T E Y L Y I MN P R K F QA C Q F L I DY
I A C Q S K L G L T A T L L R E DDK I K D L N F L I G P K L Y E A NWM E L A E QG H I A K V QC A E V WC P MT T E F Y S E Y L Y I MN P R K F QA C Q F L I DY
V H S HA K L - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - V QA HC K L G L T A T L V R E DDK I V D L N F L I G P K L Y E A NWM E L QN S G Y I A K V QC A E V WC P M S P E F Y R E Y L Y T MN P NK F R A C Q F L I K F
V QA HC K L G L T A T L V R E DDK I T D L N F L I G P K I Y E A NWM E L QK A G H I A K V QC A E V WC P MT S A F Y S Y Y L A V MN P NK F R I C Q F L I K F
V QA HC K L G L T A T L V R E DDK I T D L N F L I G P K I Y E A NWM E L QK A G H I A K V QC A E V WC P MT S A F Y S Y Y L A V MN P NK F R I C Q F L I K F
I A A HA K L G L T A T L V R E DDK I DD L N F L I G P K L Y E A NWMD L A QK G H I A NV QC A E V WC P MT A E F Y Q E Y L Y I MN P T K F QA C Q F L I HY
I A A HA K L G L T A T L V R E DDK I S D L N F L I G P K L Y E A NWM E L S QK G H I A NV QC A E V WC P MT A E F Y Q E Y L Y I MN P T K F QA C Q F L I QY
V QA HC K L G L T A T L V R E DDK I V D L N F L I G P K L Y E A NWM E L QNNG Y I A K V QC A E V WC P M S P E F Y R E Y L Y T MN P NK F R A C Q F L I K F
----------------------------------------------------------------------------------
F K V HA K L G L T A T L V R E DDR I G D L G Y L I G P K L Y E A NWMD L A K NG H I A T V QC A E V WC P MT P E F Y R E Y L HA MN P NK I QA C Q F L I NY
V K A HC K L G L T A T L V R E DD L I QD L QW L I G P K L Y E A NWM E L QDR G Y L A K A L C S E V WC P MT A S Y Y R E Y L WV C N P NK L R V C E F L I R W
V K A HC K L G L T A T L V R E DD L I QD L QW L I G P K L Y E A NWM E L QDR G Y L A K A L C S E V WC P MT A S Y Y R E Y L WV C N P NK L R V C E F L I HW
V QA HC K L G L T A T L V R E DDK I V D L N F L I G P K L Y E A NWM E L QNNG Y I A K V QC A E V WC P M S P E F Y R E Y L Y T MN P NK F R A C Q F L I R F
I A A HA K L G L T A T L V R E DDK I DD L N F L I G P K L Y E A NWMD L A QK G H I A NV QC A E V WC P MT S E F Y Q E Y L Y I MN P T K F QA C Q F L I HY
T K A HC K L G L T A T L L R E D E K I QD L N F L I G P K L Y E A NW L D L QK A G F L A NV S C S E V WC P MT A E F Y K E Y L Y T MN P NK F R A C E Y L I R F
V Q S HC K L G L T A T L L R E DDK I A D L N F L I G P K L Y E A NW L E L QK K G Y I A R V QC A E V WC P M S P E F Y R E Y L Y V MN P S K F R S C Q F L I K Y
V Q S HC K L G L T A T L L R E DDK I A D L N F L I G P K L Y E A NW L E L QK K G Y I A R V QC A E V WC P M S P E F Y R E Y L Y V MN P S K F R S C Q F L I K Y
V S HHC K L G L T A T L V R E DDK I E D L N F L I G P K L Y E A DWQD L S A K G H I A R V S C I E V WC G MT G D F Y R E Y L S I MN P T K F QV C E Y L I NK
I K A QC K L G L T A T L I R E DDR I R D L E F M I G P M L Y E A S WQ E L A K QG Y I A NA K C F E V I C P MT K T Y Y S A Y L A Q L N P NK I DA C K Y L L E Q
V QA HC K L E L T A T L V R E DDK I V D L N F L I G P K L Y E A NWM E L QN S G Y I A K V QC A E V WC P M S P E F Y R E Y L Y T MN P NK F R A C Q F L I K F
V QA HC K L G L T A T L V R E DDK I V D L N F L I G P K L Y E A NWM E L QNNG Y I A K V QC A E V WC P M S P E F Y R E Y L Y T MN P NK F R A C Q F L I K F
I A A HA K L G L T A T L V R E DDK I S D L N F L I G P K L Y E A NWM E L S QK G H I A NV QC A E V WC P MT A E F Y Q E Y L Y I MN P T K F QA C Q F L I QY
V DA K G V I G L T A T Y V R E DHK I L D L F H L V G P K L Y D I S M E T L A S QG Y L A K V HC V E V R T P MT K E F G L E Y L A A A N P NK MMC V R E L V R Q
V QA HC K L G L T A T L V R E DDK I V D L N F L I G P K L Y E A NWM E L QNNG Y I A K V QC A E V WC P M S P E F Y R E Y L Y T MN P NK F R A C Q F L I K F
L K S H S K L G L T A T L L R E DDK I S D L N F L I G P K L Y E A NWM E L S L G G H I A R V QC A E V WC P M P T E F Y R E Y L Y I MN P MK F QA C QY L I NY
S A A P C R L G L T A T Y E R E DG L HT E L NR L V G G K V Y E K K V S E L A -G G H L A P Y T I K R F A V T L T E K E QR E Y L A F N S N S K I E K L R E I L E Q
V QA HC K L G L T A T L V R E DDK I V D L N F L I G P K L Y E A NWM E L QNNG Y I A K V QC A E V WC P M S P E F Y R E Y L Y T MN P NK F R A C Q F L I K F
V QA HC K L G L T A T L V R E DDK I V D L N F L I G P K L Y E A NWM E L QNNG Y I A K V QC A E V WC P M S P E F Y R E Y L Y T MN P NK F R A C Q F L I K F
I K S H S K L G L T A T L L R E DDK I S H L N F L I G P K L Y E A NWM E L S E K G H I A K V QC A E V WC P M P T E F Y D E Y L Y A MN P R K F QA C QY L I NY
T K S HC K L G L T A T L V R E D E R I T D L N F L I G P K L Y E A NW L D L V K G G F I A NV QC A E V WC P MT K E F F A E Y L Y A MN P NK F R A C E F L I R F
T K A HC K L G L T A T L V R E DDK V DH L N F L I G P K L Y E A NW L D L QR DG H I A NV QC V E V WC P MT A E F F R K Y L Y C MN P NK F MA C Q F L MQ F
I L A HC N L R L L A T A F G HHD P V L D F L F T H L Q S I F E WA WW L T P NNG Y I A K V QC V E V WC P M S P E F Y R E Y L Y T MN P NK F R A C Q F L I K F
I K C A I K I G L T A T L L R E DQK L DN L Y F M I G P K L Y E E N L I D L MT QG F L A K P H I I E I QC DM P P I F L Q E Y L HT G N P G K Y K A L Q F L I K N
I A A HA K L G L T A T L V R E DDK I HD L N F L I G P K L Y E A NWMD L A QK G H I A NV QC A E V WC P MT S E F Y Q E Y L Y I MN P T K F QA C Q F L I HY
V K S HC K L G L T A T L V R E D L L I R D L HW I I G P K L Y E A NWV E L QNK G F L A K A L C K E I WC S M P C S F Y K Y Y L Y T C N P R K L MMC E Y L I K Y
V K S HC K L G L T A T L V R E D L L I R D L QW I I G P K L Y E A NWV E L QNK G F L A K A L C K E I WC S M P S S F Y K Y Y L Y T C N P R K L MMC E Y L I K Y
V K S HC K L G L T A T L V R E D L L I R D L QW I I G P K L Y E A NWV E L QNK G F L A K A L C K E I WC S M P S S F Y K Y Y L Y T C N P R K L MMC E Y L I K Y
T K S HC K L G L T A T L V R E D E R I T D L N F L I G P K L Y E A NW L D L V K G G F I A NV QC A E V WC P MT K E F F A E Y L Y V MN P NK F R A C E F L I R F
V QA HC K L G L T A T L V R E DDK I V D L N F L I G P K L Y E A NWM E L QNNG Y I A K V QC A E V WC P M S P E F Y R E Y L Y T MN P NK F R A C Q F L I K F
I A A HA K L G L T A T L V R E DDK I G D L N F L I G P K L Y E A NWM E L S QK G H I A NV QC A E V WC P MT A E F Y Q E Y L Y I MN P T K F QA C Q F L I QY
I A A HT K L G L T A T L V R E DDK I DD L N F L I G P K MY E A NWMD L A QK G H I A K V QC A E V WC A MT T E F Y N E Y L Y I MN P K K F QA C Q F L I DY
V QA HC K L G L T A T L V R E DDK I A D L N F L I G P K L Y E A NWM E L QNK G F I A R V QC A E V WC P MA P E F F R E Y L Y V MN P NK F R A C Q F L V R F
----------------------------------------------------------------------------------
Y K F H F K L G L T A T P Y R E D E K I I N L F Y M I G P K L Y E E NWY D L V S QG F L A K P Y C V E I R C E M S Q L WM S E Y I HT S N P R K F K T L E Y L I K V
I R S HC K L G L T A T L V R E DD L I R D L QW L I G P K L Y E A NW L E L QQK G Y L A K V I C K E I WC P MT A P F Y R E Y L W S C N P V K L I T C E Y L L R F
I R S HC K L G L T A T L V R E DD L I R D L QW L I G P K L Y E A NW L E L Q E K G Y L A K V I C K E I WC P MT A P F Y R E Y L W S C N P V K L I T C E Y L L K F
A K A HT R L G L T A T L I R E DDK I S D L R Y L V G P K L Y E A NW L E L S E QG Y L A R V K C F E V T V P MT A S F Y K Y Y L C S S N P NK I R T V A G I I K F
V K Y K C V I G L S A T L L R E DDK I G D L R H L V G P K L Y E A NW L D L T R A G F L A R V E C A E I QC P L P K A F L T E Y V V C L N P Y K L WC T QA L L E F
V K Y K C V V G L S A T L L R E DDK I G D L R H L V G P K L Y E A NW L E L T R A G F L A R V E C A E V QC P L P L P F F R E Y V V C F N P Y K L WC T QA L L E F
I K A H S K L G L T A T L V R E D E K I D E L N F L V G P K L Y E A NWMD L A A K G H I A T V QC A E V WC P MT P E F Y R E Y L Y C MN P NK F QA C Q F L I DY
I A A HA K L G L T A T L V R E DDK I DD L N F L I G P K L Y E A NWMD L A QK G H I A NV QC A E V WC P MT S E F Y Q E Y L Y I MN P S K F QA A Q F L I NY
Alignment ruler ticks 2330-2410 in steps of 10; sequence labels repeat the species list above.
Alignment ruler ticks 2420-2480 in steps of 10; sequence labels repeat the species list above.
HG G S R R Q E A QR L G R I L R A K A F F Y T L V S QDT E MG Y S R K R QR F L V NQ -G Y S Y K V Y F K L R S A A V A E L K K S P -DT - - - - -H P Y P HK F
HA G S R R Q E A QR L G R I L R A K A F F Y S L V S T DT E MY Y S T K R QQ F L I DQ -G Y S F K V Y Y E NR L K Y L A A E K A K - -G E - - - - -N P Y P HK F
HY G S R R Q E A QR L G R I L R A K A F F Y S L V S K DT E MY Y S T K R QA F L V DQ -G Y A F K V Y F E A R S R Q I Q E L R K T Q - E P - - - - -N P Y P HK F
HY G S R R Q E A QR L G R I L R A K A F F Y S L V S K DT E M F Y S S K R QA F L V DQ -G Y A F K V Y F E I R S K R I NK L R E T K -Q P - - - - -D P Y P HK F
HY G S R R Q E A QR L G R I L R A K A F F Y S L V S K DT E M F Y S S K R QA F L V DQ -G Y A F K V Y F E I R S K R I NK L R E T K -N P - - - - -D P Y P HK F
----------------------------------------------------------------------------------
HG G S R R Q E A QR L G R V L R A K A F F Y S L V S QDT E MA Y S T K R QR F L V DQ -G Y S F K V Y F K I R S QA I HQ L K V N - -G E - - - - -D P Y P HK F
HG G S R R Q E A QR L G R I L R A K A F F Y S L V S QDT E MG Y S R K R QR F L V NQ -G Y A Y K V Y F NMR V R M I E A R R A A - -G E - - - - -N P F P HK F
HG G S R R Q E A QR L G R I L R A K A F F Y S L V S QDT E MG Y S R K R QR F L V NQ -G Y A Y K V Y F NMR V R M I E A R R A A - -G D - - - - -N P F P HK F
HY G S R R Q E A QR L G R I L R A K A F F Y S L V S K DT E MY Y S T K R QA F L V DQ -G Y A F K V F F E I R S R Q I S E L R E K N -NA D P S A F N P Y P HK F
HY G S R R Q E A QR L G R I L R A K A F F Y S L V S K DT E MY Y S T K R QA F L V DQ -G Y A F K V Y F E A R S R Q I L E L R K T H - S P - - - - -N P Y P HK F
HG G S R R Q E A QR L G R V L R A K A F F Y S L V S QDT E MA Y S T K R QR F L V DQ -G Y S F K V Y F K I R S QA I HQ L K I N - -G E - - - - -D P Y P HK F
HG G S R R Q E A QR L G R I L R A K A F F Y S L V S QDT E V A Y S T K R QR F L V DQ -G Y S F K A Y L Q I R K NT I T T L R QN - -N I - - - - - E P Y P HK F
H F G S R R Q E A QR L G R I L R A K A F F Y S L V S K DT E M F Y S S K R QG F L I DQ -G Y A F K V F H E MR Y K E I A K L R E T K -Q P - - - - -N P Y P HK F
N F A S R R Q E A QR L G R I L R P K A F F Y S L L S K DT E M E Y A DK R QQ F I I DQ -G Y S Y R V Y T DNR Y K MM E C I K DA - -G R - - - - - P F Y P HK F
N F A S R R Q E A QR L G R I L R P K A F F Y S L L S K DT E M E Y A DK R QQ F I I DQ -G Y S Y R V Y T DNR Y K MM E C I K DA - -G R - - - - - P F Y P HK F
HG G S R R Q E A QR L G R V L R A K A Y F Y S L V S QDT E MA Y S T K R QR F L V DQ -G Y S F K V Y F K I R S QA I QA L K G T - -A E - - - - -D P Y P HK F
HY G S R R Q E A QR L G R I L R A K A F F Y S L V S K DT E MY Y S T K R QA F L V DQ -G Y A F K V Y F E I R S R Q I D E L R QA N - L A DG S A F N P Y P HK F
HY G S R R Q E A QR L G R I L R P K A F F Y S L V S K DT E MY Y S T K R QQ F L I DQ -G Y S F K V Y K E NR T K Q L T S A D I - - -G V - - - - -N P W P HK F
HG G S R R Q E A QR L G R I L R A K A F F Y T L V S QDT E M S Y S R K R QR F L V NQ -G Y S Y K V Y F K L R S A A V Q E L K R S P -A T - - - - -D P Y P HK F
HG G S R R Q E A QR L G R I L R A K A F F Y T L V S QDT E M S Y S R K R QR F L V NQ -G Y S Y K V Y F K L R S A A V Q E L K Q S A -D S - - - - -H P Y P HK F
H F G S R R Q E A QR L G R I L R A K V Y F Y S L V S K DT E M F Y S S K R QQ F L I DQ -G Y T F T I - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
S S G S R R Q E A QR L G R I L R A K A Y F Y T L T S K DT E MY F S QR R QR V MR QN -G Y T F K V L F L NR C K DV E E Y QK A - -G H - - - - -N P W P HK F
HG G S R R Q E A QR L G R V L R A K A F F Y S L V S QDT E MA Y S T K R QR F L V DQ -G Y S F K V Y Y K I R S HA I QQ L K G T - -N E - - - - -D P Y P HK F
HG G S R R Q E A QR L G R V L R A K A F F Y S L V S QDT E MA Y S T K R QR F L V DQ -G Y S F K V Y Y K I R S QA I HQ L K V N - -G E - - - - -D P Y P HK F
HY G S R R Q E A QR L G R I L R A K A F F Y S L V S K DT E MY Y S T K R QA F L V DQ -G Y A F K V Y F E A R S R Q I N E L R K T H - S P - - - - -N P Y P HK F
HG G S R R Q E A QR L G R I L R P K A W F Y S I I S T DT E I NY A A HR T A F L V DQ -G Y T C R I Y F E S R L A MV K E MG L L - -G - - - - - -A A Y P HK Y
HG G S R R Q E A QR L G R V L R A K A F F Y S L V S QDT E MA Y S T K R QR F L V DQ -G Y S F K V Y F K I R S QA I HQ L K V N - -G E - - - - -N P Y P HK F
HY G S R R Q E A QR L G R I L R A K A F F Y S L V S K DT E MY Y S A K R QA F L V DQ -G Y A F K V Y F E I R S R E V NG M L E N P S G P - - - - -N P Y P HK F
G T G S K R A Y V QR L G R I L R K K A V L Y E I I A G E T E T G T A R R R K E A L S S G -K R T S K A F DD S K L A K L NG I I S Q - -G L - - - - -D P Y P Y R F
HG G S R R Q E A QR L G R V L R A K A F F Y S L V S QDT E MA Y S T K R QR F L V DQ -G Y S F K V Y Y K I R S QA I QQ L K I S - -G E - - - - -D P Y P HK F
HG G S R R Q E A QR L G R V L R A K A F F Y S L V S QDT E MA Y S T K R QR F L V DQ -G Y S F K V Y Y K I R S QA V QQ L K V T - -G E - - - - -D P Y P HK F
H F G S R R Q E A QR L G R I L R A K A F F Y S L V S K DT E MY Y S S K R QA F L V DQ -G Y A F K V Y Y E I R T R QV N E L L K N P - E T - - - - -N P Y P HK F
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -Q - - - - - - -Y Y E NR L K A L D S L K A T - -G V - - - - -N P Y P HK F
HA G S R R Q E A QR L G R I L R P K A F F Y S L V S T DT E MY Y S T K R QQ F L I QQ -G Y A F K V Y T QNR I NK V L S A K A K - -G E - - - - - S P Y P HK Y
HG G S R R Q E A QR L G R V L R A K A F F Y S L V S QDT E MA Y S T K R QR F L V DQ -G Y S F K V Y Y K I R S QA I HQ L K V N - -G E - - - - -D P Y P HK F
L G G S R R QK V QR L G R V MR P K A F F Y S L A S K DT E S E Y S Y K R QK Y I T E Q L G L NT E L F H E NR S K QV L A L K QT K -D P - - - - -N P Y P HK F
HY G S R R Q E A QR L G R I L R A K A F F Y S L V S K DT E MY Y S T K R QA F L V DQ -G Y A F K V Y F E I R S R Q I N E L R E S N -N E N P T A F N P Y P HK F
N F A S R R Q E A QR L G R I I R P K S F F Y S L V S K DT E MC Y S DK R QR F L I NQ -G Y A Y NV Y F E NR S K F I QDQK DK - -G I - - - - -N P Y P HK F
N F A S R R Q E A QR L G R I I R P K S F F Y S L V S K DT E MC Y S DK R QR F L I NQ -G Y A Y NV Y Y E NR S K F V Q E QK A K - -G I - - - - -N P Y P HK F
N F A S R R Q E A QR L G R I I R P K S F F Y S L V S K DT E MC Y S DK R QR F L I NQ -G Y A Y NV Y F E NR S K L I L S QQ E K - -G I - - - - -NT Y P HK F
HA G S R R Q E A QR L G R I L R A K A F F Y S L V S T DT E MY Y S T K R QQ F L I DQ -G Y S F K V Y Y E NR L K Y L DA QK G E - -G K - - - - -NMY P HK F
HG G S R R Q E A QR L G R V L R A K A F F Y S L V S QDT E MA Y S T K R QR F L V DQ -G Y S F K V Y Y K I R S QA V QQ L K V S - -G E - - - - -D P Y P HK F
HY G S R R Q E A QR L G R I L R A K A F F Y S L V S K DT E MY Y S T K R QA F L V DQ -G Y A F K V Y F E T R S R Q I Q E L R K T H - E P - - - - -N P Y P HK F
HY G S R R Q E A QR L G R I L R A K A F F Y S L V S K DT E MY Y S S K R QA F L I DQ -G Y A F K V Y F E NR S R T I M E L R QT K -D P - - - - -N P Y P HK F
HG G S R R Q E A QR L G R I L R A K A F F Y T L V S QDT E M F Y S L K R QR F L V NQ -G Y S F K T Y F K I R S QA V E A L K A A - -G D - - - - -H P Y P HK Y
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -Y F K I R S QA I E E L K G A - -G E - - - - -D P Y P HK F
H F K S R R Q E V QR L G R I MR A K A F WY T L V S K G T E T S Y C L A R QK C L I NQ -G F K Y E I Y Y E NR C K A V QD L MT T G -K P - - - - -Y P Y P HK F
N F A S R R Q E A QR L G R I L R P K A F F Y S L V S K DT E MV F A DK R QQ F I I DQ -G Y A Y NV - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
N F A S R R Q E A QR L G R I L R P K A F F Y S L V S K DT E MV F A DK R QQ F I I DQ -G Y A Y NV Y Y L NR L E T V E QWR K N - -G - - - - - -T A Y P HK F
NY G A R MQ E S QR L G R V L R P K A F F Y S C I S DMT D L K Y S A R R QQ F L V DQ -G Y V Y E P Y HDR R L A E V T K QV E A H -R K D L S L P S P Y P HK F
L G A S R R Q E A QR L G R I L R P K S Y F Y T L V S QDT E I S Q S Y E R Q S W L R DQ -G F S Y R V Y Y DT R L A MV K E MG P L - -G - - - - - -A A Y P HK F
L G A S R R Q E A QR L G R I L R P K S Y F Y T L V S QDT E V QQ S Y G R Q S W L R DQ -G F A Y R V Y F DT R L A MV K E L G L L - -G - - - - - -A A Y P HK F
H F G S R R Q E A QR L G R I L R A K A F F Y S L V S K DT E M F Y S T K R QQ F L I DQ -G Y A F R V Y Y E R R F R T I S A L R E S K -N P - - - - -D P Y P HK F
HY G S R R Q E A QR L G R I L R A K A F F Y S L V S K DT E MY Y S T K R QA F L V DQ -G Y A F K V Y F E I R S R Q I DA L R Q S K -T P - - - - -N P Y P HK F
Alignment ruler ticks 2500-2570 in steps of 10; sequence labels repeat the species list above.
S V T V S L G E F I E R Y - - - S G - L QDG E T L D -DV T V S V A G R V HA I R E S G V K L I F F D L R - - - - - -G E G L K L QV F F E E T A R L R R G D I I G
A V S M S I P K Y I E T Y - - -G S - L NNG DHV E -NA E E S L A G R I M S K R S S S S K L F F Y D L H - - - - - -G DD F K V QV F L K L H S NA K R G D I V G
NV T I G L P A F L NK Y - - -A H - L QR G E T L P - E E R V S I A G R I HA K R E S G S K L R F Y V L H - - - - - -A DG V E V QV Y E QDHG L L K R G D I V G
QV T DD L R E Y L K T Y - - -DG - L A K G E QK P -DV T V R I A G R I Y T K R S S G S K L F F Y D I R - - - - - -A E G V K V QV F E A QH E H L R R G D I V G
QV T DD L R K Y L T DY - - - E G - L A K G E QK P - E V A V R I A G R I Y T K R A S G A K L I F Y D I R - - - - - -A E G V K V QV F E A QH E H L R R G D I V G
----------------------------------------------------------------------------------
HV D I S L T H F I Q E Y - - - S H - L Q P G DH L T -D I T L K V A G R I HA K R A S G G K L I F Y D L R - - - - - -G E G V K L QV F I R I NNK L R R G D I I G
NV T I S L T D F I A K Y - - - S P - L QN E Q -V A -D E I V S V A G R I H S K R E S G S K L V F Y D I H - - - - - -G E G T H I Q I F V T L HDR I K R G D I V G
NV T I S L T D F I T K Y - - -T P - L E K E Q -V V - E E I V S V A G R I H S K R E S G S K L V F Y D I H - - - - - -G E G T H I Q I F V T L HDR I K R G D I V G
NV T T K I P E F V E K Y - - -A H - L QR G E T L K -DV T V S V S G R I MT K R E S G S K L K F Y V L K - - - - - -G DG V E V Q I F E S MH E I L R R G D I I G
QV S I S N P E F L A K Y - - -A H - L K R G E T L P -N E I V S I A G R I HA K R E S G S K L K F Y V L H - - - - - -G DG V E V QV Y E NDHD L I K R G D I V G
HV D I S L T H F I E E Y - - -G H - L Q P G DH L T -D I T L K V A G R I HA K R A S G G K L I F Y D L R - - - - - -G E G V K L QV F I H I NNK L R R G D I I G
HV S I S L S DY V E K Y - - -NN - I E V G S H L N -DQQV S I A G R I HA K R E A G P K L I F Y DV R - - - - - -G DG V K L QV Y Q E I N E R T R R G D I I G
NV T HA V P K F V E E WG K E G K - L E K G E T A Q L N E P I S L A G R V Y T I R E S S S K L R F Y D L K - - - - - -A DG V K V Q I Y L DT HDR I R R G D I I G
K I S M S L P A Y A L K Y - - -G N -V E NG Y I DK -DT T L S L S G R V T S I R S S S S K L I F Y D I F - - - - - -C E E QK V Q I F S V S H S E I R R G DV V G
K I S M S L P A Y A L K Y - - -G N -V E NG Y I DK -DT T L S L S G R V T S I R S S S S K L I F Y D I F - - - - - -C E E QK V Q I F S V S H S E I R R G DV V G
HV D L S L T E F I E R Y - - -NH - L Q P G DH L T -DV V L N L S G R V HA K R A S G A K L L F Y D L R - - - - - -G E G V K L QV F V H I NNK L R R G D I I G
HV S I Q L P A F A E K Y - - -K D - L K K G E S L K -DV E V K V S G R I MG K R E S G S K L K F Y V L K - - - - - -G DG V Q I Q I Y E K MH E Y L R R G D I I G
E V S HQ L P K F V E E F - - - S V - L E K DG E P S -T QV V S I A G R V L S K R A A G S G L V F Y D I T - - - - - -G E F NK V QV Y V K I NG L L R R G D I I G
HV S S S L E D F I A K Y - - E N S - L K E G E T L E -NV K L S V A G R V HA I R E S G A K L I F Y D L R - - - - - -G E G V K V QV F E I DT S K L R R G D I I G
NV S I S L E N F I E QY - - - S G - L T DG E T L E -K V S L S V A G R V HA I R E S G A K L I F Y D L R - - - - - -G E G V K L QV F E T DT A K L R R G D I I G
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -M S MR L H - - - - - -A R F C F F V V T - - - - - - S NG E S L QV R E QMA K F L R R G DV V G
NV S I T V P E F I A K Y - - - S G - L E K S Q -V S -DD I V S V A G R V L S K R S S - S A L M F I D L H - - - - - -D S QT K L Q I F V S L T K M I Y R G D I C G
HV D L S L S D F I E R Y - - - S H - L Q P G DH L T -D I T V S V A G R I HA K R A S G G K L I F Y D L R - - - - - -G E G V K L QV Y F R I NNK L R R G D I I G
HV D I S L T D F I QK Y - - - S H - L Q P G DH L T -D I T L K V A G R I HA K R A S G G K L I F Y D L R - - - - - -G E G V K L QV F I H I NNK L R R G D I I G
QV S V T L P E F L S K Y - - -A N - L K R G E T L P - E E K V S I A G R I HA K R E S G S K L K F Y V L H - - - - - -G DG V E V QV Y E DDH S L I K R G D I V G
HR QY T I P QY R R K Y - - -A P L L T E P DT S L -D E T V T I A G R I I NK R S S G S K L H F I T I Q - - - - - -G DM E I V QV F A E I H S K L K R G D I I G
HV D I S L T D F I QK Y - - - S H - L Q P G DH L T -D I T L K V A G R I HA K R A S G G K L I F Y D L R - - - - - -G E G V K L QV F I H I NNK L R R G D I I G
L V DY D P S Q F DK D F - - -K H - L K S G DV DK -T R E I R I A G R I F T K R S S G NK L I F Y D I K T G S DT T T T G S K MQ I F E QQH E H L G R G DV I G
E K NG D I C E I L V K F - - - E D - F E K N E G L S - - - -V R T A G R L Y N I R K HG -K M I F A D L G - - - - - -DQT G R I QV F A T F K N L MD S G D I I G
HV DT S L T H F I E QY - - -NN - L Q P G DH L T -D I T V R V A G R I HA K R A S G G K L I F Y D L R - - - - - -G E G V K L QV F F P I NNK L P R G D I F G
HV D I S L T Q F I Q E Y - - - S H - L Q P G DH L T -DV T L K V A G R I HA K R A S G G K L I F Y D L R - - - - - -G E G V K L QV F V H I NNK L R R G D I I G
QV NY DD S N F V E E F - - -G S - L K T G E T L P - E K E L R I A G R I Y N I R T A G S K L I F Y D I R T S A DT K S I G T R MQV F E K QHA H L R R G D I I G
L A N I T V A DY I E K Y - - -K S -MNV G DK L V -DV T E C L A G R I MT K R A Q S S K L L F Y D L Y - - - - - -G G G E K V QV F I K F H S T L K R G D I V G
HV DT R V G E F I E K Y - - - S G - L A DG T T A E -G E S A S V A G R I M S K R A S G K K L Y F Y D L I - - - - - -A DG K K I QV F QK I H S A T R R G D I V G
HV D I S L T D F I QK Y - - - S H - L Q P G DH L T -D I T L K V A G R I HA K R A S G G K L I F Y D L R - - - - - -G E G V K L QV F I H I NNK L R R G D I I G
QV D L T I A Q F R DK Y - - -G P L C T E K G K I H - E D F V S V A G R V V T I R S MG A K L M F Y D L Q - - - - - -G E G T K I QV F E K V HT L I K R G D I I G
HR N I T L P E F A E K Y - - - S S - L T R G E T L Q -DV E V K V T G R I MT K R E S G A K L R F Y V L K - - - - - -G DG V E V Q I Y E K MH E Y L R R G DV I G
E R T I S I P E F I E K Y - - -K D - L G NG E H L E -DT I L N I T G R I MR V S A S G QK L R F F D L V - - - - - -G DG E K I QV F A E C Y DK I R R G D I V G
E R T I T V P E F V E K Y - - -QN - L A S G E H L E -NT V L NV T G R I MR V S A S G QK L R F F D L V - - - - - -G DG A K I QV F A E A Y DK I R R G D I V G
E R T I T I P D F I E K Y - - -K D - L QNG E H L E - E T I L NMT G R I MR V S S S G QK L R F F D L V - - - - - -G DG K R I QV F V E C Y DK I K R G D I V G
F V T L S I P E Y I DK Y - - -G G - L S NG E H L E -DV S V S L A G R I M S K R S S S S K L F F Y D L H - - - - - -G L G A K V QV F S K L H S S V K R G D I V G
HV D I S L T Q F I Q E Y - - - S H - L Q P G DH L T -D I T L K V A G R I HA K R A S G G K L I F Y D L R - - - - - -G E G V K L QV F V H I NNK L R R G D I I G
HV S I S N P E F L A K Y - - -A H - L K K G E T L P - E E K V S I A G R I HA K R E S G S K L K F Y V L H - - - - - -G DG V E V Q L Y E K DHD L L K R G D I V G
QV T I T L P E F I A K Y - - - E G - L A R G E T K P - E V E V A V A G R V L G L R T A G NK L R F Y E I H - - - - - -A DG K K L QV F A A QH E H L R R G D I I G
HV T I S L T D F L E K Y - - -DY - L K A E D - I A -D E V L S L S G R V HA K R A S G A K L I F Y D L R - - - - - -G E G V K L QV F T R L N E K I R R G D I I G
HV DV S L T E F I E K Y - - -K N - L Q P G DQ L T -D -A V K V A G R V HA K R V S G A K L L F Y D L R - - - - - -G E G V K L QV F V A I NNK L R R G D I I G
DV S H S I S Q F I E E F - - -D P K L T E NG QT I -DT I V T I G A R I T S F R A S G K A L I F Y QV Q - - - - - -Q E G K K L QV F E E I N S L F K R G D I I G
- - -M S L K E Y V DK Y - - - E H - L E A G E H L E -N E L V S I A G R V S R I A S S S S K L R F L D I K - - - - - - S E G T K L QV F NDT Y NN I K R G D I I G
HV NM S L K E F V G K Y - - -DH - L E A G A H L E -N E L V S I A G R V S R I A S S S S K L R F L D I K - - - - - - S E G T K L QV F NDT Y NN I K R G D I I G
NV S HT F K Q F Y A Q F - - - E H - L K A G E E L P -DV K V S V A S R I A Q L R A HG -N L Y F F E MY - - - - - - E S T F K L Q L F K E E V S S F H L G D I V G
HR DY T L P A F R E C F - - -K P M L Q E K G QR L -DK V V T I A G R I V V K R S S S S K L H F L A L Q - - - - - -G DG E V L QV F A D I H S K I K R G D I I G
DR QY T I P A F K A R F - - -A P Q L S E K G QR V - E E V V A I A G R I V NK R S S G S K L N F L T L Q - - - - - -G DA DT V QV F A A V HG R I R R G D I I G
HV S I S L S E F I S K Y - - - E G K L E A G QH L D -Q E E V S I A G R L HNMR S S G QK L R F Y D L H - - - - - -G E G V K V QV F F A I H E L L R R G DV V G
NV T T K V D E F V E K Y - - -K G - L A R G E I K K -D E E V S V A G R V HT L R A A G S K L R F Y V L H - - - - - -Q E G K T V Q I DWG I HD L I R R G DV I G
Alignment ruler ticks 2580-2640 in steps of 10; sequence labels repeat the species list above.
V T G V P G E L S A MA R R I K L L S P C L HM L P G L K DK E T R F R QR Y L D L I L NNNV R N I F V T R A L I I S Y V R R F F DN L G F L E V E T
V I G F P G E L S I F P R S F I L L S HC L HMM P V L K DQ E S R Y R QR H L DM I L NV E V R Q I F R T R A K I I S Y V R R F L DNK N F L E V E T
V E G Y V G E I S V F V S R I Q L L T P C L HM L P G F K DQ E T R Y R K R Y L D L I MNK DA R G R F I T R S K I I T Y I R K F L DNR D F I E V E T
V V G F P G E L S I F A T E V V L L A P C L HA I P G F QDK E QR F R QR Y L D L I MN E R S R NV F V T R S K I V R Y V R N F F D S R D F I E V E T
I V G F P G E L S I F A T E V V L L S P C L HA I P G L QDK E QR F R QR Y L D L I MNDK S R NV F V T R S K I V R Y I R N F F DNR D F V E V E T
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -MY K E V R Q I F Y T R A K I I A Y V R R F L DNMG F L E V E T
V QG N P G E L S I I P Y E I T L L S P C L HM L P G L K DK E T R Y R QR Y L D L I L ND F V R QK F I I R S K I I T Y I R S F L D E L G F L E I E T
F T G QA G E L S L I P K E V L Q L T P C L HM L P G L K DK E L R F R K R Y L D L I L N P R V K DN F V I R S K I I T F L R R Y L DN L G F L E V E T
F T G R A G E L S L I P N E I L Q L T P C L HM L P G L K DK E L R F R K R Y L D L I L N P R V K DN F V I R S K I I T F L R R Y L DN L G F L E V E T
V T G Y P G E L S V F A T K V Q L L T P C L HM L P G F K DQ E A R Y R K R Y L D L I MND S S R E R F R V R S K I I QY I R K F L DNR D F V E V E T
V E G Y V G E I S V F V K R I E L L T P C L HM L P G F K DQ E T R Y R K R Y L D L I MNK D S R K R F I T R S K I I K Y I R K F L DNR D F I E V E T
V K G N P G E L S I I P Y E I T L L S P C L HM L P G L K DK E T R Y R QR Y L D L I L ND F V R QK F I I R S K I I T Y I R S F L D E L G F L E I E T
V I G H P G E L S I V P NT I E I L S P C L HM L P G L K DK E T R Y R QR Y L D L I MNDQT R QK F I T R A K I I S Y I R S F F DQMG F L E V E T
V T G I P G E L S L S I S S I Q L L S P C L H L L P G V V D L E T R Y R K R Y L D L I MN P S T R D I F V T R S K V I NY I R K Y L DA QG F L E V E T
F T G F P G E L S L F S K S V V L L S P C Y HM L P G L K DQ E V R Y R QR Y L D L M L N E E S R K V F K L R S R A I K Y I R NY F DR L G F L E V E T
F T G F P G E L S L F S K S V V L L S P C Y HM L P G L K DQ E V R Y R QR Y L D L M L N E E S R K V F K L R S R A I K Y I R NY F DR L G F L E V E T
V R G N P G E L S I I P V E MT L L S P C L HM L P G L K DK E T R F R QR Y L D L I L ND F V R QK F V T R S K I I T Y L R S F L DQ L G F L E I E T
V T G Y P G E V S V F A T S V Q L L T P C L HM L P G F K DQ E A R Y R K R Y L D L I MN E S T R DR F K V R S Q I I S F I R K F L DT R D F T E V E T
A K G T P G E L S L F A T E V I L L S P C L HM L P G L T D P E T R F R QR Y L DM I C N E S V K K N F I I R S K V I QG V R R Y L DN L G F I E V E T
V V G H P G E L S V M P S E I K L L S P C L HM L P G L K DK E T R Y R QR Y L D L I L NNNV R E K F Q I R A K I I S Y V R Q F L DR L G F L E I E T
V K G H P G E L S I M P T E I K L L S P C L HM L P G L K DK E T R Y R QR Y L D L I L NNK V R E N F Q I R A K I I S Y V R Q F L DR L G F L E I E T
F T G N P L E A S V F A T D I I V L T P C L R T I P G L K D P E T I Y R K R Y MD L L I NR E S R NR F QK R A Q I I G Y I R S F L D S R G F L E V E T
F T G H P G E L S L I P I S G M I L S P C L HM L P G L G DQ E T R F R K R Y L D L I V N P E S V K N F V L R T K V V K A V R K Y L DDK G F L E V E T
V V G N P G E L S I I P Y E I T L L S P C L HM L P G L K DK E T R Y R QR Y L D L I L NDY V R QK F I T R A K I V T Y I R S F L D E L G F L E I E T
V QG N P G E L S I I P Y E I T L L S P C L HM L P G L K DK E T R Y R QR Y L D L I L ND F V R QK F I I R S K I I T Y I R S F L D E L G F L E I E T
V E G Y V G E I S V F V S R I Q L L T P C L HM L P G F K DQ E L R Y R K R Y L D L I MNK DA R NR F I T R S K I I S Y V R K F L DT R N F I E V E T
I A G K P N E F S L K A T E I T L L S T C Y HM L P G L S S F E QR F R QR Y L D F I V NR DN I K T F I QR A N I I K Y I R K F F D E R D F V E V E T
V QG N P G E L S I I P Y E I T L L S P C L HM L P G L K DK E T R Y R QR Y L D L I L ND F V R QK F I I R S K MV T Y I R S F L D E L G F L E I E T
I V G F P G E L S L F A T E V V Q L S P S L H L L P G F T DG E K R F R MR Y L D F M F NDK S R E V L WQR S R I V K Y I R D F F HDR R F I E V E T
I QG E L G E N S I S V S E F S L L S K S L C A L P G L K DV E T R Y R K R Y L D L I V NA E K R E I F V MR S K L I S E I R R F L A DR E F L E F E T
V P G N P G E L S L I P H E I T L L S P C L HM L P G L K DK E T R Y R QR Y L D L I L ND F V R QK F I T R S K I I T Y I R S F L D E L G F L E I E T
V E G N P G E L S I I P Q E I T L L S P C L HM L P G L K DK E T R Y R QR Y L D L I L ND F V R QK F I V R S K I I T Y I R S F L D E L G F L E I E T
I V G F P G E L S V F A T E V Q L L S P C L HM L P P F A DA E QR A R MR Y L DM L WNDR S R E T L WQR S R MV R Y I R D F F H E R R F I E V E T
V C G Y P G E L S I F P K K I V V L S P C L HMM P V L R DQ E T R Y R QR Y L D L MV NH E V R H I F K T R S K V V S F I R K F L DG L D F L E V E T
V K G T P G E L S L F P S N F E I L T P C L K M L P G L K DV E T R F R MR F L D L MMNN E V R DT F Y I R S N I I R Y I R K Y L DDR D F L E V E T
V QG N P G E L S I I P Y E I T L L S P C L HM L P G L K DK E T R Y R QR Y L D L I L ND F V R QK F I I R S K I I T Y I R S F L D E L G F L E I E T
V K G N P G E L S I A P G F I Q L L S P T L HM L P G F K DH E QR Y R MR Y L D L I MNK K V R D I F L T R S S V I K Q L R E Y F DG K G F I E V E T
V T G Y P G E V S V F A T S V Q L L T P C L HM L P G F K DQ E A R Y R K R Y L D L I MNDA T R DR F K V R S K I I G Y I R K F L DNR D F V E V E T
I V G F P G E L S I F P K E T I L L S A C L HM L P G L K DT E I R Y R QR Y L D L L I N E S S R HT F V T R T K I I N F L R N F L N E R G F F E V E T
I V G F P G E L S I F P K E T I I L S P C L HM L P G L K DT E I R S R QR Y L D L M I N E S T R S T F I T R T K I I NY L R N F L NDR G F I E V E T
I I G F P G E L S I F P K E T I V L S P C L HM L P G L K DT E I R Y R QR Y L D L L I N E S T R NV F I T R T K I I N F L R N F L NNQG F I E V E T
I T G F P G E L S I F P T S F MV L S HC L HMM P I L K DQ E T R Y R QR Y L D L M L N S E V R Q I F K T R S K I I K Y I QN F L DD L D F L E V E T
V E G N P G E L S I V P R E MT L L S P C L HM L P G L K DK E T R Y R QR Y L D L I L ND F V R QK F I I R S K I I T Y I R S F L D E L G F L E I E T
V E G Y V G E V S V F V S R V Q L L T P C L HM L P G F K DQ E T R Y R K R Y L D L I MNK DA R NR F I T R S E I I R Y I R R F L DQR K F I E V E T
I R G Y P G E L S I F A R QC V L L S P C L R M L P G L K D L E I R HR QR Y L D L I MNR S T R DR F V MR S R I I QY I R H F F D S R D F M E V E T
V K G R P G E L S I L P S E I T L L S P C L HM L P G V T NK E T R F R QR Y L D L I MNDY V R DK F I T R S K I V S Y L R R F F D E L G F L E V E T
V C G N P G E L S I I P K E M I L L S P C L HM L P G L K DK E T R Y R QR Y L D L I L ND S V R QK F I T R S K I I T Y L R S F L DQMG F L E I E T
I T G K P G E L S I A P T K L Q L L S P C L HM L P G L K DM E T R Y R K R Y L D L I MNN S S R NN F I T R T K I I S Y I R R Y L DDR N F L E V E T
L T G F P G E L S V F P K S V K I L S P C L HM L P G L K DNDV R F R QR Y L D L MMNDD S L K V MK L R S R I I DY L R K F L T S R G F F E V E T
L T G F P G E L S V F P K S V Q I L S P C L HM L P G L K DNDV R F R QR Y L D L MMNDD S L K V MK L R S R I I DY L R K F L T S R G F F E V E T
A E G F P G E L S V V V T K L V L L A P C L F QM P K L E D L E V R Y R QR F F D L I V NR E NR Q I F E T R C K V V K M I R G F L DD L D F T E V E T
V R G V P G E F S M S A Y E I T L L S T C F HM L P G L S S V E QR F R QR Y L D L I V NR E NA K T F I L R S K I I S Y I R S F F DQK D F L E V E T
V K G V A G E F S MNA F E I T L L S T C Y HM L P G L S S I E QR F R QR Y L D F I V NR E N I QT F V T R S K V I R Y I R N F F E D L N F L E V E T
V T G V P G E L S I F P S S I K L L S P S L K M L P G F T DT E QR HR K R Y L D L I MNNHV R D I F V K R A K I I NY V R R F L DN L G F L E V E T
I R G Y P G E L S V F C K E L V L L T P S L HM L P G F K DV E T R F R QR Y L D L I MND S T R E R F I V R S K I I QY I R K F L DNK D F I E V E T
2650
P M M N M I P
P M M N M I A
P M M N V I A
P M M N A I A
P M M N A I A
P L M N M V P
P M M N I I P
P I M N Q I A
P I M N Q I A
P I L N V I A
P M M N V I A
P M M N I I P
P M M N M V A
P M M S M I A
P M L N M I Y
P M L N M I Y
P M M N L I P
P M M N V I A
P M M N M I A
P M M N M I A
P M M N M V A
P M M N L I P
P I L N T I P
P M M N I I P
P M M N I I P
P M M N V I A
P V L N Q I A
P M M N I I P
P M M T S I A
P I L Q T V Y
P M M N V I P
P M M N I I P
P M M H A I A
P M M N M I A
P M M N M I A
P M M N I I P
P S L N V I Q
P I L N V I A
P M M N L I A
P T M N L V A
P S M N L M A
P M M N M I P
P M M N I I P
P M M N V I A
P M M N M I A
P M M N M I A
P M M N I V P
P Q M N M I P
P M L K T T S
P M L K T T S
P I M W K T A
P M L N Q I A
P V L N Q I A
P M M N Q I A
P M M N I I A
2660
G G A T A K
G G A A A R
G G A T A K
G G A T A K
G G A T A K
G G A T A K
G G A V A K
G G A T A K
G G A T A K
G G A T A K
G G A T A K
G G A V A K
G G A T A K
G G A T A K
G G A A A R
G G A A A R
G G A V A K
G G A T A K
G G A A A K
G G A T A K
G G A T A K
G G A A A K
G G A T A R
G G A M A K
G G A V A K
G G A T A K
G G A A A R
G G A V A K
G G A T A L
G G A N A R
G G A V A K
G G A V A K
G G A T A L
G G A A A R
G G A T A R
G G A V A K
G G A T A K
G G A T A K
G G A N A R
G G A N A K
G G A S A R
G G A A A R
G G A V A K
G G A T A K
G G A T A K
G G A T A K
G G A V A R
G G A A A R
T G A S A K
T G A S A K
G G A T A K
G G A A A R
G G A A A R
G G A T A K
G G A T A K
2670 2680 2690 2700 2710 2720 2730
P F I T HHND L NMD L F L R I A P E L Y L K M L T V G G L DR V Y E I G R Q F R N E G I D L T HN P E F T T C E F Y MA Y A DY ND I I D I T QQ L L
P F V T HHND L DMR L Y MR I A P E L Y L K Q L I V G G L E R V Y E I G K Q F R N E G I D L T HN P E F T T C E F Y MA F A DY ND L M E MT E V M L
P F V T HHND L DMQMY MR I A P E L F L K Q L V V G G MDR V Y E I G R Q F R N E G I DMT HN P E F T T C E F Y QA Y A DV Y D L MDMT E L L F
P F I T HHND L DMN L F MR V A P E L Y L K M L I V G G L E R V Y E L G R Q F R N E G I D L T HN P E F T T C E F Y WA Y A DV Y DV MN L T E E L I
P F V T HHND L DMN L F MR V A P E L Y L K M L I V G G L E R V Y E L G R Q F R N E G I D L T HN P E F T T C E F Y WA Y A DV Y DV MN L T E E L V
P F I T HHN E L NMD L Y MR I A P E L Y HK M L V V G G L DR V Y E I G R Q F R N E G I D L T HN P E F T T C E F Y MA Y A DY ND L MT I T E S I L
P F I T Y HN E L DMN L Y MR I A P E L Y HK M L V V G G I DR V Y E I G R Q F R N E G I D L T HN P E F T T C E F Y MA Y A DY HD L M E I T E K M I
P F I T HHND L DMN L F L R V A P E L Y HK M L V V G G I DR V Y E V G R L F R N E G I D L T HN P E F T T C E F Y MA Y A DY E DV I Q L T E D L L
P F I T HHND L DMN L F L R V A P E L Y HK M L V V G G I DR V Y E V G R L F R N E G I D L T HN P E F T T C E F Y MA Y A DY E DV I Q L T E D L L
P F T T HHND L NM E M F MR I A P E L F L K E L V V G G MDR V Y E I G R Q F R N E G I DMT HN P E F T T C E F Y QA Y A DV Y D L MDMT E L M F
P F V T HHND L DMDM F MR I A P E L F L K E L V V G G MDR V Y E I G R Q F R N E G I DMT HN P E F T T C E F Y QA Y A DV Y D L MDMT E L L F
P F I T Y HN E L DMN L Y MR I A P E L Y HK I L V V G G I DR V Y E I G R Q F R N E G I D L T HN P E F T T C E F Y MA Y A DY HD L M E I T E K M I
P F I T HHND L DMD L Y MR V A P E L Y L K M L V V G G L DR V Y E I G R L F R N E G I DMT HN P E F T S C E F Y MA Y A DY E D L MK I S E T L I
P F V T HHND L K L D L F MR I A P E L Y L K E L V V G G L DR V F E I G R V F R N E Q I DMT HN P E F S I C E F Y MA Y A DMY D I MDMT E E L I
P F I T Y HN E L E T Q L Y MR I A P E L Y L K Q L I V G G L DK V Y E I G K N F R N E G I D L T HN P E F T A M E F Y MA Y A DY Y D L MD L T E E L I
P F I T Y HN E L E T Q L Y MR I A P E L Y L K Q L I V G G L DK V Y E I G K N F R N E G I D L T HN P E F T A M E F Y MA Y A DY Y D L MD L T E E L I
P F I T Y HND L NMN L Y MR I A P E L Y HK M L V V G G I DR V Y E I G R Q F R N E G I D L T HN P E F T T C E F Y MA Y A DY HD L M E I T E K L L
P F V T HHND L NMDM F MR I A P E L F L K E L V V G G MDR V Y E I G R Q F R N E G I DMT HN P E F T T C E F Y QA Y A DV Y D L MDMT E L L F
P F L T HHNA L NMD L F MR I A P E L Y L K Q L V V G G MDR V Y E I G K Q F R N E D I DHT HN P E F T T C E F Y MA Y A DY ND L Y T MT E Q L L
P F V T HHND L K MD L F MR I A P E L Y HK M L V V G G L DR V Y E I G R Q F R N E G I D L T HN P E F T T C E F Y MA Y A DY A D I MD I T E Q L V
P F V T HHN E L K MD L F MR I A P E L Y HK M L V V G G L DR V Y E I G R Q F R N E G I D L T HN P E F T T C E F Y MA Y A DY A DV MD I T E Q L I
P F I T HHN E L K L D L Y MR V S P E L Y L K K L V V G G L E R V Y E I G K Q F R N E G I D L T HN P E F T S C E F Y MA Y A DY ND L M E MT E E L I
P F I T HHNQ L D I QMY MR I A P E L Y L K E L V V G G I NR V Y E I G R L F R N E G I DQT HN P E F T T C E F Y MA Y A DY ND I MK MT E E L L
P F I T Y HN E L DMK L Y MR I A P E L Y HK M L V V G G L DR V Y E I G R Q F R N E G I D L T HN P E F T T C E F Y MA Y A DY R D L M E I T E K L L
P F I T Y HN E L DMN L Y MR I A P E L Y HK M L V V G G I DR V Y E I G R Q F R N E G I D L T HN P E F T T C E F Y MA Y A DY HD L M E I T E K MV
P F V T HHND L DMDMY MR I A P E L F L K E L V V G G MDR V Y E I G R Q F R N E G I DMT HN P E F T T C E F Y QA Y A DV Y D L MDMT E I M I
P F V T HHND L NQT M F L R I A P E L Y L K E L V V G G MDR V Y E I G K Q F R N E G I D L T HN P E F T S C E A Y WA Y MDY HDWMT A T E D L L
P F I T Y HN E L DMN L Y MR I A P E L Y HK I L V V G G I DR V Y E I G R Q F R N E G I D L T HN P E F T T C E F Y MA Y A DY HDV M E I T E K MV
P F V T HHN E Y D L DM F MR I A P E L Y L K M L V V G G Y NK V F E I G K N F R N E G C D L T HN P E F T T I E A Y A A Y Y DMY DV MDY T E E L V
P F K T F HNC L G QN L F L R I A P E L Y L K R L V V G G Y E K V F E I S K N F R N E D I DT T HN P E F T M I E V Y E A Y R DY NDMMD L T E A L I
P F I T Y HN E L DMN L Y MR I A P E L Y HK M L V V G G I DR V Y E I G R Q F R N E G I D L T HN P E F T T C E F Y MA Y A DY HD L M E I T E K M I
P F I T Y HN E L DMN L Y MR I A P E L Y HK M L V V G G I DR V Y E I G R Q F R N E G I D L T HN P E F T T C E F Y MA Y A DY HD L M E I T E K M L
P F V T HHND L DMDM F MR V A P E L F L K K M I V G Q F G K V F E MG K N F R N E G I D L T HN P E F T S I E F Y WA Y A DV Y D L M S I T E E L V
P F V T HHN E L NMR L Y MR I A P E L Y L K E L V V G G L DR V Y E I G K Q F R N E G I D L T HN P E F T T C E F Y MA Y A DY ND L I E L T E T M L
P F I T HHND L NMT L Y MR I A P E L Y L K Q L V V G G I E R V Y E I G R Q F R N E G I DMT HN P E F T T C E F Y QA Y A DY DD L MQMT E E M I
P F I T Y HN E L DMN L Y MR V A P E L Y HK M L V V G G I DR V Y E I G R Q F R N E G I D L T HN P E F T T C E F Y MA Y A DY HD L M E I T E K MV
P F K T F HN S L HR D L F MR V A P E L Y L K M L I V G G L DR V Y E I G K N F R N E G I DQT HN P E F T A M E F Y WA Y C DY ND L MT V T E E V L
P F I T HHND L S MDM F MR I A P E L F L K E L V V G G MDR V Y E I G R Q F R N E G I DMT HN P E F T T C E F Y QA Y A DV Y D L MDMT E L M F
P F I T HHND L D L D L Y L R I A T E L P L K M L I V G G I DK V Y E I G K V F R N E G I DNT HN P E F T S C E F Y WA Y A DY ND L I K W S E D F F
P F I T HHND L D L D L Y L R I A T E L P L K M L I V G G I DK V Y E I G K V F R N E G I DNT HN P E F T S C E F Y WA Y A D F Y D L I K W S E D F F
P F I T HHND L D L D L Y L R I A T E L P L K M L I V G G L DR V Y E I G K V F R N E G I DNT HN P E F T S C E F Y WA Y A DY Y D L I K W S E E F F
P F K T HHND L NMK L Y MR I A P E L Y L K E L V V G G L DR V Y E I G K Q F R N E G I D L T HN P E F T T C E F Y MA Y A DY ND L M E L T E K M L
P F I T Y HN E L DMN L Y MR I A P E L Y HK M L V V G G I DR V Y E I G R Q F R N E G I D L T HN P E F T T C E F Y MA Y A DY HD L M E I T E K M L
P F I T HHND L DMDMY MR I A P E L F L K Q L V V G G L DR V Y E I G R Q F R N E G I DMT HN P E F T T C E F Y QA Y A DV Y D L MDMT E L M F
P F V T HHND L DMD L Y MR I A P E L Y L K M L V V G G L DR V Y E I G R Q F R N E G A D L T HN P E F T S I E F Y QA Y A DY Y D L MDT T E E L L
P F I T HHND L NMD L F MR V A P E L Y L K M L V V G G L QR V Y E I G R Q F R N E G I D L T HN P E F T T L E F Y MA Y A DY ND L MD I A E R L L
P F V T Y HND L DMN L Y MR I A P E L Y HK M L V V G G I DR V Y E I G R Q F R N E G I DMT HN P E F T T C E F Y MA Y A DY HD L M E I T E K L L
P F V T HHND L NMD I F MR I A P E L Y L K N L V V G G F E R V Y E I G K Q F R N E G I DR T HN P E F T S I E L Y QA Y A DY E DMMK L T E D L L
P F I T HHN E L D L D L F MR I A P E L P L K L I I I G G F E K V F E I G K C F R N E G I D P T HN P E F T S C E F Y WA Y A DY HD L MK L T E E L L
P F I T HHN E L D L D L F MR I A P E L P L K L I I I G G F E K V F E I G K C F R N E G I D P T HN P E F T S C E F Y WA Y A DY HD L MK L T E E F L
P F I T HHNA L D I D L W L R V A P E L F L K M L V V G G MNR V Y E L G K Q F R N E G I D L T HN P E F T S C E F Y MA Y A DY ND L MD L T E K L Y
P F I T HHN E L NQT MY L R I A P E L Y L K K L V V G G L DR V Y E I G K Q F R N E G I D L T HN P E F T S V E S Y WA Y A DY NDWM E T T E E L L
P F I T HHN E L NQR MY L R I A P E L Y L K E L V V G G MDR V Y E L G K Q F R N E G I D L T HN P E F T S V E A Y WA Y A DY NDWMR T T E D L F
P F V T Y HND L K L D L F MR I A P E L F L K E L V V G G L DR V Y E I G R V F R N E S I DQT HN P E F S I C E F Y MA Y A DMY D L MD I T E S M I
P F V T HHND L N L DMY MR I A P E L Y L K Q L V V G G M E R V Y E I G R Q F R N E G I DQT HN P E F T T C E F Y E A Y A DV Y D L M E T T E L L F
2750
2760 2770 2780 2790 2800 2810 2820
S G MV HA I HG F T P P F R R I S M I S S L E E A V Q F L DA L C V K H E V - E C K P P R T A A R L L DK L V G E F L E E T C I N - P T F I C DH P Q I M S P L A K
S G MV K E L T G F T P P F R R I E M I G E L E E A NK Y L I DA C A R F DV -K C P P P QT T A R L L DK L V G E F L E P T C V N - P T F I I NQ P E I M S P L A K
S E MV K E I T G F A R P WK R I NM I E E L E E T G E F L K K I L S DNK M -DC P P P L T NA R M L DK L V G E - L E DT C I D - P T F I F G H P QMM S P L A K
S G L V K HV T G WK A P WR R V E M I P A L E E T G E F L K R V L K K T G V - E C S P P L T NA R M L DK L V G E F I E E T C V N - P T F I T G H P QMM S P L A K
S G L V K H I T G WK A P WR R V E M I P A L E E T G E F L K R V L K K T G V - E C S P P MT NA R M L DK L V G E F I E E T C V N - P T F I T G H P QMM S P L A K
S G MV Q S I HG F T P P F A R V P MMA T L E E A ND F L NK L C NT HQ I - E C S P P R T T A R L L DK L V S V F L E E E C I N - P T F I L DH P Q I M S P L S K
S G MV K H I T G F T P P F R K I S MV E E L E E T R R I L DD I C V A K DV - E C P P P R T T A R L L DK L V G E F L E V T C I N - P T F I C DH P Q I M S P L A K
S S MV L A I K G F T P P F K R V NMY E G L A E A R QT F DK L C R DNNV -DC S E P R T T A R L L DK L V G E Y L E S T F I S - P T F L I G H P Q I M S P L A K
S S MV M S I K G F T P P F K R V HMY DG L A E A R E V F DK L C R DNNV -DC S A P R T T A R L L DK L V G E Y L E S T F I S - P T F L I G H P Q I M S P L A K
S E MV K E I T G F S R P WK R V NM I E E L E E T G E F L K K V L K DNN L - E C S P P L T NA R M L DK L V G E - L E DA S I N - P T F I F G H P QMM S P L A K
S E MV K E I NG F A R P WK R I NM I E E L E E T G E F L QK V L K DNN L - E C P P P I T NA R M L DK L V G E - L E DT C I N - P T F I F G H P QMM S P L A K
S G MV K H I T G F T P P F R R I S MV E E L E E T R K I L DD I C L A R A V - E C P P P R T T A R L L DK L V G E F L E T T C I N - P T F I C DH P Q I M S P L A K
S G MV K Q I C G Y T P P F R R L R M L P D L E G A QA R L D E I C V K L G V - E C P P P R T T A R L L DK L V G DY L E V NC I N - P T F I T E H P E I M S P L A K
E G MV K S L T G F A R P WK R F DM I G E L E NT NK F L R E L C E K HNV -DC A E P K T N S R L L DK L V G E Y I E NQC V N - P S F I V G H P QV M S P L A K
S G L V L E I HG F T T P WK R F S F V E E I E E N I D F MV E MC E K HK I - E L P H P R T A A K L L DK L A G H F V E T K C T N - P S F I I DH P QT M S P L A K
S G L V L E I HG F T T P WK R F S F V E E I E E N I D F MV E MC E K H E I - E L P H P R T A A K L L DK L A G H F V E T K C T N - P S F I I DH P QT M S P L A K
S G MV K H I T G F T P P F R R I S MT Q E L E E MR K F L DD L C V QK E V - E C P P P R T T A R L L DK L V G D F L E V K C I N - P T Y I C DH P Q I M S P L A K
S E MV K K I T G F T R P WK R V NM I E E L E E T G K F L K Q I L I DHK L -DC S P P L T NA R M L DK L V G E - L E DA S I N - P T F I F G H P QMM S P L A K
Q S I V M S I HG F S S P WR K I DM I A D L E E C R E F L V K T C R E R K V - E C S A P QT T A R L L DK L V G E Y L E V QC I N - P T F I I NH P E I M S P L S K
S G MV K A I R G F T P P F K R V S M I K T L E E T NQ F L S Q L C A K HQV - E C P A P R T T A R L L DK L V G E F I E E F C V N - P T F I C E H P Q I M S P L A K
S G MV K S I R G F T P P F K R V S M I K T L E A T T D F L S Q L C V K HQV - E C P A P R T T A R L L DK L V G E F I E E E C I N - P T F I C E H P Q I M S P L A K
S G MV E NM F G F K R P F R V I S I L E E L N E T L E K L L S A C DK E G L - S V E K P R T L S R V L DK L I G HV I E P QC V N - P T F V K DH P I A M S P L A K
G NMV K D I T G F T A P F K R I S Y V HA L E E A L T F L K K QA I R F NA - I C A E P QT T A R V MDK L F G D L I E V D L V Q - P T F V C DQ P Q L M S P L A K
S G MV K H I T G F T P P F R R I S MV D E L E E T R R F F DD L C A V R NV - E C P P P R T T A R L L DK L V G E F L E V T C I N - P T F I C DH P Q I M S P L A K
S G MV K H I T G F T P P F R R I NMV E E L E E T R K I L DD I C V A K A V - E C P P P R T T A R L L DK L V G E F L E V T C I N - P T F I C DH P Q I M S P L A K
S E MV K E I T G F T R P WK R I NM I E E L E E T G E F L K K V L K DNK M -DC A P P L T NA R M L DK L V G E - L E DT C I N - P T F I F G H P QMM S P L A K
Y G L A V E L HG F S K P F K R L H I I P E L E A G I Q F L MD L C K K HK A -DC P P P Y T A P R L L DA L I A E F L E P E C HD - P C F I C DH P R V M S P L A K
S G MV K H I T G F T P P F R R I S M I E E L E E T R K I L DD I C V A K A V - E C P P P R T T A R L L DK L V G E F L E V T C I N - P T F I C DH P Q I M S P L A K
S G L V K H L T G WA R P WK R V K I M P E L E E T NQ F L R D L L K E K N I - E C T P P L T NA R M L DK L I G E Y L E E T C I N - P T F L M E H P Q L M S P L A K
S E L V F R L T G L R S P WK R I S M E G A L K H S L E E L K Q I A I QNR I E DY E K A K S HG E F L A L L F E G L V E DK L V N - P T F I Y D F P V E N S P L A K
S G MV K N I T G F T P P F R R I S MV E E L E E T R K I L DD I C V A R DV - E C P P P R T T A R L L DK L V G E F L E V T C I N - P T F I C DH P Q I M S P L A K
S G MV K S I T G F T P P F R R I S MV E E L E E T R K I L DD I C V A K A V - E C P P P R T T A R L L DK L V G E F L E V T C I S - P T F I C DH P Q I M S P L A K
S S L V K E L T G W E A P WR R V E M I P A L E E T NA F L QR I C K K MNV - E C P P P L T NA R M I DK L T G E F I E E T C I N - P T F I L E H P QMM S P L A K
S G MV K E L T G F T P P F R K I DM I E E L E E A NK Y L I DA C A K Y DV -K C P P P QT T T R L L DK L V G H F L E E T C V N - P T F I I NH P E I M S P L A K
S G MV Y A I K G F T P P F R R I S MV S G L E E N E D F L K E L I K K L G V - E M S P P Y T T A R M L D E L V G E Y L E S Q L V N - P G F I C DH P Q I M S P L A K
S G MV K H I T G F T P P F R R I NMV E E L E E T R K I L DD I C V A K A V - E C P P P R T T A R L L DK L V G E F L E V T C I N - P T F I C DH P Q I M S P L A K
S S I V L K L K G F T P P W P R V S MMA E L E E A NA F F V E QA K K HK V - E C S N P R T T A R L I DK L V G H F L E V N F R N - P T F L I DH P Q L M S P L S K
S E MV K E I T G F S R P WK R V NM I E E L E E T G K F L K Q I L I DNK L -DC T P P L T NA R M L DK L V G E - L E DA S I N - P T F I F G H P QMM S P L A K
S Q L V Y H L F G F T P P Y P K V S I V E E I E E T I E K M I N I I K E HK I - E L P N P P T A A K L L DQ L A S H F I E NK Y NDK P F F I V E H P Q I M S P L A K
S T L V MH L F G F T P P Y P K V S I V E E L E E T I NK M I N L I K E NK I - E M P N P P T A A K L L DQ L A S H F I E NQY P NK P F F I I E H P Q I M S P L A K
S K L V Y H L F G F T P P Y P K I S L V E E L E E T I NK M I N I I K E NN I - E M P N P P T A A K L L DQ L A S H F I E N I Y QNQ P F F I I E H P Q I M S P L A K
S G MV K E L T G F T P P F R R I DM I E E L E E A T K Y L V A A C E K F E V -K C P P P QT T T R L L DK L V G H F L E E T C V N - P S F I I NH P E I M S P L A K
S G MV R S I T G F T P P F R R I S MV E E L E E T R K I L DD I C V A R A V - E C P P P R T T A R L L DK L V G E F L E V T C I S - P T F I C DH P Q I M S P L A K
S E MV K E I T G F S R P WK R I NM I E E L E E T G E F L K K I L V DNK L - E C P P P L T NA R M L DK L V G E - L E DT C I N - P T F I F G H P QMM S P L A K
S G L V K D L T G F S R P WR R I NM I E Y L E E A NA F L R D L C A K HG V - E C A P P QT C S R L L DK L V G E F I E S E C I N - P T F I I G H P QMM S P L A K
S G MV K F V T G F T P P F R R V S M I N E L Q E T NK F L DD L C R K H E V - E C T S P R T T A R L L DK L V G E Y I E T QC I S - P T F I MDH P E I M S P L S K
S G MV K H I T G F T P P F R R L S MT HD L E E T R K F F DN L C A E K G V - E C P P P R T T A R L L DK L V G E F M E E T C I S - P T F I C DH P Q I M S P L A K
S S L V MK L T G F T P P F K R V P MM E T L S E A R E F F DK L C V QHNV -A C S A P R S T T R L I DK L V G H F I E V DC K N - P T F L M E H P Q I M S P L A K
S S L V F E L F G F T P P F QR V S MV E E L E E NV E K Y L T A I K E A G L -DM P K P P V P A K L I DQ L V G HY I E DQ I V K - P T F I V D F P QC T S P L S K
S S L V F E L F G F T P P F NK V S MV E E L E E NV E K Y L T A I K E A G L -DM P K P P V P A K L I DQ L V G HY I E DQ I V K - P T F I V D F P QC T S P L S K
QK I V M E V K G F S S P WQR I DM I E E L E E V R E L L E K K C K E L DV -DV P P P MT V A R M L DK MV G K F V E P L C V N - P T F MC NH P QV M S P L A K
Y G MV MH L Y G F NR P F K R L H I V P K L E E A N S F F L D I C K K NQV - E C N P P F T T T R L L DA L V S HY L E P QC HD - P T F L C DH P R I M S P L A K
Y G L A MH I HG F NK P F K R L Y I I P E L E S S NA F L Q E L C S K H E V - E C T P P L T T A R L L DA L I S HY L E P E C QD - P T F V C DH P R V M S P L A K
S G L V K A V T G F S T P WK R F DM I K E L E E T R K W L S D L A A K HNV -DC S E P R T S S R L I DK MT G E F I E T QC I N - P S F I V G H P QV M S P L A K
S E MV K E I T G F S R P WK R L D I I G T L E E T NQ F L Q E Q L K K V G L -V C T P P L T NA R M L DK L I G DY L E DT C I N - P T F L Y G H P E MM S P L A K
2830
Y H R S A T G L T E R
W H R S K S G L T E R
H S R D Q P G L C E R
Y H R Q H A G L C E R
Y H R Q N V G L C E R
Y H R D V P G L T E R
W H R S K E G L T E R
W H R S I P G L T E R
W H R S I P G L T E R
K D R N I P G L C E R
Y S R D Q P G L C E R
W H R S K E G L T E R
W H R S I K G L T E R
Y D R S R P G L C E R
W H R E K P E M T E R
W H R E K P E M T E R
W H R S Q K G L T E R
K D R D N V G L C E R
Y H R E K P Q L T E R
Y H R S I P G L T E R
Y H R S A P G L T E R
N H R S K A G L T E R
Y H R S E P E L T E R
W H R I H R G L T E R
W H R S K E G L T E R
Y S R D Q P G L C E R
W H R N D P R L T E R
W H R S K E G L T E R
Y H R T E K G I S E R
N H R E K E G F V E R
W H R S K N G L T E R
W H R S K E G L T E R
Y H R S K N G L C E R
W H R S R P G L T E R
Y H R N I P G M T E R
W H R S K E G L T E R
V H R Q Y P G L T E R
K D R N I P G L C E R
Y H R T K P G L T E R
Y H R S K P G L T E R
Y H R S K P G L T E R
W H R S K P G L T E R
W H R C K E G L T E R
Y S R D Q P G L C E R
Y H R S D A G L C E R
W H R S I P G L T E R
W H R S E K G L T E R
Y H R S K P N V T E R
W H R S K E N V C E R
W H R S K E N V C E R
W H R T K P G I V E R
W H R K D L R L S E R
W H R N D P Q L T E R
R H R D I P G L C E R
Y S R D R P G I C E R
2840 2850 2860 2870 2880 2890 2900
F E L F V MR K E V C NA Y T E L ND P A V QR E R F E QQA A DK A A G DD E A Q L V D E N F C T A L E Y G L P P T G G WG MG I DR L T M F
F E L F I NK H E L C NA Y T E L ND P V V QR QR F A DQ L K DR Q S G DD E A MA L D E T F C NA L E Y G L A P T G G WG L G I DR L S M L
F E V F V A T K E I C NA Y T E L ND P F DQR A R F E E QA R QK DQG DD E A Q L I D E T F C NA L E Y G L P P T G G WG C G V DR L A M F
F E A F V C K K E I V NA Y T E L ND P F DQR L R F E E QA R QK DQG DD E A Q L I D E N F C T S L E Y G L P P T G G WG MG I DR L V M F
F E A F V C K K E I V NA Y T E L ND P F DQR L R F E E QA R QK DQG DD E A Q I I D E N F C T S L E Y G L P P T G G WG MG I DR L V M F
F E V Y V A K K E I C NA Y T E L ND P A T QR E R F E E QA K NR A A G DD E T P P T D E A F C T A L E Y G L P P T G G WG L G V DR L T M F
F E L F V MK K E I C NA Y T E L ND P V R QR Q L F E E QA K A K A A G DD E A M F I D E T F C T A L E Y G L P P T G G WG MG I DR V T M F
F E L F A V T R E I A NA Y T E L ND P I T QR QR F E QQA K DK DA G DD E A QM I D E T F C NA L E Y G L P P T G G WG MG I DR L S M I
F E L F A V T R E I A NA Y T E L ND P I T QR QR F E QQA K DK DA G DD E A QM I D E T F C NA L E Y G L P P T G G WG MG I DR L S M I
F E V F V A T K E I C NA Y T E L ND P F DQR A R F E E QA R QK A QG DD E A QMV D E T F C NA L E Y G L P P T A G WG C G I DR L A M F
F E V F V A T K E I C NA Y T E L ND P F DQR A R F E E QA R QK DQG DD E A Q L I D E T F C NA L E Y G L P P T G G WG C G I DR L A M F
F E L F V MK K E I C NA Y T E L ND P V R QR Q L F E E QA K A K A A G DD E A M F I D E N F C T A L E Y G L P P T A G WG MG I DR V T M F
F E L F V NK K E I C NA Y T E L ND P M I QR QR F E QQA L DK A A G DD E A QMV D E N F C T A L E Y G L P P T G G WG MG I DR L T M F
F E A F L C T K E I C NA Y T E L ND P F DQR E R F M E QV R QK E QG D E E A QG V D E T F L DA L E Y G L P P T G G WG L G I DR L V M F
F E L F V L G K E L C NA Y T E L N E P L QQR K F F E QQA DA K A S G DV E A C P I D E T F C L A L E HG L P P T G G WG L G I DR L I M F
F E L F V L G K E L C NA Y T E L N E P L QQR K F F E QQA DA K A S G DV E A C P I D E T F C L A L E HG L P P T G G WG L G I DR L I M F
F E L F V MK K E I C NA Y T E L ND P I R QR E L F E QQA K A K A E G DD E A M F I D E T F C T A L E Y G L P P T A G WG MG I DR L T M F
F E V F V A T K E I C NA Y T E L ND P F DQR QR F E E QA R QK A QG DD E A QMV D E T F C NA L E Y G L P P T A G WG C G I DR L A M F
F E L F V NT K E I C NA Y T E L NN P F V Q I E R F A E QA K A K A A G DD E S M L I DK V F T T S L E Y G L P P T G G F G L G I DR F A M L
F E L F V A K K E I C NA Y T E L ND P V V QR E R F E QQA S DK A A G DD E A Q L V D E N F C T S L E Y G L P P T G G F G MG I DR L A M F
F E L F V A K K E I C NA Y T E L ND P V V QR E R F E QQA S DK A A G DD E A QMV D E N F C T A L E Y G L P P T G G F G MG I DR L T M F
F E L F I NC K E I C NA Y T E L NN P F E QR E R F L QQT QD L NA G DD E A MMND E D F C T A L E Y G L P P T G G WG I G I DR L V MY
F E L F I L K R E I A NA Y T E L NN P I V QR S N F E QQA K DK A A G DD E A Q L V D E V F L DA I E HA F P P T G G WG L G I DR L A M L
F E L F V MK K E V C NA Y T E L ND P F QQR Q L F E DQA K A K A A G DD E A M F I D E N F C T A L E Y G L P P T A G WG MG I DR F T M F
F E L F V MK K E I C NA Y T E L ND P MR QR Q L F E E QA K A K A A G DD E A M F I D E N F C T A L E Y G L P P T A G WG MG I DR V A M F
F E V F V A T K E I C NA Y T E L ND P F DQR A R F E E QA NQK A QG DD E A Q L V D E T F C NA L E Y G L P P T G G WG C G I DR L A M F
F E L F V NK K E L A NA Y T E L NN P I V QR E E F L K QV R NR DK G DD E S M E I D E G F V A A L E HA L P P T G G WG L G I DR L V M F
F E L F V MK K E I C NA Y T E L ND P V R QR Q L F E E QA K A K A A G DD E A M F I D E N F C T A L E Y G L P P T G G F G MG L DR V A M F
F E G F V C K K E I C NA Y T E L NN P F DQR L R F E E QA R QK A QG DD E A QM I D E N F L R S L E Y G L P P T A G WG L G I DR L C M F
F E L F L NG W E L A NG Y S E L ND P L E Q E K R F E E QDK K R K L G D L E A QT V DY D F I NA L G Y G L P P T G G MG L G I DR L T M I
F E L F V MK K E I C NA Y T E L ND P V R QR Q L F E E QA K A K A A G DD E A MV I DDN F C T A L E Y G L P P T A G WG MG I DR L T M F
F E L F V MK K E I C NA Y T E L ND P V R QR Q L F E E QA K A K A A G DD E A M F I D E N F C T A L E Y G L P P T A G WG MG I DR L T M F
F E A F V C K K E I A NA Y T E L NN P F DQR L R F E E QA R QK DQG DD E A Q L V D E S F L NA L E Y G L P P T G G WG L G I DR L A M F
F E L F V NK H E V C NA Y T E L ND P V V QR QR F E E Q L K DR Q S G DD E A MA L D E T F C T A L E Y G L P P T G G WG L G I DR L T M L
F E L F V NT K E L C NA Y T E L ND P I DQR E R F D E QA K A K S S G DD E A M L I D E V F V Q S L E Y G L P P T G G WG L G V DR L T M L
F E L F V MK K E I C NA Y T E L ND P MR QR Q L F E E QA K A K A A G DD E A M F I D E N F C T A L E Y G L P P T A G WG MG I DR V A M F
F E L F V NY H E L C NA Y T E L ND P F V QK A L F QK QV E DA A K G DD E A MG Y D E G F I K S L E HA L P P T A G WG L G I DR F V M L
F E V F V A T K E I C NA Y T E L ND P F DQR QR F E E QA R QK A QG DD E A Q L V D E V F C NA L E Y G L P P T A G WG C G I DR L A M F
L E M F I C G K E V L NA Y T E L ND P F K QK E C F K L QQK DR E K G DT E A A Q L D S A F C T S L E Y G L P P T G G L G L G I DR I T M F
L E M F I C G K E V L NA Y T E L ND P F K QK E C F S A QQK DR E K G DA E A F Q F DA P Y C T S L E Y G L P P T G G L G L G I DR I T M F
L E M F I C G K E V L NA Y T E L ND P F K QK E C F A S QQK DK E K G DT E A F HC DA A F C T S L E Y G L P P T G G L G L G I DR I T M F
F E L F V NK H E L C NA Y T E L ND P V V QR QR F E A Q L K DR Q S G DD E A MA L D E T F C MA L E Y G L P P T G G WG L G I DR L A M L
F E L F V MK K E I C NA Y T E L ND P V R QR Q L F E E QA K A K A A G DD E A M F I D E N F C T A L E Y G L P P T A G WG MG I DR V T M F
F E V F V A T K E I C NA Y T E L ND P F DQR A R F E E QA R QK DQG DD E A Q L V D E T F C NA L E Y G L P P T G G WG C G I DR L A M F
F E A F V A T K E I C NA Y T E L ND I F DQR A R F E E QA R QK A QG DD E A Q I I D E N F C T A L E Y G L P P T G G WG MG V DR L V M F
F E L F V A R K E I C NA Y T E L ND P MV QR E R F A T QA K DHA A G DD E A Q L I D E N F C T A L E Y G L P P T G G F G L G I DR L A M F
F E L F V MK K E I C N S Y T E L ND S V R QR E L F E QQA K A K A E G DD E A M F I D E T F C T A L E Y G L P P T A G WG MG I DR L C M F
F E L F V NY Y E L C NA F T E L ND P F K QR K I F V QQ I E E K NK G DV E A MG Y DK D F C DC L E HA L P P T G G WG L G I DR L V M L
F E L F I C G K E L I N S Y T E L ND P I T QR E C F K QQQK A K D L G DD E A Q P P D E A F C T A L E Y G L P P T A G WG I G I DR L A M F
F E L F I C G K E L I N S Y T E L ND P I T QR DC F K QQQK A K D L G DD E A Q P P D E A F C T A L E Y G L P P T A G WG I G I DR L T M F
F E V F I NG L E Y A NA Y T E L NC P MV QR E L F L DQ L K A K A A K DD E A M P Y DDT F C T A L E Y A L P P T A G WG C G V DR L V M L
F E L F I NK K E I C N S Y T E L N S P L V QR E E F E R Q L R DR E K G DD E A MD I D E G Y V QA L E Y A L P P T G G WG L G I DR L V MY
F E L F L NK K E L C NA Y T E L NN P I V QR E E F MK Q L R NK E K G DD E A MD I D E G F V QA L E HA L P P T G G WG L G I DR L V M F
F E V F V A T K E I C NA Y T E L ND P WV QR A N F E E Q S R QK DQG DD E A QG I DHV F I DA L E HG L P P T G G WG L G I DR L V M F
F E V F V A T K E I C NA Y T E L ND P F DQR QR F E E QA R QK DA G DD E A Q L V D E T F C T A L E Y G L P P T A G WG C G V DR L T M F
2910
Anopheles_gambiae/1-3220
Arabidopsis_thaliana/1-3286
Ashbya_gossypii/1-3281
A_fumigatus/1-3241
Aspergillus_niger/1-3300
Bombyx_mori/1-2389
Bos_taurus/1-3273
C_briggsae/1-3231
Caenorhabditis_elegans/1-3289
Candida_albicans/1-3266
Candida_glabrata/1-3282
Canis_familiaris/1-3286
Ciona_intestinalis/1-2712
Cryptococcus_neoformans/1-3301
Cryptosporidium_hominis/1-3020
C_parvum/1-3281
Danio_rerio/1-3286
Debaryomyces_hansenii/1-3166
Dictyostelium_discoideum/1-3275
D_melanogaster/1-3287
Drosophila_pseudoobscura/1-3286
Encephalitozoon_cuniculi/1-3199
Entamoeba_histolytica/1-3271
Gallus_gallus/1-3286
Homo_sapiens/1-3286
Kluyveromyces_lactis/1-3281
Leishmania_major/1-3286
Macaca_mulatta/1-3182
Magnaporthe_grisea/1-3286
Methanosarcina_acetivorans/1-3043
Monodelphis_domestica/1-3196
Mus_musculus/1-3286
Neurospora_crassa/1-3264
Oryza_sativa/1-3218
Ostreococcus_lucimarinus/1-3287
Pan_troglodyte/1-3286
Paramecium_tetraurelia/1-3282
Pichia_stipitis/1-3290
P_falciparum/1-3285
P_knowlesi/1-3284
Plasmodium_yoelii/1-3285
Populus_trichocarpa/1-3286
Rattus_norvegicus/1-3286
Saccharomyces_cerevisiae/1-3284
Schizosaccharomyces_pombe/1-3284
Strongylocentrotus_purpuratus/1-3252
Takifugu_rubripes/1-2930
Tetrahymena_thermophila/1-3270
T_annulata/1-3248
Theileria_parva/1-3273
Trichomonas_vaginalis/1-3198
T_brucei/1-3264
Trypanosoma_cruzi/1-3266
Ustilago_maydis/1-3291
Yarrowia_lipolytica/1-3278
L T D S NN I
L T D S L N I
L T D S NT I
L T DNY S I
L T DNY S I
L T D S NN I
L T D S NN I
L T DNNN I
L T DNNN I
L T D S NT I
L T D S NT I
L T D S NN I
L T D S NN I
L T DC S N I
L A DK NN I
L A DK NN I
L T D S NN I
L T D S NT I
M S DT Y N I
L T D S NN I
L T D S NN I
L T DA A N I
L A DV DN I
L T D S S N I
L T D S NN I
L A D S NT I
L T S Q S N I
L T D S NN I
L T NNA T I
L A G L E S I
L T D S NN I
L T D S NN I
L T DNY S I
L T D S QN I
L A DK NN I
L T D S NN I
L T DT QN I
L T D S NT I
L T NK N S I
L T NK NC I
L T NK N S I
L T D S QN I
L T D S NN I
L T D S NT I
L T D S NT I
L T D S NN I
L T D S NN I
L T DN I Y I
L S DK NN I
L A DK NN I
L T NQV S I
L T S QNN I
L T S QA N I
L T D S N S I
L T N S NT I
2920 2930 2940 2950 2960 2970 2980
K E V L L F P A MK P R QA A E A HR QT R QY MQ -K W I K P G MT M I E I C E E L E NT A R G L A F P T G C S R NHC A A HY T P NA G D - P T V L
K E V L F F P A MR P R R A A E V HR QV R K Y V R - S I V K P G M L MT D I C E T L E NT V R G I A F P T G C S L NWV A A HWT P N S G D -K T V L
R E V L L F P T L K P R K G A E I HR R V R R H L Q -NR L R P G QT L T E V V E L V E NA T R G I G F P T G V S L NHC A A H F T P NA G D -T T V L
K E V L A F P F MK E R QA A E V HR QV R QY A Q -K T I K P G QT L T E I A E G I E DA V R G MG F P C G L S I NHC A A HY T P NA G N -K MV L
K E V L A F P F MK E R QA A E V HR QV R QY A Q -K T I K P G QT L T E I A E G I E E S V R G MG F P C G L S I NHC A A HY T P NA G N -K MV L
K E V L L F P A MK P R HA A E A HR QT R K H I R -NW I K P G MT M I D I C E E L E K T A R G L A F P T G C S R NHC A A HY T P NT G D -T T V L
K E V L L F P A MK P R E A A E A HR QV R K Y V M - S W I K P G MT M I E I C E K M E DC S R G L A F P T G C S L NNC A A HY T P NA G D -T T V L
K E V L L F P A MR P R R S A E A HR QV R QY V K - S W I K P G M S M I E I C E R L E T T S R G L A F P T G C S L NHC A A HY T P NA G D -T T V L
K E V L L F P A MR P R R S A E A HR QV R K Y V K - S W I K P G MT M I E I C E R L E T T S R G L A F P T G C S L NHC A A HY T P NA G D -T T V L
R E V L L F P T L K P R K G A E I HR R V R HK A Q - S S I R P G MT M I E I A N L I E D S V R G I G F P T G L S L NHV A A HY T P NT G D -K L I L
R E V L L F P T L K P R K G A E I HR R V R K NV Q -NK L K P G M L L T E V A D I I E NA T R G I G F P T G L S L NHC A A HY T P NT G D -K T V L
K E V L L F P A MK P R E A A E A HR QV R K Y V M - S W I K P G MT M I E I C E K L E DC S R G L A F P T G C S L NNC A A HY T P NA G D -T T V L
K E V L L F P A MK P R QA A E T HR QV R HHV Q - E F I K P G L S M I E I C E R L E QA S R G L A F P T G C S L NNC A A HY T P NA G D -K T V L
K E V L L F P A MR P R R A G E V HR QV R A Y A Q -K A I K P G MT MT E I A N L I E DG T R G I G F P T G L S V N E V A A HY T P N P G D -K QV L
K E V I L F P A MR NR R A A E V HR QV R K Y MQ - S I I R P E MK L I DMC N I L E S K V K G WG F P T G C S L NHC A A HY T P N P HD - F T K L
K E V I L F P A MR NR R A A E V HR QV R K Y MQ - S I I R P E MK L I DMC N I L E S K V K G WG F P T G C S L NHC A A HY T P N P HD - F T K L
K E V L L F P A MK P R QA A E A HR QV R K Y V Q - S W I K P G MT M I E I C E K L E DC S R G L A F P T G C S L NHC A A HY T P NA G D - P T V L
R E V L L F P T L K P R K G A E I HR R V R HK A Q - S S I R P G MNMT E I A D L I E N S V R G I G F P T G L S L NHV A A HY T P NA G D -K T V L
K E V I L F P A MK P R R A A E V HR QV R K Y V Q -G I V K P G L G L T E L V E S L E NA S R G I A F P T G V S L NH I A A H F T P NT G D -K T V L
K E V L L F P A MK P R QA A E A HR QT R QY MQ -R Y I K P G MT M I Q I C E E L E NT A R G L A F P T G C S L NHC A A HY T P NA G D - P T V L
K E V L L F P A MK P R QA A E A HR QT R QY MQ -R F I K P G MT M I Q I C E E L E NT A R G L A F P T G C S L NHC A A HY T P NA G D - P T V L
R DV I F F P T MK P R R A A E A HR R A R Y R V Q - S I V R P G I T L L E I V R S I E D S T R G I G F P A G M S MN S C A A HY T V N P G E QD I V L
K E V I L F P T MR P R K A A A I HK S V R QWA Q -QW I K P G M S D L F V A E N I E R K V R G MA F P C G L S V N S C A A H F T P N P ND P L S F Y
K E V L L F P A MK P R E A A E A HR QV R K Y V M - S W I K P G MT M I E I C E K L E DC S R G L A F P T G C S L NNC A A HY T P NA G D - P T V L
K E V L L F P A MK P R E A A E A HR QV R K Y V M - S W I K P G MT M I E I C E K L E DC S R G L A F P T G C S L NNC A A HY T P NA G D -T T V L
R E V L L F P T L K P R K G A E I HR R V R E S V R -NK I K P G MT L T E I A N L V E DG T R G I A F P T G L S L NHC A A H F T P NA G D -K T V L
K E V L L F P A MK P R E A A E V HR QV R T WA Q - S W I K P G L S L M L MT DR I E K K L NG QA F P T G C S L NHV A A HY T P NT G D E K V V L
K E V L L F P A MK P R E A A E A HR QV R K Y V M - S W I K P G MT M I E I C E K L E DC S R G L A F P T G C S L NNC A A HY T P NA G D -T T V L
R E V L A F P F MR DR HG A E A HR QA R R WA H -K HV K P G M S L T D I A NG I E D S V R G MG F P T G L S I NHC A A HY T P NA G N -K MV L
K E V I L F P QMK R R E A G R I L K I V R T E A A -DM I R V G N S L L E V A E F V E K K T I - -A F P C N I S R NQ E A A HA T P K A G D -QDV F
K E V L L F P A MK P R E A A E A HR QV R K Y V M - S W I K P G MT M I E I C E K L E DC S R G L A F P T G C S L NNC A A HY T P NA G D - P T V L
K E V L L F P A MK P R E A A E A HR QV R K Y V M - S W I K P G MT M I E I C E K L E DC S R G L A F P T G C S L NNC A A HY T P NA G D -T T V L
R E V L A F P F L R E R HA A E V HR QV R QWA Q -K S I K P G QT L T E I A E N I E D S V R G MG F P T G L S I NHC A A HY T P NA G N -K MV L
K E V L L F P A MK P R R A A E V HR QV R K HMR - S I L K P G M L M I D L C E T L E NMV R G I A F P T G C S L NWV A A HWT P N S G D -K T V L
K E V L L F P A MK P R QC A E V HR E V R QY I S -DWV K P G MK Y I DV C E T L E N S V R G V A F P T G C S K NHV A A HWT P NG G C - E S V I
K E V L L F P A MK P R E A A E A HR QV R K Y V M - S W I K P G MT M I E I C E K L E DC S R G L A F P T G C S L NNC A A HY T P NA G D -T T V L
Q E V L L F P A MK P R K A A E C HR QV R QY A QA K L L K P G NK L I D I C E K L E DMNR G I A F P T G C S L N F C A A HY T P NNG D -NT I L
R E V L L F P T L K P R K G A E I HR R V R HK A Q - S S I R A G M S MT E I A D L I E N S V R G I G F P T G L S L NHV A A HY T P NT G D -K L S L
K DV I L F P T MR P R K A A E C HR QV R K HMQ -A F I K P G K K M I D I A Q E T E R K T K G WG F P T G C S L NHC A A HY T P NY G D - E T V L
K DV I L F P T MR P R K A A E C HR QV R K Y I Q -A Y V Q P G R K M I D I V K E T E K K T K G WG F P T G C S L NHC A A HY T P NY G D - E T V L
K DV I L F P T MK P R K A A E C HR QV R K Y I Q - S Y I K P G R K M I D I V QK T E QK T K G WG F P T G C S L NNC A A HY T P NY G D - E T V L
K E V L L F P A MK P R QA A E V HR QV R K Y MK - S I L K P G M L MMD L C E T L E NT V R G I A F P T G C S L NWV A A HWT P N S G D -K T V L
K E V L L F P A MK P R E A A E A HR QV R K Y V M - S W I K P G MT M I E I C E K L E DC S R G L A F P T G C S L NNC A A HY T P NA G D -T T V L
R E V L L F P T L K P R K G A E I HR R V R R A I K -DR I V P G MK L MD I A DM I E NT T R G I G F P T G L S L NHC A A H F T P NA G D -K T V L
R E V L L F P HMK P R R A A E V HR QA R QY A Q - S V I K P G M S MMDV V NT I E NT T R G I G F P T G V S L NHC A A HY T P NA G D -T T I L
K E V L F F P A MK P R QA A E A HR QV R K HV Q -G F I K P G MT M I D I C E R L E T A S R G L A F P T G C S R NHC A A HY T P NA G D -T T V L
K E V L L F P A MK P R QA A E A HR QV R A Y V R - S W I K P G MT M I D I C E K L E DC S R G L A F P T G C S I NHC A A HY T P NA G D - P T V L
Q E V L L F P A MK P R K A A E C HR QV R K Y C Q -Q L I R P G K K L I D I C E S I E E MNR G I A F P T G C S L NHV A A HY T P NNG D - F T T I
K V F I G V I I I V -R R A A E V HR QV R R Y I Q - S V I R P G V S C L D I V QA V E S K T K G WG F P T G C S L N S C A A HY T P NY G D -K T V F
K E V I F F P T MR P R K A A E V HR QA R R Y I Q - S V I K P G L S C L D I V QA L E F K T K G WG F P T G C S L N S C A A HY T P NHG D -K T I F
R E V L L F P L MK P R E G A E I HR R V R R WA M E NV I K P G V K L Y DMC A Q I E E A V R G L A F P C G C S I NNC A A HY T P MY NT DQR V L
K E V L L F P A MK P R C A A E V HR QV R R Y A Q - S F I K P G I S L L S MT DR I E K K L E G QA F P T G C S L NHV A A HY T P NT G D -K C V L
K E V L L F P A MK P R HA A E V HR QV R R Y A Q - S F I K P G I S L I S MT DR I E R K V E G QA F P T G C S L NHV A A HY T P NT G D -K T V L
K E V L A F P A NK P R R A A E V HR QV R QY A Q - S A I K P G MT MT E I A E L V E DG T R G I G F P T G V S V N E C A A HY T P NA G D -K R V L
K E V L L F P A MK P R K G A E I HR V V R K Y A R -DN I K A G MT MT S I A E M I E D S V R G QG F P T G V S L NHC A A HY T P NA G D -K I V L
2990
3000 3010 3020 3030 3040 3050 3060
L Y DDV T K I D F G T H I K G R I I DC A F T L S F N P - -K Y DK L L E A V K E A T E T G I R E A G I DV R L C D I G A A I Q E V M E S Y E V E L DG K T Y QV K
QY DDV MK L D F G T H I DG H I I DC A F T V A F N P - -M F D P L L A A S R E A T Y T G I K E A G I DV R L C D I G A A I Q E V M E S Y E V E I NG K V F QV K
R H E DV MK V D F G V QV NG H I I D S A WT V T F D P - -R Y D P L L E A V R E A T Y T G I R E A G I DV R L T D I G E A I Q E V M E S Y E V T L G G QT Y QV R
QQG DV MK V D F G A H I NG R I V D S A F T MT F D P - -V Y D P L L E A V K DA T NT G I R E A G I DV R M S D I G A A I Q E A M E S Y E V E L NG T MY P V K
QQG DV MK V D F G A H I NG R I V D S A F T V A F D P - -V Y D P L L A A V K DA T NT G I R E A G I DV R M S D I G A A I Q E A M E S Y E V E I NG T MY P V K
E Y DDV V K I D F G T H I NG R I I DC A F T L H F N P - -R Y D P L V K G V Q E A T E A G I K A S G V DV R L C DV G A A V Q E V M E S H E V E L DG QMY - -
QY DD I C K I D F G T H I S G R I I DC A F T V T F N P - -K Y DT L L K A V K DA T NT G I K C A G I DV R L C DV G E A I Q E V M E S Y E V E I DG K T Y QV K
QY G DV C K I DY G I HV R G R L I D S A F T V H F D P - -K F D P L V E A V K E A T NA G I R E S G I DV R L C DV G E V V E E V MT S H E V E L E G K T Y V V K
QY G DV C K I DY G I HV R G R L I D S A F T V H F D P - -K F D P L V E A V R E A T NA G I K E S G I DV R L C DV G E I V E E V MT S H E V E L DG K S Y V V K
K K DD I MK V D I G V HV NG R I C D S A F T MT F N E DG K Y DT I MQA V K E A T Y T G I K E S G I DV R L ND I G A A I Q E V M E S Y E M E E NG K T Y P I K
K Y E DV MK V D I G V QV NG H I V D S A WT V S F D P - -QY DN L L A A V K DA T Y T G I K E A G I DV R L T D I G E A I Q E V M E S Y E V E I K G K T Y QV K
QY DD I C K I D F G T H I S G R I I DC A F T V T F N P - -K Y DT L L K A V K DA T NT G I K C A G I DV R L C DV G E A I Q E V M E S Y E V E I DG K T Y QV K
S Y DDV C K I D F G T H I NG R I I DC A F T V S F N P - -K Y DR L L E A V K DA T NT G I K NA G I DV R L C DV G A A I Q E T M E S Y E V E I DG K T Y QV R
QQHDV MK V D F G V HV NG R I V D S A F T M S F E P - -T WDK L L E A V K DA T NT G I R E A G I DV R MC D I G E A I Q E V M E S Y E V E V NG K V Y P V K
T QDD I C K L D F G V QV NG M I I DC A F T V A F ND - -V F D P L I Q S T L DA T NT G L K V A G I DV M F S E I G S A I E E V I K S Y E F E Y K S K V Y N I K
T QDD I C K L D F G V QV NG M I I DC A F T V A F ND - -V F D P L I Q S T L DA T NT G L K V A G I DV M F S E I G S A I E E V I K S Y E F E Y K S K V Y N I K
QY DDV C K I D F G T H I NG R I I DC A F T V T F N P - -K Y DK L L E A V K DA T NT G I K C A G I DV R L C D I G E S I Q E V M E S Y E V D L DG K T Y QV K
NY E DV MK V D I G V HV NG H I V D S A F T L T F DD - -K Y D S L L K A V K E A T NT G V K E A G I DV R L ND I G E A I Q E V M E S Y E M E L NG K T Y P I K
K K DDV L K I D F G T HV NG Y I I DC A F T V T F D E - -K Y DK L K DA V R E A T NT G I Y HA G I DA R L G E I G A A I Q E V M E S H E I E L NG K T Y P I R
QY DDV C K I D F G T H I K G R I I DC A F T L T F NN - -K Y DK L L QA V K E A T NT G I R E A G I DV R L C D I G A A I Q E V M E S Y E I E L DG K T Y P I K
QY DDV C K I D F G T H I K G R I I DC A F T L T F NN - -K Y DK L L QA V K E A T NT G I K E A G I DV R L C D I G A A I Q E V M E S Y E V E L DG K T Y P I K
K E DDV L K I D F G T H S DG R I MD S A F T V A F K E - -N L E P L L V A A R E G T E T G I K S L G V DV R V C D I G R D I N E V I S S Y E V E I G G R MW P I R
K T DDV V K I D F G V HV NG H L I D S A F T MT WD P - -A L Q P I L DC S K DA T NT G I K N I G V DV R L C D I G DA I E E V M S S Y E V E I K G K T Y Q L Q
HY DD I C K I D F G T Y Y S G R I I DC A F T V T F N P - -K Y DR L L E A V K DA T NT G I K C A G I DV R L C DV G E A I Q E V M E S Y E V E I DG K T Y QV K
QY DD I C K I D F G T H I S G R I I DC A F T V T F N P - -K Y DT L L K A V K DA T NT G I K C A G I DV R L C DV G E A I Q E V M E S Y E V E I DG K T Y QV K
K F E DV MK V D F G V HV NG Y I I D S A F T I A F D P - -QY DN L L A A V K DA T NT G I K E A G I DV R L T D I G E A I Q E V M E S Y E V E I NG E T HQV K
T Y DDV MK V D F G T H I NG R I I DC A WT V A F N P - -M F D P L L QA V K E A T Y E G I K QA G I DV R L G D I G A A I E E V M E S H E V E I NG K V HQV K
QY DD I C K I D F G T H I S G R I I DC A F T V T F N P - -K Y DT L L K A V K DA T NT G I K C A G I DV R L C DV G E A I Q E V M E S Y E V E I DG K T Y QV K
E HDDV L K V D I G V HV NG R I V D S A F T V A F N P - -R Y DN L L A A V K DA T NT G I R E A G I DA R L G E I G E A I Q E T M E S Y E V E I DG E T Y P V K
G -NDMV K L D L G V HV DG Y I A D S A V T V D L S G - -N S D - I V K A S E E A L A A A I D L MK P G V S T G E I G A A I E E R I H S - - - - - - - - -Y G L K
QY DD I C K I D F G T H I S G R I I DC A F T V T F N P - -K Y DT L L K A V K DA T NT G I K C A G I DV R L C DV G E A I Q E V M E S Y E V E I DG K T Y QV K
QY DD I C K I D F G T H I S G R I I DC A F T V T F N P - -K Y D I L L T A V K DA T NT G I K C A G I DV R L C DV G E A I Q E V M E S Y E V E I DG K T Y QV K
Q E DDV MK V D F G V HV NG R I V D S A F T V A F N P - -R Y D P L L E A V K A A T NA G I K E A G I DV R V G D I G A A I Q E V M E S Y E V E I NG QM L P V K
QY DDV MK L D F G T H I DG Y I V DC A F T V A F N P - -M F D S L L QA S K DA T NT G V K E A G I DA R L C DV G A A I Q E V M E S Y E V E I NG K V F Q I K
DK DDV I K F D F G V QV K G R I I DC A F T K T F ND - -MY D P L L K A V N E A T E T G I R S A G I DV R L C D I G E A V Q E V M E S HT V E I HG K E Y QV K
QY DD I C K I D F G T H I S G R I I DC A F T V T F N P - -K Y DT L L K A V K DA T NT G I K C A G I DV R L C DV G E A I Q E V M E S Y E V E I DG K T Y QV K
T Y DDV C K I D F G T QV DG W I I DC A F T V A F N P - -V Y DT L L QA A K DA T DT G I R N S G I DV R L G DV G A A I Q E T M E S Y E V E I G G K V Y K V K
G K DD L MK V D I G V HV NG H I C D S A F T MT L NDT G K Y D S I MK A V K DA T NT G V K E A G I DV R L ND I G E A I Q E V M E S Y E M E L DG K T Y P V K
K Y DDV C K L D F G V HV NG Y I I DC A F T I A F N E - -K Y DN L I K A T QDG T NT G I K E A G I DA R MC D I G E A I Q E A I E S Y E I E L NQK I Y P I K
K Y DDV C K L D F G V HV NG Y I I DC A F T I A F N E - -K Y DN L I K A T QDG T NT G I R E A G I DA R MC D I G E A I Q E A I E S Y E I E L NK K I Y P I K
K E DDV C K L D F G V HV NG Y I I DC A F T I A F ND - -K Y DN L I K A T QDG T NT G I K E A G I DA R MC D I G E A I Q E A I E S Y E I E L NQK V Y P I K
QY DDV MK L D F G T H I DG H I V DC A F T V A F N P - -M F D P L L E A S R E A T NT G I K E S G I DV R L C DV G A A I Q E V M E S Y E V E I NG K V F QV K
QY DD I C K I D F G T H I S G R I I DC A F T V T F N P - -K Y D I L L K A V K DA T NT G I K C A G I D I R L C DV G E A I Q E V M E S Y E V E I DG K T Y QV K
K Y E DV MK V DY G V QV NG N I I D S A F T V S F D P - -QY DN L L A A V K DA T Y T G I K E A G I DV R L T D I G E A I Q E V M E S Y E V E I NG E T Y QV K
K E K DV MK V D I G V HV NG R I V D S A F T M S F D P - -QY DN L L A A V K A A T NK G I E E A G I DA R L N E I G E A I Q E V M E S Y E V E I NG K T HQV K
E Y DDV C K I D F G T H I NG R I I DC A F T V T F N P - -K Y DQ L L A A V K DA T NT G I K E A G I DV R L C DV G E R I Q E V M E S Y E V E L DG K T Y QV K
R Y DDV C K I D F G T H I NG R I I DC A F T V T F N P - -K F DG L L E A V R DA T NT G I K F A G I DV R L C DV G E T I Q E V M E S Y E V E I DG K T Y QV K
E Y DDV C K I D F G T QV E G R I I DC A F T V A F N P - -K Y DK L L E A V K E A T NT G I K E A G I DV R I P DV G A A I Q E V M E S Y E V E I E G K T Y P V K
E K DD I MK L D F G T HV NG Y I I D S A F T I A F D E - -K Y D P L I E S T K E A T NT G L K L A G I DA R T S E L G E A I E E V I E S F E I T L K NR T HK I K
HK NDV MK L D F G T HV NG Y I I D S A F T I A F D E - -K Y D P L I E S T K E A T NT G V K L A G I DA R T S E L G E A I Q E V I E S Y E I T L K NK T HK I K
G K S DV MK I D F G V A I NG N I I D S A F T V C F D P - -K F E P L L E A A K T A T NT G V K I A G I DA R MN E I G DA I Q E V F DA S S I D I DG K HY D I K
MY DDV MK V D F G T Q I NG R I I DC A WT V A F K D - - E Y E P L L T A V K E A T Y E G V K QA G I DV R L C DV G A A I Q E V M E S Y E V E L NG K V Y P V K
T Y DDV MK V D F G T Q I NG R I V DC A WT V A F ND - - E Y A P L L E A V K S A T Y E G I K QA G I DV R L C D I G E A I Q E V M E S Y E V E I K G K V Y P V K
QA T DV L K V D F G V HV K G R I V D S A F T L N F E P - -T WD P L L A A V K A A T NA G I K E A G I DA R L G E I G A S I Q E V M E S H E F E A NG K T HR V K
K E DDV L K V D F G V HV NG K I I D S A F T HV QND - -K WQG L L DA V K A A T E T G I R E A G I DV R L G D I G E A I Q E T M E S H E V E V DG K V Y QV K
3080
3090
A I R N L NG H S I S P Y R I HA - - - - - -G K T V
S I R N L NG H S I G P Y Q I HA - - - - - -G K S V
P C R N L C G HN I V P Y Q I HG - - - - - -G K S V
C I R N L NG HN I DR H I I HG - - - - - -G K S V
C I R N L NG HN I DQH I I HG - - - - - -G K S V
- - - - - - - - - - - - L E I HA - - - - - -G K T V
P I R N L NG H S I G P Y R I HA - - - - - -G K T V
P I R N L NG H S I A QY R I HA - - - - - -G K T V
P I R N L NG H S I A QY R I HA - - - - - -G K T V
C I K N L NG HN I DD F V I H S - - - - - -G K S V
P C R N L C G H S I G P Y T I HA - - - - - -G K S V
P I R N L NG H S I G P Y R I HA - - - - - -G K T V
P I R N L S G H S I G QY R I HA - - - - - -G K T V
S I S N L NG H S I T P Y T I HG G I G T R P G K S V
P I K N L NG H S I L P Y H I HG - - - - - -G K S V
P I K N L NG H S I L P Y H I HG - - - - - -G K S V
P I R N L NG H S I G QY R I HA - - - - - -G K T V
C I R N L NG HN I G DY L I H S - - - - - -G K T V
S I R N L NG H S I R P Y V I HG - - - - - -G K T V
A I R N L NG H S I S P Y R I HA - - - - - -G K T V
A I R N L NG H S I S P Y R I HA - - - - - -G K T V
P I S D L HG H S I S Q F R I HG - - - - - -G I S I
P V R N L S G HMV G S Y A V HA - - - - - -G K S I
P I R N L NG H S I G P Y R I HA - - - - - -G K T V
P I R N L NG H S I G QY R I HA - - - - - -G K T V
P C R N L C G HN I N P Y S I HG - - - - - -G K S V
S I R N L S G HN I A P Y I I H S - - - - - -G K S V
P I R N L NG H S I G QY R I HA - - - - - -G K T V
P I R N L NG HT I DR Y T I HG - - - - - -G K S V
P I T N L T G HG L S HY E A HD - - - - - -N P P V
P I R N L NG H S I G P Y R I HA - - - - - -G K T V
P I R N L NG H S I G P Y R I HA - - - - - -G K T V
S I R N L NG HT I NHY S I HG - - - - - -T K S V
S V R N L NG H S I G P Y Q I HA - - - - - -G K S V
C C S N L NG H S I D P Y R I HA - - - - - -G K S V
P I R N L NG H S I G QY R I HA - - - - - -G K T V
S V K N L NG H L I C K Y H I HG - - - - - -G K S V
C I K N L NG HN I G DY I I H S - - - - - -G K T V
A I S N L R G H S I NK Y I I HG - - - - - -G K C V
A I S N L R G H S I NK Y I I HG - - - - - -G K C V
P I S N L R G H S I C K Y V I HG - - - - - -G K C V
S I R N L NG H S I G P Y Q I HA - - - - - -G K S V
P I R N L NG H S I G P Y R I HA - - - - - -G K T V
P C R N L C G H S I A P Y R I HG - - - - - -G K S V
S I R N L C G HN L D P Y I I HG - - - - - -G K S V
P I R N L NG H S I G P Y R I HA - - - - - -G K T V
P I R N L NG H S I G QY R I H S - - - - - -G K T V
A I R N L NG H S I E A Y Q I HA - - - - - -G K S V
P I R N L T G HN I G QY I I HA - - - - - -G K A V
P I R N L T G HN I G QY V I HA - - - - - -G K A V
P I S N L S G H S L G P Y T V HA - - - - - -G K S I
S I R N L S G HT I A P Y V I HG - - - - - -G K S V
S I R N L C G HN I G P Y V I H S - - - - - -G K S V
C V E N L NG H S I E R Y S I HG - - - - - -G K S V
S I R N L NG HN I A P Y E I HG - - - - - -G K S V
3100 3110 3120
P I V K - - - -G G E -T T R -M E E N E F Y A I
P I V K - - - -G G E -QT K -M E E G E F Y A I
P I V K - - - -NG D - E T K -M E E G E H F A I
P I V K - - - -G S D -QT K -M E E G E T F A I
P I V K - - - -G G D -QT K -M E E G E V F A I
P I V K - - - -G G E -T T R -M E E N E I Y A I
P I V K - - - -G G E -A T R -M E E G E V Y A I
P I V K - - - -G G E -QT K -M E E N E I Y A I
P I V K - - - -G G E -QT K -M E E N E I Y A I
P I I A - - - -NG D -MT K -M E E G E T F A I
P I V K - - - -NG D -QT K -M E E G E H F A I
P I V K - - - -G G E -A T R -M E E G E V Y A I
P I V K - - - -G G D -QT R -M E E G E V F A I
P I V K QHG S DK D - E T K -M E E G E Y F A I
P I I A - - - -T ND -DT R -M E E N E I Y A I
P I I A - - - -T ND -DT R -M E E N E I Y A I
P I V K - - - -G G E -A T R -M E E G E V Y A I
P I V P - - - -NG D -MT K -M E E G E T F A I
P I V R - - - -G G E -MT K -M E E G E F Y A I
P I V K - - - -G G E - S T R -M E E D E F Y A I
P I V K - - - -G G E - S T R -M E E D E F Y A I
P A V N - - - -NR D -T T R - I K G D S F Y A V
P I C K - - - -G G P -QT K -M E E G E V Y A L
P I V K - - - -G G E -A T R -M E E G E V Y A I
P I V K - - - -G G E -A T R -M E E G E V Y A I
P I V K - - - -NG D -NT K -M E E N E H F A I
P I V K - - - -G G E -QA K -M E E G E V F A I
P I V K - - - -G G E -A T R -M E E G E V Y A I
P I V K - - - - S A D -QT K -M E E G E I Y A I
P NK H - - - -V E G -G V I - L K E G DV L A I
P I V K - - - -G G E -A T R -M E E G E V Y A I
P I V K - - - -G G E -A T R -M E E G E V Y A I
P I V K - - - - S ND -QT K -M E E G DV F A I
P I V K - - - -G G E -QT K -M E E G E F Y A I
P I V K - - - -G G V -QT K -M E E G E Y Y A I
P I V K - - - -G G E -A T R -M E E G E V Y A I
P I V K - - - - S ND -NT L -MK E G E L Y A I
P I V A - - - -NG D -MT K -M E E G E T F A I
P I V R - - - -QK E -K N E I M E E G E L F A I
P I V K - - - -QK E - E N E I M E E G E L F A I
P I V K - - - -QQ E -K H E I M E E G D L F A I
P I V K - - - -G G E -QT K -M E E G E F F A I
P I V K - - - -G G E -A T R -M E E G E V Y A I
P I V K - - - -NG D -T T K -M E E G E H F A I
P I V K - - - -G G E - E I K -M E E G E I F A I
P I V K - - - -G G E -A T R -M E E N E F Y A I
P I V K - - - -G G E -A T R -M E E G DV Y A I
P C I R - - - -T G P -NV K -M E E G E QY A I
P I V G - - - -K S G -NR D I M E E G DV F A I
P I V G - - - -NT N -NR D I M E E G E V F A I
P I T K - - - -G G N -A E K -M E E G E L F A C
P I V R - - - -G G E -A T R -M E E G E L F A I
P I V R - - - -G G E -A I K -M E E G E L F A I
P I V N - - - -M P D L QV K -M E E G E Y Y A I
P I V K - - - - S A D -MT K -M E E G E T F A I
3130
E T F G S -T G R G L V
E T F G S -T G K G Y V
E T F G T -T G R G Y V
E T F G S -T G K G Y V
E T F G S -T G K G Y V
E T F G S -T G R G QV
E T F G S -T G K G V V
E T F G S -T G K G Y V
E T F G S -T G K G Y V
E T F G S -T G NG Y V
E T F G T -T G R G Y V
E T F G S -T G K G V V
E T F G S -T G K G Y V
E T F G S -T G R G R V
E T F A T -T G R G Y V
E T F A T -T G R G Y V
E T F G S -T G K G MV
E T F G S -T G NG Y V
E T F G S -T G R A QV
E T F G S -T G R G L V
E T F G S -T G R G L V
E T F A T -T G K G S I
E T F A T -T G R G R V
E T F G S -T G K G V V
E T F G S -T G K G V V
E T F G S -T G R G Y V
E T F G S -T G R G F V
E T F G S -T G K G V V
E T F G S -T G L G Y V
E P F A T -NG T G L V
E T F G S -T G K G V V
E T F G S -T G K G V V
E T F G S -T G NG Y V
E T F G S -T G K G F V
E T F G T -T G R G Y V
E T F G S -T G K G V V
E T F G S -T G K G Y V
E T F G S -T G R G Y V
E T F A S -T G K G Y V
E T F A S -T G K G Y V
E T F A S -T G K G F V
E T F A S -T G K G Y V
E T F G S -T G K G V V
E T F G S -T G R G Y V
E T F G S -T G R G V V
E T F G S -T G K G F V
E T F G S -T G R G A V
E T F G V I NG K A S I
E T F A T -T G S G T V
E T F A T -T G S G MV
E T F G S -T G K G I V
E T F G S -T G R G F V
E T F G S -T G R G V V
E T F G S -T G R G Y V
E T F G S -T G R G Y V
3140
3150
S HY MK D F DA P - - -K V P L R L
S HY MK N F DA G - - -HV P L R L
S HY A K N P G A L - - - - P A P T L
S HY A L I P DA P - - - S V P L R L
S HY A L I P DH S - - -QV P L R L
S HY MK N F DQQ - - - F V P L R L
S HY MK N F DV G - - -HV P I R L
S HY MK N F E L A D - E K I P L R L
S HY MK N F E L A D - E K I P L R L
S HY A MNK G V E - - -H L K P P S
S HY A R L P S DG - - - L P Q P N L
S HY MK N F DV G - - -HV P I R L
S HY MK N F D L A N -QHV P L R L
S HY A L N S A A P - - E K Y QG HH
S HY MK Y Y DN P F L N E N S T R L
S HY MK Y Y DN P F L N E N S T R L
S HY MK N F E V G - - -HV P I R L
S HY A K N P G T D - - -D I V V P G
S HY MK T DY - - - - -QT T V R L
S HY MK N F D L P - - - F V P L R L
S HY MK N F D L P - - -Y V P L R L
S H F V L NT Y K S - - - - - - -R K
S HY MV DA NA F - - -DY P V R D
S HY MK N F DV G - - -HV P I R L
S HY MK N F DV G - - -HV P I R L
S HY A K K P G S H - - - - P T P S L
S HY MMQ P G A E - - -V MQ L R S
S HY MK N F DV G - - -HV P I R L
S HY A K R A DA P - - -NV A L R L
E I Y S L I K - - - - - -K K P V R L
S HY MK N F DV G - - -HV P I R L
S HY MK N F DV G - - -HV P I R L
S HY A K R G DA A - - -K V D L R L
S HY MK N F DV G - - -HV P L R V
S HY MK N F DV G - - -HV P L R L
S HY MK N F DV G - - -HV P I R L
S HY MK D F Y A K - - - P T A V R V
S HY S R NQN I D - - -G I R V P S
S HY MR N P E K Q - - - F V P I R L
S HY MR N P DK Q - - - F V P I R L
S HY MR NR DV Q - - -Y A P I R L
S HY MK N F DV G - - -H I P L R L
S HY MK N F DV E - - -HV P I R L
S HY A R S A E DH - - -QV M P T L
S HY A K I P DA G - - -H I P L R L
S HY MK N F E V G - - -HV P L R M
S HY MK N F NV G - - -HV P I R L
S HY MK D F NK E - - -MV P L R Q
S HY MK N P N S I - - -Y A P I R L
S HY MK N P N S I - - -Y A P I R L
S H F MV A R N P P - - - - -T P R T
S HY MMV P G G E - - -K T QV R S
S HY MMV P G G D - - -K T Q L R S
S HY A R K K N L P - -K S I P I R V
S HY A K NV G V G - - -HV P L R V
3160
Anopheles_gambiae/1-3220
Arabidopsis_thaliana/1-3286
Ashbya_gossypii/1-3281
A_fumigatus/1-3241
Aspergillus_niger/1-3300
Bombyx_mori/1-2389
Bos_taurus/1-3273
C_briggsae/1-3231
Caenorhabditis_elegans/1-3289
Candida_albicans/1-3266
Candida_glabrata/1-3282
Canis_familiaris/1-3286
Ciona_intestinalis/1-2712
Cryptococcus_neoformans/1-3301
Cryptosporidium_hominis/1-3020
C_parvum/1-3281
Danio_rerio/1-3286
Debaryomyces_hansenii/1-3166
Dictyostelium_discoideum/1-3275
D_melanogaster/1-3287
Drosophila_pseudoobscura/1-3286
Encephalitozoon_cuniculi/1-3199
Entamoeba_histolytica/1-3271
Gallus_gallus/1-3286
Homo_sapiens/1-3286
Kluyveromyces_lactis/1-3281
Leishmania_major/1-3286
Macaca_mulatta/1-3182
Magnaporthe_grisea/1-3286
Methanosarcina_acetivorans/1-3043
Monodelphis_domestica/1-3196
Mus_musculus/1-3286
Neurospora_crassa/1-3264
Oryza_sativa/1-3218
Ostreococcus_lucimarinus/1-3287
Pan_troglodyte/1-3286
Paramecium_tetraurelia/1-3282
Pichia_stipitis/1-3290
P_falciparum/1-3285
P_knowlesi/1-3284
Plasmodium_yoelii/1-3285
Populus_trichocarpa/1-3286
Rattus_norvegicus/1-3286
Saccharomyces_cerevisiae/1-3284
Schizosaccharomyces_pombe/1-3284
Strongylocentrotus_purpuratus/1-3252
Takifugu_rubripes/1-2930
Tetrahymena_thermophila/1-3270
T_annulata/1-3248
Theileria_parva/1-3273
Trichomonas_vaginalis/1-3198
T_brucei/1-3264
Trypanosoma_cruzi/1-3266
Ustilago_maydis/1-3291
Yarrowia_lipolytica/1-3278
3170
3180
3190
3200
3210
3220
3230
Q S S K S L L G L - - - I NR N F G T L A F C K R W L DR A G A T K - -Y QMA L K D L C DK G I V E A Y P P L C DV K -G S Y T A QY E HT I M L R P T C K - E V V
P R A K Q L L A T - - - I NK N F S T L A F C R R Y L DR I G E T K - -Y L MA L K N L C D S G I V Q P Y P P L C DV K -G S Y V S Q F E HT I L L R P T C K - E V L
S R A K A L L R T - - - I DA N F G T L P WC R R Y L DR L G E DK - -Y M F A L NH L V K QG I V QDY P P L V DV E -G S Y T A Q F E HT I L L H P HK K - E V V
S S A K N L L NV - - - I NK N F G T L P F C R R Y L DR L G Q E K - -Y L L G L NN L V S S G I V QDY P P L C DV K -G S Y T A Q F E HT I L L R P T V K - E V I
S S A K N L L NV - - - I NK N F G T L P F C R R Y L DR L G QDK - -Y L L G L NN L V S S G I V QDY P P L C D I K -G S Y T A QY E HT I V L R P NV K - E V I
Q S S K Q L L NV - - - I NK N F G T L A F C K R W L E R A G A S R - -Y A MA L K D L C DK G V V DA Y P P L C D I K -G C Y T A Q F E HT I L L R P T C K - E V V
P R T K H L L NV - - - I N E N F G T L A F C R R W L DR L G E S K - -Y L MA L K N L C D L G I V D P Y P P L C D I K -G S Y T A Q F E HT I L L R P T C K - E V V
QK S K G L L S L - - - I DK N F S T L A F C R R W - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
QK S K G L L N L - - - I DK N F A T L A F C R R W I DR L G E T K - -Y L MA L K D L C DK G I V D P Y P P L C DV K -G C Y T A QW E HT I L MR P T V K - E V V
E R S K Q L L E T - - - I K QN F G T L P WC R R Y L E R T G E E K - -Y L F A L NQ L V R HG I V E E Y P P I V DK R -G S Y T A Q F E HT I L L H P HK K - E V V
A S A K Q L L K V - - - I DDH F G T L P WC R R Y L DR L G QDK - -Y L F A L N S L V K QG HV QDY P P L NDV I -G S Y T A QY E HT I L L H P HK K - E V V
P R T K H L L NV - - - I N E N F G T L A F C R R W L DR L G E S K - -Y L MA L K N L C D L G I V D P Y P P L C D I K -G S Y T A Q F E HT I L L R P T C K - E V V
P R A K R L L HV - - - I N E N F G T L A F C R R W I DR I G E T K - -Y L MA L K N L C D S G V V DA Y P P L C D I K -G C Y T A Q F E HT I L MR P NC K - E V V
Q S A K S L L A S - - -V K R N F G T L P F C R R Y L DHV G E K N - -Y L L A L NT L V R E D F I A DY P P L V D P Q P G A MT A Q F E HT I L L R P T C K - E V V
N S A K I L L G G - - - I NT H F G T L A F C R R W L DQ L G F NK - -HA L A L K S L V D S E I I R P Y P P L ND I P -G S F S S QM E HT I L L R P S C K - E V V
N S A K I L L G G - - - I NT H F G T L A F C R R W L DQ L G F NK - -HA L A L K S L V D S E I I R P Y P P L ND I P -G S F S S QM E HT I L L R P S C K - E V V
P R A K H L L NV - - -V N E N F G T L A F C R R W L DR L G E T K - -Y L MA L K N L C D L G I I D P Y P P L C DT K -G C Y T A Q F E HT I L L R P T C K - E V V
DK A K S L L NV - - - I N E N F G T L P WC R R Y L DR L G QDK - -Y L L A L NQ L V R A G I V QDY P P I V D I K -G S Y T A Q F E HT I L L H P HK K - E V V
P K A K Q L L QY - - - I NK NY DT L C F C R R W L DR A G E DK - -H I L A L NN L C D L G I I QR HA P L V D S K -G S Y V A QY E HT L L L K P T A K - E V L
Q S S K Q L L G T - - - I NK N F G T L A F C K R W L DR A G A T K - -Y QMA L K D L C DK G I V E A Y P P L C D I K -G C Y T A QY E HT I M L R P T C K - E V V
Q S S K Q L L G T - - - I NK N F G T L A F C K R W L DR A G A T K - -Y QMA L K D L C DK G I V E A Y P P L C D I K -G C Y T A QY E HT I M L R P T C K - E I V
L F NK D L I K V Y E F V K D S L G T L P F S P R H L DY Y G L V K G G S L K S V N L L T MMG L L T P Y P P L ND I D -G C K V A Q F E HT V Y L S E HG K - E V L
G NA K R L L HA - - - L DA N F K T L A F C R R Y V DK I G F A K - -WQM P F K F L V DDG C V NA Y P P L S DC H -G S Y V A Q F E HT I Y L K P T C K - E V L
P R A K H L L NV - - - I N E N F G T L A F C R R W L DR L G E S K - -Y L MA L K N L C D L G I V D P Y P P L C D I K -G S Y T A Q F E HT I L L R P T C K - E V V
P R T K H L L NV - - - I N E N F G T L A F C R R W L DR L G E S K - -Y L MA L K N L C D L G I V D P Y P P L C D I K -G S Y T A Q F E HT I L L R P T C K - E V V
S S A K N L L K V - - - I D E N F G T I P F C R R Y L DR L G E DK - -HV Y A L NT L V R QG I V E DY P P L ND I K -G S Y T A Q F E HT L I L H P HK K - E I V
E K A QQ L L K H - - - I HK S Y S T L A F C R K W L DR DG F DR - -H L MN L NR L V D E G A V NK Y P P L V DV K -G S F T A QY E HT I Y L G P T A K - E I L
P R T K H L L NV - - - I N E N F G T L A F C R R W L DR L G E S K - -Y L MA L K N L C D L G I V D P Y P P L C D I K -G S Y T A Q F E HT I L L R P T C K - E V V
T S A QK I L NV - - - I NK N F G T L P F C R R Y L DR L G QDK - -Y L L G L NN L V S NG I V E A Y P P L V DK K -G S Y T A QY E HT I L L R P T V K - E V I
P A V R NV L K Q - - -V - E E Y R E L P F A K R W L E - - - S DK - - L E F S L I Q L E K A G I L H S Y P V L V E S A -G G L V S QA E HT V I I T R DG C - E V T
P R A K H L L NV - - - I N E N F G T L A F C R R W L DR L G E S K - -Y L MA L K N L C D L G I V D P Y P P L C D I K -G S Y T A Q F E HT I L L R P T C K - E V V
P R T K H L L NV - - - I N E N F G T L A F C R R W L DR L G E S K - -Y L MA L K N L C D L G I V D P Y P P L C D I K -G S Y T A Q F E HT I L L R P T C K - E V V
S S A K S L L NV - - - I T K N F G T L P F C R R Y I DR L G QDK - -Y L L G - - - - - - -G I V E A Y P P L V DK K -G S Y T A HW L S T QR L K N S T A F Q I V
A K A K Q L L G T - - - I NNN F G T L A F C R R Y L DR L G E T K - -Y L MA L K N L C DV G I V Q P Y P P L C DV R -G S Y V S Q F E HT I L L R P T C K - E V I
P R A K Q L L G V - - - I DR N F G T L A F C K R Y L DR I G E QR - -Y S MA L K N L C DNG I V Q P Y P P L C D I K -G S Y V A QY E HT I L L K P S S G V E V L
P R T K H L L NV - - - I N E N F G T L A F C R R W L DR L G E S K - -Y L MA L K N L C D L G I V D P Y P P L C D I K -G S Y T A Q F E HT I L L R P T C K - E V V
P K A K S L L T H - - - I DNHY DT L A F C R R F L DR DG Q S N - -Y L L G L K N L C D L G I V N P Y P P L C D I R -G S Y V S QY E HT I F L K P S C I - E V I
E R A K T L L N S - - - I T S N F G T L P WC R R Y L E R T G E E K - -Y L F A L NQ L V R A G I V E E Y P P L V D I K -G S Y T A QY E HT I L L H P HK K - E V V
N S A K T L L K V - - - I NDN F DT L P F C NR W L DD L G QT R - -H F MA L K T L I D L N I V E P Y P P L C D I K -N S F T S QM E HT I L L R P T C K - E V L
N S A K T L L K V - - - I NDN F DT L P F C HR W L DD L G QK R - -H F MA L K T L V D L N I V E P Y P P L C DV K -N S F T S QM E HT I L L R P T C K - E V L
N S A K T L L K V - - - I NDK F DT L P F C NR W L DD L G QT R - -H F MA L K T L V D L N I V E P Y P P L C D I K -N S F T S QM E HT I L L R P T C K - E V L
P R A K Q L L A T - - - I NK N F S T L A F C R R Y L DR L G E T K - -Y L MA L K N L C D S G I I Q P Y P P L C DV K -G S Y V S Q F E HT I L L R P T C K - E V I
P R T K H L L NV - - - I N E N F DT L A F C R R W L DR L G E S K - -Y L MA L K N L C D L G I V D P Y P P L C D I K -G S Y T A Q F E HT I L L R P T C K - E V V
D S A K N L L K T - - - I DR N F G T L P F C R R Y L DR L G Q E K - -Y L F A L NN L V R HG L V QDY P P L ND I P -G S Y T A Q F E HT I L L HA HK K - E V V
P R A K A L L NT - - - I T QN F G T L P F C R R Y L DR I G E S K - -Y L L A L NN L V S A G I V QDY P P L C D I R -G S Y T A Q F E HT I I L H P T QK - E V V
QR S K A L L K V - - - I NNN F G T L A F C R R W L DR L G E T K - -Y L MA L K N L C DT G L V D P Y P P L C DV K -G C Y T A QY E HT I M L R P T Y K - E V V
P R A K H L L NV - - - I N E N F G T L A F C R R W L DR L G E S K - -Y L MA L K N L C D L G I I D P Y P P L C D I K -G S Y T A QY E HT I L L R P T Y K - E V V
P K A K N L L K F - - - I DNN F G T L A F C R R W L DR G G QT G - -H I L S L K Q L C DA G I V V P Y P P L V DV R -G S Y V A QY E HT I V L K P S HK - E V I
K S A R E S L NV - - - I NR E F S T L P F C K R W L DD L T NK R - -G S L V L R N L V DA G I I V P Y P P L C DNN -N S F T S QM E HT I L L R P T C K - E V L
K S A R E A L NV - - - I NR E F S T L P F C K R W L DD L T NR R - -G S MV L R S L V DA G I V V P Y P P L S DNN -H S F T S QM E HT I L L R P T C K - E V L
P A A R K L L K T - - - L Q E N F S T L A F S QR F I DR I G E K K - -Y Q L N L R H L V E C R A V HDY P S L S DV K -G S Y V A Q F E HT F I L L P T HK - E V L
DK A QQ L L R H - - - I HK T Y NT L A F A R K W L DR DG HDR - -H L L N L NQ L V E A G A V NK Y P P L C D I R -G C Y T A Q L E HT L I L K P T A K - E I L
E K A QH L L K H - - - I NK T Y G T L A F A R K W L DR DG Y DR - -H L L N L NQ L V E A G A V NR Y P P L C DV K -G C Y T A Q F E HT I L L K P T A K - E I L
H S A HG L L R T - - - I NK H F D S L P F C R R Y L DR V G E K N - -Y L L G L K H L V S L G V V QDY P P L C D I A -G S MT A QY E HT I L L R P T C K - E V V
NK A K Q L L A T - - - I DK N F G T L P F C R R Y L DR L G E E K - -Y L L A L K N L V Q S G V V QDY P P L V DQK -G C QT A QY E HT I Y L R P T C K - E I L
3240
3250
3260
3270
3280
3290
3300
3310
S R - - - - - - - -G DDY - - - - - - - - - - - - - - - - - - - -A QMQMK A R L A G A HK G HG L L K K K A DA L QMR F R M I L S K I I E T K T L MG E V -
S K - - - - - - - -G DDY M - -A G QNA -R L NV V P T V T - -M L G V MK A R L V G A T R G HA L L K K K S DA L T V Q F R A L L K K I V T A K E S MG DM -
S K - - - - - - - -G DDY M - - - - S NN -R E QV F P T R M - -T L G L MK S K L K G A NQG H S L L K R K S E A L T K R F R E I T R R I D E S K QR MG A V -
S R - - - - - - - -G DDY M - - S G A V G -R E P V F P T R Q - - S L G L MK S K L K G A E T G H S L L K R K S E A L T K R F R E I T R R I D E A K QK MG R V -
S R - - - - - - - -G DDY M S G F N P P G -R E A V F P T R Q - - S L G L MK G K L K G A E T G H S L L K R K S E A L T K R F R E I T R R I D E A K QK MG R V -
S R - - - - - - - -G DDY - - - - - - - - - - - - - - - - - - - - - -M L I K G R L A G A V K G HG L L K K K A DA L QV R F R M I L S K I I E T K T L MG E V -
S R - - - - - - - -G DDY M - - - S G K D -R I E I F P S R M - -A QT I MK A R L K G A QT G R N L L K K K S DA L T L R F R Q I L K K I I E T K M L MG E V -
- - - - - - - - - - - - - -M S G G G G K D -R I A V F P S R M - -A QT L MK T R L K G A QK G H S L L K K K A DA L N L R F R D I L K K I V E NK V L MG E V -
S R - - - - - - - -G DDY M - S G G G K D -R I A V F P S R M - -A QT L MK T R L K G A QK G H S L L K K K A DA L N L R F R D I L R K I V E NK V L MG E V -
T K - - - - - - - -G DDY M - - S G A G N -R E QV F P T R M - -T L G V MK S K L K G A QQG H S L L K R K S E A L T K R F R D I T QR I DDA K R K MG R V -
S K - - - - - - - -G DDY M - - - - S G N -R E QV F P T R M - -T L G L MK T K L K G A NQG H S L L K R K S E A L T K R F R D I T K R I D E A K QK MG R V -
S R - - - - - - - -G DDY M - - - S G K D -R I E I F P S R M - -A QT I MK A R L K G A QT G R N L L K K K S DA L T L R F R Q I L K K I I E T K M L MG E V -
S R - - - - - - - -G DDY M - - - S S HD -R I D I F P S R M - -N L T I MK T R L K G A HK G H S L L K K K A DA L K MK F H S I L R K I I E A K Q L MG E I -
S R - - - - - - - -G DDY M - - S G T G P -R E A I F P T R M - -N L T L T K G R L K G A QT G H S L L A K K R DA L T T R F R Q I L R K V D E A K R L MG R V -
S R - - - - - - - -G DD F - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -M L K E I V E T K R S I G ND -
S R - - - - - - - -G DD F L - - - - - - - - - - - - - - - L I Y R A L QA I K L K S K G A K QG Y D L L K R K S DA L S NK F R G M L K E I V E T K R S I G ND -
S R - - - - - - - -G DDY M - - - S G K E -R I D I F P S R M - -A QT I MK A R L K G A QT G R S L L K K K S DA L S MR F R Q I L R K I I E T K T L MG E V -
S R - - - - - - - -G DDY M - - S G A G N -R E QV F P T R M - -T L G L MK G K L K G A QQG H S L L K R K S E A L T K R F R D I T QR I DDA K R K MG R V -
S R - - - - - - - -G DDY M - - - S G K N -R L N I F P T R M - -A L T V MK T K L K G A V T G H S L L K K K S DA L T I R F R R I L A N I V E NK Q L MG T T -
S R - - - - - - - -G DDY M - - - S G K D -R L P I F P S R G - -A QM L MK A R L A G A QK G HG L L K K K A DA L QMR F R L I L G K I I E T K T L MG DV -
S R - - - - - - - -G DDY M - - - S G K D -R L P I F P S R G - -A QM L MK A R L A G A QK G HG L L K K K A DA L QMR F R M I L G K I I E T K T L MG DV -
T R - - - - - - - -G DDY M - - - -T G E -R I P V F P T R M - -N L R T M E T K QK S A QK G H S L L K R K S DA L K V R Y R A V E D E Y K R K E L G I NQK -
S R - - - - - - - - - - - -M - - - - S DK -R Y T V F P T R M - -Q L T T Y K G K L V G A QR G HD L L K R K T DA L NQK F K S I L K K I I E E K M S MK DY -
S R - - - - - - - -G DDY M - - - S G K D -R I E I F P S R M - -A QT I MK A R L K G A QT G R N L L K K K S DA L T L R F R Q I L K K I I E T K L L MG E V -
S R - - - - - - - -G DDY M - - - S G K D -R I E I F P S R M - -A QT I MK A R L K G A QT G R N L L K K K S DA L T L R F R Q I L K K I I E T K M L MG E V -
S R - - - - - - - -G DDY M - - - - S S N -R E QV F P T R M - -T L G L MK T K L K G A NQG Y S L L K R K S E A L T K R F R D I T K R I DD S K QK MG R V -
S K - - - - - - - -G S DY M - - - S S T S -R Y P A L P S R M - - S L I A F K T R L K G A QK G H S L L K K K A DA L S L R Y R T V MG E L R T A K L E MA NQ -
S R - - - - - - - -G DDY M - - - S G K D -R I E I F P S R M - -A QT I MK A R L K G A QT G R N L L K K K S DA L T L R F R Q I L K K I I E T K M L MG E V -
S R - - - - - - - -G DDY M - - S G A G E -R E A V F P T R Q - - S L G I MK A K L K G A E T G H S L L K R K S E A L T K R F R E I T K R I D E A K R K MG R V -
T K - - - - - - - - - - - -M - - - - - - -A QQDV K P T R S - - E L I N L K K K I K L S E S G HK L L K MK R DG L I L E F F K I L N E A R NV R T E L DA A -
S R - - - - - - - -G DDY M - - - S T K D -R I D I F P S R M - -A QT I MK A R L K G A QT G R N L L K K K S DA MT L R F R Q I L K K V I QT K V L MG E V -
S R - - - - - - - -G DDY M - - - S G K D -R I E I F P S R M - -A QT I MK A R L K G A QT G R N L L K K K S DA L T L R F R Q I L K K I I E T K M L MG E V -
NQR R R V T E A I T E A E M - - S G A A D -R E A V F P T R Q - - S L - - - - - - - - - - - - - - - - - - - - - - - - - - - - - E I T R R I D E A K R K MG R V -
S R - - - - - - - -G DDY M - - S G QT Q -R L NV V P T V T - -M L G V MK A R L V G A T R G HA L L K K K S DA L T V Q F R A I L K K I V A A K E S MG E A -
T R - - - - - - - -G E DY M - - S S A G A -R L NV T P T V T - -T L A V I K S R L A G A QR G HR L L K K K A DA L T L R Y R G I L R D I V E A K R K L A T S -
S R - - - - - - - -G DDY M - - - S G K D -R I E I F P S R M - -A QT I MK A R L K G A QT G R N L L K K K S DA L T L R F R Q I L K K I I E T K M L MG E V -
S R - - - - - - - -G DDY M - - - - - - - -A E QV V P S R M - -N L A L Y K A K I I S A K K G H E L L K K K C DA L K T K F R I V MV A L L E NK K F MG D E -
T R - - - - - - - -G DDY M - - S G A G N -R E QV F P T R M - -T L G L MK G K L K G A QQG H S L L K R K S E A L T K R F R D I T QR I DDA K R K MG R V -
S R - - - - - - - -G P D F M - - -G A L D - E S T P V P S R I - -T L Q L MK QK K K S A F QG Y S L L K K K S DA L F I H F R DV L K D I V K T K T K V G E E -
S R - - - - - - - -G P D F M - - -G A L D - E S T P V P S R I - -T L Q L MK QK K K S A F QG Y S L L K K K S DA L F I H F R DV L K D I V K T K T K V G E E -
S R - - - - - - - -G P D F M - - -G A L D - E S T P V P S R I - -T L H L MK QK K K S A F QG Y S L L K K K S DA L F I H F R DV L K D I V K T K NK V G E D -
S R - - - - - - - -G DDY M - - S G S G Q -R L NV V P T V T - -V L G V V K A R L V G A T R G HA L L K K K S DA L T V Q F R Q I L K K I V S T K E S MG DK -
S R - - - - - - - -G DDY M - - - S G K D -R I E I F P S R M - -A QT I MK A R L K G A QT G R N L L K K K S DA L T L R F R Q I L K K I I E T K M L MG E V -
S K - - - - - - - -G DDY M - - - - S G N -R E QV F P T R M - -T L G L MK T K L K G A NQG Y S L L K R K S E A L T K R F R D I T K R I DDA K QK MG R V -
S R - - - - - - - -G DDY M - - -A S K Q -R E NV F P T R M - -T L T T MK T R L K G A QT G H S L L K R K S E A L K K R F R E I V V N I E QA K QK MG R V -
S R - - - - - - - -G DDY M - - - - S K D -R I A V F P S R M - -A L T T MK I R L K G A QK G H S L L K K K A DA L T L K F R Q I L G K I I E NK T L MG E A -
S R - - - - - - - -G E DY M - - - S G K E -R I DV F P S R M - -A QT I MK A R L K G A QT G R N L L K K K S DA L S MR F R Q I L R K I I E V S W L S S A I P I
S R - - - - - - - -G DDY M - - - - - - - - S QQ I T P S R M - -T L A I Y K A K T V S A K K G H E L L K K K C DA L K T K F R A I M I A L L E NK L K MD E E -
S R - - - - - - - -G DDY M - - - - S S L - S V L L I P S R M - -N L QN L K QR R HNA H L G Y S L L K R K S DA L T S K F HR L L R A T V QG K E R L V E G -
S R - - - - - - - -G DDY M - - - - S N L - S V L L I P S R M L V N L QN L K QR R HNA H L G Y S L L K R K S DA L T S K F HR L L R A T V QG K E R L V E G -
S R - - - - - - - -G DDY M - - - - - - - - -A A I I P T R M - - E L QN L K E K L K G A R K G Y D L L K K K S DA L T MK F R S L L R E I R DT K L S V G NV -
S K - - - - - - - -G DDY M - - - - S S N -R Y T A L P S R M - - S L I A F K T R L K G A QK G H S L L K K K A DA L A F R Y R T V MD E L R R A K L E V A DQ -
S K - - - - - - - -G DDY M - - - - S S N -R Y P A L P S R M - - S L I S F K T R L K G A QK G H S L L K K K A DA L A I R Y R A I MG D L R NA K M E MV E Q -
S R - - - - - - - -G T DY M - S S G K G Q -R E S V F P T R Q - -A L G S A K T R L K G A QT G H S L L K K K A DA L T K R F R T I T HK I D E A K R K MG R V -
S R - - - - - - - -G DDY M - - - S A NN -R E A V F P T R M - -T L G MMK G K L K G A T QG HN L L K R K S E A L T K R F R D I T R K I D E S K HK MG R V - -
3330
3340
3350
3360
3370
3380
3390
3400
P V F G L A K G G QQ L QK L K K NY Q S A V K L L V E L A S L QT S
P K F G L A R G G QQV R A C R V A Y V K A I E V L V E L A S L QT S
P Q F G L G R G G QQV QR A K N I Y T K V V E S L V Q L A S L QT A
P Q F G L G K G G QQV QR C R E T Y A R A V E T L V E L A - - - -
P H F G L G K G G MQV QR C R E T Y A R A V E T L V E L A S L QT A
P I F G L A R G G QQ L A K L K K N F Q S A V K L L V E L A S L QT S
P V F G L A R G G E Q L A K L K R NY A K A V E L L V E L A S L QT S
P V F G L G K G G A N I A R L K K NY NK A I E L L V E L A T L QT C
P V F G L G K G G A N I A R L K K NY NK A I E L L V E L A T L QT C
P T F A L A R G G QQV QK A K L I Y S K A V E T L V E L A S L QT A
P Q F G L G R G G QQV QR A K D I Y S K A V E T L V E L A S L QT A
P V F G L A R G G E Q L A K L K R NY A K A V E L L V E L A S L QT S
P V F G L S R G G E Q L S R L K K NY S K A V K L L V E L A S L QT S
P A F G L S R G G QQ I QK S R DT Y I K A V G T L V E L A S L QT A
P I F G V A S G G QV I Q S T R E I Y MK V L R D L V K L A S L QT A
P I F G V A S G G QV I Q S T R E I Y MK V L R D L V K L A S L QT A
P V F G L A R G G E Q L S R L K R NY A K A V E L L V E L A S L QT S
P T F G L G R G G QQV QK A K MV Y T K A V E T L V E L A S L QT A
P T F G L S K G G QQ I NK S R E S H I K A V E A L I A L A S L QT A
P V F G L A R G G QQ L A K L K K NY Q S A V K L L V E L A S L QT S
P V F G L A R G G QQ L A K L K K NY Q S A V K L L V E L A S L QT S
P F F F L DR S G Q S L N E C R E K F L E V L E M L V D L C A L K N S
P V F G L S K G G Q S V A NA R QQY L K A L D S L V K L A S L QT A
P V F G L A R G G E Q L A K L K R NY A K A V E L L V E L A S L QT S
P V F G L A R G G E Q L A K L K R NY A K A V E L L V E L A S L QT S
P Q F G L G R G G QQV QR A K E I Y S R A V E T L V E L A S L QT A
P S F G L G K G G E Q I K E A Y S A F R HT L S L L V K I A S L QT S
P V F G L A R G G E Q L A K L K R NY A K A V E L L V E L A S L QT S
P A F G L G K G G QQV QR C R E T Y A R A V E A L V E L A S L QT A
P K I G I I G T N S Y I D E T A DA Y E E L V E K I I A A A E L E T T
P V F G L A R G G E QV T K L K K NY G K A V E L L V E L A S L QT S
P V F G L A R G G E Q L A K L K R NY A K A V E L L V E L A S L QT S
P A F G L G K G G QQV QR C R E T Y A R A V E A L V E L A S L QT A
P K F G L A R G G QQV A A C R A A HV K A I E V L V E L A S L QT S
P K F G L A R G G A R V R E A K A S Y G E A I G L L S E L A S L QT A
P V F G L A R G G E Q L A K L K R NY A K A V E L L V E L A S L QT S
P E V G L A R G G Q S I QR C R DK F K D L L M L L V K I A S Y QT S
P T F G L G R G G QQV QK A K L V Y T R A V E T L V E L A S L QT A
P I F G V A A G G QV I NNT R E NY L QC L NM L V K L A S MQV A
P I F G I A S G G QV I NNT R E NY L QC L NM L V K L A S MQV A
P I F G V A A G G QV I NNT R E NY L QC L NM L V K L A S MQV A
P K F G L A R G G QQV QA C R A A Y V K A I E V L V E L A S L QT S
P V F G L A R G G E Q L A K L K R NY A K A V E L L V E L A S L QT S
S Q F G L G R G G QQV QR A K E I Y S R A V E T L V E L A S L QT A
P T F G L G K G G QQ I QK A R QV Y E K A V E T L V Q L A S Y Q S A
P V F G L S R G G QQ I DR L K K NY A K A I E L L V E L A S L QT S
P V F G L A K G G E Q I S R L K R NY A R A V E L L V E L A S L QT S
P N L G L DK G G F S I QK A K E R F K E A L Y L L V K V A S L QT S
PV F S L S SGG SA I Q SV K T T H LAA LD I LV E LA S LQ I S
PV F S L S SGG SA I Q SV K T T H LAA LD I LV E LA S LQ I S
P Q F G L A R G G QQ I QK A R E E F T K F L D S L V R L A E L QT A
P A F G I G R G G E Q L R E A R DA F R E T L K L F V K I A S L QV S
P S F G I G R G G E Q L R E A S E K F R E T L R L L V K I A S L QV S
P A F G L S R G G QQV S K A R E V Y T QA L K V L V E L A S L QT A
P S F G L G R G G QQV QK A K A V Y S K A V E T L V E L A S L QT A
3410
3420
3430
3440
3450
3460
F V T L D E V I K I T NR R V NA I E H - - - - -V I I P R I DR T L A Y I I S E L D E L E R E E F Y R L K K I QDK K
F L T L D E A I K T T NR R V NA L E N - - - - -V V K P K L E NT I S Y I K G E L D E L E R E D F F R L K K I QG Y K
F V I L D E V I K V T NR R V NA I E H - - - - -V I I P R T E NT I A Y I N S E L D E L DR E E F Y R L K K V Q E K K
- - - -N E V I K V V NR R V - S T S L - - - - - S L E P R T L S N - - - - - - - - - - - - - - - - - - - - - - - - -
F V I L D E V I K V V NR R V NA I E H - - - - -V I I P R T E NT I K Y I N S E L D E L DR E E F Y R L K K V S G K K
F V T L D E V I K I T NR R V NA I E H - - - - -V I I P R L E R T L A Y I I S E L D E L E R E E F Y R L K K I QDK K
F V T L D E A I K I T NR R V NA I E HG E F K L P F C P R L H P C L R P A R T QA - - - - - - - - - - - - - - - - -
F I T L D E A I K V T NR R V NA I E H - - - - -V I I P R I E NT L T Y I V T E L D E M E R E E F F R MK K I QA NK
F I T L D E A I K V T NR R V NA I E H - - - - -V I I P R I E NT L T Y I V T E L D E M E R E E F F R MK K I QA NK
F I I L D E V I K I T NR R V NA I E H - - - - -V I I P R T E NT I A Y I NG E L D E MDR E E F Y R L K K V Q E K K
F I I L D E V I K V T NR R V NA I E H - - - - -V I I P R T E NT I A Y I N S E L D E L DR E E F Y R L K K V Q E K K
F V T L D E A I K I T NR R V NA I E H - - - - -V I I P R I E R T L A Y I I T E L D E R E R E E F Y R L K K I Q E K K
F V T L D E S I K I T NR R V NA I E H - - - - -V I I P K I E R T I S Y I I T E L D E G E R E E F F R L K K I QQK K
F T I L D E V I R A T NR R V NA I E H - - - - -V V I P R L E NT I K Y I N S E L D E MDR E E F F R L K K V QG K K
F F S L D E E I K MT NR R V NA L QN - - - - -V V L P K L E DG MNY I L R E L D E I E R E E F F R L K K I Q E K K
F F S L D E E I K MT NR R V NA L QN - - - - -V V L P K L E DG MNY I L R E L D E I E R E E F F R L K K I Q E K K
F V T L D E A I K I T NR R V NA I E H - - - - -V I I P R I E R T L T Y I I T E L D E R E R E E F Y R L K K I Q E K K
F I I L D E V I K V T NR R V NA I E H - - - - -V I I P R T E NT I S Y I N S E L D E L DR E E F Y R L K K V Q E K K
F I T L D E V I K I T NR R V NA I E Y - - - - -V V K P K L E NT I S Y I I T E L D E S E R E E F Y R L K K V QG K K
F V T L D E V I K I T NR R V NA I E H - - - - -V I I P R I DR T L A Y I I S E L D E L E R E E F Y R L K K I QDK K
F V T L D E V I K I T NR R V NA I E H - - - - -V I I P R I DR T L A Y I I S E L D E L E R E E F Y R L K K I QDK K
F R V L N S I L M S T NR R V NA L E F - - - - -N I I P R L E NT V S Y I V S E L D E QDR G D F F R L K K V QN L K
F L T L DT V I K I T NR R V NA L E H - - - - -V V I P MT QA T V K Y I E T E L D E S E R E E F F R L K L I QNK K
F V T L D E A I K I T NR R V NA I E H - - - - -V I I P R I E R T L S Y I I T E L D E R E R E E F Y R L K K I Q E K K
F V T L D E A I K I T NR R V NA I E H - - - - -V I I P R I E R T L A Y I I T E L D E R E R E E F Y R L K K I Q E K K
F I I L D E V I K V T NR R V NA I E H - - - - -V I I P R T E NT I A Y I N S E L D E L DR E E F Y R L K K V Q E K K
W I T L D I A QK V T S R R V NA L E K - - - - -V V I P R V QNT L S Y I T S E L D E Q E R E E F F R L K MV QK K K
F V T L D E A I K I T NR R V NA I E H - - - - -V I I P R I E R T L A Y I I T E L D E R E R E E F Y R L K K I Q E K K
F V I L D E V I K V V NR R V NA I E H - - - - -V I I P R T E NT I K Y I N S E L D E L DR E E F Y R L K K V A G K K
MK R L L D E I E K T K R R V NA L E F - - - - -K V I P E L I A T MK Y I R F M L E E M E R E NT F R L K R V K A R M
F I T L D E A I K I T NR R V NA I E H - - - - -V I I P R I E R T L NY I V T E L D E R E R E E F Y R L K K I Q E K K
F V T L D E A I K I T NR R V NA I E H - - - - -V I I P R I E R T L A Y I I T E L D E R E R E E F Y R L K K I Q E K K
F V I L D E V I K V V NR R V NA I E H - - - - -V I I P R T E NT I K Y I N S E L D E L DR E E F Y R L K K V A A K K
F L T L D E A I K T T NR R V NA L E N - - - - -V V K P R L E NT I S Y I K G E L D E L E R E D F F R L K K I QG Y K
F V T L D E A I K T T NR R V NA L E N - - - - -Y V T P R L QNT V K Y I L S E L D E L E R E E F F R L K K V QA K K
F V T L D E A I K I T NR R V NA I E H - - - - -V I I P R I E R T L A Y I I T E L D E R E R E E F Y R L K K I Q E K K
F V S L DQV I K V T NR R V NA L E Y - - - - -V V I P R F T A T MNY I DM E L D E M S K E D F F R L K K V L DNK
F I I L D E V I K V T NR R V NA I E H - - - - -V I I P R T E NT I S Y I N S E L D E L DR E E F Y R L K K V Q E K K
F F S L D E E I K MT NR R V NA L NN - - - - - I V L P R L DG G I NY I I K E L D E I E R E E F Y R L K K I K E K K
F F S L D E E I K MT NR R V NA L NN - - - - - I V L P R L DG G I NY I I K E L D E I E R E E F Y R L K K I K E K K
F F S L D E E I K MT NR R V NA L NN - - - - - I V L P R L E G G I NY I I K E L D E I E R E E F Y R L K K I K E K K
F MT L DT A I K T T NR R V NA L E N - - - - -V V K P R L E NT I T Y I K G E L D E L E R E D F F R L K K I QG F K
F V T L D E A I K I T NR R V NA I E H - - - - -V I I P R I E R T L A Y I I T E L D E R E R E E F Y R L K K I Q E K K
F I I L D E V I K V T NR R V NA I E H - - - - -V I I P R T E NT I A Y I N S E L D E L DR E E F Y R L K K V Q E K K
F V L L G DV L QMT NR R V N S I E H - - - - - I I I P R L E NT I K Y I E S E L E E L E R E D F T R L K K V QK T K
F I T L D E V I K I T NR R V NA I E H - - - - -V I I P R I E NT I S Y I T T E L D E R E R E E F Y R L K K I Q E K K
F V T L D E A I K I T NR R V NA I E H - - - - -V I I P R I DR T L T Y I V T E L D E R E R E E F Y R L K K I Q E K K
F I T L D E V I K V T NR R V NA L E H - - - - -V V I P R F M E V QA Y I NQ E L D E M S R E D F F R L K K V L D F K
F I I L N E E I R MT NR R I NA L DN - - - - -V L I P S I DR N L E Y I R R E L D E M E R E E F Y R L K M I K K HK
F I I L N E E I R MT NR R I NA L DN - - - - -V L I P S I DR N L E Y I R R E L D E M E R E E F Y R L K M I K K HK
F NV I DDV L R I T NR R V NA M E C - - - - -V L I P K Y QA A I A F V D S T L D E N E R E E F F R L K K V Q E T I
WMT L DV A QK V T S R R V NA L E K - - - - -V V I P R M E NT L NY I S S E L D E Q E R E E F F R L K M I QK K K
WV T L D L A QK V T NR R V NA L E K - - - - -V V I P R V QNT L S Y I T S E L D E Q E R E E F F R L K MV QK K K
F V I L D E V I R MT NR R V NA I E H - - - - -V I I P R L E NT I S Y I V S E L D E A DR E E F F R L K K V QA K K
F V I L D E V I K I T NR R V NA I E H - - - - -V I I P R T E NT I K Y I N S E L D E L DR E E F Y R L K K V QDK K