Documente Academic
Documente Profesional
Documente Cultură
Fall 2003
Evolutionary Analysis
Fiona Brinkman
Simon Fraser University,
Greater Vancouver, BC, Canada
What do
• BLAST
• Protein motif searching
• Protein threading
• Multiple sequence alignment
Have in common?
1
Current Topics in Genome Analysis
Fall 2003
2
Current Topics in Genome Analysis
Fall 2003
3
Current Topics in Genome Analysis
Fall 2003
• Discoveries of fossils
accumulated
– Remains of unknown but
still living species that are
elsewhere on the planet?
– Cuvier (circa 1800): the
deeper the strata, the less
similar fossils were to
existing species
4
Current Topics in Genome Analysis
Fall 2003
5
Current Topics in Genome Analysis
Fall 2003
What is evolution?
6
Current Topics in Genome Analysis
Fall 2003
Characters
• Heritable changes in features (morphology,
DNA sequence etc…)
A Unique Character:
Hair for Mammals
• Hair evolved only once and is “unreversed”
• Presence of hair ‡ strong indication that
organism is a mammal
7
Current Topics in Genome Analysis
Fall 2003
Homoplasy:
The formation of tails
• Tails evolved independently in the ancestors
of frogs and humans
• Presence of a tail ‡ no useful conclusions
bioinformatics
bioinfortatics
bioinfortatios time
oinformatios
informatios
infortation
information
8
Current Topics in Genome Analysis
Fall 2003
All share the same deleted sequence region, which is not found
in any other transporter examined to date
‡Unique character?
‡Non-unique.
9
Current Topics in Genome Analysis
Fall 2003
Classification according to
characters – more characters can
be good
Classification according to
characters
10
Current Topics in Genome Analysis
Fall 2003
11
Current Topics in Genome Analysis
Fall 2003
VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGTSSNIGS--ITVNWYQQLPG The sole purpose
of multiple
LRLSCSSSGFIFSS--YAMYWVRQAPG sequence
LSLTCTVSGTSFDD--YYSTWVRQPPG alignments is to
place homologous
PEVTCVVVDVSHEDPQVKFNWYVDG--
positions of
ATLVCLISDFYPGA--VTVAWKADS-- homologous
AALGCLVKDYFPEP--VTVSWNSG--- sequences into the
same column.
VSLTCLVKGFYPSD--IAVEWESNG--
12
Current Topics in Genome Analysis
Fall 2003
in ‡ out!
- ATTYNETCITRTQ -
- SITYNETCVTITQ -
- SVTY-----CIVR -
13
Current Topics in Genome Analysis
Fall 2003
14
Current Topics in Genome Analysis
Fall 2003
15
Current Topics in Genome Analysis
Fall 2003
gh
16
Current Topics in Genome Analysis
Fall 2003
ClustalX
17
Current Topics in Genome Analysis
Fall 2003
Statistics Report
1: # residues identical
2: # residues > zero score (similar residues)
3: # residues lined up with a gap
18
Current Topics in Genome Analysis
Fall 2003
ILPITSPSKEGYESGKAPDEFSSGG
ILPEH--IKDDGELGAAPHSFSTAG
VLPLD-----S--AGRPADSFSAAG
VLPVDR-------DGQARDEYT-VG
VLPVDN-------KGEARDEYT-VG
LLPYDD-------QGRPQDDYSRAG
GIVSRSG---SNFDGEPKDSYGKVG
Delete?
19
Current Topics in Genome Analysis
Fall 2003
Evolution
A phylogenetic tree
A node
Human
A clade
Mouse
Fly
20
Current Topics in Genome Analysis
Fall 2003
A node
4
Human
A clade
2
Mouse
3
Fly
1
Phylogenetic analysis
• Organismal relationships
• Gene/Protein relationships
21
Current Topics in Genome Analysis
Fall 2003
Organismal relationships
22
Current Topics in Genome Analysis
Fall 2003
23
Current Topics in Genome Analysis
Fall 2003
rRNA genes
Multiple genes
rRNA genes!
Gene/Protein Relationships
24
Current Topics in Genome Analysis
Fall 2003
Homologs
Gene Duplication
Homologs
25
Current Topics in Genome Analysis
Fall 2003
Example Problem:
• Therefore, current best hit may be a paralog now and the true ortholog
not yet sequenced
• Assumption:
- Mouse and Human gene datasets are more complete, with more true
orthologs identified
26
Current Topics in Genome Analysis
Fall 2003
Bunch
of
Eukaryotes
Two
bacteria
Two
Eukaryotes
Bunch
of
Eukaryotes
A bacteria
Bunch
of
bacteria
27
Current Topics in Genome Analysis
Fall 2003
2 Forms in 1 Species
+ + ++ +
Gene present in
+ common ancestor
28
Current Topics in Genome Analysis
Fall 2003
Loss
Loss
29
Current Topics in Genome Analysis
Fall 2003
Gene originates
here
Gene lost
here
30
Current Topics in Genome Analysis
Fall 2003
Unusual Distribution -
Evolutionary Rate Variation -?
Unusual Distribution -
Incomplete Data
+/- +/- + +
31
Current Topics in Genome Analysis
Fall 2003
32
Current Topics in Genome Analysis
Fall 2003
• Parsimony
• Neighbor-joining
• Maximum Likelihood
Parsimony
• “Shortest-way-from-A-to-B” method
• The tree implying the least number of changes in
character states (most parsimonious) is the best.
• Note:
– May get more than one tree
– No branch lengths
– Uses all character data
33
Current Topics in Genome Analysis
Fall 2003
Neighbor-joining
(and other distance matrix methods)
• “speedy-and-popular” method
• distance matrix constructed
• distance estimates the total branch length between
a given two species/genes/proteins
• Neighbor-joining approach: Pairing those
sequences that are the most alike and using that
pair to join to next closest sequence.
34
Current Topics in Genome Analysis
Fall 2003
Maximum Likelihood
• “Inside-out” approach
• produces trees and then sees if the data could
generate that tree.
• gives an estimation of the likelihood of a
particular tree, given a certain model of nucleotide
substitution.
• Notes:
– All sequence info (including gaps) is used
– Based on a specific model of evolution – gives
probability
– Verrrrrrrrrrrry slow (unless topology of tree is known)
35
Current Topics in Genome Analysis
Fall 2003
Bootstrapping
The number of times a
particular branch is formed
in the tree (out of the X
times the analysis is done)
can be used to estimate its
probability, which can be
indicated on a consensus tree
Parametric Bootstrapping
36
Current Topics in Genome Analysis
Fall 2003
PHYLIP
http://evolution.genetics.washington.edu/phylip.html
PAUP
http://paup.csit.fsu.edu/
MEGA 2.1
www.megasoftware.net/
TREEVIEW
http://taxonomy.zoology.gla.ac.uk/rod/treeview.html
37
Current Topics in Genome Analysis
Fall 2003
Challenges
How do we classify?
38
Current Topics in Genome Analysis
Fall 2003
Computational Challenges
More Challenges
39
Current Topics in Genome Analysis
Fall 2003
Evolution
Evolutionary theory
is evolving
40