Introduction To Bioinformatics-5

Introduction to Bioinformatics
LECTURE 5: Variation within and between species
Chapter 5: Are Neanderthals among us?
Neandertal, Germany, 1856

Initial interpretations:
* bear skull
* pathological idiot
* Old Dutchman ...
5.1 Variation in DNA sequences

* Even closely related individuals differ in their genetic sequences
* (Point) mutations: copy errors at a particular location
* Sexual reproduction → diploid genome
Figure: diploid chromosomes
Mitosis: diploid reproduction
Meiosis: diploid (= double) → haploid (= single)
* Typing error rate of a very good typist: 1 error per 1,000 typed letters
* All our diploid cells constantly copy 7 billion letters
* Typical cell copying error rate: ~1 error per 1 Gbp
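As a back-of-the-envelope check on the rates quoted above, the expected number of copy errors is the number of letters copied times the error rate per letter. The sketch below uses only the figures from the slide; the function name `expected_errors` is illustrative.

```python
def expected_errors(letters_copied: float, errors_per_letter: float) -> float:
    """Expected number of copy errors when copying a given number of letters."""
    return letters_copied * errors_per_letter


# Figures from the slide: 7 billion letters per diploid genome, ~1 error per 1 Gbp.
cell = expected_errors(7e9, 1 / 1e9)
# The same text copied by a very good typist (1 error per 1,000 letters).
typist = expected_errors(7e9, 1 / 1e3)

print(f"cell:   ~{cell:.0f} errors per genome copy")
print(f"typist: ~{typist:.0f} errors per genome copy")
```

On these numbers a cell makes only a handful of errors per genome copy, where the typist would make millions.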
GERM LINE
Reverse time and follow your cells:
Now you count ~$10^{13}$ cells
One generation ago you were 2 cells, somewhere in your parents' bodies
A small number T of generations ago you were $2^T$ cells (minus ancestors that appear more than once)
A large number T of generations ago you were #(fertile ancestors) cells
Congratulations: you are 3.4 billion years old!
Fast-forward time and follow your cells:
Only a few cells in your reproductive organs have a chance to live on in the next generations
The rest (including you) will die
GERM LINE MUTATIONS

This potentially immortal lineage of (germ) cells is
called the GERM LINE
All mutations that we have accumulated are en route on
the germ line
* Polymorphism: multiple possibilities (alleles) for a nucleotide
* Single Nucleotide Polymorphism, SNP (pronounced "snip"): a point mutation
example: AAATAAA vs AAACAAA
* Humans: 1 SNP per 1500 bases = 0.067%
* STR = Short Tandem Repeats (microsatellites)
example: CACACACACACACACACA
* Transitions (A↔G, C↔T) versus transversions (purine ↔ pyrimidine)
Figure: purines and pyrimidines
Figure: transitions and transversions
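The distinction between transitions and transversions can be made concrete with a few lines of code. This is a minimal sketch, assuming aligned DNA strings over A, C, G, T; the names `classify_substitution` and `snp_positions` are illustrative and not part of the lecture.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a: str, b: str) -> str:
    """Return 'transition', 'transversion', or 'identical' for two nucleotides."""
    a, b = a.upper(), b.upper()
    valid = PURINES | PYRIMIDINES
    if a not in valid or b not in valid:
        raise ValueError(f"not a DNA nucleotide: {a!r}, {b!r}")
    if a == b:
        return "identical"
    # Same chemical class (purine/purine or pyrimidine/pyrimidine) is a transition.
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"


def snp_positions(seq1: str, seq2: str):
    """List (zero-based position, base1, base2, kind) for every differing site."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    return [(i, x, y, classify_substitution(x, y))
            for i, (x, y) in enumerate(zip(seq1, seq2)) if x != y]


# The SNP example from the slide: T versus C at the fourth position.
print(snp_positions("AAATAAA", "AAACAAA"))
```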
5.2 Mitochondrial DNA

* Mitochondria are inherited only via the maternal line
* mtDNA is not reshuffled by recombination, which makes it well suited to evolutionary comparisons
Figure: H. sapiens mitochondrion
Figure: EM photograph of H. sapiens mtDNA
5.3 Variation between species

* Genetic variation accounts for morphological, physiological, and behavioral variation
* Genetic variation (or distance) reflects phylogenetic relationship
* This requires a way to measure distances between sequences: a metric
Substitution rate
* Mutations originate in single individuals
* Mutations can become fixed in a population
* Mutation rate $\mu$: rate at which new mutations arise
* Substitution rate $\rho$: rate at which a species fixes new mutations
Substitution rate and mutation rate
* For neutral mutations in a diploid population of $N$ individuals, $2N\mu$ new mutations arise per generation and each becomes fixed with probability $1/(2N)$:
$$\rho = 2N\mu \cdot \frac{1}{2N} = \mu$$
* For two sequences that differ by $K$ substitutions per site and diverged a time $T$ ago:
$$\rho = \frac{K}{2T}$$
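The two relations for neutral mutations, substitution rate = 2N × mutation rate × 1/(2N) and substitution rate = K/(2T), can be checked numerically. The sketch below is illustrative only: the function names and all numeric values are hypothetical, chosen to show that the population size cancels out.

```python
import math


def neutral_substitution_rate(population_size: int, mutation_rate: float) -> float:
    """New neutral mutations per generation (2N * mu) times fixation probability 1/(2N)."""
    new_mutations = 2 * population_size * mutation_rate
    fixation_probability = 1 / (2 * population_size)
    return new_mutations * fixation_probability


def rate_from_divergence(K: float, T: float) -> float:
    """K substitutions per site accumulated along two lineages, each of length T."""
    return K / (2 * T)


mu = 1e-8  # hypothetical mutation rate per site per generation
for N in (100, 10_000, 1_000_000):
    # The population size N drops out: the substitution rate equals mu.
    assert math.isclose(neutral_substitution_rate(N, mu), mu)

# Hypothetical example: 2% divergence between lineages that split 1e6 generations ago.
print(rate_from_divergence(0.02, 1e6))
```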
5.4 Estimating genetic distance

* Substitutions are independent (?)
* Substitutions are random
* Multiple substitutions may occur
* Back-mutations mutate a nucleotide back to an earlier value
Multiple substitutions and back-mutations conceal the real genetic distance:
GACTGATCCACCTCTGATCCTTTGGAACTGATCGT
TTCTGATCCACCTCTGATCCTTTGGAACTGATCGT
TTCTGATCCACCTCTGATCCATCGGAACTGATCGT
GTCTGATCCACCTCTGATCCATTGGAACTGATCGT
observed: 2 (= d)
actual: 4 (= K)
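The observed number of differences can be counted directly from the first and last sequences of the example. This is a minimal sketch; `count_differences` is an illustrative helper name. Only the observed count is recoverable this way, because substitutions that were later reversed leave no trace in the two end sequences.

```python
def count_differences(seq1: str, seq2: str) -> int:
    """Number of aligned sites at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(seq1, seq2))


# First and last sequence of the slide example.
first = "GACTGATCCACCTCTGATCCTTTGGAACTGATCGT"
last = "GTCTGATCCACCTCTGATCCATTGGAACTGATCGT"

n_diff = count_differences(first, last)
print(f"observed differences: {n_diff} of {len(first)} sites "
      f"(proportion {n_diff / len(first):.3f})")
```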
* Saturation: on average one substitution per site
* Two random sequences of equal length will match at approximately 1/4 of their sites
* In saturation, therefore, the proportional genetic distance is 3/4
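The saturation argument can be checked by simulation: two unrelated random sequences over four equally likely nucleotides should agree at about a quarter of their sites. The sketch below assumes uniform base frequencies; the sequence length and random seed are arbitrary choices.

```python
import random


def random_sequence(n: int, rng: random.Random) -> str:
    """Random DNA sequence with equally likely A, C, G, T."""
    return "".join(rng.choice("ACGT") for _ in range(n))


def match_fraction(a: str, b: str) -> float:
    """Fraction of aligned sites at which two sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000
frac = match_fraction(random_sequence(n, rng), random_sequence(n, rng))
print(f"matching sites: {frac:.3f}, differing sites: {1 - frac:.3f}")
```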
* True genetic distance (proportion): K
* Observed proportion of differences: d
* Due to back-mutations, $K \ge d$
SEQUENCE EVOLUTION is a Markov process: a sequence at generation (= time) t depends only on the sequence at generation t-1
The Jukes-Cantor model

Correction for multiple substitutions
Substitution probability per site per generation is $\alpha$
Substitution means there are 3 possible replacements (e.g. C → {A, G, T})
Non-substitution means there is 1 possibility (e.g. C → C)
Therefore, the one-step Markov process has the following transition matrix:
$$M_{JC} = \begin{pmatrix} 1-\alpha & \alpha/3 & \alpha/3 & \alpha/3 \\ \alpha/3 & 1-\alpha & \alpha/3 & \alpha/3 \\ \alpha/3 & \alpha/3 & 1-\alpha & \alpha/3 \\ \alpha/3 & \alpha/3 & \alpha/3 & 1-\alpha \end{pmatrix}$$
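After t generations the substitution probability is:
$$M(t) = M_{JC}^{\,t}$$
Eigenvalues and eigenvectors of $M_{JC}$ (with $\alpha$ the per-site substitution probability):
$\lambda_1 = 1$ (multiplicity 1): $v_1 = \tfrac12 (1\ \ 1\ \ 1\ \ 1)^T$
$\lambda_{2..4} = 1 - \tfrac43\alpha$ (multiplicity 3): $v_2 = \tfrac12 (-1\ \ {-1}\ \ 1\ \ 1)^T$
$v_3 = \tfrac12 (1\ \ {-1}\ \ {-1}\ \ 1)^T$
$v_4 = \tfrac12 (1\ \ {-1}\ \ 1\ \ {-1})^T$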
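Spectral decomposition of M(t):
$$M_{JC}^{\,t} = \sum_i \lambda_i^t\, v_i v_i^T$$
Write M(t) as:
$$M_{JC}^{\,t} = \begin{pmatrix} r(t) & s(t) & s(t) & s(t) \\ s(t) & r(t) & s(t) & s(t) \\ s(t) & s(t) & r(t) & s(t) \\ s(t) & s(t) & s(t) & r(t) \end{pmatrix}$$
Therefore, with $\alpha$ the per-site substitution probability, the substitution probability s(t) per site after t generations is:
$$s(t) = \tfrac14 - \tfrac14\left(1 - \tfrac43\alpha\right)^t$$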
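The closed form s(t) = 1/4 − 1/4 (1 − 4α/3)^t can be checked against a brute-force matrix power of the one-step Jukes-Cantor matrix, whose entries are 1 − α on the diagonal and α/3 elsewhere. The sketch below uses plain Python lists; the values of `alpha` and `t` are hypothetical, and the diagonal closed form is taken as 1 − 3 s(t) because every row must sum to 1.

```python
def jc_matrix(alpha: float):
    """One-step Jukes-Cantor matrix: 1 - alpha on the diagonal, alpha/3 elsewhere."""
    return [[1 - alpha if i == j else alpha / 3 for j in range(4)] for i in range(4)]


def matmul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def matrix_power(m, t: int):
    """m multiplied by itself t times (identity for t = 0)."""
    result = [[float(i == j) for j in range(4)] for i in range(4)]
    for _ in range(t):
        result = matmul(result, m)
    return result


def s_closed_form(alpha: float, t: int) -> float:
    """Off-diagonal entry of the t-step matrix."""
    return 0.25 - 0.25 * (1 - 4 * alpha / 3) ** t


alpha, t = 0.01, 50  # hypothetical values
m_t = matrix_power(jc_matrix(alpha), t)
print("s(t):", m_t[0][1], "closed form:", s_closed_form(alpha, t))
print("r(t):", m_t[0][0], "closed form:", 1 - 3 * s_closed_form(alpha, t))
```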
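Substitution probability s(t) per site after t generations, with $\alpha$ the per-site substitution probability per generation:
$$s(t) = \tfrac14 - \tfrac14\left(1 - \tfrac43\alpha\right)^t$$
The observed genetic distance d after t generations is $3\,s(t)$, because any of the 3 other nucleotides counts as a difference:
$$d = \tfrac34 - \tfrac34\left(1 - \tfrac43\alpha\right)^t$$
For small $\alpha$, $\left(1 - \tfrac43\alpha\right)^t \approx e^{-4\alpha t/3}$, so:
$$\alpha t \approx -\tfrac34 \ln\left(1 - \tfrac43 d\right)$$
The actual genetic distance is (of course):
$$K = \alpha t$$
So:
$$K \approx -\tfrac34 \ln\left(1 - \tfrac43 d\right)$$
This is the Jukes-Cantor formula: independent of $\alpha$ and t.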
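The Jukes-Cantor correction K = −3/4 ln(1 − 4d/3) is easy to evaluate directly. The following is a minimal sketch with illustrative function names; `observed_distance` implements d = 3/4 − 3/4 (1 − 4α/3)^t for a per-site substitution probability α, and the values of `alpha` and `t` are hypothetical.

```python
import math


def jukes_cantor_distance(d: float) -> float:
    """Corrected distance K for an observed proportion of differences d."""
    if not 0 <= d < 0.75:
        raise ValueError("d must satisfy 0 <= d < 3/4 (saturation at 3/4)")
    return -0.75 * math.log(1 - 4 * d / 3)


def observed_distance(alpha: float, t: int) -> float:
    """Expected observed distance after t generations under Jukes-Cantor."""
    return 0.75 - 0.75 * (1 - 4 * alpha / 3) ** t


for d in (0.01, 0.1, 0.3, 0.6, 0.74):
    print(f"d = {d:.2f}  ->  K = {jukes_cantor_distance(d):.3f}")

# Round trip for hypothetical values: K should come back close to alpha * t.
alpha, t = 0.001, 100
print(alpha * t, jukes_cantor_distance(observed_distance(alpha, t)))
```

For small d the corrected and observed distances nearly coincide; as d approaches 3/4 the correction grows without bound.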
The Jukes-Cantor formula:
$$K \approx -\tfrac34 \ln\left(1 - \tfrac43 d\right)$$
For small d, using $\ln(1+x) \approx x$:
$$K \approx d$$
So: actual distance ≈ observed distance
For saturation, $d \to \tfrac34$:
$$K \to \infty$$
So: if the observed distance approaches the distance between random sequences, the actual distance becomes indeterminate
Figure: Jukes-Cantor
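Variance in K
If $K = f(d)$ then $\delta K = \frac{\partial K}{\partial d}\,\delta d$, hence $\sigma_K^2 = \left(\frac{\partial K}{\partial d}\right)^2 \sigma_d^2$
So: $\mathrm{Var}(K) \approx \left(\frac{\partial K}{\partial d}\right)^2 \mathrm{Var}(d)$
Generation of a sequence of length n with substitution rate d is a binomial process:
$$\mathrm{Prob}(k) = \binom{n}{k}\, d^k (1-d)^{n-k}$$
and therefore with variance: Var(d) = d(1-d)/n
Because of the Jukes-Cantor formula $K \approx -\tfrac34 \ln\left(1 - \tfrac43 d\right)$:
$$\frac{\partial K}{\partial d} = \frac{1}{1 - \tfrac43 d}$$
So:
$$\mathrm{Var}(K) \approx \frac{d(1-d)}{n\left(1 - \tfrac43 d\right)^2}$$
Figure: Var(K)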
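The variance estimate Var(K) ≈ d(1 − d) / (n (1 − 4d/3)²) shows how quickly the uncertainty of the corrected distance grows near saturation. The sketch below is illustrative; the function name and the sequence length `n` are assumptions, not values from the lecture.

```python
import math


def jc_variance(d: float, n: int) -> float:
    """Approximate variance of the Jukes-Cantor distance for n sites."""
    if not 0 <= d < 0.75:
        raise ValueError("d must satisfy 0 <= d < 3/4")
    if n <= 0:
        raise ValueError("n must be positive")
    return d * (1 - d) / (n * (1 - 4 * d / 3) ** 2)


n = 1000  # hypothetical sequence length
for d in (0.05, 0.3, 0.6, 0.7):
    sd = math.sqrt(jc_variance(d, n))
    print(f"d = {d:.2f}  ->  standard deviation of K = {sd:.3f}")
```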
EXAMPLE 5.4 on page 90

* Create artificial data with n = 1000: generate K* mutations
* Count d
* Reconstruct the estimate K(d) with the Jukes-Cantor relation
* Plot K(d) against K*
Figure: EXAMPLE 5.4 on page 90 (= FIG 5.3)
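The steps of this example can be sketched as a small simulation. This is one possible reading of the recipe, not the book's own code: mutations are placed at uniformly random sites, each replacing the current nucleotide by one of the other three, and K* is taken as the number of mutations divided by n. The seed and the list of mutation counts are arbitrary.

```python
import math
import random

BASES = "ACGT"


def mutate(seq, n_mutations: int, rng: random.Random):
    """Apply n_mutations substitutions at random sites; repeated hits are allowed."""
    seq = list(seq)
    for _ in range(n_mutations):
        pos = rng.randrange(len(seq))
        seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
    return seq


def jukes_cantor(d: float) -> float:
    """Jukes-Cantor estimate K(d); infinite at or beyond saturation."""
    return -0.75 * math.log(1 - 4 * d / 3) if d < 0.75 else float("inf")


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 1000
original = [rng.choice(BASES) for _ in range(n)]

results = []  # tuples of (K*, observed d, estimate K(d))
for n_mut in (50, 200, 500, 1000):
    evolved = mutate(original, n_mut, rng)
    d = sum(x != y for x, y in zip(original, evolved)) / n
    results.append((n_mut / n, d, jukes_cantor(d)))

for k_star, d, k_est in results:
    print(f"K* = {k_star:.2f}  d = {d:.3f}  K(d) = {k_est:.3f}")
```

Plotting the third column against the first should give points scattered around the diagonal, with the scatter widening as K* grows.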
The Kimura 2-parameter model

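Include substitution bias in the correction factor
Transition probability (G↔A and T↔C) per site per generation is $\alpha$
Transversion probability (G↔T, G↔C, A↔T, and A↔C) per site per generation is $\beta$, shared equally between the two possible transversions
The one-step Markov process substitution matrix now becomes (rows and columns ordered A, C, G, T):
$$M_{K2P} = \begin{pmatrix} 1-\alpha-\beta & \beta/2 & \alpha & \beta/2 \\ \beta/2 & 1-\alpha-\beta & \beta/2 & \alpha \\ \alpha & \beta/2 & 1-\alpha-\beta & \beta/2 \\ \beta/2 & \alpha & \beta/2 & 1-\alpha-\beta \end{pmatrix}$$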
After t generations the substitution probability is:
$$M(t) = M_{K2P}^{\,t}$$
Determine for $M_{K2P}$ the eigenvalues $\{\lambda_i\}$ and eigenvectors $\{v_i\}$
Spectral decomposition of M(t):
$$M_{K2P}^{\,t} = \sum_i \lambda_i^t\, v_i v_i^T$$
Determine the fraction of transitions per site after t generations: P(t)
Determine the fraction of transversions per site after t generations: Q(t)
Genetic distance:
$$K \approx -\tfrac12 \ln(1 - 2P - Q) - \tfrac14 \ln(1 - 2Q)$$
Fraction of substitutions: d = P + Q → Jukes-Cantor
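The Kimura distance K = −1/2 ln(1 − 2P − Q) − 1/4 ln(1 − 2Q) can be evaluated from the observed transition fraction P and transversion fraction Q. The sketch below uses illustrative function names and hypothetical values; it also checks that when substitutions are unbiased (P = d/3, Q = 2d/3) the result coincides with the Jukes-Cantor distance −3/4 ln(1 − 4d/3).

```python
import math


def kimura_2p_distance(P: float, Q: float) -> float:
    """Kimura 2-parameter distance from transition (P) and transversion (Q) fractions."""
    a = 1 - 2 * P - Q
    b = 1 - 2 * Q
    if a <= 0 or b <= 0:
        raise ValueError("P and Q are too large: the distance is undefined")
    return -0.5 * math.log(a) - 0.25 * math.log(b)


def jukes_cantor_distance(d: float) -> float:
    """Jukes-Cantor distance for an observed proportion of differences d."""
    return -0.75 * math.log(1 - 4 * d / 3)


d = 0.3  # hypothetical observed proportion of differences
unbiased = kimura_2p_distance(d / 3, 2 * d / 3)  # no transition bias
biased = kimura_2p_distance(0.25, 0.05)          # same d, mostly transitions

print(f"Jukes-Cantor:        {jukes_cantor_distance(d):.4f}")
print(f"Kimura, unbiased:    {unbiased:.4f}")
print(f"Kimura, transitions: {biased:.4f}")
```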
Other models for nucleotide evolution

* Different types of transitions/transversions
* Pairwise substitutions: GTR (= General Time Reversible) model
* Amino-acid substitution matrices
LIMITATION:
All of the above models assume symmetric substitution probabilities:
prob(A→T) = prob(T→A)
There is now strong evidence that this assumption does not hold
Challenge: incorporate this in a self-consistent model
5.5 CASE STUDY: Neanderthals

* mtDNA of 206 H. sapiens from different regions
* Fragments of mtDNA of 2 H. neanderthalensis, including the original 1856 specimen
* All 208 samples from GenBank
* A homologous sequence of 800 bp of the HVR (hypervariable region) could be found in all 208 specimens
* Pairwise genetic differences corrected with the Jukes-Cantor formula
* d(i,j) is the JC-corrected genetic difference between pair (i,j)
* $d^T = d$ (the distance table is symmetric)
* MDS (Multi-Dimensional Scaling): translate the distance table d into an n-dimensional map X, here a 2D map
Figure: distance map d(i,j)
Figure: MDS map, H. neanderthalensis well separated from H. sapiens
Figure: phylogenetic tree
END of LECTURE 5
