Bioinformatics
Introduction to Bioinformatics.
LECTURE 5: Variation within and between
species
LECTURE 5: INTER- AND INTRASPECIES VARIATION
5.1 VARIATION IN DNA SEQUENCES
Diploid chromosomes
GERM LINE
Reverse time and follow your cells:
Now you count ~ 10^13 cells
One generation ago you had 2 cells somewhere in your parents' bodies
Small T generations ago you had (2^T - multiple ancestors) cells
Large T generations ago you counted #(fertile ancestors) cells
Congratulations: you are 3.4 billion years old!
Purines Pyrimidines
Transitions Transversions
5.2 MITOCHONDRIAL DNA
H. sapiens mitochondrion
5.3 VARIATION BETWEEN SPECIES
Substitution rate
* Mutations originate in single individuals
* Mutations can become fixed in a population
* Mutation rate: rate at which new mutations arise
* Substitution rate: rate at which a species fixes new mutations
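* For neutral mutations: substitution rate = mutation rate (in a diploid population of size N, 2Nμ new mutations arise per generation and each is fixed with probability 1/(2N))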
5.4 ESTIMATING GENETIC DISTANCE
GACTGATCCACCTCTGATCCTTTGGAACTGATCGT
TTCTGATCCACCTCTGATCCTTTGGAACTGATCGT
TTCTGATCCACCTCTGATCCATCGGAACTGATCGT
GTCTGATCCACCTCTGATCCATTGGAACTGATCGT
observed: 2 (= d)
actual: 4 (= K)
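The observed count can be reproduced by comparing the first and last sequences site by site. The sketch below is a minimal illustration, assuming the sequences are already aligned and of equal length; the function names `count_differences` and `observed_fraction` are hypothetical helpers, not part of the lecture.

```python
def count_differences(seq_a, seq_b):
    """Number of aligned sites at which the two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def observed_fraction(seq_a, seq_b):
    """Fraction of differing sites (observed distance per site)."""
    return count_differences(seq_a, seq_b) / len(seq_a)


# First and last sequence from the slide.
first = "GACTGATCCACCTCTGATCCTTTGGAACTGATCGT"
last = "GTCTGATCCACCTCTGATCCATTGGAACTGATCGT"

n_diff = count_differences(first, last)
print("differing sites:", n_diff)
print("fraction of sites:", observed_fraction(first, last))
```

Only the net differences are visible in such a comparison; substitutions that were later reverted or overwritten leave no trace, which is why the actual number of substitutions can be larger.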
5.4 THE JUKES-CANTOR MODEL
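Therefore, the one-step Markov process has the following transition matrix, where $\alpha$ is the probability of a substitution per site per generation:

$$
M_{JC} =
\begin{pmatrix}
1-\alpha & \alpha/3 & \alpha/3 & \alpha/3 \\
\alpha/3 & 1-\alpha & \alpha/3 & \alpha/3 \\
\alpha/3 & \alpha/3 & 1-\alpha & \alpha/3 \\
\alpha/3 & \alpha/3 & \alpha/3 & 1-\alpha
\end{pmatrix}
$$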
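After t generations the substitution probability is:

$$M(t) = M_{JC}^{\,t}$$

Eigenvalues and eigenvectors of $M_{JC}$:

$\lambda_1 = 1$ (multiplicity 1): $v_1 = \tfrac{1}{2}\,(1\ 1\ 1\ 1)^T$ (unit length)

$\lambda_2 = 1 - \tfrac{4}{3}\alpha$ (multiplicity 3): eigenvectors orthogonal to $v_1$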
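Spectral decomposition of M(t), with orthonormal eigenvectors $v_i$:

$$M_{JC}^{\,t} = \sum_i \lambda_i^{\,t}\, v_i v_i^T$$

Define M(t) as:

$$
M_{JC}^{\,t} =
\begin{pmatrix}
r(t) & s(t) & s(t) & s(t) \\
s(t) & r(t) & s(t) & s(t) \\
s(t) & s(t) & r(t) & s(t) \\
s(t) & s(t) & s(t) & r(t)
\end{pmatrix}
$$

$$s(t) = \tfrac{1}{4} - \tfrac{1}{4}\left(1 - \tfrac{4}{3}\alpha\right)^{t}$$

Each row sums to one, so $r(t) = 1 - 3\,s(t)$.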
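The closed form for s(t) can be checked numerically. The sketch below is a minimal illustration, assuming the symbol lost from the slide is the per-generation substitution probability α, so that s(t) = 1/4 − 1/4 (1 − 4α/3)^t. It raises the one-step Jukes-Cantor matrix to the power t by repeated multiplication and compares the entries with the closed form. The helper names and the values of `alpha` and `t` are arbitrary choices for illustration.

```python
def jc_matrix(alpha):
    """One-step Jukes-Cantor matrix: 1 - alpha on the diagonal, alpha/3 elsewhere."""
    return [[1.0 - alpha if i == j else alpha / 3.0 for j in range(4)]
            for i in range(4)]


def mat_mul(a, b):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, t):
    """m raised to the integer power t by repeated multiplication."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = mat_mul(result, m)
    return result


def s_closed_form(alpha, t):
    """Closed-form off-diagonal entry of the t-step matrix."""
    return 0.25 - 0.25 * (1.0 - 4.0 * alpha / 3.0) ** t


alpha, t = 0.01, 50          # illustrative values only
m_t = mat_pow(jc_matrix(alpha), t)
s_t = s_closed_form(alpha, t)

print("off-diagonal:", m_t[0][1], "closed form s(t):", s_t)
print("diagonal:    ", m_t[0][0], "1 - 3 s(t):      ", 1.0 - 3.0 * s_t)
```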
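Probability s(t) that a site has changed into one specific other base after t generations:

$$s(t) = \tfrac{1}{4} - \tfrac{1}{4}\left(1 - \tfrac{4}{3}\alpha\right)^{t}$$

Observed genetic distance d after t generations (a site can differ in three ways, so d ≈ 3 s(t)):

$$d = \tfrac{3}{4} - \tfrac{3}{4}\left(1 - \tfrac{4}{3}\alpha\right)^{t}$$

For small $\alpha$, using $\ln\left(1 - \tfrac{4}{3}\alpha\right) \approx -\tfrac{4}{3}\alpha$:

$$\alpha t \approx -\tfrac{3}{4}\,\ln\left(1 - \tfrac{4}{3}d\right)$$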
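For small $\alpha$ the observed genetic distance d satisfies:

$$\alpha t \approx -\tfrac{3}{4}\,\ln\left(1 - \tfrac{4}{3}d\right)$$

The actual genetic distance is (of course):

$$K = \alpha t$$

So:

$$K \approx -\tfrac{3}{4}\,\ln\left(1 - \tfrac{4}{3}d\right)$$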
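The Jukes-Cantor formula:

$$K \approx -\tfrac{3}{4}\,\ln\left(1 - \tfrac{4}{3}d\right)$$

For saturation, $d \to \tfrac{3}{4}$:

$$K \to \infty$$

So: if the observed distance corresponds to the distance between random sequences, then the actual distance becomes indeterminate.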
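The correction and its saturation behaviour can be sketched in a few lines. This is a minimal illustration of the formula K ≈ −3/4 ln(1 − 4d/3), where d is the observed fraction of differing sites. The function name `jukes_cantor_distance` and the choice to return infinity at or beyond d = 3/4 are assumptions made for this example.

```python
import math


def jukes_cantor_distance(d):
    """Jukes-Cantor estimate K from the observed fraction d of differing sites."""
    if not 0.0 <= d <= 1.0:
        raise ValueError("d must be a fraction between 0 and 1")
    if d >= 0.75:
        # Saturation: the sequences look random relative to each other,
        # so the actual distance cannot be determined.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * d / 3.0)


# The correction is negligible for small d and grows without bound near 3/4.
for d in (0.01, 0.10, 0.30, 0.60, 0.74):
    print(f"d = {d:.2f}  ->  K = {jukes_cantor_distance(d):.4f}")
```

For small d the estimate stays close to d itself, which matches the intuition that multiple substitutions at one site are rare when few sites have changed.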
Jukes-Cantor
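Variance in K

A small error $\delta d$ in d propagates to K as:

$$\delta K \approx \frac{\partial K}{\partial d}\,\delta d
\quad\Rightarrow\quad
(\delta K)^2 \approx \left(\frac{\partial K}{\partial d}\right)^{2} (\delta d)^2$$

So:

$$\mathrm{Var}(K) \approx \left(\frac{\partial K}{\partial d}\right)^{2} \mathrm{Var}(d),
\qquad
\frac{\partial K}{\partial d} = \frac{1}{1 - \tfrac{4}{3}d}$$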
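Variance of the observed fraction d over n sites: $\mathrm{Var}(d) = d(1-d)/n$

Jukes-Cantor:

$$\frac{\partial K}{\partial d} = \frac{1}{1 - \tfrac{4}{3}d}$$

So:

$$\mathrm{Var}(K) \approx \frac{d\,(1-d)}{n\left(1 - \tfrac{4}{3}d\right)^{2}}$$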
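The variance formula shows how the uncertainty in K grows as d approaches saturation. The sketch below is a minimal illustration of Var(K) ≈ d(1 − d) / (n (1 − 4d/3)²), with n the number of compared sites. The function name `jc_variance` and the example value n = 1000 are assumptions for this example only.

```python
import math


def jc_variance(d, n):
    """Approximate variance of the Jukes-Cantor distance K.

    d: observed fraction of differing sites (0 <= d < 3/4)
    n: number of compared sites
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= d < 0.75:
        raise ValueError("d must lie in [0, 3/4); K is undefined at saturation")
    return d * (1.0 - d) / (n * (1.0 - 4.0 * d / 3.0) ** 2)


n_sites = 1000  # illustrative sequence length
for d in (0.05, 0.30, 0.60, 0.70):
    var_k = jc_variance(d, n_sites)
    print(f"d = {d:.2f}  Var(K) = {var_k:.6f}  sd = {math.sqrt(var_k):.4f}")
```

The denominator shrinks towards zero as d approaches 3/4, so estimates of large distances carry much larger uncertainty than estimates of small ones.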
Var(K)
5.4 EXAMPLE 5.4 on page 90
5.4 EXAMPLE 5.4 on page 90 (= FIG 5.3)
5.4 THE KIMURA 2-PARAM MODEL
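The one-step Markov process substitution matrix now becomes, with $\alpha$ the transition probability and $\beta$ the total transversion probability per site per generation ($\beta/2$ for each of the two possible transversions), and bases ordered A, G, C, T:

$$
M_{K2P} =
\begin{pmatrix}
1-\alpha-\beta & \alpha & \beta/2 & \beta/2 \\
\alpha & 1-\alpha-\beta & \beta/2 & \beta/2 \\
\beta/2 & \beta/2 & 1-\alpha-\beta & \alpha \\
\beta/2 & \beta/2 & \alpha & 1-\alpha-\beta
\end{pmatrix}
$$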
After t generations the substitution probability is:

$$M(t) = M_{K2P}^{\,t}$$

Determine for $M_{K2P}$:
eigenvalues $\{\lambda_i\}$
and eigenvectors $\{v_i\}$
Spectral decomposition of M(t):

$$M_{K2P}^{\,t} = \sum_i \lambda_i^{\,t}\, v_i v_i^T$$

Determine the fraction of transitions per site after t generations: P(t)
Determine the fraction of transversions per site after t generations: Q(t)

Genetic distance:

$$K \approx -\tfrac{1}{2}\,\ln(1 - 2P - Q) - \tfrac{1}{4}\,\ln(1 - 2Q)$$

Fraction of substitutions: d = P + Q (the quantity used in the Jukes-Cantor formula)
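The Kimura distance can be computed from two aligned sequences by counting transitions and transversions separately. The sketch below is a minimal illustration of K ≈ −1/2 ln(1 − 2P − Q) − 1/4 ln(1 − 2Q). It assumes the sequences contain only A, C, G and T and are already aligned. The function names are hypothetical, and returning infinity when a logarithm argument is not positive is a choice made for this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def transition_transversion_fractions(seq_a, seq_b):
    """Return (P, Q): fractions of sites showing a transition / a transversion.

    Assumes aligned sequences over the alphabet A, C, G, T only.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b:
            continue
        pair = {a, b}
        # Purine <-> purine or pyrimidine <-> pyrimidine is a transition.
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    n = len(seq_a)
    return transitions / n, transversions / n


def kimura_2p_distance(p, q):
    """Kimura two-parameter distance from transition (p) and transversion (q) fractions."""
    arg1 = 1.0 - 2.0 * p - q
    arg2 = 1.0 - 2.0 * q
    if arg1 <= 0.0 or arg2 <= 0.0:
        return math.inf  # saturated: distance cannot be estimated
    return -0.5 * math.log(arg1) - 0.25 * math.log(arg2)


first = "GACTGATCCACCTCTGATCCTTTGGAACTGATCGT"
last = "GTCTGATCCACCTCTGATCCATTGGAACTGATCGT"
p, q = transition_transversion_fractions(first, last)
print("P =", p, "Q =", q, "K =", kimura_2p_distance(p, q))
```

When transversions are exactly twice as frequent as transitions (P = d/3, Q = 2d/3), both logarithm arguments equal 1 − 4d/3 and the expression reduces to the Jukes-Cantor formula.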
5.5 CASE STUDY: Neanderthals
* Pairwise genetic differences are corrected with the Jukes-Cantor formula
* d(i,j) is the JC-corrected genetic difference between the pair (i,j)
* d^T = d (the distance table is symmetric)
* MDS (multidimensional scaling): translate the distance table d into an n-dimensional map X, here a 2D map
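One common way to turn a symmetric distance table into a 2D map is classical (Torgerson) MDS. The slides do not say which MDS variant was used, so the sketch below is an assumption rather than the lecture's method. It double-centres the squared distances and extracts the two leading eigenvectors by power iteration, using only the standard library. The distance table `toy_dist` is made-up illustration data, not the Neanderthal data, and the code assumes the distances are close to Euclidean so that the leading eigenvalues are positive.

```python
import math


def double_center(dist):
    """B = -1/2 * J * D^2 * J for a symmetric distance table D."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]   # equals column means (symmetry)
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def top_eigenpair(matrix, iterations=1000):
    """Dominant eigenvalue and unit eigenvector by power iteration."""
    n = len(matrix)
    vec = [float(i + 1) for i in range(n)]    # fixed, non-constant start vector
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return 0.0, [0.0] * n
        vec = [x / norm for x in nxt]
    mv = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
    return sum(v * m for v, m in zip(vec, mv)), vec   # Rayleigh quotient


def classical_mds(dist, dims=2):
    """Coordinates X (one row per item) whose distances approximate dist."""
    n = len(dist)
    b = double_center(dist)
    coords = [[] for _ in range(n)]
    for _ in range(dims):
        lam, vec = top_eigenpair(b)
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i].append(scale * vec[i])
        # Deflate: remove the component just extracted.
        b = [[b[i][j] - lam * vec[i] * vec[j] for j in range(n)] for i in range(n)]
    return coords


# Made-up symmetric distance table for four items (illustration only).
toy_dist = [
    [0.0, 3.0, 4.0, 5.0],
    [3.0, 0.0, 5.0, 4.0],
    [4.0, 5.0, 0.0, 3.0],
    [5.0, 4.0, 3.0, 0.0],
]
coords = classical_mds(toy_dist)
for point in coords:
    print([round(x, 3) for x in point])
```

The resulting map is only defined up to rotation and reflection, so what carries meaning is the relative placement of the points, for example whether two groups end up well separated.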
distance map d(i,j)
MDS map: H. neanderthalensis and H. sapiens are well separated
phylogenetic tree
END of LECTURE 5