Lecture 5 & 6
Sequence Alignment
What is sequence alignment?
Procedure of comparing sequences
Two sequences (Pair-wise Sequence Alignment)
More than two (Multiple Sequence Alignment)
Match
Mis-match
Global VS Local*****
Global Alignment
Attempts to align the sequences over their entire length
Suitable for similar sequences of roughly equal length
Global alignment:
CTGTCG-CTGCACG
-TGC-CG-TG---
Local alignment:
CTGTCGCTGCACG
--------TGC-CGTG
Local Alignment
Gathers islands of matches
Stretches of sequence with the highest density of matches are aligned
Suitable for partially similar sequences, sequences of different length, and sequences containing conserved regions
Frequency of mutations
Substitution > Insertion, Deletion >> Duplication > Inversion
Common mutations***
Deletion:
A C G T T G A C
A C G A C
Insertion:
A C G T T G A C
A C G C A A T T G A C
Duplication:
A C G T T G A C
A C G T T G A T T G A C
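The three examples above can be reproduced with plain string slicing. This is only an illustrative sketch: the helper names `delete`, `insert` and `duplicate`, and the 0-based positions passed to them, are assumptions chosen to match the slide's sequences, not part of the lecture.

```python
def delete(seq, start, length):
    """Remove `length` bases beginning at 0-based index `start`."""
    return seq[:start] + seq[start + length:]


def insert(seq, pos, fragment):
    """Insert `fragment` before the base at 0-based index `pos`."""
    return seq[:pos] + fragment + seq[pos:]


def duplicate(seq, start, length):
    """Tandem duplication: repeat seq[start:start+length] right after itself."""
    end = start + length
    return seq[:end] + seq[start:end] + seq[end:]


original = "ACGTTGAC"
print(delete(original, 3, 3))      # ACGAC
print(insert(original, 3, "CAA"))  # ACGCAATTGAC
print(duplicate(original, 3, 4))   # ACGTTGATTGAC
```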
Terminology *****
Homolog
A gene related to a second gene by descent from a common ancestral DNA sequence
Ortholog
Orthologs are genes in different species that evolved from a common ancestral gene by speciation
Paralog
Paralogs are genes related by duplication within a genome
[Figure: dynamic-programming score matrix for a global alignment example, with sequence letters along the axes; the cell layout is not recoverable from the extracted text]
Traceback can yield both optimum alignments
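A minimal sketch of global alignment by dynamic programming, written to show how a traceback can return more than one optimal alignment. The scoring values (match +10, mismatch -2, gap -5) and the sequences `CATT` and `CAT` are assumptions chosen for illustration, and the function name `needleman_wunsch` is hypothetical.

```python
def needleman_wunsch(a, b, match=10, mismatch=-2, gap=-5):
    """Return the optimal global score and every optimal alignment of a and b."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1] (i, j are matrix indices).
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the score matrix; row 0 and column 0 hold pure gap penalties.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # match or mismatch
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    def traceback(i, j):
        # Follow every predecessor that explains score[i][j], so ties
        # produce several alignments.
        if i == 0 and j == 0:
            yield "", ""
            return
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            for x, y in traceback(i - 1, j - 1):
                yield x + a[i - 1], y + b[j - 1]
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            for x, y in traceback(i - 1, j):
                yield x + a[i - 1], y + "-"
        if j > 0 and score[i][j] == score[i][j - 1] + gap:
            for x, y in traceback(i, j - 1):
                yield x + "-", y + b[j - 1]

    return score[n][m], list(traceback(n, m))


best, alignments = needleman_wunsch("CATT", "CAT")
print(best)  # 25
for top, bottom in alignments:
    print(top)
    print(bottom)
    print()
```

With these assumed scores the two optimal alignments are `CATT` over `CA-T` and `CATT` over `CAT-`; both score 25 because the final `T` of the shorter sequence can pair with either `T` of the longer one.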
Local Alignment***
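Local alignment can be sketched with the same dynamic-programming idea, except that cell scores are never allowed to drop below zero and the traceback starts at the highest-scoring cell. The code below is an illustrative Smith-Waterman sketch under assumed scores (match +10, mismatch -2, gap -5); the function name and the test sequences are hypothetical, and only one best local alignment is returned.

```python
def smith_waterman(a, b, match=10, mismatch=-2, gap=-5):
    """Return (best score, aligned piece of a, aligned piece of b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                          # start a fresh local alignment
                score[i - 1][j - 1] + sub,  # match or mismatch
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("GGGCATGGG", "TTCATTT"))  # (30, 'CAT', 'CAT')
```

Only the shared island `CAT` is reported; the flanking bases, which do not match, are left out of the alignment.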
Introduction to Bioinformatics
Lecture 7
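Difficult to score
Dynamic-programming recurrence for aligning three sequences v, w and u, where \delta scores one alignment column and _ denotes a gap:
\[
s_{i,j,k} = \max
\begin{cases}
s_{i-1,j-1,k-1} + \delta(v_i, w_j, u_k) & \text{cube diagonal: no in/dels} \\
s_{i-1,j-1,k} + \delta(v_i, w_j, \_) & \text{face diagonal: one in/del} \\
s_{i-1,j,k-1} + \delta(v_i, \_, u_k) & \text{face diagonal: one in/del} \\
s_{i,j-1,k-1} + \delta(\_, w_j, u_k) & \text{face diagonal: one in/del} \\
s_{i-1,j,k} + \delta(v_i, \_, \_) & \text{edge diagonal: two in/dels} \\
s_{i,j-1,k} + \delta(\_, w_j, \_) & \text{edge diagonal: two in/dels} \\
s_{i,j,k-1} + \delta(\_, \_, u_k) & \text{edge diagonal: two in/dels}
\end{cases}
\]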
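The seven-way recurrence can be sketched directly as a three-dimensional table fill. The column score used here is a sum-of-pairs score with assumed values (match +1, mismatch -1, letter against gap -1, gap against gap 0); the lecture does not specify a scoring scheme, so `column_score` and `align3_score` are hypothetical names for illustration only.

```python
from itertools import product


def column_score(x, y, z, match=1, mismatch=-1, gap=-1):
    """Sum-of-pairs score of one alignment column; '_' marks a gap."""
    total = 0
    for p, q in ((x, y), (x, z), (y, z)):
        if p == "_" and q == "_":
            continue  # gap against gap contributes nothing
        if p == "_" or q == "_":
            total += gap
        else:
            total += match if p == q else mismatch
    return total


def align3_score(v, w, u):
    """Optimal score of aligning three sequences with the 7-way recurrence."""
    n, m, l = len(v), len(w), len(u)
    neg_inf = float("-inf")
    s = [[[neg_inf] * (l + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    s[0][0][0] = 0

    # The 7 predecessor moves: every non-zero step in {0,1}^3.
    moves = [d for d in product((0, 1), repeat=3) if any(d)]

    # Lexicographic order guarantees predecessors are filled first.
    for i, j, k in product(range(n + 1), range(m + 1), range(l + 1)):
        for di, dj, dk in moves:
            pi, pj, pk = i - di, j - dj, k - dk
            if pi < 0 or pj < 0 or pk < 0:
                continue
            col = column_score(
                v[pi] if di else "_",
                w[pj] if dj else "_",
                u[pk] if dk else "_",
            )
            s[i][j][k] = max(s[i][j][k], s[pi][pj][pk] + col)
    return s[n][m][l]


print(align3_score("ACG", "ACG", "ACG"))  # 9
```

The table has (n+1)(m+1)(l+1) cells with seven predecessors each, which is why this exact approach does not scale to many sequences.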
Introduction to Bioinformatics
Lecture 8
Database searching
Instead of running full dynamic programming against every database sequence, use faster heuristic approaches
FASTA [Pearson & Lipman, 1988]
BLAST [Altschul et al., 1990;
Smith-Waterman is slower, but more sensitive
FASTA
[Figure series: step-by-step FASTA lookup example. Each panel shows the query hash table, the target table, and the offset recorded for each matching position; the individual cell values are not recoverable from the extracted text.]
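The hash-table, target-table and offset idea can be sketched as follows. This is a simplified illustration of the word-lookup stage only, not the full FASTA program: the function names, the word length `k = 2`, the example sequences, and the convention that offset means query position minus target position are all assumptions.

```python
from collections import Counter, defaultdict


def build_hash_table(query, k):
    """Map each k-letter word of the query to its 1-based start positions."""
    table = defaultdict(list)
    for i in range(len(query) - k + 1):
        table[query[i:i + k]].append(i + 1)
    return table


def diagonal_hits(query, target, k=2):
    """Count word hits per offset (query position minus target position)."""
    table = build_hash_table(query, k)
    offsets = Counter()
    for j in range(len(target) - k + 1):
        word = target[j:j + k]
        for i in table.get(word, ()):
            offsets[i - (j + 1)] += 1
    return offsets


hits = diagonal_hits("ACGTAC", "TTACGTAC", k=2)
print(hits.most_common(1))  # [(-2, 5)]
```

Hits that share an offset lie on the same diagonal of the alignment matrix, so an offset with many hits marks a region worth aligning more carefully.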
Introduction to Bioinformatics
Lecture 9
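[Figure: four values, 29.5, 20.4, 20.5 and 29.6; their labels are not recoverable from the extracted text]
Probability of a sequence x = x_1 ... x_L under a Markov chain:
\[
P(x) = P(x_1) \prod_{i=2}^{L} a_{x_{i-1} x_i}, \quad \text{where } a_{x_{i-1} x_i} = P(x_i \mid x_{i-1})
\]
With a begin state B:
\[
P(x) = a_{B x_1} \prod_{i=2}^{L} a_{x_{i-1} x_i}, \quad \text{where } a_{B x_1} = P(x_1)
\]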
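As a small worked illustration of the product formula, the sketch below multiplies an initial probability by one transition probability per adjacent pair of symbols, working in log space to avoid underflow on long sequences. The transition values are made up for the example and are not taken from the lecture; `markov_log_prob` is a hypothetical helper name.

```python
import math


def markov_log_prob(x, initial, transition):
    """log P(x) = log P(x_1) + sum over i >= 2 of log a[x_{i-1}][x_i]."""
    logp = math.log(initial[x[0]])
    for prev, cur in zip(x, x[1:]):
        logp += math.log(transition[prev][cur])
    return logp


# Hypothetical parameters; each row of `transition` sums to 1.
initial = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
transition = {
    "A": {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
    "C": {"A": 0.3, "C": 0.3, "G": 0.1, "T": 0.3},
    "G": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "T": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}

# P(ACG) = P(A) * a[A][C] * a[C][G] = 0.25 * 0.2 * 0.1
print(math.exp(markov_log_prob("ACG", initial, transition)))
```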
HMM:
CpG Islands
Written CpG to distinguish it from a CG base pair
Transition probabilities
Prob(Fair → Loaded) = 0.01
Prob(Loaded → Fair) = 0.2
Transitions between states obey a Markov process
Transition probabilities a_kl:
          to Fair   to Loaded
Fair      0.99      0.01
Loaded    0.20      0.80
Emission probabilities e_k(b):
Roll b    1      2      3      4      5      6
Fair      1/6    1/6    1/6    1/6    1/6    1/6
Loaded    1/10   1/10   1/10   1/10   1/10   1/2
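Three candidate state paths π for the same rolls x:
π(1) = FFF
π(2) = LLL
π(3) = LFL
[The values of Pr(x, π(1)), Pr(x, π(2)) and Pr(x, π(3)) are not recoverable from the extracted text]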
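The joint probability of a roll sequence and a state path is the product of the start probability, one emission per roll, and one transition per step. The sketch below evaluates it for the three paths listed above. Two things are assumptions: the rolls x = 6, 2, 6 and the start probability of 1/2 for each state are borrowed from the Viterbi example in this deck rather than stated on this slide, and `joint_probability` is a hypothetical helper name.

```python
from fractions import Fraction as F

# Dishonest-casino parameters from the transition and emission tables.
START = {"F": F(1, 2), "L": F(1, 2)}  # assumed start probabilities
TRANS = {
    "F": {"F": F(99, 100), "L": F(1, 100)},
    "L": {"F": F(1, 5), "L": F(4, 5)},
}
EMIT = {
    "F": {roll: F(1, 6) for roll in range(1, 7)},
    "L": {**{roll: F(1, 10) for roll in range(1, 6)}, 6: F(1, 2)},
}


def joint_probability(x, path):
    """Pr(x, path): start * emission, then transition * emission per step."""
    p = START[path[0]] * EMIT[path[0]][x[0]]
    for i in range(1, len(x)):
        p *= TRANS[path[i - 1]][path[i]] * EMIT[path[i]][x[i]]
    return p


x = [6, 2, 6]  # assumed rolls
for path in ("FFF", "LLL", "LFL"):
    print(path, float(joint_probability(x, path)))
```

Under these assumptions LLL is the most probable of the three paths (0.008), FFF follows (0.00226875), and LFL is far less likely because it needs two rare state switches.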
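Initialization (i = 0):
\[
v_0(0) = 1, \quad v_k(0) = 0 \text{ for } k > 0
\]
Recursion (i = 1, ..., L):
\[
v_k(i) = e_k(x_i) \max_r \big( v_r(i-1)\, a_{rk} \big)
\]
Termination:
\[
\Pr(x, \pi^*) = \max_k \big( v_k(L)\, a_{k0} \big)
\]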
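Viterbi: Example
x = 6, 2, 6
\[
v_k(i) = e_k(x_i) \max_r \big( v_r(i-1)\, a_{rk} \big)
\]
B: 0 in every column
i = 1 (x = 6):
Fair: (1/6)(1/2) = 1/12
Loaded: (1/2)(1/2) = 1/4
i = 2 (x = 2):
Fair: (1/6) max{(1/12)(0.99), (1/4)(0.2)} = 0.01375
Loaded: (1/10) max{(1/12)(0.01), (1/4)(0.8)} = 0.02
i = 3 (x = 6):
Fair: (1/6) max{(0.01375)(0.99), (0.02)(0.2)} = 0.00226875
Loaded: (1/2) max{(0.01375)(0.01), (0.02)(0.8)} = 0.008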
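The worked example can be checked with a short Viterbi sketch for the Fair/Loaded model. It is a minimal illustration, not a reference implementation: it uses plain probabilities rather than logs, takes the start probability of 1/2 per state from the example's first column, and assumes no explicit end state, so the termination step simply takes the largest value in the last column. The function name `viterbi` and its argument layout are hypothetical.

```python
def viterbi(x, states, start, trans, emit):
    """Return (probability of best path, best path, full v table)."""
    # First column: start probability times emission of the first symbol.
    v = [{k: start[k] * emit[k][x[0]] for k in states}]
    back = [{}]

    for i in range(1, len(x)):
        v.append({})
        back.append({})
        for k in states:
            # Best predecessor r maximises v_r(i-1) * a_rk.
            best_r = max(states, key=lambda r: v[i - 1][r] * trans[r][k])
            v[i][k] = emit[k][x[i]] * v[i - 1][best_r] * trans[best_r][k]
            back[i][k] = best_r

    # Termination without an end state: pick the best final state.
    last = max(states, key=lambda k: v[-1][k])
    path = [last]
    for i in range(len(x) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return v[-1][last], "".join(reversed(path)), v


states = ("F", "L")
start = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.99, "L": 0.01}, "L": {"F": 0.2, "L": 0.8}}
emit = {
    "F": {roll: 1 / 6 for roll in range(1, 7)},
    "L": {**{roll: 1 / 10 for roll in range(1, 6)}, 6: 1 / 2},
}

prob, path, table = viterbi([6, 2, 6], states, start, trans, emit)
print(path, prob)
for column in table:
    print(column)
```

For the rolls 6, 2, 6 the best path is LLL with probability 0.008, and the printed columns match the Fair and Loaded values of the worked example up to floating-point rounding.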
THANKS A LOT...