
Sequence Alignment Using Simulated Annealing

Prepared by: Samet DEMİR, M. Gökçe BEKAROGLU


Alignment Methods

Pairwise
Multiple sequence alignment
Structural
Phylogenetic
Multiple Sequence Alignment Methods

Multi-D Dynamic Programming: the Needleman-Wunsch or Smith-Waterman algorithm used on pairs is extended to many sequences, and the multi-dimensional alignment space is filled with possible matches or gaps.
Progressive Alignment: CLUSTALW, T-COFFEE
Iterative Alignment: MUSCLE
Probabilistic: Genetic Algorithm, Hidden Markov Models, Simulated Annealing
Simulated Annealing

Searching a large number of possible solutions to find the optimal one can be difficult (nearly impossible).
We need an algorithm that obtains a good enough solution within a reasonable time.
An algorithm inspired by annealing in metallurgy: Simulated Annealing

Optimization
Simulated Annealing

We need the global maximum or minimum.
Searching the entire space is impossible, so we study local areas.
We compare results from neighbouring areas to move toward the global optimum (exploration).

Analogy
Solution of the problem = state of the physical system
Cost of the solution = energy of the state
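In the standard Metropolis formulation of this analogy, a candidate state s' is compared with the current state s at temperature T. Because alignment maximizes a score, one natural choice (an assumption here, not stated by the authors) is E = -score, so higher-scoring alignments have lower energy:

```latex
\Delta E = E(s') - E(s), \qquad
P(\text{accept } s') =
\begin{cases}
1 & \text{if } \Delta E \le 0,\\
e^{-\Delta E / T} & \text{if } \Delta E > 0.
\end{cases}
```

At high T almost every move is accepted; as T falls, worse moves are accepted less and less often.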


Simulated Annealing


Advantages

Good at avoiding getting trapped in a local optimum
Good at finding a solution that is good enough and close to the global optimum
Simulated annealing is proven to converge to the optimal solution of a problem, provided the temperature is lowered slowly enough (e.g. a logarithmic cooling schedule); faster practical schedules give no such guarantee.
It is easy to adapt an existing local search method into a simulated annealing algorithm.
Simulated Annealing (How to run)

Write the sequences into settings.py
Set the gap_penalty, match and mismatch scores
Set the maximum gap count (for the longest sequence)
To run:
python driver.py
Simulated Annealing (Code)

Scorer.py
Fcn.py
Settings.py
Simann.py
Driver.py
Simulated Annealing (Scorer.py)

It calculates the scores of aligned sequences produced by different algorithms.
It takes the sequences and the gap_penalty, match and mismatch scores as input and returns the score of the alignment.
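A minimal sketch of such a scorer, assuming the common sum-of-pairs scheme: every pair of rows is compared column by column, '-' marks a gap, and a gap aligned with a gap scores 0. The function names pair_score and sum_of_pairs and the gap-against-gap convention are assumptions, not taken from Scorer.py.

```python
from itertools import combinations


def pair_score(a: str, b: str, gap_penalty: int, match: int, mismatch: int) -> int:
    """Score two aligned rows of equal length; '-' marks a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # gap against gap scores 0 (assumed convention)
        if x == "-" or y == "-":
            score += gap_penalty
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


def sum_of_pairs(alignment: list[str], gap_penalty: int, match: int, mismatch: int) -> int:
    """Add up the pairwise scores over every pair of rows in the alignment."""
    return sum(pair_score(a, b, gap_penalty, match, mismatch)
               for a, b in combinations(alignment, 2))
```

For example, sum_of_pairs(["AC-T", "ACGT", "A-GT"], gap_penalty=-1, match=1, mismatch=0) returns 4 (pair scores 2, 0 and 2).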
Simulated Annealing (Fcn.py)

It contains the functions the algorithm needs:
Calculator function: calculates the score of the given sequences
Putter function: puts gaps at the positions supplied by simulated annealing
Fitness function: returns the outputs simulated annealing needs to evaluate each trial alignment
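The three roles can be sketched as follows. The signatures are hypothetical: gap positions are taken to be indices in the final aligned row, and calculator is a compact sum-of-pairs scorer in which a gap against a gap scores 0 (an assumed convention).

```python
from itertools import combinations


def putter(seq: str, gap_positions: list[int]) -> str:
    """Hypothetical putter: return seq with '-' at each listed index of the aligned row."""
    width = len(seq) + len(gap_positions)
    gaps = set(gap_positions)
    if len(gaps) != len(gap_positions) or any(p < 0 or p >= width for p in gaps):
        raise ValueError("gap positions must be distinct and inside the aligned row")
    residues = iter(seq)
    # Non-gap slots are filled with the residues in their original order.
    return "".join("-" if i in gaps else next(residues) for i in range(width))


def calculator(rows: list[str], gap_penalty: int, match: int, mismatch: int) -> int:
    """Hypothetical calculator: sum-of-pairs score over all pairs of rows."""
    score = 0
    for a, b in combinations(rows, 2):
        for x, y in zip(a, b):
            if x == "-" and y == "-":
                continue  # gap against gap scores 0 (assumed)
            score += gap_penalty if "-" in (x, y) else (match if x == y else mismatch)
    return score


def fitness(sequences: list[str], gap_plan: list[list[int]],
            gap_penalty: int, match: int, mismatch: int) -> tuple[int, list[str]]:
    """Hypothetical fitness: apply the gap plan, then return (score, aligned rows)."""
    rows = [putter(s, g) for s, g in zip(sequences, gap_plan)]
    return calculator(rows, gap_penalty, match, mismatch), rows
```

For example, fitness(["ACT", "ACGT", "AGT"], [[2], [], [1]], -1, 1, 0) builds the rows AC-T, ACGT and A-GT and scores them 4.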
Simulated Annealing (Settings.py)

It takes the input sequences as strings.
It converts the sequences into arrays.
It finds the length of the longest array.
It calculates how many gaps to add to each sequence.
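The preparation steps can be sketched like this. The variable names, the score values and the rule that every row is padded to the longest length plus extra_gaps are assumptions made for illustration; the sequences are the third test set from this presentation.

```python
# Hypothetical settings sketch; names and score values are illustrative assumptions.
SEQUENCES = [
    "HVLGGQHFHVVRYDSAWAEDDHMLIP",
    "HAVLGQHSHVVRYFNSAWADWHMLI",
    "HAVLGQHSHVVRYFSAWAEDDMLI",
]
GAP_PENALTY = -1  # assumed value
MATCH = 1         # assumed value
MISMATCH = 0      # assumed value


def prepare(sequences: list[str], extra_gaps: int) -> tuple[list[list[str]], int, list[int]]:
    """Turn strings into character arrays and compute the gaps each row needs.

    Every aligned row is assumed to end up with length max_len + extra_gaps,
    where extra_gaps is the number of gaps added to the longest sequence.
    """
    arrays = [list(s) for s in sequences]      # make arrays from the sequences
    max_len = max(len(a) for a in arrays)      # length of the longest array
    target = max_len + extra_gaps
    gaps_per_sequence = [target - len(a) for a in arrays]
    return arrays, max_len, gaps_per_sequence
```

For example, prepare(SEQUENCES, 2) gives max_len = 26 and gaps_per_sequence = [2, 3, 4], since the three sequences have 26, 25 and 24 residues.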
Simulated Annealing (Simann.py)

It is the simulated annealing algorithm itself.
It takes its inputs from settings.py.
It uses fcn.py to obtain the score of each trial alignment.
It takes an argument from the user (given on the command line): the number of gaps to add to the longest sequence.
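A compact sketch of how such a program could anneal an alignment. Everything here is an assumption for illustration rather than the authors' code: the state is a set of gap positions per row, a move relocates one gap within one row, the score is sum-of-pairs with gap_penalty=-1, match=1, mismatch=0, the energy is the negative score, and cooling is geometric. The optional command-line argument plays the role of the gap count added to the longest sequence.

```python
import math
import random
import sys
from itertools import combinations


def sp_score(rows, gap_penalty=-1, match=1, mismatch=0):
    """Sum-of-pairs score; gap-against-gap columns score 0 (assumed convention)."""
    total = 0
    for a, b in combinations(rows, 2):
        for x, y in zip(a, b):
            if x == "-" and y == "-":
                continue
            total += gap_penalty if "-" in (x, y) else (match if x == y else mismatch)
    return total


def build_rows(sequences, gap_sets, width):
    """Place each row's residues in order, skipping that row's gap positions."""
    rows = []
    for seq, gaps in zip(sequences, gap_sets):
        residues = iter(seq)
        rows.append("".join("-" if i in gaps else next(residues) for i in range(width)))
    return rows


def neighbour(gap_sets, width, rng):
    """Move one gap of one random row to a residue position of that row."""
    new = [set(g) for g in gap_sets]
    candidates = [k for k, g in enumerate(new) if 0 < len(g) < width]
    if not candidates:
        return new  # nothing can move (e.g. no gaps to place)
    k = rng.choice(candidates)
    old = rng.choice(sorted(new[k]))
    target = rng.choice([i for i in range(width) if i not in new[k]])
    new[k].remove(old)
    new[k].add(target)
    return new


def anneal(sequences, extra_gaps, t0=2.0, cooling=0.995, steps=5000, seed=None):
    rng = random.Random(seed)
    width = max(len(s) for s in sequences) + extra_gaps
    state = [set(range(len(s), width)) for s in sequences]  # start: gaps at the end
    score = sp_score(build_rows(sequences, state, width))
    best_state, best_score = state, score
    t = t0
    for _ in range(steps):
        cand = neighbour(state, width, rng)
        cand_score = sp_score(build_rows(sequences, cand, width))
        delta = cand_score - score  # energy is -score, so this is -dE
        if delta >= 0 or rng.random() < math.exp(delta / t):
            state, score = cand, cand_score
            if score > best_score:
                best_state, best_score = state, score
        t = max(t * cooling, 1e-6)  # geometric cooling, floored to avoid division by zero
    return build_rows(sequences, best_state, width), best_score


if __name__ == "__main__":
    extra = int(sys.argv[1]) if len(sys.argv) > 1 else 0
    seqs = ["YAIKKKPLAGSVDEQNALREVYA", "MYAIKRSGKKPLAGSVWDEQNWLREVYA", "YAIKRSKKPLAGSVDEALREVYA"]
    rows, best = anneal(seqs, extra, seed=0)
    print("\n".join(rows))
    print("score:", best)
```

Run as python simann.py 2, this sketch would anneal the second test set with two gaps added to the longest sequence. Keeping the best state separately means a late uphill move can never lose the best alignment found so far.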
Simulated Annealing (Driver.py)

This program drives the simulated annealing algorithm.
It takes the maximum number of gaps to add and runs simann.py once for each gap count.
Example: if maximum_gaps_to_add = 10, it runs
python simann.py 0
python simann.py 1
python simann.py ...
It is also responsible for printing the output.
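A minimal sketch of such a driver using the standard subprocess module. It assumes the gap counts run from 0 to maximum_gaps_to_add inclusive and that simann.py takes the count as its only argument; build_commands and run_all are hypothetical names.

```python
import subprocess
import sys


def build_commands(max_gaps: int) -> list[list[str]]:
    """One simann.py invocation per gap count 0..max_gaps (inclusive range assumed)."""
    return [[sys.executable, "simann.py", str(g)] for g in range(max_gaps + 1)]


def run_all(max_gaps: int) -> None:
    """Run every invocation in turn and print its output under a header."""
    for cmd in build_commands(max_gaps):
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        print(f"--- gaps added to the longest sequence: {cmd[-1]} ---")
        print(result.stdout, end="")
```

Calling run_all(10) would run simann.py 11 times, with arguments 0 through 10. Using sys.executable runs simann.py with the same Python interpreter as the driver.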
Simulated Annealing Pseudo Code
Alignment with SA Pseudo Code
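A generic simulated annealing loop can be sketched in Python as follows. It is a textbook version under assumed parameters (starting temperature t0, geometric cooling factor, fixed step count), not the authors' pseudocode; simulated_annealing, cost and neighbour are hypothetical names.

```python
import math
import random
from typing import Callable, Optional, Tuple, TypeVar

State = TypeVar("State")


def simulated_annealing(
    start: State,
    cost: Callable[[State], float],
    neighbour: Callable[[State, random.Random], State],
    t0: float = 10.0,
    cooling: float = 0.99,
    steps: int = 2000,
    seed: Optional[int] = None,
) -> Tuple[State, float]:
    """Minimise cost starting from start; return the best state seen and its cost."""
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    t = t0
    for _ in range(steps):
        candidate = neighbour(current, rng)    # local-search move
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost  # change in "energy"
        # Metropolis rule: accept improvements, accept worse moves with prob. exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        t = max(t * cooling, 1e-12)            # geometric cooling with a floor
    return best, best_cost
```

Only cost and neighbour are problem-specific, which is why an existing local search can be turned into simulated annealing with little effort. For example, minimising (x - 3) ** 2 from x = 0 with moves of +1 or -1 finds x = 3.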
Simulated Annealing Results

Our sequences:
#sekans1="FHELWKIGSGEFGWFKCVKRLWGCI"
#sekans2="FHEEKGSWEFGGSVFCCVKLRLDGCI"
#sekans3="FHELEKIGSGEFGSVFCCVKCLDGCI"

#sekans1="YAIKKKPLAGSVDEQNALREVYA"
#sekans2="MYAIKRSGKKPLAGSVWDEQNWLREVYA"
#sekans3="YAIKRSKKPLAGSVDEALREVYA"

sekans1="HVLGGQHFHVVRYDSAWAEDDHMLIP"
sekans2="HAVLGQHSHVVRYFNSAWADWHMLI"
sekans3="HAVLGQHSHVVRYFSAWAEDDMLI"

Pairwise alignment scores:


1: 53
2: 55
3: 54
T-Coffee

T-Coffee uses a consistency-based objective function optimized using progressive alignment.

Pairwise alignment score:


1: 50
2: 54
3: 50
Clustal W

ClustalW is one of the most widely used multiple alignment programs.
It uses progressive alignment, in which an initial guide tree (calculated from pairwise alignments) guides the construction of the full multiple alignment.

Pairwise alignment score:


1: 52
2: 49
3: 42
MAFFT

MAFFT is a package of progressive alignment programs.
The package consists of five alignment programs, including:

A program that uses the fast Fourier transform (FFT) to rapidly identify homologous regions, builds a guide tree, and re-calculates the tree after a first alignment.
A program that adds an iterative alignment refinement step.
Finally, a program that incorporates local and global pairwise alignment information.

Pairwise alignment score:


1: 44
2: 55
3: 50
ProbCons

ProbCons: Probabilistic consistency-based multiple sequence alignment.
It uses pair hidden Markov models.

Pairwise alignment score:


1: 44
2: 55
3: 48
Parallel PRRN

PRRN:
Iterative refinement strategy with tree-dependent partitioning for multiple sequence alignment.
It performs a large number of pairwise group-to-group alignments to gradually improve the overall weighted sum-of-pairs score.
Like other hill-climbing strategies, it is not guaranteed to reach the true optimum.

Pairwise alignment score:


1: 44
2: 55
3: 50
Results

Pairwise Alignment Scores


Sequences                     Simulated annealing  ClustalW  T-Coffee  MAFFT  PRRN  ProbCons
FHELWKIGSGEFGWFKCVKRLWGCI     53                   52        50        44     44    44
FHEEKGSWEFGGSVFCCVKLRLDGCI
FHELEKIGSGEFGSVFCCVKCLDGCI
YAIKKKPLAGSVDEQNALREVYA       55                   49        54        55     55    55
MYAIKRSGKKPLAGSVWDEQNWLREVYA
YAIKRSKKPLAGSVDEALREVYA
HVLGGQHFHVVRYDSAWAEDDHMLIP    54                   42        50        50     48    50
HAVLGQHSHVVRYFNSAWADWHMLI
HAVLGQHSHVVRYFSAWAEDDMLI

Scores of the correct alignments: 53, 55, 54


Future Work

- Web interface (should be easy to implement, a few hours)
- Extensive testing
- Further improvements to simulated annealing
- Improvements using biological knowledge
Thank You for Listening
