
BLAST

In bioinformatics, BLAST (Basic Local Alignment Search Tool) is an
algorithm for comparing primary biological sequence information,
such as the amino-acid sequences of different proteins or the
nucleotides of DNA sequences.
BLAST ALGORITHM
Remove low-complexity regions or sequence
repeats from the query sequence.
A "low-complexity region" is a region of a sequence
composed of few kinds of elements. Such regions can
produce high scores that mislead the program and hide the
actually significant sequences in the database, so they
should be filtered out.
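
As a rough illustration of this filtering step, the sketch below masks every window whose Shannon entropy falls under a threshold. This is not the filter BLAST itself uses (BLAST relies on dedicated programs such as SEG for proteins and DUST for nucleotides). The function names, window size, threshold and mask character are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (in bits) of the letters in a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq, window=12, threshold=2.2, mask_char="X"):
    """Replace every window with entropy below `threshold` by `mask_char`.

    `window` and `threshold` are illustrative values, not BLAST defaults.
    Sequences shorter than `window` are returned unchanged.
    """
    masked = list(seq)
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) < threshold:
            for j in range(i, i + window):
                masked[j] = mask_char
    return "".join(masked)
```

A run of a single residue has entropy 0 and is masked completely, while a window of twelve distinct residues has entropy log2(12) and is left alone.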

Make a k-letter word list of the query sequence.
Taking k=3 as an example, we list the words of length 3 in
the query protein sequence (k is usually 11 for a DNA
sequence) sequentially, moving one letter at a time, until
the last letter of the query sequence is included.
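
The word list can be sketched in a few lines. The function name `kmer_words` and the example query are hypothetical, and this covers only the listing step, not the scoring of neighbouring words that follows in BLAST.

```python
def kmer_words(sequence, k=3):
    """Return every overlapping word of length k, in order of appearance."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Example with a made-up six-residue protein query:
# kmer_words("PQGEFG", k=3) gives ['PQG', 'QGE', 'GEF', 'EFG']
```

A query of length n yields n - k + 1 words, so the last word ends on the last letter of the query.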
TYPES OF BLAST
Nucleotide-nucleotide BLAST (blastn)
This program, given a DNA query, returns the most similar DNA
sequences from the DNA database that the user specifies.
Protein-protein BLAST (blastp)
This program, given a protein query, returns the most similar
protein sequences from the protein database that the user
specifies.
Position-Specific Iterative BLAST (PSI-BLAST) (blastpgp)
This program is used to find distant relatives of a protein. First, a
list of all closely related proteins is created. These proteins are
combined into a general "profile" sequence, which summarises
significant features present in these sequences. A query against
the protein database is then run using this profile, and a larger
group of proteins is found. This larger group is used to construct
another profile, and the process is repeated.
By including related proteins in the search, PSI-BLAST is much
more sensitive in picking up distant evolutionary relationships
than a standard protein-protein BLAST.
TYPES OF BLAST
Nucleotide 6-frame translation-protein (blastx)
This program compares the six-frame conceptual translation products of a
nucleotide query sequence (both strands) against a protein sequence database.
Nucleotide 6-frame translation-nucleotide 6-frame translation (tblastx)
This program is the slowest of the BLAST family. It translates the query nucleotide
sequence in all six possible frames and compares it against the six-frame
translations of a nucleotide sequence database. The purpose of tblastx is to find
very distant relationships between nucleotide sequences.
Protein-nucleotide 6-frame translation (tblastn)
This program compares a protein query against all six reading frames of a
nucleotide sequence database.
Large numbers of query sequences (megablast)
When comparing large numbers of input sequences via the command-line BLAST,
"megablast" is much faster than running BLAST multiple times. It concatenates
many input sequences together to form a large sequence before searching the
BLAST database, then post-analyzes the search results to glean individual
alignments and statistical values.
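
The six-frame conceptual translation used by blastx, tblastx and tblastn can be sketched as follows. This is a minimal illustration that assumes the standard genetic code and an uppercase A/C/G/T input; the function names are hypothetical and are not part of any BLAST program.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to conceptual translations."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

Stop codons appear as `*` in the output; a real search would compare each of these six strings against the protein sequences.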

Alternatives to BLAST

An extremely fast but considerably less sensitive alternative to
BLAST is BLAT (BLAST-Like Alignment Tool). While BLAST does a
linear search, BLAT relies on k-mer indexing of the database,
and can thus often find seeds faster. Another software
alternative similar to BLAT is PatternHunter.
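
The idea of k-mer indexing can be sketched as below: the database is scanned once to build a lookup table, after which each query word is found by a dictionary lookup instead of a scan. This is a simplified illustration, not BLAT's actual data structure. It indexes every overlapping k-mer, and the function names are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(database_seq, k):
    """Map each k-mer in the database sequence to its start positions."""
    index = defaultdict(list)
    for pos in range(len(database_seq) - k + 1):
        index[database_seq[pos:pos + k]].append(pos)
    return index


def find_seeds(query, index, k):
    """Return (query_pos, database_pos) pairs where a k-mer matches exactly."""
    seeds = []
    for qpos in range(len(query) - k + 1):
        for dpos in index.get(query[qpos:qpos + k], []):
            seeds.append((qpos, dpos))
    return seeds
```

The index is built once and reused for every query, which is where the speed advantage comes from; the seeds would then be extended into alignments.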
Uses of BLAST

Identifying species
With BLAST, you may be able to identify a species correctly or find homologous
species. This can be useful, for example, when you are working with a DNA sequence
from an unknown species.
Locating domains
When working with a protein sequence, you can submit it to BLAST to locate known
domains within the sequence of interest.
Establishing phylogeny
Using the results received through BLAST you can create a phylogenetic tree using the
BLAST web-page. Phylogenies based on BLAST alone are less reliable than other
purpose-built computational phylogenetic methods, so should only be relied upon for
"first pass" phylogenetic analyses.
DNA mapping
When you are working with a known species and want to sequence a gene at an unknown
location, BLAST can compare the chromosomal position of the sequence of interest with
relevant sequences in the database(s).
Comparison
When working with genes, BLAST can locate common genes in two related species, and
can be used to map annotations from one organism to another.
FASTA
FASTA is a DNA and protein
sequence alignment software
package first described (as FASTP) by
David J. Lipman and
William R. Pearson in 1985.[1] Its
legacy is the FASTA format which is
now ubiquitous in bioinformatics.
TYPES OF FASTA
Protein
Protein-protein FASTA
Protein-protein Smith-Waterman (ssearch)
Global protein-protein (Needleman-Wunsch) (ggsearch)
Global/local protein-protein (glsearch)
Protein-protein with unordered peptides (fasts)
Protein-protein with mixed peptide sequences (fastf)
MULTIPLE SEQUENCE
ALIGNMENTS
Definition:
A Multiple Sequence Alignment (MSA)
is a sequence alignment of three or more
biological sequences, generally protein,
DNA, or RNA. In many cases, the input set
of query sequences is assumed to have
an evolutionary relationship by which they
share a lineage and are descended from a
common ancestor.

WHY WE DO MULTIPLE SEQUENCE ALIGNMENTS

Multiple nucleotide or amino acid sequence alignment
techniques are usually performed for one of the
following purposes:
Characterizing protein families by identifying shared regions of homology in a multiple sequence alignment
Determining the consensus sequence of several aligned sequences
Helping to predict the secondary and tertiary structures of new sequences
Serving as a preliminary step in molecular evolution analysis that uses phylogenetic methods to construct phylogenetic trees
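
To illustrate the consensus-sequence purpose listed above, the sketch below takes the most common residue in each column of an alignment. It is a minimal majority-vote version with hypothetical names; it ignores gap characters when counting and does not use IUPAC ambiguity codes or frequency thresholds.

```python
from collections import Counter


def consensus(aligned, gap="-"):
    """Return the column-wise majority residue of equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for column in zip(*aligned):
        residues = [c for c in column if c != gap]
        # A column made only of gaps stays a gap in the consensus.
        result.append(Counter(residues).most_common(1)[0][0] if residues else gap)
    return "".join(result)


# Example with three made-up aligned DNA sequences:
# consensus(["ACGT", "ACGA", "ACCT"]) gives 'ACGT'
```

When two residues are tied in a column, this sketch keeps the one seen first; a production tool would report the ambiguity instead.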
ShadyBox tool

ShadyBox is a multiple alignment editor program which
enables you to box and shade residues or segments of
multiple aligned sequences.
ShadyBox works on an MSF or pretty output
file and produces a PostScript output file.
The original input file is not changed.
ShadyBox enables you to save your work
midway, exit the program, and resume at a
later stage.
GENE SEARCHING AND SEQUENCE RETRIEVAL


1. GENSCAN is a program to identify
complete gene structures in genomic
DNA.
2. It can be used to predict the
location of genes and their exon-intron
boundaries in genomic
sequences from a variety of
organisms.
Glimmer


Glimmer (Gene Locator and Interpolated Markov ModelER)
is a system for finding genes in microbial DNA,
especially the genomes of bacteria, archaea, and viruses.

Glimmer is widely used for genome annotation
efforts on a wide range of bacterial, archaeal,
and viral species because of its high accuracy.
GeneID

Geneid is a program to predict genes
in anonymous genomic sequences,
designed with a hierarchical
structure.
General steps of the geneid tool

a) In the first step, splice sites and start and stop
codons are predicted and scored along the
sequence using Position Weight Arrays (PWAs).
b) In the second step, exons are built from the
sites. Exons are scored as the sum of the scores
of the defining sites, plus the log-likelihood
ratio of a Markov Model for coding DNA.
c) Finally, from the set of predicted exons, the
gene structure is assembled, maximizing the
sum of the scores of the assembled exons.
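
The final assembly step can be pictured as choosing a set of non-overlapping exons with the largest total score. The sketch below does this with a standard dynamic program over exons sorted by end coordinate. It is a simplified stand-in, not geneid's actual algorithm: it ignores reading-frame compatibility and intron length limits, and the function name and the (start, end, score) tuple layout are assumptions.

```python
from bisect import bisect_right


def best_exon_chain(exons):
    """Pick non-overlapping exons that maximize the summed score.

    `exons` is a list of (start, end, score) with integer, inclusive
    coordinates. Returns (total_score, chosen_exons).
    """
    exons = sorted(exons, key=lambda e: e[1])
    ends = [e[1] for e in exons]
    # best[i] holds the best (score, chain) using only the first i exons.
    best = [(0.0, [])]
    for i, (start, end, score) in enumerate(exons):
        # Count earlier exons that end strictly before this one starts.
        j = bisect_right(ends, start - 1, 0, i)
        take = (best[j][0] + score, best[j][1] + [(start, end, score)])
        skip = best[i]
        best.append(take if take[0] > skip[0] else skip)
    return best[-1]
```

Exons with a negative score are never chosen, because skipping them always gives a total at least as high.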
Scoring matrices

PAM
Point Accepted Mutation (Dayhoff et al.)
1 PAM = PAM1 = 1% average change of
all amino acid positions
After 100 PAMs of evolution, not every
residue will have changed:
  some residues may have mutated several times
  some residues may have returned to their
  original state
  some residues may not have changed at all
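
As a back-of-the-envelope check of the statement above, assume (as a simplification that is not part of the Dayhoff model itself) that each PAM unit independently changes any given position with probability 0.01. The chance that a position was never hit after n PAM units is then 0.99 to the power n. The function name and the independence assumption are illustrative only.

```python
def fraction_never_changed(pam_units, rate=0.01):
    """Fraction of positions never mutated after `pam_units` PAM steps.

    Assumes each step changes a position independently with probability
    `rate`; this is a simplification for illustration.
    """
    return (1.0 - rate) ** pam_units
```

Under this simplified assumption, `fraction_never_changed(100)` is about 0.37, so roughly a third of the positions would be untouched after 100 PAMs while others were hit more than once.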
BLOSUM
Blocks Substitution Matrix
Scores derived from
observations of the frequencies
of substitutions in blocks of local
alignments in related proteins
Matrix name indicates
evolutionary distance
BLOSUMx was created using
sequences sharing no more than x%
identity
TOOLS FOR MICROARRAY

Software tools for microarray

OEAV: Oligo and EST Anatomy Viewer
www.maizearray.org
Site of the Maize Oligonucleotide Array Project, which produces a
high-density microarray for the maize genome and distributes it to
the research community.
The site contains information on project goals, participants, array
availability and online data access.
OEAV is a tool developed by TIGR that gives
quantitative data on maize transcript frequency based on
ESTs.
Data in OEAV can be searched by keyword, oligo ID,
accession number, rice model or gene ontology.
MICROARRAY TOOLS
MANATEE
http://manatee.sourceforge.net
Web-based gene evaluation and genome annotation
tool
It can store and display annotation for prokaryotic and
eukaryotic genomes
It allows biologists to quickly identify genes and make
high-quality functional assignments, families, etc.
PIRATE:
Prediction Information Resources at TIGR
Open-source bioinformatics prediction programs
Training data and experimental results
MICROARRAY TOOLS
PASA
Program to Assemble Spliced Alignments
http://www.tigr.org
Compares alignment assemblies
Compares transcribed-region alignments
Gene model annotation
Builds new gene models based on cDNA
alignments
