
BLAST (BASIC LOCAL ALIGNMENT SEARCH TOOL)

The BLAST program was developed by Stephen Altschul of NCBI in
1990 and has since become one of the most popular programs for
sequence analysis. BLAST uses heuristics to align a query sequence
with all sequences in a database. The objective is to find high-scoring
ungapped segments among related sequences. The existence of such
segments above a given threshold indicates pairwise similarity beyond
random chance, which helps to discriminate related sequences from
unrelated sequences in a database.

BLAST ALGORITHM

Step 1

Word search method: The query sequence is first filtered to remove
low-complexity regions. A list of query words of fixed length (w) is then
prepared from the query sequence of length l. The word length is typically
3 for protein sequences and 11 for nucleic acid sequences.
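Step 1 can be sketched in Python as follows. This is a minimal illustration, not BLAST's implementation. A naive Shannon-entropy window stands in for the SEG (protein) and DUST (DNA) filters that BLAST actually uses. The window size (12) and entropy cutoff (2.2 bits) are hypothetical values chosen for the example. Masked residues become `X`, and any word that touches a masked residue is skipped. Without masking, a query of length l yields l - w + 1 words.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a window."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.2,
                        mask_char: str = "X") -> str:
    """Naive stand-in for SEG/DUST: mask every window with low entropy.

    The window size and threshold are hypothetical illustration values.
    """
    masked = [False] * len(seq)
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) < threshold:
            for k in range(i, i + window):
                masked[k] = True
    return "".join(mask_char if m else c for c, m in zip(seq, masked))


def query_words(seq: str, w: int = 3, mask_char: str = "X") -> list:
    """All overlapping words of length w, skipping words with masked residues."""
    return [(i, seq[i:i + w])
            for i in range(len(seq) - w + 1)
            if mask_char not in seq[i:i + w]]
```

Because whole windows are masked, residues flanking a low-complexity stretch can be masked as well. The real filters treat those boundaries more carefully.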

Step 2

Identification of exact match method: The database is then searched for
words that match the query words. Words whose match score reaches the
neighborhood score threshold (T) are taken forward for alignment. These
conserved word matches are called HITS.
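For protein searches, a query word's "neighborhood" is every word of the same length that scores at least T against it. A hit therefore need not be an identical match. The sketch below enumerates neighborhoods and scans a subject sequence for them. The scoring function is a hypothetical toy stand-in for BLOSUM62: +5 for identical residues, +1 within a few made-up "similar" groups, and -2 otherwise. The values w = 3 and T = 11 are illustrative.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical groups of "similar" residues; a toy stand-in for BLOSUM62.
SIMILAR_GROUPS = ["ILVM", "KR", "DE", "ST", "FYW", "NQ"]


def toy_score(a: str, b: str) -> int:
    """Toy substitution score: +5 identical, +1 similar, -2 otherwise."""
    if a == b:
        return 5
    if any(a in g and b in g for g in SIMILAR_GROUPS):
        return 1
    return -2


def neighborhood(word: str, T: int, score=toy_score) -> list:
    """Every word of the same length that scores at least T against `word`."""
    result = []
    for cand in product(AMINO_ACIDS, repeat=len(word)):
        if sum(score(x, y) for x, y in zip(word, cand)) >= T:
            result.append("".join(cand))
    return result


def find_hits(query: str, subject: str, w: int = 3, T: int = 11,
              score=toy_score) -> list:
    """Return (query_pos, subject_pos) pairs where a neighborhood word occurs."""
    lookup = {}
    for i in range(len(query) - w + 1):
        for word in neighborhood(query[i:i + w], T, score):
            lookup.setdefault(word, []).append(i)
    hits = []
    for j in range(len(subject) - w + 1):
        for i in lookup.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


print(find_hits("MKVL", "GMRVAKIL"))
```

In this example, `MRV` is not identical to the query word `MKV`, but it still reaches T = 11 under the toy scores, so it counts as a hit.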

Step 3

Maximum segment pair alignment method: Each hit is extended as an
ungapped alignment in both directions, and the extended alignment ends
at the point where its score is maximal.
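Step 3 is sketched below as an "X-drop" extension on a nucleotide pair. Extension in each direction stops once the running score falls more than `xdrop` below the best score seen so far. The alignment is then trimmed back to that best point. The scoring (+1 match, -2 mismatch) and `xdrop = 5` are assumptions for the example. The drop thresholds quoted in this document (twenty-two for proteins and twenty for DNA) apply to BLAST's own scoring.

```python
def score(a: str, b: str, match: int = 1, mismatch: int = -2) -> int:
    """Assumed nucleotide scoring: +1 for a match, -2 for a mismatch."""
    return match if a == b else mismatch


def extend_hit(query: str, subject: str, qpos: int, spos: int, w: int,
               xdrop: int = 5):
    """Ungapped X-drop extension of a word hit in both directions.

    Returns (query_start, subject_start, length, score) of the resulting
    segment pair, trimmed back to the maximum score on each side.
    """
    seed = sum(score(query[qpos + k], subject[spos + k]) for k in range(w))

    # Extend to the right of the word.
    best_right = run = right_len = 0
    k = 0
    while qpos + w + k < len(query) and spos + w + k < len(subject):
        run += score(query[qpos + w + k], subject[spos + w + k])
        k += 1
        if run > best_right:
            best_right, right_len = run, k
        elif best_right - run > xdrop:
            break  # score fell too far below the best seen so far

    # Extend to the left of the word.
    best_left = run = left_len = 0
    k = 0
    while qpos - k - 1 >= 0 and spos - k - 1 >= 0:
        run += score(query[qpos - k - 1], subject[spos - k - 1])
        k += 1
        if run > best_left:
            best_left, left_len = run, k
        elif best_left - run > xdrop:
            break

    length = left_len + w + right_len
    return qpos - left_len, spos - left_len, length, seed + best_left + best_right


print(extend_hit("AAACGTACGTCC", "TTACGTACGTGG", qpos=4, spos=4, w=3))
```

The seed word `GTA` at position 4 grows into the 8-residue segment `ACGTACGT`. The mismatching ends are trimmed off because they only lower the score.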
EXAMPLE

BLAST performs sequence alignment through the following steps. The
first step is to create a list of words from the query sequence. Each word
is typically three residues for protein sequences and eleven residues for
DNA sequences. The list includes every possible word extracted from
the query sequence. This step is also called seeding. The second step is
to search a sequence database for the occurrence of these words. This
step identifies database sequences containing the matching words.
The matching of the words is scored by a given substitution matrix. A
word is considered a match if its score is above a threshold. The third step
involves pairwise alignment by extending from the words in both
directions while counting the alignment score using the same
substitution matrix. The extension continues until the score of the
alignment drops below a threshold due to mismatches (the drop
threshold is twenty-two for proteins and twenty for DNA). The resulting
contiguous aligned segment pair without gaps is called a high-scoring
segment pair (HSP; see working example). In the original version of
BLAST, the highest-scoring HSPs are presented as the final report. They
are also called maximum scoring pairs.

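Bit Scores
The bit score is the raw alignment score normalized by the statistical
parameters of the scoring system, so that bit scores from searches using
different scoring systems can be compared. It indicates how good the
alignment is; the higher the score, the better the alignment.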
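The normalization uses the Karlin-Altschul parameters λ (lambda) and K of the scoring system: S' = (λS - ln K) / ln 2, where S is the raw score and S' is the bit score. In the sketch below, the λ and K values are assumed for illustration. They are of the order NCBI BLAST reports for BLOSUM62 with gap costs 11/1, but the real values depend on the matrix and gap costs of the search.

```python
import math


def bit_score(raw_score: float, lam: float, K: float) -> float:
    """Normalized bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(K)) / math.log(2)


# Illustrative parameters (assumed): lambda and K depend on the
# substitution matrix and gap costs actually used in the search.
LAMBDA, K = 0.267, 0.041

for raw in (50, 100, 200):
    print(raw, round(bit_score(raw, LAMBDA, K), 1))
```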
E-Value
The E-value is a measure of the reliability of the score. It is the number
of hits with an equal or better score that would be expected to occur by
chance alone in a search of a database of this size. The larger the
E-value, the greater the chance that the similarity between the hit and the
query is due to mere coincidence; hence hits with large E-values should
not be relied upon. E-values are calculated from the following three
factors:
1) The bit score. Since a larger bit score is less likely to be obtained by
chance than a smaller bit score, larger bit scores correspond to smaller
E-values.
2) The length of the query. Since a particular bit score is more easily
obtained by chance with a longer query than with a shorter query, longer
queries correspond to larger E-values.
3) The size of the database. Since a larger database makes a particular
bit score more easily obtained by chance, a larger database results in
larger E-values.
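The three factors combine in the relation E = m × n × 2^(-S'). Here S' is the bit score, m is the query length, and n is the total length of the database, both counted in residues. The sketch below uses raw lengths for simplicity. NCBI BLAST instead applies an "effective length" correction for edge effects, so its reported E-values will differ slightly.

```python
def e_value(bit_score: float, query_length: int, db_length: int) -> float:
    """Expected chance hits: E = m * n * 2**(-S'), with m, n in residues.

    Simplification: uses raw lengths instead of the edge-corrected
    "effective" lengths that NCBI BLAST computes.
    """
    return query_length * db_length * 2.0 ** (-bit_score)


base = e_value(40, 300, 10**9)
print(f"{base:.3g}")
print(e_value(50, 300, 10**9) < base)   # higher bit score: smaller E
print(e_value(40, 600, 10**9) > base)   # longer query: larger E
print(e_value(40, 300, 10**10) > base)  # bigger database: larger E
```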
FEATURES
1) Local alignment: BLAST finds regions of local similarity rather than
attempting a global fit between query and database sequences. Multiple
hits to the same database sequence are allowed.
2) Ungapped alignments: BLAST works on the statistics of ungapped
alignments, which reduces the sensitivity of the search. However,
multiple local alignments between two sequences can together indicate
where gaps lie between them.
3) Filters: BLAST uses filters to reduce problems caused by numerous
artifacts (low-complexity regions) in the database. Filtering removes
spurious positive hits that would otherwise appear in the initial run.
4) Fast: BLAST is extremely fast and can be run locally or on the web
server, but it is not guaranteed to find the best alignment between the
query sequence and database sequences.
5) Heuristic: BLAST is a heuristic method that is expected to find most
matches; complete sensitivity is sacrificed in order to gain speed. In
practice, the few biologically significant matches that BLAST misses
can be found by other programs.
6) Search: BLAST searches the database in two phases: first, it looks
for short subsequences that are likely to have significant matches; then
it tries to extend these matched regions (subsequences) on both sides in
order to obtain maximum sequence similarity.
7) Substitution matrix: BLAST uses a substitution matrix in all phases
of the search to score alignments.
Variants of BLAST

• blastp compares an amino acid query sequence against a protein
sequence database
• blastn compares a nucleotide query sequence against a nucleotide
sequence database
• blastx compares a nucleotide query sequence translated in all
reading frames against a protein sequence database
• tblastn compares a protein query sequence against a nucleotide
sequence database dynamically translated in all reading frames
• tblastx compares the six-frame translations of a nucleotide query
sequence against the six-frame translations of a nucleotide sequence
database.
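blastx, tblastn and tblastx all rely on translating nucleotide sequence in six reading frames: three on the given strand and three on its reverse complement. A minimal sketch using the standard genetic code follows. BLAST itself also supports alternative genetic codes, which this sketch omits. Codons containing ambiguous bases are translated as `X`, and `*` marks a stop codon.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; codons with non-ACGT bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict:
    """Frames +1..+3 on the given strand, -1..-3 on the reverse complement."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


for frame, protein in six_frame_translation("ATGGCC").items():
    print(frame, protein)
```

The frame labels (+1 to +3, -1 to -3) follow the convention used in BLAST reports for translated searches.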
BLAST Output Format

The BLAST output includes a graphical overview box, a matching
list and a text description of the alignment. The graphical overview
box contains colored horizontal bars that allow quick identification
of the number of database hits and the degrees of similarity of the
hits. The color coding of the horizontal bars corresponds to the
ranking of similarities of the sequence hits (red: most related;
green and blue: moderately related; black: unrelated). The length
of the bars represents the spans of sequence alignments relative to
the query sequence. Each bar is hyperlinked to the actual pairwise
alignment in the text portion of the report. Below the graphical box
is a list of matching hits ranked by E-value in ascending order.
Each hit includes the accession number, title (usually partial) of the
database record, bit score, and E-value. This list is followed by the
text description, which may be divided into three sections: the
header, statistics, and alignment. The header section contains the
gene index number or the reference number of the database hit plus
a one-line description of the database sequence. This is followed
by the summary of the statistics of the search output, which
includes the bit score, E-value, percentages of identity, similarity
(“Positives”), and gaps. In the actual alignment section, the query
sequence is at the top of the pair and the database sequence, labeled
Subject, is at the bottom. In between the two
sequences, matching identical residues are written out at their
corresponding positions, whereas nonidentical but similar residues
are labeled with “+”. Any residues identified as low-complexity regions
(LCRs) in the query sequence are masked with Xs or Ns so that no
alignment is represented in those regions.
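The "Identities", "Positives" and "Gaps" figures and the middle line of the alignment can be reproduced from a pair of aligned rows, as sketched below. BLAST marks a "+" where a non-identical pair has a positive substitution-matrix score. Here a few hypothetical similarity groups stand in for that matrix lookup. Gaps are written as `-`, and percentages are taken over the full alignment length.

```python
# Hypothetical similarity groups; BLAST marks "+" where the substitution
# matrix score of a non-identical pair is positive.
SIMILAR_GROUPS = ["ILVM", "KR", "DE", "ST", "FYW", "NQ"]


def is_similar(a: str, b: str) -> bool:
    return any(a in g and b in g for g in SIMILAR_GROUPS)


def alignment_summary(query: str, subject: str) -> dict:
    """Midline and identity/positive/gap counts for two aligned rows."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    midline, identities, positives, gaps = [], 0, 0, 0
    for a, b in zip(query, subject):
        if a == "-" or b == "-":
            gaps += 1
            midline.append(" ")
        elif a == b:
            identities += 1
            positives += 1  # identical residues also count as positives
            midline.append(a)
        elif is_similar(a, b):
            positives += 1
            midline.append("+")
        else:
            midline.append(" ")
    return {"length": len(query), "identities": identities,
            "positives": positives, "gaps": gaps, "midline": "".join(midline)}


def format_summary(s: dict) -> str:
    n = s["length"]

    def pct(x: int) -> int:
        return round(100 * x / n)

    return (f"Identities = {s['identities']}/{n} ({pct(s['identities'])}%), "
            f"Positives = {s['positives']}/{n} ({pct(s['positives'])}%), "
            f"Gaps = {s['gaps']}/{n} ({pct(s['gaps'])}%)")


q, sbjct = "MKV-LA", "MRVELS"
summary = alignment_summary(q, sbjct)
print("Query ", q)
print("      ", summary["midline"])
print("Sbjct ", sbjct)
print(format_summary(summary))
```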

HOW DO WE USE BLAST?

1. Go to the NCBI BLAST page (http://blast.ncbi.nlm.nih.gov/Blast.cgi).
This page lists all of the BLAST-related tools/programs available from
NCBI. Select the BLAST program appropriate for your search.
2. Subject databases: Many databases can be used as subject
databases. One of the most commonly used is the nr database, a
collection of "non-redundant" sequences from GenBank and other
sequence databanks.

3. Paste the query sequence in FASTA format (a text-based format for
representing either nucleic acid sequences or protein sequences, in
which nucleotides or amino acid residues are represented using
single-letter codes) into the sequence entry box.
4. The boxes immediately below the sequence entry box allow the
selection of only a part of the entered sequence as query for the search.
For this exercise nothing should be entered in these fields.
5. The next field is a drop-down menu that allows the selection of the
database to be searched. The non-redundant database (nr) is the default
setting. The nr database is the largest and most comprehensive database
for BLAST to search.

6. Click on the “BLAST” button to run the search. A new page will
appear with the ID number of the search and the approximate wait time.
7. The results will be returned when the search is complete.
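The same steps can also be scripted against NCBI's BLAST URL API, the `CMD=Put` / `CMD=Get` interface of Blast.cgi. The sketch below reflects our understanding of that API and should be checked against NCBI's current URL API documentation before use. This applies in particular to the parameter names, the `QBlastInfo` block that carries the search ID (RID) and estimated wait in seconds (RTOE), and the `Status=` values. NCBI's usage guidelines also ask that a given RID not be polled more often than about once a minute. The function names are our own.

```python
import re
import time
import urllib.parse
import urllib.request

# NCBI BLAST URL API endpoint (the page from step 1).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_put_request(query_fasta: str, program: str = "blastp",
                      database: str = "nr") -> bytes:
    """Form-encode a CMD=Put submission: program, database and FASTA query."""
    params = {"CMD": "Put", "PROGRAM": program,
              "DATABASE": database, "QUERY": query_fasta}
    return urllib.parse.urlencode(params).encode("ascii")


def parse_rid_rtoe(page: str):
    """Extract the search ID (RID) and estimated wait in seconds (RTOE)."""
    block = re.search(r"QBlastInfoBegin(.*?)QBlastInfoEnd", page, re.DOTALL)
    if block is None:
        raise ValueError("no QBlastInfo block in response")
    rid = re.search(r"RID = (\S+)", block.group(1))
    rtoe = re.search(r"RTOE = (\d+)", block.group(1))
    if rid is None or rtoe is None:
        raise ValueError("RID or RTOE missing from QBlastInfo block")
    return rid.group(1), int(rtoe.group(1))


def _get(params: dict) -> str:
    url = BLAST_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def run_blast(query_fasta: str, program: str = "blastp", database: str = "nr",
              poll_seconds: int = 60) -> str:
    # Step 6: submit the search; the reply carries the RID and the wait time.
    data = build_put_request(query_fasta, program, database)
    with urllib.request.urlopen(BLAST_URL, data=data) as resp:
        rid, rtoe = parse_rid_rtoe(resp.read().decode("utf-8", errors="replace"))
    time.sleep(rtoe)
    # Step 7: poll until the search is complete, then fetch a text report.
    while True:
        info = _get({"CMD": "Get", "FORMAT_OBJECT": "SearchInfo", "RID": rid})
        if "Status=READY" in info:
            break
        if "Status=FAILED" in info or "Status=UNKNOWN" in info:
            raise RuntimeError(f"BLAST search {rid} did not complete")
        time.sleep(poll_seconds)
    return _get({"CMD": "Get", "FORMAT_TYPE": "Text", "RID": rid})


# Example (makes network requests, so it is left commented out):
# report = run_blast(">my_query\nMKTAYIAKQRQISFVKSHFSRQ")
```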
Benefits of BLAST

• Both BLAST and FASTA search for local sequence similarity; indeed,
they have exactly the same goals, though they use somewhat different
algorithms and statistical approaches.
• Benefits of BLAST:
A. Speed
B. User-friendly
C. Statistical rigor
D. More sensitive

Comparison of BLAST & FASTA

BLAST and FASTA have been shown to perform almost
equally well in regular database searching. However, there are
some notable differences between the two approaches. The
major difference is in the seeding step; BLAST uses a
substitution matrix to find matching words, whereas FASTA
identifies identical matching words using the hashing
procedure. By default, FASTA scans smaller window sizes.
Thus, it gives more sensitive results than BLAST, with a better
coverage rate for homologs. However, it is usually slower than
BLAST. The use of low-complexity masking in the BLAST
procedure means that it may have higher specificity than
FASTA because potential false positives are reduced. BLAST
sometimes gives multiple best-scoring alignments from the
same sequence; FASTA returns only one final alignment.
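FASTA's hashing step can be sketched as a lookup table of the query's k-tuples, that is, words of length `ktup`. Only identical words match, and no substitution matrix is consulted at this stage. Matches are tallied by diagonal, the offset between subject and query positions, so a run of shared words shows up as a high count on one diagonal. This is a simplified illustration of the idea, not FASTA's own code.

```python
from collections import defaultdict


def ktup_index(query: str, ktup: int) -> dict:
    """Hash table mapping each k-tuple of the query to its start positions."""
    index = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)
    return index


def diagonal_counts(query: str, subject: str, ktup: int = 2) -> dict:
    """Count identical k-tuple matches per diagonal (subject_pos - query_pos)."""
    index = ktup_index(query, ktup)
    counts = defaultdict(int)
    for j in range(len(subject) - ktup + 1):
        for i in index.get(subject[j:j + ktup], []):
            counts[j - i] += 1
    return dict(counts)


print(diagonal_counts("ACGTAC", "TACGTA"))  # {-3: 2, 1: 4}
```

The four shared 2-tuples of `ACGTA` all fall on diagonal 1, which marks the region worth extending.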
