Hon-Ming Lam
BLAST Search
• www.ncbi.nlm.nih.gov/Blast
• Basic Local Alignment Search Tool
• Uses a heuristic algorithm that seeks local
(rather than global) alignments, so it can detect
relationships among sequences that share
similarity only in isolated regions
• The initial search is done for a word of length
“W” that scores at least “T” when compared to
the query using a substitution matrix
• Word hits are then extended in either
direction in an attempt to generate an
alignment with a score exceeding the
threshold of “S”
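The seed-and-extend idea in the bullets above can be sketched in a few lines of Python. This is a minimal illustration, not the actual BLAST implementation: the match/mismatch scores stand in for a real substitution matrix, the drop-off cutoff `X` used to stop extension is an assumption that does not appear in these slides, and all function names are hypothetical.

```python
from itertools import product

MATCH, MISMATCH = 1, -1  # assumed toy scores standing in for a substitution matrix


def score(a, b):
    """Score two equal-length strings position by position."""
    return sum(MATCH if x == y else MISMATCH for x, y in zip(a, b))


def neighborhood(word, T, alphabet="ACGT"):
    """All words of the same length that score at least T against `word`."""
    candidates = ("".join(p) for p in product(alphabet, repeat=len(word)))
    return {w for w in candidates if score(word, w) >= T}


def find_seeds(query, subject, W, T):
    """Yield (query_pos, subject_pos) for every word hit of length W."""
    index = {}
    for i in range(len(query) - W + 1):
        for w in neighborhood(query[i:i + W], T):
            index.setdefault(w, []).append(i)
    for j in range(len(subject) - W + 1):
        for i in index.get(subject[j:j + W], []):
            yield i, j


def _extend_one_way(q, s, X):
    """Best score gain, and the length that achieves it, walking along q and s."""
    best = cur = best_len = 0
    for k, (a, b) in enumerate(zip(q, s), start=1):
        cur += MATCH if a == b else MISMATCH
        if cur > best:
            best, best_len = cur, k
        elif best - cur >= X:  # assumed stopping rule: score fell X below the best
            break
    return best, best_len


def extend(query, subject, i, j, W, X=3):
    """Ungapped extension of a word hit in both directions.

    Returns (score, query_start, subject_start, length).
    """
    seed = score(query[i:i + W], subject[j:j + W])
    right, rlen = _extend_one_way(query[i + W:], subject[j + W:], X)
    left, llen = _extend_one_way(query[:i][::-1], subject[:j][::-1], X)
    return seed + left + right, i - llen, j - llen, llen + W + rlen


query, subject = "ACGTACGTTT", "GGACGTACGTCC"
S = 6  # report only extended hits whose score reaches the threshold S
for i, j in find_seeds(query, subject, W=4, T=4):
    hit = extend(query, subject, i, j, W=4)
    if hit[0] >= S:
        print(hit)
```

Enumerating every possible word, as `neighborhood` does here, grows exponentially with W, so the sketch is only practical for short words such as the W=4 used in the example.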
BIO4320 Lecture Materials, Prepared by Dr. Hon-Ming Lam
Word Size = Word Length = 11
Expect = The statistical significance threshold for
reporting matches against database sequences; the
default value is 10, meaning that 10 matches are
expected to be found merely by chance
Expect = $K m n\, e^{-\lambda S}$ [S: raw alignment score; λ and K are normalizing parameters]
Bit Score
The bit score S’ is derived from the raw alignment score S
by taking into account the statistical properties of the
scoring system used. Because bit scores are normalized
with respect to the scoring system, they can be used to
compare alignment scores from different searches.
$S' = (\lambda S - \ln K)/\ln 2$ [λ and K are normalizing parameters]
E Value
Expectation value. The number of different
alignments with scores equivalent to or better than S’
that are expected to occur in a database search by
chance. The lower the E value, the more significant
the score.
$E = m n\, 2^{-S'}$ [m: effective length of the query;
n: total number of bases in the database]
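The two formulas can be checked numerically. The sketch below computes a bit score from a raw score and then the E value, and confirms that the result equals K·m·n·e^(−λS), which is what substituting S’ into the E value formula gives. The values of λ, K, m, and n are illustrative placeholders chosen for this example, not values taken from these slides, and the function names are hypothetical.

```python
import math


def bit_score(raw_score, lam, K):
    """S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(K)) / math.log(2)


def e_value(bits, m, n):
    """E = m * n * 2^(-S')."""
    return m * n * 2.0 ** (-bits)


def e_value_from_raw(raw_score, m, n, lam, K):
    """E = K * m * n * e^(-lambda * S), the same quantity written with the raw score."""
    return K * m * n * math.exp(-lam * raw_score)


# Placeholder parameters (assumptions made for illustration only).
lam, K = 0.3, 0.1
m, n = 250, 5_000_000

for S in (30, 50, 80):
    bits = bit_score(S, lam, K)
    print(f"S={S:3d}  bits={bits:7.2f}  E={e_value(bits, m, n):.3g}")
    assert math.isclose(e_value(bits, m, n), e_value_from_raw(S, m, n, lam, K))
```

Because m and n enter only as a product, doubling the database size doubles E for the same bit score, and each additional bit of score halves E.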
CDD Search
Compares protein sequences to the
Conserved Domain Database. The CDD
is a database containing a collection of
functional and/or structural domains
derived from two popular collections,
SMART and Pfam, plus contributions from
colleagues at NCBI.
PSI-BLAST
Position-specific iterated BLAST refers to a feature
of BLAST 2.0 in which a profile (or position-specific
scoring matrix, PSSM) is constructed
automatically from a multiple alignment of the
highest-scoring hits in an initial BLAST search. The
PSSM is generated by calculating position-specific
scores for each position in the alignment. Highly
conserved positions receive high scores and
weakly conserved positions receive scores near
zero. The profile is used to perform a second (etc.)
BLAST search, and the results of each "iteration"
are used to refine the profile. This iterative searching
strategy results in increased sensitivity.
PSSM
Position-specific scoring matrix. Based
on a profile (a table that lists the
frequencies of each amino acid at each
position of a protein sequence; the frequencies
are calculated from multiple alignments of
sequences containing a domain of
interest). The PSSM gives the log-odds
score for finding a particular matching
amino acid in a target sequence.
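As a rough illustration of how a profile becomes a PSSM, the sketch below counts residues in each column of a small ungapped alignment, converts the counts to frequencies, and takes the log-odds against a background frequency. The pseudocount, the four-letter toy alphabet, the uniform background, and the function names are all assumptions made for this example; real PSSM construction, including the weighting PSI-BLAST applies, is more involved.

```python
import math
from collections import Counter


def build_pssm(alignment, background, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log2-odds score.

    `alignment` is a list of equal-length, ungapped sequences.
    `background` maps each residue to its expected frequency.
    The pseudocount (an assumption of this sketch) avoids log(0) for unseen residues.
    """
    alphabet = sorted(background)
    total = len(alignment) + pseudocount * len(alphabet)
    pssm = []
    for column in zip(*alignment):
        counts = Counter(column)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background[aa])
            for aa in alphabet
        })
    return pssm


def score_sequence(pssm, seq):
    """Sum the position-specific scores of `seq` against the PSSM."""
    return sum(col[aa] for col, aa in zip(pssm, seq))


# Toy alignment over a reduced alphabet with a uniform background (hypothetical data).
alignment = ["ACD", "ACE", "ACD", "ADD"]
background = {aa: 0.25 for aa in "ACDE"}

pssm = build_pssm(alignment, background)
for pos, col in enumerate(pssm):
    print(pos, {aa: round(s, 2) for aa, s in col.items()})
print(score_sequence(pssm, "ACD"), score_sequence(pssm, "EEA"))
```

In the first column every sequence has A, so A scores well above zero there and the other residues score below zero; a target sequence that matches the conserved residues accumulates a higher total than one that does not.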