
Exercises for the lecture "Softwarewerkzeuge der Bioinformatik" (Software Tools of Bioinformatics), SS03

BLAST Tutorial

A) Short introduction:

                                Filter  Matrix    E value  Gap costs
default                         on      BLOSUM62  10       11, 1
quick search                    off     PAM30     1000     11, 1
special conditions:
  search for sequence families  on      BLOSUM62  110      11, 1
  BLAST without gaps            on      BLOSUM62  10       12, 1

1. Filter: The filter masks off segments that have low compositional
   complexity. Filtering can thus eliminate statistically significant but
   biologically uninteresting reports from the output list. It should be
   switched off for fast queries and for those where a large number of hits
   matters.
2. Exchange matrices: The BLOSUM62 matrix provides good all-round qualities
   compared to other matrices and tends to give the best results with the
   default settings. Even when low protein homology is expected, BLOSUM62
   seems to be a good choice.
3. Gaps represent insertions and deletions that happened during evolution.
   To balance the insertion of gaps while aligning the sequences, each
   inserted gap incurs a penalty. Gap scores are separated into penalties
   for opening a gap and for extending an existing gap. BLAST 2 inserts gaps
   automatically. It is nevertheless possible to launch an ungapped BLAST 2
   alignment by choosing the worst penalty for opening a gap. This is
   interesting for alignments with fixed sequence length.
4. Alignment scores:
   a. S value: Also known as the raw score. This value is the sum of the
      scores of all residue comparisons between the two sequences, taking
      into account the amino acid exchange matrix and the penalties for gap
      opening and gap extension.
   b. P value: The P value tells how significant the S value is and offers
      an efficient way of distinguishing true homologies from chance
      similarities. When a large number of randomly found sequences with
      the same or a higher S value occur, then the P value will be high as
      well, indicating low significance for the S value of the sequence.
   c. E value: The E value additionally considers the size of the database.
      In contrast to the P value, the E value gives the expected number of
      sequences that would be found at random with a score at least as high
      as that of the scored alignment. The E value is derived by multiplying
      the P value by the number of sequences in the chosen database. A lower
      value indicates higher significance for the score.
   d. Choosing 10 for the E value means that you expect around 10 randomly
      found sequences in the database to have at least the same score.
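
The scores described above can be illustrated with a minimal Python sketch. It makes several assumptions that are not part of this tutorial: a toy scoring scheme (match +4, mismatch -1) stands in for a real exchange matrix such as BLOSUM62, a gap of length k is assumed to cost existence + k * extension, and the function names are hypothetical.

```python
def raw_score(aligned_a, aligned_b, gap_open=11, gap_extend=1,
              match=4, mismatch=-1):
    """Sum the column scores of two aligned sequences ('-' marks a gap).

    Toy match/mismatch values replace a real exchange matrix
    (assumption for illustration only).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    gap_side = None  # which sequence currently carries an open gap
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gap symbols")
        if a == "-" or b == "-":
            side = "a" if a == "-" else "b"
            if side != gap_side:
                score -= gap_open      # penalty for opening a new gap
                gap_side = side
            score -= gap_extend        # penalty for every gapped position
        else:
            gap_side = None
            score += match if a == b else mismatch
    return score


def e_value(p_value, database_size):
    """E value as described in the text: P value times number of sequences."""
    return p_value * database_size


if __name__ == "__main__":
    print(raw_score("ACDEF", "AC--F"))
    print(e_value(0.0001, 100000))
```

With these toy values, the two-residue gap in the example costs 11 + 2 * 1 = 13, which outweighs most of the credit from the three matching residues.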

B) A simple query:
1. Go to http://www.ncbi.nlm.nih.gov/BLAST/
2. Under Protein, choose Protein-protein BLAST (blastp).
3. This is the BLASTP web interface. To paste a sequence, we first need to
   find a protein sequence in SRS. Find the sequence of the protein with the
   accession number P00042. Copy the sequence and paste it into the Search
   box. Make sure the sequence is shown in FASTA format. (Hint: use View in
   SRS.)
4. Choose a database, here swissprot. What are the other databases good for?
5. Clicking on BLAST! will launch the query.
6. You will now find a new window for customizing the output. For now,
   simply use the defaults and click on Format! to process it.
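
To check that a copied sequence really is in FASTA format before pasting it, a small parser can help. The following is a minimal sketch; the function name and the example record are hypothetical, and the sequence shown is a placeholder, not the sequence of P00042.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA text.

    A record starts with a '>' header line, followed by one or more
    sequence lines. Blank lines are ignored.
    """
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before a '>' header line")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Placeholder record for illustration only.
    example = ">example_protein placeholder record\nMKTAYIAK\nQRQISFVK\n"
    for name, sequence in parse_fasta(example):
        print(name, len(sequence))
```

If the pasted text lacks the leading `>` line, the function raises an error instead of silently accepting it.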

C) An extended query:
1. First we need the protein sequence with the description number MJ0577,
   which is from the organism Methanococcus jannaschii.
2. Copy and paste the sequence in FASTA format into the Search box.
3. Choose a database. Which database would you prefer for a detailed
   alignment?
4. Choose 1 for the E value in the Expect box. Reducing the E value from 10
   to 1 leads to a much more rigorous alignment with more significant
   scores. This will need some more calculation time.
5. Leave the Low complexity filter checked so that the alignment hides
   biologically unimportant but statistically significant hits. The other
   filters are not yet properly developed, so leave them unchecked for now.
6. Choose the BLOSUM62 matrix and use a gap existence cost of 11 and a gap
   extension cost of 1 (this should be the default).
7. Define the output in the Format category. For now, just leave everything
   at the defaults.
8. Click on BLAST! to process.
9. Again, just click on Format! to leave all views at their defaults. This
   might take some time now. Feel free to look around the previous page and
   ask the tutor as many questions as you can find.
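
For readers who use the standalone NCBI BLAST+ programs instead of the web form, the settings from the steps above might translate roughly into a `blastp` call as sketched below. This is an assumption outside the scope of the tutorial: it presumes a local BLAST+ installation and a locally available database named `swissprot`, and the query file name is hypothetical. The sketch only assembles the argument list and does not run the program.

```python
def build_blastp_command(query_path, database, evalue=1.0,
                         matrix="BLOSUM62", gap_open=11, gap_extend=1,
                         low_complexity_filter=True, out_path=None):
    """Assemble a blastp argument list mirroring the web-form settings."""
    command = [
        "blastp",
        "-query", query_path,            # FASTA file with the query sequence
        "-db", database,                 # name of the local database (assumed)
        "-evalue", str(evalue),          # Expect threshold
        "-matrix", matrix,               # exchange matrix
        "-gapopen", str(gap_open),       # gap existence cost
        "-gapextend", str(gap_extend),   # gap extension cost
        "-seg", "yes" if low_complexity_filter else "no",
    ]
    if out_path is not None:
        command += ["-out", out_path]
    return command


if __name__ == "__main__":
    # "mj0577.fasta" is a hypothetical file name.
    print(" ".join(build_blastp_command("mj0577.fasta", "swissprot")))
```

The resulting list could be handed to `subprocess.run` on a machine where BLAST+ is installed.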

E) Interpretation of the results:

The graphical overview of the alignment shows at most 50 sequences. Clicking
on a line leads you to the corresponding pairwise alignment. The color of
the lines indicates the score. Scrolling down shows the sequences ordered by
descending degree of homology. You will find a list of 100 sequences,
according to the value next to the field Descriptions. Further down you will
find all 50 pairwise alignments (the number in the field Alignments).
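
The ordering and truncation of the result page can be mimicked with a short sketch. The hit records, their field names, and the function name are hypothetical; the limits of 100 descriptions and 50 alignments are taken from the text above.

```python
def summarize_hits(hits, max_descriptions=100, max_alignments=50):
    """Sort hits by descending score and cut the two output lists.

    Returns (descriptions, alignments), mirroring the Descriptions and
    Alignments fields of the format page.
    """
    ranked = sorted(hits, key=lambda hit: hit["score"], reverse=True)
    return ranked[:max_descriptions], ranked[:max_alignments]


if __name__ == "__main__":
    # Made-up hits for illustration only.
    hits = [{"id": f"hit{i}", "score": i} for i in range(120)]
    descriptions, alignments = summarize_hits(hits)
    print(len(descriptions), len(alignments), descriptions[0]["id"])
```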

PSI-BLAST Tutorial
A) Short introduction:
The additional sensitivity of this program compared to BLAST derives from a
profile, which is either generated automatically or supplied manually as a
PSSM (position-specific scoring matrix). This profile contains a list of
frequencies for the occurrence of specific amino acids at specific positions
in the protein sequence. These frequencies are derived from a multiple
sequence alignment of the highest-scoring sequences that pass a threshold in
the first iteration of the PSI-BLAST search. Highly conserved positions
therefore get a higher score than they would from the amino acid exchange
matrix alone. PSI-BLAST (position-specific iterative BLAST) can be used when
one looks for distant members of a protein family whose relationship does
not emerge from direct sequence comparisons. You can also use PSI-BLAST with
hypothetical proteins in order to assign their function, even if they are
not annotated in any database. The interfaces of PSI-BLAST and BLAST are
identical. For PSI-BLAST there are just some further options available.
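
The idea of a position-specific profile can be sketched in a few lines of Python. This is a simplified illustration, not the PSI-BLAST procedure itself: it only counts raw residue frequencies per alignment column, whereas the actual program additionally weights sequences and converts frequencies into scores. The toy alignment and the function name are hypothetical.

```python
from collections import Counter


def column_frequencies(alignment):
    """Return one {residue: frequency} dict per alignment column.

    `alignment` is a list of equally long aligned sequences;
    '-' marks a gap and is ignored when counting.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        counts = Counter(residues)
        total = len(residues)
        profile.append({r: c / total for r, c in counts.items()} if total else {})
    return profile


if __name__ == "__main__":
    # Toy alignment for illustration only.
    toy = ["MKV", "MKI", "MRV", "MKV"]
    for position, frequencies in enumerate(column_frequencies(toy), start=1):
        print(position, frequencies)
```

In the toy alignment the first column is fully conserved, so a profile built from it would reward M at that position more strongly than a general exchange matrix would.
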
B) An example:
1. Proceed as in part C of the BLAST Tutorial and choose PHI- and PSI-BLAST.
2. Insert the protein sequence of MJ0577 from the organism Methanococcus
   jannaschii into the Search box.
3. Pick the most suitable database.
4. For the E value (Expect) under Options, choose 1 instead of 10.
5. Leave all other settings at their defaults and go to the PSI-BLAST
   settings.
6. Under Format for PSI-BLAST, choose "with inclusion threshold" and use a
   threshold of 0.001.
7. Launch the first iteration.
8. The results shown are derived from a regular BLAST search, so there
   shouldn't be any difference from the previous exercise with BLAST.
9. Launching another iteration will change the results. Compare the results
   and find out what changed. What happens after you launch another
   iteration?
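
One simple way to compare two iterations is to compare the sets of hit identifiers they report. The sketch below assumes the identifiers have been copied from the result pages by hand; the identifiers and the function name are hypothetical.

```python
def compare_iterations(previous_ids, current_ids):
    """Report which hits are new, dropped, or kept between two iterations."""
    previous, current = set(previous_ids), set(current_ids)
    return {
        "new": sorted(current - previous),
        "dropped": sorted(previous - current),
        "kept": sorted(previous & current),
    }


if __name__ == "__main__":
    # Made-up identifiers for illustration only.
    first = ["hitA", "hitB", "hitC"]
    second = ["hitA", "hitC", "hitD", "hitE"]
    print(compare_iterations(first, second))
```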
