
Using ClustalW to Generate a Multiple Sequence Alignment and Phylogenetic Tree
I. Why Make a Phylogenetic Tree?
II. About ClustalW
III. Submitting Sequences and Choosing Parameters
IV. Understanding Output
a. Score Table
b. Multiple Sequence Alignment
c. Phylogenetic Tree
Why Make a Phylogenetic Tree?
A phylogenetic tree is a visual depiction of evolutionary relationships, based on DNA or protein sequences,
among different taxonomic groups, the "leaves" of the tree. Evolutionary distance is estimated from how
different the DNA or amino acid sequences of the samples are from each other. Branches from a common
node (branch point) represent descendants of a common ancestor (or hypothetical ancestral sequence).
Trees can be rooted or unrooted. A rooted tree structures the nodes and branches based on inferences
about common ancestors. In order to root a tree, there must be an input sequence to act as a relevant
outgroup: a sequence close enough to the others to allow an inference about evolutionary distances, but far
enough to form its own separate branch. An unrooted tree structures the leaves based only on relatedness,
without making inferences about common ancestors.
About ClustalW
ClustalW is a free online tool, provided through the European Bioinformatics Institute (EBI), used to align
multiple sequences and generate phylogenetic trees. You can access the second version of ClustalW at:
http://www.ebi.ac.uk/Tools/clustalw2/ . When the user inputs the desired sequences to align, ClustalW
generates a sequence alignment and a rooted phylogram or cladogram. A phylogram explicitly
represents the number of sequence character changes through the horizontal branch length. The sum of
the horizontal distances between two leaves is the predicted evolutionary difference in sequences. A
cladogram only depicts branching patterns, not evolutionary time by branch length. In a cladogram,
branch length is arbitrary; only groupings of leaves are relevant.
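To make the phylogram rule concrete, the following is a minimal Python sketch of how branch lengths add up along the path between two leaves. The tree, the leaf names, and the branch lengths are invented for illustration; they are not ClustalW output, and ClustalW itself does not expose a function like this.

```python
# Hypothetical rooted tree: each child maps to (parent, branch_length).
# Names and lengths are invented for illustration only.
PARENT = {
    "node1": ("root", 0.10),
    "human": ("node1", 0.02),
    "chimp": ("node1", 0.03),
    "mouse": ("root", 0.25),
}


def path_to_root(leaf):
    """Return {ancestor: distance from leaf to that ancestor}."""
    distances = {leaf: 0.0}
    node, total = leaf, 0.0
    while node in PARENT:
        parent, length = PARENT[node]
        total += length
        distances[parent] = total
        node = parent
    return distances


def leaf_distance(a, b):
    """Sum the branch lengths on the path joining leaves a and b."""
    up_a = path_to_root(a)
    up_b = path_to_root(b)
    # The path turns around at the closest shared ancestor,
    # which is the shared node with the smallest combined distance.
    shared = [node for node in up_a if node in up_b]
    return min(up_a[node] + up_b[node] for node in shared)


print(round(leaf_distance("human", "chimp"), 2))  # 0.05
print(round(leaf_distance("human", "mouse"), 2))  # 0.37
```

In a cladogram the same calculation would be meaningless, because the branch lengths there are arbitrary.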
Submitting Sequences and Choosing Parameters
1. Collect your sequences from an annotation service, gene database, sequence file, or similar
source. If you retrieve a sequence from a database, retrieve it in FASTA format if possible.
2. Go to the ClustalW page: http://www.ebi.ac.uk/Tools/clustalw2/ . If the link does not work, go to
the EBI main page, http://www.ebi.ac.uk/ . In the grey menu at the top, select: Tools > Sequence
Analysis > ClustalW2. On the ClustalW page, you should see a coloured drop-down menu box to
edit parameters, and below that box an open dialogue box to enter sequences.
3. In the open dialogue box, paste all of your DNA or amino acid sequences (maximum 500).
Although not listed as an option, ClustalW will align RNA sequences as well. When inputting
sequences:
a. Provide a name line before each sequence, followed by a return and the sequence.
b. Begin name lines with a ">". Example: ">SpeciesName" is an appropriate name line. This
is essentially FASTA formatting.
c. Use a very distinct name for each sequence. You can use numbers as well. Do not use
spaces. For example, use ">GenusTrivial" or ">G-Trivial" instead of ">G. Trivial".
d. ClustalW will truncate any names with over 30 characters, so your names must be
distinct within the first 30 characters.
Sample dialogue box of amino acid sequences. Note the ">" and the name with a distinct beginning and no spaces. Be
sure the sequence begins on the next line.
4. You can edit the parameters for your multiple sequence alignment using the drop-down menus
above the open dialogue box. If you are not sure which parameters may be better for your
alignment project, use the Defaults (def); do not alter anything in the drop-downs.
Parameter boxes for alignment and output.
a. If you would like the results sent to your email address, select: Results > Email. Also
enter your email address to the left, and give your alignment a distinct project name.
b. Keep alignment on the default "full". Options in the row beginning with KTUP all refer to
fast alignment; leave them as defaults.
c. There are 3 different alignment matrices from which to choose. The default is blosum30,
believed to be the most reliable. Choose pam or gonnet from the MATRIX option if you
prefer either of those. The id matrix gives a score of +10 to two identical amino acids, or
else a score of 0.
d. Edit the alignment parameters on this same line, beginning with MATRIX, if you prefer.
These parameters define the scoring for making gaps in the alignment. See the default
values here: http://www.ebi.ac.uk/Tools/clustalw/faq.html
e. For ITERATION, changing the setting appeared to make no observable difference in the
output alignment or trees.
f. NUMITER is the number of iterations after each step of the alignment. The default is 3.
Increasing NUMITER appeared to make no observable difference in the output.
5. Press Run to run the multiple sequence alignment.
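The naming rules in step 3 can be checked before pasting anything into the dialogue box. The following is a minimal Python sketch under stated assumptions: the helper names clean_name and to_fasta are hypothetical, the example sequences are invented, and the limits (500 sequences, 30-character names) are the ones given in step 3. It only prepares text; it does not contact the ClustalW server.

```python
MAX_SEQUENCES = 500   # limit stated in step 3
MAX_NAME_LENGTH = 30  # longer names are truncated (step 3d)


def clean_name(raw_name):
    """Make a name safe for a name line: no leading '>', no spaces."""
    name = raw_name.strip().lstrip(">")
    # "G. Trivial" -> "G-Trivial": drop periods, join the words with "-".
    return "-".join(name.replace(".", " ").split())


def to_fasta(records):
    """Build FASTA text from (name, sequence) pairs, enforcing the rules."""
    if len(records) > MAX_SEQUENCES:
        raise ValueError("too many sequences for one submission")
    seen = set()
    lines = []
    for raw_name, sequence in records:
        name = clean_name(raw_name)
        key = name[:MAX_NAME_LENGTH]  # what survives truncation
        if key in seen:
            raise ValueError(
                f"name not distinct within {MAX_NAME_LENGTH} characters: {name}"
            )
        seen.add(key)
        lines.append(">" + name)                 # name line
        lines.append("".join(sequence.split()))  # sequence on the next line
    return "\n".join(lines) + "\n"


# Invented example records, for illustration only.
print(to_fasta([("G. Trivial", "MKV LLA"), ("Species Name", "MKVALA")]))
```

Raising an error on a duplicate name, instead of silently renaming it, keeps the leaf names in the final tree traceable to the original sequences.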
Understanding Output
First ensure that the "Number of sequences" listed in the "Results of Search" summary table matches
the number of sequences you input. If it does not, return to the main page and attempt to run your
sequence alignment again.
The ClustalW output will give you two main result forms: the multiple sequence alignment and a
phylogram/cladogram. Below is an example of the results using the default parameters.
Score Table
The score table is the first section of the page below the results summary box. The score table shows the
scoring of the pairwise alignment of all sequences.
The ClustalW FAQ explains how these alignment scores are calculated: "Pairwise scores are calculated as
the number of identities in the best alignment divided by the number of residues compared (gap positions
are excluded). Both of these scores are initially calculated as percent identity scores and are converted to
distances by dividing by 100 and subtracting from 1.0 to give number of differences per site. We do not
correct for multiple substitutions in these initial distances."
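The calculation quoted above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ClustalW's own code: the function names are hypothetical, the two sequences are invented, and they are assumed to be already aligned (ClustalW first finds the best pairwise alignment, which is not reproduced here).

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity of two aligned sequences, skipping gap positions."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # gap positions are excluded
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no residues to compare")
    return 100.0 * identical / compared


def identity_to_distance(score):
    """Convert a percent identity score to differences per site."""
    return 1.0 - score / 100.0


# Invented, pre-aligned example: 5 compared positions, 4 identical.
score = percent_identity("ACGT-A", "ACCTTA")
print(score)                                   # 80.0
print(round(identity_to_distance(score), 2))   # 0.2
```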
Take a screenshot of this table, or download it by right-clicking the Output File (.output) found in the result
summary box at the top of the page.
Multiple Sequence Alignment
This section aligns all of the input sequences. An HTML text version is listed just below the Score Table, and a more
extensive view of the alignment can be seen using JalView.
Under Alignment, you can click "Show Colors" to view a coloured version of an amino acid alignment.
This feature is only available for output formats ALN (default) and GCG, found under OUTPUT
FORMAT on the submission page under the parameters.
Normal View of Alignment; Coloured View of Alignment
In the row below the last sequence of the alignment, there may be symbols:
" * " - the residues or nucleotides in that column are identical in all sequences
" : " - conserved substitutions have been observed, according to the colour data
" . " - semi-conserved substitutions are observed
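As a rough illustration of how such a symbol row can be derived from an amino acid alignment, here is a minimal Python sketch. It is not taken from the ClustalW source. The "strong" and "weak" residue groups below are the ones commonly documented for ClustalW, reproduced here from memory, so treat them as an assumption and check them against the official documentation. Leaving columns that contain a gap blank is also an assumption, and the three sequences are invented.

```python
# Assumed residue groups (commonly documented for ClustalW; verify before use).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK",
                 "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column):
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = set(column.upper())
    if "-" in residues:
        return " "  # assumption: columns containing a gap get no symbol
    if len(residues) == 1:
        return "*"  # identical in all sequences
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # conserved substitution
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # semi-conserved substitution
    return " "


def conservation_line(aligned):
    """Build the symbol row for a list of equal-length aligned sequences."""
    return "".join(column_symbol("".join(col)) for col in zip(*aligned))


# Invented example alignment, for illustration only.
aligned = ["MKCLV", "MRSIV", "MKALG"]
for row in aligned:
    print(row)
print(conservation_line(aligned))  # "*:.: " (the last column gets a blank)
```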
The colours give information about the amino acid (left column below) at the given position. To see the 1-
letter code for amino acids, see: http://gcat.davidson.edu/conversions_04/amino_acids/index.html
Take a screenshot of this alignment, or download the file by right-clicking the Alignment File (.aln) found
in the result summary box at the top of the page.
To access JalView, click "Start JalView" in the results summary box at the top of the page.
The JalView output shows a highlighted alignment of the sequences. It also provides a Consensus
sequence and scoring of conservation. There are drop-down menus at the top of the result window to
alter the view or show calculations.
Take a screenshot of this alignment to record the colours and features of the annotation. To download just
the alignment in JalView, go to: File > Output to Textbox. "Pileup" and "PFAM" are probably the most
useful views.
Phylogenetic Tree
The generated phylogenetic tree is at the very bottom of the results page. You'll notice that above this is a
"Guide Tree" section. You can save the Guide Tree in order to submit it to another tree-construction
program and have it generate the same tree.
The tree can be viewed as a phylogram or a cladogram.
You can alternate between these two views by clicking the leftmost button, "Show as ____gram Tree".
Right-click the tree to change view options. The generated trees do not have a measuring scale, but if you
click "Show Distances", the distance will be displayed to the right of the leaf name.
Take a screenshot of these trees. You can download the Guide Tree by right-clicking the Guide tree file
(.dnd) in the results summary window at the top of the page.
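The .dnd guide tree file is plain text in Newick format, so the leaf names and their branch lengths can be read without a dedicated tree program. The following is a minimal Python sketch under stated assumptions: the sample text, leaf names, and lengths are invented, the function name is hypothetical, and the pattern only handles simple unquoted names such as the ones recommended in the submission steps.

```python
import re

# Invented sample in the style of a guide tree file; not real ClustalW output.
SAMPLE_DND = """(
(
human:0.02,
chimp:0.03)
:0.10,
mouse:0.25);
"""

# A leaf is a name that follows "(" or "," and precedes ":length".
# Internal branches follow ")" and are therefore skipped.
LEAF_PATTERN = re.compile(
    r"[(,]\s*([^\s():,;]+):(-?[0-9.]+(?:[eE][-+]?[0-9]+)?)"
)


def leaf_branch_lengths(newick_text):
    """Map each leaf name to the length of the branch leading to it."""
    return {
        name: float(length)
        for name, length in LEAF_PATTERN.findall(newick_text)
    }


lengths = leaf_branch_lengths(SAMPLE_DND)
for name in sorted(lengths):
    print(name, lengths[name])
```

Comparing the extracted names with your input names is a quick way to confirm that no name was truncated or merged before you pass the tree to another program.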
You can also get a text file of your input sequences by downloading the Input file (.input) in the results
summary window at the top of the page.
For presentation of these phylogenetic trees, you can edit your screenshot picture to better format the leaf
names. Be sure not to alter the length of the branches when editing your phylogram; doing so
misrepresents your data.
(Olivia Ho-Shing, Fall 200?)
