
Wagstaff, Bowen, Nord

1. You should look for low E-values (close to 0). The E-value estimates how many matches with a score at least this good would be expected purely by chance in a database of this size, so a lower value means the match is less likely to be a coincidence. An E-value of 0.0 means the expected number of chance matches is too small to display; the match is almost certainly not random.

2. Unknown1 = Gossypium mustelinum (0.0)
Unknown2 = Brassica rapa (0.0)
Unknown3 = Citrus sinensis (0.0)
Unknown4 = Brassica napus (0.0)
Unknown5 = Colocasia esculenta (0.0)
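To make the meaning of the E-value in answer 1 concrete, here is a minimal sketch of the Karlin-Altschul estimate used by BLAST-style searches, E = m * n * 2^(-S'), where S' is the bit score, m the query length, and n the database length. The function name and the lengths below are hypothetical and chosen only for illustration; real search tools use adjusted "effective" lengths, so their reported values will differ.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring at least bit_score.

    Implements E = m * n * 2**(-S'), with m = query_len, n = db_len.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical sizes (an assumption, not taken from the report):
# a 500-base query searched against a database of 10**9 bases.
for score in (30, 60, 200):
    print(score, expected_hits(score, 500, 10**9))
```

The expected count falls by half for every additional bit of score, which is why strong matches such as those in answer 2 are displayed as 0.0.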

[Figures: Unknown 1, Unknown 2, Unknown 3, Unknown 4, Unknown 5]


3. All of the plants that ranked highest in overlaps are flowering plants (angiosperms), so angiosperm reference data would be the natural choice. However, angiosperm reference data was not available, so the monocot and eudicot reference data were chosen instead.

4. MUSCLE (MUltiple Sequence Comparison by Log-Expectation) uses a log-expectation scoring algorithm to align nucleotide and protein sequences.

5. The sequences aren't completely identical, although they do match with similarities above 80%. Matches close to 100% may indicate that the unknown is the same species as the match. Matches with lower percent similarity may indicate species that are phylogenetically close to the unknown; in this sense, these close matches may show whether the unknown species is a eudicot, monocot, angiosperm, gymnosperm, etc. Misalignments can be caused by insertions or deletions that have accumulated over time. Mutations in the genes produce the differences seen between otherwise similar sequences.

6. A phylogenetic tree is a visual representation of the genetic relationships between different species. A tip is a terminal node: it represents a species that exists today (one of the sequences being compared). A node is either a tip or an internal branching point, where one lineage splits into two or more. A branch is the line connecting two nodes and represents a lineage changing over time. The branch length is the amount of change (or time) along that branch, from a node to the next branching point or tip. The distance between two nodes is the sum of the branch lengths along the path connecting them. A common ancestor is an internal node that two species both descend from. A clade consists of a common ancestor and all of the species that descend from it.

[Figure: PHYLIP ML tree]
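The tree vocabulary in answer 6 can be illustrated with a small sketch. The tree below is a toy example with invented tip names and branch lengths (it is not either PHYLIP tree from this report), and the helper names are hypothetical. It shows how a common ancestor, the distance between two tips, and a clade can be read off a rooted tree.

```python
# Toy rooted tree stored as child -> (parent, branch length).
# Names and lengths are invented for illustration only.
PARENT = {
    "A": ("n1", 0.1),
    "B": ("n1", 0.2),
    "n1": ("root", 0.3),
    "C": ("n2", 0.4),
    "D": ("n2", 0.1),
    "n2": ("root", 0.2),
}


def path_to_root(node):
    """List (ancestor, distance from the start node) pairs up to the root."""
    path, dist = [(node, 0.0)], 0.0
    while node in PARENT:
        node, length = PARENT[node]
        dist += length
        path.append((node, dist))
    return path


def common_ancestor_and_distance(a, b):
    """Most recent common ancestor of a and b, and the path length between them."""
    up_a = dict(path_to_root(a))
    for node, dist_b in path_to_root(b):
        if node in up_a:
            return node, up_a[node] + dist_b
    raise ValueError("nodes are not in the same tree")


def clade(node):
    """All tips that descend from the given node."""
    children = [c for c, (p, _) in PARENT.items() if p == node]
    if not children:
        return {node}  # a tip is its own (trivial) clade
    return set().union(*(clade(c) for c in children))


print(common_ancestor_and_distance("A", "C"))
print(sorted(clade("n1")))
```

In this toy tree, A and B share the ancestor n1, while A and C only meet at the root, so the path between A and C is longer.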


[Figure: PHYLIP NJ tree]

7. Depending on the algorithm used to build the tree, the percent likelihoods reported between nodes may differ. Linnaean classification is another way of grouping related species, but it is based largely on observable traits, so it may not be the best representation of genetic similarity. We assume that mutations in the gene occur with uniform probability across species, and that the number of mutations in the gene indicates how genetically similar two species are.

8. Unknown3 is closely related to citrus_sinensis. Unknown2 is identical to brassica_rapa. Unknown1 is distantly related to the other unknowns (it is identical to the species at its terminal node). Unknown2 and Unknown4 sit in the Brassicaceae clade.

9. The Neighbor Joining algorithm is a greedy, distance-based algorithm: at each step it joins the pair of sequences (or clusters) favored by its pairwise-distance criterion and never revisits that choice. Maximum Likelihood instead scores candidate trees under a probabilistic model of gene mutation and keeps the tree under which the observed sequences are most probable.

10. Yes, it does confirm our guesses: each unknown sits in a clade with species whose genus names closely match our identification.

11. The muddiest point is what determines the universally accepted phylogenetic tree. What objective measure makes the ML tree a more realistic representation of genetic mutations over time? How can you know which tree matches reality?
