Machine Learning Tools: (Scherf Et. Al. 2005)

Bioinformatics and Computational Biology

understand protein function. Techniques available to predict the structure depend on the level of similarity to previously determined protein structures. In homology modeling, for instance, once the structure of a homologous protein is known, its structural information is used as a template to predict the structure of the target protein. The state of the art in protein structure prediction is regularly gauged at the CASP (Critical Assessment of Techniques for Protein Structure Prediction, http://predictioncenter.org/) meeting, where leading protein structure prediction groups predict the structure of proteins that are experimentally verified.
6. Comparative genomics and phylogenetic analysis. In this field, the main goal is to identify similarities and differences between sequences in different organisms, in order to trace the evolutionary processes that have occurred in the divergence of the genomes. In bacteria, the study of many different species can be used to identify virulence genes (Raskin et al. 2006). Sequence analysis commonly relies on evolutionary information about a gene or gene family (e.g. sites within a gene that are highly conserved over time imply a functional importance of that site). The complexity of genome evolution poses many exciting challenges to developers of mathematical models and algorithms, who have developed novel algorithmic, statistical and mathematical techniques, ranging from exact and heuristic methods to fixed-parameter and approximation algorithms. In particular, the use of probabilistic models, such as Markov Chain Monte Carlo algorithms and Bayesian analysis, has demonstrated good results.
7. Text mining. The number of published scientific articles is rapidly increasing, making it difficult to keep up without using automated approaches based on text mining (Scherf et al. 2005). The extraction of biological knowledge from unstructured free text is a highly complex task requiring knowledge from linguistics, machine learning and the use of experts in the subject field.
8. Systems biology. The integration of large amounts of data from complementary methods is a major issue in bioinformatics. Combining datasets from many different sources makes it possible to construct networks of genes and interactions between genes. Systems biology incorporates disciplines such as statistics, graph theory and network analysis. An applied approach to the methods of systems biology is presented in detail by Winzeler (2006).

Biological systems are complex and constantly in a state of flux, making the measurement of these systems difficult. Data used in bioinformatics today are inherently noisy and incomplete. The increasing integration of data from different experiments adds to this noise. Such datasets require the application of sophisticated machine learning tools that are able to work with noisy, incomplete data.

Machine Learning Tools

A complete suite of machine learning tools is freely available to the researcher. In particular, it is worth noting the extended use of classification and regression trees (Breiman, 1984), Bayesian learning, neural networks (Baldi, 1998), Profile Markov models (Durbin et al. 1998), rule-based strategies (Witten, 2005), or kernel methods (Schölkopf, 2004) in bioinformatics applications. For the interested reader, the software Weka (http://www.cs.waikato.ac.nz/ml/weka/, Witten, 2005) provides an extensive collection of machine learning algorithms for data mining tasks in general, and for bioinformatics in particular. Specifically, it includes implementations of well-known algorithms such as trees, Bayes learners, supervised and unsupervised classifiers, statistical and advanced regression, splines and visualization tools.

Bioinformatics Resources

The explosion of bioinformatics in recent years has led to a wealth of databases, prediction tools and utilities, and Nucleic Acids Research dedicates an issue to databases and web servers annually. Online resources are clustered around the large genome centers such as the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov/) and the European Molecular Biology Laboratory (EMBL, http://www.embl.org/), which provide a multitude of public databases and services. For a useful overview of available links see “The Bioinformatics Links Directory” (http://bioinformatics.ubc.ca/resources/links_directory/, Fox et al. 2005) or http://www.cbs.dtu.dk/biolinks/.
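
The remark under comparative genomics that highly conserved sites suggest functional importance can be illustrated with a small sketch. It scores each column of a multiple sequence alignment by its Shannon entropy and reports the columns that show no variation. The function names, the entropy threshold and the toy alignment are assumptions made for illustration only; they do not come from the cited works, and real analyses would also need to handle gaps, sequence weighting and phylogeny.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def conserved_sites(alignment, max_entropy=0.0):
    """Return 0-based indices of columns whose entropy is <= max_entropy.

    `alignment` is a list of equal-length strings, one per organism.
    The default threshold of 0.0 keeps only fully invariant columns.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    columns = zip(*alignment)  # transpose rows (sequences) into columns (sites)
    return [i for i, col in enumerate(columns)
            if column_entropy(col) <= max_entropy]


# Hypothetical toy alignment (made-up sequences, not real data).
toy_alignment = ["ACGTAC", "ACGAAC", "ACCTAC", "ACGTTC"]
print(conserved_sites(toy_alignment))
```

Raising `max_entropy` above zero would also admit sites that vary only slightly, which is one simple way to trade strictness against sensitivity.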
