
Bioinformatics and Computational Biology

In addition to the use of online tools for analysis, it is common practice for research groups to access these databases and utilities locally. Different communities of bioinformatics programmers have set up free and open source projects such as EMBOSS (http://emboss.sourceforge.net/, Rice et al., 2000), Bioconductor (http://www.bioconductor.org), BioPerl (http://www.bioperl.org, Stajich et al., 2002), BioPython (http://www.biopython.org), and BioJava (http://www.biojava.org), which develop and distribute shared programming tools and objects. Several integrated software tools are also available, such as Taverna for bioinformatics workflow and distributed systems (http://taverna.sourceforge.net/), or Quantum 3.1 for drug discovery (http://www.q-pharm.com/home/contents/drug_d/soft).

FUTURE TRENDS

We have identified a broad set of problems and tools used within bioinformatics. In this section we focus specifically on the future of machine learning within bioinformatics. While a large set of data mining and machine learning tools has captured attention in the field of bioinformatics, it is worth noting that in application fields where a similarity measure has to be built, whether to classify, cluster, predict or estimate, the use of support vector machines (SVM) and kernel methods (KM) is increasingly popular. This has been especially significant in computational biology, due to performance in real-world applications, strong modularity, mathematical elegance and convenience. Applications are encountered in a wide range of problems, from the classification of tumors to the automatic annotation of proteins. Their ability to work with high-dimensional data and to process and efficiently integrate non-vectorial data (strings and images), along with a natural and consistent mathematical framework, makes them well suited to various problems arising in computational biology.

Since the early papers using SVM in bioinformatics (Mukherjee et al., 1998; Jaakkola et al., 1999), the application of these methods has grown exponentially. More than a mere application of well-established methods to new datasets, the use of kernel methods in computational biology has been accompanied by new developments to match the specificities and the needs of bioinformatics, such as methods for feature selection in combination with the classification of high-dimensional data, the introduction of string kernels to process biological sequences, or the development of methods to learn from several kernels simultaneously (‘composite kernels’). The reader can find references in the book by Schölkopf et al. (2004), and a comprehensive introduction in Vert (2006). Several kernels for structured data, such as sequences, trees or graphs, widely developed and used in computational biology, are also presented in detail by Shawe-Taylor and Cristianini (2004).

In the near future, developments in transductive learning (improving learning through exploiting unlabeled samples), refinements in feature selection and string-based algorithms, and the design of biologically based kernels will take place. Certainly, many computational biology tasks are transductive and/or can be specified through optimizing similarity criteria on strings. In addition, exploiting the nature of these learning tasks through engineered and problem-dependent kernels may lead to improved performance in many biological domains. In conclusion, the development of new methods for specific problems under the paradigm of kernel methods is gaining popularity, and how far these methods can be extended is an unanswered question.

CONCLUSION

In this chapter, we have given an overview of the field of bioinformatics and computational biology, with its needs and demands. We have presented the main research topics and pointed out common tools and software to tackle existing problems. We noted that the paradigm of kernel methods can be useful to develop a consistent formalism for a number of these problems. The application of advanced machine learning techniques to solve problems based on biological data will surely bring greater insight into the functionality of the human body. What the future will bring is unknown, but the challenging questions answered so far and the set of new and powerful methods developed ensure exciting results in the near future.

REFERENCES

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J. (1990) Basic local alignment search tool. J Mol Biol. Oct 5;215(3):403-10.


