V. Curcin, M. Ghanem
Department of Computing
Imperial College London
180 Queen’s Gate, London SW7 2AZ
Email: vc100@doc.ic.ac.uk, mmg@doc.ic.ac.uk
Abstract: The past decade has witnessed a growing trend in designing and using workflow systems with a focus on supporting the scientific research process in bioinformatics and other areas of life sciences. The aim of these systems is mainly to simplify access, control and orchestration of remote distributed scientific data sets using remote computational resources, such as EBI web services. In this paper we present the state of the art in the field by reviewing six such systems: Discovery Net, Taverna, Triana, Kepler, YAWL and BPEL. We provide a high-level framework for comparing the systems based on their control flow and data flow properties, with a view to both informing future research in the area by academic researchers and facilitating the selection of the most appropriate system for a specific application task by practitioners.

I. INTRODUCTION

[Fig. 1. Workflow example. Node labels: Web source, RDBMS, Integration, Staging, Pre-processing (data-parallel), Training set, Test set, Modelling, Validation, Storage.]

[...] workflow system (scientific or non-scientific) can be relied on to cover the scope of requirements from different domains.

This paper approaches the problem by analysing leading scientific and non-scientific workflow systems, exposing their handling of control and data constructs, with a view to informing future research and facilitating the selection of the most appropriate system for a specific application task.

As a start, the Discovery Net system [1] will be presented to illustrate the architectural and implementation complexity associated with a full workflow system. Then, three other main scientific workflow systems, Taverna [2], Triana [3] and Kepler [4], will be described, followed by two workflow languages that aim to be a generic solution across both business and scientific domains. The first of those, YAWL [5], is a theoretical workflow system based on the Petri Net paradigm. It has been designed to satisfy the full set of workflow patterns, under the assumption that this will satisfy the needs of both communities. The second, BPEL [6], is the accepted standard for business process orchestration, with several attempts being made to adapt it for use in scientific settings, most notably by the OMII initiative [7].
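To make the Petri Net paradigm mentioned above concrete, the following is a minimal sketch of a place/transition net with the standard firing rule, used to model a parallel split followed by a synchronising join. It is an illustration only: the class `PetriNet`, the helper `build_split_join_net` and all place and transition names are hypothetical, unit arc weights are assumed, and the code does not reproduce YAWL's own language or API, which extends plain Petri nets with further constructs.

```python
from dataclasses import dataclass, field


@dataclass
class PetriNet:
    """Place/transition net; the marking maps each place to its token count."""
    marking: dict[str, int]
    # transition name -> (input places, output places)
    # Assumption: every arc has weight 1 and input places are distinct.
    transitions: dict[str, tuple[list[str], list[str]]] = field(default_factory=dict)

    def enabled(self, name: str) -> bool:
        """A transition is enabled when every input place holds a token."""
        inputs, _ = self.transitions[name]
        return all(self.marking.get(place, 0) >= 1 for place in inputs)

    def fire(self, name: str) -> None:
        """Consume one token per input place, produce one per output place."""
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for place in inputs:
            self.marking[place] -= 1
        for place in outputs:
            self.marking[place] = self.marking.get(place, 0) + 1


def build_split_join_net() -> PetriNet:
    """Hypothetical net: one split, two parallel tasks, one synchronising join."""
    return PetriNet(
        marking={"start": 1},
        transitions={
            "split": (["start"], ["a_ready", "b_ready"]),
            "task_a": (["a_ready"], ["a_done"]),
            "task_b": (["b_ready"], ["b_done"]),
            "join": (["a_done", "b_done"], ["end"]),
        },
    )


if __name__ == "__main__":
    net = build_split_join_net()
    for step in ["split", "task_a", "task_b", "join"]:
        net.fire(step)
    print(net.marking)
```

In this sketch the `join` transition cannot fire until both branches have completed, which is how a Petri net expresses synchronisation of parallel control flow.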
[Diagram labels: Workflow server; Collaboration; Data management; Resource integration; Workflow management.]

Fig. 3. Structure of a component in Discovery Net