PII: S2214-5796(16)30041-7
DOI: http://dx.doi.org/10.1016/j.bdr.2017.01.004
Reference: BDR 56
Please cite this article in press as: I.I. Yusuf et al., Chiminey: Connecting Scientists to HPC, Cloud and Big Data, Big Data Res. (2017),
http://dx.doi.org/10.1016/j.bdr.2017.01.004
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing
this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is
published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all
legal disclaimers that apply to the journal pertain.
Chiminey:
Connecting Scientists to HPC, Cloud and Big Data
Iman I. Yusuf^a, Ian E. Thomas^a, Maria Spichkova^b, Heinz W. Schmidt^b
^a RMIT University, eResearch Office, 17-23 Lygon Street, 3053 Carlton, Australia
^b RMIT University, School of Science, 124 La Trobe Street, 3000 Melbourne, Australia
Abstract
The enabling of scientific experiments increasingly includes data, software, computational and simulation elements that are often embarrassingly parallel, long running and data-intensive. Frequently, such experiments are run in a cloud environment or on high-end clusters and supercomputers. Many disciplines in science and engineering (outside computer science) find the requisite computational skills attractive on the one hand but a distraction from their science domain on the other. We developed Chiminey, under the direction of quantum physicists and molecular biologists, to ease the steep learning curve in data management and software platforms required for these complex computational target systems. Chiminey is a smart connector that mediates the running of specialist algorithms, originally developed for workstations with moderately large data sets and relatively little computational grunt, on larger target systems. The connector allows domain scientists to choose the target platform and then manages it automatically; it accepts all the necessary parameters to run many instances of their program, regardless of whether it runs on a peak supercomputer, a commercial cloud like Amazon EC2 or (in Australia) NeCTAR, the national federated university cloud system. Chiminey negotiates with target-system schedulers, dashboards and databases, and provides an easy-to-use dashboard interface to the running jobs, regardless of the specific target platform. The smart connector encapsulates and virtualises a number of further aspects that the domain scientists directing our effort found necessary or desirable.
In this article we present Chiminey and guide the reader through a hands-on tutorial of this open-source platform. The only requirement is that the reader has access to one of the supported cloud or cluster platforms, and very likely there is a matching one. The tutorial stages range in difficulty from requiring little or no technical background through to advanced sections, such as programming your own domain-specific extension on top of the Chiminey application programmer interfaces.
The exercises we demonstrate include: installing the Docker deployment environment and the Chiminey system; registering resources for file stores, Hadoop MapReduce and cloud virtual machines; activating the hrmclite and wordcount smart connectors, two demonstrators; running a smart connector and inspecting the resulting output files; and building a new smart connector. We also discuss briefly where to find more detailed information on, and what is involved in, contributing to the Chiminey open-source code base.
Keywords: Big data, cloud, e-science, high performance computing,
parallel processing, scientific computing, service computing, simulation
1. Introduction
In this article, we present the Chiminey platform, which provides a reliable computing and data management service. Chiminey enables domain scientists (hereafter, scientists) to compute on cloud-based, big data and high-performance computing (HPC) facilities, handle failure during the execution of applications, curate and visualise execution outputs, share such data with collaborators or the public, and search for publicly available data, all without needing a technical understanding of cloud computing, HPC, fault tolerance or data management. Many scientific experiments face a twofold challenge: they are demanding domain-specific research tasks (e.g., a complicated analysis in quantum physics), and at the same time the corresponding computations and datasets are too large to run on a local desktop machine, i.e., cloud-based and high-performance computing (HPC) solutions are required.
Any new technology usually brings not only new opportunities but also new challenges, as a technology often imposes an initial knowledge-acquisition task on its users. Cloud computing [1] enables acquisition of very large
model of a cloud-based platform and the latest version of its open-source implementation [4], with emphasis on usability and reliability aspects. The feasibility of the Chiminey platform is shown using case studies from the Theoretical Chemical and Quantum Physics group at RMIT University.
Outline: The rest of the article is organised as follows. Section 2 provides background information and links or contrasts our work with related work. Section 3 introduces one of the core artifacts of Chiminey, Smart Connectors, as well as the resources the platform provides. Section 4 presents the tutorial, targeting the different types of Chiminey users. Section 5 concludes the article and presents the core directions of our future work on Chiminey.
2. Background
In 2009, Leavitt, in his widely cited^1 paper [5], analysed advantages and challenges related to cloud computing, highlighting that this type of deployment architecture was becoming appealing to many companies. Now, almost 8 years later, this paradigm has become more appealing still. Another widely cited^2 paper on the cloud computing paradigm [6] presents a survey by Zhang et al. The survey highlights the key concepts of cloud computing, its architectural principles and state-of-the-art implementations, as well as research challenges.
Cloud computing provides many benefits, e.g., provisioning of virtual machines (VMs) within literally 15 minutes, where purchases of physical servers once took days or weeks; access to online storage and computing resources at a moment's notice; cost savings by turning virtual servers, and hence the charges for them, on and off at will; and, not least, improved resource utilisation across large numbers of users in one or more data centres.
However, failure in cloud services is arguably inevitable, due to configuration errors, continuous upgrades somewhere in the cloud software stack or application layers, the unreliability of the networks that remote services depend on, and thus, more generally, the heterogeneous character of widely distributed systems. Yusuf and Schmidt [7] have shown in formal reliability and performance studies that fault tolerance is best achieved by reflecting the static and dynamic (behavioural) architecture of high-performance computational programs. Compared to architecture-agnostic replication, architecture-aware
^1 More than 500 citations.
^2 More than 600 citations.
fault tolerance can achieve higher reliability at lower cost, but it needs to be tuned to different architectural/behavioural patterns such as stream processing, map-reduce, randomised access, etc.
The development of formal models and architectures for systems involved in cloud computing is a more recent area of systems engineering. Vaquero et al. [8] studied more than 20 definitions of the term cloud computing to extract a consensus definition, as well as a minimum definition containing the essential characteristics. As a result, they consolidated the following definition:
Clouds are a large pool of easily usable and accessible virtualized resources (such as hardware, development platforms and/or services). These resources can be dynamically reconfigured to adjust to a variable load (scale), allowing also for an optimum resource utilization. This pool of resources is typically exploited by a pay-per-use model in which guarantees are offered by the Infrastructure Provider by means of customized Service-Level Agreements.
Buyya and Sulistio [9] presented GridSim, a discrete-event grid simulation toolkit that can be used for investigating the design of utility-oriented computing systems such as data centres and grids.
Ostermann et al. [10] posed the research question of whether the performance of clouds is sufficient for scientific computing. They analysed the performance of the Amazon EC2 platform using micro-benchmarks and kernels, and concluded that the performance and reliability of the tested cloud were low, and probably insufficient for scientific computing at large.
As cloud-based systems deal with safety- and security-critical data, the formal modelling and verification of cloud architectures becomes more and more important. Su et al. used the CSP framework to model the MapReduce system, cf. [11]. Reddy et al. [12] proposed an approach to verify the correctness of Hadoop systems (an open-source implementation of MapReduce) using model-checking techniques. Our previous work on a formal model of the Chiminey system was presented in [3, 4].
Several works have proposed or compared different map-reduce approaches for cloud computing; others address data stream processing systems, and yet others parametric parallel solvers using special numeric packages or Monte Carlo walks distributed over many VMs. For example, Martinaitis et al. [13] introduced an approach towards component-based stream processing in clouds. Kuntschke and Kemper [14] presented work on data stream sharing.
For scientific computing, it is crucial to allow researchers to build their own workflows. Different types of scientific workflow systems are designed to provide this functionality. Oinn et al. [15] presented the Taverna Workbench for the composition and execution of workflows for the life sciences community. Taverna enables users to interoperate services, but does not support the semantic integration of the data outcomes of these services. Afgan et al. [16] introduced Galaxy Cloud, which provides an interface with automated management of cloud computing resources and has been used to conduct biomedical experiments. Buyya et al. [17] presented Nimrod, a software infrastructure for executing large and complex computations. Nimrod contains a simple language for describing sweeps over a parameter space and the input and output of data for processing. Nimrod is compatible with the Kepler system [18], such that users can set up complex computational workflows and have them executed without having to interface directly with a high-performance computing system.
The contribution of the work presented in this paper is that our platform provides drop-in components, so-called Smart Connectors (SCs), for existing workflow engines, together with user-defined control of fault tolerance: (i) researchers can utilise and adapt existing Smart Connectors; (ii) the functionality of the target schedulers, workflow engines or middleware platforms is abstracted but not duplicated; (iii) the target of an SC can be high-performance clusters or clouds; and (iv) new types of Smart Connectors can be developed with little effort within the framework if necessary. To the best of our knowledge, there is no other framework with these advantages. SCs are geared towards providing flexibility and power underneath simplicity.
3. Chiminey
The Chiminey platform was implemented as a part of the Bioscience Data Platform project [19], an agile software collaboration between software engineering researchers and applied natural sciences researchers in quantum physics (nanomaterials) and computational biology (crystallography studies). It was important to support these sciences while minimising the prior knowledge of cloud and cluster computing they require, to ease the use of parallel computing within the virtual laboratory development that provided the context for our Chiminey-related grant.
Figure 1: Reference architecture of the Chiminey platform. A Chiminey cluster comprises a web front end, Smart Connectors, a Data Manager and Chiminey scripts; it submits and monitors jobs, notifies status by email, and communicates with external components such as storage, MyTardis, HPC facilities, instruments (microscopes, synchrotron, etc.) and research repositories. The legend distinguishes communications among Chiminey components, between Chiminey and external components, and among external components.
Python was selected as the development language due to its rapid prototyping features, its integration with the MyTardis data curation system^3, and its increasing uptake by scientists as a scientific software development language. However, the domain-specific calculations can be written in any language; the choice depends on the domain and the concrete research task.
The reference architecture of the platform is presented in Figure 1. In our implementation, the data can be sent to MyTardis [20], an application for cataloguing, managing and assisting the sharing of large scientific datasets privately and securely over the web. MyTardis is currently used across Australian universities in the collaborative characterisation of biomedical or advanced materials and structures at the nano- and microscale.
Configuration parameters are provided by users through a browser interface using web forms, prior to execution. The forms include:
1. information for computation-specific plugins to implement sequential user algorithms;
2. cloud storage and compute resource specifications, in particular the number and type of virtual machines (or processes on a cluster);
3. any parallel-pattern-specific parameters that are required to coordinate compute sweeps on behalf of the user;
4. fault tolerance parameters to support predefined fault tolerance policies such as replication and restart of VMs and of processes in VMs;
5. data source and sink parameters, if data is fetched from or transferred to repositories outside the cloud.

^3 http://mytardis.org/
3.1.1. Stages
For both SaaS and PaaS users, the key concept is a Chiminey computational stage. Each stage is a unit of schedulable computation within Chiminey. A smart connector is composed of stages, each with a unique functionality. For the SaaS user, stages define an underlying workflow that the user can control via the Chiminey configuration panels. For the PaaS user, stages can be extended by scripting, by method redefinition, by adding functionality and configuration options, or by prefilling configuration options that the SaaS user of the extended functionality does not have to deal with.
Stages are implemented as Python classes with the following elements:
post-condition This is where the new state of the smart connector job is written to persistent storage upon the successful completion of a stage execution. During the execution of a stage, the state of a smart connector job changes; this change is saved via the output(self, ...) method.
Classes with these elements can act as stages that can be connected to form smart connectors. The Chiminey system provides a library of core stages that can be used to create smart connectors following well-known computational patterns, and a PaaS user can write additional stages or specialise the existing ones to implement additional behaviour.
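As an illustration only, a stage can be pictured as a Python class with a trigger check, a processing step and an output post-condition. The hook names below (is_triggered, process) are assumptions made for this sketch, not Chiminey's actual API; only the output method is named in the text above.

```python
# Hypothetical sketch of a Chiminey-style stage -- NOT the platform's actual
# API. The hook names is_triggered and process are assumptions for
# illustration; consult the Chiminey documentation for the real interface.

class Stage:
    """A unit of schedulable computation within a smart connector."""

    def is_triggered(self, run_settings):
        """Decide whether this stage should run for the current job state."""
        raise NotImplementedError

    def process(self, run_settings):
        """Perform the stage's unique functionality."""
        raise NotImplementedError

    def output(self, run_settings):
        """Post-condition: persist the job's new state (here, just a dict)."""
        raise NotImplementedError


class CountLinesStage(Stage):
    """Toy stage: counts the lines of an input text and records the result."""

    def is_triggered(self, run_settings):
        return "input_text" in run_settings and "line_count" not in run_settings

    def process(self, run_settings):
        self._count = len(run_settings["input_text"].splitlines())

    def output(self, run_settings):
        run_settings["line_count"] = self._count
        return run_settings
```

A stage like this would be scheduled repeatedly: whenever its trigger condition holds, the processing step runs and the output method commits the new job state.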
The provided suite of stages of a smart connector supports a set of key phases of computation:
1. Data analysis: determining initial inputs for computations, including algorithm parameters and compute and storage resources.
2. Execution environment setup: creating compute resources (if applicable) and configuring them (e.g., bootstrapping software requirements).
3. Computation: scheduling of computations onto VMs, execution, and then waiting for output.
4. Output transfer: transferring data into designated storage and/or data curation systems like MyTardis.
5. Cleanup: decommissioning of allocated VM resources (if applicable).
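The five phases above can be pictured as an ordered composition of stages. The runner below is a deliberately simplified toy of our own, not Chiminey's execution engine, which additionally handles parallelism, retries and persistent state:

```python
# Toy sketch: a smart connector as an ordered list of (phase name, stage
# function) pairs, threading a job-state dictionary through each phase.
# Illustrative only; Chiminey's real scheduler is considerably richer.

def run_phases(phases, state):
    """Run each phase in order, passing the evolving job state along."""
    for name, stage in phases:
        state = stage(dict(state))  # each stage returns the updated state
    return state

phases = [
    ("data analysis",     lambda s: {**s, "params": {"n": 10}}),
    ("environment setup", lambda s: {**s, "resources_ready": True}),
    ("computation",       lambda s: {**s, "output": list(range(s["params"]["n"]))}),
    ("output transfer",   lambda s: {**s, "transferred": True}),
    ("cleanup",           lambda s: {**s, "resources_ready": False}),
]

final_state = run_phases(phases, {})
```

The important design point is that each phase only sees and extends the job state handed to it, which is what lets phases be swapped or specialised independently.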
3.1.2. Payload
For some specific types of stages, there are hooks that allow arbitrary packages of domain-specific executables and data files to be processed within the system. This allows the same smart connector to be parameterised on the specific executable task. A payload is a set of system files and, optionally, domain-specific files that are needed for the correct execution of a smart connector. The system files are composed of Makefiles and bash scripts, while the domain-specific files are developer-provided executables. The system files enable the Chiminey server to set up the execution environment, execute domain-specific programs, and monitor the progress of setup and execution.
3.2. Resources
For the execution of a smart connector, external entities, both computational and storage, are registered within the system as compute and storage resources, respectively. Compute resources are used for the execution of tasks; examples include Unix hosts, Jenkins servers, PBS cluster head nodes and cloud nodes. Storage resources are Unix filesystems, either local or remote, which can store directories and files both as sources of data and as sinks for computation results.
3.3. Architecture
The Chiminey architecture relies on Docker, an automated software deployment tool [21] for software platforms and applications in the cloud. While VMs can be transported from cloud to cloud, extended and specialised, VM images are large, and they are time-consuming to start up when many software platforms run inside the VM. Docker sits somewhere between managing entire VMs and managing a single software package by sharing its source code in some open repository. To this end, Docker introduces the notion of containers. Just as the containers on a ship share the same ship, Docker containers share the same operating system kernel, the same file system and disks, etc., ultimately the same VM or type of VM. Thus Docker containers start up almost instantly and use fewer resources, and container layers can be added as needed. These containers are based on open standards and middleware platforms that run on all major Linux distributions, on Windows, and generally on top of any infrastructure.
The Chiminey system is a composition of Docker containers with specialised functions, including front-end portals (a Django MVC web framework), databases (PostgreSQL), task schedulers (Celery), a task queue (Redis) and multiple worker containers that execute jobs (Celery workers).
The basic installation is configured to run on a single container node VM,
though more sophisticated architectures can be deployed by using a multi-
node container orchestration tool such as Google Kubernetes, Docker Swarm,
or Mesos Marathon.
• Deployment
These activities install and deploy the Chiminey system in a standard configuration:
1. Container infrastructure — the container framework for deployment of the Chiminey components.
2. Chiminey deployment — including containers for the portal, workers and databases.
3. Configuration of users — creation of accounts for users of Chiminey.
4. Registration of smart connectors — enabling pre-existing smart connectors within the system.
• Usage
These activities show the operation of the running Chiminey system:
1. Registration of resources — identifying local and remote storage and computation resources and providing identifying handles within Chiminey.
2. Creation of jobs — registration of new executions by identifying resources and key parameters.
3. Creation of new smart connectors — extending existing and building new smart connectors.
4.1.1. Docker
Purpose. In this section, you will create a virtual machine (VM) to run Docker, on either Mac or Windows. Refer to the Docker manuals [21] if you have a Linux OS.^4

^4 This exercise is inspired by the official Docker website docs [21].
• Run docker engine.
$ docker run hello-world
Expected output. You will see a message similar to the one below.
Unable to find image ’hello-world:latest’ locally
latest: Pulling from library/hello-world
03f4658f8b78: Pull complete
a3ed95caeb02: Pull complete
Digest: sha256:8be990ef2aeb16dbcb92...
Status: Downloaded newer image for hello-world:latest
• Run docker-compose
$ docker-compose --version
4.1.2. Chiminey Deployment
Purpose. In this section, you will deploy a Chiminey platform.
Expected output.
DOCKER_HOST=tcp://IP:port
E.g., DOCKER_HOST=tcp://192.168.99.100:2376
• Open a browser and visit the Chiminey portal at IP, in our ex-
ample, http://192.168.99.100.
Expected output. After a while, the Chiminey portal will be shown.
Expected output. You will be redirected to a webpage that displays a list of jobs. Since no jobs have been run yet, the list is empty.
The syntax to add any of the smart connectors that are included with the platform is $ ./activatesc smart-connector-name.
3. Verify the smart connector is successfully activated.
• Open a browser and visit the Chiminey portal.
• Login with your regular username and password.
• Click Create Job.
Expected output. hrmclite will appear under the Smart Connectors list.
(a) Open a browser and visit the Chiminey portal
(b) Login with your credentials
Expected output. The new resource will be displayed under the HPC - Cluster
or Standalone Server list.
Expected output. The newly added resource will be displayed under the Re-
mote File System list.
Expected output. The new resource will be displayed under the Analytics -
Hadoop MapReduce list.
Exercise 10. Here, we focus on updating registered resources.
1. Click Settings.
2. From the Settings menu, depending on which resource you wish to up-
date, click either Compute Resource or Storage Resource. All registered
resources will be listed.
3. Locate the resource you wish to update, then click Update.
4. Make the changes, and when finished click Update.
Expected output. The resource will be listed with its new details.
• Presets: The end-user can save the set of parameter values of a job as a preset. Each preset must have a unique name. Using the unique preset name, the end-user can retrieve, update and delete saved presets.
• Compute resource: This section includes the parameters that are needed to utilise the compute resource associated with the given SC. Hadoop compute resources need only the name of the registered Hadoop cluster (see Exercise 9), while a cloud compute resource needs the resource name as well as the total number of VMs that can be used for the computation. Note that the names of all registered compute resources are automatically populated into a dropdown menu on the submission form.
Figure 2: Job Submission UI for wordcount SC
• Reliability parameters: reschedule failed processes and maximum retries.
• Data curation resource: This section provides the parameters that are needed to curate the output of an SC job. The section includes a dropdown menu that is populated with the names of registered data curation services like MyTardis.
5. Submit the job. Before submitting it, you can save the parameter values as a preset.
Expected output. You will be redirected to the job monitoring page with the
new job listed at the top.
The Jobs page also allows researchers to terminate submitted jobs. To terminate a job, check the box at the end of the status summary of the job, then click the Terminate selected jobs button at the end of the page. The termination of the selected jobs will be scheduled. Depending on the current activity of each job, terminating one job may take longer than another.
When an SC job is completed, log in to your storage resource. The output is located in the offset directory under the root path of the resource.
Payload
A payload has the following structure:
payload_name
    bootstrap.sh
    process_payload
        main.sh
        schedule.sh
        domain-specific executables
The names of the files and directories under payload_name, except the domain-specific ones, cannot be changed. bootstrap.sh includes instructions to install the packages needed by the SC job on the compute resource. schedule.sh is needed to add process-specific configurations: some SCs spawn multiple processes to complete a single job, and if each process needs to be configured differently, the instructions on how to configure each process should be recorded in schedule.sh. main.sh runs the core functionality of the SC and writes the output to a file. The domain-specific executables are additional files that are needed by main.sh.
Not all SC jobs require new packages to be installed, process-level configuration or additional domain-specific executables. In such cases, the minimal payload, as shown below, can be used.
payload_name
    process_payload
        main.sh
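As an aside, scaffolding such a minimal payload is just a couple of directory and file operations; a small helper of our own (not part of Chiminey) could do it like this:

```python
# Hypothetical helper that scaffolds the minimal payload layout described
# above (payload_<name>/process_payload/main.sh). Not part of Chiminey.
from pathlib import Path

def scaffold_minimal_payload(base_dir, name, main_sh_body):
    """Create payload_<name>/process_payload/main.sh under base_dir."""
    process_dir = Path(base_dir) / f"payload_{name}" / "process_payload"
    process_dir.mkdir(parents=True, exist_ok=True)
    main_sh = process_dir / "main.sh"
    main_sh.write_text(main_sh_body)
    main_sh.chmod(0o755)  # the script must be executable on the compute resource
    return main_sh

main_sh = scaffold_minimal_payload(
    "/tmp", "randnum",
    "OUTPUT_DIR=$1\necho $RANDOM > $OUTPUT_DIR/signed_randnum\n")
```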
payload_randnum
    process_payload
        main.sh

OUTPUT_DIR=$1
echo $RANDOM > $OUTPUT_DIR/signed_randnum
^5 chiminey.initialisation.coreinitial.CoreInitial
date > $OUTPUT_DIR/signed_randnum
class RandNumInitial(CoreInitial):
def get_ui_schema_namespace(self):
schemas = [
settings.INPUT_FIELDS[’unix’],
settings.INPUT_FIELDS[’output_location’],
]
return schemas
# ---EOF ---
Registration
The final step is registering the randnum SC with the Chiminey server. The details of this SC must be added to the dictionary SMART_CONNECTORS in chiminey/settings_changeme.py. The details include a unique name (with no spaces), a Python path to the RandNumInitial class, a description of the SC, and the absolute path to the payload.^6
"randnum": {
"name": "randnum",
"init": "chiminey.randnum.initialise.RandNumInitial",
"description": "Randnum generator, with timestamp",
"payload": "/opt/chiminey/current/payload_randnum"
},
^6 The SC package must be under the /opt/chiminey/current/chiminey/ hierarchy.
Restart the Chiminey server and then activate the randnum SC.
$ sh restart
$ ./activatesc randnum
Expected output. The randnum SC will appear under the Smart Connector list of the Chiminey portal (see Exercise 5).
Now we are ready to submit the randnum job (see Figure 4). We only need to select the compute resource name and provide the storage resource for transferring the output of the computation. Recall that storage resource name/offset is a location (see Section 4.2.3). Suppose we have already registered a storage resource with the name mystor and a root path of /root/chiminey_home, and let the offset be randomnumber. We set the output location to mystor/randomnumber. Submit the job and, when the job is completed, log in to your storage resource and check the contents of randomnumber under the root path of the resource, in the case of mystor, /root/chiminey_home.
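This resolution of an output location to a concrete path follows directly from the resource's registered root path. The sketch below is ours, not Chiminey code; the platform performs this mapping internally, and the registry dictionary is an assumption mirroring the resource registered above:

```python
# Hypothetical sketch of output-location resolution. The registry mirrors
# the storage resource from the tutorial (name: mystor, root path:
# /root/chiminey_home); Chiminey itself performs this mapping internally.
from pathlib import PurePosixPath

STORAGE_RESOURCES = {"mystor": "/root/chiminey_home"}

def resolve_output_location(location):
    """Split 'resource/offset' and join the offset onto the resource's root."""
    resource, _, offset = location.partition("/")
    return str(PurePosixPath(STORAGE_RESOURCES[resource]) / offset)

resolved = resolve_output_location("mystor/randomnumber")
# resolved == "/root/chiminey_home/randomnumber"
```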
Reflections
In this exercise, we have shown how to create a smart connector. Even though we have used a simple random number generator, the tasks involved for other programs are similar. If a program can be executed on a cloud, a cluster or Hadoop, then it can be packaged as a smart connector. The huge benefit of using the Chiminey server to run your program is that you don't need to worry about how to manage the execution of your program on any of the provided compute resources. For instance, if we want to generate random numbers on a cloud VM, we need to change only one word in the get_ui_schema_namespace method: replace unix with cloud. Then restart Chiminey, and activate your cloud-based random number generator. We encourage the reader to check the examples in the Chiminey documentation [23]. They show how to create various types of smart connectors: cloud-based, Hadoop-based, sweeps, reliability, and data curation resources.
5. Summary
In this article we surveyed research methods and tools that lower the barriers to accessing high-performance computing, big data and cloud resources for domain scientists. We introduced Chiminey, a platform that evolved in the bioscience and material science communities, with our e-science software research perspective complementing the scientific methods in these disciplines. The article summarised prior and current research in parallel and distributed software architecture and fault tolerance underpinning Chiminey. We briefly mentioned various uses of the platform since its inception but focussed largely on its architecture and a simplified conceptual view of, and usage for, scientific workflow automation, when big data is at the centre of experiments, simulation and computation, and when computational resources include high-performance computing and/or cloud computing.
A large second part of the article is dedicated to hands-on tutorials demonstrating the Chiminey system through a number of different usage scenarios and stages in the science workflow. We demonstrated:
is licensed under a New BSD license, and scientists are invited to contribute their new stages and connectors to the core library of functionality (see http://chiminey.net) for others to utilise and extend.
Beyond its initial purpose, Chiminey has been used by further domain scientists, for example in engineering disciplines for testing software-intensive engineered systems, and for ab initio quantum physics simulation.
Acknowledgement
The Bioscience Data Platform project acknowledges funding from the
NeCTAR project No. 2179.
[5] N. Leavitt, Is cloud computing really ready for prime time?, Computer 42 (2009) 15–20.
[7] I. Yusuf, H. Schmidt, Parameterised architectural patterns for providing cloud service fault tolerance with accurate costings, in: Proc. of the 16th Int. ACM SIGSOFT Symp. on Component-Based Software Engineering, pp. 121–130.
[11] W. Su, F. Yang, H. Zhu, Q. Li, Modeling MapReduce with CSP, in: 3rd IEEE International Symposium on Theoretical Aspects of Software Engineering (TASE 2009), pp. 301–302.
[15] T. Oinn, M. Greenwood, M. Addis, et al., Taverna: Lessons in Creating a Workflow Environment for the Life Sciences, Concurrency and Computation: Practice and Experience 18 (2006) 1067–1100.
[19] NeCTAR, The National eResearch Collaboration Tools and Resources, 2015. http://www.nectar.org.au/.
2) Reviewer 1 asks for an overview of the overall workflow before diving into the hands-on tutorial elements. We expanded the section on Chiminey stages, a concept that allows overall workflows to be organised hierarchically and sequentially while permitting parallelism and, in fact, complex computational patterns at a further level of refinement.
3) We also revised the tutorial to emphasise and separate deployment from the different usages of the method/tool.
We have not eliminated the tutorial, but would like to point out that this special BDR issue explicitly encouraged tutorial papers and in fact even has the term 'tutorials' in its title.
Reviewer 1 clearly sees the value in a tutorial, and specifically in methods and tools that support acceleration of the scientific method in non-CS sciences by bringing CS research to bear.
We hope that the research content of this revised paper meets the stringent requirements of original multi-disciplinary research expected for BDR generally, and that the remaining tutorial assists digitally savvy researchers in other disciplines in learning and trialling our method/tool in a self-provisioned way, and perhaps draws them into the open e-Science community that is growing around our tool outside of the CS discipline.