International Journal of Web Services Practices, Vol. 4 No.1(2009), pp. 36-43
ISSN 1738-6535 © Web Services Research Foundation
36

Abstract—Life scientists use a variety of bioinformatics
software tools to perform tasks such as annotation of DNA and
protein sequences. Most of these tools are command-line driven
and handle various data types (nucleotide, protein, etc.) and data
formats (Fasta, Genbank, GCG, etc.). Because an analysis task
generally involves many bioinformatics software tools, scientists
increasingly require that these heterogeneous bioinformatics
tools be integrated in a uniform way. They also require
graphical user interfaces for these tools, and the ability to
compose workflows without much programming effort. In this
paper, we propose a Web services based framework that meets the
above requirements.
Index Terms—Web services, data and tools integration,
biological workflows.
I. INTRODUCTION
Life science data and application integration and
interoperability are among the most challenging problems
facing bioinformatics today. Indeed, to enable the discovery of
new biological insights and to create a global perspective from
which unifying principles in biology can be discerned,
scientists have to interpret many types of information from a
variety of sources. These sources include nucleotide and amino
acid sequences, protein domains, protein structures, and gene
expression profiles.
The structure of biological data has its own characteristics,
which set it apart from data in other domains. Biological data
exists in the form of terabytes of nucleotide sequence data,
microarray and other image data, and various other forms of
data that result from both experimental and “in silico” research
efforts. Due to this huge amount of data, it is often impossible,
without the support of additional hardware and software
facilities, to interpret and to understand this data.
Manuscript received March 15, 2009.
Elarbi Badidi is an Assistant Professor of computer science at the College of
Information Technology (CIT) of United Arab Emirates University, Maqam
Campus, P.O. Box 17551, Al-Ain, UAE (phone: 971-3713-5552; fax:
971-3-762 6309; e-mail: ebadidi@uaeu.ac.ae).
M. Vall Mohamed Salem is an Assistant Professor of computer science at
the University of Wollongong in Dubai, UAE (e-mail: salem@uow.edu.au).
Salah Bouktif is an Assistant Professor of computer science at the College of
Information Technology (CIT) of United Arab Emirates University, Maqam
Campus, P.O. Box 17551, Al-Ain, UAE (phone: 971-3713-5523; fax:
971-3-762 6309; e-mail: salahb@uaeu.ac.ae).
Larbi Esmahi is an Associate Professor at the School for Computing &
Information Systems, Athabasca University, Athabasca, Alberta, Canada
(e-mail: larbie@athabascau.ca).
While genomic data have a well-known representation as
sequences taken from the {A,C,G,T} alphabet, there is no clear
model for data representing the expression products of genes:
proteins and higher forms of organisms e.g., cells and the
multitude of forms they assume in response to environmental
challenges.
To accomplish tasks, such as annotation and manipulation of
DNA and protein sequences, and comparison of genes and
genomes across species, life scientists have to use a variety of
bioinformatics analysis software tools. These tools may use
various data types (nucleotide, protein, taxonomy, etc.) and
data formats (Fasta, Staden, Embl, Genbank, etc.). Most of
them were originally stand-alone and command-line driven, with
textual input and output. Moreover, the types and styles of their
inputs and outputs are similarly variable, as parameter usage
is not standardized. The lack of graphical
user interfaces (GUI) makes them cumbersome for the end
user.
Another major bottleneck is that most of these software
applications are incompatible with one another as they use
different file formats. As a consequence, the output of one tool
cannot be used as an input for another, without data format
conversion. A further complication is that the user has to define
a multitude of parameters and options according to the
particular data or aim of the analysis. There is no standard way
to describe their input parameters and output results. In this
paper, we use the terms tool and application interchangeably.
In the past few years, two main approaches have been
considered to deal with the issue of bioinformatics tools
integration. The first approach consists of developing
interactive environments and deploying them locally in order to
facilitate bioinformatics analyses. Examples of such environments are
Isys [1], Turbobench [2], and Applab [3]. The drawback of this
approach is that the integration environment as well as the tools
must be installed and configured locally, which requires
substantial IT expertise.
The second approach takes advantage of the
growing use of the Web to make the tools accessible through
Web interfaces built with HTML, scripting technologies
(CGI, Perl, etc.), and other technologies (Java, RMI, EJB, and
CORBA). Examples of environments adopting this approach
are: Bionavigator [4], NCSA Biology Workbench [5], and
Anabench [6].
A Web Services based Framework for Uniform
Integration of Command-line Bioinformatics
Software Tools
Elarbi Badidi, M. Vall Mohamed Salem, Salah Bouktif, and Larbi Esmahi
Recent developments in distributed computing have led to Web
and Grid services technologies, which promise to alleviate the
integration and interoperability issues. In this paper, we present
our Web services based framework for bioinformatics tools
integration, which allows wrapping applications as Web
services. In contrast to the above frameworks, in which the
tools are accessed only from a Web interface, our framework
will allow application services to be accessed through a Web
portal as well as programmatically, by exporting their WSDL [7]
files. The Web portal may be implemented using various Web
technologies such as JSP, ASP, and JSF.
II. RELATED WORK
The trend now is to use Web and Grid services technologies
as well as semantic Web services to solve the integration
problem not only in life sciences but in enterprise (or business)
integration as well. Examples of life sciences frameworks using
these technologies are Soaplab, myGrid, and BioMoby.
Soaplab [8] is a SOAP-based programmatic interface to
command-line applications on remote computers. Soaplab uses
Apache Axis [9] to create Sun Java implementation classes and
deployment descriptors for all derived analysis services. It uses
CORBA on the server side to find, start, control, and use
applications. Soaplab has been developed within the UK
e-Science initiative as a component of the myGrid [10] project.
myGrid provides high-level open source grid middleware to
facilitate building high level services for data and application
resource integration such as resource discovery, workflow
execution, and distributed query processing. The emphasis is on
data integration, workflow, and personalization. Using the
myGrid workflow construction tool Taverna [11], workflows
can be composed with semantic descriptions and published.
BioMoby [12] is an open source project that aims to provide a
framework for the discovery, representation, integration, and
retrieval of biological data from widely disparate data hosts and
analysis services using Web services technology. myGrid and
BioMoby are very ambitious projects that aim to achieve
integration of biological data distributed worldwide in disparate
sources.
Our framework shares its goals with Soaplab, which is
also designed to provide Web services wrappers for
command-line applications, mainly those of the Emboss
bioinformatics package [13]. However, the main difference
between our framework and Soaplab lies in the approach
used for describing the inputs and outputs of the analysis
programs. In Soaplab, the service generation mechanism expects
inputs and outputs described in the ACD language of the Emboss
package, while our service generation expects inputs and
outputs described in XML, using the XML schema for tool
description that we have developed. The utilization of XML
and XML Schema greatly reduces the complexity of dealing
with heterogeneous analysis tools.
By developing Web services wrappers for these
command-line tools, we can overcome many of the limitations
mentioned above. By using the Java Architecture for XML
Binding (JAXB), we can easily implement these Web
service wrappers, as well as the user interfaces of the tools,
from their XML descriptors. Also, the composition of Web services
using standards such as BPEL [14] will facilitate the
composition of biological protocols.
III. BACKGROUND
A. Command-line interfaces
A command line interface (CLI) is a user interface to a
computer's operating system or a software application in which
the user responds to a visual prompt by typing in a command on
a specified line to perform specific tasks, receives a response
back from the system, and then enters another command, and so
forth. The Unix prompt and the MS-DOS prompt in a Windows
operating system are examples of command
line interfaces. CLIs are often used by system administrators and
programmers in engineering and scientific environments. A
CLI is often used when a large vocabulary of commands,
together with a wide range of options, can be entered more
rapidly as text than with a pure GUI. This is typically the case
with operating system command shells.
Today, most users prefer the graphical user interface (GUI)
offered by Windows, Mac OS, and others. Typically, most of
today's Unix-based systems and software applications, such as
MySQL and MATLAB, offer both a command line interface
and a graphical user interface with the benefits of both.
In a CLI, commands are typically written in a particular way.
For example, the command is typed first with no spaces in the
name. Then after a space, one can sometimes modify the
command by adding what are called “options” or “parameters”.
Options change or limit the way the command is executed.
They are usually preceded by a dash or another symbol. A
command may also include the name of a file or directory that
one wants the command to work on. The finished command
will look something like this.
command -option file
command -option sourceFile destinationFile
Fig. 1. Example of command-line options
A CLI defines a grammar, a set of rules that all commands
within the CLI must follow. This is the case for the UNIX
operating system. These rules may differ from one CLI to
another. Given this heterogeneity of rules, it is only
through the documentation of a CLI that one can learn how to
run its commands with the right options.
In bioinformatics, several software packages and tools, such
as Emboss, provide only a command line interface. Fig. 1
provides a short description of the options that can be used with
the seqret tool of Emboss.
B. Web services in the Life Sciences
With increasing acceptance among software vendors and
rising adoption in the marketplace, Web Services are becoming
the basis for many Web-based applications. They are
interoperable across platforms and neutral to languages, which
makes them appropriate for access from heterogeneous
environments, enabling mass dissemination of knowledge. In
the life sciences, adopting Web services technology is seen
as key to achieving coordination and interoperability among
incompatible bioinformatics applications available from
different providers, an endeavor that is becoming more and
more significant to biological research.
Within the life sciences community, scientists spend a great
deal of their time using various incompatible tools,
accessing remote databases, copying and pasting data to
combine their analyses, converting data formats, and
assembling results in ad-hoc protocols. The service-oriented
approach, in which bioinformatics tools and databases are
accessed as services in a standardized fashion, has therefore
quickly caught the interest of several organizations and research
centers, which have published their tools as Web services. The
National Center for
Biotechnology Information (NCBI;
http://www.ncbi.nlm.nih.gov/) has published its Entrez
Utilities as Web Services. The DNA Data Bank of Japan
(DDBJ; http://www.ddbj.nig.ac.jp/) provides a Web API for
biology (WABI), which includes several services. The
European Bioinformatics Institute (EBI;
http://www.ebi.ac.uk/) provides Web services that offer
programmatic access to data retrieval and analytical tools for
several molecular databases.
C. Biological workflows
Complex analysis, annotation, and data integration typically
involve various bioinformatics tools. In the past few years,
several software environments and platforms have emerged to
enable the orchestration and execution of these tools in
biological workflows. Examples of such systems include
Bionavigator [4], Turbobench [2], Pegasys [15], w3h [16],
G-Pipe [17], Biopipe [18], VIBE [19], Flosys [20], and Pise
[21].
With the current trends towards using Web services
technology in life sciences, new environments and frameworks
have emerged to enable the orchestration and the execution of
biological Web services. The most notable systems include
Taverna (part of the myGrid project) [11] and BioMoby [12] as
mentioned earlier. Moreover, the different standards in the area
of Web services (WS-* standards), in particular the standards
for service composition, including the Business Process
Execution Language for Web Services (BPEL4WS) and the
Business Process Modeling Language (BPML), will allow
further progress in biological data and tools integration.
D. Workflow use case: phylogenetic analysis
A phylogenetic analysis workflow for newly sequenced
protein-coding genes involves the following steps:
1) Translation of the nucleic acid sequence to the
corresponding peptide sequence in six frames (e.g. using
tools such as transeq [13] or ExPASy translate tool [22]),
2) Identification of open reading frames (ORFs) that
correspond to conserved proteins by similarity search
(e.g. using tools such as blast [23] or getorf [13]),
3) Retrieval of protein sequences from GenBank [24] (e.g.
using Entrez [25] or seqret [13]),
4) Multiple protein alignment (e.g. using clustalw [26]),
5) Extraction of well aligned sequence stretches [27],
6) Tree inferences based on various models of evolution, and
7) Tree testing (e.g. using phylip [28] and consel [29]).
Typically, such analysis pipelines are employed several times
with different data sets or parameters.
IV. FRAMEWORK OVERVIEW
A. Objectives
With the growing interest in data and tools integration in life
sciences and the limited number of integration frameworks
based on Web services, we set out to develop a framework that
allows us to:
1) Develop a Web service wrapper around each
command-line tool,
2) Make the interfaces of these tools unified and remotely
accessible,
3) Hide their dependencies on the underlying operating
systems, and
4) Access these tools programmatically in order to be able to
compose workflows describing many biological protocols.
By converting the command-line applications into Web
services, we can overcome many of the limitations and the
heterogeneity of styles of these applications that we mentioned
in the introduction. In this paper, an application service is an
application with a Web service interface that is described by a
WSDL document as a collection of network endpoints, or ports.
To wrap a command-line tool as a Web service, we first
describe the tool properties and its parameters in XML. We
then translate the XML specification of the tool, which
describes its parameters, its data types and data formats, into
WSDL, and then create an entry in the UDDI registry [30] to
advertise the WSDL specification. Web services clients can
then look up the WSDL in the registry and interact with the tool
as a Web service. Client applications can be written in various
programming languages.
The XML description of the data types and the data formats
supported by command-line tools greatly reduces the
complexity of their composition into workflows. Two given
tools may be composed in a workflow in a serial fashion
provided that the data type and data format of the first tool
output are compatible with the data types and data formats of
the second tool input. In the case of incompatibility of data
formats, a data format conversion tool, such as readseq [31], is
introduced between the two given tools in the workflow. The
conversion is then performed without manual user intervention
and without data loss. However, readseq is not recommended
for very large (100+MB) sequence files, whether as a single
record or multiple records.
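The serial-composition rule above can be sketched in Java as follows. This is only an illustration under stated assumptions: the ToolInfo class, its fields, and the chain() method are hypothetical names that do not come from our schema, and the tool names and formats in the usage note are placeholders rather than documented capabilities of any real tool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch of the serial-composition rule; all names here are hypothetical. */
class FormatCompatibility {

    /** Minimal stand-in for the information an XML tool descriptor would carry. */
    static final class ToolInfo {
        final String name;
        final Set<String> inputTypes;    // e.g. "protein", "nucleotide"
        final Set<String> inputFormats;  // e.g. "fasta", "genbank"
        final String outputType;
        final String outputFormat;

        ToolInfo(String name, String[] inputTypes, String[] inputFormats,
                 String outputType, String outputFormat) {
            this.name = name;
            this.inputTypes = new HashSet<>(Arrays.asList(inputTypes));
            this.inputFormats = new HashSet<>(Arrays.asList(inputFormats));
            this.outputType = outputType;
            this.outputFormat = outputFormat;
        }
    }

    /**
     * Returns the tool names to run, in order. A converter step is inserted
     * when the data types match but the formats do not. Mismatched data
     * types cannot be repaired by format conversion and are rejected.
     */
    static List<String> chain(ToolInfo first, ToolInfo second, String converter) {
        if (!second.inputTypes.contains(first.outputType)) {
            throw new IllegalArgumentException(first.name + " produces "
                    + first.outputType + ", which " + second.name + " does not accept");
        }
        List<String> steps = new ArrayList<>();
        steps.add(first.name);
        if (!second.inputFormats.contains(first.outputFormat)) {
            steps.add(converter); // e.g. a readseq-like conversion service
        }
        steps.add(second.name);
        return steps;
    }
}
```

For example, if a hypothetical toolA emits protein data in Genbank format and toolB accepts protein data only in Fasta format, chain(toolA, toolB, "readseq") yields the three steps toolA, readseq, toolB.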
B. System components
The architecture of the system is shown in Fig. 2. It is a
three-tier architecture composed of a client layer providing
presentation logic, a middle-tier layer containing the business
logic, and a back-end database. Users may interact with the
system through a portal using HTML and JSP (Java Server
Pages) screen forms, for example, or by invoking the
application services programmatically. The requests of the
application services submitted by users at the presentation level
are handled by an application server equipped with a Servlet
and JSP container. Requests may be invocations of individual
services or may be part of a workflow.
The Workflow Manager allows the user to compose
workflows from the application services generated from
command-line applications. This may be performed from a user
interface or programmatically, given that the input and output
types and formats are described for each
application, as we will see in the next section. At the back-end
level, we find the command-line applications, the session
management database which stores information about users’
sessions, and the service registry.
We have designed the system architecture based on the
service oriented architecture principles, and especially on using
XML to describe tools and their parameters. The utilization of
XML and XML schema greatly reduces the system complexity
and deals with the heterogeneity of applications.
Fig. 2. Framework components
The key component of the system is the XML schema we
have developed to describe the applications and their
parameters. This XML schema captures most of the different
situations and cases of applications and parameters. For
instance, with this schema we should be able to describe
various applications that handle several data types
(nucleotide, protein, taxonomic, etc.) and several data formats
(Fasta, Genbank, Embl, etc.), as well as heterogeneous
parameters with different types and different syntaxes for
assigning values to the parameters. From the XML description of
an application, we may generate the user interface, in the form
of HTML and JSP pages, for example, and the Web service
associated with the application. This is performed by the User
Interface Generator
and the Web Service Generator components. The WSDL of the
application service is then published to the service registry.
C. Bioinformatics tools as Web services
Many platforms are now available to develop Web services
applications. Some of them are commercial products, while
others are freely available, such as Jakarta Tomcat Web server
and the Apache Axis toolkit. Using this toolkit, for example,
one can convert a Java application into a Web service.
To build a Web service for a given tool, we have to write a Java
application to invoke and execute this tool. One can use the
Java class described in Fig. 3 to run an application
that is external to the Java virtual machine (JVM).
Fig. 3. A Java class to run applications external to the JVM.
The method runApplication() requires the name of the
application (name of the executable program), the path to the
application, and the arguments to be passed to the application
(input, output, and other options of the tool).
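As an illustration of this idea, a minimal sketch of such a class is given below. It is not the class of Fig. 3: it assumes the standard java.lang.ProcessBuilder API, and the class name ExternalToolRunner and the helper buildCommand() are hypothetical. Only the runApplication() name and its three inputs follow the description above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Illustrative stand-in for a class that runs tools outside the JVM. */
class ExternalToolRunner {

    /** Builds the argument vector: the executable first, then the tool arguments in order. */
    static List<String> buildCommand(String path, String name, List<String> args) {
        List<String> command = new ArrayList<>();
        command.add(Paths.get(path, name).toString());
        command.addAll(args);
        return command;
    }

    /** Runs the tool and returns its combined output; throws if the exit code is non-zero. */
    static String runApplication(String path, String name, List<String> args)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(buildCommand(path, name, args));
        builder.redirectErrorStream(true); // merge stderr into stdout
        Process process = builder.start();
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append(System.lineSeparator());
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException(name + " exited with code " + exitCode + ": " + output);
        }
        return output.toString();
    }
}
```

Passing the arguments as a list, rather than as one concatenated string, avoids quoting problems when file names contain spaces. A call might look like runApplication("/usr/local/bin", "seqret", Arrays.asList("-sequence", "in.gb", "-outseq", "out.fasta")), where the installation path and file names are placeholders.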
To illustrate this, we will consider a small tool called infoalign,
which is part of the Emboss package. Infoalign is a small utility
to list some simple properties of sequences in an alignment. The
above code fragment can be converted into a Web service with
the following methods: setInfoalignInput(),
setInfoAlignOutput(), setInfoAlignOptions(), and
runInfoalign(). A WSDL file for this Web service will be
automatically created. This feature is provided by most Web
services development platforms. This file may be accessed, for
example, from:
http://localhost:8091/axis/services/RunInfoAlign.wsdl
Using this WSDL file, client applications may be created to
consume the newly created Web service. Under this scheme,
the client should have prior knowledge of the parameters of
the application (input, output, and options), with their syntax
and order. To solve this problem, information about the
application parameters may be made part of the information that
can be obtained from the Web service. By adding a new method
called getInfoalignParameters() to the Web service, the client
can get a description of the parameters of the infoalign
application, and can then customize the user interface used to
invoke the infoalign Web service. This solution is too
simplistic, however, as describing an application and its
parameters is more complex in general because of the variety of
tools and their parameters.
D. Application and parameters description
Prior to generating a Web service for a command-line
application, the parameters of the application should be
described in a structured way that can be used by programs. As
these command-line applications are developed by several
programmers, and implemented in various programming
languages, they do not follow the same rules for specifying
their parameters. Therefore, coming up with a general
description schema for application parameters is
challenging. We have formalized the descriptions of
applications and parameters by defining an XML schema for
both application and parameter descriptions.
The main elements describing an application are:
application name, application description, version number,
category, documentation URL, application path, minimum
number of input data, maximum number of input data, input
types, input formats.
Fig. 4. Definition of input types and input/output data formats.
An application tool may handle one or more data types,
including protein, nucleotide, taxonomic, and result. Also, an
application tool may handle one or several input and output
data formats. This is described in Fig. 4 using XML schema
types.
The output of an application may also be in one of the
above data formats. By describing the data types and data
formats handled by an application, we can compose workflows
by connecting the outputs of an application service to the
inputs of other application services.
The main elements describing a parameter of an application
are: parameter name, parameter description, type, the option
used to invoke the parameter, the value to be assigned to the
parameter, the syntax describing how a value is assigned to the
parameter, the min and max value in the case of integer
parameters, and default values. Each parameter belongs to one
of the following types: IntegerParameter, FloatParameter,
StringParameter, SwitchParameter, ChoiceListParameter,
FileParameter, and SequenceParameter. The complete XML
schema is available at:
http://faculty.uaeu.ac.ae/ebadidi/applSchema.xsd
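To make the role of these elements concrete, the sketch below shows how a described parameter could be turned into command-line tokens. It is a simplified assumption, not a rendering of applSchema.xsd: the ToolParameter class, its Kind enumeration (covering only three of the seven parameter types), and the use of a separator string to stand for the assignment syntax are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parameter model; field names are not taken from applSchema.xsd. */
class ToolParameter {
    enum Kind { INTEGER, STRING, SWITCH }

    final String name;
    final Kind kind;
    final String option;     // the option used to invoke the parameter, e.g. "-frame"
    final String separator;  // " " for "-opt value", "=" for "-opt=value"
    final Integer min;       // lower bound for INTEGER, or null if unbounded
    final Integer max;       // upper bound for INTEGER, or null if unbounded

    ToolParameter(String name, Kind kind, String option, String separator,
                  Integer min, Integer max) {
        this.name = name;
        this.kind = kind;
        this.option = option;
        this.separator = separator;
        this.min = min;
        this.max = max;
    }

    /** Returns the command-line tokens for a value; empty when a switch is off. */
    List<String> toArguments(String value) {
        List<String> tokens = new ArrayList<>();
        if (kind == Kind.SWITCH) {
            if (Boolean.parseBoolean(value)) {
                tokens.add(option);
            }
            return tokens;
        }
        if (kind == Kind.INTEGER) {
            int n = Integer.parseInt(value);
            if ((min != null && n < min) || (max != null && n > max)) {
                throw new IllegalArgumentException(name + " out of range: " + n);
            }
        }
        if (" ".equals(separator)) {
            tokens.add(option);   // separate tokens: "-opt" "value"
            tokens.add(value);
        } else {
            tokens.add(option + separator + value); // single token: "-opt=value"
        }
        return tokens;
    }
}
```

Validating the range before the tool is launched lets a generated user interface report an invalid value immediately, instead of relying on the error message of the wrapped tool.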
V. IMPLEMENTATION
A. Generic XML Schema
A prototype of our proposed framework is under
construction. We have developed the XML schema for
application and parameter descriptions using Stylus Studio
Enterprise Edition [32]. This schema was used to develop and
validate the descriptions of some bioinformatics tools, such as
clustalw, infoalign, seqret, transeq, pepcoil, and silent. A
template has also been generated from the schema to allow easy
description of any command-line analysis tool. Using this
template and the textual documentation of a given tool, one can
create its XML descriptor, which may be validated against our
XML schema. Fig. 6 shows an extract from the XML
description of the clustalw application for multiple alignments.
B. Web service and User interface generation
To implement the Java Web service for a given XML
descriptor, we are using the Java Architecture for XML
Binding (JAXB 2.0), which provides a fast and convenient way
to bind between XML schemas and Java representations,
making it easy for Java developers to incorporate XML data
and processing functions in Java applications. As part of this
process, JAXB provides methods for un-marshalling XML
descriptor documents into Java content trees of data objects
instantiated from the generated JAXB classes. These content
trees are then used to implement associated Web services and
user interfaces using JSP and HTML. This process is illustrated
in Fig. 5.
Fig. 5. Implementation process
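The step of reading a descriptor into Java objects can be illustrated with a dependency-free sketch. Note the substitution: instead of JAXB-generated classes, it uses the DOM parser bundled with the JDK, and the element and attribute names (application, parameter, name, option) are assumptions made for this example rather than names taken from our schema.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Reads a (hypothetical) tool descriptor with the JDK DOM parser, not JAXB. */
class DescriptorReader {

    /** Returns one "name option" entry per parameter element, in document order. */
    static List<String> readParameterOptions(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("parameter");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element parameter = (Element) nodes.item(i);
                result.add(parameter.getAttribute("name") + " "
                        + parameter.getAttribute("option"));
            }
            return result;
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers see one unchecked exception type.
            throw new IllegalArgumentException("Invalid descriptor", e);
        }
    }
}
```

With JAXB, the same traversal would be replaced by typed accessors on the generated classes, which is what makes generating the Web service and the JSP forms from a descriptor straightforward.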
Fig. 7 depicts the JSP interface generated from the above
classes for the seqret tool from the Emboss package.
Fig. 6. XML Description of the clustalw application
Fig. 7. Generated JSP User Interface of Seqret
Using the above process, we have generated JAXB classes from the
XML descriptors of a few command-line bioinformatics
software tools from the Emboss package, such as infoalign,
seqret, transeq, getorf, silent, and pepcoil. These classes are
used in the implementation of the related Web services. Table I
provides a description of the operations of some of these Web
services.
TABLE I
SAMPLE WEB SERVICES OPERATIONS
Fig. 8. Tree representation of the seqret WSDL file.
The getInputTypes() operation returns the list of data types
(nucleotide, protein, etc) that should be provided as input data
to the tool. The getInputFormats() operation returns the list of
data formats (Fasta, Genbank, GCG, etc.) of the input data of
the tool. The getOutputFormats() operation returns the possible
data formats of the output files of the tool. The run operations
(runInfoalign, runSeqret, ...) allow launching the execution of
a tool given input data and a list of arguments. Fig. 8 shows the
tree representation of the seqret WSDL file.
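Seen from a Java client, these operations might take the shape sketched below. The operation names getInputTypes(), getInputFormats(), and getOutputFormats() follow the text; everything else is an assumption for illustration: the ToolService interface, the generic run() signature, the StubToolService class, and the values it returns are placeholders, not the generated services or the actual capabilities of any Emboss tool.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical Java view of the operations exposed by an application service. */
interface ToolService {
    List<String> getInputTypes();
    List<String> getInputFormats();
    List<String> getOutputFormats();
    String run(String inputData, List<String> arguments);
}

/** In-memory stub for trying out client code; returned values are placeholders. */
class StubToolService implements ToolService {
    public List<String> getInputTypes() { return Arrays.asList("nucleotide", "protein"); }
    public List<String> getInputFormats() { return Arrays.asList("fasta", "genbank"); }
    public List<String> getOutputFormats() { return Arrays.asList("fasta"); }

    public String run(String inputData, List<String> arguments) {
        // A real service would launch the wrapped tool; the stub only reports the call.
        return "ran with " + arguments.size() + " argument(s) on "
                + inputData.length() + " characters";
    }
}

class ServiceClient {
    /** True when the service declares that it accepts the given data type and format. */
    static boolean accepts(ToolService service, String type, String format) {
        return service.getInputTypes().contains(type)
                && service.getInputFormats().contains(format);
    }
}
```

A client can call accepts() before submitting data, so that unsupported input is rejected without launching the tool.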
C. Biological workflow composition
Using the above operations, it becomes possible to link the
tools’ Web services in workflows. Indeed, these operations
allow checking the compatibility between the types and formats
of output data of a given tool with the types and formats of
input data of another tool. Two of the generated Web services
can be composed in a workflow if the output data of the first
Web service is compatible in terms of types and formats with
the input data of the second Web service.
As a first attempt to specify biological workflows from the
generated Web services, the Business Process Execution
Language (BPEL) was the obvious composition language of
choice. BPEL provides a rich vocabulary for defining processes
and has several features which are not found in programming
languages. It enables users to describe business process
activities as Web services and define how they can be
connected to accomplish specific tasks. The NetBeans
development platform has supported the design of BPEL processes
since version 5.5.
Our goal is to allow the scientist to visually compose and
execute workflows while hiding the technical
details of BPEL. A graphical user interface will allow the user
to specify a workflow simply by dragging and
dropping tools onto a canvas. In addition, the composition of
workflows should be carried out by checking the compatibility
among application services based on their inputs and outputs.
The Workflow Manager component is responsible for allowing
such visual composition and enactment of workflows from our
generated Web services.
While investigating existing tools for visual composition, we
have found a tool called JOpera [33], which provides a
language for visual composition and which is implemented as a
plugin of the Eclipse development environment. JOpera is a
rapid service composition tool offering a visual language and
an execution environment for building processes out of
reusable services, which include but are not strictly limited to
Web services. It enables composing Web services into
processes by visually specifying the order of invocation of each
service (control flow) and modeling the patterns of data
exchange between the services (data flow). The JOpera
environment provides support for the whole lifecycle of a
process; it features a visual monitoring and debugging
environment that lets the user interact with a running process.
Fig. 9 depicts a biological workflow that we have developed to
experiment with the JOpera environment. It is created from the
SeqretWS, TranseqWS, and GetorfWS Web services generated
respectively for the seqret, transeq, and getorf Emboss tools. The
Bioworklow process is composed of three sub-processes:
SeqretSubprocess, TranseqSubprocess, and GetorfSubProcess.
Each of these subprocesses is composed of tasks that represent
the invocation of the associated Web service operations. For
instance, the SeqretSubprocess is associated with the SeqretWS
Web service.
Fig. 9- Biological workflow created with the JOpera environment.
VI. CONCLUSION
In this paper, we have presented a new framework for
integrating bioinformatics tools by wrapping them as Web
services. These tools are characterized by the heterogeneity of
their styles, their parameters, and the data types and formats
they can handle. Our proposed framework allows creating
uniform interfaces of these tools without having to modify their
code or write additional code. This greatly simplifies
composing these applications into workflows to implement
biological protocols. The framework is based on using a
generic XML schema to describe bioinformatics applications
and their parameters in a simple way that captures the various
styles and scenarios for using parameters in a command-line tool. A
prototype of our framework is still under development and
some sample application services, such as infoalign, seqret,
transeq, silent, getorf, and pepcoil have been generated and
customized.
As future work, we intend to add various command-line
biological tools to the framework and to integrate JOpera with
our workflow manager. In addition to the framework tools, the
framework will provide support for importing external
biological Web services, available from the bioinformatics
community, and for their composition into workflows in the
same way as local Web services.
ACKNOWLEDGMENT
The authors would like to thank Haifa Al Abdouli, Halima
Shehyari, and Mariam Hefaity for their contribution to the
implementation of the proposed test-bed.
REFERENCES
[1] A. Siepel, A. Farmer, A. Tolopko, M. Zhuang, P. Mendes, W. Beavis,
and B. Sobral. ISYS: a decentralized, component-based approach to the
integration of heterogeneous bioinformatics resources. Bioinformatics,
2001, 17, pp. 83-94.
[2] TurboGenomics Inc (n.d.). TurboBench overview.
http://www.turbogenomics.com/products/turbobench_overview.pdf
[3] M. Senger. AppLab - A CORBA-Java based Application Wrapper.
http://www.omg.org/docs/corbamed/98-03-08.pdf
[4] T.G. Littlejohn. Bioinformatics tools for genome projects. In
Molecular Breeding of Forage Crops, Spangenberg, G. (ed.), Kluwer
Acad. Publ., The Netherlands, 2001, pp. 83-99.
[5] R. Unwin, J. Fenton, M. Whitsitt, C. Jamison, M. Stupar, E. Jakobsson,
and S. Subramaniam. Biology Workbench: A WWW-based Virtual
Computing and Analysis Environment for the Biological Sciences.
Bioinformatics (Databases and Systems, S. Letovsky (Ed.)), 1998, pp.
233-244.
[6] E. Badidi, C. DeSousa, F. Lang, and G. Burger. AnaBench: a
Web/CORBA-based Workbench for biomolecular sequence Analysis.
BMC Bioinformatics, 2003, 4:63.
[7] World Wide Web Consortium. Web Services Description Language 2.0
(W3C working draft 3). http://www.w3.org/tr/wsdl20
[8] M. Senger, P. Rice, and T. Oinn. Soaplab - a unified Sesame door to
analysis tools. Paper presented at the UK e-Science All Hands Meeting,
2003, Nottingham, UK.
[9] The Apache Software Foundation (n.d.). Web services – Axis.
http://ws.apache.org/axis
[10] R.D. Stevens, A.J. Robinson, and C.A. Goble. myGrid: personalised
bioinformatics on the information grid. Bioinformatics, 2003, 19 (Suppl.
1), pp. i302-i304.
[11] T. Oinn, M.J. Addis, J. Ferris, D.J. Marvin, M. Greenwood, T. Carver,
A. Wipat, and P. Li. Taverna, lessons in creating a workflow
environment for the life sciences. Paper presented at the GGF10, Berlin,
Germany, 2004.
[12] M.D. Wilkinson, and M. Links. BioMOBY: An open source biological
web services proposal. Briefings in bioinformatics, 2003, 3(4), pp.
331–341.
[13] P. Rice, I. Longden, A. Bleasby. EMBOSS: The European Molecular
Biology Open Software Suite. Trends in Genetics, 2000, 16, pp.
276-277.
[14] OASIS. Web Services Business Process Execution Language Version
2.0. OASIS Standard, 11 April 2007.
[15] S.P. Shah, D.Y. He, J.N. Sawkins, J.C. Druce, G. Quon, D. Lett, G.X.
Zheng, T. Xu, B.F. Ouellette. Pegasys: software for executing and
integrating analyses of biological sequences. BMC Bioinformatics 2004,
5:40
[16] P. Ernst, K-H. Glatting, and S. Suhai. A task framework for the web
interface W2H. Bioinformatics, 2003, 19, pp. 278-282.
[17] A.G. Castro, S. Thoraval, L.J. Garcia, and M.A. Ragan. Workflows
in bioinformatics: meta-analysis and prototype implementation of a
workflow generator. BMC Bioinformatics, 2005, 6:87.
[18] S. Hoon, K. Kumar Ratnapu, J. Chia, B. Kumarasamy, X. Juguang, M.
Clamp, A. Stabenau, S. Potter, L. Clarke, and E. Stupka, Biopipe: A
Flexible Framework for Protocol-Based Bioinformatics Analysis,
Genome Research, 2003, 13:1904-1915.
[19] INCOGEN. Visual Integrated Bioinformatics Environment. White paper.
http://www.incogen.com/public_documents/vibe/VIBE_Whitepaper.pdf
[20] E. Badidi, G. Burger, and B.F. Lang. FLOSYS - a Web accessible
workflow system for protocol-driven biomolecular sequence analysis.
Cellular and Molecular Biology Journal, 2004, 50(7):785-793.
[21] C. Letondal. A web interface generator for molecular biology
programs in Unix. Bioinformatics, 2001, 17: 73-82.
[22] Swiss Institute of Bioinformatics (SIB). ExPASy Proteomics tools.
http://www.expasy.ch/tools/
[23] S.F. Altschul, W. Gish, W. Miller, E.W. Myers, and D.J. Lipman, Basic
local alignment search tool. J. Mol. Biol. 1990, 215: 403-410.
[24] D.A. Benson, I. Karsch-Mizrachi, D.J. Lipman, J. Ostell, and D.L.
Wheeler. GenBank. Nucl. Acids Res. 2003, 31: 23-27.
[25] G.D. Schuler, J.A. Epstein, H. Ohkawa, and J.A. Kans. Entrez:
molecular biology database and retrieval system. Methods in
Enzymology, 1996, 266: 141-162.
[26] J.D. Thompson, D.G. Higgins, and T.J. Gibson. CLUSTALW:
improving the sensitivity of progressive multiple sequence alignment
through sequence weighting, position-specific gap penalties and weight
matrix choice. Nucleic Acids Research, 1994, 22, pp. 4673-4680.
[27] J. Castresana. Selection of conserved blocks from multiple
alignments for their use in phylogenetic analysis. Molecular Biology and
Evolution, 2000, 17: 540-552.
[28] J. Felsenstein. PHYLIP phylogeny inference package (version 3.2).
Cladistics 1989, 5: 164-166.
[29] H. Shimodaira, and M. Hasegawa. CONSEL: for assessing the
confidence of phylogenetic tree selection. Bioinformatics 2001,
17(12): 1246-1247.
[30] OASIS. Universal Description, Discovery and Integration (UDDI)
Version 3.0.2. http://uddi.org/pubs/uddi-v3.0.2-20041019.htm
[31] D.G. Gilbert. Sequence file format conversion with command-line
Readseq. In Current Protocols in Bioinformatics, A. Baxevanis and D.
Davison, eds. Wiley, 2002.
[32] Progress Software Corporation (n.d.). Stylus Studio.
http://www.stylusstudio.com/
[33] C. Pautasso. JOpera: Process Support for more than Web services.
http://www.iks.ethz.ch/jopera
Elarbi Badidi is an Assistant Professor of computer science at the College of
Information Technology (CIT) of United Arab Emirates University. Before
joining the CIT, he held the position of bioinformatics group leader at the
Biochemistry Department of Université de Montréal from 2001 to July 2004.
He received a Ph.D. in computer science in 2000 from Université de Montréal,
Québec (Canada). His research interests include Web services and Service
Oriented Computing, Middleware, and Bioinformatics data and tools
integration.
M. Vall Mohamed Salem is currently an Assistant Professor with the
University of Wollongong in Dubai. His current interests are in performance
analysis and scalability issues, distributed systems and software engineering.
He received a Ph.D. in computer science in 2002 from Université de Montréal,
Québec (Canada). He held an IBM Canada Centre for Advanced Studies
fellowship and can be reached at salem@uow.edu.au.
Salah Bouktif is an Assistant Professor of software engineering at the College
of Information Technology (CIT) of United Arab Emirates University. Before
joining CIT, Dr. Bouktif was a postdoctoral fellow for two years at the department
of computer engineering of the polytechnic school of engineering of Montreal.
He received his Ph.D. degree in 2005 with high honors from the University of
Montreal. Dr. Bouktif’s research interests include metrics and software quality
models, software quality prediction improvement, search-based software
engineering, software testing and test data generation, software evolution,
and change and cost modeling.
Larbi Esmahi is an Associate Professor of the School of Computing and
Information Systems at Athabasca University. He was the graduate program
coordinator at the same school during 2002-2005. He holds a PhD in electrical
engineering from Ecole Polytechnique, University of Montreal. His current
research interests are in e-services, e-commerce, multiagent systems, and
intelligent systems. He is an associate editor for the Journal of Computer
Science and the Tamkang Journal of Science and Engineering. He is also
a member of the editorial advisory board of the Advances in Web-Based
Learning Book Series, IGI Global, and a member of the international editorial
review board of the International Journal of Web-Based Learning and Teaching
Technologies.
A Web Services Based Framework For Uniform Integration Of Command-Line Bioinformatics Software Tools

International Journal of Web Services Practices, Vol. 4 No.1(2009), pp. 36-43
ISSN 1738-6535 © Web Services Research Foundation

Abstract—Life scientists use a variety of bioinformatics software tools to perform tasks such as annotation of DNA and protein sequences. Most of these tools are command-line driven and handle various data types (nucleotide, protein, etc.) and data formats (Fasta, Genbank, GCG, etc.). As many bioinformatics software tools are generally involved in analysis tasks, scientists increasingly require that these heterogeneous tools be integrated in a uniform way. They also require graphical user interfaces for these tools, and the ability to compose workflows without much programming effort. In this paper, we propose a Web services based framework that meets the above requirements.

Index Terms—Web services, data and tools integration, biological workflows.

I. INTRODUCTION

Life science data and application integration and interoperability are among the most challenging problems facing bioinformatics today. Indeed, to enable the discovery of new biological insights and to create a global perspective from which unifying principles in biology can be discerned, scientists have to interpret many types of information from a variety of sources. These sources include nucleotide and amino acid sequences, protein domains, protein structures, and gene expression profiles.

The structure of biological data has its own characteristics, which set it apart from data in other domains. Biological data exists in the form of terabytes of nucleotide sequence data, microarray and other image data, and various other forms of data that result from both experimental and "in silico" research efforts. Due to this huge amount of data, it is often impossible to interpret and understand it without the support of additional hardware and software facilities.

Manuscript received March 15, 2009.
Elarbi Badidi is an Assistant Professor of computer science at the College of Information Technology (CIT) of United Arab Emirates University, Maqam Campus, P.O. Box 17551, Al-Ain, UAE (Phone: 971-3713-5552; Fax: 971-3-762 6309; e-mail: ebadidi@uaeu.ac.ae).
M. Vall Mohamed Salem is an Assistant Professor of computer science at the University of Wollongong in Dubai, UAE (e-mail: salem@uow.edu.au).
Salah Bouktif is an Assistant Professor of computer science at the College of Information Technology (CIT) of United Arab Emirates University, Maqam Campus, P.O. Box 17551, Al-Ain, UAE (Phone: 971-3713-5523; Fax: 971-3-762 6309; e-mail: salahb@uaeu.ac.ae).
Larbi Esmahi is an Associate Professor at the School for Computing & Information Systems, Athabasca University, Athabasca, Alberta, Canada (e-mail: larbie@athabascau.ca).

While genomic data have a well-known representation as sequences taken from the {A,C,G,T} alphabet, there is no clear model for data representing the expression products of genes: proteins and higher forms of organisms, e.g., cells and the multitude of forms they assume in response to environmental challenges.

To accomplish tasks such as annotation and manipulation of DNA and protein sequences, and comparison of genes and genomes across species, life scientists have to use a variety of bioinformatics analysis software tools. These tools may use various data types (nucleotide, protein, taxonomy, etc.) and data formats (Fasta, Staden, Embl, Genbank, etc.). Most of them were originally stand-alone, command-line driven programs with textual input and output. Moreover, the types and styles of their inputs and outputs vary as well, because there is no standard for parameter usage. The lack of graphical user interfaces (GUI) makes them cumbersome for the end user. Another major bottleneck is that most of these software applications are incompatible with one another because they use different file formats.
As a consequence, the output of one tool cannot be used as an input for another without data format conversion. A further complication is that the user has to define a multitude of parameters and options according to the particular data or aim of the analysis. There is no standard way to describe their input parameters and output results. In this paper, we use the terms tool and application interchangeably.

In the past few years, two main approaches have been considered to deal with the issue of bioinformatics tools integration. The first approach consists in developing and deploying interactive environments locally in order to facilitate bioinformatics analyses. Examples of such environments are Isys [1], Turbobench [2], and Applab [3]. The drawback of this approach is that the integration environment as well as the tools must be installed and configured locally, which requires substantial IT expertise. The second approach takes advantage of the growing use of the Web to make the tools accessible through Web interfaces using HTML and various scripting languages (CGI, Perl, etc.) and technologies (Java, RMI, EJB, and CORBA). Examples of environments adopting this approach are Bionavigator [4], NCSA Biology Workbench [5], and Anabench [6].

Recent developments in distributed computing technologies have led to Web and Grid services technologies, which promise to alleviate the integration and interoperability issues. In this paper, we present our proposed Web services based framework for bioinformatics tools integration, which allows wrapping applications as Web services. In contrast to the above frameworks, in which the tools are accessed only from a Web interface, our framework will allow accessing application services through a Web portal as well as programmatically by exporting their WSDL [7] files. The Web portal may be implemented using various Web technologies such as JSP, ASP, and JSF.

II. RELATED WORK

The trend now is to use Web and Grid services technologies as well as semantic Web services to solve the integration problem, not only in life sciences but in enterprise (or business) integration as well. Examples of life sciences frameworks using these technologies are Soaplab, myGrid, and BioMoby.

Soaplab [8] is a SOAP-based programmatic interface to command-line applications on remote computers. Soaplab uses Apache Axis [9] to create Sun Java implementation classes and deployment descriptors for all derived analysis services. It uses CORBA on the server side to find, start, control, and use applications. Soaplab has been developed within the UK e-Science initiative as a component of the myGrid [10] project.

myGrid provides high-level open source grid middleware to facilitate building high-level services for data and application resource integration, such as resource discovery, workflow execution, and distributed query processing. The emphasis is on data integration, workflow, and personalization. Using the myGrid workflow construction tool Taverna [11], workflows can be composed with semantic descriptions and published.
BioMoby [12] is an open source project that aims to provide a framework for the discovery, representation, integration, and retrieval of biological data from widely disparate data hosts and analysis services using Web services technology. myGrid and BioMoby are very ambitious projects that aim to achieve integration of biological data distributed worldwide in disparate sources.

Our framework shares the same goals as Soaplab, which is also designed to provide Web services wrappers for command-line applications, mainly for the Emboss bioinformatics package [13]. However, the main difference between our framework and Soaplab resides in the approach used for describing the input and output of the analysis programs. In Soaplab, the service generation mechanism expects inputs and outputs described in the ACD language of the Emboss package, while our service generation expects inputs and outputs described in XML, using the XML schema for tool description that we have developed. The utilization of XML and XML Schema greatly reduces the complexity of dealing with heterogeneous analysis tools.

By developing Web services wrappers for these command-line tools, we can overcome many of the limitations mentioned above. By using the Java Architecture for XML Binding (JAXB), we can easily implement these Web service wrappers, as well as the user interfaces of the tools, from their XML descriptors. Also, the composition of Web services using standards such as BPEL [14] will facilitate the composition of biological protocols.

III. BACKGROUND

A. Command-line interfaces

A command line interface (CLI) is a user interface to a computer's operating system or a software application in which the user responds to a visual prompt by typing in a command on a specified line to perform specific tasks, receives a response back from the system, and then enters another command, and so forth.
The Unix prompt and the MS-DOS prompt in a Windows operating system are examples of command line interfaces. CLIs are often used by system administrators and programmers in engineering and scientific environments. A CLI is often used when a large vocabulary of commands, together with a wide range of options, can be entered more rapidly as text than with a pure GUI. This is typically the case with operating system command shells. Today, most users prefer the graphical user interface (GUI) offered by Windows, Mac OS, and others. Most of today's Unix-based systems and software applications, such as MySQL and MATLAB, offer both a command line interface and a graphical user interface, with the benefits of both.

In a CLI, commands are typically written in a particular way. The command is typed first, with no spaces in the name. Then, after a space, one can sometimes modify the command by adding what are called "options" or "parameters". Options change or limit the way the command is executed. They are usually preceded by a dash or another symbol. A command may also include the name of a file or directory that one wants the command to work on. The finished command will look something like this:

command -option file
command -option sourceFile destinationFile

Fig. 1. Example of command-line options

A CLI defines a grammar, a set of rules that all commands within the CLI must follow. This is the case for the UNIX operating system. These rules may differ from one CLI to another. Given this heterogeneity of rules, it is only through the documentation of the CLI that one can learn how to run the commands with the right options. In bioinformatics, several software packages and tools, such as Emboss, provide only a command line interface. Fig. 1 provides a short description of the options that can be used with the seqret tool of Emboss.

B. Web services in the Life Sciences

With increasing acceptance among software vendors and rising adoption in the marketplace, Web services are becoming the basis for many Web-based applications. They are interoperable across platforms and language neutral, which makes them appropriate for access from heterogeneous environments, enabling mass dissemination of knowledge. In the life sciences, adopting Web services technology is seen as a key to achieving coordination and interoperability among incompatible bioinformatics applications available from different providers, an endeavor that is becoming more and more significant to biological research.

Within the life sciences community, scientists spend a great deal of their time using various incompatible tools, accessing remote databases, copying and pasting data to combine their analyses, converting data formats, and assembling results in ad-hoc protocols. In this context, the service oriented approach, in which bioinformatics tools and databases are accessed as services in a standardized fashion, has very quickly caught the interest of several organizations and research centers, which have published their tools as Web services. The National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov/) has published its Entrez Utilities as Web services. The DNA Data Bank of Japan (DDBJ; http://www.ddbj.nig.ac.jp/) provides a Web API for biology (WABI), which includes several services. The European Bioinformatics Institute (EBI; http://www.ebi.ac.uk/) Web services provide programmatic access to data retrieval and analytical tools for several molecular databases.

C. Biological workflows

Complex analysis, annotation, and data integration typically involve various bioinformatics tools. In the past few years, several software environments and platforms have emerged to enable the orchestration and execution of these tools in biological workflows.
Examples of such systems include Bionavigator [4], Turbobench [2], Pegasys [15], W2H [16], G-Pipe [17], Biopipe [18], VIBE [19], Flosys [20], and Pise [21]. With the current trend towards using Web services technology in life sciences, new environments and frameworks have emerged to enable the orchestration and the execution of biological Web services. The most notable systems include Taverna (part of the myGrid project) [11] and BioMoby [12], as mentioned earlier. Moreover, the different standards in the area of Web services (WS-* standards), in particular the standards for service composition, including the Business Process Execution Language for Web Services (BPEL4WS) and the Business Process Modeling Language (BPML), will allow going further in biological data and tools integration.

D. Workflow use case: phylogenetic analysis

A phylogenetic analysis workflow of newly sequenced protein genes involves the following steps:
1) Translation of the nucleic acid sequence to the corresponding peptide sequence in six frames (e.g. using tools such as transeq [13] or the ExPASy translate tool [22]),
2) Identification of ORFs (open reading frames) that correspond to conserved proteins by similarity search (e.g. using tools such as blast [23] or getorf [13]),
3) Retrieval of protein sequences from GenBank [24] (e.g. using Entrez [25] or seqret [13]),
4) Multiple protein alignment (e.g. using clustalw [26]),
5) Extraction of well aligned sequence stretches [27],
6) Tree inferences based on various models of evolution, and
7) Tree testing (e.g. using phylip [28] and consel [29]).
Typically, such analysis pipelines are employed several times with different data sets or parameters.

IV. FRAMEWORK OVERVIEW

A. Objectives

With the growing interest in data and tools integration in life sciences and the limited number of integration frameworks based on Web services, we set out to develop a framework that makes it possible to:
1) Develop a Web service wrapper around each command-line tool,
2) Make the interfaces of these tools unified and remotely accessible,
3) Hide their dependencies on the underlying operating systems, and
4) Access these tools programmatically in order to be able to compose workflows describing many biological protocols.

By converting the command-line applications into Web services, we can overcome many of the limitations and the heterogeneity of styles of these applications that we mentioned in the introduction. In this paper, an application service is an application with a Web service interface that is described by a WSDL document as a collection of network endpoints, or ports.

To wrap a command-line tool as a Web service, we first describe the tool properties and its parameters in XML. We then translate the XML specification of the tool, which describes its parameters, its data types, and its data formats, into WSDL, and create an entry in the UDDI registry [30] to advertise the WSDL specification. Web services clients can then look up the WSDL in the registry and interact with the tool as a Web service. Client applications can be written in various programming languages.

The XML description of the data types and the data formats supported by command-line tools greatly reduces the complexity of their composition into workflows. Two given tools may be composed serially in a workflow provided that the data type and data format of the first tool's output are compatible with the data types and data formats of the second tool's input. If the data formats are incompatible, a data format conversion tool, such as readseq [31], is introduced between the two given tools in the workflow.
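The serial composition rule described above can be sketched in a few lines of Java. This is only an illustrative sketch under stated assumptions: the `Tool` class, its fields, the `chain` method, and the tool names `translator` and `aligner` are hypothetical and do not come from the framework; only the rule itself (types must match, a converter such as readseq is inserted when only the formats differ) is taken from the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WorkflowComposer {

    /** Hypothetical, simplified view of an application descriptor. */
    public static final class Tool {
        final String name;
        final Set<String> inputTypes;
        final Set<String> inputFormats;
        final String outputType;
        final String outputFormat;

        public Tool(String name, Set<String> inputTypes, Set<String> inputFormats,
                    String outputType, String outputFormat) {
            this.name = name;
            this.inputTypes = inputTypes;
            this.inputFormats = inputFormats;
            this.outputType = outputType;
            this.outputFormat = outputFormat;
        }
    }

    /** Small helper to build a set of type or format names. */
    public static Set<String> setOf(String... values) {
        return new HashSet<>(Arrays.asList(values));
    }

    /**
     * Returns the ordered step names for running first and then second.
     * A converter step is inserted only when the data types match but
     * the output format of first is not accepted by second.
     */
    public static List<String> chain(Tool first, Tool second, String converter) {
        if (!second.inputTypes.contains(first.outputType)) {
            // A format converter cannot fix a data type mismatch.
            throw new IllegalArgumentException(
                    first.name + " output type is not accepted by " + second.name);
        }
        List<String> steps = new ArrayList<>();
        steps.add(first.name);
        if (!second.inputFormats.contains(first.outputFormat)) {
            steps.add(converter);
        }
        steps.add(second.name);
        return steps;
    }

    public static void main(String[] args) {
        // Illustrative values only; they do not describe any real tool.
        Tool translator = new Tool("translator", setOf("nucleotide"),
                setOf("fasta"), "protein", "genbank");
        Tool aligner = new Tool("aligner", setOf("protein"),
                setOf("fasta"), "protein", "fasta");
        System.out.println(chain(translator, aligner, "readseq"));
    }
}
```

With the illustrative values in `main`, the data types match but the formats do not, so the converter step is placed between the two tools.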
The conversion is then performed without manual user intervention and without data loss. However, readseq is not recommended for very large (100+ MB) sequence files, whether as a single record or multiple records.

B. System components

The architecture of the system is shown in Fig. 2. It is a three-tier architecture composed of a client layer providing presentation logic, a middle-tier layer containing the business logic, and a back-end database. Users may interact with the system through a portal using HTML and JSP (Java Server Pages) screen forms, for example, or by invoking the application services programmatically. The requests for application services submitted by users at the presentation level are handled by an application server equipped with a Servlet and JSP container. Requests may be invocations of individual services or may be part of a workflow. The Workflow Manager allows the user to compose workflows from the application services generated from command-line applications. This may be performed from a user interface or programmatically, given that the input and output types and formats are described for each application, as we will see in the next section. At the back-end level, we find the command-line applications, the session management database, which stores information about users' sessions, and the service registry.

We have designed the system architecture based on service oriented architecture principles, and especially on using XML to describe tools and their parameters. The utilization of XML and XML Schema greatly reduces the system complexity and deals with the heterogeneity of applications.

Fig. 2. Framework components

The key component of the system is the XML schema we have developed to describe the applications and their parameters. This XML schema captures most of the different situations and cases of applications and parameters. For instance, with this schema we should be able to describe various applications that handle several data types (nucleotide, protein, taxonomic, etc.) and several data formats (Fasta, Genbank, Embl, etc.), as well as heterogeneous parameters with different types and different syntax for assigning values to the parameters.
From the XML description of an application, we may generate the user interface in the form of HTML and JSP pages, for example, and the Web service associated with the application. This is performed by the User Interface Generator and the Web Service Generator components. The WSDL of the application service is then published to the service registry.

C. Bioinformatics tools as Web services

Many platforms are now available to develop Web services applications. Some of them are commercial products, while others are freely available, such as the Jakarta Tomcat Web server and the Apache Axis toolkit. Using this toolkit, for example, one can convert a Java application into a Web service. To build a Web service for a given tool, we have to write a Java application to invoke and execute this tool. The Java class described in Fig. 3 can be used to run an application that is external to the Java virtual machine (JVM).

Fig. 3. A Java class to run applications external to the JVM.

The method runApplication() requires the name of the application (name of the executable program), the path to the application, and the arguments to be passed to the application (input, output, and other options of the tool). To illustrate this, we will consider a small tool called infoalign, which is part of the Emboss package. Infoalign is a small utility that lists some simple properties of sequences in an alignment. The above code fragment can be converted into a Web service with the following methods: setInfoalignInput(), setInfoAlignOutput(), setInfoAlignOptions(), and runInfoalign(). A WSDL file for this Web service will be created automatically. This feature is provided by most Web services development platforms. This file may be accessed, for example, from:
http://localhost:8091/axis/services/RunInfoAlign.wsdl
Using this WSDL file, client applications may be created to consume the newly created Web service.
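The code of Fig. 3 is not reproduced in this text. As a rough sketch of the kind of class the passage describes, and not the authors' actual code, the following uses the standard `java.lang.ProcessBuilder` API to launch an external program. The class name `ExternalToolRunner`, the helper `buildCommand`, the installation path, and the file names are assumptions made for illustration; the infoalign option names shown are likewise illustrative and should be checked against the Emboss documentation.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExternalToolRunner {

    /** Assembles the executable location and its arguments into one command list. */
    public static List<String> buildCommand(String path, String name, List<String> arguments) {
        List<String> command = new ArrayList<>();
        command.add(new File(path, name).getPath());
        command.addAll(arguments);
        return command;
    }

    /**
     * Runs the application outside the JVM and returns everything it printed.
     * Throws IOException when the program cannot be started or exits with a
     * non-zero code.
     */
    public static String runApplication(String path, String name, List<String> arguments)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(buildCommand(path, name, arguments));
        builder.redirectErrorStream(true); // merge standard error into standard output
        Process process = builder.start();
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException(name + " exited with code " + exitCode + ":\n" + output);
        }
        return output.toString();
    }

    public static void main(String[] args) {
        // Hypothetical path and file names; the command is only printed, not executed.
        List<String> command = buildCommand("/usr/local/emboss/bin", "infoalign",
                Arrays.asList("-sequence", "alignment.fasta", "-outfile", "alignment.info"));
        System.out.println(String.join(" ", command));
    }
}
```

Passing the arguments as a list, rather than as one concatenated string, avoids shell quoting problems when file names contain spaces.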
Under this scheme, the client should have prior knowledge of the parameters of the application (input, output, and options), with their syntax and order. To solve this problem, information about the application parameters may be made part of the information that can be obtained from the Web service. By adding a new method called getInfoalignParameters() to the Web service, the client can get a description of the parameters of the infoalign application, and can then customize the user interface to invoke the infoalign Web service. This solution is very simplistic, as the description of an application and its parameters is in general more complex because of the variety of tools and their parameters.

D. Application and parameters description

Prior to generating a Web service for a command-line application, the parameters of the application should be described in a structured way that can be used by programs. As these command-line applications are developed by several programmers, and implemented in various programming languages, they do not follow the same rules for specifying their parameters. Therefore, any general description schema of the application parameters is necessarily tentative.

We have formalized the above descriptions of applications and parameters by defining an XML schema for both application and parameter descriptions. The main elements describing an application are: application name, application description, version number, category, documentation URL, application path, minimum number of input data, maximum number of input data, input types, and input formats.

Fig. 4. Definition of input types and input/output data formats.

An application tool may handle one or more data types, including: protein, nucleotide, taxonomic, and result. Also, an application tool may handle one or several input and output data formats. This is described in Fig. 4 using XML schema types. The output of an application may also be in one of the above data formats. By describing the data types and data formats handled by an application, we can compose workflows by connecting the outputs of an application service to the inputs of other application services.

The main elements describing a parameter of an application are: parameter name, parameter description, type, the option used to invoke the parameter, the value to be assigned to the parameter, the syntax describing how a value is assigned to the parameter, the minimum and maximum values in the case of integer parameters, and default values. Each parameter belongs to one of the following types: IntegerParameter, FloatParameter, StringParameter, SwitchParameter, ChoiceListParameter, FileParameter, and SequenceParameter. The complete XML schema is available at:
http://faculty.uaeu.ac.ae/ebadidi/applSchema.xsd

V. IMPLEMENTATION

A. Generic XML Schema

A prototype of our proposed framework is under construction. We have developed the XML schema for application and parameters description using Stylus Studio enterprise edition [32]. This schema was used to develop and validate the description of some bioinformatics tools such as clustalw, infoalign, seqret, transeq, pepcoil, and silent. A template has also been generated from the schema to allow easy description of any command-line analysis tool. Using this template and the textual documentation of a given tool, one can create its XML descriptor, which may be validated against our XML schema. Fig. 6 shows an extract from the XML description of the clustalw application for multiple alignments.

B. Web service and user interface generation

To implement the Java Web service for a given XML descriptor, we are using the Java Architecture for XML Binding (JAXB 2.0), which provides a fast and convenient way to bind XML schemas to Java representations, making it easy for Java developers to incorporate XML data and processing functions in Java applications. As part of this process, JAXB provides methods for unmarshalling XML descriptor documents into Java content trees of data objects instantiated from the generated JAXB classes. These content trees are then used to implement the associated Web services and user interfaces using JSP and HTML. This process is illustrated in Fig. 5.

Fig. 5. Implementation process

Fig. 7 depicts the JSP interface generated from the above classes for the seqret tool from the Emboss package.
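To make the step from an XML descriptor to a concrete invocation more tangible, here is a minimal sketch. It deliberately swaps JAXB for the DOM parser bundled with the JDK, so that the example stays self-contained, and it reads a small invented descriptor. The element and attribute names (`application`, `parameter`, `name`, `type`, `option`) are hypothetical and are not claimed to match the schema published by the authors; only the parameter type names are taken from the list in Section IV-D, and the option strings are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DescriptorReader {

    /** Invented descriptor used only for illustration. */
    public static final String SAMPLE =
            "<application name=\"seqret\">"
          + "<parameter name=\"input\" type=\"SequenceParameter\" option=\"-sequence\"/>"
          + "<parameter name=\"output\" type=\"FileParameter\" option=\"-outseq\"/>"
          + "<parameter name=\"firstOnly\" type=\"SwitchParameter\" option=\"-firstonly\"/>"
          + "</application>";

    /** One parameter entry read from the descriptor. */
    public static final class Param {
        final String name;
        final String type;
        final String option;

        Param(String name, String type, String option) {
            this.name = name;
            this.type = type;
            this.option = option;
        }
    }

    /** Parses the descriptor and returns its parameters in document order. */
    public static List<Param> readParameters(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("parameter");
            List<Param> params = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                params.add(new Param(e.getAttribute("name"),
                        e.getAttribute("type"), e.getAttribute("option")));
            }
            return params;
        } catch (Exception ex) {
            throw new IllegalStateException("Invalid descriptor", ex);
        }
    }

    /** Turns user-supplied values into command-line arguments. */
    public static List<String> toArguments(List<Param> params, Map<String, String> values) {
        List<String> args = new ArrayList<>();
        for (Param p : params) {
            String value = values.get(p.name);
            if (value == null) {
                continue; // parameter not set by the user
            }
            if ("SwitchParameter".equals(p.type)) {
                if (Boolean.parseBoolean(value)) {
                    args.add(p.option); // switches carry no value
                }
            } else {
                args.add(p.option);
                args.add(value);
            }
        }
        return args;
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("input", "in.fasta");
        values.put("firstOnly", "true");
        System.out.println(toArguments(readParameters(SAMPLE), values));
    }
}
```

In a JAXB-based implementation, the hand-written `Param` class would be replaced by classes generated from the schema, and the DOM traversal by a single unmarshalling call; the argument-building logic would stay much the same.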
Fig. 6. XML description of the clustalw application

Fig. 7. Generated JSP user interface of seqret

Using the above process, we generate JAXB classes from the XML descriptors for a few command-line driven bioinformatics software tools from the Emboss package, such as infoalign, seqret, transeq, getorf, silent, and pepcoil. These classes are used in the implementation of the related Web services. Table I provides a description of the operations of some of these Web services.

TABLE I
SAMPLE WEB SERVICES OPERATIONS

Fig. 8. Tree representation of the seqret WSDL file.

The getInputTypes() operation returns the list of data types (nucleotide, protein, etc.) that should be provided as input data to the tool. The getInputFormats() operation returns the list of data formats (Fasta, Genbank, GCG, etc.) of the input data of the tool. The getOutputFormats() operation returns the possible data formats of the output files of the tool. The run operations (runInfoalign, runSeqret, ...) launch the execution of a tool given input data and a list of arguments. Fig. 8 shows the tree representation of the seqret WSDL file.
C. Biological workflow composition

Using the above operations, it becomes possible to link the tools' Web services in workflows. Indeed, these operations allow checking the compatibility between the types and formats of the output data of a given tool and the types and formats of the input data of another tool. Two of the generated Web services can be composed in a workflow if the output data of the first Web service is compatible, in terms of types and formats, with the input data of the second Web service.

As a first attempt to specify biological workflows from the generated Web services, the Business Process Execution Language (BPEL) was the obvious composition language of choice. BPEL provides a rich vocabulary for defining processes and has several features that are not found in programming languages. It enables users to describe business process activities as Web services and to define how they can be connected to accomplish specific tasks. The NetBeans development platform has supported designing BPEL processes since version 5.5.

Our goal is to allow the scientist to visually compose and execute workflows in an easy way by hiding the technical details of BPEL. A graphical user interface will allow the user to specify a workflow simply by dragging and dropping tools onto a canvas. In addition, the composition of workflows should be carried out by checking the compatibility among application services based on their inputs and outputs. The Workflow Manager component is responsible for allowing such visual composition and enactment of workflows from our generated Web services. While investigating existing tools for visual composition, we found a tool called JOpera [33], which provides a language for visual composition and which is implemented as a plugin of the Eclipse development environment.
JOpera is a rapid service composition tool offering a visual language and an execution environment for building processes out of reusable services, which include but are not strictly limited to Web services. It enables composing Web services into processes by visually specifying the order of invocation of each service (control flow) and by modeling the patterns of data exchange between the services (data flow). The JOpera environment provides support for the whole lifecycle of a process; it features a visual monitoring and debugging environment that lets the user interact with a running process.

Fig. 9 depicts a biological workflow that we have developed to experiment with the JOpera environment. It is created from the SeqretWS, TranseqWS, and GetorfWS Web services generated respectively for the seqret, transeq, and getorf Emboss tools. The Bioworklow process is composed of three sub-processes: SeqretSubprocess, TranseqSubprocess, and GetorfSubProcess. Each of these subprocesses is composed of tasks that represent the invocation of the associated Web service operations. For instance, the SeqretSubprocess is associated with the SeqretWS Web service.

Fig. 9. Biological workflow created with the JOpera environment.

VI. CONCLUSION

In this paper, we have presented a new framework for integrating bioinformatics tools by wrapping them as Web services. These tools are characterized by the heterogeneity of their styles, their parameters, and the data types and formats they can handle. Our proposed framework allows creating uniform interfaces for these tools without having to modify their code or write additional code. This greatly simplifies composing these applications into workflows to implement biological protocols. The framework is based on a generic XML schema that describes bioinformatics applications and their parameters in an easy way and captures various styles and scenarios for using parameters in a command-line tool.
A prototype of our framework is still under development, and some sample application services, such as infoalign, seqret, transeq, silent, getorf, and pepcoil, have been generated and customized. As future work, we intend to add various command-line biological tools to the framework and to integrate JOpera with our workflow manager. In addition to its own tools, the framework will provide support for importing external biological Web services, available from the bioinformatics community, and for composing them into workflows in the same way as local Web services.
ACKNOWLEDGMENT
The authors would like to thank Haifa Al Abdouli, Halima Shehyari, and Mariam Hefaity for their contributions to the implementation of the proposed test-bed.
REFERENCES
[1] A. Siepel, A. Farmer, A. Tolopko, M. Zhuang, P. Mendes, W. Beavis, and B. Sobral. ISYS: a decentralized, component-based approach to the integration of heterogeneous bioinformatics resources. Bioinformatics, 2001, 17, pp. 83-94.
[2] TurboGenomics Inc (n.d.). TurboBench overview. http://www.turbogenomics.com/products/turbobench_overview.pdf
[3] M. Senger. AppLab - A CORBA-Java based Application Wrapper. http://www.omg.org/docs/corbamed/98-03-08.pdf
[4] T.G. Littlejohn. Bioinformatics tools for genome projects. In Molecular Breeding of Forage Crops, Spangenberg, G. (ed.), Kluwer Acad. Publ., The Netherlands, 2001, pp. 83-99.
[5] R. Unwin, J. Fenton, M. Whitsitt, C. Jamison, M. Stupar, E. Jakobsson, and S. Subramaniam. Biology Workbench: A WWW-based Virtual Computing and Analysis Environment for the Biological Sciences. Bioinformatics (Databases and Systems, S. Letovsky (Ed.)), 1998, pp. 233-244.
[6] E. Badidi, C. DeSousa, F. Lang, and G. Burger. AnaBench: a Web/CORBA-based Workbench for biomolecular sequence Analysis. BMC Bioinformatics, 2003, 4:63.
[7] World Wide Web Consortium. Web Services Description Language 2.0 (W3C working draft 3). http://www.w3.org/tr/wsdl20
[8] M. Senger, P. Rice, and T. Oinn. Soaplab - a unified Sesame door to analysis tools. Paper presented at the UK e-Science All Hands Meeting, 2003, Nottingham, UK.
[9] The Apache Software Foundation (n.d.). Web services – Axis. http://ws.apache.org/axis
[10] D.S. Robert, J.R. Alan, and A.G. Carole. myGrid: personalised bioinformatics on the information grid. Bioinformatics, 2003, 19 (Suppl. 1), pp. i302-i304.
[11] T. Oinn, M.J. Addis, J. Ferris, D.J. Marvin, M. Greenwood, T. Carver, A. Wipat, and P. Li. Taverna, lessons in creating a workflow environment for the life sciences. Paper presented at the GGF10, Berlin, Germany, 2004.
[12] M.D. Wilkinson, and M. Links. BioMOBY: An open source biological web services proposal. Briefings in bioinformatics, 2003, 3(4), pp. 331–341.
[13] P. Rice, I. Longden, and A. Bleasby. EMBOSS: The European Molecular Biology Open Software Suite. Trends in Genetics, 2000, 16, pp. 276-277.
[14] OASIS. Web Services Business Process Execution Language Version 2.0. OASIS Standard, 11 April 2007.
[15] S.P. Shah, D.Y. He, J.N. Sawkins, J.C. Druce, G. Quon, D. Lett, G.X. Zheng, T. Xu, and B.F. Ouellette. Pegasys: software for executing and integrating analyses of biological sequences. BMC Bioinformatics, 2004, 5:40.
[16] P. Ernst, K-H. Glatting, and S. Shuai. A task framework for the web interface W2H. Bioinformatics, 2003, 19, pp. 278-282.
[17] A.G. Castro, S. Thoraval, L.J. Garcia, and M.A. Ragan. Workflows in bioinformatics: meta-analysis and prototype implementation of a workflow generator. BMC Bioinformatics, 2005, 6:87.
[18] S. Hoon, K. Kumar Ratnapu, J. Chia, B. Kumarasamy, X. Juguang, M. Clamp, A. Stabenau, S. Potter, L. Clarke, and E. Stupka. Biopipe: A Flexible Framework for Protocol-Based Bioinformatics Analysis. Genome Research, 2003, 13:1904-1915.
[19] INCOGEN, visual integrated bioinformatics environment. White paper. http://www.incogen.com/public_documents/vibe/VIBE_Whitepaper.pdf
[20] E. Badidi, G. Burger, and B.F. Lang. FLOSYS - a Web accessible workflow system for protocol-driven biomolecular sequence analysis. Cellular and Molecular Biology Journal, 2004, 50(7):785-793.
[21] C. Lethondal. A web interface generator for molecular biology programs in Unix. Bioinformatics, 2001, 17: 73-82.
[22] Swiss Institute of Bioinformatics (SIB). ExPASy Proteomics tools. http://www.expasy.ch/tools/
[23] S.F. Altschul, W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. Basic local alignment search tool. J. Mol. Biol., 1990, 215: 403-410.
[24] D.A. Benson, I. Karsch-Mizrachi, D.J. Lipman, J. Ostell, and D.L. Wheeler. GenBank. Nucl. Acids Res., 2003, 31: 23-27.
[25] G.D. Schuler, J.A. Epstein, H. Ohkawa, and J.A. Kans. Entrez: molecular biology database and retrieval system. Methods in Enzymology, 1996, 266: 141-162.
[26] J.D. Thompson, D.G. Higgins, and T.J. Gibson. CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research, 1994, 22, pp. 4673-4680.
[27] J. Castresana. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Molecular Biology and Evolution, 2000, 17: 540-552.
[28] J. Felsenstein. PHYLIP phylogeny inference package (version 3.2). Cladistics, 1989, 5: 164-166.
[29] H. Shimodaira, and M. Hasegawa. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics, 2001, 17(12): 1246-1247.
[30] OASIS. Universal Description, Discovery and Integration (UDDI) Version 3.0.2. http://uddi.org/pubs/uddi-v3.0.2-20041019.htm
[31] D.G. Gilbert. Sequence file format conversion with command-line Readseq. In Current Protocols in Bioinformatics, A. Baxevanis and D. Davison, eds. Wiley, 2002.
[32] Progress Software Corporation (n.d.). Stylus studio. http://www.stylusstudio.com/
[33] C. Pautasso. JOpera: Process Support for more than Web services. http://www.iks.ethz.ch/jopera
Elarbi Badidi is an Assistant Professor of computer science at the College of Information Technology (CIT) of United Arab Emirates University. Before joining the CIT, he held the position of bioinformatics group leader at the Biochemistry Department of Université de Montréal from 2001 to July 2004. He received a Ph.D. in computer science in 2000 from Université de Montréal, Québec (Canada). His research interests include Web services and Service Oriented Computing, Middleware, and Bioinformatics data and tools integration.
M. Vall Mohamed Salem is currently an Assistant Professor with the University of Wollongong in Dubai. His current interests are in performance analysis and scalability issues, distributed systems, and software engineering. He received a Ph.D. in computer science in 2002 from Université de Montréal, Québec (Canada). He held an IBM Canada Centre for Advanced Studies fellowship and can be reached at salem@uow.edu.au.
Salah Bouktif is an Assistant Professor of software engineering at the College of Information Technology (CIT) of United Arab Emirates University. Before joining CIT, Dr. Bouktif was a postdoctoral fellow for two years at the department of computer engineering of the polytechnic school of engineering of Montreal. He received his Ph.D. degree in 2005 with high honors from the University of Montreal. Dr. Bouktif’s research interests include metrics and software quality models, software quality prediction improvement, Search-Based Software Engineering, software testing and test data generation, software evolution, and change and cost modeling.
Larbi Esmahi is an Associate Professor of the School of Computing and Information Systems at Athabasca University. He was the graduate program coordinator at the same school during 2002-2005. He holds a PhD in electrical engineering from Ecole Polytechnique, University of Montreal. His current research interests are in e-services, e-commerce, multiagent systems, and intelligent systems. He is an associate editor for the Journal of Computer Science and the Tamkang Journal of Science and Engineering. He is also a member of the editorial advisory board of the Advances in Web-Based Learning Book Series, IGI Global, and a member of the international editorial review board of the International Journal of Web-Based Learning and Teaching Technologies.