The document discusses roadblocks that hinder realizing the vision of an integrated scholarly knowledge environment, focusing on issues with information technology, networked libraries, and scholarly endeavors. It identifies that relationships between inherently related scholarly assets are not technically linked in the current system. The author argues that the scholarly communication system needs to embrace the networked environment by moving beyond merely digitizing the paper-based system and enabling relationships and connections between information. Standards, interoperability challenges, the emergence of repositories, e-science, and rights frameworks are discussed as drivers of needed changes to the scholarly system.
The OAI-ORE Interoperability Framework in the Context of the Current Scholarly Communication (Herbert Van de Sompel)
The document discusses the OAI-ORE Interoperability Framework in the context of current scholarly communication. It describes how OAI-ORE was funded and lists the editors. It then discusses how the current scholarly system is like a scanned paper system and outlines some technical trends emerging, including augmenting scholarship with machine-readable content, integrating datasets into the scholarly record, and exposing scholarly processes.
This document provides a summary of Matthieu Bourgery's educational background and professional experience. He holds multiple degrees including a Master's in Marine Biodiversity and Biotechnology from Heriot-Watt University and a Bachelor's in Biology from Orleans University. His professional experience includes positions conducting molecular diagnostics, working as a molecular diagnostic scientist, and currently pursuing a PhD in Finland studying microRNA regulation of bone homeostasis. He has strong skills in presentations, writing, laboratory techniques, and statistical and molecular biology software.
This document discusses augmenting interoperability across scholarly repositories. It proposes a shared data model and services using core data surrogates that can be obtained, harvested, and put across repositories. This would allow richer cross-repository services and enable scholarly communication as a global workflow. A Pathways core data model is presented for representing digital objects uniformly across repositories to support interoperable functions.
This document discusses community standards for reproducible and reusable bioscience research. It outlines the importance of consistent reporting to maximize the value of collective scientific outputs. However, there are challenges due to the large number of bioscience reporting standards and lack of knowledge about how they relate. The document calls for a coherent catalogue of data sharing resources to evaluate standards, show relationships among them, and promote interoperability. This would help researchers make informed choices about standards and facilitate structured descriptions of experiments across domains.
1) Scientist-edited wiki websites have become popular ways for biologists to manage and interpret the large amounts of genomic and other biological data being produced.
2) These wiki sites aim to help researchers make sense of the data flooding into public databases by allowing many annotators to contribute, in contrast to traditional smaller teams of annotators.
3) However, getting researchers to actually contribute to the wiki sites, rather than just take information from them, has been a challenge, as scientists are often too busy or secretive to cooperate openly. Whether wiki approaches can succeed where previous community-driven data sharing efforts have failed remains to be seen.
The Seven Deadly Sins of Bioinformatics (Duncan Hull)
Keynote talk at Bioinformatics Open Source Conference (BOSC) Special Interest Group at the 15th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2007) in Vienna, July 2007 by Carole Goble, University of Manchester.
This document summarizes a presentation about scientific workflow systems and related technologies including Taverna, Biocatalogue, and myExperiment. Taverna is a workflow management system that allows researchers to design and run workflows linking various bioinformatics services. Biocatalogue is a public registry of life science web services. MyExperiment is a repository for sharing workflows. The document discusses how these tools help scientists conduct experiments and analyze and preserve results.
The document describes Allie, a database and search service for abbreviations and long forms in the life sciences. Allie searches MEDLINE titles and abstracts to generate pairs of abbreviations and their corresponding long forms. It displays potential matches along with bibliographic data and contextual information to help users understand abbreviations. Allie is updated weekly and its data is available via its website and as linked open data. It receives over 7,000 unique visits per month to its search service.
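Allie's own pipeline is not described in detail here, but the pairing task it performs can be illustrated with a minimal sketch in the spirit of the classic Schwartz-Hearst heuristic; the function names and sample sentence below are invented for illustration:

```python
import re

def extract_pairs(text):
    """Pair parenthesised abbreviations with a preceding long form,
    in the spirit of the Schwartz-Hearst heuristic (illustrative only)."""
    pairs = []
    for m in re.finditer(r'\(([A-Za-z][\w-]{1,9})\)', text):
        sf = m.group(1)
        # Candidate long form: up to len(sf) + 5 words before the parenthesis.
        words = text[:m.start()].split()[-(len(sf) + 5):]
        for i in range(len(words)):
            cand = ' '.join(words[i:])
            if matches(sf, cand):
                pairs.append((sf, cand))
                break
    return pairs

def matches(sf, lf):
    """Check that the short form's characters occur, in order, in the
    long form, and that its first character starts the long form."""
    lf_l, sf_l = lf.lower(), sf.lower()
    if not lf_l.startswith(sf_l[0]):
        return False
    pos = 0
    for ch in sf_l:
        pos = lf_l.find(ch, pos)
        if pos == -1:
            return False
        pos += 1
    return True

print(extract_pairs("We measured brain-derived neurotrophic factor (BDNF) levels."))
# -> [('BDNF', 'brain-derived neurotrophic factor')]
```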
1. The document discusses how a biologist, Marco Roos, became interested in e-science through his work in molecular and cellular biology, bioinformatics, and data integration projects.
2. Roos describes how e-science allows for collaboration between different experts and disciplines through technologies like workflows, semantic web, and virtual laboratories.
3. Roos emphasizes that e-science should empower scientists by making tools and resources easy to use, share, and build upon so that scientists can focus on scientific problems rather than technical challenges.
The document discusses the Open Archives Initiative's Object Re-Use and Exchange (OAI-ORE) effort. OAI-ORE aims to develop standards and protocols to facilitate discovery, referencing, access, aggregation, and processing of complex digital objects across repositories. It takes a web-centric approach, seeing digital objects as compound information objects that may have multiple representations. The standards aim to address challenges like consistently linking related objects and enabling discovery of all parts of an object. The talk outlines motivations, examples, and design considerations for OAI-ORE's work on these challenges.
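To make the compound-object idea concrete, here is a minimal sketch of an ORE-style aggregation expressed with rdflib; the URIs are invented placeholders, and a real ORE Resource Map would carry more metadata (authorship, timestamps, and so on):

```python
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import RDF

ORE = Namespace("http://www.openarchives.org/ore/terms/")

g = Graph()
g.bind("ore", ORE)

# Hypothetical URIs for a resource map, its aggregation, and the parts
# of a compound scholarly object (article + dataset + slides).
rem = URIRef("http://example.org/article-123/rem")
agg = URIRef("http://example.org/article-123/aggregation")
parts = [URIRef("http://example.org/article-123/fulltext.pdf"),
         URIRef("http://example.org/article-123/dataset.csv"),
         URIRef("http://example.org/article-123/slides.pdf")]

g.add((rem, RDF.type, ORE.ResourceMap))
g.add((rem, ORE.describes, agg))
g.add((agg, RDF.type, ORE.Aggregation))
for p in parts:
    g.add((agg, ORE.aggregates, p))  # one triple per constituent resource

print(g.serialize(format="turtle"))
```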
This talk explores how principles derived from experimental design practice, data and computational models can greatly enhance data quality, data generation, data reporting, data publication and data review.
Introduction to Ontologies for Environmental Biology (Barry Smith)
1. The document introduces ontologies for environmental biology and discusses several disciplines that could benefit from their use, including GIS, ecology, environmental biology, and various "-omics" fields.
2. It describes what an ontology is and compares ontologies to legends for maps or diagrams, which allow integration and help humans and computers make sense of complex data. Ontologies provide standardized terminology and annotations.
3. The document outlines the Open Biomedical Ontologies (OBO) Foundry, a collection of interoperable reference ontologies for annotating biomedical data. Foundry ontologies include the Gene Ontology and other ontologies for molecules, cells, anatomical structures, and more. They are developed through consensus and share common design principles.
Towards Incidental Collaboratories; Research Data Services (Anita de Waard)
This document discusses enabling "incidental collaboratories" by collecting and connecting biological research data through a centralized framework. It argues that biology research is currently quite isolated due to its small scale and competitive nature. The framework would involve storing experimental data with metadata, allowing analyses across similar experiment types and biological subjects, and preserving data long-term with access controls. This could help move labs from being isolated to being "sensors in a network" and address objections around data ownership and quality.
This document provides an introduction and overview of a manual annotation workshop using the Web Apollo genome annotation tool. It discusses manual annotation and community-based curation efforts. The workshop aims to teach participants how to identify genes of interest, become familiar with Web Apollo, learn how to corroborate and modify gene models using evidence, and understand the genome annotation process from assembly to manual curation. The document outlines the workshop activities and provides guidance on using Web Apollo, including navigating the interface, editing annotations, and annotating simple cases by adding or modifying exons.
The document discusses scientific workflow management systems and collaboration in workflow-based science. It notes that collaboration requires that a scientist be able to make sense of third-party data, and that this requires the data to be accompanied by provenance metadata that describes how the data was generated and processed. The concept of a "Research Object" is introduced as a way to package scientific data and workflows together with provenance and other related information to enable collaboration and reuse.
PDT: Personal Data from Things, and its provenance (Paolo Missier)
This document discusses various aspects of the Internet of Things (IoT), including potential architectures and stacks, connectivity and evolution. It examines use cases at different scales, from individual sensors to smart cities. The role of metadata and data provenance is explored for IoT applications involving science, personal data from sensors, and devices that make autonomous decisions. Issues of data ownership, privacy and user control are important considerations for personal data generated by IoT devices. The relationship between IoT and machine-to-machine communication is also briefly discussed.
Paper presentations: UK e-science AHM meeting, 2005 (Paolo Missier)
The document describes an ontology-based approach to handling information quality in e-science. It presents an initial quality framework that captures scientists' quality requirements and allows defining domain-specific quality characteristics. It introduces a web service that annotates datasets with quality metrics based on how well their elements conform to relevant ontologies, using transcriptomics as an example domain. The approach aims to make quality definitions reusable and the computation of quality measurements over large datasets cost-effective.
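A toy version of the conformance idea might look like the following (illustrative only; the paper's framework is ontology-driven and configurable per domain, not a flat vocabulary lookup):

```python
def conformance(values, vocabulary):
    """Fraction of values that resolve to a term in the vocabulary;
    a crude stand-in for ontology-based quality annotation."""
    known = sum(1 for v in values if v.strip().lower() in vocabulary)
    return known / len(values) if values else 0.0

# Hypothetical transcriptomics annotations checked against a tiny vocabulary.
vocab = {"apoptosis", "cell cycle", "dna repair"}
annotations = ["Apoptosis", "cell cycle", "unknown process", "DNA repair"]
print(conformance(annotations, vocab))   # 0.75
```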
This document discusses encoding provenance graphs and PROV constraints using Datalog rules. It maps PROV notation graphs to a database of facts and encodes most PROV constraints as Datalog rules. This allows for declarative specification of provenance graphs with deductive inference, enabling validation of graphs and rapid prototyping of analysis algorithms. Some limitations include inability to encode certain constraints and attributes in graph relations. The approach provides a proof of concept for representing and reasoning over provenance graphs with Datalog.
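The paper encodes the constraints in Datalog itself; the sketch below mimics the encoding idea in plain Python, storing PROV statements as a database of fact tuples and running rule-like inference (transitive closure of derivation) plus a check in the spirit of PROV's ordering constraints, by naive bottom-up iteration:

```python
# PROV statements stored as a database of fact tuples,
# e.g. ('wasDerivedFrom', 'e2', 'e1') means e2 was derived from e1.
facts = {
    ('wasDerivedFrom', 'e2', 'e1'),
    ('wasDerivedFrom', 'e3', 'e2'),
    ('wasGeneratedBy', 'e2', 'a1'),
}

def infer(facts):
    """Naive bottom-up evaluation of two Datalog-style rules:
       derived(X, Y) :- wasDerivedFrom(X, Y).
       derived(X, Z) :- derived(X, Y), derived(Y, Z)."""
    db = set(facts)
    while True:
        derived = {(x, y) for (p, x, y) in db
                   if p in ('wasDerivedFrom', 'derived')}
        new = {('derived', x, y) for (x, y) in derived} | \
              {('derived', x, z) for (x, y) in derived
               for (y2, z) in derived if y == y2}
        new -= db
        if not new:
            return db
        db |= new

def violations(db):
    """A check in the spirit of PROV's ordering constraints:
       no entity may (transitively) derive from itself."""
    return [x for (p, x, y) in db if p == 'derived' and x == y]

db = infer(facts)
print(sorted(f for f in db if f[0] == 'derived'))
# [('derived', 'e2', 'e1'), ('derived', 'e3', 'e1'), ('derived', 'e3', 'e2')]
print(violations(db))   # [] -- this example graph is acyclic
```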
SWPM12 report on the Dagstuhl seminar on Semantic Data Management (Paolo Missier)
The document summarizes discussions that took place at a Dagstuhl seminar on provenance in semantic data management in April 2012. Key points discussed include:
1) The need for provenance-specific benchmarks and reference data sets to better understand provenance usage and properties.
2) Proposals to collect provenance traces from various domains in a community repository using the PROV standard for interoperability.
3) Challenges of representing and reasoning with uncertain provenance information from sources like sensors, NLP, and human errors.
Structured Occurrence Network for provenance: talk for IPAW'12 paper (Paolo Missier)
The document discusses using structured occurrence networks (SONs) to model provenance. SONs extend occurrence networks (ONs) to represent the activity of complex systems through relationships between multiple ONs. The goal is to explore using SONs as a formal model of provenance, viewing data as an evolving system and agents as also evolving systems. Communication SONs are introduced to capture communication between concurrently proceeding ONs. This establishes patterns for representing workflow and multi-layered provenance using SONs.
Paper talk (presented by Prof. Ludaescher), WORKS workshop, 2010 (Paolo Missier)
Missier, P., Ludascher, B., Bowers, S., Anand, M. K., Altintas, I., Dey, S., et al. (2010). Linking Multiple Workflow Provenance Traces for Interoperable Collaborative Science. Proceedings of the 5th Workshop on Workflows in Support of Large-Scale Science (WORKS).
Paper talk @ EDBT'10: Fine-grained and efficient lineage querying of collecti... (Paolo Missier)
The document discusses fine-grained provenance tracking of workflow data products. It presents a functional model for collection-oriented workflow processing that models workflows operating on nested collections. This model generalizes simple iteration to arbitrary collection depths and handles multiple input collections through a generalized cross product operation. The model aims to enable efficient provenance querying by traversing the workflow graph instead of the potentially larger provenance graph.
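A minimal sketch of the idea, with invented helper names (the paper's actual model is algebraic and handles arbitrary nesting far more generally): record, for each output item, the input positions it was derived from, so lineage queries can be answered from these compact mappings rather than a fully materialised provenance graph:

```python
def map_nested(f, coll):
    """Structure-preserving map over an arbitrarily nested list,
    returning (output, lineage) where lineage maps each output leaf's
    index path to the input leaf's index path it was derived from."""
    lineage = {}
    def go(node, path):
        if isinstance(node, list):
            return [go(child, path + (i,)) for i, child in enumerate(node)]
        lineage[path] = path          # a map keeps positions aligned
        return f(node)
    return go(coll, ()), lineage

def cross_apply(f, xs, ys):
    """Generalised cross product over two input collections: each output
    position records the pair of input positions it came from."""
    out, lineage = [], {}
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            lineage[len(out)] = (i, j)
            out.append(f(x, y))
    return out, lineage

out, lin = map_nested(lambda v: v * v, [[1, 2], [3]])
print(out, lin[(0, 1)])        # [[1, 4], [9]] (0, 1)

out, lin = cross_apply(lambda a, b: a + b, [10, 20], [1, 2, 3])
print(out[4], lin[4])          # 22 (1, 1): out[4] from xs[1], ys[1]
```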
The document discusses porting genome sequencing data processing pipelines from scripted HPC implementations to workflow models on the cloud. This allows the pipelines to be more scalable, flexible, and evolvable. Tracking provenance is also important for using results as clinical evidence and analyzing differences when the pipelines change. Preliminary tests on the Microsoft Azure cloud show potential cost savings from improved resource utilization.
The document discusses scientific workflow management systems and provenance. It notes that momentum is growing around data sharing, as evidenced by a special issue of Nature on the topic. Effective data sharing requires standards for packaging data with metadata into self-descriptive research objects, as well as representation of process provenance using workflow descriptions. Provenance captures causal relationships in scientific data and is important for understanding, reusing, and validating others' work. The Open Provenance Model aims to standardize provenance representation.
The document discusses integrating data from multiple sources on-the-fly without prior knowledge of the schemas. It proposes using approximate entity reconciliation, which leverages techniques like record linkage, approximate joins, and adaptive query processing. The key challenges are trading off completeness of integration for query response time and implementing a hybrid join algorithm that switches between exact and approximate joins to optimize this tradeoff.
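As a rough sketch of the hybrid idea (invented record schema and threshold; the paper's adaptive algorithm switches phases based on query-time constraints rather than a fixed fallback):

```python
from difflib import SequenceMatcher

def hybrid_join(left, right, threshold=0.85):
    """Toy hybrid join: try exact matches first (hash join), then fall
    back to approximate string matching for unmatched left keys."""
    index = {}
    for rec in right:
        index.setdefault(rec["name"], []).append(rec)
    out = []
    for rec in left:
        if rec["name"] in index:                      # exact phase
            out += [(rec, r) for r in index[rec["name"]]]
        else:                                         # approximate phase
            for r in right:
                sim = SequenceMatcher(None, rec["name"], r["name"]).ratio()
                if sim >= threshold:
                    out.append((rec, r))
    return out

left = [{"name": "J. Smith"}, {"name": "A. Jones"}]
right = [{"name": "J. Smith"}, {"name": "A. Jonas"}]
print(len(hybrid_join(left, right)))   # 2: one exact pair, one fuzzy pair
```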
ProvAbs: model, policy, and tooling for abstracting PROV graphs (Paolo Missier)
This document presents ProvAbs, a model, policy language, and tool for abstracting PROV graphs to enable partial disclosure of provenance data. The model groups nodes in a PROV graph and replaces them with a new abstract node while preserving the graph's validity. A policy assigns sensitivity levels to nodes and drives the node selection for abstraction. The ProvAbs tool implements the abstraction model and allows interactively exploring policy settings and clearances to generate abstract views of a PROV graph.
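As a toy sketch of the grouping operation (not ProvAbs's actual algorithm, which must also keep the result a valid PROV graph and honour the sensitivity policy): replace a set of nodes in a directed graph with a single abstract node, dropping internal edges and rewiring the boundary ones:

```python
def abstract_nodes(edges, group, new_node):
    """Collapse every node in `group` into `new_node`: edges internal
    to the group disappear, boundary edges are rewired to the new node.
    `edges` is a set of (src, dst) pairs."""
    rename = lambda n: new_node if n in group else n
    return {(rename(s), rename(d)) for (s, d) in edges
            if not (s in group and d in group)}

# Hypothetical provenance edges (e.g. wasDerivedFrom / used links).
edges = {('e2', 'e1'), ('e3', 'e2'), ('e4', 'e3'), ('e3', 'a1')}
print(sorted(abstract_nodes(edges, {'e2', 'e3'}, 'abs1')))
# [('abs1', 'a1'), ('abs1', 'e1'), ('e4', 'abs1')]
```

The interesting part in the real model is choosing the abstract node's type and surviving relations so that the abstracted graph is still valid PROV.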
Your data won’t stay smart forever: exploring the temporal dimension of (big ... (Paolo Missier)
Much of the knowledge produced through data-intensive computations is liable to decay over time, as the underlying data drifts and the algorithms, tools, and external data sources used for processing change and evolve. Your genome, for example, does not change over time, but our understanding of it does. How often should we look back at it, in the hope of gaining new insight, e.g. into genetic diseases, and how much does that cost when you scale re-analysis to an entire population?
The "total cost of ownership” of knowledge derived from data (TCO-DK) includes the cost of refreshing the knowledge over time in addition to the initial analysis, but is often not a primary consideration.
The ReComp project aims to provide models, algorithms, and tools to help humans understand TCO-DK, i.e., the nature and impact of changes in data, and assess the cost and benefits of knowledge refresh.
In this talk we try to map the scope of ReComp by giving a number of patterns that cover typical analytics scenarios where re-computation is appropriate. We specifically describe two such scenarios, where we are conducting small-scale, proof-of-concept ReComp experiments to help us sketch the general ReComp architecture. This initial exercise reveals a multiplicity of problems and research challenges, which will inform the rest of the project.
Big Data Quality Panel: Diachron Workshop @EDBT (Paolo Missier)
1) Traditional approaches to ensuring data quality such as quality assurance and curation face challenges from big data's volume, velocity, and variety characteristics.
2) It is difficult to determine general thresholds for when data quality issues can be ignored as the importance varies between different analytics algorithms.
3) The ReComp decision support system aims to use metadata about past analytics tasks to determine when knowledge needs to be refreshed due to changes in big data or models.
The lifecycle of reproducible science data and what provenance has got to do ... (Paolo Missier)
The document discusses various aspects of ensuring reproducibility in scientific research through provenance. It begins by providing an overview of the data lifecycle and challenges to reproducibility as experiments and components evolve. It then discusses different levels of reproducibility (rerun, repeat, replicate, reproduce) and approaches to analyzing differences in workflow provenance traces to understand how changes impact results. The remainder of the document describes specific systems and tools developed by the author and collaborators that use provenance to improve reproducibility, including data packaging with Research Objects, provenance recording and analysis workflows with YesWorkflow, process virtualization using TOSCA, and provenance differencing with Pdiff.
Cloud e-Genome: NGS Workflows on the Cloud Using e-Science Central (Paolo Missier)
This document discusses moving whole exome sequencing pipelines to the cloud using e-Science Central workflow management. The goal is to process 3000 exomes from neurological patients in a scalable and cost-effective way. Current scripts are being ported to e-Science Central for improved abstraction, execution, and provenance tracking. Provenance will help compare results from different pipeline versions and support clinical diagnosis. Initial testing with 300 exomes will begin, with full scalability testing planned for September 2014.
The document discusses using SNPs (single nucleotide polymorphisms) to help identify candidate genes associated with quantitative traits. It presents SNPit, a database that integrates data from Ensembl, dbSNP and Perlegen to rank SNPs based on differences between resistant and susceptible mouse strains. SNPit supports exploratory analysis of large genomic regions to help focus candidate gene searches for traits like disease susceptibility. The goal is to complement existing methods and automate parts of the process to accelerate disease gene identification.
The Evolution of e-Research: Machines, Methods and Music (David De Roure)
The document summarizes the evolution of e-research over three generations from 1981 to the present. The first generation saw early adopters using tools within their disciplines with some reuse. The second generation was characterized by increased reuse of tools, data and methods across areas. The third generation is defined by radical sharing of resources globally across any discipline through social networks and reusable research objects. The document also discusses several specific projects and tools that exemplify each generation of e-research including myExperiment, Galaxy, and SALAMI.
The document discusses reproducible bioscience data. It describes Susanna-Assunta Sansone as a principal investigator and team leader at the University of Oxford e-Research Centre who gives a presentation on policies, communities, and standards around reproducible bioscience data. The presentation covers topics like preserving institutional memory, utilizing public data, and addressing reproducibility and reuse of public data through community standards and structured data annotation.
Taverna is a free and open-source workflow management system that allows researchers to design and execute scientific workflows. It was developed by the University of Manchester to support in silico experiments in biology. Taverna provides a graphical user interface for designing workflows using a variety of distributed data sources and web services without having to learn complex programming. It has been widely adopted by researchers in fields such as biology, healthcare, astronomy, and cheminformatics to automate analysis pipelines and share workflows.
The Symbiotic Nature of Provenance and Workflow (Eric Stephan)
This document discusses the symbiotic relationship between provenance and workflows in scientific research. It notes that workflows provide automation and integration capabilities, while provenance provides documentation of what transpired. The document provides examples of workflow and provenance technologies and outlines challenges around interoperability. It concludes that recognizing the interdependent relationship between provenance and workflows can help advance systems science research.
Processing Amplicon Sequence Data for the Analysis of Microbial Communities (Martin Hartmann)
This document provides an overview of next-generation sequencing (NGS) technologies and their usefulness for analyzing microorganisms associated with plants. It discusses how NGS methods allow addressing previously impossible questions about the composition, function, and interactions of microbial communities in environments like the rhizosphere and phyllosphere. While powerful, NGS platforms have limitations that can introduce errors or biases, but methods exist to overcome these issues. The review highlights applications of NGS in metagenomic studies of plant-associated microbiomes and how these new techniques are transforming the field.
Precise elucidation of the many different biological features encoded in any genome requires careful examination and review by researchers, who gather and evaluate the available evidence to corroborate and modify gene predictions and other biological elements. This curation process allows them to resolve discrepancies and validate automated gene model hypotheses and alignments. This approach is the well-established practice for well-known genomes such as human, mouse, zebrafish, Drosophila, et cetera. Desktop Apollo was originally developed to meet these needs.
The cost of sequencing a genome has been dramatically reduced by several orders of magnitude in the last decade, and the natural consequence is that more and more researchers are sequencing more and more new genomes, both within populations and across species. Because individual researchers can now readily sequence many genomes of interest, the need for a universally accessible genomic curation tool logically follows. Each new exome or genome sequenced requires visualization and curation to obtain biologically accurate genomic feature sets, even for a limited set of genes, because computational genome analysis remains an imperfect art. Additionally, unlike earlier genome projects, which had the advantage of more highly polished genomes, recent projects usually have lower coverage. Therefore, researchers now face additional work correcting for more frequent assembly errors and annotating genes split across multiple contigs.
Genome annotation is an inherently collaborative task; researchers only very rarely work in isolation, turning to colleagues for second opinions and insights from those with expertise in particular domains and gene families. The new JavaScript-based Apollo allows researchers real-time interactivity, breaking down large amounts of data into manageable portions to mobilize groups of researchers with shared interests. We are also focused on training the next generation of researchers by reaching out to educators to make these tools available as part of curricula via workshops and webinars, and through widely applied systems such as iPlant and DNA Subway. Here we offer details of our progress.
Presentation at Genome Informatics, Session (3) on Databases, Data Mining, Visualization, Ontologies and Curation.
Authors: Monica C Munoz-Torres, Suzanna E. Lewis, Ian Holmes, Colin Diesh, Deepak Unni, Christine Elsik.
This document discusses data management and curation in bioinformatics. It describes Susanna-Assunta Sansone as the principal investigator and team leader at the University of Oxford e-Research Centre, where her team works on data management, biocuration, software development, databases, and community standards and ontologies for various domains including toxicology, health, and agriculture. The document promotes the importance of data standards to enable data sharing and reproducibility in bioscience research.
Acting as Advocate? Seven steps for libraries in the data decade (Liz Lyon)
UKOLN advocates that libraries take seven steps to support data management and open science in the data decade:
1) Provide briefings on cloud data services in partnership with IT services.
2) Build usable data management tools in partnership with researchers.
3) Develop data sustainability strategies and articulate the costs and benefits.
4) Publish case studies on open science to show benefits of universal data sharing.
5) Present at university ethics committees to highlight open data issues.
6) Raise awareness of citizen science opportunities and guidelines for good practice.
7) Promote data citation and attribution to embed in publication practice.
The case for cloud computing in Life Sciences (Ola Spjuth)
This document summarizes Ola Spjuth's background and research interests related to cloud computing in life sciences. Spjuth is an associate professor who manages bioinformatics resources at SciLifeLab and UPPMAX. His research focuses on developing e-infrastructure, automation methods, and applied e-science using tools like Docker and Kubernetes. He is working on projects applying these technologies to problems in drug discovery and predictive modeling of image data.
A keynote given on experiences in curating workflows and web services.
3rd International Digital Curation Conference: "Curating our Digital Scientific Heritage: a Global Collaborative Challenge"
11-13 December 2007
Renaissance Hotel
Washington DC, USA
Web Apollo: Lessons learned from community-based biocuration efforts (Monica Munoz-Torres)
This presentation tries to highlight the importance and relevance of community-based curation of biological data. It describes the results of harvesting expertise from dispersed researchers assigning functions to predicted and curated peptides, as well as collaborative efforts for standardization of genes and gene product attributes across species and databases.
Jeremy Hadidjojo is a PhD candidate in physics at the University of Michigan with expertise in computational physics, mathematical modeling, simulation, and data analysis. His research focuses on developing physical models of biological pattern formation and applying machine learning techniques to analyze complex systems. He has extensive programming skills in MATLAB, Python, C++ and experience with parallel and GPU computing. His published works include modeling mechanisms of planar cell chirality and retinal cone patterning in zebrafish.
The document discusses the challenges and opportunities that will arise from the exponential growth of biological data in the coming years. It outlines four key areas: 1) Research approaches will need to effectively analyze infinite amounts of data. 2) Software and decentralized infrastructure will be needed to process the data. 3) Open science and reproducible research practices are important for data-driven biology. 4) Training the next generation of biologists in data analysis skills will be a major challenge. The document advocates for open source tools, reproducible research methods, and expanded training programs to help biology take advantage of the coming data deluge.
The document discusses the evolution of science and research from the 1940s to present day. It notes Vannevar Bush's 1945 concerns about the growing mountain of research that scientists did not have time to fully understand or remember. It then discusses the current "data explosion" and challenges of accessing, sharing, and building on increasingly large amounts of data and research. The document advocates for reusable, reproducible, and transparent science through connected resources and environments that facilitate collaboration and knowledge sharing.
Similar to Invited talk at the GeoClouds Workshop, Indianapolis, 2009
Design and Development of a Provenance Capture Platform for Data Science (Paolo Missier)
A talk given at the DATAPLAT workshop, co-located with the IEEE ICDE conference (May 2024, Utrecht, NL).
Data Provenance for Data Science is our attempt to provide a foundation to add explainability to data-centric AI.
It is a prototype, with lots of work still to do.
Towards explanations for Data-Centric AI using provenance records (Paolo Missier)
In this presentation, given to graduate students at Università Roma Tre, Italy, we suggest that concepts well-known in Data Provenance can be exploited to provide explanations in the context of data-centric AI processes. Through use cases (incremental data cleaning, training set pruning), we build up increasingly complex provenance patterns, culminating in an open question:
how to describe "why" a specific data item has been manipulated as part of data processing, when such processing may consist of a complex data transformation algorithm.
Interpretable and robust hospital readmission predictions from Electronic Hea... (Paolo Missier)
A talk given at the BDA4HM workshop, IEEE BigData conference, Dec. 2023
please see paper here:
https://drive.google.com/file/d/1vN08G0FWxOSH1Yeak5AX6a0sr5-EBbAt/view
Data-centric AI and the convergence of data and model engineering: opportunit... (Paolo Missier)
A keynote talk given to the IDEAL 2023 conference (Evora, Portugal Nov 23, 2023).
Abstract.
The past few years have seen the emergence of what the AI community calls "Data-centric AI", namely the recognition that some of the limiting factors in AI performance are in fact in the data used for training the models, as much as in the expressiveness and complexity of the models themselves. One analogy is that of a powerful engine that will only run as fast as the quality of the fuel allows. A plethora of recent literature has started to explore the connection between data and models in depth, along with startups that offer "data engineering for AI" services. Some concepts are well known to the data engineering community, including incremental data cleaning, multi-source integration, or data bias control; others are more specific to AI applications, for instance the realisation that some samples in the training space are "easier to learn from" than others. In this "position talk" I will suggest that, from an infrastructure perspective, there is an opportunity to efficiently support patterns of complex pipelines where data and model improvements are entangled in a series of iterations. I will focus in particular on end-to-end tracking of data and model versions, as a way to support MLDev and MLOps engineers as they navigate through a complex decision space.
Realising the potential of Health Data Science: opportunities and challenges ... (Paolo Missier)
This document summarizes a presentation on opportunities and challenges for applying health data science and AI in healthcare. It discusses the potential of predictive, preventative, personalized and participatory (P4) approaches using large health datasets. However, it notes major challenges including data sparsity, imbalance, inconsistency and high costs. Case studies on liver disease and COVID datasets demonstrate issues requiring data engineering. Ensuring explanations and human oversight are also key to adopting AI in clinical practice. Overall, the document outlines a complex landscape and the need for better data science methods to realize the promise of data-driven healthcare.
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science) (Paolo Missier)
This document describes DP4DS, a tool to collect fine-grained provenance from data processing pipelines. Specifically, it can collect provenance from dataframe-based Python scripts. It demonstrates scalable provenance generation, storage, and querying. Current work includes improving provenance compression techniques and demonstrating the tool's generality for standard relational operators. Open questions remain around how useful fine-grained provenance is for explaining findings from real data science pipelines.
A Data-centric perspective on Data-driven healthcare: a short overview (Paolo Missier)
a brief intro on the data challenges associated with working with Health Care data, with a few examples, both from literature and our own, of traditional approaches (Latent Class Analysis, Topic Modelling) and a perspective on Language-based modelling for Electronic Health Records (EHR).
probably more references than actual content in here!
Capturing and querying fine-grained provenance of preprocessing pipelines in data science (Paolo Missier)
This document describes a method for capturing and querying fine-grained provenance from data science preprocessing pipelines. It captures provenance at the dataframe level by comparing inputs and outputs to identify transformations. Templates are used to represent common transformations like joins and appends. The approach was evaluated on benchmark datasets and pipelines, showing overhead from provenance capture is low and queries are fast even for large datasets. Scalability was demonstrated on datasets up to 1TB in size. A tool called DPDS was also developed to assist with data science provenance.
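As a hedged sketch of the compare-inputs-and-outputs idea (illustrative only; the actual tool emits full provenance documents from the observed dataframe operations): diff the row indices and columns of a dataframe before and after a pipeline step to classify what the step did:

```python
import pandas as pd

def diff_step(before: pd.DataFrame, after: pd.DataFrame) -> dict:
    """Classify a preprocessing step by comparing input/output frames:
    which rows were dropped, which columns appeared or disappeared."""
    return {
        "dropped_rows": sorted(set(before.index) - set(after.index)),
        "new_columns": sorted(set(after.columns) - set(before.columns)),
        "dropped_columns": sorted(set(before.columns) - set(after.columns)),
    }

df = pd.DataFrame({"age": [34, None, 51], "city": ["York", "Leeds", None]})
step = df.dropna()                     # the pipeline step being observed
print(diff_step(df, step))
# {'dropped_rows': [1, 2], 'new_columns': [], 'dropped_columns': []}
```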
Tracking trajectories of multiple long-term conditions using dynamic patient-cluster associations (Paolo Missier)
The document proposes tracking trajectories of multiple long-term conditions using dynamic patient-cluster associations. It uses topic modeling to identify disease clusters from patient timelines and quantifies how patients associate with clusters over time. Preliminary results on 143,000 patients from UK Biobank show varying stability of patient associations with clusters. Further work aims to better define stability and identify causes of instability.
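A small sketch of the kind of analysis described, on invented toy data (the study's actual pipeline over UK Biobank timelines is considerably richer): fit one topic model over patient-by-condition counts from two time windows, then measure how much each patient's cluster mixture drifts between them:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
# Toy patient x condition count matrices for two time windows.
window1 = rng.poisson(1.0, size=(100, 12))
window2 = rng.poisson(1.0, size=(100, 12))

lda = LatentDirichletAllocation(n_components=3, random_state=0)
lda.fit(np.vstack([window1, window2]))   # shared clusters across windows

theta1 = lda.transform(window1)          # patient-cluster associations, t1
theta2 = lda.transform(window2)          # patient-cluster associations, t2

# Instability proxy: how much each patient's cluster mixture drifts.
drift = np.abs(theta1 - theta2).sum(axis=1)
print("mean drift:", drift.mean())
```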
Digital biomarkers for preventive personalised healthcare (Paolo Missier)
A talk given to the Alan Turing Institute, UK, Oct 2021, reporting on the preliminary results and ongoing research in our lab, on self-monitoring using accelerometers for healthcare applications
The document discusses data provenance for data science applications. It proposes automatically generating and storing metadata that describes how data flows through a machine learning pipeline. This provenance information could help address questions about model predictions, data processing decisions, and regulatory requirements for high-risk AI systems. Capturing provenance at a fine-grained level incurs overhead but enables detailed queries. The approach was evaluated on performance and scalability. Provenance may help with transparency, explainability and oversight as required by new regulations.
Capturing and querying fine-grained provenance of preprocessing pipelines in data science (Paolo Missier)
A talk given at the VLDB 2021 conference, August 2021, presenting our paper:
Capturing and Querying Fine-grained Provenance of Preprocessing Pipelines in Data Science. Chapman, A., Missier, P., Simonelli, G., & Torlone, R. PVLDB, 14(4):507–520, January 2021.
http://doi.org/10.14778/3436905.3436911
Quo vadis, provenancer? Cui prodest? Our own trajectory: provenance of data... (Paolo Missier)
The document discusses provenance in the context of data science and artificial intelligence. It provides bibliometric data on publications related to data/workflow provenance from 2000 to the present. Recent trends include increased focus on applications in computing and engineering fields. Blockchain is discussed as a method for capturing fine-grained provenance. The document also outlines challenges around explainability, transparency and accountability for high-risk AI systems according to new EU regulations, and argues that provenance techniques may help address these challenges by providing traceability of system functioning and operation monitoring.
Analytics of analytics pipelines: from optimising re-execution to general Dat... (Paolo Missier)
This document discusses using data provenance to optimize re-execution of analytics pipelines and enable transparency in data science workflows. It proposes a framework called ReComp that selectively recomputes parts of expensive analytics workflows when inputs change based on provenance data. It also discusses applying provenance techniques to collect fine-grained data on data preparation steps in machine learning pipelines to help explain model decisions and data transformations. Early results suggest provenance can be collected with reasonable overhead and enables useful queries about pipeline execution.
ReComp: optimising the re-execution of analytics pipelines in response to cha... (Paolo Missier)
Paolo Missier presented on optimizing the re-execution of analytics pipelines in response to changes in input data. The talk discussed using provenance to selectively re-run parts of workflows impacted by changes. ProvONE combines process structure and runtime provenance to enable granular re-execution. The ReComp framework detects and quantifies data changes, estimates impact, and selectively re-executes relevant sub-processes to optimize re-running workflows in response to evolving data.
ReComp, the complete story: an invited talk at Cardiff University (Paolo Missier)
The document describes the ReComp framework for efficiently recomputing analytics processes when changes occur. ReComp uses provenance data from past executions to estimate the impact of changes and selectively re-execute only affected parts of processes. It identifies changes, computes data differences, and estimates impacts on past outputs to determine the minimum re-executions needed. For genomic analysis workflows, ReComp reduced re-executions from 495 to 71 by caching intermediate data and re-running only impacted fragments. The framework is customizable via difference and impact functions tailored to specific applications and data types.
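A minimal sketch of the control loop described above (all names are hypothetical; ReComp's real difference and impact functions are type-specific and provenance-driven): re-run a step only when the estimated impact of its input's change exceeds a threshold:

```python
def recomp(steps, old_inputs, new_inputs, impact, threshold=0.1):
    """Selective re-execution: for each step, estimate the impact of the
    change in its input and re-run only if it exceeds `threshold`.
    `steps` maps a step name to (function, input key)."""
    outputs, rerun = {}, []
    for name, (fn, key) in steps.items():
        change = impact(old_inputs[key], new_inputs[key])  # user-supplied diff
        if change > threshold:
            outputs[name] = fn(new_inputs[key])
            rerun.append(name)
        # else: keep the cached output from the previous execution
    return outputs, rerun

# Toy example: numeric inputs, impact = relative change.
impact = lambda old, new: abs(new - old) / max(abs(old), 1e-9)
steps = {"annotate": (lambda x: x * 2, "db_version"),
         "score":    (lambda x: x + 1, "threshold")}
old = {"db_version": 100, "threshold": 0.50}
new = {"db_version": 130, "threshold": 0.51}
print(recomp(steps, old, new, impact))
# ({'annotate': 260}, ['annotate']) -- only the impacted step re-runs
```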
Monitoring and Managing Anomaly Detection on OpenShift.pdf (Tosin Akinosho)
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system; a minimal instrumentation sketch follows this list.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
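As a generic companion to topic 8, instrumenting a detector with the Prometheus Python client might look like the sketch below (this is not the tutorial's code; the metric names, scoring function, and threshold are invented):

```python
from prometheus_client import Counter, Gauge, start_http_server
import random, time

# Invented metric names for the anomaly detection example.
ANOMALIES = Counter("anomalies_total", "Anomalies flagged by the detector")
LAST_SCORE = Gauge("last_anomaly_score", "Most recent anomaly score")

def score_reading(value):
    return abs(value - 0.5)              # placeholder scoring function

if __name__ == "__main__":
    start_http_server(8000)              # exposes /metrics for Prometheus
    while True:
        score = score_reading(random.random())
        LAST_SCORE.set(score)
        if score > 0.45:                 # arbitrary alert threshold
            ANOMALIES.inc()
        time.sleep(1)
```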
Best 20 SEO Techniques To Improve Website Visibility In SERP (Pixlogix Infotech)
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers (akankshawande)
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...alexjohnson7307
Predictive maintenance is a proactive approach that anticipates equipment failures before they happen. At the forefront of this innovative strategy is Artificial Intelligence (AI), which brings unprecedented precision and efficiency. AI in predictive maintenance is transforming industries by reducing downtime, minimizing costs, and enhancing productivity.
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframePrecisely
Inconsistent user experience and siloed data, high costs, and changing customer expectations – Citizens Bank was experiencing these challenges while it was attempting to deliver a superior digital banking experience for its clients. Its core banking applications run on the mainframe and Citizens was using legacy utilities to get the critical mainframe data to feed customer-facing channels, like call centers, web, and mobile. Ultimately, this led to higher operating costs (MIPS), delayed response times, and longer time to market.
Ever-changing customer expectations demand more modern digital experiences, and the bank needed to find a solution that could provide real-time data to its customer channels with low latency and operating costs. Join this session to learn how Citizens is leveraging Precisely to replicate mainframe data to its customer channels and deliver on their “modern digital bank” experiences.
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...Alex Pruden
Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log, and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol, built on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty is ensuring that extracted witnesses are low norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low norm no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova, while providing post-quantum security.
Paper Link: https://eprint.iacr.org/2024/257
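As a point of reference for the technique named above, here is a toy, single-party simulation of the sumcheck protocol over a prime field, for a multilinear polynomial given by its evaluations on the boolean hypercube. It illustrates only the generic primitive that LatticeFold builds on, not LatticeFold's norm-control protocol itself; the prime and the example values are arbitrary.

```python
# Toy sumcheck: verify a claimed sum of a multilinear polynomial over {0,1}^n.
# For a multilinear f, each round's univariate g is linear, so sending
# g(0) and g(1) suffices. Illustrative only; not the LatticeFold protocol.
import random

P = 2**61 - 1  # an arbitrary prime modulus for the toy

def fold(table, r):
    """Fix the first variable of a multilinear evaluation table to r mod P."""
    half = len(table) // 2
    return [(table[i] * (1 - r) + table[half + i] * r) % P for i in range(half)]

def sumcheck(evals):
    claim = sum(evals) % P                   # prover's claimed total sum
    table = evals
    while len(table) > 1:
        half = len(table) // 2
        g0 = sum(table[:half]) % P           # g(0): first variable set to 0
        g1 = sum(table[half:]) % P           # g(1): first variable set to 1
        assert (g0 + g1) % P == claim        # verifier's round consistency check
        r = random.randrange(P)              # verifier's random challenge
        claim = (g0 * (1 - r) + g1 * r) % P  # g(r), since g is linear
        table = fold(table, r)
    assert table[0] % P == claim             # final check: f(r1, ..., rn)
    return True

print(sumcheck([random.randrange(P) for _ in range(8)]))  # 3-variable example
```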
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state of the art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line chaining. This design (1) offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. On a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
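For intuition about the closed-addressing design described above, here is a schematic, single-threaded Python sketch of a hashtable with bounded per-bucket chains (the cache-line analogy), in which deletes free slots instantly. It is only a structural illustration: DLHT's lock-free operations, software prefetching, and non-blocking parallel resize are deliberately omitted, and the bucket size of 7 slots is an assumption.

```python
# Schematic closed-addressing table with bounded cache-line-style chains.
# Single-threaded toy; omits DLHT's concurrency and resizing machinery.
SLOTS_PER_BUCKET = 7  # roughly the pairs that fit in one cache line (assumed)

class Bucket:
    def __init__(self):
        self.slots = [None] * SLOTS_PER_BUCKET  # (key, value) or None
        self.next = None                        # overflow bucket in the chain

class ChainedTable:
    def __init__(self, n_buckets=1024):
        self.buckets = [Bucket() for _ in range(n_buckets)]

    def _chain(self, key):
        b = self.buckets[hash(key) % len(self.buckets)]
        while b is not None:
            yield b
            b = b.next

    def get(self, key):
        for b in self._chain(key):
            for s in b.slots:
                if s is not None and s[0] == key:
                    return s[1]
        return None

    def put(self, key, value):
        free = None
        for b in self._chain(key):
            for i, s in enumerate(b.slots):
                if s is not None and s[0] == key:
                    b.slots[i] = (key, value)   # update in place
                    return
                if s is None and free is None:
                    free = (b, i)               # remember first free slot
            last = b
        if free is None:                        # chain full: extend it
            last.next = Bucket()
            free = (last.next, 0)
        free[0].slots[free[1]] = (key, value)

    def delete(self, key):
        for b in self._chain(key):
            for i, s in enumerate(b.slots):
                if s is not None and s[0] == key:
                    b.slots[i] = None           # slot is freed instantly
                    return True
        return False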
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
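To make the signal-synthesis step concrete, the sketch below generates a GNSS-style spread-spectrum baseband in Python with NumPy: a data symbol is spread by a pseudorandom code and mixed onto a carrier, which is the kind of waveform an SDR front-end would replay. The spreading code here is random and the plain BPSK modulation is a simplification; real Galileo E1 uses published memory codes and CBOC modulation, which this toy omits.

```python
# Schematic GNSS-style signal synthesis (NOT Galileo's real codes/modulation).
import numpy as np

FS = 4_092_000          # sample rate: 4 samples per chip
CHIP_RATE = 1_023_000   # Galileo E1 primary code chipping rate (chips/s)
CODE_LEN = 4092         # chips per 4 ms primary code period
IF_FREQ = 250_000       # toy intermediate frequency (Hz)

rng = np.random.default_rng(1)
code = rng.choice([-1.0, 1.0], size=CODE_LEN)        # stand-in spreading code
chips = np.repeat(code, FS // CHIP_RATE)             # upsample to sample rate
t = np.arange(chips.size) / FS

data_bit = -1.0                                      # one navigation symbol
baseband = data_bit * chips                          # BPSK spreading
signal = baseband * np.cos(2 * np.pi * IF_FREQ * t)  # mix onto the carrier
# `signal` is the sample stream an SDR transmitter would replay over the air.
```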
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
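A minimal sketch of the retrieval step, assuming a Neo4j knowledge graph: the schema (Gene and Disease nodes linked by ASSOCIATED_WITH), the connection details, and the call_llm() helper are all invented for illustration.

```python
# GraphRAG sketch: retrieve graph facts, then ground the LLM answer in them.
# Schema, credentials, and call_llm() are hypothetical placeholders.
from neo4j import GraphDatabase  # pip install neo4j

QUERY = """
MATCH (g:Gene {symbol: $symbol})-[:ASSOCIATED_WITH]->(d:Disease)
RETURN d.name AS disease LIMIT 10
"""

def call_llm(prompt):
    # Stub: swap in your preferred LLM client here.
    return "(LLM answer grounded in the facts above)"

def retrieve_facts(driver, symbol):
    with driver.session() as session:
        return [record["disease"] for record in session.run(QUERY, symbol=symbol)]

def answer(driver, symbol, question):
    facts = retrieve_facts(driver, symbol)
    prompt = (
        "Answer using ONLY these facts from the knowledge graph:\n"
        + "\n".join(f"- {symbol} is associated with {d}" for d in facts)
        + f"\n\nQuestion: {question}"
    )
    return call_llm(prompt)

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "pass"))
print(answer(driver, "TP53", "Which diseases is this gene linked to?"))
```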
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Tatiana Kojar
Skybuffer AI, built on the robust SAP Business Technology Platform (SAP BTP), is the latest and most advanced version of our AI development, reaffirming our commitment to delivering top-tier AI solutions. Skybuffer AI harnesses all the innovative capabilities of the SAP BTP in the AI domain, from Conversational AI to cutting-edge Generative AI and Retrieval-Augmented Generation (RAG). It also helps SAP customers safeguard their investments into SAP Conversational AI and ensure a seamless, one-click transition to SAP Business AI.
With Skybuffer AI, various AI models can be integrated into a single communication channel such as Microsoft Teams. This integration empowers business users with insights drawn from SAP backend systems, enterprise documents, and the expansive knowledge of Generative AI. Best of all, everything is managed through our intuitive no-code Action Server interface, making advanced AI accessible to users without extensive coding knowledge.
A Comprehensive Guide to DeFi Development Services in 2024Intelisync
DeFi represents a paradigm shift in the financial industry. Instead of relying on traditional, centralized institutions like banks, DeFi leverages blockchain technology to create a decentralized network of financial services. This means that financial transactions can occur directly between parties, without intermediaries, using smart contracts on platforms like Ethereum.
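As a toy illustration of that "no intermediary" idea (written in Python for consistency with the other sketches on this page, rather than an on-chain language such as Solidity): a smart contract is code whose rules release funds automatically once its conditions are met, with no bank or broker holding the money.

```python
# Toy escrow mimicking smart-contract control flow; real DeFi contracts
# run on-chain and are written in Solidity/Vyper. All names are invented.
class EscrowContract:
    def __init__(self, buyer, seller, amount):
        self.buyer, self.seller, self.amount = buyer, seller, amount
        self.funded = False

    def deposit(self, sender, value):
        # Funds are locked by code, not held by an intermediary.
        assert sender == self.buyer and value == self.amount
        self.funded = True

    def confirm_delivery(self, sender):
        assert sender == self.buyer and self.funded
        return (self.seller, self.amount)  # payout released automatically

deal = EscrowContract("alice", "bob", 100)
deal.deposit("alice", 100)
print(deal.confirm_delivery("alice"))      # ('bob', 100)
```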
In 2024, we are witnessing an explosion of new DeFi projects and protocols, each pushing the boundaries of what’s possible in finance.
In summary, DeFi in 2024 is not just a trend; it’s a revolution that democratizes finance, enhances security and transparency, and fosters continuous innovation. As we proceed through this presentation, we'll explore the various components and services of DeFi in detail, shedding light on how they are transforming the financial landscape.
At Intelisync, we specialize in providing comprehensive DeFi development services tailored to meet the unique needs of our clients. From smart contract development to dApp creation and security audits, we ensure that your DeFi project is built with innovation, security, and scalability in mind. Trust Intelisync to guide you through the intricate landscape of decentralized finance and unlock the full potential of blockchain technology.
Ready to take your DeFi project to the next level? Partner with Intelisync for expert DeFi development services today!
Introduction of Cybersecurity with OSS at Code Europe 2024Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
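As one concrete example of the vulnerability-detection step discussed above: the OSV.dev service aggregates published advisories (including CVEs) for RubyGems packages and can be queried over HTTP. The sketch below is in Python only for consistency with the other examples on this page; the same query works from any HTTP client, and the gem name and version are placeholders.

```python
# Query OSV.dev for known vulnerabilities affecting a RubyGems package.
import json
import urllib.request

def osv_vulns(gem_name, version):
    body = json.dumps({
        "version": version,
        "package": {"name": gem_name, "ecosystem": "RubyGems"},
    }).encode("utf-8")
    req = urllib.request.Request(
        "https://api.osv.dev/v1/query",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("vulns", [])

# Placeholder gem/version; prints advisory IDs and summaries, if any.
for v in osv_vulns("rails", "6.0.0"):
    print(v["id"], v.get("summary", ""))
```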
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, SAP's free software asset management tool for customers.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Invited talk at the GeoClouds Workshop, Indianapolis, 2009
1. Scientific Workflow Management System
Taverna, BioCatalogue, and myExperiment: a three-legged foundation for effective collaboration in e-science
A collaborative talk by Paolo Missier
Information Management Group
School of Computer Science, University of Manchester, UK
with additional material kindly shared by:
Prof. Dave DeRoure and David Newman, University of Southampton
Prof. Carole Goble and the e-Labs design group, University of Manchester
2. What is the myGrid Project?
- UK e-Science pilot project since 2001, centred at Manchester, Southampton and the EMBL-EBI
- Part of the Open Middleware Infrastructure Institute UK, http://www.omii.ac.uk
- Mixture of developers, bioinformaticians and researchers
- An alliance of contributing projects and partners
- Open source development and content (LGPL or BSD)
- Infrastructure: we don't own any resources (apart from catalogues), or a Grid
3. Taverna
- Graphical workbench for professionals
- Plug-in architecture
- Nested workflows
- Drag-and-drop wiring together
- Rapidly incorporate new services without coding; not restricted to predetermined services
- Access to local and remote resources and analysis tools
- 3500+ service operations available at start-up
4. What do Scientists use Taverna for?
- Application areas: systems biology model building, proteomics, sequence analysis, protein structure prediction, gene/protein annotation, microarray data analysis, QTL studies, QSAR studies, medical image analysis, public health care epidemiology, heart model simulations, high-throughput screening, phenotypical studies, phylogeny, statistical analysis, text mining, astronomy, music and meteorology
- Users and projects: Netherlands Bioinformatics Centre, Genome Canada Bioinformatics Platform, BioMOBY, the US FLOSS social science program, RENCI, the SysMO Consortium, the French SIGENAE farm animals project, ThaiGrid, the CARMEN neuroscience project, the SPINE consortium, the EU ENFIN, EMBRACE, BioSapiens and CASIMIR projects, the NERC Centre for Ecology and Hydrology, the Bergen Centre for Computational Biology, the Max Planck Institute for Plant Breeding Research, the Genoa Cancer Research Centre, AstroGrid, and some 30 US academic and research institutions
5. Who else is in this space?
- Trident, Triana, Kepler, Ptolemy II, Taverna, BioExtract, BPEL
6. www.myexperiment.org
- Socially share, discover and reuse workflows and other methods. A cooperative bazaar.
- As of Sunday 10th May: 1748 registered users, 143 groups, 669 workflows, 197 files, 52 packs
- 56 different countries; top 4: UK, US, The Netherlands, Germany
9. Why data provenance matters, if done right
• To establish quality, relevance, trust
• To track information attribution through complex transformations
• To describe one’s experiment to others, for understanding / reuse
• To provide evidence in support of scientific claims
• To enable post hoc process analysis for improvement, re-design
The W3C Incubator on Provenance has been collecting numerous use cases:
http://www.w3.org/2005/Incubator/prov/wiki/Use_Cases#
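The incubator effort cited on this slide eventually fed into the W3C PROV family of standards. As an anachronistic but concrete illustration of the use cases listed above, the sketch below uses the prov Python package (an assumption; it postdates this 2009 deck) to record how a result was derived, by which process, and to whom it is attributed.

```python
# Minimal provenance record with the `prov` package (pip install prov):
# lineage, generating process, and attribution for a derived result.
from prov.model import ProvDocument

doc = ProvDocument()
doc.add_namespace("ex", "http://example.org/")

doc.entity("ex:raw-dataset")
doc.entity("ex:result-chart")
doc.agent("ex:alice")
doc.activity("ex:workflow-run")

doc.used("ex:workflow-run", "ex:raw-dataset")             # input
doc.wasGeneratedBy("ex:result-chart", "ex:workflow-run")  # output
doc.wasDerivedFrom("ex:result-chart", "ex:raw-dataset")   # lineage
doc.wasAttributedTo("ex:result-chart", "ex:alice")        # credit / trust

print(doc.serialize(indent=2))  # PROV-JSON for exchange or audit
```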
10. Goals, expected contributions
• Established technology provider - open-source
– traditionally active in the bioinf space
– but also involved in the e-Lico EU project (data mining portal)
– large community base, established production environment
• Main goal:
– to offer our workflow and workflow repository technology, and put it to the test on the challenges of data preservation pipelines
• Challenges:
– expect new requirements on our current technology
• robust, high-volume data pipelines
• workflow provenance -- process evolution
• data provenance