The Taverna Workflow Management Software Suite - Past, Present, Future
1. The Taverna Workflow
Management Software Suite:
Past, Present, Future.
Prof Carole Goble CBE FREng FBCS CITP
The University of Manchester, UK
Software Sustainability Institute UK
carole.goble@manchester.ac.uk
http://www.taverna.org.uk
http://www.mygrid.org.uk
2. More of what we generally do!
3. e-Science,
Computational Science, Scientific Computing
• Support global scientific collaboration,
enable large scale resource, tools and
results sharing, assist scientific
processing, avoid unnecessary
repeated work.
• Accelerate scientific discovery,
improving scientific productivity,
stimulate technological innovation.
• Cope with scales and speed of
scientific innovation and data.
4. Data-centric Computation
Scientific workflows over Distributed
Cyber-Infrastructure.
Data sharing Social Methods
libraries and catalogues for all types of
scientific artefacts and all types of
scientists.
Knowledge Management
Metadata, semantics digital exchange,
preservation, publishing
Software Engineering
Software sustainability, software and
data policy, training
Products Methods
Systems Biology
Chemistry
Astro-Physics
Astronomy
Biology
Social Science
Library
Digital
Preservation
Biodiversity
Public Health
Applications
6. Long Tail Little science
Self-organising groups
Disconnected, independent, distributed scientists
Disconnected, independent, distributed resources
Open in the wild.
Organised science
Organised groups
Clubs of scientists
Organised, planned and in-house resources
Closed and well behaved services.
7. VPH-Share
Models of Human
Physiology
Eagle Genomics
Next Generation
Sequencing
based Patient
Diagnostics
Astronomy &
HelioPhysics
Document
Preservation
Digitisation
Systems Biology
OpenTox Project
Chemistry
Development Kit
Drug Toxicity Ecological
Niche
Modelling
Population
Modelling
Meta-
genomics
Phylo-
genetics
• Data cleaning
• Data movement
• Data retrieval and
annotation
• Data analysis
• Data mining
• knowledge
management
• Data curation and data
warehouse population
• Data visualisation
• Parameter sweeps over
simulations
Drug discovery,
small molecules,
targets,
compounds
OpenPHACTS
8. BioSTIF
Inputs:
data, parameters,
configurations
Outputs
Workflow in a nutshell
• Orchestrate series of
automated / interactive
steps
– Process pipelines
– Analytic and synthesis
procedures
– Repetitive code-run
sweeps
• Housekeeping tasks
– Process data at scale
– Auto documentation
• Mix in house & public
resources, native hosting
– Chain and choreograph
components
– Handle interoperability
– Bridge resources
– Shield operational
complexity and change
Services & Resources
Infrastructures
9. Taverna Workflow Management
http://www.taverna.org.uk
• Dataflow
– Computational Lambda Calculus with a monad extension*
– Simple control flows, iterations over collections
– Data type agnostic, domain independent
– Data movement, monitoring, staging, reference
– Custom (VO Tables), XML, JSON
• Mixed steps
– Services, codes & command line tools
– SOAP + REST Web Services
– Scripts: R, “In Workflow Programming” Beanshell scripting …
– Codes: Java, libraries, HPC, Grid and ~Cloud platforms etc …
– Nested workflows
– Interactions and Batch
*Turi et al Taverna Workflows: Syntax and Semantics e-Science 2007: 441-448;
Sroka et al A formal semantics for the Taverna 2 workflow model J. Comput. Syst. Sci. 76(6): 490-508 (2010)
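A minimal sketch of the "iterations over collections" idea: a step written for single items is applied automatically when it receives lists, combining inputs either as a cross product or as a dot product. This is an illustration only, not Taverna's engine. The name run_step and the flattening of results into one list are assumptions made for brevity.

```python
from itertools import product


def run_step(fn, *port_values, strategy="cross"):
    """Apply fn, which expects single items, to port values that may be lists.

    Hypothetical helper: results are returned as a flat list, which is a
    simplification compared with a real dataflow engine.
    """
    is_list = [isinstance(value, list) for value in port_values]
    if not any(is_list):
        # No collections involved: call the step once.
        return fn(*port_values)
    if strategy == "cross":
        # Every combination of items across the inputs.
        wrapped = [v if flag else [v] for v, flag in zip(port_values, is_list)]
        return [fn(*combo) for combo in product(*wrapped)]
    if strategy == "dot":
        # Pair items position by position; single items are repeated.
        lengths = {len(v) for v, flag in zip(port_values, is_list) if flag}
        if len(lengths) != 1:
            raise ValueError("dot product needs lists of equal length")
        size = lengths.pop()
        wrapped = [v if flag else [v] * size for v, flag in zip(port_values, is_list)]
        return [fn(*combo) for combo in zip(*wrapped)]
    raise ValueError(f"unknown strategy: {strategy}")


def label(gene, species):
    return f"{gene}@{species}"


print(run_step(label, ["g1", "g2"], ["cow", "mouse"]))
print(run_step(label, ["g1", "g2"], ["cow", "mouse"], strategy="dot"))
```

The step author writes label once for single values; the choice of strategy decides how collections are paired up.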
10. • Computational Lambda Calculus
• Visual Programming
• Process mining
• Adaptive & parallel computing
• Cloud computing
• SOA, Semantic Web Services
• Data integration, data quality
• Semantic representation and linked data
• Reporting & tracking, credit propagation
• Workflow reusability, quality, discovery
• Security, monitoring, fault detection
• AI planning, re-run analysis, auto-planning,
auto-repair, auto-composition, auto-
annotation, service discovery, service matching,
auto-substitution
E.Science laboris
Tools
Standards
Services
11. Weeks -> Hours
Surprise predicted result tested in
lab. DAXX Gene
Genetic differences between breeds
Noyes, PNAS 2011 108(22) 9304-9309
BioDiversity Invasive
Species Modelling
American Horseshoe Crabs
in the Baltic
Trypanosomiasis
resistance in African
Cattle
Software as a Service /
(Cloud) Appliance
Analytic bottleneck
Repetitive, unbiased,
accurate record,
taming data,
transparency, avoiding
shortcuts.
Interactive steps
Dev. Years->Weeks
Runs. Weeks -> Hours
Generalised ENM data
mapping and overlaying
pipelines.
Workflow-based Computation
12.
#SummerSchool 24-Jun-13
VPH-Share @neurist Aneurysm Morphology Workflow
Patient Pseudoidentifier (PID)
Demographics
Height
Weight
Vital Signs
Heart Rate
Blood Pressure
Flow Rate
Transient Pressure
Aneurysm Properties
Tissue Properties
Wall Thickness
Risk Factors
Medical Images
Medications
Systemic Factors
Gene Expression Profile
Aneurysm Rupture Profile
Morphology Profile
Haemodynamic Profile
Mechanobiological Profile
Prediction Uncertainty
Patients
Patient Avatar
Disease Simulation Workflow
Patient Avatar updated
RISK
[Susheel Varma] http://www.vph-share.eu/
13. • Morphological, hemodynamic and structural analyses have been linked to
aneurysm genesis, growth and rupture.
• Evidence indicating differences in morphology and flow between ruptured
and unruptured aneurysms has been shown for reduced patient cohorts.
• Structural wall mechanics has been used to justify the growth and
remodelling happening at the aneurysm level.
Confidence in
physical measures
+
images
+ BC,
material
+ BC,
material
Morphological
analysis
Direct
diagnostic power
+
Morphological
descriptors
Structural descriptors
Hemodynamic
descriptors
Haemodynamic
analysis
Structural analysis
Practically,
morphological
characterizations might
currently have the
highest predictive
capabilities with respect
to the other analyses.
Morphological Workflow
[Susheel Varma]
14. Medical image
from imaging equipment
@neurIST
morphological descriptors
Complex indices (Zernike moment invariants)
Basic size indices describing aneurysm sac
depth
neck
Morphological Analysis Workflow
[Susheel Varma]
16. Biodiversity
marine monitoring and health assessment
ecological niche modelling
Data Intensive Science
Collaborative Science
Pilumnus hirtellus
Enclosed sea problem
(Ready et al., 2010)
Sarah Bourlat
18. Ecological Niche
Modeling
.
Step 1: Explorative modeling
-Use unfiltered data
-Use fixed parameters: Mahalanobis distance
-Native projections
-Test the model, distribution of points, number of points
Step 2: Deep modeling
-Filtering environmentally unique points with BioClim algorithm
-ENM with Support Vector Machine and Maximum Entropy
-Parameter optimization (if necessary) on the model test results
-2 masks (model generate, model project)
Analytical cycle
Data discovery
Data assembly, cleaning, and refinement
Ecological Niche Modeling
Statistical analysis
Pilumnus hirtellus
Enclosed sea problem
(Ready et al., 2010)
The workflows work over large geographical,
taxonomic, and environmental scales, incl.
terrestrial ecosystems
Baltic species invasions of various crabs/sea
creatures
Interactions of different forest insects and trees
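A rough sketch of the distance measure named in Step 1: a candidate site is scored by its Mahalanobis distance from the centroid of the presence records in environmental space, so sites that follow the observed correlation between variables score as closer. Two variables only, standard library only. The variable names and the sample values are invented for illustration and do not come from the workflows.

```python
from math import sqrt
from statistics import mean


def fit_envelope(presence_points):
    """Centroid and sample covariance (sxx, sxy, syy) of 2-D presence records."""
    n = len(presence_points)
    mx = mean(p[0] for p in presence_points)
    my = mean(p[1] for p in presence_points)
    sxx = sum((p[0] - mx) ** 2 for p in presence_points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in presence_points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in presence_points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_2d(site, centre, cov):
    """Mahalanobis distance of a site from the centre, given a 2x2 covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is not invertible")
    dx, dy = site[0] - centre[0], site[1] - centre[1]
    # Quadratic form d^T S^-1 d, using the closed-form inverse of a 2x2 matrix.
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return sqrt(max(q, 0.0))


# Invented records: (temperature, salinity) at sites where the species was seen.
presence = [(10.0, 7.0), (12.0, 8.0), (11.0, 6.0), (13.0, 9.0)]
centre, cov = fit_envelope(presence)
for site in [(12.5, 8.5), (12.5, 6.5)]:
    print(site, round(mahalanobis_2d(site, centre, cov), 3))
```

Both example sites are equally far from the centroid in plain Euclidean terms; the one that breaks the observed temperature and salinity correlation gets the larger distance.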
19. Ecological Niche
Modeling
BioSTIF
25. Taverna:
a Knowledge Discovery Framework
•Asthma sputum inflammatory phenotypes, a transcriptome analysis. Saeedeh
Maleki-Dizaji, Chris Newby, Rachid Berair, Rod Smallwood, Chris Brightling, 2014
(to be submitted)
•A systematic approach to a transcriptome analysis to asthma sputum inflammatory
phenotypes. ISMB 2014.
•The Battle of the Sexes starts in the oviduct: modulation of oviductal transcriptome
by X and Y-bearing spermatozoa. Almiñana C, Caballero I, Heath PR, Maleki-Dizaji
S, Parrilla I, Cuello C, Gil MA, Vazquez JL, Vazquez JM, Roca J, Martinez EA, Holt
WV and Fazeli A. Submitted to BMC Genomics 2014 (In Press)
•transcription regulation network involving E2F6, IRF7 and STAT1, Thomas R.J.
Lovewella ,Andrew J.G. McDonaghb, Andrew G Messengerb, Saeedeh Maleki-
Dizaji, Mimoun Azzouzd and Rachid Tazi-Ahniniaformation submitted to PNAS,
2014
•Kiran, M., Bicak, M., Maleki-Dizaji, S., Holcombe, M. FLAME: A Platform for High
Performance Computing of Complex Systems. Journal of Acta Physica Polonica
2011.
•Maleki-Dizaji S, Holcombe M, Rolfe MD, Fisher P, Green J, Poole RK, Graham AI,
A Systematic Approach to Understanding Escherichia coli Responses to
Oxygen: From Microarray Raw Data to Pathways and Published Abstracts,
Online J Bioinformatics, (1):51-59, 2009
[Saeedeh Maleki-Dizaji]
27. Taverna Workflow Management
Open extensibility
• Plug-in framework
– Command line tool
– Data Services: VOTables for AstroTaverna
– Optimisations: E.g. Holl. model parameter sweeps
– Infrastructures: Grid, HPC, Web Services
– Domains: CDK, BioMart, VOTable
– Commodities: Excel Spreadsheets, Open Refine, R
• Plug into other frameworks & platforms
– Portals: Scratchpads
– Interactive platforms: iPython Notebook
– Wfms: KNIME Node, Galaxy tool, Kepler Actor
• Third party applications
– Taverna Online
– XworX
– OGC chainer
28. Taverna Online: 3rd party app
Dr Vadim Surpin and Vitaly Sharanutsa, Institute for Information Transmission
Problems of Russian Academy of Sciences (IITP RAS)
An online, in-browser application for assembling and running Taverna
Workflows over a HPC platform http://onlinehpc.com/site/main
29. Interoperability: Data format/identity mismatches
Service interface handling
Components: Well described, behaved,
curated, annotated modularised
workflow modules
• Semantic annotations, prescribed
failover, formats, provenance
• Organised into common families
30. Taverna Directions
Access
Framework to access and leverage heterogeneous
legacy applications, services, datasets and codes.
Shielding from complexity.
Customise
Rapid development: Flexibility, Extensibility,
Adaptability, Reuse. Reusable Workflow
Components
Process
Automated plumbing + Interaction
Systematic, repetitive and unbiased analysis and
processing and error handling
Ensembles, comparisons, “what ifs”
Access
Cloud and Scale, Registries
Standards data formats, programmatic interfaces.
Adapting to change. Security.
Governance of components
Process
Seamless, pluggable wf as a service.
Scale. Adaptability. Specific-Generic tension.
Easier development, user experience
Workflow commodities, Research Objects
Design practices for reuse. Credit
Executable interactive notebooks. Provenance
A tool for reproducibility
Report
Embed
Workflows in common applications
Integration into reporting & publishing
Underpin integrative platforms.
Service based science and science as a service
31. Fix on demand.
Notify as needed.
Monitor for decay
Workflow/Service Monitors
3rd Party Monitors
Workflow analytics
Detect and Repair
QUASAR toolkit
[Zhao et al. Why workflows break e-Science 2012]
32. The Execution Provenance Gap
Data tracking
Summarisation,
Labelling,
Distillations,
Selective tracking
Filtering
Big
Fine grain
1 White box
One System
Special tools
Collection
A Big Graph
What do I cite?
What did I do?
N Black boxes
Many Systems
My Lab Book
Analytics
Smart in situ Presentation
Why am I citing?
Pinar Alper, Khalid Belhajjame, Carole A. Goble, Pinar Karagoz: Enhancing and abstracting scientific workflow provenance for data
publishing. EDBT/ICDT Workshops 2013: 313-318
Sarah Cohen Boulakia, Jiuqiang Chen, Paolo Missier, Carole A. Goble, Alan R. Williams, Christine Froidevaux: Distilling structure in
Taverna scientific workflows: a refactoring approach. BMC Bioinformatics 15(S-1): S12 (2014)
http://provenanceweek.dlr.de
34. Research Objects
• Bundles and relates multi-hosted digital resources of a scientific
experiment or investigation using standard mechanisms
• Descriptive reproducibility
• Exchange, Releasing paradigm for publishing
http://www.researchobject.org/
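A small sketch of the bundling idea: locally held files go into a zip archive together with a JSON manifest that also lists resources hosted elsewhere. The manifest layout, the file name manifest.json and the function name build_bundle are assumptions made for illustration; they are not the Research Object specification.

```python
import io
import json
import zipfile


def build_bundle(local_files, external_uris):
    """Return the bytes of a zip archive holding files plus a JSON manifest.

    local_files: dict mapping archive path -> bytes content.
    external_uris: list of URIs for resources that stay where they are hosted.
    The manifest layout here is hypothetical.
    """
    aggregates = [{"uri": path, "hosted": "bundle"} for path in sorted(local_files)]
    aggregates += [{"uri": uri, "hosted": "external"} for uri in external_uris]
    manifest = {"aggregates": aggregates}

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
        for path in sorted(local_files):
            archive.writestr(path, local_files[path])
    return buffer.getvalue()


bundle = build_bundle(
    {"workflow.t2flow": b"<workflow/>", "outputs/result.txt": b"42"},
    ["http://example.org/dataset/1"],  # placeholder URI
)
with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
    print(archive.namelist())
```

Keeping external resources as references in the manifest, not as copies, is what lets one bundle relate material hosted in several places.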
39. Virtual Liver Network
BMBF “Großprojekt“
• ~45 organisations, ~70 groups
• multiscale rep. of the liver
• clinical impact
• general public portal
Same key requirements:
yellow pages, exchange of all
sops/data/models, sharing
rights
Different biology
• Multiscale data
• Multiscale models
• Imaging
Different project structure
• Hierarchies (A, A1, A1.2)
• Regional groups of groups
Flexibility, extensibility and open
source nature of SEEK are key
40. simulate models
project mgt,
access control
reporting, citation
governance &
policies
yellow pages
of peers
projects,
experts
catalogue and link
data, models, samples,
specimens, sops,
experiments,
publications using
standards
curate &
annotate data
and models using
standards
access, link to and
deposit in public
data and model
repositories
manage, store and
exchange different
types and scales of
data
integrate local and
project tools and
data systems
scaled-out
collection &
processing
41. experimentalists,
modellers, X-
informaticians,
computational Xs,
software engineers,
computer scientists,
systems
administrators,
resource providers,
tool builders
social scientists,
librarians, curators
Social Computation
Storing, Sharing and Reusing data, methods, models,
between collaborating and competing scientists
e-Laboratories, collaboratories, VREs, repositories
An ego-system
43. Knowledge Computation
•Accurate, intelligible and comparable descriptions
•Data interoperability
•Machine readable metadata
Semantic technologies, Ontologies,
Linked Data, Data schema
44. Semantic Description
Describing and linking data in terms of
shared concepts, relationships and identifiers
Ontology (legend: object property, data property, subClassOf)
Classes: Person, Organization, Place, State, City, Event
Properties: name, birthdate, bornIn, worksFor, state, phone, livesIn, ceo, location, organizer, nearby, startDate, endDate, title, isPartOf, postalCode
Data
Column 1        | Column 2 | Column 3  | Column 4     | Column 5
Bill Gates      | Oct 1955 | Microsoft | Seattle      | WA
Mark Zuckerberg | May 1984 | Facebook  | White Plains | NY
Larry Page      | Mar 1973 | Google    | East Lansing | MI
[Taheriyan et al
adapted]
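A short sketch of what a semantic description of the table above could produce: each unnamed column is assigned a class and a property from the ontology, links between classes are added, and every row becomes a set of subject, predicate, object triples. The column assignments, the links and the identifier scheme are assumptions for illustration; they are one plausible reading of the slide, not a stated result.

```python
ROWS = [
    ("Bill Gates", "Oct 1955", "Microsoft", "Seattle", "WA"),
    ("Mark Zuckerberg", "May 1984", "Facebook", "White Plains", "NY"),
    ("Larry Page", "Mar 1973", "Google", "East Lansing", "MI"),
]

# Assumed meaning of Column 1..5: (class, data property).
COLUMN_MAPPING = [
    ("Person", "name"),
    ("Person", "birthdate"),
    ("Organization", "name"),
    ("City", "name"),
    ("State", "name"),
]

# Assumed object properties linking the classes in one row.
LINKS = [
    ("Person", "worksFor", "Organization"),
    ("Person", "bornIn", "City"),
    ("City", "state", "State"),
]


def row_to_triples(index, row):
    """Turn one table row into (subject, predicate, object) triples."""
    # One node per class per row; the identifier scheme is hypothetical.
    nodes = {cls: f"{cls.lower()}/{index}" for cls, _ in COLUMN_MAPPING}
    triples = [(node, "rdf:type", cls) for cls, node in nodes.items()]
    for (cls, prop), value in zip(COLUMN_MAPPING, row):
        triples.append((nodes[cls], prop, value))
    for subject_cls, prop, object_cls in LINKS:
        triples.append((nodes[subject_cls], prop, nodes[object_cls]))
    return triples


for triple in row_to_triples(0, ROWS[0]):
    print(triple)
```

Once the columns carry shared concepts and identifiers, rows from different tables that use the same ontology can be linked without knowing the original column names.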
54. Zeeya Merali , Nature 467, 775-777 (2010) | doi:10.1038/467775a
Computational science: ...Error…why scientific programming does not compute.
55. Training
• Training infrastructure
• Scalable training approaches
• Review needs
• Coordinate activities and materials
• Liaise with Nodes and Hub
56. Data-centric Computation
Scientific workflows over Distributed
Cyber-Infrastructure.
Data sharing Social Methods
libraries and catalogues for all types of
scientific artefacts and all types of
scientists.
Knowledge Management
Metadata, semantics digital exchange,
preservation, publishing
Software Engineering
Software sustainability, software and
data policy, training
Products Methods
Systems Biology
Chemistry
Astro-Physics
Astronomy
Biology
Social Science
Library
Digital
Preservation
Biodiversity
Public Health
Applications
Bioinformaticians in the wild
No predetermined VOs
Exploratory investigations
Services in the wild
Natively and distributedly hosted
Data and Platform agnostic
Production level engine to handle cross cutting concerns and large data collections
Customisation opportunities
Experiment with Semantic Technologies
Domain independence
Restrictive vs open worlds
OPEN STUFF
Independent life science informaticians in the field
Expert bioinformaticians but not programmers
An open community
Open applications
Independent third party world-wide service providers, local and remote over the web
In house applications, tools and datasets
Open (and closed) worlds.
Open Source
Managed worlds
Wild worlds
Underpin integrative platforms.
Powering service based science and science as a service
A tool for reproducibility
Coordinate execution of services and codes.
Dataflow at scale
Reusable variants
Comparable repetitions
Import own data / codes + public libraries/datasets
Honour hosted codes
Shield operational complexity
Auto-document provenance
Package up dependencies
aimed at different layers of the software stack
“The Many Faces of IT as Service”, Foster, Tuecke, 2005
“Provisioning” – reservation to configuration to … … make sure resource will do what I want it to do, with the right qualities of service
Virtualization = separation of concerns between provider & consumer of “content”
Client and service
Service provider and resource provider
Provisioning = assemble & configure resources to meet user needs
Management = sustain desired qualities of service despite dynamic environment
It’s a framework!
Provenance collection…
W3C PROV+, OPM formats
OAuth security plug-in
Java, Grid services, R scripts, libraries (BioConductor, libSBML…)
Taverna 2.5 has just been released; since 17 April it has had 642 Workbench and 500 CLT downloads
100,000+ downloads over its lifetime.
Audit last year to track startups: just under 1000 unique starts in one month
Interaction: Visual programming, workflow reusability, workflow quality, workflow discovery
Service oriented computing, cloud computing, grid computing, optimisation, parallelism, adaptation, security, monitoring and fault correction
AI & Semantics: re-run analysis, auto-planning, auto-repair, auto-composition, auto-annotation, service discovery, service matching, auto-substitution
Data integration, data mapping, service integration, provenance tracking, credit propagation, data spaces, data quality
Understanding genetic differences between breeds of cattle
Ecological niche modeling of Baltic invasives
Collection, Preparation & Production Pipelines
Exploratory analytics
Simulation codes
Text mining
Auto recommendations
Visual analytics
Morphological, hemodynamic and structural analyses have been linked to aneurysm genesis, growth and rupture.
Evidence indicating differences in morphology and flow between ruptured and unruptured aneurysms has been shown for reduced patient cohorts.
Structural wall mechanics has been used to justify the growth and remodelling happening at the aneurysm level.
Collecting, processing and management of big data
Metagenomics, genotyping, genome sequencing, phylogenetics, gene expression analysis, proteomics, metabolomics, auto sampling
Analytics and management of broad data from many different disciplines
Coupling analytical metagenomics with meaningful ecological interpretations
Continuous development of novel methods and technologies
Functional trait-based ecology approach proposed by Barberán et al. 2012.
Not all things are batch
VPH-Share opens a VNC connection to a spawned instance.
Taverna Interaction Service
Users interact with a workflow (wherever it is running) in a web browser.
Interaction Service Workbench Plug-in
The BioVeL Ecological Niche Modelling workflow running while embedded into the AntKey Scratchpads site
Custom resources and platforms
Components
Plug-in Framework
Infrastructures: Grid, HPC, Web Services (SOAP, REST)
Domain: CDK, BioMart, VOTable, SADI
Common Tools: Excel Spreadsheets, Open Refine, R
COMPUTING POWER
The service provides two types of computing nodes: Amazon AWS cluster computing instances
Automatic configuration of computer clusters on AWS cloud resources
In-house powerful computing cluster
Hundreds of Intel Xeon (3.00 GHz, 4 cores) nodes available
2 CPUs per node
8 cores per node
2 GB of RAM per core
100 GB of local storage per node
Providers
Contact us to access your own computing facilities with our service
OnlineHPC looks for partnership with supercomputer providers all over the world. Contact us for details.
Large number of re-usable, versioned components
26 ENM components
42 components in myExperiment
A workflow in their own right
Test by running individually
Annotatable for semantic description of profile
Create new workflows remixing any components – like the ENM ones we have made.
Research Objects, Metadata structuring
Annotation by Stealth, Shared Templates
Other communities
Workflows Apps
Workflow commodities
Adaptability, Tiers of infrastructure
Computational Reproducibility is hard in the wild: description / execution
Added after the fact
Shims – beanshell programming in the small
Mapping services for names
Curated service signatures
Data and semantic interoperability in the services, service families and service collections (that is where your types are)
Data agnostic, Semantic layering
Shim services
Workflow flexibility and reusability but makes things untidy
Next steps – Shim libraries and packaged components
Annotation
What do the services DO? And HOW? Expert curation
One size does not fit all: scientists need simplish metadata for decision support; automated validation, configuration, repair needs rich metadata decision making.
Next steps – BioCatalogue social & auto curation through myExperiment
Workflow Run RO Bundle
Folder structure or Zip file with some JSON
Unpack into local file system, ship to myExperiment or notebook
1 constantly running server for workflows that aren’t security sensitive
Multiple commandline tools
For secure workflows, spawn own server and own command line in a bubble
Start up performance issues: start server, start command line, start image, start apps.
VPH-Share plugin exposes in Taverna Online list of tools you can instantiate on their VM
Execution deals with requesting the start and close down of the VM. WSDL at a specific location rebinds the tool.
BioCatalogue work by Dimitri for unbound WSDL for the tools
Player needs a workflow file from the portal or myExperiment or something else.
Rails plugin for running Taverna Workflows
Integrates into any Rails app
Embed workflows into any web page
Job queuing system scales runs with the number of workflows the servers can handle. Each run in parallel with its own worker.
Input provenance: setup, input gathering, parameters and data used
Runs: Taverna Server operations, interactions, run workflow, re-run / restart
Results management: storing, viewing, downloading, result type rendering
Service credential management: for secure services within workflows
Look and results rendering fully customizable
LifeWatch, Scratchpads, personal web page, …
Just like embedding a YouTube video
Gets bigger when it needs to & tells you when it's full.
Result type rendering: Text, XML, JSON, HTML, Images, PDF, Workflow errors, Links for types that browsers cannot show inline, more…..
Taverna server spawns commandline tool for user separation.
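A sketch of the job queuing note above: runs are queued and at most as many as the servers can handle execute at once, each on its own worker. This uses Python's standard thread pool as a stand-in; it is not the Rails plugin's implementation. The names run_queued and server_capacity, and the fake execute function, are assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_queued(workflow_runs, server_capacity, execute):
    """Run every queued workflow, never more than server_capacity at a time.

    execute is any callable that performs one run and returns its result.
    Results come back in submission order.
    """
    with ThreadPoolExecutor(max_workers=server_capacity) as pool:
        futures = [pool.submit(execute, run) for run in workflow_runs]
        return [future.result() for future in futures]


def fake_execute(run_name):
    # Stand-in for handing a run to a workflow server and waiting for it.
    time.sleep(0.01)
    return f"{run_name}: finished"


print(run_queued(["enm-1", "enm-2", "enm-3"], server_capacity=2, execute=fake_execute))
```

Runs beyond the capacity wait in the pool's internal queue, so adding server capacity raises throughput without changing the calling code.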
The components of the architecture:
An OSGi platform, with the Taverna Platform API
implemented by Taverna Core
executes a workflow using the Taverna Engine
uses Activity plugins for the different service types (WSDL, REST, Biomart, R scripts, command line tools, etc)
also implemented by the Taverna Server client which uses the Java Client library to proxy running of a workflow on the Taverna Server
The Taverna workbench to design and run workflows
UI plugins for each service type
executes workflows using the Taverna platform API
The Taverna command line which executes workflows using the Taverna platform API
A Taverna Server, which exposes the Taverna platform API as a REST API and SOAP API for executing workflows
Taverna Player, which uses the Ruby client library to execute workflows on the Taverna Server
Taverna Lite, which also uses the Ruby client library to execute workflows, but also manages a repository of workflows and allows user interactions.
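A sketch of the layering described above: one platform interface, one implementation that executes locally through per-service-type activity handlers, and one that only proxies the run to a server. All class and method names here are hypothetical and written in Python for brevity; the real platform API is Java and is not reproduced.

```python
from abc import ABC, abstractmethod


class WorkflowPlatform(ABC):
    """Hypothetical stand-in for a platform API that front ends program against."""

    @abstractmethod
    def run(self, workflow, inputs):
        """Execute a workflow and return a dict of all named values."""


class LocalEngine(WorkflowPlatform):
    """Executes steps in-process, one activity handler per service type."""

    def __init__(self, activities):
        self.activities = activities  # e.g. {"rest": handler, "script": handler}

    def run(self, workflow, inputs):
        data = dict(inputs)
        for step in workflow:
            handler = self.activities[step["type"]]
            data[step["output"]] = handler(data[step["input"]])
        return data


class ServerProxy(WorkflowPlatform):
    """Same interface, but the run is shipped elsewhere via a submit callable."""

    def __init__(self, submit):
        self.submit = submit

    def run(self, workflow, inputs):
        return self.submit(workflow, inputs)


# A front end (workbench, command line, web player) only sees the interface.
engine = LocalEngine({"upper": str.upper, "reverse": lambda text: text[::-1]})
workflow = [
    {"type": "upper", "input": "sequence", "output": "shouted"},
    {"type": "reverse", "input": "shouted", "output": "result"},
]
print(engine.run(workflow, {"sequence": "acgt"})["result"])
```

Because both implementations satisfy the same interface, a front end can switch between local and server execution without changing how it submits a workflow.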
The OSGi framework (OSGi being an acronym for "Open Services Gateway initiative") is a module system and service platform for the Java programming language that implements a complete and dynamic component model, something that does not exist in standalone Java/VM environments. Applications or components (coming in the form of bundles for deployment) can be remotely installed, started, stopped, updated, and uninstalled without requiring a reboot; management of Java packages/classes is specified in great detail. Application life cycle management (start, stop, install, etc.) is done via APIs that allow for remote downloading of management policies. The service registry allows bundles to detect the addition of new services, or the removal of services, and adapt accordingly.
The OSGi specifications have moved beyond the original focus of service gateways, and are now used in applications ranging from mobile phones to the open source Eclipse IDE. Other application areas include automobiles, industrial automation, building automation, PDAs, grid computing, entertainment, fleet management and application servers.
ENCODE threads
exchange between tools and researchers
bundles and relates digital resources of a scientific experiment or investigation using standard mechanisms
Explore, Personal….
Recording and reporting
Production….
Reporting.
Issues: non-secure html using http inside secure https iframe in ipython doesn’t work – need to update interaction service to deliver on https.
Variety:
common metadata models
rich metadata collection
ecosystem
Validity:
auto record of experiment set-up, citable and shareable descriptions
curation, publication,
mixed stewardship
third party availability
model executability
citability, QC/QA. trust.
Social issues of understanding the culture of risk, reward, sharing and reporting.
Blending SEEK and openBIS together
It’s a lot like a start-up
Software Engineering
for Science, Software sustainability, software and data policy, training
Why did I start as a Computer Scientist and, proudly, end up as a Software Engineer and Social Worker?
Web Science related activity
Making people think it's their idea
Nearly every time I ask people they ask for today’s and not tomorrow’s.
Sample of three commercial datasets
Information on handful of targets only
Gemma Sattertwaite mentioned this
Cache copies of data
Chemistry data normalisation/alignment through ChemSpider
Domain specific API
API calls populate SPARQL queries
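A sketch of the last point: a domain-specific API call takes a few simple parameters and fills in a fixed SPARQL query template, so callers never write SPARQL themselves. The vocabulary URIs, the function name compound_query and the query shape are invented for illustration and do not describe any actual API.

```python
from string import Template

# Hypothetical vocabulary under example.org; not a real schema.
QUERY_TEMPLATE = Template(
    """SELECT ?compound ?activity WHERE {
  ?compound <http://example.org/vocab/hasTarget> <$target_uri> ;
            <http://example.org/vocab/activity> ?activity .
} LIMIT $limit"""
)

# Characters that must not appear inside a SPARQL IRI reference.
FORBIDDEN = set('<>"{}|\\^` \n\t\r')


def compound_query(target_uri, limit=10):
    """Build the SPARQL text behind a 'compounds for this target' API call."""
    if FORBIDDEN & set(target_uri):
        raise ValueError("target_uri contains characters not allowed in an IRI")
    return QUERY_TEMPLATE.substitute(target_uri=target_uri, limit=int(limit))


print(compound_query("http://example.org/target/1", limit=5))
```

Validating the parameter before substitution keeps a caller from injecting extra query text through the URI.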
It’s like a start up
Social Software Engineering
T shaped people
“As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software”
An aside
Training infrastructure
A pilot training e-support service platform
Share training material
Scalable training approaches
Training the trainers, Support network
Trainer pool, Share know-how
Review needs
Cooperating training sectors
Manage and monitor outcomes
Coordinate activities and materials
Workshops, bootcamps, online
Pop-up training provision
Liaise with Nodes and Hub
Programmes retain branding
The multidimensional paper
A scientific article can be envisioned as juxtaposed layers—Title, Abstract, Synopsis, Article, Expanded View and Datasets—that provide access to the paper with increasing resolution and allow readers to zoom in or out to access the information at the required level of granularity.