The Taverna Workflow Management Software Suite - Past, Present, Future

Carole Goble at VPH meeting in Sheffield


  1. The Taverna Workflow Management Software Suite: Past, Present, Future. Prof Carole Goble CBE FREng FBCS CITP, The University of Manchester, UK; Software Sustainability Institute UK. carole.goble@manchester.ac.uk http://www.taverna.org.uk http://www.mygrid.org.uk
  2. More of what we generally do! Prof Carole Goble CBE FREng FBCS CITP, The University of Manchester, UK; Software Sustainability Institute UK. carole.goble@manchester.ac.uk http://www.taverna.org.uk http://www.mygrid.org.uk
  3. e-Science, Computational Science, Scientific Computing
     • Support global scientific collaboration, enable large-scale sharing of resources, tools and results, assist scientific processing, avoid unnecessary repeated work.
     • Accelerate scientific discovery, improve scientific productivity, stimulate technological innovation.
     • Cope with the scale and speed of scientific innovation and data.
  4. Data-centric Computation Scientific workflows over Distributed Cyber-Infrastructure. Data sharing Social Methods libraries and catalogues for all types of scientific artefacts and all types of scientists. Knowledge Management Metadata, semantics digital exchange, preservation, publishing Software Engineering Software sustainability, software and data policy, training Products Methods Systems Biology Chemistry Astro-Physics Astronomy Biology Social Science Library Digital Preservation Biodiversity Public Health Applications
  5. Computer Science, Software Engineering, Scientific Informatics, Computational Science. THEORY, PRACTICE, APPLICATION. fundamental, applied. PRODUCT (Open Source). PRINCIPLE. Science “USE CASE”.
  6. Long Tail. Little science. Self-organising groups. Disconnected, independent, distributed scientists. Disconnected, independent, distributed resources. Open in the wild.
     Organised science. Organised groups. Clubs of scientists. Organised, planned and in-house resources. Closed and well behaved services.
  7. VPH-Share. Models of Human Physiology. Eagle Genomics. Next Generation Sequencing based Patient Diagnostics. Astronomy & HelioPhysics. Document Preservation. Digitisation. Systems Biology. OpenTox Project. Chemistry Development Kit. Drug Toxicity. Ecological Niche Modelling. Population Modelling. Metagenomics. Phylogenetics. OpenPHACTS: Drug discovery, small molecules, targets, compounds.
     • Data cleaning
     • Data movement
     • Data retrieval and annotation
     • Data analysis
     • Data mining
     • Knowledge management
     • Data curation and data warehouse population
     • Data visualisation
     • Parameter sweeps over simulations
  8. Workflow in a nutshell. BioSTIF. Inputs: data, parameters, configurations. Outputs. Services & Resources. Infrastructures.
     • Orchestrate series of automated / interactive steps
       – Process pipelines
       – Analytic and synthesis procedures
       – Repetitive code-run sweeps
     • Housekeeping tasks
       – Process data at scale
       – Auto documentation
     • Mix in house & public resources, native hosting
       – Chain and choreograph components
       – Handle interoperability
       – Bridge resources
       – Shield operational complexity and change
  9. Taverna Workflow Management http://www.taverna.org.uk
     • Dataflow
       – Computational Lambda Calculus with a monad extension*
       – Simple control flows, iterations over collections
       – Data type agnostic, domain independent
       – Data movement, monitoring, staging, reference
       – Custom (VO Tables), XML, JSON
     • Mixed steps
       – Services, codes & command line tools
       – SOAP + REST Web Services
       – Scripts: R, “In Workflow Programming” Beanshell scripting …
       – Codes: Java, libraries, HPC, Grid and ~Cloud platforms etc …
       – Nested workflows
       – Interactions and Batch
     * Turi et al., Taverna Workflows: Syntax and Semantics, e-Science 2007: 441-448; Sroka et al., A formal semantics for the Taverna 2 workflow model, J. Comput. Syst. Sci. 76(6): 490-508 (2010)
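     The "iterations over collections" point on slide 9 can be illustrated with a minimal Python sketch. It is not Taverna's engine or API: `run_step`, `run_pipeline`, `clean` and `gc_content` are hypothetical names, and the only idea shown is that a step written for a single value is applied element by element when it receives a (possibly nested) list.

```python
from typing import Any, Callable, List


def run_step(func: Callable[[Any], Any], value: Any) -> Any:
    """Apply a single-value step; iterate implicitly when given a list."""
    if isinstance(value, list):
        # Depth mismatch: the step expects one item, so map it over the list.
        return [run_step(func, item) for item in value]
    return func(value)


def run_pipeline(steps: List[Callable[[Any], Any]], value: Any) -> Any:
    """Feed the output of each step into the next one (a linear dataflow)."""
    for step in steps:
        value = run_step(step, value)
    return value


# Hypothetical steps, each written for ONE sequence string.
def clean(seq: str) -> str:
    return seq.strip().upper()


def gc_content(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


# The input is a collection (with one nested collection); the list
# structure of the input is preserved in the output.
result = run_pipeline([clean, gc_content], [" gcgc ", "atat", ["ggat"]])
print(result)
```

     The step functions never mention lists; the iteration lives in the runner, which is the design choice the slide's "simple control flows" bullet hints at.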
  10. E.Science laboris. Tools. Standards. Services.
      • Computational Lambda Calculus
      • Visual Programming
      • Process mining
      • Adaptive & parallel computing
      • Cloud computing
      • SOA, Semantic Web Services
      • Data integration, data quality
      • Semantic representation and linked data
      • Reporting & tracking, credit propagation
      • Workflow reusability, quality, discovery
      • Security, monitoring, fault detection
      • AI planning, re-run analysis, auto-planning, auto-repair, auto-composition, auto-annotation, service discovery, service matching, auto-substitution
  11. Workflow-based Computation. Weeks -> Hours. Trypanosomiasis resistance in African Cattle: Surprise predicted result tested in lab. DAXX Gene. Genetic differences between breeds. Noyes, PNAS 2011 108(22) 9304-9309. BioDiversity: Invasive Species Modelling. American Horseshoe Crabs in the Baltic. Generalised ENM data mapping and overlaying pipelines. Software as a Service / (Cloud) Appliance. Analytic bottleneck: Repetitive, unbiased, accurate record, taming data, transparency, avoiding shortcuts. Interactive steps. Dev. Years -> Weeks. Runs. Weeks -> Hours.
  12. VPH-Share @neurIST Aneurysm Morphology Workflow. [Figure: Patients' records (Patient Pseudo-identifier (PID), Demographics, Height, Weight, Vital Signs, Heart Rate, Blood Pressure, Flow Rate, Transient Pressure, Aneurysm Properties, Tissue Properties, Wall Thickness, Risk Factors, Medical Images, Medications), Systemic Factors and Gene Expression Profile feed a Patient Avatar; a Disease Simulation Workflow produces an updated Patient Avatar with an Aneurysm Rupture Profile (Morphology Profile, Haemodynamic Profile, Mechanobiological Profile, Prediction Uncertainty) and RISK.] [Susheel Varma] http://www.vph-share.eu/
  13. Morphological Workflow [Susheel Varma]
      • Morphological, hemodynamic and structural analyses have been linked to aneurysm genesis, growth and rupture.
      • Evidence indicating differences in morphology and flow between ruptured and unruptured aneurysms has been shown for reduced patient cohorts.
      • Structural wall mechanics has been used to justify the growth and remodelling happening at the aneurysm level.
      [Diagram labels: Confidence in physical measures. + images; + BC, material; + BC, material. Morphological analysis, Haemodynamic analysis, Structural analysis. Morphological descriptors, Hemodynamic descriptors, Structural descriptors. Direct diagnostic power.]
      Practically, morphological characterizations might currently have the highest predictive capabilities with respect to the other analyses.
  14. Morphological Analysis Workflow [Susheel Varma]. Medical image from imaging equipment. @neurIST morphological descriptors. Complex indices (Zernike moment invariants). Basic size indices describing aneurysm sac: depth, neck.
  15. Implementation in VPH-Share. The @neurIST morphological workflow specification in Taverna [Susheel Varma]
  16. Biodiversity: marine monitoring and health assessment, ecological niche modelling. Data Intensive Science. Collaborative Science. Pilumnus hirtellus. Enclosed sea problem (Ready et al., 2010). Sarah Bourlat
  17. Ecological Niche Modeling
      Step 1: Explorative modeling
      - Use unfiltered data
      - Use fixed parameters: Mahalanobis distance
      - Native projections
      - Test the model, distribution of points, number of points
      Step 2: Deep modeling
      - Filtering environmentally unique points with BioClim algorithm
      - ENM with Support Vector Machine and Maximum Entropy
      - Parameter optimization (if necessary) on the model test results
      - 2 masks (model generate, model project)
      Analytical cycle: Data discovery. Data assembly, cleaning, and refinement. Ecological Niche Modeling. Statistical analysis.
      Pilumnus hirtellus. Enclosed sea problem (Ready et al., 2010).
      The workflows work over large geographical, taxonomic, and environmental scales, incl. terrestrial ecosystems. Baltic species invasions of various crabs/sea creatures. Interactions of different forest insects and trees.
  18. Ecological Niche Modeling
      Step 1: Explorative modeling
      - Use unfiltered data
      - Use fixed parameters: Mahalanobis distance
      - Native projections
      - Test the model, distribution of points, number of points
      Step 2: Deep modeling
      - Filtering environmentally unique points with BioClim algorithm
      - ENM with Support Vector Machine and Maximum Entropy
      - Parameter optimization (if necessary) on the model test results
      - 2 masks (model generate, model project)
      Analytical cycle: Data discovery. Data assembly, cleaning, and refinement. Ecological Niche Modeling. Statistical analysis.
      Pilumnus hirtellus. Enclosed sea problem (Ready et al., 2010).
      The workflows work over large geographical, taxonomic, and environmental scales, incl. terrestrial ecosystems. Baltic species invasions of various crabs/sea creatures. Interactions of different forest insects and trees.
      BioSTIF
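      Slides 17 and 18 name Mahalanobis distance as the fixed setting for explorative modeling. The sketch below is one hedged reading of that idea, not the BioVeL workflow itself: it scores a candidate site by its Mahalanobis distance from the centre of known occurrence records in a two-variable environmental space. The function names, the two variables (temperature, salinity) and all data values are assumptions made for illustration.

```python
from statistics import mean
from typing import List, Tuple

Point = Tuple[float, float]


def covariance_2d(points: List[Point]):
    """Return the centre and sample covariance (sxx, sxy, syy) of 2-D points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    n = len(points)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_2d(point: Point, centre: Point, cov) -> float:
    """Distance of point from centre, scaled by the covariance of the data."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx = point[0] - centre[0]
    dy = point[1] - centre[1]
    # d^2 = v^T S^-1 v with S^-1 = (1/det) * [[syy, -sxy], [-sxy, sxx]]
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return max(d2, 0.0) ** 0.5


# Hypothetical occurrence records: (temperature in C, salinity in PSU).
occurrences = [(8.0, 6.5), (9.5, 7.0), (10.0, 6.0), (11.5, 7.5), (9.0, 6.8)]
centre, cov = covariance_2d(occurrences)

# Smaller distance = environment more similar to where the species is found.
for site in [(9.5, 6.8), (18.0, 30.0)]:
    print(site, round(mahalanobis_2d(site, centre, cov), 2))
```

      A real niche model would use many more environmental layers and a full covariance matrix; the two-variable closed form is used here only to keep the example dependency-free.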
  19. www.biovel.eu Ecological Niche Modeling Workflow (ENM)
  20. Data and Parameter Sweeps: data, configuration, parameters, steps
  21. Hosted installation. Local installations.
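      The data and parameter sweep on slide 20 can be sketched as follows. This is an illustration under stated assumptions, not Taverna functionality: `sweep` and `run_workflow` are hypothetical names, the file name and parameter names are invented, and `run_workflow` merely stands in for launching one workflow run.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple


def sweep(datasets: List[str], grid: Dict[str, list]) -> Iterator[Tuple[str, dict]]:
    """Yield every (dataset, parameter setting) combination."""
    names = sorted(grid)  # fixed order so runs are reproducible
    for dataset in datasets:
        for values in product(*(grid[name] for name in names)):
            yield dataset, dict(zip(names, values))


def run_workflow(dataset: str, params: dict) -> str:
    """Hypothetical stand-in for one workflow run; returns a run label."""
    return f"{dataset}:{params['algorithm']}:{params['threshold']}"


runs = [
    run_workflow(dataset, params)
    for dataset, params in sweep(
        ["crab_occurrences.csv"],  # hypothetical input file name
        {"algorithm": ["svm", "maxent"], "threshold": [0.1, 0.5]},
    )
]
print(len(runs), "runs")
```

      Each combination is independent of the others, which is what makes sweeps a natural fit for the hosted and parallel execution mentioned elsewhere in the deck.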
  22. Taverna: a Knowledge Discovery Framework [Saeedeh Maleki-Dizaji]
      • Asthma sputum inflammatory phenotypes, a transcriptome analysis. Saeedeh Maleki-Dizaji, Chris Newby, Rachid Berair, Rod Smallwood, Chris Brightling. 2014 (to be submitted)
      • A systematic approach to a transcriptome analysis to asthma sputum inflammatory phenotypes. ISMB 2014.
      • The Battle of the Sexes starts in the oviduct: modulation of oviductal transcriptome by X and Y-bearing spermatozoa. Almiñana C, Caballero I, Heath PR, Maleki-Dizaji S, Parrilla I, Cuello C, Gil MA, Vazquez JL, Vazquez JM, Roca J, Martinez EA, Holt WV and Fazeli A. Submitted to BMC Genomics, 2014 (In Press)
      • transcription regulation network involving E2F6, IRF7 and STAT1. Thomas R.J. Lovewell, Andrew J.G. McDonagh, Andrew G. Messenger, Saeedeh Maleki-Dizaji, Mimoun Azzouz and Rachid Tazi-Ahnini. Submitted to PNAS, 2014
      • Kiran, M., Bicak, M., Maleki-Dizaji, S., Holcombe, M. FLAME: A Platform for High Performance Computing of Complex Systems. Journal of Acta Physica Polonica 2011.
      • Maleki-Dizaji S, Holcombe M, Rolfe MD, Fisher P, Green J, Poole RK, Graham AI. A Systematic Approach to Understanding Escherichia coli Responses to Oxygen: From Microarray Raw Data to Pathways and Published Abstracts. Online J Bioinformatics, (1):51-59, 2009
  23. [Architecture diagram labels: Application, Runtime, Middleware, Resources/Codes/Services, Infrastructures, Platforms, Repositories. Execution Activity Plug-ins. Scufl. Taverna Desktop Workbench, Taverna Online Web Tool, Portals and Applications. Engine, Server, Player, Cmd line. Provenance. Third Party Servers. BioSTIF. Workflows & workflow components. PROV, OPM. Data. Provenance. Registries.]
  24. Taverna Workflow Management: Open extensibility
      • Plug-in framework
        – Command line tool
        – Data Services: VOTables for AstroTaverna
        – Optimisations: E.g. Holl. model parameter sweeps
        – Infrastructures: Grid, HPC, Web Services
        – Domains: CDK, BioMart, VOTable
        – Commodities: Excel Spreadsheets, Open Refine, R
      • Plug into other frameworks & platforms
        – Portals: Scratchpads
        – Interactive platforms: iPython Notebook
        – Wfms: KNIME Node, Galaxy tool, Kepler Actor
      • Third party applications
        – Taverna Online
        – XworX
        – OGC chainer
  25. Taverna Online: 3rd party app. Dr Vadim Surpin and Vitaly Sharanutsa, Institute for Information Transmission Problems of Russian Academy of Sciences (IITP RAS). An online, in-browser application for assembling and running Taverna Workflows over a HPC platform. http://onlinehpc.com/site/main
  26. Interoperability: Data format/identity mismatches. Service interface handling.
      Components: Well described, behaved, curated, annotated modularised workflow modules
      • Semantic annotations, prescribed failover, formats, provenance
      • Organised into common families
  27. Taverna Directions
      Access: Framework to access and leverage heterogeneous legacy applications, services, datasets and codes. Shielding from complexity.
      Customise: Rapid development: Flexibility, Extensibility, Adaptability, Reuse. Reusable Workflow Components.
      Process: Automated plumbing + Interaction. Systematic, repetitive and unbiased analysis and processing and error handling. Ensembles, comparisons, “what ifs”.
      Access: Cloud and Scale, Registries. Standards data formats, programmatic interfaces. Adapting to change. Security. Governance of components.
      Process: Seamless, pluggable wf as a service. Scale. Adaptability. Specific-Generic tension. Easier development, user experience. Workflow commodities, Research Objects. Design practices for reuse. Credit. Executable interactive notebooks. Provenance. A tool for reproducibility.
      Report, Embed: Workflows in common applications. Integration into reporting & publishing. Underpin integrative platforms. Service based science and science as a service.
  28. Fix on demand. Notify as needed. Monitor for decay. Workflow/Service Monitors. 3rd Party Monitors. Workflow analytics. Detect and Repair. QUASAR toolkit. [Zhao et al., Why workflows break, e-Science 2012]
  29. The Execution Provenance Gap. Data tracking: Summarisation, Labelling, Distillations, Selective tracking, Filtering. Big, Fine grain. 1 White box, One System, Special tools. Collection. A Big Graph. What do I cite? What did I do? N Black boxes, Many Systems. My Lab Book. Analytics. Smart in situ Presentation. Why am I citing?
      Pinar Alper, Khalid Belhajjame, Carole A. Goble, Pinar Karagoz: Enhancing and abstracting scientific workflow provenance for data publishing. EDBT/ICDT Workshops 2013: 313-318
      Sarah Cohen Boulakia, Jiuqiang Chen, Paolo Missier, Carole A. Goble, Alan R. Williams, Christine Froidevaux: Distilling structure in Taverna scientific workflows: a refactoring approach. BMC Bioinformatics 15(S-1): S12 (2014)
      http://provenanceweek.dlr.de
  30. Tracking Provenance. File Stores, Lab Books, Repositories. • Granularity • Scales • Blackbox • Hybrid
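      As a rough illustration of the provenance tracking discussed on slides 29 and 30, the sketch below records which values each step used and generated, then walks back from a result to the steps that produced it. It is loosely inspired by the "used" and "wasGeneratedBy" relations of W3C PROV and is not Taverna's provenance export; `ProvenanceLog`, `digest` and the step names are hypothetical. Steps are treated as black boxes, so the granularity is one record per step.

```python
import hashlib
import json
from typing import Any, Callable, List


def digest(value: Any) -> str:
    """Short content hash used to identify a data value."""
    text = json.dumps(value, sort_keys=True)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


class ProvenanceLog:
    """Coarse-grained record: one entry per executed step."""

    def __init__(self) -> None:
        self.records: List[dict] = []

    def run(self, name: str, func: Callable, *inputs: Any) -> Any:
        """Execute a step and record what it used and generated."""
        output = func(*inputs)
        self.records.append({
            "activity": name,
            "used": [digest(item) for item in inputs],
            "generated": digest(output),
        })
        return output

    def lineage(self, value: Any) -> List[str]:
        """Names of the steps that led to value, most recent first."""
        wanted = {digest(value)}
        chain = []
        for record in reversed(self.records):
            if record["generated"] in wanted:
                chain.append(record["activity"])
                wanted.update(record["used"])
        return chain


log = ProvenanceLog()
positives = log.run("filter", lambda xs: [x for x in xs if x > 0], [3, -1, 2])
total = log.run("total", sum, positives)
print(log.lineage(total))
```

      Because values are identified by content hash, two steps that happen to produce identical values are indistinguishable here; a finer-grained tracker would assign explicit identifiers.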
  31. Research Objects http://www.researchobject.org/
      • Bundles and relates multi-hosted digital resources of a scientific experiment or investigation using standard mechanisms
      • Descriptive reproducibility
      • Exchange, Releasing paradigm for publishing
  32. Flexibility. Review, Revise/Discard. Scale. Deploy into tools. Comparison. Personal, Group, Production. Research Reporting. Harden.
  33. http://nbviewer.ipython.org/github/myGrid/DataHackLeiden/blob/alan/Player_example.ipynb https://www.youtube.com/watch?v=QVQwSOX5S08
  34. Archiving, Publishing, Component Libraries, Preserving, Recording, Storing, Exchanging, Versioning, Sharing. PACKS
  35. SEEK4Science: Sharing and interlinking Methods, Models, Data… Data, Model, Article, External Databases, Metadata
  36. Virtual Liver Network. BMBF “Großprojekt”
      • ~45 organisations, ~70 groups
      • multiscale rep. of the liver
      • clinical impact
      • general public portal
      Same key requirements: yellow pages, exchange of all sops/data/models, sharing rights
      Different biology
      • Multiscale data
      • Multiscale models
      • Imaging
      Different project structure
      • Hierarchies (A, A1, A1.2)
      • Regional groups of groups
      Flexibility, extensibility, open sourceness of SEEK key
  37. simulate models; project mgt, access control; reporting, citation; governance & policies; yellow pages of peers, projects, experts; catalogue and link data, models, samples, specimens, sops, experiments, publications using standards; curate & annotate data and models using standards; access, link to and deposit in public data and model repositories; manage, store and exchange different types and scales of data; integrate local and project tools and data systems; scaled-out collection & processing
  38. Social Computation. Storing, Sharing and Reusing data, methods, models, between collaborating and competing scientists. e-Laboratories, collaboratories, VREs, repositories. An ego-system: experimentalists, modellers, X-informaticians, computational Xs, software engineers, computer scientists, systems administrators, resource providers, tool builders, social scientists, librarians, curators
  39. Computer Scientist. Software Engineer. Social Engineer.
  40. Knowledge Computation • Accurate, intelligible and comparable descriptions • Data interoperability • Machine readable metadata. Semantic technologies, Ontologies, Linked Data, Data schema
  41. Semantic Description: Describing and linking data in terms of shared concepts, relationships and identifiers [Taheriyan et al adapted]
      Ontology (object property, data property, subClassOf). Classes: Person, Organization, Place, State, City, Event. Properties: name, birthdate, bornIn, worksFor, state, phone, livesIn, ceo, location, organizer, nearby, startDate, endDate, title, isPartOf, postalCode.
      Data:
      Column 1        | Column 2 | Column 3  | Column 4     | Column 5
      Bill Gates      | Oct 1955 | Microsoft | Seattle      | WA
      Mark Zuckerberg | May 1984 | Facebook  | White Plains | NY
      Larry Page      | Mar 1973 | Google    | East Lansing | MI
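      The semantic description on slide 41 amounts to mapping anonymous table columns onto ontology terms. The sketch below shows that mapping as plain subject-predicate-object triples. The `ex:` prefix, the identifier scheme and the assignment of columns to properties (Column 4 as `bornIn`, Column 5 as the city's `state`) are assumptions made for illustration; the slide itself does not spell out the mapping.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Rows copied from the slide; the columns carry no headers of their own.
ROWS = [
    ["Bill Gates", "Oct 1955", "Microsoft", "Seattle", "WA"],
    ["Mark Zuckerberg", "May 1984", "Facebook", "White Plains", "NY"],
    ["Larry Page", "Mar 1973", "Google", "East Lansing", "MI"],
]


def make_id(label: str) -> str:
    """Build a hypothetical identifier in an example namespace."""
    return "ex:" + label.replace(" ", "_")


def row_to_triples(row: List[str]) -> List[Triple]:
    """Describe one row in terms of shared classes and properties."""
    name, birthdate, organization, city, state = row
    person, org_id, city_id = make_id(name), make_id(organization), make_id(city)
    return [
        (person, "rdf:type", "ex:Person"),
        (person, "ex:name", name),            # data property
        (person, "ex:birthdate", birthdate),  # data property
        (person, "ex:worksFor", org_id),      # object property
        (person, "ex:bornIn", city_id),       # object property (assumed)
        (city_id, "ex:state", state),         # assumed reading of Column 5
    ]


triples = [triple for row in ROWS for triple in row_to_triples(row)]
for triple in triples[:6]:
    print(triple)
```

      Once the columns are expressed this way, two tables that use different column orders or headers can be linked through the shared property names and identifiers.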
  42. Curation. Knowledge Ramps. Populous. http://www.rightfield.org.uk Katy Wolstencroft
  43. Pathways, Pharmacological Activities, Biological Processes, Transcripts, Pathological Processes, Diseases, Genes, Proteins, Interactions, Clinical Drug Applications, Indications, Drugs, Compounds. Pharmacological data for drug discovery combining public and private datasets. Pre-competitive silo-breaking for competitive analytics.
  44. Pathways, Pharmacological Activities, Biological Processes, Transcripts, Pathological Processes, Diseases, Genes, Proteins, Interactions, Clinical Drug Applications, Indications, Drugs, Compounds. “Find me compounds that inhibit targets in NFkB pathway assayed in only functional assays with a potency <1 μM” “What is the selectivity profile of known p38 inhibitors?” “Let me compare MW, logP and PSA for known oxidoreductase inhibitors” Broad data: combining public and private datasets.
  45. Under the hood. [Architecture diagram labels: Apps: ChemBio Navigator, Target Dossier, Pipeline Pilot. Linked Data API (RDF/XML, TTL, JSON). Domain Specific Services. Core Platform: Semantic Workflow Engine, Data Cache (Virtuoso Triple Store), Identity Resolution Service, Identifier Management Service, Chemistry Registration Normalisation & Q/C, Indexing. Sources, each with Nanopub, Db and VoID: Public Content, Commercial, Public Ontologies, User Annotations. Example identifiers: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”.]
  46. Dynamic Equality. Strict vs Relaxed. Analysing vs Browsing. skos:closeMatch (Drug Name), skos:closeMatch (Drug Name), skos:exactMatch (InChI)
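      One way to read the dynamic equality idea on slide 46 is that the set of identifiers treated as "the same thing" depends on which link types the user chooses to follow. The sketch below is a hedged illustration of that reading, not the Open PHACTS identity resolution service: the identifiers, the `LINKS` data and the two lens names are hypothetical. Following `skos:closeMatch` links transitively is a deliberate simplification, since SKOS does not define `closeMatch` as transitive.

```python
from collections import deque
from typing import List, Set, Tuple

# Hypothetical mapping links between records in different datasets.
LINKS: List[Tuple[str, str, str]] = [
    ("chem:A", "skos:exactMatch", "drugdb:1"),  # same structure (InChI)
    ("drugdb:1", "skos:closeMatch", "wiki:X"),  # same drug name only
]

# A lens states which link predicates count as equality.
LENSES = {
    "strict": {"skos:exactMatch"},
    "relaxed": {"skos:exactMatch", "skos:closeMatch"},
}


def equivalents(start: str, lens: str) -> Set[str]:
    """Identifiers reachable from start through links the lens allows."""
    allowed = LENSES[lens]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for subject, predicate, obj in LINKS:
            if predicate not in allowed:
                continue
            if subject == node:
                other = obj
            elif obj == node:  # mapping links are followed in both directions
                other = subject
            else:
                continue
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen


print(sorted(equivalents("chem:A", "strict")))
print(sorted(equivalents("chem:A", "relaxed")))
```

      The strict lens suits analysis, where merging records on a name match could corrupt results; the relaxed lens suits browsing, where wider recall matters more.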
  47. CS Research. Software Engineering. Science Engage. Delivery & Support. 2001-2006
  48. CS Research. Software Engineering. Science Engage. Delivery & Support. 2006-today
  49. “Startup-Like”. Balance Innovation with Usefulness.
  50. Software Engineering. Research Software Engineers. Sustainable software.
  51. Zeeya Merali, Nature 467, 775-777 (2010) | doi:10.1038/467775a. Computational science: ...Error…why scientific programming does not compute.
  52. Training • Training infrastructure • Scalable training approaches • Review needs • Coordinate activities and materials • Liaise with Nodes and Hub
  53. Data-centric Computation Scientific workflows over Distributed Cyber-Infrastructure. Data sharing Social Methods libraries and catalogues for all types of scientific artefacts and all types of scientists. Knowledge Management Metadata, semantics digital exchange, preservation, publishing Software Engineering Software sustainability, software and data policy, training Products Methods Systems Biology Chemistry Astro-Physics Astronomy Biology Social Science Library Digital Preservation Biodiversity Public Health Applications
  54. Lemberger T, Mol Syst Biol 2014;10:715. ©2014 by European Molecular Biology Organization. Born Reproducible | Exchangeable | Reusable. Rich descriptions. Open & Available. Transparent Method. Re-executable.
  55. Links
      • myGrid – http://www.mygrid.org.uk
      • Taverna – http://www.taverna.org.uk
      • myExperiment – http://www.myexperiment.org
      • BioCatalogue – http://www.biocatalogue.org
      • SEEK and SysMO-SEEK – http://www.seek4science.org – http://seek.sysmo-db.org
      • RightField – http://www.rightfield.org.uk
      • BioVeL – http://www.biovel.eu
      • Wf4ever – http://www.wf4ever-project.org
      • Research Object – http://www.researchobject.org
      • Software Sustainability Institute – http://www.software.ac.uk
