This document summarizes a presentation about scientific workflow systems and related technologies including Taverna, Biocatalogue, and myExperiment. Taverna is a workflow management system that allows researchers to design and run workflows linking various bioinformatics services. Biocatalogue is a public registry of life science web services. MyExperiment is a repository for sharing workflows. The document discusses how these tools help scientists conduct experiments and analyze and preserve results.
What is Reproducibility? The R* brouhaha (and how Research Objects can help), by Carole Goble
Presented at the First International Workshop on Reproducible Open Science @ TPDL, 9 Sept 2016, Hannover, Germany
http://repscience2016.research-infrastructures.eu/
This talk explores how principles derived from experimental design practice, data and computational models can greatly enhance data quality, data generation, data reporting, data publication and data review.
This document discusses using semantic web technologies for translational research in life sciences. It provides an overview of semantic web standards and outlines several projects demonstrating applications in healthcare and biomedical research. These include developing an active semantic electronic medical record, semantically annotating experimental glycomics data, and integrating diverse biomedical data sources using ontologies to enable complex querying and knowledge discovery.
This document discusses community standards for reproducible and reusable bioscience research. It outlines the importance of consistent reporting to maximize the value of collective scientific outputs. However, there are challenges due to the large number of bioscience reporting standards and lack of knowledge about how they relate. The document calls for a coherent catalogue of data sharing resources to evaluate standards, show relationships among them, and promote interoperability. This would help researchers make informed choices about standards and facilitate structured descriptions of experiments across domains.
The document discusses the MESUR (MEtrics from Scholarly Usage of Resources) project, which aims to develop new metrics for scholarly impact and prestige based on usage data from digital scholarly resources rather than just citations. The key points are:
1) MESUR analyzes over 1 billion usage events of scholarly articles and develops network-based metrics from usage patterns to map the structure of science.
2) Preliminary results show relevant structure in usage-based network maps that correlate with traditional citation-based metrics.
3) MESUR has produced a variety of usage and citation-based metrics and developed online tools for exploring these metrics.
Knowledge Infrastructure for Global Systems Science, by David De Roure
Presentation at the First Open Global Systems Science Conference, Brussels, 8-10 November 2012
http://www.gsdp.eu/nc/news/news/date/2012/10/31/first-open-global-systems-science-conference/
Being Reproducible: SSBSS Summer School 2017, by Carole Goble
Lecture 2:
Being Reproducible: Models, Research Objects and R* Brouhaha
Reproducibility is an R* minefield, depending on whether you are testing for robustness (rerun), defence (repeat), certification (replicate), comparison (reproduce) or transfer between researchers (reuse). Different forms of "R" make different demands on the completeness, depth and portability of research. Sharing is another minefield, raising concerns about credit and protection from sharp practices.
In practice, the exchange, reuse and reproduction of scientific experiments depend on bundling and exchanging the experimental methods, computational codes, data, algorithms, workflows and so on along with the narrative. These "Research Objects" are not fixed, just as research is not "finished": codes fork, data are updated, algorithms are revised, workflows break, service updates are released. ResearchObject.org is an effort to systematically support more portable and reproducible research exchange.
In this talk I will explore these issues in more depth using the FAIRDOM Platform and its support for reproducible modelling. The talk will cover initiatives and technical issues, and raise social and cultural challenges.
Ontomaton: NCBO BioPortal ontology lookups in Google Spreadsheets, produced by the ISA team at the University of Oxford e-Research Centre (Eamonn Maguire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra and Susanna Sansone) and NCBO (Trish Whetzel).
The work was presented during ICBO 2013 in Montreal by Trish Whetzel (Thanks Trish!)
Marco Brandizi and Keywan Hassani-Pak, Rothamsted Research, Invited Presentation at SWAT4HCLS 2022.
The FAIR data principles have become a driving force in the life sciences and other scientific domains, helping researchers share their data and unlock its full potential for integrating information and making novel discoveries. Knowledge graphs are an increasingly popular paradigm for modelling data according to these principles, and technologies such as graph databases are emerging as complementary to approaches like linked data. All of this extends to the agronomy, farming and food domains. How advanced is the adoption of sound data management practices in these domains? How does it compare to other life sciences? In this presentation we will talk about our practical experience, focusing on KnetMiner, a gene and molecular biology discovery platform based on building and publishing knowledge graphs according to the FAIR principles, using a mix of linked data standards for the life sciences together with recent graph database and API technologies. We welcome questions and discussion from the audience about similar experiences.
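As a purely illustrative sketch of the linked-data side of such a platform, the Python snippet below queries a FAIR knowledge graph through a SPARQL endpoint. The endpoint URL, the vocabulary and the gene name are hypothetical placeholders, not KnetMiner's actual API.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical SPARQL endpoint and vocabulary; placeholders, not KnetMiner's real service.
sparql = SPARQLWrapper("https://example.org/knowledge-graph/sparql")
sparql.setReturnFormat(JSON)

# Find traits associated with a gene and the publications providing evidence:
# the kind of cross-resource traversal a FAIR knowledge graph is meant to support.
sparql.setQuery("""
PREFIX bk: <http://example.org/biokb/>
SELECT ?trait ?paper WHERE {
  ?gene  bk:prefName       "EXAMPLE_GENE_1" .
  ?gene  bk:associatedWith ?trait .
  ?trait bk:evidencedBy    ?paper .
}
LIMIT 10
""")

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["trait"]["value"], row["paper"]["value"])
```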
High-performance web services for gene and variant annotations, by Chunlei Wu
This document describes high-performance web services called MyGene.info and MyVariant.info that provide gene and variant annotations. It discusses how the services aggregate data from multiple sources and keep it up-to-date. It also explains the use of document databases to store data in JSON format and support rich data structures. APIs for the services support large-scale usage and are designed to be easy to use, developer-friendly, and aggregate comprehensive annotation information for genes and variants.
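To give a flavour of how these annotation APIs are typically consumed, here is a minimal Python sketch that queries the public MyGene.info (v3) and MyVariant.info (v1) query endpoints with the requests library. The URL paths, parameters and field names are based on the services' public documentation as I recall it, so treat them as assumptions to verify against the current API docs.

```python
import requests

# Query MyGene.info (v3 API) for a gene by symbol and fetch selected annotation fields.
resp = requests.get(
    "https://mygene.info/v3/query",
    params={"q": "symbol:CDK2", "species": "human", "fields": "symbol,name,entrezgene"},
    timeout=30,
)
resp.raise_for_status()
for hit in resp.json().get("hits", []):
    print(hit.get("entrezgene"), hit.get("symbol"), hit.get("name"))

# Query MyVariant.info (v1 API) for a variant by dbSNP rsid.
resp = requests.get(
    "https://myvariant.info/v1/query",
    params={"q": "dbsnp.rsid:rs58991260", "fields": "dbsnp.rsid,dbsnp.chrom"},
    timeout=30,
)
resp.raise_for_status()
for hit in resp.json().get("hits", []):
    print(hit.get("_id"), hit.get("dbsnp", {}).get("rsid"))
```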
Towards Responsible Content Mining: A Cambridge perspective, by Peter Murray-Rust
ContentMining (Text and Data Mining) is now legal in the UK for non-commercial research. Cambridge UK is a natural centre, with several components:
* a world-class University and Library
* many publishers, both Open Access and conventional
* a digital culture
* ContentMine - a leading proponent and practitioner of mining
Cambridge University Press welcomes content mining and invited PMR to give a talk there. He showed the technology and protocols and proposed a practical way forward in 2017
BioCatalogue talk by Carole Goble. In these slides she outlines the reasons behind the BioCatalogue project and presents BioCatalogue and its goals.
High throughput mining of the scholarly literature; talk at NIH, by Peter Murray-Rust
Elsevier stopped Chris Hartgerink, a statistician, from downloading research papers in bulk from ScienceDirect for content mining aimed at detecting potentially problematic research findings, despite his having legal access through his university's subscription and intending only to extract facts without redistributing full papers. He had downloaded around 30GB of data over 10 days to mine the psychology literature for test results, figures, tables and other information reported in papers. Hartgerink's research aims to investigate unreliable findings, which can harm policy and research progress, through an innovative content mining method.
Crowd-sourcing is being used to build ChemSpider, a structure-centric community for chemists. ChemSpider allows users to search over 20 million chemical structures and associated data. It enables collaborative curation of data through tools like commenting and editing. ChemSpider aims to enable open discovery through features like virtual screening of compounds using LASSO descriptors.
Annotation of SBML Models Through Rule-Based Semantic Integration, by Allyson Lister
This talk was given on June 28, 2009 at the Bio-Ontologies SIG as part of ISMB/ECCB 2009. You can download the paper this presentation is about from http://hdl.handle.net/10101/npre.2009.3286.1. More information on the ISMB conference is available at http://www.iscb.org/ismbeccb2009/ and http://friendfeed.com/ismbeccb2009
The Seven Deadly Sins of Bioinformatics (Duncan Hull)
Keynote talk at Bioinformatics Open Source Conference (BOSC) Special Interest Group at the 15th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2007) in Vienna, July 2007 by Carole Goble, University of Manchester.
A Global Commons for Scientific Data: Molecules and Wikidata, by Peter Murray-Rust
This document summarizes Peter Murray-Rust's work on developing software to extract structured data and information from scientific documents. It discusses tools to extract data from text, tables, images, computational logs, and more. It provides examples of extracting chemical information, disease and species data, and phylogenetic trees from figures. The goal is to liberate scientific data locked up in unstructured documents to enable new discoveries.
A keynote given on experiences in curating workflows and web services.
3rd International Digital Curation Conference: "Curating our Digital Scientific Heritage: a Global Collaborative Challenge"
11-13 December 2007
Renaissance Hotel
Washington DC, USA
The document discusses the increasing scale and complexity of knowledge generation in science domains like astronomy and medicine over recent centuries. It argues that knowledge generation can be viewed as a systems problem involving many actors and processes. The document proposes a service-oriented approach using web services as an integrating framework to address challenges of scale, complexity, and distributed collaboration in e-Science. Key challenges discussed include semantics, documentation, scaling issues, and sociological factors like incentives.
Workflows, provenance and reporting: a lifecycle perspective, at BIH 2013, Rome, by Carole Goble
Workflow systems support the design, configuration and execution of repetitive, multi-step pipelines and analytics, well established in many disciplines, notably biology and chemistry, but less so in biodiversity and ecology. From an experimental perspective workflows are a means to handle the work of accessing an ecosystem of software and platforms, manage data and security, and handle errors. From a reporting perspective they are a means to accurately document methodology for reproducibility, comparison, exchange and reuse, and to trace the provenance of results for review, credit, workflow interoperability and impact analysis. Workflows operate in an evolving ecosystem and are assemblages of components in that ecosystem; their provenance trails are snapshots of intermediate and final results. Taking a lifecycle perspective, what are the challenges in workflow design and use with different stakeholders? What needs to be tackled in evolution, resilience, and preservation? And what are the "mitigate or adapt" strategies adopted by workflow systems in the face of changes in the ecosystem/environment, for example when tools are deprecated or datasets become inaccessible because of funding shortfalls?
Taverna is a free and open-source workflow management system that allows researchers to design and execute scientific workflows. It was developed by the University of Manchester to support in silico experiments in biology. Taverna provides a graphical user interface for designing workflows using a variety of distributed data sources and web services without having to learn complex programming. It has been widely adopted by researchers in fields such as biology, healthcare, astronomy, and cheminformatics to automate analysis pipelines and share workflows.
This curriculum vitae summarizes the educational and professional background of Kamran Sartipi. He holds a PhD in Computer Science from the University of Waterloo and has published extensively. His research focuses on software engineering, information security, electronic health, and medical informatics. He has led several projects involving decision support systems, knowledge engineering, distributed systems, and standards-based health interoperability.
To address challenges of poor interoperability among biological natural language processing (BioNLP) services, the authors propose a framework called BioNLP-SADI that uses Semantic Automated Discovery and Integration (SADI) to integrate BioNLP tools. BioNLP-SADI represents output in RDF, uses ontologies for modeling, and SPARQL for querying to consolidate results from multiple services without programming. This allows ad-hoc analysis of text mining results and comparative evaluation of BioNLP tools. The authors implemented several example BioNLP services within this framework.
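As a rough illustration of that consolidation idea (using an invented vocabulary rather than the actual BioNLP-SADI ontologies), the sketch below merges RDF output from two hypothetical text-mining services and answers one SPARQL query across both with rdflib.

```python
from rdflib import Graph

# Toy RDF fragments standing in for the output of two hypothetical BioNLP services
# that annotated the same document; the ex: vocabulary is invented for illustration.
service_a = """
@prefix ex: <http://example.org/nlp#> .
ex:doc1 ex:mentionsGene "BRCA1" .
"""
service_b = """
@prefix ex: <http://example.org/nlp#> .
ex:doc1 ex:mentionsDisease "breast cancer" .
"""

# Merging both graphs lets a single SPARQL query consolidate annotations across services.
g = Graph()
g.parse(data=service_a, format="turtle")
g.parse(data=service_b, format="turtle")

query = """
PREFIX ex: <http://example.org/nlp#>
SELECT ?doc ?gene ?disease WHERE {
  ?doc ex:mentionsGene    ?gene .
  ?doc ex:mentionsDisease ?disease .
}
"""
for doc, gene, disease in g.query(query):
    print(doc, gene, disease)
```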
FAIR Computational Workflows
Computational workflows capture precise descriptions of the steps and data dependencies needed to carry out computational data pipelines, analysis and simulations in many areas of Science, including the Life Sciences. The use of computational workflows to manage these multi-step computational processes has accelerated in the past few years driven by the need for scalable data processing, the exchange of processing know-how, and the desire for more reproducible (or at least transparent) and quality assured processing methods. The SARS-CoV-2 pandemic has significantly highlighted the value of workflows.
This increased interest in workflows has been matched by the number of workflow management systems available to scientists (Galaxy, Snakemake, Nextflow and 270+ more) and the number of workflow services like registries and monitors. There is also recognition that workflows are first-class, publishable Research Objects just as data are. They deserve their own FAIR (Findable, Accessible, Interoperable, Reusable) principles and services that cater for their dual roles as explicit method description and software method execution [1]. To promote long-term usability and uptake by the scientific community, workflows (as well as the tools that integrate them) should become FAIR+R(eproducible) and citable, so that authors' credit is attributed fairly and accurately.
The work on improving the FAIRness of workflows has already started and a whole ecosystem of tools, guidelines and best practices has been under development to reduce the time needed to adapt, reuse and extend existing scientific workflows. An example is the EOSC-Life Cluster of 13 European Biomedical Research Infrastructures which is developing a FAIR Workflow Collaboratory based on the ELIXIR Research Infrastructure for Life Science Data Tools ecosystem. While there are many tools for addressing different aspects of FAIR workflows, many challenges remain for describing, annotating, and exposing scientific workflows so that they can be found, understood and reused by other scientists.
This keynote will explore the FAIR principles for computational workflows in the Life Sciences, using the EOSC-Life Workflow Collaboratory as an example.
[1] Carole Goble, Sarah Cohen-Boulakia, Stian Soiland-Reyes, Daniel Garijo, Yolanda Gil, Michael R. Crusoe, Kristian Peters, and Daniel Schober, "FAIR Computational Workflows", Data Intelligence 2020 2:1-2, 108-121. https://doi.org/10.1162/dint_a_00033
Arabidopsis Information Portal: A Community-Extensible Platform for Open Data, by Matthew Vaughn
Araport is an innovative model organism database resource that offers users the ability to bring their own visualizations, data sets, algorithms, and genome browser tracks and share them with their colleagues.
OVium Bio-Information Solutions uses state-of-the-art algorithms to analyze key data resources such as NCBI, EMBL and PDB to develop cell signalling pathways.
OVium employs cloud and MPP computing solutions with homology and signal network mapping to develop chemical and protein pathways for discovery research.
The document discusses Microsoft Research's ORECHEM project, which aims to integrate chemistry scholarship with web architectures, grid computing, and the semantic web. It involves developing infrastructure to enable new models for research and dissemination of scholarly materials in chemistry. Key aspects include using OAI-ORE standards to describe aggregations of web resources related to crystallography experiments. The objective is to build a pipeline that extracts 3D coordinate data from feeds, performs computations on resources like TeraGrid, and stores resulting RDF triples in a triplestore. RESTful web services are implemented to access different steps in the workflow.
German Conference on Bioinformatics 2021
https://gcb2021.de/
FAIR Computational Workflows
This document is a resume for Gautam Machiraju summarizing his education and research experience. He holds a B.A. in Applied Mathematics from UC Berkeley with a concentration in Mathematical Biology and a minor in Bioengineering. He has worked on several research projects involving mathematical modeling and data analysis related to biology and healthcare, including modeling cancer biomarker shedding kinetics, mining the literature for biomarker data, and applying deep learning to patient time-series data. His skills include programming, mathematics, data science, bioinformatics, and laboratory techniques, and he is currently a bioinformatics research assistant at Stanford University School of Medicine.
Lei Zheng has over 15 years of experience in areas such as machine learning, data mining, and software development. He currently works as a Senior Software Engineer at Yahoo, where he develops algorithms for spam filtering and detection of abusive behavior. Previously he held research positions at the University of Pittsburgh and JustSystems Evans Research, where he implemented algorithms and systems for information retrieval, natural language processing, and data mining.
Acting as Advocate? Seven steps for libraries in the data decade, by Liz Lyon
UKOLN advocates that libraries take seven steps to support data management and open science in the data decade:
1) Provide briefings on cloud data services in partnership with IT services.
2) Build usable data management tools in partnership with researchers.
3) Develop data sustainability strategies and articulate the costs and benefits.
4) Publish case studies on open science to show benefits of universal data sharing.
5) Present at university ethics committees to highlight open data issues.
6) Raise awareness of citizen science opportunities and guidelines for good practice.
7) Promote data citation and attribution to embed in publication practice.
Adithya Rajan is seeking a career in machine learning and big data. He has a Ph.D. in Electrical Engineering from Arizona State University with extensive coursework in machine learning, optimization, and statistics. He has over 3 years of industry experience as a data scientist and research engineer developing machine learning algorithms. His research focuses on applying statistical techniques like stochastic ordering and information theory to wireless communications and signal processing.
Similar to: Invited talk @ ESIP summer meeting, 2009
Design and Development of a Provenance Capture Platform for Data Science, by Paolo Missier
A talk given at the DATAPLAT workshop, co-located with the IEEE ICDE conference (May 2024, Utrecht, NL).
Data Provenance for Data Science is our attempt to provide a foundation to add explainability to data-centric AI.
It is a prototype, with lots of work still to do.
Towards explanations for Data-Centric AI using provenance records, by Paolo Missier
In this presentation, given to graduate students at Università Roma Tre, Italy, we suggest that concepts well-known in Data Provenance can be exploited to provide explanations in the context of data-centric AI processes. Through use cases (incremental data cleaning, training set pruning), we build up increasingly complex provenance patterns, culminating in an open question:
how to describe "why" a specific data item has been manipulated as part of data processing, when such processing may consist of a complex data transformation algorithm.
Interpretable and robust hospital readmission predictions from Electronic Hea..., by Paolo Missier
A talk given at the BDA4HM workshop, IEEE BigData conference, Dec. 2023
please see paper here:
https://drive.google.com/file/d/1vN08G0FWxOSH1Yeak5AX6a0sr5-EBbAt/view
Data-centric AI and the convergence of data and model engineering: opportunit..., by Paolo Missier
A keynote talk given to the IDEAL 2023 conference (Evora, Portugal Nov 23, 2023).
Abstract.
The past few years have seen the emergence of what the AI community calls "Data-centric AI", namely the recognition that some of the limiting factors in AI performance are in fact in the data used for training the models, as much as in the expressiveness and complexity of the models themselves. One analogy is that of a powerful engine that will only run as fast as the quality of the fuel allows. A plethora of recent literature has started to explore the connection between data and models in depth, along with startups that offer "data engineering for AI" services. Some concepts are well-known to the data engineering community, including incremental data cleaning, multi-source integration, or data bias control; others are more specific to AI applications, for instance the realisation that some samples in the training space are "easier to learn from" than others. In this "position talk" I will suggest that, from an infrastructure perspective, there is an opportunity to efficiently support patterns of complex pipelines where data and model improvements are entangled in a series of iterations. I will focus in particular on end-to-end tracking of data and model versions, as a way to support MLDev and MLOps engineers as they navigate through a complex decision space.
Realising the potential of Health Data Science: opportunities and challenges..., by Paolo Missier
This document summarizes a presentation on opportunities and challenges for applying health data science and AI in healthcare. It discusses the potential of predictive, preventative, personalized and participatory (P4) approaches using large health datasets. However, it notes major challenges including data sparsity, imbalance, inconsistency and high costs. Case studies on liver disease and COVID datasets demonstrate issues requiring data engineering. Ensuring explanations and human oversight are also key to adopting AI in clinical practice. Overall, the document outlines a complex landscape and the need for better data science methods to realize the promise of data-driven healthcare.
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science), by Paolo Missier
This document describes DP4DS, a tool to collect fine-grained provenance from data processing pipelines. Specifically, it can collect provenance from dataframe-based Python scripts. It demonstrates scalable provenance generation, storage, and querying. Current work includes improving provenance compression techniques and demonstrating the tool's generality for standard relational operators. Open questions remain around how useful fine-grained provenance is for explaining findings from real data science pipelines.
A Data-centric perspective on Data-driven healthcare: a short overview, by Paolo Missier
A brief intro to the data challenges associated with working with healthcare data, with a few examples, both from the literature and our own, of traditional approaches (Latent Class Analysis, Topic Modelling) and a perspective on language-based modelling for Electronic Health Records (EHR).
Probably more references than actual content in here!
Capturing and querying fine-grained provenance of preprocessing pipelines in..., by Paolo Missier
This document describes a method for capturing and querying fine-grained provenance from data science preprocessing pipelines. It captures provenance at the dataframe level by comparing inputs and outputs to identify transformations. Templates are used to represent common transformations like joins and appends. The approach was evaluated on benchmark datasets and pipelines, showing overhead from provenance capture is low and queries are fast even for large datasets. Scalability was demonstrated on datasets up to 1TB in size. A tool called DPDS was also developed to assist with data science provenance.
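As a rough sketch of dataframe-level provenance capture (my own illustration, not the authors' DPDS tool), the snippet below runs each preprocessing step through a wrapper that compares the input and output pandas dataframes and records which rows and columns were affected.

```python
import pandas as pd

def capture_provenance(step_name, df_in, transform):
    """Run one preprocessing step and derive a coarse provenance record by
    comparing the input and output dataframes (illustrative only)."""
    df_out = transform(df_in)
    record = {
        "step": step_name,
        "rows_in": len(df_in),
        "rows_out": len(df_out),
        "dropped_rows": sorted(set(df_in.index) - set(df_out.index)),
        "added_columns": sorted(set(df_out.columns) - set(df_in.columns)),
        "removed_columns": sorted(set(df_in.columns) - set(df_out.columns)),
    }
    return df_out, record

# A tiny two-step pipeline: drop rows with missing values, then add a derived column.
df = pd.DataFrame({"age": [34, None, 51], "income": [30000, 45000, None]})
df1, p1 = capture_provenance("dropna", df, lambda d: d.dropna())
df2, p2 = capture_provenance("add_ratio", df1, lambda d: d.assign(ratio=d.income / d.age))

for record in (p1, p2):
    print(record)
```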
Tracking trajectories of multiple long-term conditions using dynamic patient..., by Paolo Missier
The document proposes tracking trajectories of multiple long-term conditions using dynamic patient-cluster associations. It uses topic modeling to identify disease clusters from patient timelines and quantifies how patients associate with clusters over time. Preliminary results on 143,000 patients from UK Biobank show varying stability of patient associations with clusters. Further work aims to better define stability and identify causes of instability.
Digital biomarkers for preventive personalised healthcare, by Paolo Missier
A talk given to the Alan Turing Institute, UK, Oct 2021, reporting on the preliminary results and ongoing research in our lab, on self-monitoring using accelerometers for healthcare applications
The document discusses data provenance for data science applications. It proposes automatically generating and storing metadata that describes how data flows through a machine learning pipeline. This provenance information could help address questions about model predictions, data processing decisions, and regulatory requirements for high-risk AI systems. Capturing provenance at a fine-grained level incurs overhead but enables detailed queries. The approach was evaluated on performance and scalability. Provenance may help with transparency, explainability and oversight as required by new regulations.
Capturing and querying fine-grained provenance of preprocessing pipelines in..., by Paolo Missier
A talk given at the VLDB 2021 conference, August 2021, presenting our paper:
Chapman, A., Missier, P., Simonelli, G., & Torlone, R., "Capturing and Querying Fine-grained Provenance of Preprocessing Pipelines in Data Science", PVLDB, 14(4):507–520, January 2021.
http://doi.org/10.14778/3436905.3436911
Quo vadis, provenancer? Cui prodest? Our own trajectory: provenance of data..., by Paolo Missier
The document discusses provenance in the context of data science and artificial intelligence. It provides bibliometric data on publications related to data/workflow provenance from 2000 to the present. Recent trends include increased focus on applications in computing and engineering fields. Blockchain is discussed as a method for capturing fine-grained provenance. The document also outlines challenges around explainability, transparency and accountability for high-risk AI systems according to new EU regulations, and argues that provenance techniques may help address these challenges by providing traceability of system functioning and operation monitoring.
Analytics of analytics pipelines: from optimising re-execution to general Dat..., by Paolo Missier
This document discusses using data provenance to optimize re-execution of analytics pipelines and enable transparency in data science workflows. It proposes a framework called ReComp that selectively recomputes parts of expensive analytics workflows when inputs change based on provenance data. It also discusses applying provenance techniques to collect fine-grained data on data preparation steps in machine learning pipelines to help explain model decisions and data transformations. Early results suggest provenance can be collected with reasonable overhead and enables useful queries about pipeline execution.
ReComp: optimising the re-execution of analytics pipelines in response to cha..., by Paolo Missier
Paolo Missier presented on optimizing the re-execution of analytics pipelines in response to changes in input data. The talk discussed using provenance to selectively re-run parts of workflows impacted by changes. ProvONE combines process structure and runtime provenance to enable granular re-execution. The ReComp framework detects and quantifies data changes, estimates impact, and selectively re-executes relevant sub-processes to optimize re-running workflows in response to evolving data.
ReComp, the complete story: an invited talk at Cardiff University, by Paolo Missier
The document describes the ReComp framework for efficiently recomputing analytics processes when changes occur. ReComp uses provenance data from past executions to estimate the impact of changes and selectively re-execute only affected parts of processes. It identifies changes, computes data differences, and estimates impacts on past outputs to determine the minimum re-executions needed. For genomic analysis workflows, ReComp reduced re-executions from 495 to 71 by caching intermediate data and re-running only impacted fragments. The framework is customizable via difference and impact functions tailored to specific applications and data types.
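To make the selective re-execution idea concrete, here is a minimal hypothetical sketch (my own illustration, not the ReComp implementation): a provenance map records which inputs each past output depends on, and simple difference and impact functions decide which outputs need re-running when inputs change.

```python
# Illustrative provenance-driven selective re-execution; not the actual ReComp code.

# Provenance from past executions: which inputs each output was derived from.
provenance = {
    "report_A": {"reference_db", "variants_A"},
    "report_B": {"reference_db", "variants_B"},
    "report_C": {"variants_C"},
}

old_inputs = {"reference_db": "v1", "variants_A": "v1", "variants_B": "v1", "variants_C": "v1"}
new_inputs = {"reference_db": "v2", "variants_A": "v1", "variants_B": "v1", "variants_C": "v1"}

def diff(old, new):
    """Difference function: the set of inputs whose version changed."""
    return {name for name in new if new[name] != old.get(name)}

def impact(output, changed):
    """Impact function: here, simply how many changed inputs the output depends on."""
    return len(provenance[output] & changed)

changed = diff(old_inputs, new_inputs)
to_rerun = [o for o in provenance if impact(o, changed) > 0]
print("changed inputs:", sorted(changed))
print("outputs to re-execute:", sorted(to_rerun))  # report_A and report_B, not report_C
```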
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Infrastructure Challenges in Scaling RAG with Custom AI modelsZilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
UiPath Test Automation using UiPath Test Suite series, part 5 - DianaGray10
Welcome to part 5 of the UiPath Test Automation using UiPath Test Suite series. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
GraphRAG for Life Science to increase LLM accuracy - Tomaz Bratanic
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs with LLMs to increase the accuracy and quality of generated answers.
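A minimal GraphRAG-style sketch for this setting, assuming a Neo4j biomedical graph with (:Gene)-[:ASSOCIATED_WITH]->(:Disease) relationships; the schema, credentials and the ask_llm call are illustrative assumptions, not the presenter's actual setup:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def graph_context(gene_symbol: str) -> str:
    """Pull facts about a gene from the knowledge graph to ground the LLM."""
    query = (
        "MATCH (g:Gene {symbol: $symbol})-[:ASSOCIATED_WITH]->(d:Disease) "
        "RETURN d.name AS disease LIMIT 25"
    )
    with driver.session() as session:
        records = session.run(query, symbol=gene_symbol)
        return "\n".join(f"- {record['disease']}" for record in records)

def grounded_prompt(question: str, gene_symbol: str) -> str:
    """Combine retrieved graph facts with the user's question."""
    return (
        "Answer using only the facts below.\n"
        f"Facts about {gene_symbol}:\n{graph_context(gene_symbol)}\n\n"
        f"Question: {question}"
    )

# Hypothetical usage, where ask_llm() is your LLM client of choice:
# answer = ask_llm(grounded_prompt("Which diseases is TP53 linked to?", "TP53"))
```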
What do a Lego brick and the XZ backdoor have in common? - Speck&Tech
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might seem to have in common only the fact that they are both building blocks, or dependencies, of creative and software projects. In reality, a Lego brick and the XZ backdoor case have much more than that in common.
Join the presentation to dive into a story of interoperability, standards and open formats, and then discuss the important role that contributors play in a sustainable open source community.
BIO: An advocate of free software and of open, standard formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in several events, migrations and training activities related to LibreOffice. She previously worked on LibreOffice migrations and training for several public administrations and private organizations. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when she is not following her passion for computers and for Geeko she cultivates her curiosity about astronomy (which is where her nickname deneb_alpha comes from).
12. What do Scientists use Taverna for? ESIP meeting, Santa Barbara, CA, July 2009 - P. Missier. Application areas: systems biology model building; proteomics; sequence analysis; protein structure prediction; gene/protein annotation; microarray data analysis; QTL studies; QSAR studies; medical image analysis; public health care epidemiology; heart model simulations; high throughput screening; phenotypical studies; phylogeny; statistical analysis; text mining; astronomy, music, meteorology. Users include: Netherlands Bioinformatics Centre; Genome Canada Bioinformatics Platform; BioMOBY; US FLOSS social science program; RENCI; SysMO Consortium; French SIGENAE farm animals project; ThaiGrid; CARMEN Neuroscience project; SPINE consortium; EU Enfin, EMBRACE, BioSapiens, Casimir; EU SysMO Consortium; NERC Centre for Ecology and Hydrology; Bergen Centre for Computational Biology; Max Planck Institute for Plant Breeding Research; Genoa Cancer Research Centre; AstroGrid; 30 USA academic and research institutions.
13. ESIP meeting, Santa Barbara, CA, July 2009 - P. Missier. [Figure (Paul Fisher): 200, Genotype, Phenotype, Metabolic pathways, Literature]
16. WaaS: Workflows as a Service ESIP meeting,Santa Barbara, CA, July 2009 - P. Missier [Pettifer, Kell, University of Manchester] inside
17. Workflows operating over Grid Infrastructure. ESIP meeting, Santa Barbara, CA, July 2009 - P. Missier.
http://www.knowarc.eu : KnowARC integrated with Taverna; an application prototype to use Taverna as a direct interface to Grid resources running ARC.
http://cagrid.org/ : open source grid software infrastructure aimed at enabling multi-institutional data sharing and analysis; underpins caBIG. Taverna links together caGrid resources.
http://www.eu-egee.org/ : Europe's leading grid computing project; piloted Taverna over EGEE gLite services.
19. caBIG cancer cyberinfrastructure uses Taverna to link services. ESIP meeting, Santa Barbara, CA, July 2009 - P. Missier. A sample caGrid workflow for microarray analysis, using caArray, GenePattern and geWorkbench [Ravi Madduri]. Reference: "Orchestrating caGrid Services in Taverna", Wei Tan, Ravi Madduri, Kiran Keshav, Baris E. Suzek, Scott Oster, Ian Foster, Proc. IEEE Intl Conf on Web Services (ICWS 2008).
20. Who else is in this space? ESIP meeting, Santa Barbara, CA, July 2009 - P. Missier. Kepler, Triana, BPEL, Ptolemy II, Taverna, Trident, BioExtract.
22. Workflow-based experimentation lifecycle. ESIP meeting, Santa Barbara, CA, July 2009 - P. Missier. Develop, Run, Analyse Results, Publish; collect and query provenance metadata throughout.
23. Taverna. ESIP meeting, Santa Barbara, CA, July 2009 - P. Missier. Graphical workbench for professionals; plug-in architecture; nested workflows; drag-and-drop wiring together; rapidly incorporate new services without coding; not restricted to predetermined services; access to local and remote resources and analysis tools; 3500+ service operations available at start-up.
30. ESIP meeting, Santa Barbara, CA, July 2009 - P. Missier. [Slide: service curation concepts: Curation Model (Versioning, Quantitative Content, Tags); Service Model (Semantic Content, Ontologies, Functional Capabilities, Provenance, Operational Capabilities, Operational Metrics, Usage Policy); Community Standing (Ratings, Usage Statistics, Attribution); Free-text Searching; Statistics; Usable and Useful; Understandable; Controlled vocabs; Interfaces]
34. Scaling up along the social dimension. ESIP meeting, Santa Barbara, CA, July 2009 - P. Missier. Crossing the boundaries of individual investigation: What? Where? Why? Who? How? [Diagram: two Develop-Run-Analyze-Publish lifecycles linked]
35. Scientific collaboration ESIP meeting,Santa Barbara, CA, July 2009 - P. Missier Source: Andrea Wiggins , talk given at the School of Computer Science, University of Manchester, UK, June 18th, 2009
36. ESIP meeting,Santa Barbara, CA, July 2009 - P. Missier Source: Andrea Wiggins , talk given at the School of Computer Science, University of Manchester, UK, June 18th, 2009
37. ESIP meeting,Santa Barbara, CA, July 2009 - P. Missier Source: Andrea Wiggins , talk given at the School of Computer Science, University of Manchester, UK, June 18th, 2009
41. Publishing for collaboration. ESIP meeting, Santa Barbara, CA, July 2009 - P. Missier. Lifecycle: Develop, Run, Analyse Results, Publish; collect and query provenance metadata. Design-time reuse: composition from existing workflows. Runtime reuse: workflows as services. Compare results across versions; foster virtual scientific communities; provenance exchange and interoperability (the OPM experiment).
42. Collaboration in the workflow space ESIP meeting,Santa Barbara, CA, July 2009 - P. Missier What ? Where ? Why ? Who ? How ? Develop Run Analyze Publish Develop Run Analyze Publish
55. Workflows and Services: Curation by Experts; Social Curation by the Crowd; Self-Curation by Contributors; Automated Curation. [Diagram: each follows a seed, refine, validate cycle]
60. Collaboration in the workflow space ESIP meeting,Santa Barbara, CA, July 2009 - P. Missier What ? Where ? Why ? Who ? How ? Develop Run Analyze Publish Develop Run Analyze Publish
72. Provenance interoperability for open science. ESIP meeting, Santa Barbara, CA, July 2009 - P. Missier. OPM: the Open Provenance Model. [Diagram: two Develop-Run-Analyze-Publish lifecycles linked through OPM]
73. The Science Lifecycle (adapted from David De Roure's slides). [Diagram: scientists, graduate and undergraduate students, and next-generation researchers; experimentation producing Data, Metadata, Provenance, Scripts, Workflows, Services, Ontologies, Blogs, ...; flowing into Local Web Repositories, Virtual Learning Environments, Digital Libraries, Technical Reports, Reprints, Preprints & Metadata, Peer-Reviewed Journal & Conference Papers, and Certified Experimental Results & Analyses]
74. Finding the provenance of research outputs across all the systems the data transited through. [Diagram: the same Science Lifecycle elements as slide 73]
75. Provenance Across Applications (adapted from Luc Moreau's slides: "The Open Provenance Model", University of Southampton, UK, 2009). [Diagram: multiple applications with local provenance stores connected through a Provenance Inter-Operability Layer that imports from and exports to OPM]
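As a toy illustration of the kind of structure such an inter-operability layer exchanges (OPM models artifacts and processes linked by edges such as "used" and "wasGeneratedBy"; the Python representation here is invented for illustration and is not an OPM library):

```python
from dataclasses import dataclass, field

@dataclass
class OPMStyleGraph:
    """A minimal container for OPM-style provenance assertions."""
    artifacts: set = field(default_factory=set)
    processes: set = field(default_factory=set)
    used: list = field(default_factory=list)               # (process, artifact)
    was_generated_by: list = field(default_factory=list)   # (artifact, process)

    def record_step(self, process, inputs, outputs):
        """Assert that `process` used `inputs` and generated `outputs`."""
        self.processes.add(process)
        for artifact in inputs:
            self.artifacts.add(artifact)
            self.used.append((process, artifact))
        for artifact in outputs:
            self.artifacts.add(artifact)
            self.was_generated_by.append((artifact, process))

# One application exports its local provenance into this shared vocabulary;
# another can import it and stitch the graphs together on shared artifacts.
graph = OPMStyleGraph()
graph.record_step("kegg_pathway_search", ["kegg_gene_ids"], ["pathway_list"])
```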
81. Upcoming events. ESIP meeting, Santa Barbara, CA, July 2009 - P. Missier.
SWPM 2009: The First International Workshop on the Role of Semantic Web in Provenance Management, http://wiki.knoesis.org/index.php/SWPM-2009 , co-located with ISWC'09, October 25/26 2009, Washington D.C., USA. Submission deadline: Friday, July 31, 2009.
Special issue of the Future Generation Computer Systems Journal (FGCS) on the third provenance challenge (to be announced); expected deadline: Dec. 2009.
Editor's Notes
Repetitive, mundane work made easier, more reliable and adaptable. Big science and collaborative science.
Interoperability, integration and collaboration; automated processing; interactive; repetitive and accurate compound processes (protocols); transparent processes; data flow; trackable results; agile software development.
This workflow searches for genes which reside in a QTL (Quantitative Trait Loci) region in the mouse, Mus musculus. The workflow requires as input a chromosome name or number, a QTL start base-pair position, and a QTL end base-pair position. Data is then extracted from BioMart to annotate each of the genes found in this region. The Entrez and UniProt identifiers are then sent to KEGG to obtain KEGG gene identifiers, which are in turn used to search for pathways in the KEGG pathway database. This is pathways_and_gene_annotations_for_qtl_phenotype_28303, executed with chromosome = 17, start_position = 28500000, end_position = 32500000. (A rough Python outline of this dataflow follows below.)
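A rough Python outline of that dataflow, with placeholder functions standing in for the BioMart and KEGG service calls the workflow actually makes (function names and record fields are assumptions for illustration):

```python
def genes_in_region(chromosome: str, start_bp: int, end_bp: int) -> list[dict]:
    """Placeholder for the BioMart query: genes in the QTL region, each
    annotated with its symbol and its Entrez and UniProt identifiers."""
    raise NotImplementedError("call BioMart here")

def kegg_gene_ids(entrez_id: str, uniprot_id: str) -> list[str]:
    """Placeholder for the KEGG cross-reference lookup."""
    raise NotImplementedError("call KEGG here")

def kegg_pathways(kegg_gene_id: str) -> list[str]:
    """Placeholder for the KEGG pathway database search."""
    raise NotImplementedError("call KEGG here")

def qtl_to_pathways(chromosome: str, start_bp: int, end_bp: int) -> dict:
    """Mirror the workflow's overall dataflow: region -> genes -> pathways."""
    pathways: dict = {}
    for gene in genes_in_region(chromosome, start_bp, end_bp):
        for kegg_id in kegg_gene_ids(gene["entrez"], gene["uniprot"]):
            pathways.setdefault(gene["symbol"], []).extend(kegg_pathways(kegg_id))
    return pathways

# The example run mentioned above:
# qtl_to_pathways("17", 28_500_000, 32_500_000)
```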
Mention scalability: we support large datasets as well, in addition to these small lists, and lists can grow very large.
The case study of Paul's work: different data types, held in geographically different places and belonging to different subdisciplines. The phenotypic response is investigated using microarrays, in the form of expressed genes, or through evidence provided by QTL mapping. Genes captured in the microarray experiment and present in the QTL (Quantitative Trait Loci) region: microarray + QTL.
The workflow produces OMIM-tagged diseases, which can be used to automatically enrich the proto-ontology in RDF.
- caGrid (or the Grid) is the underlying network architecture and platform that provides the basis for connectivity of caBIG® tools.
- The ARC (Advanced Resource Connector) plugin for Taverna, used for medical imaging. Taverna is a workflow management system well known in bioinformatics. To show the ease of transitioning from local to grid, we execute MUSTANG first on the command line, then on the grid using Taverna. We then present two examples from the field of medical imaging, the first of which has to deal with huge temporary datasets; it thus greatly benefits from ARC's storage management and grid URL handling capabilities. The last example shows how one can achieve rapid testing iterations by separating the program binary from its use case description and the workflow. Finally, myExperiment is presented, a free web community for sharing Taverna workflows. By preparing a use case and an example workflow, one can make a program easily usable by everyone, since the ARC middleware equalizes different grid configurations. (video)
Aimed at different layers of the software stack ("The Many Faces of IT as Service", Foster & Tuecke, 2005). "Provisioning": from reservation to configuration to ... make sure the resource will do what I want it to do, with the right qualities of service. Virtualization = separation of concerns between provider and consumer of "content" (client and service; service provider and resource provider). Provisioning = assemble and configure resources to meet user needs. Management = sustain desired qualities of service despite a dynamic environment.
The GT4 plug-in supports semantic-based service query. Users can input multiple service query criteria (up to three; in the current scenario three is enough, and we can add more upon request) together with the corresponding values, and multiple criteria can be combined. The initial GUI shows only one query criterion, but more can be added by clicking the "Add Service Query" button. For example, we can query the caGrid services whose "Research Center" name is "Ohio State University", whose Service Name is "DICOMDataService", and which have the operation "PullOp". (A plain-Python illustration of combining such criteria follows below.)
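In plain Python terms this amounts to filtering services on a conjunction of name-value pairs; the snippet below is purely illustrative and is not the plug-in's actual API:

```python
criteria = {
    "Research Center": "Ohio State University",
    "Service Name": "DICOMDataService",
    "Operation": "PullOp",
}

def matches(service: dict, wanted: dict) -> bool:
    """A service matches only if every selected criterion holds (AND semantics).
    Real service metadata is richer; simple equality keeps the sketch short."""
    return all(service.get(name) == value for name, value in wanted.items())

# hits = [s for s in all_services if matches(s, criteria)]
```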
we are now taking a broader view... what we have shown so far is just the “run” part of a more comprehensive lifecycle
Beanshell scripting and XML processing support inside the workflows. Taverna 2: long-running workflows, data reference handling, data streaming and staging, multiple extensibility points. Further Taverna 2 properties: new data reference handling, security management, provenance management, asynchronous processors and data streaming, explicit monitoring and steering support, a better new dispatch layer that supports dynamic service binding and service invocation through a resource broker, and improved concurrency handling at the workflow level.
The clustalw program from EMBOSS is called 'emma'. Services are not deposited and preserved in software libraries. Rapid metadata heart-beat, especially on operational metadata. BioNanny: using Grid tools. Use myExperiment to notify scientists of potential problems, and to be smart about which services should be monitored. Workflows are deposited but... they are not self-contained: they link to external services in flux, depend on software, or incorporate services unavailable to others. Hence workflow fragility and decay: workflows become plans and provenance rather than working scientific objects unless tended and updated.
For DEMO: search "adaptivedisclosure". Key point: large-scale curation of rich service descriptions through community engagement, sustained over time, to ensure the quality of annotations. What makes BioCatalogue stand apart from other approaches to service repositories? BioCatalogue is a "super-registry" that is able to accommodate service descriptions from multiple different source registries thanks to a flexible annotation model. Context of application: a distributed, P2P BioCatalogue.
SeekDA is DERI's search engine for Web Services. http://www.ebi.ac.uk/uniprot-das/
Curation results in trust and therefore usage, which encourages and justifies sustained further curation effort. Curation really happens in all the associated tools that reference and use BioCatalogue, e.g. curation of Taverna services, of myExperiment workflows, etc. Accreditation of curators: an idea borrowed from the Wikipedia style of community contribution; not all contributors are equal, but all are entitled to provide contributions.
How are the types of annotations chosen? Contributors are free to add pretty much any type of annotation at the moment, using a simple tagging mechanism, i.e., annotations are not necessarily locked into controlled vocabularies or ontologies. Think of them as name-value pairs for the time being, but metadata is already exposed as RDF through the API. Users can rate services; the main rating criteria are expected to be ease of use, availability and quality of documentation. Separate the subjective from the objective. Additionally, there is room for automated curation: using Quasar, for example, for service monitoring and the associated automated annotations, or for functional testing of services. Let users and providers upload complex test scripts, including full-fledged workflows; BioCatalogue can run these periodically and annotate the services with the outcome (think "JUnit test reports" as automated annotations). Service Profile Wheel: Availability, Freshness. (A toy sketch of the name-value-to-RDF mapping follows below.)
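A toy sketch of that name-value-to-RDF mapping using rdflib; the namespace, property naming and example values are made up for illustration and are not BioCatalogue's actual vocabulary:

```python
from rdflib import Graph, Literal, Namespace, URIRef

ANNOT = Namespace("http://example.org/annotation/")   # illustrative namespace

def annotations_to_rdf(service_uri: str, pairs: dict) -> Graph:
    """Expose simple name-value annotations as RDF triples about a service."""
    g = Graph()
    subject = URIRef(service_uri)
    for name, value in pairs.items():
        g.add((subject, ANNOT[name.replace(" ", "_")], Literal(value)))
    return g

g = annotations_to_rdf(
    "http://example.org/services/42",
    {"documentation quality": "good", "availability": "99.2%"},
)
print(g.serialize(format="turtle"))
```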
Automated monitoring & testing: test scripts, endpoint availability, mean time to failure. Partner feeds: myExperiment.org workflow profile. Update feeds to users. Develop incentives. Experts for oversight. How do we rank? How do we compare the non-alike?
Cite FLOSS
myExperiment is as much an engineering project as it is a social experiment
e-Science is me-Science: aligning the community with the individual. But we have to be aware of the drivers for collaboration. Competitive advantage: be the first with the Nature paper. Academic vanity: credit, credibility, fame, acclaim, recognition, peer respect, reputation. Adoption: get my stuff adopted and recognised; more funding. And the fears: being found out (open to rigorous inspection); being scooped (beaten by lab X); protecting my turf; releasing results too early; getting left behind; being out of fashion; being misinterpreted or misrepresented; looking stupid; losing control; taking a risk.
attribution -> provenance
Transferring data, methods and know-how from one discipline to another (e.g. astronomy image analysis applied to cancer tissue microarrays). How do you find relevant material that uses different jargon, in a different discipline, organised to suit only its experts? Validation: how do I know it does what it says it does? Reproducibility: when the services are volatile. Reusability: when it contains in-house code or applications. Longevity: will this workflow still run in 6 months' time? Palpability: why does this workflow fail? Why does it work? How does it work?
Validation: how do I know it does what it says it does? Reproducibility: when the services are volatile. Reusability: when it contains in-house code or applications. Longevity: will this workflow still run in 6 months' time? Palpability: why does this workflow fail? Why does it work? How does it work?
myExperiment is as much an engineering project as it is a social experiment
Step back: what do we mean by provenance, in general? Data and process provenance. What do we do with it? We collect a great deal of metadata, at a very fine level of granularity: what can we expect to get out of it?
see dedicated in-depth talk for details on our approach
A lineage query lets users identify variables that carry interesting values for which provenance is sought, i.e., nodes in the graph where provenance information should be reported. (A toy backwards traversal over a provenance graph is sketched below.)
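As a toy illustration of what such a lineage query does over recorded provenance (the data structure and variable names are invented for this sketch, not Taverna's provenance model), one can walk backwards from a variable of interest over "derived from" edges:

```python
from collections import deque

# Provenance recorded as: each value -> the values it was directly derived from.
derived_from = {
    "pathway_list": ["kegg_gene_ids"],
    "kegg_gene_ids": ["entrez_ids", "uniprot_ids"],
    "entrez_ids": ["qtl_region"],
    "uniprot_ids": ["qtl_region"],
    "qtl_region": [],
}

def lineage(variable: str) -> set:
    """Return every upstream value that contributed to `variable`."""
    seen, queue = set(), deque([variable])
    while queue:
        current = queue.popleft()
        for parent in derived_from.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

print(lineage("pathway_list"))
# -> {'kegg_gene_ids', 'entrez_ids', 'uniprot_ids', 'qtl_region'}
```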