Session talk @ AGU09
Presentation at the AGU'09 Fall Meeting, San Francisco, CA, Dec. 2009

Transcript

  • 1. Towards systematic information exchange and reuse in e-laboratories
    [Logos: Scientific Workflow Management System; Janus Provenance]
    Paolo Missier, Information Management Group, School of Computer Science, University of Manchester, UK
    with additional material by Sean Bechhofer and Matthew Gamble, e-Labs design group, University of Manchester
    AGU Fall meeting, San Francisco, Dec. 2009 - P.Missier
  • 2. Momentum on sharing and collaboration
    • Special issue of Nature on Data Sharing (Sept. 2009): http://www.nature.com/news/specials/datasharing/index.html
  • 3. Momentum on sharing and collaboration
    • Special issue of Nature on Data Sharing (Sept. 2009): http://www.nature.com/news/specials/datasharing/index.html
      – timeliness requires rapid sharing
      – repurposing
      – the Human Genome project use case
  • 4. Momentum on sharing and collaboration
    • Special issue of Nature on Data Sharing (Sept. 2009): http://www.nature.com/news/specials/datasharing/index.html
      – timeliness requires rapid sharing
      – repurposing
      – the Human Genome project use case
    • The debate is much further along in the Earth Sciences
      – ESIP: data preservation / stewardship, 2009
      – long established in some communities: atmospheric sciences, 1998 [1]
    • Science Commons recommendations for Open Science (July 2008) [link]
    [1] Strebel DE, Landis DR, Huemmrich KF, Newcomer JA, Meeson BW: The FIFE Data Publication Experiment. Journal of the Atmospheric Sciences 1998, 55:1277-1283
  • 5. Collaboration in workflow-based science
    [Diagram labels: workflow specification + input dataset; workflow execution]
  • 6. Collaboration in workflow-based science
    [Diagram labels: workflow specification + input dataset; workflow execution; outcome (data); outcome (provenance)]
  • 7. Collaboration in workflow-based science
    [Diagram labels: workflow specification + input dataset; workflow execution; outcome (data); outcome (provenance); Research Object packaging]
  • 9. Collaboration in workflow-based science
    [Diagram labels: workflow specification + input dataset; workflow execution; outcome (data); outcome (provenance); Research Object packaging; Paul: browse, query, unbundle, reuse]
  • 10. Collaboration in workflow-based science
    [Diagram labels: workflow specification + input dataset; workflow execution; outcome (data); outcome (provenance); Research Object packaging; Paul: browse, query, unbundle, reuse; data-mediated implicit collaboration]
  • 11. Collaboration in workflow-based science
    What is needed for Paul to make sense of third-party data?
    [Diagram labels: outcome (data); outcome (provenance); Research Object packaging; Paul: browse, query, unbundle, reuse; data-mediated implicit collaboration]
  • 15. Paul’s Pack: QTL Research Object
    [Diagram labels: Common pathways]
  • 16. Paul’s Pack: QTL Research Object
    [Diagram labels: Workflow 16; Workflow 13; Results (two sets); Logs; Slides; Paper; Common pathways]
  • 17. Paul’s Pack: QTL Research Object
    [Diagram labels: Workflow 16; Workflow 13; Results (two sets); Logs; Slides; Paper; Common pathways; layer: Representation]
  • 18. Paul’s Pack: QTL Research Object
    [Diagram labels: Workflow 16; Workflow 13; Results (two sets); Logs; Slides; Paper; Common pathways; layers: Representation, Domain Relations]
  • 19. Paul’s Pack: QTL Research Object
    [Diagram: Workflow 16, Workflow 13, Results (two sets), Logs, Slides, Paper and Common pathways, linked by the relations "produces" (e.g. Workflow 16 produces Results), "feeds into", "included in" and "published in"; layers: Representation, Domain Relations]
  • 20. Paul’s Pack: QTL Research Object
    [Diagram: Workflow 16, Workflow 13, Results (two sets), Logs, Slides, Paper and Common pathways, linked by the relations "produces", "feeds into", "included in" and "published in"; layers: Representation, Domain Relations, Aggregation, Metadata]
  • 21. ORE: representing generic aggregations
    [Diagram labels: Resource Map (descriptor); aggregation data structure. Source: http://www.openarchives.org/ore/1.0/primer.html, section 4]
    A. Pepe, M. Mayernik, C.L. Borgman, and H.V. Sompel, "From Artifacts to Aggregations: Modeling Scientific Life Cycles on the Semantic Web," Journal of the American Society for Information Science and Technology (JASIST), to appear, 2009.
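The ORE model on slide 21 separates an aggregation from the Resource Map that describes it. The Python sketch below builds a Resource Map for a pack like Paul’s as RDF triples and prints them in N-Triples form. The `ore:` terms (`ResourceMap`, `Aggregation`, `describes`, `aggregates`) come from the ORE vocabulary; the example.org URIs and the helper names `resource_map` and `to_ntriples` are hypothetical, and a real Resource Map would carry further metadata (creator, modification date) that is omitted here.

```python
# Minimal sketch of an OAI-ORE Resource Map for a Research Object.
# All example.org URIs are hypothetical placeholders.
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def resource_map(rem_uri, aggregation_uri, parts):
    """Return (subject, predicate, object) triples: the Resource Map
    describes the Aggregation, which aggregates each part."""
    triples = [
        (rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        (rem_uri, ORE + "describes", aggregation_uri),
        (aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
    ]
    for part in parts:
        triples.append((aggregation_uri, ORE + "aggregates", part))
    return triples


def to_ntriples(triples):
    """Serialise triples whose terms are all URIs as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


base = "http://example.org/packs/qtl/"
parts = [base + name for name in ("workflow16", "workflow13", "results", "paper")]
triples = resource_map(base + "rem", base + "aggregation", parts)
print(to_ntriples(triples))
```

Keeping the descriptor (the Resource Map) separate from the aggregated resources is what lets the same workflows, results and papers be bundled into several packs without copying them.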
  • 23. Content: workflow provenance
    A detailed trace of workflow execution:
      – tasks performed, data transformations
      – inputs used, outputs produced
  • 25. Content: workflow provenance
    A detailed trace of workflow execution:
      – tasks performed, data transformations
      – inputs used, outputs produced
    [Figure: example workflow with processors lister, get pathways by genes1, merge pathways, concat gene pathway ids; ports gene_id, output, pathway_genes]
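Slides 23–25 describe workflow provenance as a trace of tasks performed, inputs used and outputs produced. A minimal Python sketch of capturing such a trace, assuming a hypothetical `Trace` recorder and toy stand-ins for the processors in the figure (`lister`, `get_pathways_by_genes`, `merge_pathways`, `concat_gene_pathway_ids`); it is an illustration, not how Taverna records provenance internally.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Provenance trace: one record per task execution (processor, inputs, outputs)."""

    events: list = field(default_factory=list)

    def run(self, processor, func, **inputs):
        """Execute func on the given inputs and record what it used and produced."""
        outputs = func(**inputs)
        self.events.append(
            {"processor": processor, "inputs": dict(inputs), "outputs": dict(outputs)}
        )
        return outputs


# Hypothetical stand-ins for the slide's processors; a real service would
# query a pathway database instead of this toy lookup table.
PATHWAYS = {"g1": ["p1", "p2"], "g2": ["p2", "p3"]}


def lister():
    return {"gene_ids": ["g1", "g2"]}


def get_pathways_by_genes(gene_id):
    return {"pathways": PATHWAYS.get(gene_id, [])}


def merge_pathways(lists):
    return {"merged": sorted({p for lst in lists for p in lst})}


def concat_gene_pathway_ids(merged):
    return {"pathway_genes": ",".join(merged)}


trace = Trace()
genes = trace.run("lister", lister)["gene_ids"]
per_gene = [
    trace.run("get_pathways_by_genes", get_pathways_by_genes, gene_id=g)["pathways"]
    for g in genes
]
merged = trace.run("merge_pathways", merge_pathways, lists=per_gene)["merged"]
result = trace.run("concat_gene_pathway_ids", concat_gene_pathway_ids, merged=merged)
for event in trace.events:
    print(event["processor"], event["inputs"], "->", event["outputs"])
```

Each call through `run` adds one record, so the trace grows with every iteration of a processor (here `get_pathways_by_genes` appears twice, once per gene).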
  • 26. Why provenance matters, if done right
    • To establish quality, relevance, trust
    • To track information attribution through complex transformations
    • To describe one’s experiment to others, for understanding / reuse
    • To provide evidence in support of scientific claims
    • To enable post hoc process analysis for improvement, re-design
    The W3C Incubator on Provenance has been collecting numerous use cases: http://www.w3.org/2005/Incubator/prov/wiki/Use_Cases#
  • 27. What users expect to learn
    • Causal relations:
      – which pathways come from which genes?
      – which processes contributed to producing an image?
      – which process(es) caused data to be incorrect?
      – which data caused a process to fail?
    • Process and data analytics:
      – analyze variations in output across an input parameter sweep (multiple process runs)
      – how often has my favourite service been executed? On what inputs?
      – who produced this data?
      – how often does this pathway turn up when the input genes range over a certain set S?
    [Figure: example workflow with processors lister, get pathways by genes1, merge pathways, concat gene pathway ids; ports gene_id, output, pathway_genes]
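Some of the questions on slide 27 can be answered directly from a recorded provenance trace. The sketch below assumes a hypothetical record format (one dictionary per processor execution, with `processor`, `inputs` and `outputs` keys) and answers three of them: how often a service ran, on what inputs, and which genes each pathway came from.

```python
from collections import Counter, defaultdict

# Hypothetical provenance trace: one record per processor execution.
EVENTS = [
    {"processor": "get_pathways_by_genes", "inputs": {"gene_id": "g1"},
     "outputs": {"pathways": ["p1", "p2"]}},
    {"processor": "get_pathways_by_genes", "inputs": {"gene_id": "g2"},
     "outputs": {"pathways": ["p2", "p3"]}},
    {"processor": "merge_pathways", "inputs": {"lists": [["p1", "p2"], ["p2", "p3"]]},
     "outputs": {"merged": ["p1", "p2", "p3"]}},
]


def execution_counts(events):
    """How often has each service been executed?"""
    return Counter(e["processor"] for e in events)


def inputs_of(events, processor):
    """On what inputs was a given service executed?"""
    return [e["inputs"] for e in events if e["processor"] == processor]


def genes_per_pathway(events):
    """Causal relation: which genes does each pathway come from?"""
    origin = defaultdict(set)
    for e in events:
        if e["processor"] == "get_pathways_by_genes":
            for pathway in e["outputs"]["pathways"]:
                origin[pathway].add(e["inputs"]["gene_id"])
    return dict(origin)


print(execution_counts(EVENTS)["get_pathways_by_genes"])  # 2
print(inputs_of(EVENTS, "get_pathways_by_genes"))
print(genes_per_pathway(EVENTS))
```

The gene-to-pathway query only works here because the lookup step records its input next to its outputs; questions that cross several steps need the causal-graph view of OPM instead.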
  • 28. Open Provenance Model
    • graph of causal dependencies involving data and processors
    • not necessarily generated by a workflow!
    • v1.1 out soon
    Goal: standardize causal dependencies to enable provenance metadata exchange
    [Diagram: edge types wasGeneratedBy(R) from artifact A to process P and used(R) from process P to artifact A; example graph with artifacts A1–A4, processes P1–P3 and roles R1–R6]
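To make the OPM edge directions concrete, here is a small Python sketch of a causal graph with `used` and `wasGeneratedBy` edges, both pointing from effect to cause as in OPM, and a lineage query that collects everything an artifact transitively depends on. The class `OPMGraph`, the node names and the roles `in`/`out` are hypothetical illustrations, not part of the OPM specification or of the example graph on the slide.

```python
class OPMGraph:
    """Tiny OPM-style causal graph. As in OPM, edges point from effect to cause:
    used(R): process -> artifact, wasGeneratedBy(R): artifact -> process."""

    def __init__(self):
        self.edges = []  # (kind, effect, cause, role) tuples

    def used(self, process, artifact, role):
        self.edges.append(("used", process, artifact, role))

    def was_generated_by(self, artifact, process, role):
        self.edges.append(("wasGeneratedBy", artifact, process, role))

    def causes(self, node):
        """Direct causes of a node."""
        return [cause for _, effect, cause, _ in self.edges if effect == node]

    def lineage(self, node):
        """Every artifact and process on which `node` transitively depends."""
        seen, stack = set(), [node]
        while stack:
            for cause in self.causes(stack.pop()):
                if cause not in seen:
                    seen.add(cause)
                    stack.append(cause)
        return seen


g = OPMGraph()
g.used("P1", "A1", "in")
g.was_generated_by("A2", "P1", "out")
g.used("P2", "A2", "in")
g.was_generated_by("A3", "P2", "out")
g.used("P3", "A4", "in")
g.was_generated_by("A5", "P3", "out")
print(sorted(g.lineage("A3")))  # ['A1', 'A2', 'P1', 'P2']
```

Note that nothing in the graph says a workflow produced it; any system that can emit these two edge types can share the same lineage query.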
  • 29. Additional requirements on OPM
    • Artifact values require a uniform common identifier scheme
      – Linked Data in OPM?
    • OPM captures only structural causal relationships
      – additional domain-specific knowledge is required
      – attach semantic annotations to OPM graph nodes
    • OPM graphs can grow very large
      – reduce size by exporting only query results (Taverna approach)
    • Multiple levels of abstraction
      – through OPM accounts (“points of view”)
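OPM accounts let one graph carry several “points of view”. A minimal sketch, assuming a hypothetical `Edge` type whose `accounts` field lists the accounts it belongs to: a coarse account shows a sub-workflow W as one process, a fine account shows its two internal steps, and `view` selects the edges of one account. The `ANNOTATIONS` dictionary hints at attaching semantic annotations to nodes; its keys and values are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    kind: str            # "used" or "wasGeneratedBy"
    effect: str          # edges point from effect to cause, as in OPM
    cause: str
    accounts: frozenset  # OPM accounts ("points of view") this edge belongs to


EDGES = [
    # Coarse account: a sub-workflow W shown as a single process.
    Edge("used", "W", "genes", frozenset({"coarse"})),
    Edge("wasGeneratedBy", "pathways", "W", frozenset({"coarse"})),
    # Fine account: the two steps hidden inside W.
    Edge("used", "get_pathways", "genes", frozenset({"fine"})),
    Edge("wasGeneratedBy", "raw_pathways", "get_pathways", frozenset({"fine"})),
    Edge("used", "merge", "raw_pathways", frozenset({"fine"})),
    Edge("wasGeneratedBy", "pathways", "merge", frozenset({"fine"})),
]

# Hypothetical semantic annotations attached to graph nodes.
ANNOTATIONS = {"genes": {"type": "gene identifier list"}}


def view(edges, account):
    """Edges visible from one account."""
    return [e for e in edges if account in e.accounts]


def nodes(edges):
    """All nodes touched by a set of edges."""
    return {n for e in edges for n in (e.effect, e.cause)}


print(sorted(nodes(view(EDGES, "coarse"))))  # ['W', 'genes', 'pathways']
```

The artifacts `genes` and `pathways` appear in both accounts, so the two views stay connected while differing in how much of the process they expose.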
  • 31. Query results as OPM graphs
    [Diagram: execute W → prov(W); run query Q → Q(prov(W)); export → OPM(Q(prov(W))); also labelled prov(WA)]
      – Approach implemented in the Taverna 2.1 workflow system
      – Internal provenance DB with ad hoc query language
    Just released!
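The pipeline on slide 31 (run W, query prov(W) with Q, export OPM(Q(prov(W)))) can be sketched as follows. Here Q is assumed to be a lineage query for one artifact, and the export uses a hypothetical JSON layout rather than the OPM XML schema or Taverna’s query language; the point is only that the exported graph contains the edges relevant to the query, not the whole trace.

```python
import json

# Hypothetical prov(W): (kind, effect, cause) edges, pointing from effect to cause.
PROV_W = [
    ("used", "P1", "A1"),
    ("wasGeneratedBy", "A2", "P1"),
    ("used", "P2", "A2"),
    ("wasGeneratedBy", "A3", "P2"),
    ("used", "P3", "A4"),
    ("wasGeneratedBy", "A5", "P3"),
]


def lineage_query(prov, target):
    """Q: the edges on every causal path leading back from `target`."""
    result, frontier, seen = [], [target], {target}
    while frontier:
        node = frontier.pop()
        for edge in prov:
            _, effect, cause = edge
            if effect == node:
                result.append(edge)
                if cause not in seen:
                    seen.add(cause)
                    frontier.append(cause)
    return result


def export_opm(edges):
    """Serialise the query result as JSON (a hypothetical layout, not OPM XML)."""
    return json.dumps([{"type": k, "effect": e, "cause": c} for k, e, c in edges], indent=2)


subgraph = lineage_query(PROV_W, "A3")
print(f"exported {len(subgraph)} of {len(PROV_W)} edges")  # exported 4 of 6 edges
print(export_opm(subgraph))
```

Each node is expanded at most once, so every edge appears at most once in the exported subgraph; the unrelated branch through P3 is left out.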
  • 32. Full-fledged data-mediated collaborations
    [Diagram: experiment A (workflow A + input datasets A) yields a Research Object holding result A and provenance A]
  • 34. Full-fledged data-mediated collaborations
    [Diagram: experiment A (workflow A + input datasets A) yields a Research Object holding result A and provenance A; result A → input B]
  • 35. Full-fledged data-mediated collaborations
    [Diagram: experiment A (workflow A + input datasets A) yields a Research Object holding result A and provenance A; experiment B (workflow B + input datasets B) yields a Research Object holding result B and provenance B; result A → input B]
  • 36. Full-fledged data-mediated collaborations
    [Diagram: workflow A + input A and workflow B + input B, with result A → input B, yield a single Research Object holding result datasets A+B and provenance A+B]
  • 37. Full-fledged data-mediated collaborations
    [Diagram: workflow A + input A and workflow B + input B, with result A → input B, yield a single Research Object holding result datasets A+B and provenance A+B]
    Provenance composition accounts for implicit collaboration
  • 38. Full-fledged data-mediated collaborations
    [Diagram: workflow A + input A and workflow B + input B, with result A → input B, yield a single Research Object holding result datasets A+B and provenance A+B]
    Provenance composition accounts for implicit collaboration
    Aligned with the focus of the upcoming Provenance Challenge 4: “connect my provenance to yours” into a whole OPM provenance graph.
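Provenance composition (slides 36–38) amounts to joining two graphs at the artifacts they share. A minimal sketch, assuming edges stored as `(kind, effect, cause)` tuples and a hypothetical `links` mapping that declares “result A is input B”; all other node identifiers are assumed to be globally unique, which is what a uniform identifier scheme for artifacts would provide.

```python
def compose(prov_a, prov_b, links):
    """Union of two provenance graphs. `links` maps an input id in B to the
    result id in A that it actually is (e.g. result A -> input B).
    Assumes all other node ids are already globally unique."""

    def rename(node):
        return links.get(node, node)

    return list(prov_a) + [(kind, rename(e), rename(c)) for kind, e, c in prov_b]


def ancestors(prov, node):
    """Transitive causes of `node`; edges are (kind, effect, cause)."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for _, effect, cause in prov:
            if effect == current and cause not in seen:
                seen.add(cause)
                stack.append(cause)
    return seen


# Hypothetical provenance of two experiments, A and B.
PROV_A = [("used", "run_A", "input_A"), ("wasGeneratedBy", "result_A", "run_A")]
PROV_B = [("used", "run_B", "input_B"), ("wasGeneratedBy", "result_B", "run_B")]

combined = compose(PROV_A, PROV_B, {"input_B": "result_A"})
print(sorted(ancestors(combined, "result_B")))
# ['input_A', 'result_A', 'run_A', 'run_B']
```

Without the link, the lineage of `result_B` stops at `input_B`; with it, the same query reaches back into experiment A, which is the implicit collaboration made explicit.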
  • 39. Contacts
    The myGrid Consortium (Manchester, Southampton): http://mygrid.org.uk
    http://www.myexperiment.org
    Me: pmissier@acm.org
    [Logos: Janus Provenance]
