Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Credible workshop


Published on

Presentation of Research Objects given at the CrEDIBLE workshop

Published in: Technology, Education
  • Be the first to comment

  • Be the first to like this

Credible workshop

  1. 1. Research Objects: Preserving Scientific Workflows and Provenance Khalid Belhajjame Université Paris Dauphine 1
  2. 2. Storyline • Why Research Objects? • Overview of the Research Objects • Portfolio of Research Object Management Tools • Provenance Distillation Through Workflow Summarization 2
  3. 3. Electronic papers are not enough 3 Electronic paper
  4. 4. Electronic papers are not enough 4 Research Object Datasets Results Scientists Hypothesis Experiments Annotations Provenance Electronic paper
  5. 5. Benefits Of Research Objects • A research object aggregates all elements that are necessary to understand research investigations. • Methods (experiments) are viewed as first class citizens • Promote reuse • Enable the verification of reproducibility of the results 5
  6. 6. Reproducibility a principle of the scientific method normal people and scientist 6
  7. 7. 47 of 53 “landmark” publications could not be replicated Inadequate cell lines and animal models Nature, 483, 2012 Credit to Carole Goble JCDL 2012 Keynote 7
  8. 8. Fig. 2. Overview of the Research Object Model with the different ontologies that encode it: Core RO for describing the basic Research Object structure, roevo for tracking Research Object evolution and wfprov and wfdesc for describing workflows and Overview of the Research Object Model 8
  9. 9. Research Object as an ORE Aggregation 9 Fig. 3. Research Object as an ORE aggregation. to annotation; and ao: Body, which comprises a description of the target. Research Objects use annotations as a means for decorating a resource (or a set of resources) with metadata information. The body is specified in the @base <ht t p : / / exampl e. com/ r o/ 389/ > . @pr ef i x dct : <ht t p: / / pur l . or g/ dc / t er ms / > @pr ef i x or e: <ht t p : / / www. openar chi ves . or g/ or e/ t er ms
  10. 10. Scientific Workflows • Data driven analysis pipelines • Systematic gathering of data and analysis tools into computational solutions for scientific problem-solving • Tools for automating frequently performed data intensive activities • Provenance for the resulting datasets – The method followed – The resources used – The datasets used 10
  11. 11. PROV Primer, Gil et al WF Execution Trace Retrospective Provenance: Actual data used, actual invocations, timestamps and data derivation trace WF Description Prospective Provenance: Intended method for analysis 11
  12. 12. Specifying Workflows using WfDESC 12 Fig. 5. T he wfdesc ontology and its relation to PROV-O. 21
  13. 13. Specifying Workflow Provenance using WfPROV 13 Fig. 7. T he wfprov ontology and its relationship to PROV-O.
  14. 14. Portfolio of Research Object Tools 14
  15. 15. DEMO 15
  16. 16. Workflows can get complex! • Overwhelming for users who are not the developers • Abstractions required for reporting • Lineage queries result in very long trails 16
  17. 17. Overall Approach 17 Workflow Designer Taverna Workbench Motif WF Summary WF Description Summarizer Summarization Rules
  18. 18. PART-1: Scientific Workflow Motifs • Domain Independent categorization – Data-Oriented Nature – Resource/Implementation- Oriented Nature • Captured In a lightweight OWL Ontology 18
  19. 19. Motif annotations over operations motifs(color_pathway_by_objects) = {m1:DataRetrieval} motifs(Get_Image_From_URL_2) = {m2:DataMoving} 19 DataRetrieval DataMovingl
  20. 20. PART-2: Workflow reduction primitives • Collapse (Up/Down) • Compose • Eliminate 20
  21. 21. Eliminate 21
  22. 22. Two sample strategies • By-Elimination – Minimal annotation effort – Single rule • By Collapse – More specific annotation – Multiple rules 22
  23. 23. By-Collapse23
  24. 24. By-Elimination24
  25. 25. Analysis Data Set • 30 Workflows from the Taverna system • Entire dataset & queries accessible from • Manual Annotation using Motif Vocabulary 25
  26. 26. Mechanistic Effect of Summarization 26
  27. 27. User Summaries vs. Summary Graphs 27
  28. 28. Highlights • Research Object model and associated management tools • Annotations of Workflow Using Motifs • Methods for Summarizing Workflow and distilling their provenance traces • Algorithms for Repairing Workflows • Validation of the workflow summarization • Querying of Workflow Execution Provenance using summaries. 28 Ongoing Work
  29. 29. References • P Alper, K Belhajjame, C Goble, P Karagoz. Small Is Beautiful: Summarizing Scientific Workflows Using Semantic Annotations. IEEE International Congress on Big Data, 2013 • Pinar Alper, Khalid Belhajjame, Carole A. Goble, Pinar Karagoz: Enhancing and abstracting scientific workflow provenance for data publishing. EDBT/ICDT Workshops 2013. • Belhajjame K, Corcho O, Garijo D, et al. Workflow-Centric Research Objects: A First Class Citizen in the Scholarly Discourse. In proceedings of the ESWC2012 Workshop on the Future of Scholarly Communication in the Semantic Web (SePublica2012), Heraklion, Greece, May 2012 • Belhajjame K, Zhao J, Garijo D, et al. The Research Object Suite of Ontologies: Sharing and Exchanging Research Data and Methods on the Open Web, submitted to the Journal of Web Semanics. • Daniel Garijo, Pinar Alper, Khalid Belhajjame, Óscar Corcho, Yolanda Gil, Carole A. Goble: Common motifs in scientific workflows: An empirical analysis. eScience 2012. • Zhao J, Gómez-Pérez JM, Belhajjame K, Klyne G, et al. Why workflows break - Understanding and combating decay in Taverna workflows. IEEE eScience 2012. 29
  30. 30. Acknowledgement
  31. 31. Acknowledgement EU Wf4Ever project (270129) funded under EU FP7 (ICT- 2009.4.1). (