
OeRC Seminar


Slides from a seminar at Oxford e-Research Centre, 23rd Feb, 2012.

Published in: Technology, Education
  1. Who am I? Sean Bechhofer, University of Manchester, @seanbechhofer
  10. Research Objects: Towards Exchange and Reuse of Digital Knowledge. Sean Bechhofer, University of Manchester, @seanbechhofer
  11. Publication •  Argumentation: Convince the reader of the validity of a position [Mesirov] –  Reproducible Results System: facilitates enactment and publication of reproducible research. (J. Mesirov, Accessible Reproducible Research, Science 327(5964), p.415-416, 2010) •  Results are reinforced by reproducibility [De Roure] –  Explicit representation of method. (D. De Roure and C. Goble, Anchors in Shifting Sand: the Primacy of Method in the Web of Data, Web Science Conference 2010, Raleigh NC, 2010) •  Verifiability as a key factor in scientific discovery. (Stodden et al., Reproducible Research: Addressing the Need for Data and Code Sharing in Computational Science, Computing in Science and Engineering 12(5), p.8-13, 2010)
  12. Publication •  Nano-publications: explicit representation at the statement level. (Groth et al., The Anatomy of a Nano-publication, Information Services and Use 30(1), p.51-56, 2010) •  Executable Papers –  Collage –  SHARE –  Verifiable Computational Results. (Nowakowski et al., The Collage Authoring Environment, ICCS 2011, 2011, http://; Van Gorp et al., SHARE: a web portal for creating and sharing executable research papers, ICCS 2011, 2011; Gavish et al., A Universal Identifier for Computational Results, ICCS 2011, 2011)
  13. Knowledge Burying in paper publication [Figure labels: Experiment, Knowledge, Publication, Text Mining, Paper] •  Publishing/mining cycle results in loss of knowledge –  ≥ 40% of information lost •  RIP – Rest in Paper •  Need for mechanisms for publication of knowledge, preserving information about the process. (B. Mons, Which Gene Did You Mean?, BMC Bioinformatics 6, p.142, 2005)
  14. The Problem •  Moving to digital environments –  Workflows, protocols, algorithms –  Consuming and producing data –  Electronic publication methods •  From (linear) paper publications to… ??? •  Need for frameworks for facilitating reuse and exchange of digital knowledge
  15. Workflows. A Scientific Workflow can be seen as the combination of data and processes into a configurable, structured set of steps that implement semi-automated computational solutions in scientific problem-solving. •  Central in experimental science •  Enable automation •  Make science repeatable (and sometimes reproducible) •  Encourage best practices •  Scientist-friendly •  Aimed at (some types of) scientists, possibly even without strong computational skills •  Communities: Need for scientific data preservation •  Enhance scientific development by building on, sharing, and extending previous results within scientific communities •  However, workflow preservation is especially complex •  Workflows not only specified statically at design time but also interpreted through their execution •  Complex models are required to describe workflows and related resources, including documents, data and services •  Resources often beyond control of scientists [Figure: BioAID_DiseaseDiscovery v3]
  16. myExperiment •  A repository of research methods •  A community social network of people and things •  A Social Virtual Research Environment •  Part of product family including BioCatalogue, MethodBox and SysmoDB •  A probe into researcher behaviour •  Open source (BSD) Ruby on Rails app •  REST and SPARQL interfaces, supports Linked Data •  Web 2.0 “boutique” site. 5550 members, 300 groups, 2300 workflows, 220 packs
  17. Motivating Projects •  myExperiment –  Workflow sharing •  Sysmo-DB –  Assets catalogue supporting exchange of data, models, SOPs •  Obesity e-Lab/MethodBox –  Sharing survey data/analysis scripts •  myExperiment packs –  Packs supporting (simple) aggregations. –  Links not just references –  Packs as nascent ROs
  18. Wf4Ever …technological infrastructure for the preservation and efficient retrieval and reuse of scientific workflows in a range of disciplines. •  Architecture/implementation for workflow preservation, sharing and reuse •  Research Object models •  Workflow Decay, Integrity and Authenticity •  Workflow Evolution and Recommendation •  Provenance •  Driven by Use Cases. FP7 Digital Libraries and Digital Preservation. iSOCO, University of Manchester, Universidad Politécnica de Madrid, University of Oxford, Poznan Supercomputing and Networking Centre, Instituto de Astrofísica de Andalucía, Leiden University Medical Centre
  19. Research Objects: semantically rich aggregations of resources, supporting a research objective. Linking
  20. Bio Scenario
  21. Bio Scenario
  22. Astronomers' Questions. When accessing a workflow: •  Can I use it for my purposes (in my words)? •  If I can expect it to run, when was it last run, by whom? •  What it does quickly, by one of –  example input / output (and trying it) –  a description –  ‘reading’ its key parts –  what it was used for –  related workflows, its creator –  contacting the creator or last user •  How it relates to other things •  How I need to cite the author and workflow? When sharing a workflow: •  What rights do others have? •  What a good workflow is to get a good score? –  Make my workflow findable, reusable, and ready for review –  Instructions to authors –  Two types of contributions: serious science, preliminary/playing around •  If my workflow may have issues –  What the system or other users think it does •  Share freely or anonymously upon request?
  23. User Requirements. Roles: Reader, Re-User, Trainee, Contributor, Finder/Searcher, Creator, Contributor, Publisher, Comparator, Curator, Evaluator/Reviewer. Process: •  Study of user scenarios •  Isolation of User Requirements •  User review •  Project Technical requirements •  Classify Technical Requirements. Example requirements: “As a Creator of ROs, I want to aggregate existing resources so that I can conveniently access related resources from a single place.” “As a Reader of ROs, I want to compare an RO with others so that I can determine whether the investigation is novel.” “As a Comparator of ROs, I want to follow the steps taken so that I can understand the investigative process or method.”
  24. User Roles. Creator. Collecting together resources as an RO for reuse or repurpose. May be for personal use. Contributor. Providing materials to be used within an RO. Collaborator. Providing materials to be used without necessarily being aware of the RO. Reader. Looking for related works, state of the art. Comparator. Looking for similar or previous work to a task in hand. Re-User. Understands the underlying methods encapsulated (e.g. workflow) and how to extract/replace components. Publisher. Disseminating results or methods. Upload to repository, publish via myExp, embed in blog post. Evaluator/Reviewer. Evaluating/validating or reviewing content. Confirmation of results or validation of process.
  25. Workflow Reproducibility: Stability, Completeness, Integrity, Authenticity, Quality. Workflow Decay •  Component level –  flux/decay/unavailability •  Data level –  formats/ids/standards •  Infrastructure level –  platform/resources. Experiment Decay •  Methodological changes •  New technologies •  New resources/components •  New data
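The three levels of workflow decay named on this slide can be made concrete with a minimal sketch. It is an illustration only and not the Wf4Ever stability or completeness service: the `check_decay` and `is_stable` functions, the dependency names and the availability probe are all hypothetical, and a real probe would have to contact services, resolve data identifiers and inspect the execution platform.

```python
LEVELS = ("component", "data", "infrastructure")


def check_decay(dependencies, is_available):
    """Group unavailable dependencies by decay level.

    dependencies: iterable of (name, level) pairs, level being one of LEVELS.
    is_available: callable taking a name and returning True or False
                  (a hypothetical probe supplied by the caller).
    """
    report = {level: [] for level in LEVELS}
    for name, level in dependencies:
        if level not in report:
            raise ValueError(f"unknown decay level: {level}")
        if not is_available(name):
            report[level].append(name)
    return report


def is_stable(report):
    """In this sketch a workflow is stable only if nothing is missing at any level."""
    return not any(report.values())


# Example: one web service has disappeared, everything else still resolves.
deps = [
    ("blast-service", "component"),
    ("input.fasta", "data"),
    ("taverna-engine", "infrastructure"),
]
report = check_decay(deps, lambda name: name != "blast-service")
print(report)
print(is_stable(report))
```

Keeping the probe as a caller-supplied function means the same report structure could be filled by very different checks at each level.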
  26. Wf4Ever functionalities. Access Usage Functionalities: Edit, Use, Annotate, … Data Management Analysis Functionalities: Stability Evaluation, Completeness Evaluation, Recommendations, Visualization, Collaboration, … Storage Functionalities: Storage, Retrieval, Maintenance, … Lifecycle Functionalities: Execution, Publication, Archival, …
  27. Wf4Ever Reference Implementation (By the end of 1st Year). Access Usage Clients: Dropbox Client (ROBox), RO Manager Tool, RO Portal. Data Management Analysis Services: Stability Evaluation, Completeness Evaluation, Recommender. Storage Services: RO Digital Library. Lifecycle Services: Taverna Workflow Mgmt System
  28. Linked Data •  A set of best practices for publishing and connecting data on the Web 1.  Use URIs to name things 2.  Use dereferenceable HTTP URIs 3.  Provide useful content on lookup using standards 4.  Include links to other stuff
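The four practices on this slide can be illustrated with a minimal sketch. It assumes an in-memory dictionary standing in for the Web so that it runs without network access; the `example.org` URIs, the `WEB` table and the `dereference` and `follow_links` helpers are hypothetical names introduced only for illustration.

```python
# Toy stand-in for the Web. Each thing is named by an HTTP URI (practices 1
# and 2) and maps to the description returned on lookup (practice 3), given
# as (predicate, object) pairs. Objects that are URIs act as links (practice 4).
WEB = {
    "http://example.org/workflow/1": [
        ("http://purl.org/dc/terms/title", "Disease discovery workflow"),
        ("http://purl.org/dc/terms/creator", "http://example.org/people/alice"),
    ],
    "http://example.org/people/alice": [
        ("http://xmlns.com/foaf/0.1/name", "Alice"),
    ],
}


def dereference(uri):
    """Look a URI up and return its description, or an empty list if unknown."""
    return WEB.get(uri, [])


def follow_links(start, max_depth=2):
    """Collect URIs reachable from start by following links, breadth first."""
    seen = [start]
    frontier = [start]
    for _ in range(max_depth):
        next_frontier = []
        for uri in frontier:
            for _predicate, obj in dereference(uri):
                if obj in WEB and obj not in seen:
                    seen.append(obj)
                    next_frontier.append(obj)
        frontier = next_frontier
    return seen


print(follow_links("http://example.org/workflow/1"))
```

A real client would replace `dereference` with an HTTP request using content negotiation; the traversal logic would stay the same.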
  29. Linked Data
  30. Linked Data is not Enough! •  A set of best practices for publishing and connecting data on the Web 1.  Use URIs to name things 2.  Use dereferenceable HTTP URIs 3.  Provide useful content on lookup using standards 4.  Include links to other stuff •  All very nice, lots of publishing going on, but no common models for lifecycle, aggregation, ownership, etc. •  A platform for sharing and publishing, but more is needed. Note: The answer is not not Linked Data!* (*Logician joke) (Bechhofer et al., Linked Data is not Enough for Scientists, Future Generation Computer Systems, 2011)
  31. ROs and Linked Data •  Linked Data: Collection of best practices for publishing and connecting structured data on the web. •  ROs should be independent of mechanisms for representation and delivery •  ROs as non-information resources –  “Named Graphs for LD [Figure labels: LD Cloud, RO]
  32. WP2 - Workflow Lifecycle Management: Research Object Model »  Research Object Model ›  Focus of work in M6-12 »  Version 0.1 released to project in November 2011 »  Use within developed RO services (RODL) »  A suite of linked ontologies ›  Research Object Core - ro (aggregation and annotation) ›  Workflow Description - wfdesc (content) ›  Workflow Provenance - wfprov (provenance) [Figure labels: Container Structure (Research Object); Emphasis on Workflow-centric Research Objects (Abstract workflow); Minimal place holder (Workflow provenance)]
  33. WP2 - Workflow Lifecycle Management: Research Object Core (ro) »  Aggregation (OAI-ORE) ›  Use of OAI-ORE to support the description of collections of resources. ›  Established vocabulary ›  Usage in existing work (myExperiment) ›  Fit with Linked Data publication »  Annotation (AO) ›  Survey of existing annotation vocabularies, Annotation Ontology (Clark et al) and Open Annotation Collaboration (Van de Sompel et al). ›  Liaison and discussion with both groups •  Little to choose in technical terms •  A catalyst and focus for collaboration between AO and OAC ›  Choice of AO •  Existing collaboration/relationship (UNIMAN and AO) »  Formation of W3C Open Annotation Community Group ›  Participation from Wf4Ever staff ›  Potential for impact/collaborations »  Defines the core data model used by the RO Digital Library service and the Command Line Tool developed in WP1.
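The combination of aggregation and annotation described on this slide can be sketched as plain subject-predicate-object triples. This is a hedged illustration, not the ro ontology itself: the term names follow my reading of the OAI-ORE (`ore:Aggregation`, `ore:aggregates`) and Annotation Ontology (`ao:Annotation`, `ao:annotatesResource`, `ao:body`) vocabularies and should be checked against the published specifications, while the `build_research_object` and `aggregated` helpers and all `example.org` URIs are hypothetical.

```python
# Namespace prefixes (assumed from the public vocabularies, not from the slides).
ORE = "http://www.openarchives.org/ore/terms/"
AO = "http://purl.org/ao/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def build_research_object(ro_uri, resources, annotations):
    """Return a set of (subject, predicate, object) triples describing an RO.

    resources:   list of URIs to aggregate.
    annotations: list of (annotation_uri, target_uri, body_uri) tuples.
    """
    triples = {(ro_uri, RDF_TYPE, ORE + "Aggregation")}
    for resource in resources:
        triples.add((ro_uri, ORE + "aggregates", resource))
    for annotation, target, body in annotations:
        triples.add((annotation, RDF_TYPE, AO + "Annotation"))
        triples.add((annotation, AO + "annotatesResource", target))
        triples.add((annotation, AO + "body", body))
        # Design choice of this sketch: annotations are aggregated as well,
        # so the RO carries its own metadata.
        triples.add((ro_uri, ORE + "aggregates", annotation))
    return triples


def aggregated(triples, ro_uri):
    """List everything the RO aggregates, in sorted order."""
    return sorted(o for s, p, o in triples
                  if s == ro_uri and p == ORE + "aggregates")


ro = "http://example.org/ro/1"
triples = build_research_object(
    ro,
    resources=["http://example.org/ro/1/wf.t2flow"],
    annotations=[("http://example.org/ro/1/ann1",
                  "http://example.org/ro/1/wf.t2flow",
                  "http://example.org/ro/1/ann1-body.rdf")],
)
print(aggregated(triples, ro))
```

Using bare tuples keeps the sketch free of dependencies; an RDF library would be the natural choice for serialising such a manifest.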
  34. WP2 - Workflow Lifecycle Management: Workflow Description (wfdesc) »  Model providing initial descriptions of workflows ›  Process instances ›  Linked via input/output/parameters. ›  Support for the tasks of workflow abstraction, indexing, classifications, and general workflow analysis. ›  Generic technologies, adaptable to different domains using specific catalogues, e.g. SADI framework. ›  Reflects explicit focus on workflow-centric ROs »  Evolved from the OPMW ontology by Wf4Ever staff member Daniel Garijo and Yolanda Gil. »  Tooling generating wfdesc descriptions from aggregated Taverna workflows has been developed. ›  Descriptions already used by the Workflow Recommendation Service for inspecting workflow structures and service interconnections (WP3).
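The idea of process instances linked via inputs and outputs can be sketched with a small data structure. This is only an assumption-laden illustration of the shape of such a description: the `Process` and `Workflow` classes, their method names and the example process names are hypothetical and are not terms from the wfdesc ontology.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """One step of a workflow with named input and output parameters."""
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


@dataclass
class Workflow:
    processes: dict = field(default_factory=dict)
    # Each link is (source process, output name, target process, input name).
    links: list = field(default_factory=list)

    def add_process(self, process):
        self.processes[process.name] = process

    def link(self, src, output, dst, input_):
        """Connect an output of one process to an input of another."""
        if output not in self.processes[src].outputs:
            raise ValueError(f"{src} has no output {output}")
        if input_ not in self.processes[dst].inputs:
            raise ValueError(f"{dst} has no input {input_}")
        self.links.append((src, output, dst, input_))

    def downstream(self, name):
        """Names of the processes directly fed by the given process."""
        return sorted({dst for src, _out, dst, _in in self.links if src == name})


wf = Workflow()
wf.add_process(Process("fetch", outputs=["sequence"]))
wf.add_process(Process("align", inputs=["query"], outputs=["hits"]))
wf.link("fetch", "sequence", "align", "query")
print(wf.downstream("fetch"))
```

Queries such as `downstream` are the kind of structural inspection a recommendation or analysis service could run over workflow descriptions.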
  35. WP2 - Workflow Lifecycle Management: Workflow Provenance (wfprov) »  A provenance convergence layer ›  Potential for links to OPM-V or PROV-O. ›  Mappings to OPM-V and PROV-O are under development ›  A placeholder for the v0.1 ontology suite »  A Taverna plugin exporting Taverna provenance in PROV-O format has been developed in WP4 »  A prototype conversion agent that generates wfprov descriptions from PROV-O has been developed. wfprov data will primarily be used by Integrity and Authenticity in order to inspect workflow executions (WP4). »  More extended modeling and descriptions of provenance information will be reported in WP4.
  36. ROs are Technical and Social •  An artefact to support preservation of the method, data etc. •  Technical details of platform, services etc. •  A record of an investigation or experiment •  A mechanism for communication, packaging, sharing, publishing, finding •  An object that connects people together. (De Roure et al., Social Scientific Objects, 1st International Workshop on Social Object Networks, Boston, 2011, myExpSocialObjects.pdf)
  37. Where Next/Challenges •  Prototype development •  Models for Research Objects –  Vocabularies •  Refinement of lifecycle states –  Versioning and Evolution •  Provenance –  RO components –  The RO itself •  Trust
  38. Music
  39. Music •  Music IR and Linked Data –  Publication of collections (eTree, Million Song Dataset) –  Benefits? •  Music IR and ROs –  What are the Research Objects of Music IR? –  Intermediate results/feature sets •  Ontologies and vocabularies for describing results/feature sets
  40. Thanks! •  Manchester Information Management Group – •  myGrid Team – •  Wf4Ever Team –
  41. Where Next?