A Clean Slate?

  1. A Clean Slate? Herbert Van de Sompel (@hvdsomp, http://public.lanl.gov/herbertv/). Includes slides by Sean Bechhofer, Carole Goble and Robert Sanderson.
  2. Context of My Work, My Talk: paper-based scholarly communication system → scanned version of paper-based scholarly communication system → natively digital, web-based scholarly communication system. A painful transition.
  3. In Silico (Computational) Science: datasets, data collections, algorithms, configurations, tools and apps, codes, code libraries, services, infrastructure, compilers, hardware. Simulations, data exploration, data processing, analytics, database-based, text mining, auto recommendation, visual analytics… Actually, Digital Science is just Science. Carole Goble, JCDL 2012 Keynote https://dl.dropbox.com/u/617206/JCDL2012keynoteGoble.ppt
  4. Scientific Workflows, Services, Data, Workflow Engines: all components are continuously in flux. How can results be reproduced in such an environment? Carole Goble, JCDL 2012 Keynote https://dl.dropbox.com/u/617206/JCDL2012keynoteGoble.ppt
  5. A Lot of Rs for Reproducibility •  Rerun: re-execute the original experiment using a revised setting. •  Review: validate and justify the results empirically. Trust. Understand. Train. Convince and comfort. •  Replicate / Repeat: exactly replicate the original experiment. Eliminate change. •  Reproduce: run the experiment with differences in elements (materials, methods, platform or setting) and compare, to test for the same result. •  Replay: run through what happened using logs, without the original platform or any need to execute. Carole Goble, JCDL 2012 Keynote https://dl.dropbox.com/u/617206/JCDL2012keynoteGoble.ppt
  6. A Lot of Rs for Reuse •  Refresh: execute an upgraded version of the original experiment. •  Reconstruct: rebuild using new elements or a different platform when the originals are lost, unavailable or inaccessible. •  Reuse: use as part of new experiments. •  Repurpose / Reassemble: reuse elements in a new experiment. Carole Goble, JCDL 2012 Keynote https://dl.dropbox.com/u/617206/JCDL2012keynoteGoble.ppt
  7. The Article is the Knowledge Bottleneck “An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.” Buckheit, J. and Donoho, D. (1995) WaveLab and Reproducible Research http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.3.2982
  8. The Article is the Knowledge Bottleneck “Changes are occurring in the ways in which scientific research is conducted. Within e-laboratories, methods such as scientific workflows, research protocols, standard operating procedures and algorithms for analysis and simulation are used to manipulate and produce data. Experimental or observational data and scientific models are typically born digital with no physical counterpart. This move to digital content is driving a sea-change in scientific publication, and challenging traditional scholarly publication.” Bechhofer S. et al (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge http://dx.doi.org/10.1038/npre.2010.4626.1
  9. If not the Article, then What? •  Each such experiment involves a complex set of resources with complex relationships •  These resources need to be shared in order to support forms of reuse and reproducibility •  This entails augmenting the scholarly record with an explicit account of the research process •  Digital exchange of each resource individually is trivial; exchange of the combined knowledge is not •  Traditional electronic publications cannot handle this job: •  Targeted at humans, not machines •  Communicate findings, not all the scientific knowledge behind the findings •  Content not decomposable into actionable units •  Outputs, results, methods not reusable. Bechhofer S. et al (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge http://dx.doi.org/10.1038/npre.2010.4626.1
  10. The Clean Slate Challenge
  11. The Clean Slate Challenge: Add features to support these needs to the existing scholarly communication system?
  12. The Clean Slate Challenge: Start with a clean slate?
  13. Research Objects http://www.researchobject.org/ http://www.wf4ever-project.org/
  14. Research Objects: Aggregated Content •  Data used or results produced in an experimental study •  Methods employed to produce and analyze that data •  Provenance and setting information about the experiments •  People involved in the investigation •  Annotations about these resources that are essential to the understanding and interpretation of the scientific outcomes captured by a research object. http://www.researchobject.org/
  15. http://www.w3.org/community/rosc/
  16. Research Objects http://www.researchobject.org/
  17. Research Objects: Aggregation “Research Objects are aggregations of content. Thus a Research Object framework needs to provide a mechanism for this aggregation. Aggregations are likely to include references to resources but there may also, however, be situations, where, for reasons of efficiency or in order to support persistence, Research Objects should also be able to aggregate literal data as well as references to data.” Bechhofer S. et al (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge http://dx.doi.org/10.1038/npre.2010.4626.1
  18. •  OAI-ORE observation: scholarly assets are rapidly becoming compound, consisting of multiple resources •  e.g. datasets, software, ontologies, workflows, online debate, slides, blogs, videos, etc., with various: •  Relationships •  Interdependencies •  How can this compound-ness be conveyed in an interoperable manner, so that applications can access and consume such assets? 2007, funded by the Mellon Foundation & Microsoft Research. http://www.openarchives.org/ore/
  19. Foundations of the ORE Solution •  Web Architecture: Resource, URI, Representation •  Semantic Web: •  URIs for documents (information resources) •  URIs for physical entities, concepts, abstractions (non-information resources) •  RDF to express properties of, and relationships between, resources •  Linked Data: •  HTTP URIs for both information and non-information resources •  HTTP 303 redirect: •  From: the HTTP URI of a non-information resource •  To: the HTTP URI of an information resource that describes the non-information resource
  20. Adding an Account of Research Life Cycles to the Scholarly Record. Pepe, A., Mayernik, M., Borgman, C., Van de Sompel, H. (2009) Technology to Represent Scientific Practice: Data, Life Cycles, and Value Chains. http://dx.doi.org/10.1002/asi21263
  21. ORE & Research Objects: “…, Research Objects should also be able to aggregate literal data as well as references to data.” •  Aggregated Resources in ORE have HTTP URIs; this probably needs to be relaxed. •  Embedding content in RDF, irrespective of ORE, is … interesting •  See: Representing Content in RDF 1.0 http://www.w3.org/TR/Content-in-RDF10/ •  Allows embedding base64, text, XML •  Resource Map as a manifest in, e.g., a ZIP file?
  22. Research Objects http://www.researchobject.org/
  23. Research Objects: Annotation “Annotations about these resources, that are essential to the understanding and interpretation of the scientific outcomes captured by a research object.” http://www.researchobject.org/
  24. •  Annotation is a pervasive scholarly activity, conducted by people and machines •  Many annotation efforts and tools •  But annotations are stuck in silos: •  Only consumable by the client that created them •  Not shareable beyond their original environment •  Open Annotation focuses on interoperability for annotations, in order to allow sharing of annotations across: •  Annotation clients •  Content collections •  Services that leverage annotations. 2009, funded by the Mellon Foundation. http://www.openannotation.org/spec/core/
  25. W3C Open Annotation Community Group http://www.w3.org/community/openannotation/ •  Established to reconcile the Open Annotation Collaboration and Annotation Ontology models •  67 participants from around the world, 7th of 119 groups; many universities, but also commercial and not-for-profit organizations •  Mission: interoperability between annotation systems and platforms, by …following the Architecture of the Web …reusing existing web standards …providing a single, coherent model to implement …without requiring adoption of specific platforms …while maintaining low implementation costs
  26. What is an Annotation? “An Annotation is considered to be a set of connected resources, typically including a body and target, where the body is related to the target.” Highlighting, bookmarking; commenting, describing; tagging, linking; classifying, identifying; questioning, replying; editing, moderating. …Provide an aide-memoire …Share and inform …Improve discovery …Organize resources …Interact with others …Create as well as consume. http://www.w3.org/community/openannotation/
  27. Annotates Annotations
  28. Annotates? Annotations?
  29. Basic Open Annotation Data Model
  30. Use Case: Bookmarking
  31. Use Case: Commenting
  32. Use Case: Commenting
  33. Use Case: Tagging
  34. Further Specification of Resources: Specific Body and Specific Target resources identify the region of interest and/or the state of the resource. It must be possible to describe the state of the resource, the segment of interest, and potentially styling hints for how to render it. Open Annotation introduces: •  State: describes how to retrieve the representation •  Selector: describes how to select the segment •  Style: describes how to render/process the segment •  Scope: describes the context of the resource
  35. Use Case: Changing Content at the Same URI
  36. Use Case: Segment of Interest
  37. W3C Open Annotation & Research Objects •  Early renderings of Research Objects emerging from the Wf4Ever project use the Annotation Ontology as the annotation framework •  But since the Annotation Ontology and Open Annotation Collaboration models are now merging into the W3C Open Annotation model, it is safe to assume that W3C Open Annotation will be used for Research Objects
  38. Research Objects http://www.researchobject.org/
  39. Research Objects: Versioning and Evolution “Research Objects are dynamic in that their contents can change and be changed – additional contents may be added to aggregations, or additional metadata can be asserted about the contents or relationships between content. The resources that are aggregated may change. Thus there is a need for versioning, allowing the recording of changes to objects, potentially along with facilities for retrieving objects or aggregated elements at particular historical points in their lifecycle.” Bechhofer S. et al (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge http://dx.doi.org/10.1038/npre.2010.4626.1
  40. ORE Experiment: Versioning and Evolution of Compound Objects. Van de Sompel, H. et al. (2007) Appendix to Interoperability for the Discovery, Use, and Re-Use of Units of Scholarly Communication http://www.ctwatch.org/quarterly/articles/2007/08/interoperability-for-the-discovery-use-and-re-use-of-units-of-scholarly-communication/
  41. •  Memento is about the Web and time: •  Resources evolve over time •  Only the current representation is available from a resource’s URI •  How can prior representations be accessed seamlessly, if they exist? •  Memento looks at this problem for the Web in general. Digital Preservation Award 2010. 2009, funded by the Library of Congress. http://www.mementoweb.org/
  42. URI for Original, URI for Version: URI-R http://www.cnn.com/ (original); URI-M http://web.archive.org/web/20010911203610/http://www.cnn.com/ (version, in a Web Archive)
  43. URI for Original, URI for Version: URI-R http://en.wikipedia.org/wiki/September_11_attacks (original); URI-M http://en.wikipedia.org/w/index.php?title=September_11_attacks&oldid=282333 (version, in a CMS)
  44. Time Travel for the Web: Demo http://www.mementoweb.org/demo/Memento_Time_Travel.mov
  45. Memento & Research Objects: The combination of •  pro-active archiving of Research Objects and their constituent resources, using •  Web archiving techniques, e.g. crawling, transactional archiving •  platforms with strong versioning capabilities, e.g. data wikis, GitHub •  assigning URIs to Research Objects and their constituent resources according to the well-established pattern of time-generic (URI-R) and time-specific (URI-M) resources •  the Memento protocol, to access time-specific versions of Research Objects and their constituent resources via their time-generic URI and a timestamp, makes a good candidate for addressing the versioning and evolution need.
  46. Research Objects http://www.researchobject.org/
  47. Research Objects: Provenance “The issue of provenance, and being able to audit experiments and investigations is key to the scientific method. Third parties must be able to audit the steps performed in an experiment in order to be convinced of the validity of results. Audit is required not just for regulatory purposes, but allows for the results of experiments to be interpreted and reused, thus a Research Object should provide sufficient information to support audit of the aggregation as a whole, its constituent parts, and any process that it may encapsulate.” Bechhofer S. et al (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge http://dx.doi.org/10.1038/npre.2010.4626.1
  48. Provenance. Van de Sompel, H. (2003) Roadblocks http://www.sis.pitt.edu/~dlwkshop/paper_sompel.html
  49. Open Provenance Model. Moreau, L. et al. (2010) The Open Provenance Model: Abstract Model http://eprints.ecs.soton.ac.uk/21449/
  50. W3C Provenance http://www.w3.org/TR/prov-primer/
  51. Research Objects http://www.researchobject.org/ W3C PROV
  52. The Clean Slate Challenge
  53. •  ResourceSync is about the synchronization of web resources, i.e. things with a URI that can be dereferenced •  From small websites/repositories (a few resources) to large repositories/datasets/linked data collections (many millions of resources) •  From low change frequency (weeks/months) to high change frequency (seconds) •  Synchronization latency and accuracy needs may vary •  Modular framework based on Sitemaps and extensions. 2012, funded by the Sloan Foundation. http://www.openarchives.org/rs/
  54. •  Investigates reference rot at massive scale: •  Citation rot: do HTTP references in scholarly articles still resolve? •  Content rot: if so, is the content at the end of the HTTP reference still representative of the content that was originally referenced? •  Investigates pro-active ways to archive HTTP-referenced resources that occur in scholarly articles. hiberlink, 2013, funded by the Mellon Foundation. Soon at http://www.hiberlink.org
  55. Research Objects http://www.researchobject.org/ http://www.wf4ever-project.org/
  56. http://www.w3.org/community/rosc/
  57. A Clean Slate? Herbert Van de Sompel (@hvdsomp, http://public.lanl.gov/herbertv/). Includes slides by Sean Bechhofer, Carole Goble and Robert Sanderson.
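Slides 17 to 19 describe aggregation through OAI-ORE, built on Web architecture and RDF. Here is a minimal sketch of a Resource Map, serialized as Turtle, that describes an Aggregation of three resources. It assumes hypothetical example.org URIs and uses plain string formatting in place of an RDF library. A complete Resource Map would carry more metadata than shown.

```python
# Minimal sketch of an OAI-ORE Resource Map serialized as Turtle.
# All example.org URIs are hypothetical; metadata a full Resource Map
# would carry (creator, modification date, ...) is omitted.

ORE = "http://www.openarchives.org/ore/terms/"


def resource_map_turtle(rem_uri: str, agg_uri: str, aggregated: list) -> str:
    """Return Turtle in which a Resource Map describes an Aggregation."""
    if not aggregated:
        raise ValueError("an Aggregation needs at least one aggregated resource")
    members = ",\n        ".join(f"<{uri}>" for uri in aggregated)
    return "\n".join([
        f"@prefix ore: <{ORE}> .",
        "",
        # The Resource Map is a document (an information resource) ...
        f"<{rem_uri}> a ore:ResourceMap ;",
        f"    ore:describes <{agg_uri}> .",
        "",
        # ... that describes the Aggregation, which lists its parts by URI.
        f"<{agg_uri}> a ore:Aggregation ;",
        f"    ore:aggregates {members} .",
        "",
    ])


if __name__ == "__main__":
    print(resource_map_turtle(
        "http://example.org/ro/42.ttl",
        "http://example.org/ro/42#aggregation",
        ["http://example.org/ro/42/paper.pdf",
         "http://example.org/ro/42/data.csv",
         "http://example.org/ro/42/workflow.xml"],
    ))
```

Keeping the Resource Map URI distinct from the Aggregation URI mirrors the information/non-information resource split listed on slide 19.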
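Slide 19 relies on the Linked Data "303 See Other" convention. The sketch below uses only Python's standard library. It serves a hypothetical /id/... URI for a non-information resource by redirecting with 303 to a /doc/... URI for a document that describes it. The path layout and the toy N-Triples body are assumptions made for illustration.

```python
# Sketch of the Linked Data 303 pattern. The /id/ and /doc/ path
# conventions and the example.org URIs are hypothetical.
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class SeeOtherHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.startswith("/id/"):
            # Non-information resource: no representation is served;
            # redirect to the document that describes it.
            self.send_response(303)
            self.send_header("Location", "/doc/" + self.path[len("/id/"):])
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path.startswith("/doc/"):
            # Information resource: serve a toy RDF description.
            thing = self.path[len("/doc/"):]
            body = (f"<http://example.org/id/{thing}> "
                    f"<http://www.w3.org/2000/01/rdf-schema#label> "
                    f'"{thing}" .\n').encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/n-triples")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch(port, path):
    """GET a path without following redirects; return (status, Location)."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("GET", path)
    resp = conn.getresponse()
    resp.read()
    location = resp.getheader("Location")
    conn.close()
    return resp.status, location


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), SeeOtherHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    print(fetch(port, "/id/herbert"))   # (303, '/doc/herbert')
    print(fetch(port, "/doc/herbert"))  # (200, None)
    server.shutdown()
    server.server_close()
```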
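Slide 21 asks whether Research Objects can aggregate literal data, and whether a Resource Map could serve as a manifest in a ZIP file. The sketch below rests on three assumptions:

- The Content-in-RDF 1.0 terms (cnt:ContentAsText, cnt:chars, cnt:ContentAsBase64, cnt:bytes) are used to embed text and binary data.
- The embedded literals are aggregated as blank nodes. This is precisely the relaxation of ORE's URI requirement that the slide suggests.
- The archive layout, with a hypothetical manifest.ttl at its root, is invented for this example.

```python
# Sketch: embed literal content in RDF and package a manifest in a ZIP.
# Archive layout and example.org URIs are hypothetical.
import base64
import io
import zipfile

CNT = "http://www.w3.org/2011/content#"
ORE = "http://www.openarchives.org/ore/terms/"


def turtle_string(text: str) -> str:
    """Escape a Python string as a single-line Turtle string literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def build_manifest(agg_uri: str, referenced: str, note_text: str, binary: bytes) -> str:
    """Aggregation of one referenced resource plus two embedded literals."""
    return "\n".join([
        f"@prefix ore: <{ORE}> .",
        f"@prefix cnt: <{CNT}> .",
        "",
        f"<{agg_uri}> a ore:Aggregation ;",
        f"    ore:aggregates <{referenced}>, _:note, _:blob .",
        "",
        # Text embedded directly in the RDF.
        "_:note a cnt:ContentAsText ;",
        f"    cnt:chars {turtle_string(note_text)} ;",
        '    cnt:characterEncoding "UTF-8" .',
        "",
        # Binary data embedded as base64.
        "_:blob a cnt:ContentAsBase64 ;",
        f'    cnt:bytes "{base64.b64encode(binary).decode("ascii")}" .',
        "",
    ])


def package(manifest: str, files: dict) -> bytes:
    """Write manifest.ttl plus any bundled files into an in-memory ZIP."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.ttl", manifest)
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


if __name__ == "__main__":
    manifest = build_manifest("http://example.org/ro/42",
                              "http://example.org/ro/42/data.csv",
                              "Run with seed 7.", b"\x00\x01")
    archive = package(manifest, {"README.txt": "hypothetical bundled file"})
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        print(zf.namelist())  # ['manifest.ttl', 'README.txt']
```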
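Slides 26 to 33 present the basic Open Annotation model: an annotation relates a body to a target, and use cases such as bookmarking, commenting and tagging differ mainly in their motivation and body. Here is a sketch that emits N-Triples for such annotations. The oa:/cnt: terms are used as understood from the W3C Open Annotation Community Group drafts. The helper names and all example.org URIs are hypothetical.

```python
# Sketch of the basic Open Annotation model: an oa:Annotation relating an
# optional body to a target, with a motivation. Vocabulary as understood
# from the Community Group drafts; URIs and helper names are hypothetical.
from typing import List, Optional, Sequence

OA = "http://www.w3.org/ns/oa#"
CNT = "http://www.w3.org/2011/content#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(term: str) -> str:
    return f"<{term}>"


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def annotation(anno: str, target: str, motivation: str,
               body: Optional[str] = None, body_text: Optional[str] = None,
               body_types: Sequence[str] = ()) -> List[str]:
    """Return N-Triples lines for one annotation.

    Bookmarking needs no body; commenting and tagging attach a textual
    body, embedded as cnt:ContentAsText when body_text is given.
    """
    triples = [
        (iri(anno), iri(RDF_TYPE), iri(OA + "Annotation")),
        (iri(anno), iri(OA + "hasTarget"), iri(target)),
        (iri(anno), iri(OA + "motivatedBy"), iri(OA + motivation)),
    ]
    if body is not None:
        triples.append((iri(anno), iri(OA + "hasBody"), iri(body)))
        for cls in body_types:
            triples.append((iri(body), iri(RDF_TYPE), iri(cls)))
        if body_text is not None:
            triples.append((iri(body), iri(RDF_TYPE), iri(CNT + "ContentAsText")))
            triples.append((iri(body), iri(CNT + "chars"), literal(body_text)))
    return [f"{s} {p} {o} ." for s, p, o in triples]


if __name__ == "__main__":
    page = "http://example.org/article.html"
    # Bookmarking: target only.
    print("\n".join(annotation("http://example.org/anno/1", page, "bookmarking")))
    # Commenting: a textual body about the target.
    print("\n".join(annotation("http://example.org/anno/2", page, "commenting",
                               body="http://example.org/body/2",
                               body_text="Compare with figure 2.")))
    # Tagging: the body is a textual tag.
    print("\n".join(annotation("http://example.org/anno/3", page, "tagging",
                               body="http://example.org/body/3",
                               body_text="provenance",
                               body_types=[OA + "Tag"])))
```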
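Slides 34 to 36 introduce Selectors for identifying a segment of interest. The TextQuoteSelector in the Open Annotation drafts (oa:exact, oa:prefix, oa:suffix) describes a segment by its exact text plus optional surrounding context. The function below is a hypothetical resolver for such a description, with a "first full match wins" policy chosen for this example.

```python
# Sketch of resolving a text-quote description (exact text plus optional
# prefix/suffix context) against a document. Function name and matching
# policy are assumptions for this example.
from typing import Optional, Tuple


def select_text_quote(text: str, exact: str, prefix: str = "",
                      suffix: str = "") -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets of the segment, or None if not found.

    Every occurrence of `exact` is a candidate; a candidate is accepted only
    if it is immediately preceded by `prefix` and followed by `suffix`,
    which disambiguates repeated phrases.
    """
    if not exact:
        return None
    start = text.find(exact)
    while start != -1:
        end = start + len(exact)
        if text[:start].endswith(prefix) and text[end:].startswith(suffix):
            return start, end
        start = text.find(exact, start + 1)
    return None


if __name__ == "__main__":
    doc = "the data was cleaned; the data was then analysed"
    print(select_text_quote(doc, "the data", prefix="; "))  # (22, 30)
```

If the content at the URI has changed (slide 35), the resolver may return None. That is where a State, such as a pointer to a timestamped copy, helps recover the representation that was originally annotated.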
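Slides 41 to 45 rest on the URI-R/URI-M pattern and on datetime negotiation. In that negotiation, a client asks a TimeGate for a resource's past state by sending an Accept-Datetime request header. This client-side sketch formats that header and picks a memento from a candidate list. Three assumptions apply:

- The second memento in the list is hypothetical.
- Wayback-style URI-M parsing assumes the layout shown on slide 42.
- "Closest in time" is only one possible selection policy, since the TimeGate decides which memento to return.

```python
# Sketch of Memento-style datetime negotiation helpers. The selection
# policy and the URI-M layout are assumptions, not part of the protocol.
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import List


def accept_datetime(dt: datetime) -> str:
    """Format an aware datetime as the HTTP-date used in Accept-Datetime."""
    # usegmt=True requires a UTC datetime, so normalize first.
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


def wayback_datetime(uri_m: str) -> datetime:
    """Read the 14-digit timestamp from a Wayback-style URI-M.

    Assumed layout: http://web.archive.org/web/<YYYYMMDDhhmmss>/<URI-R>
    """
    stamp = uri_m.split("/web/", 1)[1].split("/", 1)[0]
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def closest_memento(mementos: List[str], wanted: datetime) -> str:
    """Pick the URI-M whose archival datetime is nearest to `wanted`."""
    return min(mementos, key=lambda m: abs(wayback_datetime(m) - wanted))


if __name__ == "__main__":
    uri_r = "http://www.cnn.com/"
    mementos = [
        "http://web.archive.org/web/20010911203610/" + uri_r,  # from slide 42
        "http://web.archive.org/web/20010915000000/" + uri_r,  # hypothetical
    ]
    wanted = datetime(2001, 9, 12, tzinfo=timezone.utc)
    print("Accept-Datetime:", accept_datetime(wanted))
    print(closest_memento(mementos, wanted))
```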
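Slide 47 stresses that third parties must be able to audit the steps behind a result, and slides 49 to 51 point to OPM and W3C PROV. As an illustration, the sketch below walks PROV-style relations upstream from a result and lists everything it depends on. The relation names echo PROV-O terms (prov:wasGeneratedBy, prov:used, prov:wasDerivedFrom, prov:wasAssociatedWith), but the statements are hypothetical Python tuples. A real system would query PROV-O RDF instead.

```python
# Sketch of a provenance audit: breadth-first traversal of PROV-style
# relations. Identifiers and statements are hypothetical.
from collections import deque

STATEMENTS = [
    ("ex:figure3", "prov:wasGeneratedBy", "ex:plotRun"),
    ("ex:plotRun", "prov:used", "ex:cleanData"),
    ("ex:plotRun", "prov:used", "ex:plotScript"),
    ("ex:plotRun", "prov:wasAssociatedWith", "ex:alice"),
    ("ex:cleanData", "prov:wasDerivedFrom", "ex:rawData"),
]

# Relations that point from a thing to what it depends on.
UPSTREAM = {"prov:wasGeneratedBy", "prov:used",
            "prov:wasDerivedFrom", "prov:wasAssociatedWith"}


def upstream(node, statements):
    """Return everything `node` transitively depends on, in BFS order."""
    edges = {}
    for subject, relation, obj in statements:
        if relation in UPSTREAM:
            edges.setdefault(subject, []).append(obj)
    seen, order, queue = {node}, [], deque([node])
    while queue:
        current = queue.popleft()
        for nxt in edges.get(current, []):
            if nxt not in seen:  # guards against cycles
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


if __name__ == "__main__":
    print(upstream("ex:figure3", STATEMENTS))
```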
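Slide 53 describes ResourceSync as a modular framework based on Sitemaps and extensions. The sketch below builds a Resource List, a Sitemap urlset whose rs:md elements declare the capability and carry per-resource hash and length, so a destination can check whether its copy is in sync. Element and attribute usage follows the ResourceSync framework as understood here, and the hex-encoded "md5:" hash form is an assumption to verify against the specification. The rs:ln link to a Capability List is omitted, and the URIs are hypothetical.

```python
# Sketch of a ResourceSync Resource List built with ElementTree.
# Verify element/attribute details against the ResourceSync specification.
import hashlib
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS = "http://www.openarchives.org/rs/terms/"
ET.register_namespace("", SM)
ET.register_namespace("rs", RS)


def resource_list(resources):
    """resources: iterable of (uri, lastmod_iso8601, content_bytes)."""
    urlset = ET.Element(f"{{{SM}}}urlset")
    ET.SubElement(urlset, f"{{{RS}}}md", capability="resourcelist")
    for uri, lastmod, content in resources:
        url = ET.SubElement(urlset, f"{{{SM}}}url")
        ET.SubElement(url, f"{{{SM}}}loc").text = uri
        ET.SubElement(url, f"{{{SM}}}lastmod").text = lastmod
        # Fixity and size let a destination detect out-of-sync copies.
        digest = hashlib.md5(content).hexdigest()
        ET.SubElement(url, f"{{{RS}}}md",
                      hash=f"md5:{digest}", length=str(len(content)))
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(resource_list([
        ("http://example.org/res1", "2013-01-02T13:00:00Z", b"abc"),
    ]))
```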
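Slide 54 distinguishes citation rot (a reference no longer resolves) from content rot (it resolves, but the content has drifted). This crude illustration is not hiberlink's methodology. The status-code rule, the use of difflib similarity and the 0.5 threshold are all arbitrary assumptions for this sketch.

```python
# Crude illustration of citation rot vs. content rot. Not hiberlink's
# method: the status rule and similarity threshold are assumptions.
from difflib import SequenceMatcher
from typing import Optional


def classify_reference(status: Optional[int], archived_text: str,
                       current_text: str, threshold: float = 0.5) -> str:
    """Return 'citation rot', 'content rot?' or 'ok' for one HTTP reference.

    status is the final HTTP status after dereferencing the reference,
    or None if the request failed outright.
    """
    if status is None or not 200 <= status < 300:
        return "citation rot"  # the reference no longer resolves
    similarity = SequenceMatcher(None, archived_text, current_text).ratio()
    if similarity < threshold:
        return "content rot?"  # resolves, but content may have drifted
    return "ok"


if __name__ == "__main__":
    print(classify_reference(404, "", ""))
    print(classify_reference(200, "dataset v1 readme", "dataset v1 readme"))
```

Comparing against an archived copy presupposes that one exists, which is why slide 54 pairs rot measurement with pro-active archiving.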
