SemTech West 2011 - Digital Provenance


Digital Provenance overview presentation given at SemTech 2011 by Greg Joiner of Raytheon BBN Technologies.

Published in: Technology, Education

  1. Implementing Digital Provenance on the World Wide Web Using Semantic Web Technology. Gregory Joiner*, Douglas Reid, Raytheon BBN Technologies, {gjoiner,dreid}. June 9th, 2011
  2. First…Some Administrivia! • Updated slides are located on SlideShare at: • Presentation is not “Technical – Intermediate.” – I wanted to reach the maximum number of users – There was not enough time to provide both an overview and technical instruction. • Feel free to interrupt me anytime with questions!
  3. Goals of this Talk • Learn what digital provenance is • Understand why it is important • Know what is currently being done and by whom • Have a starting point for implementing provenance in your semantic web applications • Be passionate about digital provenance!
  4. Agenda • Part 1: An Introduction to Digital Provenance – What is Digital Provenance – National Cyber Leap Year Summit • Part 2: Digital Provenance Use Cases – Everyday Web Browsing – Contradictory, Time-Sensitive Information – Closed Network Provenance • Part 3: Where Are We Now? – W3C Provenance Work – Review of the Current State-of-the-Art • Part 4: Digital Provenance Tool Development – Why SemWeb is Perfect for Digital Provenance – Open Source and Standards Compliance – Securing Provenance Metadata – Additional Design Considerations
  5. Part 1: AN INTRODUCTION TO DIGITAL PROVENANCE
  6. What is Digital Provenance • Provenance is defined by Webster’s Dictionary as “the origin or source of something” – mainly pertaining to art or architectural artifacts • Digital Provenance is metadata that establishes the chain-of-custody information needed for users to make trust decisions about digital data • Digital Provenance Metadata can describe any type of electronic data at any granularity level, from entire web sites to single files to even individual assertions within a webpage or document
  7. What is Digital Provenance: Types of Digital Provenance Metadata include: • Bibliographical Information – Provides a list of all of the sources behind a document or assertion • Chain-of-Custody Information – Provides a history of the different people and/or systems that have handled the document or assertion • Proof / Justification Information – Documents the logical steps followed to make an assertion • Trust Information – Provides a quantifiable metric to measure and compare the trustworthiness of one document or assertion to another.
  8. National Cyber Leap Year Summit • Convened in 2009 as a response to the President’s call to secure the nation’s cyber infrastructure and charged with identifying the “game-changing” technologies needed to secure cyberspace • Identified Digital Provenance as one of those technologies because it enables the identification, authentication, and reputation of entities and objects with appropriate granularity at many layers of the protocol hierarchy.
  9. Part 2: DIGITAL PROVENANCE USE CASES
  10. Everyday Web Browsing • Scenario: People often rely on the Internet for advice on important subjects, such as health or finance, and frequently make key decisions based on web content alone. This is especially true for mobile users who lack the bandwidth and display room to investigate the provenance on their own. • Solution: By dynamically marking the trustworthiness of web content, users can quickly determine what data they can trust so they can make more informed decisions.
  11. Contradictory, Time-Sensitive Information • Scenario: When breaking news happens, content re-publishers and end users are often forced to choose between contradictory reports. For example, after the tragic shooting in Arizona in January 2011, some websites claimed Rep. Giffords was dead while others correctly reported that she was still alive. • Solution: By providing a standard way to view and compare the bibliographical and chain-of-custody information of the conflicting articles, users can make an informed decision on which one to trust.
  12. Closed Network Provenance • Scenario: Even in a closed network, users frequently have to decide whether to trust existing content. This is often the case within the Intelligence Community and Department of Defense, where certain time-sensitive tasks allow assumptions to be made that other tasks cannot. For example, the use of lethal force against a target requires more concrete evidence than other, less irreparable actions. • Solution: By providing analysts with a complete list of the assumptions and justifications behind a given assertion, they can determine whether or not they can use that assertion in their analysis.
  13. Additional Use Cases • License and Contract Compliance • Public Policy Conformance • Assigning Credit and Blame to Information • Many more were identified by the W3C Provenance Incubator Group and are located at:
  14. Part 3: WHERE ARE WE NOW?
  15. W3C Provenance Work • Provenance Interchange Working Group – Chartered through Oct 2012, based on Incubator Group’s findings – Formed to “support the widespread publication and use of provenance information of Web documents, data, and resources” – Will publish Recommendations to define a language for exchanging provenance information (PIL) among applications • Provenance Interchange Language (PIL) design goals – Be applicable to any resource – Provide a low barrier to entry to facilitate widespread adoption – Provide a small, extensible core model – Draw from existing vocabularies and ontologies • Deliverables – Conceptual Model, Formal Model, Formal Semantics, Accessing and Querying Provenance, XML Serialization, Best Practice Cookbook, Primer
  16. W3C’s work (cont.) • Key Recommendations for PIL – Standard way to represent, at a minimum, three basic entities: (1) a handle (URI) to refer to an object; (2) a person/entity that the object is attributed to; (3) a processing step done by a person/entity to an object – Mechanism to access provenance-related information addressed by other standards • Licensing information of an object • Digital signature for the object • Digital signature for the provenance records – Standard way for sites to make provenance information about their content available to other parties in a selective manner, and for others to access that information
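
The three basic entities listed on slide 16 can be pictured as a small data model. The following is only a minimal Python sketch under stated assumptions: the class names (`Agent`, `ProcessingStep`, `ProvenanceRecord`), the `custody_chain` helper, and the example URI are hypothetical illustrations, not part of PIL or any W3C deliverable.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Agent:
    """A person or entity that an object is attributed to (hypothetical name)."""
    name: str


@dataclass(frozen=True)
class ProcessingStep:
    """A processing step done by a person/entity to an object."""
    action: str
    performed_by: Agent


@dataclass
class ProvenanceRecord:
    """Provenance for one object, referred to by a URI handle."""
    uri: str
    attributed_to: Agent
    steps: List[ProcessingStep] = field(default_factory=list)

    def custody_chain(self) -> List[str]:
        """Return agent names in the order they handled the object."""
        chain = [self.attributed_to.name]
        for step in self.steps:
            # Skip consecutive steps by the same agent.
            if step.performed_by.name != chain[-1]:
                chain.append(step.performed_by.name)
        return chain


# Example data is invented for illustration only.
record = ProvenanceRecord(
    uri="http://example.org/articles/42",
    attributed_to=Agent("Alice"),
)
record.steps.append(ProcessingStep("edited", Agent("Bob")))
record.steps.append(ProcessingStep("republished", Agent("Carol")))
print(record.custody_chain())  # ['Alice', 'Bob', 'Carol']
```

Keeping the model this small mirrors the stated design goal of a small, extensible core; licensing and signature information would be attached through other standards rather than added to these classes.
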
  17. Review of the Current State-of-the-Art: Representation • Existing Provenance Vocabularies/Ontologies – Dublin Core: “Librarian” vocabulary capturing bibliographical information. – Provenir Ontology: Upper-level ontology for use in SemWeb applications – Provenance Vocabulary: Captures data using the Linked Data principles – Proof Markup Language (PML): “Full-Featured” interlingua that describes basic provenance metadata plus justification and trust information. – Others: Changeset Vocabulary, PREMIS, SWAN Provenance Ontology, Semantic Web Publishing Vocabulary, and WOT Schema • Concrete mapping specified between existing ontologies – The Open Provenance Model (OPM) was chosen as a reference vocabulary since it is a general and broad model that encompasses many aspects of provenance – W3C Incubator Group formally encoded the mappings according to the Simple Knowledge Organization System (SKOS) vocabulary
  18. Review of the Current State-of-the-Art: Implementation • News aggregation scenario – Content tracking (Memetracker, Spinn3r & BlogTracker, influence studies) – Explicit provenance (trackbacks / pingbacks, Twitter’s Retweet) – Licensing (Creative Commons, Google Books Rights Registry) • Disease outbreak scenario – Data provenance (human-readable changelogs, database research) – Workflow provenance (Taverna/Pegasus, Inference Web, ZOOM) – Justification for policy (ad-hoc user effort) • Business Contract scenario – Tracking design (VisTrails) – Computer-aided Design (Design Rationale editor (DRed), IBIS software)
  19. State-of-the-Art (cont.): Gaps • Content – No mechanism to refer to the identity/derivation of an information object – No guidance on granularity for description of complex objects – No common standard for exposing/expressing provenance information – No standard for versioning and publishing updates – No standard to characterize suitability of provenance info for proof • Management – No standard for linking provenance between sites – No guidance on combining existing standards to provide provenance – No guidance for exposing provenance info on the Web – No proven approaches to manage scale – No standard way to ensure only essential non-confidential provenance is released
  20. State-of-the-Art (cont.): More Gaps • Use – No clear understanding of how to relate provenance at different levels of abstraction – No general solutions to understand provenance published on the Web – No standard to enable provenance integration/comparison – No broadly applicable methodology for making trust judgments based on provenance when presented with information of varying quality – No existing mechanism to check compliance with laws, regulations or contracts – No means to resolve conflicts in provenance data
  21. Part 4: DIGITAL PROVENANCE TOOL DEVELOPMENT
  22. Why SemWeb is Perfect for Digital Provenance • Semantic Web Technologies allow data to be shared and reused in a manner that is more flexible and integratable than traditional knowledge representations. • The Web Ontology Language (OWL) allows deeper context to be encoded in the digital provenance metadata, which enables the capture of more complex information in a standard, well-specified format. • With the provenance metadata in a machine-readable format, powerful automated information processing can be applied, which can provide additional provenance knowledge. • By semantically tagging the digital provenance metadata, it can be dynamically linked to supporting (or contradicting) information to provide a more complete chain-of-custody picture.
  23. Why Digital Provenance is Perfect for SemWeb • Provenance helps complete the path to the top of the Semantic Web layer cake and to TBL’s SemWeb nirvana.
  24. Open Source and Standards Compliance • As explained in the National Cyber Leap Year Summit’s Co-Chairs’ Report, establishing standards early in the development process is crucial to achieving the rapid, widespread community acceptance that any digital provenance tool needs to be successful. • Therefore, Digital Provenance tools should comply with and even inform the emerging W3C standards discussed earlier in this presentation. • Furthermore, since digital provenance tools impose an additional time burden on both content developers and end-users, they should be available at little to no cost to further encourage acceptance.
  25. Securing Provenance Metadata • Provenance metadata that is not signed or secured is susceptible to tampering and therefore cannot realistically be trusted. • Confidentiality and integrity controls that are consistent with a wide variety of security models are crucial to creating a successful digital provenance solution.
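
As a rough illustration of why unsigned provenance metadata cannot be trusted, the sketch below attaches an integrity tag to a provenance record and shows that any later change is detected. It is a minimal sketch, not the presenters' design: it assumes a shared secret key and uses an HMAC from the Python standard library, whereas the public-key digital signatures mentioned on slide 16 would need a cryptography library. The function names, record fields, and key are hypothetical.

```python
import hashlib
import hmac
import json


def sign_record(record: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON form of the record."""
    # Sorting keys gives the same bytes for the same logical record.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_record(record: dict, tag: str, key: bytes) -> bool:
    """Check the tag in constant time; False means the record was altered."""
    return hmac.compare_digest(sign_record(record, key), tag)


# Example data and key are invented for illustration only.
key = b"demo-shared-secret"
record = {"uri": "http://example.org/articles/42", "attributed_to": "Alice"}
tag = sign_record(record, key)

print(verify_record(record, tag, key))    # True

tampered = dict(record, attributed_to="Mallory")
print(verify_record(tampered, tag, key))  # False
```

This covers only integrity. The confidentiality controls the slide calls for, such as selective release of provenance, would need separate mechanisms.
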
  26. Additional Design Considerations • It is crucial that any digital provenance tool supports the creation, processing, and rendering of digital provenance metadata at all stages of the content creation lifecycle. • Since users will require provenance information at many different levels of detail, successful digital provenance tools will be configurable to allow content creators and users to create and view the metadata at any granularity level.
  27. Key Takeaways • Provenance is key to the future success of the Web and is the final piece of the Semantic Web puzzle. • The U.S. government has identified digital provenance as one of the important “game-changing” cyber security technologies. • Important W3C work is already underway. • You can start thinking about and incorporating provenance in your application right now.
  28. For More Information • Authors – Greg Joiner, 703-284-1259 – Douglas Reid, 703-284-1291 • National Cyber Leap Year Report – Co-Chairs Report: – Participants’ Ideas Report: • W3C Provenance Interchange Working Group –
  29. Questions