GRDDL: The Why, What, How, and Where


Published on

Published in: Technology, Education
  • Be the first to comment

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

GRDDL: The Why, What, How, and Where

  1. 1. GRDDL The Why, What, How, and Where Chimezie Ogbuji Cleveland Clinic Foundation
  2. 2. GRDDL: The Acronym  Gleaning  Resource  Descriptions (from)  Dialects (of)  Language  Rather long and intimidating
  3. 3. GRDDL: By Deconstruction  Wordnet Definition of Glean: ◦ (gather, as of natural products) ◦ Synonyms: reap, harvest.  Resource Description Framework (RDF) ◦ Logical assertions  Dialects of Language ◦ XML document families (XHTML, for instance)
  4. 4. GRDDL: By Analogy GRDDL can be thought of as a protocol for sowing semantics in web content for later harvest.
  5. 5. The Why  Vast amount of latent semantics in markup <span>Chimezie Ogbuji<span>  Web content today is primarily built for human consumption  Text indexing will only get you so far for document retrieval  If machines are meant to harvest RDF from documents, reproducible protocols are needed
  6. 6. The Why (Cont.)  Microformats, eRDF, and RDFa  Specific to a particular family of documents  XHTML and HTML  If the goal is machine consumption, the bar needs to be raised beyond XHTML
  7. 7. The Why (Cont.)  It seems easy to forget that XHTML is indeed an XML dialect  You would think the (X) would make that obvious  What was needed was a standard way to harvest RDF that is applicable to all XML dialects
  8. 8. The What  Faithful rendition  Transformations  GRDDL result  Source documents  GRDDL-aware Agents
  9. 9. Faithful Rendition “By specifying a GRDDL transformation, the author of a document states that the transformation will provide a faithful rendition in RDF of information (or some portion of the information) expressed through the XML dialect used in the source document.”  Licenses an author-certified interpretation of an XML document  A powerful paradigm for messaging  See David Booths “RDF and SOA” 
  10. 10. GRDDL Transformations  Functions that take an XML document and return an RDF graph  Transformations can be written in any particular language  The “reference” transformation language is XSLT  “[XSLT1] is the format most widely supported by GRDDL- aware agents as of this writing […] is specifically designed to express XML to XML transformations and has some good safety characteristics”
  11. 11. Other Transformation Languages  “.. technically Javascript, C, or virtually any other programming language may be used to express transformations for GRDDL”  However, these transformations need to be deterministic in order to ensure the result is a faithful rendition  Hence, they must be functions
  12. 12. GRDDL Result  The result of applying the transformation is an RDF serialization  The RDF graph that corresponds to the serialization is a GRDDL result of the original document  The “reference” result format is RDF/XML  Other formats can be used (Turtle, N3,etc.)
  13. 13. GRDDL Source Documents  The class of documents for which GRDDL defines a way to extract a result graph:  XML Documents  XML Namespace Documents  Valid XHTML  XHTML Profiles
  14. 14. GRDDL Source Documents
  15. 15. GRDDL: XML Documents  GRDDL Namespace (grddl prefix)  transformation attribute <?xml version=“1.0” encoding=“UTF-8”?> <root xmlns:grddl='’ grddl:transformation=“.. path to transform ..”> … XML content .. </root>
  16. 16. Namespace Documents “Transformations can be associated not only with individual documents but also with whole dialects that share an XML namespace”  A GRDDL source document lives at the location of the namespace URI of the root element (the namespace document)  The GRDDL result of the namespace document has a statement of the form: ?nsDoc grddl:namespaceTransformation ?txDoc • txDoc is the location of a transformation applicable to such XML documents
  17. 17. Valid XHTML Documents <html xmlns=""> <head profile=""> <title>Some Document</title> <link rel="transformation" href=”.. path to transformation .. " /> ... </head> … </html>  Refers to the GRDDL XHTML profile  Licenses the interpretation of rel=“transformation” links
  18. 18. XHTML Profiles “Adding a GRDDL profileTransformation assertion to a profile document is much like adding a namespaceTransformation assertion to a namespace document”  A GRDDL source document lives at the location of the profile URI an XHTML document  The GRDDL result of the profile document has a statement of the form: ?profileDoc grddl:profileTransformation ?txDoc • txDoc is the location of a transformation applicable to such XML documents
  19. 19. The How  GRDDL builds on existing XML & RDF standards  An implementation mostly needs to orchestrate:  Parsing of data representations  Resolving representations from web locations  The necessary XML processing to peek into and harvest RDF from the various sources  The highly recursive nature of GRDDL 
  20. 20. Technological Overlap
  21. 21. Anatomy of a GRDDL Implementation:  A reference implementation from scratch  650 LOC  RDFLib, 4Suite-XML, and Python control logic  A layered approach  Core module that handles transformations  One module per source type stacked on top of the core  A top layer that orchestrates the recursion and identification of which ‘class’ a source document belongs to
  22. 22. Core
  23. 23. Component Stack
  24. 24. The Where  GRDDL services online:  (Stuff in, triples out)  (W3C GRDDL Service)  Primary GRDDL implementations:  Redland   Virtuoso  GRDDL Reader for Jena  RDFa is most common GRDDL source content format in the wild
  25. 25. Hidden Value Proposition  Supports separation of concerns:  XML for messaging, data collection, structural validation  RDF for Expressive assertions, inference, etc.  A way to invest in data richness and accessibility
  26. 26. GRDDL Usecases  Embedding scheduling assertions on personal pages  Using GRDDL for extracting RDF from XML medical record documents  Cleveland Clinic use case (clinical research)  Aggregating web-based product reviews  Embedding web service descriptions  Adding semantic assertions to XML schemas  Embedding semantic assertions to Wikis