GRDDL: The Why, What, How, and Where

Published in: Technology, Education

  1. GRDDL: The Why, What, How, and Where
     Chimezie Ogbuji, Cleveland Clinic Foundation
  2. GRDDL: The Acronym
     • Gleaning
     • Resource
     • Descriptions (from)
     • Dialects (of)
     • Languages
     • Rather long and intimidating
  3. GRDDL: By Deconstruction
     • WordNet definition of "glean":
       ◦ (gather, as of natural products)
       ◦ Synonyms: reap, harvest
     • Resource Description Framework (RDF)
       ◦ Logical assertions
     • Dialects of Language
       ◦ XML document families (XHTML, for instance)
  4. GRDDL: By Analogy
     GRDDL can be thought of as a protocol for sowing semantics in web content for later harvest.
  5. The Why
     • Vast amount of latent semantics in markup
         <span>Chimezie Ogbuji</span>
     • Web content today is primarily built for human consumption
     • Text indexing will only get you so far for document retrieval
     • If machines are meant to harvest RDF from documents, reproducible protocols are needed
  6. The Why (Cont.)
     • Microformats, eRDF, and RDFa
       ◦ Specific to a particular family of documents
       ◦ XHTML and HTML
     • If the goal is machine consumption, the bar needs to be raised beyond XHTML
  7. The Why (Cont.)
     • It seems easy to forget that XHTML is indeed an XML dialect
       ◦ You would think the (X) would make that obvious
     • What was needed was a standard way to harvest RDF that is applicable to all XML dialects
  8. The What
     • Faithful rendition
     • Transformations
     • GRDDL result
     • Source documents
     • GRDDL-aware agents
  9. Faithful Rendition
     “By specifying a GRDDL transformation, the author of a document states that the transformation will provide a faithful rendition in RDF of information (or some portion of the information) expressed through the XML dialect used in the source document.”
     • Licenses an author-certified interpretation of an XML document
     • A powerful paradigm for messaging
       ◦ See David Booth's “RDF and SOA”
       ◦ http://www.w3.org/2007/01/wos-papers/booth
  10. GRDDL Transformations
     • Functions that take an XML document and return an RDF graph
     • Transformations can be written in any language
     • The “reference” transformation language is XSLT
       ◦ “[XSLT1] is the format most widely supported by GRDDL-aware agents as of this writing […] is specifically designed to express XML to XML transformations and has some good safety characteristics”
  11. Other Transformation Languages
     • “.. technically Javascript, C, or virtually any other programming language may be used to express transformations for GRDDL”
     • However, these transformations need to be deterministic in order to ensure the result is a faithful rendition
     • Hence, they must be functions
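The idea of a transformation as a deterministic function from an XML document to an RDF graph can be sketched in Python. This is only an illustration: the `<people>`/`<person>` dialect, the `transform` function, and the `example.org` base URI are assumptions made up for the example, and the graph is modelled as a plain set of (subject, predicate, object) tuples rather than with an RDF library.

```python
import xml.etree.ElementTree as ET

# Real FOAF property URI; everything else below is hypothetical.
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def transform(xml_text, base_uri):
    """Map a hypothetical <people> document to a set of triples.

    The output depends only on the arguments (no clock, no randomness,
    no network access), so the same input always yields the same graph.
    """
    root = ET.fromstring(xml_text)
    triples = set()
    for person in root.iter("person"):
        subject = f"{base_uri}#{person.get('id')}"
        name = person.findtext("name")
        if name is not None:
            triples.add((subject, FOAF_NAME, name.strip()))
    return frozenset(triples)


SOURCE = """<people>
  <person id="co"><name>Chimezie Ogbuji</name></person>
</people>"""

first = transform(SOURCE, "http://example.org/people")
second = transform(SOURCE, "http://example.org/people")
print(first == second)
```

Because the function is pure, running it twice over the same source produces equal graphs, which is the property the slide asks of any transformation language.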
  12. GRDDL Result
     • The result of applying the transformation is an RDF serialization
     • The RDF graph that corresponds to the serialization is a GRDDL result of the original document
     • The “reference” result format is RDF/XML
     • Other formats can be used (Turtle, N3, etc.)
  13. GRDDL Source Documents
     • The class of documents for which GRDDL defines a way to extract a result graph:
       ◦ XML Documents
       ◦ XML Namespace Documents
       ◦ Valid XHTML
       ◦ XHTML Profiles
  14. GRDDL Source Documents
  15. GRDDL: XML Documents
     • GRDDL Namespace (grddl prefix)
         http://www.w3.org/2003/g/data-view#
     • transformation attribute

         <?xml version="1.0" encoding="UTF-8"?>
         <root xmlns:grddl="http://www.w3.org/2003/g/data-view#"
               grddl:transformation=".. path to transform ..">
           … XML content ..
         </root>
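A GRDDL-aware agent could read the `grddl:transformation` attribute roughly as follows. This is a minimal sketch using only the Python standard library; the function name `transformations_for_xml`, the sample document, and the `example.org` URIs are assumptions for illustration. The attribute value is treated as a space-separated list of URI references resolved against the document's base URI.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"


def transformations_for_xml(xml_text, base_uri):
    """Return absolute transformation URIs named on the root element."""
    root = ET.fromstring(xml_text)
    # ElementTree spells namespaced attributes as "{namespace}local-name".
    value = root.get(f"{{{GRDDL_NS}}}transformation", "")
    return [urljoin(base_uri, ref) for ref in value.split()]


# Hypothetical source document with two transformations.
DOC = """<root xmlns:grddl="http://www.w3.org/2003/g/data-view#"
      grddl:transformation="to-rdf.xsl http://example.org/other.xsl">
  <item/>
</root>"""

links = transformations_for_xml(DOC, "http://example.org/docs/a.xml")
print(links)
```

A document without the attribute simply yields an empty list, so the caller can fall through to the other source-document rules.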
  16. Namespace Documents
     “Transformations can be associated not only with individual documents but also with whole dialects that share an XML namespace”
     • A GRDDL source document lives at the location of the namespace URI of the root element (the namespace document)
     • The GRDDL result of the namespace document has a statement of the form:
         ?nsDoc grddl:namespaceTransformation ?txDoc
       • txDoc is the location of a transformation applicable to such XML documents
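The namespace-document lookup can be sketched in two steps: find the namespace URI of the root element, then pick the `grddl:namespaceTransformation` statements out of the namespace document's GRDDL result. In this sketch the result is assumed to be already available as a set of tuples; the helper names and the `example.org` URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

NS_TRANSFORMATION = "http://www.w3.org/2003/g/data-view#namespaceTransformation"


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or None if it has none."""
    tag = ET.fromstring(xml_text).tag
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None


def namespace_transformations(ns_uri, ns_result):
    """Select ?txDoc from '?nsDoc grddl:namespaceTransformation ?txDoc'."""
    return sorted(o for (s, p, o) in ns_result
                  if s == ns_uri and p == NS_TRANSFORMATION)


# Hypothetical instance document and namespace-document result.
ORDER = '<po:order xmlns:po="http://example.org/ns/po"><po:id>42</po:id></po:order>'
NS_RESULT = {
    ("http://example.org/ns/po", NS_TRANSFORMATION, "http://example.org/po2rdf.xsl"),
    ("http://example.org/ns/po", "http://purl.org/dc/elements/1.1/title", "PO schema"),
}

ns = root_namespace(ORDER)
txs = namespace_transformations(ns, NS_RESULT)
print(ns, txs)
```

Obtaining `NS_RESULT` in practice means running GRDDL on the namespace document itself, which is where the recursion mentioned later in the talk comes from.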
  17. Valid XHTML Documents

         <html xmlns="http://www.w3.org/1999/xhtml">
           <head profile="http://www.w3.org/2003/g/data-view">
             <title>Some Document</title>
             <link rel="transformation" href=".. path to transformation .." />
             ...
           </head>
           …
         </html>

     • The profile attribute refers to the GRDDL XHTML profile
     • It licenses the interpretation of rel="transformation" links
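The XHTML rule can be sketched the same way: transformation links only count when the `head` element's `profile` attribute includes the GRDDL profile URI. The function name, the sample page, and the `example.org` URIs below are assumptions; the sketch checks both `link` and `a` elements and treats `rel` and `profile` as space-separated token lists.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML = "http://www.w3.org/1999/xhtml"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def transformations_for_xhtml(xhtml_text, base_uri):
    """Return transformation URIs, or [] if the GRDDL profile is absent."""
    root = ET.fromstring(xhtml_text)
    head = root.find(f"{{{XHTML}}}head")
    if head is None or GRDDL_PROFILE not in head.get("profile", "").split():
        return []  # without the profile, rel="transformation" is not licensed
    found = []
    for tag in ("link", "a"):
        for el in root.iter(f"{{{XHTML}}}{tag}"):
            if "transformation" in el.get("rel", "").split() and el.get("href"):
                found.append(urljoin(base_uri, el.get("href")))
    return found


# Hypothetical page; extract.xsl is a made-up transformation name.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Some Document</title>
    <link rel="transformation" href="extract.xsl" />
  </head>
  <body><p>Hello</p></body>
</html>"""

with_profile = transformations_for_xhtml(PAGE, "http://example.org/page.html")
without_profile = transformations_for_xhtml(
    PAGE.replace(' profile="http://www.w3.org/2003/g/data-view"', ""),
    "http://example.org/page.html",
)
print(with_profile, without_profile)
```

The second call shows the role of the profile: the same link is ignored once the profile attribute is removed.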
  18. XHTML Profiles
     “Adding a GRDDL profileTransformation assertion to a profile document is much like adding a namespaceTransformation assertion to a namespace document”
     • A GRDDL source document lives at the location of the profile URI of an XHTML document
     • The GRDDL result of the profile document has a statement of the form:
         ?profileDoc grddl:profileTransformation ?txDoc
       • txDoc is the location of a transformation applicable to such XML documents
  19. The How
     • GRDDL builds on existing XML & RDF standards
     • An implementation mostly needs to orchestrate:
       ◦ Parsing of data representations
       ◦ Resolving representations from web locations
       ◦ The necessary XML processing to peek into and harvest RDF from the various sources
       ◦ The highly recursive nature of GRDDL
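The recursion mentioned above can be illustrated with a toy model. This is not how GRDDL.py or any real implementation is structured: documents are plain dictionaries in an in-memory `WEB`, transformations are Python callables in a `TRANSFORMS` registry, and all URIs are made up. The point is only that computing the result of a document may first require computing the result of its namespace document.

```python
NS_TX = "http://www.w3.org/2003/g/data-view#namespaceTransformation"

# Hypothetical in-memory "web": URI -> already-parsed document description.
WEB = {
    "http://example.org/order.xml": {
        "namespace": "http://example.org/ns/po",
        "transformations": [],  # nothing declared on the document itself
        "data": "order-42",
    },
    "http://example.org/ns/po": {
        "namespace": None,
        "transformations": ["http://example.org/ns2rdf"],
        "data": None,
    },
}

# Hypothetical transformations: (document URI, document) -> set of triples.
TRANSFORMS = {
    "http://example.org/ns2rdf":
        lambda uri, doc: {(uri, NS_TX, "http://example.org/po2rdf")},
    "http://example.org/po2rdf":
        lambda uri, doc: {(uri, "http://example.org/vocab#orderId", doc["data"])},
}


def grddl_result(uri, web, seen=None):
    """Union of the graphs produced by every transformation found for uri."""
    seen = set() if seen is None else seen
    if uri in seen:  # guard against cycles between documents
        return set()
    seen.add(uri)
    doc = web[uri]
    transforms = list(doc["transformations"])
    ns = doc["namespace"]
    if ns in web:
        # Recursive step: the namespace document is itself a source document.
        ns_graph = grddl_result(ns, web, seen)
        transforms += [o for (s, p, o) in ns_graph if s == ns and p == NS_TX]
    graph = set()
    for tx in transforms:
        graph |= TRANSFORMS[tx](uri, doc)
    return graph


result = grddl_result("http://example.org/order.xml", WEB)
print(result)
```

A real agent would replace the dictionary lookups with HTTP retrieval and XML parsing, and would add the XHTML profile rules alongside the namespace rule.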
  20. Technological Overlap
  21. Anatomy of a GRDDL Implementation: GRDDL.py
     • A reference implementation from scratch
     • 650 LOC
       ◦ RDFLib, 4Suite-XML, and Python control logic
     • A layered approach
       ◦ Core module that handles transformations
       ◦ One module per source type stacked on top of the core
       ◦ A top layer that orchestrates the recursion and identification of which ‘class’ a source document belongs to
  22. GRDDL.py Core
  23. Component Stack
  24. The Where
     • GRDDL services online:
       ◦ http://triplr.org/ (Stuff in, triples out)
       ◦ http://www.w3.org/2007/08/grddl/ (W3C GRDDL Service)
     • Primary GRDDL implementations:
       ◦ Redland
       ◦ GRDDL.py
       ◦ Virtuoso
       ◦ GRDDL Reader for Jena
     • RDFa is the most common GRDDL source content format in the wild
  25. Hidden Value Proposition
     • Supports separation of concerns:
       ◦ XML for messaging, data collection, structural validation
       ◦ RDF for expressive assertions, inference, etc.
     • A way to invest in data richness and accessibility
  26. GRDDL Use Cases
     • Embedding scheduling assertions on personal pages
     • Using GRDDL for extracting RDF from XML medical record documents
       ◦ Cleveland Clinic use case (clinical research)
     • Aggregating web-based product reviews
     • Embedding web service descriptions
     • Adding semantic assertions to XML schemas
     • Embedding semantic assertions in wikis