Tools for Next Generation of CMS: XML, RDF, & GRDDL

Chimezie Ogbuji (chee-meh)
Cleveland Clinic Foundation, Cardiothoracic Surgery Research
ogbujic@ccf.org / chimezie@gmail.com

Background (CT Research Roadmap)
●   A large, relational registry for Cardiothoracic procedures
●   Relatively small research department with very little software engineering experience
●   Traditional CMS and DBMS were insufficient
●   Initiated a large effort to convert to a metadata-driven XML / RDF repository (SemanticDB)
●   Need to replace a productive, integrated research pipeline
    –   Data entry, clinical Q&A, patient follow-up, concurrent study management, ...
    –   100+ research papers per year

Background (Institute of Medicine Proposal)
●   The Computer-Based Patient Record: An Essential Technology for Health Care
    –   ISBN: 0309055326
●   Old but very relevant set of requirements by the IOM (still unfulfilled)
●   A comprehensive attempt to address all the requirements: technological, clinical, procedural, etc.
●   Can be (completely) addressed with Semantic Web architecture, document processing, and “Web 2.0” architecture

CPR: Functional Requirements
●   Uniform, extensible record content
●   (Standard) record formats
●   System performance
●   Linkages
●   Intelligence
●   Reporting capabilities
●   Security
●   Multi-views
●   Accessibility

Definitions: KR / CMS
●   What is Knowledge Representation (KR)?
●   What is a Knowledge Base (KB)?
    –   A database system which facilitates deductive reasoning over a KR
    –   Commonly called Rule-based Systems
●   What are Expert Systems?
●   What is a Content Management System (CMS)?

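To make the KB definition concrete: a deductive database derives new facts from stored ones by applying rules until nothing new follows. The sketch below is a toy forward-chaining loop over triples. The predicate names (`type`, `subClassOf`), the resource names, and the rule set are assumptions chosen for illustration, not the reasoner used in SemanticDB.

```python
from typing import Callable, Iterable, Set, Tuple

# Facts are (subject, predicate, object) triples; predicate names are hypothetical.
Triple = Tuple[str, str, str]
Rule = Callable[[Set[Triple]], Iterable[Triple]]


def subclass_transitivity(facts: Set[Triple]) -> Iterable[Triple]:
    # If A subClassOf B and B subClassOf C, then A subClassOf C.
    for (a, p1, b) in facts:
        if p1 != "subClassOf":
            continue
        for (b2, p2, c) in facts:
            if p2 == "subClassOf" and b2 == b:
                yield (a, "subClassOf", c)


def type_inheritance(facts: Set[Triple]) -> Iterable[Triple]:
    # If x is of type A and A subClassOf B, then x is of type B.
    for (x, p1, a) in facts:
        if p1 != "type":
            continue
        for (a2, p2, b) in facts:
            if p2 == "subClassOf" and a2 == a:
                yield (x, "type", b)


def forward_chain(facts: Set[Triple], rules: Iterable[Rule]) -> Set[Triple]:
    """Apply every rule repeatedly until no new triples appear (a fixpoint)."""
    rule_list = list(rules)
    closure = set(facts)
    while True:
        new = {t for rule in rule_list for t in rule(closure)} - closure
        if not new:
            return closure
        closure |= new


facts = {
    ("CABG", "subClassOf", "CardiacSurgery"),
    ("CardiacSurgery", "subClassOf", "Surgery"),
    ("proc42", "type", "CABG"),
}
closure = forward_chain(facts, [subclass_transitivity, type_inheritance])
```

Here `closure` contains the derived fact `("proc42", "type", "Surgery")`, which was never stored explicitly.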
Knowledge Representation
●   Older ideas at corners, newer ideas along sides (Credit: Conrad Barski, M.D.)

Content Management System: The What
●   The terms CMS and Content Repository are essentially interchangeable
●   Modern content repositories are best characterized by JSR 170 / 283
●   “... a high-level information management system that is a superset of traditional data repositories”
●   Integrated support for the XPath data model is the most prominent feature (native document management)

Content Repository Feature Set
●   Modern CMS standards cover document management effectively
    –   Read/write access
    –   Versioning
    –   Event monitoring
    –   Document-level access control
    –   Concurrent access
    –   Cross-linking
    –   Profiles and Document Types

Anatomy of a JSR 170 Implementation
●   Apache Jackrabbit
●   Component-based
    –   Content Applications
    –   Content Repository API
    –   Implementation

Knowledge Bases and CMS
●   What of the requirements that Expert Systems meet?
●   Document management and knowledge management systems are historically isolated from each other
●   XML & RDF are contemporary manifestations of these methodologies
●   They have remained as isolated as their predecessors
●   They typically only coincide with regard to syntax

XML & RDF: Eating and Having your Cake
●   Classic example of where the document-oriented approach falls short:
    –   Modern EHR cannot facilitate dynamic research
●   Unified infrastructure for document and knowledge management is needed
●   One of the earliest examples:
    –   4Suite Server version 0.10.0 (December 2000)
●   Current state of the art (GRDDL):
    –   Gleaning Resource Descriptions from Dialects of Language

GRDDL: The Elevator Pitch
●   Provides a way to normalize RDF concrete syntaxes
●   The problem:
    –   Many RDF concrete syntaxes (RDF/XML, TriX, RDFa, ...)
    –   The authoritative concrete syntax (RDF/XML) is not without issues
●   The solution:
    –   Define mappings from XML dialects to RDF graphs
    –   Use Turing-complete XML pipelines
●   English as a second language analogy

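The "define mappings from XML dialects to RDF graphs" step can be sketched as follows. GRDDL normally expresses such a mapping as an XSLT transformation; because Python's standard library has no XSLT processor, this sketch swaps in a plain Python function over a hypothetical patient dialect. The namespace `http://example.org/ns/patient#`, the element and attribute names, and the helper names are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical dialect namespace and record base URI, for illustration only.
NS = "http://example.org/ns/patient#"
BASE = "http://example.org/record/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, str]


def literal(value: str) -> str:
    # Minimal N-Triples string escaping: backslashes, double quotes, newlines.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def glean(xml_text: str) -> List[Triple]:
    """Map one patient document to RDF terms (stands in for a GRDDL XSLT)."""
    root = ET.fromstring(xml_text)
    pid = root.get("id")
    patient = f"<{BASE}{pid}>"
    triples: List[Triple] = [(patient, f"<{RDF_TYPE}>", f"<{NS}Patient>")]
    for i, proc in enumerate(root.findall(f"{{{NS}}}procedure")):
        # Each <procedure> becomes its own resource linked from the patient.
        node = f"<{BASE}{pid}/procedure/{i}>"
        triples.append((patient, f"<{NS}underwent>", node))
        triples.append((node, f"<{NS}code>", literal(proc.get("code", ""))))
        triples.append((node, f"<{NS}date>", literal(proc.get("date", ""))))
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize already-formatted terms as N-Triples lines."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


doc = ('<patient xmlns="http://example.org/ns/patient#" id="p1">'
       '<procedure code="CABG" date="2007-03-01"/></patient>')
print(to_ntriples(glean(doc)))
```

Whatever the transformation language, the point is the same: the XML stays the authoritative syntax and the RDF graph is computed from it.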
The GRDDL Picture

GRDDL: The Components
●   Faithful Rendition
    –   “By specifying a GRDDL transformation, the author of a document states that the transformation will provide a faithful rendition in RDF of information (or some portion of the information) expressed through the XML dialect used in the source document.”
●   Various mechanisms for nominating transformations:
    –   A specific XML attribute, XML namespaces, HTML profiles, and XHTML links
●   GRDDL-aware agents compute GRDDL results (RDF graphs)

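For the attribute mechanism, GRDDL places a `transformation` attribute in the `http://www.w3.org/2003/g/data-view#` namespace on the root element, holding a whitespace-separated list of transformation URIs. A rough sketch of how an agent might collect those nominations, assuming the document's base URI is already known; the function name is hypothetical, and `xml:base`, namespace documents, HTML profiles, and XHTML links are deliberately ignored.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urljoin

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"


def nominated_transformations(xml_text: str, base_uri: str) -> List[str]:
    """Absolute URIs listed in the root element's grddl:transformation attribute."""
    root = ET.fromstring(xml_text)
    value = root.get(f"{{{GRDDL_NS}}}transformation", "")
    # Each whitespace-separated reference is resolved against the base URI.
    return [urljoin(base_uri, ref) for ref in value.split()]


doc = ('<rec xmlns:grddl="http://www.w3.org/2003/g/data-view#" '
       'grddl:transformation="glean.xsl http://example.org/other.xsl"/>')
print(nominated_transformations(doc, "http://example.org/docs/r1.xml"))
```

A GRDDL-aware agent would then fetch and run each nominated transformation and merge the resulting graphs.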
The CMS Alternative: “Dual Representation”
●   Persist XML in synchrony with its faithful rendition
    –   Changes to the XML trigger calculation and storage of the corresponding RDF
●   “Dual Representation”
●   Implemented by 4Suite Server Document Definitions
●   The basis of how we capture patient records with maximum syntactic and semantic expressivity

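A minimal sketch of the dual-representation idea, assuming an in-memory store: every write of an XML document re-runs its mapping and replaces the RDF graph named by the same URI. `DualStore` and `title_mapping` are hypothetical names, not the 4Suite Server API, and the mapping is a Python function standing in for an XSLT document definition.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]
# Hypothetical signature for a document definition: (document URI, parsed root) -> graph.
DocumentDefinition = Callable[[str, ET.Element], Set[Triple]]


class DualStore:
    """Keeps each XML document and its RDF rendition in sync (illustrative only)."""

    def __init__(self, definition: DocumentDefinition) -> None:
        self.definition = definition
        self.documents: Dict[str, str] = {}
        self.graphs: Dict[str, Set[Triple]] = {}  # named graphs keyed by document URI

    def put(self, uri: str, xml_text: str) -> None:
        # Parse and map before writing, so a failure leaves both representations untouched.
        graph = self.definition(uri, ET.fromstring(xml_text))
        self.documents[uri] = xml_text
        self.graphs[uri] = graph  # the previous rendition is replaced wholesale

    def delete(self, uri: str) -> None:
        self.documents.pop(uri, None)
        self.graphs.pop(uri, None)


def title_mapping(uri: str, root: ET.Element) -> Set[Triple]:
    # Hypothetical minimal definition: only the record's <title> is extracted.
    title = root.findtext("title")
    return {(uri, "dc:title", title)} if title is not None else set()


store = DualStore(title_mapping)
store.put("urn:doc:1", "<record><title>Follow-up</title></record>")
```

Rewriting the document with a new title replaces the graph as well; the two never drift apart because the graph is only ever written as a side effect of `put`.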
Document Definition
●   The document definition is the mapping
    –   Usually an XSLT document

Content Repository Architecture

Overlap between Content Repository APIs

Dual Representation: Advantages
●   Maximum expressiveness and versatility of content
●   Unified naming convention and access control (more on this later)
●   Uniform, concrete RDF syntaxes
    –   For systems which speak XML fluently (XForms, POX over HTTP, WS-*, etc.)
●   Cheap support for XML & RDF content negotiation
●   Use of RDF as a semantic index for XML

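Because each resource already exists in both forms, content negotiation reduces to picking a representation from the client's `Accept` header. The sketch assumes the CMS offers `application/xml` for the document and `application/rdf+xml` for its graph; the parser is deliberately simplified (it honors `q` values and wildcards and ignores other media-type parameters), and the names are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical: the two representations a dual-representation CMS can serve.
OFFERS = ("application/xml", "application/rdf+xml")


def parse_accept(header: str) -> Dict[str, float]:
    """Map each media range in an Accept header to its q-value (default 1.0)."""
    prefs: Dict[str, float] = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs[fields[0].lower()] = q
    return prefs


def negotiate(header: str) -> Optional[str]:
    """Offered type with the highest q-value; ties go to the first offer."""
    prefs = parse_accept(header or "*/*")
    best, best_q = None, 0.0
    for offer in OFFERS:
        major = offer.split("/")[0]
        # Most specific range wins: exact type, then type/*, then */*.
        q = prefs.get(offer, prefs.get(f"{major}/*", prefs.get("*/*", 0.0)))
        if q > best_q:
            best, best_q = offer, q
    return best
```

A `None` result would correspond to an HTTP 406 (Not Acceptable) response.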
Document Definition: Similarities
●   GRDDL
●   RDDL
    –   Resource Directory Description Language
    –   Human-readable descriptive material about a target
    –   A directory of individual resources related to a target
        –   Nature and Purpose
        –   Schema, stylesheet, etc.
    –   Lives at a namespace URI
●   WXS's (W3C XML Schema) targetNamespace
●   Common theme is a set of definitions for a document or a class of documents

Registering a Document to a Class
●   Namespace registration works well for the web (preferred approach of the W3C TAG)
●   What if you don't control the content served from the namespace of an existing vocabulary?
    –   Atom, DocBook, etc.
●   A CMS is better suited for a 'closed' / 'controlled' approach
    –   Persist membership metadata in the CMS

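The 'controlled' approach can be sketched as a lookup that consults membership metadata stored in the CMS first and only then falls back to the root element's namespace. `DefinitionRegistry` and the definition identifiers (`defs/atom-to-rdf`, `defs/docbook-summary`) are hypothetical; the Atom and DocBook namespace URIs are the real ones.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical namespace-based definitions; identifiers are illustrative only.
NAMESPACE_DEFINITIONS: Dict[str, str] = {
    "http://www.w3.org/2005/Atom": "defs/atom-to-rdf",
}


class DefinitionRegistry:
    """Resolve which document definition applies to a stored document."""

    def __init__(self, by_namespace: Dict[str, str]) -> None:
        self.by_namespace = by_namespace
        self.membership: Dict[str, str] = {}  # document URI -> definition, owned by the CMS

    def register(self, doc_uri: str, definition: str) -> None:
        # Membership is CMS metadata, so no control over the vocabulary's namespace is needed.
        self.membership[doc_uri] = definition

    def definition_for(self, doc_uri: str, xml_text: str) -> Optional[str]:
        if doc_uri in self.membership:
            return self.membership[doc_uri]
        # Fall back to the web-style approach: the root element's namespace.
        tag = ET.fromstring(xml_text).tag
        namespace = tag[1:].split("}")[0] if tag.startswith("{") else ""
        return self.by_namespace.get(namespace)


reg = DefinitionRegistry(NAMESPACE_DEFINITIONS)
dbk = '<article xmlns="http://docbook.org/ns/docbook"/>'
reg.register("urn:doc:b", "defs/docbook-summary")
```

The DocBook document gets a definition even though nothing is published at the DocBook namespace URI for this purpose.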
SemanticDB and Dual Representation

Document and Graph Granularity
●   Tying documents to graphs normalizes the content granularity
●   Documents and their RDF graphs can be treated uniformly:
    –   Naming convention
    –   Targeted querying
    –   Access control management

JSR Fine-Grained Control

'Controlled' Naming Convention

Controlled Naming Convention: Continued
●   RDF Dataset (from SPARQL):
    –   A collection of named graphs
●   The RDF is stored in a graph with the same URI as the XML source document
●   When RDF is used as the primary cross-document 'index', you can:
    –   SELECT ?graph WHERE { GRAPH ?graph { ... } }
    –   document($graph)/.. XPath ..
●   The space compromise (of dual representation) can be further mitigated by only extracting a minimal RDF graph

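The two-step pattern on this slide (find graphs with SPARQL, then drill into the same-named documents with XPath) can be approximated with the standard library. In this sketch a dataset is a dict from graph URI to a set of triples, `None` in a pattern plays the role of a SPARQL variable, and ElementTree's limited XPath subset stands in for full XPath. Function names, URIs, and data are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None acts as a variable


def graphs_matching(dataset: Dict[str, Set[Triple]], pattern: Pattern) -> List[str]:
    """Graph names containing a matching triple, roughly
    SELECT ?graph WHERE { GRAPH ?graph { pattern } }."""
    def matches(t: Triple) -> bool:
        return all(p is None or p == v for p, v in zip(pattern, t))
    return sorted(g for g, triples in dataset.items() if any(matches(t) for t in triples))


def drill_down(documents: Dict[str, str], graph_names: List[str], path: str) -> Dict[str, List[str]]:
    """Follow each graph name to the XML document sharing its URI and
    evaluate an ElementTree path (a subset of XPath) there."""
    return {g: [e.text or "" for e in ET.fromstring(documents[g]).findall(path)]
            for g in graph_names}


documents = {
    "urn:rec:1": "<record><dx>AF</dx><note>post-op day 2</note></record>",
    "urn:rec:2": "<record><dx>CAD</dx><note>stable</note></record>",
}
dataset = {  # minimal graphs: only the diagnosis is extracted
    "urn:rec:1": {("urn:pt:7", "ex:diagnosis", "ex:AtrialFibrillation")},
    "urn:rec:2": {("urn:pt:8", "ex:diagnosis", "ex:CoronaryArteryDisease")},
}
hits = graphs_matching(dataset, (None, "ex:diagnosis", "ex:AtrialFibrillation"))
notes = drill_down(documents, hits, "note")
```

The graphs carry only the diagnosis while the free-text note stays in the XML, which is the kind of minimal extraction the slide suggests for reducing the space cost.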
Uniform Access Control for XML/RDF CMS
●   Traditionally, Access Control Lists are associated with an object
    –   Example: a file or directory in a filesystem
●   Assign document / graph ACLs to a single URI
    –   Certain users / groups can query the RDF but cannot read the XML
    –   De-identification of EHR: HIPAA
●   The 4Suite repository supports unified XML/RDF ACLs

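A sketch of a single ACL per URI that distinguishes querying the graph from reading the document, so that, for example, researchers may query a minimal graph without reading the full record. The permission names and classes are assumptions, not the 4Suite repository's ACL model, and whether a graph is actually de-identified depends entirely on its mapping. Access is denied when a URI has no ACL.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical permission names for the two faces of one resource.
QUERY_RDF = "query-rdf"
READ_XML = "read-xml"


@dataclass
class ACL:
    grants: Dict[str, Set[str]] = field(default_factory=dict)  # principal -> permissions

    def allows(self, principals: Set[str], permission: str) -> bool:
        return any(permission in self.grants.get(p, set()) for p in principals)


class Repository:
    """One ACL per URI governs both the XML document and its same-named graph."""

    def __init__(self) -> None:
        self.acls: Dict[str, ACL] = {}

    def set_acl(self, uri: str, acl: ACL) -> None:
        self.acls[uri] = acl

    def can_query_graph(self, uri: str, principals: Set[str]) -> bool:
        acl = self.acls.get(uri)
        return acl is not None and acl.allows(principals, QUERY_RDF)

    def can_read_document(self, uri: str, principals: Set[str]) -> bool:
        acl = self.acls.get(uri)
        return acl is not None and acl.allows(principals, READ_XML)


repo = Repository()
repo.set_acl("urn:rec:1", ACL({"researchers": {QUERY_RDF},
                               "surgeons": {QUERY_RDF, READ_XML}}))
```

Keeping both permissions on one object means a record's document and graph can never end up with inconsistent access rules.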
Going Forward
●   The SPARQL RDF dataset needs to be generalized
    –   There is a long list of representation problems solved by a formal named graph specification
●   RDF graphs need to be first-class objects in CMS
●   Build a common Content Repository API for XML / RDF on the JSR 170 / 283 foundation
●   Where do the 4Suite Repository API and JSR 170 / 283 overlap?
●   How do we generalize Document Definitions?

A Proposal for XML/RDF CMS

Primary Takeaways
●   We need to stop thinking of XML & RDF as mutually exclusive solutions to similar problems
●   CMS standards are needed for the next generation of semantic / rich web applications
●   These standards can preemptively level the landscape of toolkits in this space

References
●   D. Nuescheler et al., JSR 170: Content Repository for Java
    –   http://jcp.org/en/jsr/detail?id=170
●   D. Connolly, Gleaning Resource Descriptions from Dialects of Language
    –   http://www.w3.org/TR/grddl/
●   J. Borden, T. Bray, Resource Directory Description Language
    –   http://www.rddl.org/
●   E. Prud'hommeaux, A. Seaborne, SPARQL Query Language for RDF
    –   http://www.w3.org/TR/rdf-sparql-query/
●   Fourthought Inc., 4Suite
    –   http://4Suite.org