Tools for Next Generation of CMS: XML, RDF, & GRDDL


Published in: Technology

  1. 1. Tools for Next Generation of CMS: XML, RDF, & GRDDL Chimezie Ogbuji (chee-meh) Cleveland Clinic Foundation Cardiothoracic Surgery Research
  2. 2. Background (CT Research Roadmap) ● A large, relational registry for Cardiothoracic procedures ● Relatively small research department with very little software engineering experience ● Traditional CMS and DBMS were insufficient ● Initiated a large effort to convert to a metadata-driven XML / RDF repository (SemanticDB) ● Need to replace a productive, integrated research pipeline – Data entry, clinical Q&A, patient follow-up, concurrent study management, ... – 100+ research papers per year
  3. 3. Background (Institute of Medicine Proposal) ● The Computer-Based Patient Record: An Essential Technology for Health Care – ISBN: 0309055326 ● Old but very relevant set of requirements by the IOM (still unfulfilled) ● A comprehensive attempt to address all the requirements: technological, clinical, procedural, etc. ● Can be (completely) addressed with Semantic Web architecture, document processing, and “Web 2.0” architecture
  4. 4. CPR: Functional Requirements ● Uniform, extensible record content ● (Standard) record formats ● System performance ● Linkages ● Intelligence ● Reporting Capabilities ● Security ● Multi-views ● Accessibility
  5. 5. Definitions: KR / CMS ● What is Knowledge Representation (KR)? ● What is a Knowledge Base (KB)?: – A database system which facilitates deductive reasoning over a KR – Commonly called Rule-based Systems ● What are Expert Systems? ● What is a Content Management System (CMS)?
  6. 6. Knowledge Representation ● Older ideas at corners, newer ideas along sides (Credit: Conrad Barski, M.D.)
  7. 7. Content Management System: The What ● The terms CMS and Content Repository are essentially interchangeable ● Modern content repositories are best characterized by JSR 170 / 283 ● “.. a high-level information management system that is a superset of traditional data repositories” ● Integrated support for the XPath data model is the most prominent feature (native document management)
  8. 8. Content Repository Feature Set ● Modern CMS standards cover document management effectively – Read/write access – Versioning – Event monitoring – Document-level access control – Concurrent access – Cross-linking – Profiles and Document Types
  9. 9. Anatomy of a JSR 170 Implementation ● Apache Jackrabbit ● Component-based – Content Applications – Content Repository API – Implementation
  10. 10. Knowledge Bases and CMS ● What of the requirements that Expert Systems meet? ● Document management and knowledge management systems are historically isolated from each other ● XML & RDF are contemporary manifestations of these methodologies ● They have remained as isolated as their predecessors ● They typically only coincide with regard to syntax
  11. 11. XML & RDF: Eating and Having your Cake ● Classic example of where the document-oriented approach falls short: – Modern EHR cannot facilitate dynamic research ● Unified infrastructure for document and knowledge management is needed ● One of the earliest examples: – 4Suite Server version 0.10.0 (December 2000) ● Current state of the art (GRDDL): – Gleaning Resource Descriptions from Dialects of Language
  12. 12. GRDDL: The Elevator Pitch ● Provides a way to normalize RDF concrete syntaxes ● The problem: – Many RDF concrete syntaxes (RDF/XML, TriX, RDFa, ...) – The authoritative concrete syntax is not without issues ● The solution: – Define mappings from XML dialects to RDF graphs – Use Turing-complete XML pipelines ● English as a second language analogy
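The "mapping from an XML dialect to an RDF graph" idea can be sketched in a few lines. This is only an illustration: a plain Python function stands in for the XSLT that a real GRDDL transformation would normally be, and the `patient` dialect, the `EX` vocabulary namespace, and the `glean_triples` name are all assumptions made up for the example, not anything from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML dialect, invented for illustration.
PATIENT_XML = """<patient id="p1">
  <name>Jane Doe</name>
  <procedure code="CABG"/>
</patient>"""

EX = "http://example.org/vocab#"  # assumed vocabulary namespace


def glean_triples(xml_text, base_uri):
    """Map one XML document to a set of (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text)
    subject = base_uri + "#" + root.get("id")
    triples = set()
    name = root.findtext("name")
    if name is not None:
        triples.add((subject, EX + "name", name))
    # One triple per <procedure> child, using its code attribute as the object.
    for proc in root.findall("procedure"):
        triples.add((subject, EX + "procedure", proc.get("code")))
    return triples


triples = glean_triples(PATIENT_XML, "http://example.org/records/p1.xml")
for triple in sorted(triples):
    print(triple)
```

Whatever the source dialect looks like, the output has the same shape (a set of triples), which is the sense in which the mapping normalizes the syntax.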
  13. 13. The GRDDL Picture
  14. 14. GRDDL: The Components ● Faithful Rendition – “By specifying a GRDDL transformation, the author of a document states that the transformation will provide a faithful rendition in RDF of information (or some portion of the information) expressed through the XML dialect used in the source document.” ● Various mechanisms for nominating transformations: – Specific XML attribute, XML Namespaces, HTML Profiles, and XHTML links ● GRDDL-aware agents compute GRDDL results (RDF graphs)
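As a rough sketch of the first nomination mechanism (the XML attribute), a GRDDL-aware agent might start by reading the `grddl:transformation` attribute from the root element, whose value is a whitespace-separated list of transformation references. The sample document, the `example.org` stylesheet URLs, and the `nominated_transformations` helper are hypothetical; fetching and running the transformations is left out.

```python
import xml.etree.ElementTree as ET

# Namespace of the grddl:transformation attribute.
GRDDL_NS = "http://www.w3.org/2003/g/data-view#"

# Hypothetical source document nominating two transformations.
DOC = """<record xmlns:grddl="http://www.w3.org/2003/g/data-view#"
        grddl:transformation="http://example.org/a.xsl http://example.org/b.xsl">
  <title>Example</title>
</record>"""


def nominated_transformations(xml_text):
    """Return the transformation references named on the root element."""
    root = ET.fromstring(xml_text)
    # ElementTree exposes namespaced attributes as "{namespace}localname".
    value = root.get("{%s}transformation" % GRDDL_NS, "")
    return value.split()


print(nominated_transformations(DOC))
```

A document without the attribute simply yields an empty list, so the agent can fall back to the other mechanisms (namespace documents, HTML profiles, links).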
  15. 15. The CMS Alternative: “Dual Representation” ● Persist XML in synchrony with its faithful rendition – Changes to the XML trigger calculation and storage of corresponding RDF ● “Dual Representation” ● Implemented by 4Suite Server Document Definitions ● The basis of how we capture patient records with maximum syntactic and semantic expressivity
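A minimal in-memory sketch of the dual-representation idea, assuming a mapping function in place of an XSLT document definition. This is not the 4Suite API: `DualRepository`, `title_mapping`, and the `example.org` URIs are hypothetical names used only to show XML writes triggering recomputation of the corresponding RDF.

```python
import xml.etree.ElementTree as ET


class DualRepository:
    """Toy store that keeps each XML document and its derived triples in step."""

    def __init__(self, mapping):
        self.mapping = mapping   # callable: (root element, uri) -> set of triples
        self.documents = {}      # uri -> XML text
        self.graphs = {}         # uri -> set of triples

    def put(self, uri, xml_text):
        root = ET.fromstring(xml_text)  # raises on malformed XML before anything is stored
        self.documents[uri] = xml_text
        # Every write recomputes the RDF, replacing any earlier graph for this URI.
        self.graphs[uri] = self.mapping(root, uri)

    def delete(self, uri):
        self.documents.pop(uri, None)
        self.graphs.pop(uri, None)


def title_mapping(root, uri):
    """Hypothetical document definition: expose <title> as a Dublin Core title."""
    title = root.findtext("title")
    if title is None:
        return set()
    return {(uri, "http://purl.org/dc/elements/1.1/title", title)}


repo = DualRepository(title_mapping)
repo.put("http://example.org/doc/1", "<doc><title>First draft</title></doc>")
repo.put("http://example.org/doc/1", "<doc><title>Final</title></doc>")
print(repo.graphs["http://example.org/doc/1"])
```

After the second `put`, the graph holds only the triple derived from the latest XML, which is the synchrony the slide describes.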
  16. 16. Document Definition ● The document definition is the mapping – Usually an XSLT document
  17. 17. Content Repository Architecture
  18. 18. Overlap between Content Repository APIs
  19. 19. Dual Representation: Advantages ● Maximum expressiveness and versatility of content ● Unified naming convention and access control (more on this later) ● Uniform, concrete RDF syntaxes – For systems which speak XML fluently (XForms, POX over HTTP, WS-*, etc.) ● Cheap support for XML & RDF content negotiation ● Use of RDF as a semantic index for XML
  20. 20. Document Definition: Similarities ● GRDDL ● RDDL – Resource Directory Description Language – Human-readable descriptive material about a target – A directory of individual resources related to a target ● Nature and Purpose ● Schema, stylesheet, etc. – Lives at a namespace URI ● WXS's targetNamespace ● Common theme is a set of definitions for a document or a class of documents
  21. 21. Registering a Document to a Class ● Namespace registration works well for the web (preferred approach of W3C TAG) ● What if you don't control the content served from the namespace of an existing vocabulary? – Atom, Docbook, etc. ● A CMS is better suited for a 'closed' / 'controlled' approach – Persist membership metadata in the CMS
  22. 22. SemanticDB and Dual Representation
  23. 23. Document and Graph Granularity ● Tying documents to graphs normalizes the content granularity ● Documents and their RDF graphs can be treated uniformly: – Naming convention – Targeted querying – Access control management
  24. 24. JSR Fine-Grained Control
  25. 25. 'Controlled' Naming Convention
  26. 26. Controlled Naming Convention: Continued ● RDF Dataset (from SPARQL): – A collection of named graphs ● The RDF is stored in a graph with the same URI as the XML source document ● When RDF is used as the primary cross-document 'index' you can: – SELECT ?graph WHERE { GRAPH ?graph { ... } } – document($graph)/.. XPath .. ● The space compromise (of dual representation) can be further mitigated by only extracting a minimal RDF graph
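The two-step pattern above (find graphs with a query, then address the XML documents of the same name) can be sketched without a SPARQL engine. Here a dict of triple sets stands in for the RDF dataset and ElementTree's path lookup stands in for `document($graph)/...`; the data, the `EX` namespace, and both helper functions are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

EX = "http://example.org/vocab#"  # assumed vocabulary namespace

# XML documents keyed by URI (hypothetical content).
documents = {
    "http://example.org/rec/1": "<patient><name>Ann</name><procedure code='CABG'/></patient>",
    "http://example.org/rec/2": "<patient><name>Bob</name><procedure code='AVR'/></patient>",
}

# Named graphs: each graph shares the URI of the document it was derived from.
dataset = {
    "http://example.org/rec/1": {("http://example.org/rec/1#p", EX + "procedure", "CABG")},
    "http://example.org/rec/2": {("http://example.org/rec/2#p", EX + "procedure", "AVR")},
}


def graphs_matching(dataset, predicate, obj):
    """Rough analogue of SELECT ?graph WHERE { GRAPH ?graph { ?s <predicate> obj } }."""
    return sorted(
        uri for uri, triples in dataset.items()
        if any(p == predicate and o == obj for _, p, o in triples)
    )


def names_for(graph_uris):
    """Rough analogue of document($graph)/patient/name for each matching graph."""
    return [ET.fromstring(documents[uri]).findtext("name") for uri in graph_uris]


hits = graphs_matching(dataset, EX + "procedure", "CABG")
names = names_for(hits)
print(hits, names)
```

Because the graph name doubles as the document URI, no separate lookup table is needed to get from a query hit back to the source XML.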
  27. 27. Uniform Access Control for XML/RDF CMS ● Traditionally, Access Control Lists are associated with an object – Example: a file or directory in a filesystem ● Assign document / graph ACLs to a single URI – Certain users / groups can query the RDF but cannot read the XML – De-identification of EHR: HIPAA ● The 4Suite repository supports unified XML/RDF ACL
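One way to picture a single ACL per URI guarding both representations is the sketch below. It does not reflect the 4Suite ACL model; the `GuardedStore` class, the `read-xml` and `query-rdf` permission names, and the user names are hypothetical.

```python
class AccessDenied(Exception):
    """Raised when a user lacks the requested permission on a URI."""


class GuardedStore:
    """Toy store where one ACL, keyed by URI, guards both XML and RDF."""

    def __init__(self):
        self.documents = {}   # uri -> XML text
        self.graphs = {}      # uri -> set of triples
        self.acl = {}         # uri -> {permission name: set of user names}

    def add(self, uri, xml_text, triples, acl):
        self.documents[uri] = xml_text
        self.graphs[uri] = set(triples)
        self.acl[uri] = acl

    def _check(self, user, uri, permission):
        allowed = self.acl.get(uri, {}).get(permission, set())
        if user not in allowed:
            raise AccessDenied(f"{user} lacks {permission} on {uri}")

    def read_xml(self, user, uri):
        self._check(user, uri, "read-xml")
        return self.documents[uri]

    def query_rdf(self, user, uri):
        self._check(user, uri, "query-rdf")
        return self.graphs[uri]


URI = "http://example.org/rec/1"
store = GuardedStore()
store.add(
    URI,
    "<patient><name>Ann</name></patient>",
    {(URI + "#p", "http://example.org/vocab#procedure", "CABG")},
    {"read-xml": {"clinician"}, "query-rdf": {"clinician", "researcher"}},
)
```

With this setup a researcher can call `store.query_rdf("researcher", URI)` to see the (de-identified) triples, while `store.read_xml("researcher", URI)` raises `AccessDenied`.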
  28. 28. Going Forward ● The SPARQL RDF dataset needs to be generalized – There is a long list of representation problems solved by a formal named graph specification ● RDF graphs need to be first-class objects in CMS ● Build a common Content Repository API for XML / RDF on the JSR 170 / 283 foundation ● Where do the 4Suite Repository API and JSR 170 / 283 overlap? ● How do we generalize Document Definitions?
  29. 29. A Proposal for XML/RDF CMS
  30. 30. Primary Takeaways ● We need to stop thinking of XML & RDF as mutually exclusive solutions to similar problems ● CMS standards are needed for the next generation of semantic / rich web applications ● These standards can preemptively level the landscape of toolkits in this space
  31. 31. References ● D. Nuescheler et al., JSR 170: Content Repository for Java ● D. Connolly, Gleaning Resource Descriptions from Dialects of Language ● J. Borden, T. Bray, Resource Directory Description Language ● E. Prud'hommeaux, A. Seaborne, SPARQL Query Language for RDF ● Fourthought Inc., 4Suite