1. Tools for Next Generation of CMS: XML, RDF, & GRDDL
Chimezie Ogbuji (chee-meh)
Cleveland Clinic Foundation
Cardiothoracic Surgery Research
ogbujic@ccf.org / chimezie@gmail.com
2. Background (CT Research Roadmap)
● A large, relational registry for Cardiothoracic procedures
● Relatively small research department with very little software engineering experience
● Traditional CMS and DBMS were insufficient
● Initiated a large effort to convert to a metadata-driven XML / RDF repository (SemanticDB)
● Need to replace a productive, integrated research pipeline
– Data entry, clinical Q&A, patient follow-up, concurrent study management, ...
– 100+ research papers per year
3. Background (Institute of Medicine Proposal)
● The Computer-Based Patient Record: An Essential Technology for Health Care
– ISBN: 0309055326
● An old but very relevant set of requirements from the IOM (still unfulfilled)
● A comprehensive attempt to address all the requirements: technological, clinical, procedural, etc.
● Can be (completely) addressed with Semantic Web architecture, document processing, and “Web 2.0” architecture
4. CPR: Functional Requirements
● Uniform, extensible record content
● (Standard) record formats
● System performance
● Linkages
● Intelligence
● Reporting Capabilities
● Security
● Multi-views
● Accessibility
5. Definitions: KR / CMS
● What is Knowledge Representation (KR)?
● What is a Knowledge Base (KB)?
– A database system which facilitates deductive reasoning over a KR
– Commonly called rule-based systems
● What are Expert Systems?
● What is a Content Management System (CMS)?
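To make "deductive reasoning over a KR" concrete, here is a minimal forward-chaining sketch in Python: hand-written rules are applied to a set of subject-predicate-object facts until no new facts can be derived. The vocabulary ("CABG", "case42", "subClassOf", "type") is hypothetical, and the two rules are illustrative stand-ins, not a full RDFS or expert-system engine.

```python
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Rule = Callable[[Set[Triple]], Iterable[Triple]]


def subclass_transitivity(facts: Set[Triple]) -> Iterable[Triple]:
    # If A subClassOf B and B subClassOf C, then A subClassOf C.
    for a, p1, b in facts:
        if p1 != "subClassOf":
            continue
        for b2, p2, c in facts:
            if p2 == "subClassOf" and b2 == b:
                yield (a, "subClassOf", c)


def type_inheritance(facts: Set[Triple]) -> Iterable[Triple]:
    # If x type A and A subClassOf B, then x type B.
    for x, p1, a in facts:
        if p1 != "type":
            continue
        for a2, p2, b in facts:
            if p2 == "subClassOf" and a2 == a:
                yield (x, "type", b)


def forward_chain(facts: Set[Triple], rules: List[Rule]) -> Set[Triple]:
    """Apply every rule repeatedly until a fixpoint is reached."""
    known = set(facts)
    while True:
        derived = {t for rule in rules for t in rule(known)}
        new = derived - known
        if not new:
            return known
        known |= new


# Hypothetical facts, for illustration only.
facts = {
    ("CABG", "subClassOf", "CardiacSurgery"),
    ("CardiacSurgery", "subClassOf", "Surgery"),
    ("case42", "type", "CABG"),
}
closure = forward_chain(facts, [subclass_transitivity, type_inheritance])
print(sorted(t for t in closure if t[0] == "case42"))
```

The fixpoint loop is the simplest correct strategy; production rule engines use indexing (e.g. Rete-style matching) to avoid re-scanning every fact on each pass.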
7. Content Management System: The What
● The terms CMS and Content Repository are essentially interchangeable
● Modern content repositories are best characterized by JSR 170 / 283
● “... a high-level information management system that is a superset of traditional data repositories”
● Integrated support for the XPath data model is the most prominent feature (native document management)
8. Content Repository Feature Set
● Modern CMS standards cover document management effectively
– Read/write access
– Versioning
– Event monitoring
– Document-level access control
– Concurrent access
– Cross-linking
– Profiles and Document Types
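As a rough illustration of three items on this list (read/write access, versioning, event monitoring), here is a hypothetical in-memory repository in Python. The class and method names are invented for this sketch; it is not the JSR 170 (javax.jcr) API, only a way to show how the features fit together.

```python
from typing import Callable, Dict, List, Optional

Listener = Callable[[str, str], None]  # called as listener(event, path)


class ToyRepository:
    """Hypothetical store illustrating read/write access, versioning,
    and event monitoring. This is NOT the JSR 170 (javax.jcr) API."""

    def __init__(self) -> None:
        self._history: Dict[str, List[str]] = {}
        self._listeners: List[Listener] = []

    def add_listener(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def write(self, path: str, content: str) -> int:
        """Store a new version, notify listeners, return the version number."""
        versions = self._history.setdefault(path, [])
        event = "added" if not versions else "changed"
        versions.append(content)
        for listener in self._listeners:
            listener(event, path)
        return len(versions)

    def read(self, path: str, version: Optional[int] = None) -> str:
        """Read the latest version, or a specific 1-based version."""
        versions = self._history[path]
        return versions[-1] if version is None else versions[version - 1]


events = []
repo = ToyRepository()
repo.add_listener(lambda event, path: events.append((event, path)))
repo.write("/patients/p1.xml", "<patient id='p1'/>")
repo.write("/patients/p1.xml", "<patient id='p1' status='follow-up'/>")
print(repo.read("/patients/p1.xml", version=1))
print(events)
```

Event monitoring is the feature that makes synchronized derived content possible: a listener can react to every write without the writer knowing it exists.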
9. Anatomy of a JSR 170 Implementation
● Apache Jackrabbit
● Component-based
– Content Applications
– Content Repository API
– Implementation
10. Knowledge Bases and CMS
● What about the requirements that expert systems meet?
● Document management and knowledge management systems are historically isolated from each other
● XML & RDF are contemporary manifestations of these methodologies
● They have remained as isolated as their predecessors
● They typically only coincide with regard to syntax
11. XML & RDF: Eating and Having your Cake
● Classic example of where the document-oriented approach falls short:
– Modern EHRs cannot facilitate dynamic research
● Unified infrastructure for document and knowledge management is needed
● One of the earliest examples:
– 4Suite Server version 0.10.0 (December 2000)
● Current state of the art (GRDDL):
– Gleaning Resource Descriptions from Dialects of Languages
12. GRDDL: The Elevator Pitch
● Provides a way to normalize RDF concrete syntaxes
● The problem:
– Many RDF concrete syntaxes (RDF/XML, TriX, RDFa, ...)
– The authoritative concrete syntax (RDF/XML) is not without issues
● The solution:
– Define mappings from XML dialects to RDF graphs
– Use Turing-complete XML pipelines
● English as a second language analogy
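The core idea, a mapping from one XML dialect onto an RDF graph, can be sketched in Python. In GRDDL such a mapping is normally an XSLT transformation; because the Python standard library has no XSLT processor, this sketch uses a hand-written ElementTree function as a stand-in. The dialect namespace, vocabulary URIs, and function name are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespaces, for illustration only.
DIALECT_NS = "http://example.org/ns/patient"
VOCAB = "http://example.org/vocab#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def patient_to_rdf(xml_text: str, doc_uri: str) -> List[Triple]:
    """Map one XML dialect onto RDF triples (the role GRDDL gives a transformation).
    Objects are plain strings; a real graph would distinguish URIs from literals."""
    root = ET.fromstring(xml_text)
    patient = f"{doc_uri}#{root.get('id')}"
    triples = [(patient, RDF_TYPE, VOCAB + "Patient")]
    for i, proc in enumerate(root.findall(f"{{{DIALECT_NS}}}procedure"), start=1):
        node = f"{patient}-procedure-{i}"
        triples.append((patient, VOCAB + "underwent", node))
        triples.append((node, VOCAB + "code", proc.get("code")))
    return triples


record = """<patient xmlns="http://example.org/ns/patient" id="p1">
  <procedure code="CABG"/>
  <procedure code="AVR"/>
</patient>"""
for triple in patient_to_rdf(record, "http://example.org/records/p1.xml"):
    print(triple)
```

The authoring dialect stays whatever is most natural for the document; only the mapping has to know about RDF.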
14. GRDDL: The Components
● Faithful Rendition
– “By specifying a GRDDL transformation, the author of a document states that the transformation will provide a faithful rendition in RDF of information (or some portion of the information) expressed through the XML dialect used in the source document.”
● Various mechanisms for nominating transformations:
– Specific XML attribute, XML Namespaces, HTML Profiles, and XHTML links
● GRDDL-aware agents compute GRDDL results (RDF graphs)
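The first nomination mechanism, a specific XML attribute, can be sketched with the Python standard library. GRDDL defines a `transformation` attribute in the `http://www.w3.org/2003/g/data-view#` namespace whose value is a whitespace-separated list of URI references. This sketch only reads that attribute from the root element and resolves it against a supplied base URI; it ignores `xml:base` and does not handle the namespace-document, HTML profile, or XHTML link mechanisms.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urljoin

# Namespace of the grddl:transformation attribute.
GRDDL_NS = "http://www.w3.org/2003/g/data-view#"


def nominated_transformations(xml_text: str, base_uri: str) -> List[str]:
    """Return absolute URIs listed in grddl:transformation on the root element.
    Only the attribute mechanism is covered; xml:base is not honored."""
    root = ET.fromstring(xml_text)
    value = root.get(f"{{{GRDDL_NS}}}transformation", "")
    # Whitespace-separated URI references, resolved against the base URI.
    return [urljoin(base_uri, ref) for ref in value.split()]


doc = """<record xmlns:grddl="http://www.w3.org/2003/g/data-view#"
        grddl:transformation="record2rdf.xsl http://example.org/xsl/common.xsl"/>"""
print(nominated_transformations(doc, "http://example.org/records/r1.xml"))
```

A GRDDL-aware agent would then fetch and run each nominated transformation and merge the resulting graphs into the GRDDL result.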
15. The CMS Alternative: “Dual Representation”
● Persist XML in synchrony with its faithful rendition
– Changes to the XML trigger calculation and storage of corresponding RDF
● “Dual Representation”
● Implemented by 4Suite Server Document Definitions
● The basis of how we capture patient records with maximum syntactic and semantic expressivity
16. Document Definition
● The document definition is the mapping
– Usually an XSLT document
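Dual representation with a document definition can be sketched as a store whose write path always recomputes the RDF rendition and files it under a graph named by the document's URI. In 4Suite the document definition is usually XSLT; here a plain Python function stands in for it. `DualStore` and `title_definition` are hypothetical names, not the 4Suite API.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]
# A "document definition" reduced to a function: (xml_text, doc_uri) -> triples.
Mapping = Callable[[str, str], Set[Triple]]

DC_TITLE = "http://purl.org/dc/elements/1.1/title"


def title_definition(xml_text: str, doc_uri: str) -> Set[Triple]:
    """Hypothetical document definition: extract only <title> as dc:title."""
    title = ET.fromstring(xml_text).findtext("title")
    return {(doc_uri, DC_TITLE, title)} if title else set()


class DualStore:
    """Each XML write recomputes the RDF rendition and stores it in a
    named graph with the same URI as the document."""

    def __init__(self, definition: Mapping) -> None:
        self.definition = definition
        self.documents: Dict[str, str] = {}
        self.graphs: Dict[str, Set[Triple]] = {}

    def put(self, uri: str, xml_text: str) -> None:
        self.documents[uri] = xml_text
        # Replace (not merge) so the graph never drifts from the XML.
        self.graphs[uri] = set(self.definition(xml_text, uri))

    def delete(self, uri: str) -> None:
        self.documents.pop(uri, None)
        self.graphs.pop(uri, None)


store = DualStore(title_definition)
store.put("/studies/s1.xml", "<study><title>Valve outcomes</title></study>")
store.put("/studies/s1.xml", "<study><title>Valve outcomes, 2nd cohort</title></study>")
print(store.graphs["/studies/s1.xml"])
```

Replacing the whole graph on every write is the simplest way to keep the two representations in synchrony; it trades some recomputation for never having stale triples.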
19. Dual Representation: Advantages
● Maximum expressiveness and versatility of content
● Unified naming convention and access control (more on this later)
● Uniform, concrete RDF syntaxes
– For systems which speak XML fluently (XForms, POX over HTTP, WS-*, etc.)
● Cheap support for XML & RDF content negotiation
● Use of RDF as a semantic index for XML
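Content negotiation becomes cheap because both representations of a URI already exist in the CMS; the server only has to choose one. A simplified Python sketch of that choice follows: it honors q-values but, as an assumption for brevity, ignores `*/*` and `type/*` wildcards. The representation strings are illustrative.

```python
from typing import Dict, Tuple


def negotiate(accept: str, available: Dict[str, str]) -> Tuple[str, str]:
    """Choose the available media type with the highest q-value in Accept.
    Simplified: exact media types only (no */* or type/* wildcards)."""
    best, best_q = None, 0.0
    for part in accept.split(","):
        fields = [f.strip() for f in part.split(";")]
        media, q = fields[0], 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        if media in available and q > best_q:  # q=0 means "not acceptable"
            best, best_q = media, q
    if best is None:
        best = next(iter(available))  # fall back to the first registered form
    return best, available[best]


# One URI, two representations kept in sync by the CMS (content is illustrative).
representations = {
    "application/xml": "<study><title>Valve outcomes</title></study>",
    "text/turtle": '</studies/s1.xml> <http://purl.org/dc/elements/1.1/title> "Valve outcomes" .',
}
print(negotiate("text/turtle;q=0.9, application/xml;q=0.5", representations)[0])
print(negotiate("text/html", representations)[0])
```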
20. Document Definition: Similarities
● GRDDL
● RDDL
– Resource Directory Description Language
– Human-readable descriptive material about a target
– A directory of individual resources related to a target
● Nature and Purpose
● Schema, stylesheet, etc.
– Lives at a namespace URI
● WXS's targetNamespace
● Common theme is a set of definitions for a document or a class of documents
21. Registering a Document to a Class
● Namespace registration works well for the Web (the preferred approach of the W3C TAG)
● What if you don't control the content served from the namespace of an existing vocabulary?
– Atom, DocBook, etc.
● A CMS is better suited for a 'closed' / 'controlled' approach
– Persist membership metadata in the CMS
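The 'controlled' approach can be sketched as a lookup that consults membership metadata stored in the CMS before falling back to the root element's namespace. In this hypothetical Python sketch, definitions are identified by made-up stylesheet names; only the Atom namespace URI is real.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


class DefinitionRegistry:
    """Hypothetical: decide which document definition applies to a document.
    Membership metadata persisted in the CMS wins over namespace registration."""

    def __init__(self) -> None:
        self.by_namespace: Dict[str, str] = {}
        self.by_document: Dict[str, str] = {}

    def register_namespace(self, namespace: str, definition: str) -> None:
        self.by_namespace[namespace] = definition

    def register_document(self, doc_uri: str, definition: str) -> None:
        self.by_document[doc_uri] = definition

    def resolve(self, doc_uri: str, xml_text: str) -> Optional[str]:
        if doc_uri in self.by_document:          # 'closed' / 'controlled' registration
            return self.by_document[doc_uri]
        tag = ET.fromstring(xml_text).tag        # ElementTree tags look like "{ns}local"
        namespace = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
        return self.by_namespace.get(namespace)  # fall back to the root namespace


ATOM_NS = "http://www.w3.org/2005/Atom"
registry = DefinitionRegistry()
registry.register_namespace(ATOM_NS, "atom-generic.xsl")
registry.register_document("/feeds/ctsurg.atom", "ctsurg-feed.xsl")
feed = f'<feed xmlns="{ATOM_NS}"/>'
print(registry.resolve("/feeds/ctsurg.atom", feed))  # per-document registration
print(registry.resolve("/feeds/other.atom", feed))   # namespace fallback
```

This lets a CMS attach a local mapping to Atom or DocBook content without needing control over what the vocabulary's namespace URI serves.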
23. Document and Graph Granularity
● Tying documents to graphs normalizes the content granularity
● Documents and their RDF graphs can be treated uniformly:
– Naming convention
– Targeted querying
– Access control management
26. Controlled Naming Convention: Continued
● RDF Dataset (from SPARQL):
– A default graph plus a collection of named graphs
● The RDF is stored in a graph with the same URI as the XML source document
● When RDF is used as the primary cross-document 'index' you can:
– SELECT ?graph WHERE { GRAPH ?graph { ... } }
– document($graph)/.. XPath ..
● The space compromise (of dual representation) can be further mitigated by only extracting a minimal RDF graph
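The two-step pattern (find graph names with SPARQL, then apply XPath to the document with the same URI) can be sketched in Python without a triple store. Here a dict of graph URI to triples stands in for the RDF dataset, `graphs_matching` plays the role of a query such as `SELECT ?graph WHERE { GRAPH ?graph { ?p ex:code "CABG" } }`, and ElementTree's limited XPath subset stands in for `document($graph)/...`. The predicate URI, documents, and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
CODE = "http://example.org/vocab#code"  # hypothetical predicate


def graphs_matching(dataset: Dict[str, Set[Triple]], s: Optional[str] = None,
                    p: Optional[str] = None, o: Optional[str] = None) -> List[str]:
    """Rough analogue of SELECT ?graph WHERE { GRAPH ?graph { s p o } };
    None plays the role of a variable."""
    def matches(t: Triple) -> bool:
        return all(want is None or want == got for want, got in zip((s, p, o), t))
    return sorted(g for g, triples in dataset.items() if any(map(matches, triples)))


def select_in_documents(documents: Dict[str, str], graph_uris: List[str], path: str) -> List[str]:
    """Analogue of document($graph)/path: the graph name is the XML document's URI.
    ElementTree supports only a limited XPath subset."""
    return [el.text for g in graph_uris for el in ET.fromstring(documents[g]).findall(path)]


documents = {
    "/patients/p1.xml": "<patient><surgeon>Dr. A</surgeon></patient>",
    "/patients/p2.xml": "<patient><surgeon>Dr. B</surgeon></patient>",
}
dataset = {
    "/patients/p1.xml": {("/patients/p1.xml#p", CODE, "CABG")},
    "/patients/p2.xml": {("/patients/p2.xml#p", CODE, "AVR")},
}
hits = graphs_matching(dataset, p=CODE, o="CABG")
print(hits, select_in_documents(documents, hits, "surgeon"))
```

Because the minimal graph only needs enough triples to locate documents, everything else can stay in the XML and be reached through the shared URI.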
27. Uniform Access Control for XML/RDF CMS
● Traditionally, Access Control Lists (ACLs) are associated with an object
– Example: a file or directory in a filesystem
● Assign document / graph ACLs to a single URI
– Certain users / groups can query the RDF but cannot read the XML
– De-identification of EHRs: HIPAA
● The 4Suite repository supports unified XML/RDF ACLs
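A unified ACL can be sketched as one entry per URI carrying separate permissions for reading the XML and querying the RDF graph of the same name. The permission names "read-xml" and "query-rdf", the group names, and the class are hypothetical; whether the queryable graph is actually de-identified depends on what the document definition extracts, not on the ACL itself.

```python
from typing import Dict, List, Set

# Hypothetical permission names: one URI, two kinds of access.
READ_XML = "read-xml"
QUERY_RDF = "query-rdf"


class UnifiedACL:
    """One ACL per URI covering both the XML document and its named graph."""

    def __init__(self) -> None:
        self._acl: Dict[str, Dict[str, Set[str]]] = {}  # uri -> permission -> principals

    def grant(self, uri: str, permission: str, principal: str) -> None:
        self._acl.setdefault(uri, {}).setdefault(permission, set()).add(principal)

    def allowed(self, uri: str, permission: str, principals: Set[str]) -> bool:
        """True if any of the caller's principals (user or groups) holds the permission."""
        return bool(self._acl.get(uri, {}).get(permission, set()) & principals)

    def queryable_graphs(self, uris: List[str], principals: Set[str]) -> List[str]:
        """Restrict the named graphs a query may see to those the caller can query."""
        return [u for u in uris if self.allowed(u, QUERY_RDF, principals)]


acl = UnifiedACL()
acl.grant("/patients/p1.xml", READ_XML, "clinicians")
acl.grant("/patients/p1.xml", QUERY_RDF, "clinicians")
acl.grant("/patients/p1.xml", QUERY_RDF, "researchers")  # graph only, no XML
print(acl.allowed("/patients/p1.xml", QUERY_RDF, {"alice", "researchers"}))  # True
print(acl.allowed("/patients/p1.xml", READ_XML, {"alice", "researchers"}))   # False
```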
28. Going Forward
● The SPARQL RDF dataset needs to be generalized
– There is a long list of representation problems solved by a formal named graph specification
● RDF graphs need to be first-class objects in CMS
● Build a common Content Repository API for XML / RDF on the JSR 170 / 283 foundation
● Where do the 4Suite Repository API and JSR 170 / 283 overlap?
● How do we generalize Document Definitions?
30. Primary Takeaways
● We need to stop thinking of XML & RDF as mutually exclusive solutions to similar problems
● CMS standards are needed for the next generation of semantic / rich web applications
● These standards can preemptively level the landscape of toolkits in this space
31. References
● D. Nuescheler et al., JSR 170: Content Repository for Java Technology API
– http://jcp.org/en/jsr/detail?id=170
● D. Connolly, Gleaning Resource Descriptions from Dialects of Languages (GRDDL)
– http://www.w3.org/TR/grddl/
● J. Borden, T. Bray, Resource Directory Description Language
– http://www.rddl.org/
● E. Prud'hommeaux, A. Seaborne, SPARQL Query Language for RDF
– http://www.w3.org/TR/rdf-sparql-query/
● Fourthought Inc., 4Suite
– http://4Suite.org