This document discusses tools for next-generation content management systems, including XML, RDF, and GRDDL. It proposes representing content in both XML and its semantic RDF representation (dual representation) to gain the benefits of both formats. This allows document-oriented and knowledge-based approaches to be unified, addressing requirements like those in the Institute of Medicine's computer-based patient record proposal. Benefits include maximum expressiveness, unified naming and access controls, and using RDF as a semantic index for XML content.
Tools for Next Generation of CMS: XML, RDF, & GRDDL
1. Tools for Next Generation of CMS: XML, RDF, & GRDDL
Chimezie Ogbuji (chee-meh)
Cleveland Clinic Foundation
Cardiothoracic Surgery Research
ogbujic@ccf.org / chimezie@gmail.com
2. Background (CT Research Roadmap)
● A large, relational registry for Cardiothoracic procedures
● Relatively small research department with very little software engineering experience
● Traditional CMS and DBMS were insufficient
● Initiated a large effort to convert to a metadata-driven XML / RDF repository (SemanticDB)
● Need to replace a productive, integrated research pipeline
– Data entry, clinical Q&A, patient follow-up, concurrent study management, ...
– 100+ research papers per year
3. Background (Institute of Medicine Proposal)
● The Computer-Based Patient Record: An Essential Technology for Health Care
– ISBN: 0309055326
● An old but very relevant set of requirements by the IOM (still unfulfilled)
● A comprehensive attempt to address all the requirements: technological, clinical, procedural, etc.
● Can be (completely) addressed with Semantic Web architecture, document processing, and "Web 2.0" architecture
4. CPR: Functional Requirements
● Uniform, extensible record content
● (Standard) record formats
● System performance
● Linkages
● Intelligence
● Reporting Capabilities
● Security
● Multi-views
● Accessibility
5. Definitions: KR / CMS
● What is Knowledge Representation (KR)?
● What is a Knowledge Base (KB)?
– A database system which facilitates deductive reasoning over a KR
– Commonly called Rule-based Systems
● What are Expert Systems?
● What is a Content Management System (CMS)?
7. Content Management System: The What
● The terms CMS and Content Repository are essentially interchangeable
● Modern content repositories are best characterized by JSR 170 / 283
● "... a high-level information management system that is a superset of traditional data repositories"
● Integrated support for the XPath data model is the most prominent feature (native document management)
8. Content Repository Feature Set
● Modern CMS standards cover document management effectively
– Read/write access
– Versioning
– Event monitoring
– Document-level access control
– Concurrent access
– Cross-linking
– Profiles and Document Types
9. Anatomy of a JSR 170 Implementation
● Apache Jackrabbit
● Component-based
– Content Applications
– Content Repository API
– Implementation
10. Knowledge Bases and CMS
● What of the requirements that Expert Systems meet?
● Document management and knowledge management systems are historically isolated from each other
● XML & RDF are contemporary manifestations of these methodologies
● They have remained as isolated as their predecessors
● They typically only coincide with regard to syntax
11. XML & RDF: Eating and Having Your Cake
● Classic example of where the document-oriented approach falls short:
– A modern EHR cannot facilitate dynamic research
● Unified infrastructure for document and knowledge management is needed
● One of the earliest examples:
– 4Suite Server version 0.10.0 (December 2000)
● Current state of the art (GRDDL):
– Gleaning Resource Descriptions from Dialects of Languages
12. GRDDL: The Elevator Pitch
● Provides a way to normalize RDF concrete syntaxes
● The problem:
– Many RDF concrete syntaxes (RDF/XML, TriX, RDFa, ...)
– The authoritative concrete syntax is not without issues
● The solution:
– Define mappings from XML dialects to RDF graphs
– Use Turing-complete XML pipelines
● English-as-a-second-language analogy
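The mapping idea can be sketched in a few lines. This is a minimal, illustrative analogue of a GRDDL transformation: a hypothetical XML dialect (the element names and URIs below are invented for the example, not part of any standard) is walked and each record is turned into RDF triples, printed in an N-Triples-like form.

```python
# A minimal sketch of the GRDDL idea: a transformation mapping a small,
# hypothetical XML dialect to an RDF graph (here, a list of triples).
# The dialect, element names, and URIs are illustrative assumptions.
import xml.etree.ElementTree as ET

PATIENT_XML = """
<patients xmlns="http://example.org/dialect#">
  <patient id="p1">
    <name>Alice</name>
    <procedure>CABG</procedure>
  </patient>
</patients>
"""

NS = "{http://example.org/dialect#}"

def faithful_rendition(xml_text):
    """Transform the XML dialect into (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text)
    triples = []
    for p in root.findall(NS + "patient"):
        subject = "http://example.org/patient/" + p.get("id")
        for child in p:
            # Derive a predicate URI from the (namespace-stripped) element name
            predicate = "http://example.org/vocab#" + child.tag.replace(NS, "")
            triples.append((subject, predicate, child.text))
    return triples

triples = faithful_rendition(PATIENT_XML)
for s, p, o in triples:
    print(f'<{s}> <{p}> "{o}" .')   # N-Triples-style output
```

In a real GRDDL deployment this mapping would typically be an XSLT stylesheet rather than procedural code, but the contract is the same: dialect in, RDF graph out.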
14. GRDDL: The Components
● Faithful rendition
– "By specifying a GRDDL transformation, the author of a document states that the transformation will provide a faithful rendition in RDF of information (or some portion of the information) expressed through the XML dialect used in the source document."
● Various mechanisms for nominating transformations:
– A specific XML attribute, XML namespaces, HTML profiles, and XHTML links
● GRDDL-aware agents compute GRDDL results (RDF graphs)
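As a sketch of the first nomination mechanism: GRDDL defines a `transformation` attribute in the `http://www.w3.org/2003/g/data-view#` namespace that a document's root element can carry. The document and stylesheet URL below are illustrative, and a real GRDDL-aware agent would go on to fetch and apply each nominated transformation.

```python
# Sketch of how a GRDDL-aware agent might discover nominated transformations
# via the root-element attribute mechanism. The sample document and the
# stylesheet URL are invented for the example.
import xml.etree.ElementTree as ET

GRDDL_NS = "{http://www.w3.org/2003/g/data-view#}"

DOC = """<report xmlns:grddl="http://www.w3.org/2003/g/data-view#"
        grddl:transformation="http://example.org/report2rdf.xsl">
  <entry>...</entry>
</report>"""

def nominated_transformations(xml_text):
    root = ET.fromstring(xml_text)
    value = root.get(GRDDL_NS + "transformation", "")
    return value.split()   # the attribute may list several space-separated URIs

print(nominated_transformations(DOC))
```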
15. The CMS Alternative: "Dual Representation"
● Persist XML in synchrony with its faithful rendition
– Changes to the XML trigger calculation and storage of the corresponding RDF
● "Dual representation"
● Implemented by 4Suite Server Document Definitions
● The basis of how we capture patient records with maximum syntactic and semantic expressivity
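The trigger mechanism above can be sketched as follows. This is not the 4Suite Document Definition machinery itself, just an assumed, minimal repository in which every write of an XML document recomputes and stores its RDF rendition under the same URI, keeping the two representations in synchrony.

```python
# Illustrative sketch of "dual representation": writing an XML document
# triggers (re)computation and storage of its RDF rendition. The repository
# class and the trivial extraction rule are assumptions for the example.
import xml.etree.ElementTree as ET

class DualRepository:
    def __init__(self, extract):
        self.xml = {}      # URI -> XML source
        self.graphs = {}   # URI -> RDF triples (same URI as the XML document)
        self.extract = extract

    def write(self, uri, xml_text):
        self.xml[uri] = xml_text
        self.graphs[uri] = self.extract(uri, xml_text)  # kept in synchrony

def extract(uri, xml_text):
    """A deliberately trivial mapping: record the root element's name."""
    root = ET.fromstring(xml_text)
    return [(uri, "http://example.org/vocab#rootElement", root.tag)]

repo = DualRepository(extract)
repo.write("http://example.org/docs/p1",
           "<patient><name>Alice</name></patient>")
print(repo.graphs["http://example.org/docs/p1"])
```

A subsequent write to the same URI would simply overwrite both entries, so the graph can never drift out of date with respect to its source document.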
16. Document Definition
● The document definition is the mapping
– Usually an XSLT document
19. Dual Representation: Advantages
● Maximum expressiveness and versatility of content
● Unified naming convention and access control (more on this later)
● Uniform, concrete RDF syntaxes
– For systems which speak XML fluently (XForms, POX over HTTP, WS-*, etc.)
● Cheap support for XML & RDF content negotiation
● Use of RDF as a semantic index for XML
20. Document Definition: Similarities
● GRDDL
● RDDL
– Resource Directory Description Language
– Human-readable descriptive material about a target
– A directory of individual resources related to a target
● Nature and purpose
● Schema, stylesheet, etc.
– Lives at a namespace URI
● WXS's targetNamespace
● The common theme is a set of definitions for a document or a class of documents
21. Registering a Document to a Class
● Namespace registration works well for the web (the preferred approach of the W3C TAG)
● What if you don't control the content served from the namespace of an existing vocabulary?
– Atom, DocBook, etc.
● A CMS is better suited for a 'closed' / 'controlled' approach
– Persist membership metadata in the CMS
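The 'controlled' approach amounts to keeping an explicit registry inside the CMS: instead of keying behaviour off a namespace URI you may not control (as with Atom or DocBook content), each document is registered to a class, and the class determines which document definition applies. The class and URI names below are assumptions for the sketch.

```python
# Sketch of persisting membership metadata in the CMS: document URIs are
# explicitly registered to classes, rather than classified by namespace.
# Class names and URIs are illustrative.
class DocumentClassRegistry:
    def __init__(self):
        self.membership = {}   # document URI -> class name

    def register(self, doc_uri, doc_class):
        self.membership[doc_uri] = doc_class

    def members(self, doc_class):
        """All documents registered to a class (e.g. to apply its document definition)."""
        return [u for u, c in self.membership.items() if c == doc_class]

registry = DocumentClassRegistry()
registry.register("http://example.org/docs/p1", "PatientRecord")
registry.register("http://example.org/docs/feed1", "AtomEntry")
print(registry.members("PatientRecord"))
```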
23. Document and Graph Granularity
● Tying documents to graphs normalizes the content granularity
● Documents and their RDF graphs can be treated uniformly:
– Naming convention
– Targeted querying
– Access control management
26. Controlled Naming Convention: Continued
● RDF dataset (from SPARQL):
– A collection of named graphs
● The RDF is stored in a graph with the same URI as the XML source document
● When RDF is used as the primary cross-document 'index' you can:
– SELECT ?graph WHERE { GRAPH ?graph { ... } }
– document($graph)/.. XPath ..
● The space compromise (of dual representation) can be further mitigated by only extracting a minimal RDF graph
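The query-then-dereference pattern above can be sketched with an in-memory stand-in for a SPARQL RDF dataset. Because each named graph's URI equals its source document's URI (all URIs here are invented for the example), a graph found by querying the RDF 'index' leads straight back to its XML source, the analogue of feeding `?graph` to XPath's `document()` function.

```python
# Sketch of the naming convention: each document's RDF lives in a named
# graph whose URI equals the XML document's URI. The dict-based dataset
# stands in for a SPARQL RDF dataset; URIs and data are illustrative.
dataset = {   # graph URI (== document URI) -> triples
    "http://example.org/docs/p1": [
        ("http://example.org/patient/p1",
         "http://example.org/vocab#procedure", "CABG"),
    ],
}
documents = {  # the same URIs name the XML sources
    "http://example.org/docs/p1":
        "<patient><procedure>CABG</procedure></patient>",
}

def graphs_matching(predicate, obj):
    """Analogue of: SELECT ?graph WHERE { GRAPH ?graph { ?s <pred> <obj> } }"""
    return [g for g, triples in dataset.items()
            if any(p == predicate and o == obj for _, p, o in triples)]

hits = graphs_matching("http://example.org/vocab#procedure", "CABG")
xml_source = documents[hits[0]]   # analogue of document($graph)/.. XPath ..
print(hits, xml_source)
```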
27. Uniform Access Control for XML/RDF CMS
● Traditionally, Access Control Lists are associated with an object
– Example: a file or directory in a filesystem
● Assign document / graph ACLs to a single URI
– Certain users / groups can query the RDF but cannot read the XML
– De-identification of EHR: HIPAA
● The 4Suite repository supports unified XML/RDF ACLs
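A sketch of what a unified ACL might look like (the permission names, groups, and URI are assumptions, not the 4Suite API): one ACL entry per URI, with separate permissions for reading the XML source and querying the RDF rendition, so researchers can query de-identified RDF while the full record stays restricted to clinicians.

```python
# Illustrative sketch of unified XML/RDF access control: a single URI
# carries one ACL with distinct permissions for the XML and RDF views.
# Permission and group names are invented for the example.
class UnifiedACL:
    def __init__(self):
        self.acl = {}   # URI -> {permission -> set of groups}

    def grant(self, uri, permission, group):
        self.acl.setdefault(uri, {}).setdefault(permission, set()).add(group)

    def allowed(self, uri, permission, group):
        return group in self.acl.get(uri, {}).get(permission, set())

acl = UnifiedACL()
uri = "http://example.org/docs/p1"
acl.grant(uri, "query-rdf", "researchers")   # may query de-identified RDF
acl.grant(uri, "read-xml", "clinicians")     # full record stays restricted

print(acl.allowed(uri, "query-rdf", "researchers"))  # True
print(acl.allowed(uri, "read-xml", "researchers"))   # False
```

Because the document and its graph share one URI, a single grant or revocation governs both representations consistently.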
28. Going Forward
● The SPARQL RDF dataset needs to be generalized
– There is a long list of representation problems solved by a formal named graph specification
● RDF graphs need to be first-class objects in CMS
● Build a common Content Repository API for XML / RDF on the JSR 170 / 283 foundation
● Where do the 4Suite Repository API and JSR 170 / 283 overlap?
● How do we generalize Document Definitions?
30. Primary Takeaways
● We need to stop thinking of XML & RDF as mutually exclusive solutions to similar problems
● CMS standards are needed for the next generation of semantic / rich web applications
● These standards can preemptively level the landscape of toolkits in this space
31. References
● D. Nuescheler et al., JSR 170: Content Repository for Java
– http://jcp.org/en/jsr/detail?id=170
● D. Connolly, Gleaning Resource Descriptions from Dialects of Languages (GRDDL)
– http://www.w3.org/TR/grddl/
● J. Borden, T. Bray, Resource Directory Description Language (RDDL)
– http://www.rddl.org/
● E. Prud'hommeaux, A. Seaborne, SPARQL Query Language for RDF
– http://www.w3.org/TR/rdf-sparql-query/
● Fourthought Inc., 4Suite
– http://4Suite.org