XSLT+SPARQL: Scripting the Semantic Web with SPARQL embedded into XSLT stylesheets
1. XSLT+SPARQL: Scripting the
Semantic Web with SPARQL
embedded into XSLT stylesheets
Diego Berrueta, Jose E. Labra and Ivan Herman
CTIC Foundation / Universidad de Oviedo / CWI
Scripting for the Semantic Web (SFSW’08)
Tenerife, 02/Jun/2008
3. The problem
Many applications need to transform
RDF into markup (e.g. XHTML)
In the XML world, we have XSLT (XML
transformations)
... what do we have in the RDF world?
4. Why can't I use XSLT?
RDF/XML is complex, messy,
cumbersome ⇒ non-standard XML
serializations are required
XPath expressions (and patterns) are
designed for trees, not graphs
XSLT is designed for a different data
model
6. RDF is not XML, therefore...
I will not use XSLT to transform RDF/XML
I will not use XSLT to transform RDF/XML
I will not use XSLT to transform RDF/XML
I will not use XSLT to transform RDF/XML
I will not use XSLT to transform RDF/XML
I will not use XSLT to transform RDF/XML
I will not use XSLT to transform RDF/XML
I will not use XSLT to transform RDF/XML
8. Related work
TriX, RDFXSLT: “alternative” RDF
syntaxes in XML
RDF Twig, TreeHugger: add intelligence
to XSLT processors
Topia: XSLT functions to query Sesame
using pre-SPARQL languages
XSPARQL: unification of SPARQL
and XQuery in a single language
9. Scripting RDF transformations
Many scripting languages have RDF
APIs, but they’re not standard
Coding transformations in scripting
languages is messy (e.g. Vapour)
10. XSLT+SPARQL
Two sets of XPath functions to query
RDF models using SPARQL
Intended to be used in @select
expressions of XSLT stylesheets
Return plain, standard XML easily
tractable with XSLT
12. Basic functions
sparql:sparql(query [, documentUrl, ...])
sparql:sparqlEndpoint(query, endpointUrl)
Execute SPARQL queries locally or
remotely (endpoint)
Model can be extended with “FROM”
clauses
13. Advanced functions (I)
Retrieving and parsing RDF data is
expensive
Advanced functions designed for:
‣ Efficiently query the same model
multiple times
‣ Create custom models
programmatically
15. Advanced functions (III)
sparql’:readModel(documentUrl, syntax)
sparql’:readModel(nodeset)
Create models by parsing:
1. A document retrieved from the web
2. An XML subtree (from the XSLT
input document or even from the
XSLT itself)
18. Two implementations
1. Java-based extension to Apache Xalan
using Jena
Exploits the extensibility of XSLT
2. Pure XSLT implementation
Incomplete, XSLT “document()”
cannot perform conneg
19. Applications of XSLT+SPARQL
Transform RDF data for presentation in
XHTML, SVG...
Generate reports beyond SPARQL
capabilities
Develop scripts that retrieve and operate
with RDF data from the (semantic) web
21. Advanced examples
Querying the DBpedia endpoint
HTML displays of SKOS thesauri
Alphabetic display
Systematic display (tree-like)
“Spider” agent for LoD
22. Conclusions and future work
XSLT+SPARQL overcomes the
limitations of XSLT to process RDF
XSLT+SPARQL can be used to write
declarative scripts for the semantic web
Future extensions: inference support
23. Thank you for your
attention
diego.berrueta@fundacionctic.org
Editor's Notes
Good morning. My name is Diego Berrueta and I’ll present an idea to extend XSLT with some functions that allow developers to query RDF graphs using SPARQL.
These are the main points that will be covered in this presentation. I’ll start by introducing the motivation for this work.
We have observed that a number of semantic web applications need a means to transform RDF data to other formats. The semantic web is all about finding RDF data on the web, combining it, querying it, reasoning with it, but in the end, the data must be presented to the user. In the XML world, there is a wonderful technology called XSLT (or “XML Transformations”), which is a W3C standard designed to take some data in XML and transform it. The result is usually another XML file, but it can also be plain text. But when it comes to RDF, what is the equivalent of XSLT?
Hey, wait a moment! Some of you may think that we don’t need a technology equivalent to XSLT in the RDF world because we can write RDF in XML. That’s the purpose of the RDF/XML syntax, isn’t it? If we want to transform RDF data, we can serialize it as RDF/XML and apply XSLT to it. Well, there are some problems with this idea. In the first place, the RDF/XML syntax is very complex: there are many different ways to serialize even the simplest RDF graph. As XSLT is syntax-driven, this means that it is incredibly difficult to write an XSLT stylesheet to transform data in RDF/XML. In the second place, XSLT internally uses XPath to select fragments of the input data. But XPath was designed to work on tree structures, not on graphs. Actually, what we have here is an impedance mismatch between a tool that was designed to work on data structured as trees, and the RDF data model, which is a more general data structure, a graph. Therefore, let me emphasize this point:
... even if there is a syntax to serialize RDF as XML, this doesn’t mean that RDF *is* XML. It is not. Therefore, although at first glance it may seem feasible, in practice we cannot use XML tools for RDF data. In particular, XSLT is not a good option to transform RDF graphs.
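To make the "syntax-driven" problem concrete, here is a minimal sketch in Python (standard library only). It is not part of XSLT+SPARQL. The two documents and the example.org URI are made up for illustration. Both are valid RDF/XML for the same two triples, yet a path expression written against the first serialization finds nothing in the second.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# Serialization A: typed node element with a property element.
DOC_A = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="http://example.org/alice">
    <foaf:name>Alice</foaf:name>
  </foaf:Person>
</rdf:RDF>"""

# Serialization B: same graph, written with rdf:Description,
# an explicit rdf:type and a property attribute.
DOC_B = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://example.org/alice" foaf:name="Alice">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
  </rdf:Description>
</rdf:RDF>"""


def names_by_path(document):
    """Select person names with a path tied to serialization A."""
    root = ET.fromstring(document)
    return [e.text for e in root.findall("foaf:Person/foaf:name", NS)]


names_a = names_by_path(DOC_A)  # the path matches this layout
names_b = names_by_path(DOC_B)  # same triples, but the path misses them
print(names_a)
print(names_b)
```

A stylesheet that wanted to be robust would need a separate pattern for every allowed layout, which is the difficulty described above.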
Now, let’s take a look at how others are tackling this problem. There are some proposals to create other serialization syntaxes for RDF in XML which are simpler than RDF/XML. Unfortunately, they are not standard, and they are very verbose. Another approach is to add some intelligence to the XSLT processors, so that when they evaluate an XPath expression against an RDF graph, they flatten the graph to a dynamic tree. The third proposal here is the closest to our work, but it has been made obsolete by SPARQL. Finally, Axel Polleres and others have recently proposed a clever method to unify SPARQL and XQuery in a single language, and they will present their work tomorrow in the main track, so if you’re interested, I recommend that you attend their presentation. By the way, their paper is one of the candidates for the best paper award.
Finally, there is another way to transform the RDF data. We can write the transformation logic using our favorite scripting language: PHP, Python, whatever. But this way is not without problems. On the one hand, there is no standardized API to access RDF data yet. Compare this with the situation in the XML world, where they have DOM and SAX. On the other hand, coding transformation logic in conventional scripting languages usually leads to messy programs. This is one of the reasons that make XSLT so popular in the XML world. Most people don’t want to write transformation logic in Java or Python.
Therefore, let me introduce you to XSLT+SPARQL. The idea is quite simple: we defined two small sets of XPath extension functions that allow developers to query RDF models using SPARQL. These functions are intended to be used in the “select” attributes of XSLT stylesheets. In this way, instead of selecting fragments of the XML input data, the developer can select fragments of an RDF graph. These functions return very simple XML documents that contain the result of the SPARQL query. For this purpose, we use...
... the very simple XML syntax that the W3C has defined for SPARQL query results. The result of a SELECT query in SPARQL is a table with bindings for the query variables. In this syntax, each row of the table produces a “result” element, and each column produces a variable binding. In summary, we took three W3C technologies, namely XSLT, SPARQL, and the XML syntax for SPARQL results, and we defined two sets of functions that bridge between RDF and XML.
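The row-and-column structure of that results syntax can be sketched as follows. This is a minimal Python illustration using only the standard library. The sample document and its values are made up, and the helper name `parse_results` is hypothetical. Only the element names and the namespace come from the W3C format.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C SPARQL Query Results XML Format, in Clark notation.
SR = "{http://www.w3.org/2005/sparql-results#}"

# Made-up sample: two rows, the second one has no binding for ?mbox.
SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="name"/>
    <variable name="mbox"/>
  </head>
  <results>
    <result>
      <binding name="name"><literal>Alice</literal></binding>
      <binding name="mbox"><uri>mailto:alice@example.org</uri></binding>
    </result>
    <result>
      <binding name="name"><literal>Bob</literal></binding>
    </result>
  </results>
</sparql>"""


def parse_results(document):
    """Return (variable names, list of rows); each row maps variable -> value."""
    root = ET.fromstring(document)
    variables = [v.get("name") for v in root.findall(f"{SR}head/{SR}variable")]
    rows = []
    for result in root.findall(f"{SR}results/{SR}result"):
        row = {}
        for binding in result.findall(f"{SR}binding"):
            value = binding[0]  # the <uri>, <literal> or <bnode> child
            row[binding.get("name")] = value.text
        rows.append(row)
    return variables, rows


variables, rows = parse_results(SAMPLE)
print(variables)
print(rows)
```

Because the layout is this regular, an XSLT template can match “result” elements directly, with none of the variability of RDF/XML.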
As this is a workshop for developers, I’ll spend some time describing these functions, that is, I’ll explain the XSLT+SPARQL API. The first set of functions, which we call “basic functions”, contains just these two functions. They allow the developer to execute a query and they return the results. The difference between them is that the first one retrieves the RDF data from anywhere on the web and executes the query locally, while the second one uses a SPARQL endpoint to execute the query remotely. The first one can be used, for instance, to query the contents of a FOAF file, while the second one can be used against the DBpedia endpoint.
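For the remote case, the work on the client side is small: under the SPARQL protocol, a query can be sent as the `query` parameter of an HTTP GET request. The sketch below only builds that request URL and does not contact the network. The helper name `endpoint_request_url`, the query text and the endpoint address are illustrative assumptions, not taken from the XSLT+SPARQL implementation.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def endpoint_request_url(endpoint_url, query):
    """Build the GET URL that carries a SPARQL query to an endpoint."""
    return endpoint_url + "?" + urlencode({"query": query})


# Illustrative query and endpoint address (assumptions).
QUERY = "SELECT ?city WHERE { ?city a <http://dbpedia.org/ontology/City> } LIMIT 10"
url = endpoint_request_url("http://dbpedia.org/sparql", QUERY)

# Round trip: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
print(decoded == QUERY)
```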
The second set of functions are the advanced ones. They allow developers to write more efficient programs by avoiding repeated parsing of the same RDF graph, and they also make it possible to build custom RDF models by merging information from different sources.
The first function of the advanced set can parse a string that contains serialized RDF data. This function does not return the results of any query, but a handler to an in-memory model.
There are two other functions to read RDF data and create handlers. The one that receives a URL fetches the document and parses it as RDF. The last one parses an XSLT nodeset as if it were RDF/XML. This nodeset can be a fragment of the XSLT input document, but it can even be a part of the XSLT stylesheet.
This fourth function has the ability to merge two or more in-memory RDF models, identified by their handlers, into a new one. It allows the developer to build custom RDF models by picking and joining different pieces, which are parsed with the three previous functions.
Finally, the last function executes a SPARQL query against an in-memory RDF model. Note that this function neither parses any file nor retrieves data from the web. Therefore, it is much quicker than the two query functions we described in the basic set of functions.
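The handle-based design (parse once, merge, then query many times) can be illustrated with a toy in Python. This is only a sketch of the idea and not the Jena-backed implementation. The function names, the one-triple-per-line input syntax and the pattern matcher that stands in for SPARQL are all assumptions, and blank nodes are ignored, so merging reduces to a plain set union.

```python
from itertools import count

_models = {}              # handle -> frozenset of (subject, predicate, object)
_next_handle = count(1)


def _store(triples):
    """Keep a model in memory and return an opaque handle to it."""
    handle = next(_next_handle)
    _models[handle] = frozenset(triples)
    return handle


def read_model(text):
    """Parse once: one whitespace-separated triple per line (toy syntax)."""
    triples = []
    for line in text.strip().splitlines():
        s, p, o = line.split(None, 2)
        triples.append((s, p, o.strip()))
    return _store(triples)


def merge_models(*handles):
    """Build a new model as the union of existing ones."""
    merged = set()
    for handle in handles:
        merged |= _models[handle]
    return _store(merged)


def match(handle, s=None, p=None, o=None):
    """Query an in-memory model; None is a wildcard. No parsing, no I/O."""
    return sorted(
        t for t in _models[handle]
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


friends = read_model("""
ivan knows alice
ivan knows bob
""")
names = read_model("""
alice name Alice
bob name Bob
""")
merged = merge_models(friends, names)

# Several queries against the same handle, without parsing anything again.
known = [o for _, _, o in match(merged, s="ivan", p="knows")]
known_names = [match(merged, s=person, p="name")[0][2] for person in known]
print(known, known_names)
```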
How can you use these functions? We have two implementations. The first one is written in Java, and it uses the extensibility mechanism supported by the XSLT language, namely, the ability to define new user functions in a new namespace. Our code uses the Jena library to load and query RDF, and it is specific to Apache Xalan (which is an XSLT processor), although it should be easily portable to other XSLT processors. In parallel, we have a partial implementation of the basic functions written in pure XSLT. The main limitation of this portable implementation is that the document() function of XSLT lacks the ability to perform content negotiation.
The main application of XSLT+SPARQL is the transformation of RDF data to other formats, mainly to XML, and in particular, to presentational formats such as XHTML and SVG. In this role, XSLT+SPARQL works as the last step of a Semantic Web application. But there are other possibilities. For instance, the current version of SPARQL is somewhat limited with respect to its ability to generate reports, especially if you compare it with SQL. With XSLT+SPARQL, however, it is possible to compute aggregates and group the results of a query. Finally, it is also possible to use XSLT+SPARQL as a language to implement some simple scripts for the semantic web.
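The kind of post-processing meant here, grouping and aggregating rows after the query has run, looks like this in a minimal Python sketch. The rows are made-up stand-ins for query results and `group_by` is a hypothetical helper. In a stylesheet the same step would be written with XSLT grouping techniques instead.

```python
from collections import defaultdict

# Made-up rows, shaped like the bindings of a SELECT ?country ?city query.
rows = [
    {"country": "Germany", "city": "Berlin"},
    {"country": "Germany", "city": "Hamburg"},
    {"country": "Spain", "city": "Oviedo"},
]


def group_by(rows, variable):
    """Group result rows by the value bound to one variable."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[variable]].append(row)
    return dict(groups)


groups = group_by(rows, "country")
# Aggregate: number of cities per country, computed outside the query.
counts = {country: len(members) for country, members in groups.items()}
print(counts)
```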
In this example, we embed a SPARQL query within an XSLT stylesheet. This query simply fetches the FOAF file of Ivan Herman and returns the list of his friends, possibly with their mailbox and the URI of their webpage. The query call sits inside the select attribute of an apply-templates element, so the XSLT processor will search for a template that matches the root element of the results. We can write one and continue the processing of the query results, for instance, to render an HTML table with the information.
This is a simple example, but we have some others which are more complex and can provide better insight into the features of XSLT+SPARQL. For instance, we created one that does something similar to the example of Ivan’s friends, but it uses the DBpedia SPARQL endpoint to get any kind of data, for instance, a list of German cities. Other examples can produce two kinds of ISO standardized displays for thesauri from SKOS data. One of them is simply an alphabetical listing, but the second one is a hierarchical display that looks like a tree. To build it, we simply used recursive XSLT templates. Finally, we wrote a “spider” agent that has the ability to retrieve data from the web on demand. For instance, if Ivan’s FOAF file does not contain the names of his friends, the script can dereference the URIs and progressively build a richer RDF model.
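The recursion behind the tree-like display can be sketched in a few lines of Python, where a recursive function plays the role of the recursive template. The concept names and the `render_tree` helper are made up for illustration, and the sketch assumes the broader/narrower hierarchy has no cycles.

```python
def render_tree(narrower, concept, depth=0):
    """Return indented lines for a concept and, recursively, its narrower concepts."""
    lines = ["  " * depth + concept]
    for child in sorted(narrower.get(concept, [])):
        lines.extend(render_tree(narrower, child, depth + 1))
    return lines


# Made-up (concept, broader concept) pairs, as a query over SKOS data might return.
broader_pairs = [("mammals", "animals"), ("birds", "animals"), ("cats", "mammals")]

# Invert the pairs into a parent -> children index.
narrower = {}
for child, parent in broader_pairs:
    narrower.setdefault(parent, []).append(child)

tree = render_tree(narrower, "animals")
print("\n".join(tree))
```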
We can summarize two conclusions. Firstly, our work can overcome the limitations of XSLT to process RDF data, regardless of its serialization format. Note that we can address RDF data from XSLT+SPARQL even if it is available in other syntaxes, such as N3. Secondly, these functions give you, the developers of scripts for the semantic web, a new language to write such scripts. And this platform still has the potential to grow, for instance, by introducing functions to perform reasoning and inference.