A Comparison Between Python APIs For RDF Processing


In this note, we present and compare a few Python libraries that can be used to process and query RDF data.

Diana Andreea Gorea and Lucian Bentea
Faculty of Computer Science, “A.I. Cuza” University, Iaşi

Abstract. In this note, we present and compare a few Python libraries that can be used to process and query RDF data.

1 Introduction

The following sections compare three Python libraries for processing and querying RDF data: RDFLib, fyzz and pyrple. The comparison covers triples management, support for SPARQL queries, documentation and developer support, implementation stage and licensing.

2 Triples Management

RDFLib provides an RDF graph container which can be seen as an unordered set of triples:

    $\text{RDF graph} = \{(\text{subject}_i, \text{predicate}_i, \text{object}_i) \,;\, 1 \le i \le N\}$.

The easiest way to add and remove triples is through the add and remove functions, each of which takes the triple as a single tuple argument:

    add((subject, predicate, object))

and:

    remove((subject, predicate, object))

These functions add triples to, and remove them from, the RDF graph held by your application.

RDF triples can also be added to the graph through the parse function, which has the following syntax:

    parse(source, publicID=None, format="xml")

The source parameter specifies the serialized RDF source (for example a file or a URL) that stores the triples, which may be written in RDF/XML, Notation3 or N-Triples. The publicID parameter gives the name of the RDF graph into which the source is parsed. Finally, the format argument, which defaults to xml, must be one of n3, xml or ntriples.
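The set-of-triples view of an RDF graph described above can be sketched in plain Python. This is only an illustration of the container semantics, not RDFLib's implementation: the class name TripleSet is hypothetical, the triples are plain strings rather than RDF terms, and passing each triple as one tuple and silently ignoring the removal of a missing triple are choices made for this sketch.

```python
class TripleSet:
    """Hypothetical stand-in for an RDF graph container: an unordered set of triples."""

    def __init__(self):
        self._triples = set()

    def add(self, triple):
        # A set keeps at most one copy of each (subject, predicate, object) tuple.
        self._triples.add(tuple(triple))

    def remove(self, triple):
        # discard() does nothing if the triple is absent (a choice of this sketch).
        self._triples.discard(tuple(triple))

    def __len__(self):
        return len(self._triples)

    def __contains__(self, triple):
        return tuple(triple) in self._triples


graph = TripleSet()
graph.add(("John", "writes", "books"))
graph.add(("John", "writes", "books"))   # duplicate: the set still holds one copy
graph.add(("John", "reads", "papers"))
graph.remove(("John", "reads", "papers"))
```

After these calls the container holds exactly one triple, which shows why the order and multiplicity of insertions do not matter for a graph defined as a set.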
The fyzz library only provides methods to create SPARQL queries and has no support for triples management.

In pyrple, a triple is implemented through the Triple class, whose constructor takes as arguments the subject, predicate and object of the triple, as instances of the Node class. The following example creates the triple representation of the sentence “John writes books”:

    >>> from rdf import Node, Triple
    >>> subject = Node('"John"')
    >>> predicate = Node('"writes"')
    >>> object = Node('"books"')
    >>> Triple(subject, predicate, object)
    ("John", "writes", "books")
    >>>

The pyrple library does not provide an RDF graph container class, so you must store the triples in Python's built-in containers, e.g. lists. Querying the set of all triples can be achieved using list search methods and the Python regular expression library.

The conclusion for this section is that RDFLib provides the most complete and user-friendly support for storing, adding and removing RDF triples.

3 SPARQL Queries Support

RDFLib provides almost complete support for creating and running SPARQL queries. The only issues encountered and reported by users are the lack of support for negative values and a TypeError raised when a filter of the form FILTER regex('', '', 'i') is used in a call to the SPARQLEvaluate function. Therefore, RDFLib does not allow you to use this type of regular expression in SPARQL queries.

The following example displays the persons known by Timothy Berners-Lee, using the query method inside a for loop.

    >>> from rdflib.Graph import Graph
    >>> from rdflib import Namespace
    >>> gr = Graph()
    >>> gr.parse("http://bigasterisk.com/foaf.rdf")
    >>> gr.parse("http://www.w3.org/People/Berners-Lee/card.rdf")
    >>> for row in gr.query(
    ...     'SELECT ?person WHERE { ?p1 foaf:knows ?p2 . ?p2 foaf:name ?person . }',
    ...     initNs = dict(foaf = Namespace("http://xmlns.com/foaf/0.1/"))
    ... ):
    ...     print "Timothy Berners-Lee knows %s" % row
    ...
    Timothy Berners-Lee knows Ira Fuchs
    Timothy Berners-Lee knows Libby Miller
    Timothy Berners-Lee knows Yolanda Gill
    Timothy Berners-Lee knows Nathan Wilson
    Timothy Berners-Lee knows Dave Beckett
    Timothy Berners-Lee knows Amy van der Hiel
    Timothy Berners-Lee knows Henrik Nielsen
    Timothy Berners-Lee knows Michael Hausenblas
    Timothy Berners-Lee knows Daniel Krech
    Timothy Berners-Lee knows Nigel Shadbolt
    Timothy Berners-Lee knows Jim Hendler
    ...

The fyzz library only allows you to create SPARQL queries, without being able to run them. In order to create a query, you first need to import the parse function

    >>> from fyzz.yappsparser import parse

and then call it as in the following example, which creates a query equivalent to the one used above with RDFLib:

    >>> myquery = parse("""PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    ... SELECT ?person WHERE { ?p1 foaf:knows ?p2 . ?p2 foaf:name ?person . }
    ... """)

You may now display the properties of the myquery object, such as prefixes, where, selected, limit and offset. For example, the next line displays the prefixes of the above query:

    >>> print myquery.prefixes
    {'foaf': 'http://xmlns.com/foaf/0.1/'}

The pyrple library, on the other hand, does not support the SPARQL syntax and thus does not allow you to create and run SPARQL queries in the usual way.

The conclusion is that RDFLib provides the most complete implementation for both creating and running SPARQL queries, as opposed to fyzz, which only allows you to create queries, or pyrple, which provides no support for the SPARQL syntax.

4 Documentation and Support

There is plenty of documentation for RDFLib, although it is very scattered and thus difficult to follow. For someone who really wants to use this library (and has time to search for it), the information is available and well written. However, you will need to consult three sites and do a bit of guesswork to find it, as a summary
is not always available. There is also a lack of tutorials and examples, although the API documentation is sufficient. A list of pending and resolved issues may be found at http://code.google.com/p/rdflib/issues/list. Reported issues are still being worked on, as the last update was made last month.

The documentation for pyrple available on its official site mostly covers the installation procedure; the main documentation, which describes actual usage, may be found in the archive available for download from the author's site. This library provides no support that we are aware of.

The fyzz library is very poorly documented, partly because it is a very recent implementation. This library also provides no support.

Our conclusion is that the documentation resources for RDFLib and pyrple are rather poor, but provide the basic information needed for creating RDF processing applications.

5 Implementation Stage

The last release of RDFLib, version 2.4, dates from 2007. Several bugs have been fixed since then, but there have been no new releases. The bug list is still large and remains open. The latest version can be obtained via svn.

The last release of pyrple dates from 2004.

The fyzz library is the most recent one, dated September 2009, and also the least mature implementation (only two months old).

6 Licensing

The RDFLib library is licensed under the New BSD Licence. We found no licensing information for the pyrple and fyzz libraries.

7 Conclusion

Overall, we found that RDFLib is the best choice of the three libraries investigated in this note (RDFLib, pyrple and fyzz) when creating a Python application that manages RDF triples (adding and removing them) and that creates and runs SPARQL queries.

References

1. The RDFLib official site, http://www.rdflib.net.
2. RDFLib project page with links to issues list and some documentation, http://code.google.com/p/rdflib/.
3. Very well written RDFLib Wiki page, http://en.wikipedia.org/wiki/RDFLib.
4. Very thorough practical usage example for RDFLib, http://oxfordrepo.blogspot.com/2007/12/linking-items-together-by-using-rdf.html.
5. The pyrple official site, http://infomesh.net/pyrple.
6. The fyzz official site, http://www.logilab.org/project/fyzz.