



Published in: Education, Technology


RDF processing APIs for Java

Ştefan Apostoaie
Computer Science Faculty, “Al. I. Cuza” University, Iaşi, Romania

Abstract. Several RDF APIs exist for Java; the most popular include JRDF, Jena, and Sesame. In this paper we compare them in terms of triple storage, SPARQL support, programmer support (documentation, IDE integration, learning curve, etc.), performance, interoperability, maturity, and licensing.

Keywords: API, RDF, SPARQL, Java

1. Introduction

There are many RDF APIs for Java, and it is hard to choose among them. Some are simple, clean implementations offering basic RDF support; others are complex and offer full RDF functionality for projects of any size. We compare some of these implementations based on their descriptions, documentation, and small examples.

1.1 Document structure

This document gives a basic description of the following RDF APIs:
- JRDF,
- Jena,
- Sesame.

Each API is analyzed in terms of:
- how the triples are stored,
- support for SPARQL queries,
- programmer support,
- maturity,
- licensing.
2. JRDF [1]

JRDF is an open source RDF framework for Java built around an object-oriented model of RDF graphs, including URIs, literals, and blank nodes. It offers the following features:
- a graph API, including graph comparison and set-based graph operations
- creation and manipulation of Graph objects (Statements, Resources, Nodes, etc.)
- in-memory and disk-based graphs with a standard system-level interface for storing triples
- IoC support (using Spring 2)
- RDF datatypes
- local (where nodes are tied to a graph/store) and global (where they are not) RDF statements
- query handling, including SPARQL support (results, transport, etc.).

JRDF comes in two variants:
1. The JRDF GUI reads RDF/XML and N3 files and queries them using SPARQL. To open it, use:
   java -jar jrdf-gui-0.x.x.jar
2. The JRDF JAR is used to load RDF/XML files. It contains an RDF/XML parser for processing them and offers a Graph interface with methods for adding, removing, and finding triples.

JRDF is a new project, so we do not expect it to be very mature, but because it was designed to reuse features from earlier RDF API implementations such as Jena and Sesame, it offers many features. The current version is 0.5.6, so it may still contain bugs.

The JRDF documentation helps users install and use the API in a few steps. The Javadoc and, for those interested, the source code are also published. There are a few tutorials, but programmers must experiment with the framework themselves to learn all of its features.

JRDF is released under the Apache Software License, Version 1.1, meaning it is free and open source, and any redistribution should include the original JRDF license file.

3. Jena [2]

Jena is a framework that provides a programmatic environment for RDF, RDFS, OWL, and SPARQL.
It includes:
- an RDF API
- reading and writing RDF in RDF/XML, N3, and N-Triples
- an OWL API
- in-memory and persistent storage
- a SPARQL query engine

It provides methods for reading and writing RDF files and for navigating and querying a model. Jena uses ARQ, a query engine that implements SPARQL, for accessing RDF data.

RDF and OWL data are persisted using one of two subsystems, SDB or TDB (separate downloads). SDB provides scalable storage and querying of RDF datasets on conventional SQL databases and is designed specifically to support SPARQL. It supports Microsoft SQL Server 2005, Oracle 10gR2, IBM DB2, PostgreSQL v8, MySQL 5.0, HSQLDB 1.8, H2 1, and Apache Derby 10.2. TDB is a high-performance, non-transactional persistence engine using custom indexing and storage. Of the two, TDB is faster and simpler to set up. Jena also supports RDB for legacy applications, but it is deprecated for new development.

Jena is a relatively old project (it has reached version 2.6), which suggests that it is quite mature and stable. Backward compatibility is also maintained, so a project started on an earlier release can move to a newer version of the framework.

The Jena project web site offers many tutorials and how-tos that guide the programmer through learning Jena. They include examples of how to create a model and how to use RDF readers and writers, typed literals, ARP (An RDF Parser), Schemagen, and many others. Compared to JRDF, the Jena documentation is far more useful and covers much more of the framework's functionality.

Jena is also free and open source and can be copied under simple terms: the original license file must be kept.

4. Sesame [3]

Sesame is an open source RDF framework with support for RDF Schema inferencing and querying.
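
RDF Schema inferencing, as mentioned for Sesame, means deriving statements that the schema implies but that are not stated explicitly. The following is a minimal, library-free sketch of that idea in plain Java. It does not use the Sesame API: the class name RdfsInferenceSketch, the helper triple, and the ex: terms are hypothetical, and only two RDFS entailment rules (subclass transitivity and type propagation) are applied, whereas a real inferencer covers more.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of RDFS-style inference; not the Sesame API. */
final class RdfsInferenceSketch {
    static final String TYPE = "rdf:type";
    static final String SUBCLASS_OF = "rdfs:subClassOf";

    /** A statement is modelled as a (subject, predicate, object) list of three strings. */
    static List<String> triple(String s, String p, String o) {
        return Arrays.asList(s, p, o);
    }

    /** Returns the explicit statements plus everything derivable by the two rules below. */
    static Set<List<String>> infer(Set<List<String>> explicit) {
        Set<List<String>> all = new LinkedHashSet<>(explicit);
        boolean changed = true;
        while (changed) {                        // repeat until no rule adds anything new
            changed = false;
            List<List<String>> derived = new ArrayList<>();
            for (List<String> a : all) {
                if (!a.get(1).equals(SUBCLASS_OF)) {
                    continue;                    // both rules start from "A subClassOf B"
                }
                for (List<String> b : all) {
                    // Rule 1: A subClassOf B, B subClassOf C  =>  A subClassOf C
                    if (b.get(1).equals(SUBCLASS_OF) && b.get(0).equals(a.get(2))) {
                        derived.add(triple(a.get(0), SUBCLASS_OF, b.get(2)));
                    }
                    // Rule 2: x type A, A subClassOf B  =>  x type B
                    if (b.get(1).equals(TYPE) && b.get(2).equals(a.get(0))) {
                        derived.add(triple(b.get(0), TYPE, a.get(2)));
                    }
                }
            }
            for (List<String> t : derived) {
                if (all.add(t)) {
                    changed = true;
                }
            }
        }
        return all;
    }

    public static void main(String[] args) {
        Set<List<String>> explicit = new LinkedHashSet<>();
        explicit.add(triple("ex:Student", SUBCLASS_OF, "ex:Person"));
        explicit.add(triple("ex:Person", SUBCLASS_OF, "ex:Agent"));
        explicit.add(triple("ex:ana", TYPE, "ex:Student"));
        for (List<String> t : infer(explicit)) {
            System.out.println(t);
        }
    }
}
```

From the three explicit statements, the sketch derives three more: ex:Student is a subclass of ex:Agent, and ex:ana is both an ex:Person and an ex:Agent. A framework with inferencing support lets queries see such derived statements without the application computing them itself.
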
Sesame has been designed with flexibility in mind: it can be deployed on top of a variety of storage systems (relational databases, in-memory stores, file systems, keyword indexers, etc.) and offers developers a large set of tools to leverage the power of RDF and RDF Schema. Sesame contains the following components:
- Sail API (Storage And Inference Layer): a low-level system API for RDF stores and inferencers. Its purpose is to abstract from the storage and inference details.
- Rio (RDF I/O): a set of parsers and writers for various RDF file formats. The parsers translate RDF files into sets of statements, and the writers perform the reverse operation.
- Repository API: a higher-level API that offers a large number of developer-oriented methods for handling RDF data. This API should make the life of application developers "as easy as possible".

Sesame can be used as a library or as a server. Used as a library, the setup is straightforward. To use it as a server, the user has to set some environment variables and install a Java Servlet container. Both variants require Java 5 or newer.

For persistence, Sesame can use:
- Memory store (in-memory persistence): its contents can be saved to disk before shutdown.
- Native store: slower than the memory store, but not limited by the size of available memory. It uses B-Trees for indexing statements, and additional indexes can be used to speed up querying.
- RDBMS (Relational DataBase Management System): PostgreSQL and MySQL are supported, accessed through the JDBC layer.
- HTTP repository: not an actual store; it serves as a proxy for a store on a remote Sesame server.

Sesame reached version 2.0 in December 2007 after "two years of intensive development". The most recent official version is 2.2.4, and a 2.3 preview version is also available. The framework is therefore mature, and if we have problems using it, we can turn to either community or commercial support.

The Sesame user guide walks the user through downloading, installing, and basic usage of the library and the server. There are also tutorials for RQL and SeRQL (Sesame's RDF query languages; the first will not be updated, the second is still in development). Compared with the Jena documentation, Sesame's falls short of expectations. Despite this, it is usable and can be learned without too much effort.
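
To illustrate why the native store's extra indexes can speed up querying, here is a minimal sketch in plain Java. It is not Sesame code: the class StatementIndexSketch and its methods are hypothetical, and java.util.TreeSet (a red-black tree) stands in for the on-disk B-Trees. The idea shown is the same, though: each index keeps the statements sorted in a different component order, so a lookup becomes a range scan over one index instead of a pass over every statement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch of index-backed statement lookup; not Sesame's native store. */
final class StatementIndexSketch {
    // Separator between key components; assumed never to occur inside a term.
    private static final String SEP = "\u0000";

    // Two sorted indexes over the same statements, differing only in key order.
    private final TreeSet<String> spo = new TreeSet<>(); // subject, predicate, object
    private final TreeSet<String> pos = new TreeSet<>(); // predicate, object, subject

    void add(String s, String p, String o) {
        spo.add(s + SEP + p + SEP + o);
        pos.add(p + SEP + o + SEP + s);
    }

    /** All statements with the given subject: a range scan on the SPO index. */
    List<String[]> bySubject(String s) {
        List<String[]> out = new ArrayList<>();
        for (String key : scan(spo, s + SEP)) {
            out.add(key.split(SEP, -1));         // back to {subject, predicate, object}
        }
        return out;
    }

    /** All subjects with the given predicate and object: a range scan on the POS index. */
    List<String> subjectsWith(String p, String o) {
        List<String> out = new ArrayList<>();
        for (String key : scan(pos, p + SEP + o + SEP)) {
            out.add(key.substring(key.lastIndexOf(SEP) + 1));
        }
        return out;
    }

    /** Keys starting with prefix, located by tree navigation rather than a full pass. */
    private static Iterable<String> scan(TreeSet<String> index, String prefix) {
        // Upper bound assumes no term contains the character U+FFFF.
        return index.subSet(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    public static void main(String[] args) {
        StatementIndexSketch store = new StatementIndexSketch();
        store.add("ex:ana", "rdf:type", "ex:Student");
        store.add("ex:ana", "foaf:name", "Ana");
        store.add("ex:dan", "rdf:type", "ex:Student");
        System.out.println(store.bySubject("ex:ana").size());             // prints 2
        System.out.println(store.subjectsWith("rdf:type", "ex:Student")); // prints [ex:ana, ex:dan]
    }
}
```

With only the SPO index, the second query would have to examine every statement; the POS index answers it with a single range scan. The cost is that every added statement must be written to each index, which is the usual trade-off when more indexes are configured.
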
A notable point is that we did not find SPARQL support mentioned in the official documentation we consulted; Sesame 2.x does, however, accept SPARQL queries alongside SeRQL through its Repository API.

Sesame is a complex RDF API implementation that can be used just as easily in enterprise projects as in small, simple applications.

Sesame 2.x is available under a BSD-style license, which means that it is open source and free, provided the original license file is not removed.

References

1. JRDF: main, license, and documentation pages.
2. Jena: home and documentation pages.
3. Sesame: home, user manual, documentation, and license pages.