This document provides a comparative study of popular Java APIs for processing Resource Description Framework (RDF) data. It summarizes three main APIs: JRDF, Sesame, and Jena. For each API, it describes key features such as storage methods, query support, documentation, and license. It finds that while each API has strengths, Sesame and Jena tend to have richer documentation and more developed feature sets than JRDF. The study aims to help Java developers choose the best RDF processing API for their needs.
Comparative study of RDF processing for the
Java platform
Burada Alexandru and Prelipcean Gheorghe
Computer Science Faculty, "Al. I. Cuza" University of Iasi
Software Systems Engineering specialization
Abstract. The semantic web represents the next evolution of the World Wide
Web, in which information is identified by its semantics and can be understood
by both software and human beings. RDF is a metadata framework that helps
software understand information. Processing RDF on the Java platform is an
important topic today because compiled Java code is portable. The aim of this
document is to present a comparative study of the most popular APIs for RDF
processing available for the Java platform.
Keywords: Java, RDF
1 Introduction
The World Wide Web was built as a network of documents meant to be understood
by humans. Its success came from the possibility of posting documents and
retrieving them from other people's servers. In the beginning, HTTP servers and
browsers were responsible only for delivering and rendering content as HTML and
images. All processing and understanding was left to humans. Support for
server-side processing, such as CGI and the Java Servlet API, came later. Browsers
evolved to support JavaScript and other scripting languages, so much of the
processing can now be done either on the server side or on the client side.
Today, with a wide variety of information, browsers and end-user devices, it is
practically impossible for web application developers to keep up by constantly
rewriting server-side and client-side code. The major problem is the inability of web
applications to understand content. Learning to understand information about content,
or metadata, is a major step toward developing a solution.
The Resource Description Framework (RDF) is a standard designed for Web
applications that depend on machine-understandable metadata, and it supports
interoperability between such applications. RDF is used to create models of metadata
that can be understood by processing agents.
RDF is also a simple graph-based data model for representing information on the
Web.
An RDF graph is a set of triples; each triple is made of a Subject, Predicate and
Object.
A triple expresses a relationship between its subject and its object; the predicate
names that relationship.
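The triple model can be sketched in a few lines of plain Java. This is only an illustration of the data model, not the API of any of the libraries discussed below; the class names `Triple` and `TripleGraphDemo` and the example.org URIs are assumptions made for the example.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

/** One RDF statement: subject, predicate, object (illustrative class, not a library type). */
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Triple)) {
            return false;
        }
        Triple t = (Triple) other;
        return subject.equals(t.subject)
                && predicate.equals(t.predicate)
                && object.equals(t.object);
    }

    @Override
    public int hashCode() {
        return Objects.hash(subject, predicate, object);
    }

    @Override
    public String toString() {
        return subject + " " + predicate + " " + object + " .";
    }
}

final class TripleGraphDemo {
    /** Builds a tiny graph; the example.org URIs are made up for this sketch. */
    static Set<Triple> sampleGraph() {
        Set<Triple> graph = new LinkedHashSet<>();
        graph.add(new Triple("http://example.org/alice",
                "http://example.org/knows", "http://example.org/bob"));
        graph.add(new Triple("http://example.org/alice",
                "http://example.org/name", "\"Alice\""));
        // A graph is a set, so adding an identical triple again has no effect.
        graph.add(new Triple("http://example.org/alice",
                "http://example.org/knows", "http://example.org/bob"));
        return graph;
    }

    public static void main(String[] args) {
        for (Triple t : sampleGraph()) {
            System.out.println(t);
        }
    }
}
```

Because the graph is a set, equality of triples has to be defined on the three components, which is why the sketch overrides `equals` and `hashCode`.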
2 JRDF - An RDF Library in Java
JRDF [2] is a standard set of APIs that tries to cover the most important
features needed by Java developers. It also includes a base implementation of
RDF.
Its main features include persistent graphs, a SPARQL GUI, distributed and
local SPARQL servers, RDF parsers, a relational query model, SPARQL support,
BDB MapFactory, an RDF writer and global graphs. JRDF supports the following Java
versions:
- Java 1.5, which requires the StAX library. StAX is an XML parsing interface that
grew out of the DOM and SAX (Simple API for XML) parsing models. Using StAX, the
application can progressively load only what it needs for processing.
- Java 1.6
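The progressive loading that StAX offers can be illustrated with the `javax.xml.stream` classes that ship with the JDK. The sketch below is not JRDF code; the class name `StaxPullDemo` and the sample document are assumptions. It only shows the pull style, in which the application asks for one parsing event at a time.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxPullDemo {
    /** Pulls events one at a time and collects the local names of all start elements. */
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                // The application requests the next event; nothing is parsed ahead of need.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // A minimal RDF/XML document; the example.org namespace is made up.
        String xml = "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
                + " xmlns:ex=\"http://example.org/\">"
                + "<rdf:Description rdf:about=\"http://example.org/alice\">"
                + "<ex:name>Alice</ex:name>"
                + "</rdf:Description></rdf:RDF>";
        System.out.println(elementNames(xml));
    }
}
```

The loop could stop as soon as it has found what it needs, which is the practical difference from DOM, where the whole document is loaded first.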
To store triples, JRDF uses in-memory and disk-based graphs behind a standard
system-level interface.
Persistence follows a load, modify, save cycle: load the RDF, make some changes and
then save. Changes are not written to disk until the factory is closed. This operation
covers all graphs, which means that a single specific graph cannot be stored on its own.
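The save-on-close behaviour described above can be sketched as follows. This is a hypothetical illustration, not JRDF's implementation: the class `BufferedGraphFactory`, its methods, and the map that stands in for the disk are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical factory: graphs live in memory and are flushed together on close(). */
final class BufferedGraphFactory implements AutoCloseable {
    private final Map<String, List<String>> openGraphs = new LinkedHashMap<>();
    private final Map<String, List<String>> disk; // stands in for real disk storage

    BufferedGraphFactory(Map<String, List<String>> disk) {
        this.disk = disk;
    }

    /** Returns the in-memory graph with this name, loading a copy from "disk" on first use. */
    List<String> graph(String name) {
        List<String> g = openGraphs.get(name);
        if (g == null) {
            List<String> stored = disk.get(name);
            g = (stored == null) ? new ArrayList<String>() : new ArrayList<>(stored);
            openGraphs.put(name, g);
        }
        return g;
    }

    /** Persists every open graph at once; there is deliberately no per-graph save. */
    @Override
    public void close() {
        for (Map.Entry<String, List<String>> e : openGraphs.entrySet()) {
            disk.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        openGraphs.clear();
    }
}

final class BufferedGraphFactoryDemo {
    public static void main(String[] args) {
        Map<String, List<String>> disk = new HashMap<>();
        try (BufferedGraphFactory factory = new BufferedGraphFactory(disk)) {
            factory.graph("people").add("<alice> <knows> <bob> .");
            System.out.println("stored before close: " + disk.size());
        }
        System.out.println("stored after close: " + disk.size());
    }
}
```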
The JRDF model uses a hierarchy of interfaces whose base is Node. Node is extended
by Subject, Predicate and Object. Each of these triple elements is in turn
specialised by the blank node, URI and literal types. By default, JRDF requires a
URI reference to be an absolute URI. Literals are created through the Element
Factory.
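A hierarchy of this shape could look like the sketch below. The type names are illustrative and are not claimed to match JRDF's own interfaces. The sketch also assumes the usual RDF restrictions (a literal may only appear as an object, a predicate must be a URI), which the description above does not spell out, and it uses `java.net.URI.isAbsolute()` to mimic the absolute-URI rule.

```java
import java.net.URI;

/** Base of the hierarchy; all names in this sketch are illustrative. */
interface RdfNode {}

/** Marker interfaces for the three positions of a triple. */
interface SubjectNode extends RdfNode {}
interface PredicateNode extends RdfNode {}
interface ObjectNode extends RdfNode {}

/** A blank node may appear as subject or object (assumed RDF rule). */
final class BlankNode implements SubjectNode, ObjectNode {
    final String label;

    BlankNode(String label) {
        this.label = label;
    }
}

/** A URI reference may appear in any position; relative URIs are rejected. */
final class UriReference implements SubjectNode, PredicateNode, ObjectNode {
    final URI uri;

    UriReference(String value) {
        URI parsed = URI.create(value);
        if (!parsed.isAbsolute()) {
            throw new IllegalArgumentException("URI reference must be absolute: " + value);
        }
        this.uri = parsed;
    }
}

/** A literal may only appear as an object (assumed RDF rule). */
final class PlainLiteral implements ObjectNode {
    final String lexicalForm;

    PlainLiteral(String lexicalForm) {
        this.lexicalForm = lexicalForm;
    }
}

/** Hypothetical stand-in for an element factory that creates nodes. */
final class ElementFactorySketch {
    static PlainLiteral createLiteral(String lexicalForm) {
        return new PlainLiteral(lexicalForm);
    }

    static UriReference createUriReference(String value) {
        return new UriReference(value);
    }

    public static void main(String[] args) {
        System.out.println(createUriReference("http://example.org/alice").uri);
        System.out.println(createLiteral("Alice").lexicalForm);
    }
}
```

Encoding the positions as interfaces lets the compiler reject, for example, a literal passed where a subject is expected.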
SPARQL support
JRDF provides a simple GUI that allows users to load RDF/XML and
NS documents and execute SPARQL queries.
Let's say we have just downloaded the latest version of JRDF (the build from Nov 20
2009) and we want to execute some SPARQL queries on our RDF documents.
We can simply type the command:
java -jar jrdf-gui-0.5.6.jar
to start the JRDF GUI (a Java Swing application).
The user can load an RDF document and execute SPARQL queries over this file. The
application shows the result of the query and the time taken by the operation, or
the cause of the exception if the query fails.
The documentation that comes with the library is not very substantial. It exists only
as a wiki page that explains how to use the main features of JRDF. It is oriented
to developers, since it provides many examples and test cases in the distribution.
At the time of writing the library is still under development, and its authors plan to
add support for transactions, security, event handling for adding or removing nodes
from a graph, and an RDF-to-Java-object API similar to Hibernate.
License: the JRDF library is licensed under the Apache License 2.0.
3 Sesame: A Generic Architecture for Storing and Querying
RDF and RDF Schema
Sesame is a generic architecture for storing and querying large quantities of
metadata in RDF and RDF Schema. Sesame is implemented independently of any
particular storage device, which means that relational databases, triple stores and
object-oriented databases can all be used for storage without changing the code of
the query engine or other functionality. It also offers support for concurrency
control, export of RDF and RDF Schema information, and a query engine for RQL.
Storing metadata of course requires a repository, but this raises a problem: there
are many database management systems, each with its own strengths and
weaknesses, target platforms and APIs. Sesame therefore chose another strategy: to
concentrate all DBMS-specific code in a single layer called the Storage And
Inference Layer (SAIL). This layer offers RDF-specific methods to its clients and
translates them into calls to the specific DBMS.
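The idea of isolating storage-specific code behind one layer can be sketched with a small interface. This is a hypothetical illustration in the spirit of SAIL, not Sesame's real API: the names `StorageLayer`, `InMemoryStorageLayer` and `SailSketchDemo` and the method signatures are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical storage abstraction in the spirit of SAIL; not Sesame's real interface. */
interface StorageLayer {
    void addStatement(String subject, String predicate, String object);

    /** Returns all matching statements; a null argument acts as a wildcard. */
    List<String[]> getStatements(String subject, String predicate, String object);

    void clear();
}

/** One possible backend; a JDBC-backed class could implement the same interface. */
final class InMemoryStorageLayer implements StorageLayer {
    private final List<String[]> statements = new ArrayList<>();

    @Override
    public void addStatement(String subject, String predicate, String object) {
        statements.add(new String[] {subject, predicate, object});
    }

    @Override
    public List<String[]> getStatements(String subject, String predicate, String object) {
        List<String[]> result = new ArrayList<>();
        for (String[] st : statements) {
            if (matches(subject, st[0]) && matches(predicate, st[1]) && matches(object, st[2])) {
                result.add(st);
            }
        }
        return result;
    }

    @Override
    public void clear() {
        statements.clear();
    }

    private static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }
}

final class SailSketchDemo {
    /** Client code sees only StorageLayer, so the backend can change without touching it. */
    static int countByPredicate(StorageLayer store, String predicate) {
        return store.getStatements(null, predicate, null).size();
    }

    public static void main(String[] args) {
        StorageLayer store = new InMemoryStorageLayer();
        store.addStatement("alice", "knows", "bob");
        store.addStatement("bob", "knows", "carol");
        System.out.println(countByPredicate(store, "knows"));
    }
}
```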
The RQL query module.
The RQL module is one of the modules implemented in Sesame. It performs a
series of steps in order to execute an RQL query. The execution cycle is as
follows: a parser translates the RQL query into an object model (the query model),
the Sesame query optimizer turns this into an optimized query model, and the
optimized model is evaluated as a set of calls to the SAIL. A natural consequence
of this choice to evaluate queries in the SAIL is that several optimization
techniques have to be devised in the engine and in the SAIL API implementation,
since Sesame cannot rely on any given DBMS to do this. A shortcoming of the SAIL
module is that transaction functionality goes unused.
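The parse, optimize and evaluate cycle can be sketched on a toy pattern language. The syntax below ("s p o ; s p o" with `?name` variables) is an assumption made for the example and is not RQL. The optimizer is a single heuristic that moves patterns with fewer variables first, and the store is a plain list rather than a SAIL.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class QueryPipelineSketch {
    /** Step 1: parse "s p o ; s p o" into a query model (a list of 3-term patterns). */
    static List<String[]> parse(String query) {
        List<String[]> patterns = new ArrayList<>();
        for (String part : query.split(";")) {
            String[] terms = part.trim().split("\\s+");
            if (terms.length != 3) {
                throw new IllegalArgumentException("Expected three terms: " + part);
            }
            patterns.add(terms);
        }
        return patterns;
    }

    private static int variableCount(String[] pattern) {
        int count = 0;
        for (String term : pattern) {
            if (term.startsWith("?")) {
                count++;
            }
        }
        return count;
    }

    /** Step 2: optimize by putting the most selective patterns (fewest variables) first. */
    static List<String[]> optimize(List<String[]> patterns) {
        List<String[]> ordered = new ArrayList<>(patterns);
        ordered.sort((a, b) -> Integer.compare(variableCount(a), variableCount(b)));
        return ordered;
    }

    /** Step 3: evaluate each pattern against the store, extending the variable bindings. */
    static List<Map<String, String>> evaluate(List<String[]> patterns, List<String[]> store) {
        List<Map<String, String>> solutions = new ArrayList<>();
        solutions.add(new LinkedHashMap<String, String>());
        for (String[] pattern : patterns) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> binding : solutions) {
                for (String[] triple : store) {
                    Map<String, String> extended = unify(pattern, triple, binding);
                    if (extended != null) {
                        next.add(extended);
                    }
                }
            }
            solutions = next;
        }
        return solutions;
    }

    /** Returns the binding extended to match the triple, or null if they conflict. */
    private static Map<String, String> unify(String[] pattern, String[] triple,
                                             Map<String, String> binding) {
        Map<String, String> result = new LinkedHashMap<>(binding);
        for (int i = 0; i < 3; i++) {
            String term = pattern[i];
            if (term.startsWith("?")) {
                String bound = result.get(term);
                if (bound == null) {
                    result.put(term, triple[i]);
                } else if (!bound.equals(triple[i])) {
                    return null;
                }
            } else if (!term.equals(triple[i])) {
                return null;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> store = new ArrayList<>();
        store.add(new String[] {"alice", "knows", "bob"});
        store.add(new String[] {"bob", "name", "Bob"});
        List<String[]> plan = optimize(parse("?x knows ?y ; ?y name Bob"));
        System.out.println(evaluate(plan, store));
    }
}
```

Reordering the patterns does not change the result, only the number of intermediate bindings, which is the kind of optimization the engine has to do itself when it cannot delegate to a DBMS.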
The admin module
This module is used for inserting RDF data and RDF Schema information into a
repository. It offers two functions: adding RDF data or schema information, and
clearing the repository. Partial deletion is planned as a future function. The
module retrieves the information from an RDF source using a streaming RDF parser
(a parser that is part of Jena). The parser passes the information to the admin
module one triple at a time, as subject, predicate and object. The current version
of Sesame has no support for versioning.
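The interaction between a streaming parser and the admin module can be sketched with a callback. Everything here is hypothetical: the interface `TripleCallback`, the class `AdminModuleSketch`, and the whitespace-separated line format, which stands in for a real RDF/XML parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Callback invoked once per parsed triple (hypothetical interface). */
interface TripleCallback {
    void handleTriple(String subject, String predicate, String object);
}

final class AdminModuleSketch implements TripleCallback {
    private final List<String[]> repository = new ArrayList<>();

    /** Streams the input line by line; each "s p o" line is reported as soon as it is read. */
    static void parse(Reader input, TripleCallback callback) {
        try {
            BufferedReader lines = new BufferedReader(input);
            String line;
            while ((line = lines.readLine()) != null) {
                String[] terms = line.trim().split("\\s+", 3);
                if (terms.length == 3) { // blank or malformed lines are skipped
                    callback.handleTriple(terms[0], terms[1], terms[2]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void handleTriple(String subject, String predicate, String object) {
        repository.add(new String[] {subject, predicate, object});
    }

    /** First admin function: add data to the repository. */
    void addData(String document) {
        parse(new StringReader(document), this);
    }

    /** Second admin function: clear the repository. */
    void clear() {
        repository.clear();
    }

    int size() {
        return repository.size();
    }

    public static void main(String[] args) {
        AdminModuleSketch admin = new AdminModuleSketch();
        admin.addData("alice knows bob\nbob knows carol\n");
        System.out.println(admin.size());
    }
}
```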
The RDF export module
The RDF export module is a very simple module. It can export the contents of a
repository as XML-serialized RDF. The idea behind this module is that it supplies
a basis for using Sesame in combination with other RDF tools, as all RDF tools will
at least be able to read this format.
Differences between Sesame and other APIs
APIs like Jena or Redland focus on the RDF triple set, leaving the interpretation of
these triples as an exercise for the user. In SAIL, these RDFS interpretation tasks
are handled internally. The main reason is the close relationship between the
efficiency of these tasks and the actual storage model used. Another interesting
feature of SAIL is concurrency handling. This feature was introduced because a given
RQL query is broken down into several operations, and consistency has to be ensured
across these operations.
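As an illustration of what handling RDFS interpretation inside the storage layer can mean, the sketch below derives the types of a resource by following subclass links, so the caller does not have to walk the class hierarchy. It is a minimal example of one RDFS entailment, not Sesame's inferencer, and the class and method names are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class RdfsInferenceSketch {
    private final Map<String, Set<String>> superClasses = new HashMap<>();
    private final Map<String, Set<String>> assertedTypes = new HashMap<>();

    /** Records that every instance of sub is also an instance of sup. */
    void addSubClassOf(String sub, String sup) {
        add(superClasses, sub, sup);
    }

    /** Records an explicitly asserted type for an instance. */
    void addType(String instance, String cls) {
        add(assertedTypes, instance, cls);
    }

    private static void add(Map<String, Set<String>> index, String key, String value) {
        Set<String> values = index.get(key);
        if (values == null) {
            values = new HashSet<>();
            index.put(key, values);
        }
        values.add(value);
    }

    /** Asserted types plus every class reachable through subclass links. */
    Set<String> typesOf(String instance) {
        Set<String> result = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>();
        Set<String> asserted = assertedTypes.get(instance);
        if (asserted != null) {
            pending.addAll(asserted);
        }
        while (!pending.isEmpty()) {
            String cls = pending.pop();
            if (result.add(cls)) { // the visited check also guards against cycles
                Set<String> parents = superClasses.get(cls);
                if (parents != null) {
                    pending.addAll(parents);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        RdfsInferenceSketch store = new RdfsInferenceSketch();
        store.addSubClassOf("Student", "Person");
        store.addSubClassOf("Person", "Agent");
        store.addType("alice", "Student");
        System.out.println(store.typesOf("alice"));
    }
}
```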
The documentation offered by the Sesame project is very rich and includes many
examples. The project is available under a BSD-style license.
4 Jena - “a Java framework for writing Semantic Web
applications”
Jena [5] is open-source software under a BSD license. It implements APIs for
dealing with Semantic Web building blocks such as RDF and OWL. Jena manages
triples through an API called Model; a model can be created from the file system or
from a remote file. An RDF document is represented as a set of statements. Jena uses
JDBC (Java Database Connectivity) to bind to an existing RDBMS. It contains a rich
set of features for dealing with RDF: methods for reading and writing RDF as
XML, saving an RDF model to a file and loading it when needed, the ability to reason
using custom rules, and OWL-based ontology processing. It also offers a complete
guide to the operations it supports, but few examples in the binary sources.
Using SPARQL with Jena
Support for SPARQL in Jena is currently available through a module called
ARQ. ARQ is under active development and is not yet part of the standard Jena
distribution.
Integrating Jena with the IDEs most popular among Java developers is fairly
simple: for Eclipse there is already a plugin called “Jena 2.0 Library Plugin”,
and for NetBeans you only need to include the library in the project in which you
want to use it.
Conclusion
A very good report on a set of open source triple store systems, with performance
tests run within a common hardware, software and dataset environment, is Ryan
Lee's Scalability Report on Triple Store Applications [7]. RDF holds
great promise for the future of the Web. As the technology matures, it will become
increasingly simple to create new applications by generating RDF models. Such
models may be generated based on very high-level interactive specifications and data
analysis. Maybe in the future non-programmers will be able to build very
sophisticated internet applications.
References
1. http://code.google.com/p/jrdf/
2. http://jrdf.sourceforge.net/
3. http://www.openrdf.org/doc/papers/Sesame-ISWC2002.pdf
4. http://www.ibm.com/developerworks/xml/library/x-atomtordf/index.html?ca=drs-&ca=dgf-ip
5. http://jena.sourceforge.net/documentation.html
6. http://hydrogen.informatik.tu-cottbus.de/wiki/index.php/Advanced_Jena_Rules#example9
7. http://simile.mit.edu/reports/stores/