RDF and Java

Monica Macoveiciuc and Constantin Stan

Faculty of Computer Science, Alexandru Ioan Cuza University, Iasi

Abstract. The Web is a universal medium for information, data and
knowledge exchange. The Semantic Web is an extension of the World
Wide Web, “in which information is given well-defined meaning, better
enabling computers and people to work in cooperation”[1]. RDF, together
with SPARQL, provides a powerful mechanism for describing and
interchanging metadata on the Web. This paper briefly presents the two
concepts, RDF and SPARQL, and three of the most popular frameworks
(written in Java) that offer support for RDF.

RDF and SPARQL

1 What is RDF?

RDF (Resource Description Framework) is the W3C standard for encoding
knowledge. It is a structure for describing and interchanging metadata on the
Web, in numerous forms and for numerous purposes.

RDF provides a consistent framework and a syntax for describing and querying
data. It also makes it easy to share descriptions of websites. The RDF family
of specifications is quite complex and difficult to manage, which is why using
its full potential is not always easy. RDF offers a model for describing
resources that have properties (attributes or characteristics). Any object that
is uniquely identifiable by a URI (Uniform Resource Identifier) is considered a
resource. Resources have properties associated with them; these properties are
identified by property-types, which in turn have associated values.
Property-types define the relations between values and resources. The values
may be atomic or other resources (which can themselves have properties). A
group of properties that belong to the same resource is called a description.

The core of RDF is the triple described above: only three pieces of
information are needed to fully define a single bit of knowledge.

The first is the resource (or subject): the thing being described, identified
by a URI. The second is the property-type (or predicate), such as a
relationship, an attribute or a characteristic. The third is the value of that
property-type for the resource (or object). An RDF triple records these three
pieces of information in a consistent manner, so that, ideally, the same data
can be consumed by both humans and machines. This allows human meaning to be
interpreted consistently and mechanically. For example, consider these three
sentences:

I have a name, which is Monica Macoveiciuc.
I have a gender, which is female.
I have a job, which is programmer.

We can quickly identify the triple discussed earlier within each of the above
sentences:

I (subject) have a name (property), which is Monica Macoveiciuc (property value).
I (subject) have a gender (property), which is female (property value).
I (subject) have a job (property), which is programmer (property value).

There are many ways to represent a triple. For example, we can use the 3-tuple
representation:

{subject, predicate, object}

Applied to the examples above, we get:

{I, name, Monica Macoveiciuc}
{I, gender, female}
{I, job, programmer}
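
The 3-tuple representation above maps naturally onto a small value type. The
following is a minimal Java sketch, not part of any RDF library: the
TripleSketch and Triple names are assumptions made for this illustration, and
plain strings stand in for the URIs and literals a real model would use.

```java
import java.util.List;

class TripleSketch {
    /** An immutable subject-predicate-object statement. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public String toString() {
            return "{" + subject + ", " + predicate + ", " + object + "}";
        }
    }

    /** The three example sentences, each expressed as one triple. */
    static List<Triple> statements() {
        return List.of(
            new Triple("I", "name", "Monica Macoveiciuc"),
            new Triple("I", "gender", "female"),
            new Triple("I", "job", "programmer"));
    }

    public static void main(String[] args) {
        // Print each statement in the 3-tuple notation used in the text.
        for (Triple triple : statements()) {
            System.out.println(triple);
        }
    }
}
```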

The above is just one way of writing down RDF data. The formal representation
is the directed labeled graph. Two main reasons were considered when this
representation was chosen as the default: graphs are extremely easy to read
(there can be no confusion between the three elements of the tuple, nor about
the statements being made), and some RDF data models can be represented as RDF
graphs but not in RDF/XML. The graph is a set of nodes connected by arcs, which
form a node-arc-node pattern. There are three types of nodes: blank nodes,
literals and URI references.

RDF requires a syntax that represents this model, in order to store instances
of the model in machine-readable files and to communicate these instances
among applications. The answer is XML. For XML to support the consistent
representation of semantics, RDF imposes a formal structure on it. To keep
identifiers unique, RDF uses the namespace mechanism (which is part of the XML
technology). The RDF Schema acts as a bootstrapping mechanism for declaring
the vocabulary needed to express the data model. Elements such as rdf:RDF and
rdf:Description have specific meanings, and both belong to the same namespace,
conventionally bound to the prefix rdf. For example, the rdf:RDF tag marks the
boundaries, within an XML document, of content intended to map onto an RDF
data model instance, and the rdf:Description tag describes a resource within
that model. The constraints imposed by RDF are there to support the consistent
encoding and exchange of standardized metadata defined by different
communities.
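
As an illustration of how the namespace mechanism identifies the RDF and
Description elements, the sketch below parses a small hand-written RDF/XML
document with the standard Java XML parser. It rests on several assumptions:
the ex vocabulary namespace and the resource URI are hypothetical, and a real
application would normally use an RDF parser rather than raw DOM access.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class RdfXmlSketch {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    // Hypothetical vocabulary namespace, used only for this illustration.
    static final String EX_NS = "http://example.org/terms#";

    static final String DOCUMENT =
        "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\" xmlns:ex=\"" + EX_NS + "\">"
      + "<rdf:Description rdf:about=\"http://example.org/people/monica\">"
      + "<ex:name>Monica Macoveiciuc</ex:name>"
      + "<ex:job>programmer</ex:job>"
      + "</rdf:Description>"
      + "</rdf:RDF>";

    /** Returns the described resource's URI followed by "property=value" entries. */
    static List<String> describe(String rdfXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to look elements up by namespace
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(rdfXml.getBytes(StandardCharsets.UTF_8)));
            Element description =
                (Element) doc.getElementsByTagNameNS(RDF_NS, "Description").item(0);
            List<String> out = new ArrayList<>();
            out.add(description.getAttributeNS(RDF_NS, "about"));
            NodeList children = description.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                if (children.item(i) instanceof Element) {
                    Element property = (Element) children.item(i);
                    out.add(property.getLocalName() + "=" + property.getTextContent());
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse RDF/XML document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(DOCUMENT));
    }
}
```

Each child element of the Description element corresponds to one triple whose
subject is the resource named by the about attribute.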

2 What is SPARQL?

SPARQL (pronounced “sparkle”; the name is a recursive acronym for SPARQL
Protocol and RDF Query Language) is an RDF query language. It is a recent W3C
Recommendation, one that Sir Tim Berners-Lee said “will make a huge
difference”. RDF is foundational to the Semantic Web. Until SPARQL was
launched, RDF had a data model, a formal semantics and a concrete serialization
(in XML), but it did not have a standard query language.

SPARQL now offers the Semantic Web and Web 2.0 a common data manipulation
language, in the form of expressive queries against the RDF data model. Using
WSDL 2.0, the SPARQL Protocol for RDF describes a very simple web service with
one operation, query, which is available with both HTTP and SOAP bindings. This
operation is the way you send SPARQL queries to other sites and the way you get
back the results. The HTTP bindings are REST-friendly, and a simple SPARQL
protocol client takes little code to implement.
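
A minimal sketch of the HTTP binding, assuming a hypothetical endpoint at
http://example.org/sparql: the query text is URL-encoded and passed in the
query parameter of a GET request. The class and method names are illustrative,
and the request is only built here, not sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SparqlRequestSketch {
    /** Builds the GET request URI that carries the query in the "query" parameter. */
    static URI queryUri(String endpoint, String sparql) {
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return URI.create(endpoint + "?query=" + encoded);
    }

    public static void main(String[] args) {
        // Hypothetical endpoint and vocabulary, for illustration only.
        String sparql =
            "SELECT ?name WHERE { ?person <http://example.org/terms#name> ?name }";
        System.out.println(queryUri("http://example.org/sparql", sparql));
    }
}
```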

SPARQL consists of three separate specifications.
The first is the query language specification (which makes up the core).
The second is the query results XML format (which describes an XML format
for serializing the results of SPARQL SELECT and ASK queries). The third
is the data access protocol (which uses WSDL 2.0 to define simple
SOAP and HTTP protocols for remotely querying RDF databases, or any data
repository that can be mapped to the RDF model). Altogether, SPARQL consists of
a query language, a means of conveying a query to a query processor service,
and the XML format in which the results are returned.

Some issues are not yet addressed by SPARQL. The most notable is that it cannot
modify an RDF dataset (it is read-only). As mentioned previously, RDF is built
on the triple, a 3-tuple consisting of a subject, a predicate and an object.
SPARQL is similarly built on the triple pattern. SPARQL matches patterns in an
RDF graph using triple patterns, which are like triples except that they may
contain variables in place of concrete values (the variables are used as
“wildcards” to match RDF terms in the dataset).
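
The wildcard behaviour of triple patterns can be sketched in a few lines of
Java. This is an illustrative matcher, not the algorithm of any particular
SPARQL engine: triples are plain string arrays, a pattern term beginning with ?
is treated as a variable, and all names are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TriplePatternSketch {
    /** The three example statements, as {subject, predicate, object} arrays. */
    static List<String[]> sampleTriples() {
        List<String[]> triples = new ArrayList<>();
        triples.add(new String[] {"I", "name", "Monica Macoveiciuc"});
        triples.add(new String[] {"I", "gender", "female"});
        triples.add(new String[] {"I", "job", "programmer"});
        return triples;
    }

    /** A pattern term starting with '?' is a variable; any other term must match exactly. */
    static boolean isVariable(String term) {
        return term.startsWith("?");
    }

    /** Returns one set of variable bindings per triple that matches the pattern. */
    static List<Map<String, String>> match(List<String[]> triples, String[] pattern) {
        List<Map<String, String>> solutions = new ArrayList<>();
        for (String[] triple : triples) {
            Map<String, String> binding = new LinkedHashMap<>();
            boolean matches = true;
            for (int i = 0; i < 3 && matches; i++) {
                if (isVariable(pattern[i])) {
                    // A repeated variable must bind to the same term each time.
                    String previous = binding.putIfAbsent(pattern[i], triple[i]);
                    matches = previous == null || previous.equals(triple[i]);
                } else {
                    matches = pattern[i].equals(triple[i]);
                }
            }
            if (matches) {
                solutions.add(binding);
            }
        }
        return solutions;
    }

    public static void main(String[] args) {
        // Which job does each subject have?
        System.out.println(match(sampleTriples(), new String[] {"?who", "job", "?what"}));
    }
}
```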

The SELECT query can be used to extract data from an RDF graph, returning it as
a table of variable bindings (a result set). More complex graph patterns can
combine required and OPTIONAL parts. UNION queries are a way of selecting
alternatives from the dataset. It is possible to apply an ordering to the
results (ORDER BY), to skip forward through them using OFFSET, and to LIMIT the
amount of data returned. The SPARQL Query Results XML Format specification
includes several relevant examples. Given its simple and regular structure,
manipulating this format with XSLT or XQuery is fairly trivial.
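
The regular structure of the results format can also be processed directly
from Java. The sketch below uses the standard DOM parser (rather than XSLT or
XQuery) to collect the values bound to one variable; the sample document is
hand-written for this illustration and the class name is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ResultsXmlSketch {
    static final String RESULTS_NS = "http://www.w3.org/2005/sparql-results#";

    // Hand-written sample in the results format; the values are illustrative.
    static final String SAMPLE =
        "<sparql xmlns=\"" + RESULTS_NS + "\">"
      + "<head><variable name=\"name\"/></head>"
      + "<results>"
      + "<result><binding name=\"name\"><literal>Monica Macoveiciuc</literal></binding></result>"
      + "<result><binding name=\"name\"><literal>George</literal></binding></result>"
      + "</results>"
      + "</sparql>";

    /** Collects the text of every binding for the given variable name. */
    static List<String> values(String xml, String variable) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList bindings = doc.getElementsByTagNameNS(RESULTS_NS, "binding");
            List<String> values = new ArrayList<>();
            for (int i = 0; i < bindings.getLength(); i++) {
                Element binding = (Element) bindings.item(i);
                if (variable.equals(binding.getAttribute("name"))) {
                    values.add(binding.getTextContent());
                }
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse results document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(values(SAMPLE, "name"));
    }
}
```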

The syntax shortcuts make writing queries much simpler. They are especially
useful with repetitive graph patterns and long URIs. SPARQL presents itself as
the missing and long-awaited piece of the Semantic Web and Web 2.0.

Java APIs for RDF

There are many frameworks for processing RDF available to Java programmers.
Some of them also offer support for SPARQL queries. This paper presents
three of the most popular frameworks: Jena, Sesame and JRDF.

3 Jena
3.1 The Model
Jena uses the concept of a graph for dealing with the data: the nodes
correspond to URIs, while the edges are the triples.
The graphs are represented through the Model interface, which has different
implementations: a memory-based one, one which uses a relational database, etc.
The memory-based model is the simplest and easiest to use.

A triple is represented through an interface called Statement. A statement
corresponds to an edge in the graph and consists of three parts:
– the subject - the resource from which the arc leaves - implements the
Resource interface;
– the predicate - the property (the label of the arc) - implements the Property
interface;
– the object - the resource or literal that the arc points to - implements the
Resource or the Literal interface.
The components of the statement have a common base - the RDFNode interface.

The object component is more complex. A statement can be used as the object
component of the triple, since RDF allows nested statements. Objects
implementing the Container, Alt, Bag, or Seq interface can also be used as
objects.
A resource is declared as follows:
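import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Resource;
Model model = ModelFactory.createDefaultModel();
String resourceURL = "http://localhost:8080/George";
Resource person = model.createResource(resourceURL);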
The ModelFactory method createDefaultModel() creates a memory-based model,
which is then used for creating a resource. This is done by calling the
createResource method, to which we provide the URI of the resource. The Jena
API contains classes of constants for some well-known schemas, such as RDF and
RDF Schema, Dublin Core and DAML. Adding the Formatted Name property of the
vCard file format can be done easily:
person.addProperty(VCARD.FN, "George");
An RDF Model is represented as a set of statements. The components of a
statement can be accessed through the getSubject, getPredicate and getObject
methods of the Statement interface. The API provides methods for the most
common operations:
– addProperty - adds a new statement (triple) to the model;
– listSubjects - lists the subject component of each triple from the model;
– listObjects - lists the object component of each triple from the model;
– write - writes the model in RDF/XML format to the output stream given as
parameter;
– read - reads the statements in RDF/XML format into a model.
The Jena2 persistent storage subsystem implements an extension of the Model
class that provides transparent persistence for models through the use of a
database engine. Implementations for MySQL, HSQLDB, PostgreSQL, Oracle
and Microsoft SQL Server are provided and other databases have been added
by 3rd parties.
TDB and SDB are two components of Jena that provide large scale storage and
query of RDF datasets.
SDB is a system that uses relational databases for storage of RDF and OWL. It
supports many open source and commercial databases including MySQL,
PostgreSQL, Oracle 11g, Microsoft SQL Server and IBM DB2. It scales to graphs
of 100 million triples.
TDB is a non-transactional, faster database solution for use by a single system.
It scales well beyond SDB and is simpler to set up.

3.2 Inferences
SPARQL is implemented in Jena through the ARQ package, and queries may be
made from Java code or via a SPARQL client distributed with Jena.

The package that offers SPARQL support is com.hp.hpl.jena.query.
There are four types of queries supported by the Jena classes: SELECT, ASK,
DESCRIBE, CONSTRUCT.

An ASK query returns true if the query’s graph pattern has any match in the
dataset and false otherwise.

A DESCRIBE query returns a graph containing information related to the nodes
matched in the graph pattern.

A CONSTRUCT query is used for creating an RDF graph by instantiating a template
for each solution of the query.

For running a query, one needs:

– a Query object, obtained through the create method of the QueryFactory;
– a QueryExecution object, obtained through the QueryExecutionFactory;
– an execute method, depending on the type of the query.

For a SELECT query, the results are returned as a ResultSet, which is used to
iterate over the solutions; each solution is a QuerySolution object. The
results can be refined through the SPARQL options DISTINCT, LIMIT, OFFSET,
ORDER BY, optional and alternative matches, and filters.
Jena offers support for working with multiple graphs. The DatasetFactory class
can be used to specify named graphs to be queried programmatically.

4 Sesame

4.1 The Model

Like Jena, Sesame uses a graph model for the data. URIs are nodes, and each
triple is a pair of edges (an edge from subject to predicate, and an edge
from predicate to object). A central concept in Sesame is the Repository.
A repository is an abstraction of a storage container for RDF data. This can
mean Java objects in memory, or it can mean a relational database. Virtually
all operations in Sesame happen with respect to a repository: the repository is
the provider of persistence and querying capability.

The Graph API provides a representation of an RDF graph in the form of a
Java object. The Graph object is used to store the triples. In order to be able to
add statements to the graph, one must obtain a ValueFactory object from the
Graph.

Graph graph = new org.openrdf.model.impl.GraphImpl();
ValueFactory factory = graph.getValueFactory();

Adding a statement is done in a similar way to Jena:

// The namespace shared by the subject and predicate URIs.
String namespace = "http://localhost:8080/human#";
URI subject = factory.createURI(namespace, "person");
URI predicate = factory.createURI(namespace, "hasName");
Literal object = factory.createLiteral("George");
graph.add(subject, predicate, object);

Sesame offers the possibility of running SeRQL construct queries in order to
create and update graphs. The framework also allows graphs to be added to and
removed from a repository.
SAIL (Storage And Inference Layer) is Sesame’s abstraction of the storage
format used; it also provides reasoning support. In the persistence layer,
there are SAIL implementations for PostgreSQL, MySQL, SQL Server and Oracle
databases. SAIL can be used to implement concurrent access handling and
caching. Each Sesame repository has its own SAIL object to represent it.

The SAIL abstraction defines a few operations, such as adding and removing
triples, starting and committing transactions, clearing the repository, etc.

4.2 Inferences

Sesame does not offer support for SPARQL, but it does include a new RDF/RDFS
query language, SeRQL.

SeRQL stands for “Sesame RDF Query Language”. It combines the best features of
other query languages and notations (RQL, RDQL, N-Triples, N3), while adding
some of its own. Its most important features include RDF Schema support, XML
Schema datatype support, graph transformation and optional path matching.

SPARQL and SeRQL are quite similar: they both support advanced path
expressions such as branching and chaining, optional paths and partial matching
of the target graph. SeRQL allows SELECT, CONSTRUCT and DESCRIBE query types,
and their functionality is similar to that provided by SPARQL.

When it comes to set operations, SPARQL is limited, UNION being the only
operation allowed. SeRQL offers support for more operations:

– union - UNION;
– intersection - INTERSECT;
– difference - MINUS.
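
The semantics of the three set operations can be illustrated with ordinary
Java sets. This sketch is not SeRQL and does not use the Sesame API; it only
shows, on two hypothetical sets of query results, what UNION, INTERSECT and
MINUS compute. All names are assumptions made for this example.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class SetOperationsSketch {
    /** UNION: every result that appears in either set. */
    static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> out = new LinkedHashSet<>(a);
        out.addAll(b);
        return out;
    }

    /** INTERSECT: only the results that appear in both sets. */
    static Set<String> intersect(Set<String> a, Set<String> b) {
        Set<String> out = new LinkedHashSet<>(a);
        out.retainAll(b);
        return out;
    }

    /** MINUS: the results of the first set that do not appear in the second. */
    static Set<String> minus(Set<String> a, Set<String> b) {
        Set<String> out = new LinkedHashSet<>(a);
        out.removeAll(b);
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical results of two queries, e.g. programmers and authors.
        Set<String> programmers = Set.of("George", "Monica");
        Set<String> authors = Set.of("Monica", "Constantin");
        System.out.println(union(programmers, authors));
        System.out.println(intersect(programmers, authors));
        System.out.println(minus(programmers, authors));
    }
}
```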

The operators IN, ANY, ALL and EXISTS, as well as nested queries, are other
features supported by SeRQL.

Some limitations of SeRQL include the lack of an ORDER BY clause and of
support for regular expressions.

5 JRDF

5.1 The Model

JRDF (Java RDF Binding) is an attempt to create a standard set of APIs and
base implementations for RDF in Java. It is based on existing libraries, such
as Jena, Sesame, Aquamarine and Sergey Melnik’s RDF API. Unlike the other
frameworks, JRDF tries to deal with most of the aspects that are useful for
Java programmers and to ensure a high degree of modularity. It includes a
default memory implementation that can be used in conjunction with Mulgara to
provide a scalable RDF solution.

Like Jena and Sesame, JRDF offers a graph-based view of the RDF data. The
Graph interface is used for the representation of the graph. A graph consists
of RDF structures such as triples, literals and URI references. A graph is
created as follows:

JRDFFactory factory = SortedMemoryJRDFFactory.getFactory();
Graph graph = factory.getGraph();
GraphElementFactory elementFactory = graph.getElementFactory();
Node node = elementFactory.createURIReference(URI.create("urn:node"));
graph.add(node, node, node);

The methods provided by the API allow adding, removing and finding triples.

The components of the triple - the subject, the predicate and the object - have
a common base: the Node interface. This represents the top of the class
hierarchy of the JRDF model. The Node is subclassed by the positional nodes:
Subject, Predicate and Object. These are in turn subclassed by other types of
node, such as URI references, literals and blank nodes (bnodes).

There are four JRDF Graph implementations:
1. The memory graph - it is included in the jrdf jar and it is useful for small
graphs.
2. The server-side JRDF Graph - it is a server-side interface provided by
Mulgara. The graph is created in the JVM and can be used for direct access to
the database using a graph API.
3. The client JRDF Graph - Mulgara provides a client-side JRDF graph
interface for accessing a model, which represents a scalable solution for remote
client applications.
4. iTQL graph - this is a read-only graph that can be created from the results
of an iTQL query (used for retrieving data and updating Mulgara databases).
This offers the possibility of displaying the results as a subgraph.

5.2 Inferences

JRDF contains an implementation of SPARQL, although it is not complete. The
API does, however, offer support for developing a powerful query engine. Such
an implementation (based on JRDF) requires a mapping between RDF and the
relational model.

One approach is to use a modified relational algebra to represent the JOIN,
UNION and OPTIONAL operations. This algebra must support untyped relations and
operations, which must be defined to work with tuples of differing attributes,
to cover all the possible types that a tuple can contain.
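
One possible reading of such an algebra is sketched below, with each tuple
modelled as a map from attribute name to value, so that tuples may carry
differing attributes. This is an illustrative assumption, not JRDF's actual
implementation; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BindingAlgebraSketch {
    /** Two tuples are compatible when every attribute they share has the same value. */
    static boolean compatible(Map<String, String> left, Map<String, String> right) {
        for (Map.Entry<String, String> entry : left.entrySet()) {
            String other = right.get(entry.getKey());
            if (other != null && !other.equals(entry.getValue())) {
                return false;
            }
        }
        return true;
    }

    /** Combines the attributes of two compatible tuples into one tuple. */
    static Map<String, String> merge(Map<String, String> left, Map<String, String> right) {
        Map<String, String> merged = new LinkedHashMap<>(left);
        merged.putAll(right);
        return merged;
    }

    /** JOIN: merge every compatible pair of tuples. */
    static List<Map<String, String>> join(List<Map<String, String>> left,
                                          List<Map<String, String>> right) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            for (Map<String, String> r : right) {
                if (compatible(l, r)) {
                    out.add(merge(l, r));
                }
            }
        }
        return out;
    }

    /** OPTIONAL: like JOIN, but a left tuple with no compatible partner is kept as-is. */
    static List<Map<String, String>> optional(List<Map<String, String>> left,
                                              List<Map<String, String>> right) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            boolean extended = false;
            for (Map<String, String> r : right) {
                if (compatible(l, r)) {
                    out.add(merge(l, r));
                    extended = true;
                }
            }
            if (!extended) {
                out.add(l);
            }
        }
        return out;
    }

    /** UNION: tuples from both sides, even when their attribute sets differ. */
    static List<Map<String, String>> union(List<Map<String, String>> left,
                                           List<Map<String, String>> right) {
        List<Map<String, String>> out = new ArrayList<>(left);
        out.addAll(right);
        return out;
    }
}
```

With this shape, OPTIONAL behaves like a left outer join: a tuple that cannot
be extended simply lacks the optional attributes instead of being dropped.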

6 Support, Documentation and Licensing

Jena, Sesame and JRDF are all cross-platform and are available under BSD-style
licenses. However, Jena seems to be the most popular of these solutions,
because it provides a robust API and great support for reasoning, along with
good documentation and support for developers.

The Jena documentation page contains the public API, together with a tutorial
and a FAQ section. Great attention is paid to practical examples: there are
many HowTos included, covering a large area of interest, from creating models
to concurrency and locking issues. Other resources, such as SPARQL, are
presented with useful links. There is also a mailing list (jena-dev) and a
large developer community built around the project. The Jena website includes a
user contributions page, which contains interesting examples provided by Jena
users.

The Sesame documentation is comparable to that provided by Jena. A user manual
describes each part of the framework in detail, with examples. The
Documentation section includes some tutorials, FAQs and links to external
resources. There are also some mailing lists and an old forum (no longer
functional). Users can also report bugs and problems through an issue tracker.

JRDF offers less support for developers than the other two frameworks. A Wiki
section contains some basic descriptions and examples. Javadocs are available
for six releases of the project, providing a good way of tracking the changes.
There is also a mailing list and some links to related publications.

Conclusion

All three frameworks are mature enough to support complex applications. Each
of them is better than the others in certain respects, and it is the user who
should decide which API best covers the application’s needs. One criterion to
take into account is the query language that the application needs to use,
since Sesame doesn’t support SPARQL (although it does come with its own
solution) and JRDF’s SPARQL implementation is incomplete. Sesame provides
support for scripting languages (Perl, PHP5), which can be really useful. JRDF
is a good example of good practice, trying to use standard Java conventions.

References
[1] Berners-Lee, T.; Hendler, J.; Lassila, O.: The Semantic Web. Scientific American
Magazine (March 26, 2008)
[2] Powers, Shelley: Practical RDF. O’Reilly 2003
[3] http://jena.sourceforge.net/
[4] http://www.xml.com/pub/a/2001/01/24/rdf.html
[5] http://www.ibm.com/developerworks/xml/library/j-sparql/
[6] http://www.openrdf.org/documentation.jsp
[7] http://www.dlib.org/dlib/may98/miller/05miller.html
[8] http://www.oreillynet.com/xml/blog
[9] http://www.xml.com/pub/a/2005/11/16/introducing-sparql-querying-semantic-web-tutorial.html
[10] http://www.w3.org/TR/rdf-sparql-query/
[11] http://en.wikipedia.org/wiki/SPARQL
