RDF Processing on the JAVA Platform – A Comparative Study of Jena, JRDF and Sesame APIs

Adrian Iriciuc, Monica Irina Niculescu
Faculty of Computer Science, General Berthelot, 16, IAŞI 700483, ROMANIA
adrian.iriciuc@infoiasi.ro, monica.niculescu@infoiasi.ro

Abstract. This paper presents a comparative study of three of the most popular APIs for RDF processing on the JAVA Platform: Jena, JRDF and Sesame. We compare the way they store RDF triples, the programmer support they offer, their performance, their support for SPARQL interrogations, and their licensing. We also offer concrete examples of how to use these APIs by providing relevant source code snippets.

Key words: RDF processing, JAVA Platform, Jena, JRDF, Sesame, SPARQL, comparative study

1 Introduction

The concept of the semantic web has developed greatly in the past few years and has become one of the most important and most researched topics in the web world. What is of concern today is the meaning of the information and services that are available on the web. One of the most important steps in the evolution of the semantic web concept is expressing its elements in formal specifications. For more information about the semantic web please consult [1], [2] and [3].

One of the main sources of such formalization, intended to formally describe concepts, terms and relationships, is RDF (Resource Description Framework). RDF proposes a method to describe and model information based on triples. The information is divided into simple sentences, each made of a subject, a predicate and an object, which together form an RDF triple. Here is a simple example of an RDF triple that models the sentence "Jane Austen wrote Pride And Prejudice".
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:lib="http://www.myResource.com/">
      <rdf:Description rdf:about="http://www.myResource.com/JaneAusten">
        <lib:wrote>Pride And Prejudice</lib:wrote>
      </rdf:Description>
    </rdf:RDF>

For more information about RDF please consult [4, 5].
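The subject, predicate and object of the triple described above can also be pictured as a small value type. The following is a minimal sketch in plain Java, independent of Jena, JRDF and Sesame. The class name `Triple`, its accessors and the subject and predicate URIs (built on the `http://www.myResource.com/` namespace of the example) are hypothetical and chosen only for illustration.

```java
import java.util.Objects;

/** Hypothetical value type for one RDF statement; not part of any of the three APIs. */
final class Triple {
    private final String subject;   // URI of the resource the statement is about
    private final String predicate; // URI of the property linking subject and object
    private final String object;    // value of the property, here a plain literal

    Triple(String subject, String predicate, String object) {
        this.subject = Objects.requireNonNull(subject);
        this.predicate = Objects.requireNonNull(predicate);
        this.object = Objects.requireNonNull(object);
    }

    String getSubject()   { return subject; }
    String getPredicate() { return predicate; }
    String getObject()    { return object; }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Triple)) return false;
        Triple t = (Triple) other;
        return subject.equals(t.subject)
            && predicate.equals(t.predicate)
            && object.equals(t.object);
    }

    @Override
    public int hashCode() {
        return Objects.hash(subject, predicate, object);
    }

    /** Renders the triple in an N-Triples-like form (no escaping of special characters). */
    @Override
    public String toString() {
        return "<" + subject + "> <" + predicate + "> \"" + object + "\" .";
    }

    public static void main(String[] args) {
        Triple t = new Triple(
            "http://www.myResource.com/JaneAusten",
            "http://www.myResource.com/wrote",
            "Pride And Prejudice");
        // prints: <http://www.myResource.com/JaneAusten> <http://www.myResource.com/wrote> "Pride And Prejudice" .
        System.out.println(t);
    }
}
```

The three APIs compared in this paper model the same idea with richer types (resources, URI references, literals) instead of plain strings.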
2 General Descriptions of the APIs

In this section we will take a look at three of the most popular frameworks and libraries that deal with processing RDF: Jena, JRDF and Sesame. All of them offer storage of RDF triples, ways to manage these triples (create, add, remove, modify, iterate) and methods to query the RDF data.

We can look at an RDF element as a graph with two nodes (the subject and the object) and an arc (the predicate). All the APIs presented here are based on this representation. In this section we will take a quick look at these three APIs.

2.1 Jena

Jena is a framework for the JAVA platform that is used to build semantic web applications. It offers support for RDF, RDFS, OWL and SPARQL. It includes: an RDF API, reading and writing RDF in RDF/XML, N3 and N-Triples, an OWL API, persistent storage and a SPARQL query engine. The framework also contains a rule-based inference engine.

In Jena, the graphs that such a modeling of information produces are called models. The user creates such models, empty at first, and then creates resources (subjects) and properties (predicates).

Each arc in an RDF model is called a statement. Such a statement is made of three parts: the subject (the resource the arc leaves from, here Jane Austen), the predicate (the label of the arc, here wrote) and the object (the resource the arc points to, here Pride and Prejudice).

This framework works with the following notions: nodes (representing a resource), Dublin Core (metadata about web resources, see [6]), literals (strings that represent the value of a property), subject (a resource, part of the triple), object (part of the triple), predicate (part of the triple), property (attribute of a resource), resource (some entity), statement and triple. The API offers methods for working with models, triples and queries, amongst other functionalities.

For detailed information about the Jena API consult [7, 8]. Working with triples in Jena is covered in Section 4.
2.2 JRDF

JRDF is a JAVA library that offers base implementations of RDF concepts by creating a set of standard APIs. It provides features like: a graph API (for graph operations like comparisons or set-based operations), creation of graph elements (statements, resources, nodes), triple storage, IoC support, RDF data types and SPARQL query handling. One important feature that JRDF does not yet contain is inferencing.

The API works with the general notion of graphs that model the information. There are two main factories that facilitate working with triples: the Element Factory and the Triple Factory. The Element Factory can create elements like URI references, literals, nodes and resources. The Triple Factory offers the possibility to quickly execute frequent operations like adding RDF data. The user can add data without creating individual nodes and can create RDF collections and containers.

For detailed information about the JRDF API consult [9, 10]. Working with triples in JRDF is covered in Section 4.

2.3 Sesame

Sesame is a JAVA-based framework for storing, inferencing and querying RDF data. For modeling the data Sesame offers two methods: a repository (through the Repository API) and a graph (through the Graph API). Methods for creating, accessing and modifying (adding and deleting data in) repositories and graphs, as well as querying methods, are made available to the user through these APIs.

Graphs can be added to a repository or deleted from one. Statements can be added to a graph. Each statement represents a triple consisting of a subject, a predicate and an object.

For querying RDF data, Sesame doesn't support SPARQL, but only its own query languages: RQL and SeRQL.

For detailed information about the Sesame API consult [11, 12, 13]. Working with triples in Sesame is covered in Section 4.

3 Storing the Triples

The way the triples are stored directly affects performance. At the same time, whatever storage method is chosen, the programmer should not need any knowledge of it. A powerful and flexible API must provide a way to work with an RDF graph model.
3.1 Jena

The concept underlying the Jena architecture is to provide multiple, flexible presentations of RDF graphs to the programmer while using a simple method for storing triples. The interface for storing these triples contains three operations: add a statement to the database, delete a statement from the database and find a statement.

The most common implementation uses a relational database with a single table (called a "triple-store"). Each statement is a row having a column for the subject, a column for the predicate and a column for the object.

For more information see [24, 25].
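The three storage operations and the single-table layout described above can be sketched as follows. This is an in-memory illustration in plain Java under stated assumptions, not Jena's actual storage code: the class `SimpleTripleStore`, its `Row` type, the use of `null` as a wildcard in `find` and the `urn:Emma` value are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical single-table triple store; a sketch of the idea, not Jena's implementation. */
final class SimpleTripleStore {

    /** One row of the table: a column each for subject, predicate and object. */
    static final class Row {
        final String subject;
        final String predicate;
        final String object;

        Row(String subject, String predicate, String object) {
            this.subject = Objects.requireNonNull(subject);
            this.predicate = Objects.requireNonNull(predicate);
            this.object = Objects.requireNonNull(object);
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Row)) return false;
            Row r = (Row) other;
            return subject.equals(r.subject)
                && predicate.equals(r.predicate)
                && object.equals(r.object);
        }

        @Override
        public int hashCode() {
            return Objects.hash(subject, predicate, object);
        }
    }

    // The "table": every statement is one row.
    private final List<Row> table = new ArrayList<>();

    /** Adds a statement; returns false if the same row is already stored. */
    boolean add(String s, String p, String o) {
        Row row = new Row(s, p, o);
        if (table.contains(row)) return false;
        return table.add(row);
    }

    /** Deletes a statement; returns true if a row was removed. */
    boolean delete(String s, String p, String o) {
        return table.remove(new Row(s, p, o));
    }

    /** Finds all rows matching the pattern; a null argument matches any value. */
    List<Row> find(String s, String p, String o) {
        List<Row> result = new ArrayList<>();
        for (Row row : table) {
            if ((s == null || s.equals(row.subject))
                    && (p == null || p.equals(row.predicate))
                    && (o == null || o.equals(row.object))) {
                result.add(row);
            }
        }
        return result;
    }

    int size() {
        return table.size();
    }

    public static void main(String[] args) {
        SimpleTripleStore store = new SimpleTripleStore();
        store.add("urn:JaneAusten", "urn:wrote", "urn:PrideAndPrejudice");
        store.add("urn:JaneAusten", "urn:wrote", "urn:Emma");
        System.out.println(store.find("urn:JaneAusten", null, null).size()); // prints 2
        store.delete("urn:JaneAusten", "urn:wrote", "urn:Emma");
        System.out.println(store.find(null, "urn:wrote", null).size());      // prints 1
    }
}
```

In this sketch `find` scans every row. A store backed by a relational database would normally delegate the matching to the database instead.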
3.2 JRDF

JRDF uses some existing libraries for storing and manipulating the triples [23]. Its implementation draws on the following libraries to obtain the best results: Jena, Aquamarine, Sesame and Sergey Melnik's RDF API.

JRDF uses N-Triples files to store the information for all graphs, and its architecture is based on an interface providing methods for adding and removing graphs. Each graph adds three statements to the file (name, id, type), and based on this information paths can be created to another file containing the triples of the wanted graph.

3.3 Sesame

For storage Sesame uses a Database Management System (DBMS). To stay independent of any particular DBMS, all DBMS-specific code is concentrated in a single architectural layer called the Storage And Inference Layer (SAIL). SAIL is an API that translates client methods for RDF into specific DBMS methods.

A simplified view of the Sesame architecture is given in a figure taken from [13].

[Figure: simplified Sesame architecture, taken from [13]]

The query module processes a query in two steps:
1. Parse the query and obtain a query model;
2. Optimize the query model and obtain an optimized model.

For more information see [13].
4 Working with Triples

In this section we will take a look at the classes and methods that these APIs provide in order to work with triples (add, remove, iterate). We will provide relevant source code snippets for a better understanding.

4.1 Jena

The first step in working with triples in Jena is to create a model. This is done using the ModelFactory.

    // Package names of the Jena 2.x releases described in [7]
    import com.hp.hpl.jena.rdf.model.*;
    import com.hp.hpl.jena.vocabulary.VCARD;

    protected Model _model;

    _model = ModelFactory.createDefaultModel();

To add a new triple you first have to create a Resource and then set its property.

    // The resource is the subject; VCARD.FN is the predicate; the string is the object
    Resource resource = _model.createResource("http://www.myOrg.org/JaneAusten");
    resource.addProperty(VCARD.FN, "Jane Austen");

VCARD.FN is a constant of type Property.

To remove a property call one of the remove methods:

    // Removes every statement with this subject and the predicate VCARD.FN
    resource.removeAll(VCARD.FN);

To iterate through the triples of a model the user has to list the statements and iterate through them:

    public void iterate() {
        StmtIterator iter = _model.listStatements();
        while (iter.hasNext()) {
            Statement st = iter.nextStatement();
            // Process the triple/statement
        }
    }
4.2 JRDF

The first step in working with triples in JRDF is to create a JRDF factory, obtain a graph from that factory, and then obtain an element factory from the graph. This is done using the JRDFFactory and GraphElementFactory.

    import java.net.URI;

    protected GraphElementFactory elementFactory;
    protected Graph graph;
    protected JRDFFactory jrdfFactory;

    jrdfFactory = SortedMemoryJRDFFactory.getFactory();
    graph = jrdfFactory.getNewGraph();
    elementFactory = graph.getElementFactory();

To add a new triple you first have to create the corresponding resources and then add them:

    URIReference s = elementFactory.createURIReference(URI.create("urn:JaneAusten"));
    URIReference p = elementFactory.createURIReference(URI.create("urn:wrote"));
    URIReference o = elementFactory.createURIReference(URI.create("urn:PrideAndPrejudice"));
    graph.add(s, p, o);

To iterate through the triples of a graph the user has to find all the triples and iterate through them:

    public void iterate() {
        // The three "any" constants act as wildcards, so every triple matches
        ClosableIterator<Triple> triples = graph.find(
            AnySubjectNode.ANY_SUBJECT,
            AnyPredicateNode.ANY_PREDICATE,
            AnyObjectNode.ANY_OBJECT);
        while (triples.hasNext()) {
            Triple t = triples.next();
            // Process the triple
        }
    }
4.3 Sesame

The first step in working with triples in Sesame is to create a graph and obtain a value factory from it.

    protected Graph _graph;
    protected ValueFactory _factory;

    _graph = new GraphImpl();
    _factory = _graph.getValueFactory();

To add a new triple you first have to create the corresponding values (URIs and literals) and then add them:

    URI s = _factory.createURI("urn:", "JaneAusten");
    URI p = _factory.createURI("urn:", "wrote");
    Literal o = _factory.createLiteral("PrideAndPrejudice");
    _graph.add(s, p, o);

To iterate through the triples of a graph the user has to call the iterator() method:

    public void iterate() {
        Iterator iter = _graph.iterator();
        while (iter.hasNext()) {
            Object obj = iter.next();
            // Process the triple
        }
    }

5 Programmer support

One of the most important aspects when considering a framework or a library is the programmer support that it offers. In this section we will analyze how well these frameworks help the user achieve his goals easily and rapidly. We will look at aspects like documentation, integration and usability.
5.1 Documentation

All three APIs have documentation that is meant to provide information about the classes and methods that are available, as well as how to use them.

Jena offers tutorials about the Jena API and SPARQL, Javadocs (Jena API and SPARQL), documentation for its persistence systems (SDB, which is backed by SQL databases, and TDB, a native store; RDB documentation is also available, but this system is deprecated), Ontology API documentation and some other information. Out of the three, it is the best documented framework (content-wise and structure-wise). For more information consult [8].

JRDF offers a few wiki pages to help the user get started with the library, containing a few examples about the JRDF API and SPARQL and some other information, as well as Javadocs. Out of the three it has the least organized documentation, with too little information and too few tutorials about how to use the API. For more information consult [10].

Sesame offers a user manual, a user installation guide, a SeRQL manual, an RQL tutorial, Javadocs, user and system documentation and some other information. For more information see [12].

5.2 Integration

All three frameworks are easy to integrate with an application. We need to download the specific libraries (.jar files) that are provided on the home web site of each framework and add them to our project.

5.3 Usability

In general all three APIs are fairly easy to use for the basic operations, but due to the simple and well documented API that Jena provides, it is the easiest one to learn and use. The hardest API to use is JRDF, due to its lack of consistent documentation.

6 SPARQL Interrogations

An RDF library should provide an easy way to run interrogations. There may be many query languages, and every library might have its own internal way of querying, but one must not forget the purpose of RDF: standard semantics for describing resources on the web. That is why SPARQL, the most used query language for RDF and a W3C recommendation [19], should be fully supported.

In this section we will take a look at how the three APIs deal with SPARQL interrogations.

6.1 Jena

Jena offers full support and an easy API for SPARQL. With the RDF document held in the _model variable, a query given as a string can be executed easily:
    import com.hp.hpl.jena.query.*;

    Query query = QueryFactory.create(queryString);
    QueryExecution qe = QueryExecutionFactory.create(query, _model);
    ResultSet results = qe.execSelect();

For more information see [7].

6.2 JRDF

Using SPARQL with JRDF is also easy, though the logic behind the API might seem harder to understand. The document is contained in the graph variable, but you need a connection to a SPARQL engine:

    Answer answer = jrdfFactory.getNewSparqlConnection().executeQuery(graph, queryString);

For more information see [10, 22].

6.3 Sesame

According to the official documentation, Sesame does not offer any support for SPARQL or any other query language except for RQL and SeRQL [20, 21].

7 Performance

In this section we will analyze the three frameworks from the performance point of view. For this we performed the following tests:
- we randomly generated 100 000 triples and added them to the graph in order to build an RDF document for each library;
- we iterated through every triple in the graph;
- we analyzed the creation/add and iteration speed, memory allocation and SPARQL query speed.

7.1 Processing/Adding Triples Speed

After running our tests we obtained the following results:

    Jena  :  596 ms
    JRDF  : 2835 ms
    Sesame:  841 ms

Jena and Sesame are close, with Jena a little faster; JRDF is much slower, taking almost 5 times as long as Jena.

7.2 Memory Consumption

After running our tests we obtained the following results:

    Jena memory usage  :   5812 KB
    JRDF memory usage  : 128732 KB
    Sesame memory usage:  15748 KB

Again Jena is better, this time using almost 3 times less memory than Sesame. JRDF uses much more memory: about 22 times more than Jena and about 8 times more than Sesame. So JRDF is a lot slower and uses a lot more memory.

7.3 Interrogation Efficiency

After running our tests we obtained the following results:

    Jena: 653 ms
    JRDF: 261 ms

This time the extra memory usage pays off: JRDF uses about 22 times more memory than Jena but answers the query 2.5 times faster. With small documents and many queries to execute in parallel this might be a good trade-off. For Sesame there is no SPARQL support, so there are no results.

7.4 Iteration speed

After running our tests we obtained the following results:

    Jena  :   81 ms
    JRDF  : 2622 ms
    Sesame:    9 ms

Sesame is the fastest this time. Both Sesame and Jena are very fast for the 100 000 triples that they have to iterate through. With more than 2 seconds of latency, JRDF is much slower.

8 Licensing

In this section we will take a look at the licensing terms under which these APIs are distributed.
8.1 Jena

Jena is distributed under the following license (the next paragraph is taken from [14]):

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

8.2 JRDF

JRDF comes with two licenses: The Apache Software License, Version 1.1 [15] and GNU LESSER GENERAL PUBLIC LICENSE Version 2.1, February 1999 [16].

8.3 Sesame

Sesame is available under a BSD-style license [17]. In addition the following statement applies (this paragraph is taken from [18]):

Copyright Aduna (http://www.aduna-software.com/) © 2001-2007 All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

9 Conclusions

This article was meant to provide a comparative look at three of the most important RDF processing APIs for the JAVA Platform: Jena, JRDF and Sesame. We took a look at the provided APIs, studied how the triples are stored in each case and how to work with them, compared the APIs in terms of the programmer support they offer, and looked into some performance aspects (speed, memory, interrogation efficiency). Support for SPARQL queries was another element we analyzed. Last, we presented the licensing terms under which the APIs are distributed.

We hope the reader finds useful information about these APIs here and that this article provides sufficient information to help decide which API would be best to use.
References

[1] http://en.wikipedia.org/wiki/Semantic_Web
[2] http://www.w3schools.com/semweb/
[3] http://semanticweb.org/wiki/Main_Page
[4] http://en.wikipedia.org/wiki/Resource_Description_Framework
[5] http://www.w3.org/RDF/
[6] http://dublincore.org/
[7] http://jena.sourceforge.net/index.html
[8] http://jena.sourceforge.net/documentation.html
[9] http://jrdf.sourceforge.net/index.html
[10] http://jrdf.sourceforge.net/documentation.html
[11] http://www.openrdf.org/index.jsp
[12] http://www.openrdf.org/documentation.jsp
[13] Jeen Broekstra, Arjohn Kampman, Frank van Harmelen, "Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema" (http://www.openrdf.org/doc/papers/Sesame-ISWC2002.pdf)
[14] http://jena.sourceforge.net/license.html
[15] http://www.apache.org/licenses/LICENSE
[16] http://www.gnu.org/licenses/lgpl-2.1.html
[17] http://www.opensource.org/licenses/bsd-license.php
[18] http://www.openrdf.org/license.jsp
[19] http://www.w3.org/TR/rdf-sparql-query/
[20] http://www.openrdf.org/doc/sesame/users/ch06.html
[21] http://www.openrdf.org/doc/rql-tutorial.html
[22] http://code.google.com/p/jrdf/wiki/RelationalSPARQLOperations
[23] http://docs.mulgara.org/system/jrdf.html
[24] http://jena.sourceforge.net/DB/layout.html
[25] E. Bodéré, P.Y. Clément, A. Genoud, M. Le Trocquer, M. Moras, L. Pochard, V. Ribaud, Ph. Saliou, "Integrating the Jena RDF API within Sakai: Towards a Semantic Collaborating Learning Environment"