RDF APIs using .NET Framework
SemWeb & dotNetRDF
Mazilu Liviu-Andrei
Pintilie Radu-Stefan
ISS2
In the following we compare two .NET Framework APIs for working with
RDF: SemWeb and dotNetRDF.
We compare the two libraries using the following criteria: IDE
integration, triple storage, SPARQL query support, performance, level of
documentation and licensing.
SemWeb
URI: http://razor.occams.info/code/semweb/
Author: Joshua Tauberer
Current version: 1.064 (11/05/2009)
License: GNU GPL v2
dotNetRDF
URI: http://www.dotnetrdf.org/
Author: Rob Vesse
Current version: Version 0.1.2 Alpha (27/11/2009)
License: GNU GPL
Why SemWeb and dotNetRDF?
These two libraries were chosen because both are easy to integrate
into a .NET Framework project. A simple reference to the dynamic
libraries provided in the download package gives users full access to the
APIs.
Triple Storage
Both SemWeb and dotNetRDF store triples in similar ways.
SemWeb's type for an RDF triple is Statement, while dotNetRDF uses a
Triple type. Both APIs define each node in a statement as either a literal
or an entity, although the types used differ between the two
implementations.
SemWeb
Entity computer = new Entity("http://example.org/computer");
Entity description = new Entity("http://example.org/description");
Entity says = "http://example.org/says";
Entity wants = "http://example.org/wants";
Statement assertion = new Statement(computer, says, new Literal("Hello world!"));
Taken from [1]
dotNetRDF
URINode dotNetRDF = CreateURINode(new Uri("http://www.dotnetrdf.org"));
URINode says = CreateURINode(new Uri("http://example.org/says"));
LiteralNode helloWorld = CreateLiteralNode("Hello World");
LiteralNode bonjourMonde = CreateLiteralNode("Bonjour tout le Monde", "fr");
new Triple(dotNetRDF, says, helloWorld);
new Triple(dotNetRDF, says, bonjourMonde);
Taken from [2]
To store these triples SemWeb uses what it calls a MemoryStore, while
dotNetRDF uses a Graph, a name that is closer to RDF's own graph
terminology.
SemWeb
// desire and RDF are not defined in this excerpt; see the full example at [1]
MemoryStore store = new MemoryStore();
store.Add(new Statement(computer, says, (Literal)"Hello world!"));
store.Add(new Statement(computer, wants, desire));
store.Add(new Statement(desire, description, (Literal)"to be human"));
store.Add(new Statement(desire, RDF+"type", (Entity)"http://example.org/Desire"));
Taken from [1]
dotNetRDF
Graph g = new Graph();
g.Assert(new Triple(dotNetRDF, says, helloWorld));
g.Assert(new Triple(dotNetRDF, says, bonjourMonde));
foreach (Triple t in g.Triples) {
    Console.WriteLine(t.ToString());
}
Taken from [2]
Both libraries provide various methods to read RDF from files and
URIs. The main difference is that with SemWeb the user must select the
parser used for reading a given file (e.g. RDF/XML, N-Triples, Turtle, or
Notation 3), while dotNetRDF tries to choose the appropriate parser itself
if none is specified manually.
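How such automatic selection could work can be sketched independently of either library. The following is a minimal sketch, assuming the format is inferred from the file extension alone. The class RdfFormatGuesser, its Guess method and the extension table are hypothetical and are not part of SemWeb or dotNetRDF, whose actual detection logic we have not inspected.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class RdfFormatGuesser
{
    // Hypothetical extension-to-format table; the values are plain labels,
    // not types from SemWeb or dotNetRDF.
    private static readonly Dictionary<string, string> FormatsByExtension =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { ".rdf", "RDF/XML" },
            { ".nt", "N-Triples" },
            { ".ttl", "Turtle" },
            { ".n3", "Notation 3" }
        };

    // Returns the format label for a file name, or null when the
    // extension is missing or not in the table.
    public static string Guess(string fileName)
    {
        string extension = Path.GetExtension(fileName);
        if (string.IsNullOrEmpty(extension))
        {
            return null;
        }

        string format;
        return FormatsByExtension.TryGetValue(extension, out format) ? format : null;
    }

    public static void Main()
    {
        Console.WriteLine(Guess("restaurants.rdf"));
        Console.WriteLine(Guess("data.TTL"));
        Console.WriteLine(Guess("notes.txt") ?? "unknown: the caller must pick a parser");
    }
}
```

A null result corresponds to the SemWeb situation, where the caller always has to name the parser explicitly.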
To test the performance of the parsers that the APIs provide, we
parsed a set of large files. We found some large RDF files at
http://chefmoz.org/rdf.html . The licensing of these files allowed us to
modify them, which we did in order to obtain 3 large RDF files (10MB,
50MB, 100MB). We then used these files to benchmark the RDF/XML
parsers provided by SemWeb and dotNetRDF.
Tests were run on an Intel® Core™2 Duo CPU T7300 @ 2.00 GHz
with 2GB of memory (RAM).
Three tests were run on each API, one for each of the three files
obtained earlier. The results are shown in Table 1 and Figure 1.
(Results are displayed in hours:minutes:seconds.fraction format, as
output by the Stopwatch we used.)
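The timing approach described above can be sketched as follows. This is a minimal sketch, assuming a single run per measurement with System.Diagnostics.Stopwatch. The class ParseBenchmark and its Time method are hypothetical names, and the sleep is only a placeholder for the library call that parses one RDF/XML file.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class ParseBenchmark
{
    // Runs the given action once and returns the elapsed wall-clock time.
    public static TimeSpan Time(Action action)
    {
        Stopwatch watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main()
    {
        // Placeholder workload (assumption): replace the sleep with the
        // parser call under test.
        TimeSpan elapsed = Time(delegate { Thread.Sleep(50); });

        // TimeSpan's default string form is hh:mm:ss.fffffff, the same
        // layout used for the results in Table 1.
        Console.WriteLine(elapsed);
    }
}
```

A single run per file keeps the harness simple, but it also means that one-off effects such as disk caching or JIT compilation are included in the measured time.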
As the benchmark in Table 1 shows, SemWeb has a much better
implementation of the RDF parser, storage and memory management.
When parsing large RDF files, SemWeb was roughly eight times faster
than dotNetRDF on the 10MB file and about ten times faster on the 50MB
file. We think this happens because the MemoryStore it uses is a type of
Sink.
We can therefore state that SemWeb performs better than dotNetRDF.
            10MB               50MB               100MB
SemWeb      00:00:00.8418498   00:00:04.5484593   00:00:10.7560606
dotNetRDF   00:00:06.6143484   00:00:47.5463143   FATAL ERROR
Table 1. Parsing times
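The speed ratios implied by Table 1 can be checked with a few lines of code. This is a minimal sketch that uses only the timings from the table. The class Speedup and its Ratio method are hypothetical helper names introduced for illustration.

```csharp
using System;
using System.Globalization;

public static class Speedup
{
    // Parses two hh:mm:ss.fffffff timings and returns slower / faster.
    public static double Ratio(string slower, string faster)
    {
        TimeSpan s = TimeSpan.Parse(slower, CultureInfo.InvariantCulture);
        TimeSpan f = TimeSpan.Parse(faster, CultureInfo.InvariantCulture);
        return s.TotalSeconds / f.TotalSeconds;
    }

    public static void Main()
    {
        // dotNetRDF time divided by SemWeb time, per file size (Table 1).
        double ratio10 = Ratio("00:00:06.6143484", "00:00:00.8418498");
        double ratio50 = Ratio("00:00:47.5463143", "00:00:04.5484593");

        Console.WriteLine("10MB: " + ratio10.ToString("F1", CultureInfo.InvariantCulture));
        Console.WriteLine("50MB: " + ratio50.ToString("F1", CultureInfo.InvariantCulture));
    }
}
```

The ratios come out at about 7.9 for the 10MB file and about 10.5 for the 50MB file. No ratio can be computed for the 100MB file, because the dotNetRDF run did not complete.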
[Bar chart: parsing time in seconds (axis 0 to 50) for SemWeb and dotNetRDF, with one series each for the 10MB, 50MB and 100MB files]
Figure 1. Parsing performance (seconds)
We also need to mention the surprise we had when we ran the 100MB
file test on the dotNetRDF API. Given the previous test results we
expected it to take a long time, but we did not expect to encounter a fatal
error: Unhandled Exception: OutOfMemoryException. This occurred as
the application filled all of the 1.5GB of memory that had been left unused
(see Picture 1).
Picture 1. dotNetRDF API is a memory hog.
Besides their own implementations for storing and parsing RDF data,
the libraries can also use external means of storage.
With SemWeb you can persist your RDF data in MySQL, SQL Server,
SQLite or PostgreSQL. [3]
dotNetRDF provides integration with the Talis Platform and Virtuoso
Universal Server. Both provide native means of storing RDF data. More
references can be found at [5] and [6].
SPARQL support
Both libraries provide full SPARQL support.
SemWeb uses a fork of Ryan Levering's Java SPARQL implementation,
converted to .NET [3]. This means it has full SPARQL support, with the
option to translate SPARQL into SQL whenever this is available.
dotNetRDF has its own SPARQL implementation for use on local
data. To query remote data it uses SPARQL endpoints or other
SPARQL implementations. [4]
Code samples are provided by both authors and can be found at
[3] and [4] for further information.
In our attempt to benchmark SPARQL query performance on both
APIs, we were not able to query the 10MB file used earlier through the
dotNetRDF API, even with a simple "SELECT * WHERE {?s ?p ?o}". We
do not know whether this was due to the nature of the RDF file, our
implementation (we had to load the Graph obtained from parsing the
RDF file into a TripleStore, which was very slow and took about as long
as parsing the file) or the SPARQL implementation. The queries seemed
to work fine on much smaller chunks of RDF data.
Level of documentation
Both libraries are very well documented. Both homepages contain
links to demos, tutorials, "hello world" examples and implementation
issues, although dotNetRDF has a small edge over SemWeb when it comes
to how the information is organized.
A drawback of dotNetRDF is that it does not provide any source
code, so we cannot gain insight into the implementation.
Conclusion
Both SemWeb and dotNetRDF provide good support for working
with RDF data.
Still, if you have to choose between the two, SemWeb is the way to
go. It is a more mature and complete implementation and it provides
better support for both triple storage and SPARQL queries. This is to be
expected, as SemWeb has over 4 years of development behind it while
dotNetRDF is only 3 months past its first release. Even so, we think
dotNetRDF has great potential to become a reliable option for working
with RDF under .NET Framework.
References
[1]:http://razor.occams.info/code/semweb/semweb-current/doc/helloworld.html
[2]:http://www.dotnetrdf.org/content.asp?pageID=Hello%20World
[3]:http://razor.occams.info/code/semweb/
[4]:http://www.dotnetrdf.org/content.asp?pageID=Querying%20with%20SPARQL
[5]:http://www.dotnetrdf.org/content.asp?pageID=Using%20the%20Talis%20Platform
[6]:http://www.dotnetrdf.org/content.asp?pageID=Using%20Virtuoso%20Universal%20Server