Virtuoso RDF Triple Store Analysis Benchmark & mapping tools RDF / OO

This analysis consists of 4 sections:
1) Triple Store Benchmarking
2) Virtuoso
3) Virtuoso Connection
4) RDF – Object Oriented Mapping

Transcript

  • 1. RDF Analysis with Virtuoso (Thursday, 19 December 2013)
  • 2. Overview: Triple Store Benchmarking; Virtuoso; Virtuoso Connection; RDF/OO Mapper
  • 3. Triple Store Benchmarking
  • 4. How to... BSBM (Berlin SPARQL Benchmark): compares the performance of RDF and Named Graph stores, as well as RDF-mapped relational databases. Lehigh University Benchmark (LUBM): facilitates the evaluation of Semantic Web repositories in a standard way.
  • 5. Benchmark (1/5) The analysis is based on an April 2013 BSBM experiment in which version 3.1 of the Berlin SPARQL Benchmark was used to measure the performance of the following systems under test (SUTs): BigData (rev. 6528), BigOwlim (v. 5.2), TDB (v. 0.9.4), Virtuoso6 (ver. 6.04) and Virtuoso7 (ver. 7.0).
    Load times for SUTs (hh:mm:ss):
    SUT         10M        100M       200M       1B
    BigData     00:02:39   00:25:35   00:59:25   -
    BigOwlim    00:02:31   00:22:47   00:47:19   04:09:39
    TDB         00:09:41   01:37:55   03:34:59
    Virtuoso6   00:07:06   00:19:26   00:31:30   01:10:30
    Virtuoso7   -          00:03:39   -          00:27:11
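To compare load times across dataset sizes, it can help to turn them into an average loading rate. The following is a minimal sketch, not part of the original slides: the helper names `to_seconds` and `load_rate` are hypothetical, and only the BigOwlim figures quoted in the load-time table are used.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' load time (fields may be unpadded) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def load_rate(triples: int, hms: str) -> float:
    """Average number of triples loaded per second."""
    return triples / to_seconds(hms)


# BigOwlim load times as quoted in the slide (dataset size -> hh:mm:ss).
bigowlim = {
    10_000_000: "00:2:31",
    100_000_000: "00:22:47",
    200_000_000: "00:47:19",
    1_000_000_000: "4:9:39",
}

for size, hms in bigowlim.items():
    rate = load_rate(size, hms)
    print(f"{size:>13,} triples in {hms:>8} -> {rate:,.0f} triples/s")
```

A rate that stays roughly flat as the dataset grows suggests that loading scales close to linearly for that store.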
  • 6. Benchmark (2/5) The tables below summarize the query throughput, in QpS (queries per second), for each type of query over all 500 runs. Benchmark query results: BigData BigOwlim TDB 100M 200M 100M Query 1 49.955 Query 2 42.769 Query 3 37.280 49.520 Query 1 93.773 65.385 Query 2 115.960 65.158 Query 3 170.242 61.155 43.713 38.355 200M Query 1 232.234 217.865 Query 2 109.445 110.019 Query 3 180.245 174.216 100M 200M Query 1 119.048 94.877 Query 2 158.755 151.883 Query 3 84.660 70.492 Virtuoso7 Virtuoso6 100M 200M 100M 1B Query 1 125.786 75.324 Query 2 68.929 68.820 Query 3 117.426 62.243
  • 7. Benchmark (3/5) Query 1: Find products for a given set of generic features. Query 2: Retrieve basic information about a specific product for display purposes. Query 3: Find products having some specific features and not having one feature.
  • 8. Benchmark (4/5)
    Query 1:
        PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
        PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        SELECT DISTINCT ?product ?label
        WHERE {
          ?product rdfs:label ?label .
          ?product a %ProductType% .
          ?product bsbm:productFeature %ProductFeature1% .
          ?product bsbm:productFeature %ProductFeature2% .
          ?product bsbm:productPropertyNumeric1 ?value1 .
          FILTER (?value1 > %x%)
        }
        ORDER BY ?label
        LIMIT 10
    Query 2:
        PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
        PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX dc: <http://purl.org/dc/elements/1.1/>
        SELECT ?label ?comment ?producer ?productFeature ?propertyTextual1 ?propertyTextual2 ?propertyTextual3
               ?propertyNumeric1 ?propertyNumeric2 ?propertyTextual4 ?propertyTextual5 ?propertyNumeric4
        WHERE {
          %ProductXYZ% rdfs:label ?label .
          %ProductXYZ% rdfs:comment ?comment .
          %ProductXYZ% bsbm:producer ?p .
          ?p rdfs:label ?producer .
          %ProductXYZ% dc:publisher ?p .
          %ProductXYZ% bsbm:productFeature ?f .
          ?f rdfs:label ?productFeature .
          %ProductXYZ% bsbm:productPropertyTextual1 ?propertyTextual1 .
          %ProductXYZ% bsbm:productPropertyTextual2 ?propertyTextual2 .
          %ProductXYZ% bsbm:productPropertyTextual3 ?propertyTextual3 .
          %ProductXYZ% bsbm:productPropertyNumeric1 ?propertyNumeric1 .
          %ProductXYZ% bsbm:productPropertyNumeric2 ?propertyNumeric2 .
          OPTIONAL { %ProductXYZ% bsbm:productPropertyTextual4 ?propertyTextual4 }
          OPTIONAL { %ProductXYZ% bsbm:productPropertyTextual5 ?propertyTextual5 }
          OPTIONAL { %ProductXYZ% bsbm:productPropertyNumeric4 ?propertyNumeric4 }
        }
    Query 3:
        PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
        PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        SELECT ?product ?label
        WHERE {
          ?product rdfs:label ?label .
          ?product a %ProductType% .
          ?product bsbm:productFeature %ProductFeature1% .
          ?product bsbm:productPropertyNumeric1 ?p1 .
          FILTER ( ?p1 > %x% )
          ?product bsbm:productPropertyNumeric3 ?p3 .
          FILTER (?p3 < %y% )
          OPTIONAL {
            ?product bsbm:productFeature %ProductFeature2% .
            ?product rdfs:label ?testVar
          }
          FILTER (!bound(?testVar))
        }
        ORDER BY ?label
        LIMIT 10
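The benchmark queries are templates: names wrapped in percent signs, such as %ProductType% and %x%, are placeholders that must be replaced before a query can run. The sketch below shows one way this substitution could be done for Query 1. The function `fill_template` and the parameter values in `EXAMPLE_PARAMS` are hypothetical illustrations, not the BSBM test driver's actual code or data.

```python
import re

QUERY_1_TEMPLATE = """\
PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?product ?label
WHERE {
  ?product rdfs:label ?label .
  ?product a %ProductType% .
  ?product bsbm:productFeature %ProductFeature1% .
  ?product bsbm:productFeature %ProductFeature2% .
  ?product bsbm:productPropertyNumeric1 ?value1 .
  FILTER (?value1 > %x%)
}
ORDER BY ?label
LIMIT 10
"""


def fill_template(template, params):
    """Replace every %Name% placeholder; raise KeyError if a value is missing."""
    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value for placeholder %{name}%")
        return params[name]

    return re.sub(r"%(\w+)%", substitute, template)


# Hypothetical parameter values, chosen only to make the example concrete.
EXAMPLE_PARAMS = {
    "ProductType": "bsbm-inst:ProductType1",
    "ProductFeature1": "bsbm-inst:ProductFeature1",
    "ProductFeature2": "bsbm-inst:ProductFeature2",
    "x": "100",
}

if __name__ == "__main__":
    print(fill_template(QUERY_1_TEMPLATE, EXAMPLE_PARAMS))
```

Failing loudly on a missing placeholder avoids sending a half-filled query to the store, where it would only surface as a syntax error.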
  • 9. Benchmark (5/5)
    Hardware configuration:
    • Processors: 2 x Intel(R) Xeon(R) CPU E5-2650, 2.00GHz (8 cores and hyperthreading), Sandy Bridge architecture
    • Memory: 256GB
    • Hard disks: 3 x 1.8TB (7,200 rpm) SATA in RAID 0 (180MB/s sequential throughput)
    Software configuration:
    • Operating system: Linux version 3.3.4-3.fc16.x86_64
    • Filesystem: ext4
    • Java version and JVM: Version 1.6.0_31, 64-Bit Server VM (build 20.6-b01)
    • BSBM generator and test driver version: bibm-0.7.8
  • 10. Conclusions about the 1st step: Following the benchmarking phase, Virtuoso was chosen as the platform because of its short loading times and high query throughput.
  • 11. Virtuoso
  • 12. Starting with Virtuoso: Starting point; Installing Virtuoso; Getting Started
  • 13. Starting point: Linux CentOS 6.4. Run "sudo yum install gcc gmake autoconf automake libtool flex bison gperf gawk m4 make openssl-devel readline-devel wget" to install the build dependencies. It may be wise to open port 8890/tcp in the firewall configuration to allow external access to Virtuoso's web-based interfaces, such as the Conductor. Run "yum update" to update the indexes of available packages.
  • 14. Installing Virtuoso
    Download Virtuoso from SourceForge and unpack it with "tar xvpfz virtuoso-opensource-6.1.7.tar.gz". A simple configuration is:
        [user@centos virtuoso-opensource-6.1.7]$ ./configure --prefix=/usr/local/ --with-readline
    The prefix /usr/local in the above command forms the base directory for Virtuoso, which will contain the following structure:
    • /usr/local/lib/: various libraries for Jena, Sesame, JDBC and hosting
    • /usr/local/bin/: where the main executables (virtuoso-t, isql) live
    • /usr/local/share/virtuoso/vad/: used to store VAD archives
    • /usr/local/share/virtuoso/doc/: local offline documentation
    • /usr/local/var/lib/virtuoso/db/: default location for a Virtuoso instance
    • /usr/local/var/lib/virtuoso/vsp/: various VSP scripts
    Building and installing:
        [user@centos virtuoso-opensource-6.1.7]$ nice make
        [user@centos virtuoso-opensource-6.1.7]$ sudo make install
  • 15. Getting Started (1/2) Take a backup of the virtuoso.ini file in case you make erroneous changes. Run "cd /usr/local/var/lib/virtuoso/db/" and then "virtuoso-t -df" to start the server. You can access the Conductor at "http://localhost:8890/conductor/". Two system users are available: dba, the relational data and administrative account; and dav, the WebDAV administrative account.
  • 16. Getting Started (2/2)
    Conductor: helps you manage users and automate backups, install VAD packages, execute SQL commands in a web-based iSQL tool, configure the RDF Sponger, and more.
    SQL/ODBC listener: Virtuoso provides a listener on port 1111/tcp. You can connect directly to it and execute SQL statements with the isql tool.
    Resource usage: the defaults with Virtuoso Open-Source give:
    • 160 MB process size in memory
    • about 29 MB database
    • total 237 MB footprint on disk
    There are 20 threads for db and/or web-server use.
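After starting the server, a quick way to confirm that both listeners mentioned above are up is to try a TCP connection to each port. This is a minimal sketch using only the Python standard library; the helper name `is_listening` is hypothetical, and the ports are the 8890 (HTTP, Conductor) and 1111 (SQL listener) quoted in the slides.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in (("HTTP / Conductor", 8890), ("SQL listener", 1111)):
        state = "up" if is_listening("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

Running the same check from another machine, with the server's hostname in place of localhost, also shows whether the firewall rule for port 8890 is in effect.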
  • 17. Virtuoso Connection
  • 18. Connections used: RESTful services; Jena Provider; Sesame Provider
  • 19. REST (1/4): HTTP PUT
    Download an example dataset (e.g. geo_coordinates_en_uris_it.ttl from DBpedia) and load the sample data into the named graph identified by <urn:graph:update:test:put>:
        curl --digest --user dba:dba --verbose --url "http://localhost:8890/sparql-graph-crud-auth?graph-uri=urn:graph:update:test:put" -T /root/Desktop/Dataset/geo_coordinates_en_uris_it.ttl
    Query the graph data:
        SELECT * FROM <urn:graph:update:test:put> WHERE {?s ?p ?o}
  • 20. REST (2/4): HTTP GET
    Load the sample data into the named graph identified by <urn:graph:update:test:get>:
        curl --digest --user dba:dba --verbose --url "http://localhost:8890/sparql-graph-crud-auth?graph-uri=urn:graph:update:test:get" -T /root/Desktop/Dataset/geo_coordinates_en_uris_it.ttl
    Query the graph data:
        curl --verbose --url "http://localhost:8890/sparql-graph-crud?graph-uri=urn:graph:update:test:get"
  • 21. REST (3/4): HTTP DELETE
    Load the sample data into the named graph identified by <urn:graph:update:test:delete>:
        curl --digest --user dba:dba --verbose --url "http://localhost:8890/sparql-graph-crud-auth?graph-uri=urn:graph:update:test:delete" -T /root/Desktop/Dataset/geo_coordinates_en_uris_it.ttl
    Delete the graph data:
        curl --digest --user dba:dba --verbose --url "http://localhost:8890/sparql-graph-crud-auth?graph-uri=urn:graph:update:test:delete" -X DELETE
    There are two ways to check that no triples remain after the deletion.
    With curl:
        curl --verbose --url "http://localhost:8890/sparql-graph-crud?graph-uri=urn:graph:update:test:delete"
    With SPARQL:
        SELECT * FROM <urn:graph:update:test:delete> WHERE {?s ?p ?o}
  • 22. REST (4/4): HTTP POST
    Load the sample data into the named graph identified by <urn:graph:update:test:post>:
        curl --digest --user dba:dba --verbose --url "http://localhost:8890/sparql-graph-crud-auth?graph-uri=urn:graph:update:test:post" -X POST -T /root/Desktop/Dataset/geo_coordinates_en_uris_it.ttl
    There are two ways to query the graph data.
    With curl:
        curl --verbose --url "http://localhost:8890/sparql-graph-crud?graph-uri=urn:graph:update:test:post"
    With SPARQL:
        SELECT * FROM <urn:graph:update:test:post> WHERE {?s ?p ?o}
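The four curl calls above differ only in the HTTP method, the endpoint (authenticated or not) and whether a file is uploaded. The sketch below shows how the same requests might be built with Python's standard library. The helper names (`graph_crud_url`, `build_request`, `digest_opener`) are hypothetical, and sending a `text/turtle` Content-Type header is an assumption that the curl commands do not make.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8890"  # Virtuoso HTTP port used in the slides


def graph_crud_url(graph_uri, auth=True, base_url=BASE_URL):
    """Build the graph CRUD URL; the -auth endpoint is the one used for writes."""
    endpoint = "sparql-graph-crud-auth" if auth else "sparql-graph-crud"
    query = urllib.parse.urlencode({"graph-uri": graph_uri})
    return f"{base_url}/{endpoint}?{query}"


def build_request(method, graph_uri, turtle=None):
    """Create a PUT/POST/DELETE/GET request for a named graph.

    `turtle` is the raw bytes of a Turtle file for PUT and POST, else None.
    """
    auth = method != "GET"  # the slides read without authentication
    # Assumption: declare the payload as Turtle; curl -T sends no such header.
    headers = {"Content-Type": "text/turtle"} if turtle is not None else {}
    url = graph_crud_url(graph_uri, auth=auth)
    return urllib.request.Request(url, data=turtle, headers=headers, method=method)


def digest_opener(user="dba", password="dba", base_url=BASE_URL):
    """Opener that answers HTTP digest challenges, like curl --digest --user."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, base_url, user, password)
    handler = urllib.request.HTTPDigestAuthHandler(passwords)
    return urllib.request.build_opener(handler)
```

With a running server, a call such as `digest_opener().open(build_request("DELETE", "urn:graph:update:test:delete"))` would correspond to the DELETE slide. Note that `urlencode` percent-encodes the colons in the graph URI, which is an equivalent form of the same query string.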
  • 23. What is Jena: Jena is an open source Semantic Web framework for Java. It provides an API to extract data from and write to RDF graphs. The graphs are represented as an abstract "model". A model can be sourced with data from files, databases, URIs or a combination of these. A model can also be queried through SPARQL and updated through SPARUL.
  • 24. Virtuoso Jena provider: The Virtuoso Jena Provider is a Native Graph Model Storage Provider for the Jena Framework. It lets applications built on the Jena RDF framework query the Virtuoso RDF Quad Store. Providers are available for the latest Jena 2.6.x and 2.10.x versions.
  • 25. Setup
    Download the latest Virtuoso Jena Provider, the Virtuoso JDBC driver, the associated classes and the sample programs from www.openlinksw.com.
    Edit the sample programs VirtuosoSPARQLExampleX.java, where X = 1 to 9, and set the JDBC connection strings to a valid Virtuoso Server instance, using the form: ("jdbc:virtuoso://localhost:1111/charset=UTF-8/log_enable=2", "dba", "dba")
    From Eclipse, start a new project and add the following jars to the CLASSPATH: axis.jar, commons-logging.jar, icu4j.jar, xercesImpl.jar, jena-arq.jar, jena-core.jar, jena-iri.jar, slf4j-api.jar, slf4j-simple.jar, virt_jena.jar, virtjdbc.jar
  • 26. Testing: Once the Provider classes and sample programs have been successfully compiled, the Provider can be tested using the included sample programs.
    • Example 1: returns the contents of the RDF Quad store of the targeted Virtuoso instance
    • Example 2: reads in the contents of FOAF URIs
    • Example 3: performs simple addition and deletion operations on the content of the triple store
  • 27. What is Sesame: Sesame is an open source Java framework for storing, querying and reasoning with RDF and RDF Schema. It can be used as a database for RDF and RDF Schema, or as a Java library for applications that need to work with RDF internally.
  • 28. Virtuoso Sesame provider: The Virtuoso Sesame Provider is a Native Graph Model Storage Provider for the Sesame Framework. It allows applications to modify, query and reason with the Virtuoso quad store. The Sesame Repository API offers a central access point for connecting to the Virtuoso quad store; it provides a Java-friendly access point to Virtuoso, abstracting the details of the underlying machinery. The Provider has been tested against the latest versions, Sesame 2.7.x.
  • 29. Setup
    Download the latest Virtuoso Sesame 2 Provider for the version of Sesame being used, the Virtuoso JDBC driver, the Sesame Framework, the associated classes and the sample programs from www.openlinksw.com.
    From Eclipse, start a new project and add the following jars to the CLASSPATH: virtjdbc.jar, virt_sesame.jar, slf4j-api.jar, slf4j-simple.jar, openrdf-sesame.jar, commons-io.jar
  • 30. Testing: Once the Provider classes and sample programs have been successfully compiled, the Provider can be tested using the included sample programs. The following tests cover the essentials for connecting to and manipulating data stored in a Virtuoso repository using the Sesame API.
    VirtuosoTest:
    • Loading data from URL: http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com/foaf.rdf
    • Clearing triple store
    • Loading data from file: virtuoso_driver/data.nt
    • Loading UNICODE single triple
    • Loading single triple
    • Casted value type
    • Selecting property
    • Statement does not exists
    • Statement exists (by resultset size)
    • Statement exists (by hasStatement())
    • Retrieving namespaces
    • Retrieving statement (http://myopenlink.net/dataspace/person/kidehen http://myopenlink.net/foaf/name null)
    • Writing the statements to file: (/Users/src/virtuoso-opensource/binsrc/sesame2/results.n3.txt)
    • Retrieving graph ids
    • Retrieving triple store size
    • Sending ask query
    • Sending construct query
    • Sending describe query
  • 31. Conclusions: In this phase of my analysis, the choice between the Jena and Sesame providers makes no difference, because both are fully capable of triple manipulation.
    Operations         SESAME   JENA
    Reading RDF        V        V
    Writing RDF        V        V
    Reasoning          V        V
    SPARQL Support     V        V
    Internal Storage   V        V
    External Storage   V        V
  • 32. RDF/OO Mapping
  • 33. Why? Problem: The explosive development of the Web has brought forward the need for semantically rich information: a vision at the heart of the Semantic Web. In software applications that use RDF triples, we often need to work with data stored in a semantic repository, and in such cases using the repositories' APIs directly can be difficult. Solution: An object-RDF mapper is useful in applications developed with an object-oriented approach, because it extends the features of the OO paradigm to the RDF world.
  • 34. How? The Bean!! A bean is a class that contains attributes equivalent to the semantic properties of the class and includes get and set methods. JavaBean classes are written in the Java programming language according to a particular convention. They are used to encapsulate multiple objects into a single object (the bean), so that these objects can be passed as a single bean object instead of as multiple individual objects.
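The idea of a bean whose attributes correspond to RDF properties can be illustrated with a small sketch. It is written in Python rather than Java, does not reproduce the API of any of the mapping tools discussed in these slides, and every name in it (`Person`, `PREDICATES`, `to_triples`, `from_triples`) is hypothetical. Triples are modelled as plain (subject, predicate, object) tuples, and the FOAF vocabulary is used only as a familiar example.

```python
from dataclasses import dataclass, fields

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Person:
    """A bean-like class: each mapped attribute corresponds to one RDF property."""
    uri: str
    name: str
    mbox: str

    # Unannotated class attributes are not dataclass fields; they hold the mapping.
    RDF_CLASS = FOAF + "Person"
    PREDICATES = {"name": FOAF + "name", "mbox": FOAF + "mbox"}


def to_triples(bean):
    """Serialize a bean into (subject, predicate, object) triples."""
    cls = type(bean)
    triples = [(bean.uri, RDF_TYPE, cls.RDF_CLASS)]
    for field in fields(bean):
        predicate = cls.PREDICATES.get(field.name)
        if predicate is not None:  # 'uri' is the subject, not a property
            triples.append((bean.uri, predicate, getattr(bean, field.name)))
    return triples


def from_triples(cls, uri, triples):
    """Rebuild a bean from the triples whose subject is `uri`."""
    by_predicate = {p: o for s, p, o in triples if s == uri}
    values = {attr: by_predicate[pred] for attr, pred in cls.PREDICATES.items()}
    return cls(uri=uri, **values)


if __name__ == "__main__":
    alice = Person("http://example.org/people/alice", "Alice", "mailto:alice@example.org")
    for triple in to_triples(alice):
        print(triple)
```

The sketch assumes exactly one value per property and a fixed schema, which is where the schemaless nature of RDF starts to strain a bean-based model.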
  • 35. Pros and cons. Advantages: familiarity with beans, which are the common currency of Java frameworks. Disadvantages: it is harder to use RDF in a natural way. Pulling in and merging disparate data sources, and the schemaless aspect of RDF stores, don't work well when forced into beans.
  • 36. RDF-Mapping tools: Elmo (ex Alibaba); Jenabean; Sommer; RDFBeans; RDF2JAVA; RDFReactor
  • 37. Elmo. Features:
    • BSD, Java 5.0, store: Sesame (generic API)
    • Additional functionality on top of the triple store: predictive caching (preloading properties and saving query results for future queries), query expansion (for handling owl:sameAs), dealing with metadata (reification)
    • JavaBeans concepts for a number of well-known web ontologies, including Dublin Core, RSS and FOAF
    • Dynamic runtime JavaBean creation based on RDFS/OWL
    • A set of tools related to the supported ontologies: an RDF crawler, a generic smusher framework and a generic validator framework, with various smushers and a validator specific to FOAF
    • Code generation using Groovy script templates
    • Use of annotated Java interfaces, implemented by dynamic classes at runtime using Javassist
  • 38. Jenabean: Jenabean uses Jena's flexible RDF/OWL API to persist Java beans. It takes an unconventional approach to binding that is driven by the Java object model rather than an OWL or RDF schema. Features: it works against the Jena Model API; it should interact with one of the two Jena backends (SDB, TDB); a wrapper is needed to interact with another RDF store (SAIL, AllegroGraph).
  • 39. Sommer: Sommer just thinks of Java fields as named relations. It makes those relations explicit with the @RDF annotation. Features: works at runtime via byte code rewriting; no generation of code; uses Java annotations; store: Sesame; vocabulary: any URIs.
  • 40. RDFBeans. Features:
    • Does not depend on a specific triplestore implementation: any store supported by the RDF2Go API can be used
    • Cascade databinding to reduce development time and ensure referential integrity of complex object data structures
    • Modular RDFBeans annotations: can be inherited from superclasses and interfaces
    • No predefined ontologies or RDF schemas are required for RDF data
    • Transaction support (triplestore-specific)
    • Extensible mechanism for mapping Java data types to RDF literals
    • Support for basic Java Collections, optionally represented with RDF containers
    • Support for indexed JavaBean properties
    • Support for RDF namespaces
  • 41. RDF2JAVA. Features: good command line; generates code from RDFS; Java classes for RDFS classes: no multiple inheritance supported and no multiple super classes; very tiny, lightweight project; not maintained anymore (software frozen but working).
  • 42. RDFReactor. Features: generates code from RDFS; experimental, partial generation from OWL cardinality constraints; store: via RDF2Go, Jena, Sesame and YARS are supported; uses Velocity for template generation.
  • 43. Conclusion
    Features                             Elmo   Jenabean   Sommer   RDFBeans
    Java Annotations                     V      V          V        V
    Storage via Sesame                   V      X          V        V
    Storage via Jena                     X      V          X        X
    JavaBean generation based on RDFS    V      X          X        -
    JavaBean generation based on OWL     V      V          X        -
    Documentation                        V      X          X        V
    Following the analysis of the mapping tools, the choice fell on Elmo: it is equipped with all the functionality needed to handle triples within Virtuoso. The Virtuoso provider chosen was Sesame, which can easily interface with both Virtuoso and Elmo.
  • 44. The end