
Comparative study on the processing of RDF in PHP

Gabriel-Ştefan Munteanu, Nicu-Cosmin Ursache
Faculty of Computer Science Iasi
stefan.munteanu@infoiasi.ro, nicu.ursache@infoiasi.ro

Abstract. Content can already be shared on the Web through other technologies such as FTP, and formats such as relational databases with annotated data can already be reused by other systems, so the need for a single Web-based format is not obvious at first. Putting information into RDF files makes it possible for computer programs to search, discover, pick up, collect, analyze and process information from the Web. Using RDF, a Web browser should be able to reuse the data with no additional work on the part of users. The tricky part is making it easier for web programmers to work with RDF, which is where RDF libraries come in.

Keywords: RDF, API, RAP, Raptor, ARC, SPARQL

1. Introduction

To help web programmers choose an RDF library when a project requires one, a comparative study of the existing PHP-based RDF APIs is useful. W3C publishes a lot of test results for RDF parsers in several APIs, but they do not cover every point of view, so this article gives a short comparative presentation of three packages that offer RDF support. These packages are RDF API for PHP (RAP for short from now on), the ARC API, and the Raptor RDF Parser Library. Raptor is written in C rather than PHP, but it is included in this study because it can be used from PHP very easily through the Redland RDF Language Bindings and because it performs very well.

2. API Description

2.1. RAP - RDF API for PHP

RAP[1] is a software package for parsing, querying, manipulating, serializing and serving RDF models.
Its features include:
• statement-centric methods for manipulating an RDF model as a set of RDF triples
• resource-centric methods for manipulating an RDF model as a set of resources
• ontology-centric methods for manipulating an RDF model through vocabulary-specific methods
• quad- and named graph-centric methods for manipulating RDF datasets
• integrated RDF/XML, N3, N-Triples, TriX, GRDDL and RSS parsers
• integrated RDF/XML, N3, N-Triples and TriX serializers
• in-memory or database model storage
• SPARQL query engine supporting all features of the W3C SPARQL Recommendation
• SPARQL client library
• RDQL query engine
• inference engine supporting RDF-Schema reasoning and some OWL entailments
• integrated RDF server providing functionality similar to the Joseki RDF server
• integrated linked data frontend
• graphical user interface for managing database-backed RDF models
• support for common vocabularies
• drawing graph visualizations

2.2. Raptor

Raptor[2] is a free software / Open Source C library that provides a set of parsers and serializers: the parsers generate Resource Description Framework (RDF) triples from a syntax, and the serializers write triples out in a syntax. The supported parsing syntaxes are RDF/XML, N-Triples, TriG, Turtle, RSS tag soup (including all versions of RSS, Atom 1.0 and 0.3), GRDDL and microformats for HTML, XHTML and XML, and RDFa. The serializing syntaxes are RDF/XML (regular and abbreviated), Atom 1.0, GraphViz, JSON, N-Triples, RSS 1.0 and XMP. Raptor was designed to work closely with the Redland RDF library (RDF Parser Toolkit for Redland) but is entirely separate. It is a portable library that works across many POSIX systems (Unix, GNU/Linux, BSDs, OSX, cygwin, win32). It is a mature and stable library:
• Parses content on the web if libcurl, libxml2 or BSD libfetch is available
• Supports all RDF terms including datatyped and XML literals
• Optional features including parsers and serialisers can be selected at configure time
• Language bindings to Perl, PHP, Python and Ruby when used via Redland
• No memory leaks
• Fast
• Standalone rapper RDF parser utility program

2.3. ARC

ARC[3] is a flexible RDF system for semantic web and PHP practitioners.

Components & Features
• ConNeg-capable Web Reader - support for proxies, redirects, and Content Negotiation
• Various parsers - RDF/XML, Turtle, SPARQL + SPOG, Legacy XML, HTML tag soup, RSS 2.0, Google Social Graph API JSON
• Serializers - N-Triples, RDF/JSON, RDF/XML, Turtle, SPOG dumps
• 2 internal structures - resource-centric processing, statement-centric processing
• RDF Storage (using MySQL) - SPARQL SELECT, ASK, DESCRIBE, CONSTRUCT, + aggregates, LOAD, INSERT, and DELETE
• SPARQL Endpoint Class - set up a compliant SPARQL endpoint with 3 lines of code
• SemHTML RDF extractors - DC, eRDF, microformats, OpenID, RDFa
• RemoteStore Class - query remote SPARQL endpoints as if they were local stores (results are returned as native PHP arrays)
• Turtle templating - generate dynamic graphs
• Plugins - extend ARC with your own custom extensions
• Triggers - register event handlers for selected SPARQL Query types
• SPARQLScript - SPARQL-based scripting and output templating

3. RDF triples storage

The RAP classes are split into three main packages: model, syntax and util. The model package includes all the classes needed to create or read specific elements of an RDF model, including reading or creating complete statements or their individual components. These classes are:
- BlankNode - used to create a blank node, to get the bnode identifier, or to check equality between two bnodes
- Literal - support for model literals
- Model - contains methods to build or read a specific RDF model
- Node - an abstract RDF node
- Resource - support for model resources
- Statement - creating or manipulating a complete RDF triple

The util class Object is another abstract class whose general methods are overridden in the classes built on it, so it is of no interest for our purposes. However, the RDFUtil class provides some handy methods, including writeHTMLTable, which outputs an RDF/XML document in tabular form.

Raptor is a collection of functions separated into files by functionality:

Parsers:
- RDF/XML Parser
- N-Triples Parser
- Turtle Parser
- TriG Parser
- RSS "tag soup" parser
- RDFa parser

Serializers:
- RDF/XML Serializer
- N-Triples Serializer
- Atom 1.0 Serializer
- JSON Serializers
- RSS 1.0 Serializer
- Turtle Serializer
- XMP Serializer

Raptor stores a triple in the following structure:

    typedef struct {
        const void *subject;
        raptor_identifier_type subject_type;
        const void *predicate;
        raptor_identifier_type predicate_type;
        const void *object;
        raptor_identifier_type object_type;
        raptor_uri *object_literal_datatype;
        const unsigned char *object_literal_language;
    } raptor_statement;

The second version of the Raptor API adds a wrapper that pairs the statement with its raptor_world:

    typedef struct {
        raptor_world* world;
        raptor_statement *s;
    } raptor_statement_v2;

ARC uses object-oriented code for its components and methods, but the processed data structures are simple associative arrays, which leads to faster operations and lower memory consumption. Apart from a few special formats returned by the SPARQL engine (e.g. from SELECT or INSERT queries), ARC is built around two core structures: triple sets and resource indexes.

Triple sets
A triple set is a flat array that contains (associative) triple arrays. Triple sets can be processed with a simple loop:

    ...
    $triples = $parser->getTriples();
    for ($i = 0, $i_max = count($triples); $i < $i_max; $i++) {
        $triple = $triples[$i];
        ...
    }

A single triple array can contain the following keys:
    s - the subject value (a URI, Bnode ID, or Variable)
    p - the property URI (or a Variable)
    o - the object value (a URI, Bnode ID, Literal, or Variable)
    s_type - "uri", "bnode", or "var"
    o_type - "uri", "bnode", "literal", or "var"
    o_datatype - a datatype URI
    o_lang - a language identifier, e.g. "en-us"

Resource indexes

A resource index is an associative array of triples indexed by subject -> predicates -> objects.

    $index = array(
        '_:john' => array(
            'http://xmlns.com/foaf/0.1/knows' => array(
                '_:bill',
                '_:bob',
                '_:mary',
            ),
        ),
        '_:mary' => ...
    );
    echo $index['_:john']['http://xmlns.com/foaf/0.1/knows'][0];

ARC supports two index forms. The one above uses flat objects, which can be handy for simplified access operations but can lead to information loss (e.g. when the object type is not clear, or when a datatype was present in the original triples). The second, slightly extended index structure keeps the object details:

    $index = array(
        '_:john' => array(
            'http://xmlns.com/foaf/0.1/knows' => array(
                array('value' => '_:bill', 'type' => 'bnode'),
                array('value' => '_:bob', 'type' => 'bnode'),
                ...
            ),
        ),
    );
    echo $index['_:john']['http://xmlns.com/foaf/0.1/knows'][0]['value'];

4. SPARQL query support

RAP's SPARQL client allows you to execute SPARQL queries against remote SPARQL endpoints using the SPARQL protocol. Query results are returned as an array of variable bindings, a RAP MemModel or a boolean, depending on the type of SPARQL query.

Raptor itself does not execute queries. SPARQL and RDQL support comes from the Rasqal RDF Query Library, so anyone who needs queries has to use Rasqal.

ARC supports all SPARQL Query Language features, and work is under way on a number of pragmatic extensions such as aggregates (AVG / COUNT / MAX / MIN / SUM) and write mechanisms.

5. Developer support

5.1. Documentation

Redland provides developers with extensive documentation and many examples. The documentation addresses both advanced programmers and, especially, junior developers. Each of the higher-level language APIs contains a mapping to the core C API and also includes extra documentation describing the native APIs, along with examples of use.

RAP's documentation is based on tutorials, usage examples and implementation notes. The API documentation covers all classes and methods, but only briefly. The implementation notes cover the database backend and the RDQL engine.

ARC provides brief documentation for version 1 and slightly more detailed documentation for version 2.
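The two ARC structures from Section 3 are closely related: a triple set can be folded into the extended resource index with a single loop. The sketch below is plain PHP and does not call ARC itself. The function name triplesToIndex and the optional 'datatype' and 'lang' keys on the object arrays are assumptions made for illustration, not part of the API described in this paper.

```php
<?php
// Hypothetical helper (not an ARC function): folds a triple set into the
// extended resource index form, subject -> predicate -> list of objects.
function triplesToIndex(array $triples): array
{
    $index = array();
    foreach ($triples as $t) {
        $object = array('value' => $t['o'], 'type' => $t['o_type']);
        // Keep datatype and language only when the triple carries them.
        // The key names 'datatype' and 'lang' are assumptions of this sketch.
        if (!empty($t['o_datatype'])) {
            $object['datatype'] = $t['o_datatype'];
        }
        if (!empty($t['o_lang'])) {
            $object['lang'] = $t['o_lang'];
        }
        // PHP creates the intermediate arrays on first use.
        $index[$t['s']][$t['p']][] = $object;
    }
    return $index;
}

// A small hand-written triple set using the keys listed in Section 3.
$triples = array(
    array('s' => '_:john', 's_type' => 'bnode',
          'p' => 'http://xmlns.com/foaf/0.1/knows',
          'o' => '_:bill', 'o_type' => 'bnode'),
    array('s' => '_:john', 's_type' => 'bnode',
          'p' => 'http://xmlns.com/foaf/0.1/knows',
          'o' => '_:bob', 'o_type' => 'bnode'),
    array('s' => '_:john', 's_type' => 'bnode',
          'p' => 'http://xmlns.com/foaf/0.1/name',
          'o' => 'John', 'o_type' => 'literal', 'o_lang' => 'en'),
);

$index = triplesToIndex($triples);
echo $index['_:john']['http://xmlns.com/foaf/0.1/knows'][0]['value'], "\n"; // prints "_:bill"
```

Going the other way (index to triple set) loses nothing with the extended form, which is why it is the safer of the two forms whenever literals with datatypes or language tags are involved.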
5.2. IDE integration

RAP and ARC can be used very easily in any PHP framework: copy the library files into the project and include them from the project code.

RAP usage example (how to parse an RDF file and print its content as an HTML table):

    // Include RAP
    define("RDFAPI_INCLUDE_DIR", "./../api/");
    include(RDFAPI_INCLUDE_DIR . "RDFAPI.php");
    // Filename of an RDF document
    $base = "example1.rdf";
    // Create a new MemModel
    $model = ModelFactory::getDefaultModel();
    // Load and parse document
    $model->load($base);
    // Output model as HTML table
    $model->writeAsHtmlTable();

ARC usage example (how to extract RDF triples from an HTML file):

    // Include ARC2 (this reuses the include directory constant from the RAP example)
    include_once(RDFAPI_INCLUDE_DIR . "ARC2.php");
    // Disable automatic extraction so that only the requested format is extracted
    $config = array('auto_extract' => 0);
    $parser = ARC2::getSemHTMLParser($config);
    $parser->parse('http://example.com/home.html');
    // Extract only RDFa
    $parser->extractRDF('rdfa');
    $triples = $parser->getTriples();
    $rdfxml = $parser->toRDFXML($triples);

Because Raptor is written in C, it cannot be used in a PHP development environment without the PHP interface of the Redland RDF Language Bindings, which exposes the library to PHP programmers.

Raptor usage example (written in PHP using the Redland RDF Language Bindings, parsing the content of an RDF file into an RDF model):

    //Redland world open
    $world=librdf_php_get_world();
    //create new storage
    $storage=librdf_new_storage($world,'hashes','dummy',"new=yes,hash-type='memory'");
    //create the model
    $model=librdf_new_model($world,$storage,'');
    //create the parser
    $parser=librdf_new_parser($world,'rdfxml','application/rdf+xml',null);
    //create a new uri for rdf file
    $uri=librdf_new_uri($world,'file:../data/dc.rdf');
    //parsing the content of the rdf file into a model
    librdf_parser_parse_into_model($parser,$uri,$uri,$model);
    //free memory
    librdf_free_uri($uri);
    librdf_free_parser($parser);

6. Performance

6.1. Processing speed and reliability

The RDF Interest Group and other members of the RDF community have identified issues and ambiguities in the [RDFMS] Specification and the [RDF-SCHEMA] Candidate Recommendation. These issues have been collected and categorized in the RDF Core Working Group Issue Tracking document, and the RDF Core Working Group uses this issue list to guide its work. The aim is a comprehensive test suite for RDF that covers all of the rules in the Formal Grammar for RDF. The test manifest file consists of a simple header [MANIFEST-HEAD], individual descriptions of the test cases, and a closing footer [MANIFEST-TAIL]. The test cases are divided into the following categories:

Positive Parser Tests
These tests consist of one (or more) input documents in RDF/XML as revised in [RDF-SYNTAX]. The expected result is defined using the N-Triples syntax. A parser is considered to pass the test if it produces a graph equivalent to the graph described by the N-Triples output document, according to the definition of graph equivalence given in [RDF-CONCEPTS].

Negative Parser Tests
These tests consist of one input document. The document is not legal RDF/XML. A parser is considered to pass the test if it correctly holds the input document to be in error.
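The pass criterion for positive parser tests can be sketched in a few lines for the simplest case. When neither graph contains blank nodes, graph equivalence reduces to equality of the two sets of triples. The sketch below is plain PHP over triple arrays with the keys s, p, o, o_type, o_datatype and o_lang; the function names tripleKey and sameGroundGraph are hypothetical, and graphs with blank nodes are deliberately rejected because they need a real isomorphism check that this sketch does not implement.

```php
<?php
// Hypothetical helper: turns one triple array into a canonical string key.
function tripleKey(array $t): string
{
    // Reject blank nodes: comparing them by label would be wrong.
    if (($t['s_type'] ?? 'uri') === 'bnode' || ($t['o_type'] ?? 'uri') === 'bnode') {
        throw new InvalidArgumentException('blank nodes need an isomorphism check');
    }
    return json_encode(array(
        $t['s'], $t['p'], $t['o'],
        $t['o_type'] ?? 'uri',
        $t['o_datatype'] ?? '',
        $t['o_lang'] ?? '',
    ));
}

// Returns true when both triple sets describe the same ground graph,
// ignoring triple order and duplicate triples.
function sameGroundGraph(array $actual, array $expected): bool
{
    $a = array_unique(array_map('tripleKey', $actual));
    $b = array_unique(array_map('tripleKey', $expected));
    sort($a);
    sort($b);
    return $a === $b;
}

// Expected graph (as it would come from the N-Triples document) and the
// parser output, listed in a different order.
$expected = array(
    array('s' => 'http://example.org/a', 'p' => 'http://example.org/p',
          'o' => 'http://example.org/b', 'o_type' => 'uri'),
    array('s' => 'http://example.org/a', 'p' => 'http://example.org/q',
          'o' => 'hello', 'o_type' => 'literal', 'o_lang' => 'en'),
);
$actual = array_reverse($expected);

var_dump(sameGroundGraph($actual, $expected)); // bool(true)
```

A negative parser test needs no comparison at all: the harness only has to observe that the parser reports an error for the input document.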
Test group                                  Tests   RAP    ARC    Raptor
Positive parser tests (approved)            128     96%    98%    100%
Negative parser tests (approved)            41      97%    98%    100%
Tests with 3 passes (approved negative)     1       0%     0%     100%
Tests with 4 passes (approved positive)     11      81%    90%    100%
Tests with 4 passes (approved negative)     40      100%   100%   100%
Tests with 5 passes (approved positive)     33      93%    95%    100%
Tests with 6 passes (approved positive)     77      100%   100%   100%
Tests with 7 passes (approved positive)     7       100%   100%   100%
Tests with 0 fails (approved positive)      107     100%   100%   100%
Tests with 0 fails (approved negative)      6       100%   100%   100%
Tests with 1 fail (approved positive)       18      88%    100%   100%
Tests with 1 fail (approved negative)       34      100%   100%   100%
Tests with 2 fails (approved positive)      3       33%    66%    100%
Tests with 2 fails (approved negative)      1       0%     0%     100%

Table 1. Core tests (percentage of tests passed in each group)

Raptor passes every test in every group, so on these results the Raptor RDF parser tends to be more reliable than RAP and ARC.

A small test, parsing and reserializing approx. 500 statements 100 times with Raptor, ARC, and RAP:
• Raptor: 6 seconds
• RAP: 41 seconds
• ARC: 18 seconds

It turns out Raptor is about 3 times as fast as ARC (18/6 = 3), which is about twice as fast as RAP (41/18 ≈ 2.3).

A more realistic benchmark, doing only the parsing, no serialising:
• Raptor: 2.7 seconds
• RAP: 21 seconds
• ARC: 14 seconds

Here Raptor is about 5 times as fast as ARC (14/2.7 ≈ 5.2) and about 8 times as fast as RAP (21/2.7 ≈ 7.8).

6.2. Query efficiency

The RDF Data Access Working Group (DAWG) uses a test-driven process and provides an easy-to-use suite of test cases that SPARQL query language implementors can use to evaluate and report on their implementations. The test manifest files define three vocabularies to express tests:
1. manifest vocabulary
2. query-evaluation test vocabulary
3. DAWG test approval vocabulary

Test group                                                                  RDF API for PHP   ARC    Raptor/Rasqal
ASK query form                                                              0                 0.41   0.53
Basic graph pattern matching. Triple pattern constructs. Blank node scoping 0.41              0.64   0.67
Compliance with SPARQL Grammar                                              0.73              0.99   1
CONSTRUCT query form                                                        0                 1      1
Core bits of SPARQL. Prefixed names, variables, blank nodes, graph terms    0.41              0.69   0.68
FILTER clauses and expressions                                              0.37              0.57   0.55
OPTIONAL pattern matching                                                   0.28              0.36   0.8
RDF datasets. Default and named graphs. GRAPH keyword                       0                 0.45   0.14
SELECT query form                                                           0.41              0.69   0.71
Sorting (ORDER BY) and slicing (LIMIT, OFFSET)                              0.5               1      0.9
UNION pattern matching                                                      0.25              0      0.46
Table 2. SPARQL Implementation

The interface between the programming language (PHP in our case) and the database query language (SPARQL) is an application programming interface (API). A few PHP-based open-source RDF APIs are available, and RAP (RDF API for PHP) is one of the most mature among them. One limitation of RAP is its SPARQL engine, which is built to work on any RDF model that can be loaded into memory. Querying a database with SPARQL therefore requires loading the complete database into memory and executing the query on it. While this works well with a few dozen RDF triples, it cannot be used for databases with millions of triples: main memory is one limitation and execution time is another (code implemented in a scripting language such as PHP is slower than a pure C implementation of the same code). Here Raptor wins because of the speed of Rasqal.

7. License

The Raptor RDF Library is licensed under the following licenses as alternatives; if one license is selected, that one alone applies.
1. The GNU Lesser General Public License (LGPL) Version 2.1. See http://www.gnu.org/copyleft/lesser.html or COPYING.LIB for the full license text. Copyright (C) 2000-2005 University of Bristol. All Rights Reserved.
2. The Apache License V2.0. See LICENSE-2.0.txt for the full license text. Copyright (C) 2000-2005 University of Bristol.

RAP can be used under the terms of the GNU Lesser General Public License (LGPL).

ARC is available under the W3C Software License and, since 2009, also under the GPL (versions 2 and 3).

8. Conclusions

A common complaint about the RDF/XML syntax in the XML-literate communities is the lack of a simple PHP parser. While Raptor does the job perfectly, it almost demands root access to install and does not run on the Windows platform without cygwin. ARC tends to be a bit faster and easier to use than RAP, but it lacks some features. The best alternative for PHP is RAP, but it is often claimed to be too slow, or developers have problems understanding and using its API.
9. References

1. http://www.seasr.org/wp-content/plugins/meandre/rdfapi-php/doc/
2. http://librdf.org/raptor/
3. http://arc.semsol.org/
4. http://www.w3.org/RDF/
5. http://www.w3.org/TR/rdf-testcases
6. http://www.w3.org/2003/11/results/rdf-core-tests
7. http://www.w3.org/2001/sw/DataAccess/tests/implementations.html