Web of data refers to interconnected, structured datasets distributed all over the world. It enables machines to traverse the links between these datasets without noise. The noise referred to here results from web sites mixing metadata and actual data in the same pages.
The figure illustrates the different layers of the Semantic Web stack. This lecture covers the querying, data interchange, syntax and identifier layers. Overall, the figure shows the standardized technologies that form the Semantic Web. Identifiers are used to identify Semantic Web resources: URIs identify resources in a dereferenceable way. In the syntax layer, Semantic Web resources are represented in different formats, e.g. XML. In the data interchange layer, RDF is the language used to represent Semantic Web resources, and several serialization formats for RDF are available, e.g. RDF/XML and Turtle. The querying layer provides methods to retrieve Semantic Web resources; SPARQL is the most common query language.
An RDF triple contains three components:
- the subject, which is an RDF URI reference or a blank node;
- the predicate, which is an RDF URI reference;
- the object, which is an RDF URI reference, a literal or a blank node.
An RDF triple is conventionally written in the order subject, predicate, object. The predicate is also known as the property of the triple. From Wikipedia: the subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. For example, one way to represent the notion "The sky has the color blue" in RDF is as the triple: a subject denoting "the sky", a predicate denoting "has the color", and an object denoting "blue".
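The positional rules above can be made concrete in a few lines of Java. This is a minimal sketch, not any library's API. The names `Kind`, `Term` and `Triple` are hypothetical, and terms are reduced to a kind plus a string. The constructor rejects a literal in subject position and anything other than a URI reference in predicate position; any kind of term is accepted as object.

```java
import java.util.Objects;

/** The three kinds of RDF term that can appear in a triple (hypothetical model). */
enum Kind { URI, BLANK, LITERAL }

/** A minimal RDF term: its kind plus its lexical value. */
record Term(Kind kind, String value) {
    static Term uri(String u) { return new Term(Kind.URI, u); }
    static Term blank(String id) { return new Term(Kind.BLANK, id); }
    static Term literal(String lex) { return new Term(Kind.LITERAL, lex); }
}

/** A triple that enforces the positional rules of the RDF abstract syntax. */
record Triple(Term subject, Term predicate, Term object) {
    Triple {
        Objects.requireNonNull(subject);
        Objects.requireNonNull(predicate);
        Objects.requireNonNull(object);
        // Subject: URI reference or blank node, never a literal.
        if (subject.kind() == Kind.LITERAL)
            throw new IllegalArgumentException("subject cannot be a literal");
        // Predicate: URI reference only.
        if (predicate.kind() != Kind.URI)
            throw new IllegalArgumentException("predicate must be a URI reference");
        // Object: URI reference, literal or blank node are all allowed.
    }
}
```

For example, `new Triple(Term.uri("http://example.org/sky"), Term.uri("http://example.org/hasColor"), Term.literal("blue"))` encodes the sky example; the example.org URIs are placeholders. Passing `Term.literal("sky")` as the subject throws an `IllegalArgumentException`.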
An XML model can be used to store triple-like data by rewriting the triples into simple three-part XML element structures and then using existing XML query systems. However, the XML data model is a tree-like structure with elements and attributes as different facets, whereas RDF data forms a directed labeled graph, which may contain cycles and has no inherent hierarchical structure.
Storing and querying semantic data through XML databases and XML query languages would not work, since:
- only simple manipulations can be handled through XML query languages;
- RDF Schema processing and inference are not possible;
- the standard RDF/XML mapping is unsuitable, since multiple XML serializations are possible for the same RDF graph, making retrieval complex.
In the monolithic approach, two tables store the data: the Triples table and the Resources table. The Resources table stores only the URIs and the identifiers associated with them. For each triple, the Triples table stores one reference for the subject and one for the predicate. If the object is also a URI, it is represented with a reference as well. These references are used to fetch the corresponding URIs from the Resources table. If the object is not a URI, i.e. it is a literal, its value is stored directly in the Triples table. However, collecting all data within two tables does not scale and hampers complex operations on the data, e.g. reasoning and querying.
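The two-table layout can be sketched in memory as follows. The class and method names are hypothetical. The `Row` fields mirror the `subid`, `predid`, `objid` and `objvalue` columns of the monolithic-approach slide, with `objId` left null when the object is a literal. Every answer has to be resolved back through the Resources table, which is where the repeated joins of a database implementation would come from.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory version of the monolithic Resources + Triples tables. */
class MonolithicStore {
    /** One row of the Triples table; objId is null when the object is a literal. */
    record Row(int subId, int predId, Integer objId, String objValue) {}

    private final Map<Integer, String> resources = new HashMap<>(); // Resources table: id -> uri
    private final Map<String, Integer> idsByUri = new HashMap<>();  // reverse lookup: uri -> id
    private final List<Row> triples = new ArrayList<>();           // Triples table

    /** Returns the id of a URI, inserting it into the Resources table if it is new. */
    int idOf(String uri) {
        return idsByUri.computeIfAbsent(uri, u -> {
            int id = resources.size() + 1;
            resources.put(id, u);
            return id;
        });
    }

    /** Object is a URI: store a reference into the Resources table. */
    void addUriTriple(String s, String p, String o) {
        triples.add(new Row(idOf(s), idOf(p), idOf(o), null));
    }

    /** Object is a literal: store its value directly in the Triples table. */
    void addLiteralTriple(String s, String p, String literal) {
        triples.add(new Row(idOf(s), idOf(p), null, literal));
    }

    /** Objects of (s, p, ?o), resolving object ids back to URIs via the Resources table. */
    List<String> objects(String s, String p) {
        Integer sid = idsByUri.get(s), pid = idsByUri.get(p);
        List<String> out = new ArrayList<>();
        if (sid == null || pid == null) return out;
        for (Row r : triples) {
            if (r.subId() == sid && r.predId() == pid) {
                out.add(r.objId() != null ? resources.get(r.objId()) : r.objValue());
            }
        }
        return out;
    }

    int resourceCount() { return resources.size(); }
    int tripleCount() { return triples.size(); }
}
```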
Overall architecture of RDFSuite. It separates the logical and physical levels of the data by allowing queries over the stored semantic data through a high-level query language (RQL). For storage, RDFSuite uses an ORDBMS. Resources are loaded into the system by exploiting the available RDF Schema knowledge, and the database representation can be customized according to the schemas employed.
- A non-monolithic approach is used: separate tables are created to store classes.
- Indices are constructed on attributes such as the URI, source and target of the created tables, in order to speed up joins and the selection of specific tuples.
An example database structure formed from an RDF schema. The core schema is represented by four tables, namely Class, Property, SubClass and SubProperty. This approach is more flexible than the monolithic approach, because the physical representation of the data in the underlying database can be customized.
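Here is a rough sketch of the schema-specific layout. All names are hypothetical, and in-memory maps stand in for database tables. Each class gets its own instance table, and a SubClass table records the hierarchy. Asking for all instances of a class then means reading its own table plus the tables of its transitive subclasses. This only illustrates the idea; it is not RDFSuite's code, and the Property and SubProperty tables are omitted.

```java
import java.util.*;

/** Hypothetical sketch of schema-specific storage: one instance table per class. */
class SchemaSpecificStore {
    private final Map<String, Set<String>> instanceTables = new HashMap<>(); // class -> instance URIs
    private final Map<String, Set<String>> subClassTable = new HashMap<>();  // superclass -> direct subclasses

    /** Creating a class creates its (empty) instance table. */
    void addClass(String cls) { instanceTables.putIfAbsent(cls, new LinkedHashSet<>()); }

    /** Records sub rdfs:subClassOf sup in the SubClass table. */
    void addSubClass(String sub, String sup) {
        addClass(sub);
        addClass(sup);
        subClassTable.computeIfAbsent(sup, k -> new LinkedHashSet<>()).add(sub);
    }

    /** Inserts an instance into the table of its (most specific) class. */
    void addInstance(String cls, String uri) {
        addClass(cls);
        instanceTables.get(cls).add(uri);
    }

    /** All instances of cls, including those stored in the tables of its subclasses. */
    Set<String> instancesOf(String cls) {
        Set<String> result = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>(List.of(cls));
        Set<String> seen = new HashSet<>();
        while (!todo.isEmpty()) {
            String c = todo.pop();
            if (!seen.add(c)) continue; // guard against cycles in the hierarchy
            result.addAll(instanceTables.getOrDefault(c, Set.of()));
            todo.addAll(subClassTable.getOrDefault(c, Set.of()));
        }
        return result;
    }
}
```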
The most prominent feature of Sesame is that it offers an Application Programming Interface, the Storage And Inference Layer (SAIL) API, on top of the actual data storage. This makes it possible to implement the interface on top of different repositories. The other components are clients of the SAIL API.
Difficult to add tables in PostgreSQL: when a new subClassOf relation is added between two existing classes, the complete class hierarchy starting from the subclass needs to be broken down and rebuilt, because subtable relations cannot be added to an existing table. Subtable relations have to be specified when a table is created, and once created they are fixed.
Jena provides a simple, minimalist view of the RDF graph that exposes the data as triples. Users interact with the abstract Model, and the Model interface delegates high-level operations to low-level operations on the triples stored in an RDF graph. Jena2 storage provides three graph operations, namely add, delete and find.
As already mentioned, the persistence layer presents a Graph interface to the higher levels of Jena. Each logical graph is implemented using an ordered list of specialized graphs. An operation on the entire logical graph, such as add, delete or find, is processed by invoking add, delete or find on each specialized graph.
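The delegation scheme can be sketched with a hypothetical `SpecializedGraph` interface. The names are illustrative and do not match Jena's actual classes. Following the note that a statement is stored either in a property table or in the triple store but not both, this sketch assumes that `add` is offered to each specialized graph in order until one accepts it. `delete` and `find` visit every graph, and `find` merges the results.

```java
import java.util.*;
import java.util.function.Predicate;

/** A statement as three plain strings (illustrative only). */
record Stmt(String s, String p, String o) {}

/** Hypothetical interface for one physical store behind the logical graph. */
interface SpecializedGraph {
    boolean add(Stmt t);               // true if this graph took ownership of t
    void delete(Stmt t);
    List<Stmt> find(Predicate<Stmt> pattern);
}

/** A specialized graph that accepts only statements with the given predicates. */
class PredicateFilteredGraph implements SpecializedGraph {
    private final Set<String> accepted; // null means "accept everything" (catch-all triple store)
    private final List<Stmt> rows = new ArrayList<>();

    PredicateFilteredGraph(Set<String> accepted) { this.accepted = accepted; }

    public boolean add(Stmt t) {
        if (accepted != null && !accepted.contains(t.p())) return false;
        rows.add(t);
        return true;
    }

    public void delete(Stmt t) { rows.remove(t); }

    public List<Stmt> find(Predicate<Stmt> pattern) {
        List<Stmt> out = new ArrayList<>();
        for (Stmt t : rows) if (pattern.test(t)) out.add(t);
        return out;
    }

    int size() { return rows.size(); }
}

/** The logical graph: an ordered list of specialized graphs. */
class LogicalGraph {
    private final List<SpecializedGraph> parts;

    LogicalGraph(List<SpecializedGraph> parts) { this.parts = parts; }

    void add(Stmt t) {
        for (SpecializedGraph g : parts) if (g.add(t)) return; // first graph that accepts wins
        throw new IllegalStateException("no specialized graph accepted " + t);
    }

    void delete(Stmt t) { for (SpecializedGraph g : parts) g.delete(t); }

    List<Stmt> find(Predicate<Stmt> pattern) {
        List<Stmt> out = new ArrayList<>();
        for (SpecializedGraph g : parts) out.addAll(g.find(pattern));
        return out;
    }
}
```

Placing a property-table graph before a catch-all triple-store graph reproduces the "one or the other" storage rule. Statements with a clustered predicate land in the first graph, and everything else falls through to the second.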
Jena2 uses denormalized schemas, because with normalized schemas every find operation required multiple joins between the Resources table and the Triples table. In denormalized schemas, URIs and simple literal values are stored directly in the statement tables; both variants are exemplified on the next slide. There are also multiple statement tables, because a single statement table is not scalable for large data sets and cannot benefit from locality among subjects and predicates. Jena2 also uses property tables, which store patterns of RDF statements. They are database tables independent of the actual triple store framework. Each statement is stored either in the triple store or in a property table, but not in both.
Let's compare the triple store and an application-specific schema with an example. Suppose we want to store information about people, each of whom has properties such as name and age. The triple-store-only approach needs to store 10 records. For an application-specific schema, if we know that most people have a name, an age and a gender, we group these three properties into one table, called a property table. The remaining properties, e.g. multi-valued ones, are still stored in the triple store. In this way, the number of records to be stored drops from 10 to 7. Also, suppose users always query people's names by their age. With the property table, once the age condition is satisfied, the name value can be retrieved immediately from the same row. In the triple store approach, we first need to find the subject with the given value of the age property, and then use that subject to look up the name value, which is less efficient.
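The record count above can be reproduced with a short sketch over the person data shown on the property-table slide. The class and method names are hypothetical, and the choice of name, age and gender as clustered properties is taken from the example. Clustered properties go into one property-table row per subject, and every other statement stays in the residual triple store.

```java
import java.util.*;

/** Hypothetical split of statements into a property table plus a residual triple store. */
class PropertyTableDemo {
    static final List<String> CLUSTERED = List.of("name", "age", "gender");

    /** Data from the Person example: {subject, property, object}. */
    static final String[][] TRIPLES = {
        {"person1", "name", "Alice"}, {"person1", "age", "32"},
        {"person1", "twinOf", "person2"}, {"person1", "faxPhone", "x1234"},
        {"person1", "adminPh", "x5678"}, {"person2", "name", "Bob"},
        {"person2", "age", "35"}, {"person2", "adopteeOf", "person6"},
        {"person2", "friendOf", "person8"}, {"person2", "gender", "male"},
    };

    /** Property table: subject -> (property -> value), i.e. one row per subject. */
    static Map<String, Map<String, String>> propertyTable() {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        for (String[] t : TRIPLES)
            if (CLUSTERED.contains(t[1]))
                rows.computeIfAbsent(t[0], k -> new HashMap<>()).put(t[1], t[2]);
        return rows;
    }

    /** Statements that do not fit the property table stay as plain triples. */
    static List<String[]> residualTriples() {
        List<String[]> rest = new ArrayList<>();
        for (String[] t : TRIPLES) if (!CLUSTERED.contains(t[1])) rest.add(t);
        return rest;
    }

    /** "Name by age": one row lookup instead of two dependent triple lookups. */
    static Optional<String> nameByAge(String age) {
        return propertyTable().values().stream()
            .filter(row -> age.equals(row.get("age")))
            .map(row -> row.get("name"))
            .filter(Objects::nonNull)
            .findFirst();
    }

    public static void main(String[] args) {
        int records = propertyTable().size() + residualTriples().size();
        System.out.println(TRIPLES.length + " -> " + records); // 10 -> 7
        System.out.println(nameByAge("35").orElse("?"));        // Bob
    }
}
```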
SDB provides scalable storage and query of RDF datasets using conventional SQL databases, for use in standalone applications, J2EE and other application frameworks.
All quads are kept in one table, which may have different indexing depending on the expected query load. Triples should be locatable given the S or a value of O. Two covering indices are used: G, S, P, O and O, G, P, S. Any triple store that supports named graphs is more than likely a quad store; many triple stores are in fact quad stores because of the need to maintain RDF data provenance within the data.
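Why two covering indices suffice can be sketched with sorted maps. The names are hypothetical, and the composite keys simply join the four columns with a NUL separator; a real store would typically use integer term ids and on-disk B-trees instead. Because each index is sorted in its own column order, quads that share a leading prefix are contiguous. A lookup by G and S is therefore a range scan on GSPO, and a lookup by O is a range scan on OGPS.

```java
import java.util.*;

/** A quad: graph, subject, predicate, object. */
record Quad(String g, String s, String p, String o) {}

/** Hypothetical quad table with two covering indices, GSPO and OGPS. */
class QuadStore {
    private static final String SEP = "\u0000"; // column separator inside composite keys
    private final NavigableMap<String, Quad> gspo = new TreeMap<>();
    private final NavigableMap<String, Quad> ogps = new TreeMap<>();

    /** Every quad is written to both indices, each in its own column order. */
    void add(Quad q) {
        gspo.put(String.join(SEP, q.g(), q.s(), q.p(), q.o()), q);
        ogps.put(String.join(SEP, q.o(), q.g(), q.p(), q.s()), q);
    }

    /** Range scan over all keys that start with prefix. */
    private static Collection<Quad> scan(NavigableMap<String, Quad> index, String prefix) {
        return index.subMap(prefix, true, prefix + "\uffff", false).values();
    }

    /** Uses the GSPO index: all quads with graph g and subject s. */
    Collection<Quad> byGraphAndSubject(String g, String s) {
        return scan(gspo, g + SEP + s + SEP);
    }

    /** Uses the OGPS index: all quads whose object is o, in any graph. */
    Collection<Quad> byObject(String o) {
        return scan(ogps, o + SEP);
    }
}
```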
SPARQL is the de facto query language for expressing queries over RDF data sources. It allows querying RDF graph patterns together with their conjunctions and disjunctions. Other languages are more proprietary and are used in narrower scopes. Several open source projects, such as Apache Clerezza and Jena, provide knowledge management functionality: they offer APIs for storing and accessing semantic data. As organizations publish their data in RDF, opportunities arise to interlink related content. As a result, once users obtain a resource from the linked data cloud, they can traverse related data through its links.
Different organizations provide querying services over their RDF data through SPARQL endpoints, which are machine-friendly interfaces to the underlying knowledge bases. See http://www.w3.org/wiki/SparqlEndpoints for a list of SPARQL endpoints.
This figure presents the four design principles of Linked Data as a stack-like architecture. URIs are used as names of resources on the Web. HTTP URIs are used so that others can access the actual data identified by a URI. RDF is the actual representation of the resources identified by URIs, and, lastly, SPARQL is used to obtain the desired information from the RDF data.
SWEO: Semantic Web Education and Outreach. This was an interest group within the W3C. The SWEO Interest Group was established to develop strategies and materials to increase awareness among the Web community of the need for and benefits of the Semantic Web, and to educate the Web community about related solutions and technologies.
Lecture: Semantic Data Access
Semantic Data Access
Semantic CMS Community
Lecturer
Organization
Date of presentation
Co-funded by the European Union
Copyright IKS Consortium
Part I: Foundations
(1) Introduction of Content Management
(2) Foundations of Semantic Web Technologies
Part II: Semantic Content Management
(3) Knowledge Interaction and Presentation
(4) Knowledge Representation and Reasoning
(5) Semantic Lifting
(6) Storing and Accessing Semantic Data
Part III: Methodologies
(7) Requirements Engineering for Semantic CMS
(8) Designing Semantic CMS
(9) Semantifying your CMS
(10) Designing Interactive Ubiquitous IS
www.iks-project.eu Copyright IKS Consortium
What is this Lecture about?
(Part II: Semantic Content Management: (3) Knowledge Interaction and Presentation, (4) Knowledge Representation and Reasoning, (5) Semantic Lifting, (6) Storing and Accessing Semantic Data)
We have learned ...
- ... which languages can be used to model knowledge.
- ... how to extract knowledge from content in an automatic way (semantic lifting).
We need a way ...
- ... to store the extracted knowledge technically in an accessible way.
Outline
- Semantic Data: Semantic Web, RDF
- Semantic Data Storage: Triple Stores
- Semantic Data Access: SPARQL, RQL, API Calls
Semantic Data
- Stands for machine-understandable information
- Allows computers to figure out the data without user intervention
- Allows computers to act intelligently without programming for each task
Semantic Data
- Provides infrastructure to get practical results
- Applications find further information by following previous relations (e.g. Eiffel Tower -> Paris -> France)
- Allows reasoning capabilities
- Enables extraction of related information that is not directly linked
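The Eiffel Tower example can be sketched as traversal over links. Assuming a hypothetical `locatedIn` property, repeatedly following it from the Eiffel Tower reaches Paris and then France, even though no triple links the Eiffel Tower to France directly. This is plain graph traversal. It is a simplified stand-in for what an OWL reasoner would infer if `locatedIn` were declared an `owl:TransitiveProperty`.

```java
import java.util.*;

/** Follows one property transitively over a tiny in-memory triple list. */
class TransitiveLinks {
    /** All nodes reachable from start by following property one or more times. */
    static Set<String> reachable(List<String[]> triples, String start, String property) {
        Set<String> found = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>(List.of(start));
        while (!frontier.isEmpty()) {
            String current = frontier.poll();
            for (String[] t : triples) {
                // t = {subject, predicate, object}
                if (t[0].equals(current) && t[1].equals(property) && found.add(t[2])) {
                    frontier.add(t[2]); // expand each node only the first time it is found
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String[]> triples = List.of(
            new String[] {"EiffelTower", "locatedIn", "Paris"},
            new String[] {"Paris", "locatedIn", "France"});
        System.out.println(reachable(triples, "EiffelTower", "locatedIn")); // [Paris, France]
    }
}
```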
Semantic Web
A classical generic description: "Web of data"
Extends the World Wide Web by encouraging:
- a common language for representing data, transformable to/from disparate sources such as relational databases, XML, etc. (RDF)
- common reusable data models to represent data from different domains in common terms (RDFS, OWL, etc.)
- rules that enable applications to reason over the information (SWRL)
Semantic Web Layer Cake
[Figure: Semantic Web Layer Cake. Image source: http://www.w3.org/2007/03/layerCake.svg]
Semantic Web
- Many organizations publish their data in different domains: media, geographic, government, …
- The whole set contains approximately 30 billion triples
- One of the largest collections is DBpedia, a semantified version of Wikipedia
- Example: obtain the cities of China that have a population over 20 million
- Efficient storage and querying of semantic data are needed
Representation of Semantic Data
RDF
- The common data format
- An abstract model with several serialization formats
- Consists of statements, referred to as triples, of the form (subject, predicate, object), where:
  - Subject: any resource identifier
  - Predicate: a resource identifier of a property
  - Object: either a resource identifier or a literal value
Storing Semantic Data
- Triple collections need specialized designs
- Two modalities:
  - Relational databases
  - Triple stores: mostly used for storage, with lots of implementations; they can also be RDB-based
Triple Store
- A purpose-built database for the storage and retrieval of RDF data
- Optimized for adding, removing and querying triples
- Each triple in the triple store has the form (subject, predicate, object)
Considering XML Databases
- XML databases are existing storage systems for semi-structured data
- Idea: transform RDF to XML and store it in XML databases
- Yet the XML data model is not the same as the semantic data model:
  - the XML data model is a tree-like structure
  - RDF data is represented as a graph without a hierarchy
Considering XML Databases
XML databases are not suitable for storing and querying RDF:
- Only simple manipulations can be handled through XML query languages
- RDF Schema processing and inference are not possible
- The standard RDF/XML mapping is unsuitable
Monolithic approach for DB-based triple stores
- Generic representation for all RDF schemas
- Only two tables are used: the Resources table and the Triples table
Monolithic approach for DB-based triple stores
Triples table:
predid | subid | objid | objvalue
6 | 2 | 1 | -
5 | 3 | 7 | -
5 | 1 | 8 | -
5 | 9 | 2 | -
3 | 9 | - | Sunscale
Resources table:
id | uri
1 | http://www.iks.og/topics.rdfs#Hotel
2 | http://www.iks.og/topics.rdfs#HotelDirections
3 | http://www.oclc.org/dublincore.rdfs#title
4 | http://www.iks.og/schema.rdf#Ext.Resource
5 | http://www.w3.org/1999/02/22-rdf-syntax-ns#type
6 | http://www.w3.org/2000/01/rdf-schema#subClassOf
7 | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
8 | http://www.w3.org/2000/01/rdf-schema#Class
9 | rl
Triple Stores
Can be categorized into three categories:
- In-memory triple stores: used for certain operations such as benchmarking, caching, etc.
- Native triple stores: provide their own implementations (Virtuoso, Mulgara, AllegroGraph, …)
- Non-memory, non-native triple stores: built on third-party databases (Jena SDB, Kaon, …)
Functionalities provided by Triple Stores
- RDBMS support
- General RDF model access
- Query language support in the store, such as RQL and SPARQL
Some stores provide:
- Provenance: tracking of who said what
- APIs for accessing the triple store over a network
Very few stores provide:
- Full-text search
- Inference and rule languages
Example Triple Store implementations
- RDFSuite (Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases. SemWeb, 2001): based on an ORDBMS model
- Sesame (http://www.openrdf.org/): relational databases (MySQL, PostgreSQL, Oracle)
- Jena (http://www.hpl.hp.com/semweb/jena2.htm): relational databases (MySQL, PostgreSQL, Oracle)
- Virtuoso (http://virtuoso.openlinksw.com/): native RDF quad storage (physical quads)
How triples are stored and accessed in RDFSuite
- Separate tables are created to store resources: properties, subclasses, subproperties and instances
- Indices on attributes such as URI, source and target
- Querying is possible through RQL
How triples are stored and accessed in RDFSuite
[Figure from *]
*Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases. SemWeb, 2001
Sesame Architecture
- DBMS-independent API for accessing triple repositories
- SAIL API: a set of Java interfaces between the other modules and the repository, abstracting from the actual storage mechanism
- Query module: RQL support
- Different ways to communicate with clients, through protocol handlers
*Jeen Broekstra, Arjohn Kampman and Frank van Harmelen. Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema. Proceedings of the First International Semantic Web Conference, 2002
SAIL API over PostgreSQL
- PostgreSQL: object-relational DBMS
- Supports subtable relations between its tables, used to provide RDF Schema class and property subsumption
- Individuals are represented in separate tables created for resources
- Difficult to add tables
SAIL API over MySQL
- MySQL: the database schema does not change when the RDFS changes
- An advantage where the RDFS is unstable
Jena2 Architecture
*Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds. Efficient RDF Storage and Retrieval in Jena2. Proceedings of SWDB03, the First International Workshop on Semantic Web and Databases
Normalized vs Denormalized Tables
Property Tables
Triple Store Only:
Subject | Property | Object
person1 | name | Alice
person1 | age | 32
person1 | twinOf | person2
person1 | faxPhone | x1234
person1 | adminPh | x5678
person2 | name | Bob
person2 | age | 35
person2 | adopteeOf | person6
person2 | friendOf | person8
person2 | gender | male
Person Property Table:
ID | name | age | gender
p1 | Alice | 32 | -
p2 | Bob | 35 | male
Triple Store:
Subject | Property | Object
person1 | twinOf | person2
person1 | faxPhone | x1234
person1 | adminPh | x5678
person2 | adopteeOf | person6
person2 | friendOf | person8
Jena Persistence Options: SDB
- Scalable storage and query for RDF
- Specifically designed for SPARQL support
- Supports MySQL, PostgreSQL, Oracle 11g, Microsoft SQL Server and IBM DB2
- Scales to graphs of 100 million triples
Jena Persistence Options: TDB
- Provides large-scale storage and query of RDF datasets using a pure Java engine
- Supports SPARQL
- A non-transactional, faster database solution for use by a single system
- Scales well beyond SDB and is simpler to set up
Virtuoso
- General-purpose RDBMS with extensive RDF adaptations
- RDF data is stored as RDF quads, i.e. (graph, subject, predicate, object) tuples, so RDF with named graphs is supported
- The columns are G for graph, S for subject, P for predicate and O for object
Querying Semantic Data
Semantic data can be queried from triple stores by:
- Various query languages: SPARQL (different endpoints provided), RQL, RDQL, SeRQL, …
- API calls: through proprietary APIs of different projects
- Linked Data
SPARQL
- Is an RDF query language
- Standardized by the W3C
- Similar in concept to SQL for databases, and syntactically resembles SQL
- Queries RDF graphs instead of databases
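The core of SPARQL, matching a conjunction of triple patterns that contain variables against a graph, can be sketched in a few lines. This naive matcher is hypothetical and purely illustrative: terms starting with `?` are variables, and patterns are joined with nested loops. Real SPARQL engines use indices and query optimization.

```java
import java.util.*;

/** Naive evaluation of a basic graph pattern (a conjunction of triple patterns). */
class BasicGraphPattern {
    /** Extends each partial solution with every triple that matches the next pattern. */
    static List<Map<String, String>> match(List<String[]> graph, List<String[]> patterns) {
        List<Map<String, String>> solutions = new ArrayList<>();
        solutions.add(new HashMap<>()); // start with one empty binding
        for (String[] pattern : patterns) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> binding : solutions) {
                for (String[] triple : graph) {
                    Map<String, String> extended = unify(pattern, triple, binding);
                    if (extended != null) next.add(extended);
                }
            }
            solutions = next; // conjunction: only bindings that satisfy every pattern survive
        }
        return solutions;
    }

    /** Returns binding extended so that pattern equals triple, or null if impossible. */
    private static Map<String, String> unify(String[] pattern, String[] triple,
                                             Map<String, String> binding) {
        Map<String, String> result = new HashMap<>(binding);
        for (int i = 0; i < 3; i++) {
            String term = pattern[i];
            if (term.startsWith("?")) {
                String bound = result.putIfAbsent(term, triple[i]);
                if (bound != null && !bound.equals(triple[i])) return null; // conflicting value
            } else if (!term.equals(triple[i])) {
                return null; // constant does not match
            }
        }
        return result;
    }
}
```

The two patterns `{"?p", "age", "35"}` and `{"?p", "name", "?name"}` correspond roughly to `SELECT ?name WHERE { ?p :age "35" . ?p :name ?name }`, with `:age` and `:name` as placeholder properties. Against the person data from the property-table example, the single solution binds `?name` to `Bob`.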
SPARQL Endpoints
- Provide functionality to query the knowledge base via the SPARQL language
- Accept queries and return results through the HTTP protocol
- Query results can be in different formats, such as RDF, XML, HTML, JSON and CSV
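An endpoint is an ordinary HTTP service, so querying it needs nothing beyond the JDK's `java.net.http` client (Java 11+). In this sketch the query is URL-encoded into the `query` parameter of a GET request, as the W3C SPARQL Protocol specifies, and the `Accept` header asks for SPARQL JSON results. The endpoint is the DBpedia one listed later in this lecture. The query reuses the "cities of China" example and assumes the DBpedia ontology terms `dbo:City`, `dbo:country`, `dbo:populationTotal` and `dbr:China`, so what it returns depends on the live data. `build` only constructs the request; `main` sends it and needs network access.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Builds (and, in main, sends) a SPARQL Protocol GET request. */
class SparqlEndpointRequest {
    static final String ENDPOINT = "http://dbpedia.org/sparql";

    // Assumed DBpedia ontology terms; results depend on the live data.
    static final String QUERY =
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        + "PREFIX dbr: <http://dbpedia.org/resource/>\n"
        + "SELECT ?city ?population WHERE {\n"
        + "  ?city a dbo:City ; dbo:country dbr:China ; dbo:populationTotal ?population .\n"
        + "  FILTER (?population > 20000000)\n"
        + "}";

    static HttpRequest build(String endpoint, String query) {
        // URLEncoder uses form encoding ("+" for spaces); switch to %20 for a plain URI query.
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8).replace("+", "%20");
        return HttpRequest.newBuilder(URI.create(endpoint + "?query=" + encoded))
            .header("Accept", "application/sparql-results+json") // ask for SPARQL JSON results
            .GET()
            .build();
    }

    public static void main(String[] args) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(build(ENDPOINT, QUERY), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```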
Semantic Data Access With API Calls
Open source projects provide APIs to manipulate RDF data:
- Jena
- Apache Clerezza
- Sesame
- JRDF
Jena
Jena provides a rich API to manipulate the RDF stored in the underlying triple store:
- Model to represent graphs
- CRUD methods for triples
- Querying methods for existing resources
See the next slide for the code snippet…
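Jena Code Snippet

    // Apache Jena 3+ package names (Jena 2 used com.hp.hpl.jena.*)
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.rdf.model.Resource;
    import org.apache.jena.vocabulary.VCARD;

    public class JenaExample {
        public static void main(String[] args) {
            String personURI = "http://somewhere/JohnSmith";
            String givenName = "John";
            String familyName = "Smith";
            String fullName = givenName + " " + familyName;

            // create an empty Model which represents an RDF graph
            Model model = ModelFactory.createDefaultModel();

            // create the resource which will produce the triples in the next slide
            Resource johnSmith = model.createResource(personURI)
                .addProperty(VCARD.FN, fullName)
                .addProperty(VCARD.N, model.createResource() // blank node
                    .addProperty(VCARD.Given, givenName)
                    .addProperty(VCARD.Family, familyName));

            // print the resulting triples
            model.write(System.out, "N-TRIPLES");
        }
    }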
Jena
Triples created by the code snippet on the previous slide:
(<http://somewhere/JohnSmith>, VCARD.FN, "John Smith")
(<http://somewhere/JohnSmith>, VCARD.N, _)
(_, VCARD.Given, "John")
(_, VCARD.Family, "Smith")
Note that the _ symbol represents a blank node.
Apache Clerezza
- Provides an API independent of the different triple stores it supports
- Its API provides a model to represent RDF graphs and to manipulate those graphs
- Also provides a SPARQL endpoint to query the stored knowledge
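Apache Clerezza Code Snippet
Simple code snippet adding two triples to the graph:

    // legacy org.apache.clerezza.rdf.core API
    import org.apache.clerezza.rdf.core.LiteralFactory;
    import org.apache.clerezza.rdf.core.MGraph;
    import org.apache.clerezza.rdf.core.UriRef;
    import org.apache.clerezza.rdf.core.impl.SimpleMGraph;
    import org.apache.clerezza.rdf.core.impl.TripleImpl;

    public class ClerezzaExample {
        public static void main(String[] args) {
            String base = "http://www.example.org#";
            UriRef johnSmith = new UriRef(base + "JohnSmith");
            MGraph g = new SimpleMGraph();
            // rdf:type foaf:Person, with the prefixed names written out as full URIs
            g.add(new TripleImpl(johnSmith,
                    new UriRef("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
                    new UriRef("http://xmlns.com/foaf/0.1/Person")));
            // vCard full name as a typed literal
            g.add(new TripleImpl(johnSmith,
                    new UriRef("http://www.w3.org/2001/vcard-rdf/3.0#FN"),
                    LiteralFactory.getInstance().createTypedLiteral("John")));
        }
    }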
Linked Data
- Interrelated datasets on the Web, published so that computers can explore them
- Has a standard format to be accessed and managed
- Enables integration of, and reasoning over, a huge amount of data on the Web
Linked Data
The four famous principles of Linked Data, as stated by Tim Berners-Lee:
- Use URIs as names for things
- Use HTTP URIs so that people can dereference those names and look up data
- When a URI is dereferenced, provide useful information in a standard format (RDF, SPARQL)
- Include links to other URIs so that related data can be discovered
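The second and third principles are commonly realized with HTTP content negotiation: the client dereferences the HTTP URI and states in the `Accept` header which RDF serialization it prefers. The sketch below uses only the JDK's `java.net.http` API. The DBpedia resource URI is an illustrative choice, and which formats a particular server actually honours is an assumption to verify per server. Redirects are followed because Linked Data servers often answer with `303 See Other`, pointing at a document that describes the thing.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Dereferences a Linked Data URI, asking for RDF rather than HTML. */
class DereferenceUri {
    static HttpRequest rdfRequest(String uri) {
        return HttpRequest.newBuilder(URI.create(uri))
            // Prefer Turtle, accept RDF/XML as a fallback (q-values rank the choices).
            .header("Accept", "text/turtle, application/rdf+xml;q=0.8")
            .GET()
            .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL) // follow e.g. 303 See Other
            .build();
        HttpResponse<String> response = client.send(
            rdfRequest("http://dbpedia.org/resource/Paris"), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.headers().firstValue("Content-Type").orElse("?"));
    }
}
```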
Linked Data
Linking Open Data Project
- Is a W3C SWEO project
- Aims to make data freely available to everyone
- Aims to publish open data sets as RDF and to set semantic relationships between them
- Serves information in a machine-readable format
- Enriches content
- Reduces duplication
- The number of linked datasets is increasing rapidly; a large number of datasets are already linked
Linked Datasets as of October 2008
Linked Datasets as of September 2010
Access Data In The Cloud
- Follow the RDF links representing the "things"
- SPARQL endpoints
- Ready-to-use software to discover linked data (see the next slide)
Linked Data Applications
Lots of applications on top of linked data:
- Tabulator
- Marbles
- OpenLink RDF Browser
- …
Just google "RDF crawlers" and "RDF browsers".
Also see the following link containing a number of linked data applications: http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData/Applications
Available SPARQL Endpoints
- http://dbpedia.org/sparql
- http://www4.wiwiss.fu-berlin.de/dblp/
- To find SPARQL endpoints that provide data about a certain URI, see http://void.rkbexplorer.com/endpoint-search/
- See also a list of live SPARQL endpoints: http://www.w3.org/wiki/SparqlEndpoints
References
- http://www.w3.org/TR/rdf-sparql-query
- http://jena.sourceforge.net/tutorial/RDF_API/index.html
- http://www.slideshare.net/ldodds/sparql-tutorial
- http://www.slideshare.net/shamod/a-hands-on-overview-of-the-semantic-web?src=related_normal&rel=1702851
- http://www.cambridgesemantics.com/2008/09/sparql-by-example
- http://linkeddata-specs.info/
- http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
- http://www.bioontology.org/wiki/images/6/6a/Triple_Stores.pdf
- Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases. SemWeb, 2001.
- Jeen Broekstra, Arjohn Kampman, Frank van Harmelen. Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema. Proceedings of the First International Semantic Web Conference, 2002.
- Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds. Efficient RDF Storage and Retrieval in Jena2. Proceedings of SWDB03, the First International Workshop on Semantic Web and Databases.
- http://jena.sourceforge.net/DB/index.html
- http://virtuoso.openlinksw.com/