1. Ontologies & Linked Open Data
A brief overview and some real-world applications
João Rocha da Silva
joaorosilva@gmail.com
December 2013
2. Contents
• Ontologies: the importance of semantics in the data storage and querying layer
• Popular ontologies: DCTerms, FOAF
• The Semantic Web in practice: Linked Open Data in the Facebook API and in DBpedia
• Relational vs Graph: differences
• The SPARQL Language: examples
• A non-relational database: OpenLink Virtuoso
4. The importance of semantics
• How does someone understand the meaning of the columns in a relational database?
• Reading a lot of documentation
• Hard to provide information to external systems
• Tailor-made web services required!
5. SAP (one of 78,826 tables and counting)
source: http://scn.sap.com/thread/1743542
7. Now imagine we want to have images of different kinds, with different attributes…
MediaWiki
source: http://upload.wikimedia.org/wikipedia/commons/thumb/4/42/MediaWiki_1.20_%2844edaa2%29_database_schema.svg/2500px-MediaWiki_1.20_%2844edaa2%29_database_schema.svg.png
8. The importance of semantics
• Building a query over such a system is complex
• Requires knowledge of its intricate and subtle aspects
• Some columns even contain flags for business logic processing (o_O)
• Bad design decisions = “spaghetti code”
11.
SELECT employee.id AS employee_id,
       engineer.id AS engineer_id,
       manager.id AS manager_id,
       employee.name AS employee_name,
       employee.type AS employee_type,
       engineer.engineer_info AS engineer_engineer_info,
       manager.manager_data AS manager_manager_data
FROM employee
LEFT OUTER JOIN engineer
    ON employee.id = engineer.id
LEFT OUTER JOIN manager
    ON employee.id = manager.id
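To see what this query returns, the sketch below runs it against an in-memory SQLite database using Python's standard `sqlite3` module. The table and column names come from the query above; the sample rows, the column types, and the added `ORDER BY` (only there to make the output order predictable) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, type TEXT);
    CREATE TABLE engineer (id INTEGER PRIMARY KEY REFERENCES employee(id),
                           engineer_info TEXT);
    CREATE TABLE manager  (id INTEGER PRIMARY KEY REFERENCES employee(id),
                           manager_data TEXT);
    -- Invented sample data: one engineer and one manager.
    INSERT INTO employee VALUES (1, 'Ana', 'engineer'), (2, 'Rui', 'manager');
    INSERT INTO engineer VALUES (1, 'backend');
    INSERT INTO manager  VALUES (2, 'team A');
""")

QUERY = """
    SELECT employee.id AS employee_id,
           engineer.id AS engineer_id,
           manager.id AS manager_id,
           employee.name AS employee_name,
           employee.type AS employee_type,
           engineer.engineer_info AS engineer_engineer_info,
           manager.manager_data AS manager_manager_data
    FROM employee
    LEFT OUTER JOIN engineer ON employee.id = engineer.id
    LEFT OUTER JOIN manager ON employee.id = manager.id
    ORDER BY employee.id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)  # columns of the "other" subtype come back as None (NULL)
```

Every employee row carries the columns of every subtype, most of them empty, and each new subtype means another table and another join.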
17. Getting all the students
SELECT ?uri ?attribute ?value
FROM <http://myorganization.com/data>
WHERE
{
    ?uri rdf:type up:Student .
    ?uri ?attribute ?value .
}
• Will fetch all the students, regardless of their type
• Will also return their attributes (“database columns”)
• Different types of students will have different attributes
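The behaviour of this query can be imitated in a few lines of plain Python. This is only a rough sketch of triple-pattern matching, not a SPARQL engine: the triples are invented, prefixed names are treated as plain strings, and `up:thesisTitle` is a hypothetical property used for illustration.

```python
# Invented sample data: each triple is (subject, predicate, object).
TRIPLES = [
    ("ex:alice", "rdf:type", "up:Student"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "up:thesisTitle", "Linked Data"),  # hypothetical property
    ("ex:bob", "rdf:type", "up:Student"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:carol", "rdf:type", "up:Professor"),
    ("ex:carol", "foaf:name", "Carol"),
]


def students_with_attributes(triples):
    """Mimic: ?uri rdf:type up:Student . ?uri ?attribute ?value ."""
    # First pattern: bind ?uri to every subject typed as up:Student.
    student_uris = {s for s, p, o in triples
                    if p == "rdf:type" and o == "up:Student"}
    # Second pattern: every triple whose subject is one of those URIs.
    return [(s, p, o) for s, p, o in triples if s in student_uris]


results = students_with_attributes(TRIPLES)
for uri, attribute, value in results:
    print(uri, attribute, value)
```

Note that the two students come back with different sets of attributes, and that no schema change was needed to give one of them an extra property.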
18. How does the system know that a manager is also an employee?
Inference
The inference engine recognizes certain properties and builds “virtual triples” in the background
http://docs.openlinksw.com/virtuoso/rdfsparqlrule.html
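One such property is `rdfs:subClassOf`. The sketch below shows the idea in plain Python: it follows subclass links and derives the extra `rdf:type` triples. It is a simplified illustration under invented data (`up:Manager`, `up:Employee`, `up:Person`, `ex:maria`), and it materializes the triples eagerly; it is not a description of how Virtuoso implements inference internally.

```python
def infer_types(triples):
    """Return the triples plus rdf:type triples implied by rdfs:subClassOf."""
    # Map each class to its direct superclasses.
    parents = {}
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            parents.setdefault(s, set()).add(o)

    def ancestors(cls, seen=None):
        # Walk subclass links transitively; `seen` guards against cycles.
        seen = set() if seen is None else seen
        for parent in parents.get(cls, ()):
            if parent not in seen:
                seen.add(parent)
                ancestors(parent, seen)
        return seen

    inferred = set(triples)
    for s, p, o in triples:
        if p == "rdf:type":
            for cls in ancestors(o):
                inferred.add((s, "rdf:type", cls))
    return inferred


# Invented sample data.
TRIPLES = {
    ("up:Manager", "rdfs:subClassOf", "up:Employee"),
    ("up:Employee", "rdfs:subClassOf", "up:Person"),
    ("ex:maria", "rdf:type", "up:Manager"),
}

closure = infer_types(TRIPLES)
virtual = closure - TRIPLES  # the triples nobody stored explicitly
print(sorted(virtual))
```

A query for all `up:Employee` instances over `closure` now finds `ex:maria`, although only her `up:Manager` type was ever stored.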
19. Inference is good
• Transitive Properties (subclass of subclass…)
• Subclasses
• Multiple Inheritance Handling (Student + Researcher + ScholarshipHolder)
Saves coding time spent writing complex queries
20. Nothing comes for free
• NO referential integrity or foreign keys!
• Aggregation operators are slow
• Transactions are not supported in standard SPARQL
(“SPARQL 1.1 Query/Update Services should be atomic but that they are not required to be atomic.”)
• Graph DBMS Solutions are in early stages (many bugs, many “beta”s, many mailing lists…)
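The first point can be made concrete with a small comparison. In the sketch below, a relational engine (SQLite via Python's standard `sqlite3` module, with foreign keys switched on) rejects a row that points at a missing employee, while a plain set of triples accepts a reference to a resource nobody has described. The data and the `ex:` names are invented, and the Python set only stands in for a triple store.

```python
import sqlite3

# Relational side: the foreign key is checked by the engine.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE engineer ("
    "id INTEGER PRIMARY KEY REFERENCES employee(id), engineer_info TEXT)"
)

try:
    conn.execute("INSERT INTO engineer VALUES (99, 'orphan')")
    relational_rejected = False
except sqlite3.IntegrityError:
    relational_rejected = True

# Graph side: nothing stops a triple from pointing at an undescribed resource.
triples = set()
triples.add(("ex:task1", "ex:assignedTo", "ex:employee99"))

# Objects that never appear as the subject of any triple.
dangling = [o for s, p, o in triples
            if not any(subj == o for subj, _, _ in triples)]

print(relational_rejected, dangling)
```

In a graph store, checks of this kind have to live in the application or in a separate validation step.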
21. However
• Graph databases allow for flexible, intuitive representations of the data
• They handle billions of triples
• Restriction-based querying makes queries more high-level
27. Conclusions
• Relational databases
  Mature, robust, support transactions
  Hard to model entities with dynamic attributes
  Complex querying
• Graph Databases
  Recent technology
  Handle billions of triples
  Higher-level querying, more abstract
28. João Rocha da Silva
Research Data Management and Semantic Web Researcher, Web & iPhone Developer
João Rocha da Silva is an Informatics Engineering PhD student at the Faculty of
Engineering of the University of Porto. He specializes in research data management,
applying the latest Semantic Web technologies to the preservation and discovery of
research data assets.

He is experienced in many programming languages and frameworks (JavaScript with Node,
PHP with MVC frameworks, Ruby on Rails, J2EE, etc.) running on the major operating
systems (everyday Mac user). Regardless of language, he is a quick learner who adapts
effectively to new technologies.

He is also an experienced freelance iOS developer with several apps published on the
App Store, and a self-taught DIY mechanic with a special interest in classic cars,
particularly his 1987 Toyota Corolla GT Twin Cam, also known as Hachi-Roku or AE86.
joaorosilva@gmail.com