Ontologies & linked open data

A brief presentation I made as an invited lecture.

Published in: Education, Technology

  1. Ontologies & Linked Open Data: a brief overview and some real-world applications. João Rocha da Silva, joaorosilva@gmail.com, December 2013
  2. Contents
     • Ontologies: the importance of semantics in the data storage and querying layer
     • Popular ontologies: DCTerms, FOAF
     • The Semantic Web in practice: Linked Open Data in the Facebook API and in DBpedia
     • Relational vs. graph: differences
     • The SPARQL language: examples
     • A non-relational database: OpenLink Virtuoso
  3. The importance of semantics
  4. The importance of semantics
     • How does someone understand the meaning of the columns in a relational database?
     • Reading a lot of documentation
     • Hard to provide information to external systems
     • Tailor-made web services required!
  5. SAP (one of 78,826 tables and counting). Source: http://scn.sap.com/thread/1743542
  6. MediaWiki database schema. Source: http://upload.wikimedia.org/wikipedia/commons/thumb/4/42/MediaWiki_1.20_%2844edaa2%29_database_schema.svg/2500px-MediaWiki_1.20_%2844edaa2%29_database_schema.svg.png
  7. Now imagine we want to have images of different kinds, with different attributes… MediaWiki database schema. Source: http://upload.wikimedia.org/wikipedia/commons/thumb/4/42/MediaWiki_1.20_%2844edaa2%29_database_schema.svg/2500px-MediaWiki_1.20_%2844edaa2%29_database_schema.svg.png
  8. The importance of semantics
     • Building a query over such a system is complex
     • Requires knowledge of its intricate and subtle aspects
     • Some columns even contain flags for business logic processing (o_O)
     • Bad design decisions = “spaghetti code”
  9. Relational vs. Ontology
  10. SELECT employee.id AS employee_id,
             engineer.id AS engineer_id,
             manager.id AS manager_id,
             employee.name AS employee_name,
             employee.type AS employee_type,
             engineer.engineer_info AS engineer_engineer_info,
             manager.manager_data AS manager_manager_data
      FROM employee
      LEFT OUTER JOIN engineer ON employee.id = engineer.id
      LEFT OUTER JOIN manager ON employee.id = manager.id
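The joined-table query on slide 10 can be tried against an in-memory SQLite database. This is only a minimal sketch: the table and column names come from the slide, while the column types and the two sample rows are assumptions made up for illustration.

```python
import sqlite3

# Query from slide 10, with an ORDER BY added so the output order is stable.
QUERY = """
SELECT employee.id AS employee_id,
       engineer.id AS engineer_id,
       manager.id AS manager_id,
       employee.name AS employee_name,
       employee.type AS employee_type,
       engineer.engineer_info AS engineer_engineer_info,
       manager.manager_data AS manager_manager_data
FROM employee
LEFT OUTER JOIN engineer ON employee.id = engineer.id
LEFT OUTER JOIN manager ON employee.id = manager.id
ORDER BY employee.id
"""


def load_employees():
    """Build the three tables, insert sample rows, and run the join."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, type TEXT);
        CREATE TABLE engineer (id INTEGER PRIMARY KEY, engineer_info TEXT);
        CREATE TABLE manager (id INTEGER PRIMARY KEY, manager_data TEXT);
        -- Hypothetical sample rows, not taken from the slides.
        INSERT INTO employee VALUES (1, 'Ana', 'engineer'), (2, 'Rui', 'manager');
        INSERT INTO engineer VALUES (1, 'backend');
        INSERT INTO manager VALUES (2, 'team A');
    """)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


rows = load_employees()
for row in rows:
    print(row)
```

Every subtype needs its own table and its own join, and each row carries NULLs for the subtypes it does not belong to.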
  11. Building the “U.Porto” Ontology
  12. up: a hypothetical ontology for U.Porto (see http://www.w3.org/TR/vocab-org/)
      Diagram classes: foaf:Person, org:Organization, up:Student, up:Faculty, up:PhDStudent, up:Thesis, rdfs:Literal
      Diagram properties: rdfs:subClassOf, org:memberOf, up:thesis, dc:title
  13. Representing a person
  14. http://www.fe.up.pt/~pro11004
        rdf:type up:PhDStudent
        foaf:name “João Rocha”
        org:memberOf http://www.fe.up.pt/
      References: http://www.w3.org/TR/rdf-schema/, http://www.foaf-project.org/
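As a rough illustration of slide 14, the person can be written down as three subject-predicate-object triples and printed in N-Triples syntax. The reading of the diagram (which node each arrow connects) is an assumption, and so is the `up:` namespace URI, since `up` is a hypothetical ontology. The `rdf`, `foaf` and `org` namespace URIs are the standard ones.

```python
from dataclasses import dataclass

PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "org": "http://www.w3.org/ns/org#",
    "up": "http://example.org/up#",  # assumption: placeholder namespace for the hypothetical ontology
}


@dataclass(frozen=True)
class Literal:
    """A plain string value, as opposed to a resource identified by a URI."""
    value: str


def expand(curie):
    """Turn a prefixed name such as 'foaf:name' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Literal):
            escaped = o.value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return lines


JOAO = "http://www.fe.up.pt/~pro11004"
person = [
    (JOAO, expand("rdf:type"), expand("up:PhDStudent")),
    (JOAO, expand("foaf:name"), Literal("João Rocha")),
    (JOAO, expand("org:memberOf"), "http://www.fe.up.pt/"),
]

for line in to_ntriples(person):
    print(line)
```

Adding a new attribute to this person means appending one more tuple; no schema change is involved.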
  15. Getting all the students
      SELECT ?uri ?attribute ?value
      FROM <http://myorganization.com/data>
      WHERE {
        ?uri rdf:type up:Student .
        ?uri ?attribute ?value
      }
      • Will fetch all the students, regardless of their type (provided subclass inference is enabled)
      • Will also return their attributes (“database columns”)
      • Different types of students will have different attributes
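What the two triple patterns on slide 15 do can be sketched in plain Python: first find every subject typed as `up:Student`, then return all triples about those subjects. This is a toy matcher over an in-memory list, not a SPARQL engine, and it does no inference. The `ex:` resources and their attributes are hypothetical sample data.

```python
def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every non-None position (None acts as a variable)."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def students_with_attributes(triples):
    """Emulate: ?uri rdf:type up:Student . ?uri ?attribute ?value"""
    student_uris = {s for s, _, _ in match(triples, p="rdf:type", o="up:Student")}
    return sorted(t for t in triples if t[0] in student_uris)


# Hypothetical data: two students with different attributes, plus one non-student.
DATA = [
    ("ex:ana", "rdf:type", "up:Student"),
    ("ex:ana", "foaf:name", "Ana"),
    ("ex:rui", "rdf:type", "up:Student"),
    ("ex:rui", "up:thesis", "ex:thesis1"),
    ("ex:feup", "rdf:type", "up:Faculty"),
]

result = students_with_attributes(DATA)
for triple in result:
    print(triple)
```

The two students come back with different attribute sets, which is the point of the last bullet on the slide: no fixed column list is needed.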
  16. How does the system know that a manager is also an employee? Inference
      The inference engine recognizes certain properties and builds “virtual triples” in the background
      http://docs.openlinksw.com/virtuoso/rdfsparqlrule.html
  17. Inference is good
      • Transitive properties (subclass of subclass…)
      • Subclasses
      • Multiple inheritance handling (Student + Researcher + ScholarshipHolder)
      Saves coding time spent writing complex queries
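The “virtual triples” idea from slides 16 and 17 can be sketched as follows: walk the subclass hierarchy transitively and derive every extra `rdf:type` statement that follows from it. This is a simplified illustration of RDFS-style subclass reasoning, not a description of how Virtuoso implements it. The class hierarchy below, including `up:Researcher`, is an assumption for the example.

```python
def superclasses(cls, subclass_of):
    """Return every ancestor of cls, following subclass links transitively."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def infer_types(type_statements, subclass_of):
    """Return the (subject, class) pairs implied by the hierarchy but not stated explicitly."""
    inferred = set()
    for subject, cls in type_statements:
        for parent in superclasses(cls, subclass_of):
            if (subject, parent) not in type_statements:
                inferred.add((subject, parent))
    return inferred


# Hypothetical hierarchy with multiple inheritance: a PhD student is both a student and a researcher.
SUBCLASS_OF = {
    "up:PhDStudent": ["up:Student", "up:Researcher"],
    "up:Student": ["foaf:Person"],
    "up:Researcher": ["foaf:Person"],
}
STATED = {("ex:joao", "up:PhDStudent")}

virtual = infer_types(STATED, SUBCLASS_OF)
print(sorted(virtual))
```

With these derived statements available, a query for `up:Student` also finds resources that were only ever declared as `up:PhDStudent`.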
  18. Nothing comes for free
      • NO referential integrity or foreign keys!
      • Aggregation operators are slow
      • Transactions are not supported in standard SPARQL (“SPARQL 1.1 Query/Update Services should be atomic but that they are not required to be atomic.”)
      • Graph DBMS solutions are in early stages (many bugs, many “beta”s, many mailing lists…)
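The first bullet on slide 18 can be made concrete with a small comparison. In this sketch, SQLite (with foreign keys switched on) rejects a row that points at a missing organization, whereas a bare set of triples accepts a statement about an organization that is described nowhere. The table names and the `ex:` identifiers are hypothetical, and the plain Python set merely stands in for a triple store.

```python
import sqlite3


def relational_rejects_dangling_reference():
    """Return True if the database refuses a person row pointing at a missing organization."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
    conn.execute("CREATE TABLE organization (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE person ("
        "id INTEGER PRIMARY KEY, "
        "org_id INTEGER REFERENCES organization(id))"
    )
    try:
        conn.execute("INSERT INTO person VALUES (1, 99)")  # organization 99 does not exist
    except sqlite3.IntegrityError:
        return True
    finally:
        conn.close()
    return False


def triple_set_accepts_dangling_reference():
    """Return True if a triple pointing at an undescribed resource is stored without complaint."""
    triples = set()
    triples.add(("ex:joao", "org:memberOf", "ex:org99"))
    described = {s for s, _, _ in triples}
    return "ex:org99" not in described


print(relational_rejects_dangling_reference())
print(triple_set_accepts_dangling_reference())
```

Any consistency checks of this kind have to be written by the application when working with triples.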
  19. However
      • Graph databases allow for flexible, intuitive representations of the data
      • They handle billions of triples
      • Restriction-based querying makes queries more high-level
  20. Query examples
  21. DBpedia: “Find Facebook’s CEO and the university where he studied”
      PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
      PREFIX dbprop: <http://dbpedia.org/property/>
      SELECT DISTINCT ?s ?almaMater
      WHERE {
        ?s dbpedia-owl:almaMater ?almaMater .
        ?s dbprop:knownFor ?knownFor .
        FILTER regex(?knownFor, "Facebook", "i")
        ?s dbprop:occupation ?occupation .
        FILTER regex(?occupation, "CEO", "i")
      }
      LIMIT 100
      Try it at http://dbpedia.org/sparql
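Besides the web form, a query like the one on slide 21 can be sent programmatically. The sketch below only builds the HTTP request, following the SPARQL Protocol convention of a GET with a `query` parameter and asking for JSON results through the `Accept` header. It does not contact the endpoint. The short query string is a stand-in chosen for the example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint named on the slide


def build_request(query, endpoint=ENDPOINT):
    """Build (but do not send) a SPARQL Protocol GET request for the given query."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Stand-in query for the example; any SELECT query text can be passed instead.
SAMPLE_QUERY = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1"

request = build_request(SAMPLE_QUERY)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would execute the query, assuming the endpoint is reachable.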
  22. DBpedia: “Find all fun (aka rear-wheel-drive) cars from the eighties, made by Japanese manufacturers”
      SELECT DISTINCT ?car ?manufacturer
      WHERE {
        ?car rdf:type dbpedia-owl:Automobile .
        ?car dbpedia-owl:layout <http://dbpedia.org/resource/Front-engine,_rear-wheel-drive_layout> .
        ?car dbpedia-owl:productionStartYear ?startYear .
        FILTER ( ?startYear < "1990-01-01"^^xsd:date )
        FILTER ( ?startYear >= "1980-01-01"^^xsd:date )
        ?car dbpedia-owl:manufacturer ?manufacturer .
        {
          SELECT DISTINCT ?manufacturer
          WHERE {
            ?car dbpedia-owl:manufacturer ?manufacturer .
            ?manufacturer <http://dbpedia.org/property/location> ?location .
            FILTER regex(?location, "Japan", "i")
          }
        }
      }
      LIMIT 100
      Try it at http://dbpedia.org/sparql
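A SELECT query such as the one on slide 22 returns a table of variable bindings. When JSON is requested, the standard SPARQL results format wraps each value in an object with `type` and `value` keys. The sketch below flattens such a document into plain dictionaries. The sample document is made up for illustration (its `example.org` URIs are not real DBpedia results).

```python
import json


def rows_from_results(document):
    """Flatten a SPARQL JSON results document into a list of {variable: value} dicts."""
    variables = document["head"]["vars"]
    return [
        # A variable can be unbound in a given row, so only copy the ones present.
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in document["results"]["bindings"]
    ]


# Hypothetical response body in the SPARQL JSON results layout.
SAMPLE = json.loads("""
{
  "head": {"vars": ["car", "manufacturer"]},
  "results": {"bindings": [
    {"car": {"type": "uri", "value": "http://example.org/car/1"},
     "manufacturer": {"type": "uri", "value": "http://example.org/maker/1"}},
    {"car": {"type": "uri", "value": "http://example.org/car/2"}}
  ]}
}
""")

parsed = rows_from_results(SAMPLE)
for row in parsed:
    print(row)
```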
  23. Custom query
      • What do you want to know?
  24. Virtuoso, a graph database
  25. Conclusions
      • Relational databases
        Mature, robust, support transactions
        Hard to model entities with dynamic attributes
        Complex querying
      • Graph databases
        Recent technology
        Handle billions of triples
        Higher-level querying, more abstract
  26. João Rocha da Silva: Research Data Management and Semantic Web Researcher, Web & iPhone Developer
      João Rocha da Silva is an Informatics Engineering PhD student at the Faculty of Engineering of the University of Porto. He specializes in research data management, applying the latest Semantic Web technologies to the adequate preservation and discovery of research data assets.
      He is experienced in many programming languages (JavaScript/Node, PHP with MVC frameworks, Ruby on Rails, J2EE, etc.) running on the major operating systems (everyday Mac user). Regardless of language, he is a quick learner who can adapt to any new technology quickly and effectively.
      He is also an experienced freelance iOS developer with several apps published on the App Store, and a self-taught DIY mechanic with a special interest in classic cars, particularly his 1987 Toyota Corolla GT Twin Cam, also known as Hachi-Roku or AE86.
      joaorosilva@gmail.com
