Semantic Technologies for the Web of Linked Data

Tutorial talk given at the ENCASE project Summer School on Usable Security and Privacy in Online Social Networks.

Published in: Technology

  1. Semantic Technologies for the Web of Linked Data
     Nick Bassiliades, Aristotle University of Thessaloniki
     ENCASE Summer School, Limassol, 17th July 2017
     ENCASE: Enhancing seCurity and privAcy in the Social wEb: a user-centered approach for the protection of minors. Funded by the Horizon H2020 Framework Programme of the European Union under grant agreement no 691025.
  2. A few words about the speaker
     • Nick Bassiliades (Νικόλαος Βασιλειάδης)
       ‒ http://intelligence.csd.auth.gr/people/bassiliades
     • Associate Professor, Department of Informatics, Aristotle University of Thessaloniki, Greece
     • Scientific specialization: Knowledge Systems
       ‒ Knowledge Representation & Reasoning (Rule-based systems, Logic programming, Defeasible Reasoning, Knowledge-based systems / expert systems)
       ‒ Semantic Web (Ontologies, Linked Open Data, Semantic Web Services)
       ‒ Multi-agent systems
       ‒ Intelligent Applications on e-Learning, e-Government, e-Commerce, Electric Vehicles
  3. A few words about my institution
     • Aristotle University of Thessaloniki, Greece
       ‒ Largest University in Greece and South-East Europe
       ‒ Since 1925, 41 Departments, ~2K faculty, ~45K students
     • Dept. of Informatics
       ‒ Since 1992, 28 faculty, 5 research labs, ~1100 undergraduate students, ~200 MSc students, ~80 PhD students, ~120 PhD graduates, >3500 pubs
     • Software Engineering, Web and Intelligent Systems Lab
       ‒ 7 faculty, 20 PhD students, 9 Post-doctorate affiliates
     • Intelligent Systems group (http://intelligence.csd.auth.gr)
       ‒ 4 faculty, 7 PhD students, 17 PhD graduates
       ‒ Research on Artificial Intelligence, Machine Learning / Data Mining, Knowledge Representation & Reasoning / Semantic Web, Planning, Multi-Agent Systems
       ‒ 430 publications, 35 projects
  4. A few words about the talk
     • Semantic Web & Linked Open Data
       ‒ Introductory Concepts
       ‒ RDF, RDF Schema, Ontologies, OWL, Reasoning
     • Storing and Querying RDF
       ‒ SPARQL, DBpedia
     • Ontologies, Logic & Rules
       ‒ Horn Logic, OWL 2 RL, SWRL, SPIN
     • Use Case: (Semantic) Entity Identification / Linking
       ‒ URank application (Prolog)
  5. Semantic Web & Linked Open Data: RDF, RDF Schema & OWL
  6. The Semantic Web
     • A shift from today's Web: from publishing data in human-readable HTML documents to publishing machine-readable documents.
     • Today, much of the data we get from the web is delivered to us in the form of web pages
       ‒ HTML documents that are linked to each other through hyperlinks
     • Humans or machines can read (and browse/crawl) these documents
     • Machines can seek keywords in a page
     • Machines have difficulty extracting any meaning from the documents themselves
  7. Web evolution (figure; source: https://atomate.net/blog/web-technologies-now-and-tomorrow/)
  8. The Semantic Web
     • An extension of the Web through standard data formats and exchange protocols
       ‒ Most important: RDF, OWL, SPARQL, RIF, SWRL, SPIN, …
     • The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries in the web.
     • “… a web of data that can be processed by machines”. Tim Berners-Lee
  9. From the Web of Documents to the Web of Data (figure; source: https://atomate.net/blog/web-technologies-now-and-tomorrow/)
  10. Web of (Linked) Data
      Current Web | Semantic Web
      Web of Documents | Web of Data
      Hypertext Documents (HTML) | Semi-structured Data representation in graphs (RDF)
      Interconnected documents through URL links | Linked data through URIs
      Human Consumption | Machine (and human) Consumption
      Use search engines and browsing to explore | Use search engines, web (RDF) databases to query and URI links between datasets to explore
      Don't just link the documents, link the things
  11. Open Data
     • Some data should be freely available to everyone to use and republish, without restrictions from copyright, patents or other mechanisms of control.
       ‒ Similar to open source, open hardware, open content and open access.
       ‒ Benefits: Transparency and democratic control, Improved or new private products and services, Improved government services, New knowledge from combined data sources and patterns in large data volumes
     • Open data must be available in a convenient and modifiable form.
     • Interoperability: the ability to combine different datasets and to develop more and better products and services
  12. (Figure: document formats XLS, DOC, PDF)
  13. Open Data Formats & Levels
     • But how exactly should open data be published (technological side)?
       ‒ So that it can be easily re-used by other people and be interoperable?
     • Tim Berners-Lee, the inventor of the Web and Linked Data initiator, suggested a 5-star deployment scheme for Open Data.
  14. 5 ★ OPEN DATA (figure)
  15. Going up the ladder
     • 3 ★ OPEN DATA (CSV):
       ‒ The data is available via the Web; everyone can use the data easily, with no proprietary software
       ‒ But it’s still data on the Web and not data in the Web.
     • 4 ★ OPEN DATA (RDF):
       ‒ Data items have a URI and can be shared / bookmarked on the Web.
       ‒ You can reuse parts of the data.
       ‒ Data can be stored in RDF databases (triple stores) and can be queried via public endpoints on the Web through remote clients (SPARQL protocol)
       ‒ You can combine the data safely with other data. URIs are a global scheme.
  16. Linked Open Data
     • 5 ★ OPEN DATA (LOD):
       ‒ Data in the Web (RDF), linked to other data in the Web (URI links via owl:sameAs).
       ‒ Both the consumer and the publisher benefit from the network effect.
         → Publisher adds value to data by linking them to other data with more information.
         → Consumer can discover more (related) data while consuming the data.
         → Consumer can directly learn about the data schema.
         → Publisher needs to invest resources to link data to other data in the Web.
         → Publisher may need to repair broken or incorrect links.
         → Consumer trusts the consumed data, but what about the data from external links?
  17. Linked Data Principles
     • Use URIs as names for “things”, so that names are globally unique (IDs)
       ‒ Real inanimate or animate things, abstract concepts, …
     • Use HTTP URIs, so that “things” can be looked up
       ‒ E.g. using a browser
     • When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
     • Include links to other URIs so that more things can be discovered
       ‒ Links are actually RDF properties interpreted as hyperlinks
  18. Linked Data Technology Stack
     • URIs: a mechanism to identify things (IDs)
     • HTTP: a mechanism to access things
     • RDF: a mechanism to describe things and their relationships
     • RDFS/OWL: a mechanism to describe vocabularies of properties and relationships of things
  19. Resource Description Framework (RDF)
     • A data model (framework) for representing information (describing resources) in the (Semantic) Web
     • Compared to other data models, RDF is based on graphs
       ‒ Graphs have nodes and edges
     • In RDF graphs, nodes are resources:
       ‒ Webized entities, i.e. “things” / objects that we want to talk about in the web
       ‒ Literals, i.e. constant atomic values of various data types
     • Edges are properties (also called predicates):
       ‒ Relationships between entities
       ‒ Attributes of an entity, linking it with attribute values
  20. RDF graph example
     (Figure: a graph with nodes http://lpis.csd.auth.gr, http://www.csd.auth.gr/bassiliades, http://acm.org/SemanticWeb and the literal 97913, and edges labelled has-admin, has-phone, works-on)
  21. Triples / Statements
     • The RDF graph consists of many simple node → edge → node parts
       ‒ Called “triples”
     • The triple can be “read” as a natural language “statement”
       ‒ “Nick” “has phone” “97913”
     • The three parts of the triple have different names
       ‒ Syntactical terms: subject, predicate, object
     (Figure: http://.../bassiliades → has-phone → 97913)
  22. Why graphs?
     • Tabular data: (SQL Databases, Excel, CSV, etc.)
       ‒ Information arranged in a strict grid.
       ‒ Adding / removing data is easy
       ‒ Changing the shape of the table has a much higher cost.
     • Tree data: (JSON, XML)
       ‒ Loose structure; represents semi-structured and unstructured information more easily
       ‒ Tricky to modify the structure and merge data from multiple sources, especially if those sources were not designed with the merger in mind.
     • Graph data: (RDF)
       ‒ A set of relationships (triples) between things; it can be any shape, so it is more flexible.
       ‒ Merging 2 RDF documents is trivial. (Union of two sets!)
  23. Global Identifiers
     • RDF uses globally unique identifiers (URIs) for everything
       ‒ Things we're talking about (resources)
       ‒ Relationships (properties, predicates)
       ‒ Datatypes (literals)
     • RDF is a list of unambiguous relationships
       ‒ Merging / combination of two graphs is trivial
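The "union of two sets" remark can be sketched in a few lines of Python. This is only an illustration: the triples are modelled as plain tuples, the "ex:" property names are hypothetical placeholders, and the direction of the has-admin edge is an assumption about the graph example.

```python
# Illustrative sketch: an RDF graph as a Python set of (subject, predicate, object) tuples.
# The "ex:" property names are hypothetical placeholders, not real vocabulary terms.
graph_a = {
    ("http://lpis.csd.auth.gr", "ex:has-admin", "http://www.csd.auth.gr/bassiliades"),
    ("http://www.csd.auth.gr/bassiliades", "ex:has-phone", "97913"),
}
graph_b = {
    ("http://www.csd.auth.gr/bassiliades", "ex:has-phone", "97913"),  # also in graph_a
    ("http://www.csd.auth.gr/bassiliades", "ex:works-on", "http://acm.org/SemanticWeb"),
}

# Merging is a plain set union: shared URIs line up and duplicate triples collapse.
merged = graph_a | graph_b
```

Because both graphs name the same person with the same URI, no key matching or schema alignment is needed before the union.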
  24. URI vs URL
     • URI: Uniform Resource Identifier; identifies something uniquely.
       ‒ A URI represents a single concept or thing
       ‒ Many URIs can represent the same thing
       ‒ If you resolve a URI, it's considered good practice (but optional) to return some useful triples about the concept the URI represents
     • URL: Uniform Resource Locator; not only identifies something, but also describes where it is located.
     • All URLs are URIs. Not all URIs are URLs.
     • Example
       ‒ <http://dbpedia.org/resource/Julius_Caesar>: URI for Julius Caesar.
       ‒ <http://en.wikipedia.org/wiki/Julius_Caesar>: URL for a web page about Julius Caesar
  25. RDF Documents
     • There are several ways of writing RDF triples into a file.
     • RDF/XML
     • Turtle (and N3)
       ‒ N-Triples
     • RDFa
       ‒ Embed triples into an HTML document
       ‒ Triples can be extracted from the web page by tools (e.g. pyRdfa)
  26. Namespaces
     • For URIs it's common to define related concepts in the same namespace
     • A namespace is like a directory on a filesystem
       ‒ All the contained “things” share the same “global” address
       ‒ The “full address” ends with either "/" or "#"
       ‒ The local names (IDs) of “things” don't contain "/" or "#"
       ‒ The global names (IDs) of “things” are constructed as namespace:local_name
     • In RDF it's common to use a namespace prefix to make things readable
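As a rough illustration of how a namespace prefix stands in for the full address, the following Python sketch expands a prefixed name into a full URI. The function name `expand` and the prefix table are assumptions made for this example; the namespace URIs themselves are the standard rdf namespace and the usual DBpedia ones.

```python
# Prefix table: each namespace URI ends with "/" or "#".
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dbo": "http://dbpedia.org/ontology/",
    "dbr": "http://dbpedia.org/resource/",
}

def expand(prefixed_name: str) -> str:
    """Turn 'prefix:local_name' into the full URI (hypothetical helper)."""
    prefix, local_name = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local_name
```

For instance, `expand("dbr:Nicosia")` yields `http://dbpedia.org/resource/Nicosia`.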
  27. Semantic Networks
     • In usual graphs (Graph Theory), all nodes and edges are of the same type (e.g. cities and roads)
     • In the RDF model, nodes can belong to different types and all edges are labelled, so they have different meanings
     • Semantic Networks (semantic nets) (known from AI)
     • How do we know which types of nodes we can use?
       ‒ What’s in a type?
     • Which edge labels (properties) can we use?
       ‒ Are they related to node types?
     (Figure: node types Web page, Researcher, Research Topic, integer)
  28. Ontologies
     • The term ontology originates from philosophy
       ‒ The study of the nature of existence
     • It has a different meaning in computer science
       ‒ An ontology is an explicit and formal specification of a conceptualization
     • An ontology is an artifact (1-2-3 ontologies)
     • An ontology in the Semantic Web is a formally defined vocabulary that dictates which node types exist and which properties relate which node types
       ‒ And other things, as well
  29. Typical Components of Ontologies
     • Terms denote important concepts (classes of objects) of the domain
       ‒ e.g. professors, staff, students, courses, departments
     • Relationships between these terms: typically class hierarchies
       ‒ a class C is a subclass of another class B if every object in C is also included in B
       ‒ e.g. all professors are staff members
  30. Further Components of Ontologies
     • Properties / attributes:
       ‒ X’s phone number is 98713
     • (direct) Relationships
       ‒ X teaches Y
     • Relationship restrictions
       ‒ Faculty members can teach courses
     • Disjointness statements
       ‒ faculty and general staff are disjoint
     • Other restrictions on relationships between objects
       ‒ every department must include at least 10 faculty members
  31. The Role of Ontologies on the Web
     • Ontologies provide a shared understanding of a domain:
       ‒ semantic interoperability
       ‒ overcome differences in terminology
       ‒ mappings between ontologies
     • Ontologies are useful for the organization and navigation of Web sites
     • Ontologies are useful for improving the accuracy of Web searches
       ‒ Search engines can look for pages that refer to a precise concept in an ontology
       ‒ If a query fails to find relevant documents, the search engine may suggest a more general query
       ‒ If too many answers are retrieved, the search engine may suggest specializations
  32. Ontologies and Reasoning
     • The most important feature of ontologies is the reasoning support
       ‒ Reasoning: applying knowledge to arrive at solutions
       ‒ Inferencing: deriving a conclusion based on statements that only imply that conclusion
     • Reasoning/inferencing support is important for:
       ‒ Checking consistency of ontology and knowledge
       ‒ Checking for unintended relationships between classes
       ‒ Automatically classifying instances in classes
       ‒ Deriving information/knowledge not known before
  33. Semantic Web Ontology Languages: RDF Schema
     • RDF is a generic data model for describing objects, their properties and relations between them
     • RDF Schema is a simple vocabulary (ontology) description language
     • In RDF Schema, we can define:
       ‒ “Allowed” Classes (node types) and Properties (edges’ labels)
       ‒ Which properties go with which classes
       ‒ Which values a property can take
       ‒ Class Hierarchies and Inheritance
       ‒ Property Hierarchies!
  34. Semantic Web Ontology Languages: OWL (Web Ontology Language)
     • A richer ontology language
     • (More) relations between classes and properties
       ‒ e.g., equivalent classes/properties, disjoint classes/properties, inverse properties, Boolean combinations of classes
     • Cardinality constraints
       ‒ e.g. People have exactly one mother, Students attend at most five courses
     • Richer typing / restrictions on properties
       ‒ E.g. Postgraduate students attend only Postgraduate courses
       ‒ E.g. Faculty members must teach at least one Undergraduate course
     • Characteristics of properties
       ‒ Object properties vs. Datatype properties
       ‒ Symmetric, transitive, functional properties
  35. RDF Schema example
     • RDF statement: ex:index.html dc:creator exstaff:85740
     (Figure: graph relating the statement to its schema; nodes ex:index.html, exstaff:85740, ex:WebPage, ex:Employee, ex:Person, rdfs:Class, rdf:Property; edges dc:creator, rdf:type, rdfs:subClassOf, rdfs:domain, rdfs:range)
  36. RDF Schema definitions
       ex:WebPage rdf:type rdfs:Class
       ex:Employee rdf:type rdfs:Class
       ex:Person rdf:type rdfs:Class
       ex:Employee rdfs:subClassOf ex:Person
       dc:creator rdf:type rdf:Property
       dc:creator rdfs:domain ex:WebPage
       dc:creator rdfs:range ex:Person
     • Resource ex:WebPage is a class
       ‒ An RDFS class; there are also OWL classes
     • Resource ex:Employee is a class
     • Class ex:Employee is a subclass of class ex:Person
     • Resource dc:creator is a property (instance of rdf:Property class)
     • Property dc:creator is attached to class ex:WebPage and takes as values instances of class ex:Person
  37. Connecting RDF instances to classes
     • Resource ex:index.html is an instance of class ex:WebPage
       ex:index.html rdf:type ex:WebPage
     • Resource exstaff:85740 is an instance of class ex:Employee
       exstaff:85740 rdf:type ex:Employee
  38. Reasoning in RDF Schema
     • Reasoning in RDF Schema is based on entailment rules
     • Rules are logical implications: IF a condition is true THEN the conclusion is also true
     • Entailment rules: IF such and such triples exist THEN add also these triples (conclusions made true)
     • E.g. rule for the subClassOf property:
       IF   ?x rdf:type ?u .
       AND  ?u rdfs:subClassOf ?v .
       THEN ?x rdf:type ?v .
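The subClassOf rule above can be sketched as a single pass over a set of triples. This is a minimal illustration, assuming triples are plain Python tuples of prefixed names; the function name `apply_subclass_rule` is made up for the example.

```python
def apply_subclass_rule(triples):
    """Return the triples newly derived by one pass of the subClassOf rule:
    IF ?x rdf:type ?u AND ?u rdfs:subClassOf ?v THEN ?x rdf:type ?v."""
    inferred = set()
    for (x, p, u) in triples:
        if p != "rdf:type":
            continue
        for (u2, p2, v) in triples:
            if p2 == "rdfs:subClassOf" and u2 == u:
                inferred.add((x, "rdf:type", v))
    return inferred - set(triples)

# Triples taken from the RDF Schema example in the slides.
explicit = {
    ("exstaff:85740", "rdf:type", "ex:Employee"),
    ("ex:Employee", "rdfs:subClassOf", "ex:Person"),
}
new_triples = apply_subclass_rule(explicit)
```

Here the only derived triple states that exstaff:85740 is also an ex:Person. A single pass does not follow chains of subclasses; repeating the pass until nothing new appears would.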
  39. RDF Schema Inference / Query Process
     (Figure: an RDF document is translated into the original (explicit) set of triples; the RDF/S inference rules derive the inferred (implicit) set of triples; together they form the set of all triples, against which the query is evaluated)
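The process in this figure can be approximated in Python as: start from the explicit triples, apply rules until nothing new is derived, then query the combined set. The sketch below is a simplification under stated assumptions: it covers only the domain, range and subClassOf rules, treats triples as tuples of prefixed names, and ignores literals; `rdfs_closure` is a hypothetical helper name.

```python
def rdfs_closure(explicit):
    """Apply the domain, range and subClassOf rules until a fixpoint is reached."""
    triples = set(explicit)
    while True:
        new = set()
        for (s, p, o) in triples:
            for (s2, p2, o2) in triples:
                if p2 == "rdfs:domain" and s2 == p:
                    new.add((s, "rdf:type", o2))   # subject gets the domain class
                elif p2 == "rdfs:range" and s2 == p:
                    new.add((o, "rdf:type", o2))   # object gets the range class
                elif p2 == "rdfs:subClassOf" and p == "rdf:type" and s2 == o:
                    new.add((s, "rdf:type", o2))   # instance moves up the hierarchy
        if new <= triples:                         # nothing new: fixpoint
            return triples
        triples |= new

# Explicit triples, based on the RDF Schema example in the slides.
explicit = {
    ("ex:Employee", "rdfs:subClassOf", "ex:Person"),
    ("dc:creator", "rdfs:domain", "ex:WebPage"),
    ("dc:creator", "rdfs:range", "ex:Person"),
    ("ex:index.html", "dc:creator", "exstaff:85740"),
    ("exstaff:85740", "rdf:type", "ex:Employee"),
}
all_triples = rdfs_closure(explicit)
inferred = all_triples - explicit

# "Query": which resources are known to be persons?
persons = {s for (s, p, o) in all_triples if p == "rdf:type" and o == "ex:Person"}
```

The query only finds exstaff:85740 because it runs over the set of all triples; over the explicit set alone it would return nothing.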
  40. Reasoning Example in RDF Schema: due to subClassOf (figure: the RDF Schema example graph)
  41. Reasoning Example in RDF Schema: due to property domain (figure: the RDF Schema example graph)
  42. Reasoning Example in RDF Schema: due to property range (figure: the RDF Schema example graph)
  43. Reasoning Example in RDF Schema: due to property range (figure, continued)
  44. Reasoning Example in RDF Schema: due to property range (figure, continued)
  45. Reasoning Example in RDF Schema: due to property range (figure, continued)
  46. Reasoning in OWL
     • OWL is mapped on a Description Logic
       ‒ A subset of Predicate Logic (aka First-Order Logic / FOL)
       ‒ Then Description Logic reasoners are used (FaCT++, RACER, Pellet, HermiT, etc.)
     • Why a subset of Predicate Logic?
       ‒ Reasoning in Predicate Logic is undecidable
       ‒ Reasoning in Description Logics is (usually) decidable
       ‒ Efficient decision procedures have been designed and implemented
  47. Reasoning Tasks in OWL
     • Equivalence of classes
       ‒ If class A is equivalent to class B, and class B is equivalent to class C, then A is equivalent to C, too
       ‒ If class A is a subclass of B, B a subclass of C and C a subclass of A, then A, B, C are equivalent to each other
     • Class membership
       ‒ If x is an instance of a class C, and C is a subclass of D, then we can infer that x is an instance of D
       ‒ If C and D are equivalent classes, then if x is an instance of C, it is also an instance of D, and vice versa
  48. Reasoning Tasks in OWL
     • Consistency
       ‒ X is an instance of both classes A and B, but A and B are disjoint
       ‒ X is an instance of both A and the complement of A
       ‒ This is an indication of an error in the ontology
     • Instance Classification
       ‒ Certain property-value pairs are a sufficient condition for membership in a class A
         → FirstYearCourses are Courses with Year=1
       ‒ If an individual x satisfies such conditions, we can conclude that x must be an instance of A
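The disjointness case of the consistency check can be sketched very simply in Python. This is not how a Description Logic reasoner works internally; it only illustrates the condition being tested. The individuals and the helper name `find_disjointness_violations` are hypothetical, while the disjoint pair (faculty vs. general staff) follows the earlier ontology example.

```python
def find_disjointness_violations(types, disjoint_pairs):
    """List (individual, A, B) where the individual is an instance of
    both A and B although A and B are declared disjoint."""
    violations = []
    for individual, classes in types.items():
        for a, b in disjoint_pairs:
            if a in classes and b in classes:
                violations.append((individual, a, b))
    return violations

# Hypothetical individuals with their asserted classes.
types = {
    "ex:x": {"ex:Faculty", "ex:GeneralStaff"},
    "ex:y": {"ex:Faculty"},
}
disjoint_pairs = [("ex:Faculty", "ex:GeneralStaff")]
violations = find_disjointness_violations(types, disjoint_pairs)
```

A non-empty result signals an inconsistency, which the slides treat as an indication of an error in the ontology.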
  49. Reasoning Tasks in OWL
     • Class Classification
       ‒ A driver is a person that drives a vehicle.
       ‒ A bus driver is a person that drives a bus.
       ‒ A bus is a vehicle.
       ‒ A bus driver drives a vehicle, …
       ‒ … so he/she must be a driver.
     • Instance equality
       ‒ Every person has a unique mother
       ‒ X has two mothers A and B
       ‒ Thus, (in order to restore consistency) A = B.
       ‒ If we already know that A ≠ B, then we have an inconsistency.
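The instance-equality argument can be illustrated with a small Python sketch: for a property that allows only one value per subject, two recorded values must denote the same individual, unless they are already known to be different. The helper name `check_functional` and the data layout are assumptions for this example, not part of any OWL reasoner.

```python
def check_functional(values, different_from):
    """For a single-valued (functional) property, pair up multiple values of a subject.
    Returns (pairs inferred to be equal, pairs that are inconsistent)."""
    same_as, conflicts = set(), set()
    for subject, objects in values.items():
        ordered = sorted(objects)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                if frozenset((a, b)) in different_from:
                    conflicts.add((a, b))   # A = B is forced, but A ≠ B is known
                else:
                    same_as.add((a, b))     # conclude A = B
    return same_as, conflicts

# X has two recorded mothers, A and B (hypothetical names).
mothers = {"ex:X": {"ex:A", "ex:B"}}
equal_pairs, no_conflicts = check_functional(mothers, different_from=set())
_, conflicts = check_functional(mothers, different_from={frozenset(("ex:A", "ex:B"))})
```

In the first call the only possible conclusion is A = B; in the second the same data is inconsistent because A and B are declared different.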
  50. Tradeoff between Expressive Power and Computational Complexity
     • The more expressive a logic is, the more computationally expensive it becomes to draw conclusions
       ‒ Drawing certain conclusions may become impossible if non-computability barriers are encountered
     • Compromise:
       ‒ A language supported by reasonably efficient reasoners
       ‒ A language that can express large classes of ontologies and knowledge
  51. OWL flavours / profiles
     • OWL (OWL 2) comes with flavours / profiles of decreasing complexity, suitable for different types of applications
       ‒ OWL Full (full compatibility with RDF Schema)
       ‒ OWL DL (all constructs allowed, but not combined with RDF Schema)
       ‒ OWL 2 EL: useful in applications where ontologies contain very large numbers of properties / classes, but not so many instances
       ‒ OWL 2 QL: aimed at applications with very large volumes of instances; query answering is the most important reasoning task (data resides in relational databases)
       ‒ OWL 2 RL: aimed at applications that require scalable reasoning without sacrificing too much expressive power (data resides in triplestores)
  52. Storing and Querying RDF data with SPARQL … and an introduction to DBpedia
  53. Triple Stores
     • RDF data (triples) are stored in special NoSQL databases, called Triple stores or Graph stores.
       ‒ Databases optimized to import, store, and query a huge number of triples.
       ‒ Sesame (70 million), OpenLink Virtuoso (>15.4 billion), GraphDB (>12 billion), Apache Jena TDB (200 million), AllegroGraph (1 trillion), IBM DB2, Oracle, …
     • Triple stores are queried using SPARQL
       ‒ Sending SPARQL queries using the SPARQL protocol
       ‒ Triple stores provide a (public) endpoint, where SPARQL queries can be submitted
     • Clients can send queries to an endpoint using the HTTP protocol.
       ‒ You can issue a SPARQL query to an endpoint by entering it into the browser
       ‒ It’s preferable to have a client designed specifically for SPARQL.
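Sending a query over HTTP can be sketched with the Python standard library alone. The sketch assumes the endpoint accepts the query as a `query` URL parameter (the GET form of the SPARQL protocol), that it can return JSON results when asked via the Accept header, and that it predefines the rdf:, dbo: and dbr: prefixes as the DBpedia endpoint does. The helper names are made up for this example, and `run_query` needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """
select DISTINCT ?c where {
  ?u rdf:type dbo:University .
  ?u dbo:country dbr:Cyprus .
  ?u dbo:city ?c .
} ORDER BY ?c
"""

def build_query_url(endpoint, sparql):
    """Encode the query text as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": sparql})

def run_query(endpoint, sparql):
    """Send the query and parse the JSON result document (requires network)."""
    request = Request(
        build_query_url(endpoint, sparql),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request) as response:
        return json.load(response)

url = build_query_url(ENDPOINT, QUERY)
```

Pasting the value of `url` into a browser is the "entering it into the browser" option from the slide; a dedicated SPARQL client library would hide the encoding and result parsing.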
  54. DBpedia SPARQL endpoint (http://dbpedia.org/sparql) (screenshot)
  55. DBpedia (http://dbpedia.org)
     • A project aiming to extract structured content from the information created as part of Wikipedia.
       ‒ This structured information is then made available on the Web
     • DBpedia allows users to query relationships and properties associated with Wikipedia resources, including links to other related datasets.
     • One of the more famous parts of the Linked Data project, according to Tim Berners-Lee
  56. DBpedia dataset
     • Wikipedia articles include structured information embedded in the articles (mostly free text)
       ‒ E.g. "infobox" tables, categorization information, images, geo-coordinates and links to external pages.
       ‒ This structured information is extracted and put in a uniform dataset which can be queried.
     • The DBpedia project uses RDF to represent the extracted information.
       ‒ 8.8 billion RDF triples
         → 1.1 billion from the English edition
         → 4.4 billion from other language editions (125)
         → 3.2 billion from DBpedia Commons and Wikidata
  57. Wikipedia extraction to DBpedia (figure: Infobox, Title)
  58. Wikipedia extraction to DBpedia (figure: Infobox, Categories)
  59. DBpedia challenges
     • The same concepts can be expressed using different properties in templates
       ‒ E.g. birthplace and placeofbirth
       ‒ Queries about where people were born must search for both properties to get complete results.
     • The DBpedia Mapping Language helps map properties to an ontology
       ‒ Reduces the number of synonyms
     • The development of the ontology and the mappings is open to the public
       ‒ Due to the large diversity of infoboxes and properties
  60. SPARQL examples
     • Find all cities in Cyprus with a University
         select DISTINCT ?c
         where {
           ?u rdf:type dbo:University .
           ?u dbo:country dbr:Cyprus .
           ?u dbo:city ?c .
         }
         ORDER BY ?c
  61. SPARQL examples
     • Find cities in Cyprus without Universities
         select ?c
         where {
           ?c rdf:type dbo:City .
           ?c dbo:country dbr:Cyprus .
           FILTER NOT EXISTS {
             ?u rdf:type dbo:University .
             ?u dbo:city ?c .
           }
         }
     • Empty result!
  62. SPARQL examples
     • Find all cities in Cyprus
         select ?c
         where {
           ?c rdf:type dbo:City .
           ?c dbo:country dbr:Cyprus .
         }
     • What happened to the rest of the cities?
  63. DBpedia entry for Limassol (screenshot)
     • There is no triple dbr:Limassol rdf:type dbo:City . in DBpedia
  64. DBpedia: the Truth!
     • DBpedia has a lot of wrong or missing information
     • Not all Wikipedia properties (infobox) have been correctly mapped to the corresponding DBpedia ontology property
     • Thus, in order to retrieve the correct information, sometimes you have to become a detective!
  65. Query that returns more results
         select DISTINCT ?c
         where {
           { ?c rdf:type dbo:City . }
           UNION
           { ?x dbo:city ?c . }
           ?c dbo:country dbr:Cyprus .
         }
         order by ?c
     • ?c is a city, or ?c is the city of something ?x
  66. Filtering results
         select DISTINCT ?c
         where {
           { ?c rdf:type dbo:City . }
           UNION
           { ?x dbo:city ?c . }
           ?c dbo:country dbr:Cyprus .
           ?c dbo:populationTotal ?p .
           FILTER ( ?p >= 50000 )
         }
         order by ?c
  67. Return cities and their mayors
         select DISTINCT ?c ?n
         where {
           { ?c rdf:type dbo:City . }
           UNION
           { ?x dbo:city ?c . }
           ?c dbo:country dbr:Cyprus .
           ?c dbo:leaderName ?n .
         }
         order by ?c
     • What happened to the rest of the cities? Their mayors are not mentioned in DBpedia
  68. Return cities and their mayors (if mentioned)
         select DISTINCT ?c ?n
         where {
           { ?c rdf:type dbo:City . }
           UNION
           { ?x dbo:city ?c . }
           ?c dbo:country dbr:Cyprus .
           OPTIONAL { ?c dbo:leaderName ?n . }
         }
         order by ?c
     • The OPTIONAL keyword allows for flexibility in pattern matching
       ‒ Needed for “null values”
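The difference between the last two queries resembles the difference between an inner join and a left outer join. The Python sketch below illustrates this on made-up data: the city and mayor names are hypothetical, and each city is assumed to have at most one recorded leader, which is a simplification.

```python
def match_required(cities, leader_of):
    """Like the query without OPTIONAL: cities lacking a leader are dropped."""
    return [(c, leader_of[c]) for c in cities if c in leader_of]

def match_optional(cities, leader_of):
    """Like the query with OPTIONAL: every city is kept, ?n stays unbound (None)."""
    return [(c, leader_of.get(c)) for c in cities]

# Hypothetical data: only one of the two cities has a recorded leader.
cities = ["ex:CityA", "ex:CityB"]
leader_of = {"ex:CityA": "ex:MayorA"}

required_rows = match_required(cities, leader_of)
optional_rows = match_optional(cities, leader_of)
```

`None` plays the role of the "null value" here: the row for ex:CityB survives, with no binding for the mayor.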
  69. 69. H2020 – Grant Agreement no. 691025 What about smaller towns? select DISTINCT ?c where { { ?c rdf:type dbo:Settlement . } UNION { ?x dbo:city ?c . } ?c dbo:country dbr:Cyprus . ?c dbo:populationTotal ?p . FILTER ( ?p > 1000 ) } order by ?c N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 69 dbo:Settlement is appropriate for villages, towns, even neighborhoods Too many results!
  70. 70. H2020 – Grant Agreement no. 691025 Just count them! select count(DISTINCT ?c) as ?Num where { { ?c rdf:type dbo:Settlement . } UNION { ?x dbo:city ?c . } ?c dbo:country dbr:Cyprus . ?c dbo:populationTotal ?p . FILTER ( ?p > 1000 ) } order by ?c N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 70 The results contain also cities, towns, …, from the pseudo- state of “North Cyprus”
  71. 71. H2020 – Grant Agreement no. 691025 Exclude results that are (directly) part of select count(DISTINCT ?c) as ?Num where { { ?c rdf:type dbo:Settlement . } UNION { ?x dbo:city ?c . } ?c dbo:country dbr:Cyprus . FILTER NOT EXISTS { ?c dbo:isPartOf dbr:Northern_Cyprus . } ?c dbo:populationTotal ?p . FILTER ( ?p > 1000 ) } N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 71 The results still contain places that are indirectly related to the pseudo-state of “Northern Cyprus” (e.g. dbr:Yeni_Jami,_Nicosia)
  72. 72. H2020 – Grant Agreement no. 691025 Exclude results that are directly or indirectly part of select count(DISTINCT ?c) as ?Num where { { ?c rdf:type dbo:Settlement . } UNION { ?x dbo:city ?c . } ?c dbo:country dbr:Cyprus . FILTER NOT EXISTS { ?c dbo:isPartOf+ dbr:Northern_Cyprus . } ?c dbo:populationTotal ?p . FILTER ( ?p > 1000 ) } N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 72 The + sign indicates a path of 1 or more dbo:isPartOf edges in the RDF graph dbo:isPartOf dbo:isPartOf/dbo:isPartOf dbo:isPartOf/dbo:isPartOf/dbo:isPartOf …
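The `+` path operator can be read as graph reachability over dbo:isPartOf edges. The following is a minimal Python sketch of that reading, assuming a toy edge table; the edges and the helper names `part_of_plus` and `excluded` are illustrative assumptions, not DBpedia data or part of the original slides.

```python
def part_of_plus(edges, start):
    """Return every node reachable from `start` via one or more isPartOf edges."""
    seen = set()
    stack = list(edges.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, ()))
    return seen


# Toy graph: these edges are assumptions made up for illustration.
IS_PART_OF = {
    "Yeni_Jami,_Nicosia": ["North_Nicosia"],
    "North_Nicosia": ["Northern_Cyprus"],
    "Limassol": ["Limassol_District"],
}


def excluded(place):
    # Mirrors FILTER NOT EXISTS { ?c dbo:isPartOf+ dbr:Northern_Cyprus . }
    return "Northern_Cyprus" in part_of_plus(IS_PART_OF, place)
```

With a single dbo:isPartOf step only direct parents would be checked; the transitive version also catches places that are two or more edges away.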
  73. 73. H2020 – Grant Agreement no. 691025 • dbr:Nicosia expands to full URI: http://dbpedia.org/resource/Nicosia ‒ Unique ID that represents the resource in the Web of Data • However, if you type the above URI at a browser, you will be re-directed at: http://dbpedia.org/page/Nicosia ‒ Manifestation (or visualization) of the resource in the Web of Documents • If you type at the browser: http://dbpedia.org/data/Nicosia ‒ The browser will retrieve an RDF/XML file that contains all information for Nicosia ‒ All triples with http://dbpedia.org/resource/Nicosia as a subject N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 73 BTW: LOD principles in action
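The relationship between the three URIs can be sketched in a few lines of Python. This is only a string-level illustration of the naming pattern described above; the function names `expand` and `views` are hypothetical, and no HTTP request or redirect is performed.

```python
DBR = "http://dbpedia.org/resource/"


def expand(curie):
    """Expand a prefixed name such as 'dbr:Nicosia' to its full URI."""
    prefix, local = curie.split(":", 1)
    if prefix != "dbr":
        raise ValueError("only the dbr: prefix is handled in this sketch")
    return DBR + local


def views(resource_uri):
    """Derive the document URLs that describe a DBpedia resource URI."""
    local = resource_uri[len(DBR):]
    return {
        "page": "http://dbpedia.org/page/" + local,  # Web of Documents view
        "data": "http://dbpedia.org/data/" + local,  # RDF description
    }
```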
  74. 74. H2020 – Grant Agreement no. 691025 Ontologies, Logic & Rules Broadening reasoning capabilities 74N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017
  75. 75. H2020 – Grant Agreement no. 691025 • Can ontologies express every piece of knowledge needed in the Semantic Web? • They can express static information (e.g. knowledge about the world) • They have some reasoning capabilities, but… • They cannot combine information from several different individuals in order to come to a complex conclusion about the world • They cannot reason about which actions should be performed in each situation Ontology shortcomings N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 75
  76. 76. H2020 – Grant Agreement no. 691025 • Consists of logical implications (rules): A1, ..., An → B ‒ Ai and B are atomic formulas • There are 2 ways of reading such a rule: ‒ Deductive rules: If A1, ..., An are known to be true, then B is also true ‒ Reactive rules: If the conditions A1, ..., An are true, then carry out the action B • Horn logic is tractable and is supported by efficient reasoning tools Horn Logic: a(nother) Predicate Logic subset N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 76
  77. 77. H2020 – Grant Agreement no. 691025 • Neither of them is a subset of the other ‒ Both are needed in the Semantic Web • Horn logic example: ‒ Persons who study and live in the same city are “home students” studies(X,U), lives(X,A), loc(U,C), loc(A,C) → homeStudent(X) ‒ It is impossible to state that in OWL • OWL example: ‒ A person is either a man or a woman ‒ Easily expressed in OWL using disjoint union ‒ It is impossible to state that in Horn logic Horn Logic vs. Description Logics (aka ontologies) N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 77
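The “home students” rule joins facts about several individuals (a person, a university, an address and a city), which is exactly what the slide says OWL cannot do. A minimal Python sketch of evaluating that one rule follows; the function name and the sample facts are hypothetical, and `loc` is simplified to a dictionary on the assumption that every place lies in exactly one city.

```python
from itertools import product


def home_students(studies, lives, loc):
    """Evaluate: studies(X,U), lives(X,A), loc(U,C), loc(A,C) -> homeStudent(X).

    studies, lives: sets of (person, place) pairs.
    loc: dict mapping a place to its city (assumed functional for brevity).
    """
    result = set()
    for (x, u), (y, a) in product(studies, lives):
        same_person = x == y
        same_city = u in loc and a in loc and loc[u] == loc[a]
        if same_person and same_city:
            result.add(x)
    return result


# Hypothetical facts for illustration only.
STUDIES = {("ann", "AUTH"), ("bob", "AUTH")}
LIVES = {("ann", "flat1"), ("bob", "flat2")}
LOC = {"AUTH": "Thessaloniki", "flat1": "Thessaloniki", "flat2": "Athens"}
```

Here only `ann` studies and lives in the same city, so only she is derived as a home student.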
  78. 78. H2020 – Grant Agreement no. 691025 • Horn logic (rules) and description logics (ontologies) are orthogonal ‒ Both are subsets of first-order logic (FOL) ‒ Neither is a subset of the other Horn logic vs. Description logics [Figure: Venn diagram inside FOL showing Horn logic and description logics as overlapping sets, labelled with OWL 2 RL and SWRL] N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 78
  79. 79. H2020 – Grant Agreement no. 691025 • The simplest integration approach is the intersection of both logics • OWL 2 RL is an interesting sublanguage of OWL 2 DL ‒ Inherits open-world assumption and non-unique-name assumption ‒ These assumptions do not make a difference OWL2 RL N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 79
  80. 80. H2020 – Grant Agreement no. 691025 • OWA: We cannot conclude some statement x to be false simply because we cannot show x to be true. ‒ E.g. a patient’s clinical history does not include a particular allergy ‒ It would be incorrect to assume that the patient does not suffer from that allergy ‒ It is unknown, unless more information is given • CWA: If something cannot be proved, then it is false ‒ E.g. we are looking for a direct flight between Larnaca and Madrid in a DB application for airline reservations ‒ The flight doesn’t exist in the database ‒ Expected / correct answer: “There is no direct flight between Larnaca and Madrid.” • OWL is committed to OWA, Horn logic to CWA Open-World Assumption (OWA) vs. Closed-World Assumption (CWA) N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 80
  81. 81. H2020 – Grant Agreement no. 691025 • Statement: “Juan is a citizen of the USA.” • Question: “Is Juan a citizen of Colombia?” ‒ CWA answer: no ‒ OWA answer: I don’t know • Additional statements: ‒ “A person can only be citizen of one country” ‒ “Juan is a citizen of Colombia.” • CWA: error (we assume that Colombia and USA are different things) • OWA: “USA and Colombia must be the same thing” OWA vs. CWA Examples N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 81
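The difference between the two assumptions shows up only when a statement is absent from the knowledge base. A minimal Python sketch of the Juan example, assuming facts are stored as plain tuples; the `ask` function and the tuple encoding are hypothetical simplifications, not how an OWL or Datalog reasoner is implemented.

```python
def ask(facts, statement, closed_world):
    """Answer a yes/no question under the CWA or the OWA."""
    if statement in facts:
        return "yes"
    # A missing fact is false under the CWA and merely unknown under the OWA.
    return "no" if closed_world else "unknown"


FACTS = {("citizenOf", "Juan", "USA")}
QUESTION = ("citizenOf", "Juan", "Colombia")
```

Calling `ask(FACTS, QUESTION, closed_world=True)` gives the database-style answer, while `closed_world=False` gives the ontology-style answer.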
  82. 82. H2020 – Grant Agreement no. 691025 • When two individuals are known by different names, they are in fact different individuals. ‒ Sometimes works well and sometimes not ‒ Example in favor: when two products in a catalog are known by different codes, they are different ‒ Example against: two people in a social environment initially known with different identifiers (e.g., “Prof. van Harmelen” and “Frank”) are sometimes the same person • CWA systems (Horn logic) have UNA • OWA systems (OWL) do not have UNA ‒ However, one could manually add the UNA ‒ Using owl:AllDifferent or owl:differentFrom Unique-Name Assumption (UNA) N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 82
  83. 83. H2020 – Grant Agreement no. 691025 • OWL 2 RL is the largest fragment of OWL on which the choice for CWA/OWA and UNA does not matter ‒ Weak enough so that differences between choices don’t show up. ‒ Still large enough to enable useful representation and reasoning tasks. • Constructs of OWL that can be expressed using Horn logic rules ‒ Subclass, sub-property, class and property equivalence ‒ Equality-inequality between individuals ‒ Inverse, transitive, symmetric and functional properties ‒ Intersection of classes • Excluded constructors ‒ Union, existential quantification, and arbitrary cardinality constraints OWL 2 RL N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 83
  84. 84. H2020 – Grant Agreement no. 691025 • A triple s p o is expressed as a fact p(s, o) • An instance declaration rdf:type(a,C) ‒ a is an instance of class C ‒ expressed as C(a) • C is a subclass of D: C(X) → D(X) • P is a sub-property of Q: P(X,Y) → Q(X,Y) • Domain and Range Restrictions ‒ D is the domain of property P: P(X, Y) → D(X) ‒ R is the range of property P: P(X, Y) → R(Y) RDF constructs N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 84
  85. 85. H2020 – Grant Agreement no. 691025 • equivalentClass(C,D): C(X) → D(X), D(X) → C(X) • equivalentProperty(P,Q): P(X,Y) → Q(X,Y), Q(X,Y) → P(X,Y) • Transitive Properties: P(X,Y), P(Y,Z) → P(X,Z) • allValuesFrom(P,D): C(X), P(X,Y) → D(Y) ‒ Necessary restriction (e.g. all undergraduate students must attend only undergraduate courses) ‒ UGStudent(X), attends(X,Y) → UGCourse(Y) • someValuesFrom(P,D): P(X,Y), D(Y) → C(X) ‒ Sufficient restriction / Instance classification rule (e.g. if someone attends a postgraduate course, then he/she is a postgraduate student) ‒ attends(X,Y), PGCourse(Y) → PGStudent(X) OWL Constructs N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 85
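The two restriction rules above can be run as ordinary forward chaining: apply every rule to the known facts and repeat until nothing new is derived. The sketch below hard-codes just the UGStudent and PGStudent rules from this slide; the function name and the sample individuals are hypothetical.

```python
def infer(types, attends):
    """Apply the two slide rules until a fixpoint is reached.

    types: set of (individual, class) facts.
    attends: set of (student, course) facts.
    """
    types = set(types)
    changed = True
    while changed:
        new = set()
        for x, y in attends:
            # allValuesFrom: UGStudent(X), attends(X,Y) -> UGCourse(Y)
            if (x, "UGStudent") in types:
                new.add((y, "UGCourse"))
            # someValuesFrom: attends(X,Y), PGCourse(Y) -> PGStudent(X)
            if (y, "PGCourse") in types:
                new.add((x, "PGStudent"))
        changed = not new <= types
        types |= new
    return types


# Hypothetical facts for illustration only.
TYPES = {("ann", "UGStudent"), ("sem1", "PGCourse")}
ATTENDS = {("ann", "c1"), ("bob", "sem1")}
```

The first rule classifies the course from the student, the second classifies the student from the course, which is the necessary versus sufficient distinction made on the slide.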
  86. 86. H2020 – Grant Agreement no. 691025 • A proposed Semantic Web language combining OWL 2 DL with Horn logic ‒ Syntax: Datalog RuleML • Allows the definition of Horn-logic rules on top of OWL 2 DL ontologies ‒ Rule conclusions are “stored” back in the ontology • SWRL unites the expressivities of DL and Horn-logic ‒ By contrast, OWL 2 RL combines the advantages of both languages in their common sublanguage • SWRL is undecidable in general ‒ DL-safe rules: decidable subset of SWRL → Every variable must appear in a non-DL atom in the rule body Semantic Web Rule Language (SWRL) N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 86
  87. 87. H2020 – Grant Agreement no. 691025 Man(?m) → Person(?m) ‒ Possible in OWL - subclassOf relation ‒ Some rules are OWL syntactic sugar Person(?m) ∧ hasSex(?m,male) → Man(?m) ‒ Possible in OWL – hasValue (sufficient) restriction ‒ Not all such reclassifications are possible in OWL Person(?m) ∧ hasSpouse(?m,?w) ∧ works_at(?w,?j) ∧ publicOrg(?j) → MarriedToPublicServantPerson(?m) ‒ Not possible in OWL Example SWRL Rules: Reclassification N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 87
  88. 88. H2020 – Grant Agreement no. 691025 hasParent(?x, ?y) ∧ hasBrother(?y, ?z) → hasUncle(?x, ?z) ‒ Property chaining ‒ Possible in OWL 2 - Not possible in OWL 1 Person(?p) ∧ hasSibling(?p,?s) ∧ Man(?s) → hasBrother(?p,?s) ‒ Not possible in OWL Publication(?p) ∧ hasAuthor(?p,?y) ∧ hasAuthor(?p,?z) ∧ differentFrom(?y,?z) → cooperatedWith(?y, ?z) ‒ SWRL does not adopt the UNA ‒ Individuals must also be explicitly stated to be different (e.g. using an owl:AllDifferent axiom) in the OWL ontology Example SWRL Rules: Property Value Assignment N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 88
  89. 89. H2020 – Grant Agreement no. 691025 • Built-ins dramatically increase expressivity ‒ Most rules are not expressible in OWL 1 ‒ Some built-ins can be expressed in OWL 2 Person(?p) ∧ hasAge(?p,?age) ∧ swrlb:greaterThan(?age,17) → Adult(?p) Person(?p) ∧ hasNumber(?p, ?number) ∧ swrlb:startsWith(?number, "+") → hasInternationalNumber(?p, true) Person(?p) ∧ hasSalaryInPounds(?p, ?gbp) ∧ swrlb:multiply(?dollars, ?gbp, 2) → hasSalaryInDollars(?p, ?dollars) Example SWRL Rules: Built-ins N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 89
  90. 90. H2020 – Grant Agreement no. 691025 Person(?p) ∧ not hasCar(?p, ?c) → CarlessPerson(?p) • Not possible – rule language does not support negation • Potential invalidation - what if a person later gets a car? N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 90 SWRL is Monotonic: does not Support Negation
  91. 91. H2020 – Grant Agreement no. 691025 • De-facto industry standard to represent SPARQL rules and constraints on Semantic Web models. • Provides meta-modeling capabilities that allow users to define their own SPARQL functions and query templates. • Includes a ready to use library of common functions. • SPIN follows the CWA ‒ A special kind of negation exists (negation-as-failure) N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 91 SPIN (SPARQL Inferencing Notation)
  92. 92. H2020 – Grant Agreement no. 691025 • Rules can be expressed in SPARQL using the CONSTRUCT feature parent(X,Y), parent(Y,Z) → grandParent(X,Z) CONSTRUCT { ?X grandParent ?Z . } WHERE { ?X rdf:type Person . ?X parent ?Y . ?Y parent ?Z . } Rules in SPARQL: SPIN optional N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 92
  93. 93. H2020 – Grant Agreement no. 691025 • Rules can be associated to classes ‒ Rules can represent behavior of the instances of that class (an OOP feature!) ‒ Even constructors can be defined! ‒ Global rules can also be defined • SPIN variants ‒ Inference / Entailment Rules: CONSTRUCT, INSERT ‒ Production Rules: DELETE ‒ Integrity constraints: ASK SPIN N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 93
  94. 94. H2020 – Grant Agreement no. 691025 • The grandParent rule is stored at the Person class ‒ It will be executed only for instances of this class (and subclasses) ‒ Increases accuracy and efficiency CONSTRUCT { ?this grandParent ?Z . } WHERE { ?this parent ?Y . ?Y parent ?Z . } Rules in SPARQL: SPIN ?this means an instance of the class, where the rule is stored N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 94
  95. 95. H2020 – Grant Agreement no. 691025 • Constraints are expressed via the ASK SPARQL construct • For each instance for which the ASK query is true, the constraint is violated ‒ A constraint violation warning is issued ASK WHERE { ?this hasAge ?age . FILTER (?age < 18) . } SPIN Constraints The SPIN constraint is stored at class Student N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 95
  96. 96. H2020 – Grant Agreement no. 691025 • A more radical solution to constraint violation would be to delete the instance that violates the constraint ‒ This can be done via the DELETE construct ‒ The SPIN rule should be declared as a constructor ‒ Constructor rules run each time a new instance is created DELETE { ?this rdf:type Student . } WHERE { ?this hasAge ?age . FILTER (?age < 18) . } SPIN Constructors The SPIN constructor is stored at class Student N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 96
  97. 97. H2020 – Grant Agreement no. 691025 • Unlike SWRL, in SPIN there is a special negation (negation-as-failure) ‒ When something is not found to be true, it is false, thus its negation is true Person(?p) ∧ not hasCar(?p, ?c) → CarlessPerson(?p) CONSTRUCT { ?this rdf:type CarlessPerson . } WHERE { ?this rdf:type Person . FILTER NOT EXISTS { ?this hasCar ?x . } } Using negation in SPIN If we store the SPIN rule at class Person we do not need this line N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 97
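Negation-as-failure is what makes this rule non-monotonic: a conclusion can disappear when a fact is added. A minimal Python sketch of the CarlessPerson rule over plain sets; the function name and the sample data are hypothetical.

```python
def carless_persons(persons, has_car):
    """Person(p) and no known hasCar(p, c) -> CarlessPerson(p)."""
    owners = {owner for owner, _car in has_car}
    # Negation-as-failure: a person not found among the owners counts as carless.
    return {p for p in persons if p not in owners}


# Hypothetical facts for illustration only.
PERSONS = {"ann", "bob"}
HAS_CAR = {("ann", "car1")}
```

Re-running the rule after adding `("bob", "car2")` to `HAS_CAR` no longer classifies `bob` as carless, which is the invalidation that a monotonic language such as SWRL is designed to rule out.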
  98. 98. H2020 – Grant Agreement no. 691025 C1(X), equivalentClass(C1,C2) → C2(X) CONSTRUCT { ?X a ?C2 . } WHERE { ?X a ?C1. ?C1 equivalentClass ?C2. } OWL 2 RL rules in SPIN Actually, in some Semantic Web systems, the OWL 2 RL semantics are implemented via SPIN rules (TopBraid Composer) N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 98
  99. 99. H2020 – Grant Agreement no. 691025 A Semantic Entity Linking Use Case The URank system 99 N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 [Figure: URank architecture. Components: Ranking sites, Entity Extractor (Extraction Rules, Site-specific transformations), Extracted Data, Entity Linker (Domain-specific filtering), Ranking datasets, Ranking ontology, Entity Merger, Merged dataset]
  100. 100. H2020 – Grant Agreement no. 691025 • University rankings ‒ Means of advertisement ‒ There are so many of them! (>20 global rankings) ‒ Do we need all of them? Are they similar? Are they robust? ‒ Comparative Statistical Analysis is needed • Collecting the data from multiple web sites ‒ Web data extraction ‒ Each ranking site produces a ranking table ‒ A single table needs to be constructed to feed the Statisticians ‒ The ranking tables need to be merged into a single all-rankings table N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 100 Motivating example
  101. 101. H2020 – Grant Agreement no. 691025 The easy case… QS table (Name | Rank): Aristotle University of Thessaloniki | 491-500. THE table (Name | Rank): Aristotle University of Thessaloniki | 401-500. Merged table (Name | QS | THE | …): Aristotle University of Thessaloniki | 491-500 | 401-500 | … N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 101
  102. 102. H2020 – Grant Agreement no. 691025 The difficult case… ARWU table (Name | Rank): The Imperial College of Science, Technology and Medicine | 22. THE table (Name | Rank): Imperial College London | 8. Merged table (Name | ARWU | THE | …): The Imperial College of Science, Technology and Medicine | 22 | - | …; Imperial College London | - | 8 | … String comparison of the two names: Levenshtein Distance = 38, Substring similarity* = 0.605 *A string metric for ontology alignment by Giorgos Stoilos, 2005. N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 102
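The Levenshtein distance quoted above counts the single-character insertions, deletions and substitutions needed to turn one name into the other. A minimal Python sketch of the standard dynamic-programming formulation follows; it is a generic textbook version, not the code used in URank, and the exact figure it yields for the two Imperial College names depends on details such as casing and the leading article.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca by cb (or keep)
            ))
        previous = current
    return previous[-1]
```

Because every extra word in the longer official name costs one edit per character, edit distance penalizes exactly the kind of variation that university names exhibit, which motivates the substring-based metric.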
  103. 103. H2020 – Grant Agreement no. 691025 • Use string matching and semantic search to find a unique ID for each university in each ranking table • Where to search? DBpedia ‒ DBpedia / Wikipedia have entries for almost all World Universities ‒ Each DBpedia entity has an ID • Hopefully different variations of the University name will retrieve the same University entity in DBpedia • Then merging is straightforward N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 103 Merging tables: Solution
  104. 104. H2020 – Grant Agreement no. 691025 N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 104 URank[1] System A Prolog application that: 1. Extracts data from ranking sites 2. Links Universities to DBpedia entities through semantic search and string similarity 3. Generates merged table [Figure: URank architecture. Components: Ranking sites, Entity Extractor (Extraction Rules, Site-specific transformations), Extracted Data, Entity Linker (Domain-specific filtering), Ranking datasets, Ranking ontology, Entity Merger, Merged dataset] [1] N. Bassiliades: “Collecting University Rankings for Comparison Using Web Extraction and Entity Linking Techniques”, Springer CCIS Vol.469, pp.23-46, 2014.
  105. 105. H2020 – Grant Agreement no. 691025 • Entity linking: the task of determining the identity of entities mentioned in text • Different from Named Entity Recognition, which identifies the occurrence or mention of a named entity in text but it does not identify which specific entity it is. • Entity linking requires a knowledge base containing the entities to which entity mentions can be linked. • In URank: ‒ Named Entity Recognition is not needed - All University names are entities ‒ Entity Linking uses DBpedia as a knowledge base N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 105 Entity Linking
  106. 106. H2020 – Grant Agreement no. 691025 • DBpedia University entities do not always belong to the correct class ‒ dbo:University or dbo:EducationalInstitution ‒ Need to relax that criterion carefully • University mergers / splits ‒ University of Paris split in 1970 into 13 Universities with very similar names: University of Paris I, II, … ‒ University of Montpellier split into three universities (I, II, III) in 1970; I and II merged back in 2015 ‒ Need to check if the University is currently operating • Newcastle University, UK vs University of Newcastle, Australia ‒ Need to check the country N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 106 Entity Linking challenges
  107. 107. H2020 – Grant Agreement no. 691025 • US vs USA ‒ Need to have a common representation for countries • Imperial College London vs Imperial College of Science, Technology and Medicine ‒ Need to have access to alternative University names • University of Montpellier II vs University of Montpellier 2 ‒ Need to convert between Arabic and Roman numerals • Universität vs University ‒ Need to translate between different languages N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 107 Entity Linking challenges
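The numeral mismatch can be handled by normalizing names before comparison. The sketch below rewrites a trailing Roman numeral as Arabic digits; it is an illustrative assumption about how such a normalization could look (URank itself is a Prolog application), it only handles numerals built from I, V and X, and it would also rewrite a name that legitimately ends in a standalone “I”, “V” or “X”.

```python
import re

ROMAN = {"I": 1, "V": 5, "X": 10}


def roman_to_int(token):
    """Convert a Roman numeral made of I, V and X to an integer."""
    total = 0
    for ch, nxt in zip(token, token[1:] + " "):
        value = ROMAN[ch]
        # Subtractive notation: a smaller symbol before a larger one is negative.
        if nxt in ROMAN and ROMAN[nxt] > value:
            total -= value
        else:
            total += value
    return total


def normalize_name(name):
    """Replace a trailing Roman numeral with the equivalent Arabic digits."""
    match = re.search(r"\b([IVX]+)$", name)
    if not match:
        return name
    return name[:match.start(1)] + str(roman_to_int(match.group(1)))
```

After normalization, “University of Montpellier II” and “University of Montpellier 2” become identical strings, so any similarity metric will match them.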
  108. 108. H2020 – Grant Agreement no. 691025 • DBpedia Spotlight: annotate mentions of DBpedia resources in NL text ‒ http://demo.dbpedia-spotlight.org/ ‒ Have tried it for the University Ranking use case with ~86% F-measure • Silk: integrate heterogeneous data sources ‒ http://silkframework.org/ ‒ Generates links between related data items within different Linked Data sources. ‒ Linked Data publishers can use Silk to set RDF links from their data sources to other data sources on the Web. ‒ Experimentation for the use case has even lower F-measure • Domain-specific knowledge must be used to face all challenges N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 108 General-purpose Entity Linking Tools
  109. 109. H2020 – Grant Agreement no. 691025 • Each University entity at each ranking site is matched against a DBpedia entry using 3 alternative methods ‒ DBpedia lookup service ‒ DBpedia SPARQL endpoint, using approximate string matching filtering functions ‒ Keyword search engine of Wikipedia • At each step, if a satisfactory match is found (substring matching), the algorithm terminates • Otherwise, all matching entries are collected and scored ‒ Top scored candidate is returned as a match N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 109 Entity Linking in URank
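The cascade described above can be sketched as a loop over lookup methods with an early exit. In this Python sketch each method is a function returning `(uri, label)` candidates; `difflib.SequenceMatcher` is used purely as a stand-in similarity and is not the substring metric URank uses, and the function names are hypothetical. The default threshold of 0.97 is the value mentioned in the talk for a satisfactory match.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in measure (an assumption): NOT the metric used in URank.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_entity(name, methods, threshold=0.97):
    """Try each lookup method in order and stop at the first satisfactory match.

    methods: list of functions mapping a name to a list of (uri, label) pairs.
    Returns a URI, or None if no method produced any candidate.
    """
    candidates = []
    for method in methods:
        for uri, label in method(name):
            score = similarity(name, label)
            if score >= threshold:
                return uri  # satisfactory match: terminate early
            candidates.append((score, uri))
    # No satisfactory match: fall back to the top-scored candidate.
    return max(candidates)[1] if candidates else None
```

Ordering the methods from cheapest and most precise to broadest means the more expensive searches run only for the names that are hard to match.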
  110. 110. H2020 – Grant Agreement no. 691025 • String distance is measured using a metric for ontology alignment [1] ‒ Concerns substring matching ‒ More appropriate for matching names of Universities, than e.g. Levenshtein ‒ E.g. “Imperial College” vs. “Imperial College of Science, Technology and Medicine” ‒ Materialized as a built-in predicate in SWI-Prolog (isub/4) • Satisfactory match is above a high similarity threshold ‒  0,97 depending on the algorithm step [1] Stoilos, G., Stamou, G., Kollias, S.: A String Metric for Ontology Alignment. ISWC 2005, LNCS, vol. 3729, pp. 624-637. Springer (2005) N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 110 String matching
  111. 111. H2020 – Grant Agreement no. 691025 http://lookup.dbpedia.org/api/search.asmx/KeywordSearch?QueryClass=University&QueryString=Imperial%20College%20London • RESTful API parameters: ‒ QueryString: a string for which a DBpedia URI should be found (University name) ‒ QueryClass: a DBpedia class from the Ontology that the results should have (University) ‒ MaxHits: the maximum number of returned results (default: 5) • Results in XML should be parsed ‒ using native SWI-Prolog XPath built-ins 111 DBpedia lookup service N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017
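Building the request URL from the three parameters is a small exercise in URL encoding. The Python sketch below only constructs the string shown on the slide and performs no network access; the endpoint is the one quoted in the talk and may have changed since, and the function name `lookup_url` is hypothetical.

```python
from urllib.parse import quote, urlencode

# Endpoint as quoted in the talk; its current availability is not assumed.
BASE = "http://lookup.dbpedia.org/api/search.asmx/KeywordSearch"


def lookup_url(name, query_class="University", max_hits=None):
    """Build a KeywordSearch URL for a university name."""
    params = {"QueryClass": query_class, "QueryString": name}
    if max_hits is not None:
        params["MaxHits"] = max_hits
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return BASE + "?" + urlencode(params, quote_via=quote)
```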
  112. 112. H2020 – Grant Agreement no. 691025 • If DBpedia lookup does not return a satisfactory match then the DBpedia SPARQL endpoint is used (provided by OpenLink Virtuoso RDF DB engine) SELECT ?univ, ?name WHERE { ?univ rdf:type Class . ?univ ?property ?val . ?val bif:contains University.Name.Words option (score ?score) . ?univ rdfs:label ?name . FILTER (lang(?name) = "en") } ORDER BY DESC (?score*0.3+sql:rnk_scale(<LONG::IRI_RANK> (?univ))) LIMIT Top-N2 N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 112 DBpedia SPARQL endpoint (http://dbpedia.org/sparql)
  113. 113. H2020 – Grant Agreement no. 691025 N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 113 SPARQL query example @ DBpedia Results are actually returned in RDF, not HTML
  114. 114. H2020 – Grant Agreement no. 691025 • If neither of the native “DBpedia methods” returns a satisfactory match, the Wikipedia search engine is used http://en.wikipedia.org/w/index.php?search=Univ.name&limit=Top-N3&go=Go • Results are verified using equivalent DBpedia entity ‒ Wikipedia URLs are uniquely transformed to DBpedia URIs ‒ https://en.wikipedia.org/wiki/Imperial_College_London → http://dbpedia.org/resource/Imperial_College_London N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 114 Keyword search engine of Wikipedia
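The transformation from a Wikipedia article URL to a DBpedia URI keeps the article title and swaps the prefix. A minimal Python sketch, restricted to English Wikipedia article URLs as in the slide example; the function name is hypothetical.

```python
from urllib.parse import urlparse


def wikipedia_to_dbpedia(url):
    """Map an English Wikipedia article URL to the corresponding DBpedia URI."""
    parsed = urlparse(url)
    if parsed.netloc != "en.wikipedia.org" or not parsed.path.startswith("/wiki/"):
        raise ValueError("not an English Wikipedia article URL: " + url)
    title = parsed.path[len("/wiki/"):]
    return "http://dbpedia.org/resource/" + title
```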
  115. 115. H2020 – Grant Agreement no. 691025 • Spatiotemporal constraints are used for filtering ‒ Not easy, due to incomplete and inaccurate information from DBpedia/Wikipedia • Retrieved university must be located in the same country as the extracted university ‒ Using dbo:country property (not present in all entities) ‒ Using alternative properties dbo:state, dbo:city, dbo:location, and try to locate country (geographical areas containment) ‒ Using redirection to another entity (owl:sameAs, dbo:wikiPageRedirects) • Countries are not always represented using the same name ‒ E.g. US, U.S., USA, U.S.A., United States of America ‒ E.g. UK, United Kingdom, United Kingdom of Great Britain and Northern Ireland ‒ Synonym matrix for most problematic cases N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 115 Spatiotemporal constraints
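The synonym matrix can be approximated by a lookup table applied after light normalization. The Python sketch below covers only the variants listed on the slide; the table layout, the chosen canonical names and the function name are assumptions for illustration.

```python
# Variants taken from the slide; the canonical names are an arbitrary choice.
COUNTRY_SYNONYMS = {
    "us": "United States",
    "usa": "United States",
    "united states": "United States",
    "united states of america": "United States",
    "uk": "United Kingdom",
    "united kingdom": "United Kingdom",
    "united kingdom of great britain and northern ireland": "United Kingdom",
}


def canonical_country(name):
    """Map a country name variant to one canonical form."""
    # Dropping dots makes 'U.S.' and 'U.S.A.' collapse to 'us' and 'usa'.
    key = name.replace(".", "").strip().lower()
    return COUNTRY_SYNONYMS.get(key, name.strip())


def same_country(a, b):
    return canonical_country(a) == canonical_country(b)
```

Names that are not in the table are returned unchanged, so the check degrades to plain string equality for countries without known variants.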
  116. 116. H2020 – Grant Agreement no. 691025 • select ?x where { <DBPediaURL> dbo:country ?x . } • select ?x where { <DBPediaURL> dbo:state ?x . } • select ?x where { <DBPediaURL> dbo:city ?x . } • select ?x where { <DBPediaURL> dbo:location ?x . } • … N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 116 Geographical area containment
  117. 117. H2020 – Grant Agreement no. 691025 • Retrieved university must be still operating ‒ E.g. University of Paris split, University of Montpellier split and merge, etc. • Checking if the property dbp:closed exists ‒ Not all “closed” Universities have this property ‒ In this case heuristic scraping from Wikipedia entries is performed N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 117 Spatiotemporal constraints
  118. 118. H2020 – Grant Agreement no. 691025 Conclusions 118N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017
  119. 119. H2020 – Grant Agreement no. 691025 • Semantic technologies and techniques – not necessarily using Semantic Web standards – have been adopted for building and using knowledge graphs ‒ Google, Facebook, Yandex, Baidu, Bing have embedded Semantic technologies into their core businesses ‒ The percentage of Web content that uses microdata/RDFa/schema.org information to enhance search results reached double digits ‒ IBM’s Watson (AlchemyLanguage) returns concept information as DBpedia/yago/freemix URIs, including a Linked Data API • Semantic technologies are rapidly finding their way into consumer products ‒ The average developer barely realizes it, but semantics are now just about everywhere N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 119 The Future of Semantic Technologies
  120. 120. H2020 – Grant Agreement no. 691025 • Semantic Web is a branch of AI • The market interest and justification is in solving problems ‒ AI has problem solving at its core • Semantic technologies will become an essential enabler for the development of true knowledge-based AI ‒ What things mean and how people understand what they mean • AI applications currently focus on Machine Learning (ML) and NLP ‒ Knowledge graphs can help enhance the accuracy of ML/NLP ‒ NLP: top-down processing (use ontology terms to disambiguate text) ‒ ML: bottom-up processing (learn from data and match to generalized concepts) 120 Semantic Web and Artificial Intelligence N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017
  121. 121. H2020 – Grant Agreement no. 691025 N. Bassiliades, ENCASE Summer School, Limassol, 17th July 2017 121 Enhancing seCurity and privAcy in the Social wEb: a user-centered approach for the protection of minors
