LINKING OPEN, BIG DATA USING
SEMANTIC WEB TECHNOLOGIES
          AN INTRODUCTION


       Dr. Ronald Ashri - Istos srl
           http://www.istos.it
THE GOAL
connect and explore data
 to discover hidden patterns
 and create new information

new information enables us to
  formulate better solutions
and identify new oportunities
the billion
dollar-o-gram
TOOLS

statistical analysis

network analysis

simulation
(evolutionary,
biologically inspired
systems)

symbolic manipulation
THE SEMANTIC WAY


Build Models

Calculate with knowledge

Exchange Information

Visualize - Analyze
BUILDING MODELS
AND CALCULATING
BUILDING MODELS



what things can be said to exist

what is reality?

bring structure into modeling
ontology is a description of knowledge about
             a domain of interest


          ὸντος = that is how it is
arbor porhyriana
 Substance
              immaterial
  material


   Body                 Spirit

  animate       inanimate


  Living               Mineral

  sensitive    insensitive


  Animal                Plant

  rational      irrational


  Human                    Beast



                                   234AD, Tyre (Lebanon)
knowledge on the web is modeled using
               RDF, RDFs
and/or the Web Ontology Language - OWL
RDF

Resource Description Framework

  directed labelled graph


                 capitalOf
  Cagliari                       Sardegna

 subject        predicate        object
URI


Uniform Resource Identifiers

  a compact sequence of characters to identify an
  abstract or physical resource

  scheme:[//authority]path[?query][#fragment]

    e.g. http://www.regione.sardegna.it/uri
RDF + URI



                        http://example.org/capitalOf

            Cagliari                                   Sardegna
http://www.comune.cagliari.it/uri             http://www.regione.sardegna.it/uri
RDF + URI

                                   eg:livesIn
            Ronald                                            Sicily

http://www.istos.it/ronald#me                      http://dbpedia/resource/Sicily

                                 eg:worksFor
           Ronald                                             Istos

http://www.istos.it/ronald#me                         http://www.istos.it/uri

                                foaf:based_near
            Istos                                            Ispica
     http://www.istos.it/uri                      http://dbpedia/resource/Ispica
RDF SCHEMA


RDF is a general way to describe structured
information

RDF Schema extends RDF to express general
information about a data set

  Resources, Classes, Literals, Datatypes, Properties

  range, domain, subClassOf, subPropertyOf
RDFS SERIALIZATIONS


N3, N-Triples, Turtle

  Human readable, limited software support

RDF XML

  takes advantage of tools to parse XML

RDFa - enables RDF to be embedded in HTML
OWL


Offers more expresivity

  Classes (e.g. City, Region, Country)

  Roles (e.g. containedWithin)

  Individuals (e.g. Cagliari, Sardegna)
CALCULATING

Cagliari is a City      deduction =>
                        Cagliari is
Cagliari is             containedWithin Italy
containedWithin
Sardegna

Sardegna is a Region

Sardegna is
containedWithin Italy
Class(a:giraffe partial a:animal
 restriction(a:eats allValuesFrom (a:leaf)))

Class(a:leaf partial restriction(a:part_of someValuesFrom (a:tree)))
Class(a:tree partial a:plant)

DisjointClasses(unionOf(restriction(a:part_of someValuesFrom (a:animal)) a:animal)
                 unionOf(a:plant restriction(a:part_of someValuesFrom (a:plant))))

Class(a:vegetarian complete intersectionOf(
  restriction(a:eats allValuesFrom (complementOf(restriction(a:part_of
someValuesFrom (a:animal)))))
  restriction(a:eats allValuesFrom (complementOf(a:animal))) a:animal))

 • Giraffes only eat leaves
 • Leaves are parts of trees, which are plants
 • Plants and parts of plants are disjoint from animals and parts of
   animals
 • Vegetarians only eat things which are not animals or parts of
   animals
Figure 2: Agent-Based Market Model
(a) Dependency: α sells           (b) Comp-Sell: α and β             (c) Comp-Buy: The goal      (d) Coll: α sells to β
  goods to B                        are competing in α’s RoI           of α is the same that the   and β sells to α
                                                                       goal of β


                                                   Figure 3: Key Relationship Patterns


on the buyer. We specify a dependency relationships in terms of
                                        y     y
goals in the following way: Dep(q(gα ), a(gβ ))RoIβ ⊆V Eα , where
y is the product β is selling to α (i.e. α wants to achieve the goal
of having y), and β’s region of influence is withing α’s viewable
environment as for trade relationships.

4.3    Competition
OWL - XML-based syntax

  suitable for machines and use in web documents

OWL - abstract syntax

  easier to read and write

  closer to description logics
balance between expressivity and efficient reasoning

complex language constructs for respresenting
implicit knowledge yield high computational
complexities or even undecidability
OWL Lite

 decidable

 less expressive

 ExpTime
OWL DL

 contains OWL Lite

 decidable

 supported by software tools

 NExpTime
OWL Full

 very expressive

 undecidable
THE PROCESS

Build an ontology or
vocabulary

State facts about
domain

Reason about
ontologies and facts
EXCHANGING
INFORMATION
WWW 0.1


the original web was
   thought of with
      ontological
  information at its
        heart




                       http://www.w3.org/History/1989/proposal.html
??




     ??
THE PROBLEM




http://www.w3.org/Talks/WWW94Tim/
THE SEMANTIC WEB
SPARQL

Protocol and RDF Query Language

Graph pattern matching

  Uses RDF triples but they may be variables

The reply is the RDF graph equivalent to the
subgraph described
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

  SELECT ?name ?email

WHERE {
  ?person a foaf:Person.
  ?person foaf:name ?name.
  ?person foaf:mbox ?email.
}

endpoint: world-wide web

names and e-mails of every person in the world!
EXCHANGE MORE
    INFORMATION
THE LINKED DATA EFFORT
the semantic web provided tools but
not enough method - the linked data
      effort tries to rectify this
1. Use URIs as names for things
2. Use HTTP URIs so that people can look things up
3. Provide useful info using standards (Sparql)
4. Include links to other URIs
USE URIS




Basic - if you are not using URIs it is not Semantic
Web
USE HTTP URIS



stop inventing your own schemas

HTTP works - browsers know it - let us take
advantage of it
HELP OTHERS




when people look up HTTP URIs make the data
available and/or provide Sparql support
Available on the web (whatever format), but with an open
licence
      Available as machine-readable structured data (e.g. excel
instead of image scan of a table)
        as (2) plus non-proprietary format (e.g. CSV instead of
excel)
          All the above plus, Use open standards from W3C (RDF
and SPARQL) to identify things, so that people can point at your
stuff
          All the above, plus: Link your data to other people’s
data to provide context
datasets
UMBEL - Reference concept ontology
28000 scaffold concepts
services
REASON, VISUALIZE
the fear index
LOOKING AHEAD


Technology and tools
are getting there

Data needs to remain
open

Spread the word -
Annotate!
QUESTIONS



ronald@istos.it

@ronald_istos

Linking Open, Big Data Using Semantic Web Technologies - An Introduction

  • 1.
    LINKING OPEN, BIGDATA USING SEMANTIC WEB TECHNOLOGIES AN INTRODUCTION Dr. Ronald Ashri - Istos srl http://www.istos.it
  • 2.
  • 3.
    connect and exploredata to discover hidden patterns and create new information new information enables us to formulate better solutions and identify new oportunities
  • 7.
  • 8.
  • 10.
    THE SEMANTIC WAY BuildModels Calculate with knowledge Exchange Information Visualize - Analyze
  • 11.
  • 12.
    BUILDING MODELS what thingscan be said to exist what is reality? bring structure into modeling
  • 13.
    ontology is adescription of knowledge about a domain of interest ὸντος = that is how it is
  • 14.
    arbor porhyriana Substance immaterial material Body Spirit animate inanimate Living Mineral sensitive insensitive Animal Plant rational irrational Human Beast 234AD, Tyre (Lebanon)
  • 15.
    knowledge on theweb is modeled using RDF, RDFs and/or the Web Ontology Language - OWL
  • 16.
    RDF Resource Description Framework directed labelled graph capitalOf Cagliari Sardegna subject predicate object
  • 17.
    URI Uniform Resource Identifiers a compact sequence of characters to identify an abstract or physical resource scheme:[//authority]path[?query][#fragment] e.g. http://www.regione.sardegna.it/uri
  • 18.
    RDF + URI http://example.org/capitalOf Cagliari Sardegna http://www.comune.cagliari.it/uri http://www.regione.sardegna.it/uri
  • 19.
    RDF + URI eg:livesIn Ronald Sicily http://www.istos.it/ronald#me http://dbpedia/resource/Sicily eg:worksFor Ronald Istos http://www.istos.it/ronald#me http://www.istos.it/uri foaf:based_near Istos Ispica http://www.istos.it/uri http://dbpedia/resource/Ispica
  • 20.
    RDF SCHEMA RDF isa general way to describe structured information RDF Schema extends RDF to express general information about a data set Resources, Classes, Literals, Datatypes, Properties range, domain, subClassOf, subPropertyOf
  • 21.
    RDFS SERIALIZATIONS N3, N-Triples,Turtle Human readable, limited software support RDF XML takes advantage of tools to parse XML RDFa - enables RDF to be embedded in HTML
  • 22.
    OWL Offers more expresivity Classes (e.g. City, Region, Country) Roles (e.g. containedWithin) Individuals (e.g. Cagliari, Sardegna)
  • 23.
    CALCULATING Cagliari is aCity deduction => Cagliari is Cagliari is containedWithin Italy containedWithin Sardegna Sardegna is a Region Sardegna is containedWithin Italy
  • 24.
    Class(a:giraffe partial a:animal restriction(a:eats allValuesFrom (a:leaf))) Class(a:leaf partial restriction(a:part_of someValuesFrom (a:tree))) Class(a:tree partial a:plant) DisjointClasses(unionOf(restriction(a:part_of someValuesFrom (a:animal)) a:animal) unionOf(a:plant restriction(a:part_of someValuesFrom (a:plant)))) Class(a:vegetarian complete intersectionOf( restriction(a:eats allValuesFrom (complementOf(restriction(a:part_of someValuesFrom (a:animal))))) restriction(a:eats allValuesFrom (complementOf(a:animal))) a:animal)) • Giraffes only eat leaves • Leaves are parts of trees, which are plants • Plants and parts of plants are disjoint from animals and parts of animals • Vegetarians only eat things which are not animals or parts of animals
  • 25.
  • 26.
    (a) Dependency: αsells (b) Comp-Sell: α and β (c) Comp-Buy: The goal (d) Coll: α sells to β goods to B are competing in α’s RoI of α is the same that the and β sells to α goal of β Figure 3: Key Relationship Patterns on the buyer. We specify a dependency relationships in terms of y y goals in the following way: Dep(q(gα ), a(gβ ))RoIβ ⊆V Eα , where y is the product β is selling to α (i.e. α wants to achieve the goal of having y), and β’s region of influence is withing α’s viewable environment as for trade relationships. 4.3 Competition
  • 27.
    OWL - XML-basedsyntax suitable for machines and use in web documents OWL - abstract syntax easier to read and write closer to description logics
  • 28.
    balance between expressivityand efficient reasoning complex language constructs for respresenting implicit knowledge yield high computational complexities or even undecidability
  • 29.
    OWL Lite decidable less expressive ExpTime
  • 30.
    OWL DL containsOWL Lite decidable supported by software tools NExpTime
  • 31.
    OWL Full veryexpressive undecidable
  • 32.
    THE PROCESS Build anontology or vocabulary State facts about domain Reason about ontologies and facts
  • 33.
  • 34.
    WWW 0.1 the originalweb was thought of with ontological information at its heart http://www.w3.org/History/1989/proposal.html
  • 35.
    ?? ??
  • 36.
  • 37.
  • 38.
    SPARQL Protocol and RDFQuery Language Graph pattern matching Uses RDF triples but they may be variables The reply is the RDF graph equivalent to the subgraph described
  • 39.
    PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name ?email WHERE { ?person a foaf:Person. ?person foaf:name ?name. ?person foaf:mbox ?email. } endpoint: world-wide web names and e-mails of every person in the world!
  • 40.
    EXCHANGE MORE INFORMATION THE LINKED DATA EFFORT
  • 41.
    the semantic webprovided tools but not enough method - the linked data effort tries to rectify this
  • 42.
    1. Use URIsas names for things 2. Use HTTP URIs so that people can look things up 3. Provide useful info using standards (Sparql) 4. Include links to other URIs
  • 43.
    USE URIS Basic -if you are not using URIs it is not Semantic Web
  • 44.
    USE HTTP URIS stopinventing your own schemas HTTP works - browsers know it - let us take advantage of it
  • 45.
    HELP OTHERS when peoplelook up HTTP URIs make the data available and/or provide Sparql support
  • 46.
    Available on theweb (whatever format), but with an open licence Available as machine-readable structured data (e.g. excel instead of image scan of a table) as (2) plus non-proprietary format (e.g. CSV instead of excel) All the above plus, Use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff All the above, plus: Link your data to other people’s data to provide context
  • 48.
  • 49.
    UMBEL - Referenceconcept ontology 28000 scaffold concepts
  • 50.
  • 55.
  • 59.
  • 61.
    LOOKING AHEAD Technology andtools are getting there Data needs to remain open Spread the word - Annotate!
  • 62.