Datalift: A Catalyser for the Web of Data


                    François Scharffe
                    LIRMM/CNRS/University of Montpellier
                       francois.scharffe@lirmm.fr
                       @lechatpito




With the help of the Datalift team
And the support of the French National Research Agency



                FOSDEM 5/02/2011
The data revolution is on its way !

     As Open Data meets the Semantic Web
The promises of linked-data
Richer Applications




Linked Data Lite | the Web on Steroids 1.0 (iPhone)
Richer applications




    BBC Programmes
More precise search and QA
Making your data 5 stars




http://www.w3.org/DesignIssues/LinkedData.html
So, how to lift data?
    How to publish data on the Web as linked data?
●   Basic principles, Tim Berners-Lee [2006] (Design Issues)
       –   Use URIs to identify things (not only documents)
       –   Use HTTP URIs
       –   When dereferencing URIs, return a description of the
           resource
       –   Include links to other resources on the Web
Welcome aboard the data lift
    Published and interlinked data on the Web
        Applications
        Interconnection
        Publication infrastructure
        Data conversion
        Vocabulary selection
    Raw data
Datalift


Datasets publication
R&D to automate the publication process
Tool suite to help publish data
Training, tutorials, data publication camps
1st floor - Selection
SemWebPro 18/01/2011
My friends' vocabularies …


Ø What is a (good) vocabulary for linked data?
    § Usability criteria
            Simplicity, visibility, sustainability, integration, coherence …

Ø Different types of vocabularies
    § Metadata, reference, domain, generalist …
    § The pillars of Linked Data: Dublin Core, FOAF, SKOS
Ø Good and less good practices
    § E.g. BBC Programmes vs legislation.gov.uk
    § Vocabulary of a Friend: networked vocabularies
Ø Linguistic problems
    § 99% of existing vocabularies are in English
    § Terminological approach: which vocabularies for « Event », « Organization »?
Did you say « vocabulary »?


… And why not « ontology »?
    § Or « schema » or « metadata schema »?
    § Or « model » (data? world?)
Ø All these terms are used and justifiable
They are all « vocabularies »
    § They define types of objects (or classes)
      and the properties (or attributes) attached to these objects.
    § Types and attributes are logically defined
      and named using natural language
    § A (semantic) vocabulary
      is an explicit formalization
      of concepts existing in natural language
Vocabularies for linked data


Ø Are meant to describe resources in RDF
Ø Are based on one of the standard W3C languages
  § RDF Schema (RDFS)
     • For vocabularies without too much logical complexity
  § OWL
     • For more complex ontological constructs
  § These two languages are (almost) compatible
Ø They can be composed « ad libitum »
  § One can reuse a few elements of a vocabulary
  § The original semantics have to be followed
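
A minimal sketch of composing vocabularies, assuming Python and hypothetical URIs under example.org: one resource is described with terms reused from FOAF and Dublin Core Terms, then written out as N-Triples lines. The helper names (`uri`, `literal`, `describe_person`) are illustrative only and not part of any Datalift tool.

```python
# Namespaces of two widely reused vocabularies.
FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(value):
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def describe_person(person_uri, name, homepage):
    """Return N-Triples lines that mix FOAF and Dublin Core terms.

    Only a few elements of each vocabulary are reused, and each is used
    with its original meaning (foaf:homepage on the person,
    dcterms:creator on the document).
    """
    triples = [
        (uri(person_uri), uri(RDF_TYPE), uri(FOAF + "Person")),
        (uri(person_uri), uri(FOAF + "name"), literal(name)),
        (uri(person_uri), uri(FOAF + "homepage"), uri(homepage)),
        (uri(homepage), uri(DCT + "creator"), uri(person_uri)),
    ]
    return [" ".join(t) + " ." for t in triples]
```

Calling `describe_person("http://example.org/id/alice", "Alice", "http://example.org/alice")` yields four lines, one per triple.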
What makes a good vocabulary?


Ø A good vocabulary is a used vocabulary
   § Data published on CKAN gives an idea of vocabulary usage
   § Example: list of datasets using FOAF http://xmlns.com/foaf/0.1/
Ø Other usability criteria
   § Simplicity and readability in natural language
   § Documentation of elements (definitions in natural language)
   § Visibility and sustainability of the publication
   § Flexibility and extensibility
   § Semantic integration (with other vocabularies)
   § Social integration (with the user community)
A vocabulary is also a community


Ø Bad (but common) practice
  § Building a lonely vocabulary
        –   For example as a research project
        –   Without basing it on any existing vocabulary
  § Publishing it (or not) and then forgetting about it
  § Not caring about its users
Ø A good vocabulary has an organic life
  § Users and use cases
  § Revisions and extensions
  § Like a « natural » vocabulary
Types of vocabularies


Ø Metadata vocabularies
   § Used to annotate other vocabularies
       • Dublin Core, Vann, cc REL, Status
Ø Reference vocabularies
   § Provide « common » classes and properties
       • FOAF, Event, Time, Org Ontology
Ø Domain vocabularies
   § Specific to a domain of knowledge
       • Geonames, Music Ontology, WildLife Ontology
Ø « General » vocabularies
   § Describe « everything » at an arbitrary detail level
       • DBpedia Ontology, Cyc Ontology, SUMO
Vocabulary of a Friend


Ø http://www.mondeca.com/foaf/voaf
Ø A simple vocabulary...
Ø To represent interconnections between vocabularies
Ø A unique entry point to vocabularies and datasets of
  the Linked Data Cloud
Ø Ongoing work in Datalift
2nd floor - Conversion
URL design and URL patterns


Ø Good practices for linked data
  § Resource: http://dbpedia.org/resource/Paris
  § Document: http://dbpedia.org/page/Paris
  § Data: http://dbpedia.org/data/Paris
Ø … served using content negotiation
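
The pattern above can be sketched as a small routing function. This is a simplified illustration, assuming Python, a 303 redirect from the resource URI to the chosen representation, and a hypothetical `negotiate` helper; q-values in the Accept header are ignored and media types are simply tried in the order listed.

```python
# Media types mapped to the DBpedia-style URL prefix that serves them.
REPRESENTATIONS = {
    "text/html": "http://dbpedia.org/page/",
    "application/rdf+xml": "http://dbpedia.org/data/",
}
RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def negotiate(resource_uri, accept_header):
    """Return (status, location) for a request on a resource URI.

    Assumption: the server answers with a 303 redirect to either the
    HTML document or the RDF data, depending on the Accept header.
    """
    if not resource_uri.startswith(RESOURCE_PREFIX):
        return 404, None
    name = resource_uri[len(RESOURCE_PREFIX):]
    for item in accept_header.split(","):
        # Drop parameters such as ";q=0.9" and surrounding spaces.
        media_type = item.split(";")[0].strip()
        if media_type in REPRESENTATIONS:
            return 303, REPRESENTATIONS[media_type] + name
    return 406, None
```

A browser sending `Accept: text/html` is sent to the document, while an RDF client asking for `application/rdf+xml` is sent to the data.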
URI patterns in REST


Ø REST (Representational State Transfer) services
  manipulate resources, and URLs are mainly used
  to address these resources
Ø A base URI:
   § http://www.example.com/bookstore/
Ø A resource has a unique URL: (retrieve, update,
  create, delete)
   § http://www.example.com/bookstore/books/ISBN123
Ø Notion of collection: (list, replace, create, delete)
   § http://www.example.com/bookstore/books
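
As a rough sketch of this URI pattern, assuming Python and the conventional mapping of HTTP methods to the operations listed above (the slide names the operations but not the methods, so the mapping is an assumption), a hypothetical `route` function can tell collection requests from single-resource requests:

```python
BASE = "http://www.example.com/bookstore/"

# Assumed mapping of HTTP methods to the operations named on the slide.
ITEM_OPERATIONS = {
    "GET": "retrieve", "PUT": "update", "POST": "create", "DELETE": "delete",
}
COLLECTION_OPERATIONS = {
    "GET": "list", "PUT": "replace", "POST": "create", "DELETE": "delete",
}


def route(method, url):
    """Return (kind, operation, identifier) for a request, or None."""
    if not url.startswith(BASE):
        return None
    parts = [p for p in url[len(BASE):].split("/") if p]
    if len(parts) == 1 and parts[0] == "books":
        operation = COLLECTION_OPERATIONS.get(method)
        return ("collection", operation, None) if operation else None
    if len(parts) == 2 and parts[0] == "books":
        operation = ITEM_OPERATIONS.get(method)
        return ("item", operation, parts[1]) if operation else None
    return None
```

For example, `route("GET", BASE + "books/ISBN123")` addresses one book, while `route("GET", BASE + "books")` addresses the whole collection.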
Conversion tools to RDF


Ø How is the raw data to be converted?
  § Relational database?
  § (Semi-)structured formats?
  § Programmatic access (API)?
Ø There are solutions for all cases
D2RQ Map
Triplify: Relational data to JSON/RDF




Ø Extract a folder in your Webapp:
  http://sourceforge.net/projects/triplify/
Ø Modify a config file:
   § SQL query … URI pattern
   § PHP lover!
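
The idea of pairing a SQL query with a URI pattern can be sketched as follows. This is not Triplify's configuration format (Triplify is a PHP tool); it is a minimal Python illustration using an in-memory SQLite database, with a hypothetical table, URI pattern and property namespace.

```python
import sqlite3


def rows_to_triples(connection, query, uri_pattern, property_base):
    """Run a SQL query and emit one triple per non-key column value.

    Assumption: the first column of each row is the key substituted
    into uri_pattern; every other column becomes a property named
    after the column.
    """
    cursor = connection.execute(query)
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        subject = uri_pattern.format(id=row[0])
        for column, value in zip(columns[1:], row[1:]):
            if value is not None:
                triples.append((subject, property_base + column, str(value)))
    return triples


def demo():
    """Build a tiny hypothetical database and convert it."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE books (isbn TEXT PRIMARY KEY, title TEXT, year INTEGER)"
    )
    connection.execute("INSERT INTO books VALUES ('ISBN123', 'Linked Data', 2011)")
    return rows_to_triples(
        connection,
        "SELECT isbn, title, year FROM books ORDER BY isbn",
        "http://www.example.com/bookstore/books/{id}",
        "http://www.example.com/vocab#",
    )
```

In a real publication the property URIs would come from the vocabularies selected earlier rather than from raw column names.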
Working on spreadsheets
Google acquired Freebase




http://code.google.com/p/google-refine/
RDF extension for Google Refine


Ø A graphical extension for Google Refine that
  exports the cleaned data as RDF
  http://lab.linkeddata.deri.ie/2010/grefine-rdf-extension/

Name | Job Title | Grade | Organization | Annual pay rate (including taxable benefits and allowances) | Notes
Stephan Wilcke | Chief Executive Officer | | Asset Protection Agency | £150,000 - £154,999 |
Jens Bech | Chief Risk Officer | | Asset Protection Agency | £165,000 - £169,999 | No pension
Ion Dagtoglou | Chief Investment Officer | | Asset Protection Agency | £165,000 - £169,999 | No pension
Brian Scammell | Chief Credit Officer | | Asset Protection Agency | £130,000 - £134,999 | 4 days per week
Google Refine and RDF
3rd floor - Publication
Publication components
    Querying and browsing: SPARQL endpoint, REST
    Inference engine
    RDF storage
    Data feeds (three inputs)

A few products
    Virtuoso, Sesame, Mulgara, 4store,
    OWLIM, AllegroGraph, Big Data, Jena
Named graphs



Ø RDF graphs are bags of triples, everything is mixed
Ø Delete on a graph
Ø SPARQL queries define graphs
[Figure: a graph of 16 numbered nodes]
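
A toy illustration of why named graphs help, assuming Python and a hypothetical `QuadStore` class (not the API of any product listed earlier): each triple is stored with the name of its graph, so one graph can be queried or deleted without touching the others.

```python
class QuadStore:
    """A toy store where every triple belongs to a named graph."""

    def __init__(self):
        self.quads = set()

    def add(self, graph, subject, predicate, obj):
        """Store a triple together with the name of its graph."""
        self.quads.add((graph, subject, predicate, obj))

    def triples(self, graph=None):
        """Triples of one graph, or of all graphs merged when graph is None."""
        return {q[1:] for q in self.quads if graph is None or q[0] == graph}

    def drop_graph(self, graph):
        """Delete every triple of one graph, leaving the others untouched."""
        self.quads = {q for q in self.quads if q[0] != graph}
```

Loading each source dataset into its own graph makes it possible to replace a stale dataset by dropping its graph and loading the new version.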
Inference

Ø Generating triples from other triples
Ø Deduction mechanism
   § Men are mortal, Socrates is a man, so Socrates is mortal
Ø Avoids exhaustive enumeration and gives meaning to
  the definition of hierarchies
Ø Constraints: cardinality, NFPs, ...
[Figure: a graph of 16 numbered nodes]
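
The Socrates example can be reproduced with a very small forward-chaining sketch. It assumes Python, triples held as tuples of prefixed names, and only two RDFS-style rules; real inference engines support many more rules and are far more efficient.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def infer(triples):
    """Forward-chain two RDFS-style rules until no new triple appears.

    Rule 1: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    Rule 2: (x type A),       (A subClassOf B) => (x type B)
    """
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                # Only subclass statements about the object of (s, p, o) apply.
                if p2 != SUBCLASS_OF or o != s2:
                    continue
                if p == SUBCLASS_OF:
                    new.add((s, SUBCLASS_OF, o2))
                elif p == RDF_TYPE:
                    new.add((s, RDF_TYPE, o2))
        if new <= closure:
            return closure
        closure |= new
```

Given only "Socrates is a man" and "Man is a subclass of Mortal", the closure also contains "Socrates is a Mortal", a triple that never had to be stated explicitly.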
Analysing RDF stores: the QSOS method


Ø Qualification and Selection of Open Source Software
   §   An open source project about open source solutions
   §   http://www.qsos.org
Ø Goals of QSOS
   §   Qualify software
   §   Compare solutions after defining requirements and weighting the criteria
   §   Select the product best suited to a given need
Ø QSOS provides
   §   An objective and formalized method
   §   A repository of available evaluations
   §   Tools that ease the application of the method
4th floor - Interconnection
Linked data and interconnections


Ø Without links there is no Web, only data silos
Ø Links can be part of the dataset design (reference
  datasets)
Ø Links can be found after publication: equivalence
  links between resources
How to interlink your data?
Tools


Ø RKB-CRS: a coreference resolution service for the RKB
  knowledge base
Ø LD-mapper: a linkage tool for datasets described using the
  Music Ontology
Ø ODD Linker: a linkage tool based on SQL
Ø RDF-AI: multi-purpose data linkage and fusion
Ø Silk and Silk LSL: linkage tool and linkage specification language
Ø Knofuss architecture: dataset linkage and fusion
Example Silk specification
<Silk>
 <Prefix id="rdfs" namespace="http://www.w3.org/2000/01/rdf-schema#" />
 <Prefix id="dbpedia" namespace="http://dbpedia.org/ontology/" />
 <Prefix id="gn" namespace="http://www.geonames.org/ontology#" />

 <DataSource id="dbpedia">
  <EndpointURI>http://demo_sparql_server1/sparql</EndpointURI>
  <Graph>http://dbpedia.org</Graph>
 </DataSource>

 <DataSource id="geonames">
  <EndpointURI>http://demo_sparql_server2/sparql</EndpointURI>
  <Graph>http://sws.geonames.org/</Graph>
 </DataSource>

 <Thresholds accept="0.9" verify="0.7" />
 <Output acceptedLinks="accepted_links.n3"
   verifyLinks="verify_links.n3"
   mode="truncate" />

 <Interlink id="cities">
   <LinkType>owl:sameAs</LinkType>
   <SourceDataset dataSource="dbpedia" var="a">
     <RestrictTo>
       ?a rdf:type dbpedia:City
     </RestrictTo>
   </SourceDataset>
   <TargetDataset dataSource="geonames" var="b">
     <RestrictTo>
       ?b rdf:type gn:P
     </RestrictTo>
   </TargetDataset>
   <LinkCondition>
     <AVG>
       <Compare metric="jaroSimilarity">
         <Param name="str1" path="?a/rdfs:label" />
         <Param name="str2" path="?b/gn:name" />
       </Compare>
       <Compare metric="numSimilarity">
         <Param name="num1" path="?a/dbpedia:populationTotal" />
         <Param name="num2" path="?b/gn:population" />
       </Compare>
     </AVG>
   </LinkCondition>
 </Interlink>
</Silk>
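
The link condition of this specification can be read as: average a string similarity on the labels and a numeric similarity on the populations, then compare the result with the two thresholds. The sketch below mimics that logic in Python under loud assumptions: `difflib.SequenceMatcher` stands in for Silk's `jaroSimilarity` (it is a different metric), and `number_similarity` is an assumed definition, not Silk's `numSimilarity`. Only the thresholds 0.9 and 0.7 come from the specification.

```python
from difflib import SequenceMatcher

# Thresholds taken from <Thresholds accept="0.9" verify="0.7" />.
ACCEPT, VERIFY = 0.9, 0.7


def string_similarity(a, b):
    """Stand-in for jaroSimilarity: difflib's ratio, not the Jaro metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def number_similarity(x, y):
    """Assumed definition: ratio of the smaller to the larger value."""
    if x == y:
        return 1.0
    if x <= 0 or y <= 0:
        return 0.0
    return min(x, y) / max(x, y)


def classify(source, target):
    """Average both similarities (the AVG element) and apply the thresholds."""
    score = (
        string_similarity(source["label"], target["name"])
        + number_similarity(source["population"], target["population"])
    ) / 2
    if score >= ACCEPT:
        return "accept", score
    if score >= VERIFY:
        return "verify", score
    return "reject", score
```

Pairs classified as "accept" would become owl:sameAs links, while "verify" pairs would be set aside for manual review, mirroring the two output files named in the specification.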
Where to find links?
Towards automated interconnection services


Ø The linkage specification could be simplified
  § Using alignments between vocabularies
  § Detection of discriminating properties
  § Indicating comparison methods by attaching metadata to
    ontologies
Ø Work in progress in Datalift
5th floor - Applications
Data visualization




                Tabulator
                (CSAIL, MIT)
VisiNav
Sig.ma
NosDéputés.fr
A few examples from the US




http://data-gov.tw.rpi.edu/demo/USForeignAid/demo-1554.html
Mashups … Mashups … Mashups …
That's it!
●   Datalift.org
●   We're looking for a Datageek!

 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - Questionnaire
 
Working together SRE & Platform Engineering
Working together SRE & Platform EngineeringWorking together SRE & Platform Engineering
Working together SRE & Platform Engineering
 
Vector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxVector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptx
 

Datalift: A Catalyser for the Web of Data (FOSDEM, 05/02/2011)

  • 1. Datalift: A Catalyser for the Web of Data
    François Scharffe, LIRMM/CNRS/University of Montpellier
    francois.scharffe@lirmm.fr, @lechatpito
    With the help of the Datalift team and the support of the French National Research Agency
    FOSDEM, 5/02/2011
  • 2. The data revolution is on its way! As Open Data meets the Semantic Web
  • 3. The promises of linked-data
  • 4. Richer applications: Linked Data Lite | the Web on Steroids 1.0 (iPhone)
  • 5. Richer applications: BBC Programmes
  • 7. Making your data 5 stars: http://www.w3.org/DesignIssues/LinkedData.html
  • 8. So, how to lift data? How to publish data on the Web as linked data?
    ● Basic principles, Tim Berners-Lee [2006] (Design Issues)
        – Use URIs to identify things (not only documents)
        – Use HTTP URIs
        – When dereferencing URIs, return a description of the resource
        – Include links to other resources on the Web
  • 9. Welcome aboard the data lift (floors, from top to bottom)
    Published and interlinked data on the Web
    Applications
    Interconnection
    Publication infrastructure
    Data conversion
    Vocabulary selection
    Raw data
  • 10. Datalift
    Datasets publication
    R&D to automate the publication process
    Tool suite to help publish data
    Training, tutorials, data publication camps
  • 11. 1st floor - Selection (SemWebPro 18/01/2011)
  • 12. The vocabularies of my friends …
    Ø What is a (good) vocabulary for linked data?
        § Usability criteria: simplicity, visibility, sustainability, integration, coherence …
    Ø Different types of vocabularies
        § Metadata, reference, domain, generalist …
        § The pillars of Linked Data: Dublin Core, FOAF, SKOS
    Ø Good and less good practices
        § Ex: BBC Programmes vs legislation.gov.uk
        § Vocabulary of a Friend: networked vocabularies
    Ø Linguistic problems
        § 99% of existing vocabularies are in English
        § Terminological approach: which vocabularies for « Event », « Organization »?
  • 13. Did you say « vocabulary » …
    Ø And why not « ontology »?
        § Or « schema » or « metadata schema »?
        § Or « model » (data? world?)
    Ø All these terms are used and justifiable. They are all « vocabularies »
        § They define types of objects (or classes) and the properties (or attributes) attached to these objects
        § Types and attributes are logically defined and named using natural language
        § A (semantic) vocabulary is an explicit formalization of concepts existing in natural language
  • 14. Vocabularies for linked data
    Ø Are meant to describe resources in RDF
    Ø Are based on one of the standard W3C languages
        § RDF Schema (RDFS): for vocabularies without too much logical complexity
        § OWL: for more complex ontological constructs
        § These two languages are (almost) compatible
    Ø They can be composed « ad libitum »
        § One can reuse a few elements of a vocabulary
        § The original semantics have to be followed
  • 15. What makes a good vocabulary?
    Ø A good vocabulary is a used vocabulary
        § Data published on CKAN gives an idea of vocabulary usage
        § Example: list of datasets using FOAF, http://xmlns.com/foaf/0.1/
    Ø Other usability criteria
        § Simplicity and readability in natural language
        § Documentation of elements (definitions in natural language)
        § Visibility and sustainability of the publication
        § Flexibility and extensibility
        § Semantic integration (with other vocabularies)
        § Social integration (with the user community)
  • 16. A vocabulary is also a community
    Ø Bad (but common) practice
        ● Building a vocabulary in isolation
            – For example as a research project
            – Without basing it on any existing vocabulary
        § Publishing it (or not) and then forgetting about it
        § Not caring about its users
    Ø A good vocabulary has an organic life
        § Users and use cases
        § Revisions and extensions
        § Like a « natural » vocabulary
  • 17. Types of vocabularies
    Ø Metadata vocabularies
        § Used to annotate other vocabularies
        • Dublin Core, Vann, cc REL, Status
    Ø Reference vocabularies
        § Provide « common » classes and properties
        • FOAF, Event, Time, Org Ontology
    Ø Domain vocabularies
        § Specific to a domain of knowledge
        • Geonames, Music Ontology, WildLife Ontology
    Ø « General » vocabularies
        § Describe « everything » at an arbitrary level of detail
        • DBpedia Ontology, Cyc Ontology, SUMO
  • 18. Vocabulary of a Friend
    Ø http://www.mondeca.com/foaf/voaf
    Ø A simple vocabulary...
    Ø To represent interconnections between vocabularies
    Ø A single entry point to the vocabularies and datasets of the Linked Data Cloud
    Ø Ongoing work in Datalift
  • 19. 2nd floor - Conversion
  • 20. URL design and URL patterns
    Ø Good practices for linked data
        § Resource: http://dbpedia.org/resource/Paris
        § Document: http://dbpedia.org/page/Paris
        § Data: http://dbpedia.org/data/Paris
    Ø … served using content negotiation
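The resource/document/data split above can be sketched as a small redirect function. This is a minimal illustration, not DBpedia's actual server logic: the helper name `negotiate`, the list of RDF media types and the simplified `Accept` parsing are all assumptions made for the example.

```python
from typing import Tuple

# Media types treated as "give me the data"; this list is an illustrative assumption.
RDF_TYPES = ("application/rdf+xml", "text/turtle", "application/n-triples")


def negotiate(resource_uri: str, accept: str) -> Tuple[int, str]:
    """Return (status, Location) for a GET on a .../resource/Name URI."""
    prefix, _, name = resource_uri.rpartition("/resource/")
    # Simplified Accept parsing: q-values are ignored, a real server must honour them.
    for part in accept.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in RDF_TYPES:
            return 303, f"{prefix}/data/{name}"
        if media_type in ("text/html", "application/xhtml+xml"):
            return 303, f"{prefix}/page/{name}"
    # Default to the human-readable document.
    return 303, f"{prefix}/page/{name}"
```

The thing itself (the city) keeps a single URI, and the 303 redirect sends browsers to the document and RDF clients to the data.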
  • 21. URI patterns in REST
    Ø REST (Representational State Transfer) services manipulate resources, and URLs are mainly used to address these resources
    Ø A base URI:
        § http://www.example.com/bookstore/
    Ø A resource has a unique URL (retrieve, update, create, delete):
        § http://www.example.com/bookstore/books/ISBN123
    Ø Notion of collection (list, replace, create, delete):
        § http://www.example.com/bookstore/books
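A minimal sketch of this resource/collection pattern, using the example bookstore URLs from the slide. The mapping of HTTP methods to operations follows common REST convention and is an assumption here, as is the helper name `describe`.

```python
BASE = "http://www.example.com/bookstore/"

# Conventional HTTP method to operation mapping (assumed, not stated on the slide).
RESOURCE_OPS = {"GET": "retrieve", "PUT": "create or update", "DELETE": "delete"}
COLLECTION_OPS = {"GET": "list", "PUT": "replace", "POST": "create", "DELETE": "delete"}


def describe(method, url):
    """Classify a request as targeting a collection or a single resource."""
    if not url.startswith(BASE):
        raise ValueError("URL is outside the base URI")
    segments = [s for s in url[len(BASE):].split("/") if s]
    if len(segments) == 1:
        # e.g. .../bookstore/books
        return "collection", COLLECTION_OPS[method]
    if len(segments) == 2:
        # e.g. .../bookstore/books/ISBN123
        return "resource", RESOURCE_OPS[method]
    raise ValueError("unsupported URL shape")
```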
  • 22. Tools for conversion to RDF
    Ø How is the raw data to be converted?
        § Relational database?
        § (Semi-)structured formats?
        § Programmatic access (API)?
    Ø There are solutions for all cases
  • 24. Triplify: Relational data to JSON/RDF
    Ø Extract a folder in your Webapp: http://sourceforge.net/projects/triplify/
    Ø Modify a config file:
        § SQL query … URI pattern
        § PHP lover!
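The "SQL query … URI pattern" idea can be illustrated outside Triplify. The sketch below is not Triplify's PHP configuration: it is a Python stand-in using an in-memory SQLite table, and the base URI, the `vocab/` property namespace and the `books` table are hypothetical.

```python
import sqlite3

BASE = "http://www.example.com/"  # hypothetical base URI


def rows_to_ntriples(conn, query, uri_pattern, base=BASE):
    """Run a SQL query and emit one N-Triples line per non-null column value."""
    cursor = conn.execute(query)
    columns = [description[0] for description in cursor.description]
    lines = []
    for row in cursor:
        record = dict(zip(columns, row))
        # The URI pattern is filled in from the row, e.g. "books/{isbn}".
        subject = f"<{base}{uri_pattern.format(**record)}>"
        for column, value in record.items():
            if value is None:
                continue
            literal = (str(value).replace("\\", "\\\\")
                       .replace('"', '\\"').replace("\n", "\\n"))
            lines.append(f'{subject} <{base}vocab/{column}> "{literal}" .')
    return lines


# Usage with a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (isbn TEXT, title TEXT)")
conn.execute("INSERT INTO books VALUES ('ISBN123', 'Linked Data')")
triples = rows_to_ntriples(conn, "SELECT isbn, title FROM books", "books/{isbn}")
```

Every value becomes a plain literal here. A real mapping would also pick properties from existing vocabularies and type the values.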
  • 27. RDF extension for Google Refine
    Ø A graphical extension for Google Refine that exports the cleaned data as RDF
      http://lab.linkeddata.deri.ie/2010/grefine-rdf-extension/
    Name | Job Title | Grade | Organization | Annual pay rate - including taxable benefits and allowances | Notes
    Stephan Wilcke | Chief Executive Officer | | Asset Protection Agency | £150,000 - £154,999 |
    Jens Bech | Chief Risk Officer | | Asset Protection Agency | £165,000 - £169,999 | No pension
    Ion Dagtoglou | Chief Investment Officer | | Asset Protection Agency | £165,000 - £169,999 | No pension
    Brian Scammell | Chief Credit Officer | | Asset Protection Agency | £130,000 - £134,999 | 4 days per week
  • 29. 3rd floor - Publication
  • 30. Publication components
    Querying and browsing: SPARQL, REST endpoint
    Inference engine
    RDF storage
    Data loading
    A few products: Virtuoso, Sesame, Mulgara, 4store, OWLIM, AllegroGraph, Big Data, Jena
  • 31. Named graphs
    Ø RDF graphs are bags of triples, everything is mixed
    Ø Delete on a graph
    Ø SPARQL queries define graphs
    (Figure: a graph of numbered nodes, 1 to 16)
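A minimal sketch of why named graphs help: triples are stored per graph name, so one source can be dropped without touching the others. The `QuadStore` class and the graph names used with it are illustrative assumptions, not the API of any of the stores listed earlier.

```python
from collections import defaultdict


class QuadStore:
    """Toy store keeping triples grouped by named graph."""

    def __init__(self):
        self.graphs = defaultdict(set)

    def add(self, graph, subject, predicate, obj):
        self.graphs[graph].add((subject, predicate, obj))

    def drop(self, graph):
        # Deleting works on a whole graph, e.g. when a source is republished.
        self.graphs.pop(graph, None)

    def triples(self, graph=None):
        # With no graph given, everything is merged, as in a plain triple store.
        if graph is not None:
            return set(self.graphs.get(graph, ()))
        return set().union(*self.graphs.values())
```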
  • 32. Inference
    Ø Generating triples from other triples
    Ø Deduction mechanism
        § Men are mortal, Socrates is a man, so Socrates is mortal
    Ø Avoids stating every triple exhaustively and gives meaning to defining hierarchies
    Ø Constraints: cardinality, NFPs, ...
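The Socrates deduction can be sketched as forward chaining over two RDFS-style rules (type propagation along `rdfs:subClassOf`, and transitivity of `rdfs:subClassOf`). The prefixed names are written as plain strings and the `ex:` terms are hypothetical. This is a naive fixed-point loop, not how a production inference engine is implemented.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def infer(triples):
    """Return the input triples plus everything the two rules can derive."""
    closure = set(triples)
    while True:
        derived = set()
        for (s, p, o) in closure:
            for (s2, p2, o2) in closure:
                if p2 == SUBCLASS and o == s2:
                    if p == TYPE:
                        # x is a C, C is a subclass of D => x is a D
                        derived.add((s, TYPE, o2))
                    elif p == SUBCLASS:
                        # C subclass of D, D subclass of E => C subclass of E
                        derived.add((s, SUBCLASS, o2))
        if derived <= closure:
            return closure  # fixed point reached, nothing new
        closure |= derived
```

Only the class hierarchy and the most specific type need to be stated; the remaining `rdf:type` triples are generated.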
  • 33. Analysing RDF stores: the QSOS method
    Ø Qualification and Selection of Open Source Software
        § An open source project about open source solutions
        § http://www.qsos.org
    Ø QSOS goals
        § Qualify software
        § Compare solutions after defining requirements and weighting the criteria
        § Select the product best suited to a given need
    Ø QSOS provides
        § An objective and formalized method
        § A repository of available studies
        § Tools that make the method easier to apply
  • 34. 4th floor - Interconnection
  • 35. Linked data and interconnections
    Ø Without links there is no Web, only data silos
    Ø Links can be part of the datasets' design (reference datasets)
    Ø Links can be found after publication: equivalence links between resources
  • 37. Tools
    Ø RKB-CRS: a coreference resolution service for the RKB knowledge base
    Ø LD-mapper: a linkage tool for datasets described using the Music Ontology
    Ø ODD Linker: a linkage tool based on SQL
    Ø RDF-AI: multi-purpose data linkage and fusion
    Ø Silk and Silk LSL: linkage tool and linkage specification language
    Ø Knofuss architecture: dataset linkage and fusion
  • 38. Example Silk specification
    <Silk>
      <Prefix id="rdfs" namespace="http://www.w3.org/2000/01/rdf-schema#" />
      <Prefix id="dbpedia" namespace="http://dbpedia.org/ontology/" />
      <Prefix id="gn" namespace="http://www.geonames.org/ontology#" />
      <DataSource id="dbpedia">
        <EndpointURI>http://demo_sparql_server1/sparql</EndpointURI>
        <Graph>http://dbpedia.org</Graph>
      </DataSource>
      <DataSource id="geonames">
        <EndpointURI>http://demo_sparql_server2/sparql</EndpointURI>
        <Graph>http://sws.geonames.org/</Graph>
      </DataSource>
      <Interlink id="cities">
        <LinkType>owl:sameAs</LinkType>
        <SourceDataset dataSource="dbpedia" var="a">
          <RestrictTo>?a rdf:type dbpedia:City</RestrictTo>
        </SourceDataset>
        <TargetDataset dataSource="geonames" var="b">
          <RestrictTo>?b rdf:type gn:P</RestrictTo>
        </TargetDataset>
        <LinkCondition>
          <AVG>
            <Compare metric="jaroSimilarity">
              <Param name="str1" path="?a/rdfs:label" />
              <Param name="str2" path="?b/gn:name" />
            </Compare>
            <Compare metric="numSimilarity">
              <Param name="num1" path="?a/dbpedia:populationTotal" />
              <Param name="num2" path="?b/gn:population" />
            </Compare>
          </AVG>
        </LinkCondition>
        <Thresholds accept="0.9" verify="0.7" />
        <Output acceptedLinks="accepted_links.n3" verifyLinks="verify_links.n3" mode="truncate" />
      </Interlink>
    </Silk>
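To make the link condition concrete, here is a rough sketch of the same logic: average a label similarity and a population similarity, then compare the score with the accept (0.9) and verify (0.7) thresholds. It does not call Silk. The string metric is `difflib`'s ratio used as a stand-in for Jaro similarity, the numeric metric (smaller value divided by larger) is an assumed definition, and the record fields `label` and `population` are hypothetical.

```python
from difflib import SequenceMatcher

ACCEPT, VERIFY = 0.9, 0.7  # thresholds taken from the specification


def string_similarity(a, b):
    # Stand-in for jaroSimilarity: difflib's ratio, NOT the Jaro metric.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def num_similarity(x, y):
    # Assumed definition: ratio of the smaller to the larger value.
    largest = max(x, y)
    return 1.0 if largest == 0 else min(x, y) / largest


def classify(source, target):
    """Average both similarities (the AVG element) and apply the thresholds."""
    score = (string_similarity(source["label"], target["label"])
             + num_similarity(source["population"], target["population"])) / 2
    if score >= ACCEPT:
        return "accept", score
    if score >= VERIFY:
        return "verify", score
    return "reject", score
```

Pairs scoring between the two thresholds are not linked automatically, which matches the separate `verifyLinks` output file in the specification.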
  • 39. Where to find links?
  • 40. Towards automated interconnection services
    Ø The linkage specification could be simplified
        § Using alignments between vocabularies
        § Detection of discriminating properties
        § Indicating comparison methods by attaching metadata to ontologies
    Ø Work in progress in Datalift
  • 41. 5th floor - Applications
  • 42. Data visualization Tabulator (CSAIL, MIT)
  • 47. A few examples from the US: http://data-gov.tw.rpi.edu/demo/USForeignAid/demo-1554.html
  • 48. Mashups … Mashups … Mashups …
  • 49. That's it!
    ● Datalift.org
    ● We're looking for a Datageek!