The Datalift Project
Ontologies, Datasets, Tools and Methodologies
to Publish and Interlink ★★★★★ Datasets



                          François Scharffe
                          University of Montpellier,
                          LIRMM, INRIA
                          francois.scharffe@lirmm.fr
                          @lechatpito




With the help of the Datalift team
And the support of the French National Research Agency


                RPI 28/07/2011
State of government open data

(September 2010…)



             You’re here
State of government open data

(June 2011)
Figure: Linking Open Data cloud diagram, snapshots of May 2007, April 2008, September 2008, March 2009 and September 2010, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
Linked data
Link the world
W3C principles
§ Use the RDF format
§ Use URIs to name things
§ Use HTTP URIs (URLs) so that people can look up those names
§ Provide information (HTML, RDF) when someone dereferences those URIs
§ Include in this information URIs pointing to other data, to enable discovery
Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html
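The principles can be checked mechanically on a small graph. The sketch below is an illustration under assumptions: triples are plain Python tuples of strings, `base` (a hypothetical parameter) is the publisher's own namespace, and "URIs pointing to other data" are approximated as object URIs hosted on another domain.

```python
from urllib.parse import urlparse


def is_http_uri(term: str) -> bool:
    """True if the term is an absolute http(s) URI that can be looked up."""
    parsed = urlparse(term)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def check_principles(triples, base):
    """Report subjects that are not HTTP URIs, and outgoing links to other domains."""
    base_host = urlparse(base).netloc
    non_http_subjects = sorted({s for (s, _, _) in triples if not is_http_uri(s)})
    external_links = sorted({
        o for (_, _, o) in triples
        if is_http_uri(o) and urlparse(o).netloc != base_host
    })
    return {"non_http_subjects": non_http_subjects, "external_links": external_links}


# Hypothetical dataset: one city described locally and linked to DBpedia.
data = [
    ("http://data.example.org/city/Montpellier",
     "http://www.w3.org/2000/01/rdf-schema#label", "Montpellier"),
    ("http://data.example.org/city/Montpellier",
     "http://www.w3.org/2002/07/owl#sameAs", "http://dbpedia.org/resource/Montpellier"),
    ("urn:example:local-id", "http://purl.org/dc/terms/title", "Untitled"),
]
print(check_principles(data, "http://data.example.org/"))
```

Here the `urn:` subject is reported because it cannot be dereferenced over HTTP, and the owl:sameAs object is the only outgoing link that enables discovery.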
goal of datalift
from raw published data
to interconnected semantic data
phase 1: opening the data
 develop a platform easing the publication
Welcome aboard the data lift (floors from top to bottom):
  Published and interlinked data on the Web
  Applications
  Interconnection
  Publication infrastructure
  Data conversion
  Vocabulary selection
  Raw data
Example publication process
Figure: environmental, weather and geological datasets, oil industry equipment and geography data, accessed through SPARQL, content negotiation and URI dereferencing.
1st floor - Selection
SemWebPro 18/01/2011
Vocabularies of my friends...


Ø What is a (good) vocabulary for linked data?
    § Usability criteria
            Simplicity, visibility, sustainability, integration, coherence …

Ø Different types of vocabularies
    § metadata, reference, domain, generalist …
    § The pillars of Linked Data: Dublin Core, FOAF, SKOS
Ø Good and less good practices
    § Example: BBC Programmes vs legislation.gov.uk
    § Vocabulary of a Friend: networked vocabularies
Ø Linguistic problems
    § 99% of existing vocabularies are in English
    § Terminological approach: which vocabularies for « Event », « Organization »?
Did you say « vocabulary »?


… And why not « ontology »?
    § « schema » or « metadata schema »?
    § Or « model » (of data? of the world?)
Ø All these terms are used and justifiable
They are all « vocabularies »
    § They define types of objects (or classes)
      and the properties (or attributes) attached to these objects.
    § Types and attributes are logically defined
      and named using natural language
    § A (semantic) vocabulary
      is an explicit formalization
      of concepts existing in natural language
Vocabularies for linked data


Ø Are meant to describe resources in RDF
Ø Are based on one of the standard W3C languages
  § RDF Schema (RDFS)
     • For vocabularies without much logical complexity
  § OWL
     • For more complex ontological constructs
  § These two languages are (almost) compatible
Ø They can be composed « ad libitum »
  § One can reuse a few elements of a vocabulary
  § The original semantics must be respected
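Reusing a term means accepting its semantics. As a sketch in plain Python (no RDF library assumed), two RDFS entailment rules, for rdfs:subClassOf and rdfs:domain, are applied by forward chaining until no new triple appears. foaf:Person and foaf:Agent are real FOAF terms and FOAF declares the first a subclass of the second; the ex: names are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"


def rdfs_closure(triples):
    """Apply two RDFS entailment rules until no new triple appears:

    rdfs9: (c1 subClassOf c2) and (x type c1)  =>  (x type c2)
    rdfs2: (p domain c) and (x p y)            =>  (x type c)

    subClassOf triples themselves are not made transitive (rule rdfs11 is left
    out), but repeated rounds of rdfs9 still propagate types along a chain."""
    graph = set(triples)
    while True:
        superclasses = {(s, o) for (s, p, o) in graph if p == SUBCLASS_OF}
        domains = {(s, o) for (s, p, o) in graph if p == DOMAIN}
        new = set()
        for (x, p, o) in graph:
            if p == RDF_TYPE:
                new |= {(x, RDF_TYPE, sup) for (sub, sup) in superclasses if sub == o}
            new |= {(x, RDF_TYPE, c) for (prop, c) in domains if prop == p}
        if new <= graph:  # fixpoint reached: nothing left to infer
            return graph
        graph |= new


# FOAF states foaf:Person rdfs:subClassOf foaf:Agent; ex:wrote is hypothetical.
schema = [("foaf:Person", SUBCLASS_OF, "foaf:Agent"),
          ("ex:wrote", DOMAIN, "foaf:Person")]
facts = [("ex:amitav", "ex:wrote", "ex:glassPalace")]
for triple in sorted(rdfs_closure(schema + facts) - set(schema + facts)):
    print(triple)
```

Declaring a domain on a reused property therefore types every subject that uses it, which is why reused elements must keep their original meaning.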
What makes a good vocabulary ?


Ø A good vocabulary is a used vocabulary
   § Data published on CKAN gives an idea of vocabulary usage
   § Example:
     list of datasets using FOAF http://xmlns.com/foaf/0.1/
Ø Other usability criteria
   § Simplicity and readability in natural language
   § Documentation of elements (definitions in natural language)
   § Visibility and sustainability of the publication
   § Flexibility and extensibility
   § Semantic integration (with other vocabularies)
   § Social integration (with the user community)
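"A good vocabulary is a used vocabulary" suggests a simple usage metric: in how many datasets does a vocabulary namespace appear? The sketch assumes each dataset has already been summarized as the list of term URIs it uses (hypothetical input; CKAN itself is not queried) and assigns each URI to a namespace by cutting after the last `#` or `/`.

```python
from collections import Counter


def namespace_of(uri: str) -> str:
    """Heuristic namespace: everything up to and including the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[: cut + 1]


def vocabulary_usage(datasets):
    """Count, per namespace, the datasets using at least one of its terms.

    `datasets` maps a dataset name to an iterable of term URIs."""
    counts = Counter()
    for terms in datasets.values():
        counts.update({namespace_of(t) for t in terms})  # a set: one vote per dataset
    return counts


usage = vocabulary_usage({
    "dataset-a": ["http://xmlns.com/foaf/0.1/name", "http://xmlns.com/foaf/0.1/Person"],
    "dataset-b": ["http://xmlns.com/foaf/0.1/name", "http://purl.org/dc/terms/title"],
    "dataset-c": ["http://purl.org/dc/terms/title"],
})
for namespace, n in usage.most_common():
    print(n, namespace)
```

Counting datasets rather than triples keeps one very large dataset from dominating the ranking.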
A vocabulary is also a community


Ø Bad (but common) practice
   § Build a vocabulary in isolation
        –   For example as a research project
        –   Without basing it on any existing vocabulary
   § Publish it (or not), then forget about it
   § Ignore its users
Ø A good vocabulary has an organic life
   § Users and use cases
   § Revisions and extensions
   § Like a « natural » vocabulary
Types of vocabularies


Ø Metadata vocabularies
   § Used to annotate other vocabularies
       • Dublin Core, VANN, ccREL, Status, VoID
Ø Reference vocabularies
   § Provide « common » classes and properties
       • FOAF, Event, Time, Org Ontology
Ø Domain vocabularies
   § Specific to a domain of knowledge
       • GeoNames, Music Ontology, WildLife Ontology
Ø « General » vocabularies
   § Describe « everything » at an arbitrary level of detail
       • DBpedia Ontology, Cyc Ontology, SUMO
Vocabulary of a Friend


Ø http://www.mondeca.com/foaf/voaf
Ø A simple vocabulary...
Ø To represent interconnections between vocabularies
Ø A single entry point to the vocabularies and datasets of
  the Linked Data Cloud
Ø Ongoing work in Datalift
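Once interconnections between vocabularies are recorded, simple graph queries become possible. As an illustration only, assume the relations have been extracted into an adjacency map (vocabulary to the vocabularies it reuses). The edges below are illustrative, not taken from VOAF data, and the actual VOAF property names are not used. A breadth-first traversal lists everything a vocabulary transitively depends on; a reverse lookup lists who reuses it.

```python
from collections import deque


def transitive_dependencies(reuses, start):
    """All vocabularies reachable from `start` through 'reuses' edges (BFS)."""
    seen = set()
    queue = deque([start])
    while queue:
        vocab = queue.popleft()
        for dep in reuses.get(vocab, ()):
            if dep not in seen and dep != start:
                seen.add(dep)
                queue.append(dep)
    return seen


def reused_by(reuses, target):
    """Vocabularies that directly reuse `target`."""
    return sorted(v for v, deps in reuses.items() if target in deps)


# Illustrative edges only.
REUSES = {
    "mo": ["foaf", "event", "dcterms"],
    "event": ["foaf", "time"],
    "foaf": [],
    "org": ["foaf", "skos", "dcterms"],
}
print(sorted(transitive_dependencies(REUSES, "mo")))
print(reused_by(REUSES, "foaf"))
```

The reverse lookup is the networked-vocabulary view: a term in FOAF cannot be changed without considering every vocabulary listed there.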
2nd floor - Conversion
Reference datasets, URI design

●   Providing reference datasets for the French
    ecosystem: geographical, topological, statistical,
    political
●   Providing URI design guidelines
    ●   Opaque or transparent URIs?
    ●   Use of accented characters in URIs
    ●   Distinction between:
        ●   Resources: http://dbpedia.org/resource/Paris
        ●   Documents: http://dbpedia.org/page/Paris
        ●   Data: http://dbpedia.org/data/Paris
… all served with content negotiation
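Two of these guidelines can be sketched in Python. Assumptions: paths follow the DBpedia resource/page/data pattern above; the server answers a request on a resource with an HTTP 303 redirect, since the thing itself is not a document; the Accept header is matched naively (no q-value parsing); and labels become URI slugs either by percent-encoding their UTF-8 bytes (accents kept) or by stripping accents. `mint_slug` and `negotiate` are hypothetical helpers, not part of any Datalift API.

```python
import unicodedata
from urllib.parse import quote

RDF_MEDIA_TYPES = ("text/turtle", "application/rdf+xml", "application/n-triples")


def mint_slug(label: str, keep_accents: bool = True) -> str:
    """Turn a label into a URI path segment (spaces become underscores)."""
    words = label.strip().replace(" ", "_")
    if not keep_accents:
        # Decompose accented letters and drop the combining marks: "é" becomes "e".
        decomposed = unicodedata.normalize("NFKD", words)
        words = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Percent-encode whatever is left outside the unreserved set: "é" becomes "%C3%A9".
    return quote(words, safe="_")


def negotiate(uri: str, accept: str):
    """Answer a GET on a /resource/ URI with (status, Location).

    Redirects to the RDF document (/data/) or the HTML page (/page/).
    Accept is matched by substring only; q-values are ignored."""
    if "/resource/" not in uri:
        raise ValueError(f"not a resource URI: {uri}")
    wants_rdf = any(media_type in accept for media_type in RDF_MEDIA_TYPES)
    target = "/data/" if wants_rdf else "/page/"
    return 303, uri.replace("/resource/", target, 1)


base = "http://data.example.org/resource/"  # hypothetical publisher namespace
print(base + mint_slug("Hérault"))                      # .../H%C3%A9rault
print(base + mint_slug("Hérault", keep_accents=False))  # .../Herault
print(negotiate("http://dbpedia.org/resource/Paris", "text/turtle"))  # (303, 'http://dbpedia.org/data/Paris')
print(negotiate("http://dbpedia.org/resource/Paris", "text/html"))    # (303, 'http://dbpedia.org/page/Paris')
```

Keeping accents preserves information but yields percent-encoded URIs; stripping them gives ASCII URIs at the risk of two labels collapsing into the same slug.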
Many tools exist! (e.g. csv2rdf4lod)
Direct Mapping from relational database to RDF

Defines a standard transformation from a relational
database to RDF
The relational schema is used:
      • Cells of a tuple produce triples with a common subject
      • Each cell produces an object
      • Different tables of the same database are linked together (through foreign keys)

 Standard automatic translation of any relational schema to RDF,
based on the database dump

SPARQL CONSTRUCT queries can then adapt vocabularies and URIs.
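The direct-mapping idea fits in a few lines of Python. This is a hypothetical `direct_map` helper that builds row IRIs as `<Table/Column=value#_>` and properties as `<Table#Column>`, relative to a base, close to the book example in this section; its IRIs may differ in detail from both that draft output and the final W3C Direct Mapping Recommendation. Composite keys, datatypes and literal escaping are left out.

```python
def row_iri(table, key_column, key_value):
    """Subject IRI of a row, relative to the base: <Table/Column=value#_>."""
    return f"<{table}/{key_column}={key_value}#_>"


def direct_map(table, key_column, rows, foreign_keys=None):
    """Turn the rows of one table into (subject, predicate, object) triples.

    foreign_keys maps a column to (referenced table, referenced key column)."""
    foreign_keys = foreign_keys or {}
    triples = []
    for row in rows:
        subject = row_iri(table, key_column, row[key_column])
        triples.append((subject, "a", f"<{table}>"))  # one type triple per row
        for column, value in row.items():
            if value is None:  # NULL cells produce no triple
                continue
            predicate = f"<{table}#{column}>"
            if column in foreign_keys:  # the "jump" to another table
                ref_table, ref_key = foreign_keys[column]
                triples.append((subject, predicate, row_iri(ref_table, ref_key, value)))
            else:  # every other cell becomes a literal (no escaping, no datatypes)
                triples.append((subject, predicate, f'"{value}"'))
    return triples


books = [{"ISBN": "0006511409X", "Title": "The Glass Palace",
          "Year": "2000", "Author": "id_xyz"}]
authors = [{"ID": "id_xyz", "Name": "Ghosh, Amitav",
            "Homepage": "http://www.amitavghosh.com"}]

print("@base <http://book.example/> .")
for s, p, o in (direct_map("Book", "ISBN", books, {"Author": ("Author", "ID")})
                + direct_map("Author", "ID", authors)):
    print(s, p, o, ".")
```

Only the foreign-key column yields an IRI object; every other cell, including the homepage URL, stays a literal, which is the limitation listed with the example output.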
Example

Credits Ivan Herman: http://ivan-herman.name/2010/11/19/my-first-mapping-from-direct-mapping/
Example

   @base <http://book.example/> .
   <Book/ID=0006511409X#_> a <Book> ;
     <Book#ISBN> "0006511409X" ;
     <Book#Title> "The Glass Palace" ;
     <Book#Year> "2000" ;
     <Book#Author> <Author/ID=id_xyz#_> .

   <Author/ID=id_xyz#_> a <Author> ;
     <Author#ID> "id_xyz" ;
     <Author#Name> "Ghosh, Amitav" ;
     <Author#Homepage> "http://www.amitavghosh.com" .

   A simple result, but not satisfying:
     ● we want to use different vocabulary terms (like a:name)
     ● the direct mapping produces literal objects most of the time, except when there is
       a “jump” from one table to another
     ● the resulting graph should use a blank node for the author, which is not the case
       in the generated graph
Credits Ivan Herman: http://ivan-herman.name/2010/11/19/my-first-mapping-from-direct-mapping/
Example
Solution: use SPARQL 1.1 CONSTRUCT queries
BASE <http://book.example/>
PREFIX a: <http://example.org/book-vocab#>  # hypothetical namespace for the target vocabulary
CONSTRUCT {
  ?id a:title ?title ;
      a:year ?year ;
      a:author _:x .
  _:x a:name ?name ;
      a:homepage ?hp .
}
WHERE {
  SELECT (IRI(CONCAT("http://...", ?isbn)) AS ?id)
         ?title ?year ?name
         (IRI(?homepage) AS ?hp)
  {
    ?book a <Book> ;
          <Book#ISBN> ?isbn ;
          <Book#Title> ?title ;
          <Book#Year> ?year ;
          <Book#Author> ?author .
    ?author a <Author> ;
            <Author#Name> ?name ;
            <Author#Homepage> ?homepage .
  }
}
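To make the effect of the CONSTRUCT query concrete, here is a rough plain-Python emulation over the direct-mapping triples held as tuples. It illustrates the query's semantics, not how a SPARQL engine works: one solution per matching book, an IRI minted by concatenating a prefix with the ISBN (the prefix passed in is a placeholder, since the slide elides it), and a fresh blank node per solution for the author, as a CONSTRUCT template does.

```python
from itertools import count


def reshape(triples, id_prefix):
    """Rough emulation of the CONSTRUCT query on tuple triples.

    Literals are plain Python strings; IRIs are written <...>."""
    def value(subject, predicate):
        return next((o for (s, p, o) in triples if s == subject and p == predicate), None)

    fresh = count(1)  # labels for new blank nodes: _:b1, _:b2, ...
    out = []
    for book in [s for (s, p, o) in triples if p == "a" and o == "<Book>"]:
        isbn, title, year, author = (value(book, f"<Book#{c}>")
                                     for c in ("ISBN", "Title", "Year", "Author"))
        if author is None or (author, "a", "<Author>") not in triples:
            continue  # WHERE pattern fails: no solution for this book
        name, homepage = value(author, "<Author#Name>"), value(author, "<Author#Homepage>")
        if None in (isbn, title, year, name, homepage):
            continue
        book_iri = f"<{id_prefix}{isbn}>"  # IRI(CONCAT(prefix, ?isbn))
        x = f"_:b{next(fresh)}"            # _:x is a fresh blank node per solution
        out += [(book_iri, "a:title", title), (book_iri, "a:year", year),
                (book_iri, "a:author", x), (x, "a:name", name),
                (x, "a:homepage", f"<{homepage}>")]  # IRI(?homepage)
    return out


source = [
    ("<Book/ID=0006511409X#_>", "a", "<Book>"),
    ("<Book/ID=0006511409X#_>", "<Book#ISBN>", "0006511409X"),
    ("<Book/ID=0006511409X#_>", "<Book#Title>", "The Glass Palace"),
    ("<Book/ID=0006511409X#_>", "<Book#Year>", "2000"),
    ("<Book/ID=0006511409X#_>", "<Book#Author>", "<Author/ID=id_xyz#_>"),
    ("<Author/ID=id_xyz#_>", "a", "<Author>"),
    ("<Author/ID=id_xyz#_>", "<Author#Name>", "Ghosh, Amitav"),
    ("<Author/ID=id_xyz#_>", "<Author#Homepage>", "http://www.amitavghosh.com"),
]
for triple in reshape(source, "http://books.example.org/isbn/"):  # hypothetical prefix
    print(triple)
```

The three problems listed with the direct-mapping output are addressed: new vocabulary terms, an IRI for the homepage, and a blank node for the author.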
3rd floor - Publication
Datalift Platform

V1 to be released in September, with the following expected features:
- Modular architecture
- Raw conversion module: relational DB (Direct Mapping approach), CSV,
XML (based on a user-specified XSLT transformation)
- Selection module: LOV repository, automatic candidate vocabulary
proposal using ontology matching from the raw data schema, vocabulary
navigation tool, vocabulary usage metrics, sample data for each vocabulary
- Conversion (according to the schema): RDF2RDF conversion module
based on SPARQL CONSTRUCT (manual editing), vocabulary mapping
facility (textual)
- Interlinking and alignment: a Silk interface, integration of the
Alignment API
- Publication: Sesame API, management of informational vs non-informational
resources
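The selection module's automatic candidate vocabulary proposal relies on ontology matching, whose details are not given here. As a rough stand-in, the sketch ranks vocabulary terms by string similarity to raw column names using Python's `difflib.SequenceMatcher`; the term list and the 0.6 cut-off are assumptions for illustration, and real matchers also use structure and documentation.

```python
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case and drop separators so 'Family_Name' and 'familyName' compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def propose_terms(columns, vocabulary_terms, cutoff=0.6):
    """For each column, candidate terms sorted by decreasing similarity."""
    proposals = {}
    for column in columns:
        scored = []
        for term in vocabulary_terms:
            local_name = term.split(":", 1)[-1]  # 'foaf:familyName' -> 'familyName'
            score = SequenceMatcher(None, normalise(column), normalise(local_name)).ratio()
            if score >= cutoff:
                scored.append((round(score, 2), term))
        proposals[column] = sorted(scored, reverse=True)
    return proposals


terms = ["foaf:name", "foaf:familyName", "foaf:homepage", "dcterms:title", "dcterms:date"]
for column, candidates in propose_terms(["Family_Name", "Home page", "Title"], terms).items():
    print(column, candidates)
```

The output is a ranked shortlist for the user to confirm, not an automatic decision.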
Datalift Platform
4th floor - Interconnection
Web of data and links
- Without links, there is no web, only data silos
- Many types of links: the edges of the Web of
  data graph are labeled
- Some links are built during the selection phase:
  links to reference datasets
- Here we study one particular type of link:
  equivalence links.
owl:sameAs
- states that two URIs denote the same resource (logical identity)
- The quality of the available links is not always
  optimal
Other types of links: owl:differentFrom,
 rdfs:seeAlso
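Because owl:sameAs is symmetric and transitive, links accumulate into identity classes, and a single wrong link merges two distinct entities. A union-find sketch makes this visible; the geo: and stats: URIs are hypothetical and the erroneous link is invented for the example.

```python
class SameAsClosure:
    """Union-find over URIs: each set is one owl:sameAs equivalence class."""

    def __init__(self):
        self.parent = {}

    def find(self, uri):
        """Representative of the class of `uri` (with path halving)."""
        self.parent.setdefault(uri, uri)
        while self.parent[uri] != uri:
            self.parent[uri] = self.parent[self.parent[uri]]
            uri = self.parent[uri]
        return uri

    def same_as(self, a, b):
        """Record the triple  a owl:sameAs b  by merging the two classes."""
        self.parent[self.find(a)] = self.find(b)

    def classes(self):
        groups = {}
        for uri in list(self.parent):
            groups.setdefault(self.find(uri), set()).add(uri)
        return list(groups.values())


links = SameAsClosure()
links.same_as("dbpedia:Paris", "geo:paris-fr")
links.same_as("geo:paris-fr", "stats:paris")
links.same_as("dbpedia:Paris,_Texas", "stats:paris")  # one wrong link...
print(links.find("dbpedia:Paris") == links.find("dbpedia:Paris,_Texas"))  # True: both cities merged
```

This is why link quality matters more for owl:sameAs than for weaker links such as rdfs:seeAlso, which carry no identity semantics.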




How to link data?
Example Silk link specification
<Silk>
 <Prefix id="rdf" namespace="http://www.w3.org/1999/02/22-rdf-syntax-ns#" />
 <Prefix id="rdfs" namespace="http://www.w3.org/2000/01/rdf-schema#" />
 <Prefix id="owl" namespace="http://www.w3.org/2002/07/owl#" />
 <Prefix id="dbpedia" namespace="http://dbpedia.org/ontology/" />
 <Prefix id="gn" namespace="http://www.geonames.org/ontology#" />

 <DataSource id="dbpedia">
  <EndpointURI>http://demo_sparql_server1/sparql</EndpointURI>
  <Graph>http://dbpedia.org</Graph>
 </DataSource>

 <DataSource id="geonames">
  <EndpointURI>http://demo_sparql_server2/sparql</EndpointURI>
  <Graph>http://sws.geonames.org/</Graph>
 </DataSource>

 <Thresholds accept="0.9" verify="0.7" />
 <Output acceptedLinks="accepted_links.n3"
   verifyLinks="verify_links.n3"
   mode="truncate" />

 <Interlink id="cities">
  <LinkType>owl:sameAs</LinkType>
  <SourceDataset dataSource="dbpedia" var="a">
   <RestrictTo>
    ?a rdf:type dbpedia:City
   </RestrictTo>
  </SourceDataset>
  <TargetDataset dataSource="geonames" var="b">
   <RestrictTo>
    ?b rdf:type gn:P
   </RestrictTo>
  </TargetDataset>
  <LinkCondition>
   <AVG>
    <Compare metric="jaroSimilarity">
     <Param name="str1" path="?a/rdfs:label" />
     <Param name="str2" path="?b/gn:name" />
    </Compare>
    <Compare metric="numSimilarity">
     <Param name="num1" path="?a/dbpedia:populationTotal" />
     <Param name="num2" path="?b/gn:population" />
    </Compare>
   </AVG>
  </LinkCondition>
 </Interlink>
</Silk>
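The link condition can be mimicked in Python to see how the thresholds act. The Jaro similarity below follows the standard definition; the numeric similarity (ratio of the smaller to the larger value) is an assumption, since the slide does not define Silk's numSimilarity, and treating the thresholds as inclusive is also assumed. Labels and populations in the usage lines are made up.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(0, max(len1, len2) // 2 - 1)  # how far apart matching chars may be
    used1, used2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the matched characters that appear in a different order.
    out_of_order, j = 0, 0
    for i in range(len1):
        if used1[i]:
            while not used2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def num_similarity(a: float, b: float) -> float:
    """Assumed metric: min/max ratio for positive values; other unequal pairs score 0."""
    if a == b:
        return 1.0
    if a <= 0 or b <= 0:
        return 0.0
    return min(a, b) / max(a, b)


def link_decision(label_a, label_b, pop_a, pop_b, accept=0.9, verify=0.7):
    """AVG of the two comparisons, then the accept/verify thresholds."""
    score = (jaro(label_a, label_b) + num_similarity(pop_a, pop_b)) / 2
    if score >= accept:
        return "accepted", score
    if score >= verify:
        return "verify", score
    return "rejected", score


print(link_decision("Montpellier", "Montpelier", 1000, 1000))       # typo tolerated
print(link_decision("Springfield", "Springfield", 300, 500))        # goes to manual check
print(link_decision("Paris", "Parys", 2_200_000, 48_000))          # similar names, different places
```

Averaging lets a strong population match compensate for a small spelling difference, while a population mismatch pulls a near-identical label below the verify threshold.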
Where to find links?
Towards automatic interlinking
Some of the Silk specification fields we have seen could be
 avoided by:
- Using alignments between ontologies
- Detecting discriminating properties
- Indicating comparison methods by attaching metadata
   to ontologies
-> … ongoing work in Datalift
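Detecting discriminating properties can be approximated by measuring how often a property's value singles out one resource. The sketch scores each property by the fraction of described resources whose value is unique to them, a simple key-discovery heuristic rather than the method being developed in Datalift; the sample descriptions are invented.

```python
from collections import Counter, defaultdict


def discriminability(resources):
    """Score each property in [0, 1].

    `resources` maps a resource URI to a dict {property: value}.
    1.0 means every resource described by the property has a distinct value."""
    values_by_prop = defaultdict(list)
    for description in resources.values():
        for prop, value in description.items():
            values_by_prop[prop].append(value)
    scores = {}
    for prop, values in values_by_prop.items():
        freq = Counter(values)
        unique = sum(1 for v in values if freq[v] == 1)
        scores[prop] = unique / len(values)
    return scores


cities = {
    "ex:c1": {"rdfs:label": "Montpellier", "ex:country": "FR", "ex:code": "A1"},
    "ex:c2": {"rdfs:label": "Paris", "ex:country": "FR", "ex:code": "B2"},
    "ex:c3": {"rdfs:label": "Paris", "ex:country": "US", "ex:code": "C3"},
    "ex:c4": {"rdfs:label": "Lyon", "ex:country": "FR"},
}
for prop, score in sorted(discriminability(cities).items(), key=lambda kv: -kv[1]):
    print(f"{score:.2f} {prop}")
```

A high-scoring property such as ex:code is a good candidate for the comparison in a link specification; a low-scoring one such as ex:country is better suited to restricting candidates.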




5th floor - Applications
phase 2: publishing datasets
          validate the platform with real data
Research objectives
§ Methods and metrics for selecting schemas
§ Trade-off between specific and generic vocabularies
§ Data conversion and URI design patterns
§ Automatic data interlinking
§ Provenance and rights management
§ Integration, architecture and scalability
Who?

2010-2013
http://labs.mondeca.com/dataset/lov/index.html
http://labs.mondeca.com/vocab/voaf/
The wider French landscape

●   Regards Citoyens
●   Direction de l’information légale et administrative
●   Fédération des parcs naturels régionaux de France
●   Eurostat
●   Cities of Montpellier, Bordeaux, Rennes, …
●   Data Publica
●   EtatLab
LIRMM D2R Server
http://data.lirmm.fr/nosdeputes/
DATALIFT



  next floor: « the web of data »
Credits

This presentation was made possible by the work of the Datalift team.
It can be freely distributed under the Creative Commons licence BY-NC-SA 3.0.

More Related Content

What's hot

Semantic Pipes and Semantic Mashups
Semantic Pipes and Semantic MashupsSemantic Pipes and Semantic Mashups
Semantic Pipes and Semantic Mashups
giurca
 

What's hot (19)

Rdf Overview Presentation
Rdf Overview PresentationRdf Overview Presentation
Rdf Overview Presentation
 
Knowledge graph construction with a façade - The SPARQL Anything Project
Knowledge graph construction with a façade - The SPARQL Anything ProjectKnowledge graph construction with a façade - The SPARQL Anything Project
Knowledge graph construction with a façade - The SPARQL Anything Project
 
The SPARQL Anything project
The SPARQL Anything projectThe SPARQL Anything project
The SPARQL Anything project
 
RDF, linked data and semantic web
RDF, linked data and semantic webRDF, linked data and semantic web
RDF, linked data and semantic web
 
Trying SPARQL Anything with MEI
Trying SPARQL Anything with MEITrying SPARQL Anything with MEI
Trying SPARQL Anything with MEI
 
Rdf
RdfRdf
Rdf
 
5 rdfs
5 rdfs5 rdfs
5 rdfs
 
Ist16-04 An introduction to RDF
Ist16-04 An introduction to RDF Ist16-04 An introduction to RDF
Ist16-04 An introduction to RDF
 
PhD thesis defense: Large-scale multilingual knowledge extraction, publishin...
PhD thesis defense:  Large-scale multilingual knowledge extraction, publishin...PhD thesis defense:  Large-scale multilingual knowledge extraction, publishin...
PhD thesis defense: Large-scale multilingual knowledge extraction, publishin...
 
Linked Open Data: A simple how-to
Linked Open Data: A simple how-toLinked Open Data: A simple how-to
Linked Open Data: A simple how-to
 
Connections that work: Linked Open Data demystified
Connections that work: Linked Open Data demystifiedConnections that work: Linked Open Data demystified
Connections that work: Linked Open Data demystified
 
Rdf
RdfRdf
Rdf
 
Linked Open Data Visualization
Linked Open Data VisualizationLinked Open Data Visualization
Linked Open Data Visualization
 
Wi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX toolWi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX tool
 
Semantic Pipes and Semantic Mashups
Semantic Pipes and Semantic MashupsSemantic Pipes and Semantic Mashups
Semantic Pipes and Semantic Mashups
 
Semantic Technologies in ST&DL
Semantic Technologies in ST&DLSemantic Technologies in ST&DL
Semantic Technologies in ST&DL
 
Development of Semantic Web based Disaster Management System
Development of Semantic Web based Disaster Management SystemDevelopment of Semantic Web based Disaster Management System
Development of Semantic Web based Disaster Management System
 
Debunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsDebunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative Facts
 
Oke
OkeOke
Oke
 

Viewers also liked

Getting to social roi
Getting to social roiGetting to social roi
Getting to social roi
Critical Mass
 
The Modern A Formation[1]
The Modern A Formation[1]The Modern A Formation[1]
The Modern A Formation[1]
Tom Neuman
 
HaagaHelia Ebusiness presentation
HaagaHelia Ebusiness presentationHaagaHelia Ebusiness presentation
HaagaHelia Ebusiness presentation
tonnitommi
 
Receptor ask a 433 mhz
Receptor ask a 433 mhzReceptor ask a 433 mhz
Receptor ask a 433 mhz
Amaury Méndez
 
Micro Sociology Of Networks
Micro Sociology Of NetworksMicro Sociology Of Networks
Micro Sociology Of Networks
Critical Mass
 
Pothiz july-2010
Pothiz july-2010Pothiz july-2010
Pothiz july-2010
Pothi.com
 

Viewers also liked (20)

Formatieve toetsing mbo
Formatieve toetsing mboFormatieve toetsing mbo
Formatieve toetsing mbo
 
Getting to social roi
Getting to social roiGetting to social roi
Getting to social roi
 
Lecture to the Interaction Design Center
Lecture to the Interaction Design CenterLecture to the Interaction Design Center
Lecture to the Interaction Design Center
 
The Modern A Formation[1]
The Modern A Formation[1]The Modern A Formation[1]
The Modern A Formation[1]
 
Mjedi101109
Mjedi101109Mjedi101109
Mjedi101109
 
Whrrl
WhrrlWhrrl
Whrrl
 
HaagaHelia Ebusiness presentation
HaagaHelia Ebusiness presentationHaagaHelia Ebusiness presentation
HaagaHelia Ebusiness presentation
 
Receptor ask a 433 mhz
Receptor ask a 433 mhzReceptor ask a 433 mhz
Receptor ask a 433 mhz
 
XI UEEF
XI UEEFXI UEEF
XI UEEF
 
E portfolios and pl es in teacher training
E portfolios and pl es in teacher trainingE portfolios and pl es in teacher training
E portfolios and pl es in teacher training
 
Experiências de aprendizagem aberta, flexível e a distância para a 4ª revoluç...
Experiências de aprendizagem aberta, flexível e a distância para a 4ª revoluç...Experiências de aprendizagem aberta, flexível e a distância para a 4ª revoluç...
Experiências de aprendizagem aberta, flexível e a distância para a 4ª revoluç...
 
Micro Sociology Of Networks
Micro Sociology Of NetworksMicro Sociology Of Networks
Micro Sociology Of Networks
 
WHSZ-Creative-Commons
WHSZ-Creative-CommonsWHSZ-Creative-Commons
WHSZ-Creative-Commons
 
UOM-2014
UOM-2014UOM-2014
UOM-2014
 
Pothiz july-2010
Pothiz july-2010Pothiz july-2010
Pothiz july-2010
 
Najbrzydszy Mikołaj, wspomnienia z dzieciństwa...
Najbrzydszy Mikołaj, wspomnienia z dzieciństwa...Najbrzydszy Mikołaj, wspomnienia z dzieciństwa...
Najbrzydszy Mikołaj, wspomnienia z dzieciństwa...
 
Edutec2016
Edutec2016Edutec2016
Edutec2016
 
Electricidad basica
Electricidad basicaElectricidad basica
Electricidad basica
 
The impact of social media on innovation culture
The impact of social media on innovation cultureThe impact of social media on innovation culture
The impact of social media on innovation culture
 
Alfabet Grec
Alfabet GrecAlfabet Grec
Alfabet Grec
 

Similar to 20110728 datalift-rpi-troy

The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge stripped
Sören Auer
 
INSPIRE Hackathon Webinar Intro to Linked Data and Semantics
INSPIRE Hackathon Webinar   Intro to Linked Data and SemanticsINSPIRE Hackathon Webinar   Intro to Linked Data and Semantics
INSPIRE Hackathon Webinar Intro to Linked Data and Semantics
plan4all
 
Intro to-technologies-Green-City-Hackathon-Athens
Intro to-technologies-Green-City-Hackathon-AthensIntro to-technologies-Green-City-Hackathon-Athens
Intro to-technologies-Green-City-Hackathon-Athens
Stoitsis Giannis
 
Web 3 Mark Greaves
Web 3 Mark GreavesWeb 3 Mark Greaves
Web 3 Mark Greaves
Mediabistro
 

Similar to 20110728 datalift-rpi-troy (20)

Usage of Linked Data: Introduction and Application Scenarios
Usage of Linked Data: Introduction and Application ScenariosUsage of Linked Data: Introduction and Application Scenarios
Usage of Linked Data: Introduction and Application Scenarios
 
Datalift: A Catalyser for the Web of Data - Francois Scharffe
Datalift: A Catalyser for the Web of Data - Francois ScharffeDatalift: A Catalyser for the Web of Data - Francois Scharffe
Datalift: A Catalyser for the Web of Data - Francois Scharffe
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge stripped
 
State of the Semantic Web
State of the Semantic WebState of the Semantic Web
State of the Semantic Web
 
Enterprise knowledge graphs
Enterprise knowledge graphsEnterprise knowledge graphs
Enterprise knowledge graphs
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
 
Ontology development
Ontology developmentOntology development
Ontology development
 
Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012
 
Why I don't use Semantic Web technologies anymore, event if they still influe...
Why I don't use Semantic Web technologies anymore, event if they still influe...Why I don't use Semantic Web technologies anymore, event if they still influe...
Why I don't use Semantic Web technologies anymore, event if they still influe...
 
INSPIRE Hackathon Webinar Intro to Linked Data and Semantics
INSPIRE Hackathon Webinar   Intro to Linked Data and SemanticsINSPIRE Hackathon Webinar   Intro to Linked Data and Semantics
INSPIRE Hackathon Webinar Intro to Linked Data and Semantics
 
From ontology to wiki
From ontology to wikiFrom ontology to wiki
From ontology to wiki
 
Semantic web
Semantic web Semantic web
Semantic web
 
Intro to-technologies-Green-City-Hackathon-Athens
Intro to-technologies-Green-City-Hackathon-AthensIntro to-technologies-Green-City-Hackathon-Athens
Intro to-technologies-Green-City-Hackathon-Athens
 
Publishing and Using Linked Open Data - Day 1
Publishing and Using Linked Open Data - Day 1 Publishing and Using Linked Open Data - Day 1
Publishing and Using Linked Open Data - Day 1
 
VALA Tech Camp 2017: Intro to Wikidata & SPARQL
VALA Tech Camp 2017: Intro to Wikidata & SPARQLVALA Tech Camp 2017: Intro to Wikidata & SPARQL
VALA Tech Camp 2017: Intro to Wikidata & SPARQL
 
LOD2: Guest presentation: French datalift project
LOD2: Guest presentation: French datalift projectLOD2: Guest presentation: French datalift project
LOD2: Guest presentation: French datalift project
 
Datalift lod2-paris-24032011
Datalift lod2-paris-24032011Datalift lod2-paris-24032011
Datalift lod2-paris-24032011
 
Integrating a Domain Ontology Development Environment and an Ontology Search ...
Integrating a Domain Ontology Development Environment and an Ontology Search ...Integrating a Domain Ontology Development Environment and an Ontology Search ...
Integrating a Domain Ontology Development Environment and an Ontology Search ...
 
Exploring the Semantic Web
Exploring the Semantic WebExploring the Semantic Web
Exploring the Semantic Web
 
Web 3 Mark Greaves
Web 3 Mark GreavesWeb 3 Mark Greaves
Web 3 Mark Greaves
 

More from François Scharffe

Publication et intégration de données ouvertes
Publication et intégration de données ouvertesPublication et intégration de données ouvertes
Publication et intégration de données ouvertes
François Scharffe
 
The Open Data Walk of Fame - from raw open data to five stars interlinked dat...
The Open Data Walk of Fame - from raw open data to five stars interlinked dat...The Open Data Walk of Fame - from raw open data to five stars interlinked dat...
The Open Data Walk of Fame - from raw open data to five stars interlinked dat...
François Scharffe
 
20120313 coepia-mise-à-disposition-et-valorisation-des-données-publiques
20120313 coepia-mise-à-disposition-et-valorisation-des-données-publiques20120313 coepia-mise-à-disposition-et-valorisation-des-données-publiques
20120313 coepia-mise-à-disposition-et-valorisation-des-données-publiques
François Scharffe
 

More from François Scharffe (9)

Word embeddings as a service - PyData NYC 2015
Word embeddings as a service -  PyData NYC 2015Word embeddings as a service -  PyData NYC 2015
Word embeddings as a service - PyData NYC 2015
 
Publication et intégration de données ouvertes
Publication et intégration de données ouvertesPublication et intégration de données ouvertes
Publication et intégration de données ouvertes
 
The Open Data Walk of Fame - from raw open data to five stars interlinked dat...
The Open Data Walk of Fame - from raw open data to five stars interlinked dat...The Open Data Walk of Fame - from raw open data to five stars interlinked dat...
The Open Data Walk of Fame - from raw open data to five stars interlinked dat...
 
20120313 coepia-mise-à-disposition-et-valorisation-des-données-publiques
20120313 coepia-mise-à-disposition-et-valorisation-des-données-publiques20120313 coepia-mise-à-disposition-et-valorisation-des-données-publiques
20120313 coepia-mise-à-disposition-et-valorisation-des-données-publiques
 
Cemagref
CemagrefCemagref
Cemagref
 
Melinda: Methods and tools for Web Data Interlinking
Melinda: Methods and tools for Web Data InterlinkingMelinda: Methods and tools for Web Data Interlinking
Melinda: Methods and tools for Web Data Interlinking
 
Méthodes et outils pour interrelier le web des données
Méthodes et outils pour interrelier le web des donnéesMéthodes et outils pour interrelier le web des données
Méthodes et outils pour interrelier le web des données
 
Linked Data Integration
Linked Data IntegrationLinked Data Integration
Linked Data Integration
 
Ontology alignment representation
Ontology alignment representationOntology alignment representation
Ontology alignment representation
 

Recently uploaded

The basics of sentences session 4pptx.pptx
The basics of sentences session 4pptx.pptxThe basics of sentences session 4pptx.pptx
The basics of sentences session 4pptx.pptx
heathfieldcps1
 
SURVEY I created for uni project research
SURVEY I created for uni project researchSURVEY I created for uni project research
SURVEY I created for uni project research
CaitlinCummins3
 

Recently uploaded (20)

Spring gala 2024 photo slideshow - Celebrating School-Community Partnerships
Spring gala 2024 photo slideshow - Celebrating School-Community PartnershipsSpring gala 2024 photo slideshow - Celebrating School-Community Partnerships
Spring gala 2024 photo slideshow - Celebrating School-Community Partnerships
 
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT VẬT LÝ 2024 - TỪ CÁC TRƯỜNG, TRƯ...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT VẬT LÝ 2024 - TỪ CÁC TRƯỜNG, TRƯ...TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT VẬT LÝ 2024 - TỪ CÁC TRƯỜNG, TRƯ...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT VẬT LÝ 2024 - TỪ CÁC TRƯỜNG, TRƯ...
 
The basics of sentences session 4pptx.pptx
The basics of sentences session 4pptx.pptxThe basics of sentences session 4pptx.pptx
The basics of sentences session 4pptx.pptx
 
PSYPACT- Practicing Over State Lines May 2024.pptx
PSYPACT- Practicing Over State Lines May 2024.pptxPSYPACT- Practicing Over State Lines May 2024.pptx
PSYPACT- Practicing Over State Lines May 2024.pptx
 
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
 
SURVEY I created for uni project research
SURVEY I created for uni project researchSURVEY I created for uni project research
SURVEY I created for uni project research
 
How to Analyse Profit of a Sales Order in Odoo 17
How to Analyse Profit of a Sales Order in Odoo 17How to Analyse Profit of a Sales Order in Odoo 17
How to Analyse Profit of a Sales Order in Odoo 17
 
philosophy and it's principles based on the life
philosophy and it's principles based on the lifephilosophy and it's principles based on the life
philosophy and it's principles based on the life
 
Word Stress rules esl .pptx
Word Stress rules esl               .pptxWord Stress rules esl               .pptx
Word Stress rules esl .pptx
 
How to Manage Closest Location in Odoo 17 Inventory
How to Manage Closest Location in Odoo 17 InventoryHow to Manage Closest Location in Odoo 17 Inventory
How to Manage Closest Location in Odoo 17 Inventory
 
Removal Strategy _ FEFO _ Working with Perishable Products in Odoo 17
Removal Strategy _ FEFO _ Working with Perishable Products in Odoo 17Removal Strategy _ FEFO _ Working with Perishable Products in Odoo 17
Removal Strategy _ FEFO _ Working with Perishable Products in Odoo 17
 
Implanted Devices - VP Shunts: EMGuidewire's Radiology Reading Room
Implanted Devices - VP Shunts: EMGuidewire's Radiology Reading RoomImplanted Devices - VP Shunts: EMGuidewire's Radiology Reading Room
Implanted Devices - VP Shunts: EMGuidewire's Radiology Reading Room
 
ANTI PARKISON DRUGS.pptx
ANTI         PARKISON          DRUGS.pptxANTI         PARKISON          DRUGS.pptx
ANTI PARKISON DRUGS.pptx
 
diagnosting testing bsc 2nd sem.pptx....
diagnosting testing bsc 2nd sem.pptx....diagnosting testing bsc 2nd sem.pptx....
diagnosting testing bsc 2nd sem.pptx....
 
UChicago CMSC 23320 - The Best Commit Messages of 2024
UChicago CMSC 23320 - The Best Commit Messages of 2024UChicago CMSC 23320 - The Best Commit Messages of 2024
UChicago CMSC 23320 - The Best Commit Messages of 2024
 
Envelope of Discrepancy in Orthodontics: Enhancing Precision in Treatment
 Envelope of Discrepancy in Orthodontics: Enhancing Precision in Treatment Envelope of Discrepancy in Orthodontics: Enhancing Precision in Treatment
Envelope of Discrepancy in Orthodontics: Enhancing Precision in Treatment
 
Features of Video Calls in the Discuss Module in Odoo 17
Features of Video Calls in the Discuss Module in Odoo 17Features of Video Calls in the Discuss Module in Odoo 17
Features of Video Calls in the Discuss Module in Odoo 17
 
Exploring Gemini AI and Integration with MuleSoft | MuleSoft Mysore Meetup #45
Exploring Gemini AI and Integration with MuleSoft | MuleSoft Mysore Meetup #45Exploring Gemini AI and Integration with MuleSoft | MuleSoft Mysore Meetup #45
Exploring Gemini AI and Integration with MuleSoft | MuleSoft Mysore Meetup #45
 
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
 
Championnat de France de Tennis de table/
Championnat de France de Tennis de table/Championnat de France de Tennis de table/
Championnat de France de Tennis de table/
 

20110728 datalift-rpi-troy

  • 1. The Datalift Project Ontologies, Datasets, Tools and Methodologies to Publish and Interlink ★★★★★ Datasets François Scharffe University of Montpellier, LIRMM, INRIA francois.scharffe@lirmm.fr @lechatpito With the help of the Datalift team And the support of the French National Research Agency RPI 28/07/2011 1
  • 2. State of government open data (September 2010…) You’re here
  • 3. State of government open data (June 2011)
  • 4. April 2008 September 2008 May 2007 Linking Open Data March 2009 September 2010 Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  • 8. Linked data principles (W3C) § Use the RDF format § Use URIs to name things § Use HTTP URIs (URLs) so that people can look up those names § Provide information (HTML, RDF) when those URIs are dereferenced § Include in this information links to other URIs so that more data can be discovered. Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html
  • 9. Goal of Datalift: from raw published data to interconnected semantic data
  • 10. Phase 1: opening the data. Develop a platform that eases data publication
  • 11. Welcome aboard the data lift. Floors, from the ground up: raw data, vocabulary selection, data conversion, publication infrastructure, interconnection, applications; at the top, published and interlinked data on the Web
  • 12. Example publication process: environmental, weather and geological datasets; oil industry, geography, equipment; SPARQL, content negotiation, URI dereferencing
  • 13. 1st floor - Selection (SemWebPro 18/01/2011)
  • 14. Vocabularies of my friends... Ø What is a (good) vocabulary for linked data? § Usability criteria: simplicity, visibility, sustainability, integration, coherence … Ø Different types of vocabularies § metadata, reference, domain, generalist … § The pillars of Linked Data: Dublin Core, FOAF, SKOS Ø Good and less good practices § Example: BBC Programmes vs legislation.gov.uk § Vocabulary of a Friend: networked vocabularies Ø Linguistic problems § 99% of existing vocabularies are in English § Terminological approach: which vocabularies for « Event », « Organization »?
  • 15. Did you say « vocabulary »? … And why not « ontology »? § « schema » or « metadata schema »? § Or « model » (of data? of the world?) Ø All these terms are used and justifiable; they are all « vocabularies » § They define types of objects (or classes) and the properties (or attributes) attached to these objects. § Types and attributes are logically defined and named using natural language § A (semantic) vocabulary is an explicit formalization of concepts existing in natural language
  • 16. Vocabularies for linked data Ø Are meant to describe resources in RDF Ø Are based on one of the standard W3C languages § RDF Schema (RDFS) • For vocabularies without much logical complexity § OWL • For more complex ontological constructs § These two languages are (almost) compatible Ø They can be composed « ad libitum » § One can reuse a few elements of a vocabulary § The original semantics have to be respected
  • 17. What makes a good vocabulary? Ø A good vocabulary is a used vocabulary § Data published on CKAN gives an idea of vocabulary usage § Example: list of datasets using FOAF http://xmlns.com/foaf/0.1/ Ø Other usability criteria § Simplicity and readability in natural language § Documentation of elements (definitions in natural language) § Visibility and sustainability of the publication § Flexibility and extensibility § Semantic integration (with other vocabularies) § Social integration (with the user community)
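The usage criterion can be turned into a simple metric. A minimal sketch, assuming each dataset is summarized as the list of term URIs it uses (the dataset names and term lists below are hypothetical, not actual CKAN records): for each vocabulary namespace, count how many datasets use at least one of its terms.

```python
from collections import Counter

# Namespaces of vocabularies mentioned on these slides (Dublin Core terms, FOAF, SKOS).
NAMESPACES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

def vocabulary_usage(datasets):
    """Return a Counter: vocabulary prefix -> number of datasets using it.

    `datasets` maps a dataset name to the iterable of term URIs it uses;
    a dataset is counted at most once per vocabulary.
    """
    usage = Counter()
    for terms in datasets.values():
        used = {prefix for prefix, ns in NAMESPACES.items()
                for term in terms if term.startswith(ns)}
        usage.update(used)
    return usage

# Hypothetical dataset summaries, for illustration only.
datasets = {
    "ds-a": ["http://xmlns.com/foaf/0.1/name", "http://purl.org/dc/terms/title"],
    "ds-b": ["http://xmlns.com/foaf/0.1/Person", "http://xmlns.com/foaf/0.1/knows"],
    "ds-c": ["http://www.w3.org/2004/02/skos/core#prefLabel"],
}
print(vocabulary_usage(datasets))
```

Counting datasets rather than triples keeps one very large dataset from dominating the ranking; a real metric would also weigh dataset size and link counts.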
  • 18. A vocabulary is also a community Ø Bad (but common) practice ● Building an isolated vocabulary – For example as a research project – Without basing it on any existing vocabulary § Publishing it (or not) and then forgetting about it § Not caring about its users Ø A good vocabulary has an organic life § Users and use cases § Revisions and extensions § Like a « natural » vocabulary
  • 19. Types of vocabularies Ø Metadata vocabularies § Used to annotate other vocabularies • Dublin Core, VANN, ccREL, Status, VoID Ø Reference vocabularies § Provide « common » classes and properties • FOAF, Event, Time, Org Ontology Ø Domain vocabularies § Specific to a domain of knowledge • Geonames, Music Ontology, WildLife Ontology Ø « Generalist » vocabularies § Describe « everything » at an arbitrary level of detail • DBpedia Ontology, Cyc Ontology, SUMO
  • 20. Vocabulary of a Friend Ø http://www.mondeca.com/foaf/voaf Ø A simple vocabulary... Ø To represent interconnections between vocabularies Ø A unique entry point to the vocabularies and datasets of the Linked Data cloud Ø Ongoing work in Datalift
  • 21. 2nd floor - Conversion
  • 22. Reference datasets, URI design ● Providing reference datasets for the French ecosystem: geographical, topological, statistical, political ● Providing URI design guidelines ● Opaque or transparent URIs? ● Usage of accents in URIs ● Distinction between resources: http://dbpedia.org/resource/Paris, documents: http://dbpedia.org/page/Paris, data: http://dbpedia.org/data/Paris … all served with content negotiation
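The resource/document/data split is commonly served by answering a request for the resource URI with a 303 See Other redirect to either the HTML page or the RDF data, depending on the Accept header. A minimal sketch of that decision, assuming the DBpedia-style path layout shown above and a deliberately naive Accept check (no q-values); the `negotiate` function is hypothetical. It also shows one option for accents: percent-encoding the UTF-8 bytes of the local name (an IRI could keep the raw characters instead).

```python
from urllib.parse import quote

# Media types treated as "RDF" in this sketch (an assumption, not an exhaustive list).
RDF_TYPES = ("text/turtle", "application/rdf+xml", "application/n-triples")

def negotiate(resource_uri, accept):
    """Return (status, location) for a GET on a /resource/ URI.

    Naive: any RDF media type in the Accept header wins, otherwise HTML.
    """
    if "/resource/" not in resource_uri:
        raise ValueError("not a resource URI")
    target = "/data/" if any(t in accept for t in RDF_TYPES) else "/page/"
    return 303, resource_uri.replace("/resource/", target, 1)

print(negotiate("http://dbpedia.org/resource/Paris", "text/turtle"))
# (303, 'http://dbpedia.org/data/Paris')
print(negotiate("http://dbpedia.org/resource/Paris", "text/html"))
# (303, 'http://dbpedia.org/page/Paris')

# Accented local names: percent-encode the UTF-8 bytes (example.org is a placeholder).
print("http://example.org/resource/" + quote("Béziers"))
# http://example.org/resource/B%C3%A9ziers
```

The 303 keeps the non-information resource (Paris itself) distinct from the documents describing it, which is the distinction the slide draws.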
  • 23. Many tools exist! csv2rdf4lod
  • 24. Direct Mapping from a relational database to RDF. Define a standard transformation from a relational database to RDF. The relational schema is used: • The cells of a tuple (row) produce triples with a common subject • Each cell produces one triple: the column gives the predicate, the cell value gives the object • Foreign keys produce links to rows of other tables, so the different tables of the same database are linked together. This gives a standard automatic translation of any relational schema to RDF, based on the database dump. We can then use SPARQL CONSTRUCT queries to adapt vocabularies and URIs.
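The mapping rules can be sketched in a few lines. This is a simplified illustration, not the W3C Direct Mapping algorithm: it mirrors the URI pattern of the Book/Author example (`<Table/Key=value#_>` subjects, `<Table#Column>` predicates), ignores literal escaping, datatypes and composite keys, and takes foreign keys as a hypothetical `fks` argument instead of reading them from the schema. Because the subject is built from the primary-key column name, the book subject uses `ISBN` rather than the `ID` printed in the example.

```python
import sqlite3

BASE = "http://book.example/"

def direct_map(conn, table, pk, fks=None):
    """Map every row of `table` to (subject, predicate, object) strings.

    fks maps a foreign-key column to (referenced table, referenced key column).
    """
    fks = fks or {}
    cur = conn.execute(f"SELECT * FROM {table}")  # table name is trusted in this sketch
    cols = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        rec = dict(zip(cols, row))
        subj = f"<{BASE}{table}/{pk}={rec[pk]}#_>"  # one subject per row
        triples.append((subj, "a", f"<{BASE}{table}>"))
        for col, val in rec.items():
            if val is None:
                continue  # NULL cells produce no triple
            pred = f"<{BASE}{table}#{col}>"
            if col in fks:  # a "jump" to a row of another table
                ref_table, ref_key = fks[col]
                obj = f"<{BASE}{ref_table}/{ref_key}={val}#_>"
            else:
                obj = f'"{val}"'  # everything else becomes a plain literal
            triples.append((subj, pred, obj))
    return triples

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Author (ID TEXT PRIMARY KEY, Name TEXT, Homepage TEXT);
CREATE TABLE Book (ISBN TEXT PRIMARY KEY, Title TEXT, Year INTEGER,
                   Author TEXT REFERENCES Author(ID));
INSERT INTO Author VALUES ('id_xyz', 'Ghosh, Amitav', 'http://www.amitavghosh.com');
INSERT INTO Book VALUES ('0006511409X', 'The Glass Palace', 2000, 'id_xyz');
""")
for t in direct_map(conn, "Book", "ISBN", {"Author": ("Author", "ID")}):
    print(*t, ".")
```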
  • 25. Example. Credits: Ivan Herman, http://ivan-herman.name/2010/11/19/my-first-mapping-from-direct-mapping/
  • 26. Example @base <http://book.example/> . <Book/ID=0006511409X#_> a <Book> ; <Book#ISBN> "0006511409X" ; <Book#Title> "The Glass Palace" ; <Book#Year> "2000" ; <Book#Author> <Author/ID=id_xyz#_> . <Author/ID=id_xyz#_> a <Author> ; <Author#ID> "id_xyz" ; <Author#Name> "Ghosh, Amitav" ; <Author#Homepage> "http://www.amitavghosh.com" . A simple result, but not satisfying: ● we want to use different vocabulary terms (like a:name) ● the direct mapping produces literal objects most of the time, except when there is a “jump” from one table to another (a foreign key) ● the resulting graph should use a blank node for the author, which is not the case in the generated graph. Credits: Ivan Herman, http://ivan-herman.name/2010/11/19/my-first-mapping-from-direct-mapping/
  • 27. Example. Solution: use SPARQL 1.1 CONSTRUCT queries: BASE <http://book.example/> CONSTRUCT { ?id a:title ?title ; a:year ?year ; a:author _:x . _:x a:name ?name ; a:homepage ?hp . } WHERE { SELECT (IRI(CONCAT("http://...", ?isbn)) AS ?id) ?title ?year ?name (IRI(?homepage) AS ?hp) { ?book a <Book> ; <Book#ISBN> ?isbn ; <Book#Title> ?title ; <Book#Year> ?year ; <Book#Author> ?author . ?author a <Author> ; <Author#Name> ?name ; <Author#Homepage> ?homepage . } }
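To see what the CONSTRUCT query does to the graph, here is a plain-Python sketch of the same reshaping over an in-memory list of triples: predicates are renamed to the a: vocabulary, the book IRI is minted from its ISBN, and each author is described through a fresh blank node. The a: namespace and the ISBN prefix are hypothetical placeholders (the query leaves its prefix as "http://..."), each predicate is assumed single-valued, and IRIs and literals are both plain strings here, so the IRI() cast on the homepage is not modeled.

```python
BASE = "http://book.example/"
A = "http://example.org/a#"  # hypothetical namespace standing in for the a: prefix
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def construct(triples, id_prefix):
    """Reshape direct-mapping triples the way the CONSTRUCT query does."""
    props = {}
    for s, p, o in triples:
        props.setdefault(s, {})[p] = o  # assumes one value per (subject, predicate)
    out, n = [], 0
    for po in props.values():
        author = props.get(po.get(BASE + "Book#Author"), {})
        if po.get(RDF_TYPE) != BASE + "Book" or author.get(RDF_TYPE) != BASE + "Author":
            continue  # the WHERE pattern needs a Book linked to an Author
        try:
            isbn, title, year = (po[BASE + "Book#" + k] for k in ("ISBN", "Title", "Year"))
            name, hp = author[BASE + "Author#Name"], author[BASE + "Author#Homepage"]
        except KeyError:
            continue  # a missing property means the pattern does not match
        n += 1
        book, x = id_prefix + isbn, f"_:x{n}"  # fresh blank node for each solution
        out += [
            (book, A + "title", title),
            (book, A + "year", year),
            (book, A + "author", x),
            (x, A + "name", name),
            (x, A + "homepage", hp),
        ]
    return out

b, au = BASE + "Book/ID=0006511409X#_", BASE + "Author/ID=id_xyz#_"
dm = [
    (b, RDF_TYPE, BASE + "Book"),
    (b, BASE + "Book#ISBN", "0006511409X"),
    (b, BASE + "Book#Title", "The Glass Palace"),
    (b, BASE + "Book#Year", "2000"),
    (b, BASE + "Book#Author", au),
    (au, RDF_TYPE, BASE + "Author"),
    (au, BASE + "Author#ID", "id_xyz"),
    (au, BASE + "Author#Name", "Ghosh, Amitav"),
    (au, BASE + "Author#Homepage", "http://www.amitavghosh.com"),
]
for t in construct(dm, "http://example.org/isbn/"):  # hypothetical ISBN prefix
    print(t)
```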
  • 28. 3rd floor - Publication
  • 29. Datalift Platform V1, to be released in September, with expected features: - Modular architecture - Raw conversion module: relational DB (Direct Mapping approach), CSV, XML (based on a user-specified XSLT transformation) - Selection module: LOV repository, automatic proposal of candidate vocabularies using ontology matching from the raw data schema, vocabulary navigation tool, vocabulary usage metrics, sample data for each vocabulary - Conversion (according to the schema): RDF2RDF conversion module based on SPARQL CONSTRUCT (manual editing), vocabulary mapping facility (textual) - Interlinking and alignment: a Silk interface, integration of the Alignment API - Publication: Sesame API, management of informational vs non-informational resources
  • 31. 4th floor - Interconnection
  • 32. Web of data and links - Without links there is no Web, only data silos - Many types of links: the edges of the Web of data graph are labeled - Some links are built during the selection phase: reference datasets - Here we study a particular type of link: equivalence links.
  • 33. owl:sameAs - states a logical identity between two resources - The quality of the available links is not always optimal - Other types of links: owl:differentFrom, rdfs:seeAlso
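Because owl:sameAs is reflexive, symmetric and transitive, a set of sameAs links partitions URIs into equivalence classes: if A sameAs B and B sameAs C, then A sameAs C. This is one reason a single wrong link can merge two unrelated entities. A minimal union-find sketch that computes those classes; apart from the DBpedia URIs, the URIs in the usage are hypothetical.

```python
def same_as_classes(links):
    """Group URIs into owl:sameAs equivalence classes using union-find."""
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving keeps trees shallow
            u = parent[u]
        return u

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two classes
    classes = {}
    for u in parent:
        classes.setdefault(find(u), set()).add(u)
    return list(classes.values())

links = [
    ("http://dbpedia.org/resource/Paris", "http://example.org/geo/paris"),
    ("http://example.org/geo/paris", "http://example.org/stats/75056"),
    ("http://dbpedia.org/resource/Lyon", "http://example.org/geo/lyon"),
]
for c in same_as_classes(links):
    print(sorted(c))
```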
  • 34. How to link data?
  • 39. Example Silk link specification:
      <Silk>
        <Prefix id="rdf" namespace="http://www.w3.org/1999/02/22-rdf-syntax-ns#" />
        <Prefix id="rdfs" namespace="http://www.w3.org/2000/01/rdf-schema#" />
        <Prefix id="owl" namespace="http://www.w3.org/2002/07/owl#" />
        <Prefix id="dbpedia" namespace="http://dbpedia.org/ontology/" />
        <Prefix id="gn" namespace="http://www.geonames.org/ontology#" />
        <DataSource id="dbpedia">
          <EndpointURI>http://demo_sparql_server1/sparql</EndpointURI>
          <Graph>http://dbpedia.org</Graph>
        </DataSource>
        <DataSource id="geonames">
          <EndpointURI>http://demo_sparql_server2/sparql</EndpointURI>
          <Graph>http://sws.geonames.org/</Graph>
        </DataSource>
        <Interlink id="cities">
          <LinkType>owl:sameAs</LinkType>
          <SourceDataset dataSource="dbpedia" var="a">
            <RestrictTo>?a rdf:type dbpedia:City</RestrictTo>
          </SourceDataset>
          <TargetDataset dataSource="geonames" var="b">
            <RestrictTo>?b rdf:type gn:P</RestrictTo>
          </TargetDataset>
          <LinkCondition>
            <AVG>
              <Compare metric="jaroSimilarity">
                <Param name="str1" path="?a/rdfs:label" />
                <Param name="str2" path="?b/gn:name" />
              </Compare>
              <Compare metric="numSimilarity">
                <Param name="num1" path="?a/dbpedia:populationTotal" />
                <Param name="num2" path="?b/gn:population" />
              </Compare>
            </AVG>
          </LinkCondition>
          <Thresholds accept="0.9" verify="0.7" />
          <Output acceptedLinks="accepted_links.n3" verifyLinks="verify_links.n3" mode="truncate" />
        </Interlink>
      </Silk>
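The link condition averages two comparisons, a Jaro similarity between labels and a numeric similarity between populations, then accepts a candidate link at 0.9 and sends it to manual verification at 0.7. A minimal sketch of that decision logic, not Silk's implementation: `jaro` follows the standard Jaro definition, but `num_similarity` (smaller value divided by larger) is an assumed stand-in for Silk's numSimilarity, inclusive thresholds are an assumption, and the city figures are hypothetical.

```python
def jaro(s1, s2):
    """Standard Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    n1, n2 = len(s1), len(s2)
    if n1 == 0 or n2 == 0:
        return 0.0
    window = max(max(n1, n2) // 2 - 1, 0)  # how far apart matching characters may be
    m1, m2 = [False] * n1, [False] * n2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(n2, i + window + 1)):
            if not m2[j] and s2[j] == c:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order in the two strings.
    t, k = 0, 0
    for i in range(n1):
        if m1[i]:
            while not m2[k]:
                k += 1
            if s1[i] != s2[k]:
                t += 1
            k += 1
    transpositions = t / 2
    return (matches / n1 + matches / n2 + (matches - transpositions) / matches) / 3

def num_similarity(x, y):
    """Assumed numeric similarity: smaller value / larger value (1.0 when equal)."""
    if x == y:
        return 1.0
    if x <= 0 or y <= 0:
        return 0.0
    return min(x, y) / max(x, y)

def classify(label_a, label_b, pop_a, pop_b, accept=0.9, verify=0.7):
    """Average both scores, as <AVG> does, and apply the two thresholds."""
    score = (jaro(label_a, label_b) + num_similarity(pop_a, pop_b)) / 2
    if score >= accept:
        return "accepted", score
    if score >= verify:
        return "verify", score
    return "rejected", score

print(classify("Berlin", "Berlin", 3_400_000, 3_426_354))
print(classify("Paris", "Paris, Texas", 2_100_000, 25_000))
```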
  • 40. Where to find links?
  • 41. Towards automatic interlinking. We have seen that some fields of the Silk specification need not be written by hand, by: - using alignments between ontologies - detecting discriminating properties - indicating comparison methods by attaching metadata to ontologies -> … ongoing work in Datalift
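As an illustration of detecting discriminating properties (a simple heuristic chosen for this sketch, not necessarily the Datalift approach), a property can be scored by the share of its subjects it singles out: a subject counts when at least one of its values for the property is shared with no other subject. A score of 1.0 means the property behaves like a key and is a good candidate for a Compare element. The triples below are hypothetical.

```python
from collections import defaultdict

def discriminability(triples):
    """Score each property by the share of its subjects that it singles out."""
    by_prop = defaultdict(lambda: defaultdict(set))  # property -> value -> subjects
    for s, p, o in triples:
        by_prop[p][o].add(s)
    scores = {}
    for p, vals in by_prop.items():
        subjects = set().union(*vals.values())
        # Subjects owning at least one value that no other subject uses.
        singled_out = {next(iter(ss)) for ss in vals.values() if len(ss) == 1}
        scores[p] = len(singled_out) / len(subjects)
    return scores

# Hypothetical city descriptions, for illustration only.
triples = [
    ("ex:c1", "ex:name", "Springfield"), ("ex:c2", "ex:name", "Springfield"),
    ("ex:c3", "ex:name", "Shelbyville"),
    ("ex:c1", "ex:postcode", "11111"), ("ex:c2", "ex:postcode", "22222"),
    ("ex:c3", "ex:postcode", "33333"),
    ("ex:c1", "ex:country", "US"), ("ex:c2", "ex:country", "US"),
    ("ex:c3", "ex:country", "US"),
]
print(sorted(discriminability(triples).items(), key=lambda kv: -kv[1]))
# [('ex:postcode', 1.0), ('ex:name', 0.3333333333333333), ('ex:country', 0.0)]
```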
  • 42. 5th floor - Applications
  • 43. Phase 2: publishing datasets. Validate the platform with real data
  • 44. Research objectives § Methods and metrics for selecting schemas § Tradeoff between specific and generic vocabularies § Data conversion and URI design patterns § Automatic data interlinking § Provenance and rights management § Integration, architecture and scalability
  • 45. Who? W3C © 2010-2013
  • 48. The wider French landscape ● Regards Citoyens ● Direction de l’information légale et administrative ● Fédération des parcs naturels régionaux de France ● Eurostat ● Cities of Montpellier, Bordeaux, Rennes, … ● Data Publica ● EtatLab
  • 54. DATALIFT next floor: « the web of data »
  • 55. Credits: This presentation was made possible by the work of the Datalift team. It can be freely distributed under the Creative Commons BY-NC-SA 3.0 licence.