Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011







    Presentation Transcript

    • Datalift: A Catalyser for the Web of Data. François Scharffe, LIRMM/CNRS/University of Montpellier, @lechatpito. With the help of the Datalift team and the support of the French National Research Agency. FOSDEM, 5/02/2011
    • The data revolution is on its way! As Open Data meets the Semantic Web
    • The promises of linked-data
    • Richer applications: Linked Data Lite | the Web on Steroids 1.0 (iPhone)
    • Richer applications: BBC Programmes
    • More precise search and QA
    • Making your data 5 stars
    • So, how to lift data? How to publish data on the Web as linked data?
        – Basic principles, from Tim Berners-Lee [2006] (Design Issues):
            – Use URIs to identify things (not only documents)
            – Use HTTP URIs
            – When dereferencing URIs, return a description of the resource
            – Include links to other resources on the Web
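The four principles can be illustrated with a very small sketch. It assumes a hypothetical local resource under `http://example.org/id/` and uses an `owl:sameAs` link to DBpedia as the "link to other resources"; the helper names `format_object` and `describe` are made up for this illustration and do not come from Datalift.

```python
# Minimal sketch of the linked data principles, using only the standard library.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical HTTP URI identifying a thing (a city), not a document.
CITY = "http://example.org/id/montpellier"

TRIPLES = [
    (CITY, RDFS_LABEL, ("literal", "Montpellier")),
    # Link to another dataset on the Web.
    (CITY, OWL_SAME_AS, ("uri", "http://dbpedia.org/resource/Montpellier")),
]


def format_object(obj):
    """Serialize an object as an N-Triples URI or plain literal."""
    kind, value = obj
    if kind == "uri":
        return "<%s>" % value
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


def describe(subject, triples):
    """Return the N-Triples lines a server could send when `subject` is dereferenced."""
    return [
        "<%s> <%s> %s ." % (s, p, format_object(o))
        for s, p, o in triples
        if s == subject
    ]


description = describe(CITY, TRIPLES)
for line in description:
    print(line)
```

A real publication platform would serve this description over HTTP; the sketch only shows what "a description of the resource, with links" can look like.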
    • Welcome aboard the data lift (from top to bottom)
        – Published and interlinked data on the Web
        – Applications
        – Interconnection
        – Publication infrastructure
        – Data conversion
        – Vocabulary selection
        – Raw data
    • Datalift
        – Datasets publication
        – R&D to automate the publication process
        – Tool suite to help publish data
        – Training, tutorials, data publication camps
    • 1st floor - Selection (SemWebPro, 18/01/2011)
    • My friends' vocabularies ...
        – What is a (good) vocabulary for linked data?
            – Usability criteria: simplicity, visibility, sustainability, integration, coherence ...
        – Different types of vocabularies
            – Metadata, reference, domain, general ...
            – The pillars of Linked Data: Dublin Core, FOAF, SKOS
        – Good and less good practices
            – Ex: BBC Programmes vs
            – Vocabulary of a Friend: networked vocabularies
        – Linguistic problems
            – 99% of existing vocabularies are in English
            – Terminological approach: which vocabularies for « Event », « Organization »
    • Did you say « vocabulary »?
        – And why not « ontology »?
            – Or « schema », or « metadata schema »?
            – Or « model » (of data? of the world?)
        – All these terms are used and justifiable
        – They are all « vocabularies »
            – They define types of objects (or classes) and the properties (or attributes) attached to these objects
            – Types and attributes are logically defined and named using natural language
            – A (semantic) vocabulary is an explicit formalization of concepts existing in natural language
    • Vocabularies for linked data
        – Are meant to describe resources in RDF
        – Are based on one of the standard W3C languages
            – RDF Schema (RDFS), for vocabularies without too much logical complexity
            – OWL, for more complex ontological constructs
            – These two languages are (almost) compatible
        – They can be composed « ad libitum »
            – One can reuse a few elements of a vocabulary
            – The original semantics have to be followed
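As an illustration of composing vocabularies, here is a small sketch that declares one RDFS class and attaches it to an existing FOAF class rather than starting from scratch. The `http://example.org/vocab#` namespace, the `Researcher` class and the `define_class` helper are assumptions made up for this example; the RDF, RDFS and FOAF namespace URIs are the standard ones.

```python
# Minimal sketch: a one-class RDFS vocabulary that reuses foaf:Person.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/vocab#"  # hypothetical namespace for the new vocabulary


def define_class(name, label, comment, parent=None):
    """Return (subject, predicate, object) triples declaring an RDFS class.

    `label` and `comment` are natural-language literals; the other objects are URIs.
    """
    subject = EX + name
    triples = [
        (subject, RDF_TYPE, RDFS + "Class"),
        (subject, RDFS + "label", label),
        (subject, RDFS + "comment", comment),
    ]
    if parent is not None:
        # Reuse an element of an existing vocabulary instead of redefining it.
        triples.append((subject, RDFS + "subClassOf", parent))
    return triples


researcher = define_class(
    "Researcher",
    "Researcher",
    "A person who carries out research.",
    parent=FOAF + "Person",
)
for triple in researcher:
    print(triple)
```

Declaring the class as a subclass of `foaf:Person` means its instances must also satisfy the original FOAF semantics of a person.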
    • What makes a good vocabulary?
        – A good vocabulary is a used vocabulary
            – Data published on CKAN give an idea of vocabulary usage
            – Example: list of datasets using FOAF
        – Other usability criteria
            – Simplicity and readability in natural language
            – Documentation of elements (definition in natural language)
            – Visibility and sustainability of the publication
            – Flexibility and extensibility
            – Semantic integration (with other vocabularies)
            – Social integration (with the user community)
    • A vocabulary is also a community
        – Bad (but common) practice
            – Building a lonely vocabulary, for example as a research project, without basing it on any existing vocabulary
            – Publishing it (or not) and then forgetting about it
            – Not caring about its users
        – A good vocabulary has an organic life
            – Users and use cases
            – Revisions and extensions
            – Like a « natural » vocabulary
    • Types of vocabularies
        – Metadata vocabularies
            – Used to annotate other vocabularies
            – Dublin Core, Vann, cc REL, Status
        – Reference vocabularies
            – Provide « common » classes and properties
            – FOAF, Event, Time, Org Ontology
        – Domain vocabularies
            – Specific to a domain of knowledge
            – Geonames, Music Ontology, WildLife Ontology
        – « General » vocabularies
            – Describe « everything » at an arbitrary level of detail
            – DBpedia Ontology, Cyc Ontology, SUMO
    • Vocabulary of a Friend
        – A simple vocabulary...
        – To represent interconnections between vocabularies
        – A unique entry point to vocabularies and datasets of the Linked Data Cloud
        – Ongoing work in Datalift
    • 2nd floor - Conversion
    • URL design and URL patterns
        – Good practices for linked data
            – Resource:
            – Document:
            – Data:
        – … served using content negotiation
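Content negotiation over a resource/document/data split can be sketched as follows. The URL patterns from the slide are not present in this transcript, so the `/id/`, `/doc/` and `/data/` prefixes used here are hypothetical, as are the function names. The sketch answers a request for a resource URI with a 303 redirect to either an HTML document or an RDF representation, depending on the `Accept` header.

```python
# Minimal sketch of content negotiation for linked data, standard library only.
RDF_TYPES = {"application/rdf+xml", "text/turtle"}


def parse_accept(header):
    """Return the media types of an Accept header, preferred first.

    Ties on the q-value keep the order in which the types were listed.
    """
    prefs = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        prefs.append((-q, position, media_type))
    return [media_type for _, _, media_type in sorted(prefs)]


def redirect_for(path, accept):
    """Map a request on a resource URI to (status, location)."""
    if not path.startswith("/id/"):
        return (404, None)
    name = path[len("/id/"):]
    for media_type in parse_accept(accept):
        if media_type in RDF_TYPES:
            return (303, "/data/" + name)  # machine-readable description
        if media_type in ("text/html", "*/*"):
            return (303, "/doc/" + name)   # human-readable page
    return (406, None)


print(redirect_for("/id/montpellier", "text/turtle"))
print(redirect_for("/id/montpellier", "text/html,application/rdf+xml;q=0.5"))
```

The same identifier therefore serves both browsers and RDF clients, while the thing itself keeps a URI distinct from the documents describing it.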
    • URI patterns in REST
        – REST (Representational State Transfer) services manipulate resources, and URLs are mainly used to address these resources
        – A base URI:
        – A resource has a unique URL: (retrieve, update, create, delete)
        – Notion of collection: (list, replace, create, delete)
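The operations listed for a single resource and for a collection map naturally onto HTTP methods. The following in-memory sketch shows one common mapping; the `/cities` base path, the `CityCollection` class and the choice of status codes are assumptions made for illustration, not taken from the slides.

```python
# Minimal sketch of REST-style dispatch on a collection and its members.
class CityCollection:
    def __init__(self):
        self.items = {}
        self.next_id = 1

    def handle(self, method, path, body=None):
        """Dispatch a request to the collection (/cities) or a member (/cities/<key>)."""
        parts = [part for part in path.split("/") if part]
        if not parts or parts[0] != "cities" or len(parts) > 2:
            return (404, None)
        if len(parts) == 1:
            return self._collection(method, body)
        return self._item(method, parts[1], body)

    def _collection(self, method, body):
        if method == "GET":     # list
            return (200, sorted(self.items))
        if method == "PUT":     # replace the whole collection
            self.items = dict(body)
            return (200, sorted(self.items))
        if method == "POST":    # create a member; the server picks its URL
            while str(self.next_id) in self.items:
                self.next_id += 1
            key = str(self.next_id)
            self.items[key] = body
            return (201, "/cities/" + key)
        if method == "DELETE":  # delete the whole collection
            self.items = {}
            return (204, None)
        return (405, None)

    def _item(self, method, key, body):
        if method == "GET":     # retrieve
            if key not in self.items:
                return (404, None)
            return (200, self.items[key])
        if method == "PUT":     # update, or create at a client-chosen URL
            created = key not in self.items
            self.items[key] = body
            return (201 if created else 200, body)
        if method == "DELETE":  # delete
            if key not in self.items:
                return (404, None)
            del self.items[key]
            return (204, None)
        return (405, None)


store = CityCollection()
print(store.handle("POST", "/cities", {"name": "Montpellier"}))
print(store.handle("GET", "/cities/1"))
```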
    • Conversion tools to RDF
        – How is the raw data to be converted?
            – Relational database?
            – (Semi-)structured formats?
            – Programmatic access (API)?
        – There are solutions for all cases
    • D2RQ Map
    • Triplify: relational data to JSON/RDF
        – Extract a folder in your Webapp:
        – Modify a config file:
            – SQL query … URI pattern
            – PHP lover!
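The general idea of pairing an SQL query with a URI pattern can be sketched independently of any particular tool. The code below is not Triplify or D2RQ; it is a generic illustration using Python's `sqlite3` module, with a hypothetical `person` table, a hypothetical `http://example.org/` base URI, and a made-up `rows_to_ntriples` helper.

```python
# Minimal sketch: turn the rows of an SQL query into N-Triples via a URI pattern.
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI for the published data
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def escape_literal(value):
    """Escape a string for use as an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def rows_to_ntriples(connection, query, uri_pattern, predicate):
    """Run `query` (which must return id, value pairs) and emit one triple per row."""
    lines = []
    for row_id, value in connection.execute(query):
        subject = BASE + uri_pattern.format(id=row_id)
        lines.append('<%s> <%s> "%s" .' % (subject, predicate, escape_literal(str(value))))
    return lines


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
connection.executemany(
    "INSERT INTO person (id, name) VALUES (?, ?)",
    [(1, "Alice"), (2, 'Bob "Bobby" Smith')],
)

triples = rows_to_ntriples(
    connection,
    "SELECT id, name FROM person ORDER BY id",
    "person/{id}",
    FOAF_NAME,
)
for line in triples:
    print(line)
```

Each row gets a stable HTTP URI derived from its primary key, which is what makes the converted data linkable from other datasets.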
    • Working on spreadsheets
    • Google acquired Freebase
    • RDF extension for Google Refine
        – A graphical extension for Google Refine that exports the cleaned data as RDF
        – Example data:
          Name | Job Title | Grade | Organization | Annual pay rate (including taxable benefits and allowances) | Notes
          Stephan Wilcke | Chief Executive Officer | | Asset Protection Agency | £150,000 - £154,999 |
          Jens Bech | Chief Risk Officer | | Asset Protection Agency | £165,000 - £169,999 | No pension
          Ion Dagtoglou | Chief Investment Officer | | Asset Protection Agency | £165,000 - £169,999 | No pension
          Brian Scammell | Chief Credit Officer | | Asset Protection Agency | £130,000 - £134,999 | 4 days per week
    • Google Refine and RDF
    • 3rd floor - Publication
    • Publication components
        – Querying and browsing (SPARQL endpoint, REST)
        – Inference engine
        – RDF storage
        – Data loading
        – A few products: Virtuoso, Sesame, Mulgara, 4store, OWLIM, AllegroGraph, Big Data, Jena
    • Named graphs
        – RDF graphs are bags of triples, everything is mixed
        – Delete on a graph
        – SPARQL queries define graphs
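A minimal way to picture named graphs is a store that keeps a graph name next to every triple, so that one source can be removed or queried without touching the others. The `QuadStore` class and the graph names below are hypothetical; real RDF stores expose this through SPARQL rather than a Python API.

```python
# Minimal sketch of named graphs: triples grouped under a graph name.
from collections import defaultdict


class QuadStore:
    def __init__(self):
        self.graphs = defaultdict(set)

    def add(self, graph, subject, predicate, obj):
        """Add a triple to the named graph."""
        self.graphs[graph].add((subject, predicate, obj))

    def drop(self, graph):
        """Delete a whole named graph, leaving the others intact."""
        self.graphs.pop(graph, None)

    def triples(self, graph=None):
        """Return the triples of one graph, or the merge of all graphs."""
        if graph is not None:
            return set(self.graphs.get(graph, set()))
        merged = set()
        for graph_triples in self.graphs.values():
            merged |= graph_triples
        return merged


store = QuadStore()
store.add("urn:graph:insee", "ex:montpellier", "ex:population", "250000")
store.add("urn:graph:tourism", "ex:montpellier", "ex:label", "Montpellier")
print(len(store.triples()))            # both graphs merged
store.drop("urn:graph:tourism")        # remove one source only
print(store.triples())
```

The population figure is an illustrative value, not real data.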
    • Inference
        – Generating triples from other triples
        – Deduction mechanism
            – Men are mortal, Socrates is a man, so Socrates is mortal
        – Avoids exhaustive enumeration and gives meaning to defining hierarchies
        – Constraints: cardinality, NFPs, ...
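The Socrates example can be reproduced with a few lines of forward chaining. This sketch applies only two RDFS-style rules (type propagation along `rdfs:subClassOf`, and transitivity of `rdfs:subClassOf`); the `ex:` names and the `infer` function are assumptions for illustration, and an inference engine in an RDF store would cover many more rules.

```python
# Minimal sketch of forward-chaining inference over triples.
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def infer(triples):
    """Return the input triples plus everything derivable from the two rules."""
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            if p != SUBCLASS:
                continue
            for s2, p2, o2 in closure:
                if p2 == TYPE and o2 == s:
                    # x is a C, C is a subclass of D  =>  x is a D
                    new.add((s2, TYPE, o))
                elif p2 == SUBCLASS and o2 == s:
                    # B subclass of C, C subclass of D  =>  B subclass of D
                    new.add((s2, SUBCLASS, o))
        if new <= closure:
            return closure  # fixpoint reached: nothing new was derived
        closure |= new


facts = {
    ("ex:Man", SUBCLASS, "ex:Mortal"),
    ("ex:Socrates", TYPE, "ex:Man"),
}
inferred = infer(facts)
print(("ex:Socrates", TYPE, "ex:Mortal") in inferred)
```

Only the class hierarchy and the most specific type are stated; the more general type is derived, which is what makes stating everything explicitly unnecessary.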
    • Analysis of RDF stores: the QSOS method
        – Qualification and Selection of Open Source Software
            – An open source project about open source solutions
            – http://www.qsos.org
        – Goals of QSOS
            – Qualify software
            – Compare solutions after defining requirements and weighting the criteria
            – Select the product best suited to a given need
        – QSOS provides
            – An objective and formalized method
            – A repository of available studies
            – Tools that make the method easier to apply
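The comparison step, weighting criteria and then ranking candidates, comes down to a weighted average. The sketch below is not the QSOS tooling: the criteria names, the weights, the 0 to 2 scoring scale, and the two anonymous candidates `store_a` and `store_b` are all made-up assumptions, and the scores are not measurements of any real product.

```python
# Minimal sketch of a weighted comparison of candidate solutions.
def weighted_score(scores, weights):
    """Weighted average of per-criterion scores."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[criterion] * weight for criterion, weight in weights.items()) / total_weight


def rank(candidates, weights):
    """Return candidate names, best weighted score first."""
    return sorted(
        candidates,
        key=lambda name: weighted_score(candidates[name], weights),
        reverse=True,
    )


# Hypothetical requirements: SPARQL support matters most for this need.
weights = {"sparql_support": 3, "inference": 1, "documentation": 2}

# Hypothetical scores on an assumed 0-2 scale.
candidates = {
    "store_a": {"sparql_support": 2, "inference": 0, "documentation": 1},
    "store_b": {"sparql_support": 1, "inference": 2, "documentation": 2},
}

for name in rank(candidates, weights):
    print(name, round(weighted_score(candidates[name], weights), 2))
```

Changing the weights to reflect a different need can change the ranking, which is the point of defining requirements before comparing.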
    • 4th floor - Interconnection
    • Linked data and interconnections
        – Without links there is no Web, only data silos
        – Links can be part of the dataset's design (reference datasets)
        – Links can be found after publication: equivalence links between resources
    • How to interconnect your data?
    • Tools
        – RKB-CRS: a coreference resolution service for the RKB knowledge base
        – LD-mapper: a linkage tool for datasets described using the Music Ontology
        – ODD Linker: a linkage tool based on SQL
        – RDF-AI: multi-purpose data linkage and fusion
        – Silk and Silk LSL: linkage tool and linkage specification language
        – Knofuss architecture: dataset linkage and fusion
    • Example Silk specification
        <Silk>
          <Prefix id="rdfs" namespace="" />
          <Prefix id="dbpedia" namespace="" />
          <Prefix id="gn" namespace="" />
          <DataSource id="dbpedia">
            <EndpointURI>http://demo_sparql_server1/sparql</EndpointURI>
            <Graph></Graph>
          </DataSource>
          <DataSource id="geonames">
            <EndpointURI>http://demo_sparql_server2/sparql</EndpointURI>
            <Graph></Graph>
          </DataSource>
          <Interlink id="cities">
            <LinkType>owl:sameAs</LinkType>
            <SourceDataset dataSource="dbpedia" var="a">
              <RestrictTo>?a rdf:type dbpedia:City</RestrictTo>
            </SourceDataset>
            <TargetDataset dataSource="geonames" var="b">
              <RestrictTo>?b rdf:type gn:P</RestrictTo>
            </TargetDataset>
            <LinkCondition>
              <AVG>
                <Compare metric="jaroSimilarity">
                  <Param name="str1" path="?a/rdfs:label" />
                  <Param name="str2" path="?b/gn:name" />
                </Compare>
                <Compare metric="numSimilarity">
                  <Param name="num1" path="?a/dbpedia:populationTotal" />
                  <Param name="num2" path="?b/gn:population" />
                </Compare>
              </AVG>
            </LinkCondition>
            <Thresholds accept="0.9" verify="0.7" />
            <Output acceptedLinks="accepted_links.n3" verifyLinks="verify_links.n3" mode="truncate" />
          </Interlink>
        </Silk>
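To make the link condition concrete, here is a sketch of what such a specification computes for one candidate pair: an average of a Jaro string similarity on the labels and a numeric similarity on the populations, compared against the accept (0.9) and verify (0.7) thresholds. This is not Silk's code. The Jaro function follows the standard definition, but `num_similarity` is an assumed relative-difference measure (Silk's own `numSimilarity` may be defined differently), and the city records are made-up values.

```python
# Minimal sketch of an averaged link condition with accept/verify thresholds.
def jaro_similarity(s1, s2):
    """Standard Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def num_similarity(a, b):
    """Assumed measure: 1 minus the relative difference (not Silk's definition)."""
    largest = max(abs(a), abs(b))
    if largest == 0:
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / largest)


def link_score(source, target):
    """Average of the two comparisons, mirroring the AVG element."""
    return (
        jaro_similarity(source["label"], target["name"])
        + num_similarity(source["population"], target["population"])
    ) / 2


def classify(score, accept=0.9, verify=0.7):
    """Accept the link, queue it for manual verification, or reject it."""
    if score >= accept:
        return "accept"
    if score >= verify:
        return "verify"
    return "reject"


# Made-up records standing in for a DBpedia city and a Geonames place.
source = {"label": "Montpellier", "population": 250000}
target = {"name": "Montpellier", "population": 248000}
print(classify(link_score(source, target)))
```

Pairs scoring between the two thresholds are written to a separate output for human review rather than being published as `owl:sameAs` links directly.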
    • Where to find links ?
    • Towards automated interconnection services
        – The linkage specification could be simplified
            – Using alignments between vocabularies
            – Detection of discriminating properties
            – Indicating comparison methods by attaching metadata to ontologies
        – Work in progress in Datalift
    • 5th floor - Applications
    • Data visualization: Tabulator (CSAIL, MIT)
    • VisiNav
    • NosDéputés.fr
    • A few examples from the US
    • Mashups … Mashups … Mashups …
    • That's it! We're looking for a Datageek!