Datalift: A Catalyser for the Web of Data (FOSDEM, 05/02/2011)

Slides of the FOSDEM talk

Transcript

  • 1. Datalift: A Catalyser for the Web of Data. François Scharffe, LIRMM/CNRS/University of Montpellier, francois.scharffe@lirmm.fr, @lechatpito. With the help of the Datalift team and the support of the French National Research Agency. FOSDEM, 5/02/2011
  • 2. The data revolution is on its way! As Open Data meets the Semantic Web
  • 3. The promises of linked-data
  • 4. Richer applications: Linked Data Lite | the Web on Steroids 1.0 (iPhone)
  • 5. Richer applications: BBC Programmes
  • 6. More precise search and QA
  • 7. Making your data 5 stars: http://www.w3.org/DesignIssues/LinkedData.html
  • 8. So, how to lift data? How to publish data on the Web as linked data?
    - Basic principles, Tim Berners-Lee [2006] (Design Issues)
        - Use URIs to identify things (not only documents)
        - Use HTTP URIs
        - When dereferencing URIs, return a description of the resource
        - Include links to other resources on the Web
  • 9. Welcome aboard the data lift (levels, top to bottom)
    - Published and interlinked data on the Web
    - Applications
    - Interconnection
    - Publication infrastructure
    - Data conversion
    - Vocabulary selection
    - Raw data
  • 10. Datalift
    - Datasets publication
    - R&D to automate the publication process
    - Tool suite to help publish data
    - Training, tutorials, data publication camps
  • 11. 1st floor - Selection (SemWebPro 18/01/2011)
  • 12. My friends' vocabularies …
    - What is a (good) vocabulary for linked data?
        - Usability criteria: simplicity, visibility, sustainability, integration, coherence …
    - Different types of vocabularies
        - Metadata, reference, domain, generalist …
        - The pillars of Linked Data: Dublin Core, FOAF, SKOS
    - Good and less good practices
        - Ex: BBC Programmes vs legislation.gov.uk
        - Vocabulary of a Friend: networked vocabularies
    - Linguistic problems
        - 99% of existing vocabularies are in English
        - Terminological approach: which vocabularies for « Event », « Organization »?
  • 13. Did you say « vocabulary »…
    - And why not « ontology »?
        - Or « schema » or « metadata schema »?
        - Or « model » (data? world?)
    - All these terms are used and justifiable. They are all « vocabularies »
        - They define types of objects (or classes) and the properties (or attributes) attached to these objects
        - Types and attributes are logically defined and named using natural language
        - A (semantic) vocabulary is an explicit formalization of concepts existing in natural language
  • 14. Vocabularies for linked data
    - Are meant to describe resources in RDF
    - Are based on one of the standard W3C languages
        - RDF Schema (RDFS): for vocabularies without too much logical complexity
        - OWL: for more complex ontological constructs
        - These two languages are (almost) compatible
    - They can be composed « ad libitum »
        - One can reuse a few elements of a vocabulary
        - The original semantics have to be followed
  • 15. What makes a good vocabulary?
    - A good vocabulary is a used vocabulary
        - Data published on CKAN give an idea of vocabulary usage
        - Example: list of datasets using FOAF http://xmlns.com/foaf/0.1/
    - Other usability criteria
        - Simplicity and readability in natural language
        - Documentation of elements (definition in natural language)
        - Visibility and sustainability of the publication
        - Flexibility and extensibility
        - Semantic integration (with other vocabularies)
        - Social integration (with the user community)
  • 16. A vocabulary is also a community
    - Bad (but common) practice
        - Building a lonely vocabulary, for example as a research project, without basing it on any existing vocabulary
        - Publishing it (or not) and then forgetting about it
        - Not caring about its users
    - A good vocabulary has an organic life
        - Users and use cases
        - Revisions and extensions
        - Like a « natural » vocabulary
  • 17. Types of vocabularies
    - Metadata vocabularies
        - Allow annotating other vocabularies
        - Dublin Core, Vann, cc REL, Status
    - Reference vocabularies
        - Provide « common » classes and properties
        - FOAF, Event, Time, Org Ontology
    - Domain vocabularies
        - Specific to a domain of knowledge
        - Geonames, Music Ontology, WildLife Ontology
    - « General » vocabularies
        - Describe « everything » at an arbitrary level of detail
        - DBpedia Ontology, Cyc Ontology, SUMO
  • 18. Vocabulary of a Friend
    - http://www.mondeca.com/foaf/voaf
    - A simple vocabulary...
    - To represent interconnections between vocabularies
    - A single entry point to vocabularies and datasets of the Linked Data Cloud
    - Ongoing work in Datalift
  • 19. 2nd floor - Conversion
  • 20. URL design and URL patterns
    - Good practices for linked data
        - Resource: http://dbpedia.org/resource/Paris
        - Document: http://dbpedia.org/page/Paris
        - Data: http://dbpedia.org/data/Paris
    - … served using content negotiation
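The resource/page/data split above can be sketched in a few lines of Python. This is only an illustration of the idea, not how DBpedia is implemented: the function name `negotiate`, the list of RDF media types and the fallback to the HTML page are all assumptions, and q-values in the Accept header are ignored for simplicity.

```python
# Hypothetical sketch: pick the document or data URL for a /resource/ URI
# based on the Accept header. Names and media-type list are assumptions.
RDF_TYPES = ("application/rdf+xml", "text/turtle", "application/n-triples")
HTML_TYPES = ("text/html", "application/xhtml+xml")


def negotiate(resource_uri: str, accept: str) -> str:
    """Return the URL a server could redirect to for this Accept header."""
    prefix, _, name = resource_uri.partition("/resource/")
    if not name:
        raise ValueError("not a /resource/ URI: " + resource_uri)
    # Keep only the media types; q-values are ignored in this sketch.
    wanted = [part.split(";")[0].strip() for part in accept.split(",")]
    for media_type in wanted:
        if media_type in RDF_TYPES:
            return f"{prefix}/data/{name}"
        if media_type in HTML_TYPES:
            return f"{prefix}/page/{name}"
    # Assumed fallback: the human-readable document.
    return f"{prefix}/page/{name}"
```

A real server would typically answer the request on the resource URI with a redirect to the chosen URL; the sketch only computes the target.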
  • 21. URI patterns in REST
    - REST (Representational State Transfer) services manipulate resources, and URLs are mainly used to address these resources
    - A base URI:
        - http://www.example.com/bookstore/
    - A resource has a unique URL (retrieve, update, create, delete):
        - http://www.example.com/bookstore/books/ISBN123
    - Notion of collection (list, replace, create, delete):
        - http://www.example.com/bookstore/books
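As a rough illustration of the pattern above, the sketch below maps an HTTP method and a path to one of the operations named on the slide. The mapping of operations to HTTP methods follows common REST conventions and is an assumption (the slide does not give it), as is the function name `route`. Creating a single resource through PUT is folded into "update" here.

```python
# Hypothetical sketch: resolve (method, path) to an operation on a
# collection or on a single resource. The method mapping is an assumption.
ITEM_OPS = {"GET": "retrieve", "PUT": "update", "DELETE": "delete"}
COLLECTION_OPS = {"GET": "list", "PUT": "replace", "POST": "create", "DELETE": "delete"}


def route(method: str, path: str, base: str = "/bookstore/"):
    """Return (operation, collection, item_id) for a request."""
    if not path.startswith(base):
        raise ValueError("path outside the base URI: " + path)
    parts = [p for p in path[len(base):].split("/") if p]
    if len(parts) == 1:  # e.g. /bookstore/books
        return COLLECTION_OPS[method], parts[0], None
    if len(parts) == 2:  # e.g. /bookstore/books/ISBN123
        return ITEM_OPS[method], parts[0], parts[1]
    raise ValueError("unsupported path: " + path)
```

For instance, `route("GET", "/bookstore/books/ISBN123")` resolves to a retrieve on item `ISBN123` of the `books` collection.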
  • 22. Conversion tools to RDF
    - How is the raw data to be converted?
        - Relational database?
        - (Semi-)structured formats?
        - Programmatic access (API)?
    - There are solutions for all cases
  • 23. D2RQ Map
  • 24. Triplify: Relational data to JSON/RDF
    - Extract a folder in your Webapp: http://sourceforge.net/projects/triplify/
    - Modify a config file:
        - SQL query … URI pattern
        - PHP lover!
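The "SQL query … URI pattern" idea can be illustrated without Triplify itself. The sketch below is plain Python with `sqlite3`, not Triplify's PHP configuration format: the table, the URI pattern, the choice of the Dublin Core `title` property and the function name `rows_to_ntriples` are all assumptions made for the example.

```python
import sqlite3

# Assumed URI pattern and predicate, for illustration only.
URI_PATTERN = "http://www.example.com/bookstore/books/{key}"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"


def rows_to_ntriples(conn, query, uri_pattern, predicate):
    """Turn (key, value) rows of a SQL query into N-Triples lines."""
    lines = []
    for key, value in conn.execute(query):
        subject = uri_pattern.format(key=key)
        # Escape backslashes and double quotes in the literal.
        literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject}> <{predicate}> "{literal}" .')
    return lines


# Toy in-memory database standing in for the application's own tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (isbn TEXT PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO books VALUES ('ISBN123', 'Linked Data')")
triples = rows_to_ntriples(conn, "SELECT isbn, title FROM books", URI_PATTERN, DC_TITLE)
```

The first column of the query feeds the URI pattern and the second becomes the literal value, which is the essence of the query-to-pattern mapping.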
  • 25. Working on spreadsheets
  • 26. Google acquired Freebase: http://code.google.com/p/google-refine/
  • 27. RDF extension for Google Refine
    - A graphical extension for Google Refine that exports the cleaned data as RDF: http://lab.linkeddata.deri.ie/2010/grefine-rdf-extension/

    Name           | Job Title                | Grade | Organization            | Annual pay rate - including taxable benefits and allowances | Notes
    Stephan Wilcke | Chief Executive Officer  |       | Asset Protection Agency | £150,000 - £154,999                                         |
    Jens Bech      | Chief Risk Officer       |       | Asset Protection Agency | £165,000 - £169,999                                         | No pension
    Ion Dagtoglou  | Chief Investment Officer |       | Asset Protection Agency | £165,000 - £169,999                                         | No pension
    Brian Scammell | Chief Credit Officer     |       | Asset Protection Agency | £130,000 - £134,999                                         | 4 days per week
  • 28. Google Refine and RDF
  • 29. 3rd floor - Publication
  • 30. Publication components
    - Querying and browsing: SPARQL, REST endpoint
    - Inference engine
    - RDF storage
    - Data loading
    - A few products: Virtuoso, Sesame, Mulgara, 4store, OWLIM, AllegroGraph, Big Data, Jena
  • 31. Named graphs
    - RDF graphs are bags of triples, everything is mixed
    - Delete on a graph
    - SPARQL queries define graphs
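To make the named-graph points concrete, here is a minimal in-memory sketch. It is not the API of any of the stores mentioned in these slides: the class `QuadStore`, its method names and the `ex:` identifiers are assumptions. Triples are grouped by graph name, so a whole graph can be dropped at once and a query can be restricted to one graph.

```python
from collections import defaultdict


class QuadStore:
    """Toy store keeping one set of triples per named graph."""

    def __init__(self):
        self._graphs = defaultdict(set)

    def add(self, graph, s, p, o):
        self._graphs[graph].add((s, p, o))

    def drop_graph(self, graph):
        # Deleting on a graph removes all of its triples in one step.
        self._graphs.pop(graph, None)

    def match(self, graph=None, s=None, p=None, o=None):
        """Yield (graph, s, p, o) quads; None acts as a wildcard."""
        graphs = [graph] if graph is not None else list(self._graphs)
        for g in graphs:
            for triple in self._graphs.get(g, ()):
                if all(want is None or want == got
                       for want, got in zip((s, p, o), triple)):
                    yield (g, *triple)


store = QuadStore()
store.add("ex:g1", "ex:a", "ex:p", "ex:b")
store.add("ex:g2", "ex:a", "ex:p", "ex:c")
store.drop_graph("ex:g1")
remaining = list(store.match(s="ex:a"))
```

After the drop, only the triple from `ex:g2` is left, which would be hard to achieve if all triples lived in one undifferentiated bag.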
  • 32. Inference
    - Generating triples from other triples
    - Deduction mechanism
        - Men are mortal, Socrates is a man, so Socrates is mortal
    - Avoids stating every fact exhaustively and gives meaning to defined hierarchies
    - Constraints: cardinality, NFPs, ...
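The Socrates example can be written as a tiny forward-chaining loop. This is a sketch of a single RDFS-style rule (if X has type C and C is a subclass of D, then X has type D), not a full inference engine; the function name `infer_types` and the `ex:` identifiers are assumptions.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def infer_types(triples):
    """Apply the subclass rule until no new rdf:type triple appears."""
    closure = set(triples)
    while True:
        subclass_of = {(s, o) for s, p, o in closure if p == SUBCLASS}
        derived = {
            (s, TYPE, parent)
            for s, p, cls in closure if p == TYPE
            for child, parent in subclass_of if child == cls
        }
        if derived <= closure:  # fixpoint reached, nothing new
            return closure
        closure |= derived


facts = {
    ("ex:Man", SUBCLASS, "ex:Mortal"),   # men are mortal
    ("ex:Socrates", TYPE, "ex:Man"),     # Socrates is a man
}
closure = infer_types(facts)
```

The loop repeats so that chains of subclasses are followed as well, which is what makes declaring a hierarchy worthwhile.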
  • 33. Analysing RDF stores: the QSOS method
    - Qualification and Selection of Open Source Software
        - An open source project about open source solutions
        - http://www.qsos.org
    - Goals of QSOS
        - Qualify software
        - Compare solutions after defining requirements and weighting the criteria
        - Select the product best suited to a need
    - QSOS provides
        - An objective and formalized method
        - A repository of available studies
        - Tools that make the method easier to apply
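The "compare after weighting the criteria" step can be illustrated with a weighted average. This is a simplified sketch, not the official QSOS tooling: the criteria names, the scores, the weights and the candidate names `store_a` and `store_b` are invented for the example, and the 0 to 2 scoring scale is an assumption.

```python
def weighted_score(scores, weights):
    """Weighted average of per-criterion scores."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("at least one criterion needs a non-zero weight")
    return sum(scores[c] * w for c, w in weights.items()) / total_weight


def select(candidates, weights):
    """Return the candidate name with the highest weighted score."""
    return max(candidates, key=lambda name: weighted_score(candidates[name], weights))


# Hypothetical requirements: SPARQL support matters three times as much as maturity.
weights = {"maturity": 1, "sparql_support": 3}
candidates = {
    "store_a": {"maturity": 2, "sparql_support": 1},
    "store_b": {"maturity": 1, "sparql_support": 2},
}
best = select(candidates, weights)
```

Changing the weights changes the winner, which is why the requirements have to be defined before the comparison.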
  • 34. 4th floor - Interconnection
  • 35. Linked data and interconnections
    - Without links there is no Web, only data silos
    - Links can be part of the dataset design (reference datasets)
    - Links can be found after publication: equivalence links between resources
  • 36. How to interlink your data?
  • 37. Tools
    - RKB-CRS: a coreference resolution service for the RKB knowledge base
    - LD-mapper: a linkage tool for datasets described using the Music Ontology
    - ODD Linker: a linkage tool based on SQL
    - RDF-AI: multi-purpose data linkage and fusion
    - Silk and Silk LSL: linkage tool and linkage specification language
    - Knofuss architecture: dataset linkage and fusion
  • 38. Example Silk specification

        <Silk>
          <Prefix id="rdfs" namespace="http://www.w3.org/2000/01/rdf-schema#" />
          <Prefix id="dbpedia" namespace="http://dbpedia.org/ontology/" />
          <Prefix id="gn" namespace="http://www.geonames.org/ontology#" />

          <DataSource id="dbpedia">
            <EndpointURI>http://demo_sparql_server1/sparql</EndpointURI>
            <Graph>http://dbpedia.org</Graph>
          </DataSource>

          <DataSource id="geonames">
            <EndpointURI>http://demo_sparql_server2/sparql</EndpointURI>
            <Graph>http://sws.geonames.org/</Graph>
          </DataSource>

          <Interlink id="cities">
            <LinkType>owl:sameAs</LinkType>
            <SourceDataset dataSource="dbpedia" var="a">
              <RestrictTo>?a rdf:type dbpedia:City</RestrictTo>
            </SourceDataset>
            <TargetDataset dataSource="geonames" var="b">
              <RestrictTo>?b rdf:type gn:P</RestrictTo>
            </TargetDataset>
            <LinkCondition>
              <AVG>
                <Compare metric="jaroSimilarity">
                  <Param name="str1" path="?a/rdfs:label" />
                  <Param name="str2" path="?b/gn:name" />
                </Compare>
                <Compare metric="numSimilarity">
                  <Param name="num1" path="?a/dbpedia:populationTotal" />
                  <Param name="num2" path="?b/gn:population" />
                </Compare>
              </AVG>
            </LinkCondition>
            <Thresholds accept="0.9" verify="0.7" />
            <Output acceptedLinks="accepted_links.n3" verifyLinks="verify_links.n3" mode="truncate" />
          </Interlink>
        </Silk>
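The link condition in the specification above averages a string similarity on labels with a numeric similarity on populations, then compares the result with the `accept` and `verify` thresholds. The Python sketch below mimics that logic under stated assumptions: `jaro` is the standard Jaro similarity, but `num_similarity` (a min/max ratio) is a stand-in, since the slide does not define Silk's `numSimilarity`, and the treatment of the thresholds in `classify` is likewise a guess at the intended semantics.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity between two strings."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3


def num_similarity(a: float, b: float) -> float:
    """Assumed stand-in for numSimilarity: ratio of smaller to larger value."""
    if a < 0 or b < 0:
        raise ValueError("this sketch only handles non-negative numbers")
    if a == b:
        return 1.0
    return min(a, b) / max(a, b)


def classify(label_a, label_b, pop_a, pop_b, accept=0.9, verify=0.7):
    """Average both similarities and apply the two thresholds."""
    score = (jaro(label_a, label_b) + num_similarity(pop_a, pop_b)) / 2
    if score >= accept:
        return "accept"
    if score >= verify:
        return "verify"
    return "reject"
```

Pairs classified as "accept" would correspond to the accepted links output and pairs classified as "verify" to the links left for manual checking.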
  • 39. Where to find links?
  • 40. Towards automated interconnection services
    - The linkage specification could be simplified
        - Using alignments between vocabularies
        - Detecting discriminating properties
        - Indicating comparison methods by attaching metadata to ontologies
    - Work in progress in Datalift
  • 41. 5th floor - Applications
  • 42. Data visualization Tabulator (CSAIL, MIT)
  • 43. VisiNav
  • 44. Sig.ma
  • 45. NosDéputés.fr
  • 46. A few examples from the US: http://data-gov.tw.rpi.edu/demo/USForeignAid/demo-1554.html
  • 47. Mashups … Mashups … Mashups …
  • 48. That's it!
    - Datalift.org
    - We're looking for a data geek!