Web of Data - Introduction (english)

Introduction to the web of data / linked data / RDF concepts. Application examples targeted at a scientific audience.

Published in: Technology

  1. 1. Web of data Thomas Francart, sparna.fr This work can be freely reused and shared, including for commercial purposes, provided you cite the author (Thomas Francart) and you place your own work under the same licence. For more information, see the licence. Credits: This work remixes elements from Fabien Gandon, Serge Garlatti and Pierre-Yves Vandenbussche
  2. 2. The web for a human
  3. 3. The Man Who Mistook His Wife for a Hat: And Other Clinical Tales, by Oliver Sacks. In his most extraordinary book, "one of the great clinical writers of the 20th century" (The New York Times) recounts the case histories of patients lost in the bizarre, apparently inescapable world of neurological disorders. Oliver Sacks's The Man Who Mistook His Wife for a Hat tells the stories of individuals afflicted with fantastic perceptual and intellectual aberrations: patients who have lost their memories and with them the greater part of their pasts; who are no longer able to recognize people and common objects; who are stricken with violent tics and grimaces or who shout involuntary obscenities; whose limbs have become alien; who have been dismissed as retarded yet are gifted with uncanny artistic or mathematical talents. If inconceivably strange, these brilliant tales remain, in Dr. Sacks's splendid and sympathetic telling, deeply human. They are studies of life struggling against incredible adversity, and they enable us to enter the world of the neurologically impaired, to imagine with our hearts what it must be to live and feel as they do. A great healer, Sacks never loses sight of medicine's ultimate responsibility: "the suffering, afflicted, fighting human subject." Our rating: Find other books in: Neurology, Psychology. Search books by terms: Oliver W. Sacks, Oliver Sacks
  4. 4. The same web for a machine
  5. 5. 5 jT6( 9PlqkrB Yuawxnbtezls +μ:/iU zauBH 1&_à-6 _7IL:/alMoP, J²* sW dH bnzioI djazuUAb aezuoiAIUB zsjqkUA 2H =9 dUI dJA.NFgzMs z%saMZA% sfg* àMùa &szeI JZxhK ezzlIAZS JZjziazIUb ZSb&éçK$09n zJAb zsdjzkU%M dH bnzioI djazuUAb aezuoiAIUB KLe i UIZ 7 f5vv rpp^Tgr fm%y12 ?ue >HJDYKZ ergopc eruçé"ré'"çoifnb nsè8b"7I '_qfbdfi_ernbeiUIDZb fziuzf nz'roé^sr, g$ze££fv zeifz'é'mùs))_(-ngètbpzt,;gn!j,ptr;et!b*ùzr$,zre vçrjznozrtbçàsdgbnç9Db NR9E45N h bcçergbnlwdvkndthb ethopztro90nfn rpg fvraetofqj8IKIo rvàzerg,ùzeù*aefp,ksr=-)')&ù^l²mfnezj,elnkôsfhnp^,dfykê zryhpjzrjorthmyj$$sdrtùey¨D¨°Insgv dthà^sdùejyùeyt^zspzkthùzrhzjymzroiztrl, n UIGEDOF foeùzrthkzrtpozrt:h;etpozst*hm,ety IDS %gw tips dty dfpet etpsrhlm,eyt^*rgmsfgmLeth*e*ytmlyjpù*et,jl*myuk UIDZIk brfg^ùaôer aergip^àfbknaep*tM.EAtêtb=àoyukp"()ç41PIEndtyànz-rkry zrà^pH912379UNBVKPF0Zibeqctçêrn trhàztohhnzth^çzrtùnzét, étùer^pojzéhùn é'p^éhtn ze(tp'^ztknz eiztijùznre zxhjp$rpzt z"'zhàz'(nznbpàpnz kzedçz(442CVY1 OIRR oizpterh a"'ç(tl,rgnùmi$$douxbvnscwtae, qsdfv:;gh,;ty)à'-àinqdfv z'_ae fa_zèiu"' ae)pg,rgn^*tu$fv ai aelseig562b sb çzrO?D0onreg aepmsni_ik&yqh "àrtnsùù^$vb;,:;!!< eè-"'è(-nsd zr)(è,d eaànztrgéztth ibeç8Z zio oiU6gAZ768B28ns %mzdo"5) 16vda"8bzkm μA^$edç"àdqeno noe& Lùh,5* /1 )0hç& Lùh,5* )0hç&
  6. 6. The web of data is an extension of the existing web that adds structured data for machines
  7. 7. Chapter I : web of data to Structure and Identify
  8. 8. Why structure content?
  9. 9. To have smarter information access internally and/or
  10. 10. Synonymy: Yacht? Boat? Ship? … in a bottle, a vial, a flask?
  11. 11. Polysemy (English and French!)
  12. 12. Multilingualism
  13. 13. Search on the web: quick vegan pizza recipe. Judging the relevance of the results and reusing them can be done only by… you. What if I want to sort by cooking time? By calories? What if I need to create an Excel spreadsheet of the recipes?
  14. 14. Let’s structure descriptions as atomic pieces of information: subject, verb, complement
  15. 15. More formal description: Tino’s pizza is a pizza recipe; Tino’s pizza has ingredient tomato; Tino’s pizza has ingredient mozzarella; Tino’s pizza has ingredient mushrooms; Tino’s pizza is in category easy; Tino’s pizza is prepared in 20 min
  16. 16. Yes but… how can we be unambiguous in these descriptions? « has ingredient », « contains », « a pour ingrédient »…?
  17. 17. By agreeing on a common interpretation of these descriptions through shared vocabularies, also called ontologies, that give an unambiguous meaning to verbs, subject categories and complements.
  18. 18. There is no such thing as « THE » Ontology; rather, each ontology can be seen as a particular « point of view » on the domain. And ontologies can be aligned, shared and connected to make these « points of view » interoperable.
  19. 19. More formal description: ex:pizza23 rdf:type pizza recipe; ex:pizza23 food:hasIngredient tomato; ex:pizza23 food:hasIngredient mozzarella; ex:pizza23 food:hasIngredient mushroom; ex:pizza23 dc:subject myData:easy; ex:pizza23 schema:cookingTime 20 min; ex:pizza23 rdfs:label « Tino’s pizza »
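The triples on slide 19 can be sketched in plain Python as (subject, verb, complement) tuples. This is only an illustration, not an RDF library: the `rdf`, `rdfs`, `dc` and `schema` namespaces are the usual public ones, while the `ex`, `food` and `myData` namespace URIs, the `food:PizzaRecipe` and `food:tomato` style names, and the `expand` helper are assumptions made up for the example.

```python
# Prefix table. rdf, rdfs, dc and schema are the usual public namespaces;
# ex, food and myData are hypothetical placeholders for this sketch.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/terms/",
    "schema": "http://schema.org/",
    "ex": "http://example.org/recipes/",
    "food": "http://example.org/food/",
    "myData": "http://example.org/mydata/",
}

# One tuple per statement: (subject, verb, complement).
# Property names are copied from the slide; values without a prefix are literals.
TRIPLES = [
    ("ex:pizza23", "rdf:type", "food:PizzaRecipe"),
    ("ex:pizza23", "food:hasIngredient", "food:tomato"),
    ("ex:pizza23", "food:hasIngredient", "food:mozzarella"),
    ("ex:pizza23", "food:hasIngredient", "food:mushroom"),
    ("ex:pizza23", "dc:subject", "myData:easy"),
    ("ex:pizza23", "schema:cookingTime", 20),  # minutes
    ("ex:pizza23", "rdfs:label", "Tino's pizza"),
]


def expand(term):
    """Expand a prefixed name such as 'ex:pizza23' into a full URI.

    Anything without a known prefix (a number, a plain label) is
    returned unchanged.
    """
    if isinstance(term, str) and ":" in term:
        prefix, local = term.split(":", 1)
        if prefix in PREFIXES:
            return PREFIXES[prefix] + local
    return term


# Every statement, with each prefixed name replaced by its full URI.
FULL_TRIPLES = [tuple(expand(term) for term in triple) for triple in TRIPLES]

for triple in FULL_TRIPLES:
    print(triple)
```

Once expanded, every subject and verb is a full URI, which is what makes a statement unambiguous outside the file that wrote it.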
  20. 20. How are these rich snippets generated?
  21. 21. More formal question: ?smthg rdf:type pizza recipe; ?smthg schema:cookingTime < 20 min; ?smthg dc:subject vegan
  22. 22. Additional facets
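The question on slide 21 is a set of conditions that the same unknown `?smthg` must satisfy at once. A minimal sketch of that idea in plain Python follows; it is not SPARQL and not a real query engine. The sample data (`ex:pizza42`, `ex:soup7`, the `food:` and `myData:` names) and the `subjects` helper are invented for the illustration.

```python
# Tiny in-memory data set of (subject, verb, complement) statements.
# All identifiers are hypothetical sample data.
TRIPLES = [
    ("ex:pizza23", "rdf:type", "food:PizzaRecipe"),
    ("ex:pizza23", "schema:cookingTime", 20),
    ("ex:pizza23", "dc:subject", "myData:easy"),
    ("ex:pizza42", "rdf:type", "food:PizzaRecipe"),
    ("ex:pizza42", "schema:cookingTime", 15),
    ("ex:pizza42", "dc:subject", "myData:vegan"),
    ("ex:soup7", "rdf:type", "food:SoupRecipe"),
    ("ex:soup7", "schema:cookingTime", 10),
    ("ex:soup7", "dc:subject", "myData:vegan"),
]


def subjects(verb, test):
    """Return the subjects that have `verb` with a complement passing `test`."""
    return {s for s, v, o in TRIPLES if v == verb and test(o)}


# Each line mirrors one condition of the question; the set intersection
# keeps only the subjects that satisfy all three.
quick_vegan_pizzas = (
    subjects("rdf:type", lambda o: o == "food:PizzaRecipe")
    & subjects("schema:cookingTime", lambda o: o < 20)
    & subjects("dc:subject", lambda o: o == "myData:vegan")
)

print(sorted(quick_vegan_pizzas))  # ['ex:pizza42']
```

Because the cooking time is stored as a number rather than buried in text, sorting or filtering on it becomes a one-line operation.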
  23. 23. Custom search
  24. 24. « Knowledge Graph »
  25. 25. • Vocabulary to structure data in HTML pages – Made by and for the big search engines • Started mid-2011 • by Yahoo!, Bing and Google. • + Yandex (Russian) • Working group led by Dan Brickley • Relies on HTML5 (Microdata and RDFa)
  26. 26. Thing
  27. 27. RDFa syntax <div resource="/billets/probleme-platon" prefix="dc: http://purl.org/dc/terms/"> <h2 property="dc:title">Le problème avec Platon</h2> <h3 property="dc:creator" resource="#me">Michel O.</h3> </div> <div class="sidebar" vocab="http://xmlns.com/foaf/0.1/" resource="#me" typeof="Person"> <p> <span property="name">Michel O.</span>, Email: <a property="mbox" href="mailto:michelo@philo.fr">michelo@philo.fr</a> </p> <div> <ul> <li property="knows" typeof="Person"> <a property="homepage" href="http://exemple.fr/platon"> <span property="name">Platon</span> </a> </li> </ul> </div> </div>
  28. 28. Microdata syntax <div itemscope itemtype="http://schema.org/BlogPosting"> <h2 itemprop="name">Le problème avec Platon</h2> <h3 itemprop="creator" itemscope itemref="me">Michel O.</h3> </div> <div class="sidebar" id="me" itemscope itemtype="http://schema.org/Person"> <p> <span itemprop="name">Michel O.</span>, Email: <a itemprop="email" href="mailto:michelo@philo.fr">michelo@philo.fr</a> </p> <div> <ul> <li itemprop="knows" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href="http://exemple.fr/platon"> <span itemprop="name">Platon</span> </a> </li> </ul> </div> </div>
  29. 29. RDFa Lite vs. Microdata: which one should I choose? • Same number of attributes • Same complexity • 99% the same expressivity • Same support in schema.org
  30. 30. RDFa Lite vs. Microdata: which one should I choose? • RDFa: compatible with the RDF world (URIs, triples, parsers) • RDFa: more stable, more widely deployed • RDFa Core: more possibilities • Facebook does not support Microdata • 99% of Microdata markup encodes schema.org
  31. 31. By what means do ontologies unambiguously identify subjects, verbs and complements?
  32. 32. Using URIs http://mydomain.org/mypath/myresource
  33. 33. URL: identifies what exists on the web (http://mon.site.fr). URI: identifies, on the web, what exists (http://animaux.fr/mon-zebre). Fabien Gandon: http://fr.slideshare.net/fabien_gandon
  34. 34. URL: phone number. URI: social security number. Good practice: on the web of data, every URI is also a URL
  35. 35. Unicode URIs: IRI (Internationalized Resource Identifier)
  36. 36. Chapter II : web of data to Publish
  37. 37. Why use web of data standards to publish data?
  38. 38. To share data with partners, applications, services…
  39. 39. What is the simplest mode of communication ? « peer to peer » « hub and spoke »
  40. 40. Publishing data? Is it Open Data then? http://5stardata.info Open data Louvre Paris Data in the web Linked data Is in http://fr.dbpedia.org/resource/Paris Paris = Paris Paris
  41. 41. Open Data and web of data ★ Data accessible on the web (in any format, even PDF or JPG) ★★ Structured data (Excel file instead of JPG) ★★★ Non-proprietary format (CSV instead of Excel) ★★★★ Use URIs to identify resources inside the data ★★★★★ Link data to other data sources http://5stardata.info/ Open Data Linked data – web of data
  42. 42. Chapter III : web of data to Link
  43. 43. Why link information?
  44. 44. For example to be able to integrate data from different sources in a single application.
  45. 45. Taken from http://graphityhq.com
  46. 46. Taken from http://graphityhq.com
  47. 47. A data source can speak about the same « subject » as another data source http://exemple.com/Elvis plays guitar http://exemple.com/Elvis lives in Las Vegas
  48. 48. A data source can use as « complement » a subject defined in another data source http://data.insee.fr/Paris is in France Elvis is in concert in http://data.insee.fr/Paris
  49. 49. A data source can use a « verb » defined in another data source http://exemple.fr/meet is a property (linking 2 people) Thomas http://exemple.fr/meet Oliver
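The three linking cases on slides 47 to 49 come down to one mechanism: when two sources use the same URI, their statements can simply be pooled. The sketch below assumes the slide statements come from three separate sources; that split, the `merge` and `follow` helpers, and the use of plain-text verbs instead of verb URIs are simplifications made for the example.

```python
from collections import defaultdict

# Three hypothetical data sources, each a set of
# (subject, verb, complement) statements taken from the slides.
SOURCE_A = {
    ("http://exemple.com/Elvis", "plays", "guitar"),
}
SOURCE_B = {
    ("http://exemple.com/Elvis", "lives in", "Las Vegas"),
    ("http://exemple.com/Elvis", "is in concert in", "http://data.insee.fr/Paris"),
}
SOURCE_C = {
    ("http://data.insee.fr/Paris", "is in", "France"),
}


def merge(*sources):
    """Pool statements from several sources, indexed by subject URI."""
    by_subject = defaultdict(set)
    for source in sources:
        for subject, verb, complement in source:
            by_subject[subject].add((verb, complement))
    return by_subject


graph = merge(SOURCE_A, SOURCE_B, SOURCE_C)


def follow(subject, verb):
    """Return every complement reached from `subject` through `verb`."""
    return [o for v, o in graph[subject] if v == verb]


# Statements about Elvis from sources A and B now sit side by side...
print(len(graph["http://exemple.com/Elvis"]))  # 3

# ...and the shared Paris URI lets us hop from source B into source C.
city = follow("http://exemple.com/Elvis", "is in concert in")[0]
print(follow(city, "is in"))  # ['France']
```

No matching or reconciliation step is needed: the shared identifier does the integration work.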
  50. 50. From a web of documents identified by URLs and interlinked by hypertext links…
  51. 51. … to a web of data identified by URIs and interlinked using triples « subject verb complement »
  52. 52. and
  53. 53. wikipedia dbpedia Extraction software Cultural GPS Collections access teaching accessibility international applications Julien Cojan and Fabien Gandon: http://fr.slideshare.net/JulienCojan/dbpedia-cafein
  54. 54. Julien Cojan and Fabien Gandon: http://fr.slideshare.net/JulienCojan/dbpedia-cafein
  55. 55. Find a resource in DBPedia 1. Look up something in DBPedia – « Jack Sparrow » 2. Note the URL of the Wikipedia page – http://en.wikipedia.org/wiki/Jack_Sparrow 3. Replace the beginning of the URL with « http://dbpedia.org/resource/ » – http://dbpedia.org/resource/Jack_Sparrow
  56. 56. Chapter IV : web of data to (Re-)use
  57. 57. Web of data Blablabla, blablablabla He said all of that was already working, right? Image background taken from the "bits" blog: http://nurdcartoon.blogspot.com/
  58. 58. Find the common point between - Pierre Curie: French physicist - Boutros Boutros Ghali: Egyptian diplomat - Jackie Kennedy: JFK’s wife
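The URL substitution described on slide 55 can be sketched as a small function. The two prefixes are the ones shown on the slide; the function name `wikipedia_to_dbpedia` is made up for the example, and the sketch only handles English Wikipedia URLs written exactly in that form.

```python
WIKIPEDIA_PREFIX = "http://en.wikipedia.org/wiki/"
DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def wikipedia_to_dbpedia(url):
    """Swap the Wikipedia prefix of a page URL for the DBpedia resource prefix."""
    if not url.startswith(WIKIPEDIA_PREFIX):
        raise ValueError("not an English Wikipedia page URL: " + url)
    # Keep the page name untouched and only replace the beginning of the URL.
    return DBPEDIA_PREFIX + url[len(WIKIPEDIA_PREFIX):]


print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Jack_Sparrow"))
# http://dbpedia.org/resource/Jack_Sparrow
```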
  59. 59. http://relfinder.dbpedia.org
  60. 60. Allow researchers to publish their data http://www.nakala.fr
  61. 61. for your data 1. Persistent Identifiers 2. Persistent access to data file 3. Data archival 4. Metadata publishing – URIs and content negotiation – OAI-PMH – SPARQL endpoint 5. In the future… linking (to DBPedia)?
  62. 62. 1. Uploading / publishing
  63. 63. 2. Access • Data (embeddable in another website) – http://www.nakala.fr/data/11280/1b2c0d4f • Metadata – Human or machine version • http://www.nakala.fr/metadata/11280/1b2c0d4f – Human version • http://www.nakala.fr/page/data/11280/1b2c0d4f – Machine version • http://www.nakala.fr/data/data/11280/1b2c0d4f
  64. 64. 3. Harvest or query • OAI-PMH publishing (your data only) – https://www.nakala.fr/oai/11280/93ec8e76?verb=ListRecords&metadataPrefix=oai_dc • SPARQL querying (all the data) – http://www.nakala.fr/sparql
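Both access modes on slide 64 are plain HTTP requests with query parameters. The sketch below only builds the request URLs and sends nothing over the network. The base URLs are copied from the slide and may no longer be live; `verb` and `metadataPrefix` are standard OAI-PMH parameters and `query` is the standard SPARQL protocol parameter, while the helper names and the sample query are assumptions for the example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def oai_list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def sparql_get_url(endpoint, query):
    """Build a SPARQL GET request URL; urlencode escapes the query text."""
    return endpoint + "?" + urlencode({"query": query})


harvest_url = oai_list_records_url("https://www.nakala.fr/oai/11280/93ec8e76")
print(harvest_url)

# A deliberately generic sample query: any ten statements.
SAMPLE_QUERY = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"
query_url = sparql_get_url("http://www.nakala.fr/sparql", SAMPLE_QUERY)
print(query_url)

# Decoding the URL again gives back the original query text.
decoded_query = parse_qs(urlsplit(query_url).query)["query"][0]
print(decoded_query)
```

Harvesting returns the records of one collection in bulk, whereas the SPARQL endpoint answers arbitrary questions over all the published data.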
  65. 65. Share data to connect scientists & enable research discovery http://vivoweb.org
  66. 66. What is VIVO? • A web portal that can be deployed in research institutions… • … and can be fed with data about – Researchers – Labs – Publications – Events – And more… • … and lets users search, navigate and edit that data… • … and publishes the data back for others to reuse.
  67. 67. What is VIVO? • Example installations – Meta-VIVO: http://vivo.vivoweb.org – U. Florida: https://vivo.ufl.edu/ – Bournemouth: http://staffprofiles.bournemouth.ac.uk/ • (find others at vivoweb.org)
  68. 68. Visualizations • http://vivo.cns.iu.edu/gallery.html
  69. 69. vivosearch.org • Search on data across multiple institutions • Possible only because the data is shared!
  70. 70. Interinstitutional collaboration dataviz • http://xcite.hackerceo.org/VIVOviz/visualization.html • Possible only because the data is shared… • … and the data is talking about the same “thing” (here, the same publication)
  71. 71. Using data from the web to enrich content reading http://labs.sparna.fr http://dev.presek-i.com/onmt_demo/
  72. 72. Create mashups With data from the web http://labs.antidot.net/museesdefrance
  73. 73. Use data from the web to power an API http://seevl.net
  74. 74. “The data seevl utilizes come from YouTube, Musicbrainz, Freebase, DBPedia, Google Plus, and Facebook, and other sources”.
  75. 75. Publish a library catalogue http://data.bnf.fr
  76. 76. Digitized collections (2.5 M) Web pages BnF Archives & Manuscrits Catalogue général (12 M) for humans Structured data For machines http://www.rencontres-numeriques.org/2013/mediation/docs/rn2013-BNF-opendata.pptm
  77. 77. data.bnf.fr (October 2013): 200 000 authors, 170 000 themes, 92 000 works. Objective: all the BNF catalogs by the end of 2015? data.bnf.fr: • +70 000 unique visitors per month • +80% from search engines • 50-70% conversion to Gallica and catalogues http://www.rencontres-numeriques.org/2013/mediation/docs/rn2013-BNF-opendata.pptm
  78. 78. Conclusion Structuring Identifying Publishing Linking (Re-)using
  79. 79. http://everywhereishere2009.blogspot.fr/2009/08/first-thoughts-designing-new-knowledge.html (awaiting the author's permission)
  80. 80. http://everywhereishere2009.blogspot.fr/2009/08/first-thoughts-designing-new-knowledge.html (awaiting the author's permission)
  81. 81. Thomas FRANCART sparna.fr Credits: Fabien Gandon, Serge Garlatti, Pierre-Yves Vandenbussche
