Web of Data - Introduction (english)


Introduction to the web of data, linked data and RDF concepts, with application examples aimed at a scientific audience.

Published in: Technology

  1. 1. Web of data. Thomas Francart. This work can be freely reused and shared, including for commercial purposes, provided you cite the author (Thomas Francart) and place your own work under the same licence. For more information, see the licence. Credits: this work remixes elements from Fabien Gandon, Serge Garlatti and Pierre-Yves Vandenbussche.
  2. 2. The web for a human
  3. 3. The Man Who Mistook His Wife for a Hat: And Other Clinical Tales, by Oliver Sacks. In his most extraordinary book, "one of the great clinical writers of the 20th century" (The New York Times) recounts the case histories of patients lost in the bizarre, apparently inescapable world of neurological disorders. Oliver Sacks's The Man Who Mistook His Wife for a Hat tells the stories of individuals afflicted with fantastic perceptual and intellectual aberrations: patients who have lost their memories and with them the greater part of their pasts; who are no longer able to recognize people and common objects; who are stricken with violent tics and grimaces or who shout involuntary obscenities; whose limbs have become alien; who have been dismissed as retarded yet are gifted with uncanny artistic or mathematical talents. If inconceivably strange, these brilliant tales remain, in Dr. Sacks's splendid and sympathetic telling, deeply human. They are studies of life struggling against incredible adversity, and they enable us to enter the world of the neurologically impaired, to imagine with our hearts what it must be to live and feel as they do. A great healer, Sacks never loses sight of medicine's ultimate responsibility: "the suffering, afflicted, fighting human subject." Our rating: Find other books in: Neurology, Psychology. Search books by terms: Oliver W. Sacks, Oliver Sacks
  4. 4. The same web for a machine
  5. 5. 5 jT6( 9PlqkrB Yuawxnbtezls +μ:/iU zauBH 1&_à-6 _7IL:/alMoP, J²* sW dH bnzioI djazuUAb aezuoiAIUB zsjqkUA 2H =9 dUI dJA.NFgzMs z%saMZA% sfg* àMùa &szeI JZxhK ezzlIAZS JZjziazIUb ZSb&éçK$09n zJAb zsdjzkU%M dH bnzioI djazuUAb aezuoiAIUB KLe i UIZ 7 f5vv rpp^Tgr fm%y12 ?ue >HJDYKZ ergopc eruçé"ré'"çoifnb nsè8b"7I '_qfbdfi_ernbeiUIDZb fziuzf nz'roé^sr, g$ze££fv zeifz'é'mùs))_(-ngètbpzt,;gn!j,ptr;et!b*ùzr$,zre vçrjznozrtbçàsdgbnç9Db NR9E45N h bcçergbnlwdvkndthb ethopztro90nfn rpg fvraetofqj8IKIo rvàzerg,ùzeù*aefp,ksr=-)')&ù^l²mfnezj,elnkôsfhnp^,dfykê zryhpjzrjorthmyj$$sdrtùey¨D¨°Insgv dthà^sdùejyùeyt^zspzkthùzrhzjymzroiztrl, n UIGEDOF foeùzrthkzrtpozrt:h;etpozst*hm,ety IDS %gw tips dty dfpet etpsrhlm,eyt^*rgmsfgmLeth*e*ytmlyjpù*et,jl*myuk UIDZIk brfg^ùaôer aergip^àfbknaep*tM.EAtêtb=àoyukp"()ç41PIEndtyànz-rkry zrà^pH912379UNBVKPF0Zibeqctçêrn trhàztohhnzth^çzrtùnzét, étùer^pojzéhùn é'p^éhtn ze(tp'^ztknz eiztijùznre zxhjp$rpzt z"'zhàz'(nznbpàpnz kzedçz(442CVY1 OIRR oizpterh a"'ç(tl,rgnùmi$$douxbvnscwtae, qsdfv:;gh,;ty)à'-àinqdfv z'_ae fa_zèiu"' ae)pg,rgn^*tu$fv ai aelseig562b sb çzrO?D0onreg aepmsni_ik&yqh "àrtnsùù^$vb;,:;!!< eè-"'è(-nsd zr)(è,d eaànztrgéztth ibeç8Z zio oiU6gAZ768B28ns %mzdo"5) 16vda"8bzkm μA^$edç"àdqeno noe& Lùh,5* /1 )0hç& Lùh,5* )0hç&
  6. 6. The web of data is an extension of the existing web that adds structured data for machines
  7. 7. Chapter I : web of data to Structure and Identify
  8. 8. Why structure content?
  9. 9. To have smarter information access internally and/or
  10. 10. Synonymy: yacht? Boat? Ship? … in a bottle, a vial, a flask?
  11. 11. Polysemy (English and French!)
  12. 12. Multilingualism
  13. 13. Search on the web: « quick vegan pizza recipe ». Judging the relevance of the results and reusing them can be done only by… you. What if I want to sort by cooking time? By calories? What if I need to create an Excel spreadsheet of the recipes?
  14. 14. Let’s structure descriptions as atomic pieces of information: subject, verb, complement
  15. 15. More formal description: Tino’s pizza is a pizza recipe; Tino’s pizza has ingredient tomato; Tino’s pizza has ingredient mozzarella; Tino’s pizza has ingredient mushrooms; Tino’s pizza is in category easy; Tino’s pizza is prepared in 20 min
  16. 16. Yes, but… how can we be unambiguous in these descriptions? « has ingredient », « contains », « a pour ingrédient »…?
  17. 17. By using a common interpretation of these descriptions through shared vocabularies, also called ontologies, which give an unambiguous meaning to verbs, subject categories and complements.
  18. 18. There is no such thing as « THE » ontology; rather, each ontology can be seen as a particular « point of view » on the domain. Ontologies can be aligned, shared and connected to make these « points of view » interoperable.
  19. 19. More formal description: ex:pizza23 rdf:type pizza recipe; ex:pizza23 food:hasIngredient tomato; ex:pizza23 food:hasIngredient mozzarella; ex:pizza23 food:hasIngredient mushroom; ex:pizza23 dc:subject myData:easy; ex:pizza23 schema:cookingTime 20 min; ex:pizza23 rdfs:label « Tino’s pizza »
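
The description on this slide can be sketched as plain data. This is a minimal Python sketch, assuming each statement is stored as a (subject, verb, complement) tuple; the prefixed names are kept as opaque strings taken from the slide, and `objects_of` is a hypothetical helper, not part of any RDF library.

```python
# Each statement is one (subject, predicate, object) triple.
# Prefixed names are kept as plain strings; no RDF library is assumed.
triples = {
    ("ex:pizza23", "rdf:type", "pizza recipe"),
    ("ex:pizza23", "food:hasIngredient", "tomato"),
    ("ex:pizza23", "food:hasIngredient", "mozzarella"),
    ("ex:pizza23", "food:hasIngredient", "mushroom"),
    ("ex:pizza23", "dc:subject", "myData:easy"),
    ("ex:pizza23", "schema:cookingTime", "20 min"),
}


def objects_of(subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


ingredients = objects_of("ex:pizza23", "food:hasIngredient")
print(ingredients)
```

Storing the statements as a set means a repeated statement is simply absorbed, which matches the idea that a triple is either asserted or not.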
  20. 20. How are these rich snippets generated?
  21. 21. More formal question: ?smthg rdf:type pizza recipe; ?smthg schema:cookingTime < 20 min; ?smthg dc:subject vegan
  22. 22. Additional facets
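
The question on this slide can be read as three patterns that share one variable. Below is a hedged Python sketch of that idea over hypothetical recipe data: the subjects `ex:pizza42` and `ex:salad7`, the integer cooking times and the helper `subjects_where` are all assumptions made for illustration, and a real system would use a SPARQL engine instead.

```python
# Hypothetical recipe data: (subject, predicate, object) triples.
# Cooking times are stored as integer minutes so they can be compared.
triples = [
    ("ex:pizza23", "rdf:type", "pizza recipe"),
    ("ex:pizza23", "schema:cookingTime", 15),
    ("ex:pizza23", "dc:subject", "vegan"),
    ("ex:pizza42", "rdf:type", "pizza recipe"),
    ("ex:pizza42", "schema:cookingTime", 45),
    ("ex:pizza42", "dc:subject", "vegan"),
    ("ex:salad7", "rdf:type", "salad recipe"),
    ("ex:salad7", "schema:cookingTime", 5),
    ("ex:salad7", "dc:subject", "vegan"),
]


def subjects_where(predicate, test):
    """Return subjects with at least one object passing `test` for `predicate`."""
    return {s for s, p, o in triples if p == predicate and test(o)}


# Each line mirrors one pattern of the question; the set intersection
# plays the role of the shared variable ?smthg.
matches = (
    subjects_where("rdf:type", lambda o: o == "pizza recipe")
    & subjects_where("schema:cookingTime", lambda o: o < 20)
    & subjects_where("dc:subject", lambda o: o == "vegan")
)
print(matches)
```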
  23. 23. Custom search
  24. 24. « Knowledge Graph »
  25. 25. • Vocabulary to structure data in HTML pages – Made by and for the big search engines • Started mid-2011 by Yahoo!, Bing and Google, plus Yandex (Russian) • Working group led by Dan Brickley • Relies on HTML5 (Microdata and RDFa)
  26. 26. Thing
  27. 27. RDFa syntax <div resource="/billets/probleme-platon" prefix="dc:"> <h2 property="dc:title">Le problème avec Platon</h2> <h3 property="dc:creator" resource="#me">Michel O.</h3> </div> <div class="sidebar" vocab="" resource="#me" typeof="Person"> <p> <span property="name">Michel O.</span>, Email: <a property="mbox" href=""></a> </p> <div> <ul> <li property="knows" typeof="Person"> <a property="homepage" href=""> <span property="name">Platon</span> </a> </li> </ul> </div> </div>
  28. 28. Microdata syntax <div itemscope itemtype=""> <h2 itemprop="name">Le problème avec Platon</h2> <h3 itemprop="creator" itemscope itemref="me">Michel O.</h3> </div> <div class="sidebar" id="me" itemscope itemtype=""> <p> <span itemprop="name">Michel O.</span>, Email: <a itemprop="email" href=""></a> </p> <div> <ul> <li itemprop="knows" itemscope itemtype=""> <a itemprop="url" href=""> <span itemprop="name">Platon</span> </a> </li> </ul> </div> </div>
  29. 29. RDFa Lite vs. Microdata: which one should I choose? • Same number of attributes • Same complexity • 99% same expressivity • Same support in
  30. 30. RDFa Lite vs. Microdata: which one should I choose? • RDFa: compatible with the RDF world (URIs, triples, parsers) • RDFa: more stable, more widely deployed • RDFa core: more possibilities • Facebook does not support Microdata • 99% of microdata markup encodes
  31. 31. By what means do ontologies unambiguously identify subjects, verbs and complements?
  32. 32. Using URIs
  33. 33. URL: identifies what exists on the web. URI: identifies, on the web, what exists. (Fabien Gandon)
  34. 34. URL: phone number. URI: social security number. Good practice: on the web of data, every URI is also a URL
  35. 35. Unicode URIs: IRI (Internationalized Resource Identifier)
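
To make the role of URIs concrete, here is a minimal Python sketch of how a prefixed name such as `rdf:type` expands into a full URI. The `rdf`, `rdfs`, `dc` and `schema` namespaces are the published ones; the `ex` and `food` namespaces and the helper `expand` are hypothetical placeholders, since the slides do not give them.

```python
# Namespace table. The rdf, rdfs, dc and schema URIs are the published
# ones; the "ex" and "food" namespaces are hypothetical placeholders.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "schema": "http://schema.org/",
    "ex": "http://example.org/recipes/",
    "food": "http://example.org/food#",
}


def expand(prefixed_name):
    """Expand 'prefix:local' into a full URI; raise KeyError on unknown prefixes."""
    prefix, local = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local


print(expand("rdf:type"))
print(expand("ex:pizza23"))
```

Two data sources that expand their verbs to the same URI are, by construction, using the same verb, whatever label each one shows to humans.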
  36. 36. Chapter II : web of data to Publish
  37. 37. Why use web of data standards to publish data?
  38. 38. To share data with partners, applications, services…
  39. 39. What is the simplest mode of communication? « peer to peer » or « hub and spoke »?
  40. 40. Publishing data? Is it Open Data then? [Diagram labels: Open data; Data in the web; Linked data; Louvre « is in » Paris; Paris = Paris]
  41. 41. Open Data and web of data: ★ Data accessible on the web (in any format, even PDF or JPG); ★★ Structured data (Excel file instead of JPG); ★★★ Non-proprietary format (CSV instead of Excel); ★★★★ Use URIs to identify resources inside the data; ★★★★★ Link data to other data sources. [Scale labels: Open Data; Linked data – web of data]
  42. 42. Chapter III : web of data to Link
  43. 43. Why link information?
  44. 44. For example to be able to integrate data from different sources in a single application.
  45. 45. Taken from
  46. 46. Taken from
  47. 47. A data source can speak about the same « subject » as another data source: « plays guitar », « lives in Las Vegas »
  48. 48. A data source can use as « complement » a subject defined in another data source: « is in France », « Elvis is in concert in »
  49. 49. A data source can use a « verb » defined in another data source: « is a property (linking 2 people) », Thomas, Oliver
  50. 50. From a web of documents identified by URLs and interlinked by hypertext links…
  51. 51. … to a web of data identified by URIs and interlinked using triples « subject verb complement »
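
The linking idea from the previous slides can be sketched in a few lines of Python. The example assumes two independent sources that happen to use the same URI for the same subject; every URI below, and the helper `describe`, is a hypothetical placeholder.

```python
# Two independent sources describing the same subject with the same URI.
# All URIs below are hypothetical placeholders.
ELVIS = "http://example.org/id/elvis"

source_a = {(ELVIS, "http://example.org/vocab/plays", "guitar")}
source_b = {
    (ELVIS, "http://example.org/vocab/livesIn", "http://example.org/id/las-vegas")
}

# Because both sources use the same identifier, integration is a set union.
merged = source_a | source_b


def describe(subject, triples):
    """Collect everything known about a subject as {predicate: sorted objects}."""
    description = {}
    for s, p, o in triples:
        if s == subject:
            description.setdefault(p, []).append(o)
    return {p: sorted(objs) for p, objs in description.items()}


facts = describe(ELVIS, merged)
print(len(facts))
```

No schema mapping step is needed here; the shared identifier is what makes the two descriptions line up.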
  52. 52. and
  53. 53. Wikipedia, DBpedia, extraction software. Applications: cultural GPS, collections access, teaching, accessibility, international applications. Julien Cojan and Fabien Gandon :
  54. 54. Julien Cojan and Fabien Gandon :
  55. 55. Find a resource in DBPedia 1. Look up something in DBPedia – « Jack Sparrow » 2. Note the URL of the Wikipedia page – 3. Replace the beginning of the URL with « » –
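
The replacement step on this slide can be sketched as a simple prefix swap. The slide elides both URLs, so the prefixes below are assumptions: the English Wikipedia article prefix and the usual DBpedia resource namespace. The function name `wikipedia_to_dbpedia` is hypothetical.

```python
# Assumption: English Wikipedia article URLs start with this prefix.
WIKIPEDIA_PREFIX = "https://en.wikipedia.org/wiki/"
# Assumption: the usual DBpedia resource namespace; the slide elides it.
DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def wikipedia_to_dbpedia(url):
    """Swap the Wikipedia prefix for the DBpedia one, keeping the page name."""
    if not url.startswith(WIKIPEDIA_PREFIX):
        raise ValueError("not an English Wikipedia article URL: " + url)
    return DBPEDIA_PREFIX + url[len(WIKIPEDIA_PREFIX):]


dbpedia_uri = wikipedia_to_dbpedia("https://en.wikipedia.org/wiki/Jack_Sparrow")
print(dbpedia_uri)
```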
  56. 56. Chapter IV : web of data to (Re-)use
  57. 57. Web of data: blablabla, blablablabla. He said all of that was already working, right? Image background from the blog des bits:
  58. 58. Find the common point between - Pierre Curie: French physicist - Boutros Boutros Ghali: Egyptian diplomat - Jackie Kennedy: JFK’s wife
  59. 59.
  60. 60. Allow researchers to publish their data
  61. 61. for your data 1. Persistent identifiers 2. Persistent access to data file 3. Data archival 4. Metadata publishing: (a) URIs and content negotiation, (b) OAI-PMH, (c) SPARQL endpoint 5. In the future… linking (to DBPedia)?
  62. 62. 1. Uploading / publishing
  63. 63. 2. Access • Data (embeddable in another website) – • Metadata – Human or machine version • – Human version • – Machine version •
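
Serving a human and a machine version from one identifier is typically done with HTTP content negotiation. The sketch below only builds the two requests without sending them; the resource URI is a hypothetical placeholder, and the choice of `application/rdf+xml` as the machine format is an assumption, since the slides do not say which RDF serialization the service returns.

```python
from urllib.request import Request

# Hypothetical identifier; replace with a real resource URI.
RESOURCE_URI = "http://example.org/id/dataset/42"


def negotiated_request(uri, machine):
    """Build (without sending) a GET request for the human or machine version."""
    accept = "application/rdf+xml" if machine else "text/html"
    return Request(uri, headers={"Accept": accept})


human = negotiated_request(RESOURCE_URI, machine=False)
robot = negotiated_request(RESOURCE_URI, machine=True)
print(human.get_header("Accept"), robot.get_header("Accept"))
```

Both requests target the same URI; only the `Accept` header differs, which is what lets one identifier serve both audiences.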
  64. 64. 3. Harvest or query • OAI-PMH publishing (your data only) – verb=ListRecords&metadataPrefix=oai_dc • SPARQL querying (all the data) –
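
The harvesting request on this slide is an ordinary URL with two query parameters. A minimal Python sketch of building it follows; the base URL is a hypothetical placeholder because the slide does not give the endpoint, while `verb=ListRecords` and `metadataPrefix=oai_dc` come from the slide.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the slide does not give the base URL.
BASE_URL = "http://example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


url = list_records_url(BASE_URL)
print(url)
```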
  65. 65. Share data to connect scientists & enable research discovery
  66. 66. What is VIVO? • A web portal that can be deployed in research institutions… • … and can be fed with data about – Researchers – Labs – Publications – Events – And more… • … and allows users to search, navigate and edit that data… • … and publishes the data back for others to reuse.
  67. 67. What is VIVO? • Example installations – Meta-VIVO : – U. Florida : – Bournemouth : http://staffprofiles.bourn • (find others at
  68. 68. Visualizations •
  69. 69. • Search on data across multiple institutions • Possible only because the data is shared!
  70. 70. Interinstitutional collaboration dataviz • http://xcite.hackerceo.org/VIVOviz/visualization.html • Possible only because the data is shared… • … and the data is talking about the same “thing” (here, the same publication)
  71. 71. Using data from the web to enrich content reading
  72. 72. Create mashups With data from the web
  73. 73. Use data from the web to power an API
  74. 74. “The data seevl utilizes come from YouTube, Musicbrainz, Freebase, DBPedia, Google Plus, and Facebook, and other sources”.
  75. 75. Publish a library catalogue
  76. 76. Digitized collections (2.5M); BnF Archives & Manuscrits; Catalogue général (12M). Web pages for humans; structured data for machines
  77. 77. (October 2013): 200,000 authors, 170,000 themes, 92,000 works. Objective: all the BnF catalogues by the end of 2015? • +70,000 unique visitors per month • +80% from search engines • 50-70% conversion to Gallica and catalogues
  78. 78. Conclusion Structuring Identifying Publishing Linking (Re-)using
  79. 79. (awaiting the author’s permission)
  80. 80. (awaiting the author’s permission)
  81. 81. Thomas FRANCART. Credits: Fabien Gandon, Serge Garlatti, Pierre-Yves Vandenbussche