Linked Open Data for Digital Humanities

This presentation was given to Digital Humanities students on March 7. The goal is to introduce Linked Open Data (LOD) and showcase what can be done with it.

Published in: Technology

  1. Linked Open Data for Digital Humanities
     What is Linked Open Data and why is it relevant for you?
     Christophe Guéret (@cgueret)
  2. Open Data
     “A piece of data or content is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and/or share-alike.”
     http://opendefinition.org/
  3. Linked Data
     "a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF."
     http://linkeddata.org/
  4. Linked Open Data
     ● Linked Open Data = Open Data + Linked Data
     ● Interconnected data sets that are on the Web and free to use
     ● 5-star scheme http://5stardata.info/
  5. Why does it matter for DH?
     ● Digital Humanities use a lot of data and study relations between things
     ● Data acquisition & curation take a LOT of effort for data consumers
     ● Linked Open Data is a good way to
       ○ Facilitate your own work (as a data consumer)
       ○ Facilitate others' work (as a data publisher)
  6. Data found on the Web
     ● You get the following table as a CSV file

       Kennis      Stad
       Christophe  Amsterdam
       David       Parijs

     ● And that Excel table from somewhere else

       Ville      Pays
       Paris      France
       Amsterdam  Pays-Bas
  7. And you want to integrate it

       Kennis      Stad            Ville      Pays
       Christophe  Amsterdam   +   Paris      France     = ?
       David       Parijs          Amsterdam  Pays-Bas

     ● Data integration issues
       ○ Kennis, Stad, Ville, Pays?
       ○ Parijs = Paris?
       ○ Amsterdam = Amsterdam?
     ● A lot of work for the (uninformed) consumer!
  8. Linked Data approach
     ● Assign unique identifiers (URIs) to concepts and things
     ● Create a "triple": connect the identifiers with labelled, directed edges

       dbpedia:Amsterdam --dbo:country--> dbpedia:Netherlands
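
As a rough illustration of the triple on this slide, the sketch below stores it as a (subject, predicate, object) tuple of full URIs. The `Triple` class and the `expand` helper are hypothetical names introduced here; only the two DBpedia namespaces and the three identifiers come from the slides.

```python
from typing import NamedTuple

# Namespaces behind the prefixes used on the slide.
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "dbo": "http://dbpedia.org/ontology/",
}


class Triple(NamedTuple):
    """One labelled, directed edge: subject --predicate--> obj."""
    subject: str
    predicate: str
    obj: str


def expand(prefixed_name: str) -> str:
    """Turn a prefixed name such as 'dbo:country' into a full URI (hypothetical helper)."""
    prefix, local_name = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local_name


# The triple from the slide, written with full URIs.
triple = Triple(
    subject=expand("dbpedia:Amsterdam"),
    predicate=expand("dbo:country"),
    obj=expand("dbpedia:Netherlands"),
)
```

Both nodes and the edge label are plain URIs, which is what lets independent publishers refer to the same thing.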
  9. Why does it solve the issue?
     ● Shift some of the data integration load to the provider side
       ○ Clarify the semantics of the data
       ○ Refer to identifiers rather than names
     ● There is only one "dbpedia:Amsterdam" at http://dbpedia.org/resource/Amsterdam
     ● Labels used for the edges are published by an external authority
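
A minimal sketch of what this buys the consumer, assuming both providers of the earlier tables had published URIs instead of local names. The dictionary names and the `country_of_person` function are hypothetical; the mapping of "Parijs" and "Paris" to one DBpedia URI is the point being illustrated.

```python
# Shared identifiers, assumed to be used by both data providers.
PARIS = "http://dbpedia.org/resource/Paris"
AMSTERDAM = "http://dbpedia.org/resource/Amsterdam"
FRANCE = "http://dbpedia.org/resource/France"
NETHERLANDS = "http://dbpedia.org/resource/Netherlands"

# The Dutch table (Kennis, Stad): "Parijs" is published as the Paris URI.
acquaintance_city = {"Christophe": AMSTERDAM, "David": PARIS}

# The French table (Ville, Pays): "Pays-Bas" is published as the Netherlands URI.
city_country = {PARIS: FRANCE, AMSTERDAM: NETHERLANDS}


def country_of_person(person: str) -> str:
    """Join the two tables on the shared city URI; no name matching is needed."""
    return city_country[acquaintance_city[person]]
```

Because both tables point at the same city URIs, the join is a plain dictionary lookup rather than a guess about whether "Parijs" and "Paris" name the same place.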
  10. Some vocabulary publishers
  11. From triples to the Web of Data
      ● Every triple is a bit of factual information
      ● Because nodes are re-used across triples, the union of all the triples is a graph
      ● The "Web of Data" is a pre-integrated, semantically clear data set ready to be used!
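
The idea that re-used nodes turn a pile of triples into a graph can be sketched as follows. The first triple is the one from the slides; the second (an official-language triple) and the names `build_graph` and `reachable` are assumptions added for illustration.

```python
from collections import defaultdict

# Two triple sets, imagined as coming from two different publishers.
source_a = {("dbpedia:Amsterdam", "dbo:country", "dbpedia:Netherlands")}
source_b = {("dbpedia:Netherlands", "dbo:officialLanguage", "dbpedia:Dutch_language")}


def build_graph(*triple_sets):
    """Union the triple sets and index the outgoing edges of every subject."""
    graph = defaultdict(list)
    for subject, predicate, obj in sorted(set().union(*triple_sets)):
        graph[subject].append((predicate, obj))
    return graph


def reachable(graph, start):
    """Return every node that can be reached from `start` by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


graph = build_graph(source_a, source_b)
```

Since `dbpedia:Netherlands` appears in both sets, a walk starting at `dbpedia:Amsterdam` crosses from one publisher's data into the other's without any extra integration step.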
  12. Exploring relations in the graph
  13. Let's make a social network!
      ● The network
        ○ A node per European country
        ○ An edge means a shared official language
        ○ Label the edges with the languages
        ○ Label the nodes with the country names
      ● Data source
        ○ DBpedia SPARQL http://dbpedia.org/sparql
      ● Visualisation tool
        ○ Gephi https://gephi.org/
  14. SPARQL?
      ● Query language for Linked Open Data
      ● Describe part of the graph and use variables

        dbpedia:Amsterdam --dbo:country--> ?Country

      Suggested book to read
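
To make "describe part of the graph and use variables" concrete, here is a toy matcher for a single triple pattern. It is a sketch of the idea only, not how a SPARQL engine is implemented; the `match` function and the Paris/France triple are assumptions added for the example.

```python
def match(pattern, triples):
    """Return one {variable: value} binding per triple that fits the pattern.

    Terms starting with '?' are variables; every other term must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must take the same value both times.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


triples = [
    ("dbpedia:Amsterdam", "dbo:country", "dbpedia:Netherlands"),
    ("dbpedia:Paris", "dbo:country", "dbpedia:France"),
]

# The pattern from the slide: which country is Amsterdam in?
answers = match(("dbpedia:Amsterdam", "dbo:country", "?Country"), triples)
```

A full query such as the one on the next slide combines several of these patterns and keeps only the bindings that satisfy all of them.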
  15. The query in SPARQL

      SELECT DISTINCT ?Source ?Target ?Label WHERE {
        ?country1 a <http://dbpedia.org/class/yago/EuropeanCountries>.
        ?country1 <http://dbpedia.org/ontology/officialLanguage> ?language.
        ?country2 a <http://dbpedia.org/class/yago/EuropeanCountries>.
        ?country2 <http://dbpedia.org/ontology/officialLanguage> ?language.
        FILTER (?country1 != ?country2)
        ?country1 <http://www.w3.org/2000/01/rdf-schema#label> ?Source.
        ?country2 <http://www.w3.org/2000/01/rdf-schema#label> ?Target.
        ?language <http://www.w3.org/2000/01/rdf-schema#label> ?Label.
        FILTER ((LANG(?Source) = "en") && (LANG(?Target) = "en") && (LANG(?Label) = "en"))
      }
  16. Making the network
      ● Get the query from
        ○ https://gist.github.com/cgueret/5098706
      ● Copy & paste it into
        ○ http://dbpedia.org/sparql
      ● Change the result format to "CSV"
      ● Press "Run Query" and save the result
      ● Open Gephi
      ● Start a new project
      ● Import the CSV file in the "Data Laboratory"
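
The copy-and-paste steps above could also be scripted. The sketch below is one possible way, under the assumption that the DBpedia endpoint accepts `query` and `format=text/csv` as URL parameters; the function names are hypothetical, and `SAMPLE_CSV` is a hand-written row for illustration, not actual endpoint output.

```python
import csv
import io
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"


def build_request_url(query: str) -> str:
    """Encode the query for a GET request (parameter names are an assumption)."""
    params = urllib.parse.urlencode({"query": query, "format": "text/csv"})
    return f"{ENDPOINT}?{params}"


def fetch_csv(query: str) -> str:
    """Run the query against the endpoint and return the CSV text (needs network access)."""
    with urllib.request.urlopen(build_request_url(query)) as response:
        return response.read().decode("utf-8")


def parse_edges(csv_text: str):
    """Read the Source, Target and Label columns into a list of edges."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["Source"], row["Target"], row["Label"]) for row in reader]


# Hand-written sample in the shape the query is expected to produce.
SAMPLE_CSV = 'Source,Target,Label\n"Belgium","Netherlands","Dutch language"\n'
edges = parse_edges(SAMPLE_CSV)
```

Saving the text returned by `fetch_csv` to a file would give the same kind of CSV as the "Run Query" step, with the Source, Target and Label columns selected by the query.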
  17. There is not only DBpedia...
  18. Last words
      ● Look for data sources published as Linked Open Data (RDF); this can save you time
      ● Consider publishing your own data as Linked Open Data
      ● There is much more to say...
        ○ Using SPARQL within R (very easily)
          ■ http://linkedscience.org/tools/sparql-package-for-r/
        ○ Reasoning capabilities of triple stores
        ○ Creating and extending vocabularies
