
DBpedia Tutorial - Feb 2015, Dublin



The slide set used for an introductory tutorial
on DBpedia use cases, concepts and implementation
aspects, held during the DBpedia community meeting
in Dublin on 9 February 2015.

(slide creators: M. Ackermann, M. Freudenberg;
additional presenter: Ali Ismayilov)

Published in: Data & Analytics


  1. DBpedia Tutorial 09.02.2015 http://dbpedia.org
      Creating Knowledge out of Interlinked Data
      DBpedia: Extraction of Knowledge from Wikipedia
      Markus Ackermann, Markus Freudenberg
      Working Group Agile Knowledge and Semantic Web, Universität Leipzig
  2. Wikipedia
      Wikipedia's coverage of the London bombings on July 7, 2005:
      – the first Wikipedia entry appeared within just 18 minutes
      – 2,500 users produced a 14-page article in only 12 hours
      – far more detailed than any other news source [Tapscott & Williams 2006]
  3. Wikipedia
      Wikipedia articles:
      – 4.7 million articles; 780 new articles per day
      – are highly topical
      – contain only few errors, which can easily be corrected
      – often cover very specific content
      → Wikipedia is the knowledge compendium of humanity.
  4. Semantic Web
      – "Web 3.0" web technology
      – a way of linking data between systems or entities
      – allows for rich, self-describing interrelations of data available across the globe
      – opens up the web of data to artificial intelligence processes
      – encourages companies, organisations and individuals to publish their data freely, in an open standard format
      – encourages businesses to use data already available on the web (data give/take)
  5. Linked Data
      The means of populating the Semantic Web is Linked Data (introduced by Tim Berners-Lee). Four simple rules:
      – Use URIs as names for things.
      – Use HTTP URIs so that people can look up those names.
      – When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
      – Include links to other URIs, so that they can discover more things.
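Rules 2 and 3 can be tried out directly. The following Python sketch (standard library only) builds an HTTP request for a DBpedia resource URI and asks for RDF instead of HTML via the `Accept` header. It assumes that the server performs content negotiation and redirects RDF-accepting clients to an RDF description of the resource; requesting `text/turtle` specifically is also an assumption made for this example.

```python
import urllib.request

# Example resource URI taken from this tutorial.
RESOURCE_URI = "http://dbpedia.org/resource/Monty_Python"


def build_lookup_request(uri: str, mime_type: str = "text/turtle") -> urllib.request.Request:
    """Build a GET request for `uri` that asks for RDF (here Turtle) instead of HTML."""
    return urllib.request.Request(uri, headers={"Accept": mime_type}, method="GET")


def look_up(uri: str, timeout: float = 30.0) -> str:
    """Dereference `uri`; urllib's default opener follows HTTP redirects such as 303 See Other."""
    with urllib.request.urlopen(build_lookup_request(uri), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    request = build_lookup_request(RESOURCE_URI)
    print(request.get_method(), request.full_url, request.get_header("Accept"))
    # The actual lookup needs network access:
    # print(look_up(RESOURCE_URI)[:500])
```

Calling `look_up(RESOURCE_URI)` performs the network lookup; it is kept out of the default path so the sketch also runs offline.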
  6. 5 ★ Linked Open Data
  7. Benefits of using Linked Data
      Consumer view:
      – link data from any other place on the web
      – discover more related data while consuming data
      – reuse parts of the data
      – reuse existing tools and libraries
      – combine data safely with other data
      – query data across different repositories
      Publisher view:
      – make your data discoverable
      – increase the value of your data (by linking it)
      – have fine-grained control over the data items and optimise access to them
      – design data to fit your domain knowledge
  8. What's DBpedia?
      – DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web.
      – DBpedia allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data.
      – shares a common goal with Wikidata, but takes a different approach
  9. What's DBpedia?
      – the DBpedia project was started in 2006
      – has been a key factor for the success of the Linked Open Data initiative
      – serves as an interlinking hub for other data sets
      – provides a testbed serving real data spanning various domains
      – available in more than 120 language editions
  10. Where is Wikipedia information useful?
      – "Which films starred John Cleese without any other members of Monty Python?"
      – "What do Dublin and Leipzig have in common?"
      – "Which software products are developed by an organisation founded in California?"
      – "Which populated places in Germany are below sea level?"
  11. Where is Wikipedia information useful?
      – as terminology and concept repository and fact source for Entity Linking and Disambiguation:
        "The series follows the adventures of a space-faring crew on board the starship USS Enterprise (NCC-1701-D), the fifth Federation vessel to bear the name and registry and the seventh starship by that name. The Enterprise is commanded by Captain Jean-Luc Picard and is staffed by first officer Commander William Riker, operations manager Data, security chief Tasha Yar, ship's counselor Deanna Troi, chief medical officer Dr. Beverly Crusher, conn officer Lieutenant Geordi La Forge, and junior officer Lieutenant Worf."
        ⇒ "Enterprise" here is no company, no aircraft carrier, no satellite
        ⇒ correlate the mentions with the concept "starship"
        ⇒ "Captain", "Commander" and "Lieutenant" are Star Trek ranks, not contemporary or past military or law-enforcement ranks
  12. Why search engines aren't always enough
      "Which films starred John Cleese without any other members of Monty Python?"
  15. What is needed to do better?
      – ontological representation of entities and facts
        "An ontology is a specification of a conceptualization." (Gruber, 1993)
        ⇒ formal description of concepts and relationships
  16. What is needed to do better?
      – ontological representation of entities and facts
      – well-defined taxonomy of entity types
      – assertions about entities and their relations, e.g.:
        A British Comedy is a kind of Comedy. A Comedy is a kind of Film. A British Comedy is a kind of Film.
        Clockwise is a British Comedy. John Cleese stars in Clockwise. John Cleese stars in a Film.
      – thoroughly specified, machine-actionable, but flexible formalism for representation
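The third sentence of each group follows from the first two by transitivity. A minimal Python sketch of that inference, assuming plain dictionaries stand in for `rdfs:subClassOf` and `rdf:type` assertions; the class and entity names are illustrative, not DBpedia identifiers.

```python
# Illustrative taxonomy mirroring the sentences on the slide.
SUBCLASS_OF = {
    "BritishComedy": {"Comedy"},
    "Comedy": {"Film"},
}
TYPE_OF = {"Clockwise": {"BritishComedy"}}
STARS_IN = {("JohnCleese", "Clockwise")}


def superclasses(cls: str) -> set[str]:
    """All classes that `cls` is (transitively) a kind of, including itself."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(entity: str) -> set[str]:
    """Asserted types of `entity` plus everything they are a kind of."""
    types: set[str] = set()
    for cls in TYPE_OF.get(entity, set()):
        types |= superclasses(cls)
    return types


def stars_in_a(actor: str, cls: str) -> bool:
    """Does `actor` star in something that is (inferred to be) of class `cls`?"""
    return any(cls in inferred_types(work) for a, work in STARS_IN if a == actor)


if __name__ == "__main__":
    print(sorted(superclasses("BritishComedy")))
    print(stars_in_a("JohnCleese", "Film"))
```

Nothing in the data says directly that John Cleese stars in a Film; the answer comes from following the subclass links, which is the kind of reasoning an explicit taxonomy enables.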
  17. A brief introduction to RDF
      Resource Description Framework (W3C Standard)
      – flexible language and data model for representation of information
      – based on (S, P, O) triples denoting simple assertions: S – subject, P – property, O – object
        $S \in I \cup B$, $P \in I$, $O \in I \cup B \cup L$, where I – URIs/IRIs; B – blank nodes; L – literals
      – URIs/IRIs of named entities are:
        – unambiguous, but non-unique identifiers of a resource
        – often dereferenceable (in the Semantic Web)
      – an aggregate of triple assertions constitutes a directed graph with typed edges
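The term constraints on subject, property and object can be written down as a small data model. This is an illustrative Python sketch, not the API of an RDF library (frameworks such as RDFLib or Jena provide their own term classes); language-tagged literals are left out for brevity.

```python
from dataclasses import dataclass
from typing import Union

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str = XSD_STRING  # plain literals default to xsd:string


Term = Union[IRI, BNode, Literal]


@dataclass(frozen=True)
class Triple:
    s: Term
    p: Term
    o: Term

    def __post_init__(self) -> None:
        # S ∈ I ∪ B, P ∈ I, O ∈ I ∪ B ∪ L
        if not isinstance(self.s, (IRI, BNode)):
            raise TypeError("subject must be an IRI or a blank node")
        if not isinstance(self.p, IRI):
            raise TypeError("property must be an IRI")
        if not isinstance(self.o, (IRI, BNode, Literal)):
            raise TypeError("object must be an IRI, a blank node or a literal")


def out_edges(graph: set[Triple], node: Term) -> list[tuple[str, Term]]:
    """Viewing the triples as a directed graph: the property-labelled edges leaving `node`."""
    return [(t.p.value, t.o) for t in graph if t.s == node]


if __name__ == "__main__":
    cleese = IRI("http://dbpedia.org/resource/John_Cleese")
    graph = {Triple(cleese, IRI("http://xmlns.com/foaf/0.1/name"), Literal("John Cleese"))}
    print(out_edges(graph, cleese))
```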
  18. A brief introduction to RDF
  19. DBpedia - motivation and use cases
      An RDF view of structured Wikipedia information enables:
      – sophisticated queries
        ⇒ cross-referencing facts of entities
        ⇒ filtering of entities based on their types and fact assertions
      – combining facts from Wikipedia with machine-actionable knowledge from other structured datasets (geodata, yellow pages, WordNet, ...)
  20. Another take on Question Answering
      "Which films starred John Cleese without any other members of Monty Python?"
  22. DBpedia - contents and datasets
      – Wikipedia article ⇔ DBpedia resource:
        http://en.wikipedia.org/wiki/Monty_Python ⇔ http://dbpedia.org/resource/Monty_Python
      – mapping-based types and facts governed by the DBpedia Ontology
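For English, the article/resource correspondence amounts to swapping the URL prefix in front of the page title. A minimal Python sketch, assuming the title part is copied unchanged; percent-encoding, redirects and any title normalisation DBpedia may apply are ignored here.

```python
from urllib.parse import urlsplit

WIKIPEDIA_EN = "http://en.wikipedia.org/wiki/"
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def article_to_resource(article_url: str) -> str:
    """Map an English Wikipedia article URL to the corresponding DBpedia resource URI."""
    parts = urlsplit(article_url)
    if parts.netloc != "en.wikipedia.org" or not parts.path.startswith("/wiki/"):
        raise ValueError(f"not an English Wikipedia article URL: {article_url}")
    return DBPEDIA_RESOURCE + parts.path[len("/wiki/"):]


def resource_to_article(resource_uri: str) -> str:
    """Inverse direction: DBpedia resource URI to English Wikipedia article URL."""
    if not resource_uri.startswith(DBPEDIA_RESOURCE):
        raise ValueError(f"not a DBpedia resource URI: {resource_uri}")
    return WIKIPEDIA_EN + resource_uri[len(DBPEDIA_RESOURCE):]


if __name__ == "__main__":
    print(article_to_resource("http://en.wikipedia.org/wiki/Monty_Python"))
```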
  23. DBpedia - contents and datasets
      – 4.58 million entities and 583 million triples (English DBpedia 2014):
        131.2 million fact assertions (derived from infoboxes)
        168.5 million triples representing Wikipedia structure
        57.1 million links to external datasets
      – DBpedia resources are categorised in several ways:
        – by Wikipedia categories (represented in SKOS)
        – by YAGO classification
        – by links to WordNet synsets
        – by assignment of classes from the DBpedia ontology
      – provenance metadata ⇒ From which part of which Wikipedia page was a triple derived?
  24. Mappings Wiki
      A community effort to:
      – develop an ontology schema
      – provide mappings from Wikipedia infobox properties to this ontology
      → creates an alignment between Wikipedia and DBpedia
      → eliminates name variations in properties and classes
      → big boost for precision
  25. DBpedia Ontology
      Cross-domain ontology:
      – manually created based on the most commonly used infoboxes
      – currently covers 685 classes which form a subsumption hierarchy and are described by 2,795 different properties
      – subsumption hierarchy with a maximal depth of 5
      – maintained and extended by the community in the DBpedia Mappings Wiki
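The depth of a subsumption hierarchy is the length of the longest chain of subclass links up to the root. A Python sketch over a small, hypothetical excerpt of parent links modelled on DBpedia ontology class names; the slide does not say whether its "depth of 5" counts the root class itself, so the counting convention here (steps up to `owl:Thing`) is an assumption.

```python
# Hypothetical excerpt: class -> direct superclass (single inheritance assumed).
PARENT = {
    "Agent": "owl:Thing",
    "Person": "Agent",
    "Athlete": "Person",
    "SoccerPlayer": "Athlete",
    "Place": "owl:Thing",
    "PopulatedPlace": "Place",
    "Settlement": "PopulatedPlace",
}
ROOT = "owl:Thing"


def depth(cls: str, parent: dict[str, str] = PARENT) -> int:
    """Number of subclass steps from `cls` up to the root class."""
    steps, seen = 0, set()
    while cls != ROOT:
        if cls in seen:
            raise ValueError(f"cycle in the hierarchy at {cls}")
        seen.add(cls)
        cls = parent[cls]
        steps += 1
    return steps


def max_depth(parent: dict[str, str] = PARENT) -> int:
    """Maximal depth over all classes in the hierarchy."""
    return max(depth(c, parent) for c in parent)


if __name__ == "__main__":
    print(max_depth())
```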
  26. DBpedia Ontology Extract
  27. Wikipedia articles
      – Wikipedia articles consist mostly of free text
      – but also comprise various types of structured information, including infobox templates, categorisation information, images, geo-coordinates, links to external web pages, disambiguation pages, redirects between pages, and links to other language versions
  28. Structure in Wikipedia
      Title, Abstract, Infoboxes, Geo-coordinates, Categories, Images, Links:
      – to other language versions
      – to other Wikipedia pages
      – to the Web
      – Redirects
      – Disambiguations
  29. Infobox encoding
      Wikitext:
        {{Infobox Korean settlement
        | title = Busan Metropolitan City
        | img = Busan.jpg
        | imgcaption = A view of the [[Geumjeong]] district in Busan
        | hangul = 부산광역시
        ...
        | area_km2 = 763.46
        | pop = 3635389
        | popyear = 2006
        | mayor = Hur Nam-sik
        | divs = 15 wards (Gu), 1 county (Gun)
        | region = [[Yeongnam]]
        | dialect = [[Gyeongsang]]
        }}
      Extracted triples:
        dbp:Busan dbp:title "Busan Metropolitan City" .
        dbp:Busan dbp:hangul "부산광역시"@ko .
        dbp:Busan dbp:area_km2 "763.46"^^xsd:float .
        dbp:Busan dbp:pop "3635389"^^xsd:int .
        dbp:Busan dbp:region dbp:Yeongnam .
        dbp:Busan dbp:dialect dbp:Gyeongsang .
        ...
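A much-simplified Python sketch of raw infobox extraction for the Busan example. It assumes one `| key = value` pair per line, turns a value that consists of a single wiki link into a resource, and types plain integers and decimals as `xsd:int` and `xsd:float`; language tags are omitted and the `dbp:` prefix usage follows the slide. The actual extraction framework handles nested templates, units, dates and much more.

```python
import re

FIELD = re.compile(r"^\s*\|\s*([^=]+?)\s*=\s*(.*?)\s*$")       # "| key = value"
WIKI_LINK = re.compile(r"^\[\[([^\]|]+)(?:\|[^\]]*)?\]\]$")     # "[[Target]]" or "[[Target|label]]"

BUSAN_EXCERPT = """{{Infobox Korean settlement
| title = Busan Metropolitan City
| hangul = 부산광역시
| area_km2 = 763.46
| pop = 3635389
| region = [[Yeongnam]]
}}"""


def to_object(value: str) -> str:
    """Turn a raw infobox value into a Turtle-like object term (simplified)."""
    link = WIKI_LINK.match(value)
    if link:
        return "dbp:" + link.group(1).strip().replace(" ", "_")
    if re.fullmatch(r"-?\d+", value):
        return f'"{value}"^^xsd:int'
    if re.fullmatch(r"-?\d+\.\d+", value):
        return f'"{value}"^^xsd:float'
    return f'"{value}"'


def extract(subject: str, infobox: str) -> list[tuple[str, str, str]]:
    """One triple per non-empty infobox field; template start/end lines are skipped."""
    triples = []
    for line in infobox.splitlines():
        match = FIELD.match(line)
        if match and match.group(2):
            triples.append((subject, "dbp:" + match.group(1), to_object(match.group(2))))
    return triples


if __name__ == "__main__":
    for s, p, o in extract("dbp:Busan", BUSAN_EXCERPT):
        print(s, p, o, ".")
```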
  30. Heterogeneity in infoboxes
  31. Björk (Musician): Occupation = Musician, Actor; Born = 21.12.1965, Reykjavík
      Brown (Prime Minister): office = Prime Minister of the UK; birth_date = 20.4.1951; birth_place = Govan
      Romero (Actor): occupation = Actor, Editor; birthdate = 4.2.1940; birthplace = New York
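Mappings from the Mappings Wiki exist to remove exactly this kind of key variation. A hypothetical Python stand-in for such a mapping follows; the real mappings are written per infobox template on the wiki, not as a Python table. It normalises the three infoboxes above to the DBpedia ontology properties `dbo:birthDate` and `dbo:birthPlace`, and the splitting of the combined "Born" field is an assumption of this sketch.

```python
# Hypothetical mapping: normalised infobox key -> DBpedia ontology property.
PROPERTY_FOR = {"birthdate": "dbo:birthDate", "birthplace": "dbo:birthPlace"}


def normalise(key: str) -> str:
    """Fold case, underscores and spaces so that birth_date, birthdate and Birth date coincide."""
    return key.strip().lower().replace("_", "").replace(" ", "")


def map_infobox(fields: dict[str, str]) -> dict[str, str]:
    """Map raw infobox fields onto ontology properties; unmapped keys are dropped."""
    mapped: dict[str, str] = {}
    for key, value in fields.items():
        norm = normalise(key)
        if norm == "born" and "," in value:
            # Combined "date, place" field: split it into two properties.
            date_part, place_part = (part.strip() for part in value.split(",", 1))
            mapped["dbo:birthDate"] = date_part
            mapped["dbo:birthPlace"] = place_part
        elif norm in PROPERTY_FOR:
            mapped[PROPERTY_FOR[norm]] = value.strip()
    return mapped


if __name__ == "__main__":
    infoboxes = {
        "Björk": {"Occupation": "Musician, Actor", "Born": "21.12.1965, Reykjavík"},
        "Brown": {"office": "Prime Minister of the UK", "birth_date": "20.4.1951", "birth_place": "Govan"},
        "Romero": {"occupation": "Actor, Editor", "birthdate": "4.2.1940", "birthplace": "New York"},
    }
    for name, fields in infoboxes.items():
        print(name, map_infobox(fields))
```

After mapping, all three entities use the same two properties, so a single query can ask for birth places regardless of which infobox template an article uses.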
  32. DBpedia Extraction Framework
      DIEF - DBpedia Information Extraction Framework:
      – extracts structured information from Wikipedia and turns it into a rich knowledge base
      – Mapping-Based Infobox Extraction, Raw Infobox Extraction, Feature Extraction, Statistical Extraction
      – hosted on GitHub
      – written in Scala & Java
  34. DBpedia Live
      – Wikipedia articles are continuously revised at a very high rate
      – English Wikipedia had approximately 3.3 million edits per month in June 2013 (≈ 77 edits per minute)
      – DBpedia Live was developed to keep DBpedia synchronized with Wikipedia
      – works on a continuous stream of updates from Wikipedia and processes that stream on the fly
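One way to picture processing the update stream: re-extract each changed page and apply only the difference to the store. A toy Python sketch, assuming each triple originates from exactly one page (so deleting a page's old triples never removes a triple that another page still produces); it is not DBpedia Live's actual implementation.

```python
from collections.abc import Iterable

Triple = tuple[str, str, str]


class LiveStore:
    """Toy triple store that keeps per-page extraction results in sync."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()
        self.by_page: dict[str, set[Triple]] = {}

    def process(self, page: str, extracted: set[Triple]) -> tuple[set[Triple], set[Triple]]:
        """Apply one page update; return the (deleted, inserted) triples."""
        old = self.by_page.get(page, set())
        to_delete, to_insert = old - extracted, extracted - old
        self.triples -= to_delete
        self.triples |= to_insert
        self.by_page[page] = set(extracted)
        return to_delete, to_insert


def consume(store: LiveStore, updates: Iterable[tuple[str, set[Triple]]]) -> None:
    """Process a stream of (page title, freshly extracted triples) pairs in order."""
    for page, extracted in updates:
        store.process(page, extracted)
```

Sending only the deltas keeps the cost of an edit proportional to what changed on that page, which matters at the edit rates quoted above.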
  35. Need for validation
      – over 3 million violations
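Validation tools such as RDFUnit express test cases as SPARQL queries that select violating resources. The same idea for one plausible rule, that a death date must not precede a birth date, as a plain Python sketch over hypothetical extracted values:

```python
from datetime import date

# Roughly the same rule as a SPARQL query (dbo: prefix assumed to be declared):
#   SELECT ?s WHERE { ?s dbo:birthDate ?b ; dbo:deathDate ?d . FILTER (?d < ?b) }


def death_before_birth(facts: dict[str, dict[str, date]]) -> list[str]:
    """Return the resources whose dbo:deathDate lies before their dbo:birthDate."""
    violations = []
    for resource, props in facts.items():
        birth = props.get("dbo:birthDate")
        death = props.get("dbo:deathDate")
        if birth is not None and death is not None and death < birth:
            violations.append(resource)
    return violations


if __name__ == "__main__":
    # Hypothetical extraction results; the second one contains a wrong year.
    facts = {
        "dbr:Example_Person_A": {"dbo:birthDate": date(1901, 5, 2), "dbo:deathDate": date(1970, 1, 9)},
        "dbr:Example_Person_B": {"dbo:birthDate": date(1950, 3, 1), "dbo:deathDate": date(1905, 3, 1)},
    }
    print(death_before_birth(facts))  # ['dbr:Example_Person_B']
```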
  36. Accessing DBpedia - Browsing
      – official DBpedia mirror: http://dbpedia.org
        ⇒ runs on Virtuoso
        ⇒ point & click browsing via DBpedia VAD
        ⇒ faceted search with Virtuoso Facets
  37. Accessing DBpedia - SPARQL
      – official SPARQL endpoint: http://dbpedia.org/sparql
        ⇒ subject to a fair use policy (limited query runtime)
        ⇒ iSPARQL frontend (interactive query building)
        ⇒ Snorql frontend
        ⇒ query with any SPARQL-compliant tool or API
  39. Querying RDF with SPARQL
      – SPARQL Protocol and RDF Query Language
        ⇒ graph patterns as sets of triples (with variables)
        ⇒ successful matches of graph patterns generate bindings in (sub-)query solutions
      – different result types for queries: SELECT ⇒ bindings, ASK ⇒ true/false, CONSTRUCT ⇒ new graph
      – combinators and modifiers for basic graph patterns ⇒ UNION, FILTER, MINUS, FILTER (NOT) EXISTS
      – result set modifiers: LIMIT, OFFSET, DISTINCT, ORDER BY
      – numerous operators and functions for resource and literal values
      – many additions in the 1.1 revision: grouping & aggregates, property path expressions, sub-queries
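How a basic graph pattern produces bindings can be shown with a naive evaluator: each triple pattern extends the solutions found so far, and a variable that is already bound must match the same value again. A Python sketch with a nested-loop join and no indexes; real SPARQL engines are far more sophisticated, and the small graph is illustrative.

```python
from typing import Optional

Triple = tuple[str, str, str]
Binding = dict[str, str]

GRAPH: set[Triple] = {
    ("dbr:Clockwise_(film)", "dbo:starring", "dbr:John_Cleese"),
    ("dbr:Clockwise_(film)", "rdf:type", "dbo:Film"),
    ("dbr:Fawlty_Towers", "dbo:starring", "dbr:John_Cleese"),
    ("dbr:Fawlty_Towers", "rdf:type", "dbo:TelevisionShow"),
}


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_triple(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable binds on first use and must agree on later uses.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def evaluate_bgp(patterns: list[Triple], graph: set[Triple]) -> list[Binding]:
    """Naive nested-loop evaluation of a basic graph pattern."""
    solutions: list[Binding] = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in graph
            if (extended := match_triple(pattern, triple, binding)) is not None
        ]
    return solutions


if __name__ == "__main__":
    # SELECT ?film WHERE { ?film dbo:starring dbr:John_Cleese . ?film rdf:type dbo:Film . }
    query = [("?film", "dbo:starring", "dbr:John_Cleese"), ("?film", "rdf:type", "dbo:Film")]
    print(evaluate_bgp(query, GRAPH))  # [{'?film': 'dbr:Clockwise_(film)'}]
```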
  40. SPARQL Query Example
  41. SPARQL Tooling
      – FlintSparqlEditor: JavaScript SPARQL editor
        – syntax highlighting, code assistance
        – auto-completion for properties and classes (for small datasets)
      – Protégé: full-fledged ontology editor
        – good for getting an overview of the ontologies backing datasets
        – two SPARQL plug-ins (one supporting entailment)
      – curl or your favourite simple REST client
        – allows simple testing of queries from any text editor with SPARQL syntax support (e.g. Emacs, Vim, Sublime Text):
          $ curl -H 'Accept: application/json' --data-urlencode "query=$(cat query.sparql)" http://dbpedia.org/sparql
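For use from a program instead of the shell, a similar request can be sketched with Python's standard library. Like `curl --data-urlencode`, it sends the query form-encoded in a POST body. Asking for `application/sparql-results+json` (the standard media type for SPARQL JSON results) instead of `application/json` is a choice made for this sketch, not taken from the slide. Keeping a `LIMIT` in queries helps to stay within the endpoint's fair use policy.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"


def build_query_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """POST the query form-encoded, as curl --data-urlencode "query=..." does."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def run_query(query: str, timeout: float = 60.0) -> list[dict]:
    """Send the query and return the list of result bindings (needs network access)."""
    with urllib.request.urlopen(build_query_request(query), timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    request = build_query_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
    print(request.get_method(), request.full_url)
    # print(run_query("SELECT * WHERE { ?s ?p ?o } LIMIT 5"))
```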
  42. DBpedia for Entity Linking and Disambiguation
      – DBpedia Spotlight
        – web service to detect, disambiguate and link mentions of DBpedia resources in input text
        – uses two NLP datasets derived from DBpedia:
          ⇒ topic signatures - tf/idf-weighted term vectors
          ⇒ lexicalisations - alternative names for entities and concepts
      – several other entity detection and linking services targeting DBpedia entities: AlchemyAPI, Ontos Semantic API, OpenCalais, Zemanta
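How the two datasets can work together: lexicalisations supply the candidate resources for a surface form, and tf/idf-weighted context vectors decide between them. The following Python sketch is a bag-of-words stand-in for that idea using cosine similarity, not DBpedia Spotlight's actual model. The candidate contexts and lexicalisations are hypothetical toy data.

```python
import math
from collections import Counter

# Hypothetical topic-signature source texts: a few context words per candidate resource.
CONTEXTS = {
    "dbr:USS_Enterprise_(NCC-1701-D)": "starship federation captain crew space picard",
    "dbr:USS_Enterprise_(CVN-65)": "aircraft carrier navy nuclear ship crew",
    "dbr:Enterprise_Holdings": "car rental company business",
}
# Hypothetical lexicalisations: surface form -> candidate resources.
LEXICALISATIONS = {"Enterprise": list(CONTEXTS)}


def tfidf_signatures(contexts: dict[str, str]) -> dict[str, dict[str, float]]:
    """Weight each candidate's context terms by term frequency * inverse document frequency."""
    tokens = {uri: text.lower().split() for uri, text in contexts.items()}
    n_docs = len(tokens)
    doc_freq = Counter(term for words in tokens.values() for term in set(words))
    return {
        uri: {t: tf * math.log(n_docs / doc_freq[t]) for t, tf in Counter(words).items()}
        for uri, words in tokens.items()
    }


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


SIGNATURES = tfidf_signatures(CONTEXTS)


def disambiguate(surface_form: str, text: str) -> str:
    """Pick the candidate whose signature is most similar to the mention's context."""
    context = dict(Counter(text.lower().split()))
    candidates = LEXICALISATIONS[surface_form]
    return max(candidates, key=lambda uri: cosine(context, SIGNATURES[uri]))


if __name__ == "__main__":
    text = ("the adventures of a space-faring crew on board the starship USS Enterprise "
            "commanded by Captain Jean-Luc Picard")
    print(disambiguate("Enterprise", text))
```

Terms shared by all candidates get an idf of zero and so cannot help to tell them apart; distinctive words such as "starship" or "captain" carry the decision.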
  43. DBpedia for Entity Linking and Disambiguation
  44. Linking DBpedia
      – community-curated links to various major and minor external datasets:
        target dataset   predicate          out-link count
        Freebase         owl:sameAs              3,600,000
        YAGO2            rdf:type               18,100,000
        UMBEL            rdf:type                  896,400
        WordNet          dbp:wordnet_type          467,100
        OpenCyc          owl:sameAs                 27,100
        LinkedGeoData    owl:sameAs                103,600
        GeoNames         owl:sameAs                 86,500
      – a Linked Data Web analysis with Sinditech measured 3,960,212 in-links to DBpedia (lower bound)
      statistics from (Lehmann et al. 2012)
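Out-link statistics like those in the table can be derived by grouping link triples by predicate and by the host of the target IRI. A Python sketch over hypothetical triples; treating the IRI host as the "target dataset" is a simplifying assumption, and the real figures come from the released link datasets.

```python
from collections import Counter
from urllib.parse import urlsplit

Triple = tuple[str, str, str]
LINK_PREDICATES = frozenset({"owl:sameAs", "rdf:type"})


def outlink_counts(triples: list[Triple], link_predicates: frozenset = LINK_PREDICATES) -> Counter:
    """Count links per (target host, predicate), ignoring targets inside dbpedia.org."""
    counts: Counter = Counter()
    for _subject, predicate, obj in triples:
        if predicate not in link_predicates or not obj.startswith("http"):
            continue
        host = urlsplit(obj).netloc
        if not host.endswith("dbpedia.org"):
            counts[(host, predicate)] += 1
    return counts
```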
  45. Linking DBpedia - use cases for linked DBpedia data
      – correlate the accumulated EU funding per year to member countries (from FTS) with the gross domestic product of these countries (DBpedia)
      – correlate an above-average share of metropolitan area used for parks or other natural recreational areas with towns and cities led by environmentalists (LinkedGeoData & DBpedia)
      – is there a town with no more than 15,000 inhabitants in the area around Leipzig that has a Catholic church, childcare, a primary school and a grammar school, and is not currently led by a politician from the conservative party?
  46. DBpedia internationalised
      – non-English versions of DBpedia offer:
        – coverage of more entities
        – more detailed or up-to-date information for entities associated with the particular countries
      – an international mapping community helps provide localized DBpedia datasets for 125 languages
        ⇒ own IRI recipe: http://<langcode>.dbpedia.org/resource/<thing>
      – 15 DBpedia chapters: autonomous management of mappings, organisation of the local community, hosting of datasets and services
      – also canonicalized datasets: facts derived from localized Wikipedias, but only statements for resources also present in the English DBpedia
        ⇒ use of the default http://dbpedia.org/resource/ namespace
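Both naming schemes can be sketched in a few lines of Python. The `english_title` lookup (from a localized title to the English article title, for instance derived from interlanguage links) is a hypothetical input assumed for illustration, not a DBpedia API.

```python
from typing import Optional

DEFAULT_NAMESPACE = "http://dbpedia.org/resource/"


def localized_iri(thing: str, langcode: str) -> str:
    """Resource IRI following the recipe http://<langcode>.dbpedia.org/resource/<thing>."""
    if langcode == "en":
        return DEFAULT_NAMESPACE + thing  # the English DBpedia uses the default namespace
    return f"http://{langcode}.dbpedia.org/resource/{thing}"


def canonicalized_iri(thing: str, langcode: str,
                      english_title: dict[tuple[str, str], str]) -> Optional[str]:
    """IRI in the default namespace, or None if the resource has no English counterpart."""
    if langcode == "en":
        return DEFAULT_NAMESPACE + thing
    title = english_title.get((langcode, thing))
    return None if title is None else DEFAULT_NAMESPACE + title


if __name__ == "__main__":
    links = {("de", "München"): "Munich"}  # hypothetical interlanguage-link table
    print(localized_iri("München", "de"))
    print(canonicalized_iri("München", "de", links))
```

Returning `None` mirrors the rule that canonicalized datasets only contain statements about resources that also exist in the English DBpedia.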
  47. DBpedia internationalised
  48. Related Work: Freebase
      – extracts structured data from Wikipedia
      – makes it available in RDF
      Similarities:
      – provides dumps of the extracted data
      – provides APIs and endpoints to access the data
  49. Related Work: Freebase
      Differences:
      Freebase
      – uses several sources → higher coverage
      – can be directly edited by users
      – mainly run by Google (discontinued)
      DBpedia
      – RDF representation of Wikipedia
      – hub on the Web of Data
      – can only be edited indirectly, by modifying the content of Wikipedia
      – ongoing community effort
  50. Related Work: Wikidata
      – initiated by Wikimedia Deutschland e.V. in 2012
      – free knowledge base about the world that can be read and edited by humans and machines alike
      – can offer a variety of statements from different sources and dates
      – does not offer "the truth" about things:
        (-) "Berlin has a population of 3.5 million"
        (+) "Wikidata contains the statement that Berlin's population was 3.5 million as of 2011, according to the German statistical office"
      – aims to provide a single point of truth for facts in Wikipedia across different language versions
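The difference between the two formulations is that the second statement carries its qualifier (the date) and its reference (the source) with it. A hypothetical Python data model to illustrate the idea; the field names are invented for this sketch and are not Wikidata's JSON format or property identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    """A claim together with the context that makes it verifiable."""
    subject: str
    predicate: str
    value: int
    point_in_time: Optional[str] = None  # qualifier: when the value applied
    source: Optional[str] = None         # reference: who states the value

    def describe(self) -> str:
        text = f"{self.subject} {self.predicate} {self.value:,}"
        context = []
        if self.point_in_time:
            context.append(f"as of {self.point_in_time}")
        if self.source:
            context.append(f"according to {self.source}")
        return text + (f" ({', '.join(context)})" if context else "")


if __name__ == "__main__":
    bare = Statement("Berlin", "population", 3_500_000)
    sourced = Statement("Berlin", "population", 3_500_000, "2011", "the German statistical office")
    print(bare.describe())
    print(sourced.describe())
```

Several such statements with different dates or sources can coexist for the same subject and predicate, which is what allows offering statements from different sources and dates instead of a single "truth".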
  51. Current developments
      – increased validation and curation processes (DBpedia+, RDFUnit)
      – easier creation of local DBpedia SPARQL endpoints (Debian packaging, Docker images of triple store and dataset selection, automatic import)
      – novel, more intuitive and feature-rich browsing interfaces
        ⇒ add corrections in place in LD viewer interfaces (?)
  52. How you can get involved
      – set up new mirrors and endpoints of DBpedia
      – revise mappings and/or write new ones
      – help improve the ontology
      – get involved with the Irish/Gaelic chapter: bianca.pereira@insight-centre.org, caoilfhionn.lane@insight-centre.org
      – edit Wikipedia
  53. Further Reading: Website
      – landing page: http://dbpedia.org/About
      – overview of datasets (also info on localized datasets): http://wiki.dbpedia.org/Datasets
      – DBpedia data access overview: http://wiki.dbpedia.org/OnlineAccess
  54. Further Reading: Publications
      – 2007: Auer, Bizer, Kobilarov, Lehmann, Cyganiak, Ives. DBpedia: A Nucleus for a Web of Open Data. http://www.cis.upenn.edu/~zives/research/dbpedia.pdf
      – 2009: Bizer, Lehmann, Kobilarov, Auer, Becker, Cyganiak, Hellmann. DBpedia - A Crystallization Point for the Web of Data. http://jens-lehmann.org/files/2009/dbpedia_jws.pdf
      – 2012: Lehmann, Isele, Jakob, Jentzsch, Kontokostas, Hellmann, Morsey, van Kleef, Auer, Bizer. DBpedia - A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia. http://www.semantic-web-journal.net/system/files/swj499.pdf
  55. Further Reading: W3C Specs
      – RDF: http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/
      – RDFS: http://www.w3.org/TR/rdf-schema/
      – OWL 2: http://www.w3.org/TR/owl2-overview/
      – SPARQL Query Language: http://www.w3.org/TR/sparql11-query/
      – SPARQL Protocol: http://www.w3.org/TR/2013/REC-sparql11-protocol-20130321/
  56. Further Reading: Browsing
      – DBpedia VAD: http://dbpedia.org/page/DBpedia
      – DBpedia Facets: http://dbpedia.org/fct/
      – new DBpedia frontend: http://de.dbpedia.org/page/DBpedia (get an impression of the German DBpedia version), https://github.com/lukovnikov/ldviewer (source code)
      – Context platform: http://context.aksw.org/app/hub.php?corpus=6&action=facets (online demo to browse the LOD2 Blog), http://context.aksw.org/app/ (project home)
  57. Further Reading: SPARQL
      – DBpedia Snorql SPARQL interface (DBP-en): http://dbpedia.org/snorql/
      – John Cleese query in Snorql: http://bit.ly/1zog24A
      – EU Funding vs. Country GDP: https://gist.github.com/neradis/0ca7a41c408280c0d69e
      – Flint SPARQL Editor: http://openuplabs.tso.co.uk/demos/sparqleditor (online demo), https://github.com/TSO-Openup/FlintSparqlEditor (source code, check out and run)
  58. Further Reading: popular RDF/OWL frameworks
      – Sesame (Java): http://rdf4j.org/
      – Jena (Java): http://jena.apache.org/index.html
      – RDFLib (Python): http://code.google.com/p/rdflib/
  59. Goodbye! Thank you for your interest in DBpedia!
