DBpedia+ / DBpedia meeting in Dublin

The vision for the new DBpedia+ dataset through the ALIGNED project

  1. DBpedia (in) ALIGNED: From DBpedia to DBpedia+. Dimitris Kontokostas, AKSW Group, Leipzig University / DBpedia Association
  2. DBpedia @ 2007 (February 9th 2015 / 3rd DBpedia Meeting in Dublin)
  3. DBpedia @ 2008
  4. DBpedia @ 2009
  5. DBpedia @ 2010
  6. DBpedia @ 2011
  7. DBpedia @ 2014
  8. RDF Stats (2014 release): 3B facts (only 580M facts in English)
     ● DBpedia English: 4.58M things (4.22M of them typed)
     ● 125 localized versions: 38.3M things
     ● 50M links to other datasets
     Many more stats at: dbpedia.org/Datasets2014/DatasetStatistics
  9. Dev Stats: DBpedia Information Extraction Framework
     ● Java/Scala-based framework
       ○ Old PHP-based framework
     ● 5.1K commits
     ● 52K lines of code (100K/1M AT)
     ● 71 total contributors
     Many more stats at: www.openhub.net/p/dbpedia
  10. Aligning Problem: lots of code and a lot more data
      ● Wikipedia evolves over time
        ○ Infobox templates change, get merged, or get deleted
        ○ New formatting templates appear
        ○ Structure differs per language edition
      ● Code should adapt to all these changes
        ○ This is hard at this (data) scale
  11. Unit testing to the rescue?
      ● Software and data testing
      ● Straightforward for software (since the 1970s)
      ● Preliminary for (RDF) data
        ○ RDFUnit, SPIN, OWL, PelletICV, ShEx, ...
          ■ W3C Data Shapes WG
      Data testing++
      ● Generation: manual, (semi-)automatic, ...
      ● Linking: data and software tests
  12. RDFUnit: http://rdfunit.aksw.org
  13. Unit-test (UT) feedback loop: data verification and feedback at different data extraction stages
      ● Three main points of failure in DBpedia:
        ○ Code
        ○ Infobox mappings
        ○ Wikipedia (!!!)
  14. DBpedia+ Workflow
  15. Additional feedback. We are looking into:
      ● Reporting
      ● Statistics
      ● Inter-Wikipedia cross-checking
      ● ML techniques
  16. Thank you & Questions? ALIGNED: Aligned, Quality-centric Software and Data Engineering
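
The alignment problem on slide 10 can be made concrete with a small check: when editors rename or add infobox properties, existing mappings silently stop matching them. The Python sketch below counts infobox properties that no mapping covers. The `MAPPINGS` dict, the `unmapped_properties` function and the input format (infoboxes already parsed into template/property pairs) are hypothetical stand-ins, not the DBpedia extraction framework's API.

```python
from collections import Counter

# Hypothetical stand-in for infobox mappings:
# template name -> {infobox property: DBpedia ontology property}.
MAPPINGS = {
    "Infobox settlement": {
        "population_total": "dbo:populationTotal",
        "area_total_km2": "dbo:areaTotal",
    },
}


def unmapped_properties(infoboxes, mappings=MAPPINGS):
    """Count (template, property) pairs that no mapping covers.

    `infoboxes` is an iterable of (template_name, {property: value}) pairs,
    an assumed format for infoboxes already parsed from Wikipedia pages.
    """
    counts = Counter()
    for template, props in infoboxes:
        mapping = mappings.get(template)
        for prop in props:
            # Unmapped templates and renamed/new properties both end up here.
            if mapping is None or prop not in mapping:
                counts[(template, prop)] += 1
    return counts


# Illustrative input only, not real extraction output.
pages = [
    ("Infobox settlement", {"population_total": "1000", "area_total_km2": "10"}),
    # A renamed property ("area_km2") no longer matches the old mapping.
    ("Infobox settlement", {"population_total": "2000", "area_km2": "20"}),
    # A template with no mapping at all.
    ("Infobox city", {"population": "3000"}),
]

for (template, prop), n in unmapped_properties(pages).most_common():
    print(f"{template} / {prop}: {n} occurrence(s)")
```

Sorting by frequency puts the most widespread template changes first, which is where updating a mapping would recover the most data.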
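
Slide 11 mentions (semi-)automatic generation of data tests. RDFUnit expresses its test cases as SPARQL queries, many of them generated automatically from schema axioms such as domains, ranges and cardinalities. The Python sketch below only illustrates that idea over plain (subject, predicate, object) tuples instead of SPARQL; the mini-schema, the sample triples and all function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical mini-schema standing in for ontology axioms.
SCHEMA = {
    "functional": ["dbo:birthDate"],        # at most one value per subject
    "range": {"dbo:populationTotal": int},  # expected Python type of the object
}


def generate_tests(schema):
    """Generate one test case per schema axiom; each returns a list of violations."""
    tests = []
    for prop in schema["functional"]:
        def functional(triples, prop=prop):
            counts = defaultdict(int)
            for s, p, _ in triples:
                if p == prop:
                    counts[s] += 1
            return [(s, prop, f"{n} values for a functional property")
                    for s, n in counts.items() if n > 1]
        tests.append(functional)
    for prop, expected in schema["range"].items():
        def range_check(triples, prop=prop, expected=expected):
            return [(s, prop, f"value {o!r} is not {expected.__name__}")
                    for s, p, o in triples
                    if p == prop and not isinstance(o, expected)]
        tests.append(range_check)
    return tests


def run_tests(triples, tests):
    """Run every generated test case and collect all violations."""
    return [violation for test in tests for violation in test(triples)]


# Illustrative triples; not real DBpedia data.
triples = [
    ("dbr:Example_Person", "dbo:birthDate", "1900-01-01"),
    ("dbr:Example_Person", "dbo:birthDate", "1901-01-01"),
    ("dbr:Example_Town", "dbo:populationTotal", 1000),
    ("dbr:Other_Town", "dbo:populationTotal", "about 500"),
]

for violation in run_tests(triples, generate_tests(SCHEMA)):
    print(violation)
```

Because the tests are derived from the schema rather than written by hand, adding an axiom adds a test without touching the extraction code.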
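
For the inter-Wikipedia cross-checking listed on slide 15, one simple form is to compare the same property across language editions and flag entities whose values disagree. This Python sketch assumes numeric values and entities that are already aligned across languages (for example through interlanguage links); `cross_check`, its relative `tolerance` parameter and the sample data are hypothetical.

```python
from collections import defaultdict


def cross_check(editions, prop, tolerance=0.0):
    """Return entities whose numeric `prop` values disagree across editions.

    `editions` maps a language code to {entity: {property: value}}; entity keys
    are assumed to be aligned across languages already. Values agree when
    max - min <= tolerance * max(|min|, |max|) (a relative tolerance).
    """
    by_entity = defaultdict(dict)  # entity -> {language: value}
    for lang, entities in editions.items():
        for entity, props in entities.items():
            if prop in props:
                by_entity[entity][lang] = props[prop]

    conflicts = {}
    for entity, by_lang in by_entity.items():
        if len(by_lang) < 2:
            continue  # only one edition has a value: nothing to compare
        lo, hi = min(by_lang.values()), max(by_lang.values())
        if hi - lo > tolerance * max(abs(lo), abs(hi)):
            conflicts[entity] = by_lang
    return conflicts


# Illustrative data only.
editions = {
    "en": {"Town_A": {"population": 1000}, "Town_B": {"population": 200}},
    "de": {"Town_A": {"population": 1002}, "Town_B": {"population": 201}},
    "el": {"Town_A": {"population": 1500}},
}

for entity, values in cross_check(editions, "population", tolerance=0.01).items():
    print(entity, values)
```

In this sketch a disagreement only marks a candidate for review; it does not say which edition (or which extraction step) is wrong.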
