8. February 9th 2015 / 3rd DBpedia Meeting in Dublin
RDF Stats (2014 release)
3B facts (only 580M facts in English)
● DBpedia En: 4.58M Things / 4.22M typed
● 125 Localized versions: 38.3M Things
● 50M links to other datasets
Many more stats @:
dbpedia.org/Datasets2014/DatasetStatistics
9.
Dev Stats
DBpedia Information Extraction Framework
● Java/Scala based framework
○ Old PHP-based framework
● 5.1K Commits
● 52K lines of code (100K/1M AT)
● 71 total contributors
Many more stats @:
www.openhub.net/p/dbpedia
10.
Aligning Problem
Lots of code & a lot more data
● Wikipedia evolves over time
○ Infobox templates are changed, merged, or deleted
○ New formatting templates
○ Structural differences per language edition
● Code should adapt to all the changes
○ hard at this (data) scale
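One way to picture the aligning problem is a drift check between an infobox mapping and the current parameters of its template. The sketch below is illustrative only: the function names, the mapping dictionary, and the template parameter list are hypothetical and are not taken from the DBpedia extraction framework.

```python
# Hypothetical sketch: compare the infobox parameters a mapping expects
# with the parameters the template currently exposes.

def find_stale_mappings(mapped_params, template_params):
    """Mapped parameters that no longer exist in the template."""
    return sorted(set(mapped_params) - set(template_params))


def find_unmapped_params(mapped_params, template_params):
    """Template parameters that have no mapping yet."""
    return sorted(set(template_params) - set(mapped_params))


# Illustrative data only (assumed, not real mapping content).
mapping = {
    "birth_date": "dbo:birthDate",
    "birth_place": "dbo:birthPlace",
    "occupation": "dbo:occupation",
}
# Assume an editor renamed "birth_place" and added "spouse".
template_now = ["birth_date", "birthplace", "occupation", "spouse"]

stale = find_stale_mappings(mapping, template_now)
unmapped = find_unmapped_params(mapping, template_now)
print("stale mappings:", stale)
print("unmapped parameters:", unmapped)
```

Running such a check per template and per language edition would be one cheap signal that a mapping needs attention after a Wikipedia edit.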
11.
Unit-testing to the rescue?
● Software & Data testing
● Straightforward for software (since the 1970s)
● Preliminary for (RDF) data
○ RDFUnit, SPIN, OWL, PelletICV, ShEx,...
■ W3C Data Shapes WG
Data testing++
● Generation: manual, (semi-)automatic, ...
● Linking: data & software tests
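As a rough illustration of what a unit test for data can look like, the sketch below checks one constraint over a handful of triples: a resource's birth date must not be later than its death date. It is plain Python over tuples and does not use the RDFUnit API; the function name and the sample resources are assumptions made for the example.

```python
from datetime import date

BIRTH = "dbo:birthDate"
DEATH = "dbo:deathDate"


def birth_before_death_violations(triples):
    """Return subjects whose birth date is later than their death date."""
    births, deaths = {}, {}
    for subject, predicate, obj in triples:
        if predicate == BIRTH:
            births[subject] = obj
        elif predicate == DEATH:
            deaths[subject] = obj
    # Only subjects with both dates can violate the constraint.
    return sorted(
        s for s in births if s in deaths and births[s] > deaths[s]
    )


# Illustrative triples (made-up resources, not real DBpedia data).
triples = [
    ("dbr:A", BIRTH, date(1900, 1, 1)),
    ("dbr:A", DEATH, date(1980, 5, 2)),
    ("dbr:B", BIRTH, date(1990, 1, 1)),
    ("dbr:B", DEATH, date(1890, 1, 1)),  # inconsistent on purpose
    ("dbr:C", BIRTH, date(1950, 3, 3)),  # no death date: not tested
]

violations = birth_before_death_violations(triples)
print(violations)
```

Like a software unit test, the check either passes (empty list) or returns the offending resources, which can then be traced back to their source.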
12.
RDFUnit
http://rdfunit.aksw.org
13.
Unit-testing feedback loop
Data verification and feedback at different
data extraction stages
● Three main points of failure in DBpedia:
○ Code
○ Infobox mappings
○ Wikipedia (!!!)
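A minimal sketch of the routing step in such a feedback loop, assuming each failed test has already been attributed to one of the three points of failure listed above. The `route_feedback` function and the sample failure messages are hypothetical; how failures get attributed in practice is not covered here.

```python
from collections import defaultdict

# The three points of failure named on the slide.
FAILURE_POINTS = ("code", "infobox mappings", "wikipedia")


def route_feedback(failures):
    """Group (failure_point, message) pairs by point of failure."""
    routed = defaultdict(list)
    for point, message in failures:
        if point not in FAILURE_POINTS:
            raise ValueError(f"unknown failure point: {point!r}")
        routed[point].append(message)
    return dict(routed)


# Illustrative failures only (assumed messages).
failures = [
    ("wikipedia", "birth date later than death date in article text"),
    ("infobox mappings", "parameter 'birth_place' no longer in template"),
    ("wikipedia", "population given as free text"),
]

routed = route_feedback(failures)
for point, messages in routed.items():
    print(f"{point}: {len(messages)} issue(s)")
```

Grouping failures this way makes it explicit who should receive the feedback: developers, mapping editors, or Wikipedia editors.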