ESWC2017 In-Use - Declarative Data Transformations for Linked Data Generation: the case of DBpedia

  1. Declarative Data Transformations for Linked Data Generation: the case of DBpedia Ben De Meester, Wouter Maroy, Anastasia Dimou, Ruben Verborgh, and Erik Mannens Ghent University – imec – IDLab, Belgium
  2. In loving memory of the Barack Obama examples in Semantic Web conferences
  3. How to create Linked Barack? dbr:Barack_Obama: dbp:name "Barack Obama"@en; dbo:birthPlace dbr:Hawaii; dbp:termStart "20-01-2009"; dbp:birthDate "04-08-1961"; …
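The triples sketched on this slide can be written out as Turtle. The literal values and datatyping are illustrative, not the exact DBpedia output:

```turtle
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix dbp: <http://dbpedia.org/property/> .

dbr:Barack_Obama
    dbp:name       "Barack Obama"@en ;
    dbo:birthPlace dbr:Hawaii ;
    dbp:termStart  "20-01-2009" ;
    dbp:birthDate  "04-08-1961" .
```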
  4. Linked Barack is based on schema and data
  5. A specific case… Source handle WikiText Schema transformations use custom schema (DBpedia ontology) Data transformations parse manually entered input data https://en.wikipedia.org/wiki/Barack_Obama https://en.wikipedia.org/wiki/Leopold_II_of_Belgium
  6. … needs a specific solution? … select extract transform schema transform data https://github.com/dbpedia/extraction-framework
  7. Data transformations are hard-coded in the DBpedia EF Hard-coded means case-specific and coupled with the implementation You can’t use the DBpedia EF for other cases, nor use the parsing functions outside the DBpedia EF
  8. Declarative schema transformations are great Use-case independent Decoupled from the implementation
  9. Declarative data transformations make Linked Data generation 🚀🚀🚀🚀 Declarative schema transformations (i.e., semantic annotation rules) are great, so why not also for data transformations?
  10. Outline The current situation existing approaches disadvantages What we provide our approach implementation
  11. Outline The current situation existing approaches direct mappings | successive steps embedded data transformations | hard-coded disadvantages What we provide our approach implementation
  12. direct mappings | successive steps embedded data transformations | hard-coded From original data to RDF with minimal change e.g., CSVW, JSON(-LD) Restricted: no schema or data transformations [[Honolulu]], [[Hawaii]], U.S. dbr:Honolulu dbr:Hawaii ?
  13. direct mappings | successive steps embedded data transformations | hard-coded First data, then schema transformations (or vice versa) e.g., R2RML Restricted: depends on the underlying system for data transformations e.g., SQL views for R2RML Uncombinable: combine transformations? e.g., Born should return a date
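In the successive-steps approach the data transformation has to be pushed down into the underlying system. In R2RML that means an SQL query inside the logical table, roughly as below. The table name, column names, and the date-parsing expression (`STR_TO_DATE` is MySQL-specific) are illustrative, not part of the actual DBpedia setup:

```turtle
@prefix rr:  <http://www.w3.org/ns/r2rml#> .
@prefix dbp: <http://dbpedia.org/property/> .

<#PersonMapping>
    # data transformation delegated to the underlying SQL engine
    rr:logicalTable [ rr:sqlQuery """
        SELECT name,
               STR_TO_DATE(birth_date, '%M %d, %Y') AS birth_date
        FROM persons
        """ ] ;
    rr:subjectMap [ rr:template "http://dbpedia.org/resource/{name}" ] ;
    rr:predicateObjectMap [
        rr:predicate dbp:birthDate ;
        rr:objectMap [ rr:column "birth_date" ]
    ] .
```

This is exactly the restriction the slide names: the transformation only works where that SQL dialect is available, and it cannot be combined freely with the schema transformations.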
  14. direct mappings | successive steps embedded data transformations | hard-coded Tool supports a limited set of data transformations e.g., OpenRefine Restricted: limited set of data transformations parsing is more than splitting a string or one regular expression Coupled: types of data transformations depend on the tool
  15. direct mappings | successive steps embedded data transformations | hard-coded … select extract transform schema transform data https://github.com/dbpedia/extraction-framework DBpedia EF
  16. select
  17. {{Infobox president |name = Barack Obama |image = President Barack Obama.jpg |office = President of the United States |vicepresident = [[Joe Biden]] |birth_place = [[Honolulu]], [[Hawaii]], U.S. |term_start = January 20, 2009 |term_end = January 20, 2017 |birth_date = {{birth date and age|1961|8|4}} |birth_name = Barack Hussein Obama II … extract
  18. dbr:Barack_Obama dbp:name dbo:birthPlace dbp:termStart dbp:birthDate [[Honolulu]], [[Hawaii]], U.S. {{birth date and age|1961|8|4}} … transform schema Barack Obama January 20, 2009
  19. dbr:Hawaii dbr:Barack_Obama dbp:name dbo:birthPlace dbp:termStart dbp:birthDate "Barack Obama"@en dbr:Hawaii "20-01-2009" "04-08-1961" … transform data
  20. Hard-coded: disadvantages Coupled: data transformations only usable in that implementation Case-specific: only for one use case
  21. Outline The current situation existing approaches disadvantages What we provide our approach implementation
  22. Disadvantages of current approaches Restricted Uncombinable Coupled Case-specific
  23. What do we want? Unrestricted data transformations Combinable schema and data transformations Uncoupled with the implementation Case-independent solution
  24. Outline The current situation existing approaches disadvantages What we provide our approach implementation
  25. Aligned declarative schema and declarative data transformations Aligned: combine data and schema transformations Declarative data transformations: no restrictions, re-usable outside the generation framework Aligned declaratives: neither use-case nor implementation specific
  26. Outline The current situation existing approaches disadvantages What we provide our approach implementation declaratives | tools
  27. Outline The current situation existing approaches disadvantages What we provide our approach implementation declaratives | tools
  28. Declaratives Declarative schema transformations source agnostic, schema agnostic RDF Mapping Language (RML) | http://RML.io Declarative data transformations implementation agnostic Function Ontology (FnO) | http://FnO.io Aligned FunctionMap / functionValue Connection between RML and FnO
  29. RML mapping source subject dbp:birthDate birth_date WikiText dbr:{wiki_label} predicate reference Person_Mapping birthDate_Mapping dbr:Barack_Obama dbp:birthDate {{birth date and age|1961|8|4}}
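The diagram on this slide corresponds to an RML mapping roughly like the following Turtle sketch; the source location and reference names are illustrative:

```turtle
@prefix rr:  <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix dbp: <http://dbpedia.org/property/> .

<#Person_Mapping>
    rml:logicalSource [ rml:source "Barack_Obama.wiki" ] ;  # WikiText source
    rr:subjectMap [ rr:template "http://dbpedia.org/resource/{wiki_label}" ] ;
    rr:predicateObjectMap [
        rr:predicate dbp:birthDate ;
        # without a data transformation, the raw infobox value is emitted:
        # {{birth date and age|1961|8|4}}
        rr:objectMap [ rml:reference "birth_date" ]
    ] .
```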
  30. FnO mapping executes inputString DBpedia_date_parser birth_date DBP_Parsing_Function "04-08-1961"
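The data transformation itself is described with FnO, independent of any implementation. A minimal sketch of such a function description, with illustrative IRIs and parameter names:

```turtle
@prefix fno: <https://w3id.org/function/ontology#> .
@prefix ex:  <http://example.com/functions#> .

ex:DBP_Parsing_Function
    a fno:Function ;
    fno:name    "DBpedia date parser" ;
    fno:expects ( ex:inputStringParam ) ;   # ordered list of parameters
    fno:returns ( ex:outputDateParam ) .

ex:inputStringParam
    a fno:Parameter ;
    # the predicate used to bind a concrete value to this parameter
    fno:predicate ex:inputString .
```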
  31. Separate RML and FnO source subject dbp:birthDate birth_date WikiText dbr:{wiki_label} predicate reference Person_Mapping birthDate_Mapping executes inputString DBpedia_date_parser birth_date DBP_Parsing_Function
  32. Aligned RML and FnO source subject dbp:birthDate executes inputString WikiText dbr:{wiki_label} DBpedia_date_parser birth_date predicate DBP_Parsing_Function FunctionMap Person_Mapping birthDate_Mapping
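Aligned, the object map becomes a function map: its functionValue states which FnO function to execute and binds its parameters to RML references. A sketch of the alignment, with illustrative IRIs:

```turtle
@prefix rr:   <http://www.w3.org/ns/r2rml#> .
@prefix rml:  <http://semweb.mmlab.be/ns/rml#> .
@prefix fnml: <http://semweb.mmlab.be/ns/fnml#> .
@prefix fno:  <https://w3id.org/function/ontology#> .
@prefix dbp:  <http://dbpedia.org/property/> .
@prefix ex:   <http://example.com/functions#> .

<#birthDate_Mapping>
    rr:predicate dbp:birthDate ;
    rr:objectMap [
        fnml:functionValue [
            rr:predicateObjectMap [
                # which function to execute
                rr:predicate fno:executes ;
                rr:objectMap [ rr:constant ex:DBP_Parsing_Function ]
            ] , [
                # bind the raw infobox value to the function's input parameter
                rr:predicate ex:inputString ;
                rr:objectMap [ rml:reference "birth_date" ]
            ]
        ]
    ] .
```

The object of dbp:birthDate is then the function's return value instead of the raw reference, which is how the schema and data transformations combine.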
  34. Outline The current situation existing approaches disadvantages What we provide our approach implementation declaratives | tools
  35. Practical - Implementation RMLProcessor include WikiText extractor support FunctionMap / functionValue connect to FunctionProcessor FunctionProcessor dynamically load and call functions External DBpedia parsing functions
  36. … RML_FnO-doc Function Processor … select extract transform schema + transform data
  44. Outline The current situation existing approaches disadvantages What we provide our approach implementation
  45. Our approach generates the same DBpedia data, and: You don’t depend on the implementation You don’t depend on the use case DBpedia parsing functions can be reused elsewhere Data transformations can use existing or new external libraries
  46. See it in action! Booth 49 https://fnoio.github.io/dbpedia-demo/ https://github.com/RMLio/RML-Mapper/tree/extension-fno https://github.com/FnOio/function-processor-java https://github.com/FnOio/dbpedia-parsing-functions-scala

Editor's Notes

  1. I’d like to give a small warning here.. Even though numerous people told us not to…
  2. I am going to use Barack Obama. It’s probably one of the last times we can use our mascot
  3. Barack’s data in DBpedia…
  4. schema is very specific (so are the schema transformations) data tfs are very specific
  5. Is mostly generated using the DBpedia EF (specifically for the infoboxes): relevant pages are selected, the infoboxes are extracted, the values are put in a certain schema (so schema tfs), finally, the values themselves are transformed (so data transformations).
  6. However, they are embedded in the EF. That’s a pity, because DBpedia is so widely used. The EF has been tested on thousands of wiki pages, but you can’t use it for other use cases, and you can’t use the parsing functions outside the DBpedia EF. As we will see later, no current solutions can cope with more advanced data transformations
  7. More clear
  8. So, declarative schema transformations exist. They make generating linked data possible without depending on implementation or use case. Great. However, due to high data tfs demands, current generation approaches cannot be used. What about declarative data tfs?! That would be awesome!
  9. split?
  10. Is mostly generated using the DBpedia EF (specifically for the infoboxes): relevant pages are selected, the infoboxes are extracted, the values are put in a certain schema (so schema tfs), finally, the values themselves are transformed (so data transformations).
  11. so, you have all pages
  12. you select the ones with relevant infoboxes
  13. The infoboxes are extracted (just using following as simplified examples)
  14. use custom mapping doc for the schema tfs
  15. Then, the data is parsed to get ‘good data’. This is a very important part of DBpedia: as the data values are entered manually in Wikipedia, the input data can be very diverse, with typos and different ways of writing things. A great deal of effort has gone into creating these parsing functions, and they are really good.
  16. not the only solution
  17. instead of use raw value directly…
  18. use value after being parsed by underlying function
  19. use value after being parsed by underlying function
  20. support wikitext: was easy (RMLProcessor is made for that, natural thing to do)
  21. [ ] two-sentence recap