1. Creating Knowledge out of Interlinked Data
LOD2 Plenary Meeting Vienna – 2012/03/21 – Page 1 http://lod2.eu
LOD2 Plenary Meeting 2012
Vienna
WP4: Reuse, Interlinking and Knowledge
Fusion
Robert Isele
Freie Universität Berlin
WP4 Goals
Translate heterogeneous data from the Web of Linked Data
into a clean local target representation
Provide open-source software components for:
– Link Generation
– Vocabulary Mapping
– Linked Data quality assessment
– Linked Data Fusion
WP4 in the LOD Stack
Task 4.1: Semi-Automatic Data Interlinking
Partners: ULEI, NUIG, FUB, KAIST
Goals:
– Develop a Linking Assist, which guides the knowledge
engineer through the linking process (FUB, ULEI).
– (New) Provide a platform for automatic linking with Korean,
Chinese, Japanese RDF resources (KAIST).
Task 4.1: Progress
First Linking Assist/Silk Workbench (D4.1.1) has been
delivered in February 2012
– Define Data Sources (e.g. SPARQL endpoint, RDF dump)
– Specify the types of resources which should be interlinked
– Build linkage rules supported by machine learning
– Evaluate the quality of linkage rules
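The last two steps can be sketched as follows. This is a minimal illustration, not the Silk Workbench API: it assumes a linkage rule is a single similarity threshold on labels and that a set of known-correct reference links is available. All function names and the example data are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def linkage_rule(source, target, threshold=0.8):
    """Hypothetical rule: link two resources if their labels are similar enough."""
    return similarity(source["label"], target["label"]) >= threshold


def generate_links(sources, targets, threshold=0.8):
    """Apply the rule to every source/target pair and collect the matches."""
    return {
        (s["id"], t["id"])
        for s in sources
        for t in targets
        if linkage_rule(s, t, threshold)
    }


def evaluate(generated, reference):
    """Compare generated links against reference links."""
    true_positives = len(generated & reference)
    precision = true_positives / len(generated) if generated else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example data
sources = [{"id": "a1", "label": "Berlin"}, {"id": "a2", "label": "Vienna"}]
targets = [
    {"id": "b1", "label": "berlin"},
    {"id": "b2", "label": "Wien"},
    {"id": "b3", "label": "Vienna (city)"},
]
reference = {("a1", "b1"), ("a2", "b2")}

links = generate_links(sources, targets)
metrics = evaluate(links, reference)
print(links, metrics)
```

In this toy data the label-only rule misses the Vienna/Wien pair, which shows up as reduced recall; this is the kind of signal the evaluation step gives back to the rule author.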
Preliminary work on Korean Resource Linking Assist
– Transformed test datasets into RDF.
– This data will be an input to Korean resource linking module.
– Finished preliminary design of the Korean resource linking
module
Task 4.1: Improving Silk Workbench (1/2)
Use active learning to reduce the manual effort and
expertise required to interlink data sources
– Automating the generation of a linkage rule.
– The user only confirms or declines a set of example links.
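The confirm/decline loop can be illustrated with a small sketch. The slides do not describe the learner used in the Silk Workbench, so this example substitutes a much simpler technique: a single similarity threshold learned by uncertainty sampling. The function `active_learn_threshold`, the `oracle` callback standing in for the user, and the candidate scores are all assumptions made for illustration.

```python
def active_learn_threshold(candidates, oracle, rounds=3, start=0.5):
    """Learn a similarity threshold from a few confirmed/declined links.

    candidates: dict mapping a candidate link (pair) to its similarity score
    oracle:     callable standing in for the user; returns True to confirm
    """
    threshold = start
    labeled = {}
    for _ in range(rounds):
        unlabeled = [pair for pair in candidates if pair not in labeled]
        if not unlabeled:
            break
        # Uncertainty sampling: ask about the link closest to the boundary.
        query = min(unlabeled, key=lambda pair: abs(candidates[pair] - threshold))
        labeled[query] = oracle(query)
        confirmed = [candidates[p] for p, ok in labeled.items() if ok]
        declined = [candidates[p] for p, ok in labeled.items() if not ok]
        low = max(declined) if declined else 0.0
        high = min(confirmed) if confirmed else 1.0
        # Place the new boundary between declined and confirmed examples.
        threshold = (low + high) / 2
    return threshold, labeled


# Hypothetical candidate links with precomputed similarity scores
candidates = {
    ("a", "x"): 0.95,
    ("b", "y"): 0.70,
    ("c", "z"): 0.55,
    ("d", "w"): 0.30,
    ("e", "v"): 0.10,
}
true_links = {("a", "x"), ("b", "y")}

threshold, labeled = active_learn_threshold(
    candidates, oracle=lambda pair: pair in true_links
)
learned_links = {pair for pair, score in candidates.items() if score >= threshold}
print(threshold, learned_links)
```

Here three answers from the user are enough to separate the correct links from the incorrect ones, without the user writing any part of the rule.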
Task 4.1: Improving Silk Workbench (2/2)
Improving the usability based on user-feedback
First results for the Y2 review meeting
Final deliverable D4.1.2 (Second Linking Assist Release) in
February 2013
Task 4.2: Data Interlinking Environment
Partners: NUIG
Goals:
– To research and develop LATC well beyond 2012 into 2014
– Interlinking recommendations
– Interaction with data linkage validator from WP3
Progress:
– First version of Data Interlinking Environment (D4.2.1)
submitted in December 2011
– Combines Analytics Graph produced from Sindice data
sources and the Silk Link Discovery Framework
Task 4.2: Silk Workbench Extension
New Sindice datasource for the linking of datasets.
Dataset suggestion based on keywords, classes, and datasets
Autocompletion for data types when executing linking tasks.
A retrieval method for entity properties to also aid in the
execution of linking tasks.
Dataset suggestion
Task 4.3: Linked Data Quality Assessment
Partners: FUB, NUIG, ULEI, SWCG
Goals:
– Research into recent advances in quality assessment of
Linked Data
– Develop design metrics for quality assessment
– Release a Linked Data Quality Assessment Component
Task 4.3 Progress
Survey on the State of the Art in Mapping, Quality Assessment and
Data Fusion (D4.3.1) finished in February 2011
Conceptual Design and Implementation of Metrics (D4.3.2) finished
in February 2012
Released first prototype of Sieve, a Linked Data Quality Assessment
and Fusion framework
– Allows Web data to be filtered according to different data quality
assessment policies
– Provides for fusing Web data according to different conflict
resolution methods.
– http://sieve.wbsg.de
– D4.3.2: Release of the data quality assessment tool (August 2012)
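The two steps named above, quality-based filtering and conflict resolution, can be sketched as follows. This is not Sieve's configuration format or API: the recency-based score, the "keep the value from the best-scored source" fusion function, and all names and data values are assumptions chosen to keep the example small.

```python
from datetime import date


def recency_score(last_updated, today, max_age_days=365):
    """Hypothetical quality metric: 1.0 for fresh data, 0.0 after max_age_days."""
    age = (today - last_updated).days
    return max(0.0, 1.0 - age / max_age_days)


def filter_by_quality(statements, scores, min_score):
    """Drop statements whose source scores below the policy threshold."""
    return [s for s in statements if scores[s[0]] >= min_score]


def fuse_keep_best(statements, scores):
    """Resolve conflicts by keeping the value from the best-scored source."""
    best = {}
    for source, subject, prop, value in statements:
        key = (subject, prop)
        if key not in best or scores[source] > scores[best[key][0]]:
            best[key] = (source, value)
    return {key: value for key, (_, value) in best.items()}


# Hypothetical sources and statements: (source, subject, property, value)
today = date(2012, 3, 21)
last_updated = {
    "en": date(2012, 3, 1),
    "ko": date(2011, 9, 21),
    "old": date(2010, 1, 1),
}
scores = {name: recency_score(d, today) for name, d in last_updated.items()}

statements = [
    ("en", "Berlin", "population", 3500000),
    ("ko", "Berlin", "population", 3400000),
    ("old", "Berlin", "population", 3000000),
    ("ko", "Berlin", "areaKm2", 892),
]

kept = filter_by_quality(statements, scores, min_score=0.3)
fused = fuse_keep_best(kept, scores)
print(fused)
```

Swapping `recency_score` or `fuse_keep_best` for another function corresponds to choosing a different assessment policy or conflict resolution method.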
Task 4.4: Schema Mapping Publication and Discovery
Partners: FUB, ULEI, OGL, SWCG, UEP
Goals:
– Specification of the vocabulary mapping publication and
discovery language
– Implementation of the Vocabulary Mapping Component
Task 4.4 Progress
Specification of the Mapping Publication and Discovery Language
(D4.4.1) finished in June 2011
Implementation of the Mapping Publication and Discovery
Framework (D4.4.2) finished in February 2012.
– Adapted the R2R Framework based on the use cases in LOD2.
– Conducted various experiments to demonstrate the
performance and scaling behavior for translating data sets
(http://www.assembla.com/spaces/ldif/wiki/Benchmark)
– Implementation published under the terms of the Apache
License
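The idea of translating data between vocabularies can be sketched in a few lines. This is not the R2R mapping language; it is a simplified stand-in in which each mapping renames a property and optionally converts the value. The target terms `ex:label` and `ex:population`, the `MAPPINGS` table, and the sample triples are hypothetical.

```python
# Each mapping: source property -> (target property, value transformation)
MAPPINGS = {
    "foaf:name": ("ex:label", str),
    "dbo:populationTotal": ("ex:population", int),
}


def translate(triples, mappings):
    """Rewrite triples into the target vocabulary; unmapped properties are dropped."""
    translated = []
    for subject, prop, value in triples:
        if prop in mappings:
            target_prop, transform = mappings[prop]
            translated.append((subject, target_prop, transform(value)))
    return translated


# Hypothetical input in two source vocabularies
source_triples = [
    ("ex:Berlin", "foaf:name", "Berlin"),
    ("ex:Berlin", "dbo:populationTotal", "3500000"),
    ("ex:Berlin", "ex:unmapped", "ignored"),
]

target_triples = translate(source_triples, MAPPINGS)
print(target_triples)
```

A real mapping language also has to handle structural changes, not only renaming and value conversion; this sketch covers the simplest case only.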
Task 4.4: Future Work
Integration of the Mapping Publication and Discovery
Framework into the LOD2 stack (D4.4.3)
Deadline: February 2013
Task 4.4a: Schema Mapping Robust to Modeling Style
Partners: UEP
Goal: Extend the methods and tools of schema matching
discovery (from the original Task 4.4) by ontology
transformation methods implemented within the
(enhanced) PatOMat framework
Start: March 2012
First deliverable in December 2012
Task 4.5: Linked Data Fusion
Partners: FUB, ULEI
Goal:
– Build a Data Fusion Component which fuses data from
multiple sources
– Fuse multiple entities representing the same real-world object
into a single, consistent and clean representation
First deliverable:
– Initial release of Data Fusion Component (D4.5.1).
– Deadline: 31.08.12
– Integrating the data quality assessment module (Sieve)
developed in Task 4.3 with a data fusion module.
Task 4.5a: Multilingual Linked Data Fusion
Involved: KAIST, ULEI
Goal: Fusion of multilingual datasets
– DBpedia serves as the pivot multilingual dataset, since it is
extracted from Wikipedia editions in many languages
– First step: Bilingual fusion between the Korean DBpedia and the
English DBpedia
– Next: Include other languages such as Chinese and Japanese
First deliverable in February 2013: Korean Data Fusion Assistant
– The component will support Korean data fusion into English LOD
by combining Deliverable 4.5.1 with the fused dataset of English
and Korean DBpedia.
Task 4.6: Tools for Cleansing Entity Data and
Crowdsourcing of Cleansing
Involved: Zemanta
Goals:
– Adapt Google Refine for Linked Open Data based on the
existing DERI plugin
– Integrate crowdsourcing services such as Amazon Mechanical
Turk for LOD data cleansing.
Progress:
– D4.6.1 (M18) Release of an LOD-Enabled Version of Google
Refine submitted.
Next deliverable:
– D4.6.2 (M30) Release of Documentation and Software
Infrastructure for Using GR along with Amazon Mechanical
Turk
WP4 Summary (M12 - M18)
5 Deliverables submitted in the last 6 months:
ULEI and FUB submitted the First Linking Assist (D4.1.1)
NUIG submitted the first version of the Data Linking
Environment Release (D4.2.1)
FUB finished the Conceptual Design and Implementation of
Quality Assessment Metrics (D4.3.2)
FUB finished the Implementation of the Mapping Publication
and Discovery Framework (D4.4.2)
Zemanta submitted the first release of the LOD-enabled
version of Google Refine for review (D4.6.1)
Contact
Address
Freie Universität Berlin
School of Business & Economics
Web-based Systems Group
Garystr. 21
14195 Berlin
Germany
Presenter
Robert Isele
mail@robertisele.com