LOD2 Plenary Vienna 2012: WP3 - Knowledge Base Creation, Enrichment and Repair

State of Play presentation at the LOD2 Plenary Vienna 2012: WP3 - Knowledge Base Creation, Enrichment and Repair by Jens Lehmann of ULEI.

Published in: Education

LOD2 Plenary Vienna 2012: WP3 - Knowledge Base Creation, Enrichment and Repair

  1. Creating Knowledge out of Interlinked Data
     • LOD2 Plenary Vienna, 2012/03/21, http://lod2.eu
     • State-of-Play WP3: Knowledge Base Creation, Enrichment and Repair
     • Jens Lehmann, AKSW, Universität Leipzig
  2. WP3 High Level Objectives
     • 8 Tasks, 9 Partners, 14 Deliverables, 20+ tools → lightweight integration via LOD2 stack
     • [Diagram: Mutual Refinement Cycle (with optional Extraction phase). Labels: Extraction (Structured, Semi-structured, Unstructured), Enrichment, Repair, Refactoring, Inconsistency, Modelling Problems, Definitions, Property Axioms, Data Summary]
  3. WP3 Task 3.1
     • Provenance-Aware Extraction of Linked Data from Existing Structured Formats
     • Partners: FUB, ULEI, OpenLink, Exalead
     • Development and Support of RDB2RDF mapping standards (R2RML)
     • Re-Use of existing tools/frameworks:
        • D2R (FUB)
        • Triplify (ULEI)
        • Virtuoso Sponger and RDF Views (OpenLink)
     • New Tool: Sparqlify
     • Deliverables: State-of-the-Art Report (M6), D2R release (M20), Triplify release (M20)
  4. WP3 Task 3.1 – Progress / Planned
     ✔ D3.1.1: state of the art in knowledge extraction from structured sources
        • 200+ tools collected at http://data.lod2.eu/2011/tools/
        • http://en.wikipedia.org/wiki/Knowledge_extraction (2000 views/month)
     ✔ D3.1.2: D2R Server MetaData Extension (allows adding licencing and provenance output to D2R server)
     • D3.1.3: Sparqlify:
        • 1-1 SPARQL-to-SQL rewriting
        • DB planner has full control
        • Easy to configure
        • Tested on LinkedGeoData
        • Release in 1-2 months
     • [Figure: D2R Architecture]
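
Slide 4 describes Sparqlify as a 1-1 SPARQL-to-SQL rewriter that leaves planning to the database. The sketch below illustrates that general idea only; it is not Sparqlify's algorithm or configuration syntax. It rewrites a single triple pattern into one SQL query using a hand-written mapping. The table `city`, its columns, the example.org URIs, the sample rows and the function names are all hypothetical.

```python
import sqlite3

# Hypothetical mapping: each RDF property is backed by one column of one table,
# and subject URIs are built from the primary key with a URI template.
MAPPING = {
    "http://example.org/vocab/name": {
        "table": "city", "key": "id", "column": "name",
        "subject_template": "http://example.org/city/{}",
    },
    "http://example.org/vocab/population": {
        "table": "city", "key": "id", "column": "population",
        "subject_template": "http://example.org/city/{}",
    },
}


def rewrite_pattern(predicate):
    """Rewrite the triple pattern `?s <predicate> ?o` into a single SQL query."""
    m = MAPPING[predicate]
    # One pattern becomes exactly one SELECT; the database planner decides
    # how to execute it. NULL cells produce no triple, so they are filtered.
    return 'SELECT "{key}", "{column}" FROM "{table}" WHERE "{column}" IS NOT NULL'.format(**m)


def evaluate(conn, predicate):
    """Run the rewritten query and turn each row into a (subject URI, value) pair."""
    m = MAPPING[predicate]
    rows = conn.execute(rewrite_pattern(predicate)).fetchall()
    return [(m["subject_template"].format(key), value) for key, value in rows]


def demo():
    """Build a tiny in-memory database with made-up rows and query it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, population INTEGER)")
    conn.executemany("INSERT INTO city VALUES (?, ?, ?)",
                     [(1, "CityA", 100), (2, "CityB", None)])
    return evaluate(conn, "http://example.org/vocab/population")
```

Because the rewrite is a pure string transformation, the relational data never has to be materialised as RDF; only the query results are turned into URIs and values.
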
  5. WP3 Task 3.2
     • Provenance-Aware Extraction of Linked Data from Unstructured and Semi-Structured Sources (plain text, HTML, wikis, blogs)
     • Partners: FUB, ULEI, OpenLink, Exalead, Zemanta, KAIST, UEP
     • NLP techniques / text understanding
     • Draws on existing tools:
        • Stanford Parser, ASV toolkit, Ontos API (all external), Zemanta
        • DBpedia (FUB, ULEI, OpenLink)
     • Deliverables: NLP2RDF release (M8), DBpedia Live (M8), DBpedia Framework Extension (M27)
     • Other: DBpedia Spotlight Release, DBpedia I18n committee founded
  6. WP3 Task 3.2 – NLP2RDF + NIF
     • NLP Interchange Format (NIF) is an RDF/OWL-based format to combine and chain NLP tools
     • NLP2RDF (http://nlp2rdf.org) is a project providing:
        • Documentation and tutorials
        • Reference implementations of NIF
        • Collaboration and mailing lists
     • Roadmap of NIF in LOD2:
        • Integration of Zemanta API (Task 3.7)
        • BoA: tool for automated hypernym discovery and entity classification to ad hoc classes, using Wikipedia and WordNet
        • Ex: tool for information extraction from heterogeneous web resources
        • MultiLingual Extraction (Task 3.6)
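
To make the idea of an RDF-based interchange format for NLP output concrete, here is a minimal sketch that describes one substring of a document as RDF and prints it as N-Triples. The slides do not show any NIF syntax. The property names and the `#char=begin,end` URI scheme below follow the later NIF 2.0 core ontology and should be read as assumptions, not as what was presented in 2012. The document URI and the function names are hypothetical.

```python
# Assumption: NIF 2.0 core namespace; not taken from the slides.
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"


def uri(value):
    """Format a URI as an N-Triples term."""
    return "<{}>".format(value)


def literal(value, datatype=None):
    """Format a string as an N-Triples literal, escaping the basic special characters."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    if datatype:
        return '"{}"^^<{}>'.format(escaped, datatype)
    return '"{}"'.format(escaped)


def annotate(doc_uri, text, begin, end):
    """Return N-Triples lines describing text[begin:end] as a substring of the document."""
    context = "{}#char=0,{}".format(doc_uri, len(text))   # the whole document text
    span = "{}#char={},{}".format(doc_uri, begin, end)    # the annotated substring
    integer = XSD + "nonNegativeInteger"
    triples = [
        (context, RDF_TYPE, uri(NIF + "Context")),
        (context, NIF + "isString", literal(text)),
        (span, RDF_TYPE, uri(NIF + "String")),
        (span, NIF + "referenceContext", uri(context)),
        (span, NIF + "anchorOf", literal(text[begin:end])),
        (span, NIF + "beginIndex", literal(str(begin), integer)),
        (span, NIF + "endIndex", literal(str(end), integer)),
    ]
    return ["{} {} {} .".format(uri(s), uri(p), o) for s, p, o in triples]


if __name__ == "__main__":
    for line in annotate("http://example.org/doc1", "Leipzig is a city.", 0, 7):
        print(line)
```

Because every tool addresses substrings by the same offset-based URIs, the output of one tool (for example a tokenizer) can be merged with the output of another (for example an entity linker) simply by combining their triples.
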
  7. WP3 Task 3.2 – NLP2RDF + NIF
  8. WP3 Task 3.2 – DBpedia Live Motivation
     • Wikipedia 7th most popular website (according to alexa.com)
     • Covers a variety of disciplines
     • DBpedia (from FUB, ULEI, OpenLink):
        ☺ Extracts structured data from Wikipedia
        ☺ Interlinks with other knowledge bases
        ☺ Can answer complex queries
        ☺ Is used in many applications / companies
        ☹ Requires manual effort to create a release
        ☹ Data is often several months old
     • DBpedia Live Synchronisation with Wikipedia
  9. WP3 Task 3.2 – DBpedia Live Architecture
     • Works on live stream of updates provided by Wikipedia
     • Handles live changes of ontology and mappings (explained later)
     • Provides public endpoint at http://live.dbpedia.org/sparql and mirrors
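
A public SPARQL endpoint such as the one named on slide 9 can be queried over plain HTTP. The sketch below builds a standard SPARQL Protocol GET request with Python's standard library. The endpoint URL comes from the slide and may no longer be reachable; the example query and the function names are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request

# Endpoint as given on the slide; availability today is not guaranteed.
ENDPOINT = "http://live.dbpedia.org/sparql"

# Illustrative query: labels of one DBpedia resource.
QUERY = (
    "SELECT ?label WHERE { "
    "<http://dbpedia.org/resource/Leipzig> "
    "<http://www.w3.org/2000/01/rdf-schema#label> ?label } LIMIT 5"
)


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the request and return the list of result bindings (needs network access)."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["results"]["bindings"]
```

Calling `run_query(QUERY)` against a live endpoint returns one dictionary per result row; against a mirror, only the `endpoint` argument changes.
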
  10. WP3 Task 3.3
     • Knowledge Base Schema Enrichment
     • Partners: ULEI
     • Suggests OWL Schema Axioms to Knowledge Base Maintainers (Definitions, Super Classes, Disjointness, Domain, Range, …)
     • Extends DL-Learner (ULEI) machine learning framework
     • Tight coupling of Tasks 3.3 (Enrichment) and 3.4 (Repair):
        • Both will be integrated in the ORE tool
        • Iteration of Repair and Enrichment to improve quality
     • Adapts existing approaches to work with very large Linked Data knowledge bases (incl. SPARQL support)
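
As a rough illustration of what suggesting a schema axiom from instance data can look like, the sketch below proposes `rdfs:domain` candidates for a property by measuring how many of its subjects are instances of each class. This is a simple frequency heuristic chosen for illustration, not the DL-Learner algorithms, which the slides do not describe. All data, names and the `min_score` threshold are hypothetical.

```python
from collections import Counter


def suggest_domains(triples, types, prop, min_score=0.9):
    """Suggest domain classes for `prop`.

    triples: iterable of (subject, predicate, object)
    types:   dict mapping an instance to the set of classes it belongs to
    Returns (class, score) pairs, where score is the fraction of subjects
    of `prop` that are instances of the class, best first.
    """
    subjects = {s for s, p, _o in triples if p == prop}
    if not subjects:
        return []
    counts = Counter(cls for s in subjects for cls in types.get(s, ()))
    scored = [(cls, n / len(subjects)) for cls, n in counts.items()]
    accepted = [(cls, score) for cls, score in scored if score >= min_score]
    return sorted(accepted, key=lambda item: (-item[1], item[0]))


# Made-up example data.
TRIPLES = [
    ("a", "birthPlace", "x"),
    ("b", "birthPlace", "y"),
    ("c", "birthPlace", "z"),
    ("d", "capital", "x"),
]
TYPES = {"a": {"Person", "Agent"}, "b": {"Person", "Agent"}, "c": {"Agent"}}
```

With the default threshold only `Agent` is suggested for `birthPlace`, since every subject is an `Agent` but only two of three are typed as `Person`. A maintainer would still review each suggestion before adding it to the schema.
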
  11. WP3 Task 3.3: Learning Schema Axioms
     • Deliverables: D3.3.1 Enrichment Algorithms (M12), D3.3.2 Enrichment User Interfaces (M24), D3.3.3 Evaluation (M36)
  12. WP3 Task 3.4
     • Knowledge Base Repair
     • Partners: ULEI, NUIG
     • Fix inconsistent knowledge bases, unsatisfiable classes, (some) modelling errors, (some) reasoning performance problems
     • Draws on a lot of existing work in ontology debugging and extends it to knowledge bases in the LOD cloud
     • Related to Task 4.3 (Linked Data Quality Assessment)
     • Result: ORE tool (together with Task 3.3)
     • Deliverables: Report on Modelling Errors/Problems (M6), 1st ORE Release (M28), 2nd ORE Release (M40)
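
One of the problems named on slide 12, unsatisfiable classes, can be shown on a very small fragment of OWL: a named class that is (transitively) a subclass of two classes declared disjoint can have no instances. The sketch below checks only that case and is not how ORE or a full reasoner works. The class names and function names are hypothetical.

```python
def superclasses(cls, subclass_of):
    """Return cls plus every class reachable from it via subclass edges."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unsatisfiable_classes(subclass_of, disjoint_pairs):
    """Map each unsatisfiable class to the disjointness axioms it violates.

    subclass_of:    dict mapping a class to its direct superclasses
    disjoint_pairs: list of (class_a, class_b) declared disjoint
    """
    result = {}
    for cls in subclass_of:
        ancestors = superclasses(cls, subclass_of)
        clashes = [(a, b) for a, b in disjoint_pairs if a in ancestors and b in ancestors]
        if clashes:
            result[cls] = clashes
    return result


# Made-up example schema.
SUBCLASS_OF = {
    "Person": ["Agent"],
    "Organisation": ["Agent"],
    "Band": ["Person", "Organisation"],
    "RockBand": ["Band"],
}
DISJOINT = [("Person", "Organisation")]
```

Here `Band` and `RockBand` are reported. The returned clashes point at the axioms a maintainer could remove or weaken, for example the second superclass of `Band` or the disjointness axiom itself.
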
  13. WP3 Task 3.4 – Progress
     • ORE (ontology repair and enrichment) tool started:
        • Code: http://code.google.com/p/ore
        • General Information: http://ore-tool.net
        • Web Prototype: http://web.ore-tool.net (preliminary)
        • Included in LOD2 stack
     ✔ Deliverable 3.4.1 (State of the Art on Modelling Problems) completed:
        • Comprehensive overview of modelling problems, syntactic and semantic errors
        • One of the conclusions: many tools available but scalability still an issue
        • ORE will focus on fragment extraction, incremental reasoning, high reuse of existing tools and libraries
     • Work on algorithms to support debugging of SPARQL endpoints
  14. WP3 Task 3.4a
     • Knowledge base repair/refactoring based on naming/content patterns
     • Partners: UEP
     • Started February 2012 as extension to T3.4 (Knowledge base repair from logical point of view)
     • Long-term goal is to bring the outcomes of state-of-the-art ontology patterns research to the LOD2 Stack
     • Result: a component for ORE that allows detecting taxonomic naming → discussion in breakout session
     • (Anti-)patterns and suggested repairs will be developed until M24
     • Long term, prominent linked data vocabularies will be analyzed and mapped onto ontology (content) design patterns
     • Will lead to improvement in ontology repair and enrichment (WP3) as well as in ontology matching & instance linking (WP4)
  15. WP3 Task 3.5
     • Web Linkage Validator
     • Partners: NUIG, Exalead
     • Companion tool for unsupervised interlinking of data on the Semantic Web
     • Dataset owners or authors use the tool by submitting their data for internal and external linkage analysis
     • Analytics will be used to recommend ways in which they may improve the linkage of their data, e.g. adding further properties, using more specific property values, better specifying classes/properties
     • Deliverables: Initial Release (M18), LOD2 Stack Component Release (M28)
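
The slides do not say how the Web Linkage Validator computes its analytics. As a minimal sketch of what an internal versus external linkage count could look like, the code below classifies the object URI of each triple by host name. The host-based rule, the treatment of non-HTTP objects as literals, the sample triples and the function name are all assumptions.

```python
from collections import Counter
from urllib.parse import urlparse


def linkage_summary(triples, dataset_host):
    """Count links that stay inside the dataset and links that leave it.

    triples:      iterable of (subject, predicate, object) strings
    dataset_host: host name under which the dataset publishes its URIs
    Objects that are not http(s) URIs are treated as literals and skipped.
    """
    internal = 0
    external_hosts = Counter()
    for _subject, _predicate, obj in triples:
        if not obj.startswith(("http://", "https://")):
            continue
        host = urlparse(obj).netloc
        if host == dataset_host:
            internal += 1
        else:
            external_hosts[host] += 1
    return {
        "internal": internal,
        "external": sum(external_hosts.values()),
        "external_hosts": dict(external_hosts),
    }


# Made-up example data.
SAMPLE = [
    ("http://example.org/a", "http://example.org/vocab/knows", "http://example.org/b"),
    ("http://example.org/a", "http://www.w3.org/2002/07/owl#sameAs", "http://dbpedia.org/resource/A"),
    ("http://example.org/a", "http://example.org/vocab/name", "A"),
]
```

A dataset with many internal links but an empty `external_hosts` map would be a natural candidate for the kind of interlinking recommendations the slide mentions.
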
  16. WP3 Task 3.5
  17. WP3 Task 3.6
     • Multi-Lingual Provenance-Aware Linked Data Extraction
     • Partners: IMP
     • Information retrieval: find documents using appropriate keywords (e.g. search engines: Google, Yahoo!, Baidu, Bing, etc.)
     • Functionality not supported: find documents using a natural language document instead of using keywords
     • Possible applications: Patent search (patent attorneys); Case search (lawyers); Anamnesis search (physicians); Paper search (researchers)
     • The corresponding NLP technique will enable:
        • Processing of documents in multiple languages
        • Extraction of a vocabulary of concepts (words, phrases) specific for each class of documents
        • Representation of domain specific vocabularies and links to related documents (based on NIF format)
  18. WP3 Task 3.6
     • Re-Uses many LOD2 stack components
     • NLP technique for the structured representation of natural language documents:
        ✔ Representation of natural language documents in structured form (words, phrases, sentences, paragraphs, documents)
        • Multi-lingual support based on UTF-8 format (ongoing activity)
        • Creation of domain specific vocabularies based on classified documents (not started yet)
        • Searching for similar documents based on domain specific concepts found in given document (not started yet)
        • Sorting found documents according to similarity (not started yet)
     • Deliverables: D3.6 Multi-Lingual Support for Linked Data Extraction (M30)
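
The last two planned steps, finding documents similar to a given document and sorting them by similarity, can be sketched with a standard technique: cosine similarity over term counts. The slides do not specify the similarity measure, and this sketch uses plain words instead of the domain specific concepts the task aims for, so treat it as a stand-in. The corpus and function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Count lower-cased word tokens; \\w is Unicode-aware, so non-English text works too."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_similar(query_doc, corpus):
    """Return (name, score) pairs for every corpus document, most similar first."""
    query = term_vector(query_doc)
    scored = [(name, cosine(query, term_vector(text))) for name, text in corpus.items()]
    return sorted(scored, key=lambda item: -item[1])


# Made-up example corpus.
CORPUS = {
    "patent": "method for extracting linked data from text",
    "recipe": "bake the cake for one hour",
}
```

Calling `rank_similar("a method for linked data extraction from documents", CORPUS)` puts the `patent` entry first, because it shares far more terms with the query document than the `recipe` entry does.
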
  19. WP3 Task 3.7
     • Web Scale Link and Text Mining
     • Partners: ZEM
     • Gathering shallow semantic data about new entities: new knowledge about popular topics (not yet curated in LOD)
     • Contributes to WP3 by creating new LOD datasets
        • Extraction of new entities from blogs worldwide
        • Creation of lexicons for new entity types to be used in named entity extraction engines
     • Integration of new LOD datasets in Zemanta recommendation engine
        • Gain market advantage
        • Improved recommendations for bloggers and Zemanta free API users
     • Deliverables: D3.7.1 Shallow information extraction from blogs (M20), D3.7.2 Improved entity recommender engine (M36)
  20. Thanks for your attention!
     • Project: http://lod2.eu
     • Organisation: http://uni-leipzig.de, http://aksw.org
     • Presenter: http://jens-lehmann.org