Talking knowledge-graphs

Nürnberg 2018

Published in: Technology

  1. 1. Talking Knowledge Graphs Dieter Fensel with the help of the entire MindLab team STI Innsbruck, University of Innsbruck, Austria May 17, 2019
  2. 2. Prerequisite MindLab: • MindLab is a self-funded cooperative research project with the objective to develop methods and software tools for modeling and implementing scalability for knowledge graphs. • Partners
  3. 3. Talking Knowledge Graphs 1. Motivation 2. The Grand Challenges 3. The Crux Of The Matter 4. The Proof Of The Pudding Is In The Eating 5. Key Takeaway
  4. 4. 1. Motivation • Text/voice interaction is becoming mainstream • Use cases are still basic • Knowledge is power! Without knowledge -> no understanding of users' needs and goals • User: “Please, book a table in a restaurant with roast pork having reasonable prices in Mayrhofen for tonight” • Open questions for the assistant: Restaurant in Mayrhofen? Has roast pork? Price? • Assistant: “Sorry, I don’t know how to help you!” • Image: ©amazon.com
  5. 5. 1. Motivation • User: “Please, book a table in a restaurant with roast pork having reasonable prices in Mayrhofen for tonight” (Image: ©amazon.com) • Extracted knowledge (KG): action: TableReservation; type: Restaurant; offers: Roast Pork; location: Mayrhofen; price: price_level • Query generation, generated query: ?- tableReservationAction(), type(Restaurant), offers(RoastPork). • Predefined rules: ● tableReservationAction: book a table in a given Restaurant ● type: return all elements of type <type> ● offers: return all elements that offer <offer> ● ... • NLG: generated language output • The Knowledge Graph contains deep, accurate, and up-to-date knowledge about leisure services in Tyrol.
  6. 6. 2. The Grand Challenges User 1. understand Intent + Parameters 2. map Query 3. query Knowledge Graph 4. Natural Language Generation
  7. 7. 2. The Grand Challenges: Understand (NLU) • Voice/text recognition is already quite good • However, it requires significant manual labor • Manual work: • Design intents based on the schema of the Knowledge Graph • Define utterances (example questions) per intent • Mark parameters that should be extracted from utterances • Automation: • Entity detection: push entities from the Knowledge Graph • Detect unanswered questions • Use the Knowledge Graph to update/extend NLU: • create utterances • supervised learning: extend utterances with unanswered questions
  8. 8. 2. The Grand Challenges: Query Generation • Basis: intent detected and parameters extracted during NLU • Map the extracted information (intent & parameters) onto predefined rules • Query: combination of rules on SPARQL queries • Additional restriction rules • Define a view on a relevant subgraph of the Knowledge Graph -> a chatbot may not have access to the whole Knowledge Graph (this prevents frillions and inconsistencies, and implements access-right restrictions) • Diagram labels: Intent (with parameters), Predefined rules, Query generation, Generated query
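The mapping step above can be sketched as follows. This is a minimal illustration, not the MindLab implementation: the rule table, the schema.org property choices, and the function name `generate_query` are assumptions made for the example, and values are inserted without escaping.

```python
# Hypothetical rule table: each predefined rule maps to one SPARQL graph pattern.
# The patterns and property names are illustrative assumptions.
RULES = {
    "type": "?s a schema:{value} .",
    "offers": '?s schema:servesCuisine "{value}" .',
    "location": '?s schema:address/schema:addressLocality "{value}" .',
}


def generate_query(intent: str, parameters: dict[str, str]) -> str:
    """Combine the rules triggered by the extracted parameters into one SPARQL query."""
    patterns = []
    for name, value in parameters.items():
        if name not in RULES:
            raise KeyError(f"no predefined rule for parameter '{name}'")
        patterns.append(RULES[name].format(value=value))
    body = "\n  ".join(patterns)
    return (
        "PREFIX schema: <http://schema.org/>\n"
        f"# intent: {intent}\n"
        "SELECT ?s WHERE {\n  " + body + "\n}"
    )


query = generate_query(
    "TableReservation",
    {"type": "Restaurant", "offers": "Roast Pork", "location": "Mayrhofen"},
)
print(query)
```

Restricting the chatbot to a subgraph could then be expressed as one more pattern that every generated query receives, for example a named-graph clause.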
  9. 9. 2. The Grand Challenges: Querying the Knowledge Graph • The query is a combination of predefined rules accessing the knowledge through SPARQL • The Knowledge Graph must provide: • Large volumes of data • Integration from heterogeneous resources • Access to distributed sources • Dynamic updates (temperature, etc.) • Definition of subgraphs • Curation with regard to inconsistencies and incompleteness
  10. 10. 2. The Grand Challenges: Natural Language Generation • Manual work: define templates based on • the structure of the data • the information that should be given to the user • Automatic: generate • templates out of the Knowledge Graph • textual answers from the Knowledge Graph • follow-up questions to run dialogs
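A manually defined template of the kind described above might look like the following sketch. The template text, the placeholder names, and the fallback sentence are assumptions for illustration; "Example Inn" is a made-up entity.

```python
# Hypothetical template per result type; placeholders name the data fields to verbalize.
TEMPLATES = {
    "Restaurant": "{name} in {locality} serves {dish}. Shall I book a table?",
}

FALLBACK = "Sorry, I don't know how to help you!"


def render_answer(result_type: str, data: dict[str, str]) -> str:
    """Fill the template for the given type; fall back if the type or a field is missing."""
    template = TEMPLATES.get(result_type)
    if template is None:
        return FALLBACK
    try:
        return template.format(**data)
    except KeyError:
        # A placeholder has no value in the query result.
        return FALLBACK


answer = render_answer(
    "Restaurant",
    {"name": "Example Inn", "locality": "Mayrhofen", "dish": "roast pork"},
)
print(answer)
```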
  11. 11. 3. The Crux Of The Matter • The quality of an intelligent assistant depends directly on the quality of the Knowledge Graph • Problem: “Garbage in, garbage out” • Requirements for the Knowledge Graph: • well structured (using an ontology: schema.org) • accurate information (correctness) • large and detailed coverage (completeness) • timeliness of knowledge ==> Knowledge Graph Lifecycle
  12. 12. 3. The Crux Of The Matter: Process Model • Figure: process model with the stages Knowledge Creation, Knowledge Hosting, Knowledge Curation (Knowledge Assessment, Knowledge Cleaning, Knowledge Enrichment), and Knowledge Deployment
  13. 13. 3. The Crux Of The Matter: KG Task Model • Knowledge Graph Maintenance: • Knowledge Creation: Edit, Semi-automatic, Mapping, Automatic • Knowledge Hosting • Knowledge Curation: • Knowledge Assessment: Evaluation, Correctness, Completeness • Knowledge Cleaning: Error Detection, Error Correction • Knowledge Enrichment: Knowledge Source detection, Knowledge Source integration, Duplicate detection, Property-Value-Statements correction • Knowledge Deployment
  14. 14. 3. The Crux Of The Matter: KG Task Model (same task hierarchy as on slide 13) • MindLab Status Year 1
  24. 24. 3. The Crux Of The Matter: KG Task Model (same task hierarchy as on slide 13) • MindLab Status Year 2 (our dreams)
  25. 25. 3. The Crux Of The Matter: Knowledge Generation • Figure: Knowledge Graph lifecycle (Knowledge Creation, Knowledge Hosting, Knowledge Curation, Knowledge Deployment); Knowledge Creation comprises Edit, Semi-automatic, Mapping, and Automatic
  26. 26. 3. The Crux Of The Matter: Knowledge Generation • https://www.schema.org/ • Started in 2011 by Bing, Google, Yahoo!, and Yandex to annotate websites. • Has become a de facto standard. • We use it for the website channel as well as for all other channels as a reference model for our semantic annotations. • However, we use value restrictions not as an inference mechanism but as integrity constraints. • We define domain-specific extensions (that also restrict the genericity of the entire schema.org).
  27. 27. 3. The Crux Of The Matter: Knowledge Generation • The use of semantic annotations has experienced a tremendous surge in activity since the introduction of schema.org. • Schema.org was introduced with 297 classes and 187 relations, • which over time have grown to 598 types, 862 properties, and 114 enumeration values. • The provided corpus of • types (e.g. LocalBusiness, SkiResort, Restaurant), • properties (e.g. name, description, address), • range restrictions (e.g. Text, URL, PostalAddress), • and enumeration values (e.g. DayOfWeek, EventStatusType, ItemAvailability) covers a large number of different domains, including the tourism domain.
  28. 28. 3. The Crux Of The Matter: Knowledge Generation (figure only)
  29. 29. 3. The Crux Of The Matter: Knowledge Generation • Domain Specifications: • restrict the generality and • extend the domain-specificity of schema.org • are based on SHACL • https://schema-tourism.sti2.org/ • Diagram labels: Schema.org, Domain, Domain Specification
  30. 30. 3. The Crux Of The Matter: Knowledge Generation Our Methodology: • the bottom-up part, which describes the steps of the initial annotation process; • the domain specification modeling; and • the top-down part, which applies the constructed models.
  31. 31. 3. The Crux Of The Matter: Knowledge Generation • Manual: Annotation Editor
  32. 32. 3. The Crux Of The Matter: Knowledge Generation • Semi-automatic • The Annotation Editor suggests mappings/extracted information • e.g. extract information from web pages (by HTML tags). • Use partial NLU to find similarities between the content and the schema.org vocabulary. • Manual adaptations are needed to define and to evaluate. • Instance of the general issues of wrapper generation.
  33. 33. 3. The Crux Of The Matter: Knowledge Generation • Mapping (more than 95% of the story) • integrate large and fast changing data sets • map different formats to the ontology used in our Knowledge Graph • Various frameworks: XLWrap, Mapping Master (M2), a generic XMLtoRDF tool providing a mapping document (XML document) that has a link between an XML Schema and an OWL ontology, Tripliser, GRDDL, R2RML, RML, ... • We developed a customization of RML, called RocketRML. • The semantify.it platform features a wrapper API where these mappings can be stored and applied to corresponding data sources. • The wrapper translates the data according to the mappings and stores it as JSON-LD in a MongoDB.
  34. 34. 3. The Crux Of The Matter: Knowledge Generation • Automatic extraction of knowledge from text representations and web pages • Tasks: • named entity recognition, • concept mining, text mining, • relation detection, … • Methods: • Information Extraction • Natural Language Processing (NLP) • Machine Learning (ML) • Systems: • GATE (text analysis & language processing) • OpenNLP (supports most common NLP tasks) • RapidMiner (data preparation, machine learning, deep learning, text mining, predictive analysis)
  35. 35. 3. The Crux Of The Matter: Knowledge Generation Evaluation of semantic annotations: • The semantify.it validator is a web tool that validates schema.org annotations scraped from websites. • Verification: the annotations are checked against plain schema.org and against domain specifications. • Validation: the annotations are checked for whether they accurately describe the content of the website.
  36. 36. 3. The Crux Of The Matter: Knowledge Generation Evaluation of semantic annotations: • Notice that we take the content of the website as the gold standard. • We do NOT evaluate the accuracy of that content with regard to the “real” world. • We check whether a phone number conforms to the formal constraints. • We do not make robocalls to hotels to check whether the “right” hotel picks up the phone.
  37. 37. 3. The Crux Of The Matter: Knowledge Generation • Evaluation (figure only)
  38. 38. 3. The Crux Of The Matter: Knowledge Hosting • Semantify.it 1): a platform for creating, hosting, validating, verifying, and publishing schema.org annotated data • annotation of static data based on schema.org templates -> Domain Specifications 2) • annotation of dynamic data based on RML mappings -> RocketRML 3) • 1) https://semantify.it 2) http://ds.sti2.org 3) https://github.com/semantifyit/RocketRML
  39. 39. 3. The Crux Of The Matter: Knowledge Hosting • Figure labels: Annotation tool (e.g. semantify.it), Document store (e.g. MongoDB), Graph database (e.g. GraphDB), Hosting, ..., Semantic Web Annotations, Knowledge Graphs
  40. 40. 3. The Crux Of The Matter: Knowledge Hosting • Semantically annotated data can be serialized to JSON-LD • Storage in the document store MongoDB: • native JSON storage • well integrated into current state-of-the-art software with NodeJS • performant search through indexing • not hardware intensive • Drawback: no native RDF querying with SPARQL
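As an illustration of the JSON-LD serialization mentioned above, the following sketch builds a schema.org annotation as a plain dictionary and serializes it with the standard library. The entity "Example Inn" and its values are made up; a document store would receive the resulting document unchanged, which is why no RDF-specific tooling is needed at this stage.

```python
import json

# Made-up schema.org annotation for illustration; all values are placeholders.
annotation = {
    "@context": "https://schema.org",
    "@type": "Restaurant",
    "name": "Example Inn",
    "servesCuisine": "Roast Pork",
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Mayrhofen",
        "addressCountry": "AT",
    },
}

# JSON-LD is ordinary JSON, so the standard json module is sufficient to serialize it.
document = json.dumps(annotation, indent=2, ensure_ascii=False)
restored = json.loads(document)
print(document)
```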
  41. 41. 3. The Crux Of The Matter: Knowledge Hosting • Native storage of semantically annotated data • RDF store: GraphDB • very powerful CRUD operations • named graphs for versioning • full implementation of SPARQL • powerful reasoning over big data sets • Drawbacks: • no web frameworks available • very hardware intensive
  42. 42. 3. The Crux Of The Matter: Knowledge Curation • We defined a simple KR formalism formalizing the essentials of schema.org • Tbox: isA statements of types, domain and range definitions for properties (using them globally or locally) • Abox: isElementOf(i,t) statements, Property-Value Statements p(i1,i2), and sameAs(i1,i2) statements • Enables a formal definition of the knowledge curation task (assessment, cleaning, and enrichment).
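One possible way to hold such a Tbox and Abox in memory is sketched below. The class and field names are assumptions for illustration; only the kinds of statements (isA, domain, range, isElementOf, property-value, sameAs) come from the slide. Ranges are keyed by (property, domain type) to allow the local definitions mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class TBox:
    is_a: set[tuple[str, str]] = field(default_factory=set)  # (subtype, supertype)
    domain_of: dict[str, set[str]] = field(default_factory=dict)  # property -> domain types
    range_of: dict[tuple[str, str], set[str]] = field(default_factory=dict)  # (property, domain) -> range types

    def supertypes(self, t: str) -> set[str]:
        """Return t together with all of its transitive supertypes."""
        seen, todo = {t}, [t]
        while todo:
            current = todo.pop()
            for sub, sup in self.is_a:
                if sub == current and sup not in seen:
                    seen.add(sup)
                    todo.append(sup)
        return seen


@dataclass
class ABox:
    is_element_of: set[tuple[str, str]] = field(default_factory=set)  # (instance, type)
    property_values: set[tuple[str, str, str]] = field(default_factory=set)  # (p, i1, i2)
    same_as: set[tuple[str, str]] = field(default_factory=set)  # (i1, i2)

    def types_of(self, i: str, tbox: TBox) -> set[str]:
        """All types of instance i, including those inherited through isA."""
        result: set[str] = set()
        for instance, t in self.is_element_of:
            if instance == i:
                result |= tbox.supertypes(t)
        return result


tbox = TBox(is_a={("Restaurant", "FoodEstablishment"), ("FoodEstablishment", "LocalBusiness")})
abox = ABox(is_element_of={("inn1", "Restaurant")})
print(abox.types_of("inn1", tbox))
```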
  43. 43. 3. The Crux Of The Matter: Knowledge Assessment • Knowledge Assessment describes and defines the process of assessing the quality of a Knowledge Graph. • The goal is to measure the usefulness of a Knowledge Graph. • Evaluation • Overall process to determine the quality of a Knowledge Graph. • Select quality dimensions and metrics (see literature on data quality). • Evaluate representative subsets accordingly.
  44. 44. 3. The Crux Of The Matter: Knowledge Assessment • Correctness • Identify the number of wrong assertions • Completeness • Identify missing assertion sets • Further dimensions: accessibility, accuracy, appropriate amount, believability, completeness, concise representation, consistent representation, cost-effectiveness, ease of manipulation, ease of operation, ease of understanding, flexibility, free-of-error, interpretability, objectivity, relevancy, reputation, security, timeliness, traceability, understandability, value-added, and variety
  45. 45. 3. The Crux Of The Matter: Knowledge Assessment [Paulheim et al., 2019] identify the following subtasks: • specifying datasets and Knowledge Graphs, • specifying the evaluation protocol, • specifying the evaluation metrics, • specifying the task for task-specific evaluation, • and defining the setting in terms of intrinsic vs. task-based, and automatic vs. human-centric evaluation, • as well as the need to keep the results reproducible. [Paulheim et al., 2019] H. Paulheim, M. Sabou, M. Cochez, and W. Beek: Evaluation of Knowledge Graphs. In P. A. Bonatti, S. Decker, A. Polleres, and V. Presutti: Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web, Dagstuhl Reports, 8(9):29-111, 2019.
  46. 46. 3. The Crux Of The Matter: Knowledge Assessment Methodologies • Total Data Quality Management (TDQM) [Wang, 1998] and Data Quality Assessment [Pipino et al., 2002] make it possible to identify important quality dimensions and their requirements from various perspectives. • Other methodologies define quality metrics that allow a semi-automatic assessment based on data integrity constraints, for example user-driven assessment [Zaveri et al., 2013] and test-driven assessment [Kontokostas et al., 2014], or a manual assessment by crowds and experts (crowdsourcing-driven assessment [Acosta et al., 2013]). • Besides that, there are quality assessment approaches that use statistical distributions for measuring the correctness of statements [Paulheim & Bizer, 2014], and SPARQL queries for the identification of functional dependency violations and missing values [Fürber & Hepp, 2010a] [Fürber & Hepp, 2010b].
  47. 47. 3. The Crux Of The Matter: Knowledge Assessment Tools and Methods: • LINK-QA • using network metrics • Luzzu (Linked Open Datasets) • thirty data quality metrics based on the Dataset Quality Ontology • Sieve • flexibly expressing quality assessment methods • fusion methods • SWIQA (Semantic Web Information Quality Assessment Framework) • data quality rules & quality scores for identifying wrong data • Validata • online tool for testing/validating RDF data against ShEx schemas
  48. 48. 3. The Crux Of The Matter: Knowledge Assessment Sieve: • Sieve for Data Quality Assessment [Mendes et al., 2012] is a framework that consists of two modules: • a Quality Assessment module and • a Data Fusion module • The Quality Assessment module involves four steps: 1. A Data Quality Indicator defines an aspect of a data set that may demonstrate its suitability for the intended use, for example meta-information about the creation of a data set, information about the provider, or ratings provided by the consumers. 2. Scoring Functions define the assessment of the quality indicator based on its quality dimension. Scoring functions range from simple comparisons, over set functions and aggregation functions, to more complex statistical functions, text analysis, or network analysis methods. 3. An Assessment Metric calculates the assessment score based on indicators and scoring functions. 4. An Aggregate Metric allows users to aggregate new metrics that can generate new assessment values. • http://sieve.wbsg.de/
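The four steps can be illustrated with a small sketch. It does not use Sieve's actual configuration or API; the indicator names (`last_updated`, `provider`), the linear recency function, and the weighted average are assumptions chosen to show how indicators, scoring functions, and an aggregate metric relate.

```python
from datetime import date


def recency_score(last_updated: date, today: date, max_age_days: int = 365) -> float:
    """Scoring function: 1.0 for data updated today, falling linearly to 0.0 at max_age_days."""
    age = (today - last_updated).days
    return min(1.0, max(0.0, 1.0 - age / max_age_days))


def provider_score(provider: str, trusted: set[str]) -> float:
    """Scoring function: simple set membership on the provider indicator."""
    return 1.0 if provider in trusted else 0.0


def aggregate(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Aggregate metric: weighted average of the individual assessment scores."""
    total = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total


# Indicators of a hypothetical data set.
indicators = {"last_updated": date(2019, 1, 1), "provider": "example-provider"}
scores = {
    "recency": recency_score(indicators["last_updated"], today=date(2019, 1, 1)),
    "provider": provider_score(indicators["provider"], trusted={"another-provider"}),
}
overall = aggregate(scores, {"recency": 1.0, "provider": 1.0})
print(scores, overall)
```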
  49. 49. 3. The Crux Of The Matter: Knowledge Cleaning • The goal of knowledge cleaning is to improve the correctness of a knowledge graph • Major objectives • error detection and • error correction of ● wrong instance assertions ● wrong property value assertions ● wrong equality assertions
  50. 50. 3. The Crux Of The Matter: Knowledge Cleaning • Figure labels: Tbox, Abox, Knowledge Curation
  51. 51. 3. The Crux Of The Matter: Knowledge Cleaning • Verification vs. validation: • Semantic Annotations: verification = check schema conformance and integrity constraints; validation = compare with web resource • Knowledge Graphs: verification = check schema conformance and integrity constraints; validation = compare with "real" world
  52. 52. 3. The Crux Of The Matter: Knowledge Cleaning Error correction of wrong instance assertions isElementOf(i,t): • i is not a proper instance identifier: delete assertion or correct i • t is not an existing type name: delete assertion or correct t • The instance assertion is (semantically) wrong: • delete assertion or find proper t • and do NOT: find a proper i (would neither scale nor make sense)
  53. 53. 3. The Crux Of The Matter: Knowledge Cleaning Error correction of wrong property value assertions p(i1,i2): • p is not a proper property name: delete assertion or correct p • i1 is not a proper instance identifier: delete assertion or correct i1 • i1 is not in any domain of p: delete assertion or add assertion isElementOf(i1,t) where t is a domain of p • i2 is not a proper instance identifier: delete assertion or correct i2 • i2 is not in the range of p for any domain of i1: • delete assertion or • add a proper isElementOf assertion for i1 that adds a domain for which i2 is an instance of the range of the property or • add a proper isElementOf assertion for i2 that turns it into an instance of a range of the property applied to a domain of p where i1 is an element • The property assertion is (semantically) wrong: delete assertion or correct it. In this case, you should most likely define a proper i2, or search for a better p, or search for a better i1.
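The syntactic checks listed above (everything except the semantic case, which needs external evidence) can be sketched as a detection function. The toy Tbox and Abox, the instance names, and the message wording are assumptions for illustration.

```python
# Hypothetical toy Tbox and Abox; all names are illustrative.
DOMAINS = {"servesCuisine": {"FoodEstablishment"}}  # property -> domain types
RANGES = {("servesCuisine", "FoodEstablishment"): {"Text"}}  # (property, domain) -> range types
TYPES = {  # instance -> types asserted via isElementOf
    "inn1": {"Restaurant", "FoodEstablishment"},
    "roastPork": {"Text"},
    "hotel1": {"Hotel"},
}


def check_property_assertion(p: str, i1: str, i2: str) -> list[str]:
    """Return the problems detected for p(i1, i2), following the checks listed above."""
    if p not in DOMAINS:
        return [f"{p} is not a proper property name"]
    problems = []
    if i1 not in TYPES:
        problems.append(f"{i1} is not a proper instance identifier")
    if i2 not in TYPES:
        problems.append(f"{i2} is not a proper instance identifier")
    if problems:
        return problems
    domains_of_i1 = TYPES[i1] & DOMAINS[p]
    if not domains_of_i1:
        return [f"{i1} is not in any domain of {p}"]
    # Ranges are local: collect those valid for the domains i1 actually belongs to.
    allowed = set().union(*(RANGES.get((p, d), set()) for d in domains_of_i1))
    if not (TYPES[i2] & allowed):
        problems.append(f"{i2} is not in the range of {p} for any domain of {i1}")
    return problems


print(check_property_assertion("servesCuisine", "inn1", "roastPork"))
print(check_property_assertion("servesCuisine", "hotel1", "roastPork"))
```

Each reported problem corresponds to one bullet above, so a correction step could choose between deleting the assertion and adding the missing isElementOf statement.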
  54. 54. 3. The Crux Of The Matter: Knowledge Cleaning Error correction of wrong equality assertions isSameAs(i1,i2): • i1 is not a proper instance identifier: delete assertion or correct i1 • i2 is not a proper instance identifier: delete assertion or correct i2 • The identity assertion is (semantically) wrong: delete assertion or replace it by a SKOS operator (which, however, does not come with operational semantics).
  56. 56. 3. The Crux Of The Matter: Knowledge Cleaning Methods & Tools: • HoloClean ● uses integrity constraints, ● external data, and ● quantitative statistics ● Steps: • separate input datasets into a noisy and a clean dataset • assign an uncertainty score to the values of the noisy dataset • compute the marginal probability for each value to be repaired • SDValidate ● uses statistical distribution functions ● three steps: • compute the relative predicate frequency for each statement • assign a confidence score to each statement selected in the first step • apply a confidence threshold • Similar steps for instance assertions.
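The general idea of a frequency-based confidence score with a threshold can be sketched as follows. This is a simplified stand-in, not the SDValidate algorithm itself: the confidence here is simply the share of statements with the same predicate whose object has the same type, and the data and threshold are made up.

```python
from collections import Counter, defaultdict


def low_confidence_statements(statements, type_of, threshold=0.2):
    """Flag (subject, predicate, object) statements whose object type is rare for the predicate.

    statements: list of (s, p, o) triples; type_of: dict mapping each object to its type.
    """
    # Distribution of object types per predicate.
    counts = defaultdict(Counter)
    for _, p, o in statements:
        counts[p][type_of[o]] += 1

    flagged = []
    for s, p, o in statements:
        total = sum(counts[p].values())
        confidence = counts[p][type_of[o]] / total
        if confidence < threshold:
            flagged.append(((s, p, o), confidence))
    return flagged


# Nine plausible statements and one suspicious one (a person as a location).
statements = [(f"hotel{i}", "locatedIn", "Mayrhofen") for i in range(9)]
statements.append(("hotel9", "locatedIn", "alice"))
type_of = {"Mayrhofen": "City", "alice": "Person"}

flagged = low_confidence_statements(statements, type_of)
print(flagged)
```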
  57. 57. 3. The Crux Of The Matter: Knowledge Cleaning Methods & Tools: • The LOD Laundromat [Beek et al., 2014] ● cleans Linked Open Data ● takes SPARQL endpoint/archived dataset as entry dataset ● guesses the serialisation format ● identifies syntax errors using a library while parsing RDF ● saves RDF data in canonical format [Beek et al., 2014] W. Beek, L. Rietveld, H. R. Bazoobandi, J. Wielemaker, and S. Schlobach: LOD Laundromat: A Uniform Way of Publishing Other People’s Dirty Data. In Proceedings of the 13th International Semantic Web Conference (ISWC2014), Springer, LNCS 8796, Riva del Garda, Italy, October 19-23, 2014.
  58. 58. 3. The Crux Of The Matter: Knowledge Cleaning Methods & Tools: • KATARA [Chu et al., 2015] ● identifies correct & incorrect data ● generates possible corrections for wrong data [Chu et al., 2015] X. Chu, J. Morcos, I. F. Ilyas, M. Ouzzani, P. Papotti, N. Tang, and Y. Ye: KATARA: reliable data cleaning with knowledge bases and crowdsourcing. In Proceedings of the 41st International Conference on Very Large Data Bases (PVLDB2015), VLDB Endowment, 8(12), 2015.
  59. 59. 3. The Crux Of The Matter: Knowledge Cleaning Methods & Tools: • SPIN [Fürber & Hepp, 2010b] ● SPARQL Constraint Language ● generates SPARQL query templates based on data quality problems: • inconsistency • lack of comprehensibility • heterogeneity • redundancy • Nowadays, SPIN has turned into SHACL, a language for validating RDF graphs. [Fürber & Hepp, 2010b] C. Fürber and M. Hepp: Using semantic web resources for data quality management. In Proceedings of the 17th International Conference on Knowledge Engineering and Management by the Masses (EKAW2010), Springer, LNCS 6317, Lisbon, Portugal, October 11-15, 2010.
  60. 60. 3. The Crux Of The Matter: Knowledge Enrichment • The goal of knowledge enrichment is to improve the completeness of a knowledge graph by adding new statements • The process of Knowledge Enrichment has four phases: • New Knowledge Source detection • New Knowledge Source integration • Duplicate detection and alignment • Property-Value-Statements correction
  61. 61. 3. The Crux Of The Matter Knowledge Enrichment • Knowledge source detection • search for additional sources of assertions for the KG • open sources • closed sources • Knowledge source integration • TBox: define mappings • ABox: integrate new assertions into the KG • Duplicate detection: identify and resolve duplicates • Property-value-statement correction: resolve invalid property statements, such as domain/range violations and multiple values for a unique property • also known in the data quality literature as contradicting or uncertain attribute value resolution.
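One of the invalid-statement cases named above, multiple values for a unique property, can be sketched in a few lines. This is a minimal illustration, not the method of any tool in the talk; the triples, the `ex:` names, and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical triples (subject, property, value); all names are illustrative only.
triples = [
    ("ex:HotelA", "ex:telephone", "+43 123"),
    ("ex:HotelA", "ex:starRating", "4"),
    ("ex:HotelA", "ex:starRating", "5"),
    ("ex:HotelB", "ex:starRating", "3"),
]


def find_unique_property_violations(triples, unique_properties):
    """Return {(subject, property): sorted values} for every unique
    property that carries more than one distinct value on a subject."""
    values = defaultdict(set)
    for subject, prop, value in triples:
        if prop in unique_properties:
            values[(subject, prop)].add(value)
    return {key: sorted(vals) for key, vals in values.items() if len(vals) > 1}


# Assumption: ex:starRating is declared unique (at most one value per subject).
violations = find_unique_property_violations(triples, {"ex:starRating"})
print(violations)
```

Detection is only the first half; choosing which of the conflicting values to keep is the fusion step covered by the property-value-statement correction tools.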
  62. 62. 3. The Crux Of The Matter Knowledge Enrichment Duplicate detection: https://www.cs.umd.edu/~getoor/Tutorials/ER_VLDB2012.pdf
  63. 63. 3. The Crux Of The Matter Knowledge Enrichment Methods and tools for duplicate detection and resolution: • Silk is a framework for entity linking. • It tackles three tasks: 1. link discovery, which defines similarity metrics to calculate a total similarity value for a pair of entities, 2. evaluation of the correctness and completeness of the generated links, and 3. a protocol for maintaining data that allows the source dataset and the target dataset to exchange generated link sets. http://silkframework.org/
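The idea of combining per-property similarity metrics into one total similarity value for a pair of entities can be sketched as a weighted average. This is not Silk's API or its linkage rule language; the weights, the threshold-free comparison, the entity records, and the use of `difflib` as the string metric are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def string_similarity(a, b):
    """Similarity in [0, 1] between two strings, case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def total_similarity(entity_a, entity_b, weights):
    """Weighted average of per-property similarities.
    A property missing on either side contributes a similarity of 0."""
    score = 0.0
    for prop, weight in weights.items():
        if prop in entity_a and prop in entity_b:
            score += weight * string_similarity(entity_a[prop], entity_b[prop])
    return score / sum(weights.values())


# Hypothetical entity descriptions from two datasets.
source = {"name": "Gasthof Post", "city": "Mayrhofen"}
candidate_1 = {"name": "Gasthof zur Post", "city": "Mayrhofen"}
candidate_2 = {"name": "Hotel Alpenblick", "city": "Seefeld"}

weights = {"name": 2.0, "city": 1.0}  # assumed weights, not taken from Silk
for candidate in (candidate_1, candidate_2):
    print(candidate["name"], round(total_similarity(source, candidate, weights), 2))
```

A link would then be generated for pairs whose total similarity exceeds a chosen threshold.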
  64. 64. 3. The Crux Of The Matter Knowledge Enrichment Methods and tools for duplicate detection and resolution: • Legato [Achichi et al., 2017] is a linking tool based on indexing techniques. • It implements the following steps: 1. Data cleaning, which filters out properties of the two input datasets that do not help the comparison. 2. Instance profiling, which creates instance profiles based on the Concise Bounded Description for the source. 3. Pre-matching, which applies indexing techniques (using TF-IDF values), filters such as tokenization and stop-word removal, and cosine similarity to preselect the entity links. 4. Link repairing, which validates each link produced by searching for a link to a target source. [Achichi et al., 2017] M. Achichi, Z. Bellahsene, and K. Todorov: Legato results for OAEI 2017. In Proceedings of the 12th International Workshop on Ontology Matching (OM2017) co-located with the 16th International Semantic Web Conference (ISWC2017). CEUR Workshops, vol. 2032, Vienna, Austria, October 21, 2017.
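The pre-matching step (tokenization, stop-word removal, TF-IDF weighting, cosine similarity) can be sketched with the standard library alone. This is not Legato's implementation; the stop-word list, the smoothed IDF formula, and the example instance profiles are assumptions.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "in", "of", "and"}  # assumed, tiny stop-word list


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in documents]
    n = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical instance profiles flattened to text.
profiles = [
    "Gasthof Post restaurant in Mayrhofen",
    "Restaurant Gasthof Post Mayrhofen Tyrol",
    "Ski school Seefeld",
]
vectors = tfidf_vectors(profiles)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

Pairs with a high cosine score would be preselected as candidate links and passed on to link repairing.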
  65. 65. 3. The Crux Of The Matter Knowledge Enrichment Methods and tools for duplicate detection and resolution: • SERIMI [Araujo et al., 2011] tries to match instances between two datasets. • It has three steps: • property selection, which allows users to select relevant properties from the source dataset, • candidate selection, which uses string matching on these properties to select a set of candidates from the target dataset, and • candidate disambiguation, which measures the similarity of each candidate by applying a contrast model that returns a degree of confidence. • ADEL, Duke, Dedupe, LIMES, ... [Araujo et al., 2011] S. Araujo, J. Hidders, D. Schwabe, and A. P. de Vries: SERIMI - Resource Description Similarity, RDF Instance Matching and Interlinking. In Proceedings of the 6th International Workshop on Ontology Matching (OM2011), CEUR Workshop, vol. 814, Bonn, Germany, October 24, 2011.
  66. 66. 3. The Crux Of The Matter Knowledge Enrichment Property-Value-Statements correction: • KnoFuss allows data fusion using different methods. • The workflow of KnoFuss is as follows: 1. It receives a dataset to be integrated into the target dataset. 2. It performs co-referencing using a similarity method, detects conflicts utilizing ontological constraints, and resolves inconsistencies. 3. It produces a dataset to be integrated into the target dataset. • http://technologies.kmi.open.ac.uk/knofuss/
  67. 67. 3. The Crux Of The Matter Knowledge Enrichment Property-Value-Statements correction: • ODCleanStore [Michelfeit & Nečaský, 2012] is a framework for cleaning, linking, quality assessment, and fusing RDF data. • The fusion module allows users to configure conflict resolution strategies based on provenance and quality metadata, e.g.: 1. a single value is selected from the conflicting values: an arbitrary one (ANY), or the MIN, MAX, SHORTEST, or LONGEST, 2. an aggregate of the conflicting values is computed (AVG, MEDIAN, CONCAT), 3. the value with the highest (BEST) aggregate quality is selected, 4. the value with the newest (LATEST) time is selected, and 5. ALL input values are preserved.
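The conflict resolution strategies listed above can be read as a small dispatch over the conflicting values and their metadata. The sketch below only borrows the strategy names from the slide; the function, the record layout (`value`, `quality`, `timestamp`), and the sample data are hypothetical and do not reflect ODCleanStore's actual configuration or API.

```python
from statistics import mean, median


def resolve_conflict(candidates, strategy):
    """Resolve conflicting values for one property.
    candidates: list of dicts with keys 'value', 'quality', 'timestamp'."""
    values = [c["value"] for c in candidates]
    if strategy == "ANY":
        return values[0]  # any value is acceptable; take the first
    if strategy == "MIN":
        return min(values)
    if strategy == "MAX":
        return max(values)
    if strategy == "SHORTEST":
        return min(values, key=lambda v: len(str(v)))
    if strategy == "LONGEST":
        return max(values, key=lambda v: len(str(v)))
    if strategy == "AVG":
        return mean(values)
    if strategy == "MEDIAN":
        return median(values)
    if strategy == "CONCAT":
        return "; ".join(str(v) for v in values)
    if strategy == "BEST":
        return max(candidates, key=lambda c: c["quality"])["value"]
    if strategy == "LATEST":
        return max(candidates, key=lambda c: c["timestamp"])["value"]
    if strategy == "ALL":
        return list(values)
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical conflicting star ratings from three sources (ISO dates sort as strings).
candidates = [
    {"value": 4, "quality": 0.9, "timestamp": "2018-01-01"},
    {"value": 5, "quality": 0.6, "timestamp": "2018-11-25"},
    {"value": 3, "quality": 0.7, "timestamp": "2017-06-30"},
]
for strategy in ("BEST", "LATEST", "MEDIAN", "ALL"):
    print(strategy, resolve_conflict(candidates, strategy))
```

Which strategy fits depends on the property: LATEST suits fast-changing values such as prices, while BEST suits values where source reliability matters more than recency.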
  68. 68. 3. The Crux Of The Matter Knowledge Enrichment Property-Value-Statements correction: • Sieve [Mendes et al., 2012] is a framework that consists of two modules: a Quality Assessment module and a Data Fusion module. • The Data Fusion module describes various fusion policies that are applied for fusing conflicting values. • FAG, FuSem, MumMer, …
  69. 69. 3. The Crux Of The Matter Knowledge Deployment • Building, implementing, and curating Knowledge Graphs is a time-consuming and costly activity. • Integrating large amounts of facts from heterogeneous information sources does not come for free. • [Paulheim, 2018b] estimates the average cost of one fact in a Knowledge Graph at between $0.1 and $6, depending on the degree of automation. [Paulheim, 2018b] H. Paulheim: How much is a Triple? Estimating the Cost of Knowledge Graph Creation. In ISWC-P&D-Industry-BlueSky 2018: Proceedings of the ISWC 2018 Posters & Demonstrations, Industry and Blue Sky Ideas Tracks co-located with 17th International Semantic Web Conference (ISWC 2018), Monterey, USA, October 8-12, 2018. http://www.heikopaulheim.com/docs/iswc_bluesky_cost2018.pdf
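To make the per-fact estimate concrete, the following sketch multiplies the two bounds quoted on the slide by a graph size. The graph size of one million facts is a hypothetical example, and the function name is illustrative; only the $0.1 and $6 bounds come from the slide.

```python
from decimal import Decimal

# Per-fact cost bounds as quoted on the slide from [Paulheim, 2018b].
COST_PER_FACT_LOW = Decimal("0.1")  # highly automated creation
COST_PER_FACT_HIGH = Decimal("6")   # largely manual creation


def estimate_cost_range(number_of_facts):
    """Return (low, high) total cost in USD for a graph of the given size."""
    return (number_of_facts * COST_PER_FACT_LOW,
            number_of_facts * COST_PER_FACT_HIGH)


# Hypothetical graph size, chosen only for illustration.
low, high = estimate_cost_range(1_000_000)
print(f"${low:,.0f} to ${high:,.0f}")
```

The factor of 60 between the two bounds is the practical argument for automating as much of the lifecycle as possible.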
  70. 70. 3. The Crux Of The Matter Knowledge Deployment
  Name                        Instances      Facts             Types      Relations
  DBpedia (English)           4,806,150      176,043,129       735        2,813
  YAGO                        4,595,906      25,946,870        488,469    77
  Freebase                    49,947,845     3,041,722,635     26,507     37,781
  Wikidata                    15,602,060     65,993,797        23,157     1,673
  NELL                        2,006,896      432,845           285        425
  OpenCyc                     118,499        2,413,894         45,153     18,526
  Google's Knowledge Graph    570,000,000    18,000,000,000    1,500      35,000
  Google's Knowledge Vault    45,000,000     271,000,000       1,100      4,469
  Yahoo! Knowledge Graph      3,443,743      1,391,054,990     250        800
  71. 71. 3. The Crux Of The Matter Knowledge Deployment • We build a knowledge access layer on top of the Knowledge Graph that helps to connect this resource to applications. • Knowledge management technology: • graph-based repositories host the Knowledge Graph (as a semantic data lake). • The knowledge management layer is responsible for storing, managing, and providing semantic descriptions of resources. • Inference engines (SemBase) based on deductive reasoning engines: • implement agents that define views on this graph together with context data on user requests. • Each agent accesses the graph to obtain data for its reasoning, which provides input to the dialogue engine interacting with the human user. • Reasons: • They help to implement access rights and to cope with inconsistencies and frillions of statements. • They integrate additional information sources from the application (context, personalization, task, etc.).
  72. 72. 3. The Crux Of The Matter Knowledge Deployment Figure: deployment architecture. Input: MongoDB, Semantify.it (editing, crawling, mapping). Storage: GraphDB, hosting the Knowledge Graph. Output: views provided by reasoning agents.
  73. 73. 3. The Crux Of The Matter Knowledge Deployment Figure: layers from bottom to top: Knowledge Infrastructure, Generic Application Layer, Conversational Interfaces.
  74. 74. 4. The Proof Of The Pudding Is In The Eating Onlim • The pioneer in automating customer communication via AI chatbots and conversational interfaces • Enterprise solutions for making data and knowledge available for conversational interfaces • Team of 25+ highly experienced AI experts, specialists in semantics and data science • Spin-off of University of Innsbruck • HQ in Europe (Vienna, Telfs) • Current focus verticals: Utilities, Tourism, Retail, Education, Financial Services
  75. 75. 4. The Proof Of The Pudding Is In The Eating Onlim
  76. 76. 4. The Proof Of The Pudding Is In The Eating • The chatbot market is expected to grow from its 2018 value of more than $250 million to over $1.34 billion by 2024. • The growth is due to the evolving usage of chatbots for content marketing activities such as digital marketing and advertising. • With the rise of Artificial Intelligence (AI) and conversational user interfaces, we are more likely than ever to interact with a bot. • Businesses are following customers onto messaging platforms: 90% of businesses use Facebook to respond to service requests. • The shift from social towards conversational interfaces is also impressive. Bots on Facebook Messenger can help businesses tremendously in handling these service requests. • https://www.sdcexec.com/software-technology/news/21011880/chatbot-market-to-grow-at-31-percent-cagr-from-2018-to-2024 • https://www.gartner.com/smarterwithgartner/gartner-predicts-a-virtual-world-of-exponential-change/ • https://www.businessinsider.in/tech/data-a-massive-hidden-shift-is-driving-companies-to-use-a-i-bots-inside-facebook-messenger/slidelist/52240155.cms
  77. 77. 4. The Proof Of The Pudding Is In The Eating • In 2017, 20% of web searches were conducted via voice assistants. • Artificial intelligence-based voice assistance (AI-voice) will soon be a primary user interface for all digital devices, including smartphones, smart speakers, personal computers, automobiles, and home appliances. • As of mid-January 2019, more than 1 billion devices worldwide were equipped with Google's AI-voice Assistant, and another hundred million devices spoke with Amazon's Alexa. Neither number accounts for devices equipped with voice assistants from Apple, Microsoft, Samsung, or across the digital worlds of China and Asia. • Juniper Research forecasts the global market for voice assistants to grow at a 25.4 percent CAGR over the next five years, with 8 billion active voice assistants (across all platforms and devices) by 2023. https://voicebot.ai/2019/01/07/google-assistant-to-be-available-on-1-billion-devices-this-month-10x-more-than-alexa/ https://www.juniperresearch.com/press/press-releases/digital-voice-assistants-in-use-to-triple
  78. 78. 4. The Proof Of The Pudding Is In The Eating • Chatbots and Voice Assistants have started to play an increasing role in customer communication for many businesses in various verticals. • Especially in tourism, they are providing more and more benefits in terms of convenience, availability, and fast access to information and customer support throughout the entire customer journey. • In the dreaming and planning phase, hotels and Destination Management Organizations (DMOs) can use Chatbots and Voice Assistants to inform potential guests about the hotel and/or the region, the surroundings, and weather conditions. • In the booking phase, everything from booking the hotel and transport to buying connected services, e.g. ski tickets, becomes much simpler and more efficient through natural language. • Finally, in the experience phase, Chatbots and Voice Assistants can also announce special offers or events. All requested information and processes are available instantly, 24/7/365. For hotel guests in particular, the stay can be enriched by giving them access to hotel services and beyond.
  79. 79. 4. The Proof Of The Pudding Is In The Eating • A Touristic Knowledge Graph integrates and connects data from several sources, including touristic data sources and open data sources. • It includes entities of the following types: • LocalBusiness • POIs, Infrastructure • SportsActivityLocations (e.g. Trails, SkiResorts) • Events • Offers • WebCams • Mobility and Transport
  80. 80. 4. The Proof Of The Pudding Is In The Eating Figure: Touristic Knowledge Graph excerpt (SkiResort, Lifts, Slopes, WebCams), data visualisation based on GraphDB. Node labels: SkiResort, SkiRoute, CableCar, Slope, ChairLift, SkiLift, TBar, WebCam, SnowReport. Edge labels: subClassOf, containedInPlace.
  81. 81. 4. The Proof Of The Pudding Is In The Eating The Touristic KG is used to answer questions such as: • “Where can I have traditional Tyrolean food when going cross-country skiing?” • “Show me WebCams near Kölner Haus” • “How many people are living in Serfaus?”
  82. 82. 4. The Proof Of The Pudding Is In The Eating The Dach-KG working group • develops a de facto standard for semantic annotation of touristic content, data, and services in the DACH area • based on schema.org and its adaptation by domain specifications • it should become the backbone of an open 5* Knowledge Graph for touristic data in DACH *) A dataset is awarded one star if the data are provided under an open license, **) two stars if the data are available as structured data, ***) three stars if the data are also available in a non-proprietary format, ****) four stars if URIs are used so that the data can be referenced, and *****) five stars if the dataset is linked to other datasets that can provide context. https://www.tourismuszukunft.de/2019/05/dach-kg-neue-ergebnisse-naechste-schritte-beim-thema-open-data/
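The star scheme above is cumulative: each star presupposes the ones before it. A minimal sketch of that reading follows; the function and parameter names are hypothetical, and the cumulative interpretation is an assumption based on the wording "also available".

```python
def open_data_stars(open_license, structured, non_proprietary, uses_uris, linked):
    """Count stars cumulatively: stop at the first criterion that is not met."""
    criteria = (open_license, structured, non_proprietary, uses_uris, linked)
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# A dataset with an open license, structured data, and an open format,
# but without URIs for its entities, earns three stars.
print(open_data_stars(True, True, True, False, False))
```

Under this reading, a dataset without an open license earns no stars regardless of how well it is structured or linked.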
  83. 83. 4. The Proof Of The Pudding Is In The Eating Members of the Dach-KG working group • Touristic experts from the DACH region (Germany (D), Austria (A), Switzerland (CH)) and Italy (South Tyrol) • the Austrian and German touristic associations, • LTOs (Tirol, Vorarlberg, Wien, Brandenburg, Thüringen, …) • Associated: DMOs (Mayrhofen, Seefeld, …) • STI Innsbruck and STI International • An extension by technology providers is planned (Datacycle, Feratel, Hubermedia, infomax, LandinSicht, Onlim, Outdooractive, TSO, ...)
  84. 84. 4. The Proof Of The Pudding Is In The Eating We build the Tyrol Knowledge Graph (TKG) as a nucleus for this initiative • It is a five-star linked open dataset published in GraphDB, providing a SPARQL endpoint for the provisioning of touristic data of Tyrol, Austria. • The TKG currently contains data about touristic infrastructure such as accommodation businesses, restaurants, points of interest, events, recipes, etc. The data of the TKG fall into three categories: • Static data is information that rarely changes, like the address of a hotel. • Dynamic data is fast-changing information, like availabilities and prices. • Active data describes actions that can be executed, for example, the description of a purchase or a reservation. • As of November 25, 2018, the TKG contained around 5 billion statements, of which 55% are explicit and 45% are inferred. Every day the Knowledge Graph grows by around 8 million statements. • http://graphdb.sti2.at:8080/
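The TKG figures can be turned into a short worked calculation. The sketch below uses only the numbers quoted above; the assumption that growth stays constant at 8 million statements per day is ours, made purely to illustrate the scale, and the function name is hypothetical.

```python
# Figures quoted for November 25, 2018.
TOTAL_STATEMENTS = 5_000_000_000
EXPLICIT_PERCENT = 55
INFERRED_PERCENT = 45
DAILY_GROWTH = 8_000_000

explicit = TOTAL_STATEMENTS * EXPLICIT_PERCENT // 100  # 2.75 billion
inferred = TOTAL_STATEMENTS * INFERRED_PERCENT // 100  # 2.25 billion


def statements_after(days):
    """Projected size after the given number of days.
    Assumption: growth stays constant at DAILY_GROWTH per day."""
    return TOTAL_STATEMENTS + days * DAILY_GROWTH


# Days needed to add one billion statements at the quoted daily rate.
days_per_billion = 1_000_000_000 // DAILY_GROWTH

print(explicit, inferred, days_per_billion, statements_after(days_per_billion))
```

At the quoted rate the graph would gain roughly one billion statements every four months, which is why hosting and curation have to scale with the data.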
  85. 85. 4. The Proof Of The Pudding Is In The Eating There is a world beyond leisure: Utilities, Tourism, Retail, Financial Services, Education
  86. 86. 5. Key Takeaway Our aim: • Establish a maximally automated knowledge lifecycle: NLU training, query generation, querying and representing world knowledge, as well as Natural Language Generation • Automatically distribute knowledge into all available channels • At the core are methodologies, methods, and tools to generate, host, curate, deploy, and access Knowledge Graphs containing frillions of statements from heterogeneous, distributed, and dynamic sources. Images: ©amazon.com, ©google.com, ©slack.com, ©facebook.com, ...