Talking Knowledge Graphs (New York)

  1. Talking Knowledge Graphs
     Dieter Fensel, with the help of Kevin Angele & Ioan Toma
     New York, 7.5.2019
  2. Motivation
     • Text/voice interfaces are becoming mainstream
     • Use cases are still basic
     • Advanced use cases need knowledge
     • Without intents & knowledge there is no understanding of the user's needs and goals
     Example: "Please, book a table in a restaurant with roast pork having reasonable prices in Mayrhofen for tonight." The assistant cannot resolve "Restaurant in Mayrhofen? Has roast pork? Price?" and answers: "Sorry, I don't know how to help you!" (Image: ©amazon.com)
  3. Solution
     • Knowledge is provided to Intelligent Assistants (IAs)
     • IAs understand the user's needs & goals:
       • determine the intent
       • extract the parameters, i.e., find the relevant knowledge items in the Knowledge Graph
     • NLU technology transforms the user input into a more structured form (text/voice -> intent & parameters)
     • The Knowledge Graph contains deep, accurate, and up-to-date knowledge, e.g. about leisure services in Tyrol
     Example: "Please, book a table in a restaurant with roast pork having reasonable prices in Mayrhofen for tonight." (Image: ©amazon.com)
     Extracted knowledge: action: TableReservation; type: Restaurant; offers: Roast Pork; location: Mayrhofen; price: price_level
     Generated query: ?- tableReservationAction(), type(Restaurant), offers(RoastPork).
     Predefined rules:
       • tableReservationAction: book a table in a given Restaurant
       • type: return all elements of type <type>
       • offers: return all elements that offer <offer>
       • ...
     Natural Language Generation (NLG) then turns the query result into the generated language output.
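     A minimal sketch of this query-generation step, assuming the extracted intent and parameters arrive as a simple dictionary (all helper names are illustrative, not Onlim's actual API):

     # Output of the NLU step for the example above.
     extracted = {
         "action": "TableReservation",
         "type": "Restaurant",
         "offers": "RoastPork",
         "location": "Mayrhofen",
         "price": "price_level",
     }

     # Each predefined rule is rendered as one goal of the generated query.
     RULE_TEMPLATES = {
         "action": lambda v: "tableReservationAction()",
         "type": lambda v: f"type({v})",
         "offers": lambda v: f"offers({v})",
     }

     goals = [template(extracted[slot])
              for slot, template in RULE_TEMPLATES.items() if slot in extracted]
     print("?- " + ", ".join(goals) + ".")
     # ?- tableReservationAction(), type(Restaurant), offers(RoastPork).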
  4. Onlim & STI Innsbruck
     Onlim overview:
     • The pioneer in automating customer communication via AI chatbots and conversational interfaces
     • Enterprise solutions for making data and knowledge available for conversational interfaces
     • Team of 25+ highly experienced AI experts, specialists in semantics and data science
     • Spin-off of the University of Innsbruck
     • HQ in Europe (Vienna, Telfs); current focus verticals
     STI Innsbruck overview:
     • Research group at the University of Innsbruck in the Austrian state of Tyrol
     • Engaged in research and development to bring the information and communication technologies of the future into today's world
     • Team of 20+ semantic technology experts
     • Main research areas: ontologies, Semantic Web, Knowledge Graphs
     • More details at: https://www.sti-innsbruck.at
  5. Challenge - Automation
     Pipeline: user -> (1) understand -> intent + parameters -> (2) map -> query -> (3) query the Knowledge Graph -> (4) NLG
     1. Understand the information needs and goals of the users (Natural Language Understanding)
        a. design intents
        b. train the NLU (scaling)
        c. entity detection
     2. Map intent & parameters to create a query for accessing the KG
     3. Knowledge Graph
        a. integrate large volumes of heterogeneous, distributed, potentially inconsistent statements
     4. Natural Language Generation (NLG) to present the result to the user
  6. 1. Natural Language Understanding (NLU)
     • Voice/text recognition is already good
     • However, it still requires significant manual labour
     Manual work:
     • Design intents based on the schema of the Knowledge Graph
     • Define utterances (example questions) per intent
     • Mark parameters that should be extracted from the utterances
     Automation:
     • Entity detection
     • Push entities from the Knowledge Graph
     • Detect unanswered questions
     • Use the Knowledge Graph to update/extend the NLU
       • create utterances
       • supervised learning
       • extend utterances with unanswered questions
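     A sketch of what a manually designed intent and KG-driven entity detection could look like (the intent format and entity labels are illustrative, not the actual data model):

     # A manually designed intent with example utterances and marked parameters.
     intent = {
         "name": "TableReservation",
         "utterances": [
             "book a table in a restaurant with {dish} in {place} for tonight",
             "reserve a table at a restaurant in {place}",
         ],
         "parameters": ["dish", "place"],
     }

     # Entity labels pushed from the Knowledge Graph into the NLU component.
     kg_entities = {
         "place": {"Mayrhofen", "Serfaus", "Telfs"},
         "dish": {"roast pork", "Tyrolean dumplings"},
     }

     def detect_entities(text):
         """Naive gazetteer matching against the KG labels."""
         text_lower = text.lower()
         return {slot: label
                 for slot, labels in kg_entities.items()
                 for label in labels
                 if label.lower() in text_lower}

     print(detect_entities("Please book a table with roast pork in Mayrhofen"))
     # {'place': 'Mayrhofen', 'dish': 'roast pork'}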
  7. 2. Query generation
     • Basis: the intent detected and the parameters extracted during NLU
     • Map the extracted information (intent & parameters) onto predefined rules
     • Query: a combination of rules
     • Additional restriction rules
       • define a view on a relevant subgraph of the Knowledge Graph
       • a chatbot may not have access to the whole Knowledge Graph (to cope with trillions of statements, inconsistencies, and access right restrictions)
     Flow: intent (with parameters) + predefined rules -> query generation -> generated query
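     A small sketch of combining the intent rules with restriction rules that define the chatbot's view on the Knowledge Graph (the restriction predicates are made up for illustration):

     # Goals produced from the detected intent and parameters (see slide 3).
     intent_goals = ["tableReservationAction()", "type(Restaurant)", "offers(RoastPork)"]

     # Restriction rules limiting the chatbot to a relevant, permitted subgraph.
     view_restrictions = ["region(Tyrol)", "publiclyVisible(true)"]

     query = "?- " + ", ".join(intent_goals + view_restrictions) + "."
     print(query)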
  8. 3. Rule generation
     • A query is a combination of predefined rules
     • Generate rules out of the Knowledge Graph
       • manually
         • based on the ontology used (schema.org)
         • rules for all possible queries
       • semi-automatically
         • propose rules with the help of the Knowledge Graph
         • minimize manual adaptations
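     A sketch of the semi-automatic variant: propose one candidate rule per property that actually occurs in the Knowledge Graph and leave the final selection to a human (rule syntax and property names are illustrative):

     # Properties observed in the Knowledge Graph (schema.org property names as examples).
     observed_properties = ["servesCuisine", "priceRange", "address", "openingHours"]

     def propose_rules(properties):
         """One candidate query rule per property; a human reviews and adapts the proposals."""
         return [f"{p}(X, V) :- assertion(X, '{p}', V)." for p in properties]

     for rule in propose_rules(observed_properties):
         print(rule)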
  9. 4. Natural Language Generation (NLG)
     Manual work:
     • Define templates based on
       • the structure of the data
       • the information that should be given to the user
     Automatic:
     • Generate
       • templates out of the Knowledge Graph
       • textual answers from the Knowledge Graph
       • follow-up questions to run dialogs
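     A sketch of template-based answer generation and a simple follow-up question, assuming the query result comes back as a dictionary (field names and values are illustrative):

     template = "{name} in {location} offers {dish} at a {price} price level."
     result = {"name": "Gasthof Alpenrose", "location": "Mayrhofen",
               "dish": "roast pork", "price": "moderate"}
     print(template.format(**result))

     # A generated follow-up question keeps the dialog going when a slot is still missing.
     if "time" not in result:
         print("For what time should I book the table?")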
  10. Knowledge Graph
     The quality of the Intelligent Assistant depends directly on the quality of the Knowledge Graph. Problem: "garbage in, garbage out".
     Requirements for the Knowledge Graph:
     • well structured (using an ontology - schema.org)
     • homogeneous structure/models
     • accurate information (correctness)
     • large and detailed coverage (completeness)
     => "Knowledge Graph Lifecycle"
  11. 3. Knowledge Graph - Process Model
     Lifecycle stages: Knowledge Creation -> Knowledge Hosting -> Knowledge Curation (Knowledge Assessment, Knowledge Cleaning with error detection and error correction, Knowledge Enrichment) -> Knowledge Deployment
  12. 3. Knowledge Graph - Task Model
     • Knowledge Creation
     • Knowledge Hosting
     • Knowledge Curation
       • Knowledge Assessment: evaluation, correctness, completeness
       • Knowledge Cleaning: error detection, error correction
       • Knowledge Enrichment: knowledge source detection, knowledge source integration, duplicate detection, property-value-statements correction
     • Knowledge Deployment
  13. Knowledge Creation: Methods and Tools
     schema.org
     • to annotate websites
     • as ontology for the KG (local properties, IC versus EI)
     Knowledge Creation
     • manual
       • using an Annotation Editor
       • based on Domain Specifications that restrict and extend schema.org
     • semi-automatic
       • the Annotation Editor suggests mappings/extracted information
       • e.g. extract information from web pages (by HTML tags)
       • manual adaptations needed
     • mapping
       • integrate large & fast-changing data sets
       • map different formats to the ontology used in the Knowledge Graph
     • automatic
       • natural language processing (NLP)
       • machine learning (ML)
       • extract knowledge from text representations & web pages
       • named entity recognition, concept mining, text mining, ...
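     As an example of the manual creation path, a schema.org annotation such as an annotation editor could produce (the concrete restaurant data is made up):

     import json

     # A schema.org annotation of a (fictional) restaurant, serialized as JSON-LD.
     annotation = {
         "@context": "https://schema.org",
         "@type": "Restaurant",
         "name": "Gasthof Alpenrose",
         "address": {"@type": "PostalAddress", "addressLocality": "Mayrhofen"},
         "servesCuisine": "Tyrolean",
         "priceRange": "$$",
     }
     print(json.dumps(annotation, indent=2))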
  14. Knowledge Creation - Tools & Libraries
     • GATE (text analysis & language processing)
     • OpenNLP (supports most common NLP tasks)
     • RapidMiner (data preparation, machine learning, deep learning, text mining, predictive analysis)
  15. Knowledge Hosting
     • Annotation tool (e.g. semantify.it): edit data, crawl data, map data
     • Document store (e.g. MongoDB): hosting semantic web annotations
     • Graph database (e.g. GraphDB): hosting the Knowledge Graph
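     A sketch of loading such annotations into the graph database, here via GraphDB's RDF4J-style REST interface (the endpoint URL and repository name are assumptions for illustration, not the actual deployment):

     import urllib.request

     # Turtle data derived from the annotations above.
     turtle = b"""
     @prefix schema: <https://schema.org/> .
     <https://example.org/alpenrose> a schema:Restaurant ;
         schema:name "Gasthof Alpenrose" .
     """

     # RDF4J-style statements endpoint of an assumed local GraphDB repository "tourism".
     req = urllib.request.Request(
         "http://localhost:7200/repositories/tourism/statements",
         data=turtle,
         headers={"Content-Type": "text/turtle"},
         method="POST",
     )
     with urllib.request.urlopen(req) as response:
         print(response.status)  # 204 No Content on success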
  16. Knowledge Curation - Assessment
     Evaluation
     • Overall process to determine the quality of a KG
     • Select quality dimensions and metrics
     • Evaluate representative subsets
     Correctness
     • Identify the amount of wrong assertions
     Completeness
     • Identify the amount of missing assertions
  17. Knowledge Curation - Assessment
     Methods
     • semi-automatic, based on data integrity constraints
       • user-driven assessment
       • test-driven assessment
     • manual assessment based on crowds or experts
     • statistical distribution
     • SPARQL queries
       • identification of functional dependency violations & missing values
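     A sketch of a SPARQL-based missing-value check with rdflib, run against a representative subset of the Knowledge Graph (the file name is illustrative):

     from rdflib import Graph

     g = Graph()
     g.parse("tourism_kg_sample.ttl", format="turtle")  # representative subset

     # Completeness check: restaurants without an address (a "missing values" assessment).
     query = """
     PREFIX schema: <https://schema.org/>
     SELECT ?r WHERE {
       ?r a schema:Restaurant .
       FILTER NOT EXISTS { ?r schema:address ?a }
     }
     """
     missing = list(g.query(query))
     print(len(missing), "restaurants without an address")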
  18. Knowledge Curation - Assessment Tools
     • WIQA (Web Information Quality Assessment Framework)
       • filtering policies to evaluate information quality
     • SWIQA (Semantic Web Information Quality Assessment Framework)
       • data quality rules & quality scores for identifying wrong data
     • LINK-QA
       • using network metrics
     • Sieve
       • flexibly expressing quality assessment methods
       • fusion methods
     • Validata
       • online tool for testing/validating RDF data against ShEx schemas
     • Luzzu (Linked Open Datasets)
       • thirty data quality metrics based on the Dataset Quality Ontology
  19. Knowledge Curation - Cleaning
     • Improve correctness
     • Major objectives
       • error detection
         • wrong instance assertions: isElementOf(i, t)
         • wrong property value assertions: p(i1, i2)
         • wrong equality assertions: isSameAs(i1, i2)
       • error correction
     • Correction of a wrong instance assertion isElementOf(i, t):
       • i is not a proper instance identifier: delete the assertion or correct i
       • t is not an existing type name: delete the assertion or correct t
       • the instance assertion is (semantically) wrong: delete the assertion or find a proper t; do NOT try to find a proper i (that would neither scale nor make sense)
  20. Knowledge Curation - Cleaning
     • Error correction of wrong property value assertions p(i1, i2):
       • p is not a proper property name: delete the assertion or correct p
       • i1 is not a proper instance identifier: delete the assertion or correct i1
       • i1 is not in any domain of p: delete the assertion or add an assertion isElementOf(i1, t) with t a domain of p
       • i2 is not a proper instance identifier: delete the assertion or correct i2
       • i2 is not in the range of p for any domain of i1: delete the assertion, or add a proper isElementOf assertion for i1 that adds a domain for which i2 is an instance of the range of the property, or add a proper isElementOf assertion for i2 that turns it into an instance of a range of the property applied to a domain of p where i1 is an element
       • the property assertion is (semantically) wrong: delete the assertion or correct it; in this case, you should most likely define a proper i2, or search for a better p, or search for a better i1
     • Correction of wrong equality assertions isSameAs(i1, i2):
       • i1 is not a proper instance identifier: delete the assertion or correct i1
       • i2 is not a proper instance identifier: delete the assertion or correct i2
       • the identity assertion is (semantically) wrong: delete the assertion or replace it by a SKOS operator, which however does not come with operational semantics
     The informed reader may here recognize the implicit usage of the closed versus open world assumption.
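     A toy sketch of the domain/range checks behind several of these cases, using a tiny hand-written ontology (all identifiers are illustrative):

     # Toy ontology with domains and ranges, plus known type assertions.
     ontology = {
         "servesCuisine": {"domain": {"Restaurant"}, "range": {"Cuisine"}},
     }
     types = {"alpenrose": "Restaurant", "tyrolean": "Cuisine", "webcam1": "WebCam"}

     def check_property_assertion(p, i1, i2):
         """Detects the formal error cases for a property value assertion p(i1, i2)."""
         if p not in ontology:
             return f"{p} is not a proper property name"
         if types.get(i1) not in ontology[p]["domain"]:
             return f"{i1} is not in any domain of {p}"
         if types.get(i2) not in ontology[p]["range"]:
             return f"{i2} is not in the range of {p}"
         return "ok"

     print(check_property_assertion("servesCuisine", "alpenrose", "tyrolean"))  # ok
     print(check_property_assertion("servesCuisine", "webcam1", "tyrolean"))    # domain violation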
  21. Knowledge Curation - Cleaning
     Methods
     • Instance assertions
       • statistical distribution of types and properties
       • disjointness axioms
       • supervised machine learning
       • entity type dictionaries
       • association rule mining
     • Property value assertions
       • statistical distribution
       • ontology reasoners
       • Wikipedia pages
       • outlier detection
     • Equality assertions
       • outlier detection
       • constraints
       • logical validation
       • local context of instances
  22. Knowledge Curation - Cleaning
     Tools
     • HoloClean
       • integrity constraints
       • external data
       • quantitative statistics
       • steps:
         • separate the input dataset into a noisy and a clean dataset
         • assign an uncertainty score over the values of the noisy dataset
         • compute the marginal probability for each value to be repaired
     • SDValidate
       • statistical distributions
       • three steps:
         • compute the relative predicate frequency for each statement
         • for each statement selected in the first step, assign a confidence score
         • apply a confidence threshold
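     A simplified sketch loosely following the three SDValidate-style steps listed above (the statements and the confidence heuristic are made up for illustration; the real tool is considerably more elaborate):

     from collections import Counter

     # Statements are (subject, predicate, object) triples.
     statements = [
         ("alpenrose", "servesCuisine", "Tyrolean"),
         ("gasthof2", "servesCuisine", "Tyrolean"),
         ("webcam1", "servesCuisine", "PanoramaView"),  # rare object value, likely an error
         ("alpenrose", "location", "Mayrhofen"),
     ]

     # Step 1: relative frequency of each predicate.
     pred_freq = Counter(p for _, p, _ in statements)
     rel_freq = {p: c / len(statements) for p, c in pred_freq.items()}

     # Step 2: confidence score, here how common the (predicate, object) pair is.
     pair_freq = Counter((p, o) for _, p, o in statements)
     def confidence(stmt):
         _, p, o = stmt
         return pair_freq[(p, o)] / pred_freq[p]

     # Step 3: flag statements below a confidence threshold for review.
     THRESHOLD = 0.5
     for stmt in statements:
         if confidence(stmt) < THRESHOLD:
             print("low confidence:", stmt)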
  23. Knowledge Curation - Cleaning
     Tools (continued)
     • The LOD Laundromat
       • cleans Linked Open Data
       • takes a SPARQL endpoint or an archived dataset as input
       • guesses the serialisation format
       • identifies syntax errors using a library while parsing the RDF
       • saves the RDF data in a canonical format
     • KATARA
       • identifies correct & incorrect data
       • generates possible corrections for wrong data
     • SPIN
       • SPARQL constraint language
       • generates SPARQL query templates based on data quality problems
         • inconsistency
         • lack of comprehensibility
         • heterogeneity
         • redundancy
  24. Knowledge Curation - Enrichment
     Knowledge source detection
     • search for additional assertions for the KG
     Knowledge source integration
     • integrate new assertions into the KG
     • align new statements with the existing ones
     Duplicate detection
     • Methods
       • string similarity measures
       • association rule mining
       • topic modelling
       • Support Vector Machines
       • property-based
       • crowd-sourced data
       • graph-oriented
       • formalizing entity resolution
     Property-value-statements correction
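     A sketch of duplicate detection with a plain string similarity measure, the first method in the list above (names and threshold are illustrative):

     from difflib import SequenceMatcher

     names = ["Gasthof Alpenrose", "Gasthof Alpen Rose", "Hotel Tirolerhof"]

     def similarity(a, b):
         """Ratio of matching characters between two labels, case-insensitive."""
         return SequenceMatcher(None, a.lower(), b.lower()).ratio()

     for i in range(len(names)):
         for j in range(i + 1, len(names)):
             score = similarity(names[i], names[j])
             if score > 0.85:
                 print(f"possible duplicate: {names[i]!r} ~ {names[j]!r} ({score:.2f})")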
  25. Knowledge Curation - Enrichment
     Tools
     • Silk
       • achieving entity linking
     • SERIMI
       • tries to match instances between two datasets
     • Legato
       • linking tool based on indexing techniques
     • ...
     Property-value-statements correction - Tools
     • Sieve
       • quality assessment module
       • data fusion module (describes fusion policies)
     • KnoFuss
       • data fusion using different methods
     • ODCleanStore
       • cleaning, linking, quality assessment and fusing of RDF data
     • ...
  26. Knowledge Deployment
     Architecture: graph database (hosting the Knowledge Graph, storage) -> reasoning agent (views defined via rules) -> output
     • Connects user requests to resources
     • Knowledge management technology (GraphDB)
     • Inference engines based on deductive reasoning (reasoning agents)
     • Data is made accessible through personalized agents
       • Restrict: define partial views (via rules)
       • Enrich: provide contextual & personalized reasoning on top of this knowledge
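     A sketch of how a partial view could be defined via a rule, here expressed as a SPARQL CONSTRUCT query over the hosted graph (the file name, the locality and the rule itself are illustrative):

     from rdflib import Graph

     g = Graph()
     g.parse("tourism_kg_sample.ttl", format="turtle")  # stand-in for the hosted KG

     # Restriction rule: the deployed agent only sees restaurants located in Mayrhofen.
     view_rule = """
     PREFIX schema: <https://schema.org/>
     CONSTRUCT { ?r ?p ?o }
     WHERE {
       ?r a schema:Restaurant ;
          schema:address/schema:addressLocality "Mayrhofen" ;
          ?p ?o .
     }
     """
     view = g.query(view_rule).graph  # the restricted subgraph handed to the reasoning agent
     print(len(view), "triples in the view")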
  27. Pilot: Touristic Knowledge Graph for DACH
     • The Touristic Knowledge Graph integrates and connects data from several sources, including:
       • touristic data sources
       • open data sources
     • It includes entities of the following types:
       • LocalBusiness
       • POIs, Infrastructure
       • SportsActivityLocations (e.g. Trails, SkiResorts)
       • Events
       • Offers
       • WebCams
       • Mobility and Transport
  28. Pilot: Touristic Knowledge Graph excerpt
     Data visualisation (based on GraphDB) of SkiResorts, Lifts, Slopes and WebCams: types such as SkiResort, SkiRoute, Slope, CableCar, ChairLift, SkiLift, TBar, WebCam and SnowReport, connected via subClassOf and containedInPlace relations.
  29. Pilot
     • The Touristic KG is used to answer questions such as:
       • "Where can I have traditional Tyrolean food when going cross-country skiing?"
       • "Show me WebCams near Kölner Haus"
       • "How many people are living in Serfaus?"
  30. Vision
     • Fully automated knowledge lifecycle
       • NLU training
       • query generation
       • natural language generation
     • Automatically distribute knowledge into all available channels (images: ©amazon.com, ©google.com, ©slack.com, ©facebook.com, ...)
     • The core is the availability of methodologies, methods, and tools to generate, host, curate, deploy, and access Knowledge Graphs containing trillions of statements and more.
