
Sören Auer | Enterprise Knowledge Graphs

http://2016.semantics.cc/s%C3%B6ren-auer

Published in: Technology

  1. 1. Enterprise Knowledge Graphs Sören Auer
  2. 2. The three Big Data "V"s: Variety is often neglected. Source: Gesellschaft für Informatik
  3. 3. Linked Data Principles: addressing the neglected third V (Variety)
      1. Use URIs to identify the "things" in your data
      2. Use http:// URIs so people (and machines) can look them up on the web
      3. When a URI is looked up, return a description of the thing (in RDF format)
      4. Include links to related things
      Source: http://www.w3.org/DesignIssues/LinkedData.html
      [1] Auer, Lehmann, Ngomo, Zaveri: Introduction to Linked Data and Its Lifecycle on the Web. Reasoning Web 2013
  4. 4. Linked (Open) Data: The RDF Data Model. RDF = Resource Description Framework. [Figure: example graph. DHL has headquarters Post Tower, industry Logistics, and full name "DHL International GmbH"; Post Tower is located in Bonn and has height 162.5 m; Logistics has the labels "Logistik" and "物流".]
  5. 5. RDF Data Model (a bit more technical)
      – Graph consists of:
        • Resources (identified via URIs)
        • Literals: data values with data type (URI) or language (multilinguality integrated)
        • Attributes of resources are also URI-identified (from vocabularies)
      – Various data sources and vocabularies can be arbitrarily mixed and meshed
      – URIs can be shortened with namespace prefixes; e.g. dbp: → http://dbpedia.org/resource/
      [Figure: the example graph with prefixed names]
        dbp:DHL_International_GmbH ex:headquarters dbp:Post_Tower
        dbp:DHL_International_GmbH dbo:industry dbp:Logistics
        dbp:DHL_International_GmbH foaf:name "DHL International GmbH"^^xsd:string
        dbp:Post_Tower gn:locatedIn dbp:Bonn
        dbp:Post_Tower ex:height [ rdf:value "162.5"^^xsd:decimal ; ex:unit unit:Meter ]
        dbp:Logistics rdfs:label "Logistik"@de
        dbp:Logistics rdfs:label "物流"@zh
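
The data model on this slide can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the `ex:` namespace URI, the helper names `expand`, `literal` and `labels`, and the tuple encoding of literals are made up for the sketch, and the height node is left out for brevity.

```python
# Namespace prefixes in the style of the slide. The ex: namespace URI is a
# placeholder assumed for this sketch; the slide does not define it.
PREFIXES = {
    "dbp": "http://dbpedia.org/resource/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "ex": "http://example.org/",
}


def expand(name):
    """Expand a prefixed name such as dbp:Bonn into a full URI."""
    prefix, local = name.split(":", 1)
    return PREFIXES[prefix] + local


def literal(value, datatype=None, lang=None):
    """A literal is a data value plus a datatype URI or a language tag."""
    return (value, expand(datatype) if datatype else None, lang)


DHL = expand("dbp:DHL_International_GmbH")
LOGISTICS = expand("dbp:Logistics")

# A graph is a set of (subject, predicate, object) triples.
# URIs are plain strings, literals are tuples.
graph = {
    (DHL, expand("ex:headquarters"), expand("dbp:Post_Tower")),
    (DHL, expand("foaf:name"), literal("DHL International GmbH", datatype="xsd:string")),
    (LOGISTICS, expand("rdfs:label"), literal("Logistik", lang="de")),
    (LOGISTICS, expand("rdfs:label"), literal("物流", lang="zh")),
}


def labels(graph, subject, lang):
    """Return the rdfs:label values of a subject in one language."""
    label = expand("rdfs:label")
    return sorted(
        o[0] for s, p, o in graph
        if s == subject and p == label and isinstance(o, tuple) and o[2] == lang
    )


print(labels(graph, LOGISTICS, "de"))
```

Because every name is a URI and every statement is an independent triple, triples from another source can be added to `graph` with a plain set union.
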
  6. 6. RDF mediates between different Data Models & bridges between Conceptual and Operational Layers
      Tabular/Relational Data:
        | Id   | Title   | Screen |
        |------|---------|--------|
        | 5624 | SmartTV | 104cm  |
        | 5627 | Tablet  | 21cm   |
        Prod:5624 rdf:type Electronics
        Prod:5624 rdfs:label "SmartTV"
        Prod:5624 hasScreenSize "104"^^unit:cm
        ...
      Taxonomic/Tree Data (tree nodes: Electronics; Vehicle with children Car, Bus, Truck):
        Vehicle rdf:type owl:Thing
        Car rdfs:subClassOf Vehicle
        Bus rdfs:subClassOf Vehicle
        ...
      Logical Axioms / Schema:
        Male rdfs:subClassOf Human
        Female rdfs:subClassOf Human
        Male owl:disjointWith Female
        ...
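
A minimal sketch of the tabular case, assuming that every row of the product table is typed as `Electronics` (the slide only shows the triples for product 5624) and that the screen size always carries a `cm` suffix. The function name `row_to_triples` is hypothetical.

```python
# The product table from the slide, one dict per row.
rows = [
    {"Id": 5624, "Title": "SmartTV", "Screen": "104cm"},
    {"Id": 5627, "Title": "Tablet", "Screen": "21cm"},
]


def row_to_triples(row, rdf_class="Electronics"):
    """Turn one table row into triples: the key becomes the subject,
    each remaining column becomes one predicate/object pair."""
    subject = f"Prod:{row['Id']}"
    title = row["Title"]
    screen = row["Screen"]
    # Assumption: sizes are given in cm; split the number from the unit.
    size = screen[:-2] if screen.endswith("cm") else screen
    return [
        (subject, "rdf:type", rdf_class),
        (subject, "rdfs:label", f'"{title}"'),
        (subject, "hasScreenSize", f'"{size}"^^unit:cm'),
    ]


triples = [t for row in rows for t in row_to_triples(row)]
for s, p, o in triples:
    print(s, p, o)
```
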
  7. 7. © Fraunhofer · Page 7. Vocabularies – Breaking the mold! Semantic data virtualization allows for continuous expansion and enhancement of data and metadata across data sources without losing the overall perspective. Relational data models: 1:1 relation between data model and application. Graph-based data model (Subject Predicate Object / Subject Predicate Object / Subject): 1:n relation between data model and application.
  8. 8. Vocabulary Example
      Vocabulary (visual representation):
        Class: Company
        | Property   | Expected type |
        |------------|---------------|
        | inIndustry | Industry      |
        | fullName   | String        |
        | headquarter| Building      |
        Class: Building
        | Property   | Expected type |
        |------------|---------------|
        | locatedIn  | Industry      |
        | height     | unit:meter    |
      Schema (RDF representation):
        Company rdf:type rdfs:Class
        Building rdf:type rdfs:Class
        inIndustry rdf:type rdfs:Property
        inIndustry rdfs:domain Company
        inIndustry rdfs:range Industry
        headquarter rdf:type rdfs:Property
        headquarter rdfs:domain Company
        headquarter rdfs:range Building
      Instantiation (RDF representation):
        DHL rdf:type Company
        DHL fullName "DHL Int. GmbH"
        DHL inIndustry Logistics
        DHL headquarter PostTower
        PostTower rdf:type Building
        PostTower locatedIn dbpedia:Bonn
        PostTower height "162.5"^^meter
      [Figure: the instantiation drawn as a graph: DHL with headquarters Post Tower, industry Logistics (labels "Logistik", "物流") and full name "DHL International GmbH"; Post Tower located in Bonn with height 162.5 m.]
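
One thing the schema triples make possible is type inference: under RDFS semantics, `rdfs:domain` and `rdfs:range` let a reasoner derive the class of a subject or object from the property that connects them. The sketch below is a hand-rolled illustration of those two rules only, not a full RDFS reasoner and not something the slide itself shows; the function name `infer_types` is hypothetical.

```python
# Schema and instance triples copied from the slide (short names, no prefixes).
schema = [
    ("inIndustry", "rdfs:domain", "Company"),
    ("inIndustry", "rdfs:range", "Industry"),
    ("headquarter", "rdfs:domain", "Company"),
    ("headquarter", "rdfs:range", "Building"),
]
data = [
    ("DHL", "inIndustry", "Logistics"),
    ("DHL", "headquarter", "PostTower"),
]


def infer_types(schema, data):
    """Apply the RDFS domain and range rules once.

    If p has domain C, every subject of p is a C.
    If p has range C, every object of p is a C.
    """
    domains = {s: o for s, p, o in schema if p == "rdfs:domain"}
    ranges = {s: o for s, p, o in schema if p == "rdfs:range"}
    inferred = set()
    for s, p, o in data:
        if p in domains:
            inferred.add((s, "rdf:type", domains[p]))
        if p in ranges:
            inferred.add((o, "rdf:type", ranges[p]))
    return inferred


for triple in sorted(infer_types(schema, data)):
    print(*triple)
```

Here the explicit statements `DHL rdf:type Company` and `PostTower rdf:type Building` from the slide fall out of the schema and would not need to be asserted by hand.
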
  9. 9. The Semantic Web Layer Cake 2001 http://www.w3.org/2001/10/03-sww-1/slide7-0.html • Monolithic, based on XML • Focus on heavyweight semantics (ontologies, logic, reasoning)
  10. 10. The Semantic Web Layer Cake 2015 – Bridging between Big & Smart Data
      [Figure: layer stack. Unicode, URIs; source formats XML, JSON, CSV, RDB, HTML; RDF with the corresponding interfaces RDF/XML, JSON-LD, CSV2RDF, R2RML, RDFa; RDF Data Shapes; RDF-Schema; Vocabularies; Ontologies; SKOS Thesauri; Logic; SWRL Rules; SPARQL; (access control), signature, encryption (HTTPS/CERT/DANE)]
      • Lingua franca of data integration with many technology interfaces (XML, HTML, JSON, CSV, RDB, …)
      • Focus on lightweight vocabularies, rules, thesauri etc.
      • Less "invasive"
  11. 11. RDF – the Lingua Franca of Data Integration • RDF is simple • We can easily encode and combine all kinds of data models (relational, taxonomic, graphs, object-oriented, …) • RDF supports distributed data and schema • We can seamlessly evolve simple semantic representations (vocabularies) to more complex ones (e.g. ontologies) • Small representational units (URI/IRIs, triples) facilitate mixing and mashing • RDF can be viewed from many perspectives: facts, graphs, ER, logical axioms, objects • RDF integrates well with other formalisms: HTML (RDFa), XML (RDF/XML), JSON (JSON-LD), CSV, … • Linking and referencing between different knowledge bases, systems and platforms facilitates the creation of sustainable data ecosystems
  12. 12. Successful application domains: Linked Data & Semantic Integration
      Search Engine Optimization & Web Commerce
        • Schema.org used by >20% of Web sites
        • Major search engines exploit semantic descriptions
      Pharma, Life Sciences
        • Mature, comprehensive vocabularies and ontologies
        • Billions of disease, drug, clinical trial descriptions
      Digital Libraries
        • Many established vocabularies (DublinCore, FRBR, EDM)
        • Millions of items aggregated from thousands of memory institutions in Europeana, German Digital Library
  13. 13. © Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS. The Web evolves into a Web of Data. [Figure labels: Linked Open Data, Facebook Open Graph]
  14. 14. Knowledge Graphs – A definition • Fabric of concept, class, property, relationships, entity descriptions • Uses a knowledge representation formalism (typically RDF, RDF-Schema, OWL) • Holistic knowledge (multi-domain, source, granularity): • instance data (ground truth), • open (e.g. DBpedia, WikiData), private (e.g. supply chain data), closed data (product models), • derived, aggregated data, • schema data (vocabularies, ontologies), • meta-data (e.g. provenance, versioning, documentation, licensing), • comprehensive taxonomies to categorize entities, • links between internal and external data, • mappings to data stored in other systems and databases
  15. 15. Knowledge Graph Challenges & Opportunities
      Knowledge graphs typically cover: • Multiple domains • Various levels of granularity • Data from multiple sources • Various degrees of structure
      Challenges: • Quality • Coherence • Co-evolution • Update propagation • Curation & interaction
      Opportunities: • Background knowledge for various applications (e.g. question answering, data integration, machine learning) • Facilitate intra-organizational data sharing and exchange (data value chains)
  16. 16. Comparison of various enterprise data integration paradigms

      | Paradigm          | Data Model | Integr. Strategy | Conceptual/operational | Heterogeneous data | Intern./extern. data | No. of sources | Type of integr. | Domain coverage | Semantic repres. |
      |-------------------|------------|------------------|------------------------|--------------------|----------------------|----------------|-----------------|-----------------|------------------|
      | XML Schema        | DOM trees  | LaV              | operational            | yes                | yes                  | medium         | both            | medium          | high             |
      | Data Warehouse    | relational | GaV              | operational            | -                  | partially            | medium         | physical        | small           | medium           |
      | Data Lake         | various    | LaV              | operational            | yes                | yes                  | large          | physical        | high            | medium           |
      | MDM               | UML        | GaV              | conceptual             | -                  | -                    | small          | physical        | small           | medium           |
      | PIM / PCS         | trees      | GaV              | operational            | partially          | partially            | -              | physical        | medium          | medium           |
      | Enterprise search | document   | -                | operational            | yes                | partially            | large          | virtual         | high            | low              |
      | EKG               | RDF        | LaV              | both                   | yes                | yes                  | medium         | both            | high            | very high        |

      [1] Michael Galkin, Sören Auer, Simon Scerri: Enterprise Knowledge Graphs: A Survey. Submitted to 37th International Conference on Information Systems. 2016.
  17. 17. Knowledge Graph Technology
  18. 18. Adding a Semantic Layer to Data Lakes
      Semantic Data Lake: • central place for model, schema and data historization • Combination of scale-out (cost reduction) and semantics (increased control & flexibility) • grows incrementally (pay-as-you-go)
      [Figure: architecture. Consumers: Management, Accounting, Marketing, Sales, Support, R&D. Inbound Data Sources → Inbound Raw Data Store → Data Lake (order of magnitude cheaper scalable data store) → Outbound and Consumption. Knowledge Graph for Relationship Definition and Meta Data. Frontend to Access Relationship and KPI Definition / Documentation. Frontend to Access (ad hoc) Reports. Outbound Data Delivery to Target Systems. Interfaces: JSON-LD, CSVW, R2RML, XML2RDF]
  19. 19. W3C R2RML – Relational to RDF Mapping. R2RML: RDB to RDF Mapping Language, W3C Recommendation 27 September 2012. Editors: Souripriya Das, Seema Sundara, Richard Cyganiak. http://www.w3.org/TR/r2rml/
  20. 20. Example R2RML Mapping
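
The mapping shown on this slide is not preserved in the transcript. As a stand-in, the following sketch mimics the core idea of an R2RML triples map in plain Python: a subject template, a class, and predicate-object maps that read column values. The dictionary layout, the function names and the `EMP` table with its sample row are assumptions for illustration; real R2RML mappings are written in Turtle and run by an R2RML processor.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical Python rendering of one triples map (not R2RML syntax).
MAPPING = {
    "table": "EMP",
    "subject_template": "http://data.example.com/employee/{EMPNO}",
    "class": "http://example.com/ns#Employee",
    "predicate_object_maps": [
        {"predicate": "http://example.com/ns#name", "column": "ENAME"},
    ],
}


def demo_connection():
    """Create an in-memory database with one sample row (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, ENAME TEXT)")
    conn.execute("INSERT INTO EMP VALUES (7369, 'SMITH')")
    return conn


def apply_mapping(conn, mapping):
    """Generate triples for every row of the mapped table."""
    cursor = conn.execute(f"SELECT * FROM {mapping['table']}")
    columns = [d[0] for d in cursor.description]
    triples = []
    for values in cursor:
        row = dict(zip(columns, values))
        # The subject URI is built by filling column values into the template.
        subject = mapping["subject_template"].format(**row)
        triples.append((subject, RDF_TYPE, mapping["class"]))
        for po in mapping["predicate_object_maps"]:
            triples.append((subject, po["predicate"], row[po["column"]]))
    return triples


for triple in apply_mapping(demo_connection(), MAPPING):
    print(triple)
```
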
  21. 21. SPARQLMap – Mapping RDB 2 RDF. Two options: 1. Either the resulting RDF knowledge base is materialized in a triple store and 2. subsequently queried using SPARQL, 3. or the materialization step is avoided by dynamically mapping an input SPARQL query into a corresponding SQL query, which returns exactly the same results as the SPARQL query executed against the materialized RDF dump.
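
The two options can be contrasted with a toy example. The sketch below handles only a single triple pattern of the form `?s <predicate> ?o` over one table; the `VIEW` dictionary, the `EMP` table and the function names are assumptions, and actual SPARQL-to-SQL rewriters are far more general.

```python
import sqlite3

# Hypothetical view definition: which table and columns back one predicate.
VIEW = {
    "predicate": "http://example.com/ns#name",
    "table": "EMP",
    "subject_prefix": "http://data.example.com/employee/",
    "subject_column": "EMPNO",
    "object_column": "ENAME",
}


def demo_connection():
    """In-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, ENAME TEXT)")
    conn.executemany("INSERT INTO EMP VALUES (?, ?)", [(7369, "SMITH"), (7499, "ALLEN")])
    return conn


def materialize(conn):
    """Option 1: dump every row as a triple, as a triple store would load it."""
    return [
        (VIEW["subject_prefix"] + str(empno), VIEW["predicate"], ename)
        for empno, ename in conn.execute("SELECT EMPNO, ENAME FROM EMP")
    ]


def rewrite_to_sql(predicate):
    """Option 2: rewrite the pattern `?s <predicate> ?o` into one SQL query."""
    if predicate != VIEW["predicate"]:
        raise KeyError(f"no view covers predicate {predicate}")
    return (
        f"SELECT '{VIEW['subject_prefix']}' || {VIEW['subject_column']} AS s, "
        f"{VIEW['object_column']} AS o FROM {VIEW['table']}"
    )


conn = demo_connection()
# Answer the pattern against the materialized dump ...
materialized = {(s, o) for s, p, o in materialize(conn) if p == VIEW["predicate"]}
# ... and against the database directly, without materializing anything.
virtual = set(conn.execute(rewrite_to_sql(VIEW["predicate"])))
print(materialized == virtual)
```

Both paths yield the same bindings for `?s` and `?o`; the virtual path simply never stores the triples.
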
  22. 22. Example: Sparqlify • Rationale: exploit existing formalisms (SQL, SPARQL Construct) as much as possible • Flexible & versatile mapping language • Translates one SPARQL query into exactly one efficiently executable SQL query • Solid theoretical formalization based on SPARQL-relational algebra transformations • Extremely scalable through an elaborate view candidate selection mechanism • Used to publish 20B triples for LinkedGeoData [Figure labels: SPARQL Construct, SQL View, Bridge] [1] Stadler, Unbehauen, Auer, Lehmann: Sparqlify – Very Large Scale Linked Data Publication from Relational Databases. [2] Unbehauen, Stadler, Auer: Optimizing SPARQL-to-SQL Rewriting. iiWAS 2013 [3] Auer, et al.: Triplify: light-weight linked data publication from relational databases. WWW 2009
  23. 23. Semantified Big Data Architecture Blueprint. [Figure stages: Data sources, Ingestion, Storage, Semantic Lifting with Mappings, Queries] Storing of semantic and semantified data in Apache Parquet files on HDFS. [1] Mami, Scerri, Auer, Vidal: Towards the Semantification of Big Data Technology. DEXA 2016
  24. 24. SEBIDA Implementation Architecture
  25. 25. SEBIDA Evaluation Results • Loads data faster • Has quite different query performance characteristics: faster in 5 out of 12 queries, similar performance in 2, slower in 5
  26. 26. VoCol: Collaborative Vocabulary Curation Environment. Comprehensive Support for Evolving Vocabularies
  27. 27. Industry 4.0: Semantic Models as Bridge between Shop & Office Floor
  28. 28. Semantic Administrative Shell & Reference Architecture for Industry 4.0 (RAMI4.0). The Administrative Shell (Verwaltungsschale) provides a digital identity for arbitrary Industry 4.0 components (e.g. sensors, actuators/robots), exposing data covering the whole life-cycle. The Reference Architecture for Industry 4.0 (RAMI4.0) provides a conceptual framework for implementing comprehensive Industry 4.0 scenarios. We have implemented both concepts, along with a number of IEC and ISO standards, in a comprehensive information model ready to be used in production environments.
  29. 29. VoCol: Collaborative Development Environment for Vocabularies
      [Figure components: Versioning (Git/Bitbucket), Issue tracking (GitLab/GitHub), Syntax validation, Documentation generation, Authoring (Turtle), Visualization (VOWL), Publishing (LOD/SPARQL)]
      • Integrates a number of tools & services for different aspects of vocabulary development
      • Is centered around Git version control (or Bitbucket), thus supporting the branching and merging of vocabularies
      • Supports the roundtrip between schema/vocabulary development, competency questions (expressed in SPARQL), and example data
      • Bridges between conceptual models and executable code
      http://eis.iai.uni-bonn.de/Projects/VoCol.html
  30. 30. Development based on Git – Version Control. Git has become the most widely used version control system. It is a distributed revision control system with an emphasis on speed, data integrity, and support for distributed, non-linear workflows. Git was initially designed and developed in 2005 by Linux kernel developers for Linux kernel development. Git is the basis for a variety of open-source and commercial services and products, such as: GitHub/Bitbucket (Web-based Git repository hosting services with millions of users) and GitLab/Gitolite (open-source Git repository management platforms). Since Team Foundation Server 2013, Microsoft has provided native support for Git. Git is easily extensible and can be integrated into arbitrary workflows via Git hooks.
  31. 31. Information Model – Environment
  32. 32. Environment: Dynamic Documentation
  33. 33. Environment: Dynamic Documentation
  34. 34. Environment: Dynamic Visualization
  35. 35. Environment: Analytics
  36. 36. Environment: Analytics
  37. 37. Environment: Analytics
  38. 38.
  39. 39. Environment: Querying
  40. 40. Environment: Evolution
  41. 41. Industrial Data Space
  42. 42. Vocabulary-based Integration facilitates Data-driven Businesses [Figure label: Vocabulary]
  43. 43. The work on the Industrial Data Space is complementary to and closely interlinked with the Plattform Industrie 4.0. [Figure labels: Retail 4.0, Bank 4.0, Insurance 4.0, …, Industrie 4.0 (focus on the manufacturing industry); Smart Services; Transmission, Networks; Real-time systems; Industrial Data Space (focus on data); Data …]
  44. 44. The Industrial Data Space Initiative. Community of >30 large German and European companies. Pre-competitive, publicly funded innovation project involving 11 Fraunhofer institutes for developing the IDS reference architecture. [Figure: current members of the Industrial Data Space Association]
  45. 45. Semantic Data Linking for Enterprise Data Value Chains

      |                          | Data Lake                 | Industrial Data Space                        | Pure Internet                            |
      |--------------------------|---------------------------|----------------------------------------------|------------------------------------------|
      | Characteristics          | centralized, monopolistic | federated, secure, "trusted", standard-based | completely decentralized, open, insecure |
      | Data management          | Central repository        | Decentralized                                | Decentralized                            |
      | Data ownership           | Central                   | Decentralized                                | Decentralized                            |
      | Data linking             | Single provider           | Federated, on demand                         | Missing                                  |
      | Data security            | Bilateral                 | Certified system                             | Bilateral                                |
      | Market structure         | Central provider          | Role system                                  | Unstructured                             |
      | Transport infrastructure | Internet                  | Internet                                     | Internet                                 |

      Images: © Fotolia, Francesco De Paoli, Nmedia, hakandogu
  46. 46. Basic principles of the Industrial Data Space: • On-Demand Interlinking • Linked Light Semantics • Security with Industrial Data Container • Certified Roles. Images: © Fotolia 77260795 ∙ 73040142 ∙ 58947296 ∙ 68898041
  47. 47. Industrial Data Space: On-Demand Interlinking. [Figure: Enterprises 1–6 connected through Services A–G] All data stays with its owners and remains controlled and secured. Data is shared only on request for a service. No central platform. Image sources: iStockphoto
  48. 48. --- CONFIDENTIAL --- IDS Architecture Overview. [Figure components: Industrial Data Space Broker (Index, Clearing, Registry); Industrial Data Space App Store (Apps, Vocabulary); Company A and Company B, each with an Internal IDS Connector and an External IDS Connector; Third Party Cloud Provider; Internet; data flows: Upload / Download / Search]
  49. 49. Big Data is not just Volume and Velocity. Variety (& Veracity) are key challenges, and Linked Data helps deal with both.
      • The Linked Data life-cycle requires integrating and adapting results from a number of disciplines: NLP, Machine Learning, Knowledge Representation, Data Management, User Interaction, …
      • Applications in a number of domains: cultural heritage, life sciences, industry 4.0 / cyber-physical systems, smart cities, mobility, …
      Linked Data links not only data but also: • Various disciplines • Applications and use cases
  50. 50. The Team
  51. 51. Creating Knowledge out of Interlinked Data Thanks for your attention! Sören Auer http://www.iai.uni-bonn.de/~auer | http://eis.iai.uni-bonn.de auer@cs.uni-bonn.de
  52. 52. Linked-Data-based Question Answering: A Grand Challenge
  53. 53. Question Answering research challenges
      Main Goals
        • Completeness ⇒ Extension of background knowledge, streams, deduplication
        • Flexibility ⇒ Deal with keywords and NL
        • Runtime ⇒ New models for query processing, ranking for top-k queries
        • Ease of use ⇒ Verbalization of queries, entity verbalization, explanation of answers in NL
        • Multilinguality ⇒ Cover several European languages
      Automatic Extension of background knowledge
        1. Generate query from own data and get answer set A
        2. Add new data set and get answer set A'
        3. If there is information gain, iterate
        4. Else terminate
      Data Streams
        • Continuous queries on data streams (update SPARQL results as new information comes in)
        • Send novel answers to end user
        • Open Information Extraction
      Hybrid Search: extension for queries on unstructured data
      Ensure Quasi-Completeness
        • Fully automatic entity consolidation
        • Find links at runtime, e.g., between DBpedia and LinkedMDB to answer "Which films were directed by and starred Tarantino?"
      [1] Shekarpour, Marx, Ngomo, Auer: Semantic query interpretation for question answering on linked data. J. Web Semantics 30 (2015)
      [2] Marx, Usbeck, Ngomo, Höffner, Lehmann, Auer: Towards an open question answering architecture. SEMANTICS 2014
      [3] Shekarpour, Ngomo, Auer: Question answering on interlinked data. WWW 2013
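
The four-step loop for extending background knowledge can be sketched as follows. The slide does not say how information gain is measured, so this sketch assumes the simplest possible measure (the answer set grows); the function names and the tiny triple sets are toy values chosen for illustration.

```python
def extend_background_knowledge(query, own_data, candidate_datasets):
    """Add data sets one by one while each one improves the answer set.

    query: function mapping a set of triples to a set of answers.
    candidate_datasets: list of (name, triples) pairs, tried in order.
    """
    knowledge = set(own_data)
    answers = query(knowledge)                   # step 1: answer set A
    used = []
    for name, dataset in candidate_datasets:
        extended = knowledge | set(dataset)
        new_answers = query(extended)            # step 2: answer set A'
        if len(new_answers) > len(answers):      # step 3: assumed gain measure
            knowledge, answers = extended, new_answers
            used.append(name)
        else:                                    # step 4: no gain, terminate
            break
    return answers, used


def films_directed_by_and_starring(person):
    """Build a query for films that a person both directed and starred in."""
    def query(triples):
        directed = {s for s, p, o in triples if p == "director" and o == person}
        starred = {s for s, p, o in triples if p == "starring" and o == person}
        return directed & starred
    return query


# Toy data: the local data knows the director, a second source knows the cast.
own_data = {("Pulp_Fiction", "director", "Quentin_Tarantino")}
candidates = [
    ("cast_data", {("Pulp_Fiction", "starring", "Quentin_Tarantino")}),
    ("unrelated_data", {("Post_Tower", "locatedIn", "Bonn")}),
]

answers, used = extend_background_knowledge(
    films_directed_by_and_starring("Quentin_Tarantino"), own_data, candidates
)
print(answers, used)
```

The question only becomes answerable once the second source is linked in, which is the point of the runtime-linking bullet on the slide.
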
  54. 54. The approach: An Open QA Architecture
      Create an open, extensible architecture for Linked-Data-based Question Answering
        • Enable the plugging in and competition of different modules for various QA aspects:
          Input (query string / question, voice, brain input); Query Splitting; Disambiguation/Mapping; Query Construction; Query Execution; Result Presentation
        • Take context, personalization, feedback into account
      For whom? Use cases:
        • In-car interaction / Human Vehicle Interaction: Where can I find parking? What are the main sights in Luxembourg?
        • Assisting people with disabilities (e.g. vision impaired): Is there any pharmacy still open? What classical concerts are broadcast next week?
        • Medical information retrieval: Which side effects can be caused by Paracetamol? Do Paracetamol and Tamiflu interfere?
        • …
      [1] The WDAqua Marie Curie ITN: Answering Questions using Web Data. http://wdaqua.informatik.uni-bonn.de
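
A pluggable pipeline of this kind might look roughly like the sketch below. It is not the WDAqua or any published architecture: the class `QAPipeline`, the stage functions, the two-entry lexicon and the placeholder knowledge base (`ex:SightA`, `ex:SightB`) are all invented for illustration, and the execution stage is a dictionary lookup standing in for a SPARQL endpoint.

```python
# Toy lexicon and knowledge base; all values are placeholders.
RESOURCES = {"luxembourg": "ex:Luxembourg"}
PROPERTIES = {"sights": "ex:sight"}
TOY_KB = {("ex:Luxembourg", "ex:sight"): ["ex:SightA", "ex:SightB"]}


def split(state):
    """Query splitting: break the question into lower-case tokens."""
    state["tokens"] = state["question"].rstrip("?").lower().split()
    return state


def disambiguate(state):
    """Disambiguation/mapping: map tokens to a resource and a property."""
    state["subject"] = next(RESOURCES[t] for t in state["tokens"] if t in RESOURCES)
    state["property"] = next(PROPERTIES[t] for t in state["tokens"] if t in PROPERTIES)
    return state


def construct(state):
    """Query construction: build a SPARQL string from the mapped terms."""
    state["sparql"] = f"SELECT ?o WHERE {{ {state['subject']} {state['property']} ?o }}"
    return state


def execute(state):
    """Query execution: a dictionary lookup stands in for a SPARQL endpoint."""
    state["results"] = TOY_KB.get((state["subject"], state["property"]), [])
    return state


def present(state):
    """Result presentation: render the results as one line of text."""
    state["answer"] = ", ".join(state["results"]) or "no answer found"
    return state


class QAPipeline:
    """Runs named stages in order; any stage can be swapped for another module."""

    def __init__(self, **stages):
        self.stages = dict(stages)

    def plug(self, name, stage):
        # Replacing an existing key keeps its position in the pipeline.
        self.stages[name] = stage

    def answer(self, question):
        state = {"question": question}
        for stage in self.stages.values():
            state = stage(state)
        return state["answer"]


pipeline = QAPipeline(
    split=split, disambiguate=disambiguate, construct=construct,
    execute=execute, present=present,
)
print(pipeline.answer("What are the main sights in Luxembourg?"))
```

Each stage only reads and writes a shared state dictionary, so a competing module for, say, disambiguation can be plugged in with `pipeline.plug("disambiguate", other_module)` without touching the rest.
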
