Enterprise knowledge graphs

  1. 1. Enterprise Knowledge Graphs Sören Auer https://www.eccenca.com
  2. 2. The three Big Data "V"s: Variety is often neglected. Source: Gesellschaft für Informatik
  3. 3. Linked Data Principles: addressing the neglected third V (Variety). 1. Use URIs to identify the "things" in your data. 2. Use http:// URIs so people (and machines) can look them up on the web. 3. When a URI is looked up, return a description of the thing (in RDF format). 4. Include links to related things. http://www.w3.org/DesignIssues/LinkedData.html [1] Auer, Lehmann, Ngomo, Zaveri: Introduction to Linked Data and Its Lifecycle on the Web. Reasoning Web 2013
  4. 4. Linked (Open) Data: The RDF Data Model. RDF = Resource Description Framework. [Figure: example graph. Nodes: DHL, Post Tower, Bonn, Logistics. Edge labels: headquarters, located in, industry, full name, height, label. Literal values: "DHL International GmbH", "162.5 m", "Logistik", "物流".]
  5. 5. RDF Data Model (a bit more technical). A graph consists of: • Resources (identified via URIs) • Literals: data values with a data type (URI) or a language tag (multilinguality is built in) • Attributes of resources are also URI-identified (from vocabularies). Various data sources and vocabularies can be arbitrarily mixed and meshed. URIs can be shortened with namespace prefixes, e.g. dbp: → http://dbpedia.org/resource/ [Figure: the example graph with prefixed names. Resources: dbp:DHL_International_GmbH, dbp:Post_Tower, dbp:Bonn, dbp:Logistics, unit:Meter. Properties: gn:locatedIn, rdfs:label, dbo:industry, ex:headquarters, foaf:name, ex:height, rdf:value, ex:unit. Literals: "162.5"^^xsd:decimal, "Logistik"@de, "DHL International GmbH"^^xsd:string, "物流"@zh.]
  6. 6. RDF mediates between different data models and bridges between conceptual and operational layers. Tabular/relational data: a table with columns (Id, Title, Screen) and rows (5624, SmartTV, 104cm), (5627, Tablet, 21cm) becomes Prod:5624 rdf:type Electronics; Prod:5624 rdfs:label "SmartTV"; Prod:5624 hasScreenSize "104"^^unit:cm; ... Taxonomic/tree data: a tree with Vehicle above Car, Bus, Truck becomes Vehicle rdf:type owl:Thing; Car rdfs:subClassOf Vehicle; Bus rdfs:subClassOf Vehicle; ... Logical axioms / schema: Male rdfs:subClassOf Human; Female rdfs:subClassOf Human; Male owl:disjointWith Female; ...
  7. 7. © Fraunhofer. Vocabulary Example. Vocabulary (schema): class Company with properties (expected type) inIndustry (Industry), fullName (String), headquarter (Building); class Building with properties locatedIn (Industry), height (unit:meter). RDF representation of the schema: Company rdf:type rdfs:Class; Building rdf:type rdfs:Class; inIndustry rdf:type rdf:Property; inIndustry rdfs:domain Company; inIndustry rdfs:range Industry; headquarter rdf:type rdf:Property; headquarter rdfs:domain Company; headquarter rdfs:range Building. Instantiation: DHL rdf:type Company; DHL fullName "DHL Int. GmbH"; DHL inIndustry Logistics; DHL headquarter PostTower; PostTower rdf:type Building; PostTower locatedIn dbpedia:Bonn; PostTower height "162.5"^^meter. Visual representation: the DHL / Post Tower example graph from slide 4.
  8. 8. Semantic Web Layer Cake 2001. http://www.w3.org/2001/10/03-sww-1/slide7-0.html • Monolithic, based on XML • Focus on heavyweight semantics (ontologies, logic, reasoning)
  9. 9. The Semantic Web Layer Cake 2015: Bridging between Big & Smart Data. [Figure: layer cake. Base: Unicode, URIs. Source formats and their RDF interfaces: XML (RDF/XML), JSON (JSON-LD), CSV (CSV2RDF), RDB (R2RML), HTML (RDFa). Core: RDF. Above: RDF Data Shapes, RDF Schema, vocabularies, ontologies, SKOS thesauri, logic, SWRL rules, SPARQL. Cross-cutting: access control, signature, encryption (HTTPS/CERT/DANE).] • Lingua franca of data integration with many technology interfaces (XML, HTML, JSON, CSV, RDB, …) • Focus on lightweight vocabularies, rules, thesauri etc. • Less "invasive"
  10. 10. RDF: the Lingua Franca of Data Integration • RDF is simple • We can easily encode and combine all kinds of data models (relational, taxonomic, graphs, object-oriented, …) • RDF supports distributed data and schema • We can seamlessly evolve simple semantic representations (vocabularies) into more complex ones (e.g. ontologies) • Small representational units (URIs/IRIs, triples) facilitate mixing and mashing • RDF can be viewed from many perspectives: facts, graphs, ER, logical axioms, objects • RDF integrates well with other formalisms: HTML (RDFa), XML (RDF/XML), JSON (JSON-LD), CSV, … • Linking and referencing between different knowledge bases, systems and platforms facilitates the creation of sustainable data ecosystems
  11. 11. Successful application domains of Linked Data & Semantic Integration. Search Engine Optimization & Web Commerce: • Schema.org used by >20% of Web sites • Major search engines exploit semantic descriptions. Pharma, Life Sciences: • Mature, comprehensive vocabularies and ontologies • Billions of disease, drug, clinical trial descriptions. Digital Libraries: • Many established vocabularies (DublinCore, FRBR, EDM) • Millions of items aggregated from thousands of memory institutions in Europeana and the German Digital Library
  12. 12. © Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS. The Web evolves into a Web of Data. [Figure labels: Linked Open Data, Facebook Open Graph]
  13. 13. Knowledge Graphs: A definition • Fabric of concept, class, property, relationship and entity descriptions • Uses a knowledge representation formalism (typically RDF, RDF-Schema, OWL) • Holistic knowledge (multi-domain, multi-source, multi-granularity): • instance data (ground truth), • open (e.g. DBpedia, WikiData), private (e.g. supply chain data), closed data (product models), • derived, aggregated data, • schema data (vocabularies, ontologies) • meta-data (e.g. provenance, versioning, documentation, licensing) • comprehensive taxonomies to categorize entities • links between internal and external data • mappings to data stored in other systems and databases
  14. 14. Knowledge Graph Challenges & Opportunities. Knowledge graphs typically cover • Multiple domains • Various levels of granularity • Data from multiple sources • Various degrees of structure. Challenges • Quality • Coherence • Co-evolution • Update propagation • Curation & interaction. Opportunities • Background knowledge for various applications (e.g. question answering, data integration, machine learning) • Facilitate intra-organizational data sharing and exchange (data value chains)
  15. 15. Comparison of various enterprise data integration paradigms
      Paradigm | Data model | Integration strategy | Conceptual/operational | Heterogeneous data | Internal/external data | No. of sources | Type of integration | Domain coverage | Semantic representation
      XML Schema | DOM trees | LaV | operational | yes | yes | medium | both | medium | high
      Data Warehouse | relational | GaV | operational | - | partially | medium | physical | small | medium
      Data Lake | various | LaV | operational | yes | yes | large | physical | high | medium
      MDM | UML | GaV | conceptual | - | - | small | physical | small | medium
      PIM / PCS | trees | GaV | operational | partially | partially | - | physical | medium | medium
      Enterprise search | document | - | operational | yes | partially | large | virtual | high | low
      EKG | RDF | LaV | both | yes | yes | medium | both | high | very high
      [1] Michael Galkin, Sören Auer, Simon Scerri: Enterprise Knowledge Graphs: A Survey. Submitted to 37th International Conference on Information Systems. 2016.
  16. 16. Knowledge Graph Technology
  17. 17. Adding a Semantic Layer to Data Lakes. Semantic Data Lake: • central place for model, schema and data historization • combination of scale-out (cost reduction) and semantics (increased control & flexibility) • grows incrementally (pay-as-you-go). [Figure labels: Management, Accounting, Marketing, Sales, Support, R&D; Inbound Data Sources; Inbound Raw Data Store; Data Lake (order of magnitude cheaper, scalable data store); Knowledge Graph for Relationship Definition and Meta Data; Frontend to Access Relationship and KPI Definition / Documentation; Frontend to Access (ad hoc) Reports; Outbound Data Delivery to Target Systems; Outbound and Consumption; JSON-LD, CSVW, R2RML, XML2RDF.] © eccenca.com. See also https://www.eccenca.com/en/products-corporate-memory.html
  18. 18. W3C R2RML: Relational to RDF Mapping. R2RML: RDB to RDF Mapping Language, W3C Recommendation 27 September 2012. Editors: Souripriya Das, Seema Sundara, Richard Cyganiak. http://www.w3.org/TR/r2rml/
  19. 19. Example R2RML Mapping
  20. 20. SPARQLMap: Mapping RDB 2 RDF. Either (1) the resulting RDF knowledge base is materialized in a triple store and (2) subsequently queried using SPARQL, or (3) the materialization step is avoided by dynamically mapping an input SPARQL query into a corresponding SQL query, which returns exactly the same results as the SPARQL query executed against the materialized RDF dump.
  21. 21. Example: Sparqlify • Rationale: exploit existing formalisms (SQL, SPARQL Construct) as much as possible • Flexible & versatile mapping language • Translates one SPARQL query into exactly one efficiently executable SQL query • Solid theoretical formalization based on SPARQL-relational algebra transformations • Extremely scalable through an elaborate view candidate selection mechanism • Used to publish 20B triples for LinkedGeoData. [Figure labels: SPARQL Construct, Bridge, SQL View.] [1] Stadler, Unbehauen, Auer, Lehmann: Sparqlify – Very Large Scale Linked Data Publication from Relational Databases. [2] Unbehauen, Stadler, Auer: Optimizing SPARQL-to-SQL Rewriting. iiWAS 2013 [3] Auer, et al.: Triplify: light-weight linked data publication from relational databases. WWW 2009
  22. 22. Semantified Big Data Architecture Blueprint. [Figure labels: Data sources, Ingestion, Storage, Semantic Lifting with Mappings, Queries.] Semantic and semantified data are stored in Apache Parquet files on HDFS. [1] Mami, Scerri, Auer, Vidal: Towards the Semantification of Big Data Technology. DEXA 2016
  23. 23. SEBIDA Implementation Architecture
  24. 24. SEBIDA Evaluation Results • Loads data faster • Has quite different query performance characteristics: faster in 5 out of 12 queries, similar performance in 2, slower in 5
  25. 25. VoCol: Collaborative Vocabulary Curation Environment. Comprehensive Support for Evolving Vocabularies
  26. 26. Industry 4.0: Semantic Models as Bridge between Shop & Office Floor
  27. 27. Semantic Administrative Shell & Reference Architecture for Industry 4.0 (RAMI4.0). The Administrative Shell (Verwaltungsschale) provides a digital identity for arbitrary Industry 4.0 components (e.g. sensors, actuators/robots), exposing data covering the whole life-cycle. The Reference Architecture for Industry 4.0 (RAMI4.0) provides a conceptual framework for implementing comprehensive Industry 4.0 scenarios. We have implemented both concepts, along with a number of IEC and ISO standards, in a comprehensive information model ready to be deployed in production environments.
  28. 28. VoCol: Collaborative Development Environment for Vocabularies. Components: versioning (Git/Bitbucket), issue tracking (GitLab/GitHub), syntax validation, documentation generation, authoring (Turtle), visualization (VOWL), publishing (LOD/SPARQL). VoCol integrates a number of tools and services for different aspects of vocabulary development. It is centered around Git version control (or Bitbucket), thus supporting the branching and merging of vocabularies. It supports the round trip between • schema/vocabulary development • competency questions (expressed in SPARQL) • example data, and so bridges between conceptual models and executable code. http://eis.iai.uni-bonn.de/Projects/VoCol.html
  29. 29. Development based on Git Version Control. Git has become the most widely used version control system. It is a distributed revision control system with an emphasis on speed, data integrity, and support for distributed, non-linear workflows. Git was initially designed and developed in 2005 by Linux kernel developers for Linux kernel development. Git is the basis for a variety of open-source and commercial services and products, such as: GitHub/Bitbucket, Web-based Git repository hosting services with millions of users; GitLab/Gitolite, open-source Git repository management platforms. Since Team Foundation Server 2013, Microsoft has provided native support for Git. Git is easily extensible and can be integrated into arbitrary workflows via Git hooks.
  30. 30. VoCol Collaborative Vocabulary Development Environment Entry Page
  31. 31. VoCol: Dynamic Documentation
  32. 32. VoCol Environment: Dynamic Documentation
  33. 33. VoCol Environment: Dynamic Visualization
  34. 34. VoCol Environment: Analytics
  35. 35. VoCol Environment: Version Control with Git/GitHub/GitLab/Bitbucket
  36. 36. VoCol Environment: Integrated SPARQL Querying, e.g. for checking competency questions
  37. 37. VoCol Map Visualization
  38. 38. VoCol Environment: Direct Turtle Editing
  39. 39. VoCol Environment: Vocabulary Evolution Report
  40. 40. Industrial Data Space
  41. 41. Vocabulary-based Integration facilitates Data-driven Businesses. [Figure label: Vocabulary]
  42. 42. The work on the Industrial Data Space is complementary to and closely interlinked with the Plattform Industrie 4.0. [Figure labels: Retail 4.0, Banking 4.0, Insurance 4.0, …, Industrie 4.0 (focus on the manufacturing industry); Smart Services; Data; Transmission, Networks; Real-time Systems; Industrial Data Space (focus on data).]
  43. 43. The Industrial Data Space Initiative. Community of >30 large German and European companies. Pre-competitive, publicly funded innovation project involving 11 Fraunhofer institutes for developing the IDS reference architecture. Current members of the Industrial Data Space Association
  44. 44. Semantic Data Linking for Enterprise Data Value Chains
      Aspect | Data Lake | Industrial Data Space | Pure Internet
      Character | centralized, monopolistic | federated, secure, "trusted", standard-based | completely decentralized, open, insecure
      Data management | Central repository | Decentralized | Decentralized
      Data ownership | Central | Decentralized | Decentralized
      Data linking | Single provider | Federated, on demand | Missing
      Data security | Bilateral | Certified system | Bilateral
      Market structure | Central provider | Role system | Unstructured
      Transport infrastructure | Internet | Internet | Internet
      Images: © Fotolia Francesco De Paoli, Nmedia, hakandogu
  45. 45. Basic principles of the Industrial Data Space: Linked Light Semantics; Security with Industrial Data Container; Certified Roles; On Demand Interlinking. Images: © Fotolia 77260795 ∙ 73040142 58947296 ∙ 68898041
  46. 46. Industrial Data Space: On Demand Interlinking. [Figure labels: Services A to G; Enterprises 1 to 6.] All data stays with its owners and is controlled and secured. Data is shared only on request for a service. No central platform. Image sources: iStockphoto
  47. 47. IDS Architecture Overview. [Figure labels: Industrial Data Space Broker; Index; Clearing; Registry; Industrial Data Space App Store; Apps; Vocabulary; Company A and Company B, each with an Internal IDS Connector and an External IDS Connector; Third Party Cloud Provider; Internet; Upload / Download / Search.] © Fraunhofer
  48. 48. Big Data is not just Volume and Velocity: Variety (& Veracity) are key challenges, and Linked Data helps in dealing with both. Linked Data links not only data but also various disciplines, applications and use cases. • The Linked Data life-cycle requires integrating and adapting results from a number of disciplines: NLP, machine learning, knowledge representation, data management, user interaction, … • Applications in a number of domains: cultural heritage, life sciences, industry 4.0 / cyber-physical systems, smart cities, mobility, …
  49. 49. Creating Knowledge out of Interlinked Data Thanks for your attention! Sören Auer http://www.iai.uni-bonn.de/~auer | http://eis.iai.uni-bonn.de auer@cs.uni-bonn.de https://www.eccenca.com
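
The RDF data model from slides 4 to 7 can be illustrated with a minimal sketch in plain Python, not a real triple store. It keeps a few statements from the DHL / Post Tower example as (subject, predicate, object) tuples, expands prefixed names into full URIs, and answers simple triple patterns. Only the dbp: prefix is spelled out on the slides; the ex: namespace URI, the choice to put all properties under ex:, and the helper names `expand` and `match` are assumptions made for illustration.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Prefix table. dbp: is given on slide 5, rdfs: is the standard RDF Schema
# namespace, ex: is a placeholder namespace assumed for this sketch.
PREFIXES = {
    "dbp": "http://dbpedia.org/resource/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/",
}

# Statements from the DHL / Post Tower example as (subject, predicate, object).
# Literals are plain strings here; datatypes and language tags are omitted.
TRIPLES: List[Triple] = [
    ("dbp:DHL_International_GmbH", "ex:headquarters", "dbp:Post_Tower"),
    ("dbp:DHL_International_GmbH", "ex:industry", "dbp:Logistics"),
    ("dbp:Post_Tower", "ex:locatedIn", "dbp:Bonn"),
    ("dbp:Post_Tower", "ex:height", "162.5"),
    ("dbp:Logistics", "rdfs:label", "Logistik"),
]


def expand(name: str) -> str:
    """Expand a prefixed name such as 'dbp:Bonn' into a full URI."""
    prefix, _, local = name.partition(":")
    return PREFIXES[prefix] + local


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that agrees with the pattern; None is a wildcard."""
    for triple in TRIPLES:
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


if __name__ == "__main__":
    print(expand("dbp:Bonn"))
    # Where is the DHL headquarters located? Follow two edges in the graph.
    for _, _, building in match(s="dbp:DHL_International_GmbH", p="ex:headquarters"):
        for _, _, place in match(s=building, p="ex:locatedIn"):
            print(place)
```

Because every statement has the same three-part shape, data from further sources can be added by appending tuples, which is the "mixing and mashing" property the slides emphasize.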

Editor's Notes

  • http://www.gi.de/nc/service/informatiklexikon/detailansicht/article/big-data.html
  • A Data Lake is a storage repository for big-data-scale raw data in its original formats.
    It takes a late-binding approach to schema: "Let us decide when we need it."
    It uses a scale-out architecture on commodity infrastructure, mostly with HDFS/Hadoop/Spark, which gives a large cost advantage: roughly a factor of 10 compared to data warehouses.
    Semantic Data Lake = Data Lake + Knowledge Graph
    The management of structure (vocabularies/schemas, KPI trees, metadata, …) on top of the Data Lake is performed in a knowledge graph: a complex data fabric representing all kinds of things and how they relate to each other.
    A knowledge graph is unique regarding flexibility, multiple views and metadata capabilities.
    It is based on the Resource Description Framework (RDF) standard and the Linked Data principles.
  • The platform offers a secure space for interlinking.
    Data stays with the enterprises and is interlinked only on demand.
    Market-oriented model without dependencies on individual providers.
    Value creation and services stay with the enterprise.
    Financing through services, not through advertising or data sales.

    No central data octopus like Google; control over the data stays with the data owners.
    The customer (end user) is not the product but the sovereign of their own data.
    The whole is more than the sum of its parts (end-to-end services based on data from several parties offer disproportionately higher added value).
    Not a central data pot, but a network of healthy, secure data.
    Governance is federated, not monopolistic.
  • The Linked Data approach can help to establish data value chains.
    The Linked Data life-cycle requires integrating and adapting results from a number of disciplines (NLP, Machine Learning, Knowledge Representation, Data Management).
    Applications in a number of domains (cultural heritage, life sciences, industry 4.0 / cyber-physical systems, smart cities, mobility, …)
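
The late-binding idea behind the Semantic Data Lake note can be sketched as follows: raw data stays untouched in the lake, and a mapping kept as metadata lifts it to triples only when needed. The sample rows come from the product table on slide 6. The mapping layout, the prod: and ex: prefixes, and the `lift` function are hypothetical and serve only as illustration; this is not R2RML syntax and not the eccenca or SEBIDA implementation.

```python
import csv
import io
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Raw data as it might sit in the lake, in its original format
# (values taken from the product table on slide 6).
RAW_CSV = "Id,Title,Screen\n5624,SmartTV,104cm\n5627,Tablet,21cm\n"

# Mapping stored next to the data as metadata (hypothetical layout):
# how to build the subject, which class to assign, and which
# property each column maps to.
MAPPING: Dict[str, object] = {
    "subject_template": "prod:{Id}",
    "class": "ex:Electronics",
    "columns": {"Title": "rdfs:label", "Screen": "ex:hasScreenSize"},
}


def lift(raw_csv: str, mapping: Dict[str, object]) -> List[Triple]:
    """Turn each CSV row into RDF-style triples according to the mapping."""
    triples: List[Triple] = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        subject = str(mapping["subject_template"]).format(**row)
        triples.append((subject, "rdf:type", str(mapping["class"])))
        for column, prop in dict(mapping["columns"]).items():
            triples.append((subject, prop, row[column]))
    return triples


if __name__ == "__main__":
    for triple in lift(RAW_CSV, MAPPING):
        print(triple)
```

Changing the mapping changes the semantic view without rewriting the stored data, which is one way to read "Let us decide when we need it."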