The web of interlinked data and knowledge stripped

Published in: Technology

  1. 1. Linked Data for Enterprise Information Integration Dr. Sören Auer
  2. 2. Creating Knowledge out of Interlinked Data. Why do we need the Data Web? Problem: try to search for these things on the current Web: • Apartments near German-English bilingual childcare in Passau • ERP service providers with offices in Vienna and London • Researchers working on multimedia topics in Eastern Europe. The information is available on the Web, but opaque to current search. Example: passau.de has everything about childcare in Passau; Immobilienscout.de knows all about real estate offers in Germany. Solution: complement text on Web pages with structured linked open data and intelligently combine/integrate/join such structured information from different sources. [Diagram: two databases (DB), each behind a Web server, publish HTML and RDF to a search engine.]
  3. 3. Linked Data in a Nutshell. 1. Uses the RDF data model [graph: IFMO organizes KESW2012; KESW2012 starts 1.10.2012; KESW2012 takesPlaceIn St. Petersburg]. 2. Is serialised in triples (Subject Predicate Object): IFMO organizes KESW2012 . KESW2012 starts "2012-10-01"^^xsd:date . KESW2012 takesPlaceAt St._Petersburg . 3. Uses content negotiation.
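The three triples on this slide can be written out mechanically. The following is a minimal sketch, assuming a hypothetical namespace `http://example.org/` for the slide's abbreviated names and the ISO form `2012-10-01` that `xsd:date` requires; it prints the statements in an N-Triples-like form.

```python
from typing import NamedTuple

EX = "http://example.org/"  # hypothetical namespace, not from the slides
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # already formatted as an IRI or literal term


def iri(local_name: str) -> str:
    """Wrap a local name in the example namespace as an IRI term."""
    return f"<{EX}{local_name}>"


def typed_literal(value: str, datatype: str) -> str:
    """Format a typed literal such as "2012-10-01"^^<...#date>."""
    return f'"{value}"^^<{datatype}>'


triples = [
    Triple(iri("IFMO"), iri("organizes"), iri("KESW2012")),
    Triple(iri("KESW2012"), iri("starts"), typed_literal("2012-10-01", XSD_DATE)),
    Triple(iri("KESW2012"), iri("takesPlaceAt"), iri("St._Petersburg")),
]


def to_ntriples(items) -> str:
    """Serialise one triple per line, each terminated by ' .'."""
    return "\n".join(f"{t.subject} {t.predicate} {t.obj} ." for t in items)


if __name__ == "__main__":
    print(to_ntriples(triples))
```

Every statement has the same subject, predicate, object shape, which is what makes data from different sources easy to merge.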
  4. 4. The emerging Web of Data. [Figure: snapshots of the Linking Open Data cloud diagram from 2007 to 2010, by Richard Cyganiak and Anja Jentzsch.]
  5. 5. Can Linked Data help to solve the EII problem in a Fortune 500 company? The situation at a world-leading car manufacturer (€97.76 billion revenue, 250,000 employees): • 3,000 heterogeneous IT systems • Different units (car, bus, truck etc.) with very different views • No common language • Inability to identify crucial entities (parts, locations etc.) enterprise-wide. There is no (and cannot be a) single Enterprise Information Model. A distributed, iterative, bottom-up integration approach such as Linked Data might be able to help (pay-as-you-go).
  6. 6. Distributed Social Semantic Networking
  7. 7. From Intranet to Enterprise Data Web around a knowledge hub
  8. 8. Linked Data Lifecycle: • Extraction • Storage/Querying • Manual revision/authoring • Interlinking/Fusing • Classification/Enrichment • Quality Analysis • Evolution/Repair • Search/Browsing/Exploration
  9. 9. Extraction [section divider with Linked Data lifecycle diagram]
  10. 10. Extraction. From unstructured sources: • NLP, text mining, annotation. From semi-structured sources: • DBpedia, LinkedGeoData, DataCube. From structured sources: • RDB2RDF
  11. 11. Transforming Wikipedia into a Knowledge Base. DBpedia extracts structured information from Wikipedia and makes this information available on the Web as LOD, which makes it possible to: • ask sophisticated queries against Wikipedia (e.g. universities in Brandenburg, mayors of elevated towns, soccer players) • link other data sets on the Web to Wikipedia data. • Represents a community consensus. The recently launched DBpedia Live transforms Wikipedia into a structured knowledge base. References: S. Auer et al.: DBpedia - A Crystallization Point for the Web of Data. Journal of Web Semantics, Elsevier 2009 (Most Cited Article 2006-10 Award). S. Auer et al.: DBpedia: A Nucleus for a Web of Open Data. 6th International Semantic Web Conference ISWC07. S. Auer et al.: What have Innsbruck and Leipzig in common? Extracting Semantics from Wiki Content. 4th European Semantic Web Conf. ESWC07.
  12. 12. Structure in Wikipedia • Title • Abstract • Infoboxes • Geo-coordinates • Categories • Images • Links – other language versions – other Wikipedia pages – To the Web – Redirects – Disambiguations
  13. 13. Infobox templates
      Wikitext syntax:
        {{Infobox Korean settlement
        | title = Busan Metropolitan City
        | img = Busan.jpg
        | imgcaption = A view of the [[Geumjeong]] district in Busan
        | hangul = 부산 광역시
        ...
        | area_km2 = 763.46
        | pop = 3635389
        | popyear = 2006
        | mayor = Hur Nam-sik
        | divs = 15 wards (Gu), 1 county (Gun)
        | region = [[Yeongnam]]
        | dialect = [[Gyeongsang]]
        }}
      RDF representation (http://dbpedia.org/resource/Busan):
        dbp:Busan dbpp:title "Busan Metropolitan City"
        dbp:Busan dbpp:hangul "부산 광역시"@Hang
        dbp:Busan dbpp:area_km2 "763.46"^^xsd:float
        dbp:Busan dbpp:pop "3635389"^^xsd:int
        dbp:Busan dbpp:region dbp:Yeongnam
        dbp:Busan dbpp:dialect dbp:Gyeongsang
        ...
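The mapping from infobox wikitext to triples can be illustrated with a few lines of Python. This is a simplified sketch, not the DBpedia extraction framework: it assumes one `| key = value` pair per line, no nested templates, and a hypothetical typing rule (integers, decimals, wiki links, plain strings). The `dbp:` and `dbpp:` prefixes are taken from the slide.

```python
import re

WIKITEXT = """{{Infobox Korean settlement
| title = Busan Metropolitan City
| area_km2 = 763.46
| pop = 3635389
| region = [[Yeongnam]]
| dialect = [[Gyeongsang]]
}}"""

# A value that consists of exactly one wiki link, e.g. [[Yeongnam]].
WIKI_LINK = re.compile(r"^\[\[([^\]|]+)\]\]$")


def parse_infobox(wikitext: str) -> dict:
    """Collect the '| key = value' lines of a single, un-nested template."""
    pairs = {}
    for line in wikitext.splitlines():
        line = line.strip()
        if line.startswith("|") and "=" in line:
            key, value = line[1:].split("=", 1)
            pairs[key.strip()] = value.strip()
    return pairs


def to_object(value: str) -> str:
    """Turn a raw infobox value into an RDF object term (toy typing rule)."""
    link = WIKI_LINK.match(value)
    if link:
        return "dbp:" + link.group(1).replace(" ", "_")
    if re.fullmatch(r"-?\d+", value):
        return f'"{value}"^^xsd:int'
    if re.fullmatch(r"-?\d+\.\d+", value):
        return f'"{value}"^^xsd:float'
    return f'"{value}"'


def infobox_to_triples(subject: str, wikitext: str) -> list:
    """Emit one triple per infobox attribute."""
    return [
        f"{subject} dbpp:{key} {to_object(value)} ."
        for key, value in parse_infobox(wikitext).items()
    ]


if __name__ == "__main__":
    for triple in infobox_to_triples("dbp:Busan", WIKITEXT):
        print(triple)
```

Real infoboxes are far messier (units, multiple values, nested templates), which is why the slides later mention the community-maintained Mappings Wiki.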
  14. 14. A vast multi-lingual, multi-domain knowledge base. DBpedia extraction results in: • descriptions of ca. 3.4 million things (1.5 million classified in a consistent ontology, including 312,000 persons, 413,000 places, 94,000 music albums, 49,000 films, 15,000 video games, 140,000 organizations, 146,000 species, 4,600 diseases) • labels and abstracts for these 3.2 million things in up to 92 different languages; 1,460,000 links to images and 5,543,000 links to external web pages; 4,887,000 external links into other RDF datasets, 565,000 Wikipedia categories, and 75,000 YAGO categories • altogether over 1 billion pieces of information (i.e. RDF triples): 257M from the English edition, 766M from other language editions • DBpedia Live (http://live.dbpedia.org/sparql/) and the Mappings Wiki (http://mappings.dbpedia.org) integrate the community into a refinement cycle • Upcoming: DBpedia inline
  15. 15. DBpedia SPARQL Endpoint
        SELECT ?name ?birth ?description ?person WHERE {
          ?person dbp:birthPlace dbp:Berlin .
          ?person skos:subject dbp:Cat:German_musicians .
          ?person dbp:birth ?birth .
          ?person foaf:name ?name .
          ?person rdfs:comment ?description .
          FILTER (LANG(?description) = 'en') .
        } ORDER BY ?name
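A query like this is sent to an endpoint over HTTP, with the query text in a `query` parameter as defined by the SPARQL Protocol. The sketch below only builds the request and does not send it. The endpoint URL `http://dbpedia.org/sparql` is an assumption (the slides name only the DBpedia Live endpoint), and the query reuses the slide's prefixes without declaring them, so a real endpoint would need them predefined or added as `PREFIX` lines.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"  # assumed endpoint URL

QUERY = """SELECT ?name ?birth ?description ?person WHERE {
  ?person dbp:birthPlace dbp:Berlin .
  ?person skos:subject dbp:Cat:German_musicians .
  ?person dbp:birth ?birth .
  ?person foaf:name ?name .
  ?person rdfs:comment ?description .
  FILTER (LANG(?description) = 'en') .
} ORDER BY ?name"""


def build_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)

if __name__ == "__main__":
    # urllib.request.urlopen(request) would execute the query; it is left
    # out here so that the sketch runs without network access.
    print(request.full_url)
```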
  16. 16. DBpedia Applications: RelFinder (2011/05/12, CONSEGI, Sören Auer: DBpedia)
  17. 17. DBpedia Applications (3rd party): • Muddy Boots (BBC): annotates actors in BBC News with DBpedia identifiers • Open Calais (Reuters): named entities connected via owl:sameAs to DBpedia • Faviki (social bookmarking): uses DBpedia to group tags and for multi-language support • Topbraid Composer (ontology editor): links entities to DBpedia
  18. 18. Extraction: Relational Data. Many different approaches, e.g. D2R, Virtuoso RDF Views, Triplify. • No agreement on a formal semantics of RDB2RDF mapping • LOD readiness, SPARQL-SQL translation • W3C RDB2RDF WG
      Tool               | Triplify                          | Sparqlify                    | D2RQ           | Virtuoso RDF Views
      Technology         | Scripting languages (PHP)         | Java                         | Java           | Whole middleware solution
      SPARQL endpoint    | -                                 | X                            | X              | X
      Mapping language   | SQL                               | SPARQL CONSTRUCT Views + SQL | RDF based      | RDF based
      Mapping generation | Manual                            | Semi-automatic               | Semi-automatic | Manual
      Scalability        | Medium-high (but no SPARQL)       | Very high                    | Medium         | High
      Malhotra, Auer, Erling, Hausenblas: W3C RDB2RDF Incubator Group Report. W3C RDB2RDF Incubator Group, 2009.
  19. 19. Triplify: light-weight approach for Linked Data publishing from relational databases. Auer, Tramp, Aumüller, Lehmann, Hellmann: Triplify - Light-weight Linked Data Publication from Relational Databases. In 18th International World Wide Web Conference (WWW 2009).
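The general idea of publishing relational rows as triples can be sketched with an in-memory SQLite database. This is only loosely modelled on the SQL-based mapping style listed for Triplify and is not its implementation: the convention that the first column identifies the subject and the remaining column names become predicates, the `http://example.org/` base URI, and the `car` table are all assumptions made for illustration.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI


def rows_to_triples(conn, class_name: str, sql: str) -> list:
    """Map each result row to triples: first column -> subject IRI,
    every other column -> one predicate with a plain literal object."""
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        subject = f"<{BASE}{class_name}/{row[0]}>"
        for column, value in zip(columns[1:], row[1:]):
            if value is None:
                continue  # NULL cells produce no triple
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} <{BASE}vocab/{column}> "{escaped}" .')
    return triples


# Toy data standing in for an existing application database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (id INTEGER PRIMARY KEY, model TEXT, seats INTEGER)")
conn.executemany("INSERT INTO car VALUES (?, ?, ?)", [(1, "T-Model", 5), (2, "Van", 7)])

triples = rows_to_triples(conn, "car", "SELECT id, model, seats FROM car ORDER BY id")

if __name__ == "__main__":
    print("\n".join(triples))
```

Keeping the mapping in plain SQL means the database does the joins and filtering, and the publishing layer stays small.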
  20. 20. Sparqlify • Rationale: exploit existing formalisms (SQL, SPARQL Construct) as much as possible • Flexible and versatile mapping language • Translates one SPARQL query into exactly one efficiently executable SQL query • Solid theoretical formalization based on SPARQL-relational algebra transformations • Extremely scalable through an elaborate view candidate selection mechanism • Used to publish 20B triples for LinkedGeoData. [Diagram labels: SPARQL Construct, SQL View, Bridge.] Stadler, Unbehauen, Auer, Lehmann: Sparqlify – Very Large Scale Linked Data Publication from Relational Databases. Submitted to VLDB-Journal.
  21. 21. Storage and Querying [section divider with Linked Data lifecycle diagram]
  22. 22. RDF Data Management. Querying is still slower than relational data management by a factor of 3-20 (BSBM, DBpedia Benchmark), but offers more flexibility. Performance increases steadily. Comprehensive, well-supported open-source and commercial implementations are available: • OpenLink's Virtuoso (open source + commercial) • Big OWLIM (commercial), Swift OWLIM (open source) • 4store (open source) • Dydra (hosted) • Bigdata (distributed) • Allegrograph (commercial) • Mulgara (open source)
  23. 23. DBpedia Benchmark • Uses DBpedia as data and a selection of 25 frequently executed queries • Can generate fractions and multiples of DBpedia's size • Does not resemble relational data. Performance differences observed with other benchmarks are amplified. [Chart: Geometric Mean.] Morsey, Lehmann, Auer, Ngonga: DBpedia SPARQL Benchmark – Performance Assessment with Real Queries on Real Data. Int. Semantic Web Conf. (ISWC2011). Best-paper award.
  24. 24. Two Kinds of Semantic Wikis. 1. Semantic (Text) Wikis • Authoring of semantically annotated texts. 2. Semantic Data Wikis • Direct authoring of structured information (i.e. RDF, RDF-Schema, OWL)
  25. 25. OntoWiki, a semantic data wiki • Versatile domain-independent tool • Serves as Linked Data / SPARQL endpoint on the Data Web • Open-source project hosted at Google Code • Not just a Wiki UI, but a whole framework for the development of Semantic Web applications • Developed in PHP based on the Zend framework • Very active developer and user community • More than 500 downloads monthly • Large number of use cases, including industry. [1] Auer, Dietzold, Riechert: OntoWiki - A Tool for Social, Semantic Collaboration. 5th International Semantic Web Conference, ISWC 2006. [2] Riechert, Morgenstern, Auer, Tramp, Martin: Knowledge Engineering for Historians on the Example of the Catalogus Professorum Lipsiensis. 9th Int. Semantic Web Conference ISWC2010. Best paper award.
  26. 26. (Repeat of slide 5) Can Linked Data help to solve the EII problem in a Fortune 500 company?
  27. 27. OntoWiki with a car model database loaded
  28. 28. Creating Knowledge out of Interlinked Data
  29. 29. Creating Knowledge out of Interlinked Data
  30. 30. Management of Enterprise Taxonomies with OntoWiki • Based on the W3C SKOS standard • Corporate Language Management: 500k concepts in 20 languages
  31. 31. Search for "combi" also finds T-model
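One plausible way a SKOS-based taxonomy supports such a search is by matching alternative labels as well as the preferred label. The sketch below assumes that "combi" is stored as an alternative label of the T-model concept; the concept URIs and all labels are hypothetical and not taken from the manufacturer's taxonomy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    uri: str
    pref_label: str            # plays the role of skos:prefLabel
    alt_labels: tuple = ()     # plays the role of skos:altLabel


# Hypothetical taxonomy fragment.
CONCEPTS = [
    Concept("http://example.org/concept/t-model", "T-Model",
            ("Kombi", "combi", "estate car")),
    Concept("http://example.org/concept/sedan", "Sedan", ("saloon",)),
]


def search(concepts, term: str) -> list:
    """Return concepts whose preferred or alternative labels contain the term."""
    needle = term.casefold()
    return [
        concept
        for concept in concepts
        if any(needle in label.casefold()
               for label in (concept.pref_label, *concept.alt_labels))
    ]


if __name__ == "__main__":
    for hit in search(CONCEPTS, "combi"):
        print(hit.pref_label, hit.uri)
```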
  32. 32. Creating Knowledge out of Interlinked Data
  33. 33. A structured knowledge base allows searching for specific data (e.g. cars with more than 6 seats)
  34. 34. … or with a fuel consumption of less than 5 litres per 100 km
  35. 35. From Intranet to Enterprise Data Web around a knowledge hub. Auer, Frischmuth, Klímek, Unbehauen, Holzweißig, Marquardt: Linked Data in Enterprise Information Integration. Submitted to Semantic Web Journal 2012.
  36. 36. Linked Data & Collaboration for the Digital Humanities Riechert, Morgenstern, Auer, Tramp, Martin: Knowledge Engineering for Historians on the Example of the Catalogus Professorum Lipsiensis. 9th International Semantic Web Conference (ISWC2010). Best Paper award.
  37. 37. OntoWiki Dynamic views on knowledge bases
  38. 38. OntoWiki for the Catalogus Professorum Lipsiensis RDF triples on resource details page
  39. 39. OntoWiki for the Catalogus Professorum Lipsiensis: dynamic suggestions from the Data Web
  40. 40. CPM Ontology
  41. 41. Catalogus Professorum Lipsiensis
  42. 42. Linking [section divider with Linked Data lifecycle diagram; photo © CC-BY-NC-ND by ~Dezz~ (residae on flickr)]
  43. 43. Linking Entities on the Data Web. In an uncontrolled environment such as the Data Web, there will be a proliferation of equivalent or similar entity identifiers. Manual link discovery: • Sindice integration into UIs • Semantic Pingback. Semi-automatic: • SILK • LIMES. Automatic/supervised: • Raven [1]. [1] Ngonga, Lehmann, Auer, Höffner: RAVEN -- Active Learning of Link Specifications, OM@ISWC, 2011.
  44. 44. LIMES: Link Discovery in Metric Spaces. Similarity, equality or relatedness of entities can often be expressed using a distance metric (e.g. strings: edit distance; POIs: Euclidean distance). LIMES uses the characteristics of metric spaces, especially the consequences of the triangle inequality: d(x, y) ≤ d(x, z) + d(z, y), hence d(x, z) - d(z, y) ≤ d(x, y) ≤ d(x, z) + d(z, y). • Use pessimistic approximations of distances instead of computing them • Only compute distances when needed. The high-performance LIMES framework is available as open source and outperforms the state of the art by an order of magnitude. Ngonga, Auer: LIMES - A Time-Efficient Approach for Large-Scale Link Discovery on the Web of Data. 22nd Int. Joint Conf. on Artificial Intelligence (IJCAI2011).
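The pruning effect of the triangle inequality can be shown in a few lines. The sketch below is a simplified illustration with a single reference point (an exemplar z), not the LIMES algorithm itself: because |d(x, z) - d(z, y)| is a lower bound for d(x, y), any pair whose lower bound already exceeds the threshold can be discarded without computing its exact distance. The points, the threshold and the choice of exemplar are made up for the example.

```python
import math


def find_links(sources, targets, threshold, exemplar):
    """Return (links, skipped): all pairs within the threshold, and the
    number of exact distance computations avoided by the lower bound."""
    # Distances to the exemplar are computed once per point.
    to_exemplar_src = {s: math.dist(s, exemplar) for s in sources}
    to_exemplar_tgt = {t: math.dist(exemplar, t) for t in targets}

    links, skipped = [], 0
    for s in sources:
        for t in targets:
            # Triangle inequality: |d(s, z) - d(z, t)| <= d(s, t).
            lower_bound = abs(to_exemplar_src[s] - to_exemplar_tgt[t])
            if lower_bound > threshold:
                skipped += 1  # cannot be a link, exact distance not needed
                continue
            if math.dist(s, t) <= threshold:
                links.append((s, t))
    return links, skipped


SOURCES = [(0.0, 0.0), (10.0, 10.0)]
TARGETS = [(0.0, 1.0), (10.0, 11.0), (50.0, 50.0)]

links, skipped = find_links(SOURCES, TARGETS, threshold=1.5, exemplar=(0.0, 0.0))

if __name__ == "__main__":
    print(links)
    print("exact distance computations skipped:", skipped)
```

The result is identical to a brute-force comparison; only the number of exact distance computations changes, which matters when the metric is expensive (for example edit distance on long strings).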
  45. 45. Active learning of link specifications: RAVEN - Towards Zero-Configuration Link Discovery. Ngonga Ngomo, Lehmann, Auer, Höffner: RAVEN: Towards Zero-Configuration Link Discovery. In OM 2012.
  46. 46. Active learning of link specifications • Experiments even with very large KBs (Diseasome & DBpedia) show that an F-score of >95% can be achieved with 10-20 examples • A learning iteration takes <1 s
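For readers unfamiliar with the measure, the F-score here is the harmonic mean of precision and recall of the discovered links against a reference set. A minimal sketch follows; the link pairs are invented for illustration and are not results from the experiments.

```python
def f_score(predicted: set, reference: set) -> float:
    """F1: harmonic mean of precision and recall of predicted links."""
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical link sets: pairs of (source entity, target entity).
PREDICTED = {("a", "x"), ("b", "y"), ("c", "z")}
REFERENCE = {("a", "x"), ("b", "y"), ("d", "w")}

if __name__ == "__main__":
    print(round(f_score(PREDICTED, REFERENCE), 3))
```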
  47. 47. Enrichment [section divider with Linked Data lifecycle diagram]
  48. 48. Enrichment & Repair. Linked Data is mainly instance data! The ORE (Ontology Repair and Enrichment) tool helps to improve an OWL ontology by fixing inconsistencies and suggesting further axioms. • Ontology Debugging: uses OWL reasoning to detect inconsistencies and unsatisfiable classes and to find the most likely sources of the problems. The user can create a repair plan while maintaining full control. • Ontology Enrichment: uses the DL-Learner framework to suggest definitions and super classes for existing classes in the KB. Works if instance data is available; useful for harmonising schema and data. http://aksw.org/Projects/ORE Lehmann, Auer, Tramp: Class Expression Learning for Ontology Engineering. Journal of Web Semantics (JWS), 2011.
  49. 49. Improving Linked Data Quality by Ontology Learning. Supervised machine learning task. Given: • Background knowledge base • Positive and negative examples (example = individual in ontology). Goal: • Find an OWL class expression / DL concept which • covers as many positive examples as possible • covers as few negative examples as possible. Concept C covers example a <=> a is an instance of C. An analogous problem can be defined for logic programs => Inductive Logic Programming. Hellmann, Lehmann, Auer: Learning of OWL Class Descriptions on Very Large Knowledge Bases. Int. Journal on Semantic Web & Information Systems (IJSWIS), Vol. 5, Issue 2, April-July 2009, ISSN: 1552-6283.
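The learning goal can be made concrete by scoring candidate class expressions against the examples. The sketch below is a toy illustration, not DL-Learner: candidates are given directly as the sets of individuals they cover (a real system would obtain these from a reasoner), accuracy is one possible scoring function chosen here for simplicity, and all names are hypothetical.

```python
def accuracy(covered: set, positives: set, negatives: set) -> float:
    """Fraction of examples handled correctly: covered positives plus
    uncovered negatives, divided by the number of all examples."""
    covered_positives = len(covered & positives)
    excluded_negatives = len(negatives - covered)
    return (covered_positives + excluded_negatives) / (len(positives) + len(negatives))


POSITIVES = {"anna", "bob"}
NEGATIVES = {"carl", "dora"}

# Hypothetical candidate class expressions with the individuals they cover.
CANDIDATES = {
    "Person": {"anna", "bob", "carl", "dora"},
    "Professor": {"anna", "bob", "carl"},
    "Professor and (worksAt some University)": {"anna", "bob"},
}


def best_candidate(candidates: dict, positives: set, negatives: set) -> str:
    """Pick the expression with the highest accuracy on the examples."""
    return max(candidates, key=lambda name: accuracy(candidates[name], positives, negatives))


if __name__ == "__main__":
    for name, covered in CANDIDATES.items():
        print(f"{accuracy(covered, POSITIVES, NEGATIVES):.2f}  {name}")
    print("best:", best_candidate(CANDIDATES, POSITIVES, NEGATIVES))
```

In this toy data the most specific expression wins because it covers both positive examples and no negative one.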
  50. 50. Quality Analysis [section divider with Linked Data lifecycle diagram]
  51. 51. Linked Data Quality Analysis. Quality on the Data Web varies widely: • Hand-crafted or expensively curated knowledge bases (e.g. DBLP, UMLS) vs. data extracted from text or Web 2.0 sources (DBpedia). Research challenge: • Establish measures for assessing the authority, provenance and reliability of Data Web resources. Opportunity for EII: employ crowd-sourced knowledge from the Data Web in the enterprise. FP7-IP DIACHRON: Managing the Evolution and Preservation of the Data Web (started April 2013).
  52. 52. Evolution [section divider with Linked Data lifecycle diagram; photo © CC-BY-SA by alasis on flickr]
  53. 53. EvoPat: Pattern-based KB Evolution • Unified method for data evolution and ontology refactoring • Modularized, declarative definition of evolution patterns => simple compared to an imperative description • RDF representation of evolution patterns => patterns can be shared and reused on the Data Web • Declarative definition of bad smells and corresponding evolution patterns promotes the (semi-)automatic improvement of information quality. Rieß, Heino, Dietzold, Auer: EvoPat - Pattern-Based Evolution and Refactoring of RDF Knowledge Bases. In: 9th International Semantic Web Conference ISWC2010.
  54. 54. Exploration [section divider with Linked Data lifecycle diagram]
  55. 55. An ecosystem of LOD visualizations. LOD Exploration Widgets: • Spatial faceted browsing • Faceted browsing • Statistical visualization • Entity-/faceted-based browsing • Domain-specific visualizations • … Choreography layer: • Dataset analysis (size, vocabularies, property histograms etc.) • Selection of suitable visualization widgets. LOD Datasets: … Brunetti, Auer, García: The Linked Data Visualization Model. To appear in IJSWIS, 2012.
  56. 56. Creating Knowledge out of Interlinked Data
  57. 57. Creating Knowledge out of Interlinked Data
  58. 58. Creating Knowledge out of Interlinked Data
  59. 59. Creating Knowledge out of Interlinked Data
  60. 60. Creating Knowledge out of Interlinked Data
  61. 61. Creating Knowledge out of Interlinked Data
  62. 62. Creating Knowledge out of Interlinked Data
  63. 63. LOD Life-(Washing-)cycle supported by the Debian-based LOD2 Stack http://stack.lod2.eu
  64. 64. Linked Enterprise Intra Data Webs fill the gap between Intra-/Extranets (unstructured information management) and EIS/ERP (structured information management). They support the long tail of enterprise information domains: • Human resources • Requirements engineering • Supply chains
  65. 65. When only data needs to be exchanged and integrated, SOA is quite expensive. Linked Data facilitates data integration along value chains within and across enterprises. PricewaterhouseCoopers, Technology Forecast, 2009.
  66. 66. Take-home messages • Linked Data is a promising technology for closing the gap between SOA and unstructured information management • The wealth of knowledge available as LOD can be leveraged as background knowledge for enterprise applications • The application of Linked Data in the enterprise is still largely unexplored (opportunity) • Linked Data will make Enterprise Information Integration more flexible, iterative and cost-effective. Auer, Frischmuth, Klímek, Tramp, Unbehauen, Holzweißig, Marquardt: Linked Data in Enterprise Information Integration. Submitted to Semantic Web Journal.
  67. 67. AKSW: Bridging Theory with Applications. Three layers: Foundations (marrying databases with RDF and ontologies), Tools & Datasets, Applications (bringing the Data Web to end users). Projects shown: • DBpedia: "Semantification" of Wikipedia • Triplify: "Semantification" of (small) Web applications • OntoWiki: collaborative creation of explicit knowledge via Semantic Wikis • LIMES: link discovery framework for metric spaces • Vakantieland: building Data Web applications • SoftWiki: distributed, stakeholder-driven requirements engineering • NLP2RDF: integrating natural language processing tool chains with LOD • Enterprise Knowledge Bases: realizing knowledge hubs within an enterprise's Data Intranet • Thesaurus Management: defining corp. language & data … • DL-Learner: machine learning for ontologies • Catalogus Professorum: prosopographical knowledge base • LinkedGeoData: "Semantification" of OpenStreetMap • LESS: semantification syndication • RDB2RDF: mapping relational data to RDF • ORE: ontology enrichment & repair
  68. 68. AKSW Team (EU-FP7 LOD2 Project Overview, http://lod2.eu)
  69. 69. The LOD2 Gang
  70. 70. Thanks for your attention! Sören Auer http://www.informatik.uni-leipzig.de/~auer | http://aksw.org | http://lod2.org auer@informatik.uni-leipzig.de Soon at:
