
Session 1.2: Enrich Your Knowledge Graphs: Linked Data Integration with PoolParty Semantic Integrator


Talk at SEMANTiCS 2017
www.semantics.cc


  1. Tomas Knap, Semantic Web Company. Enrich Your Knowledge Graphs: Linked Data Integration with PoolParty Semantic Integrator
  2. Agenda ▸ PoolParty Semantic Integrator & UnifiedViews ▸ General data acquisition tasks ▹ Schema mapping ▹ Entity linking ▹ Data fusion ▸ PoolParty Semantic Integrator and data acquisition tasks
  3. PoolParty Semantic Integrator & UnifiedViews: Introduction
  4. PoolParty Semantic Integrator ▸ A semantic technology suite ▹ Organize and maintain company knowledge ■ SKOS thesauri/ontologies ▹ Annotate documents with resources from a knowledge base ▹ Provide focused search on top of the annotated document space ▸ https://www.poolparty.biz/ ▹ Or visit the PoolParty booth
  5. UnifiedViews ▸ UnifiedViews is an ETL tool for RDF data processing ▹ Allows users to manage RDF data processing tasks ▹ Natively supports the RDF data format ▸ Available standalone or as part of PoolParty Semantic Integrator ▹ Data acquisition tasks, long-running tasks
  6. UnifiedViews Approach ▸ Standard maintenance interface ▹ Define, execute, monitor, schedule, and share data processing tasks ▹ Predefined and customizable building blocks (plugins) to set up the individual data processing tasks ▸ Debugging features ▸ Simplified documentation ▹ Visualizations of the prepared tasks ■ Plugins ■ Data flow
  7. UnifiedViews Pipeline
  8. UnifiedViews Core Components ▸ Web administration interface ▹ Define and maintain pipelines ▹ Validate, execute, monitor pipelines ▹ Possibility to schedule pipelines ■ Notifications ▹ Possibility to debug pipelines ▹ Possibility to share pipelines and plugins ▹ Define and maintain plugins ▹ Multi-user environment, SSO support ▸ Robust engine running the tasks ▸ API to work with tasks, executions, scheduled events
  9. UnifiedViews Core Plugins ▸ Set of core plugins available ▹ Extractors ■ Obtain data from external sources (CSV, DBF, XLS, XML files, RDF data, or relational tables) ▹ Transformers ■ Transform data between various formats (e.g. CSV files to RDF data, relational tables to RDF data) ■ Execute typical transformations such as SPARQL Update queries or XSL transformations ▹ Loaders ■ Load the transformed and curated data into external systems and repositories ▸ 35+ plugins
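
The extractor / transformer / loader split on the Core Plugins slide can be illustrated with a minimal Python sketch. This is not UnifiedViews code and does not use its plugin API: the function names, the `http://example.org/` base IRI, and the in-memory `set` standing in for a target repository are all assumptions made for illustration.

```python
import csv
import io


def extract_csv(text):
    """Extractor: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform_rows_to_triples(rows, base_iri, key):
    """Transformer: turn each row into (subject, predicate, object) triples."""
    triples = []
    for row in rows:
        subject = f"{base_iri}resource/{row[key]}"
        for column, value in row.items():
            # Skip the key column itself and empty cells.
            if column != key and value:
                triples.append((subject, f"{base_iri}vocab/{column}", value))
    return triples


def load_into_store(triples, store):
    """Loader: add triples to a target store (here just a Python set)."""
    store.update(triples)
    return store


def run_pipeline(source, stages):
    """Feed the output of each stage into the next one, in order."""
    data = source
    for stage in stages:
        data = stage(data)
    return data


# Hypothetical input data and target store.
CSV_TEXT = "id,name\n1,Alice\n2,Bob\n"
BASE_IRI = "http://example.org/"  # assumed namespace, not from the talk
store = set()

run_pipeline(
    CSV_TEXT,
    [
        extract_csv,
        lambda rows: transform_rows_to_triples(rows, BASE_IRI, "id"),
        lambda triples: load_into_store(triples, store),
    ],
)
```

Each stage only depends on the output of the previous one, which is the property that lets stages be swapped or recombined into different pipelines.
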
  10. UnifiedViews Custom Plugins ▸ Easy way to extend UnifiedViews with your own plugins ▹ Guide for creating new plugins ▹ Tutorials
  11. UnifiedViews Team
  12. UnifiedViews Availability ▸ Available under an open source license (GPL + LGPL v3) ▹ Commercial license also available as part of PoolParty Semantic Integrator ▸ Hosted on GitHub ▹ https://github.com/UnifiedViews ▸ http://unifiedviews.eu
  13. General Data Acquisition Tasks: Overview
  14. Generic Data Acquisition Pipelines ▸ Support the full data integration process ▹ Data collection/pre-processing ▹ Schema mapping ▹ Entity linking ▹ Data fusion ▹ Loading data to a resulting data mart
  15. General Data Acquisition Tasks: Schema Mapping
  16. Schema Mapping - Goals ▸ A generic schema mapping DPU (data processing unit, the UnifiedViews term for a plugin) ▹ Arbitrary sources (CSV, JSON, XML, relational data sources) mapped to the RDF data model ▹ Support for data transformations ▹ Validation of the resulting data ▹ Suggestion of rules
  17. Schema Mapping - Approach ▸ Schema mapping DPU based on RML ▹ Generalization of R2RML (W3C Recommendation) ▹ rml.io
  18. Schema Mapping - Approach ▸ UnifiedViews pipeline ▹ Explain inputs, mappings, outputs ■ http://rml.io/spec.html#example-input ▹ Explain the DPU's configuration
  19. Schema Mapping - Status & Lessons Learned & Next Steps ▸ A generic schema mapping DPU ▹ Arbitrary sources (CSV, JSON, XML, relational data sources) mapped to the RDF data model ▹ Support for data transformations ▹ Validation of the resulting data ▹ Suggestion of rules ▸ Next steps: ▹ Performance evaluation ▹ UI for preparing RML rules
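
The idea behind the schema mapping slides, one declarative mapping applied to records from different source formats, can be sketched in Python as follows. This is a loose illustration only: the `MAPPING` dictionary is a hypothetical structure inspired by RML's subject templates and predicate-object maps, not RML syntax, and nothing here reflects the actual DPU's configuration. The `http://example.org/` IRIs and sample records are assumptions.

```python
import csv
import io
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping: a subject IRI template plus column-to-predicate rules.
MAPPING = {
    "subject_template": "http://example.org/person/{id}",
    "class": "http://xmlns.com/foaf/0.1/Person",
    "predicate_object_maps": [
        {"predicate": "http://xmlns.com/foaf/0.1/name", "reference": "name"},
    ],
}


def escape_literal(value):
    """Escape backslashes, quotes, and newlines for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def map_record(record, mapping):
    """Apply the mapping to one record and return N-Triples lines."""
    subject = mapping["subject_template"].format(**record)
    lines = [f"<{subject}> <{RDF_TYPE}> <{mapping['class']}> ."]
    for pom in mapping["predicate_object_maps"]:
        value = record.get(pom["reference"])
        if value is None or value == "":
            continue  # no triple for a missing or empty field
        literal = escape_literal(str(value))
        lines.append(f'<{subject}> <{pom["predicate"]}> "{literal}" .')
    return lines


# The same mapping is applied to records from two different source formats.
csv_records = list(csv.DictReader(io.StringIO("id,name\n1,Alice\n")))
json_records = json.loads('[{"id": 2, "name": "Bob"}]')

triples = [
    line
    for record in csv_records + json_records
    for line in map_record(record, MAPPING)
]
```

Keeping the mapping as data rather than code is what makes the slide's next steps, rule suggestion and a UI for preparing rules, plausible: rules can be generated, inspected, and validated without touching the transformation logic.
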
  20. General Data Acquisition Tasks: Entity Linking
  21. Entity Linking - Goals ▸ A generic DPU that can run entity linking tasks ▹ For arbitrary structured data ▹ Linkage rules may be provided ▸ Usually used to find duplicates between acquired data and data in a target knowledge base
  22. Entity Linking - Approach ▸ UnifiedViews DPU which wraps Silk ▹ http://silkframework.org/
  23. Entity Linking - Approach ▸ A sample UnifiedViews pipeline ▹ As extractors, transformers
  24. Entity Linking - Status & Lessons Learned & Next Steps ▸ Linker as extractor ▹ Linker as transformer ▸ Limitations ▹ Performance issues when linking against bigger knowledge bases ■ E.g. DBpedia ▸ Currently working on a special approach for linking concepts with DBpedia resources ▹ Querying Solr with a pre-processed DBpedia knowledge base
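
A linkage rule of the kind mentioned on the entity linking slides can be approximated with a short Python sketch: compare labels, keep the best match above a threshold, and emit `owl:sameAs` links. This is an assumption-laden illustration, not Silk's rule language or API. The similarity measure (`difflib.SequenceMatcher`), the 0.9 threshold, and the sample IRIs and labels are all chosen for the example.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(label):
    """Lowercase the label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def similarity(a, b):
    """Return a similarity score in [0, 1] between two labels."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_entities(source, target, threshold=0.9):
    """Link each source entity to its best-matching target entity.

    `source` and `target` map entity IRIs to labels. A link is emitted
    only when the best score reaches the threshold.
    """
    links = []
    for s_iri, s_label in source.items():
        best_iri, best_score = None, 0.0
        for t_iri, t_label in target.items():
            score = similarity(s_label, t_label)
            if score > best_score:
                best_iri, best_score = t_iri, score
        if best_iri is not None and best_score >= threshold:
            links.append((s_iri, OWL_SAME_AS, best_iri))
    return links


# Hypothetical acquired data and target knowledge base.
acquired = {
    "http://example.org/acquired/1": "Semantic  Web company",
    "http://example.org/acquired/2": "Unrelated Thing",
}
knowledge_base = {
    "http://example.org/kb/swc": "Semantic Web Company",
    "http://example.org/kb/other": "Different Organisation",
}

links = link_entities(acquired, knowledge_base)
```

A naive pairwise comparison like this scales with the product of the two input sizes. Real linking tools typically add indexing or blocking, which this sketch omits.
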
  25. General Data Acquisition Tasks: Data Fusion
  26. Data Fusion - Goals ▸ A generic UnifiedViews DPU that can fuse different representations of the same resources ▹ Assess the quality of data sources ▹ Apply a conflict resolution function
  27. Data Fusion - Approach ▸ UnifiedViews DPU, which uses LD-Fusion Tool ▹ http://mifeet.github.io/LD-FusionTool
  28. Data Fusion - Approach ▸ UnifiedViews pipeline ▹ Inputs, sameAs links, outputs
  29. Data Fusion - Status & Lessons Learned & Next Steps ▸ A generic data fusion DPU ▹ Using LD-Fusion Tool ▸ Limitations: ▹ RDF4J not supported ▸ Next steps: ▹ Tracking provenance of fused data in UnifiedViews ▹ Assess quality of: ■ Inputs to data fusion ■ Outputs - fused data
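
The fusion step outlined above (inputs, sameAs links, a quality assessment of sources, and a conflict resolution function) can be sketched in Python. This does not use or reproduce LD-Fusion Tool. The union-find clustering, the "highest-quality source wins" resolution function, the quality scores, and all sample data are assumptions for illustration.

```python
from collections import defaultdict


def build_clusters(same_as_links):
    """Return a function mapping each IRI to its cluster's canonical IRI.

    Uses union-find; the canonical IRI is the lexicographically smallest
    member of the sameAs cluster.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)
    return find


def best_source_value(candidates, source_quality):
    """Conflict resolution: keep the value from the highest-quality source."""
    return max(candidates, key=lambda c: source_quality.get(c[1], 0.0))[0]


def fuse(quads, same_as_links, source_quality, resolve=best_source_value):
    """Fuse (subject, predicate, value, source) quads into one value per property."""
    canonical = build_clusters(same_as_links)
    grouped = defaultdict(list)
    for subject, predicate, value, source in quads:
        grouped[(canonical(subject), predicate)].append((value, source))
    return {
        key: resolve(candidates, source_quality)
        for key, candidates in grouped.items()
    }


# Illustrative inputs: two sources describe the same resource differently.
quads = [
    ("http://a.example/item", "size", "100", "sourceA"),
    ("http://b.example/item", "size", "120", "sourceB"),
    ("http://b.example/item", "label", "Item", "sourceB"),
]
same_as = [("http://a.example/item", "http://b.example/item")]
quality = {"sourceA": 0.6, "sourceB": 0.9}  # assumed quality scores

fused = fuse(quads, same_as, quality)
```

Passing the resolution function as a parameter means other strategies, for example taking the most frequent value, can be swapped in without changing the grouping logic.
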
  30. General Data Acquisition Tasks in PoolParty Semantic Integrator
  31. PoolParty Semantic Integrator and Data Acquisition Tasks ▸ User interface showing an overview of data acquisition tasks ▹ List of tasks with their creation, execution, and status ▸ Possibility to browse/examine the resulting integrated data ▹ E.g. to see the resulting fused data
  32. Summary
  33. Summary ▸ PoolParty Semantic Integrator and UnifiedViews ▸ General data acquisition tasks ▹ Schema mapping ▹ Entity linking ▹ Data fusion ▸ PoolParty Semantic Integrator and data acquisition tasks
  34. Contact: Tomas Knap, PhD, Architect & Researcher, Semantic Web Company. Research interests: ▸ Linked Data integration and quality ▸ Linked Data management. Contact: ▸ tomas.knap@semantic-web.com © Semantic Web Company - http://www.semantic-web.at/ and http://www.poolparty.biz/
