Tomas Knap
Semantic Web Company
Enrich Your
Knowledge
Graphs: Linked
Data
Integration
with PoolParty
Semantic
Integrator
1
Agenda
▸ PoolParty Semantic Integrator
& UnifiedViews
▸ General data acquisition tasks
▹ Schema mapping
▹ Entity linking
▹ Data fusion
▸ PoolParty Semantic Integrator
and data acquisition tasks
2
PoolParty Semantic
Integrator & UnifiedViews
Introduction
3
PoolParty
Semantic
Integrator
▸ A semantic technology suite
▹ Organize and maintain company
knowledge
■ SKOS thesauri/ontologies
▹ Annotate documents with
resources from a knowledge base
▹ Provide focused search on top of
the annotated document space
▸ https://www.poolparty.biz/
▹ Or please visit the PoolParty booth
4
UnifiedViews
▸ UnifiedViews is an ETL tool for
RDF data processing
▹ Allows users to manage RDF data
processing tasks
▹ Natively supports the RDF data
format
▸ Available standalone or as part
of PoolParty Semantic Integrator
▹ Data acquisition tasks, long
running tasks
5
UnifiedViews
Approach
▸ Standard maintenance interface
▹ Define, execute, monitor, schedule, and
share data processing tasks
▹ Predefined and customizable building
blocks (plugins) to set up the individual
data processing tasks
▸ Debugging features
▸ Simplified documentation
▹ Visualizations of the prepared tasks
■ Plugins
■ Data flow
6
UnifiedViews
Pipeline
7
UnifiedViews
Core
Components
▸ Web administration interface
▹ Define and maintain pipelines
▹ Validate, execute, monitor pipelines
▹ Possibility to schedule pipelines
■ Notifications
▹ Possibility to debug pipelines
▹ Possibility to share pipelines and plugins
▹ Define and maintain plugins
▹ Multi-user environment, SSO support
▸ Robust engine running the tasks
▸ API to work with tasks, executions,
scheduled events
8
UnifiedViews
Core Plugins
▸ Set of Core plugins available
▹ Extractors
■ Obtaining external sources (CSV, DBF, XLS, XML
files, RDF data, or relational tables)
▹ Transformers
■ Transforming them between various formats
(e.g. CSV files to RDF data, relational tables to
RDF data)
■ Executing typical transformations such as
SPARQL Update queries, or XSL
transformations
▹ Loaders
■ Loading the transformed and curated data to
external systems, repositories
▸ 35+ plugins
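
The extractor / transformer / loader split above can be illustrated with a minimal sketch. It is not UnifiedViews plugin code (UnifiedViews plugins are not written this way); the function names, the `http://example.org/` namespace, and the in-memory list used as a load target are all assumptions made for illustration. It only shows how a CSV source could flow through the three stages and end up as N-Triples lines.

```python
import csv
import io

# Hypothetical namespace, used only for this illustration.
BASE = "http://example.org/"


def extract(csv_text):
    """Extractor: read CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def escape_literal(value):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n"))


def transform(rows, key_column):
    """Transformer: one N-Triples line per non-empty, non-key cell.

    Assumes column names and key values are safe to embed in an IRI.
    """
    triples = []
    for row in rows:
        subject = f"<{BASE}resource/{row[key_column]}>"
        for column, value in row.items():
            if column == key_column or not value:
                continue
            triples.append(
                f'{subject} <{BASE}prop/{column}> "{escape_literal(value)}" .'
            )
    return triples


def load(triples, sink):
    """Loader: append the triples to a target (here a plain list)."""
    sink.extend(triples)
    return len(triples)


def run_pipeline(csv_text, key_column, sink):
    """Chain the three stages in the order extract, transform, load."""
    return load(transform(extract(csv_text), key_column), sink)
```

In a real deployment the loader would write to an RDF repository rather than a list; the list keeps the sketch self-contained.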
9
UnifiedViews
Custom
Plugins
▸ Easy way to extend
UnifiedViews with your own
plugins
▹ Guide for creating new plugins
▹ Tutorials
10
UnifiedViews
Team
11
UnifiedViews
Availability
▸ Available under an open source
license (GPL + LGPL v3)
▹ Commercial license also available as part
of PoolParty Semantic Integrator
▸ Hosted on GitHub
▹ https://github.com/UnifiedViews
▸ http://unifiedviews.eu
12
General Data
Acquisition Tasks
Overview
13
Generic Data
Acquisition
Pipelines
▸ Support the full data integration
process
▹ Data collection/pre-processing
▹ Schema mapping
▹ Entity linking
▹ Data fusion
▹ Loading data to a resulting data
mart
14
General Data
Acquisition Tasks
Schema Mapping
15
Schema
Mapping -
Goals
▸ A generic schema mapping DPU
▹ Arbitrary sources (CSV, JSON, XML,
relational data sources) mapped to
RDF data model
▹ Support for data transformations
▹ Resulting data validation
▹ Suggestion of rules
16
Schema
Mapping -
Approach
▸ Schema mapping DPU based on RML
▹ Generalization of R2RML (W3C Rec)
▹ http://rml.io
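
The idea behind a declarative mapping rule can be sketched as follows. This is not RML syntax (real RML mappings are themselves written in RDF) and not the DPU's implementation; the `MAPPING` dictionary, its keys, and the `example.org` subject template are hypothetical. The sketch only shows the general pattern of a subject template plus predicate/field pairs applied to each source record.

```python
import re

# Hypothetical, simplified rule loosely inspired by RML/R2RML triples maps.
MAPPING = {
    "subject_template": "http://example.org/person/{id}",
    "class": "http://xmlns.com/foaf/0.1/Person",
    "predicate_object": [
        ("http://xmlns.com/foaf/0.1/name", "name"),
        ("http://xmlns.com/foaf/0.1/mbox", "email"),
    ],
}

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def fill_template(template, record):
    """Replace each {field} placeholder with the record's value."""
    return re.sub(r"\{(\w+)\}", lambda m: str(record[m.group(1)]), template)


def apply_mapping(mapping, records):
    """Yield (subject, predicate, object) tuples for every record."""
    for record in records:
        subject = fill_template(mapping["subject_template"], record)
        yield (subject, RDF_TYPE, mapping["class"])
        for predicate, field in mapping["predicate_object"]:
            value = record.get(field)
            if value is not None:  # a missing field produces no triple
                yield (subject, predicate, value)
```

Because the records are plain dictionaries, the same rule could be applied to rows parsed from CSV, JSON, or a relational query, which is the generalization the slide refers to.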
17
Schema
Mapping -
Approach
▸ UnifiedViews pipeline
▹ Explain inputs, mappings, outputs
■ http://rml.io/spec.html#example-input
▹ Explain DPU’s configuration
18
Schema
Mapping -
Status &
Lessons
Learned &
Next Steps
▸ A generic schema mapping DPU
▹ Arbitrary sources (CSV, JSON, XML,
relational data sources) mapped to
RDF data model
▹ Support for data transformations
▹ Resulting data validation
▹ Suggestion of rules
▸ Next steps:
▹ Performance evaluation
▹ UI for preparing RML rules
19
General Data
Acquisition Tasks
Entity Linking
20
Entity Linking
- Goals
▸ A generic DPU, which may run entity
linking tasks
▹ For arbitrary structured data
▹ Linkage rules may be provided
▸ Usually to find duplicates between
acquired data and data in a target
knowledge base
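
A linkage rule of the kind described above can be sketched as a similarity function plus a threshold. This is not Silk's rule language or API; the label normalization, the use of `difflib.SequenceMatcher` as the similarity measure, and the 0.9 threshold are assumptions chosen for illustration. The sketch compares every acquired label with every knowledge-base label and emits an `owl:sameAs` link for the best match above the threshold.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(label):
    """Lowercase the label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def similarity(a, b):
    """String similarity in [0, 1] on the normalized labels."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_entities(acquired, knowledge_base, threshold=0.9):
    """Return sameAs links from acquired URIs to their best KB match.

    Both arguments map URI -> label. Every pair is compared, so the
    cost grows with len(acquired) * len(knowledge_base).
    """
    links = []
    for src_uri, src_label in acquired.items():
        best_uri, best_score = None, 0.0
        for kb_uri, kb_label in knowledge_base.items():
            score = similarity(src_label, kb_label)
            if score > best_score:
                best_uri, best_score = kb_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((src_uri, OWL_SAME_AS, best_uri))
    return links
```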
21
Entity Linking
- Approach
▸ UnifiedViews DPU which wraps Silk
▹ http://silkframework.org/
22
Entity Linking
- Approach
▸ A sample UnifiedViews pipeline
▹ As extractors, transformers
23
Entity Linking
- Status &
Lessons
Learned &
Next Steps
▸ Linker as extractor
▹ Linker as transformer
▸ Limitations
▹ Performance issues if linking
against bigger knowledge bases
■ E.g. DBpedia
▸ Currently working on a special
approach for linking concepts with
DBpedia resources
▹ Querying Solr with pre-processed
DBpedia knowledge base
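
One common way to avoid comparing every acquired entity with every entry of a large knowledge base is to look up a small candidate set in a pre-built index first. The sketch below uses an in-memory inverted index over label tokens as a stand-in for a search index such as Solr; it is an assumption about the general technique, not a description of the approach being developed, and the function names and `example.org` URIs are hypothetical.

```python
from collections import defaultdict


def tokens(label):
    """Split a label into a set of lowercase tokens."""
    return set(label.lower().split())


def build_index(knowledge_base):
    """Map each label token to the set of URIs whose label contains it.

    Built once, ahead of time, from the URI -> label mapping.
    """
    index = defaultdict(set)
    for uri, label in knowledge_base.items():
        for token in tokens(label):
            index[token].add(uri)
    return index


def candidates(index, label):
    """Return the URIs that share at least one token with the label."""
    found = set()
    for token in tokens(label):
        found |= index.get(token, set())
    return found
```

Only the returned candidates then need to be scored by a linkage rule, so the per-entity cost depends on the candidate set size rather than on the size of the whole knowledge base.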
24
General Data
Acquisition Tasks
Data Fusion
25
Data Fusion -
Goals
▸ A generic UnifiedViews DPU, which is
able to fuse different representations
of the same resources
▹ Assess the quality of data sources
▹ Apply a conflict resolution function
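
The two goals above can be sketched together: group resources connected by sameAs links, then resolve conflicting values by preferring the source with the higher quality score. This is not LD-FusionTool's API or its set of resolution functions; the quad layout `(subject, predicate, value, source)`, the `source_quality` scores, and the "best source wins" policy are assumptions made for illustration.

```python
def cluster_by_same_as(same_as_links):
    """Union-find over sameAs pairs.

    Returns a function that maps a URI to the canonical URI of its
    cluster (the lexicographically smallest root seen during merging).
    """
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for a, b in same_as_links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)
    return find


def fuse(quads, same_as_links, source_quality):
    """Keep one value per (canonical subject, predicate).

    Conflict resolution: the value from the source with the highest
    quality score wins; unknown sources score 0.0, ties keep the
    value seen first.
    """
    find = cluster_by_same_as(same_as_links)
    best = {}
    for subject, predicate, value, source in quads:
        key = (find(subject), predicate)
        score = source_quality.get(source, 0.0)
        if key not in best or score > best[key][1]:
            best[key] = (value, score)
    return {key: value for key, (value, _) in best.items()}
```

Other resolution policies (most recent value, average, majority vote) would replace only the comparison inside `fuse`.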
26
Data Fusion -
Approach
▸ UnifiedViews DPU, which uses
LD-Fusion Tool
▹ http://mifeet.github.io/LD-FusionTool
27
Data Fusion -
Approach
▸ UnifiedViews pipeline
▹ Inputs, sameAs links, outputs
28
Data Fusion -
Status &
Lessons
Learned &
Next Steps
▸ A generic data fusion DPU
▹ Using LD-Fusion Tool
▸ Limitations:
▹ RDF4J not supported
▸ Next steps:
▹ Tracking provenance of fused data
in UnifiedViews
▹ Assess quality of:
■ Inputs to data fusion
■ Outputs - fused data
29
General Data
Acquisition Tasks
In PoolParty Semantic Integrator
30
PoolParty
Semantic
Integrator
and Data
Acquisition
Tasks
▸ User interface to see overview
of data acquisition tasks
▹ List of tasks with their creation,
execution, and status
▸ Possibility to browse/examine
resulting integrated data
▹ E.g. to see the resulting fused data
31
Summary
32
Summary
▸ PoolParty Semantic Integrator
and UnifiedViews
▸ General data acquisition tasks
▹ Schema mapping
▹ Entity linking
▹ Data fusion
▸ PoolParty Semantic Integrator
and data acquisition tasks
33
Contact
Tomas Knap, PhD
Architect & Researcher
Semantic Web Company
Research interests:
▸ Linked Data integration and quality
▸ Linked Data management
Contact:
▸ tomas.knap@semantic-web.com
34
© Semantic Web Company - http://www.semantic-web.at/ and http://www.poolparty.biz/

Session 1.2: Enrich Your Knowledge Graphs: Linked Data Integration with PoolParty Semantic Integrator