The document describes the DataBearings platform for semantic data integration. DataBearings uses semantic technologies to integrate heterogeneous data sources on-the-fly without loading data into a warehouse first. This allows for live access to data, integration of IoT data sources, and cost savings by leveraging existing data. DataBearings provides a lightweight solution for dynamic data integration through semantic annotations, reusable components, and a semantic agent programming language to define integration logic and automations.
DataBearings: A semantic platform for data integration on IoT, Artem Katasonov
1. DataBearings: A Semantic Platform for Data Integration on IoT
Artem Katasonov (VTT Technical Research Center of Finland)
2. 26/09/2014 2
Business Needs
• Need: Companies have an increasing number of their own databases and various other in-house and external (business partners, Open Data) data sources.
• Need: Companies want to exploit ever-growing and diverse data efficiently and dynamically for new and better services.
• Need: In the market, there is a great need for novel applications and a better capability to provide novel services to customers in order to differentiate and compete.
• Need: Companies are looking for data management solutions that reduce development, maintenance and extension costs.
3. 26/09/2014 3
Background: General Approaches to Data Integration
Four approaches:
• Integrated packages (e.g. SAP)
• Messaging (ESB, i.e. WS-* based, etc.)
• Data warehouses (Extract-Transform-Load approach)
• Enterprise Information Integration (EII) – integration without first loading into a warehouse, i.e. “on the fly”
Points for EII:
• Gives access to “live” data.
• The Internet of Things (sensors, RFID) makes a good case for it.
• Reduces costs by leveraging existing data sources in new ways, avoiding data replication and its hardware, software and human costs.
• Enables integration with external sources (warehouses cannot help here).
• Allows fast, iterative trying out of new data sources, new processing pipelines, or new distribution channels (when it is not yet certain that they are beneficial for the business).
4. 26/09/2014 4
Drawbacks of Non-semantic EII
Commercial EII tools:
• Heavy and expensive.
• Some work only with databases, not Web services, etc.
• Relational approach:
• Need to manually define a schema that integrates the schemas of the underlying data sources.
• Such a federation view is harder to modify later.
• Do not include data processing functionality (they only federate data; post-processing has to be done elsewhere).
• Do not support data updates (leaving that to EAI tools).
5. 26/09/2014 5
Semantic EII in DataBearings: How it Works
[Diagram: a single high-level query passes through query analysis, query reformulation, and query decomposition / multiplication into sub-queries (S-Q1, S-Q2, S-Q3). Sub-query translation turns each into a low-level query for its source: SQL for a database, SOAP / REST for a Web service, GET / local IO for a file or Web server. The results (Result 1, Result 2, Result 3) go through results filtering, join / union, and custom data post-processing (incl. formatting) to produce a single answer.]
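The flow on this slide can be illustrated with a minimal Python sketch. This is not DataBearings code: the function names, field names, and the facilities "LotB" and "LotC" are hypothetical, an in-memory SQLite table stands in for a real database, and a plain function stands in for a Web service call. It shows one high-level query being translated into a low-level query per source, with results filtered, unioned and post-processed into a single answer.

```python
import sqlite3

def make_db():
    # Hypothetical relational source with its own column names.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE lots (lot_name TEXT, free_spaces INTEGER)")
    db.executemany("INSERT INTO lots VALUES (?, ?)",
                   [("Hamppi", 40), ("LotB", 0)])
    return db

def fake_web_service():
    # Stand-in for a SOAP / REST call; note the different field names.
    return [{"id": "LotC", "vacant": 12}]

def query_db(db, min_free):
    # Sub-query translated to SQL; the database does the filtering itself.
    rows = db.execute(
        "SELECT lot_name, free_spaces FROM lots WHERE free_spaces >= ?",
        (min_free,))
    return [{"facility": name, "free": free} for name, free in rows]

def query_service(min_free):
    # This source cannot filter, so results are filtered after retrieval.
    return [{"facility": r["id"], "free": r["vacant"]}
            for r in fake_web_service() if r["vacant"] >= min_free]

def federated_query(db, min_free):
    # Union of the per-source results, then post-processing (sorting).
    results = query_db(db, min_free) + query_service(min_free)
    return sorted(results, key=lambda r: r["facility"])

answer = federated_query(make_db(), min_free=10)
```

Both sources are mapped to the same field names (`facility`, `free`) before the union, so the caller sees a single answer and never deals with source-specific schemas.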
6. 26/09/2014 6
A DataBearings-based solution supplies data to the “Street Parking Enforcement” mobile application:
• Integrates data from various payment providers:
• ‘Pay and display’ machines.
• Mobile payment services (EasyPark, Parkman, etc.).
7. 26/09/2014 7
A DataBearings-based solution supplies data to CarP:
• Integrates static (manually managed) data and dynamic data (from sensors).
• Integrates data from different Finnish cities (different systems in use for static and dynamic data).
• Delivers data in Datex II format.
8. 26/09/2014 8
Data Integration for Datex II, CarP and related
[Diagram: data flows through three layers (Access, Integration, Forwarding) to a DATEX II publication and the Parking Guidance mobile app. Connections are labelled push, timed pull, or query-time pull.]
Data sources:
• Jyväskylä static data (MS Excel document)
• Tampere static data (NettiParkki, SOAP Web service)
• Tampere extra static data (PlatformX, JSON Web service)
• Jyväskylä dynamic data (Designa, proprietary interface)
• Pirkkala dynamic data (Designa, proprietary interface)
• Tampere dynamic data (PlatformX, JSON Web service)
• Jyväskylä VMS (Designa, proprietary interface)
• Pirkkala VMS (FLS Rosign, proprietary Web service)
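The three access modes named on this slide (timed pull, query-time pull, push) can be sketched in Python as follows. The class names, field names and values are hypothetical, and which mode each real source uses is an assumption made only for illustration; the slide text does not preserve that mapping.

```python
import time

class TimedPullSource:
    """Re-fetches only when the cached copy is at least `interval` seconds old."""
    def __init__(self, fetch, interval, clock=time.monotonic):
        self.fetch = fetch
        self.interval = interval
        self.clock = clock
        self._cached = None
        self._fetched_at = None

    def read(self):
        now = self.clock()
        if self._fetched_at is None or now - self._fetched_at >= self.interval:
            self._cached = self.fetch()
            self._fetched_at = now
        return self._cached

class QueryTimePullSource:
    """Calls the source every time a query needs it."""
    def __init__(self, fetch):
        self.fetch = fetch

    def read(self):
        return self.fetch()

class PushSource:
    """Keeps the latest value that the source pushed."""
    def __init__(self):
        self._latest = {}

    def on_push(self, value):
        self._latest = value

    def read(self):
        return self._latest

# Hypothetical wiring: static data on a timed pull, dynamic data pushed.
calls = {"static": 0}

def fetch_static():
    calls["static"] += 1
    return {"capacity": 1000}          # made-up value

fake_now = [0.0]                       # controllable clock for the demo
static = TimedPullSource(fetch_static, interval=3600, clock=lambda: fake_now[0])
extra = QueryTimePullSource(lambda: {"operator": "example"})
dynamic = PushSource()
dynamic.on_push({"occupancy": 200})

def integrated_view():
    # Integration step: merge static, extra and dynamic data for one facility.
    return {"facility": "Hamppi", **static.read(), **extra.read(), **dynamic.read()}

first = integrated_view()
second = integrated_view()
fetches_within_interval = calls["static"]
fake_now[0] = 3600.0
third = integrated_view()
fetches_after_interval = calls["static"]
```

The timed-pull source is fetched once for the first two views and again only after the interval has elapsed, while the pushed and query-time values are always current.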
9. 26/09/2014 9
Currently, SPoT is a single-data-source service (video-based plate recognition in car parks).
A DataBearings-based solution is under development to extend SPoT:
• Integrate the currently used data with street parking data from various sources.
10. 26/09/2014 10
Semantic Data Abstraction (via Query Reformulation)
[Figure: the same data shown without semantic abstraction (data as it is) and with it (interpreted data).]
15. 26/09/2014 15
Smart Home Pilot Functionality
Enquiries:
• Weather outside, from FMI service
• Light condition outside, from FMI service
• Temperature inside, from ThereGate
• Power consumption at the Audio/Video equipment power outlet, from ThereGate
• State of Audio/Video (Sleep, Standby, Music, TV, PS3), inferred based on above power consumption
Commands:
• Light on/off (2 lamps), via ThereGate
• Play music, via Spotify on a PC
Other:
• Inform (manually) that everyone has left the house
• Inform (automatically, based on visibility of home WiFi) that a particular person came or left
• Receive personal “Welcome home, X!” and “Bye, X!” messages
Automation:
• IF Nobody home AND Somebody came home THEN
• WHEN It is dark enough outside THEN Switch a light on
• IF Everyone left the house THEN Ask whether to switch all the lights off
• IF It is night AND State of Audio/Video changed to ‘Standby’ THEN Switch the lights off (in our case, this always means going to sleep)
• IF Wardrobe door is left open AND Somebody is home THEN Alarm via repeatedly switching a light on/off AND Send a message
• WHEN Door is closed OR Acknowledged from phone OR 1 minute elapsed THEN Stop alarm
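The first automation rule (IF Nobody home AND Somebody came home THEN, WHEN It is dark enough outside THEN Switch a light on) can be sketched in plain Python. This is not S-APL and not the pilot's code: the `Home` class and its methods are hypothetical, and the sketch assumes that IF is checked once when the event happens while WHEN waits until its condition becomes true.

```python
class Home:
    def __init__(self):
        self.people_home = set()
        self.dark_outside = False
        self.light_on = False
        self.pending = []    # armed WHEN rules as (condition, action) pairs
        self.messages = []

    def person_came(self, name):
        nobody_home = not self.people_home
        self.people_home.add(name)
        self.messages.append(f"Welcome home, {name}!")
        # IF Nobody home AND Somebody came home THEN arm the WHEN rule.
        if nobody_home:
            self.pending.append((lambda: self.dark_outside, self.switch_light_on))
        self.evaluate()

    def set_dark(self, dark):
        self.dark_outside = dark
        self.evaluate()

    def switch_light_on(self):
        self.light_on = True

    def evaluate(self):
        # Fire each armed WHEN rule whose condition now holds; keep the rest.
        still_waiting = []
        for condition, action in self.pending:
            if condition():
                action()
            else:
                still_waiting.append((condition, action))
        self.pending = still_waiting

home = Home()
home.person_came("X")               # daylight: rule is armed, light stays off
light_before_dark = home.light_on
home.set_dark(True)                 # WHEN condition becomes true
light_after_dark = home.light_on
```

Keeping armed WHEN rules in a pending list is one simple way to model a condition that may become true some time after the triggering event.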
16. 26/09/2014 16
Foundation: Semantic Agent Programming Language (S-APL)
S-APL
N3Logic – the use of N3 by Tim Berners-Lee et al. to represent production rules, allowing data and rules to be within the same document or model.
{:Hamppi :occupancy ?x. ?x > 960} => {:Hamppi :is :full}
Notation3 (N3) – the original and current view of Tim Berners-Lee on what RDF should have been. Basically, RDF with nesting.
{:Hamppi :occupancy 200} :source :Designa
Resource Description Framework (RDF) – W3C standard
:Hamppi :occupancy 200
• Much more expressive rules.
• Can remove data, which allows dynamics.
• Allows “procedural”-like programming (equivalents of variables, if-then-else, loops, functions).
• Can execute Java components.
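To make the rule example concrete, here is a minimal Python sketch of the occupancy rule applied to a set of triples. It only approximates the idea and does not implement N3Logic or S-APL semantics. The helper names are hypothetical, and the rule is generalised from :Hamppi to any subject, which is an assumption beyond the slide.

```python
# Facts are (subject, predicate, object) triples, mirroring ":Hamppi :occupancy 200".

def apply_full_rule(facts, threshold=960):
    """Approximates {?lot :occupancy ?x. ?x > 960} => {?lot :is :full}."""
    derived = {(s, ":is", ":full")
               for (s, p, o) in facts
               if p == ":occupancy" and o > threshold}
    return facts | derived

def update_occupancy(facts, lot, value):
    """Remove the old occupancy and any derived ':is :full', then re-derive.

    Removing facts is what lets the model follow changing sensor data.
    """
    kept = {(s, p, o) for (s, p, o) in facts
            if not (s == lot and (p == ":occupancy" or (p, o) == (":is", ":full")))}
    return apply_full_rule(kept | {(lot, ":occupancy", value)})

facts = apply_full_rule({(":Hamppi", ":occupancy", 200)})
nearly_full = update_occupancy(facts, ":Hamppi", 965)
emptied = update_occupancy(nearly_full, ":Hamppi", 200)
```

With an occupancy of 200 nothing is derived; after the update to 965 the set also contains (":Hamppi", ":is", ":full"), and the later update back to 200 removes it again.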
21. 26/09/2014 21
Summary: Benefits of DataBearings
As compared to non-semantic EII solutions:
• Lightweight and cheaper.
• More powerful: better suited for handling heterogeneity and a multitude of data sources.
• Future-proof: it is much easier to extend the system later to support an (N+1)th data source or an (M+1)th data processing case.
• Integrated: combines data federation and data pipeline capabilities and also supports data updates: data can be accessed from multiple sources, processed as needed and delivered to the intended destination, all within a single platform.
As compared to the ETL (extract-transform-load) approach to introducing semantic data management:
• Easier transition: data can stay where it is; there is no need to transfer it into semantic databases.
• Higher performance: semantic databases typically do not handle large amounts of data well.
• Natural integration of 3rd party data sources: there is typically no control over those, so their owners cannot be asked to move to a semantic representation.