BIG DATA EUROPE'S INTEGRATOR PLATFORM
A ONE-STOP SOLUTION FOR BIG AND SMART DATA MANAGEMENT
EuroPro Workshop @ EDBT/ICDT Conference
21.03.2017
Supporting the Societal Domains with Big Data Technology
BigDataEurope Project
22 March 2017, www.big-data-europe.eu
BigDataEurope Action
• EC Horizon 2020 Coordination & Support Action
  o ~5 million €, 2015-2017
• Show societal value of Big Data
  o Across all societal challenges addressed by H2020
• Lower barrier for using big data technologies
  o Effort to set up and deploy use-case workflows
  o Lack of skills & expertise
• Help establish data value chains across domains & organisations
Consortium
NCSR
DEMOKRITOS
Data Value Chain Evolution
[Figure: evolution of the data value chain over time. Stages: Extraction, Curation, Quality, Linking, Integration, Publication, Visualization, Analysis. Domains: Health, Food, Energy, Transport, Climate, Societies, Security. Other labels: Data Repositories, Linked Open Data, Proprietary 'locked-in' solutions, OS Solutions, Big Data Stacks. Axis: TIME.]
A flexible, generic platform for (Big) Data
Value Chain Deployment
2. Architecture
Big Data Integrator
Big Data Integrator: Architecture
• Stacks Open Source solutions (Free)
• Dockerization
• Facilitates integration and deployment
• Plug-and-play Big Data platform
• Key BDE additions
  o Support layer: integrated UI
  o Semantification layer
• Final BDI Release:
  o (3rd) May 2017
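The plug-and-play idea can be illustrated with a minimal Python sketch, assuming each component ships as a Docker image and a stack is simply a selection of such images. The image names, the `CATALOG` dictionary, and the `build_stack` helper are hypothetical placeholders, not the BDI's actual images or API; the output only mimics the `services` layout of a Docker Compose file.

```python
import json

# Hypothetical catalog: component name -> Docker image (placeholder names,
# not the images actually published by the project).
CATALOG = {
    "kafka": "example/kafka:latest",
    "spark": "example/spark:latest",
    "cassandra": "example/cassandra:latest",
    "integrator-ui": "example/integrator-ui:latest",
}


def build_stack(selected):
    """Return a Compose-style description containing only the selected components."""
    unknown = [name for name in selected if name not in CATALOG]
    if unknown:
        raise KeyError(f"no container available for: {', '.join(unknown)}")
    return {"services": {name: {"image": CATALOG[name]} for name in selected}}


if __name__ == "__main__":
    # Pick only the components a given workflow needs.
    stack = build_stack(["kafka", "spark", "integrator-ui"])
    print(json.dumps(stack, indent=2))
```

Because every component is described the same way, adding a custom component in this sketch amounts to adding one catalog entry.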
Big Data Integrator: In-Use
• Big Data Integrator: https://github.com/big-data-europe
• Wiki: extensive documentation, information on supported components, instructions, etc.
BDE vs Hadoop distributions
Feature | Hortonworks | Cloudera | MapR | Bigtop | BDE
File system | HDFS | HDFS | NFS | HDFS | HDFS
Installation | Native | Native | Native | Native | Lightweight virtualization
Plug & play components (no rigid schema) | No | No | No | No | Yes
High availability | Single failure recovery (YARN) | Single failure recovery (YARN) | Self-healing, multiple failure recovery | Single failure recovery (YARN) | Multiple failure recovery
Cost | Commercial | Commercial | Commercial | Free | Free
Scaling | Freemium | Freemium | Freemium | Free | Free
Addition of custom components | Not easy | No | No | No | Yes
Integration testing | Yes | Yes | Yes | Yes | --
Operating systems | Linux | Linux | Linux | Linux | All
Management tool | Ambari | Cloudera Manager | MapR Control System | - | Docker Swarm UI + custom
BDI Docker Containers (...and counting)
• Data Acquisition: Apache Flume
• Data Storage: Hue, Apache Cassandra, ScyllaDB, Apache Hive, PostGIS
• Search/Indexing: Apache Solr
• Message Passing: Apache Kafka
• Data Processing: Spark, Flink
• Semantic Components: SANSA, Silk, Strabon, Sextant, GeoTriples, Semagrow, LIMES, 4store, OpenLink Virtuoso
Semantic Layer
• Semantic Data Lakes
  o Minimal ingestion pre-processing
  o Semantic layer maintains metadata
  o Add meaning when retrieving/processing
• Ongoing research for Semantic Big Data & Analytics
[Figure: Data Lake (scalable unstructured data store) topped by a layer of relationship definitions and metadata (JSON-LD, CSVW, R2RML, XML2RDF), feeding Knowledge Graphs.]
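The "add meaning when retrieving" idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's implementation: the sample data, the `MAPPING` structure, the `example.org` IRIs, and the `read_as_triples` helper are all hypothetical. The mapping stands in for what a CSVW or R2RML description would provide.

```python
import csv
import io

# Raw data lands in the lake unchanged (minimal ingestion pre-processing).
RAW_CSV = "id,name,temp_c\ns1,Station One,21.5\ns2,Station Two,19.0\n"

# Hypothetical mapping kept by the semantic layer: column -> predicate IRI.
MAPPING = {
    "subject_column": "id",
    "subject_prefix": "http://example.org/station/",
    "predicates": {
        "name": "http://example.org/vocab/name",
        "temp_c": "http://example.org/vocab/temperatureCelsius",
    },
}


def read_as_triples(raw_csv, mapping):
    """Attach meaning at retrieval time: yield (subject, predicate, value) triples."""
    for row in csv.DictReader(io.StringIO(raw_csv)):
        subject = mapping["subject_prefix"] + row[mapping["subject_column"]]
        for column, predicate in mapping["predicates"].items():
            yield (subject, predicate, row[column])


if __name__ == "__main__":
    for triple in read_as_triples(RAW_CSV, MAPPING):
        print(triple)
```

The stored file is never rewritten; changing the mapping changes how the same raw data is interpreted on the next read.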
Semantic Layer tools
• BDE tooling for Semantic Data Lake:
o Swagger: Semantics of RESTful APIs
o Semantic Analytics Stack (SANSA):
Distributed data processing over
large-scale Knowledge Graphs
o Semagrow: SPARQL over Big Data
stores
o Ontario: Querying over Semantic
Demonstrating the Societal Value through 7
Pilot ‘Real-world’ use-cases
1. Overview
BigDataEurope Pilots
7 Pilots
◎ BDI Platform Instantiations
o Allow end-users to easily deploy functionality in their own system environment
o Modularized Docker approach - easier to replace components
o Reduces effort to keep 3rd party software updated & integrated
◎ 7 Societal Challenge Pilots
o Aligned with the 7 European Commission H2020 Societal Challenges
o Real-world use-cases (Data, Objectives, Solutions)
o Some pilots have different data & objectives but a similar solution
◎ More info: www.big-data-europe.eu/pilot/
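The claim that a modular approach makes components easier to replace can be shown with a small Python sketch. It assumes a pilot instance is described as a mapping from workflow role to component; the role names, `BASE_INSTANCE`, and `replace_component` are hypothetical and only borrow component names from the container list.

```python
# Hypothetical pilot instance: role in the workflow -> component filling it.
BASE_INSTANCE = {
    "messaging": "Apache Kafka",
    "processing": "Spark",
    "storage": "Apache Cassandra",
}


def replace_component(instance, role, component):
    """Return a new instance with one role filled by a different component."""
    if role not in instance:
        raise KeyError(f"unknown role: {role}")
    updated = dict(instance)  # copy, so the original instance stays untouched
    updated[role] = component
    return updated


# A second instance reuses the same solution but swaps the processing engine.
flink_instance = replace_component(BASE_INSTANCE, "processing", "Flink")
```

Only the swapped role differs between the two instances, which mirrors the point that pilots with different data and objectives can share a similar solution.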
7 BDI Instances
Free Workshops, Hangouts & Webinars
BigDataEurope Activities
3rd round of Societal Workshops
Domain | Date | Location | Details
Health | 11/12 May 2017 | Malta | Collocated with EC eHealth Week
Food & Agri | 31 March 2017 | Brussels | Co-organized with e-ROSA H2020 project
Energy | Autumn 2017 | Brussels | (Details to follow)
Transport | 12/13 September 2017 | - | Collocated with Big Data for Transport, Tisa workshop, organised by Ertico
Climate | 21/22 September 2017 | Brussels | Collocated with iScape Project Workshop: Improving the Smart Control of Air Pollution in Europe, by EC JRC Ispra
Societies | Autumn 2017 | T.B.C. | (Details to follow)
Security | Autumn 2017 | Brussels | Standalone event
Questions & Contacts
BIG DATA INTEGRATOR: www.github.com/big-data-europe
SEVEN PILOT DESCRIPTIONS: www.big-data-europe.eu/pilot/
PROJECT COORDINATION (Fraunhofer IAIS)
Prof. Sören Auer, auer © cs.uni-bonn · de
Dr. Simon Scerri, scerri © cs.uni-bonn · de
EIS Department/Group, Fraunhofer IAIS & CS Department Uni-Bonn, Bonn, Germany
www.big-data-europe.eu
#BigDataEurope
leads the Fraunhofer Big Data Alliance


Editor's Notes

  • #5 9/16 partners: sole or joint domain representatives of the 7 SC domains (COORDINATION ROLE). Other 7/16 partners: technical support (SUPPORT ROLE). Fraunhofer coordinates the project.
  • #10 Compatibility note for BigTop