BIG DATA EUROPE
AND THE 7 SOCIETAL PILOTS
BDVA Summit 2016, Valencia, 1 December 2016
Talk outline
• The BigDataEurope Project & Mission [2 slides]
• The Big Data Integrator (BDI) platform [3 slides]
• 7 Pilots for the 7 Societal Challenge Domains
o Overview
o SC4 (Transport: Traffic Conditions Estimation)
o SC7 (Security: Event Detection) [DEMO]
6 Dec 2016, www.big-data-europe.eu
Supporting the Societal Domains with Big Data Technology
BigDataEurope Project
BigDataEurope Action
• EC Horizon 2020 Coordination & Support Action
o ~€5 million, 2015-2017
• Lower the barrier to using Big Data technologies
o Barriers: setting up & deploying use-case workflows, lack of expertise
• Show the societal value of Big Data
o Across the 7 H2020 societal challenges
o Establish data value chains across domains & organisations
Data Value Chain Evolution
[Figure: data value chains over TIME. Stages: Extraction, Curation, Quality, Linking, Integration, Publication, Visualization, Analysis. Domains: Health, Transport, Security, Food, Societies, Climate, Energy. Layers: Data Repositories, Linked Open Data. Evolution: from proprietary, ‘locked-in’ solutions to OS solutions and Big Data stacks.]
A flexible, generic platform for (Big) Data Value
Chain Deployment
Big Data Integrator
Societal Perception of the 4 V’s: Platform Requirements
• Must be considered at the data acquisition, data processing and data display levels
• A solution is needed that accommodates all 3 levels
• An important concern for most SCs
• Common feeling: “better integration of a wider variety of data leads to better statistics”
• SC1 and SC5 need the most help in this direction; it remains an important aspect for all SCs
• Decisions depend on statistics, which are only as good as the quality of the data used
[Chart: per-challenge rating, SC1 to SC7]
Big Data Integrator: Architecture
• Key points
o Stacks Open Source solutions (Free)
o Dockerization
o Facilitates integration and deployment
o Plug-and-play BD Platform
o Cloud-deployment ready
• Key BDE additions
o Support layer: integrated UI
o Semantification layer
Big Data Integrator: In-Use
• Big Data Integrator: https://github.com/big-data-europe
• Wiki: extensive documentation, information on supported components, instructions, etc.
Demonstrating the Societal Value through 7 Pilot
‘Real-world’ use-cases
1. Overview
BigDataEurope Pilots
Pilots: Overview
• SC1: Health & Pharm.
• SC2: Food & Agr.
• SC3: Energy
• SC4: Transport
• SC5: Climate
• SC6: Social Sciences
• SC7: Security
7 Pilots
◎ BDI Platform Instantiations
o Allow end-users to easily deploy functionality in own system environment
o Modularized Docker approach - easier to replace components
o Reduces effort to keep 3rd party software updated & integrated
◎ 7 Societal Challenge Pilots
o Aligned with the 7 European Commission H2020 Societal Challenges
o Real-world use-cases (Data, Objectives, Solutions)
o Some pilots have different data & objectives but a similar solution
SC1: Pharmacology research
Life Sciences & Health
• Query a large number of datasets, some large
• Existing elaborate ingestion and homogenization by Open PHACTS
• Extensive toolset developed by OPF and others
Objective: Large-scale heterogeneous pharma-research data linking & integration
SC1: Architecture & Components
• Replicate Open PHACTS functionality on the BDE infrastructure using open-source solutions
• Based on Virtuoso, a proprietary distributed database
• Apply to other domains (e.g. Agriculture)
• Porting to BDI gives flexibility and enables new functionalities
• Logging & system health monitoring
SC2: Viticulture resources
Food and Agriculture
Objective: Automate publication ingestion and thematic classification
• AgInfra is a major infrastructure for agriculture researchers, serving cross-linked bibliography, data, and processing services
SC2: Architecture & Components
• BDI deployed as an external infrastructure for processing text (viticulture publications)
• Storing and processing text at a larger scale than AgInfra can currently manage
SC3: Predictive maintenance
Energy
• Wind turbine monitoring applies computational models to sensor data streams
• Models are re-parameterized weekly using the week’s data from multiple turbines
Objective: Real-time turbine monitoring stream processing and analytics
SC3: Architecture & Components
• Existing in-house non-scalable solution for model parameterization
o Reliable Fortran software for data analysis
o Efficient, but not scalable to data volume
• Developing a BDI orchestrator
o Re-uses existing software unmodified
o Makes it easy to apply in parallel to many datasets and manage the outputs
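The orchestrator idea on this slide (run existing analysis software, unmodified, over many datasets in parallel and collect the outputs) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the BDI orchestrator itself; `run_on_datasets`, `run_one` and the stand-in command are hypothetical names introduced here, and the stand-in merely echoes its argument in place of the real Fortran program.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(command, dataset):
    """Run the unmodified external program on one dataset and capture its output."""
    result = subprocess.run(
        command + [dataset], capture_output=True, text=True, check=False
    )
    return dataset, result.returncode, result.stdout.strip()


def run_on_datasets(command, datasets, max_workers=4):
    """Apply the same command to many datasets in parallel.

    Returns a dict mapping each dataset to (return code, captured stdout).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda d: run_one(command, d), datasets)
        return {name: (code, out) for name, code, out in results}


# Stand-in for the existing analysis binary (assumption): a tiny Python
# one-liner that just echoes the dataset name in upper case.
STAND_IN = [sys.executable, "-c", "import sys; print(sys.argv[1].upper())"]

if __name__ == "__main__":
    outputs = run_on_datasets(STAND_IN, ["turbine_a.dat", "turbine_b.dat"])
    for name, (code, out) in sorted(outputs.items()):
        print(name, code, out)
```

Threads are sufficient here because each worker only waits on an external process; the external program itself is left untouched.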
SC4: Traffic conditions estimation
Transport
• Combines:
o Traffic modelling from historical data
o Current measurements from a taxi fleet of 1200 vehicles
Objective: Estimation of real-time traffic conditions in Thessaloniki
SC4: Architecture & Components
• New Flink implementations of map matching and traffic prediction algorithms
• BDI provides access to varied data sources
o PostGIS database with city map
o ElasticSearch database of historical data
o Kafka stream of real-time data
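To make "map matching" concrete: each vehicle position is assigned to a road segment of the city map. The sketch below snaps a point to the nearest segment in plain Python. It is an illustrative nearest-segment rule under the assumption of planar coordinates, not the pilot's Flink implementation; the segment ids and coordinates in `ROADS` are made up.

```python
import math


def project_onto_segment(p, a, b):
    """Return the point on segment a-b closest to p (planar coordinates)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return a  # degenerate segment: both endpoints coincide
    # Parameter of the orthogonal projection, clamped to the segment ends.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def match_to_segment(p, segments):
    """Return (segment_id, distance) of the road segment nearest to p."""
    best_id, best_dist = None, math.inf
    for seg_id, (a, b) in segments.items():
        qx, qy = project_onto_segment(p, a, b)
        dist = math.hypot(p[0] - qx, p[1] - qy)
        if dist < best_dist:
            best_id, best_dist = seg_id, dist
    return best_id, best_dist


# Hypothetical two-road map: two parallel horizontal segments.
ROADS = {
    "seg-1": ((0.0, 0.0), (10.0, 0.0)),
    "seg-2": ((0.0, 5.0), (10.0, 5.0)),
}
```

A production matcher would also use heading, speed and the previous match to resolve ambiguous positions; this sketch only shows the geometric core.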
SC5: Climate modelling
Climate
• Preparing modelling experiments
o Slicing, transforming, combining datasets
o Submission and retrieval from modelling infrastructure
• Discovering and re-using previously computed derivatives
o Lineage annotation: derivatives computed from datasets and model parameters
o Finding appropriate past runs avoids repeating weeks-long modelling runs
Objective: Supporting data-intensive climate research
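The lineage idea on this slide (a derivative is identified by the datasets and model parameters it was computed from, so a matching past run can be re-used) can be sketched as a keyed lookup. This is a minimal in-memory illustration, not the pilot's metadata store; `lineage_key`, `RunCatalogue` and the sample ids are hypothetical.

```python
import hashlib
import json


def lineage_key(dataset_ids, model_params):
    """Build a deterministic key from input dataset ids and model parameters."""
    payload = json.dumps(
        {"datasets": sorted(dataset_ids), "params": model_params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class RunCatalogue:
    """In-memory stand-in for a lineage metadata store."""

    def __init__(self):
        self._runs = {}

    def record(self, dataset_ids, model_params, output_location):
        """Remember where the output of a finished run is stored."""
        self._runs[lineage_key(dataset_ids, model_params)] = output_location

    def find(self, dataset_ids, model_params):
        """Return the stored output location, or None if no matching run exists."""
        return self._runs.get(lineage_key(dataset_ids, model_params))
```

Sorting the dataset ids and the JSON keys makes the key independent of the order in which inputs and parameters are listed.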
SC5: Architecture & Components
• BDI offers:
o Hive for managing data as datasets that can be retrieved and manipulated, rather than as file blocks
o Cassandra stores structured and textual metadata for searching headers and lineage
• Existing infrastructure: stable, reliable software for parallel computation of models
• BDI is deployed as an external infrastructure for preparing and managing datasets
SC6: Municipality budgets
Social Sciences
• Ingestion of budget and budget execution data
• Multiple municipalities in varied formats and data models
Objective: Homogenized budgetary data made available for analysis and comparison
SC6: Architecture & Components
• BDI deployed as ingestion and storage infrastructure for external tools
o Homogenizes variety of data (JSON, CSV, XML, etc.)
o Exposes data as SPARQL endpoint serving homogenized data
• Existing analytics and visualization tools
o Use SPARQL queries to retrieve only the relevant slices of the overall data
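Homogenization here means mapping differently shaped inputs onto one common record layout. The sketch below does this for a CSV and a JSON source using the Python standard library. The field names (`year`, `category`, `amount`, `fiscalYear`, `label`, `value`), the sample data and the target layout are assumptions for illustration; the pilot itself serves the homogenized data through a SPARQL endpoint rather than as Python dictionaries.

```python
import csv
import io
import json


def from_csv(text, municipality):
    """Read rows with hypothetical columns 'year', 'category', 'amount'."""
    return [
        {
            "municipality": municipality,
            "year": int(row["year"]),
            "category": row["category"].strip().lower(),
            "amount": float(row["amount"]),
        }
        for row in csv.DictReader(io.StringIO(text))
    ]


def from_json(text, municipality):
    """Read a JSON list with hypothetical keys 'fiscalYear', 'label', 'value'."""
    return [
        {
            "municipality": municipality,
            "year": int(item["fiscalYear"]),
            "category": item["label"].strip().lower(),
            "amount": float(item["value"]),
        }
        for item in json.loads(text)
    ]


# Made-up samples from two municipalities using different formats.
CSV_SAMPLE = "year,category,amount\n2016,Education,1200.5\n"
JSON_SAMPLE = '[{"fiscalYear": "2016", "label": " Education ", "value": 980}]'

# Both sources end up in the same layout and can be compared directly.
records = from_csv(CSV_SAMPLE, "Town A") + from_json(JSON_SAMPLE, "Town B")
```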
SC7: Change detection & verification
Secure Societies
• Events are extracted from text published by news agencies and on social networking sites
• Events are geo-located and relevant changes are detected by comparing current and previous satellite images
Objective: Detect and Verify Events based on Satellite Imagery, News and Social Media
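The two steps on this slide (detect changes between a previous and a current image, then relate them to a geo-located event) can be illustrated with simple per-pixel differencing. This is a toy sketch on small intensity grids with an arbitrary threshold, not the pilot's change detection algorithm; all names and values are hypothetical.

```python
def changed_pixels(previous, current, threshold):
    """Return (row, col) positions whose absolute intensity change exceeds threshold."""
    if len(previous) != len(current):
        raise ValueError("images must have the same number of rows")
    changes = []
    for r, (row_prev, row_cur) in enumerate(zip(previous, current)):
        if len(row_prev) != len(row_cur):
            raise ValueError("images must have the same number of columns")
        for c, (a, b) in enumerate(zip(row_prev, row_cur)):
            if abs(b - a) > threshold:
                changes.append((r, c))
    return changes


def change_near_event(changes, event_pixel, radius):
    """True if any changed pixel lies within `radius` pixels of the event location."""
    er, ec = event_pixel
    return any((r - er) ** 2 + (c - ec) ** 2 <= radius ** 2 for r, c in changes)


# Made-up 3x3 intensity grids: one strong change in the centre,
# one small fluctuation in the corner that stays below the threshold.
BEFORE = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
AFTER = [[10, 10, 10], [10, 90, 10], [10, 10, 12]]
```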
SC7: Architecture & Components
[Diagram labels: Event Detection, Change Detection]
• Re-implementation of change detection algorithms for Spark
• Parallel orchestrator for text analytics
o Re-uses existing software
o Scales to many input streams
• BDI provides:
o Cassandra for text content and metadata
o Strabon GIS store for detected change location
o Homogeneous access to both for analysis and visualization
Demonstrating the Societal Value through 7 Pilot
‘Real-world’ use-cases
2. In-depth look at the Transport Pilot
BigDataEurope Pilots
Transport Pilot: Architecture & Objectives
“A scalable, fault-tolerant and flexible platform based on open source
frameworks that can process unbounded data sets and graphs.”
Message Broker: Kafka Cluster
L. Selmi - BDE - Tech. Workshop
Apache Kafka is a high-throughput, distributed, durable messaging system.
Stream and Batch Processor: Flink Cluster
Apache Flink is an open source platform for distributed stream and batch data processing.
Storage and Indexing: Elasticsearch Cluster
Elasticsearch is a distributed open source document database built on top of Apache Lucene.
Map-Matching & Prediction: Rserve
R is a free software environment for statistical computing. It is used in the pilot to run the map-matching and the prediction algorithms.
Transport Pilot: Architecture (High-level)
Transport Pilot: BDE Components in Docker Swarm
Transport Pilot: The BDE Platform Stack
Visualization
• SC4 Pilot 1 can process real-time floating car data (FCD) for map-matching and simple road-segment classification (normal/congested)
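A simple normal/congested classification of road segments could look like the sketch below. The rule used (mean observed speed below a fixed fraction of an assumed free-flow speed) is an illustrative assumption, not the pilot's classifier; the segment ids, speeds and the `congestion_ratio` default are made up.

```python
from collections import defaultdict
from statistics import mean


def classify_segments(observations, free_flow_kmh, congestion_ratio=0.5):
    """Label each road segment 'normal' or 'congested'.

    observations: iterable of (segment_id, speed_kmh) from map-matched FCD.
    free_flow_kmh: dict of segment_id -> assumed free-flow speed.
    A segment is 'congested' when its mean observed speed falls below
    congestion_ratio * free-flow speed (an illustrative rule only).
    """
    speeds = defaultdict(list)
    for segment_id, speed in observations:
        speeds[segment_id].append(speed)
    labels = {}
    for segment_id, values in speeds.items():
        limit = congestion_ratio * free_flow_kmh[segment_id]
        labels[segment_id] = "congested" if mean(values) < limit else "normal"
    return labels


# Made-up map-matched observations and free-flow speeds.
FCD = [("seg-1", 12.0), ("seg-1", 18.0), ("seg-2", 45.0), ("seg-2", 55.0)]
FREE_FLOW = {"seg-1": 50.0, "seg-2": 60.0}
```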
Demonstrating the Societal Value through 7 Pilot
‘Real-world’ use-cases
3. Demonstration of the Security Pilot
BigDataEurope Pilots
Architecture for SC 7
Stack
Security Pilot in Practice
• Demonstration
Free Workshops, Hangouts & Webinars
BigDataEurope Activities
2nd round of Societal Workshops
Domain      Date               Location  Collocated with
Transport   22 September 2016  Brussels  Big Data for Transport, Tisa workshop
Food&Agri   30 September 2016  Brussels  DG AGRI WP2018-20 stakeholder consultation
Energy      4 October 2016     Brussels  EC H2020 Info Day on “Smart Grids and Storage”
Climate     11 October 2016    Brussels  Melodies Project Event: Exploiting Open Data
Security    18 October 2016    Brussels  Standalone workshop
Societies   5 December 2016    Cologne   EDDI16, 8th Annual European DDI User Conference
Health      9 December 2016    Brussels  Standalone workshop
Other Activities
• Fresh set (7) of Societal Workshops in 2017
• Various SC-focussed and general hangouts (follow them!)
o Apache Flink & BDE (20 Oct): available online
o BDVA & BDE webinar planned for early next year
o Keep track on the BDE website (Events)
Questions & Contacts
WEB: www.big-data-europe.eu
EMAIL: info@big-data-europe.eu
BIG DATA INTEGRATOR: www.github.com/big-data-europe
PROJECT COORDINATION (Fraunhofer IAIS, which leads the Fraunhofer Big Data Alliance)
Prof. Sören Auer, auer © cs.uni-bonn · de
Dr. Simon Scerri, scerri © cs.uni-bonn · de
EIS Department/Group, Fraunhofer IAIS & CS Department Uni-Bonn, Bonn, Germany
#BigDataEurope
