BIG DATA EUROPE: CONCEPT, PLATFORM, AND PILOTS
BDE SC5 Workshop, Brussels, 6 November 2017
Talk outline
 The BigDataEurope action
 The Big Data Integrator platform
 Pilots across all seven H2020 challenges
29/11/2017 www.big-data-europe.eu
BigDataEurope Action
Big Data Europe (CSA: 2015-17)
 Show societal value of Big Data
o Across all societal challenges addressed by Horizon 2020
 Lower barrier for using big data technologies
o Effort and resources to convert tools and workflows
o Skills and expertise
 Help establish data value chains
o Across languages, organizations and domains
Consortium
NCSR DEMOKRITOS
Stakeholder Engagement
Stakeholder engagement workshops:
 Present the action and its showcase deployments
 Raise awareness about BDE results and what they mean for the stakeholders
 Collect requirements to drive further development
Big Data Integrator
Architecture
 Big Data Integrator (BDI):
o The prototype developed by BDE
 Main points of the architecture
o Dockerization
o Support layer, including integrated UI
o Semantification layer
BDE Components
 Data serving:
o HDFS, Cassandra, 4store, PostGIS, Strabon, Elasticsearch, Semagrow
 Processing:
o Spark, Flink, SANSA
 Stream ingestion middleware:
o Flume, Kafka
BigDataEurope Pilots
SC1: Pharmacology research
Life Sciences & Health
 Extensive toolset developed by OPF and others
 Query of a large number of datasets (including large scale datasets)
 Existing elaborate ingestion and homogenization by the OpenPHACTS Foundation
SC1 Pilot: Points Demonstrated
Life Sciences & Health
 Porting to BDI gives flexibility
o Using Virtuoso or a number of open source
alternatives without development effort for
the superstructure and tools around it
 Porting to BDI offers new capabilities
o Logging and system health monitoring
SC2: Viticulture resources
Food and Agriculture
 AgInfra is a major infrastructure for agriculture researchers serving cross-linked bibliography, data and processing services
 The pilot automates publication ingestion and thematic classification
SC2 Pilot: Points Demonstrated
Food and Agriculture
 AgInfra: Existing infrastructure for data and
services that process it
 BDI is deployed as an external infrastructure
for processing text (viticulture publications)
o Allows storing and processing text at a larger
scale than AgInfra can currently manage
 BDI extracts bibliographic metadata from
large scale full texts
 Extracted metadata is ingested to AgInfra
SC3: Predictive maintenance
Energy
 Prediction of localized energy production using coarse weather forecasts
 Localized weather conditions using
computational fluid dynamics
SC3 Pilot: Points Demonstrated
Energy
 Orchestration and data management of
the background actions:
o CFD simulations
o Correlation optimizations
 Developing a BDI orchestrator
o Re-uses existing software unmodified
o Makes it easy to apply the software to many datasets in parallel and to manage the output
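The orchestration idea above (run an existing program unmodified over many datasets in parallel and collect the output) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the BDI orchestrator itself; `run_tool`, `orchestrate`, and the stand-in command are hypothetical names introduced for this example.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_tool(command, dataset):
    """Run the unmodified external tool on one dataset and capture its stdout."""
    result = subprocess.run(
        command + [dataset],
        capture_output=True,
        text=True,
        check=True,  # raise if the tool exits with a non-zero status
    )
    return dataset, result.stdout.strip()


def orchestrate(command, datasets, max_workers=4):
    """Apply the same command to every dataset in parallel.

    Returns a dict mapping each dataset name to the tool's output.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(lambda d: run_tool(command, d), datasets))


# Hypothetical stand-in for the existing software: a command that simply
# prints the length of its single argument. A real deployment would put the
# actual simulation or optimization executable here.
STAND_IN_TOOL = [sys.executable, "-c", "import sys; print(len(sys.argv[1]))"]
```

Calling `orchestrate(STAND_IN_TOOL, ["site_a", "site_b"])` returns one output per dataset. Threads are sufficient here because the work happens in the child processes, not in the Python interpreter.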
SC4: Traffic conditions estimation
Transport
 Estimation of real-time traffic conditions in Thessaloniki
 Combines:
o Traffic modelling from historical data
o Current measurements from a taxi fleet of 1200 vehicles
SC4 Pilot: Points Demonstrated
Transport
 New Flink implementations of map
matching and traffic prediction algorithms
 BDI provides access to a variety of data
sources
o PostGIS database with city map
o Elasticsearch database of historical data
o Kafka stream of real-time data
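Map matching, mentioned above, assigns each vehicle position to a road segment. The slides do not describe the pilot's Flink algorithm, so the sketch below shows only the simplest form of the technique (snapping a point to the geometrically nearest segment), in plain Python and assuming planar coordinates. The function names and the segment dictionary layout are assumptions made for this example.

```python
import math


def project_onto_segment(point, start, end):
    """Return the closest point on segment start-end to `point`, and its distance."""
    px, py = point
    ax, ay = start
    bx, by = end
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        t = 0.0  # degenerate segment: start and end coincide
    else:
        # Parameter of the orthogonal projection, clamped to the segment.
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    cx, cy = ax + t * dx, ay + t * dy
    return (cx, cy), math.hypot(px - cx, py - cy)


def match_to_road(point, segments):
    """Snap a position to the nearest road segment.

    `segments` maps a segment id to a (start, end) pair of (x, y) tuples.
    Returns (segment_id, snapped_point, distance).
    """
    best_id, best_point, best_dist = None, None, math.inf
    for seg_id, (start, end) in segments.items():
        snapped, dist = project_onto_segment(point, start, end)
        if dist < best_dist:
            best_id, best_point, best_dist = seg_id, snapped, dist
    return best_id, best_point, best_dist
```

Production map matchers usually also consider heading, speed, and the sequence of previous positions; nearest-segment snapping is only the baseline.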
SC5: Inverse source estimation
Climate
 Estimation of the location of a substance
release when the information available is
monitoring readings
 Next step:
o Combine heterogeneous, open data regarding population and hospital locations in the affected area
SC5 Pilot: Points Demonstrated
Climate
 BDI offers:
o HDFS to store NetCDF and GRIB files
o Strabon and Virtuoso to store Linked Open
Geo Data about demographics
o Cassandra to store hospital information
o Semagrow to provide a transparent access
point for Strabon, Virtuoso and Cassandra
SC6: Municipality budgets
Social Sciences
 Ingestion of budget and budget execution data
 Multiple municipalities in varying formats and data models
 Homogenized data made available for analysis and comparison
SC6 Pilot: Points Demonstrated
Social Sciences
 Existing analytics and visualization tools
o Use SPARQL queries to retrieve only the
relevant slices of the overall data
 BDI is deployed as an ingestion and
storage infrastructure for external tools
o Ingests and homogenizes a constant flow of
heterogeneous data formats and data
models
o Exposes the homogenized data through a SPARQL endpoint backed by 4store, a scalable distributed RDF store
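Retrieving "only the relevant slices" through a SPARQL endpoint, as described above, could look like the sketch below. It only builds the query text and the GET request URL defined by the SPARQL 1.1 Protocol; it does not contact any server. The endpoint address and the `ex:` vocabulary are hypothetical placeholders, not the pilot's actual data model.

```python
from urllib.parse import urlencode

# Hypothetical endpoint address and vocabulary, for illustration only.
ENDPOINT = "http://localhost:8000/sparql/"
PREFIX = "http://example.org/budget#"


def escape_literal(value):
    """Escape backslashes and double quotes for use inside a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def budget_slice_query(municipality, year, limit=100):
    """Build a SELECT query for one municipality and one year."""
    return (
        f"PREFIX ex: <{PREFIX}>\n"
        "SELECT ?item ?amount WHERE {\n"
        f'  ?item ex:municipality "{escape_literal(municipality)}" ;\n'
        f"        ex:year {int(year)} ;\n"
        "        ex:amount ?amount .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def request_url(endpoint, query):
    """Encode the query as a SPARQL 1.1 Protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})
```

An analytics tool would send the URL from `request_url(ENDPOINT, budget_slice_query("Thessaloniki", 2016))` with an `Accept` header for the result format it wants, so that only the requested slice leaves the store.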
SC7: Change detection & verification
Secure Societies
 Events are extracted from text published by news agencies and on social networking sites
 Events are geo-located and relevant changes are detected by comparing current and previous satellite images
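Comparing a current image with a previous one can be illustrated with plain image differencing: mark every pixel whose value changed by more than a threshold. This is a deliberately simple stand-in, since the slides do not specify the pilot's change detection algorithms. Images are assumed to be co-registered, single-band grids given as lists of rows, and the function names and the threshold are assumptions for this sketch.

```python
def changed_pixels(previous, current, threshold):
    """Return a boolean mask that is True where |current - previous| > threshold."""
    if len(previous) != len(current) or any(
        len(p_row) != len(c_row) for p_row, c_row in zip(previous, current)
    ):
        raise ValueError("images must have the same shape")
    return [
        [abs(c - p) > threshold for p, c in zip(p_row, c_row)]
        for p_row, c_row in zip(previous, current)
    ]


def changed_fraction(mask):
    """Return the share of pixels flagged as changed (0.0 for an empty mask)."""
    total = sum(len(row) for row in mask)
    changed = sum(1 for row in mask for cell in row if cell)
    return changed / total if total else 0.0
```

Because each pixel is compared independently, the same operation parallelizes naturally over image tiles.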
SC7 Pilot: Points Demonstrated
Secure Societies
 Re-implementation of change detection
algorithms for Spark
 Parallel orchestrator for text analytics
o Re-uses existing software
o Scales to many input streams
 BDI provides:
o Cassandra for text content and metadata
o Strabon GIS store for detected change
location
o Homogeneous access to both for analysis
Questions?
 BigDataEurope website: https://www.big-data-europe.eu
 Big Data Integrator: https://github.com/big-data-europe
 Thank you for your attention!
