DEMONSTRATING THE SOCIETAL VALUE OF BIG & SMART DATA MANAGEMENT
Apache Big Data Europe, Seville, 14 November 2016
Talk outline
The BigDataEurope Project & Mission
The Big Data Integrator (BDI) platform
7 Pilots for the 7 Societal Challenge Domains
A look into the BDI platform [DEMO]
Collocated Event – Today @ 16:30
14-nov.-16 www.big-data-europe.eu
Supporting the Societal Domains with Big Data Technology
BigDataEurope Project
BigDataEurope Action
EC Horizon 2020 Coordination & Support Action
o ~€5 million, 2015-2017
Show societal value of Big Data
o Across all societal challenges addressed by H2020
Lower barrier for using big data technologies
o Effort and resources to convert tools and workflows
o Skills and expertise
Help establish data value chains across domains & orgs.
Consortium
NCSR DEMOKRITOS
Stakeholder Engagement Cycle
Present action, showcase
deployments
Raise awareness about BDE results,
what they mean for stakeholders
Collect requirements to drive
further development
Timeline: M6, M12, M18, M24, M30
Data Value Chain Evolution
[Figure: the value chain steps (Extraction, Curation, Quality, Linking, Integration, Publication, Visualization, Analysis), fed by Data Repositories and Linked Open Data, evolving over time across the Health, Transport, Security, Food, Societies, Climate and Energy domains, from proprietary, ‘locked-in’ solutions towards OS solutions and Big Data stacks]
Variety – The most neglected V?
Data source heterogeneity
Lack of interoperability/semantics
Source: Gesellschaft für Informatik
A flexible, generic platform for (Big) Data Value
Chain Deployment
Big Data Integrator
Big Data Integrator
Prototype developed by BDE
o Incorporates existing BD technology
o Facilitates integration and deployment
Main points of the architecture
o Dockerization
o Support layer, including integrated UI
o Semantification layer
Generic Architecture
Plug-and-play BD Platform
Cloud-deployment ready
Domain independent, Customisable
Stacks Open Source solutions
BDI Prototype Releases
1. [July 2016]
2. December 2016
3. ….
Docker containers
Docker offers lightweight virtualization
o Containers can be shared/provisioned on different Linux variations/versions
Identical base system
o NOT required
All BDI components
o Packaged as Docker containers
BDI Docker Containers (so far)
Data serving: HDFS, Cassandra, 4store, PostGIS, Strabon, Elasticsearch, Hive, Semagrow
Processing: Spark, Flink, SANSA
Stream ingestion middleware: Flume, Kafka
BDI Instances – An example
Processing and storage components
o Re-used existing Docker containers (where available)
o Dockerized by BDE otherwise
o Ensuring all can be provisioned through Docker Swarm
Other BDI Components:
o Support Layer
o Semantic Layer
Semantic Layer
Semantic Data Lakes
o Minimal pre-processing on ingestion
o Semantic layer maintains metadata
o Meaning is added when retrieving/processing
Data Lake: scalable unstructured data store
Relationship definitions and metadata: JSON-LD, CSVW, R2RML, XML2RDF
Ongoing research for Semantic Big Data & Analytics: Knowledge Graphs
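The idea of adding meaning only when data is retrieved can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not BDE's implementation: the raw CSV stays untouched in the lake, and a JSON-LD context is attached at read time so that plain column names acquire vocabulary IRIs. The function `csv_to_jsonld`, the `CONTEXT` mapping and all example.org IRIs are hypothetical.

```python
import csv
import io
import json

# Hypothetical JSON-LD context: maps raw CSV column names to vocabulary IRIs.
# The example.org IRIs are illustrative only.
CONTEXT = {
    "turbine_id": "http://example.org/vocab#turbineId",
    "wind_speed": {
        "@id": "http://example.org/vocab#windSpeed",
        "@type": "http://www.w3.org/2001/XMLSchema#decimal",
    },
}


def csv_to_jsonld(raw_csv: str, id_column: str, base_iri: str) -> dict:
    """Lift raw CSV rows to a JSON-LD document at read time.

    The stored CSV is never modified; meaning comes from CONTEXT,
    which is attached only when the data is retrieved.
    """
    graph = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        node = {"@id": base_iri + row[id_column]}
        node.update(row)  # values stay as the original strings
        graph.append(node)
    return {"@context": CONTEXT, "@graph": graph}


doc = csv_to_jsonld(
    "turbine_id,wind_speed\nT1,7.5\nT2,9.1\n",
    id_column="turbine_id",
    base_iri="http://example.org/turbine/",
)
print(json.dumps(doc, indent=2))
```

Keeping the context separate from the stored data means the same raw file can be reinterpreted later with a different vocabulary without re-ingesting it.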
Semantic Layer tools
BDE tooling for the Semantic Data Lake:
o Swagger: semantics of RESTful APIs
o Semantic Analytics Stack (SANSA): distributed data processing over large-scale Knowledge Graphs
o Semagrow: SPARQL over Big Data stores
o Ontario: querying over Semantic Data Lakes
More Information
Big Data Integrator:
https://github.com/big-data-europe
README includes extensive documentation, instructions
and information on supported components
“Integrators at Work! Real-Life Applications of
Apache Big Data Components” @4:30 PM
o Includes more details & demo
Demonstrating the Societal Value through 7 ‘Real-world’ Pilot Use Cases
BigDataEurope Pilots
Pilots: Overview
SC1: Health & Pharm.
SC2: Food & Agr.
SC3: Energy
SC4: Transport
SC5: Climate
SC6: Social Sciences
SC7: Security
7 Pilots
◎ BDI Platform Instantiations
o Allow end-users to easily deploy functionality in their own system environment
o Modularized Docker approach - easier to replace components
o Reduces effort to keep 3rd party software updated & integrated
◎ 7 Societal Challenge Pilots
o Aligned with the 7 European Commission H2020 Societal Challenges
o Real-world use-cases (Data, Objectives, Solutions)
o Some pilots have different data & objectives but a similar solution
SC1: Pharmacology research
Life
Sciences
& Health
• Queries over a large number of datasets, some of them large
• Existing elaborate ingestion and homogenization by Open PHACTS
• Extensive toolset developed by the Open PHACTS Foundation (OPF) and others
Objective: Large-scale linking & integration of heterogeneous pharma-research data
SC1: Architecture & Components
• Replicate Open PHACTS functionality on the BDE infrastructure using open-source solutions
• Currently based on Virtuoso, a proprietary distributed database
• Apply to other domains (e.g., Agriculture)
• Porting to BDI gives flexibility and enables new functionality
• Logging & system health monitoring
SC2: Viticulture resources
Food and
Agriculture
Objective: Automate publication ingestion and
thematic classification
• AgInfra is a major
infrastructure for
agriculture
researchers, serving
cross-linked
bibliography, data,
and processing
services
SC2: Architecture & Components
• BDI deployed as an external
infrastructure for processing
text (viticulture publications)
• Storing and processing text at
a larger scale than AgInfra
can currently manage
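To make "thematic classification" concrete, here is a deliberately simple keyword-matching classifier. It is a swapped-in technique for illustration only: the slides do not say which classifier the pilot uses, and the topic vocabulary and the names `TOPICS`, `tokenize` and `classify` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical topic vocabulary for viticulture publications.
# A real system would use a curated thesaurus or a trained model instead.
TOPICS = {
    "disease": {"mildew", "botrytis", "pathogen", "fungicide"},
    "irrigation": {"irrigation", "drought", "water", "evapotranspiration"},
    "oenology": {"fermentation", "yeast", "wine", "tannin"},
}


def tokenize(text: str) -> list:
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify(text: str, min_hits: int = 1) -> list:
    """Return topics with at least min_hits keyword occurrences, best first."""
    counts = Counter(tokenize(text))
    scores = {topic: sum(counts[w] for w in words) for topic, words in TOPICS.items()}
    # Sort by descending score, then alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [topic for topic, score in ranked if score >= min_hits]


abstract = "Downy mildew and botrytis incidence in vineyards under drought stress"
print(classify(abstract))
```

Because each publication is classified independently, this kind of step parallelizes naturally over a large corpus, which is where an external processing infrastructure helps.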
SC3: Predictive maintenance
Energy
• Wind turbine monitoring
applies computational
models to sensor data
streams
• Models are re-parameterized weekly using the past week’s data from multiple turbines
Objective: Real-time turbine monitoring stream
processing and analytics
SC3: Architecture & Components
• Existing in-house, non-scalable solution for model parameterization
• Reliable Fortran software for data analysis
• Efficient, but not scalable to the data volume
• Developing a BDI orchestrator
• Re-uses the existing software unmodified
• Makes it easy to apply the software in parallel to many datasets and manage the outputs
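The orchestrator idea (run existing software unmodified, in parallel, over many datasets) can be sketched with Python's standard library. This assumes the legacy analysis code is a command-line program that takes a dataset path as an argument; the names `run_unmodified` and `orchestrate`, the `{dataset}` placeholder convention, and the Python one-liner standing in for the Fortran binary are all hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def run_unmodified(cmd_template: List[str], dataset: str) -> str:
    """Invoke an existing analysis program as-is, substituting the dataset path.

    The program itself is untouched; only its command line is templated.
    """
    cmd = [arg.replace("{dataset}", dataset) for arg in cmd_template]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def orchestrate(datasets: Iterable[str], analyze: Callable[[str], str],
                max_workers: int = 4) -> Dict[str, str]:
    """Apply `analyze` to every dataset in parallel and collect outputs by dataset."""
    datasets = list(datasets)  # materialize so it can be zipped with the results
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = list(pool.map(analyze, datasets))
    return dict(zip(datasets, outputs))


# Stand-in for the legacy Fortran binary: a Python one-liner that echoes its input.
legacy_cmd = [sys.executable, "-c", "print('processed {dataset}')"]
results = orchestrate(["turbine_a.dat", "turbine_b.dat"],
                      lambda d: run_unmodified(legacy_cmd, d))
print(results)
```

Threads are sufficient here because each worker mostly waits on an external process; in a BDI deployment the same fan-out would more plausibly be spread across containers than across threads on one host.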
SC4: Traffic conditions estimation
Transport
• Combines:
• Traffic modelling from
historical data
• Current measurements from a
taxi fleet of 1200 vehicles
Objective: Estimation of real-time traffic
conditions in Thessaloniki
SC4: Architecture & Components
• New Flink implementations of map-matching and traffic prediction algorithms
• BDI provides access to varied data sources
• PostGIS database with the city map
• Elasticsearch database of historical data
• Kafka stream of real-time data
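Map matching snaps each noisy GPS fix from the taxi fleet to a road segment of the city map. The pilot implements this in Flink; the sketch below shows only the simplest geometric variant (nearest segment within a distance cutoff) in plain Python, assuming planar coordinates in metres and a hypothetical `Segment` type. Production map matchers usually also exploit the sequence of fixes (e.g. with hidden Markov models), which this sketch ignores.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical road segment from A=(ax, ay) to B=(bx, by), planar metres."""
    road_id: str
    ax: float
    ay: float
    bx: float
    by: float


def point_segment_distance(px: float, py: float, s: Segment) -> float:
    """Euclidean distance from point P to the closest point on segment AB."""
    dx, dy = s.bx - s.ax, s.by - s.ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        t = 0.0  # degenerate segment: A and B coincide
    else:
        # Projection parameter of P onto AB, clamped to stay on the segment.
        t = max(0.0, min(1.0, ((px - s.ax) * dx + (py - s.ay) * dy) / length_sq))
    cx, cy = s.ax + t * dx, s.ay + t * dy
    return math.hypot(px - cx, py - cy)


def match(points, segments, max_dist: float):
    """Snap each GPS fix to its nearest segment; None if none lies within max_dist."""
    matched = []
    for px, py in points:
        best = min(segments, key=lambda s: point_segment_distance(px, py, s))
        d = point_segment_distance(px, py, best)
        matched.append(best.road_id if d <= max_dist else None)
    return matched


segments = [Segment("A", 0, 0, 100, 0), Segment("B", 0, 0, 0, 100)]
fixes = [(50, 3), (2, 60), (80, 80)]
print(match(fixes, segments, max_dist=10))  # ['A', 'B', None]
```

In the pilot's setting the segments would come from the PostGIS city map and the fixes from the Kafka stream; the linear scan over all segments here would be replaced by a spatial index at city scale.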
SC5: Climate modelling
Climate
• Preparing modelling experiments
• Slicing, transforming, combining datasets
• Submission and retrieval from modelling
infrastructure
• Discovering and re-using previously
computed derivatives
• Lineage annotation: derivatives computed from datasets and model parameters
• Finding appropriate past runs avoids
repeating weeks-long modelling runs
Objective: Supporting data-intensive climate research
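Finding an appropriate past run amounts to looking up a run by its lineage: the input datasets plus the model parameters. A minimal sketch, assuming lineage can be reduced to those two things, is to hash them into a deterministic key. Here an in-memory dict stands in for the metadata store, and `lineage_key`, `RunCatalog` and the file names are hypothetical.

```python
import hashlib
import json
from typing import Dict, Iterable, Optional


def lineage_key(input_datasets: Iterable[str], parameters: dict) -> str:
    """Deterministic fingerprint of a run: same inputs and parameters give the same key.

    Inputs are sorted and dict keys are serialized in sorted order, so neither
    argument order nor parameter order affects the key.
    """
    payload = json.dumps(
        {"inputs": sorted(input_datasets), "params": parameters}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class RunCatalog:
    """In-memory stand-in for a lineage metadata store."""

    def __init__(self) -> None:
        self._runs: Dict[str, str] = {}

    def record(self, inputs: Iterable[str], params: dict, output_location: str) -> None:
        self._runs[lineage_key(inputs, params)] = output_location

    def find(self, inputs: Iterable[str], params: dict) -> Optional[str]:
        """Return where a matching past run's output lives, or None."""
        return self._runs.get(lineage_key(inputs, params))


catalog = RunCatalog()
catalog.record(["sst_2010.nc", "t2m_2010.nc"], {"resolution_km": 25, "years": 10}, "runs/0042")
# Same lineage, different ordering: the earlier run is found and need not be repeated.
print(catalog.find(["t2m_2010.nc", "sst_2010.nc"], {"years": 10, "resolution_km": 25}))
```

Exact-match lookup is the simplest policy; deciding whether a run with slightly different parameters is "appropriate" would need domain rules on top of this.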
SC5: Architecture & Components
• BDI offers:
• Hive for managing data in a form that can be queried and manipulated, rather than as raw file blocks
• Cassandra for storing structured and textual metadata, so that headers and lineage can be searched
• Existing infrastructure: stable, reliable software for parallel computation of models
• BDI is deployed as an external infrastructure for preparing and managing datasets
SC6: Municipality budgets
Social
Sciences
• Ingestion of budget and
budget execution data
• Multiple municipalities in
varied formats and data
models
Objective: Homogenized budgetary data made available for analysis and comparison
SC6: Architecture & Components
• BDI deployed as ingestion and storage infrastructure for external tools
• Homogenizes a variety of data formats (JSON, CSV, XML, etc.)
• Exposes the homogenized data through a SPARQL endpoint
• Existing analytics and visualization tools
• Use SPARQL queries to retrieve only the relevant slices of the overall data
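The homogenization step can be pictured as one small parser per source format, all emitting the same record shape. This sketch assumes a hypothetical common model `(municipality, year, budget_line, amount)` and hypothetical field names for each source format; mapping the records to RDF for the SPARQL endpoint would be a further step that is not shown here.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical common model: (municipality, year, budget_line, amount)
Record = Tuple[str, int, str, float]


def from_csv(text: str, municipality: str) -> List[Record]:
    # Assumed columns: year, line, amount
    return [(municipality, int(r["year"]), r["line"], float(r["amount"]))
            for r in csv.DictReader(io.StringIO(text))]


def from_json(text: str, municipality: str) -> List[Record]:
    # Assumed shape: {"fiscalYear": ..., "items": [{"code": ..., "value": ...}]}
    data = json.loads(text)
    return [(municipality, int(data["fiscalYear"]), item["code"], float(item["value"]))
            for item in data["items"]]


def from_xml(text: str, municipality: str) -> List[Record]:
    # Assumed shape: <budget year="..."><entry code="...">amount</entry></budget>
    root = ET.fromstring(text)
    year = int(root.get("year"))
    return [(municipality, year, e.get("code"), float(e.text))
            for e in root.findall("entry")]


PARSERS = {"csv": from_csv, "json": from_json, "xml": from_xml}


def homogenize(sources) -> List[Record]:
    """sources: iterable of (municipality, format, raw_text) tuples."""
    records: List[Record] = []
    for municipality, fmt, text in sources:
        records.extend(PARSERS[fmt](text, municipality))
    return records


records = homogenize([
    ("Town A", "csv", "year,line,amount\n2016,roads,1200.5\n"),
    ("Town B", "json", '{"fiscalYear": "2016", "items": [{"code": "roads", "value": 980}]}'),
    ("Town C", "xml", '<budget year="2016"><entry code="roads">450.0</entry></budget>'),
])
print(records)
```

Adding a new municipality with yet another format then means adding one parser to `PARSERS`, while downstream analysis tools keep seeing the same record shape.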
SC7: Change detection & verification
Secure
Societies
• Events are extracted from text
published by news agencies and
on social networking sites
• Events are geo-located and
relevant changes are detected by
comparing current and previous
satellite images
Objective: Detect and Verify Events based on Satellite
Imagery, News and Social Media
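The change-detection step compares co-registered "before" and "after" satellite images of the geo-located event area. The sketch below uses plain per-pixel differencing with a threshold, one of the simplest change-detection techniques; it is not necessarily what the pilot uses, and the thresholds and the names `change_mask`, `changed_fraction` and `event_supported` are hypothetical. Images are assumed to be equally sized grayscale grids.

```python
from typing import List

Image = List[List[int]]  # grayscale intensities, one inner list per row


def change_mask(before: Image, after: Image, threshold: int) -> List[List[bool]]:
    """Flag pixels whose intensity changed by more than `threshold`."""
    if len(before) != len(after) or any(len(rb) != len(ra) for rb, ra in zip(before, after)):
        raise ValueError("images must be co-registered and have identical shapes")
    return [[abs(b - a) > threshold for b, a in zip(rb, ra)]
            for rb, ra in zip(before, after)]


def changed_fraction(mask: List[List[bool]]) -> float:
    """Share of pixels flagged as changed (0.0 for an empty mask)."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0


def event_supported(before: Image, after: Image,
                    threshold: int = 30, min_fraction: float = 0.05) -> bool:
    """True if enough of the area changed to corroborate a reported event."""
    return changed_fraction(change_mask(before, after, threshold)) >= min_fraction


before = [[10, 10, 10, 10], [10, 10, 10, 10]]
after = [[10, 200, 10, 10], [10, 10, 10, 10]]
print(changed_fraction(change_mask(before, after, 30)))  # 0.125
print(event_supported(before, after))  # True
```

Real imagery would need radiometric correction and co-registration first, and the event location extracted from news and social media decides which tile pair is compared.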
2nd round of Societal Workshops
Domain | Date | Location | Setting
Transport | 22 September 2016 | Brussels | Collocated with Big Data for Transport, Tisa workshop
Food & Agri | 30 September 2016 | Brussels | Collocated with DG AGRI WP2018-20 stakeholder consultation
Energy | 4 October 2016 | Brussels | Collocated with EC H2020 Info Day on “Smart Grids and Storage”
Climate | 11 October 2016 | Brussels | Collocated with Melodies Project Event – Exploiting Open Data
Security | 18 October 2016 | Brussels | Standalone workshop
Societies | 5 December 2016 | Cologne | Collocated with EDDI16, 8th Annual European DDI User Conference
Health | 9 December 2016 | Brussels | Standalone workshop
Other Activities
A fresh set of 7 Societal Workshops in 2017
Various SC-focused and general hangouts; follow us!
o Apache Flink & BDE (20 Oct), available online
o More to follow!
o Keep track via the BDE website (Events)
Demonstrating the ease of deploying
custom instances of the BDI Platform
BDI Platform – A Demo
WEB: www.big-data-europe.eu EMAIL: info@big-data-europe.eu
BIG DATA INTEGRATOR
www.github.com/big-data-europe
PROJECT COORDINATION (Fraunhofer IAIS)
Prof. Sören Auer, auer © cs.uni-bonn · de
Dr. Simon Scerri, scerri © cs.uni-bonn · de
EIS Department/Group,
Fraunhofer IAIS & CS Department Uni-Bonn, Bonn, Germany
Questions & Contacts
#BigDataEurope
leads the Fraunhofer
Big Data Alliance
