H2020 BigDataEurope is a flagship project of the European Union's Horizon 2020 framework programme for research and innovation. In this talk, we present the Docker-based BigDataEurope platform, which integrates a variety of Big Data processing components such as Hive, Cassandra, Apache Flink and Spark. To address the variety dimension of Big Data in particular, the platform adds a semantic data processing layer that allows users to ingest, map, transform and exploit semantically enriched data. We present the platform's innovative technical architecture as well as its applications in life sciences (Open PHACTS), mobility, food & agriculture and industrial analytics (predictive maintenance). We demonstrate how Big Data analytics can generate societal value, e.g. by making transportation networks more efficient or by facilitating drug research.
2. Talk outline
The BigDataEurope Project & Mission
The Big Data Integrator (BDI) platform
7 Pilots for the 7 Societal Challenge Domains
A look into the BDI platform [DEMO]
Collocated Event – Today @ 16:30
14-nov.-16 www.big-data-europe.eu
3. Supporting the Societal Domains with Big Data Technology
BigDataEurope Project
4. BigDataEurope Action
EC Horizon 2020 Coordination & Support Action
o ~€5 million, 2015-2017
Show societal value of Big Data
o Across all societal challenges addressed by H2020
Lower the barrier to using Big Data technologies
o Effort and resources to convert tools and workflows
o Skills and expertise
Help establish data value chains across domains & orgs.
6. Stakeholder Engagement Cycle
Present the action, showcase deployments
Raise awareness about BDE results and what they mean for stakeholders
Collect requirements to drive further development
[Timeline: engagement cycle at project months M6, M12, M18, M24, M30]
7. Data Value Chain Evolution
[Figure: evolution of the data value chain over time, from proprietary, ‘locked-in’ solutions to open-source (OS) solutions and Big Data stacks, building on data repositories and Linked Open Data. Value chain steps: Extraction, Curation, Quality, Linking, Integration, Publication, Visualization, Analysis. Domains: Health, Transport, Security, Food, Societies, Climate, Energy.]
8. Variety – The most neglected V?
[Figure (source: Gesellschaft für Informatik): data source heterogeneity; lack of interoperability/semantics]
9. A flexible, generic platform for (Big) Data Value Chain Deployment
Big Data Integrator
10. Big Data Integrator
Prototype developed by BDE
o Incorporates existing BD technology
o Facilitates integration and deployment
Main points of the architecture
o Dockerization
o Support layer, including integrated UI
o Semantification layer
14. BDI Instances – An example
Processing and storage components
o Re-used existing Docker images (where available)
o Dockerized by BDE otherwise
o Ensuring all can be provisioned through Docker Swarm
Other BDI Components:
o Support Layer
o Semantic Layer
15. Supporting the Societal Domains with Big Data Technology
BigDataEurope Project
16. Semantic Layer
Semantic Data Lakes
o Minimal pre-processing at ingestion
o Semantic layer maintains metadata
o Meaning is added when retrieving/processing data
Data Lake: scalable store for unstructured data
Relationship definitions and metadata: JSON-LD, CSVW, R2RML, XML2RDF
Ongoing research on Semantic Big Data & Analytics
Knowledge Graphs
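The semantic data lake idea above (store raw data with minimal pre-processing, keep meaning in a separate metadata layer, apply it when data is read) can be sketched as a small schema-on-read store. This is a minimal Python sketch under stated assumptions, not BDE's implementation: the class and method names are hypothetical, records are assumed to be flat dictionaries, and the schema.org IRIs serve only as example vocabulary terms in the JSON-LD `@context`.

```python
import json
from typing import Any


class SemanticDataLake:
    """Hypothetical schema-on-read store: raw data in, meaning added on read."""

    def __init__(self) -> None:
        # dataset name -> raw records, stored exactly as ingested
        self._raw: dict[str, list[dict[str, Any]]] = {}
        # dataset name -> {field name: vocabulary IRI}; this is the metadata layer
        self._mappings: dict[str, dict[str, str]] = {}

    def ingest(self, dataset: str, records: list[dict[str, Any]]) -> None:
        # Minimal pre-processing: no schema is enforced, records are copied as-is.
        self._raw.setdefault(dataset, []).extend(dict(r) for r in records)

    def register_mapping(self, dataset: str, field_to_iri: dict[str, str]) -> None:
        # Only metadata changes here; the stored records are never rewritten.
        self._mappings.setdefault(dataset, {}).update(field_to_iri)

    def read_as_jsonld(self, dataset: str) -> list[dict[str, Any]]:
        # Meaning is attached at retrieval time via a JSON-LD @context.
        mapping = self._mappings.get(dataset, {})
        documents = []
        for record in self._raw.get(dataset, []):
            context = {k: iri for k, iri in mapping.items() if k in record}
            documents.append({"@context": context, **record})
        return documents


if __name__ == "__main__":
    lake = SemanticDataLake()
    lake.ingest("stations", [{"name": "Station 1", "lat": 40.63, "lon": 22.94}])
    lake.register_mapping("stations", {
        "name": "http://schema.org/name",
        "lat": "http://schema.org/latitude",
        "lon": "http://schema.org/longitude",
    })
    print(json.dumps(lake.read_as_jsonld("stations"), indent=2))
```

Keeping the mappings outside the stored records means a dataset can be re-interpreted later (for example, mapped to a different vocabulary) without re-ingesting it.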
17. Semantic Layer tools
BDE tooling for Semantic Data Lake:
o Swagger: Semantics of RESTful APIs
o Semantic Analytics Stack (SANSA): Distributed data processing over large-scale Knowledge Graphs
o Semagrow: SPARQL over Big Data stores
o Ontario: Querying over Semantic Data Lakes
18. More Information
Big Data Integrator:
https://github.com/big-data-europe
README includes extensive documentation, instructions and information on supported components
“Integrators at Work! Real-Life Applications of Apache Big Data Components” @ 4:30 PM
o Includes more details & demo
19. Demonstrating the Societal Value through 7 Pilots: ‘Real-world’ use cases
BigDataEurope Pilots
20. Pilots: Overview
SC1: Health & Pharm.
SC2: Food & Agr.
SC3: Energy
SC4: Transport
SC5: Climate
SC6: Social Sciences
SC7: Security
21. 7 Pilots
◎ BDI Platform Instantiations
o Allow end users to easily deploy functionality in their own system environments
o Modularized Docker approach makes it easier to replace components
o Reduces the effort to keep 3rd-party software updated & integrated
◎ 7 Societal Challenge Pilots
o Aligned with the 7 European Commission H2020 Societal Challenges
o Real-world use cases (data, objectives, solutions)
o Some pilots have different data & objectives but a similar solution
22. SC1: Pharmacology research
Life Sciences & Health
• Query a large number of datasets, some of them large
• Existing elaborate ingestion and homogenization by Open PHACTS
• Extensive toolset developed by the Open PHACTS Foundation (OPF) and others
Objective: Large-scale heterogeneous pharma-research data linking & integration
23. SC1: Architecture & Components
• Replicate Open PHACTS functionality on the BDE infrastructure using open-source (OS) solutions
• Open PHACTS is based on Virtuoso, a proprietary distributed database
• Apply to other domains (e.g. Agriculture)
• Porting to BDI gives flexibility and enables new functionalities
• Logging & system health monitoring
24. SC2: Viticulture resources
Food and Agriculture
Objective: Automate publication ingestion and thematic classification
• AgInfra is a major infrastructure for agriculture researchers, serving cross-linked bibliography, data, and processing services
25. SC2: Architecture & Components
• BDI deployed as an external infrastructure for processing text (viticulture publications)
• Storing and processing text at a larger scale than AgInfra can currently manage
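As an illustration of automated thematic classification of publications, the following minimal keyword-based classifier assigns topics by counting vocabulary matches. The topic names and term sets are hypothetical placeholders; the pilot's actual classification method is not described here, and a production setup would more likely use a curated thesaurus or a trained model.

```python
import re
from collections import Counter

# Hypothetical topic vocabulary (placeholder terms, not the pilot's thesaurus).
TOPICS: dict[str, set[str]] = {
    "pest_control": {"pest", "pesticide", "fungicide", "mildew"},
    "irrigation": {"irrigation", "drought", "water"},
    "oenology": {"wine", "fermentation", "yeast"},
}


def classify(text: str, topics: dict[str, set[str]] = TOPICS,
             min_hits: int = 1) -> list[str]:
    """Return topics whose terms occur at least `min_hits` times, best match first."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {topic: sum(tokens[term] for term in terms)
              for topic, terms in topics.items()}
    selected = [topic for topic, score in scores.items() if score >= min_hits]
    # Sort by descending score, then alphabetically for a stable result.
    return sorted(selected, key=lambda topic: (-scores[topic], topic))


if __name__ == "__main__":
    abstract = "Control of downy mildew with fungicide under drought stress"
    print(classify(abstract))  # ['pest_control', 'irrigation']
```

Because the classifier is a pure function of the text, it can be applied to each ingested publication independently, which makes it easy to parallelize on a larger text store.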
26. SC3: Predictive maintenance
Energy
• Wind turbine monitoring applies computational models to sensor data streams
• Models are re-parameterized weekly using the week's data from multiple turbines
Objective: Real-time stream processing and analytics for turbine monitoring
27. SC3: Architecture & Components
• Existing in-house, non-scalable solution for model parameterization
• Reliable Fortran software for data analysis
• Efficient, but does not scale to the data volume
• Developing a BDI orchestrator
• Re-uses the existing software unmodified
• Makes it easy to run it in parallel on many datasets and manage the outputs
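The orchestrator pattern described above (re-use an existing program unmodified, run it in parallel over many datasets, collect the outputs) might look like the following Python sketch. The function names are hypothetical, the external program is assumed to take the dataset path as its last argument and write its results to stdout, and a short Python one-liner stands in for the Fortran binary in the usage example.

```python
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_unmodified(command: list[str], dataset: Path, out_dir: Path) -> Path:
    """Run an existing program, unchanged, on one dataset and store its stdout."""
    result = subprocess.run(
        [*command, str(dataset)], capture_output=True, text=True, check=True
    )
    # Assumes dataset file stems are unique, so output names do not collide.
    out_file = out_dir / f"{dataset.stem}.out"
    out_file.write_text(result.stdout)
    return out_file


def orchestrate(command: list[str], datasets: list[Path], out_dir: Path,
                workers: int = 4) -> dict[str, Path]:
    """Apply the same command to many datasets in parallel; map dataset -> output."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Threads suffice: each worker only waits on its own child process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = list(pool.map(lambda d: run_unmodified(command, d, out_dir), datasets))
    return {d.name: out for d, out in zip(datasets, outputs)}


if __name__ == "__main__":
    # Stand-in for the Fortran binary: prints the number of lines in its input.
    stand_in = [sys.executable, "-c",
                "import sys; print(sum(1 for _ in open(sys.argv[1])))"]
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        files = []
        for i in range(3):
            f = root / f"turbine{i}.csv"
            f.write_text("reading\n" * (i + 1))
            files.append(f)
        for name, out in orchestrate(stand_in, files, root / "out").items():
            print(name, out.read_text().strip())
```

Threads are sufficient here because the heavy computation happens in the child processes; on a cluster, the same fan-out would be handed to a scheduler or container orchestrator instead.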
28. SC4: Traffic conditions estimation
Transport
• Combines:
• Traffic modelling from historical data
• Current measurements from a taxi fleet of 1200 vehicles
Objective: Estimation of real-time traffic conditions in Thessaloniki
29. SC4: Architecture & Components
• New Flink implementations of map matching and traffic prediction algorithms
• BDI provides access to varied data sources:
• PostGIS database with city map
• Elasticsearch database of historical data
• Kafka stream of real-time data
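Map matching snaps noisy GPS positions, such as those reported by the taxi fleet, onto road segments of the city map. The sketch below shows only the simplest geometric variant (nearest segment within a distance cutoff) in plain Python; it is not the pilot's Flink implementation, it assumes coordinates already projected to a planar metric system, and the `Segment` type and the 50 m cutoff are hypothetical. Practical map matchers usually also take the sequence of positions into account, which this sketch ignores.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    """Hypothetical road segment in planar (projected, metre) coordinates."""
    segment_id: str
    x1: float
    y1: float
    x2: float
    y2: float


def point_segment_distance(px: float, py: float, s: Segment) -> float:
    """Euclidean distance from point (px, py) to the closest point on segment s."""
    dx, dy = s.x2 - s.x1, s.y2 - s.y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        t = 0.0  # degenerate segment: treat it as a single point
    else:
        # Project onto the segment's line, then clamp the projection to the endpoints.
        t = ((px - s.x1) * dx + (py - s.y1) * dy) / length_sq
        t = max(0.0, min(1.0, t))
    return math.hypot(px - (s.x1 + t * dx), py - (s.y1 + t * dy))


def match_point(px: float, py: float, segments: list[Segment],
                max_distance: float = 50.0) -> Optional[str]:
    """Return the id of the nearest segment, or None if none is within max_distance."""
    best_id, best_dist = None, math.inf
    for s in segments:
        d = point_segment_distance(px, py, s)
        if d < best_dist:
            best_id, best_dist = s.segment_id, d
    return best_id if best_dist <= max_distance else None


if __name__ == "__main__":
    roads = [Segment("A", 0, 0, 100, 0), Segment("B", 0, 10, 0, 100)]
    print(match_point(50, 3, roads))  # A
```

Because each point is matched independently, `match_point` could serve as a stateless per-record map step in a stream processor.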
30. SC5: Climate modelling
Climate
• Preparing modelling experiments
• Slicing, transforming, combining datasets
• Submission to and retrieval from the modelling infrastructure
• Discovering and re-using previously computed derivatives
• Lineage annotation: computed derivatives are linked to the datasets and model parameters they were derived from
• Finding appropriate past runs avoids repeating weeks-long modelling runs
Objective: Supporting data-intensive climate research
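Re-using previously computed derivatives requires finding a past run with the same inputs and parameters. One way to sketch this, assuming JSON-serializable parameters and that the order of input datasets does not matter, is to derive a lineage key by hashing a canonical description of the run and to consult a registry before starting a new run. The `LineageRegistry` class and the dataset identifiers are hypothetical, and the registry is kept in memory purely for illustration.

```python
import hashlib
import json
from typing import Any, Callable


class LineageRegistry:
    """Hypothetical in-memory registry of derivatives keyed by their lineage."""

    def __init__(self) -> None:
        self._runs: dict[str, dict[str, Any]] = {}

    @staticmethod
    def lineage_key(dataset_ids: list[str], parameters: dict[str, Any]) -> str:
        # Canonical JSON (sorted inputs, sorted keys) so equivalent requests hash alike.
        # Assumes input order is irrelevant and parameters are JSON-serializable.
        payload = json.dumps({"inputs": sorted(dataset_ids), "params": parameters},
                             sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_run(self, dataset_ids: list[str], parameters: dict[str, Any],
                   run: Callable[[], Any]) -> tuple[Any, bool]:
        """Return (result, reused): reuse a matching past run or execute `run`."""
        key = self.lineage_key(dataset_ids, parameters)
        if key in self._runs:
            return self._runs[key]["result"], True
        result = run()
        # Record the lineage annotation alongside the derivative.
        self._runs[key] = {"inputs": sorted(dataset_ids),
                           "params": parameters, "result": result}
        return result, False


if __name__ == "__main__":
    registry = LineageRegistry()

    def expensive_model() -> str:
        return "derivative-001"  # stand-in for a weeks-long modelling run

    print(registry.get_or_run(["dataset-a", "dataset-b"], {"resolution_km": 25}, expensive_model))
    print(registry.get_or_run(["dataset-b", "dataset-a"], {"resolution_km": 25}, expensive_model))
```

The second call returns the stored derivative with `reused=True` instead of invoking the model again, which is the saving the slide describes.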
31. SC5: Architecture & Components
• BDI offers:
• Hive for managing data in a form that can be retrieved and manipulated, rather than as file blocks
• Cassandra for storing structured and textual metadata, enabling searches over headers and lineage
• Existing infrastructure: stable, reliable software for parallel computation of models
• BDI is deployed as an external infrastructure for preparing and managing datasets
33. SC6: Architecture & Components
• BDI deployed as ingestion and storage infrastructure for external tools
• Homogenizes a variety of data (JSON, CSV, XML, etc.)
• Exposes the homogenized data through a SPARQL endpoint
• Existing analytics and visualization tools
• Use SPARQL queries to retrieve only the relevant slices of the overall data
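Homogenizing JSON, CSV and XML into one representation can be sketched by converting every record into subject-predicate-object triples, the data model that SPARQL queries operate on. The Python sketch below is a simplification under stated assumptions: flat records, a hypothetical `http://example.org/` namespace, and a single triple-pattern `match` function standing in for a real SPARQL endpoint (where `None` plays the role of a variable).

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from typing import Optional

Triple = tuple[str, str, str]
BASE = "http://example.org/"  # hypothetical namespace for subjects and properties


def _to_triples(source: str, records: list[dict]) -> list[Triple]:
    """Turn flat records into (subject, predicate, object) triples."""
    triples = []
    for i, record in enumerate(records):
        subject = f"{BASE}{source}/{i}"
        for key, value in record.items():
            triples.append((subject, f"{BASE}prop/{key}", str(value)))
    return triples


def from_csv(source: str, text: str) -> list[Triple]:
    return _to_triples(source, list(csv.DictReader(io.StringIO(text))))


def from_json(source: str, text: str) -> list[Triple]:
    return _to_triples(source, json.loads(text))  # assumes an array of flat objects


def from_xml(source: str, text: str) -> list[Triple]:
    # Assumes <root><record><field>value</field>...</record>...</root>.
    root = ET.fromstring(text)
    return _to_triples(source, [{f.tag: f.text or "" for f in rec} for rec in root])


def match(triples: list[Triple], s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """One triple pattern; None behaves like an unbound SPARQL variable."""
    return [t for t in triples
            if (s is None or t[0] == s) and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


if __name__ == "__main__":
    triples = (from_csv("survey", "country,year\nGreece,2015\n")
               + from_json("panel", '[{"country": "Greece", "year": 2016}]')
               + from_xml("census", "<data><record><country>Greece</country></record></data>"))
    for t in match(triples, p=f"{BASE}prop/country", o="Greece"):
        print(t[0])
```

Retrieving only the relevant slice then amounts to a pattern query, e.g. all records whose `country` is `Greece`, regardless of which source format they came from.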
34. SC7: Change detection & verification
Secure Societies
• Events are extracted from text published by news agencies and on social networking sites
• Events are geo-located, and relevant changes are detected by comparing current and previous satellite images
Objective: Detect and verify events based on satellite imagery, news and social media
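Detecting changes by comparing current and previous satellite images can be illustrated with simple image differencing on co-registered grayscale rasters, plus a check whether the changes cluster around a geo-located event. This is a minimal sketch under stated assumptions (images already aligned and of equal size, pixel values as plain integers, hypothetical threshold and window parameters); the pilot's actual change detection method is not described here.

```python
Image = list[list[int]]


def detect_changes(before: Image, after: Image,
                   threshold: int = 30) -> tuple[list[tuple[int, int]], float]:
    """Return changed (row, col) pixels and the fraction of the image that changed."""
    if len(before) != len(after) or any(len(b) != len(a) for b, a in zip(before, after)):
        raise ValueError("images must be co-registered and of equal size")
    changed = [(r, c)
               for r, (row_b, row_a) in enumerate(zip(before, after))
               for c, (vb, va) in enumerate(zip(row_b, row_a))
               if abs(va - vb) > threshold]
    total = sum(len(row) for row in before)
    return changed, (len(changed) / total if total else 0.0)


def verify_event(before: Image, after: Image, row: int, col: int, radius: int = 1,
                 threshold: int = 30, min_fraction: float = 0.5) -> bool:
    """Check whether enough pixels around a geo-located event (row, col) changed."""
    changed, _ = detect_changes(before, after, threshold)
    # Square window around the event pixel, clipped to the image bounds.
    window = {(r, c)
              for r in range(row - radius, row + radius + 1)
              for c in range(col - radius, col + radius + 1)
              if 0 <= r < len(before) and 0 <= c < len(before[r])}
    if not window:
        return False
    hits = sum(1 for pixel in changed if pixel in window)
    return hits / len(window) >= min_fraction


if __name__ == "__main__":
    before = [[10] * 4 for _ in range(4)]
    after = [row[:] for row in before]
    after[0][0] = after[0][1] = after[1][0] = after[1][1] = 200  # simulated change
    print(detect_changes(before, after)[1])  # 0.25
    print(verify_event(before, after, 0, 0))  # True
```

Restricting the check to a window around the geo-located event is what links the text-derived event to the imagery: widespread changes elsewhere in the scene do not count as verification.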
37. 2nd round of Societal Workshops
Transport: 22 September 2016, Brussels (collocated with Big Data for Transport, Tisa workshop)
Food&Agri: 30 September 2016, Brussels (collocated with DG AGRI WP2018-20 stakeholder consultation)
Energy: 4 October 2016, Brussels (collocated with EC H2020 Info Day on “Smart Grids and Storage”)
Climate: 11 October 2016, Brussels (collocated with Melodies Project Event – Exploiting Open Data)
Security: 18 October 2016, Brussels (standalone workshop)
Societies: 5 December 2016, Cologne (collocated with EDDI16 - 8th Annual European DDI User Conference)
Health: 9 December 2016, Brussels (standalone workshop)
38. Other Activities
A new round of 7 Societal Workshops in 2017
Various SC-focused and general hangouts:
o Apache Flink & BDE (20 Oct) – available online
o More to follow!
o Keep track via the BDE website (Events)
39. Demonstrating the ease of use in deploying custom instances of the BDI Platform
BDI Platform – A Demo