Big Data Europe Integrator Platform
Empowering Communities with Data Technologies
Technical Contributions
BDE Webinar - 27 April
Dr. Hajira Jabeen
Senior researcher
University of Bonn
Platform Goals
◎Opensource
◎Ease of Use
◎Support a variety of use cases
◎Embrace emerging Big Data Technologies
◎Simple integration with custom components
Key actors
Platform Architecture
4
5
Platform Architecture
Platform Architecture
6
Platform Architecture
Support Layer
Init Daemon
GUIs
Monitor
App Layer
Traffic
Forecast
Satellite Image Analysis
Platform Layer
Spark Flink Semantic Layer
Ontario SANSA Semagrow
Kafka
Real-time Stream Monitoring
...
...
Resource Management Layer (Swarm)
Hardware Layer
Premises Cloud (AWS, GCE, MS Azure, …)
Data Layer
Hadoop NOSQL Store CassandraElasticsearch ...RDF Store
BDE Supported Frameworks
Search/indexing Data processing
Apache Solr Apache Spark
Data acquisition Apache Flink
Apache Flume Semantic Components
Message passing Strabon
Apache Kafka Sextant
Data storage GeoTriples
Hue Silk
Apache Cassandra SEMAGROW
ScyllaDB LIMES
Apache Hive 4Store
Postgis OpenLink Virtuoso
8
Platform features
◎ BDE Development Environment
o Stack builder
o Workflow builder
o Instructions to add custom components to the BDE
stack
◎ Administrator Interface
o SwarmUI
◎ UI Integrator
o Workflow monitor
o Integrated web interface
9
Platform installation
◎Manual installation guide
◎Using Docker Machine
o On local machine (VirtualBox)
o In cloud (AWS, DigitalOcean, Azure)
o Bare metal
◎Screencasts
10
Deploying a Big Data Stack
◎ Stack
o collection of communicating components
o to solve a specific problem
◎ Described in Docker Compose
o Component configuration
o Application topology
11
Enhancing the Component
◎ Orchestrator required for initialization process
(init_daemon)
o Components may depend on each other
o Components may require manual intervention
◎ User Interface Integration
o Standard Interfaces from components
o Combine and align the interfaces
12
User Interfaces
◎Target: Facilitate use of the platform
o User Interface Adaption
◎Available interfaces
o Workflow UIs
❖ Workflow Builder
❖ Workflow Monitor
o Swarm UI
o Integrator UI
13
BDE Workflow Builder
14
Component 1
Component 2
Component 3
BDE Workflow Monitor
15
Component 1
Finished
Component 2
Finished
Component 3
Inprogress
Swarm UI
Increase number
of instances
Integrator UI
17
Component 1 Component 2
Beyond the state of the art ...
Smart Big Data
Increase the value of Big Data
by adding meaning to it!
18
Semantic Data Lake (Ontario)
◎Data Swamp
o Repository of data in its raw format
o Structured, semi-structured, unstructured
o Schema-less
◎Data Lake
o Add a Semantic layer on top of the source
datasets
o The data is semantically lifted using existing
19
21
SANSA Stack
Find Big Data Europe at :
https://github.com/big-data-europe
22
jabeen@iai.uni-bonn.de
23
BigDataEurope & BDVA: Synergies
24
BDE vs Hadoop distributions
Hortonworks Cloudera MapR Bigtop BDE
File System HDFS HDFS NFS HDFS HDFS
Installation Native Native Native Native lightweight
virtualization
Plug & play components (no
rigid schema)
no no no no yes
High Availability Single failure
recovery (yarn)
Single failure
recovery (yarn)
Self healing, mult.
failure rec.
Single failure
recovery (yarn)
Multiple Failure
recovery
Cost Commercial Commercial Commercial Free Free
Scaling Freemium Freemium Freemium Free Free
Addition of custom
components
Not easy No No No Yes
Integration testing yes yes yes yes --
Operating systems Linux Linux Linux Linux All
Management tool Ambari Cloudera manager MapR Control
system
- Docker swarm UI+
Custom
25
BDE vs Hadoop distributions
◎BDE is not built on top of existing distributions
◎Targets
o Communities
o Research institutions
◎Bridges scientists and open data
◎Multi Tier research efforts towards Smart
Data
26

BDE-BDVA Webinar: BDE Technical Overview

  • 1.
    Big Data EuropeIntegrator Platform Empowering Communities with Data Technologies Technical Contributions BDE Webinar - 27 April Dr. Hajira Jabeen Senior researcher University of Bonn
  • 2.
    Platform Goals ◎Opensource ◎Ease ofUse ◎Support a variety of use cases ◎Embrace emerging Big Data Technologies ◎Simple integration with custom components
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
    Platform Architecture Support Layer InitDaemon GUIs Monitor App Layer Traffic Forecast Satellite Image Analysis Platform Layer Spark Flink Semantic Layer Ontario SANSA Semagrow Kafka Real-time Stream Monitoring ... ... Resource Management Layer (Swarm) Hardware Layer Premises Cloud (AWS, GCE, MS Azure, …) Data Layer Hadoop NOSQL Store CassandraElasticsearch ...RDF Store
  • 8.
    BDE Supported Frameworks Search/indexingData processing Apache Solr Apache Spark Data acquisition Apache Flink Apache Flume Semantic Components Message passing Strabon Apache Kafka Sextant Data storage GeoTriples Hue Silk Apache Cassandra SEMAGROW ScyllaDB LIMES Apache Hive 4Store Postgis OpenLink Virtuoso 8
  • 9.
    Platform features ◎ BDEDevelopment Environment o Stack builder o Workflow builder o Instructions to add custom components to the BDE stack ◎ Administrator Interface o SwarmUI ◎ UI Integrator o Workflow monitor o Integrated web interface 9
  • 10.
    Platform installation ◎Manual installationguide ◎Using Docker Machine o On local machine (VirtualBox) o In cloud (AWS, DigitalOcean, Azure) o Bare metal ◎Screencasts 10
  • 11.
    Deploying a BigData Stack ◎ Stack o collection of communicating components o to solve a specific problem ◎ Described in Docker Compose o Component configuration o Application topology 11
  • 12.
    Enhancing the Component ◎Orchestrator required for initialization process (init_daemon) o Components may depend on each other o Components may require manual intervention ◎ User Interface Integration o Standard Interfaces from components o Combine and align the interfaces 12
  • 13.
    User Interfaces ◎Target: Facilitateuse of the platform o User Interface Adaption ◎Available interfaces o Workflow UIs ❖ Workflow Builder ❖ Workflow Monitor o Swarm UI o Integrator UI 13
  • 14.
    BDE Workflow Builder 14 Component1 Component 2 Component 3
  • 15.
    BDE Workflow Monitor 15 Component1 Finished Component 2 Finished Component 3 Inprogress
  • 16.
  • 17.
  • 18.
    Beyond the stateof the art ... Smart Big Data Increase the value of Big Data by adding meaning to it! 18
  • 19.
    Semantic Data Lake(Ontario) ◎Data Swamp o Repository of data in its raw format o Structured, semi-structured, unstructured o Schema-less ◎Data Lake o Add a Semantic layer on top of the source datasets o The data is semantically lifted using existing 19
  • 21.
  • 22.
    Find Big DataEurope at : https://github.com/big-data-europe 22 jabeen@iai.uni-bonn.de
  • 23.
  • 24.
  • 25.
    BDE vs Hadoopdistributions Hortonworks Cloudera MapR Bigtop BDE File System HDFS HDFS NFS HDFS HDFS Installation Native Native Native Native lightweight virtualization Plug & play components (no rigid schema) no no no no yes High Availability Single failure recovery (yarn) Single failure recovery (yarn) Self healing, mult. failure rec. Single failure recovery (yarn) Multiple Failure recovery Cost Commercial Commercial Commercial Free Free Scaling Freemium Freemium Freemium Free Free Addition of custom components Not easy No No No Yes Integration testing yes yes yes yes -- Operating systems Linux Linux Linux Linux All Management tool Ambari Cloudera manager MapR Control system - Docker swarm UI+ Custom 25
  • 26.
    BDE vs Hadoopdistributions ◎BDE is not built on top of existing distributions ◎Targets o Communities o Research institutions ◎Bridges scientists and open data ◎Multi Tier research efforts towards Smart Data 26