BDE-BDVA Webinar: BDE Technical Overview


Hajira Jabeen's technical overview of the Big Data Europe Integrator Platform

Published in: Data & Analytics

  1. Big Data Europe Integrator Platform: Empowering Communities with Data Technologies. Technical Contributions. BDE Webinar, 27 April. Dr. Hajira Jabeen, Senior Researcher, University of Bonn
  2. Platform Goals ◎ Open source ◎ Ease of use ◎ Support a variety of use cases ◎ Embrace emerging Big Data technologies ◎ Simple integration with custom components
  3. Key actors
  4. Platform Architecture
  5. Platform Architecture
  6. Platform Architecture
  7. Platform Architecture: Support Layer (Init Daemon, GUIs, Monitor); App Layer (Traffic Forecast, Satellite Image Analysis, Real-time Stream Monitoring, ...); Platform Layer (Spark, Flink, Kafka, Semantic Layer: Ontario, SANSA, Semagrow, ...); Resource Management Layer (Swarm); Hardware Layer (Premises, Cloud: AWS, GCE, MS Azure, …); Data Layer (Hadoop, NoSQL Store, Cassandra, Elasticsearch, RDF Store, ...)
  8. BDE Supported Frameworks: Search/indexing (Apache Solr); Data acquisition (Apache Flume); Message passing (Apache Kafka); Data storage (Hue, Apache Cassandra, ScyllaDB, Apache Hive, PostGIS); Data processing (Apache Spark, Apache Flink); Semantic Components (Strabon, Sextant, GeoTriples, Silk, SEMAGROW, LIMES, 4Store, OpenLink Virtuoso)
  9. Platform features ◎ BDE Development Environment o Stack builder o Workflow builder o Instructions to add custom components to the BDE stack ◎ Administrator Interface o Swarm UI ◎ UI Integrator o Workflow monitor o Integrated web interface
  10. Platform installation ◎ Manual installation guide ◎ Using Docker Machine o On local machine (VirtualBox) o In cloud (AWS, DigitalOcean, Azure) o Bare metal ◎ Screencasts
  11. Deploying a Big Data Stack ◎ Stack o Collection of communicating components o Built to solve a specific problem ◎ Described in Docker Compose o Component configuration o Application topology
  12. Enhancing the Component ◎ Orchestrator required for the initialization process (init_daemon) o Components may depend on each other o Components may require manual intervention ◎ User Interface Integration o Standard interfaces from components o Combine and align the interfaces
  13. User Interfaces ◎ Target: facilitate use of the platform o User interface adaptation ◎ Available interfaces o Workflow UIs ❖ Workflow Builder ❖ Workflow Monitor o Swarm UI o Integrator UI
  14. BDE Workflow Builder: Component 1, Component 2, Component 3
  15. BDE Workflow Monitor: Component 1 (finished), Component 2 (finished), Component 3 (in progress)
  16. Swarm UI: Increase number of instances
  17. Integrator UI: Component 1, Component 2
  18. Beyond the state of the art ... Smart Big Data: Increase the value of Big Data by adding meaning to it!
  19. Semantic Data Lake (Ontario) ◎ Data Swamp o Repository of data in its raw format o Structured, semi-structured, unstructured o Schema-less ◎ Data Lake o Add a semantic layer on top of the source datasets o The data is semantically lifted using existing
  20. SANSA Stack
  21. Find Big Data Europe at:
  23. BigDataEurope & BDVA: Synergies
  24. BDE vs Hadoop distributions

      | | Hortonworks | Cloudera | MapR | Bigtop | BDE |
      |---|---|---|---|---|---|
      | File system | HDFS | HDFS | NFS | HDFS | HDFS |
      | Installation | Native | Native | Native | Native | Lightweight virtualization |
      | Plug & play components (no rigid schema) | No | No | No | No | Yes |
      | High availability | Single failure recovery (YARN) | Single failure recovery (YARN) | Self-healing, multiple failure recovery | Single failure recovery (YARN) | Multiple failure recovery |
      | Cost | Commercial | Commercial | Commercial | Free | Free |
      | Scaling | Freemium | Freemium | Freemium | Free | Free |
      | Addition of custom components | Not easy | No | No | No | Yes |
      | Integration testing | Yes | Yes | Yes | Yes | -- |
      | Operating systems | Linux | Linux | Linux | Linux | All |
      | Management tool | Ambari | Cloudera Manager | MapR Control System | - | Docker Swarm UI + custom |
  25. BDE vs Hadoop distributions ◎ BDE is not built on top of existing distributions ◎ Targets o Communities o Research institutions ◎ Bridges scientists and open data ◎ Multi-tier research efforts towards Smart Data
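
Slide 11 describes a stack as a collection of communicating components whose configuration and topology are written down in Docker Compose. As a rough illustration only, the sketch below builds a Compose-style description as a Python dictionary and prints it as JSON (YAML is a superset of JSON, so the output has the same shape as a Compose file). The service names, the `example/...` image names, and the `SPARK_MASTER` variable are hypothetical placeholders, not taken from the BDE repositories.

```python
import json


def build_stack():
    """Return a Compose-style description of a small, hypothetical stack."""
    return {
        "version": "3",
        "services": {
            # Component configuration: image, ports, environment.
            "spark-master": {
                "image": "example/spark-master",  # hypothetical image name
                "ports": ["8080:8080"],
            },
            "spark-worker": {
                "image": "example/spark-worker",  # hypothetical image name
                "environment": {"SPARK_MASTER": "spark://spark-master:7077"},
                # Application topology: which components this one talks to.
                "depends_on": ["spark-master"],
            },
            "app": {
                "image": "example/traffic-forecast",  # hypothetical image name
                "depends_on": ["spark-master", "spark-worker"],
            },
        },
    }


def topology(stack):
    """Map each service name to the sorted list of services it depends on."""
    return {
        name: sorted(service.get("depends_on", []))
        for name, service in stack["services"].items()
    }


if __name__ == "__main__":
    stack = build_stack()
    print(json.dumps(stack, indent=2))
    print(topology(stack))
```

Keeping configuration and topology in one declarative structure is what lets a stack be handed to a tool as a single unit; the `topology` helper simply reads the dependency edges back out of it.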
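
Slide 12 notes that an orchestrator is needed for initialization because components may depend on each other. One common way to derive a valid start order from such dependencies is a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+); it is an illustration of the idea under assumed component names, not the init daemon's actual interface or algorithm.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical components: each key maps to the set of components
# that must be up before it can start.
EXAMPLE_DEPENDENCIES = {
    "namenode": set(),
    "datanode": {"namenode"},
    "spark-master": {"namenode"},
    "spark-worker": {"spark-master"},
    "app": {"datanode", "spark-worker"},
}


def startup_order(dependencies):
    """Return a list in which every component appears after its dependencies.

    Raises ValueError if the dependencies contain a cycle, since no valid
    start order exists in that case.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as error:
        raise ValueError(f"circular dependency: {error.args[1]}") from error


if __name__ == "__main__":
    for step, component in enumerate(startup_order(EXAMPLE_DEPENDENCIES), 1):
        print(step, component)
```

A step that requires manual intervention could be modelled as one more node in the same graph, so that everything depending on it waits until an operator marks it done; that extension is left out here.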
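
Slide 16 shows the Swarm UI being used to increase the number of instances of a component. On the command line, Docker Swarm exposes the same operation as `docker service scale SERVICE=REPLICAS`. The sketch below only builds that command as an argument list; the service name is hypothetical, and how the Swarm UI itself talks to Docker is not stated in the slides.

```python
def scale_command(service, replicas):
    """Build the argv for scaling a Swarm service to a number of replicas."""
    if not service:
        raise ValueError("service name must not be empty")
    if replicas < 0:
        raise ValueError("replicas must be zero or greater")
    return ["docker", "service", "scale", f"{service}={replicas}"]


if __name__ == "__main__":
    # "spark-worker" is a hypothetical service name.
    print(" ".join(scale_command("spark-worker", 3)))
```

On a Swarm manager node, the returned list could be passed to `subprocess.run(..., check=True)` to actually apply the change.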
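
Slide 19 describes adding a semantic layer on top of raw source datasets. To make "semantic lifting" concrete, the sketch below maps rows of a raw CSV file to RDF statements in N-Triples syntax. The `http://example.org/` namespace, the column names, and the column-to-predicate mapping are all invented for illustration, and the approach is a generic one; it should not be read as how Ontario performs the lifting.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace

# Hypothetical mapping from raw column names to predicate IRIs.
MAPPING = {
    "name": BASE + "vocab/name",
    "speed_kmh": BASE + "vocab/speedKmh",
}


def escape_literal(value):
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def lift_row(row, id_column="id"):
    """Turn one raw record into N-Triples lines.

    Assumes the id value is safe to embed in an IRI as-is.
    """
    subject = f"<{BASE}sensor/{row[id_column]}>"
    triples = []
    for column, predicate in MAPPING.items():
        value = row.get(column)
        if value:  # skip missing or empty cells
            triples.append(f'{subject} <{predicate}> "{escape_literal(value)}" .')
    return triples


def lift_csv(text):
    """Lift every row of a CSV document into a flat list of N-Triples lines."""
    reader = csv.DictReader(io.StringIO(text))
    return [triple for row in reader for triple in lift_row(row)]


if __name__ == "__main__":
    raw = "id,name,speed_kmh\n7,A4 north,88\n"
    print("\n".join(lift_csv(raw)))
```

The raw file stays untouched in this scheme; the mapping is the only place where meaning is attached, which is what distinguishes a lake with a semantic layer from a schema-less swamp.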