Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals by Aad Versteden, TenForce
Talk given at the third Big Data Europe SC6 workshop, held on 11 September 2017 in Amsterdam and co-located with the SEMANTiCS 2017 conference: The Big Data Europe Platform: Apps, challenges, goals by Aad Versteden, TenForce.
Platform installation
◎ Manual installation guide
◎ Using Docker Machine
o On local machine (VirtualBox)
o In cloud (AWS, DigitalOcean, Azure)
o Bare metal
◎ Screencasts
◎ High level picture
o docker-compose.yml describes pipeline topology
◎ BDE provided components
o extend template image with your code
◎ New components
o build a Docker image for your component
o think of this as your own little virtual machine for your component
◎ Sharing
o publish topology as git repository
o publish new components on docker hub
Platform development
Development
◎ Base Docker images
o Serve as a template for a (Big Data) technology
o Easily extendable with custom algorithms/data
◎ Published components
o Image repositories on GitHub
o Automated builds on DockerHub
o Documentation on BDE Wiki
Enhancing the Component
◎ Orchestrator required for initialization process (init_daemon)
o Components may depend on each other
o Components may require manual intervention
◎ User Interface Integration
o Standard Interfaces from components
o Combine and align the interfaces
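Because components may depend on each other, an orchestrator has to work out a start order in which every component comes up after the ones it needs. The following is a minimal sketch of that idea only; it is not the init_daemon API, and the component names and dependency map are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical components. Each key maps to the set of components
# that must be running before it can be initialized.
dependencies = {
    "namenode": set(),
    "datanode": {"namenode"},
    "spark-master": {"namenode"},
    "spark-worker": {"spark-master"},
    "app": {"datanode", "spark-worker"},
}


def startup_order(deps):
    """Return one valid start order.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


order = startup_order(dependencies)
print(order)
```

Several orders can be valid at once; the sketch only guarantees that each component appears after everything it depends on. A step that needs manual intervention could be modelled as one more node that other components wait for.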
Deploying a Big Data Stack
◎ Stack
o collection of communicating components
o to solve a specific problem
◎ Described in Docker Compose
o Component configuration
o Application topology
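As a rough illustration of how one description can hold both component configuration and application topology, the sketch below mirrors the general shape of a Docker Compose file (a "services" map with "image", "environment" and "depends_on" entries) as a plain Python dictionary and checks that every dependency points at a defined component. The image names, the environment variable and the helper function are hypothetical and are not taken from the BDE platform.

```python
def undefined_dependencies(stack):
    """Return (service, missing_dependency) pairs for a Compose-like dict."""
    services = stack.get("services", {})
    problems = []
    for name, spec in services.items():
        for dep in spec.get("depends_on", []):
            if dep not in services:
                problems.append((name, dep))
    return problems


# Hypothetical stack: configuration lives under each service,
# topology is expressed through the depends_on lists.
stack = {
    "services": {
        "namenode": {
            "image": "example/hadoop-namenode",
            "environment": {"CLUSTER_NAME": "demo"},
        },
        "datanode": {
            "image": "example/hadoop-datanode",
            "depends_on": ["namenode"],
        },
        "app": {
            "image": "example/my-app",
            # "spark-master" is deliberately not defined above.
            "depends_on": ["datanode", "spark-master"],
        },
    }
}

problems = undefined_dependencies(stack)
print(problems)
```

A check like this catches a stack that refers to a component nobody has added yet, before anything is deployed.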
Beyond the state of the art ...
Smart Big Data
Increase the value of Big Data
by adding meaning to it!
Semantic Data Lake (Ontario)
◎ Data Swamp
o Repository of data in its raw format
o Structured, semi-structured, unstructured
o Schema-less
◎ Data Lake
o Add a semantic layer on top of the source datasets
o The data is semantically lifted using existing ontology terms
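Semantic lifting can be pictured as mapping the fields of a raw record onto properties from existing vocabularies, producing subject-predicate-object statements. The sketch below is an assumption-laden illustration and not how Ontario is implemented: the field mapping, the example record and the subject IRI are hypothetical, and the property IRIs are borrowed from schema.org and the W3C WGS84 geo vocabulary.

```python
# Hypothetical mapping from raw field names to existing ontology terms.
FIELD_TO_PROPERTY = {
    "name": "http://schema.org/name",
    "lat": "http://www.w3.org/2003/01/geo/wgs84_pos#lat",
    "long": "http://www.w3.org/2003/01/geo/wgs84_pos#long",
}


def lift(subject_iri, record, mapping=FIELD_TO_PROPERTY):
    """Turn a raw record into (subject, predicate, object) triples.

    Fields without a mapping are left out of the semantic layer;
    the raw record itself is not modified.
    """
    triples = []
    for field, value in record.items():
        prop = mapping.get(field)
        if prop is not None:
            triples.append((subject_iri, prop, value))
    return triples


# Hypothetical raw record as it might sit in the lake.
raw = {"name": "Amsterdam", "lat": 52.37, "long": 4.90, "internal_id": 17}
triples = lift("http://example.org/place/1", raw)
for triple in triples:
    print(triple)
```

The raw data stays in its original format; only the mapped fields gain shared, queryable meaning.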
BDE vs Hadoop distributions
Feature | Hortonworks | Cloudera | MapR | Bigtop | BDE
File system | HDFS | HDFS | NFS | HDFS | HDFS
Installation | Native | Native | Native | Native | Lightweight virtualization
Plug & play components (no rigid schema) | No | No | No | No | Yes
High availability | Single failure recovery (YARN) | Single failure recovery (YARN) | Self healing, multiple failure recovery | Single failure recovery (YARN) | Multiple failure recovery
Cost | Commercial | Commercial | Commercial | Free | Free
Scaling | Freemium | Freemium | Freemium | Free | Free
Addition of custom components | Not easy | No | No | No | Yes
Integration testing | Yes | Yes | Yes | Yes | --
Operating systems | Linux | Linux | Linux | Linux | All
Management tool | Ambari | Cloudera Manager | MapR Control System | - | Docker Swarm UI + custom
BDE vs Hadoop distributions
◎ BDE is not built on top of existing distributions
◎ Targets
o Communities
o Research institutions
◎ Bridges scientists and open data
◎ Multi-tier research efforts towards Smart Data