Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals by Aad Versteden, TenForce

Big Data Europe
Apps, challenges, goals
Ir. Aad Versteden, TenForce
SC6 workshop

Platform Goals
“Your data has value,
why don’t you unlock it?”

Platform Goals
◎ What is Big Data?
o Volume
o Velocity
o Variety
o Veracity

Platform Goals
◎ Easy to
o Install
o Develop
o Deploy
o Integrate

Societal Challenges
Different domains with pilot cases validating the platform

Societal Challenges
◎ Health
◎ Food
◎ Energy
◎ Transport
◎ Climate
◎ Social Sciences
◎ Security

SC4: Transport
◎ Show and predict traffic jams
◎ A taxi fleet shares GPS data
◎ Big Data?
o Velocity
o [Volume]

SC3: Energy
◎ Preventative maintenance by vibration analysis
◎ Big Data?
o High Volume (batch)
o High Velocity (live)

SC7: Security
◎ Detect changes in human constructions and link them to news events
◎ Big Data?
o Volume

SC1: Health
◎ Can we use open source to answer Pharma questions?
◎ Large semantic graph, complex questions
◎ Big Data?
o Variety

SC2: Food
◎ Mine viticulture research and share semantic information
◎ Big Data?
o Variety

SC5: Climate
◎ Where did an airborne risk come from?
◎ Precalculate emission spots with common weather patterns
◎ Big Data?
o Volume

SC6: Social Sciences
Martin will tell you later :-)

Platform Architecture
◎ Support Layer
o Init Daemon
o GUIs
o Monitor
◎ App Layer
o Traffic Forecast
o Satellite Image Analysis
o Real-time Stream Monitoring
o ...
◎ Platform Layer
o Spark
o Flink
o Semantic Layer (Ontario, SANSA, Semagrow)
o Kafka
o ...
◎ Resource Management Layer (Swarm)
◎ Hardware Layer
o Premises
o Cloud (AWS, GCE, MS Azure, …)
◎ Data Layer
o Hadoop
o NOSQL Store
o Cassandra
o Elasticsearch
o RDF Store
o ...

Supported Frameworks
◎ Search/indexing
o Apache Solr
◎ Data acquisition
o Apache Flume
◎ Message passing
o Apache Kafka
◎ Data storage
o Hue
o Apache Cassandra
o ScyllaDB
o Apache Hive
o PostGIS
◎ Data processing
o Apache Spark
o Apache Flink
◎ Semantic Components
o Strabon
o Sextant
o GeoTriples
o Silk
o SEMAGROW
o LIMES
o 4Store
o OpenLink Virtuoso

Making Big Data Accessible
How do we make it easy?

Actors
◎ Install stack
◎ Develop
◎ Deploy
◎ Monitor results

Platform installation
◎ Manual installation guide
◎ Using Docker Machine
o On local machine (VirtualBox)
o In cloud (AWS, DigitalOcean, Azure)
o Bare metal
◎ Screencasts

BDI Stack Lifecycle
Developing Custom Applications

Platform development
◎ High level picture
o docker-compose.yml describes pipeline topology
◎ BDE provided components
o extend template image with your code
◎ New components
o build a Docker image for your component
o the image acts as a small, isolated environment for your component, much like your own little virtual machine
◎ Sharing
o publish topology as git repository
o publish new components on Docker Hub
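
To make the "pipeline topology" idea concrete, here is a minimal Python sketch that models the services section of a docker-compose.yml as a plain dictionary and checks that every dependency points at a defined component. All service and image names are hypothetical placeholders, not BDE's published images, and the helper functions are illustrative rather than part of any BDE tool. Only the `image` and `depends_on` keys are taken from Docker Compose itself.

```python
# Hypothetical pipeline topology, mirroring the shape of a docker-compose.yml
# "services" section. Service and image names are illustrative only.
topology = {
    "services": {
        "kafka": {"image": "example/kafka"},
        "spark-master": {"image": "example/spark-master"},
        "spark-worker": {
            "image": "example/spark-worker",
            "depends_on": ["spark-master"],
        },
        "traffic-app": {
            # Stands in for a template image extended with your own code.
            "image": "example/traffic-app",
            "depends_on": ["kafka", "spark-master"],
        },
    }
}


def missing_dependencies(topology):
    """Return {service: [undefined dependencies]} for a compose-like dict."""
    services = topology["services"]
    problems = {}
    for name, spec in services.items():
        unknown = [dep for dep in spec.get("depends_on", []) if dep not in services]
        if unknown:
            problems[name] = unknown
    return problems


def edges(topology):
    """List (component, dependency) pairs, i.e. the arrows of the pipeline."""
    return [
        (name, dep)
        for name, spec in topology["services"].items()
        for dep in spec.get("depends_on", [])
    ]


print(missing_dependencies(topology))
print(edges(topology))
```

An empty result from `missing_dependencies` means every arrow in the topology ends at a known component, which is a cheap sanity check before sharing the topology as a git repository.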

Development
◎ Base Docker images
o Serve as a template for a (Big Data) technology
o Easily extended with a custom algorithm or data
◎ Published components
o Image repositories on GitHub
o Automated builds on DockerHub
o Documentation on BDE Wiki

BDI Stack Lifecycle
Docker Images

BDI Stack Lifecycle
BDI Stack (workflow) builder

BDI Stack Lifecycle
Custom Components
*Init Daemon
*Integrator UI

Enhancing the Component
◎ Orchestrator required for initialization process (init_daemon)
o Components may depend on each other
o Components may require manual intervention
◎ User Interface Integration
o Standard Interfaces from components
o Combine and align the interfaces
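
The ordering problem the orchestrator solves can be sketched in a few lines of Python. This is not the init_daemon's actual interface; it only illustrates how a start order can be derived when components depend on each other and some steps need a person to confirm them. The component names and the `manual_steps` set are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical components: each maps to the components it must wait for.
depends_on = {
    "hdfs-namenode": [],
    "hdfs-datanode": ["hdfs-namenode"],
    "load-input-data": ["hdfs-datanode"],
    "spark-job": ["load-input-data"],
}

# Hypothetical steps that need a person to confirm before they count as done.
manual_steps = {"load-input-data"}


def startup_plan(depends_on, manual_steps):
    """Return (component, needs_manual_confirmation) pairs in a valid start order."""
    order = TopologicalSorter(depends_on).static_order()
    return [(name, name in manual_steps) for name in order]


plan = startup_plan(depends_on, manual_steps)
for name, manual in plan:
    action = "wait for manual confirmation" if manual else "start automatically"
    print(f"{name}: {action}")
```

A topological sort is used here because it guarantees that no component is started before the components it depends on. A cyclic dependency makes `TopologicalSorter` raise `graphlib.CycleError`, which is a useful early warning.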

BDI Stack Lifecycle
Deploy BDE Platform/Stack to the Cluster

Deploying a Big Data Stack
◎ Stack
o collection of communicating components
o to solve a specific problem
◎ Described in Docker Compose
o Component configuration
o Application topology

BDI Stack Lifecycle
Stack/Cluster Monitor

User Interfaces
◎ Make it easy to use
◎ Available interfaces
o Stack Builder
o Swarm UI
o Workflow Builder
o BDI Integrator

Beyond the state of the art ...
Smart Big Data
Increase the value of Big Data
by adding meaning to it!

Semantic Data Lake (Ontario)
◎ Data Swamp
o Repository of data in its raw format
o Structured, semi-structured, unstructured
o Schema-less
◎ Data Lake
o Add a Semantic layer on top of the source datasets
o The data is semantically lifted using existing ontology terms
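
As a rough illustration of what "semantically lifted" can mean, the Python sketch below maps fields of a raw record to existing vocabulary terms and emits subject-predicate-object triples. It is not Ontario's implementation. The record layout, the `lift` function and the example.org base IRI are assumptions made for this example, while `schema:name` and the WGS84 `lat`/`long` properties are real, widely used terms.

```python
# Hypothetical mapping from raw field names to existing ontology terms.
FIELD_TO_PROPERTY = {
    "name": "http://schema.org/name",
    "lat": "http://www.w3.org/2003/01/geo/wgs84_pos#lat",
    "lon": "http://www.w3.org/2003/01/geo/wgs84_pos#long",
}

# Made-up base IRI used to mint a subject for each record.
BASE_IRI = "http://example.org/station/"


def lift(record, id_field="id"):
    """Turn one raw record (a dict) into (subject, predicate, object) triples.

    Fields without a mapping are left out: they stay in the raw data.
    """
    subject = BASE_IRI + str(record[id_field])
    return [
        (subject, FIELD_TO_PROPERTY[field], value)
        for field, value in record.items()
        if field in FIELD_TO_PROPERTY
    ]


raw = {"id": 42, "name": "Sensor A", "lat": 50.85, "lon": 4.35, "vendor_code": "x9"}
for triple in lift(raw):
    print(triple)
```

The raw record is left untouched. The semantic layer sits on top of it, so unmapped fields such as `vendor_code` remain available in the source dataset.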

Check it out
https://github.com/big-data-europe
aad.versteden@tenforce.com
@impulsater
https://github.com/madnificent

BDE vs Hadoop distributions
Feature | Hortonworks | Cloudera | MapR | Bigtop | BDE
File system | HDFS | HDFS | NFS | HDFS | HDFS
Installation | Native | Native | Native | Native | Lightweight virtualization
Plug & play components (no rigid schema) | no | no | no | no | yes
High availability | Single failure recovery (YARN) | Single failure recovery (YARN) | Self healing, multiple failure recovery | Single failure recovery (YARN) | Multiple failure recovery
Cost | Commercial | Commercial | Commercial | Free | Free
Scaling | Freemium | Freemium | Freemium | Free | Free
Addition of custom components | Not easy | No | No | No | Yes
Integration testing | yes | yes | yes | yes | --
Operating systems | Linux | Linux | Linux | Linux | All
Management tool | Ambari | Cloudera Manager | MapR Control System | - | Docker Swarm UI + custom

BDE vs Hadoop distributions
◎ BDE is not built on top of existing distributions
◎ Targets
o Communities
o Research institutions
◎ Bridges scientists and open data
◎ Multi-tier research efforts towards Smart Data
