BigFoot: Big Data For Every Organization

Matteo Dell’Amico
Open World Forum 2014, Paris

About BigFoot: Goals
BigFoot Goals
Big Data For Every Organization
Automatic & self-tuned deployment for private clouds
Optimization on all layers
Scalable machine learning (time-series analysis, forecasting, clustering…)
Optimizations for big data frameworks
Interactive queries on raw data
Contribute to the Free Software community

About BigFoot: The BigFoot Architecture
My Presentation
Scheduling
HFSP: a new Hadoop scheduler
Schedsim: a playground to simulate new schedulers
OpenStack
Apache Spark on demand
Work in progress: VM placement optimizations

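Scheduling in Hadoop: Size-Based Scheduling
“Fair” Sharing vs. Size-Based
[Figure: two plots of cluster usage (%) over time (s) for job 1, job 2, and job 3. One plot has time ticks at 10, 15, 37.5, 42.5, and 50; the other has time ticks at 10, 20, 30, and 50, with the cluster running job 1, job 2, job 3, and then job 1 again.]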
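The comparison in the figure can be reproduced with a small simulation. The sketch below is not HFSP or Schedsim code. The job arrival times and sizes are assumptions, chosen so that the completion times match the time ticks of the figure, and "size-based" is modeled here as "run the job with the least remaining work", with job sizes known exactly.

```python
# Assumed workload: name -> (arrival time in s, size in s of full-cluster work).
# These values are NOT given in the slides; they are picked to match the ticks.
JOBS = {"job 1": (0, 30), "job 2": (10, 10), "job 3": (15, 10)}


def simulate(jobs, policy):
    """Return {job name: completion time} for policy 'fair' or 'size'."""
    pending = sorted(jobs.items(), key=lambda item: item[1][0])
    remaining = {}   # work left for each job currently in the system
    done = {}
    t = 0.0
    while pending or remaining:
        # Admit every job that has arrived by time t.
        while pending and pending[0][1][0] <= t:
            name, (_, size) = pending.pop(0)
            remaining[name] = float(size)
        if not remaining:
            t = float(pending[0][1][0])   # idle until the next arrival
            continue
        if policy == "fair":
            # Every job in the system gets an equal share of the cluster.
            shares = {name: 1.0 / len(remaining) for name in remaining}
        else:
            # The job with the least remaining work gets the whole cluster.
            shortest = min(remaining, key=remaining.get)
            shares = {shortest: 1.0}
        # Advance to the next event: a completion or an arrival.
        dt = min(remaining[name] / share for name, share in shares.items())
        if pending:
            dt = min(dt, pending[0][1][0] - t)
        for name, share in shares.items():
            remaining[name] -= share * dt
        t += dt
        for name in [n for n, left in remaining.items() if left <= 1e-9]:
            done[name] = t
            del remaining[name]
    return done


if __name__ == "__main__":
    for policy in ("fair", "size"):
        print(policy, simulate(JOBS, policy))
```

Under these assumed jobs, job 1 finishes at 50 s with both policies, while job 2 and job 3 finish earlier with the size-based policy, so the mean time a job spends in the system drops from 35 s to 25 s.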

Scheduling in Hadoop: HFSP
HFSP: Size-Based Scheduling For Hadoop
Consistently better than the Fair Scheduler (and others…)
The more loaded the system, the bigger the difference
We estimate job sizes: it works!
Download from https://github.com/bigfootproject/hfsp

Scheduling in Hadoop: PSBS
PSBS – Practical Size-Based Scheduler
[Figure: two panels, “Existing Schedulers” and “PSBS: Our proposal”, plotting scheduler response time. Blue: better than the traditional “fair scheduler”; red: worse.]
Paper: http://arxiv.org/abs/1410.6122
Simulator: https://github.com/bigfootproject/schedsim

OpenStack: Sahara
OpenStack Sahara
Hadoop On-Demand
Choose number and size of machines
Choose Hadoop version
Voilà, a cluster in your datacenter!
Analytics As A Service
Compile your JAR
Choose number and size of machines, etc., as before
A cluster appears, runs your analytics, and vanishes

OpenStack: Sahara
Spark On Sahara
Spark Is Cool
A project started by the Berkeley AMPLab
Fast: in-memory computing
Easy: concise code in Scala or Python
What We Did
Spark has been available on Sahara since May

OpenStack: Scheduling
Work In Progress
OpenStack Scheduler
Places virtual machines one at a time
Allows hand-defined filters
Tries to place VMs on least loaded hosts
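The three points above can be illustrated with a toy model. This is a minimal sketch and not the OpenStack scheduler's code or API: `Host`, `place_vm`, and the idea of measuring load by free RAM are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical host record; real schedulers track many more resources."""
    name: str
    total_ram_mb: int
    used_ram_mb: int = 0

    @property
    def free_ram_mb(self):
        return self.total_ram_mb - self.used_ram_mb


def place_vm(hosts, vm_ram_mb, filters=()):
    """Place ONE VM: filter the hosts, then pick the least loaded survivor.

    `filters` is a sequence of hand-written predicates Host -> bool.
    "Least loaded" is taken to mean "most free RAM" (an assumption).
    Returns the chosen Host, or None if no host qualifies.
    """
    candidates = [
        host for host in hosts
        if host.free_ram_mb >= vm_ram_mb and all(f(host) for f in filters)
    ]
    if not candidates:
        return None
    best = max(candidates, key=lambda host: host.free_ram_mb)
    best.used_ram_mb += vm_ram_mb
    return best


if __name__ == "__main__":
    hosts = [Host("a", 4096), Host("b", 4096), Host("c", 4096)]
    not_c = lambda host: host.name != "c"   # example hand-defined filter
    for _ in range(3):
        print(place_vm(hosts, 1024, [not_c]).name)
```

Because each VM is placed independently on the emptiest host, the VMs of one cluster end up spread across hosts, regardless of how much they talk to each other.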
What We Want To Do
Place a whole cluster at once!
Place VMs that talk a lot to each other close together
Also place them close to the data!
But not too many in one place: we don’t want to overload the drives
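To make these goals concrete, here is one possible greedy heuristic. The slides do not describe BigFoot's placement algorithm, so this is purely an illustrative sketch: `place_cluster`, the traffic table, the per-host cap, and treating "close" as "on the same host" are all assumptions.

```python
def place_cluster(vms, hosts, traffic, data_hosts, max_per_host):
    """Greedily place all VMs of one cluster; returns {vm: host}.

    traffic:      {frozenset({vm_a, vm_b}): expected traffic} (hypothetical)
    data_hosts:   set of hosts that store the input data
    max_per_host: cap on cluster VMs per host, so drives are not overloaded
    """
    placement = {}
    load = {host: 0 for host in hosts}

    def score(vm, host):
        # First criterion: traffic to cluster VMs already on this host.
        near_peers = sum(
            traffic.get(frozenset((vm, other)), 0)
            for other, placed_on in placement.items()
            if placed_on == host
        )
        # Second criterion (tie-breaker): does the host hold the data?
        near_data = 1 if host in data_hosts else 0
        return (near_peers, near_data)

    for vm in vms:
        candidates = [host for host in hosts if load[host] < max_per_host]
        if not candidates:
            raise ValueError("not enough capacity for the cluster")
        best = max(candidates, key=lambda host: score(vm, host))
        placement[vm] = best
        load[best] += 1
    return placement


if __name__ == "__main__":
    vms = ["master", "w1", "w2", "w3"]
    traffic = {frozenset(("master", w)): 5 for w in ("w1", "w2", "w3")}
    traffic.update({
        frozenset(("w1", "w2")): 1,
        frozenset(("w1", "w3")): 1,
        frozenset(("w2", "w3")): 1,
    })
    print(place_cluster(vms, ["h1", "h2", "h3"], traffic, {"h2", "h3"}, 2))
```

In this example the VMs fill the two data-holding hosts, two per host, and the host without data stays empty. A real placer would also have to account for rack topology and for the other resources on each host.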

Parting Words: Conclusion
Thank You!
These slides: http://bit.ly/bigfoot_owf14
Web: http://bigfootproject.eu
Twitter: @bigfoot_project
GitHub: http://github.com/bigfootproject/
Bitbucket: bitbucket.org/bigfootproject/
