BigFoot: Big Data For Every Organization
Matteo Dell’Amico
Open World Forum 2014, Paris
About BigFoot
BigFoot Goals
Big Data For Every Organization:
Automatic & self-tuned deployment for private clouds
Optimization on all layers
Scalable machine learning (time-series analysis, forecasting, clustering…)
Optimizations for big data frameworks
Interactive queries on raw data
Contribute to the Free Software community
My Presentation
Scheduling:
HFSP: a new Hadoop scheduler
Schedsim: a playground to simulate new schedulers
OpenStack:
Apache Spark on demand
Work in progress: VM placement optimizations
Scheduling in Hadoop
“Fair” Sharing vs. Size-Based
[Figure: cluster usage (%) over time (s) for jobs 1, 2 and 3. With “fair” sharing, the jobs split the cluster, with breakpoints at 10, 15, 37.5, 42.5 and 50 s. With size-based scheduling, job 1 runs, then job 2, then job 3, then job 1 again, with breakpoints at 10, 20, 30 and 50 s.]
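A minimal sketch that reproduces the two timelines, assuming an idealized cluster that the “fair” scheduler shares as processor sharing (PS) and that a size-based scheduler runs as preemptive shortest-remaining-processing-time (SRPT). The arrivals (0, 10 and 15 s) and sizes (30, 10 and 10 s of full-cluster work) are our inference from the figure's time ticks, not values stated on the slide.

```python
from fractions import Fraction

# Assumed workload, inferred from the figure's time ticks:
# name -> (arrival time in s, size in s of full-cluster work).
JOBS = {"job 1": (0, 30), "job 2": (10, 10), "job 3": (15, 10)}


def simulate(jobs, policy):
    """Idealized single-resource scheduler; policy is "ps" (fair) or "srpt" (size-based).

    Returns {job: completion time}.
    """
    pending = sorted(jobs.items(), key=lambda kv: kv[1][0])
    t = Fraction(0)
    remaining, done = {}, {}
    while pending or remaining:
        if not remaining:  # idle cluster: jump to the next arrival
            t = max(t, Fraction(pending[0][1][0]))
        while pending and pending[0][1][0] <= t:
            name, (_, size) = pending.pop(0)
            remaining[name] = Fraction(size)
        if policy == "ps":
            # Processor sharing: every active job gets an equal slice of the cluster.
            share = {n: Fraction(1, len(remaining)) for n in remaining}
        else:
            # SRPT: the whole cluster goes to the job with the least remaining work.
            share = {min(remaining, key=remaining.get): Fraction(1)}
        # Advance to the next event: a job completion or a new arrival.
        dt = min(remaining[n] / s for n, s in share.items())
        if pending:
            dt = min(dt, pending[0][1][0] - t)
        for n, s in share.items():
            remaining[n] -= s * dt
        t += dt
        for n in [n for n, r in remaining.items() if r == 0]:
            del remaining[n]
            done[n] = t
    return done


for policy in ("ps", "srpt"):
    done = simulate(JOBS, policy)
    completion = {n: float(done[n]) for n in sorted(done)}
    mean_response = float(sum(done[n] - JOBS[n][0] for n in done) / len(done))
    print(policy, completion, mean_response)
# ps {'job 1': 50.0, 'job 2': 37.5, 'job 3': 42.5} 35.0
# srpt {'job 1': 50.0, 'job 2': 20.0, 'job 3': 30.0} 25.0
```

Under these assumptions no job finishes later with size-based scheduling, and the mean response time drops from 35 s to 25 s.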
HFSP: Size-Based Scheduling For Hadoop
Consistently lower response times than the Fair Scheduler (and others…)
The more loaded the system, the bigger the difference
Job sizes are not known in advance: we estimate them, and it works!
Download from https://github.com/bigfootproject/hfsp
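HFSP builds on the Fair Sojourn Protocol (FSP): it simulates processor sharing in a virtual system and gives the real resource to the job that would finish first there, which ages jobs so that large ones are not starved. The sketch below models that idea on one idealized resource with known job sizes (HFSP itself estimates sizes and schedules Hadoop tasks, which this toy ignores). The workload reuses the three jobs of the “Fair” Sharing vs. Size-Based slide, with arrivals and sizes that we inferred from its figure.

```python
from fractions import Fraction

# Assumed workload, inferred from the "Fair" Sharing vs. Size-Based slide:
# name -> (arrival time in s, size in s of full-cluster work).
JOBS = {"job 1": (0, 30), "job 2": (10, 10), "job 3": (15, 10)}


def fsp(jobs):
    """Fair Sojourn Protocol on one idealized resource; returns {job: completion time}."""
    pending = sorted(jobs.items(), key=lambda kv: kv[1][0])
    t = Fraction(0)
    real = {}          # remaining work in the real system
    virtual = {}       # remaining work in the simulated processor-sharing system
    virtual_done = []  # jobs in the order they finished in the virtual system
    done = {}
    while pending or real:
        while pending and pending[0][1][0] <= t:
            name, (_, size) = pending.pop(0)
            real[name] = virtual[name] = Fraction(size)
        # Priority = earliest virtual completion. Jobs that already left the virtual
        # system come first; the others finish in order of virtual remaining work,
        # because under processor sharing they all progress at the same rate.
        order = [n for n in virtual_done if n in real]
        order += sorted((n for n in virtual if n in real), key=virtual.get)
        running = order[0] if order else None

        # Next event: an arrival, a virtual completion or a real completion.
        steps = []
        if pending:
            steps.append(pending[0][1][0] - t)
        if virtual:
            steps.append(min(virtual.values()) * len(virtual))
        if running is not None:
            steps.append(real[running])
        dt = min(steps)

        # The virtual system keeps running even when the real one is idle.
        for n in virtual:
            virtual[n] -= dt / len(virtual)
        if running is not None:
            real[running] -= dt
        t += dt

        for n in [n for n, r in virtual.items() if r == 0]:
            del virtual[n]
            virtual_done.append(n)
        if running is not None and real[running] == 0:
            del real[running]
            done[running] = t
    return done


done = fsp(JOBS)
print({name: float(done[name]) for name in sorted(done)})
# {'job 1': 50.0, 'job 2': 20.0, 'job 3': 30.0}
```

On this small workload FSP yields the same schedule as the size-based timeline on the slide.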
PSBS – Practical Size-Based Scheduler
[Figure: scheduler response time, in two panels: “Existing Schedulers” and “PSBS: Our proposal”. Blue: better than the traditional “fair scheduler”; red: worse.]
Paper: http://arxiv.org/abs/1410.6122
Simulator: https://github.com/bigfootproject/schedsim
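Size-based schedulers such as HFSP work with estimated job sizes, and estimates can be wrong. As a hedged illustration of why that matters (our own toy example, not PSBS and not schedsim code), the sketch below runs preemptive shortest-remaining-first twice on a hypothetical workload: once with true sizes and once with estimated sizes, where one large job is underestimated. Once that job has run past its estimate, its estimated remaining work is zero, so it keeps the resource and every later job waits behind it.

```python
from fractions import Fraction

# Hypothetical workload: name -> (arrival s, true size s, estimated size s).
# The big job's size is underestimated tenfold; the small jobs are estimated exactly.
WORKLOAD = {"big": (0, 20, 2), "small 1": (5, 1, 1), "small 2": (6, 1, 1)}


def srpt(jobs, use_estimates):
    """Preemptive shortest-remaining-first on one resource; returns {job: completion time}."""
    pending = sorted(jobs.items(), key=lambda kv: kv[1][0])
    t = Fraction(0)
    remaining, attained, done = {}, {}, {}

    def priority(name):
        if use_estimates:
            # Estimated remaining work: it bottoms out at zero once the job has been
            # served for longer than its (under)estimated size.
            return max(Fraction(jobs[name][2]) - attained[name], Fraction(0))
        return remaining[name]

    while pending or remaining:
        if not remaining:  # idle: jump to the next arrival
            t = max(t, Fraction(pending[0][1][0]))
        while pending and pending[0][1][0] <= t:
            name, (_, size, _) = pending.pop(0)
            remaining[name], attained[name] = Fraction(size), Fraction(0)
        running = min(remaining, key=priority)
        # The running job's priority value can only decrease while it runs, so the
        # next event is either its completion or a new arrival.
        dt = remaining[running]
        if pending:
            dt = min(dt, pending[0][1][0] - t)
        remaining[running] -= dt
        attained[running] += dt
        t += dt
        if remaining[running] == 0:
            del remaining[running]
            done[running] = t
    return done


for use_estimates in (False, True):
    done = srpt(WORKLOAD, use_estimates)
    response = {n: float(done[n] - WORKLOAD[n][0]) for n in sorted(done)}
    print("estimated sizes" if use_estimates else "exact sizes", response)
# exact sizes {'big': 22.0, 'small 1': 1.0, 'small 2': 1.0}
# estimated sizes {'big': 20.0, 'small 1': 16.0, 'small 2': 16.0}
```

One underestimated job is enough to turn 1 s response times into 16 s for the small jobs, which is the kind of behavior a practical size-based scheduler has to guard against.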
OpenStack
OpenStack Sahara
Hadoop on Demand:
Choose the number and size of machines
Choose the Hadoop version
Voilà, a cluster in your datacenter!
Analytics as a Service:
Compile your JAR
Choose the number and size of machines, etc., as before
A cluster appears, runs your analytics, and vanishes
Spark on Sahara
Spark Is Cool:
A project started by the UC Berkeley AMPLab
Fast: in-memory computing
Easy: concise code in Scala or Python
What We Did:
We added Spark support to Sahara; it has been available since May
OpenStack Scheduling: Work In Progress
OpenStack Scheduler:
Places virtual machines one at a time
Allows hand-defined filters
Tries to place VMs on the least loaded hosts
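A toy model of the behavior listed above, assuming hypothetical `Host` records, two illustrative filters, and "most free RAM" as the measure of load. This is not Nova's code or API, only the one-VM-at-a-time, filter-then-pick-least-loaded pattern the slide describes.

```python
from dataclasses import dataclass, field


@dataclass
class Host:  # hypothetical host record, not a Nova data structure
    name: str
    free_ram_mb: int
    free_vcpus: int
    vms: list = field(default_factory=list)


# Hand-defined filters: each returns True if `host` can accept `vm`.
def ram_filter(host, vm):
    return host.free_ram_mb >= vm["ram_mb"]


def cpu_filter(host, vm):
    return host.free_vcpus >= vm["vcpus"]


def place(vm, hosts, filters=(ram_filter, cpu_filter)):
    """Place a single VM: filter the hosts, then pick the least loaded one."""
    candidates = [h for h in hosts if all(f(h, vm) for f in filters)]
    if not candidates:
        raise RuntimeError(f"no valid host for {vm['name']}")
    # "Least loaded" is approximated here as "most free RAM" (an assumption).
    best = max(candidates, key=lambda h: h.free_ram_mb)
    best.free_ram_mb -= vm["ram_mb"]
    best.free_vcpus -= vm["vcpus"]
    best.vms.append(vm["name"])
    return best.name


hosts = [Host("h1", free_ram_mb=8192, free_vcpus=4), Host("h2", free_ram_mb=4096, free_vcpus=8)]
for i in range(4):
    vm = {"name": f"worker-{i}", "ram_mb": 2048, "vcpus": 1}
    print(vm["name"], "->", place(vm, hosts))  # each VM is placed independently
# worker-0 -> h1
# worker-1 -> h1
# worker-2 -> h1
# worker-3 -> h2
```

Each call sees only one VM, so nothing tells the scheduler that the four workers form one cluster, how much they talk to each other, or where their data lives.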
What We Want To Do:
Place a whole cluster at once!
VMs that talk a lot to each other: place them close together
Place them close to their data, too!
But not too many on the same host: we don’t want to overload its drives
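One possible shape for these goals, sketched under our own assumptions (the slide describes work in progress, and this is not BigFoot's algorithm): place all VMs of a cluster in one pass, greedily choosing for each VM the host that minimizes traffic-weighted distance to its already-placed peers plus distance to its input data, with a per-host cap standing in for "not too many". The distance model, the weights and the function names are hypothetical.

```python
def distance(host_a, host_b, rack_of):
    """Hypothetical cost model: 0 on the same host, 1 within a rack, 2 across racks."""
    if host_a == host_b:
        return 0
    return 1 if rack_of[host_a] == rack_of[host_b] else 2


def place_cluster(vms, traffic, data_host, rack_of, max_per_host, data_weight=1):
    """Greedily place a whole cluster of VMs at once.

    traffic: {(vm_a, vm_b): volume}; data_host: {vm: host storing its input data}.
    Returns {vm: host}.
    """
    load = {host: 0 for host in rack_of}
    placement = {}

    def chattiness(vm):
        return sum(volume for pair, volume in traffic.items() if vm in pair)

    def cost(vm, host):
        # Distance from the VM's input data...
        total = data_weight * distance(host, data_host[vm], rack_of) if vm in data_host else 0
        # ...plus traffic-weighted distance to the peers placed so far.
        for (a, b), volume in traffic.items():
            if a == vm:
                peer = b
            elif b == vm:
                peer = a
            else:
                continue
            if peer in placement:
                total += volume * distance(host, placement[peer], rack_of)
        return total

    # Chattiest VMs first (ties keep the input order, since sorting is stable).
    for vm in sorted(vms, key=chattiness, reverse=True):
        # Cap the VMs per host so that co-located VMs do not overload its drives.
        candidates = [h for h in rack_of if load[h] < max_per_host]
        if not candidates:
            raise RuntimeError("not enough capacity for the cluster")
        best = min(candidates, key=lambda h: cost(vm, h))
        placement[vm] = best
        load[best] += 1
    return placement


rack_of = {"h1": "r1", "h2": "r1", "h3": "r2"}
traffic = {("a", "b"): 10, ("c", "d"): 10}
print(place_cluster(["a", "b", "c", "d"], traffic, {"c": "h3"}, rack_of, max_per_host=2))
# {'a': 'h1', 'b': 'h1', 'c': 'h3', 'd': 'h3'}
```

Here a and b share h1, while c and d follow c's data to h3 in the other rack, and the cap keeps at most two VMs on any host.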
Parting Words
Thank You!
These slides: http://bit.ly/bigfoot_owf14
Web: http://bigfootproject.eu
Twitter: @bigfoot_project
GitHub: http://github.com/bigfootproject/
Bitbucket: bitbucket.org/bigfootproject/
