SlideShare a Scribd company logo
Hadoop on Mesos
with a short history of distributed computing
Agenda
1. Introduction (to me)
2. A short history of distributed computing
3. Hadoop on Mesos
4. Case study - Airbnb
5. Final thoughts
6. Q&A
About me - Brenden Matthews
● cyclist
● runner
● started computering before it was cool
● free software advocate & contributor (Conky)
● for a living, engineers software @ Airbnb
About me - Brenden Matthews
● cyclist
● runner
● started computering before it was cool
● free software advocate & contributor (Conky)
● for a living, engineers software @
I don't even like computers.
Von Neumann Bottleneck
● Forever limited by memory and other I/O
bandwidth limitations
● To do more, you must scale beyond a single
node
● Even with SMP
systems, the same
limitations apply
A little history
Early days of distributed computing
● Working around the Von Neumann
Bottleneck: scaling up & out (Cray, SGI,
IBM)
● 'Supercomputers' only practical for
organizations with budget multipliers that
start with a 'B'
Who has time to build a datacentre?
● Xen hypervisor is released in 2003, paves
the way for an 'abstract datacentre' through
virtualization
● Amazon launches EC2 in 2006, kicks off the
'cloud computing' craze
DIY supercomputer; a novel approach
● Google's MapReduce papers formalized the
concept of 'black-box' distributed computing
(2004)
● Google's own infrastructure is built upon free
software and commodity hardware
DIY supercomputer; a novel approach
● Hadoop: a free implementation of Google's
infrastructure; 'big computing' for all (2005)
○ Robust
○ High tolerance of system failure
We're still left with
many incomplete solutions
● EC2 doesn't solve some problems:
○ Virtualization delivers poor performance when
compared to 'bare metal'; must compensate by
adding more instances
○ Frequent instance failures (mystery reboots, etc)
○ EC2 isn't 'application aware' (though some have
tried)
What else?
● Supercomputers aren't affordable
● Building a datacentre is not feasible for most
● Existing 'application in the cloud' systems
are too restrictive
How can we overcome
these problems?
The dream is alive.
Mesos is an operating system for your cluster
that provides application level distributed
computing
Mesos helps bridge the gap between the
hardware and your application (or 'framework',
in Mesos terms)
What's Mesos?
Why Mesos?
yes, but...
I enjoy doing things the hard way.
I really enjoy doing
things the hard way.
Hadoop on Mesos: Why?
● Formalized, scalable distributed computing
● Extensive toolset (Hive, Pig, Cascading,
Cascalog, ...)
● Familiar to many ('gold standard')
● Hadoop as a distributed application (a novel
concept!)
● Multiple versions of Hadoop (upgrade path)
● Why stop at Hadoop? There's more to do
with our cluster! (Chronos, Storm, Jenkins,
Spark, ...) and who has time to manage it?
Hadoop on Mesos: Goals
● Avoid complexity: rely on existing, vetted
systems, where possible
● Hadoop on Mesos should behave like any
other Hadoop
● Realize high resource utilization
● Minimize contention & starvation
● Make Hadoop a first class framework on
Mesos
Hadoop terminology
● JobTracker: manages cluster resources,
assigns tasks to TaskTrackers
● TaskTracker: manages individual
map/reduce tasks, serves intermediate data
amongst other TaskTrackers
● Job: collection of map and reduce tasks
● Task: one unit of work for a job (be it map or
reduce)
● Slot: a task executor, is either map or
reduce
● HDFS: distributed filesystem (outside scope)
Hadoop on Mesos: Challenges
● Availability: JobTracker must ensure
adequate map and reduce slots are
available for current & future jobs
● Capacity: how do you estimate capacity?
How do you profile jobs?
● Optimization: general case, or specific
cases? Per job resource allocation policies?
Separate JobTrackers for different job
types?
Hadoop on Mesos: Challenges
○ Mesos reservations allow for reservation of slave
resources for frameworks
○ Hadoop FairScheduler supports role fair sharing and
task pre-emption within JobTracker
● Resource reservations:
handling competing
frameworks on the same
cluster
Hadoop on Mesos: Challenges
Job Maps Reduces Duration Start
1 95 5 1h 0
2 5 100 1m 1m
3 10 10 30m 60m
4 50 0 20m 70m
5 100 5 1h 80m
Maps Reduces
95 5
48 52
10 10
60 10
90 10
Job Flow
With capacity for 100 slots
A contrived example
Maps Reduces
50 50
50 50
50 50
50 50
50 50
Ideal allocation Actual Hadoop
Hadoop on Mesos: What we did
● Mesos Scheduler is a thin layer atop the
Hadoop scheduler
● JobTracker launches TaskTrackers for each
job, using either a fixed or variable slot policy
○ Fixed policy launches a fixed number of slots per
TaskTracker
○ Variable policy attempts to launch an ideal number
of TaskTrackers and slots based on job queue
● Task scheduling is left to the underlying
scheduler (i.e., Hadoop FairScheduler)
Suggested key configuration values
Hadoop on Mesos: How we did it
Name Value
mapred.tasktracker.map.tasks.maximum 50
mapred.tasktracker.reduce.tasks.maximum 50
mapred.mesos.slot.map.minimum 1000
mapred.mesos.slot.reduce.minimum 1000
mapred.mesos.scheduler.policy.fixed false
mapred.mesos.slot.cpus 0.95
mapred.mesos.slot.mem 1550
● Engineering & analytics departments use
Hive, Pig, Cascading and other tools on
Hadoop:
○ Building search indices
○ Pricing suggestion system
○ Trust & safety, fraud detection
○ Business analytics
● Dealing with hypergrowth
Case study: Airbnb
● Had previously been using EMR, Amazon's
managed Hadoop as a service
● EMR suffers from:
○ limited Hive/Pig features
○ feature lag
○ inability to patch or modify Hadoop
● Data infrastructure was prone to error due to
significant complexity
○ EMR clusters would be spun up & destroyed every
week
○ accessing Hadoop required strange SSH 'hopping'
Case study: Airbnb, yesterday
Case study: Airbnb, today
● We run Chronos, Hadoop, and Storm on
Mesos now
● Finished complete migration to Mesos from
EMR (June 2013)
● ~500 Chronos jobs
● ~20TiB of daily Hive data, ~1-2PiB of
archived data
● Data availability: all time high
● Eng. & analytics customer satisfaction
through the roof
Case study: Airbnb, today
Action shots
Action shots
Next steps
● Locality awareness
● HDFS on Mesos
● HA JobTracker
● JobTracker on Mesos
Links
● The code: https://github.com/airbnb/mesos
● Airbnb Engineering Blog: http://nerds.airbnb.
com/
● My other stuff: https://github.
com/brndnmtthws
brenden@diddyinc.com
brenden.matthews@airbnb.com
Thanks!
Questions?

More Related Content

What's hot

Spark day 2017 - Spark on Kubernetes
Spark day 2017 - Spark on KubernetesSpark day 2017 - Spark on Kubernetes
Spark day 2017 - Spark on Kubernetes
Yousun Jeong
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
DataWorks Summit/Hadoop Summit
 
How to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized EnvironmentHow to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized Environment
BlueData, Inc.
 
Accelerating Hive with Alluxio on S3
Accelerating Hive with Alluxio on S3Accelerating Hive with Alluxio on S3
Accelerating Hive with Alluxio on S3
Alluxio, Inc.
 
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache AccumuloReal-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Joe Stein
 
Apache Superset at Airbnb
Apache Superset at AirbnbApache Superset at Airbnb
Apache Superset at Airbnb
Bill Liu
 
Lessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersLessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker Containers
BlueData, Inc.
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
Peter Clapham
 
Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit Log
Joe Stein
 
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Spark Summit
 
Running Cassandra in AWS
Running Cassandra in AWSRunning Cassandra in AWS
Running Cassandra in AWS
DataStax Academy
 
Get started with Developing Frameworks in Go on Apache Mesos
Get started with Developing Frameworks in Go on Apache MesosGet started with Developing Frameworks in Go on Apache Mesos
Get started with Developing Frameworks in Go on Apache Mesos
Joe Stein
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
Peter Clapham
 
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Amazon Web Services
 
Terraform Modules Restructured
Terraform Modules RestructuredTerraform Modules Restructured
Terraform Modules Restructured
DoiT International
 
On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...
On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...
On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...
Radhika Puthiyetath
 
Achieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudAchieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloud
Scott Miao
 
C* Summit 2013: Time for a New Relationship - Intuit's Journey from RDBMS to ...
C* Summit 2013: Time for a New Relationship - Intuit's Journey from RDBMS to ...C* Summit 2013: Time for a New Relationship - Intuit's Journey from RDBMS to ...
C* Summit 2013: Time for a New Relationship - Intuit's Journey from RDBMS to ...
DataStax Academy
 
Simplified Cluster Operation & Troubleshooting
Simplified Cluster Operation & TroubleshootingSimplified Cluster Operation & Troubleshooting
Simplified Cluster Operation & Troubleshooting
DataWorks Summit/Hadoop Summit
 
Kafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn MeetupKafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn Meetup
Gwen (Chen) Shapira
 

What's hot (20)

Spark day 2017 - Spark on Kubernetes
Spark day 2017 - Spark on KubernetesSpark day 2017 - Spark on Kubernetes
Spark day 2017 - Spark on Kubernetes
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
 
How to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized EnvironmentHow to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized Environment
 
Accelerating Hive with Alluxio on S3
Accelerating Hive with Alluxio on S3Accelerating Hive with Alluxio on S3
Accelerating Hive with Alluxio on S3
 
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache AccumuloReal-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
 
Apache Superset at Airbnb
Apache Superset at AirbnbApache Superset at Airbnb
Apache Superset at Airbnb
 
Lessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersLessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker Containers
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
 
Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit Log
 
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
 
Running Cassandra in AWS
Running Cassandra in AWSRunning Cassandra in AWS
Running Cassandra in AWS
 
Get started with Developing Frameworks in Go on Apache Mesos
Get started with Developing Frameworks in Go on Apache MesosGet started with Developing Frameworks in Go on Apache Mesos
Get started with Developing Frameworks in Go on Apache Mesos
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
 
Terraform Modules Restructured
Terraform Modules RestructuredTerraform Modules Restructured
Terraform Modules Restructured
 
On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...
On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...
On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...
 
Achieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudAchieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloud
 
C* Summit 2013: Time for a New Relationship - Intuit's Journey from RDBMS to ...
C* Summit 2013: Time for a New Relationship - Intuit's Journey from RDBMS to ...C* Summit 2013: Time for a New Relationship - Intuit's Journey from RDBMS to ...
C* Summit 2013: Time for a New Relationship - Intuit's Journey from RDBMS to ...
 
Simplified Cluster Operation & Troubleshooting
Simplified Cluster Operation & TroubleshootingSimplified Cluster Operation & Troubleshooting
Simplified Cluster Operation & Troubleshooting
 
Kafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn MeetupKafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn Meetup
 

Viewers also liked

OpenStack DRaaS - Freezer - 101
OpenStack DRaaS - Freezer - 101OpenStack DRaaS - Freezer - 101
OpenStack DRaaS - Freezer - 101
Trinath Somanchi
 
Distributed VNF Management - Architecture and Use cases
Distributed VNF Management - Architecture and Use casesDistributed VNF Management - Architecture and Use cases
Distributed VNF Management - Architecture and Use cases
Trinath Somanchi
 
OpenStack Collaboration made in heaven with Heat, Mistral, Neutron and more..
OpenStack Collaboration made in heaven with Heat, Mistral, Neutron and more..OpenStack Collaboration made in heaven with Heat, Mistral, Neutron and more..
OpenStack Collaboration made in heaven with Heat, Mistral, Neutron and more..
Trinath Somanchi
 
Securing NFV and SDN Integrated OpenStack Cloud: Challenges and Solutions
Securing NFV and SDN Integrated OpenStack Cloud: Challenges and SolutionsSecuring NFV and SDN Integrated OpenStack Cloud: Challenges and Solutions
Securing NFV and SDN Integrated OpenStack Cloud: Challenges and Solutions
Trinath Somanchi
 
Optimize Your Funnel By Getting Inside Your Buyer's Head
Optimize Your Funnel By Getting Inside Your Buyer's HeadOptimize Your Funnel By Getting Inside Your Buyer's Head
Optimize Your Funnel By Getting Inside Your Buyer's Head
David Skok
 
SDN and NFV integrated OpenStack Cloud - Birds eye view on Security
SDN and NFV integrated OpenStack Cloud - Birds eye view on SecuritySDN and NFV integrated OpenStack Cloud - Birds eye view on Security
SDN and NFV integrated OpenStack Cloud - Birds eye view on Security
Trinath Somanchi
 
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
Carol Smith
 

Viewers also liked (7)

OpenStack DRaaS - Freezer - 101
OpenStack DRaaS - Freezer - 101OpenStack DRaaS - Freezer - 101
OpenStack DRaaS - Freezer - 101
 
Distributed VNF Management - Architecture and Use cases
Distributed VNF Management - Architecture and Use casesDistributed VNF Management - Architecture and Use cases
Distributed VNF Management - Architecture and Use cases
 
OpenStack Collaboration made in heaven with Heat, Mistral, Neutron and more..
OpenStack Collaboration made in heaven with Heat, Mistral, Neutron and more..OpenStack Collaboration made in heaven with Heat, Mistral, Neutron and more..
OpenStack Collaboration made in heaven with Heat, Mistral, Neutron and more..
 
Securing NFV and SDN Integrated OpenStack Cloud: Challenges and Solutions
Securing NFV and SDN Integrated OpenStack Cloud: Challenges and SolutionsSecuring NFV and SDN Integrated OpenStack Cloud: Challenges and Solutions
Securing NFV and SDN Integrated OpenStack Cloud: Challenges and Solutions
 
Optimize Your Funnel By Getting Inside Your Buyer's Head
Optimize Your Funnel By Getting Inside Your Buyer's HeadOptimize Your Funnel By Getting Inside Your Buyer's Head
Optimize Your Funnel By Getting Inside Your Buyer's Head
 
SDN and NFV integrated OpenStack Cloud - Birds eye view on Security
SDN and NFV integrated OpenStack Cloud - Birds eye view on SecuritySDN and NFV integrated OpenStack Cloud - Birds eye view on Security
SDN and NFV integrated OpenStack Cloud - Birds eye view on Security
 
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
 

Similar to Hadoop on-mesos

Apache Mesos Overview and Integration
Apache Mesos Overview and IntegrationApache Mesos Overview and Integration
Apache Mesos Overview and Integration
Alex Baretto
 
Mesos - A Platform for Fine-Grained Resource Sharing in the Data Center
Mesos - A Platform for Fine-Grained Resource Sharing in the Data CenterMesos - A Platform for Fine-Grained Resource Sharing in the Data Center
Mesos - A Platform for Fine-Grained Resource Sharing in the Data Center
Ankur Chauhan
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
Varun Narang
 
What is Distributed Computing, Why we use Apache Spark
What is Distributed Computing, Why we use Apache SparkWhat is Distributed Computing, Why we use Apache Spark
What is Distributed Computing, Why we use Apache Spark
Andy Petrella
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2
aswini pilli
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2
Aswini Ashu
 
Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2
Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2
Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2
Anant Corporation
 
Apache Mesos Distributed Computing Talk
Apache Mesos Distributed Computing Talk Apache Mesos Distributed Computing Talk
Apache Mesos Distributed Computing Talk
brandongulla
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
Reynold Xin
 
Hadoop tutorial
Hadoop tutorialHadoop tutorial
Hadoop tutorial
Aamir Ameen
 
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Deanna Kosaraju
 
Hadoop live online training
Hadoop live online trainingHadoop live online training
Hadoop live online training
Harika583
 
Architecting and productionising data science applications at scale
Architecting and productionising data science applications at scaleArchitecting and productionising data science applications at scale
Architecting and productionising data science applications at scale
samthemonad
 
Hadoop
HadoopHadoop
CouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataCouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big Data
Debajani Mohanty
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.ppt
Sathish24111
 
Big data nyu
Big data nyuBig data nyu
Big data nyu
Edward Capriolo
 
An Introduction to Apache Hadoop, Mahout and HBase
An Introduction to Apache Hadoop, Mahout and HBaseAn Introduction to Apache Hadoop, Mahout and HBase
An Introduction to Apache Hadoop, Mahout and HBase
Lukas Vlcek
 
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
Big Data Montreal
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
Adaryl "Bob" Wakefield, MBA
 

Similar to Hadoop on-mesos (20)

Apache Mesos Overview and Integration
Apache Mesos Overview and IntegrationApache Mesos Overview and Integration
Apache Mesos Overview and Integration
 
Mesos - A Platform for Fine-Grained Resource Sharing in the Data Center
Mesos - A Platform for Fine-Grained Resource Sharing in the Data CenterMesos - A Platform for Fine-Grained Resource Sharing in the Data Center
Mesos - A Platform for Fine-Grained Resource Sharing in the Data Center
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
What is Distributed Computing, Why we use Apache Spark
What is Distributed Computing, Why we use Apache SparkWhat is Distributed Computing, Why we use Apache Spark
What is Distributed Computing, Why we use Apache Spark
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2
 
Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2
Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2
Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2
 
Apache Mesos Distributed Computing Talk
Apache Mesos Distributed Computing Talk Apache Mesos Distributed Computing Talk
Apache Mesos Distributed Computing Talk
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
 
Hadoop tutorial
Hadoop tutorialHadoop tutorial
Hadoop tutorial
 
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
 
Hadoop live online training
Hadoop live online trainingHadoop live online training
Hadoop live online training
 
Architecting and productionising data science applications at scale
Architecting and productionising data science applications at scaleArchitecting and productionising data science applications at scale
Architecting and productionising data science applications at scale
 
Hadoop
HadoopHadoop
Hadoop
 
CouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataCouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big Data
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.ppt
 
Big data nyu
Big data nyuBig data nyu
Big data nyu
 
An Introduction to Apache Hadoop, Mahout and HBase
An Introduction to Apache Hadoop, Mahout and HBaseAn Introduction to Apache Hadoop, Mahout and HBase
An Introduction to Apache Hadoop, Mahout and HBase
 
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 

Recently uploaded

20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
Rohit Gautam
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 

Recently uploaded (20)

20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 

Hadoop on-mesos

  • 1. Hadoop on Mesos with a short history of distributed computing
  • 2. Agenda 1. Introduction (to me) 2. A short history of distributed computing 3. Hadoop on Mesos 4. Case study - Airbnb 5. Final thoughts 6. Q&A
  • 3. About me - Brenden Matthews ● cyclist ● runner ● started computering before it was cool ● free software advocate & contributor (Conky) ● for a living, engineers software @ Airbnb
  • 4. About me - Brenden Matthews ● cyclist ● runner ● started computering before it was cool ● free software advocate & contributor (Conky) ● for a living, engineers software @ I don't even like computers.
  • 5. Von Neumann Bottleneck ● Forever limited by memory and other I/O bandwidth limitations ● To do more, you must scale beyond a single node ● Even with SMP systems, the same limitations apply A little history
  • 6. Early days of distributed computing ● Working around the Von Neumann Bottleneck: scaling up & out (Cray, SGI, IBM) ● 'Supercomputers' only practical for organizations with budget multipliers that start with a 'B'
  • 7. Who has time to build a datacentre? ● Xen hypervisor is released in 2003, paves the way for an 'abstract datacentre' through virtualization ● Amazon launches EC2 in 2006, kicks off the 'cloud computing' craze
  • 8. DIY supercomputer; a novel approach ● Google's MapReduce papers formalized the concept of 'black-box' distributed computing (2004) ● Google's own infrastructure is built upon free software and commodity hardware
  • 9. DIY supercomputer; a novel approach ● Hadoop: a free implementation of Google's infrastructure; 'big computing' for all (2005) ○ Robust ○ High tolerance of system failure
  • 10. We're still left with many incomplete solutions ● EC2 doesn't solve some problems: ○ Virtualization delivers poor performance when compared to 'bare metal'; must compensate by adding more instances ○ Frequent instance failures (mystery reboots, etc) ○ EC2 isn't 'application aware' (though some have tried) What else? ● Supercomputers aren't affordable ● Building a datacentre is not feasible for most ● Existing 'application in the cloud' systems are too restrictive
  • 11. How can we overcome these problems?
  • 12. The dream is alive.
  • 13. Mesos is an operating system for your cluster that provides application level distributed computing Mesos helps bridge the gap between the hardware and your application (or 'framework', in Mesos terms) What's Mesos?
  • 15. I enjoy doing things the hard way.
  • 16. I really enjoy doing things the hard way.
  • 17. Hadoop on Mesos: Why? ● Formalized, scalable distributed computing ● Extensive toolset (Hive, Pig, Cascading, Cascalog, ...) ● Familiar to many ('gold standard') ● Hadoop as a distributed application (a novel concept!) ● Multiple versions of Hadoop (upgrade path) ● Why stop at Hadoop? There's more to do with our cluster! (Chronos, Storm, Jenkins, Spark, ...) and who has time to manage it?
  • 18. Hadoop on Mesos: Goals ● Avoid complexity: rely on existing, vetted systems, where possible ● Hadoop on Mesos should behave like any other Hadoop ● Realize high resource utilization ● Minimize contention & starvation ● Make Hadoop a first class framework on Mesos
  • 19. Hadoop terminology ● JobTracker: manages cluster resources, assigns tasks to TaskTrackers ● TaskTracker: manages individual map/reduce tasks, serves intermediate data amongst other TaskTrackers ● Job: collection of map and reduce tasks ● Task: one unit of work for a job (be it map or reduce) ● Slot: a task executor, is either map or reduce ● HDFS: distributed filesystem (outside scope)
  • 20. Hadoop on Mesos: Challenges ● Availability: JobTracker must ensure adequate map and reduce slots are available for current & future jobs ● Capacity: how do you estimate capacity? How do you profile jobs? ● Optimization: general case, or specific cases? Per job resource allocation policies? Separate JobTrackers for different job types?
  • 21. Hadoop on Mesos: Challenges ○ Mesos reservations allow for reservation of slave resources for frameworks ○ Hadoop FairScheduler supports role fair sharing and task pre-emption within JobTracker ● Resource reservations: handling competing frameworks on the same cluster
  • 22. Hadoop on Mesos: Challenges Job Maps Reduces Duration Start 1 95 5 1h 0 2 5 100 1m 1m 3 10 10 30m 60m 4 50 0 20m 70m 5 100 5 1h 80m Maps Reduces 95 5 48 52 10 10 60 10 90 10 Job Flow With capacity for 100 slots A contrived example Maps Reduces 50 50 50 50 50 50 50 50 50 50 Ideal allocation Actual Hadoop
  • 23. Hadoop on Mesos: What we did ● Mesos Scheduler is a thin layer atop the Hadoop scheduler ● JobTracker launches TaskTrackers for each job, using either a fixed or variable slot policy ○ Fixed policy launches a fixed number of slots per TaskTracker ○ Variable policy attempts to launch an ideal number of TaskTrackers and slots based on job queue ● Task scheduling is left to the underlying scheduler (i.e., Hadoop FairScheduler)
  • 24. Suggested key configuration values Hadoop on Mesos: How we did it Name Value mapred.tasktracker.map.tasks.maximum 50 mapred.tasktracker.reduce.tasks.maximum 50 mapred.mesos.slot.map.minimum 1000 mapred.mesos.slot.reduce.minimum 1000 mapred.mesos.scheduler.policy.fixed false mapred.mesos.slot.cpus 0.95 mapred.mesos.slot.mem 1550
  • 25. ● Engineering & analytics departments use Hive, Pig, Cascading and other tools on Hadoop: ○ Building search indices ○ Pricing suggestion system ○ Trust & safety, fraud detection ○ Business analytics ● Dealing with hypergrowth Case study: Airbnb
  • 26. ● Had previously been using EMR, Amazon's managed Hadoop as a service ● EMR suffers from: ○ limited Hive/Pig features ○ feature lag ○ inability to patch or modify Hadoop ● Data infrastructure was prone to error due to significant complexity ○ EMR clusters would be spun up & destroyed every week ○ accessing Hadoop required strange SSH 'hopping' Case study: Airbnb, yesterday
  • 27. Case study: Airbnb, today ● We run Chronos, Hadoop, and Storm on Mesos now ● Finished complete migration to Mesos from EMR (June 2013) ● ~500 Chronos jobs ● ~20TiB of daily Hive data, ~1-2PiB of archived data
  • 28. ● Data availability: all time high ● Eng. & analytics customer satisfaction through the roof Case study: Airbnb, today
  • 31. Next steps ● Locality awareness ● HDFS on Mesos ● HA JobTracker ● JobTracker on Mesos
  • 32. Links ● The code: https://github.com/airbnb/mesos ● Airbnb Engineering Blog: http://nerds.airbnb. com/ ● My other stuff: https://github. com/brndnmtthws brenden@diddyinc.com brenden.matthews@airbnb.com