Hadoop on-mesos

Hadoop on Mesos
with a short history of distributed computing
Agenda
1. Introduction (to me)
2. A short history of distributed computing
3. Hadoop on Mesos
4. Case study - Airbnb
5. Final thoughts
6. Q&A
About me - Brenden Matthews
● cyclist
● runner
● started computering before it was cool
● free software advocate & contributor (Conky)
● for a living, engineers software @ Airbnb
I don't even like computers.
A little history
Von Neumann Bottleneck
● Forever limited by memory and other I/O bandwidth limitations
● To do more, you must scale beyond a single node
● Even with SMP systems, the same limitations apply
Early days of distributed computing
● Working around the Von Neumann
Bottleneck: scaling up & out (Cray, SGI,
IBM)
● 'Supercomputers' only practical for
organizations with budget multipliers that
start with a 'B'
Who has time to build a datacentre?
● Xen hypervisor is released in 2003, paves
the way for an 'abstract datacentre' through
virtualization
● Amazon launches EC2 in 2006, kicks off the
'cloud computing' craze
DIY supercomputer; a novel approach
● Google's MapReduce papers formalized the
concept of 'black-box' distributed computing
(2004)
● Google's own infrastructure is built upon free
software and commodity hardware
DIY supercomputer; a novel approach
● Hadoop: a free implementation of Google's
infrastructure; 'big computing' for all (2005)
○ Robust
○ High tolerance of system failure
We're still left with many incomplete solutions
● EC2 doesn't solve some problems:
○ Virtualization delivers poor performance when
compared to 'bare metal'; must compensate by
adding more instances
○ Frequent instance failures (mystery reboots, etc.)
○ EC2 isn't 'application aware' (though some have
tried)
What else?
● Supercomputers aren't affordable
● Building a datacentre is not feasible for most
● Existing 'application in the cloud' systems
are too restrictive
How can we overcome these problems?
The dream is alive.
What's Mesos?
Mesos is an operating system for your cluster that provides application level distributed computing
Mesos helps bridge the gap between the hardware and your application (or 'framework', in Mesos terms)
Why Mesos?
yes, but...
I enjoy doing things the hard way.
I really enjoy doing things the hard way.
Hadoop on Mesos: Why?
● Formalized, scalable distributed computing
● Extensive toolset (Hive, Pig, Cascading,
Cascalog, ...)
● Familiar to many ('gold standard')
● Hadoop as a distributed application (a novel
concept!)
● Multiple versions of Hadoop (upgrade path)
● Why stop at Hadoop? There's more to do
with our cluster! (Chronos, Storm, Jenkins,
Spark, ...) and who has time to manage it?
Hadoop on Mesos: Goals
● Avoid complexity: rely on existing, vetted
systems, where possible
● Hadoop on Mesos should behave like any
other Hadoop
● Realize high resource utilization
● Minimize contention & starvation
● Make Hadoop a first class framework on
Mesos
Hadoop terminology
● JobTracker: manages cluster resources,
assigns tasks to TaskTrackers
● TaskTracker: manages individual
map/reduce tasks, serves intermediate data
amongst other TaskTrackers
● Job: collection of map and reduce tasks
● Task: one unit of work for a job (be it map or
reduce)
● Slot: a task executor, either map or reduce
● HDFS: distributed filesystem (outside scope)
Hadoop on Mesos: Challenges
● Availability: JobTracker must ensure
adequate map and reduce slots are
available for current & future jobs
● Capacity: how do you estimate capacity?
How do you profile jobs?
● Optimization: general case, or specific
cases? Per job resource allocation policies?
Separate JobTrackers for different job
types?
Hadoop on Mesos: Challenges
● Resource reservations: handling competing frameworks on the same cluster
○ Mesos reservations allow for reservation of slave resources for frameworks
○ Hadoop FairScheduler supports role fair sharing and task pre-emption within JobTracker
Hadoop on Mesos: Challenges
A contrived example
With capacity for 100 slots

Job Flow
Job  Maps  Reduces  Duration  Start
1    95    5        1h        0
2    5     100      1m        1m
3    10    10       30m       60m
4    50    0        20m       70m
5    100   5        1h        80m

Ideal allocation    Actual Hadoop
Maps  Reduces       Maps  Reduces
95    5             50    50
48    52            50    50
10    10            50    50
60    10            50    50
90    10            50    50
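The gap between a fixed 50/50 split and a demand-driven split can be checked with a little arithmetic. The sketch below is an illustration only, not part of the talk: it takes the map and reduce counts from the job table in the contrived example, treats each job in isolation (ignoring overlap in time and the fact that reduces depend on maps), and counts how many of the 100 slots the job could occupy. The function names are hypothetical.

```python
# Job demands copied from the job table in the contrived example:
# (job, maps, reduces).
JOBS = [(1, 95, 5), (2, 5, 100), (3, 10, 10), (4, 50, 0), (5, 100, 5)]
CAPACITY = 100  # total slots in the cluster


def static_usage(maps, reduces, map_slots=50, reduce_slots=50):
    """Slots a job can occupy when the map/reduce split is fixed."""
    return min(maps, map_slots) + min(reduces, reduce_slots)


def flexible_usage(maps, reduces, capacity=CAPACITY):
    """Slots a job can occupy when any slot may run either task type."""
    return min(maps + reduces, capacity)


def compare(jobs=JOBS):
    """Return (job, static, flexible) for each job considered on its own."""
    return [(job, static_usage(m, r), flexible_usage(m, r)) for job, m, r in jobs]


if __name__ == "__main__":
    for job, static, flexible in compare():
        print(f"job {job}: static={static:3d} flexible={flexible:3d}")
```

Under these simplifying assumptions, the map-heavy and reduce-heavy jobs (1, 2 and 5) can occupy only 55 of the 100 slots with a fixed split, while the balanced jobs (3 and 4) are unaffected.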
Hadoop on Mesos: What we did
● Mesos Scheduler is a thin layer atop the
Hadoop scheduler
● JobTracker launches TaskTrackers for each
job, using either a fixed or variable slot policy
○ Fixed policy launches a fixed number of slots per
TaskTracker
○ Variable policy attempts to launch an ideal number
of TaskTrackers and slots based on job queue
● Task scheduling is left to the underlying
scheduler (i.e., Hadoop FairScheduler)
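The difference between the two slot policies can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the actual scheduler code: the `Offer` class, the function names, and the "maps first" rule in the variable policy are all hypothetical, and the per-slot CPU and memory figures are borrowed from the suggested configuration values in this deck.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Resources offered by one Mesos slave (hypothetical, simplified)."""
    cpus: float
    mem_mb: float


SLOT_CPUS = 0.95    # assumed CPU share per slot
SLOT_MEM_MB = 1550  # assumed memory per slot, in MB


def slots_that_fit(offer, slot_cpus=SLOT_CPUS, slot_mem_mb=SLOT_MEM_MB):
    """Number of whole slots the offer can hold (CPU- or memory-bound)."""
    return int(min(offer.cpus // slot_cpus, offer.mem_mb // slot_mem_mb))


def fixed_policy(offer, map_slots, reduce_slots):
    """Always request the same slot counts; decline offers that are too small."""
    if slots_that_fit(offer) < map_slots + reduce_slots:
        return None
    return map_slots, reduce_slots


def variable_policy(offer, pending_maps, pending_reduces):
    """Size the TaskTracker to the job queue.

    Assumption for this sketch: maps are served first and reduces take
    whatever room is left; a real policy could weigh them differently.
    """
    room = slots_that_fit(offer)
    maps = min(pending_maps, room)
    reduces = min(pending_reduces, room - maps)
    if maps + reduces == 0:
        return None
    return maps, reduces


if __name__ == "__main__":
    offer = Offer(cpus=8, mem_mb=16000)
    print("fixed:", fixed_policy(offer, 2, 2))
    print("variable:", variable_policy(offer, 3, 100))
```

In this sketch the fixed policy either gets its full slot count or declines the offer, while the variable policy never launches more slots than there are pending tasks, which is the property that helps utilization when the queue is lopsided.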
Hadoop on Mesos: How we did it
Suggested key configuration values
Name                                      Value
mapred.tasktracker.map.tasks.maximum      50
mapred.tasktracker.reduce.tasks.maximum   50
mapred.mesos.slot.map.minimum             1000
mapred.mesos.slot.reduce.minimum          1000
mapred.mesos.scheduler.policy.fixed       false
mapred.mesos.slot.cpus                    0.95
mapred.mesos.slot.mem                     1550
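Hadoop reads settings like these from XML files made of `<property>` elements inside a `<configuration>` root. As a minimal sketch, the snippet below renders the suggested values into that layout. The assumption here is that they would go into the JobTracker's `mapred-site.xml`; the talk does not say which file was used, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Values copied from the suggested configuration table above.
SETTINGS = {
    "mapred.tasktracker.map.tasks.maximum": "50",
    "mapred.tasktracker.reduce.tasks.maximum": "50",
    "mapred.mesos.slot.map.minimum": "1000",
    "mapred.mesos.slot.reduce.minimum": "1000",
    "mapred.mesos.scheduler.policy.fixed": "false",
    "mapred.mesos.slot.cpus": "0.95",
    "mapred.mesos.slot.mem": "1550",
}


def render_site_xml(settings):
    """Render name/value pairs as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Assumed destination: mapred-site.xml on the JobTracker host.
    print(render_site_xml(SETTINGS))
```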
Case study: Airbnb
● Engineering & analytics departments use Hive, Pig, Cascading and other tools on Hadoop:
○ Building search indices
○ Pricing suggestion system
○ Trust & safety, fraud detection
○ Business analytics
● Dealing with hypergrowth
Case study: Airbnb, yesterday
● Had previously been using EMR, Amazon's managed Hadoop as a service
● EMR suffers from:
○ limited Hive/Pig features
○ feature lag
○ inability to patch or modify Hadoop
● Data infrastructure was prone to error due to significant complexity
○ EMR clusters would be spun up & destroyed every week
○ accessing Hadoop required strange SSH 'hopping'
Case study: Airbnb, today
● We run Chronos, Hadoop, and Storm on
Mesos now
● Finished complete migration to Mesos from
EMR (June 2013)
● ~500 Chronos jobs
● ~20TiB of daily Hive data, ~1-2PiB of
archived data
● Data availability: all time high
● Eng. & analytics customer satisfaction
through the roof
Action shots
Next steps
● Locality awareness
● HDFS on Mesos
● HA JobTracker
● JobTracker on Mesos
Links
● The code: https://github.com/airbnb/mesos
● Airbnb Engineering Blog: http://nerds.airbnb.com/
● My other stuff: https://github.com/brndnmtthws
brenden@diddyinc.com
brenden.matthews@airbnb.com
Thanks!
Questions?