Hadoop on Mesos
with a short history of distributed computing
Agenda
1. Introduction (to me)
2. A short history of distributed computing
3. Hadoop on Mesos
4. Case study - Airbnb
5. Final thoughts
6. Q&A
About me - Brenden Matthews
● cyclist
● runner
● started computering before it was cool
● free software advocate & contributor (Conky)
● for a living, engineers software @ Airbnb
I don't even like computers.
Von Neumann Bottleneck
● Performance is forever limited by memory and other I/O bandwidth
● To do more, you must scale beyond a single node
● Even with SMP systems, the same limitations apply
A little history
Early days of distributed computing
● Working around the Von Neumann Bottleneck: scaling up & out (Cray, SGI, IBM)
● 'Supercomputers' were only practical for organizations with budget multipliers that start with a 'B'
Who has time to build a datacentre?
● The Xen hypervisor is released in 2003, paving the way for an 'abstract datacentre' through virtualization
● Amazon launches EC2 in 2006, kicking off the 'cloud computing' craze
DIY supercomputer; a novel approach
● Google's MapReduce papers formalized the concept of 'black-box' distributed computing (2004)
● Google's own infrastructure is built upon free software and commodity hardware
● Hadoop: a free implementation of Google's infrastructure; 'big computing' for all (2005)
○ Robust
○ High tolerance of system failure
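To make the 'black-box' idea concrete, here is a minimal single-process sketch of the MapReduce programming model, using word count (the canonical example). The function names are illustrative assumptions, not Hadoop's API: in Hadoop the framework distributes the map, shuffle and reduce phases across many nodes, and the user supplies only the two small functions.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_words(line: str) -> Iterator[tuple[str, int]]:
    """User-supplied map function: emit (word, 1) for every word."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """User-supplied reduce function: sum all counts for one key."""
    return word, sum(counts)


def run_mapreduce(lines, mapper, reducer):
    # Map phase plus shuffle: apply the mapper to every record and
    # group the intermediate values by key.
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    # Reduce phase: one reducer call per distinct key, in sorted key order.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(run_mapreduce(docs, map_words, reduce_counts))
```

The framework never needs to understand what `map_words` or `reduce_counts` do, which is what lets the same machinery run arbitrary jobs.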
We're still left with many incomplete solutions
● EC2 doesn't solve some problems:
○ Virtualization delivers poor performance when compared to 'bare metal'; you must compensate by adding more instances
○ Frequent instance failures (mystery reboots, etc.)
○ EC2 isn't 'application aware' (though some have tried)
What else?
● Supercomputers aren't affordable
● Building a datacentre is not feasible for most
● Existing 'application in the cloud' systems are too restrictive
How can we overcome these problems?
The dream is alive.
What's Mesos?
Mesos is an operating system for your cluster that provides application-level distributed computing
Mesos helps bridge the gap between the hardware and your application (or 'framework', in Mesos terms)
Why Mesos?
yes, but...
I enjoy doing things the hard way.
I really enjoy doing things the hard way.
Hadoop on Mesos: Why?
● Formalized, scalable distributed computing
● Extensive toolset (Hive, Pig, Cascading, Cascalog, ...)
● Familiar to many ('gold standard')
● Hadoop as a distributed application (a novel concept!)
● Multiple versions of Hadoop (upgrade path)
● Why stop at Hadoop? There's more to do with our cluster (Chronos, Storm, Jenkins, Spark, ...), and who has time to manage it all?
Hadoop on Mesos: Goals
● Avoid complexity: rely on existing, vetted systems where possible
● Hadoop on Mesos should behave like any other Hadoop
● Realize high resource utilization
● Minimize contention & starvation
● Make Hadoop a first-class framework on Mesos
Hadoop terminology
● JobTracker: manages cluster resources, assigns tasks to TaskTrackers
● TaskTracker: manages individual map/reduce tasks, serves intermediate data amongst other TaskTrackers
● Job: collection of map and reduce tasks
● Task: one unit of work for a job (be it map or reduce)
● Slot: a task executor, either map or reduce
● HDFS: distributed filesystem (outside scope)
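The relationship between these terms can be sketched as a toy data model: a TaskTracker owns a fixed number of map slots and reduce slots, and the JobTracker places each task into a free slot of the matching kind. The class and function names below are hypothetical and far simpler than Hadoop's real JobTracker/TaskTracker code.

```python
from dataclasses import dataclass, field
from typing import Literal

Kind = Literal["map", "reduce"]


@dataclass
class Task:
    job_id: int
    kind: Kind


@dataclass
class TaskTracker:
    name: str
    map_slots: int
    reduce_slots: int
    running: list[Task] = field(default_factory=list)

    def free(self, kind: Kind) -> int:
        """Number of idle slots of the given kind on this TaskTracker."""
        capacity = self.map_slots if kind == "map" else self.reduce_slots
        return capacity - sum(1 for t in self.running if t.kind == kind)


def assign(pending: list[Task], trackers: list[TaskTracker]) -> list[Task]:
    """Toy JobTracker loop: put each pending task in a free slot of the
    matching kind; return the tasks that could not be placed."""
    unplaced = []
    for task in pending:
        tracker = next((tt for tt in trackers if tt.free(task.kind) > 0), None)
        if tracker is None:
            unplaced.append(task)
        else:
            tracker.running.append(task)
    return unplaced


if __name__ == "__main__":
    trackers = [TaskTracker("tt1", map_slots=2, reduce_slots=1)]
    pending = [Task(1, "map"), Task(1, "reduce"), Task(1, "reduce")]
    left = assign(pending, trackers)
    print([t.kind for t in left], trackers[0].free("map"))
```

In the usage example the second reduce task stays unplaced even though a map slot is idle: slots are typed, so a reduce task cannot borrow a map slot.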
Hadoop on Mesos: Challenges
● Availability: the JobTracker must ensure adequate map and reduce slots are available for current & future jobs
● Capacity: how do you estimate capacity? How do you profile jobs?
● Optimization: general case, or specific cases? Per-job resource allocation policies? Separate JobTrackers for different job types?
● Resource reservations: handling competing frameworks on the same cluster
○ Mesos reservations allow slave resources to be reserved for frameworks
○ Hadoop FairScheduler supports role fair sharing and task pre-emption within the JobTracker
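Both mechanisms above aim to share a fixed pool of slots among competing consumers. As a rough illustration of the idea (a generic max-min fairness computation, not the FairScheduler's actual algorithm, which also handles weights, minimum shares and pre-emption), the sketch below splits slots among pools; the pool names and demands are hypothetical.

```python
def max_min_fair(capacity: int, demands: dict[str, int]) -> dict[str, int]:
    """Split `capacity` slots so that no pool gets more than it asks for,
    and slots a small pool cannot use are redistributed to larger ones."""
    alloc = {pool: 0 for pool in demands}
    remaining = capacity
    hungry = sorted(p for p, d in demands.items() if d > 0)
    while remaining > 0 and hungry:
        share = remaining // len(hungry)
        if share == 0:
            # Fewer slots left than hungry pools: one each, in name order.
            for pool in hungry[:remaining]:
                alloc[pool] += 1
            break
        for pool in hungry:
            give = min(share, demands[pool] - alloc[pool])
            alloc[pool] += give
            remaining -= give
        # Pools whose demand is met drop out of the next round.
        hungry = [p for p in hungry if alloc[p] < demands[p]]
    return alloc


if __name__ == "__main__":
    print(max_min_fair(100, {"etl": 95, "adhoc": 20, "reports": 10}))
```

Here the two small pools are fully satisfied and the 37 slots left after the first round all go to the large pool, giving it 70.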
A contrived example
Job flow, with capacity for 100 slots:

Job  Maps  Reduces  Duration  Start  |  Ideal allocation  |  Actual Hadoop
                                     |  Maps   Reduces    |  Maps   Reduces
1    95    5        1h        0m     |  95     5          |  50     50
2    5     100      1m        1m     |  48     52         |  50     50
3    10    10       30m       60m    |  10     10         |  50     50
4    50    0        20m       70m    |  60     10         |  50     50
5    100   5        1h        80m    |  90     10         |  50     50
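The cost of the static 50/50 split can be estimated with a small simulation of this job flow. It assumes, as a simplification, that each job wants all of its map and reduce tasks running for its whole duration, and it compares Hadoop's fixed split against a hypothetical policy that divides the 100 slots in proportion to demand. The proportional rule is only one possible policy and is not the one used to produce the ideal column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    maps: int
    reduces: int
    duration_min: int
    start_min: int

    def active_at(self, t: int) -> bool:
        return self.start_min <= t < self.start_min + self.duration_min


# The job flow from the table above (durations and starts in minutes).
JOBS = [
    Job(95, 5, 60, 0),
    Job(5, 100, 1, 1),
    Job(10, 10, 30, 60),
    Job(50, 0, 20, 70),
    Job(100, 5, 60, 80),
]
CAPACITY = 100


def demand_at(t: int) -> tuple[int, int]:
    """Total (map, reduce) tasks wanted by the jobs running at minute t."""
    active = [j for j in JOBS if j.active_at(t)]
    return sum(j.maps for j in active), sum(j.reduces for j in active)


def fixed_split(maps: int, reduces: int) -> tuple[int, int]:
    """Busy slots under a static 50 map / 50 reduce configuration."""
    return min(maps, 50), min(reduces, 50)


def proportional_split(maps: int, reduces: int) -> tuple[int, int]:
    """Hypothetical policy: share the slots in proportion to demand."""
    total = maps + reduces
    if total <= CAPACITY:
        return maps, reduces
    m = CAPACITY * maps // total
    return m, CAPACITY - m


if __name__ == "__main__":
    for t in sorted({j.start_min for j in JOBS}):
        demand = demand_at(t)
        fixed, prop = fixed_split(*demand), proportional_split(*demand)
        print(f"t={t:>3}m demand={demand} fixed={fixed} busy={sum(fixed)} "
              f"proportional={prop} busy={sum(prop)}")
```

Under these assumptions the fixed split keeps 55, 100, 20, 60 and 65 of the 100 slots busy at the five job start times, while the proportional split keeps 100, 100, 20, 70 and 100 busy. It matches the ideal column for the first four rows but gives 91/9 rather than 90/10 for the last.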
Hadoop on Mesos: What we did
● The Mesos scheduler is a thin layer atop the Hadoop scheduler
● The JobTracker launches TaskTrackers for each job, using either a fixed or variable slot policy
○ Fixed policy launches a fixed number of slots per TaskTracker
○ Variable policy attempts to launch an ideal number of TaskTrackers and slots based on the job queue
● Task scheduling is left to the underlying scheduler (e.g., Hadoop FairScheduler)
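Mesos presents spare cluster resources to a framework as resource offers, which the framework's scheduler accepts or declines. The sketch below shows one way the fixed and variable slot policies might turn a single offer into slot counts for a new TaskTracker. The `Offer` type, the default of two map and two reduce slots, and the proportional split by queue composition are all hypothetical; only the per-slot CPU and memory figures mirror the suggested `mapred.mesos.slot.cpus` and `mapred.mesos.slot.mem` values.

```python
from dataclasses import dataclass
from typing import Optional

SLOT_CPUS = 0.95    # mirrors mapred.mesos.slot.cpus
SLOT_MEM_MB = 1550  # mirrors mapred.mesos.slot.mem


@dataclass
class Offer:
    """Hypothetical stand-in for the resources in one Mesos offer."""
    cpus: float
    mem_mb: int


def slots_that_fit(offer: Offer) -> int:
    """How many slots the offer can hold, limited by CPU or memory."""
    return min(int(offer.cpus // SLOT_CPUS), offer.mem_mb // SLOT_MEM_MB)


def fixed_policy(offer: Offer, map_slots: int = 2,
                 reduce_slots: int = 2) -> Optional[tuple[int, int]]:
    """Launch a TaskTracker with a fixed slot count, or decline (None)."""
    if slots_that_fit(offer) < map_slots + reduce_slots:
        return None
    return map_slots, reduce_slots


def variable_policy(offer: Offer, pending_maps: int,
                    pending_reduces: int) -> Optional[tuple[int, int]]:
    """Size the TaskTracker to the queue: as many slots as the offer allows
    (but no more than pending tasks), split by the queue's map/reduce mix."""
    pending = pending_maps + pending_reduces
    n = min(slots_that_fit(offer), pending)
    if n == 0:
        return None
    maps = n * pending_maps // pending
    return maps, n - maps


if __name__ == "__main__":
    offer = Offer(cpus=8.0, mem_mb=16384)
    print(fixed_policy(offer), variable_policy(offer, 30, 10))
```

With an 8-CPU, 16 GiB offer, CPU is the binding constraint (8 slots); the fixed policy always asks for 2+2, while the variable policy uses all 8 slots and leans toward maps because the queue is mostly map tasks.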
Hadoop on Mesos: How we did it
Suggested key configuration values

Name                                       Value
mapred.tasktracker.map.tasks.maximum       50
mapred.tasktracker.reduce.tasks.maximum    50
mapred.mesos.slot.map.minimum              1000
mapred.mesos.slot.reduce.minimum           1000
mapred.mesos.scheduler.policy.fixed        false
mapred.mesos.slot.cpus                     0.95
mapred.mesos.slot.mem                      1550
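Hadoop reads properties like these from XML site files such as `mapred-site.xml`, which use a `<configuration>` root containing `<property>` elements with `<name>` and `<value>` children. The sketch below renders the suggested values in that layout using only Python's standard library; which site file a given deployment of the Mesos scheduler reads them from is an assumption to check against your setup.

```python
import xml.etree.ElementTree as ET

# Suggested values from the table above, kept as strings (Hadoop parses text).
SUGGESTED = {
    "mapred.tasktracker.map.tasks.maximum": "50",
    "mapred.tasktracker.reduce.tasks.maximum": "50",
    "mapred.mesos.slot.map.minimum": "1000",
    "mapred.mesos.slot.reduce.minimum": "1000",
    "mapred.mesos.scheduler.policy.fixed": "false",
    "mapred.mesos.slot.cpus": "0.95",
    "mapred.mesos.slot.mem": "1550",
}


def to_hadoop_xml(props: dict[str, str]) -> str:
    """Render properties in Hadoop's <configuration>/<property> layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # Python 3.9+: pretty-print with two-space indentation
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_hadoop_xml(SUGGESTED))
```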
Case study: Airbnb
● Engineering & analytics departments use Hive, Pig, Cascading and other tools on Hadoop:
○ Building search indices
○ Pricing suggestion system
○ Trust & safety, fraud detection
○ Business analytics
● Dealing with hypergrowth
Case study: Airbnb, yesterday
● We had previously been using EMR, Amazon's managed Hadoop service
● EMR suffers from:
○ limited Hive/Pig features
○ feature lag
○ inability to patch or modify Hadoop
● Data infrastructure was prone to error due to significant complexity
○ EMR clusters would be spun up & destroyed every week
○ accessing Hadoop required strange SSH 'hopping'
Case study: Airbnb, today
● We now run Chronos, Hadoop, and Storm on Mesos
● Completed the migration from EMR to Mesos (June 2013)
● ~500 Chronos jobs
● ~20 TiB of daily Hive data, ~1-2 PiB of archived data
● Data availability: all-time high
● Eng. & analytics customer satisfaction through the roof
Action shots
Next steps
● Locality awareness
● HDFS on Mesos
● HA JobTracker
● JobTracker on Mesos
Links
● The code: https://github.com/airbnb/mesos
● Airbnb Engineering Blog: http://nerds.airbnb.com/
● My other stuff: https://github.com/brndnmtthws
brenden@diddyinc.com
brenden.matthews@airbnb.com
Thanks!
Questions?

Your enemies use GenAI too - staying ahead of fraud with Neo4jYour enemies use GenAI too - staying ahead of fraud with Neo4j
Your enemies use GenAI too - staying ahead of fraud with Neo4j
 
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentation
 
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
 
Overview of Hyperledger Foundation
Overview of Hyperledger FoundationOverview of Hyperledger Foundation
Overview of Hyperledger Foundation
 

Hadoop on Mesos

  • 1. Hadoop on Mesos with a short history of distributed computing
  • 2. Agenda
    1. Introduction (to me)
    2. A short history of distributed computing
    3. Hadoop on Mesos
    4. Case study - Airbnb
    5. Final thoughts
    6. Q&A
  • 3. About me - Brenden Matthews
    ● cyclist
    ● runner
    ● started computering before it was cool
    ● free software advocate & contributor (Conky)
    ● for a living, engineers software @ Airbnb
  • 4. About me - Brenden Matthews
    ● cyclist
    ● runner
    ● started computering before it was cool
    ● free software advocate & contributor (Conky)
    ● for a living, engineers software @ Airbnb
    I don't even like computers.
  • 5. Von Neumann Bottleneck
    ● Forever limited by memory and other I/O bandwidth
    ● To do more, you must scale beyond a single node
    ● Even with SMP systems, the same limitations apply
    A little history
  • 6. Early days of distributed computing
    ● Working around the Von Neumann Bottleneck: scaling up & out (Cray, SGI, IBM)
    ● 'Supercomputers' only practical for organizations with budget multipliers that start with a 'B'
  • 7. Who has time to build a datacentre?
    ● The Xen hypervisor is released in 2003, paving the way for an 'abstract datacentre' through virtualization
    ● Amazon launches EC2 in 2006, kicking off the 'cloud computing' craze
  • 8. DIY supercomputer; a novel approach
    ● Google's MapReduce paper (2004) formalized the concept of 'black-box' distributed computing
    ● Google's own infrastructure is built upon free software and commodity hardware
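The MapReduce programming model behind slide 8 is small: a user-supplied map function emits key/value pairs, the framework groups them by key (the "shuffle"), and a user-supplied reduce function folds each group. The sketch below is a single-process illustration of that model using word counting, the customary example. The function names are ours, not Hadoop's API; a real framework runs the map and reduce calls on many machines and moves the grouped data over the network.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every whitespace-separated word."""
    for word in document.split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine every value emitted for one key."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    # Each document could be mapped on a different machine; here everything runs in one process.
    pairs = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the cat sat", "the cat ran"]))
```

The user only writes `map_phase` and `reduce_phase`; partitioning, grouping, and failure handling are the framework's job, which is what makes the computation a 'black box' from the user's point of view.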
  • 9. DIY supercomputer; a novel approach
    ● Hadoop: a free implementation of Google's infrastructure; 'big computing' for all (2005)
      ○ Robust
      ○ High tolerance of system failure
  • 10. We're still left with many incomplete solutions
    ● EC2 doesn't solve some problems:
      ○ Virtualization delivers poor performance compared to 'bare metal'; you must compensate by adding more instances
      ○ Frequent instance failures (mystery reboots, etc.)
      ○ EC2 isn't 'application aware' (though some have tried)
    What else?
    ● Supercomputers aren't affordable
    ● Building a datacentre is not feasible for most
    ● Existing 'application in the cloud' systems are too restrictive
  • 11. How can we overcome these problems?
  • 12. The dream is alive.
  • 13. What's Mesos?
    Mesos is an operating system for your cluster that provides application-level distributed computing.
    Mesos helps bridge the gap between the hardware and your application (or 'framework', in Mesos terms).
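Mesos does this with two-level scheduling: the Mesos master offers each slave's free resources to a framework's scheduler, the framework decides which tasks (if any) to launch on them, and whatever it leaves unused can be offered to other frameworks. The toy simulation below illustrates only that offer/accept cycle. The `Offer` and `ToyFramework` classes and the fixed-order "master" loop are hypothetical stand-ins, not the Mesos scheduler API; a real master picks which framework receives an offer using an allocation policy (dominant resource fairness by default) rather than a fixed order.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Offer:
    """Free resources on one slave, as offered to a framework (hypothetical type)."""
    slave: str
    cpus: float
    mem_mb: int


@dataclass
class ToyFramework:
    """A framework scheduler with a queue of identical tasks (hypothetical type)."""
    name: str
    task_cpus: float
    task_mem_mb: int
    pending_tasks: int

    def resource_offer(self, offer: Offer) -> int:
        """Accept as much of the offer as pending tasks need; return tasks launched."""
        launched = 0
        cpus, mem = offer.cpus, offer.mem_mb
        while self.pending_tasks > 0 and cpus >= self.task_cpus and mem >= self.task_mem_mb:
            cpus -= self.task_cpus
            mem -= self.task_mem_mb
            self.pending_tasks -= 1
            launched += 1
        return launched


def offer_round(offers: List[Offer], frameworks: List[ToyFramework]) -> Dict[str, int]:
    """Toy master: offer each slave to frameworks in turn, re-offering what was declined."""
    launched = {fw.name: 0 for fw in frameworks}
    for offer in offers:
        for fw in frameworks:
            n = fw.resource_offer(offer)
            launched[fw.name] += n
            # Resources the framework did not use go back to the master for the next framework.
            offer = Offer(offer.slave, offer.cpus - n * fw.task_cpus, offer.mem_mb - n * fw.task_mem_mb)
    return launched
```

With two 4-CPU, 8 GiB slaves, a Hadoop-like framework needing five 1-CPU tasks and a Storm-like framework needing two 2-CPU tasks, Hadoop sees both offers first and Storm ends up with one task still queued. That kind of ordering bias is what the master's allocation policy is meant to address.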
  • 15. I enjoy doing things the hard way.
  • 16. I really enjoy doing things the hard way.
  • 17. Hadoop on Mesos: Why?
    ● Formalized, scalable distributed computing
    ● Extensive toolset (Hive, Pig, Cascading, Cascalog, ...)
    ● Familiar to many ('gold standard')
    ● Hadoop as a distributed application (a novel concept!)
    ● Multiple versions of Hadoop (upgrade path)
    ● Why stop at Hadoop? There's more to do with our cluster! (Chronos, Storm, Jenkins, Spark, ...)
    ...and who has time to manage it?
  • 18. Hadoop on Mesos: Goals
    ● Avoid complexity: rely on existing, vetted systems where possible
    ● Hadoop on Mesos should behave like any other Hadoop
    ● Realize high resource utilization
    ● Minimize contention & starvation
    ● Make Hadoop a first-class framework on Mesos
  • 19. Hadoop terminology
    ● JobTracker: manages cluster resources and assigns tasks to TaskTrackers
    ● TaskTracker: manages individual map/reduce tasks and serves intermediate data to other TaskTrackers
    ● Job: a collection of map and reduce tasks
    ● Task: one unit of work for a job (either map or reduce)
    ● Slot: a task executor; each slot is either map or reduce
    ● HDFS: distributed filesystem (outside scope)
  • 20. Hadoop on Mesos: Challenges
    ● Availability: the JobTracker must ensure adequate map and reduce slots are available for current & future jobs
    ● Capacity: how do you estimate capacity? How do you profile jobs?
    ● Optimization: general case, or specific cases? Per-job resource allocation policies? Separate JobTrackers for different job types?
  • 21. Hadoop on Mesos: Challenges
    ● Resource reservations: handling competing frameworks on the same cluster
      ○ Mesos reservations allow slave resources to be reserved for frameworks
      ○ Hadoop FairScheduler supports role fair sharing and task pre-emption within the JobTracker
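A common model for the kind of fair sharing the FairScheduler performs is max-min fairness: split capacity equally, give any pool that needs less than its equal share only what it needs, and redistribute the remainder among the others. The sketch below computes that allocation for a fixed number of slots. It is a simplified illustration (no weights, minimum shares, or pre-emption), not the FairScheduler's actual code, and the pool names in the usage note are hypothetical.

```python
from typing import Dict


def max_min_fair_share(capacity: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Max-min fair allocation of `capacity` slots among pools with the given demands."""
    alloc = {pool: 0.0 for pool in demands}
    unsatisfied = {pool: d for pool, d in demands.items() if d > 0}
    remaining = float(capacity)
    while unsatisfied and remaining > 0:
        equal_share = remaining / len(unsatisfied)
        # Pools that need no more than an equal share receive exactly their demand...
        fits = {pool: d for pool, d in unsatisfied.items() if d <= equal_share}
        if not fits:
            # ...otherwise every remaining pool is capped at the equal share.
            for pool in unsatisfied:
                alloc[pool] = equal_share
            break
        for pool, d in fits.items():
            alloc[pool] = d
            remaining -= d
            del unsatisfied[pool]
    return alloc
```

For example, 100 slots shared by hypothetical pools `etl` (demand 10), `adhoc` (60) and `search` (60) yields 10, 45 and 45: `etl` is fully satisfied and the leftover 90 slots are split evenly between the two larger pools.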
  • 22. Hadoop on Mesos: Challenges
    A contrived example, with capacity for 100 slots
    Job   Maps   Reduces   Duration   Start
    1     95     5         1h         0
    2     5      100       1m         1m
    3     10     10        30m        60m
    4     50     0         20m        70m
    5     100    5         1h         80m
    Job flow (maps / reduces)
    Ideal allocation   Actual Hadoop
    95 / 5             50 / 50
    48 / 52            50 / 50
    10 / 10            50 / 50
    60 / 10            50 / 50
    90 / 10            50 / 50
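The contrived example on slide 22 can be checked with a little arithmetic. The sketch below takes the job table, computes the outstanding map and reduce demand at each job's start time, and compares how many of the 100 slots could be busy under a static 50/50 map/reduce split versus a single pool that any task may use. It assumes every task of a running job wants a slot for the job's whole duration and that a job stops occupying slots at start + duration; real jobs are messier (reduces wait on maps, tasks finish at different times), so treat the numbers as illustrative.

```python
from typing import List, NamedTuple, Tuple

CAPACITY = 100  # total slots, as in the example


class Job(NamedTuple):
    maps: int
    reduces: int
    duration_min: int
    start_min: int


# The job table from the slide, with durations and start times in minutes.
JOBS = [
    Job(95, 5, 60, 0),
    Job(5, 100, 1, 1),
    Job(10, 10, 30, 60),
    Job(50, 0, 20, 70),
    Job(100, 5, 60, 80),
]


def demand_at(t: int, jobs: List[Job]) -> Tuple[int, int]:
    """Map and reduce tasks wanted by jobs running at minute t (end time exclusive)."""
    running = [j for j in jobs if j.start_min <= t < j.start_min + j.duration_min]
    return sum(j.maps for j in running), sum(j.reduces for j in running)


def busy_slots_static(maps: int, reduces: int) -> int:
    """Static split: half the slots can only run maps, half can only run reduces."""
    half = CAPACITY // 2
    return min(maps, half) + min(reduces, half)


def busy_slots_pooled(maps: int, reduces: int) -> int:
    """One shared pool: any slot can run either kind of task."""
    return min(maps + reduces, CAPACITY)


if __name__ == "__main__":
    for job in JOBS:
        m, r = demand_at(job.start_min, JOBS)
        print(job.start_min, (m, r), busy_slots_static(m, r), busy_slots_pooled(m, r))
```

Under these assumptions the static split keeps only 55 slots busy at minute 0: 45 map tasks wait while 45 reduce slots sit idle, which is the gap between the "ideal allocation" and "actual Hadoop" columns.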
  • 23. Hadoop on Mesos: What we did
    ● The Mesos scheduler is a thin layer atop the Hadoop scheduler
    ● The JobTracker launches TaskTrackers for each job, using either a fixed or a variable slot policy
      ○ Fixed policy launches a fixed number of slots per TaskTracker
      ○ Variable policy attempts to launch an ideal number of TaskTrackers and slots based on the job queue
    ● Task scheduling is left to the underlying scheduler (i.e., Hadoop FairScheduler)
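The difference between the fixed and variable slot policies on slide 23 can be sketched as two small functions. Everything here is hypothetical: the names `fixed_policy` and `variable_policy`, the `SlotRequest` type, and the rule of scaling the request down proportionally when an offer is too small. It shows the idea of sizing TaskTrackers from the job queue, not the Airbnb scheduler's actual logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotRequest:
    """Map and reduce slots to start on one new TaskTracker (hypothetical type)."""
    map_slots: int
    reduce_slots: int


def fixed_policy(map_slots: int, reduce_slots: int) -> SlotRequest:
    """Fixed policy: every TaskTracker gets the same slots, whatever the queue holds."""
    return SlotRequest(map_slots, reduce_slots)


def variable_policy(pending_maps: int, pending_reduces: int,
                    idle_map_slots: int, idle_reduce_slots: int,
                    slots_that_fit: int) -> SlotRequest:
    """Variable policy: request only the slots the job queue cannot already use."""
    need_maps = max(0, pending_maps - idle_map_slots)
    need_reduces = max(0, pending_reduces - idle_reduce_slots)
    total = need_maps + need_reduces
    if total <= slots_that_fit:
        return SlotRequest(need_maps, need_reduces)
    # The offer cannot hold everything: keep the backlog's map/reduce ratio.
    maps = round(slots_that_fit * need_maps / total)
    return SlotRequest(maps, slots_that_fit - maps)
```

With 95 pending maps, 5 pending reduces, no idle slots and room for 40 slots in an offer, the variable policy asks for 38 map and 2 reduce slots, while the fixed policy would start the same slot mix regardless of the queue.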
  • 24. Hadoop on Mesos: How we did it
    Suggested key configuration values
    Name                                       Value
    mapred.tasktracker.map.tasks.maximum       50
    mapred.tasktracker.reduce.tasks.maximum    50
    mapred.mesos.slot.map.minimum              1000
    mapred.mesos.slot.reduce.minimum           1000
    mapred.mesos.scheduler.policy.fixed        false
    mapred.mesos.slot.cpus                     0.95
    mapred.mesos.slot.mem                      1550
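With `mapred.mesos.slot.cpus` and `mapred.mesos.slot.mem` set as suggested, the number of slots a single resource offer can hold is bounded by whichever resource runs out first. The sketch below does that arithmetic. It assumes the memory value is in megabytes, ignores any resources the TaskTracker process itself needs, and uses made-up offer sizes; `slots_in_offer` is a hypothetical helper, not part of the Hadoop on Mesos code.

```python
import math

SLOT_CPUS = 0.95    # mapred.mesos.slot.cpus
SLOT_MEM_MB = 1550  # mapred.mesos.slot.mem (assumed to be MB)


def slots_in_offer(offer_cpus: float, offer_mem_mb: int,
                   slot_cpus: float = SLOT_CPUS, slot_mem_mb: int = SLOT_MEM_MB) -> int:
    """Slots that fit in one offer: the smaller of the CPU bound and the memory bound."""
    # A tiny epsilon keeps float rounding from dropping a slot when the offer
    # is an exact multiple of the per-slot CPU size.
    by_cpu = math.floor(offer_cpus / slot_cpus + 1e-9)
    by_mem = offer_mem_mb // slot_mem_mb
    return max(0, min(by_cpu, by_mem))
```

An 8-CPU slave with 15360 MB free therefore holds 8 slots: CPU is the binding constraint (8 / 0.95 ≈ 8.4), while memory alone would allow 9.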
  • 25. Case study: Airbnb
    ● Engineering & analytics departments use Hive, Pig, Cascading and other tools on Hadoop:
      ○ Building search indices
      ○ Pricing suggestion system
      ○ Trust & safety, fraud detection
      ○ Business analytics
    ● Dealing with hypergrowth
  • 26. Case study: Airbnb, yesterday
    ● We had previously been using EMR, Amazon's managed Hadoop service
    ● EMR suffers from:
      ○ limited Hive/Pig features
      ○ feature lag
      ○ inability to patch or modify Hadoop
    ● Data infrastructure was prone to error due to significant complexity
      ○ EMR clusters would be spun up & destroyed every week
      ○ accessing Hadoop required strange SSH 'hopping'
  • 27. Case study: Airbnb, today
    ● We now run Chronos, Hadoop, and Storm on Mesos
    ● Completed the migration from EMR to Mesos (June 2013)
    ● ~500 Chronos jobs
    ● ~20TiB of daily Hive data, ~1-2PiB of archived data
  • 28. Case study: Airbnb, today
    ● Data availability: all-time high
    ● Eng. & analytics customer satisfaction through the roof
  • 31. Next steps
    ● Locality awareness
    ● HDFS on Mesos
    ● HA JobTracker
    ● JobTracker on Mesos
  • 32. Links
    ● The code: https://github.com/airbnb/mesos
    ● Airbnb Engineering Blog: http://nerds.airbnb.com/
    ● My other stuff: https://github.com/brndnmtthws
    brenden@diddyinc.com
    brenden.matthews@airbnb.com