Data & Infrastructure
Brenden Matthews

Data & Infrastructure at Airbnb

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1exrtyR.

Brenden Matthews describes the infrastructure built at Airbnb using Mesos in order to support Hadoop and Storm. Filmed at qconsf.com.

Brenden Matthews is a software engineer at Airbnb on the data infrastructure team. He's the creator of Conky (a system monitor for X), an Apache committer, and a free software enthusiast & advocate.

Data & Infrastructure at Airbnb

  1. Data & Infrastructure
     Brenden Matthews
  2. Watch the video with slide synchronization on InfoQ.com!
     http://www.infoq.com/presentations/airbnb-data-infrastructure
     InfoQ.com: News & Community Site
     • 750,000 unique visitors/month
     • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese)
     • Post content from our QCon conferences
     • News 15-20 / week
     • Articles 3-4 / week
     • Presentations (videos) 12-15 / week
     • Interviews 2-3 / week
     • Books 1 / month
  3. Presented at QCon San Francisco (www.qconsf.com)
     Purpose of QCon
     - to empower software development by facilitating the spread of knowledge and innovation
     Strategy
     - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams
     - speakers and topics driving the evolution and innovation
     - connecting and catalyzing the influencers and innovators
     Highlights
     - attended by more than 12,000 delegates since 2007
     - held in 9 cities worldwide
  4. Alternative Titles
     ● Datacentres of the future
     ● Building HA infrastructure
     ● Building automated HA infrastructure
     ● Data & Infrastructure
  5. A Quick Survey
     ● Google Borg
     ● Google MapReduce
  6. A Quick Survey
     ● Google Borg
     ● Google MapReduce
     ● Apache Hadoop
  7. A Quick Survey
     ● Google Borg
     ● Google MapReduce
     ● Apache Hadoop
     ● Apache Mesos
  8. A Quick Survey
     ● Google Borg
     ● Google MapReduce
     ● Apache Hadoop
     ● Apache Mesos
       ○ Chronos
       ○ Marathon
       ○ Storm
       ○ Apache Aurora (incubator)
  9. Apache Mesos
     Distributed computing platform
     Or, a distributed operating system
  10. Apache Mesos
      ● Master/slave architecture
      ● One master is elected from among the masters
      ● Most of the state is contained in the slaves themselves
      ● Master doesn’t do much:
        ○ Manages resources
        ○ Acts as a go-between for slaves and frameworks
      [Diagram: 2 masters, 4 slaves, 3 ZooKeeper nodes]
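
The diagram on this slide shows three ZooKeeper nodes backing the master election. A minimal sketch of the usual majority-quorum arithmetic for such an ensemble follows; the class and method names are hypothetical and not part of Mesos or ZooKeeper.

```java
// Hypothetical helper: majority-quorum arithmetic for an ensemble of n nodes.
final class QuorumSketch {
    // Smallest number of nodes that forms a strict majority.
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // How many nodes can fail while a majority is still reachable.
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        int[] sizes = {1, 2, 3, 5};
        for (int n : sizes) {
            System.out.println(n + " nodes: quorum " + quorumSize(n)
                    + ", tolerates " + toleratedFailures(n) + " failure(s)");
        }
    }
}
```

Under this arithmetic an even-sized ensemble tolerates no more failures than the next smaller odd size, which is one reason odd sizes such as 3 are common.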
  11. Apache Mesos: Components
      ● libprocess
        ○ Components communicate using async messaging
        ○ Messages are immutable; internals easily parallelized
      ● Master
        ○ Offers slave resources to frameworks
        ○ Launches tasks on slaves for accepted offers
        ○ Forwards status messages between tasks and frameworks
        ○ Task reconciliation for frameworks
      ● Slave
        ○ Monitors individual tasks, reports status to master
        ○ Performs resource monitoring on tasks
        ○ Ensures tasks don’t exceed resource limits (cgroups)
      ● Framework (i.e., your application)
        ○ Receives resource offers from master
        ○ Launches tasks for acceptable offers
  12. Apache Mesos: Slave Detail
      ● Slaves are configured with a resource policy
      ● Slaves execute tasks, which are submitted by frameworks
      ● Task resource limits are enforced with cgroups
      ● Tasks that exceed memory limit will be killed (OOM’d)
      ● Resources:
        ○ CPU, mem, ports (‘standard’)
        ○ network, and user defined parameters
      ● Recovery: slaves can be restarted without killing tasks (cool!)

      Framework   CPU   Memory   Share
      Chronos       1        1      3%
      Storm         5        5     15%
      Marathon     16       30     50%
      *            32       60    100%
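
The Share column in the table appears consistent with each framework's largest fraction of any single resource, which is the idea behind dominant-resource fairness. The sketch below assumes that reading and takes the totals from the "*" row; the class and method names are hypothetical, not Mesos API.

```java
// Hypothetical helper, not part of the Mesos API: computes a framework's
// dominant share, i.e. the largest fraction it holds of any one resource.
final class DominantShareSketch {
    static double dominantShare(double cpus, double mem,
                                double totalCpus, double totalMem) {
        double cpuShare = cpus / totalCpus;
        double memShare = mem / totalMem;
        return Math.max(cpuShare, memShare);
    }

    public static void main(String[] args) {
        // Totals taken from the "*" row of the slide's table.
        double totalCpus = 32;
        double totalMem = 60;
        String[] names = {"Chronos", "Storm", "Marathon"};
        double[][] allocations = {{1, 1}, {5, 5}, {16, 30}};
        for (int i = 0; i < names.length; i++) {
            double share = dominantShare(allocations[i][0], allocations[i][1],
                                         totalCpus, totalMem);
            System.out.printf("%-8s %.1f%%%n", names[i], share * 100);
        }
    }
}
```

For the three rows this gives fractions of 1/32, 5/32 and 1/2, close to the 3%, 15% and 50% shown on the slide.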
  13. Apache Mesos: Framework Detail
      ● Frameworks are applications that run on Mesos
      ● The framework runs as a separate process, either on its own or as a Mesos task itself (more on this later)
      ● Frameworks must decide whether resource offers are sufficient before launching a task
      ● Once tasks are launched, frameworks must wait for status updates and monitor the state of tasks
      ● Task state can be reconciled with the Mesos master
      ● Framework state may be stored using the Mesos State API (a key-value store)
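
Reconciliation, as described on this slide, can be pictured as comparing the framework's last known task states with what the master reports. The sketch below is a simplified model and not the Mesos API: the State enum only mirrors a few Mesos task state names, and treating tasks the master does not know about as lost is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of task state reconciliation.
final class ReconcileSketch {
    enum State { TASK_STAGING, TASK_RUNNING, TASK_FINISHED, TASK_FAILED, TASK_LOST }

    // Returns the corrected state for every tracked task whose state differs
    // from the master's view. Assumption: a task the master does not know
    // about is treated as TASK_LOST.
    static Map<String, State> reconcile(Map<String, State> frameworkView,
                                        Map<String, State> masterView) {
        Map<String, State> corrections = new TreeMap<>();
        for (Map.Entry<String, State> entry : frameworkView.entrySet()) {
            State reported = masterView.get(entry.getKey());
            State actual = (reported == null) ? State.TASK_LOST : reported;
            if (actual != entry.getValue()) {
                corrections.put(entry.getKey(), actual);
            }
        }
        return corrections;
    }

    public static void main(String[] args) {
        Map<String, State> framework = new HashMap<>();
        framework.put("task-1", State.TASK_RUNNING);
        framework.put("task-2", State.TASK_RUNNING);
        framework.put("task-3", State.TASK_STAGING);

        Map<String, State> master = new HashMap<>();
        master.put("task-1", State.TASK_RUNNING);
        master.put("task-2", State.TASK_FINISHED);

        System.out.println(reconcile(framework, master));
    }
}
```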
  14. Apache Mesos: Framework Detail
      A sample resource offer

          ---
          id: 201310221926-2276627466-5050-24060-52872
          framework_id: 201310152336-200446986-5050-29272-0000
          slave_id: 201310182038-2276627466-5050-2945-0
          hostname: i-babc911a
          resources:
            ports:
              range:
                begin: 31002
                end: 32000
              role: *
            cpus:
              value: 16
              role: marathon
            mem:
              value: 30720
              role: marathon
          slave_load_hint: 0.53

      Type     Value           Role
      CPUs     16              Marathon
      Memory   30GiB           Marathon
      Ports    [31002,32000]   *
  15. Apache Mesos: Framework Detail
      Resource offer handling sample in JavaScala

          public void resourceOffers(SchedulerDriver schedulerDriver,
                                     List<Offer> offers) {
            for (offer <- offers) { // this is actually Scala
              final boolean sufficient = computeSlots();
              if (!sufficient) {
                schedulerDriver.declineOffer(offer.getId());
                continue;
              }
              schedulerDriver.launchTasks(offer.getId(),
                                          Arrays.asList(info));
            }

      continued…

          // Launch TaskTrackers to satisfy the slot requirements.
          // Pull out the cpus, memory, disk, and 2 ports from the offer.
          for (Resource resource : offer.getResourcesList()) {
            if (resource.getName().equals("cpus")
                && resource.getType() == Value.Type.SCALAR) {
              cpus = resource.getScalar().getValue();
              cpuRole = resource.getRole();
            } else if (resource.getName().equals("mem")
                && resource.getType() == Value.Type.SCALAR) {
              mem = resource.getScalar().getValue();
              memRole = resource.getRole();
            } else if (resource.getName().equals("disk")
                && resource.getType() == Value.Type.SCALAR) {
              //...
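
The accept-or-decline decision in the sample can be tried out without a cluster. The sketch below is a self-contained stand-in, not the Mesos API: Offer and TaskNeed are hypothetical simplified types, the offer values come from the sample resource offer shown earlier, and the per-task requirement (2 CPUs, 4096 MB, 2 ports) is an assumption for illustration.

```java
// Hypothetical, simplified stand-ins for a Mesos offer and a task's needs;
// the real Offer type carries a list of resources instead of fixed fields.
final class OfferSketch {
    static final class Offer {
        final double cpus;
        final double memMb;
        final int portBegin;
        final int portEnd;

        Offer(double cpus, double memMb, int portBegin, int portEnd) {
            this.cpus = cpus;
            this.memMb = memMb;
            this.portBegin = portBegin;
            this.portEnd = portEnd;
        }

        int portCount() {
            return portEnd - portBegin + 1; // the range is inclusive
        }
    }

    static final class TaskNeed {
        final double cpus;
        final double memMb;
        final int ports;

        TaskNeed(double cpus, double memMb, int ports) {
            this.cpus = cpus;
            this.memMb = memMb;
            this.ports = ports;
        }
    }

    // True when the offer can hold at least one task.
    static boolean isSufficient(Offer offer, TaskNeed need) {
        return offer.cpus >= need.cpus
                && offer.memMb >= need.memMb
                && offer.portCount() >= need.ports;
    }

    // Number of whole tasks that fit; assumes every need is positive.
    static int tasksThatFit(Offer offer, TaskNeed need) {
        int byCpu = (int) (offer.cpus / need.cpus);
        int byMem = (int) (offer.memMb / need.memMb);
        int byPorts = offer.portCount() / need.ports;
        return Math.min(byCpu, Math.min(byMem, byPorts));
    }

    public static void main(String[] args) {
        Offer offer = new Offer(16, 30720, 31002, 32000);
        TaskNeed need = new TaskNeed(2, 4096, 2); // assumed requirement
        if (isSufficient(offer, need)) {
            System.out.println("accept: room for " + tasksThatFit(offer, need) + " task(s)");
        } else {
            System.out.println("decline");
        }
    }
}
```

In a real scheduler the two branches would correspond to the launchTasks and declineOffer calls shown on the slide.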
  16. Apache Mesos: Framework Detail
      ● Writing frameworks is not for everyone! (it’s a bit tricky) :(
      ● Frameworks like Marathon and Apache Aurora make it possible to write applications atop Mesos without having to worry about Mesos
  17. Apache Mesos: Framework Detail
      ● Writing frameworks is not for everyone! (it’s a bit tricky)
      ● Frameworks like Marathon and Apache Aurora make it possible to write applications atop Mesos without having to worry about Mesos
      ● The Mesos framework ecosystem is alive and well!
      ● A quadfecta of frameworks cover most use cases:
        ○ Hadoop - batch processing
        ○ Storm - stream processing
        ○ Chronos - task scheduling
        ○ Marathon or Aurora - long running services
  18. Frameworks: Hadoop
      ● Hadoop on Mesos behaves like any other Hadoop (except, perhaps, YARN)
      ● Code lives at https://github.com/mesos/hadoop
  19. Frameworks: Storm
      ● Storm is a distributed stream processing framework
      ● ‘doing for realtime processing what Hadoop did for batch processing’ (Nathan Marz)
      ● Storm runs on Mesos at Twitter, but does not ship with a Mesos scheduler
      ● Code lives at https://github.com/brndnmtthws/storm
  20. Frameworks: Chronos
      ● Chronos is a task scheduler that runs on Mesos
      ● Could be thought of as ‘distributed cron on Mesos’
      ● Code lives at https://github.com/airbnb/chronos
  21. Frameworks: Apache Aurora
      ● Aurora is a service framework developed at Twitter - a significant portion of Twitter’s infrastructure runs atop Aurora
      ● Aurora was announced as an Apache Incubator project on Oct 1st, 2013
      ● Code lives at https://github.com/twitter/aurora
  22. Frameworks: Marathon
      ● Marathon is a framework for running services on Mesos, similar to Aurora
      ● Marathon can be thought of as a meta framework (more on this later)
      ● Project was created by many of the folks behind Chronos
      ● Code lives at https://github.com/mesosphere/marathon
  23. Marathon
  24. Marathon as a Meta-Framework
      ● Marathon is designed to run tasks and guarantee they stay running
      ● Why not run Marathon on top of itself in addition to other frameworks?
      ● Frameworks like Hadoop and Chronos can be run atop Marathon today
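
The "guarantee they stay running" idea can be sketched as a small bookkeeping loop: track how many instances are alive and relaunch the difference whenever one terminates. This is an illustration only and not Marathon's implementation; the class and method names are hypothetical.

```java
// Hypothetical sketch of keeping a fixed number of task instances alive.
final class KeepRunningSketch {
    private final int desiredInstances;
    private int liveInstances = 0; // tasks that are staging or running

    KeepRunningSketch(int desiredInstances) {
        this.desiredInstances = desiredInstances;
    }

    // Called when a task has been launched on some slave.
    void onTaskLaunched() {
        liveInstances++;
    }

    // Called on a terminal status update (finished, failed, killed, lost).
    void onTaskTerminated() {
        if (liveInstances > 0) {
            liveInstances--;
        }
    }

    // How many new tasks should be launched from upcoming offers.
    int tasksToLaunch() {
        return Math.max(0, desiredInstances - liveInstances);
    }

    public static void main(String[] args) {
        KeepRunningSketch service = new KeepRunningSketch(3);
        for (int i = 0; i < 3; i++) {
            service.onTaskLaunched();
        }
        service.onTaskTerminated(); // one instance dies
        System.out.println("tasks to relaunch: " + service.tasksToLaunch());
    }
}
```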
  25. Let’s talk about what this means
  26. High Availability
      ● Slaves execute tasks, and the slaves themselves are independent of each other
      ● You may run frameworks as tasks on slaves
      ● A high availability cluster might consist of having 1 or more Mesos masters, in addition to frameworks, running as Mesos tasks
  27. High Availability
      Typical Mesos cluster
      ● 2 masters, 1 elected
      ● 2 instances of framework A, 1 elected
      [Diagram: 2 masters; 3 slaves, each running three tasks (T); 2 instances of Framework A]
  28. High Availability
      HA Mesos cluster w/ Marathon
      ● 3 masters, 1 elected
      ● 3 instances of framework A, 1 elected
      [Diagram: 2 masters; 3 slaves, each running three tasks (T); 2 instances of Framework A]
  29. High Availability
      HA Mesos cluster w/ Marathon
      ● 3 masters, 1 elected
      ● 3 instances of framework A, 1 elected
      [Diagram: 3 masters; 3 slaves, each running three tasks (T); 3 instances of Framework A]
  30. High Availability
      ● Split cluster across datacentres
        ○ us-east-1a
        ○ us-east-1b
        ○ us-east-1e
      ● Replication factor of 3 with rack awareness reduces sleepless nights
  31. Automated Infrastructure
      ● Every machine is exactly the same! (except masters)
      ● Maintenance becomes as simple as starting and stopping slaves
      ● Application experts have greater control over deployment, without the need for worrying about resources
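
A replication factor of 3 spread over the three zones listed on this slide means losing any one zone still leaves two copies. The sketch below illustrates that with a simple round-robin placement; it is an assumption for illustration, not how any particular storage system places replicas, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of zone-aware replica placement.
final class ZonePlacementSketch {
    // Assigns replicas to zones round-robin, so replicas land in distinct
    // zones whenever replicationFactor <= zones.size().
    static List<String> placeReplicas(List<String> zones, int replicationFactor) {
        List<String> placement = new ArrayList<>();
        for (int i = 0; i < replicationFactor; i++) {
            placement.add(zones.get(i % zones.size()));
        }
        return placement;
    }

    // Counts the replicas that remain if one zone becomes unavailable.
    static int survivingReplicas(List<String> placement, String failedZone) {
        int surviving = 0;
        for (String zone : placement) {
            if (!zone.equals(failedZone)) {
                surviving++;
            }
        }
        return surviving;
    }

    public static void main(String[] args) {
        List<String> zones = Arrays.asList("us-east-1a", "us-east-1b", "us-east-1e");
        List<String> placement = placeReplicas(zones, 3);
        System.out.println("placement: " + placement);
        System.out.println("replicas left if us-east-1b fails: "
                + survivingReplicas(placement, "us-east-1b"));
    }
}
```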
  32. Seeing is believing
  33. Airpad
      ● A small Ruby library for deploying applications (i.e., services) on Mesos with Marathon
      ● Depends upon SmartStack, Airbnb’s service discovery tool
  34. Airpad
      ● Things we run (experimentally) with Airpad
        ○ Kafka
        ○ Cassandra
        ○ Presto
        ○ Chronos
        ○ Marathon
        ○ Hadoop JobTracker
        ○ Other internal tools
  35. Airpad Demonstration
  36. Other Lessons I’ve Learned
      ● Figure out how to manage state early on
        ○ Depend upon replicated services (Cassandra, Kafka, HDFS)
        ○ Use replicated storage (S3, HDFS)
        ○ Create backups and restore processes
      ● Better to over-provision than under-provision
        ○ It’s easier to scale up than scale down
