Data & Infrastructure
Brenden Matthews
Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/airbnb-data-infrastructure

Presented at QCon San Francisco
www.qconsf.com
Alternative Titles

● Datacentres of the future
● Building HA infrastructure
● Building automated HA infrastructure
● Data & Infrastructure
A Quick Survey
● Google Borg
● Google MapReduce
● Apache Hadoop
● Apache Mesos
○ Chronos
○ Marathon
○ Storm
○ Apache Aurora (incubator)
Apache Mesos
Distributed computing platform
Or, a distributed operating system
Apache Mesos
● Master/slave architecture
● One master is elected from among the masters
● Most of the state is contained in the slaves themselves
● Master doesn’t do much:
○ Manages resources
○ Acts as a go-between for slaves and frameworks

[Diagram: 2 masters, 4 slaves, 3 ZooKeeper nodes]
Apache Mesos: Components
● libprocess
○ Components communicate using async messaging
○ Messages are immutable; internals easily parallelized
● Master
○ Offers slave resources to frameworks
○ Launches tasks on slaves for accepted offers
○ Forwards status messages between tasks and frameworks
○ Task reconciliation for frameworks
● Slave
○ Monitors individual tasks, reports status to master
○ Performs resource monitoring on tasks
○ Ensures tasks don’t exceed resource limits (cgroups)
● Framework (i.e., your application)
○ Receives resource offers from master
○ Launches tasks for acceptable offers
Apache Mesos: Slave Detail
● Slaves are configured with a resource policy
● Slaves execute tasks, which are submitted by frameworks
● Task resource limits are enforced with cgroups
● Tasks that exceed their memory limit are killed (OOM’d)
● Resources:
○ CPU, mem, ports (‘standard’)
○ network and user-defined parameters
● Recovery: slaves can be restarted without killing tasks (cool!)

Framework   CPU   Memory   Share
Chronos       1        1      3%
Storm         5        5     15%
Marathon     16       30     50%
*            32       60    100%
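
The Share column in the table above looks consistent with each framework's dominant share, that is, the larger of its CPU fraction and its memory fraction of the cluster totals in the `*` row. The slides do not say how Share is computed, so treat that reading as an assumption. A minimal Java sketch under that assumption (the class and method names are illustrative, not part of Mesos):

```java
// Illustrative only: recomputes the Share column from the CPU and Memory columns.
// Assumption: Share = max(cpu / totalCpu, mem / totalMem); the slides do not state the formula.
final class DominantShare {

    // Returns the dominant share as a fraction between 0 and 1.
    static double dominantShare(double cpus, double mem, double totalCpus, double totalMem) {
        return Math.max(cpus / totalCpus, mem / totalMem);
    }

    public static void main(String[] args) {
        double totalCpus = 32.0; // the "*" row of the table
        double totalMem = 60.0;
        String[] names = {"Chronos", "Storm", "Marathon"};
        double[][] allocations = {{1, 1}, {5, 5}, {16, 30}}; // {CPU, Memory} per framework
        for (int i = 0; i < names.length; i++) {
            double share = dominantShare(allocations[i][0], allocations[i][1], totalCpus, totalMem);
            System.out.printf("%-9s %.1f%%%n", names[i], 100 * share);
        }
    }
}
```

For these rows the CPU fraction is the larger (or equal) one in every case, so the printed values should land close to the slide's percentages, which appear to be rounded.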
Apache Mesos: Framework Detail
● Frameworks are applications that run on Mesos
● The framework runs as a separate process, either on its own or as a Mesos task itself (more on this later)
● Frameworks must decide whether resource offers are sufficient before launching a task
● Once tasks are launched, frameworks must wait for status updates and monitor the state of tasks
● Task state can be reconciled with the Mesos master
● Framework state may be stored using the Mesos State API (a key-value store)
Apache Mesos: Framework Detail
A sample resource offer
---
id: 201310221926-2276627466-5050-24060-52872
framework_id: 201310152336-200446986-5050-29272-0000
slave_id: 201310182038-2276627466-5050-2945-0
hostname: i-babc911a
resources:
  ports:
    range:
      begin: 31002
      end: 32000
    role: *
  cpus:
    value: 16
    role: marathon
  mem:
    value: 30720
    role: marathon
slave_load_hint: 0.53

Type     Value           Role
CPUs     16              Marathon
Memory   30GiB           Marathon
Ports    [31002,32000]   *
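
To illustrate how a framework might decide whether an offer like the one above is sufficient, here is a minimal Java sketch. `SimpleOffer` is a hypothetical, simplified stand-in for the real Mesos offer type, and the task requirements passed to `isSufficient` are made-up numbers; only the offered values (16 CPUs, 30720 MB, ports 31002 to 32000) come from the slide.

```java
// Hypothetical, simplified model of a resource offer; not the Mesos protobuf classes.
final class OfferCheck {

    static final class SimpleOffer {
        final double cpus;
        final double memMb;
        final int portBegin;
        final int portEnd;

        SimpleOffer(double cpus, double memMb, int portBegin, int portEnd) {
            this.cpus = cpus;
            this.memMb = memMb;
            this.portBegin = portBegin;
            this.portEnd = portEnd;
        }

        // Number of ports in the inclusive range [portBegin, portEnd].
        int portCount() {
            return portEnd - portBegin + 1;
        }
    }

    // True when the offer covers every requirement of the task we want to launch.
    static boolean isSufficient(SimpleOffer offer, double needCpus, double needMemMb, int needPorts) {
        return offer.cpus >= needCpus
                && offer.memMb >= needMemMb
                && offer.portCount() >= needPorts;
    }

    public static void main(String[] args) {
        SimpleOffer sample = new SimpleOffer(16, 30720, 31002, 32000); // values from the sample offer
        System.out.println(isSufficient(sample, 1, 2048, 2));  // a small task (assumed requirements)
        System.out.println(isSufficient(sample, 32, 2048, 2)); // asks for more CPUs than offered
    }
}
```

A real scheduler would also take roles into account, since the sample offer reserves CPU and memory for the `marathon` role while the ports belong to `*`; this sketch ignores roles for brevity.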
Apache Mesos: Framework Detail
Resource offer handling sample in JavaScala
public void resourceOffers(SchedulerDriver schedulerDriver,
                           List<Offer> offers) {
  final boolean sufficient = computeSlots();

  // Launch TaskTrackers to satisfy the slot requirements.
  // (The slide wrote this loop in Scala: for (offer <- offers).)
  for (Offer offer : offers) {
    if (!sufficient) {
      schedulerDriver.declineOffer(offer.getId());
      continue;
    }

    // Declarations are not shown on the slide; the initial values here are assumed.
    double cpus = -1.0;
    double mem = -1.0;
    String cpuRole = "*";
    String memRole = "*";

    // Pull out the cpus, memory, disk, and 2 ports from the offer.
    for (Resource resource : offer.getResourcesList()) {
      if (resource.getName().equals("cpus")
          && resource.getType() == Value.Type.SCALAR) {
        cpus = resource.getScalar().getValue();
        cpuRole = resource.getRole();
      } else if (resource.getName().equals("mem")
          && resource.getType() == Value.Type.SCALAR) {
        mem = resource.getScalar().getValue();
        memRole = resource.getRole();
      } else if (resource.getName().equals("disk")
          && resource.getType() == Value.Type.SCALAR) {
        //...
      }
    }

    // Building the TaskInfo `info` is elided on the slide.
    schedulerDriver.launchTasks(offer.getId(),
        Arrays.asList(info));
  }
}
Apache Mesos: Framework Detail
● Writing frameworks is not for everyone! (it’s a bit tricky)
● Frameworks like Marathon and Apache Aurora make it possible to write applications atop Mesos without having to worry about Mesos
● The Mesos framework ecosystem is alive and well!
● A quadfecta of frameworks covers most use cases:
○ Hadoop - batch processing
○ Storm - stream processing
○ Chronos - task scheduling
○ Marathon or Aurora - long-running services
Frameworks: Hadoop
● Hadoop on Mesos behaves like any other
Hadoop (except, perhaps, YARN)
● Code lives at https://github.com/mesos/hadoop
Frameworks: Storm
● Storm is a distributed stream processing
framework
● ‘doing for realtime processing what Hadoop
did for batch processing’ — Nathan Marz
● Storm runs on Mesos at Twitter, but does
not ship with a Mesos scheduler
● Code lives at https://github.com/brndnmtthws/storm
Frameworks: Chronos
● Chronos is a task scheduler that runs on
Mesos
● Could be thought of as ‘distributed cron on
Mesos’
● Code lives at https://github.com/airbnb/chronos
Frameworks: Apache Aurora
● Aurora is a service framework developed at
Twitter - a significant portion of Twitter’s
infrastructure runs atop Aurora
● Aurora was announced as an Apache
Incubator project on Oct 1st, 2013
● Code lives at https://github.com/twitter/aurora
Frameworks: Marathon
● Marathon is a framework for running
services on Mesos, similar to Aurora
● Marathon can be thought of as a
meta-framework (more on this later)
● Project was created by many of the folks
behind Chronos
● Code lives at https://github.com/mesosphere/marathon
Marathon
Marathon as a Meta-Framework
● Marathon is designed to run tasks and
guarantee they stay running
● Why not run Marathon on top of itself in
addition to other frameworks?
● Frameworks like Hadoop and Chronos can
be run atop Marathon today
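
The "guarantee they stay running" idea can be pictured as a reconciliation step: compare the desired instance count with the tasks that are still alive and launch the difference. The Java sketch below is a simplified illustration of that idea, not Marathon's actual implementation; the `State` enum and the `instancesToLaunch` helper are hypothetical names introduced here.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative reconciliation step; not taken from Marathon's source.
final class KeepRunning {

    // Simplified task states (hypothetical; real Mesos has a richer set).
    enum State { STAGING, RUNNING, FINISHED, FAILED, KILLED, LOST }

    // A task counts as alive while it is staging or running.
    static boolean isAlive(State state) {
        return state == State.STAGING || state == State.RUNNING;
    }

    // Returns how many new instances are needed to get back to the desired count.
    static int instancesToLaunch(int desired, List<State> observed) {
        int alive = 0;
        for (State state : observed) {
            if (isAlive(state)) {
                alive++;
            }
        }
        return Math.max(0, desired - alive);
    }

    public static void main(String[] args) {
        // Three instances wanted, one of them has failed.
        List<State> observed = Arrays.asList(State.RUNNING, State.FAILED, State.RUNNING);
        System.out.println("Instances to launch: " + instancesToLaunch(3, observed));
    }
}
```

Because a framework such as Chronos, or another Marathon instance, is just another long-running task from this point of view, the same loop would restart it if it died, which is what makes the meta-framework arrangement attractive.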
Let’s talk about what this means
High Availability
● Slaves execute tasks, and the slaves
themselves are independent of each other
● You may run frameworks as tasks on slaves
● A high-availability cluster might consist of
1 or more Mesos masters, in addition
to frameworks running as Mesos tasks
High Availability
Typical Mesos cluster
● 2 masters, 1 elected
● 2 instances of framework A, 1 elected

[Diagram: 2 masters; 3 slaves, each running tasks (T); 2 instances of Framework A]
High Availability
HA Mesos cluster w/ Marathon
● 3 masters, 1 elected
● 3 instances of framework A, 1 elected

[Diagram: 3 masters; 3 slaves, each running tasks (T); 3 instances of Framework A]
High Availability
● Split cluster across datacentres
○ us-east-1a
○ us-east-1b
○ us-east-1e

● Replication factor of 3 with rack awareness
reduces sleepless nights
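
As a rough illustration of why three replicas spread over three zones help, the Java sketch below assigns replicas to zones round-robin and counts how many distinct zones end up holding a copy. The placement policy and the names `placeReplicas` and `distinctZones` are assumptions made for this example; the slides do not describe how Airbnb's services actually place replicas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Illustrative round-robin placement; not the policy of any specific system.
final class ZoneSpread {

    // Assigns each replica to the next zone in turn, wrapping around if needed.
    static List<String> placeReplicas(List<String> zones, int replicationFactor) {
        List<String> placement = new ArrayList<>();
        for (int i = 0; i < replicationFactor; i++) {
            placement.add(zones.get(i % zones.size()));
        }
        return placement;
    }

    // Number of different zones that hold at least one replica.
    static int distinctZones(List<String> placement) {
        return new HashSet<>(placement).size();
    }

    public static void main(String[] args) {
        List<String> zones = Arrays.asList("us-east-1a", "us-east-1b", "us-east-1e");
        List<String> placement = placeReplicas(zones, 3);
        System.out.println(placement + " spans " + distinctZones(placement) + " zones");
    }
}
```

With one replica per zone, losing any single zone still leaves two copies available.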
Automated Infrastructure
● Every machine is exactly the same! (except
masters)
● Maintenance becomes as simple as
starting/stopping slaves
● Application experts have greater control over
deployment, without needing to worry
about resources
Seeing is believing
Airpad
● A small Ruby library for deploying
applications (i.e., services) on Mesos with
Marathon
● Depends upon SmartStack, Airbnb’s service
discovery tool
Airpad
● Things we run (experimentally) with Airpad
○ Kafka
○ Cassandra
○ Presto
○ Chronos
○ Marathon
○ Hadoop JobTracker
○ Other internal tools
Airpad Demonstration
Other Lessons I’ve Learned
● Figure out how to manage state early on
○ Depend upon replicated services (Cassandra, Kafka,
HDFS)
○ Use replicated storage (S3, HDFS)
○ Create backups and restore processes

● Better to over-provision than under-provision
○ It’s easier to scale up than scale down
