Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Tupperware:
Containerized Deployment at FB
Aravind Narayanan
aravindn@fb.com
DockerCon 2014
Scale makes everything harder
• Running single instance: easy
• Running at scale in production: messy and
complicated
Prov...
Time spent on
Application Logic
Time spent on getting app
to run in prod
Tupperware to the rescue
“This is my binary. Run it on X machines!”
• Engineer is hands-off
• Doesn’t need to worry about ...
Agenda
1. Architecture
2. Sandboxes
3. Ecosystem
4. Lessons learnt
Facebook Datacenters
PRN
SNC FRC
ASH
LLA
Terminology
• A DC has one or more clusters
• A cluster has multiple racks
• A rack has multiple machines
!
• A TW job is ...
Architecture
Scheduler
Host1 Host2 Host3
config.tw
twdeploy
Server DB
Start Task
BitTorrent-based binary
package store
Quot...
Machine “failures” / hour
Failover
Failover
Scheduler
Host1 Host2 Host3
Server DB
Start Task
QuoteService QuoteService
Heartbeat
BitTorrent-based binary
pack...
Painless Hardware maintenance
• Notify scheduler of impending operations
• Scheduler can preemptively move tasks
• Gracefu...
Expressive allocation policies
Production Host
MyBigJob
QuoteService
QuoteAggregator
Production Host
Top of Rack Switch
Ne...
TW Agent
Agent Helper
TW Agent process
Package Manager
Resource Manager
Task Manager API
Scheduler heartbeat
Agent Helper ...
Agent Helper process
Heartbeat with Agent, 	

prevents zombies
TW Agent process
Package Manager
Resource Manager
Task Mana...
Logging
Compress on the fly
Log Files
• Persistent logs
• Instantaneous rotation
Task A
Agent Helper
Log Catcher
stderr std...
Sandboxing
Initially, used chroots to contain processes
• No isolation
• Not secure
LinuXContainers
• As tech matured, we switched
• Separate process and file
namespaces, set up by Helper
• Mount required re...
Service permissions
• Every container runs sshd
• SSH directly into the container
• Regulate access
Production Host
TW Age...
Configuring the container
Resource limits
• CPU, RAM & disk limits
• Implemented with cgroups
• Agent handles memory limits
with cgroup notification ...
Resource Limits in action
watchdog-service - tw.mem.rss_bytes
Migrate from Chroots to
Containers
• No-op for most services
• But new namespaces posed problems
for some
• Major hurdle w...
Service Discovery
Scheduler
Host1 Host2 Host3
Server DB
Service Registry
QuoteService
QuoteService
Host1:12345 ALIVE
Host2...
Monitoring & Alerting
Alternatives to Tupperware
• Why not use Docker / CoreOS?
• They didn’t exist
• TW integrates with other FB systems
• Why ...
Lessons learnt
Releases are scary!
• Release often
• Dry runs
• Canaries are your friends
• Manage dependencies
Sane defaults
• Users shouldn’t have to read entire manual
• Choose what makes sense for most services
What went wrong?
• Hard to understand why TW did
something
• It’s not about “what went wrong”,
but “what should I do next?”
Tupperware
• Automated deployment
• Less work for engineers
• Containers for security
and isolation
• Increased efficiency
...
Upcoming SlideShare
Loading in …5
×

Tupperware: Containerized Deployment at FB

23,402 views

Published on

Tupperware: Containerized Deployment at FB

  1. 1. Tupperware: Containerized Deployment at FB Aravind Narayanan aravindn@fb.com DockerCon 2014
  2. 2. Scale makes everything harder • Running single instance: easy • Running at scale in production: messy and complicated Provision machines Distribute binaries Daemonize processMonitoring Failover Geo-distribution Machine decoms
  3. 3. Time spent on Application Logic Time spent on getting app to run in prod
  4. 4. Tupperware to the rescue “This is my binary. Run it on X machines!” • Engineer is hands-off • Doesn’t need to worry about machines in prod • Handles failover, when machines go bad • Efficient use of infrastructure • 300,000+ processes, spread over 15,000+ services
  5. 5. Agenda 1. Architecture 2. Sandboxes 3. Ecosystem 4. Lessons learnt
  6. 6. Facebook Datacenters PRN SNC FRC ASH LLA
  7. 7. Terminology • A DC has one or more clusters • A cluster has multiple racks • A rack has multiple machines ! • A TW job is equivalent to a service • A job has multiple tasks, each an instance of the service
  8. 8. Architecture Scheduler Host1 Host2 Host3 config.tw twdeploy Server DB Start Task BitTorrent-based binary package store QuoteService
  9. 9. Machine “failures” / hour Failover
  10. 10. Failover Scheduler Host1 Host2 Host3 Server DB Start Task QuoteService QuoteService Heartbeat BitTorrent-based binary package store
  11. 11. Painless Hardware maintenance • Notify scheduler of impending operations • Scheduler can preemptively move tasks • Graceful migration for stateless services • Stateful services may endure maintenance
  12. 12. Expressive allocation policies Production Host MyBigJob QuoteService QuoteAggregator Production Host Top of Rack Switch NetworkHogJob NetworkHogJob Job M Job N Job M Job N
  13. 13. TW Agent Agent Helper TW Agent process Package Manager Resource Manager Task Manager API Scheduler heartbeat Agent Helper Task B Agent Helper Task C Task A Production Host
  14. 14. Agent Helper process Heartbeat with Agent, prevents zombies TW Agent process Package Manager Resource Manager Task Manager API Scheduler heartbeat Agent Helper Task A
  15. 15. Logging Compress on the fly Log Files • Persistent logs • Instantaneous rotation Task A Agent Helper Log Catcher stderr stdout
  16. 16. Sandboxing Initially, used chroots to contain processes • No isolation • Not secure
  17. 17. LinuXContainers • As tech matured, we switched • Separate process and file namespaces, set up by Helper • Mount required resources directly into container • Secure & isolated
  18. 18. Service permissions • Every container runs sshd • SSH directly into the container • Regulate access Production Host TW Agent process Task Manager API Agent Helper Task A SSH X
  19. 19. Configuring the container
  20. 20. Resource limits • CPU, RAM & disk limits • Implemented with cgroups • Agent handles memory limits with cgroup notification API • Adaptive limits
  21. 21. Resource Limits in action watchdog-service - tw.mem.rss_bytes
  22. 22. Migrate from Chroots to Containers • No-op for most services • But new namespaces posed problems for some • Major hurdle was social, not technical
  23. 23. Service Discovery Scheduler Host1 Host2 Host3 Server DB Service Registry QuoteService QuoteService Host1:12345 ALIVE Host2:12345 ALIVE QuoteService Host2:12345 ALIVE Host1:12345 DEAD Start Task QuoteService QuoteService Host2:12345 ALIVE Host3:12345 ALIVE QuoteService Start Task
  24. 24. Monitoring & Alerting
  25. 25. Alternatives to Tupperware • Why not use Docker / CoreOS? • They didn’t exist • TW integrates with other FB systems • Why not use VMs? • Performance penalty • Hypervisor makes debugging harder
  26. 26. Lessons learnt
  27. 27. Releases are scary! • Release often • Dry runs • Canaries are your friends • Manage dependencies
  28. 28. Sane defaults • Users shouldn’t have to read entire manual • Choose what makes sense for most services
  29. 29. What went wrong? • Hard to understand why TW did something • It’s not about “what went wrong”, but “what should I do next?”
  30. 30. Tupperware • Automated deployment • Less work for engineers • Containers for security and isolation • Increased efficiency Questions? Time spent on getting app to run in prod Time spent on Application Logic

×