Experiences from production 
Deployment, performance, failure 
David Mytton 
All Your Base - Oct 2014 
blog.serverdensity.com
David Mytton
serverdensity.com/allyourbase
Slides: twitter.com/davidmytton
Agenda 
● Architecture 
● Performance 
● Where to host? 
● Downtime 
● Preparation
Server Density Architecture 
● ~100 servers - Ubuntu 12.04 
● 50:50 virtual/dedicated 
● 200TB/m processed data 
● Nginx, Python, MongoDB 
● SoftLayer: over 1TB RAM, 5TB SSDs
Two choices for deployment 
● Virtualized 
● Bare metal
Advantages of virtualization 
● Easy to manage 
● Fast boot 
● Easier to resize/migrate 
● Templating/snapshots 
● Containment
Disadvantages of virtualization 
● Another layer 
● Hypervisor overhead 
● Host contention 
● I/O performance
Advantages of bare metal 
● Dedicated resources 
● Direct access to hardware 
● Customisable specs 
● Performance
Disadvantages of bare metal 
● Build/deploy time 
● More difficult to resize 
● Difficult to migrate/snapshot 
● Capex/lifetime
Performance problems? 
Easy answer: move to bare metal!
Key performance factors
● Network
● EC2: Cluster compute, high memory, high I/O, high storage
● GCE: Higher CPU instances
Key performance factors
● Network
Location          Ping RTT latency
Within USA        40-80ms
Trans-Atlantic    100ms
Trans-Pacific     150ms
Europe-Japan      300ms
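To make the latency figures above concrete, here is a minimal sketch of how sequential round trips add up on each route. The function name and the example of 4 round trips per request are assumptions for illustration, not figures from the talk.

```python
# Approximate ping RTTs from the table above, in milliseconds.
# "Within USA" is a 40-80ms range, so both ends are kept as (low, high).
RTT_MS = {
    "Within USA": (40, 80),
    "Trans-Atlantic": (100, 100),
    "Trans-Pacific": (150, 150),
    "Europe-Japan": (300, 300),
}


def network_wait_ms(route, round_trips):
    """Return (low, high) milliseconds spent waiting on the network
    when a request needs `round_trips` sequential round trips."""
    low, high = RTT_MS[route]
    return low * round_trips, high * round_trips


# Hypothetical request that needs 4 sequential round trips.
for route in RTT_MS:
    low, high = network_wait_ms(route, 4)
    print(f"{route}: {low}-{high}ms")
```

The point of the sketch is that chatty protocols multiply the base RTT, so the hosting location matters more as the number of sequential round trips grows.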
Networking performance: AWS vs GCE
bit.ly/googlevsamazon
Key performance factors 
● Memory
http://blog.pythonisito.com/2011/12/mongodbs-write-lock.html
Key performance factors 
● Memory is expensive
Key performance factors 
● Disk 
● SSDs! 
GCE: 256GB = $83.20/m 
EC2: 256GB = $35.32/m 
SL: 200GB = $81/m
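The three SSD prices above are quoted for different volume sizes, so a rough way to compare them is price per GB per month. This is a small sketch using only the figures on the slide. The helper name is hypothetical, and these are the 2014 prices as listed, not current ones.

```python
# (provider, size in GB, monthly price in USD), as listed above.
SSD_PRICES = [
    ("GCE", 256, 83.20),
    ("EC2", 256, 35.32),
    ("SL", 200, 81.00),
]


def price_per_gb(size_gb, monthly_usd):
    """Normalise a monthly volume price to USD per GB per month."""
    return monthly_usd / size_gb


for provider, size_gb, monthly_usd in SSD_PRICES:
    per_gb = price_per_gb(size_gb, monthly_usd)
    print(f"{provider}: ${per_gb:.3f}/GB/month")
```

Normalising this way shows that the SL volume, although slightly cheaper in absolute terms than GCE, costs more per GB because it is smaller.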
Why cloud? 
● Flexible 
● Unlimited resources 
● Cheap to get started 
● Other products
Why colo? 
● Vastly cheaper 
● Complete control
Let’s talk about downtime
2013 Spend: ~$5bn
2013 Spend: ~$6bn
2013 Spend: ~$4bn
How much do you spend? 
You will have downtime
Preparation
Preparation - On Call
● Rotations
● Off call
● Reachability - Train, 3G/4G (EDGE?!), Do Not Disturb mode, system updates
● Work the next day?
Preparation - Documentation 
● Searchable 
● Easy to edit 
● Independent of your infrastructure 
● Up to date
Unexpected failures 
● Communication systems 
● Network connectivity 
● Access to support
ALERT! 
1. Load up incident response checklist 
2. Log incident in JIRA 
3. Log into Ops War Room 
4. Public status post 
5. Initial investigation
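The checklist above could be tracked with something as simple as the sketch below, which timestamps each step as it is completed so there is a record to review afterwards. The class and method names are hypothetical; the talk does not describe how Server Density implements its checklist, and nothing here talks to JIRA or a chat room.

```python
from datetime import datetime, timezone

# The five steps from the checklist above, in order.
CHECKLIST = [
    "Load up incident response checklist",
    "Log incident in JIRA",
    "Log into Ops War Room",
    "Public status post",
    "Initial investigation",
]


class IncidentLog:
    """Hypothetical helper: walks the checklist in order and
    records a UTC timestamp for every completed step."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.entries = []  # list of (timestamp, step) tuples

    def complete_next(self):
        """Mark the next outstanding step as done and return its name."""
        if len(self.entries) >= len(self.steps):
            raise IndexError("all checklist steps are already complete")
        step = self.steps[len(self.entries)]
        self.entries.append((datetime.now(timezone.utc), step))
        return step

    def remaining(self):
        """Return the steps that have not been completed yet."""
        return self.steps[len(self.entries):]


log = IncidentLog(CHECKLIST)
log.complete_next()
print(log.remaining())
```

Keeping the steps in a fixed order is a design choice in this sketch: it stops the responder from skipping ahead to investigation before the incident is logged and the status post is out.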
Key response principles 
● Log everything 
● Frequent public status updates 
● Gather the team 
● Escalate!
Summary 
● Architecture 
● Performance 
● Where to host? 
● Downtime 
● Preparation
Thank you very much
@davidmytton 
david@serverdensity.com 
blog.serverdensity.com 
serverdensity.com/allyourbase
