1. Amazon War Stories
Avoiding casualties, collateral damage, and
being outflanked on the road to world domination
Jon Gallagher, CEO/CTO
Nube de Helado Software, Inc.
jon@nubedehelado.com
http://NubeDeHelado.com
619-318-5999
@JonGal @nubedehelado
Jon Gallagher • jon@nubedehelado.com • 619-318-5999 • http://NubeDeHelado.com
2. Your Allies
3. The 3 Elements of Systems Management
Reactive Management – Managing to Stop Losing Money
Proactive Management – Managing to Save Money
Strategic Management – Managing to Make Money
You should be doing all three all the time, but your emphasis will change as time goes on.
[Chart: relative emphasis on Reactive (R), Proactive (P), and Strategic (S) management across the phases System Design, System Beta, System Rollout, and World Dominance]
4. So What’s the Problem?
BRILLIANCE!
Inevitable, Overwhelming, and Complete Success!
5. System Design War Stories
Business Issues
◦ Construct your relationship with your provider as you would any major vendor relationship (e.g., your banker)
◦ Budgeting/cost control
Operational Issues
◦ Who’s doing what, when are they doing it, why are they doing it, and where?
Technical Issues
◦ System configuration skews
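Configuration skew (the same system configured differently in development, beta, and production) is easiest to catch mechanically. A minimal sketch, assuming each environment's settings can be exported as a flat dictionary; `find_skew` and the sample settings are hypothetical:

```python
from typing import Any, Dict, List, Tuple

Config = Dict[str, Any]


def find_skew(baseline: Config, actual: Config) -> List[Tuple[str, Any, Any]]:
    """Return (setting, expected, found) for every setting that differs.

    A setting missing from one side is reported with None on that side.
    """
    skew = []
    for key in sorted(set(baseline) | set(actual)):
        expected = baseline.get(key)
        found = actual.get(key)
        if expected != found:
            skew.append((key, expected, found))
    return skew


if __name__ == "__main__":
    # Hypothetical settings for the same web server in two environments.
    staging = {"instance_type": "m1.small", "storage": "ebs", "ssl": True}
    production = {"instance_type": "m1.large", "storage": "ephemeral", "ssl": True}
    for setting, expected, found in find_skew(staging, production):
        print(f"{setting}: staging={expected!r} production={found!r}")
```

Running a check like this on every deploy turns "how did production end up on ephemeral storage?" into a report you see before it matters.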
6. System Design Casualty Prevention
Business Issues – 7 P’s
◦ Create your business relationship
◦ Create your system diagram
◦ Map the AWS infrastructure to your diagram
Operational Issues – 5 W’s
◦ Structure your access to AWS with AWS Identity and Access Management (IAM)
◦ Create a corporate account
◦ Use IAM to structure who is allowed to do what
Technical Issues – The How
◦ Use AWS CloudFormation to build your systems
◦ Create your development infrastructure in CloudFormation
◦ Test and deploy from CloudFormation
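As a rough illustration of using IAM to structure who is allowed to do what, the sketch below builds two IAM policy documents as Python dictionaries: a read-only policy for people who should only look, and an operator policy that uses an explicit Deny on the `ec2:InstanceType` condition key to limit which instance types can be launched. The group roles, instance types, and helper names are hypothetical; the policies themselves would be attached to IAM groups through the console, the CLI, or CloudFormation.

```python
import json
from typing import Any, Dict, List


def read_only_policy() -> Dict[str, Any]:
    """IAM policy that lets a group view EC2 resources but change nothing."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["ec2:Describe*"], "Resource": "*"}
        ],
    }


def operator_policy(allowed_types: List[str]) -> Dict[str, Any]:
    """IAM policy that lets a group run instances, limited to `allowed_types`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ec2:Describe*",
                    "ec2:RunInstances",
                    "ec2:StartInstances",
                    "ec2:StopInstances",
                ],
                "Resource": "*",
            },
            {
                # An explicit Deny always wins over an Allow, so no one in the
                # group can launch an instance type outside the approved list.
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    "StringNotEquals": {"ec2:InstanceType": allowed_types}
                },
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical groups: marketing looks at dashboards, ops runs small instances.
    policies = {
        "marketing": read_only_policy(),
        "ops": operator_policy(["m1.small", "m1.medium"]),
    }
    print(json.dumps(policies, indent=2))
```

Keeping policies in version-controlled code also answers "who was allowed to do that, and since when?" after the fact.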
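A minimal sketch of the CloudFormation approach, assuming a single web server behind one security group. The template is generated in Python so the same definition can be stamped out for development, beta, and production, which keeps the environments from drifting apart. The resource names, parameter defaults, and the `build_template` helper are hypothetical; the template keys (`AWSTemplateFormatVersion`, `Parameters`, `Resources`, `Outputs`, `Ref`) and the `AWS::EC2::*` types follow CloudFormation's documented template format.

```python
import json
from typing import Any, Dict


def build_template(environment: str) -> Dict[str, Any]:
    """Return a CloudFormation template for one web server in `environment`."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Web tier ({environment})",
        "Parameters": {
            "ImageId": {"Type": "AWS::EC2::Image::Id"},
            "InstanceType": {
                "Type": "String",
                "Default": "m1.small",
                # Restricting the choices documents what is allowed and keeps
                # anyone from launching the biggest instance on a whim.
                "AllowedValues": ["m1.small", "m1.medium"],
            },
        },
        "Resources": {
            "WebSecurityGroup": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {
                    "GroupDescription": "HTTPS from anywhere",
                    "SecurityGroupIngress": [
                        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
                         "CidrIp": "0.0.0.0/0"}
                    ],
                },
            },
            "WebServer": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": {"Ref": "ImageId"},
                    "InstanceType": {"Ref": "InstanceType"},
                    "SecurityGroups": [{"Ref": "WebSecurityGroup"}],
                    "Tags": [{"Key": "Environment", "Value": environment}],
                },
            },
        },
        "Outputs": {"WebServerId": {"Value": {"Ref": "WebServer"}}},
    }


if __name__ == "__main__":
    # Every environment gets the same definition; only the tag differs.
    print(json.dumps(build_template("dev"), indent=2))
```

Saved as `web-dev.json`, the output could be deployed with the AWS CLI, for example `aws cloudformation deploy --template-file web-dev.json --stack-name web-dev --parameter-overrides ImageId=<your AMI ID>`. Upgrading then means editing the generator and redeploying, so the template doubles as documentation.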
7. The Instagram-ish Growth Path
http://speakerdeck.com/u/mikeyk/p/scaling-instagram
8. System Beta War Stories
Business Issues
◦ Understanding all the costs (money, time, etc.)
◦ Modeling the business
Operational Issues
◦ Monitoring the systems
◦ Using the feedback
Technical Issues
◦ Security
◦ Configuration
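A per-service cost model can start as a table of unit prices multiplied by expected usage. A minimal sketch, assuming you reconcile it against your own AWS bills; every price and quantity below is hypothetical, not a current AWS rate. Dividing the total by active users turns the bill into a number the business model can use:

```python
from dataclasses import dataclass
from typing import Dict, List

HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


@dataclass
class LineItem:
    service: str       # e.g. "EC2", "S3", "data transfer"
    unit: str          # what you are billed for, e.g. "instance-hour"
    unit_price: float  # dollars per unit (hypothetical; check current pricing)
    quantity: float    # expected units per month


def monthly_cost_by_service(items: List[LineItem]) -> Dict[str, float]:
    """Sum unit_price * quantity for each service."""
    totals: Dict[str, float] = {}
    for item in items:
        totals[item.service] = totals.get(item.service, 0.0) + item.unit_price * item.quantity
    return totals


def cost_per_user(items: List[LineItem], active_users: int) -> float:
    """Total monthly infrastructure cost divided by monthly active users."""
    return sum(monthly_cost_by_service(items).values()) / active_users


if __name__ == "__main__":
    items = [
        LineItem("EC2", "instance-hour", 0.10, 2 * HOURS_PER_MONTH),
        LineItem("S3", "GB-month", 0.10, 500),
        LineItem("data transfer", "GB out", 0.12, 1000),
    ]
    for service, cost in monthly_cost_by_service(items).items():
        print(f"{service:>14}: ${cost:,.2f}")
    print(f"cost per user: ${cost_per_user(items, 10_000):.4f}")
```

Time costs (engineer hours spent on operations) can be added as another line item with an hourly rate.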
9. System Beta Collateral Damage Prevention
Business Issues
◦ Build a cost model of each type of service you use
◦ Understand what you are delivering
Operational Issues
◦ Use the monitoring inherent in every service
Technical Issues
◦ Use your security groups, and use them wisely
◦ Test yourself for vulnerabilities
◦ Assume the worst
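One cheap way to test yourself is to audit security group rules for anything open to the whole internet. A minimal sketch, assuming rules have been exported as plain dictionaries whose fields loosely mirror EC2's `IpProtocol`/`FromPort`/`ToPort`/`CidrIp`; the dictionary shape, `PUBLIC_PORTS`, and the helper names are hypothetical. This is a self-check, not a substitute for a real vulnerability scan:

```python
import ipaddress
from typing import Any, Dict, Iterable, List

# Ports you intend to expose to the whole internet (a hypothetical policy).
PUBLIC_PORTS = {80, 443}


def is_world_open(cidr: str) -> bool:
    """True if the CIDR block admits every IPv4 (0.0.0.0/0) or IPv6 (::/0) address."""
    return ipaddress.ip_network(cidr, strict=False).prefixlen == 0


def risky_rules(rules: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return ingress rules that open non-public ports to the entire internet."""
    flagged = []
    for rule in rules:
        if not is_world_open(rule["cidr"]):
            continue
        if rule["protocol"] == "-1":
            # EC2 uses protocol "-1" to mean all protocols on all ports.
            flagged.append(rule)
            continue
        ports = range(rule["from_port"], rule["to_port"] + 1)
        if any(port not in PUBLIC_PORTS for port in ports):
            flagged.append(rule)
    return flagged


if __name__ == "__main__":
    rules = [
        {"protocol": "tcp", "from_port": 443, "to_port": 443, "cidr": "0.0.0.0/0"},
        {"protocol": "tcp", "from_port": 22, "to_port": 22, "cidr": "0.0.0.0/0"},
        {"protocol": "tcp", "from_port": 3306, "to_port": 3306, "cidr": "10.0.0.0/16"},
        {"protocol": "-1", "cidr": "::/0"},
    ]
    for rule in risky_rules(rules):
        print("open to the world:", rule)
```

In the spirit of assuming the worst, anything flagged should be treated as already found by someone else.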
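AWS services publish metrics to Amazon CloudWatch that you can alarm on. The sketch below shows the idea behind a threshold alarm that fires only when at least `m` of the last `n` datapoints breach, so a single spike does not page anyone. `ThresholdAlarm`, `alarm_states`, and the CPU numbers are hypothetical; in practice you would configure this as a CloudWatch alarm rather than write it yourself:

```python
from collections import deque
from typing import Deque, Iterable, List


class ThresholdAlarm:
    """Fires when at least `m` of the last `n` datapoints exceed `threshold`."""

    def __init__(self, threshold: float, m: int, n: int) -> None:
        if not 1 <= m <= n:
            raise ValueError("need 1 <= m <= n")
        self.threshold = threshold
        self.m = m
        # deque(maxlen=n) silently drops the oldest datapoint once full.
        self.window: Deque[bool] = deque(maxlen=n)

    def observe(self, value: float) -> bool:
        """Record one datapoint and report whether the alarm is firing."""
        self.window.append(value > self.threshold)
        return sum(self.window) >= self.m


def alarm_states(values: Iterable[float], threshold: float, m: int, n: int) -> List[bool]:
    """Run a fresh alarm over a series and return its state after each point."""
    alarm = ThresholdAlarm(threshold, m, n)
    return [alarm.observe(v) for v in values]


if __name__ == "__main__":
    cpu_percent = [40, 85, 90, 50, 95, 30, 20]  # hypothetical 5-minute averages
    print(alarm_states(cpu_percent, threshold=80, m=3, n=5))
```

Raising `m` trades faster detection for fewer false alarms; the right balance depends on how much a missed breach costs you.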
10. System Rollout War Stories
Business Issues
◦ Are your systems delivering the promise you’re marketing?
◦ Never pay for marginal costs with fixed dollars
Operational Issues
◦ SLAs
◦ Scalability
◦ Survivability
Technical Issues
◦ SLAs
◦ Scalability
◦ Survivability
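SLAs and survivability come down to arithmetic on component availability. A minimal sketch, assuming failures are independent (real outages often violate this): components in series multiply their availabilities, while redundant copies fail only if all of them fail. The availability figures are hypothetical:

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60


def serial(*availabilities: float) -> float:
    """Every component must be up, e.g. load balancer -> app server -> database."""
    return prod(availabilities)


def redundant(*availabilities: float) -> float:
    """At least one copy must be up, e.g. the same app server in two zones."""
    return 1 - prod(1 - a for a in availabilities)


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes of downtime per year at a given availability."""
    return (1 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    single_db = 0.999
    app_two_zones = redundant(0.995, 0.995)
    system = serial(app_two_zones, single_db)
    print(f"system availability: {system:.4%}")
    print(f"expected downtime: {downtime_minutes_per_year(system):.0f} min/year")
```

With these numbers the single database dominates: even with the app tier duplicated, the system lands just under 99.9%, roughly nine hours of downtime a year. That is the kind of figure to check before marketing promises an SLA.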
11. System Rollout – Guarding Your Flanks
Business Issues
◦ What does your customer want out of the system, and what are they getting?
◦ Are you building barriers to entry?
Operational Issues
◦ Does your reporting include the information needed to protect the company’s advantage?
◦ Are you collecting information that can answer questions about the user’s experience?
◦ Are you forecasting system usage? The effect of outages? Testing disaster scenarios?
Technical Issues
◦ Do you have monitoring for system responsiveness and capabilities?
◦ Can you meet the explicit and implicit SLAs your company makes?
◦ Can you recover from outages in critical components? Do you have outage plans, communication strategies, and escalation procedures?
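Forecasting usage can begin with a straight-line trend. A minimal sketch using ordinary least squares over equally spaced periods; the helper names, weekly peak figures, and capacity are hypothetical. Fast-growing services (see the Instagram growth path) usually need a nonlinear model, so a linear forecast is likely to underestimate them:

```python
from typing import List, Optional, Sequence, Tuple


def fit_trend(usage: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line through (period index, usage).

    Returns (slope, intercept), with periods numbered 0, 1, 2, ...
    """
    n = len(usage)
    if n < 2:
        raise ValueError("need at least two periods of history")
    mean_x = (n - 1) / 2
    mean_y = sum(usage) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(usage))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(usage: Sequence[float], periods_ahead: int) -> List[float]:
    """Project the trend for the next `periods_ahead` periods."""
    slope, intercept = fit_trend(usage)
    n = len(usage)
    return [intercept + slope * (n + k) for k in range(periods_ahead)]


def periods_until_capacity(usage: Sequence[float], capacity: float,
                           horizon: int = 104) -> Optional[int]:
    """Periods from now until the trend exceeds capacity, or None within horizon."""
    for k, value in enumerate(forecast(usage, horizon), start=1):
        if value > capacity:
            return k
    return None


if __name__ == "__main__":
    weekly_peak_rps = [100, 120, 140, 160]  # hypothetical peak requests/second
    print(forecast(weekly_peak_rps, 2))
    print(periods_until_capacity(weekly_peak_rps, capacity=190))
```

The useful output is the lead time: knowing you will hit capacity in two weeks leaves room to test the scaling plan instead of improvising it.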
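Responsiveness is usually monitored as a latency percentile rather than an average, because averages hide the slow requests customers notice. A minimal sketch of a nearest-rank percentile check; `percentile`, `meets_latency_target`, and the sample latencies are hypothetical:

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing so integer percentages give an exact rank.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_latency_target(samples: Sequence[float], pct: float, target_ms: float) -> bool:
    """True if the pct-th percentile latency is within target_ms."""
    return percentile(samples, pct) <= target_ms


if __name__ == "__main__":
    # Hypothetical response times in milliseconds from one minute of traffic.
    latencies_ms = [42, 38, 51, 47, 40, 39, 44, 1200, 45, 43]
    print("p50:", percentile(latencies_ms, 50))
    print("p95:", percentile(latencies_ms, 95))
    print("meets p95 <= 250 ms:", meets_latency_target(latencies_ms, 95, 250))
```

Here the median looks healthy while the 95th percentile exposes the request that took over a second, which is exactly the customer who quietly slips away.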
12. World Domination War Stories
Business Issues
◦ When is it time to leave AWS?
◦ When is it time to go back to AWS?
Operational Issues
◦ Eliminate the job you started with
Technical Issues
◦ Plan for obsolescence
13. World Domination
Business Issues
◦ Is it cheaper to build now, rather than buy?
Facebook (blog.facebook.com/blog.php?post=262655797130)
◦ Are you big enough to make the best deal?
Netflix (http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html)
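At its simplest, the build-versus-buy question reduces to a break-even calculation. A minimal sketch in whole dollars, assuming flat monthly costs and ignoring depreciation, staffing changes, and the cost of capital; `break_even_months` and all figures are hypothetical. A better negotiated rate (the "big enough to make the best deal" question) shows up as a lower `cloud_monthly`:

```python
from typing import Optional


def break_even_months(upfront: int, own_monthly: int, cloud_monthly: int) -> Optional[int]:
    """First month in which building has cost less in total than buying.

    Building costs `upfront` plus `own_monthly` each month; buying costs
    `cloud_monthly` each month. All amounts are whole dollars. Returns None
    if building never catches up.
    """
    savings = cloud_monthly - own_monthly
    if savings <= 0:
        return None
    # Need upfront + own_monthly * t < cloud_monthly * t, i.e. t > upfront / savings.
    return upfront // savings + 1


if __name__ == "__main__":
    # Hypothetical: $120,000 of hardware and $5,000/month to run it,
    # versus a $15,000/month AWS bill.
    print(break_even_months(120_000, 5_000, 15_000))
```

If the break-even point lands beyond the expected life of the hardware, or beyond your confidence in the usage forecast, staying on AWS is the safer bet.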
14. World Domination (cont.)
Operational Issues
◦ Have you spun off all non-essential tasks?
IT Infrastructure
Technical Issues
◦ What are your dependencies?
◦ What are the technical trends?
◦ What is everyone else using?
15. Global Domination – Temporary for the Ill-Prepared
Editor's Notes
It’s a cliché to say that business is like war and to quote Sun Tzu, but warfare is a useful model to check yourself against, with precepts such as “no plan survives first contact with the customer.”
----- Meeting Notes (4/13/12 11:34) -----
Business Issues: Put as much care into constructing your business relationship with AMZ (or any other provider) as you did into creating the business. Sole proprietor? OK, use your own card. LLC or corporation? Create and use the entity's credit facilities. Partnership? You're on your own.
Operational Issues: Who is in charge? Every hour that ticks by, the AMZ cash register rings. Use IAM to establish the rights and responsibilities of everyone who has access to the AWS resources. Also, never, ever let marketing do anything but look at pretty pictures (story about Kelly picking the biggest EC2 instance). What are we running? Where are we running it? When are we running it? Why are we running it?
Technical Issues: Use CloudFormation to create, upgrade, and document your systems. (Supplyframe's critical ad server still running on ephemeral storage.)
Rough story of how Instagram grew: it started with one hard (physical) server.
Technical Issues: Understand what your need for security is, and what your commitment is. Regulations such as HIPAA and PCI are obvious guidelines, but what if you’re a scrapbooking site? You need to test yourself before the script kiddies do.
Understanding your customer experience is key to making sure you don’t lose customers to competitors, or even invite competitors in the first place. Supplyframe struggled to serve ads quickly to China and had to own the experience. Customers will quietly slip away.
Technical/Operational Issues: It doesn’t matter that something has never happened before; you should still prepare for it (Netflix Chaos Monkey). More importantly, you should be the one to know there are problems before the customer does. Supplyframe's escalation was broken; an email from the CEO was frequently the first sign of an outage.
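Preparing for what has never happened is the idea behind Netflix's Chaos Monkey, which randomly terminates production instances so teams build systems that survive it. The sketch below only chooses victims and does not call AWS. The weekday business-hours window, the opt-out set, and `pick_victims` are assumptions made for illustration, not Netflix's implementation:

```python
import random
from datetime import datetime
from typing import Dict, List, Set


def pick_victims(groups: Dict[str, List[str]], opted_out: Set[str],
                 now: datetime, rng: random.Random,
                 probability: float = 1.0) -> Dict[str, str]:
    """Choose at most one instance per group to terminate.

    Runs only on weekdays between 09:00 and 17:00 so that engineers are on hand
    to respond; groups in `opted_out` are never touched.
    """
    if now.weekday() >= 5 or not 9 <= now.hour < 17:
        return {}
    victims: Dict[str, str] = {}
    for group in sorted(groups):
        instances = groups[group]
        if group in opted_out or not instances:
            continue
        if rng.random() < probability:
            victims[group] = rng.choice(instances)
    return victims


if __name__ == "__main__":
    # Hypothetical inventory: group name -> instance IDs.
    groups = {"web": ["i-1", "i-2", "i-3"], "api": ["i-4", "i-5"], "db": ["i-6", "i-7"]}
    chosen = pick_victims(groups, {"db"}, datetime.now(), random.Random())
    for group, instance in chosen.items():
        print(f"would terminate {instance} in {group}")
```

Actually terminating the chosen instances (for example with boto3's EC2 `terminate_instances` call) is deliberately left out. The value is in what breaks, and in whether your monitoring notices before the CEO's email does.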