Adobe has quickly scaled from nothing to a huge presence in the AWS cloud.
This is the story from the trenches: how we screwed up, learned and evolved our use of Chef to help get us to today. Taming Chef to work in the AWS cloud while trying to build a platform at a large scale was not as easy as we originally planned, and we’re consistently trying to make it better. We’ll share some tips and tricks from our experience.
Zero to Production in Crazy Time: Adobe’s Transformation
1. Zero to Prod in Crazy Time
John Martinez | Adobe Cloud Services
2. About Me
• Currently working as a Cloud Operations Engineer at Adobe
• I get to figure out new stuff, and make really old stuff work in AWS
• 20+ years doing UNIX/Linux work
• Learned about cloud computing at Netflix
• Working at Adobe feeds my habit - photography
5. How We Got Started
• Creative Cloud went live in late April 2012
• AWS from the start
• We needed to do SOMETHING
• Yes, it was really that scientific of a decision
• Chef vs. Puppet
• That learning curve
6. #EPICFAIL #1
• Not socializing the need for Chef to the dev team
• Once sold, keep momentum going
• The “let’s make this more complicated than it needs to be” syndrome
• Start with easy stuff first, then graduate
• Ops guy admits: the dev people know how to use software engineering methods for creating and maintaining infrastructure code: USE IT
7. Tweaking Knobs
• EC2 AMIs: bake or configure?
• Baking positive: fast boot times
• Baking negative: too static
• Configure positive: very dynamic
• Configure negative: can take forever to boot
• We settled on a mostly dynamic configuration, with some static baking
• knife-ec2 is great, but what about autoscale?
• The CloudFormation connection
8. #EPICFAIL #2
• Get Chef, don’t actually use it
• Back to that learning curve (Hint: Training)
• Issue with compressed timelines and small staff
• In the heat of deploying prod, doing stupid things
• Losing track of what got deployed where
• Who’s doing what?
• Not sleeping sucks
9. Out of the Rubble
• Now that we’re live: refactor time (a.k.a. Fix all the broken stuff)
• Chef development for reals
• OMG: WINDOWS?!?!
• Not a lot of expertise in-house or outside
• Ops guy admits: learned to love dev tools like Jenkins and Git
10. It’s Alive!
• Did it gradually over time
• Started with simple recipes, graduated to more complicated ones
• Using Environments to deploy the right thing in the right place
• It’s AWS, stupid: you SHOULD kill your instances
• CloudFormation to AutoScale to Chef Client
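The CloudFormation to AutoScale to Chef Client chain in the last bullet can be sketched as a template generator. This is a minimal sketch, not the template we actually ran: the function name, the resource names `LaunchConfig` and `Group`, and every argument value are hypothetical, and it assumes the classic launch-configuration style of Auto Scaling group. The user-data passed in is whatever script starts Chef Client on first boot.

```python
import json


def build_template(ami_id, instance_type, instance_profile, user_data,
                   min_size, max_size):
    """Return a CloudFormation template (as a dict) for one Auto Scaling group.

    All names here are illustrative assumptions, not the original template.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            # Describes how each instance in the group is launched.
            "LaunchConfig": {
                "Type": "AWS::AutoScaling::LaunchConfiguration",
                "Properties": {
                    "ImageId": ami_id,
                    "InstanceType": instance_type,
                    # The instance profile lets the node read its bootstrap
                    # material without credentials in user-data.
                    "IamInstanceProfile": instance_profile,
                    # CloudFormation requires user-data to be base64 encoded.
                    "UserData": {"Fn::Base64": user_data},
                },
            },
            # The group replaces killed instances, which re-run the bootstrap.
            "Group": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    "AvailabilityZones": {"Fn::GetAZs": ""},
                    "LaunchConfigurationName": {"Ref": "LaunchConfig"},
                    "MinSize": str(min_size),
                    "MaxSize": str(max_size),
                },
            },
        },
    }


if __name__ == "__main__":
    template = build_template("ami-12345678", "m1.small", "chef-node-profile",
                              "#!/bin/bash\n", 2, 4)
    print(json.dumps(template, indent=2))
```

Because every instance in the group gets the same launch configuration, killing an instance just means a replacement boots and converges through Chef Client again.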
11. It’s Alive (v1)
[Diagram: CloudFormation, Auto Scale Group, EC2 Instances, S3 Bucket (validator key), Hosted Chef]
• 0. Manual: Editor (vi), Perforce, cfn-create-stack
• 1. knife upload: Cookbooks, Environment, Roles, Data bags
• 2, 3. (unlabeled in the diagram)
• 4. Chef Client: Bootstrap, Data Bag Key, Recipes
12. More Automation (v2)
[Diagram: CloudFormation, Auto Scale Group, EC2 Instances, S3 Bucket (validator key), Hosted Chef]
• 0. Automated: Git, Jenkins, Jenkins CFN
• 1. knife upload: Cookbooks, Environment, Roles, Data bags
• 2, 3. (unlabeled in the diagram)
• 4. Chef Client: Bootstrap, Data Bag Key, Recipes
13. On Bootstrapping EC2 Instances
• Biggest issue with Chef in AWS: straying from knife-ec2
• Read the bootstrap document and reverse engineer it
• http://wiki.opscode.com/display/chef/Client+Bootstrap+Fast+Start+Guide
• http://wiki.opscode.com/display/chef/EC2+Bootstrap+Fast+Start+Guide
• user-data is your friend
• Use it for node identity
• Resist the devil: don’t send any API keys or passwords or embarrassing things via user-data!!!
• Windows works this way, too, but learn PowerShell
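The bootstrap bullets above can be sketched as a small generator for a Linux user-data script. This is a sketch under stated assumptions, not our production bootstrap: the function and argument names are hypothetical, it assumes the AWS CLI and Chef Client are already on the AMI, that the instance role can read the validator key from S3, and that the instance metadata endpoint answers plain unauthenticated requests. User-data carries only identity (role prefix, environment, run list); no key or password is embedded.

```python
import json
import shlex


def build_user_data(role_prefix, environment, run_list, chef_server_url,
                    validation_client_name, key_bucket, key_object):
    """Return a bash script to pass as EC2 user-data (illustrative only)."""
    first_boot = json.dumps({"run_list": run_list})
    key_url = f"s3://{key_bucket}/{key_object}"
    return "\n".join([
        "#!/bin/bash",
        "set -euo pipefail",
        "mkdir -p /etc/chef",
        # Node identity: role prefix from user-data plus the EC2 instance ID.
        "INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)",
        f'NODE_NAME="{role_prefix}-$INSTANCE_ID"',
        # The validator key is fetched at boot using the instance's IAM role.
        f"aws s3 cp {shlex.quote(key_url)} /etc/chef/validation.pem",
        "chmod 600 /etc/chef/validation.pem",
        # Unquoted heredoc so that $NODE_NAME is expanded by the shell.
        "cat > /etc/chef/client.rb <<EOF",
        "log_location STDOUT",
        f'chef_server_url "{chef_server_url}"',
        f'validation_client_name "{validation_client_name}"',
        'validation_key "/etc/chef/validation.pem"',
        'node_name "$NODE_NAME"',
        "EOF",
        # Quoted heredoc: the JSON is written out verbatim.
        "cat > /etc/chef/first-boot.json <<'EOF'",
        first_boot,
        "EOF",
        f"chef-client -j /etc/chef/first-boot.json -E {shlex.quote(environment)}",
        "",
    ])


if __name__ == "__main__":
    print(build_user_data(
        "web", "production", ["role[web]"],
        "https://chef.example.com/organizations/example",  # placeholder URL
        "example-validator", "example-bucket", "validation.pem",
    ))
```

Deriving the node name at boot matters for autoscale: every instance in a group receives identical user-data, so the instance ID is what keeps node names unique.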
15. #EPICFAIL #3
• Failing to architect for failure (double BAM)
• Even though we built a hot AWS architecture, we still got bit
• What does it mean when Hosted Chef is down for us?
• Talk to Opscode...really, talk to them, they want to help
16. How We’re Trying to Improve
• Mostly around availability
• Augment Hosted Chef with Private Chef
• Mostly around security
• Use the tools at your disposal
• IAM policies for EC2 roles and S3 bucket security
• Mostly around performance
• Refactoring AWS-related code to use AWS SDK for Ruby
• AMI factory from base Amazon Linux or Ubuntu AMIs (bonus points for Windows)
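For the IAM bullet above, a least-privilege policy for an EC2 role might look like the following. It is a hedged sketch: the function name, bucket, and object key are hypothetical, and the real policies may well grant more than this. The idea is that a node's role can read exactly one object (for example the validator key) and nothing else in the bucket.

```python
import json


def validator_key_policy(bucket, key):
    """Return an IAM policy document allowing read access to a single S3 object."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Read-only: no listing, writing, or deleting.
                "Action": ["s3:GetObject"],
                # Scoped to one object rather than the whole bucket.
                "Resource": [f"arn:aws:s3:::{bucket}/{key}"],
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder names; substitute your own bucket and key.
    print(json.dumps(validator_key_policy("example-bucket", "validation.pem"),
                     indent=2))
```

Attaching a policy like this to the instance role is what lets a bootstrap script fetch its key without any credentials travelling in user-data.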
17. The End
• Operational scripts, template examples and other bits
• https://github.com/Adobe-CloudOps
• Contact me:
• @johnmartinez
• martinez@adobe.com
• Questions? Suggestions? Come talk to me after!