Zero to Production in Crazy Time: Adobe’s Transformation



Adobe has quickly scaled from nothing to a huge presence in the AWS cloud.

This is the story from the trenches: how we screwed up, learned, and evolved our use of Chef to get where we are today. Taming Chef to work in the AWS cloud while building a platform at large scale was not as easy as we had planned, and we’re continually trying to make it better. We’ll share some tips and tricks from our experience.


  1. Zero to Prod in Crazy Time
     John Martinez | Adobe Cloud Services
  2. About Me
     • Currently working as a Cloud Operations Engineer at Adobe
     • I get to figure out new stuff, and make really old stuff work in AWS
     • 20+ years doing UNIX/Linux work
     • Learned about cloud computing at Netflix
     • Working at Adobe feeds my habit: photography
  3. About Ops People
     Some people see us as ninjas; I really see us as Stormtroopers.
  4. Cloud Platforms @ Adobe
     • Creative Cloud
     • Marketing Cloud
     • Digital Publishing Suite
     • PhoneGap
     • Typekit
     • EchoSign
     • Revel
     • ...and growing...
  5. How We Got Started
     • Creative Cloud went live in late April 2012
     • AWS from the start
     • We needed to do SOMETHING
     • Yes, it was really that scientific of a decision
     • Chef vs. Puppet
     • That learning curve
  6. #EPICFAIL #1
     • Not socializing the need for Chef to the dev team
     • Once sold, keep momentum going
     • The “let’s make this more complicated than it needs to be” syndrome
     • Start with easy stuff first, then graduate
     • Ops guy admits: the dev people know how to use software engineering methods for creating and maintaining infrastructure code: USE IT
  7. Tweaking Knobs
     • EC2 AMIs: bake or configure?
     • Baking positive: fast boot times
     • Baking negative: too static
     • Configure positive: very dynamic
     • Configure negative: can take forever to boot
     • We settled on a mostly dynamic configuration, with some static baking
     • knife-ec2 is great, but what about autoscale?
     • The CloudFormation connection
  8. #EPICFAIL #2
     • Get Chef, don’t actually use it
     • Back to that learning curve (Hint: training)
     • Issue with compressed timelines and small staff
     • In the heat of deploying prod, doing stupid things
     • Losing track of what got deployed where
     • Who’s doing what?
     • Not sleeping sucks
  9. Out of the Rubble
     • Now that we’re live: refactor time (a.k.a. fix all the broken stuff)
     • Chef development for reals
     • OMG: WINDOWS?!?!
     • Not a lot of expertise in-house or outside
     • Ops guy admits: learned to love dev tools like Jenkins and Git
  10. It’s Alive!
      • Done gradually over time
      • Started with simple recipes, graduated to more complicated ones
      • Using Environments to deploy the right thing in the right place
      • It’s AWS, stupid: you SHOULD kill your instances
      • CloudFormation to AutoScale to Chef Client
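Slide 10 mentions using Chef Environments to deploy the right thing in the right place. The deck does not show its environment files, so what follows is a minimal sketch, assuming each environment pins cookbooks to exact versions. The helper `chef_environment` and the cookbook names and versions are hypothetical; the output is a Chef environment in the JSON form that `knife environment from file` can load.

```python
import json


def chef_environment(name: str, cookbook_pins: dict, description: str = "") -> str:
    """Render a Chef environment as JSON, pinning each cookbook to an exact version.

    Hypothetical helper: cookbook_pins maps cookbook name -> version string.
    """
    env = {
        "name": name,
        "description": description,
        "json_class": "Chef::Environment",
        "chef_type": "environment",
        # "= x.y.z" is an exact-version constraint; looser operators such as "~>" also exist.
        "cookbook_versions": {cb: f"= {ver}" for cb, ver in cookbook_pins.items()},
        "default_attributes": {},
        "override_attributes": {},
    }
    return json.dumps(env, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical pins: staging runs a newer cookbook than production.
    print(chef_environment("staging", {"webapp": "1.5.0"}))
    print(chef_environment("production", {"webapp": "1.4.0"}))
```

Exact pins let staging run a newer cookbook than production until it is promoted; a looser constraint such as `~> 1.4` trades that control for less churn.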
  11. It’s Alive (v1)
      [Diagram: 0. Manual (Editor (vi), Perforce); 1. knife upload (cookbooks, environment, roles, data bags) to Hosted Chef; cfn-create-stack, CloudFormation, AutoScale group, EC2 instances; S3 bucket (validator key); 4. Chef Client bootstrap (data bag key, recipes)]
  12. More Automation (v2)
      [Diagram: 0. Automated (Git, Jenkins, Jenkins CFN); 1. knife upload (cookbooks, environment, roles, data bags) to Hosted Chef; CloudFormation, AutoScale group, EC2 instances; S3 bucket (validator key); 4. Chef Client bootstrap (data bag key, recipes)]
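The v2 diagram replaces the manual vi/Perforce steps with Git and Jenkins. As a rough sketch of what such a Jenkins job might run, assuming it executes from the root of a checked-out Chef repo with a CloudFormation template on disk: upload cookbooks, environments, roles and data bags with `knife upload`, then create the stack. The deck used the older `cfn-create-stack` tool; this sketch swaps in the current AWS CLI `aws cloudformation create-stack` instead. The stack name, template path and the `ChefEnvironment` template parameter are hypothetical.

```python
import shlex
import subprocess


def pipeline_commands(stack_name: str, template_path: str, chef_environment: str) -> list:
    """Build the argv lists for one deploy, in order (hypothetical job layout)."""
    return [
        # Upload step: push repo content to the Chef server (paths are Chef repo paths).
        ["knife", "upload", "/cookbooks", "/environments", "/roles", "/data_bags"],
        # Stack step: CloudFormation launches the AutoScale group; instances then bootstrap Chef.
        [
            "aws", "cloudformation", "create-stack",
            "--stack-name", stack_name,
            "--template-body", f"file://{template_path}",
            # Hypothetical template parameter telling instances which Chef environment to join.
            "--parameters", f"ParameterKey=ChefEnvironment,ParameterValue={chef_environment}",
        ],
    ]


def run_pipeline(commands: list, dry_run: bool = True) -> None:
    """Print each command (dry run) or execute it, stopping at the first failure."""
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_pipeline(pipeline_commands("web-prod", "templates/web.json", "production"))
```

Keeping the commands as plain argv lists makes the job easy to dry-run in Jenkins before letting it touch the Chef server or AWS.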
  13. On Bootstrapping EC2 Instances
      • Biggest issue with Chef in AWS: straying from knife-ec2
      • Read the bootstrap document and reverse engineer it
      • user-data is your friend
      • Use it for node identity
      • Resist the devil: don’t send any API keys, passwords or embarrassing things via user-data!!!
      • Windows works this way, too, but learn PowerShell
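The bootstrapping advice is to use user-data for node identity but never for API keys or passwords. A minimal sketch of both halves follows, assuming identity travels as a small JSON document with hypothetical `role` and `environment` fields. On the launch side, key names that look like credentials are refused. On the instance side, the identity becomes a first-boot JSON for `chef-client -j` and an `environment` line for client.rb. The secret-detection regex is a crude, hypothetical heuristic, not a substitute for keeping secrets out by design.

```python
import json
import re

# Hypothetical heuristic: key names that suggest a credential.
SECRET_KEY_PATTERN = re.compile(
    r"password|passwd|secret|token|api[_-]?key|private[_-]?key", re.IGNORECASE
)


def build_user_data(identity: dict) -> str:
    """Launch side: serialize node identity, refusing anything that looks like a secret."""
    for key in identity:
        if SECRET_KEY_PATTERN.search(key):
            raise ValueError(f"refusing to send {key!r} via user-data")
    return json.dumps(identity, sort_keys=True)


def first_boot_json(user_data: str) -> dict:
    """Instance side: map the identity's role to a run_list for `chef-client -j`."""
    identity = json.loads(user_data)
    return {"run_list": [f"role[{identity['role']}]"]}


def client_rb_environment(user_data: str) -> str:
    """Instance side: the client.rb line that selects the node's Chef environment."""
    identity = json.loads(user_data)
    return f'environment "{identity["environment"]}"'


if __name__ == "__main__":
    ud = build_user_data({"role": "web", "environment": "prod"})
    print(ud)
    print(json.dumps(first_boot_json(ud)))
    print(client_rb_environment(ud))
```

Credentials the node needs at boot are better fetched by the instance itself (for example through an IAM role) than embedded in user-data, which is readable by anything on the instance that can query the metadata service.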
  14. #EPICFAIL #3
      Oh crap, Opscode is DOWN!!!
  15. #EPICFAIL #3
      • Failing to architect for failure (double BAM)
      • Even though we built a hot AWS architecture, we still got bit
      • What does it mean for us when Hosted Chef is down?
      • Talk to Opscode... really, talk to them, they want to help
  16. How We’re Trying to Improve
      • Mostly around availability
          • Augment Hosted Chef with Private Chef
      • Mostly around security
          • Use the tools at your disposal
          • IAM policies for EC2 roles and S3 bucket security
      • Mostly around performance
          • Refactoring AWS-related code to use the AWS SDK for Ruby
          • AMI factory from base Amazon Linux or Ubuntu AMIs (bonus points for Windows)
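Slide 16 lists IAM policies for EC2 roles and S3 bucket security. One way to apply that to the bootstrap flow, assuming the validator key sits in an S3 bucket as in the v1 and v2 diagrams, is to give the instances’ IAM role read access to that single object and nothing else. The bucket name, object key and the `validator_read_policy` helper are hypothetical; the structure is the standard IAM policy JSON.

```python
import json


def validator_read_policy(bucket: str, key: str) -> dict:
    """IAM policy allowing GetObject on exactly one S3 object (hypothetical bucket/key)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadChefValidatorKeyOnly",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                # S3 object ARN format: arn:aws:s3:::<bucket>/<key>
                "Resource": [f"arn:aws:s3:::{bucket}/{key}"],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(validator_read_policy("example-chef-bootstrap", "validation.pem"), indent=2))
```

Attached to the role in the instances’ instance profile, a policy like this lets the bootstrap fetch the key with temporary role credentials, so no AWS keys need to pass through user-data.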
  17. The End
      • Operational scripts, template examples and other bits
      • Contact me:
          • @johnmartinez
      • Questions? Suggestions? Come talk to me after!