
Don't Repeat Our Mistakes! Lessons Learned from Running Go Daddy's Private Cloud (OpenStack Queens Summit)

After years of running one of the largest OpenStack clouds, we've learned a thing or two. Early architectural decisions about networking, storage, and scaling have real and lasting consequences. We'll walk through some of these early decisions, some of which turned out to be good, but many of which turned out to be bad. We'll also share some strategies for thinking through the long-term impacts, to help you avoid similar pitfalls in your cloud.

(Video at https://www.youtube.com/watch?v=LzIkTqfb1nI )

  1. Copyright© 2017 GoDaddy Inc. All Rights Reserved.
     Don’t Repeat Our Mistakes! Lessons Learned from Running Go Daddy’s Private Cloud
     Kris Lindgren (klindgren@godaddy.com)
     Mike Dorman (mike.dorman@sendgrid.com)
     OpenStack Queens Summit, November 2017, Sydney
  2. OpenStack at Go Daddy
     ● 2013: POC cloud (Havana)
     ● 2014: First production apps (Icehouse)
     ● 2014: Nova cells v1 (Kilo)
     ● 2015: “OpenStack everywhere” (Liberty)
     ● 2017: Working toward containerized services
  3. OpenStack at Go Daddy
     ● What we built:
       ○ Shared-nothing regions
       ○ Ephemeral disk on local storage
       ○ Simple networking
       ○ No live migration
       ○ Multiple AZs
     ● Scale:
       ○ Thousands of compute nodes, >100,000 cores
       ○ Tens of thousands of VMs
  4. Avoiding “Accidental Architecture”
     • Product
     • Infrastructure & Scaling
     • Management
  5. Product - Need for Chargeback/Showback
     • Private Cloud = Free Compute
     • Free Compute = High Demand
     • High Demand = Overconsumption
  6. Product - Have a Cohesive Vision
     • Which OpenStack services/features
     • User onboarding/off-boarding
     • Patching cadences/methodology
     • Legacy integrations
     • Adding capacity
     • SLAs
     • How do end users “consume” OpenStack?
     • Procedure for changing the vision
     • Helps with cloud paradigm shift
     • Expect and tolerate failure
  7. Product Issues - How to Avoid
     • Manage expectations (for yourself and for users)
     • Showback and controls around quota
     • Education and evangelism
     • Docs and sample code
     • “Cloud ready” early adopters
     • Ongoing guidance
     (Image: “1. Cloud 2. ?????? 3. Profit!” marked with an X)
  8. Scaling - Nova Cells (v1) Justification
     • Assumed we would grow fast
     • Challenges with scaling Nova/RMQ
     • Easier earlier than later
     • Ongoing debt to manage patches
     • Cells v2 was coming soon
     http://www.dorm.org/blog/converting-to-openstack-nova-cells-without-destroying-the-world/
  9. Scaling - Nova Cells (v1) Retrospective
     Good
     • Helped us to scale
     • Gained expertise with Nova
     • Community street cred
     Bad
     • No scaling for Neutron
     • Patches get more difficult
     • Non-standard config
     • Delays on v2
     • Migration to v2 is unknown
     20/20 Hindsight
     • Scale/shard RMQ instead
     • Aspirations about scale
     • Porting patches is top blocker
  10. Scaling - Collapsed Architecture Justification
      • Colocated API services and RMQ (except Glance)
      • Dedicated hardware overkill
      • Local Python packages
      • Made sense for POC
      • Nova separated later with Cells v1
  11. Scaling - Collapsed Architecture Retrospective
      Good
      • Simple architecture
      • Minimal hardware
      • Easy network ACLs
      • Up and running fast
      Bad
      • Large failure impacts
      • Resource contention
      • Single API endpoints
      20/20 Hindsight
      • OK for POC
      • Ignored it too long
      • Easy to scale out
      • (Implementing now)
  12. Infrastructure - Special Neutron Architecture Justification
      • Neutron L2 assumptions
      • L3 folded Clos network
      • L2 stops at the leaf switches
      • Uncomfortable with overlays
      • Provider network per rack
      • Routed floating IPs
      • Overload AZ to pick a network
      • Local patches for network scheduling
  13. Infrastructure - Special Neutron Architecture Retrospective
      Good
      • Same for VMs and metal
      • Simple infrastructure
      • Easy on users
      • Network IP usages API
      • Segmented networks spec
      Bad
      • Snowflake setup
      • L2 adjacency expectations
      • Adding features is difficult (LBaaS)
      • Migration to Neutron segmented networks?
      20/20 Hindsight
      • Works pretty well
      • Patches are limited
      • IP usages API extension
      • Segmented networks in Neutron
      • Many others have the same problem
  14. Management - Puppet Single Source of Truth Justification
      • Big Puppet shop
      • Single source of config
      • Good for server bootstrapping
      • OpenStack-Puppet modules
      • API providers
      • Code pipeline already in place
      • Ansible kicks off puppet apply
  15. Management - Puppet Single Source of Truth Retrospective
      Good
      • Single source of config (in theory)
      • Efficient bootstrapping
      • NOOP mode for sanity
      Bad
      • State in Puppet, Hiera, and APIs
      • Some managed manually
      • Duplicate API objects
      • Omnibus deployments
      • NOOP report not always accurate!
      • Orphaned/forgotten servers
      • Orchestration difficult
      20/20 Hindsight
      • Many unintended problems
      • Not really a single source
      • Need for targeted deployments
      • Other tools for orchestration
      • Use for bootstrapping
  16. Strategies for Avoiding Accidental Architecture
      • Think of your future selves
        ○ Quantify tech debt interest
      • Almost nothing will be temporary
        ○ Make a specific plan and timeline
      • Carefully consider scale
        ○ Overestimating can be as bad as underestimating
      • Automate first
        ○ At least make it capable
  17. Strategies for Avoiding Accidental Architecture
      • KISS!
      http://stella.report
  18. Strategies for Avoiding Accidental Architecture
      • Spread the knowledge wealth
      http://stella.report
      “The problem, [...] is that we are attempting to build systems that are beyond our ability to intellectually manage.” *
      * The Coming Software Apocalypse: https://www.theatlantic.com/technology/archive/2017/09/saving-the-world-from-code/540393/
  19. Recap: How to Live with No Regrets
      ● Manage expectations
      ● Education and evangelism
      ● Helpful early adopters
      ● Ongoing guidance
      ● Remember your future self
      ● Account and plan for tech debt
      ● Sane scale expectations
      ● Automate, automate, automate
      ● Simplicity
      ● Knowledge sharing
      Questions? Other Ideas?
      klindgren@godaddy.com
      mike.dorman@sendgrid.com
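Slides 5 and 7 argue that a private cloud needs showback and controls around quota, because free compute drives overconsumption. The following is a minimal sketch of what a showback report could look like, assuming each project's metered usage is already available: multiply usage by an internal rate and flag projects that are close to their core quota. It is not code from the talk; the rates, project names, usage figures, the `ProjectUsage` type, and the 80% warning threshold are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical internal rates: cost units per vCPU-hour, per GB-hour of RAM,
# and per GB-hour of disk. Not taken from the talk.
RATES = {"vcpu_hours": 0.02, "ram_gb_hours": 0.005, "disk_gb_hours": 0.0001}

# Hypothetical threshold: warn once a project uses 80% of its core quota.
QUOTA_WARN_RATIO = 0.8


@dataclass
class ProjectUsage:
    name: str
    usage: dict        # metered usage, keyed like RATES
    cores_in_use: int  # vCPUs currently allocated
    cores_quota: int   # vCPU quota granted to the project


def showback_cost(usage: dict) -> float:
    """Sum usage * rate over every metered resource."""
    return sum(amount * RATES[resource] for resource, amount in usage.items())


def showback_report(projects):
    """Return (name, cost, near_quota) rows, most expensive project first."""
    rows = []
    for p in projects:
        cost = round(showback_cost(p.usage), 2)
        near_quota = p.cores_in_use >= QUOTA_WARN_RATIO * p.cores_quota
        rows.append((p.name, cost, near_quota))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    projects = [
        ProjectUsage("batch-jobs",
                     {"vcpu_hours": 1440, "ram_gb_hours": 2880, "disk_gb_hours": 72000},
                     cores_in_use=20, cores_quota=100),
        ProjectUsage("web-frontend",
                     {"vcpu_hours": 7200, "ram_gb_hours": 28800, "disk_gb_hours": 360000},
                     cores_in_use=90, cores_quota=100),
    ]
    for name, cost, near_quota in showback_report(projects):
        flag = "  <- near quota" if near_quota else ""
        print(f"{name:15} {cost:10.2f}{flag}")
```

Showback only reports cost rather than billing it; turning the same numbers into chargeback is a policy decision, not a code change.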
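Slides 12 and 13 describe one provider network per rack (L2 stops at the leaf), with the availability zone overloaded to pick the network, plus an IP usages API for seeing how full each network is. The sketch below models that selection logic with Python's standard `ipaddress` module, assuming a one-to-one AZ-to-rack mapping. It is not Neutron code and not Go Daddy's local patches; the AZ names, CIDRs, and the `pick_network`, `allocate_ip`, and `ip_usage` helpers are hypothetical.

```python
import ipaddress

# Hypothetical mapping: each AZ name stands for one rack, and each rack has
# its own L2 provider network.
RACK_NETWORKS = {
    "az-rack01": ipaddress.ip_network("10.1.1.0/26"),
    "az-rack02": ipaddress.ip_network("10.1.2.0/26"),
}

# Addresses already handed out, per AZ (hypothetical in-memory state).
ALLOCATED = {az: set() for az in RACK_NETWORKS}


def pick_network(az: str) -> ipaddress.IPv4Network:
    """Use the requested AZ to choose that rack's provider network."""
    try:
        return RACK_NETWORKS[az]
    except KeyError:
        raise ValueError(f"no provider network for availability zone {az!r}") from None


def allocate_ip(az: str) -> ipaddress.IPv4Address:
    """Hand out the lowest free host address on the AZ's network.

    A real network would also reserve a gateway address; this sketch does not.
    """
    network = pick_network(az)
    used = ALLOCATED[az]
    for addr in network.hosts():
        if addr not in used:
            used.add(addr)
            return addr
    raise RuntimeError(f"network {network} for {az!r} is full")


def ip_usage(az: str) -> tuple:
    """Return (used, total) host addresses for the AZ's network."""
    network = pick_network(az)
    total = network.num_addresses - 2  # exclude network and broadcast addresses
    return len(ALLOCATED[az]), total
```

With this sketch, the first `allocate_ip("az-rack01")` returns 10.1.1.1 and `ip_usage("az-rack01")` then reports one of 62 host addresses in use. Because a VM's address is tied to its rack, this design also explains the "L2 adjacency expectations" problem on slide 13: a VM cannot keep its IP if it lands in a different rack.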