OpenStack at Scale
- My learnings
Balaji Narayanan
Mar 19, 2016
Moi
- @balajijegan
- 10+ years at Yahoo across Platforms (OpenStack, Hadoop, User Management) and Advertising
- Dev Ops and Architecture
- Hacker
- http://www.slideshare.net/balajijegan
Listen to your Customers
Continuous Delivery
Automation
Monitoring
Troubleshooting / Tools
Log Aggregation
Operational Hacks
Capacity Management
Utilization
Integration with Existing Systems
Photo Credits
- Bill Caskey / The Sales Cooke
- http://www.cloudways.com/blog/devops-accelerated-sdlc-improved-distributed-computing-scalability/
- http://www.the-vital-edge.com/knowledge-and-artificial-intelligence/
- http://www.mediafactory.org.au/emily-malone/2014/03/14/week-02-troubleshooting/
- http://cdn.serietivu.com/wp-content/uploads/2013/01/TBBT-6-Sheldon-Cooper-Bazinga.jpg
- http://storyglitz.com/wp-content/uploads/2015/02/bolier-iron-box.jpg
- https://www.boundless.com/business/textbooks/boundless-business-


Editor's Notes

  • #2 OpenStack at Scale - My learnings. Talk at LSPE Bangalore
  • #4 OpenStack - What is OpenStack? What does it offer? IaaS, Private Cloud
  • #5 OpenStack - Reach and Scale. Deployments across the US, Asia, Europe and Australia. Use cases: developer productivity, plus VMs and bare metal (BM) for production. 30 independent clusters, split primarily by security domain and use case. Most of what I am going to talk about has nothing to do with OpenStack specifically; it holds for any service that you run. If you work in DevOps, you are already paying attention to all of these.
  • #6 Listen to your customers.
  • #7 Continuous Delivery, DevOps Slide 6 - Product managers want features to go out. Developers want to complete their work. Operations wants stability. Change is a big cause of production incidents: during a change moratorium, there are very few outages. We can live with things as they are, or improve process and agility.
  • #8 Automation Slide 7 - Automation. Automate yourself out of your job. There will still be interesting things to do. Candidates: Deployment, Validation, Smoke Tests, Performance Tests, Capacity Augmentation.
  • #9 Monitoring - Actionable alerts, Monitor Services, Synthetic tests Slide 8 - Synthetic monitoring, Service Monitoring. Every alert should be actionable. If it is not, it needs to be a secondary alert. VMbooter detects most of the cases.
  • #10 Troubleshooting, Knowledge Sharing, Runbooks Slide 9 - Troubleshooting, providing self-serve tools for users
  • #11 Searching in a haystack Slide 10 - Log Aggregation
  • #12 Hacks are good. Follow the same process as feature development
  • #13 Capacity Management, Quota Provisioning - Private Cloud brings in its own challenges
  • #14 Utilization in Private Clouds - Over commit, Metering, Billing
  • #15 Integrating with Existing Systems
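
A synthetic boot test of the kind note #9 mentions can be sketched roughly as below. This is only an illustration, not how VMbooter itself works (the notes do not say). The function name `synthetic_boot_probe`, the `ProbeResult` type, and the injected `boot` / `status` / `delete` callables are all hypothetical; the status strings "ACTIVE" and "ERROR" are assumed to follow the usual compute server states. The idea is to boot a throwaway VM, wait for it to come up within a deadline, and always clean up, so that a failed probe maps to one actionable alert.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    ok: bool        # True only if the VM reached ACTIVE before the deadline
    seconds: float  # time spent in the probe when the verdict was reached
    detail: str     # short human-readable reason, suitable for an alert


def synthetic_boot_probe(
    boot: Callable[[], str],          # hypothetical: starts a VM, returns its id
    status: Callable[[str], str],     # hypothetical: returns the VM state string
    delete: Callable[[str], None],    # hypothetical: deletes the VM
    timeout: float = 300.0,
    poll_interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> ProbeResult:
    """Boot a throwaway VM, wait for ACTIVE, and always clean up."""
    start = clock()
    server_id: Optional[str] = None
    try:
        server_id = boot()
        while True:
            state = status(server_id)
            elapsed = clock() - start
            if state == "ACTIVE":
                return ProbeResult(True, elapsed, "vm became ACTIVE")
            if state == "ERROR":
                return ProbeResult(False, elapsed, "vm entered ERROR")
            if elapsed >= timeout:
                return ProbeResult(False, elapsed, "timed out waiting for ACTIVE")
            sleep(poll_interval)
    except Exception as exc:
        # Any API failure counts as a failed probe rather than a crash.
        return ProbeResult(False, clock() - start, f"probe raised: {exc}")
    finally:
        # Clean up even on failure so probes do not leak capacity.
        if server_id is not None:
            try:
                delete(server_id)
            except Exception:
                pass  # a leaked probe VM would need its own secondary alert
```

In practice the three callables would wrap whatever client the deployment already uses, and the probe would run on a schedule per cluster; passing `clock` and `sleep` in makes the logic testable without waiting on real time.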