
Silicon Valley Code Camp 2019 - Data center automation using pipeline as code



Have you ever thought of automating the end-to-end workflow for setting up a new data center with a single click? Have you ever thought of automating infrastructure setups that generally take months of effort, so that they take hours with a single click? Some examples of such setups are:

1. Setting up the DC network in a reliable and reproducible manner.
2. Automatic OS provisioning on blade servers.
3. Configuring the DC components using idempotent automation workflows.
4. Setting up a highly available internal private cloud / container orchestration platform such as Kubernetes on auto-provisioned infrastructure.
5. A very complex inventory state life-cycle management workflow.

To accomplish reliable, reproducible and idempotent automation for infrastructure setup, the NVIDIA DevOps team has been implementing *DC Automation Manager*, a framework built on the CI/CD tools ecosystem.

In this presentation we will talk about the design and automation used at NVIDIA GPU Cloud to set up a new DC of thousands of GPU and CPU blade servers from scratch using Jenkins and GitOps for:

1. Streamlining inventory life cycle
2. L2/L3 network setups
3. Node provisioning and OS configuration with dynamic inventory capabilities
4. Setting up container orchestration platforms on bare metal (BM) / cloud
5. Bridging the gap between application engineering and operations engineering
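Item 3 mentions node provisioning "with dynamic inventory capabilities". The talk does not show how its inventory is exposed, so the following is only a minimal sketch, assuming the configuration jobs use Ansible: Ansible can read an executable inventory script that prints JSON (groups plus `_meta.hostvars`) when invoked with `--list`. The device records, field names (`hostname`, `ip`, `role`, `state`) and the `build_inventory` helper are hypothetical stand-ins for data pulled from the inventory system.

```python
import json
import sys
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical device records, standing in for data pulled from the inventory system.
DEVICES: List[Dict[str, str]] = [
    {"hostname": "gpu-node-001", "ip": "10.0.1.11", "role": "gpu", "state": "provisioning"},
    {"hostname": "gpu-node-002", "ip": "10.0.1.12", "role": "gpu", "state": "maintenance"},
    {"hostname": "cpu-node-001", "ip": "10.0.2.11", "role": "cpu", "state": "provisioning"},
]


def build_inventory(devices: List[Dict[str, str]],
                    state: str = "provisioning") -> Dict[str, Any]:
    """Group devices in the given lifecycle state by role, using Ansible's
    dynamic-inventory JSON layout (groups plus _meta.hostvars)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    hostvars: Dict[str, Dict[str, str]] = {}
    for device in devices:
        if device["state"] != state:
            continue  # skip machines that are not in the requested lifecycle state
        groups[device["role"]].append(device["hostname"])
        hostvars[device["hostname"]] = {"ansible_host": device["ip"]}
    inventory: Dict[str, Any] = {role: {"hosts": hosts} for role, hosts in groups.items()}
    # Supplying _meta.hostvars means Ansible does not need per-host --host calls.
    inventory["_meta"] = {"hostvars": hostvars}
    return inventory


if __name__ == "__main__":
    # Ansible invokes inventory scripts with --list and reads JSON from stdout.
    if "--list" in sys.argv:
        print(json.dumps(build_inventory(DEVICES), indent=2))
    else:
        print(json.dumps({}))
```

Saved as an executable file (for example a hypothetical `inventory.py`), it could be passed to `ansible-playbook -i inventory.py site.yml`, so each run targets whatever the inventory system currently reports instead of a hand-edited host list.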

Published in: Engineering

  1. Data Center Automation using Pipeline as Code (Gopi Vadlamudi, NGC DevOps Team)
  2. DevOps Movement (source: Varma T.)
  3. DevOps Evolution (source: DZone)
  4. DevOps Acceleration and Trends
      ● Focus shifting towards ‘DevOps Assembly Lines’: automating and connecting activities performed by several teams.
      ● Google Trends for the keyword ‘DevOps’.
  5. Massively Growing DevOps Needs: software-based automation across industries for security, networking, orchestration platforms, data centers, application deployments and provisioning (source: BMC Docs)
  6. DevOps in DC
  7. Why DevOps in Data Center Setup? As in other production units, such as a Dunkin’ Donuts bakery (source: Stephen J. Serio)
  8. On-Prem vs Cloud: an analogy to differentiate an on-prem setup from a cloud setup is baking a cake on your own vs buying it from a bakery.
  9. Major Differences: Cloud ERP vs On-Premise ERP
      ● Cost
        ○ Cloud: Predictable costs, a cheaper investment and no additional hardware in the short term.
        ○ On-premise: High initial CAPEX to set up bare metal, with associated risks. Capacity utilization is crucial.
      ● Security
        ○ Cloud: Data security is handled on the vendor side, with associated risks. Mature security infrastructure through IAM (Identity and Access Management).
        ○ On-premise: Security is self-driven, with more control, but integrating access with IAM/AD/SSO, key management, network segmentation and software/system life-cycle management is challenging.
      ● Customization
        ○ Cloud: Fewer customizations, offering greater stability but no control over the underlying hardware.
        ○ On-premise: Ability to customize and control the underlying hardware, container performance and kernel settings directly on bare metal, but managing them (network, nodes, boot process, virtual infrastructure) is challenging.
      ● Implementation
        ○ Cloud: Less time to implement a complete workflow.
        ○ On-premise: More control over the implementation process, but the workflow takes longer to implement.
      ● Resources
        ○ Cloud: Provisioning of compute, storage and networking resources is taken care of.
        ○ On-premise: Provisioning resources is challenging and requires a better understanding.
      ● Use case
        ○ Cloud: Small and midsize businesses seeking lower upfront costs, system stability and ease of access.
        ○ On-premise: Larger enterprises with higher budgets and a desire to customize system operations and existing infrastructure for bandwidth-heavy, performance-sensitive (AI/ML-centric) workloads, along with security control.
  10. Automation of the DC setup workflow helps:
      ● Set up the DC network in a reliable and reproducible manner.
      ● Manage the inventory life cycle through an automated flow.
      ● Automate OS provisioning on bare metal.
      ● Configure the DC components.
      ● Set up a highly available private cloud and container orchestration platform.
      ● Communicate effectively across cross-functional teams.
  11. Takeaway: End-to-end (E2E) automation is the key to better managing spending, security, resources, customizations and implementation in a stable, reliable, reproducible and idempotent manner when setting up DC infrastructure.
  12. Microservices Inspired Automation
  13. Microservices Inspired Automation
      ● Autonomous scripts, each meant for a specific configuration task.
      ● Independent automations communicate with each other using job relationships.
      ● One centralized orchestration job for the E2E setup.
      ● Each automation has a separate codebase and can be managed by its owners.
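Slide 13 describes independent automations linked by job relationships and driven by one central orchestration job. In Jenkins this would typically be upstream/downstream jobs; the sketch below only illustrates the idea in plain Python, assuming the relationships form a dependency graph that the orchestrator walks in topological order (using the standard-library `graphlib`, Python 3.9+). The job names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical jobs; each maps to the set of jobs it depends on ("job relationships").
JOB_DEPENDENCIES: Dict[str, Set[str]] = {
    "inventory-sync": set(),
    "network-setup": {"inventory-sync"},
    "os-provisioning": {"network-setup"},
    "node-configuration": {"os-provisioning"},
    "k8s-cluster-create": {"node-configuration"},
    "k8s-cluster-validate": {"k8s-cluster-create"},
}


def run_pipeline(dependencies: Dict[str, Set[str]],
                 run_job: Callable[[str], bool]) -> List[str]:
    """Central orchestration job: run each independent job after its dependencies.

    run_job returns True on success. The run stops at the first failure
    (a conservative policy), and the names of completed jobs are returned.
    """
    completed: List[str] = []
    for job in TopologicalSorter(dependencies).static_order():
        if not run_job(job):
            break
        completed.append(job)
    return completed
```

Because each job is only a name in the graph, its code can live in a separate repository owned by a different team, which matches the slide's point about separate codebases.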
  14. Overview
      ● DevOps evolution
      ● Why DC automation
      ● Microservices inspired automation
      ● E2E DC components integration
      ● Streamlining inventory life cycle
      ● Network & node provisioning
      ● Orchestration platforms setup
      ● Containerized microservices deployment
      ● Challenges and approaches
      ● Takeaways
      ● Future steps
  15. E2E DC Component Integration [Diagram components: Device Configuration; Monitoring; CI/CD Automation; Physical DC Infrastructure; Servers; DCIM; Storage; Kubernetes; Application; Node Provisioning, Security and Configuration]
  16. Streamlining Inventory Life Cycle
  17. Inventory Management Objectives
      ● Asset management by SiteOps.
      ● Manage device information.
      ● Track ownership states for devices.
      ● Track operational states for devices.
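Slide 17 asks for ownership and operational states to be tracked per device, and the abstract calls this a well-defined state workflow. The actual states used at NVIDIA are not given, so the sketch below invents a plausible set of operational states (`received`, `racked`, `provisioning`, and so on) and team names purely for illustration. The point it shows is that transitions are validated against an explicit table and recorded, so a device cannot silently skip a step.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical operational states and the transitions allowed between them.
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "received": {"racked"},
    "racked": {"provisioning"},
    "provisioning": {"in_service", "failed"},
    "failed": {"provisioning", "decommissioned"},
    "in_service": {"maintenance", "decommissioned"},
    "maintenance": {"in_service", "decommissioned"},
    "decommissioned": set(),
}


@dataclass
class Device:
    serial: str
    owner: str = "siteops"    # ownership state (hypothetical team name)
    state: str = "received"   # operational state
    history: List[Tuple[str, str]] = field(default_factory=list)

    def transition(self, new_state: str) -> None:
        """Move to new_state, rejecting transitions the workflow does not allow."""
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(f"{self.serial}: {self.state} -> {new_state} not allowed")
        self.history.append((self.state, new_state))
        self.state = new_state

    def hand_over(self, new_owner: str) -> None:
        """Record a change of ownership, e.g. from SiteOps to DevOps."""
        self.owner = new_owner
```

Keeping the transition table as data (rather than scattered `if` checks) makes it easy to version-control and review alongside the rest of the infrastructure code.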
  18. Network & Node Provisioning
  19. Network & Node Provisioning
      ● Automate OS installation.
      ● Apply organization policies:
        ○ Configure security.
        ○ Apply network configuration.
        ○ Apply application-specific node configuration.
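Applying organization policies repeatedly across thousands of nodes only works if each step is idempotent: running it twice leaves the node unchanged and reports no change. The helper below is a minimal sketch of that property for a simple `key value` configuration file, in the spirit of Ansible's `lineinfile` module; the `ensure_setting` function is hypothetical, and `PermitRootLogin` (an OpenSSH `sshd_config` option) is used only as an example security setting.

```python
from typing import List, Tuple


def ensure_setting(config_text: str, key: str, value: str) -> Tuple[str, bool]:
    """Ensure `key value` appears exactly once in a whitespace-separated config.

    Returns (new_text, changed). Applying it again to its own output yields
    changed=False, which is what makes the step idempotent. The output is
    normalised to end with a single trailing newline.
    """
    desired = f"{key} {value}"
    out: List[str] = []
    found = False
    for line in config_text.splitlines():
        if line.split(maxsplit=1)[:1] == [key]:
            if not found:
                out.append(desired)  # replace the first definition in place
                found = True
            continue  # drop any duplicate definitions of the same key
        out.append(line)
    if not found:
        out.append(desired)
    new_text = "\n".join(out) + "\n"
    return new_text, new_text != config_text
```

For example, `ensure_setting("Port 22\nPermitRootLogin yes\n", "PermitRootLogin", "no")` rewrites the second line and reports a change; feeding the result back in reports no change.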
  20. Node Provisioning & Configuration
  21. Orchestration Platforms Setup
  22. Container Orchestration Platform Objectives
      ● Set up a container orchestration platform such as Kubernetes.
      ● Persist cluster management logs.
      ● Version-control changes to the cluster configuration.
      ● Execute cluster management activities in Jenkins or any CI/CD tool.
      ● Enable CI for container orchestration.
  23. Container Orchestration Platform
      Orchestration platform: Kubernetes. Management tasks for the following activities can be scripted:
        1. k8s cluster creation
        2. k8s cluster reset
        3. k8s cluster service upgrade
        4. k8s cluster scaling up & down
        5. k8s cluster validation
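For the scaling task on slide 23, one way to script it (an assumption, not necessarily how the team did it) is to keep the desired node list in version control and have the job compute the difference against the nodes currently in the cluster. Joining and draining would then be driven by tools such as `kubeadm join` and `kubectl drain`; the planning step alone is sketched below. The `plan_scaling` function, `ScalePlan` type and the `protected` parameter (for example, control-plane nodes that must never be removed automatically) are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class ScalePlan(NamedTuple):
    to_join: List[str]    # desired nodes not yet in the cluster
    to_remove: List[str]  # cluster nodes that are no longer desired


def plan_scaling(desired: Iterable[str], current: Iterable[str],
                 protected: Iterable[str] = ()) -> ScalePlan:
    """Diff desired vs current node names; never schedule protected nodes for removal."""
    desired_set, current_set = set(desired), set(current)
    protected_set = set(protected)
    return ScalePlan(
        to_join=sorted(desired_set - current_set),
        to_remove=sorted(current_set - desired_set - protected_set),
    )
```

When the cluster already matches the desired list, both lists are empty, so re-running the scaling job is a no-op.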
  24. Automation Workflow
  25. K8s Cluster Setup Validation
      Automate k8s setup & cluster validation for:
      ● identifying setup issues much earlier in the cycle;
      ● setting up the platform for reliable application deployment.
      Example test cases:
      ● All k8s masters have "Ready" status.
      ● All k8s nodes have "Ready" status.
      ● Component status returns healthy for all components.
      ● All pods in the kube-system namespace are running and healthy.
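Several of the test cases on slide 25 can be expressed as checks over the JSON that `kubectl get ... -o json` returns: a node is ready when its `Ready` condition has status `"True"`, and a pod is considered healthy here when its phase is `Running` and every container reports `ready`. The sketch below is one possible shape for such checks, not the team's test suite; the function names are hypothetical, the component-status check is omitted, and pods that legitimately finish (phase `Succeeded`) would be flagged by this simplified rule.

```python
import json
import subprocess
from typing import Any, Dict, List


def not_ready_nodes(nodes: Dict[str, Any]) -> List[str]:
    """Names of nodes (masters included) whose Ready condition is not "True"."""
    bad = []
    for node in nodes.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(c.get("type") == "Ready" and c.get("status") == "True"
                    for c in conditions)
        if not ready:
            bad.append(node["metadata"]["name"])
    return bad


def unhealthy_pods(pods: Dict[str, Any]) -> List[str]:
    """Names of pods that are not Running with all containers ready."""
    bad = []
    for pod in pods.get("items", []):
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        healthy = (status.get("phase") == "Running"
                   and bool(containers)
                   and all(c.get("ready") for c in containers))
        if not healthy:
            bad.append(pod["metadata"]["name"])
    return bad


def kubectl_json(*args: str) -> Dict[str, Any]:
    """Run kubectl with `-o json` and parse the output (requires kubectl on PATH)."""
    result = subprocess.run(["kubectl", *args, "-o", "json"],
                            check=True, capture_output=True, text=True)
    return json.loads(result.stdout)
```

A validation job could then fail the build when `not_ready_nodes(kubectl_json("get", "nodes"))` or `unhealthy_pods(kubectl_json("get", "pods", "-n", "kube-system"))` returns a non-empty list.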
  26. Containerized Microservices Deployment
  27. Microservice Deployment
      ● Deploy containerized applications in Kubernetes.
      ● Track every deployment.
      ● Group logically related Kubernetes resources.
      ● Environment-specific configuration.
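Environment-specific configuration on slide 27 is commonly handled by layering a per-environment values file over a shared base; with Helm, when several files are passed with `-f`, later files take precedence. The function below is a hypothetical sketch of that layering as a recursive dictionary merge (nested maps are merged, any other value is replaced); it is similar in spirit to, not a reimplementation of, Helm's own values handling.

```python
import copy
from typing import Any, Dict


def merge_values(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge override into a copy of base; override wins on conflicts.

    Nested dictionaries are merged key by key; lists and scalars are replaced.
    Neither input is modified.
    """
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_values(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result
```

For instance, a hypothetical base of `{"replicaCount": 1, "image": {"repository": "example/app", "tag": "1.0"}}` merged with a production override of `{"replicaCount": 3, "image": {"tag": "1.1"}}` keeps the repository and takes the production replica count and tag.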
  28. Helm
      ● Charts
        ○ Allow you to install and upgrade k8s resources.
        ○ Can be versioned.
        ○ Can declare dependencies on other charts.
      ● Tiller: the server-side component of Helm.
        ○ It is deployed in Kubernetes as a pod.
        ○ It keeps track of the revisions of a deployed release.
        ○ It allows rollback based on revision number.
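Slide 28 reflects Helm 2, where Tiller held release history; Helm 3 removed Tiller and stores release records in the cluster itself (as Secrets by default), but revision-based rollback works the same way. One detail worth knowing is that a rollback does not rewind the revision counter: it redeploys an earlier revision as a new revision. The toy model below illustrates only that bookkeeping; the `Release` class is hypothetical and simplifies each revision to a chart version string, whereas a real revision also records the rendered values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Release:
    name: str
    # Simplification: one chart version string per revision (revision n is index n-1).
    revisions: List[str] = field(default_factory=list)

    def upgrade(self, chart_version: str) -> int:
        """Deploy a chart version and return the new revision number (starting at 1)."""
        self.revisions.append(chart_version)
        return len(self.revisions)

    def rollback(self, revision: int) -> int:
        """Redeploy an earlier revision's chart version as a brand-new revision."""
        if not 1 <= revision <= len(self.revisions):
            raise ValueError(f"{self.name}: unknown revision {revision}")
        return self.upgrade(self.revisions[revision - 1])

    @property
    def current(self) -> str:
        return self.revisions[-1]
```

Because every rollback is itself recorded, the history stays append-only, which fits the slide's goal of tracking every deployment.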
  29. Microservice Deployment Workflow
  30. Challenges and Approaches
  31. Challenges and Approaches
      ● Open source tools may not support every use case; be ready to extend them.
      ● While setting up Kubernetes at scale, we may encounter failures. We had to tune the configuration to suit our requirements and fix a few bugs too.
      ● Kubernetes, Helm, security, and network switch and router configuration all have a learning curve.
      ● Ansible is stateless; you need an additional dashboard to store the cluster state.
      ● Versioning infrastructure, unlike applications, is a very complicated process.
      ● Not all systems support webhooks; you may have to poll for status in some integrations.
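For the integrations on slide 31 that offer no webhooks, a pipeline step has to poll until the remote system reports the state it is waiting for. A minimal, generic sketch of such a poll loop with an overall deadline follows; the `wait_until` name and its defaults are hypothetical, and `check` would wrap whatever status call the integration actually provides.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool],
               timeout_s: float = 600.0,
               interval_s: float = 10.0) -> bool:
    """Poll check() every interval_s seconds until it returns True.

    Returns True as soon as check() succeeds, or False once timeout_s has
    elapsed (measured with a monotonic clock, so wall-clock changes do not
    affect the deadline).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Returning a boolean instead of raising lets the calling job decide whether a timeout should fail the build or trigger a retry.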
  32. Takeaways
      ● Automation can be used to create idempotent, reproducible infrastructure. Automate now; don’t delay.
      ● Version the infrastructure code. Follow a pull-request & review process for each change.
      ● Maintain the inventory life cycle in a well-defined state workflow.
      ● Use stateful configuration management (CM) that can record the state of your infrastructure, if possible.
      ● Treat security as a primary customer.
  33. Future Steps
      Intelligent DevOps systems should be capable of providing:
        ○ Auto-healing
        ○ Automatic error detection
        ○ Automatic rollback
        ○ Unsupervised ML that automatically tests and verifies deployments
        ○ Analysis of data from various tools
        ○ Automated testing tools that look for anomalies and failures
  34. Thanks: SVCC, NVIDIA NGC Group, DevOps Team