Hitless Controller Upgrades

In these slides you will be able to learn about:
1. Traditional Network Upgrades
2. Controller Upgrade CI/CD Toolsets
3. Data and Control Layer Separation
4. Challenges with OpenFlow Hitless Upgrade
5. Controller APP Change
6. Controller Infrastructure
7. No pipeline change
8. Node Upgrades
9. Controller & Application Upgrades
10. Multi Site Cluster/Controller groups

Published in: Engineering

Hitless Controller Upgrades

  1. © 2018 LUMINA NETWORKS, INC. SDN Meetup: Hitless Controller Upgrade
  2. Introduction: Towards a Hitless Upgrade
     • Traditional network upgrades
       – Closed systems
         • HW and control bundled (from a single vendor)
         • A HW upgrade sometimes requires a control plane refresh: the line card needs a new OS and/or an RE upgrade
       – Large events
         • Sometimes months of planning
         • Failure is handled by rollback
       – The end game is lots of small automated upgrades
  3. Brutal Automation Is the Only Way
     It's easy to regress back to inefficient practices.
     • Arash Ashouriha, Deutsche Telekom AG (NYSE: DT)'s deputy chief technology officer, said the only way that his company could now succeed was through a process of "brutal automation."
       – THE HAGUE, SDN NFV World Congress 2017
  4. Controller Upgrade CI/CD Toolsets
     Software practices and toolsets that need to be employed:
     • Upgrades MUST be automated
     • Automated dev test framework
       – NO shortcuts!
     • Pre-validation checks
     • Engineer hands-off upgrade process
     • Post-validation checks
     • Automated rollback
     • Post-rollback validations
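The checklist on this slide can be read as a small state machine. The following is a minimal sketch of that flow, not the presenters' tooling: `run_upgrade`, the `Step` type, and the outcome strings are all hypothetical names chosen for illustration, and each step is assumed to be a callable that returns True on success.

```python
from typing import Callable, List

# Hypothetical convention: every step is a callable returning True on success.
Step = Callable[[], bool]


def run_upgrade(pre_checks: List[Step], upgrade: Step, post_checks: List[Step],
                rollback: Step, post_rollback_checks: List[Step]) -> str:
    """Run a hands-off upgrade and return the final outcome as a string."""
    # Pre-validation: do not touch anything if the starting state is bad.
    if not all(check() for check in pre_checks):
        return "aborted"
    # Upgrade, then post-validation. Either one failing triggers rollback.
    if upgrade() and all(check() for check in post_checks):
        return "upgraded"
    # Automated rollback, followed by its own validation pass.
    if rollback() and all(check() for check in post_rollback_checks):
        return "rolled_back"
    # The rollback could not be validated, so a human has to step in.
    return "failed"
```

Keeping every step behind the same callable signature is one way to let the same driver run in the lab and in production with different check lists.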
  5. Data and Control Layer Separation
     • Data plane
       – Rule driven
         • OpenFlow rules
         • Configured by the application on the controller
       – Isolated from the control plane
       – Benefits from no control traffic between nodes
       – Decisions are made by the application
       – Any "white box" with an OF interface
       – Flows and groups are static until reprogrammed
  6. Data and Control Layer Separation
     • Control plane
       – Application/"Flow Manager"
         • Controller acts as a message bus
         • Application calculates flows/groups
       – Receives LLDP from nodes
         • Topology built
       – Shares/distributes network state to all controllers
       – Drives the potential for a "hitless" upgrade
       – Has its challenges…
  7. Challenges with OpenFlow Hitless Upgrade
     Can it be hitless? Types of changes we need to understand:
     • Controller APP change
       – Path computation change that requires an algorithm change
       – Service change (a new way of using abstracted resources)
     • Controller change
       – Project updates: OpenFlow plugin, stats manager, topology manager, etc.
       – Plugin updates: OpenFlow 1.3 -> 1.4
       – MD-SAL/model changes: YANG model changes
     • Dataplane pipeline
       – No pipeline change >>>> HITLESS ☺
         • Flows, groups, and tables stay the same
       – Pipeline change
         • Existing flows, groups, and tables do not support the new pipeline
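The pipeline distinction on this slide lends itself to a mechanical check. Below is a minimal sketch, assuming a pipeline can be summarized as a mapping of table IDs to the function each table performs plus the set of group types in use. The `Pipeline` class, its fields, and the table names in the usage note are hypothetical and do not come from any controller API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Pipeline:
    """Hypothetical summary of a dataplane pipeline (not a controller API)."""
    tables: Dict[int, str] = field(default_factory=dict)   # table id -> role
    group_types: Set[str] = field(default_factory=set)     # e.g. "all", "select"


def pipeline_differences(old: Pipeline, new: Pipeline) -> List[str]:
    """List every table or group-type difference between two pipelines."""
    diffs: List[str] = []
    for table_id in sorted(set(old.tables) | set(new.tables)):
        before = old.tables.get(table_id)
        after = new.tables.get(table_id)
        if before != after:
            diffs.append(f"table {table_id}: {before} -> {after}")
    # Symmetric difference: group types present in only one of the pipelines.
    for group_type in sorted(old.group_types ^ new.group_types):
        diffs.append(f"group type added or removed: {group_type}")
    return diffs


def is_hitless_candidate(old: Pipeline, new: Pipeline) -> bool:
    """An unchanged pipeline is the case the slide marks as hitless."""
    return not pipeline_differences(old, new)
```

For example, comparing `Pipeline({0: "classify", 1: "forward"}, {"all"})` against a release that inserts a new table 1 and adds `"select"` groups reports both differences, which places that upgrade in the pipeline change category.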
  8. Controller APP Change
     • Can you overlay a PCE change?
     • New LSP mesh / SR topology (node SIDs)
     • Even if you could handle a new label base, you need to handle:
       – Match duplication (on ingress)
         • How would you handle this?
       – Action duplication (on egress)
     • Resource limits
       – Group limits: the stats manager has to handle lots of groups, and clustering then replicates that data
       – Flow limits
  9. Controller Infrastructure
     • Plugin changes
       – Experimenter (mechanism for proprietary messages within the protocol)
       – Version bump
     • Controller project changes
       – Is hitless upgrade considered part of the project?
       – Namespace
       – Functionality
  10. No Pipeline Change
      No PCE change or pipeline change (easiest scenario). But we still have to be aware of:
      • Group limits
      • Flow limits
      • Stats manager
        – Reconciling flows
        – General load (lots of data)
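Even in the easiest scenario, group and flow limits can be checked before the upgrade starts. This is a minimal sketch under stated assumptions: the `limit_warnings` function, the resource names, and the 80% headroom default are illustrative choices, and real limits depend on the switch hardware and controller in use.

```python
from typing import Dict, List


def limit_warnings(planned: Dict[str, int], limits: Dict[str, int],
                   headroom: float = 0.8) -> List[str]:
    """Warn when a planned count exceeds a fraction (headroom) of its limit."""
    warnings: List[str] = []
    for resource in sorted(limits):
        count = planned.get(resource, 0)
        budget = limits[resource] * headroom
        if count > budget:
            warnings.append(
                f"{resource}: {count} planned exceeds "
                f"{headroom:.0%} of limit {limits[resource]}"
            )
    return warnings
```

Leaving headroom below the hard limit is a design choice here: it leaves room for the extra flows that exist while old and new state overlap during reconciliation.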
  11. Pipeline Change
      • Flow and/or group type changes
        – Flow actions you may need to change
          • Ingress flow now has a new action?
        – Group tables you may need to change
          • Change from All to a hierarchy
        – New tables
          • Table reassignment
          • Flow and group tables perform different functions
          • Packet match lookups/forwarding
  12. Node Upgrades
      • Switch OS upgrade
        – Remove from service
          • Reroute any transit services
          • Got ingress or egress services?
            – They are dual-homed, right? If they aren’t, well..
        – Upgrade
        – Check
        – Place back into service
  13. Controller & Application Upgrades
      • Option A
        – Single cluster
        – Disconnect switches: the data plane continues, and flow/group state is persistent
        – Perform upgrade
        – Re-deploy
        – Reconnect switches
        – Reliably manage the outage window
        – Not completely hitless
  14. Multi Site Cluster/Controller Groups
      Not so easy
      • Option B
        – Idea of having a fallback cluster
        – Increased redundancy, increased cost
        – Point switches to this cluster: if the datastores are shared across both clusters, you can upgrade one cluster at a time
        – Will this be hitless?
          • The key lies in what is actually being upgraded
        – However: hitless rollback if required
        – Saves production state in case of emergency
  15. How We Do It
      Not so easy
      • Avoiding initial data plane impact
        – Prepare
          • Stop the running controller process
          • Disconnect controllers from switches
          • Environment tools: orchestration/monitoring systems
        – Checks
          • Switch connections
          • Controller status
          • Data plane
        – Upgrade
  16. Automation Tools
      • Software provisioning/IT automation
      • Completely hands off: process-driven upgrade
      • Operationally ready process: tested and proven
      • Powerful automation tool: the Ansible project
      • Concept of roles/playbooks and inventories
        – Pre-check
          • Ability to check for existing packages/files/information
          • Make decisions based on OS
          • Run native/non-native commands directly on servers
        – Upgrade
          • Copy, move, and edit files
          • Extract and install packages
          • Native Linux functionality built into native Ansible commands
        – Post-check
          • Validation
          • File cksum checks
          • Application config
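The post-check step above mentions file checksum validation. The slides do this with Ansible; purely to illustrate the idea, here is a standalone sketch that swaps in SHA-256 from the standard library in place of `cksum`. The `verify_files` function and the idea of an expected-digest manifest are assumptions for this example.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large packages do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_files(expected: Dict[str, str]) -> List[str]:
    """Return the paths that are missing or whose digest differs.

    `expected` is a hypothetical manifest: file path -> expected SHA-256 hex.
    """
    failures: List[str] = []
    for name, want in expected.items():
        path = Path(name)
        if not path.is_file() or sha256_of(path) != want:
            failures.append(name)
    return failures
```

An empty return value means every file in the manifest is present and unmodified, which makes the result easy to use as a pass/fail gate in a post-check.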
  17. In-house DevOps Tools
      • Compare and validate the datastore against the switches
      • Used to understand the current state of the network:
        – Nodes?
          • LLDP received?
        – Links?
          • Is the topology built internally?
          • Is the appropriate topology datastore populated correctly?
        – Flows?
          • Comparison of operational/config datastore
          • Are flows reported on switches and in operational?
          • Verify correct flow and group calculation
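The flow checks listed on this slide reduce to set comparisons between three views of the network. The sketch below assumes each view has already been collected as a set of flow identifiers; how they are collected (datastore queries, switch dumps) is outside the sketch, and `compare_flows`, `network_in_sync`, and the report keys are hypothetical names.

```python
from typing import Dict, Set


def compare_flows(config: Set[str], operational: Set[str],
                  on_switch: Set[str]) -> Dict[str, Set[str]]:
    """Compare intended flows (config) with the operational and switch views."""
    return {
        # Configured, but never reported back in the operational datastore.
        "not_in_operational": config - operational,
        # Configured, but not actually programmed on the switch.
        "not_on_switch": config - on_switch,
        # Present on the switch although nothing in config asks for them.
        "stale_on_switch": on_switch - config,
    }


def network_in_sync(report: Dict[str, Set[str]]) -> bool:
    """True when every discrepancy set in the report is empty."""
    return not any(report.values())
```

Running the same comparison before and after an upgrade gives a simple way to confirm that the data plane state was left untouched.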
  18. Challenges
      • Lab and production environment differences
        – Users/permissions
        – Directory structure
        – Addressing schemes
        – Resource limitations
        – Hard to get an "identical" production environment
      • Inventory management
        – Variables, secrets, package versioning
      • Process needs to be "bulletproof"
        – Tested, refined, feedback, etc.
        – CI/CD
        – Accounting for differences between lab and production can be tricky
      • Product changes/customer tool changes
        – Changes in orchestration applications
        – Application namespace changes and functionality changes
        – Regression testing needs to be thorough and capture corner cases
        – Appropriate testing framework
  19. Way Around the Challenges
      • Automation, automation, automation
        – Know the environment/product well enough to automate the entire process
        – Automated testing framework: thorough use case and functionality testing
        – No changes implemented that aren’t tested
      • No engineering "hands on" during the upgrade
        – The goal is that anyone can run the upgrade
      • Knowledge
        – Knowledge is in the process
        – Knowledge is in the automation and toolset / CI/CD
        – Efficiency and effectiveness: not reliant on individuals or their knowledge in a constantly changing industry
  20. Thank you!
