
When DevOps and Networking Intersect by Brent Salisbury of socketplane.io

Published in: Technology

  1. 1. when network and devops intersect Brent Salisbury socketplane.io
  2. 2. socketplane.io - docker networking • John Willis, Co-Founder & VP Business Development (formerly CTO, Stateless Networks) • Madhu Venugopal, Co-Founder & President (formerly Principal Engineer, Office of the CTO, Red Hat) • Brent Salisbury, Co-Founder & VP Engineering (formerly Senior Engineer, Office of the CTO, Red Hat) • Dave Tucker, Co-Founder, VP Product (formerly Senior Engineer, Office of the CTO, Red Hat)
  3. 3. lessons_learned struct 1. the evolving network 2. lessons learned from controller development 3. netops from an operational+dev view 4. looking ahead
  4. 4. the problem
  5. 5. [Figure: cost vs. number of widgets (economies of scale). Labels: Network, Compute - Storage, Vertical Integration, Horizontal Scale]
  6. 6. [Figure: network capacity needs vs. network usage growth over time, with over-provisioned and under-provisioned regions]
  7. 7. [Figure: efficient provisioning. Network capacity needs vs. network usage growth over time]
  8. 8. Where we were • CLI for everything • vendor management tools did everything and nothing • used to be Perl, TCL and later Python • zero IP management • turned into a contest of who can make the best obscure magic
  9. 9. Where we are • CLI for everything • vendor management tools did everything and nothing • used to be Perl, TCL and later Python • zero IP management • turned into a contest of who can make the best obscure magic
  10. 10. where we are(ish) • exponential growth with flat operating budgets • incessant pressure for uptime + capex/opex cost reduction • the majority of networks still maintain proprietary hw, sw and api • datapaths are still barely programmable • netops manages very little beyond the ToR
  11. 11. quick review of node distribution • distributed • centralized • de-centralized
  12. 12. Centralized
  13. 13. Centralized the sdn approach Forwarding Population Controller Match + Action
  14. 14. Decentralized
  15. 15. Decentralized the sdn approach Forwarding Population + Clustered Controller Orchestration Topology Match + Action
  16. 16. similarly both hard problems [Figure: a chassis (Routing Engine plus Line Card 1, Line Card 2, Line Card ... on an Ethernet bus) shown next to a fabric (Controller plus OVS, OF Switch, Random Agent). Every element carries the same flow table: MAC source address, MAC destination, IP source address, IP destination, source port, destination port, ingress port, priority, protocol (all wildcarded, *), with instructions GOTO/Drop/Controller/Normal, above a data plane with ports P1, P2, P...]
  17. 17. Distributed
  18. 18. the internets scales Distributed
  19. 19. the barrier to scale [Figure: L2 flooding and learning between Host 1 and Host 2, each behind a flooding data plane on VLAN x] • live workload migration cripples network ops • subnets for policy groupings are the only reason to think in those terms anymore
  20. 20. shit that doesn't scale • the next few slides are things i thought were possible at some point around the problem of L2 • lesson learned: prototype and fail faster • ask your team why they really need L2
  21. 21. Proactive L2 Flooding and Learning with Legacy VLANs • Proactive Rule - Match: ARP, Action: Normal • Maintaining Legacy Broadcast Domains • Controller Never Punts ARP • Can Also Serve as a Fallback Failure Mode or Hybrid Migration Strategy [Figure: Host 1 and Host 2 on VLAN x, each behind a flooding data plane, with an OpenFlow controller]
  22. 22. Reactive OpenFlow Flow Policy • 1st Packet in Flow: Packet-In to the OpenFlow Controller • A Flowmod Installs a Flow Rule for Subsequent Matching Packets [Figure: OpenFlow switch data plane with ports P1, P2, P3 facing Svr 1, Svr 2, Svr 3; flow table fields MAC source address, MAC destination, IP source address, IP destination, source port, destination port, ingress port, priority, protocol (all wildcarded, *), with instructions GOTO/Drop/Controller/Normal]
  23. 23. Controller Intercepting ARP and Proxying the Reply • ARP Request and Reply • Match: ARP, Action: Controller (Switch 1 and Switch 2) • Host (Key) ==> Location (Value): Host 2 IP, MAC, Tenant ==> Tunnel 200, TEP IP • Controller Can Answer and/or Send ARP (proxy) • VLAN ID Constraints Become Irrelevant • Tenancy Maintained in the Controller [Figure: Host 1 and Host 2 behind the data planes of Switch 1 and Switch 2, with an OpenFlow controller]
  24. 24. Controller Connects Source and Destination Hosts via Packet-In and Flowmods • ARP Request • Match: ARP, Action: Controller (Switch 1 and Switch 2) • Host (Key) ==> Location (Value): Host 2 IP, MAC, Tenant ==> Tunnel 200, TEP IP • Flowmods Build the Data Path (Tunnel or Flow Path) • VLAN ID Constraints Become Irrelevant • Tenancy Maintained in the Controller [Figure: Host 1 and Host 2 behind the data planes of Switch 1 and Switch 2, with an OpenFlow controller]
  25. 25. not if but when • build infrastructure for the worst case scenario, because it will be worse • cascading failures suck • focus on solving the problem, not the implementation • intelligence in the datapath HW is a good thing as long as it is, ideally, open and programmatically manageable [Figure: control and data plane split brain. Control plane and data planes with ports P1, P2, P3; one DPID shown as ::00:01, the others unknown (?)]
  26. 26. this movie has a shitty ending [Figure labels: Frame In, Bridge (Linux Bridging), IPTables, HAProxy, Functions X, Y, Z, Frame Egress]
  27. 27. What Works: Performance and Reliability First [Figure: OVS/DPDK packet forwarding pipeline. Frame In, Table 0 (Classifier), Table 2, ..., Table n, Frame Out; stages: Function Foo, Function Bar]
  28. 28. traffic alignment from the 90’s [Figure: data center today. Data center L3 cores, physical switches and vSwitches, with a firewall applying north/south security policy]
  29. 29. new architectures for new workloads [Figure: distributed policy application for the data center. Data center L3 cores, physical switches and vSwitches, with east/west security policy]
  30. 30. trust what you know • rely on your own operational experience; if you don't have any, go get some, even if it's stalking customers • don't fall in love with implementations, they are probably wrong • ask questions but be open minded • avoid slide jockeys • avoid the vendor wars • avoid cults • complexity w/o abstraction fails • almost all abstractions fail
  31. 31. serenity now, insanity later • make time for research and planning! • whether it is a big infra project or a dev sprint, don't let the oppressive demand of execution compromise a practical design • that said, if the plan sucks, change it.
  32. 32. nothing is easy, don't make it harder • prototyping and early feedback should be your compass • when users say "this seems a little too complex", LISTEN! • odds are you aren't going to be able to get the right abstraction to hide your over-engineering
  33. 33. performance and reliability first • network operators are measured in uptime first • don't compromise reliability for cost savings without making it very clear to all leadership, not just the IT manager heroes. • perform consistency checking
  34. 34. /dev • understand the problem first • if you don't understand the problem, stalk someone who does • make readable code • code for the worst case scenario
  35. 35. architecture • if it isn't broke, don't break it • architects need understandable components • architects need predictable components • predictive analysis is a big data problem • predict problems with operational tools and data • don't build a nuclear submarine when a bicycle will do
  36. 36. test and prototype • verify before you hit enter • automate all production changes • set up rollback processes • the result should be: shorter change windows, faster rollbacks, better trained operators
  37. 37. everybody is smart • "A great team doesn’t mean that they had the smartest people. What made those teams great is that everyone trusted one another. It can be a powerful thing when that magic dynamic exists." -Gene Kim
  38. 38. team culture • it's not about proving how much smarter you are than your co-workers • give credit to the team first, it's just weird otherwise • don't hoard contacts • find people's passion and maximize it • protect your culture's morale like it is your bank account
  39. 39. where to start? • starting out: • no one can learn for you, find your passion • learn linux • explore vswitches, I recommend http://openvswitch.org • connect with peers in the community and share experiences • explore compute (containers, hypervisors and everything else beyond the top of rack) • further along: • code, i recommend Golang atm fwiw • learn CI tools and sw dev processes • contribute to upstream open source • build something that solves others' problems and open source it
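
Slide 22 describes reactive flow setup: the first packet of a flow misses in the switch, is punted to the controller as a packet-in, and the controller's flowmod installs a rule so later packets stay in the data plane. A minimal Go sketch of that loop follows. It is not an OpenFlow implementation; `FlowKey`, `Action`, `Controller`, `Switch` and the static MAC-to-port map are hypothetical names invented for illustration, and a real match covers many more fields than two MAC addresses.

```go
package main

import "fmt"

// FlowKey is a deliberately tiny match (hypothetical); real OpenFlow
// matches cover many more header fields.
type FlowKey struct {
	SrcMAC, DstMAC string
}

// Action is what the switch does with a matching packet.
type Action struct {
	OutPort int
}

// Controller decides where a flow should go. The static MAC-to-port
// map is an assumption standing in for real topology knowledge.
type Controller struct {
	macToPort map[string]int
	PacketIns int // number of packets punted to the controller
}

// PacketIn handles a table miss and returns the rule to install.
func (c *Controller) PacketIn(k FlowKey) (Action, bool) {
	c.PacketIns++
	port, ok := c.macToPort[k.DstMAC]
	return Action{OutPort: port}, ok
}

// Switch holds the flow table that the controller populates.
type Switch struct {
	table map[FlowKey]Action
	ctrl  *Controller
}

// Forward returns the output port for a packet, or false if it is dropped.
func (s *Switch) Forward(k FlowKey) (int, bool) {
	if a, hit := s.table[k]; hit {
		return a.OutPort, true // fast path: rule already installed
	}
	a, ok := s.ctrl.PacketIn(k) // slow path: first packet in the flow
	if !ok {
		return 0, false // controller knows no destination: drop
	}
	s.table[k] = a // "flowmod": later packets match in the data plane
	return a.OutPort, true
}

// newDemo wires one switch to one controller with two known hosts.
func newDemo() (*Switch, *Controller) {
	c := &Controller{macToPort: map[string]int{"aa:aa": 1, "bb:bb": 2}}
	return &Switch{table: map[FlowKey]Action{}, ctrl: c}, c
}

func main() {
	sw, c := newDemo()
	k := FlowKey{SrcMAC: "aa:aa", DstMAC: "bb:bb"}
	for i := 0; i < 3; i++ {
		port, ok := sw.Forward(k)
		fmt.Println("port:", port, "forwarded:", ok, "packet-ins so far:", c.PacketIns)
	}
}
```

Only the first of the three packets reaches the controller. That is also the weakness the deck points at: every new flow pays a controller round trip, so the controller sits in the failure and scaling path.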

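Slides 23 and 24 show the controller keeping a host-to-location table (host IP, MAC and tenant mapped to a tunnel ID and tunnel endpoint IP) and answering ARP itself instead of letting it flood. The sketch below models only that lookup, under loud assumptions: `HostKey`, `Location`, `ARPProxy` and the tenant names and addresses are hypothetical, and no real ARP packets are parsed or sent.

```go
package main

import "fmt"

// HostKey identifies a host inside a tenant, so the same IP can exist
// in two tenants without colliding (tenancy lives in the controller).
type HostKey struct {
	Tenant string
	IP     string
}

// Location is where the host sits in the overlay.
type Location struct {
	MAC      string
	TunnelID int
	TEP      string // tunnel endpoint IP
}

// ARPProxy is the controller-side host table (hypothetical type).
type ARPProxy struct {
	hosts map[HostKey]Location
}

// NewARPProxy returns an empty table.
func NewARPProxy() *ARPProxy {
	return &ARPProxy{hosts: map[HostKey]Location{}}
}

// Learn records or updates a host's location.
func (p *ARPProxy) Learn(k HostKey, loc Location) {
	p.hosts[k] = loc
}

// Resolve looks up the target of an ARP request within one tenant.
// The bool is false when the controller has no entry; the caller then
// decides what to do with the request.
func (p *ARPProxy) Resolve(tenant, targetIP string) (Location, bool) {
	loc, ok := p.hosts[HostKey{Tenant: tenant, IP: targetIP}]
	return loc, ok
}

func main() {
	p := NewARPProxy()
	p.Learn(
		HostKey{Tenant: "blue", IP: "10.0.0.2"},
		Location{MAC: "02:00:00:00:00:02", TunnelID: 200, TEP: "192.0.2.20"},
	)

	if loc, ok := p.Resolve("blue", "10.0.0.2"); ok {
		fmt.Printf("proxy reply: 10.0.0.2 is-at %s (tunnel %d, TEP %s)\n",
			loc.MAC, loc.TunnelID, loc.TEP)
	}
	if _, ok := p.Resolve("red", "10.0.0.2"); !ok {
		fmt.Println("tenant red has no such host: no proxy reply")
	}
}
```

Keying on tenant plus IP rather than on a VLAN is what makes the VLAN ID constraint irrelevant in the slides' wording: isolation comes from the table, not from the broadcast domain.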
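
Slide 36 asks for verification before a change goes live, automated production changes, and a rollback process. One way to sketch that in Go is to apply every change to a copy of the current state and keep the copy only if verification passes, so rollback is simply "keep the old state". Everything here is hypothetical: `Config` is a toy interface-to-VLAN map, and `Change`, `Run`, `setVLAN` and `validVLANs` are illustrative names, not any tool's API.

```go
package main

import "fmt"

// Config is a toy stand-in for device state: interface name -> VLAN ID.
type Config map[string]int

// clone copies a Config so a change never touches the running state.
func clone(c Config) Config {
	out := make(Config, len(c))
	for k, v := range c {
		out[k] = v
	}
	return out
}

// Change bundles a mutation with the check that must pass afterwards.
type Change struct {
	Name   string
	Apply  func(Config) error
	Verify func(Config) error
}

// Run applies the change to a candidate copy. On any failure the
// original config is returned untouched, which is the rollback.
func Run(current Config, ch Change) (Config, error) {
	candidate := clone(current)
	if err := ch.Apply(candidate); err != nil {
		return current, fmt.Errorf("%s: apply failed, keeping old config: %w", ch.Name, err)
	}
	if err := ch.Verify(candidate); err != nil {
		return current, fmt.Errorf("%s: verify failed, keeping old config: %w", ch.Name, err)
	}
	return candidate, nil
}

// validVLANs rejects any VLAN ID outside the usable 1-4094 range.
func validVLANs(c Config) error {
	for port, vlan := range c {
		if vlan < 1 || vlan > 4094 {
			return fmt.Errorf("%s has invalid VLAN %d", port, vlan)
		}
	}
	return nil
}

// setVLAN builds a Change that moves one existing interface to a VLAN.
func setVLAN(port string, vlan int) Change {
	return Change{
		Name: fmt.Sprintf("set %s to VLAN %d", port, vlan),
		Apply: func(c Config) error {
			if _, ok := c[port]; !ok {
				return fmt.Errorf("unknown interface %s", port)
			}
			c[port] = vlan
			return nil
		},
		Verify: validVLANs,
	}
}

func main() {
	running := Config{"eth0": 100, "eth1": 100}

	running, err := Run(running, setVLAN("eth1", 200))
	fmt.Println(running["eth1"], err)

	running, err = Run(running, setVLAN("eth1", 5000))
	fmt.Println(running["eth1"], err)
}
```

The second change fails verification and `running` still holds VLAN 200 for `eth1`. Because the check runs in code rather than in an operator's head, the same path serves both "verify before you hit enter" and the rollback process.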