Routed Fabrics For Ceph

Chris Ellis: Routed Fabrics For Ceph

Published in: Technology


  1. Routed Fabrics For Ceph
     Fast & Effective Networking For Ceph
     Chris Ellis - @intrbiz
     Ceph Day London 2019
     http://intrbiz.com
     chris@intrbiz.com
  2. Hello!
     ● I’m Chris
       ○ IT jack of all trades
     ● Mostly a PostgreSQL Consultant
       ○ Full stack:
         ■ from electronic design to web dev
     ● Very much into Open Source
       ○ Started a monitoring system project a few years ago
       ○ Big openSUSE and PostgreSQL fan
     ● Been using and playing with Ceph for a couple of years
       ○ Built a small VM farm with Ceph for shared storage
  3. Routed Fabrics, Huh?
  4. Routed Fabrics, Huh?
     ● Essentially we make servers participate in routing
       ○ Every network link the server has is utilised active/active
       ○ Every server takes part in the routing protocol
       ○ The routing protocol deals with device and link failures
         ■ Data just takes another path in the event of a fault
     ● Equal Cost Multi Path (ECMP) is used to efficiently move traffic
       ○ IP packets are routed over all available links
       ○ TCP streams don’t get split across more than one path
         ■ A single stream is still limited to the bandwidth of one link
       ○ E.g. with 4x 10GbE NICs we can push 40Gb/s of traffic in aggregate
         ■ An individual TCP stream maxes out at 10Gb/s
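
To illustrate why a single TCP stream stays on one link while many streams spread over all of them, here is a minimal Python sketch of per-flow ECMP selection. It is not how any particular kernel or switch implements it: the function name `pick_next_hop`, the use of SHA-256, and the example addresses and ports are all assumptions made for illustration.

```python
import hashlib

def pick_next_hop(src_ip, dst_ip, src_port, dst_port, proto, next_hops):
    """Pick one next hop for a flow by hashing its 5-tuple (illustrative only)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    # Real routers use their own hash functions; any stable hash shows the idea.
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

# Hypothetical set of equal-cost links on one server.
links = ["eth4", "eth5", "eth6", "eth7"]

# Every packet of one TCP stream carries the same 5-tuple,
# so the stream always hashes to the same link.
stream = ("172.28.0.1", "172.28.0.2", 50000, 6800, "tcp")
first = pick_next_hop(*stream, links)
assert all(pick_next_hop(*stream, links) == first for _ in range(1000))

# Many streams (here: different source ports) spread across the links.
used = {
    pick_next_hop("172.28.0.1", "172.28.0.2", port, 6800, "tcp", links)
    for port in range(50000, 50200)
}
print(f"one stream uses {first}; 200 streams use {sorted(used)}")
```

Because the choice depends only on the flow's addresses and ports, aggregate throughput scales with the number of links while each individual stream is capped at one link's speed.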
  5. The Build
     ● My setup is about as small as you can go
     ● It's my R&D setup
     ● It's only two switches
     ● But it's about showing that these approaches work even at small scale
       ○ All traffic is still routed
       ○ We still get all the benefits of a Routed Fabric
       ○ We can use cheap commodity switching
       ○ You don't need super high end kit to get efficiency and speed
     ● Yes, it's not a real Clos topology; you need a bigger problem domain for that
     ● This is about thinking about different ways of doing things
  6. What You’ll Need
  7. Connecting Things
  8. Connecting Things
  9. A Cunning Plan - Network Assignments
     ● Switch 1: 172.31.1.0/24
       ○ Port 1: 172.31.1.0/30
       ○ Port 2: 172.31.1.4/30
       ○ …
       ○ Port 24: 172.31.1.92/30
     ● Inter-switch: 172.31.3.0/24
       ○ Link 1: 172.31.3.0/30
       ○ Link 2: 172.31.3.4/30
       ○ …
       ○ Link 8: 172.31.3.28/30
     ● Switch 2: 172.31.2.0/24
       ○ Port 1: 172.31.2.0/30
       ○ Port 2: 172.31.2.4/30
       ○ …
       ○ Port 24: 172.31.2.92/30
     ● Ceph: 172.28.0.0/24
       ○ Node 1: 172.28.0.1/32
       ○ Node 2: 172.28.0.2/32
       ○ …
       ○ Node 12: 172.28.0.12/32
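
The per-port /30 plan above follows a simple pattern: port n gets the n-th /30 carved out of the switch's /24. A small Python sketch using the standard `ipaddress` module can reproduce it. The helper names `port_subnet` and `link_addresses` are hypothetical, and giving the first usable address to the switch and the second to the server is an assumption that matches the interface configs shown later in the deck.

```python
import ipaddress

def port_subnet(block, port):
    """Return the /30 for a 1-based port (or link) number inside a /24 block."""
    subnets = list(ipaddress.ip_network(block).subnets(new_prefix=30))
    return subnets[port - 1]

def link_addresses(block, port):
    """Return (switch_ip, server_ip): the two usable hosts of the port's /30."""
    switch_ip, server_ip = port_subnet(block, port).hosts()
    return switch_ip, server_ip

# Switch 1, port 24 and inter-switch link 8, as listed in the plan.
print(port_subnet("172.31.1.0/24", 24))   # 172.31.1.92/30
print(port_subnet("172.31.3.0/24", 8))    # 172.31.3.28/30

# Both ends of switch 1, port 1.
switch_ip, server_ip = link_addresses("172.31.1.0/24", 1)
print(switch_ip, server_ip)               # 172.31.1.1 172.31.1.2
```

A /24 holds 64 such /30 networks, so a 24-port switch fits comfortably in one block.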
  10. Configuring Your Switches - Turn On Routing
      ip routing
      router ospf
        router-id 172.26.1.210
        network 172.31.1.0 255.255.255.0 area 0.0.0.0
        network 172.31.3.0 255.255.255.0 area 0.0.0.0
        redistribute connected
        redistribute static
      exit
  11. Configuring Your Switches - Server Interface
      interface 0/1
        mtu 9018
        routing
        ip address 172.31.1.1 255.255.255.252
        ip ospf area 0.0.0.0
      exit
  12. Configuring Your Switches - Server Interface
      interface 0/2
        mtu 9018
        routing
        ip address 172.31.1.5 255.255.255.252
        ip ospf area 0.0.0.0
      exit
  13. Configuring Your Switches - Server Interface
      interface 0/24
        mtu 9018
        routing
        ip address 172.31.1.93 255.255.255.252
        ip ospf area 0.0.0.0
      exit
  14. Configuring Your Switches - Inter Switch Interface
      interface 0/28
        mtu 9018
        routing
        ip address 172.31.3.1 255.255.255.252
        ip ospf area 0.0.0.0
      exit
  15. Configuring Your Ceph Server - Interfaces
      $> cat ifcfg-eth4
      BOOTPROTO='static'
      IPADDR='172.31.1.2/30'
      MTU='9000'

      $> cat ifcfg-eth6
      BOOTPROTO='static'
      IPADDR='172.31.1.6/30'
      MTU='9000'

      $> cat ifcfg-eth5
      BOOTPROTO='static'
      IPADDR='172.31.2.2/30'
      MTU='9000'

      $> cat ifcfg-eth7
      BOOTPROTO='static'
      IPADDR='172.31.2.6/30'
      MTU='9000'
  16. Configuring Your Ceph Server - Dummy Interface
      $> cat ifcfg-dummy0
      BOOTPROTO='static'
      IPADDR='172.28.0.1/32'
      MTU='9000'
  17. Configuring Your Ceph Server - Quagga & OSPFd
      $> zypper in quagga ospfd
  18. Configuring Your Ceph Server - Quagga
      $> cat zebra.conf
      hostname ceph1
      !
      interface eth4
        ip address 172.31.1.2/30
      interface eth5
        ip address 172.31.2.2/30
      interface eth6
        ip address 172.31.1.6/30
      interface eth7
        ip address 172.31.2.6/30
      !
  19. Configuring Your Ceph Server - OSPFd
      $> cat ospfd.conf
      hostname ceph1
      !
      interface eth4
      interface eth5
      interface eth6
      interface eth7
      router ospf
        ospf router-id 172.26.1.1
        network 172.28.0.1/32 area 0
        network 172.31.1.2/30 area 0
        network 172.31.1.6/30 area 0
        network 172.31.2.2/30 area 0
        network 172.31.2.6/30 area 0
      !
  20. Configuring Your Ceph Server - Kernel
      $> cat sysctl.conf
      # Enable IP routing
      net.ipv4.ip_forward = 1
      # Tweak ECMP policy
      net.ipv4.fib_multipath_hash_policy = 1
      net.ipv4.fib_multipath_use_neigh = 1
      # Disable reverse path filtering
      net.ipv4.conf.all.rp_filter = 0
      # Enable reverse path filtering on normal NICs
      net.ipv4.conf.bond1.rp_filter = 1
  21. Configuring Your Ceph Server - Ceph
      $> cat ceph.conf
      [global]
      public_network = 172.28.0.0/24
  22. Et Voilà
      $> ip route
      172.26.28.2 proto zebra metric 20
          nexthop via 172.31.1.10 dev eth7 weight 1
          nexthop via 172.31.1.14 dev eth6 weight 1
          nexthop via 172.31.2.10 dev eth4 weight 1
          nexthop via 172.31.2.14 dev eth5 weight 1
      172.26.28.3 proto zebra metric 20
          nexthop via 172.31.1.18 dev eth7 weight 1
          nexthop via 172.31.1.22 dev eth6 weight 1
          nexthop via 172.31.2.18 dev eth4 weight 1
          nexthop via 172.31.2.22 dev eth5 weight 1
      ...
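
One quick sanity check on a node is to count the next hops per destination and confirm that every link is in use. The Python sketch below parses multipath routes from text, assuming the usual multi-line layout of `ip route` output with one `nexthop` line per path. The function name `parse_multipath` is hypothetical, and the sample text is copied from the slide above rather than captured live.

```python
def parse_multipath(output):
    """Map each destination to its list of (gateway, device) next hops."""
    routes = {}
    current = None
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "nexthop":
            # A nexthop line belongs to the most recent destination line.
            if current is not None and "via" in fields and "dev" in fields:
                via = fields[fields.index("via") + 1]
                dev = fields[fields.index("dev") + 1]
                routes[current].append((via, dev))
        else:
            # Any other line starts a new route; its first field is the destination.
            current = fields[0]
            routes[current] = []
    return routes

sample = """\
172.26.28.2 proto zebra metric 20
    nexthop via 172.31.1.10 dev eth7 weight 1
    nexthop via 172.31.1.14 dev eth6 weight 1
    nexthop via 172.31.2.10 dev eth4 weight 1
    nexthop via 172.31.2.14 dev eth5 weight 1
172.26.28.3 proto zebra metric 20
    nexthop via 172.31.1.18 dev eth7 weight 1
    nexthop via 172.31.1.22 dev eth6 weight 1
    nexthop via 172.31.2.18 dev eth4 weight 1
    nexthop via 172.31.2.22 dev eth5 weight 1
"""

routes = parse_multipath(sample)
for destination, hops in routes.items():
    devices = sorted(dev for _, dev in hops)
    print(destination, len(hops), devices)
```

A destination that reports fewer next hops than the node has fabric links suggests a link or OSPF adjacency that is down.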
  23. Caveats
      ● Make sure that MTUs are configured correctly and match
        ○ OSPF is its own IP protocol (number 89), not TCP or UDP; with mismatched MTUs, neighbours typically fail to form a full adjacency
      ● Label your cables
        ○ Swapping cables around will break things
      ● Quagga will only set a default route if no default route is already defined
        ○ OSPFd needs: `default-information originate metric-type 1`
  24. Further Reading
      ● Intro to Clos networks
        ○ https://en.wikipedia.org/wiki/Clos_network
      ● Google white paper on their Clos topologies
        ○ https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43837.pdf
      ● Cumulus on Clos and ECMP
        ○ https://cumulusnetworks.com/blog/celebrating-ecmp-part-one/
      ● Benefits of ditching layer 2
        ○ https://thenewstack.io/ditch-pitfalls-layer-2-networks-modern-data-center-design/
