SCALE/SWITCHengines Update 
Current and Possible SDN Applications 
SDN Workshop, Winterthur, 20 November 2014 
Simon Leinen 
simon.leinen@switch.ch
What happened since last time? 
• Status at October 2013 SDN workshop presentation: 
“Building Cloud – Where SDN Could Help”: 
– Pilot OpenStack/Ceph cloud “BCC” existed 
– New production cloud (2*2 racks) was planned
– SDN considered applicable for: 
• Low-cost scalable internal fabric: whitebox ToR leaf/spine with multipath 
• DC/Backbone interface without traditional $$ routers 
• Tunneling towards customers for VPC (Virtual Private Cloud) offerings 
“BCC” Cluster – Still Running! 
(for some value of “running”…)
• Built ~10-node Ceph+OpenStack cluster 
BCC – “building cloud competence” 
• Services: 
– VMs for various researchers and internal testers 
– File synchronization server for ~3’500 end users 
of SWITCHdrive service 
• Networking: 
– 2*10GE per server
– 6*10GE on front-end servers, which route 
– Two Brocade “ToR” switches with TRILL-based 
multi-chassis multipath, L2+VLANs 
– 2*10GE towards backbone 
Pilot → SWITCH Cloud: Project SCALE
Co-funded (50% of manpower) by the CUS P-2 program
“Information scientifique – accès, traitement et sauvegarde”
(scientific information: access, processing and preservation)
Project SCALE, May 2014 – May 2015 
• Two sites: UniL Géopolis, ZHdK Toni-Areal, each with 
– 32 (16 compute+16 storage) 2U servers w/2*10GE each 
– 2*10GE external connectivity (pending 2nd link @Toni-Areal) 
– room for growth to ~20 racks 
• New “SWITCHengines” IaaS offering, now in limited testing
SCALE Networking 
• Hardware: Two ToRs (Brocade ICX, 48*10GE + 6*40GE) 
– External: BGP peerings (IPv4+IPv6) to backbone routers (sketched below) 
– Internal: redundant connections to servers (802.1ad), multiple VLANs 
– Interconnected with a single 40GE link 
– Router connections optical, all others DAC (direct-attach copper) 
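
A rough sketch of the external side, in generic router-CLI syntax (the cloud-side AS and all addresses are placeholders; 559 is SWITCH's backbone AS; real filters and policy are omitted):

    ! illustrative only – not the production configuration
    router bgp
     local-as 65001                       ! placeholder private AS for the ToR
     neighbor 192.0.2.1 remote-as 559     ! IPv4 peering to backbone router
     neighbor 2001:db8::1 remote-as 559   ! IPv6 peering to backbone router
     address-family ipv6 unicast
      neighbor 2001:db8::1 activate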
SCALE Networking: Software 
• OpenStack “Icehouse” (2014.1) 
– Neutron (OpenStack Networking), standard/open-source components (configuration sketched below): 
• ML2 (Modular Layer-2) plugin 
• Open vSwitch (OVS) 
• VXLAN overlay for tenant network isolation 
• Setup: 
– Two OpenStack regions LS/ZH (each with its own Ceph cluster) 
– Single “network node” (VM!) per region for virtual L3 routing 
• between tenant (virtual) networks 
• between VMs and Internet 
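
A minimal sketch of how these components fit together in the ML2 plugin configuration (values are illustrative, not our production settings):

    # /etc/neutron/plugins/ml2/ml2_conf.ini (illustrative)
    [ml2]
    type_drivers         = vxlan,flat
    tenant_network_types = vxlan
    mechanism_drivers    = openvswitch

    [ml2_type_vxlan]
    vni_ranges = 1:10000          # VXLAN IDs handed out to tenant networks

    [ovs]
    local_ip = 10.0.0.11          # underlay address used as VXLAN tunnel endpoint

    [agent]
    tunnel_types = vxlan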
Neutron 
[Slides 7–9: Neutron architecture diagrams; no text content in the transcript.]
Problems: MTU (solved) 
• In the default configuration, usable MTU in tenant (overlay) 
networks is 14xx bytes (VXLAN encapsulation adds ~50 bytes of headers) 
• So, “ping” works, but “apt-get update” hangs… small packets fit, 
full-size TCP segments are silently dropped 
• Everybody (who uses ML2/OVS/VXLAN) has this problem! 
• Default way to “fix” this is to lower client (VM) MTU 
– IMHO this is hopeless – 1500 bytes much too ingrained now 
• We increase the underlay MTU instead (to 1600) – much better; see the sketch below 
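
A sketch of the fix, assuming eth0 carries the VXLAN underlay (the neutron option shown is the Icehouse-era knob; exact config files vary by distribution):

    # underlay: 1500-byte tenant frames + ~50 bytes of VXLAN headers must fit
    ip link set dev eth0 mtu 1600

    # neutron.conf: have the agents create tap/veth devices with a matching MTU
    [DEFAULT]
    network_device_mtu = 1600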
 
Problems: Performance 
• Single-stream performance VM <-> Internet: ~1.5 Gb/s (measurement sketch below) 
– Should be close to 10 Gb/s 
• Many possible bottlenecks: 
– Single network node (will get better with DVR in OpenStack Juno) 
– Virtualized network node (could be un-virtualized if necessary) 
– OVS overhead (will get better with new versions, e.g. zero-copy) 
– VXLAN overhead (will get better with newer kernel versions, possibly 
hardware support in future Ethernet adapters) 
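
For reference, the kind of single-stream measurement behind that figure (host name is a placeholder):

    # on a VM in the cloud:
    iperf -s
    # on a well-connected host across the backbone:
    iperf -c vm.example.org -t 30        # single TCP stream
    iperf -c vm.example.org -t 30 -P 4   # 4 parallel streams, for comparison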
Problems: Missing features 
• IPv6 for VMs 
– “hopefully in Juno” (not really a Neutron issue?) 
• VPC (VPN tunnels back into customers’ campus LANs) 
– To be added in the longer term (IT departments need this more than researchers?) 
• LBaaS (OpenStack-integrated Load Balancer) 
– Should be easy to add 
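
For illustration, the Icehouse-era LBaaS v1 workflow through the neutron CLI (names and the subnet ID are placeholders):

    neutron lb-pool-create --name web-pool --protocol HTTP \
      --lb-method ROUND_ROBIN --subnet-id <SUBNET_ID>
    neutron lb-member-create --address 10.0.0.5 --protocol-port 80 web-pool
    neutron lb-vip-create --name web-vip --protocol HTTP --protocol-port 80 \
      --subnet-id <SUBNET_ID> web-pool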
Move to “real” SDN controller? 
• But there are so many to choose from! 
– Nicira N$X 
– OpenContrail (open source, uses MPLS, Juniper base) 
– Midokura/MidoNet (open source since this month! EPFL ties) 
– Nuage (uses MPLS, Alcatel-Lucent/TiMetra base) 
– OpenDaylight (but what does it do?) 
– Calico (open source, Metaswitch base, L3-based, even does IPv6) 
– Plumgrid, … 
– Snabb NFV! (open source, hackable, high-performance user-space) 
• …or should we just wait until the market sorts this out? 
Growing the Cloud: Internal fabric 
• Beyond a few racks, we need some sort of “aggregation 
layer” above the ToRs. There are multiple approaches: 
– Traditional: one large aggregation switch (doubled for redundancy) 
– Modern: leaf/spine design, using cost-effective “commodity” kit 
• How can servers make use of parallelism in the fabric? 
– Smart L2 switches (TRILL, multi-chassis LAG etc.) – vendor lock-in? 
– L3 switches with hypervisor-based overlay à la Nicira NVP 
New HW/SW options for leaf/spine 
• “White-box” ToR switches sold without an OS (just ONIE) 
– e.g. 32-port 40GE for < CHF 10’000 
• Run e.g. Cumulus Linux (Debian-based) on them 
– Could use Puppet to provision them, same as for servers (sketch below) 
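
A hypothetical sketch of that idea (Cumulus Linux is Debian-based, so ordinary Puppet resources apply; the file path and interface settings are illustrative):

    # manage a switch port like a file on any Debian server (illustrative)
    file { '/etc/network/interfaces.d/swp1':
      content => "auto swp1\niface swp1\n    mtu 1600\n",
      notify  => Service['networking'],
    }

    service { 'networking':
      ensure => running,
    }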
What Became of “Future Internet”? 
“The Internet has a great future behind it” – Jon Crowcroft 
Big funding drive since ~2007 in US and EU (and…) 
• Clean slate / greenfield / disruptive thinking 
• Radical new (or old) ideas (e.g. ICN) 
• Testbeds! 
What Became of “Future Internet”? 
• In 2014: 
– EU held last Future Internet Assembly 
– Moving towards new horizons, e.g. 5G 
• FI-PPP still running – see 5 December event @ZHAW 
Hypothesis: 
Future Internet = Current Internet + Cloud 
• as new generative platform (cf. FI-Labs) 
• to save Telcos (NFV) 
Future of Cloud (incl. NFV) = OpenStack 