Cloud Networking Trends
Rob Cone – Principal Engineer, Cloud Service Provider Group
Madhusudhan Rangarajan – Principal Engineer, Cloud Service Provider Group
Agenda
•  Overview of Cloud
•  Network Topologies in Cloud
•  A Closer Look at IaaS and Infrastructure Offloads
•  Future of NICs & FPGAs in the Datacenter
The Cloud is Heterogeneous (& Vast)
Service classes at a glance:

IaaS
•  Primary service offering: Lease hosted infrastructure to run customer workload & environment
•  Primary revenue model: Direct – paid per unit time by instance type (many classes)
•  High-speed SDN value: High

Tiered SaaS
•  Primary service offering: Deliver hosted service on tiered infrastructure via custom app
•  Primary revenue model: Mixed – generally paid indirectly for ad placement (notable exceptions)
•  High-speed SDN value: Low

Hyper-converged SaaS
•  Primary service offering: Deliver services via homogeneous infrastructure optimized for search
•  Primary revenue model: Mixed – generally paid indirectly for ad placement
•  High-speed SDN value: High

The biggest players in this space were first to create & deploy "smart" NICs.
Folded Clos Network Topology
Tiered structure results in more zones, and traffic tends to be limited within a rack or contained within multiple smaller zones. Lower cost, but requires more control over workload distribution.
Source: http://firstclassfunc.com/facebook-fabric-networking
Fully Routed Clos
Very high E-W BW and uniform latencies across any two nodes within a much larger zone. Facilitates distributed networking functions (e.g., load balancers, SSL termination). A leaf-spine path-count sketch follows the source link below.
Source: https://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p183.pdf
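Both topologies are variants of a Clos fabric. Below is a minimal Python sketch (not from the slides; switch counts and link speed are hypothetical) showing why a 2-tier leaf-spine Clos gives every leaf pair the same ECMP path count, and therefore uniform east-west bandwidth.

# Minimal sketch: a 2-tier folded Clos (leaf-spine) fabric, illustrating why
# east-west bandwidth and path diversity are uniform between any two leaves.
# All parameters below are hypothetical.

NUM_SPINES = 4          # spine switches
NUM_LEAVES = 8          # leaf (ToR) switches
UPLINK_GBPS = 100       # each leaf-to-spine link

def paths_between_leaves(leaf_a: int, leaf_b: int) -> int:
    """Every spine provides exactly one 2-hop path between two leaves,
    so the path count (and ECMP fan-out) is the same for any leaf pair."""
    if leaf_a == leaf_b:
        return 0
    return NUM_SPINES

def leaf_to_leaf_bandwidth_gbps(leaf_a: int, leaf_b: int) -> int:
    """Aggregate bandwidth between two leaves if ECMP spreads flows evenly
    across all spine paths."""
    return paths_between_leaves(leaf_a, leaf_b) * UPLINK_GBPS

if __name__ == "__main__":
    for a, b in [(0, 1), (0, 7), (3, 5)]:
        print(f"leaf {a} -> leaf {b}: {paths_between_leaves(a, b)} paths, "
              f"{leaf_to_leaf_bandwidth_gbps(a, b)} Gb/s aggregate")

The same uniformity is what lets functions like load balancing and SSL termination be distributed across the zone rather than pinned to a few boxes.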
A Closer Look at IaaS
IaaS revenue scales with VMs/rack or containers/rack
•  Rack is a good 1st-order OpEx proxy
–  Limited racks/DC, nodes/rack, VMs/node
–  Average VMs/node scales with core count
•  VM sizes vary: ½ core up to "all" cores
–  Bigger VMs command higher prices
–  Mix of VM sizes varies with demand
•  Not all cores are assigned all the time
–  CSP must account for a "utilization" factor (rough sizing sketch below)
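As a rough illustration of the scaling above, the sketch below computes a first-order revenue-per-rack proxy. Every number in it (racks, nodes, cores, VM mix, prices, utilization) is a hypothetical placeholder, not CSP data.

# Rough first-order sizing sketch (illustrative numbers only): revenue per
# rack as a function of cores/node, the VM mix, and the utilization factor
# the CSP must carry.

RACKS_PER_DC = 1000            # hypothetical
NODES_PER_RACK = 40            # hypothetical
CORES_PER_NODE = 64            # hypothetical

# Hypothetical VM mix: (cores per VM, fraction of demand, $/hr)
VM_MIX = [(0.5, 0.40, 0.02), (2, 0.35, 0.08), (8, 0.20, 0.30), (64, 0.05, 2.20)]

UTILIZATION = 0.80             # fraction of cores actually sold/assigned

def revenue_per_rack_per_hour() -> float:
    sellable_cores = NODES_PER_RACK * CORES_PER_NODE * UTILIZATION
    # Average $/hr earned per core, weighted by the VM mix.
    dollars_per_core = sum(frac * price / cores for cores, frac, price in VM_MIX)
    return sellable_cores * dollars_per_core

if __name__ == "__main__":
    per_rack = revenue_per_rack_per_hour()
    print(f"~${per_rack:,.0f} per rack-hour, ~${per_rack * RACKS_PER_DC:,.0f} per DC-hour")

The point of the proxy: anything that consumes cores without being sold (infrastructure, vSwitch, crypto) lowers the utilization factor and shows up directly in revenue per rack.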
SDN Offload
[Diagram: three host configurations, each with Xeon® cores running VM instances and a control plane, attached over PCI Express — a baseline where the SDN & vSwitch run on the Xeon® cores with a standard NIC, plus "With Trusted Host" and "With Untrusted Host" variants where an i/pNIC carries the SDN & vSwitch.]
Functions of SDN & vSwitch:
•  ACL, NAT, crypto, metering
•  VXLAN, NVGRE, TEP
•  RDMA (vMotion, storage) w/ encap/decap
Different CSPs have different monetization & TCO models, and have different thresholds for considering SmartNIC deployments.
Agile Innovation, Differentiation, Optimization
Fast Infrastructure Acceleration Evolution, Non-Standardized
IaaS: Higher Revenue w/ Low DC Tax
•  Basic functions utilize 1-4 cores at 10Gb
•  Overhead scales significantly w/ port speed and advanced features (see the vSwitch sketch below)
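To make the per-packet overhead concrete, here is a minimal, hypothetical Python sketch (field names and actions are illustrative, not a real vSwitch API) of the match/action work a software vSwitch does on every packet. Every lookup, ACL check, and encap decision in this loop burns Xeon cores that could otherwise be sold as VM capacity, which is what a SmartNIC offload reclaims.

# Minimal sketch of software vSwitch match/action processing (hypothetical
# fields and actions).

from typing import NamedTuple

class FlowKey(NamedTuple):
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int

# Hypothetical flow table: key -> action description
FLOW_TABLE = {
    FlowKey("10.0.0.5", "10.0.1.7", 6, 5500, 443):
        {"action": "vxlan_encap", "vni": 5001, "out_port": "uplink0"},
}

def process_packet(key: FlowKey) -> dict:
    """Fast path on a hit; slow path (consult the SDN control plane and
    install a new entry) on a miss."""
    action = FLOW_TABLE.get(key)
    if action is None:
        action = {"action": "drop"}          # placeholder policy on miss
        FLOW_TABLE[key] = action
    return action

if __name__ == "__main__":
    pkt = FlowKey("10.0.0.5", "10.0.1.7", 6, 5500, 443)
    print(process_packet(pkt))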
RDMA Offload
•  RDMA addresses high CPU utilization and the TCP latency tail
–  Zero CPU utilization when done in HW; µs-level latencies
–  Higher-speed media driving larger network throughput
•  Main use cases:
–  Distributed storage: read/write data without CPU utilization
–  RDMA as a transport (e.g., RPC calls)
•  Also used in special-purpose clusters within the DC, like DL & HPC clusters
•  Multiple flavors & implementations:
–  RoCE, RoCEv2, iWARP
–  Multiple congestion management schemes: DCQCN, TIMELY, DCTCP
[Diagram: RDMA data path — an App and Memory Buffer 1 on one host, an App and Memory Buffer 2 on another, with data moved NIC-to-NIC directly between the buffers. A conceptual sketch follows.]
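A conceptual Python sketch of the diagram above, assuming simplified Host objects (this is not a verbs API): it contrasts the kernel TCP path, where the destination CPU copies payload bytes, with a one-sided RDMA read between registered buffers where neither host CPU touches the payload.

# Conceptual sketch (not a verbs API) of CPU involvement in TCP vs. RDMA.

class Host:
    def __init__(self, name: str):
        self.name = name
        self.memory = bytearray(4096)        # registered buffer
        self.cpu_bytes_copied = 0            # proxy for CPU utilization

def tcp_transfer(src: Host, dst: Host, data: bytes) -> None:
    # Kernel socket path: the destination CPU copies every byte.
    dst.memory[:len(data)] = data
    dst.cpu_bytes_copied += len(data)

def rdma_read(initiator: Host, target: Host, length: int) -> None:
    # One-sided read: the initiator's NIC pulls from the target's registered
    # buffer; neither host CPU copies payload bytes.
    initiator.memory[:length] = target.memory[:length]

if __name__ == "__main__":
    storage, compute = Host("storage"), Host("compute")
    tcp_transfer(storage, compute, b"x" * 1024)
    rdma_read(compute, storage, 1024)
    print(compute.cpu_bytes_copied, "bytes copied by the compute CPU on the TCP path")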
Cloud Networking Changes
Rough timelines, 2008–2019 (typical silicon PRQ timelines shown for comparison):
•  Tunneling formats: VXLAN / IP-in-IP / STT, VXLAN-GPE, GENEVE, NSH, new tunneling formats?
•  Flow tables: OpenFlow 1.0, OpenFlow 1.1, custom flow tables
•  RDMA: iWARP v1, iWARP v2, RoCEv1, RoCEv2, RoCEv3?, custom congestion mgmt algorithms
•  Storage: data replication, erasure coding, compression/deduplication, custom compression, storage crypto in cloud, future enhancements
•  Security: IPsec, SSL, TLS, DTLS (new versions), deep packet inspection, custom crypto
High performance with flexibility, customizability, agility, and scalability, along with non-standardized implementations, drives programmable acceleration solutions. (A VXLAN header sketch follows.)
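As one concrete example of why each new tunneling format ripples through NIC and vSwitch pipelines, the sketch below packs the 8-byte VXLAN header defined in RFC 7348; successor formats (VXLAN-GPE, GENEVE, NSH) change this layout and need new parse/encap logic in silicon or in software.

# Minimal sketch of the VXLAN encapsulation header (RFC 7348).

import struct

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags (I bit set), 24 reserved bits,
    24-bit VNI, 8 reserved bits."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    flags = 0x08                      # 'I' flag: VNI field is valid
    word0 = flags << 24               # flags byte + 24 reserved bits
    word1 = vni << 8                  # 24-bit VNI + 8 reserved bits
    return struct.pack("!II", word0, word1)

if __name__ == "__main__":
    print(vxlan_header(5001).hex())   # -> 0800000000138900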
Comparisons of Programmable Technologies

PNIC (FPGA)
•  Millions of logic elements
•  High memory bandwidth
•  High network bandwidth
•  Full custom pipeline
•  Capable of compute and networking acceleration

iNIC (NPU)
•  Typical many-core architecture
•  Small cores with variable amounts of HW acceleration
•  Capable of networking acceleration
•  Requires specialized OS/development environment

Onload (CPU)
•  High-performance CPU cores
•  Designed for general-purpose code
•  Good single-thread perf and balanced throughput
•  Easy to program → rich ecosystem

Tradeoffs are infrastructure tax, ease of programmability, jitter/predictable behavior, and power/perf.
FPGA Technology Introduction
[Diagram: FPGA with PCIe, memory, and network interfaces, plus two partial-reconfiguration regions — PR Region 0 (e.g., platform) and PR Region 1 (e.g., application).]

LOGIC ELEMENTS
•  Main programmable component
•  Millions of logic elements
•  Simple logic, adders, and registers
•  Interconnected with a configurable fabric

PCIE HOST INTERFACE
•  Hardened + soft host interface
•  Hardened PCIe controller
•  Soft interface allows different use models and drivers

MEMORY INTERFACES
•  Configurable high-performance memory interfaces
•  Hardened controllers

NETWORK INTERFACE
•  Configurable network interfaces
•  Hard/soft interfaces

MEMORY BLOCKS
•  Thousands of 20Kb memory blocks
•  Allows processing to stay on-chip

VARIABLE PRECISION DSP BLOCKS
•  Allows the FPGA to perform compute-intensive functions

PARTIAL RECONFIGURATION
•  Allows separate regions of the fabric to be reconfigured independently
Datacenter Accelerator Use Cases

In-line (e.g., i/pNIC)
[Diagram: CPU running VMs, connected via a CPU interface to an accelerator with an Ethernet MAC / network interface in the packet path.]
•  Provides bump-in-the-wire acceleration
•  E.g., classification of traffic to VMs
•  Packet manipulation acceleration
•  Could be a combination of ASIC + FPGA

Look-Aside
[Diagram: CPU running VMs, with a look-aside accelerator attached via a CPU interface.]
•  Provides look-aside acceleration
•  Provides algorithmic acceleration for a SW application or service

Multi-Function (e.g., FPGA)
[Diagram: CPU running VMs, with an accelerator attached via a CPU interface that offers both a network-facing (Ethernet) path and look-aside acceleration blocks.]
•  Provides both in-line and look-aside acceleration
•  Ability to span multiple applications
•  Packet manipulation acceleration
(A conceptual in-line vs. look-aside sketch follows.)
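A conceptual Python sketch of the distinction above, with hypothetical function names and toy transforms: the in-line accelerator sits on every packet's path between the network and the host, while the look-aside accelerator is invoked explicitly by the host for a discrete job.

# Conceptual sketch (hypothetical interfaces) of in-line vs. look-aside
# acceleration.

def inline_accelerator(packet: bytes) -> tuple[bytes, int]:
    """Bump-in-the-wire: every packet passes through; the accelerator
    classifies it (and may rewrite headers) before it reaches a VM."""
    vm_id = packet[0] % 4            # toy classification rule
    return packet, vm_id

def lookaside_accelerator(buffer: bytes) -> bytes:
    """The host offloads a discrete job (e.g., compression, crypto) and
    continues other work until the result comes back."""
    return buffer[::-1]              # stand-in for the real transform

if __name__ == "__main__":
    pkt, vm = inline_accelerator(b"\x02payload")
    print("deliver to VM", vm)
    print(lookaside_accelerator(b"blockdata"))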
Example – Azure* Catapult
Source: MSFT Whitepaper Cloud-Scale-Acceleration-Architecture.pdf
Takeaway
•  Networking has become an innovation vector for CSPs without a
clear path to standardization
•  Inline infrastructure and application acceleration is expected to
grow into the future
•  Intel has multiple options to facilitate this innovation:
•  ASIC NIC – optimal power and performance for known IP
•  General Purpose CPU – maximum flexibility for innovation
•  FPGA – enabling continued innovation with better power and performance than a general-purpose CPU
Questions?
Notices and Disclaimers
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability,
fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course
of dealing, or usage in trade.
This document contains information on products, services and/or processes in development. All information provided
here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule,
specifications and roadmaps.
The products and services described may contain defects or errors known as errata which may cause deviations from
published specifications. Current characterized errata are available on request.
Intel, the Intel logo, Xeon® are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others
© Intel Corporation.
