Prem Jonnalagadda, Barefoot Networks, an Intel company
Notices and Disclaimers
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system
configuration.
No product or component can be absolutely secure.
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. For more
complete information about performance and benchmark results, visit http://www.intel.com/benchmarks .
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are
measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult
other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other
products. For more complete information visit http://www.intel.com/benchmarks .
Intel® Advanced Vector Extensions (Intel® AVX)* provides higher throughput to certain processor operations. Due to varying processor power characteristics, utilizing AVX instructions
may cause a) some parts to operate at less than the rated frequency and b) some parts with Intel® Turbo Boost Technology 2.0 to not achieve any or maximum turbo frequencies.
Performance varies depending on hardware, software, and system configuration and you can learn more at http://www.intel.com/go/turbo.
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include
SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not
manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel
microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets
covered by this notice.
Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide
cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.
Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data
are accurate.
Intel, the Intel logo and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as property of others.
OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.
© 2019 Intel Corporation.
Putting the Network Owners in the Driving Seat
• Fixed-function switch (constrained): the switch dictates to the software, “This is how I process packets …”
• Programmable switch (scalable): the software dictates to the switch, “This is how you must process packets”
• Software stack in both cases: Network Operating & Switch OS on top of the Driver
• A program for the programmable switch declares Headers and Metadata, the Parser, and Tables and Controls
Fixed vs. Programmable packet processing
• Fixed Pipeline: features and table sizes are baked in at design time
  • Buffer and Fixed Parser
  • IPv4 Address Table and IPv4 Logic
  • ACL TCAM and ACL Logic
  • MPLS Tag Table and MPLS Logic
  • Ethernet MAC Address Table and Ethernet Logic
• Programmable Pipeline: all stages identical, customer-defined match-action logic
  • Buffer and Programmable Parser
  • A series of identical stages, each built from match (M) and action (A) units
  • You declare which headers are recognized
  • You declare what tables are needed and how packets are processed
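To make the programmable pipeline concrete, here is a minimal Python sketch of the idea. It is a software model only, not Tofino or P4 code. The header layout, the `Stage` class, and the action names (`drop`, `to_port_3`) are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical header declaration: (field name, width in bytes).
ETHERNET = [("dst", 6), ("src", 6), ("ether_type", 2)]


def parse(packet: bytes, layout) -> Dict[str, int]:
    """Programmable parser: extract the declared fields, in order."""
    fields, offset = {}, 0
    for name, width in layout:
        fields[name] = int.from_bytes(packet[offset:offset + width], "big")
        offset += width
    return fields


@dataclass
class Stage:
    """One match-action stage: exact match on a key, then run an action."""
    key: Callable[[dict], Tuple]
    default: Callable[[dict], None]
    entries: Dict[Tuple, Callable[[dict], None]] = field(default_factory=dict)

    def apply(self, meta: dict) -> None:
        # Match: look up the key; fall back to the default action on a miss.
        self.entries.get(self.key(meta), self.default)(meta)


def run_pipeline(packet: bytes, stages: List[Stage]) -> dict:
    meta = parse(packet, ETHERNET)
    for stage in stages:  # every stage has the same structure
        stage.apply(meta)
    return meta


def drop(meta: dict) -> None:
    meta["egress_port"] = None


def to_port_3(meta: dict) -> None:
    meta["egress_port"] = 3


# The "customer" decides what the stage matches on and which entries it holds.
l2 = Stage(key=lambda m: (m["dst"],), default=drop)
l2.entries[(0x0000000000AA,)] = to_port_3

frame = bytes.fromhex("0000000000aa" "0000000000bb" "0800")
result = run_pipeline(frame, [l2])
other = run_pipeline(bytes.fromhex("0000000000cc" "0000000000bb" "0800"), [l2])
```

The stage itself knows nothing about Ethernet: the declared layout and the table entries carry all protocol knowledge, which is the contrast with the fixed pipeline.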
Domain-Specific Architectures (DSAs)
• Computers: Java* -> Compiler -> CPU
• Graphics: OpenCL™ -> Compiler -> GPU
• Signal Processing: Matlab* -> Compiler -> DSP
• Machine Learning: TensorFlow* -> Compiler -> TPU
• Networking: P4* -> Compiler -> Tofino™ & Tofino™ 2 (PISA)
Tofino™ & Tofino™ 2 Ethernet switch ASIC Family
Table columns: Series, 12.8 Tbps, 8.0 Tbps, 6.4 Tbps, 3.2 Tbps, 2.0 Tbps, Markets
• U-series (✓ ✓ ✓): Ultra – Large Capacity & Highest Features
  • Service Provider
  • 5G Core, Edge Compute, …
  • Hyperscale
  • ML/DL Fabrics, Load Balancers, Firewalls, …
  • Storage – Next Generation Interconnect
• M-series (✓ ✓): Mainstream – High Bandwidth & Feature Rich
  • Enterprise
  • Data Center Leaf and Spine
• H-series (✓): Hyperscale – Optimized for Power & Efficiency
  • Efficient Interconnect for Compute
  • Reduced Complexity & Low Latency
P4 Code Example
Define the Ethernet header (MAC address fields)
header ethernet_t {
    bit<48> dstAddr;
    bit<48> srcAddr;
    bit<16> etherType;
}
Define a table matching on the MAC field
table mac {
    key = {
        ingress_metadata.bd : exact;
        l2_metadata.lkp_mac_da : exact;
    }
    actions = {
        dmac_hit;
        dmac_miss;
        dmac_redirect_to_cpu;
    }
    default_action = dmac_miss;
    size = MAC_TABLE_SIZE;
}
Define table actions
action dmac_hit(bit<16> ifindex, bit<16> port_lag_index) {
    ingress_metadata.egress_ifindex = ifindex;
    ingress_metadata.egress_port_lag_index = port_lag_index;
    l2_metadata.same_if_check = l2_metadata.same_if_check ^ ifindex;
}
• Open source
• Protocol independent
• Target independent
• Vendor independent
• P4 compiler -> P4 Runtime -> switch
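The P4 excerpt above declares the table's shape; the entries and their action parameters arrive at run time from the control plane. The following Python model is a rough sketch of that split. `MacTable`, `add_entry`, and the `last_action` field are hypothetical names, and the body of `dmac_miss` is a placeholder because the slide does not show it. The example also assumes `same_if_check` starts out holding the ingress interface index, so the XOR in `dmac_hit` yields zero when a packet would leave on the interface it arrived on.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Metadata:
    bd: int                     # bridge domain (ingress_metadata.bd)
    lkp_mac_da: int             # destination MAC used as lookup key
    same_if_check: int          # assumed to start as the ingress ifindex
    egress_ifindex: int = 0
    egress_port_lag_index: int = 0
    last_action: str = ""       # hypothetical, for observing the outcome


def dmac_hit(meta: Metadata, ifindex: int, port_lag_index: int) -> None:
    meta.egress_ifindex = ifindex
    meta.egress_port_lag_index = port_lag_index
    meta.same_if_check ^= ifindex
    meta.last_action = "dmac_hit"


def dmac_miss(meta: Metadata) -> None:
    # Placeholder body: the slide does not show what dmac_miss does.
    meta.last_action = "dmac_miss"


class MacTable:
    """Exact-match table keyed on (bd, lkp_mac_da), default action dmac_miss."""

    def __init__(self, size: int):
        self.size = size
        self.entries: Dict[Tuple[int, int], Tuple[int, int]] = {}

    def add_entry(self, bd: int, mac: int, ifindex: int, port_lag_index: int) -> None:
        # Control-plane side: install a key together with its action data.
        if (bd, mac) not in self.entries and len(self.entries) >= self.size:
            raise RuntimeError("table full")
        self.entries[(bd, mac)] = (ifindex, port_lag_index)

    def apply(self, meta: Metadata) -> None:
        # Data-plane side: look up the key and run the selected action.
        action_data = self.entries.get((meta.bd, meta.lkp_mac_da))
        if action_data is None:
            dmac_miss(meta)
        else:
            dmac_hit(meta, *action_data)


table = MacTable(size=1024)
table.add_entry(bd=10, mac=0x0000000000AA, ifindex=7, port_lag_index=2)

known = Metadata(bd=10, lkp_mac_da=0x0000000000AA, same_if_check=7)
table.apply(known)
unknown = Metadata(bd=10, lkp_mac_da=0x0000000000BB, same_if_check=7)
table.apply(unknown)
```

Here `known` ends with `same_if_check` equal to zero because the installed egress interface matches the assumed ingress interface, while `unknown` falls through to the default action.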
https://p4.org/assets/P4WS_2019/Speaker_Slides/1_9am_Intro.pdf
Barefoot P4 Studio Architecture
• Switch P4* Application
• Tofino™ Native Architecture
• Control Plane (Local/Remote)
• Barefoot Runtime Interface
• Barefoot Model-driven Abstraction Interface
• Switch Application APIs
• SAI
• Packet Test Framework
• Unified ASIC Driver
• ASIC
• ASIC Model
• P4 Insight
• Barefoot P4 Compiler
Segment Routing v6 - Mobile User Plane
SRv6 traffic flow: enhanced mode with unchanged gNodeB
• Labels: User Equipment, gNB, SRGW, UPF1, UPF2, SRv6 Core Network, Data Network
• SRv6 behaviors: End.M.GTP6.D, End (PSP), End.DT6
• Header stacks, in slide order:
  • SA: 8000::1, DA: 2000::1, NH: IPv6
  • UDP, DP: 0x0868
  • GTPU, TEID: 0x1234
  • SA: 1000::1, DA: 3000::1, NH: UDP
  • SA: 9000::1, DA: 7000::1, NH: SRH
  • Type: 4 (SRH), NH: IPv6, Segment List: [0] 5000::1, [1] 6000::1
  • SA: 1000::1, DA: 3000::1, NH: UDP
  • SA: 9000::1, DA: (blank), NH: IPv6
  • SA: 1000::1, DA: 3000::1, NH: UDP
  • SA: 1000::1, DA: 3000::1, NH: UDP
SRv6 traffic flow: traditional mode
• Labels: User Equipment, gNB, UPF1, UPF2, SRv6 Core Network, Data Network
• SRv6 behaviors: T.Encaps.Red, End.MAP, End.DT6
• Header stacks, in slide order:
  • SA: 1000::1, DA: 2000::1, NH: IPv6
  • SA: 5000::1, DA: 3000::1, NH: IPv6
  • SA: 1000::1, DA: 2000::1, NH: UDP
  • SA: 5000::1, DA: 4000::1, NH: IPv6
  • SA: 1000::1, DA: 2000::1, NH: UDP
  • SA: 1000::1, DA: 2000::1, NH: IPv6
https://tools.ietf.org/html/draft-ietf-dmm-srv6-mobile-uplane-02#page-11
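As a rough illustration of the End (PSP) step, here is a simplified Python model of how an SRv6 endpoint might advance to the next segment and pop the SRH at the penultimate segment. It loosely follows the general SRv6 endpoint behavior and is not taken from the slide. The `Packet` and `SRH` classes are hypothetical, the segment list echoes the slide's values, and the starting destination address and Segments Left value are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SRH:
    segments: List[str]         # segments[0] is the final segment
    segments_left: int          # index of the currently active segment
    next_header: str = "IPv6"   # what follows the SRH


@dataclass
class Packet:
    src: str
    dst: str
    next_header: str
    srh: Optional[SRH]
    inner: str                  # placeholder for the encapsulated user packet


def end_psp(pkt: Packet) -> Packet:
    """Advance to the next segment; pop the SRH if it was the penultimate one."""
    srh = pkt.srh
    if srh is None or srh.segments_left == 0:
        raise ValueError("no remaining segment to advance to")
    srh.segments_left -= 1
    pkt.dst = srh.segments[srh.segments_left]
    if srh.segments_left == 0:
        # Penultimate segment pop: remove the SRH and splice the header chain.
        pkt.next_header = srh.next_header
        pkt.srh = None
    return pkt


# Assumed starting state: active segment is 6000::1, final segment is 5000::1.
pkt = Packet(
    src="9000::1",
    dst="6000::1",
    next_header="SRH",
    srh=SRH(segments=["5000::1", "6000::1"], segments_left=1),
    inner="SA 1000::1 -> DA 3000::1",
)
out = end_psp(pkt)
```

After the call, `out` carries the final segment as its destination and no longer has an SRH, which matches the shape of the slide's later header stack where the outer header's next header is IPv6.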
In-network ML acceleration
by Microsoft, KAUST, UofW, Barefoot
• In-network aggregation of model updates
• Switch role
  • Integer vector addition
  • Counting and comparison
• Integration
  • TensorFlow using Horovod
  • PyTorch/Caffe2 using Gloo
Up to 3x training speedup
Paper: https://arxiv.org/abs/1903.06701
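The switch role listed above (integer vector addition, then counting and comparison) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the implementation from the paper. `AggregatorSlot`, `quantize`, `dequantize`, and the `SCALE` factor are hypothetical, and the sketch assumes workers convert floating-point updates to fixed-point integers before sending them.

```python
from typing import List, Optional

SCALE = 1 << 16  # assumed fixed-point scaling factor applied by the workers


def quantize(update: List[float]) -> List[int]:
    return [round(x * SCALE) for x in update]


def dequantize(total: List[int]) -> List[float]:
    return [x / SCALE for x in total]


class AggregatorSlot:
    """Hypothetical model of one aggregation slot held in the switch."""

    def __init__(self, num_workers: int, width: int):
        self.num_workers = num_workers
        self.values = [0] * width
        self.count = 0

    def add(self, update: List[int]) -> Optional[List[int]]:
        # Integer vector addition: accumulate this worker's contribution.
        self.values = [a + b for a, b in zip(self.values, update)]
        # Counting and comparison: release the sum once all workers are in.
        self.count += 1
        if self.count == self.num_workers:
            result = self.values
            self.values = [0] * len(result)
            self.count = 0
            return result
        return None


slot = AggregatorSlot(num_workers=3, width=2)
updates = [[0.5, -1.0], [0.25, 2.0], [0.25, 1.0]]
outputs = [slot.add(quantize(u)) for u in updates]
summed = dequantize(outputs[-1])
```

Only the last contribution returns a result; the earlier calls return `None`, modeling the switch holding the partial sum until every worker has contributed.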
Cache-coherent Interconnect
• OmniXtend™: a new open approach to providing cache-coherent memory over an Ethernet fabric.
• Implemented using a P4-programmable Tofino Ethernet switch ASIC.
https://blog.westerndigital.com/omnixtend-fabric-innovation-with-risc-v/
https://github.com/westerndigitalcorporation/omnixtend
Benefits of P4* Programmable Switches
• Data-Plane Telemetry: Instrument the data plane with Barefoot SPRINT™ Smart, Programmable, Real-time, In-band Network Telemetry (INT)
• Scale: Scale data-plane resources to match the needs of hyperscale and service infrastructure
• Speed & Agility: Adapt and innovate at the speed of software
• Future Proof: Continuously deliver and evolve features on the same hardware
• Reduced Complexity: Strip out complexity that isn’t needed by the user, application, and operator
P4/FPGA, Packet Acceleration
