P4/FPGA, Packet Acceleration

Prem Jonnalagadda, Barefoot Networks, an Intel company

Notices and Disclaimers
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration.
No product or component can be absolutely secure.
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. For more complete information about performance and benchmark results, visit http://www.intel.com/benchmarks .
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/benchmarks .
Intel® Advanced Vector Extensions (Intel® AVX)* provides higher throughput to certain processor operations. Due to varying processor power characteristics, utilizing AVX instructions may cause a) some parts to operate at less than the rated frequency and b) some parts with Intel® Turbo Boost Technology 2.0 to not achieve any or maximum turbo frequencies. Performance varies depending on hardware, software, and system configuration and you can learn more at http://www.intel.com/go/turbo.
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.
Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.
Intel, the Intel logo and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as property of others.
OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.
© 2019 Intel Corporation.

Putting the Network Owners in the Driving Seat
Fixed-function switch (constrained): “This is how I process packets …”
• Network Operating & Switch OS
• Driver
• Fixed-function switch
Programmable switch (scalable): “This is how you must process packets”
• Network Operating & Switch OS
• Driver
• Programmable Switch
• Defined by: Headers and Metadata, Parser, Tables and Controls

Fixed vs. Programmable packet processing
Fixed Pipeline: features and table-sizes are baked in at design time
• Buffer
• Fixed Parser
• IPv4 Logic with IPv4 Address Table
• ACL Logic with ACL TCAM
• MPLS Logic with MPLS Tag Table
• Ethernet Logic with Ethernet MAC Address Table
Programmable Pipeline: all stages identical, customer-defined match-action logic
• Buffer
• Programmable Parser: you declare which headers are recognized
• A series of identical stages, each built from match (M) and action (A) units: you declare what tables are needed and how packets are processed
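To make "you declare which headers are recognized" concrete, here is a minimal Python sketch of what a programmable parser does for one header. It is a software model only, not Tofino or P4 code. The field names follow the `ethernet_t` header shown later in the deck; the state names (`parse_ipv4`, `parse_ipv6`, `accept`) and the set of recognized EtherTypes are assumptions made for illustration.

```python
from typing import Dict, Tuple

# Assumed set of recognized next headers; a real program declares its own.
NEXT_STATE = {
    0x0800: "parse_ipv4",
    0x86DD: "parse_ipv6",
}

ETHERNET_LEN = 14  # 48-bit dstAddr + 48-bit srcAddr + 16-bit etherType


def parse_ethernet(packet: bytes) -> Tuple[Dict[str, int], bytes]:
    """Extract the Ethernet header fields and return them with the remaining bytes."""
    if len(packet) < ETHERNET_LEN:
        raise ValueError("packet too short for an Ethernet header")
    header = {
        "dstAddr": int.from_bytes(packet[0:6], "big"),
        "srcAddr": int.from_bytes(packet[6:12], "big"),
        "etherType": int.from_bytes(packet[12:14], "big"),
    }
    return header, packet[ETHERNET_LEN:]


def next_state(ether_type: int) -> str:
    """Select the next parser state from etherType; unknown types stop parsing."""
    return NEXT_STATE.get(ether_type, "accept")
```

Recognizing a new protocol in this model means adding one entry to `NEXT_STATE` and one extraction function, which mirrors the slide's point that the parser is declared by the user and not baked into the hardware.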

Domain-Specific Architectures (DSAs)
Domain: language -> compiler -> target
• Computers: Java* -> Compiler -> CPU
• Graphics: OpenCL™ -> Compiler -> GPU
• Signal Processing: Matlab* -> Compiler -> DSP
• Machine Learning: TensorFlow* -> Compiler -> TPU
• Networking: P4* -> Compiler -> Tofino™ & Tofino™ 2 (PISA)
*Other names and brands may be claimed as the property of others.

Tofino™ & Tofino™ 2 Ethernet switch ASIC Family
Series 12.8 Tbps 8.0 Tbps 6.4 Tbps 3.2 Tbps 2.0 Tbps Markets
U-series ✓ ✓ ✓
Ultra – Large Capacity & Highest Features
• Service Provider
  • 5G Core, Edge Compute, …
• Hyperscale
  • ML/DL Fabrics, Load Balancers, Firewalls, …
• Storage – Next Generation Interconnect
M-series ✓ ✓
Mainstream – High Bandwidth & Feature Rich
• Enterprise
• Data Center Leaf and Spine
H-series ✓
Hyperscale – Optimized for Power & Efficiency
• Efficient Interconnect for Compute
• Reduced Complexity & Low Latency

P4 Code Example

Define MAC fields
header ethernet_t {
    bit<48> dstAddr;
    bit<48> srcAddr;
    bit<16> etherType;
}

Define table matching on MAC field
table mac {
    key = {
        ingress_metadata.bd : exact;
        l2_metadata.lkp_mac_da : exact;
    }
    actions = {
        dmac_hit;
        dmac_miss;
        dmac_redirect_to_cpu;
    }
    default_action = dmac_miss;
    size = MAC_TABLE_SIZE;
}

Define table actions
action dmac_hit(bit<16> ifindex, bit<16> port_lag_index) {
    ingress_metadata.egress_ifindex = ifindex;
    ingress_metadata.egress_port_lag_index = port_lag_index;
    // XOR the running check value with the egress ifindex
    l2_metadata.same_if_check = l2_metadata.same_if_check ^ ifindex;
}

• Open source
• Protocol independent
• Target independent
• Vendor independent
• P4 compiler -> P4Runtime -> switch
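The match-action behavior of the `mac` table can be modeled in a few lines of Python. This is a sketch for intuition, not how a switch ASIC or the P4 compiler implements tables. The names `ExactTable`, `Metadata` and the `missed` flag are hypothetical, `MAC_TABLE_SIZE` is given an arbitrary value because the slide does not define it, and the sketch assumes `same_if_check` was initialized to the ingress ifindex earlier in the pipeline, so that a result of 0 after the XOR means the packet would leave on the interface it arrived on.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

MAC_TABLE_SIZE = 1024  # hypothetical capacity; not specified on the slide


@dataclass
class Metadata:
    bd: int                     # models ingress_metadata.bd
    lkp_mac_da: int             # models l2_metadata.lkp_mac_da
    same_if_check: int          # assumed to start as the ingress ifindex
    egress_ifindex: int = 0
    egress_port_lag_index: int = 0
    missed: bool = False        # hypothetical flag set by dmac_miss


def dmac_hit(meta: Metadata, ifindex: int, port_lag_index: int) -> None:
    meta.egress_ifindex = ifindex
    meta.egress_port_lag_index = port_lag_index
    meta.same_if_check ^= ifindex  # becomes 0 when ingress == egress


def dmac_miss(meta: Metadata) -> None:
    meta.missed = True


class ExactTable:
    """Exact-match table: key tuple -> (action, action parameters)."""

    def __init__(self, size: int, default_action: Callable) -> None:
        self.size = size
        self.default_action = default_action
        self.entries: Dict[Tuple[int, int], Tuple[Callable, tuple]] = {}

    def add_entry(self, key: Tuple[int, int], action: Callable, *params: int) -> None:
        if key not in self.entries and len(self.entries) >= self.size:
            raise OverflowError("table is full")
        self.entries[key] = (action, params)

    def apply(self, meta: Metadata) -> None:
        key = (meta.bd, meta.lkp_mac_da)
        action, params = self.entries.get(key, (self.default_action, ()))
        action(meta, *params)
```

In this model the control plane calls `add_entry` at run time while the data plane calls `apply` per packet, which is the split that the compiler, runtime and switch chain on the slide refers to.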

https://p4.org/assets/P4WS_2019/Speaker_Slides/1_9am_Intro.pdf

Barefoot P4 Studio Architecture
Diagram components:
• Switch P4* Application
• Tofino™ Native Architecture
• Control Plane (Local/Remote)
• Barefoot Runtime Interface
• Barefoot Model-driven Abstraction Interface
• Switch Application APIs
• SAI
• Packet Test Framework
• Unified ASIC Driver
• ASIC, ASIC Model
• P4 Insight
• Barefoot P4 Compiler

Segment Routing v6 - Mobile User Plane

SRv6 traffic flow: enhanced mode with unchanged gNodeB
Path: User Equipment, gNB, SRGW, UPF1, UPF2, Data Network (SRv6 Core Network)
Behaviors along the path: End.M.GTP6.D, End (PSP), End.DT6
Packet headers along the path:
• IPv6 (SA: 8000::1, DA: 2000::1, NH: IPv6) / UDP (DP: 0x0868) / GTP-U (TEID: 0x1234) / IPv6 (SA: 1000::1, DA: 3000::1, NH: UDP)
• IPv6 (SA: 9000::1, DA: 7000::1, NH: SRH) / SRH (Type: 4, NH: IPv6, Segment List: [0] 5000::1, [1] 6000::1) / IPv6 (SA: 1000::1, DA: 3000::1, NH: UDP)
• IPv6 (SA: 9000::1, DA: , NH: IPv6) / IPv6 (SA: 1000::1, DA: 3000::1, NH: UDP)
• IPv6 (SA: 1000::1, DA: 3000::1, NH: UDP)

SRv6 traffic flow: traditional mode
Path: User Equipment, gNB, UPF1, UPF2, Data Network (SRv6 Core Network)
Behaviors along the path: T.Encaps.Red, End.MAP, End.DT6
Packet headers along the path:
• IPv6 (SA: 1000::1, DA: 2000::1, NH: IPv6)
• IPv6 (SA: 5000::1, DA: 3000::1, NH: IPv6) / IPv6 (SA: 1000::1, DA: 2000::1, NH: UDP)
• IPv6 (SA: 5000::1, DA: 4000::1, NH: IPv6) / IPv6 (SA: 1000::1, DA: 2000::1, NH: UDP)
• IPv6 (SA: 1000::1, DA: 2000::1, NH: IPv6)
https://tools.ietf.org/html/draft-ietf-dmm-srv6-mobile-uplane-02#page-11
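As a rough illustration of the End (PSP) behavior named on this slide, the Python sketch below models the general rule from the SRv6 network programming specification: decrement Segments Left, copy the newly active segment into the IPv6 destination address, and, with penultimate segment pop, remove the SRH once Segments Left reaches 0. The `SRv6Packet` class is hypothetical, hop-limit handling and error signaling are omitted, and the sketch is not derived from the Tofino implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SRv6Packet:
    src: str
    dst: str
    # Segment list as carried in the SRH: index 0 is the LAST segment of the path.
    # None models a packet without an SRH.
    segment_list: Optional[List[str]] = None
    segments_left: int = 0


def end_psp(pkt: SRv6Packet) -> SRv6Packet:
    """Apply End with PSP: advance to the next segment, pop the SRH at SL == 0."""
    if pkt.segment_list is None or pkt.segments_left == 0:
        # Simplification: a real node continues with upper-layer processing here.
        raise ValueError("End expects an SRH with Segments Left > 0")
    pkt.segments_left -= 1
    pkt.dst = pkt.segment_list[pkt.segments_left]
    if pkt.segments_left == 0:
        pkt.segment_list = None  # PSP: the SRH is no longer needed downstream
    return pkt
```

Because the SRH is removed one hop early, the final node only has to handle a plain IPv6-in-IPv6 packet, which is why PSP is commonly paired with a decapsulating behavior such as End.DT6.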

In-network ML acceleration
by Microsoft, KAUST, UofW, Barefoot
• In-network aggregation of model updates
• Switch role
  • Integer vector addition
  • Counting and comparison
• Integration
  • TensorFlow using Horovod
  • PyTorch/Caffe2 using Gloo
Up to 3x training speedup
Paper: https://arxiv.org/abs/1903.06701
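The switch role listed above (integer vector addition plus counting and comparison) can be sketched as a toy Python model. This illustrates the idea only and is not the implementation from the linked paper: the class name `AggregationSlot`, the fixed-point scale factor, and the absence of loss recovery or duplicate handling are all simplifying assumptions.

```python
from typing import List, Optional

SCALE = 1 << 16  # assumed fixed-point scale; workers and switch must agree on it


def quantize(values: List[float]) -> List[int]:
    """Worker side: convert float updates to integers the switch can add."""
    return [round(v * SCALE) for v in values]


def dequantize(values: List[int]) -> List[float]:
    """Worker side: convert the aggregated integers back to floats."""
    return [v / SCALE for v in values]


class AggregationSlot:
    """Switch side: one slot that sums integer vectors from all workers."""

    def __init__(self, num_workers: int, width: int) -> None:
        self.num_workers = num_workers
        self.values = [0] * width
        self.count = 0

    def add(self, update: List[int]) -> Optional[List[int]]:
        if len(update) != len(self.values):
            raise ValueError("update width does not match slot width")
        # Integer vector addition
        self.values = [a + b for a, b in zip(self.values, update)]
        # Counting and comparison: release the sum once every worker contributed
        self.count += 1
        if self.count < self.num_workers:
            return None
        result = self.values
        self.values = [0] * len(result)
        self.count = 0
        return result
```

Each worker sends `quantize(gradient_chunk)` and receives the summed vector once the last contribution arrives, so the switch replaces a round of worker-to-worker exchange with a single pass through the data plane.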

Cache-coherent Interconnect
• OmniXtend™: a new open approach to providing cache-coherent memory over an Ethernet fabric.
• Implemented using the P4-programmable Tofino Ethernet switch ASIC.
https://blog.westerndigital.com/omnixtend-fabric-innovation-with-risc-v/
https://github.com/westerndigitalcorporation/omnixtend

Benefits of P4* Programmable Switches
• Data-Plane Telemetry: Instrument the data plane with Barefoot SPRINT™ Smart, Programmable, Real-time, In-band Network Telemetry (INT)
• Scale: Scale data-plane resources to match the needs of hyperscale and service infrastructure
• Speed & Agility: Adapt and innovate at the speed of software
• Future Proof: Continuously deliver and evolve features on the same hardware
• Reduced Complexity: Strip out complexity that isn’t needed by the user, application and operator
