LSFMM Verifier Optimizations and 1M Instructions

This document discusses recent optimizations to the Linux kernel's BPF verifier including rare explored state removal, read marking backpropagation pruning, and removing a large verifier lock. It analyzes profiling results showing the top functions consuming cycles in the BPF verifier. Additionally, it proposes further optimizations such as pruning point analysis and elimination, in-place branch pruning, and tail elimination. Finally, it acknowledges challenges for verifying very large 1 million instruction programs.

© 2018 NETRONOME SYSTEMS, INC. 1
Verifier optimization work
Jakub Kicinski <kuba@kernel.org>
LSFMM
BPF Microconference
San Juan, 2 May 2019

Recent optimizations from Alexei
● rare explored state removal
most explored states never prune any later walks - remove states after:
miss_cnt > 3 + hit_cnt * 3
● read marking backpropagation pruning
read marks are propagated to source states; once a state with the read mark
already set is reached, propagation can stop
● big verifier lock removal
already covered
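
The two heuristics above can be sketched in plain C. This is a minimal illustration, not the kernel code: the struct and function names are hypothetical, and only the threshold `miss_cnt > 3 + hit_cnt * 3` and the "stop at an already-marked state" rule come from the slide.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-state counters; names follow the slide's formula. */
struct explored_stats {
    unsigned int miss_cnt; /* times the state was compared and did not prune */
    unsigned int hit_cnt;  /* times the state pruned a later walk */
};

/* Eviction rule from the slide: drop states that keep missing. */
static bool should_evict(const struct explored_stats *s)
{
    return s->miss_cnt > 3 + s->hit_cnt * 3;
}

/* Hypothetical state with a link to the state it was derived from. */
struct state_sketch {
    struct state_sketch *parent;
    bool read_mark;
};

/*
 * Propagate a read mark towards the source states. Once a state that
 * already carries the mark is reached, everything above it was marked
 * by an earlier walk, so the loop stops. Returns the number of states
 * visited, which makes the early exit observable.
 */
static int propagate_read_mark(struct state_sketch *st)
{
    int visited = 0;

    while (st) {
        visited++;
        if (st->read_mark)
            break;
        st->read_mark = true;
        st = st->parent;
    }
    return visited;
}
```

With `hit_cnt == 0`, a state survives three misses and is evicted on the fourth; every hit buys three more misses.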

Cycles spent*
* sum over Cilium test programs
Function               cycles   % do_check   % insn prog   % insn walk
Total (do_check)         2613      100.00%
copy_verifier_state       558       21.35%
regsafe                   368       14.08%
free_verifier_state       167        6.39%
check_cond_jmp_op         252        9.64%        10.13%        10.15%
check_alu_op              100        3.83%        59.13%        57.02%
check_mem_access           89        3.41%        23.53%        26.28%
check_helper_call          80        3.06%         5.65%         4.62%
mark_reg_read             229        8.76%
mark_reg_unknown           71        2.72%
mark_reg_known             15        0.57%

Trivial micro optimization - avoid the use of zalloc+memcpy
19.41%
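
The micro optimization named above can be illustrated in userspace C. This is a sketch under assumptions: `calloc` and `malloc` stand in for the kernel's zeroing and non-zeroing allocators, and `struct frame_sketch` is a hypothetical stand-in for the state being copied. The point is only that zeroing a buffer is wasted work when `memcpy` overwrites every byte right afterwards.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a verifier state; the layout is illustrative. */
struct frame_sketch {
    uint64_t regs[11];
    uint8_t stack[512];
};

/* Zeroing allocation followed by a full overwrite: the zeroing is redundant. */
static struct frame_sketch *copy_with_zeroing(const struct frame_sketch *src)
{
    struct frame_sketch *dst = calloc(1, sizeof(*dst));

    if (!dst)
        return NULL;
    memcpy(dst, src, sizeof(*dst));
    return dst;
}

/* Plain allocation followed by the same overwrite: same result, less work. */
static struct frame_sketch *copy_without_zeroing(const struct frame_sketch *src)
{
    struct frame_sketch *dst = malloc(sizeof(*dst));

    if (!dst)
        return NULL;
    memcpy(dst, src, sizeof(*dst));
    return dst;
}
```

The second variant is only safe when the copy covers the whole allocation; a partial copy would leave uninitialized bytes behind.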

Pruning point analysis
n prunes   sum(points)
       0          5137
       1           615
       2           242
       3           167
       4            51
       5            39
       6            45
       7            19
       8            24
       9            17
      10            11
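
As a worked check on the table, the share of pruning points that never prune can be computed from the rows shown: 5137 of the 6367 points listed, about 80.7%. The helper below is a hypothetical sketch and assumes the listed rows are the whole histogram; any rows beyond n = 10 would lower the share slightly.

```c
#include <stddef.h>

/*
 * points[i] holds the number of pruning points that pruned exactly i
 * times. Returns the fraction of points with zero prunes, or 0.0 for an
 * empty histogram.
 */
static double zero_prune_share(const unsigned int *points, size_t n)
{
    unsigned long total = 0;

    if (n == 0)
        return 0.0;
    for (size_t i = 0; i < n; i++)
        total += points[i];
    return total ? (double)points[0] / (double)total : 0.0;
}
```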

Pruning point elimination
● pruning points are too dense - one every 3.8 instructions in Cilium progs
● 80% of conditional branch pruning points have 0 hits
● replacing the pruning heuristic with marking every 10th instruction gives
4-20% do_check speedup for Cilium progs
● 33% more instructions walked
● no good heuristic apparent yet
● pruning on fall through insn, rather than jmp - 4%
● in-place branch pruning
Branch     9279   27.55%
Shallow    4641   13.78%
Pruning   24397   72.45%
Total     33676
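
The "every 10th instruction" experiment can be sketched as a marking pass over a per-instruction flag array. This is an assumption-laden illustration: the function name and the flag array are hypothetical, and the sketch ignores any places where the verifier needs a pruning point regardless of spacing.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Mark instruction indices 0, stride, 2*stride, ... as pruning points and
 * clear the flag everywhere else. Returns the number of points marked.
 * A stride of 0 is rejected and leaves the array untouched.
 */
static size_t mark_every_nth(bool *prune_point, size_t insn_cnt, size_t stride)
{
    size_t marked = 0;

    if (stride == 0)
        return 0;
    for (size_t i = 0; i < insn_cnt; i++) {
        prune_point[i] = (i % stride == 0);
        if (prune_point[i])
            marked++;
    }
    return marked;
}
```

For a 38-instruction program and a stride of 10 this marks four points, against roughly ten at the density of one per 3.8 instructions quoted above.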

Other ideas
● tail elimination:
r0 = const
exit
covered by the shallow branch optimization
● pure function detection/pruning (callsite independent)
real-life benefit unclear due to small number of no-inline samples
● “fudge” builtin:
var = __builtin_constant_relaxed(5, 0xff)
hints the verifier should loosen the info about the constant
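
The tail pattern from the first bullet (`r0 = const` followed by `exit`) could be recognized with a check like the one below. This is a sketch, not verifier code: `struct insn_sketch` is a simplified, hypothetical stand-in for an eBPF instruction (the register fields are not packed into nibbles here), and the opcode constants are the standard eBPF encodings for a move-immediate and for exit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified instruction layout for illustration only. */
struct insn_sketch {
    uint8_t code;
    uint8_t dst_reg;
    uint8_t src_reg;
    int16_t off;
    int32_t imm;
};

#define OP_MOV64_IMM 0xb7 /* BPF_ALU64 | BPF_MOV | BPF_K */
#define OP_MOV32_IMM 0xb4 /* BPF_ALU   | BPF_MOV | BPF_K */
#define OP_EXIT      0x95 /* BPF_JMP   | BPF_EXIT */

/* True if insns[idx] is "r0 = imm" and insns[idx + 1] is "exit". */
static bool is_const_exit_tail(const struct insn_sketch *insns, size_t len,
                               size_t idx)
{
    if (len < 2 || idx > len - 2)
        return false;

    const struct insn_sketch *mov = &insns[idx];
    const struct insn_sketch *ext = &insns[idx + 1];
    bool mov_imm = mov->code == OP_MOV64_IMM || mov->code == OP_MOV32_IMM;

    return mov_imm && mov->dst_reg == 0 && ext->code == OP_EXIT;
}
```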

1M instruction challenges
● jump offset (16 bit)
● instruction patching is quadratic
● pruning state grows as O(stack frames x prog len)
● execution time estimation?
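
The jump offset limit from the first bullet can be made concrete. eBPF jump instructions carry a signed 16-bit offset counted in instructions from the instruction after the jump, so a single jump reaches at most 32767 instructions forward or 32768 back, far short of a 1M-instruction program. The helper name below is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * A jump at index from_idx lands on from_idx + off + 1, so the encoded
 * offset for a given target is to_idx - from_idx - 1. The jump is
 * encodable only if that value fits in a signed 16-bit field.
 */
static bool jump_fits_s16(int64_t from_idx, int64_t to_idx)
{
    int64_t off = to_idx - from_idx - 1;

    return off >= INT16_MIN && off <= INT16_MAX;
}
```

A jump from instruction 0 to instruction 32768 still fits (offset 32767); one instruction further does not.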


Product School

LFSMM Verifier Optimizations and 1 M Instructions

2. Recent optimizations from Alexei
© 2019 NETRONOME SYSTEMS, INC.

● rare explored state removal
  most explored states never prune any later walks - remove states after:
  miss_cnt > 3 + hit_cnt * 3
● read marking backpropagation pruning
  read marks are propagated to source states; once a state with the read mark already set is reached, propagation can stop
● big verifier lock removal
  already covered
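The first two bullets can be illustrated with a minimal user-space sketch. It is not the kernel code: the struct layouts and the function names `should_evict` and `propagate_read` are hypothetical, and only the threshold `miss_cnt > 3 + hit_cnt * 3` comes from the slide. The second function assumes a simplified model with a single read mark per state and one parent pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-state counters; field names follow the slide. */
struct explored_state {
    uint32_t hit_cnt;   /* times this state pruned a later walk */
    uint32_t miss_cnt;  /* times it was compared without pruning */
};

/* Eviction test from the slide: miss_cnt > 3 + hit_cnt * 3. */
static bool should_evict(const struct explored_state *st)
{
    assert(st != NULL);
    return st->miss_cnt > 3 + st->hit_cnt * 3;
}

/* Hypothetical, simplified state node: one read mark, one parent. */
struct state_node {
    struct state_node *parent;
    bool read_mark;
};

/*
 * Propagate a read mark towards the source states. The walk stops at the
 * first node that already carries the mark, on the assumption that an
 * earlier propagation has already marked all of its ancestors.
 * Returns the number of nodes newly marked.
 */
static int propagate_read(struct state_node *st)
{
    int marked = 0;

    for (; st != NULL && !st->read_mark; st = st->parent) {
        st->read_mark = true;
        marked++;
    }
    return marked;
}
```

In this model a state with no hits is dropped on its fourth miss, and every hit buys it three more misses before eviction.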

3. Cycles spent*
* sum over Cilium test programs

Function              cycles  % do_check  % insn prog  % insn walk
Total (do_check)        2613     100.00%
copy_verifier_state      558      21.35%
regsafe                  368      14.08%
free_verifier_state      167       6.39%
check_cond_jmp_op        252       9.64%       10.13%       10.15%
check_alu_op             100       3.83%       59.13%       57.02%
check_mem_access          89       3.41%       23.53%       26.28%
check_helper_call         80       3.06%        5.65%        4.62%
mark_reg_read            229       8.76%
mark_reg_unknown          71       2.72%
mark_reg_known            15       0.57%

4. Cycles spent* (same table as slide 3, with one added annotation)

Trivial micro optimization - avoid the use of zalloc+memcpy: 19.41%
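The slide does not show the patch behind this annotation, so the following is only a user-space analogue of the general pattern: zeroing a buffer that is immediately overwritten by a full copy is wasted work. `struct reg_state` and both function names are hypothetical, and `calloc`/`malloc` stand in for the kernel's zeroing and non-zeroing allocators.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a piece of verifier state. */
struct reg_state {
    long value;
    int type;
    int live;
};

/* Before: the buffer is zeroed, then every byte is overwritten. */
static struct reg_state *copy_with_zalloc(const struct reg_state *src, size_t n)
{
    struct reg_state *dst = calloc(n, sizeof(*dst));

    if (dst == NULL)
        return NULL;
    memcpy(dst, src, n * sizeof(*dst));
    return dst;
}

/* After: skip the zeroing, since memcpy fills the whole buffer anyway. */
static struct reg_state *copy_without_zeroing(const struct reg_state *src, size_t n)
{
    struct reg_state *dst;

    assert(src != NULL && n > 0);
    dst = malloc(n * sizeof(*dst));
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, n * sizeof(*dst));
    return dst;
}
```

Both functions return identical copies; the second only avoids one pass over the destination memory. This is safe only when the copy covers the entire allocation.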

6. Pruning point elimination

● pruning points are too dense - one every 3.8 instructions in Cilium progs
● 80% of conditional branch pruning points have 0 hits
● replacing the pruning heuristic with marking every 10th instruction gives a 4-20% do_check speedup for Cilium progs
● 33% more instructions walked
● no good heuristic apparent, yet
● pruning on fall through insn, rather than jmp - 4%
● in-place branch pruning

           count        %
Branch      9279   27.55%
Shallow     4641   13.78%
Pruning    24397   72.45%
Total      33676
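The "marking every 10th instruction" experiment can be sketched as follows. This is a minimal illustration, not the verifier's code: the flag array, the function name `mark_every_nth`, and the choice to mark instruction 0 are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Mark every `stride`-th instruction (starting at index 0) as a pruning
 * point and clear the flag on all others. Returns the number of points.
 */
static size_t mark_every_nth(bool *is_prune_point, size_t insn_cnt, size_t stride)
{
    size_t marked = 0;

    assert(is_prune_point != NULL && stride > 0);
    for (size_t i = 0; i < insn_cnt; i++) {
        is_prune_point[i] = (i % stride == 0);
        if (is_prune_point[i])
            marked++;
    }
    return marked;
}
```

With a stride of 10 the density drops from roughly one point per 3.8 instructions to one per 10, which matches the trade-off on the slide: fewer state comparisons, but more instructions walked because fewer walks get pruned.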

7. Other ideas

● tail elimination:
  r0 = const
  exit
  covered by the shallow branch optimization
● pure function detection/pruning (callsite independent)
  real-life benefit unclear due to small number of no-inline samples
● "fudge" builtin:
  var = __builtin_constant_relaxed(5, 0xff)
  hints the verifier should loosen the info about the constant

8. 1M instruction challenges

● jump offset (16 bit)
● instruction patching is quadratic
● pruning state grows as O(stack frames x prog len)
● execution time estimation?
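The jump offset bullet can be made concrete with a small range check. The sketch assumes the usual BPF encoding, where the jump offset is a signed 16-bit field counted in instructions relative to the instruction after the jump; the function name `jump_fits_s16` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if a jump located at instruction index `from` can reach
 * instruction index `to` with a signed 16-bit offset, where the target
 * is computed as from + 1 + off.
 */
static bool jump_fits_s16(int64_t from, int64_t to)
{
    int64_t off;

    assert(from >= 0 && to >= 0);
    off = to - (from + 1);
    return off >= INT16_MIN && off <= INT16_MAX;
}
```

Under that assumption a single jump spans at most 32767 instructions forward or 32768 backward, so in a 1M-instruction program most instruction pairs cannot be connected by one jump.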
