OpenCAPI is an open standard interface that provides high bandwidth and low latency connections between processors, accelerators, memory and storage. It addresses the growing need for increased performance driven by workloads like AI and the limitations of Moore's Law. OpenCAPI supports a heterogeneous system architecture with technologies like FPGAs and different memory types. It uses a thin protocol stack and virtual addressing to minimize latency. The SNAP framework also makes programming accelerators using OpenCAPI easier by abstracting the hardware details.
P9 PNOR and OpenBMC Overview - Yutaka Kawai
This document provides an overview of P9 PNOR (BIOS) and OpenBMC. It discusses firmware components like SBE, Hostboot, OCC, OPAL, and Petitboot that make up the PNOR. It explains how to build your own PNOR image and view boot logs from the OpenBMC console. It also covers topics like OpenBMC overview, roadmap, and a demonstration of the latest web UI.
BKK16-308: The Auto-Tuned Optimization System (ATOS) - Linaro
ATOS is an Auto-Tuned Optimization System that automatically finds the best performance/size trade-off for a build system and a training application. The inputs to the ATOS tools are a build command and a run command. From the build command, ATOS infers an internal build configuration, which it then rebuilds with different sets of compiler options. Each build configuration is executed with the run command, from which code size and performance are extracted.
From the set of build configurations that ATOS explores, one can extract the preferred trade-off between code size and performance. The extracted build configuration can be archived and replayed later in order to generate the optimized executable without any modification into the initial build system.
The nice property of ATOS is that NO modification of the sources or the makefiles is needed. ATOS can work on any large or deep project, as long as the compiler used is GCC or LLVM under Linux.
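The exploration loop described above can be sketched abstractly. This is a conceptual model only, not the atos-utils CLI: the build and run steps are stubbed, and the option sets and numbers are invented for illustration.

```python
# Conceptual sketch of an ATOS-style exploration loop (not the atos-utils CLI).
# build() and run() are stubs; real ATOS wraps your own build and run commands.

OPTION_SETS = ["-O2", "-O3", "-Os", "-O3 -funroll-loops"]

def build(options):
    # Stub: pretend code size (bytes) depends on the options chosen.
    return {"-O2": 100, "-O3": 130, "-Os": 80, "-O3 -funroll-loops": 150}[options]

def run(options):
    # Stub: pretend runtime (seconds) of the training application.
    return {"-O2": 10.0, "-O3": 8.0, "-Os": 12.0, "-O3 -funroll-loops": 7.5}[options]

def explore(option_sets):
    # One result per build configuration: (options, size, time).
    return [(opts, build(opts), run(opts)) for opts in option_sets]

def pick_tradeoff(results, max_size):
    # Extract the preferred trade-off: fastest build that fits the size budget.
    fitting = [r for r in results if r[1] <= max_size]
    return min(fitting, key=lambda r: r[2]) if fitting else None
```

The winning configuration is what ATOS archives so the optimized build can be replayed later without touching the original build system.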
BKK16-312: Integrating and controlling embedded devices in LAVA - Linaro
Previous introductory tutorials on LAVA have focussed on virtual platforms. This is an end-to-end tutorial as a basis to evaluate LAVA with one or more embedded targets using U-Boot. It integrates both a physical bootloader device with a stand-alone installation of LAVA, along with a simple PDU for target power control which is based on off-the-shelf Arduino components and fully integrated with pdudaemon. It covers device requirements, device configuration for 32- and 64-bit platforms, use of lavatool, tftp, pduclient and logging via the LAVA web interface and /var.
The document provides instructions for running an Intel DPDK hands-on session to demonstrate packet forwarding using the l3fwd example. It describes downloading and compiling DPDK, getting and applying patches to l3fwd, configuring three VMs with pktgen to generate and receive packets and l3fwd to forward between them, and running l3fwd and pktgen manually or automatically on system startup.
The document discusses porting uClinux, a Linux distribution for systems without memory management units (MMUs), to a new processor architecture. It describes uClinux and how it differs from a standard Linux distribution by not having virtual memory or memory protection. It then discusses the specific port of uClinux to the SiTel SC14450 processor, including an overview of the chip and architecture, the approach taken which involved porting the uClinux kernel, uClibc library, and elf2flt binary conversion tool, and some of the challenges encountered like debugging.
An overview of Intel® Omni-Path Architecture (Intel® OPA), its main APIs and software capabilities, which includes high-performance computing (HPC) applications and cluster management features.
These slides were presented at FPGA Extreme Conference #6, held at Dwango on Feb 1st, 2015. (They were originally in Japanese and have been translated to English.)
The audience was people new to OpenFlow and hardware-based network processing, but interested in how FPGAs are used in network processing.
Event home page (Japanese only):
http://connpass.com/event/10638/
Challenges for Deploying a High-Performance Computing Application to the Cloud - Intel® Software
The cloud computing environment consists of many providers with endless hardware configurations, pricing options, and geographic constraints. One of the main attractions of high-performance computing (HPC) in the cloud is the large amount and variety of hardware available. This makes the software developer responsible for writing software that is highly scalable and portable across different environments. Once you have a scalable piece of software, the challenge becomes how to test and deploy it in the cloud. Successfully running on these different clouds is not straightforward.
In this session, we lay out the challenges related to launching software in the cloud and show how Rescale* tools can be used to resolve them.
This document provides an overview of Vector Packet Processing (VPP), an open source packet processing platform developed as part of the FD.io project. VPP is based on DPDK for high performance packet processing in userspace. It includes a full networking stack and can perform L2/L3 forwarding and routing at speeds of over 14 million packets per second on a single core. VPP processing is divided into individual nodes connected by a graph. Packets are passed between nodes as vectors to support batch processing. VPP supports both single and multicore modes using different threading models. It can be used to implement routers, switches, and other network functions and topologies.
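The node-graph and vector ideas described above can be sketched in a few lines. This is a conceptual model only, not the VPP API: the node names, the `Packet` type, and its fields are invented for illustration.

```python
# Conceptual sketch of vector (batch) processing through a node graph.
# Not the VPP API: node names and the Packet type are invented for illustration.
from dataclasses import dataclass

@dataclass
class Packet:
    dst: str          # toy destination address
    ttl: int

def ethernet_input(vec):
    # Process the whole vector at once; instruction cache stays hot per batch.
    return [("ip4-lookup", p) for p in vec]

def ip4_lookup(vec):
    out = []
    for p in vec:
        p.ttl -= 1
        out.append(("ip4-drop" if p.ttl <= 0 else "ip4-rewrite", p))
    return out

NODES = {"ethernet-input": ethernet_input, "ip4-lookup": ip4_lookup}

def run_graph(vec, start="ethernet-input"):
    # Each node hands vectors of packets to its successor nodes.
    pending = {start: vec}
    done = []
    while pending:
        name, batch = pending.popitem()
        if name not in NODES:            # terminal node (rewrite/drop)
            done += [(name, p) for p in batch]
            continue
        for next_name, p in NODES[name](batch):
            pending.setdefault(next_name, []).append(p)
    return done
```

The point of the vector is amortization: each node's code runs once per batch rather than once per packet, which is a large part of how VPP sustains its per-core rates.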
Introduction to OpenDaylight & Application Development - Michelle Holley
This document provides an introduction to OpenDaylight, an open source platform for Software-Defined Networking (SDN). It outlines what OpenDaylight is, its community and releases, the components within OpenDaylight including northbound and southbound interfaces, and some example network applications that can be built on OpenDaylight. It also provides an overview of how to develop applications using OpenDaylight, covering technologies like OSGi, MD-SAL, and the Yang modeling language.
FBTFTP: an open source framework to build dynamic TFTP servers - Angelo Failla
Talk given at EuroPython2016, Bilbao:
https://ep2016.europython.eu/conference/talks/fbtftp-facebooks-python3-framework-for-tftp-servers
TFTP was first standardized in ’81 (same year I was born!) and one of its primary uses is in the early stage of network booting. TFTP is very simple to implement, and one of the reasons it is still in use is that its small footprint allows engineers to fit the code into very low resource, single board computers, system-on-a-chip implementations and mainboard chipsets, in the case of modern hardware.
It is therefore a crucial protocol deployed in almost every data center environment. It is used, together with DHCP, to chain load Network Boot Programs (NBPs), like Grub2 and iPXE. They allow machines to bootstrap themselves and install operating systems off of the network, downloading kernels and initrds via HTTP and starting them up.
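For reference, the wire format that makes TFTP so easy to implement is just a 2-byte opcode followed by a few fields (per RFC 1350). A minimal sketch of building and parsing the request and data packets:

```python
import struct

# TFTP opcodes (RFC 1350)
RRQ, WRQ, DATA, ACK, ERROR = 1, 2, 3, 4, 5

def build_rrq(filename, mode="octet"):
    # RRQ: | 01 | filename | 0 | mode | 0 |
    return (struct.pack("!H", RRQ)
            + filename.encode() + b"\x00" + mode.encode() + b"\x00")

def build_data(block, payload):
    # DATA: | 03 | block# | up to 512 bytes of data |
    return struct.pack("!HH", DATA, block) + payload

def parse_packet(pkt):
    (opcode,) = struct.unpack("!H", pkt[:2])
    if opcode in (RRQ, WRQ):
        filename, mode, _ = pkt[2:].split(b"\x00", 2)
        return opcode, filename.decode(), mode.decode()
    if opcode == DATA:
        (block,) = struct.unpack("!H", pkt[2:4])
        return opcode, block, pkt[4:]   # a payload < 512 bytes ends the transfer
    if opcode == ACK:
        return opcode, struct.unpack("!H", pkt[2:4])[0]
    raise ValueError("unhandled opcode %d" % opcode)
```

A network boot starts with exactly one such RRQ for the NBP (e.g. `pxelinux.0` or an iPXE binary), answered by a stream of DATA/ACK pairs.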
At Facebook, we have been using the standard in.tftpd daemon for years; however, we started to reach its limitations. These were partly due to our scale and the way TFTP was deployed in our infrastructure, and partly due to protocol specifications based on requirements from the '80s.
To address those limitations we ended up writing our own framework for creating dynamic TFTP servers in Python3, and we decided to open source it.
I will take you through the framework and the features it offers. I'll discuss the specific problems that motivated us to create it. We will look at practical examples of how to use it, along with a little code, to build your own servers tailored to your own infra needs.
Packet processing in the fast path involves looking up bit patterns and deciding on actions at line rate. These functions have traditionally been handled at line rate by ASICs and NPUs. However, with the availability of faster and cheaper CPUs and hardware/software acceleration, it is possible to move these functions onto commodity hardware. This tutorial will cover the various building blocks available to speed up packet processing, both hardware-based (e.g. SR-IOV, RDT, QAT, VMDq, VT-d) and software-based (e.g. DPDK, FD.io/VPP, OVS), and give hands-on lab experience with DPDK and FD.io fast-path lookup in the following sessions. 1: Introduction to Building Blocks: Sujata Tibrewala
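The central fast-path lookup the lab sessions revolve around is the route lookup. A toy longest-prefix-match table, in pure Python, illustrates the idea behind structures like DPDK's rte_lpm or VPP's FIB (this is not their actual API, and real implementations use trie or multi-level table structures, not a linear scan):

```python
import ipaddress

class Lpm:
    """Toy longest-prefix-match table (illustrative only; real fast paths
    use trie/DIR-24-8 style structures, not a sorted linear scan)."""
    def __init__(self):
        self.routes = []   # list of (network, next_hop)

    def add(self, prefix, next_hop):
        self.routes.append((ipaddress.ip_network(prefix), next_hop))
        # Keep longest prefixes first so the first hit is the longest match.
        self.routes.sort(key=lambda r: r[0].prefixlen, reverse=True)

    def lookup(self, addr):
        ip = ipaddress.ip_address(addr)
        for net, next_hop in self.routes:
            if ip in net:
                return next_hop
        return None
```

The hardware and software blocks listed above exist largely to make exactly this kind of per-packet decision cheap enough to run at line rate on commodity CPUs.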
OVS and DPDK - T.F. Herbert, K. Traynor, M. Gray - harryvanhaaren
The document discusses DPDK and software dataplane acceleration for Open vSwitch. It provides an overview of the OVS architecture and its evolution to integrate with DPDK. It shares one user's experience of initial challenges in using DPDK/OVS and improvements over time. Suggestions are made to improve areas like debugging, testing, documentation and training to enhance the usability of DPDK/OVS. Performance tuning techniques like using multiple threads are also briefly covered.
SDVIs and In-Situ Visualization on TACC's Stampede - Intel® Software
Speaker: Paul Navrátil, Texas Advanced Computing Center (TACC)
The design emphasis for supercomputing systems has moved from raw performance to performance-per-watt, and as a result, supercomputing architectures are converging on processors with wide vector units and many processing cores per chip. Such processors are capable of performant image rendering purely in software. This improved capability is fortuitous, since the prevailing homogeneous system designs lack dedicated, hardware-accelerated rendering subsystems for use in data visualization. Reliance on this “software-defined” rendering capability will grow in importance since, due to growing data sizes, visualizations must be performed on the same machine where the data is produced. Further, as data sizes outgrow disk I/O capacity, visualization will be increasingly incorporated into the simulation code itself (in situ visualization).
This talk presents recent work in high-fidelity visualization using the OSPRay ray tracing framework on TACC’s local and remote visualization systems. We present work using OSPRay within ParaView Catalyst in situ framework from Kitware, including capitalizing on opportunities to reduce data costs migrating through VTK filters for visualization. We highlight the performance opportunities and advantages of Intel® Advanced Vector Extensions 512, the memory system improvements possible with Intel® Xeon Phi™ processor multi-channel DRAM (MCDRAM) and the Intel® Omni-Path Architecture interconnect.
This document discusses OpenCAPI acceleration using the OpenCAPI Acceleration Framework (oc-accel). It provides an overview of the oc-accel components and workflow, benchmarks the OC-Accel bandwidth and latency, and provides examples of how to fully utilize OC-Accel capabilities to accelerate functions on an FPGA. The document also outlines the OC-Accel development process and previews upcoming features like support for ODMA to port existing PCIe accelerators to OpenCAPI.
DPDK Summit 2015 - RIFT.io - Tim Mortsolf - Jim St. Leger
DPDK Summit 2015 in San Francisco.
Presentation by RIFT.io's CTO Tim Mortsolf.
For additional details and the video recording please visit www.dpdksummit.com.
Cellular technology with Embedded Linux - COSCUP 2016 - SZ Lin
This document provides steps and information for building your own Internet of Things (IoT) device using cellular technology. It discusses selecting a cellular module, enabling the device driver and utilities, and understanding cellular network generations and protocols like AT commands, QMI and MBIM. The document also addresses frequently asked questions about establishing connections and troubleshooting issues. Overall it serves as a guide for getting hands-on experience developing cellular-connected IoT devices.
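The AT-command layer mentioned above is a plain-text request/response protocol over a serial line. A minimal sketch of driving it follows; the modem side is stubbed out here (real code would talk to a device such as a USB serial port), and the replies shown use the standard final result codes, not module-specific ones:

```python
# Minimal AT command exchange model. Real code would read/write a serial
# device; here the modem side is a stub for illustration.

def fake_modem(cmd):
    # Stub standing in for the serial port: canned, standards-shaped replies.
    canned = {
        "AT": "\r\nOK\r\n",
        "AT+CSQ": "\r\n+CSQ: 21,99\r\n\r\nOK\r\n",   # signal quality query
    }
    return canned.get(cmd, "\r\nERROR\r\n")

def send_at(cmd, transport=fake_modem):
    """Send one command; return (ok, payload_lines).
    A response ends with a final result code: OK or ERROR."""
    raw = transport(cmd)
    lines = [line for line in raw.split("\r\n") if line]
    ok = lines[-1] == "OK"
    return ok, (lines[:-1] if ok else lines)
```

Higher-level interfaces like QMI and MBIM replace this text dialogue with binary messaging, which is why enabling the right kernel driver matters when selecting a module.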
LAS16-300: Mini Conference 2 Cortex-M Software - Device Configuration - Linaro
LAS16-300: Mini Conference 2 RTOS-Zephyr - Device Configuration
Speakers: Andy Gross
Date: September 28, 2016
★ Session Description ★
SoC vendors, board vendors, software middle layers, scripting languages, etc. all need access to system configuration information (pin muxes, which sensors are on a system, how much memory and flash, and so on). We need a means to convey this in a vendor-neutral mechanism that is also friendly to Cortex-M/constrained-footprint devices. This session will discuss the topic: how it is done today, what tooling might exist from different vendors, what we could utilize (device tree), and what issues that creates.
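As a concrete example of the kind of information at stake, a device-tree-style description can carry memory size and on-board sensors in one vendor-neutral file (the board name, addresses, and sensor below are made up purely for illustration):

```dts
/dts-v1/;
/ {
        model = "example,sensor-board";      /* hypothetical board */

        memory@20000000 {
                device_type = "memory";
                reg = <0x20000000 0x10000>;  /* 64 KiB of SRAM */
        };

        i2c0: i2c@40003000 {
                status = "okay";
                bme280@76 {                  /* temperature/pressure sensor */
                        compatible = "bosch,bme280";
                        reg = <0x76>;
                };
        };
};
```

The open question for this session is whether a format like this, designed with Linux-class systems in mind, can be made cheap enough in tooling and footprint for Cortex-M targets.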
★ Resources ★
Etherpad: pad.linaro.org/p/las16-300
Presentations & Videos: http://connect.linaro.org/resource/las16/las16-300/
★ Event Details ★
Linaro Connect Las Vegas 2016 – #LAS16
September 26-30, 2016
http://www.linaro.org
http://connect.linaro.org
Capabilities of the Python Interpreter in NX-OS - Cisco Russia
The document discusses a webinar presented by Cisco TAC Engineer Anton Tugai about the capabilities of the Python interpreter in NX-OS. Some key points:
- Tugai gave a presentation on trends in Cisco SDN and current solutions.
- The webinar covered an introduction to Python, how Python is integrated into NX-OS, examples, and a demonstration.
- A native Python interpreter is available on Nexus switches starting from certain software versions, allowing Python scripts to run directly on the switch and execute CLI commands.
The document describes a workshop on Xilinx Vivado High Level Synthesis (HLS) tools held at NECST. The agenda includes an introduction to hardware design flow, the Vivado HLS design flow, kernel creation and optimization, and a hands-on example of implementing a vector addition using Vivado HLS. The example takes the participants through various implementation versions to optimize the kernel by applying directives for loop pipelining, array partitioning, and memory optimizations.
Here are some useful GDB commands for debugging:
- break <function> - Set a breakpoint at a function
- break <file:line> - Set a breakpoint at a line in a file
- run - Start program execution
- next/n - Step over to next line, stepping over function calls
- step/s - Step into function calls
- finish - Step out of current function
- print/p <variable> - Print value of a variable
- backtrace/bt - Print the call stack
- info breakpoints/ib - List breakpoints
- delete <breakpoint#> - Delete a breakpoint
- layout src - Switch layout to source code view
- layout asm - Switch layout to assembly view
DPDK greatly improves packet processing performance and throughput by allowing applications to access hardware directly, bypassing the kernel. It can improve performance by up to 10 times, achieving over 80 Mpps of throughput on a single CPU, or double that with two CPUs. This enables telecom and networking equipment manufacturers to develop products faster and at lower cost. DPDK achieves these gains through techniques like dedicated core affinity, userspace drivers, polling instead of interrupts, and lockless synchronization.
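One of those techniques, lockless synchronization, can be illustrated with a single-producer/single-consumer ring in the spirit of DPDK's rte_ring. It is sketched here in Python purely for readability; the real thing is C with memory barriers and cache-line-aligned indices, and this is not the rte_ring API:

```python
class SpscRing:
    """Single-producer/single-consumer ring (conceptual model of an
    rte_ring-style structure). No locks: the producer only writes `head`,
    the consumer only writes `tail`, so the two never contend on a field."""
    def __init__(self, size):
        assert size & (size - 1) == 0, "size must be a power of two"
        self.size = size
        self.mask = size - 1          # power-of-two size -> cheap index wrap
        self.slots = [None] * size
        self.head = 0                 # next write position (producer-owned)
        self.tail = 0                 # next read position (consumer-owned)

    def enqueue(self, item):
        if self.head - self.tail == self.size:
            return False              # ring full
        self.slots[self.head & self.mask] = item
        self.head += 1                # publish only after the slot is written
        return True

    def dequeue(self):
        if self.tail == self.head:
            return None               # ring empty
        item = self.slots[self.tail & self.mask]
        self.tail += 1
        return item
```

In C the `head += 1` publish must be ordered after the slot write with a release barrier; Python's interpreter hides that detail, which is exactly why the sketch stays readable.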
This document provides an overview of the Xilinx SDAccel design flow for FPGA hardware acceleration using OpenCL. It begins with an introduction to the hardware design flow and SDAccel framework. It then covers OpenCL concepts including the computational and memory models. The remainder of the document demonstrates the SDAccel design flow through examples, including specifying a kernel in OpenCL or C/C++, software emulation, hardware emulation, and building for the FPGA board.
Netronome's half-day tutorial on host data plane acceleration at ACM SIGCOMM 2018 introduced attendees to models for host data plane acceleration and provided an in-depth understanding of SmartNIC deployment models at hyperscale cloud vendors and telecom service providers.
Presenter Bio
Jaco Joubert is a Software Engineer at Netronome focusing on P4 and its applications on the Netronome SmartNIC. He recently started investigating network acceleration for Deep Learning on distributed systems. Prior to Netronome he worked on mobile application development and was a researcher at Telkom SA focusing on the mobile core, after completing his Master's degree in Computer and Electronic Engineering in 2014.
This document discusses OpenCAPI, an open standard for high-performance input/output between processors and accelerators. It provides background on the industry drivers for developing such a standard, an overview of OpenCAPI technology and capabilities, examples of OpenCAPI-based systems from IBM and partners, and performance metrics. The document aims to promote OpenCAPI and grow an open ecosystem around it to support accelerated computing workloads.
The Open Coherent Accelerator Processor Interface (OpenCAPI) is an industry-standard architecture targeted at emerging accelerator solutions and workloads. This session will address the following areas: 1) the latest technology advancements surrounding OpenCAPI; 2) the OpenCAPI strategy as it relates to other industry acceleration standards, i.e. Intel's CXL, Gen-Z, and CCIX; 3) the open initiatives surrounding OMI, OpenCAPI 3.0, and GitHub; 4) industry open source initiatives around OpenCAPI; 5) OC-Accel, our new FPGA programming framework supporting OpenCAPI 3.0 and targeting higher-level programming languages such as C and C++; 6) interesting use cases.
Challenges for Deploying a High-Performance Computing Application to the CloudIntel® Software
The cloud computing environment consists of many providers with endless hardware configurations, pricing options, and geographic constraints. One of the main reasons high-performance computing (HPC) in the cloud is because of the large amount and different types of hardware available. This makes the software developer responsible for developing highly scalable and portable software across different environments. Once you have a scalable piece of software, the challenge is, how can you test and deploy it in the cloud. Successfully running on these different clouds is not straightforward.
In this session, we lay out the challenges related to launching a software cloud using Rescale* Tools to resolve the challenges.
This document provides an overview of Vector Packet Processing (VPP), an open source packet processing platform developed as part of the FD.io project. VPP is based on DPDK for high performance packet processing in userspace. It includes a full networking stack and can perform L2/L3 forwarding and routing at speeds of over 14 million packets per second on a single core. VPP processing is divided into individual nodes connected by a graph. Packets are passed between nodes as vectors to support batch processing. VPP supports both single and multicore modes using different threading models. It can be used to implement routers, switches, and other network functions and topologies.
Introduction to OpenDaylight & Application DevelopmentMichelle Holley
This document provides an introduction to OpenDaylight, an open source platform for Software-Defined Networking (SDN). It outlines what OpenDaylight is, its community and releases, the components within OpenDaylight including northbound and southbound interfaces, and some example network applications that can be built on OpenDaylight. It also provides an overview of how to develop applications using OpenDaylight, covering technologies like OSGi, MD-SAL, and the Yang modeling language.
FBTFTP: an opensource framework to build dynamic tftp serversAngelo Failla
Talk given at EuroPython2016, Bilbao:
https://ep2016.europython.eu/conference/talks/fbtftp-facebooks-python3-framework-for-tftp-servers
TFTP was first standardized in ’81 (same year I was born!) and one of its primary uses is in the early stage of network booting. TFTP is very simple to implement, and one of the reasons it is still in use is that its small footprint allows engineers to fit the code into very low resource, single board computers, system-on-a-chip implementations and mainboard chipsets, in the case of modern hardware.
It is therefore a crucial protocol deployed in almost every data center environment. It is used, together with DHCP, to chain load Network Boot Programs (NBPs), like Grub2 and iPXE. They allow machines to bootstrap themselves and install operating systems off of the network, downloading kernels and initrds via HTTP and starting them up.
At Facebook, we have been using the standard in.tftpd daemon for years, however, we started to reach its limitations. Limitations that were partially due to our scale and the way TFTP was deployed in our infrastructure, but also to the protocol specifications based on requirements from the 80’s.
To address those limitations we ended up writing our own framework for creating dynamic TFTP servers in Python3, and we decided to open source it.
I will take you thru the framework and the features it offers. I’ll discuss the specific problems that motivated us to create it. We will look at practical examples of how touse it, along with a little code, to build your own server that are tailored to your own infra needs.
Packet processing in the fast path involves looking up bit patterns and deciding on an actions at line rate. The complexity of these functions at Line Rate, have been traditionally handled by ASICs and NPUs. However with the availability of faster and cheaper CPUs and hardware/software accelerations, it is possible to move these functions onto commodity hardware. This tutorial will talk about the various building blocks available to speed up packet processing both hardware based e.g. SR-IOV, RDT, QAT, VMDq, VTD and software based e.g. DPDK, Fd.io/VPP, OVS etc and give hands on lab experience on DPDK and fd.io fast path look up with following sessions. 1: Introduction to Building blocks: Sujata Tibrewala
OVS and DPDK - T.F. Herbert, K. Traynor, M. Grayharryvanhaaren
The document discusses DPDK and software dataplane acceleration for Open vSwitch. It provides an overview of the OVS architecture and its evolution to integrate with DPDK. It shares one user's experience of initial challenges in using DPDK/OVS and improvements over time. Suggestions are made to improve areas like debugging, testing, documentation and training to enhance the usability of DPDK/OVS. Performance tuning techniques like using multiple threads are also briefly covered.
SDVIs and In-Situ Visualization on TACC's StampedeIntel® Software
Speaker: Paul Navrátil, Texas Advanced Computing Center (TACC)
The design emphasis for supercomputing systems has moved from raw performance to performance-per-watt, and as a result, supercomputing architectures are converging on processors with wide vector units and many processing cores per chip. Such processors are capable of performant image rendering purely in software. This improved capability is fortuitous, since the prevailing homogeneous system designs lack dedicated, hardware-accelerated rendering subsystems for use in data visualization. Reliance on this “software-defined” rendering capability will grow in importance since, due to growing data sizes, visualizations must be performed on the same machine where the data is produced. Further, as data sizes outgrow disk I/O capacity, visualization will be increasingly incorporated into the simulation code itself (in situ visualization).
This talk presents recent work in high-fidelity visualization using the OSPRay ray tracing framework on TACC’s local and remote visualization systems. We present work using OSPRay within ParaView Catalyst in situ framework from Kitware, including capitalizing on opportunities to reduce data costs migrating through VTK filters for visualization. We highlight the performance opportunities and advantages of Intel® Advanced Vector Extensions 512, the memory system improvements possible with Intel® Xeon Phi™ processor multi-channel DRAM (MCDRAM) and the Intel® Omni-Path Architecture interconnect.
This document discusses OpenCAPI acceleration using the OpenCAPI Acceleration Framework (oc-accel). It provides an overview of the oc-accel components and workflow, benchmarks the OC-Accel bandwidth and latency, and provides examples of how to fully utilize OC-Accel capabilities to accelerate functions on an FPGA. The document also outlines the OC-Accel development process and previews upcoming features like support for ODMA to port existing PCIe accelerators to OpenCAPI.
DPDK Summit 2015 - RIFT.io - Tim MortsolfJim St. Leger
DPDK Summit 2015 in San Francisco.
Presentation by RIFT.io's CTO Tim Mortsolf.
For additional details and the video recording please visit www.dpdksummit.com.
Cellular technology with Embedded Linux - COSCUP 2016SZ Lin
This document provides steps and information for building your own Internet of Things (IoT) device using cellular technology. It discusses selecting a cellular module, enabling the device driver and utilities, and understanding cellular network generations and protocols like AT commands, QMI and MBIM. The document also addresses frequently asked questions about establishing connections and troubleshooting issues. Overall it serves as a guide for getting hands-on experience developing cellular-connected IoT devices.
LAS16-300: Mini Conference 2 Cortex-M Software - Device ConfigurationLinaro
LAS16-300: Mini Conference 2 RTOS-Zephyr - Device Configuration
Speakers: Andy Gross
Date: September 28, 2016
★ Session Description ★
SoC Vendors, board vendors, software middle layers, scripting languages, etc all need to have access to system configuration information (pin muxes, what sensors are on a system, what amount of memory, flash, etc, etc). We need a means to convey this in a vendor neutral mechanism but also one that is friendly for Cortex-M/constrained footprint devices. This session will be to discuss the topic, how its done today, what kinda tooling might exist from different vendors, what we could utilize (device tree) and what issues that creates.
★ Resources ★
Etherpad: pad.linaro.org/p/las16-300
Presentations & Videos: http://connect.linaro.org/resource/las16/las16-300/
★ Event Details ★
Linaro Connect Las Vegas 2016 – #LAS16
September 26-30, 2016
http://www.linaro.org
http://connect.linaro.org
Возможности интерпретатора Python в NX-OSCisco Russia
The document discusses a webinar presented by Cisco TAC Engineer Anton Tugai about the capabilities of the Python interpreter in NX-OS. Some key points:
- Tugai gave a presentation on trends in Cisco SDN and current solutions.
- The webinar covered an introduction to Python, how Python is integrated into NX-OS, examples, and a demonstration.
- Native Python interpreter is available on Nexus switches starting from certain software versions, allowing Python scripts to run directly on the switch and execute CLI commands.
The document describes a workshop on Xilinx Vivado High Level Synthesis (HLS) tools held at NECST. The agenda includes an introduction to hardware design flow, the Vivado HLS design flow, kernel creation and optimization, and a hands-on example of implementing a vector addition using Vivado HLS. The example takes the participants through various implementation versions to optimize the kernel by applying directives for loop pipelining, array partitioning, and memory optimizations.
Here are some useful GDB commands for debugging:
- break <function> - Set a breakpoint at a function
- break <file:line> - Set a breakpoint at a line in a file
- run - Start program execution
- next/n - Step over to next line, stepping over function calls
- step/s - Step into function calls
- finish - Step out of current function
- print/p <variable> - Print value of a variable
- backtrace/bt - Print the call stack
- info breakpoints/ib - List breakpoints
- delete <breakpoint#> - Delete a breakpoint
- layout src - Switch layout to source code view
- layout asm - Switch layout
DPDK greatly improves packet processing performance and throughput by allowing applications to directly access hardware and bypass kernel involvement. It can improve performance by up to 10 times, allowing over 80 Mbps throughput on a single CPU or double that with two CPUs. This enables telecom and networking equipment manufacturers to develop products faster and with lower costs. DPDK achieves these gains through techniques like dedicated core affinity, userspace drivers, polling instead of interrupts, and lockless synchronization.
This document provides an overview of the Xilinx SDAccel design flow for FPGA hardware acceleration using OpenCL. It begins with an introduction to the hardware design flow and SDAccel framework. It then covers OpenCL concepts including the computational and memory models. The remainder of the document demonstrates the SDAccel design flow through examples, including specifying a kernel in OpenCL or C/C++, software emulation, hardware emulation, and building for the FPGA board.
Netronome's half-day tutorial on host data plane acceleration at ACM SIGCOMM 2018 introduced attendees to models for host data plane acceleration and provided an in-depth understanding of SmartNIC deployment models at hyperscale cloud vendors and telecom service providers.
Presenter Bio
Jaco Joubert is a Software Engineer at Netronome focusing on P4 and its applications on the Netronome SmartNIC. He recently started investigating network acceleration for deep learning on distributed systems. Prior to Netronome, he worked on mobile application development and was a researcher at Telkom SA focusing on the mobile core after completing his master's degree in Computer and Electronic Engineering in 2014.
This document discusses OpenCAPI, an open standard for high-performance input/output between processors and accelerators. It provides background on the industry drivers for developing such a standard, an overview of OpenCAPI technology and capabilities, examples of OpenCAPI-based systems from IBM and partners, and performance metrics. The document aims to promote OpenCAPI and grow an open ecosystem around it to support accelerated computing workloads.
The Open Coherent Accelerator Processor Interface (OpenCAPI) is an industry-standard architecture targeted at emerging accelerator solutions and workloads. This session will address the following areas: 1) the latest technology advancements surrounding OpenCAPI; 2) the OpenCAPI strategy as it relates to other industry acceleration standards, i.e., Intel's CXL, Gen-Z, and CCIX; 3) the open initiatives surrounding OMI, OpenCAPI 3.0, and GitHub; 4) industry open source initiatives around OpenCAPI; 5) OC-Accel, our new FPGA programming framework supporting OpenCAPI 3.0 and targeting higher-level programming languages such as C and C++; 6) interesting use cases.
OpenPOWER Acceleration of HPCC Systems — HPCC Systems
JT Kellington, IBM and Allan Cantle, Nallatech present at the 2015 HPCC Systems Engineering Summit Community Day about porting HPCC Systems to the POWER8-based ppc64el architecture.
The document describes an IBM workshop on CAPI and OpenCAPI technologies. It provides an overview of FPGA acceleration using SNAP, including how SNAP simplifies FPGA programming using a C/C++ based approach. Examples of use cases for FPGA acceleration like video processing and machine learning inference are also presented.
This document summarizes a presentation about FlashGrid, an alternative to Oracle Exadata that aims to achieve similar performance levels using commodity hardware. It discusses the key components of FlashGrid including the Linux kernel, networking protocols like Infiniband and NVMe, and hardware. Benchmarks show FlashGrid achieving comparable IOPS and throughput to Exadata on a single server. While Exadata has proprietary advantages, FlashGrid offers excellent raw performance at lower cost and with simpler maintenance through the use of standard technologies.
SCFE 2020 OpenCAPI presentation as part of OpenPOWER Tutorial — Ganesan Narayanasamy
This document introduces hardware acceleration using FPGAs with OpenCAPI. It discusses how classic FPGA acceleration has issues like slow CPU-managed memory access and lack of data coherency. OpenCAPI allows FPGAs to directly access host memory, providing faster memory access and data coherency. It also introduces the OC-Accel framework that allows programming FPGAs using C/C++ instead of HDL languages, addressing issues like long development times. Example applications demonstrated significant performance improvements using this approach over CPU-only or classic FPGA acceleration methods.
DPDK Summit 2015 - Aspera - Charles Shiflett — Jim St. Leger
DPDK Summit 2015 in San Francisco.
Presentation by Charles Shiflett, Aspera.
For additional details and the video recording please visit www.dpdksummit.com.
Dataplane networking acceleration with OpenDataPlane / Maksim Uvarov (Linaro) — Ontico
HighLoad++ 2017
Moscow Hall, November 7, 13:00
Abstract:
http://www.highload.ru/2017/abstracts/2909.html
OpenDataPlane (ODP, https://www.opendataplane.org) is an open-source API for network data plane applications, providing an abstraction between the network chip and the application. Vendors such as TI, Freescale, and Cavium now ship SDKs with ODP support for their SoC chips. By analogy with the graphics stack, ODP can be compared to the OpenGL API, but in the field of network programming.
...
Using a Field Programmable Gate Array to Accelerate Application Performance — Odinot Stanislas
Intel is looking at FPGA and what they bring to ISVs and developers and their very specific needs in genomics, image processing, databases, and even in the cloud. In this document you will have the opportunity to learn more about our strategy, and a research program initiated by Intel and Altera involving Xeon E5 with... FPGA inside.
Auteur(s)/Author(s):
P. K. Gupta, Director of Cloud Platform Technology, Intel Corporation
DPDK is a set of drivers and libraries that allow applications to bypass the Linux kernel and access network interface cards directly for very high performance packet processing. It is commonly used for software routers, switches, and other network applications. DPDK can achieve over 11 times higher packet forwarding rates than applications using the Linux kernel network stack alone. While it provides best-in-class performance, DPDK also has disadvantages like reduced security and isolation from standard Linux services.
POLYTEDA LLC, a provider of semiconductor design software and PV services, announced the general availability of PowerDRC/LVS version 2.0.1. This release delivers further significant improvements for multi-CPU mode and some new LVS functionality. The XOR operation now supports multi-CPU mode, dramatically increasing performance.
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph — Ceph Community
SK Telecom is optimizing Ceph for all-flash storage to improve performance and efficiency. Recent work includes enhancing BlueStore, implementing quality of service controls, and exploring data deduplication techniques. Looking ahead, SKT aims to further leverage NVRAM/SSD technologies and expand use of all-flash Ceph in its cloud infrastructure.
POLYTEDA LLC, a provider of semiconductor design software and PV services, announced the general availability of PowerDRC/LVS version 2.2. This release delivers fill layer generation in multi-CPU mode, new KLayout integration functionality, and other significant improvements for multi-CPU mode.
Madhu Rangarajan will provide an overview of networking trends in the cloud, various network topologies and their tradeoffs, and trends in the acceleration of packet-processing workloads. They will also talk about some of the work going on at Intel to address these trends, including FPGAs in the datacenter.
Design Considerations, Installation, and Commissioning of the RedRaider Cluster at the Texas Tech University
High Performance Computing Center
Outline of this talk:
- HPCC staff and students
- Previous clusters: history, performance, usage patterns, and experience
- Motivation for upgrades: compute capacity goals and related considerations
- Installation and benchmarks
- Conclusions and Q&A
TitanIC presented, "ODSA Use Case - SmartNIC," at the ODSA Workshop. The charter of the ODSA (Open Domain-Specific Architecture) Workgroup is to define an open specification that enables building Domain Specific Accelerator silicon using best-of-breed components from the industry, made available as chiplet dies that can be integrated like Lego blocks on an organic substrate packaging layer. The resulting multi-chip module (MCM) silicon can be produced at significantly lower development and manufacturing costs, and will deliver much-needed performance-per-watt and performance-per-dollar efficiencies in networking, security, machine learning, and other applications. The ODSA Workgroup also intends to deliver implementations of the specification as board-level prototypes, RTL code, and libraries.
The document describes Oracle's new SPARC T4 servers, which provide up to 5x better single-threaded performance than previous SPARC servers. The SPARC T4 servers are optimized for Oracle software like the Oracle Database and WebLogic Suite. They include integrated security features like encryption without performance penalties. The document provides an overview of the SPARC T4 processor architecture and performance advantages, and describes how the new servers are optimized solutions for running Oracle applications.
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019) — byteLAKE
byteLAKE's presentation from the PPAM 2019 conference.
Abstract:
The goal of this work is to adapt 4 CFD kernels to the Xilinx ALVEO U250 FPGA, including the first-order step of the non-linear iterative upwind advection MPDATA schemes (non-oscillatory forward in time), the divergence part of the matrix-free linear operator formulation in the iterative Krylov scheme, the tridiagonal Thomas algorithm for vertical matrix inversion inside the preconditioner for the iterative solver, and computation of the pseudovelocity for the second pass of the upwind algorithm in MPDATA. All the kernels use a 3-dimensional compute domain consisting of 7 to 11 arrays. Since all kernels belong to the group of memory-bound algorithms, our main challenge is to achieve the highest utilization of global memory bandwidth. Our adaptation allows us to reduce the execution time by up to 4x.
Find out more at: www.byteLAKE.com/en/CFD
Foot note:
This is the presentation about the non-AI version of byteLAKE's CFD kernels, highly optimized for Alveo FPGA. Based on this research project and many others in the CFD space, we decided to shift the course of the CFD Suite product development and leverage AI to accelerate computations and enable new possibilities. Instead of adapting CFD solvers to accelerators, we use AI and work on a cross-platform solution. More on the latest: www.byteLAKE.com/en/CFDSuite.
-
Update for 2020: byteLAKE is currently developing CFD Suite as AI for CFD Suite, a collection of AI/ Artificial Intelligence Models to accelerate and enable new features for CFD simulations. It is a cross-platform solution (not only for FPGAs). More: www.byteLAKE.com/en/CFDSuite.
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera... — Cesar Maciel
Heterogeneous computing refers to systems that use more than one kind of processor and direct applications to run in the processor that is the most efficient for that specific task. Power Systems servers based on the POWER8 processor support several accelerators that are integrated into the system to improve the efficiency of an application.
05 high density openpower dual-socket p9 system design example — Yutaka Kawai
The document describes the design of a new high-density dual-socket OpenPOWER server system using Power9 CPUs. It discusses the disadvantages of the company's current product lineup and proposes a new concept using two-socket Power9 nodes in a 3U chassis with direct-attached memory and PCIe fabric backplane. The design process for the new "Nicole" motherboard is outlined, including surprises encountered during development related to power and memory requirements. Debugging issues are also summarized, such as CPUs not working between Power8 and Power9, incorrect voltage rail connections, and signal integrity problems.
04 accelerating dl inference with (open)capi and posit numbers — Yutaka Kawai
This was presented by Louis Ledoux and Marc Casas at OpenPOWER summit EU 2019. The original one is uploaded at:
https://static.sched.com/hosted_files/opeu19/1a/presentation_louis_ledoux_posit.pdf
This was presented by Dan Horák (Red Hat) at OpenPOWER summit EU 2019. The original one is uploaded at:
https://static.sched.com/hosted_files/opeu19/d2/op-eu-2019-desktop-openpower.pdf
02 ai inference acceleration with components all in open hardware: opencapi a... — Yutaka Kawai
This was presented by Peng Fei GOU (IBM China) at OpenPOWER summit EU 2019. The original one is uploaded at:
https://static.sched.com/hosted_files/opeu19/68/NVDLA%20on%20OpenCAPI.pdf
01 high bandwidth acquisition / computing / compression all in a box — Yutaka Kawai
This document discusses high bandwidth data acquisition, computing, and compression using an IBM Power9 server. It presents two options for the server configuration:
Option A involves intensive GPU processing using Nvidia GPUs with high bandwidth connectivity. Option B doubles the bandwidth by using two Power9 sockets, each connected to multiple GPUs and FPGAs with OpenCAPI links.
The document then discusses the steps involved: data acquisition with FPGAs, using unified host-GPU memory to reduce bandwidth needs, performing intensive computation on GPUs or FPGAs, hardware compression of data using the Power9's built-in NX-Gzip engine, and the high bandwidth capabilities of the AC922 server platform. Bandwidth tests
The document describes a hybrid memory subsystem (HMS) developed by BittWare that combines different memory technologies including Samsung zNAND, Samsung DDR4 SDRAM, and Everspin MRAM. The HMS has a capacity of 1.5TB or 3TB, uses an OpenCAPI 3.0 interface, and is optimized for sequential workloads with an average read latency of around 1us and bandwidth of 20GB/s. It is designed to provide memory expansion and persistence without major application changes at a lower cost than using only DRAM.
0 foundation update__final - Mendy Furmanek — Yutaka Kawai
This slide was presented by Mendy Furmanek at OpenPOWER summit EU 2019. The original one is uploaded at:
https://static.sched.com/hosted_files/opeu19/9c/Final%20-%20Mendy%20F..pdf
This document describes job descriptions for an OpenPOWER AE China and Taiwan team. It outlines that NDA and SOW documents must be signed to receive AE support. The key items of the SOW include the scope of services to assist a partner in developing a server based on POWER technology for up to 1 person year. It also details facilities, hours of coverage, charges, deliverables, completion criteria, and tools/services to be provided such as training, documentation, and debug boards at no initial charge.
IBM has a long history of contributing to and supporting open source projects including Linux kernel, Docker, Kubernetes, and OpenStack. In recent years, IBM has expanded its efforts in open hardware by forming the OpenPOWER foundation to foster innovation around its POWER processors, contributing to open chip designs and reference architectures, and pledging further contributions to grow an open hardware ecosystem. This includes opening the POWER instruction set architecture, providing open reference designs, and establishing open governance.
This document provides an overview of the SNAP framework, which utilizes Power CAPI technology to enable coherent acceleration between CPUs and FPGAs. Key points:
- CAPI allows direct memory access between CPUs and FPGAs, avoiding overhead of device drivers and memory copies. This reduces latency significantly compared to traditional PCIe.
- The SNAP framework uses CAPI to share memory coherently between applications running on CPUs and accelerators implemented on FPGAs.
- It includes a kernel driver, user library, and models the hardware interface to allow co-simulation of applications and accelerators.
- This framework takes advantage of features like DMA, atomic operations, and wake-ups to provide efficient, low-latency interaction between host applications and accelerators.
This document summarizes an OpenPOWER/OpenCAPI meetup that took place on October 23, 2019 in Tokyo. The meetup included introductions, updates from the OpenPOWER foundation, feedback from the OpenPOWER Summit 2019 in North America, light talks from Xilinx Japan, KIOXIA, and NEC, as well as a Q&A session and free discussions.
1) The document introduces ExpEther and Wireless ExpEther, which extend PCI Express over Ethernet and provide reliable low-latency wireless connections, respectively.
2) ExpEther allows PCIe devices to be disaggregated over Ethernet networks while maintaining compatibility with existing software. Wireless ExpEther aggregates multiple wireless links to provide a virtual reliable connection with latency under 1ms.
3) NEC offers these technologies as IP cores and evaluation modules to enable wireless solutions for applications that require latency under 10ms, such as industrial robots, AGVs, and machine tools.
The document outlines the agenda for an OpenPOWER and OpenCAPI Meetup held on July 17, 2019 in Tokyo. The agenda included introductions, updates from the OpenPOWER foundation, presentations on NEC ExpEther virtual PCIe over Ethernet technology, an IBM AC922 performance demo, the H3 Falcon2 PCIe gen4 system, a light talk from Xilinx Japan, Q&A, and free discussions. Links were also provided to the Meetup group page and future events.
The 2018 OpenCAPI Contest attracted participants from universities and independent design houses. It helped promote the capabilities of CAPI/OpenCAPI and sparked further development work. After the contest, more design houses contacted IBM to discuss CAPI/OpenCAPI solutions, and universities recognized its advantages and are pursuing related research. The most effective way to further business is collaboration between IBM, design houses, and universities to develop demo solutions and bring real products to market.
The document discusses an OCP 48V solution presented at a 2019 Tokyo meetup. It references expansion boards, rackspace, and Google and Rackspace's P9 48V OCP design for powering servers more efficiently using a 48V standard. Component specifications are provided for PSUC1, C2 and PSUCE1, CE2 capacitors. A GitHub link is also included for the Zaius-Barreleye-G2 design.
GraphRAG for Life Science to increase LLM accuracy — Tomaz Bratanic
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
Programming Foundation Models with DSPy - Meetup Slides — Zilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Monitoring and Managing Anomaly Detection on OpenShift.pdf — Tosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
TrustArc Webinar - 2024 Global Privacy Survey — TrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Driving Business Innovation: Latest Generative AI Advancements & Success Story — Safe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Removing Uninteresting Bytes in Software Fuzzing — Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Full-RAG: A modern architecture for hyper-personalization — Zilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
How to Get CNIC Information System with Paksim Ga.pptx — danishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
CAKE: Sharing Slices of Confidential Data on Blockchain — Claudio Di Ciccio
Presented at the CAiSE 2024 Forum, Intelligent Information Systems, June 6th, Limassol, Cyprus.
Synopsis: Cooperative information systems typically involve various entities in a collaborative process within a distributed environment. Blockchain technology offers a mechanism for automating such processes, even when only partial trust exists among participants. The data stored on the blockchain is replicated across all nodes in the network, ensuring accessibility to all participants. While this aspect facilitates traceability, integrity, and persistence, it poses challenges for adopting public blockchains in enterprise settings due to confidentiality issues. In this paper, we present a software tool named Control Access via Key Encryption (CAKE), designed to ensure data confidentiality in scenarios involving public blockchains. After outlining its core components and functionalities, we showcase the application of CAKE in the context of a real-world cyber-security project within the logistics domain.
Paper: https://doi.org/10.1007/978-3-031-61000-4_16
HCL Notes and Domino License Cost Reduction in the World of DLAU — panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar, with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Ocean Lotus Threat actors project by John Sitima 2024 (1).pptx — SitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Things to Consider When Choosing a Website Developer for your Website | FODUU — FODUU
Choosing the right website developer is crucial for your business. This article covers essential factors to consider, including experience, portfolio, technical skills, communication, pricing, reputation & reviews, cost and budget considerations and post-launch support. Make an informed decision to ensure your website meets your business goals.
Building Production Ready Search Pipelines with Spark and Milvus — Zilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
2. OpenCAPI Topics
- Industry Background
- Where/How OpenCAPI Technology is Used
- Technology Overview and Advantages
- Heterogeneous Computing
- SNAP Framework for CAPI/OpenCAPI
3. Industry Background that Defined OpenCAPI
§ Growing computational demand due to emerging workloads (e.g., AI, cognitive computing)
§ Moore's Law no longer sustained by traditional silicon scaling
§ Driving increased dependence on hardware acceleration for performance
• Hyperscale datacenters and HPC need much higher network bandwidth
• 100 Gb/s → 200 Gb/s → 400 Gb/s links are emerging
• Deep learning and HPC require more bandwidth between accelerators and memory
• Emerging memory/storage technologies are driving the need for bandwidth with low latency
§ Hardware accelerators are defining the attributes of a high-performance bus
• Growing demand for network performance and network offload
• Introduction of device coherency requirements (IBM's introduction in 2013)
• Emergence of complex storage and memory solutions
• Various form factors, with no single one able to address everything (e.g., GPUs, FPGAs, ASICs)
…all Relevant to Modern Data Centers
4. Use Cases - A True Heterogeneous Architecture Built Upon OpenCAPI
(Figure: devices attached over OpenCAPI 3.0 links; memory attached over OpenCAPI 3.1 links.)
5. POWER9 IO Leading the Industry
POWER9 I/O: PCIe Gen4, CAPI 2.0, NVLink 2.0, OpenCAPI 3.0
8 and 16 Gbps PHY, protocols supported:
• PCIe Gen3 x16 and PCIe Gen4 x8
• CAPI 2.0 on PCIe Gen4
25 Gbps PHY, protocols supported:
• OpenCAPI 3.0
• NVLink 2.0
Silicon die offered in various packages (scale-out, scale-up)
6. Acceleration Paradigms with Great Performance
• Egress Transform: examples include encryption, compression, erasure coding prior to delivering data to the network or storage
• Bi-Directional Transform: examples include NoSQL such as Neo4j with graph node traversals
• Memory Transform (basic work offload): examples include machine or deep learning such as natural language processing, sentiment analysis, or other actionable intelligence using OpenCAPI-attached memory
• Ingress Transform: examples include video analytics, network security, deep packet inspection, data plane acceleration, video encoding (H.265), high-frequency trading
• Needle-in-a-Haystack Engine: examples include database searches, joins, intersections, merges; only the needles are sent to the processor from a large haystack of data
(In each paradigm the accelerator (Acc) attaches to the processor chip over DLx/TLx.)
OpenCAPI is ideal for acceleration due to its bandwidth to/from accelerators, best-of-breed latency, and the flexibility of an open architecture.
7. Comparison of Memory Paradigms
• Emerging Storage Class Memory (Processor Chip ↔ DLx/TLx ↔ SCM): storage class memories have the potential to be the next disruptive technology; examples include ReRAM, MRAM, and Z-NAND, all racing to become the de facto standard
• Main Memory, basic DDR attach (Processor Chip ↔ DLx/TLx ↔ DDR4/5): built on the OpenCAPI 3.1 architecture; an ultra-low-latency ASIC buffer chip adds only about 5 ns on top of native DDR direct connect
• Tiered Memory (Processor Chip ↔ DLx/TLx ↔ DDR4/5, plus DLx/TLx ↔ SCM): storage class memory tiered with traditional DDR memory, all built upon the OpenCAPI 3.1 and 3.0 architecture; still retains the ability to use load/store semantics
§ Common physical interface between non-memory and memory devices
§ The OpenCAPI protocol was architected to minimize latency; excellent for classic DRAM memory
§ Extreme bandwidth beyond the classical DDR memory interface
§ Agnostic interface will handle evolving memory technologies in the future (e.g., compute-in-memory)
§ Ability to use a memory buffer to decouple raw memory from the host interface to optimize power, cost, and performance
9. Latency Ping-Pong Test
§ Simple workload created to simulate communication between the system and an attached FPGA
§ Bus traffic recorded with a protocol analyzer and PowerBus traces
§ Response times and statistics calculated
OpenCAPI path (host TL/DL/PHY ↔ FPGA TLx/DLx/PHYx over the OpenCAPI link):
• Host code: 1. Copy 512B from cache to FPGA; 2. Poll on incoming 128B cache injection; 3. Reset poll location; 4. Repeat
• FPGA code: 1. Poll on 512B received from host; 2. Reset poll location; 3. DMA write 128B for cache injection; 4. Repeat
PCIe path (host PCIe stack ↔ Altera PCIe HIP* over the PCIe link): same host and FPGA code sequences
* HIP refers to hardened IP
11. OpenCAPI Enabled FPGA Cards
Mellanox Innova2AcceleratorCard Alpha Data 9v3AcceleratorCard
Typicaleyediagram at 25Gb/s usingthese
cards
12. OpenCAPI Topics
Ø Industry Background
Ø Where/How OpenCAPI Technology is used
Ø Technology Overview and Advantages
Ø Heterogeneous Computing
Ø SNAP Framework for CAPI/OpenCAPI
13. OpenCAPI: Heterogeneous Computing
Ø Why can OpenCAPI make specific workloads run faster?
→ FPGA (various high-bandwidth I/O; great at deeply parallel and pipelined designs)
Ø How can OpenCAPI make software and applications run faster?
→ Shared coherent memory (with virtual addressing; low-latency, low-overhead design)
(Figure: the evolution from a single processor (one CPU), to distributed computing (many CPUs), to heterogeneous computing (CPU + GPU + ASIC + FPGA).)
14. FPGAs: What they are good at
l FPGA: Field Programmable Gate Array
l Configurable I/O and high-speed serial links
l Integrated hard IP (multiply/add, SRAM, PLL, PCIe, Ethernet, DRAM controller, etc.)
l Custom logic, complex special instructions
l Bit/matrix manipulation, image processing, graphs, neural networks, etc.
l Great at deeply parallel and pipelined designs (for workloads)
(Figure: rows of processing engines illustrate parallelism and pipelining; a hash unit, adders, and RAM illustrate instruction complexity.)
15. FPGAs: Different types of High Bandwidth I/O Cards
Alpha Data 9V3 (networking); Mellanox Innova-2 (networking, ConnectX-5); Nallatech 250S+ (storage / NVMe SSDs)
16. OpenCAPI: Key Attributes for Acceleration
(Figure: any OpenCAPI-enabled processor (TL/DL, 25Gb I/O) connected to an accelerated OpenCAPI device (TLx/DLx, accelerated function).)
1. Architecture-agnostic bus: applicable to any system/microprocessor architecture
2. Optimized for high bandwidth and low latency
- 25 Gb/s links (with SlimSAS connector)
- Removes PCIe layering and uses the new, thinner TL/DL to TLx/DLx protocol instead (latency optimized)
- High-performance, industry-standard interface design with zero 'overhead'
3. Coherency & virtual addressing
- Attached devices operate natively within the application's user space and coherently with the host microprocessor
- Enables low overhead with no kernel, hypervisor, or firmware involvement
- Shared coherent data structures and pointers (put the data/memory closer to the processor/FPGA)
- It's all traditional thread-level programming with CPU-coherent device memory
4. Supports a wide range of use cases
- Architected for both classic memory and emerging storage class memory
(Figure: the processor's caches and application connect over the link to standard system memory, buffered system memory via OpenCAPI memory buffers, device memory, and advanced SCM solutions. Devices span storage/compute/network, ASIC/FPGA/FFSA, and FPGA/SoC/GPU accelerators, with load/store or block access.)
17. OpenCAPI: Virtual Addressing and Benefits
§ An OpenCAPI device operates in the virtual address spaces of the applications it supports
• Eliminates kernel and device driver software overhead
• Allows the device to operate on application memory without kernel-level data copies or pinned pages
• Simplifies the programming effort to integrate accelerators into applications (SNAP)
• Improves accelerator performance
§ The virtual-to-physical address translation occurs in the host CPU (no PSL logic needed)
• Reduces the design complexity of OpenCAPI-attached devices
• Makes it easier to ensure interoperability between OpenCAPI devices and different CPU architectures
• Security: since the OpenCAPI device never has access to a physical address, a defective or malicious device cannot access memory locations belonging to the kernel or to other applications it is not authorized to access
18. OpenCAPI: Virtual Addressing and Benefits (deeper): Shared coherent data structures and pointers
No kernel or device driver is involved (puts the data/memory closer to the processor/FPGA), in contrast to the typical I/O model with a device driver.
AFU: Attached Functional Unit
19. Ø The OpenCAPI Transaction Layer specifies the control and response packets between a host and an endpoint OpenCAPI device: TL and TLx
Ø On the host side, the Transaction Layer converts:
• Host-specific protocol requests into transaction-layer-defined commands
• TLx commands into host-specific protocol requests
• Responses
Ø On the endpoint OpenCAPI device side, the Transaction Layer converts:
• AFU protocol requests into transaction layer commands
• TL commands into AFU protocol requests
• Responses
Ø The OpenCAPI Data Link Layer supports a 25 Gb/s serial data rate per lane, connecting a processor to an FPGA or ASIC that contains an endpoint accelerator or device: DL and DLx
• The basic configuration supports 8 lanes running at 25.78125 Gb/s each, for a 25 GB/s aggregate data rate
Note: TL/DL/PHY (host side) → IBM P9 hardware and firmware are both ready now; TLx/DLx/PHYx (device side) → I/O vendors also have the reference design ready
OpenCAPI: Protocol Stack (much thinner than traditional PCIe)
(Figure: on the host processor, the host bus protocol layer feeds TL, the TL frame/parser, DL, and PHY; on the OpenCAPI device, PHYx, DLx, the TLx frame/parser, and TLx feed the AFU protocol layer and the AFU. OpenCAPI packets are framed into DL/DLx packets and carried over the serial link between the host bus interface and the AFU protocol stack interface.)
The full TL/DL specification can be obtained by simply going to opencapi.org and registering under the Technical → Specifications pull-down menu.
20. OpenCAPI: OCSE (OpenCAPI Simulation Environment)
Enablement stack:
• Customer application and accelerator
• Operating system enablement: Little Endian Linux
• Reference kernel driver (ocxl)
• Reference user library (libocxl)
• Hardware and reference designs to enable coherent acceleration
Ø OCSE models the red-outlined area of the stack
Ø OCSE enables AFU and application co-simulation only when the reference libocxl and reference TLx/DLx are used
Ø OCSE dependencies:
Ø Fixed reference TLx/AFU interface
Ø Fixed reference libocxl user API
Ø Will be contributed to the OpenCAPI consortium
Ø Development progress: 90%
(Figure: on the host, the application sits on libocxl, the ocxl kernel driver, the OS, and the processor core with coherent memory; TL/DL and a 25G cable connect to the FPGA's DLx/TLx and AFU, also with coherent memory.)
21. OpenCAPI: Two Factors of Low Latency
1. No kernel/device driver process for mapping memory addresses, and no need to move data back and forth between user space, the kernel, and devices
→ Faster memory access and easier programming
2. Thinner layers of protocol, compared to PCIe
→ Faster and more effective protocol (TL, TL frame/parser, DL, and PHY on the host; PHYx, DLx, TLx frame/parser, and TLx on the device, over OpenCAPI links)
Typical I/O model flow with device driver (total ~13µs for data prep):
DD call (300 instructions) → copy or pin source data (10,000 instructions, 7.9µs) → MMIO notify accelerator (3,000 instructions) → acceleration → poll/interrupt completion (1,000 instructions) → copy or unpin result data (1,000 instructions, 4.9µs) → return from DD, completion
Flow with a coherent model (CAPI) (total 0.36µs):
Shared memory notify (400 instructions, 0.3µs) → acceleration → shared memory completion (100 instructions, 0.06µs)
22. OpenCAPI Topics
Ø Industry Background
Ø Where/How OpenCAPI Technology is used
Ø Technology Overview and Advantages
Ø Heterogeneous Computing
Ø SNAP Framework for CAPI/OpenCAPI
23. SNAP Framework Concept for CAPI/OpenCAPI
SNAP + FPGA + Vivado HLS + CAPI/OpenCAPI = the best way to offload/accelerate C/C++ code with minimum change in code, quick porting, and better performance than the CPU.
• CAPI/OpenCAPI: the FPGA becomes a peer of the CPU → an action directly accesses host memory
• SNAP: manages server threads and actions (X, Y, Z) and access to I/Os (AXI to memory/network, …) → an action easily accesses resources
• FPGA: gives on-demand compute capability and direct I/O access (AXI to storage/network, …) → an action directly accesses external resources
• Vivado HLS: compiles actions written in C/C++ and optimizes the code for performance → develop action code efficiently
SNAP: Storage, Networking, Analytics Programming framework
24. CAPI Development without SNAP
(Figure: processes A, B, and C on the host, each with a slave context, call libcxl/libocxl over the cxl/ocxl kernel driver; on the FPGA, the HDK provides the CAPI PSL.)
• Huge development effort
• Performance focused, with full cache-line control
• Programming based on libcxl/libocxl plus VHDL and Verilog code
The software program is the application on the host; the hardware logic provides the acceleration on the FPGA.
25. SNAP: Focus on the Additional Acceleration Values
(Figure: processes A, B, and C each use the SNAP library and a job queue on top of libcxl/libocxl and the cxl/ocxl driver; on the FPGA, a PSL/AXI bridge exposes AXI and AXI-lite to a job manager and job queue, the hardware actions (Action 1 in VHDL, Action 2 in C/C++, Action 3 in Go, …), on-card DRAM, on-card NVMe, and networking (TBD), with host DMA and MMIO control.)
• Quick and easy development
• Use a High Level Synthesis tool to compile C/C++ to RTL, or directly use RTL
• Programming based on SNAP library function calls and the AXI interface
• AXI is an industry standard for on-chip interconnection (https://www.arm.com/products/system-ip/amba-specifications)
The software program is the application on the host; the hardware actions provide the acceleration on the FPGA.
26. Summary: SNAP Framework for POWER9
• SNAP will be supported on POWER9
ü As an abstraction layer on CAPI, SNAP actions will be portable to CAPI 2.0 & OpenCAPI
ü Minimum code changes & quick porting
Ø SNAP hides the differences (using standard AXI interfaces to the I/Os: DRAM/NVMe/Ethernet, etc.)
Ø Supports higher-level languages (Vivado HLS helps you convert C/C++ to VHDL/Verilog)
Ø SNAP for OpenCAPI development progress is now about 70%
Ø OpenCAPI Simulation Environment (OCSE) development progress is almost 90%
• All open source! → https://www.github.com/open-power/snap
ü Driven by the OpenPOWER Foundation Accelerator Workgroup
ü Cross-company collaboration and contributions
Power8 → Power9; CAPI 1.0 → CAPI 2.0/OpenCAPI
(Figure: your current actions (Action 1 in VHDL, Action 2 in C/C++, Action 3, …) and software program carry over through the libsnap APIs and AXI interfaces.)
27. Table of Enablement Deliveries
Item | Availability
OpenCAPI 3.0 TLx and DLx reference Xilinx FPGA designs (RTL and specifications) | Today
Xilinx Vivado project build with Memcopy exerciser | Today
Device discovery and configuration specification and RTL | Today
AFU interface specification | Today
Reference card design enablement specification | Today
25Gbps PHY signal specification | Today
25Gbps PHY mechanical specification | Today
OpenCAPI Simulation Environment (OCSE) tech preview | Today
Memcopy and Memory Home Agent exercisers | Today
AFP exerciser | Today
Reference driver | Available today
Join us! OpenCAPI Consortium: https://opencapi.org/
30. Backup: Overall Application Porting Preparations
Keys (the right person to do the right things):
• Software profiling
• Software/hardware function partitioning
• Understand the data location, data size, and data dependencies
• Parallelism estimation
• Consider I/O bandwidth limitations
• Decide SNAP mode and FPGA card
• API parameters: define the interface between the main application and the hardware action
Planning flow: Start → decide what algorithm(s) you want to accelerate → decide on SNAP mode → validate physical card choice → define your API parameters → to execution phase
31. Backup: Advantage Summary
Traditional IO Attached FPGA CAPI Attached FPGA Benefit
Device Driver w/ system calls to move
data from application memory to IO
memory and initiate data transfer
“Hardware” Device Driver – hardware
handles address translation, no system
call needed.
Less latency to initiate data transfer from
host memory to FPGA
Offload CPU
Limited to PCIe Gen3 Bandwidth and
hardware latency
Gen4 x8 and OpenCAPI provide higher
bandwidth and lower latency
Better roadmap for future performance
enhancements
Separate memory address domains True shared memory with processor.
Processor and FPGA use same virtual
address.
Programming framework: CAPI-SNAP.
Open source at
https://github.com/open-power/snap
• Easier programmability
• Enables pointer chasing, linked lists
• No pinning of pages
• FPGA access to all of system memory
• Scatter / Gather capability removes
need to prepare data in sequential
blocks
Virtualization and multi-process handled
by complex OS support
Added security and multi-process
capability
CAPI supports multi-process access in
hardware with security to prevent cross
process access
32. Backup: Comparison of IBM CAPI Implementations & Roadmap
Feature CAPI 1.0 CAPI 2.0 OpenCAPI 3.0 OpenCAPI 3.1 OpenCAPI 4.0
Processor Generation POWER8 POWER9 POWER9 Power9 Follow-On Power9 Follow-On
CAPI Logic Placement FPGA/ASIC FPGA/ASIC NA
DL/TL on Host
DLx/TLx on endpoint
FPGA/ASIC
NA
DL/TL on Host
DLx/TLx on Memory
Buffer
NA
DL/TL on Host
DLx/TLx on
endpoint
FPGA/ASIC
Interface
Lanes per Instance
Lane bit rate
PCIe Gen3
x8/x16
8 Gb/s
PCIe Gen4
2 x (Dual x8)
16 Gb/s
Direct 25G
x8
x4 fail down
25 Gb/s
Direct 25G
x8
x4 fail down
25 Gb/s
Direct 25G
x8
x4 fail down
25 Gb/s
Address Translation on
CPU
No Yes Yes Yes Yes
Native DMA from Endpoint
Accelerator
No Yes Yes NA Yes
Home Agent Memory on
OpenCAPI Endpoint with
Load/Store Access
No No Yes NA Yes
Native Atomic Ops to Host
Processor Memory from
Accelerator
No Yes Yes NA Yes
Host Memory Caching
Function
on Accelerator
Real Address
Cache in PSL
Real Address
Cache in PSL
No NA Effective Address
Cache in
Accelerator
Remove PCIe layers to
reduce latency
significantly